Homework 1
- Due Sep 15, 2023 by 11:59pm
- Points 100
- Submitting a file upload
Homework Project 1
- Due Fri, Sept 15, 2023
- Homeworks, projects and assignments
- Homework Submission Rules
- Homework Headers
Problem 1.1 (Long-to-Wide Data format translation):
We demonstrated the wide-to-long conversion in lecture. Load in the SOCR Housing Price Dataset. It is in long format: 7 years of statewide data are stacked on top of each other, i.e., each state has 7 rows in the data (covering 2000-2006). Convert the original data from long to wide, and then back to long format. Note that in the wide format, each state will have a single row of data, with seven times the number of original columns, i.e., the original variables HPI, UR, Region, Pop, and Percent will be tagged with the year, e.g., HPI2000, ..., HPI2006.
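As a starting point, the round trip can be sketched with base R's reshape(). The data frame below is a small made-up stand-in (two states, three years, two variables), not the SOCR dataset, and the column names State and Year are assumptions about how the real file labels its identifier and time columns.

```r
# Hypothetical toy data standing in for the SOCR housing data:
# 2 states x 3 years, one row per state-year (long format).
long <- data.frame(
  State = rep(c("AL", "AK"), each = 3),
  Year  = rep(2000:2002, times = 2),
  HPI   = c(100, 105, 110, 120, 126, 131),
  UR    = c(4.6, 5.1, 5.9, 6.4, 6.2, 7.2)
)

# Long -> wide: one row per State, columns tagged by year (HPI2000, ...).
wide <- reshape(long, direction = "wide",
                idvar = "State", timevar = "Year",
                v.names = c("HPI", "UR"), sep = "")

# Wide -> long: reshape() stored the reshaping details as an attribute
# on 'wide', so no 'varying' specification is needed here.
long2 <- reshape(wide, direction = "long")
```

The same pattern should extend to all five variables by listing them in v.names; packages such as tidyr offer an alternative route if you prefer them.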
Problem 1.2 (Data stratification):
Use the same Schizophrenia Neuroimaging Study Dataset and complete the following data-manipulation steps in R. These steps need not be concatenated (i.e., applied sequentially).
- Extract the first 10 subjects.
- Find the cases for which L_caudate < 160.
- Sort the subjects based on L_caudate values in descending and ascending order.
- Generate frequency and probability tables for Age, FS_IQ, and Sex.
- Compute the mean Age and the correlation between Age and FS_IQ.
- Plot the histogram and density of R_fusiform_gyrus, and draw a scatterplot of L_fusiform_gyrus and L_insular_cortex.
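The steps above can be sketched in base R as follows. The data frame here is simulated purely so the snippet runs on its own; its values, the sample size, and the Subject column are assumptions, and only the remaining column names come from the assignment text.

```r
set.seed(1)
n <- 30
# Hypothetical stand-in for the study data; values are simulated, not real.
dat <- data.frame(
  Subject          = seq_len(n),
  Age              = sample(18:50, n, replace = TRUE),
  Sex              = sample(c("F", "M"), n, replace = TRUE),
  FS_IQ            = round(rnorm(n, 100, 15)),
  L_caudate        = rnorm(n, 170, 20),
  R_fusiform_gyrus = rnorm(n, 400, 40),
  L_fusiform_gyrus = rnorm(n, 400, 40),
  L_insular_cortex = rnorm(n, 300, 30)
)

first10 <- head(dat, 10)                                   # first 10 subjects
small   <- dat[dat$L_caudate < 160, ]                      # L_caudate < 160
asc     <- dat[order(dat$L_caudate), ]                     # ascending sort
desc    <- dat[order(dat$L_caudate, decreasing = TRUE), ]  # descending sort

freq_sex <- table(dat$Sex)          # frequency table (same idea for Age, FS_IQ)
prob_sex <- prop.table(freq_sex)    # probability table

mean_age <- mean(dat$Age)
r_age_iq <- cor(dat$Age, dat$FS_IQ)

# Histogram on the density scale with a kernel density overlay,
# followed by a scatterplot of the two regions.
hist(dat$R_fusiform_gyrus, freq = FALSE, main = "R_fusiform_gyrus")
lines(density(dat$R_fusiform_gyrus))
plot(dat$L_fusiform_gyrus, dat$L_insular_cortex)
```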
Problem 1.3:
Generate 10,000 standard normal variables and another 5,000 Student t distributed random variables with df=5. Generate a quantile-quantile (Q-Q) probability plot of the two samples. Then, compare it with the qqnorm() or plot_ly() plot of the Student t simulation and interpret the findings.
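One possible way to set up the simulation and the two plots, using base R graphics only, is sketched below. The seed and the variable names z and t5 are arbitrary choices made for illustration.

```r
set.seed(2023)
z  <- rnorm(10000)        # 10,000 standard normal draws
t5 <- rt(5000, df = 5)    # 5,000 Student t draws with 5 degrees of freedom

# Two-sample Q-Q plot. qqplot() copes with unequal sample sizes by
# interpolating the quantiles of the larger sample.
qq <- qqplot(z, t5,
             xlab = "Normal sample quantiles",
             ylab = "t(5) sample quantiles")
abline(0, 1, lty = 2)     # reference line y = x

# Normal Q-Q plot of the t sample against theoretical normal quantiles.
qqnorm(t5)
qqline(t5)
```

Because the t distribution with 5 degrees of freedom has heavier tails than the normal, the points in both plots would be expected to bend away from the reference line at the extremes while staying close to it in the middle.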
Problem 1.4:
Define a new function myMode() that computes the sample mode(s). Test your function using the simulation data you generated in the previous question (#1.3). Did you cover all possible situations for the input data? Does your function work with quantitative (numeric), qualitative (character), and tensor (array) inputs? Handle mixed-type data-frame inputs gracefully.
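A minimal sketch of such a function is shown below. It is only one of several reasonable designs: the na.rm argument, the choice to return all tied values, and the column-by-column treatment of data frames are assumptions rather than requirements stated in the assignment.

```r
# Sketch of a mode function: returns every value tied for the highest count.
myMode <- function(x, na.rm = TRUE) {
  # Data frames: recurse column by column so mixed types stay separate;
  # the result is a named list with one entry per column.
  if (is.data.frame(x)) return(lapply(x, myMode, na.rm = na.rm))
  x <- as.vector(x)                  # flattens matrices and arrays
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(x)      # empty input: nothing to report
  ux <- unique(x)
  counts <- tabulate(match(x, ux))   # occurrences of each distinct value
  ux[counts == max(counts)]
}

myMode(c(1, 2, 2, 3, 3))                          # two tied modes
myMode(c("a", "b", "a"))                          # character input
myMode(array(c(1, 1, 2, 3), dim = c(2, 2, 1)))    # array input
myMode(data.frame(a = c(1, 1, 2), b = c("x", "y", "y")))
```

Note that for continuous simulated data, such as rnorm() draws, every value typically occurs exactly once, so a count-based mode returns the whole sample; rounding, binning, or a density-based estimate may be more informative in that case.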
Rubric
Criteria | Ratings | Pts
---|---|---
Correctness and scientific validity | | --
Result reproducibility | | --
Content focus, presentation style, and clarity | | --
Total Points: 100