Homework 2
- Due Sep 29, 2023 by 11:59pm
- Points 100
- Submitting a file upload
- Homeworks, projects and assignments
- Homework Submission Rules
- Homework Headers
Problem 2.1 (Data Manipulation)
For each of the two case-studies below, load the data, generate summary statistics for all features, plot some of the features using plot_ly() histograms, box plots, density plots, etc., as appropriate, and save the summaries locally as tab-delimited text files.
- Use the above TBI data to explore some bivariate relations (e.g., bivariate plots, correlations, tables, cross-tabulations).
- Use DSPA Case-Study 7, 07_UMich_AnnArbor_MI_TempPrecipitation_HistData_1900_2015 data, to show the relations between temperature and time. [Hint: use plot_ly().]
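The summary-and-save workflow described above can be sketched as follows. This is a minimal illustration, not the expected solution: the data frame `df`, its columns (`age`, `score`, `sex`), and the output file name are hypothetical stand-ins for the actual case-study data, and the plot_ly() call is skipped when the plotly package is not installed.

```r
# Hypothetical stand-in data; replace with the actual case-study data frame.
set.seed(1)
df <- data.frame(
  age   = round(rnorm(50, mean = 40, sd = 12)),
  score = round(runif(50, min = 3, max = 15)),
  sex   = factor(sample(c("F", "M"), 50, replace = TRUE))
)

# Summary statistics for every numeric feature, one row per feature.
num_cols <- names(df)[sapply(df, is.numeric)]
stat <- function(f) unname(sapply(df[num_cols], f, na.rm = TRUE))
summ <- data.frame(
  feature = num_cols,
  mean    = stat(mean),
  sd      = stat(sd),
  min     = stat(min),
  median  = stat(median),
  max     = stat(max)
)

# Save the summaries locally as a tab-delimited text file.
out_file <- file.path(tempdir(), "summary_stats.txt")  # hypothetical file name
write.table(summ, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Simple bivariate summaries: a correlation and a cross-table.
r    <- cor(df$age, df$score)
xtab <- table(sex = df$sex, high_score = df$score > median(df$score))

# Histogram of one feature with plot_ly(), if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(df, x = ~age, type = "histogram")  # print(p) to render
}
```

Reading the file back with read.delim(out_file) is a quick way to confirm that the saved table has one row per numeric feature.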
Problem 2.3 (Missing Data)
Artificially introduce some missing data in the Knee Pain dataset, impute the missing values, and examine the differences between the original incomplete and the imputed datasets.
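One way to structure this experiment is sketched below, under stated assumptions: the data frame `orig` is a hypothetical numeric stand-in for the Knee Pain data, values are removed completely at random, and simple mean imputation is used only because it needs no extra packages. Mean imputation preserves the observed mean but shrinks the standard deviation, which is the kind of difference the comparison is meant to reveal; a model-based or multiple-imputation method may be more appropriate for the actual assignment.

```r
set.seed(2023)

# Hypothetical stand-in for two numeric Knee Pain features.
orig <- data.frame(x = rnorm(200, mean = 100, sd = 20),
                   y = rnorm(200, mean = 250, sd = 40))

# Artificially remove 30 of the 200 y-values (15%), completely at random.
incomplete <- orig
miss_idx <- sample(nrow(orig), size = 30)
incomplete$y[miss_idx] <- NA

# Mean imputation: fill each missing y with the mean of the observed y-values.
imputed <- incomplete
imputed$y[is.na(imputed$y)] <- mean(incomplete$y, na.rm = TRUE)

# Compare the original, incomplete, and imputed versions of y.
stats <- function(v) c(n    = sum(!is.na(v)),
                       mean = mean(v, na.rm = TRUE),
                       sd   = sd(v, na.rm = TRUE))
comparison <- rbind(original   = stats(orig$y),
                    incomplete = stats(incomplete$y),
                    imputed    = stats(imputed$y))
print(round(comparison, 2))
```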
Problem 2.4 (Surface Plots)
Generate a surface plot for the (RF) Knee Pain data illustrating the 2D distribution of locations of the patient reported knee pain (use plot_ly and kernel density estimation).
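A minimal sketch of the kernel-density surface follows. The vectors `x` and `y` are simulated placeholders for the pain-location coordinates, the 50-by-50 grid is an arbitrary choice, and MASS::kde2d() is one possible 2D kernel density estimator (MASS is assumed to be installed, as it is with standard R distributions). The surface is only built when plotly is available.

```r
library(MASS)  # provides kde2d()

set.seed(7)
# Hypothetical (x, y) pain locations: two simulated clusters.
x <- c(rnorm(150, mean = 100, sd = 15), rnorm(100, mean = 160, sd = 10))
y <- c(rnorm(150, mean = 200, sd = 25), rnorm(100, mean = 260, sd = 15))

# 2D kernel density estimate on a 50 x 50 grid.
# kd$x and kd$y are the grid coordinates; kd$z[i, j] is the density at (kd$x[i], kd$y[j]).
kd <- kde2d(x, y, n = 50)

if (requireNamespace("plotly", quietly = TRUE)) {
  # A plotly surface expects rows of z to follow y and columns to follow x,
  # so the kde2d() matrix is transposed.
  p <- plotly::plot_ly(x = kd$x, y = kd$y, z = t(kd$z), type = "surface")
  # print(p) to render the interactive surface
}
```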
Problem 2.5 (Sample-Size Rebalancing)
Rebalance the groups of ALS (training data) patients according to ALSFRS_Total_max > 37 vs. ALSFRS_Total_max ≤ 37 using synthetic minority oversampling (SMOTE), and confirm approximately equal cohort sizes. [Hint: table(ALS.train$ALSFRS_Total_max <= 37) returns FALSE: 257, TRUE: 1966.]
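To make the idea behind SMOTE concrete, here is a small base-R sketch for numeric features. It is an illustration, not a reference implementation: `smote_sketch()` is a hypothetical helper, the two features `f1` and `f2` are simulated, and only the group sizes (257 vs. 1966) come from the hint above. Each synthetic case is placed at a random point on the segment between a minority case and one of its k nearest minority neighbors.

```r
# Hypothetical helper: generate n_new synthetic minority rows by interpolation.
smote_sketch <- function(X_min, n_new, k = 5) {
  X_min <- as.matrix(X_min)
  n <- nrow(X_min)
  stopifnot(n > k)
  D <- as.matrix(dist(X_min))  # pairwise Euclidean distances among minority cases
  synth <- matrix(NA_real_, nrow = n_new, ncol = ncol(X_min),
                  dimnames = list(NULL, colnames(X_min)))
  for (s in seq_len(n_new)) {
    i   <- sample.int(n, 1)             # pick a random minority case
    nn  <- order(D[i, ])[2:(k + 1)]     # its k nearest neighbors (first entry is the case itself)
    j   <- nn[sample.int(k, 1)]         # pick one of those neighbors
    gap <- runif(1)                     # random position along the segment i -> j
    synth[s, ] <- X_min[i, ] + gap * (X_min[j, ] - X_min[i, ])
  }
  synth
}

set.seed(42)
n_major <- 1966  # ALSFRS_Total_max <= 37 (count from the hint)
n_minor <- 257   # ALSFRS_Total_max > 37 (count from the hint)

# Simulated features standing in for the ALS training predictors.
major <- data.frame(f1 = rnorm(n_major, mean = 0), f2 = rnorm(n_major, mean = 0))
minor <- data.frame(f1 = rnorm(n_minor, mean = 2), f2 = rnorm(n_minor, mean = 2))

# Oversample the minority group up to the size of the majority group.
new_minor <- smote_sketch(minor, n_new = n_major - n_minor)

balanced <- rbind(
  data.frame(major, high = FALSE),
  data.frame(minor, high = TRUE),
  data.frame(new_minor, high = TRUE)
)
print(table(balanced$high))  # confirm the cohort sizes
```

Because the synthetic rows are interpolations, they stay within the range of the observed minority cases. A packaged SMOTE implementation may be preferable for the actual assignment, particularly when the predictors include categorical features.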
Rubric
Criteria | Ratings | Pts
---|---|---
Correctness and scientific validity | | --
Result reproducibility | | --
Content focus, presentation style, and clarity | | --

Total Points: 100