Homework 3
- Due Oct 13, 2023 by 11:59pm
- Points 100
- Submitting a file upload
Homework Project 3
- Homeworks, projects and assignments
- Homework Submission Rules
- Homework Headers
Problem 3.1 (Probability Distributions):
Complete the following tasks for each of the probability distributions below:
- Generate plots of the density, CDF, and quantile (inverse-CDF) functions.
- Report the first 4 moments (mean, variance, skewness, kurtosis).
- Complete the probability distributions table below. Each cell holds the value of the quantile function for the corresponding probability (column) and distribution (row).
SMHS Prob Distributions: quantile values by probability
Distributions | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9
---|---|---|---|---|---|---|---|---|---
Weibull(1,5) | | | | | | | | |
Uniform(-13,100) | | | | | | | | |
Student's t (df=3) | | | | | | | | |
Cauchy | | | | | | | | |
Negative Binomial(8, 0.3) | | | | | | | | |
Chi-Square (df=7) | | | | | | | | |
Poisson (9) | | | | | | | | |
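As a starting point, the table can be filled programmatically with the base-R quantile functions. The sketch below is one possible approach, not a reference solution. The parameter roles are assumptions not stated in the table: Weibull(1,5) is read as shape = 1 and scale = 5, Negative Binomial(8, 0.3) as size = 8 and prob = 0.3, and Cauchy as the standard Cauchy (location 0, scale 1). Check these against the course's parameterization before reporting values.

```r
# Probabilities used as the table columns
p <- (1:9) / 10

# Each row applies one base-R quantile function to all nine probabilities.
# ASSUMED parameter roles: Weibull(shape = 1, scale = 5),
# Negative Binomial(size = 8, prob = 0.3), standard Cauchy.
quantile_table <- rbind(
  "Weibull(1,5)"              = qweibull(p, shape = 1, scale = 5),
  "Uniform(-13,100)"          = qunif(p, min = -13, max = 100),
  "Student's t (df=3)"        = qt(p, df = 3),
  "Cauchy"                    = qcauchy(p),
  "Negative Binomial(8, 0.3)" = qnbinom(p, size = 8, prob = 0.3),
  "Chi-Square (df=7)"         = qchisq(p, df = 7),
  "Poisson (9)"               = qpois(p, lambda = 9)
)
colnames(quantile_table) <- sprintf("%.1f", p)

# Rounded only for display; the stored values keep full precision
print(round(quantile_table, 3))
```

The matching `d*` and `p*` functions (for example `dweibull` and `pweibull`) give the density and CDF values needed for the plots.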
Problem 3.2 (Matrix equation solution):
Use R to solve the following system of four linear equations in the unknowns x, y, z, and w, then validate your solution.
6x + 3y - 3z + w = 2
7x + y + 2z + 2w = 5
5x + 3y - 3z + w = 3
-6x - 2y + 3z = 6
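One way to set this up is to write the system in matrix form A u = b, with u = (x, y, z, w), and call base R's `solve()`. The sketch below takes that route; the variable names (`A`, `b`, `sol`, `residual`) are illustrative choices. Validation here means substituting the solution back into the system and checking that the residual is numerically zero.

```r
# Coefficient matrix, one row per equation, columns ordered (x, y, z, w)
A <- matrix(c( 6,  3, -3, 1,
               7,  1,  2, 2,
               5,  3, -3, 1,
              -6, -2,  3, 0),
            nrow = 4, byrow = TRUE)

# Right-hand side
b <- c(2, 5, 3, 6)

# Solve A u = b
sol <- solve(A, b)
names(sol) <- c("x", "y", "z", "w")
print(sol)

# Validate: A %*% sol should reproduce b up to floating-point error
residual <- as.vector(A %*% sol) - b
print(max(abs(residual)))
stopifnot(max(abs(residual)) < 1e-8)
```

Checking `det(A)` (or the condition number via `kappa(A)`) beforehand confirms that the system has a unique solution.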
Problem 3.3 (Dimensionality reduction)
Use PCA, t-SNE, and UMAP to visualize (in 2D and 3D), analyze, and interpret the Autism Dataset (ABIDE, Case-Study #17, ABIDE_Aggregated_Data.csv). To keep the analysis manageable, focus on a subset of the features that includes some of the demographic variables (researchGroup, subjectSex, Dx_Category, subjectAge, weightKg, handedness, iq) and 300-500 of the derived neuroimaging biomarkers (L_superior_frontal_gyrus, ..., brainstem + volume_3rd-Ventricle, ..., curv_ind_rh_V2). Use the outcome "researchGroup" only to label the lower-dimensional projections, to see whether the clinical phenotypes separate clearly in the projection space. Do not use it in the dimensionality reduction itself or as a model-based predictive covariate.
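The labeling rule above can be illustrated with a small PCA sketch in base R. The data here are simulated stand-ins, not the ABIDE data: the matrix `features`, the factor `group`, and the artificial group shift are all hypothetical. The point is only the workflow: the group label is kept out of the matrix passed to `prcomp()` and is used solely to color the projected points. The same pattern should carry over to t-SNE and UMAP, for which add-on packages such as Rtsne and umap are commonly used.

```r
set.seed(42)

# SYNTHETIC stand-in data (not ABIDE): 120 subjects, 50 numeric features
n <- 120
p <- 50
group <- factor(rep(c("Autism", "Control"), each = n / 2))  # hypothetical labels
features <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Artificial shift in the first 5 features so the groups differ slightly
features[group == "Autism", 1:5] <- features[group == "Autism", 1:5] + 1

# PCA on the features only; the label is NOT part of the input
pca <- prcomp(features, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:3]  # first three PCs, enough for 2D and 3D views

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:3], 3))

# 2D projection, colored by the held-out label
plot(scores[, 1], scores[, 2], col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2", main = "PCA projection (synthetic data)")
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```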
Problem 3.4 (Least Squares Estimation)
Use the SOCR Knee Pain dataset, extract the LF (Left-Front) locations (x, y), and fit a linear model for the vertical location (y) in terms of the horizontal location (x). Display the fitted line on top of the scatter plot of the paired data.
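The fitting and plotting steps can be sketched as follows. Because the layout of the Knee Pain data is not described here, the sketch uses a simulated data frame named `knee_lf` with columns `x` and `y`; both the name and the values are hypothetical placeholders for the extracted LF locations. It fits the line with `lm()` and, as a cross-check, recomputes the same coefficients from the least-squares normal equations.

```r
set.seed(7)

# HYPOTHETICAL stand-in for the extracted LF locations (simulated values)
knee_lf <- data.frame(x = runif(100, min = 0, max = 1))
knee_lf$y <- 0.3 + 0.5 * knee_lf$x + rnorm(100, sd = 0.1)

# Least-squares fit of y on x
fit <- lm(y ~ x, data = knee_lf)
print(coef(fit))

# Cross-check via the normal equations: (X'X) beta = X'y
X <- cbind(1, knee_lf$x)
beta_hat <- solve(t(X) %*% X, t(X) %*% knee_lf$y)
print(as.vector(beta_hat))

# Scatter plot with the fitted line on top
plot(knee_lf$x, knee_lf$y, pch = 19, col = "grey40",
     xlab = "x (horizontal location)", ylab = "y (vertical location)",
     main = "Linear fit (simulated stand-in data)")
abline(fit, col = "red", lwd = 2)
```

With the real data, only the construction of `knee_lf` should need to change.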
Problem 3.5 (Extra Challenge, optional)
Generate a dynamic 3D scatter plot of the following 3 features of the Autism Dataset: subjectAge, L_hippocampus, and R_hippocampus. Consider a model with Y = R_hippocampus regressed on the other two variables, X = {subjectAge, L_hippocampus}. Estimate and display a flat 2D plane model that outputs a prediction for each input X.
Remember that a 2D plane in R^3 is defined by aX + bY + cZ + d = 0, where (X, Y, Z) are the coordinates, (a, b, c) is the normal vector to the plane, and the free parameter "d" is chosen so that the plane passes through a specific point, e.g., the coordinate-wise mean of the 3D scatter plot. That is, once you compute the first 2 principal vectors v1 and v2, their cross-product (v1 x v2) is the plane normal vector (a, b, c). Then, force the plane to pass through the point representing the arithmetic average of the 3D point scatter; this fixes the remaining free parameter "d".
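The construction described above can be sketched in base R. The points below are simulated stand-ins (arbitrary units, not the ABIDE values), and the helper names `cross3` and `predict_plane` are hypothetical. Note that a plane built from the principal vectors minimizes orthogonal distances, so it generally differs from the ordinary least-squares plane returned by `lm()`. It is also sensitive to the relative scales of the three variables, which is worth checking on the real data.

```r
set.seed(1)

# SYNTHETIC stand-in points (arbitrary units), columns ordered (X, Y, Z)
n <- 200
age <- runif(n, min = 6, max = 60)
lh  <- rnorm(n, mean = 4, sd = 0.4)
rh  <- 0.9 * lh + 0.01 * age + rnorm(n, sd = 0.05)
pts <- cbind(subjectAge = age, L_hippocampus = lh, R_hippocampus = rh)

# First two principal vectors of the centered point cloud
pca <- prcomp(pts, center = TRUE, scale. = FALSE)
v1 <- pca$rotation[, 1]
v2 <- pca$rotation[, 2]

# Cross product of two 3-vectors (hypothetical helper)
cross3 <- function(u, v) {
  unname(c(u[2] * v[3] - u[3] * v[2],
           u[3] * v[1] - u[1] * v[3],
           u[1] * v[2] - u[2] * v[1]))
}

# Plane normal (a, b, c) = v1 x v2
normal <- cross3(v1, v2)

# Choose d so the plane passes through the centroid of the scatter
centroid <- colMeans(pts)
d <- -sum(normal * centroid)

# Solve aX + bY + cZ + d = 0 for Z to predict R_hippocampus
predict_plane <- function(x, y) {
  -(normal[1] * x + normal[2] * y + d) / normal[3]
}

print(c(a = normal[1], b = normal[2], c = normal[3], d = d))
```

Evaluating `predict_plane()` on a grid of (subjectAge, L_hippocampus) values gives the surface to overlay on the interactive 3D scatter plot.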
Please try to come up with elegant solutions, informative visualizations, and detailed documentation/explanations.
Rubric
Criteria | Ratings | Pts
---|---|---
Correctness and scientific validity | |
Result reproducibility | |
Content focus, presentation style, and clarity | |
Total Points: 100