Homework Project 3
- Due Mar 9, 2018 by 11:59pm
- Points 100
- Submitting a file upload
Multivariate analysis for behavioral research
- Due to Dr. Anne Buu: 3/14/18 Wed, in class
- Homeworks, projects and assignments
- Homework Submission Rules
- Homework Headers
Problem 3.1: This project aims to classify 39 cities in the USA in terms of their air pollution levels using the following indices available from government databases:
- Temperature: average annual temperature in Fahrenheit
- Factories: number of manufacturing enterprises employing 20 or more workers
- Population: population size (1970 census) in thousands
- Windspeed: average annual wind speed in miles per hour
- Rain: average annual precipitation in inches
- Rainydays: average number of days with precipitation per year.
The resulting clusters will then be evaluated against the following gold standard:
- SO2: SO2 content of air in micrograms per cubic meter.
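Evaluating a cluster solution against the SO2 gold standard amounts to a one-way ANOVA with cluster membership as the factor. A large F means SO2 varies more between clusters than within them. The sketch below is a minimal Python illustration of the F statistic only, not the SAS workflow: `one_way_anova_f` is a hypothetical helper, and the SO2 values are made up. In SAS this step is typically done with PROC ANOVA or PROC GLM, which also report the p-value.

```python
from statistics import mean


def one_way_anova_f(values, labels):
    """One-way ANOVA F statistic for `values` grouped by `labels` (hypothetical helper).

    Returns (F, df_between, df_within); the p-value would come from the F distribution.
    """
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in groups.items()}
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-cluster sum of squares: size-weighted squared deviation of each cluster mean
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    # Within-cluster sum of squares: deviations from each cluster's own mean
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical SO2 values for six cities in two clusters (not from the data set)
so2 = [1, 2, 3, 5, 6, 7]
cluster = [1, 1, 1, 2, 2, 2]
print(one_way_anova_f(so2, cluster))  # (24.0, 1, 4)
```

The helper divides by the within-cluster sum of squares, so it assumes SO2 is not constant inside every cluster.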
Data set: The data set “usair.sas7bdat” contains 8 variables: the 7 variables described above and the identifier “City” (see the linked page for details).
Analytical tasks:
Remember to turn on the graphing function (ods graphics on;) in SAS and to standardize the raw data using the sample means and standard deviations before you conduct the following analyses.
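Standardizing with the sample mean and standard deviation turns each predictor into a z-score, (x − mean) / SD. Without this step, variables on large scales such as Population or Factories would dominate the distance-based clustering. The sketch below is a minimal Python illustration, assuming the data are held as a dict of column lists. `standardize` and the toy values are hypothetical. In SAS, PROC STANDARD with MEAN=0 STD=1 is one common way to do the same thing.

```python
from statistics import mean, stdev


def standardize(columns):
    """Return z-scores for each column using its sample mean and sample SD (n - 1 divisor).

    columns: dict mapping variable name -> list of raw values (hypothetical layout).
    """
    z = {}
    for name, xs in columns.items():
        m, s = mean(xs), stdev(xs)  # stdev() is the sample SD, dividing by n - 1
        z[name] = [(x - m) / s for x in xs]
    return z


# Hypothetical toy values, not taken from usair.sas7bdat
raw = {"Temperature": [50.0, 60.0, 70.0], "Windspeed": [8.0, 9.0, 13.0]}
print(standardize(raw)["Temperature"])  # [-1.0, 0.0, 1.0]
```

After standardization, every column has mean 0 and sample SD 1, so each of the 6 predictors carries comparable weight in the distance calculations.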
- (A) Conduct a complete linkage cluster analysis on the data using the 6 predictors available from government databases. Print out the dendrogram. Use this model to generate a 4-cluster solution. List the cluster membership of each city. Comment on the specific cluster that Detroit belongs to. What are the other cities that belong to the same cluster? What are their common features?
- (B) Conduct an overall MANOVA test on the 6 predictors based on the 4 clusters in (A). Interpret the results.
- (C) Conduct an ANOVA test on the gold standard (SO2) to evaluate the 4-cluster model in (A). Interpret the results.
- (D) Conduct a K-means cluster analysis (K = 4) on the data using the 6 predictors. Print out the cluster membership of each city. Comment on the specific cluster that Detroit belongs to. What are the other cities that belong to the same cluster? What are their common features?
- (E) Conduct an overall MANOVA test on the 6 predictors based on the 4 clusters in (D). Interpret the results.
- (F) Conduct an ANOVA test on the gold standard (SO2) to evaluate the 4-cluster model in (D). Interpret the results.
- (G) Compare the complete linkage clustering model (4 clusters) with the K-means clustering model (K = 4), and state which you prefer based on two criteria: (1) the cluster membership of Detroit; and (2) the evaluation against the gold standard (SO2).
- (H) Re-run the complete linkage cluster analysis in (A). Output the cluster membership (under the 4-cluster model) together with the original variables to a temporary SAS data set named after yourself. Conduct a principal component analysis on this temporary SAS data set using the same 6 predictors as in (A). Interpret the correlation matrix and the first two principal components (what they mean and what proportion of the variance they explain).
- (I) Produce a scatter plot with the first principal component on the x-axis and the second principal component on the y-axis, using the cluster membership generated in (H) to label each data point. Comment on the characteristics of the 4 clusters in terms of the first two principal components.
- (J) Conduct a linear regression analysis using the first two principal components in (H) to predict SO2. Interpret the results in terms of the direction, magnitude, and hypothesis tests of the regression coefficients.
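The complete linkage analysis described in the tasks above can be illustrated outside SAS. The sketch below is a minimal, naive Python version, not the SAS procedure. It assumes the input is already-standardized rows of the 6 predictors and uses Euclidean distance. `complete_linkage` is a hypothetical helper.

At each step, the helper merges the two clusters whose farthest pair of members is closest. It stops once `n_clusters` clusters remain, which corresponds to cutting the dendrogram at that number of clusters. In SAS, the usual route is PROC CLUSTER with METHOD=COMPLETE followed by PROC TREE with NCLUSTERS=4. Cluster numbering and tie-breaking there may differ from this sketch.

```python
from itertools import combinations
from math import dist


def complete_linkage(points, n_clusters):
    """Naive agglomerative clustering with complete linkage (hypothetical helper).

    points: list of equal-length coordinate tuples (assumed already standardized).
    Returns a list of cluster labels 1..n_clusters, one per point.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance = largest pairwise Euclidean distance
            d = max(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]                  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Toy data (hypothetical, not from usair): three well-separated pairs of points
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
print(complete_linkage(toy, 3))  # [1, 1, 2, 2, 3, 3]
```

The repeated pairwise scan is slow for large data but is trivial for 39 cities. It is meant to show the merge rule, not to replace PROC CLUSTER.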
Submission: Print out your SAS program, relevant outputs and your written report and submit them to Dr. Buu in class.
Scoring: Each question is worth 1 point, for a maximum of 10 points.
Rubric
Criteria:
- Correctness and scientific validity
- Result reproducibility
- Content focus, presentation style, and clarity
Total Points: 100