This is an open-ended R&D project that SOCR/DSPA trainees can complete. Creative solutions can be sent to the instructor (Ivo D. Dinov). Use the Autism Brain Imaging Data Exchange (ABIDE) data to design a meaningful biomedical study examining, characterizing, and contrasting normal and pathological (autism) brain neurodevelopment.
These data consist of derived neuroimaging data, quality assessment (QA) metrics prefixed by anat_ and func_, and manual quality assessment prefixed by qc_.
Automated QA Measures: These columns reflect automated metrics where outliers may be identified by a statistical procedure (e.g., \(2\sigma\)).
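As a minimal sketch of the \(2\sigma\) rule mentioned above, the following flags values lying more than two standard deviations from the mean. The vector `qa_metric` is simulated and only stands in for one automated QA column; the helper name `flag_outliers` is hypothetical and not part of the course materials.

```r
# Simulated stand-in for a single automated QA column (not ABIDE data)
set.seed(1234)
qa_metric <- c(rnorm(200, mean = 10, sd = 1), 25)  # the last value is a planted outlier

# Flag observations more than k standard deviations away from the mean
flag_outliers <- function(x, k = 2) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  !is.na(z) & abs(z) > k
}

is_outlier <- flag_outliers(qa_metric)
which(is_outlier)   # indices of the flagged observations
sum(is_outlier)     # number of flagged observations
```

Because the mean and standard deviation are themselves sensitive to extreme values, a median/MAD variant of the same rule may be preferable for heavily contaminated columns.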
Anatomical measures:
Functional measures:
3dTout command in AFNI.
More information and metadata are available in the data-provenance DOCX in the DSPA ABIDE Case-Study Folder.
# install.packages("magrittr")
library(magrittr)
# load ABIDE data (ABIDE_Aggregated_Data.csv)
ABIDE_data <- read.csv('https://umich.instructure.com/files/20935287/download?download_frd=1', header=TRUE)
dim(ABIDE_data) # 1098 2145
## [1] 1098 2145
attach(ABIDE_data)
# Review the data element types
# colnames(ABIDE_data)
# Potential relevant Outcomes (Y)
table(ABIDE_data$researchGroup)
##
## Autism Control
## 528 570
# Autism Control
# 528 570
table(ABIDE_data$subjectSex)
##
## F M
## 163 935
# Data Cleaning (QC)
# Replace the missing (-9999) IQ values with 30
ABIDE_data$iq <- replace(ABIDE_data$iq, ABIDE_data$iq<0, 30)
# Visualize the data
#table(ABIDE_data$iq)
library(plotly)
xLabel <- list(title = "Intelligence (IQ)")
yLabel <- list(title = "Frequency")
plot_ly(x = ~ABIDE_data$iq, type = "histogram") %>%
  layout(xaxis = xLabel, yaxis = yLabel)
# MODEL the data
# Fit and plot linear models according to specified predictors and outcomes
fitPlot_LM_Model <- function (Y, X) {
  # Y= outcome column name
  # X= vector of predictor column names
  ### .......
  # return (myPlot)
}
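One possible way to approach the model-fitting step is sketched below. It is not the elided body of `fitPlot_LM_Model`; the helper `fit_lm_sketch`, the simulated data frame `toy`, and its column names are all hypothetical stand-ins chosen for illustration.

```r
# Simulated stand-in data; the column names are hypothetical, not ABIDE fields
set.seed(1234)
n <- 300
toy <- data.frame(age = runif(n, 6, 60), sexM = rbinom(n, 1, 0.85))
toy$iq <- 100 + 0.2 * toy$age - 3 * toy$sexM + rnorm(n, sd = 10)

# Fit a linear model for outcome Y on predictors X (both given as column names)
fit_lm_sketch <- function(data, Y, X) {
  lm(reformulate(X, response = Y), data = data)
}

fit <- fit_lm_sketch(toy, Y = "iq", X = c("age", "sexM"))
summary(fit)$coefficients   # estimates, standard errors, t and p values
head(fitted(fit))           # model-based predictions for the first rows
```

Building the formula with `reformulate()` keeps the outcome and predictors as plain character vectors, which matches the `Y` and `X` arguments in the skeleton above. The fitted values can then be passed to a plotting call of choice.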
#### Run the Full model-fitting prospectively and display the prediction forecasts
# Logit modeling
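A minimal logistic-regression sketch for a binary outcome such as research group follows. The data are simulated, and the predictors `iq` and `age` as well as the 0.5 classification threshold are assumptions made only for illustration.

```r
# Simulated stand-in data; effect sizes are arbitrary, not estimated from ABIDE
set.seed(1234)
n <- 400
toy <- data.frame(iq = rnorm(n, 100, 15), age = runif(n, 6, 60))
p_autism <- plogis(0.5 - 0.04 * (toy$iq - 100))
toy$researchGroup <- factor(ifelse(rbinom(n, 1, p_autism) == 1, "Autism", "Control"),
                            levels = c("Control", "Autism"))

# With a factor outcome, glm() models the probability of the second level ("Autism")
logit_fit <- glm(researchGroup ~ iq + age, data = toy, family = binomial)
summary(logit_fit)$coefficients

# Predicted probabilities and a confusion table at a 0.5 threshold
pred_prob  <- predict(logit_fit, type = "response")
pred_class <- ifelse(pred_prob > 0.5, "Autism", "Control")
conf_mat   <- table(Predicted = pred_class, Observed = toy$researchGroup)
conf_mat
```

The coefficients are on the log-odds scale; `exp(coef(logit_fit))` converts them to odds ratios.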
Introduce some MCAR deletions. Impute the missing values and compare the (simulated-missing) data and models to their complete (original) data counterparts.
# Introduce simulated MCAR missingness
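Missing-completely-at-random (MCAR) deletions can be simulated by removing each cell independently with a fixed probability, regardless of any observed or unobserved values. The sketch below assumes a 10% deletion rate; the helper `introduce_mcar` and the simulated data frame are hypothetical.

```r
# Simulated stand-in data (not ABIDE)
set.seed(1234)
n <- 500
toy <- data.frame(iq = rnorm(n, 100, 15), age = runif(n, 6, 60))

# Delete every cell independently with probability `rate` (MCAR mechanism)
introduce_mcar <- function(data, rate = 0.1) {
  mask <- matrix(runif(nrow(data) * ncol(data)) < rate, nrow = nrow(data))
  data[mask] <- NA
  data
}

toy_mcar <- introduce_mcar(toy, rate = 0.1)
colMeans(is.na(toy_mcar))   # observed missingness rate per column
```

Keeping the original `toy` untouched makes it possible to compare the incomplete and imputed data against the complete data later.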
# Imputation
# The Rhat convergence statistic compares the variance between chains to the variance
# within chains (similar to the ANOVA F-test).
# Rhat values ~ 1.0 indicate likely convergence,
# Rhat values > 1.1 indicate that the chains should be run longer
# (use a larger number of iterations)
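To make the between-chain versus within-chain comparison concrete, here is a hand-rolled sketch of the basic (non-split) Gelman-Rubin statistic, \(\hat{R} = \sqrt{\left(\frac{n-1}{n}W + \frac{B}{n}\right)/W}\), where \(W\) is the mean within-chain variance and \(B/n\) is the variance of the chain means. Imputation and MCMC packages may report a refined variant, so treat the function `rhat` below as illustrative only; the chains are simulated.

```r
# chains: numeric matrix with one column per chain and one row per iteration
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  var_hat <- (n - 1) / n * W + B / n # pooled estimate of the target variance
  sqrt(var_hat / W)
}

set.seed(1234)
mixed <- matrix(rnorm(4000), ncol = 4)         # four chains sampling the same target
stuck <- sweep(mixed, 2, c(0, 0, 3, 3), "+")   # two chains shifted away from the others

rhat_mixed <- rhat(mixed)   # close to 1: chains agree
rhat_stuck <- rhat(stuck)   # well above 1.1: chains disagree
c(mixed = rhat_mixed, stuck = rhat_stuck)
```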
# Compare the results of the complete-data models to the imputed-data models
# Plot the resulting models and quantify model differences
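The comparison between complete-data and imputed-data models might look like the following sketch. It uses single mean imputation purely as a simple stand-in, which is not the chain-based multiple imputation that the Rhat comments above refer to, and all data and variable names are simulated and hypothetical.

```r
# Simulated complete data (not ABIDE)
set.seed(1234)
n <- 500
complete <- data.frame(age = runif(n, 6, 60))
complete$iq <- 100 + 0.2 * complete$age + rnorm(n, sd = 10)

# Simulate MCAR deletions in the predictor (about 10% of values)
incomplete <- complete
incomplete$age[runif(n) < 0.1] <- NA

# Single mean imputation (a deliberately simple stand-in technique)
imputed <- incomplete
imputed$age[is.na(imputed$age)] <- mean(incomplete$age, na.rm = TRUE)

# Fit the same model to both data sets and compare coefficients
fit_complete <- lm(iq ~ age, data = complete)
fit_imputed  <- lm(iq ~ age, data = imputed)
comparison <- rbind(complete = coef(fit_complete), imputed = coef(fit_imputed))
comparison
comparison["imputed", ] - comparison["complete", ]   # coefficient differences
```

Mean imputation understates the variability of the imputed variable, so a multiple-imputation approach is generally the better choice when standard errors matter.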
Using DSPA Chapter 3 as a guide to more elaborate mixture-distribution modeling, develop some forward prediction models.
Try some of the DSPA unsupervised clustering and classification techniques on the ABIDE dataset.
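As one starting point for the unsupervised part, the sketch below runs k-means on two simulated, standardized features and cross-tabulates the clusters against a known grouping. The features `f1` and `f2`, the group labels, and the choice of two clusters are assumptions for illustration; real imaging features would need their own preprocessing.

```r
# Simulated two-group data (not ABIDE)
set.seed(1234)
toy <- rbind(
  data.frame(f1 = rnorm(100, mean = 0), f2 = rnorm(100, mean = 0)),
  data.frame(f1 = rnorm(100, mean = 4), f2 = rnorm(100, mean = 4))
)
true_group <- rep(c("A", "B"), each = 100)

# Standardize the features, then cluster with several random restarts
km <- kmeans(scale(toy), centers = 2, nstart = 20)

# Cross-tabulate cluster membership against the known grouping
cluster_table <- table(Cluster = km$cluster, Group = true_group)
cluster_table
```

Cluster labels are arbitrary, so agreement with a known grouping should be judged up to a relabeling of the clusters.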
Think outside the box in this interactive-learning project using the ABIDE data. Try to use the RMD source and the provided data to experiment with novel AI/ML techniques. Think of ways to augment these data (expand the sample and increase the feature richness).