This is an open-ended R&D project that SOCR/DSPA trainees can complete. Creative solutions can be sent to the instructor (Ivo D. Dinov). Use the Autism Brain Imaging Data Exchange (ABIDE) data to design a meaningful biomedical study examining, characterizing, and contrasting normal and pathological (autism) brain neurodevelopment.

1 Autism Brain Imaging Data Exchange (ABIDE) Data

These data consist of derived neuroimaging data, quality assessment (QA) metrics prefixed by anat_ and func_, and manual quality assessment prefixed by qc_.

Automated QA Measures: These columns hold automated metrics in which outliers may be identified by a statistical procedure (e.g., flagging values more than \(2\sigma\) from the mean).
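
As an illustration of the \(2\sigma\) rule mentioned above, here is a minimal sketch in base R. The helper name `flag_outliers` and the toy values are assumptions made for this example; they are not part of the ABIDE data or of any package.

```r
# Hypothetical helper: flag values lying more than k standard deviations
# from the mean of the vector (missing values are never flagged).
flag_outliers <- function(x, k = 2) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  !is.na(z) & abs(z) > k
}

# Toy QA metric (made-up values): twenty typical scans and one extreme scan.
toy_metric <- c(rep(0, 20), 10)
outlier_idx <- which(flag_outliers(toy_metric))
outlier_idx  # position of the flagged value
```

On the real data the same helper could be applied to a single QA column, for example `flag_outliers(ABIDE_data$anat_snr)`, once the data are loaded.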

Anatomical measures:

  • Contrast to Noise Ratio [anat_cnr]: mean of the gray matter values minus the mean of the white matter values, divided by the standard deviation of the air values.
  • Entropy Focus Criterion [anat_efc]: Shannon’s entropy is used to summarize the principal directions distribution, higher entropy indicating the distribution is more uniform (i.e., less noisy).
  • Foreground to Background Energy Ratio [anat_fber]: Mean energy of image values (i.e., mean of squares) within the head relative to outside the head.
  • Smoothness of Voxels [anat_fwhm]: The full-width half maximum (FWHM) of the spatial distribution of the image intensity values in terms of voxels (e.g., a value of 3 implies smoothness of 3 voxels).
  • Percent of Artifact Voxels [anat_qi1]: The proportion of voxels with intensity corrupted by artifacts normalized by the number of voxels in the background.
  • Signal to Noise Ratio [anat_snr]: The mean of image values within gray matter divided by the standard deviation of the image values within air (i.e., outside the head).
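
To make the ratio definitions above concrete, the following sketch computes CNR, SNR, and FBER on a handful of made-up intensity values. The vectors `gm`, `wm`, and `air` are hypothetical stand-ins for voxels selected by tissue masks; the real ABIDE columns were produced by the preprocessing pipeline, not by this code.

```r
# Toy intensity samples standing in for masked voxels (all values are made up).
gm  <- c(60, 62, 58, 61, 59)  # gray matter
wm  <- c(80, 82, 78, 81, 79)  # white matter
air <- c(1, 3, 2, 4, 0)       # background, outside the head

# CNR as worded above: (mean GM - mean WM) / sd(air).
# The sign simply follows the stated order of subtraction.
cnr <- (mean(gm) - mean(wm)) / sd(air)

# SNR: mean GM / sd(air).
snr <- mean(gm) / sd(air)

# FBER: mean energy (mean of squares) inside the head relative to outside.
# Here "inside the head" is approximated by the GM and WM samples together.
fber <- mean(c(gm, wm)^2) / mean(air^2)

round(c(cnr = cnr, snr = snr, fber = fber), 3)
```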

Functional measures:

  • Entropy Focus Criterion [func_efc]: Shannon’s entropy is used to summarize the principal directions distribution, higher entropy indicating the distribution is more uniform (i.e., less noisy).
  • Foreground to Background Energy Ratio [func_fber]: Mean energy of image values (i.e., mean of squares) within the head relative to outside the head. Uses mean functional.
  • Smoothness of Voxels [func_fwhm]: The full-width half maximum (FWHM) of the spatial distribution of the image intensity values. Uses mean functional.
  • Standardized DVARS [func_dvars]: The spatial standard deviation of the temporal derivative of the data, normalized by the temporal standard deviation and temporal autocorrelation.
  • Fraction of Outlier Voxels [func_outlier]: The mean fraction of outliers found in each volume using 3dTout command in AFNI.
  • Mean Distance to Median Volume [func_quality]: The mean distance (1 – Spearman’s rho) between each time-point’s volume and the median volume using AFNI’s 3dTqual command.
  • Mean Framewise Displacement (FD) [func_mean_fd]: A measure of subject head motion, which compares the motion between the current and previous volumes. This is calculated by summing the absolute value of displacement changes in the x, y and z directions and rotational changes about those three axes. The rotational changes are given distance values based on the changes across the surface of a 50mm radius sphere.
  • Number FD greater than 0.2mm [func_num_fd]: The number of frames or volumes with displacement greater than 0.2mm.
  • Percent FD greater than 0.2mm [func_perc_fd]: The percent of frames or volumes with displacement greater than 0.2mm.
  • Ghost to Signal Ratio [func_gsr]: A measure of the mean signal in the ‘ghost’ image (signal present outside the brain due to acquisition in the phase encoding direction) relative to mean signal within the brain.

Manual QA Measures: Manual inspection of the data was carried out by three independent raters.
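
The framewise-displacement measures listed above (mean FD, number and percent of frames above 0.2mm) can be sketched as follows. This is an illustration only: the function name `framewise_displacement`, the column order of the motion matrix, the assumption that rotations are given in radians, and the convention of assigning FD = 0 to the first volume are all choices made for this example, not documented properties of the ABIDE pipeline.

```r
# motion: numeric matrix, one row per volume;
# columns 1-3 = translations in mm (x, y, z), columns 4-6 = rotations in radians.
framewise_displacement <- function(motion, radius = 50) {
  motion <- as.matrix(motion)
  stopifnot(ncol(motion) == 6)
  # Convert rotations to arc length (mm) on a sphere of the given radius.
  motion[, 4:6] <- motion[, 4:6] * radius
  # Sum of absolute volume-to-volume changes across the six parameters.
  c(0, rowSums(abs(diff(motion))))
}

# Toy motion parameters for four volumes (made-up values).
motion <- rbind(
  c(0.0, 0.0, 0, 0.000, 0, 0),
  c(0.1, 0.0, 0, 0.000, 0, 0),
  c(0.1, 0.2, 0, 0.002, 0, 0),
  c(0.1, 0.2, 0, 0.002, 0, 0)
)

fd <- framewise_displacement(motion)
mean_fd <- mean(fd)               # analogous to func_mean_fd
num_fd  <- sum(fd > 0.2)          # analogous to func_num_fd
perc_fd <- 100 * mean(fd > 0.2)   # analogous to func_perc_fd
```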

More information and meta-data are available in the data-provenance DOCX in the DSPA ABIDE Case-Study Folder.

1.1 Load in the data

# install.packages("magrittr")
library(magrittr)
# load ABIDE data (ABIDE_Aggregated_Data.csv)
ABIDE_data <- read.csv('https://umich.instructure.com/files/20935287/download?download_frd=1', header=TRUE)

dim(ABIDE_data)  # 1098 2145
## [1] 1098 2145
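# Columns are referenced explicitly as ABIDE_data$<name> below; attach() is
# avoided because later changes to ABIDE_data would not reach an attached copy.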
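
With 2145 columns, it can help to group the QA columns by the prefixes described earlier (anat_, func_, qc_). The sketch below uses a small made-up data frame so that it runs on its own; the column `qc_rater_1` is a hypothetical name used for illustration. On the real data, replace `toy` with `ABIDE_data`.

```r
# Toy stand-in for the ABIDE table (made-up values and column names).
toy <- data.frame(
  subject      = 1:3,
  anat_cnr     = c(10.2, 11.5, 9.8),
  func_mean_fd = c(0.08, 0.31, 0.12),
  qc_rater_1   = c("OK", "fail", "OK")
)

# Select every column whose name starts with one of the QA prefixes.
qa_cols <- grep("^(anat|func|qc)_", names(toy), value = TRUE)

# Group the selected names by prefix (the text before the first underscore).
qa_groups <- split(qa_cols, sub("_.*$", "", qa_cols))
qa_groups
```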

1.2 Data Modeling, EDA

# Review the data element types
# colnames(ABIDE_data)

# Potential relevant Outcomes (Y)
table(ABIDE_data$researchGroup)
## 
##  Autism Control 
##     528     570

table(ABIDE_data$subjectSex)
## 
##   F   M 
## 163 935

# Data cleaning (QC)
# Replace the missing-value code (-9999, i.e., any negative IQ) with 30
ABIDE_data$iq <- replace(ABIDE_data$iq, ABIDE_data$iq<0, 30)

# Visualize the data
#table(ABIDE_data$iq)
library(plotly)
xLabel <- list(title = "Intelligence (IQ)")
yLabel <- list(title = "Frequency")
plot_ly(ABIDE_data, x = ~iq, type = "histogram") %>%
  layout(xaxis = xLabel, yaxis = yLabel)

# Model the data
# Fit and plot linear models for the specified outcome and predictors
fitPlot_LM_Model <- function (Y, X) {   
    # Y= outcome column name
    # X= vector of predictor column names
    
    ### .......
  
    # return (myPlot)
}

#### Run the Full model-fitting prospectively and display the prediction forecasts
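
The function stub above is left for trainees to complete. As one possible starting point, here is a hedged sketch of a helper that builds the model formula from column names and returns the fit together with observed and fitted values ready for plotting. The name `fit_lm_sketch`, its signature, and the toy columns `age`, `anat_snr`, and `iq` are assumptions for this example, not the author's implementation. It uses base R only; the returned data frame could be passed to plotly for display.

```r
# Hypothetical helper: fit lm(Y ~ X1 + X2 + ...) from column names.
fit_lm_sketch <- function(data, Y, X) {
  fml <- reformulate(termlabels = X, response = Y)  # e.g. iq ~ age + anat_snr
  fit <- lm(fml, data = data)
  list(
    model = fit,
    # Observed vs. fitted values for the rows actually used in the fit.
    fitted_vs_observed = data.frame(
      observed = as.numeric(model.response(model.frame(fit))),
      fitted   = as.numeric(fitted(fit))
    )
  )
}

# Toy data with a known linear signal (all values simulated).
set.seed(2)
toy <- data.frame(age = runif(100, 6, 40), anat_snr = rnorm(100, 12, 2))
toy$iq <- 100 + 0.5 * toy$anat_snr + rnorm(100, sd = 10)

res <- fit_lm_sketch(toy, Y = "iq", X = c("age", "anat_snr"))
coef(res$model)
```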

2 Predict research group (Autism vs. Control)

# Logit modeling
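
A minimal logit sketch for a binary outcome such as `researchGroup` (Autism vs. Control) is shown below. It runs on simulated data so that it is self-contained; the predictors `iq` and `func_mean_fd`, the simulated effect sizes, and the 0.5 classification threshold are assumptions chosen for illustration and say nothing about the real ABIDE associations.

```r
set.seed(3)
n <- 200

# Simulated predictors (made-up distributions).
toy <- data.frame(
  iq           = rnorm(n, mean = 100, sd = 15),
  func_mean_fd = rexp(n, rate = 10)
)

# Simulated outcome: the probability of "Autism" depends on both predictors.
p <- plogis(-0.03 * (toy$iq - 100) + 4 * (toy$func_mean_fd - 0.1))
toy$researchGroup <- factor(ifelse(runif(n) < p, "Autism", "Control"),
                            levels = c("Control", "Autism"))

# Logistic regression; with this level order the model estimates P(Autism).
fit <- glm(researchGroup ~ iq + func_mean_fd, data = toy, family = binomial())

# Classify at a 0.5 threshold and compare with the observed labels.
pred <- ifelse(predict(fit, type = "response") > 0.5, "Autism", "Control")
confusion <- table(predicted = pred, observed = toy$researchGroup)
accuracy  <- mean(pred == as.character(toy$researchGroup))
confusion
```

In practice, accuracy should be estimated on held-out data rather than on the training sample used here.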

3 Multiple Imputation of incomplete Data

Introduce some MCAR deletions. Impute the missing values and compare the (simulated-missing) data and models to their complete (original) data counterparts.
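
The sketch below shows what MCAR deletion looks like and how a model fit on repaired data can be compared with the complete-data fit. To stay dependency-free it uses single mean imputation, which is a deliberately simple stand-in and not multiple imputation; a dedicated imputation package would replace that step in a real analysis. All data are simulated.

```r
set.seed(4)

# Complete (simulated) data with a known linear relationship.
complete <- data.frame(x = rnorm(300))
complete$y <- 2 + 1.5 * complete$x + rnorm(300)

# MCAR: every x value has the same 20% chance of deletion,
# independent of both observed and unobserved values.
miss <- complete
drop <- runif(nrow(miss)) < 0.2
miss$x[drop] <- NA

# Stand-in imputation: replace each missing x by the observed mean of x.
imputed <- miss
imputed$x[is.na(imputed$x)] <- mean(miss$x, na.rm = TRUE)

# Compare model coefficients on the complete vs. imputed data.
coef_compare <- rbind(
  complete = coef(lm(y ~ x, data = complete)),
  imputed  = coef(lm(y ~ x, data = imputed))
)
coef_compare
```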

# Introduce simulated MCAR missingness

# Imputation

# Rhat convergence statistics compares the variance between chains to the variance
# within chains (similar to the ANOVA F-test). 
# Rhat Values ~ 1.0 indicate likely convergence, 
# Rhat Values > 1.1 indicate that the chains should be run longer 
# (use large number of iterations)

# Compare the results of the complete-data models to the imputed-data models

# Plot the resulting models and quantify model differences
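
The Rhat comparison of between-chain and within-chain variance described in the comments above can be written out directly. This sketch implements the classic (non-split) Gelman-Rubin form on simulated chains; the function name `rhat` and the toy chains are assumptions for this example, and imputation packages may report a refined variant of the statistic.

```r
# chains: numeric matrix, one row per iteration and one column per chain.
rhat <- function(chains) {
  n <- nrow(chains)
  B <- n * var(colMeans(chains))       # between-chain variance
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(5)
# Four well-mixed chains drawn from the same distribution.
mixed <- matrix(rnorm(4000), ncol = 4)
# Same chains, but two of them are shifted, mimicking non-convergence.
stuck <- mixed + rep(c(0, 0, 3, 3), each = 1000)

rhat_mixed <- rhat(mixed)  # close to 1
rhat_stuck <- rhat(stuck)  # well above 1.1
c(mixed = rhat_mixed, stuck = rhat_stuck)
```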

4 Mixture Distribution modeling

Use DSPA Chapter 3 as a guide to more elaborate mixture-distribution modeling and develop some forward prediction models.
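
As a small, self-contained illustration of mixture modeling, the sketch below fits a two-component Gaussian mixture by expectation-maximization (EM) in base R. The function name `em_two_gaussians`, the quartile-based starting values, the fixed number of iterations, and the simulated data are all assumptions for this example; it is not the approach of DSPA Chapter 3, which may rely on dedicated packages.

```r
# Hypothetical helper: EM for a two-component univariate Gaussian mixture.
em_two_gaussians <- function(x, iters = 200) {
  mu    <- as.numeric(quantile(x, c(0.25, 0.75)))  # crude starting means
  sigma <- rep(sd(x), 2)
  w     <- c(0.5, 0.5)
  for (i in seq_len(iters)) {
    # E-step: posterior probability that each point belongs to component 1.
    d1 <- w[1] * dnorm(x, mu[1], sigma[1])
    d2 <- w[2] * dnorm(x, mu[2], sigma[2])
    r1 <- d1 / (d1 + d2)
    r2 <- 1 - r1
    # M-step: update weights, means, and standard deviations.
    w     <- c(mean(r1), mean(r2))
    mu    <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
    sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
               sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
  }
  list(weights = w, means = mu, sds = sigma)
}

# Simulated scores from two well-separated groups (made-up parameters).
set.seed(6)
x <- c(rnorm(300, mean = 80, sd = 7), rnorm(700, mean = 115, sd = 7))
mix_fit <- em_two_gaussians(x)
mix_fit
```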

5 Unsupervised clustering

Try some of the DSPA unsupervised clustering and classification techniques on the ABIDE dataset.
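
One simple entry point is k-means on standardized QA features. The sketch below uses base R's `kmeans` on simulated data with two built-in groups; the feature names, group structure, and the choice of two clusters are assumptions for illustration only.

```r
set.seed(7)

# Simulated features for two latent groups of 50 scans each (made-up values).
toy <- data.frame(
  anat_snr     = c(rnorm(50, 10, 1),      rnorm(50, 15, 1)),
  func_mean_fd = c(rnorm(50, 0.30, 0.03), rnorm(50, 0.10, 0.03))
)
true_group <- rep(c("A", "B"), each = 50)

# Standardize so that both features contribute on a comparable scale.
scaled <- scale(toy)

# k-means with several random restarts to avoid poor local optima.
km <- kmeans(scaled, centers = 2, nstart = 20)

# Cross-tabulate clusters against the simulated groups.
tab <- table(cluster = km$cluster, true_group = true_group)
purity <- sum(apply(tab, 1, max)) / length(true_group)
tab
```

Cluster labels are arbitrary, so agreement is summarized by purity rather than by matching label numbers to group names.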

6 Venture beyond …

Think outside the box in this interactive-learning project using the ABIDE data. Try to use the RMD source and the provided data to experiment with novel AI/ML techniques. Think of ways to augment these data (e.g., increase the sample size and the feature richness).
