Topics

Learning Modules

I. Foundations of R
Class Notes » Links to an external site. R Code » Links to an external site. Assignment » Links to an external site.

Statistical Software – Pros/Cons Comparison    

Getting started        

Install Basic Shell-based R 

GUI based R Invocation (RStudio)

RStudio GUI Layout 

Help

Simple Long-to-Wide Data format translation    

Data generation

I/O      

Slicing and extracting data 

Variable conversion 

Variable information 

Data selection and manipulation   

Math Functions       

Matrix Operations    

Advanced Data Processing

Strings

Plotting

QQ Normal Probability Plots

Low-level plotting commands       

Graphics parameters         

Optimization and model fitting      

Statistics       

Distributions  

Programming

Data Simulation Primer

II. Managing data with R

Managing Data in R

Saving and Loading R Data Structures  

Importing and Saving Data from CSV Files      

Exploring the Structure of Data   

Exploring Numeric Variables       

Measuring the Central Tendency - mean and median 

Measuring Spread - quartiles and the five-number summary 

Visualizing Numeric Variables - boxplots

Visualizing Numeric Variables - histograms      

Understanding Numeric Data - uniform and normal distributions      

Measuring Spread - variance and standard deviation  

Exploring Categorical Variables   

Measuring the Central Tendency - the mode

Exploring Relationships Between Variables

Missing Data

Parsing webpages and visualizing tabular HTML data

Cohort-Rebalancing (for Imbalanced Groups)

III. Data Visualization

Classification of visualization methods   

Composition 

Histograms and density plots      

Pie Chart     

Heat map     

Comparison

Paired Scatterplots

Barplots       

Trees and Graphs  

Correlation Plots     

Relationships

Line plots using ggplot      

Density Plots

Distributions 

2D Kernel Density and 3D Surface Plots

Jitter plot     

Appendix     

Hands-on Activity (Health Behavior Risks)       

IV. Linear Algebra & Matrix Computing

Linear Algebra & Matrix Computing        

Building Matrices     

Create matrices       

Adding columns and rows  

Matrix subscripts     

Matrix Operations    

Addition        

Subtraction   

Multiplication 

Elementwise multiplication  

Matrix multiplication 

Division         

Transpose     

Inverse

Matrix Operations    

Matrix Algebra Notation     

Matrix Notation        

Solving Systems of Equations      

The identity matrix   

Vectors, Matrices, and Scalars     

Sample Statistics     

Mean  

Variance       

Applications of Matrix Algebra: Linear modeling 

Finding function extrema (min/max) using calculus      

Least Square Estimation    

The R lm Function   

Eigenvalues and Eigenvectors      

Other important functions  

Matrix notation        

Linear regression

Sample covariance matrix  

V. Dimensionality Reduction

Principal Component Analysis (PCA)

Independent Component Analysis (ICA)

Factor Analysis (FA)

Singular Value Decomposition (SVD)

VI. Lazy Learning – Classification Using Nearest Neighbors

Understanding classification using nearest neighbors

The kNN algorithm

Calculating distance

Choosing an appropriate k

Preparing data for use with kNN

Why is the kNN algorithm lazy?

Predictive Diagnostics

VII. Probabilistic Learning – Classification Using Naive Bayes

The Naive Bayes Algorithm         

Assumptions

Bayes Formula       

The Laplace Estimator      

Case Study: Head and Neck Cancer Medication        

VIII. Divide and Conquer – Classification Using Decision Trees

Understanding decision trees

Divide and conquer

The C5.0 decision tree algorithm

Choosing the best split

Pruning the decision tree

Boosting the accuracy of decision trees

Making some mistakes more costly than others

Understanding classification rules

Separate and conquer

The One Rule algorithm

The RIPPER algorithm

Rules from decision trees

IX. Forecasting Numeric Data – Regression Methods

Simple linear regression    

Ordinary least squares estimation

Correlations 

Multiple Linear Regression

Case Study 1: Baseball Players

Step 2 - exploring and preparing the data

Step 3 - training a model on the data     

Step 4 - evaluating model performance 

Step 5 - improving model performance  

Regression trees and model trees

Heart Attack Data

X. Black Box Methods – Neural Networks and Support Vector Machines

Neural Networks     

Network topology   

Training neural networks with backpropagation

Case Study 1: Google Trends and the Stock Market

Support Vector Machines (SVM)

Case Study 2: Optical Character Recognition (OCR)

Case Study 3: Iris Flowers

XI. Apriori Algorithm for Association Rule Learning

Association Rules   

Rule support and confidence

Case Study 1: Head and Neck Cancer Medications

Practice Problems: Groceries

XII. k-Means Clustering

Clustering as a machine learning task    

The k-Means Clustering Algorithm         

Case Study 1: Divorce and Consequences on Young Adults 

Case Study 2: Pediatric Trauma

Practice Problem: Youth Development

XIII. Evaluating Model Performance

Measuring performance for classification         

Working with classification prediction data

Evaluation: Confusion matrices

Other performance measures

Visualizing performance tradeoffs

Estimating future performance (internal statistical validation)

The holdout method

XIV. Improving Model Performance

Tuning stock models for better performance

Using caret for automated parameter tuning

Creating a simple tuned model

Customizing the tuning process

Improving model performance with meta-learning

Understanding ensembles

Bagging

Boosting

Random forests

Training random forests

Evaluating random forest performance

XV. Specialized Machine Learning Topics: Data Formats and Optimization of Computation

Working with specialized data and databases

Querying data in SQL databases

Downloading the complete text of web pages

Web-page Data Scraping

Parsing JSON from web APIs

Reading and writing Microsoft Excel spreadsheets using XLSX

Visualizing network data

Optimization and improving the computational performance

Generalizing tabular data structures with dplyr

Parallel computing

GPU computing

XVI. Variable/Feature Selection

XVII. Regularized Linear Modeling and Knockoff Filtering

Regularized Linear Modeling       

Ridge Regression   

Least Absolute Shrinkage and Selection Operator (LASSO) Regression         

Linear Regression  

Assessing Prediction Accuracy    

Estimating Prediction Error

Improving Prediction Accuracy

General Regularization Framework

Example: Neuroimaging-genetics study of Parkinson's Disease Dataset

Computational Complexity

n-Fold Cross Validation

Knockoff Filtering: Simulated Example

PD Neuroimaging-genetics Case-Study

Visualization 

XVIII. Big Longitudinal Data Analysis

Time series analysis

Identifying the Diff, AR and MA parameters

Structural Equation Modeling (SEM)

Case study - Parkinson's Disease (PD)   

Linear Mixed model 

GLMM and GEE Longitudinal data analysis

XIX. Text Mining & Natural Language Processing

Term Frequency (TF), Inverse Document Frequency (IDF)

Document Term Matrix (DTM)

Case-Study: Job ranking 

NLP

XX. Prediction and Internal Statistical Cross Validation

Forecasting types and assessment approaches

Overfitting

Internal Statistical Cross-validation is an iterative process

Example (Linear Regression)

Cross-validation methods

Case-Studies

Summary of CS output

Alternative predictor functions

Prediction Models

Appendix: R Debugging

XXI. Function Optimization

Free (unconstrained) optimization

Constrained Optimization

Equality and Inequality constraints

Lagrange Multipliers

Linear and Quadratic Programming

Manual vs. Automated Lagrange Multiplier Optimization

Data Denoising

XXII. Deep Learning

Perceptrons

Biological Relevance

Simple Neural Net Examples: XOR and NAND Operators

Sonar data example

Schizophrenia Neuroimaging Study

Spirals 2D Data

IBS Study

Country QoL Ranking Data

Handwritten Digits Classification

Classifying Real-World Images