Topics
Learning Modules
I. Foundations of R
Class Notes | R Code | Assignment
Statistical Software – Pros/Cons Comparison
Getting started
Install Basic Shell-based R
GUI based R Invocation (RStudio)
RStudio GUI Layout
Help
Simple Long-to-Wide Data format translation
Data generation
I/O
Slicing and extracting data
Variable conversion
Variable information
Data selection and manipulation
Math Functions
Matrix Operations
Advanced Data Processing
Strings
Plotting
QQ Normal Probability Plots
Low-level plotting commands
Graphics parameters
Optimization and model fitting
Statistics
Distributions
Programming
Data Simulation Primer
Class Notes | R Code | Assignment
Managing Data in R
Saving and Loading R Data Structures
Importing and Saving Data from CSV Files
Exploring the Structure of Data
Exploring Numeric Variables
Measuring the Central Tendency - mean and median
Measuring Spread - quartiles and the five-number summary
Visualizing Numeric Variables - boxplots
Visualizing Numeric Variables - histograms
Understanding Numeric Data - uniform and normal distributions
Measuring Spread - variance and standard deviation
Exploring Categorical Variables
Measuring the Central Tendency - the mode
Exploring Relationships Between Variables
Missing Data
Parsing webpages and visualizing tabular HTML data
Cohort-Rebalancing (for Imbalanced Groups)
Class Notes | R Code | Assignment
Classification of visualization methods
Composition
Histograms and density plots
Pie Chart
Heat map
Comparison
Paired ScatterPlots
Barplots
Trees and Graphs
Correlation Plots
Relationships
Line plots using ggplot
Density Plots
Distributions
2D Kernel Density and 3D Surface Plots
Jitter plot
Appendix
Hands-on Activity (Health Behavior Risks)
Class Notes | R Code | Assignment
Linear Algebra & Matrix Computing
Building Matrices
Create matrices
Adding columns and rows
Matrix subscripts
Matrix Operations
Addition
Subtraction
Multiplication
Elementwise multiplication
Matrix multiplication
Division
Transpose
Inverse
Matrix Operations
Matrix Algebra Notation
Matrix Notation
Solving Systems of Equations
The identity matrix
Vectors, Matrices, and Scalars
Sample Statistics
Mean
Variance
Applications of Matrix Algebra: Linear modeling
Finding function extrema (min/max) using calculus
Least Square Estimation
The R lm Function
Eigenvalues and Eigenvectors
Other important functions
Matrix notation
Linear regression
Sample covariance matrix
Class Notes | R Code | Assignment
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Factor Analysis (FA)
Singular Value Decomposition (SVD)
Class Notes | R Code | Assignment
Understanding classification using nearest neighbors
The kNN algorithm
Calculating distance
Choosing an appropriate k
Preparing data for use with kNN
Why is the kNN algorithm lazy?
Predictive Diagnostics
Class Notes | R Code | Assignment
The Naive Bayes Algorithm
Assumptions
Bayes Formula
The Laplace Estimator
Case Study: Head and Neck Cancer Medication
Class Notes | R Code | Assignment
Understanding decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
Boosting the accuracy of decision trees
Making some mistakes more costly than others
Understanding classification rules
Separate and conquer
The One Rule algorithm
The RIPPER algorithm
Rules from decision trees
Class Notes | R Code | Assignment
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple Linear Regression
Case Study 1: Baseball Players
Step 2 - exploring and preparing the data
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Regression trees and model trees
Heart Attack Data
Class Notes | R Code | Assignment
Neural Networks
Network topology
Training neural networks with backpropagation
Case Study 1: Google Trends and the Stock Market
Support Vector Machines (SVM)
Case Study 2: Optical Character Recognition (OCR)
Case Study 3: Iris Flowers
Class Notes | R Code | Assignment
Association Rules
Rule support and confidence
Case Study 1: Head and Neck Cancer Medications
Practice Problems: Groceries
Class Notes | R Code | Assignment
Clustering as a machine learning task
The k-Means Clustering Algorithm
Case Study 1: Divorce and Consequences on Young Adults
Case Study 2: Pediatric Trauma
Practice Problem: Youth Development
Class Notes | R Code | Assignment
Measuring performance for classification
Working with classification prediction data
Evaluation: Confusion matrices
Other performance measures
Visualizing performance tradeoffs
Estimating future performance (internal statistical validation)
The holdout method
Class Notes | R Code | Assignment
Tuning stock models for better performance
Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
Class Notes | R Code | Assignment
Working with specialized data and databases
Querying data in SQL databases
Downloading the complete text of web pages
Web-page Data Scraping
Parsing JSON from web APIs
Reading and writing Microsoft Excel spreadsheets using XLSX
Visualizing network data
Optimization and improving the computational performance
Generalizing tabular data structures with dplyr
Parallel computing
GPU computing
Class Notes | R Code | Assignment
Variable selection methods
Case Study - ALS
Evaluating model performance
Class Notes | R Code | Assignment
Regularized Linear Modeling
Ridge Regression
Least Absolute Shrinkage and Selection Operator (LASSO) Regression
Linear Regression
Assessing Prediction Accuracy
Estimating Prediction Error
Improving Prediction Accuracy
General Regularization Framework
Example: Neuroimaging-genetics study of Parkinson's Disease Dataset
Computational Complexity
n-Fold Cross Validation
Knock-off Filtering: Simulated Example
PD Neuroimaging-genetics Case-Study
Visualization
Class Notes | R Code | Assignment
Time series analysis
Identifying the Diff, AR and MA parameters
Structural Equation Modeling (SEM)
Case study - Parkinson's Disease (PD)
Linear Mixed model
GLMM and GEE Longitudinal data analysis
Class Notes | R Code | Assignment
Term Frequency (TF), Inverse Document Frequency (IDF)
Document Term Matrix (DTM)
Case-Study: Job ranking
NLP
Class Notes | R Code | Assignment
Forecasting types and assessment approaches
Overfitting
Internal Statistical Cross-validation is an iterative process
Example (Linear Regression)
Cross-validation methods
Case-Studies
Summary of CS output
Alternative predictor functions
Prediction Models
Appendix: R Debugging
Class Notes | R Code | Assignment
Free (unconstrained) optimization
Constrained Optimization
Equality and Inequality constraints
Lagrange Multipliers
Linear and Quadratic Programming
Manual vs. Automated Lagrange Multiplier Optimization
Data Denoising
Class Notes | R Code | Assignment
Perceptrons
Biological Relevance
Simple Neural Net Examples: XOR and NAND Operators
Sonar data example
Schizophrenia Neuroimaging Study
Spirals 2D Data
IBS Study
Country QoL Ranking Data
Handwritten Digits Classification
Classifying Real-World Images