---
title: "SOCR Case-Study: Deaths in Guatemala (2009-2016)"
subtitle: "Fall 2018 SOCR Health Analytics Training Workshop"
author: "SOCR/MIDAS (Ivo Dinov)"
date: "`r format(Sys.time(), '%B %Y')`"
tags: [DSPA, SOCR, MIDAS, Big Data, Predictive Analytics]
output:
  html_document:
    theme: spacelab
    highlight: tango
    toc: true
    number_sections: true
    toc_depth: 2
    toc_float:
      collapsed: false
      smooth_scroll: true
---

# Import, plot, summarize and save data

Load the two SPSS (`*.sav`) datasets, generate summary statistics for all variables, and plot some features of several variables (e.g., histograms, box plots, density plots).

* [Case-Study: Deaths in Guatemala (2009-2016)](https://umich.instructure.com/courses/38100/files/folder/Case_Studies/15_ALS_CaseStudy),
* [Other SOCR Case-Studies](https://umich.instructure.com/courses/38100/files/folder/Case_Studies).

```{r message=FALSE, warning=FALSE}
# install.packages(c("foreign", "readxl", "DT"))
library("foreign")

# Download the ZIP archive with the case-study files to a temporary file
pathToZip <- tempfile()
download.file("https://umich.instructure.com/files/8882923/download?download_frd=1",
              pathToZip, mode = "wb")

# Check the ZIP file content (lists the files without extracting them)
unzip(pathToZip, list = TRUE)

# 2009: extract the SPSS file and read it as a data frame
dataset_2009 <- read.spss(unzip(pathToZip, files = "2009vitales.sav", overwrite = TRUE),
                          to.data.frame = TRUE)
dim(dataset_2009) ## 71707 25
# View(dataset_2009)
summary(dataset_2009)
str(dataset_2009)

# 2016: extract the SPSS file and read it as a data frame
dataset_2016 <- read.spss(unzip(pathToZip, files = "2016vitales.sav", overwrite = TRUE),
                          to.data.frame = TRUE)
dim(dataset_2016) # 82565 28
summary(dataset_2016)
str(dataset_2016)

# Data Dictionary and Challenges (DDC)
dataset_DDC <- readxl::read_xlsx(unzip(pathToZip, files = "DataDictionary_Challenges.xlsx"))
# dim(dataset_DDC)
# View() opens a viewer only in interactive sessions; skip it when knitting
if (interactive()) View(dataset_DDC)

# Remove the temporary ZIP file
unlink(pathToZip)

# Interactive table of the 2016 data
library("DT")
datatable(dataset_2016)
```

# Descriptive statistics and graphs of the data

Try some `exploratory` and `quantitative` data analytics for these data using these materials:

* [DSPA Chapter 2: Data Management](http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/02_ManagingData.html)
* [DSPA Chapter 3: Visualization](http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/03_DataVisualization.html) ...
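
As a starting point for this exploration, the following is a minimal sketch of per-type summaries and the plot types mentioned earlier (histogram, box plot, density plot). It runs on a small simulated data frame, not on the Guatemala files: the object `demo` and its columns `age` and `sex` are hypothetical stand-ins, so substitute `dataset_2009` or `dataset_2016` and real variable names from the data dictionary.

```r
# Hypothetical stand-in data; the column names `age` and `sex` are illustrative
# assumptions and are NOT taken from the Guatemala files.
set.seed(2018)
demo <- data.frame(
  age = round(runif(200, min = 0, max = 95)),
  sex = factor(sample(c("F", "M"), 200, replace = TRUE))
)

# Split columns by type: numeric features get numeric summaries,
# factors get frequency tables.
is_num <- vapply(demo, is.numeric, logical(1))

numeric_summary <- t(vapply(
  demo[is_num],
  function(x) c(mean      = mean(x, na.rm = TRUE),
                sd        = sd(x, na.rm = TRUE),
                median    = median(x, na.rm = TRUE),
                n_missing = sum(is.na(x))),
  numeric(4)
))
factor_counts <- lapply(demo[!is_num], table, useNA = "ifany")

print(numeric_summary)
print(factor_counts)

# Base-graphics versions of the three plot types
hist(demo$age, main = "Histogram of age", xlab = "age")
boxplot(age ~ sex, data = demo, main = "Age by sex")
plot(density(demo$age, na.rm = TRUE), main = "Density of age")
```

Because `read.spss(..., to.data.frame = TRUE)` converts value-labelled SPSS variables to factors by default, many columns in the real data are likely to be categorical, which is why the sketch handles factors separately from numeric columns.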