├── .DS_Store
├── .gitignore
├── Global Environment.mm
├── LICENSE
├── README.md
├── README2.md
├── code
│   └── readBaltimoreRavensHTML4.R
├── data
│   ├── covid_GA_CO.csv
│   └── georgiaCovid2020-04-23.csv
├── datasciencectacontent.Rproj
├── markdown
├── .DS_Store
├── Sample mathjax equations.Rmd
├── ToothGrowthChecklist.md
├── capstone-choosingATextPackage.md
├── capstone-dfmToMatrix.md
├── capstone-ngramComputerCapacity.md
├── capstone-ngramTimings.md
├── capstone-simplifiedApproach.md
├── cleaningData-demystifyingHTMLParsing.md
├── cleaningData-gottaScrapeEmAll.md
├── cleaningData-javaAndXLSX.md
├── cleaningData-readingFiles.md
├── cleaningData-sqldfDriver.md
├── cleaningData-valueOfCleaningData.md
├── cleaningData-week2Q5.md
├── configureRStudioGitOSXVersion.md
├── configureRStudioGitWindowsVersion.md
├── dataFrameAsList.md
├── dataProd-settingShinyappsTimeout.md
├── dataProd-shinyTimeoutConfig.md
├── dss-mentorTips.md
├── edaInToothGrowthAnalysis.md
├── exampleSortRvsSAS.md
├── exdata-readingSubsetOfRawData.md
├── gen-gettingHelpWithSwirl.md
├── gen-handlingMissingValues.md
├── genMdWithGraphicsInGithub.md
├── googlesearch.md
├── images
│ ├── .DS_Store
│ ├── 2017-08-27_07-46-44.png
│ ├── 2017-10-14_14-48-35.png
│ ├── ExpDistChecklist.png
│ ├── RepDataAssignment2Checklist.png
│ ├── barplotPA2.png
│ ├── capstone-dfmToMatrix01.png
│ ├── capstone-dfmToMatrix02.png
│ ├── cleaningData-gottaScrapeEmAll01.png
│ ├── cleaningData-gottaScrapeEmAll02.png
│ ├── cleaningData-gottaScrapeEmAll03.png
│ ├── cleaningData-htmlParsing00.png
│ ├── cleaningData-htmlParsing01.png
│ ├── cleaningData-htmlParsing02.png
│ ├── cleaningData-htmlParsing03.png
│ ├── cleaningData-htmlParsing04.png
│ ├── cleaningData-htmlParsing05.png
│ ├── cleaningData-htmlParsing06.png
│ ├── cleaningData-htmlParsing07.png
│ ├── cleaningData-htmlParsing08.png
│ ├── cleaningData-htmlParsing09.png
│ ├── cleaningData-htmlParsing10.png
│ ├── cleaningData-htmlParsing11.png
│ ├── cleaningData-htmlParsing12.png
│ ├── cleaningData-htmlParsing13.png
│ ├── cleaningData-htmlParsing14.png
│ ├── cleaningData-htmlParsing15.png
│ ├── cleaningData-htmlParsing16.png
│ ├── cleaningData-htmlParsing17.png
│ ├── cleaningData-javaError01.png
│ ├── cleaningData-javaError02.png
│ ├── cleaningData-readingFiles00.png
│ ├── cleaningData-readingFiles01.png
│ ├── cleaningData-readingFiles02.png
│ ├── cleaningData-valueOfCleaningData01.png
│ ├── cleaningData-valueOfCleaningData02.png
│ ├── cleaningData-valueOfCleaningData03.png
│ ├── cleaningData-valueOfCleaningData04.png
│ ├── cleaningData-week2q5-01.png
│ ├── cleaningData-week2q5-02.png
│ ├── cleaningData-week2q5-03.png
│ ├── cleaningData-week2q5-04.png
│ ├── cleaningData-week2q5-05.png
│ ├── cleaningData-week2q5-06.png
│ ├── cleaningData-week2q5-07.png
│ ├── cleaningData-week2q5-08.png
│ ├── configRStudioGit1.png
│ ├── configRStudioGit10.png
│ ├── configRStudioGit11.png
│ ├── configRStudioGit12.png
│ ├── configRStudioGit13.png
│ ├── configRStudioGit14.png
│ ├── configRStudioGit2.png
│ ├── configRStudioGit3.png
│ ├── configRStudioGit4.png
│ ├── configRStudioGit4a.png
│ ├── configRStudioGit5.png
│ ├── configRStudioGit6.png
│ ├── configRStudioGit7.png
│ ├── configRStudioGit8.png
│ ├── configRStudioGit9.png
│ ├── dataProd-shinyConfig1.png
│ ├── dataProd-shinyConfig2.png
│ ├── dataProd-shinyConfig3.png
│ ├── exdata-readSubsetOfRawData01.png
│ ├── exdata-readSubsetOfRawData02.png
│ ├── exdata-readSubsetOfRawData03.png
│ ├── exdata-readSubsetOfRawData04.png
│ ├── forumPostFeatures1.png
│ ├── forumPostFeatures2.png
│ ├── forumPostFeatures3.png
│ ├── forumPostFeatures4.png
│ ├── gen-gettingHelpWithSwirl01.png
│ ├── gen-gettingHelpWithSwirl02.png
│ ├── gen-gettingHelpWithSwirl03.png
│ ├── googlesheets01.png
│ ├── googlesheets02.png
│ ├── googlesheets03.png
│ ├── googlesheets04.png
│ ├── googlesheets05.png
│ ├── googlesheets06.png
│ ├── googlesheets07.png
│ ├── googlesheets08.png
│ ├── installMikTeX00.png
│ ├── installMikTeX01.png
│ ├── installMikTeX02.png
│ ├── installMikTeX03.png
│ ├── installMikTeX04.png
│ ├── installMikTeX05.png
│ ├── installMikTeX06.png
│ ├── installMikTeX07.png
│ ├── installMikTeX08.png
│ ├── installMikTeX09.png
│ ├── installMikTeX10.png
│ ├── installMikTeX11.png
│ ├── installMikTeX12.png
│ ├── installMikTeX13.png
│ ├── installMikTeX14.png
│ ├── installMikTeX15.png
│ ├── installMikTeX16.png
│ ├── installMikTeX17.png
│ ├── installMikTeX18.png
│ ├── installMikTeX19.png
│ ├── installMikTeX20.png
│ ├── installMikTeX21.png
│ ├── installMikTeX22.png
│ ├── installMikTeX23.png
│ ├── installMikTeX24.png
│ ├── installMikTeX25.png
│ ├── installMikTeX26.png
│ ├── kableTable01.png
│ ├── misc-Mathjax01.png
│ ├── misc-Mathjax02.png
│ ├── misc-Mathjax03.png
│ ├── misc-Mathjax04.png
│ ├── misc-Mathjax05.png
│ ├── misc-rOnChromebook01.png
│ ├── misc-rOnChromebook02.png
│ ├── misc-rOnChromebook03.png
│ ├── misc-rOnChromebook04.png
│ ├── misc-rOnChromebook05.png
│ ├── misc-rOnChromebook06.png
│ ├── misc-rOnChromebook07.png
│ ├── misc-rOnChromebook08.png
│ ├── misc-rOnChromebook09.png
│ ├── misc-rOnChromebook10.png
│ ├── misc-rOnChromebook11.png
│ ├── misc-rOnChromebook12.png
│ ├── misc-rOnChromebook13.png
│ ├── misc-rOnChromebook14.png
│ ├── misc-rOnChromebook15.png
│ ├── misc-rOnChromebook16.png
│ ├── misc-rOnChromebook17.png
│ ├── misc-rOnChromebook18.png
│ ├── misc-rOnChromebook19.png
│ ├── misc-rOnChromebook20.png
│ ├── misc-rOnChromebook21.png
│ ├── misc-rOnChromebook22.png
│ ├── misc-rOnChromebook23.png
│ ├── pml-ElemStatLearn01.png
│ ├── pml-ElemStatLearn02.png
│ ├── pml-ElemStatLearn03.png
│ ├── pml-combiningPredictorsBinomial01.png
│ ├── pml-installingRattleOnMacOSX01.png
│ ├── pml-installingRattleOnMacOSX02.png
│ ├── regmods-sumEtimesXeq0.png
│ ├── repData-configKnitrWithMD01.png
│ ├── repData-configKnitrWithMD02.png
│ ├── repData-configKnitrWithMD03.png
│ ├── repData-configKnitrWithMD04.png
│ ├── repData-configKnitrWithMD05.png
│ ├── repData-configKnitrWithMD06.png
│ ├── repData-configKnitrWithMD07.png
│ ├── repData-configKnitrWithMD08.png
│ ├── repData-configKnitrWithMD09.png
│ ├── repData-configKnitrWithMD10.png
│ ├── repData-configKnitrWithMD11.png
│ ├── repData-configKnitrWithMD12.png
│ ├── repData-configKnitrWithMD13.png
│ ├── repData-stormDataGuide01.png
│ ├── rprog-Assignment1Instructions.PDF
│ ├── rprog-OOPandR01.png
│ ├── rprog-OOPandR02.png
│ ├── rprog-assignment1Demo01.png
│ ├── rprog-assignment1Demo02.png
│ ├── rprog-assignment1Demo03.png
│ ├── rprog-assignment1DemoOutput01.png
│ ├── rprog-assignment1DemoOutput02.png
│ ├── rprog-assignment1Solutions01.png
│ ├── rprog-assignment1Solutions02.png
│ ├── rprog-assignment1Solutions03.png
│ ├── rprog-assignment1Solutions04.png
│ ├── rprog-assignmentOperator01.png
│ ├── rprog-assignmentOperators.png
│ ├── rprog-breakingDownMakeVector01.png
│ ├── rprog-breakingDownMakeVector01a.png
│ ├── rprog-breakingDownMakeVector02.png
│ ├── rprog-breakingDownMakeVector03.png
│ ├── rprog-breakingDownMakeVector04.png
│ ├── rprog-breakingDownMakeVector05.png
│ ├── rprog-conceptsForFileProcessing01.png
│ ├── rprog-conceptsForFileProcessing02.png
│ ├── rprog-conceptsForFileProcessing03.png
│ ├── rprog-downloadLectures01.png
│ ├── rprog-downloadLectures02.png
│ ├── rprog-downloadLectures03.png
│ ├── rprog-extractOperator01.png
│ ├── rprog-extractOperator02.png
│ ├── rprog-extractOperator03.png
│ ├── rprog-extractOperator04.png
│ ├── rprog-extractOperator05.png
│ ├── rprog-extractOperator06.png
│ ├── rprog-extractOperator07.png
│ ├── rprog-extractOperator08.png
│ ├── rprog-extractOperator09.png
│ ├── rprog-extractOperator10.png
│ ├── rprog-githubDesktop01.png
│ ├── rprog-githubDesktop02.png
│ ├── rprog-githubDesktop03.png
│ ├── rprog-githubDesktop04.png
│ ├── rprog-githubDesktop05.png
│ ├── rprog-githubDesktop06.png
│ ├── rprog-githubDesktop07.png
│ ├── rprog-githubDesktop08.png
│ ├── rprog-githubDesktop09.png
│ ├── rprog-githubDesktop10.png
│ ├── rprog-githubDesktop11.png
│ ├── rprog-githubDesktop12.png
│ ├── rprog-githubDesktop13.png
│ ├── rprog-lexicalScoping02.png
│ ├── rprog-lexicalScoping03.png
│ ├── rprog-lexicalScopingDiagrams.pptx
│ ├── rprog-lexicalScopingIllustration.png
│ ├── rprog-pollutantmean01.png
│ ├── rprog-pollutantmean02.png
│ ├── rprog-pollutantmeanWithSAS00.png
│ ├── rprog-pollutantmeanWithSAS01.png
│ ├── rprog-pollutantmeanWithSAS02.png
│ ├── rprog-pollutantmeanWithSAS03.png
│ ├── rprog-pollutantmeanWithSAS04.png
│ ├── rprog-weightedMean01.png
│ ├── rprog-weightedMean02.png
│ ├── rprog-weightedMean03.png
│ ├── rprog-weightedMean04.png
│ ├── rprog-weightedMean05.png
│ ├── rprog-weightedMean06.png
│ ├── rprog-weightedMean07.png
│ ├── sha1hash-1of4.png
│ ├── sha1hash-2of4.png
│ ├── sha1hash-3of4.png
│ ├── sha1hash-4of4.png
│ ├── signup1.png
│ ├── signup2.png
│ ├── statinf-areaOfPointOnNormalCurve00.png
│ ├── statinf-areaOfPointOnNormalCurve000.png
│ ├── statinf-areaOfPointOnNormalCurve01.png
│ ├── statinf-areaOfPointOnNormalCurve02.png
│ ├── statinf-areaOfPointOnNormalCurve03.png
│ ├── statinf-areaOfPointOnNormalCurve04.png
│ ├── statinf-permutationTests00.png
│ ├── statinf-permutationTests01.png
│ ├── statinf-permutationTests02.png
│ ├── statinf-poissonInterval01.png
│ ├── statinf-suppressPrintingInKinitr01.png
│ ├── statinf-suppressPrintingInKinitr02.png
│ ├── statinf-usingKnitrInReports01.png
│ ├── statinf-usingKnitrInReports02.png
│ ├── statinf-usingKnitrInReports03.png
│ ├── statinf-usingKnitrInReports04.png
│ ├── statinf-varOfBinomialDistribution01.png
│ ├── statinf-varOfBinomialDistribution02.png
│ ├── statinf-varOfBinomialDistribution03.png
│ ├── statinf-varianceOfExpDist01.png
│ └── toothGrowthChecklist.png
├── kableDataFrameTable.md
├── makeItRun.md
├── mathjaxWithGithubMarkdown.md
├── misc-installingRonChromebook.md
├── permutationTestExample.Rmd
├── permutationTestExample.md
├── pml-ElemStatLearnPackage.md
├── pml-caretRunTimings.md
├── pml-combiningPredictorsBinomial.md
├── pml-ghPagesSetup.md
├── pml-installingRattleOnMacOSX.md
├── pml-predictionSummary.md
├── pml-projectChecklist.md
├── pml-randomForestPerformance.md
├── pml-requiredModelAccuracy.md
├── regmodels-references.md
├── regmodels-sumOfErrorTimesX.md
├── repData-configuringKnitrWithMarkdownOutput.md
├── repData-stormAnalysisGuide.md
├── repDataAssignment2Checklist.md
├── repdata-improvingInitialFileReadSpeed.md
├── repdata-stormAnalysisCodebook.md
├── resources
│ ├── 2013.Velloso.QAR-WLE.pdf
│ ├── ASAStatComp-Bell-Koren-VolinskyArticle.pdf
│ ├── Confidence Intervals for Poisson Variables.pdf
│ ├── MissingDataReview.pdf
│ ├── chambers.pdf
│ └── makeVector logic.docx
├── rprog-OOPandR.md
├── rprog-References.md
├── rprog-assignment1Demos.md
├── rprog-assignment1Solutions.md
├── rprog-assignmentOperators.md
├── rprog-breakingDownComplete.md
├── rprog-breakingDownCorr.md
├── rprog-breakingDownMakeVector.md
├── rprog-completeSortProblem.md
├── rprog-conceptsForFileProcessing.md
├── rprog-discussPollutantmean.md
├── rprog-downloadingFiles.md
├── rprog-downloadingLectures.md
├── rprog-dssValueProposition.md
├── rprog-extractOperator.md
├── rprog-githubDesktopSync.md
├── rprog-gradeSHA1hash.md
├── rprog-lexicalScoping.md
├── rprog-onboardingForSASUsers.md
├── rprog-overwritingRFunctions.md
├── rprog-pollutantmeanSASVersion.md
├── rprog-rScopingVsC.md
├── rprog-rVsPython.md
├── rprog-rprogrammingResources.md
├── rprog-sortFunctionsExample.md
├── rprog-weightedMeans.md
├── rprogAssignment2Prototype.md
├── statinf-accessingRCodeFromKnitrAppendix.md
├── statinf-areaOfPointOnNormalCurve.md
├── statinf-expDistChecklist.md
├── statinf-generatePDF.md
├── statinf-optimalSampleSize.md
├── statinf-permutationTests.md
├── statinf-poissonInterval.Rmd
├── statinf-poissonInterval.md
├── statinf-references.html
├── statinf-references.md
├── statinf-varOfBinomialDistribution.md
├── statinf-varianceOfExpDist.md
├── statsPackagesHistory.md
├── toolbox-RStudioOnChromebook.md
├── toolbox-computerStandards.md
├── toolbox-courseDifficultyLevels.md
├── usingMarkdownInForumPosts.md
└── whyIsRHarderThanSAS.md
├── pml-elemStatLearnAccess.R
├── pml-exampleSonarRandomForest.R
├── pml-modelAccuracyCalcs.R
├── resources
├── 2010LSIexcerpt.pdf
├── STATNews Critique of IHME COVID-19 model.PDF
├── data science process.mdj
├── noaaSSTCodebook.xlsx
├── repDataProcess.mdj
└── testMakeCacheMatrix.R
├── rprog-extractOperator.R
├── statinf-integralCalculations.Rmd
├── statinf-varOfBinomialDistribution.Rmd
└── statinf-varOfBinomialDistribution.html
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/.DS_Store
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # History files
2 | .Rhistory
3 | .Rapp.history
4 |
5 | # Example code in package build process
6 | *-Ex.R
7 |
8 | # RStudio files
9 | .Rproj.user/
10 |
11 | # produced vignettes
12 | vignettes/*.html
13 | vignettes/*.pdf
14 | .Rproj.user
15 |
--------------------------------------------------------------------------------
/Global Environment.mm:
--------------------------------------------------------------------------------
1 |
14 |
--------------------------------------------------------------------------------
/code/readBaltimoreRavensHTML4.R:
--------------------------------------------------------------------------------
1 | ### attempt to read w/ rvest package 21 Nov 2020
2 | ### ESPN redesigned their NFL site again, making it more difficult to parse
3 |
4 | library(rvest)
5 | baseURL <- "https://www.espn.com/nfl/team/schedule/_/name/bal"
6 |
7 | html <- read_html(baseURL)
8 | ## content selected with rvest SelectorGadget
9 | theTable <- html_nodes(html,xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "Table__TD", " " ))] | //*[contains(concat( " ", @class, " " ), concat( " ", "Card__Content", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "items-center", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "dib", " " ))]')
10 | textData <- html_text(theTable)
11 | # eyeball / visually grep the output to find where each row begins; easier
12 | # to see when viewed as a data frame
13 | df <- data.frame(textData)
14 | View(df)
15 |
16 | # elements 59, 60 are BYE week
17 | rowStartIDs <- c(11,19,27,35,43,51,61,69,77,85)
18 | gamesPlayed <- data.frame(do.call(rbind,
19 | lapply(rowStartIDs,function(x) textData[x:(x+7)])))
20 | colnames(gamesPlayed) <- c("Week","Date","Opponent","Result","Record","HighPasser",
21 | "HighRusher","HighReceiver")
22 |
23 |
24 |
25 |
26 | ### older version of code that has 351 elements due to differences in xpath,
27 | ### which parses out additional data
28 |
29 | # observe each row of played games has 19 data elements, bye week has 3
30 | # elements, and unplayed games have 12 data elements
31 |
32 |
33 | library(rvest)
34 | baseURL <- "https://www.espn.com/nfl/team/schedule/_/name/bal"
35 | html <- read_html(baseURL)
36 | ## content selected with rvest SelectorGadget
37 | theTable <- html_nodes(html,xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "Table__TD", " " ))] | //span')
38 | textData <- html_text(theTable)
39 |
40 | # assign ID value based on element in vector where played games start,
41 | # ignoring bye week
42 | rowStartIDs <- c(43,62,81,100,119,138,160,179,198)
43 |
44 | # columns to retain include 1,3,6,8,10,11,12,14,16,18
45 | gamesPlayed <- data.frame(do.call(rbind,
46 | lapply(rowStartIDs,function(x) textData[x:(x+18)])))[c(1,3,6,8,10,11,12,14,16,18)]
47 | # add column names
48 | colnames(gamesPlayed) <- c("Week","Date","Location","Opponent", "Outcome",
49 | "Score","Record","HighPasser","HighRusher","HighReceiver")
--------------------------------------------------------------------------------
/datasciencectacontent.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
--------------------------------------------------------------------------------
/markdown/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/.DS_Store
--------------------------------------------------------------------------------
/markdown/Sample mathjax equations.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Equations"
3 | author: "Len Greski"
4 | date: "December 17, 2016"
5 | output: word_document
6 | ---
7 |
8 | ## Sample equations here
9 |
10 | $\frac {\bar X - \mu}{\frac{\sigma}{\sqrt n}}$
11 |
--------------------------------------------------------------------------------
/markdown/ToothGrowthChecklist.md:
--------------------------------------------------------------------------------
1 | As students work through the Data Science Specialization, many lose credit on project assignments because they miss key points in the instructions. Here is a checklist for the Tooth Growth analysis that you can use to confirm the key requirements of the assignment are covered in your project submission.
2 |
3 |
4 |
--------------------------------------------------------------------------------
/markdown/capstone-choosingATextPackage.md:
--------------------------------------------------------------------------------
1 | # Capstone: Choosing a Text Analysis package
2 |
3 | Given the diversity of R packages (over 9,000 available as of May 2017) and the popularity of natural language processing as a domain for data science, students have a wide variety of R packages from which to choose for the project.
4 |
5 | ## Key Considerations
6 |
7 | There are two key considerations when selecting a package for the Capstone project: features and performance. First, does a particular package have the features needed to complete the required tasks? Feature-rich packages let students spend more time understanding the data instead of manually coding algorithms in R. Second, how quickly does the package complete the work, given the amount of data to be analyzed? For the Capstone project, the data includes a total of 4,269,678 texts, as we stated earlier in the article.
8 |
9 | R conducts all of its processing in memory (versus disk), so the text algorithms must be able to fit the data in memory in order to process them. Text mining packages that use memory efficiently will handle larger problems than those that use memory less efficiently. In practical terms, R packages that use C/C++ will be more efficient, handle larger problems, and run faster than those which use Java.
10 |
11 | The [CRAN Task View for Natural Language Processing](https://cran.r-project.org/web/views/NaturalLanguageProcessing.html) provides a comprehensive list of packages that can be used for textual analysis with R. Some of the packages used by students during the Capstone course include:
12 |
13 | * [ngram](https://cran.r-project.org/web/packages/ngram/vignettes/ngram-guide.pdf)
14 | * [quanteda](https://cran.r-project.org/web/packages/quanteda/quanteda.pdf)
15 | * [RWeka](https://cran.r-project.org/web/packages/RWeka/RWeka.pdf)
16 | * [tm](https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf)
17 |
18 | Each package has its strengths and weaknesses. For example, `ngram` is fast, but its capability is limited to producing n-grams. `RWeka` and `tm` offer a broader set of text mining features, but they are significantly slower and do not scale well to a large corpus such as the one required for the Capstone project.
19 |
20 | ## Why use quanteda?
21 |
22 | `quanteda` provides a rich set of text analysis features coupled with excellent performance relative to Java-based R packages for text analysis. Quoting Kenneth Benoit from the [quanteda github README](https://github.com/kbenoit/quanteda):
23 |
24 | > **Built for efficiency and speed.** All of the functions in `quanteda` are built for maximum performance and scale while still being as R-based as possible. The package makes use of three efficient architectural elements: the `stringi` package for text processing, the `Matrix` package for sparse matrix objects, and the `data.table` package for indexing large documents efficiently. If you can fit it into memory, `quanteda` will handle it quickly. (And eventually, we will make it possible to process objects even larger than available memory.)
25 |
26 | quanteda's "R-like" design is very useful, in contrast to packages like `ngram`. Also, because `quanteda` relies on `data.table`, it is particularly well suited to the Capstone. Why? `data.table` can index a table so students can retrieve values by key rather than sequentially scanning an entire data frame to extract a small number of rows. Since the final deliverable for the Capstone project is a text prediction app written in Shiny, students will find `data.table` an effective and efficient mechanism to pair with a text prediction algorithm.
27 |
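To make the indexed-lookup point concrete, here is a minimal sketch of a keyed `data.table` lookup. The table contents and column names are hypothetical, and the example assumes the `data.table` package is installed:

```r
library(data.table)

# Hypothetical lookup table: `base` holds the first n-1 words of an n-gram,
# `prediction` the most frequent word that follows that base.
ngrams <- data.table(
  base       = c("thank you", "happy new", "at the"),
  prediction = c("for", "year", "end")
)
setkey(ngrams, base)             # index the table on `base`

# A keyed subset uses a binary search rather than scanning every row.
ngrams["happy new"]$prediction   # returns "year"
```

In a Shiny app, this kind of keyed lookup keeps response times low even when the table holds millions of n-grams.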
28 | ## Quanteda Resources
29 |
30 | Ken Benoit and the quanteda team have recently upgraded the documentation for the quanteda package. As of late 2017, a number of resources are available to help people become productive with quanteda, including the following.
31 |
32 | * [Quanteda Overview](http://bit.ly/2k3DVBC)
33 | * [Quanteda Design / Structure](http://bit.ly/2zhssX0)
34 | * [Quick Start Guide](http://bit.ly/2AHy3V2)
35 | * [Detailed Function Reference](http://bit.ly/2op7PF1)
36 | * [Source Repository](http://bit.ly/2CnJPUG)
37 | * [Report a Problem](http://bit.ly/2yIR3AE)
38 |
39 | *last updated: 16 Dec 2017*
40 |
--------------------------------------------------------------------------------
/markdown/capstone-dfmToMatrix.md:
--------------------------------------------------------------------------------
1 | # Common Problems: Converting Document Feature Matrix to a Regular Matrix
2 |
3 | A common tool in natural language processing is the document feature matrix, also known as a [document term matrix](http://bit.ly/2B6ILE4). Many of the text analysis packages in R use a special type of sparse matrix to minimize the amount of memory used to store a document feature matrix.
4 |
5 | Sometimes people new to natural language processing attempt to convert a document feature matrix to a regular matrix in R. Usually this procedure fails because the computer runs out of memory. Why does this happen? Converting a document feature matrix to a regular matrix is very expensive in terms of memory. The document feature matrix doesn't consume memory for empty cells in the matrix, whereas a regular numeric matrix consumes 8 bytes per element, plus overhead.
6 |
7 | To illustrate the comparison, I took a 0.2% sample of the data from the [Heliohost Corpus](http://bit.ly/2qm30YY) used for the *Johns Hopkins University Data Science Specialization* Capstone project, generated a document feature matrix, and then converted it to a regular matrix. The document feature matrix of words consumes about 1.5Gb when stored as a regular matrix.
8 |
9 |
10 |
11 | We can calculate the size of the matrix in advance using the `ndoc()` and `nfeature()` functions (the latter renamed `nfeat()` in later quanteda releases). Each element in a numeric matrix consumes 8 bytes of RAM. One can then compare the predicted size with the actual size via `object.size()`.
12 |
13 | As illustrated in the output posted below, the actual size of the matrix is within 2 Mb of the estimate.
14 |
15 |
16 |
17 | If one uses this technique on the `Corpus.tokens.dfm` object, it quickly becomes evident why the `as.matrix()` function runs out of memory when applied to a document feature matrix.
18 |
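The estimate itself is simple arithmetic. The sketch below uses hypothetical dimensions; a real script would take them from `ndoc()` and `nfeature()` on the dfm:

```r
# Predict the dense-matrix size of a dfm before converting it.
n_docs     <- 8539     # hypothetical: ndoc(Corpus.tokens.dfm)
n_features <- 23600    # hypothetical: nfeature(Corpus.tokens.dfm)

predicted_bytes <- n_docs * n_features * 8   # 8 bytes per numeric cell
predicted_gb    <- predicted_bytes / 2^30
round(predicted_gb, 2)                       # about 1.5 Gb, before overhead
```

Running this check before calling `as.matrix()` makes it easy to see whether the dense result will fit in the machine's RAM at all.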
19 | The code listed above was run on the MacBook Pro with the following configuration.
20 |
21 |
22 |
23 | **Computer:** Apple MacBook Pro
24 |
25 | **Configuration:**
26 |
27 | - Operating system: OS X Sierra 10.12.6 (16G29)
28 | - Processor: Intel i5 at 2.6Ghz, turbo up to 3.3Ghz, two cores with two threads each
29 | - Memory: 8 gigabytes
30 | - Disk: 512 gigabytes, solid state drive
31 | - Date built: April 2013
39 |
--------------------------------------------------------------------------------
/markdown/capstone-ngramTimings.md:
--------------------------------------------------------------------------------
1 | # n-gram Timings: 25% Sample
2 |
3 | Starting tokenization...
4 | ...preserving Twitter characters (#, @)...total elapsed: 0.326000000000022 seconds.
5 | ...tokenizing texts
6 | ...removing separators....total elapsed: 48.7730000000001 seconds.
7 | ...replacing Twitter characters (#, @)...total elapsed: 12.5729999999999 seconds.
8 | ...replacing names...total elapsed: 0.0399999999999636 seconds.
9 | Finished tokenizing and cleaning 1,067,420 texts.
10 | Warning message:
11 | In tokenize.character(char_tolower(theText), removePunct = TRUE, :
12 | Arguments removePunctremoveNumbersremoveSeparatorsremoveTwitter not used.
13 | > paste("theText size is: ",format(object.size(theText),units="auto"))
14 | [1] "theText size is: 264.5 Mb"
15 | > rm(theText)
16 | > system.time(saveRDS(words,paste("./capstone/data/",outFile,"words.rds",sep="")))
17 | user system elapsed
18 | 24.930 0.275 25.543
19 | > paste("Tokenized words size is: ",format(object.size(words),units="MB"))
20 | [1] "Tokenized words size is: 1407.4 Mb"
21 | > system.time(ngram2 <- ngrams(words,n=2))
22 | user system elapsed
23 | 68.699 8.085 78.631
24 | Warning message:
25 | 'ngrams' is deprecated.
26 | Use 'tokens_ngrams' instead.
27 | See help("Deprecated")
28 | > system.time(saveRDS(ngram2,paste("./capstone/data/",outFile,"ngram2.rds",sep="")))
29 | user system elapsed
30 | 40.647 0.399 41.378
31 | > paste("ngram2 size is: ",format(object.size(ngram2),units="MB"))
32 | [1] "ngram2 size is: 2029.8 Mb"
33 | > rm(ngram2)
34 | > ?tokens_ngrams
35 | > system.time(ngram3 <- tokens_ngrams(words,n=3))
36 | user system elapsed
37 | 131.732 22.382 161.632
38 | > # runtime: 57.72 seconds x-360
39 | > system.time(saveRDS(ngram3,paste("./capstone/data/",outFile,"ngram3.rds",sep="")))
40 | user system elapsed
41 | 63.118 0.790 64.876
42 | > paste("ngram3 size is: ",format(object.size(ngram3),units="MB"))
43 | [1] "ngram3 size is: 2885.5 Mb"
44 | > rm(ngram3)
45 | > # runtime: 356.14 seconds x-360
46 | > system.time(ngram4 <- tokens_ngrams(words,n=4))
47 | user system elapsed
48 | 193.003 124.425 419.931
49 | > # runtime: 480 seconds x-360
50 | > system.time(saveRDS(ngram4,paste("./capstone/data/",outFile,"ngram4.rds",sep="")))
51 | user system elapsed
52 | 78.835 1.078 81.811
53 | > paste("ngram4 size is: ",format(object.size(ngram4),units="MB"))
54 | [1] "ngram4 size is: 3618.2 Mb"
55 | > rm(ngram4)
56 | > system.time(ngram5 <- tokens_ngrams(words,n=5))
57 | user system elapsed
58 | 201.675 83.696 339.453
59 | > system.time(saveRDS(ngram5,paste("./capstone/data/",outFile,"ngram5.rds",sep="")))
60 | user system elapsed
61 | 76.676 0.658 78.221
62 | > paste("ngram5 size is: ",format(object.size(ngram5),units="MB"))
63 | [1] "ngram5 size is: 3915.3 Mb"
64 | > rm(ngram5)
65 | > system.time(ngram6 <- tokens_ngrams(words,n=6))
66 | user system elapsed
67 | 206.480 89.015 342.710
68 | > system.time(saveRDS(ngram6,paste("./capstone/data/",outFile,"ngram6.rds",sep="")))
69 | user system elapsed
70 | 77.300 0.497 77.953
71 | > paste("ngram6 size is: ",format(object.size(ngram6),units="MB"))
72 | [1] "ngram6 size is: 4035.1 Mb"
73 | > rm(ngram6)
74 |
--------------------------------------------------------------------------------
/markdown/capstone-simplifiedApproach.md:
--------------------------------------------------------------------------------
1 | # Capstone Strategy: Simplify, Simplify, Simplify...
2 |
3 | It is very easy to overcomplicate the Johns Hopkins Data Science Specialization Capstone project. Why? Students must run their prediction applications in less than 1Gb of RAM, so this limits the sophistication of the final work product.
4 |
5 | Therefore, it is helpful to spend one's time on as simple a model as possible. Interestingly, this is the same advice I give to *R Programming* students in [Strategy for the Programming Assignments](http://bit.ly/2ddFh9A): *make it work, make it right, make it fast.*
6 |
7 | A simple solution to the Capstone can be accomplished with three key tools:
8 |
9 | 1. **data.table** -- due to its high performance, low memory usage, and ability to do an indexed search like a database table, this package is extremely useful not only to create the data needed for the prediction algorithm, but it is also very valuable in the shiny app.
10 | 2. **quanteda::tokens_ngrams()** -- the workhorse that generates the data needed for the simplest viable algorithm: a back-off model based on last-word frequencies / probabilities given a set of preceding words.
11 | 3. **SQL with the sqldf package** -- given a set of n-grams aggregated into three columns (a base consisting of the first n-1 words of the n-gram, a prediction that is the last word, and a count of how often the n-gram occurs), it is easy to write a SQL statement that extracts the most frequently occurring prediction for each base and saves the results into an output data.table for your shiny app.
12 |
13 | An approach like this can result in a final data.table that is about 400Mb, using the entire corpus to build 2-grams through 5-grams into bases and predicted words.
14 |
15 | The irony in the grading scheme is that students are better off building an application that runs fast but is inaccurate (one can get 2 of 3 points for accuracy as long as the application always returns a result) than one that is accurate but slow (an app that is too slow can receive 0 of 2 points for usability).
16 |
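As a hedged sketch of the pipeline described above: a toy two-sentence corpus stands in for the real one, the object names are hypothetical, and the example assumes the `quanteda`, `data.table`, and `sqldf` packages are installed.

```r
library(quanteda)
library(data.table)
library(sqldf)

# Toy corpus; the Capstone would tokenize the full text files instead.
toks  <- tokens(c("thank you for the help", "thank you for everything"))
grams <- unlist(tokens_ngrams(toks, n = 3, concatenator = " "))

dt <- data.table(ngram = grams)
dt[, base       := sub(" \\S+$", "", ngram)]   # first n-1 words
dt[, prediction := sub("^.* ", "", ngram)]     # last word
counts <- dt[, .(freq = .N), by = .(base, prediction)]

# Keep the most frequent prediction for each base.
best <- sqldf("SELECT base, prediction, MAX(freq) AS freq
               FROM counts GROUP BY base")
```

The resulting `best` table (base, prediction, frequency) is exactly the kind of object a Shiny app can load and search with a keyed data.table lookup.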
--------------------------------------------------------------------------------
/markdown/cleaningData-sqldfDriver.md:
--------------------------------------------------------------------------------
1 | # Common Problems: sqldf() fails to connect to database
2 |
3 | A number of students in *Getting and Cleaning Data* have reported problems with `sqldf()` when attempting to answer the week 2 quiz. After some back and forth on the Discussion Forum, one student employed her hacker skills and discovered that she could resolve the problem by explicitly setting an option for `sqldf.driver`.
4 |
5 | options(sqldf.driver="SQLite")
6 |
7 | She also noted that she had installed MySQL on her computer, in addition to the R libraries required for `sqldf()`. This means that the error was most likely caused by `sqldf()` attempting to use the `RMySQL` driver, which requires a MySQL database to exist in order to function correctly.
8 |
9 | # Conclusion
10 |
11 | If you're having problems getting `sqldf()` to work, set the option for `sqldf.driver` to use `SQLite` instead of another driver such as `RMySQL` that will attempt to connect to a database management system outside of the R environment.
12 |
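A minimal, self-contained sketch of the fix (assumes the `sqldf` package and its `RSQLite` dependency are installed):

```r
library(sqldf)
options(sqldf.driver = "SQLite")   # bypass RMySQL even if it is installed

df <- data.frame(x = 1:10)
sqldf("SELECT x FROM df WHERE x > 7")   # runs in a temporary SQLite database
```

Because `sqldf()` copies the data frame into a throwaway SQLite database, no external database server needs to exist.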
13 | *Hat tip to an unnamed student from Getting & Cleaning Data who discovered this solution.*
14 |
--------------------------------------------------------------------------------
/markdown/dataFrameAsList.md:
--------------------------------------------------------------------------------
1 | # Data Frame as List
2 | An important aspect of a data frame is that it is also a `list()`, where each column in the data frame is an item in the list. This is why, when you look at a data frame in the Environment pane in RStudio, you see the following, where the list items (the columns) are displayed as rows rather than as columns:
3 |
4 |
5 |
6 | Each item in the list can then be accessed with the extract operator `[[` or the `$` operator, both of which extract items from lists. In a data frame, each element of the list is a single vector, and remember, all elements in a vector are of the same data type (e.g. character, numeric, date, logical, etc.).
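The list behavior is easy to verify in base R:

```r
df <- data.frame(name = c("a", "b"), score = c(90, 85))

is.list(df)        # TRUE: a data frame is a list of column vectors
length(df)         # 2: one list element per column
df$score           # the `$` operator extracts one column (one list element)
df[["score"]]      # the `[[` extract operator does the same
```

Note that `df$score` and `df[["score"]]` both return the underlying vector itself, not a one-column data frame.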
--------------------------------------------------------------------------------
/markdown/dataProd-settingShinyappsTimeout.md:
--------------------------------------------------------------------------------
1 | ## Conserving Your shinyapps.io CPU Resources
2 |
3 | In order to save sufficient CPU cycles for the peer grading activity, it's important to set the Instance Idle Timeout for your application to the smallest value possible, 5 minutes.
4 |
5 | The procedure to do this is:
6 |
7 | 1. Login to shinyapps.io
8 | 2. Navigate to the **Applications / All** panel by selecting it on the left navigation bar
9 | 3. Select Settings on your app (the gear) to view the Application Overview
10 | 4. Select Settings (the gear) to view the application settings, and on the General tab,
11 | 5. Change the **Instance Idle Timeout** time to 5 minutes by scrolling the spinner selector.
12 |
13 | ## Step 3
14 |
15 | ## Step 4
16 |
17 | ## Step 5
18 |
19 | As you can see from the chart in Step 3, my application has been accessed a total of 11 times since December 25th, for a total of 1.07 hours of execution time. We receive 25 hours of CPU time per month at the "free" membership level. Since the default Instance Idle Timeout is 30 minutes (if my memory is correct), you can burn through your entire month's time allotment after only 50 views of your application at the default settings.
20 |
--------------------------------------------------------------------------------
/markdown/dataProd-shinyTimeoutConfig.md:
--------------------------------------------------------------------------------
1 | Because the free membership level on shinyapps.io only provides 25 hours of CPU time per month, it's important to set the Instance Idle Timeout for your application to the smallest value possible, 5 minutes.
2 |
3 | The procedure to do this is:
4 |
5 | 1. Login to shinyapps.io
6 | 2. Navigate to the **Applications / All** panel by selecting it on the left navigation bar
7 | 3. Select Settings on your app (the gear) to view the Application Overview
8 | 4. Select Settings (the gear) to view the application settings, and on the General tab,
9 | 5. Change the **Instance Idle Timeout** time to 5 minutes by scrolling the spinner selector.
10 |
11 | ## Step 3
12 |
13 |
14 | ## Step 4
15 |
16 |
17 | ## Step 5
18 |
19 |
20 | As you can see from the chart in Step 3, my application has been accessed a total of 11 times since December 25th, for a total of 1.07 hours of execution time. We receive 25 hours of CPU time per month at the "free" membership level. Since the default Instance Idle Timeout is 30 minutes (if my memory is correct), one can quickly burn through an entire month's time allotment after only 50 views of an application left at the default setting.
21 |
--------------------------------------------------------------------------------
/markdown/edaInToothGrowthAnalysis.md:
--------------------------------------------------------------------------------
1 | # Exploratory Data Analysis in ToothGrowth Assignment
2 |
3 | For students who have not yet taken the *Exploratory Data Analysis* course within the Data Science curriculum, the requirement for exploratory data analysis in the ToothGrowth assignment may be confusing.
4 |
5 | The primary purpose of the exploratory data analysis in the ToothGrowth assignment is to have the student assess whether the data meets the required assumptions for hypothesis testing with t-tests or confidence intervals, chiefly "is the data normally distributed?", including:
6 |
7 | * is the distribution mesokurtic (neither too peaked nor too flat)?
8 | * is the distribution symmetric (neither positively nor negatively skewed)?
9 |
10 | Questions like this can be answered with descriptive statistics: mean, median, mode, skewness, kurtosis, and the [Shapiro-Wilk Test for Normality](https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test).
11 |
12 | All of these statistics are available in the `stat.desc()` function within the [pastecs package](https://cran.r-project.org/web/packages/pastecs/pastecs.pdf).
13 |
14 | To visualize the distribution of a variable, researchers typically use boxplots, histograms (which are different from bar charts), Q-Q plots, and stem & leaf charts.
15 |
16 | Exploratory data analysis is also useful to identify missing values, if any exist in the data set, and to develop a strategy for managing them, such as mean / median imputation.
17 |
18 | Note that it is important to understand the assumptions, limits, and biases inherent in any statistic used as part of the analysis. For example, the [Shapiro-Wilk Test for Normality](https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test) has a known bias with large sample sizes.
19 | Since N = 60 for the ToothGrowth analysis, it is appropriate to use the Shapiro-Wilk test. In contrast, the [Central Limit Theorem component](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-expDistChecklist.md) of the course project requires students to analyze distributions of 1,000 numbers, so a Shapiro-Wilk test on the distribution of means may erroneously be significant due to the large sample size, and must be visually verified with a Q-Q plot.
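A minimal base-R sketch of this kind of check on the ToothGrowth data, combining descriptive statistics, the Shapiro-Wilk test, and two of the plots mentioned above (the article's `pastecs::stat.desc()` would provide skewness and kurtosis as well):

```r
library(datasets)

# descriptive statistics for tooth length
summary(ToothGrowth$len)

# Shapiro-Wilk test for normality; appropriate here because N = 60
shapiro.test(ToothGrowth$len)

# visual checks of the distribution
hist(ToothGrowth$len, main = "Distribution of ToothGrowth$len")
qqnorm(ToothGrowth$len)
qqline(ToothGrowth$len)
```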
20 |
--------------------------------------------------------------------------------
/markdown/exampleSortRvsSAS.md:
--------------------------------------------------------------------------------
1 | # Thinking in R vs. Thinking in SAS
2 |
3 | Coming into the Data Science Specialization with years of experience in SAS and SPSS, it took a concerted effort to understand some of the key concepts in R well enough to use them effectively, such as:
4 |
5 | > Everything that exists is an object.
6 | > Everything that happens is a function call.
7 | >
8 | > -- John Chambers, quoted in *Advanced R* (1)
9 |
10 | As I progressed through the Specialization and began work as a Community Mentor, I noticed other students struggling with the same issue, especially students who had prior experience with SAS. Arguably, one's SAS experience becomes an impediment to learning R to the degree that we expect R to work like SAS.
11 |
12 | For example, once you've had experience in SAS, you tend to think about sorting like this:
13 |
14 | /*
15 | * read the mtcars data set from the R datasets library,
16 | * after having written it with a pipe delimiter and no column names
17 | */
18 |
19 | data mtcars;
20 | infile "/folders/myfolders/mtcars.tsv" dlm="|";
21 | input vehicle : $20. mpg cyl disp hp drat wt qsec vs am gear carb;
22 | run;
23 | /*
24 | * sort the data by cyl and mpg
25 | */
26 | proc sort data = mtcars out = sorted;
27 | by cyl mpg;
28 | run;
29 | /*
30 | * print the data
31 | */
32 | proc print data = sorted;
33 | run;
34 |
35 |
36 | 
37 |
38 |
39 |
40 | The R version of sorting requires either loading a package specifically designed to sort data frames, or knowing the syntax to coerce the vector-oriented `order()` function to operate on a data frame. In the latter case, `order()` reorders the rows within the row component of the `dataFrame[rows, columns]` syntax, as follows:
41 |
42 | library(datasets)
43 | head(mtcars)
44 | #
45 | # use base::order() to sort the data frame by cyl and mpg
46 | #
47 | orderedData <- mtcars[order(mtcars$cyl,mtcars$mpg),]
48 | head(orderedData)
49 |
50 | mpg cyl disp hp drat wt qsec vs am gear carb
51 | Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
52 | Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
53 | Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
54 | Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
55 | Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
56 | Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
57 | >
58 |
59 | The "out of the box" R solution to the sort problem would not seem intuitive to a SAS programmer. As we gain experience with R and learn about packages such as Hadley Wickham's `dplyr`, we discover that third-party package developers have built a variety of capabilities to shield people from some of the complexities of the base language for common data manipulation tasks. For example, dplyr's `arrange()` function arranges rows in a data frame based on the values of one or more variables.
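For comparison, a short sketch of the same sort written with dplyr (assuming the package is installed):

```r
library(dplyr)

# dplyr equivalent of mtcars[order(mtcars$cyl, mtcars$mpg), ]
sorted <- arrange(mtcars, cyl, mpg)
head(sorted)

# descending order is equally direct
head(arrange(mtcars, cyl, desc(mpg)))
```

The `arrange()` call reads much like SAS's `proc sort ... by cyl mpg;`, which is one reason dplyr eases the transition for SAS programmers.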
60 |
61 | While these features are helpful to reduce the amount of R code required for common data manipulation tasks, it's still important to understand how R works under the covers in order to use these packages correctly.
62 |
63 | **Bottom line:** if you're used to working with SAS and expect R to behave in a similar manner (e.g. R must have a `sort()` function that works on data frames, not vectors), you're in for a steep learning curve while you "unlearn" how one does things in SAS, or find the third party packages that support features whose behavior is more similar to SAS.
64 |
65 | ### References
66 |
67 | (1) Wickham, Hadley (2015) -- *Advanced R*, CRC Press, Boca Raton, FL
68 |
--------------------------------------------------------------------------------
/markdown/exdata-readingSubsetOfRawData.md:
--------------------------------------------------------------------------------
1 | # Reading a Subset of Raw Data into R
2 |
3 | The first programming assignment in *Exploratory Data Analysis* requires students to read a subset of power consumption data, based on a set of dates.
4 |
5 | Students who have constrained memory and/or slow CPU speed often ask what is the most efficient way to subset the data, i.e. without having to read the entire file and then subset the result.
6 |
7 | ## Approach 1: `readLines()` and parsing
8 |
9 | If the data is in a fixed-length format, one could use a text editor like [UltraEdit](http://www.ultraedit.com) (license required) or [Atom](https://atom.io) (free) to open the raw data file and count the columns. Once one knows which columns contain the data needed to subset the file, one can write an R script with `readLines()` to read one line at a time, parse the relevant columns into an R object, and decide whether to keep the data.
10 |
11 | As an example of visually inspecting a file, here is a file of software application data that has been read into the *UltraEdit* text editor.
12 |
13 |
14 |
15 | Notice the column numbers across the top of the editor window. These can be used to evaluate the location of data in a fixed record format.
16 |
17 | Another way to access this information is to use `readLines()` to read the first 10 lines of the file, and then inspect the contents in R.
18 |
19 | For a detailed example illustrating how to parse data with `readLines()`, please read [Real World Example: Reading American Community Survey U.S. Census Data](http://bit.ly/2bAdLE9).
20 |
21 | ## Approach 2: `read.csv.sql()` for Comma Separated Values Files
22 |
23 | Another approach that works for comma separated values files is to use the `sqldf` package and the `read.csv.sql()` function, which allows one to include a `where` clause to subset the data.
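A sketch of this approach, where the file name and date values are illustrative placeholders matching the layout of the semicolon-delimited power consumption file:

```r
library(sqldf)

# read only the rows for the two dates of interest; the where clause is
# applied by SQLite as the file is read, so the full file never enters R
power <- read.csv.sql("household_power_consumption.txt",
                      sql = "SELECT * FROM file
                             WHERE Date IN ('1/2/2007', '2/2/2007')",
                      sep = ";")
```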
24 |
25 | ## Approach 3: Use a High Speed Reader
26 |
27 | Finally, one could read the data using a high speed data reader such as `readr::read_csv2()`, leaving everything as character variables, and then subset the data.
28 |
29 | Comparing the last two options on the power consumption data for *Exploratory Data Analysis*, we see that `readr::read_csv()` and a subset via the extract operator is significantly faster (2.76 seconds) than `sqldf::read.csv.sql()` (10.84 seconds).
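A sketch of the high-speed-reader approach (file name illustrative; `read_csv2()` handles the semicolon delimiter, and the subset uses the extract operator as described above):

```r
library(readr)

# read everything as character, then subset with the extract operator
power <- read_csv2("household_power_consumption.txt",
                   col_types = cols(.default = col_character()))
powerSubset <- power[power$Date %in% c("1/2/2007", "2/2/2007"), ]
nrow(powerSubset)  # one observation per minute for two days
```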
30 |
31 |
32 |
33 | We also see that the output of each approach results in the same number of rows written to the result data frame, 2,880, or 1 observation per minute (1,440) for two days.
34 |
35 |
36 |
37 | ## Conclusions
38 |
39 | The counter-intuitive result is that if one has access to a highly efficient mechanism to read the entire file, it can be faster to read it and then subset, rather than using a more complex but less efficient approach.
40 |
41 | For a highly complex file with many variables, `readLines()` and parsing will outperform `readr`. Files with multiple record types for a single observation require an approach like the one demonstrated in [Real World Example: Reading American Community Survey U.S. Census Data](http://bit.ly/2bAdLE9).
42 |
43 | # Appendix: Computer Specifications
44 |
45 | The performance timings in this article were generated on a Macbook Pro with the following specifications.
46 |
47 |
48 |
49 | | Computer | Configuration |
50 | | -------- | ------------- |
51 | | Apple Macbook Pro | Operating system: macOS Sierra 10.12.4 (16E195); Processor: Intel i5 at 2.6Ghz, turbo up to 3.3Ghz, two cores; Memory: 8 gigabytes; Disk: 512 gigabytes, solid state drive; Date built: April 2013 |
65 |
--------------------------------------------------------------------------------
/markdown/gen-gettingHelpWithSwirl.md:
--------------------------------------------------------------------------------
1 | # Swirl: Common Problems and Getting Help
2 |
3 | Students in the Johns Hopkins *Data Science Specialization* are introduced to `swirl`, an interactive environment for learning R and data science.
4 |
5 | They sometimes have problems either installing the software or running it with the version of R installed on their computers. They also have problems submitting swirl assignments for credit through Coursera.
6 |
7 | ## ISSUE 1: Swirl works with the current version of R
8 |
9 | Sometimes students start the *Specialization* after having previously installed an older version of R on their computers. Swirl only works with the latest version of R, so it is important to upgrade to the latest version before installing swirl.
10 |
11 | Most students' problems with swirl can be resolved by running the latest version of R.
12 |
13 | **SOLUTION:** Install / upgrade to the current version of R before installing swirl. If you already have a large volume of R packages installed, see also [How to Upgrade R without Losing Your Packages.](http://bit.ly/2uGKYFY)
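The install sequence, sketched in R (the course name matches *R Programming*; `install_course()` is available in recent versions of swirl):

```r
# upgrade R first, then install and start swirl
install.packages("swirl")
packageVersion("swirl")          # confirm a current version is installed

library(swirl)
install_course("R Programming")  # install the course used in the specialization
swirl()                          # start the interactive environment
```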
14 |
15 | ## ISSUE 2: Problems Submitting Work for Credit
16 |
17 | Students (especially those outside the United States) sometimes experience difficulties getting the grading of swirl assignments to work correctly. Often this is due to connectivity problems (e.g. firewalls, virtual private networks, etc.) between a student's computer and Coursera servers. All swirl assignments in the *Data Science Specialization* are optional. Therefore, it is not required to submit them for credit on Coursera in order to pass any of the courses in the Specialization.
18 |
19 | **SOLUTION:** Finish the assignments and don't worry about getting "credit" for them.
20 |
21 | ## ISSUE 3: Finding the Assignment Token
22 |
23 | Another common problem students experience when attempting to submit work for credit is where to find the assignment token that is requested by swirl at the end of a lesson.
24 |
25 | From the course home page, select the Week 1 pull down to make the content visible for Week 1.
26 |
27 |
28 |
29 | The Week 1 content includes videos, a quiz, and practice exercises. Select the `Practice Exercises` icon.
30 |
31 |
32 |
33 | This will bring up the assignment instructions page. The Assignment Token is visible in a shaded box on the right side of the web page. Use the email address and assignment token when requested by Swirl to grade your assignment and have the grade posted to Coursera.
34 |
35 |
36 |
37 | # Where to Get Help with Swirl
38 |
39 | Swirl is supported by [swirlstats.com](https://swirlstats.com), whose lead developer is [Sean Kross](http://seankross.com).
40 |
41 | One can obtain help for swirl in the following places:
42 | * [Swirl help page](http://swirlstats.com/help.html)
43 | * [Frequently Asked Questions for swirl](http://swirlstats.com/faq.html)
44 | * [Swirl google discussion group](https://groups.google.com/forum/#!forum/swirl-discuss)
45 | * [Reporting problems with swirl](https://github.com/swirldev/swirl/issues)
46 |
--------------------------------------------------------------------------------
/markdown/gen-handlingMissingValues.md:
--------------------------------------------------------------------------------
1 | # Strategies for Handling Missing Values
2 |
3 | A number of courses in the *Johns Hopkins Data Science Specialization* on Coursera force students to deal with messy data, including missing values in the data. This leads them to ask questions about different ways for managing missing values, given that this topic is not covered in any level of detail in the specialization.
4 |
5 | To provide more background on this topic, I've compiled the following list of resources on missing values. Since articles posted on the internet sometimes disappear over time for various reasons, I've referenced local copies of the publicly available articles.
6 |
7 |
8 | | Resource | Description |
9 | | -------- | ----------- |
10 | | A Comparison of Six Methods for Missing Data Imputation | Author(s): Peter Schmitt, Jonas Mandel, and Mickael Guedj. Article compares six different imputation methods against four real data sets of varying sizes. Results are based on four evaluation criteria, including root mean squared error, unsupervised classification error, supervised classification error, and execution time. |
11 | | A Review of Missing Data Handling Methods in Education Research | Author: Jehanzeb R. Cheema. Article discusses the problem of missing data in educational research, including a review of previously published studies. |
12 | | A Framework for Missing Value Imputation | Author(s): Ms. R. Malarvizhi, Dr. Antony Selvadoss Thanamani. Article discusses the imputation of data by comparing the two most popular techniques, mean substitution and k-means clustering, with a proposed k nearest neighbor approach. |
13 | | Chapter 25: Missing Data Imputation | Author(s): Andrew Gelman, Jennifer Hill. Professor Gelman posted chapter 25 of his book *Data Analysis Using Regression and Multilevel / Hierarchical Models* on his website at Columbia University. The book is considered an important reference for social scientists using linear and hierarchical models. The missing values chapter describes a variety of ways to handle missing data, and includes examples coded in R. |
14 | | Review of Methods for Missing Data | Author(s): Therese Pigott. Pigott's article compares model-based methods of missing value imputation with ad hoc methods, such as pairwise or listwise deletion. The approaches are compared using an analysis of students' ability to control asthma symptoms. |
15 | | Missing Data in Educational Research: A Review of Reporting Practices and Suggestions for Improvement | Author(s): James L. Peugh, Craig K. Enders. Article provides an overview of missing-data theory, maximum likelihood estimation and multiple imputation. It also includes a methodological review of missing data reporting practices across 23 applied research journals, and demonstrates forms of imputation on data from the Longitudinal Study of American Youth. |
47 |
--------------------------------------------------------------------------------
/markdown/genMdWithGraphicsInGithub.md:
--------------------------------------------------------------------------------
1 | # Steps to Generate a Proper Markdown File with Graphics on Github
2 |
3 | A couple of the Data Science Specialization courses require students to generate a markdown file to be viewed by peer reviewers on Github, where the markdown file should include graphics. I have observed students lose points on their course projects for either failing to publish the output markdown file from the course project to Github, or by publishing the .md file without publishing the supporting graphics.
4 |
5 | This article explains how to generate a proper markdown file including graphics as separate files in RStudio, and how to ensure all required files are published to Github.
6 |
7 | *Additional content goes here...*
--------------------------------------------------------------------------------
/markdown/googlesearch.md:
--------------------------------------------------------------------------------
1 | If necessary, put your hacker skills to work by making Google your friend. For example, if you're trying to solve the *R Programming* class Assignment 1, some of the following Google queries might be helpful:
2 |
3 | * [list files in r](http://www.google.com/search?q=list+files+in+r)
4 | * [subset data frame by list of values](http://www.google.com/search?q=subset+data+frame+by+list+of+values)
5 | * [missing values in R](http://www.google.com/search?q=missing+values+in+r)
6 |
7 | And yes, I did have to do some hacking around myself to figure out how to embed Google searches in this article.
8 |
--------------------------------------------------------------------------------
/markdown/images/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/.DS_Store
--------------------------------------------------------------------------------
/markdown/images/2017-08-27_07-46-44.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/2017-08-27_07-46-44.png
--------------------------------------------------------------------------------
/markdown/images/2017-10-14_14-48-35.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/2017-10-14_14-48-35.png
--------------------------------------------------------------------------------
/markdown/images/ExpDistChecklist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/ExpDistChecklist.png
--------------------------------------------------------------------------------
/markdown/images/RepDataAssignment2Checklist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/RepDataAssignment2Checklist.png
--------------------------------------------------------------------------------
/markdown/images/barplotPA2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/barplotPA2.png
--------------------------------------------------------------------------------
/markdown/images/capstone-dfmToMatrix01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/capstone-dfmToMatrix01.png
--------------------------------------------------------------------------------
/markdown/images/capstone-dfmToMatrix02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/capstone-dfmToMatrix02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-gottaScrapeEmAll01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-gottaScrapeEmAll01.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-gottaScrapeEmAll02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-gottaScrapeEmAll02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-gottaScrapeEmAll03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-gottaScrapeEmAll03.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing00.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing01.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing03.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing04.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing05.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing06.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing07.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing08.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing09.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing10.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing11.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing12.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing13.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing14.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing15.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing16.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-htmlParsing17.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-htmlParsing17.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-javaError01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-javaError01.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-javaError02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-javaError02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-readingFiles00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-readingFiles00.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-readingFiles01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-readingFiles01.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-readingFiles02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-readingFiles02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-valueOfCleaningData01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-valueOfCleaningData01.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-valueOfCleaningData02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-valueOfCleaningData02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-valueOfCleaningData03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-valueOfCleaningData03.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-valueOfCleaningData04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-valueOfCleaningData04.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-01.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-02.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-03.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-04.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-05.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-06.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-07.png
--------------------------------------------------------------------------------
/markdown/images/cleaningData-week2q5-08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/cleaningData-week2q5-08.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit1.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit10.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit11.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit12.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit13.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit14.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit2.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit3.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit4.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit4a.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit4a.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit5.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit6.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit7.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit8.png
--------------------------------------------------------------------------------
/markdown/images/configRStudioGit9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/configRStudioGit9.png
--------------------------------------------------------------------------------
/markdown/images/dataProd-shinyConfig1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/dataProd-shinyConfig1.png
--------------------------------------------------------------------------------
/markdown/images/dataProd-shinyConfig2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/dataProd-shinyConfig2.png
--------------------------------------------------------------------------------
/markdown/images/dataProd-shinyConfig3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/dataProd-shinyConfig3.png
--------------------------------------------------------------------------------
/markdown/images/exdata-readSubsetOfRawData01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/exdata-readSubsetOfRawData01.png
--------------------------------------------------------------------------------
/markdown/images/exdata-readSubsetOfRawData02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/exdata-readSubsetOfRawData02.png
--------------------------------------------------------------------------------
/markdown/images/exdata-readSubsetOfRawData03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/exdata-readSubsetOfRawData03.png
--------------------------------------------------------------------------------
/markdown/images/exdata-readSubsetOfRawData04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/exdata-readSubsetOfRawData04.png
--------------------------------------------------------------------------------
/markdown/images/forumPostFeatures1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/forumPostFeatures1.png
--------------------------------------------------------------------------------
/markdown/images/forumPostFeatures2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/forumPostFeatures2.png
--------------------------------------------------------------------------------
/markdown/images/forumPostFeatures3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/forumPostFeatures3.png
--------------------------------------------------------------------------------
/markdown/images/forumPostFeatures4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/forumPostFeatures4.png
--------------------------------------------------------------------------------
/markdown/images/gen-gettingHelpWithSwirl01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/gen-gettingHelpWithSwirl01.png
--------------------------------------------------------------------------------
/markdown/images/gen-gettingHelpWithSwirl02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/gen-gettingHelpWithSwirl02.png
--------------------------------------------------------------------------------
/markdown/images/gen-gettingHelpWithSwirl03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/gen-gettingHelpWithSwirl03.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets01.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets02.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets03.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets04.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets05.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets06.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets07.png
--------------------------------------------------------------------------------
/markdown/images/googlesheets08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/googlesheets08.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX00.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX01.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX02.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX03.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX04.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX05.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX06.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX07.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX08.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX09.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX10.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX11.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX12.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX13.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX14.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX15.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX16.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX17.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX17.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX18.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX18.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX19.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX20.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX21.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX22.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX23.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX24.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX24.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX25.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX25.png
--------------------------------------------------------------------------------
/markdown/images/installMikTeX26.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/installMikTeX26.png
--------------------------------------------------------------------------------
/markdown/images/kableTable01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/kableTable01.png
--------------------------------------------------------------------------------
/markdown/images/misc-Mathjax01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-Mathjax01.png
--------------------------------------------------------------------------------
/markdown/images/misc-Mathjax02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-Mathjax02.png
--------------------------------------------------------------------------------
/markdown/images/misc-Mathjax03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-Mathjax03.png
--------------------------------------------------------------------------------
/markdown/images/misc-Mathjax04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-Mathjax04.png
--------------------------------------------------------------------------------
/markdown/images/misc-Mathjax05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-Mathjax05.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook01.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook02.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook03.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook04.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook05.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook06.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook07.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook08.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook09.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook10.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook11.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook12.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook13.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook14.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook15.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook16.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook17.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook17.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook18.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook18.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook19.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook20.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook21.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook22.png
--------------------------------------------------------------------------------
/markdown/images/misc-rOnChromebook23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/misc-rOnChromebook23.png
--------------------------------------------------------------------------------
/markdown/images/pml-ElemStatLearn01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/pml-ElemStatLearn01.png
--------------------------------------------------------------------------------
/markdown/images/pml-ElemStatLearn02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/pml-ElemStatLearn02.png
--------------------------------------------------------------------------------
/markdown/images/pml-ElemStatLearn03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/pml-ElemStatLearn03.png
--------------------------------------------------------------------------------
/markdown/images/pml-combiningPredictorsBinomial01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/pml-combiningPredictorsBinomial01.png
--------------------------------------------------------------------------------
/markdown/images/pml-installingRattleOnMacOSX01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/pml-installingRattleOnMacOSX01.png
--------------------------------------------------------------------------------
/markdown/images/pml-installingRattleOnMacOSX02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/pml-installingRattleOnMacOSX02.png
--------------------------------------------------------------------------------
/markdown/images/regmods-sumEtimesXeq0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/regmods-sumEtimesXeq0.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD01.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD02.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD03.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD04.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD05.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD06.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD07.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD08.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD09.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD10.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD11.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD12.png
--------------------------------------------------------------------------------
/markdown/images/repData-configKnitrWithMD13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-configKnitrWithMD13.png
--------------------------------------------------------------------------------
/markdown/images/repData-stormDataGuide01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/repData-stormDataGuide01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-Assignment1Instructions.PDF:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-Assignment1Instructions.PDF
--------------------------------------------------------------------------------
/markdown/images/rprog-OOPandR01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-OOPandR01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-OOPandR02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-OOPandR02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Demo01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Demo01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Demo02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Demo02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Demo03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Demo03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1DemoOutput01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1DemoOutput01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1DemoOutput02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1DemoOutput02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Solutions01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Solutions01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Solutions02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Solutions02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Solutions03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Solutions03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignment1Solutions04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignment1Solutions04.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignmentOperator01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignmentOperator01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-assignmentOperators.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-assignmentOperators.png
--------------------------------------------------------------------------------
/markdown/images/rprog-breakingDownMakeVector01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-breakingDownMakeVector01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-breakingDownMakeVector01a.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-breakingDownMakeVector01a.png
--------------------------------------------------------------------------------
/markdown/images/rprog-breakingDownMakeVector02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-breakingDownMakeVector02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-breakingDownMakeVector03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-breakingDownMakeVector03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-breakingDownMakeVector04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-breakingDownMakeVector04.png
--------------------------------------------------------------------------------
/markdown/images/rprog-breakingDownMakeVector05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-breakingDownMakeVector05.png
--------------------------------------------------------------------------------
/markdown/images/rprog-conceptsForFileProcessing01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-conceptsForFileProcessing01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-conceptsForFileProcessing02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-conceptsForFileProcessing02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-conceptsForFileProcessing03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-conceptsForFileProcessing03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-downloadLectures01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-downloadLectures01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-downloadLectures02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-downloadLectures02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-downloadLectures03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-downloadLectures03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator04.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator05.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator06.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator07.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator08.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator09.png
--------------------------------------------------------------------------------
/markdown/images/rprog-extractOperator10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-extractOperator10.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop04.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop05.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop06.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop07.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop08.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop09.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop10.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop11.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop12.png
--------------------------------------------------------------------------------
/markdown/images/rprog-githubDesktop13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-githubDesktop13.png
--------------------------------------------------------------------------------
/markdown/images/rprog-lexicalScoping02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-lexicalScoping02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-lexicalScoping03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-lexicalScoping03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-lexicalScopingDiagrams.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-lexicalScopingDiagrams.pptx
--------------------------------------------------------------------------------
/markdown/images/rprog-lexicalScopingIllustration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-lexicalScopingIllustration.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmean01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmean01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmean02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmean02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmeanWithSAS00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmeanWithSAS00.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmeanWithSAS01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmeanWithSAS01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmeanWithSAS02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmeanWithSAS02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmeanWithSAS03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmeanWithSAS03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-pollutantmeanWithSAS04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-pollutantmeanWithSAS04.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean01.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean02.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean03.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean04.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean05.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean06.png
--------------------------------------------------------------------------------
/markdown/images/rprog-weightedMean07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/rprog-weightedMean07.png
--------------------------------------------------------------------------------
/markdown/images/sha1hash-1of4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/sha1hash-1of4.png
--------------------------------------------------------------------------------
/markdown/images/sha1hash-2of4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/sha1hash-2of4.png
--------------------------------------------------------------------------------
/markdown/images/sha1hash-3of4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/sha1hash-3of4.png
--------------------------------------------------------------------------------
/markdown/images/sha1hash-4of4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/sha1hash-4of4.png
--------------------------------------------------------------------------------
/markdown/images/signup1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/signup1.png
--------------------------------------------------------------------------------
/markdown/images/signup2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/signup2.png
--------------------------------------------------------------------------------
/markdown/images/statinf-areaOfPointOnNormalCurve00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-areaOfPointOnNormalCurve00.png
--------------------------------------------------------------------------------
/markdown/images/statinf-areaOfPointOnNormalCurve000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-areaOfPointOnNormalCurve000.png
--------------------------------------------------------------------------------
/markdown/images/statinf-areaOfPointOnNormalCurve01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-areaOfPointOnNormalCurve01.png
--------------------------------------------------------------------------------
/markdown/images/statinf-areaOfPointOnNormalCurve02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-areaOfPointOnNormalCurve02.png
--------------------------------------------------------------------------------
/markdown/images/statinf-areaOfPointOnNormalCurve03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-areaOfPointOnNormalCurve03.png
--------------------------------------------------------------------------------
/markdown/images/statinf-areaOfPointOnNormalCurve04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-areaOfPointOnNormalCurve04.png
--------------------------------------------------------------------------------
/markdown/images/statinf-permutationTests00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-permutationTests00.png
--------------------------------------------------------------------------------
/markdown/images/statinf-permutationTests01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-permutationTests01.png
--------------------------------------------------------------------------------
/markdown/images/statinf-permutationTests02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-permutationTests02.png
--------------------------------------------------------------------------------
/markdown/images/statinf-poissonInterval01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-poissonInterval01.png
--------------------------------------------------------------------------------
/markdown/images/statinf-suppressPrintingInKinitr01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-suppressPrintingInKinitr01.png
--------------------------------------------------------------------------------
/markdown/images/statinf-suppressPrintingInKinitr02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-suppressPrintingInKinitr02.png
--------------------------------------------------------------------------------
/markdown/images/statinf-usingKnitrInReports01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-usingKnitrInReports01.png
--------------------------------------------------------------------------------
/markdown/images/statinf-usingKnitrInReports02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-usingKnitrInReports02.png
--------------------------------------------------------------------------------
/markdown/images/statinf-usingKnitrInReports03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-usingKnitrInReports03.png
--------------------------------------------------------------------------------
/markdown/images/statinf-usingKnitrInReports04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-usingKnitrInReports04.png
--------------------------------------------------------------------------------
/markdown/images/statinf-varOfBinomialDistribution01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-varOfBinomialDistribution01.png
--------------------------------------------------------------------------------
/markdown/images/statinf-varOfBinomialDistribution02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-varOfBinomialDistribution02.png
--------------------------------------------------------------------------------
/markdown/images/statinf-varOfBinomialDistribution03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-varOfBinomialDistribution03.png
--------------------------------------------------------------------------------
/markdown/images/statinf-varianceOfExpDist01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/statinf-varianceOfExpDist01.png
--------------------------------------------------------------------------------
/markdown/images/toothGrowthChecklist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/images/toothGrowthChecklist.png
--------------------------------------------------------------------------------
/markdown/kableDataFrameTable.md:
--------------------------------------------------------------------------------
1 | One of the things that is useful about `kable()` within the `knitr` package is that anything you can structure into a data frame can be printed as a table. I used this technique to create figures to compare actual vs. theoretical values for the Exponential Distribution assignment in the *Statistical Inference* course (sample values redacted).
2 |
3 |
4 |
5 | The code to produce this table looks like:
6 |
7 | Statistic <- c("Mean","Std. Deviation")
8 | Sample <- c(round(theMean,2),round(theSd,2))
9 | Theoretical <- c(5.0,5.0)
10 | theTable <- data.frame(Statistic,Sample,Theoretical)
11 | rownames(theTable) <- NULL
12 | kable(theTable)
13 |
14 |
--------------------------------------------------------------------------------
/markdown/mathjaxWithGithubMarkdown.md:
--------------------------------------------------------------------------------
1 | # Using Mathjax / LaTeX Formulas in the Data Science Specialization
2 |
3 | Students in the *Statistical Inference* and subsequent classes often need to use formulas as part of their communication in three different areas:
4 |
5 | * To explain points within course Discussion Forums,
6 | * In documentation for course projects, and
7 | * In HTML / Markdown files on Github.
8 |
9 | Coursera, RStudio, and Github all use [LaTeX](https://en.wikibooks.org/wiki/LaTeX/Mathematics) to specify formulas, and [MathJax](http://docs.mathjax.org/en/latest/start.html) to render the formulas in HTML and/or Markdown.
10 |
11 | Unfortunately, the techniques required to display the content correctly vary slightly for each output format. This article illustrates how to write the correct formula syntax for each type of deliverable.
12 |
13 | ## Mathjax & LaTeX in Discussion Forums
14 | In the Coursera Discussion Forums, the syntax to begin and end a formula is the double dollar sign: `$$`. In the text editor, one can specify formulas as illustrated below.
15 |
16 |
17 |
18 | To see how the formulas will render once the item is posted to the forum, press the `Preview` button to the upper right of the editor toolbar.
19 |
20 |
21 |
22 | Note that the `Preview` button is only available in the older Coursera software that was used for the *Data Science Specialization* prior to January 2016. In the new version of the Coursera discussion forums, students must post their content to see how it is evaluated by MathJax, and then edit the post to make adjustments.
23 |
24 | ## Mathjax & LaTeX in Project Assignments / R Markdown
25 |
26 | In R Markdown the syntax to begin and end a formula is the single dollar sign `$` or the double dollar sign `$$`. The single dollar sign includes a formula inline as part of a sentence, while the double dollar sign breaks to a new line and displays the formula centered. The alternate MathJax syntax uses `\(` and `\)` for inline equations, and `\[` and `\]` to display equations centered on separate lines of text.
27 |
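For example, a hypothetical R Markdown passage using the sample mean (the formula here is illustrative, not taken from a course assignment) might read:

```markdown
The sample mean is $\bar{x}$, which estimates the population mean $\mu$.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$$
```

The first formula renders inline within the sentence; the second renders centered on its own line.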
28 | Notice the subtle difference in the following illustrations from R Markdown versus the illustrations for the Discussion Forums.
29 |
30 |
31 |
32 | Once the markdown document is knit to HTML, PDF, or a Word document, the formulas are rendered.
33 |
34 |
35 |
36 | One can choose to keep the intermediate markdown file that is generated during the `knit2html()` process. For some of the Data Science Specialization classes, one of the requirements of the course projects is to keep and publish the markdown file to Github, along with any supporting figures.
37 |
38 | Note that although the formulas will be rendered correctly in an HTML output file from R Markdown that is subsequently posted to Github pages, the formulas will not render in the associated .md markdown file if it is also posted to Github.
39 |
40 | ## Mathjax in HTML on Github / Github Pages
41 |
42 | To display LaTeX formulas in an html page on Github that was built outside RStudio / R Markdown, the following code block must be included in the heading section of the HTML file to load the Mathjax interpreter.
43 |
44 |
45 |
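A typical loader block looks like the following. This is a representative example of the era's MathJax CDN configuration, not necessarily the exact snippet from the original page; check the MathJax documentation for the current script URL.

```html
<script type="text/javascript" async
  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
```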
48 |
49 |
50 | The remainder of the document should be coded with HTML commands, and then the file can be accessed as an HTML file within Github pages and correctly rendered when viewed through a web browser. Note that the syntax for LaTeX must use the `\(` `\)` style syntax, not the `$` `$$` style that is used within the Discussion Forums or R Markdown.
51 |
52 |
53 |
54 | Note that adding this syntax to a markdown file will not cause the LaTeX to be rendered as equations when the .md file is viewed on Github. For more information on this topic, see [Using Mathjax on Github Pages](http://www.christopherpoole.net/using-mathjax-on-githubpages.html).
55 |
56 | ## References
57 |
58 | 1. Poole, Christopher (2012) -- [Using Mathjax on Github Pages](http://www.christopherpoole.net/using-mathjax-on-githubpages.html), retrieved December 3, 2015.
59 | 2. Wikipedia (2015) -- [LaTeX Mathematics](https://en.wikibooks.org/wiki/LaTeX/Mathematics), retrieved December 3, 2015.
60 |
--------------------------------------------------------------------------------
/markdown/permutationTestExample.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Permutation Test Example"
3 | author: "Len Greski"
4 | date: "April 30, 2016"
5 | output:
6 | html_document:
7 | keep_md: yes
8 | ---
9 |
10 | ```{r setup, include=FALSE}
11 | knitr::opts_chunk$set(echo = TRUE)
12 | ```
13 |
14 | Illustrate the permutation test example in *Statistical Inference*, taken from the Resampling lecture. The point of this example is to demonstrate that the distribution of differences of means between groups B and C across the 10,000 permutation samples is centered at zero, which makes the observed difference of means (13.25) extremely unlikely under the null hypothesis that the group labels are exchangeable.
15 |
16 |
17 | ```{r permutations}
18 | data(InsectSprays)
19 | boxplot(count ~ spray,data = InsectSprays)
20 |
21 | # create a subset of data where the spray is either B or C
22 | subdata <- InsectSprays[InsectSprays$spray %in% c("B", "C"), ]
23 |
24 | # set y equal to the count column in subdata
25 | y <- subdata$count
26 |
27 | # convert spray from a factor to a character vector
28 | group <- as.character(subdata$spray)
29 |
30 | # create a function called testStat() which is defined as the difference
31 | # of means between group B and C. Note that the g == "B"
32 | # syntax generates a logical vector of TRUE / FALSE, which is used to
33 | # then subset the w vector passed to the function
34 | testStat <- function(w, g) mean(w[g == "B"]) - mean(w[g == "C"])
35 |
36 | # next, calculate the difference of means across the entirety of the
37 | # two groups, and save the result to observedStat
38 | observedStat <- testStat(y, group)
39 |
40 | # now, calculate permutations by taking 10,000 permutation samples
41 | # from y and group, calculating testStat() on each sample
42 | permutations <- sapply(1:10000, function(i) testStat(y, sample(group)))
43 |
44 | # print the value of observedStat, which was calculated on all of the data
45 | observedStat
46 |
47 | # now calculate the average of number of permutations that were larger
48 | # than observedStat, which we would expect to be zero because the permutation
49 | # distribution we created should have a mean of zero, and the actual test statistic
50 | # is 13.25
51 | mean(permutations > observedStat)
52 |
53 | # draw histogram of permutations, illustrating that they are approximately
54 | # normally distributed with a mean of zero and compare to the actual test
55 | # statistic, which shows that the probability of achieving a test statistic of
56 | # 13.25 from the permutation distribution is very low
57 | hist(permutations,breaks=15,xlim=c(-15,15))
58 | abline(v=observedStat,col = "blue", lwd=2)
59 |
60 | ```
61 |
62 | *end of example*
63 |
64 |
--------------------------------------------------------------------------------
/markdown/permutationTestExample.md:
--------------------------------------------------------------------------------
1 | # Permutation Test Example
2 | Len Greski
3 | April 30, 2016
4 |
5 |
6 |
7 | Illustrate the permutation test example in *Statistical Inference*, taken from the Resampling lecture. The point of this example is to demonstrate that the distribution of differences of means between groups B and C across the 10,000 permutation samples is centered at zero, which makes the observed difference of means (13.25) extremely unlikely under the null hypothesis that the group labels are exchangeable.
8 |
9 |
10 |
11 | ```r
12 | data(InsectSprays)
13 | boxplot(count ~ spray,data = InsectSprays)
14 | ```
15 |
16 | 
17 |
18 | ```r
19 | # create a subset of data where the spray is either B or C
20 | subdata <- InsectSprays[InsectSprays$spray %in% c("B", "C"), ]
21 |
22 | # set y equal to the count column in subdata
23 | y <- subdata$count
24 |
25 | # convert spray from a factor to a character vector
26 | group <- as.character(subdata$spray)
27 |
28 | # create a function called testStat() which is defined as the difference
29 | # of means between group B and C. Note that the g == "B"
30 | # syntax generates a logical vector of TRUE / FALSE, which is used to
31 | # then subset the w vector passed to the function
32 | testStat <- function(w, g) mean(w[g == "B"]) - mean(w[g == "C"])
33 |
34 | # next, calculate the difference of means across the entirety of the
35 | # two groups, and save the result to observedStat
36 | observedStat <- testStat(y, group)
37 |
38 | # now, calculate permutations by taking 10,000 permutation samples
39 | # from y and group, calculating testStat() on each sample
40 | permutations <- sapply(1:10000, function(i) testStat(y, sample(group)))
41 |
42 | # print the value of observedStat, which was calculated on all of the data
43 | observedStat
44 | ```
45 |
46 | ```
47 | ## [1] 13.25
48 | ```
49 |
50 | ```r
51 | # now calculate the average of number of permutations that were larger
52 | # than observedStat, which we would expect to be zero because the permutation
53 | # distribution we created should have a mean of zero, and the actual test statistic
54 | # is 13.25
55 | mean(permutations > observedStat)
56 | ```
57 |
58 | ```
59 | ## [1] 0
60 | ```
61 |
62 | ```r
63 | # draw histogram of permutations, illustrating that they are approximately
64 | # normally distributed with a mean of zero and compare to the actual test
65 | # statistic, which shows that the probability of achieving a test statistic of
66 | # 13.25 from the permutation distribution is very low
67 | hist(permutations,breaks=15,xlim=c(-15,15))
68 | abline(v=observedStat,col = "blue", lwd=2)
69 | ```
70 |
71 | 
72 |
73 | *end of example*
74 |
75 |
--------------------------------------------------------------------------------
/markdown/pml-caretRunTimings.md:
--------------------------------------------------------------------------------
1 | # Caret Run Timings by Model and Machine
2 |
3 | Students working through the *Practical Machine Learning* course run into a number of surprises as they develop models for the course project. First, although the [caret documentation](http://topepo.github.io/caret/index.html) provides a unified interface to a variety of models, it
4 |
--------------------------------------------------------------------------------
/markdown/pml-combiningPredictorsBinomial.md:
--------------------------------------------------------------------------------
1 | ## Explanation of Combining Predictors -- Majority Vote
2 |
3 | Within the [Combining Predictors](https://github.com/bcaffo/courses/blob/master/08_PracticalMachineLearning/025combiningPredictors/Combining%20predictors.pdf) lecture, Professor Leek describes a scenario with five independent classifiers to illustrate how a majority vote approach improves the predictive power of a model. That is, if a majority of the classifiers make the same prediction, the probability that it is the correct one quickly becomes very high (e.g. exceeds 0.8).
4 |
5 | The calculation is based on the binomial distribution. If we have 5 independent classifiers, and the probability of any single classifier being correct is 0.7, then the probability that a majority (3 or more) make the correct prediction is calculated as follows:
6 |
7 |
8 |
9 | ...and yes, there is a typo on the slide in the 5 choose 4 term.
10 |
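The majority-vote probability described above is easy to verify in R with the base binomial functions `dbinom()` and `pbinom()`:

```r
# probability that a majority (3 or more) of 5 independent classifiers,
# each correct with probability 0.7, make the correct prediction
p_majority <- sum(dbinom(3:5, size = 5, prob = 0.7))
p_majority
# the same value, taken from the upper tail of the binomial CDF
pbinom(2, size = 5, prob = 0.7, lower.tail = FALSE)
```

Both expressions evaluate to roughly 0.84, which is why the lecture states that the majority vote exceeds 0.8.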
--------------------------------------------------------------------------------
/markdown/pml-ghPagesSetup.md:
--------------------------------------------------------------------------------
1 | # Overview
2 | Github Pages is a tool to create a website from a github repository. When the Data Science Specialization was built back in 2014, students in the *Practical Machine Learning* course frequently expressed frustration at the work required to correctly configure Github Pages.
3 |
4 | Over the years Github has made it much easier to publish content via Github pages. The biggest improvement is that one can publish content directly from the *main* branch, rather than having to create a *gh-pages* branch from which a repository's content is published.
5 |
6 | ## Process as of April 2022
7 |
8 | The current process is five simple steps that publish from the main branch of a repository:
9 |
10 | 1. Create a repository
11 | 2. Clone the repository to a local computer
12 | 3. Create an index.html file on the local computer
13 | 4. Commit and push the html file to the remote repository
14 | 5. Navigate to https://username.github.io/reponame to see the website
15 |
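The steps above can be sketched from the command line. This sketch initializes a repository locally in place of cloning; the repository name, user identity, and remote URL are placeholders:

```shell
# steps 2-4, with "reponame" and the github.com URL as placeholders
git init reponame
cd reponame
echo '<h1>Hello, Github Pages</h1>' > index.html
git add index.html
git -c user.name="student" -c user.email="student@example.com" commit -m "Add index page"
# then publish to the remote created in step 1:
# git remote add origin https://github.com/username/reponame.git
# git push -u origin main
```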
16 | Github has published a short walkthrough of the process to create a personal website based on a github repository that has the same name as one's github userid [here](https://pages.github.com).
17 |
18 | **Last updated 16 April 2022**
--------------------------------------------------------------------------------
/markdown/pml-installingRattleOnMacOSX.md:
--------------------------------------------------------------------------------
1 | ## Installing RGtk2 and Rattle on Mac OS X
2 |
3 | The `rattle` package is referenced during the *Practical Machine Learning* course in the Johns Hopkins Data Science Specialization as a mechanism to print nice looking tree diagrams from the model output generated by the `rpart` package. Unfortunately, the package has a dependency on the `gtk2` library that is not present by default on Mac OSX.
4 |
5 | Therefore, students using OSX frequently post questions about how to install the components necessary to use the Rattle `fancyRpartPlot()` function.
6 |
7 | After a fair amount of effort I was able to install rattle on Mac OS X Sierra.
8 |
9 | The install requires the gtk toolkit, and on Mac one must do the following, per a solution on [stackoverflow.com](https://stackoverflow.com/questions/15868860/r-3-0-and-gtk-rgtk2-error).
10 |
11 | 1. [Install macports](https://www.macports.org/install.php) — tool for installing mac packages
12 | 2. Run `sudo` to install gtk2 on the Mac:
`sudo port install gtk2 ## (X11 -- not aqua)`
13 | 3. Export the new path:
`export PATH=/opt/local/bin:/opt/local/sbin:$PATH`
14 | 4. From command line R, install `RGtk2` from source with:
`install.packages("RGtk2",type="source")`
15 |
16 | **NOTE:** For the `RGtk2` install to work correctly from RStudio, one must first confirm that the `PATH` change listed above is applied to the shell that is used to start RStudio.
17 |
18 | The most complete set of instructions is located at [Sebastian Kopf's Gist page](https://gist.github.com/sebkopf/9405675) and verified by my own install on June 17, 2017. Once installed, loading the rattle library will generate the following output in the R console.
19 |
20 |
21 |
22 | In order to use `fancyRpartPlot()`, students will also need to install the `rpart.plot` package.
23 |
24 | install.packages("rpart.plot")
25 |
26 | ## Example: Fancy Rpart Plot of Iris Data
27 |
28 | Here we've replicated the code needed to generate a fancy tree diagram with `caret` and `rattle` that is discussed in the *Practical Machine Learning* lecture on *Predicting with Trees*.
29 |
30 | library(caret)
31 | library(rattle)
32 | inTrain <- createDataPartition(y = iris$Species,
33 | p = 0.7,
34 | list = FALSE)
35 | training <- iris[inTrain,]
36 | testing <- iris[-inTrain,]
37 | modFit <- train(Species ~ .,method = "rpart",data = training)
38 | fancyRpartPlot(modFit$finalModel)
39 |
40 |
41 |
42 |
43 | *Last updated: 27 January 2018*
44 |
--------------------------------------------------------------------------------
/markdown/pml-projectChecklist.md:
--------------------------------------------------------------------------------
1 |
2 | ## Practical Machine Learning Project Checklist
3 |
4 |
5 |
6 | Present | Evaluation Criterion |
7 | ------- | -------------------- |
8 | [ ] | The report describes a machine learning algorithm to predict activity quality from activity monitors. |
9 | [ ] | The report is 2,000 words or less. |
10 | [ ] | The number of figures in the document is 5 or less. |
11 | [ ] | The report explains how the model was built. |
12 | [ ] | The report explains how cross-validation was used to estimate the out of sample error rate. |
13 | [ ] | The report lists the key decisions made during the analysis, and explains why these decisions were made. |
14 | [ ] | The report reviews the accuracy of the selected machine learning algorithm in predicting the 20 unknown test cases. |
15 | [ ] | The submission includes a github repository, and a link to the repository as part of the Coursera submission. |
16 | [ ] | The submission includes a compiled HTML document, the output .md markdown file, and any graphics required to correctly view them within the markdown file on github. |
17 | [ ] | If a github pages branch is provided, the HTML file is accessible at https://username.github.io/reponame. |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
--------------------------------------------------------------------------------
/markdown/pml-requiredModelAccuracy.md:
--------------------------------------------------------------------------------
1 | ## Practical Machine Learning: Required Model Accuracy for Course project
2 |
3 | As students complete the course project for _Practical Machine Learning_, they tend to raise questions about the accuracy required to correctly predict all 20 cases in the test data set.
4 |
5 | Going back to the probability theory concepts that were covered in *Statistical Inference*, each observation in the test data set is independent of the others. If *a* represents the accuracy of a machine learning model, then the probability of correctly predicting 20 out of 20 test cases with the model in question is *a^20*, because the joint probability of independent events is the product of their individual probabilities.
6 |
7 | The following table illustrates the probability of predicting all 20 test cases, given a particular model accuracy.
8 |
9 |
10 |
Model Accuracy | Probability of Predicting 20 out of 20 Correctly |
11 |
12 | 0.800 | 0.0115 |
13 | 0.850 | 0.0388 |
14 | 0.900 | 0.1216 |
15 | 0.950 | 0.3585 |
16 | 0.990 | 0.8179 |
17 | 0.991 | 0.8346 |
18 | 0.992 | 0.8516 |
19 | 0.993 | 0.8689 |
20 | 0.994 | 0.8866 |
21 | 0.995 | 0.9046 |
22 |
23 |
24 | Bottom Line: Submit your test cases for grading only after you've achieved a model accuracy of at least .99 on the training data set.
25 |
26 | ## Appendix: Accuracy Required for 95% Confidence Across 20 Tests
27 |
28 | In January 2018 a student posted an [issue](http://bit.ly/2mv5Dr4) on my github site, suggesting that a better way to calculate the required accuracy would be to use the formula `(1-.05)^(1/20)`. This approach leverages the concept of familywise error rates across multiple comparisons of means in the week 4 lectures from the *Statistical Inference* course. This specific calculation is known as the [Šidák correction for multiple tests](http://bit.ly/2DuPwlq).
29 |
30 | When we compare the two approaches we find that they produce the same result within .001. To have 95% confidence that all 20 predictions will be accurate, we need a familywise accuracy rate of .9974386, as illustrated below.
31 |
32 | > mdlAccuracy <- c(.8,.85,.9,.95,.99,.995,.996,.997,.9974,0.9974386,.9975)
33 | > predAccuracy <- mdlAccuracy^20
34 | > data.frame(mdlAccuracy,predAccuracy)
35 | mdlAccuracy predAccuracy
36 | 1 0.8000000 0.01152922
37 | 2 0.8500000 0.03875953
38 | 3 0.9000000 0.12157665
39 | 4 0.9500000 0.35848592
40 | 5 0.9900000 0.81790694
41 | 6 0.9950000 0.90461048
42 | 7 0.9960000 0.92296826
43 | 8 0.9970000 0.94167961
44 | 9 0.9974000 0.94926458
45 | 10 0.9974386 0.94999960
46 | 11 0.9975000 0.95116988
47 | >
48 | > # alternate approach: Šidák's correction of multiple tests
49 | > # generate 95% confidence familywise accuracy needed across 20 tests
50 | > (1 - .05)^(1/20)
51 | [1] 0.9974386
52 | >
53 |
--------------------------------------------------------------------------------
/markdown/regmodels-references.md:
--------------------------------------------------------------------------------
1 | # References for Regression Models Course
2 |
3 | In response to students' questions about additional reference material for the Johns Hopkins University *Regression Models* course on Coursera, we have compiled the following list of reference materials. Note that some of them overlap with a similar list provided for [Statistical Inference](http://bit.ly/2c50sKo).
4 |
5 |
6 | Reference | Description |
7 | Regression Models for Data Science | Start here! Written by Brian Caffo, the course instructor, this is the book designed to accompany the Johns Hopkins University Regression Models course, and it's available for low (or no) cost on leanpub.com. It also contains exercises and solutions. |
8 | OpenIntro Statistics | Openintro.org provides a set of three statistics courses that include books and videos. The courses are targeted at different categories of students, ranging from high school to college.
All of these resources are available for free.
Content related to regression analysis is contained in chapters 7 and 8. |
9 | R in Action | Written by statistician Robert Kabacoff, R in Action is an excellent overall reference for R.
Content related to regression analysis is contained in chapters 8 and 13. |
10 | Statmethods.net Multiple Regression Page | Also developed by Robert Kabacoff, statmethods.net is the free online companion to R in Action. It contains content about a variety of statistical methods, as well as a list of additional books and tutorials.
|
11 |
12 |
13 |
14 | We will continue to periodically update this list with new reference material.
15 |
16 | *Last updated: 2 April 2018*
17 |
--------------------------------------------------------------------------------
/markdown/regmodels-sumOfErrorTimesX.md:
--------------------------------------------------------------------------------
1 | # Why does the sum of errors times X equal 0?
2 |
3 |
4 |
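In ordinary least squares, this property is a direct consequence of the normal equations. Here is a sketch of the derivation for simple linear regression:

```latex
% minimize the sum of squared errors
S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

% set the partial derivative with respect to \beta_1 equal to zero
\frac{\partial S}{\partial \beta_1}
  = -2 \sum_{i=1}^{n} x_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0

% since e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i, this is exactly
\sum_{i=1}^{n} e_i x_i = 0
```

Similarly, setting the partial derivative with respect to \(\beta_0\) equal to zero yields \(\sum_{i=1}^{n} e_i = 0\): the residuals sum to zero.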
5 | Additional background on the derivation may be found at [Least Squares Properties](http://bit.ly/2n2tlte) on the [Thinking with Data](http://bit.ly/2ogVpKo) website.
6 |
--------------------------------------------------------------------------------
/markdown/repDataAssignment2Checklist.md:
--------------------------------------------------------------------------------
1 | # Reproducible Research Assignment 2 Checklist
2 |
3 | During the October 2015 run of *Reproducible Research*, a number of students missed the first project submission deadline, or failed to comply with important requirements for the assignment. As a way to reduce student frustration with the second assignment, we produced the following requirements checklist for the assignment. Feedback about the checklist was very positive.
4 |
5 |
6 |
--------------------------------------------------------------------------------
/markdown/repdata-improvingInitialFileReadSpeed.md:
--------------------------------------------------------------------------------
1 | # Reproducible Research Final Project: Improving Initial Data Read Performance
2 |
3 | One of the most useful features in R is the ability of the `knitr` package to produce high quality analyses that combine text and graphics in a reproducible way. That is, if code that provides internet access to the original data sources is included in the R Markdown document, anyone with R can completely reproduce the analysis.
4 |
5 | However, when working with large data sets, the default behavior of `knitr` to rerun the entire analysis can be time consuming, forcing the data scientist to wait while testing various pieces of code in the analysis. Assignment 2, the NOAA Storm Data Analysis, is a case in point. Many students in *Reproducible Research* struggle with slow response times when they read the storm events raw data file into R.
6 |
7 | Here are four approaches to improving the response time performance of the initial load.
8 |
9 | 1. Set `stringsAsFactors=FALSE` on `read.csv()`. One of the default argument settings converts strings to factors, and this is very slow for the Storm Data data set (71 seconds read time on an [HP Spectre x-360 laptop](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/repdata-improvingInitialFileReadSpeed.md#appendix-computer-specifications)) vs. 31 seconds when `stringsAsFactors=FALSE`.
10 | 2. Use a more efficient program to read the data, such as `readr::read_csv()`. I write about this in [Real World Example: Reading American Community Survey U.S. Census Data](http://bit.ly/2bAdLE9). On my laptop, `readr::read_csv()` reads the NOAA Storm Data data in 11.3 seconds.
11 | 3. Write logic to read the data once, and save it as a serialized object with `saveRDS()`. Once saved, the data can be read with `readRDS()`, which takes about 10.7 seconds on my laptop.
12 | 4. Use `knitr` options to [cache results of code chunks](http://rstudio-pubs-static.s3.amazonaws.com/180_77c843dcecf2406fb89d35dd0476628a).
13 |
14 | With #3, the logic to do this would look like the following, including programming statements to track the load, save, and read times.
15 |
16 | if (!file.exists("./repdata/assignment2/stormData.rds")) {
17 | message("Entered file does not exist block")
18 | library(readr)
19 | intervalStart <- Sys.time()
20 | stormData <- read_csv("./repdata/assignment2/repdata-data-StormData.csv",
21 | n_max = 902297,
22 | col_names = TRUE,
23 | ...) # additional arguments go here
24 | intervalEnd <- Sys.time()
25 | message(paste("readr::read_csv() took: ",intervalEnd - intervalStart,attr(intervalEnd - intervalStart,"units")))
26 | intervalStart <- Sys.time()
27 | saveRDS(stormData,"./repdata/assignment2/stormData.rds")
28 | intervalEnd <- Sys.time()
29 | message(paste("saveRDS() took: ",intervalEnd - intervalStart,attr(intervalEnd - intervalStart,"units")))
30 |
31 | } else {
32 | message('Entered readRDS() block')
33 | intervalStart <- Sys.time()
34 | stormData <- readRDS("./repdata/assignment2/stormData.rds")
35 | intervalEnd <- Sys.time()
36 | message(paste("readRDS() took: ",intervalEnd - intervalStart,attr(intervalEnd - intervalStart,"units")))
37 |
38 | }
39 |
40 | Note that I added statements to show when each code block is executed, as well as to take performance timings for the data reads and writes.
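For approach #4, caching is enabled per code chunk in the R Markdown source; `knitr` then reruns the chunk only when its code changes. A minimal sketch (the chunk label is illustrative):

````markdown
```{r loadStormData, cache=TRUE}
stormData <- read.csv("./repdata/assignment2/repdata-data-StormData.csv",
                      stringsAsFactors = FALSE)
```
````

On subsequent knits, `knitr` loads the cached `stormData` object from disk instead of rereading the CSV file.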
41 |
42 | # Appendix: Computer Specifications
43 |
44 | The computer used for the performance timings listed in the article has the following specifications.
45 |
46 |
47 |
48 | HP Spectre X360 laptop |
49 |
50 |
51 | - Operating system: Microsoft Windows 10, 64bit
52 | - Processor: Intel Core i7-6500U at 2.5Ghz, turbo up to 3.1Ghz, two cores
53 | - Memory: 8 gigabytes
54 | - Disk: 512 gigabytes, solid state drive
55 | - Date built: December 2015
56 |
57 | |
58 |
59 |
60 |
--------------------------------------------------------------------------------
/markdown/repdata-stormAnalysisCodebook.md:
--------------------------------------------------------------------------------
1 |
2 | Variable | Type | Description |
3 |
4 | PROPDMGEXP | Character | Exponential value used to adjust PROPDMG variable. Its values include:
5 | 0 - 8: coding errors, multiply PROPDMG by 1
6 | H,h: hundreds, multiply PROPDMG by 100
7 | K,k: thousands, multiply PROPDMG by 1,000
8 | M,m: millions, multiply PROPDMG by 1,000,000
9 | B,b: billions, multiply PROPDMG by 1,000,000,000
10 | ?: unknown, multiply PROPDMG by 0
11 | -: less than, multiply PROPDMG by 0
12 | +: greater than number estimated in PROPDMG, multiply PROPDMG by 1
13 | blank: not specified, multiply PROPDMG by 0 |
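The adjustment described in the table above can be implemented as a small lookup function in R. A sketch, where `expToMultiplier` is an illustrative name, not part of the assignment:

```r
# convert PROPDMGEXP codes to numeric multipliers, per the codebook above
# (expToMultiplier is an illustrative name)
expToMultiplier <- function(e) {
  e <- toupper(trimws(e))
  multiplier <- rep(0, length(e))                      # ?, -, blank -> 0
  multiplier[e %in% c(as.character(0:8), "+")] <- 1    # coding errors, +
  multiplier[e == "H"] <- 1e2                          # hundreds
  multiplier[e == "K"] <- 1e3                          # thousands
  multiplier[e == "M"] <- 1e6                          # millions
  multiplier[e == "B"] <- 1e9                          # billions
  multiplier
}

expToMultiplier(c("K", "m", "", "?", "5", "B"))  # 1e3 1e6 0 0 1 1e9
```

The adjusted damage is then `PROPDMG * expToMultiplier(PROPDMGEXP)`.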
14 |
15 |
16 |
--------------------------------------------------------------------------------
/markdown/resources/2013.Velloso.QAR-WLE.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/resources/2013.Velloso.QAR-WLE.pdf
--------------------------------------------------------------------------------
/markdown/resources/ASAStatComp-Bell-Koren-VolinskyArticle.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/resources/ASAStatComp-Bell-Koren-VolinskyArticle.pdf
--------------------------------------------------------------------------------
/markdown/resources/Confidence Intervals for Poisson Variables.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/resources/Confidence Intervals for Poisson Variables.pdf
--------------------------------------------------------------------------------
/markdown/resources/MissingDataReview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/resources/MissingDataReview.pdf
--------------------------------------------------------------------------------
/markdown/resources/chambers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/resources/chambers.pdf
--------------------------------------------------------------------------------
/markdown/resources/makeVector logic.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/markdown/resources/makeVector logic.docx
--------------------------------------------------------------------------------
/markdown/rprog-OOPandR.md:
--------------------------------------------------------------------------------
1 | # Object Oriented Programming and R
2 |
3 | In September 2016 a student asked the following questions on the *R Programming* discussion forum:
4 |
5 | I have a question about the extent to which R packages (should) follow the OOP design principles. This stood out to me when experimenting with the lm() function but applies to almost every package that is widely used.
6 |
7 | Here is a way (or the only way?) to access the coefficients of a regression model in R:
8 |
9 | fit <- lm(y ~ x, data = myDF);
10 |
11 | coef(fit)
12 |
13 | Coming from an OOP background this rings a lot of bells, for example what stops the user from typing "coef(2)" or coef("abc")? Making coef() a function rather than a method of the "fit" object not only pollutes the global namespace but also makes code error prone. The same holds true for many other functions that in my opinion should have been class methods.
14 |
15 | What is the reason behind this design?
16 |
17 | R is built on the S language, so it contains the key features of S. In a [2014 useR! presentation](http://bit.ly/2cJGL8L), John Chambers explained three principles on which S (and R) are based, including:
18 |
19 | 1. Everything is an object,
20 | 2. Everything happens in a function, and
21 | 3. Functions are interfaces to algorithms.
22 |
23 | In R, there are two underlying object systems: the S3 system and the S4 system. The S3 system is designed around the use of `list()` to create objects. I explain how this relates to Programming Assignment 2 in my articles [makeCacheMatrix() as an Object](http://bit.ly/2byUe4e) and [Demystifying makeVector()](http://bit.ly/2bTXXfq).
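Since S3 dispatch chooses a method based on the class attribute of its first argument, generics such as `coef()` behave much like polymorphic methods. A minimal sketch with a made-up class (`circle` and its method are illustrative, not from any course package):

```r
# a tiny S3 class: summary() dispatches to summary.circle() based on class
summary.circle <- function(object, ...) {
  paste("circle with area", round(pi * object$radius^2, 2))
}

# an S3 object is just a list with a class attribute
shape <- structure(list(radius = 2), class = "circle")
summary(shape)   # "circle with area 12.57"
```

Calling `summary()` on any other object still works as before; the generic simply routes each object to the method written for its class.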
24 |
25 | Regarding the student's specific question, given \#1 above, there is a way to access the coefficients output from `lm()` without using a function. They can be accessed with the `$` form of the extract operator as follows.
26 |
27 | #
28 | # example highlighting object output from lm
29 | #
30 |
31 | library(datasets)
32 | data(mtcars)
33 |
34 | aResult <- lm(mpg ~ disp + wt + am,data=mtcars)
35 |
36 | # now, access model coefficents from output object, aResult
37 |
38 | aResult$coefficients
39 |
40 |
41 |
42 |
43 | In the example above `aResult` is an S3 object, i.e. a `list()`. The output object from `lm()` consists of 12 named list elements, as illustrated from the *RStudio Environment Viewer:*
44 |
45 |
46 |
47 | To understand more of the details behind the design of R including its object oriented features, I highly recommend reading [Software For Data Analysis: Programming with R](http://amzn.to/2cmLFuR), by John Chambers.
48 |
49 | Regarding the student's comments about object orientation, I've written code in at least four object oriented languages (Smalltalk, Java, Objective C, and R). Each has its idiosyncrasies. That said, given the focus on R functions in the early lectures in *R Programming,* one can understand why R doesn't appear very "object oriented."
50 |
51 | In Programming Assignment 2 one can see how the value returned by `makeVector()` behaves as an object: it has state (the elements `x` and `m`) and behavior (the methods `set()`, `get()`, `getmean()`, and `setmean()`). Encapsulation exists because once an object of type `makeVector()` is created, one can only access `x` or `m` through the methods exposed in the final `list()` call at the end of the `makeVector()` function.
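The same pattern can be shown in miniature with a closure-based counter (`makeCounter` is an illustrative example, not part of the assignment):

```r
# a closure-based object in the style of makeVector():
# state (count) is private; behavior is exposed via the returned list
makeCounter <- function() {
  count <- 0
  list(increment = function() count <<- count + 1,
       get       = function() count)
}

ctr <- makeCounter()
ctr$increment()
ctr$increment()
ctr$get()   # 2 -- count is reachable only through the list's methods
```

Note that `count` itself is not visible from the calling environment; only the `increment()` and `get()` methods can touch it, which is exactly the encapsulation described above.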
52 |
53 | For more details on how `makeVector()` works, one can review my article [Demystifying makeVector()](http://bit.ly/2bTXXfq).
54 |
--------------------------------------------------------------------------------
/markdown/rprog-assignment1Demos.md:
--------------------------------------------------------------------------------
1 | # R Programming Assignment 1 - Demo Output
2 |
3 | Students in *R Programming* periodically have difficulties accessing the sample output files that are associated with the first programming assignment. For convenience, I've posted screenshots of the content here.
4 |
5 | ## Part 1: pollutantmean()
6 |
7 | [Example output for the pollutantmean function](https://d18ky98rnyall9.cloudfront.net/_3b0da118473bfa0845efddcbe29cc336_pollutantmean-demo.html?Expires=1497916800&Signature=IQPv7V9hfmSIjZRKJxcp5O6GkCsFPItXubjRHUqc-z-ajQHFh~hd7RQ2~mHVYXxwACzJ1Axg2mL2FPG7jPRSVO4hb2rQNk73QaXyTn9JwzkevVj7wLIYW0g4PJEPBDxs2z93rn97sKjS-P1tHHIFxZglaTXUBEvTsssnETj3h5c_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)
8 |
9 |
10 |
11 | ## Part 2: complete()
12 |
13 | [Example output for the complete function](https://d18ky98rnyall9.cloudfront.net/_3b0da118473bfa0845efddcbe29cc336_complete-demo.html?Expires=1497916800&Signature=jPxH2SBZWrf9Sa-1rVsS3N~FEetPO6WfSZusEFJcyrhyMBSVrNJUBqdtnA9FRfE6fksNyZovjUpnly3MjHx3fYyankOx~vwZD3PyRcbRjNNKE5U3ASZmq9HvBr-6qu~lKmlmzqNLbiuaOlWyU8zkjCwu2gu-zGZDZEfiFjbJ0Sw_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)
14 |
15 |
16 |
17 |
18 | ## Part 3: corr()
19 |
20 | [Example output for the corr function](https://d18ky98rnyall9.cloudfront.net/_e92e575b8e62dcb1e3a086d2ff0d5a1e_corr-demo.html?Expires=1497916800&Signature=CRvn6GplUGq2uwaktu9GVproHN8skcrajwQ7CbU8K6Vf1c2ZgZnPm0czoh5JVx5tzI7FL-4bNMHuDIyJJTPqXbyvNQYFau7HMhCY4m76f~CJGcWJumMvQaS~EpKAb4U68GDNHANsSGBKDYDouISNzC22zZKpVooUpSBW6WGUZvk_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)
21 |
22 |
23 |
--------------------------------------------------------------------------------
/markdown/rprog-assignmentOperators.md:
--------------------------------------------------------------------------------
1 | ## Forms of the Assignment Operator in R
2 |
3 | In the *R Programming* assignment on lexical scoping, students are introduced to the `<<-` syntax for assigning values to R objects. The sample code for this assignment often confuses people, because it is not clear how this syntax works.
4 |
5 | `<<-` is one of three forms of the assignment operator. The following syntax from the `makeVector()` function for the lexical scoping assignment in *R Programming* assigns the value of `mean` to the object `m`:
6 |
7 | #
8 | # assign value of mean to m in parent scope
9 | #
10 | setmean <- function(mean) m <<- mean
11 |
12 | The double left arrow in `<<-` indicates that the assignment should be made to the parent environment, as opposed to the current scope within the `setmean()` function.
13 |
14 | To make the scoping more obvious, one could rewrite the code this way.
15 |
16 | setmean <- function(mean) {
17 | m <<- mean
18 | }
19 |
20 | The other forms of the assignment operator are `<-`and `=`. All of these are documented in the [Assignment Operators R Documentation](https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html).
21 |
22 | For example:
23 |
24 |
25 |
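A minimal sketch of an `assignParent()`-style function (the exact code in the original example may differ):

```r
# assignParent() uses <<-, which searches parent environments for
# parentObject and assigns at the global environment if none is found
assignParent <- function(value) {
  parentObject <<- value
}

assignParent("assigned from within the function")
parentObject   # the value persists after the function returns
```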
26 | As one can see from the use of the `assignParent()` function, the value assigned within the function is accessible after the function ends because we used the `<<-` version of the assignment operator.
27 |
28 | One subtlety of the assignment operator is that it can be used bidirectionally.
29 |
30 |
31 | # leftward form
32 | x <- 15
33 |
34 | is the same as
35 |
36 | # rightward form
37 | 15 -> x
38 |
39 | Note that the `=` form of the assignment operator is leftward form only, and that it has other restrictions on its use: `=` is only allowed at the top level (e.g., in the complete expression typed at the command prompt), or within a subexpression within a braced list of expressions.
40 |
41 | Therefore, most people who work in R prefer `<-` over `=`.
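One practical consequence of treating `=` as argument matching inside a function call can be sketched as follows (assuming a fresh session with no object `x`):

```r
# `=` inside a call is argument matching, not assignment;
# `<-` inside a call assigns in the calling environment
median(x = 1:10)    # names the argument; creates no object x
exists("x")         # FALSE, assuming no x was defined earlier

median(x <- 1:10)   # assigns 1:10 to x, then passes it to median()
exists("x")         # TRUE
```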
42 |
43 | ## Appendix
44 |
45 | This section contains questions and answers about topics related to the assignment operator.
46 |
47 | **Question:** why must I use `<<-` to assign a value to an object in a parent environment?
48 |
49 | **Answer:** The `<-` form of the assignment operator will create a new object that is local to a function rather than traversing the environment tree to find whether there is an object of the same name in a parent scope. Therefore, one must use `<<-` to assign a value to an object in a parent scope.
50 |
51 | We'll illustrate the point with the following code. Notice how `anObject` in the parent environment retains its original value after `sampleFunction()` is executed.
52 |
53 | anObject <- "original value"
54 | sampleFunction <- function() {
55 | # use local form of assignment operator
56 | anObject <- "new value"
57 | message(paste("anObject value is:", anObject))
58 | }
59 | sampleFunction()
60 | anObject
61 |
62 |
--------------------------------------------------------------------------------
/markdown/rprog-completeSortProblem.md:
--------------------------------------------------------------------------------
1 | # Common Problems: complete("specdata",332:1) Fails
2 |
3 | There is a subtle requirement in this part of the first assignment in *R Programming*: data in the output data frame must match the sequence of sensor IDs passed in the `id` argument to `complete()`.
4 |
5 | Quiz question 7 tests students' functions against this requirement.
6 |
7 | There are at least three different techniques students typically use to solve the assignment, including:
8 |
9 | 1. `for()` loop that processes each sensor file in sequence based on the `id` argument, and assembles the results into a single data frame
10 | 2. `apply()` function that uses the `id` argument as input to drive the processing within an anonymous function in a manner similar to approach 1
11 | 3. Functions that summarize the data, such as `stats::aggregate()`, `Hmisc::summarize()`, or `base::table()`
12 |
13 | The first two approaches produce the correct results because, unless one sorts the output before returning it to the parent environment, the output matches the order of the `id` argument.
14 |
15 | The third approach fails because aggregation functions typically use what is called "by groups" to process the data, and the output is sorted in ascending order of by groups.
16 |
17 | To produce the correct answer for quiz question 7, one must use a technique that does not automatically sort the data in ascending order of `id`.
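A small illustration of the by-group sorting behavior (the data here is made up):

```r
# aggregate() returns results in ascending order of the "by" group,
# regardless of the order in which the ids appear in the input
sensorData <- data.frame(id  = c(3, 3, 1, 1, 2, 2),
                         obs = c(5, 7, 2, 4, 9, 1))

counts <- aggregate(obs ~ id, data = sensorData, FUN = length)
counts$id   # 1 2 3 -- not the order in which the ids appeared
```

This is why a solution built on an aggregation function cannot reproduce the sequence `332:1` without an explicit reordering step at the end.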
--------------------------------------------------------------------------------
/markdown/rprog-conceptsForFileProcessing.md:
--------------------------------------------------------------------------------
1 | # Concepts: Techniques for Processing Sets of Files
2 |
3 | As students in the Johns Hopkins University *R Programming* course work through the first programming assignment, one of the challenges they face in writing the `complete()` function is figuring out how to use a function to summarize each data file in the assignment, and return the output as a single `data.frame()`.
4 |
5 | ## A for()-Based Solution
6 |
7 | To illustrate some of the concepts in this solution,
8 | here is a function that takes a list of files from a directory and summarizes them with a function that is passed as another argument, as covered in Swirl lesson 9 on Functions.
9 |
10 | summarizeFiles <- function(directory,
11 | aVariable=NULL,
12 | summaryFunction=median,
13 | id=1:6,...){
14 |
15 | # read files and subset by id
16 | theFiles <- dir(directory,full.names = TRUE)[id]
17 |
18 | # initialize result array to NA so we can avoid
19 | # using combine function
20 | result <- rep(NA,length(theFiles))
21 |
22 | # process the files
23 | for (i in 1:length(theFiles)) {
24 | aFile <- read.csv(theFiles[i])
25 | # calculate summary function, including passing
26 | # ... argument to summary function
27 | result[i] <- summaryFunction(aFile[,aVariable],...)
28 | }
29 | # combine into an output data frame and return
30 | # result to parent environment
31 | data.frame(File=theFiles,Value=result)
32 | }
33 |
34 | To illustrate how the code works, I used it with the [Pokémon Stats data set](http://bit.ly/2ovmmxu) that I obtained from Alberto Barradas. He originally posted this data on Kaggle.com. I broke out Alberto's single CSV data file into separate files, one for each of the first six generations of Pokémon that were available as of the time Alberto created the data set.
35 |
36 | When we use the function to summarize the data by file with the median function on the Speed variable in the data set, the function generates the following results.
37 |
38 |
39 |
40 | Notice how the files are specified in the `id` argument, and how the function maintains the original order by using that argument to control how the results of `dir()` are saved in the file list. That is, by default `dir()` would return the files in ascending alphabetical order, but by using the \[ form of the [extract operator](http://bit.ly/2bzLYTL) on the result, we reorder the file names so they match the order of the `id` argument.
41 |
42 | An important byproduct of this approach is that it always replicates the order of files on the `id` argument, even if they are not in an increasing or decreasing series.
43 |
44 |
45 |
46 | ## An apply()-Based Solution
47 |
48 | During *R Programming*, students learn that `for()` loops are associated with slow-performing R code. On the other hand, `apply()` functions aren't covered until after this assignment is due, so it's easier to illustrate a concept to new students with a `for()` loop than with an `apply()` function. That said, an `apply()`-based solution requires fewer statements than a `for()`-based solution, so we include one here.
49 |
50 | summarizeFilesApply <- function(directory,
51 | aVariable=NULL,
52 | summaryFunction=median,
53 | id=1:6,...){
54 |
55 | # read files and subset by id
56 | theFiles <- dir(directory,full.names = TRUE)[id]
57 |
58 | # Read the files and keep the desired variable
59 | dataFrames <- lapply(theFiles,function(x) read.csv(x)[[aVariable]])
60 |
61 | # summarize the variable
62 | result <- unlist(lapply(dataFrames, function (x) {summaryFunction(x,...)}))
63 |
64 | # combine into an output data frame and return
65 | data.frame(File=theFiles,Value=result)
66 | }
67 |
68 | Again using the Pokémon data, output for this version matches the `for()` based solution.
69 |
70 |
71 |
--------------------------------------------------------------------------------
/markdown/rprog-downloadingLectures.md:
--------------------------------------------------------------------------------
1 | # Creative Use of R: Downloading Course Lectures
2 |
3 | During the August 2016 run of *R Programming*, a student asked whether all of the lectures were available in a single tar or zip file. As far as I know, the lectures aren't available for bulk download in a format that can be subsequently accessed on a local disk drive.
4 |
5 | I thought it might be useful to show how one could use R to download the files, since it does have capabilities to download data files in a variety of formats, and the example would be a good illustration of techniques students will need in *Getting and Cleaning Data* and *Reproducible Research*.
6 |
7 | The function `downloadLectures()` can be used to download files in a binary format, given a URL. We created function arguments for a list of files, and a prefix to add to each file that is downloaded.
8 |
9 | #
10 | # download lectures, requires curl package
11 | #
12 |
13 | downloadLectures<- function(fileList,courseName="rProgramming prework") {
14 | # configure set download method for windows vs. Mac / Linux
15 | dlMethod <- "curl"
16 | if(substr(Sys.getenv("OS"),1,7) == "Windows") dlMethod <- "wininet"
17 | for (i in 1:length(fileList)) {
18 | outFile <- paste(courseName,"_lecture_",sprintf("%03d.mp4",i),sep="")
19 | if(!file.exists(outFile)){
20 | download.file(fileList[i],destfile=outFile,method=dlMethod,mode="wb")
21 | }
22 |
23 | }
24 | }
25 |
26 | To execute the function, simply build a list of URLs where the videos are stored, source the function, and call `downloadLectures()`.
27 |
28 | #
29 | # run downloadLectures() for video to install R on a Mac
30 | #
31 | theFiles <- c(
32 | "https://www.coursera.org/learn/r-programming/lecture/9Aepc/installing-r-on-a-mac"
33 | )
34 | downloadLectures(theFiles,courseName="rProgramming_prework")
35 |
36 |
37 | The files will be downloaded to the *R Working Directory*. Once we've executed the function, we can check the files we've downloaded to ensure they work with an MP4 player.
38 |
39 |
40 |
41 | Next, open the selected file with a video player.
42 |
43 |
44 |
45 | Here is what the selected video looks like in Quick Time.
46 |
47 |
48 |
49 | Ideas for enhancing this code on your own:
50 |
51 | 1. Add code to downloadLectures() to zip / tar the files once they've all been downloaded.
52 | 2. Add code to distinguish one week's lectures from another, enabling the function to download all the lectures for a course in a single function call.
53 |
--------------------------------------------------------------------------------
/markdown/rprog-githubDesktopSync.md:
--------------------------------------------------------------------------------
1 | # Using Github Desktop for Assignment 2
2 |
3 | This walkthrough is intended to illustrate how to save and post code for *Programming Assignment 2* from a local machine running RStudio to a forked copy of Professor Peng's [ProgrammingAssignment2](https://github.com/rdpeng/ProgrammingAssignment2) github repository.
4 |
5 | ## Prerequisites
6 |
7 | * R and RStudio installed
8 | * Github Desktop installed
9 | * Professor Peng's repository forked and accessible from your Github account
10 | * Local copy cloned from your remote Github repository and accessible in Github Desktop
11 |
12 | ## Step 1: Edit the cachematrix.R file
13 |
14 | First, open the `cachematrix.R` file in RStudio, and edit to comply with the requirements of the programming assignment. Here is my version after adding a couple of comment lines prior to the `list()` function call at the end of `makeCacheMatrix()`.
15 |
16 |
17 |
18 | Since I have configured RStudio / Git integration, I can now see that the file is seen as modified by Git within the Git tab of the *Environment Pane* in RStudio.
19 |
20 |
21 |
22 | ## Step 2: View the edited file in Github Desktop
23 |
24 | Open Github Desktop, access the ProgrammingAssignment2 repository, and you'll now see that `cachematrix.R` is noted as modified, as illustrated by the yellow icon next to the file name.
25 |
26 |
27 |
28 | To see details of the changes in a manner similar to what we did in RStudio, click on the `cachematrix.R` file name.
29 |
30 |
31 |
32 |
33 | ## Step 3: Commit changes
34 |
35 | Each commit requires a text summary and description. Enter this information in the text entry boxes in the lower left corner of the Github Desktop window.
36 |
37 | Then press the `` button in the lower left corner of the Github Desktop Window to commit these changes to the local repository.
38 |
39 |
40 |
41 | Once the changes have been committed to the local repository, you'll see the window change, and your commit will be listed as "committed just now."
42 |
43 |
44 |
45 | ## Step 4: Push the changes to Github
46 |
47 | To push the updated file to the remote Github repository, press the `` button in the top navigation bar of Github Desktop.
48 |
49 |
50 |
51 | Once the push completes, the button will now say ``, meaning that there are no local changes to be pushed to the remote.
52 |
53 |
54 |
55 | ## Step 5: Confirm changes are visible on Github
56 |
57 | Now you can navigate to your remote version of the *ProgrammingAssignment2* repository on Github and verify that the change has been made on the remote repository.
58 |
59 | First, we confirm that our commit information is present in the `last commit` area on the repository home page.
60 |
61 |
62 |
63 | Second, we can click on the `cachematrix.R` file to view it, confirming that the comments we added locally are now present in the remote copy.
64 |
65 |
66 |
67 | ## Step 6: Find and Copy the SHA-1 hash code
68 |
69 | Finally, since students must post the SHA-1 hash code along with the URL for the Github repository for the assignment, view it by clicking on the first few characters of the hash code in the upper right corner of the file viewer window.
70 |
71 |
72 |
73 | The complete version of the hash code is displayed on the commit details page. Copy the entire hash code so you can paste it into the appropriate text entry area in the project assignment submission page.
74 |
75 |
76 |
--------------------------------------------------------------------------------
/markdown/rprog-gradeSHA1hash.md:
--------------------------------------------------------------------------------
1 | # R Programming Assignment 2: Grading the SHA-1 Hash Code
2 | According to the grading rubric, award 2 points if cachematrix.R has a newer commit date than the original from the forked repository, and the SHA-1 hash corresponds to an actual commit of an updated README.md in the student's repository. To check the differences between the latest commit and the prior version, take the following steps.
3 |
4 | 1) From the main page of the forked repository, click on the README.md file
5 |
6 |
7 |
8 | 2) Next, select the History button on the README.md page
9 |
10 |
11 |
12 | 3) From the Commit History page, select the commit that corresponds with the SHA-1 code for which you want to compare the latest commit against the prior version
13 |
14 |
15 |
16 | 4) Finally, the next page will display the DIFF between the last two versions. If the DIFF output shows that the student changed the file in an observable way, it qualifies as a valid commit.
17 |
18 |
19 |
20 | Using the same procedure outlined above, you can also check whether there is a separate commit for cachematrix.R beyond the original fork, and diff those versions of the file.
21 |
--------------------------------------------------------------------------------
/markdown/rprog-overwritingRFunctions.md:
--------------------------------------------------------------------------------
1 | ## Common R Mistakes: Overwriting R Functions with Data Objects
2 |
3 | Beginning R students often run into a problem where a particular R function appears to have stopped working in R or RStudio. This usually happens because the student has saved the result of an R function call to an object that has the same name as an R function.
4 |
5 | The following quote from John Chambers is appropriate here:
6 |
7 | > Everything that exists is an object. Everything that happens is a function call.
8 | >
9 | > John Chambers, quoted in _Advanced R_, p. 79.
10 |
11 | In the context of this particular topic, if one executes the following statements:
12 |
13 | > str(mean)
14 | function (x, ...)
15 | > x <- c(1,2,3,4,5)
16 | > mean <- mean(x)
17 | > str(mean)
18 | num 3
19 | >
20 |
21 | the base package `mean()` function is now hidden by an object called mean in the current environment, as evidenced by the change in output from the `str()` function. There are two ways to access a function that has been hidden by overriding it.
22 |
23 | 1. Eliminate the local environment's version of the object with `rm()`, or
24 | 2. Use the package name to specify the exact function to be called, such as `base::mean()`
25 |
26 |
27 | > base::mean(x)
28 | [1] 3
29 |
30 |
31 | After using the `rm()` function, one can see that the `base::mean()` function is accessible again without having to include its package name.
32 |
33 | > rm('mean')
34 | > mean
35 | function (x, ...)
36 | UseMethod("mean")
37 |
38 |
39 | >
40 |
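A compact session, shown here as a sketch, ties these pieces together. One subtlety worth knowing: even while `mean` is masked by a numeric object, a call like `mean(x)` still works, because R skips non-function objects when resolving a name used in function-call position.

```r
x <- c(1, 2, 3, 4, 5)
mean <- mean(x)    # 'mean' now also names a numeric object in the global environment
str(mean)          # num 3 -- the data object, not the function
mean(x)            # still returns 3: R skips non-functions when a name is called
base::mean(x)      # the namespace-qualified call always reaches the function
rm(mean)           # removes the masking object
str(mean)          # the base function is visible again
```

Even so, relying on this lookup rule is fragile -- `str(mean)` and `sapply(mean, ...)`-style uses will pick up the data object, which is exactly the confusion described above.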
41 | **Bottom Line:** These types of problems can be very difficult to debug in large R programs, so avoid using an R function name as an object name in your code unless you have a specific reason for overwriting the function.
42 |
--------------------------------------------------------------------------------
/markdown/rprog-rScopingVsC.md:
--------------------------------------------------------------------------------
1 | In the February 2017 run of *R Programming* a student asked a question about how lexical scoping in R compares to variable scoping in C/C++.
2 |
3 | ## Scoping in C/C++
4 |
5 | C/C++ make a distinction between [local and global variables](http://bit.ly/2kEqjNJ). As noted in the student's original post, variables declared in C functions are local to the function. In addition, one can declare global variables in C, which are accessible in `main()` as well as any other function.
6 |
7 | Also note that C/C++ include the concept of pointers, which are references to memory locations accessible within a C/C++ program. Pointers complicate the topic of variable scope within C/C++ considerably. It is common to pass a pointer into a function and have the function manipulate the object referenced by the pointer. This behavior is similar to what happens with the `<<-` assignment operator, where R assigns a value to an object in the parent environment of the current environment.
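A minimal sketch of the analogy (the function and variable names are illustrative): the inner `add()` updates `total` in its enclosing environment via `<<-`, much as a C function updates a variable through a pointer passed to it.

```r
accumulate <- function() {
  total <- 0
  add <- function(x) {
    # like writing through a pointer: modifies 'total' in the enclosing environment
    total <<- total + x
  }
  add(5)
  add(3)
  total   # returns 8
}
accumulate()
```

A plain `<-` inside `add()` would instead create a new local `total`, leaving the enclosing one untouched -- the key difference from C's pass-by-pointer semantics.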
8 |
9 | ## Scoping in R
10 |
11 | In R, every object is tied to an environment as I describe in [R Objects, S Objects, and Lexical Scoping](http://bit.ly/2dtOSXi). The concept of environments automatically introduces more levels of scoping than C/C++, because environments are hierarchical, and each function creates its own environment.
12 |
13 | R programs can access objects in other environments by using the appropriate function calls. R also provides capabilities to `get()` and `assign()` content to objects in specific environments. These features are described in Hadley Wickham's [Advanced R section on Environments](http://bit.ly/2lcJagr), but I'll leave the detailed coverage of these details to Hadley.
14 |
15 | When an R function is loaded into memory, it creates an environment into which any objects created within the function are stored. If the function returns a reference to itself (as [makeVector()](http://bit.ly/2bTXXfq) does), then the environment for that function remains in memory until R ends or the function is removed from memory with `rm()`.
16 |
--------------------------------------------------------------------------------
/markdown/rprog-rVsPython.md:
--------------------------------------------------------------------------------
1 | # R versus Python
2 |
3 | One of the common questions asked by students in the Johns Hopkins University Data Science Specialization on Coursera is, "Why does the specialization use R instead of Python?"
4 |
5 | Rather than providing my own opinion on this topic, this article provides a list of references where this topic is debated / presented.
6 |
7 | 1. [DataCamp Infographic](http://bit.ly/2ldNC1H) a nice contrast of the two languages, written a few years ago
8 | 2. [KDNuggets 2016 Survey](http://bit.ly/2b827ey) June 2016 survey results, where R usage slightly leads that of Python
9 | 3. [Domino Labs: R vs. Python](http://bit.ly/2ldZrVC) Domino Labs' Chief Scientist, Eduardo Ariño de la Rubia, makes an argument for each language as best for data science.
10 | 4. [Elite Data Science Summary of DominoLabs Video](http://bit.ly/2kNePHQ)
11 | 5. [R Swallows Python](http://bit.ly/2kNjrxO) 2015 Infoworld article comparing R and Python
12 | 6. [Should you teach R or Python for data science?](http://bit.ly/2kqSvjP) DataSchool article comparing R and Python
13 |
--------------------------------------------------------------------------------
/markdown/rprog-sortFunctionsExample.md:
--------------------------------------------------------------------------------
1 | Some students have trouble with the third programming assignment in *R Programming* because they struggle with sorting the data in the correct order. Here is an example of how to use two different R functions to sort the Motor Trend Cars data set from the `datasets` package.
2 |
3 | > library(datasets)
4 | > head(mtcars)
5 | mpg cyl disp hp drat wt qsec vs am gear carb
6 | Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
7 | Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
8 | Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
9 | Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
10 | Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
11 | Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
12 | > library(plyr)
13 | > #
14 | > # use plyr::arrange() to sort the data frame by cyl and mpg
15 | > #
16 | >
17 | > arrangedData <- arrange(mtcars,cyl,mpg)
18 | > head(arrangedData)
19 | mpg cyl disp hp drat wt qsec vs am gear carb
20 | 1 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
21 | 2 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22 | 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
23 | 4 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
24 | 5 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
25 | 6 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
26 | >
27 |
28 | As you can see from the output, the data is sorted by `cyl` and then `mpg`. One of the productivity aids from `plyr` is the ability to reference columns in the data frame directly, without the need for the extract operator `$`, as in `mtcars$mpg`. Also note that because the car names were included as row names instead of a regular column, they were not copied to the `arrangedData` data frame.
29 |
30 | Note that the `dplyr` package also provides an `arrange()` function.
31 |
32 | Here is sample code that does the same thing using the `base::order()` function.
33 |
34 | > #
35 | > # use order() to do the same thing, sort mtcars by cyl and mpg
36 | > #
37 | > orderedData <- mtcars[order(mtcars$cyl,mtcars$mpg),]
38 | > head(orderedData)
39 | mpg cyl disp hp drat wt qsec vs am gear carb
40 | Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
41 | Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
42 | Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
43 | Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
44 | Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
45 | Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
46 | >
47 |
48 | Notice that the `order()` function must be used within the row dimension of the `dataFrame[rows,columns]` syntax, instead of a simple function call that operates on a data frame like the `arrange()` function.
49 |
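As a side note, `order()` also makes mixed sort directions easy; negating a numeric column sorts it in descending order. A brief sketch:

```r
# sort by cyl ascending, then mpg descending within each cyl value
mixedSort <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]
head(mixedSort, 3)
```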
50 | To include the car names in the output from `plyr::arrange()` and make the results consistent between the two functions, we need to copy the row names into a vector, column bind that vector to the data frame, and then set the row names to NULL.
51 |
52 | > library(datasets)
53 | > library(plyr)
54 | > #
55 | > # move rownames to a column, because plyr::arrange() does not work with rownames
56 | > #
57 | > carName <- rownames(mtcars)
58 | > mtcars <- cbind(carName,mtcars)
59 | > rownames(mtcars) <- NULL
60 | > head(mtcars)
61 | carName mpg cyl disp hp drat wt qsec vs am gear carb
62 | 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
63 | 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
64 | 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
65 | 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
66 | 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
67 | 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
68 |
--------------------------------------------------------------------------------
/markdown/rprogAssignment2Prototype.md:
--------------------------------------------------------------------------------
1 | # Programming Assignment 2: makeCacheMatrix() as an Object
2 |
3 | In R, the `list` data type is the basis for the S3 object system, one of two "object" systems in R. The second system is the S4 object system, as referenced in [Forms of the Extract Operator](http://bit.ly/2bzLYTL). Programming Assignment 2 is based on the S3 object system.
4 |
5 | When an object is defined as an S3 object, it includes not only the functions in its list, but also any variables in the environment of the function where the list was created. This is what makes it an "object": it contains both behavior (the functions) and state (the variables in the environment).
6 |
7 | In this context, the functions in the list are the equivalent of methods in a Java class, and the matrix that was originally passed to `makeCacheMatrix()` is still available within the environment of the object to which the list was returned.
8 |
9 | A subtlety about the S3 model that isn't explained in the _R Programming_ lectures is that S3 objects rely on a "trick" that makes them work. When an R function returns an object that contains functions to its parent environment (as is the case with a call like `myMatrix <- makeCacheMatrix(a)`), not only does `myMatrix` have access to the specific functions in its list, but it also retains access to the entire environment defined by `makeCacheMatrix()`, including the original argument used to start the function.
10 |
11 | Why is this the case? `myMatrix` contains pointers to functions that are within the `makeCacheMatrix()` environment after the function ends, so these pointers prevent the memory consumed by `makeCacheMatrix()` from being released by the garbage collector. Therefore, the entire `makeCacheMatrix()` environment stays in memory, and `myMatrix` can access its functions as well as any data in that environment that is referenced in its functions.
12 |
13 | This is why `x` (the argument initialized on the original function call) is accessible by subsequent calls to functions on `myMatrix` such as `myMatrix$get()`, and it also explains why the code works without having to explicitly issue `myMatrix$set()` to set the value of `x`.
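The mechanism can be demonstrated with a small self-contained closure, analogous to `makeVector()` (the names here are illustrative, not part of the assignment):

```r
makeCounter <- function(start = 0) {
  count <- start                 # state: lives in makeCounter()'s environment
  increment <- function() {
    count <<- count + 1
    count
  }
  get <- function() count
  # behavior: the returned list is the "object"
  list(increment = increment, get = get)
}

counter <- makeCounter(10)
counter$increment()
counter$get()   # 11 -- the environment survives after makeCounter() returns
```

Because `counter` holds references to functions defined inside `makeCounter()`, that environment -- including `count` -- cannot be garbage collected, which is exactly the "trick" described above.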
14 |
15 | Reference: _Software for Data Analysis,_ Kindle Edition, location 1683\.
16 |
17 | To illustrate this point with the `makeVector()` function that is used as the reference example for *Assignment 2*, notice that the function declaration along with the first line of code provide the same functionality as the `set()` function.
18 |
19 | makeVector <- function(x) {
20 | m <- NULL
21 | set <- function(y) {
22 | x <<- y
23 | m <<- NULL
24 | }
25 | ...
26 | }
27 |
28 |
29 | If R were a more strongly typed language, the function stub in Professor Peng's repository might look like:
30 |
31 | cacheSolve <- function(makeCacheMatrix x, ...) {
32 |
33 | # return the inverse of x, or calculate & return if cache is empty
34 | }
35 |
36 | This type of specification would make it obvious that `cacheSolve()` requires as its input the type of object that is output by `makeCacheMatrix()`.
37 |
38 | For additional discussion of the R features used in the programming assignment, please review the article [Demystifying MakeVector](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-breakingDownMakeVector.md).
39 |
--------------------------------------------------------------------------------
/markdown/statinf-accessingRCodeFromKnitrAppendix.md:
--------------------------------------------------------------------------------
1 | # ToothGrowth Assignment: Accessing R Code from a Report Appendix
2 |
3 | Students often struggle to fit within the three page limit for the ToothGrowth portion of the *Statistical Inference* course project.
4 |
5 | The instructions and grading criteria for part 2 do not explicitly require inclusion of the code, but it's usually a good idea to include the code so reviewers can see the steps that were taken to generate the content in the report. One technique for fitting into the page limit restriction for the course project while simultaneously including the code is to move it to an Appendix, and use `ref.label` within the report to execute chunks of code as they are needed to support the analysis text.
6 |
7 | One can suppress printing of code in the report by setting `echo=FALSE` on a knitr code chunk. Since we're allowed to include code in an Appendix of up to 3 pages, I suggest that you include the code in the Appendix as follows.
8 |
9 | I typically organize my Rmd reports in the following manner:
10 |
11 | * Write all of the R code in an Appendix section, break it up into components, and give each component a name so I can execute it by reference during the report.
12 | * Where I need to run a section of the R code during the report, add code to call it by reference and use `echo=FALSE` to prevent the code from printing.
13 | * Add a `\newpage` LaTeX command right before the Appendix section to ensure the Appendix starts at the top of a new page.
14 | * Optionally, allow the code to print in the Appendix section.
15 |
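The bullets above can be sketched as a minimal Rmd skeleton; the chunk name `descriptives` and the `summary()` call are illustrative placeholders.

In the body of the report, execute the named chunk without echoing it:

    ```{r ref.label="descriptives", echo=FALSE}
    ```

Right before the Appendix heading:

    \newpage

In the Appendix, the named chunk that holds the code, displayed but not re-run:

    ```{r descriptives, echo=TRUE, eval=FALSE}
    summary(ToothGrowth)
    ```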
16 | Here is how this looks in practice, based on my course project from *Regression Models*, with content redacted to comply with the Coursera Honor Code.
17 |
18 | First, here is what the Appendix looks like:
19 |
20 |
21 |
22 | Second, here is what the start of the report looks like, where I use `ref.label` to execute named sections of R code that are located in the Appendix:
23 |
24 |
25 |
26 | Third, here is what the output in the beginning of the report looks like:
27 |
28 |
29 |
30 | Notice that 1) there is no echoing of the code between the Executive Summary and the Exploratory Data Analysis headings, even though I executed code to generate descriptive statistics ahead of the EDA section, and 2) I've retrieved a mean calculated from an R procedure and displayed it in the report.
31 |
32 | Finally, here is what the appendix looks like, where the code is displayed (`echo=TRUE`) but not evaluated (`eval=FALSE`).
33 |
34 |
35 |
36 | In summary, the techniques above provide a straightforward way to simultaneously manage the size of a report, include output from appropriate R functions in the report, and display code in an Appendix with or without executing it.
37 |
--------------------------------------------------------------------------------
/markdown/statinf-areaOfPointOnNormalCurve.md:
--------------------------------------------------------------------------------
1 | # Calculating Area for a Point on the Normal Curve
2 |
3 | A student in the July 2017 run of the Johns Hopkins University *Statistical Inference* course asked how one could calculate the probability for a specific point on a normal curve, given Professor Caffo's statement that we assign probability values to ranges of continuous variables. The key question here is "Why can't we calculate the probability of a specific value for a continuous variable when we can calculate the probability of a specific value for a discrete variable?"
4 |
5 | [Integral calculus](https://en.wikipedia.org/wiki/Integral) provides the mathematics to calculate the area under a curve. An integral is essentially the reverse of a [derivative](https://en.wikipedia.org/wiki/Derivative), and the [fundamental theorem of calculus](https://en.wikipedia.org/wiki/Fundamental_theorem_of_calculus) explains how the two relate for a continuous real function.
6 |
7 | > If *f* is a continuous real-valued function defined on a closed interval *[a, b]*, then, once an antiderivative *F* of *f* is known, the definite integral of *f* over that interval is given by *F(b) - F(a)*.
8 |
9 |
10 |
11 |
12 | From the above definition it logically follows that the definite integral of *f* from *a* to *a* is *F(a) - F(a) = 0*. Therefore, the area under any specific point on the normal curve is 0. However, one can use the mean and standard deviation of a distribution along with a specific value to associate it with a [quantile](https://en.wikipedia.org/wiki/Quantile), as Professor Caffo defined on slide 21 of the Probability lecture.
13 |
14 | A quantile calculates the area from the bottom of the curve to a specific point, allowing us to use the definite integral technique described above. We can also calculate the area for a small slice of the normal curve surrounding the exact value in which we are interested.
15 |
16 | We'll illustrate this technique with the details supporting the question asked by the student on the *Statistical Inference* discussion forum. Let's look at the heights of males in the United States. What would we do if we wanted to calculate the probability that an individual's height is 1.7 meters?
17 |
18 | We can answer this question with data from the United States Centers for Disease Control's summary of the [National Health and Nutrition Index Survey - Anthropometric Reference Data for Children and Adults 2011 - 2014](http://bit.ly/2wa3d4E).
19 |
20 |
21 |
22 | We will use the table that includes average height (in centimeters) and percentiles for varying age categories of adult males. We can calculate the standard deviation for height by manipulating the formula for a Z score at the 5th percentile to solve for the sample standard deviation *s*.
23 |
24 | For the purpose of this exercise, we will assume that height of males in the U.S. is normally distributed. Given the assumption of a normal distribution we can calculate where a height of 170 centimeters (1.7 meters) is on a normal curve (the Z distribution) as follows.
25 |
26 |
27 |
28 | The resulting probability value (i.e., the proportion of the normal curve) means that approximately 22% of people are 170 cm or shorter, and 78% are taller than 170 cm. In other words, a height of 170 cm is at the 22nd percentile of the height distribution if height is distributed normally.
29 |
30 | The closest we can get to an exact probability for a height of 170 cm is to use the [Riemann integral](https://en.wikipedia.org/wiki/Riemann_integral) technique to calculate the area under the curve between 169.95 and 170.05 cm, which is about 0.004.
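Both numbers are easy to check with `pnorm()`. The mean and standard deviation below are illustrative stand-ins for the values derived from the CDC table:

```r
avgHeight <- 175.7   # assumed mean male height in cm (illustrative)
s <- 7.5             # assumed standard deviation in cm (illustrative)

# area below 170 cm: roughly 0.22, i.e. about the 22nd percentile
pnorm(170, mean = avgHeight, sd = s)

# Riemann-style thin slice around 170 cm: roughly 0.004
pnorm(170.05, avgHeight, s) - pnorm(169.95, avgHeight, s)
```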
31 |
32 |
33 |
--------------------------------------------------------------------------------
/markdown/statinf-expDistChecklist.md:
--------------------------------------------------------------------------------
1 | Working through the Data Science Specialization, students in numerous courses have lost credit on project assignments due to missing key points in the instructions. Here is a checklist for the Exponential Distribution / Central Limit Theorem analysis that can be used to ensure key requirements from the assignment are covered in your project submission.
2 |
3 |
4 |
--------------------------------------------------------------------------------
/markdown/statinf-optimalSampleSize.md:
--------------------------------------------------------------------------------
1 | # Power Calculations: what is an "optimal" sample size?
2 |
3 | Generally speaking, "optimal" sample size is the minimum sample size needed to distinguish an effect of a specific size between test and control groups at a specific alpha level. That is, if a researcher designs an experiment with a test and control group and can distinguish a substantively meaningful effect with a sample size of 60, why should s/he spend money on a sample size of 100?
4 |
5 | Remember, effect size for a difference of means test is calculated as *(mu_a - mu_0) / sigma*, where *mu_a* represents the mean of the test group, *mu_0* represents the mean of the control group, and *sigma* is the pooled standard deviation.
6 |
7 | What does "substantively meaningful" mean? Well, if one is testing the difference in a test advertisement and a control advertisement on the probability of a consumer purchasing a product, "substantively meaningful" is when the effect is large enough that it makes sense to deploy the test advertisement based on the amount of additional revenue that will be generated by using the test ad versus the control ad.
8 |
9 | With a large enough *N*, any comparison of means or proportions can be statistically significant, so it's important to ensure that a statistically significant result is also substantively meaningful.
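Base R's `power.t.test()` makes the trade-off concrete. A sketch (the effect size of 0.5 is illustrative): it returns the minimum *n* per group needed to detect that effect at alpha = 0.05 with 80% power.

```r
# minimum n per group for an effect size of 0.5 at alpha = .05 and power = .80
result <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
ceiling(result$n)   # about 64 per group
```

Given this result, sampling 100 per group would cost more without changing the substantive conclusion -- the point made above about "optimal" sample size.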
10 |
--------------------------------------------------------------------------------
/markdown/statinf-permutationTests.md:
--------------------------------------------------------------------------------
1 | # Background on Permutation Tests
2 |
3 | This article is an additional explanation of the content in the Johns Hopkins University Data Science Specialization course on *Statistical Inference*, specifically the lecture 13 content on [group comparisons and permutation tests](https://github.com/bcaffo/courses/blob/master/06_StatisticalInference/13_Resampling/index.pdf) starting on slide 17.
4 |
5 | If we have a variable *x* and two groups *A* and *B*, the permutation test evaluates whether they come from the same underlying distribution. If our test statistic is a difference of means, we test the hypothesis:
6 |
7 |
8 |
9 | The last line of code on slide 20 is the probability value associated with the null hypothesis. If this is less than alpha (the rejection region), we reject the null hypothesis that the two groups come from the same distribution.
10 |
11 | For example, let's build some sample data containing two groups of normally distributed values with means of 2,500 and run the permutation test.
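The following sketch, modeled on the lecture's approach with illustrative names, builds two groups with the same mean and runs the permutation test:

```r
set.seed(1234)
dat <- data.frame(value = rnorm(40, mean = 2500, sd = 100),
                  group = rep(c("A", "B"), each = 20))

# difference-of-means test statistic
testStat <- function(v, g) mean(v[g == "A"]) - mean(v[g == "B"])
observed <- testStat(dat$value, dat$group)

# permute the group labels many times to build the null distribution
perms <- sapply(1:10000, function(i) testStat(dat$value, sample(dat$group)))

mean(abs(perms) > abs(observed))   # two-sided permutation p-value
```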
12 |
13 |
14 |
15 | Notice that the probability value is *> 0.05*. Therefore, we fail to reject the null hypothesis at alpha = 0.05, and have no evidence that the two groups come from different underlying distributions.
16 |
17 | Now, let's change the data so that the second group has a mean of 50 instead of 2,500 and rerun the test.
18 |
19 |
20 |
21 | Here we reject the null hypothesis that the two groups come from the same underlying distribution.
22 |
23 | For more information on permutation tests, take a look at the [Wikipedia page for Resampling](https://en.wikipedia.org/wiki/Resampling_(statistics)).
24 |
--------------------------------------------------------------------------------
/markdown/statinf-poissonInterval.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "poissonInterval"
3 | author: "Len Greski"
4 | date: "2/25/2017"
5 | output: html_document
6 | ---
7 |
8 | $X \sim Poisson(\lambda t)$ means that X is distributed as Poisson with an arrival rate of $\lambda$ over a time period of $t$.
9 |
10 | Estimate $\hat \lambda = X/t$ means that we estimate $\lambda$ with our sample $X$ and $t$, because $\lambda$ is the mean of a variable that conforms to a Poisson distribution. That is, if $X$ events occur over a time period $t$, the average number of arrivals per unit of time is $X/t$, our estimate of $\lambda$.
11 |
12 | $Var(\hat \lambda) = \lambda / t$ means that the variance of $\hat \lambda$, or variance of the sampling distribution of the mean is $\lambda / t$.
13 |
14 | Therefore, the variance estimate is $\hat \lambda / t$, where the time period $t$ plays the role of $n$ in the variance of a sampling distribution of means of a continuous variable.
--------------------------------------------------------------------------------
/markdown/statinf-poissonInterval.md:
--------------------------------------------------------------------------------
1 | # Poisson Confidence Interval Explained
2 |
3 | In February 2017 a student in the Johns Hopkins University *Statistical Inference* course requested a more verbose explanation of the content on calculating a confidence interval for the mean of a Poisson random variable. Specifically he wanted more detail regarding the content on slides 26 - 27 of the *Asymptopia* lecture.
4 |
5 | ## Explaining the Formulas
6 |
7 |
8 |
9 | ## Key difference: how wide is the interval?
10 |
11 | Here is the R code from slide 27 of the lecture.
12 |
13 | > x <- 5
14 | > t <- 94.32
15 | > lambda <- x / t
16 | >
17 | > # manually calculated confidence interval
18 | > round(lambda + c(-1,1) * qnorm(.975) * sqrt(lambda/t),3)
19 | [1] 0.007 0.099
20 | >
21 | > # Poisson test version of confidence interval
22 | > poisson.test(x,t)$conf.int
23 | [1] 0.01721254 0.12371005
24 | attr(,"conf.level")
25 | [1] 0.95
26 | >
27 |
28 | Notice that the manually calculated interval is more aggressive (narrower) than the `poisson.test()` version.
29 |
30 | > # manual
31 | > manualInterval <- round(lambda + c(-1,1) * qnorm(.975) * sqrt(lambda/t),5)
32 | > manualInterval[2] - manualInterval[1]
33 | [1] 0.09293
34 | > # poisson test
35 | > poissonInterval <- poisson.test(x,t)$conf.int
36 | > poissonInterval[2] - poissonInterval[1]
37 | [1] 0.1064975
38 | >
39 |
40 | ## Other Forms of Poisson Intervals
41 |
42 | In July 2017, another student asked a question about a formula that was based on the square root of lambda, asking why this could be accurate given that the formula for the standard deviation of a Poisson distribution is sqrt(lambda).
43 |
44 | The literature on confidence intervals for Poisson means is confusing because there are many different ways to calculate a confidence interval for a Poisson mean, including:
45 |
46 | * intervals based on event counts,
47 | * intervals based on a rate, and
48 | * intervals based on an average of several counts.
49 |
50 | When the event count is large, many of these formulas use an approximation to the normal distribution in which the variance equals the mean of the event count (so the standard deviation is sqrt(lambda)), leading to a formula that looks like
51 |
52 | lambda +/- Z * sqrt(lambda)
53 |
54 | The clearest summary of these approaches that I found during my research on this question is from an article on [pmean.com](http://bit.ly/2tzbIbe).
55 |
56 | ## Additional Information
57 |
58 | For additional background on the Poisson distribution and calculating confidence intervals for it, see also [Introduction to the Poisson Distribution](http://bit.ly/2kJH86C), [Computing Confidence Interval for Poisson Mean](http://bit.ly/2lVyPdj), and [Comparison of Confidence Levels for the Poisson Mean: Some New Aspects](http://bit.ly/2lhIZlg) by Patil and Kulkarni.
59 |
--------------------------------------------------------------------------------
/markdown/statinf-varOfBinomialDistribution.md:
--------------------------------------------------------------------------------
1 | # Variance of Binomial Distribution
2 |
3 | In the August run of the Johns Hopkins Data Science Specialization *Statistical Inference* course, a student asked why the variance for a toss of a coin with a probability of heads (1) equal to *p* was *p(1 - p)* instead of *np(1 - p)*, referencing the Wikipedia article for the [Binomial Distribution](http://bit.ly/2vU528j).
4 |
5 | In the lecture, Professor Caffo uses the [moment statistics](http://bit.ly/2iTE6zW) formula for the second central moment to calculate the variance, assuming a single coin flip.
6 |
7 |
8 |
9 | In the case where we have multiple coin flips, each flip is independent, and all flips have the same probability of a TRUE / 1 result. In this situation, we have a random variable *X* with parameters *n* and *p* that represents the sum of *n* independent variables *Z*, each of which can take the value 0 or 1. Therefore, the variance of the sum of the *n* flips is equal to *np(1 - p)*.
10 |
11 | In the case of a single coin flip for a binomial distribution, the binomial distribution variance is:
12 |
13 | Var(X) = n * p(1 - p) = 1 * .5(1 - .5) = .25
14 |
15 | This matches the result of the calculation using the second central moment formula.
16 |
17 | If we flip a fair coin 8 times, the variance of the binomial distribution of flips is:
18 |
19 | Var(X) = 8 * p(1 - p) = 8 * .5 * .5 = 2
20 |
21 | That is, since the events *Z* are independent, the variance of the total of *n* events is equal to the sum of the individual event variances.
22 |
23 | The conclusion is that the variance of a binomial distribution of *n* flips increases as the number of flips increases, because the count of 1s is the sum of *n* independent events, each of which contributes *p(1 - p)* to the total variance.
24 |
25 |
26 | This makes sense intuitively because if *n = 10*, the count of 1s can vary between 0 and 10 with *E[X] = 5*, but if *n = 100* the count can vary between 0 and 100 with *E[X] = 50*.
27 |
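The relationship above can be checked with a quick simulation; the seed and number of replications below are arbitrary choices for illustration:

```r
# Simulation check of Var(X) = n * p(1 - p) for n flips of a fair coin.
set.seed(95014)
n <- 8
p <- 0.5
x <- rbinom(100000, size = n, prob = p)  # 100,000 experiments of 8 flips each
var(x)                                   # close to the theoretical value
n * p * (1 - p)                          # theoretical variance: 2
```
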
--------------------------------------------------------------------------------
/markdown/statinf-varianceOfExpDist.md:
--------------------------------------------------------------------------------
1 | # Theoretical Variance: Sampling Distribution of the Mean
2 |
3 | A number of students in Johns Hopkins University Data Science Specialization *Statistical Inference* course have asked questions about the theoretical variance for the distribution of sample means taken from an exponential distribution.
4 |
5 | Within the assignment instructions we are told to use a value of lambda = .2 for the simulations we generate as we validate the Central Limit Theorem.
6 |
7 | Given the assigned value of lambda, we derive the value of the theoretical variance as follows:
8 |
9 | Var(X-bar) = sigma^2 / n = (1 / lambda)^2 / n = (1 / .2)^2 / n = 25 / n
10 |
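As a minimal sketch in R, this works out as follows; note that the sample size n = 40 is an assumption here (the value used in the course assignment, not stated in this article):

```r
# Theoretical variance of the distribution of sample means drawn from an
# exponential distribution with rate lambda = 0.2. The sample size n = 40
# is an assumption (the value used in the course assignment).
lambda <- 0.2
n <- 40
theoreticalVar <- (1 / lambda)^2 / n     # sigma^2 / n
theoreticalVar                           # 0.625
```
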
--------------------------------------------------------------------------------
/markdown/toolbox-RStudioOnChromebook.md:
--------------------------------------------------------------------------------
1 | # R and RStudio on a Chromebook?
2 |
3 | In the *Data Science Specialization*, students sometimes ask whether the coursework can be completed on a Chromebook. While it is possible to access R and RStudio on a Chromebook, it's important to understand that the computing requirements become more intensive for certain courses, including:
4 |
5 |
6 | | Course | Computing Considerations |
7 | | ------ | ------------------------ |
8 | | Capstone | Requires significant processing power and at least 16GB of RAM to process (without sampling) the 4 million rows of text content used to build a text prediction algorithm. |
9 | | Practical Machine Learning | Due to the processor-intensive nature of the machine learning algorithms covered in the course, the more speed and memory a machine has, the more choices a student can make for the final project. That said, I have completed the models on a tablet with 2GB of RAM and an Intel Atom-based processor, but a single model took over an hour to process. |
10 | | Reproducible Research | A large volume of messy data from our friends at the National Oceanic and Atmospheric Administration leads to a significant data cleaning task. Since R processes everything in memory, more compute power makes this course easier to complete. That said, I have been able to process this assignment on a Chromebook. |
11 |
12 | ## Making it Work
13 |
14 | I originally wrote this article in November 2016. Since then I have successfully [installed R and RStudio on a Chromebook](http://bit.ly/2tHLVOo) using the Crouton tool. It's not a trivial amount of work, but it is possible to complete many of the assignments required for the Johns Hopkins *Data Science Specialization* on a Chromebook.
15 |
16 | ## Additional Resources
17 |
18 | 1. [Rollapp](https://www.rollapp.com/app/rstudio): Provides access to R / RStudio, including read-only file access to a variety of cloud-based storage services
19 | 2. [Linux on Chromebook / Crouton](https://gigaom.com/2014/12/30/chromebooks-can-now-run-linux-in-a-chrome-os-window/): Gigaom article including a video that explains how to install Linux on a Chromebook. Once Linux is installed, students can then install a Linux version of R and RStudio.
20 | 3. [Aurio on Amazon Cloud](https://aws.amazon.com/marketplace/pp/B00VETUL8M?qid=1479050009114&sr=0-3&ref_=srh_res_product_title): The Aurio application in the Amazon Web Services Marketplace is another potential way to access R / RStudio through a Chromebook.
21 |
22 | *Last Updated: 7 October 2017*
23 |
--------------------------------------------------------------------------------
/markdown/usingMarkdownInForumPosts.md:
--------------------------------------------------------------------------------
1 | # Using Editor Modes in Discussion Forum Posts
2 |
3 | As you progress through the Data Science Specialization, you will begin to notice forum posts that include a number of special features, including inline code references (e.g. `str(anObject)`), embedded pictures, and code blocks.
4 |
5 | How does one use these features within a Discussion Forum post?
6 |
7 | From the **New Thread** Window, a pulldown menu allows you to switch between three different ways to format a forum post, including:
8 |
9 | 1. HTML: write a forum post using HTML tags to format the content. Useful if you already know HTML, or want to use specific HTML features that aren't easily supported in the other modes.
10 | 2. Markdown: write a forum post using [Markdown](https://daringfireball.net/projects/markdown/syntax), a lightweight markup language that converts to HTML. Markdown is a key skill in the *Data Science Specialization*, and we begin using Markdown as part of the first project in *The Data Scientist's Toolbox*.
11 | 3. Rich Text: write a forum post using rich text, which supports basic formatting through a limited menu bar of features, as well as the ability to embed images in a post.
12 |
13 | ### Figure 1: Multiple Edit modes
14 |
15 |
16 |
17 | To write a high quality post, sometimes we need to use multiple modes. When starting a new forum thread, the default mode is *Markdown*. This means that we can write the post using Markdown syntax to format the post.
18 |
19 | The key feature from Markdown that is not available in Rich Text format is `inline code`. To highlight text in this way, use the backtick \` key at the start and end of the text to be formatted, such as \`here is my inline code\`, which will be rendered as `here is my inline code` when it displays in the post.
20 |
21 | To switch from Markdown to Rich Text, select the pulldown menu above the content entry box, and select **Rich Text**.
22 |
23 | When Rich Text is the edit mode, you can then use the icons at the top of the content entry area to format content.
24 |
25 | One of the most useful features in the Rich Text mode is the code block format. This format makes it easy to include a block of code in a forum post, and renders it in the standard format of a fixed-width font on a gray background.
26 |
27 | To add code in a code block to a forum post, place your cursor where you want to insert the code block, and select the code block icon above the content entry area, which will insert the formatting for a code block.
28 |
29 | ### Figure 2: Adding a Rich Text Code Block
30 |
31 |
32 |
33 | Once the code block is visible, you can manually type the code, or paste the code into the formatted area if it was previously written in R or RStudio.
34 |
35 | ### Figure 3: Pasting Code into a Code Block
36 |
37 |
38 |
39 | By using these features, you can make the code within your forum posts easier to read. Additional documentation on Markdown is available on the [Daring Fireball](https://daringfireball.net/projects/markdown/) website maintained by Markdown's developer, [John Gruber](https://en.wikipedia.org/wiki/John_Gruber).
40 |
--------------------------------------------------------------------------------
/pml-elemStatLearnAccess.R:
--------------------------------------------------------------------------------
1 | #
2 | # access data files from Elementary Statistical Learning book
3 | # see https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-ElemStatLearnPackage.md
4 |
5 |
6 | # technique 1: download from ESL website
7 | library(readr)
8 | vowel.train <- read_csv("https://web.stanford.edu/~hastie/ElemStatLearn/datasets/vowel.train")
9 | vowel.test <- read_csv("https://web.stanford.edu/~hastie/ElemStatLearn/datasets/vowel.test")
10 |
11 |
12 |
13 | # technique 2 - download from CRAN github site
14 | theURL <- "https://raw.githubusercontent.com/CRAN/ElemStatLearn/master/data/SAheart.RData"
15 | if (!dir.exists("./data")) dir.create("./data")  # ensure the target directory exists
16 | download.file(theURL, "./data/SAheart.RData", mode = "wb")
17 | load("./data/SAheart.RData")
18 | head(SAheart)
19 |
20 | # technique 3 - download and install package from CRAN archive
21 | #
22 | # first, navigate browser to archive for the desired package
23 | # http://cran.r-project.org/src/contrib/Archive/ElemStatLearn
24 |
25 | # next, we download and unzip the file
26 |
27 | url <- "http://cran.r-project.org/src/contrib/Archive/ElemStatLearn/ElemStatLearn_2015.6.26.tar.gz"
28 | pkgFile <- "ElemStatLearn_2015.6.26.tar.gz"
29 | download.file(url = url, destfile = pkgFile)
30 |
31 | # Next, we install dependencies. Since there are no dependencies for ElemStatLearn, we skip this step.
32 |
33 | # finally, we install the package with type = "source" and no repository
34 | install.packages(pkgs = pkgFile, type = "source", repos = NULL)
35 | library(ElemStatLearn)
36 | head(bone)
37 |
38 | # if desired, we can also delete the package tarball
39 | unlink(pkgFile)
--------------------------------------------------------------------------------
/pml-exampleSonarRandomForest.R:
--------------------------------------------------------------------------------
1 | #
2 | # Sonar example from caret documentation
3 | #
4 |
5 | library(mlbench)
6 | library(randomForest) # needed for varImpPlot
7 | data(Sonar)
8 | #
9 | # review distribution of Class column
10 | #
11 | table(Sonar$Class)
12 | library(caret)
13 | set.seed(95014)
14 |
15 | # create training & testing data sets
16 |
17 | inTraining <- createDataPartition(Sonar$Class, p = .75, list=FALSE)
18 | training <- Sonar[inTraining,]
19 | testing <- Sonar[-inTraining,]
20 |
21 | #
22 | # Step 1: configure parallel processing
23 | #
24 |
25 | library(parallel)
26 | library(doParallel)
27 | cluster <- makeCluster(detectCores() - 1) # convention to leave 1 core for OS
28 | registerDoParallel(cluster)
29 |
30 | #
31 | # Step 2: configure trainControl() object for k-fold cross validation with
32 | # 10 folds
33 | #
34 |
35 | fitControl <- trainControl(method = "cv",
36 | number = 10,
37 | allowParallel = TRUE)
38 |
39 | #
40 | # Step 3: develop training model
41 | #
42 |
43 | system.time(fit <- train(Class ~ ., method = "rf", data = training, trControl = fitControl))
44 |
45 | #
46 | # Step 4: de-register cluster
47 | #
48 | stopCluster(cluster)
49 | registerDoSEQ()
50 | #
51 | # Step 5: evaluate model fit
52 | #
53 | fit
54 | fit$resample
55 | confusionMatrix.train(fit)
56 | # average OOB error from final model
57 | mean(fit$finalModel$err.rate[,"OOB"])
58 |
59 | plot(fit,main="Accuracy by Predictor Count")
60 | varImpPlot(fit$finalModel,
61 | main="Variable Importance Plot: Random Forest")
62 |
63 | #
64 | # Step 6: acquire the in sample estimate of error, i.e. the model's OOB (out-of-bag)
65 | # estimate of error rate, for comparison in a subsequent step with the out of sample estimate of error
66 | #
67 | fit$finalModel # see "OOB estimate of error rate: ##.#%" value in output
68 | # look for column name for oob [out-of-bag] estimate of error rate
69 | mer <- fit$finalModel$err.rate; dimnames(mer)
70 | # compare average of $err.rate[, "OOB"] with model output oob estimate of error rate
71 | mean(fit$finalModel$err.rate[, "OOB"]) * 100
72 | # compare median of $err.rate[, "OOB"] with model output oob estimate of error rate
73 | median(fit$finalModel$err.rate[, "OOB"]) * 100
74 | # found mean of $err.rate[,"OOB"] > median of $err.rate[,"OOB"] > fit$finalModel output OOB value
75 | inSampleError <- median(fit$finalModel$err.rate[, "OOB"]) * 100
76 |
77 | #
78 | # Step 7. Calculate out of sample estimate of error rate and compare with in sample estimate of error rate
79 | #
80 | pred <- predict(fit, newdata = testing)
81 | confmat <- confusionMatrix(pred, testing$Class)
82 | confmat$table; confmat$overall[["Accuracy"]]
83 | predAccuracy <- confmat$overall[["Accuracy"]]
84 | outOfSampleError <- (1 - predAccuracy) * 100
85 | outOfSampleError; inSampleError
86 | # Here we expect the out of sample estimate of error rate to be worse than the in sample
87 | # estimate of error rate. This confirms that accuracy should be better on the training data,
88 | # where the model has seen the outcome values, than on the held-out testing data, where in
89 | # theory the outcome values are unknown.
90 | sessionInfo()
91 |
--------------------------------------------------------------------------------
/pml-modelAccuracyCalcs.R:
--------------------------------------------------------------------------------
1 | #
2 | # model accuracy calculations
3 | # compare Bayes Theorem vs Šidák's correction of multiple tests
4 | #
5 | # reference: https://github.com/lgreski/datasciencectacontent/issues/7
6 | #
7 | options(digits=9)
8 | mdlAccuracy <- c(.8,.85,.9,.95,.99,.995,.996,.997,.9974,0.9974386,0.99743863,0.99743865,.9975)
9 | predAccuracy <- mdlAccuracy^20
10 | data.frame(mdlAccuracy,predAccuracy)
11 |
12 | # alternate approach: Šidák's correction of multiple tests
13 | # generate 95% confidence for 20 tests
14 | (1 - .05)^(1/20)
--------------------------------------------------------------------------------
/resources/2010LSIexcerpt.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/resources/2010LSIexcerpt.pdf
--------------------------------------------------------------------------------
/resources/STATNews Critique of IHME COVID-19 model.PDF:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/resources/STATNews Critique of IHME COVID-19 model.PDF
--------------------------------------------------------------------------------
/resources/noaaSSTCodebook.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgreski/datasciencectacontent/88c1701aad61be85a503b33a5add6e129a0837e6/resources/noaaSSTCodebook.xlsx
--------------------------------------------------------------------------------
/resources/testMakeCacheMatrix.R:
--------------------------------------------------------------------------------
1 | #
2 | # testMakeCacheMatrix.R
3 | #
4 | # test script for makeCacheMatrix.R
5 | #
6 |
7 | # approach 1: create a matrix object, then use it as input to cacheSolve()
8 |
9 | a <- makeCacheMatrix(matrix(c(-1, -2, 1, 1), 2,2))
10 | cacheSolve(a)
11 |
12 | # call cacheSolve(a) a second time to trigger the "getting cached inverse" message
13 | cacheSolve(a)
14 |
15 | # multiply the matrix by inverse, resulting in identity matrix
16 | a$get() %*% a$getsolve()
17 |
18 | # reset a with another matrix to clear out cached value
19 | a$set(matrix(c(2,3,2,2),2,2))
20 |
21 | # confirm that a has new data and that cache is NULL
22 | a$get()
23 | a$getsolve()
24 |
25 | # rerun cache solve, note that "getting cached inverse" does not print,
26 | # and that we get a different result
27 | cacheSolve(a)
28 |
29 | # approach 2: use makeCacheMatrix() as the input argument to cacheSolve()
30 | # note that the argument to cacheSolve() is a different object
31 | # than the argument to the first call of cacheSolve()
32 | cacheSolve(makeCacheMatrix(matrix(c(-1, -2, 1, 1), 2,2)))
33 |
34 | # try a non-invertible matrix
35 | b <- makeCacheMatrix(matrix(c(0,0,0,0),2,2))
36 | cacheSolve(b)
37 |
38 | # illustrate getting the memory locations
39 | a <- makeCacheMatrix(matrix(c(-1, -2, 1, 1), 2,2))
40 | tracemem(a)
41 | tracemem(matrix(c(-1, -2, 1, 1), 2,2))
42 |
53 | # test non-matrix input: should return "not a matrix" error
54 |
55 | a$set(1:5)
56 | cacheSolve(a)
--------------------------------------------------------------------------------
/rprog-extractOperator.R:
--------------------------------------------------------------------------------
1 | #
2 | # R code for Forms of the Extract Operator article
3 | #
4 |
5 | # stored in datasciencectacontent
6 | # note: method = "wininet" is Windows-specific; omit it on macOS / Linux
7 | download.file("https://github.com/lgreski/PokemonData/raw/master/pokemonData.zip", "pokemonData.zip",
8 |               mode = "wb", method = "wininet")
9 |
10 | unzip("pokemonData.zip")
11 |
12 | thePokemonFiles <- list.files("./pokemonData",
13 | full.names=TRUE)[1:2]
14 | thePokemonFiles
15 |
16 | pokemonData <- lapply(thePokemonFiles,function(x) read.csv(x)[["Attack"]])
17 |
18 | # show the list of vectors
19 | summary(pokemonData)
20 |
21 | attackStats <- unlist(pokemonData)
22 | hist(attackStats,
23 | main="Pokemon Attack Stats: Gen 1 & 2")
--------------------------------------------------------------------------------
/statinf-integralCalculations.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "statinf-integralCalculations"
3 | author: "Len Greski"
4 | date: "8/5/2017"
5 | output: html_document
6 | ---
7 |
8 | ```{r setup, include=FALSE}
9 | knitr::opts_chunk$set(echo = TRUE)
10 | ```
11 |
12 | ## Definite Integral at a Single Point
13 |
14 | $\int_{a}^{a} f(x) dx = 0$
15 |
16 | ## Standard Deviation from Census Table Data
17 |
18 | $s = \frac{x_{5 ptile}-\bar{x}}{Z_{5 ptile}}$.
--------------------------------------------------------------------------------
/statinf-varOfBinomialDistribution.Rmd:
--------------------------------------------------------------------------------
1 | # Variance of Binomial Distribution
2 |
3 | In the August run of *Statistical Inference*, a student asked a question about why the variance for a toss of a coin with a probability of heads (1) equal to *p* was *p(1 - p)* instead of *n p(1 - p)*, referencing the Wikipedia article for the [Binomial Distribution](http://bit.ly/2vU528j).
4 |
5 | In the lecture, Professor Caffo uses the [moment statistics](http://bit.ly/2iTE6zW) formula for the second central moment to calculate the variance, assuming a single coin flip.
6 |
7 | $p = 0.5$
8 |
9 | $Var(X) =E[X^2] - E[X]^2 = p - p^2$
10 |
11 | $p - p^2 = p(1 - p)$.
12 |
13 | $p(1 - p) = 1 * .5(1 - .5) = .25$.
14 |
15 | In the case where we have multiple coin flips, each flip is independent, and all flips have the same probability of a TRUE / 1 result. In this situation, we have a random variable *X* with parameters *n* and *p* that represents the sum of *n* independent variables *Z*, each of which can take the value 0 or 1. Therefore, the variance of *X*, the count of 1s across the *n* flips, is equal to *n p(1 - p)*.
16 |
17 | In the case of a single coin flip for a binomial distribution, the binomial distribution variance is:
18 |
19 | $Var(X) = n * p(1-p) = 1 * .5(1 - .5) = .25$.
20 |
21 | If we flip a fair coin 8 times, the variance of the binomial distribution of flips is:
22 |
23 | $8 * p(1 - p) = 8 * .5 * .5 = 2$.
24 |
25 | The conclusion is that the variance of a binomial distribution of *n* flips increases as the number of flips increases, because the count of 1s is the sum of *n* independent events, each of which contributes *p(1 - p)* to the total variance.
26 |
27 | This makes sense intuitively because if *n = 10*, the count of 1s can vary between 0 and 10 with *E[X] = 5*, but if *n = 100* the count can vary between 0 and 100 with *E[X] = 50*.
28 |
32 |
--------------------------------------------------------------------------------