├── .gitignore ├── 00_whyR ├── 00_whyR.Rmd ├── 00_whyR.log ├── 00_whyR.pdf ├── exercises │ ├── 1_stats.R │ ├── 2_visual.R │ ├── 3_eda.R │ ├── 4_reproducible.Rmd │ └── covnat.rda ├── header.tex └── images │ ├── Coding-Lab.jpg │ ├── Knowledge-Scaffolding.jpg │ ├── R4art.png │ ├── RStudio-Screenshot.png │ ├── R_inventor.png │ ├── R_logo.png │ ├── R_vs_SPSS.jpg │ ├── arrival-movie.png │ ├── data-science-explore.png │ ├── data_science.png │ ├── hadley-wickham.jpg │ ├── night_king.jpg │ ├── r4ds-cover.png │ ├── rstudio-editor.png │ ├── social_science.jpg │ ├── tidyverse.png │ ├── tiobe-index.png │ ├── typesetting.png │ └── what_is_R.png ├── 01_install ├── 01_install.Rmd ├── 01_install.log ├── 01_install.pdf ├── header.tex └── images │ ├── QQgroup_PsyStats.png │ ├── QQgroup_chenglong.png │ ├── QQgroup_shizishan.png │ ├── RStudio-Screenshot.png │ ├── Rhelp.png │ ├── Rinstall.png │ ├── Rstudio_install.png │ ├── dashboard.jpg │ ├── engine.jpg │ ├── mirror1.png │ ├── mirror2.png │ ├── rstudio-editor1.png │ └── run_script.png ├── 02_basicR ├── 02_basicR.Rmd ├── 02_basicR.log ├── 02_basicR.pdf ├── header.tex └── images │ ├── Rhelp.png │ ├── data_struction1.png │ ├── data_type.png │ ├── rstudio-editor.png │ ├── script1.png │ └── script2.png ├── 03_subset ├── 03_subset.Rmd ├── 03_subset.pdf ├── header.tex └── images │ ├── R_box.png │ └── data_struction1.png ├── 04_Rmarkdown ├── 04_Rmarkdown.Rmd ├── 04_Rmarkdown.log ├── 04_Rmarkdown.pdf ├── header.tex └── images │ ├── R_logo.png │ ├── rmarkdown.png │ └── rstudio-markdown.png ├── 05_dplyr ├── 05_dplyr.Rmd ├── 05_dplyr.log ├── 05_dplyr.pdf ├── demo_data │ ├── olympics.xlsx │ └── wages.csv ├── header.tex └── images │ ├── import_datatype01.png │ ├── pipe1.png │ ├── pipe2.png │ └── tidyverse.png ├── 06_ggplot2 ├── 06_ggplot2.Rmd ├── 06_ggplot2.log ├── 06_ggplot2.pdf ├── demo_data │ └── temp_carbon.csv ├── header.tex └── images │ ├── Paradox1.pdf │ ├── Paradox2.pdf │ ├── Paradox3.pdf │ ├── a-14.png │ ├── a-20.png │ ├── a-21.png │ ├── 
a-3.png │ ├── cholera_a.pdf │ ├── cholera_b.pdf │ ├── cholera_c.pdf │ ├── ggplot_template.png │ ├── mapping.png │ └── tidyverse.png ├── 09_stringr ├── 09_stringr.Rmd ├── 09_stringr.pdf ├── header.tex └── images │ ├── hex-stringr.png │ └── regex_repeat.jpg ├── 15_eda02 ├── 15_reproducible.Rmd ├── 15_reproducible.pdf ├── demo_data │ └── penguins.csv ├── header.tex └── images │ ├── 01.png │ ├── 4_3.png │ ├── culmen_depth.png │ ├── lter_penguins.png │ └── penguins.png ├── R4DS_slides.Rproj ├── README.md └── data_science.jpg /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /00_whyR/00_whyR.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "R能给我们生活带来什么?" 3 | author: "王敏杰" 4 | institute: "四川师范大学" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "50%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | ## R能给我们生活带来什么? 35 | 36 | 这个问题,好比人生三大终极问题: 37 | 38 | - R是什么? 39 | - R能干什么? 40 | - 为什么是R? 
41 | 42 | # R是什么 43 | 44 | ## R那些事 45 | 46 | - 1992年,新西兰奥克兰大学统计学教授 Ross Ihaka 和 Robert Gentleman,为了方便地给学生教授统计学课程,他们设计开发了R语言(他们名字的首字母都是R)。 47 | 48 | ```{r echo=FALSE, out.width = '0.8\\textwidth'} 49 | knitr::include_graphics(path = "images/R_inventor.png") 50 | ``` 51 | 52 | - 2000年,R1.0.0 发布 53 | - 2004年,第一届国际useR!会议(随后每年举办一次) 54 | - 2005年,ggplot2宏包(2018.8 - 2019.8下载量超过 1.3 亿次) 55 | - 2012年,R2.15.2 发布 56 | - 2013年,R3.0.2 发布, CRAN上的宏包数量5026个 57 | - 2016年,Rstudio公司推出 tidyverse 宏包(数据科学当前最流行的R宏包) 58 | - 2017年,R3.4.1 发布,CRAN上的宏包数量10875个 59 | - 2019年,R3.6.1 发布,CRAN上的宏包数量15102个 60 | - 2020年,R4.0.0 发布,CRAN上的宏包数量16054个 61 | 62 | [The History of R](https://blog.revolutionanalytics.com/2020/07/the-history-of-r-updated-for-2020.html) 63 | 64 | ## R是什么 65 | 66 | 官网定义: 67 | 68 | ```{r eval=FALSE, include=FALSE} 69 | knitr::include_graphics("images/what_is_R.png") 70 | ``` 71 | 72 | R语言是用于统计分析,图形表示和报告的编程语言: 73 | 74 | - R 是一个\textcolor{red}{统计编程}语言(statistical programming) 75 | - R 可运行于多种平台之上,包括Windows、UNIX 和 Mac OS X 76 | - R 拥有顶尖水准的\textcolor{red}{制图}功能 77 | - R 是免费的 78 | - R 应用广泛,拥有丰富的\textcolor{red}{库包} 79 | - 活跃的\textcolor{red}{社区} 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | ## R语言发展趋势 89 | 90 | ```{r echo=FALSE, out.width = '100%'} 91 | knitr::include_graphics("images/tiobe-index.png") 92 | ``` 93 | 94 | [TIOBE index](https://www.tiobe.com/tiobe-index/) 95 | 96 | ## 界面很友好 97 | 98 | ```{r out.width = '85%', echo = FALSE} 99 | knitr::include_graphics("images/rstudio-editor.png") 100 | ``` 101 | 102 | ## R路上的大神 103 | 104 | 2019 年 8 月,国际统计学年会将考普斯总统奖(\textcolor{red}{被誉为统计学的诺贝尔奖})奖颁给 tidyverse 的作者 105 | 106 | ```{r echo=FALSE, out.width = '50%'} 107 | knitr::include_graphics("images/hadley-wickham.jpg") 108 | ``` 109 | 110 | - [Hadley Wickham](http://hadley.nz/) 111 | - R路上的大神 112 | - 一个改变了R语言的人 113 | 114 | # R能干什么 115 | 116 | ## 数据科学的流程 117 | 118 | Hadley Wickham 定义了数据科学的工作流程 119 | 120 | ```{r echo=FALSE, out.width = '\\textwidth'} 121 | knitr::include_graphics(path = 
"images/data-science-explore.png") 122 | ``` 123 | 124 | 125 | ## tidyverse套餐 126 | 127 | ```{r out.width = '80%', echo = FALSE} 128 | knitr::include_graphics("images/tidyverse.png") 129 | ``` 130 | \centering{https://www.tidyverse.org/} 131 | 132 | 133 | 134 | ## R & tidyverse 135 | 136 | | 序号 | 内容 | 代码演示 | 137 | |------ |-------------- |------------------ | 138 | | 1 | 统计 | 1_stats.R | 139 | | 2 | 可视化 | 2_visual.R | 140 | | 3 | 探索性分析 | 3_eda.R | 141 | | 4 | 可重复性报告 | 4_reproducible.Rmd | 142 | 143 | 144 | ## 难吗? 145 | \Huge 146 | \centering{ 感觉很难吗? \\ 如果是,那说明你认真听了} 147 | 148 | 149 | 150 | ## 看了这些代码,可能第一眼感觉是这样的 151 | ```{r echo=FALSE, out.width = '100%', fig.cap='图片来自电影《降临》'} 152 | knitr::include_graphics("images/arrival-movie.png") 153 | ``` 154 | 155 | 156 | 157 | 158 | ## 但我更希望这门课结束后 159 | ```{r echo=FALSE, out.width = '100%', fig.cap='图片来自美剧《权力的游戏》'} 160 | knitr::include_graphics("images/night_king.jpg") 161 | ``` 162 | 163 | 164 | 165 | # 为什么是R 166 | 167 | 168 | ## 社会科学需要统计 169 | 170 | ```{r echo=FALSE, out.width = '60%'} 171 | knitr::include_graphics("images/social_science.jpg") 172 | ``` 173 | 174 | \centering{我们不是学统计的,但需要统计} 175 | 176 | 177 | 178 | 179 | 180 | ## 社会科学需要可视化 181 | 182 | ```{r echo=FALSE, out.width = '50%'} 183 | knitr::include_graphics("images/R4art.png") 184 | ``` 185 | 186 | 187 | \centering{我们不是学美术的,但要可视化} 188 | 189 | 190 | 191 | ## 社会科学需要编程 192 | 193 | ```{r echo=FALSE, out.width = '80%'} 194 | knitr::include_graphics("images/Coding-Lab.jpg") 195 | ``` 196 | 197 | \centering{我们不是学计算机的,但需要编程} 198 | 199 | 200 | 201 | 202 | ## 你的论文需要排版 203 | 204 | ```{r echo=FALSE, out.width = '60%'} 205 | knitr::include_graphics("images/typesetting.png") 206 | ``` 207 | 208 | \centering{我们不是学设计的,但要操心\textcolor{red}{交叉引用}的事} 209 | 210 | 211 | 212 | 213 | 214 | ## 挖掘机技术到底哪家强? 
215 | 216 | 217 | 218 | 219 | 220 | \centering 221 | 你有需求,而 222 | \raisebox{-.5\height}{\includegraphics[height=3\baselineskip]{images/R_logo.png}} 223 | 很专业 224 | 225 | 226 | 227 | | 序号 | 内容 | 特性 | 评价 | 228 | |------ |--------------- |---------- |------ | 229 | | 1 | 统计分析 | 看家本领 | 好用 | 230 | | 2 | ggplot2画图 | 颜值担当 | 好看 | 231 | | 3 | tidyverse语法 | 简单易懂 | 好学 | 232 | | 4 | 可重复性报告 | 方便快捷 | 好玩 | 233 | 234 | 235 | 236 | 237 | 238 | ## 一见钟情,还是相见恨晚? 239 | 240 | ```{r echo=FALSE, out.width = '100%'} 241 | knitr::include_graphics("images/R_vs_SPSS.jpg") 242 | ``` 243 | 244 | 245 | 246 | 247 | 248 | # 关于学习 249 | 250 | ## 我们的课程不会枯燥 251 | 252 | ```{r echo=FALSE, out.width = '45%'} 253 | knitr::include_graphics("images/data_science.png") 254 | ``` 255 | 256 | - 数据科学是为社会科学服务的,我们会有很多案例 257 | - 编程是工具,统计是灵魂,专业是核心 258 | 259 | 260 | 261 | ## 关于学习 262 | 263 | 我很少使用 264 | 265 | $$ 266 | f(x)=\frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^{2}} 267 | $$ 268 | 269 | 更多的是 270 | 271 | ```{r, eval = FALSE} 272 | library(tidyverse) 273 | summary_monthly_temp <- weather %>% 274 | group_by(month) %>% 275 | summarize(mean = mean(temp), 276 | std_dev = sd(temp)) 277 | ``` 278 | 279 | ## 关于学习 280 | 281 | ### 课程目标 282 | 283 | - 训练数据思维,提升编程技能,培养创新能力 284 | 285 | ### 学习方法 286 | 287 | - **问题驱动型学习** 288 | - 多实践(光看李小龙的电影,是学不会功夫的) 289 | - 不是 learning R,而是 learning with R 290 | - 把 R 看做是知识学习的**脚手架** 291 | 292 | ```{r echo=FALSE, out.width = '35%'} 293 | knitr::include_graphics("images/Knowledge-Scaffolding.jpg") 294 | ``` 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | 307 | 308 | 309 | 310 | 311 | 312 | 313 | ## 参考书目 314 | 315 | ```{r echo=FALSE, out.width = '35%'} 316 | knitr::include_graphics("images/r4ds-cover.png") 317 | ``` 318 | 319 | - [R for Data Science](https://r4ds.had.co.nz/) 320 | - [https://bookdown.org/wangminjie/R4DS/](https://bookdown.org/wangminjie/R4DS/) 321 | -------------------------------------------------------------------------------- /00_whyR/00_whyR.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/00_whyR.pdf -------------------------------------------------------------------------------- /00_whyR/exercises/1_stats.R: -------------------------------------------------------------------------------- 1 | # Numeric Functions 2 | 1 + 5 3 | 1:100 4 | abs(-3.14) 5 | sqrt(3.14) 6 | floor(3.14) 7 | round(3.14) 8 | cos(3.14) 9 | log(3.14) 10 | exp(3.14) 11 | 12 | seq(1, 10, 2) 13 | rep(1:3, 2) 14 | 15 | 16 | 17 | 18 | 19 | # Character Functions 20 | substr("abcdef", 2, 4) 21 | grep("a", c("alice", "bob", "claro")) 22 | strsplit("a.b.c", "\\.") 23 | toupper("Alice") 24 | tolower("Alice") 25 | 26 | 27 | 28 | 29 | # Statistical Functions 30 | x <- 1:10 31 | sum(x) 32 | min(x) 33 | mean(x) 34 | sd(x) 35 | var(x) 36 | median(x) 37 | quantile(x, probs = 0.75) 38 | range(x) 39 | scale(x, center = TRUE, scale = TRUE) 40 | 41 | 42 | 43 | # Statistical Probability Functions 44 | rnorm(20, mean = 0, sd = 1) 45 | dnorm(0.5, mean = 0, sd = 1) 46 | rpois(100, lambda = 10) 47 | dpois(2, lambda = 10) 48 | 49 | 50 | 51 | 52 | 53 | # Regression Modeling 54 | lm(mpg ~ wt, data = mtcars) 55 | aov(mpg ~ wt, data = mtcars) 56 | t.test(extra ~ group, data = sleep) 57 | 58 | 59 | -------------------------------------------------------------------------------- /00_whyR/exercises/2_visual.R: -------------------------------------------------------------------------------- 1 | library(ggplot2) 2 | 3 | ggplot(midwest, aes(x = area, y = poptotal)) + 4 | geom_point(aes(color = state, size = popdensity)) + 5 | geom_smooth(method = "loess", se = F) + 6 | xlim(c(0, 0.1)) + 7 | ylim(c(0, 500000)) + 8 | labs( 9 | subtitle = "Area Vs Population", 10 | y = "Population", 11 | x = "Area", 12 | title = "Scatterplot", 13 | caption = "Source: midwest" 14 | ) 15 | 
-------------------------------------------------------------------------------- /00_whyR/exercises/3_eda.R: -------------------------------------------------------------------------------- 1 | library(tidyverse) 2 | 3 | # 案例一:飓风数据集 4 | 5 | storms %>% count(year) 6 | 7 | storms %>% 8 | group_by(year) %>% 9 | summarize( 10 | wind_mean = mean(wind), 11 | wind_sd = sd(wind) 12 | ) 13 | 14 | 15 | 16 | 17 | 18 | # 案例二:VC剂量和喂食方法对豚鼠牙齿的影响? 19 | # 双因素方差分析 (ANOVA) 20 | 21 | my_data <- ToothGrowth %>% 22 | mutate( 23 | across(c(supp, dose), as_factor) 24 | ) 25 | 26 | 27 | my_data %>% 28 | ggplot(aes(x = supp, y = len, fill = supp)) + 29 | geom_boxplot(position = position_dodge()) + 30 | facet_wrap(vars(dose)) 31 | 32 | 33 | 34 | aov(len ~ supp + dose, data = my_data) 35 | 36 | 37 | aov(len ~ supp + dose, data = my_data) %>% 38 | TukeyHSD(which = "dose") %>% 39 | broom::tidy() 40 | 41 | 42 | 43 | 44 | 45 | 46 | -------------------------------------------------------------------------------- /00_whyR/exercises/4_reproducible.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "这是一份关于新冠肺炎的探索性分析报告" 3 | author: "王小二" 4 | date: "`r Sys.Date()`" 5 | output: 6 | pdf_document: 7 | latex_engine: xelatex 8 | extra_dependencies: 9 | ctex: UTF8 10 | number_sections: yes 11 | #toc: yes 12 | df_print: kable 13 | classoptions: "hyperref, 12pt, a4paper" 14 | --- 15 | 16 | 17 | ```{r setup, include=FALSE} 18 | knitr::opts_chunk$set(echo = TRUE, 19 | message = FALSE, 20 | warning = FALSE, 21 | fig.align = "center" 22 | ) 23 | ``` 24 | 25 | 26 | 27 | # 引言 28 | 29 | 新型冠状病毒疫情在多国蔓延,一些国家的病例确诊数量明显增多,各国防疫力度继续加强。本章通过分析疫情数据,了解疫情发展,祝愿人类早日会战胜病毒! 
30 | 31 | 32 | # 导入数据 33 | 34 | 首先,我们加载需要的宏包 tidyverse 用于数据探索;数据来自 covdata 宏包,这里直接加载已保存的 covnat.rda 35 | 36 | ```{r} 37 | # Load libraries 38 | library(tidyverse) 39 | #library(covdata) 40 | load("covnat.rda") 41 | ``` 42 | 43 | 44 | 本报告的数据来源于 45 | [https://kjhealy.github.io/covdata/](https://kjhealy.github.io/covdata/),我们选取部分数据看看 46 | 47 | 48 | ```{r, echo = FALSE} 49 | covnat %>% 50 | tail(8) 51 | ``` 52 | 53 | 54 | 55 | # 数据变量 56 | 57 | 这个数据集包含8个变量,具体含义如下: 58 | 59 | | 变量 | 含义 | 60 | |----------- |-------------------- | 61 | | date | 日期 | 62 | | cname | 国家名 | 63 | | iso3 | 国家编码 | 64 | | cases | 确诊病例 | 65 | | deaths | 死亡病例 | 66 | | pop | 2019年国家人口数量 | 67 | | cu_cases | 累积确诊病例 | 68 | | cu_deaths | 累积死亡病例 | 69 | 70 | # 数据探索 71 | 72 | 找出累积确诊病例最多的几个国家 73 | 74 | ```{r} 75 | covnat %>% 76 | ungroup() %>% 77 | filter(date == max(date)) %>% 78 | slice_max(cu_cases, n = 8) 79 | ``` 80 | 81 | 82 | # 可视化 83 | 84 | 为了更好地呈现数据,我们将筛选出美国确诊病例数据,并可视化 85 | 86 | ```{r, fig.showtext = TRUE} 87 | covnat %>% 88 | filter(iso3 == "USA") %>% 89 | filter(cu_cases > 0) %>% 90 | ungroup() %>% 91 | 92 | ggplot(aes(x = date, y = cases)) + 93 | geom_path() + 94 | scale_x_date(name = NULL, breaks = "month") + 95 | labs(title = "美国新冠肺炎确诊病例", 96 | subtitle = "数据来源:https://kjhealy.github.io/covdata/") 97 | ``` 98 | 99 | 100 | -------------------------------------------------------------------------------- /00_whyR/exercises/covnat.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/exercises/covnat.rda -------------------------------------------------------------------------------- /00_whyR/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | 
\usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /00_whyR/images/Coding-Lab.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/Coding-Lab.jpg -------------------------------------------------------------------------------- /00_whyR/images/Knowledge-Scaffolding.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/Knowledge-Scaffolding.jpg -------------------------------------------------------------------------------- /00_whyR/images/R4art.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R4art.png -------------------------------------------------------------------------------- /00_whyR/images/RStudio-Screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/RStudio-Screenshot.png -------------------------------------------------------------------------------- /00_whyR/images/R_inventor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R_inventor.png 
-------------------------------------------------------------------------------- /00_whyR/images/R_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R_logo.png -------------------------------------------------------------------------------- /00_whyR/images/R_vs_SPSS.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R_vs_SPSS.jpg -------------------------------------------------------------------------------- /00_whyR/images/arrival-movie.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/arrival-movie.png -------------------------------------------------------------------------------- /00_whyR/images/data-science-explore.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/data-science-explore.png -------------------------------------------------------------------------------- /00_whyR/images/data_science.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/data_science.png -------------------------------------------------------------------------------- /00_whyR/images/hadley-wickham.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/hadley-wickham.jpg 
-------------------------------------------------------------------------------- /00_whyR/images/night_king.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/night_king.jpg -------------------------------------------------------------------------------- /00_whyR/images/r4ds-cover.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/r4ds-cover.png -------------------------------------------------------------------------------- /00_whyR/images/rstudio-editor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/rstudio-editor.png -------------------------------------------------------------------------------- /00_whyR/images/social_science.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/social_science.jpg -------------------------------------------------------------------------------- /00_whyR/images/tidyverse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/tidyverse.png -------------------------------------------------------------------------------- /00_whyR/images/tiobe-index.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/tiobe-index.png 
-------------------------------------------------------------------------------- /00_whyR/images/typesetting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/typesetting.png -------------------------------------------------------------------------------- /00_whyR/images/what_is_R.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/what_is_R.png -------------------------------------------------------------------------------- /01_install/01_install.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "第一章:配置R语言环境" 3 | author: "王敏杰" 4 | institute: "四川师范大学" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "50%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | # 配置R语言环境 35 | 36 | ## 准备工作 37 | 38 | - \textcolor{red}{第一步}:连网(无线)办法 39 | - 用户名:学号+ @sicnu,比如 `20150956@sicnu` 40 | - 密码:出生年月日 + 身份证最后一位(如果最后一位为X,要大写),比如 `19880923X` 41 | 42 | - \textcolor{red}{第二步}:加QQ群 43 | 44 | ```{r echo=FALSE, out.width = '25%'} 45 | #knitr::include_graphics(path = "images/QQgroup_PsyStats.png") 46 | knitr::include_graphics(path = "images/QQgroup_shizishan.png") 47 | knitr::include_graphics(path = "images/QQgroup_chenglong.png") 
48 | ``` 49 | 50 | - \textcolor{red}{第三步}:在QQ群文件里下载(R-4.0.2-win.exe, RStudio-1.3.1091.exe),点击安装 51 | 52 | 53 | ## 环境配置 54 | 55 | 主要分三步: 56 | 57 | - 安装R 58 | - 安装RStudio 59 | - 安装必要的宏包(packages) 60 | 61 | ## 第一步安装R 62 | 63 | - 下载并安装R,官方网站 64 | 65 | ```{r echo=FALSE, out.width = '85%'} 66 | knitr::include_graphics("images/Rinstall.png") 67 | ``` 68 | 69 | ## 第二步安装RStudio 70 | 71 | - 下载并安装RStudio,官方网站 72 | - 选择`RStudio Desktop` 73 | 74 | ```{r out.width = '85%', echo = FALSE} 75 | knitr::include_graphics("images/Rstudio_install.png") 76 | ``` 77 | 78 | ## 注意事项 79 | 80 | 这里有个小小的提示: 81 | 82 | - 电脑用户名\textcolor{red}{不要有中文和空格} 83 | 84 | - 尽量安装在\textcolor{red}{非系统盘},比如,可以选择安装在D盘 85 | 86 | - 安装路径\textcolor{red}{不要有中文和空格}。比如,这样就比较好 87 | 88 | - `D:/R` 89 | - `D:/Rstudio` 90 | 91 | 92 | 93 | ## R 与 RStudio 是什么关系呢 94 | 95 | \qquad \qquad \qquad R \hspace{4cm} RStudio 96 | 97 | ```{r, fig.show="hold", out.width="49%", echo = FALSE} 98 | knitr::include_graphics(c("images/engine.jpg", "images/dashboard.jpg")) 99 | ``` 100 | 101 | \centering{R 是有趣的灵魂, RStudio 是好看的皮囊} 102 | 103 | 104 | ## RStudio很友好 105 | 106 | 从Windows开始菜单,点开RStudio,界面效果 107 | 108 | ```{r out.width = '75%', echo = FALSE} 109 | knitr::include_graphics("images/rstudio-editor1.png") 110 | ``` 111 | 112 | 113 | 114 | ## 第三步安装宏包 115 | 116 | ```{r out.width = '60%', echo = FALSE} 117 | knitr::include_graphics("images/RStudio-Screenshot.png") 118 | ``` 119 | 120 | - 命令行安装 121 | 122 | - `install.packages("tidyverse")` 123 | - 回车,安静等待 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | ## 如果宏包安装速度太慢, 方法一 143 | 144 | - 指定清华大学镜像 145 | 146 | ```{r, eval = FALSE} 147 | install.packages( 148 | "tidyverse", 149 | repos = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/" 150 | ) 151 | ``` 152 | 153 | 154 | - 或者,指定兰州大学镜像 155 | 156 | ```{r, eval = FALSE} 157 | install.packages( 158 | "tidyverse", 159 | repos = "https://mirror.lzu.edu.cn/CRAN/" 160 | ) 161 | ``` 162 | 163 | 164 | 
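A minimal sketch of a related approach, assuming you would rather set the mirror once per session than pass `repos` on every call: `options(repos = ...)` changes the default repository that `install.packages()` consults. The helper names `use_mirror` and `missing_packages` are made up for this illustration, and the URL is the Tsinghua mirror from the slide above, written here with `https` and a trailing slash (an assumption about the mirror's current address).

```r
# Hypothetical helper: make `url` the default CRAN mirror for this session.
use_mirror <- function(url) {
  options(repos = c(CRAN = url))
  invisible(getOption("repos"))
}

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed mirror address; swap in any CRAN mirror that is fast for you.
use_mirror("https://mirrors.tuna.tsinghua.edu.cn/CRAN/")

# "stats" and "utils" ship with R, so nothing is downloaded in this example.
to_install <- missing_packages(c("stats", "utils"))
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

With the mirror set this way, a plain `install.packages("tidyverse")` should use it for the rest of the session. The example checks base packages only so that running it does not trigger a download.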
165 | ## 如果宏包安装速度太慢, 方法二 166 | 167 | - `Rstudio`里设置镜像,步骤如下: 168 | 169 | 170 | 171 | 172 | ```{r image_grobs, fig.show='hold', out.width = "49%", fig.align = "default", echo=FALSE} 173 | library(cowplot) 174 | library(ggplot2) 175 | 176 | ggdraw() + draw_image("images/mirror1.png") 177 | ggdraw() + draw_image("images/mirror2.png") 178 | ``` 179 | 180 | 181 | - 然后 182 | 183 | ```{r, eval = FALSE } 184 | install.packages("tidyverse") 185 | ``` 186 | 187 | 188 | ## 测试 189 | 190 | 复制以下代码到**脚本编辑区** \footnotesize 191 | 192 | ```{r, eval=FALSE} 193 | library(ggplot2) 194 | 195 | ggplot(midwest, aes(x = area, y = poptotal)) + 196 | geom_point(aes(color = state, size = popdensity)) + 197 | geom_smooth(method = "loess", se = F) + 198 | xlim(c(0, 0.1)) + 199 | ylim(c(0, 500000)) + 200 | labs( 201 | title = "Scatterplot", 202 | subtitle = "Area Vs Population", 203 | x = "Area", 204 | y = "Population" 205 | ) 206 | ``` 207 | 208 | 209 | 210 | ## 运行 211 | 212 | ```{r out.width = '65%', echo = FALSE} 213 | knitr::include_graphics("images/run_script.png") 214 | ``` 215 | 216 | - 方法1:点击`Run`, 运行光标所在行的代码 217 | - 方法2:点击`Source`,从头到尾运行全部代码 218 | 219 | 220 | 221 | 222 | 223 | ## 如果出现这个图,说明配置成功 224 | 225 | ```{r out.width = '100%', echo = FALSE} 226 | library(ggplot2) 227 | 228 | ggplot(midwest, aes(x = area, y = poptotal)) + 229 | geom_point(aes(color = state, size = popdensity)) + 230 | geom_smooth(method = "loess", se = F) + 231 | xlim(c(0, 0.1)) + 232 | ylim(c(0, 500000)) + 233 | labs( 234 | title = "Scatterplot", 235 | subtitle = "Area Vs Population", 236 | x = "Area", 237 | y = "Population" 238 | ) 239 | ``` 240 | 241 | # 可能的问题 242 | 243 | ## 可能的问题 244 | 245 | - 我的电脑是苹果系统,怎么安装呢? 246 | - 我的Rstudio需要哪些设置? 247 | - 我的系统不能兼容64位的Rstudio? 248 | - 为什么Rstudio打开是空白呢? 249 | - 安装宏包太慢,怎么解决? 250 | - 安装宏包,遇到报错信息"unable to access index for repository..."? 
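The questions above are easier to answer when the exact setup is known. The following is only a sketch, not part of the course material: a hypothetical helper, `collect_setup_info`, that gathers the R version, platform, configured mirror, and library paths so they can be pasted into a help request. It uses base R functions only.

```r
# Hypothetical helper (an illustration, not from the slides): collect basic
# facts about the current R installation for a help request.
collect_setup_info <- function() {
  list(
    r_version     = R.version.string,    # installed R release
    platform      = R.version$platform,  # OS / architecture string
    repos         = getOption("repos"),  # CRAN mirror(s) currently configured
    lib_paths     = .libPaths(),         # directories where packages are installed
    has_tidyverse = requireNamespace("tidyverse", quietly = TRUE)
  )
}

info <- collect_setup_info()
str(info)  # compact overview that can be copied into a message
```

Posting this output together with the full error message is usually more useful than a screenshot alone, because it shows which mirror and which library path R is actually using.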
251 | 252 | 253 | ## Happy R 254 | 255 | 课时有限,想掌握这门技术,需要课后多下功夫 256 | 257 | - 请务必配置好环境,包括安装宏包(群里有安装视频,实在不行,@我远程协助) 258 | - 学习资料 https://bookdown.org/wangminjie/R4DS/ 259 | - 参考书目《R数据科学》(群文件book文件夹中) 260 | - 我们不是孙悟空,一出生就身怀绝技。No shame in asking help 261 | - 学习曲线会比较陡,但有老司机带路,要有信心。 262 | 263 | 祝大家happy R ! 264 | -------------------------------------------------------------------------------- /01_install/01_install.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/01_install.pdf -------------------------------------------------------------------------------- /01_install/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /01_install/images/QQgroup_PsyStats.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/QQgroup_PsyStats.png -------------------------------------------------------------------------------- /01_install/images/QQgroup_chenglong.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/QQgroup_chenglong.png -------------------------------------------------------------------------------- /01_install/images/QQgroup_shizishan.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/QQgroup_shizishan.png -------------------------------------------------------------------------------- /01_install/images/RStudio-Screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/RStudio-Screenshot.png -------------------------------------------------------------------------------- /01_install/images/Rhelp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/Rhelp.png -------------------------------------------------------------------------------- /01_install/images/Rinstall.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/Rinstall.png -------------------------------------------------------------------------------- /01_install/images/Rstudio_install.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/Rstudio_install.png -------------------------------------------------------------------------------- /01_install/images/dashboard.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/dashboard.jpg -------------------------------------------------------------------------------- /01_install/images/engine.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/engine.jpg -------------------------------------------------------------------------------- /01_install/images/mirror1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/mirror1.png -------------------------------------------------------------------------------- /01_install/images/mirror2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/mirror2.png -------------------------------------------------------------------------------- /01_install/images/rstudio-editor1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/rstudio-editor1.png -------------------------------------------------------------------------------- /01_install/images/run_script.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/run_script.png -------------------------------------------------------------------------------- /02_basicR/02_basicR.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "第二章:R语言基础" 3 | author: "王敏杰" 4 | institute: 
"四川师范大学" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "50%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | 35 | 36 | # Getting started 37 | 38 | ## Getting started 39 | 40 | After installation, open the Windows `Start menu` and click the `rstudio` icon to launch the RStudio window, which looks like this: 41 | 42 | ```{r out.width = '75%', echo = FALSE} 43 | knitr::include_graphics("images/rstudio-editor.png") 44 | ``` 45 | 46 | ## RStudio is very friendly 47 | 48 | To run a piece of R code, type it at the prompt on the last line of the RStudio Console pane and press Enter. For example: 49 | ```{r } 50 | 1 + 1 51 | ``` 52 | 53 | 54 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 55 | log(8) 56 | ``` 57 | 58 | 59 | 60 | 61 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 62 | 1:12 63 | ``` 64 | 65 | 66 | ## Objects 67 | 68 | ### Everything is an object 69 | Data stored in R is called an **object**. Data processing in R is essentially a matter of repeatedly creating and manipulating these objects. 70 | 71 | ### Creating an object 72 | To create an R object, first choose a name, then use 73 | the assignment operator `<-` to assign data to it. For example, to assign the value 5 to the variable x, type `x <- 5` at the command line and press Enter. 
74 | 75 | ```{r assignment operator} 76 | x <- 5 77 | ``` 78 | 79 | ### Printing an object 80 | 81 | Typing `x` and pressing Enter prints the value of x 82 | ```{r} 83 | x 84 | ``` 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | ## Objects 95 | 96 | ### Creating an object 97 | ```{r} 98 | l <- "hello world" 99 | ``` 100 | 101 | ### Accessing an object 102 | 103 | ```{r} 104 | l 105 | ``` 106 | 107 | 108 | ## Objects 109 | 110 | ### Creating a sequence 111 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 112 | d <- 1:10 113 | ``` 114 | 115 | ### Accessing an object 116 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 117 | d 118 | ``` 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | ## Data types 144 | 145 | 146 | ```{r out.width = '100%', echo = FALSE} 147 | knitr::include_graphics("images/data_type.png") 148 | ``` 149 | ## Data types 150 | - Numeric 151 | ```{r} 152 | 3 153 | 5000 154 | 3e+06 155 | class(0.0001) 156 | ``` 157 | 158 | 159 | ## Data types 160 | - Character (string) 161 | ```{r} 162 | "hello" 163 | "girl" 164 | "1" # note the difference between 1 and "1" 165 | ``` 166 | 167 | ```{r} 168 | class("1") 169 | ``` 170 | 171 | 172 | ## Data types 173 | 174 | - Logical 175 | ```{r} 176 | TRUE 177 | FALSE 178 | 3 < 4 179 | ``` 180 | 181 | 182 | ```{r} 183 | class(TRUE) 184 | ``` 185 | 186 | 187 | ```{r} 188 | 3 < 4 189 | ``` 190 | 191 | ## Data types 192 | - Factor 193 | ```{r} 194 | fac <- factor(c("a", "b", "c")) 195 | fac 196 | ``` 197 | 198 | 199 | ```{r} 200 | class(fac) 201 | ``` 202 | 203 | 204 | 205 | 206 | 207 | 208 | ## Data structures 209 | 210 | ### Vectors 211 | - Use the `c` function to **combine** a set of values into a vector; the elements are separated by commas and must all have the same data type 212 | 213 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 214 | d <- c(2, 4, 3, 1, 5, 7) 215 | d 216 | ``` 217 | 218 | 219 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 220 | t <- c("2", "4", "3", "1", "5", "7") 221 | t 222 | ``` 223 | 224 | 225 | A vector of length 1 226 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 227 | x <- c(1) # 228 | x <- 1 # the shorthand form 229 | ``` 230 | 231 | 232 | ## Data structures 233 | ### Matrices 234 | - Created with the `matrix` function 235 | ```{r echo=TRUE, 
message=TRUE, warning=TRUE} 236 | m <- matrix(c(2, 4, 3, 1, 5, 7), 237 | nrow = 2, 238 | ncol = 3, 239 | byrow = TRUE 240 | ) 241 | m 242 | ``` 243 | 244 | 245 | 246 | ## Data structures 247 | ### Arrays 248 | - The `array` function creates an `n`-dimensional array 249 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 250 | ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3)) 251 | ar 252 | ``` 253 | 254 | 255 | 256 | 257 | 258 | ## Data structures 259 | ### Lists 260 | - Created much like a vector with the `c` function, with elements separated by commas. The difference is that a list allows its elements to have different data types (numeric, character, logical, and so on), whereas a vector requires all elements to have the same data type. 261 | 262 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 263 | list1 <- list(100:110, "R", c(2, 4, 3, 1, 5, 7)) 264 | list1 265 | ``` 266 | 267 | 268 | 269 | ## Data structures 270 | ### Data frames 271 | - Built with the `data.frame` function 272 | 273 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 274 | df <- data.frame( 275 | name = c("ace", "bob", "carl", "kaite"), 276 | age = c(21, 14, 13, 15), 277 | sex = c("girl", "boy", "boy", "girl") 278 | ) 279 | df 280 | ``` 281 | 282 | 283 | 284 | 285 | ## Data structures 286 | ### Data frames 287 | The data structures of R objects (vectors, matrices, arrays, lists, and data frames) are summarized below 288 | 289 | ```{r out.width = '100%', echo = FALSE} 290 | knitr::include_graphics("images/data_struction1.png") 291 | ``` 292 | 293 | 294 | 295 | 296 | 297 | 298 | ## Functions 299 | 300 | Much of R's power comes from using **functions** to manipulate objects; you can think of objects as nouns and functions as verbs. 301 | We use a simple example, `sum()`, to show how a function works. The result of `sum()` can be displayed directly, 302 | ```{r} 303 | sum(5, 10) 304 | ``` 305 | 306 | or it can be given a name. The code below first computes `5 + 10`, assigns the result to a newly created object `y`, and then prints the value of `y` on the second line 307 | 308 | ```{r} 309 | y <- sum(5, 10) 310 | y 311 | ``` 312 | 313 | 314 | ## More functions 315 | 316 | Besides the summation function `sum()`, R has a great many other functions 317 | 318 | 319 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 320 | mean(1:6) 321 | ``` 322 | 323 | 324 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 325 | abs(1:6) 326 | ``` 327 | 328 | 329 | 330 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 331 | round(3.14159) 332 | ``` 333 | 334 | 335 | 336 | ```{r echo=TRUE, message=TRUE, warning=TRUE} 337 | x <- seq(1, 100) 338 | sum(x) 339 | ``` 340 | 341 | 342 | 343 | 344 | ## Scripts 345 | ### What is a script 346 | 
Once we have written a piece of R code, we can save it as a **script** file, which usually has the file extension .R. For example, we can save the commands that created the objects `x` and `y` earlier as the script file `my_script.R`. 347 | That way we can modify and rerun it at a later time. 348 | 349 | ## Scripts 350 | ### Creating a script 351 | In RStudio, you can create a new script from the menu bar by clicking `File > New File > R Script`. 352 | It is strongly recommended to write and edit your code in a script before running it; once this becomes a habit, all your future work will be documented and reproducible. 353 | 354 | 355 | ## Creating a script 356 | ```{r out.width = '100%', echo = FALSE} 357 | knitr::include_graphics("images/script1.png") 358 | ``` 359 | 360 | 361 | 362 | ## Running a script 363 | 364 | - Click `Run` to run the line where the cursor is 365 | - Click `Source` to run the entire script 366 | 367 | ```{r out.width = '75%', echo = FALSE} 368 | knitr::include_graphics("images/script2.png") 369 | ``` 370 | 371 | 372 | 373 | 374 | ## Packages 375 | 376 | R's power also comes from its many packages, which are usually downloaded and installed from [The Comprehensive R Archive Network (CRAN)](https://cran.r-project.org). 377 | 378 | 379 | Packages can be installed with the following commands: 380 | 381 | ```{r, eval = FALSE } 382 | # install a single package 383 | install.packages("tidyverse") 384 | ``` 385 | 386 | 387 | ```{r, eval = FALSE } 388 | # install several packages 389 | install.packages(c("ggplot2", "devtools", "dplyr")) 390 | ``` 391 | 392 | 393 | 394 | ## Getting help 395 | 396 | 397 | - Learning and memorizing every function is nearly impossible 398 | - Open a function's help page (the `Help` tab in the lower-right pane of `RStudio`) 399 | 400 | ```{r, eval = FALSE } 401 | ?sqrt 402 | ?gather 403 | ?spread 404 | ?ggplot2 405 | ?scale 406 | ?map_dfr 407 | ``` 408 | 409 | ## Getting help 410 | 411 | Quick access to help is another nice feature of R 412 | 413 | ```{r out.width = '100%', echo = FALSE} 414 | knitr::include_graphics("images/Rhelp.png") 415 | ``` 416 | 417 | 418 | 419 | -------------------------------------------------------------------------------- /02_basicR/02_basicR.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/02_basicR.pdf -------------------------------------------------------------------------------- /02_basicR/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | 
\usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /02_basicR/images/Rhelp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/Rhelp.png -------------------------------------------------------------------------------- /02_basicR/images/data_struction1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/data_struction1.png -------------------------------------------------------------------------------- /02_basicR/images/data_type.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/data_type.png -------------------------------------------------------------------------------- /02_basicR/images/rstudio-editor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/rstudio-editor.png -------------------------------------------------------------------------------- /02_basicR/images/script1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/script1.png -------------------------------------------------------------------------------- /02_basicR/images/script2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/script2.png -------------------------------------------------------------------------------- /03_subset/03_subset.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 3: Subsetting" 3 | author: "王敏杰" 4 | institute: "四川师范大学" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "100%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | 35 | 36 | ## Subsetting 37 | 38 | An **object** is a piece of storage space created in the computer, much like a box: 39 | we can put things into the box and take things out of it. 40 | 41 | ```{r echo=FALSE, out.width = '85%'} 42 | knitr::include_graphics("images/R_box.png") 43 | ``` 44 | 45 | 46 | ## Data structures 47 | 48 | The data structures of R objects (vectors, matrices, arrays, lists, and data frames) 49 | 50 | ```{r out.width = '100%', echo = FALSE} 51 | knitr::include_graphics("images/data_struction1.png") 52 | ``` 53 | Below we go through, in turn, how to select a subset from each of these data structures... 
54 | 55 | 56 | # Getting started 57 | 58 | ## Vectors 59 | 60 | For atomic vectors there are at least four ways to select a subset 61 | ```{r} 62 | x <- c(1.1, 2.2, 3.3, 4.4, 5.5) 63 | ``` 64 | 65 | 66 | - Positive integers: select elements by their position in the vector 67 | ```{r} 68 | x[1] 69 | ``` 70 | 71 | ```{r} 72 | x[c(3,1)] 73 | ``` 74 | ```{r} 75 | x[1:3] 76 | ``` 77 | 78 | ## Vectors 79 | - Negative integers: drop the elements at the given positions 80 | ```{r} 81 | x[-2] 82 | ``` 83 | 84 | 85 | ```{r} 86 | x[c(-3, -4)] 87 | ``` 88 | 89 | 90 | ## Vectors 91 | 92 | - Logical vectors: extract the elements at the positions where the index is `TRUE` 93 | ```{r} 94 | x[c(TRUE, FALSE, TRUE, FALSE, TRUE)] 95 | ``` 96 | 97 | A common use case: select all elements greater than some value 98 | ```{r} 99 | x > 3 100 | ``` 101 | 102 | ```{r} 103 | x[x > 3] 104 | ``` 105 | 106 | ## Vectors 107 | - For a named vector 108 | ```{r} 109 | y <- c("a" = 11, "b" = 12, "c" = 13, "d" = 14) 110 | y 111 | ``` 112 | 113 | we can index with a character vector of names, which returns the elements with those names 114 | ```{r} 115 | y[c("d", "c", "a")] 116 | ``` 117 | 118 | 119 | 120 | 121 | ## Lists 122 | 123 | Subsetting a list works the same way as for vectors. Using `[` always returns a list, 124 | ```{r} 125 | l <- list("one" = c("a", "b", "c"), 126 | "two" = c(1:5), 127 | "three" = c(TRUE, FALSE) 128 | ) 129 | l 130 | ``` 131 | 132 | ```{r} 133 | l[1] # still a list 134 | ``` 135 | 136 | 137 | ## Lists 138 | 139 | To extract an element from the list itself, use `[[` 140 | ```{r} 141 | l[[1]] 142 | ``` 143 | 144 | 145 | The element name can also be used, for example `[["one"]]`, 146 | ```{r} 147 | l[["one"]] 148 | ``` 149 | 150 | 151 | Programmers found that too cumbersome, so `$` serves as a shorthand 152 | ```{r} 153 | l$one 154 | ``` 155 | 156 | 157 | ## Lists 158 | 159 | So, please remember 160 | 161 | - the difference between `[` and `[[` 162 | - `x$y` is shorthand for `x[["y"]]` 163 | 164 | 165 | 166 | 167 | ## Matrices 168 | 169 | ```{r} 170 | a <- matrix(1:9, nrow = 3, byrow = TRUE) 171 | a 172 | ``` 173 | To take columns 2 to 3 of rows 1 to 2, write `[1:2, 2:3]`. 
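Note the comma separating the row index from the column index; the result is a new matrix 174 | ```{r} 175 | a[1:2, 2:3] 176 | ``` 177 | 178 | 179 | ## Matrices 180 | By default, `[` returns the selected data in the lowest possible dimension. For example, 181 | ```{r} 182 | a[1, 1:2] 183 | ``` 184 | selects columns 1 and 2 of row 1; the result is no longer a $1 \times 2$ matrix but a vector with two elements. 185 | 186 | \vfill 187 | **Returned in the lowest possible dimension** simply means: this `r a[1, 1:2]` looks like a matrix but also a bit like a vector; a vector has a lower dimension than a matrix, so it becomes a vector. 188 | 189 | 190 | ## Matrices 191 | Sometimes we want to keep all rows or all columns, for example 192 | 193 | - along the rows, select only rows 1 to 2 194 | - along the columns, select all columns 195 | 196 | This can be abbreviated as 197 | 198 | ```{r} 199 | a[1:2, ] 200 | ``` 201 | 202 | Think about what this form will output 203 | ```{r, eval = FALSE} 204 | a[ , ] 205 | ``` 206 | 207 | 208 | ## Matrices 209 | 210 | ```{r} 211 | a[ , ] 212 | ``` 213 | 214 | 215 | ```{r} 216 | # can it be simplified further? 217 | a[] 218 | ``` 219 | 220 | 221 | ```{r} 222 | # and simplified even further? 223 | a 224 | ``` 225 | 226 | 227 | ## Data frames 228 | 229 | A data frame behaves like both a `list` and a `matrix`, so 230 | 231 | - to select some columns of a data frame, you can index it like a list by element position; for example, `df[1:2]` selects the first two columns 232 | - you can also index it like a matrix by row and column; for example, `df[1:3, ]` selects all columns of the first three rows 233 | 234 | \small 235 | ```{r} 236 | df <- data.frame(x = 1:4, 237 | y = 4:1, 238 | z = c("a", "b", "c", "d") ) 239 | df 240 | ``` 241 | 242 | ## Data frames 243 | 244 | \small 245 | ```{r} 246 | # Like a list 247 | df[c("x", "z")] 248 | ``` 249 | 250 | 251 | 252 | ```{r} 253 | # Like a matrix 254 | df[, c("x", "z")] 255 | ``` 256 | 257 | ## Data frames 258 | 259 | Selection by row and column position also works 260 | ```{r} 261 | df[1:2] 262 | ``` 263 | 264 | 265 | ```{r} 266 | df[1:3, ] 267 | ``` 268 | 269 | 270 | 271 | ## Data frames 272 | When a single column is selected with matrix-style indexing, the result is simplified to a vector, just as with matrices (a single row, by contrast, stays a data frame) 273 | ```{r} 274 | df[, "x"] 275 | ``` 276 | 277 | To avoid this simplification, add one more argument 278 | 279 | ```{r} 280 | df[, "x", drop = FALSE] 281 | ``` 282 | 283 | 284 | 285 | 286 | 287 | 288 | 289 | 290 | 291 | 292 | ## Further reading 293 | 294 | - How do you extract the upper-triangular elements of `matrix(1:9, nrow = 3)`? And the diagonal elements? 295 | - For a data frame, what are the differences between `df["x"]`, `df[["x"]]`, and `df$x`? 296 | - If `x` is a matrix, what is the difference between `x[] <- 0` and `x <- 0`? 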
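
As a hint for the last question, here is a minimal sketch of one way to compare the two assignments. The object names `m0`, `kept`, and `rebound` are made up for this illustration and do not appear in the slides.

```r
# Hypothetical example matrix (not from the slides)
m0 <- matrix(1:9, nrow = 3)

kept <- m0
kept[] <- 0        # replaces every element but keeps the dim attribute
dim(kept)          # the object is still a 3 x 3 matrix
is.matrix(kept)

rebound <- m0
rebound <- 0       # rebinds the name to a plain length-1 numeric vector
dim(rebound)       # no dim attribute any more
length(rebound)
```

The empty index `[]` selects all elements in place, so the assignment changes the values while the structure survives; a plain `<-` discards the old object entirely.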
297 | 298 | ```{r eval=FALSE, include=FALSE} 299 | m <- matrix(1:9, nrow = 3) 300 | m 301 | ``` 302 | 303 | 304 | ```{r eval=FALSE, include=FALSE} 305 | diag(m) 306 | upper.tri(m, diag = FALSE) 307 | ``` 308 | 309 | 310 | ```{r eval=FALSE, include=FALSE} 311 | m[upper.tri(m, diag = FALSE)] 312 | ``` 313 | 314 | 315 | -------------------------------------------------------------------------------- /03_subset/03_subset.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/03_subset/03_subset.pdf -------------------------------------------------------------------------------- /03_subset/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /03_subset/images/R_box.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/03_subset/images/R_box.png -------------------------------------------------------------------------------- /03_subset/images/data_struction1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/03_subset/images/data_struction1.png -------------------------------------------------------------------------------- /04_Rmarkdown/04_Rmarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 3: Reproducible Reports" 3 | author: "王敏杰" 4 | institute: "四川师范大学" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "100%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | ## Why write reproducible reports 35 | 36 | Communicate, understand, reproduce 37 | 38 | - We need to **present and share** our data analysis results with peers, managers, or teachers 39 | - To help a manager quickly grasp our analytical approach and methods, the best way is to assemble the background, process, results, and figures of the analysis into a **report** 40 | - Readers should be able to reproduce and verify our results, so that the conclusions can be trusted 41 | 42 | This chapter therefore introduces how to generate analysis reports (reproducible reports) with R Markdown 43 | 44 | 45 | ## What is R Markdown 46 | ```{r out.width = '100%', echo = FALSE} 47 | knitr::include_graphics("images/rmarkdown.png") 48 | ``` 49 | 50 | 51 | 52 | # Basic markdown syntax 53 | 54 | ## Basic markdown syntax 55 | 56 | - Headings 57 | ```{markdown, eval = FALSE, echo = TRUE} 58 | # Chapter 1 (note the space between "#" and "Chapter 1") 59 | ## Section 1 (likewise, a space between "##" and "Section 1") 60 | ### Subsection 1 (likewise, a space between "###" and "Subsection 1") 61 | ``` 62 | 63 | - Body text 64 | ```{markdown, eval = FALSE, echo = TRUE} 65 | This is a sentence. ...this is body text... 66 | ``` 67 | 68 | 69 | ## Basic markdown syntax 70 | - Lists 71 | ```{markdown, eval = FALSE, echo = TRUE} 72 | Now a list begins: 73 | 74 | - no importance 75 | - again 76 | - repeat 77 | 78 | A numbered list: 79 | 80 | 1. first 81 | 2. 
second 82 | ``` 83 | 84 | 85 | ## Basic markdown syntax 86 | 87 | - Other markup 88 | ```{markdown, eval = FALSE, echo = TRUE} 89 | __bold__ 90 | _italic_ 91 | ~~strike through~~ 92 | ``` 93 | 94 | 95 | 96 | # Creating an R Markdown document 97 | 98 | ## Creating an R Markdown document 99 | 100 | ```{r, eval = FALSE} 101 | install.packages("rmarkdown") 102 | ``` 103 | 104 | Create one in `RStudio`: `File -> New File -> R Markdown`. 105 | 106 | 107 | Basic components (marked by the green brackets in the figure) 108 | 109 | - metadata 110 | - text 111 | - code 112 | 113 | 114 | ## Creating an R Markdown document 115 | ```{r out.width = '85%', echo = FALSE} 116 | knitr::include_graphics("images/rstudio-markdown.png") 117 | ``` 118 | 119 | Click Knit (highlighted in red in the figure) and choose the output format you want. 120 | 121 | 122 | ## Generating an HTML document 123 | 124 | If you want the HTML document to have section numbers, a table of contents, or nicer table display, modify the YAML header (replace the header of your R Markdown file with the following) 125 | 126 | ```yaml 127 | --- 128 | title: Habits 129 | author: John Doe 130 | date: "`r Sys.Date()`" 131 | output: 132 | html_document: 133 | df_print: paged 134 | toc: yes 135 | number_sections: yes 136 | --- 137 | ``` 138 | 139 | 140 | 141 | 142 | 143 | 144 | ## Generating a PDF document 145 | 146 | Elegant PDF documents 147 | 148 | - PDF documents can include crisp vector graphics and elegant mathematical formulas, which makes them very popular with students. 149 | - However, compilation often fails when the document contains Chinese text; the solution is to use `tinytex`, as shown in this [video](https://www.bilibili.com/video/BV1Gf4y1R7md). 150 | 151 | 152 | ```{r, eval = FALSE} 153 | install.packages("tinytex") 154 | tinytex::install_tinytex(dir = "D:\\Tinytex", 155 | force = TRUE) 156 | ``` 157 | 158 | 159 | 160 | 161 | 162 | # Using R Markdown 163 | 164 | ## Inserting formulas 165 | 166 | Assuming you are already familiar with LaTeX syntax, typing the following in R Markdown 167 | `$$\frac{\sum (\bar{x} - x_i)^2}{n-1}$$` produces this output: 168 | 169 | $$\frac{\sum (\bar{x} - x_i)^2}{n-1}$$ 170 | 171 | 172 | ## Inserting formulas 173 | LaTeX equation environments can also be used, for example 174 | 175 | ```latex 176 | $$ 177 | \Theta = \begin{pmatrix}\alpha & \beta\\ 178 | \gamma & \delta 179 | \end{pmatrix} 180 | $$ 181 | ``` 182 | Output 183 | 184 | $$ 185 | \Theta = \begin{pmatrix}\alpha & \beta\\ 186 | \gamma & \delta 187 | \end{pmatrix} 188 | $$ 189 | 190 | 191 | 192 | ## Inserting images 193 | 194 | \scriptsize 195 | ````markdown 196 | `r ''````{r, out.width='35%', 
fig.align='center', fig.cap='this is caption'} 197 | knitr::include_graphics("images/R_logo.png") 198 | ``` 199 | ```` 200 | 201 | 202 | ```{r out.width = '35%', fig.align='center', fig.cap='this is caption', echo = FALSE} 203 | knitr::include_graphics("images/R_logo.png") 204 | ``` 205 | 206 | 207 | 208 | ## Running code 209 | 210 | ```{r, echo = TRUE} 211 | summary(cars) 212 | ``` 213 | 214 | 215 | ## Tables 216 | ````md 217 | ```{r tables-mtcars}`r ''` 218 | knitr::kable(iris[1:5, ], caption = "A caption") 219 | ``` 220 | ```` 221 | 222 | \vskip -1cm 223 | ```{r tables-mtcars, echo = FALSE} 224 | knitr::kable(iris[1:5, ], caption = "A caption") 225 | ``` 226 | 227 | For more polished tables, see [kableExtra](https://haozhu233.github.io/kableExtra/) 228 | 229 | 230 | 231 | ## Generating plots 232 | ````md 233 | ```{r}`r ''` 234 | plot(pressure) 235 | ``` 236 | ```` 237 | 238 | 239 | ```{r out.width = '85%', echo=FALSE} 240 | plot(pressure) 241 | ``` 242 | 243 | 244 | 245 | ## Try copying this code into your R Markdown document 246 | 247 | \scriptsize 248 | ````md 249 | ```{r, out.width = '85%', fig.showtext = TRUE}`r ''` 250 | library(tidyverse) 251 | library(nycflights13) 252 | library(showtext) 253 | showtext_auto() 254 | flights %>% 255 | group_by(dest) %>% 256 | summarize( 257 | count = n(), 258 | dist = mean(distance, na.rm = TRUE), 259 | delay = mean(arr_delay, na.rm = TRUE) 260 | ) %>% 261 | dplyr::filter(delay > 0, count > 20, dest != "HNL") %>% 262 | ggplot(mapping = aes(x = dist, y = delay)) + 263 | geom_point(aes(size = count), alpha = 1 / 3) + 264 | geom_smooth(se = FALSE) + 265 | ggtitle("这是我的标题") 266 | ``` 267 | ```` 268 | 269 | 270 | 271 | 272 | 273 | 274 | 275 | ## Further reading 276 | 277 | * Markdown tutorial https://www.markdowntutorial.com (takes about 10 minutes) 278 | * LaTeX tutorial https://www.latex-tutorial.com/quick-start/ 279 | * R Markdown introduction https://bookdown.org/yihui/rmarkdown/ 280 | * R Markdown cookbook https://bookdown.org/yihui/rmarkdown-cookbook/ 281 | -------------------------------------------------------------------------------- 
/04_Rmarkdown/04_Rmarkdown.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/04_Rmarkdown.pdf -------------------------------------------------------------------------------- /04_Rmarkdown/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /04_Rmarkdown/images/R_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/images/R_logo.png -------------------------------------------------------------------------------- /04_Rmarkdown/images/rmarkdown.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/images/rmarkdown.png -------------------------------------------------------------------------------- /04_Rmarkdown/images/rstudio-markdown.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/images/rstudio-markdown.png 
-------------------------------------------------------------------------------- /05_dplyr/05_dplyr.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 4: Data Manipulation" 3 | author: "王敏杰" 4 | institute: "四川师范大学" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "100%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | ## Getting started with the tidyverse family 35 | ```{r echo=FALSE, out.width = '85%'} 36 | knitr::include_graphics("images/tidyverse.png") 37 | ``` 38 | 39 | ## The tidyverse family 40 | 41 | The main members of the tidyverse family include 42 | 43 | 44 | | Role | Package | 45 | |------|-------------| 46 | | Visualization (the good-looking one) | ggplot2 | 47 | | Data manipulation champion | dplyr | 48 | | Data reshaping expert | tidyr | 49 | | Data import tool | readr | 50 | | Iteration accelerator | purrr | 51 | | Enhanced data frames | tibble | 52 | 53 | # Reading data 54 | 55 | ## Reading data 56 | 57 | R provides many functions for reading data. 58 | 59 | 60 | File format | **R** function 61 | :--------------------------- | :---------------------- 62 | .txt | read.table() 63 | .csv | read.csv() and readr::read_csv() 64 | .xls and .xlsx | readxl::read_excel() and openxlsx::read.xlsx() 65 | .sav | foreign::read.spss() 66 | .Rdata or rda | load() 67 | .rds | readRDS() and readr::read_rds() 68 | .dta | haven::read_dta() and haven::read_stata() 69 | Internet | download.file() 70 | 71 | 72 | 73 | ## Example 74 | 75 | ```{r} 76 | library(readr) 77 | wages <- read_csv("./demo_data/wages.csv") 78 | head(wages, 6) 79 | ``` 80 | 81 | ## Example 82 | ```{r} 83 | library(readxl) 84 | d <- read_excel("./demo_data/olympics.xlsx") 85 | tail(d, 
6) 86 | ``` 87 | 88 | 89 | 90 | 91 | # Data manipulation 92 | 93 | ## Tidy principles 94 | 95 | Hadley Wickham proposed the tidy principles for data science. As I understand them, the tidy philosophy comes down to: 96 | 97 | ```{r out.width = '85%', echo = FALSE} 98 | knitr::include_graphics("images/import_datatype01.png") 99 | ``` 100 | 101 | - Everything is a data frame, and any data can be tidied 102 | - Each column of a data frame represents a **variable**, and each row represents an **observation** 103 | - When a function processes data, a data frame goes in and a data frame comes out (the first argument of the function is always a **data frame**) 104 | 105 | 106 | 107 | ## The dplyr package 108 | This chapter introduces dplyr, the tidyverse's workhorse package for data manipulation. First, we load the package 109 | ```{r message = FALSE, warning = FALSE} 110 | library(dplyr) 111 | ``` 112 | 113 | dplyr defines a consistent grammar for data manipulation, built mainly around the following functions. 114 | 115 | * `mutate() `, `select() `, `filter() ` 116 | * `summarise() `, `group_by()`, `arrange() ` 117 | * `left_join()`, `right_join()`, `full_join()` 118 | 119 | We will introduce them one by one 120 | 121 | 122 | ## Sample data 123 | 124 | Suppose we have a data frame containing the English and math subjects of three students 125 | \small 126 | ```{r} 127 | df <- data.frame( 128 | name = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol"), 129 | type = c("english", "math", "english", "math", "english", "math") 130 | ) 131 | df 132 | ``` 133 | 134 | 135 | 136 | ## `mutate() ` adds a column 137 | Here are their most recent exam scores, which we want to add to the data frame 138 | \footnotesize 139 | ```{r} 140 | score2020 <- c(80.2, 90.5, 92.2, 90.8, 82.5, 84.6) 141 | score2020 142 | ``` 143 | 144 | 145 | \begincols[T] 146 | \begincol[T]{.48\textwidth} 147 | The traditional way 148 | ```{r} 149 | df$score <- score2020 150 | df 151 | ``` 152 | 153 | ```{r include=FALSE} 154 | df <- data.frame( 155 | name = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol"), 156 | type = c("english", "math", "english", "math", "english", "math") 157 | ) 158 | ``` 159 | 160 | \endcol 161 | 162 | \begincol[T]{.48\textwidth} 163 | The dplyr way 164 | 165 | ```{r} 166 | # 167 | mutate(df, score = score2020) 168 | ``` 169 | \endcol 170 | \endcols 171 | 172 | 173 | 174 | 175 | 176 | ## `mutate() ` adds a column 177 | 178 | The `mutate()` function 179 | 180 | ```{r, eval=FALSE} 181 | mutate(.data = df, score = score2020) 182 | ``` 183 | 184 | - The first argument is the data frame to be processed, here `df`, 185 | - The second argument is `score = 
score2020`; the `score` to the left of the equals sign is the name chosen for the new column we are creating; 186 | to the right of the equals sign is the **vector** holding the students' scores (note that the length of the vector must equal the number of rows in the data frame; here both are 6) 187 | 188 | 189 | 190 | 191 | ## The `pipe` %>% 192 | 193 | At this point it is worth introducing the pipe operator [ `%>%` ](https://magrittr.tidyverse.org/). 194 | 195 | ```{r} 196 | c(1:10) 197 | ``` 198 | 199 | ```{r} 200 | sum(c(1:10)) 201 | ``` 202 | 203 | 204 | This is equivalent to the following form, 205 | ```{r} 206 | c(1:10) %>% sum() 207 | ``` 208 | 209 | 210 | 211 | ## The `pipe` %>% 212 | 213 | ```{r, eval=FALSE} 214 | c(1:10) %>% sum() 215 | ``` 216 | This statement means that the vector `c(1:10)` is passed through the pipe operator `%>%` into the first argument position of the function `sum()`, that is, `sum(c(1:10))`. The name "pipe" for `%>%` is a fitting image, 217 | 218 | ```{r out.width = '50%', echo = FALSE} 219 | knitr::include_graphics("images/pipe1.png") 220 | ``` 221 | 222 | 223 | ## The `pipe` %>% 224 | The pipe is especially convenient when several functions are applied in sequence, and it makes the code more readable. 225 | 226 | ```{r} 227 | sqrt(sum(abs(c(-10:10)))) 228 | ``` 229 | 230 | 231 | ```{r} 232 | c(-10:10) %>% abs() %>% sum() %>% sqrt() 233 | ``` 234 | 235 | 236 | 237 | 238 | 239 | ## The `pipe` %>% 240 | So the earlier statement that adds the students' scores, `mutate(df, score = score2020)`, can be written with a pipe 241 | 242 | ```{r out.width = '75%', echo = FALSE} 243 | knitr::include_graphics("images/pipe2.png") 244 | ``` 245 | 246 | 247 | ## The `pipe` %>% 248 | ```{r} 249 | # equivalent to 250 | df %>% mutate(score = score2020) 251 | ``` 252 | Pretty neat, isn't it? 
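
As a small illustration of the equivalence described above, the sketch below compares a nested call with its piped form and checks that both give the same result. The data frame `scores` and the names `nested` and `piped` are made up for this example and are not part of the slides; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical data, invented for this illustration only
scores <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  score = c(80.2, 92.2, 82.5)
)

# Nested form: read from the inside out
nested <- head(arrange(scores, desc(score)), 1)

# Piped form: read from left to right, each result feeds the next call
piped <- scores %>%
  arrange(desc(score)) %>%
  head(1)

identical(nested, piped)
```

Both forms call the same functions with the same arguments; the pipe only changes the order in which the steps are written down.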
253 | 254 | 255 | 256 | ```{r, include=FALSE} 257 | df <- df %>% mutate(score = score2020) 258 | df 259 | ``` 260 | 261 | 262 | 263 | 264 | 265 | ## `select() ` selects columns 266 | 267 | `select()` selects a column of the data frame 268 | 269 | \bigskip 270 | 271 | \begincols 272 | \begincol{.48\textwidth} 273 | The traditional way 274 | ```{r} 275 | df["name"] 276 | ``` 277 | 278 | \endcol 279 | 280 | \begincol{.48\textwidth} 281 | The dplyr way 282 | ```{r} 283 | df %>% select(name) 284 | ``` 285 | \endcol 286 | \endcols 287 | 288 | 289 | 290 | ## `select() ` selects columns 291 | To select several columns, just list one more 292 | ```{r} 293 | df %>% select(name, score) 294 | ``` 295 | 296 | 297 | ## `select()` selects columns 298 | 299 | To drop a column, put `-` in front of the variable, 300 | ```{r} 301 | df %>% select(-type) 302 | ``` 303 | 304 | 305 | 306 | 307 | 308 | 309 | ## `filter() ` filters rows 310 | 311 | We can also select and filter along the rows of the data; for example, here we pick out the students with **scores of 90 or above** 312 | 313 | ```{r} 314 | df %>% filter(score >= 90) 315 | ``` 316 | 317 | 318 | ## `filter()` filters rows 319 | 320 | We can also filter on several conditions at once, for example English scores of 90 or above 321 | ```{r} 322 | df %>% filter(type == "english", score >= 90) 323 | ``` 324 | 325 | 326 | 327 | 328 | ## `summarise() ` computes summaries 329 | 330 | `summarise() ` is mainly used for summary statistics and is often combined with other functions 331 | 332 | \medskip 333 | For example, compute the mean of all students' exam scores 334 | ```{r} 335 | df %>% summarise( mean_score = mean(score)) 336 | ``` 337 | 338 | Or compute the standard deviation of all students' exam scores 339 | ```{r} 340 | df %>% summarise( sd_score = sd(score)) 341 | ``` 342 | 343 | 344 | 345 | 346 | ## `summarise() ` computes summaries 347 | Several summaries can also be computed at once 348 | ```{r} 349 | df %>% summarise( 350 | mean_score = mean(score), 351 | median_score = median(score), 352 | n = n(), 353 | sum = sum(score) 354 | ) 355 | ``` 356 | 357 | 358 | 359 | 360 | 361 | ## `group_by()` groups rows 362 | 363 | Group first, then summarise. For example, to compute each student's average score, we first group by the student `name` and then take the mean within each group 364 | 365 | \small 366 | ```{r} 367 | df %>% 368 | group_by(name) %>% 369 | summarise( 370 | mean_score = mean(score), 371 | sd_score = sd(score) 372 | ) 373 | ``` 374 | 375 | 376 | 377 | 378 | 379 | ## `arrange() ` sorts rows 380 | We sort by exam score from lowest to highest and print the result 381 | ```{r} 382 | df %>% arrange(score) 383 | 
``` 384 | 385 | 386 | 387 | ## `arrange() ` sorts rows 388 | 389 | To sort in descending order, from highest to lowest, there are two options: 390 | 391 | \small 392 | \begincols 393 | \begincol{.48\textwidth} 394 | ```{r} 395 | df %>% arrange(-score) 396 | ``` 397 | \endcol 398 | 399 | \begincol{.48\textwidth} 400 | 401 | ```{r} 402 | df %>% arrange(desc(score)) 403 | ``` 404 | \endcol 405 | \endcols 406 | 407 | Which one is more readable? 408 | 409 | 410 | ## `arrange() ` sorts rows 411 | Sorting by several variables in sequence is also possible. 412 | \medskip 413 | For example, sort by subject first, then by score from highest to lowest 414 | ```{r} 415 | df %>% 416 | arrange(type, desc(score)) 417 | ``` 418 | 419 | 420 | 421 | 422 | 423 | ## `left_join` joins 424 | Suppose we have already computed each student's average score and stored it in the data frame `df1` 425 | 426 | ```{r} 427 | df1 <- df %>% 428 | group_by(name) %>% 429 | summarise( mean_score = mean(score) ) 430 | df1 431 | ``` 432 | 433 | 434 | ## `left_join` joins 435 | We also have another data frame, `df2`, which contains the students' ages 436 | ```{r} 437 | df2 <- tibble( 438 | name = c("Alice", "Bob"), 439 | age = c(12, 13) 440 | ) 441 | df2 442 | ``` 443 | 444 | 445 | 446 | ## `left_join` left join 447 | 448 | Join the two data frames `df1` and `df2` by the student `name`, 449 | 450 | ```{r} 451 | left_join(df1, df2, by = "name") 452 | ``` 453 | 454 | Notice that Carol's age in the last row is `NA`. Can you think of why? 455 | 456 | 457 | 458 | 459 | 460 | ## `left_join` left join 461 | 462 | Of course, it can also be written like this 463 | ```{r} 464 | df1 %>% left_join(df2, by = "name") 465 | ``` 466 | 467 | 468 | 469 | 470 | 471 | 472 | ## `right_join` right join 473 | Now let us try the right join `right_join()` 474 | 475 | ```{r, message=FALSE} 476 | df1 %>% right_join(df2, by = "name") 477 | ``` 478 | Carol's record has disappeared. Can you think of why this time? 479 | 480 | 481 | 482 | 483 | 484 | 485 | 486 | 487 | ## Further reading 488 | 489 | - Recommended: [https://dplyr.tidyverse.org/](https://dplyr.tidyverse.org/). 
- [cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)
- Run and work through [nycflights.Rmd](https://github.com/perlatex/R_for_Data_Science/blob/master/data/nycflights.Rmd)

--------------------------------------------------------------------------------
/05_dplyr/05_dplyr.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/05_dplyr.pdf

--------------------------------------------------------------------------------
/05_dplyr/demo_data/olympics.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/demo_data/olympics.xlsx

--------------------------------------------------------------------------------
/05_dplyr/demo_data/wages.csv:
--------------------------------------------------------------------------------
1 | "earn","height","sex","race","ed","age" 2 | 79571.299011024,73.89,"male","white",16,49 3 | 96396.9886433106,66.23,"female","white",16,62 4 | 48710.666947391,63.77,"female","white",16,33 5 | 80478.0961525837,63.22,"female","other",16,95 6 | 82089.3454983326,63.08,"female","white",17,43 7 | 15313.3529014342,64.53,"female","white",15,30 8 | 47104.1718212293,61.54,"female","white",12,53 9 | 50960.0542820731,73.29,"male","white",17,50 10 | 3212.6495560539,72.24,"male","hispanic",15,25 11 | 42996.6378844038,72.4,"male","white",12,30 12 | 10328.6188426045,70.22,"male","white",16,69 13 | 1002.30715511839,63.15,"female","white",14,54 14 | 47597.8198637099,68.11,"male","white",11,38 15 | 19019.5422985066,68.08,"male","white",12,31 16 | 20063.9966387225,64.86,"female","white",12,55 17 | 992.832346303227,60.06,"female","white",12,31 18 | 35972.1711232638,66.01,"female","white",16,39 19 | 26930.5439643451,68.07,"male","white",12,62 20 | 
64602.0639724231,68.16,"female","white",14,33 21 | 69993.6930698179,70.02,"male","white",13,48 22 | 1000.21830625211,67.08,"female","white",9,25 23 | 12131.8221152514,64.2,"female","black",12,59 24 | 84223.3979186057,72.76,"male","black",13,39 25 | 8949.47493547637,62.17,"female","white",13,55 26 | 23278.3232780607,63.04,"female","white",14,25 27 | 8750.13918500201,66.67,"male","white",14,26 28 | 64593.5998054014,65.95,"female","white",12,45 29 | 54079.8149454029,71.69,"male","white",12,49 30 | 16896.1000435301,62.76,"female","black",12,39 31 | -95.7102792312691,68.46,"male","white",11,25 32 | 43938.8576046547,64.07,"female","white",16,64 33 | 1004.09561369181,60,"female","white",12,48 34 | 79478.0468181047,71.97,"male","white",16,42 35 | 984.761813301634,68.25,"female","white",12,59 36 | 65237.7700476547,76.68,"male","white",16,37 37 | 978.872090839732,63.77,"female","white",10,23 38 | 24845.4285917486,63.96,"female","white",14,29 39 | 40754.1029736281,63.72,"female","white",12,37 40 | 119251.055119839,71.92,"male","white",17,43 41 | 42933.0122551917,68.12,"male","white",17,35 42 | 20088.3114644238,64.48,"female","white",12,30 43 | 12924.2239977829,60.94,"female","white",14,82 44 | 48693.4797964477,67.19,"female","black",14,35 45 | 34383.101849067,65.05,"female","white",12,61 46 | 43928.4198331574,64.17,"female","white",14,30 47 | 5782.248948451,66.49,"female","white",15,69 48 | 40758.5577226173,65.58,"female","white",12,34 49 | 39162.8754520925,64.93,"female","white",12,45 50 | 51896.846670074,67.02,"female","white",18,33 51 | 16905.1408589265,67.49,"female","white",17,34 52 | 1000.99111084442,70.03,"female","white",12,61 53 | 18497.8538930071,60.61,"female","hispanic",12,25 54 | 30740.47803689,64.92,"female","white",13,36 55 | 32803.4426895406,62,"female","white",12,33 56 | 5529.47149841489,72.16,"male","white",10,22 57 | 20594.711211041,66.35,"male","black",8,60 58 | 39816.5002027645,70.09,"male","white",12,69 59 | 33327.5987832144,70.05,"male","white",17,45 60 
| 55067.7931193718,68.49,"female","white",17,53 61 | 10545.4502477035,60.13,"female","white",12,69 62 | 26979.2976737281,71.15,"male","white",12,32 63 | 55634.1696790621,70.77,"male","white",12,36 64 | 6380.09148122974,72.87,"male","white",13,22 65 | 23264.3741899011,68,"female","white",14,59 66 | 16918.2189161501,65.64,"female","white",12,61 67 | 39676.2644997822,68.74,"male","white",16,33 68 | 994.133503863912,63.02,"female","white",8,67 69 | 26420.8322535518,62.8,"female","white",14,31 70 | 986.449291094241,65.99,"female","black",13,34 71 | 986.993627550926,64.66,"female","white",12,30 72 | 25550.7559075621,62.04,"male","hispanic",14,32 73 | 27235.2680173859,64.24,"female","white",14,47 74 | 7354.05363222349,63.82,"female","white",9,72 75 | 7101.32415873714,66.95,"female","white",9,56 76 | 35041.7055007873,71.28,"male","white",12,43 77 | 1318.87555348029,59.63,"female","white",16,57 78 | 42343.3439688464,66.34,"female","white",16,31 79 | 4985.24601643037,66,"female","white",15,25 80 | 28022.6386657665,65.49,"female","white",14,43 81 | 13729.6720245917,69.96,"female","white",13,26 82 | 991.599966272541,64.79,"female","white",13,67 83 | 20074.7729832596,66.35,"female","white",13,72 84 | 16901.6558792008,62.09,"female","white",12,51 85 | 16899.2184171763,65.83,"female","white",15,71 86 | 999.338097135203,63.78,"female","white",14,42 87 | 23831.5316537746,71.81,"male","white",12,43 88 | 4812.71528146838,67.12,"female","hispanic",8,43 89 | 47742.1252747604,68.19,"male","hispanic",12,36 90 | 47655.590724269,69.76,"male","white",12,37 91 | 16892.6857012084,61.91,"female","white",12,42 92 | 8934.46253020389,63.29,"female","white",13,30 93 | 20099.9218436585,65.48,"female","white",13,67 94 | 32803.3025937337,69.75,"female","white",10,65 95 | 32811.8623722485,68.09,"female","white",12,40 96 | 31765.8967813837,70.21,"male","white",14,27 97 | 1001.00043870299,65.99,"female","white",12,63 98 | 2907.2064170387,66.23,"female","white",12,24 99 | 
2115.67589682381,67.79,"female","white",16,36 100 | 31884.9092385098,69.94,"male","white",16,31 101 | 16905.0220799872,60.75,"female","hispanic",12,26 102 | 48686.3352313777,62.9,"female","white",12,77 103 | 64606.3235215003,63.47,"female","white",14,60 104 | 39751.1940297085,67.14,"male","white",12,93 105 | 16881.7141614142,60.66,"female","white",17,83 106 | 96382.8900908101,69.55,"female","white",18,67 107 | 986.228613755399,63.46,"female","white",12,25 108 | 29618.8251244886,67.83,"female","white",12,70 109 | 26500.2429561617,64.39,"female","white",12,37 110 | 24854.9727381481,67.01,"female","black",14,34 111 | 15885.5195104926,68.79,"male","white",17,27 112 | 53463.6223523582,67.05,"female","white",13,47 113 | 29627.8222039787,67.83,"female","white",12,34 114 | 24843.1748606049,63.03,"female","white",12,41 115 | 33438.2100886863,70.97,"male","white",12,26 116 | 34374.0790977282,63.91,"female","black",17,47 117 | 58834.1668506326,68.22,"male","white",11,41 118 | 61429.1854058301,68.43,"female","white",17,48 119 | 28014.7345560315,65.53,"female","hispanic",14,47 120 | 981.65670365787,66.13,"female","white",12,33 121 | 996.182111287872,66.31,"female","other",16,36 122 | 50924.6583604247,77.12,"male","white",16,34 123 | 44729.6150466756,67.17,"female","white",12,62 124 | 27223.3656953101,62.11,"female","white",12,48 125 | 1007.70988015799,64.5,"female","white",17,34 126 | 996.828601796463,71.96,"female","white",12,55 127 | 40750.0226434449,67.42,"female","white",18,39 128 | 43929.4541868987,61.17,"female","white",18,47 129 | 8949.67101572717,63.6,"female","white",12,32 130 | 112309.660734072,61.91,"female","white",16,42 131 | 8939.55497630854,62.83,"female","white",12,44 132 | 7932.57103394049,67.83,"male","white",16,28 133 | 32800.8023474806,63.71,"female","white",15,30 134 | 6359.00598305609,71.36,"male","white",15,25 135 | 96383.0764233227,64,"female","white",16,39 136 | 8967.87647229583,60.96,"female","white",13,35 137 | 
47691.8764601662,74.21,"male","white",12,42 138 | 111283.259406706,70.25,"male","white",14,39 139 | 79481.9408160514,71.08,"male","white",16,45 140 | 70960.3420883276,62.85,"female","white",12,43 141 | 48722.16771983,63.61,"female","white",14,47 142 | 997.891112569392,62.03,"female","white",16,36 143 | 16912.1555694825,62.08,"female","white",16,44 144 | 37573.2990727913,70.15,"female","white",17,46 145 | 1005.10722008903,62.1,"female","white",12,43 146 | 71548.0876578581,70.76,"male","white",17,66 147 | 23878.8111841365,71.04,"male","white",14,35 148 | 7344.95757908863,69.86,"female","white",14,75 149 | 26877.8701780781,67.02,"male","white",14,35 150 | 48708.525626797,64.16,"female","white",12,36 151 | 44711.1395351231,67.09,"female","white",12,34 152 | 10044.7225146099,62.1,"female","white",8,73 153 | 29611.1767650406,61.76,"female","hispanic",13,60 154 | 68256.8967140838,67.95,"male","black",13,48 155 | 51888.9713088147,62.5,"female","black",14,48 156 | 995.46575719133,66.19,"female","black",5,55 157 | 16887.1457287969,68.16,"female","black",18,60 158 | 95449.6102254534,74.23,"male","white",13,49 159 | 34383.7045204962,68.07,"female","other",12,54 160 | 4807.83133117883,61.74,"female","white",16,26 161 | 2587.63185305423,65.9,"female","white",15,32 162 | 42922.5679351389,68.18,"male","white",12,31 163 | 11485.9945183583,60.18,"female","hispanic",14,32 164 | 25447.465302014,67.55,"male","white",8,47 165 | 143095.852138569,72.09,"male","white",12,30 166 | 13720.5053672839,66.27,"female","white",12,46 167 | 31829.2883253981,67.86,"male","white",10,36 168 | 24858.2269938452,67.47,"female","white",12,22 169 | 20078.9661907246,69.1,"female","white",12,64 170 | 39165.2382046526,64.15,"female","white",16,50 171 | 32794.5406477114,65.02,"female","white",14,43 172 | 31207.4500897519,61.19,"female","white",12,50 173 | 15970.26222705,71.89,"male","white",12,53 174 | 64610.3579806432,66.19,"female","white",16,38 175 | 991.628909680612,61.91,"female","white",10,61 176 | 
995.090910576791,59.91,"female","hispanic",10,52 177 | 39783.9469489965,67.85,"male","white",14,68 178 | 39684.6241268197,70.08,"male","white",12,28 179 | 39724.6811008388,71.13,"male","white",14,36 180 | 30201.5172441044,71.51,"male","white",16,65 181 | 70051.7508949032,68.28,"male","white",16,52 182 | 1007.95714596412,65.81,"female","white",12,23 183 | 1012.93115242453,63.07,"female","white",18,59 184 | 24854.2035992486,60.32,"female","white",14,53 185 | 28036.2730849519,63.06,"female","white",12,40 186 | 38198.250803332,67.61,"male","white",12,60 187 | 37585.5211136113,63.65,"female","white",12,41 188 | 20628.2955187112,70.44,"male","white",12,78 189 | 103375.877816366,67.81,"male","white",16,50 190 | 12129.4641155017,60.95,"female","white",12,67 191 | 63635.6570881354,67.78,"male","white",18,67 192 | 24843.7543318963,66.4,"female","white",17,47 193 | 31795.5788124066,69.7,"male","white",16,29 194 | 31852.2062255474,70.62,"male","white",12,54 195 | 32804.6545498774,64.76,"female","white",12,46 196 | 40747.1407845328,66.06,"female","white",12,53 197 | 77889.5763942784,69.9,"male","white",13,41 198 | 1010.19089949095,63.84,"female","white",12,40 199 | 1009.69186942175,63,"female","white",16,39 200 | 40750.8830526368,61.39,"female","white",12,69 201 | 18480.6824537285,66.26,"female","white",14,61 202 | 26423.1293431361,60.8,"female","white",15,42 203 | 55572.0362132959,72,"male","white",14,47 204 | 198835.433852258,74.22,"male","white",18,49 205 | 37566.5366877765,66.83,"female","white",12,51 206 | 28022.2375146568,66.01,"female","white",16,32 207 | 1016.49101336236,62.09,"female","white",13,35 208 | 43933.83836836,62.21,"female","white",12,45 209 | 111301.925090235,72.23,"male","white",14,36 210 | 56639.2874902506,65.17,"female","white",15,51 211 | 16899.2307482988,60.07,"female","white",12,66 212 | 56646.5578023636,61.91,"female","white",15,43 213 | 24858.2973709503,66.01,"female","white",12,42 214 | 20065.3040015574,64.06,"female","white",12,43 215 | 
12641.1483677478,71.01,"male","white",12,72 216 | 13742.373319619,65.32,"female","white",12,54 217 | 55575.0908182266,73.15,"male","white",18,54 218 | 71500.598919884,68.92,"male","white",11,55 219 | 997.072456516123,64.04,"female","white",12,62 220 | 24845.7392111286,64.09,"female","white",14,26 221 | 24831.1444642002,61.37,"female","white",12,81 222 | 39143.607661738,67.06,"female","white",18,55 223 | 40770.5860160793,61.68,"female","white",13,36 224 | 40764.7785489312,62.82,"female","black",14,58 225 | 31727.703584918,66.64,"male","other",10,32 226 | 39154.2544457864,67.86,"female","white",12,28 227 | 69902.4752117052,72.3,"male","white",16,50 228 | 109708.748822754,69.94,"male","white",12,42 229 | 98535.1068275004,69.96,"male","white",18,48 230 | 50919.6738042205,67.17,"male","black",14,36 231 | 32810.0439602876,62.83,"female","black",12,37 232 | 51892.9423880023,65.68,"female","other",18,39 233 | 40756.069498211,65.9,"female","white",17,54 234 | 270275.894207949,71.07,"male","white",18,49 235 | 1001.22511774436,66.07,"female","black",11,33 236 | 55728.5899158062,66.9,"male","white",16,37 237 | 63516.2793540724,76.48,"male","black",16,42 238 | 52438.922964203,68.71,"male","white",14,34 239 | 29615.6153757967,65.41,"female","white",12,26 240 | 48695.6588978653,61.92,"female","white",13,47 241 | 42341.7892935171,61.97,"female","white",12,35 242 | 8962.40245588917,61.72,"female","white",12,25 243 | 32800.6355705226,62.29,"female","white",16,77 244 | 28011.6417041634,60.95,"female","white",12,34 245 | 51878.8132937651,65.16,"female","white",14,32 246 | 23868.1473946433,72,"male","white",14,26 247 | 981.194900092476,65.92,"female","white",17,45 248 | 79510.340661361,72.13,"male","white",14,59 249 | 13722.141363414,63.85,"female","white",12,43 250 | 63693.4110882131,71.29,"male","white",14,55 251 | 63610.4749987873,72.43,"male","white",15,45 252 | 52085.5294649461,65.85,"male","white",12,38 253 | 32803.4792304162,65,"female","white",14,73 254 | 
58230.5879236984,64.9,"female","black",15,43 255 | 10542.8902094512,61.48,"female","white",12,68 256 | 20095.5458305818,65.97,"female","black",14,38 257 | 95430.6104210465,66.78,"male","white",14,67 258 | 64613.0668228329,64.3,"female","white",13,48 259 | 69383.4258503022,63.28,"female","white",12,42 260 | 72549.3620022462,67.36,"female","white",16,38 261 | 10522.9175006923,67.83,"female","white",18,43 262 | 13718.4543920556,64.51,"female","other",10,86 263 | 32796.9508717817,63.87,"female","black",13,43 264 | 27064.549819231,66.1,"male","black",12,27 265 | 4182.39423494047,65.48,"female","white",12,36 266 | 103364.476394143,64.43,"male","white",12,32 267 | 79469.6826737892,72.13,"male","white",12,45 268 | 998.682558355125,67.92,"female","white",11,58 269 | 998.684495526987,62.69,"female","white",11,33 270 | 18477.7803339031,67.83,"female","white",12,42 271 | 55628.230922933,74.41,"male","white",18,34 272 | 42838.8607714631,63.1,"male","white",12,37 273 | 6565.33123484257,63.75,"female","white",12,65 274 | 66836.7602543026,68.19,"male","white",12,41 275 | 32800.535842363,66.94,"female","white",18,33 276 | 24847.4675748411,66.65,"female","white",12,33 277 | 15859.5314446913,72.86,"male","white",12,25 278 | 1003.64167432293,62.21,"female","white",12,52 279 | 9490.29690448586,69.22,"male","white",8,82 280 | 28032.1868635186,64.95,"female","hispanic",16,27 281 | 55589.0943127064,68.9,"male","hispanic",16,69 282 | 44516.112502923,77.02,"male","white",14,32 283 | 24848.5150235251,68.07,"female","white",12,37 284 | 32819.8518440008,62.2,"female","white",17,28 285 | 32807.4666688853,64.07,"female","white",12,33 286 | 16671.5011254779,66.55,"male","hispanic",12,46 287 | 20550.1150385843,70.19,"male","white",16,26 288 | 16894.1454216893,62.2,"female","white",12,56 289 | 5768.92665868558,62.71,"female","white",12,41 290 | 39157.5819181736,61.99,"female","white",14,33 291 | 998.503387403771,62.81,"female","white",17,45 292 | 28024.6010138162,68.84,"female","white",13,39 293 | 
18494.0961370381,64.28,"female","white",12,52 294 | 1028.7967089839,64.52,"female","white",14,41 295 | 51883.97214454,68.63,"female","white",14,44 296 | 28035.1640643814,59.06,"female","white",15,30 297 | 5764.82640107653,66.36,"female","other",12,39 298 | 1000.43844737019,64.97,"female","white",12,45 299 | 4330.56327434975,62,"female","black",12,25 300 | 5042.87288471185,75.18,"male","black",11,38 301 | 28044.017605739,62.99,"female","white",17,30 302 | 48696.6894978227,65.03,"female","white",13,32 303 | 23793.5962873794,70.02,"male","white",12,36 304 | 39169.7206103304,68.32,"female","white",16,46 305 | 16889.5158551673,65.01,"female","white",14,71 306 | 79469.6524162958,69.22,"male","white",8,57 307 | 79521.0309906084,71.7,"male","white",16,57 308 | 32811.1171608666,63.65,"female","white",17,53 309 | 48692.7045921071,62.84,"female","hispanic",18,58 310 | 1000.81650997941,63.41,"female","white",12,67 311 | 1019.52175123961,65.4,"female","white",10,52 312 | 986.055083445869,63.07,"female","white",5,62 313 | 35986.0131003784,73.2,"female","white",14,49 314 | 42903.3817816247,74.77,"male","white",12,28 315 | 7363.89520150288,65.14,"female","white",12,42 316 | 28828.8892012243,66.55,"female","white",17,30 317 | 27220.0746330027,65.93,"female","white",14,32 318 | 45524.391056089,63.91,"female","white",16,43 319 | 83682.9635299098,66.1,"female","white",18,54 320 | 24854.2862407647,63.68,"female","white",16,33 321 | 31210.0450199533,62.71,"female","white",13,36 322 | 998.792806987374,62.75,"female","white",13,48 323 | 42834.4504315573,72.72,"male","white",12,27 324 | 23793.0568258544,72,"male","white",8,62 325 | 1004.68819257564,66.06,"female","white",12,28 326 | 23873.5673156157,71.94,"male","white",14,37 327 | 24035.958990178,63.9,"female","white",12,59 328 | 39149.8127118589,64.24,"female","white",13,34 329 | 28581.7197594866,68.16,"male","white",15,40 330 | 7351.36224357467,61.96,"female","white",12,72 331 | 7350.34389591184,68.48,"female","white",13,61 332 | 
2098.28255691469,59.46,"female","white",12,40 333 | 39169.7501351968,64.79,"female","white",12,95 334 | 42926.3869579729,76.09,"male","white",12,43 335 | 20091.4931673901,71.41,"female","white",12,39 336 | 35985.2005008923,62.08,"female","white",18,51 337 | 48698.2042696945,62.21,"female","white",13,40 338 | 55616.9724786137,71.55,"male","white",12,65 339 | 31836.5309592245,73.79,"male","white",12,46 340 | 51897.557115687,65.16,"female","white",16,36 341 | 10565.080348061,62.79,"female","other",12,35 342 | 20084.3631443331,66.52,"female","white",12,43 343 | 989.32483001483,65.05,"female","white",16,48 344 | 16893.650618774,62.93,"female","white",12,35 345 | 2583.37635997833,65.01,"female","white",12,22 346 | 20057.9469932086,63.83,"female","white",15,75 347 | 26434.5356297357,65.19,"female","white",12,40 348 | 990.632829899077,67.9,"female","white",16,45 349 | 40730.7869181018,62.87,"female","white",16,37 350 | 40751.8595966049,69.96,"female","white",12,31 351 | 50933.4879483364,68.62,"male","white",12,34 352 | 55621.9374127112,66.75,"male","white",14,47 353 | 10538.8367190333,65.05,"female","white",12,32 354 | 127210.627675111,69.39,"male","white",14,48 355 | 32799.5880325174,67.7,"female","white",14,29 356 | 2598.33597751564,66.37,"female","white",17,42 357 | 44502.6875350514,71.38,"male","hispanic",13,42 358 | 41290.7032034576,63.87,"male","hispanic",16,27 359 | 42946.1384264164,74.16,"male","white",12,28 360 | 29617.1242132711,61.92,"female","white",13,45 361 | 4203.92493626903,64.84,"female","white",12,63 362 | 31774.6724028254,69.09,"male","white",12,70 363 | 10547.6593411727,66.08,"female","white",16,52 364 | 998.222937309824,60.16,"female","white",10,72 365 | 39793.2360685558,66.6,"male","white",11,67 366 | 19083.0662723967,72.94,"male","white",12,33 367 | 982.603707126398,65.13,"female","white",18,30 368 | 19019.4383518021,76.02,"male","white",12,61 369 | 47621.5506990729,70.14,"male","white",14,80 370 | 1002.02789557595,64.18,"female","white",12,34 371 | 
7352.90609561772,63,"female","black",15,26 372 | 42344.141216902,62.96,"female","black",13,41 373 | 36597.1185121691,70.04,"male","black",12,39 374 | 33372.9363140627,71.27,"male","white",12,37 375 | 40755.6601213819,65.19,"female","white",12,66 376 | 28578.7772496041,69.48,"male","white",16,79 377 | 7372.74244873303,63.48,"female","white",13,48 378 | 39006.1541350665,64.11,"female","white",12,26 379 | 56653.7613128414,63.44,"female","white",18,44 380 | 41312.4117058859,67.15,"male","white",14,34 381 | 39767.884119698,65.82,"male","white",16,40 382 | 38374.9264949555,68.18,"female","white",13,31 383 | 20079.9007989922,64.7,"female","white",12,26 384 | 24856.3852622375,66.85,"female","white",12,24 385 | 4771.49798475468,72.62,"male","white",17,27 386 | 10538.599791571,63.68,"female","white",12,71 387 | 22284.6898628754,70.77,"male","white",12,73 388 | 24851.1945376469,63.17,"female","white",16,35 389 | 278213.531775569,71.05,"male","white",16,52 390 | 1012.72857450333,62.7,"female","white",12,55 391 | 1605.96493256327,72.85,"male","white",18,29 392 | 16899.9838545566,64.92,"female","white",14,67 393 | 71575.5885980481,65.94,"male","white",13,86 394 | 24854.2581843483,66.21,"female","white",12,85 395 | 32804.552369449,59,"female","white",13,45 396 | 55626.7507266474,69.9,"male","white",16,34 397 | 63619.6597966112,71.96,"male","white",16,32 398 | 56660.0517213774,64.27,"female","white",12,76 399 | 80491.7059230187,65.48,"female","white",16,58 400 | 159001.929910422,69.42,"male","white",18,61 401 | 55744.0702635899,69.09,"male","white",13,32 402 | 1019.39370328577,65.85,"female","white",12,55 403 | 39166.0892107678,68.19,"female","white",12,39 404 | 994.954167707421,64.15,"female","white",12,34 405 | 1016.64167481653,63.06,"female","white",12,64 406 | 55597.0775538434,73.86,"male","white",17,47 407 | 60429.4216733453,72.55,"male","white",16,78 408 | 47666.4694402061,67.35,"male","white",12,56 409 | 8958.40683258562,64.3,"female","hispanic",12,43 410 | 
235388.719222562,66.72,"male","white",18,42 411 | 47782.4465201032,69.45,"male","white",12,36 412 | 11338.9470357351,63.76,"female","white",15,35 413 | 5765.71378324095,61.71,"female","white",16,41 414 | 986.455165422203,65.94,"female","white",8,57 415 | 36713.0878336217,69.45,"male","white",18,73 416 | 63544.160094039,67.69,"male","white",17,45 417 | 47713.2746225333,71.77,"male","white",14,43 418 | 998.346114761212,63.09,"female","white",14,33 419 | 23257.5590577062,63.09,"female","white",12,41 420 | 24843.7923614691,68.86,"female","white",12,38 421 | 42347.4010321721,63.75,"female","white",18,40 422 | 13720.5614863724,71.45,"female","white",12,29 423 | 39159.796830908,63.98,"female","black",16,31 424 | -12.0167137832365,72.08,"male","white",8,24 425 | 1013.27147701205,62.27,"female","white",12,29 426 | 988.758787012582,65.93,"female","white",10,42 427 | 8181.73028203835,65.8,"male","white",8,71 428 | 16889.8103273437,60.72,"female","white",15,29 429 | 1016.50273906467,69.1,"female","white",13,23 430 | 20100.2642571535,68.3,"female","white",16,36 431 | 999.845388435426,65.9,"female","white",12,27 432 | 79521.0347468327,67.87,"male","white",17,38 433 | 37551.8483983486,63.61,"female","white",13,30 434 | 1016.37310656959,61,"female","white",16,44 435 | 63578.5962735478,65.98,"male","black",14,38 436 | 10543.6518922358,61.86,"female","black",13,30 437 | 8955.40314951583,64.86,"female","black",16,34 438 | 174917.848536018,65.81,"male","white",18,41 439 | 65089.763788033,72.05,"male","hispanic",13,36 440 | 37562.8527389814,62.95,"female","white",13,82 441 | 1000.54221572545,63.1,"female","white",12,36 442 | 33354.9444670964,69.91,"male","white",12,27 443 | 7358.5069793707,62.02,"female","white",12,33 444 | 39763.7660650055,71.11,"male","white",12,40 445 | 47665.4186748928,69.84,"male","white",18,50 446 | 23255.7192646579,64.04,"female","white",11,55 447 | 10546.2705351933,63.34,"female","white",12,39 448 | 23266.3819801947,65.03,"female","white",12,50 449 | 
68461.3756361159,70.81,"male","white",12,31 450 | 40750.2315198708,61.87,"female","white",15,37 451 | 63533.925247467,73.05,"male","white",12,70 452 | 103313.622213574,67.86,"male","white",17,44 453 | 25439.9094368504,76.18,"male","white",15,36 454 | 13715.9034686946,65.99,"female","white",12,75 455 | 32795.5150089862,64.02,"female","white",15,43 456 | 23259.2826084242,60.01,"female","white",14,76 457 | 71637.552888116,67.86,"male","white",12,78 458 | 13712.7662216597,61.18,"female","white",11,76 459 | 19065.5179466674,66.13,"male","white",12,32 460 | 13735.4075238793,63.02,"female","white",14,35 461 | 26440.4988768348,62.85,"female","white",15,36 462 | 9579.8456488705,73.86,"male","white",15,77 463 | 30173.6300852123,69.92,"male","white",12,36 464 | 34396.1596986703,68.26,"female","white",13,43 465 | 69369.6365028672,69.64,"female","white",18,50 466 | 55664.3998002978,69.15,"male","white",15,25 467 | 12747.0423371962,69.42,"male","white",10,79 468 | 34385.4430522478,62.74,"female","white",14,45 469 | 10233.0797957323,65.15,"female","white",12,34 470 | 27082.7070995351,68.85,"male","white",12,50 471 | 38087.3220223372,68.15,"male","white",14,53 472 | 8935.97929440994,62.14,"female","white",11,75 473 | 18489.6235991451,65.91,"female","white",12,27 474 | 15920.5922048939,65.15,"male","white",13,31 475 | 63608.3668031832,70.11,"male","white",14,40 476 | 993.515449224117,66,"female","black",12,41 477 | 63604.6309830045,70.06,"male","white",12,51 478 | 39163.5186521581,66.36,"female","black",14,33 479 | 38187.4073424178,68.23,"male","white",16,30 480 | 31761.6801664963,67.77,"male","white",16,37 481 | 95330.5448443733,69.87,"male","white",18,54 482 | 18482.8594515788,65.28,"female","white",13,51 483 | 8950.51549317265,63.91,"female","white",15,44 484 | 43919.1420419723,64.88,"female","white",16,34 485 | 141452.613413474,77.21,"male","white",16,45 486 | 63665.5948402963,70.36,"male","white",14,32 487 | 24859.8151722343,64.15,"female","white",16,30 488 | 
2581.0429400068,63.83,"female","white",14,24 489 | 998.726796716888,62.04,"female","white",12,32 490 | 35995.1803431969,64.2,"female","white",12,64 491 | 20086.1754557979,63.93,"female","white",8,52 492 | 35968.8432808826,63.09,"female","white",16,32 493 | 64581.9520540544,57.94,"female","black",12,60 494 | 994.323952930143,66,"female","white",12,62 495 | 64599.9564039253,62.1,"female","white",16,48 496 | 995.486049534476,60.97,"female","white",13,46 497 | 56649.7673340898,63.61,"female","hispanic",14,57 498 | 12129.4184092628,64.22,"female","white",12,32 499 | 23257.4531814882,61.35,"female","white",12,41 500 | 24849.2645102664,64.06,"female","white",12,60 501 | 997.060240519208,66.31,"female","black",14,35 502 | 10536.8164690924,66.3,"female","white",13,38 503 | 18491.4632477733,68.48,"female","black",12,45 504 | 16894.8715844397,63.71,"female","white",13,35 505 | 1004.08167170771,62.42,"female","white",12,63 506 | 31776.7508647511,71.11,"male","white",14,45 507 | 12132.2731355026,62.1,"female","white",12,77 508 | 60439.3840685454,67.95,"male","white",9,50 509 | 46075.0353918281,67.96,"male","white",9,62 510 | 32794.3325793405,62.48,"female","white",10,48 511 | 13726.4773008127,63,"female","white",12,61 512 | 57180.2849042573,69.28,"male","white",16,55 513 | 3396.41078804734,67.46,"female","white",12,48 514 | 67766.0710226514,63.84,"female","white",18,33 515 | 39705.6238023595,69.96,"male","white",12,26 516 | 20670.4756812433,69.3,"male","white",10,24 517 | 47728.0991250952,69.4,"male","black",11,55 518 | 11260.2857898754,69.05,"male","white",12,27 519 | 50295.0102290679,59.96,"female","black",12,52 520 | 24861.2188259363,66.62,"female","black",12,46 521 | 987.153288063948,59.86,"female","white",18,45 522 | 79449.0780155172,66.77,"male","white",16,53 523 | 7902.16681412268,66.39,"male","white",15,24 524 | 1955.16818716443,69.87,"female","black",12,22 525 | 23250.1981515703,63.89,"female","black",8,55 526 | 64598.4562971162,59.72,"female","other",18,39 527 | 
38212.8861850698,68.14,"male","white",18,40 528 | 35071.3664810722,73.49,"male","black",12,28 529 | 31732.0084520613,71.98,"male","white",8,40 530 | 32973.46426248,63.69,"female","white",12,42 531 | 19100.6126796045,66.85,"male","white",13,71 532 | 66201.6869744607,66.88,"female","white",13,44 533 | 20636.4454407762,70.68,"male","black",9,54 534 | 1016.75164962538,63.23,"female","white",12,29 535 | 95424.9084378097,73.21,"male","white",16,35 536 | 38084.9099393145,69.94,"male","white",16,29 537 | 60380.5619633035,69.85,"male","white",16,40 538 | 5781.36536737211,71.49,"female","white",14,31 539 | 21666.0178506817,66.05,"female","white",12,34 540 | 7893.44745525963,75.04,"male","white",12,25 541 | 32801.7096211093,59.83,"female","white",14,41 542 | 34894.9420985275,71.97,"male","white",12,32 543 | 990.366789688609,63.08,"female","black",16,27 544 | 44509.7431378579,75.11,"male","black",9,59 545 | 35983.4557169327,59.68,"female","white",12,45 546 | 47725.9251418864,72.54,"male","black",12,48 547 | 48687.901309322,64.93,"female","black",12,36 548 | 47617.1421807931,67.12,"male","white",16,35 549 | 51881.1137290279,60.77,"female","white",12,73 550 | 42328.8007934413,69,"female","white",17,47 551 | 35974.9467608756,64.05,"female","white",14,38 552 | 39662.96205847,70.18,"male","white",12,37 553 | 18481.473822349,68.24,"female","white",10,28 554 | 21658.7263654599,62.09,"female","white",12,57 555 | 16913.3676793106,64.15,"female","white",15,46 556 | 1002.3923362722,67.76,"female","white",12,28 557 | 20068.0607328364,68.05,"female","white",12,49 558 | 56.3781338622286,66.18,"male","white",13,61 559 | 12123.3789274287,64.22,"female","white",11,68 560 | 90527.2378218376,72.92,"male","white",14,41 561 | 57196.6301938738,73.38,"male","white",16,41 562 | 46069.1445429611,68.71,"male","hispanic",14,33 563 | 13727.1703803878,67.87,"female","white",13,23 564 | 4831.58925740444,71.34,"male","black",11,22 565 | 990.536028389992,63.2,"female","white",12,68 566 | 
32809.7265515796,66.19,"female","white",16,45 567 | 995.475488386213,65.81,"female","white",12,40 568 | 52457.2959876668,66.22,"male","white",12,61 569 | 24864.5362144608,68.26,"female","hispanic",12,72 570 | 25420.1296221833,69.66,"male","white",12,37 571 | 8949.60197006944,62.17,"female","white",12,82 572 | 13733.774868634,65.98,"female","white",14,47 573 | 51874.5737440326,63,"female","black",16,38 574 | 23102.781191952,74.97,"male","white",12,28 575 | 21676.2738975494,66.86,"female","white",12,33 576 | 1008.33879197347,62.22,"female","white",10,24 577 | 12134.4467282926,63.24,"female","white",9,29 578 | 80502.5320404378,65.44,"female","black",18,69 579 | 20075.1124669118,68.84,"female","black",12,39 580 | 30282.4673070547,71.33,"male","white",14,65 581 | 15324.3810807192,62.23,"female","white",12,59 582 | 32794.1983616288,64.92,"female","black",12,39 583 | 87392.6632009754,72.34,"male","hispanic",18,39 584 | 16890.3326960442,65.12,"female","white",12,44 585 | 21661.0075017285,62.9,"female","white",12,23 586 | 32809.6662173545,64.34,"female","white",13,35 587 | 47708.8917890766,71.9,"male","white",12,30 588 | 20074.666638086,64.39,"female","white",12,38 589 | 42347.1664095893,70.17,"female","white",13,37 590 | 20882.6809309634,65.09,"female","white",12,31 591 | 13706.6343035933,64.38,"female","white",12,33 592 | 24851.8480250879,64.12,"female","hispanic",16,36 593 | 103274.093507014,67.05,"male","white",14,29 594 | 39730.4007784177,73.13,"male","white",10,25 595 | 12735.5841473359,73.06,"male","hispanic",14,25 596 | 23877.3476332195,65.08,"male","other",14,25 597 | 20079.3061155936,64.49,"female","white",16,44 598 | 56642.756541816,66.91,"female","black",13,41 599 | 63557.9760687751,66.58,"male","white",16,35 600 | 46114.9940451608,74.49,"male","white",12,26 601 | 24856.1676094287,62.84,"female","white",13,35 602 | 31818.6236230481,74.17,"male","white",12,40 603 | 4706.62072409605,73.86,"male","black",6,66 604 | 2596.37371508954,63,"female","black",16,23 605 | 
13711.4490881715,62.37,"female","black",11,27 606 | 8962.85062670136,61.96,"female","white",12,22 607 | 40751.6454536098,66.75,"female","white",16,31 608 | 31860.2747237421,69.1,"male","white",12,61 609 | 47719.3361845415,73.1,"male","white",12,38 610 | 20084.3187707433,63.03,"female","white",12,39 611 | 1009.5439125996,60.19,"female","white",12,38 612 | 16895.3990352936,67.24,"female","white",14,36 613 | 25455.211936639,68.73,"male","white",12,23 614 | 71550.4036773232,70.95,"male","white",12,33 615 | 63567.3694572935,72.35,"male","white",14,58 616 | 991.402309767381,62.95,"female","white",12,66 617 | 39688.5905654115,70.55,"male","white",14,85 618 | 31203.6566167839,63.75,"female","white",14,56 619 | 28969.5085549037,64.52,"female","white",16,34 620 | 11052.2685502736,68.13,"male","white",12,71 621 | 47697.1018967501,74.14,"male","white",12,36 622 | 29620.4026862233,62.11,"female","white",11,51 623 | 40743.0231380484,64.81,"female","white",16,51 624 | 1001.39285180773,66.42,"female","white",14,59 625 | 10535.5109396086,59.93,"female","white",5,66 626 | 111355.582492486,68.96,"male","white",18,46 627 | 24855.0354935343,61.2,"female","other",16,38 628 | 40753.3280697748,65.17,"female","white",17,37 629 | 56670.3273867862,65.88,"female","white",16,32 630 | 28615.2800070585,70.22,"male","white",16,28 631 | 44568.5809689135,72.07,"male","black",15,34 632 | 24860.3637743202,63.94,"female","hispanic",11,25 633 | 34957.1221747868,71.36,"male","hispanic",14,32 634 | 1007.21194874105,62.74,"female","white",12,33 635 | 55693.8337608029,72.86,"male","white",12,34 636 | 29624.8547880902,64.52,"female","black",15,50 637 | 55760.7664723291,67.86,"male","white",14,41 638 | 27068.2291862713,63.7,"female","black",12,25 639 | 34920.5390609967,66.03,"male","black",16,41 640 | 48697.0600361851,67.18,"female","white",16,62 641 | 28032.2062659646,64.07,"female","hispanic",12,31 642 | 40745.341702208,64.09,"female","black",14,40 643 | 16899.7657635547,61.94,"female","white",12,55 644 | 
63679.4040953726,69.43,"male","white",12,41 645 | 20859.1253422943,62.89,"female","white",12,67 646 | 55622.579497519,71.29,"male","white",15,49 647 | 984.500213531472,61.15,"female","hispanic",16,27 648 | 166981.025077615,75.23,"male","white",12,53 649 | 159024.205461937,70.35,"male","white",18,44 650 | 33381.7739759541,68.61,"male","white",12,55 651 | 16898.5850156759,63.93,"female","white",14,60 652 | 52454.8022134593,72.05,"male","white",12,30 653 | 41354.4324383427,68.64,"male","white",18,29 654 | 96398.2293678427,62.95,"female","white",10,82 655 | 28625.3297430993,71.44,"male","white",16,49 656 | 24843.1804714277,58.97,"female","white",8,65 657 | 8031.69997754504,68.31,"male","white",13,24 658 | 95370.5559507232,76.73,"male","white",17,46 659 | 20078.7017904862,67.96,"female","white",14,73 660 | 44610.2025899329,70.28,"male","black",18,42 661 | 66786.5507838122,69.29,"male","white",18,47 662 | 4194.42425623004,63.43,"female","white",12,37 663 | 31855.1033379246,73.14,"male","black",18,37 664 | 44600.1770620232,66.41,"male","other",15,44 665 | 15997.5455999182,72.1,"male","black",17,66 666 | 69.6914927913448,65.97,"male","white",16,69 667 | 49365.2190022763,72.29,"male","black",12,35 668 | 28014.3209098335,69.15,"female","black",13,41 669 | 26427.6756161573,67.11,"female","white",12,43 670 | 56644.6466978089,65.88,"female","white",15,42 671 | 5775.40581158023,67.13,"female","white",12,30 672 | 26443.0930466471,68.29,"female","white",18,43 673 | 56659.1888329205,64.77,"female","black",15,46 674 | 43928.4164617031,62.55,"female","black",17,37 675 | 39716.9277091473,68.99,"male","white",17,33 676 | 1013.14945943267,62.04,"female","white",14,82 677 | 63549.4462474664,70.38,"male","white",16,45 678 | 72559.7829905608,66.23,"female","white",16,47 679 | 22225.1108881337,65.96,"male","white",12,27 680 | 40739.3407716587,66.04,"female","white",14,41 681 | 63629.7998156167,71.66,"male","white",12,46 682 | 54097.4886990852,68.61,"male","white",12,39 683 | 
63563.485216593,65.94,"male","white",12,38 684 | 55702.3014545118,69.1,"male","white",14,32 685 | 20079.0535040621,59.9,"female","hispanic",12,55 686 | 48710.9197381887,64.89,"female","white",17,47 687 | 37562.0541780795,61.81,"female","white",12,44 688 | 4169.61756189437,66.73,"female","white",12,34 689 | 31923.2309000489,67.11,"male","white",14,29 690 | 16911.0248263552,68.29,"female","black",9,66 691 | 996.67215235483,64.96,"female","black",6,71 692 | 96404.2353809042,65.97,"female","black",16,47 693 | 19051.340409669,74.3,"male","black",16,45 694 | 40745.0728919601,65.19,"female","white",18,65 695 | 35966.4027951844,62.73,"female","black",14,77 696 | 998.612844261211,61.64,"female","white",12,42 697 | 12787.9070555997,67.82,"male","white",17,29 698 | 4195.33627740169,65.65,"female","white",16,25 699 | 30577.5151842098,64.93,"female","white",16,30 700 | 32797.7152048203,62.44,"female","white",14,24 701 | 158979.09115076,72.74,"male","white",18,41 702 | 80508.0900964618,61.91,"female","white",15,44 703 | 42906.4330216585,69.7,"male","white",12,46 704 | 2609.64069289783,57.34,"female","black",12,62 705 | 28564.2913953672,67.72,"male","white",12,24 706 | 28025.9975550003,66.98,"female","white",14,27 707 | 16886.8514579247,64.44,"female","white",12,78 708 | 22206.6211632941,69.86,"male","white",14,39 709 | 11491.8443763099,72.53,"male","white",14,70 710 | 1023.98892050724,62.99,"female","other",8,56 711 | 53464.7050486639,63.3,"female","white",17,43 712 | 36578.6173669224,65.62,"male","white",12,38 713 | 16888.7393946248,64.78,"female","white",12,45 714 | 34939.3171908471,61.12,"male","white",12,28 715 | 21659.5418912689,60.56,"female","hispanic",12,23 716 | 53459.7469784682,66.74,"female","white",12,35 717 | 57150.2428716162,72.79,"male","white",12,40 718 | 16898.6413084924,71.17,"female","black",14,36 719 | 10547.8916527515,64.19,"female","white",12,64 720 | 20090.3525656537,67.44,"female","white",12,47 721 | 16900.9336464029,61.76,"female","white",11,51 722 | 
34400.7901688741,67.26,"female","white",12,49 723 | 41363.6776236564,69.02,"male","white",12,36 724 | 35991.8855411288,65.97,"female","black",12,54 725 | 37571.3742815177,65.72,"female","black",15,58 726 | 31883.2552493761,66.22,"male","white",15,28 727 | 16910.1847964277,65.99,"female","white",13,42 728 | 24824.3894868874,63.89,"female","white",12,75 729 | 11207.8912675841,71.87,"male","black",14,28 730 | 24859.4210423322,63.83,"female","white",12,42 731 | 987.477982853054,64.33,"female","white",12,64 732 | 42339.8421394851,64.83,"female","white",16,38 733 | 15961.4273011639,71.79,"male","white",11,35 734 | 982.925263925835,63.12,"female","white",8,61 735 | 4753.87988315343,71.2,"male","white",14,24 736 | 43913.668813905,63.72,"female","hispanic",17,45 737 | 45521.6194017236,64,"female","white",17,29 738 | 24848.1063279448,66.95,"female","white",12,31 739 | 87473.4075168657,75.04,"male","white",18,34 740 | 32823.4149412493,65.68,"female","hispanic",17,30 741 | 14525.0653532774,67.11,"female","white",12,48 742 | 999.7960302637,68.09,"female","white",12,62 743 | 47682.5461562175,71.41,"male","white",16,34 744 | 1002.43982783669,61.88,"female","white",12,70 745 | 32801.265467448,67.96,"female","white",12,35 746 | 34947.9182367206,69.77,"male","white",6,79 747 | 39823.374123275,67.17,"male","white",16,30 748 | 977.095394374498,63.77,"female","white",16,66 749 | 4758.44302121065,70.99,"male","white",8,26 750 | 26437.6534741544,62.96,"female","white",12,53 751 | 1012.99331167704,64.14,"female","white",12,61 752 | 16878.8349959772,60.61,"female","black",12,43 753 | 24854.9424481381,65.34,"female","black",12,32 754 | 44513.8048948285,66.96,"male","white",12,50 755 | 50014.1224657651,65.18,"male","white",14,39 756 | 150967.87829962,66.3,"male","white",18,56 757 | 60411.7997400031,66.64,"male","white",12,44 758 | 47660.9252116808,73.89,"male","white",12,45 759 | 57055.5653429877,73.95,"male","white",16,46 760 | 20096.2174172541,66.07,"female","white",13,26 761 | 
16908.8113233328,69.12,"female","white",16,29 762 | 72543.836019343,66.05,"female","white",18,48 763 | 63581.3456537451,74.27,"male","white",16,60 764 | 34293.320938979,71.96,"male","white",12,31 765 | 22149.5427565164,66.26,"male","white",12,66 766 | 60492.1212305534,67.04,"male","other",17,58 767 | 2581.8704016655,64.79,"female","white",12,22 768 | 22241.3709490266,65.29,"male","white",12,77 769 | 24837.2986927913,64.98,"female","white",12,35 770 | 1017.15996053443,62.58,"female","hispanic",12,67 771 | 39167.7457051747,68.21,"female","white",13,46 772 | 8161.62386130716,64.15,"female","white",16,27 773 | 29627.8442660978,63.68,"female","white",16,67 774 | 23260.5391564021,63.15,"female","white",12,24 775 | 16903.3551430115,69.06,"female","white",12,32 776 | 16108.242172076,66.91,"female","white",12,36 777 | 18516.6837641757,63.1,"female","white",15,70 778 | 27100.834976173,66.96,"male","white",12,71 779 | 10559.4810739797,60.23,"female","white",13,32 780 | 4972.0353918892,62.24,"female","white",11,62 781 | 63505.6743924597,71.88,"male","white",16,45 782 | 996.561743690155,67.74,"female","white",14,39 783 | 39706.4175846121,66.88,"male","white",16,43 784 | 45523.4974026761,66,"female","black",15,64 785 | 53471.4323638042,66.07,"female","black",12,49 786 | 7370.75164242468,59.64,"female","white",8,68 787 | 981.628981126362,67.19,"female","white",12,33 788 | 6528.93768232502,65.47,"male","white",16,34 789 | 40752.142789777,69.11,"female","white",12,86 790 | 50863.956748591,72.15,"male","white",12,32 791 | 21674.3140627863,63.41,"female","white",8,62 792 | 12698.1310972785,63.81,"male","hispanic",8,31 793 | 987.960796526986,61.66,"female","other",17,30 794 | 993.15806525758,64.93,"female","white",13,60 795 | 28035.9398261196,64.08,"female","white",12,42 796 | 24847.9216110984,67.41,"female","white",12,81 797 | 28590.2219778065,65.71,"male","white",13,36 798 | 19119.2497606979,73.4,"male","white",12,28 799 | 11062.216631806,73.87,"male","white",7,53 800 | 
1006.76154115774,65.63,"female","white",12,28 801 | 2909.93805759301,65.08,"female","white",14,26 802 | -27.8768193545869,72.29,"male","white",12,22 803 | 1000.22150357482,64.09,"female","white",12,22 804 | 22329.158952236,62.91,"male","white",12,25 805 | 28028.8492187587,62.85,"female","white",12,44 806 | 47765.3406387636,66.04,"male","white",12,46 807 | 66.5428358774727,72.38,"male","white",10,67 808 | 39678.5326981348,70.12,"male","white",12,57 809 | 35978.7787242601,66.07,"female","hispanic",12,37 810 | 24857.016024841,62.92,"female","white",12,25 811 | 42963.3620053351,72.94,"male","white",12,95 812 | 32796.4245622424,63.26,"female","white",16,30 813 | 15292.2967766704,68.74,"female","white",11,38 814 | 994.593757949998,66.3,"female","white",14,28 815 | 27055.6866263278,72.35,"male","white",12,34 816 | 3843.76286582021,67.86,"female","white",12,50 817 | 29620.1650898871,64.07,"female","white",12,50 818 | 40742.5641452661,63.38,"female","white",18,46 819 | 31844.6518626703,70.34,"male","black",16,45 820 | 120240.278171623,63.8,"female","white",15,44 821 | 6555.89914734581,66.3,"female","white",12,44 822 | 58227.6431745198,60.76,"female","white",12,75 823 | 47685.7793442066,72.2,"male","white",12,29 824 | 40734.3058021157,64.77,"female","white",13,82 825 | 2593.55740323646,63.79,"female","white",14,30 826 | 95457.421242717,67.82,"male","white",12,70 827 | 55645.5778949582,73.86,"male","white",13,80 828 | 14214.3087953851,67.14,"male","white",13,37 829 | 55712.3484320936,70.13,"male","white",9,88 830 | 18488.0224412708,66.21,"female","white",10,62 831 | 19131.3555361695,70.93,"male","white",12,56 832 | 47724.5834057795,70.96,"male","white",12,47 833 | 30251.8926549142,71.1,"male","black",7,61 834 | 14176.3872529867,73.37,"male","white",9,65 835 | 47779.8599237978,75.01,"male","white",16,33 836 | 988.016236486753,62.18,"female","white",12,37 837 | 14300.4605801796,71.72,"male","white",12,36 838 | 24859.7960733726,65.36,"female","hispanic",16,34 839 | 
13712.5235224073,64.8,"female","black",12,35 840 | 16884.77431914,65.91,"female","white",10,79 841 | 52499.2752680738,72.14,"male","white",12,41 842 | 20095.9029830232,59.98,"female","hispanic",12,32 843 | 29623.7009005662,64.12,"female","white",15,41 844 | 48698.5250501847,65,"female","white",13,65 845 | 20884.8217696475,64.94,"female","white",12,36 846 | 16892.8740662929,62.1,"female","black",12,40 847 | 13622.6093393889,67.76,"male","black",13,75 848 | 33357.2798994404,72.91,"male","black",14,50 849 | 1003.35472447434,62.98,"female","white",12,33 850 | 34397.6269652435,63.58,"female","white",12,81 851 | 1000.47251325349,65.81,"female","white",13,28 852 | 34389.1320382317,67.92,"female","white",17,29 853 | 1003.70654303727,63.1,"female","white",13,43 854 | 8951.3872765461,65.83,"female","white",12,56 855 | 35977.614797689,70.85,"female","white",17,40 856 | 47746.0565649852,72.99,"male","white",14,30 857 | 1010.22903182616,64.17,"female","white",12,32 858 | 196568.589166955,61.32,"female","white",14,62 859 | 13720.6740585699,62.68,"female","white",9,49 860 | 1012.3586518629,63.12,"female","white",9,58 861 | 994.513680698742,61.78,"female","white",8,60 862 | 4180.54604546177,64.29,"female","white",9,29 863 | 13080.2480579603,67.15,"female","white",6,66 864 | 1002.02384296223,66.59,"female","white",12,22 865 | 992.171547840471,60.73,"female","white",18,49 866 | 16908.1568753555,63.28,"female","white",12,29 867 | 55636.6222369252,73.11,"male","white",12,52 868 | 24852.5097234656,66.06,"female","white",16,61 869 | 7368.26610617647,70.2,"female","white",12,26 870 | 25499.5396603811,68.08,"male","white",8,28 871 | 1314.20587405782,66.84,"female","white",12,37 872 | 36623.3121917525,74.3,"male","white",16,29 873 | 26425.9974604264,64.28,"female","white",16,31 874 | 982.107472627729,61.11,"female","white",12,31 875 | 996.918499717436,66.34,"female","white",12,56 876 | 10560.7163277053,59.71,"female","white",12,53 877 | 1939.87257570583,59.32,"female","white",9,50 878 | 
995.580464467877,62.15,"female","black",16,40 879 | 996.243150686733,61.78,"female","black",14,65 880 | 1003.80414608848,61.56,"female","white",12,29 881 | 44447.4099566408,74.83,"male","other",16,38 882 | 7371.86019531202,65.92,"female","white",8,70 883 | 23907.5056799262,64.37,"male","white",12,39 884 | 23797.7513516587,71.15,"male","white",12,29 885 | 999.242958614164,60.96,"female","white",10,39 886 | 40757.5684534422,63.02,"female","white",12,34 887 | 24847.4599106837,67.2,"female","white",12,30 888 | 63548.3136161029,69.24,"male","white",11,48 889 | 47707.0118820173,69.64,"male","white",11,55 890 | 8960.2545578515,66.11,"female","white",12,33 891 | 55645.6818546773,73.98,"male","white",14,38 892 | 26447.0035703518,65.03,"female","white",14,35 893 | 45509.8650510557,63.49,"female","white",16,44 894 | 31750.6770282697,68.1,"male","white",15,46 895 | 33347.263714708,66.19,"male","white",13,34 896 | 27048.7526856383,69.33,"male","black",12,46 897 | 39167.6226009284,59.75,"female","white",13,54 898 | 40742.7691523491,60.61,"female","white",12,79 899 | 63566.2328416663,74.98,"male","white",12,50 900 | 998.767187740165,63.01,"female","hispanic",9,27 901 | -98.5804890739183,69.12,"male","white",13,24 902 | 993.102442046512,66.14,"female","white",16,67 903 | 13718.6298924513,63.71,"female","white",13,23 904 | 31214.718656156,63.03,"female","hispanic",15,49 905 | 72540.167389067,66.94,"female","white",16,73 906 | 7847.48828249529,64.33,"male","black",17,37 907 | -0.964888977907029,72.17,"male","white",14,25 908 | 39165.6797373926,62.96,"female","black",14,44 909 | 8939.87426566371,67.04,"female","white",14,23 910 | 52519.6301914733,66.08,"male","white",16,38 911 | 39148.2239673475,64.56,"female","white",14,28 912 | 10524.3333363139,63.36,"female","white",13,27 913 | 144086.409334098,65.88,"female","white",14,59 914 | 980.298999532855,64.16,"female","white",16,49 915 | 91628.4462828987,64.19,"female","white",18,51 916 | 21922.3163311986,70.93,"male","white",14,58 917 | 
47648.579482433,70.72,"male","white",18,41 918 | 23809.4477379481,71.85,"male","white",12,44 919 | 24860.6466090689,59.96,"female","white",14,79 920 | 49236.364512364,69.77,"male","white",12,33 921 | 52421.8265274802,73.93,"male","white",17,44 922 | 47707.8481763399,67.96,"male","white",18,65 923 | 19150.3369351537,74.1,"male","white",12,42 924 | 8012.3490716726,74.88,"male","white",12,26 925 | 33388.2987658794,67.89,"male","white",12,63 926 | 45529.9389713226,62.98,"female","white",18,50 927 | 19031.9000979283,71.81,"male","white",12,35 928 | 31207.4749542588,63.68,"female","white",12,50 929 | 10557.2030414009,63.04,"female","white",14,39 930 | 9431.11526131584,66.18,"female","white",13,64 931 | 24852.9189798823,60.98,"female","white",13,32 932 | 997.248534885182,61.73,"female","white",12,33 933 | 27009.5096925429,72.76,"male","white",12,50 934 | 8466.49445018308,66.1,"female","black",12,82 935 | 1007.99494108658,68.26,"female","white",12,22 936 | 16124.9628676755,62.28,"female","white",12,69 937 | 2634.96424437502,65.16,"male","white",12,26 938 | 990.082696774432,63.19,"female","white",14,47 939 | 9576.38819748704,62.32,"female","white",12,35 940 | 15892.2091978625,68.15,"male","white",12,35 941 | 40744.8747647464,59.15,"female","white",15,87 942 | 998.925444353651,62.21,"female","white",12,43 943 | 1013.52477383429,64.5,"female","white",9,67 944 | 56656.301330095,68.12,"female","white",16,47 945 | 45523.58050729,62.53,"female","white",18,48 946 | 67762.2226156826,63.24,"female","white",12,44 947 | 71609.0662816366,67.85,"male","white",12,62 948 | 88453.5952587809,63.83,"female","white",12,55 949 | 58228.0853639499,69.89,"female","white",16,34 950 | 39858.0562950258,69.96,"male","white",10,35 951 | 30161.9293463618,72.16,"male","white",12,32 952 | 60431.9169106581,69.34,"male","white",11,54 953 | 24850.196030533,63.19,"female","white",14,41 954 | 8162.68267176858,58.09,"female","white",5,89 955 | 153635.749804599,62.98,"female","white",14,31 956 | 
18497.9558645746,63.11,"female","white",12,51 957 | 14306.4307247277,72.86,"male","white",15,38 958 | 8945.55105069553,68.91,"female","white",14,28 959 | 79540.6463287231,69.52,"male","white",12,55 960 | 1007.91101944059,65.89,"female","white",12,63 961 | 984.047256808445,65.91,"female","white",16,66 962 | 10537.4648191351,64.96,"female","white",13,48 963 | 158930.467358879,72.38,"male","white",13,26 964 | 27220.0059304223,62.05,"female","white",13,50 965 | -25.6552603659183,68.9,"male","white",12,22 966 | 50877.7761685217,71.69,"male","white",14,44 967 | 47709.3292637674,73.13,"male","white",18,45 968 | 16906.6360987587,68.19,"female","white",16,46 969 | 1002.15389694231,63.62,"female","white",12,39 970 | 24875.871655644,65.8,"female","white",12,23 971 | 11339.3623934459,59.92,"female","white",8,87 972 | 42345.881829433,60.31,"female","white",18,67 973 | 28003.467467509,65.03,"female","white",12,63 974 | 35979.2002268293,62.73,"female","other",18,63 975 | 16904.9984512194,64.38,"female","white",12,45 976 | 79523.4868411869,70.81,"male","white",14,62 977 | 31790.3006737847,67.43,"male","white",12,39 978 | 1002.45371064789,66.78,"female","white",13,63 979 | 1641.84246580928,62.86,"female","white",12,35 980 | 26433.7795557355,64.82,"female","white",14,63 981 | 16876.3917770673,65.85,"female","white",12,70 982 | 23262.3062737323,63.38,"female","white",13,62 983 | 8935.22287731831,65.04,"female","white",15,24 984 | 63570.4143675579,73.88,"male","white",16,38 985 | 8930.78455644505,67.27,"female","white",13,59 986 | 40751.7835298215,67.65,"female","white",17,51 987 | 20091.5964444871,65.17,"female","white",12,55 988 | 93232.672427805,63.72,"female","white",16,38 989 | 29632.3100863834,67.94,"female","white",16,30 990 | 1578.5428138958,64.53,"male","white",12,22 991 | 33433.936468524,72.24,"male","white",17,51 992 | 17505.4454560322,65.73,"male","black",12,66 993 | 21663.0043154502,63.69,"female","white",12,40 994 | 991.995620972514,63.63,"female","white",12,57 995 | 
32809.6326770138,59.61,"female","other",16,92 996 | 39829.3235077431,72.52,"male","black",12,39 997 | 1945.44501302122,63.78,"female","white",12,47 998 | 39693.929061593,67.03,"male","black",14,30 999 | 57292.9968450935,68,"male","white",12,32 1000 | 24854.1042055616,64.02,"female","other",13,37 1001 | 6566.2571368786,66.16,"female","white",12,55 1002 | 31215.7499939053,64.55,"female","white",16,38 1003 | 98558.7781026767,70.02,"male","white",14,58 1004 | 39773.2440613475,69.94,"male","white",12,46 1005 | 79507.9870618854,65.79,"male","white",12,56 1006 | 1002.60049928122,64.24,"female","white",12,32 1007 | 1001.6664823116,66.3,"female","white",13,30 1008 | 63533.3407303505,66.06,"male","white",12,65 1009 | 41283.509070804,74.15,"male","white",18,30 1010 | 35981.5108645561,62.35,"female","white",12,52 1011 | 34393.3511714037,65.53,"female","white",17,43 1012 | 10719.1973744773,61.53,"female","white",12,69 1013 | 48700.4850032987,64.8,"female","black",18,68 1014 | 10526.1893318709,63.02,"female","white",12,82 1015 | 42872.2640087328,71.38,"male","white",16,33 1016 | 7378.63016606156,66.01,"female","white",16,46 1017 | 8026.64579441285,62.04,"female","other",6,76 1018 | 999.878661556705,61.82,"female","white",12,80 1019 | 8942.80671590728,62.97,"female","white",10,91 1020 | 1020.85471771991,64.72,"female","white",9,79 1021 | 71538.2893319736,67.4,"male","white",18,44 1022 | 29637.2679715432,60.78,"female","white",14,49 1023 | 10538.7447239525,65.8,"female","white",15,26 1024 | 988.161628283781,61.2,"female","white",12,53 1025 | 22252.8993931606,71.68,"male","white",12,28 1026 | 56644.8967699588,69.27,"female","white",14,28 1027 | 24852.1695267614,64.25,"female","white",16,33 1028 | 20074.4189766833,62.75,"female","black",12,33 1029 | 2600.43525202508,60.24,"female","white",12,51 1030 | 50832.9187815009,69.42,"male","black",15,37 1031 | 2396.14778671762,73.13,"male","white",12,23 1032 | 5768.44503437889,65.09,"female","black",16,27 1033 | 
56642.4364407088,63.69,"female","white",12,46 1034 | 990.807712176254,63.03,"female","white",12,49 1035 | 998.671567646385,62.86,"female","white",16,33 1036 | 111205.855063134,72.18,"male","white",15,52 1037 | 34399.7622930359,66.98,"female","white",16,32 1038 | 79576.5419400654,68.85,"male","white",11,22 1039 | 23824.410492409,70.75,"male","white",9,25 1040 | -56.3219788355347,67.81,"male","hispanic",10,22 1041 | -66.1283004678617,72.92,"male","black",12,24 1042 | 15316.9897842259,65.87,"female","black",8,39 1043 | 16906.2841715607,63.17,"female","black",14,29 1044 | 1013.91240595871,65.3,"female","white",12,64 1045 | 47740.831040028,70.88,"male","white",12,51 1046 | 25452.1338955757,72.11,"male","white",14,25 1047 | 31876.5784243398,67.72,"male","white",12,55 1048 | 1003.37176812386,63.82,"female","white",12,49 1049 | 158996.080700412,71.83,"male","white",17,58 1050 | 16889.9869050085,65.61,"female","white",12,47 1051 | 55635.3843796645,66.11,"male","white",14,67 1052 | 12115.9173222928,64.71,"female","white",16,35 1053 | 1007.2504839221,65,"female","white",12,66 1054 | 1010.4634170377,66.17,"female","white",12,56 1055 | 991.124467831543,64.87,"female","white",11,42 1056 | 12721.3377336341,71.39,"male","black",14,31 1057 | 17477.4743066582,74.38,"male","white",10,54 1058 | 39764.5300001007,69.77,"male","white",9,51 1059 | 40740.013906102,63.82,"female","white",14,49 1060 | 47770.8224810453,69,"male","white",14,52 1061 | 25.5073793326888,66.02,"male","white",12,30 1062 | 77311.9389631547,62.62,"female","white",18,38 1063 | 28041.3399828982,68.02,"female","white",12,50 1064 | 1005.83847378064,70.7,"female","white",8,74 1065 | 41433.6591116814,72.86,"male","white",16,28 1066 | 16896.3080934012,63.35,"female","white",12,65 1067 | 995.746579569306,65.19,"female","white",11,61 1068 | 95351.2060008194,71.05,"male","white",18,57 1069 | 35978.3615194746,65.36,"female","white",18,33 1070 | 10861.0922840824,64.03,"female","white",13,87 1071 | 
39168.2941872203,65.22,"female","white",16,36 1072 | 31789.2198043854,71.41,"male","white",14,44 1073 | 12137.2448273531,69.85,"female","white",14,34 1074 | -30.4235024585596,72.38,"male","black",12,23 1075 | 48688.9297482258,67.71,"female","black",16,52 1076 | 37563.434273142,65.88,"female","white",18,32 1077 | 32821.1936901775,63.71,"female","white",12,44 1078 | 5780.073674992,66.19,"female","white",16,33 1079 | 999.631035696323,62.82,"female","white",12,35 1080 | 47782.6325430195,70.35,"male","white",16,45 1081 | 20075.6765540308,64.23,"female","white",14,27 1082 | 993.395992495505,66.15,"female","white",7,45 1083 | 1025.01290254845,63.26,"female","white",10,41 1084 | 63513.1027353022,71.25,"male","white",12,42 1085 | 24859.9457977036,64.02,"female","white",14,27 1086 | 58225.6117358599,62.42,"female","white",16,49 1087 | 16889.6611198725,65.5,"female","white",12,57 1088 | 995.554489598849,63.08,"female","white",11,69 1089 | 21669.2484450363,67.98,"female","white",14,52 1090 | 1009.76704604544,64.88,"female","white",12,63 1091 | 1010.98136569577,60.61,"female","white",14,45 1092 | 16894.2835398117,61.77,"female","black",12,31 1093 | 47695.1950625972,68.89,"male","black",12,41 1094 | 36593.6932422507,64.16,"male","white",13,36 1095 | 31833.0641657761,69.63,"male","white",17,72 1096 | 31216.472797982,64.6,"female","white",13,86 1097 | 31818.0117641456,61.21,"male","black",11,70 1098 | 44466.4806466349,63.44,"male","black",14,67 1099 | 21681.934597577,65.14,"female","white",10,84 1100 | 19119.584235378,63.61,"male","white",12,79 1101 | 31760.3097497459,69.06,"male","white",11,27 1102 | 39789.4734194548,73.88,"male","white",12,34 1103 | 40735.5821116565,67.8,"female","white",13,56 1104 | 1004.34436591698,67.65,"female","white",12,38 1105 | 1014.45045553044,64.91,"female","white",14,45 1106 | 29620.997007219,66.87,"female","white",16,44 1107 | 988.565070198681,64.71,"female","white",12,22 1108 | 995.374617537821,64.18,"female","white",18,50 1109 | 
10529.7506464015,62.09,"female","white",10,72 1110 | 25427.4001595129,72.05,"male","white",13,27 1111 | 71514.3289230531,73.05,"male","white",16,42 1112 | 39806.8521692602,69.94,"male","white",14,73 1113 | 20073.7156480621,63.99,"female","white",14,42 1114 | 4160.53105497259,67.87,"female","white",12,22 1115 | 16890.6172628713,62.86,"female","white",12,22 1116 | 22286.8004109747,71.18,"male","white",12,40 1117 | 24852.3361979281,64.41,"female","white",12,35 1118 | 995.172315021961,67.35,"female","white",11,41 1119 | 16901.8264738142,59.77,"female","white",12,43 1120 | 2919.12600531874,65.51,"female","white",4,68 1121 | 24850.8308644127,59.01,"female","white",14,66 1122 | 16900.745158091,62.86,"female","white",13,68 1123 | 3163.02291068586,66.3,"male","white",12,22 1124 | 1006.01762644208,66.21,"female","white",12,39 1125 | 7365.5038036505,64.7,"female","white",13,37 1126 | 16894.0221460191,63.96,"female","white",14,36 1127 | 3393.78999839327,65.78,"female","white",12,27 1128 | 28624.1716370336,66.33,"male","hispanic",17,30 1129 | 6396.29858419505,72.99,"male","white",15,28 1130 | 44524.6055045838,71.76,"male","white",16,30 1131 | 16913.9983227796,74.3,"female","white",14,26 1132 | 5772.98208580172,65.06,"female","white",16,26 1133 | 999.23673707007,63.53,"female","white",14,35 1134 | 19020.2126342116,71.06,"male","white",12,25 1135 | 31197.9028766208,68.08,"female","white",13,49 1136 | 1008.02183926421,61.35,"female","white",12,37 1137 | 57262.115767785,69.77,"male","white",16,43 1138 | 12154.012468166,64.19,"female","white",16,63 1139 | 56655.8520466265,59.16,"female","white",11,38 1140 | 63508.2745013848,68.27,"male","white",18,47 1141 | 23896.1225958883,68.31,"male","white",14,29 1142 | 13717.6657412768,69.24,"female","white",13,47 1143 | 40744.0142293539,69.02,"female","white",14,59 1144 | 44511.8388677947,75.02,"male","white",18,42 1145 | 35056.1897433602,68.75,"male","white",14,39 1146 | 34166.1588166956,72.1,"male","black",13,28 1147 | 
5771.77114883749,68.35,"female","white",12,27 1148 | 36577.5114386126,71.74,"male","white",13,32 1149 | 28718.2050356153,71.88,"male","white",12,77 1150 | 31709.3453051173,66.61,"male","white",16,75 1151 | 26432.0238638548,63.23,"female","hispanic",12,50 1152 | 23817.843746481,68.62,"male","white",12,30 1153 | 85268.6446916847,62.85,"female","hispanic",17,36 1154 | 41349.9483732802,71.99,"male","white",12,31 1155 | 7364.53214004544,64.81,"female","other",12,32 1156 | 15809.0448761028,72.98,"male","white",16,63 1157 | 68330.0291746917,69.06,"male","white",12,55 1158 | 27130.7019128214,69.07,"male","other",14,37 1159 | 7363.69743087666,60.66,"female","hispanic",12,42 1160 | 79485.8854920863,72.8,"male","white",15,52 1161 | 88456.4731995184,67.03,"female","white",18,55 1162 | 45507.9788477399,62.91,"female","white",16,45 1163 | 998.429683373758,66.09,"female","white",12,52 1164 | 8951.01665799154,61.87,"female","white",14,52 1165 | 29614.9483149486,65.05,"female","white",13,53 1166 | 33385.6491294888,71.65,"male","hispanic",15,66 1167 | 20085.6872462914,66.91,"female","white",12,37 1168 | 53481.8611190695,65.46,"female","hispanic",16,66 1169 | 79582.9722445708,71.73,"male","hispanic",13,68 1170 | 10543.0059825721,65.35,"female","hispanic",9,78 1171 | 26433.4457319253,63.98,"female","white",14,63 1172 | 55665.7385965739,67.76,"male","white",12,54 1173 | 40750.9588661231,68.43,"female","white",12,47 1174 | 31779.0524294819,75.89,"male","white",14,30 1175 | 29618.6709435351,63.28,"female","white",12,40 1176 | 54032.4163361492,73.66,"male","white",14,82 1177 | 39720.1169340478,67.95,"male","white",16,29 1178 | 31789.5496754198,72.13,"male","white",17,81 1179 | 52556.6622479844,68.05,"male","white",12,73 1180 | 7362.32469627059,65.15,"female","white",12,77 1181 | 32821.7394367105,59.85,"female","white",8,81 1182 | 63629.1436845868,71.62,"male","white",16,37 1183 | 23272.6828309175,63.92,"female","white",14,44 1184 | 20070.4266821973,68.19,"female","white",12,52 1185 | 
40752.7857873425,60.68,"female","white",12,35 1186 | 112293.228459904,67.16,"female","white",10,76 1187 | 61431.2358244234,67.35,"female","white",17,59 1188 | 23273.0643236029,62.89,"female","white",12,32 1189 | 1005.66857107228,63.08,"female","white",15,37 1190 | 24867.3944914387,59.74,"female","white",13,41 1191 | 27052.3145447836,74.2,"male","white",16,35 1192 | 95353.794962225,69.22,"male","white",14,70 1193 | 24862.3904718197,66.82,"female","white",15,73 1194 | 39757.9472100514,64.79,"male","white",16,90 1195 | 28612.7898884795,62.2,"male","hispanic",12,67 1196 | -1.50936964863578,64.92,"male","white",12,59 1197 | 111285.63662324,72.19,"male","white",18,51 1198 | 987.783751585589,66.04,"female","hispanic",11,53 1199 | 48687.8484753642,65.16,"female","white",18,63 1200 | 13713.7442426761,61.86,"female","white",12,55 1201 | 47740.9784984125,70.87,"male","white",16,38 1202 | 998.743384307228,62.2,"female","white",11,30 1203 | 42941.4351052398,71.11,"male","white",14,36 1204 | 74706.6532959308,70.24,"male","white",18,47 1205 | 19037.4454862462,67.15,"male","hispanic",12,26 1206 | 39157.7510357454,62.05,"female","white",17,38 1207 | 21664.6599424119,63.05,"female","hispanic",12,39 1208 | 128217.042224129,60.94,"female","white",14,41 1209 | 7935.49480666059,71.42,"male","white",12,22 1210 | 1001.90663270767,64.92,"female","white",12,32 1211 | 40760.9115574943,63.15,"female","white",10,64 1212 | 44533.1100039177,66.12,"male","white",12,48 1213 | 13695.752850286,64.1,"female","white",15,24 1214 | 16880.9998315911,65.96,"female","white",13,55 1215 | 15296.7303638123,68.23,"female","white",14,45 1216 | 32784.8272263708,60.26,"female","white",12,41 1217 | 19053.5691050415,65.21,"male","white",12,75 1218 | 24854.2306409617,64.69,"female","white",14,70 1219 | 32789.3796661253,60.64,"female","white",14,40 1220 | 45525.7374951262,64.01,"female","white",14,40 1221 | 1432.29097374576,65.79,"female","white",15,25 1222 | 55693.5517533498,68.99,"male","white",14,78 1223 | 
42341.556083029,65.93,"female","hispanic",14,36 1224 | 43933.8280668161,66.67,"female","white",17,38 1225 | 23263.2462506252,68.63,"female","white",12,34 1226 | 50908.8970813513,65.76,"male","white",16,37 1227 | 16906.3460390581,68.97,"female","white",14,36 1228 | 11121.5282649971,69.72,"male","white",16,34 1229 | 44492.4222581206,71.03,"male","white",17,50 1230 | 95400.5609817628,73.8,"male","white",13,58 1231 | 19050.8316923704,68.91,"male","white",4,66 1232 | 1013.52622790713,65,"female","white",13,67 1233 | 995.78153389181,65.21,"female","white",12,50 1234 | 24857.5303329462,74.88,"female","white",13,79 1235 | 23863.0123251893,71.57,"male","white",11,67 1236 | 42322.0737112319,64.12,"female","white",15,36 1237 | 992.885672062796,58.99,"female","black",12,47 1238 | -49.5976242316242,70.96,"male","other",12,24 1239 | 41288.4230868218,71.91,"male","hispanic",16,29 1240 | 30767.8461101159,63.23,"female","black",10,40 1241 | 1001.49804266558,68.29,"female","black",12,26 1242 | 20079.5574272996,63.8,"female","black",12,77 1243 | 127226.599652973,69.95,"male","white",17,43 1244 | 77309.5329351063,66.06,"female","white",13,43 1245 | 28575.3855635215,62.05,"male","other",16,31 1246 | 135137.686883976,70.25,"male","white",15,69 1247 | 56657.6081257394,64.13,"female","black",12,51 1248 | 8027.18531618615,68.63,"male","white",16,26 1249 | 15309.4702056486,63.81,"female","hispanic",13,26 1250 | 48708.1791232664,66.66,"female","white",16,31 1251 | 127277.492490206,72.18,"male","white",14,62 1252 | 56642.969612385,65.09,"female","white",16,36 1253 | 20084.6692833515,62.95,"female","white",14,38 1254 | 317949.127955061,70.24,"male","white",18,38 1255 | 7375.92570936378,63.98,"female","white",16,49 1256 | 3199.96325523417,68.07,"male","white",12,22 1257 | 39169.1000544924,66.02,"female","white",18,73 1258 | 3128.40045423806,65.48,"male","hispanic",15,27 1259 | 23908.082328374,70.4,"male","white",17,34 1260 | 10533.1039963861,61.49,"female","white",12,69 1261 | 
7869.85708849864,65.83,"male","white",12,79 1262 | 3375.81930734894,65.86,"female","hispanic",12,25 1263 | 45509.3990429654,63.86,"female","white",18,52 1264 | 15290.2441182621,69.81,"female","white",12,39 1265 | 95402.6598745111,74.25,"male","white",14,56 1266 | 1889.14088316683,66.1,"male","white",12,23 1267 | 3224.43607582189,62.09,"female","hispanic",3,68 1268 | 12132.5027696978,65.95,"female","white",13,47 1269 | 34398.5804366668,62.01,"female","white",12,35 1270 | 69966.2128885529,68.9,"male","white",12,62 1271 | 46143.8478216483,68.08,"male","black",16,40 1272 | 4767.09597374661,72.33,"male","hispanic",14,22 1273 | 19086.1244824656,64.04,"male","black",13,26 1274 | 988.878651448201,64.1,"female","black",16,23 1275 | 1000.68811662548,59.57,"female","hispanic",9,43 1276 | 31859.9039539382,72.28,"male","hispanic",12,32 1277 | 2912.61685643064,61.72,"female","black",5,77 1278 | 58812.5554780609,74.52,"male","white",16,29 1279 | 12721.5684022168,65.93,"male","other",10,64 1280 | 55596.6235470168,68.04,"male","hispanic",16,34 1281 | 39687.7203364583,61.89,"male","other",14,41 1282 | 29600.80026329,71.22,"female","white",10,33 1283 | 48698.5173261034,62.99,"female","black",16,40 1284 | 31762.5194770489,68.73,"male","white",18,73 1285 | 19176.3341880439,70.83,"male","hispanic",6,44 1286 | 1003.94818935734,64.17,"female","hispanic",12,69 1287 | 16898.1539759619,66.07,"female","white",12,72 1288 | 28592.6154784873,72.31,"male","white",14,52 1289 | 986.165728698769,62.46,"female","hispanic",15,63 1290 | 31868.4793403152,71.49,"male","black",12,30 1291 | 50293.3455419254,67.94,"female","white",16,47 1292 | 9521.39185162996,64.41,"male","hispanic",10,40 1293 | 1004.20977744948,64.37,"female","white",12,42 1294 | 19095.2615453343,69.52,"male","white",13,44 1295 | 21672.4590284355,65.56,"female","white",16,37 1296 | 42342.1759299248,67.01,"female","white",17,38 1297 | 24857.4413604567,67.82,"female","black",11,35 1298 | 48694.1241676915,64.07,"female","white",17,32 1299 | 
4199.25482208603,65.01,"female","white",14,45 1300 | 7876.23087159809,69.78,"male","other",13,26 1301 | 85249.2838792362,71.28,"female","white",14,30 1302 | 993.735798257432,63.83,"female","white",12,40 1303 | 4159.03322157321,61.54,"female","white",13,22 1304 | 39787.6214099871,70.48,"male","white",17,33 1305 | 8056.63501215389,73.65,"male","white",12,22 1306 | 34382.0740264456,64.86,"female","white",13,31 1307 | 28038.3578937832,70,"female","white",15,31 1308 | 20087.0438961784,63.81,"female","white",13,41 1309 | 66771.7539282786,70.25,"male","hispanic",14,48 1310 | 33417.5208538918,74.3,"male","white",12,37 1311 | 10531.9881266657,63.14,"female","hispanic",12,69 1312 | 994.515746507949,59.23,"female","hispanic",12,50 1313 | 39790.7425975321,73.85,"male","white",14,28 1314 | 5753.10762270653,66.17,"female","white",14,23 1315 | 6722.68142995174,64.94,"female","white",15,71 1316 | 6373.6175685778,70.59,"male","white",12,23 1317 | 31879.3535269404,69.07,"male","white",14,62 1318 | 82711.753363371,60.14,"male","white",15,54 1319 | 96406.7613267083,64.37,"female","white",12,47 1320 | 24839.6398412734,63.84,"female","white",13,45 1321 | 143040.708851371,72.27,"male","white",16,54 1322 | 42349.1938093216,61.38,"female","white",14,69 1323 | 1013.28842229702,62.68,"female","hispanic",13,36 1324 | 3153.35665292635,69.16,"male","white",12,25 1325 | 26434.5026070359,66.82,"female","white",12,65 1326 | 39165.9289402853,65.15,"female","hispanic",11,31 1327 | 23905.6894085143,67.55,"male","hispanic",16,32 1328 | 66770.3896483816,74.1,"male","white",17,47 1329 | 1007.81676112648,62.35,"female","white",15,52 1330 | 16902.2396528115,64.83,"female","white",12,47 1331 | 27044.398985041,70.96,"male","hispanic",13,40 1332 | 48702.1770934611,64.03,"female","white",14,38 1333 | 1000.8856431763,64.92,"female","white",16,26 1334 | 51874.2088709548,68.84,"female","white",17,41 1335 | 58234.5809653042,63.6,"female","hispanic",17,38 1336 | 15865.7268275005,71.9,"male","hispanic",13,24 1337 | 
32792.0539545984,63.12,"female","white",15,33 1338 | 39160.158504899,64.93,"female","white",12,33 1339 | 27043.6720522342,72.12,"male","white",12,36 1340 | 37592.311734899,64.49,"female","white",14,52 1341 | 40747.9655119059,63.91,"female","white",12,37 1342 | 79603.4564638413,71.36,"male","white",12,54 1343 | 1006.46990206972,63.48,"female","white",14,43 1344 | 32793.912006464,61.82,"female","white",12,34 1345 | 48693.2024538462,66.01,"female","white",16,37 1346 | 12137.6764657013,63.95,"female","white",10,43 1347 | 11176.0659373347,66.99,"male","white",10,82 1348 | 47774.2522227392,67.79,"male","white",11,36 1349 | 5761.17534334497,66.38,"female","white",12,28 1350 | 32799.1446940046,65.97,"female","white",14,47 1351 | 64607.9618716391,65.68,"female","white",16,63 1352 | 15899.4031550977,67.02,"male","white",12,81 1353 | 25361.8283789109,66.1,"male","white",12,41 1354 | 18482.9571837826,63.27,"female","white",9,55 1355 | 26432.8864567014,67.89,"female","white",13,47 1356 | 28644.500098047,70.32,"male","white",12,31 1357 | 20065.0827364581,62.98,"female","white",16,30 1358 | 6354.84212675623,70.09,"male","black",14,25 1359 | 95416.6067984548,71.92,"male","white",12,49 1360 | 68428.9230580318,74.92,"male","white",17,44 1361 | 50299.0512923305,61.97,"female","white",14,43 1362 | 80518.409375972,67.98,"female","white",17,43 1363 | 43919.8943431691,68.18,"female","white",14,33 1364 | 47712.4409351132,69.85,"male","white",17,60 1365 | 19182.6074308127,73.26,"male","black",13,25 1366 | 987.26513123966,69.01,"female","white",16,43 1367 | 32798.0511760252,62.37,"female","white",17,34 1368 | 24849.1932942171,60.3,"female","black",12,80 1369 | 40757.9350769351,64.17,"female","other",16,41 1370 | 4184.2226846731,60.19,"female","hispanic",6,71 1371 | 4755.73650797254,72.94,"male","hispanic",15,24 1372 | 175901.453597818,65.9,"female","other",18,52 1373 | 87473.9687783999,68.82,"male","white",18,75 1374 | 92205.596105924,69.62,"male","white",18,57 1375 | 
16905.5578510981,70.08,"female","white",16,40 1376 | 30173.3803632981,71.68,"male","white",12,33 1377 | 24853.5195136729,61.31,"female","white",18,86 1378 | 13710.6713116427,63.64,"female","white",12,37 1379 | 95426.0144102907,71.65,"male","white",12,54 1380 | 9575.46185684499,68.22,"male","white",12,31 1381 | -------------------------------------------------------------------------------- /05_dplyr/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /05_dplyr/images/import_datatype01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/import_datatype01.png -------------------------------------------------------------------------------- /05_dplyr/images/pipe1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/pipe1.png -------------------------------------------------------------------------------- /05_dplyr/images/pipe2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/pipe2.png 
-------------------------------------------------------------------------------- /05_dplyr/images/tidyverse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/tidyverse.png -------------------------------------------------------------------------------- /06_ggplot2/06_ggplot2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 5: Data Visualization" 3 | author: "王敏杰" 4 | institute: "Sichuan Normal University" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "75%", 28 | fig.align = "center", 29 | fig.asp = 0.618, # 1 / phi 30 | fig.show = "hold" 31 | ) 32 | ``` 33 | 34 | ## The tidyverse family 35 | ```{r echo=FALSE, out.width = '85%'} 36 | knitr::include_graphics("images/tidyverse.png") 37 | ``` 38 | 39 | 40 | 41 | 42 | 43 | # Why visualize? 44 | 45 | ## The 1854 London cholera outbreak 46 | ```{r out.width = '100%', echo = FALSE} 47 | knitr::include_graphics("images/cholera_a.pdf") 48 | ``` 49 | 50 | 51 | ## The 1854 London cholera outbreak 52 | ```{r out.width = '100%', echo = FALSE} 53 | knitr::include_graphics("images/cholera_b.pdf") 54 | ``` 55 | 56 | 57 | ## The 1854 London cholera outbreak 58 | ```{r out.width = '100%', echo = FALSE} 59 | knitr::include_graphics("images/cholera_c.pdf") 60 | ``` 61 | 62 | 63 | 64 | ## Simpson's Paradox 65 | ```{r out.width = '100%', echo = FALSE} 66 | knitr::include_graphics("images/Paradox1.pdf") 67 | ``` 68 | 69 | ## Simpson's Paradox 70 | ```{r out.width = 
'100%', echo = FALSE} 71 | knitr::include_graphics("images/Paradox2.pdf") 72 | ``` 73 | 74 | 75 | ## Simpson's Paradox 76 | ```{r out.width = '100%', echo = FALSE} 77 | knitr::include_graphics("images/Paradox3.pdf") 78 | ``` 79 | 80 | 81 | # The ggplot2 package 82 | 83 | ## The ggplot2 package 84 | 85 | - ggplot2 was created by Hadley Wickham, Chief Scientist at RStudio, in 2005 while he was a PhD student. 86 | - Many people start learning R precisely because of ggplot2. 87 | - ggplot2 has grown into one of the most widely used R packages. 88 | 89 | ```{r} 90 | library(ggplot2) # install.packages("ggplot2") 91 | # or 92 | library(tidyverse) # install.packages("tidyverse") 93 | ``` 94 | 95 | 96 | 97 | 98 | 99 | ## The grammar of graphics in ggplot2 100 | 101 | ggplot2 is built on an elegant grammar of graphics. 102 | 103 | ```{r out.width = '70%', echo = FALSE} 104 | knitr::include_graphics("images/mapping.png") 105 | ``` 106 | 107 | Hadley Wickham summarizes this grammar as follows: 108 | 109 | A statistical graphic is a mapping from data to the aesthetic attributes (aes) of geometric objects (geom). 110 | 111 | 112 | ## The grammar of graphics in ggplot2 113 | 114 | A plot built with `ggplot()` consists of 9 components: 115 | 116 | - **Data (data)** 117 | - **Mapping (mapping)** 118 | - **Geometric object (geom)** 119 | - Statistical transformation (stats) 120 | - Scale (scale) 121 | - Coordinate system (coord) 122 | - Facet (facet) 123 | - Theme (theme) 124 | - Saving and output (output) 125 | 126 | The first three are required. 127 | 128 | 129 | 130 | ## Syntax template 131 | 132 | ```{r out.width = '100%', echo = FALSE} 133 | knitr::include_graphics("images/ggplot_template.png") 134 | ``` 135 | 136 | 137 | 138 | ## Example 139 | 140 | A simple example: temperature anomalies and carbon emissions, 1880-2014 141 | 142 | \footnotesize 143 | ```{r, warning = FALSE, message = FALSE} 144 | d <- readr::read_csv("./demo_data/temp_carbon.csv") 145 | ``` 146 | 147 | ```{r, echo = FALSE} 148 | d %>% 149 | head(10) %>% 150 | knitr::kable() 151 | ``` 152 | 153 | ## Simple, isn't it? 
154 | \footnotesize 155 | ```{r, out.width="85%"} 156 | ggplot(data = d, mapping = aes(x = year, y = carbon_emissions)) + 157 | geom_line() 158 | ``` 159 | 160 | 161 | # ggplot2 syntax in detail 162 | 163 | ## Demo data 164 | 165 | We use the fuel economy dataset [mpg](https://ggplot2.tidyverse.org/reference/mpg.html) that ships with ggplot2. 166 | 167 | \small 168 | |No.|Variable|Meaning| 169 | |:---|:---|:---| 170 | |1 | manufacturer | manufacturer name| 171 | |2 | model | model name| 172 | |3 | displ | engine displacement, in litres| 173 | |4 | year | year of manufacture| 174 | |5 | cyl | number of cylinders| 175 | |6 | trans | type of transmission| 176 | |7 | drv | type of drive train| 177 | |8 | cty | city miles per gallon| 178 | |9 | hwy | highway miles per gallon| 179 | |10 | fl | fuel type| 180 | |11 | class | type of car| 181 | 182 | 183 | ## Do larger engines use more fuel? 184 | 185 | Answering this question takes three variables from the mpg dataset: 186 | 187 | |No.|Variable|Meaning| 188 | |:---|:---|:---| 189 | |3 | displ | **engine displacement**| 190 | |9 | hwy | **fuel economy (highway miles per gallon)**| 191 | |11 | class | type of car| 192 | 193 | 194 | ```{r} 195 | mpg %>% 196 | select(displ, hwy, class) %>% 197 | head(4) 198 | ``` 199 | 200 | 201 | 202 | ## Mapping 203 | 204 | To examine the relationship between engine displacement (displ) and highway miles per gallon (hwy), we first draw a scatterplot of the two variables. 205 | 206 | ```{r out.width = '100%', echo = FALSE} 207 | knitr::include_graphics("images/a-3.png") 208 | ``` 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | ## Running the code 225 | Running the script produces the plot: 226 | \footnotesize 227 | ```{r, out.width="85%"} 228 | ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 229 | geom_point() 230 | ``` 231 | 232 | 233 | 234 | ## Color mapping 235 | 236 | Besides position, ggplot2 can also map variables to aesthetics such as color, shape, and transparency. 237 | 238 | 239 | 240 | \footnotesize 241 | ```{r, out.width="75%"} 242 | ggplot(data = mpg, aes(x = displ, y = hwy, color = class) ) + 243 | geom_point() 244 | ``` 245 | 246 | This scatterplot of displ against hwy uses color to group the points by car class. 247 | 248 | 249 | 250 | ## More mappings 251 | Try the following code yourself: 252 | \footnotesize 253 | ```{r, eval = FALSE} 254 | ggplot(data = mpg, aes(x = displ, y = hwy, size = class)) + 255 | geom_point() 256 | ``` 257 | 258 | 259 | ```{r, eval = FALSE} 260 | ggplot(data = mpg, aes(x = displ, y = hwy, shape = class)) + 261 | geom_point() 262 | 
``` 263 | 264 | 265 | ```{r, eval = FALSE} 266 | ggplot(data = mpg, aes(x = displ, y = hwy, alpha = class)) + 267 | geom_point() 268 | ``` 269 | 270 | 271 | ## Defaults 272 | Some default settings 273 | 274 | ```{r out.width = '85%', echo = FALSE} 275 | knitr::include_graphics("images/a-14.png") 276 | ``` 277 | 278 | 279 | 280 | ## Mapping vs. setting 281 | 282 | To give every point one fixed color, set it as an argument outside `aes()`, for example: 283 | 284 | ```{r out.width = '65%'} 285 | mpg %>% 286 | ggplot(aes(displ, hwy)) + 287 | geom_point(color = "blue") 288 | ``` 289 | 290 | 291 | 292 | ## More settings 293 | You can also try the following: 294 | \footnotesize 295 | ```{r, eval = FALSE} 296 | ggplot(mpg, aes(displ, hwy)) + geom_point(size = 5) 297 | ``` 298 | 299 | 300 | ```{r, eval = FALSE} 301 | ggplot(mpg, aes(displ, hwy)) + geom_point(shape = 2) 302 | ``` 303 | 304 | 305 | ```{r, eval = FALSE} 306 | ggplot(mpg, aes(displ, hwy)) + geom_point(alpha = 0.5) 307 | ``` 308 | 309 | 310 | 311 | ## Question 312 | ```{r out.width = '100%', echo = FALSE} 313 | knitr::include_graphics("images/a-21.png") 314 | ``` 315 | 316 | Think about it: why does `aes(color = "blue")` produce red points? 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | ## Geometric objects 348 | 349 | `geom_point()` draws a scatterplot; `geom_smooth()` draws a smoothed curve instead. 350 | 351 | ```{r} 352 | ggplot(data = mpg, aes(x = displ, y = hwy)) + 353 | geom_smooth() 354 | ``` 355 | 356 | 357 | 358 | ## Stacking layers 359 | 360 | ```{r} 361 | ggplot(data = mpg, aes(x = displ, y = hwy)) + 362 | geom_point() + 363 | geom_smooth() 364 | ``` 365 | 366 | 367 | 368 | 369 | 370 | ## Global vs. 
Local 371 | \footnotesize 372 | 373 | ```{r, eval=FALSE} 374 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) + 375 | geom_point() 376 | ``` 377 | 378 | ```{r, eval=FALSE} 379 | ggplot(mpg) + 380 | geom_point( aes(x = displ, y = hwy, color = class) ) 381 | ``` 382 | 383 | 384 | \begincols[T] 385 | \begincol[T]{.49\textwidth} 386 | ```{r, echo=FALSE, out.width= "100%"} 387 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) + 388 | geom_point() 389 | ``` 390 | \endcol 391 | 392 | \begincol[T]{.49\textwidth} 393 | 394 | ```{r, echo=FALSE, out.width= "100%"} 395 | ggplot(mpg) + 396 | geom_point( aes(x = displ, y = hwy, color = class) ) 397 | ``` 398 | \endcol 399 | \endcols 400 | 401 | As you can see, the two pieces of code produce the same plot, but what happens behind the scenes is different. 402 | 403 | 404 | 405 | ## Global vs. Local 406 | 407 | - If the mapping `aes()` is written inside `ggplot()`, then `x = displ, y = hwy, color = class` are global mappings 408 | 409 | ```{r, eval=FALSE} 410 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) + 411 | geom_point() 412 | ``` 413 | 414 | - `geom_point()` lacks the mappings it needs to draw, so it inherits them from the global mappings 415 | 416 | 417 | 418 | ## Global vs. Local 419 | - If the mapping `aes()` is written inside the geom `geom_point()`, it is a local mapping 420 | 421 | ```{r, eval=FALSE} 422 | ggplot(mpg) + 423 | geom_point(aes(x = displ, y = hwy, color = class)) 424 | ``` 425 | 426 | - `geom_point()` already has the mappings it needs to draw, so it inherits nothing from the global mappings 427 | 428 | 429 | 430 | 431 | ## Global vs. Local 432 | ```{r, eval=FALSE, warning=FALSE, message=FALSE} 433 | ggplot(mpg, aes(x = displ, y = hwy)) + 434 | geom_point(aes(color = class)) + 435 | geom_smooth() 436 | ``` 437 | 438 | Here both `geom_point()` and `geom_smooth()` inherit `x` and `y` from the global mappings; `geom_point()` adds its own local color mapping on top. 439 | 440 | 441 | 442 | ## Global vs. 
Local 443 | 444 | ```{r, out.width= "65%"} 445 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) + 446 | geom_point(aes(color = factor(cyl))) 447 | ``` 448 | The local mapping 449 | `aes(color = )` already exists, so the color is not inherited from the global mappings; the local mapping is used instead. 450 | 451 | 452 | 453 | 454 | ## Question 455 | Study the two pieces of code below carefully. How do they differ? 456 | ```{r, eval=FALSE} 457 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) + 458 | geom_smooth(method = lm) + 459 | geom_point() 460 | ``` 461 | 462 | 463 | ```{r, eval=FALSE} 464 | ggplot(mpg, aes(x = displ, y = hwy)) + 465 | geom_smooth(method = lm) + 466 | geom_point(aes(color = class)) 467 | ``` 468 | 469 | 470 | 471 | 472 | ## Saving a plot 473 | 474 | Use the `ggsave()` function to save a plot in the format you need, such as ".pdf" or ".png". 475 | 476 | ```{r, eval = FALSE} 477 | p <- ggplot(mpg, aes(x = displ, y = hwy)) + 478 | geom_smooth(method = lm) + 479 | geom_point(aes(color = class)) + 480 | ggtitle("This is my first plot") 481 | 482 | ggsave( 483 | filename = "myplot.pdf", 484 | plot = p, 485 | width = 8, 486 | height = 6 487 | ) 488 | ``` 489 | 490 | 491 | 492 | 493 | -------------------------------------------------------------------------------- /06_ggplot2/06_ggplot2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/06_ggplot2.pdf -------------------------------------------------------------------------------- /06_ggplot2/demo_data/temp_carbon.csv: -------------------------------------------------------------------------------- 1 | year,temp_anomaly,land_anomaly,ocean_anomaly,carbon_emissions 2 | 1880,-0.11,-0.48,-0.01,236 3 | 1881,-0.08,-0.4,0.01,243 4 | 1882,-0.1,-0.48,0,256 5 | 1883,-0.18,-0.66,-0.04,272 6 | 1884,-0.26,-0.69,-0.14,275 7 | 1885,-0.25,-0.56,-0.17,277 8 | 1886,-0.24,-0.51,-0.17,281 9 | 1887,-0.28,-0.47,-0.23,295 10 | 1888,-0.13,-0.41,-0.05,327 11 | 1889,-0.09,-0.31,-0.02,327 12 | 1890,-0.34,-0.51,-0.29,356 13 | 1891,-0.25,-0.52,-0.15,372 14 | 
1892,-0.3,-0.49,-0.23,374 15 | 1893,-0.32,-0.54,-0.24,370 16 | 1894,-0.3,-0.38,-0.27,383 17 | 1895,-0.23,-0.39,-0.17,406 18 | 1896,-0.09,-0.33,0,419 19 | 1897,-0.09,-0.26,-0.03,440 20 | 1898,-0.26,-0.37,-0.22,465 21 | 1899,-0.15,-0.21,-0.13,507 22 | 1900,-0.07,-0.15,-0.05,534 23 | 1901,-0.15,-0.12,-0.16,552 24 | 1902,-0.25,-0.26,-0.24,566 25 | 1903,-0.37,-0.37,-0.37,617 26 | 1904,-0.45,-0.44,-0.46,624 27 | 1905,-0.27,-0.33,-0.25,663 28 | 1906,-0.21,-0.17,-0.22,707 29 | 1907,-0.38,-0.62,-0.29,784 30 | 1908,-0.43,-0.44,-0.43,750 31 | 1909,-0.44,-0.43,-0.45,785 32 | 1910,-0.4,-0.36,-0.42,819 33 | 1911,-0.44,-0.48,-0.43,836 34 | 1912,-0.34,-0.48,-0.28,879 35 | 1913,-0.32,-0.31,-0.32,943 36 | 1914,-0.14,-0.06,-0.17,850 37 | 1915,-0.09,-0.08,-0.1,838 38 | 1916,-0.32,-0.46,-0.26,901 39 | 1917,-0.4,-0.63,-0.29,955 40 | 1918,-0.3,-0.5,-0.21,936 41 | 1919,-0.25,-0.33,-0.21,806 42 | 1920,-0.23,-0.36,-0.18,932 43 | 1921,-0.16,-0.15,-0.17,803 44 | 1922,-0.25,-0.27,-0.24,845 45 | 1923,-0.25,-0.29,-0.24,970 46 | 1924,-0.24,-0.25,-0.24,963 47 | 1925,-0.18,-0.15,-0.19,975 48 | 1926,-0.07,-0.02,-0.1,983 49 | 1927,-0.17,-0.22,-0.16,1062 50 | 1928,-0.18,-0.15,-0.2,1065 51 | 1929,-0.33,-0.49,-0.27,1145 52 | 1930,-0.11,-0.13,-0.11,1053 53 | 1931,-0.06,-0.02,-0.08,940 54 | 1932,-0.13,-0.03,-0.17,847 55 | 1933,-0.26,-0.36,-0.22,893 56 | 1934,-0.11,-0.06,-0.13,973 57 | 1935,-0.16,-0.17,-0.15,1027 58 | 1936,-0.12,-0.12,-0.12,1130 59 | 1937,-0.01,-0.02,-0.01,1209 60 | 1938,-0.02,0.17,-0.1,1142 61 | 1939,0.01,0.1,-0.03,1192 62 | 1940,0.16,0.07,0.2,1299 63 | 1941,0.27,0.1,0.35,1334 64 | 1942,0.11,0.06,0.13,1342 65 | 1943,0.11,0.07,0.12,1391 66 | 1944,0.28,0.19,0.32,1383 67 | 1945,0.18,-0.07,0.3,1160 68 | 1946,-0.01,-0.01,-0.01,1238 69 | 1947,-0.04,0.04,-0.07,1392 70 | 1948,-0.05,0.05,-0.1,1469 71 | 1949,-0.07,-0.07,-0.08,1419 72 | 1950,-0.15,-0.32,-0.09,1630 73 | 1951,0,-0.06,0.02,1767 74 | 1952,0.05,-0.05,0.08,1795 75 | 1953,0.13,0.2,0.1,1841 76 | 1954,-0.1,-0.12,-0.09,1865 77 | 
1955,-0.13,-0.11,-0.13,2042 78 | 1956,-0.18,-0.4,-0.1,2177 79 | 1957,0.07,-0.04,0.11,2270 80 | 1958,0.13,0.15,0.12,2330 81 | 1959,0.08,0.09,0.08,2454 82 | 1960,0.05,0,0.07,2569 83 | 1961,0.1,0.12,0.09,2580 84 | 1962,0.11,0.16,0.09,2686 85 | 1963,0.12,0.21,0.08,2833 86 | 1964,-0.14,-0.22,-0.11,2995 87 | 1965,-0.07,-0.12,-0.05,3130 88 | 1966,-0.01,-0.05,0.01,3288 89 | 1967,0,0.01,-0.01,3393 90 | 1968,-0.03,-0.11,0.01,3566 91 | 1969,0.11,-0.08,0.17,3780 92 | 1970,0.06,0.05,0.06,4053 93 | 1971,-0.07,-0.02,-0.09,4208 94 | 1972,0.04,-0.17,0.11,4376 95 | 1973,0.19,0.34,0.14,4614 96 | 1974,-0.06,-0.18,-0.02,4623 97 | 1975,0.01,0.14,-0.04,4596 98 | 1976,-0.07,-0.23,-0.01,4864 99 | 1977,0.21,0.25,0.19,5016 100 | 1978,0.12,0.1,0.12,5074 101 | 1979,0.23,0.17,0.24,5357 102 | 1980,0.28,0.31,0.26,5301 103 | 1981,0.32,0.52,0.25,5138 104 | 1982,0.19,0.11,0.22,5094 105 | 1983,0.36,0.5,0.3,5075 106 | 1984,0.17,0.06,0.2,5258 107 | 1985,0.16,0.1,0.18,5417 108 | 1986,0.23,0.3,0.21,5583 109 | 1987,0.38,0.45,0.36,5725 110 | 1988,0.39,0.58,0.32,5936 111 | 1989,0.29,0.36,0.27,6066 112 | 1990,0.45,0.66,0.37,6074 113 | 1991,0.39,0.53,0.34,6142 114 | 1992,0.24,0.24,0.23,6078 115 | 1993,0.28,0.35,0.25,6070 116 | 1994,0.34,0.48,0.29,6174 117 | 1995,0.47,0.78,0.35,6305 118 | 1996,0.32,0.35,0.31,6448 119 | 1997,0.51,0.64,0.46,6556 120 | 1998,0.65,0.98,0.52,6576 121 | 1999,0.44,0.78,0.31,6561 122 | 2000,0.42,0.62,0.34,6733 123 | 2001,0.57,0.84,0.46,6893 124 | 2002,0.62,0.95,0.49,6994 125 | 2003,0.63,0.94,0.52,7376 126 | 2004,0.58,0.81,0.49,7743 127 | 2005,0.66,1.08,0.5,8042 128 | 2006,0.63,0.97,0.5,8336 129 | 2007,0.61,1.12,0.43,8503 130 | 2008,0.54,0.89,0.41,8776 131 | 2009,0.64,0.9,0.54,8697 132 | 2010,0.72,1.14,0.56,9128 133 | 2011,0.57,0.91,0.44,9503 134 | 2012,0.63,0.95,0.51,9673 135 | 2013,0.67,1.03,0.53,9773 136 | 2014,0.73,1.01,0.63,9855 137 | -------------------------------------------------------------------------------- /06_ggplot2/header.tex: 
-------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | \usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /06_ggplot2/images/Paradox1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/Paradox1.pdf -------------------------------------------------------------------------------- /06_ggplot2/images/Paradox2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/Paradox2.pdf -------------------------------------------------------------------------------- /06_ggplot2/images/Paradox3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/Paradox3.pdf -------------------------------------------------------------------------------- /06_ggplot2/images/a-14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-14.png -------------------------------------------------------------------------------- 
/06_ggplot2/images/a-20.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-20.png -------------------------------------------------------------------------------- /06_ggplot2/images/a-21.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-21.png -------------------------------------------------------------------------------- /06_ggplot2/images/a-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-3.png -------------------------------------------------------------------------------- /06_ggplot2/images/cholera_a.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/cholera_a.pdf -------------------------------------------------------------------------------- /06_ggplot2/images/cholera_b.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/cholera_b.pdf -------------------------------------------------------------------------------- /06_ggplot2/images/cholera_c.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/cholera_c.pdf -------------------------------------------------------------------------------- /06_ggplot2/images/ggplot_template.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/ggplot_template.png -------------------------------------------------------------------------------- /06_ggplot2/images/mapping.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/mapping.png -------------------------------------------------------------------------------- /06_ggplot2/images/tidyverse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/tidyverse.png -------------------------------------------------------------------------------- /09_stringr/09_stringr.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Chapter 9: String Processing" 3 | author: "王敏杰" 4 | institute: "Sichuan Normal University" 5 | date: "\\today" 6 | fontsize: 12pt 7 | output: binb::metropolis 8 | section-titles: true 9 | #toc: true 10 | header-includes: 11 | - \usepackage[fontset = fandol]{ctex} 12 | - \input{header.tex} 13 | link-citations: yes 14 | colorlinks: yes 15 | linkcolor: red 16 | classoption: "dvipsnames,UTF8" 17 | --- 18 | 19 | ```{r setup, include=FALSE} 20 | options(digits = 3) 21 | knitr::opts_chunk$set( 22 | comment = "#>", 23 | echo = TRUE, 24 | collapse = TRUE, 25 | message = FALSE, 26 | warning = FALSE, 27 | out.width = "75%", 28 | fig.asp = 0.618, # 1 / phi 29 | fig.show = "hold", 30 | fig.showtext = TRUE 31 | ) 32 | ``` 33 | 34 | 35 | # A question 36 | 37 | ## The problem 38 | 39 | Here is a dataset of address information: 40 | ```{r echo=FALSE, message=FALSE, warning=FALSE} 41 | library(tidyverse) 42 | library(stringr) 43 | library(knitr) 44 | library(printr) 45 | 46 | d <- tibble::tribble( 47 | ~No, 
~address, 48 | 1L, "Sichuan Univ, Coll Chem", 49 | 2L, "Sichuan Univ, Coll Elect Engn", 50 | 3L, "Sichuan Univ, Dept Phys", 51 | 4L, "Sichuan Univ, Coll Life Sci", 52 | 6L, "Sichuan Univ, Food Engn", 53 | 7L, "Sichuan Univ, Coll Phys", 54 | 8L, "Sichuan Univ, Sch Business", 55 | 9L, "Wuhan Univ, Mat Sci" 56 | ) 57 | 58 | d 59 | ``` 60 | 61 | - How do we extract the college that follows `Sichuan Univ`? 62 | 63 | 64 | ```{r eval=FALSE, include=FALSE} 65 | d %>% dplyr::mutate( 66 | coll = str_extract_all(address, "(?<=Sichuan Univ,).*") 67 | ) %>% 68 | tidyr::unnest(coll, keep_empty = TRUE) 69 | ``` 70 | 71 | 72 | ```{r eval=FALSE, include=FALSE} 73 | d %>% mutate( 74 | coll = str_remove_all(address, ".*,") 75 | ) 76 | ``` 77 | 78 | ```{r eval=FALSE, include=FALSE} 79 | d %>% tidyr::separate( 80 | address, into = c("univ", "coll"), sep = ",", remove = FALSE 81 | ) 82 | ``` 83 | 84 | 85 | ```{r eval=FALSE, include=FALSE} 86 | d %>% 87 | tidyr::extract( 88 | address, c("univ", "coll"), "(Sichuan Univ), (.+)", 89 | remove = FALSE 90 | ) 91 | ``` 92 | 93 | 94 | 95 | # Regular expressions 96 | 97 | 98 | ## What is a regular expression? 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | A regular expression is a powerful, convenient, and efficient tool for processing text. It describes a pattern for matching strings, for example: 114 | 115 | - text with a fixed format 116 | - phone numbers 117 | - web addresses and email addresses 118 | - date formats 119 | - web page parsing 120 | - and so on 121 | 122 | 123 | 124 | 125 | ## The stringr package 126 | - Regular expressions are not specific to R; in fact, almost every programming language supports them (e.g. Perl, Python, Java, Ruby, etc). 
127 | 128 | - Many R functions take regular expressions; the stringr package, developed by Hadley Wickham, makes them simpler to work with, so it is the package we introduce today. 129 | 130 | ```{r out.width = '20%', fig.align='center', echo = FALSE} 131 | knitr::include_graphics("images/hex-stringr.png") 132 | ``` 133 | 134 | 135 | ```{r echo=TRUE, message=FALSE, warning=FALSE} 136 | library(stringr) #install.packages("stringr") 137 | ``` 138 | 139 | 140 | ## The stringr package 141 | 142 | \small 143 | - String basics 144 | - String length 145 | - Combining strings 146 | - Substrings 147 | 148 | - Pattern matching with regular expressions 149 | - Basic matches 150 | - Anchors 151 | - Character classes and alternatives 152 | - Repetition 153 | - Grouping and backreferences 154 | 155 | - Solving practical problems 156 | - Detecting matches 157 | - Extracting matches 158 | 159 | 160 | 161 | 162 | 163 | # String basics 164 | 165 | ## String length 166 | 167 | To get the length of a string, use the `str_length()` function: 168 | ```{r} 169 | str_length("R for data science") 170 | ``` 171 | 172 | It also works on a vector of strings: 173 | ```{r} 174 | str_length(c("a", "R for data science", NA)) 175 | ``` 176 | 177 | ## String length 178 | 179 | It is just as convenient inside a data frame, combined with dplyr functions: 180 | ```{r} 181 | data.frame( 182 | x = c("a", "R for data science", NA) 183 | ) %>% 184 | mutate(y = str_length(x)) 185 | ``` 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | ## Combining strings 194 | 195 | To join strings together, use the `str_c()` function: 196 | ```{r} 197 | str_c("x", "y") 198 | ``` 199 | 200 | 201 | When joining strings, you can set the separator placed between them: 202 | ```{r} 203 | str_c("x", "y", sep = ", ") 204 | ``` 205 | 206 | 207 | ```{r} 208 | str_c(c("x", "y", "z"), sep = ", ") 209 | ``` 210 | Not what you expected? Then take a look at `?str_c`. 211 | 212 | 213 | 214 | ## Combining strings 215 | ```{r} 216 | str_c(c("x", "y", "z"), c("x", "y", "z"), sep = ", ") 217 | ``` 218 | 219 | Used inside a data frame: 220 | ```{r} 221 | data.frame( x = c("I", "love", "you"), 222 | y = c("you", "like", "me") ) %>% 223 | mutate(z = str_c(x, y, sep = "|")) 224 | ``` 225 | 226 | 227 | ## Combining strings 228 | 229 | With the `collapse` option, the elements are first combined pairwise and the results are then joined into a single string. Compare: 230 | 231 | ```{r} 232 | str_c(c("x", "y", "z"), c("a", "b", "c"), sep = "|") 233 | ``` 234 | 235 | ```{r} 236 | str_c( 237 | c("x", "y", "z"), c("a", "b", "c"), collapse = "|" 238 | ) 239 | ``` 240 | 241 | 242 | 243 | 244 | 245 | 246 | 
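The `sep` and `collapse` arguments of `str_c()` are easy to mix up, so here is a minimal sketch of the cases shown on the slides above, assuming the stringr package is installed. `sep` is placed between the arguments of `str_c()`, element by element, while `collapse` joins the elements of the resulting vector into one string. The variable names are illustrative only and do not appear in the slides.

```r
library(stringr)

letters3 <- c("x", "y", "z")  # illustrative input vector

# A single vector argument: there is nothing to put `sep` between,
# so the vector comes back unchanged.
one_arg <- str_c(letters3, sep = ", ")

# `collapse` joins the elements of the result into a single string.
joined <- str_c(letters3, collapse = ", ")

# Two vectors: `sep` is inserted between the paired elements ...
paired <- str_c(letters3, c("a", "b", "c"), sep = "|")

# ... and `collapse` then glues the paired results together
# (the default `sep` is the empty string).
paired_joined <- str_c(letters3, c("a", "b", "c"), collapse = "|")

print(one_arg)
print(joined)
print(paired)
print(paired_joined)
```

This is why `str_c(c("x", "y", "z"), sep = ", ")` looks like it does nothing: joining the elements of one vector is the job of `collapse`, not `sep`.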
247 | ## Substrings 248 | 249 | To take part of a string, specify the start and end positions: 250 | ```{r} 251 | x <- c("Apple", "Banana", "Pear") 252 | str_sub(x, 1, 3) 253 | ``` 254 | 255 | Negative start and end positions count from the end of the string. The code below takes the characters from the third-to-last position through the last position: 256 | ```{r} 257 | str_sub(x, -3, -1) 258 | ``` 259 | 260 | ## Substrings 261 | 262 | `str_sub()` can also be used for assignment: the characters at the given positions are replaced by the new ones. 263 | ```{r} 264 | x <- c("Apple", "Banana", "Pear") 265 | x 266 | ``` 267 | 268 | 269 | ```{r} 270 | str_sub(x, 1, 1) 271 | ``` 272 | 273 | 274 | ```{r} 275 | str_sub(x, 1, 1) <- "Q" 276 | x 277 | ``` 278 | 279 | 280 | 281 | 282 | 283 | 284 | # Pattern matching with regular expressions 285 | 286 | 287 | ## Basic matches 288 | 289 | `str_view()` shows whether `string` matches `pattern`. 290 | 291 | Matches are highlighted. 292 | ```{r, out.width="300%"} 293 | x <- c("apple", "banana", "pear") 294 | str_view(string = x, pattern = "an") 295 | ``` 296 | 297 | 298 | ## Basic matches 299 | Sometimes we want the character `a` to have a character on both sides (that is, `a` sits between two characters, as in rap, bad, sad, wave, spear, and so on). 300 | ```{r, out.width="300%"} 301 | x <- c("apple", "banana", "pear") 302 | str_view(x, ".a.") 303 | ``` 304 | 305 | 306 | ## Basic matches 307 | 308 | \begincols[T] 309 | \begincol[T]{.49\textwidth} 310 | 311 | Here `.` stands for any character. 312 | 313 | ```{r, out.width="600%"} 314 | c("s.d") %>% 315 | str_view(".") 316 | ``` 317 | \endcol 318 | 319 | \begincol[T]{.49\textwidth} 320 | 321 | What if we want to match a literal `.`? 
322 | ```{r, out.width="600%"} 323 | c("s.d") %>% 324 | str_view("\\.") 325 | ``` 326 | 327 | \endcol 328 | \endcols 329 | 330 | 331 | 332 | ## Anchors 333 | ```{r} 334 | x <- c("apple", "banana", "pear") 335 | x 336 | ``` 337 | \begincols[T] 338 | \begincol[T]{.49\textwidth} 339 | 340 | Match `a` at the start of the string 341 | ```{r, out.width="600%"} 342 | str_view(x, "^a") 343 | ``` 344 | \endcol 345 | 346 | \begincol[T]{.49\textwidth} 347 | 348 | Match `a` at the end of the string 349 | ```{r, out.width="600%"} 350 | str_view(x, "a$") 351 | ``` 352 | \endcol 353 | \endcols 354 | 355 | 356 | 357 | 358 | ## Anchors 359 | ```{r, out.width="300%"} 360 | x <- c("apple pie", "apple", "apple cake") 361 | str_view(x, "^apple$") 362 | ``` 363 | 364 | 365 | 366 | 367 | 368 | ## Character classes and alternatives 369 | 370 | As mentioned earlier, `.` matches any character. There are many other characters with a **special meaning**: 371 | 372 | * `\d`: matches any digit. 373 | * `\s`: matches any whitespace (e.g. space, tab, newline). 374 | * `[abc]`: matches a, b, or c. 375 | * `[^abc]`: matches anything except a, b, or c. 376 | 377 | 378 | ```{r, out.width="300%"} 379 | str_view(c("grey", "gray"), "gr[ea]y") 380 | ``` 381 | 382 | 383 | 384 | 385 | 386 | 387 | 388 | 389 | ## Repetition 390 | 391 | Controlling how many times a pattern matches: 392 | 393 | * `?`: 0 or 1 394 | * `+`: 1 or more 395 | * `*`: 0 or more 396 | 397 | 398 | ```{r} 399 | x <- "Roman numerals: MDCCCLXXXVIII" 400 | ``` 401 | 402 | \begincols[T] 403 | \begincol[T]{.49\textwidth} 404 | 405 | ```{r, out.width="600%"} 406 | str_view(x, "CC?") 407 | ``` 408 | 409 | \endcol 410 | \begincol[T]{.49\textwidth} 411 | 412 | ```{r, out.width="600%"} 413 | str_view(x, "X+") 414 | ``` 415 | \endcol 416 | \endcols 417 | 418 | 419 | 420 | ## Repetition 421 | Controlling how many times a pattern matches: 422 | 423 | * `{n}`: exactly n 424 | * `{n,}`: n or more 425 | * `{0,m}`: at most m 426 | * `{n,m}`: between n and m 427 | 428 | 429 | 430 | ## Repetition 431 | ```{r, out.width="300%"} 432 | x <- "Roman numerals: MDCCCLXXXVIII" 433 | str_view(x, "C{2}") 434 | str_view(x, "C{2,}") 435 | str_view(x, "C{2,3}") 436 | ``` 437 | 438 | 439 | 440 | ## Repetition 441 | - 
默认的情况,`*`, `+` 匹配都是**贪婪**的,也就是它会尽可能的匹配更多 442 | - 如果想让它不贪婪,而是变得懒惰起来,可以在 `*` 或 `+` 后加个`?` 443 | 444 | 445 | ```{r} 446 | x <- "Roman numerals: MDCCCLXXXVIII" 447 | ``` 448 | 449 | \begincols[T] 450 | \begincol[T]{.49\textwidth} 451 | ```{r, out.width="600%"} 452 | str_view(x, "CLX+") 453 | ``` 454 | \endcol 455 | \begincol[T]{.49\textwidth} 456 | 457 | ```{r, out.width="600%"} 458 | str_view(x, "CLX+?") 459 | ``` 460 | \endcol 461 | \endcols 462 | 463 | 464 | 465 | ## 小结一下 466 | 467 | ```{r out.width = '100%', fig.align='center', echo = FALSE} 468 | knitr::include_graphics("images/regex_repeat.jpg") 469 | ``` 470 | 471 | 472 | 473 | 474 | ## 分组与回溯引用 475 | 476 | 477 | ```{r} 478 | ft <- fruit %>% head(10) 479 | ft 480 | ``` 481 | 482 | 我们想看看这些单词里,有哪些字母是重复两次的,比如`aa`, `pp`. 如果用上面学的方法 483 | ```{r, out.width="300%"} 484 | str_view(ft, ".{2}", match = TRUE) 485 | ``` 486 | 487 | 发现是不是和我们的预想不一样呢? 488 | 489 | 490 | 491 | ## 分组与回溯引用 492 | 所以需要用到新技术 **分组与回溯引用**, 493 | ```{r, out.width="300%"} 494 | str_view(ft, "(.)\\1", match = TRUE) 495 | ``` 496 | 497 | 498 | ## 分组与回溯引用 499 | ```{r, eval=FALSE} 500 | str_view(ft, "(.)\\1", match = TRUE) 501 | ``` 502 | 503 | - `.` 是匹配任何字符 504 | - `(.)` 将匹配项括起来,它就用了一个名字,叫`\\1`; 如果有两个括号,就叫`\\1`和`\\2` 505 | - `\\1` 表示回溯引用,表示引用`\\1`对于的`(.)` 506 | 507 | 所以`(.)\\1`的意思就是,匹配到了字符,后面还希望有个**同样的字符** 508 | 509 | 510 | 511 | ## 分组与回溯引用 512 | 如果是匹配`abab`, `wcwc` 513 | ```{r, out.width="300%"} 514 | str_view(ft, "(..)\\1", match = TRUE) 515 | ``` 516 | 517 | 如果是匹配`abba`, `wccw`呢? 518 | 519 | ```{r, out.width="300%"} 520 | str_view(ft, "(.)(.)\\2\\1", match = TRUE) 521 | ``` 522 | 523 | 是不是很神奇? 
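
## Grouping and backreferences

The three backreference patterns above can also be used outside `str_view()`. The sketch below is only an illustration: the `words` vector is a hypothetical sample that does not come from the slides, and it assumes that `str_detect()` and `str_extract()` accept the same patterns as `str_view()`.

```r
library(stringr)

# Hypothetical sample words, chosen only for illustration
words <- c("apple", "banana", "pepper", "kiwi", "coconut")

# (.)\\1 : a character immediately followed by the same character
doubled <- words[str_detect(words, "(.)\\1")]
doubled

# (..)\\1 : a two-character chunk that is immediately repeated, e.g. "anan"
repeated_pair <- str_extract(words, "(..)\\1")
repeated_pair

# (.)(.)\\2\\1 : two characters followed by their mirror image, e.g. "eppe"
mirrored <- str_extract(words, "(.)(.)\\2\\1")
mirrored
```

`str_extract()` returns `NA` for words without a match, so the result can be placed directly in a new column with `mutate()`.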

# Advanced topics

## look ahead

We want to match Windows, but only when it is immediately followed by one of `"95", "98", "NT", "2000"`
```{r, out.width="300%"}
win <- c("Windows2000", "Windows", "Windows3.1")
str_view(win, "Windows(?=95|98|NT|2000)")
```

## look ahead

```{r, out.width="300%"}
win <- c("Windows2000", "Windows", "Windows3.1")
str_view(win, "Windows(?!95|98|NT|2000)")
```

## look behind

```{r, out.width="300%"}
win <- c("2000Windows", "Windows", "3.1Windows")
str_view(win, "(?<=95|98|NT|2000)Windows")
```

## look behind

```{r, out.width="300%"}
win <- c("2000Windows", "Windows", "3.1Windows")
str_view(win, "(?<!95|98|NT|2000)Windows")
```

## Determining whether a character vector matches a pattern

```{r echo=FALSE}
d <- tibble(x = c("apple", "banana", "pear"))
```

```{r}
d %>% mutate(has_e = str_detect(x, "e"))
```

## Determining whether a character vector matches a pattern
It is also handy for filtering
```{r echo=FALSE}
d <- tibble(x = c("apple", "banana", "pear") )
d
```

```{r}
d %>% filter(str_detect(x, "e"))
```

## Extracting matches

We want to extract the number in the second column into a new column

\begincols[T]
\begincol[T]{.3\textwidth}

```{r echo=FALSE}
dt <- tibble(
  x = 1:4,
  y = c("wk 3", "week-1", "7", "w#9")
)
dt
```
\endcol
\begincol[T]{.69\textwidth}

```{r}
dt %>% mutate(
  z = str_extract(y, "[0-9]")
)
```

\endcol
\endcols

## Extracting matches

Back to the question asked in class: how do we extract the college that follows `Sichuan Univ`?
```{r echo=FALSE, message=FALSE, warning=FALSE}
d <- tibble::tribble(
  ~No, ~address,
  1L, "Sichuan Univ, Coll Chem",
  2L, "Sichuan Univ, Coll Elect Engn",
  3L, "Sichuan Univ, Dept Phys",
  4L, "Sichuan Univ, Coll Life Sci",
  6L, "Sichuan Univ, Food Engn",
  7L, "Sichuan Univ, Coll Phys",
  8L, "Sichuan Univ, Sch Business",
  9L, "Wuhan Univ, Mat Sci"
)

d
```

## Extracting matches
\footnotesize
```{r}
d %>% mutate(
  coll = str_extract(address, "(?<=Sichuan Univ,).*")
) %>%
  tidyr::unnest(coll, keep_empty = TRUE)
```

--------------------------------------------------------------------------------
/09_stringr/09_stringr.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/09_stringr/09_stringr.pdf

--------------------------------------------------------------------------------
/09_stringr/header.tex:
--------------------------------------------------------------------------------
\usepackage{ctex}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{array}
\usepackage{multirow}
\usepackage{wrapfig}
\usepackage{float}
\usepackage{colortbl}
\usepackage{pdflscape}
\usepackage{tabu}
\usepackage{threeparttable}
\usepackage{threeparttablex}
\usepackage{makecell}
\usepackage{xcolor}
\usepackage{xtab}

\def\begincols{\begin{columns}}
\def\begincol{\begin{column}}
\def\endcol{\end{column}}
\def\endcols{\end{columns}}

--------------------------------------------------------------------------------
/09_stringr/images/hex-stringr.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/09_stringr/images/hex-stringr.png

--------------------------------------------------------------------------------
/09_stringr/images/regex_repeat.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/09_stringr/images/regex_repeat.jpg

--------------------------------------------------------------------------------
/15_eda02/15_reproducible.Rmd:
--------------------------------------------------------------------------------
---
title: "Exploratory Data Analysis with the Tidyverse"
subtitle: "A data story about penguins"
author: "诗与远方"
date: "`r Sys.Date()`"
output:
  pdf_document:
    latex_engine: xelatex
    extra_dependencies:
      ctex: UTF8
    number_sections: yes
    #toc: yes
    df_print: kable
classoptions: "hyperref, 12pt, a4paper"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center"
)
```

# Data story

Today's data story is about penguins. The data come from [here](https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv), and the images come from [here](https://github.com/allisonhorst/palmerpenguins).

```{r out.width = '100%', echo = FALSE}
knitr::include_graphics("images/penguins.png")
```

# Data

## Importing the data

The data are available from the package as `palmerpenguins::penguins`, or can be read from the local file `penguins.csv`.
We take the second approach:

```{r}
library(tidyverse)
penguins <- read_csv("./demo_data/penguins.csv")
penguins %>% head(5)
```

## Variables

|variable          |class     |description |
|:-----------------|:---------|:-----------|
|species           |character | Penguin species (Adelie, Gentoo, Chinstrap) |
|island            |character | Island (Biscoe, Dream, Torgersen) |
|bill_length_mm    |double    | Bill (culmen) length, in millimeters |
|bill_depth_mm     |double    | Bill (culmen) depth, in millimeters |
|flipper_length_mm |double    | Flipper length, in millimeters |
|body_mass_g       |double    | Body mass, in grams |
|sex               |character | Sex |
|year              |double    | Year of the record |

```{r out.width = '86%', echo = FALSE}
knitr::include_graphics("images/culmen_depth.png")
```

## Data cleaning
```{r}
# Show the rows that contain at least one missing value
penguins %>% filter_all(any_vars(is.na(.)))
```

```{r}
# Drop every row that has a missing value
d <- penguins %>% drop_na()
d %>% head()
```

# Exploratory analysis

## How many penguin species?
```{r}
d %>% count(species, sort = TRUE)
```

## How many islands?
```{r}
d %>% count(island, sort = TRUE)
```

## Mean and distribution of each measurement, by species
```{r}
d %>%
  group_by(species) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE))
  )
```
```{r}
d %>%
  ggplot(aes(x = bill_length_mm)) +
  geom_density() +
  facet_wrap(vars(species), scales = "free")
```

```{r}
library(ggridges)
d %>%
  ggplot(aes(x = bill_depth_mm, y = species, fill = species)) +
  ggridges::geom_density_ridges()
```

```{r}
# Reshape to long format so every measurement gets its own facet
d %>%
  select(species, body_mass_g, ends_with("_mm")) %>%
  pivot_longer(
    cols = -species,
    names_to = "metric",
    values_to = "values"
  ) %>%
  ggplot(aes(x = values, y = species, fill = species)) +
  ggridges::geom_density_ridges() +
  facet_wrap(vars(metric), scales = "free")
```

## Are bill length and bill depth related?
```{r}
# One regression line per species, plus one overall line
d %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  geom_smooth(method = "lm", aes(color = species)) +
  geom_smooth(method = "lm")
```

## Does body mass differ significantly between species?
```{r}
d %>%
  ggplot(aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  geom_jitter()
```
```{r}
aov(body_mass_g ~ species, data = d) %>% summary()
```

```{r}
library(ggstatsplot)
d %>%
  ggbetweenstats(
    x = species,
    y = body_mass_g,
    pairwise.comparisons = TRUE,
    pairwise.display = TRUE
  )
```
This package helps us learn statistics

## Can bill length and depth distinguish species? Sex?

This falls within the scope of machine learning
```{r}
d %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)) +
  geom_point()
```

```{r}
library(tidymodels)
d <- d %>% mutate(species = factor(species))

# Split the data into training and testing sets
split <- initial_split(d)
split
training_data <- training(split)
testing_data <- testing(split)

# Fit a k-nearest-neighbor classifier on the two bill measurements
model <- parsnip::nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification") %>%
  fit(species ~ bill_length_mm + bill_depth_mm, data = training_data)

# Compare predicted and true species on the testing set
predict(model, new_data = testing_data) %>%
  bind_cols(testing_data) %>%
  count(species, .pred_class)
```

--------------------------------------------------------------------------------
/15_eda02/15_reproducible.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/15_reproducible.pdf
-------------------------------------------------------------------------------- /15_eda02/demo_data/penguins.csv: -------------------------------------------------------------------------------- 1 | species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year 2 | Adelie,Torgersen,39.1,18.7,181,3750,male,2007 3 | Adelie,Torgersen,39.5,17.4,186,3800,female,2007 4 | Adelie,Torgersen,40.3,18,195,3250,female,2007 5 | Adelie,Torgersen,NA,NA,NA,NA,NA,2007 6 | Adelie,Torgersen,36.7,19.3,193,3450,female,2007 7 | Adelie,Torgersen,39.3,20.6,190,3650,male,2007 8 | Adelie,Torgersen,38.9,17.8,181,3625,female,2007 9 | Adelie,Torgersen,39.2,19.6,195,4675,male,2007 10 | Adelie,Torgersen,34.1,18.1,193,3475,NA,2007 11 | Adelie,Torgersen,42,20.2,190,4250,NA,2007 12 | Adelie,Torgersen,37.8,17.1,186,3300,NA,2007 13 | Adelie,Torgersen,37.8,17.3,180,3700,NA,2007 14 | Adelie,Torgersen,41.1,17.6,182,3200,female,2007 15 | Adelie,Torgersen,38.6,21.2,191,3800,male,2007 16 | Adelie,Torgersen,34.6,21.1,198,4400,male,2007 17 | Adelie,Torgersen,36.6,17.8,185,3700,female,2007 18 | Adelie,Torgersen,38.7,19,195,3450,female,2007 19 | Adelie,Torgersen,42.5,20.7,197,4500,male,2007 20 | Adelie,Torgersen,34.4,18.4,184,3325,female,2007 21 | Adelie,Torgersen,46,21.5,194,4200,male,2007 22 | Adelie,Biscoe,37.8,18.3,174,3400,female,2007 23 | Adelie,Biscoe,37.7,18.7,180,3600,male,2007 24 | Adelie,Biscoe,35.9,19.2,189,3800,female,2007 25 | Adelie,Biscoe,38.2,18.1,185,3950,male,2007 26 | Adelie,Biscoe,38.8,17.2,180,3800,male,2007 27 | Adelie,Biscoe,35.3,18.9,187,3800,female,2007 28 | Adelie,Biscoe,40.6,18.6,183,3550,male,2007 29 | Adelie,Biscoe,40.5,17.9,187,3200,female,2007 30 | Adelie,Biscoe,37.9,18.6,172,3150,female,2007 31 | Adelie,Biscoe,40.5,18.9,180,3950,male,2007 32 | Adelie,Dream,39.5,16.7,178,3250,female,2007 33 | Adelie,Dream,37.2,18.1,178,3900,male,2007 34 | Adelie,Dream,39.5,17.8,188,3300,female,2007 35 | Adelie,Dream,40.9,18.9,184,3900,male,2007 36 | 
Adelie,Dream,36.4,17,195,3325,female,2007 37 | Adelie,Dream,39.2,21.1,196,4150,male,2007 38 | Adelie,Dream,38.8,20,190,3950,male,2007 39 | Adelie,Dream,42.2,18.5,180,3550,female,2007 40 | Adelie,Dream,37.6,19.3,181,3300,female,2007 41 | Adelie,Dream,39.8,19.1,184,4650,male,2007 42 | Adelie,Dream,36.5,18,182,3150,female,2007 43 | Adelie,Dream,40.8,18.4,195,3900,male,2007 44 | Adelie,Dream,36,18.5,186,3100,female,2007 45 | Adelie,Dream,44.1,19.7,196,4400,male,2007 46 | Adelie,Dream,37,16.9,185,3000,female,2007 47 | Adelie,Dream,39.6,18.8,190,4600,male,2007 48 | Adelie,Dream,41.1,19,182,3425,male,2007 49 | Adelie,Dream,37.5,18.9,179,2975,NA,2007 50 | Adelie,Dream,36,17.9,190,3450,female,2007 51 | Adelie,Dream,42.3,21.2,191,4150,male,2007 52 | Adelie,Biscoe,39.6,17.7,186,3500,female,2008 53 | Adelie,Biscoe,40.1,18.9,188,4300,male,2008 54 | Adelie,Biscoe,35,17.9,190,3450,female,2008 55 | Adelie,Biscoe,42,19.5,200,4050,male,2008 56 | Adelie,Biscoe,34.5,18.1,187,2900,female,2008 57 | Adelie,Biscoe,41.4,18.6,191,3700,male,2008 58 | Adelie,Biscoe,39,17.5,186,3550,female,2008 59 | Adelie,Biscoe,40.6,18.8,193,3800,male,2008 60 | Adelie,Biscoe,36.5,16.6,181,2850,female,2008 61 | Adelie,Biscoe,37.6,19.1,194,3750,male,2008 62 | Adelie,Biscoe,35.7,16.9,185,3150,female,2008 63 | Adelie,Biscoe,41.3,21.1,195,4400,male,2008 64 | Adelie,Biscoe,37.6,17,185,3600,female,2008 65 | Adelie,Biscoe,41.1,18.2,192,4050,male,2008 66 | Adelie,Biscoe,36.4,17.1,184,2850,female,2008 67 | Adelie,Biscoe,41.6,18,192,3950,male,2008 68 | Adelie,Biscoe,35.5,16.2,195,3350,female,2008 69 | Adelie,Biscoe,41.1,19.1,188,4100,male,2008 70 | Adelie,Torgersen,35.9,16.6,190,3050,female,2008 71 | Adelie,Torgersen,41.8,19.4,198,4450,male,2008 72 | Adelie,Torgersen,33.5,19,190,3600,female,2008 73 | Adelie,Torgersen,39.7,18.4,190,3900,male,2008 74 | Adelie,Torgersen,39.6,17.2,196,3550,female,2008 75 | Adelie,Torgersen,45.8,18.9,197,4150,male,2008 76 | Adelie,Torgersen,35.5,17.5,190,3700,female,2008 77 | 
Adelie,Torgersen,42.8,18.5,195,4250,male,2008 78 | Adelie,Torgersen,40.9,16.8,191,3700,female,2008 79 | Adelie,Torgersen,37.2,19.4,184,3900,male,2008 80 | Adelie,Torgersen,36.2,16.1,187,3550,female,2008 81 | Adelie,Torgersen,42.1,19.1,195,4000,male,2008 82 | Adelie,Torgersen,34.6,17.2,189,3200,female,2008 83 | Adelie,Torgersen,42.9,17.6,196,4700,male,2008 84 | Adelie,Torgersen,36.7,18.8,187,3800,female,2008 85 | Adelie,Torgersen,35.1,19.4,193,4200,male,2008 86 | Adelie,Dream,37.3,17.8,191,3350,female,2008 87 | Adelie,Dream,41.3,20.3,194,3550,male,2008 88 | Adelie,Dream,36.3,19.5,190,3800,male,2008 89 | Adelie,Dream,36.9,18.6,189,3500,female,2008 90 | Adelie,Dream,38.3,19.2,189,3950,male,2008 91 | Adelie,Dream,38.9,18.8,190,3600,female,2008 92 | Adelie,Dream,35.7,18,202,3550,female,2008 93 | Adelie,Dream,41.1,18.1,205,4300,male,2008 94 | Adelie,Dream,34,17.1,185,3400,female,2008 95 | Adelie,Dream,39.6,18.1,186,4450,male,2008 96 | Adelie,Dream,36.2,17.3,187,3300,female,2008 97 | Adelie,Dream,40.8,18.9,208,4300,male,2008 98 | Adelie,Dream,38.1,18.6,190,3700,female,2008 99 | Adelie,Dream,40.3,18.5,196,4350,male,2008 100 | Adelie,Dream,33.1,16.1,178,2900,female,2008 101 | Adelie,Dream,43.2,18.5,192,4100,male,2008 102 | Adelie,Biscoe,35,17.9,192,3725,female,2009 103 | Adelie,Biscoe,41,20,203,4725,male,2009 104 | Adelie,Biscoe,37.7,16,183,3075,female,2009 105 | Adelie,Biscoe,37.8,20,190,4250,male,2009 106 | Adelie,Biscoe,37.9,18.6,193,2925,female,2009 107 | Adelie,Biscoe,39.7,18.9,184,3550,male,2009 108 | Adelie,Biscoe,38.6,17.2,199,3750,female,2009 109 | Adelie,Biscoe,38.2,20,190,3900,male,2009 110 | Adelie,Biscoe,38.1,17,181,3175,female,2009 111 | Adelie,Biscoe,43.2,19,197,4775,male,2009 112 | Adelie,Biscoe,38.1,16.5,198,3825,female,2009 113 | Adelie,Biscoe,45.6,20.3,191,4600,male,2009 114 | Adelie,Biscoe,39.7,17.7,193,3200,female,2009 115 | Adelie,Biscoe,42.2,19.5,197,4275,male,2009 116 | Adelie,Biscoe,39.6,20.7,191,3900,female,2009 117 | 
Adelie,Biscoe,42.7,18.3,196,4075,male,2009 118 | Adelie,Torgersen,38.6,17,188,2900,female,2009 119 | Adelie,Torgersen,37.3,20.5,199,3775,male,2009 120 | Adelie,Torgersen,35.7,17,189,3350,female,2009 121 | Adelie,Torgersen,41.1,18.6,189,3325,male,2009 122 | Adelie,Torgersen,36.2,17.2,187,3150,female,2009 123 | Adelie,Torgersen,37.7,19.8,198,3500,male,2009 124 | Adelie,Torgersen,40.2,17,176,3450,female,2009 125 | Adelie,Torgersen,41.4,18.5,202,3875,male,2009 126 | Adelie,Torgersen,35.2,15.9,186,3050,female,2009 127 | Adelie,Torgersen,40.6,19,199,4000,male,2009 128 | Adelie,Torgersen,38.8,17.6,191,3275,female,2009 129 | Adelie,Torgersen,41.5,18.3,195,4300,male,2009 130 | Adelie,Torgersen,39,17.1,191,3050,female,2009 131 | Adelie,Torgersen,44.1,18,210,4000,male,2009 132 | Adelie,Torgersen,38.5,17.9,190,3325,female,2009 133 | Adelie,Torgersen,43.1,19.2,197,3500,male,2009 134 | Adelie,Dream,36.8,18.5,193,3500,female,2009 135 | Adelie,Dream,37.5,18.5,199,4475,male,2009 136 | Adelie,Dream,38.1,17.6,187,3425,female,2009 137 | Adelie,Dream,41.1,17.5,190,3900,male,2009 138 | Adelie,Dream,35.6,17.5,191,3175,female,2009 139 | Adelie,Dream,40.2,20.1,200,3975,male,2009 140 | Adelie,Dream,37,16.5,185,3400,female,2009 141 | Adelie,Dream,39.7,17.9,193,4250,male,2009 142 | Adelie,Dream,40.2,17.1,193,3400,female,2009 143 | Adelie,Dream,40.6,17.2,187,3475,male,2009 144 | Adelie,Dream,32.1,15.5,188,3050,female,2009 145 | Adelie,Dream,40.7,17,190,3725,male,2009 146 | Adelie,Dream,37.3,16.8,192,3000,female,2009 147 | Adelie,Dream,39,18.7,185,3650,male,2009 148 | Adelie,Dream,39.2,18.6,190,4250,male,2009 149 | Adelie,Dream,36.6,18.4,184,3475,female,2009 150 | Adelie,Dream,36,17.8,195,3450,female,2009 151 | Adelie,Dream,37.8,18.1,193,3750,male,2009 152 | Adelie,Dream,36,17.1,187,3700,female,2009 153 | Adelie,Dream,41.5,18.5,201,4000,male,2009 154 | Gentoo,Biscoe,46.1,13.2,211,4500,female,2007 155 | Gentoo,Biscoe,50,16.3,230,5700,male,2007 156 | Gentoo,Biscoe,48.7,14.1,210,4450,female,2007 
157 | Gentoo,Biscoe,50,15.2,218,5700,male,2007 158 | Gentoo,Biscoe,47.6,14.5,215,5400,male,2007 159 | Gentoo,Biscoe,46.5,13.5,210,4550,female,2007 160 | Gentoo,Biscoe,45.4,14.6,211,4800,female,2007 161 | Gentoo,Biscoe,46.7,15.3,219,5200,male,2007 162 | Gentoo,Biscoe,43.3,13.4,209,4400,female,2007 163 | Gentoo,Biscoe,46.8,15.4,215,5150,male,2007 164 | Gentoo,Biscoe,40.9,13.7,214,4650,female,2007 165 | Gentoo,Biscoe,49,16.1,216,5550,male,2007 166 | Gentoo,Biscoe,45.5,13.7,214,4650,female,2007 167 | Gentoo,Biscoe,48.4,14.6,213,5850,male,2007 168 | Gentoo,Biscoe,45.8,14.6,210,4200,female,2007 169 | Gentoo,Biscoe,49.3,15.7,217,5850,male,2007 170 | Gentoo,Biscoe,42,13.5,210,4150,female,2007 171 | Gentoo,Biscoe,49.2,15.2,221,6300,male,2007 172 | Gentoo,Biscoe,46.2,14.5,209,4800,female,2007 173 | Gentoo,Biscoe,48.7,15.1,222,5350,male,2007 174 | Gentoo,Biscoe,50.2,14.3,218,5700,male,2007 175 | Gentoo,Biscoe,45.1,14.5,215,5000,female,2007 176 | Gentoo,Biscoe,46.5,14.5,213,4400,female,2007 177 | Gentoo,Biscoe,46.3,15.8,215,5050,male,2007 178 | Gentoo,Biscoe,42.9,13.1,215,5000,female,2007 179 | Gentoo,Biscoe,46.1,15.1,215,5100,male,2007 180 | Gentoo,Biscoe,44.5,14.3,216,4100,NA,2007 181 | Gentoo,Biscoe,47.8,15,215,5650,male,2007 182 | Gentoo,Biscoe,48.2,14.3,210,4600,female,2007 183 | Gentoo,Biscoe,50,15.3,220,5550,male,2007 184 | Gentoo,Biscoe,47.3,15.3,222,5250,male,2007 185 | Gentoo,Biscoe,42.8,14.2,209,4700,female,2007 186 | Gentoo,Biscoe,45.1,14.5,207,5050,female,2007 187 | Gentoo,Biscoe,59.6,17,230,6050,male,2007 188 | Gentoo,Biscoe,49.1,14.8,220,5150,female,2008 189 | Gentoo,Biscoe,48.4,16.3,220,5400,male,2008 190 | Gentoo,Biscoe,42.6,13.7,213,4950,female,2008 191 | Gentoo,Biscoe,44.4,17.3,219,5250,male,2008 192 | Gentoo,Biscoe,44,13.6,208,4350,female,2008 193 | Gentoo,Biscoe,48.7,15.7,208,5350,male,2008 194 | Gentoo,Biscoe,42.7,13.7,208,3950,female,2008 195 | Gentoo,Biscoe,49.6,16,225,5700,male,2008 196 | Gentoo,Biscoe,45.3,13.7,210,4300,female,2008 197 | 
Gentoo,Biscoe,49.6,15,216,4750,male,2008 198 | Gentoo,Biscoe,50.5,15.9,222,5550,male,2008 199 | Gentoo,Biscoe,43.6,13.9,217,4900,female,2008 200 | Gentoo,Biscoe,45.5,13.9,210,4200,female,2008 201 | Gentoo,Biscoe,50.5,15.9,225,5400,male,2008 202 | Gentoo,Biscoe,44.9,13.3,213,5100,female,2008 203 | Gentoo,Biscoe,45.2,15.8,215,5300,male,2008 204 | Gentoo,Biscoe,46.6,14.2,210,4850,female,2008 205 | Gentoo,Biscoe,48.5,14.1,220,5300,male,2008 206 | Gentoo,Biscoe,45.1,14.4,210,4400,female,2008 207 | Gentoo,Biscoe,50.1,15,225,5000,male,2008 208 | Gentoo,Biscoe,46.5,14.4,217,4900,female,2008 209 | Gentoo,Biscoe,45,15.4,220,5050,male,2008 210 | Gentoo,Biscoe,43.8,13.9,208,4300,female,2008 211 | Gentoo,Biscoe,45.5,15,220,5000,male,2008 212 | Gentoo,Biscoe,43.2,14.5,208,4450,female,2008 213 | Gentoo,Biscoe,50.4,15.3,224,5550,male,2008 214 | Gentoo,Biscoe,45.3,13.8,208,4200,female,2008 215 | Gentoo,Biscoe,46.2,14.9,221,5300,male,2008 216 | Gentoo,Biscoe,45.7,13.9,214,4400,female,2008 217 | Gentoo,Biscoe,54.3,15.7,231,5650,male,2008 218 | Gentoo,Biscoe,45.8,14.2,219,4700,female,2008 219 | Gentoo,Biscoe,49.8,16.8,230,5700,male,2008 220 | Gentoo,Biscoe,46.2,14.4,214,4650,NA,2008 221 | Gentoo,Biscoe,49.5,16.2,229,5800,male,2008 222 | Gentoo,Biscoe,43.5,14.2,220,4700,female,2008 223 | Gentoo,Biscoe,50.7,15,223,5550,male,2008 224 | Gentoo,Biscoe,47.7,15,216,4750,female,2008 225 | Gentoo,Biscoe,46.4,15.6,221,5000,male,2008 226 | Gentoo,Biscoe,48.2,15.6,221,5100,male,2008 227 | Gentoo,Biscoe,46.5,14.8,217,5200,female,2008 228 | Gentoo,Biscoe,46.4,15,216,4700,female,2008 229 | Gentoo,Biscoe,48.6,16,230,5800,male,2008 230 | Gentoo,Biscoe,47.5,14.2,209,4600,female,2008 231 | Gentoo,Biscoe,51.1,16.3,220,6000,male,2008 232 | Gentoo,Biscoe,45.2,13.8,215,4750,female,2008 233 | Gentoo,Biscoe,45.2,16.4,223,5950,male,2008 234 | Gentoo,Biscoe,49.1,14.5,212,4625,female,2009 235 | Gentoo,Biscoe,52.5,15.6,221,5450,male,2009 236 | Gentoo,Biscoe,47.4,14.6,212,4725,female,2009 237 | 
Gentoo,Biscoe,50,15.9,224,5350,male,2009 238 | Gentoo,Biscoe,44.9,13.8,212,4750,female,2009 239 | Gentoo,Biscoe,50.8,17.3,228,5600,male,2009 240 | Gentoo,Biscoe,43.4,14.4,218,4600,female,2009 241 | Gentoo,Biscoe,51.3,14.2,218,5300,male,2009 242 | Gentoo,Biscoe,47.5,14,212,4875,female,2009 243 | Gentoo,Biscoe,52.1,17,230,5550,male,2009 244 | Gentoo,Biscoe,47.5,15,218,4950,female,2009 245 | Gentoo,Biscoe,52.2,17.1,228,5400,male,2009 246 | Gentoo,Biscoe,45.5,14.5,212,4750,female,2009 247 | Gentoo,Biscoe,49.5,16.1,224,5650,male,2009 248 | Gentoo,Biscoe,44.5,14.7,214,4850,female,2009 249 | Gentoo,Biscoe,50.8,15.7,226,5200,male,2009 250 | Gentoo,Biscoe,49.4,15.8,216,4925,male,2009 251 | Gentoo,Biscoe,46.9,14.6,222,4875,female,2009 252 | Gentoo,Biscoe,48.4,14.4,203,4625,female,2009 253 | Gentoo,Biscoe,51.1,16.5,225,5250,male,2009 254 | Gentoo,Biscoe,48.5,15,219,4850,female,2009 255 | Gentoo,Biscoe,55.9,17,228,5600,male,2009 256 | Gentoo,Biscoe,47.2,15.5,215,4975,female,2009 257 | Gentoo,Biscoe,49.1,15,228,5500,male,2009 258 | Gentoo,Biscoe,47.3,13.8,216,4725,NA,2009 259 | Gentoo,Biscoe,46.8,16.1,215,5500,male,2009 260 | Gentoo,Biscoe,41.7,14.7,210,4700,female,2009 261 | Gentoo,Biscoe,53.4,15.8,219,5500,male,2009 262 | Gentoo,Biscoe,43.3,14,208,4575,female,2009 263 | Gentoo,Biscoe,48.1,15.1,209,5500,male,2009 264 | Gentoo,Biscoe,50.5,15.2,216,5000,female,2009 265 | Gentoo,Biscoe,49.8,15.9,229,5950,male,2009 266 | Gentoo,Biscoe,43.5,15.2,213,4650,female,2009 267 | Gentoo,Biscoe,51.5,16.3,230,5500,male,2009 268 | Gentoo,Biscoe,46.2,14.1,217,4375,female,2009 269 | Gentoo,Biscoe,55.1,16,230,5850,male,2009 270 | Gentoo,Biscoe,44.5,15.7,217,4875,NA,2009 271 | Gentoo,Biscoe,48.8,16.2,222,6000,male,2009 272 | Gentoo,Biscoe,47.2,13.7,214,4925,female,2009 273 | Gentoo,Biscoe,NA,NA,NA,NA,NA,2009 274 | Gentoo,Biscoe,46.8,14.3,215,4850,female,2009 275 | Gentoo,Biscoe,50.4,15.7,222,5750,male,2009 276 | Gentoo,Biscoe,45.2,14.8,212,5200,female,2009 277 | 
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009 278 | Chinstrap,Dream,46.5,17.9,192,3500,female,2007 279 | Chinstrap,Dream,50,19.5,196,3900,male,2007 280 | Chinstrap,Dream,51.3,19.2,193,3650,male,2007 281 | Chinstrap,Dream,45.4,18.7,188,3525,female,2007 282 | Chinstrap,Dream,52.7,19.8,197,3725,male,2007 283 | Chinstrap,Dream,45.2,17.8,198,3950,female,2007 284 | Chinstrap,Dream,46.1,18.2,178,3250,female,2007 285 | Chinstrap,Dream,51.3,18.2,197,3750,male,2007 286 | Chinstrap,Dream,46,18.9,195,4150,female,2007 287 | Chinstrap,Dream,51.3,19.9,198,3700,male,2007 288 | Chinstrap,Dream,46.6,17.8,193,3800,female,2007 289 | Chinstrap,Dream,51.7,20.3,194,3775,male,2007 290 | Chinstrap,Dream,47,17.3,185,3700,female,2007 291 | Chinstrap,Dream,52,18.1,201,4050,male,2007 292 | Chinstrap,Dream,45.9,17.1,190,3575,female,2007 293 | Chinstrap,Dream,50.5,19.6,201,4050,male,2007 294 | Chinstrap,Dream,50.3,20,197,3300,male,2007 295 | Chinstrap,Dream,58,17.8,181,3700,female,2007 296 | Chinstrap,Dream,46.4,18.6,190,3450,female,2007 297 | Chinstrap,Dream,49.2,18.2,195,4400,male,2007 298 | Chinstrap,Dream,42.4,17.3,181,3600,female,2007 299 | Chinstrap,Dream,48.5,17.5,191,3400,male,2007 300 | Chinstrap,Dream,43.2,16.6,187,2900,female,2007 301 | Chinstrap,Dream,50.6,19.4,193,3800,male,2007 302 | Chinstrap,Dream,46.7,17.9,195,3300,female,2007 303 | Chinstrap,Dream,52,19,197,4150,male,2007 304 | Chinstrap,Dream,50.5,18.4,200,3400,female,2008 305 | Chinstrap,Dream,49.5,19,200,3800,male,2008 306 | Chinstrap,Dream,46.4,17.8,191,3700,female,2008 307 | Chinstrap,Dream,52.8,20,205,4550,male,2008 308 | Chinstrap,Dream,40.9,16.6,187,3200,female,2008 309 | Chinstrap,Dream,54.2,20.8,201,4300,male,2008 310 | Chinstrap,Dream,42.5,16.7,187,3350,female,2008 311 | Chinstrap,Dream,51,18.8,203,4100,male,2008 312 | Chinstrap,Dream,49.7,18.6,195,3600,male,2008 313 | Chinstrap,Dream,47.5,16.8,199,3900,female,2008 314 | Chinstrap,Dream,47.6,18.3,195,3850,female,2008 315 | Chinstrap,Dream,52,20.7,210,4800,male,2008 316 
| Chinstrap,Dream,46.9,16.6,192,2700,female,2008 317 | Chinstrap,Dream,53.5,19.9,205,4500,male,2008 318 | Chinstrap,Dream,49,19.5,210,3950,male,2008 319 | Chinstrap,Dream,46.2,17.5,187,3650,female,2008 320 | Chinstrap,Dream,50.9,19.1,196,3550,male,2008 321 | Chinstrap,Dream,45.5,17,196,3500,female,2008 322 | Chinstrap,Dream,50.9,17.9,196,3675,female,2009 323 | Chinstrap,Dream,50.8,18.5,201,4450,male,2009 324 | Chinstrap,Dream,50.1,17.9,190,3400,female,2009 325 | Chinstrap,Dream,49,19.6,212,4300,male,2009 326 | Chinstrap,Dream,51.5,18.7,187,3250,male,2009 327 | Chinstrap,Dream,49.8,17.3,198,3675,female,2009 328 | Chinstrap,Dream,48.1,16.4,199,3325,female,2009 329 | Chinstrap,Dream,51.4,19,201,3950,male,2009 330 | Chinstrap,Dream,45.7,17.3,193,3600,female,2009 331 | Chinstrap,Dream,50.7,19.7,203,4050,male,2009 332 | Chinstrap,Dream,42.5,17.3,187,3350,female,2009 333 | Chinstrap,Dream,52.2,18.8,197,3450,male,2009 334 | Chinstrap,Dream,45.2,16.6,191,3250,female,2009 335 | Chinstrap,Dream,49.3,19.9,203,4050,male,2009 336 | Chinstrap,Dream,50.2,18.8,202,3800,male,2009 337 | Chinstrap,Dream,45.6,19.4,194,3525,female,2009 338 | Chinstrap,Dream,51.9,19.5,206,3950,male,2009 339 | Chinstrap,Dream,46.8,16.5,189,3650,female,2009 340 | Chinstrap,Dream,45.7,17,195,3650,female,2009 341 | Chinstrap,Dream,55.8,19.8,207,4000,male,2009 342 | Chinstrap,Dream,43.5,18.1,202,3400,female,2009 343 | Chinstrap,Dream,49.6,18.2,193,3775,male,2009 344 | Chinstrap,Dream,50.8,19,210,4100,male,2009 345 | Chinstrap,Dream,50.2,18.7,198,3775,female,2009 346 | -------------------------------------------------------------------------------- /15_eda02/header.tex: -------------------------------------------------------------------------------- 1 | \usepackage{ctex} 2 | \usepackage{booktabs} 3 | \usepackage{longtable} 4 | \usepackage{array} 5 | \usepackage{multirow} 6 | \usepackage{wrapfig} 7 | \usepackage{float} 8 | \usepackage{colortbl} 9 | \usepackage{pdflscape} 10 | \usepackage{tabu} 11 | 
\usepackage{threeparttable} 12 | \usepackage{threeparttablex} 13 | \usepackage{makecell} 14 | \usepackage{xcolor} 15 | \usepackage{xtab} 16 | 17 | \def\begincols{ 18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns} 19 | } 20 | 21 | 22 | -------------------------------------------------------------------------------- /15_eda02/images/01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/01.png -------------------------------------------------------------------------------- /15_eda02/images/4_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/4_3.png -------------------------------------------------------------------------------- /15_eda02/images/culmen_depth.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/culmen_depth.png -------------------------------------------------------------------------------- /15_eda02/images/lter_penguins.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/lter_penguins.png -------------------------------------------------------------------------------- /15_eda02/images/penguins.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/penguins.png -------------------------------------------------------------------------------- /R4DS_slides.Rproj: 
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: knitr
LaTeX: XeLaTeX

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# The R Language in Data Science

## Course overview

- Data science is an interdisciplinary field that combines statistics, computer science, and domain expertise. In short, it studies science with data-driven methods and studies data with scientific methods.
- R is a programming language for statistical analysis that integrates statistical computing and graphics. Once you have learned its syntax, you can write your own functions to extend the language.
- At the 2019 international statistics annual meeting, the COPSS Presidents' Award (often called the Nobel Prize of statistics) was presented to Hadley Wickham, author of the R package tidyverse, a sign that R is fully recognized by the academic community.
- Thanks to its strong statistical analysis capabilities, powerful graphics, and extensibility, R is used more and more widely in research across the natural and social sciences worldwide.

This course takes R as the starting point for a journey into data science. It covers R fundamentals, data visualization, data wrangling, exploratory analysis, statistical modeling, case studies, and applications in representative fields, and is intended for master's and doctoral students.

## Course objectives
Train data thinking, improve programming skills, and foster innovation

## Course content

| Week | Title | Topics | Hours | Slides |
|-------- |------------------- |-------------------------------------------------------------------------------------------- |------ |---------------------------------------------------------------------------------------------------------------------- |
| week01 | Why R? | What is R? What can R do? Why R? | 1 | [00_whyR.pdf](https://github.com/perlatex/R4DS_slides/blob/master/00_whyR/00_whyR.pdf) |
| week01 | Data science basics | The data science workflow, setting up the environment, installing R and RStudio, and installing the required packages | 1 | [01_install.pdf](https://github.com/perlatex/R4DS_slides/blob/master/01_install/01_install.pdf) |
| week02 | R basics | Basic operations, data types, data structures, common statistical functions, branching, loops; scripts, packages, and how to get help | 2 | [02_basicR.pdf](https://github.com/perlatex/R4DS_slides/blob/master/02_basicR/02_basicR.pdf) |
| week03 | Subsetting | Vectors, lists, matrices, data frames | 2 | [03_subset.pdf](https://github.com/perlatex/R4DS_slides/blob/master/03_subset/03_subset.pdf) |
| week04 | Reproducible research | R Markdown syntax; generating reports in HTML, PDF, and Word formats | 2 | [04_Rmarkdown.pdf](https://github.com/perlatex/R4DS_slides/blob/master/04_Rmarkdown/04_Rmarkdown.pdf) |
| week05 | Data wrangling | Reading external data, saving data, data wrangling with dplyr, worked examples | 2 | [05_dplyr.pdf](https://github.com/perlatex/R4DS_slides/blob/master/05_dplyr/05_dplyr.pdf) |
| week06 | Data visualization 1 | ggplot2 basic syntax, mapping, setting, saving plots | 2 | [06_ggplot2.pdf](https://github.com/perlatex/R4DS_slides/blob/master/06_ggplot2/06_ggplot2.pdf) |
| week07 | Data visualization 2 | Geoms, themes, scales, legends | 2 | [07_ggplot2.pdf](https://github.com/perlatex/R4DS_slides/blob/master/07_ggplot2/07_ggplot2.pdf) |
| week08 | Exploratory data analysis 1 | Combining data wrangling and visual exploration skills on case data | 2 | [08_eda01.pdf](https://github.com/perlatex/R4DS_slides/blob/master/08_eda01/08_eda01.pdf) |
| week09 | String processing | Regular expressions, extracting information from text | 2 | [09_stringr.pdf](https://github.com/perlatex/R4DS_slides/blob/master/09_stringr/09_stringr.pdf) |
| week10 | Factors | Handling and using factor variables | 2 | [10_forcats.pdf](https://github.com/perlatex/R4DS_slides/blob/master/10_forcats/10_forcats.pdf) |
| week11 | Linear regression | Simple and multiple regression models, with emphasis on analyzing and interpreting model output, fitting, and prediction | 2 | [11_lm.pdf](https://github.com/perlatex/R4DS_slides/blob/master/11_lm/11_lm.pdf) |
| week12 | Basic statistical analysis | Descriptive statistics, hypothesis testing, analysis of variance, and their equivalence to linear regression | 2 | [12_tidystats.pdf](https://github.com/perlatex/R4DS_slides/blob/master/12_tidystats/12_tidystats.pdf) |
| week13 | Functional programming | Safe and efficient iteration | 2 | [13_purrr.pdf](https://github.com/perlatex/R4DS_slides/blob/master/13_purrr/13_purrr.pdf) |
| week14 | Advanced tidyverse programming | Application scenarios, common functions, and tips | 2 | [14_tidyverse_tips.pdf](https://github.com/perlatex/R4DS_slides/blob/master/14_tidyverse_tips/14_tidyverse_tips.pdf) |
| week15 | Exploratory data analysis 2 | Working through a concrete case from data analysis to modeling, training data thinking | 2 | [15_eda02.pdf](https://github.com/perlatex/R4DS_slides/blob/master/15_eda02/15_eda02.pdf) |

## Assessment
Pick a paper related to your own research area within your discipline, and use the R statistical programming skills learned in class to **reproduce** its data analysis and visualization.

## References
- [https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)
- [https://bookdown.org/wangminjie/R4DS/](https://bookdown.org/wangminjie/R4DS/)

## I will do my best
May R become the scaffolding for the knowledge you build!
62 | 
--------------------------------------------------------------------------------
/data_science.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/data_science.jpg
--------------------------------------------------------------------------------