├── .gitignore
├── 00_whyR
│   ├── 00_whyR.Rmd
│   ├── 00_whyR.log
│   ├── 00_whyR.pdf
│   ├── exercises
│   │   ├── 1_stats.R
│   │   ├── 2_visual.R
│   │   ├── 3_eda.R
│   │   ├── 4_reproducible.Rmd
│   │   └── covnat.rda
│   ├── header.tex
│   └── images
│       ├── Coding-Lab.jpg
│       ├── Knowledge-Scaffolding.jpg
│       ├── R4art.png
│       ├── RStudio-Screenshot.png
│       ├── R_inventor.png
│       ├── R_logo.png
│       ├── R_vs_SPSS.jpg
│       ├── arrival-movie.png
│       ├── data-science-explore.png
│       ├── data_science.png
│       ├── hadley-wickham.jpg
│       ├── night_king.jpg
│       ├── r4ds-cover.png
│       ├── rstudio-editor.png
│       ├── social_science.jpg
│       ├── tidyverse.png
│       ├── tiobe-index.png
│       ├── typesetting.png
│       └── what_is_R.png
├── 01_install
│   ├── 01_install.Rmd
│   ├── 01_install.log
│   ├── 01_install.pdf
│   ├── header.tex
│   └── images
│       ├── QQgroup_PsyStats.png
│       ├── QQgroup_chenglong.png
│       ├── QQgroup_shizishan.png
│       ├── RStudio-Screenshot.png
│       ├── Rhelp.png
│       ├── Rinstall.png
│       ├── Rstudio_install.png
│       ├── dashboard.jpg
│       ├── engine.jpg
│       ├── mirror1.png
│       ├── mirror2.png
│       ├── rstudio-editor1.png
│       └── run_script.png
├── 02_basicR
│   ├── 02_basicR.Rmd
│   ├── 02_basicR.log
│   ├── 02_basicR.pdf
│   ├── header.tex
│   └── images
│       ├── Rhelp.png
│       ├── data_struction1.png
│       ├── data_type.png
│       ├── rstudio-editor.png
│       ├── script1.png
│       └── script2.png
├── 03_subset
│   ├── 03_subset.Rmd
│   ├── 03_subset.pdf
│   ├── header.tex
│   └── images
│       ├── R_box.png
│       └── data_struction1.png
├── 04_Rmarkdown
│   ├── 04_Rmarkdown.Rmd
│   ├── 04_Rmarkdown.log
│   ├── 04_Rmarkdown.pdf
│   ├── header.tex
│   └── images
│       ├── R_logo.png
│       ├── rmarkdown.png
│       └── rstudio-markdown.png
├── 05_dplyr
│   ├── 05_dplyr.Rmd
│   ├── 05_dplyr.log
│   ├── 05_dplyr.pdf
│   ├── demo_data
│   │   ├── olympics.xlsx
│   │   └── wages.csv
│   ├── header.tex
│   └── images
│       ├── import_datatype01.png
│       ├── pipe1.png
│       ├── pipe2.png
│       └── tidyverse.png
├── 06_ggplot2
│   ├── 06_ggplot2.Rmd
│   ├── 06_ggplot2.log
│   ├── 06_ggplot2.pdf
│   ├── demo_data
│   │   └── temp_carbon.csv
│   ├── header.tex
│   └── images
│       ├── Paradox1.pdf
│       ├── Paradox2.pdf
│       ├── Paradox3.pdf
│       ├── a-14.png
│       ├── a-20.png
│       ├── a-21.png
│       ├── a-3.png
│       ├── cholera_a.pdf
│       ├── cholera_b.pdf
│       ├── cholera_c.pdf
│       ├── ggplot_template.png
│       ├── mapping.png
│       └── tidyverse.png
├── 09_stringr
│   ├── 09_stringr.Rmd
│   ├── 09_stringr.pdf
│   ├── header.tex
│   └── images
│       ├── hex-stringr.png
│       └── regex_repeat.jpg
├── 15_eda02
│   ├── 15_reproducible.Rmd
│   ├── 15_reproducible.pdf
│   ├── demo_data
│   │   └── penguins.csv
│   ├── header.tex
│   └── images
│       ├── 01.png
│       ├── 4_3.png
│       ├── culmen_depth.png
│       ├── lter_penguins.png
│       └── penguins.png
├── R4DS_slides.Rproj
├── README.md
└── data_science.jpg
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 |
--------------------------------------------------------------------------------
/00_whyR/00_whyR.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "What Can R Bring to Our Lives?"
3 | author: "王敏杰"
4 | institute: "Sichuan Normal University"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "50%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
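34 | ## What Can R Bring to Our Lives?
35 |
36 | This question is a bit like life's three ultimate questions:
37 |
38 | - What is R?
39 | - What can R do?
40 | - Why R?
41 |
42 | # What Is R
43 |
44 | ## A Brief History of R
45 |
46 | - 1992: Ross Ihaka and Robert Gentleman, statistics professors at the University of Auckland in New Zealand, designed and developed the R language to make teaching statistics to their students easier (both of their first names begin with R).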
47 |
48 | ```{r echo=FALSE, out.width = '0.8\\textwidth'}
49 | knitr::include_graphics(path = "images/R_inventor.png")
50 | ```
51 |
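52 | - 2000: R 1.0.0 released
53 | - 2004: the first international useR! conference (held annually since then)
54 | - 2005: the ggplot2 package (more than 130 million downloads between August 2018 and August 2019)
55 | - 2012: R 2.15.2 released
56 | - 2013: R 3.0.2 released; 5,026 packages on CRAN
57 | - 2016: RStudio released the tidyverse package (currently the most popular R package collection for data science)
58 | - 2017: R 3.4.1 released; 10,875 packages on CRAN
59 | - 2019: R 3.6.1 released; 15,102 packages on CRAN
60 | - 2020: R 4.0.0 released; 16,054 packages on CRAN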
61 |
62 | [The History of R](https://blog.revolutionanalytics.com/2020/07/the-history-of-r-updated-for-2020.html)
63 |
64 | ## What Is R
65 |
66 | Definition from the official website:
67 |
68 | ```{r eval=FALSE, include=FALSE}
69 | knitr::include_graphics("images/what_is_R.png")
70 | ```
71 |
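72 | R is a programming language for statistical analysis, graphics, and reporting:
73 |
74 | - R is a \textcolor{red}{statistical programming} language
75 | - R runs on many platforms, including Windows, UNIX, and Mac OS X
76 | - R has first-class \textcolor{red}{graphics} capabilities
77 | - R is free
78 | - R is widely used and has a rich collection of \textcolor{red}{packages}
79 | - R has an active \textcolor{red}{community}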
80 |
81 |
82 |
83 |
84 |
85 |
86 |
87 |
88 | ## Trends in R's Popularity
89 |
90 | ```{r echo=FALSE, out.width = '100%'}
91 | knitr::include_graphics("images/tiobe-index.png")
92 | ```
93 |
94 | [TIOBE index](https://www.tiobe.com/tiobe-index/)
95 |
96 | ## A Friendly Interface
97 |
98 | ```{r out.width = '85%', echo = FALSE}
99 | knitr::include_graphics("images/rstudio-editor.png")
100 | ```
101 |
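102 | ## A Leading Figure in the R World
103 |
104 | In August 2019, the COPSS Presidents' Award (\textcolor{red}{often called the Nobel Prize of statistics}) was presented at the annual international statistics meeting to the author of tidyverse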
105 |
106 | ```{r echo=FALSE, out.width = '50%'}
107 | knitr::include_graphics("images/hadley-wickham.jpg")
108 | ```
109 |
110 | - [Hadley Wickham](http://hadley.nz/)
111 | - A leading figure in the R world
112 | - The person who changed the R language
113 |
114 | # What Can R Do
115 |
116 | ## The Data Science Workflow
117 |
118 | Hadley Wickham defined the data science workflow
119 |
120 | ```{r echo=FALSE, out.width = '\\textwidth'}
121 | knitr::include_graphics(path = "images/data-science-explore.png")
122 | ```
123 |
124 |
125 | ## The tidyverse Bundle
126 |
127 | ```{r out.width = '80%', echo = FALSE}
128 | knitr::include_graphics("images/tidyverse.png")
129 | ```
130 | \centering{https://www.tidyverse.org/}
131 |
132 |
133 |
134 | ## R & tidyverse
135 |
136 | | No. | Topic | Demo code |
137 | |------ |---------------------- |-------------------- |
138 | | 1 | Statistics | 1_stats.R |
139 | | 2 | Visualization | 2_visual.R |
140 | | 3 | Exploratory analysis | 3_eda.R |
141 | | 4 | Reproducible reports | 4_reproducible.Rmd |
142 |
143 |
144 | ## Hard?
145 | \Huge
146 | \centering{ Does it feel hard? \\ If so, that means you were paying attention}
147 |
148 |
149 |
150 | ## At first glance, this code may look like this
151 | ```{r echo=FALSE, out.width = '100%', fig.cap='Image from the film "Arrival"'}
152 | knitr::include_graphics("images/arrival-movie.png")
153 | ```
154 |
155 |
156 |
157 |
158 | ## But what I hope for by the end of this course
159 | ```{r echo=FALSE, out.width = '100%', fig.cap='Image from the TV series "Game of Thrones"'}
160 | knitr::include_graphics("images/night_king.jpg")
161 | ```
162 |
163 |
164 |
165 | # Why R
166 |
167 |
168 | ## Social Science Needs Statistics
169 |
170 | ```{r echo=FALSE, out.width = '60%'}
171 | knitr::include_graphics("images/social_science.jpg")
172 | ```
173 |
174 | \centering{We are not statistics majors, but we need statistics}
175 |
176 |
177 |
178 |
179 |
180 | ## Social Science Needs Visualization
181 |
182 | ```{r echo=FALSE, out.width = '50%'}
183 | knitr::include_graphics("images/R4art.png")
184 | ```
185 |
186 |
187 | \centering{We are not art majors, but we need visualization}
188 |
189 |
190 |
191 | ## Social Science Needs Programming
192 |
193 | ```{r echo=FALSE, out.width = '80%'}
194 | knitr::include_graphics("images/Coding-Lab.jpg")
195 | ```
196 |
197 | \centering{We are not computer science majors, but we need programming}
198 |
199 |
200 |
201 |
202 | ## Your Thesis Needs Typesetting
203 |
204 | ```{r echo=FALSE, out.width = '60%'}
205 | knitr::include_graphics("images/typesetting.png")
206 | ```
207 |
208 | \centering{We are not design majors, but we still have to worry about \textcolor{red}{cross-references}}
209 |
210 |
211 |
212 |
213 |
214 | ## So Which Tool Is Really the Best for the Job?
215 |
216 |
217 |
218 |
219 |
220 | \centering
221 | You have the needs, and
222 | \raisebox{-.5\height}{\includegraphics[height=3\baselineskip]{images/R_logo.png}}
223 | has the expertise
224 |
225 |
226 |
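227 | | No. | Topic | Strength | Verdict |
228 | |------ |---------------------- |--------------------- |-------------- |
229 | | 1 | Statistical analysis | Core strength | Easy to use |
230 | | 2 | Plotting with ggplot2 | The good looks | Good-looking |
231 | | 3 | tidyverse syntax | Simple and clear | Easy to learn |
232 | | 4 | Reproducible reports | Quick and convenient | Fun |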
233 |
234 |
235 |
236 |
237 |
238 | ## Love at First Sight, or Wishing You Had Met Sooner?
239 |
240 | ```{r echo=FALSE, out.width = '100%'}
241 | knitr::include_graphics("images/R_vs_SPSS.jpg")
242 | ```
243 |
244 |
245 |
246 |
247 |
248 | # About Learning
249 |
250 | ## Our Course Will Not Be Boring
251 |
252 | ```{r echo=FALSE, out.width = '45%'}
253 | knitr::include_graphics("images/data_science.png")
254 | ```
255 |
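256 | - Data science serves social science, so we will work through many case studies
257 | - Programming is the tool, statistics is the soul, and your own discipline is the core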
258 |
259 |
260 |
261 | ## About Learning
262 |
263 | I rarely use
264 |
265 | $$
266 | f(x)=\frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^{2}}
267 | $$
268 |
269 | More often it is
270 |
271 | ```{r, eval = FALSE}
272 | library(tidyverse)
273 | summary_monthly_temp <- weather %>%
274 | group_by(month) %>%
275 | summarize(mean = mean(temp),
276 | std_dev = sd(temp))
277 | ```
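
The density formula above can also be evaluated directly in R. The following is a minimal sketch, not part of the original slides: the helper name `std_normal_density` is hypothetical, and the comparison with the built-in `dnorm()` is only there to show that the formula and the function agree.

```{r}
# Hypothetical helper: the standard normal density written out from the formula
std_normal_density <- function(x) {
  1 / sqrt(2 * pi) * exp(-x^2 / 2)
}

x <- c(-1, 0, 0.5, 2)

# Compare the hand-written formula with R's built-in dnorm()
all.equal(std_normal_density(x), dnorm(x))
```

In day-to-day work the built-in function is the more convenient choice; the hand-written version only makes the link to the formula explicit.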
278 |
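279 | ## About Learning
280 |
281 | ### Course Goals
282 |
283 | - Train data thinking, improve programming skills, and foster creativity
284 |
285 | ### How to Learn
286 |
287 | - **Problem-driven learning**
288 | - Practice a lot (you cannot learn kung fu just by watching Bruce Lee films)
289 | - Not learning R, but learning with R
290 | - Treat R as **scaffolding** for learning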
291 |
292 | ```{r echo=FALSE, out.width = '35%'}
293 | knitr::include_graphics("images/Knowledge-Scaffolding.jpg")
294 | ```
295 |
296 |
297 |
298 |
299 |
300 |
301 |
302 |
303 |
304 |
305 |
306 |
307 |
308 |
309 |
310 |
311 |
312 |
313 | ## Reference Books
314 |
315 | ```{r echo=FALSE, out.width = '35%'}
316 | knitr::include_graphics("images/r4ds-cover.png")
317 | ```
318 |
319 | - [R for Data Science](https://r4ds.had.co.nz/)
320 | - [https://bookdown.org/wangminjie/R4DS/](https://bookdown.org/wangminjie/R4DS/)
321 |
--------------------------------------------------------------------------------
/00_whyR/00_whyR.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/00_whyR.pdf
--------------------------------------------------------------------------------
/00_whyR/exercises/1_stats.R:
--------------------------------------------------------------------------------
1 | # Numeric Functions
2 | 1 + 5
3 | 1:100
4 | abs(-3.14)
5 | sqrt(3.14)
6 | floor(3.14)
7 | round(3.14)
8 | cos(3.14)
9 | log(3.14)
10 | exp(3.14)
11 |
12 | seq(1, 10, 2)
13 | rep(1:3, 2)
14 |
15 |
16 |
17 |
18 |
19 | # Character Functions
20 | substr("abcdef", 2, 4)
21 | grep("a", c("alice", "bob", "claro"))
22 | strsplit("a.b.c", "\\.")
23 | toupper("Alice")
24 | tolower("Alice")
25 |
26 |
27 |
28 |
29 | # Statistical Functions
30 | x <- 1:10
31 | sum(x)
32 | min(x)
33 | mean(x)
34 | sd(x)
35 | var(x)
36 | median(x)
37 | quantile(x, probs = 0.75)
38 | range(x)
39 | scale(x, center = TRUE, scale = TRUE)
40 |
41 |
42 |
43 | # Statistical Probability Functions
44 | rnorm(20, mean = 0, sd = 1)
45 | dnorm(0.5, mean = 0, sd = 1)
46 | rpois(100, lambda = 10)
47 | dpois(2, lambda = 10)
48 |
49 |
50 |
51 |
52 |
53 | # Regression Modeling
54 | lm(mpg ~ wt, data = mtcars)
55 | aov(mpg ~ wt, data = mtcars)
56 | t.test(extra ~ group, data = sleep)
57 |
58 |
59 |
--------------------------------------------------------------------------------
/00_whyR/exercises/2_visual.R:
--------------------------------------------------------------------------------
1 | library(ggplot2)
2 |
3 | ggplot(midwest, aes(x = area, y = poptotal)) +
4 | geom_point(aes(color = state, size = popdensity)) +
5 | geom_smooth(method = "loess", se = FALSE) +
6 | xlim(c(0, 0.1)) +
7 | ylim(c(0, 500000)) +
8 | labs(
9 | subtitle = "Area Vs Population",
10 | y = "Population",
11 | x = "Area",
12 | title = "Scatterplot",
13 | caption = "Source: midwest"
14 | )
15 |
--------------------------------------------------------------------------------
/00_whyR/exercises/3_eda.R:
--------------------------------------------------------------------------------
1 | library(tidyverse)
2 |
3 | # Case 1: the storms dataset
4 |
5 | storms %>% count(year)
6 |
7 | storms %>%
8 | group_by(year) %>%
9 | summarize(
10 | wind_mean = mean(wind),
11 | wind_sd = sd(wind)
12 | )
13 |
14 |
15 |
16 |
17 |
18 | # Case 2: how do vitamin C dose and delivery method affect tooth growth in guinea pigs?
19 | # Two-way analysis of variance (ANOVA)
20 |
21 | my_data <- ToothGrowth %>%
22 | mutate(
23 | across(c(supp, dose), as_factor)
24 | )
25 |
26 |
27 | my_data %>%
28 | ggplot(aes(x = supp, y = len, fill = supp)) +
29 | geom_boxplot(position = position_dodge()) +
30 | facet_wrap(vars(dose))
31 |
32 |
33 |
34 | aov(len ~ supp + dose, data = my_data)
35 |
36 |
37 | aov(len ~ supp + dose, data = my_data) %>%
38 | TukeyHSD(which = "dose") %>%
39 | broom::tidy()
40 |
41 |
42 |
43 |
44 |
45 |
46 |
--------------------------------------------------------------------------------
/00_whyR/exercises/4_reproducible.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "An Exploratory Analysis Report on COVID-19"
3 | author: "王小二"
4 | date: "`r Sys.Date()`"
5 | output:
6 |   pdf_document:
7 |     latex_engine: xelatex
8 |     extra_dependencies:
9 |       ctex: UTF8
10 |     number_sections: yes
11 |     #toc: yes
12 |     df_print: kable
13 | classoption: "hyperref, 12pt, a4paper"
14 | ---
15 |
16 |
17 | ```{r setup, include=FALSE}
18 | knitr::opts_chunk$set(echo = TRUE,
19 | message = FALSE,
20 | warning = FALSE,
21 | fig.align = "center"
22 | )
23 | ```
24 |
25 |
26 |
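27 | # Introduction
28 |
29 | COVID-19 is spreading across many countries; confirmed cases have risen sharply in some of them, and countries continue to step up their prevention efforts. This report analyzes outbreak data to understand how the epidemic is developing, in the hope that humanity will overcome the virus soon.
30 |
31 |
32 | # Importing the Data
33 |
34 | First, we load the packages we need: tidyverse for data exploration, and covdata as the source of the data (here the `covnat` table is loaded from the local file `covnat.rda`)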
35 |
36 | ```{r}
37 | # Load libraries
38 | library(tidyverse)
39 | #library(covdata)
40 | load("covnat.rda")
41 | ```
42 |
43 |
44 | The data for this report come from
45 | [https://kjhealy.github.io/covdata/](https://kjhealy.github.io/covdata/). Let's look at a few rows
46 |
47 |
48 | ```{r, echo = FALSE}
49 | covnat %>%
50 | tail(8)
51 | ```
52 |
53 |
54 |
55 | # Variables
56 |
57 | This dataset contains 8 variables, with the following meanings:
58 |
59 | | Variable | Meaning |
60 | |----------- |---------------------------- |
61 | | date | Date |
62 | | cname | Country name |
63 | | iso3 | Country code |
64 | | cases | Confirmed cases |
65 | | deaths | Deaths |
66 | | pop | Country population in 2019 |
67 | | cu_cases | Cumulative confirmed cases |
68 | | cu_deaths | Cumulative deaths |
69 |
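
As a small illustration of how the daily and cumulative columns relate, here is a sketch on made-up numbers. It assumes that `cu_cases` is the running total of `cases` within a country; the `toy` data frame and its values are invented for illustration and are not taken from covdata.

```{r}
# Made-up daily counts for one hypothetical country (not real data)
toy <- data.frame(
  date  = as.Date("2020-03-01") + 0:3,
  cases = c(1, 2, 0, 5)
)

# Assumed relationship: cumulative cases are the running sum of daily cases
toy$cu_cases <- cumsum(toy$cases)
toy
```
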
70 | # Data Exploration
71 |
72 | Find the countries with the most cumulative confirmed cases
73 |
74 | ```{r}
75 | covnat %>%
76 | ungroup() %>%
77 | filter(date == max(date)) %>%
78 | slice_max(cu_cases, n = 8)
79 | ```
80 |
81 |
82 | # Visualization
83 |
84 | To present the data more clearly, we filter the confirmed-case data for the United States and plot it
85 |
86 | ```{r, fig.showtext = TRUE}
87 | covnat %>%
88 | filter(iso3 == "USA") %>%
89 | filter(cu_cases > 0) %>%
90 | ungroup() %>%
91 |
92 | ggplot(aes(x = date, y = cases)) +
93 | geom_path() +
94 | scale_x_date(name = NULL, breaks = "month") +
95 | labs(title = "Daily confirmed COVID-19 cases in the United States",
96 | subtitle = "Source: https://kjhealy.github.io/covdata/")
97 | ```
98 |
99 |
100 |
--------------------------------------------------------------------------------
/00_whyR/exercises/covnat.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/exercises/covnat.rda
--------------------------------------------------------------------------------
/00_whyR/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/00_whyR/images/Coding-Lab.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/Coding-Lab.jpg
--------------------------------------------------------------------------------
/00_whyR/images/Knowledge-Scaffolding.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/Knowledge-Scaffolding.jpg
--------------------------------------------------------------------------------
/00_whyR/images/R4art.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R4art.png
--------------------------------------------------------------------------------
/00_whyR/images/RStudio-Screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/RStudio-Screenshot.png
--------------------------------------------------------------------------------
/00_whyR/images/R_inventor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R_inventor.png
--------------------------------------------------------------------------------
/00_whyR/images/R_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R_logo.png
--------------------------------------------------------------------------------
/00_whyR/images/R_vs_SPSS.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/R_vs_SPSS.jpg
--------------------------------------------------------------------------------
/00_whyR/images/arrival-movie.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/arrival-movie.png
--------------------------------------------------------------------------------
/00_whyR/images/data-science-explore.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/data-science-explore.png
--------------------------------------------------------------------------------
/00_whyR/images/data_science.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/data_science.png
--------------------------------------------------------------------------------
/00_whyR/images/hadley-wickham.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/hadley-wickham.jpg
--------------------------------------------------------------------------------
/00_whyR/images/night_king.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/night_king.jpg
--------------------------------------------------------------------------------
/00_whyR/images/r4ds-cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/r4ds-cover.png
--------------------------------------------------------------------------------
/00_whyR/images/rstudio-editor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/rstudio-editor.png
--------------------------------------------------------------------------------
/00_whyR/images/social_science.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/social_science.jpg
--------------------------------------------------------------------------------
/00_whyR/images/tidyverse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/tidyverse.png
--------------------------------------------------------------------------------
/00_whyR/images/tiobe-index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/tiobe-index.png
--------------------------------------------------------------------------------
/00_whyR/images/typesetting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/typesetting.png
--------------------------------------------------------------------------------
/00_whyR/images/what_is_R.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/00_whyR/images/what_is_R.png
--------------------------------------------------------------------------------
/01_install/01_install.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 1: Setting Up the R Environment"
3 | author: "王敏杰"
4 | institute: "Sichuan Normal University"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "50%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
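34 | # Setting Up the R Environment
35 |
36 | ## Preparation
37 |
38 | - \textcolor{red}{Step 1}: connect to the (wireless) network
39 | - Username: student ID + @sicnu, for example `20150956@sicnu`
40 | - Password: date of birth (year, month, day) + the last character of your ID card number (if it is X, use uppercase), for example `19880923X`
41 |
42 | - \textcolor{red}{Step 2}: join the QQ group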
43 |
44 | ```{r echo=FALSE, out.width = '25%'}
45 | #knitr::include_graphics(path = "images/QQgroup_PsyStats.png")
46 | knitr::include_graphics(path = "images/QQgroup_shizishan.png")
47 | knitr::include_graphics(path = "images/QQgroup_chenglong.png")
48 | ```
49 |
50 | - \textcolor{red}{Step 3}: download the installers (R-4.0.2-win.exe, RStudio-1.3.1091.exe) from the QQ group files and run them
51 |
52 |
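53 | ## Environment Setup
54 |
55 | There are three main steps:
56 |
57 | - Install R
58 | - Install RStudio
59 | - Install the required packages
60 |
61 | ## Step 1: Install R
62 |
63 | - Download and install R from the official website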
64 |
65 | ```{r echo=FALSE, out.width = '85%'}
66 | knitr::include_graphics("images/Rinstall.png")
67 | ```
68 |
69 | ## Step 2: Install RStudio
70 |
71 | - Download and install RStudio from the official website
72 | - Choose `RStudio Desktop`
73 |
74 | ```{r out.width = '85%', echo = FALSE}
75 | knitr::include_graphics("images/Rstudio_install.png")
76 | ```
77 |
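78 | ## Things to Note
79 |
80 | A few small tips:
81 |
82 | - Your computer user name should \textcolor{red}{not contain Chinese characters or spaces}
83 |
84 | - Install on a \textcolor{red}{non-system drive} if possible; for example, the D drive
85 |
86 | - The installation path should \textcolor{red}{not contain Chinese characters or spaces}. For example, these are good choices
87 |
88 | - `D:/R`
89 | - `D:/Rstudio`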
90 |
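
These path rules can be checked from R itself. The sketch below is illustrative only: `path_is_safe` is a hypothetical helper, and it treats a path as safe when it contains only ASCII characters and no spaces.

```{r, eval = FALSE}
# Hypothetical helper: TRUE when a path has only ASCII characters and no spaces
path_is_safe <- function(path) {
  only_ascii <- all(utf8ToInt(enc2utf8(path)) < 128)
  no_spaces  <- !grepl(" ", path, fixed = TRUE)
  only_ascii && no_spaces
}

path_is_safe("D:/R")                # a path like the ones recommended above
path_is_safe("D:/Program Files/R")  # contains a space

# After installation, the same check can be applied to R's own location
path_is_safe(R.home())
```
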
91 |
92 |
93 | ## How Are R and RStudio Related?
94 |
95 | \qquad \qquad \qquad R \hspace{4cm} RStudio
96 |
97 | ```{r, fig.show="hold", out.width="49%", echo = FALSE}
98 | knitr::include_graphics(c("images/engine.jpg", "images/dashboard.jpg"))
99 | ```
100 |
101 | \centering{R is the interesting soul, RStudio is the good-looking shell}
102 |
103 |
104 | ## RStudio Is Friendly
105 |
106 | Open RStudio from the Windows Start menu; the interface looks like this
107 |
108 | ```{r out.width = '75%', echo = FALSE}
109 | knitr::include_graphics("images/rstudio-editor1.png")
110 | ```
111 |
112 |
113 |
114 | ## Step 3: Install Packages
115 |
116 | ```{r out.width = '60%', echo = FALSE}
117 | knitr::include_graphics("images/RStudio-Screenshot.png")
118 | ```
119 |
120 | - Install from the command line
121 |
122 | - `install.packages("tidyverse")`
123 | - Press Enter and wait patiently
124 |
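
Before installing, it can help to check which packages are already present. A minimal sketch, assuming only base R: `missing_packages` is a hypothetical helper name, and the actual installation call is left commented out so that nothing is downloaded by accident.

```{r, eval = FALSE}
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# List what still needs to be installed
missing_packages(c("tidyverse", "ggplot2"))

# Then install only what is missing
# install.packages(missing_packages(c("tidyverse", "ggplot2")))
```
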
125 |
126 |
127 |
128 |
129 |
130 |
131 |
132 |
133 |
134 |
135 |
136 |
137 |
138 |
139 |
140 |
141 |
142 | ## If Package Installation Is Too Slow: Method 1
143 |
144 | - Specify the Tsinghua University mirror
145 |
146 | ```{r, eval = FALSE}
147 | install.packages(
148 | "tidyverse",
149 | repos = "http://mirrors.tuna.tsinghua.edu.cn/CRAN"
150 | )
151 | ```
152 |
153 |
154 | - Or specify the Lanzhou University mirror
155 |
156 | ```{r, eval = FALSE}
157 | install.packages(
158 | "tidyverse",
159 | repos = "https://mirror.lzu.edu.cn/CRAN/"
160 | )
161 | ```
162 |
163 |
164 |
165 | ## If Package Installation Is Too Slow: Method 2
166 |
167 | - Set the mirror in `RStudio` as follows:
168 |
169 |
170 |
171 |
172 | ```{r image_grobs, fig.show='hold', out.width = "49%", fig.align = "default", echo=FALSE}
173 | library(cowplot)
174 | library(ggplot2)
175 |
176 | ggdraw() + draw_image("images/mirror1.png")
177 | ggdraw() + draw_image("images/mirror2.png")
178 | ```
179 |
180 |
181 | - Then
182 |
183 | ```{r, eval = FALSE }
184 | install.packages("tidyverse")
185 | ```
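
Setting the mirror in the RStudio dialog is roughly equivalent to setting the `repos` option in the R session. A hedged sketch: the URL below is the Tsinghua mirror used in Method 1 (any CRAN mirror URL could be substituted), and the setting lasts only for the current session unless it is placed in a startup file such as `.Rprofile`.

```{r, eval = FALSE}
# Point the CRAN repository at a mirror for the current session
options(repos = c(CRAN = "http://mirrors.tuna.tsinghua.edu.cn/CRAN"))

# Confirm which repository install.packages() will use by default
getOption("repos")["CRAN"]
```
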
186 |
187 |
188 | ## Test
189 |
190 | Copy the following code into the **script editor** \footnotesize
191 |
192 | ```{r, eval=FALSE}
193 | library(ggplot2)
194 |
195 | ggplot(midwest, aes(x = area, y = poptotal)) +
196 | geom_point(aes(color = state, size = popdensity)) +
197 | geom_smooth(method = "loess", se = FALSE) +
198 | xlim(c(0, 0.1)) +
199 | ylim(c(0, 500000)) +
200 | labs(
201 | title = "Scatterplot",
202 | subtitle = "Area Vs Population",
203 | x = "Area",
204 | y = "Population"
205 | )
206 | ```
207 |
208 |
209 |
210 | ## Run
211 |
212 | ```{r out.width = '65%', echo = FALSE}
213 | knitr::include_graphics("images/run_script.png")
214 | ```
215 |
216 | - Method 1: click `Run` to run the line of code at the cursor
217 | - Method 2: click `Source` to run all the code from top to bottom
218 |
219 |
220 |
221 |
222 |
223 | ## If You See This Plot, the Setup Worked
224 |
225 | ```{r out.width = '100%', echo = FALSE}
226 | library(ggplot2)
227 |
228 | ggplot(midwest, aes(x = area, y = poptotal)) +
229 | geom_point(aes(color = state, size = popdensity)) +
230 | geom_smooth(method = "loess", se = FALSE) +
231 | xlim(c(0, 0.1)) +
232 | ylim(c(0, 500000)) +
233 | labs(
234 | title = "Scatterplot",
235 | subtitle = "Area Vs Population",
236 | x = "Area",
237 | y = "Population"
238 | )
239 | ```
240 |
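241 | # Possible Problems
242 |
243 | ## Possible Problems
244 |
245 | - My computer runs macOS. How do I install?
246 | - Which RStudio settings do I need?
247 | - My system is not compatible with 64-bit RStudio?
248 | - Why is RStudio blank when I open it?
249 | - Package installation is too slow. How do I fix it?
250 | - I get the error "unable to access index for repository..." when installing packages?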
251 |
252 |
253 | ## Happy R
254 |
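255 | Class time is limited; to master this skill you need to put in work after class
256 |
257 | - Be sure to finish setting up your environment, including installing packages (there is an installation video in the group; if all else fails, @ me for remote help)
258 | - Study material: https://bookdown.org/wangminjie/R4DS/
259 | - Reference book: 《R数据科学》 (*R for Data Science*), in the book folder of the group files
260 | - We are not Sun Wukong, born with extraordinary skills. No shame in asking for help
261 | - The learning curve is fairly steep, but with an experienced guide leading the way, have confidence.
262 |
263 | Happy R, everyone!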
264 |
--------------------------------------------------------------------------------
/01_install/01_install.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/01_install.pdf
--------------------------------------------------------------------------------
/01_install/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/01_install/images/QQgroup_PsyStats.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/QQgroup_PsyStats.png
--------------------------------------------------------------------------------
/01_install/images/QQgroup_chenglong.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/QQgroup_chenglong.png
--------------------------------------------------------------------------------
/01_install/images/QQgroup_shizishan.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/QQgroup_shizishan.png
--------------------------------------------------------------------------------
/01_install/images/RStudio-Screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/RStudio-Screenshot.png
--------------------------------------------------------------------------------
/01_install/images/Rhelp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/Rhelp.png
--------------------------------------------------------------------------------
/01_install/images/Rinstall.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/Rinstall.png
--------------------------------------------------------------------------------
/01_install/images/Rstudio_install.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/Rstudio_install.png
--------------------------------------------------------------------------------
/01_install/images/dashboard.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/dashboard.jpg
--------------------------------------------------------------------------------
/01_install/images/engine.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/engine.jpg
--------------------------------------------------------------------------------
/01_install/images/mirror1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/mirror1.png
--------------------------------------------------------------------------------
/01_install/images/mirror2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/mirror2.png
--------------------------------------------------------------------------------
/01_install/images/rstudio-editor1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/rstudio-editor1.png
--------------------------------------------------------------------------------
/01_install/images/run_script.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/01_install/images/run_script.png
--------------------------------------------------------------------------------
/02_basicR/02_basicR.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 2: R Basics"
3 | author: "王敏杰"
4 | institute: "四川师范大学"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "50%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
34 |
35 |
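36 | # Getting Started
37 |
38 | ## Getting Started
39 |
40 | Once installation is complete, open the Windows `Start` menu and click the `RStudio` icon. The RStudio window opens and looks like this: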
41 |
42 | ```{r out.width = '75%', echo = FALSE}
43 | knitr::include_graphics("images/rstudio-editor.png")
44 | ```
45 |
46 | ## RStudio Is Very Friendly
47 |
48 | To run a piece of R code, type it at the prompt on the last line of the Console pane in RStudio and press Enter. For example:
49 | ```{r }
50 | 1 + 1
51 | ```
52 |
53 |
54 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
55 | log(8)
56 | ```
57 |
58 |
59 |
60 |
61 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
62 | 1:12
63 | ```
64 |
65 |
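66 | ## Objects
67 |
68 | ### Everything is an object
69 | Data stored in R are called **objects**. Data processing in R essentially means creating and manipulating these objects over and over.
70 |
71 | ### Creating an object
72 | To create an R object, first choose a name, then use
73 | the assignment operator `<-` to assign data to it. For example, to assign the value 5 to the variable x, type `x <- 5` at the command line and press Enter.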
74 |
75 | ```{r assignment operator}
76 | x <- 5
77 | ```
78 |
79 | ### Printing an object
80 |
81 | Typing `x` and pressing Enter prints the value of x
82 | ```{r}
83 | x
84 | ```
85 |
86 |
87 |
88 |
89 |
90 |
91 |
92 |
93 |
94 | ## Objects
95 |
96 | ### Creating an object
97 | ```{r}
98 | l <- "hello world"
99 | ```
100 |
101 | ### Accessing an object
102 |
103 | ```{r}
104 | l
105 | ```
106 |
107 |
108 | ## Objects
109 |
110 | ### Creating a sequence
111 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
112 | d <- 1:10
113 | ```
114 |
115 | ### Accessing an object
116 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
117 | d
118 | ```
119 |
120 |
121 |
122 |
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 |
131 |
132 |
133 |
134 |
135 |
136 |
137 |
138 |
139 |
140 |
141 |
142 |
143 | ## Data Types
144 |
145 |
146 | ```{r out.width = '100%', echo = FALSE}
147 | knitr::include_graphics("images/data_type.png")
148 | ```
149 | ## Data Types
150 | - Numeric
151 | ```{r}
152 | 3
153 | 5000
154 | 3e+06
155 | class(0.0001)
156 | ```
157 |
158 |
159 | ## Data Types
160 | - Character (string)
161 | ```{r}
162 | "hello"
163 | "girl"
164 | "1" # note the difference between 1 and "1"
165 | ```
166 |
167 | ```{r}
168 | class("1")
169 | ```
170 |
171 |
172 | ## Data Types
173 |
174 | - Logical
175 | ```{r}
176 | TRUE
177 | FALSE
178 | 3 < 4
179 | ```
180 |
181 |
182 | ```{r}
183 | class(TRUE)
184 | ```
185 |
186 |
187 | ```{r}
188 | 3 < 4
189 | ```
190 |
191 | ## Data Types
192 | - Factor
193 | ```{r}
194 | fac <- factor(c("a", "b", "c"))
195 | fac
196 | ```
197 |
198 |
199 | ```{r}
200 | class(fac)
201 | ```
202 |
203 |
204 |
205 |
206 |
207 |
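208 | ## Data Structures
209 |
210 | ### Vectors
211 | - Use the `c()` function to **combine** a set of values into a vector. Separate the elements with commas; all elements must have the same data type.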
212 |
213 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
214 | d <- c(2, 4, 3, 1, 5, 7)
215 | d
216 | ```
217 |
218 |
219 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
220 | t <- c("2", "4", "3", "1", "5", "7")
221 | t
222 | ```
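
The requirement that all elements share one data type can be checked directly. The following is a minimal sketch (the object names `mixed` and `nums` are ours, not from the slides): when `c()` receives values of different types, R converts them to a common type instead of raising an error.

```r
# Mixing a number, a string, and a logical value:
# every element is converted to character
mixed <- c(1, "a", TRUE)
class(mixed)

# Mixing numbers and logical values:
# TRUE and FALSE are converted to 1 and 0
nums <- c(2, TRUE, FALSE)
class(nums)
nums
```

This silent conversion is why `1` and `"1"` need to be kept apart when building a vector.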
223 |
224 |
225 | A vector of length 1
226 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
227 | x <- c(1) #
228 | x <- 1 # the shorthand
229 | ```
230 |
231 |
232 | ## Data Structures
233 | ### Matrices
234 | - Created with the `matrix()` function
235 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
236 | m <- matrix(c(2, 4, 3, 1, 5, 7),
237 | nrow = 2,
238 | ncol = 3,
239 | byrow = TRUE
240 | )
241 | m
242 | ```
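
The `byrow` argument decides how the values are laid out. As a small sketch (the names `values`, `by_row`, and `by_col` are ours), the same six numbers give two different matrices depending on whether they are filled in row by row or column by column:

```r
values <- c(2, 4, 3, 1, 5, 7)

# Fill row by row, as in the slide above
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

# Fill column by column (byrow = FALSE is the default)
by_col <- matrix(values, nrow = 2, ncol = 3)

by_row
by_col
dim(by_row)   # number of rows, then number of columns
```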
243 |
244 |
245 |
246 | ## Data Structures
247 | ### Arrays
248 | - The `array()` function creates an `n`-dimensional array
249 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
250 | ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
251 | ar
252 | ```
253 |
254 |
255 |
256 |
257 |
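258 | ## Data Structures
259 | ### Lists
260 | - A list is created much like a vector with `c()`: the elements are separated by commas. The difference is that a list allows each element to have a different data type (numeric, character, logical, and so on), whereas a vector requires all elements to have the same type.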
261 |
262 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
263 | list1 <- list(100:110, "R", c(2, 4, 3, 1, 5, 7))
264 | list1
265 | ```
266 |
267 |
268 |
269 | ## Data Structures
270 | ### Data frames
271 | - Built with the `data.frame()` function
272 |
273 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
274 | df <- data.frame(
275 | name = c("ace", "bob", "carl", "kaite"),
276 | age = c(21, 14, 13, 15),
277 | sex = c("girl", "boy", "boy", "girl")
278 | )
279 | df
280 | ```
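
Once a data frame exists, a few base R functions give a quick overview of it. A minimal sketch, using a small hypothetical data frame `students` rather than the `df` above:

```r
# A small illustrative data frame (the values are made up)
students <- data.frame(
  name = c("ace", "bob", "carl"),
  age  = c(21, 14, 13)
)

str(students)          # compact overview: column names, types, first values
nrow(students)         # number of rows
ncol(students)         # number of columns
names(students)        # column names
summary(students$age)  # summary statistics of one column
```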
281 |
282 |
283 |
284 |
285 | ## Data Structures
286 | ### Summary
287 | The data structures of R objects (vectors, matrices, arrays, lists, and data frames) are summarized below
288 |
289 | ```{r out.width = '100%', echo = FALSE}
290 | knitr::include_graphics("images/data_struction1.png")
291 | ```
292 |
293 |
294 |
295 |
296 |
297 |
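298 | ## Functions
299 |
300 | The power of R lies in using **functions** to manipulate objects. You can think of objects as nouns and functions as verbs.
301 | We use a simple example, `sum()`, to show how a function works. The result of `sum()` can be displayed directly: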
302 | ```{r}
303 | sum(5, 10)
304 | ```
305 |
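306 | The result can also be given a name. The code below first computes `5 + 10`, assigns the result to a newly created object `y`, and then prints the value of `y` on the second line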
307 |
308 | ```{r}
309 | y <- sum(5, 10)
310 | y
311 | ```
312 |
313 |
314 | ## More Functions
315 |
316 | Besides the summation function `sum()`, R has a great many other functions
317 |
318 |
319 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
320 | mean(1:6)
321 | ```
322 |
323 |
324 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
325 | abs(1:6)
326 | ```
327 |
328 |
329 |
330 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
331 | round(3.14159)
332 | ```
333 |
334 |
335 |
336 | ```{r echo=TRUE, message=TRUE, warning=TRUE}
337 | x <- seq(1, 100)
338 | sum(x)
339 | ```
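
Most functions accept more than one argument, and arguments can be passed by name. The sketch below is illustrative only (the object names `rounded`, `steps`, and `avg` are ours); it uses the documented arguments `digits` of `round()`, `from`/`to`/`by` of `seq()`, and `na.rm` of `mean()`.

```r
# Keep two decimal places instead of rounding to an integer
rounded <- round(3.14159, digits = 2)
rounded

# A sequence from 0 to 1 in steps of 0.25
steps <- seq(from = 0, to = 1, by = 0.25)
steps

# Ignore the missing value (NA) when computing the mean
avg <- mean(c(1, NA, 3), na.rm = TRUE)
avg
```

Naming the arguments makes the call easier to read and independent of argument order.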
340 |
341 |
342 |
343 |
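344 | ## Scripts
345 | ### What is a script
346 | Once we have written a piece of R code, we can save it as a **script** file, which usually has the extension `.R`. For example, we can save the commands that created the objects `x` and `y` in a script file named `my_script.R`.
347 | That way we can modify and rerun it later.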
348 |
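349 | ## Scripts
350 | ### Creating a script
351 | In RStudio, you can create a new script from the menu bar by clicking `File > New File > R Script`.
352 | We strongly recommend writing and editing your code in a script before running it. Once this becomes a habit, all your future work is documented and reproducible.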
353 |
354 |
355 | ## Creating a Script
356 | ```{r out.width = '100%', echo = FALSE}
357 | knitr::include_graphics("images/script1.png")
358 | ```
359 |
360 |
361 |
362 | ## Running a Script
363 |
364 | - Click `Run` to run the line where the cursor is
365 | - Click `Source` to run the entire script
366 |
367 | ```{r out.width = '75%', echo = FALSE}
368 | knitr::include_graphics("images/script2.png")
369 | ```
370 |
371 |
372 |
373 |
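374 | ## Packages
375 |
376 | Much of R's power also comes from its many packages, which are usually downloaded and installed from [The Comprehensive R Archive Network (CRAN)](https://cran.r-project.org).
377 |
378 |
379 | Packages can be installed with the following commands: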
380 |
381 | ```{r, eval = FALSE }
382 | # install a single package
383 | install.packages("tidyverse")
384 | ```
385 |
386 |
387 | ```{r, eval = FALSE }
388 | # install several packages
389 | install.packages(c("ggplot2", "devtools", "dplyr"))
390 | ```
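
Installing a package is done once, while loading it has to be repeated in every new R session. The sketch below uses the `stats` package only because it ships with R, so nothing has to be downloaded; the object name `m` is ours.

```r
# Check whether a package is installed, without attaching it
requireNamespace("stats", quietly = TRUE)

# Attach a package for the current session
library(stats)

# Call a single function from a package with the :: operator,
# which works even if the package has not been attached
m <- stats::median(c(1, 3, 5))
m
```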
391 |
392 |
393 |
394 | ## How to Get Help
395 |
396 |
397 | - Memorizing every function is practically impossible
398 | - Open the help page of a function (the `Help` tab in the lower-right pane of `RStudio`)
399 |
400 | ```{r, eval = FALSE }
401 | ?sqrt
402 | ?gather
403 | ?spread
404 | ?ggplot2
405 | ?scale
406 | ?map_dfr
407 | ```
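
A few related base R tools are worth knowing alongside `?`. This is a small sketch, not an exhaustive list; note that `?` only finds topics from packages that are loaded, so a page such as `gather` requires its package (tidyr) to be loaded or named explicitly.

```r
# The function form of ?sqrt
help("sqrt")

# For a package that is installed but not loaded, name it explicitly, e.g.
# help("gather", package = "tidyr")

# Show the arguments a function accepts
args(round)

# Find visible functions whose names start with "read."
read_functions <- apropos("^read\\.")
read_functions
```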
408 |
409 | ## How to Get Help
410 |
411 | Quick access to help is another fine feature of R
412 |
413 | ```{r out.width = '100%', echo = FALSE}
414 | knitr::include_graphics("images/Rhelp.png")
415 | ```
416 |
417 |
418 |
419 |
--------------------------------------------------------------------------------
/02_basicR/02_basicR.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/02_basicR.pdf
--------------------------------------------------------------------------------
/02_basicR/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/02_basicR/images/Rhelp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/Rhelp.png
--------------------------------------------------------------------------------
/02_basicR/images/data_struction1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/data_struction1.png
--------------------------------------------------------------------------------
/02_basicR/images/data_type.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/data_type.png
--------------------------------------------------------------------------------
/02_basicR/images/rstudio-editor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/rstudio-editor.png
--------------------------------------------------------------------------------
/02_basicR/images/script1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/script1.png
--------------------------------------------------------------------------------
/02_basicR/images/script2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/02_basicR/images/script2.png
--------------------------------------------------------------------------------
/03_subset/03_subset.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 3: Subsetting"
3 | author: "王敏杰"
4 | institute: "四川师范大学"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "100%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
34 |
35 |
36 | ## Subsetting
37 |
38 | An **object** is a piece of storage space created in the computer, much like a box:
39 | we can put things into the box and take things out of it.
40 |
41 | ```{r echo=FALSE, out.width = '85%'}
42 | knitr::include_graphics("images/R_box.png")
43 | ```
44 |
45 |
46 | ## Data Structures
47 |
48 | The data structures of R objects (vectors, matrices, arrays, lists, and data frames)
49 |
50 | ```{r out.width = '100%', echo = FALSE}
51 | knitr::include_graphics("images/data_struction1.png")
52 | ```
53 | Below we go through each data structure in turn and show how to select subsets from it...
54 |
55 |
56 | # Getting Started
57 |
58 | ## Vectors
59 |
60 | For atomic vectors, there are at least four ways to select a subset
61 | ```{r}
62 | x <- c(1.1, 2.2, 3.3, 4.4, 5.5)
63 | ```
64 |
65 |
66 | - Positive integers: select the elements at the given positions
67 | ```{r}
68 | x[1]
69 | ```
70 |
71 | ```{r}
72 | x[c(3,1)]
73 | ```
74 | ```{r}
75 | x[1:3]
76 | ```
77 |
78 | ## Vectors
79 | - Negative integers: drop the elements at the given positions
80 | ```{r}
81 | x[-2]
82 | ```
83 |
84 |
85 | ```{r}
86 | x[c(-3, -4)]
87 | ```
88 |
89 |
90 | ## Vectors
91 |
92 | - Logical vectors: extract the elements at the positions where the value is `TRUE`
93 | ```{r}
94 | x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
95 | ```
96 |
97 | A common use case: select all elements greater than some value
98 | ```{r}
99 | x > 3
100 | ```
101 |
102 | ```{r}
103 | x[x > 3]
104 | ```
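
Logical conditions can be combined. The following sketch reuses the vector `x` from the slides; the result names `middle`, `extremes`, `positions`, and `not_large` are ours. `&` means "and", `|` means "or", `!` negates a condition, and `which()` returns the positions of the `TRUE` values.

```r
x <- c(1.1, 2.2, 3.3, 4.4, 5.5)

middle    <- x[x > 2 & x < 5]   # both conditions must hold
extremes  <- x[x < 2 | x > 5]   # at least one condition must hold
positions <- which(x > 3)       # positions instead of values
not_large <- x[!(x > 3)]        # negate the condition

middle
extremes
positions
not_large
```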
105 |
106 | ## Vectors
107 | - Named vectors
108 | ```{r}
109 | y <- c("a" = 11, "b" = 12, "c" = 13, "d" = 14)
110 | y
111 | ```
112 |
113 | We can subset with a character vector of names, which returns the elements that have those names
114 | ```{r}
115 | y[c("d", "c", "a")]
116 | ```
117 |
118 |
119 |
120 |
121 | ## Lists
122 |
123 | Subsetting a list works the same way as subsetting a vector. Using `[` always returns a list:
124 | ```{r}
125 | l <- list("one" = c("a", "b", "c"),
126 | "two" = c(1:5),
127 | "three" = c(TRUE, FALSE)
128 | )
129 | l
130 | ```
131 |
132 | ```{r}
133 | l[1] # still a list
134 | ```
135 |
136 |
137 | ## Lists
138 |
139 | To extract an element from the list itself, use `[[`
140 | ```{r}
141 | l[[1]]
142 | ```
143 |
144 |
145 | You can also use the name of the element, for example `[["one"]]`:
146 | ```{r}
147 | l[["one"]]
148 | ```
149 |
150 |
151 | Programmers found this too cumbersome, so `$` serves as a shorthand
152 | ```{r}
153 | l$one
154 | ```
155 |
156 |
157 | ## Lists
158 |
159 | So, remember:
160 |
161 | - the difference between `[` and `[[`
162 | - `x$y` is shorthand for `x[["y"]]`
163 |
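
The two points above can be verified in a few lines. A minimal sketch, using a shortened version of the list `l` from the slides (the result names are ours):

```r
l <- list(one = c("a", "b", "c"), two = 1:5)

sub_list <- l["one"]     # single bracket: a list containing one element
element  <- l[["one"]]   # double bracket: the element itself

class(sub_list)    # "list"
class(element)     # "character"
length(sub_list)   # 1
length(element)    # 3

# $ gives the same result as [[ with the name in quotes
identical(l$one, l[["one"]])

# Once the element is extracted, it can be subset like any vector
second <- l[["one"]][2]
second
```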
164 |
165 |
166 |
167 | ## Matrices
168 |
169 | ```{r}
170 | a <- matrix(1:9, nrow = 3, byrow = TRUE)
171 | a
172 | ```
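173 | To take columns 2 to 3 of rows 1 to 2, we write `[1:2, 2:3]`. Note that the row index and the column index are separated by a comma; the result is a new matrix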
174 | ```{r}
175 | a[1:2, 2:3]
176 | ```
177 |
178 |
179 | ## Matrices
180 | By default, `[` simplifies the result to the lowest possible dimension. For example,
181 | ```{r}
182 | a[1, 1:2]
183 | ```
184 | selects columns 1 and 2 of row 1. The result is no longer a $1 \times 2$ matrix but a vector with two elements.
185 |
186 | \vfill
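187 | **Simplifying to the lowest possible dimension** simply means this: `r a[1, 1:2]` looks like a matrix and also a bit like a vector; a vector has a lower dimension than a matrix, so the result is a vector.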
188 |
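
The simplification can be observed with `dim()`, and it can be switched off with the `drop = FALSE` argument of `[`. A short sketch using the matrix `a` from the slides (the names `as_vector` and `as_matrix` are ours):

```r
a <- matrix(1:9, nrow = 3, byrow = TRUE)

as_vector <- a[1, 1:2]                # simplified to a plain vector
as_matrix <- a[1, 1:2, drop = FALSE]  # stays a 1 x 2 matrix

dim(as_vector)   # NULL: a vector has no dim attribute
dim(as_matrix)   # 1 2
```

Keeping the matrix shape matters when later code expects two dimensions, for example when calling `nrow()` or doing matrix multiplication.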
189 |
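190 | ## Matrices
191 | Sometimes we want to keep all the rows or all the columns, for example
192 |
193 | - along the rows, select only rows 1 to 2
194 | - along the columns, select all columns
195 |
196 | This can be abbreviated as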
197 |
198 | ```{r}
199 | a[1:2, ]
200 | ```
201 |
202 | Think about it: what will the following print?
203 | ```{r, eval = FALSE}
204 | a[ , ]
205 | ```
206 |
207 |
208 | ## Matrices
209 |
210 | ```{r}
211 | a[ , ]
212 | ```
213 |
214 |
215 | ```{r}
216 | # can it be simplified a little?
217 | a[]
218 | ```
219 |
220 |
221 | ```{r}
222 | # can it be simplified even further?
223 | a
224 | ```
225 |
226 |
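227 | ## Data Frames
228 |
229 | A data frame has the properties of both a `list` and a `matrix`, so
230 |
231 | - to select certain columns of a data frame, you can specify element positions as with a list; for example, `df[1:2]` selects the first two columns
232 | - you can also select by row and column indices as with a matrix; for example, `df[1:3, ]` selects all columns of the first three rows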
233 |
234 | \small
235 | ```{r}
236 | df <- data.frame(x = 1:4,
237 | y = 4:1,
238 | z = c("a", "b", "c", "d") )
239 | df
240 | ```
241 |
242 | ## Data Frames
243 |
244 | \small
245 | ```{r}
246 | # Like a list
247 | df[c("x", "z")]
248 | ```
249 |
250 |
251 |
252 | ```{r}
253 | # Like a matrix
254 | df[, c("x", "z")]
255 | ```
256 |
257 | ## Data Frames
258 |
259 | You can also select by row and column position
260 | ```{r}
261 | df[1:2]
262 | ```
263 |
264 |
265 | ```{r}
266 | df[1:3, ]
267 | ```
268 |
269 |
270 |
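271 | ## Data Frames
272 | When a single column is selected with matrix-style indexing, the result is simplified to a vector, just as with a matrix (a single row, by contrast, remains a data frame)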
273 | ```{r}
274 | df[, "x"]
275 | ```
276 |
277 | To avoid this simplification, add one more argument
278 |
279 | ```{r}
280 | df[, "x", drop = FALSE]
281 | ```
282 |
283 |
284 |
285 |
286 |
287 |
288 |
289 |
290 |
291 |
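292 | ## Further Reading
293 |
294 | - How do you extract the upper-triangular elements of `matrix(1:9, nrow = 3)`? And the diagonal elements?
295 | - For a data frame, what is the difference between `df["x"]`, `df[["x"]]`, and `df$x`?
296 | - If `x` is a matrix, what is the difference between `x[] <- 0` and `x <- 0`?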
297 |
298 | ```{r eval=FALSE, include=FALSE}
299 | m <- matrix(1:9, nrow = 3)
300 | m
301 | ```
302 |
303 |
304 | ```{r eval=FALSE, include=FALSE}
305 | diag(m)
306 | upper.tri(m, diag = FALSE)
307 | ```
308 |
309 |
310 | ```{r eval=FALSE, include=FALSE}
311 | m[upper.tri(m, diag = FALSE)]
312 | ```
313 |
314 |
315 |
--------------------------------------------------------------------------------
/03_subset/03_subset.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/03_subset/03_subset.pdf
--------------------------------------------------------------------------------
/03_subset/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/03_subset/images/R_box.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/03_subset/images/R_box.png
--------------------------------------------------------------------------------
/03_subset/images/data_struction1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/03_subset/images/data_struction1.png
--------------------------------------------------------------------------------
/04_Rmarkdown/04_Rmarkdown.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 4: Reproducible Reports"
3 | author: "王敏杰"
4 | institute: "四川师范大学"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "100%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
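34 | ## Why Write Reproducible Reports
35 |
36 | Communicate, understand, reproduce
37 |
38 | - We need to **present and share** the results of our data analysis with colleagues, managers, or teachers
39 | - To help the reader grasp our analytical approach and methods quickly, the best way is to bring the background, the process, the results, and the figures and tables together in a **report**
40 | - Readers should be able to reproduce and verify our results, which ensures that the conclusions are trustworthy
41 |
42 | This chapter therefore introduces how to generate analysis reports (reproducible reports) with R Markdown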
43 |
44 |
45 | ## What Is R Markdown
46 | ```{r out.width = '100%', echo = FALSE}
47 | knitr::include_graphics("images/rmarkdown.png")
48 | ```
49 |
50 |
51 |
52 | # Basic Markdown Syntax
53 |
54 | ## Basic Markdown Syntax
55 |
56 | - Headings
57 | ```{markdown, eval = FALSE, echo = TRUE}
58 | # Chapter 1 (note the space between "#" and "Chapter 1")
59 | ## Section 1 (likewise, a space between "##" and "Section 1")
60 | ### Subsection 1 (likewise, a space between "###" and "Subsection 1")
61 | ```
62 |
63 | - Body text
64 | ```{markdown, eval = FALSE, echo = TRUE}
65 | This is a sentence. ...this is body text...
66 | ```
67 |
68 |
69 | ## Basic Markdown Syntax
70 | - Lists
71 | ```{markdown, eval = FALSE, echo = TRUE}
72 | Now a list begins:
73 |
74 | - no importance
75 | - again
76 | - repeat
77 |
78 | A numbered list:
79 |
80 | 1. first
81 | 2. second
82 | ```
83 |
84 |
85 | ## Basic Markdown Syntax
86 |
87 | - Other markup
88 | ```{markdown, eval = FALSE, echo = TRUE}
89 | __bold__
90 | _italic_
91 | ~~strike through~~
92 | ```
93 |
94 |
95 |
96 | # Creating an R Markdown Document
97 |
98 | ## Creating an R Markdown Document
99 |
100 | ```{r, eval = FALSE}
101 | install.packages("rmarkdown")
102 | ```
103 |
104 | Create one in `RStudio`: `File -> New File -> R Markdown`.
105 |
106 |
107 | Basic components (marked by the green brackets in the figure)
108 |
109 | - metadata
110 | - text
111 | - code
112 |
113 |
114 | ## Creating an R Markdown Document
115 | ```{r out.width = '85%', echo = FALSE}
116 | knitr::include_graphics("images/rstudio-markdown.png")
117 | ```
118 |
119 | Click Knit (marked in red in the figure) and choose the output format you want.
120 |
121 |
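122 | ## Generating an HTML Document
123 |
124 | If you want the HTML document to have section numbers, a table of contents, or better-looking tables, modify the YAML header (replace the header of your R Markdown file with the following)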
125 |
126 | ```yaml
127 | ---
128 | title: Habits
129 | author: John Doe
130 | date: "`r Sys.Date()`"
131 | output:
132 | html_document:
133 | df_print: paged
134 | toc: yes
135 | number_sections: yes
136 | ---
137 | ```
138 |
139 |
140 |
141 |
142 |
143 |
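144 | ## Generating a PDF Document
145 |
146 | Elegant PDF documents
147 |
148 | - PDF documents can include attractive vector graphics and elegant mathematical formulas, so they are very popular with students.
149 | - Compilation often fails when the document contains Chinese text, however. The solution is to use `tinytex`; see this [video](https://www.bilibili.com/video/BV1Gf4y1R7md).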
150 |
151 |
152 | ```{r, eval = FALSE}
153 | install.packages("tinytex")
154 | tinytex::install_tinytex(dir = "D:\\Tinytex",
155 | force = T)
156 | ```
157 |
158 |
159 |
160 |
161 |
162 | # Using R Markdown
163 |
164 | ## Inserting Formulas
165 |
166 | Assuming you are already familiar with LaTeX syntax, if we type
167 | `$$\frac{\sum (\bar{x} - x_i)^2}{n-1}$$` in R Markdown, the actual output is:
168 |
169 | $$\frac{\sum (\bar{x} - x_i)^2}{n-1}$$
170 |
171 |
172 | ## Inserting Formulas
173 | You can also use LaTeX math environments, for example
174 |
175 | ```latex
176 | $$
177 | \Theta = \begin{pmatrix}\alpha & \beta\\
178 | \gamma & \delta
179 | \end{pmatrix}
180 | $$
181 | ```
182 | Output
183 |
184 | $$
185 | \Theta = \begin{pmatrix}\alpha & \beta\\
186 | \gamma & \delta
187 | \end{pmatrix}
188 | $$
189 |
190 |
191 |
192 | ## Inserting Images
193 |
194 | \scriptsize
195 | ````markdown
196 | `r ''````{r, out.width='35%', fig.align='center', fig.cap='this is caption'}
197 | knitr::include_graphics("images/R_logo.png")
198 | ```
199 | ````
200 |
201 |
202 | ```{r out.width = '35%', fig.align='center', fig.cap='this is caption', echo = F}
203 | knitr::include_graphics("images/R_logo.png")
204 | ```
205 |
206 |
207 |
208 | ## Running Code
209 |
210 | ```{r, echo = T}
211 | summary(cars)
212 | ```
213 |
214 |
215 | ## Tables
216 | ````md
217 | ```{r tables-mtcars}`r ''`
218 | knitr::kable(iris[1:5, ], caption = "A caption")
219 | ```
220 | ````
221 |
222 | \vskip -1cm
223 | ```{r tables-mtcars, echo = F}
224 | knitr::kable(iris[1:5, ], caption = "A caption")
225 | ```
226 |
227 | For more polished tables, see [kableExtra](https://haozhu233.github.io/kableExtra/)
228 |
229 |
230 |
231 | ## Generating Plots
232 | ````md
233 | ```{r}`r ''`
234 | plot(pressure)
235 | ```
236 | ````
237 |
238 |
239 | ```{r out.width = '85%', echo=FALSE}
240 | plot(pressure)
241 | ```
242 |
243 |
244 |
245 | ## Try Copying This Code into Your R Markdown Document
246 |
247 | \scriptsize
248 | ````md
249 | ```{r, out.width = '85%', fig.showtext = TRUE}`r ''`
250 | library(tidyverse)
251 | library(nycflights13)
252 | library(showtext)
253 | showtext_auto()
254 | flights %>%
255 | group_by(dest) %>%
256 | summarize(
257 | count = n(),
258 | dist = mean(distance, na.rm = TRUE),
259 | delay = mean(arr_delay, na.rm = TRUE)
260 | ) %>%
261 | dplyr::filter(delay > 0, count > 20, dest != "HNL") %>%
262 | ggplot(mapping = aes(x = dist, y = delay)) +
263 | geom_point(aes(size = count), alpha = 1 / 3) +
264 | geom_smooth(se = FALSE) +
265 | ggtitle("这是我的标题")
266 | ```
267 | ````
268 |
269 |
270 |
271 |
272 |
273 |
274 |
275 | ## Further Reading
276 |
277 | * Markdown tutorial https://www.markdowntutorial.com (takes about 10 minutes)
278 | * LaTeX tutorial https://www.latex-tutorial.com/quick-start/
279 | * Introduction to R Markdown https://bookdown.org/yihui/rmarkdown/
280 | * R Markdown Cookbook https://bookdown.org/yihui/rmarkdown-cookbook/
281 |
--------------------------------------------------------------------------------
/04_Rmarkdown/04_Rmarkdown.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/04_Rmarkdown.pdf
--------------------------------------------------------------------------------
/04_Rmarkdown/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/04_Rmarkdown/images/R_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/images/R_logo.png
--------------------------------------------------------------------------------
/04_Rmarkdown/images/rmarkdown.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/images/rmarkdown.png
--------------------------------------------------------------------------------
/04_Rmarkdown/images/rstudio-markdown.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/04_Rmarkdown/images/rstudio-markdown.png
--------------------------------------------------------------------------------
/05_dplyr/05_dplyr.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 5: Data Wrangling"
3 | author: "王敏杰"
4 | institute: "四川师范大学"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "100%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
34 | ## Getting Started with the tidyverse Family
35 | ```{r echo=FALSE, out.width = '85%'}
36 | knitr::include_graphics("images/tidyverse.png")
37 | ```
38 |
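39 | ## The tidyverse Family
40 |
41 | The main members of the tidyverse family include
42 |
43 |
44 | | Role | Package |
45 | |------|-------------|
46 | | The good looks of the family (graphics) | ggplot2 |
47 | | The king of data manipulation | dplyr |
48 | | The data reshaping expert | tidyr |
49 | | The data import tool | readr |
50 | | The loop accelerator | purrr |
51 | | The enhanced data frame | tibble |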
52 |
53 | # Reading Data
54 |
55 | ## Reading Data
56 |
57 | R provides many functions for reading data.
58 |
59 |
60 | File format | **R** function
61 | :--------------------------- | :----------------------
62 | .txt | read.table()
63 | .csv | read.csv() and readr::read_csv()
64 | .xls and .xlsx | readxl::read_excel() and openxlsx::read.xlsx()
65 | .sav | foreign::read.spss()
66 | .Rdata or rda | load()
67 | .rds | readRDS() and readr::read_rds()
68 | .dta | haven::read_dta() and haven::read_stata()
69 | Internet | download.file()
70 |
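
The read functions in the table all follow the same pattern: give a file path, get a data frame back. The sketch below is self-contained and uses base R only; it writes a small made-up data frame to a temporary CSV file and reads it back. The names `scores`, `path`, and `scores_back` are ours.

```r
# A small illustrative data frame (the values are made up)
scores <- data.frame(
  name  = c("Alice", "Bob"),
  score = c(80.2, 92.2)
)

# Write it to a temporary .csv file, then read it back
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)
scores_back <- read.csv(path)

str(scores_back)   # check column names and types after import
unlink(path)       # remove the temporary file
```

Checking the result with `str()` right after import is a cheap way to catch columns that were read with an unexpected type.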
71 |
72 |
73 | ## Example
74 |
75 | ```{r}
76 | library(readr)
77 | wages <- read_csv("./demo_data/wages.csv")
78 | head(wages, 6)
79 | ```
80 |
81 | ## Example
82 | ```{r}
83 | library(readxl)
84 | d <- read_excel("./demo_data/olympics.xlsx")
85 | tail(d, 6)
86 | ```
87 |
88 |
89 |
90 |
91 | # Data Wrangling
92 |
93 | ## The tidy Principles
94 |
95 | Hadley Wickham proposed the tidy principles for data science. As I understand them, the tidy philosophy comes down to:
96 |
97 | ```{r out.width = '85%', echo = FALSE}
98 | knitr::include_graphics("images/import_datatype01.png")
99 | ```
100 |
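101 | - Everything is a data frame, and any data can be tidied
102 | - Each column of a data frame represents a **variable**, and each row represents an **observation**
103 | - Functions take a data frame in and return a data frame (the first argument of the function is always a **data frame**)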
104 |
105 |
106 |
107 | ## The dplyr Package
108 | In this chapter we introduce dplyr, the workhorse package for data manipulation in the tidyverse. First, we load the package
109 | ```{r message = FALSE, warning = FALSE}
110 | library(dplyr)
111 | ```
112 |
113 | dplyr defines a consistent grammar for data manipulation, built mainly on the following functions.
114 |
115 | * `mutate() `, `select() `, `filter() `
116 | * `summarise() `, `group_by()`, `arrange() `
117 | * `left_join()`, `right_join()`, `full_join()`
118 |
119 | We will introduce them in turn
120 |
121 |
122 | ## Sample Data
123 |
124 | Suppose we have a data frame that lists the English and math subjects for three students
125 | \small
126 | ```{r}
127 | df <- data.frame(
128 | name = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol"),
129 | type = c("english", "math", "english", "math", "english", "math")
130 | )
131 | df
132 | ```
133 |
134 |
135 |
136 | ## `mutate()`: Adding a Column
137 | Here are their most recent exam scores, which we want to add to the data frame
138 | \footnotesize
139 | ```{r}
140 | score2020 <- c(80.2, 90.5, 92.2, 90.8, 82.5, 84.6)
141 | score2020
142 | ```
143 |
144 |
145 | \begincols[T]
146 | \begincol[T]{.48\textwidth}
147 | The traditional approach
148 | ```{r}
149 | df$score <- score2020
150 | df
151 | ```
152 |
153 | ```{r include=FALSE}
154 | df <- data.frame(
155 | name = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol"),
156 | type = c("english", "math", "english", "math", "english", "math")
157 | )
158 | ```
159 |
160 | \endcol
161 |
162 | \begincol[T]{.48\textwidth}
163 | With dplyr, it is written like this
164 |
165 | ```{r}
166 | #
167 | mutate(df, score = score2020)
168 | ```
169 | \endcol
170 | \endcols
171 |
172 |
173 |
174 |
175 |
176 | ## `mutate()`: Adding a Column
177 |
178 | The `mutate()` function
179 |
180 | ```{r, eval=FALSE}
181 | mutate(.data = df, score = score2020)
182 | ```
183 |
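184 | - The first argument is the data frame we want to process, here `df`.
185 | - The second argument is `score = score2020`. The `score` on the left of the equals sign is the name of the new column we are creating;
186 | the right-hand side is the **vector** holding the students' scores (note that the length of the vector must equal the number of rows of the data frame; here both are 6)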
187 |
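
The right-hand side does not have to be a separate vector; it can also be an expression built from columns that already exist. The following is a minimal sketch with a small made-up data frame (the names `scores_df`, `result`, `passed`, and `score_scaled`, and the threshold 85, are ours, chosen for illustration):

```r
library(dplyr)

# A small illustrative data frame (values taken from the English scores)
scores_df <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  score = c(80.2, 92.2, 82.5)
)

# New columns computed from the existing column `score`
result <- mutate(
  scores_df,
  passed       = score >= 85,   # logical column
  score_scaled = score / 100    # numeric column
)
result
```

Because the new columns are computed row by row from `score`, their length automatically matches the number of rows.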
188 |
189 |
190 |
191 | ## The Pipe `%>%`
192 |
193 | It is worth introducing the pipe operator [ `%>%` ](https://magrittr.tidyverse.org/) here.
194 |
195 | ```{r}
196 | c(1:10)
197 | ```
198 |
199 | ```{r}
200 | sum(c(1:10))
201 | ```
202 |
203 |
204 | This is equivalent to the following:
205 | ```{r}
206 | c(1:10) %>% sum()
207 | ```
208 |
209 |
210 |
211 | ## The Pipe `%>%`
212 |
213 | ```{r, eval=FALSE}
214 | c(1:10) %>% sum()
215 | ```
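216 | This statement means that the vector `c(1:10)` is passed through the pipe operator `%>%` into the first argument position of `sum()`, that is, `sum(c(1:10))`. The name "pipe" describes quite vividly what happens: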
217 |
218 | ```{r out.width = '50%', echo = FALSE}
219 | knitr::include_graphics("images/pipe1.png")
220 | ```
221 |
222 |
223 | ## The Pipe `%>%`
224 | When several functions are applied in sequence, the pipe is especially convenient and makes the code more readable.
225 |
226 | ```{r}
227 | sqrt(sum(abs(c(-10:10))))
228 | ```
229 |
230 |
231 | ```{r}
232 | c(-10:10) %>% abs() %>% sum() %>% sqrt()
233 | ```
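
Since version 4.1, R itself also ships a native pipe, `|>`, which needs no package. The sketch below assumes R 4.1 or later and repeats the example above with the native pipe; the names `nested` and `piped` are ours.

```r
# Nested calls, read from the inside out
nested <- sqrt(sum(abs(c(-10:10))))

# The same computation with the native pipe, read from left to right
piped <- c(-10:10) |> abs() |> sum() |> sqrt()

nested
piped
identical(nested, piped)
```

For simple chains like this one, `%>%` and `|>` can be used interchangeably; the slides continue with `%>%`.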
234 |
235 |
236 |
237 |
238 |
239 | ## The Pipe `%>%`
240 | So the statement above that adds the students' scores, `mutate(df, score = score2020)`, can be written with a pipe
241 |
242 | ```{r out.width = '75%', echo = FALSE}
243 | knitr::include_graphics("images/pipe2.png")
244 | ```
245 |
246 |
247 | ## The Pipe `%>%`
248 | ```{r}
249 | # equivalent to mutate(df, score = score2020)
250 | df %>% mutate(score = score2020)
251 | ```
252 | Neat, isn't it?
253 |
254 |
255 |
256 | ```{r, include=FALSE}
257 | df <- df %>% mutate(score = score2020)
258 | df
259 | ```
260 |
261 |
262 |
263 |
264 |
265 | ## `select()`: Selecting Columns
266 |
267 | `select()` selects a column of a data frame
268 |
269 | \bigskip
270 |
271 | \begincols
272 | \begincol{.48\textwidth}
273 | The traditional approach
274 | ```{r}
275 | df["name"]
276 | ```
277 |
278 | \endcol
279 |
280 | \begincol{.48\textwidth}
281 | The dplyr approach
282 | ```{r}
283 | df %>% select(name)
284 | ```
285 | \endcol
286 | \endcols
287 |
288 |
289 |
290 | ## `select()`: Selecting Columns
291 | To select several columns, just list them one after another
292 | ```{r}
293 | df %>% select(name, score)
294 | ```
295 |
296 |
297 | ## `select()`: Selecting Columns
298 |
299 | To drop a column, put `-` in front of the variable name:
300 | ```{r}
301 | df %>% select(-type)
302 | ```
303 |
304 |
305 |
306 |
307 |
308 |
309 | ## `filter()`: Filtering Rows
310 |
311 | We can also select and filter along the rows. For example, here we pick out the records with **scores of 90 or above**
312 |
313 | ```{r}
314 | df %>% filter(score >= 90)
315 | ```
316 |
317 |
318 | ## `filter()`: Filtering Rows
319 |
320 | We can also filter on several conditions at once, for example English scores of 90 or above
321 | ```{r}
322 | df %>% filter(type == "english", score >= 90)
323 | ```
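
Conditions separated by commas are combined with "and". For "or" conditions, or for matching against a set of values, `|` and `%in%` can be used inside `filter()`. A self-contained sketch that rebuilds the example data under the hypothetical name `scores_df` (the result names `either` and `some_math` are ours):

```r
library(dplyr)

scores_df <- data.frame(
  name  = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol"),
  type  = c("english", "math", "english", "math", "english", "math"),
  score = c(80.2, 90.5, 92.2, 90.8, 82.5, 84.6)
)

# "or": scores below 82 or above 92
either <- scores_df %>% filter(score < 82 | score > 92)
either

# Membership in a set of names, combined with a second condition
some_math <- scores_df %>%
  filter(name %in% c("Alice", "Carol"), type == "math")
some_math
```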
324 |
325 |
326 |
327 |
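328 | ## `summarise()`: Summarizing
329 |
330 | `summarise()` is mainly used to compute summary statistics and is often combined with other functions
331 |
332 | \medskip
333 | For example, compute the mean of all the exam scores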
334 | ```{r}
335 | df %>% summarise( mean_score = mean(score))
336 | ```
337 |
338 | For example, compute the standard deviation of all the exam scores
339 | ```{r}
340 | df %>% summarise( sd_score = sd(score))
341 | ```
342 |
343 |
344 |
345 |
346 | ## `summarise()`: Summarizing
347 | Several statistics can also be computed at once
348 | ```{r}
349 | df %>% summarise(
350 | mean_score = mean(score),
351 | median_score = median(score),
352 | n = n(),
353 | sum = sum(score)
354 | )
355 | ```
356 |
357 |
358 |
359 |
360 |
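361 | ## `group_by()`: Grouping
362 |
363 | Group first, then summarize. For example, to compute each student's average score, we first group by student `name` and then take the mean within each group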
364 |
365 | \small
366 | ```{r}
367 | df %>%
368 | group_by(name) %>%
369 | summarise(
370 | mean_score = mean(score),
371 | sd_score = sd(score)
372 | )
373 | ```
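
The grouping variable can be any column. As a sketch, the same pattern grouped by subject instead of by student gives one row per subject; the data are rebuilt under the hypothetical name `scores_df` so that the block runs on its own, and `by_type` is our name for the result.

```r
library(dplyr)

scores_df <- data.frame(
  name  = c("Alice", "Alice", "Bob", "Bob", "Carol", "Carol"),
  type  = c("english", "math", "english", "math", "english", "math"),
  score = c(80.2, 90.5, 92.2, 90.8, 82.5, 84.6)
)

by_type <- scores_df %>%
  group_by(type) %>%
  summarise(
    n          = n(),          # number of rows in each group
    mean_score = mean(score),  # average score per subject
    max_score  = max(score)    # best score per subject
  )
by_type
```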
374 |
375 |
376 |
377 |
378 |
379 | ## `arrange()`: Sorting
380 | We sort by exam score from low to high and print the result
381 | ```{r}
382 | df %>% arrange(score)
383 | ```
384 |
385 |
386 |
387 | ## `arrange()`: sorting
388 |
389 | What about descending order, from highest to lowest? There are two ways:
390 |
391 | \small
392 | \begincols
393 | \begincol{.48\textwidth}
394 | ```{r}
395 | df %>% arrange(-score)
396 | ```
397 | \endcol
398 |
399 | \begincol{.48\textwidth}
400 |
401 | ```{r}
402 | df %>% arrange(desc(score))
403 | ```
404 | \endcol
405 | \endcols
406 |
407 | Which one is more readable?
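
Beyond readability, the two forms differ in what they accept: unary minus is only defined for numbers, while `desc()` also works on character columns. A minimal sketch on a made-up tibble (`scores` is an assumption for illustration, not the `df` from these slides):

```r
library(dplyr)

# Hypothetical data, made up for this illustration only.
scores <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  score = c(95, 85, 70)
)

# desc() can sort a character column in reverse alphabetical order.
by_name_desc <- scores %>% arrange(desc(name))

# Unary minus on a character column raises an error, caught here.
minus_result <- tryCatch(
  scores %>% arrange(-name),
  error = function(e) "error"
)
```
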
408 |
409 |
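410 | ## `arrange()`: sorting
411 | We can also sort by several variables in turn.
412 | \medskip
413 | For example, sort by subject first, then by score from highest to lowest within each subject: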
414 | ```{r}
415 | df %>%
416 | arrange(type, desc(score))
417 | ```
418 |
419 |
420 |
421 |
422 |
423 | ## `left_join`: joining
424 | Suppose we have already computed each student's mean score and stored it in the data frame `df1`.
425 |
426 | ```{r}
427 | df1 <- df %>%
428 | group_by(name) %>%
429 | summarise( mean_score = mean(score) )
430 | df1
431 | ```
432 |
433 |
434 | ## `left_join`: joining
435 | We also have a second data frame, `df2`, which holds the students' ages.
436 | ```{r}
437 | df2 <- tibble(
438 | name = c("Alice", "Bob"),
439 | age = c(12, 13)
440 | )
441 | df2
442 | ```
443 |
444 |
445 |
446 | ## `left_join`: left join
447 |
448 | We join the two data frames `df1` and `df2` on the student's name, `name`:
449 |
450 | ```{r}
451 | left_join(df1, df2, by = "name")
452 | ```
453 |
454 | Notice that Carol's age in the last row is `NA`. Why do you think that is?
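
One way to investigate is `anti_join()`, which returns the rows of the left table that have no match in the right table. The sketch below uses stand-in tibbles `df1_demo` and `df2_demo`; the mean scores in them are made-up values, not the results computed in these slides.

```r
library(dplyr)

# Stand-ins for df1 and df2; the mean scores are invented for illustration.
df1_demo <- tibble(
  name       = c("Alice", "Bob", "Carol"),
  mean_score = c(87.5, 88.5, 70)
)
df2_demo <- tibble(
  name = c("Alice", "Bob"),
  age  = c(12, 13)
)

# A left join keeps every row of the left table; missing matches become NA.
joined <- left_join(df1_demo, df2_demo, by = "name")

# Rows of the left table whose `name` does not appear in the right table.
unmatched <- anti_join(df1_demo, df2_demo, by = "name")
```

Here `unmatched` contains only Carol's row, because `df2_demo` has no entry for her.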
455 |
456 |
457 |
458 |
459 |
460 | ## `left_join`: left join
461 |
462 | Of course, the same join can also be written with the pipe:
463 | ```{r}
464 | df1 %>% left_join(df2, by = "name")
465 | ```
466 |
467 |
468 |
469 |
470 |
471 |
472 | ## `right_join`: right join
473 | Now let's try a right join with `right_join()`.
474 |
475 | ```{r, message=FALSE}
476 | df1 %>% right_join(df2, by = "name")
477 | ```
478 | Carol's row has disappeared. Why do you think that is?
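
To compare how the join types treat unmatched rows, the sketch below counts the rows each one returns. It also uses `inner_join()` and `full_join()`, which these slides do not cover, and the tibbles `df1_demo` and `df2_demo` are stand-ins with made-up mean scores.

```r
library(dplyr)

# Stand-ins for df1 and df2; the mean scores are invented for illustration.
df1_demo <- tibble(
  name       = c("Alice", "Bob", "Carol"),
  mean_score = c(87.5, 88.5, 70)
)
df2_demo <- tibble(
  name = c("Alice", "Bob"),
  age  = c(12, 13)
)

# Number of rows returned by each kind of join.
n_rows <- c(
  left  = nrow(left_join(df1_demo, df2_demo, by = "name")),   # all rows of df1_demo
  right = nrow(right_join(df1_demo, df2_demo, by = "name")),  # all rows of df2_demo
  inner = nrow(inner_join(df1_demo, df2_demo, by = "name")),  # names found in both
  full  = nrow(full_join(df1_demo, df2_demo, by = "name"))    # names found in either
)
```
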
479 |
480 |
481 |
482 |
483 |
484 |
485 |
486 |
487 | ## Further reading
488 |
489 | - Recommended: the dplyr website, [https://dplyr.tidyverse.org/](https://dplyr.tidyverse.org/).
490 | - [cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)
491 | - Run and work through [nycflights.Rmd](https://github.com/perlatex/R_for_Data_Science/blob/master/data/nycflights.Rmd)
492 |
493 |
--------------------------------------------------------------------------------
/05_dplyr/05_dplyr.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/05_dplyr.pdf
--------------------------------------------------------------------------------
/05_dplyr/demo_data/olympics.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/demo_data/olympics.xlsx
--------------------------------------------------------------------------------
/05_dplyr/demo_data/wages.csv:
--------------------------------------------------------------------------------
1 | "earn","height","sex","race","ed","age"
2 | 79571.299011024,73.89,"male","white",16,49
3 | 96396.9886433106,66.23,"female","white",16,62
4 | 48710.666947391,63.77,"female","white",16,33
5 | 80478.0961525837,63.22,"female","other",16,95
6 | 82089.3454983326,63.08,"female","white",17,43
7 | 15313.3529014342,64.53,"female","white",15,30
8 | 47104.1718212293,61.54,"female","white",12,53
9 | 50960.0542820731,73.29,"male","white",17,50
10 | 3212.6495560539,72.24,"male","hispanic",15,25
11 | 42996.6378844038,72.4,"male","white",12,30
12 | 10328.6188426045,70.22,"male","white",16,69
13 | 1002.30715511839,63.15,"female","white",14,54
14 | 47597.8198637099,68.11,"male","white",11,38
15 | 19019.5422985066,68.08,"male","white",12,31
16 | 20063.9966387225,64.86,"female","white",12,55
17 | 992.832346303227,60.06,"female","white",12,31
18 | 35972.1711232638,66.01,"female","white",16,39
19 | 26930.5439643451,68.07,"male","white",12,62
20 | 64602.0639724231,68.16,"female","white",14,33
21 | 69993.6930698179,70.02,"male","white",13,48
22 | 1000.21830625211,67.08,"female","white",9,25
23 | 12131.8221152514,64.2,"female","black",12,59
24 | 84223.3979186057,72.76,"male","black",13,39
25 | 8949.47493547637,62.17,"female","white",13,55
26 | 23278.3232780607,63.04,"female","white",14,25
27 | 8750.13918500201,66.67,"male","white",14,26
28 | 64593.5998054014,65.95,"female","white",12,45
29 | 54079.8149454029,71.69,"male","white",12,49
30 | 16896.1000435301,62.76,"female","black",12,39
31 | -95.7102792312691,68.46,"male","white",11,25
32 | 43938.8576046547,64.07,"female","white",16,64
33 | 1004.09561369181,60,"female","white",12,48
34 | 79478.0468181047,71.97,"male","white",16,42
35 | 984.761813301634,68.25,"female","white",12,59
36 | 65237.7700476547,76.68,"male","white",16,37
37 | 978.872090839732,63.77,"female","white",10,23
38 | 24845.4285917486,63.96,"female","white",14,29
39 | 40754.1029736281,63.72,"female","white",12,37
40 | 119251.055119839,71.92,"male","white",17,43
41 | 42933.0122551917,68.12,"male","white",17,35
42 | 20088.3114644238,64.48,"female","white",12,30
43 | 12924.2239977829,60.94,"female","white",14,82
44 | 48693.4797964477,67.19,"female","black",14,35
45 | 34383.101849067,65.05,"female","white",12,61
46 | 43928.4198331574,64.17,"female","white",14,30
47 | 5782.248948451,66.49,"female","white",15,69
48 | 40758.5577226173,65.58,"female","white",12,34
49 | 39162.8754520925,64.93,"female","white",12,45
50 | 51896.846670074,67.02,"female","white",18,33
51 | 16905.1408589265,67.49,"female","white",17,34
52 | 1000.99111084442,70.03,"female","white",12,61
53 | 18497.8538930071,60.61,"female","hispanic",12,25
54 | 30740.47803689,64.92,"female","white",13,36
55 | 32803.4426895406,62,"female","white",12,33
56 | 5529.47149841489,72.16,"male","white",10,22
57 | 20594.711211041,66.35,"male","black",8,60
58 | 39816.5002027645,70.09,"male","white",12,69
59 | 33327.5987832144,70.05,"male","white",17,45
60 | 55067.7931193718,68.49,"female","white",17,53
61 | 10545.4502477035,60.13,"female","white",12,69
62 | 26979.2976737281,71.15,"male","white",12,32
63 | 55634.1696790621,70.77,"male","white",12,36
64 | 6380.09148122974,72.87,"male","white",13,22
65 | 23264.3741899011,68,"female","white",14,59
66 | 16918.2189161501,65.64,"female","white",12,61
67 | 39676.2644997822,68.74,"male","white",16,33
68 | 994.133503863912,63.02,"female","white",8,67
69 | 26420.8322535518,62.8,"female","white",14,31
70 | 986.449291094241,65.99,"female","black",13,34
71 | 986.993627550926,64.66,"female","white",12,30
72 | 25550.7559075621,62.04,"male","hispanic",14,32
73 | 27235.2680173859,64.24,"female","white",14,47
74 | 7354.05363222349,63.82,"female","white",9,72
75 | 7101.32415873714,66.95,"female","white",9,56
76 | 35041.7055007873,71.28,"male","white",12,43
77 | 1318.87555348029,59.63,"female","white",16,57
78 | 42343.3439688464,66.34,"female","white",16,31
79 | 4985.24601643037,66,"female","white",15,25
80 | 28022.6386657665,65.49,"female","white",14,43
81 | 13729.6720245917,69.96,"female","white",13,26
82 | 991.599966272541,64.79,"female","white",13,67
83 | 20074.7729832596,66.35,"female","white",13,72
84 | 16901.6558792008,62.09,"female","white",12,51
85 | 16899.2184171763,65.83,"female","white",15,71
86 | 999.338097135203,63.78,"female","white",14,42
87 | 23831.5316537746,71.81,"male","white",12,43
88 | 4812.71528146838,67.12,"female","hispanic",8,43
89 | 47742.1252747604,68.19,"male","hispanic",12,36
90 | 47655.590724269,69.76,"male","white",12,37
91 | 16892.6857012084,61.91,"female","white",12,42
92 | 8934.46253020389,63.29,"female","white",13,30
93 | 20099.9218436585,65.48,"female","white",13,67
94 | 32803.3025937337,69.75,"female","white",10,65
95 | 32811.8623722485,68.09,"female","white",12,40
96 | 31765.8967813837,70.21,"male","white",14,27
97 | 1001.00043870299,65.99,"female","white",12,63
98 | 2907.2064170387,66.23,"female","white",12,24
99 | 2115.67589682381,67.79,"female","white",16,36
100 | 31884.9092385098,69.94,"male","white",16,31
101 | 16905.0220799872,60.75,"female","hispanic",12,26
102 | 48686.3352313777,62.9,"female","white",12,77
103 | 64606.3235215003,63.47,"female","white",14,60
104 | 39751.1940297085,67.14,"male","white",12,93
105 | 16881.7141614142,60.66,"female","white",17,83
106 | 96382.8900908101,69.55,"female","white",18,67
107 | 986.228613755399,63.46,"female","white",12,25
108 | 29618.8251244886,67.83,"female","white",12,70
109 | 26500.2429561617,64.39,"female","white",12,37
110 | 24854.9727381481,67.01,"female","black",14,34
111 | 15885.5195104926,68.79,"male","white",17,27
112 | 53463.6223523582,67.05,"female","white",13,47
113 | 29627.8222039787,67.83,"female","white",12,34
114 | 24843.1748606049,63.03,"female","white",12,41
115 | 33438.2100886863,70.97,"male","white",12,26
116 | 34374.0790977282,63.91,"female","black",17,47
117 | 58834.1668506326,68.22,"male","white",11,41
118 | 61429.1854058301,68.43,"female","white",17,48
119 | 28014.7345560315,65.53,"female","hispanic",14,47
120 | 981.65670365787,66.13,"female","white",12,33
121 | 996.182111287872,66.31,"female","other",16,36
122 | 50924.6583604247,77.12,"male","white",16,34
123 | 44729.6150466756,67.17,"female","white",12,62
124 | 27223.3656953101,62.11,"female","white",12,48
125 | 1007.70988015799,64.5,"female","white",17,34
126 | 996.828601796463,71.96,"female","white",12,55
127 | 40750.0226434449,67.42,"female","white",18,39
128 | 43929.4541868987,61.17,"female","white",18,47
129 | 8949.67101572717,63.6,"female","white",12,32
130 | 112309.660734072,61.91,"female","white",16,42
131 | 8939.55497630854,62.83,"female","white",12,44
132 | 7932.57103394049,67.83,"male","white",16,28
133 | 32800.8023474806,63.71,"female","white",15,30
134 | 6359.00598305609,71.36,"male","white",15,25
135 | 96383.0764233227,64,"female","white",16,39
136 | 8967.87647229583,60.96,"female","white",13,35
137 | 47691.8764601662,74.21,"male","white",12,42
138 | 111283.259406706,70.25,"male","white",14,39
139 | 79481.9408160514,71.08,"male","white",16,45
140 | 70960.3420883276,62.85,"female","white",12,43
141 | 48722.16771983,63.61,"female","white",14,47
142 | 997.891112569392,62.03,"female","white",16,36
143 | 16912.1555694825,62.08,"female","white",16,44
144 | 37573.2990727913,70.15,"female","white",17,46
145 | 1005.10722008903,62.1,"female","white",12,43
146 | 71548.0876578581,70.76,"male","white",17,66
147 | 23878.8111841365,71.04,"male","white",14,35
148 | 7344.95757908863,69.86,"female","white",14,75
149 | 26877.8701780781,67.02,"male","white",14,35
150 | 48708.525626797,64.16,"female","white",12,36
151 | 44711.1395351231,67.09,"female","white",12,34
152 | 10044.7225146099,62.1,"female","white",8,73
153 | 29611.1767650406,61.76,"female","hispanic",13,60
154 | 68256.8967140838,67.95,"male","black",13,48
155 | 51888.9713088147,62.5,"female","black",14,48
156 | 995.46575719133,66.19,"female","black",5,55
157 | 16887.1457287969,68.16,"female","black",18,60
158 | 95449.6102254534,74.23,"male","white",13,49
159 | 34383.7045204962,68.07,"female","other",12,54
160 | 4807.83133117883,61.74,"female","white",16,26
161 | 2587.63185305423,65.9,"female","white",15,32
162 | 42922.5679351389,68.18,"male","white",12,31
163 | 11485.9945183583,60.18,"female","hispanic",14,32
164 | 25447.465302014,67.55,"male","white",8,47
165 | 143095.852138569,72.09,"male","white",12,30
166 | 13720.5053672839,66.27,"female","white",12,46
167 | 31829.2883253981,67.86,"male","white",10,36
168 | 24858.2269938452,67.47,"female","white",12,22
169 | 20078.9661907246,69.1,"female","white",12,64
170 | 39165.2382046526,64.15,"female","white",16,50
171 | 32794.5406477114,65.02,"female","white",14,43
172 | 31207.4500897519,61.19,"female","white",12,50
173 | 15970.26222705,71.89,"male","white",12,53
174 | 64610.3579806432,66.19,"female","white",16,38
175 | 991.628909680612,61.91,"female","white",10,61
176 | 995.090910576791,59.91,"female","hispanic",10,52
177 | 39783.9469489965,67.85,"male","white",14,68
178 | 39684.6241268197,70.08,"male","white",12,28
179 | 39724.6811008388,71.13,"male","white",14,36
180 | 30201.5172441044,71.51,"male","white",16,65
181 | 70051.7508949032,68.28,"male","white",16,52
182 | 1007.95714596412,65.81,"female","white",12,23
183 | 1012.93115242453,63.07,"female","white",18,59
184 | 24854.2035992486,60.32,"female","white",14,53
185 | 28036.2730849519,63.06,"female","white",12,40
186 | 38198.250803332,67.61,"male","white",12,60
187 | 37585.5211136113,63.65,"female","white",12,41
188 | 20628.2955187112,70.44,"male","white",12,78
189 | 103375.877816366,67.81,"male","white",16,50
190 | 12129.4641155017,60.95,"female","white",12,67
191 | 63635.6570881354,67.78,"male","white",18,67
192 | 24843.7543318963,66.4,"female","white",17,47
193 | 31795.5788124066,69.7,"male","white",16,29
194 | 31852.2062255474,70.62,"male","white",12,54
195 | 32804.6545498774,64.76,"female","white",12,46
196 | 40747.1407845328,66.06,"female","white",12,53
197 | 77889.5763942784,69.9,"male","white",13,41
198 | 1010.19089949095,63.84,"female","white",12,40
199 | 1009.69186942175,63,"female","white",16,39
200 | 40750.8830526368,61.39,"female","white",12,69
201 | 18480.6824537285,66.26,"female","white",14,61
202 | 26423.1293431361,60.8,"female","white",15,42
203 | 55572.0362132959,72,"male","white",14,47
204 | 198835.433852258,74.22,"male","white",18,49
205 | 37566.5366877765,66.83,"female","white",12,51
206 | 28022.2375146568,66.01,"female","white",16,32
207 | 1016.49101336236,62.09,"female","white",13,35
208 | 43933.83836836,62.21,"female","white",12,45
209 | 111301.925090235,72.23,"male","white",14,36
210 | 56639.2874902506,65.17,"female","white",15,51
211 | 16899.2307482988,60.07,"female","white",12,66
212 | 56646.5578023636,61.91,"female","white",15,43
213 | 24858.2973709503,66.01,"female","white",12,42
214 | 20065.3040015574,64.06,"female","white",12,43
215 | 12641.1483677478,71.01,"male","white",12,72
216 | 13742.373319619,65.32,"female","white",12,54
217 | 55575.0908182266,73.15,"male","white",18,54
218 | 71500.598919884,68.92,"male","white",11,55
219 | 997.072456516123,64.04,"female","white",12,62
220 | 24845.7392111286,64.09,"female","white",14,26
221 | 24831.1444642002,61.37,"female","white",12,81
222 | 39143.607661738,67.06,"female","white",18,55
223 | 40770.5860160793,61.68,"female","white",13,36
224 | 40764.7785489312,62.82,"female","black",14,58
225 | 31727.703584918,66.64,"male","other",10,32
226 | 39154.2544457864,67.86,"female","white",12,28
227 | 69902.4752117052,72.3,"male","white",16,50
228 | 109708.748822754,69.94,"male","white",12,42
229 | 98535.1068275004,69.96,"male","white",18,48
230 | 50919.6738042205,67.17,"male","black",14,36
231 | 32810.0439602876,62.83,"female","black",12,37
232 | 51892.9423880023,65.68,"female","other",18,39
233 | 40756.069498211,65.9,"female","white",17,54
234 | 270275.894207949,71.07,"male","white",18,49
235 | 1001.22511774436,66.07,"female","black",11,33
236 | 55728.5899158062,66.9,"male","white",16,37
237 | 63516.2793540724,76.48,"male","black",16,42
238 | 52438.922964203,68.71,"male","white",14,34
239 | 29615.6153757967,65.41,"female","white",12,26
240 | 48695.6588978653,61.92,"female","white",13,47
241 | 42341.7892935171,61.97,"female","white",12,35
242 | 8962.40245588917,61.72,"female","white",12,25
243 | 32800.6355705226,62.29,"female","white",16,77
244 | 28011.6417041634,60.95,"female","white",12,34
245 | 51878.8132937651,65.16,"female","white",14,32
246 | 23868.1473946433,72,"male","white",14,26
247 | 981.194900092476,65.92,"female","white",17,45
248 | 79510.340661361,72.13,"male","white",14,59
249 | 13722.141363414,63.85,"female","white",12,43
250 | 63693.4110882131,71.29,"male","white",14,55
251 | 63610.4749987873,72.43,"male","white",15,45
252 | 52085.5294649461,65.85,"male","white",12,38
253 | 32803.4792304162,65,"female","white",14,73
254 | 58230.5879236984,64.9,"female","black",15,43
255 | 10542.8902094512,61.48,"female","white",12,68
256 | 20095.5458305818,65.97,"female","black",14,38
257 | 95430.6104210465,66.78,"male","white",14,67
258 | 64613.0668228329,64.3,"female","white",13,48
259 | 69383.4258503022,63.28,"female","white",12,42
260 | 72549.3620022462,67.36,"female","white",16,38
261 | 10522.9175006923,67.83,"female","white",18,43
262 | 13718.4543920556,64.51,"female","other",10,86
263 | 32796.9508717817,63.87,"female","black",13,43
264 | 27064.549819231,66.1,"male","black",12,27
265 | 4182.39423494047,65.48,"female","white",12,36
266 | 103364.476394143,64.43,"male","white",12,32
267 | 79469.6826737892,72.13,"male","white",12,45
268 | 998.682558355125,67.92,"female","white",11,58
269 | 998.684495526987,62.69,"female","white",11,33
270 | 18477.7803339031,67.83,"female","white",12,42
271 | 55628.230922933,74.41,"male","white",18,34
272 | 42838.8607714631,63.1,"male","white",12,37
273 | 6565.33123484257,63.75,"female","white",12,65
274 | 66836.7602543026,68.19,"male","white",12,41
275 | 32800.535842363,66.94,"female","white",18,33
276 | 24847.4675748411,66.65,"female","white",12,33
277 | 15859.5314446913,72.86,"male","white",12,25
278 | 1003.64167432293,62.21,"female","white",12,52
279 | 9490.29690448586,69.22,"male","white",8,82
280 | 28032.1868635186,64.95,"female","hispanic",16,27
281 | 55589.0943127064,68.9,"male","hispanic",16,69
282 | 44516.112502923,77.02,"male","white",14,32
283 | 24848.5150235251,68.07,"female","white",12,37
284 | 32819.8518440008,62.2,"female","white",17,28
285 | 32807.4666688853,64.07,"female","white",12,33
286 | 16671.5011254779,66.55,"male","hispanic",12,46
287 | 20550.1150385843,70.19,"male","white",16,26
288 | 16894.1454216893,62.2,"female","white",12,56
289 | 5768.92665868558,62.71,"female","white",12,41
290 | 39157.5819181736,61.99,"female","white",14,33
291 | 998.503387403771,62.81,"female","white",17,45
292 | 28024.6010138162,68.84,"female","white",13,39
293 | 18494.0961370381,64.28,"female","white",12,52
294 | 1028.7967089839,64.52,"female","white",14,41
295 | 51883.97214454,68.63,"female","white",14,44
296 | 28035.1640643814,59.06,"female","white",15,30
297 | 5764.82640107653,66.36,"female","other",12,39
298 | 1000.43844737019,64.97,"female","white",12,45
299 | 4330.56327434975,62,"female","black",12,25
300 | 5042.87288471185,75.18,"male","black",11,38
301 | 28044.017605739,62.99,"female","white",17,30
302 | 48696.6894978227,65.03,"female","white",13,32
303 | 23793.5962873794,70.02,"male","white",12,36
304 | 39169.7206103304,68.32,"female","white",16,46
305 | 16889.5158551673,65.01,"female","white",14,71
306 | 79469.6524162958,69.22,"male","white",8,57
307 | 79521.0309906084,71.7,"male","white",16,57
308 | 32811.1171608666,63.65,"female","white",17,53
309 | 48692.7045921071,62.84,"female","hispanic",18,58
310 | 1000.81650997941,63.41,"female","white",12,67
311 | 1019.52175123961,65.4,"female","white",10,52
312 | 986.055083445869,63.07,"female","white",5,62
313 | 35986.0131003784,73.2,"female","white",14,49
314 | 42903.3817816247,74.77,"male","white",12,28
315 | 7363.89520150288,65.14,"female","white",12,42
316 | 28828.8892012243,66.55,"female","white",17,30
317 | 27220.0746330027,65.93,"female","white",14,32
318 | 45524.391056089,63.91,"female","white",16,43
319 | 83682.9635299098,66.1,"female","white",18,54
320 | 24854.2862407647,63.68,"female","white",16,33
321 | 31210.0450199533,62.71,"female","white",13,36
322 | 998.792806987374,62.75,"female","white",13,48
323 | 42834.4504315573,72.72,"male","white",12,27
324 | 23793.0568258544,72,"male","white",8,62
325 | 1004.68819257564,66.06,"female","white",12,28
326 | 23873.5673156157,71.94,"male","white",14,37
327 | 24035.958990178,63.9,"female","white",12,59
328 | 39149.8127118589,64.24,"female","white",13,34
329 | 28581.7197594866,68.16,"male","white",15,40
330 | 7351.36224357467,61.96,"female","white",12,72
331 | 7350.34389591184,68.48,"female","white",13,61
332 | 2098.28255691469,59.46,"female","white",12,40
333 | 39169.7501351968,64.79,"female","white",12,95
334 | 42926.3869579729,76.09,"male","white",12,43
335 | 20091.4931673901,71.41,"female","white",12,39
336 | 35985.2005008923,62.08,"female","white",18,51
337 | 48698.2042696945,62.21,"female","white",13,40
338 | 55616.9724786137,71.55,"male","white",12,65
339 | 31836.5309592245,73.79,"male","white",12,46
340 | 51897.557115687,65.16,"female","white",16,36
341 | 10565.080348061,62.79,"female","other",12,35
342 | 20084.3631443331,66.52,"female","white",12,43
343 | 989.32483001483,65.05,"female","white",16,48
344 | 16893.650618774,62.93,"female","white",12,35
345 | 2583.37635997833,65.01,"female","white",12,22
346 | 20057.9469932086,63.83,"female","white",15,75
347 | 26434.5356297357,65.19,"female","white",12,40
348 | 990.632829899077,67.9,"female","white",16,45
349 | 40730.7869181018,62.87,"female","white",16,37
350 | 40751.8595966049,69.96,"female","white",12,31
351 | 50933.4879483364,68.62,"male","white",12,34
352 | 55621.9374127112,66.75,"male","white",14,47
353 | 10538.8367190333,65.05,"female","white",12,32
354 | 127210.627675111,69.39,"male","white",14,48
355 | 32799.5880325174,67.7,"female","white",14,29
356 | 2598.33597751564,66.37,"female","white",17,42
357 | 44502.6875350514,71.38,"male","hispanic",13,42
358 | 41290.7032034576,63.87,"male","hispanic",16,27
359 | 42946.1384264164,74.16,"male","white",12,28
360 | 29617.1242132711,61.92,"female","white",13,45
361 | 4203.92493626903,64.84,"female","white",12,63
362 | 31774.6724028254,69.09,"male","white",12,70
363 | 10547.6593411727,66.08,"female","white",16,52
364 | 998.222937309824,60.16,"female","white",10,72
365 | 39793.2360685558,66.6,"male","white",11,67
366 | 19083.0662723967,72.94,"male","white",12,33
367 | 982.603707126398,65.13,"female","white",18,30
368 | 19019.4383518021,76.02,"male","white",12,61
369 | 47621.5506990729,70.14,"male","white",14,80
370 | 1002.02789557595,64.18,"female","white",12,34
371 | 7352.90609561772,63,"female","black",15,26
372 | 42344.141216902,62.96,"female","black",13,41
373 | 36597.1185121691,70.04,"male","black",12,39
374 | 33372.9363140627,71.27,"male","white",12,37
375 | 40755.6601213819,65.19,"female","white",12,66
376 | 28578.7772496041,69.48,"male","white",16,79
377 | 7372.74244873303,63.48,"female","white",13,48
378 | 39006.1541350665,64.11,"female","white",12,26
379 | 56653.7613128414,63.44,"female","white",18,44
380 | 41312.4117058859,67.15,"male","white",14,34
381 | 39767.884119698,65.82,"male","white",16,40
382 | 38374.9264949555,68.18,"female","white",13,31
383 | 20079.9007989922,64.7,"female","white",12,26
384 | 24856.3852622375,66.85,"female","white",12,24
385 | 4771.49798475468,72.62,"male","white",17,27
386 | 10538.599791571,63.68,"female","white",12,71
387 | 22284.6898628754,70.77,"male","white",12,73
388 | 24851.1945376469,63.17,"female","white",16,35
389 | 278213.531775569,71.05,"male","white",16,52
390 | 1012.72857450333,62.7,"female","white",12,55
391 | 1605.96493256327,72.85,"male","white",18,29
392 | 16899.9838545566,64.92,"female","white",14,67
393 | 71575.5885980481,65.94,"male","white",13,86
394 | 24854.2581843483,66.21,"female","white",12,85
395 | 32804.552369449,59,"female","white",13,45
396 | 55626.7507266474,69.9,"male","white",16,34
397 | 63619.6597966112,71.96,"male","white",16,32
398 | 56660.0517213774,64.27,"female","white",12,76
399 | 80491.7059230187,65.48,"female","white",16,58
400 | 159001.929910422,69.42,"male","white",18,61
401 | 55744.0702635899,69.09,"male","white",13,32
402 | 1019.39370328577,65.85,"female","white",12,55
403 | 39166.0892107678,68.19,"female","white",12,39
404 | 994.954167707421,64.15,"female","white",12,34
405 | 1016.64167481653,63.06,"female","white",12,64
406 | 55597.0775538434,73.86,"male","white",17,47
407 | 60429.4216733453,72.55,"male","white",16,78
408 | 47666.4694402061,67.35,"male","white",12,56
409 | 8958.40683258562,64.3,"female","hispanic",12,43
410 | 235388.719222562,66.72,"male","white",18,42
411 | 47782.4465201032,69.45,"male","white",12,36
412 | 11338.9470357351,63.76,"female","white",15,35
413 | 5765.71378324095,61.71,"female","white",16,41
414 | 986.455165422203,65.94,"female","white",8,57
415 | 36713.0878336217,69.45,"male","white",18,73
416 | 63544.160094039,67.69,"male","white",17,45
417 | 47713.2746225333,71.77,"male","white",14,43
418 | 998.346114761212,63.09,"female","white",14,33
419 | 23257.5590577062,63.09,"female","white",12,41
420 | 24843.7923614691,68.86,"female","white",12,38
421 | 42347.4010321721,63.75,"female","white",18,40
422 | 13720.5614863724,71.45,"female","white",12,29
423 | 39159.796830908,63.98,"female","black",16,31
424 | -12.0167137832365,72.08,"male","white",8,24
425 | 1013.27147701205,62.27,"female","white",12,29
426 | 988.758787012582,65.93,"female","white",10,42
427 | 8181.73028203835,65.8,"male","white",8,71
428 | 16889.8103273437,60.72,"female","white",15,29
429 | 1016.50273906467,69.1,"female","white",13,23
430 | 20100.2642571535,68.3,"female","white",16,36
431 | 999.845388435426,65.9,"female","white",12,27
432 | 79521.0347468327,67.87,"male","white",17,38
433 | 37551.8483983486,63.61,"female","white",13,30
434 | 1016.37310656959,61,"female","white",16,44
435 | 63578.5962735478,65.98,"male","black",14,38
436 | 10543.6518922358,61.86,"female","black",13,30
437 | 8955.40314951583,64.86,"female","black",16,34
438 | 174917.848536018,65.81,"male","white",18,41
439 | 65089.763788033,72.05,"male","hispanic",13,36
440 | 37562.8527389814,62.95,"female","white",13,82
441 | 1000.54221572545,63.1,"female","white",12,36
442 | 33354.9444670964,69.91,"male","white",12,27
443 | 7358.5069793707,62.02,"female","white",12,33
444 | 39763.7660650055,71.11,"male","white",12,40
445 | 47665.4186748928,69.84,"male","white",18,50
446 | 23255.7192646579,64.04,"female","white",11,55
447 | 10546.2705351933,63.34,"female","white",12,39
448 | 23266.3819801947,65.03,"female","white",12,50
449 | 68461.3756361159,70.81,"male","white",12,31
450 | 40750.2315198708,61.87,"female","white",15,37
451 | 63533.925247467,73.05,"male","white",12,70
452 | 103313.622213574,67.86,"male","white",17,44
453 | 25439.9094368504,76.18,"male","white",15,36
454 | 13715.9034686946,65.99,"female","white",12,75
455 | 32795.5150089862,64.02,"female","white",15,43
456 | 23259.2826084242,60.01,"female","white",14,76
457 | 71637.552888116,67.86,"male","white",12,78
458 | 13712.7662216597,61.18,"female","white",11,76
459 | 19065.5179466674,66.13,"male","white",12,32
460 | 13735.4075238793,63.02,"female","white",14,35
461 | 26440.4988768348,62.85,"female","white",15,36
462 | 9579.8456488705,73.86,"male","white",15,77
463 | 30173.6300852123,69.92,"male","white",12,36
464 | 34396.1596986703,68.26,"female","white",13,43
465 | 69369.6365028672,69.64,"female","white",18,50
466 | 55664.3998002978,69.15,"male","white",15,25
467 | 12747.0423371962,69.42,"male","white",10,79
468 | 34385.4430522478,62.74,"female","white",14,45
469 | 10233.0797957323,65.15,"female","white",12,34
470 | 27082.7070995351,68.85,"male","white",12,50
471 | 38087.3220223372,68.15,"male","white",14,53
472 | 8935.97929440994,62.14,"female","white",11,75
473 | 18489.6235991451,65.91,"female","white",12,27
474 | 15920.5922048939,65.15,"male","white",13,31
475 | 63608.3668031832,70.11,"male","white",14,40
476 | 993.515449224117,66,"female","black",12,41
477 | 63604.6309830045,70.06,"male","white",12,51
478 | 39163.5186521581,66.36,"female","black",14,33
479 | 38187.4073424178,68.23,"male","white",16,30
480 | 31761.6801664963,67.77,"male","white",16,37
481 | 95330.5448443733,69.87,"male","white",18,54
482 | 18482.8594515788,65.28,"female","white",13,51
483 | 8950.51549317265,63.91,"female","white",15,44
484 | 43919.1420419723,64.88,"female","white",16,34
485 | 141452.613413474,77.21,"male","white",16,45
486 | 63665.5948402963,70.36,"male","white",14,32
487 | 24859.8151722343,64.15,"female","white",16,30
488 | 2581.0429400068,63.83,"female","white",14,24
489 | 998.726796716888,62.04,"female","white",12,32
490 | 35995.1803431969,64.2,"female","white",12,64
491 | 20086.1754557979,63.93,"female","white",8,52
492 | 35968.8432808826,63.09,"female","white",16,32
493 | 64581.9520540544,57.94,"female","black",12,60
494 | 994.323952930143,66,"female","white",12,62
495 | 64599.9564039253,62.1,"female","white",16,48
496 | 995.486049534476,60.97,"female","white",13,46
497 | 56649.7673340898,63.61,"female","hispanic",14,57
498 | 12129.4184092628,64.22,"female","white",12,32
499 | 23257.4531814882,61.35,"female","white",12,41
500 | 24849.2645102664,64.06,"female","white",12,60
501 | 997.060240519208,66.31,"female","black",14,35
502 | 10536.8164690924,66.3,"female","white",13,38
503 | 18491.4632477733,68.48,"female","black",12,45
504 | 16894.8715844397,63.71,"female","white",13,35
505 | 1004.08167170771,62.42,"female","white",12,63
506 | 31776.7508647511,71.11,"male","white",14,45
507 | 12132.2731355026,62.1,"female","white",12,77
508 | 60439.3840685454,67.95,"male","white",9,50
509 | 46075.0353918281,67.96,"male","white",9,62
510 | 32794.3325793405,62.48,"female","white",10,48
511 | 13726.4773008127,63,"female","white",12,61
512 | 57180.2849042573,69.28,"male","white",16,55
513 | 3396.41078804734,67.46,"female","white",12,48
514 | 67766.0710226514,63.84,"female","white",18,33
515 | 39705.6238023595,69.96,"male","white",12,26
516 | 20670.4756812433,69.3,"male","white",10,24
517 | 47728.0991250952,69.4,"male","black",11,55
518 | 11260.2857898754,69.05,"male","white",12,27
519 | 50295.0102290679,59.96,"female","black",12,52
520 | 24861.2188259363,66.62,"female","black",12,46
521 | 987.153288063948,59.86,"female","white",18,45
522 | 79449.0780155172,66.77,"male","white",16,53
523 | 7902.16681412268,66.39,"male","white",15,24
524 | 1955.16818716443,69.87,"female","black",12,22
525 | 23250.1981515703,63.89,"female","black",8,55
526 | 64598.4562971162,59.72,"female","other",18,39
527 | 38212.8861850698,68.14,"male","white",18,40
528 | 35071.3664810722,73.49,"male","black",12,28
529 | 31732.0084520613,71.98,"male","white",8,40
530 | 32973.46426248,63.69,"female","white",12,42
531 | 19100.6126796045,66.85,"male","white",13,71
532 | 66201.6869744607,66.88,"female","white",13,44
533 | 20636.4454407762,70.68,"male","black",9,54
534 | 1016.75164962538,63.23,"female","white",12,29
535 | 95424.9084378097,73.21,"male","white",16,35
536 | 38084.9099393145,69.94,"male","white",16,29
537 | 60380.5619633035,69.85,"male","white",16,40
538 | 5781.36536737211,71.49,"female","white",14,31
539 | 21666.0178506817,66.05,"female","white",12,34
540 | 7893.44745525963,75.04,"male","white",12,25
541 | 32801.7096211093,59.83,"female","white",14,41
542 | 34894.9420985275,71.97,"male","white",12,32
543 | 990.366789688609,63.08,"female","black",16,27
544 | 44509.7431378579,75.11,"male","black",9,59
545 | 35983.4557169327,59.68,"female","white",12,45
546 | 47725.9251418864,72.54,"male","black",12,48
547 | 48687.901309322,64.93,"female","black",12,36
548 | 47617.1421807931,67.12,"male","white",16,35
549 | 51881.1137290279,60.77,"female","white",12,73
550 | 42328.8007934413,69,"female","white",17,47
551 | 35974.9467608756,64.05,"female","white",14,38
552 | 39662.96205847,70.18,"male","white",12,37
553 | 18481.473822349,68.24,"female","white",10,28
554 | 21658.7263654599,62.09,"female","white",12,57
555 | 16913.3676793106,64.15,"female","white",15,46
556 | 1002.3923362722,67.76,"female","white",12,28
557 | 20068.0607328364,68.05,"female","white",12,49
558 | 56.3781338622286,66.18,"male","white",13,61
559 | 12123.3789274287,64.22,"female","white",11,68
560 | 90527.2378218376,72.92,"male","white",14,41
561 | 57196.6301938738,73.38,"male","white",16,41
562 | 46069.1445429611,68.71,"male","hispanic",14,33
563 | 13727.1703803878,67.87,"female","white",13,23
564 | 4831.58925740444,71.34,"male","black",11,22
565 | 990.536028389992,63.2,"female","white",12,68
566 | 32809.7265515796,66.19,"female","white",16,45
567 | 995.475488386213,65.81,"female","white",12,40
568 | 52457.2959876668,66.22,"male","white",12,61
569 | 24864.5362144608,68.26,"female","hispanic",12,72
570 | 25420.1296221833,69.66,"male","white",12,37
571 | 8949.60197006944,62.17,"female","white",12,82
572 | 13733.774868634,65.98,"female","white",14,47
573 | 51874.5737440326,63,"female","black",16,38
574 | 23102.781191952,74.97,"male","white",12,28
575 | 21676.2738975494,66.86,"female","white",12,33
576 | 1008.33879197347,62.22,"female","white",10,24
577 | 12134.4467282926,63.24,"female","white",9,29
578 | 80502.5320404378,65.44,"female","black",18,69
579 | 20075.1124669118,68.84,"female","black",12,39
580 | 30282.4673070547,71.33,"male","white",14,65
581 | 15324.3810807192,62.23,"female","white",12,59
582 | 32794.1983616288,64.92,"female","black",12,39
583 | 87392.6632009754,72.34,"male","hispanic",18,39
584 | 16890.3326960442,65.12,"female","white",12,44
585 | 21661.0075017285,62.9,"female","white",12,23
586 | 32809.6662173545,64.34,"female","white",13,35
587 | 47708.8917890766,71.9,"male","white",12,30
588 | 20074.666638086,64.39,"female","white",12,38
589 | 42347.1664095893,70.17,"female","white",13,37
590 | 20882.6809309634,65.09,"female","white",12,31
591 | 13706.6343035933,64.38,"female","white",12,33
592 | 24851.8480250879,64.12,"female","hispanic",16,36
593 | 103274.093507014,67.05,"male","white",14,29
594 | 39730.4007784177,73.13,"male","white",10,25
595 | 12735.5841473359,73.06,"male","hispanic",14,25
596 | 23877.3476332195,65.08,"male","other",14,25
597 | 20079.3061155936,64.49,"female","white",16,44
598 | 56642.756541816,66.91,"female","black",13,41
599 | 63557.9760687751,66.58,"male","white",16,35
600 | 46114.9940451608,74.49,"male","white",12,26
601 | 24856.1676094287,62.84,"female","white",13,35
602 | 31818.6236230481,74.17,"male","white",12,40
603 | 4706.62072409605,73.86,"male","black",6,66
604 | 2596.37371508954,63,"female","black",16,23
605 | 13711.4490881715,62.37,"female","black",11,27
606 | 8962.85062670136,61.96,"female","white",12,22
607 | 40751.6454536098,66.75,"female","white",16,31
608 | 31860.2747237421,69.1,"male","white",12,61
609 | 47719.3361845415,73.1,"male","white",12,38
610 | 20084.3187707433,63.03,"female","white",12,39
611 | 1009.5439125996,60.19,"female","white",12,38
612 | 16895.3990352936,67.24,"female","white",14,36
613 | 25455.211936639,68.73,"male","white",12,23
614 | 71550.4036773232,70.95,"male","white",12,33
615 | 63567.3694572935,72.35,"male","white",14,58
616 | 991.402309767381,62.95,"female","white",12,66
617 | 39688.5905654115,70.55,"male","white",14,85
618 | 31203.6566167839,63.75,"female","white",14,56
619 | 28969.5085549037,64.52,"female","white",16,34
620 | 11052.2685502736,68.13,"male","white",12,71
621 | 47697.1018967501,74.14,"male","white",12,36
622 | 29620.4026862233,62.11,"female","white",11,51
623 | 40743.0231380484,64.81,"female","white",16,51
624 | 1001.39285180773,66.42,"female","white",14,59
625 | 10535.5109396086,59.93,"female","white",5,66
626 | 111355.582492486,68.96,"male","white",18,46
627 | 24855.0354935343,61.2,"female","other",16,38
628 | 40753.3280697748,65.17,"female","white",17,37
629 | 56670.3273867862,65.88,"female","white",16,32
630 | 28615.2800070585,70.22,"male","white",16,28
631 | 44568.5809689135,72.07,"male","black",15,34
632 | 24860.3637743202,63.94,"female","hispanic",11,25
633 | 34957.1221747868,71.36,"male","hispanic",14,32
634 | 1007.21194874105,62.74,"female","white",12,33
635 | 55693.8337608029,72.86,"male","white",12,34
636 | 29624.8547880902,64.52,"female","black",15,50
637 | 55760.7664723291,67.86,"male","white",14,41
638 | 27068.2291862713,63.7,"female","black",12,25
639 | 34920.5390609967,66.03,"male","black",16,41
640 | 48697.0600361851,67.18,"female","white",16,62
641 | 28032.2062659646,64.07,"female","hispanic",12,31
642 | 40745.341702208,64.09,"female","black",14,40
643 | 16899.7657635547,61.94,"female","white",12,55
644 | 63679.4040953726,69.43,"male","white",12,41
645 | 20859.1253422943,62.89,"female","white",12,67
646 | 55622.579497519,71.29,"male","white",15,49
647 | 984.500213531472,61.15,"female","hispanic",16,27
648 | 166981.025077615,75.23,"male","white",12,53
649 | 159024.205461937,70.35,"male","white",18,44
650 | 33381.7739759541,68.61,"male","white",12,55
651 | 16898.5850156759,63.93,"female","white",14,60
652 | 52454.8022134593,72.05,"male","white",12,30
653 | 41354.4324383427,68.64,"male","white",18,29
654 | 96398.2293678427,62.95,"female","white",10,82
655 | 28625.3297430993,71.44,"male","white",16,49
656 | 24843.1804714277,58.97,"female","white",8,65
657 | 8031.69997754504,68.31,"male","white",13,24
658 | 95370.5559507232,76.73,"male","white",17,46
659 | 20078.7017904862,67.96,"female","white",14,73
660 | 44610.2025899329,70.28,"male","black",18,42
661 | 66786.5507838122,69.29,"male","white",18,47
662 | 4194.42425623004,63.43,"female","white",12,37
663 | 31855.1033379246,73.14,"male","black",18,37
664 | 44600.1770620232,66.41,"male","other",15,44
665 | 15997.5455999182,72.1,"male","black",17,66
666 | 69.6914927913448,65.97,"male","white",16,69
667 | 49365.2190022763,72.29,"male","black",12,35
668 | 28014.3209098335,69.15,"female","black",13,41
669 | 26427.6756161573,67.11,"female","white",12,43
670 | 56644.6466978089,65.88,"female","white",15,42
671 | 5775.40581158023,67.13,"female","white",12,30
672 | 26443.0930466471,68.29,"female","white",18,43
673 | 56659.1888329205,64.77,"female","black",15,46
674 | 43928.4164617031,62.55,"female","black",17,37
675 | 39716.9277091473,68.99,"male","white",17,33
676 | 1013.14945943267,62.04,"female","white",14,82
677 | 63549.4462474664,70.38,"male","white",16,45
678 | 72559.7829905608,66.23,"female","white",16,47
679 | 22225.1108881337,65.96,"male","white",12,27
680 | 40739.3407716587,66.04,"female","white",14,41
681 | 63629.7998156167,71.66,"male","white",12,46
682 | 54097.4886990852,68.61,"male","white",12,39
683 | 63563.485216593,65.94,"male","white",12,38
684 | 55702.3014545118,69.1,"male","white",14,32
685 | 20079.0535040621,59.9,"female","hispanic",12,55
686 | 48710.9197381887,64.89,"female","white",17,47
687 | 37562.0541780795,61.81,"female","white",12,44
688 | 4169.61756189437,66.73,"female","white",12,34
689 | 31923.2309000489,67.11,"male","white",14,29
690 | 16911.0248263552,68.29,"female","black",9,66
691 | 996.67215235483,64.96,"female","black",6,71
692 | 96404.2353809042,65.97,"female","black",16,47
693 | 19051.340409669,74.3,"male","black",16,45
694 | 40745.0728919601,65.19,"female","white",18,65
695 | 35966.4027951844,62.73,"female","black",14,77
696 | 998.612844261211,61.64,"female","white",12,42
697 | 12787.9070555997,67.82,"male","white",17,29
698 | 4195.33627740169,65.65,"female","white",16,25
699 | 30577.5151842098,64.93,"female","white",16,30
700 | 32797.7152048203,62.44,"female","white",14,24
701 | 158979.09115076,72.74,"male","white",18,41
702 | 80508.0900964618,61.91,"female","white",15,44
703 | 42906.4330216585,69.7,"male","white",12,46
704 | 2609.64069289783,57.34,"female","black",12,62
705 | 28564.2913953672,67.72,"male","white",12,24
706 | 28025.9975550003,66.98,"female","white",14,27
707 | 16886.8514579247,64.44,"female","white",12,78
708 | 22206.6211632941,69.86,"male","white",14,39
709 | 11491.8443763099,72.53,"male","white",14,70
710 | 1023.98892050724,62.99,"female","other",8,56
711 | 53464.7050486639,63.3,"female","white",17,43
712 | 36578.6173669224,65.62,"male","white",12,38
713 | 16888.7393946248,64.78,"female","white",12,45
714 | 34939.3171908471,61.12,"male","white",12,28
715 | 21659.5418912689,60.56,"female","hispanic",12,23
716 | 53459.7469784682,66.74,"female","white",12,35
717 | 57150.2428716162,72.79,"male","white",12,40
718 | 16898.6413084924,71.17,"female","black",14,36
719 | 10547.8916527515,64.19,"female","white",12,64
720 | 20090.3525656537,67.44,"female","white",12,47
721 | 16900.9336464029,61.76,"female","white",11,51
722 | 34400.7901688741,67.26,"female","white",12,49
723 | 41363.6776236564,69.02,"male","white",12,36
724 | 35991.8855411288,65.97,"female","black",12,54
725 | 37571.3742815177,65.72,"female","black",15,58
726 | 31883.2552493761,66.22,"male","white",15,28
727 | 16910.1847964277,65.99,"female","white",13,42
728 | 24824.3894868874,63.89,"female","white",12,75
729 | 11207.8912675841,71.87,"male","black",14,28
730 | 24859.4210423322,63.83,"female","white",12,42
731 | 987.477982853054,64.33,"female","white",12,64
732 | 42339.8421394851,64.83,"female","white",16,38
733 | 15961.4273011639,71.79,"male","white",11,35
734 | 982.925263925835,63.12,"female","white",8,61
735 | 4753.87988315343,71.2,"male","white",14,24
736 | 43913.668813905,63.72,"female","hispanic",17,45
737 | 45521.6194017236,64,"female","white",17,29
738 | 24848.1063279448,66.95,"female","white",12,31
739 | 87473.4075168657,75.04,"male","white",18,34
740 | 32823.4149412493,65.68,"female","hispanic",17,30
741 | 14525.0653532774,67.11,"female","white",12,48
742 | 999.7960302637,68.09,"female","white",12,62
743 | 47682.5461562175,71.41,"male","white",16,34
744 | 1002.43982783669,61.88,"female","white",12,70
745 | 32801.265467448,67.96,"female","white",12,35
746 | 34947.9182367206,69.77,"male","white",6,79
747 | 39823.374123275,67.17,"male","white",16,30
748 | 977.095394374498,63.77,"female","white",16,66
749 | 4758.44302121065,70.99,"male","white",8,26
750 | 26437.6534741544,62.96,"female","white",12,53
751 | 1012.99331167704,64.14,"female","white",12,61
752 | 16878.8349959772,60.61,"female","black",12,43
753 | 24854.9424481381,65.34,"female","black",12,32
754 | 44513.8048948285,66.96,"male","white",12,50
755 | 50014.1224657651,65.18,"male","white",14,39
756 | 150967.87829962,66.3,"male","white",18,56
757 | 60411.7997400031,66.64,"male","white",12,44
758 | 47660.9252116808,73.89,"male","white",12,45
759 | 57055.5653429877,73.95,"male","white",16,46
760 | 20096.2174172541,66.07,"female","white",13,26
761 | 16908.8113233328,69.12,"female","white",16,29
762 | 72543.836019343,66.05,"female","white",18,48
763 | 63581.3456537451,74.27,"male","white",16,60
764 | 34293.320938979,71.96,"male","white",12,31
765 | 22149.5427565164,66.26,"male","white",12,66
766 | 60492.1212305534,67.04,"male","other",17,58
767 | 2581.8704016655,64.79,"female","white",12,22
768 | 22241.3709490266,65.29,"male","white",12,77
769 | 24837.2986927913,64.98,"female","white",12,35
770 | 1017.15996053443,62.58,"female","hispanic",12,67
771 | 39167.7457051747,68.21,"female","white",13,46
772 | 8161.62386130716,64.15,"female","white",16,27
773 | 29627.8442660978,63.68,"female","white",16,67
774 | 23260.5391564021,63.15,"female","white",12,24
775 | 16903.3551430115,69.06,"female","white",12,32
776 | 16108.242172076,66.91,"female","white",12,36
777 | 18516.6837641757,63.1,"female","white",15,70
778 | 27100.834976173,66.96,"male","white",12,71
779 | 10559.4810739797,60.23,"female","white",13,32
780 | 4972.0353918892,62.24,"female","white",11,62
781 | 63505.6743924597,71.88,"male","white",16,45
782 | 996.561743690155,67.74,"female","white",14,39
783 | 39706.4175846121,66.88,"male","white",16,43
784 | 45523.4974026761,66,"female","black",15,64
785 | 53471.4323638042,66.07,"female","black",12,49
786 | 7370.75164242468,59.64,"female","white",8,68
787 | 981.628981126362,67.19,"female","white",12,33
788 | 6528.93768232502,65.47,"male","white",16,34
789 | 40752.142789777,69.11,"female","white",12,86
790 | 50863.956748591,72.15,"male","white",12,32
791 | 21674.3140627863,63.41,"female","white",8,62
792 | 12698.1310972785,63.81,"male","hispanic",8,31
793 | 987.960796526986,61.66,"female","other",17,30
794 | 993.15806525758,64.93,"female","white",13,60
795 | 28035.9398261196,64.08,"female","white",12,42
796 | 24847.9216110984,67.41,"female","white",12,81
797 | 28590.2219778065,65.71,"male","white",13,36
798 | 19119.2497606979,73.4,"male","white",12,28
799 | 11062.216631806,73.87,"male","white",7,53
800 | 1006.76154115774,65.63,"female","white",12,28
801 | 2909.93805759301,65.08,"female","white",14,26
802 | -27.8768193545869,72.29,"male","white",12,22
803 | 1000.22150357482,64.09,"female","white",12,22
804 | 22329.158952236,62.91,"male","white",12,25
805 | 28028.8492187587,62.85,"female","white",12,44
806 | 47765.3406387636,66.04,"male","white",12,46
807 | 66.5428358774727,72.38,"male","white",10,67
808 | 39678.5326981348,70.12,"male","white",12,57
809 | 35978.7787242601,66.07,"female","hispanic",12,37
810 | 24857.016024841,62.92,"female","white",12,25
811 | 42963.3620053351,72.94,"male","white",12,95
812 | 32796.4245622424,63.26,"female","white",16,30
813 | 15292.2967766704,68.74,"female","white",11,38
814 | 994.593757949998,66.3,"female","white",14,28
815 | 27055.6866263278,72.35,"male","white",12,34
816 | 3843.76286582021,67.86,"female","white",12,50
817 | 29620.1650898871,64.07,"female","white",12,50
818 | 40742.5641452661,63.38,"female","white",18,46
819 | 31844.6518626703,70.34,"male","black",16,45
820 | 120240.278171623,63.8,"female","white",15,44
821 | 6555.89914734581,66.3,"female","white",12,44
822 | 58227.6431745198,60.76,"female","white",12,75
823 | 47685.7793442066,72.2,"male","white",12,29
824 | 40734.3058021157,64.77,"female","white",13,82
825 | 2593.55740323646,63.79,"female","white",14,30
826 | 95457.421242717,67.82,"male","white",12,70
827 | 55645.5778949582,73.86,"male","white",13,80
828 | 14214.3087953851,67.14,"male","white",13,37
829 | 55712.3484320936,70.13,"male","white",9,88
830 | 18488.0224412708,66.21,"female","white",10,62
831 | 19131.3555361695,70.93,"male","white",12,56
832 | 47724.5834057795,70.96,"male","white",12,47
833 | 30251.8926549142,71.1,"male","black",7,61
834 | 14176.3872529867,73.37,"male","white",9,65
835 | 47779.8599237978,75.01,"male","white",16,33
836 | 988.016236486753,62.18,"female","white",12,37
837 | 14300.4605801796,71.72,"male","white",12,36
838 | 24859.7960733726,65.36,"female","hispanic",16,34
839 | 13712.5235224073,64.8,"female","black",12,35
840 | 16884.77431914,65.91,"female","white",10,79
841 | 52499.2752680738,72.14,"male","white",12,41
842 | 20095.9029830232,59.98,"female","hispanic",12,32
843 | 29623.7009005662,64.12,"female","white",15,41
844 | 48698.5250501847,65,"female","white",13,65
845 | 20884.8217696475,64.94,"female","white",12,36
846 | 16892.8740662929,62.1,"female","black",12,40
847 | 13622.6093393889,67.76,"male","black",13,75
848 | 33357.2798994404,72.91,"male","black",14,50
849 | 1003.35472447434,62.98,"female","white",12,33
850 | 34397.6269652435,63.58,"female","white",12,81
851 | 1000.47251325349,65.81,"female","white",13,28
852 | 34389.1320382317,67.92,"female","white",17,29
853 | 1003.70654303727,63.1,"female","white",13,43
854 | 8951.3872765461,65.83,"female","white",12,56
855 | 35977.614797689,70.85,"female","white",17,40
856 | 47746.0565649852,72.99,"male","white",14,30
857 | 1010.22903182616,64.17,"female","white",12,32
858 | 196568.589166955,61.32,"female","white",14,62
859 | 13720.6740585699,62.68,"female","white",9,49
860 | 1012.3586518629,63.12,"female","white",9,58
861 | 994.513680698742,61.78,"female","white",8,60
862 | 4180.54604546177,64.29,"female","white",9,29
863 | 13080.2480579603,67.15,"female","white",6,66
864 | 1002.02384296223,66.59,"female","white",12,22
865 | 992.171547840471,60.73,"female","white",18,49
866 | 16908.1568753555,63.28,"female","white",12,29
867 | 55636.6222369252,73.11,"male","white",12,52
868 | 24852.5097234656,66.06,"female","white",16,61
869 | 7368.26610617647,70.2,"female","white",12,26
870 | 25499.5396603811,68.08,"male","white",8,28
871 | 1314.20587405782,66.84,"female","white",12,37
872 | 36623.3121917525,74.3,"male","white",16,29
873 | 26425.9974604264,64.28,"female","white",16,31
874 | 982.107472627729,61.11,"female","white",12,31
875 | 996.918499717436,66.34,"female","white",12,56
876 | 10560.7163277053,59.71,"female","white",12,53
877 | 1939.87257570583,59.32,"female","white",9,50
878 | 995.580464467877,62.15,"female","black",16,40
879 | 996.243150686733,61.78,"female","black",14,65
880 | 1003.80414608848,61.56,"female","white",12,29
881 | 44447.4099566408,74.83,"male","other",16,38
882 | 7371.86019531202,65.92,"female","white",8,70
883 | 23907.5056799262,64.37,"male","white",12,39
884 | 23797.7513516587,71.15,"male","white",12,29
885 | 999.242958614164,60.96,"female","white",10,39
886 | 40757.5684534422,63.02,"female","white",12,34
887 | 24847.4599106837,67.2,"female","white",12,30
888 | 63548.3136161029,69.24,"male","white",11,48
889 | 47707.0118820173,69.64,"male","white",11,55
890 | 8960.2545578515,66.11,"female","white",12,33
891 | 55645.6818546773,73.98,"male","white",14,38
892 | 26447.0035703518,65.03,"female","white",14,35
893 | 45509.8650510557,63.49,"female","white",16,44
894 | 31750.6770282697,68.1,"male","white",15,46
895 | 33347.263714708,66.19,"male","white",13,34
896 | 27048.7526856383,69.33,"male","black",12,46
897 | 39167.6226009284,59.75,"female","white",13,54
898 | 40742.7691523491,60.61,"female","white",12,79
899 | 63566.2328416663,74.98,"male","white",12,50
900 | 998.767187740165,63.01,"female","hispanic",9,27
901 | -98.5804890739183,69.12,"male","white",13,24
902 | 993.102442046512,66.14,"female","white",16,67
903 | 13718.6298924513,63.71,"female","white",13,23
904 | 31214.718656156,63.03,"female","hispanic",15,49
905 | 72540.167389067,66.94,"female","white",16,73
906 | 7847.48828249529,64.33,"male","black",17,37
907 | -0.964888977907029,72.17,"male","white",14,25
908 | 39165.6797373926,62.96,"female","black",14,44
909 | 8939.87426566371,67.04,"female","white",14,23
910 | 52519.6301914733,66.08,"male","white",16,38
911 | 39148.2239673475,64.56,"female","white",14,28
912 | 10524.3333363139,63.36,"female","white",13,27
913 | 144086.409334098,65.88,"female","white",14,59
914 | 980.298999532855,64.16,"female","white",16,49
915 | 91628.4462828987,64.19,"female","white",18,51
916 | 21922.3163311986,70.93,"male","white",14,58
917 | 47648.579482433,70.72,"male","white",18,41
918 | 23809.4477379481,71.85,"male","white",12,44
919 | 24860.6466090689,59.96,"female","white",14,79
920 | 49236.364512364,69.77,"male","white",12,33
921 | 52421.8265274802,73.93,"male","white",17,44
922 | 47707.8481763399,67.96,"male","white",18,65
923 | 19150.3369351537,74.1,"male","white",12,42
924 | 8012.3490716726,74.88,"male","white",12,26
925 | 33388.2987658794,67.89,"male","white",12,63
926 | 45529.9389713226,62.98,"female","white",18,50
927 | 19031.9000979283,71.81,"male","white",12,35
928 | 31207.4749542588,63.68,"female","white",12,50
929 | 10557.2030414009,63.04,"female","white",14,39
930 | 9431.11526131584,66.18,"female","white",13,64
931 | 24852.9189798823,60.98,"female","white",13,32
932 | 997.248534885182,61.73,"female","white",12,33
933 | 27009.5096925429,72.76,"male","white",12,50
934 | 8466.49445018308,66.1,"female","black",12,82
935 | 1007.99494108658,68.26,"female","white",12,22
936 | 16124.9628676755,62.28,"female","white",12,69
937 | 2634.96424437502,65.16,"male","white",12,26
938 | 990.082696774432,63.19,"female","white",14,47
939 | 9576.38819748704,62.32,"female","white",12,35
940 | 15892.2091978625,68.15,"male","white",12,35
941 | 40744.8747647464,59.15,"female","white",15,87
942 | 998.925444353651,62.21,"female","white",12,43
943 | 1013.52477383429,64.5,"female","white",9,67
944 | 56656.301330095,68.12,"female","white",16,47
945 | 45523.58050729,62.53,"female","white",18,48
946 | 67762.2226156826,63.24,"female","white",12,44
947 | 71609.0662816366,67.85,"male","white",12,62
948 | 88453.5952587809,63.83,"female","white",12,55
949 | 58228.0853639499,69.89,"female","white",16,34
950 | 39858.0562950258,69.96,"male","white",10,35
951 | 30161.9293463618,72.16,"male","white",12,32
952 | 60431.9169106581,69.34,"male","white",11,54
953 | 24850.196030533,63.19,"female","white",14,41
954 | 8162.68267176858,58.09,"female","white",5,89
955 | 153635.749804599,62.98,"female","white",14,31
956 | 18497.9558645746,63.11,"female","white",12,51
957 | 14306.4307247277,72.86,"male","white",15,38
958 | 8945.55105069553,68.91,"female","white",14,28
959 | 79540.6463287231,69.52,"male","white",12,55
960 | 1007.91101944059,65.89,"female","white",12,63
961 | 984.047256808445,65.91,"female","white",16,66
962 | 10537.4648191351,64.96,"female","white",13,48
963 | 158930.467358879,72.38,"male","white",13,26
964 | 27220.0059304223,62.05,"female","white",13,50
965 | -25.6552603659183,68.9,"male","white",12,22
966 | 50877.7761685217,71.69,"male","white",14,44
967 | 47709.3292637674,73.13,"male","white",18,45
968 | 16906.6360987587,68.19,"female","white",16,46
969 | 1002.15389694231,63.62,"female","white",12,39
970 | 24875.871655644,65.8,"female","white",12,23
971 | 11339.3623934459,59.92,"female","white",8,87
972 | 42345.881829433,60.31,"female","white",18,67
973 | 28003.467467509,65.03,"female","white",12,63
974 | 35979.2002268293,62.73,"female","other",18,63
975 | 16904.9984512194,64.38,"female","white",12,45
976 | 79523.4868411869,70.81,"male","white",14,62
977 | 31790.3006737847,67.43,"male","white",12,39
978 | 1002.45371064789,66.78,"female","white",13,63
979 | 1641.84246580928,62.86,"female","white",12,35
980 | 26433.7795557355,64.82,"female","white",14,63
981 | 16876.3917770673,65.85,"female","white",12,70
982 | 23262.3062737323,63.38,"female","white",13,62
983 | 8935.22287731831,65.04,"female","white",15,24
984 | 63570.4143675579,73.88,"male","white",16,38
985 | 8930.78455644505,67.27,"female","white",13,59
986 | 40751.7835298215,67.65,"female","white",17,51
987 | 20091.5964444871,65.17,"female","white",12,55
988 | 93232.672427805,63.72,"female","white",16,38
989 | 29632.3100863834,67.94,"female","white",16,30
990 | 1578.5428138958,64.53,"male","white",12,22
991 | 33433.936468524,72.24,"male","white",17,51
992 | 17505.4454560322,65.73,"male","black",12,66
993 | 21663.0043154502,63.69,"female","white",12,40
994 | 991.995620972514,63.63,"female","white",12,57
995 | 32809.6326770138,59.61,"female","other",16,92
996 | 39829.3235077431,72.52,"male","black",12,39
997 | 1945.44501302122,63.78,"female","white",12,47
998 | 39693.929061593,67.03,"male","black",14,30
999 | 57292.9968450935,68,"male","white",12,32
1000 | 24854.1042055616,64.02,"female","other",13,37
1001 | 6566.2571368786,66.16,"female","white",12,55
1002 | 31215.7499939053,64.55,"female","white",16,38
1003 | 98558.7781026767,70.02,"male","white",14,58
1004 | 39773.2440613475,69.94,"male","white",12,46
1005 | 79507.9870618854,65.79,"male","white",12,56
1006 | 1002.60049928122,64.24,"female","white",12,32
1007 | 1001.6664823116,66.3,"female","white",13,30
1008 | 63533.3407303505,66.06,"male","white",12,65
1009 | 41283.509070804,74.15,"male","white",18,30
1010 | 35981.5108645561,62.35,"female","white",12,52
1011 | 34393.3511714037,65.53,"female","white",17,43
1012 | 10719.1973744773,61.53,"female","white",12,69
1013 | 48700.4850032987,64.8,"female","black",18,68
1014 | 10526.1893318709,63.02,"female","white",12,82
1015 | 42872.2640087328,71.38,"male","white",16,33
1016 | 7378.63016606156,66.01,"female","white",16,46
1017 | 8026.64579441285,62.04,"female","other",6,76
1018 | 999.878661556705,61.82,"female","white",12,80
1019 | 8942.80671590728,62.97,"female","white",10,91
1020 | 1020.85471771991,64.72,"female","white",9,79
1021 | 71538.2893319736,67.4,"male","white",18,44
1022 | 29637.2679715432,60.78,"female","white",14,49
1023 | 10538.7447239525,65.8,"female","white",15,26
1024 | 988.161628283781,61.2,"female","white",12,53
1025 | 22252.8993931606,71.68,"male","white",12,28
1026 | 56644.8967699588,69.27,"female","white",14,28
1027 | 24852.1695267614,64.25,"female","white",16,33
1028 | 20074.4189766833,62.75,"female","black",12,33
1029 | 2600.43525202508,60.24,"female","white",12,51
1030 | 50832.9187815009,69.42,"male","black",15,37
1031 | 2396.14778671762,73.13,"male","white",12,23
1032 | 5768.44503437889,65.09,"female","black",16,27
1033 | 56642.4364407088,63.69,"female","white",12,46
1034 | 990.807712176254,63.03,"female","white",12,49
1035 | 998.671567646385,62.86,"female","white",16,33
1036 | 111205.855063134,72.18,"male","white",15,52
1037 | 34399.7622930359,66.98,"female","white",16,32
1038 | 79576.5419400654,68.85,"male","white",11,22
1039 | 23824.410492409,70.75,"male","white",9,25
1040 | -56.3219788355347,67.81,"male","hispanic",10,22
1041 | -66.1283004678617,72.92,"male","black",12,24
1042 | 15316.9897842259,65.87,"female","black",8,39
1043 | 16906.2841715607,63.17,"female","black",14,29
1044 | 1013.91240595871,65.3,"female","white",12,64
1045 | 47740.831040028,70.88,"male","white",12,51
1046 | 25452.1338955757,72.11,"male","white",14,25
1047 | 31876.5784243398,67.72,"male","white",12,55
1048 | 1003.37176812386,63.82,"female","white",12,49
1049 | 158996.080700412,71.83,"male","white",17,58
1050 | 16889.9869050085,65.61,"female","white",12,47
1051 | 55635.3843796645,66.11,"male","white",14,67
1052 | 12115.9173222928,64.71,"female","white",16,35
1053 | 1007.2504839221,65,"female","white",12,66
1054 | 1010.4634170377,66.17,"female","white",12,56
1055 | 991.124467831543,64.87,"female","white",11,42
1056 | 12721.3377336341,71.39,"male","black",14,31
1057 | 17477.4743066582,74.38,"male","white",10,54
1058 | 39764.5300001007,69.77,"male","white",9,51
1059 | 40740.013906102,63.82,"female","white",14,49
1060 | 47770.8224810453,69,"male","white",14,52
1061 | 25.5073793326888,66.02,"male","white",12,30
1062 | 77311.9389631547,62.62,"female","white",18,38
1063 | 28041.3399828982,68.02,"female","white",12,50
1064 | 1005.83847378064,70.7,"female","white",8,74
1065 | 41433.6591116814,72.86,"male","white",16,28
1066 | 16896.3080934012,63.35,"female","white",12,65
1067 | 995.746579569306,65.19,"female","white",11,61
1068 | 95351.2060008194,71.05,"male","white",18,57
1069 | 35978.3615194746,65.36,"female","white",18,33
1070 | 10861.0922840824,64.03,"female","white",13,87
1071 | 39168.2941872203,65.22,"female","white",16,36
1072 | 31789.2198043854,71.41,"male","white",14,44
1073 | 12137.2448273531,69.85,"female","white",14,34
1074 | -30.4235024585596,72.38,"male","black",12,23
1075 | 48688.9297482258,67.71,"female","black",16,52
1076 | 37563.434273142,65.88,"female","white",18,32
1077 | 32821.1936901775,63.71,"female","white",12,44
1078 | 5780.073674992,66.19,"female","white",16,33
1079 | 999.631035696323,62.82,"female","white",12,35
1080 | 47782.6325430195,70.35,"male","white",16,45
1081 | 20075.6765540308,64.23,"female","white",14,27
1082 | 993.395992495505,66.15,"female","white",7,45
1083 | 1025.01290254845,63.26,"female","white",10,41
1084 | 63513.1027353022,71.25,"male","white",12,42
1085 | 24859.9457977036,64.02,"female","white",14,27
1086 | 58225.6117358599,62.42,"female","white",16,49
1087 | 16889.6611198725,65.5,"female","white",12,57
1088 | 995.554489598849,63.08,"female","white",11,69
1089 | 21669.2484450363,67.98,"female","white",14,52
1090 | 1009.76704604544,64.88,"female","white",12,63
1091 | 1010.98136569577,60.61,"female","white",14,45
1092 | 16894.2835398117,61.77,"female","black",12,31
1093 | 47695.1950625972,68.89,"male","black",12,41
1094 | 36593.6932422507,64.16,"male","white",13,36
1095 | 31833.0641657761,69.63,"male","white",17,72
1096 | 31216.472797982,64.6,"female","white",13,86
1097 | 31818.0117641456,61.21,"male","black",11,70
1098 | 44466.4806466349,63.44,"male","black",14,67
1099 | 21681.934597577,65.14,"female","white",10,84
1100 | 19119.584235378,63.61,"male","white",12,79
1101 | 31760.3097497459,69.06,"male","white",11,27
1102 | 39789.4734194548,73.88,"male","white",12,34
1103 | 40735.5821116565,67.8,"female","white",13,56
1104 | 1004.34436591698,67.65,"female","white",12,38
1105 | 1014.45045553044,64.91,"female","white",14,45
1106 | 29620.997007219,66.87,"female","white",16,44
1107 | 988.565070198681,64.71,"female","white",12,22
1108 | 995.374617537821,64.18,"female","white",18,50
1109 | 10529.7506464015,62.09,"female","white",10,72
1110 | 25427.4001595129,72.05,"male","white",13,27
1111 | 71514.3289230531,73.05,"male","white",16,42
1112 | 39806.8521692602,69.94,"male","white",14,73
1113 | 20073.7156480621,63.99,"female","white",14,42
1114 | 4160.53105497259,67.87,"female","white",12,22
1115 | 16890.6172628713,62.86,"female","white",12,22
1116 | 22286.8004109747,71.18,"male","white",12,40
1117 | 24852.3361979281,64.41,"female","white",12,35
1118 | 995.172315021961,67.35,"female","white",11,41
1119 | 16901.8264738142,59.77,"female","white",12,43
1120 | 2919.12600531874,65.51,"female","white",4,68
1121 | 24850.8308644127,59.01,"female","white",14,66
1122 | 16900.745158091,62.86,"female","white",13,68
1123 | 3163.02291068586,66.3,"male","white",12,22
1124 | 1006.01762644208,66.21,"female","white",12,39
1125 | 7365.5038036505,64.7,"female","white",13,37
1126 | 16894.0221460191,63.96,"female","white",14,36
1127 | 3393.78999839327,65.78,"female","white",12,27
1128 | 28624.1716370336,66.33,"male","hispanic",17,30
1129 | 6396.29858419505,72.99,"male","white",15,28
1130 | 44524.6055045838,71.76,"male","white",16,30
1131 | 16913.9983227796,74.3,"female","white",14,26
1132 | 5772.98208580172,65.06,"female","white",16,26
1133 | 999.23673707007,63.53,"female","white",14,35
1134 | 19020.2126342116,71.06,"male","white",12,25
1135 | 31197.9028766208,68.08,"female","white",13,49
1136 | 1008.02183926421,61.35,"female","white",12,37
1137 | 57262.115767785,69.77,"male","white",16,43
1138 | 12154.012468166,64.19,"female","white",16,63
1139 | 56655.8520466265,59.16,"female","white",11,38
1140 | 63508.2745013848,68.27,"male","white",18,47
1141 | 23896.1225958883,68.31,"male","white",14,29
1142 | 13717.6657412768,69.24,"female","white",13,47
1143 | 40744.0142293539,69.02,"female","white",14,59
1144 | 44511.8388677947,75.02,"male","white",18,42
1145 | 35056.1897433602,68.75,"male","white",14,39
1146 | 34166.1588166956,72.1,"male","black",13,28
1147 | 5771.77114883749,68.35,"female","white",12,27
1148 | 36577.5114386126,71.74,"male","white",13,32
1149 | 28718.2050356153,71.88,"male","white",12,77
1150 | 31709.3453051173,66.61,"male","white",16,75
1151 | 26432.0238638548,63.23,"female","hispanic",12,50
1152 | 23817.843746481,68.62,"male","white",12,30
1153 | 85268.6446916847,62.85,"female","hispanic",17,36
1154 | 41349.9483732802,71.99,"male","white",12,31
1155 | 7364.53214004544,64.81,"female","other",12,32
1156 | 15809.0448761028,72.98,"male","white",16,63
1157 | 68330.0291746917,69.06,"male","white",12,55
1158 | 27130.7019128214,69.07,"male","other",14,37
1159 | 7363.69743087666,60.66,"female","hispanic",12,42
1160 | 79485.8854920863,72.8,"male","white",15,52
1161 | 88456.4731995184,67.03,"female","white",18,55
1162 | 45507.9788477399,62.91,"female","white",16,45
1163 | 998.429683373758,66.09,"female","white",12,52
1164 | 8951.01665799154,61.87,"female","white",14,52
1165 | 29614.9483149486,65.05,"female","white",13,53
1166 | 33385.6491294888,71.65,"male","hispanic",15,66
1167 | 20085.6872462914,66.91,"female","white",12,37
1168 | 53481.8611190695,65.46,"female","hispanic",16,66
1169 | 79582.9722445708,71.73,"male","hispanic",13,68
1170 | 10543.0059825721,65.35,"female","hispanic",9,78
1171 | 26433.4457319253,63.98,"female","white",14,63
1172 | 55665.7385965739,67.76,"male","white",12,54
1173 | 40750.9588661231,68.43,"female","white",12,47
1174 | 31779.0524294819,75.89,"male","white",14,30
1175 | 29618.6709435351,63.28,"female","white",12,40
1176 | 54032.4163361492,73.66,"male","white",14,82
1177 | 39720.1169340478,67.95,"male","white",16,29
1178 | 31789.5496754198,72.13,"male","white",17,81
1179 | 52556.6622479844,68.05,"male","white",12,73
1180 | 7362.32469627059,65.15,"female","white",12,77
1181 | 32821.7394367105,59.85,"female","white",8,81
1182 | 63629.1436845868,71.62,"male","white",16,37
1183 | 23272.6828309175,63.92,"female","white",14,44
1184 | 20070.4266821973,68.19,"female","white",12,52
1185 | 40752.7857873425,60.68,"female","white",12,35
1186 | 112293.228459904,67.16,"female","white",10,76
1187 | 61431.2358244234,67.35,"female","white",17,59
1188 | 23273.0643236029,62.89,"female","white",12,32
1189 | 1005.66857107228,63.08,"female","white",15,37
1190 | 24867.3944914387,59.74,"female","white",13,41
1191 | 27052.3145447836,74.2,"male","white",16,35
1192 | 95353.794962225,69.22,"male","white",14,70
1193 | 24862.3904718197,66.82,"female","white",15,73
1194 | 39757.9472100514,64.79,"male","white",16,90
1195 | 28612.7898884795,62.2,"male","hispanic",12,67
1196 | -1.50936964863578,64.92,"male","white",12,59
1197 | 111285.63662324,72.19,"male","white",18,51
1198 | 987.783751585589,66.04,"female","hispanic",11,53
1199 | 48687.8484753642,65.16,"female","white",18,63
1200 | 13713.7442426761,61.86,"female","white",12,55
1201 | 47740.9784984125,70.87,"male","white",16,38
1202 | 998.743384307228,62.2,"female","white",11,30
1203 | 42941.4351052398,71.11,"male","white",14,36
1204 | 74706.6532959308,70.24,"male","white",18,47
1205 | 19037.4454862462,67.15,"male","hispanic",12,26
1206 | 39157.7510357454,62.05,"female","white",17,38
1207 | 21664.6599424119,63.05,"female","hispanic",12,39
1208 | 128217.042224129,60.94,"female","white",14,41
1209 | 7935.49480666059,71.42,"male","white",12,22
1210 | 1001.90663270767,64.92,"female","white",12,32
1211 | 40760.9115574943,63.15,"female","white",10,64
1212 | 44533.1100039177,66.12,"male","white",12,48
1213 | 13695.752850286,64.1,"female","white",15,24
1214 | 16880.9998315911,65.96,"female","white",13,55
1215 | 15296.7303638123,68.23,"female","white",14,45
1216 | 32784.8272263708,60.26,"female","white",12,41
1217 | 19053.5691050415,65.21,"male","white",12,75
1218 | 24854.2306409617,64.69,"female","white",14,70
1219 | 32789.3796661253,60.64,"female","white",14,40
1220 | 45525.7374951262,64.01,"female","white",14,40
1221 | 1432.29097374576,65.79,"female","white",15,25
1222 | 55693.5517533498,68.99,"male","white",14,78
1223 | 42341.556083029,65.93,"female","hispanic",14,36
1224 | 43933.8280668161,66.67,"female","white",17,38
1225 | 23263.2462506252,68.63,"female","white",12,34
1226 | 50908.8970813513,65.76,"male","white",16,37
1227 | 16906.3460390581,68.97,"female","white",14,36
1228 | 11121.5282649971,69.72,"male","white",16,34
1229 | 44492.4222581206,71.03,"male","white",17,50
1230 | 95400.5609817628,73.8,"male","white",13,58
1231 | 19050.8316923704,68.91,"male","white",4,66
1232 | 1013.52622790713,65,"female","white",13,67
1233 | 995.78153389181,65.21,"female","white",12,50
1234 | 24857.5303329462,74.88,"female","white",13,79
1235 | 23863.0123251893,71.57,"male","white",11,67
1236 | 42322.0737112319,64.12,"female","white",15,36
1237 | 992.885672062796,58.99,"female","black",12,47
1238 | -49.5976242316242,70.96,"male","other",12,24
1239 | 41288.4230868218,71.91,"male","hispanic",16,29
1240 | 30767.8461101159,63.23,"female","black",10,40
1241 | 1001.49804266558,68.29,"female","black",12,26
1242 | 20079.5574272996,63.8,"female","black",12,77
1243 | 127226.599652973,69.95,"male","white",17,43
1244 | 77309.5329351063,66.06,"female","white",13,43
1245 | 28575.3855635215,62.05,"male","other",16,31
1246 | 135137.686883976,70.25,"male","white",15,69
1247 | 56657.6081257394,64.13,"female","black",12,51
1248 | 8027.18531618615,68.63,"male","white",16,26
1249 | 15309.4702056486,63.81,"female","hispanic",13,26
1250 | 48708.1791232664,66.66,"female","white",16,31
1251 | 127277.492490206,72.18,"male","white",14,62
1252 | 56642.969612385,65.09,"female","white",16,36
1253 | 20084.6692833515,62.95,"female","white",14,38
1254 | 317949.127955061,70.24,"male","white",18,38
1255 | 7375.92570936378,63.98,"female","white",16,49
1256 | 3199.96325523417,68.07,"male","white",12,22
1257 | 39169.1000544924,66.02,"female","white",18,73
1258 | 3128.40045423806,65.48,"male","hispanic",15,27
1259 | 23908.082328374,70.4,"male","white",17,34
1260 | 10533.1039963861,61.49,"female","white",12,69
1261 | 7869.85708849864,65.83,"male","white",12,79
1262 | 3375.81930734894,65.86,"female","hispanic",12,25
1263 | 45509.3990429654,63.86,"female","white",18,52
1264 | 15290.2441182621,69.81,"female","white",12,39
1265 | 95402.6598745111,74.25,"male","white",14,56
1266 | 1889.14088316683,66.1,"male","white",12,23
1267 | 3224.43607582189,62.09,"female","hispanic",3,68
1268 | 12132.5027696978,65.95,"female","white",13,47
1269 | 34398.5804366668,62.01,"female","white",12,35
1270 | 69966.2128885529,68.9,"male","white",12,62
1271 | 46143.8478216483,68.08,"male","black",16,40
1272 | 4767.09597374661,72.33,"male","hispanic",14,22
1273 | 19086.1244824656,64.04,"male","black",13,26
1274 | 988.878651448201,64.1,"female","black",16,23
1275 | 1000.68811662548,59.57,"female","hispanic",9,43
1276 | 31859.9039539382,72.28,"male","hispanic",12,32
1277 | 2912.61685643064,61.72,"female","black",5,77
1278 | 58812.5554780609,74.52,"male","white",16,29
1279 | 12721.5684022168,65.93,"male","other",10,64
1280 | 55596.6235470168,68.04,"male","hispanic",16,34
1281 | 39687.7203364583,61.89,"male","other",14,41
1282 | 29600.80026329,71.22,"female","white",10,33
1283 | 48698.5173261034,62.99,"female","black",16,40
1284 | 31762.5194770489,68.73,"male","white",18,73
1285 | 19176.3341880439,70.83,"male","hispanic",6,44
1286 | 1003.94818935734,64.17,"female","hispanic",12,69
1287 | 16898.1539759619,66.07,"female","white",12,72
1288 | 28592.6154784873,72.31,"male","white",14,52
1289 | 986.165728698769,62.46,"female","hispanic",15,63
1290 | 31868.4793403152,71.49,"male","black",12,30
1291 | 50293.3455419254,67.94,"female","white",16,47
1292 | 9521.39185162996,64.41,"male","hispanic",10,40
1293 | 1004.20977744948,64.37,"female","white",12,42
1294 | 19095.2615453343,69.52,"male","white",13,44
1295 | 21672.4590284355,65.56,"female","white",16,37
1296 | 42342.1759299248,67.01,"female","white",17,38
1297 | 24857.4413604567,67.82,"female","black",11,35
1298 | 48694.1241676915,64.07,"female","white",17,32
1299 | 4199.25482208603,65.01,"female","white",14,45
1300 | 7876.23087159809,69.78,"male","other",13,26
1301 | 85249.2838792362,71.28,"female","white",14,30
1302 | 993.735798257432,63.83,"female","white",12,40
1303 | 4159.03322157321,61.54,"female","white",13,22
1304 | 39787.6214099871,70.48,"male","white",17,33
1305 | 8056.63501215389,73.65,"male","white",12,22
1306 | 34382.0740264456,64.86,"female","white",13,31
1307 | 28038.3578937832,70,"female","white",15,31
1308 | 20087.0438961784,63.81,"female","white",13,41
1309 | 66771.7539282786,70.25,"male","hispanic",14,48
1310 | 33417.5208538918,74.3,"male","white",12,37
1311 | 10531.9881266657,63.14,"female","hispanic",12,69
1312 | 994.515746507949,59.23,"female","hispanic",12,50
1313 | 39790.7425975321,73.85,"male","white",14,28
1314 | 5753.10762270653,66.17,"female","white",14,23
1315 | 6722.68142995174,64.94,"female","white",15,71
1316 | 6373.6175685778,70.59,"male","white",12,23
1317 | 31879.3535269404,69.07,"male","white",14,62
1318 | 82711.753363371,60.14,"male","white",15,54
1319 | 96406.7613267083,64.37,"female","white",12,47
1320 | 24839.6398412734,63.84,"female","white",13,45
1321 | 143040.708851371,72.27,"male","white",16,54
1322 | 42349.1938093216,61.38,"female","white",14,69
1323 | 1013.28842229702,62.68,"female","hispanic",13,36
1324 | 3153.35665292635,69.16,"male","white",12,25
1325 | 26434.5026070359,66.82,"female","white",12,65
1326 | 39165.9289402853,65.15,"female","hispanic",11,31
1327 | 23905.6894085143,67.55,"male","hispanic",16,32
1328 | 66770.3896483816,74.1,"male","white",17,47
1329 | 1007.81676112648,62.35,"female","white",15,52
1330 | 16902.2396528115,64.83,"female","white",12,47
1331 | 27044.398985041,70.96,"male","hispanic",13,40
1332 | 48702.1770934611,64.03,"female","white",14,38
1333 | 1000.8856431763,64.92,"female","white",16,26
1334 | 51874.2088709548,68.84,"female","white",17,41
1335 | 58234.5809653042,63.6,"female","hispanic",17,38
1336 | 15865.7268275005,71.9,"male","hispanic",13,24
1337 | 32792.0539545984,63.12,"female","white",15,33
1338 | 39160.158504899,64.93,"female","white",12,33
1339 | 27043.6720522342,72.12,"male","white",12,36
1340 | 37592.311734899,64.49,"female","white",14,52
1341 | 40747.9655119059,63.91,"female","white",12,37
1342 | 79603.4564638413,71.36,"male","white",12,54
1343 | 1006.46990206972,63.48,"female","white",14,43
1344 | 32793.912006464,61.82,"female","white",12,34
1345 | 48693.2024538462,66.01,"female","white",16,37
1346 | 12137.6764657013,63.95,"female","white",10,43
1347 | 11176.0659373347,66.99,"male","white",10,82
1348 | 47774.2522227392,67.79,"male","white",11,36
1349 | 5761.17534334497,66.38,"female","white",12,28
1350 | 32799.1446940046,65.97,"female","white",14,47
1351 | 64607.9618716391,65.68,"female","white",16,63
1352 | 15899.4031550977,67.02,"male","white",12,81
1353 | 25361.8283789109,66.1,"male","white",12,41
1354 | 18482.9571837826,63.27,"female","white",9,55
1355 | 26432.8864567014,67.89,"female","white",13,47
1356 | 28644.500098047,70.32,"male","white",12,31
1357 | 20065.0827364581,62.98,"female","white",16,30
1358 | 6354.84212675623,70.09,"male","black",14,25
1359 | 95416.6067984548,71.92,"male","white",12,49
1360 | 68428.9230580318,74.92,"male","white",17,44
1361 | 50299.0512923305,61.97,"female","white",14,43
1362 | 80518.409375972,67.98,"female","white",17,43
1363 | 43919.8943431691,68.18,"female","white",14,33
1364 | 47712.4409351132,69.85,"male","white",17,60
1365 | 19182.6074308127,73.26,"male","black",13,25
1366 | 987.26513123966,69.01,"female","white",16,43
1367 | 32798.0511760252,62.37,"female","white",17,34
1368 | 24849.1932942171,60.3,"female","black",12,80
1369 | 40757.9350769351,64.17,"female","other",16,41
1370 | 4184.2226846731,60.19,"female","hispanic",6,71
1371 | 4755.73650797254,72.94,"male","hispanic",15,24
1372 | 175901.453597818,65.9,"female","other",18,52
1373 | 87473.9687783999,68.82,"male","white",18,75
1374 | 92205.596105924,69.62,"male","white",18,57
1375 | 16905.5578510981,70.08,"female","white",16,40
1376 | 30173.3803632981,71.68,"male","white",12,33
1377 | 24853.5195136729,61.31,"female","white",18,86
1378 | 13710.6713116427,63.64,"female","white",12,37
1379 | 95426.0144102907,71.65,"male","white",12,54
1380 | 9575.46185684499,68.22,"male","white",12,31
1381 |
--------------------------------------------------------------------------------
/05_dplyr/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/05_dplyr/images/import_datatype01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/import_datatype01.png
--------------------------------------------------------------------------------
/05_dplyr/images/pipe1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/pipe1.png
--------------------------------------------------------------------------------
/05_dplyr/images/pipe2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/pipe2.png
--------------------------------------------------------------------------------
/05_dplyr/images/tidyverse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/05_dplyr/images/tidyverse.png
--------------------------------------------------------------------------------
/06_ggplot2/06_ggplot2.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 5: Data Visualization"
3 | author: "王敏杰"
4 | institute: "四川师范大学"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "75%",
28 | fig.align = "center",
29 | fig.asp = 0.618, # 1 / phi
30 | fig.show = "hold"
31 | )
32 | ```
33 |
34 | ## The tidyverse family
35 | ```{r echo=FALSE, out.width = '85%'}
36 | knitr::include_graphics("images/tidyverse.png")
37 | ```
38 |
39 |
40 |
41 |
42 |
43 | # Why visualize?
44 |
45 | ## The 1854 London cholera outbreak
46 | ```{r out.width = '100%', echo = FALSE}
47 | knitr::include_graphics("images/cholera_a.pdf")
48 | ```
49 |
50 |
51 | ## The 1854 London cholera outbreak
52 | ```{r out.width = '100%', echo = FALSE}
53 | knitr::include_graphics("images/cholera_b.pdf")
54 | ```
55 |
56 |
57 | ## The 1854 London cholera outbreak
58 | ```{r out.width = '100%', echo = FALSE}
59 | knitr::include_graphics("images/cholera_c.pdf")
60 | ```
61 |
62 |
63 |
64 | ## Simpson's Paradox
65 | ```{r out.width = '100%', echo = FALSE}
66 | knitr::include_graphics("images/Paradox1.pdf")
67 | ```
68 |
69 | ## Simpson's Paradox
70 | ```{r out.width = '100%', echo = FALSE}
71 | knitr::include_graphics("images/Paradox2.pdf")
72 | ```
73 |
74 |
75 | ## Simpson's Paradox
76 | ```{r out.width = '100%', echo = FALSE}
77 | knitr::include_graphics("images/Paradox3.pdf")
78 | ```
79 |
80 |
81 | # The ggplot2 package
82 |
83 | ## The ggplot2 package
84 |
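85 | - ggplot2 was created in 2005 by Hadley Wickham, chief scientist at RStudio, while he was a PhD student.
86 | - Many people learn R precisely because of ggplot2.
87 | - ggplot2 has grown into one of the most widely used R packages.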
88 |
89 | ```{r}
90 | library(ggplot2) # install.packages("ggplot2")
91 | # or
92 | library(tidyverse) # install.packages("tidyverse")
93 | ```
94 |
95 |
96 |
97 |
98 |
99 | ## The grammar of graphics in ggplot2
100 |
101 | ggplot2 is built on an elegant grammar of graphics.
102 |
103 | ```{r out.width = '70%', echo = FALSE}
104 | knitr::include_graphics("images/mapping.png")
105 | ```
106 |
107 | Hadley Wickham describes this grammar as follows:
108 |
109 | A statistical graphic is a mapping from data to the aesthetic attributes (aes) of geometric objects (geom).
110 |
111 |
112 | ## The grammar of graphics in ggplot2
113 |
114 | A plot built with `ggplot()` has nine components:
115 |
116 | - **Data (data)**
117 | - **Mapping (mapping)**
118 | - **Geometric objects (geom)**
119 | - Statistical transformations (stat)
120 | - Scales (scale)
121 | - Coordinate systems (coord)
122 | - Facets (facet)
123 | - Themes (theme)
124 | - Saving and output (output)
125 |
126 | The first three are required.
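
To make the component list concrete, here is a minimal sketch that touches each of the nine components once. The particular choices (coloring by `drv`, the `Set1` palette, faceting by `year`, the file name `components.png`) are illustrative assumptions, not part of the original slides.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # data + mapping
  geom_point(aes(color = drv), alpha = 0.6) +                 # geom
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # stat
  scale_color_brewer(palette = "Set1") +                      # scale
  coord_cartesian(xlim = c(1, 7)) +                           # coord
  facet_wrap(vars(year)) +                                    # facet
  theme_minimal()                                             # theme

# output: write the plot to a temporary file (hypothetical file name)
out_file <- file.path(tempdir(), "components.png")
ggsave(out_file, plot = p, width = 8, height = 5)
```

Only the first line and one geom are needed for a valid plot; every other component falls back to a default when omitted.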
127 |
128 |
129 |
130 | ## Syntax template
131 |
132 | ```{r out.width = '100%', echo = FALSE}
133 | knitr::include_graphics("images/ggplot_template.png")
134 | ```
135 |
136 |
137 |
138 | ## Example
139 |
140 | A simple example: temperature change and carbon emissions, 1880-2014.
141 |
142 | \footnotesize
143 | ```{r, warning = FALSE, message = FALSE}
144 | d <- readr::read_csv("./demo_data/temp_carbon.csv")
145 | ```
146 |
147 | ```{r, echo = FALSE}
148 | d %>%
149 | head(10) %>%
150 | knitr::kable()
151 | ```
152 |
153 | ## Simple, isn't it?
154 | \footnotesize
155 | ```{r, out.width="85%"}
156 | ggplot(data = d, mapping = aes(x = year, y = carbon_emissions)) +
157 | geom_line()
158 | ```
159 |
160 |
161 | # ggplot2 syntax in detail
162 |
163 | ## Demo data
164 |
165 | We use the fuel economy dataset [mpg](https://ggplot2.tidyverse.org/reference/mpg.html), which ships with ggplot2, for the demonstrations.
166 |
167 | \small
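168 | |No.|Variable|Description|
169 | |:---|:---|:---|
170 | |1 | manufacturer | manufacturer name|
171 | |2 | model | model name|
172 | |3 | displ | engine displacement, in litres|
173 | |4 | year | year of manufacture|
174 | |5 | cyl | number of cylinders|
175 | |6 | trans | type of transmission|
176 | |7 | drv | type of drive train|
177 | |8 | cty | city miles per gallon|
178 | |9 | hwy | highway miles per gallon|
179 | |10 | fl | fuel type|
180 | |11 | class | type of car|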
181 |
182 |
183 | ## Does a larger engine use more fuel?
184 |
185 | To answer this question, we need three variables from the mpg dataset.
186 |
187 | |No.|Variable|Description|
188 | |:---|:---|:---|
189 | |3 | displ | **engine displacement**|
190 | |9 | hwy | **fuel economy (highway miles per gallon)**|
191 | |11 | class | type of car|
192 |
193 |
194 | ```{r}
195 | mpg %>%
196 | select(displ, hwy, class) %>%
197 | head(4)
198 | ```
199 |
200 |
201 |
202 | ## Mapping
203 |
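204 | To examine the relationship between engine displacement (displ) and highway miles per gallon (hwy), we first draw a scatter plot of the two variables.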
205 |
206 | ```{r out.width = '100%', echo = FALSE}
207 | knitr::include_graphics("images/a-3.png")
208 | ```
209 |
210 |
211 |
212 |
213 |
214 |
215 |
216 |
217 |
218 |
219 |
220 |
221 |
222 |
223 |
224 | ## Running the code
225 | Running the script produces the plot:
226 | \footnotesize
227 | ```{r, out.width="85%"}
228 | ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
229 | geom_point()
230 | ```
231 |
232 |
233 |
234 | ## Mapping color
235 |
236 | Besides position, ggplot2 can also map variables to aesthetics such as color, shape, and transparency.
237 |
238 |
239 |
240 | \footnotesize
241 | ```{r, out.width="75%"}
242 | ggplot(data = mpg, aes(x = displ, y = hwy, color = class) ) +
243 | geom_point()
244 | ```
245 |
246 | This plot shows displ against hwy for every car and uses color to distinguish the car types.
247 |
248 |
249 |
250 | ## More mappings
251 | Try the following code:
252 | \footnotesize
253 | ```{r, eval = FALSE}
254 | ggplot(data = mpg, aes(x = displ, y = hwy, size = class)) +
255 | geom_point()
256 | ```
257 |
258 |
259 | ```{r, eval = FALSE}
260 | ggplot(data = mpg, aes(x = displ, y = hwy, shape = class)) +
261 | geom_point()
262 | ```
263 |
264 |
265 | ```{r, eval = FALSE}
266 | ggplot(data = mpg, aes(x = displ, y = hwy, alpha = class)) +
267 | geom_point()
268 | ```
269 |
270 |
271 | ## Defaults
272 | Some default settings:
273 |
274 | ```{r out.width = '85%', echo = FALSE}
275 | knitr::include_graphics("images/a-14.png")
276 | ```
277 |
278 |
279 |
280 | ## Mapping vs. setting
281 |
282 | To give every point in the plot one fixed color, set the color outside `aes()`, for example:
283 |
284 | ```{r out.width = '65%'}
285 | mpg %>%
286 | ggplot(aes(displ, hwy)) +
287 | geom_point(color = "blue")
288 | ```
289 |
290 |
291 |
292 | ## More settings
293 | You can also try the following:
294 | \footnotesize
295 | ```{r, eval = FALSE}
296 | ggplot(mpg, aes(displ, hwy)) + geom_point(size = 5)
297 | ```
298 |
299 |
300 | ```{r, eval = FALSE}
301 | ggplot(mpg, aes(displ, hwy)) + geom_point(shape = 2)
302 | ```
303 |
304 |
305 | ```{r, eval = FALSE}
306 | ggplot(mpg, aes(displ, hwy)) + geom_point(alpha = 0.5)
307 | ```
308 |
309 |
310 |
311 | ## Question
312 | ```{r out.width = '100%', echo = FALSE}
313 | knitr::include_graphics("images/a-21.png")
314 | ```
315 |
316 | Why does `aes(color = "blue")` produce red points?
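
One way to investigate the question is to compare the colors ggplot2 actually computes for the two variants. This is a sketch using `layer_data()`; the object names `p_mapped` and `p_set` are made up for illustration.

```r
library(ggplot2)

# Mapping: inside aes(), "blue" is treated as data, i.e. a constant
# one-level variable that the default color scale then assigns a color to.
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = "blue"))

# Setting: outside aes(), "blue" is used directly as the point color.
p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(color = "blue")

# Inspect the colors each layer ends up drawing with
mapped_colour <- unique(layer_data(p_mapped)$colour)  # chosen by the scale
set_colour    <- unique(layer_data(p_set)$colour)     # taken literally
mapped_colour
set_colour
```

The mapped version also gets a legend with a single entry labelled "blue", which is a hint that the string was handled as a data value rather than as a color name.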
317 |
318 |
319 |
320 |
321 |
322 |
323 |
324 |
325 |
326 |
327 |
328 |
329 |
330 |
331 |
332 |
333 |
334 |
335 |
336 |
337 |
338 |
339 |
340 |
341 |
342 |
343 |
344 |
345 |
346 |
347 | ## Geometric objects
348 |
349 | `geom_point()` draws a scatter plot; `geom_smooth()` draws a smoothed curve instead.
350 |
351 | ```{r}
352 | ggplot(data = mpg, aes(x = displ, y = hwy)) +
353 | geom_smooth()
354 | ```
355 |
356 |
357 |
358 | ## Layering
359 |
360 | ```{r}
361 | ggplot(data = mpg, aes(x = displ, y = hwy)) +
362 | geom_point() +
363 | geom_smooth()
364 | ```
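
Layers are drawn in the order they are added, and each layer takes its own arguments. The sketch below makes the points semi-transparent and swaps the default smoother for a straight-line fit; the argument values are illustrative choices rather than anything prescribed by the slides.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +                                  # first layer: drawn underneath
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)    # second layer: drawn on top

n_layers <- length(p$layers)  # number of layers stored in the plot object
n_layers
```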
365 |
366 |
367 |
368 |
369 |
370 | ## Global vs. Local
371 | \footnotesize
372 |
373 | ```{r, eval=FALSE}
374 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
375 | geom_point()
376 | ```
377 |
378 | ```{r, eval=FALSE}
379 | ggplot(mpg) +
380 | geom_point( aes(x = displ, y = hwy, color = class) )
381 | ```
382 |
383 |
384 | \begincols[T]
385 | \begincol[T]{.49\textwidth}
386 | ```{r, echo=FALSE, out.width= "100%"}
387 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
388 | geom_point()
389 | ```
390 | \endcol
391 |
392 | \begincol[T]{.49\textwidth}
393 |
394 | ```{r, echo=FALSE, out.width= "100%"}
395 | ggplot(mpg) +
396 | geom_point( aes(x = displ, y = hwy, color = class) )
397 | ```
398 | \endcol
399 | \endcols
400 |
401 | As you can see, the two code snippets produce the same plot, but they mean different things.
402 |
403 |
404 |
405 | ## Global vs. Local
406 |
407 | - If the mapping `aes()` is written inside `ggplot()`, then `x = displ, y = hwy, color = class` is a global mapping.
408 |
409 | ```{r, eval=FALSE}
410 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
411 | geom_point()
412 | ```
413 |
414 | - `geom_point()` lacks the mappings it needs to draw, so it inherits them from the global mapping.
415 |
416 |
417 |
418 | ## Global vs. Local
419 | - If the mapping `aes()` is written inside the geom `geom_point()`, it is a local mapping that applies to that layer only.
420 |
421 | ```{r, eval=FALSE}
422 | ggplot(mpg) +
423 | geom_point(aes(x = displ, y = hwy, color = class))
424 | ```
425 |
426 | - `geom_point()` already has the mappings it needs to draw, so it does not inherit them from the global mapping.
427 |
428 |
429 |
430 |
431 | ## Global vs. Local
432 | ```{r, eval=FALSE, warning=FALSE, message=FALSE}
433 | ggplot(mpg, aes(x = displ, y = hwy)) +
434 | geom_point(aes(color = class)) +
435 | geom_smooth()
436 | ```
437 |
438 | Here both `geom_point()` and `geom_smooth()` inherit the global x and y mappings; the color mapping is local to `geom_point()`.
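
The split between global and local mappings can be checked on the plot object itself. This is a sketch that relies on the `mapping` and `layers` fields of a ggplot object, which are handy for teaching but are internal details that may change between ggplot2 versions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()

global_aes <- names(p$mapping)               # aesthetics declared in ggplot()
point_aes  <- names(p$layers[[1]]$mapping)   # aesthetics declared in geom_point()
smooth_aes <- names(p$layers[[2]]$mapping)   # geom_smooth() declares none of its own

global_aes
point_aes   # ggplot2 stores `color` under the name "colour"
smooth_aes
```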
439 |
440 |
441 |
442 | ## Global vs. Local
443 |
444 | ```{r, out.width= "65%"}
445 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
446 | geom_point(aes(color = factor(cyl)))
447 | ```
448 | The local mapping
449 | `aes(color = )` already exists, so the layer does not inherit the color from the global mapping and uses its own mapping instead.
450 |
451 |
452 |
453 |
454 | ## Question
455 | Consider carefully how the following two code snippets differ:
456 | ```{r, eval=FALSE}
457 | ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
458 | geom_smooth(method = lm) +
459 | geom_point()
460 | ```
461 |
462 |
463 | ```{r, eval=FALSE}
464 | ggplot(mpg, aes(x = displ, y = hwy)) +
465 | geom_smooth(method = lm) +
466 | geom_point(aes(color = class))
467 | ```
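
A hint for the question: count how many groups `geom_smooth()` fits in each version. The sketch below uses `layer_data()` on the first layer of each plot; the names `p_global` and `p_local` are hypothetical, and `formula = y ~ x` is spelled out only to silence the informational message.

```r
library(ggplot2)

# color is global, so geom_smooth() inherits it and fits one line per class
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_smooth(method = lm, formula = y ~ x) +
  geom_point()

# color is local to geom_point(), so geom_smooth() fits a single line
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = lm, formula = y ~ x) +
  geom_point(aes(color = class))

# Layer 1 is geom_smooth() in both plots
n_global <- length(unique(layer_data(p_global, 1)$group))
n_local  <- length(unique(layer_data(p_local, 1)$group))
c(global = n_global, local = n_local)
```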
468 |
469 |
470 |
471 |
472 | ## Saving plots
473 |
474 | Use `ggsave()` to save a plot in the format you need, such as ".pdf" or ".png".
475 |
476 | ```{r, eval = FALSE}
477 | p <- ggplot(mpg, aes(x = displ, y = hwy)) +
478 | geom_smooth(method = lm) +
479 | geom_point(aes(color = class)) +
480 | ggtitle("This is my first plot")
481 |
482 | ggsave(
483 | filename = "myplot.pdf",
484 | plot = p,
485 | width = 8,
486 | height = 6
487 | )
488 | ```
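
`ggsave()` picks the output format from the file extension. As a sketch of saving a raster image instead of a PDF, the example below writes a PNG to a temporary directory; the path, the explicit `units`, and the `dpi` value are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

out_file <- file.path(tempdir(), "myplot.png")  # hypothetical output path
ggsave(
  filename = out_file,
  plot     = p,
  width    = 8,
  height   = 6,
  units    = "in",  # width and height are given in inches
  dpi      = 300    # resolution, relevant for raster formats such as PNG
)

saved <- file.exists(out_file)
saved
```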
489 |
490 |
491 |
492 |
493 |
--------------------------------------------------------------------------------
/06_ggplot2/06_ggplot2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/06_ggplot2.pdf
--------------------------------------------------------------------------------
/06_ggplot2/demo_data/temp_carbon.csv:
--------------------------------------------------------------------------------
1 | year,temp_anomaly,land_anomaly,ocean_anomaly,carbon_emissions
2 | 1880,-0.11,-0.48,-0.01,236
3 | 1881,-0.08,-0.4,0.01,243
4 | 1882,-0.1,-0.48,0,256
5 | 1883,-0.18,-0.66,-0.04,272
6 | 1884,-0.26,-0.69,-0.14,275
7 | 1885,-0.25,-0.56,-0.17,277
8 | 1886,-0.24,-0.51,-0.17,281
9 | 1887,-0.28,-0.47,-0.23,295
10 | 1888,-0.13,-0.41,-0.05,327
11 | 1889,-0.09,-0.31,-0.02,327
12 | 1890,-0.34,-0.51,-0.29,356
13 | 1891,-0.25,-0.52,-0.15,372
14 | 1892,-0.3,-0.49,-0.23,374
15 | 1893,-0.32,-0.54,-0.24,370
16 | 1894,-0.3,-0.38,-0.27,383
17 | 1895,-0.23,-0.39,-0.17,406
18 | 1896,-0.09,-0.33,0,419
19 | 1897,-0.09,-0.26,-0.03,440
20 | 1898,-0.26,-0.37,-0.22,465
21 | 1899,-0.15,-0.21,-0.13,507
22 | 1900,-0.07,-0.15,-0.05,534
23 | 1901,-0.15,-0.12,-0.16,552
24 | 1902,-0.25,-0.26,-0.24,566
25 | 1903,-0.37,-0.37,-0.37,617
26 | 1904,-0.45,-0.44,-0.46,624
27 | 1905,-0.27,-0.33,-0.25,663
28 | 1906,-0.21,-0.17,-0.22,707
29 | 1907,-0.38,-0.62,-0.29,784
30 | 1908,-0.43,-0.44,-0.43,750
31 | 1909,-0.44,-0.43,-0.45,785
32 | 1910,-0.4,-0.36,-0.42,819
33 | 1911,-0.44,-0.48,-0.43,836
34 | 1912,-0.34,-0.48,-0.28,879
35 | 1913,-0.32,-0.31,-0.32,943
36 | 1914,-0.14,-0.06,-0.17,850
37 | 1915,-0.09,-0.08,-0.1,838
38 | 1916,-0.32,-0.46,-0.26,901
39 | 1917,-0.4,-0.63,-0.29,955
40 | 1918,-0.3,-0.5,-0.21,936
41 | 1919,-0.25,-0.33,-0.21,806
42 | 1920,-0.23,-0.36,-0.18,932
43 | 1921,-0.16,-0.15,-0.17,803
44 | 1922,-0.25,-0.27,-0.24,845
45 | 1923,-0.25,-0.29,-0.24,970
46 | 1924,-0.24,-0.25,-0.24,963
47 | 1925,-0.18,-0.15,-0.19,975
48 | 1926,-0.07,-0.02,-0.1,983
49 | 1927,-0.17,-0.22,-0.16,1062
50 | 1928,-0.18,-0.15,-0.2,1065
51 | 1929,-0.33,-0.49,-0.27,1145
52 | 1930,-0.11,-0.13,-0.11,1053
53 | 1931,-0.06,-0.02,-0.08,940
54 | 1932,-0.13,-0.03,-0.17,847
55 | 1933,-0.26,-0.36,-0.22,893
56 | 1934,-0.11,-0.06,-0.13,973
57 | 1935,-0.16,-0.17,-0.15,1027
58 | 1936,-0.12,-0.12,-0.12,1130
59 | 1937,-0.01,-0.02,-0.01,1209
60 | 1938,-0.02,0.17,-0.1,1142
61 | 1939,0.01,0.1,-0.03,1192
62 | 1940,0.16,0.07,0.2,1299
63 | 1941,0.27,0.1,0.35,1334
64 | 1942,0.11,0.06,0.13,1342
65 | 1943,0.11,0.07,0.12,1391
66 | 1944,0.28,0.19,0.32,1383
67 | 1945,0.18,-0.07,0.3,1160
68 | 1946,-0.01,-0.01,-0.01,1238
69 | 1947,-0.04,0.04,-0.07,1392
70 | 1948,-0.05,0.05,-0.1,1469
71 | 1949,-0.07,-0.07,-0.08,1419
72 | 1950,-0.15,-0.32,-0.09,1630
73 | 1951,0,-0.06,0.02,1767
74 | 1952,0.05,-0.05,0.08,1795
75 | 1953,0.13,0.2,0.1,1841
76 | 1954,-0.1,-0.12,-0.09,1865
77 | 1955,-0.13,-0.11,-0.13,2042
78 | 1956,-0.18,-0.4,-0.1,2177
79 | 1957,0.07,-0.04,0.11,2270
80 | 1958,0.13,0.15,0.12,2330
81 | 1959,0.08,0.09,0.08,2454
82 | 1960,0.05,0,0.07,2569
83 | 1961,0.1,0.12,0.09,2580
84 | 1962,0.11,0.16,0.09,2686
85 | 1963,0.12,0.21,0.08,2833
86 | 1964,-0.14,-0.22,-0.11,2995
87 | 1965,-0.07,-0.12,-0.05,3130
88 | 1966,-0.01,-0.05,0.01,3288
89 | 1967,0,0.01,-0.01,3393
90 | 1968,-0.03,-0.11,0.01,3566
91 | 1969,0.11,-0.08,0.17,3780
92 | 1970,0.06,0.05,0.06,4053
93 | 1971,-0.07,-0.02,-0.09,4208
94 | 1972,0.04,-0.17,0.11,4376
95 | 1973,0.19,0.34,0.14,4614
96 | 1974,-0.06,-0.18,-0.02,4623
97 | 1975,0.01,0.14,-0.04,4596
98 | 1976,-0.07,-0.23,-0.01,4864
99 | 1977,0.21,0.25,0.19,5016
100 | 1978,0.12,0.1,0.12,5074
101 | 1979,0.23,0.17,0.24,5357
102 | 1980,0.28,0.31,0.26,5301
103 | 1981,0.32,0.52,0.25,5138
104 | 1982,0.19,0.11,0.22,5094
105 | 1983,0.36,0.5,0.3,5075
106 | 1984,0.17,0.06,0.2,5258
107 | 1985,0.16,0.1,0.18,5417
108 | 1986,0.23,0.3,0.21,5583
109 | 1987,0.38,0.45,0.36,5725
110 | 1988,0.39,0.58,0.32,5936
111 | 1989,0.29,0.36,0.27,6066
112 | 1990,0.45,0.66,0.37,6074
113 | 1991,0.39,0.53,0.34,6142
114 | 1992,0.24,0.24,0.23,6078
115 | 1993,0.28,0.35,0.25,6070
116 | 1994,0.34,0.48,0.29,6174
117 | 1995,0.47,0.78,0.35,6305
118 | 1996,0.32,0.35,0.31,6448
119 | 1997,0.51,0.64,0.46,6556
120 | 1998,0.65,0.98,0.52,6576
121 | 1999,0.44,0.78,0.31,6561
122 | 2000,0.42,0.62,0.34,6733
123 | 2001,0.57,0.84,0.46,6893
124 | 2002,0.62,0.95,0.49,6994
125 | 2003,0.63,0.94,0.52,7376
126 | 2004,0.58,0.81,0.49,7743
127 | 2005,0.66,1.08,0.5,8042
128 | 2006,0.63,0.97,0.5,8336
129 | 2007,0.61,1.12,0.43,8503
130 | 2008,0.54,0.89,0.41,8776
131 | 2009,0.64,0.9,0.54,8697
132 | 2010,0.72,1.14,0.56,9128
133 | 2011,0.57,0.91,0.44,9503
134 | 2012,0.63,0.95,0.51,9673
135 | 2013,0.67,1.03,0.53,9773
136 | 2014,0.73,1.01,0.63,9855
137 |
--------------------------------------------------------------------------------
/06_ggplot2/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/06_ggplot2/images/Paradox1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/Paradox1.pdf
--------------------------------------------------------------------------------
/06_ggplot2/images/Paradox2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/Paradox2.pdf
--------------------------------------------------------------------------------
/06_ggplot2/images/Paradox3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/Paradox3.pdf
--------------------------------------------------------------------------------
/06_ggplot2/images/a-14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-14.png
--------------------------------------------------------------------------------
/06_ggplot2/images/a-20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-20.png
--------------------------------------------------------------------------------
/06_ggplot2/images/a-21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-21.png
--------------------------------------------------------------------------------
/06_ggplot2/images/a-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/a-3.png
--------------------------------------------------------------------------------
/06_ggplot2/images/cholera_a.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/cholera_a.pdf
--------------------------------------------------------------------------------
/06_ggplot2/images/cholera_b.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/cholera_b.pdf
--------------------------------------------------------------------------------
/06_ggplot2/images/cholera_c.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/cholera_c.pdf
--------------------------------------------------------------------------------
/06_ggplot2/images/ggplot_template.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/ggplot_template.png
--------------------------------------------------------------------------------
/06_ggplot2/images/mapping.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/mapping.png
--------------------------------------------------------------------------------
/06_ggplot2/images/tidyverse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/06_ggplot2/images/tidyverse.png
--------------------------------------------------------------------------------
/09_stringr/09_stringr.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Chapter 9: String Processing"
3 | author: "王敏杰"
4 | institute: "四川师范大学"
5 | date: "\\today"
6 | fontsize: 12pt
7 | output: binb::metropolis
8 | section-titles: true
9 | #toc: true
10 | header-includes:
11 | - \usepackage[fontset = fandol]{ctex}
12 | - \input{header.tex}
13 | link-citations: yes
14 | colorlinks: yes
15 | linkcolor: red
16 | classoption: "dvipsnames,UTF8"
17 | ---
18 |
19 | ```{r setup, include=FALSE}
20 | options(digits = 3)
21 | knitr::opts_chunk$set(
22 | comment = "#>",
23 | echo = TRUE,
24 | collapse = TRUE,
25 | message = FALSE,
26 | warning = FALSE,
27 | out.width = "75%",
28 | fig.asp = 0.618, # 1 / phi
29 | fig.show = "hold",
30 | fig.showtext = TRUE
31 | )
32 | ```
33 |
34 |
35 | # Question
36 |
37 | ## The problem
38 |
39 | Here is a dataset of address information:
40 | ```{r echo=FALSE, message=FALSE, warning=FALSE}
41 | library(tidyverse)
42 | library(stringr)
43 | library(knitr)
44 | library(printr)
45 |
46 | d <- tibble::tribble(
47 | ~No, ~address,
48 | 1L, "Sichuan Univ, Coll Chem",
49 | 2L, "Sichuan Univ, Coll Elect Engn",
50 | 3L, "Sichuan Univ, Dept Phys",
51 | 4L, "Sichuan Univ, Coll Life Sci",
52 | 6L, "Sichuan Univ, Food Engn",
53 | 7L, "Sichuan Univ, Coll Phys",
54 | 8L, "Sichuan Univ, Sch Business",
55 | 9L, "Wuhan Univ, Mat Sci"
56 | )
57 |
58 | d
59 | ```
60 |
61 | - How can we extract the college that follows `Sichuan Univ`?
62 |
63 |
64 | ```{r eval=FALSE, include=FALSE}
65 | d %>% dplyr::mutate(
66 | coll = str_extract_all(address, "(?<=Sichuan Univ,).*")
67 | ) %>%
68 | tidyr::unnest(coll, keep_empty = TRUE)
69 | ```
70 |
71 |
72 | ```{r eval=FALSE, include=FALSE}
73 | d %>% mutate(
74 | coll = str_remove_all(address, ".*,")
75 | )
76 | ```
77 |
78 | ```{r eval=FALSE, include=FALSE}
79 | d %>% tidyr::separate(
80 | address, into = c("univ", "coll"), sep = ",", remove = FALSE
81 | )
82 | ```
83 |
84 |
85 | ```{r eval=FALSE, include=FALSE}
86 | d %>%
87 | tidyr::extract(
88 | address, c("univ", "coll"), "(Sichuan Univ), (.+)",
89 | remove = FALSE
90 | )
91 | ```
92 |
93 |
94 |
95 | # Regular expressions
96 |
97 |
98 | ## What is a regular expression?
99 |
100 |
101 |
102 |
103 |
104 |
105 |
106 |
107 |
108 |
109 |
110 |
111 |
112 |
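113 | A regular expression (regex) is a powerful, convenient, and efficient tool for processing text. It describes a pattern for matching strings, and is useful for tasks such as:
114 |
115 | - text with a fixed format
116 | - telephone numbers
117 | - URLs and e-mail addresses
118 | - date formats
119 | - parsing web pages
120 | - and so on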
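As a small preview of such a pattern, the sketch below uses `str_detect()` to test whether strings look like ISO-style dates. The sample strings and the pattern are made up for this illustration; the pieces of the pattern (`^`, `\\d`, `{n}`, `$`) are explained on the later slides.

```r
library(stringr)

# Hypothetical sample strings, invented for illustration
x <- c("2020-07-28", "28/07/2020", "no date here")

# Four digits, a dash, two digits, a dash, two digits, and nothing else
date_pattern <- "^\\d{4}-\\d{2}-\\d{2}$"

is_date <- str_detect(x, date_pattern)
is_date
```

Only the first string has the required shape, so only it is flagged.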
121 |
122 |
123 |
124 |
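125 | ## The stringr package
126 | - Regular expressions are not specific to R; in fact, most programming languages support them (e.g. Perl, Python, Java, Ruby).
127 |
128 | - Many R functions accept regular expressions. The stringr package, developed by Hadley Wickham, gives them a simple and consistent interface, so it is the package we introduce today.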
129 |
130 | ```{r out.width = '20%', fig.align='center', echo = FALSE}
131 | knitr::include_graphics("images/hex-stringr.png")
132 | ```
133 |
134 |
135 | ```{r echo=TRUE, message=FALSE, warning=FALSE}
136 | library(stringr) #install.packages("stringr")
137 | ```
138 |
139 |
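140 | ## The stringr package
141 |
142 | \small
143 | - String basics
144 |     - String length
145 |     - Combining strings
146 |     - Subsetting strings
147 |
148 | - Pattern matching with regular expressions
149 |     - Basic matches
150 |     - Anchors
151 |     - Character classes and alternatives
152 |     - Repetition
153 |     - Grouping and backreferences
154 |
155 | - Solving practical problems
156 |     - Detecting matches
157 |     - Extracting matches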
158 |
159 |
160 |
161 |
162 |
163 | # String basics
164 |
165 | ## String length
166 |
167 | To get the length of a string, use `str_length()`:
168 | ```{r}
169 | str_length("R for data science")
170 | ```
171 |
172 | It also works on a character vector:
173 | ```{r}
174 | str_length(c("a", "R for data science", NA))
175 | ```
176 |
177 | ## String length
178 |
179 | Combined with dplyr functions, it is just as convenient inside a data frame:
180 | ```{r}
181 | data.frame(
182 | x = c("a", "R for data science", NA)
183 | ) %>%
184 | mutate(y = str_length(x))
185 | ```
186 |
187 |
188 |
189 |
190 |
191 |
192 |
193 | ## Combining strings
194 |
195 | To join strings together, use `str_c()`:
196 | ```{r}
197 | str_c("x", "y")
198 | ```
199 |
200 |
201 | Use `sep` to set the separator placed between the pieces:
202 | ```{r}
203 | str_c("x", "y", sep = ", ")
204 | ```
205 |
206 |
207 | ```{r}
208 | str_c(c("x", "y", "z"), sep = ", ")
209 | ```
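210 | Not what you expected? `sep` is inserted between the arguments of `str_c()`, and here there is only one argument, so nothing gets joined. See `?str_c` for details.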
211 |
212 |
213 |
214 | ## Combining strings
215 | ```{r}
216 | str_c(c("x", "y", "z"), c("x", "y", "z"), sep = ", ")
217 | ```
218 |
219 | Inside a data frame:
220 | ```{r}
221 | data.frame( x = c("I", "love", "you"),
222 | y = c("you", "like", "me") ) %>%
223 | mutate(z = str_c(x, y, sep = "|"))
224 | ```
225 |
226 |
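227 | ## Combining strings
228 |
229 | The `collapse` option first combines the vectors element by element, then collapses the result into a single string. Compare: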
230 |
231 | ```{r}
232 | str_c(c("x", "y", "z"), c("a", "b", "c"), sep = "|")
233 | ```
234 |
235 | ```{r}
236 | str_c(
237 | c("x", "y", "z"), c("a", "b", "c"), collapse = "|"
238 | )
239 | ```
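To make the difference between `sep` and `collapse` concrete, here is a minimal sketch that uses each one alone and then both together. The variable names are chosen for this illustration only.

```r
library(stringr)

v <- c("x", "y", "z")

# With a single vector, sep has nothing to separate: the vector comes back unchanged
only_sep <- str_c(v, sep = ", ")

# collapse joins the elements into one string
only_collapse <- str_c(v, collapse = ", ")

# Both together: sep joins element by element, then collapse joins the results
both <- str_c(v, c("a", "b", "c"), sep = "-", collapse = "|")

only_sep
only_collapse
both
```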
240 |
241 |
242 |
243 |
244 |
245 |
246 |
247 | ## Subsetting strings
248 |
249 | To extract part of a string, specify the start and end positions:
250 | ```{r}
251 | x <- c("Apple", "Banana", "Pear")
252 | str_sub(x, 1, 3)
253 | ```
254 |
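255 | Negative start and end positions count backwards from the end of the string. For example, the code below extracts the characters from the third-to-last position to the last: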
256 | ```{r}
257 | str_sub(x, -3, -1)
258 | ```
259 |
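260 | ## Subsetting strings
261 |
262 | `str_sub()` can also appear on the left-hand side of an assignment: the characters at the given positions are replaced by the new ones.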
263 | ```{r}
264 | x <- c("Apple", "Banana", "Pear")
265 | x
266 | ```
267 |
268 |
269 | ```{r}
270 | str_sub(x, 1, 1)
271 | ```
272 |
273 |
274 | ```{r}
275 | str_sub(x, 1, 1) <- "Q"
276 | x
277 | ```
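A common use of the assignment form is to change the case of the first letter. The following is a small sketch of that idea; it assumes only `str_sub()` and `str_to_upper()` from stringr.

```r
library(stringr)

x <- c("apple", "banana", "pear")

# Replace the first character of each string with its upper-case version
str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1))
x
```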
278 |
279 |
280 |
281 |
282 |
283 |
284 | # Pattern matching with regular expressions
285 |
286 |
287 | ## Basic matches
288 |
289 | `str_view()` checks whether each string matches a pattern.
290 |
291 | If it does, the match is highlighted:
292 | ```{r, out.width="300%"}
293 | x <- c("apple", "banana", "pear")
294 | str_view(string = x, pattern = "an")
295 | ```
296 |
297 |
298 | ## Basic matches
299 | Sometimes we want the character `a` to have a character on both sides (that is, `a` sits between two characters, as in rap, bad, sad, wave, spear):
300 | ```{r, out.width="300%"}
301 | x <- c("apple", "banana", "pear")
302 | str_view(x, ".a.")
303 | ```
304 |
305 |
306 | ## 基础匹配
307 |
308 | \begincols[T]
309 | \begincol[T]{.49\textwidth}
310 |
311 | Here `.` matches any character (except a newline).
312 |
313 | ```{r, out.width="600%"}
314 | c("s.d") %>%
315 | str_view(".")
316 | ```
317 | \endcol
318 |
319 | \begincol[T]{.49\textwidth}
320 |
321 | What if we want to match a literal `.`?
322 | ```{r, out.width="600%"}
323 | c("s.d") %>%
324 | str_view("\\.")
325 | ```
326 |
327 | \endcol
328 | \endcols
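Why two backslashes? The regex that matches a literal dot is `\.`, but inside an R string the backslash must itself be escaped, so we type `"\\."`. The sketch below checks this; the variable names are illustrative.

```r
library(stringr)

# The R string "\\." contains two characters: a backslash and a dot
pattern <- "\\."
n_chars <- str_length(pattern)
writeLines(pattern) # prints the regex as the engine sees it

# The escaped dot matches only the literal "."
dots <- str_extract_all("s.d", pattern)[[1]]

# The unescaped dot matches every character
any_char <- str_extract_all("s.d", ".")[[1]]
```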
329 |
330 |
331 |
332 | ## Anchors
333 | ```{r}
334 | x <- c("apple", "banana", "pear")
335 | x
336 | ```
337 | \begincols[T]
338 | \begincol[T]{.49\textwidth}
339 |
340 | We want `a` at the start of the string:
341 | ```{r, out.width="600%"}
342 | str_view(x, "^a")
343 | ```
344 | \endcol
345 |
346 | \begincol[T]{.49\textwidth}
347 |
348 | We want `a` at the end of the string:
349 | ```{r, out.width="600%"}
350 | str_view(x, "a$")
351 | ```
352 | \endcol
353 | \endcols
354 |
355 |
356 |
357 |
358 | ## Anchors
359 | ```{r, out.width="300%"}
360 | x <- c("apple pie", "apple", "apple cake")
361 | str_view(x, "^apple$")
362 | ```
363 |
364 |
365 |
366 |
367 |
368 | ## Character classes and alternatives
369 |
370 | As noted above, `.` matches any character. Several other constructs also have a **special meaning**:
371 |
372 | * `\d`: matches any digit.
373 | * `\s`: matches any whitespace (e.g. space, tab, newline).
374 | * `[abc]`: matches a, b, or c.
375 | * `[^abc]`: matches anything except a, b, or c.
376 |
377 |
378 | ```{r, out.width="300%"}
379 | str_view(c("grey", "gray"), "gr[ea]y")
380 | ```
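The classes listed above can be tried out with `str_detect()` and `str_extract()`. This is a minimal sketch on made-up strings; note that `\d` and `\s` are written `"\\d"` and `"\\s"` inside an R string.

```r
library(stringr)

# Hypothetical sample strings, invented for illustration
x <- c("room 101", "abc", "a b")

# In an R string the backslash itself must be escaped
has_digit <- str_detect(x, "\\d")
has_space <- str_detect(x, "\\s")

# [^abc] matches the first character that is not a, b, or c
first_other <- str_extract(x, "[^abc]")
```

For `"abc"` there is no character outside the set, so `str_extract()` returns `NA` for that element.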
381 |
382 |
383 |
384 |
385 |
386 |
387 |
388 |
389 | ## Repetition
390 |
391 | To control how many times a pattern matches:
392 |
393 | * `?`: 0 or 1
394 | * `+`: 1 or more
395 | * `*`: 0 or more
396 |
397 |
398 | ```{r}
399 | x <- "Roman numerals: MDCCCLXXXVIII"
400 | ```
401 |
402 | \begincols[T]
403 | \begincol[T]{.49\textwidth}
404 |
405 | ```{r, out.width="600%"}
406 | str_view(x, "CC?")
407 | ```
408 |
409 | \endcol
410 | \begincol[T]{.49\textwidth}
411 |
412 | ```{r, out.width="600%"}
413 | str_view(x, "X+")
414 | ```
415 | \endcol
416 | \endcols
417 |
418 |
419 |
420 | ## Repetition
421 | To control how many times a pattern matches:
422 |
423 | * `{n}`: exactly n
424 | * `{n,}`: n or more
425 | * `{0,m}`: at most m
426 | * `{n,m}`: between n and m
427 |
428 |
429 |
430 | ## Repetition
431 | ```{r, out.width="300%"}
432 | x <- "Roman numerals: MDCCCLXXXVIII"
433 | str_view(x, "C{2}")
434 | str_view(x, "C{2,}")
435 | str_view(x, "C{2,3}")
436 | ```
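`str_view()` highlights matches visually; to see the matched text as plain strings, `str_extract()` can be used with the same patterns. A small sketch on the same Roman numeral:

```r
library(stringr)

x <- "Roman numerals: MDCCCLXXXVIII"

exactly_two   <- str_extract(x, "C{2}")   # exactly two C
two_or_more   <- str_extract(x, "C{2,}")  # two or more C, as many as possible
two_to_three  <- str_extract(x, "C{2,3}") # between two and three C
optional_c    <- str_extract(x, "CC?")    # one C, optionally followed by another
one_or_more_x <- str_extract(x, "X+")     # one or more X
```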
437 |
438 |
439 |
440 | ## Repetition
441 | - By default, `*` and `+` are **greedy**: they match as much as possible.
442 | - To make them lazy (match as little as possible), put a `?` after `*` or `+`.
443 |
444 |
445 | ```{r}
446 | x <- "Roman numerals: MDCCCLXXXVIII"
447 | ```
448 |
449 | \begincols[T]
450 | \begincol[T]{.49\textwidth}
451 | ```{r, out.width="600%"}
452 | str_view(x, "CLX+")
453 | ```
454 | \endcol
455 | \begincol[T]{.49\textwidth}
456 |
457 | ```{r, out.width="600%"}
458 | str_view(x, "CLX+?")
459 | ```
460 | \endcol
461 | \endcols
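The same comparison can be made with `str_extract()`, which returns the matched text directly. A minimal sketch:

```r
library(stringr)

x <- "Roman numerals: MDCCCLXXXVIII"

greedy <- str_extract(x, "CLX+")  # X+ takes as many X as it can
lazy   <- str_extract(x, "CLX+?") # X+? stops after the first X
```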
462 |
463 |
464 |
465 | ## Summary
466 |
467 | ```{r out.width = '100%', fig.align='center', echo = FALSE}
468 | knitr::include_graphics("images/regex_repeat.jpg")
469 | ```
470 |
471 |
472 |
473 |
474 | ## Grouping and backreferences
475 |
476 |
477 | ```{r}
478 | ft <- fruit %>% head(10)
479 | ft
480 | ```
481 |
482 | We want to find which of these words contain a doubled letter, such as `aa` or `pp`. Using what we have learned so far:
483 | ```{r, out.width="300%"}
484 | str_view(ft, ".{2}", match = TRUE)
485 | ```
486 |
487 | Not what we expected? `.{2}` matches any two characters, not the same character twice.
488 |
489 |
490 |
491 | ## Grouping and backreferences
492 | So we need a new technique, **grouping and backreferences**:
493 | ```{r, out.width="300%"}
494 | str_view(ft, "(.)\\1", match = TRUE)
495 | ```
496 |
497 |
498 | ## Grouping and backreferences
499 | ```{r, eval=FALSE}
500 | str_view(ft, "(.)\\1", match = TRUE)
501 | ```
502 |
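503 | - `.` matches any character
504 | - `(.)` wraps the match in parentheses, creating a capture group that can be referred to as `\\1`; with two pairs of parentheses, the groups are `\\1` and `\\2`
505 | - `\\1` is a backreference: it stands for the text matched by the corresponding `(.)`
506 |
507 | So `(.)\\1` means: match any character, followed by **the same character** again.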
508 |
509 |
510 |
511 | ## Grouping and backreferences
512 | To match patterns like `abab` or `wcwc`:
513 | ```{r, out.width="300%"}
514 | str_view(ft, "(..)\\1", match = TRUE)
515 | ```
516 |
517 | And for patterns like `abba` or `wccw`?
518 |
519 | ```{r, out.width="300%"}
520 | str_view(ft, "(.)(.)\\2\\1", match = TRUE)
521 | ```
522 |
523 | Pretty neat, isn't it?
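The three backreference patterns can be compared side by side with `str_subset()`, which keeps only the strings that match. The word list below is hypothetical, chosen so that each pattern selects something different.

```r
library(stringr)

# Hypothetical words chosen for illustration
words <- c("apple", "banana", "papa", "abba", "pear")

doubled       <- str_subset(words, "(.)\\1")       # same letter twice in a row
repeated_pair <- str_subset(words, "(..)\\1")      # a two-letter pair repeated
mirrored      <- str_subset(words, "(.)(.)\\2\\1") # abba-style pattern
```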
524 |
525 |
526 |
527 | # Advanced topics
528 |
529 |
530 | ## look ahead
531 |
532 | We want to match `Windows`, but only when it is immediately followed by one of `"95", "98", "NT", "2000"`:
533 | ```{r, out.width="300%"}
534 | win <- c("Windows2000", "Windows", "Windows3.1")
535 | str_view(win, "Windows(?=95|98|NT|2000)")
536 | ```
537 |
538 | ## look ahead
539 |
540 | ```{r, out.width="300%"}
541 | win <- c("Windows2000", "Windows", "Windows3.1")
542 | str_view(win, "Windows(?!95|98|NT|2000)")
543 | ```
544 |
545 |
546 |
547 |
548 |
549 |
550 |
551 | ## look behind
552 |
553 |
554 | ```{r, out.width="300%"}
555 | win <- c("2000Windows", "Windows", "3.1Windows")
556 | str_view(win, "(?<=95|98|NT|2000)Windows")
557 | ```
558 |
559 | ## look behind
560 |
561 | ```{r, out.width="300%"}
562 | win <- c("2000Windows", "Windows", "3.1Windows")
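563 | str_view(win, "(?<!95|98|NT|2000)Windows")
564 | ```
565 |
566 | # Solving practical problems
567 |
568 | ## Detecting whether a character vector matches a pattern
569 |
570 | ```{r}
571 | d <- tibble(x = c("apple", "banana", "pear")) # assumed: same data as the next slide
591 | d %>% mutate(has_e = str_detect(x, "e"))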
592 | ```
593 |
594 |
595 |
596 | ## Detecting whether a character vector matches a pattern
597 | It is also handy for filtering:
598 | ```{r echo=FALSE}
599 | d <- tibble(x = c("apple", "banana", "pear") )
600 | d
601 | ```
602 |
603 | ```{r}
604 | d %>% filter(str_detect(x, "e"))
605 | ```
606 |
607 |
608 |
609 |
610 |
611 | ## Extracting matches
612 |
613 | We want to extract the number in the second column into a new column:
614 |
615 | \begincols[T]
616 | \begincol[T]{.3\textwidth}
617 |
618 | ```{r echo=FALSE}
619 | dt <- tibble(
620 | x = 1:4,
621 | y = c("wk 3", "week-1", "7", "w#9")
622 | )
623 | dt
624 | ```
625 | \endcol
626 | \begincol[T]{.69\textwidth}
627 |
628 | ```{r}
629 | dt %>% mutate(
630 | z = str_extract(y, "[0-9]")
631 | )
632 | ```
633 |
634 | \endcol
635 | \endcols
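The pattern `[0-9]` extracts a single digit, which is enough for the data on this slide. If the numbers can have several digits, a `+` is needed. The values below are hypothetical, made up to show the difference.

```r
library(stringr)

# Hypothetical values with more than one digit
y <- c("wk 3", "week-12", "7", "w#109")

one_digit  <- str_extract(y, "[0-9]")  # only the first digit
all_digits <- str_extract(y, "[0-9]+") # the whole run of digits

# str_extract() returns strings; convert if numbers are needed
as_number <- as.integer(all_digits)
```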
636 |
637 |
638 |
639 |
640 |
641 | ## Extracting matches
642 |
643 |
644 | Back to the question from the start of class: how can we extract the college that follows `Sichuan Univ`?
645 | ```{r echo=FALSE, message=FALSE, warning=FALSE}
646 | d <- tibble::tribble(
647 | ~No, ~address,
648 | 1L, "Sichuan Univ, Coll Chem",
649 | 2L, "Sichuan Univ, Coll Elect Engn",
650 | 3L, "Sichuan Univ, Dept Phys",
651 | 4L, "Sichuan Univ, Coll Life Sci",
652 | 6L, "Sichuan Univ, Food Engn",
653 | 7L, "Sichuan Univ, Coll Phys",
654 | 8L, "Sichuan Univ, Sch Business",
655 | 9L, "Wuhan Univ, Mat Sci"
656 | )
657 |
658 | d
659 | ```
660 |
661 |
662 | ## Extracting matches
663 | \footnotesize
664 | ```{r}
665 | # One string per row, so no unnest(); the space after the comma avoids a leading blank
666 | d %>% mutate(
667 |   coll = str_extract(address, "(?<=Sichuan Univ, ).*")
668 | )
669 | ```
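One detail worth checking: a look-behind that stops at the comma leaves the space after the comma inside the match. The sketch below shows this on two of the addresses and removes the space with `str_trim()`; rows that do not match stay `NA`.

```r
library(stringr)

address <- c("Sichuan Univ, Coll Chem", "Wuhan Univ, Mat Sci")

# Look-behind ending at the comma: the match keeps the leading space
raw <- str_extract(address, "(?<=Sichuan Univ,).*")

# str_trim() removes surrounding whitespace; NA stays NA
trimmed <- str_trim(raw)
```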
670 |
671 |
672 |
--------------------------------------------------------------------------------
/09_stringr/09_stringr.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/09_stringr/09_stringr.pdf
--------------------------------------------------------------------------------
/09_stringr/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{
18 | \begin{columns}} \def\begincol{\begin{column}} \def\endcol{\end{column}} \def\endcols{\end{columns}
19 | }
20 |
21 |
22 |
--------------------------------------------------------------------------------
/09_stringr/images/hex-stringr.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/09_stringr/images/hex-stringr.png
--------------------------------------------------------------------------------
/09_stringr/images/regex_repeat.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/09_stringr/images/regex_repeat.jpg
--------------------------------------------------------------------------------
/15_eda02/15_reproducible.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exploratory Data Analysis with the Tidyverse"
3 | subtitle: "A data story about penguins"
4 | author: "诗与远方"
5 | date: "`r Sys.Date()`"
6 | output:
7 | pdf_document:
8 | latex_engine: xelatex
9 | extra_dependencies:
10 | ctex: UTF8
11 | number_sections: yes
12 | #toc: yes
13 | df_print: kable
14 | classoptions: "hyperref, 12pt, a4paper"
15 | ---
16 |
17 |
18 | ```{r setup, include=FALSE}
19 | knitr::opts_chunk$set(
20 | echo = TRUE,
21 | message = FALSE,
22 | warning = FALSE,
23 | fig.align = "center"
24 | )
25 | ```
26 |
27 |
28 |
29 | # A data story
30 |
31 | Today we tell a data story about penguins. The data come from [here](https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv), and the image from [here](https://github.com/allisonhorst/palmerpenguins).
32 |
33 | ```{r out.width = '100%', echo = FALSE}
34 | knitr::include_graphics("images/penguins.png")
35 | ```
36 |
37 |
38 |
39 | # Data
40 |
41 | ## Importing the data
42 |
43 | The data are available from the package as `palmerpenguins::penguins`, or can be read from the local file `penguins.csv`.
44 | We use the latter approach:
45 |
46 | ```{r}
47 | library(tidyverse)
48 | penguins <- read_csv("./demo_data/penguins.csv")
49 | penguins %>% head(5)
50 | ```
51 |
52 |
53 |
54 | ## Variables
55 |
56 | |variable |class |description |
57 | |:-----------------|:---------|:-----------|
58 | |species |character | Penguin species (Adelie, Gentoo, Chinstrap) |
59 | |island |character | Island (Biscoe, Dream, Torgersen) |
60 | |bill_length_mm |double | Bill (culmen) length, in mm |
61 | |bill_depth_mm |double | Bill (culmen) depth, in mm |
62 | |flipper_length_mm |double | Flipper length, in mm |
63 | |body_mass_g |double | Body mass, in g |
64 | |sex |character | Sex |
65 | |year |double | Year of the record |
66 |
67 |
68 |
69 | ```{r out.width = '86%', echo = FALSE}
70 | knitr::include_graphics("images/culmen_depth.png")
71 | ```
72 |
73 | ## Data cleaning
74 | ```{r}
75 | penguins %>% filter_all(any_vars(is.na(.)))
76 | ```
77 |
78 | ```{r}
79 | d <- penguins %>% drop_na()
80 | d %>% head()
81 | ```
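Before dropping rows, it can help to see how many values are missing in each column. The sketch below does this on a tiny hypothetical stand-in for the penguins data (the `toy` tibble is invented for illustration), so that it runs without the CSV file.

```r
library(dplyr)
library(tidyr)

# Tiny hypothetical stand-in for the penguins data
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo"),
  bill_length_mm = c(39.1, NA, 46.1),
  sex = c("male", NA, NA)
)

# Number of missing values in each column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# drop_na() keeps only the rows with no missing value at all
complete <- toy %>% drop_na()
```

The same two steps apply unchanged to `penguins`.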
82 |
83 | # Exploratory analysis
84 |
85 |
86 | ## How many penguins of each species?
87 | ```{r}
88 | d %>% count(species, sort = TRUE)
89 | ```
90 |
91 | ## How many penguins on each island?
92 | ```{r}
93 | d %>% count(island, sort = TRUE)
94 | ```
95 |
96 | ## Mean and distribution of each attribute, by species
97 | ```{r}
98 | d %>%
99 | group_by(species) %>%
100 | summarise(
101 | across(where(is.numeric), ~ mean(.x, na.rm = TRUE))
102 | )
103 | ```
104 | ```{r}
105 | d %>%
106 | ggplot(aes( x = bill_length_mm)) +
107 | geom_density() +
108 | facet_wrap(vars(species), scales = "free")
109 | ```
110 |
111 | ```{r}
112 | library(ggridges)
113 | d %>%
114 | ggplot(aes( x = bill_depth_mm, y = species, fill = species) ) +
115 | ggridges::geom_density_ridges()
116 |
117 | ```
118 |
119 |
120 |
121 | ```{r}
122 | d %>% select(species, body_mass_g, ends_with("_mm")) %>%
123 | pivot_longer(
124 | cols = -species,
125 | names_to = "metric",
126 | values_to = "values"
127 | ) %>%
128 | ggplot(aes(x = values, y = species, fill = species) ) +
129 | ggridges::geom_density_ridges() +
130 | facet_wrap(vars(metric), scales = "free")
131 | ```
132 |
133 | ## Are bill length and bill depth related?
134 | ```{r}
135 | d %>%
136 | ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
137 | geom_point() +
138 | geom_smooth(method = lm, aes(color = species)) +
139 | geom_smooth(method = lm)
140 | ```
141 |
142 |
143 |
144 | ## Does body mass differ significantly between species?
145 | ```{r}
146 | d %>%
147 | ggplot(aes(x = species, y = body_mass_g)) +
148 | geom_boxplot() +
149 | geom_jitter()
150 | ```
151 | ```{r}
152 | aov(body_mass_g ~ species, data = d) %>% summary()
153 | ```
154 |
155 | ```{r}
156 | library(ggstatsplot)
157 | d %>%
158 | ggbetweenstats(
159 | x = species,
160 | y = body_mass_g,
161 | pairwise.comparisons = TRUE,
162 | pairwise.display = "significant"
163 | )
164 |
165 |
166 | ```
167 | This package helps us learn statistics as we plot.
168 |
169 |
170 | ## Can bill length and depth tell the species apart? The sex?
171 |
172 | This is the realm of machine learning.
173 | ```{r}
174 | d %>%
175 | ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)) +
176 | geom_point()
177 | ```
178 |
179 |
180 | ```{r}
181 | library(tidymodels)
182 | d <- d %>% mutate(species = factor(species))
183 |
184 | split <- initial_split(d)
185 | split
186 | training_data <- training(split)
187 | testing_data <- testing(split)
188 |
189 | model <- parsnip::nearest_neighbor() %>%
190 | set_engine("kknn") %>%
191 | set_mode("classification") %>%
192 | fit(species ~ bill_length_mm + bill_depth_mm, data = training_data)
193 |
194 |
195 | predict(model, new_data = testing_data) %>%
196 | bind_cols(testing_data) %>%
197 | count(species, .pred_class)
198 | ```
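The table of `species` against `.pred_class` is a confusion matrix; the share of rows where the two agree is the accuracy. The sketch below computes both on a small hypothetical `results` tibble, invented here to have the same two columns as the output of `predict()` bound to the test data.

```r
library(dplyr)

# Hypothetical predictions, shaped like
# predict(model, new_data = testing_data) %>% bind_cols(testing_data)
results <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  .pred_class = c("Adelie", "Gentoo", "Gentoo", "Chinstrap")
)

# Confusion table: one row per (truth, prediction) pair that occurs
confusion <- results %>% count(species, .pred_class)

# Accuracy: share of rows where the prediction equals the truth
acc <- results %>%
  summarise(accuracy = mean(species == .pred_class))
```

With real factor columns, converting both sides with `as.character()` before comparing is a safe way to avoid level mismatches.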
199 |
200 |
201 |
202 |
--------------------------------------------------------------------------------
/15_eda02/15_reproducible.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/15_reproducible.pdf
--------------------------------------------------------------------------------
/15_eda02/demo_data/penguins.csv:
--------------------------------------------------------------------------------
1 | species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
2 | Adelie,Torgersen,39.1,18.7,181,3750,male,2007
3 | Adelie,Torgersen,39.5,17.4,186,3800,female,2007
4 | Adelie,Torgersen,40.3,18,195,3250,female,2007
5 | Adelie,Torgersen,NA,NA,NA,NA,NA,2007
6 | Adelie,Torgersen,36.7,19.3,193,3450,female,2007
7 | Adelie,Torgersen,39.3,20.6,190,3650,male,2007
8 | Adelie,Torgersen,38.9,17.8,181,3625,female,2007
9 | Adelie,Torgersen,39.2,19.6,195,4675,male,2007
10 | Adelie,Torgersen,34.1,18.1,193,3475,NA,2007
11 | Adelie,Torgersen,42,20.2,190,4250,NA,2007
12 | Adelie,Torgersen,37.8,17.1,186,3300,NA,2007
13 | Adelie,Torgersen,37.8,17.3,180,3700,NA,2007
14 | Adelie,Torgersen,41.1,17.6,182,3200,female,2007
15 | Adelie,Torgersen,38.6,21.2,191,3800,male,2007
16 | Adelie,Torgersen,34.6,21.1,198,4400,male,2007
17 | Adelie,Torgersen,36.6,17.8,185,3700,female,2007
18 | Adelie,Torgersen,38.7,19,195,3450,female,2007
19 | Adelie,Torgersen,42.5,20.7,197,4500,male,2007
20 | Adelie,Torgersen,34.4,18.4,184,3325,female,2007
21 | Adelie,Torgersen,46,21.5,194,4200,male,2007
22 | Adelie,Biscoe,37.8,18.3,174,3400,female,2007
23 | Adelie,Biscoe,37.7,18.7,180,3600,male,2007
24 | Adelie,Biscoe,35.9,19.2,189,3800,female,2007
25 | Adelie,Biscoe,38.2,18.1,185,3950,male,2007
26 | Adelie,Biscoe,38.8,17.2,180,3800,male,2007
27 | Adelie,Biscoe,35.3,18.9,187,3800,female,2007
28 | Adelie,Biscoe,40.6,18.6,183,3550,male,2007
29 | Adelie,Biscoe,40.5,17.9,187,3200,female,2007
30 | Adelie,Biscoe,37.9,18.6,172,3150,female,2007
31 | Adelie,Biscoe,40.5,18.9,180,3950,male,2007
32 | Adelie,Dream,39.5,16.7,178,3250,female,2007
33 | Adelie,Dream,37.2,18.1,178,3900,male,2007
34 | Adelie,Dream,39.5,17.8,188,3300,female,2007
35 | Adelie,Dream,40.9,18.9,184,3900,male,2007
36 | Adelie,Dream,36.4,17,195,3325,female,2007
37 | Adelie,Dream,39.2,21.1,196,4150,male,2007
38 | Adelie,Dream,38.8,20,190,3950,male,2007
39 | Adelie,Dream,42.2,18.5,180,3550,female,2007
40 | Adelie,Dream,37.6,19.3,181,3300,female,2007
41 | Adelie,Dream,39.8,19.1,184,4650,male,2007
42 | Adelie,Dream,36.5,18,182,3150,female,2007
43 | Adelie,Dream,40.8,18.4,195,3900,male,2007
44 | Adelie,Dream,36,18.5,186,3100,female,2007
45 | Adelie,Dream,44.1,19.7,196,4400,male,2007
46 | Adelie,Dream,37,16.9,185,3000,female,2007
47 | Adelie,Dream,39.6,18.8,190,4600,male,2007
48 | Adelie,Dream,41.1,19,182,3425,male,2007
49 | Adelie,Dream,37.5,18.9,179,2975,NA,2007
50 | Adelie,Dream,36,17.9,190,3450,female,2007
51 | Adelie,Dream,42.3,21.2,191,4150,male,2007
52 | Adelie,Biscoe,39.6,17.7,186,3500,female,2008
53 | Adelie,Biscoe,40.1,18.9,188,4300,male,2008
54 | Adelie,Biscoe,35,17.9,190,3450,female,2008
55 | Adelie,Biscoe,42,19.5,200,4050,male,2008
56 | Adelie,Biscoe,34.5,18.1,187,2900,female,2008
57 | Adelie,Biscoe,41.4,18.6,191,3700,male,2008
58 | Adelie,Biscoe,39,17.5,186,3550,female,2008
59 | Adelie,Biscoe,40.6,18.8,193,3800,male,2008
60 | Adelie,Biscoe,36.5,16.6,181,2850,female,2008
61 | Adelie,Biscoe,37.6,19.1,194,3750,male,2008
62 | Adelie,Biscoe,35.7,16.9,185,3150,female,2008
63 | Adelie,Biscoe,41.3,21.1,195,4400,male,2008
64 | Adelie,Biscoe,37.6,17,185,3600,female,2008
65 | Adelie,Biscoe,41.1,18.2,192,4050,male,2008
66 | Adelie,Biscoe,36.4,17.1,184,2850,female,2008
67 | Adelie,Biscoe,41.6,18,192,3950,male,2008
68 | Adelie,Biscoe,35.5,16.2,195,3350,female,2008
69 | Adelie,Biscoe,41.1,19.1,188,4100,male,2008
70 | Adelie,Torgersen,35.9,16.6,190,3050,female,2008
71 | Adelie,Torgersen,41.8,19.4,198,4450,male,2008
72 | Adelie,Torgersen,33.5,19,190,3600,female,2008
73 | Adelie,Torgersen,39.7,18.4,190,3900,male,2008
74 | Adelie,Torgersen,39.6,17.2,196,3550,female,2008
75 | Adelie,Torgersen,45.8,18.9,197,4150,male,2008
76 | Adelie,Torgersen,35.5,17.5,190,3700,female,2008
77 | Adelie,Torgersen,42.8,18.5,195,4250,male,2008
78 | Adelie,Torgersen,40.9,16.8,191,3700,female,2008
79 | Adelie,Torgersen,37.2,19.4,184,3900,male,2008
80 | Adelie,Torgersen,36.2,16.1,187,3550,female,2008
81 | Adelie,Torgersen,42.1,19.1,195,4000,male,2008
82 | Adelie,Torgersen,34.6,17.2,189,3200,female,2008
83 | Adelie,Torgersen,42.9,17.6,196,4700,male,2008
84 | Adelie,Torgersen,36.7,18.8,187,3800,female,2008
85 | Adelie,Torgersen,35.1,19.4,193,4200,male,2008
86 | Adelie,Dream,37.3,17.8,191,3350,female,2008
87 | Adelie,Dream,41.3,20.3,194,3550,male,2008
88 | Adelie,Dream,36.3,19.5,190,3800,male,2008
89 | Adelie,Dream,36.9,18.6,189,3500,female,2008
90 | Adelie,Dream,38.3,19.2,189,3950,male,2008
91 | Adelie,Dream,38.9,18.8,190,3600,female,2008
92 | Adelie,Dream,35.7,18,202,3550,female,2008
93 | Adelie,Dream,41.1,18.1,205,4300,male,2008
94 | Adelie,Dream,34,17.1,185,3400,female,2008
95 | Adelie,Dream,39.6,18.1,186,4450,male,2008
96 | Adelie,Dream,36.2,17.3,187,3300,female,2008
97 | Adelie,Dream,40.8,18.9,208,4300,male,2008
98 | Adelie,Dream,38.1,18.6,190,3700,female,2008
99 | Adelie,Dream,40.3,18.5,196,4350,male,2008
100 | Adelie,Dream,33.1,16.1,178,2900,female,2008
101 | Adelie,Dream,43.2,18.5,192,4100,male,2008
102 | Adelie,Biscoe,35,17.9,192,3725,female,2009
103 | Adelie,Biscoe,41,20,203,4725,male,2009
104 | Adelie,Biscoe,37.7,16,183,3075,female,2009
105 | Adelie,Biscoe,37.8,20,190,4250,male,2009
106 | Adelie,Biscoe,37.9,18.6,193,2925,female,2009
107 | Adelie,Biscoe,39.7,18.9,184,3550,male,2009
108 | Adelie,Biscoe,38.6,17.2,199,3750,female,2009
109 | Adelie,Biscoe,38.2,20,190,3900,male,2009
110 | Adelie,Biscoe,38.1,17,181,3175,female,2009
111 | Adelie,Biscoe,43.2,19,197,4775,male,2009
112 | Adelie,Biscoe,38.1,16.5,198,3825,female,2009
113 | Adelie,Biscoe,45.6,20.3,191,4600,male,2009
114 | Adelie,Biscoe,39.7,17.7,193,3200,female,2009
115 | Adelie,Biscoe,42.2,19.5,197,4275,male,2009
116 | Adelie,Biscoe,39.6,20.7,191,3900,female,2009
117 | Adelie,Biscoe,42.7,18.3,196,4075,male,2009
118 | Adelie,Torgersen,38.6,17,188,2900,female,2009
119 | Adelie,Torgersen,37.3,20.5,199,3775,male,2009
120 | Adelie,Torgersen,35.7,17,189,3350,female,2009
121 | Adelie,Torgersen,41.1,18.6,189,3325,male,2009
122 | Adelie,Torgersen,36.2,17.2,187,3150,female,2009
123 | Adelie,Torgersen,37.7,19.8,198,3500,male,2009
124 | Adelie,Torgersen,40.2,17,176,3450,female,2009
125 | Adelie,Torgersen,41.4,18.5,202,3875,male,2009
126 | Adelie,Torgersen,35.2,15.9,186,3050,female,2009
127 | Adelie,Torgersen,40.6,19,199,4000,male,2009
128 | Adelie,Torgersen,38.8,17.6,191,3275,female,2009
129 | Adelie,Torgersen,41.5,18.3,195,4300,male,2009
130 | Adelie,Torgersen,39,17.1,191,3050,female,2009
131 | Adelie,Torgersen,44.1,18,210,4000,male,2009
132 | Adelie,Torgersen,38.5,17.9,190,3325,female,2009
133 | Adelie,Torgersen,43.1,19.2,197,3500,male,2009
134 | Adelie,Dream,36.8,18.5,193,3500,female,2009
135 | Adelie,Dream,37.5,18.5,199,4475,male,2009
136 | Adelie,Dream,38.1,17.6,187,3425,female,2009
137 | Adelie,Dream,41.1,17.5,190,3900,male,2009
138 | Adelie,Dream,35.6,17.5,191,3175,female,2009
139 | Adelie,Dream,40.2,20.1,200,3975,male,2009
140 | Adelie,Dream,37,16.5,185,3400,female,2009
141 | Adelie,Dream,39.7,17.9,193,4250,male,2009
142 | Adelie,Dream,40.2,17.1,193,3400,female,2009
143 | Adelie,Dream,40.6,17.2,187,3475,male,2009
144 | Adelie,Dream,32.1,15.5,188,3050,female,2009
145 | Adelie,Dream,40.7,17,190,3725,male,2009
146 | Adelie,Dream,37.3,16.8,192,3000,female,2009
147 | Adelie,Dream,39,18.7,185,3650,male,2009
148 | Adelie,Dream,39.2,18.6,190,4250,male,2009
149 | Adelie,Dream,36.6,18.4,184,3475,female,2009
150 | Adelie,Dream,36,17.8,195,3450,female,2009
151 | Adelie,Dream,37.8,18.1,193,3750,male,2009
152 | Adelie,Dream,36,17.1,187,3700,female,2009
153 | Adelie,Dream,41.5,18.5,201,4000,male,2009
154 | Gentoo,Biscoe,46.1,13.2,211,4500,female,2007
155 | Gentoo,Biscoe,50,16.3,230,5700,male,2007
156 | Gentoo,Biscoe,48.7,14.1,210,4450,female,2007
157 | Gentoo,Biscoe,50,15.2,218,5700,male,2007
158 | Gentoo,Biscoe,47.6,14.5,215,5400,male,2007
159 | Gentoo,Biscoe,46.5,13.5,210,4550,female,2007
160 | Gentoo,Biscoe,45.4,14.6,211,4800,female,2007
161 | Gentoo,Biscoe,46.7,15.3,219,5200,male,2007
162 | Gentoo,Biscoe,43.3,13.4,209,4400,female,2007
163 | Gentoo,Biscoe,46.8,15.4,215,5150,male,2007
164 | Gentoo,Biscoe,40.9,13.7,214,4650,female,2007
165 | Gentoo,Biscoe,49,16.1,216,5550,male,2007
166 | Gentoo,Biscoe,45.5,13.7,214,4650,female,2007
167 | Gentoo,Biscoe,48.4,14.6,213,5850,male,2007
168 | Gentoo,Biscoe,45.8,14.6,210,4200,female,2007
169 | Gentoo,Biscoe,49.3,15.7,217,5850,male,2007
170 | Gentoo,Biscoe,42,13.5,210,4150,female,2007
171 | Gentoo,Biscoe,49.2,15.2,221,6300,male,2007
172 | Gentoo,Biscoe,46.2,14.5,209,4800,female,2007
173 | Gentoo,Biscoe,48.7,15.1,222,5350,male,2007
174 | Gentoo,Biscoe,50.2,14.3,218,5700,male,2007
175 | Gentoo,Biscoe,45.1,14.5,215,5000,female,2007
176 | Gentoo,Biscoe,46.5,14.5,213,4400,female,2007
177 | Gentoo,Biscoe,46.3,15.8,215,5050,male,2007
178 | Gentoo,Biscoe,42.9,13.1,215,5000,female,2007
179 | Gentoo,Biscoe,46.1,15.1,215,5100,male,2007
180 | Gentoo,Biscoe,44.5,14.3,216,4100,NA,2007
181 | Gentoo,Biscoe,47.8,15,215,5650,male,2007
182 | Gentoo,Biscoe,48.2,14.3,210,4600,female,2007
183 | Gentoo,Biscoe,50,15.3,220,5550,male,2007
184 | Gentoo,Biscoe,47.3,15.3,222,5250,male,2007
185 | Gentoo,Biscoe,42.8,14.2,209,4700,female,2007
186 | Gentoo,Biscoe,45.1,14.5,207,5050,female,2007
187 | Gentoo,Biscoe,59.6,17,230,6050,male,2007
188 | Gentoo,Biscoe,49.1,14.8,220,5150,female,2008
189 | Gentoo,Biscoe,48.4,16.3,220,5400,male,2008
190 | Gentoo,Biscoe,42.6,13.7,213,4950,female,2008
191 | Gentoo,Biscoe,44.4,17.3,219,5250,male,2008
192 | Gentoo,Biscoe,44,13.6,208,4350,female,2008
193 | Gentoo,Biscoe,48.7,15.7,208,5350,male,2008
194 | Gentoo,Biscoe,42.7,13.7,208,3950,female,2008
195 | Gentoo,Biscoe,49.6,16,225,5700,male,2008
196 | Gentoo,Biscoe,45.3,13.7,210,4300,female,2008
197 | Gentoo,Biscoe,49.6,15,216,4750,male,2008
198 | Gentoo,Biscoe,50.5,15.9,222,5550,male,2008
199 | Gentoo,Biscoe,43.6,13.9,217,4900,female,2008
200 | Gentoo,Biscoe,45.5,13.9,210,4200,female,2008
201 | Gentoo,Biscoe,50.5,15.9,225,5400,male,2008
202 | Gentoo,Biscoe,44.9,13.3,213,5100,female,2008
203 | Gentoo,Biscoe,45.2,15.8,215,5300,male,2008
204 | Gentoo,Biscoe,46.6,14.2,210,4850,female,2008
205 | Gentoo,Biscoe,48.5,14.1,220,5300,male,2008
206 | Gentoo,Biscoe,45.1,14.4,210,4400,female,2008
207 | Gentoo,Biscoe,50.1,15,225,5000,male,2008
208 | Gentoo,Biscoe,46.5,14.4,217,4900,female,2008
209 | Gentoo,Biscoe,45,15.4,220,5050,male,2008
210 | Gentoo,Biscoe,43.8,13.9,208,4300,female,2008
211 | Gentoo,Biscoe,45.5,15,220,5000,male,2008
212 | Gentoo,Biscoe,43.2,14.5,208,4450,female,2008
213 | Gentoo,Biscoe,50.4,15.3,224,5550,male,2008
214 | Gentoo,Biscoe,45.3,13.8,208,4200,female,2008
215 | Gentoo,Biscoe,46.2,14.9,221,5300,male,2008
216 | Gentoo,Biscoe,45.7,13.9,214,4400,female,2008
217 | Gentoo,Biscoe,54.3,15.7,231,5650,male,2008
218 | Gentoo,Biscoe,45.8,14.2,219,4700,female,2008
219 | Gentoo,Biscoe,49.8,16.8,230,5700,male,2008
220 | Gentoo,Biscoe,46.2,14.4,214,4650,NA,2008
221 | Gentoo,Biscoe,49.5,16.2,229,5800,male,2008
222 | Gentoo,Biscoe,43.5,14.2,220,4700,female,2008
223 | Gentoo,Biscoe,50.7,15,223,5550,male,2008
224 | Gentoo,Biscoe,47.7,15,216,4750,female,2008
225 | Gentoo,Biscoe,46.4,15.6,221,5000,male,2008
226 | Gentoo,Biscoe,48.2,15.6,221,5100,male,2008
227 | Gentoo,Biscoe,46.5,14.8,217,5200,female,2008
228 | Gentoo,Biscoe,46.4,15,216,4700,female,2008
229 | Gentoo,Biscoe,48.6,16,230,5800,male,2008
230 | Gentoo,Biscoe,47.5,14.2,209,4600,female,2008
231 | Gentoo,Biscoe,51.1,16.3,220,6000,male,2008
232 | Gentoo,Biscoe,45.2,13.8,215,4750,female,2008
233 | Gentoo,Biscoe,45.2,16.4,223,5950,male,2008
234 | Gentoo,Biscoe,49.1,14.5,212,4625,female,2009
235 | Gentoo,Biscoe,52.5,15.6,221,5450,male,2009
236 | Gentoo,Biscoe,47.4,14.6,212,4725,female,2009
237 | Gentoo,Biscoe,50,15.9,224,5350,male,2009
238 | Gentoo,Biscoe,44.9,13.8,212,4750,female,2009
239 | Gentoo,Biscoe,50.8,17.3,228,5600,male,2009
240 | Gentoo,Biscoe,43.4,14.4,218,4600,female,2009
241 | Gentoo,Biscoe,51.3,14.2,218,5300,male,2009
242 | Gentoo,Biscoe,47.5,14,212,4875,female,2009
243 | Gentoo,Biscoe,52.1,17,230,5550,male,2009
244 | Gentoo,Biscoe,47.5,15,218,4950,female,2009
245 | Gentoo,Biscoe,52.2,17.1,228,5400,male,2009
246 | Gentoo,Biscoe,45.5,14.5,212,4750,female,2009
247 | Gentoo,Biscoe,49.5,16.1,224,5650,male,2009
248 | Gentoo,Biscoe,44.5,14.7,214,4850,female,2009
249 | Gentoo,Biscoe,50.8,15.7,226,5200,male,2009
250 | Gentoo,Biscoe,49.4,15.8,216,4925,male,2009
251 | Gentoo,Biscoe,46.9,14.6,222,4875,female,2009
252 | Gentoo,Biscoe,48.4,14.4,203,4625,female,2009
253 | Gentoo,Biscoe,51.1,16.5,225,5250,male,2009
254 | Gentoo,Biscoe,48.5,15,219,4850,female,2009
255 | Gentoo,Biscoe,55.9,17,228,5600,male,2009
256 | Gentoo,Biscoe,47.2,15.5,215,4975,female,2009
257 | Gentoo,Biscoe,49.1,15,228,5500,male,2009
258 | Gentoo,Biscoe,47.3,13.8,216,4725,NA,2009
259 | Gentoo,Biscoe,46.8,16.1,215,5500,male,2009
260 | Gentoo,Biscoe,41.7,14.7,210,4700,female,2009
261 | Gentoo,Biscoe,53.4,15.8,219,5500,male,2009
262 | Gentoo,Biscoe,43.3,14,208,4575,female,2009
263 | Gentoo,Biscoe,48.1,15.1,209,5500,male,2009
264 | Gentoo,Biscoe,50.5,15.2,216,5000,female,2009
265 | Gentoo,Biscoe,49.8,15.9,229,5950,male,2009
266 | Gentoo,Biscoe,43.5,15.2,213,4650,female,2009
267 | Gentoo,Biscoe,51.5,16.3,230,5500,male,2009
268 | Gentoo,Biscoe,46.2,14.1,217,4375,female,2009
269 | Gentoo,Biscoe,55.1,16,230,5850,male,2009
270 | Gentoo,Biscoe,44.5,15.7,217,4875,NA,2009
271 | Gentoo,Biscoe,48.8,16.2,222,6000,male,2009
272 | Gentoo,Biscoe,47.2,13.7,214,4925,female,2009
273 | Gentoo,Biscoe,NA,NA,NA,NA,NA,2009
274 | Gentoo,Biscoe,46.8,14.3,215,4850,female,2009
275 | Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
276 | Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
277 | Gentoo,Biscoe,49.9,16.1,213,5400,male,2009
278 | Chinstrap,Dream,46.5,17.9,192,3500,female,2007
279 | Chinstrap,Dream,50,19.5,196,3900,male,2007
280 | Chinstrap,Dream,51.3,19.2,193,3650,male,2007
281 | Chinstrap,Dream,45.4,18.7,188,3525,female,2007
282 | Chinstrap,Dream,52.7,19.8,197,3725,male,2007
283 | Chinstrap,Dream,45.2,17.8,198,3950,female,2007
284 | Chinstrap,Dream,46.1,18.2,178,3250,female,2007
285 | Chinstrap,Dream,51.3,18.2,197,3750,male,2007
286 | Chinstrap,Dream,46,18.9,195,4150,female,2007
287 | Chinstrap,Dream,51.3,19.9,198,3700,male,2007
288 | Chinstrap,Dream,46.6,17.8,193,3800,female,2007
289 | Chinstrap,Dream,51.7,20.3,194,3775,male,2007
290 | Chinstrap,Dream,47,17.3,185,3700,female,2007
291 | Chinstrap,Dream,52,18.1,201,4050,male,2007
292 | Chinstrap,Dream,45.9,17.1,190,3575,female,2007
293 | Chinstrap,Dream,50.5,19.6,201,4050,male,2007
294 | Chinstrap,Dream,50.3,20,197,3300,male,2007
295 | Chinstrap,Dream,58,17.8,181,3700,female,2007
296 | Chinstrap,Dream,46.4,18.6,190,3450,female,2007
297 | Chinstrap,Dream,49.2,18.2,195,4400,male,2007
298 | Chinstrap,Dream,42.4,17.3,181,3600,female,2007
299 | Chinstrap,Dream,48.5,17.5,191,3400,male,2007
300 | Chinstrap,Dream,43.2,16.6,187,2900,female,2007
301 | Chinstrap,Dream,50.6,19.4,193,3800,male,2007
302 | Chinstrap,Dream,46.7,17.9,195,3300,female,2007
303 | Chinstrap,Dream,52,19,197,4150,male,2007
304 | Chinstrap,Dream,50.5,18.4,200,3400,female,2008
305 | Chinstrap,Dream,49.5,19,200,3800,male,2008
306 | Chinstrap,Dream,46.4,17.8,191,3700,female,2008
307 | Chinstrap,Dream,52.8,20,205,4550,male,2008
308 | Chinstrap,Dream,40.9,16.6,187,3200,female,2008
309 | Chinstrap,Dream,54.2,20.8,201,4300,male,2008
310 | Chinstrap,Dream,42.5,16.7,187,3350,female,2008
311 | Chinstrap,Dream,51,18.8,203,4100,male,2008
312 | Chinstrap,Dream,49.7,18.6,195,3600,male,2008
313 | Chinstrap,Dream,47.5,16.8,199,3900,female,2008
314 | Chinstrap,Dream,47.6,18.3,195,3850,female,2008
315 | Chinstrap,Dream,52,20.7,210,4800,male,2008
316 | Chinstrap,Dream,46.9,16.6,192,2700,female,2008
317 | Chinstrap,Dream,53.5,19.9,205,4500,male,2008
318 | Chinstrap,Dream,49,19.5,210,3950,male,2008
319 | Chinstrap,Dream,46.2,17.5,187,3650,female,2008
320 | Chinstrap,Dream,50.9,19.1,196,3550,male,2008
321 | Chinstrap,Dream,45.5,17,196,3500,female,2008
322 | Chinstrap,Dream,50.9,17.9,196,3675,female,2009
323 | Chinstrap,Dream,50.8,18.5,201,4450,male,2009
324 | Chinstrap,Dream,50.1,17.9,190,3400,female,2009
325 | Chinstrap,Dream,49,19.6,212,4300,male,2009
326 | Chinstrap,Dream,51.5,18.7,187,3250,male,2009
327 | Chinstrap,Dream,49.8,17.3,198,3675,female,2009
328 | Chinstrap,Dream,48.1,16.4,199,3325,female,2009
329 | Chinstrap,Dream,51.4,19,201,3950,male,2009
330 | Chinstrap,Dream,45.7,17.3,193,3600,female,2009
331 | Chinstrap,Dream,50.7,19.7,203,4050,male,2009
332 | Chinstrap,Dream,42.5,17.3,187,3350,female,2009
333 | Chinstrap,Dream,52.2,18.8,197,3450,male,2009
334 | Chinstrap,Dream,45.2,16.6,191,3250,female,2009
335 | Chinstrap,Dream,49.3,19.9,203,4050,male,2009
336 | Chinstrap,Dream,50.2,18.8,202,3800,male,2009
337 | Chinstrap,Dream,45.6,19.4,194,3525,female,2009
338 | Chinstrap,Dream,51.9,19.5,206,3950,male,2009
339 | Chinstrap,Dream,46.8,16.5,189,3650,female,2009
340 | Chinstrap,Dream,45.7,17,195,3650,female,2009
341 | Chinstrap,Dream,55.8,19.8,207,4000,male,2009
342 | Chinstrap,Dream,43.5,18.1,202,3400,female,2009
343 | Chinstrap,Dream,49.6,18.2,193,3775,male,2009
344 | Chinstrap,Dream,50.8,19,210,4100,male,2009
345 | Chinstrap,Dream,50.2,18.7,198,3775,female,2009
346 |
--------------------------------------------------------------------------------
/15_eda02/header.tex:
--------------------------------------------------------------------------------
1 | \usepackage{ctex}
2 | \usepackage{booktabs}
3 | \usepackage{longtable}
4 | \usepackage{array}
5 | \usepackage{multirow}
6 | \usepackage{wrapfig}
7 | \usepackage{float}
8 | \usepackage{colortbl}
9 | \usepackage{pdflscape}
10 | \usepackage{tabu}
11 | \usepackage{threeparttable}
12 | \usepackage{threeparttablex}
13 | \usepackage{makecell}
14 | \usepackage{xcolor}
15 | \usepackage{xtab}
16 |
17 | \def\begincols{\begin{columns}}
18 | \def\begincol{\begin{column}}
19 | \def\endcol{\end{column}}
20 | \def\endcols{\end{columns}}
21 |
22 |
--------------------------------------------------------------------------------
/15_eda02/images/01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/01.png
--------------------------------------------------------------------------------
/15_eda02/images/4_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/4_3.png
--------------------------------------------------------------------------------
/15_eda02/images/culmen_depth.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/culmen_depth.png
--------------------------------------------------------------------------------
/15_eda02/images/lter_penguins.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/lter_penguins.png
--------------------------------------------------------------------------------
/15_eda02/images/penguins.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/15_eda02/images/penguins.png
--------------------------------------------------------------------------------
/R4DS_slides.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: knitr
13 | LaTeX: XeLaTeX
14 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # The R Language in Data Science
2 |
3 |
4 | 
5 |
6 |
7 |
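8 | ## Course Overview
9 |
10 | - Data science is an interdisciplinary field that combines statistics, computer science, and domain expertise. In essence, it studies science with data-driven methods and studies data with scientific methods.
11 | - R is a programming language for statistical analysis that integrates statistical computing and graphics. Once you have learned its syntax, you can write your own functions to extend the language.
12 | - In 2019, the COPSS Presidents' Award (regarded as the Nobel Prize of statistics) was presented at the international statistics annual meeting to Hadley Wickham, author of the R package collection tidyverse, which shows that R has earned full recognition from the academic community.
13 | - Thanks to its strong statistical analysis capabilities, powerful graphics, and extensibility, R is used ever more widely around the world in both natural science and social science research.
14 |
15 | This course takes R as the starting point of a data science journey. It covers R basics, data visualization, data wrangling, exploratory analysis, statistical modeling, case studies, and applications in representative fields, and is intended for master's and doctoral students.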
16 |
17 |
18 | ## Course Objectives
19 | Develop data thinking, improve programming skills, and foster the capacity to innovate.
20 |
21 |
22 | ## Course Content
23 |
24 |
25 | | Week | Title | Main content | Class hours | Slides |
26 | |------|-------|--------------|-------------|--------|
27 | | week01 | Why R? | What is R? What can R do? Why R? | 1 | [00_whyR.pdf](https://github.com/perlatex/R4DS_slides/blob/master/00_whyR/00_whyR.pdf) |
28 | | week01 | Data science basics | Understand the data science workflow, set up the working environment, install R and RStudio, and learn how to install the required packages | 1 | [01_install.pdf](https://github.com/perlatex/R4DS_slides/blob/master/01_install/01_install.pdf) |
29 | | week02 | R basics | Basic operations, data types, data structures, common statistical functions, branching, loops, etc.; scripts, packages, and how to get help | 2 | [02_basicR.pdf](https://github.com/perlatex/R4DS_slides/blob/master/02_basicR/02_basicR.pdf) |
30 | | week03 | Subsetting | Vectors, lists, matrices, data frames | 2 | [03_subset.pdf](https://github.com/perlatex/R4DS_slides/blob/master/03_subset/03_subset.pdf) |
31 | | week04 | Reproducible research | R Markdown syntax; generating reports in HTML, PDF, and Word formats | 2 | [04_Rmarkdown.pdf](https://github.com/perlatex/R4DS_slides/blob/master/04_Rmarkdown/04_Rmarkdown.pdf) |
32 | | week05 | Data wrangling | Reading external data, saving data, data manipulation with dplyr, worked examples | 2 | [05_dplyr.pdf](https://github.com/perlatex/R4DS_slides/blob/master/05_dplyr/05_dplyr.pdf) |
33 | | week06 | Data visualization 1 | ggplot2 basic syntax, mapping, setting, saving plots | 2 | [06_ggplot2.pdf](https://github.com/perlatex/R4DS_slides/blob/master/06_ggplot2/06_ggplot2.pdf) |
34 | | week07 | Data visualization 2 | Geometric objects, themes, scales, legends | 2 | [07_ggplot2.pdf](https://github.com/perlatex/R4DS_slides/blob/master/07_ggplot2/07_ggplot2.pdf) |
35 | | week08 | Exploratory data analysis 1 | Using case data to combine data wrangling and visual exploration skills | 2 | [08_eda01.pdf](https://github.com/perlatex/R4DS_slides/blob/master/08_eda01/08_eda01.pdf) |
36 | | week09 | String processing | Regular expressions, text information extraction | 2 | [09_stringr.pdf](https://github.com/perlatex/R4DS_slides/blob/master/09_stringr/09_stringr.pdf) |
37 | | week10 | Factor data | Handling and using factor variables | 2 | [10_forcats.pdf](https://github.com/perlatex/R4DS_slides/blob/master/10_forcats/10_forcats.pdf) |
38 | | week11 | Linear regression | Simple and multiple regression models, with emphasis on analyzing and interpreting model output, fitting, and prediction | 2 | [11_lm.pdf](https://github.com/perlatex/R4DS_slides/blob/master/11_lm/11_lm.pdf) |
39 | | week12 | Basic statistical analysis | Basic descriptive statistics, hypothesis testing, analysis of variance, and their equivalence to linear regression | 2 | [12_tidystats.pdf](https://github.com/perlatex/R4DS_slides/blob/master/12_tidystats/12_tidystats.pdf) |
40 | | week13 | Functional programming | Safe and efficient iteration techniques | 2 | [13_purrr.pdf](https://github.com/perlatex/R4DS_slides/blob/master/13_purrr/13_purrr.pdf) |
41 | | week14 | Advanced tidyverse programming | Various application scenarios, common functions and tips | 2 | [14_tidyverse_tips.pdf](https://github.com/perlatex/R4DS_slides/blob/master/14_tidyverse_tips/14_tidyverse_tips.pdf) |
42 | | week15 | Exploratory data analysis 2 | Completing data analysis and modeling on concrete cases to train data thinking | 2 | [15_eda02.pdf](https://github.com/perlatex/R4DS_slides/blob/master/15_eda02/15_eda02.pdf) |
43 |
44 |
45 |
46 |
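47 | ## Assessment
48 | Based on your own discipline, choose a paper related to your research area and use the R statistical programming skills learned in class to **reproduce** its data analysis and visualization.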
49 |
50 |
51 |
52 |
53 | ## References
54 | - [https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)
55 | - [https://bookdown.org/wangminjie/R4DS/](https://bookdown.org/wangminjie/R4DS/)
56 |
57 |
58 |
59 |
60 | ## I Will Do My Best
61 | May R become the scaffolding with which you build your edifice of knowledge!
62 |
--------------------------------------------------------------------------------
/data_science.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/perlatex/R4DS_slides/117751057aaab7e4ebc2396641e1c32a09826e41/data_science.jpg
--------------------------------------------------------------------------------