├── .gitignore
├── README.md
├── img
    ├── author_image.png
    └── shield_image.png
├── datasets
    ├── luffy_event.RData
    ├── straw_hat_df.RData
    ├── luffy_event_bounty.RData
    └── straw_hat_devil_fruit.RData
├── requirements.r
├── course.yml
├── removed
    └── removed.Rmd
├── chapter1.md
├── chapter5.md
├── chapter4.md
├── chapter2.md
└── chapter3.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user/*
2 | .Rproj.user
3 | .cache
4 | .DS_STORE
5 | .Rhistory
6 | .RData
7 | .Rdata
8 | .rdata
9 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Dataframe Manipulation in R (Chinese)
2 | 
3 | Community course on Dataframe manipulation in Cantonese.
4 | 


--------------------------------------------------------------------------------
/img/author_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/community-courses-dataframe-manipulation-r-chinese/master/img/author_image.png


--------------------------------------------------------------------------------
/img/shield_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/community-courses-dataframe-manipulation-r-chinese/master/img/shield_image.png


--------------------------------------------------------------------------------
/datasets/luffy_event.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/community-courses-dataframe-manipulation-r-chinese/master/datasets/luffy_event.RData


--------------------------------------------------------------------------------
/datasets/straw_hat_df.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/community-courses-dataframe-manipulation-r-chinese/master/datasets/straw_hat_df.RData


--------------------------------------------------------------------------------
/datasets/luffy_event_bounty.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/community-courses-dataframe-manipulation-r-chinese/master/datasets/luffy_event_bounty.RData


--------------------------------------------------------------------------------
/datasets/straw_hat_devil_fruit.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/community-courses-dataframe-manipulation-r-chinese/master/datasets/straw_hat_devil_fruit.RData


--------------------------------------------------------------------------------
/requirements.r:
--------------------------------------------------------------------------------
1 | devtools::install_version("tibble", "1.4.2")
2 | devtools::install_version("plyr", "1.8.4")
3 | devtools::install_version("dplyr", "0.7.4")
4 | devtools::install_version("tidyr", "0.8.0")
5 | 


--------------------------------------------------------------------------------
/course.yml:
--------------------------------------------------------------------------------
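 1 | title            : Data Frame Manipulation Techniques
 2 | author_field      : Tony Yao-Jen Kuo
 3 | description      : In Introduction to R we introduced a data type called the data frame. In practice, most data analysis projects start by reading the raw data into a data frame before any further work begins. In this course we follow the Straw Hat Pirates on their adventure in the New World while learning data frame manipulation techniques such as working with rows and columns, creating derived variables, transposing, and joining, all while striving to become a supernova pirate with a bounty of more than 100 million. A seafaring adventure in the race for One Piece!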
 4 | university       : DataCamp
 5 | difficulty_level : 2
 6 | time_needed      : 2 hours
 7 | author_bio: "Yao-Jen Kuo holds an M.B.A. from National Taiwan University (NTU)
 8 | and is enthusiastic about data science and education.
 9 | He is a co-founder of Kyosei, a startup focusing on lightweight data science solutions.
10 | He also lectures on \"Data Science and R/Python\" at the System Training Program of NTU,
11 | where active students help him proofread the localized courses.
12 | Prior to founding Kyosei, he was a senior data analyst at Coupang
13 | and an analytical consultant at SAS Software.
14 | His career so far has been quite different from that of most of his peers.
15 | In his spare time, he enjoys long-distance running and ping-pong.
16 | In fact, the first time we met him was at the 2016 Boston Marathon.
17 | He came to our office after the race
18 | and discussed the localization of our courses
19 | and the promotion of data science."
20 | from: 'r-base-prod:27'
21 | 


--------------------------------------------------------------------------------
/removed/removed.Rmd:
--------------------------------------------------------------------------------
 1 | --- type:NormalExercise lang:r xp:100 skills:4 key:1a1353d859
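 2 | ## Writing SQL Queries
 3 | 
 4 | Having come this far, you are now a high-bounty pirate who can comfortably manipulate data frames in R! In this last exercise we introduce R's [`sqldf`](http://www.rdocumentation.org/packages/sqldf/versions/0.4-7.1) package. Once the package is loaded you can use the [`sqldf()`](http://www.rdocumentation.org/packages/sqldf/versions/0.4-7.1/topics/sqldf) function, which lets you write SQL queries to manipulate data frames. That is great news for pirates who are already fluent in SQL!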
 5 | 
 6 | ```{r}
 7 | library(sqldf)
 8 | sqldf("your sql query here...")
 9 | ```
10 | 
11 | *** =instructions
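12 | - Complete the first SQL query to select the Straw Hat crew members whose bounty is above 100 million berries, and assign the result to a data frame named `straw_hat_high_bounty`.
13 | - Print `straw_hat_high_bounty` to the R Console.
14 | - Complete the second SQL query to **left outer join** (LEFT JOIN) the character attribute data frame with the Devil Fruit data frame, and assign the result to a data frame named `straw_hat_df_devil_fruit`.
15 | - Print `straw_hat_df_devil_fruit` to the R Console.
16 | 
17 | *** =hint
18 | - The `WHERE` condition of the first query should be `WHERE bounty > 100000000`.
19 | - The second query needs `LEFT JOIN straw_hat_devil_fruit ON straw_hat_df.name = straw_hat_devil_fruit.name`.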
20 | 
21 | *** =pre_exercise_code
22 | ```{r}
23 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
24 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_devil_fruit.RData"))
25 | ```
26 | 
27 | *** =sample_code
28 | ```{r}
29 | # straw_hat_df and straw_hat_devil_fruit are preloaded
30 | 
31 | # Load the sqldf package
32 | library(sqldf)
33 | 
34 | # First query
35 | straw_hat_high_bounty <- sqldf("SELECT * FROM straw_hat_df WHERE __ > __")
36 | 
37 | # Print straw_hat_high_bounty to the R Console
38 | 
39 | 
40 | # Second query
41 | straw_hat_df_devil_fruit <- sqldf("SELECT straw_hat_df.*, straw_hat_devil_fruit.devil_fruit, straw_hat_devil_fruit.devil_fruit_type FROM straw_hat_df __ __ straw_hat_devil_fruit ON straw_hat_df.__ = straw_hat_devil_fruit.__")
42 | 
43 | # Print straw_hat_df_devil_fruit to the R Console
44 | 
45 | ```
46 | 
47 | *** =solution
48 | ```{r}
49 | # straw_hat_df and straw_hat_devil_fruit are preloaded
50 | 
51 | # Load the sqldf package
52 | library(sqldf)
53 | 
54 | # First query
55 | straw_hat_high_bounty <- sqldf("SELECT * FROM straw_hat_df WHERE bounty > 100000000")
56 | 
57 | # Print straw_hat_high_bounty to the R Console
58 | straw_hat_high_bounty
59 | 
60 | # Second query
61 | straw_hat_df_devil_fruit <- sqldf("SELECT straw_hat_df.*, straw_hat_devil_fruit.devil_fruit, straw_hat_devil_fruit.devil_fruit_type FROM straw_hat_df LEFT JOIN straw_hat_devil_fruit ON straw_hat_df.name = straw_hat_devil_fruit.name")
62 | 
63 | # Print straw_hat_df_devil_fruit to the R Console
64 | straw_hat_df_devil_fruit
65 | ```
66 | 
67 | *** =sct
68 | ```{r}
69 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#22312; `WHERE` &#26781;&#20214;&#21152;&#20837;&#36062;&#37329;&#39640;&#26044; 1 &#20740;&#35997;&#37324;&#65311;"
70 | test_object("straw_hat_high_bounty",
71 |             undefined_msg = msg,
72 |             incorrect_msg = msg)
73 | 
74 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#22312; R Console &#36664;&#20986; `straw_hat_high_bounty`&#65311;"
75 | test_output_contains("straw_hat_high_bounty",
76 |                      incorrect_msg = msg)
77 | 
78 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `LEFT JOIN` &#35486;&#27861;&#36914;&#34892;&#24038;&#22806;&#37096;&#32879;&#32080;&#65311;"
79 | test_object("straw_hat_df_devil_fruit",
80 |             undefined_msg = msg,
81 |             incorrect_msg = msg)
82 | 
83 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#22312; R Console &#36664;&#20986; `straw_hat_df_devil_fruit`&#65311;"
84 | test_output_contains("straw_hat_df_devil_fruit",
85 |                      incorrect_msg = msg)
86 | 
87 | success_msg("&#24685;&#21916;&#20320;&#65281;&#20320;&#39640;&#36229;&#30340;&#36039;&#26009;&#26694;&#25972;&#29702;&#25216;&#24039;&#24050;&#32147;&#34987;&#28023;&#36557;&#21015;&#28858;&#37325;&#40670;&#36890;&#32221;&#28023;&#36042;&#65292;&#20320;&#30340;&#21517;&#34399;&#24050;&#32147;&#20659;&#36941;&#20102;&#26032;&#19990;&#30028;&#65292;&#19968;&#22580;&#29229;&#22890; One Piece &#30340;&#28023;&#19978;&#20882;&#38570;&#25925;&#20107;&#65281;")
88 | ```
89 | 
90 | 
91 | 
92 | 
93 | 
94 | 


--------------------------------------------------------------------------------
/chapter1.md:
--------------------------------------------------------------------------------
  1 | ---
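  2 | title_meta  : Chapter 1
  3 | title       : Creating and Exploring Data Frames
  4 | description : Before learning data frame manipulation techniques, we first need to create a data frame in the R workspace to practice on. In this chapter we review material from Introduction to R, such as creating a data frame and a few handy functions for quickly exploring one. A seafaring adventure in the race for One Piece!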
  5 | 
  6 | --- type:NormalExercise lang:r xp:100 skills:4 key:9a073c3931
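  7 | ## Creating a Data Frame
  8 | 
  9 | This course relies on many concepts and much of the syntax from [Introduction to R](https://www.datacamp.com/community/open-courses/r-%E8%AA%9E%E8%A8%80%E5%B0%8E%E8%AB%96#gs.xZfVkjM). We strongly recommend working through that course before starting this one!
 10 | 
 11 | The main character attributes of the Straw Hat Pirates are:
 12 | 
 13 | - Name
 14 | - Gender
 15 | - Occupation
 16 | - Bounty
 17 | - Age
 18 | - Birthday
 19 | - Height
 20 | 
 21 | Before creating a data frame, it is common to first create each column as a vector. We can guess that the names should be a character vector, while the ages will be a numeric vector. A data frame can hold columns of different data types, which means we can build a single data frame that records all of the Straw Hat Pirates' character attributes.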
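
As a minimal sketch of the idea above, the snippet below combines a character vector and a numeric vector into one data frame. It reuses three of the crew values from this exercise; the object name `crew` is a hypothetical name chosen for illustration, and `stringsAsFactors = FALSE` is set explicitly only so that the column types are the same on older and newer versions of R.

```{r}
# Three crew members taken from the vectors used in this exercise
name <- c("Monkey D. Luffy", "Roronoa Zoro", "Nami")
age <- c(19, 21, 20)

# stringsAsFactors = FALSE keeps name as a character column on every R version
crew <- data.frame(name, age, stringsAsFactors = FALSE)

# Each column keeps its own type: character for name, numeric for age
sapply(crew, class)
```

The column names are taken from the names of the vectors passed to `data.frame()`.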
 22 | 
 23 | Source: [One Piece Wiki](http://onepiece.wikia.com/wiki/Main_Page)
 24 | 
 25 | *** =instructions
 26 | - Use `data.frame()` to combine the character attribute vectors already defined in the editor on the right into a data frame named `straw_hat_df`.
 27 | 
 28 | *** =hint
 29 | - In the editor, type `straw_hat_df <- data.frame(name, gender, occupation, bounty, age, birthday, height)`.
 30 | 
 31 | *** =pre_exercise_code
 32 | ```{r}
 33 | # no pec
 34 | ```
 35 | 
 36 | *** =sample_code
 37 | ```{r}
 38 | # Character attribute vectors
 39 | name <- c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook")
 40 | gender <- c("Male", "Male", "Female", "Male", "Male", "Male", "Female", "Male", "Male")
 41 | occupation <- c("Captain", "Swordsman", "Navigator", "Sniper", "Cook", "Doctor", "Archaeologist", "Shipwright", "Musician")
 42 | bounty <- c(500000000, 320000000, 66000000, 200000000, 177000000, 100, 130000000, 94000000, 83000000)
 43 | age <- c(19, 21, 20, 19, 21, 17, 30, 36, 90)
 44 | birthday <- c("05-05", "11-11", "07-03", "04-01", "03-02", "12-24", "02-06", "03-09", "04-03")
 45 | height <- c(174, 181, 170, 176, 180, 90, 188, 240, 277)
 46 | 
 47 | # Create the data frame of Straw Hat Pirates character attributes
 48 | straw_hat_df <- 
 49 | ```
 50 | 
 51 | *** =solution
 52 | ```{r}
 53 | # Character attribute vectors
 54 | name <- c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook")
 55 | gender <- c("Male", "Male", "Female", "Male", "Male", "Male", "Female", "Male", "Male")
 56 | occupation <- c("Captain", "Swordsman", "Navigator", "Sniper", "Cook", "Doctor", "Archaeologist", "Shipwright", "Musician")
 57 | bounty <- c(500000000, 320000000, 66000000, 200000000, 177000000, 100, 130000000, 94000000, 83000000)
 58 | age <- c(19, 21, 20, 19, 21, 17, 30, 36, 90)
 59 | birthday <- c("05-05", "11-11", "07-03", "04-01", "03-02", "12-24", "02-06", "03-09", "04-03")
 60 | height <- c(174, 181, 170, 176, 180, 90, 188, 240, 277)
 61 | 
 62 | # Create the data frame of Straw Hat Pirates character attributes
 63 | straw_hat_df <- data.frame(name, gender, occupation, bounty, age, birthday, height)
 64 | ```
 65 | 
 66 | *** =sct
 67 | ```{r}
 68 | msg <- "&#19981;&#38656;&#35201;&#31227;&#38500;&#21407;&#26412;&#24171;&#20320;&#23450;&#32681;&#22909;&#30340;&#21521;&#37327;&#21908;&#65281;"
 69 | 
 70 | test_object("name", undefined_msg = msg, incorrect_msg = msg)
 71 | test_object("gender", undefined_msg = msg, incorrect_msg = msg)
 72 | test_object("occupation", undefined_msg = msg, incorrect_msg = msg)
 73 | test_object("bounty", undefined_msg = msg, incorrect_msg = msg)
 74 | test_object("age", undefined_msg = msg, incorrect_msg = msg)
 75 | test_object("birthday", undefined_msg = msg, incorrect_msg = msg)
 76 | test_object("height", undefined_msg = msg, incorrect_msg = msg)
 77 | 
 78 | test_object("straw_hat_df", incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#20351;&#29992; `data.frame()` &#20989;&#25976;&#20006;&#19988;&#23559;&#32080;&#26524;&#25351;&#27966;&#32102; straw_hat_df&#65311;")
 79 | 
 80 | success_msg("&#20320;&#20570;&#24471;&#22826;&#26834;&#20102;&#65292;&#35731;&#25105;&#20497;&#32380;&#32396;&#19979;&#19968;&#20491;&#32244;&#32722;&#65281;")
 81 | ```
 82 | 
 83 | --- type:NormalExercise lang:r xp:100 skills:4 key:cb3400c7e4
 84 | ## Exploring a Data Frame
 85 | 
 86 | A few handy functions let us quickly explore a data frame:
 87 | 
 88 | - [`dim()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/dim)
 89 | - [`head()`](http://www.rdocumentation.org/packages/utils/versions/3.3.1/topics/head)
 90 | - [`tail()`](http://www.rdocumentation.org/packages/utils/versions/3.3.1/topics/head)
 91 | - [`str()`](http://www.rdocumentation.org/packages/utils/versions/3.3.1/topics/str)
 92 | - [`summary()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/summary)
 93 | 
 94 | [`dim()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/dim) returns the number of rows and columns of a data frame; [`head()`](http://www.rdocumentation.org/packages/utils/versions/3.3.1/topics/head) returns its first six rows; [`tail()`](http://www.rdocumentation.org/packages/utils/versions/3.3.1/topics/head) returns its last six rows; [`str()`](http://www.rdocumentation.org/packages/utils/versions/3.3.1/topics/str) lists not only the number of observations and variables but also the data type and first few observations of each column; [`summary()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/summary) returns descriptive statistics for numeric variables and summary information for categorical variables.
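
The following sketch shows these functions on a small data frame. It assumes a hypothetical object named `crew`, built from the names and heights listed in the first exercise rather than from the preloaded `straw_hat_df`, so that it can be run on its own.

```{r}
# A small data frame built from the crew vectors shown in the first exercise
crew <- data.frame(
  name = c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
           "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"),
  height = c(174, 181, 170, 176, 180, 90, 188, 240, 277),
  stringsAsFactors = FALSE
)

dim(crew)             # number of rows, then number of columns
nrow(head(crew))      # head() keeps at most six rows by default
tail(crew, n = 2)     # the n argument overrides the default of six
str(crew)             # column types and the first few values
summary(crew$height)  # descriptive statistics for a numeric column
```

Both `head()` and `tail()` accept an `n` argument, which is convenient when six rows are too many or too few.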
 95 | 
 96 | *** =instructions
 97 | - Use these five handy functions to explore `straw_hat_df`, which is already loaded in the workspace.
 98 | 
 99 | *** =hint
100 | - Use `dim(straw_hat_df)`, `head(straw_hat_df)`, `tail(straw_hat_df)`, `str(straw_hat_df)`, and `summary(straw_hat_df)` to explore the data frame.
101 | 
102 | *** =pre_exercise_code
103 | ```{r}
104 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
105 | ```
106 | 
107 | *** =sample_code
108 | ```{r}
109 | # straw_hat_df is preloaded
110 | 
111 | # Call dim(), head(), tail(), str(), and summary() on straw_hat_df
112 | 
113 | ```
114 | 
115 | *** =solution
116 | ```{r}
117 | # straw_hat_df is preloaded
118 | 
119 | # Call dim(), head(), tail(), str(), and summary() on straw_hat_df
120 | dim(straw_hat_df)
121 | head(straw_hat_df)
122 | tail(straw_hat_df)
123 | str(straw_hat_df)
124 | summary(straw_hat_df)
125 | ```
126 | 
127 | *** =sct
128 | ```{r}
129 | test_output_contains("dim(straw_hat_df)", incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#23565; `straw_hat_df` &#20351;&#29992; `dim()` &#20989;&#25976;&#65311;")
130 | 
131 | test_output_contains("head(straw_hat_df)", incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#23565; `straw_hat_df` &#20351;&#29992; `head()` &#20989;&#25976;&#65311;")
132 | 
133 | test_output_contains("tail(straw_hat_df)", incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#23565; `straw_hat_df` &#20351;&#29992; `tail()` &#20989;&#25976;&#65311;")
134 | 
135 | test_output_contains("str(straw_hat_df)", incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#23565; `straw_hat_df` &#20351;&#29992; `str()` &#20989;&#25976;&#65311;")
136 | 
137 | test_output_contains("summary(straw_hat_df)", incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#23565; `straw_hat_df` &#20351;&#29992; `summary()` &#20989;&#25976;&#65311;")
138 | 
139 | success_msg("&#22826;&#26834;&#20102;&#65292;&#36889;&#20123;&#20989;&#25976;&#37117;&#38750;&#24120;&#23526;&#29992;&#65292;&#19968;&#23450;&#35201;&#25226;&#23427;&#20497;&#35352;&#36215;&#20358;&#65281;");
140 | ```
141 | 
142 | --- type:NormalExercise lang:r xp:100 skills:4 key:92e28431f1
143 | ## Sorting a Data Frame by a Column
144 | 
145 | Sometimes we have our own preferences for how a data frame looks and is ordered; for example, we may want to sort it alphabetically or by age. In R this can be done with the [`order()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/order) function:
146 | 
147 | ```{r}
148 | df[order(df$col, decreasing = FALSE), ]
149 | ```
150 | 
151 | [`order()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/order) sorts in **ascending order** by default, so the `decreasing = ` argument defaults to `FALSE`. For **descending order**, set `decreasing = TRUE`.
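
Here is a small sketch of both sort directions, assuming a hypothetical three-row data frame named `crew` that reuses values from the first exercise. Note the comma after the `order()` call: `order()` returns row positions, and the empty slot after the comma selects all columns.

```{r}
crew <- data.frame(
  name = c("Monkey D. Luffy", "Roronoa Zoro", "Nami"),
  height = c(174, 181, 170),
  bounty = c(500000000, 320000000, 66000000),
  stringsAsFactors = FALSE
)

# Ascending by height (decreasing = FALSE is the default, so it is omitted)
by_height <- crew[order(crew$height), ]

# Descending by bounty
by_bounty <- crew[order(crew$bounty, decreasing = TRUE), ]

by_height$name
by_bounty$name
```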
152 | 
153 | *** =instructions
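154 | - Sort the Straw Hat Pirates data frame by `height` in ascending order.
155 | - Sort the Straw Hat Pirates data frame by `bounty` in descending order.
156 | 
157 | *** =hint
158 | - When sorting by `height` in ascending order, set `decreasing = ` to `FALSE`, or omit the argument to use the default.
159 | - When sorting by `bounty` in descending order, set `decreasing = ` to `TRUE`.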
160 | 
161 | *** =pre_exercise_code
162 | ```{r}
163 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
164 | ```
165 | 
166 | *** =sample_code
167 | ```{r}
168 | # straw_hat_df is preloaded
169 | 
170 | # Sort by height in ascending order
171 | straw_hat_df[order(straw_hat_df$__, decreasing = __), ]
172 | 
173 | # Sort by bounty in descending order
174 | straw_hat_df[order(straw_hat_df$__, decreasing = __), ]
175 | 
176 | ```
177 | 
178 | *** =solution
179 | ```{r}
180 | # straw_hat_df is preloaded
181 | 
182 | # Sort by height in ascending order
183 | straw_hat_df[order(straw_hat_df$height, decreasing = FALSE), ]
184 | 
185 | # Sort by bounty in descending order
186 | straw_hat_df[order(straw_hat_df$bounty, decreasing = TRUE), ]
187 | 
188 | ```
189 | 
190 | *** =sct
191 | ```{r}
192 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `order()` &#20989;&#25976;&#23565;&#36523;&#39640;&#20570;&#36958;&#22686;&#25490;&#24207;&#65311;"
193 | test_output_contains("straw_hat_df[order(straw_hat_df$height, decreasing = FALSE), ]",
194 |                      incorrect_msg = msg)
195 | 
196 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `order()` &#20989;&#25976;&#23565;&#36062;&#37329;&#20570;&#36958;&#28187;&#25490;&#24207;&#65311;"
197 | test_output_contains("straw_hat_df[order(straw_hat_df$bounty, decreasing = TRUE), ]",
198 |                      incorrect_msg = msg)
199 | 
200 | success_msg("&#22826;&#22909;&#20102;&#65292;&#36889;&#20123;&#29105;&#36523;&#32244;&#32722;&#23565;&#20320;&#32780;&#35328;&#19968;&#23450;&#37117;&#30456;&#30070;&#23481;&#26131;&#21543;&#65311;&#28310;&#20633;&#22909;&#25105;&#20497;&#23601;&#35201;&#33322;&#21521;&#19979;&#19968;&#20491;&#23798;&#23996;&#22217;&#65281;")
201 | ```


--------------------------------------------------------------------------------
/chapter5.md:
--------------------------------------------------------------------------------
  1 | ---
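  2 | title_meta  : Chapter 5
  3 | title       : Joining Data Frames
  4 | description : In a data analysis project, the data may be scattered across several data frames, so you need a clear understanding of how to join them. If you have queried a relational database before, the exercises in this chapter will feel familiar. If you have not, do not worry, because we will learn these techniques and concepts in this chapter. A seafaring adventure in the race for One Piece!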
  5 | 
  6 | --- type:NormalExercise lang:r xp:100 skills:4 key:4aa0c0c890
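  7 | ## The Devil Fruit Data Frame
  8 | 
  9 | **Devil Fruits** are the mysterious fruits of the One Piece world, also known as "incarnations of the sea devil". They grant special abilities to whoever eats them, and people who have eaten one are commonly called "Devil Fruit users". The Straw Hat Pirates have 4 Devil Fruit users, but you notice that we have no column recording the crew's Devil Fruit information, so you hack into the Marines' classified database and obtain the Straw Hat Pirates' Devil Fruit data, `straw_hat_devil_fruit`.
 10 | 
 11 | Source: [One Piece Wiki](http://onepiece.wikia.com/wiki/Main_Page)
 12 | 
 13 | *** =instructions
 14 | - Print the Straw Hat Pirates' Devil Fruit data frame to the R Console.
 15 | 
 16 | *** =hint
 17 | - Simply type `straw_hat_devil_fruit` in the editor.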
 18 | 
 19 | *** =pre_exercise_code
 20 | ```{r}
 21 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_devil_fruit.RData"))
 22 | ```
 23 | 
 24 | *** =sample_code
 25 | ```{r}
 26 | # straw_hat_devil_fruit is preloaded
 27 | 
 28 | # Print straw_hat_devil_fruit to the R Console
 29 | 
 30 | ```
 31 | 
 32 | *** =solution
 33 | ```{r}
 34 | # straw_hat_devil_fruit is preloaded
 35 | 
 36 | # Print straw_hat_devil_fruit to the R Console
 37 | straw_hat_devil_fruit
 38 | ```
 39 | 
 40 | *** =sct
 41 | ```{r}
 42 | test_output_contains("straw_hat_devil_fruit",
 43 |                      incorrect_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#25226; `straw_hat_devil_fruit` &#36664;&#20986;&#22312; R Console&#65311;")
 44 |                      
 45 | success_msg("&#23565;&#20320;&#36889;&#20301;&#34987;&#25976;&#21315;&#33836;&#35997;&#37324;&#25080;&#36062;&#30340;&#28023;&#36042;&#32780;&#35328;&#36889;&#32244;&#32722;&#23526;&#22312;&#26159;&#19968;&#29255;&#34507;&#31957;&#65281;")
 46 | ```
 47 | 
 48 | --- type:NormalExercise lang:r xp:100 skills:4 key:84d7738b12
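 49 | ## Inner Join
 50 | 
 51 | Now that you have `straw_hat_devil_fruit`, you want to add the Devil Fruit information to the original data frame. R uses the [`merge()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) function to join data frames. If you have never used a relational database, think of what Excel's VLOOKUP function does.
 52 | 
 53 | ```{r}
 54 | merge(df1, df2, by = "foreign_key_column", ...)
 55 | ```
 56 | 
 57 | The `by = ` argument specifies the column that the two data frames are matched on. The key column has the same name, `name`, in both of our data frames, and [`merge()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) by default matches on the column names that the two data frames share, so we do not need to specify `by = ` here.
 58 | 
 59 | The result keeps only the observations that are matched in both data frames. This is what is known as an **inner join**!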
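
A minimal sketch of an inner join on two small, self-contained data frames follows. The objects `crew` and `fruit` are hypothetical stand-ins for `straw_hat_df` and `straw_hat_devil_fruit`, and the Devil Fruit names are illustrative values that may be spelled differently in the course dataset.

```{r}
crew <- data.frame(
  name = c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Tony Tony Chopper"),
  age = c(19, 21, 20, 17),
  stringsAsFactors = FALSE
)

# Illustrative values; the course's straw_hat_devil_fruit may differ
fruit <- data.frame(
  name = c("Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook"),
  devil_fruit = c("Gomu Gomu no Mi", "Hito Hito no Mi",
                  "Hana Hana no Mi", "Yomi Yomi no Mi"),
  stringsAsFactors = FALSE
)

# name is the only shared column, so this equals merge(crew, fruit, by = "name")
inner <- merge(crew, fruit)
inner
```

Only the two crew members present in both data frames remain in `inner`.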
 60 | 
 61 | *** =instructions
 62 | - Use `merge()` to join `straw_hat_df` and `straw_hat_devil_fruit`, and assign the joined data frame to `straw_hat_df_devil_fruit`.
 63 | - Print `straw_hat_df_devil_fruit` to the R Console.
 64 | 
 65 | *** =hint
 66 | - Typing `straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit)` is all you need.
 67 | 
 68 | *** =pre_exercise_code
 69 | ```{r}
 70 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
 71 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_devil_fruit.RData"))
 72 | ```
 73 | 
 74 | *** =sample_code
 75 | ```{r}
 76 | # straw_hat_df and straw_hat_devil_fruit are preloaded
 77 | 
 78 | # Join the data frames
 79 | straw_hat_df_devil_fruit <-
 80 | 
 81 | # Print the result to the R Console
 82 | 
 83 | ```
 84 | 
 85 | *** =solution
 86 | ```{r}
 87 | # straw_hat_df and straw_hat_devil_fruit are preloaded
 88 | 
 89 | # Join the data frames
 90 | straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit)
 91 | 
 92 | # Print the result to the R Console
 93 | straw_hat_df_devil_fruit
 94 | ```
 95 | 
 96 | *** =sct
 97 | ```{r}
 98 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `merge()` &#20989;&#25976;&#29983;&#25104; `straw_hat_df_devil_fruit`&#65311;"
 99 | test_object("straw_hat_df_devil_fruit", 
100 |             undefined_msg = msg, 
101 |             incorrect_msg = msg)
102 | 
103 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#22312; R Console &#36664;&#20986; `straw_hat_df_devil_fruit`&#65311;"
104 | test_output_contains("straw_hat_df_devil_fruit",
105 |                      incorrect_msg = msg)
106 | 
107 | success_msg("&#22826;&#26834;&#20102;&#65292;&#35264;&#23519;&#19968;&#19979;&#20320;&#30340;&#36664;&#20986;&#65292;&#20839;&#37096;&#32879;&#32080;&#21482;&#26371;&#20445;&#30041;&#20841;&#20491;&#36039;&#26009;&#26694;&#26377;&#21443;&#29031;&#21040;&#30340;&#35264;&#28204;&#20540;&#65281;")
108 | ```
109 | 
110 | --- type:NormalExercise lang:r xp:100 skills:4 key:5200a77316
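111 | ## Left Outer Join
112 | 
113 | Recall the final output of the previous exercise. Our original data frame has 9 crew members, but only 4 of them are Devil Fruit users. When [`merge()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) is called without any other arguments it performs an **inner join** by default, so the output contains only the crew members found in both data frames.
114 | 
115 | To keep the data for every crew member, additionally specify `all.x = TRUE` in [`merge()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) to ask R to keep all observations from the first data frame:
116 | 
117 | ```{r}
118 | merge(df1, df2, all.x = TRUE)
119 | ```
120 | 
121 | The result keeps all observations from the first data frame, and crew members with no matching Devil Fruit are recorded with missing values (`NA`). This is what is known as a **left outer join**!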
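
The sketch below illustrates the left outer join on two small, self-contained data frames. As before, `crew` and `fruit` are hypothetical stand-ins for the course data, and the Devil Fruit names are illustrative values.

```{r}
crew <- data.frame(
  name = c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Tony Tony Chopper"),
  age = c(19, 21, 20, 17),
  stringsAsFactors = FALSE
)

# Illustrative values; the course's straw_hat_devil_fruit may differ
fruit <- data.frame(
  name = c("Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook"),
  devil_fruit = c("Gomu Gomu no Mi", "Hito Hito no Mi",
                  "Hana Hana no Mi", "Yomi Yomi no Mi"),
  stringsAsFactors = FALSE
)

# Keep every row of the first data frame (crew)
left <- merge(crew, fruit, all.x = TRUE)

# Crew members without a match get NA in the columns that come from fruit
left$name[is.na(left$devil_fruit)]
```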
122 | 
123 | *** =instructions
124 | - Use `merge()` to left outer join `straw_hat_df` and `straw_hat_devil_fruit`, and assign the joined data frame to `straw_hat_df_devil_fruit`.
125 | - Print `straw_hat_df_devil_fruit` to the R Console.
126 | 
127 | *** =hint
128 | - Set `all.x = TRUE` in `merge()`.
129 | 
130 | *** =pre_exercise_code
131 | ```{r}
132 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
133 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_devil_fruit.RData"))
134 | ```
135 | 
136 | *** =sample_code
137 | ```{r}
138 | # straw_hat_df and straw_hat_devil_fruit are preloaded
139 | 
140 | # Left outer join
141 | straw_hat_df_devil_fruit <- 
142 | 
143 | # Print the result to the R Console
144 | 
145 | ```
146 | 
147 | *** =solution
148 | ```{r}
149 | # straw_hat_df and straw_hat_devil_fruit are preloaded
150 | 
151 | # Left outer join
152 | straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit, all.x = TRUE)
153 | 
154 | # Print the result to the R Console
155 | straw_hat_df_devil_fruit
156 | ```
157 | 
158 | *** =sct
159 | ```{r}
160 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `merge()` &#20989;&#25976;&#29983;&#25104;&#24038;&#22806;&#37096;&#32879;&#32080;&#30340;&#32080;&#26524;&#65311;"
161 | test_object("straw_hat_df_devil_fruit", 
162 |             undefined_msg = msg, 
163 |             incorrect_msg = msg) 
164 | 
165 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df_devil_fruit`&#36664;&#20986;&#22312; R Console&#65311;"
166 | test_output_contains("straw_hat_df_devil_fruit",
167 |                      incorrect_msg = msg)
168 | 
169 | success_msg("&#35264;&#23519;&#19968;&#19979;&#24038;&#22806;&#37096;&#32879;&#32080;&#29983;&#25104;&#30340;&#32080;&#26524;&#65292;&#27880;&#24847;&#27794;&#26377;&#21443;&#29031;&#21040;&#30340;&#35264;&#28204;&#20540;&#26371;&#20197;&#36986;&#28431;&#20540;&#26041;&#24335;&#35352;&#37636;&#65281;")
170 | ```
171 | 
172 | --- type:NormalExercise lang:r xp:100 skills:4 key:cf1c36b9fc
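173 | ## Right Outer Join
174 | 
175 | This exercise mirrors the previous one. Since there is a left outer join that keeps all observations from the first data frame, there must also be a **right outer join** that keeps all observations from the second data frame. To keep the data for every Devil Fruit user, additionally specify `all.y = TRUE` in [`merge()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) to ask R to keep all observations from the second data frame:
176 | 
177 | ```{r}
178 | merge(df1, df2, all.y = TRUE)
179 | ```
180 | 
181 | To demonstrate the effect of a **right outer join**, we have trimmed the first data frame, `straw_hat_df`, down to the character attributes of four crew members: two Devil Fruit users and two non-users. The result keeps all observations from the second data frame, and for Devil Fruit users with no matching character attributes, the character attribute columns are recorded as missing values. This is what is known as a **right outer join**!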
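
A minimal sketch of the right outer join follows, again on hypothetical stand-in data frames named `crew` and `fruit`. The four rows of `crew` match the trimmed `straw_hat_df` described above; the Devil Fruit names are illustrative values.

```{r}
crew <- data.frame(
  name = c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Tony Tony Chopper"),
  age = c(19, 21, 20, 17),
  stringsAsFactors = FALSE
)

# Illustrative values; the course's straw_hat_devil_fruit may differ
fruit <- data.frame(
  name = c("Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook"),
  devil_fruit = c("Gomu Gomu no Mi", "Hito Hito no Mi",
                  "Hana Hana no Mi", "Yomi Yomi no Mi"),
  stringsAsFactors = FALSE
)

# Keep every row of the second data frame (fruit)
right <- merge(crew, fruit, all.y = TRUE)

# Devil Fruit users missing from crew get NA in the columns that come from crew
right$name[is.na(right$age)]
```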
182 | 
183 | *** =instructions
184 | - First print the modified `straw_hat_df` to the R Console and take a look.
185 | - Use `merge()` to right outer join `straw_hat_df` and `straw_hat_devil_fruit`, and assign the joined data frame to `straw_hat_df_devil_fruit`.
186 | - Print `straw_hat_df_devil_fruit` to the R Console.
187 | 
188 | *** =hint
189 | - Set `all.y = TRUE` in `merge()`.
190 | 
191 | *** =pre_exercise_code
192 | ```{r}
193 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
194 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_devil_fruit.RData"))
195 | straw_hat_df <- straw_hat_df[c(1:3, 6), ]
196 | ```
197 | 
198 | *** =sample_code
199 | ```{r}
200 | # straw_hat_df and straw_hat_devil_fruit are preloaded
201 | 
202 | # Print straw_hat_df to the R Console
203 | 
204 | 
205 | # Right outer join
206 | straw_hat_df_devil_fruit <- 
207 | 
208 | # Print the result to the R Console
209 | 
210 | ```
211 | 
212 | *** =solution
213 | ```{r}
214 | # straw_hat_df and straw_hat_devil_fruit are preloaded
215 | 
216 | # Print straw_hat_df to the R Console
217 | straw_hat_df
218 | 
219 | # Right outer join
220 | straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit, all.y = TRUE)
221 | 
222 | # Print the result to the R Console
223 | straw_hat_df_devil_fruit
224 | ```
225 | 
226 | *** =sct
227 | ```{r}
228 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;"
229 | test_output_contains("straw_hat_df",
230 |                      incorrect_msg = msg)
231 | 
232 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `merge()` &#20989;&#25976;&#36914;&#34892;&#21491;&#22806;&#37096;&#32879;&#32080;&#65311;"
233 | test_object("straw_hat_df_devil_fruit", 
234 |             undefined_msg = msg, 
235 |             incorrect_msg = msg) 
236 |                      
237 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df_devil_fruit` &#36664;&#20986;&#22312; R Console&#65311;"
238 | test_output_contains("straw_hat_df_devil_fruit",
239 |                      incorrect_msg = msg)
240 | 
241 | success_msg("&#22826;&#26834;&#20102;&#65292;&#36996;&#24046;&#19968;&#20491;&#32244;&#32722;&#20320;&#30340;&#36039;&#26009;&#26694;&#32879;&#32080;&#25216;&#33021;&#23601;&#21487;&#20197;&#34987;&#40670;&#28415;&#20102;&#65281;")
242 | ```
243 | 
244 | --- type:NormalExercise lang:r xp:100 skills:4 key:69321d3583
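245 | ## Full Outer Join
246 | 
247 | Since there is a left outer join that keeps all observations from the first data frame and a right outer join that keeps all observations from the second, a clever pirate like you has surely guessed that there must also be a join that keeps all observations from both data frames. You guessed right: it is known as a **full outer join**. Additionally specify both `all.x = TRUE` and `all.y = TRUE` in [`merge()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) to ask R to keep all observations from both data frames: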
248 | 
249 | ```{r}
250 | merge(df1, df2, all.x = TRUE, all.y = TRUE)
251 | ```
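
The call above can be sketched on two small, self-contained data frames. `crew` and `fruit` are hypothetical stand-ins for the course data with illustrative Devil Fruit names. The sketch also uses `all = TRUE`, an argument of `merge()` that sets `all.x` and `all.y` together.

```{r}
crew <- data.frame(
  name = c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Tony Tony Chopper"),
  age = c(19, 21, 20, 17),
  stringsAsFactors = FALSE
)

# Illustrative values; the course's straw_hat_devil_fruit may differ
fruit <- data.frame(
  name = c("Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook"),
  devil_fruit = c("Gomu Gomu no Mi", "Hito Hito no Mi",
                  "Hana Hana no Mi", "Yomi Yomi no Mi"),
  stringsAsFactors = FALSE
)

# Keep every row from both data frames
full <- merge(crew, fruit, all.x = TRUE, all.y = TRUE)

# all = TRUE is shorthand for setting all.x and all.y together
full_short <- merge(crew, fruit, all = TRUE)

nrow(full)
```

The result has one row per distinct name across both data frames, with `NA` wherever one side has no match.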
252 | 
253 | *** =instructions
254 | - First print the modified `straw_hat_df` to the R Console and take a look.
255 | - Use `merge()` to full outer join `straw_hat_df` and `straw_hat_devil_fruit`, and assign the joined data frame to `straw_hat_df_devil_fruit`.
256 | - Print `straw_hat_df_devil_fruit` to the R Console.
257 | 
258 | *** =hint
259 | - Set `all.x = TRUE, all.y = TRUE` in `merge()`.
260 | 
261 | *** =pre_exercise_code
262 | ```{r}
263 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
264 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_devil_fruit.RData"))
265 | straw_hat_df <- straw_hat_df[c(1:3, 6), ]
266 | ```
267 | 
268 | *** =sample_code
269 | ```{r}
270 | # straw_hat_df and straw_hat_devil_fruit are preloaded
271 | 
272 | # Print straw_hat_df to the R Console
273 | straw_hat_df
274 | 
275 | # Full outer join
276 | straw_hat_df_devil_fruit <- 
277 | 
278 | # Print the result to the R Console
279 | straw_hat_df_devil_fruit
280 | ```
281 | 
282 | *** =solution
283 | ```{r}
284 | # straw_hat_df and straw_hat_devil_fruit are preloaded
285 | 
286 | # Print straw_hat_df to the R Console
287 | straw_hat_df
288 | 
289 | # Full outer join
290 | straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit, all.x = TRUE, all.y = TRUE)
291 | 
292 | # Print the result to the R Console
293 | straw_hat_df_devil_fruit
294 | ```
295 | 
296 | *** =sct
297 | ```{r}
298 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;"
299 | test_output_contains("straw_hat_df",
300 |                      incorrect_msg = msg)
301 | 
302 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `merge()` &#20989;&#25976;&#36914;&#34892;&#20840;&#22806;&#37096;&#32879;&#32080;&#65311;"
303 | test_object("straw_hat_df_devil_fruit", 
304 |             undefined_msg = msg, 
305 |             incorrect_msg = msg) 
306 | 
307 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#23559; `straw_hat_df_devil_fruit` &#36664;&#20986;&#22312; R Console&#65311;"
308 | test_output_contains("straw_hat_df_devil_fruit",
309 |                      incorrect_msg = msg)
310 | success_msg("&#24685;&#21916;&#20320;&#65292;&#20320;&#24050;&#32147;&#25104;&#28858;&#36039;&#26009;&#32879;&#32080;&#30340;&#39640;&#25163;&#65292;&#26597;&#19968;&#19979;&#28023;&#36557;&#30340;&#36039;&#26009;&#24235;&#65292;&#20320;&#24050;&#32147;&#34987;&#21015;&#20837;&#19978;&#20740;&#36062;&#37329;&#30340;&#37325;&#40670;&#36890;&#32221;&#21517;&#21934;&#65281;")
311 | ```
312 | 
313 | --- type:NormalExercise lang:r xp:100 skills:4 key:65494f3253
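314 | ## Multiple Key Columns
315 | 
316 | Your bounty is now on par with Straw Hat Luffy's, but you are curious about Luffy's bounty history, so you hack into the Marines' classified database once more and find a data frame that records Luffy's bounty in each period. You discover that the Marines use two columns, event and bounty, to record the bounty for each period. When we need to match on more than one column, we pass the key columns to the `by = ` argument with `c()`: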
317 | 
318 | ```{r}
319 | merge(df1, df2, by = c("col1", "col2", ...))
320 | ```
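As a minimal sketch of a two-column join, the example below builds two small made-up data frames (`events` and `bounties` are hypothetical names with invented values, not the course datasets) and joins them on both `name` and `event`. It also shows what would likely go wrong if only one key were used.

```{r}
# Hypothetical toy data; names and values are illustrative only
events <- data.frame(
  name  = c("Luffy", "Luffy", "Zoro"),
  event = c("Arlong Park", "Enies Lobby", "Enies Lobby"),
  year  = c(1, 2, 2)
)
bounties <- data.frame(
  name   = c("Luffy", "Luffy", "Zoro"),
  event  = c("Arlong Park", "Enies Lobby", "Enies Lobby"),
  bounty = c(30, 300, 120)
)

# A row is matched only when BOTH key columns agree
joined <- merge(events, bounties, by = c("name", "event"))
print(joined)

# Joining on "event" alone also pairs Luffy's Enies Lobby row with Zoro's,
# so the result has more rows than either input
by_event_only <- merge(events, bounties, by = "event")
print(nrow(by_event_only))
```

Listing every key column in `by = ` keeps each record matched to exactly one counterpart in this toy data.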
321 | 
322 | Source: [One Piece Wiki](http://onepiece.wikia.com/wiki/Main_Page)
323 | 
324 | *** =instructions
325 | - Print `luffy_event` and `luffy_event_bounty` to the R Console.
326 | - Inner join `luffy_event` and `luffy_event_bounty` to create `luffy_bounty`.
327 | - Print `luffy_bounty` to the R Console.
328 | 
329 | *** =hint
330 | - Call `merge()` with `by = c("name", "event")`, or omit `by = ` and rely on the default, which joins on every column name the two data frames share.
331 | 
332 | *** =pre_exercise_code
333 | ```{r}
334 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/luffy_event_bounty.RData"))
335 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/luffy_event.RData"))
336 | ```
337 | 
338 | *** =sample_code
339 | ```{r}
340 | # luffy_event_bounty 與 luffy_event 都已預先載入
341 | 
342 | # 將 luffy_event_bounty 與 luffy_event 輸出在 R Console
343 | 
344 | 
345 | 
346 | # 內部聯結
347 | luffy_bounty <- 
348 | 
349 | # 將 luffy_bounty 輸出在 R Console
350 | 
351 | ```
352 | 
353 | *** =solution
354 | ```{r}
355 | # luffy_event_bounty 與 luffy_event 都已預先載入
356 | 
357 | # 將 luffy_event_bounty 與 luffy_event 輸出在 R Console
358 | luffy_event_bounty
359 | luffy_event
360 | 
361 | # 內部聯結
362 | luffy_bounty <- merge(luffy_event, luffy_event_bounty, by = c("name", "event"))
363 | 
364 | # 將 luffy_bounty 輸出在 R Console
365 | luffy_bounty
366 | ```
367 | 
368 | *** =sct
369 | ```{r}
370 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#36664;&#20986; `luffy_event_bounty` &#22312; R Console&#65311;"
371 | test_output_contains("luffy_event_bounty",
372 |                      incorrect_msg = msg)
373 | 
374 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#36664;&#20986; `luffy_event` &#22312; R Console&#65311;"
375 | test_output_contains("luffy_event",
376 |                      incorrect_msg = msg)
377 | 
378 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `merge()` &#20989;&#25976;&#29983;&#25104; `luffy_bounty`&#65311;"
379 | test_object("luffy_bounty", 
380 |             undefined_msg = msg, 
381 |             incorrect_msg = msg)
382 | 
383 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#36664;&#20986; `luffy_bounty` &#22312; R Console&#65311;"
384 | test_output_contains("luffy_bounty",
385 |                      incorrect_msg = msg)
386 | 
387 | success_msg("&#24685;&#21916;&#20320;&#65281;&#20320;&#39640;&#36229;&#30340;&#36039;&#26009;&#26694;&#25972;&#29702;&#25216;&#24039;&#24050;&#32147;&#34987;&#28023;&#36557;&#21015;&#28858;&#37325;&#40670;&#36890;&#32221;&#28023;&#36042;&#65292;&#20320;&#30340;&#21517;&#34399;&#24050;&#32147;&#20659;&#36941;&#20102;&#26032;&#19990;&#30028;&#65292;&#19968;&#22580;&#29229;&#22890; One Piece &#30340;&#28023;&#19978;&#20882;&#38570;&#25925;&#20107;&#65281;")
388 | ```
389 | 
390 | 


--------------------------------------------------------------------------------
/chapter4.md:
--------------------------------------------------------------------------------
  1 | ---
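  2 | title_meta  : Chapter 4
  3 | title       : Column Aggregation and Reshaping
  4 | description : Analyzing a data frame often calls for summary statistics on certain columns, such as a sum or a mean, to get a clearer picture of how a variable is distributed. Sometimes you need summaries computed separately for each level of a categorical variable, much like a pivot table in Excel. Experienced data scientists also need to be comfortable converting between long and wide data frames, reshaping the data as the task requires. This chapter covers these techniques and concepts in a seafaring adventure for the One Piece!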
  5 | 
  6 | --- type:NormalExercise lang:r xp:100 skills:4 key:eb29a9a1a9
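  7 | ## Summary Statistics
  8 | 
  9 | You and the Straw Hat Pirates both belong to the **Supernova generation**. Know yourself and know your enemy, and you will never be defeated. You decide to analyze this potential rival carefully. In Chapter 1 we introduced the [`summary()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/summary) function, which gives you a quick overview of a data frame. Besides a whole data frame, [`summary()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/summary) can also be applied to a single variable: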
 10 | 
 11 | ```{r}
 12 | summary(df$col)
 13 | ```
 14 | 
 15 | Simple functions such as [`sum()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/sum) and [`sd()`](http://www.rdocumentation.org/packages/stats/versions/3.3.1/topics/sd) also produce individual statistics, so you are not limited to the fixed set of summaries that [`summary()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/summary) returns.
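A minimal sketch of these three calls on a single numeric vector follows. The `bounty` values here are invented for illustration and are not the values stored in `straw_hat_df`.

```{r}
# Hypothetical bounty values, for illustration only
bounty <- c(500, 320, 66, 200, 177)

# Minimum, quartiles, mean, and maximum in one call
print(summary(bounty))

# Individual statistics
total   <- sum(bounty)  # total of all values
std_dev <- sd(bounty)   # sample standard deviation (divides by n - 1)
print(total)
print(std_dev)
```

Note that `sd()` computes the sample standard deviation, so it divides by `n - 1` rather than `n`.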
 16 | 
 17 | *** =instructions
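 18 | - Call `summary()` on the Straw Hat Pirates data frame.
 19 | - Call `summary()` on the `height` variable of the Straw Hat Pirates data frame.
 20 | - Call `sum()` on the `bounty` variable of the Straw Hat Pirates data frame.
 21 | - Call `sd()` on the `bounty` variable of the Straw Hat Pirates data frame.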
 22 | 
 23 | *** =hint
 24 | - Pass `straw_hat_df` as the only argument to `summary()`.
 25 | - Pass `straw_hat_df$height` as the only argument to `summary()`.
 26 | - Pass `straw_hat_df$bounty` as the only argument to `sum()`.
 27 | - Pass `straw_hat_df$bounty` as the only argument to `sd()`.
 28 | 
 29 | *** =pre_exercise_code
 30 | ```{r}
 31 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
 32 | ```
 33 | 
 34 | *** =sample_code
 35 | ```{r}
 36 | # straw_hat_df 已預先載入
 37 | 
 38 | # 對 straw_hat_df 使用 summary()
 39 | 
 40 | 
 41 | # 對 straw_hat_df$height 使用 summary()
 42 | 
 43 | 
 44 | # 對 straw_hat_df$bounty 使用 sum()
 45 | 
 46 | 
 47 | # 對 straw_hat_df$bounty 使用 sd()
 48 | 
 49 | 
 50 | ```
 51 | 
 52 | 
 53 | *** =solution
 54 | ```{r}
 55 | # straw_hat_df 已預先載入
 56 | 
 57 | # 對 straw_hat_df 使用 summary()
 58 | summary(straw_hat_df)
 59 | 
 60 | # 對 straw_hat_df$height 使用 summary()
 61 | summary(straw_hat_df$height)
 62 | 
 63 | # 對 straw_hat_df$bounty 使用 sum()
 64 | sum(straw_hat_df$bounty)
 65 | 
 66 | # 對 straw_hat_df$bounty 使用 sd()
 67 | sd(straw_hat_df$bounty)
 68 | 
 69 | ```
 70 | 
 71 | *** =sct
 72 | ```{r}
 73 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23565; `straw_hat_df` &#20351;&#29992; `summary()` &#20989;&#25976;&#65311;"
 74 | test_output_contains("summary(straw_hat_df)",
 75 |                      incorrect_msg = msg)
 76 |                      
 77 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23565; `straw_hat_df$height` &#20351;&#29992; `summary()` &#20989;&#25976;&#65311;"
 78 | test_output_contains("summary(straw_hat_df$height)",
 79 |                      incorrect_msg = msg)
 80 | 
 81 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23565; `straw_hat_df$bounty` &#20351;&#29992; `sum()` &#20989;&#25976;&#65311;"
 82 | test_output_contains("sum(straw_hat_df$bounty)",
 83 |                      incorrect_msg = msg)
 84 | 
 85 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23565; `straw_hat_df$bounty` &#20351;&#29992; `sd()` &#20989;&#25976;&#65311;"
 86 | test_output_contains("sd(straw_hat_df$bounty)",
 87 |                      incorrect_msg = msg)
 88 | 
 89 | success_msg("&#30475;&#21040;&#33609;&#24125;&#28023;&#36042;&#22296;&#20840;&#22296;&#30340;&#25080;&#36062;&#37329;&#38989;&#36889;&#40636;&#39640;&#65292;&#20320;&#26377;&#27794;&#26377;&#35258;&#24471;&#36996;&#26159;&#19981;&#35201;&#25307;&#24825;&#20182;&#20497;&#22909;&#20102;&#21602;&#65311;")
 90 | ```
 91 | 
 92 | --- type:NormalExercise lang:r xp:100 skills:4 key:077f2d637a
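 93 | ## Summary Statistics (2)
 94 | 
 95 | As a member of the **Supernova generation**, you will not settle for the World Government's crude overall summary statistics when sizing up the Straw Hat Pirates. Breaking the crew down by gender or by battle role might reveal a weakness you can exploit in a future head-on clash.
 96 | 
 97 | To do this we will build something like the **pivot table** commonly used in Excel. The simplest route is the [`ddply()`](http://www.rdocumentation.org/packages/plyr/versions/1.8.4/topics/ddply) function. Unlike the functions used so far, it is not part of base R; it comes from the [`plyr`](http://www.rdocumentation.org/packages/plyr/versions/1.8.4) package, so you must load [`plyr`](http://www.rdocumentation.org/packages/plyr/versions/1.8.4) with the [`library()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/library) function before using it.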
 98 | 
 99 | ```{r}
100 | library(plyr)
101 | ```
102 | 
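103 | [`ddply()`](http://www.rdocumentation.org/packages/plyr/versions/1.8.4/topics/ddply) takes more arguments than the functions seen so far: `.variables =` names the categorical variables to group by, `.fun = summarise` can be left unchanged for now, and the remaining arguments give the name and formula of each aggregated column: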
104 | 
105 | ```{r}
106 | ddply(df, .variables = c("category1", "category2", ...), .fun = summarise, mean_value1 = mean(value))
107 | ```
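To check what a grouped summary should look like, the same kind of result can be sketched with base R's `aggregate()`, which needs no extra package. This is a swapped-in cross-check, not the `ddply()` call the exercise asks for, and the `crew` data frame and its values are hypothetical.

```{r}
# Hypothetical toy data; values are illustrative only
crew <- data.frame(
  gender      = c("Male", "Male", "Female", "Female", "Male"),
  battle_role = c("Fighter", "Fighter", "Support", "Support", "Support"),
  height      = c(172, 181, 169, 188, 174),
  bounty      = c(500, 320, 66, 130, 200)
)

# Mean height for each gender (one output row per group)
avg_height <- aggregate(height ~ gender, data = crew, FUN = mean)
names(avg_height)[2] <- "avg_height"
print(avg_height)

# Total bounty for each battle role
ttl_bounty <- aggregate(bounty ~ battle_role, data = crew, FUN = sum)
names(ttl_bounty)[2] <- "ttl_bounty"
print(ttl_bounty)
```

`ddply()` with `.fun = summarise` is more convenient when several differently aggregated columns are needed in one call, since `aggregate()` applies a single function to every value column.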
108 | 
109 | *** =instructions
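110 | - Use `head()` to take a look at the Straw Hat Pirates data frame.
111 | - Compute the average height for each gender, grouping by `gender`.
112 | - Compute the total bounty for each battle role, grouping by `battle_role`.
113 | - Compute the average height and the total bounty, grouping by both gender and battle role.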
114 | 
115 | *** =hint
116 | - Type `head(straw_hat_df)` to look at the data frame.
117 | - Set `.variables = ` to `"gender"`; for the aggregation, see `avg_height = mean(height)`.
118 | - Set `.variables = ` to `"battle_role"`; for the aggregation, see `ttl_bounty = sum(bounty)`.
119 | - Set `.variables = ` to `c("gender", "battle_role")`; for the aggregation, see `avg_height = mean(height), ttl_bounty = sum(bounty)`.
120 | 
121 | *** =pre_exercise_code
122 | ```{r}
123 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
124 | battle_role <- factor(c("Fighter", "Fighter", "Support", "Support", "Fighter", "Support", "Support", "Fighter", "Fighter"))
125 | straw_hat_df$battle_role <- battle_role
126 | ```
127 | 
128 | *** =sample_code
129 | ```{r}
130 | # straw_hat_df 已預先載入
131 | 
132 | # 用 head() 函數看一下 straw_hat_df
133 | 
134 | 
135 | # 載入 plyr 套件
136 | library(plyr)
137 | 
138 | # 依據 gender 計算平均身高
139 | ddply(straw_hat_df, .variables = "__", .fun = summarise, avg_height = mean(__))
140 | 
141 | # 依據 battle_role 計算加總賞金
142 | ddply(straw_hat_df, .variables = "__", .fun = summarise, ttl_bounty = sum(__))
143 | 
144 | # 依據 gender 與 battle_role 計算平均身高與加總賞金
145 | ddply(straw_hat_df, .variables = c("__", "__"), .fun = summarise, avg_height = mean(__), ttl_bounty = sum(__))
146 | ```
147 | 
148 | *** =solution
149 | ```{r}
150 | # straw_hat_df 已預先載入
151 | 
152 | # 用 head() 函數看一下 straw_hat_df
153 | head(straw_hat_df)
154 | 
155 | # 載入 plyr 套件
156 | library(plyr)
157 | 
158 | # 依據 gender 計算平均身高
159 | ddply(straw_hat_df, .variables = "gender", .fun = summarise, avg_height = mean(height))
160 | 
161 | # 依據 battle_role 計算加總賞金
162 | ddply(straw_hat_df, .variables = "battle_role", .fun = summarise, ttl_bounty = sum(bounty))
163 | 
164 | # 依據 gender 與 battle_role 計算平均身高與加總賞金
165 | ddply(straw_hat_df, .variables = c("gender", "battle_role"), .fun = summarise, avg_height = mean(height), ttl_bounty = sum(bounty))
166 | ```
167 | 
168 | *** =sct
169 | ```{r}
170 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#20351;&#29992; `head()` &#20989;&#25976;&#30475;&#30475; `straw_hat_df` &#30340;&#21069;&#20845;&#21015;&#65311;"
171 | test_output_contains("head(straw_hat_df)",
172 |                      incorrect_msg = msg)
173 | 
174 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `ddply()` &#20989;&#25976;&#20381;&#25818; `gender` &#35336;&#31639;&#21508;&#24615;&#21029;&#30340;&#24179;&#22343;&#36523;&#39640;&#65311;"
175 | test_output_contains("ddply(straw_hat_df, .variables = \"gender\", .fun = summarise, avg_height = mean(height))",
176 |                      incorrect_msg = msg)
177 | 
178 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `ddply()` &#20989;&#25976;&#20381;&#25818; `battle_role` &#35336;&#31639;&#21508;&#25136;&#39717;&#35282;&#33394;&#30340;&#21152;&#32317;&#36062;&#37329;&#65311;"
179 | test_output_contains("ddply(straw_hat_df, .variables = \"battle_role\", .fun = summarise, ttl_bounty = sum(bounty))",
180 |                      incorrect_msg = msg)
181 | 
182 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `ddply()` &#20989;&#25976;&#20381;&#25818; `gender` &#33287; `battle_role` &#35336;&#31639;&#24179;&#22343;&#36523;&#39640;&#33287;&#21152;&#32317;&#36062;&#37329;&#65311;"
183 | test_output_contains("ddply(straw_hat_df, .variables = c(\"gender\", \"battle_role\"), .fun = summarise, avg_height = mean(height), ttl_bounty = sum(bounty))",
184 |                      incorrect_msg = msg)
185 | 
186 | success_msg("Wow, making it this far is no small feat! With skills like these, you truly are one of the Supernova generation. We have only covered one function, `ddply()`, from the `plyr` package; it contains many more functions worth your time!")
187 | ```
188 | 
189 | --- type:NormalExercise lang:r xp:100 skills:4 key:c3028a5601
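190 | ## Reshaping: Wide to Long
191 | 
192 | By this point in the course your data-wrangling bounty has probably climbed from a few hundred berries to several million berries, but to stand up to the Straw Hat Pirates it needs to reach the tens of millions! Next we learn how to reshape data, a skill that will raise your data-wrangling bounty to tens of millions of berries!
193 | 
194 | We start by turning a wide data frame into a long one. Remember the two integer columns in the Straw Hat Pirates data frame, `age` and `height`? Is this the only way to store the data? We could instead use one categorical column to store the kind of value and one numeric column to store the value itself, so that any number of measurements fits in just two columns!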
195 | 
196 | *** =instructions
197 | - Create a new data frame `straw_hat_wide_df` that contains only the name, age, and height columns.
198 | 
199 | *** =hint
200 | - Use `[, c("name", "age", "height")]` to select the columns.
201 | 
202 | *** =pre_exercise_code
203 | ```{r}
204 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
205 | ```
206 | 
207 | *** =sample_code
208 | ```{r}
209 | # straw_hat_df 已預先載入
210 | 
211 | # 建立一個新的資料框 straw_hat_wide_df 僅包含姓名、年齡與身高
212 | straw_hat_wide_df <- 
213 | ```
214 | 
215 | *** =solution
216 | ```{r}
217 | # straw_hat_df 已預先載入
218 | 
219 | # 建立一個新的資料框 straw_hat_wide_df 僅包含姓名、年齡與身高
220 | straw_hat_wide_df <- straw_hat_df[, c("name", "age", "height")]
221 | ```
222 | 
223 | *** =sct
224 | ```{r}
225 | msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#29986;&#29983; `straw_hat_wide_df` &#20006;&#21253;&#21547;&#22995;&#21517;&#12289;&#24180;&#40801;&#33287;&#36523;&#39640;&#36889;&#19977;&#20491;&#27396;&#20301;&#65311;"
226 | test_object("straw_hat_wide_df",
227 |             undefined_msg = msg, 
228 |             incorrect_msg = msg)
229 | 
230 | success_msg("&#22909;&#65292;&#29105;&#36523;&#23436;&#30050;&#20102;&#65281;&#25105;&#20497;&#25226;&#36039;&#26009;&#26694;&#31777;&#21270;&#25104;&#19977;&#20491;&#27396;&#20301;&#20043;&#24460;&#35201;&#38283;&#22987;&#20570;&#36681;&#32622;&#22217;&#65281;")
231 | ```
232 | 
233 | --- type:NormalExercise lang:r xp:100 skills:4 key:45945ea8ab
234 | ## Reshaping: Wide to Long (2)
235 | 
236 | To turn a wide data frame into a long one we use the [`gather()`](http://www.rdocumentation.org/packages/tidyr/versions/0.5.1/topics/gather) function. It is not part of base R; it comes from the [`tidyr`](http://www.rdocumentation.org/packages/tidyr/versions/0.5.1) package.
237 | 
238 | [`gather()`](http://www.rdocumentation.org/packages/tidyr/versions/0.5.1/topics/gather) takes several arguments. The first is the wide data frame, `key = ` names the new column that will store the **kind of value**, and `value = ` names the new column that will store the **values**. The remaining arguments are the original columns to be gathered:
239 | 
240 | ```{r}
241 | gather(df_wide, key = new_column_for_value_kind, value = new_column_for_values, original_column1, original_column2, ...)
242 | ```
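To make the target shape concrete, the sketch below builds by hand, in base R, the long layout that a wide-to-long reshape is expected to produce. It does not call `gather()`, and the `wide_df` data and values are hypothetical; the column names `cate` and `int` follow the hint in this exercise.

```{r}
# Hypothetical wide data: one row per person, one column per measurement
wide_df <- data.frame(
  name   = c("Luffy", "Zoro"),
  age    = c(19, 21),
  height = c(174, 181)
)

# Long layout: one row per (person, measurement) pair
long_df <- data.frame(
  name = rep(wide_df$name, times = 2),                 # each name once per measurement
  cate = rep(c("height", "age"), each = nrow(wide_df)), # which measurement the row holds
  int  = c(wide_df$height, wide_df$age)                # the measured value
)
print(long_df)
```

Two measurement columns over two rows become four rows, with the former column names now stored as values in `cate`.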
243 | 
244 | *** =instructions
245 | - Use `gather()` to reshape `straw_hat_wide_df` from the previous exercise into `straw_hat_long_df`.
246 | - Print `straw_hat_long_df` to the R Console.
247 | 
248 | *** =hint
249 | - Pass `straw_hat_wide_df, key = cate, value = int, height, age` as the arguments to `gather()`.
250 | 
251 | *** =pre_exercise_code
252 | ```{r}
253 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
254 | straw_hat_wide_df <- straw_hat_df[, c("name", "age", "height")]
255 | library(tidyr)
256 | ```
257 | 
258 | *** =sample_code
259 | ```{r}
260 | # straw_hat_wide_df 、 tidyr 已預先載入
261 | 
262 | # 轉置
263 | straw_hat_long_df <- gather(__, key = __, value = __, __, __)
264 | 
265 | # 將資料框輸出在 R Console
266 | 
267 | ```
268 | 
269 | *** =solution
270 | ```{r}
271 | # straw_hat_wide_df 、 tidyr 已預先載入
272 | 
273 | # 轉置
274 | straw_hat_long_df <- gather(straw_hat_wide_df, key = cate, value = int, height, age)
275 | 
276 | # 將資料框輸出在 R Console
277 | straw_hat_long_df
278 | ```
279 | 
280 | *** =sct
281 | ```{r}
282 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `gather()` &#20989;&#25976;&#23559;&#23532;&#36039;&#26009;&#26694;&#36681;&#25563;&#25104;&#38263;&#36039;&#26009;&#26694; `straw_hat_long_df`&#65311;"
283 | test_function("gather",
284 |               args = NULL, index = 1,
285 |               eval = TRUE,
286 |               eq_condition = "equivalent",
287 |               not_called_msg = msg,
288 |               args_not_specified_msg = msg,
289 |               incorrect_msg = msg)
290 | 
291 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_long_df` &#36664;&#20986;&#22312; R Console&#65311;"
292 | test_output_contains("straw_hat_long_df",
293 |                      incorrect_msg = msg)
294 | 
295 | success_msg("&#22826;&#26834;&#20102;&#65292;&#25105;&#20497;&#24050;&#32147;&#23416;&#26371;&#20102;&#23532;&#36039;&#26009;&#26694;&#35722;&#38263;&#36039;&#26009;&#26694;&#65292;&#25509;&#33879;&#20358;&#23416;&#24590;&#40636;&#25226;&#38263;&#36039;&#26009;&#26694;&#35722;&#25104;&#23532;&#36039;&#26009;&#26694;&#65281;")
296 | ```
297 | 
298 | --- type:NormalExercise lang:r xp:100 skills:4 key:3c9b7efd39
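299 | ## Reshaping: Long to Wide
300 | 
301 | Complete this exercise and your data-wrangling bounty will rise to tens of millions of berries, making you a true member of the **Supernova generation** who can stand up to the Straw Hat Pirates!
302 | 
303 | To turn a long data frame into a wide one we use another function from the [`tidyr`](http://www.rdocumentation.org/packages/tidyr/versions/0.5.1) package,
304 | [`spread()`](http://www.rdocumentation.org/packages/tidyr/versions/0.5.1/topics/spread). It takes several arguments: the first is the long data frame, `key = ` is the column that stores the **kind of value**, and `value = ` is the column that stores the **values**: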
305 | 
306 | ```{r}
307 | spread(df_long, key = category_column, value = value_column)
308 | ```
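As a rough illustration of what the long-to-wide step does, the sketch below rebuilds a wide layout in base R by splitting the long data on its category column and joining the pieces with `merge()`. It is a hand-rolled stand-in for the idea behind `spread()`, not a call to it, and `long_df` with its values is hypothetical.

```{r}
# Hypothetical long data: one row per (person, measurement) pair
long_df <- data.frame(
  name = c("Luffy", "Zoro", "Luffy", "Zoro"),
  cate = c("height", "height", "age", "age"),
  int  = c(174, 181, 19, 21)
)

# Pull out each kind of value and name its column after the category
ages <- long_df[long_df$cate == "age", c("name", "int")]
names(ages)[2] <- "age"
heights <- long_df[long_df$cate == "height", c("name", "int")]
names(heights)[2] <- "height"

# Join the pieces back together: one row per person again
wide_df <- merge(ages, heights, by = "name")
print(wide_df)
```

Each distinct value in `cate` becomes its own column, which is the reverse of the wide-to-long step.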
309 | 
310 | *** =instructions
311 | - Print `straw_hat_long_df` to the R Console.
312 | - Use `spread()` to reshape `straw_hat_long_df` from the previous exercise into `straw_hat_wide_df`.
313 | - Print `straw_hat_wide_df` to the R Console.
314 | 
315 | *** =hint
316 | - Pass `straw_hat_long_df, key = cate, value = int` as the arguments to `spread()`.
317 | 
318 | *** =pre_exercise_code
319 | ```{r}
320 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
321 | library(tidyr)
322 | straw_hat_wide_df <- straw_hat_df[, c("name", "age", "height")]
323 | straw_hat_long_df <- gather(straw_hat_wide_df, key = cate, value = int, height, age)
324 | ```
325 | 
326 | *** =sample_code
327 | ```{r}
328 | # straw_hat_long_df 、 tidyr 已預先載入
329 | 
330 | # 將 straw_hat_long_df 輸出在 R Console看看
331 | 
332 | 
333 | # 轉置
334 | straw_hat_wide_df <- spread(__, key = __, value = __)
335 | 
336 | # 將 straw_hat_wide_df 輸出在 R Console看看
337 | 
338 | ```
339 | 
340 | *** =solution
341 | ```{r}
342 | # straw_hat_long_df 、 tidyr 已預先載入
343 | 
344 | # 將 straw_hat_long_df 輸出在 R Console看看
345 | straw_hat_long_df
346 | 
347 | # 轉置
348 | straw_hat_wide_df <- spread(straw_hat_long_df, key = cate, value = int)
349 | 
350 | # 將 straw_hat_wide_df 輸出在 R Console看看
351 | straw_hat_wide_df
352 | ```
353 | 
354 | *** =sct
355 | ```{r}
356 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_long_df` &#36664;&#20986;&#22312; R Console&#65311;"
357 | test_output_contains("straw_hat_long_df",
358 |                      incorrect_msg = msg)
359 | 
360 | msg = "Did you use the `spread()` function to convert the long data frame into the wide data frame `straw_hat_wide_df`?"
361 | test_function("spread",
362 |               args = NULL, index = 1,
363 |               eval = TRUE,
364 |               eq_condition = "equivalent",
365 |               not_called_msg = msg,
366 |               args_not_specified_msg = msg,
367 |               incorrect_msg = msg)
368 | 
369 | test_object("straw_hat_wide_df", 
370 |             undefined_msg = msg, 
371 |             incorrect_msg = msg) 
372 | 
373 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_wide_df` &#36664;&#20986;&#22312; R Console&#65311;"
374 | test_output_contains("straw_hat_wide_df",
375 |                      incorrect_msg = msg)
376 | 
377 | success_msg("&#24685;&#21916;&#65292;&#20320;&#30340;&#25080;&#36062;&#37329;&#38989;&#24050;&#32147;&#36948;&#21040;&#25976;&#21315;&#33836;&#35997;&#37324;&#20102;&#65281;&#25509;&#19979;&#20358;&#20320;&#35201;&#33322;&#34892;&#21040;&#26368;&#24460;&#19968;&#24231;&#23798;&#23996;&#65292;&#36890;&#36942;&#35430;&#29001;&#20043;&#24460;&#20320;&#23559;&#25104;&#28858;&#25080;&#36062;&#37329;&#38989;&#25976;&#20740;&#35997;&#37324;&#65292;&#35731;&#28023;&#36557;&#38957;&#30171;&#30340;&#36229;&#32026;&#28023;&#36042;&#65281;")
378 | ```


--------------------------------------------------------------------------------
/chapter2.md:
--------------------------------------------------------------------------------
  1 | ---
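  2 | title_meta  : Chapter 2
  3 | title       : Working with Columns and Rows
  4 | description : In practice we often need to add variables, delete variables, or filter observations. In a data frame these tasks amount to working on columns or rows. This chapter covers these techniques and concepts in a seafaring adventure for the One Piece!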
  5 | 
  6 | --- type:NormalExercise lang:r xp:100 skills:4 key:297843af96
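  7 | ## Adding a Column
  8 | 
  9 | The Straw Hat Pirates' cook, Vinsmoke Sanji, wants to prepare meals for the crew, but finds that the main character profiles leave out everyone's favorite food. He asks you, dear reader, for help, so let's help him add the favorite food `favorite_food` to `straw_hat_df`.
 10 | 
 11 | Source: [One Piece Wiki](http://onepiece.wikia.com/wiki/Main_Page)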
 12 | 
 13 | *** =instructions
 14 | - Add the `favorite_food` vector, already defined in the editor on the right, to `straw_hat_df`.
 15 | - Print `straw_hat_df` to the R Console.
 16 | 
 17 | *** =hint
 18 | - Type `straw_hat_df$favorite_food <- favorite_food` in the editor.
 19 | - Type `straw_hat_df` in the editor.
 20 | 
 21 | *** =pre_exercise_code
 22 | ```{r}
 23 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
 24 | ```
 25 | 
 26 | *** =sample_code
 27 | ```{r}
 28 | # 最喜愛料理的向量
 29 | favorite_food <- c("Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk")
 30 | 
 31 | # 將向量加入資料框中成為新的欄位
 32 | straw_hat_df$favorite_food <- 
 33 | 
 34 | # 將資料框輸出在 R Console
 35 | 
 36 | ```
 37 | 
 38 | *** =solution
 39 | ```{r}
 40 | # 最喜愛料理的向量
 41 | favorite_food <- c("Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk")
 42 | 
 43 | # 將向量加入資料框中成為新的欄位
 44 | straw_hat_df$favorite_food <- favorite_food
 45 | 
 46 | # 將資料框輸出在 R Console
 47 | straw_hat_df
 48 | ```
 49 | 
 50 | *** =sct
 51 | ```{r}
 52 | msg = "&#19981;&#38656;&#35201;&#21034;&#38500;&#21407;&#26412;&#24171;&#20320;&#23450;&#32681;&#22909;&#30340;&#21521;&#37327;&#65281;"
 53 | test_object("favorite_food",
 54 |             undefined_msg = msg, 
 55 |             incorrect_msg = msg) 
 56 | 
 57 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559;&#26368;&#21916;&#24859;&#26009;&#29702;&#30340;&#21521;&#37327;&#21152;&#20837;&#36039;&#26009;&#26694;&#20013;&#25104;&#28858;&#26032;&#30340;&#27396;&#20301;&#65311;"
 58 | test_data_frame("straw_hat_df",
 59 |                 columns = "favorite_food",
 60 |                 eq_condition = "equivalent",
 61 |                 undefined_msg = msg,
 62 |                 undefined_cols_msg = msg,
 63 |                 incorrect_msg = msg)
 64 | 
 65 | test_output_contains("straw_hat_df",
 66 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
 67 | 
 68 | success_msg("&#20320;&#24171;&#20102;&#39321;&#21513;&#22763;&#19968;&#20491;&#22823;&#24537;&#65292;&#36889;&#27171;&#20182;&#25165;&#30693;&#36947;&#35442;&#28310;&#20633;&#21738;&#20123;&#26009;&#29702;&#32102;&#33337;&#21729;&#20497;&#21507;&#65292;&#20339;&#39791;&#19992; <3")
 69 | ```
 70 | 
 71 | --- type:NormalExercise lang:r xp:100 skills:4 key:142b366f49
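 72 | ## Adding a Column (2)
 73 | 
 74 | A charming trait of R is that **all roads lead to the same destination**: there are often several ways to do the same thing. In this exercise we use the [`cbind()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/cbind) function to add the favorite food `favorite_food` to `straw_hat_df`.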
 75 | 
 76 | ```{r}
 77 | df <- cbind(df, column_to_add)
 78 | ```
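A minimal sketch of the pattern on a made-up data frame (`crew` and its values are hypothetical): when the added vector is passed as a plain variable, `cbind()` uses the variable name as the new column name.

```{r}
# Hypothetical toy data; values are illustrative only
crew <- data.frame(
  name = c("Luffy", "Zoro"),
  age  = c(19, 21)
)
favorite_food <- c("Meat", "Food matches wine")

# The vector should have one element per row of the data frame
crew <- cbind(crew, favorite_food)
print(names(crew))
print(crew)
```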
 79 | 
 80 | *** =instructions
 81 | - Use `cbind()` to add the `favorite_food` vector, already defined in the editor on the right, to `straw_hat_df`.
 82 | - Print `straw_hat_df` to the R Console.
 83 | 
 84 | *** =hint
 85 | - Type `straw_hat_df <- cbind(straw_hat_df, favorite_food)` in the editor.
 86 | - Type `straw_hat_df` in the editor.
 87 | 
 88 | *** =pre_exercise_code
 89 | ```{r}
 90 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
 91 | ```
 92 | 
 93 | *** =sample_code
 94 | ```{r}
 95 | # 最喜愛料理的向量
 96 | favorite_food <- c("Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk")
 97 | 
 98 | # 利用 cbind() 函數將向量加入資料框中成為新的欄位
 99 | straw_hat_df <- 
100 | 
101 | # 將資料框輸出在 R Console
102 | 
103 | ```
104 | 
105 | *** =solution
106 | ```{r}
107 | # 最喜愛料理的向量
108 | favorite_food <- c("Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk")
109 | 
110 | # 利用 cbind() 函數將向量加入資料框中成為新的欄位
111 | straw_hat_df <- cbind(straw_hat_df, favorite_food)
112 | 
113 | # 將資料框輸出在 R Console
114 | straw_hat_df
115 | ```
116 | 
117 | *** =sct
118 | ```{r}
119 | msg = "&#19981;&#38656;&#35201;&#21034;&#38500;&#21407;&#26412;&#24171;&#20320;&#23450;&#32681;&#22909;&#30340;&#21521;&#37327;&#65281;"
120 | test_object("favorite_food",
121 |             undefined_msg = msg, 
122 |             incorrect_msg = msg) 
123 | 
124 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `cbind()` &#20989;&#25976;&#65311;"
125 | test_function("cbind",
126 |               args = NULL, index = 1,
127 |               eval = TRUE,
128 |               eq_condition = "equivalent",
129 |               not_called_msg = msg,
130 |               args_not_specified_msg = NULL,
131 |               incorrect_msg = msg)
132 | 
133 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559;&#26368;&#21916;&#24859;&#26009;&#29702;&#30340;&#21521;&#37327;&#21152;&#20837;&#36039;&#26009;&#26694;&#20013;&#25104;&#28858;&#26032;&#30340;&#27396;&#20301;&#65311;"
134 | test_data_frame("straw_hat_df",
135 |                 columns = "favorite_food",
136 |                 eq_condition = "equivalent",
137 |                 undefined_msg = msg,
138 |                 undefined_cols_msg = msg,
139 |                 incorrect_msg = msg)
140 | 
141 | test_output_contains("straw_hat_df",
142 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
143 | 
144 | success_msg("&#22826;&#26834;&#20102;&#65292;&#23416;&#26371;&#26032;&#22686;&#20043;&#24460;&#25105;&#20497;&#20358;&#23416;&#32722;&#22914;&#20309;&#21034;&#38500;&#65281;")
145 | ```
146 | 
147 | --- type:NormalExercise lang:r xp:100 skills:4 key:c8c237a023
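148 | ## Deleting a Column
149 | 
150 | In the previous exercise you solved Vinsmoke Sanji's problem, and now you suddenly decide that favorite food is irrelevant after all. That is fickle indeed, but in real life your manager or your colleagues may be just as fickle! Let's remove the favorite food column `favorite_food` from `straw_hat_df`. Removing a column from a data frame in R is easy; just assign `NULL` to that column: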
151 | 
152 | ```{r}
153 | df$column_to_delete <- NULL
154 | ```
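A short sketch on a made-up data frame (`crew` is hypothetical) shows the effect: after the assignment the column is gone and the remaining columns are untouched.

```{r}
# Hypothetical toy data; values are illustrative only
crew <- data.frame(
  name          = c("Luffy", "Zoro"),
  age           = c(19, 21),
  favorite_food = c("Meat", "Food matches wine")
)

# Assigning NULL drops the column in place
crew$favorite_food <- NULL
print(names(crew))
```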
155 | 
156 | *** =instructions
157 | - Remove the `straw_hat_df$favorite_food` column from `straw_hat_df`.
158 | - Print `straw_hat_df` to the R Console.
159 | 
160 | *** =hint
161 | - Type `straw_hat_df$favorite_food <- NULL` in the editor.
162 | - Type `straw_hat_df` in the editor.
163 | 
164 | *** =pre_exercise_code
165 | ```{r}
166 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
167 | favorite_food <- c("Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk")
168 | straw_hat_df$favorite_food <- favorite_food
169 | ```
170 | 
171 | *** =sample_code
172 | ```{r}
173 | # 刪除最喜愛料理的欄位
174 | 
175 | 
176 | # 將資料框輸出在 R Console
177 | 
178 | ```
179 | 
180 | *** =solution
181 | ```{r}
182 | # 刪除最喜愛料理的欄位
183 | straw_hat_df$favorite_food <- NULL
184 | 
185 | # 將資料框輸出在 R Console
186 | straw_hat_df
187 | ```
188 | 
189 | *** =sct
190 | ```{r}
191 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559;&#26368;&#21916;&#24859;&#26009;&#29702;&#30340;&#27396;&#20301;&#25351;&#27966;&#32102; NULL&#65311;"
192 | test_object("straw_hat_df", 
193 |             undefined_msg = msg, 
194 |             incorrect_msg = msg) 
195 | 
196 | test_output_contains("straw_hat_df",
197 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
198 | 
199 | success_msg("&#22826;&#22909;&#20102;&#65292;&#19981;&#31649;&#20854;&#20182;&#20154;&#22810;&#40637;&#21892;&#35722;&#65292;&#26032;&#22686;&#21034;&#38500;&#27396;&#20301;&#37117;&#38627;&#19981;&#20498;&#20320;&#20102;&#65281;")
200 | ```
201 | 
202 | --- type:NormalExercise lang:r xp:100 skills:4 key:c78ba59b50
203 | ## Deleting a Column (2)
204 | 
205 | Remember the handy [`subset()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/subset) function mentioned in [Introduction to R (R 語言導論)](https://www.datacamp.com/community/open-courses/r-%E8%AA%9E%E8%A8%80%E5%B0%8E%E8%AB%96#gs.FaeP7Yg)? We can also use it to delete columns by putting a **minus sign** in front of the column name, and it can even delete several columns at once!
206 | 
207 | ```{r}
208 | df <- subset(df, select = -col1)
209 | df <- subset(df, select = c(-col1, -col2, ...))
210 | ```
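A minimal sketch on a made-up data frame (`crew` and its values are hypothetical) shows both the result and the fact that `subset()` returns a new data frame, so the original is unchanged unless the result is assigned back.

```{r}
# Hypothetical toy data; values are illustrative only
crew <- data.frame(
  name       = c("Luffy", "Zoro"),
  occupation = c("Captain", "Swordsman"),
  height     = c(174, 181),
  age        = c(19, 21)
)

# Drop two columns in one call; column names are written without quotes
trimmed <- subset(crew, select = c(-occupation, -height))
print(names(trimmed))

# The original data frame still has all four columns
print(names(crew))
```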
211 | 
212 | *** =instructions
213 | - Use `subset()` in a single line of code to remove occupation `straw_hat_df$occupation` and height `straw_hat_df$height` from `straw_hat_df`.
214 | - Print `straw_hat_df` to the R Console.
215 | 
216 | *** =hint
217 | - Type `straw_hat_df <- subset(straw_hat_df, select = c(-occupation, -height))` in the editor.
218 | - Type `straw_hat_df` in the editor.
219 | 
220 | *** =pre_exercise_code
221 | ```{r}
222 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
223 | ```
224 | 
225 | *** =sample_code
226 | ```{r}
227 | # 用 subset 函數一次刪除職業與身高兩個欄位
228 | straw_hat_df <- subset(__, select = c(-__, -__))
229 | 
230 | # 將資料框輸出在 R Console
231 | 
232 | ```
233 | 
234 | *** =solution
235 | ```{r}
236 | # 用 subset 函數一次刪除職業與身高兩個欄位
237 | straw_hat_df <- subset(straw_hat_df, select = c(-occupation, -height))
238 | 
239 | # 將資料框輸出在 R Console
240 | straw_hat_df
241 | ```
242 | 
243 | *** =sct
244 | ```{r}
245 | msg = "&#30906;&#35469;&#26159;&#21542;&#27491;&#30906;&#22320;&#20351;&#29992; `subset()` &#20989;&#25976;&#65311;"
246 | test_function("subset",
247 |               args = NULL, index = 1,
248 |               eval = TRUE,
249 |               eq_condition = "equivalent",
250 |               not_called_msg = msg,
251 |               args_not_specified_msg = NULL,
252 |               incorrect_msg = msg)
253 | 
254 | msg = "&#30906;&#35469;&#26159;&#21542;&#20351;&#29992; `subset()` &#20989;&#25976;&#21034;&#38500;&#20102;&#32887;&#26989;&#33287;&#36523;&#39640;&#20841;&#20491;&#27396;&#20301;&#65311;"
255 | test_object("straw_hat_df",
256 |             undefined_msg = msg, 
257 |             incorrect_msg = msg) 
258 | 
259 | test_output_contains("straw_hat_df",
260 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
261 | 
262 | success_msg("&#22826;&#26834;&#20102;&#65292;&#38500;&#20102;&#21487;&#20197;&#30452;&#25509;&#25351;&#23450;&#27396;&#20301;&#30340;&#21517;&#31281;&#65292;&#20320;&#20063;&#21487;&#20197;&#29992;&#32034;&#24341;&#20540;&#20358;&#25351;&#23450;&#65292;&#26377;&#31354;&#21487;&#20197;&#22312; R Console &#20013;&#32244;&#32722;&#65281;")
263 | ```
264 | 
265 | --- type:NormalExercise lang:r xp:100 skills:4 key:ccfe68db30
266 | ## Renaming a Column
267 | 
268 | In R, the [`names()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/names) function returns the variable names of a data frame as a vector:
269 | 
270 | ```{r}
271 | names(df)
272 | ```
273 | 
274 | Assigning to a specific index of that vector renames the corresponding column:
275 | 
276 | ```{r}
277 | names(df)[1] <- "new_name_column1"
278 | ```
279 | 
280 | Note that indexing in R starts at **1**, unlike many other programming languages, which start at 0!
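A numeric index only works if you know the column's position. As a hedged alternative sketch, the position can be looked up by comparing `names()` against the old name, which keeps working if the column order changes. The `crew` data frame and its values are hypothetical.

```{r}
# Hypothetical toy data; values are illustrative only
crew <- data.frame(
  name   = c("Luffy", "Zoro"),
  age    = c(19, 21),
  bounty = c(500, 320)
)

# Rename by position: bounty is the 3rd column in this toy data
by_index <- crew
names(by_index)[3] <- "reward"

# Rename by looking up the old name instead of hard-coding the position
by_name <- crew
names(by_name)[names(by_name) == "bounty"] <- "reward"

print(names(by_index))
print(names(by_name))
```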
281 | 
282 | *** =instructions
283 | - Rename the bounty column `bounty` of `straw_hat_df` to `reward`.
284 | - Print the column names of `straw_hat_df` to the R Console.
285 | 
286 | *** =hint
287 | - Type `names(straw_hat_df)[4] <- "reward"` in the editor.
288 | - Type `names(straw_hat_df)` in the editor.
289 | 
290 | *** =pre_exercise_code
291 | ```{r}
292 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
293 | ```
294 | 
295 | *** =sample_code
296 | ```{r}
297 | # The straw_hat_df data frame is preloaded
298 | 
299 | # Rename the bounty column straw_hat_df$bounty to straw_hat_df$reward
300 | 
301 | # Print the column names of straw_hat_df to the R Console
302 | 
303 | ```
304 | 
305 | *** =solution
306 | ```{r}
307 | # The straw_hat_df data frame is preloaded
308 | 
309 | # Rename the bounty column straw_hat_df$bounty to straw_hat_df$reward
310 | names(straw_hat_df)[4] <- "reward"
311 | 
312 | # Print the column names of straw_hat_df to the R Console
313 | names(straw_hat_df)
314 | ```
315 | 
316 | *** =sct
317 | ```{r}
318 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `names()` &#20989;&#25976;&#23559; bounty &#27396;&#20301;&#25913;&#21629;&#21517;&#28858; reward&#65311;"
319 | test_data_frame("straw_hat_df",
320 |                 columns = "reward",
321 |                 eq_condition = "equivalent",
322 |                 undefined_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#31227;&#38500;&#65311;",
323 |                 undefined_cols_msg = msg,
324 |                 incorrect_msg = msg)
325 | 
326 | test_output_contains("names(straw_hat_df)",
327 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#30340;&#27396;&#20301;&#21517;&#31281;&#36664;&#20986;&#22312; R Console&#65311;")
328 | 
329 | success_msg("&#36899;&#40860;&#27611;&#30340;&#37325;&#26032;&#21629;&#21517;&#27396;&#20301;&#37117;&#38627;&#19981;&#20498;&#20320;&#65292;&#22826;&#26834;&#20102;&#65281;")
330 | ```
331 | 
332 | --- type:NormalExercise lang:r xp:100 skills:4 key:d3c614f19b
333 | ## Digging Deeper into subset()
334 | 
335 | Dropping columns, as shown in the previous exercises, is only one of the things [`subset()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/subset) can do. The function is very handy for filtering both observations (rows) and variables (columns). For example, to quickly look up Straw Hat Luffy's bounty, try typing the following in the R Console:
336 | 
337 | ```{r}
338 | subset(straw_hat_df, name == "Monkey D. Luffy", select = c(name, bounty))
339 | ```
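
In `subset()`, the second argument is a logical condition that picks rows and `select` picks columns. The sketch below uses a made-up data frame `crew` (hypothetical values, not the course data) to show both at once, and to show that a negative `select` drops a column instead of keeping it:

```{r}
# Hypothetical toy data frame, for illustration only
crew <- data.frame(
  name   = c("A", "B", "C"),
  bounty = c(5e6, 2e7, 9e7),
  age    = c(17, 35, 21),
  stringsAsFactors = FALSE
)

# Rows with bounty above 10 million AND age under 30; keep two columns
young_and_wanted <- subset(crew, bounty > 1e7 & age < 30, select = c(name, bounty))
young_and_wanted

# A negative select keeps every column except the one listed
without_age <- subset(crew, select = -age)
names(without_age)
```

Column names are written without quotes or `crew$` because `subset()` evaluates both arguments inside the data frame.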
340 | 
341 | *** =instructions
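342 | - Filter the Straw Hat Pirates members whose bounty is greater than 10 million berries and whose age is under 30. Keep only the name, bounty, and age columns.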
343 | 
344 | *** =hint
345 | - Type `subset(straw_hat_df, bounty > 10000000 & age < 30, select = c(name, bounty, age))` in the editor.
346 | 
347 | *** =pre_exercise_code
348 | ```{r}
349 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
350 | ```
351 | 
352 | *** =sample_code
353 | ```{r}
354 | # The straw_hat_df data frame is preloaded
355 | 
356 | # Keep members with a bounty above 10 million berries and an age under 30; only the name, bounty, and age columns are needed
357 | subset(straw_hat_df, bounty > __ & age < __, select = c(name, __, age))
358 | ```
359 | 
360 | *** =solution
361 | ```{r}
362 | # The straw_hat_df data frame is preloaded
363 | 
364 | # Keep members with a bounty above 10 million berries and an age under 30; only the name, bounty, and age columns are needed
365 | subset(straw_hat_df, bounty > 10000000 & age < 30, select = c(name, bounty, age))
366 | ```
367 | 
368 | *** =sct
369 | ```{r}
370 | test_output_contains("subset(straw_hat_df, bounty > 10000000 & age < 30, select = c(name, bounty, age))",
371 |                       incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#31721;&#36984;&#20986;&#38988;&#30446;&#25152;&#35201;&#27714;&#30340;&#35264;&#28204;&#20540;&#33287;&#27396;&#20301;&#65311;")
372 | 
373 | success_msg("&#22826;&#26834;&#20102;&#65292;`subset()` &#20989;&#25976;&#26159;&#24375;&#32780;&#26377;&#21147;&#30340;&#24037;&#20855;&#65292;&#23427;&#33021;&#22816;&#35731;&#20320;&#26356;&#26377;&#25928;&#29575;&#22320;&#34389;&#29702;&#36039;&#26009;&#26694;&#65281;")
374 | ```
375 | 
376 | --- type:NormalExercise lang:r xp:100 skills:4 key:2de6bd4bcf
377 | ## Adding Rows
378 | 
379 | Earlier exercises introduced [`cbind()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/cbind), which adds columns. You have probably already guessed that there is a matching function for adding rows. There is: R provides [`rbind()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/cbind):
380 | 
381 | ```{r}
382 | df <- rbind(df, row_to_add)
383 | ```
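
One detail worth knowing: a vector built with `c()` holds a single type, so mixing strings and numbers turns every element into a string, and binding such a vector can change a numeric column to character. The sketch below, on a made-up data frame `crew` (hypothetical, not the course data), compares that with binding a one-row data frame, which keeps the column types:

```{r}
# Hypothetical toy data frame, for illustration only
crew <- data.frame(name = c("A", "B"), age = c(19, 21), stringsAsFactors = FALSE)

# c() coerces everything to character, so age becomes a character column
as_vector <- rbind(crew, c("C", 18))
class(as_vector$age)

# A one-row data frame with matching column names keeps age numeric
new_row <- data.frame(name = "C", age = 18, stringsAsFactors = FALSE)
as_row <- rbind(crew, new_row)
class(as_row$age)
```

Checking the result with `str()` after binding is a quick way to see whether any column changed type.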
384 | 
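385 | Fans keep speculating about who the tenth crew member will be, with all kinds of bold predictions. Let's get a little nostalgic here and add a forever friend of the Straw Hat Pirates: Princess Vivi of the Kingdom of Alabasta.
386 | 
387 | Source: [One Piece Wiki](http://onepiece.wikia.com/wiki/Main_Page)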
388 | 
389 | *** =instructions
390 | - Use `rbind()` to add Princess Vivi to the Straw Hat Pirates data frame.
391 | - Print `straw_hat_df` to the R Console.
392 | 
393 | *** =hint
394 | - Type `straw_hat_df <- rbind(straw_hat_df, princess_vivi)` in the editor.
395 | - Type `straw_hat_df` in the editor.
396 | 
397 | *** =pre_exercise_code
398 | ```{r}
399 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
400 | ```
401 | 
402 | *** =sample_code
403 | ```{r}
404 | # The straw_hat_df data frame is preloaded
405 | 
406 | # Princess Vivi
407 | princess_vivi <- c("Nefeltari Vivi", "Female", "Princess of Alabasta", NA, 18, "02-02", NA)
408 | 
409 | # Add Princess Vivi to the Straw Hat Pirates data frame
410 | straw_hat_df <- 
411 | 
412 | # Print straw_hat_df to the R Console
413 | 
414 | ```
415 | 
416 | *** =solution
417 | ```{r}
418 | # The straw_hat_df data frame is preloaded
419 | 
420 | # Princess Vivi
421 | princess_vivi <- c("Nefeltari Vivi", "Female", "Princess of Alabasta", NA, 18, "02-02", NA)
422 | 
423 | # Add Princess Vivi to the Straw Hat Pirates data frame
424 | straw_hat_df <- rbind(straw_hat_df, princess_vivi)
425 | 
426 | # Print straw_hat_df to the R Console
427 | straw_hat_df
428 | ```
429 | 
430 | *** =sct
431 | ```{r}
432 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `rbind()` &#20989;&#25976;&#65311;"
433 | test_function("rbind",
434 |               args = NULL, index = 1,
435 |               eval = TRUE,
436 |               eq_condition = "equivalent",
437 |               not_called_msg = msg,
438 |               args_not_specified_msg = NULL,
439 |               incorrect_msg = msg)
440 | 
441 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559;&#34183;&#34183;&#20844;&#20027;&#21152;&#20837;&#33609;&#24125;&#28023;&#36042;&#22296;&#36039;&#26009;&#26694;&#65311;"
442 |                
443 | test_object("straw_hat_df", 
444 |              undefined_msg = msg, 
445 |              incorrect_msg = msg) 
446 | 
447 | test_output_contains("straw_hat_df",
448 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
449 | 
450 | success_msg("&#33609;&#24125;&#28023;&#36042;&#22296;&#38626;&#38283;&#38463;&#25289;&#24052;&#26031;&#22374;&#29579;&#22283;&#26178;&#65292;&#32972;&#23565;&#33879;&#34183;&#34183;&#21644;&#36305;&#24471;&#24555;&#33289;&#36215;&#20102;&#30059;&#22312;&#24038;&#25163;&#33218;&#19978;&#30340; X &#35352;&#34399;&#12290;&#19981;&#31649;&#31532;&#21313;&#20491;&#22821;&#20276;&#26159;&#35504;&#65292;&#34183;&#34183;&#21644;&#36305;&#24471;&#24555;&#20173;&#34987;&#35469;&#23450;&#26159;&#33609;&#24125;&#28023;&#36042;&#22296;&#30340;&#33337;&#21729;&#65281;")
451 | ```


--------------------------------------------------------------------------------
/chapter3.md:
--------------------------------------------------------------------------------
  1 | ---
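  2 | title_meta  : Chapter 3
  3 | title       : Creating Derived Variables
  4 | description : Query results from a database, or the variables already in a dataset, do not always meet our analysis needs. In those cases we create derived variables, for example by regrouping a categorical variable, binning a numeric variable into categories, or computing on a numeric variable. This chapter covers those techniques and concepts through a seafaring adventure in pursuit of the One Piece!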
  5 | 
  6 | --- type:NormalExercise lang:r xp:100 skills:4 key:db8d80a572
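  7 | ## Classifying a Categorical Variable
  8 | 
  9 | Although every member of the Straw Hat Pirates can fight on their own, in battle they fall into two roles: support and fighter. Specifically, the Navigator, Sniper, Doctor, and Archaeologist are support (`Support`), while the rest of the crew are, unsurprisingly, fighters (`Fighter`). We want to add a column, `battle_role`, that records each crew member's battle role. A two-way recode like this is a good fit for the [`ifelse()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/ifelse) function, so let's practice!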
 10 | 
 11 | ```{r}
 12 | ifelse(test, yes, no)
 13 | ```
 14 | 
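 15 | To compare the strings we use a special operator, `%in%`. For each element on its left-hand side it returns `TRUE` if that value appears in the vector on the right-hand side, and `FALSE` otherwise.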
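
To see how the two pieces fit together, here is a minimal sketch on a made-up vector `occupation` (hypothetical values, not the course data). It first shows what `%in%` returns and then feeds that result to `ifelse()`:

```{r}
# Hypothetical occupations, for illustration only
occupation <- c("Navigator", "Cook", "Doctor")

# %in% is vectorised: one TRUE/FALSE per element on the left-hand side
is_support <- occupation %in% c("Navigator", "Doctor")
is_support

# ifelse() picks "Support" where the test is TRUE and "Fighter" elsewhere
role <- ifelse(is_support, yes = "Support", no = "Fighter")
role
```

Because both steps work element by element, the result has the same length as the input and can be assigned straight to a new data frame column.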
 16 | 
 17 | *** =instructions
 18 | - Fill in the underlined blanks in the editor on the right with appropriate values.
 19 | - Print `straw_hat_df` to the R Console.
 20 | 
 21 | *** =hint
 22 | - Crew members whose occupation is Navigator, Sniper, Doctor, or Archaeologist should get `battle_role = "Support"`; everyone else should get `battle_role = "Fighter"`.
 23 | - Type `straw_hat_df` in the editor.
 24 | 
 25 | *** =pre_exercise_code
 26 | ```{r}
 27 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
 28 | ```
 29 | 
 30 | *** =sample_code
 31 | ```{r}
 32 | # straw_hat_df is preloaded
 33 | 
 34 | # Fill in the appropriate values
 35 | straw_hat_df$battle_role <- ifelse(straw_hat_df$occupation %in% c("__", "__", "__", "__"), yes = "__", no = "__")
 36 | 
 37 | # Print the data frame to the R Console
 38 | 
 39 | ```
 40 | 
 41 | *** =solution
 42 | ```{r}
 43 | # straw_hat_df is preloaded
 44 | 
 45 | # Fill in the appropriate values
 46 | straw_hat_df$battle_role <- ifelse(straw_hat_df$occupation %in% c("Navigator", "Sniper", "Doctor", "Archaeologist"), yes = "Support", no = "Fighter")
 47 | 
 48 | # Print the data frame to the R Console
 49 | straw_hat_df
 50 | ```
 51 | 
 52 | *** =sct
 53 | ```{r}
 54 | test_data_frame("straw_hat_df", columns = "battle_role",
 55 |                 undefined_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#19981;&#23567;&#24515;&#23559; `straw_hat_df` &#31227;&#38500;&#20102;&#65311;",
 56 |                 undefined_cols_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#27491;&#30906;&#29983;&#25104; `battle_role` &#35722;&#25976;&#65311;",
 57 |                 incorrect_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#27491;&#30906;&#29983;&#25104; `battle_role` &#35722;&#25976;&#65311;")
 58 | 
 59 | test_output_contains("straw_hat_df",
 60 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
 61 | 
 62 | success_msg("&#22826;&#26834;&#20102;&#65292;&#20294;&#20320;&#26377;&#24819;&#36942;&#20551;&#22914;&#25105;&#20497;&#37325;&#26032;&#27512;&#39006;&#19981;&#21482;&#20841;&#31278;&#39006;&#21029;&#65292;&#25033;&#35442;&#24590;&#40636;&#36774;&#65311;")
 63 | ```
 64 | 
 65 | --- type:NormalExercise lang:r xp:100 skills:4 key:ec549da7b3
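 66 | ## Classifying a Categorical Variable (2)
 67 | 
 68 | Within the support roles, the Sniper can be split off into a ranged-attack role (`Range`), while the Doctor, Archaeologist, and Navigator stay in `Support`. The Captain, Swordsman, Cook, Shipwright, and Musician, previously classified as `Fighter`, keep that classification. That gives three categories, more than a single `ifelse()` call can produce, so this time we classify by indexing into the vector.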
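
The idea of classifying through indexing can be sketched on a made-up data frame `crew` (hypothetical, not the course data). Each logical index selects the rows that match, and only those rows are overwritten. Starting from a column of `NA` is an extra step assumed here so that unmatched rows stay visible:

```{r}
# Hypothetical toy data frame, for illustration only
crew <- data.frame(
  occupation = c("Sniper", "Doctor", "Cook"),
  stringsAsFactors = FALSE
)

# Start from missing values so rows matched by no rule remain NA
crew$battle_role <- NA_character_

# Each assignment touches only the rows where the condition is TRUE
crew$battle_role[crew$occupation == "Sniper"] <- "Range"
crew$battle_role[crew$occupation %in% c("Doctor", "Navigator")] <- "Support"
crew$battle_role[crew$occupation %in% c("Cook", "Captain")] <- "Fighter"

crew
```

The same pattern extends to any number of categories by adding one assignment per category.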
 69 | 
 70 | *** =instructions
 71 | - Fill in the underlined blanks in the editor on the right with appropriate values.
 72 | - Print `straw_hat_df` to the R Console.
 73 | 
 74 | *** =hint
 75 | - Crew members whose occupation is Sniper should get `battle_role = "Range"`; those whose occupation is Doctor, Archaeologist, or Navigator should get `battle_role = "Support"`; everyone else should get `battle_role = "Fighter"`.
 76 | - Type `straw_hat_df` in the editor.
 77 | 
 78 | *** =pre_exercise_code
 79 | ```{r}
 80 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
 81 | ```
 82 | 
 83 | *** =sample_code
 84 | ```{r}
 85 | # straw_hat_df is preloaded
 86 | 
 87 | # Fill in the appropriate values
 88 | straw_hat_df$battle_role[straw_hat_df$occupation == c("__")] <- "Range"
 89 | straw_hat_df$battle_role[straw_hat_df$occupation %in% c("__", "__", "__")] <- "Support"
 90 | straw_hat_df$battle_role[straw_hat_df$occupation %in% c("__", "__", "__", "__", "__")] <- "Fighter"
 91 | 
 92 | # Print the data frame to the R Console
 93 | 
 94 | ```
 95 | 
 96 | *** =solution
 97 | ```{r}
 98 | # straw_hat_df is preloaded
 99 | 
100 | # Fill in the appropriate values
101 | straw_hat_df$battle_role[straw_hat_df$occupation == "Sniper"] <- "Range"
102 | straw_hat_df$battle_role[straw_hat_df$occupation %in% c("Doctor", "Archaeologist", "Navigator")] <- "Support"
103 | straw_hat_df$battle_role[straw_hat_df$occupation %in% c("Captain", "Swordsman", "Cook", "Shipwright", "Musician")] <- "Fighter"
104 | 
105 | # Print the data frame to the R Console
106 | straw_hat_df
107 | ```
108 | 
109 | *** =sct
110 | ```{r}
111 | test_data_frame("straw_hat_df",
112 |                 columns = "battle_role",
113 |                 eq_condition = "equivalent",
114 |                 undefined_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#23559; `straw_hat_df` &#31227;&#38500;&#20102;&#65311;",
115 |                 undefined_cols_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#27491;&#30906;&#23559; `straw_hat_df$battle_role` &#20316;&#20998;&#39006;&#65311;",
116 |                 incorrect_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#26377;&#27491;&#30906;&#23559; `straw_hat_df$battle_role` &#20316;&#20998;&#39006;&#65311;")
117 | 
118 | test_output_contains("straw_hat_df",
119 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
120 | 
121 | success_msg("&#22826;&#22909;&#20102;&#65292;&#25509;&#19979;&#20358;&#25105;&#20497;&#35201;&#30475;&#30475;&#24590;&#40636;&#23565;&#25976;&#20540;&#22411;&#35722;&#25976;&#36914;&#34892;&#20998;&#39006;&#65281;")
122 | ```
123 | 
124 | --- type:NormalExercise lang:r xp:100 skills:4 key:70f2080cea
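125 | ## Classifying a Numeric Variable
126 | 
127 | After the decisive battle at Dressrosa, the bounties of the Straw Hat crew rose sharply. The other pirate crews of the New World are watching closely and sizing up the Straw Hats' strength: they want to split the crew into low, medium, and high tiers by bounty. Doing so amounts to adding a new categorical variable, but one derived from an existing numeric variable. In R, the [`cut()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/cut) function is made for this.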
128 | 
129 | ```{r}
130 | df$new_column <- cut(df$column, breaks = c(0, break1, break2, Inf), labels = c("label1", "label2", "label3"))
131 | ```
132 | 
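133 | The `breaks` argument must include a lower bound and an upper bound. By default each interval is open on the left and closed on the right, so in the example values in `(0, break1]` are labelled `label1`, values in `(break1, break2]` are labelled `label2`, and values in `(break2, Inf]` are labelled `label3`. `Inf` is R's numeric value for infinity; typing `class(Inf)` in the R Console confirms that it is numeric.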
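
A quick way to see which bin a boundary value lands in is to run `cut()` on a few hand-picked numbers. The sketch below uses a made-up vector `bounty` (hypothetical values, not the course data) in which one value sits exactly on a break; it relies on the default `right = TRUE` behaviour of `cut()`:

```{r}
# Hypothetical bounties in berries; 83000000 sits exactly on a break
bounty <- c(100, 83000000, 83000001, 500000000)

level <- cut(bounty,
             breaks = c(0, 83000000, 180000000, Inf),
             labels = c("Low", "Medium", "High"))

# cut() returns a factor; as.character() shows the label of each value
as.character(level)
```

With the default settings the value on the break falls into the lower bin. Passing `right = FALSE` would instead close each interval on the left.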
134 | 
135 | *** =instructions
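136 | - Add a variable `bounty_level` that classifies crew members with a bounty below 83 million berries as `"Low"`, those with a bounty between 83 million and 180 million berries as `"Medium"`, and those with a bounty above 180 million berries as `"High"`.
137 | - Print `straw_hat_df` to the R Console.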
138 | 
139 | *** =hint
140 | - The first argument of `cut()` should be `straw_hat_df$bounty`, `breaks` should include 83 million and 180 million berries, and `labels` should list `"Low"`, `"Medium"`, and `"High"` in that order!
141 | - Type `straw_hat_df` in the editor.
142 | 
143 | *** =pre_exercise_code
144 | ```{r}
145 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
146 | ```
147 | 
148 | *** =sample_code
149 | ```{r}
150 | # straw_hat_df is preloaded
151 | 
152 | # Fill in the appropriate values
153 | straw_hat_df$bounty_level <- cut(straw_hat_df$bounty, breaks = c(0, __, __, Inf), labels = c("__", "__", "__"))
154 | 
155 | # Print the data frame to the R Console
156 | 
157 | ```
158 | 
159 | *** =solution
160 | ```{r}
161 | # straw_hat_df is preloaded
162 | 
163 | # Fill in the appropriate values
164 | straw_hat_df$bounty_level <- cut(straw_hat_df$bounty, breaks = c(0, 83000000, 180000000, Inf), labels = c("Low", "Medium", "High"))
165 | 
166 | # Print the data frame to the R Console
167 | straw_hat_df
168 | ```
169 | 
170 | *** =sct
171 | ```{r}
172 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `cut()` &#20989;&#25976;&#65311;"
173 | test_function("cut",
174 |               args = NULL, index = 1,
175 |               eval = TRUE,
176 |               eq_condition = "equivalent",
177 |               not_called_msg = msg,
178 |               args_not_specified_msg = NULL,
179 |               incorrect_msg = msg)
180 | 
181 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559;&#33337;&#21729;&#20381;&#29031;&#36062;&#37329;&#20998;&#39006;&#28858;&#20302;&#20013;&#39640;&#19977;&#20491;&#32026;&#36317;&#65311;"
182 | test_data_frame("straw_hat_df",
183 |                 columns = "bounty_level",
184 |                 eq_condition = "equivalent",
185 |                 undefined_msg = "&#30906;&#35469;&#26159;&#21542;&#23559; `straw_hat_df` &#31227;&#38500;&#20102;&#65311;",
186 |                 undefined_cols_msg = msg,
187 |                 incorrect_msg = msg)
188 | 
189 | test_output_contains("straw_hat_df",
190 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
191 | 
192 | success_msg("&#21703;&#65292;&#20320;&#24050;&#32147;&#23565;&#33609;&#24125;&#28023;&#36042;&#22296;&#30340;&#25136;&#21147;&#30637;&#33509;&#25351;&#25484;&#65292;&#20294;&#20809;&#26159;&#36889;&#27171;&#36996;&#19981;&#36275;&#20197;&#35731;&#20320;&#36319;&#20182;&#20497;&#30456;&#25239;&#34913;&#65292;&#20320;&#36996;&#38656;&#35201;&#32380;&#32396;&#24375;&#21270;&#25136;&#21147;&#65281;")
193 | ```
194 | 
195 | --- type:NormalExercise lang:r xp:100 skills:4 key:3d3db367ea
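196 | ## Computing a Derived Numeric Variable
197 | 
198 | When binning the bounties in the previous exercise, did the unit feel unwieldy? For variables with large magnitudes we usually rescale to a more convenient unit, such as thousands, millions, or billions. Let's add a variable `bounty_million` measured in millions of berries.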
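
Rescaling is ordinary vectorised arithmetic: dividing a vector by one number divides every element. A minimal sketch on a made-up vector `bounty` (hypothetical values, not the course data), using the scientific notation `1e6` as a shorter way to write 1000000:

```{r}
# Hypothetical bounties in berries, for illustration only
bounty <- c(100, 83000000, 500000000)

# Dividing by a single number rescales every element
bounty_million <- bounty / 1e6
bounty_million
```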
199 | 
200 | *** =instructions
201 | - Divide the original `straw_hat_df$bounty` by 1,000,000 and assign the result to a new data frame variable `straw_hat_df$bounty_million`.
202 | - Print `straw_hat_df` to the R Console.
203 | 
204 | *** =hint
205 | - Type `straw_hat_df$bounty_million <- straw_hat_df$bounty / 1000000` in the editor.
206 | - Type `straw_hat_df` in the editor.
207 | 
208 | *** =pre_exercise_code
209 | ```{r}
210 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
211 | ```
212 | 
213 | *** =sample_code
214 | ```{r}
215 | # straw_hat_df is preloaded
216 | 
217 | # Add straw_hat_df$bounty_million, measured in millions of berries
218 | straw_hat_df$bounty_million <- 
219 | 
220 | # Print the data frame to the R Console
221 | 
222 | ```
223 | 
224 | *** =solution
225 | ```{r}
226 | # straw_hat_df is preloaded
227 | 
228 | # Add straw_hat_df$bounty_million, measured in millions of berries
229 | straw_hat_df$bounty_million <- straw_hat_df$bounty / 1000000
230 | 
231 | # Print the data frame to the R Console
232 | straw_hat_df
233 | ```
234 | 
235 | *** =sct
236 | ```{r}
237 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#23559;&#36062;&#37329;&#38500;&#20197; 1 &#30334;&#33836;&#20006;&#29986;&#29983;&#19968;&#20491;&#26032;&#35722;&#25976; `bounty_million`&#65311;"
238 | test_data_frame("straw_hat_df",
239 |                 columns = "bounty_million",
240 |                 undefined_msg = "&#30906;&#35469;&#26159;&#21542;&#19981;&#23567;&#24515;&#23559; `straw_hat_df` &#31227;&#38500;&#20102;&#65311;",
241 |                 undefined_cols_msg = msg,
242 |                 incorrect_msg = msg)
243 | 
244 | test_output_contains("straw_hat_df",
245 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
246 | 
247 | success_msg("&#30495;&#26159;&#22826;&#21426;&#23475;&#20102;&#65292;&#36889;&#19979;&#34893;&#29983;&#35336;&#31639;&#25976;&#20540;&#22411;&#35722;&#25976;&#20063;&#38627;&#19981;&#20498;&#20320;&#20102;&#65281;")
248 | ```
249 | 
250 | --- type:NormalExercise lang:r xp:100 skills:4 key:324a377059
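251 | ## A Harder Derived Variable
252 | 
253 | As we head into the New World, the challenges on the road to becoming King of the Pirates keep getting tougher, and the next few exercises tackle a slightly tricky problem. The original character profiles give only each crew member's age and the month and day of their birthday. Even though the calendar used in the One Piece world is quite different from the Gregorian calendar we know, as a superfan and data enthusiast you still want to add a column that records each Straw Hat's birth date, including a Gregorian year.
254 | 
255 | We first introduce the `Sys.Date()` function, which returns the current system date. Try typing this in the R Console: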
256 | 
257 | ```{r}
258 | Sys.Date()
259 | ```
260 | 
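261 | The R Console prints the current system date in the `"%Y-%m-%d"` format: `%Y` is the four-digit year, `%m` the two-digit month, and `%d` the two-digit day. The [`format()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/format) function lets us extract just the year we need.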
262 | 
263 | ```{r}
264 | format(Sys.Date(), '%Y')
265 | ```
266 | 
267 | The resulting year is a character string, so to do arithmetic with it we also need [`as.numeric()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/numeric) to convert it to a number.
268 | 
269 | ```{r}
270 | as.numeric(format(Sys.Date(), '%Y'))
271 | ```
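
Because `Sys.Date()` changes every day, the chain is easier to inspect on a fixed date. The sketch below swaps in a made-up date, `some_date` (hypothetical, chosen only so the result is reproducible), and shows the type at each step:

```{r}
# A fixed, made-up date so the result does not depend on when the code runs
some_date <- as.Date("2016-08-15")

# format() returns text, even when the result looks like a number
year_text <- format(some_date, "%Y")
class(year_text)

# as.numeric() turns that text into a number we can calculate with
year_num <- as.numeric(year_text)
year_num - 19
```

Attempting `year_text - 19` directly would raise an error, which is why the conversion step is needed.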
272 | 
273 | *** =instructions
274 | - Assign the output of `Sys.Date()` to a variable `sys_date`.
275 | - Use `format()` to extract the year from `sys_date` and assign it to `sys_date_year`.
276 | - Use `as.numeric()` to convert `sys_date_year` to a number and assign it to `sys_date_year_num`.
277 | 
278 | *** =hint
279 | - `Sys.Date()` takes no arguments.
280 | - The second argument of `format()` must be `'%Y'`.
281 | 
282 | *** =pre_exercise_code
283 | ```{r}
284 | # no pec
285 | ```
286 | 
287 | *** =sample_code
288 | ```{r}
289 | # Create sys_date
290 | sys_date <- 
291 | 
292 | # Create sys_date_year
293 | sys_date_year <- 
294 | 
295 | # Create sys_date_year_num
296 | sys_date_year_num <- 
297 | 
298 | # Print sys_date, sys_date_year, and sys_date_year_num to the R Console
299 | 
300 | ```
301 | 
302 | *** =solution
303 | ```{r}
304 | # Create sys_date
305 | sys_date <- Sys.Date()
306 | 
307 | # Create sys_date_year
308 | sys_date_year <- format(sys_date, '%Y')
309 | 
310 | # Create sys_date_year_num
311 | sys_date_year_num <- as.numeric(sys_date_year)
312 | 
313 | # Print sys_date, sys_date_year, and sys_date_year_num to the R Console
314 | sys_date
315 | sys_date_year
316 | sys_date_year_num
317 | ```
318 | 
319 | *** =sct
320 | ```{r}
321 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `Sys.Date()` &#20989;&#25976;&#29986;&#20986; `sys_date`&#65311;"
322 | test_object("sys_date", 
323 |             undefined_msg = msg, 
324 |             incorrect_msg = msg)
325 | 
326 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `format()` &#20989;&#25976;&#29986;&#20986; `sys_date_year`&#65311;"            
327 | test_object("sys_date_year", 
328 |             undefined_msg = msg, 
329 |             incorrect_msg = msg)
330 | 
331 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `as.numeric()` &#20989;&#25976;&#29986;&#20986; `sys_date_year_num`&#65311;"            
332 | test_object("sys_date_year_num", 
333 |             undefined_msg = msg, 
334 |             incorrect_msg = msg)
335 | 
336 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `sys_date` &#12289; `sys_date_year` &#33287; `sys_date_year_num` &#36664;&#20986;&#22312; R Console&#65311;"
337 | test_output_contains("sys_date",
338 |                      times = 1,
339 |                      incorrect_msg = msg)
340 | test_output_contains("sys_date_year",
341 |                      times = 1,
342 |                      incorrect_msg = msg)
343 | test_output_contains("sys_date_year_num",
344 |                      times = 1,
345 |                      incorrect_msg = msg)
346 | 
347 | success_msg("&#22826;&#22909;&#20102;&#65292;&#25509;&#19979;&#20358;&#25105;&#20497;&#35201;&#25226;&#29986;&#20986;&#30340;&#35199;&#20803;&#24180;&#20221;&#28187;&#21435;&#33337;&#21729;&#20497;&#30340;&#24180;&#40801;&#65292;&#20358;&#24471;&#21040;&#27599;&#20491;&#20154;&#30340;&#35199;&#20803;&#20986;&#29983;&#24180;&#20221;&#65281;")
348 | ```
349 | 
350 | --- type:NormalExercise lang:r xp:100 skills:4 key:b1faadbe0c
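351 | ## A Harder Derived Variable (2)
352 | 
353 | In the previous exercise we produced the year of the system date, stored as a number. Next we use each crew member's age to work out their birth year. R users usually prefer not to carry the whole data frame around while building a derived variable, so we first pull out the variables needed for the calculation: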
354 | 
355 | ```{r}
356 | vector1 <- df$col1
357 | ```
358 | 
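359 | The calculation gives each crew member's birth year as a number. But `birthday` is stored as character data, so before combining the two, don't forget to convert the year to character with [`as.character()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/character)!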
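
The two steps can be sketched with made-up stand-ins for the preloaded objects: `sys_year` and the values in `age` below are hypothetical and fixed so the result is reproducible. Note that subtracting the age from the year is only an approximation, since someone whose birthday has not yet occurred this year was born one year earlier:

```{r}
# Hypothetical stand-ins, for illustration only
sys_year <- 2016
age <- c(19, 21, 30)

# Subtraction is vectorised: the single year is recycled across all ages
birth_year <- sys_year - age
birth_year

# as.character() converts each number to text, ready to be pasted
birth_year_char <- as.character(birth_year)
birth_year_char
```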
360 | 
361 | *** =instructions
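362 | - Select the age column from the data frame and assign it to a separate vector `age`.
363 | - Select the birthday column from the data frame and assign it to a separate vector `birthday`.
364 | - Subtract `age` from the system date year `sys_date_year_num` to get each crew member's birth year, `birth_year`.
365 | - Use `as.character()` to convert `birth_year` to character, `birth_year_char`.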
366 | 
367 | *** =hint
368 | - Type `age <- straw_hat_df$age` and `birthday <- straw_hat_df$birthday` to pull out the columns needed for the calculation.
369 | - Remember to use the `as.character()` function.
370 | 
371 | *** =pre_exercise_code
372 | ```{r}
373 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
374 | sys_date <- Sys.Date()
375 | sys_date_year <- format(sys_date, '%Y')
376 | sys_date_year_num <- as.numeric(sys_date_year)
377 | rm(sys_date)
378 | rm(sys_date_year)
379 | ```
380 | 
381 | *** =sample_code
382 | ```{r}
383 | # straw_hat_df and sys_date_year_num are preloaded
384 | 
385 | # Create the age vector
386 | age <- 
387 | 
388 | # Create the birthday vector
389 | birthday <- 
390 | 
391 | # Subtract age from sys_date_year_num and assign the result to birth_year
392 | birth_year <- 
393 | 
394 | # Use as.character to convert birth_year to character and assign it to birth_year_char
395 | birth_year_char <- 
396 | 
397 | # Print birth_year and birth_year_char to the R Console
398 | 
399 | ```
400 | 
401 | *** =solution
402 | ```{r}
403 | # straw_hat_df and sys_date_year_num are preloaded
404 | 
405 | # Create the age vector
406 | age <- straw_hat_df$age
407 | 
408 | # Create the birthday vector
409 | birthday <- straw_hat_df$birthday
410 | 
411 | # Subtract age from sys_date_year_num and assign the result to birth_year
412 | birth_year <- sys_date_year_num - age
413 | 
414 | # Use as.character to convert birth_year to character and assign it to birth_year_char
415 | birth_year_char <- as.character(birth_year)
416 | 
417 | # Print birth_year and birth_year_char to the R Console
418 | birth_year
419 | birth_year_char
420 | ```
421 | 
422 | *** =sct
423 | ```{r}
424 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#23459;&#21578; `age` &#35722;&#25976;&#65311;"
425 | test_object("age", 
426 |             undefined_msg = msg, 
427 |             incorrect_msg = msg)
428 | 
429 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#23459;&#21578; `birthday` &#35722;&#25976;&#65311;"
430 | test_object("birthday", 
431 |             undefined_msg = msg, 
432 |             incorrect_msg = msg)
433 | 
434 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#29992; `sys_date_year_num` &#28187;&#21435; `age` &#20006;&#25351;&#27966;&#32102; `birth_year`&#65311;"
435 | test_object("birth_year", 
436 |             undefined_msg = msg, 
437 |             incorrect_msg = msg)
438 | 
439 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#29992; `as.character()` &#23559;&#25976;&#20540;&#36681;&#25563;&#28858;&#23383;&#20803;&#65311;"
440 | test_function("as.character",
441 |               args = NULL, index = 1,
442 |               eval = TRUE,
443 |               eq_condition = "equivalent",
444 |               not_called_msg = msg,
445 |               args_not_specified_msg = NULL,
446 |               incorrect_msg = msg)
447 | 
448 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#23459;&#21578; `birth_year_char`&#65311;"
449 | test_object("birth_year_char", 
450 |             undefined_msg = msg, 
451 |             incorrect_msg = msg)
452 | 
453 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `birth_year` &#33287; `birth_year_char` &#36664;&#20986;&#22312; R Console&#65311;"
454 | test_output_contains("birth_year",
455 |                      times = 1,
456 |                      incorrect_msg = msg)
457 | test_output_contains("birth_year_char",
458 |                      times = 1,
459 |                      incorrect_msg = msg)
460 | 
461 | success_msg("&#22826;&#26834;&#20102;&#65292;&#25105;&#20497;&#22312;&#19979;&#19968;&#20491;&#32244;&#32722;&#23601;&#21487;&#20197;&#22823;&#21151;&#21578;&#25104;&#65292;&#36245;&#24555;&#20358;&#21543;&#65281;")
462 | ```
463 | 
464 | --- type:NormalExercise lang:r xp:100 skills:4 key:a40e05d297
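465 | ## A Harder Derived Variable (3)
466 | 
467 | Phew, we are finally about to finish this slightly fiddly derived variable! Next we combine the `birth_year_char` we just created with `birthday`. To join strings we use the [`paste()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/paste) function: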
468 | 
469 | ```{r}
470 | char_pasted <- paste(char1, char2, sep = " ")
471 | ```
472 | 
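473 | Note that the default `sep` is a single space. Since the parts of a date are joined with `-`, remember to use `sep = "-"`. Once the strings are combined, all that remains is to convert the character vector to the **Date** class with [`as.Date()`](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/as.Date), and the vector is ready to be added to the data frame!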
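
As a small sketch of both steps, the made-up vectors `year_char` and `month_day` below stand in for `birth_year_char` and `birthday` (the values are hypothetical, not taken from the course data):

```{r}
# Hypothetical values, for illustration only
year_char <- c("1997", "1995")
month_day <- c("05-05", "11-11")

# paste() works element by element; sep goes between the pieces
date_char <- paste(year_char, month_day, sep = "-")
date_char

# Text in year-month-day order is parsed by as.Date() without a format argument
birth_date <- as.Date(date_char)
class(birth_date)
```

Once the values are of class `Date`, functions such as `format()` can pull the year, month, or day back out.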
474 | 
475 | *** =instructions
476 | - Use `paste()` to combine `birth_year_char` and `birthday` into `birth_date_char`.
477 | - Use `as.Date()` to convert `birth_date_char` to a date, `birth_date`.
478 | - Add `birth_date` to the data frame.
479 | - Print `straw_hat_df` to the R Console.
480 | 
481 | *** =hint
482 | - Remember to set `sep = "-"` in the call to `paste()`.
483 | - Type `straw_hat_df$birth_date <- birth_date` to add the new variable.
484 | - Type `straw_hat_df` in the editor.
485 | 
486 | *** =pre_exercise_code
487 | ```{r}
488 | load(url("http://s3.amazonaws.com/assets.datacamp.com/production/course_1570/datasets/straw_hat_df.RData"))
489 | sys_date <- Sys.Date()
490 | sys_date_year <- format(sys_date, '%Y')
491 | sys_date_year_num <- as.numeric(sys_date_year)
492 | age <- straw_hat_df$age
493 | birthday <- straw_hat_df$birthday
494 | birth_year <- sys_date_year_num - age
495 | birth_year_char <- as.character(birth_year)
496 | rm(sys_date)
497 | rm(sys_date_year)
498 | rm(sys_date_year_num)
499 | rm(age)
500 | rm(birth_year)
501 | ```
502 | 
503 | *** =sample_code
504 | ```{r}
505 | # straw_hat_df, birthday, and birth_year_char are preloaded
506 | 
507 | # Combine birth_year_char and birthday
508 | birth_date_char <- paste(__, __, sep = "__")
509 | 
510 | # Convert birth_date_char to a date, birth_date
511 | birth_date <- 
512 | 
513 | # Add birth_date to the data frame
514 | straw_hat_df$birth_date <- 
515 | 
516 | # Print the data frame to the R Console
517 | 
518 | ```
519 | 
520 | *** =solution
521 | ```{r}
522 | # straw_hat_df, birthday, and birth_year_char are preloaded
523 | 
524 | # Combine birth_year_char and birthday
525 | birth_date_char <- paste(birth_year_char, birthday, sep = "-")
526 | 
527 | # Convert birth_date_char to a date, birth_date
528 | birth_date <- as.Date(birth_date_char)
529 | 
530 | # Add birth_date to the data frame
531 | straw_hat_df$birth_date <- birth_date
532 | 
533 | # Print the data frame to the R Console
534 | straw_hat_df
535 | ```
536 | 
537 | *** =sct
538 | ```{r}
539 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `paste()` &#20989;&#25976;&#65311;"
540 | test_function("paste",
541 |               args = NULL, index = 1,
542 |               eval = TRUE,
543 |               eq_condition = "equivalent",
544 |               not_called_msg = msg,
545 |               args_not_specified_msg = NULL,
546 |               incorrect_msg = msg)
547 | 
548 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `paste()` &#20989;&#25976;&#29983;&#25104; `birth_date_char`&#65311;"
549 | test_object("birth_date_char", 
550 |             undefined_msg = msg, 
551 |             incorrect_msg = msg)
552 |             
553 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#27491;&#30906;&#20351;&#29992; `as.Date()` &#20989;&#25976;&#65311;"
554 | test_function("as.Date",
555 |               args = NULL, index = 1,
556 |               eval = TRUE,
557 |               eq_condition = "equivalent",
558 |               not_called_msg = msg,
559 |               args_not_specified_msg = NULL,
560 |               incorrect_msg = msg)
561 | 
562 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#20351;&#29992; `as.Date()` &#20989;&#25976;&#29983;&#25104; `birth_date`&#65311;"
563 | test_object("birth_date", 
564 |             undefined_msg = msg, 
565 |             incorrect_msg = msg)
566 | 
567 | msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#25104;&#21151;&#23559; `birth_date` &#26032;&#22686;&#33267;&#33609;&#24125;&#28023;&#36042;&#22296;&#36039;&#26009;&#26694;&#20013;&#65311;"
568 | 
569 | test_data_frame("straw_hat_df",
570 |                 columns = "birth_date",
571 |                 undefined_msg = "&#30906;&#35469;&#20320;&#26159;&#21542;&#19981;&#23567;&#24515;&#23559; `straw_hat_df` &#31227;&#38500;&#20102;&#65311;",
572 |                 undefined_cols_msg = msg,
573 |                 incorrect_msg = msg)
574 | 
575 | test_output_contains("straw_hat_df",
576 |                      incorrect_msg = "&#30906;&#35469;&#26159;&#21542;&#26377;&#23559; `straw_hat_df` &#36664;&#20986;&#22312; R Console&#65311;")
577 | 
578 | success_msg("&#33021;&#22816;&#23436;&#25104;&#36889;&#20491;&#31995;&#21015;&#30340;&#32244;&#32722;&#30495;&#26159;&#19981;&#31777;&#21934;&#65292;&#30475;&#20358;&#20320;&#24456;&#26377;&#28507;&#21147;&#25104;&#28858;&#33609;&#24125;&#28023;&#36042;&#22296;&#30340;&#31532;&#21313;&#20491;&#22821;&#20276;&#65281;")
579 | ```


--------------------------------------------------------------------------------