├── .gitignore
├── README.md
├── chapter1.Rmd
├── chapter2.Rmd
├── chapter3.Rmd
├── chapter4.Rmd
├── chapter5.Rmd
├── chapter6.Rmd
├── course.yml
├── datasets
│   ├── chapter5.R
│   ├── chapter5.RData
│   ├── chapter6.R
│   └── chapter6.RData
├── refguides
│   ├── chapter1_refguide.Rmd
│   ├── chapter2_refguide.Rmd
│   ├── chapter3_refguide.Rmd
│   ├── chapter4_refguide.Rmd
│   ├── chapter5_refguide.Rmd
│   ├── chapter6_refguide.Rmd
│   └── chapter7_refguide.Rmd
└── scripts
    └── chapter1_script.md
/.gitignore:
--------------------------------------------------------------------------------
1 | *
2 | !*.Rmd
3 | !*.yml
4 | !README.md
5 | !.gitignore
6 | .Rproj.user
7 | !removed/
8 | !removed/*.Rmd
9 | !scripts/
10 | !scripts/*.md
11 | !datasets/
12 | !datasets/*.R
13 | !datasets/*.RData
14 | !refguides/
15 | !refguides/*
16 |
17 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Introduction to R (beta)
2 |
3 | Source files for the improved introduction to R course (still in beta)
4 |
5 | [**Link to Course**](https://www.datacamp.com/courses/732)
6 |
7 | This course should be updated through [DataCamp Teach](https://www.datacamp.com/teach).
8 |
--------------------------------------------------------------------------------
/chapter1.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title_meta : Chapter 1
3 | title : Intro to basics
4 | description : "In this chapter, you will take your first steps with R. You will learn how to use the console as a calculator and how to assign variables. You will also get to know the basic data types in R. Let's get started!"
5 | attachments :
6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch1_slides_v2.pdf
7 |
8 | --- type:VideoExercise lang:r xp:50 skills:1 key:1a1ba28cd5
9 | ## Meet R
10 |
11 | *** =video_link
12 | //player.vimeo.com/video/144351865
13 |
14 | *** =video_hls
15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch1_1.master.m3u8
16 |
17 | --- type:NormalExercise lang:r xp:100 skills:1 key:5714863ba8
18 | ## Your first R script
19 |
20 | In the script on the right you should type R code to solve the exercises. When you hit the _Submit Answer_ button, every line of code in the script is interpreted and executed by R and you get a message that indicates whether or not your code was correct. The output of your submission is shown in the R console.
21 |
22 | You can also execute R commands straight in the console. When you type in the console, your submission will not be checked for correctness! Try, for example, to type in `3 * 4` and hit Enter. R should return `[1] 12`.
23 |
24 | *** =instructions
25 | In the script, add another line of code that calculates the sum of 6 and 12, and hit the _Submit Answer_ button.
26 |
27 | *** =hint
28 | Simply add a line of R code that calculates the sum of 6 and 12, just like the example in the sample code!
29 |
30 | *** =pre_exercise_code
31 | ```{r}
32 | # no pec
33 | ```
34 |
35 | *** =sample_code
36 | ```{r}
37 | 3 + 4
38 | ```
39 |
40 | *** =solution
41 | ```{r}
42 | 3 + 4
43 | 6 + 12
44 | ```
45 |
46 | *** =sct
47 | ```{r}
48 | test_output_contains("18", incorrect_msg = "Make sure to add a line of R code that calculates the sum of 6 and 12.")
49 | success_msg("Awesome! See how the console shows the result of the R code you submitted?")
50 | ```
51 |
52 |
53 | --- type:NormalExercise lang:r xp:100 skills:1 key:ab37530088
54 | ## Documenting your code
55 |
56 | Adding comments to your code is extremely important to make sure that you and others can understand what your code is about. R makes use of the `#` sign to add comments, just like Twitter!
57 |
58 | Comments are not run as R code, so they will not influence your result. For example, _Calculate 3 + 4_ in the script on the right is a comment and is ignored during execution.
59 |
60 | *** =instructions
61 | Add another comment in the script on the right, _Calculate 6 + 12_, at the appropriate location.
62 |
63 | *** =hint
64 | Simply add the line `# Calculate 6 + 12` above the R code that calculates 6 + 12.
65 |
66 | *** =pre_exercise_code
67 | ```{r}
68 | # no pec
69 | ```
70 |
71 | *** =sample_code
72 | ```{r}
73 | # Calculate 3 + 4
74 | 3 + 4
75 |
76 |
77 | 6 + 12
78 | ```
79 |
80 | *** =solution
81 | ```{r}
82 | # Calculate 3 + 4
83 | 3 + 4
84 |
85 | # Calculate 6 + 12
86 | 6 + 12
87 | ```
88 |
89 | *** =sct
90 | ```{r}
91 | test_output_contains("7", incorrect_msg = "Do not remove the code that calculates 3 + 4.")
92 | test_student_typed("# Calculate 3 + 4", not_typed_msg = "Do not remove the comment for the code that calculates 3 + 4.")
93 | test_output_contains("18", incorrect_msg = "Do not remove the code that calculates 6 + 12.")
94 | test_student_typed(c("# Calculate 6 + 12", "# calculate 6 + 12", "#Calculate 6 + 12", "#calculate 6 + 12",
95 | "# Calculate 6+12", "# calculate 6+12", "#Calculate 6+12", "#calculate 6+12"),
96 | not_typed_msg = "Make sure to add the comment: `# Calculate 6 + 12`")
97 | success_msg("Great! Looks better, doesn't it?")
98 | ```
99 |
100 |
101 | --- type:NormalExercise lang:r xp:100 skills:1 key:9d8a3d0b88
102 | ## R as a calculator
103 |
104 | In its most basic form R can be used as a scientific calculator. Consider the following arithmetic operators:
105 |
106 | - Addition: `+`
107 | - Subtraction: `-`
108 | - Multiplication: `*`
109 | - Division: `/`
110 | - Exponentiation: `^`
111 | - Modulo: `%%`
112 |
113 | The last two might need some explaining:
114 | - The `^` operator raises the number to its left to the power of the number to its right: for example, `3^2` equals 9.
115 | - The modulo operator `%%` returns the remainder of dividing the number on its left by the number on its right: for example, 5 modulo 3, or `5 %% 3`, equals 2.
116 |
117 | *** =instructions
118 | - Type `2^5` in the script to calculate 2 to the power 5.
119 | - Type `28 %% 6` to calculate 28 modulo 6.
120 | - Click _Submit Answer_ and have a look at the R output in the console.
121 |
122 | *** =hint
123 | Another example of the modulo operator: `9 %% 2` equals `1`.
124 |
125 | *** =pre_exercise_code
126 | ```{r}
127 | # no pec
128 | ```
129 |
130 | *** =sample_code
131 | ```{r}
132 | # Addition
133 | 5 + 5
134 |
135 | # Subtraction
136 | 5 - 5
137 |
138 | # Multiplication
139 | 3 * 5
140 |
141 | # Division
142 | (5 + 5) / 2
143 |
144 | # Exponentiation
145 |
146 |
147 | # Modulo
148 |
149 | ```
150 |
151 | *** =solution
152 | ```{r}
153 | # Addition
154 | 5 + 5
155 |
156 | # Subtraction
157 | 5 - 5
158 |
159 | # Multiplication
160 | 3 * 5
161 |
162 | # Division
163 | (5 + 5) / 2
164 |
165 | # Exponentiation
166 | 2 ^ 5
167 |
168 | # Modulo
169 | 28 %% 6
170 | ```
171 |
172 | *** =sct
173 | ```{r}
174 | msg <- "Do not remove the examples that have already been coded for you!"
175 | test_output_contains("5 + 5", incorrect_msg = msg)
176 | test_output_contains("5 - 5", incorrect_msg = msg)
177 | test_output_contains("3 * 5", incorrect_msg = msg)
178 | test_output_contains("(5 + 5)/2", incorrect_msg = msg)
179 | test_output_contains("2^5", incorrect_msg = "Have another look at the exponentiation. Read the instructions carefully.")
180 | test_output_contains("28 %% 6", incorrect_msg = "Have another look at the use of the `%%` operator. Read the instructions carefully.")
181 | success_msg("Nice one!")
182 | ```
183 |
184 |
185 | --- type:MultipleChoiceExercise xp:50 skills:1 key:9d8819fb2e
186 | ## R's pros and cons
187 |
188 | As Filip explained in the video, there are things that make R the awesome and immensely popular language that it is today. On the other hand, there are also aspects about R that are less attractive. Which of the following statements are true regarding this statistical programming language developed by Ihaka and Gentleman in the nineties?
189 |
190 | 1. As opposed to SAS and SPSS, R is completely open-source.
191 | 2. R is open-source, but it's hard to share your code with others since R uses a command-line interface.
192 | 3. It typically takes a long time for new and updated R packages to be released and made available to the public.
193 | 4. R is easy to use, but this comes at the cost of limited graphical abilities.
194 | 5. R works well with large data sets, if the code is properly written and the data fits into the working memory.
195 |
196 | *** =instructions
197 | - statements (1) and (2) are correct; the others are false.
198 | - statements (1) and (4) are correct; the others are false.
199 | - statements (1) and (5) are correct; the others are false.
200 | - statements (2) and (4) are correct; the others are false.
201 | - statements (3) and (5) are correct; the others are false.
202 |
203 | *** =hint
204 | Remember that your data has to fit in the working memory for R to be able to process it.
205 |
206 | *** =pre_exercise_code
207 | ```{r}
208 | # no pec
209 | ```
210 |
211 | *** =sct
212 | ```{r}
213 | msg1 = "Remember that the fact that R uses a command-line interface does not make it hard to share code. On the contrary, sharing your results becomes very straightforward because you can easily share R scripts."
214 | msg2 = "R is the perfect tool for creating neat and insightful visualizations. Try again."
215 | msg3 = "Great! Head over to the next exercise and get your hands dirty!"
216 | msg4 = "R uses a command-line interface, which makes it very easy to share one's code. Also, R is very suitable for creating visualizations. Try again."
217 | msg5 = "It's fairly straightforward to write, maintain and share R packages. Try again."
218 | test_mc(3, feedback_msgs = c(msg1, msg2, msg3, msg4, msg5))
219 | ```
220 |
221 |
222 | --- type:NormalExercise lang:r xp:100 skills:1 key:6b6fb4974c
223 | ## Variable assignment (1)
224 |
225 | A variable allows you to store a value or an object in R. You can later use the variable's name to easily access the value or the object stored in it. You use `<-` to assign a value to a variable:
226 |
227 | ```
228 | my_variable <- 4
229 | ```
230 |
231 | *** =instructions
232 | Complete the code in the editor so that it assigns the value 42 to the variable `x`. Click _Submit Answer_. Notice that when you ask R to print `x`, the value 42 appears.
233 |
234 | *** =hint
235 | Look at how the value 4 was assigned to `my_variable` in the exercise's assignment. Do the exact same thing in the editor, but now assign 42 to the variable `x`.
236 |
237 | *** =pre_exercise_code
238 | ```{r}
239 | # no pec
240 | ```
241 |
242 | *** =sample_code
243 | ```{r}
244 | # Assign the value 42 to x
245 | x <-
246 |
247 | # Print out the value of the variable x
248 | x
249 | ```
250 |
251 | *** =solution
252 | ```{r}
253 | # Assign the value 42 to x
254 | x <- 42
255 |
256 | # Print out the value of the variable x
257 | x
258 | ```
259 |
260 | *** =sct
261 | ```{r}
262 | test_error()
263 | test_object("x",
264 | undefined_msg = "Make sure to define a variable `x`.",
265 | incorrect_msg = "Make sure that you assign the correct value to `x`.")
266 | success_msg("Good job! Notice that R does not print the value of a variable to the console when you do the assignment. `x <- 42` did not generate any output, because R assumes that you will be needing this variable in the future. Otherwise you wouldn't have stored the value in a variable in the first place, right? Proceed to the next exercise!")
267 | ```
268 |
269 |
270 |
271 |
272 | --- type:NormalExercise xp:100 skills:1 key:a5b8028834
273 | ## Variable assignment (2)
274 |
275 | Suppose you have a fruit basket with five apples. You want to store the number of apples in a variable with the name `my_apples`.
276 |
277 | *** =instructions
278 | - Using `<-`, assign the value 5 to `my_apples` below the first comment.
279 | - Type `my_apples` below the second comment. This will print out the value of `my_apples`.
280 | - After clicking _Submit Answer_, have a look at the console: the number 5 is printed, so R now links the variable `my_apples` to the value 5.
281 |
282 | *** =hint
283 | Remember that if you want to assign a number or an object to a variable in R, you can make use of the assignment operator `<-`. Alternatively, you can use `=`, but `<-` is widely preferred in the R community.
284 |
285 | *** =pre_exercise_code
286 | ```{r}
287 | ```
288 |
289 | *** =sample_code
290 | ```{r}
291 | # Assign the value 5 to the variable called my_apples
292 |
293 |
294 | # Print out the value of the variable my_apples
295 |
296 | ```
297 |
298 | *** =solution
299 | ```{r}
300 | # Assign the value 5 to the variable called my_apples
301 | my_apples <- 5
302 |
303 | # Print out the value of the variable my_apples
304 | my_apples
305 | ```
306 |
307 | *** =sct
308 | ```{r}
309 | test_object("my_apples", incorrect_msg = "Have you correctly assigned 5 to `my_apples`? Write `my_apples <- 5` on a new line in the script.")
310 | test_output_contains("my_apples", incorrect_msg = "Have you explicitly told R to print out the `my_apples` variable to the console? Simply type `my_apples` on a new line.")
311 | success_msg("Great! You could also use `=` for variable assignment, but `<-` is typically preferred.")
312 | ```
313 |
314 |
315 | --- type:NormalExercise lang:r xp:100 skills:1 key:a0cb1bea96
316 | ## Variable assignment (3)
317 |
318 | Every tasty fruit basket needs oranges, so you decide to add six oranges. You decide to create the variable `my_oranges` and assign the value 6 to it. Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now code this in a clear way:
319 |
320 | ```
321 | my_apples + my_oranges
322 | ```
323 |
324 | *** =instructions
325 | - Assign to `my_oranges` the value 6.
326 | - Add the variables `my_apples` and `my_oranges` and have R simply print the result.
327 | - Combine the variables `my_apples` and `my_oranges` into a new variable `my_fruit`, which is the total amount of fruits in your fruit basket.
328 |
329 | *** =hint
330 | `my_fruit` is just the sum of `my_apples` and `my_oranges`. You can use the `+` operator to sum the two and `<-` to assign that value to the variable `my_fruit`.
331 |
332 | *** =pre_exercise_code
333 | ```{r}
334 | # no pec
335 | ```
336 |
337 | *** =sample_code
338 | ```{r}
339 | # Assign 5 to my_apples
340 | my_apples <- 5
341 |
342 | # Assign 6 to my_oranges
343 |
344 |
345 | # Add my_apples and my_oranges: print the result
346 |
347 |
348 | # Add my_apples and my_oranges: assign to my_fruit
349 |
350 | ```
351 |
352 | *** =solution
353 | ```{r}
354 | # Assign 5 to my_apples
355 | my_apples <- 5
356 |
357 | # Assign 6 to my_oranges
358 | my_oranges <- 6
359 |
360 | # Add my_apples and my_oranges: print the result
361 | my_apples + my_oranges
362 |
363 | # Add my_apples and my_oranges: assign to my_fruit
364 | my_fruit <- my_apples + my_oranges
365 | ```
366 |
367 | *** =sct
368 | ```{r}
369 | test_object("my_apples", incorrect_msg = "Do not change the assignment of the `my_apples` variable!")
370 | test_object("my_oranges")
371 | test_output_contains("my_apples + my_oranges",
372 | incorrect_msg = "The output does not contain the result of adding `my_apples` and `my_oranges` (second instruction). Try again.")
373 | test_object("my_fruit")
374 | success_msg("Nice one! The great advantage of doing calculations with variables is reusability. If you just change `my_apples` to equal 12 instead of 5 and rerun the script, `my_fruit` will automatically update as well.")
375 | ```
376 |
377 |
378 | --- type:NormalExercise lang:r xp:100 skills:1 key:6192f64167
379 | ## The workspace
380 |
381 | If you assign a value to a variable, this variable is stored in the workspace. It's the place where all user-defined variables live. The command [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) lists the contents of this workspace.
382 |
383 | ```
384 | a <- 1
385 | b <- 2
386 | ls()
387 | ```
388 |
389 | The first two lines create the variables `a` and `b`. Calling [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) now shows you that `a` and `b` are in the workspace.
390 |
391 | You can also remove variables from the workspace. You do this with [`rm()`](http://www.rdocumentation.org/packages/base/functions/rm). `rm(a)`, for example, would remove `a` from the workspace again. `rm(list = ls())`, which is used in the beginning of your script, clears everything from the workspace.
392 |
393 | *** =instructions
394 | - Create a variable, `horses`, equal to 3, and a variable `dogs`, equal to 7.
395 | - List the contents of your workspace with [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) to see that indeed, these two variables are in there.
396 |
397 | *** =hint
398 | Use `<-` to create `horses` and `dogs`, then call [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) to list the contents of the workspace. Leave the `rm(list = ls())` line at the top of the script in place, and let the feedback messages guide you.
399 |
400 | *** =pre_exercise_code
401 | ```{r}
402 | # no pec
403 | ```
404 |
405 | *** =sample_code
406 | ```{r}
407 | # Clear the entire workspace
408 | rm(list = ls())
409 |
410 | # Create the variables horses and dogs
411 |
412 |
413 | # List the contents of your workspace
414 |
415 |
416 | ```
417 |
418 | *** =solution
419 | ```{r}
420 | # Clear the entire workspace
421 | rm(list = ls())
422 |
423 | # Create the variables horses and dogs
424 | horses <- 3
425 | dogs <- 7
426 |
427 | # List the contents of your workspace
428 | ls()
429 | ```
430 |
431 | *** =sct
432 | ```{r}
433 | test_student_typed("rm(list = ls())", not_typed_msg = "Do not remove the line `rm(list = ls())`.")
434 | test_object("horses")
435 | test_object("dogs")
436 | test_output_contains('c("dogs", "horses")',
437 | incorrect_msg = "Make sure to inspect the objects in your workspace after creating `horses` and `dogs`.")
438 | success_msg("Awesome! You can now build up and inspect your workspace, great!")
439 | ```
440 |
441 |
442 | --- type:VideoExercise lang:r xp:50 skills:1 key:9f9019501e
443 | ## Basic Data Types
444 |
445 | *** =video_link
446 | //player.vimeo.com/video/138173888
447 |
448 | *** =video_hls
449 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch1_2.master.m3u8
450 |
451 |
452 |
453 | --- type:NormalExercise lang:r xp:100 skills:1 key:1866cdd202
454 | ## Discover Basic Data Types
455 |
456 | To get started, here are some of R's most basic data types:
457 |
458 | - Decimal values like `4.5` are called **numerics**.
459 | - Natural numbers like `4L` are called **integers**. Integers are also numerics.
460 | - Boolean values (`TRUE` or `FALSE`) are called **logical**. Capital letters are important here; `true` and `false` are not valid.
461 | - Text (or string) values are called **characters**.
462 |
463 | Note how the quotation marks on the right indicate that `"some text"` is of type character.
464 |
465 | *** =instructions
466 | Change the value of the:
467 |
468 | - `my_numeric` variable to `42`.
469 | - `my_character` variable to `"forty-two"`. Note that the quotation marks indicate that `"forty-two"` is a character.
470 | - `my_logical` variable to `FALSE`.
471 |
472 | *** =hint
473 | Replace the values in the script with the values that are provided in the exercise.
474 | ```
475 | my_numeric <- 42
476 | ```
477 | assigns the value 42 to the variable `my_numeric`.
478 |
479 | *** =pre_exercise_code
480 | ```{r}
481 | # no pec
482 | ```
483 |
484 | *** =sample_code
485 | ```{r}
486 | # What is the answer to the universe?
487 | my_numeric <- 42.5
488 |
489 | # The quotation marks indicate that the variable is of type character
490 | my_character <- "some text"
491 |
492 | # Change the value of my_logical
493 | my_logical <- TRUE
494 | ```
495 |
496 | *** =solution
497 | ```{r}
498 | # What is the answer to the universe?
499 | my_numeric <- 42
500 |
501 | # The quotation marks indicate that the variable is of type character
502 | my_character <- "forty-two"
503 |
504 | # Change the value of my_logical
505 | my_logical <- FALSE
506 | ```
507 |
508 | *** =sct
509 | ```{r}
510 | test_object("my_numeric",
511 | incorrect_msg = "Make sure that you assign the correct value to `my_numeric`.")
512 | test_object("my_character",
513 | incorrect_msg = paste("Make sure that you assign the correct value to `my_character`.",
514 | "Do not forget the quotes and beware of capitalization! R is case sensitive!"))
515 | test_object("my_logical",
516 | undefined_msg = "Please make sure to define a variable `my_logical`.",
517 | incorrect_msg = "Make sure that you assign the correct value to `my_logical`.")
518 | success_msg("Great work! Continue to the next exercise.")
519 | ```
520 |
521 | --- type:NormalExercise lang:r xp:100 skills:1 key:c52153af0b
522 | ## Back to Apples and Oranges
523 |
524 | Common knowledge tells you not to add apples and oranges. But hey, that is what you just did! The `my_apples` and `my_oranges` variables both contained a number in the previous exercise. The `+` operator works with numeric variables in R.
525 |
526 | However, if you try to add a numeric and a character string, R will complain.
527 |
528 | *** =instructions
529 | - Click _Submit Answer_ and read the error message. Make sure you understand why this did not work.
530 | - Adjust `my_oranges <- "six"` such that R knows you have 6 oranges and thus a fruit basket with 11 pieces of fruit. Click _Submit Answer_ again.
531 |
532 | *** =hint
533 | You have to assign the numeric value `6` to the `my_oranges` variable instead of the character value `"six"`. Notice how the quotation marks are used to indicate that `"six"` is a character.
534 |
535 | *** =pre_exercise_code
536 | ```{r}
537 | # no pec
538 | ```
539 |
540 | *** =sample_code
541 | ```{r}
542 | # Assign a value to the variable my_apples and print it out
543 | my_apples <- 5
544 | my_apples
545 |
546 | # Assign a value to the variable my_oranges and print it out
547 | my_oranges <- "six"
548 | my_oranges
549 |
550 | # New variable that contains the total amount of fruit
551 | my_fruit <- my_apples + my_oranges
552 | my_fruit
553 | ```
554 |
555 | *** =solution
556 | ```{r}
557 | # Assign a value to the variable my_apples and print it out
558 | my_apples <- 5
559 | my_apples
560 |
561 | # Assign a value to the variable my_oranges and print it out
562 | my_oranges <- 6
563 | my_oranges
564 |
565 | # New variable that contains the total amount of fruit
566 | my_fruit <- my_apples + my_oranges
567 | my_fruit
568 | ```
569 |
570 | *** =sct
571 | ```{r}
572 | test_object("my_apples", incorrect_msg = "Don't change the code that assigns 5 to `my_apples`.")
573 | test_object("my_oranges", incorrect_msg = "Change the assignment of the `my_oranges` variable such that the code runs without errors.")
574 | test_object("my_fruit",
575 | undefined_msg = "Please make sure to define a variable `my_fruit`.",
576 | incorrect_msg = "Make sure that you assign the correct value to `my_fruit`.")
577 | test_output_contains("my_fruit", incorrect_msg = "The output does not contain the result of adding `my_apples` and `my_oranges`.")
578 | success_msg("Awesome, keep up the good work!")
579 | ```
580 |
581 | --- type:MultipleChoiceExercise lang:r xp:50 skills:1 key:7806ca24d2
582 | ## What's that data type?
583 |
584 | When you added the variables containing `5` and `"six"`, you got an error due to a mismatch in data types. You can avoid such embarrassing situations by checking the data type of a variable beforehand:
585 |
586 | ```
587 | class(my_var)
588 | ```
589 |
590 | In the workspace (you can see what it contains by typing [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) in the console), some variables have already been defined. Which statement concerning these variables is correct?
591 |
592 | *** =instructions
593 | - `a`'s class is `integer`, `b` is a `character`, `c` is a `boolean`.
594 | - `a`'s class is `character`, `b` is a `character` as well, `c` is a `logical`.
595 | - `a`'s class is `numeric`, `b` is a `string`, `c` is a `logical`.
596 | - `a`'s class is `numeric`, `b` is a `character`, `c` is a `logical`.
597 |
598 | *** =hint
599 | You can find out the data type of the `a` variable for example by typing `class(a)`. You can do similar things for `b` and `c`.
600 |
601 | *** =pre_exercise_code
602 | ```{r}
603 | a <- 42
604 | b <- "forty-two"
605 | c <- FALSE
606 | ```
607 |
608 | *** =sct
609 | ```{r}
610 | msg1 <- "`boolean` is not the class for logical values. Try again."
611 | msg2 <- "`a` is of the class `numeric`, give it another go."
612 | msg3 <- "`string` is not a class in R. `character` is!"
613 | msg4 <- "Nice one. Let's step it up a notch and start coercing variables!"
614 | test_mc(correct = 4, feedback_msgs = c(msg1, msg2, msg3, msg4))
615 | ```
616 |
617 | --- type:NormalExercise lang:r xp:100 skills:1 key:c75fe45544
618 | ## Coercion: Taming your data
619 |
620 | As Filip explained in the video, you can use coercion to transform your data from one type to another. Next to the [`class()`](http://www.rdocumentation.org/packages/base/functions/class) function and the `is.*()` functions, you can use the `as.*()` functions to force data to change types.
621 |
622 | Take this example:
623 |
624 | ```
625 | var <- "3"
626 | var_num <- as.numeric(var)
627 | ```
628 |
629 | `var`, a character string, is converted into a numeric using [`as.numeric()`](http://www.rdocumentation.org/packages/base/functions/numeric). The resulting numeric is stored as `var_num`.
630 |
631 | *** =instructions
632 | - Convert `var`, a logical, to a character. Assign the resulting character string to the variable `var_char`.
633 | - Inspect the class of `var_char` by using [`class()`](http://www.rdocumentation.org/packages/base/functions/class) on it.
634 |
635 | *** =hint
636 | Use the [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) function to convert `var` to a character.
637 |
638 | *** =pre_exercise_code
639 | ```{r}
640 | ```
641 |
642 | *** =sample_code
643 | ```{r}
644 | # Definition of var
645 | var <- TRUE
646 |
647 | # Convert var to a character: var_char
648 |
649 |
650 | # Display the class of var_char
651 |
652 |
653 | ```
654 |
655 | *** =solution
656 | ```{r}
657 | # Definition of var
658 | var <- TRUE
659 |
660 | # Convert var to a character: var_char
661 | var_char <- as.character(var)
662 |
663 | # Display the class of var_char
664 | class(var_char)
665 | ```
666 |
667 | *** =sct
668 | ```{r}
669 | test_error()
670 | msg <- "Do not remove or change the definition of the variable `var`."
671 | test_object("var", undefined_msg = msg, incorrect_msg = msg)
672 | test_function("as.character", "x",
673 | not_called_msg = "Make sure to call the function [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) to convert `var` to a character.",
674 | incorrect_msg = "Have you passed the correct variable to the function [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character)?")
675 | test_object("var_char")
676 | test_function("class", "x",
677 | not_called_msg = "Make sure to call the function `class()` to inspect the class of `var_char`.",
678 | incorrect_msg = "Have you passed the correct variable to the function `class()`?")
679 | success_msg("Bellissimo!")
680 | ```
681 |
--------------------------------------------------------------------------------
/chapter2.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title_meta : Chapter 2
3 | title : Vectors
4 | description : We take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R! After completing this chapter, you will be able to create vectors in R, name them, select elements from them and compare different vectors.
5 | attachments :
6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch2_slides.pdf
7 |
8 | --- type:VideoExercise lang:r xp:50 skills:1 key:b91dd847a0
9 | ## Create and Name Vectors
10 |
11 | *** =video_link
12 | //player.vimeo.com/video/138173896
13 |
14 | *** =video_hls
15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_1.master.m3u8
16 |
17 |
18 | --- type:NormalExercise lang:r xp:100 skills:1 key:2d1cb04427
19 | ## Create a vector (1)
20 |
21 | Feeling lucky? You better, because we'll take you on a trip to Las Vegas!
22 |
23 | Thanks to R and your new data science skills, you will learn how to improve your performance at the tables and kick off your career as a professional gambler. This chapter will show how you can easily keep track of your betting progress and how you can do some simple analyses on past actions.
24 |
25 | You will use vectors. As Filip explained, vectors are one-dimensional arrays that can hold numeric data, character data or logical data. You create a vector with the combine function [`c()`](http://www.rdocumentation.org/packages/base/functions/c). You place the vector elements, separated by commas, between the parentheses. For example:
26 |
27 | ```
28 | numeric_vector <- c(1, 2, 3)
29 | character_vector <- c("a", "b", "c")
30 | logical_vector <- c(TRUE, FALSE)
31 | ```
32 |
33 | *** =instructions
34 | Create a vector, `logical_vector`, that contains the three elements: `TRUE`, `FALSE` and `TRUE` (in that order).
35 |
36 | *** =hint
37 | Assign `c(TRUE, FALSE, TRUE)` to the variable `logical_vector` with the `<-` operator.
38 |
39 | *** =pre_exercise_code
40 | ```{r}
41 | # no pec
42 | ```
43 |
44 | *** =sample_code
45 | ```{r}
46 | numeric_vector <- c(1, 10, 49)
47 | character_vector <- c("x", "y", "z")
48 |
49 | # Create logical_vector
50 |
51 | ```
52 |
53 | *** =solution
54 | ```{r}
55 | numeric_vector <- c(1, 10, 49)
56 | character_vector <- c("x", "y", "z")
57 |
58 | # Create logical_vector
59 | logical_vector <- c(TRUE, FALSE, TRUE)
60 | ```
61 |
62 | *** =sct
63 | ```{r}
64 | msg <- "Do not change how `numeric_vector` and `character_vector` are created!"
65 | lapply(c("numeric_vector", "character_vector"), test_object, undefined_msg = msg, incorrect_msg = msg)
66 | test_object("logical_vector", incorrect_msg = "Make sure that you assign the correct values to `logical_vector`. The order matters!")
67 | success_msg("Perfect! Let's practice some more with vector creation.")
68 | ```
69 |
70 | --- type:NormalExercise lang:r xp:100 skills:1 key:c6e056b9c3
71 | ## Create a vector (2)
72 |
73 | After one week in Las Vegas and still zero Ferraris in your garage, you decide that it is time to start using your data science superpowers.
74 |
75 | Before doing your first analysis, you decide to collect all the winnings and losses for the last week:
76 |
77 | For `poker_vector`:
78 | - On Monday you won \$140
79 | - Tuesday you **lost** \$50
80 | - Wednesday you won \$20
81 | - Thursday you **lost** \$120
82 | - Friday you won \$240
83 |
84 | For `roulette_vector`:
85 | - On Monday you **lost** \$24
86 | - Tuesday you **lost** \$50
87 | - Wednesday you won \$100
88 | - Thursday you **lost** \$350
89 | - Friday you won \$10
90 |
91 | To be able to use this data in R, you decide to create the variables `poker_vector` and `roulette_vector`.
92 |
93 | *** =instructions
94 | Assign the winnings/losses for roulette as a vector to the variable `roulette_vector`. Make sure to use the correct order.
95 |
96 | *** =hint
97 | To help you with this step, the script already contains the code for creating `poker_vector`. Assign the correct values to `roulette_vector` based on the numbers in the assignment. Do not forget that losses are negative numbers.
98 |
99 |
100 | *** =pre_exercise_code
101 | ```{r}
102 | ```
103 |
104 | *** =sample_code
105 | ```{r}
106 | # Poker winnings from Monday to Friday
107 | poker_vector <- c(140, -50, 20, -120, 240)
108 |
109 | # Roulette winnings from Monday to Friday: roulette_vector
110 |
111 | ```
112 |
113 | *** =solution
114 |
115 | ```{r}
116 | # Poker winnings from Monday to Friday
117 | poker_vector <- c(140, -50, 20, -120, 240)
118 |
119 | # Roulette winnings from Monday to Friday: roulette_vector
120 | roulette_vector <- c(-24, -50, 100, -350, 10)
121 | ```
122 |
123 | *** =sct
124 | ```{r}
125 | test_object("poker_vector",
126 | incorrect_msg = "Don't change how `poker_vector` is defined.")
127 | test_object("roulette_vector",
128 | incorrect_msg = paste("Make sure that you assign a vector with the correct values to `roulette_vector`.",
129 | "If you lost money, you should use a `-` sign."))
130 | success_msg("Very good! To check out the contents of your vectors, remember that you can always simply type the variable in the console and hit Enter. Proceed to the next exercise!")
131 | ```
132 |
133 |
134 | --- type:NormalExercise lang:r xp:100 skills:1 key:ebb5aae2ff
135 | ## Naming a vector (1)
136 |
137 | As a data analyst, it is important to have a clear view of the data that you are using. Understanding what each element refers to is essential.
138 |
139 | In the previous exercise, we created a vector with your winnings over the week. Each vector element refers to a day of the week but it is hard to tell which element belongs to which day. It would be nice if you could show that in the vector itself. Remember the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function to name the elements of a vector?
140 |
141 | ```
142 | some_vector <- c("Johnny", "Poker Player")
143 | names(some_vector) <- c("Name", "Profession")
144 | ```
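
To see what that assignment did, here is a small sketch you could run in the console. It reuses `some_vector` from the example above; calling `names()` without an assignment simply reads the names back.

```r
# Same vector as in the example above
some_vector <- c("Johnny", "Poker Player")
names(some_vector) <- c("Name", "Profession")

# Read the names back: a character vector with one name per element
names(some_vector)

# Printing the vector now shows each name above its element
some_vector
```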
145 |
146 | *** =instructions
147 | `poker_vector` has already been named with the days of the week. Do the same thing for `roulette_vector`. Beware: R is case sensitive!
148 |
149 | *** =hint
150 | Assign `c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")` to `names(roulette_vector)`.
151 |
152 | *** =pre_exercise_code
153 | ```{r}
154 | ```
155 |
156 | *** =sample_code
157 | ```{r}
158 | # Poker winnings from Monday to Friday
159 | poker_vector <- c(140, -50, 20, -120, 240)
160 |
161 | # Roulette winnings from Monday to Friday
162 | roulette_vector <- c(-24, -50, 100, -350, 10)
163 |
164 | # Add names to poker_vector
165 | names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
166 |
167 | # Add names to roulette_vector
168 |
169 | ```
170 |
171 | *** =solution
172 | ```{r}
173 | # Poker winnings from Monday to Friday
174 | poker_vector <- c(140, -50, 20, -120, 240)
175 |
176 | # Roulette winnings from Monday to Friday
177 | roulette_vector <- c(-24, -50, 100, -350, 10)
178 |
179 | # Add names to poker_vector
180 | names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
181 |
182 | # Add names to roulette_vector
183 | names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
184 | ```
185 |
186 | *** =sct
187 | ```{r}
188 | msg <- "Do not change the values inside `%s`; they were already coded for you."
189 | test_object("poker_vector", incorrect_msg = sprintf(msg, "poker_vector"))
190 | test_object("roulette_vector", incorrect_msg = sprintf(msg, "roulette_vector"))
191 | msg <- "Make sure that you assign the correct names vector to `%s`. The names of the days should start with a capital letter!"
192 | test_object("poker_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "poker_vector"))
193 | test_object("roulette_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "roulette_vector"))
194 | success_msg("Well done!")
195 | ```
196 |
197 |
198 | --- type:NormalExercise lang:r xp:100 skills:1 key:5c026ed9fb
199 | ## Naming a vector (2)
200 |
201 | If you want to become a good statistician, you have to become lazy. (If you are already lazy, chances are high you are one of those exceptional, natural-born statistical talents!)
202 |
203 | In the previous exercises you probably experienced that it is boring and frustrating to type and retype information such as the days of the week. However, there is a more efficient way to do this, namely, to assign the days of the week vector to a variable!
204 |
205 | Just like you did with your poker and roulette returns, you can also create a variable that contains the days of the week. This way you can use and re-use it. This variable, `days_vector`, has already been coded for you.
206 |
207 | *** =instructions
208 | - Use the variable `days_vector` to set the names of `poker_vector`.
209 | - Use the variable `days_vector` to set the names of `roulette_vector`.
210 |
211 | *** =hint
212 | You can use `names(poker_vector) <- ` to set the names of the variable `poker_vector`.
213 |
214 | *** =pre_exercise_code
215 | ```{r}
216 | # no pec
217 | ```
218 |
219 | *** =sample_code
220 | ```{r}
221 | # Poker winnings from Monday to Friday
222 | poker_vector <- c(140, -50, 20, -120, 240)
223 |
224 | # Roulette winnings from Monday to Friday
225 | roulette_vector <- c(-24, -50, 100, -350, 10)
226 |
227 | # Create the variable days_vector
228 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
229 |
230 | # Use days_vector to name poker_vector
231 |
232 |
233 | # Use days_vector to name roulette_vector
234 | ```
235 |
236 | *** =solution
237 | ```{r}
238 | # Poker winnings from Monday to Friday
239 | poker_vector <- c(140, -50, 20, -120, 240)
240 |
241 | # Roulette winnings from Monday to Friday
242 | roulette_vector <- c(-24, -50, 100, -350, 10)
243 |
244 | # Create the variable days_vector
245 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
246 |
247 | # Use days_vector to name poker_vector
248 | names(poker_vector) <- days_vector
249 |
250 | # Use days_vector to name roulette_vector
251 | names(roulette_vector) <- days_vector
252 | ```
253 |
254 | *** =sct
255 | ```{r}
256 | msg <- "Do not change the values inside `%s`; they were already coded for you."
257 | test_object("poker_vector", incorrect_msg = sprintf(msg, "poker_vector"))
258 | test_object("roulette_vector", incorrect_msg = sprintf(msg, "roulette_vector"))
259 | test_object("days_vector", incorrect_msg = sprintf(msg, "days_vector"))
260 |
261 | msg <- "Make sure that you assign `days_vector` to the names of `%s`. Use the `names()` function."
262 | test_object("poker_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "poker_vector"))
263 | test_object("roulette_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "roulette_vector"))
264 |
265 | success_msg("Nice one! A word of advice: try to avoid code duplication at all times.")
266 | ```
267 |
268 | --- type:VideoExercise lang:r xp:50 skills:1 key:b47466f033
269 | ## Vector Arithmetic
270 |
271 | *** =video_link
272 | //player.vimeo.com/video/141163398
273 |
274 | *** =video_hls
275 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_2.master.m3u8
276 |
277 |
278 | --- type:NormalExercise lang:r xp:100 skills:1 key:6b17fc50b9
279 | ## Calculate your earnings
280 |
281 | Now that you understand how R does arithmetic calculations with vectors, it is time to get those Ferraris in your garage! First, you need to understand what the overall profit or loss per day of the week was. The total daily profit is the sum of the profit/loss you realized on poker per day, and the profit/loss you realized on roulette per day.
282 |
283 | Remember that vector calculations happen element-wise; the following three statements are completely equivalent:
284 |
285 | ```
286 | c(1, 2, 3) + c(4, 5, 6)
287 | c(1 + 4, 2 + 5, 3 + 6)
288 | c(5, 7, 9)
289 | ```
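
If you want to convince yourself, the following sketch checks the equivalence with `identical()`. The variable names `a` and `b` are only for illustration.

```r
# Two small numeric vectors of the same length
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Element-wise addition: position 1 + position 1, position 2 + position 2, ...
a + b

# Both comparisons return TRUE
identical(a + b, c(1 + 4, 2 + 5, 3 + 6))
identical(a + b, c(5, 7, 9))
```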
290 |
291 | *** =instructions
292 | - Assign to the variable `total_daily` how much you won or lost on each day in total (poker and roulette combined). `total_daily` should be a vector with 5 values.
293 | - Print out `total_daily`.
294 |
295 | *** =hint
296 | Assign the sum of `poker_vector` and `roulette_vector` to a new variable, `total_daily`. Remember that `+` works element-wise on vectors.
297 |
298 | *** =pre_exercise_code
299 | ```{r}
300 | # no pec
301 | ```
302 |
303 | *** =sample_code
304 | ```{r}
305 | # Casino winnings from Monday to Friday
306 | poker_vector <- c(140, -50, 20, -120, 240)
307 | roulette_vector <- c(-24, -50, 100, -350, 10)
308 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
309 | names(poker_vector) <- days_vector
310 | names(roulette_vector) <- days_vector
311 |
312 | # Calculate your daily earnings: total_daily
313 |
314 |
315 | # Print out total_daily
316 | ```
317 |
318 | *** =solution
319 | ```{r}
320 | # Casino winnings from Monday to Friday
321 | poker_vector <- c(140, -50, 20, -120, 240)
322 | roulette_vector <- c(-24, -50, 100, -350, 10)
323 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
324 | names(poker_vector) <- days_vector
325 | names(roulette_vector) <- days_vector
326 |
327 | # Calculate your daily earnings: total_daily
328 | total_daily <- poker_vector + roulette_vector
329 |
330 | # Print out total_daily
331 | total_daily
332 | ```
333 |
334 | *** =sct
335 | ```{r}
336 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
337 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
338 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
339 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
340 | test_object("total_daily",
341 | incorrect_msg = "Make sure that you assign the sum of `poker_vector` and `roulette_vector` to `total_daily`. Simply use `+`.")
342 | test_output_contains("total_daily", incorrect_msg = "Don't forget to print out `total_daily`.")
343 | success_msg("Great! Continue to the next exercise.")
344 | ```
345 |
346 |
347 | --- type:NormalExercise lang:r xp:100 skills:1 key:a9a1a50a31
348 | ## Calculate total winnings: sum()
349 |
350 | Based on the previous analysis, it looks like you had a mix of good and bad days. This is not what your ego expected, and you wonder whether there is a (very, very, very) tiny chance that you lost money over the week in total.
351 |
352 | You can answer this question using the [`sum()`](http://www.rdocumentation.org/packages/base/functions/sum) function. As mentioned in the video, it calculates the sum of all elements of a vector.
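
As a minimal sketch with made-up numbers (the vectors `a` and `b` are hypothetical and not part of the exercise), `sum()` collapses a whole vector into a single value, and adding two such totals gives the same result as summing the element-wise sum:

```r
# Two small illustrative vectors
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Sum of all elements of a single vector: 1 + 2 + 3
sum(a)

# Grand total over both vectors, computed in two ways
sum(a) + sum(b)
sum(a + b)
```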
353 |
354 | *** =instructions
355 | - Calculate the total amount of money that you have won/lost with poker and assign it to the variable `total_poker`.
356 | - Do the same thing for roulette and assign the result to `total_roulette`.
357 | - Use `+` to add `total_poker` and `total_roulette`; the result is the sum of all gains and losses of the week. Print this result to the console.
358 |
359 | *** =hint
360 | Use the [`sum()`](http://www.rdocumentation.org/packages/base/functions/sum) function to get the total of the `poker_vector`. Do the same thing for `roulette_vector`.
361 |
362 | *** =pre_exercise_code
363 | ```{r}
364 | # no pec
365 | ```
366 |
367 | *** =sample_code
368 | ```{r}
369 | # Casino winnings from Monday to Friday
370 | poker_vector <- c(140, -50, 20, -120, 240)
371 | roulette_vector <- c(-24, -50, 100, -350, 10)
372 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
373 | names(poker_vector) <- days_vector
374 | names(roulette_vector) <- days_vector
375 |
376 | # Total winnings with poker: total_poker
377 |
378 |
379 | # Total winnings with roulette: total_roulette
380 |
381 |
382 | # Total winnings overall: print out the result
383 |
384 | ```
385 |
386 | *** =solution
387 | ```{r}
388 | # Casino winnings from Monday to Friday
389 | poker_vector <- c(140, -50, 20, -120, 240)
390 | roulette_vector <- c(-24, -50, 100, -350, 10)
391 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
392 | names(poker_vector) <- days_vector
393 | names(roulette_vector) <- days_vector
394 |
395 | # Total winnings with poker: total_poker
396 | total_poker <- sum(poker_vector)
397 |
398 | # Total winnings with roulette: total_roulette
399 | total_roulette <- sum(roulette_vector)
400 |
401 | # Total winnings overall: print out the result
402 | total_roulette + total_poker
403 | ```
404 |
405 | *** =sct
406 | ```{r}
407 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
408 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
409 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
410 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
411 | test_object("total_poker",
412 | undefined_msg = "Please make sure to define a variable `total_poker`.",
413 | incorrect_msg = "Make sure that you assign to `total_poker` the sum of the `poker_vector`.")
414 | test_object("total_roulette",
415 | undefined_msg = "Please make sure to define a variable `total_roulette`.",
416 | incorrect_msg = "Make sure that you assign to `total_roulette` the sum of the `roulette_vector`.")
417 | test_output_contains("total_poker + total_roulette", incorrect_msg = "Print the sum of `total_poker` and `total_roulette` to the console.")
418 | success_msg("Oops, it seems like you are losing money. Time to rethink and adapt your strategy! This will require some deeper analysis...")
419 | ```
420 |
421 |
422 | --- type:VideoExercise lang:r xp:50 skills:1 key:513029f4ac
423 | ## Vector Subsetting
424 |
425 | *** =video_link
426 | //player.vimeo.com/video/138173916
427 |
428 | *** =video_hls
429 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_3.master.m3u8
430 |
431 |
432 | --- type:NormalExercise lang:r xp:100 skills:1 key:6112e74425
433 | ## Selection by index (1)
434 |
435 | After figuring out that roulette is not your forte, you decide to compare your performance at the beginning of the week to your performance at the end of the week. You did have a couple of Margarita cocktails at the end of the week...
436 |
437 | To make that comparison, you only want to focus on a selection of a vector such as `poker_vector` or `roulette_vector`. In other words, your goal is to select specific elements of the vector.
438 |
439 | *** =instructions
440 | - Assign the poker results of Wednesday to the variable `poker_wednesday`.
441 | - Assign the roulette results of Friday to the variable `roulette_friday`.
442 |
443 | *** =hint
444 | Wednesday is the third element of `poker_vector`, and can thus be selected with `poker_vector[3]`.
445 |
446 | *** =pre_exercise_code
447 | ```{r}
448 | # no pec
449 | ```
450 |
451 | *** =sample_code
452 | ```{r}
453 | # Casino winnings from Monday to Friday
454 | poker_vector <- c(140, -50, 20, -120, 240)
455 | roulette_vector <- c(-24, -50, 100, -350, 10)
456 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
457 | names(poker_vector) <- days_vector
458 | names(roulette_vector) <- days_vector
459 |
460 | # Poker results of Wednesday: poker_wednesday
461 |
462 |
463 | # Roulette results of Friday: roulette_friday
464 |
465 | ```
466 |
467 | *** =solution
468 | ```{r}
469 | # Casino winnings from Monday to Friday
470 | poker_vector <- c(140, -50, 20, -120, 240)
471 | roulette_vector <- c(-24, -50, 100, -350, 10)
472 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
473 | names(poker_vector) <- days_vector
474 | names(roulette_vector) <- days_vector
475 |
476 | # Poker results of Wednesday: poker_wednesday
477 | poker_wednesday <- poker_vector[3]
478 |
479 | # Roulette results of Friday: roulette_friday
480 | roulette_friday <- roulette_vector[5]
481 | ```
482 |
483 | *** =sct
484 | ```{r}
485 |
486 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
487 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
488 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
489 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
490 | test_object("poker_wednesday",
491 | incorrect_msg = "It looks like `poker_wednesday` does not contain the correct value of `poker_vector`.")
492 | test_object("roulette_friday",
493 | incorrect_msg = "It looks like `roulette_friday` does not contain the correct value of `roulette_vector`.")
494 | success_msg("Great! R also makes it possible to select multiple elements from a vector at once, remember? Put the theory to practice in the next exercise!")
495 | ```
496 |
497 |
498 | --- type:NormalExercise lang:r xp:100 skills:1 key:ae2832fbd1
499 | ## Selection by index (2)
500 |
501 | How about analyzing your midweek results?
502 |
503 | Instead of using a single number to select a single element, you can also select multiple elements by passing a vector inside the square brackets. For example,
504 |
505 | ```
506 | poker_vector[c(1,5)]
507 | ```
508 |
509 | selects the first and the fifth element of `poker_vector`.
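
When the positions you need are consecutive, the colon operator is a shorter way to build the index vector. A quick sketch, using the same values as `poker_vector` in the script and the first three days as an example:

```r
# Poker winnings from Monday to Friday (same values as in the script)
poker_vector <- c(140, -50, 20, -120, 240)

# 1:3 generates the sequence 1, 2, 3
1:3

# Both lines select the first three elements
poker_vector[c(1, 2, 3)]
poker_vector[1:3]
```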
510 |
511 |
512 | *** =instructions
513 | - Assign the poker results of Tuesday, Wednesday and Thursday to the variable `poker_midweek`.
514 | - Assign the roulette results of Thursday and Friday to the variable `roulette_endweek`.
515 |
516 | *** =hint
517 | Use the vector `c(2,3,4)` between square brackets to select the correct elements of `poker_vector`.
518 |
519 | *** =pre_exercise_code
520 | ```{r}
521 | # no pec
522 | ```
523 |
524 | *** =sample_code
525 | ```{r}
526 | # Casino winnings from Monday to Friday
527 | poker_vector <- c(140, -50, 20, -120, 240)
528 | roulette_vector <- c(-24, -50, 100, -350, 10)
529 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
530 | names(poker_vector) <- days_vector
531 | names(roulette_vector) <- days_vector
532 |
533 | # Mid-week poker results: poker_midweek
534 |
535 |
536 | # End-of-week roulette results: roulette_endweek
537 |
538 |
539 | ```
540 |
541 | *** =solution
542 | ```{r}
543 | # Casino winnings from Monday to Friday
544 | poker_vector <- c(140, -50, 20, -120, 240)
545 | roulette_vector <- c(-24, -50, 100, -350, 10)
546 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
547 | names(poker_vector) <- days_vector
548 | names(roulette_vector) <- days_vector
549 |
550 | # Mid-week poker results: poker_midweek
551 | poker_midweek <- poker_vector[c(2, 3, 4)]
552 |
553 | # End-of-week roulette results: roulette_endweek
554 | roulette_endweek <- roulette_vector[c(4,5)]
555 | ```
556 |
557 | *** =sct
558 | ```{r}
559 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
560 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
561 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
562 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
563 |
564 | msg <- "It looks like `%s` does not contain the correct elements from `%s`."
565 | test_object("poker_midweek",
566 | incorrect_msg = sprintf(msg, "poker_midweek", "poker_vector"))
567 | test_object("roulette_endweek",
568 | incorrect_msg = sprintf(msg, "roulette_endweek", "roulette_vector"))
569 |
570 | success_msg("Well done! Another way to find the mid-week results is `poker_vector[2:4]`. Continue to the next exercise to specialize in vector selection some more!")
571 | ```
572 |
573 | --- type:NormalExercise lang:r xp:100 skills:1 key:5919f3fc05
574 | ## Selection by name
575 |
576 | Another way to tackle the previous exercise is by using the names of the vector elements (Monday, Tuesday, ...) instead of their numeric positions. For example,
577 |
578 | ```
579 | poker_vector["Monday"]
580 | ```
581 |
582 | will select the first element of `poker_vector` since `"Monday"` is the name of that first element.
583 |
584 | *** =instructions
585 | - Select the fourth element, corresponding to Thursday, from `roulette_vector`. Name it `roulette_thursday`.
586 | - Select Tuesday's poker gains using subsetting by name. Assign the result to `poker_tuesday`.
587 |
588 | *** =hint
589 | You can select Thursday's roulette result with `roulette_vector["Thursday"]`. Use the same approach with `"Tuesday"` on `poker_vector`.
590 |
591 | *** =pre_exercise_code
592 | ```{r}
593 | # no pec
594 | ```
595 |
596 | *** =sample_code
597 | ```{r}
598 | # Casino winnings from Monday to Friday
599 | poker_vector <- c(140, -50, 20, -120, 240)
600 | roulette_vector <- c(-24, -50, 100, -350, 10)
601 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
602 | names(poker_vector) <- days_vector
603 | names(roulette_vector) <- days_vector
604 |
605 | # Select Thursday's roulette gains: roulette_thursday
606 |
607 |
608 | # Select Tuesday's poker gains: poker_tuesday
609 |
610 | ```
611 |
612 | *** =solution
613 | ```{r}
614 | # Casino winnings from Monday to Friday
615 | poker_vector <- c(140, -50, 20, -120, 240)
616 | roulette_vector <- c(-24, -50, 100, -350, 10)
617 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
618 | names(poker_vector) <- days_vector
619 | names(roulette_vector) <- days_vector
620 |
621 | # Select Thursday's roulette gains: roulette_thursday
622 | roulette_thursday <- roulette_vector["Thursday"]
623 |
624 | # Select Tuesday's poker gains: poker_tuesday
625 | poker_tuesday <- poker_vector["Tuesday"]
626 | ```
627 |
628 | *** =sct
629 | ```{r}
630 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
631 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
632 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
633 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
634 |
635 | test_object("roulette_thursday")
636 | test_object("poker_tuesday")
637 | success_msg("Good job! Head over to the next exercise.")
638 | ```
639 |
640 | --- type:NormalExercise lang:r xp:100 skills:1 key:22121c6c46
641 | ## Selection by logicals (1)
642 |
643 | There are basically three ways to subset vectors: by using the indices, by using the names (if the vectors are named) and by using logical vectors. Filip already told you about the internals in the instructional video. As a refresher, have a look at the following statements to select elements from `poker_vector`, which are all equivalent:
644 |
645 | ```
646 | # selection by index
647 | poker_vector[c(1,3)]
648 |
649 | # selection by name
650 | poker_vector[c("Monday", "Wednesday")]
651 |
652 | # selection by logicals
653 | poker_vector[c(TRUE, FALSE, TRUE, FALSE, FALSE)]
654 | ```
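
The sketch below checks that claim with `identical()`. It rebuilds `poker_vector` as in the script; the names `by_index`, `by_name` and `by_logical` are only for illustration.

```r
# Poker winnings from Monday to Friday, named by day
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Three ways to select Monday and Wednesday
by_index   <- poker_vector[c(1, 3)]
by_name    <- poker_vector[c("Monday", "Wednesday")]
by_logical <- poker_vector[c(TRUE, FALSE, TRUE, FALSE, FALSE)]

# Both comparisons return TRUE: same values and same names
identical(by_index, by_name)
identical(by_index, by_logical)
```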
655 |
656 | *** =instructions
657 | - Assign the roulette results from the first, third and fifth day to `roulette_subset`.
658 | - Select the first three days from `poker_vector` using a vector of logicals. Assign the result to `poker_start`.
659 |
660 | *** =hint
661 | The logical vector to use inside square brackets for the first instruction is `c(TRUE, FALSE, TRUE, FALSE, TRUE)`.
662 |
663 | *** =pre_exercise_code
664 | ```{r}
665 | # no pec
666 | ```
667 |
668 | *** =sample_code
669 | ```{r}
670 | # Casino winnings from Monday to Friday
671 | poker_vector <- c(140, -50, 20, -120, 240)
672 | roulette_vector <- c(-24, -50, 100, -350, 10)
673 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
674 | names(poker_vector) <- days_vector
675 | names(roulette_vector) <- days_vector
676 |
677 | # Roulette results for day 1, 3 and 5: roulette_subset
678 |
679 |
680 | # Poker results for first three days: poker_start
681 | ```
682 |
683 | *** =solution
684 | ```{r}
685 | # Casino winnings from Monday to Friday
686 | poker_vector <- c(140, -50, 20, -120, 240)
687 | roulette_vector <- c(-24, -50, 100, -350, 10)
688 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
689 | names(poker_vector) <- days_vector
690 | names(roulette_vector) <- days_vector
691 |
692 | # Roulette results for day 1, 3 and 5: roulette_subset
693 | roulette_subset <- roulette_vector[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
694 |
695 | # Poker results for first three days: poker_start
696 | poker_start <- poker_vector[c(TRUE, TRUE, TRUE, FALSE, FALSE)]
697 | ```
698 |
699 | *** =sct
700 | ```{r}
701 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
702 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
703 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
704 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
705 | test_object("roulette_subset")
706 | test_object("poker_start")
707 | success_msg("Nice one! Using logical vectors to perform subsetting might seem somewhat tedious, but its true power will become clear in the next exercise!")
708 | ```
709 |
710 |
711 | --- type:NormalExercise lang:r xp:100 skills:1 key:aa2e5f6e97
712 | ## Selection by logicals (2)
713 |
714 | By combining comparison operators with subsetting by logicals, you can investigate your casino performance in a more proactive way.
715 |
716 | The (logical) comparison operators known to R are:
717 | - `<` for less than
718 | - `>` for greater than
719 | - `<=` for less than or equal to
720 | - `>=` for greater than or equal to
721 | - `==` for equal to each other
722 | - `!=` for not equal to each other
723 |
724 | Experiment with these operators in the console:
725 |
726 | ```
727 | lost_roulette_days <- roulette_vector < 0
728 | lost_roulette_days
729 | ```
730 |
731 | The result will be a logical vector, which you can use to perform subsetting, like this example:
732 |
733 | ```
734 | roulette_vector[lost_roulette_days]
735 | ```
736 |
737 | The result is a subset of `roulette_vector` that contains only your losses in roulette.
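
The two steps can also be written as a single expression, with the comparison placed directly inside the square brackets. A minimal sketch on the roulette data (the variable name `losses` is only for illustration):

```r
# Roulette winnings from Monday to Friday, named by day
roulette_vector <- c(-24, -50, 100, -350, 10)
names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# The comparison is evaluated first and yields one TRUE/FALSE per day
roulette_vector < 0

# Using that logical vector as the index keeps only the TRUE positions
losses <- roulette_vector[roulette_vector < 0]
losses
```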
738 |
739 | *** =instructions
740 | - Check if your poker winnings are positive on the different days of the week (i.e. > 0), and assign this to `selection_vector`.
741 | - Assign the amounts that you won on the profitable days to the variable `poker_profits` by using `selection_vector`.
742 |
743 | *** =hint
744 | - In order to check for which days your poker gains are positive, R should check for each element of `poker_vector` whether it is larger than zero. `some_vector > 0` is the way to tell R what you are after.
745 | - After creating `selection_vector`, you can use it to subset `poker_vector` like this: `poker_vector[selection_vector]`.
746 |
747 | *** =pre_exercise_code
748 | ```{r}
749 | # no pec
750 | ```
751 |
752 | *** =sample_code
753 | ```{r}
754 | # Casino winnings from Monday to Friday
755 | poker_vector <- c(140, -50, 20, -120, 240)
756 | roulette_vector <- c(-24, -50, 100, -350, 10)
757 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
758 | names(poker_vector) <- days_vector
759 | names(roulette_vector) <- days_vector
760 |
761 | # Create logical vector corresponding to profitable poker days: selection_vector
762 |
763 |
764 | # Select amounts for profitable poker days: poker_profits
765 |
766 | ```
767 |
768 | *** =solution
769 | ```{r}
770 | # Casino winnings from Monday to Friday
771 | poker_vector <- c(140, -50, 20, -120, 240)
772 | roulette_vector <- c(-24, -50, 100, -350, 10)
773 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
774 | names(poker_vector) <- days_vector
775 | names(roulette_vector) <- days_vector
776 |
777 | # Create logical vector corresponding to profitable poker days: selection_vector
778 | selection_vector <- poker_vector > 0
779 |
780 | # Select amounts for profitable poker days: poker_profits
781 | poker_profits <- poker_vector[selection_vector]
782 | ```
783 |
784 | *** =sct
785 | ```{r}
786 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
787 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
788 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
789 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
790 | test_object("selection_vector",
791 | undefined_msg = "Please make sure to define a variable `selection_vector`.",
792 | incorrect_msg = "It looks like `selection_vector` does not contain the correct result. Remember that R uses element-wise operations for vectors.")
793 | test_object("poker_profits",
794 | undefined_msg = "Please make sure to define a variable `poker_profits`.",
795 | incorrect_msg = "It looks like `poker_profits` does not contain the correct result. Use `selection_vector` inside square brackets to subset `poker_vector`.")
796 | success_msg("Great! Move on to the Matrices chapter!")
797 | ```
798 |
799 |
--------------------------------------------------------------------------------
/chapter3.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title_meta : Chapter 3
3 | title : Matrices
4 | description : In this chapter you will learn how to work with matrices in R. By the end of the chapter, you will be able to create matrices and to understand how you can do basic computations with them. You will analyze the box office numbers of Star Wars to illustrate the use of matrices in R. May the force be with you!
5 | attachments :
6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch3_slides.pdf
7 |
8 | --- type:VideoExercise lang:r xp:50 skills:1 key:82d8734b17
9 | ## Create and Name Matrices
10 |
11 | *** =video_link
12 | //player.vimeo.com/video/138173926
13 |
14 | *** =video_hls
15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch3_1.master.m3u8
16 |
17 |
18 | --- type:NormalExercise lang:r xp:100 skills:1 key:834a0e546c
19 | ## Analyzing matrices, you shall (1)
20 |
21 | It is now time to get your hands dirty. In the following exercises you will analyze the box office numbers of the Star Wars franchise. May the force be with you!
22 |
23 | As a reminder, look at this line of code that constructs a matrix with numbers 1 through 9, filled row by row:
24 |
25 | ```
26 | matrix(1:9, byrow = TRUE, nrow = 3)
27 | ```
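
To see what `byrow` changes, the following sketch builds the same nine numbers both ways. The names `by_row` and `by_col` are only for illustration.

```r
# Filled row by row: the first row is 1, 2, 3
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
by_row

# Filled column by column (the default): the first column is 1, 2, 3
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)
by_col
```

Both matrices have three rows and three columns; only the order in which the values are placed differs.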
28 |
29 | In the script, a vector `box` is defined that represents the box office numbers from the first three Star Wars movies. The first, third and fifth elements correspond to the US box office revenue for the movies; the second, fourth and sixth elements represent the non-US box office revenue.
30 |
31 | *** =instructions
32 | Construct a matrix `star_wars_matrix`:
33 |
34 | - Each row represents a movie.
35 | - The first column is for the US box office revenue, and the second column for the non-US box office revenue.
36 | - Use the function `matrix()` with `box` as the first input, and the additional arguments `nrow` and `byrow`.
37 |
38 | *** =hint
39 | Set `nrow` to `3` and `byrow` to `TRUE` inside `matrix()`.
40 |
41 | *** =pre_exercise_code
42 | ```{r}
43 | # no pec
44 | ```
45 |
46 | *** =sample_code
47 | ```{r}
48 | # Star Wars box office in millions (!)
49 | box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
50 |
51 | # Create star_wars_matrix
52 |
53 | ```
54 |
55 | *** =solution
56 | ```{r}
57 | # Star Wars box office in millions (!)
58 | box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
59 |
60 | # Create star_wars_matrix
61 | star_wars_matrix <- matrix(box, nrow = 3, byrow = TRUE)
62 | ```
63 |
64 | *** =sct
65 | ```{r}
66 | test_error()
67 | msg <- "Do not change or remove the definition of `box`!"
68 | test_object("box", undefined_msg = msg, incorrect_msg = msg)
69 |
70 | test_correct({
71 | test_object("star_wars_matrix",
72 | undefined_msg = "Please make sure to define a variable `star_wars_matrix`.",
73 | incorrect_msg = "Did you assign the correct matrix containing the vector that holds all three movies to `star_wars_matrix`?")
74 | }, {
75 | test_function("matrix", "data")
76 | test_function("matrix", "nrow")
77 | test_function("matrix", "byrow")
78 | })
79 | success_msg("Great job!")
80 | ```
81 |
82 |
83 | --- type:NormalExercise lang:r xp:100 skills:1 key:0dfb4c5e70
84 | ## Analyzing matrices, you shall (2)
85 |
86 | This time, the box office numbers for the three Star Wars movies are stored in three separate vectors instead of a single one. Remember the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function, which pastes vectors together as if they were rows of a matrix? Take this example, which combines two vectors into a two-row matrix:
87 |
88 | ```
89 | a <- c(1, 2, 3)
90 | b <- c(4, 5, 6)
91 | rbind(a, b)
92 | ```
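The result has one row per input vector, and by default the row names are taken from the variable names. A quick check, reusing `a` and `b` from the example (`m` is an illustrative name):

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
m <- rbind(a, b)

dim(m)       # 2 rows, 3 columns
rownames(m)  # "a" "b"
```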
93 |
94 | Try a similar thing on the astronomical Star Wars numbers!
95 |
96 | *** =instructions
97 | Again, construct the matrix `star_wars_matrix` with one row for each movie.
98 |
99 | *** =hint
100 | Simply pass the three vectors to the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function.
101 |
102 | *** =pre_exercise_code
103 | ```{r}
104 | # no pec
105 | ```
106 |
107 | *** =sample_code
108 | ```{r}
109 | # Star Wars box office in millions (!)
110 | new_hope <- c(460.998, 314.4)
111 | empire_strikes <- c(290.475, 247.900)
112 | return_jedi <- c(309.306, 165.8)
113 |
114 | # Create star_wars_matrix
115 |
116 | ```
117 |
118 | *** =solution
119 | ```{r}
120 | # Star Wars box office in millions (!)
121 | new_hope <- c(460.998, 314.4)
122 | empire_strikes <- c(290.475, 247.900)
123 | return_jedi <- c(309.306, 165.8)
124 |
125 | # Create star_wars_matrix
126 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
127 | ```
128 |
129 | *** =sct
130 | ```{r}
131 | test_error()
132 | msg = "Do not change anything about the box office variables `new_hope`, `empire_strikes` and `return_jedi`!"
133 | test_object("new_hope", undefined_msg = msg, incorrect_msg = msg)
134 | test_object("empire_strikes", undefined_msg = msg, incorrect_msg = msg)
135 | test_object("return_jedi", undefined_msg = msg, incorrect_msg = msg)
136 |
137 | test_correct({
138 | test_object("star_wars_matrix",
139 | incorrect_msg = "Did you assign the correct matrix containing the vector that holds all three movies to `star_wars_matrix`?")
140 | }, {
141 | test_function("rbind", not_called_msg = "You should use the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function to create the matrix.")
142 | })
143 | success_msg("The force is actually with you! Continue to the next exercise.")
144 | ```
145 |
146 |
147 | --- type:NormalExercise lang:r xp:100 skills:1 key:ca3dbb8a9f
148 | ## Naming a matrix
149 |
150 | To help you remember what is stored in `star_wars_matrix`, you would like to add the names of the movies for the rows. Not only does this help you to read the data, but it is also useful to select certain elements from the matrix later.
151 |
152 | Similar to vectors, you can add names for the rows and the columns of a matrix:
153 |
154 | ```
155 | rownames(my_matrix) <- row_names_vector
156 | colnames(my_matrix) <- col_names_vector
157 | ```
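As a small sketch of why names help with selection, the hypothetical `scores` matrix below is named and then indexed by name rather than by position. None of these names come from the exercise.

```r
# Hypothetical 2-by-2 matrix, filled row by row
scores <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)
rownames(scores) <- c("first", "second")
colnames(scores) <- c("left", "right")

# Select by name: row "second", column "left"
scores["second", "left"]  # 30
```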
158 |
159 | *** =instructions
160 | - Two vectors containing the row names and column names have been created for you: `movie_names` and `col_titles`.
161 | - Name the rows of `star_wars_matrix` with `movie_names`.
162 | - Name the columns of `star_wars_matrix` with the pre-defined `col_titles`.
163 |
164 | *** =hint
165 | To name the rows, start with `rownames(star_wars_matrix) <-`; can you finish the command?
166 |
167 | *** =pre_exercise_code
168 | ```{r}
169 | # no pec
170 | ```
171 |
172 | *** =sample_code
173 | ```{r}
174 | # Star Wars box office in millions (!)
175 | new_hope <- c(460.998, 314.4)
176 | empire_strikes <- c(290.475, 247.900)
177 | return_jedi <- c(309.306, 165.8)
178 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
179 |
180 | # Vectors with the row and column names: movie_names and col_titles
181 | movie_names <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
182 | col_titles <- c("US", "non-US")
183 |
184 | # Add row names to star_wars_matrix
185 |
186 |
187 | # Add column names to star_wars_matrix
188 |
189 | ```
190 |
191 | *** =solution
192 | ```{r}
193 | # Star Wars box office in millions (!)
194 | new_hope <- c(460.998, 314.4)
195 | empire_strikes <- c(290.475, 247.900)
196 | return_jedi <- c(309.306, 165.8)
197 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
198 |
199 | # Vectors with the row and column names: movie_names and col_titles
200 | movie_names <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
201 | col_titles <- c("US", "non-US")
202 |
203 | # Add row names to star_wars_matrix
204 | rownames(star_wars_matrix) <- movie_names
205 |
206 | # Add column names to star_wars_matrix
207 | colnames(star_wars_matrix) <- col_titles
208 | ```
209 |
210 | *** =sct
211 | ```{r}
212 | msg = "Do not change anything about the box office variables `new_hope`, `empire_strikes` and `return_jedi`!"
213 | lapply(c("new_hope", "empire_strikes", "return_jedi"), test_object, undefined_msg = msg, incorrect_msg = msg)
214 | msg <- "Do not change anything about the creation of `star_wars_matrix`."
215 | test_object("star_wars_matrix", undefined_msg = msg, incorrect_msg = msg)
216 | msg <- paste("Do not change or remove the vectors `movie_names` and `col_titles`;",
217 | "you can use them to name the columns and rows of `star_wars_matrix`.")
218 | lapply(c("movie_names", "col_titles"), test_object, undefined_msg = msg, incorrect_msg = msg)
219 | test_object("star_wars_matrix", eq_condition = "equal",
220 | incorrect_msg = paste("Did you set the row and column names of `star_wars_matrix` correctly?",
221 | "Have another look at your code."))
222 | success_msg("Great! You're on your way to becoming an R Jedi!")
223 | ```
224 |
225 |
226 | --- type:NormalExercise lang:r xp:100 skills:1 key:3b60b1a49a
227 | ## Calculating the worldwide box office
228 |
229 | The single most important thing for a movie in order to become an instant legend in Tinseltown is its worldwide box office figures.
230 |
231 | To calculate the total box office revenue for the three Star Wars movies, you have to take the sum of the US revenue column and the non-US revenue column.
232 |
233 | In R, the function [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) conveniently calculates the totals for each row of a matrix. This function creates a new vector:
234 |
235 | ```
236 | sum_of_rows_vector <- rowSums(my_matrix)
237 | ```
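A minimal sketch on made-up numbers shows the shape of the result. The `sales` matrix and its row names are hypothetical; `colSums()`, documented on the same help page, does the same per column.

```r
# Hypothetical figures: one row per product, one column per region
sales <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
rownames(sales) <- c("p1", "p2", "p3")

rowSums(sales)  # p1 = 3, p2 = 7, p3 = 11
colSums(sales)  # 9 and 12
```

The vector returned by `rowSums()` keeps the row names of the matrix, so each total stays labeled.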
238 |
239 | *** =instructions
240 | Calculate the worldwide box office figures for the three movies and put these in the vector named `worldwide_vector`.
241 |
242 | *** =hint
243 | The [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function calculates the total box office for each row of `star_wars_matrix` if you pass `star_wars_matrix` as its argument, between the parentheses.
244 |
245 | *** =pre_exercise_code
246 | ```{r}
247 | # no pec
248 | ```
249 |
250 | *** =sample_code
251 | ```{r}
252 | # Star Wars box office in millions (!)
253 | new_hope <- c(460.998, 314.4)
254 | empire_strikes <- c(290.475, 247.900)
255 | return_jedi <- c(309.306, 165.8)
256 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
257 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
258 | colnames(star_wars_matrix) <- c("US", "non-US")
259 |
260 | # Calculate the worldwide box office: worldwide_vector
261 |
262 | ```
263 |
264 | *** =solution
265 | ```{r}
266 | # Star Wars box office in millions (!)
267 | new_hope <- c(460.998, 314.4)
268 | empire_strikes <- c(290.475, 247.900)
269 | return_jedi <- c(309.306, 165.8)
270 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
271 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
272 | colnames(star_wars_matrix) <- c("US", "non-US")
273 |
274 | # Calculate the worldwide box office: worldwide_vector
275 | worldwide_vector <- rowSums(star_wars_matrix)
276 | ```
277 |
278 | *** =sct
279 | ```{r}
280 | msg = "Do not change anything about the construction and naming of `star_wars_matrix`!"
281 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
282 |
283 | test_function("rowSums", "x",
284 | not_called_msg = "Have you used the function `rowSums()
left
272 | ```
273 |
274 | *** =sct
275 | ```{r}
276 | msg = "Do not change anything about the first lines that define `survey_vector` and `survey_factor`."
277 | test_object("survey_vector", undefined_msg = msg, incorrect_msg = msg)
278 | test_object("survey_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
279 | test_output_contains("survey_factor[1] > survey_factor[2]", incorrect_msg = "Make sure to correctly perform the comparison between right and left; we want the battle of dexterity to be settled once and for all!")
280 | success_msg("Phew, it seems that R is neutral ;-).")
281 | ```
282 |
283 |
284 | --- type:NormalExercise lang:r xp:100 skills:1 key:d65e71390a
285 | ## Ordered factors
286 |
287 | Sometimes you will deal with factors that do have a natural ordering between their categories. In this case, you have to tell R about it.
288 |
289 | Suppose you're leading a research team of five data analysts and that you want to evaluate their performance. To do this, you track their speed, evaluate each analyst as `"Slow"`, `"OK"` or `"Fast"`, and save the results in `speed_vector`.
290 |
291 | `speed_vector` should be converted to an ordinal factor since its categories have a natural ordering. You already know how to do this. Here's a template to define an ordered factor once more:
292 |
293 | ```
294 | factor(some_vector, ordered = TRUE, levels = c("Level_1", "Level_2", ...))
295 | ```
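As an illustration of the template, the sketch below applies it to a made-up vector. The names `temp_vector` and `temp_factor` and the level labels are assumptions for this example only.

```r
# Hypothetical temperature readings
temp_vector <- c("High", "Low", "Medium", "Low")
temp_factor <- factor(temp_vector, ordered = TRUE,
                      levels = c("Low", "Medium", "High"))

levels(temp_factor)      # "Low" "Medium" "High"
is.ordered(temp_factor)  # TRUE
```

The order of the strings passed to `levels` defines the ranking, from lowest to highest.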
296 |
297 | *** =instructions
298 | - Use the example code above to define `speed_factor`, that contains the speed information as an ordered factor. You can start from `speed_vector`, which is already created for you.
299 | - Print `speed_factor` to the console.
300 | - Use [`summary()`](http://www.rdocumentation.org/packages/base/functions/summary) to generate a summary of `speed_factor`: automagically, R prints the factor levels in the right order.
301 |
302 | *** =hint
303 | - Use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) to create `speed_factor` based on `speed_vector`.
304 | - The argument `ordered` should be set to `TRUE` since there is a natural ordering.
305 | - The argument `levels` should be equal to `c("Slow", "OK", "Fast")`.
306 |
307 | *** =pre_exercise_code
308 | ```{r}
309 | # no pec
310 | ```
311 |
312 | *** =sample_code
313 | ```{r}
314 | # Create speed_vector
315 | speed_vector <- c("OK", "Slow", "Slow", "OK", "Fast")
316 |
317 | # Convert speed_vector to ordered speed_factor
318 |
319 |
320 | # Print speed_factor
321 |
322 |
323 | # Summarize speed_factor
324 |
325 | ```
326 |
327 | *** =solution
328 | ```{r}
329 | # Create speed_vector
330 | speed_vector <- c("OK", "Slow", "Slow", "OK", "Fast")
331 |
332 | # Convert speed_vector to ordered speed_factor
333 | speed_factor <- factor(speed_vector, ordered = TRUE, levels = c("Slow", "OK", "Fast"))
334 |
335 | # Print speed_factor
336 | speed_factor
337 |
338 | # Summarize speed_factor
339 | summary(speed_factor)
340 | ```
341 |
342 | *** =sct
343 | ```{r}
344 | test_error()
345 | msg <- "Do not change anything about the command that defines `speed_vector`."
346 | test_object("speed_vector", undefined_msg = msg, incorrect_msg = msg)
347 | test_correct({
348 | test_object("speed_factor", eq_condition = "equal",
349 | incorrect_msg = "Make sure that you assigned the correct factor to `speed_factor`. Pay attention to the correct order of the `levels` argument.")
350 | },{
351 | test_function("factor", "x")
352 | test_function("factor", "levels")
353 | test_function("factor", "ordered")
354 | })
355 | test_output_contains("summary(speed_factor)", incorrect_msg = "Don't forget to summarise `speed_factor`. Use [`summary()`](http://www.rdocumentation.org/packages/base/functions/summary).")
356 | success_msg("Great! Have a look at the console. The `<` signs between the levels indicate that they now have an order. Continue to the next exercise.")
358 | ```
359 |
360 |
361 | --- type:NormalExercise lang:r xp:100 skills:1 key:e23011c42b
362 | ## Comparing ordered factors
363 |
364 | 'Data analyst number two' is having a bad day at work. He enters your office and starts complaining that 'data analyst number five' is slowing down the entire project. Since you know that 'data analyst number two' has the reputation of being a smarty-pants, you first decide to check if his statement is true.
365 |
366 | The fact that `speed_factor` is now ordered enables us to compare different elements (the data analysts in this case). You can simply do this by using a well-known operator: `>`.
367 |
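A minimal sketch of such a comparison, using a hypothetical ordered factor `sizes` rather than the exercise data:

```r
# Hypothetical ordered factor of T-shirt sizes
sizes <- factor(c("M", "S", "L"), ordered = TRUE, levels = c("S", "M", "L"))

sizes[1] > sizes[2]  # TRUE: "M" ranks above "S"
sizes[1] > sizes[3]  # FALSE: "M" ranks below "L"
```

The comparison follows the order given in `levels`, not the alphabetical order of the labels.
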
368 | *** =instructions
369 | Check whether data analyst 2 is faster than data analyst 5. Simply print out the result, which should be a logical.
370 |
371 | *** =hint
372 | `vector[1] > vector[2]` checks whether the first element of vector is larger than the second element.
373 |
374 | *** =pre_exercise_code
375 | ```{r}
376 | # no pec
377 | ```
378 |
379 | *** =sample_code
380 | ```{r}
381 | # Definition of speed_factor
382 | speed_factor <- factor(c("Fast", "Slow", "Slow", "Fast", "Ultra-fast"),
383 | ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
384 |
385 | # Compare data analyst 2 with data analyst 5
386 |
387 | ```
388 |
389 | *** =solution
390 | ```{r}
391 | # Definition of speed_factor
392 | speed_factor <- factor(c("Fast", "Slow", "Slow", "Fast", "Ultra-fast"),
393 | ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
394 |
395 | # Compare data analyst 2 with data analyst 5
396 | speed_factor[2] > speed_factor[5]
397 | ```
398 |
399 | *** =sct
400 | ```{r}
401 | msg <- "Do not change anything about the command that defines `speed_factor`!"
402 | test_object("speed_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
403 | test_output_contains("speed_factor[2] > speed_factor[5]",
404 | incorrect_msg = paste("Have you correctly compared data analyst 2 to data analyst 5?",
405 | "Use subsetting in combination with the `>` operator."))
406 | success_msg("Bellissimo! What does the result tell you? Data analyst two is complaining about data analyst five, while in fact he or she is the one slowing everything down!")
407 | ```
408 |
--------------------------------------------------------------------------------
/chapter5.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title_meta : Chapter 5
3 | title : Lists
4 | description : Lists, as opposed to vectors, can hold components of different types, just like your to-do list at home or at work. This chapter will teach you how to create, name and subset these lists!
5 | attachments :
6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch5_slides.pdf
7 |
8 | --- type:VideoExercise lang:r xp:50 skills:1 key:d4fe5a84ef
9 | ## Create and Name Lists
10 |
11 | *** =video_link
12 | //player.vimeo.com/video/138173972
13 |
14 | *** =video_hls
15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch5_1.master.m3u8
16 |
17 |
18 | --- type:NormalExercise lang:r xp:100 skills:1 key:b14f9472dd
19 | ## Create a list
20 |
21 | Just a quick refresher: A list in R allows you to gather a variety of objects in an ordered way. These objects can be matrices, vectors, factors, data frames, even other lists, etc. It is not even required that these objects are related to each other. You can construct a list with the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function:
22 |
23 | ```
24 | my_list <- list(comp1, comp2, ...)
25 | ```
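To make the template concrete, here is a sketch that bundles three unrelated objects of different types. The names `num`, `chr`, `lgl_mat` and `mixed` are hypothetical.

```r
# Hypothetical components of different types
num <- c(1.5, 2.5)
chr <- "hello"
lgl_mat <- matrix(c(TRUE, FALSE, TRUE, FALSE), nrow = 2)

mixed <- list(num, chr, lgl_mat)

length(mixed)  # 3 components
str(mixed)     # compact overview of each component
```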
26 |
27 | *** =instructions
28 | Construct a list, named `my_list`, that contains the variables `my_vector`, `my_matrix` and `my_factor` as list components.
29 |
30 | *** =hint
31 | Use the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function with `my_vector`, `my_matrix` and `my_factor` as arguments separated by a comma.
32 |
33 | *** =pre_exercise_code
34 | ```{r}
35 | # no pec
36 | ```
37 |
38 | *** =sample_code
39 | ```{r}
40 | # Numeric vector: 1 up to 10
41 | my_vector <- 1:10
42 |
43 | # Numeric matrix: 1 up to 9
44 | my_matrix <- matrix(1:9, ncol = 3)
45 |
46 | # Factor of sizes
47 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
48 |
49 | # Construct my_list with these different elements
50 |
51 | ```
52 |
53 | *** =solution
54 | ```{r}
55 | # Numeric vector: 1 up to 10
56 | my_vector <- 1:10
57 |
58 | # Numeric matrix: 1 up to 9
59 | my_matrix <- matrix(1:9, ncol = 3)
60 |
61 | # Factor of sizes
62 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
63 |
64 | # Construct my_list with these different elements
65 | my_list <- list(my_vector, my_matrix, my_factor)
66 | ```
67 |
68 | *** =sct
69 | ```{r}
70 | test_error()
71 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix` or `my_factor`!"
72 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg)
73 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg)
74 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg)
75 | test_object("my_list", incorrect_msg = "It looks like `my_list` does not contain the correct elements. Have another look.")
76 | success_msg("Wonderful! Your skillset is growing at a staggering pace! Head over to the next exercise.")
77 | ```
78 |
79 |
80 | --- type:NormalExercise lang:r xp:100 skills:1 key:849f04f218
81 | ## Listception: lists in lists
82 |
83 | As mentioned before, lists can also contain other lists. This works just the same as storing other types of R objects in a list. In addition to the variables `my_vector`, `my_matrix` and `my_factor` from the previous exercise, `my_list` is now predefined as well. It is up to you to merge them into a new list: a super list!
84 |
85 | *** =instructions
86 | - Construct a list, named `my_super_list`, that contains the four predefined variables from the sample code. Be sure to enter them in the following order: `my_vector`, `my_matrix`, `my_factor`, `my_list`.
87 | - Print the structure of `my_super_list` with the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.
88 |
89 | *** =hint
90 | Just as in the previous exercise, use the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function. This time you have to add four components.
91 |
92 | *** =pre_exercise_code
93 | ```{r}
94 | # no pec
95 | ```
96 |
97 | *** =sample_code
98 | ```{r}
99 | # Numeric vector: 1 up to 10
100 | my_vector <- 1:10
101 |
102 | # Numeric matrix: 1 up to 9
103 | my_matrix <- matrix(1:9, ncol = 3)
104 |
105 | # Factor of sizes
106 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
107 |
108 | # List containing vector, matrix and factor
109 | my_list <- list(my_vector, my_matrix, my_factor)
110 |
111 | # Construct my_super_list with the four data structures above
112 |
113 |
114 | # Display structure of my_super_list
115 |
116 | ```
117 |
118 | *** =solution
119 | ```{r}
120 | # Numeric vector: 1 up to 10
121 | my_vector <- 1:10
122 |
123 | # Numeric matrix: 1 up to 9
124 | my_matrix <- matrix(1:9, ncol = 3)
125 |
126 | # Factor of sizes
127 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
128 |
129 | # List containing vector, matrix and factor
130 | my_list <- list(my_vector, my_matrix, my_factor)
131 |
132 | # Construct my_super_list with the four data structures above
133 | my_super_list <- list(my_vector, my_matrix, my_factor, my_list)
134 |
135 | # Display structure of my_super_list
136 | str(my_super_list)
137 | ```
138 |
139 | *** =sct
140 | ```{r}
141 | test_error()
142 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix`, `my_factor` or `my_list`!"
143 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg)
144 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg)
145 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg)
146 | test_object("my_list", undefined_msg = msg, incorrect_msg = msg)
147 | test_object("my_super_list",
148 | incorrect_msg = "It looks like `my_super_list` does not contain the correct elements. It's also possible that the order is not correct. Have another look.")
149 | test_output_contains("str(my_super_list)", incorrect_msg = "Don't forget to display the structure of `my_super_list` using the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.")
150 | success_msg("Nice one! Can you see from the displayed structure how the vector, matrix and ordered factor appear twice, once in the top-level list and once in the embedded list? Next!")
151 | ```
152 |
153 |
154 | --- type:NormalExercise lang:r xp:100 skills:1 key:f37117a766
155 | ## Create a named list (1)
156 |
157 | Well done! Let us keep this train going! To make the elements of your list clearer, you'll often want to name them:
158 |
159 | ```
160 | my_list <- list(name1 = your_comp1,
161 | name2 = your_comp2)
162 | ```
163 |
164 | If you want to name the components of a list after you've created it, you can use the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function as you did with vectors. The following commands are fully equivalent to the assignment above:
165 |
166 | ```
167 | my_list <- list(your_comp1, your_comp2)
168 | names(my_list) <- c("name1", "name2")
169 | ```
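The equivalence can be checked directly with `identical()`. This is a sketch on hypothetical components `x` and `y`; the list names are likewise made up.

```r
# Hypothetical components
x <- 1:3
y <- c("a", "b")

# Name the components while creating the list
named_at_creation <- list(first = x, second = y)

# Name the components afterwards
named_afterwards <- list(x, y)
names(named_afterwards) <- c("first", "second")

identical(named_at_creation, named_afterwards)  # TRUE
```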
170 |
171 | *** =instructions
172 | - Change the code that builds `my_list` by adding names to the components. Use the name `vec` for `my_vector`, `mat` for `my_matrix` and `fac` for `my_factor`.
173 | - Print the list to the console and inspect the output.
174 |
175 | *** =hint
176 | The first method of assigning names to your list components is the easiest. It starts like this:
177 | ```
178 | my_list <- list(vec = my_vector)
179 | ```
180 | Add the other two components in a similar fashion.
181 |
182 | *** =pre_exercise_code
183 | ```{r}
184 | # no pec
185 | ```
186 |
187 | *** =sample_code
188 | ```{r}
189 | # Numeric vector: 1 up to 10
190 | my_vector <- 1:10
191 |
192 | # Numeric matrix: 1 up to 9
193 | my_matrix <- matrix(1:9, ncol = 3)
194 |
195 | # Factor of sizes
196 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
197 |
198 | # Adapt code to add names to elements in my_list
199 | my_list <- list(my_vector, my_matrix, my_factor)
200 |
201 | # Print my_list to the console
202 |
203 | ```
204 |
205 | *** =solution
206 | ```{r}
207 | # Numeric vector: 1 up to 10
208 | my_vector <- 1:10
209 |
210 | # Numeric matrix: 1 up to 9
211 | my_matrix <- matrix(1:9, ncol = 3)
212 |
213 | # Factor of sizes
214 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
215 |
216 | # Adapt code to add names to elements in my_list
217 | my_list <- list(vec = my_vector, mat = my_matrix, fac = my_factor)
218 |
219 | # Print my_list to the console
220 | my_list
221 | ```
222 |
223 | *** =sct
224 | ```{r}
225 | test_error()
226 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix` or `my_factor`!"
227 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg)
228 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg)
229 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg)
230 | test_object("my_list",
231 | incorrect_msg = "It looks like `my_list` does not contain the correct elements.")
232 | test_object("my_list", eq_condition = "equal",
233 | incorrect_msg = "It looks like `my_list` does not contain the correct naming for the components.");
234 | test_output_contains("my_list",
235 | incorrect_msg = "Don't forget to print `my_list` to the console! Simply add `my_list` on a new line in the script.")
236 | success_msg("Great! Not only do you know how to construct lists now, you can also name them; a skill that will prove most useful in practice. Continue to the next exercise.")
237 | ```
238 |
239 |
240 | --- type:NormalExercise lang:r xp:100 skills:1 key:dc5f4f6f30
241 | ## Create a named list (2)
242 |
243 | Being a huge movie fan, you decide to start storing information on good movies with the help of lists.
244 |
245 | Start by creating a list for the movie "The Shining". The variables `actors_vector` and `reviews_factor` that you'll need have already been coded in the sample code.
246 |
247 | *** =instructions
248 | Create the variable `shining_list`. The list contains the movie title first, as `title`, then the actor names, as `actors`, and finally the review scores factor, as `reviews`. Make sure to adopt the same order, and pay attention to the correct naming!
249 |
250 | *** =hint
251 | Let's get you started with a chunk of code:
252 |
253 | shining_list <- list(title = "The Shining")
254 |
255 | Can you complete the rest? You still have to add `actors_vector` and `reviews_factor` with the appropriate names.
256 |
257 | *** =pre_exercise_code
258 | ```{r}
259 | # no pec
260 | ```
261 |
262 | *** =sample_code
263 | ```{r}
264 | # Create actors and reviews
265 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson")
266 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"),
267 | ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect"))
268 |
269 | # Create shining_list
270 |
271 | ```
272 |
273 | *** =solution
274 | ```{r}
275 | # Create actors and reviews
276 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson")
277 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"),
278 | ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect"))
279 |
280 | # Create the list 'shining_list'
281 | shining_list <- list(title = "The Shining", actors = actors_vector, reviews = reviews_factor)
282 | ```
283 |
284 | *** =sct
285 | ```{r}
286 | test_error()
287 | msg = "Do not remove or change the definition of the pre-set variables!"
288 | test_object("actors_vector", undefined_msg = msg, incorrect_msg = msg)
289 | test_object("reviews_factor", undefined_msg = msg, incorrect_msg = msg)
290 | test_object("shining_list",
291 | incorrect_msg = "It looks like `shining_list` does not contain the correct elements.")
292 | test_object("shining_list", eq_condition = "equal",
293 | incorrect_msg = "It looks like `shining_list` does not contain the correct naming for the components.")
294 | success_msg("Perfect!")
295 | ```
296 |
297 |
298 | --- type:VideoExercise lang:r xp:50 skills:1 key:a5e5ff2680
299 | ## Subset and Extend Lists
300 |
301 | *** =video_link
302 | //player.vimeo.com/video/138173990
303 |
304 | *** =video_hls
305 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch5_2.master.m3u8
306 |
307 |
308 | --- type:NormalExercise lang:r xp:100 skills:1 key:1e3b5f4b0a
309 | ## Selecting elements from a list
310 |
311 | Your list will often be built out of numerous elements and components. Therefore, getting a single element, multiple elements, or a component out of it is not always straightforward.
312 |
313 | To select a single element from a list, for example the first element from `shining_list`, you can use any one of the following commands:
314 |
315 | ```
316 | shining_list[[1]]
317 | shining_list[["title"]]
318 | shining_list$title
319 | ```
320 |
321 | If you perform selection with single square brackets, you'll end up with another list that contains the specified elements:
322 |
323 | ```
324 | shining_list[c(2,3)]
325 | shining_list[c(FALSE, TRUE, TRUE)]
326 | ```
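The difference between the two bracket types shows up in the class of the result. The sketch below uses a hypothetical list `song`, not the course's `shining_list`.

```r
# Hypothetical list for illustration
song <- list(title = "Example", length_sec = 215, tags = c("demo", "test"))

class(song[["title"]])  # "character": the component itself
class(song["title"])    # "list": a list of length 1 holding the component

# Single brackets also accept a vector of names
names(song[c("title", "tags")])  # "title" "tags"
```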
327 |
328 | *** =instructions
329 | - Select the actors from `shining_list` and assign the result to `act`.
330 | - Create a new list containing only the title and the reviews of `shining_list`; save the new list in `sublist`.
331 | - Display the structure of `sublist`.
332 |
333 | *** =hint
334 | For the first instruction you need double brackets (or the `$`), for the second one the single brackets will do.
335 |
336 | *** =pre_exercise_code
337 | ```{r}
338 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData"))
339 | ```
340 |
341 | *** =sample_code
342 | ```{r}
343 | # shining_list is already defined in the workspace
344 |
345 | # Actors from shining_list: act
346 |
347 |
348 | # List containing title and reviews from shining_list: sublist
349 |
350 |
351 | # Display structure of sublist
352 |
353 | ```
354 |
355 | *** =solution
356 | ```{r}
357 | # shining_list is already defined in the workspace
358 |
359 | # Actors from shining_list: act
360 | act <- shining_list[["actors"]]
361 |
362 | # List containing title and reviews from shining_list: sublist
363 | sublist <- shining_list[c("title", "reviews")]
364 |
365 | # Display structure of sublist
366 | str(sublist)
367 | ```
368 |
369 | *** =sct
370 | ```{r}
371 | test_error()
372 | msg = "Do not remove or override `shining_list`."
373 | test_object("shining_list", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
374 |
375 | test_object("act", incorrect_msg = "Have you correctly selected the actors from `shining_list`?")
376 | test_object("sublist", incorrect_msg = "Have you correctly selected the title and reviews from `shining_list`? Use single brackets in combination with a vector to select multiple elements.")
377 | test_output_contains("str(sublist)", incorrect_msg = "Don't forget to display the structure of `sublist`!")
378 |
379 | success_msg("Nice! That was still pretty easy, right? Always be aware of this difference between `[` and `[[`!")
380 | ```
381 |
382 |
383 | --- type:NormalExercise lang:r xp:100 skills:1 key:cbda2853c3
384 | ## Chaining your selections
385 |
386 | Besides selecting entire list components, you will sometimes need to access specific parts of those components. It's perfectly possible to _chain your subsetting operations_ in R.
387 |
388 | For example, with
389 |
390 | ```
391 | shining_list[[2]][1]
392 | ```
393 |
394 | you select from the second component, actors (`shining_list[[2]]`), the first element (`[1]`). When you type this in the console, you will see the answer is Jack Nicholson.
395 |
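The same chaining works with names and with `$`. A short sketch on a hypothetical list `band` (not part of the exercise workspace):

```r
# Hypothetical list for illustration
band <- list(name = "Example Band", members = c("Ann", "Bo", "Cy"))

band[[2]][1]            # "Ann": second component, first element
band[["members"]][1]    # same selection, by name
band$members[3]         # "Cy": third element of the members vector
```
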
396 | *** =instructions
397 | - Select from the `shining_list` the last actor and assign the result to `last_actor`.
398 | - Select from the `shining_list` the second review score (which is a factor). Store the result in `second_review`.
399 |
400 | *** =hint
401 | - If you want to do things nicely: `length(shining_list$actors)` gives you the number of actors, and thus the element to select.
402 | - You can select the second review with `shining_list$reviews[2]`; `reviews` is a factor, so a single index is enough.
403 |
404 | *** =pre_exercise_code
405 | ```{r}
406 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData"))
407 | ```
408 |
409 | *** =sample_code
410 | ```{r}
411 | # shining_list is already defined in the workspace
412 |
413 | # Select the last actor: last_actor
414 |
415 |
416 | # Select the second review: second_review
417 |
418 | ```
419 |
420 | *** =solution
421 | ```{r}
422 | # shining_list is already defined in the workspace
423 |
424 | # Select the last actor: last_actor
425 | last_actor <- shining_list$actors[length(shining_list$actors)]
426 |
427 | # Select the second review: second_review
428 | second_review <- shining_list$reviews[2]
429 | ```
430 |
431 | *** =sct
432 | ```{r}
433 | test_error()
434 | msg = "Do not remove or override `shining_list`."
435 | test_object("shining_list", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
436 |
437 | test_object("last_actor",
438 | incorrect_msg = "Looks like `last_actor` does not equal the last actor.")
439 | test_object("second_review",
440 | incorrect_msg = "It looks like `second_review` does not contain the factor that corresponds to the second review.")
441 | success_msg("Great! Selecting elements from lists is rather easy, isn't it? Continue to the next exercise.")
442 | ```
443 |
444 | --- type:NormalExercise lang:r xp:100 skills:1 key:35ba0d6f0d
445 | ## Extending a list
446 |
447 | You already know that the `$` as well as `[[` can be used to select elements from lists. They can also be used to extend lists. To extend `shining_list` with some personal opinion, you could do one of the following things:
448 |
449 | ```
450 | shining_list$my_opinion <- "Love it!"
451 | shining_list[["my_opinion"]] <- "Love it!"
452 | ```
453 |
454 | Being proud of your first list, you shared it with the members of your movie hobby club. However, one of the senior members, a guy named M. McDowell, noted that you forgot to add the release year (1980). Given your ambitions to become next year's president of the club, you decide to add this information to the list. To fully make up for your mistake, you also decide to add the name of the director (Stanley Kubrick).
455 |
456 | *** =instructions
457 | - Add the release year as a numeric to `shining_list` under the name `year`.
458 | - Add the director to the list, `"Stanley Kubrick"`, with the name `director`.
459 | - Finally, inspect the structure of `shining_list`.
460 |
461 | *** =hint
462 | Let the examples in the assignment guide you to list extension mastery! Remember that R is case sensitive!
463 |
464 | *** =pre_exercise_code
465 | ```{r}
466 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData"))
467 | ```
468 |
469 | *** =sample_code
470 |
471 | ```{r}
472 | # shining_list is already defined in the workspace
473 |
474 | # Add the release year to shining_list
475 |
476 |
477 | # Add the director to shining_list
478 |
479 |
480 | # Inspect the structure of shining_list
481 |
482 | ```
483 |
484 | *** =solution
485 | ```{r}
486 | # shining_list is already defined in the workspace
487 |
488 | # Add the release year to shining_list
489 | shining_list$year <- 1980
490 |
491 | # Add the director to shining_list
492 | shining_list$director <- "Stanley Kubrick"
493 |
494 | # Inspect the structure of shining_list
495 | str(shining_list)
496 | ```
497 |
498 | *** =sct
499 | ```{r}
500 | test_error()
501 | test_object("shining_list",
502 | incorrect_msg = paste("Have you correctly added both the year and the director to `shining_list`?",
503 | "Make sure to use the correct names (\"year\" and \"director\"). Remember that R is case sensitive!"))
504 |
505 | test_output_contains("str(shining_list)",
506 | incorrect_msg = "Do not forget to inspect the structure of `shining_list` using the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.")
507 |
508 | success_msg("Congratulations on finishing up on this chapter!")
509 | ```
510 |
--------------------------------------------------------------------------------
/chapter6.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title_meta : Chapter 6
3 | title : Data Frames
4 | description : Most data sets you will be working with will be stored as a data frame. By the end of this chapter, you will be able to create a data frame, select interesting parts of a data frame and order a data frame according to certain variables.
5 | attachments :
6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch6_slides.pdf
7 |
8 | --- type:VideoExercise lang:r xp:50 skills:1 key:d4bde604ab
9 | ## Explore the Data Frame
10 |
11 | *** =video_link
12 | //player.vimeo.com/video/138173996
13 |
14 | *** =video_hls
15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch6_1.master.m3u8
16 |
17 | --- type:NormalExercise lang:r xp:100 skills:1 key:d4ddcf9a7d
18 | ## Have a look at your data set
19 |
20 | Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set.
21 |
22 | There are several ways to do this in R. The function [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) enables you to show the first observations of a data frame (or any R object you pass to it). Unoriginally, the function [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) prints out the last observations in your data set. You can also use the function [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim) to show the dimensions of your data set.
23 |
24 | In this exercise, you'll be working with the `mtcars` data set, which is available in R by default.
25 |
26 | *** =instructions
27 | - Print the first observations of the [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) data set.
28 | - Use the [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) function to display the last observations.
29 | - Finally, display the overall dimensions of the [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) data frame with [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim).
30 |
31 | *** =hint
32 | You'll need [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) to show the first observations in [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars).
33 |
34 | *** =pre_exercise_code
35 | ```{r}
36 | # no pec
37 | ```
38 |
39 | *** =sample_code
40 | ```{r}
41 | # Print the first observations of mtcars
42 |
43 |
44 | # Print the last observations of mtcars
45 |
46 |
47 | # Print the dimensions of mtcars
48 | ```
49 |
50 | *** =solution
51 | ```{r}
52 | # Print the first observations of mtcars
53 | head(mtcars)
54 |
55 | # Print the last observations of mtcars
56 | tail(mtcars)
57 |
58 | # Print the dimensions of mtcars
59 | dim(mtcars)
60 | ```
61 |
62 | *** =sct
63 | ```{r}
64 | test_error()
65 | test_function("head", "x", incorrect_msg = "Have you specified the correct parameter in the [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) function? Make sure to pass it a data set you want to inspect, `mtcars` in this case.")
66 | test_function("tail", "x", incorrect_msg = "Have you specified the correct parameter in the [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) function? Make sure to pass it a data set you want to inspect, `mtcars` in this case.")
67 | test_output_contains("dim(mtcars)", incorrect_msg = "Don't forget to also call the [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim) function on `mtcars`!")
68 |
69 | success_msg("Wonderful! So, do you now have a good idea about what we have in the data set? For a full overview of the variables' meaning, type `?mtcars` in the console and read the help page. Continue to the next exercise!")
70 | ```
71 |
72 |
73 | --- type:NormalExercise lang:r xp:100 skills:1 key:c8f389fdbd
74 | ## Have a look at the structure
75 |
76 | Another method that is often used to get a rapid overview of your data is the function [`str()`](http://www.rdocumentation.org/packages/utils/functions/str). The function [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) shows you the structure of your data set. For a data frame it tells you:
77 |
78 | - The total number of observations (e.g. 32 car types)
79 | - The total number of variables (e.g. 11 car features)
80 | - A full list of the variable names (e.g. mpg, cyl ... )
81 | - The data type of each variable (e.g. num for car features)
82 | - The first observations
83 |
84 | Applying the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function will often be the first thing that you do when receiving a new data set or data frame. It is a great way to get more insight into your data set before diving into the real analysis.
85 |
86 | *** =instructions
87 | Investigate the structure of [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars). Make sure that you see the same numbers, variables and data types as mentioned above.
88 |
89 | *** =hint
90 | Use the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function with [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) as input!
91 |
92 | *** =pre_exercise_code
93 | ```{r}
94 | # no pec
95 | ```
96 |
97 | *** =sample_code
98 | ```{r}
99 | # Investigate the structure of the mtcars data set
100 | ```
101 |
102 | *** =solution
103 | ```{r}
104 | # Investigate the structure of the mtcars data set
105 | str(mtcars)
106 | ```
107 |
108 | *** =sct
109 | ```{r}
110 | test_function("str","object",incorrect_msg = "Make sure to check the structure of the `mtcars` data set.")
111 | test_output_contains("str(mtcars)", incorrect_msg = "Make sure that you use the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function on `mtcars`.")
112 | success_msg("Nice work! Can you find all the information that is listed in the exercise's assignment? Continue to the next exercise.")
113 | ```
114 |
115 |
116 | --- type:NormalExercise lang:r xp:100 skills:1 key:f09c0189ac
117 | ## Creating a data frame (1)
118 |
119 | Since using built-in data sets is not even half the fun of creating your own data sets, the next exercises are based on your personally developed data set. So put your jet pack on because it is time for some good old fashioned space exploration!
120 |
121 | As a first goal, you want to construct a data frame that describes the main characteristics of eight planets in our solar system. According to your good friend Buzz, the main features of a planet are:
122 |
123 | - The type of the planet (Terrestrial or Gas Giant).
124 | - The planet's diameter relative to the diameter of the Earth.
125 | - The planet's rotation period relative to that of the Earth.
126 | - If the planet has rings or not (TRUE or FALSE).
127 |
128 | After doing some high-quality research on [Wikipedia](http://en.wikipedia.org/wiki/Planet), you feel confident enough to create the necessary vectors: `planets`, `type`, `diameter`, `rotation` and `rings`. Can you correctly use the [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) function to create a data set from this information?
129 |
130 | *** =instructions
131 | - Use the function [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) to construct `planets_df`.
132 | - Make sure that you've actually created a data frame with 8 observations and 5 variables with [`str()`](http://www.rdocumentation.org/packages/utils/functions/str).
133 |
134 | *** =hint
135 | The [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) function takes as arguments the vectors that will become the columns of the data frame, separated by commas. The columns in this case are (in this order): `planets`, `type`, `diameter`, `rotation` and `rings`.
136 |
137 | *** =pre_exercise_code
138 | ```{r}
139 | # no pec
140 | ```
141 |
142 | *** =sample_code
143 | ```{r}
144 | # Definition of vectors
145 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
146 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
147 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
148 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
149 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
150 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
151 |
152 | # Create a data frame: planets_df
153 |
154 |
155 | # Display the structure of planets_df
156 |
157 | ```
158 |
159 | *** =solution
160 | ```{r}
161 | # Definition of vectors
162 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
163 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
164 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
165 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
166 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
167 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
168 |
169 | # Create a data frame: planets_df
170 | planets_df <- data.frame(planets, type, diameter, rotation, rings)
171 |
172 | # Display the structure of planets_df
173 | str(planets_df)
174 | ```
175 |
176 | *** =sct
177 | ```{r}
178 | test_correct({
179 | test_object("planets_df",
180 | undefined_msg = "Please make sure to define a variable `planets_df`.",
181 | incorrect_msg = "Make sure to assign the correct order of arguments to the data.frame `planets_df`. The correct order is planets, type, diameter, rotation and rings.")
182 | }, {
183 | msg = "Do not change anything about the definition of the vector. Only add code to create the `planets_df` data frame!"
184 | test_object("planets", undefined_msg = msg, incorrect_msg = msg)
185 | test_object("type", undefined_msg = msg, incorrect_msg = msg)
186 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg)
187 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg)
188 | test_object("rings", undefined_msg = msg, incorrect_msg = msg)
189 | })
190 |
191 | test_output_contains("str(planets_df)", incorrect_msg = "Don't forget to display the structure of `planets_df`!")
192 |
193 | success_msg("Great job! The structure of `planets_df` reveals that both the `planets` and the `type` columns are factors, and not character vectors. That's not really what you want, right?")
194 | ```
195 |
196 | --- type:NormalExercise lang:r xp:100 skills:1 key:7090dc3538
197 | ## Creating a data frame (2)
198 |
199 | In the previous exercise, you found out that both the `planets` and `type` columns of `planets_df` are factors. For the `type` column this makes sense, because a planet type is some kind of category. For the `planets` column, which contains the planet names, it makes less sense.
200 |
201 | You can set the `stringsAsFactors` argument inside [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) to prevent R from automatically converting character vectors to factors:
202 |
203 | ```
204 | data.frame(vec1, vec2, ..., stringsAsFactors = FALSE)
205 | ```
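
The effect of the argument can be checked on a small, made-up data frame. This is only a sketch: `vec1` and `vec2` are hypothetical vectors, and `stringsAsFactors` is set explicitly in both calls because its default changed from `TRUE` to `FALSE` in R 4.0.0.

```r
# Hypothetical vectors for illustration only
vec1 <- c("a", "b", "c")
vec2 <- c(1, 2, 3)

# Character vector is converted to a factor
df_factor <- data.frame(vec1, vec2, stringsAsFactors = TRUE)

# Character vector stays a character vector
df_char <- data.frame(vec1, vec2, stringsAsFactors = FALSE)

class(df_factor$vec1)
class(df_char$vec1)
```

Setting the argument explicitly keeps the result the same across R versions.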
206 |
207 | Up to you now to adapt the way `planets_df` is constructed!
208 |
209 | *** =instructions
210 | - Encode the `type` vector in a factor, called `type_factor`.
211 | - Next use `planets`, `type_factor`, `diameter`, `rotation` and `rings` to construct `planets_df`. This time, make sure that strings are not converted to factors, by setting `stringsAsFactors = FALSE`.
212 | - Display the structure of `planets_df` to check you coded things correctly.
213 |
214 | *** =hint
215 | Use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) to encode `type` as a factor.
216 |
217 | *** =pre_exercise_code
218 | ```{r}
219 | # no pec
220 | ```
221 |
222 | *** =sample_code
223 | ```{r}
224 | # Definition of vectors
225 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
226 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
227 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant","Gas giant")
228 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
229 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
230 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
231 |
232 | # Encode type as a factor: type_factor
233 |
234 |
235 | # Construct planets_df: strings are not converted to factors!
236 |
237 |
238 | # Display the structure of planets_df
239 |
240 | ```
241 |
242 | *** =solution
243 | ```{r}
244 | # Definition of vectors
245 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
246 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
247 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
248 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
249 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
250 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
251 |
252 | # Encode type as a factor: type_factor
253 | type_factor <- factor(type)
254 |
255 | # Construct planets_df: strings are not converted to factors!
256 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
257 |
258 | # Display the structure of planets_df
259 | str(planets_df)
260 | ```
261 |
262 | *** =sct
263 | ```{r}
264 | msg <- "Do not remove or change the definition of all the vectors!"
265 | test_object("planets", undefined_msg = msg, incorrect_msg = msg)
266 | test_object("type", undefined_msg = msg, incorrect_msg = msg)
267 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg)
268 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg)
269 | test_object("rings", undefined_msg = msg, incorrect_msg = msg)
270 |
271 | test_object("type_factor", incorrect_msg = "Have you correctly created `type_factor`? Simply use the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function on `type`.")
272 | test_object("planets_df", incorrect_msg = "Have you correctly created `planets_df`? Make sure to use `type_factor` instead of `type` and set `stringsAsFactors` to `FALSE` inside [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame).")
273 | test_output_contains("str(planets_df)", incorrect_msg = "Don't forget to display the structure of `planets_df`.")
274 |
275 | success_msg("That looks more like it! Head over to the next exercise.")
276 | ```
277 |
278 |
279 | --- type:NormalExercise lang:r xp:100 skills:1 key:a80ae7fbe8
280 | ## Rename the data frame columns
281 |
282 | As a data frame is actually a list containing same-length vectors under the hood, it's possible to name and rename the columns of a data frame just as you did with the elements of a list. To create a data frame and name its columns in one and the same call you can use:
283 |
284 | ```
285 | data.frame(name1 = vec1, name2 = vec2, ...)
286 | ```
287 |
288 | You can also name the columns of a data frame after creating it:
289 |
290 | ```
291 | my_df <- data.frame(vec1, vec2, ...)
292 | names(my_df) <- c("name1", "name2", ...)
293 | ```
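
As a hedged sketch of the second approach, the following uses a hypothetical two-column data frame (`my_df` and the column names are made up for illustration). It also shows that a single column can be renamed by indexing into `names()`.

```r
# Hypothetical data frame for illustration only
my_df <- data.frame(c(1, 2), c("x", "y"))

# Rename all columns at once
names(my_df) <- c("id", "label")

# Rename only the second column
names(my_df)[2] <- "tag"

names(my_df)
```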
294 |
295 | Very proud of your first ever data frame, you show it to your friend Buzz. He's pretty impressed that you managed to include both factor and character columns, but he still finds the column names pretty odd. Time to make some improvements! The code that constructs the improved data frame, as you coded in the previous exercise, is already included.
296 |
297 | *** =instructions
298 | Rename the columns of `planets_df`. As `planets_df` is already created, you'll want to use the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function.
299 |
300 | - Name the `planets` column `name`.
301 | - Name the `type_factor` column `type`.
302 | - You can keep the names `diameter` and `rotation`.
303 | - Change the name `rings` to `has_rings`.
304 |
305 | Finally, print `planets_df` after you renamed it (not its structure!).
306 |
307 | *** =hint
308 | You'll need the vector containing `"name"`, `"type"`, `"diameter"`, `"rotation"` and `"has_rings"`.
309 |
310 | *** =pre_exercise_code
311 | ```{r}
312 | # no pec
313 | ```
314 |
315 | *** =sample_code
316 | ```{r}
317 | # Construct improved planets_df
318 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
319 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
320 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
321 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
322 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
323 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
324 | type_factor <- factor(type)
325 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
326 |
327 | # Improve the names of planets_df
328 |
329 |
330 | # Print planets_df
331 |
332 | ```
333 |
334 | *** =solution
335 | ```{r}
336 | # Construct improved planets_df
337 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
338 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
339 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
340 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
341 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
342 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
343 | type_factor <- factor(type)
344 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
345 |
346 | # Improve the names of planets_df
347 | names(planets_df) <- c("name", "type", "diameter", "rotation", "has_rings")
348 |
349 | # Print planets_df
350 | planets_df
351 | ```
352 |
353 | *** =sct
354 | ```{r}
355 |
356 | msg <- "Do not remove or change the definition of the predefined variables!"
357 | test_object("planets", undefined_msg = msg, incorrect_msg = msg)
358 | test_object("type", undefined_msg = msg, incorrect_msg = msg)
359 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg)
360 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg)
361 | test_object("rings", undefined_msg = msg, incorrect_msg = msg)
362 | test_object("type_factor", undefined_msg = msg, incorrect_msg = msg)
363 | test_object("planets_df", undefined_msg = msg, incorrect_msg = "Don't change the contents of `planets_df`, only change the column names!")
364 | test_object("planets_df", eq_condition = "equal",
365 | undefined_msg = msg, incorrect_msg = "Are you sure you have correctly renamed the columns of `planets_df`? The hint might be able to help you out.")
366 |
367 | test_output_contains("planets_df", incorrect_msg = "Don't forget to print `planets_df`.")
368 | success_msg("Nice one! This is going fast!")
369 | ```
370 |
371 | --- type:VideoExercise lang:r xp:50 skills:1 key:9a2f941de8
372 | ## Subset, Extend & Sort Data Frames
373 |
374 | *** =video_link
375 | //player.vimeo.com/video/138174008
376 |
377 | *** =video_hls
378 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch6_2.master.m3u8
379 |
380 |
381 | --- type:NormalExercise lang:r xp:100 skills:1 key:b6125af738
382 | ## Selection of data frame elements
383 |
384 | Similar to matrices, you select elements from a data frame with the help of square brackets `[ ]`. By using a comma, you can indicate what to select from the rows and the columns respectively:
385 |
386 | ```
387 | # first row, second column
388 | my_df[1,2]
389 |
390 | # rows 1 to 3, columns 2 to 4
391 | my_df[1:3,2:4]
392 |
393 | # Entire first row
394 | my_df[1, ]
395 |
396 | # rows 1 to 3 of "type" column
397 | planets_df[1:3,2]
398 | planets_df[1:3,"type"]
399 | ```
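
A minimal sketch of these selections on a hypothetical data frame (`my_df` and its columns are made up for illustration). It also shows that selecting a single column returns a plain vector by default, whereas adding `drop = FALSE` keeps the result a data frame.

```r
# Hypothetical data frame for illustration only
my_df <- data.frame(id = 1:4, score = c(2.5, 3.0, 4.5, 1.0))

# Single element: first row, second column
one_value <- my_df[1, 2]

# Entire column as a plain numeric vector
col_vector <- my_df[, "score"]

# Entire column, kept as a one-column data frame
col_df <- my_df[, "score", drop = FALSE]

# First three rows, all columns
first_rows <- my_df[1:3, ]
```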
400 |
401 | Let us now apply this technique to `planets_df`! This data frame is already available in the workspace.
402 |
403 | *** =instructions
404 | - Select the type of Mars; store the factor in `mars_type`.
405 | - Store the entire rotation column in `rotation` as a vector.
406 | - Create a data frame, `closest_planets_df`, that contains all data on the first three planets.
407 | - Likewise, build the data frame `furthest_planets_df` that contains all data on the last three planets.
408 |
409 | *** =hint
410 | `planets_df[1:3,]` will select all elements of the first three rows.
411 |
412 | *** =pre_exercise_code
413 | ```{r}
414 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
415 | ```
416 |
417 | *** =sample_code
418 | ```{r}
419 | # planets_df is pre-loaded
420 |
421 | # The type of Mars: mars_type
422 |
423 |
424 | # Entire rotation column: rotation
425 |
426 |
427 | # First three planets: closest_planets_df
428 |
429 |
430 | # Last three planets: furthest_planets_df
431 |
432 |
433 | ```
434 |
435 | *** =solution
436 | ```{r}
437 | # planets_df is pre-loaded
438 |
439 | # The type of Mars: mars_type
440 | mars_type <- planets_df[4, 2]
441 |
442 | # Entire rotation column: rotation
443 | rotation <- planets_df[ ,4]
444 |
445 | # First three planets: closest_planets_df
446 | closest_planets_df <- planets_df[1:3, ]
447 |
448 | # Last three planets: furthest_planets_df
449 | furthest_planets_df <- planets_df[6:8, ]
450 | ```
451 |
452 | *** =sct
453 | ```{r}
454 |
455 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
456 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
457 |
458 | test_object("mars_type",
459 | incorrect_msg = "Are you sure you correctly selected the type of Mars?")
460 | test_object("rotation",
461 | incorrect_msg = "Have another look at the command to define `rotation`. You'll want to select the fourth column.")
462 | test_object("closest_planets_df",
463 | incorrect_msg = "Did you select the first three rows of `planets_df` to create `closest_planets_df`?")
464 | test_object("furthest_planets_df",
465 | incorrect_msg = "Make sure that you select the last three rows of `planets_df` to build `furthest_planets_df`.")
466 |
467 | success_msg("Great! Feel free to have a look at the variables you've just created. Apart from selecting elements from your data frame by index, you can also use the column names.")
468 | ```
469 |
470 | --- type:NormalExercise lang:r xp:100 skills:1 key:7aa0a94261
471 | ## Only planets with rings (1)
472 |
473 | You will often want to select an entire column, namely one specific variable from a data frame. If you want to select the column `diameter` from `planets_df`, you can use any one of the following:
474 |
475 | ```
476 | planets_df[, 3]
477 | planets_df[, "diameter"]
478 | planets_df$diameter
479 | ```
480 |
481 | *** =instructions
482 | - Make use of the `$` sign to create the variable `rings_vector` that contains the entire `has_rings` column in the `planets_df` data frame.
483 | - Print the `rings_vector`; it should be a vector.
484 |
485 | *** =hint
486 | `my_df$col_name` is the most convenient way to select a column from a data frame. In this case, the data frame is `planets_df` and the variable is `has_rings`.
487 |
488 | *** =pre_exercise_code
489 | ```{r}
490 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
491 | ```
492 |
493 | *** =sample_code
494 | ```{r}
495 | # planets_df is pre-loaded in your workspace
496 |
497 | # Create rings_vector
498 |
499 |
500 | # Print rings_vector
501 |
502 | ```
503 |
504 | *** =solution
505 | ```{r}
506 | # planets_df is pre-loaded in your workspace
507 |
508 | # Create rings_vector
509 | rings_vector <- planets_df$has_rings
510 |
511 | # Print rings_vector
512 | rings_vector
513 | ```
514 |
515 | *** =sct
516 | ```{r}
517 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
518 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
519 |
520 | test_object("rings_vector", incorrect_msg = "It looks like `rings_vector` does not contain all the elements of the `has_rings` variable of `planets_df`.")
521 |
522 | test_output_contains("rings_vector", incorrect_msg = "Don't forget to print `rings_vector`!")
523 |
524 | success_msg("Great! Continue to the next exercise and discover yet another way of subsetting!")
525 | ```
526 |
527 |
528 | --- type:NormalExercise lang:r xp:100 skills:1 key:d6245c1bb1
529 | ## Only planets with rings (2)
530 |
531 | You probably remember from high school that some planets in our solar system have rings and others do not. But due to other priorities at that time (read: puberty) you cannot recall their names, let alone their rotation speed, etc. Could R help you out?
532 |
533 | The `rings_vector` you've coded before is a logical vector. It's `TRUE` when the corresponding planets have rings and `FALSE` when they don't. To select those observations from `planets_df` that have rings, you can use the `rings_vector` and perform subsetting by logicals!
534 |
535 | To subset observations (rows) by logicals, put the logical vector before the comma inside square brackets and leave the column position empty, similar to this:
536 |
537 | ```
538 | df[logical_vector, ]
539 | ```
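
A minimal sketch of row subsetting by a logical vector, using a hypothetical data frame (`my_df`, `name` and `flag` are made up for illustration). The logical vector goes before the comma, so it selects rows. The empty position after the comma keeps all columns.

```r
# Hypothetical data frame for illustration only
my_df <- data.frame(name = c("a", "b", "c", "d"),
                    flag = c(TRUE, FALSE, TRUE, FALSE),
                    stringsAsFactors = FALSE)

# Logical vector with one entry per row
keep <- my_df$flag

# Keep only the rows where keep is TRUE, with all columns
flagged_rows <- my_df[keep, ]

flagged_rows
```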
540 |
541 | *** =instructions
542 | - Assign to `planets_with_rings_df` all data in the `planets_df` data set for the planets with rings, that is, where `rings_vector` is `TRUE`.
543 | - Print `planets_with_rings_df`.
544 |
545 | *** =hint
546 | Select elements from `planets_df` by using the square brackets. The `rings_vector` contains logical values and R will select only those rows/columns where the vector element is `TRUE`. In this case, you want to select rows based on `rings_vector` and select all the columns.
547 |
548 | *** =pre_exercise_code
549 | ```{r}
550 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
551 | ```
552 |
553 | *** =sample_code
554 | ```{r}
555 | # planets_df pre-loaded in your workspace
556 |
557 | # Create rings_vector
558 | rings_vector <- planets_df$has_rings
559 |
560 | # Select the information on planets with rings: planets_with_rings_df
561 |
562 |
563 | # Print planets_with_rings_df
564 | ```
565 |
566 | *** =solution
567 | ```{r}
568 | # planets_df pre-loaded in your workspace
569 |
570 | # Create rings_vector
571 | rings_vector <- planets_df$has_rings
572 |
573 | # Select the information on planets with rings: planets_with_rings_df
574 | planets_with_rings_df <- planets_df[rings_vector,]
575 |
576 | # Print planets_with_rings_df
577 | planets_with_rings_df
578 | ```
579 |
580 | *** =sct
581 | ```{r}
582 |
583 | msg <- "Do not remove or overwrite `planets_df` or `rings_vector`!"
584 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
585 | test_object("rings_vector", undefined_msg = msg, incorrect_msg = msg)
586 |
587 | test_object("planets_with_rings_df",
588 | incorrect_msg = "It looks like `planets_with_rings_df` does not contain all the data of the planets with rings. Make sure to not specify any column selector, to keep all columns.")
589 |
590 | test_output_contains("planets_with_rings_df",
591 | incorrect_msg = "Don't forget to print `planets_with_rings_df`!")
592 | success_msg("Nice work, but this is a rather tedious solution. The next exercise will teach you how to do it in a more concise way.")
593 | ```
594 |
595 |
596 | --- type:NormalExercise lang:r xp:100 skills:1 key:c1a08e245c
597 | ## Only planets with rings but shorter
598 |
599 | So what exactly did you learn in the previous exercises? You selected a subset from a data frame (`planets_df`) based on whether or not a certain condition was true (rings or no rings), and you managed to pull out all relevant data. Pretty awesome! By now, NASA is probably already flirting with your CV!
600 |
601 | Instead of having to define a vector `rings_vector`, which you then use to subset `planets_df`, you could've also used either one of these:
602 |
603 | ```
604 | planets_df[planets_df$has_rings, ]
605 | planets_df[planets_df$has_rings == TRUE, ]
606 | ```
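
The same idea extends to any condition that yields a logical vector. The sketch below uses a hypothetical data frame (`my_df` and `size` are made up for illustration) to compare bracket subsetting with [`subset()`](http://www.rdocumentation.org/packages/base/functions/subset). The two behave differently when the condition evaluates to `NA`.

```r
# Hypothetical data frame for illustration only; note the missing value
my_df <- data.frame(id = 1:4, size = c(0.4, 1.2, NA, 0.9))

# Bracket subsetting: an NA in the condition produces a row filled with NA
by_bracket <- my_df[my_df$size < 1, ]

# subset(): rows where the condition is NA are dropped
by_subset <- subset(my_df, subset = size < 1)

nrow(by_bracket)
nrow(by_subset)
```

When a column contains no missing values, as appears to be the case for `planets_df`, both forms return the same rows.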
607 |
608 | *** =instructions
609 | - Create a data frame `small_planets_df` with planets that have a diameter smaller than the Earth. This means that the `diameter` variable should be smaller than 1, since diameter is a relative measure of the planet's diameter in relation to planet Earth.
610 | - Build another data frame, `slow_planets_df`, with the observations that have a longer rotation period than Earth. This means that the absolute value (use the function [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun)) of the `rotation` variable should be greater than 1.
611 |
612 | *** =hint
613 | Make use of the logical operators `>` and `<`. Use the [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun) function for absolute values.
614 |
615 | *** =pre_exercise_code
616 | ```{r}
617 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
618 | ```
619 |
620 | *** =sample_code
621 | ```{r}
622 | # planets_df is pre-loaded in your workspace
623 |
624 | # Planets that are smaller than planet Earth: small_planets_df
625 |
626 |
627 | # Planets that rotate slower than planet Earth: slow_planets_df
628 |
629 | ```
630 |
631 | *** =solution
632 | ```{r}
633 | # planets_df is pre-loaded in your workspace
634 |
635 | # Planets that are smaller than planet Earth: small_planets_df
636 | small_planets_df <- planets_df[planets_df$diameter < 1, ] # option 1
637 | small_planets_df <- subset(planets_df, subset = diameter < 1) # option 2
638 |
639 | # Planets that rotate slower than planet Earth: slow_planets_df
640 | slow_planets_df <- planets_df[abs(planets_df$rotation) > 1, ] # option 1
641 | slow_planets_df <- subset(planets_df, subset = abs(rotation) > 1) # option 2
642 | ```
643 |
644 | *** =sct
645 | ```{r}
646 |
647 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
648 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
649 |
650 | test_object("small_planets_df",
651 | incorrect_msg = "It looks like `small_planets_df` does not contain the correct subset of `planets_df`.")
652 |
653 | test_object("slow_planets_df",
654 | incorrect_msg = "It looks like `slow_planets_df` does not contain the correct subset of `planets_df`. Make sure to use the [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun) function for absolute values.")
655 |
656 | success_msg("Great! Not only is the [`subset()`](http://www.rdocumentation.org/packages/base/functions/subset) function more concise, it is probably also more understandable for people who read your code. Continue to the next exercise.")
657 | ```
658 |
659 |
660 | --- type:NormalExercise lang:r xp:100 skills:1 key:e9ca3eeb99
661 | ## Add variable/column
662 |
663 | There are many cases in which you'll want to add more variables to your data frame. This comes down to adding a column to the data frame. The exact same techniques to select columns from a data frame can be used here. To add a column `new_column` to `my_df`, with data from `my_vec`, you can use one of the following calls:
664 |
665 | ```
666 | my_df$new_column <- my_vec
667 | my_df[["new_column"]] <- my_vec
668 | my_df <- cbind(my_df, new_column = my_vec)
669 | ```
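
As a minimal sketch of the three calls above, assuming a small made-up data frame (`my_df` and `my_vec` are placeholders, not course objects), each approach ends up with the same two columns:

```r
# Hypothetical placeholder data
my_df <- data.frame(x = c(1, 2, 3))
my_vec <- c(10, 20, 30)

# Option 1: dollar sign
df_dollar <- my_df
df_dollar$new_column <- my_vec

# Option 2: double square brackets
df_brackets <- my_df
df_brackets[["new_column"]] <- my_vec

# Option 3: cbind() returns a new data frame, so assign the result
df_cbind <- cbind(my_df, new_column = my_vec)

names(df_dollar)    # "x" "new_column"
names(df_brackets)  # "x" "new_column"
names(df_cbind)     # "x" "new_column"
```

In every case `my_vec` needs one value per row of `my_df`.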
670 |
671 | You've browsed [Wikipedia](https://en.wikipedia.org/wiki/Planet) and decided to add a column that lists the number of moons each of the planets has. The planets' masses would be a cool addition too. The `moons` and `masses` vectors are already defined for you; it's up to you to add them to `planets_df`.
672 |
673 | *** =instructions
674 | - Add `moons` to `planets_df` under the variable name `"moon"`.
675 | - In a similar fashion, add `masses` under the variable name `"mass"`.
676 |
677 | *** =hint
678 | To add a new column called "moon", you can use: `planets_df$moon <- moons`.
679 |
680 | *** =pre_exercise_code
681 | ```{r}
682 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
683 | ```
684 |
685 | *** =sample_code
686 | ```{r}
687 | # planets_df is already pre-loaded in your workspace
688 |
689 | # Definition of moons and masses
690 | moons <- c(0, 0, 1, 2, 67, 62, 27, 14)
691 | masses <- c(0.06, 0.82, 1.00, 0.11, 317.8, 95.2, 14.6, 17.2)
692 |
693 | # Add moons to planets_df under the name "moon"
694 |
695 |
696 | # Add masses to planets_df under the name "mass"
697 |
698 | ```
699 |
700 | *** =solution
701 | ```{r}
702 | # planets_df is already pre-loaded in your workspace
703 |
704 | # Definition of moons and masses
705 | moons <- c(0, 0, 1, 2, 67, 62, 27, 14)
706 | masses <- c(0.06, 0.82, 1.00, 0.11, 317.8, 95.2, 14.6, 17.2)
707 |
708 | # Add moons to planets_df under the name "moon"
709 | planets_df$moon <- moons
710 |
711 | # Add masses to planets_df under the name "mass"
712 | planets_df$mass <- masses
713 | ```
714 |
715 | *** =sct
716 | ```{r}
717 |
718 | undef_msg <- "Do not remove `planets_df`!"
719 | msg <- "Do not change anything about the columns that were already in `planets_df`; you should only add columns."
720 | test_data_frame(name = "planets_df",
721 | columns = c("name", "type", "diameter", "rotation", "has_rings"),
722 | undefined_msg = undef_msg, undefined_cols_msg = msg, incorrect_msg = msg)
723 |
724 | test_data_frame(name = "planets_df",
725 | columns = "moon",
726 | undefined_cols_msg = "Make sure to name the column to contain the moon information \"moon\".",
727 | incorrect_msg = "The \"moon\" column does not contain the correct information. Try again.")
728 |
729 | test_data_frame(name = "planets_df",
730 | columns = "mass",
731 | undefined_cols_msg = "Make sure to name the column to contain the mass information \"mass\".",
732 | incorrect_msg = "The \"mass\" column does not contain the correct information. Try again.")
733 |
734 | test_object("planets_df", incorrect_msg = "It appears that you have correctly specified the \"moon\" and \"mass\" columns, but there's still something wrong with the resulting `planets_df`. Make sure you added exactly these two columns and changed nothing else!")
735 | success_msg("Nice one! This data frame is beginning to contain quite some information!")
736 | ```
737 |
738 |
739 | --- type:NormalExercise lang:r xp:100 skills:1 key:8e5ade7078
740 | ## Sorting
741 |
742 | In data analysis you will often sort your data according to a certain variable in the data set. In R, this is done with the help of the function [`order()`](http://www.rdocumentation.org/packages/base/functions/order).
743 |
744 | [`order()`](http://www.rdocumentation.org/packages/base/functions/order) is a function that, when applied to a variable such as a vector, returns the positions of its elements in sorted order. For example:
745 |
746 | ```
747 | a <- c(100,9,101)
748 | order(a)
749 | ```
750 |
751 | This code returns the vector containing 2, 1 and 3. That's because the smallest element, 9, is at position 2 of `a`; the next smallest, 100, is at position 1; and the largest, 101, is at position 3.
752 |
753 | ```
754 | a[order(a)]
755 | ```
756 |
757 | will thus give you the ordered vector (9, 100, 101), since it first picks the second element of `a`, then the first and then the last. Got it? If you are not sure, use the console to play with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function.
758 |
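For the example vector `a`, the position of each sorted element and the rank of each element happen to coincide. As a small sketch with a different, made-up vector (`b` is hypothetical), the difference between `order()` and `rank()` becomes visible:

```r
# Hypothetical vector chosen so that order() and rank() differ
b <- c(30, 10, 20)

sorted_positions <- order(b)  # 2 3 1: smallest is at position 2, then 3, then 1
element_ranks <- rank(b)      # 3 1 2: 30 is the largest, 10 the smallest

b[sorted_positions]           # 10 20 30

# Largest first
order(b, decreasing = TRUE)   # 1 3 2
```
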
759 | *** =instructions
760 | Experiment with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function in the console. Click 'Submit Answer' when you are ready to continue.
761 |
762 | *** =hint
763 | Just play with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function in the console!
764 |
765 | *** =pre_exercise_code
766 | ```{r}
767 | # no pec
768 | ```
769 |
770 | *** =sample_code
771 | ```{r}
772 | # Just play around with the order function in the console to see how it works!
773 | ```
774 |
775 | *** =solution
776 | ```{r}
777 | # Just play around with the order function in the console to see how it works!
778 | # Some examples:
779 | order(1:10)
780 | order(2:11)
781 | order(c(5,4,6,7))
782 | ```
783 |
784 | *** =sct
785 | ```{r}
786 | success_msg("Great! Now let's use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function to sort your data frame!")
787 | ```
788 |
789 |
790 | --- type:NormalExercise lang:r xp:100 skills:1 key:ec87541ef1
791 | ## Sorting your data frame
792 |
793 | Alright, now let us do something useful with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function! You would like to rearrange your data frame such that it starts with the smallest planet and ends with the largest one. A sort on the `diameter` column, if you will.
794 |
795 | Suppose you have a data frame `df`, with three columns `a`, `b` and `c`. The following code will print a version of df that is sorted on the column `a`.
796 |
797 | ```
798 | pos <- order(df$a)
799 | df[pos, ]
800 | ```
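
The same pattern can be tried on a small made-up data frame. This is only a sketch; `df` and its columns are hypothetical, and the `decreasing` argument of `order()` is used to reverse the sort:

```r
# Hypothetical data frame for illustration
df <- data.frame(a = c(3, 1, 2),
                 b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# Sort on column a, smallest first
ascending <- df[order(df$a), ]

# Sort on column a, largest first
descending <- df[order(df$a, decreasing = TRUE), ]

ascending$b   # "y" "z" "x"
descending$b  # "x" "z" "y"
```

Note that the sorted data frames keep their original row names, so the printed row labels are no longer in increasing order.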
801 |
802 | *** =instructions
803 | - Assign to the variable `positions` the desired ordering for the new data frame that you will create in the next step. You can use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function for that.
804 | - Now create the data frame `smallest_first_df`, which contains the same information as `planets_df`, but with the planets ordered from smallest to largest diameter. Use the previously created variable `positions` as row indices inside square brackets to achieve this.
805 | - Print `smallest_first_df` to see what you've accomplished.
806 |
807 | *** =hint
808 | - `order(planets_df$diameter)` will give you the ordering of the variable diameter from smallest to largest. This is what you should assign to `positions`.
809 | - Use the variable positions then to select from the data frame `planets_df`: `planets_df[positions, ]`.
810 |
811 | *** =pre_exercise_code
812 | ```{r}
813 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
814 | ```
815 |
816 | *** =sample_code
817 | ```{r}
818 | # planets_df is pre-loaded in your workspace
819 |
820 | # Create a desired ordering for planets_df: positions
821 |
822 |
823 | # Create a new, ordered data frame: smallest_first_df
824 |
825 |
826 | # Print smallest_first_df
827 | ```
828 |
829 | *** =solution
830 | ```{r}
831 | # planets_df is pre-loaded in your workspace
832 |
833 | # Create a desired ordering for planets_df: positions
834 | positions <- order(planets_df$diameter)
835 |
836 | # Create a new, ordered data frame: smallest_first_df
837 | smallest_first_df <- planets_df[positions, ]
838 |
839 | # Print smallest_first_df
840 | smallest_first_df
841 | ```
842 |
843 | *** =sct
844 | ```{r}
845 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
846 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
847 | test_object("positions",
848 | incorrect_msg = "It looks like `positions` does not contain the correct ordering of the diameter column.")
849 | test_object("smallest_first_df",
850 | incorrect_msg = "It looks like `smallest_first_df` does not contain the rows of `planets_df` sorted by diameter.")
851 | test_output_contains("smallest_first_df", incorrect_msg = "Finish off by printing `smallest_first_df`.")
852 | success_msg("Wonderful! What does the resulting data frame look like? Order prevailed!")
853 | ```
854 |
855 |
--------------------------------------------------------------------------------
/course.yml:
--------------------------------------------------------------------------------
1 | id: 732
2 | title: Introduction to R (beta)
3 | author_field: Filip Schouwenaars
4 | university: DataCamp
5 | author_bio: Next to being the main developer of DataCamp's interactive courses, Filip
6 | is responsible for everything related to R. Under the motto 'Eat your own dog food',
7 | he leverages the techniques DataCamp teaches its students to perform data analysis
8 | for DataCamp. Filip holds degrees in Electrical Engineering and Artificial Intelligence.
9 |
This course is a rework of the earlier introduction to R course,
10 | built by Jonathan Cornelissen and Martijn Theuwissen, co-founders of DataCamp.
11 | description: With over 2 million users worldwide R is rapidly becoming the leading
12 | programming language in statistics and data science. Every year, the number of R
13 | users grows by 40%, and an increasing number of organizations are using it in their
14 | day-to-day activities.
In this introduction to R, you will master the basics
15 | of this beautiful open source language such as factors, lists and data frames. With
16 | the knowledge gained in this course, you will be ready to undertake your first very
17 | own data analysis.
18 | chapters:
19 | chapter1.Rmd: 1720
20 | chapter2.Rmd: 1721
21 | chapter3.Rmd: 1722
22 | chapter4.Rmd: 1723
23 | chapter5.Rmd: 1724
24 | chapter6.Rmd: 1725
25 |
26 |
--------------------------------------------------------------------------------
/datasets/chapter5.R:
--------------------------------------------------------------------------------
1 | # Create shining_list
2 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson")
3 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"),
4 | ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect"))
5 | shining_list <- list(title = "The Shining", actors = actors_vector, reviews = reviews_factor)
6 | rm(actors_vector, reviews_factor)
7 | save(shining_list, file = "datasets/chapter5.RData")
--------------------------------------------------------------------------------
/datasets/chapter5.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/courses-intro-to-r-beta/b028fe1f2dcdc1eba49e8e4135f6b061ae7dd394/datasets/chapter5.RData
--------------------------------------------------------------------------------
/datasets/chapter6.R:
--------------------------------------------------------------------------------
1 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
2 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
3 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
4 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
5 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
6 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
7 | type_factor <- factor(type)
8 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
9 | names(planets_df) <- c("name", "type", "diameter", "rotation", "has_rings")
10 | rm(planets, type, diameter, rotation, rings, type_factor)
11 | save(planets_df, file = "datasets/chapter6.RData")
--------------------------------------------------------------------------------
/datasets/chapter6.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/courses-intro-to-r-beta/b028fe1f2dcdc1eba49e8e4135f6b061ae7dd394/datasets/chapter6.RData
--------------------------------------------------------------------------------
/refguides/chapter1_refguide.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: html_document
3 | ---
4 |
5 | ## R: The true basics
6 |
7 |
8 | ### R as calculator
9 |
10 | After R is started, a console awaits your input. At the prompt (`>`), you can enter numbers and perform calculations.
11 |
12 | ```{r}
13 | 3*2
14 | ```
15 | A few arithmetic operators are:
16 |
17 | - Addition: `+`
18 | - Subtraction: `-`
19 | - Multiplication: `*`
20 | - Division: `/`
21 | - Exponentiation: `^`
22 | - Modulo: `%%`
23 |
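As a quick sketch of the two less familiar operators in the list, exponentiation and modulo (the variable names are only for illustration):

```r
# Exponentiation: 2 to the power 5
power_result <- 2^5        # 32

# Modulo: the remainder of 28 divided by 6
modulo_result <- 28 %% 6   # 4

power_result
modulo_result
```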
24 |
25 | ### Variable assignment and Operations
26 |
27 | You can assign values to variables with the assignment operator `<-`. Just typing the variable by itself at the prompt will print out the value.
28 |
29 | ```{r}
30 | x <- 3
31 | x
32 | y <- 9
33 | y
34 | ```
35 |
36 | You can also perform arithmetic operations with variables. Look at the result of multiplying `x` and `y`, which we defined previously:
37 |
38 | ```{r}
39 | y * x
40 | ```
41 |
42 | As you work in R and create new variables it can be easy to lose track of what variables you have defined. To get a list of all the variables that have been defined use [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls). And if you need to remove variables, you can use `rm()`.
43 |
44 |
45 | ### Comment your code
46 |
47 | Adding comments to your code helps others understand it. Comments in R are ignored by the parser. Any text after the `#` character on the same line is taken to be a comment, unless the `#` character is inside a quoted string. For example,
48 |
49 | ```{r}
50 | x <- 24 # this is a comment
51 | y <- " #... but this is not."
52 | ```
53 |
54 |
55 | ## Basic Data Types
56 |
57 | There are several basic R data types that are of frequent occurrence in routine R calculations. We will try to understand a few of them better by using the [`class()`](http://www.rdocumentation.org/packages/base/functions/class) function.
58 |
59 | - Decimal values are called numerics in R. You can perform arithmetic operations on them.
60 | ```{r}
61 | x <- 12.3
62 | x
63 | class(x)
64 | ```
65 |
66 | - A special type of numeric is an integer. You can specify that a number is an integer using the following syntax.
67 | ```{r}
68 | y <- 3L
69 | y
70 | class(y)
71 | ```
72 |
73 | Another way is to invoke the [`as.integer()`](http://www.rdocumentation.org/packages/base/functions/integer) function.
74 |
75 | ```{r}
76 | y <- as.integer(3)
77 | y
78 | class(y)
79 | ```
80 |
81 | And one can convert an integer value to a numeric value by [`as.numeric()`](http://www.rdocumentation.org/packages/base/functions/numeric).
82 |
83 | - A character object is used to represent string values in R.
84 |
85 | ```{r}
86 | z <- "Good morning!"
87 | z
88 | class(z)
89 | ```
90 |
91 | You can convert objects into character values with the [`as.character()`](http://www.rdocumentation.org/packages/base/functions/numeric) function.
92 |
93 | - Another important data type is the logical type. It has two possible values, `TRUE` and `FALSE`.
94 |
95 | ```{r}
96 | a <- TRUE
97 | a
98 | class(a)
99 | ```
100 |
101 | You can also check the data type of a variable by invoking one of the following `is.*()` functions. The result is a logical value, `TRUE` or `FALSE`.
102 |
103 | ```{r, eval=FALSE}
104 | is.numeric() #to evaluate if type = numeric
105 | is.integer() #to evaluate if type = integer
106 | is.character() #to evaluate if type = character
107 | ```
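
A short sketch of these checks applied to concrete values (the values are chosen only for illustration):

```r
is.numeric(12.3)    # TRUE
is.numeric(3L)      # TRUE: integers also count as numeric
is.integer(3)       # FALSE: without the L suffix, 3 is stored as a double
is.integer(3L)      # TRUE
is.character("3")   # TRUE
```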
108 |
109 |
110 |
111 |
112 |
113 |
--------------------------------------------------------------------------------
/refguides/chapter2_refguide.Rmd:
--------------------------------------------------------------------------------
1 |
2 | ## Create and Name Vectors
3 |
4 | A vector is a sequence of data elements of the same basic type. You can have character, numerical, logical vectors and many more. To create one you can use the [`c()`](http://www.rdocumentation.org/packages/onion/functions/c) function. Here is a numeric vector:
5 |
6 | ```{r}
7 | c(12, 7, 4)
8 | ```
9 |
10 | And a character one, assigned to the variable `types_water`.
11 |
12 | ```{r}
13 | types_water <- c("Fresh", "Brackish", "Seawater")
14 | ```
15 |
16 | As in the previous chapter, you can verify if `types_water` is a vector by
17 |
18 | ```{r}
19 | is.vector(types_water)
20 | ```
21 |
22 | You can also name the elements of your vector by the [`names()`](http://www.rdocumentation.org/packages/nlme/functions/Names) function or create a named vector to begin with.
23 |
24 | ```{r}
25 | numeric_vector <- c(12, 7, 4)
26 | name <- c("months/year", "days/week", "weeks/month")
27 | names(numeric_vector) <- name
28 | numeric_vector
29 | ```
30 |
31 | In our example, the `numeric_vector` contains three elements. To see how many elements your vector contains, use:
32 |
33 | ```{r}
34 | length(numeric_vector)
35 | ```
36 |
37 | ### Important Note
38 | A vector can only contain elements of the same type. If you try to build a vector of different data types, R performs coercion. It transforms all the elements to the same data type.
39 |
40 | ```{r}
41 | new_vector <- c("ice-cream", TRUE, 2)
42 | new_vector
43 | ```
44 |
45 | You can verify that now `new_vector` is a character vector by invoking:
46 |
47 | ```{r}
48 | class(new_vector)
49 | ```
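
Coercion always moves towards the more general type. As a small sketch with made-up values, logicals become numbers when mixed with numerics, and everything becomes a character string when a character is present:

```r
# Logical mixed with numeric: TRUE becomes 1
logical_and_numeric <- c(TRUE, 2)
class(logical_and_numeric)        # "numeric"
logical_and_numeric               # 1 2

# Numeric mixed with character: 2 becomes "2"
numeric_and_character <- c(2, "ice-cream")
class(numeric_and_character)      # "character"
```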
50 |
51 |
52 | ## Vector Calculus
53 |
54 | Computations on vectors are performed in an element-wise fashion. Check it out.
55 |
56 | ```{r}
57 | vector1 <- c(5, 2, 39, 106)
58 | vector2 <- c(300, 5, 1 , 0)
59 | vector3 <- vector1 + vector2
60 | vector3
61 | ```
62 |
63 | You can use all the arithmetic operations you learned in Chapter 1!
64 |
65 | If you want to sum up all the elements of your vector you can use
66 |
67 | ```{r}
68 | sum(vector3)
69 | ```
70 |
71 | Moreover, you can use relational operators like `<` and `>` to compare two vectors. Remember the comparison is performed again element-wise.
72 |
73 | ```{r}
74 | vector1 < vector2
75 | ```
76 |
77 | ## Vector Subsetting
78 |
79 | Suppose you need to select an element of your vector. You can use `[...]`.
80 |
81 | ```{r}
82 | numeric_vector[1]
83 | ```
84 |
85 | The number inside the brackets corresponds to the element you want to select, here the first one has been selected.
86 |
87 | If you are dealing with a named vector you can use the names of the elements to select them.
88 |
89 | ```{r}
90 | numeric_vector["months/year"]
91 | ```
92 |
93 | If you need to select more than one element, you can use another vector! Take a minute to understand the following syntax:
94 |
95 | ```{r}
96 | numeric_vector[c(1,3)]
97 | ```
98 |
99 | And also the order matters!
100 |
101 | ```{r}
102 | numeric_vector[c(3,1)]
103 | ```
104 |
105 | If you want to select all but one element:
106 |
107 | ```{r}
108 | numeric_vector[-2]
109 | ```
110 |
111 | Notice that the last and the [ante-penultimate](https://en.wiktionary.org/wiki/antepenultimate) examples give the same result!
112 |
113 | Another way to subset a vector is to use a logical vector. The logical vector has to have the same length as the one you want to subset. Only the elements that correspond to `TRUE` will be kept.
114 |
115 | ```{r}
116 | numeric_vector[c(TRUE, FALSE, TRUE)]
117 | ```
118 |
119 | Again we get the same result! If your logical vector is shorter than the vector you are subsetting, R recycles the logical vector that you passed until it has the same length as the one you subset. For further explanations about how recycling works, go to the videos.
120 |
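As a minimal sketch of recycling, the example below redefines `numeric_vector` so that it runs on its own and subsets it with a logical vector that is too short:

```r
numeric_vector <- c(12, 7, 4)

# c(TRUE, FALSE) is recycled to c(TRUE, FALSE, TRUE)
recycled_subset <- numeric_vector[c(TRUE, FALSE)]
recycled_subset   # 12 4
```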
121 |
122 |
--------------------------------------------------------------------------------
/refguides/chapter3_refguide.Rmd:
--------------------------------------------------------------------------------
1 | ## Create and Name Matrices
2 |
3 | Matrices are not very different from vectors; both are data structures that store elements of the same type:
4 |
5 | A matrix is a 2-dimensional array consisting of rows and columns. Matrices and dataframes are different since the former can only contain elements of a single type and can be considered as a natural extension of a vector.
6 | You can build them easily with the function [`matrix()`](http://www.rdocumentation.org/packages/gmp/functions/matrix).
7 |
8 | ```{r, eval= F }
9 | matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
10 | ```
11 |
12 | Only the number of rows `nrow` and columns `ncol` need to be specified. However, the argument `byrow` can be used to specify whether the matrix is filled up row-wise or column-wise.
13 |
14 | ```{r}
15 | my_matrix <- matrix(c(9,2,5, 1,3,4, 1,2,7), nrow = 3, ncol = 3, byrow = TRUE)
16 | my_matrix
17 | ```
18 |
19 |
20 | By default, the rows and columns do not have names. The argument `dimnames` can change that by defining a list with names such as `dimnames = list(c("r1", "r2", ...), c("c1", "c2", ...))`, depending on the number of rows and columns.
21 |
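A small sketch of `dimnames` in action, assuming a made-up 2-by-2 matrix (`named_matrix` and the names `r1`, `c1`, and so on are placeholders):

```r
# Hypothetical 2 x 2 matrix with row and column names
named_matrix <- matrix(1:4, nrow = 2, ncol = 2,
                       dimnames = list(c("r1", "r2"), c("c1", "c2")))

rownames(named_matrix)     # "r1" "r2"
colnames(named_matrix)     # "c1" "c2"

# Names can then be used for subsetting
named_matrix["r1", "c2"]   # 3
```
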
22 | Since matrices are just several vectors that you can put together, they can also be built by pasting rows or columns with [`rbind()`](http://www.rdocumentation.org/packages/dplyr/functions/rbind) or [`cbind()`](http://www.rdocumentation.org/packages/marray/functions/cbind) instead of using the function [`matrix()`](http://www.rdocumentation.org/packages/gmp/functions/matrix).
23 |
24 | A matrix is an atomic vector with a dimension attribute. Thus, it is possible to create a matrix based on two matrices that do not necessarily contain numbers, as seen in the first video exercise. If the element types differ, R applies coercion to a common type; the result is still a matrix, not a dataframe or a list.
25 |
26 |
27 | ## Subsetting Matrices
28 | Like any other data object, you can draw subsets from matrices. Use the square brackets `[]` on the matrix object and specify the row and column to be extracted. You can then obtain a single matrix element:
29 |
30 | ```{r}
31 | my_matrix[1, 2]
32 | ```
33 |
34 | Otherwise, entire rows and columns can be selected by simply specifying the number of the row or the column:
35 |
36 | ```{r}
37 | my_matrix[1,]
38 | my_matrix[, 2]
39 | ```
40 |
41 | If only a single number is defined, R returns the value of the position defined inside of the subset:
42 |
43 | ```{r}
44 | my_matrix[1]
45 | ```
46 |
47 | Remark: R counts the positions inside of a matrix from the first row value to the last row value in the first column, then the first row value to the last row value in the second column.
48 |
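The column-by-column counting can be checked with a small sketch. The matrix `m` below is made up for illustration and filled row-wise, so its stored order differs from its printed order:

```r
# Hypothetical matrix, filled row by row:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

m[2]   # 4: second position is row 2 of column 1
m[3]   # 2: third position is row 1 of column 2
```
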
49 | Furthermore, you can select multiple elements of a matrix by passing vectors of row or column indices. As seen in the video lectures, you can use the concatenate function [`c()`](http://www.rdocumentation.org/packages/onion/functions/c) to retain either several values from a single row or column, or a sub-matrix.
50 |
51 | ```{r}
52 | my_matrix[2, c(2, 3)]
53 | my_matrix[c(2, 3), c(2, 3)]
54 | ```
55 |
56 | In a similar manner, matrices can be subsetted by names instead of indices of rows and columns.
57 | Alternatively, logical vectors can be used to subset, when both rows and columns are defined!
58 |
59 | ```{r}
60 | my_matrix[c(F, F, T), c(F, F, T)]
61 | ```
62 |
63 | ## Matrix calculus
64 |
65 | R has two easy functions to let you sum up the values of the rows and columns:
66 |
67 | * [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums)
68 | * [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums)
69 |
70 | And of course any arithmetic operation can be performed on a matrix as well:
71 |
72 | * multiplication by a scalar
73 | * any other operations (`/`, `+`, `-`)
74 |
75 | In general, all matrix operations are done element-wise.
76 |
77 | Remark: Recycling is done automatically when a calculation involves two matrices of unequal size or a matrix and a vector. This has to be handled very carefully, since R might recycle in a way you don't want it to.
78 |
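As a hedged sketch of recycling with a matrix and a shorter vector (`m` is a made-up matrix), note that the vector is reused down the columns, not across the rows:

```r
# Hypothetical matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2, ncol = 3)

# c(10, 20) is recycled column-wise: 10 is added to row 1, 20 to row 2
recycled_sum <- m + c(10, 20)
recycled_sum
#      [,1] [,2] [,3]
# [1,]   11   13   15
# [2,]   22   24   26
```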
79 |
80 |
81 |
82 |
83 |
84 |
--------------------------------------------------------------------------------
/refguides/chapter4_refguide.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: html_document
3 | ---
4 | ## Factors
5 |
6 | Factors are defined as categorical variables that take on a few values. To define a variable as categorical use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor).
7 |
8 | What does R do?
9 |
10 | * scans all values and defines the distinct ones as factor levels
11 | * sorts the levels alphabetically
12 | * maps character values to integer values (handy in the case of long character strings)
13 |
14 | ## Rename factors
15 |
16 | Moreover, the order has to be specified manually inside of the factor using the `levels` argument in the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function.
17 |
18 | ```{r, eval=FALSE}
19 | factor(my_var, levels = c("xy","xz","zy"))
20 | ```
21 |
22 | And the level names can be defined manually using the [`levels()`](http://www.rdocumentation.org/packages/base/functions/levels) function
23 |
24 | ```{r, eval=FALSE}
25 | levels(my_var) = c("na_xy", "na_xz", "na_zy")
26 | ```
27 |
28 | or by using the `labels` argument inside of the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor)
29 |
30 | ```{r, eval=F}
31 | factor(my_var, labels = c("na_xy","na_xz","na_zy"))
32 |
33 | ```
34 |
35 | Remark: To rename levels, you always have to follow the original order of the levels. To avoid confusion and misspecification, it is suggested to use both `levels` and `labels` inside [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor).
36 |
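A minimal sketch of using `levels` and `labels` together, assuming a made-up survey vector (`survey_vector` and `sex_factor` are hypothetical names). Each label is matched to the level at the same position:

```r
# Hypothetical raw data
survey_vector <- c("m", "f", "f", "m")

# "f" is renamed to "Female" and "m" to "Male"
sex_factor <- factor(survey_vector,
                     levels = c("f", "m"),
                     labels = c("Female", "Male"))

levels(sex_factor)         # "Female" "Male"
as.character(sex_factor)   # "Male" "Female" "Female" "Male"
```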
37 |
38 | ## Nominal vs Ordinal
39 |
40 | **Ordinal** variables have a natural order among their levels, whereas **nominal** variables do not have any such order.
41 |
42 | The first thing to know about **ordinal** variables is that they are also defined with [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor), but with the argument `ordered` specified as `TRUE`.
43 | R orders the levels alphabetically, unless specified otherwise.
44 | The ordinal structure is quite specific:
45 | * it is respected in comparisons and operations
46 | * it is reflected by `<` and `>` signs
47 |
48 | For example:
49 | ```{r, eval=F}
50 | factor(my_ordinal, ordered = TRUE, levels = c(1, 2, 3))
51 | ```
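
As a small runnable sketch with made-up data (`sizes` is hypothetical), an ordered factor lets you compare elements directly:

```r
# Hypothetical ordinal variable
sizes <- factor(c("small", "large", "medium"),
                ordered = TRUE,
                levels = c("small", "medium", "large"))

# Comparisons follow the level order, not the alphabet
sizes[1] < sizes[2]        # TRUE: small < large

as.character(max(sizes))   # "large"
```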
52 |
53 |
54 |
55 |
56 |
--------------------------------------------------------------------------------
/refguides/chapter5_refguide.Rmd:
--------------------------------------------------------------------------------
1 |
2 | ## Lists
3 |
4 | A list is a generic vector containing other objects. Unlike with vectors, there is no need for the objects to be of the same type. For example, a list could consist of a numeric vector, a logical value, a matrix, other lists, and so on.
5 |
6 | ```{r}
7 | my_family <- list("Ryan", "Mary", 3, TRUE)
8 | my_family
9 | ```
10 |
11 | Components of lists may also be named. You can assign names to list elements by the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function or at the time of creation.
12 |
13 | ```{r}
14 | my_family <- list(father="Ryan", mother="Mary", siblings=3, divorced=TRUE)
15 | my_family
16 | ```
17 |
18 | If you want to know if your object, `my_family` in our case, is a list, you can use the following.
19 |
20 | ```{r}
21 | is.list(my_family)
22 | ```
23 |
24 | Finally, you can use `str()` to display the structure of your list.
25 |
26 | ```{r}
27 | str(my_family)
28 | ```
29 |
30 |
31 | ## Subset and Extend Lists
32 |
33 | If you need to isolate parts of your list, you can use `[...]` and `[[...]]`. Indexing with `[...]`, as used to subset vectors, will give you a sublist, not the content inside the element. To retrieve the content, you need to use `[[...]]`. This approach will allow you to access a single element at a time.
34 |
35 | ```{r}
36 | my_family[1]
37 | my_family[[1]]
38 | ```
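
The difference is easiest to see by checking the class of each result. The sketch below redefines the unnamed `my_family` list so that it runs on its own:

```r
my_family <- list("Ryan", "Mary", 3, TRUE)

# Single brackets return a list of length 1
class(my_family[1])     # "list"

# Double brackets return the element itself
class(my_family[[1]])   # "character"
my_family[[1]]          # "Ryan"
```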
39 |
40 | If you want to retrieve more than one element of your list:
41 |
42 | ```{r}
43 | my_family[c(1,3)]
44 | ```
45 |
46 | If the list is named, its elements can be referred to by name instead of by numeric index.
47 |
48 | ```{r}
49 | my_family[["father"]]
50 | ```
51 |
52 | Alternatively, you can use the `$` operator.
53 | ```{r}
54 | my_family$father
55 | ```
56 |
57 | Another way to subset is to use a logical vector, made up of `TRUE` and `FALSE` values.
58 |
59 | ```{r}
60 | my_family[c(TRUE, FALSE, TRUE, FALSE)]
61 | ```
62 |
63 | Adding new elements is easy. You simply assign values using new tags and it will pop into action.
64 |
65 | ```{r}
66 | grandparents <- c("Arthur","Josephin")
67 | my_family$grandparents <- grandparents
68 | my_family
69 | ```
70 |
71 | or equivalently
72 | ```{r}
73 | my_family[["grandparents"]] <- grandparents
74 | ```
75 |
76 |
77 |
78 |
--------------------------------------------------------------------------------
/refguides/chapter6_refguide.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: html_document
3 | ---
4 | ## Explore dataframes
5 |
6 | Data sets
7 |
8 | * consist of observations
9 | * corresponding to variables
10 | * stored in a dataframe
11 |
12 | Matrices, on the other hand, can hold only a single data type, and lists would require too much coding.
13 |
14 | What is a dataframe?
15 |
16 | * built to specifically store data
17 | * matrix form: with rows as observations and columns as variables
18 | * allows for elements of all types (logicals, numerics, characters)
19 |
20 | How to create a dataframe?
21 |
22 | * import data from CSV files
23 | * import from a database (e.g. SQL)
24 | * import from other statistical software, etc.
25 |
26 | Remark: A dataframe is basically a list with one element per column. Each element is a vector whose length equals the number of observations, so all elements must have the same length.
27 |
28 | In general, you can define a data frame inside R using the function [`data.frame()`](http://www.rdocumentation.org/packages/R.utils/functions/dataFrame).
29 |
30 | ```{r, eval = F}
31 | data.frame(..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
32 | stringsAsFactors = default.stringsAsFactors())
33 | ```
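
A minimal sketch of building a dataframe by hand. The column names and values (`name`, `age`, `member`) are made up for illustration, and `stringsAsFactors = FALSE` is passed explicitly so the result does not depend on the R version's default.

```r
# Hypothetical example data: three observations of three variables
my_df <- data.frame(
  name = c("Anna", "Ben", "Carl"),
  age = c(28, 35, 41),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(my_df)       # one line per column, each with its own type
is.list(my_df)   # TRUE: a dataframe is a list of equal-length columns
length(my_df)    # number of columns
nrow(my_df)      # number of observations
```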
34 |
35 | ## Subset a dataframe
36 |
37 | Because a dataframe behaves both like a list and like a matrix, you can use the subsetting syntax of both.
38 |
39 | As with a matrix, use single square brackets and specify a row and a column:
40 |
41 | ```{r, eval = FALSE}
42 | my_df[3, 2]
43 | ```
44 |
45 | The indices can be column names as well!
46 | As before, to get only one of the rows, specify the row you want to keep and leave the column argument empty. The same applies to keeping only one variable but all the observations.
47 |
48 | ```{r, eval=FALSE}
49 | my_df[3, ] # only the third row is selected
50 | my_df[ , 2] # only the second column is selected
51 |
52 | ```
53 |
54 | This can be generalized to the situation where you want to select only some variables but keep all the observations; or select only a few observations:
55 |
56 | ```{r, eval=FALSE}
57 | my_df[c(3,2), c(3,2)]
58 | ```
59 |
60 | Remark: Selecting several rows and columns this way returns a _new dataframe_ and not a vector. Selecting a single column, as in `my_df[ , 2]`, returns a vector unless you add `drop = FALSE`.
61 |
62 | How to use the list syntax to select elements?
63 |
64 | * Either by using the dollar sign (`$`)
65 |
66 | ```{r, eval= F}
67 | my_df$variable1
68 | ```
69 |
70 | * Or by using double brackets (`[[...]]`)
71 |
72 | ```{r, eval = F}
73 | my_df[["variable1"]]
74 | ```
75 |
76 | Remark: Now, the result is a vector. If single square brackets are used instead of double square brackets, as in `my_df["variable1"]`, the result is a _new dataframe_ (which is itself a list) containing only that column.
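
The subsetting forms of this section can be compared side by side in a small sketch. The dataframe and its column names `variable1` and `variable2` are hypothetical stand-ins for your own data.

```r
my_df <- data.frame(
  variable1 = c(10, 20, 30),
  variable2 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

cell <- my_df[3, 2]                            # matrix style: a single value
row3 <- my_df[3, ]                             # one row, still a dataframe
col_vec <- my_df[, "variable1"]                # one column, returned as a vector
col_df <- my_df[, "variable1", drop = FALSE]   # one column, kept as a dataframe

by_dollar <- my_df$variable1                   # list style: a vector
by_double <- my_df[["variable1"]]              # list style: the same vector
by_single <- my_df["variable1"]                # list style: a one-column dataframe
```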
77 |
78 | ## Extend your dataframe
79 |
80 | You can extend your dataframe by adding a column:
81 |
82 | * by using the dollar sign (`$`)
83 | ```{r, eval=F}
84 | my_df$new_column <- new_column
85 | ```
86 |
87 | * by using double square brackets (`[[...]]`)
88 | ```{r, eval=F}
89 | my_df[["new_column"]] <- new_column
90 | ```
91 |
92 | * using [`cbind()`](http://www.rdocumentation.org/packages/marray/functions/cbind)
93 | ```{r, eval=FALSE}
94 | cbind(my_df, new_column)
95 | ```
96 |
97 | The dataframe can also be extended by adding rows. Since a row can contain values of different types, it corresponds to a list rather than a vector, so it is necessary to create a new dataframe with [`data.frame()`](http://www.rdocumentation.org/packages/R.utils/functions/dataFrame), using the same column names, and combine it with the original one.
98 |
99 | * by using [`rbind()`](http://www.rdocumentation.org/packages/dplyr/functions/rbind)
100 | ```{r, eval=FALSE}
101 | rbind(my_df, new_df)
102 | ```
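
A minimal sketch of both directions of growth, using made-up data. Note that `cbind()` and `rbind()` return a new dataframe and leave `my_df` untouched, so the result has to be assigned. The names `wider`, `longer`, `member`, and `new_df` are illustrative.

```r
my_df <- data.frame(name = c("Anna", "Ben"), age = c(28, 35), stringsAsFactors = FALSE)

# Add a column: the vector needs one value per existing row
member <- c(TRUE, FALSE)
wider <- cbind(my_df, member)

# Add a row: build a one-row dataframe with matching column names first
new_df <- data.frame(name = "Carl", age = 41, stringsAsFactors = FALSE)
longer <- rbind(my_df, new_df)

dim(wider)    # 2 rows, 3 columns
dim(longer)   # 3 rows, 2 columns
```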
103 |
104 | ## Sort a dataframe
105 |
106 | In general, the function [`sort()`](http://www.rdocumentation.org/packages/arules/functions/sort)
107 | can be applied to a vector. However, to sort the rows of a data frame by one of its variables, you can use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function.
108 |
109 | ```{r, eval= F}
110 | rank <- order(my_df$variable1)
111 | ```
112 |
113 | The order function
114 |
115 | * returns a vector of positions that would put its input in sorted order
116 | * the first value is the position of the smallest element, the second value the position of the next smallest, and so on
117 | * [`order()`](http://www.rdocumentation.org/packages/base/functions/order) can be used inside of a subset
118 |
119 | ```{r, eval=FALSE}
120 | my_df[order(my_df$variable1, decreasing = TRUE), ]
121 | ```
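
To see what `order()` returns and how it rearranges rows, here is a small sketch on made-up data. The column names `variable1` and `variable2` are hypothetical.

```r
my_df <- data.frame(
  variable1 = c(30, 10, 20),
  variable2 = c("c", "a", "b"),
  stringsAsFactors = FALSE
)

positions <- order(my_df$variable1)   # 2 3 1: row 2 holds the smallest value
sorted_up <- my_df[positions, ]       # rows in increasing order of variable1
sorted_down <- my_df[order(my_df$variable1, decreasing = TRUE), ]
```

The original row numbers are kept as row names, so you can still trace each row back to its position in `my_df`.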
122 |
123 | For more information, have a look at the exercises!
124 |
--------------------------------------------------------------------------------
/refguides/chapter7_refguide.Rmd:
--------------------------------------------------------------------------------
1 |
2 | ## Basic Graphics
3 |
4 | One of the most frequently used plotting functions in R is [`plot()`](http://www.rdocumentation.org/packages/graphics/functions/plot). This is a generic function: the type of plot produced depends on the type or class of the argument(s).
5 |
6 | ```{r, eval=FALSE}
7 | x <- c(1, 2, 3, 4)
8 | plot(x) # this generates a plot of the values in the variable against their index
9 |
10 | x <- factor(c("Black", "White", "Green"))
11 | plot(x) # this generates a bar chart
12 | ```
13 |
14 | ```{r, eval=FALSE}
15 | x <- c(1, 2, 3)
16 | y <- c(1, 2, 3)
17 | plot(x, y) # this generates a scatter plot
18 |
19 | x <- factor(c("Black", "White", "Green"))
20 | y <- c(1, 2, 3)
21 | plot(x, y) # this generates boxplots of y for each level of x
22 |
23 | x <- factor(c("Black", "White", "Green"))
24 | y <- factor(c("Left", "Right", "Centre"))
25 | plot(x, y) # this generates a stacked bar chart (a spine plot)
26 | ```
27 |
28 | Histograms can be created using the [`hist()`](http://www.rdocumentation.org/packages/graphics/functions/hist) function. This function takes in a continuous variable, `x`, for which the histogram is plotted.
29 |
30 | ```{r, eval=FALSE}
31 | hist(x, breaks = 10) # x must be numeric; 10 is only an example value
32 | ```
33 |
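34 | With the `breaks` argument you can specify the number of bins you want in the histogram. When you pass a single number, R treats it as a suggestion and may choose a nearby number of bins that gives round break points.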
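
A minimal sketch for inspecting the bins that `hist()` actually uses. The data are random numbers generated only for illustration, and `plot = FALSE` computes the histogram without drawing it.

```r
set.seed(1)          # make the random example reproducible
x <- rnorm(100)      # hypothetical continuous variable

h <- hist(x, breaks = 5, plot = FALSE)   # ask for roughly 5 bins
h$breaks             # the break points R chose
length(h$counts)     # the number of bins actually used
sum(h$counts)        # every observation falls into exactly one bin

hist(x, breaks = 5)  # draw the histogram
```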
35 |
36 | You can also check other graphics functions such as [`boxplot()`](http://www.rdocumentation.org/packages/graphics/functions/boxplot) and [`barplot()`](http://www.rdocumentation.org/packages/raster/functions/barplot).
37 |
38 |
39 | ## Customizing Plots
40 |
41 | Now you can customize your plot!
42 |
43 | ```{r,eval=FALSE}
44 | plot(x,y,
45 | xlab = " ", # changes the label of the horizontal axis
46 | ylab = " ", # changes the label of the vertical axis
47 | main = " ", # specifies the title of the plot
48 | type = " ", # specifies the type of the plot, e.g. "l" for lines or "p" for points
49 | col = " ") # specifies the color of the plot
50 | ```
51 |
52 | Type `?par` in your console to take a peek at the graphical parameters you can specify.
53 |
54 | A few of them are
55 |
56 | ```{r,eval=FALSE}
57 | plot(x,y,
58 | xlab = " ",
59 | ylab = " ",
60 | main = " ",
61 | type = " ",
62 | col = " ",
63 | col.main = " ", # specifies the color of the main title
64 | cex.axis = 1, # specifies the magnification of the axis annotation (1 is the default size)
65 | lty = 1, # specifies the line type (1 is a solid line)
66 | pch = 1) # specifies the plot symbol (1 is an open circle)
67 | ```
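
With the placeholders filled in, a call could look like the sketch below. The data and every argument value are illustrative choices, not recommendations.

```r
x <- 1:10
y <- x^2

plot(x, y,
     xlab = "x",                  # label of the horizontal axis
     ylab = "x squared",          # label of the vertical axis
     main = "Quadratic growth",   # title of the plot
     type = "b",                  # both points and lines
     col = "blue",                # color of points and lines
     col.main = "darkgreen",      # color of the title
     cex.axis = 0.8,              # slightly smaller axis annotation
     lty = 2,                     # dashed line
     pch = 17)                    # filled triangle as plot symbol
```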
68 |
69 | ### Important Note
70 | Since all the arguments are specified inside the [`plot()`](http://www.rdocumentation.org/packages/graphics/functions/plot)
71 | function, they are valid only for that specific plot. It is possible, though, to set graphical parameters globally by using the [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par) function.
72 |
73 |
74 | ## Multiple Plots
75 |
76 | R makes it easy to combine multiple plots into one overall graph, using either the [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par)
77 | or [`layout()`](http://www.rdocumentation.org/packages/graphics/functions/layout) function.
78 |
79 | With [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par), you can include the option `mfrow` to create a grid of `nrows` and `ncols` plots that are filled in by row.
80 |
81 | ```{r,eval=FALSE}
82 | par(mfrow = c(nrows, ncols))
83 | ```
84 |
85 | If you use `mfcol`, it fills in the grid by columns.
86 |
87 | ```{r,eval=FALSE}
88 | par(mfcol = c(nrows, ncols))
89 | ```
90 |
91 | In order to reset the graphical parameters you can use:
92 |
93 | ```{r,eval=FALSE}
94 | par(mfrow = c(1, 1))
95 | ```
96 |
97 | Alternatively, save the current settings before you change them:
98 |
99 | ```{r,eval=FALSE}
100 | old_par <- par(no.readonly = TRUE)
101 | ```
102 |
103 | and restore them later with `par(old_par)`.
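
The whole save, change, and restore cycle might look like this minimal sketch. `no.readonly = TRUE` keeps only the parameters that can be set again, which avoids warnings on restore. The plotted data are made up.

```r
old_par <- par(no.readonly = TRUE)   # remember the current settings

par(mfrow = c(1, 2))                 # one row, two columns
plot(1:10)                           # left panel
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5))   # right panel

par(old_par)                         # restore the saved settings
par("mfrow")                         # back to a single plot per page
```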
104 |
105 | Another way is to use the [`layout()`](http://www.rdocumentation.org/packages/graphics/functions/layout) function which divides the plotting space into as many rows and columns as there are in matrix `mat`.
106 |
107 | ```{r,eval=FALSE}
108 | layout(mat, ...)
109 | ```
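
A minimal sketch of a layout matrix, assuming you want one wide plot on top and two smaller ones below. Each number in `mat` says which plot occupies that cell, and the plotted data are made up.

```r
# Plot 1 spans the whole first row; plots 2 and 3 share the second row
mat <- matrix(c(1, 1,
                2, 3), nrow = 2, byrow = TRUE)

n_plots <- layout(mat)               # returns the number of figures in the layout

plot(1:10)                           # goes into region 1
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5))   # goes into region 2
boxplot(c(1, 2, 2, 3, 3, 3, 4, 4, 5))  # goes into region 3
```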
110 |
111 | To go back to a single plot after using `layout()`, use:
112 |
113 | ```{r,eval=FALSE}
114 | layout(1)
115 | ```
116 |
117 |
118 | In order to add more information to your plot, you can use the following functions.
119 |
120 | ```{r,eval=FALSE}
121 | plot(x, y)
122 | abline() # adds one or more straight lines
123 | lines() # adds lines (careful how to specify the arguments, watch video for more info)
124 | points() # adds points
125 | text() # adds text
126 | segments() # adds line segments between pairs of points
127 | ```
128 |
129 | Take a look at the documentation to get more insight into these functions: [`abline()`](http://www.rdocumentation.org/packages/graphics/functions/abline),
130 | [`lines()`](http://www.rdocumentation.org/packages/graphics/functions/lines),
131 | [`points()`](http://www.rdocumentation.org/packages/graphics/functions/points),
132 | [`text()`](http://www.rdocumentation.org/packages/graphics/functions/text),
133 | and [`segments()`](http://www.rdocumentation.org/packages/graphics/functions/segments).
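
Put together, these functions could be used as in the sketch below. The data, coordinates, and label text are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

plot(x, y, main = "Annotated plot")
abline(h = mean(y), lty = 2)               # dashed horizontal line at the mean of y
lines(x, y)                                # connect the points in order
points(3, 5, pch = 19, col = "red")        # highlight one observation
text(3, 5, labels = "highlight", pos = 3)  # label placed above that point
segments(x0 = 1, y0 = 2, x1 = 5, y1 = 6)   # straight segment from first to last point
```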
134 |
135 |
136 |
137 |
--------------------------------------------------------------------------------
/scripts/chapter1_script.md:
--------------------------------------------------------------------------------
1 | ## chapter_1_1 script: R, the true basics
2 |
3 | Hi! My name is Filip and I'm a data scientist at DataCamp. DataCamp is an online data science school. You'll take fun video lessons, like the one you're watching now and solve interactive coding challenges, where you receive instant and detailed feedback. All this happens in the comfort of your browser, so you can immediately start learning the skill of the future.
4 |
5 | In this introduction to R course you will learn about the basics of R, as well as the most common data structures it uses to store data. By the end of this course, you will know how to create these data structures, manipulate them and perform calculations on them to get surprising insights.
6 |
7 | But first things first: the basics of R. It's also called the language for statistical computing, and is one of the most popular languages to do data science, used by tons of companies and universities around the globe in all sorts of fields. Optimizing a financial portfolio? Mapping marketing data? Analyzing outcomes of clinical trials? You name it, R can handle it.
8 |
9 | But why did R become so popular? Well, first of all, it's free to use! Next, R's visualization capabilities are top notch, making it easy to build beautiful plots. It's also easy to create so-called packages, which are extensions to R. R's very active community has created thousands of these packages for many different fields. Last but not least, R is an actual programming language, with a command-line interface for executing code. This is a big plus compared to other point-and-click programs out there. It might take some energy to fully get the hang of it, but fear not: DataCamp is here to help you master R in no time! Let's get started.
10 |
11 | An important component of R is the console. It's a place where you can execute R commands. In DataCamp's interactive interface, the console can be found here. Let's try to calculate the sum of 1 and 2. We simply type 1 + 2 at the prompt in the console and hit Enter. R interprets what you typed and prints the result.
12 |
13 | R is more than a scientific calculator, though. You can also create so-called variables. A variable allows you to store data in R for later use. You can use the less than sign followed by a dash to create a variable. Suppose the height of a rectangle is 2. Let's assign this value 2 to a variable height. In the console, we type height, less than sign, dash, 2:
14 |
15 | This time, R does not print anything, because it assumes that you will be using this variable in the future. If you now simply type and execute height in the console, R returns 2:
16 |
17 | We can do a similar thing for the width of our imaginary rectangle. We assign the value 4 to a variable width.
18 |
19 | Typing width gives us 4, great.
20 |
21 | As you're assigning variables in the R console, you're actually building up the R workspace. It's the place where R variables 'live'. You can list all variables with the `ls()` function. Simply type ls followed by empty parentheses and hit enter.
22 |
23 | This shows you a list of all the variables you have created up to now. There are two objects in your workspace at the moment, height and width. If we try to access a variable that's not in the workspace, depth for example, R throws an error.
24 |
25 | Suppose you now want to find out the area of our imaginary rectangle, which is height multiplied by width. height equals 2, and width equals 4, so the result is 8. Let's also assign this result to a new variable, area.
26 |
27 | Inspecting the workspace again with ls, shows that the workspace contains three objects now: area, height and width.
28 |
29 | Now, this is all great, but what if you want to recalculate the area of your imaginary rectangle when the height is 3 and the width is 6? You'd have to reassign the variables width and height in the console, and then recalculate the area. That's quite some coding you'd have to redo, isn't it?
30 |
31 | This is the place where R scripts come in! An R script is simply a text file with successive lines of R code. Let's create such a script, "rectangle.R", that contains the code that we've written up to now.
32 |
33 | Next, you can run this script. In the DataCamp interface, you can do this with the 'Submit Answer' button. R goes through your code, line by line, executing every command one by one in the console, just as if you are typing each command yourself. The cool thing is, that if you want to change your code, you can simply adapt your script and run it again. Let's change the height to 3 and the width to 6, and rerun the script. The variables are given different values this time, and the output changes accordingly.
34 |
35 | Now it's time for some interactive exercises! Use the console for experimentation, and the R script editor for coding the actual answer. When you hit Submit Answer, your script will be executed, and checked for correctness. DataCamp's tailored feedback will guide you to R mastery!
--------------------------------------------------------------------------------