├── .gitignore
├── README.md
├── chapter1.Rmd
├── chapter2.Rmd
├── chapter3.Rmd
├── chapter4.Rmd
├── chapter5.Rmd
├── chapter6.Rmd
├── course.yml
├── datasets
    ├── chapter5.R
    ├── chapter5.RData
    ├── chapter6.R
    └── chapter6.RData
├── refguides
    ├── chapter1_refguide.Rmd
    ├── chapter2_refguide.Rmd
    ├── chapter3_refguide.Rmd
    ├── chapter4_refguide.Rmd
    ├── chapter5_refguide.Rmd
    ├── chapter6_refguide.Rmd
    └── chapter7_refguide.Rmd
└── scripts
    └── chapter1_script.md


/.gitignore:
--------------------------------------------------------------------------------
 1 | *
 2 | !*.Rmd
 3 | !*.yml
 4 | !README.md
 5 | !.gitignore
 6 | .Rproj.user
 7 | !removed/
 8 | !removed/*.Rmd
 9 | !scripts/
10 | !scripts/*.md
11 | !datasets/
12 | !datasets/*.R
13 | !datasets/*.RData
14 | !refguides/
15 | !refguides/*
16 | 
17 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Introduction to R (beta)
2 | 
3 | Source files for the improved introduction to R course (still in beta)
4 | 
5 | [**Link to Course**](https://www.datacamp.com/courses/732)
6 | 
7 | This course should be updated through [DataCamp Teach](https://www.datacamp.com/teach).
8 | 


--------------------------------------------------------------------------------
/chapter1.Rmd:
--------------------------------------------------------------------------------
  1 | --- 
  2 | title_meta  : Chapter 1
  3 | title       : Intro to basics
  4 | description : "In this chapter, you will take your first steps with R. You will learn how to use the console as a calculator and how to assign variables. You will also get to know the basic data types in R. Let's get started!"
  5 | attachments : 
  6 |   slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch1_slides_v2.pdf
  7 |   
  8 | --- type:VideoExercise lang:r xp:50 skills:1 key:1a1ba28cd5
  9 | ## Meet R
 10 | 
 11 | *** =video_link
 12 | //player.vimeo.com/video/144351865
 13 | 
 14 | *** =video_hls
 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch1_1.master.m3u8
 16 | 
 17 | --- type:NormalExercise lang:r xp:100 skills:1 key:5714863ba8
 18 | ## Your first R script
 19 | 
 20 | In the script on the right you should type R code to solve the exercises. When you hit the _Submit Answer_ button, every line of code in the script is interpreted and executed by R and you get a message that indicates whether or not your code was correct. The output of your submission is shown in the R console.
 21 | 
 22 | You can also execute R commands straight in the console. When you type in the console, your submission will not be checked for correctness! Try, for example, to type in `3 * 4` and hit Enter. R should return `[1] 12`.
 23 | 
 24 | *** =instructions
 25 | In the script, add another line of code that calculates the sum of 6 and 12, and hit the _Submit Answer_ button.
 26 | 
 27 | *** =hint
 28 | Simply add a line of R code that calculates the sum of 6 and 12, just like the example in the sample code!
 29 | 
 30 | *** =pre_exercise_code
 31 | ```{r}
 32 | # no pec
 33 | ```
 34 | 
 35 | *** =sample_code
 36 | ```{r}
 37 | 3 + 4
 38 | ```
 39 | 
 40 | *** =solution
 41 | ```{r}
 42 | 3 + 4
 43 | 6 + 12
 44 | ```
 45 | 
 46 | *** =sct
 47 | ```{r}
 48 | test_output_contains("18", incorrect_msg = "Make sure to add a line of R code that calculates the sum of 6 and 12.")
 49 | success_msg("Awesome! See how the console shows the result of the R code you submitted?")
 50 | ```
 51 | 
 52 | 
 53 | --- type:NormalExercise lang:r xp:100 skills:1 key:ab37530088
 54 | ## Documenting your code
 55 | 
 56 | Adding comments to your code is extremely important to make sure that you and others can understand what your code is about. R makes use of the `#` sign to add comments, just like Twitter!
 57 | 
 58 | Comments are not run as R code, so they will not influence your result. For example, _Calculate 3 + 4_ in the script on the right is a comment and is ignored during execution.
 59 | 
 60 | *** =instructions
 61 | Add another comment in the script on the right, _Calculate 6 + 12_, at the appropriate location.
 62 | 
 63 | *** =hint
 64 | Simply add the line `# Calculate 6 + 12` above the R code that calculates 6 + 12.
 65 | 
 66 | *** =pre_exercise_code
 67 | ```{r}
 68 | # no pec
 69 | ```
 70 | 
 71 | *** =sample_code
 72 | ```{r}
 73 | # Calculate 3 + 4
 74 | 3 + 4
 75 | 
 76 | 
 77 | 6 + 12
 78 | ```
 79 | 
 80 | *** =solution
 81 | ```{r}
 82 | # Calculate 3 + 4
 83 | 3 + 4
 84 | 
 85 | # Calculate 6 + 12
 86 | 6 + 12
 87 | ```
 88 | 
 89 | *** =sct
 90 | ```{r}
 91 | test_output_contains("7", incorrect_msg = "Do not remove the code that calculates 3 + 4.")
 92 | test_student_typed("# Calculate 3 + 4", not_typed_msg = "Do not remove the comment for the code that calculates 3 + 4.")
 93 | test_output_contains("18", incorrect_msg = "Do not remove the code that calculates 6 + 12.")
 94 | test_student_typed(c("# Calculate 6 + 12", "# calculate 6 + 12", "#Calculate 6 + 12", "#calculate 6 + 12",
 95 |                      "# Calculate 6+12", "# calculate 6+12", "#Calculate 6+12", "#calculate 6+12"), 
 96 |                    not_typed_msg = "Make sure to add the comment: `# Calculate 6 + 12`")
 97 | success_msg("Great! Looks better, doesn't it?")
 98 | ```
 99 | 
100 | 
101 | --- type:NormalExercise lang:r xp:100 skills:1 key:9d8a3d0b88
102 | ## R as a calculator
103 | 
104 | In its most basic form R can be used as a scientific calculator. Consider the following arithmetic operators:
105 | 
106 | - Addition: `+`
107 | - Subtraction: `-`
108 | - Multiplication: `*`
109 | - Division: `/`
110 | - Exponentiation: `^`
111 | - Modulo: `%%`
112 | 
113 | The last two might need some explaining:
114 | - The `^` operator raises the number to its left to the power of the number to its right: for example `3^2` equals 9.
115 | - The modulo returns the remainder of the division of the number to the left by the number on its right, for example 5 modulo 3 or `5 %% 3` equals 2.
116 | 
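As a quick illustration of these last two operators, here is a minimal sketch; the numbers are chosen only for demonstration and are not part of the exercise:

```r
# Exponentiation: 3 to the power 2
3^2      # 9

# Modulo: remainder after dividing 5 by 3
5 %% 3   # 2

# A number that divides evenly leaves a remainder of 0
6 %% 3   # 0
```
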
117 | *** =instructions
118 | - Type `2^5` in the script to calculate 2 to the power 5.
119 | - Type `28 %% 6` to calculate 28 modulo 6.
120 | - Click _Submit Answer_ and have a look at the R output in the console.
121 | 
122 | *** =hint
123 | Another example of the modulo operator: `9 %% 2` equals `1`.
124 | 
125 | *** =pre_exercise_code
126 | ```{r}
127 | # no pec
128 | ```
129 | 
130 | *** =sample_code
131 | ```{r}
132 | # Addition
133 | 5 + 5 
134 | 
135 | # Subtraction
136 | 5 - 5 
137 | 
138 | # Multiplication
139 | 3 * 5
140 | 
141 | # Division
142 | (5 + 5) / 2 
143 | 
144 | # Exponentiation
145 | 
146 | 
147 | # Modulo
148 | 
149 | ```
150 | 
151 | *** =solution
152 | ```{r}
153 | # Addition
154 | 5 + 5 
155 | 
156 | # Subtraction
157 | 5 - 5 
158 | 
159 | # Multiplication
160 | 3 * 5
161 | 
162 | # Division
163 | (5 + 5) / 2 
164 | 
165 | # Exponentiation
166 | 2 ^ 5
167 | 
168 | # Modulo
169 | 28 %% 6
170 | ```
171 | 
172 | *** =sct
173 | ```{r}
174 | msg <- "Do not remove the examples that have already been coded for you!"
175 | test_output_contains("5 + 5", incorrect_msg = msg)
176 | test_output_contains("5 - 5", incorrect_msg = msg)
177 | test_output_contains("3 * 5", incorrect_msg = msg)
178 | test_output_contains("(5 + 5)/2", incorrect_msg = msg)
179 | test_output_contains("2^5", incorrect_msg = "Have another look at the exponentiation. Read the instructions carefully.")
180 | test_output_contains("28 %% 6", incorrect_msg = "Have another look at the use of the `%%` operator. Read the instructions carefully.")
181 | success_msg("Nice one!")
182 | ```
183 | 
184 | 
185 | --- type:MultipleChoiceExercise xp:50 skills:1 key:9d8819fb2e
186 | ## R's pros and cons
187 | 
188 | As Filip explained in the video, there are things that make R the awesome and immensely popular language that it is today. On the other hand, there are also aspects about R that are less attractive. Which of the following statements are true regarding this statistical programming language developed by Ihaka and Gentleman in the nineties?
189 | 
190 | 1. As opposed to SAS and SPSS, R is completely open-source.
191 | 2. R is open-source, but it's hard to share your code with others since R uses a command-line interface.
192 | 3. It typically takes a long time for new and updated R packages to be released and made available to the public.
193 | 4. R is easy to use, but this comes at the cost of limited graphical abilities.
194 | 5. R works well with large data sets, if the code is properly written and the data fits into the working memory. 
195 | 
196 | *** =instructions
197 | - statements (1) and (2) are correct; the others are false.
198 | - statements (1) and (4) are correct; the others are false.
199 | - statements (1) and (5) are correct; the others are false.
200 | - statements (2) and (4) are correct; the others are false.
201 | - statements (3) and (5) are correct; the others are false.
202 | 
203 | *** =hint
204 | Remember that your data has to fit in the working memory for R to be able to process it.
205 | 
206 | *** =pre_exercise_code
207 | ```{r}
208 | # no pec
209 | ```
210 | 
211 | *** =sct
212 | ```{r}
213 | msg1 = "Remember that the fact that R uses a command-line interface does not make it hard to share code. On the contrary, sharing your results becomes very straightforward because you can easily share R scripts."
214 | msg2 = "R is the perfect tool for creating neat and insightful visualizations. Try again."
215 | msg3 = "Great! Head over to the next exercise and get your hands dirty!"
216 | msg4 = "R uses a command-line interface, which makes it very easy to share one's code. Also, R is very suitable for creating visualizations. Try again."
217 | msg5 = "It's fairly straightforward to write, maintain and share R packages. Try again."
218 | test_mc(3, feedback_msgs = c(msg1, msg2, msg3, msg4, msg5))
219 | ```
220 | 
221 | 
222 | --- type:NormalExercise lang:r xp:100 skills:1 key:6b6fb4974c
223 | ## Variable assignment (1)
224 | 
225 | A variable allows you to store a value or an object in R. You can then later use this variable's name to easily access the value or the object that is stored within this variable. You use `<-` to assign a variable:
226 | 
227 | ```
228 | my_variable <- 4
229 | ```
230 | 
231 | *** =instructions
232 | Complete the code in the editor such that it assigns the value 42 to the variable `x`. Click _Submit Answer_. Notice that when you ask R to print `x`, the value 42 appears.
233 | 
234 | *** =hint
235 | Look at how the value 4 was assigned to `my_variable` in the exercise's assignment. Do the exact same thing in the editor, but now assign 42 to the variable `x`.
236 | 
237 | *** =pre_exercise_code
238 | ```{r}
239 | # no pec
240 | ```
241 | 
242 | *** =sample_code
243 | ```{r}
244 | # Assign the value 42 to x
245 | x <- 
246 | 
247 | # Print out the value of the variable x
248 | x
249 | ```
250 | 
251 | *** =solution
252 | ```{r}
253 | # Assign the value 42 to x
254 | x <- 42
255 | 
256 | # Print out the value of the variable x
257 | x
258 | ```
259 | 
260 | *** =sct
261 | ```{r}
262 | test_error()
263 | test_object("x", 
264 |             undefined_msg = "Make sure to define a variable <code>x</code>.",
265 |             incorrect_msg = "Make sure that you assign the correct value to <code>x</code>.") 
266 | success_msg("Good job! Notice that R does not print the value of a variable to the console when you do the assignment. <code>x <- 42</code> did not generate any output, because R assumes that you will be needing this variable in the future. Otherwise you wouldn't have stored the value in a variable in the first place, right? Proceed to the next exercise!")
267 | ```
268 | 
269 | 
270 | 
271 | 
272 | --- type:NormalExercise xp:100 skills:1 key:a5b8028834
273 | ## Variable assignment (2)
274 | 
275 | Suppose you have a fruit basket with five apples. You want to store the number of apples in a variable with the name `my_apples`. 
276 | 
277 | *** =instructions
278 | - Using `<-`, assign the value 5 to `my_apples` below the first comment.
279 | - Type `my_apples` below the second comment. This will print out the value of `my_apples`.
280 | - After clicking _Submit Answer_, have a look at the console: the number 5 is printed, so R now links the variable `my_apples` to the value 5.
281 | 
282 | *** =hint
283 | Remember that if you want to assign a number or an object to a variable in R, you can make use of the assignment operator `<-`. Alternatively, you can use `=`, but `<-` is widely preferred in the R community.
284 | 
285 | *** =pre_exercise_code
286 | ```{r}
287 | ```
288 | 
289 | *** =sample_code
290 | ```{r}
291 | # Assign the value 5 to the variable called my_apples
292 | 
293 | 
294 | # Print out the value of the variable my_apples
295 | 
296 | ```
297 | 
298 | *** =solution
299 | ```{r}
300 | # Assign the value 5 to the variable called my_apples
301 | my_apples <- 5
302 | 
303 | # Print out the value of the variable my_apples
304 | my_apples
305 | ```
306 | 
307 | *** =sct
308 | ```{r}
309 | test_object("my_apples", incorrect_msg = "Have you correctly assigned 5 to `my_apples`? Write `my_apples <- 5` on a new line in the script.")
310 | test_output_contains("my_apples", incorrect_msg = "Have you explicitly told R to print out the `my_apples` variable to the console? Simply type `my_apples` on a new line.")
311 | success_msg("Great! You could also use `=` for variable assignment, but `<-` is typically preferred.")
312 | ```
313 | 
314 | 
315 | --- type:NormalExercise lang:r xp:100 skills:1 key:a0cb1bea96
316 | ## Variable assignment (3)
317 | 
318 | Every tasty fruit basket needs oranges, so you decide to add six oranges. You decide to create the variable `my_oranges` and assign the value 6 to it. Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now code this in a clear way:
319 | 
320 | ```
321 | my_apples + my_oranges
322 | ```
323 | 
324 | *** =instructions
325 | - Assign to `my_oranges` the value 6.
326 | - Add the variables `my_apples` and `my_oranges` and have R simply print the result.
327 | - Combine the variables `my_apples` and `my_oranges` into a new variable `my_fruit`, which is the total number of pieces of fruit in your fruit basket.
328 | 
329 | *** =hint
330 | `my_fruit` is just the sum of `my_apples` and `my_oranges`. You can use the `+` operator to sum the two and `<-` to assign that value to the variable `my_fruit`.
331 | 
332 | *** =pre_exercise_code
333 | ```{r}
334 | # no pec
335 | ```
336 | 
337 | *** =sample_code
338 | ```{r}
339 | # Assign 5 to my_apples
340 | my_apples <- 5
341 | 
342 | # Assign 6 to my_oranges
343 | 
344 | 
345 | # Add my_apples and my_oranges: print the result
346 | 
347 | 
348 | # Add my_apples and my_oranges: assign to my_fruit
349 | 
350 | ```
351 | 
352 | *** =solution
353 | ```{r}
354 | # Assign 5 to my_apples
355 | my_apples <- 5
356 | 
357 | # Assign 6 to my_oranges
358 | my_oranges <- 6
359 | 
360 | # Add my_apples and my_oranges: print the result
361 | my_apples + my_oranges
362 | 
363 | # Add my_apples and my_oranges: assign to my_fruit
364 | my_fruit <- my_apples + my_oranges
365 | ```
366 | 
367 | *** =sct
368 | ```{r}
369 | test_object("my_apples", incorrect_msg = "Do not change the assignment of the `my_apples` variable!")
370 | test_object("my_oranges")
371 | test_output_contains("my_apples + my_oranges",
372 |                      incorrect_msg = "The output does not contain the result of adding `my_apples` and `my_oranges` (second instruction). Try again.")
373 | test_object("my_fruit")
374 | success_msg("Nice one! The great advantage of doing calculations with variables is reusability. If you just change `my_apples` to equal 12 instead of 5 and rerun the script, `my_fruit` will automatically update as well.")
375 | ```
376 | 
377 | 
378 | --- type:NormalExercise lang:r xp:100 skills:1 key:6192f64167
379 | ## The workspace
380 | 
381 | If you assign a value to a variable, this variable is stored in the workspace. It's the place where all user-defined variables live. The command [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) lists the contents of this workspace. 
382 | 
383 | ```
384 | a <- 1
385 | b <- 2
386 | ls()
387 | ```
388 | 
389 | The first two lines create the variables `a` and `b`. Calling [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) now shows you that `a` and `b` are in the workspace. 
390 | 
391 | You can also remove variables from the workspace. You do this with [`rm()`](http://www.rdocumentation.org/packages/base/functions/rm). `rm(a)`, for example, would remove `a` from the workspace again. `rm(list = ls())`, which is used at the beginning of your script, clears everything from the workspace.
392 | 
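The effect of `rm()` can be checked with a short sketch like the one below. The names `a_listed` and `b_listed` are hypothetical helpers introduced only for this illustration, and the `%in%` operator (which tests whether a value occurs in a vector) is not covered in this chapter:

```r
# Create two variables in the workspace
a <- 1
b <- 2

# Remove a again; b stays behind
rm(a)

# Check which of the two names are still listed by ls()
a_listed <- "a" %in% ls()   # FALSE
b_listed <- "b" %in% ls()   # TRUE
```
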
393 | *** =instructions
394 | - Create a variable, `horses`, equal to 3, and a variable `dogs`, equal to 7.
395 | - List the contents of your workspace with [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) to see that indeed, these two variables are in there.
396 | 
397 | *** =hint
398 | Create `horses` and `dogs` with the `<-` operator, then call [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) without arguments. Leave the [`rm()`](http://www.rdocumentation.org/packages/base/functions/rm) line at the top of the script in place, and let the feedback messages guide you.
399 | 
400 | *** =pre_exercise_code
401 | ```{r}
402 | # no pec
403 | ```
404 | 
405 | *** =sample_code
406 | ```{r}
407 | # Clear the entire workspace
408 | rm(list = ls())
409 | 
410 | # Create the variables horses and dogs
411 | 
412 | 
413 | # List the contents of your workspace
414 | 
415 | 
416 | ```
417 | 
418 | *** =solution
419 | ```{r}
420 | # Clear the entire workspace
421 | rm(list = ls())
422 | 
423 | # Create the variables horses and dogs
424 | horses <- 3
425 | dogs <- 7
426 | 
427 | # List the contents of your workspace
428 | ls()
429 | ```
430 | 
431 | *** =sct
432 | ```{r}
433 | test_student_typed("rm(list = ls())", not_typed_msg = "Do not remove the line `rm(list = ls())`.")
434 | test_object("horses")
435 | test_object("dogs")
436 | test_output_contains('c("dogs", "horses")',
437 |                      incorrect_msg = "Make sure to inspect the objects in your workspace after creating `horses` and `dogs`.")
438 | success_msg("Awesome! You can now build up and inspect your workspace, great!")
439 | ```
440 | 
441 | 
442 | --- type:VideoExercise lang:r xp:50 skills:1 key:9f9019501e
443 | ## Basic Data Types
444 | 
445 | *** =video_link
446 | //player.vimeo.com/video/138173888
447 | 
448 | *** =video_hls
449 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch1_2.master.m3u8
450 | 
451 | 
452 | 
453 | --- type:NormalExercise lang:r xp:100 skills:1 key:1866cdd202
454 | ## Discover Basic Data Types
455 | 
456 | To get started, here are some of R's most basic data types:
457 | 
458 | - Decimal values like `4.5` are called **numerics**.
459 | - Whole numbers like `4L` are called **integers**. Integers are also numerics.
460 | - Boolean values (`TRUE` or `FALSE`) are called **logical**. Capital letters are important here; `true` and `false` are not valid.
461 | - Text (or string) values are called **characters**.
462 | 
463 | Note how the quotation marks on the right indicate that `"some text"` is of type character.
464 | 
465 | *** =instructions
466 | Change the value of the:
467 | 
468 | - `my_numeric` variable to `42`.
469 | - `my_character` variable to `"forty-two"`. Note that the quotation marks indicate that `"forty-two"` is a character.
470 | - `my_logical` variable to `FALSE`.
471 | 
472 | *** =hint 
473 | Replace the values in the script with the values that are provided in the exercise.
474 | ```
475 | my_numeric <- 42
476 | ```
477 | assigns the value 42 to the variable `my_numeric`. 
478 | 
479 | *** =pre_exercise_code
480 | ```{r}
481 | # no pec
482 | ```
483 | 
484 | *** =sample_code
485 | ```{r}
486 | # What is the answer to the universe?
487 | my_numeric <- 42.5
488 | 
489 | # The quotation marks indicate that the variable is of type character
490 | my_character <- "some text"
491 | 
492 | # Change the value of my_logical
493 | my_logical <- TRUE
494 | ```
495 | 
496 | *** =solution
497 | ```{r}
498 | # What is the answer to the universe?
499 | my_numeric <- 42
500 | 
501 | # The quotation marks indicate that the variable is of type character
502 | my_character <- "forty-two"
503 | 
504 | # Change the value of my_logical
505 | my_logical <- FALSE
506 | ```
507 | 
508 | *** =sct
509 | ```{r}
510 | test_object("my_numeric", 
511 |             incorrect_msg = "Make sure that you assign the correct value to `my_numeric`.")
512 | test_object("my_character",
513 |             incorrect_msg = paste("Make sure that you assign the correct value to `my_character`.",
514 |                                   "Do not forget the quotes and beware of capitalization! R is case sensitive!"))
515 | test_object("my_logical",
516 |             undefined_msg = "Please make sure to define a variable `my_logical`.",
517 |             incorrect_msg = "Make sure that you assign the correct value to `my_logical`.") 
518 | success_msg("Great work! Continue to the next exercise.")
519 | ```
520 | 
521 | --- type:NormalExercise lang:r xp:100 skills:1 key:c52153af0b
522 | ## Back to Apples and Oranges
523 | 
524 | Common knowledge tells you not to add apples and oranges. But hey, that is what you just did! The `my_apples` and `my_oranges` variables both contained a number in the previous exercise. The `+` operator works with numeric variables in R. 
525 | 
526 | However, if you try to add a numeric and a character string, R will complain.
527 | 
528 | *** =instructions
529 | - Click _Submit Answer_ and read the error message. Make sure you understand why this did not work.
530 | - Adjust `my_oranges <- "six"` such that R knows you have 6 oranges and thus a fruit basket with 11 pieces of fruit. Click _Submit Answer_ again.
531 | 
532 | *** =hint
533 | You have to assign the numeric value `6` to the `my_oranges` variable instead of the character value `"six"`. Notice how the quotation marks are used to indicate that `"six"` is a character.
534 | 
535 | *** =pre_exercise_code
536 | ```{r}
537 | # no pec
538 | ```
539 | 
540 | *** =sample_code
541 | ```{r}
542 | # Assign a value to the variable my_apples and print it out
543 | my_apples <- 5
544 | my_apples       
545 | 
546 | # Assign a value to the variable my_oranges and print it out
547 | my_oranges <- "six" 
548 | my_oranges 
549 | 
550 | # New variable that contains the total amount of fruit
551 | my_fruit <- my_apples + my_oranges 
552 | my_fruit
553 | ```
554 | 
555 | *** =solution
556 | ```{r}
557 | # Assign a value to the variable my_apples and print it out
558 | my_apples <- 5  
559 | my_apples  
560 | 
561 | # Assign a value to the variable my_oranges and print it out
562 | my_oranges <- 6
563 | my_oranges 
564 | 
565 | # New variable that contains the total amount of fruit
566 | my_fruit <- my_apples + my_oranges 
567 | my_fruit
568 | ```
569 | 
570 | *** =sct
571 | ```{r}
572 | test_object("my_apples", incorrect_msg = "Don't change the code that assigns 5 to `my_apples`.")
573 | test_object("my_oranges", incorrect_msg = "Change the assignment of the `my_oranges` variable such that the code runs without errors.")
574 | test_object("my_fruit", 
575 |             undefined_msg = "Please make sure to define a variable `my_fruit`.",
576 |             incorrect_msg = "Make sure that you assign the correct value to `my_fruit`.")
577 | test_output_contains("my_fruit", incorrect_msg = "The output does not contain the result of adding `my_apples` and `my_oranges`.")
578 | success_msg("Awesome, keep up the good work!")
579 | ```
580 | 
581 | --- type:MultipleChoiceExercise lang:r xp:50 skills:1 key:7806ca24d2
582 | ## What's that data type?
583 | 
584 | When you added the variables containing `5` and `"six"`, you got an error due to a mismatch in data types. You can avoid such embarrassing situations by checking the data type of a variable beforehand:
585 | 
586 | ```
587 | class(my_var)
588 | ```
589 | 
590 | In the workspace (you can see what it contains by typing [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) in the console), some variables have already been defined. Which statement concerning these variables is correct?
591 | 
592 | *** =instructions
593 | - `a`'s class is `integer`, `b` is a `character`, `c` is a `boolean`.
594 | - `a`'s class is `character`, `b` is a `character` as well, `c` is a `logical`.
595 | - `a`'s class is `numeric`, `b` is a `string`, `c` is a `logical`.
596 | - `a`'s class is `numeric`, `b` is a `character`, `c` is a `logical`.
597 | 
598 | *** =hint
599 | You can find out the data type of the `a` variable for example by typing `class(a)`. You can do similar things for `b` and `c`.
600 | 
601 | *** =pre_exercise_code
602 | ```{r}
603 | a <- 42
604 | b <- "forty-two"
605 | c <- FALSE
606 | ```
607 | 
608 | *** =sct
609 | ```{r}
610 | msg1 <- "`boolean` is not the class for logical values. Try again."
611 | msg2 <- "`a` is of the class `numeric`, give it another go."
612 | msg3 <- "`string` is not a class in R. `character` is!"
613 | msg4 <- "Nice one. Let's step it up a notch and start coercing variables!"
614 | test_mc(correct = 4, feedback_msgs = c(msg1, msg2, msg3, msg4))
615 | ```
616 | 
617 | --- type:NormalExercise lang:r xp:100 skills:1 key:c75fe45544
618 | ## Coercion: Taming your data
619 | 
620 | As Filip explained in the video, you can coerce your data from one type to another. Next to the [`class()`](http://www.rdocumentation.org/packages/base/functions/class) function and the `is.*()` functions, you can use the `as.*()` functions to force data to change types.
621 | 
622 | Take this example:
623 | 
624 | ```
625 | var <- "3"
626 | var_num <- as.numeric(var)
627 | ```
628 | 
629 | `var`, a character string, is converted into a numeric using [`as.numeric()`](http://www.rdocumentation.org/packages/base/functions/numeric). The resulting numeric is stored as `var_num`.
630 | 
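To see the change of type for yourself, a minimal sketch along these lines may help. The variable `not_a_number` and the use of `suppressWarnings()` are additions made for this illustration only, showing that coercion does not always succeed:

```r
# A character string that looks like a number
var <- "3"
class(var)        # "character"

# Coerce it to a numeric so it can be used in arithmetic
var_num <- as.numeric(var)
class(var_num)    # "numeric"
var_num + 1       # 4

# Coercion is not always possible: this gives NA, normally with a warning
not_a_number <- suppressWarnings(as.numeric("three"))
is.na(not_a_number)   # TRUE
```
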
631 | *** =instructions
632 | - Convert `var`, a logical, to a character. Assign the resulting character string to the variable `var_char`.
633 | - Inspect the class of `var_char` by using [`class()`](http://www.rdocumentation.org/packages/base/functions/class) on it.
634 | 
635 | *** =hint
636 | Use the [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) function to convert `var` to a character.
637 | 
638 | *** =pre_exercise_code
639 | ```{r}
640 | ```
641 | 
642 | *** =sample_code
643 | ```{r}
644 | # Definition of var
645 | var <- TRUE
646 | 
647 | # Convert var to a character: var_char
648 | 
649 | 
650 | # Display the class of var_char
651 | 
652 | 
653 | ```
654 | 
655 | *** =solution
656 | ```{r}
657 | # Definition of var
658 | var <- TRUE
659 | 
660 | # Convert var to a character: var_char
661 | var_char <- as.character(var)
662 | 
663 | # Display the class of var_char
664 | class(var_char)
665 | ```
666 | 
667 | *** =sct
668 | ```{r}
669 | test_error()
670 | msg <- "Do not remove or change the definition of the variable `var`."
671 | test_object("var", undefined_msg = msg, incorrect_msg = msg)
672 | test_function("as.character", "x",
673 |               not_called_msg = "Make sure to call the function [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) to convert `var` to a character.",
674 |               incorrect_msg = "Have you passed the correct variable to the function [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character)?")
675 | test_object("var_char")
676 | test_function("class", "x", 
677 |               not_called_msg = "Make sure to call the function <code>class()</code> to inspect the class of <code>var_char</code>.",
678 |               incorrect_msg = "Have you passed the correct variable to the function <code>class()</code>?")
679 | success_msg("Bellissimo!")
680 | ```
681 | 


--------------------------------------------------------------------------------
/chapter2.Rmd:
--------------------------------------------------------------------------------
  1 | --- 
  2 | title_meta  : Chapter 2
  3 | title       : Vectors
  4 | description : We take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R! After completing this chapter, you will be able to create vectors in R, name them, select elements from them and compare different vectors.
  5 | attachments : 
  6 |   slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch2_slides.pdf
  7 |   
  8 | --- type:VideoExercise lang:r xp:50 skills:1 key:b91dd847a0
  9 | ## Create and Name Vectors
 10 | 
 11 | *** =video_link
 12 | //player.vimeo.com/video/138173896
 13 | 
 14 | *** =video_hls
 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_1.master.m3u8
 16 | 
 17 | 
 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:2d1cb04427
 19 | ## Create a vector (1)
 20 | 
 21 | Feeling lucky? You better, because we'll take you on a trip to Las Vegas!
 22 | 
 23 | Thanks to R and your new data science skills, you will learn how to improve your performance at the tables and kick off your career as a professional gambler. This chapter will show how you can easily keep track of your betting progress and how you can do some simple analyses on past actions.
 24 | 
 25 | You will use vectors. As Filip explained, vectors are one-dimensional arrays that can hold numeric data, character data or logical data. You create a vector with the combine function [`c()`](http://www.rdocumentation.org/packages/base/functions/c). You place the vector elements, separated by commas, between the parentheses. For example:
 26 | 
 27 | ```
 28 | numeric_vector <- c(1, 2, 3)
 29 | character_vector <- c("a", "b", "c")
 30 | logical_vector <- c(TRUE, FALSE)
 31 | ```
 32 | 
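As a rough check on the example above, you could inspect each vector with `class()` from the previous chapter. The `length()` function, which returns the number of elements in a vector, is brought in here only for illustration:

```r
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
logical_vector <- c(TRUE, FALSE)

# Each vector holds elements of a single basic type
class(numeric_vector)     # "numeric"
class(character_vector)   # "character"
class(logical_vector)     # "logical"

# Number of elements in a vector
length(logical_vector)    # 2
```
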
 33 | *** =instructions 
 34 | Create a vector, `logical_vector`, that contains the three elements: `TRUE`, `FALSE` and `TRUE` (in that order). 
 35 | 
 36 | *** =hint 
 37 | Assign `c(TRUE, FALSE, TRUE)` to the variable `logical_vector` with the `<-` operator.
 38 | 
 39 | *** =pre_exercise_code
 40 | ```{r}
 41 | # no pec
 42 | ```
 43 | 
 44 | *** =sample_code
 45 | ```{r}
 46 | numeric_vector <- c(1, 10, 49)
 47 | character_vector <- c("x", "y", "z")
 48 | 
 49 | # Create logical_vector
 50 | 
 51 | ```
 52 | 
 53 | *** =solution
 54 | ```{r}
 55 | numeric_vector <- c(1, 10, 49)
 56 | character_vector <- c("x", "y", "z")
 57 | 
 58 | # Create logical_vector
 59 | logical_vector <- c(TRUE, FALSE, TRUE)
 60 | ```
 61 | 
 62 | *** =sct
 63 | ```{r}
 64 | msg <- "Do not change how `numeric_vector` and `character_vector` are created!"
 65 | lapply(c("numeric_vector", "character_vector"), test_object, undefined_msg = msg, incorrect_msg = msg)
 66 | test_object("logical_vector", incorrect_msg = "Make sure that you assign the correct values to `logical_vector`. The order matters!")
 67 | success_msg("Perfect! Let's practice some more with vector creation.")
 68 | ```
 69 | 
 70 | --- type:NormalExercise lang:r xp:100 skills:1 key:c6e056b9c3
 71 | ## Create a vector (2)
 72 | 
 73 | After one week in Las Vegas and still zero Ferraris in your garage, you decide that it is time to start using your data science superpowers.
 74 | 
 75 | Before doing your first analysis, you decide to collect all the winnings and losses for the last week: 
 76 | 
 77 | For `poker_vector`: 
 78 | - On Monday you won \$140
 79 | - Tuesday you **lost** \$50
 80 | - Wednesday you won \$20 
 81 | - Thursday you **lost** \$120
 82 | - Friday you won \$240
 83 | 
 84 | For `roulette_vector`: 
 85 | - On Monday you **lost** \$24
 86 | - Tuesday you **lost** \$50
 87 | - Wednesday you won \$100
 88 | - Thursday you **lost** \$350
 89 | - Friday you won \$10
 90 | 
 91 | To be able to use this data in R, you decide to create the variables `poker_vector` and `roulette_vector`.
 92 | 
 93 | *** =instructions
 94 | Assign the winnings/losses for roulette as a vector to the variable `roulette_vector`. Make sure to use the correct order.
 95 | 
 96 | *** =hint
 97 | To help you with this step, the script already contains the code for creating `poker_vector`. Assign the correct values to `roulette_vector` based on the numbers in the assignment. Do not forget that losses are negative numbers.
 98 | 
 99 | 
100 | *** =pre_exercise_code
101 | ```{r}
102 | ```
103 | 
104 | *** =sample_code
105 | ```{r}
106 | # Poker winnings from Monday to Friday
107 | poker_vector <- c(140, -50, 20, -120, 240)
108 | 
109 | # Roulette winnings from Monday to Friday: roulette_vector
110 | 
111 | ```
112 | 
113 | *** =solution
114 | 
115 | ```{r}
116 | # Poker winnings from Monday to Friday
117 | poker_vector <- c(140, -50, 20, -120, 240)
118 | 
119 | # Roulette winnings from Monday to Friday: roulette_vector
120 | roulette_vector <- c(-24, -50, 100, -350, 10)
121 | ```
122 | 
123 | *** =sct
124 | ```{r}
125 | test_object("poker_vector", 
126 |             incorrect_msg = "Don't change how `poker_vector` is defined.")
127 | test_object("roulette_vector", 
128 |             incorrect_msg = paste("Make sure that you assign a vector with the correct values to `roulette_vector`.",
129 |                                   "If you lost money, you should use a `-` sign."))
130 | success_msg("Very good! To check out the contents of your vectors, remember that you can always simply type the variable in the console and hit Enter. Proceed to the next exercise!")
131 | ```
132 | 
133 | 
134 | --- type:NormalExercise lang:r xp:100 skills:1 key:ebb5aae2ff
135 | ## Naming a vector (1)
136 | 
137 | As a data analyst, it is important to have a clear view of the data that you are using. Understanding what each element refers to is essential.
138 | 
139 | In the previous exercise, you created vectors with your winnings over the week. Each vector element refers to a day of the week, but it is hard to tell which element belongs to which day. It would be nice if you could show that in the vector itself. Remember the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function to name the elements of a vector?
140 | 
141 | ```
142 | some_vector <- c("Johnny", "Poker Player")
143 | names(some_vector) <- c("Name", "Profession")
144 | ```
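
As a small sketch of how naming works in both directions, the same `names()` function that sets the names can also read them back. The vector `example_vector` and its values are hypothetical and only serve as an illustration:

```r
# Hypothetical vector, not part of the exercise
example_vector <- c(3, 7)

# Set the names of the elements
names(example_vector) <- c("low", "high")

# Read the names back as a character vector
names(example_vector)

# Printing the vector now shows each value under its name
example_vector
```

Printing a named vector makes it much easier to see which value belongs to which label.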
145 | 
146 | *** =instructions
147 | `poker_vector` has already been named with the days of the week. Do the same thing for `roulette_vector`. Beware: R is case sensitive!
148 | 
149 | *** =hint
150 | Assign `c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")` to `names(roulette_vector)`.
151 | 
152 | *** =pre_exercise_code
153 | ```{r}
154 | ```
155 | 
156 | *** =sample_code
157 | ```{r}
158 | # Poker winnings from Monday to Friday
159 | poker_vector <- c(140, -50, 20, -120, 240)
160 | 
161 | # Roulette winnings from Monday to Friday
162 | roulette_vector <- c(-24, -50, 100, -350, 10)
163 | 
164 | # Add names to poker_vector
165 | names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
166 | 
167 | # Add names to roulette_vector
168 | 
169 | ```
170 | 
171 | *** =solution
172 | ```{r}
173 | # Poker winnings from Monday to Friday
174 | poker_vector <- c(140, -50, 20, -120, 240)
175 | 
176 | # Roulette winnings from Monday to Friday
177 | roulette_vector <- c(-24, -50, 100, -350, 10)
178 | 
179 | # Add names to poker_vector
180 | names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
181 | 
182 | # Add names to roulette_vector
183 | names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
184 | ```
185 | 
186 | *** =sct
187 | ```{r}
188 | msg <- "Do not change the values inside `%s`; they were already coded for you."
189 | test_object("poker_vector", incorrect_msg = sprintf(msg, "poker_vector"))
190 | test_object("roulette_vector", incorrect_msg = sprintf(msg, "roulette_vector"))
191 | msg <- "Make sure that you assign the correct names vector to `%s`. The names of the days should start with a capital letter!"
192 | test_object("poker_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "poker_vector"))
193 | test_object("roulette_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "roulette_vector"))
194 | success_msg("Well done!")
195 | ```
196 | 
197 | 
198 | --- type:NormalExercise lang:r xp:100 skills:1 key:5c026ed9fb
199 | ## Naming a vector (2)
200 | 
201 | If you want to become a good statistician, you have to become lazy. (If you are already lazy, chances are high you are one of those exceptional, natural-born statistical talents!)
202 | 
203 | In the previous exercises you probably experienced that it is boring and frustrating to type and retype information such as the days of the week. However, there is a more efficient way to do this, namely, to assign the days of the week vector to a variable! 
204 | 
205 | Just like you did with your poker and roulette returns, you can also create a variable that contains the days of the week. This way you can use and re-use it. This variable, `days_vector`, has already been coded for you.
206 | 
207 | *** =instructions
208 | - Use the variable `days_vector` to set the names of `poker_vector`.
209 | - Use the variable `days_vector` to set the names of `roulette_vector`.
210 | 
211 | *** =hint
212 | You can use `names(poker_vector) <- ` to set the names of the variable `poker_vector`.
213 | 
214 | *** =pre_exercise_code
215 | ```{r}
216 | # no pec
217 | ```
218 | 
219 | *** =sample_code
220 | ```{r}
221 | # Poker winnings from Monday to Friday
222 | poker_vector <- c(140, -50, 20, -120, 240)
223 | 
224 | # Roulette winnings from Monday to Friday
225 | roulette_vector <- c(-24, -50, 100, -350, 10)
226 | 
227 | # Create the variable days_vector
228 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
229 |  
230 | # Use days_vector to name poker_vector
231 | 
232 | 
233 | # Use days_vector to name roulette_vector
234 | ```
235 | 
236 | *** =solution
237 | ```{r}
238 | # Poker winnings from Monday to Friday
239 | poker_vector <- c(140, -50, 20, -120, 240)
240 | 
241 | # Roulette winnings from Monday to Friday
242 | roulette_vector <- c(-24, -50, 100, -350, 10)
243 | 
244 | # Create the variable days_vector
245 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
246 | 
247 | # Use days_vector to name poker_vector
248 | names(poker_vector) <- days_vector
249 | 
250 | # Use days_vector to name roulette_vector
251 | names(roulette_vector) <- days_vector
252 | ```
253 | 
254 | *** =sct
255 | ```{r}
256 | msg <- "Do not change the values inside `%s`; they were already coded for you."
257 | test_object("poker_vector", incorrect_msg = sprintf(msg, "poker_vector"))
258 | test_object("roulette_vector", incorrect_msg = sprintf(msg, "roulette_vector"))
259 | test_object("days_vector", incorrect_msg = sprintf(msg, "days_vector"))
260 | 
261 | msg <- "Make sure that you assign `days_vector` to the names of `%s`. Use the `names()` function."
262 | test_object("poker_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "poker_vector"))
263 | test_object("roulette_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "roulette_vector"))
264 | 
265 | success_msg("Nice one! A word of advice: try to avoid code duplication at all times.")
266 | ```
267 | 
268 | --- type:VideoExercise lang:r xp:50 skills:1 key:b47466f033
269 | ## Vector Arithmetic
270 | 
271 | *** =video_link
272 | //player.vimeo.com/video/141163398
273 | 
274 | *** =video_hls
275 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_2.master.m3u8
276 | 
277 | 
278 | --- type:NormalExercise lang:r xp:100 skills:1 key:6b17fc50b9
279 | ## Calculate your earnings
280 | 
281 | Now that you understand how R does arithmetic calculations with vectors, it is time to get those Ferraris in your garage! First, you need to understand what the overall profit or loss per day of the week was. The total daily profit is the sum of the profit/loss you realized on poker per day, and the profit/loss you realized on roulette per day.
282 | 
283 | Remember that vector calculations happen element-wise; the following three statements are completely equivalent:
284 | 
285 | ```
286 | c(1, 2, 3) + c(4, 5, 6)
287 | c(1 + 4, 2 + 5, 3 + 6)
288 | c(5, 7, 9)
289 | ```
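
The same element-wise rule applies to named vectors. The sketch below uses two hypothetical vectors, `a` and `b`, with made-up values; when both operands have the same length, the result keeps the names of the first one:

```r
# Hypothetical two-day results, values made up for illustration
a <- c(Mon = 10, Tue = -5)
b <- c(Mon = 1, Tue = 2)

# Element-wise sum: 10 + 1 and -5 + 2
total <- a + b

# The result is still labeled Mon and Tue
total
```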
290 | 
291 | *** =instructions
292 | - Assign to the variable `total_daily` how much you won or lost on each day in total (poker and roulette combined). `total_daily` should be a vector with 5 values.
293 | - Print out `total_daily`.
294 | 
295 | *** =hint
296 | As in the example above, assign the sum of the two vectors to a new variable, `total_daily`.
297 | 
298 | *** =pre_exercise_code
299 | ```{r}
300 | # no pec
301 | ```
302 | 
303 | *** =sample_code
304 | ```{r}
305 | # Casino winnings from Monday to Friday
306 | poker_vector <- c(140, -50, 20, -120, 240)
307 | roulette_vector <- c(-24, -50, 100, -350, 10)
308 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
309 | names(poker_vector) <- days_vector
310 | names(roulette_vector) <- days_vector
311 | 
312 | # Calculate your daily earnings: total_daily
313 | 
314 | 
315 | # Print out total_daily
316 | ```
317 | 
318 | *** =solution
319 | ```{r}
320 | # Casino winnings from Monday to Friday
321 | poker_vector <- c(140, -50, 20, -120, 240)
322 | roulette_vector <- c(-24, -50, 100, -350, 10)
323 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
324 | names(poker_vector) <- days_vector
325 | names(roulette_vector) <- days_vector
326 | 
327 | # Calculate your daily earnings: total_daily
328 | total_daily <- poker_vector + roulette_vector
329 | 
330 | # Print out total_daily
331 | total_daily
332 | ```
333 | 
334 | *** =sct
335 | ```{r}
336 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
337 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
338 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
339 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
340 | test_object("total_daily", 
341 |             incorrect_msg = "Make sure that you assign the sum of `poker_vector` and `roulette_vector` to `total_daily`. Simply use `+`.")
342 | test_output_contains("total_daily", incorrect_msg = "Don't forget to print out `total_daily`.")
343 | success_msg("Great! Continue to the next exercise.")
344 | ```
345 | 
346 | 
347 | --- type:NormalExercise lang:r xp:100 skills:1 key:a9a1a50a31
348 | ## Calculate total winnings: sum()
349 | 
350 | Based on the previous analysis, it looks like you had a mix of good and bad days. This is not what your ego expected, and you wonder if there may be a (very very very) tiny chance you have lost money over the week in total? 
351 | 
352 | You can answer this question using the [`sum()`](http://www.rdocumentation.org/packages/base/functions/sum) function. As mentioned in the video, it calculates the sum of all elements of a vector.
353 | 
354 | *** =instructions
355 | - Calculate the total amount of money that you have won/lost with poker and assign it to the variable `total_poker`.
356 | - Do the same thing for roulette and assign the result to `total_roulette`.
357 | - Use `+` to sum the `total_poker` and `total_roulette`, which is the sum of all gains and losses of the week. Simply print the result to the console.
358 | 
359 | *** =hint
360 | Use the [`sum()`](http://www.rdocumentation.org/packages/base/functions/sum) function to get the total of the `poker_vector`. Do the same thing for `roulette_vector`.
361 | 
362 | *** =pre_exercise_code
363 | ```{r}
364 | # no pec
365 | ```
366 | 
367 | *** =sample_code
368 | ```{r}
369 | # Casino winnings from Monday to Friday
370 | poker_vector <- c(140, -50, 20, -120, 240)
371 | roulette_vector <- c(-24, -50, 100, -350, 10)
372 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
373 | names(poker_vector) <- days_vector
374 | names(roulette_vector) <- days_vector
375 | 
376 | # Total winnings with poker: total_poker
377 | 
378 | 
379 | # Total winnings with roulette: total_roulette
380 | 
381 | 
382 | # Total winnings overall: print out the result
383 | 
384 | ```
385 | 
386 | *** =solution
387 | ```{r}
388 | # Casino winnings from Monday to Friday
389 | poker_vector <- c(140, -50, 20, -120, 240)
390 | roulette_vector <- c(-24, -50, 100, -350, 10)
391 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
392 | names(poker_vector) <- days_vector
393 | names(roulette_vector) <- days_vector
394 | 
395 | # Total winnings with poker: total_poker
396 | total_poker <- sum(poker_vector)
397 | 
398 | # Total winnings with roulette: total_roulette
399 | total_roulette <- sum(roulette_vector)
400 | 
401 | # Total winnings overall: print out the result
402 | total_roulette + total_poker
403 | ```
404 | 
405 | *** =sct
406 | ```{r}
407 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
408 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
409 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
410 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
411 | test_object("total_poker", 
412 |             undefined_msg = "Please make sure to define a variable `total_poker`.",
413 |             incorrect_msg = "Make sure that you assign to `total_poker` the sum of the `poker_vector`.")
414 | test_object("total_roulette",
415 |             undefined_msg = "Please make sure to define a variable `total_roulette`.",
416 |             incorrect_msg = "Make sure that you assign to `total_roulette` the sum of the `roulette_vector`.")
417 | test_output_contains("total_poker + total_roulette", incorrect_msg = "Print the sum of `total_poker` and `total_roulette` to the console.")
418 | success_msg("Oops, it seems like you are losing money. Time to rethink and adapt your strategy! This will require some deeper analysis...")
419 | ```
420 | 
421 | 
422 | --- type:VideoExercise lang:r xp:50 skills:1 key:513029f4ac
423 | ## Vector Subsetting
424 | 
425 | *** =video_link
426 | //player.vimeo.com/video/138173916
427 | 
428 | *** =video_hls
429 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_3.master.m3u8
430 | 
431 | 
432 | --- type:NormalExercise lang:r xp:100 skills:1 key:6112e74425
433 | ## Selection by index (1)
434 | 
435 | After figuring out that roulette is not your forte, you decide to compare your performance at the beginning of the week with your performance at the end of the week. You did have a couple of Margarita cocktails at the end of the week...
436 | 
437 | To make that comparison, you only want to focus on a selection of each vector. In other words, the goal is to select specific elements of `poker_vector` and `roulette_vector`.
438 | 
439 | *** =instructions
440 | - Assign the poker results of Wednesday to the variable `poker_wednesday`.
441 | - Assign the roulette results of Friday to the variable `roulette_friday`.
442 | 
443 | *** =hint
444 | Wednesday is the third element of `poker_vector`, and can thus be selected with `poker_vector[3]`.
445 | 
446 | *** =pre_exercise_code
447 | ```{r}
448 | # no pec
449 | ```
450 | 
451 | *** =sample_code
452 | ```{r}
453 | # Casino winnings from Monday to Friday
454 | poker_vector <- c(140, -50, 20, -120, 240)
455 | roulette_vector <- c(-24, -50, 100, -350, 10)
456 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
457 | names(poker_vector) <- days_vector
458 | names(roulette_vector) <- days_vector
459 | 
460 | # Poker results of Wednesday: poker_wednesday
461 | 
462 | 
463 | # Roulette results of Friday: roulette_friday
464 | 
465 | ```
466 | 
467 | *** =solution
468 | ```{r}
469 | # Casino winnings from Monday to Friday
470 | poker_vector <- c(140, -50, 20, -120, 240)
471 | roulette_vector <- c(-24, -50, 100, -350, 10)
472 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
473 | names(poker_vector) <- days_vector
474 | names(roulette_vector) <- days_vector
475 | 
476 | # Poker results of Wednesday: poker_wednesday
477 | poker_wednesday <- poker_vector[3]
478 | 
479 | # Roulette results of Friday: roulette_friday
480 | roulette_friday <- roulette_vector[5]
481 | ```
482 | 
483 | *** =sct
484 | ```{r}
485 | 
486 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
487 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
488 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
489 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
490 | test_object("poker_wednesday",
491 |             incorrect_msg = "It looks like `poker_wednesday` does not contain the correct value of `poker_vector`.")
492 | test_object("roulette_friday",
493 |             incorrect_msg = "It looks like `roulette_friday` does not contain the correct value of `roulette_vector`.")
494 | success_msg("Great! R also makes it possible to select multiple elements from a vector at once, remember? Put the theory to practice in the next exercise!")
495 | ```
496 | 
497 | 
498 | --- type:NormalExercise lang:r xp:100 skills:1 key:ae2832fbd1
499 | ## Selection by index (2) 
500 | 
501 | How about analyzing your midweek results? 
502 | 
503 | Instead of using a single number to select a single element, you can also select multiple elements by passing a vector inside the square brackets. For example,
504 | 
505 | ```
506 | poker_vector[c(1,5)]
507 | ```
508 | 
509 | selects the first and the fifth element of `poker_vector`.
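
When the positions you want are consecutive, the colon operator offers a shorter way to build the index vector. This is a minimal sketch on a hypothetical vector, `example_vector`, whose values are made up:

```r
# Hypothetical vector, used only for illustration
example_vector <- c(10, 20, 30, 40, 50)

# 2:4 generates the sequence 2, 3, 4
by_colon <- example_vector[2:4]

# The same selection with an explicit index vector
by_combine <- example_vector[c(2, 3, 4)]

# Both approaches select the same elements
identical(by_colon, by_combine)
```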
510 | 
511 | 
512 | *** =instructions
513 | - Assign the poker results of Tuesday, Wednesday and Thursday to the variable `poker_midweek`.
514 | - Assign the roulette results of Thursday and Friday to the variable `roulette_endweek`.
515 | 
516 | *** =hint
517 | Use the vector `c(2,3,4)` between square brackets to select the correct elements of `poker_vector`.
518 | 
519 | *** =pre_exercise_code
520 | ```{r}
521 | # no pec
522 | ``` 
523 | 
524 | *** =sample_code
525 | ```{r}
526 | # Casino winnings from Monday to Friday
527 | poker_vector <- c(140, -50, 20, -120, 240)
528 | roulette_vector <- c(-24, -50, 100, -350, 10)
529 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
530 | names(poker_vector) <- days_vector
531 | names(roulette_vector) <- days_vector
532 | 
533 | # Mid-week poker results: poker_midweek
534 | 
535 | 
536 | # End-of-week roulette results: roulette_endweek
537 | 
538 | 
539 | ```
540 | 
541 | *** =solution
542 | ```{r}
543 | # Casino winnings from Monday to Friday
544 | poker_vector <- c(140, -50, 20, -120, 240)
545 | roulette_vector <- c(-24, -50, 100, -350, 10)
546 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
547 | names(poker_vector) <- days_vector
548 | names(roulette_vector) <- days_vector
549 | 
550 | # Mid-week poker results: poker_midweek
551 | poker_midweek <- poker_vector[c(2, 3, 4)]
552 | 
553 | # End-of-week roulette results: roulette_endweek
554 | roulette_endweek <- roulette_vector[c(4,5)]
555 | ```
556 | 
557 | *** =sct
558 | ```{r}
559 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
560 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
561 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
562 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
563 | 
564 | msg <- "It looks like `%s` does not contain the correct elements from `%s`."
565 | test_object("poker_midweek", 
566 |             incorrect_msg = sprintf(msg, "poker_midweek", "poker_vector"))
567 | test_object("roulette_endweek",
568 |             incorrect_msg = sprintf(msg, "roulette_endweek", "roulette_vector"))
569 | 
570 | success_msg("Well done! Another way to find the mid-week results is `poker_vector[2:4]`. Continue to the next exercise to specialize in vector selection some more!");
571 | ```
572 | 
573 | --- type:NormalExercise lang:r xp:100 skills:1 key:5919f3fc05
574 | ## Selection by name
575 | 
576 | Another way to tackle the previous exercise is by using the names of the vector elements (Monday, Tuesday, ...) instead of their numeric positions. For example, 
577 | 
578 | ```
579 | poker_vector["Monday"]
580 | ```
581 | 
582 | will select the first element of `poker_vector` since `"Monday"` is the name of that first element.
583 | 
584 | *** =instructions
585 | - Select the fourth element, corresponding to Thursday, from `roulette_vector`. Name it `roulette_thursday`.
586 | - Select Tuesday's poker gains using subsetting by name. Assign the result to `poker_tuesday`.
587 | 
588 | *** =hint
589 | You can select an element by name by placing the name, in quotes, between square brackets, e.g. `roulette_vector["Thursday"]`. Do the same for Tuesday in `poker_vector`.
590 | 
591 | *** =pre_exercise_code
592 | ```{r}
593 | # no pec
594 | ```
595 | 
596 | *** =sample_code
597 | ```{r}
598 | # Casino winnings from Monday to Friday
599 | poker_vector <- c(140, -50, 20, -120, 240)
600 | roulette_vector <- c(-24, -50, 100, -350, 10)
601 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
602 | names(poker_vector) <- days_vector
603 | names(roulette_vector) <- days_vector
604 | 
605 | # Select Thursday's roulette gains: roulette_thursday
606 | 
607 | 
608 | # Select Tuesday's poker gains: poker_tuesday
609 | 
610 | ```
611 | 
612 | *** =solution
613 | ```{r}
614 | # Casino winnings from Monday to Friday
615 | poker_vector <- c(140, -50, 20, -120, 240)
616 | roulette_vector <- c(-24, -50, 100, -350, 10)
617 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
618 | names(poker_vector) <- days_vector
619 | names(roulette_vector) <- days_vector
620 | 
621 | # Select Thursday's roulette gains: roulette_thursday
622 | roulette_thursday <- roulette_vector["Thursday"]
623 | 
624 | # Select Tuesday's poker gains: poker_tuesday
625 | poker_tuesday <- poker_vector["Tuesday"]
626 | ```
627 | 
628 | *** =sct
629 | ```{r}
630 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
631 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
632 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
633 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
634 | 
635 | test_object("roulette_thursday")
636 | test_object("poker_tuesday")
637 | success_msg("Good job! Head over to the next exercise.");
638 | ```
639 | 
640 | --- type:NormalExercise lang:r xp:100 skills:1 key:22121c6c46
641 | ## Selection by logicals (1)
642 | 
643 | There are basically three ways to subset vectors: by using the indices, by using the names (if the vectors are named) and by using logical vectors. Filip already told you about the internals in the instructional video. As a refresher, have a look at the following statements to select elements from `poker_vector`, which are all equivalent:
644 | 
645 | ```
646 | # selection by index
647 | poker_vector[c(1,3)]
648 | 
649 | # selection by name
650 | poker_vector[c("Monday", "Wednesday")]
651 | 
652 | # selection by logicals
653 | poker_vector[c(TRUE, FALSE, TRUE, FALSE, FALSE)]
654 | ```
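
To convince yourself that the three approaches really give the same result, you can compare them with `identical()`. The sketch below assumes a hypothetical named vector, `scores`, with made-up values:

```r
# Hypothetical named vector, not part of the exercise
scores <- c(first = 7, second = 3, third = 9)

by_index   <- scores[c(1, 3)]                # positions 1 and 3
by_name    <- scores[c("first", "third")]    # names of those positions
by_logical <- scores[c(TRUE, FALSE, TRUE)]   # keep, drop, keep

# All three selections contain the same values and names
identical(by_index, by_name) && identical(by_name, by_logical)
```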
655 | 
656 | *** =instructions
657 | - Assign the roulette results from the first, third and fifth day to `roulette_subset`.
658 | - Select the first three days from `poker_vector` using a vector of logicals. Assign the result to `poker_start`.
659 | 
660 | *** =hint
661 | The logical vector to use inside square brackets for the first instruction is `c(TRUE, FALSE, TRUE, FALSE, TRUE)`.
662 | 
663 | *** =pre_exercise_code
664 | ```{r}
665 | # no pec
666 | ```
667 | 
668 | *** =sample_code
669 | ```{r}
670 | # Casino winnings from Monday to Friday
671 | poker_vector <- c(140, -50, 20, -120, 240)
672 | roulette_vector <- c(-24, -50, 100, -350, 10)
673 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
674 | names(poker_vector) <- days_vector
675 | names(roulette_vector) <- days_vector
676 | 
677 | # Roulette results for day 1, 3 and 5: roulette_subset
678 | 
679 |   
680 | # Poker results for first three days: poker_start
681 | ```
682 | 
683 | *** =solution
684 | ```{r}
685 | # Casino winnings from Monday to Friday
686 | poker_vector <- c(140, -50, 20, -120, 240)
687 | roulette_vector <- c(-24, -50, 100, -350, 10)
688 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
689 | names(poker_vector) <- days_vector
690 | names(roulette_vector) <- days_vector
691 | 
692 | # Roulette results for day 1, 3 and 5: roulette_subset
693 | roulette_subset <- roulette_vector[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
694 |   
695 | # Poker results for first three days: poker_start
696 | poker_start <- poker_vector[c(TRUE, TRUE, TRUE, FALSE, FALSE)]
697 | ```
698 | 
699 | *** =sct
700 | ```{r}
701 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
702 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
703 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
704 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
705 | test_object("roulette_subset")
706 | test_object("poker_start")
707 | success_msg("Nice one! Using logical vectors to perform subsetting might seem somewhat tedious, but its true power will become clear in the next exercise!")
708 | ```
709 | 
710 | 
711 | --- type:NormalExercise lang:r xp:100 skills:1 key:aa2e5f6e97
712 | ## Selection by logicals (2)
713 | 
714 | By combining comparison operators with subsetting using logicals, you can investigate your casino performance in a more proactive way.
715 | 
716 | The (logical) comparison operators known to R are:
717 | - `<` for less than
718 | - `>` for greater than
719 | - `<=` for less than or equal to
720 | - `>=` for greater than or equal to
721 | - `==` for equal to each other
722 | - `!=` for not equal to each other
723 | 
724 | Experiment with these operators in the console:
725 | 
726 | ```
727 | lost_roulette_days <- roulette_vector < 0
728 | lost_roulette_days
729 | ```
730 | 
731 | The result will be a logical vector, which you can use to perform subsetting, like this example:
732 | 
733 | ```
734 | roulette_vector[lost_roulette_days]
735 | ```
736 | 
737 | The result is a subset of `roulette_vector` that contains only your losses in roulette.
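
The comparison does not have to be stored in a separate variable first; it can also be written directly inside the square brackets. A minimal sketch, assuming a hypothetical vector `example_vector` with made-up values:

```r
# Hypothetical named vector, used only for illustration
example_vector <- c(a = 5, b = -2, c = 0, d = 8)

# The comparison is evaluated element by element and gives a logical vector
example_vector >= 0

# Use the comparison directly as the selector: keeps a, c and d
non_negative <- example_vector[example_vector >= 0]
non_negative
```

Storing the logical vector in its own variable, as in the example above this one, is still useful when you want to reuse the same selection several times.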
738 | 
739 | *** =instructions
740 | - Check if your poker winnings are positive on the different days of the week (i.e. `> 0`), and assign this to `selection_vector`.
741 | - Assign the amounts that you won on the profitable days to the variable `poker_profits` by using `selection_vector`.
742 | 
743 | *** =hint
744 | - In order to check for which days your poker gains are positive, R should check for each element of `poker_vector` whether it is larger than zero. `some_vector > 0` is the way to tell R what you are after.
745 | - After creating `selection_vector`, you can use it to subset `poker_vector` like this: `poker_vector[selection_vector]`.
746 | 
747 | *** =pre_exercise_code
748 | ```{r}
749 | # no pec
750 | ```
751 | 
752 | *** =sample_code
753 | ```{r}
754 | # Casino winnings from Monday to Friday
755 | poker_vector <- c(140, -50, 20, -120, 240)
756 | roulette_vector <- c(-24, -50, 100, -350, 10)
757 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
758 | names(poker_vector) <- days_vector
759 | names(roulette_vector) <- days_vector
760 | 
761 | # Create logical vector corresponding to profitable poker days: selection_vector
762 | 
763 | 
764 | # Select amounts for profitable poker days: poker_profits
765 |  
766 | ```
767 | 
768 | *** =solution
769 | ```{r}
770 | # Casino winnings from Monday to Friday
771 | poker_vector <- c(140, -50, 20, -120, 240)
772 | roulette_vector <- c(-24, -50, 100, -350, 10)
773 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
774 | names(poker_vector) <- days_vector
775 | names(roulette_vector) <- days_vector
776 | 
777 | # Create logical vector corresponding to profitable poker days: selection_vector
778 | selection_vector <- poker_vector > 0
779 | 
780 | # Select amounts for profitable poker days: poker_profits
781 | poker_profits <- poker_vector[selection_vector]
782 | ```
783 | 
784 | *** =sct
785 | ```{r}
786 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`."
787 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg)
788 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
789 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
790 | test_object("selection_vector", 
791 |             undefined_msg = "Please make sure to define a variable `selection_vector`.",
792 |             incorrect_msg = "It looks like `selection_vector` does not contain the correct result. Remember that R uses element-wise operations for vectors.")
793 | test_object("poker_profits",
794 |             undefined_msg =  "Please make sure to define a variable `poker_profits`.",
795 |             incorrect_msg =  "It looks like `poker_profits` does not contain the correct result. Remember that R uses element-wise operations for vectors.")
796 | success_msg("Great! Move on to the Matrices chapter!")
797 | ```
798 | 
799 | 


--------------------------------------------------------------------------------
/chapter3.Rmd:
--------------------------------------------------------------------------------
  1 | --- 
  2 | title_meta  : Chapter 3
  3 | title       : Matrices
  4 | description : In this chapter you will learn how to work with matrices in R. By the end of the chapter, you will be able to create matrices and to understand how you can do basic computations with them. You will analyze the box office numbers of Star Wars to illustrate the use of matrices in R. May the force be with you!
  5 | attachments : 
  6 |   slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch3_slides.pdf
  7 |   
  8 | --- type:VideoExercise lang:r xp:50 skills:1 key:82d8734b17
  9 | ## Create and Name Matrices
 10 | 
 11 | *** =video_link
 12 | //player.vimeo.com/video/138173926
 13 | 
 14 | *** =video_hls
 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch3_1.master.m3u8
 16 | 
 17 | 
 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:834a0e546c
 19 | ## Analyzing matrices, you shall (1)
 20 | 
 21 | It is now time to get your hands dirty. In the following exercises you will analyze the box office numbers of the Star Wars franchise. May the force be with you!
 22 | 
 23 | As a reminder, look at this line of code that constructs a matrix with numbers 1 through 9, filled row by row:
 24 | 
 25 | ```
 26 | matrix(1:9, byrow = TRUE, nrow = 3)
 27 | ```
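
To see what `byrow` changes, it can help to build the same data both ways. This sketch uses the numbers 1 through 6 and two hypothetical variable names, `by_row` and `by_col`:

```r
# Fill a 2-row matrix row by row: first row is 1 2 3, second row is 4 5 6
by_row <- matrix(1:6, byrow = TRUE, nrow = 2)

# Fill the same data column by column: first row is 1 3 5, second row is 2 4 6
by_col <- matrix(1:6, byrow = FALSE, nrow = 2)

by_row
by_col
```

In both cases R works out the number of columns from the length of the data and `nrow`.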
 28 | 
 29 | In the script, a vector `box` is defined that represents the box office numbers from the first three Star Wars movies. The first, third and fifth element correspond to the US box office revenue for the movies, the second, fourth and sixth element represent the non-US box office revenue.
 30 | 
 31 | *** =instructions
 32 | Construct a matrix `star_wars_matrix`:
 33 | 
 34 | - Each row represents a movie.
 35 | - The first column is for the US box office revenue, and the second column for the non-US box office revenue. 
 36 | - Use the function `matrix()` with `box` as the first input, and the additional arguments `nrow` and `byrow`.
 37 | 
 38 | *** =hint
 39 | Set `nrow` to `3` and `byrow` to `TRUE` inside `matrix()`.
 40 | 
 41 | *** =pre_exercise_code
 42 | ```{r}
 43 | # no pec
 44 | ```
 45 | 
 46 | *** =sample_code
 47 | ```{r}
 48 | # Star Wars box office in millions (!)
 49 | box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
 50 | 
 51 | # Create star_wars_matrix
 52 | 
 53 | ```
 54 | 
 55 | *** =solution
 56 | ```{r}
 57 | # Star Wars box office in millions (!)
 58 | box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
 59 | 
 60 | # Create star_wars_matrix
 61 | star_wars_matrix <- matrix(box, nrow = 3, byrow = TRUE) 
 62 | ```
 63 | 
 64 | *** =sct
 65 | ```{r}
 66 | test_error()
 67 | msg <- "Do not change or remove the definition of `box`!"
 68 | test_object("box", undefined_msg = msg, incorrect_msg = msg)
 69 | 
 70 | test_correct({
 71 |   test_object("star_wars_matrix",
 72 |               undefined_msg = "Please make sure to define a variable `star_wars_matrix`.",
 73 |               incorrect_msg = "Did you assign the correct matrix containing the vector that holds all three movies to `star_wars_matrix`?")
 74 | }, {
 75 |   test_function("matrix", "data")
 76 |   test_function("matrix", "nrow")
 77 |   test_function("matrix", "byrow")
 78 | })
 79 | success_msg("Great job!")
 80 | ```
 81 | 
 82 | 
 83 | --- type:NormalExercise lang:r xp:100 skills:1 key:0dfb4c5e70
 84 | ## Analyzing matrices, you shall (2)
 85 | 
 86 | Instead of a single vector, the box office numbers for the three Star Wars movies are now represented as three vectors. Remember the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function, which pastes together different vectors as if they were rows of a matrix? Take this example, which pastes together two vectors as if they were rows of a matrix:
 87 | 
 88 | ```
 89 | a <- c(1, 2, 3)
 90 | b <- c(4, 5, 6)
 91 | rbind(a, b)
 92 | ```
 93 | 
 94 | Try a similar thing on the astronomical Star Wars numbers!
 95 | 
 96 | *** =instructions
 97 | Again, construct the matrix `star_wars_matrix` with one row for each movie.
 98 | 
 99 | *** =hint
100 | Simply pass the three vectors to the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function.
101 | 
102 | *** =pre_exercise_code
103 | ```{r}
104 | # no pec
105 | ```
106 | 
107 | *** =sample_code
108 | ```{r}
109 | # Star Wars box office in millions (!)
110 | new_hope <- c(460.998, 314.4)
111 | empire_strikes <- c(290.475, 247.900)
112 | return_jedi <- c(309.306, 165.8)
113 | 
114 | # Create star_wars_matrix
115 | 
116 | ```
117 | 
118 | *** =solution
119 | ```{r}
120 | # Star Wars box office in millions (!)
121 | new_hope <- c(460.998, 314.4)
122 | empire_strikes <- c(290.475, 247.900)
123 | return_jedi <- c(309.306, 165.8)
124 | 
125 | # Create star_wars_matrix
126 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
127 | ```
128 | 
129 | *** =sct
130 | ```{r}
131 | test_error()
132 | msg = "Do not change anything about the box office variables `new_hope`, `empire_strikes` and `return_jedi`!"
133 | test_object("new_hope", undefined_msg = msg, incorrect_msg = msg)
134 | test_object("empire_strikes", undefined_msg = msg, incorrect_msg = msg)
135 | test_object("return_jedi", undefined_msg = msg, incorrect_msg = msg)
136 | 
137 | test_correct({
138 |   test_object("star_wars_matrix",
139 |             incorrect_msg = "Did you assign the correct matrix, with one row for each of the three movie vectors, to `star_wars_matrix`?")
140 | }, {
141 |   test_function("rbind", not_called_msg = "You should use the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function to create the matrix.")
142 | })
143 | success_msg("The force is actually with you! Continue to the next exercise.")
144 | ```
145 | 
146 | 
147 | --- type:NormalExercise lang:r xp:100 skills:1 key:ca3dbb8a9f
148 | ## Naming a matrix
149 | 
150 | To help you remember what is stored in `star_wars_matrix`, you would like to add the names of the movies for the rows. Not only does this help you to read the data, but it is also useful to select certain elements from the matrix later. 
151 | 
152 | Similar to vectors, you can add names for the rows and the columns of a matrix:
153 | 
154 | ```
155 | rownames(my_matrix) <- row_names_vector
156 | colnames(my_matrix) <- col_names_vector
157 | ```
158 | 
159 | *** =instructions
160 | - Two vectors containing the row names and column names have been created for you: `movie_names` and `col_titles`.
161 | - Name the rows of `star_wars_matrix` with `movie_names`.
162 | - Name the columns of `star_wars_matrix` with the pre-defined `col_titles`.
163 | 
164 | *** =hint
165 | To name the rows, start with `rownames(star_wars_matrix) <-`; can you finish the command?
166 | 
167 | *** =pre_exercise_code
168 | ```{r}
169 | # no pec
170 | ```
171 | 
172 | *** =sample_code
173 | ```{r}
174 | # Star Wars box office in millions (!)
175 | new_hope <- c(460.998, 314.4)
176 | empire_strikes <- c(290.475, 247.900)
177 | return_jedi <- c(309.306, 165.8)
178 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
179 | 
180 | # Build the row and column name vectors: movie_names and col_titles
181 | movie_names <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
182 | col_titles <- c("US", "non-US")
183 | 
184 | # Add row names to star_wars_matrix
185 | 
186 | 
187 | # Add column names to star_wars_matrix
188 | 
189 | ```
190 | 
191 | *** =solution
192 | ```{r}
193 | # Star Wars box office in millions (!)
194 | new_hope <- c(460.998, 314.4)
195 | empire_strikes <- c(290.475, 247.900)
196 | return_jedi <- c(309.306, 165.8)
197 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
198 | 
199 | # Build the row and column name vectors: movie_names and col_titles
200 | movie_names <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
201 | col_titles <- c("US", "non-US")
202 | 
203 | # Add row names to star_wars_matrix
204 | rownames(star_wars_matrix) <- movie_names
205 | 
206 | # Add column names to star_wars_matrix
207 | colnames(star_wars_matrix) <- col_titles
208 | ```
209 | 
210 | *** =sct
211 | ```{r}
212 | msg = "Do not change anything about the box office variables `new_hope`, `empire_strikes` and `return_jedi`!"
213 | lapply(c("new_hope", "empire_strikes", "return_jedi"), test_object, undefined_msg = msg, incorrect_msg = msg)
214 | msg <- "Do not change anything about the creation of `star_wars_matrix`."
215 | test_object("star_wars_matrix", undefined_msg = msg, incorrect_msg = msg)
216 | msg <- paste("Do not change or remove the vectors `movie_names` and `col_titles`;",
217 |              "you can use them to name the columns and rows of `star_wars_matrix`.")
218 | lapply(c("movie_names", "col_titles"), test_object, undefined_msg = msg, incorrect_msg = msg)
219 | test_object("star_wars_matrix", eq_condition = "equal",
220 |             incorrect_msg = paste("Did you set the row and column names of `star_wars_matrix` correctly?",
221 |                                   "Have another look at your code."))
222 | success_msg("Great! You're on your way to becoming an R Jedi!")
223 | ```
224 | 
225 | 
226 | --- type:NormalExercise lang:r xp:100 skills:1 key:3b60b1a49a
227 | ## Calculating the worldwide box office 
228 | 
229 | The single most important thing for a movie in order to become an instant legend in Tinseltown is its worldwide box office figures. 
230 | 
231 | To calculate the total box office revenue for the three Star Wars movies, you have to take the sum of the US revenue column and the non-US revenue column.
232 | 
233 | In R, the function [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) conveniently calculates the totals for each row of a matrix. This function creates a new vector:
234 | 
235 | ```
236 | sum_of_rows_vector <- rowSums(my_matrix)
237 | ```
238 | 
239 | *** =instructions
240 | Calculate the worldwide box office figures for the three movies and put these in the vector named `worldwide_vector`.
241 | 
242 | *** =hint
243 | The [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function calculates the total box office for each row of `star_wars_matrix` if you pass `star_wars_matrix` as an argument, that is, by putting it between the parentheses.
244 | 
245 | *** =pre_exercise_code
246 | ```{r}
247 | # no pec
248 | ```
249 | 
250 | *** =sample_code
251 | ```{r}
252 | # Star Wars box office in millions (!)
253 | new_hope <- c(460.998, 314.4)
254 | empire_strikes <- c(290.475, 247.900)
255 | return_jedi <- c(309.306, 165.8)
256 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
257 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
258 | colnames(star_wars_matrix) <- c("US", "non-US")
259 | 
260 | # Calculate the worldwide box office: worldwide_vector
261 | 
262 | ```
263 | 
264 | *** =solution
265 | ```{r}
266 | # Star Wars box office in millions (!)
267 | new_hope <- c(460.998, 314.4)
268 | empire_strikes <- c(290.475, 247.900)
269 | return_jedi <- c(309.306, 165.8)
270 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
271 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
272 | colnames(star_wars_matrix) <- c("US", "non-US")
273 | 
274 | # Calculate the worldwide box office: worldwide_vector
275 | worldwide_vector <- rowSums(star_wars_matrix)
276 | ```
277 | 
278 | *** =sct
279 | ```{r}
280 | msg = "Do not change anything about the construction and naming of `star_wars_matrix`!"
281 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
282 | 
283 | test_function("rowSums", "x", 
284 |               not_called_msg = "Have you used the function `rowSums()`?",
285 |               incorrect_msg = "Did you use the [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function on the correct matrix?")
286 | 
287 | test_object("worldwide_vector",
288 |             incorrect_msg = "Have you specified `worldwide_vector` correctly?")
289 | success_msg("Well done! Continue to the next exercise.")
290 | ```
291 | 
292 | --- type:NormalExercise lang:r xp:100 skills:1 key:4e0c938d72
293 | ## Adding a row
294 | 
295 | Your R workspace contains two matrices: 
296 | - `star_wars_matrix`: the matrix that you just created for the first trilogy.
297 | - `star_wars_matrix2`: similar information for the second trilogy. 
298 | Type the names of the matrices in the console and press enter if you want to have a closer look.
299 | 
300 | *** =instructions
301 | Use [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) to create `all_wars_matrix`, a new matrix with `star_wars_matrix` in the first three rows and `star_wars_matrix2` in the next three rows.
302 | 
303 | *** =hint
304 | You can check out the contents of the workspace by executing [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) in the console.
305 | 
306 | Bind the two matrices together in the following way: 
307 | ```
308 | rbind(matrix1, matrix2)
309 | ```
310 | Assign the result to `all_wars_matrix`.
311 | 
312 | 
313 | *** =pre_exercise_code
314 | ```{r}
315 | # Star Wars box office in millions (!)
316 | new_hope <- c(460.998, 314.4)
317 | empire_strikes <- c(290.475, 247.900)
318 | return_jedi <- c(309.306, 165.8)
319 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
320 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
321 | colnames(star_wars_matrix) <- c("US", "non-US")
322 | 
323 | # Construct matrix2
324 | box_office_all2 <- c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5)
325 | movie_names2 <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
326 | star_wars_matrix2 <- matrix(box_office_all2, nrow = 3, byrow = TRUE)
327 | rownames(star_wars_matrix2) <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
328 | colnames(star_wars_matrix2) <- c("US", "non-US")
329 | ```
330 | 
331 | *** =sample_code
332 | ```{r}
333 | # Matrix that contains the first trilogy box office
334 | star_wars_matrix  
335 | 
336 | # Matrix that contains the second trilogy box office
337 | star_wars_matrix2 
338 | 
339 | # Combine both Star Wars trilogies in one matrix: all_wars_matrix
340 | 
341 | ```
342 | 
343 | *** =solution
344 | ```{r}
345 | # Matrix that contains the first trilogy box office
346 | star_wars_matrix  
347 | 
348 | # Matrix that contains the second trilogy box office
349 | star_wars_matrix2 
350 | 
351 | # Combine both Star Wars trilogies in one matrix: all_wars_matrix
352 | all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)
353 | ```
354 | 
355 | *** =sct
356 | ```{r}
357 | msg <- "Do not override the variables that have been defined for you in the workspace (`star_wars_matrix` and `star_wars_matrix2`)."
358 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
359 | test_object("star_wars_matrix2", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
360 | test_object("all_wars_matrix",
361 |             incorrect_msg = "Did you use the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function with the correct arguments to build `all_wars_matrix`?")
362 | success_msg("Wonderful! Continue with the next exercise and see how you can combine the results of the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function with the [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function!")
363 | ```
364 | 
365 | 
366 | --- type:NormalExercise lang:r xp:100 skills:1 key:30a0c39c10
367 | ## The total box office revenue for the entire saga
368 | 
369 | Just like every [`cbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) has a [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind), every [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) has a [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums). 
370 | 
371 | Your R workspace already contains the `all_wars_matrix` that you constructed in the previous exercise. Let us now calculate the total box office revenue for the entire saga.
372 | 
373 | *** =instructions
374 | - Create a vector of length two with the total revenue for the US and the non-US region. Name this vector `total_revenue_vector`.
375 | - Print `total_revenue_vector`.
376 | 
377 | *** =hint
378 | You should use the [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function with `all_wars_matrix` as the argument to find the total box office per region.
379 | 
380 | *** =pre_exercise_code
381 | ```{r}
382 | # Star Wars box office in millions (!)
383 | new_hope <- c(460.998, 314.4)
384 | empire_strikes <- c(290.475, 247.900)
385 | return_jedi <- c(309.306, 165.8)
386 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
387 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
388 | colnames(star_wars_matrix) <- c("US", "non-US")
389 | 
390 | # Construct matrix2
391 | box_office_all2 <- c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5)
392 | movie_names2 <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
393 | star_wars_matrix2 <- matrix(box_office_all2, nrow = 3, byrow = TRUE)
394 | colnames(star_wars_matrix2) <- c("US", "non-US")
395 | rownames(star_wars_matrix2) <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
396 | 
397 | # Combine both Star Wars trilogies in one matrix
398 | all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)
399 | rm(star_wars_matrix, star_wars_matrix2)
400 | ```
401 | 
402 | *** =sample_code
403 | ```{r}
404 | # Print box office Star Wars
405 | all_wars_matrix
406 | 
407 | # Total revenue for US and non-US: total_revenue_vector
408 | 
409 | 
410 | # Print total_revenue_vector
411 | ```
412 | 
413 | *** =solution
414 | ```{r}
415 | # Print box office Star Wars
416 | all_wars_matrix
417 | 
418 | # Total revenue for US and non-US: total_revenue_vector
419 | total_revenue_vector <- colSums(all_wars_matrix)
420 | 
421 | # Print total_revenue_vector
422 | total_revenue_vector
423 | ```
424 | 
425 | *** =sct
426 | ```{r}
427 | msg <- "Do not override the variables that have been defined for you in the workspace."
428 | test_object("all_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
429 | test_function("colSums", "x", incorrect_msg = "Did you use the [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function on `all_wars_matrix`?")
430 | test_object("total_revenue_vector",
431 |             undefined_msg = "Please make sure to define a variable `total_revenue_vector`.",
432 |             incorrect_msg = "Have you correctly assigned the result of [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) to `total_revenue_vector`?")
433 | test_output_contains("total_revenue_vector", incorrect_msg = "Don't forget to print out `total_revenue_vector`.")
434 | success_msg("Wonderful!")
435 | ```
436 | 
437 | 
438 | --- type:VideoExercise lang:r xp:50 skills:1 key:7ab632b301
439 | ## Subsetting Matrices
440 | 
441 | *** =video_link
442 | //player.vimeo.com/video/138173935
443 | 
444 | *** =video_hls
445 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch3_2.master.m3u8
446 | 
447 | 
448 | --- type:NormalExercise lang:r xp:100 skills:1 key:b4007d31b3
449 | ## Select elements
450 | 
451 | In the previous video, Filip explained how subsetting with square brackets extends from vectors to matrices. In general, the following line selects the element on row `i` and column `j` of a matrix `m`:
452 | 
453 | ```
454 | m[i,j]
455 | ```
456 | 
457 | Let's go intergalactic on subsetting now! You'll continue working on `star_wars_matrix`, which is still a matrix containing both US and non-US box office figures for the first three movies.
458 | 
459 | *** =instructions
460 | - Select the US box office figure for "The Empire Strikes Back".
461 | - Select the non-US box office figure for "A New Hope".
462 | No need to assign these elements to new variables; simply print them.
463 | 
464 | *** =hint
465 | To select the element on row 3 and column 2, you can use `star_wars_matrix[3, 2]`.
466 | 
467 | *** =pre_exercise_code
468 | ```{r}
469 | # Star Wars box office in millions (!)
470 | new_hope <- c(460.998, 314.4)
471 | empire_strikes <- c(290.475, 247.900)
472 | return_jedi <- c(309.306, 165.8)
473 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
474 | colnames(star_wars_matrix) <- c("US", "non-US")
475 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
476 | ```
477 | 
478 | *** =sample_code
479 | ```{r}
480 | # star_wars_matrix is already defined in your workspace
481 | 
482 | # US box office revenue for "The Empire Strikes Back"
483 | 
484 | 
485 | # non-US box office revenue for "A New Hope"
486 | 
487 | 
488 | ```
489 | 
490 | *** =solution
491 | ```{r}
492 | # star_wars_matrix is already defined in your workspace
493 | 
494 | # US box office revenue for "The Empire Strikes Back"
495 | star_wars_matrix[2,1]
496 | 
497 | # non-US box office revenue for "A New Hope"
498 | star_wars_matrix[1,2]
499 | ```
500 | 
501 | *** =sct
502 | ```{r}
503 | 
504 | msg <- "Do not remove or override `star_wars_matrix`, it has already been defined for you!"
505 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
506 | msg <- "Have another look at the %s instruction. Are you sure you selected the correct element(s)?"
507 | test_output_contains("star_wars_matrix[2,1]", incorrect_msg = sprintf(msg, "first"))
508 | test_output_contains("star_wars_matrix[1,2]", incorrect_msg = sprintf(msg, "second"))
509 | success_msg("Great! That wasn't too hard, was it? Head over to the next exercise.")
510 | ```
511 | 
512 | 
513 | --- type:NormalExercise lang:r xp:100 skills:1 key:3b3ab3e40a
514 | ## Select rows and columns
515 | 
516 | In the previous exercise, you covered the selection of a single element from a matrix. The result was a vector of length 1. However, as the matrix is two-dimensional, you can also extract one-dimensional parts from it. More specifically, to select all elements on row `i` of a matrix `m`, you use:
517 | 
518 | ```
519 | m[i,]
520 | ```
521 | 
522 | Likewise, to select all elements on column `j`, 
523 | 
524 | ```
525 | m[,j]
526 | ```
527 | 
528 | will help you out. Notice that the results of these subsetting operations are also vectors, but they typically contain more than one element.
529 | 
530 | *** =instructions
531 | - Select all US box office revenue from `star_wars_matrix`, so the entire first column.
532 | - Extract all the revenue information for "A New Hope", so the entire first row.
533 | 
534 | *** =hint
535 | - To select the first row from `star_wars_matrix`, you can use `star_wars_matrix[1,]`.
536 | 
537 | *** =pre_exercise_code
538 | ```{r}
539 | # Star Wars box office in millions (!)
540 | new_hope <- c(460.998, 314.4)
541 | empire_strikes <- c(290.475, 247.900)
542 | return_jedi <- c(309.306, 165.8)
543 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
544 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
545 | colnames(star_wars_matrix) <- c("US", "non-US")
546 | ```
547 | 
548 | *** =sample_code
549 | ```{r}
550 | # star_wars_matrix is already defined in your workspace
551 | 
552 | # Select all US box office revenue
553 | 
554 | 
555 | # Select revenue for "A New Hope"
556 | 
557 | 
558 | ```
559 | 
560 | *** =solution
561 | ```{r}
562 | # star_wars_matrix is already defined in your workspace
563 | 
564 | # Select all US box office revenue
565 | star_wars_matrix[,1]
566 | 
567 | # Select revenue for "A New Hope"
568 | star_wars_matrix[1,]
569 | ```
570 | 
571 | *** =sct
572 | ```{r}
573 | 
574 | msg <- "Do not remove or override `star_wars_matrix`, it has already been defined for you!"
575 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
576 | msg <- "Have another look at the %s instruction. Are you sure you selected the correct element(s)?"
577 | test_output_contains("star_wars_matrix[,1]", incorrect_msg = sprintf(msg, "first"))
578 | test_output_contains("star_wars_matrix[1,]", incorrect_msg = sprintf(msg, "second"))
579 | success_msg("Great! Continue to the next exercise.")
580 | ```
581 | 
582 | 
583 | --- type:NormalExercise lang:r xp:100 skills:1 key:32d3cedaba
584 | ## Create submatrices
585 | 
586 | Last but not least, you can create submatrices from larger matrices. If a vector is not sufficient to store the information you want to select, you need to create a new matrix. If you want to create a submatrix that comprises rows 1 and 4 and columns 2 and 3 of a matrix `m`, the following call will help you out:
587 | 
588 | ```
589 | m[c(1,4), c(2,3)]
590 | ```
591 | 
592 | *** =instructions
593 | Select all revenue figures for "A New Hope" and "Return of the Jedi" from `star_wars_matrix`.
594 | 
595 | *** =hint
596 | No hint on this one, you're on your own here!
597 | 
598 | *** =pre_exercise_code
599 | ```{r}
600 | # Star Wars box office in millions (!)
601 | new_hope <- c(460.998, 314.4)
602 | empire_strikes <- c(290.475, 247.900)
603 | return_jedi <- c(309.306, 165.8)
604 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
605 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
606 | colnames(star_wars_matrix) <- c("US", "non-US")
607 | ```
608 | 
609 | *** =sample_code
610 | ```{r}
611 | # star_wars_matrix is already defined in your workspace
612 | 
613 | # All figures for "A New Hope" and "Return of the Jedi"
614 | 
615 | ```
616 | 
617 | *** =solution
618 | ```{r}
619 | # star_wars_matrix is already defined in your workspace
620 | 
621 | # All figures for "A New Hope" and "Return of the Jedi"
622 | star_wars_matrix[c(1,3), c(1,2)]   # option 1
623 | star_wars_matrix[c(1,3), ]         # option 2
624 | ```
625 | 
626 | *** =sct
627 | ```{r}
628 | 
629 | msg <- "Do not remove or override `star_wars_matrix`, it has already been defined for you!"
630 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
631 | test_output_contains("star_wars_matrix[c(1,3), c(1,2)]", 
632 |                      incorrect_msg = "Hmm, that's not totally correct. Make sure you end up with another matrix, containing 4 elements in total.")
633 | success_msg("Nice one! You could have used either `star_wars_matrix[c(1,3), c(1,2)]` or `star_wars_matrix[c(1,3), ]` to solve this exercise. Leaving the index for a dimension empty keeps all the indices for that dimension.")
634 | ```
635 | 
636 | 
637 | --- type:NormalExercise lang:r xp:100 skills:1 key:a6e32664a3
638 | ## Alternative ways of subsetting
639 | 
640 | Just as with vectors, you can also subset matrices using names and logical vectors. Of course, you can only subset by name if the matrix you're working with actually has names associated with it. Logical vectors, on the other hand, can always be used to select the element(s) of interest.
641 | 
642 | *** =instructions
643 | - Select the US revenues for "A New Hope" and "The Empire Strikes Back".
644 | - Select the last two rows and both columns from `star_wars_matrix`.
645 | 
646 | *** =hint
647 | To select the US revenue for "Return of the Jedi", you can use the following command:
648 | ```
649 | star_wars_matrix["Return of the Jedi", "US"]
650 | ```
651 | 
652 | *** =pre_exercise_code
653 | ```{r}
654 | # Star Wars box office in millions (!)
655 | new_hope <- c(460.998, 314.4)
656 | empire_strikes <- c(290.475, 247.900)
657 | return_jedi <- c(309.306, 165.8)
658 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
659 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
660 | colnames(star_wars_matrix) <- c("US", "non-US")
661 | ```
662 | 
663 | *** =sample_code
664 | ```{r}
665 | # star_wars_matrix is already defined in your workspace
666 | 
667 | # Select the US revenues for "A New Hope" and "The Empire Strikes Back"
668 | 
669 | 
670 | # Select the last two rows and both columns
671 | 
672 | 
673 | ```
674 | 
675 | *** =solution
676 | ```{r}
677 | # star_wars_matrix is already defined in your workspace
678 | 
679 | # Select the US revenues for "A New Hope" and "The Empire Strikes Back"
680 | star_wars_matrix[c("A New Hope", "The Empire Strikes Back"), "US"]
681 | 
682 | # Select the last two rows and both columns
683 | star_wars_matrix[c(FALSE, TRUE, TRUE), c(TRUE, TRUE)]
684 | ```
685 | 
686 | *** =sct
687 | ```{r}
688 | msg <- "Do not remove or override `star_wars_matrix`, it has already been defined for you!"
689 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
690 | msg <- "Have another look at the %s instruction. Are you sure you selected the correct element(s) using the correct methods?"
691 | test_output_contains("star_wars_matrix[c(\"A New Hope\", \"The Empire Strikes Back\"), \"US\"]", incorrect_msg = sprintf(msg, "first"))
692 | test_output_contains("star_wars_matrix[c(FALSE, TRUE, TRUE), c(TRUE, TRUE)]", incorrect_msg = sprintf(msg, "second"))
693 | success_msg("Awesome! Remember that you can combine subsetting by indices, by names and using logical vectors: you can for example select the rows using indices, but the columns using a vector of `TRUE`s and `FALSE`s. You name it, R can handle it!")
694 | ```
695 | 
696 | 
697 | --- type:VideoExercise lang:r xp:50 skills:1 key:f49d1498f5
698 | ## Matrix Arithmetic
699 | 
700 | *** =video_link
701 | //player.vimeo.com/video/141163423
702 | 
703 | *** =video_hls
704 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch3_3.master.m3u8
705 | 
706 | --- type:NormalExercise lang:r xp:100 skills:1 key:c099d6de31
707 | ## Arithmetic with matrices (1)
708 | 
709 | Similar to what you have learned with vectors, the standard operators like `+`, `-`, `/`, `*`, etc. work in an element-wise way on matrices in R. 
710 | 
711 | As a newly hired data analyst for StarWarsStudios, it is your job to find out how many visitors went to each movie for each geographical area. 
712 | 
713 | You already have the total revenue figures in `star_wars_matrix`. Assume that the price of a ticket was 5 dollars. Dividing the box office numbers by the ticket price gives you the number of visitors.
714 | 
715 | *** =instructions
716 | - Assign the matrix with the estimated number of Non-US and US visitors (in millions) for the three movies to `visitors`.
717 | - Print the resulting variable to the console.
718 | 
719 | *** =hint
720 | The number of visitors is the revenue (which is stored in `star_wars_matrix`) divided by the price of a ticket (assumed to be $5).
721 | 
722 | *** =pre_exercise_code
723 | ```{r}
724 | # no pec
725 | ``` 
726 | 
727 | *** =sample_code
728 | ```{r}
729 | # Star Wars box office in millions (!)
730 | new_hope <- c(460.998, 314.4)
731 | empire_strikes <- c(290.475, 247.900)
732 | return_jedi <- c(309.306, 165.8)
733 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
734 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
735 | colnames(star_wars_matrix) <- c("US", "non-US")
736 | 
737 | # Estimation of visitors
738 | 
739 |   
740 | # Print the estimate to the console
741 | 
742 | ```
743 | 
744 | *** =solution
745 | ```{r}
746 | # Star Wars box office in millions (!)
747 | new_hope <- c(460.998, 314.4)
748 | empire_strikes <- c(290.475, 247.900)
749 | return_jedi <- c(309.306, 165.8)
750 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi)
751 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
752 | colnames(star_wars_matrix) <- c("US", "non-US")
753 | 
754 | # Estimation of visitors
755 | visitors <- star_wars_matrix / 5
756 | 
757 | # Print the estimate to the console
758 | visitors
759 | ```
760 | 
761 | *** =sct
762 | ```{r}
763 | msg <- "Do not remove or override `star_wars_matrix`, it has already been defined for you!"
764 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
765 | test_object("visitors",
766 |             incorrect_msg = "It looks like `visitors` does not contain the correct value. Remember that operations on matrices are element-wise.")
767 | test_output_contains("visitors", 
768 |                      incorrect_msg = "Don't forget to also print the variable `visitors` to the console.")
769 | success_msg("Great! What do these results tell you? A staggering 92 million people went to see A New Hope in US theaters! Continue to the next exercise.")
770 | ```
771 | 
772 | 
773 | --- type:NormalExercise lang:r xp:100 skills:1 key:57d4c926e3
774 | ## Arithmetic with matrices (2)
775 | 
776 | Just like `2 * my_matrix` multiplies every element of `my_matrix` by 2, `my_matrix1 * my_matrix2` creates a matrix where each element is the product of the corresponding elements in `my_matrix1` and `my_matrix2`. 
777 | 
778 | After looking at the result of the previous exercise, the boss of StarWarsStudios points out that the ticket prices went up by one dollar per movie over time. He asks you to redo the analysis based on the prices you can find in `ticket_prices_matrix` (source: imagination).
779 | 
780 | _Those who are familiar with linear algebra: this is not the standard matrix multiplication for which you should use `%*%` in R._
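
To make the difference concrete, here is a minimal sketch with two made-up 2 x 2 matrices. The names `a` and `b` and their values are purely illustrative and are not part of the exercise.

```r
# Two small 2 x 2 matrices (illustrative values only)
a <- matrix(1:4, nrow = 2)
b <- matrix(5:8, nrow = 2)

# Element-wise product: entry [i, j] equals a[i, j] * b[i, j]
elementwise <- a * b

# Standard matrix product from linear algebra
matrix_product <- a %*% b

elementwise
matrix_product
```

Division with `/` works element-wise in the same way as `*`, which is what the instructions below rely on.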
781 | 
782 | *** =instructions
783 | - Assign to `visitors` the matrix with your estimated number of Non-US and US visitors (in millions) for the three movies. Use `ticket_prices_matrix` this time to take into account the movie-specific ticket prices.
784 | - Subset `visitors` so that you keep only the US visitors, and calculate the average of this column. You have to use [`mean()`](http://www.rdocumentation.org/packages/base/functions/mean) here; this R function gives you the average of a numerical vector. Store the result in a variable `average_us_visitors`.
785 | 
786 | *** =hint
787 | - You can use the function [`mean()`](http://www.rdocumentation.org/packages/base/functions/mean) to calculate the average of the inputs to the function.
788 | - To get the number of visitors in the US, select the first column from `visitors` using `visitors[ ,1]`.
789 | 
790 | *** =pre_exercise_code
791 | ```{r}
792 | # no pec
793 | ```
794 | 
795 | *** =sample_code
796 | ```{r}
797 | # Star Wars box office in millions (!)
798 | box_office_all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
799 | movie_names <- c("A New Hope","The Empire Strikes Back","Return of the Jedi")
800 | col_titles <- c("US","non-US")
801 | star_wars_matrix <- matrix(box_office_all, nrow = 3, byrow = TRUE, dimnames = list(movie_names, col_titles))
802 | 
803 | # Definition of ticket_prices_matrix
804 | ticket_prices_matrix <- matrix(c(5, 5, 6, 6, 7, 7), nrow = 3, byrow = TRUE, dimnames = list(movie_names, col_titles)) 
805 | 
806 | # Estimated number of visitors
807 | 
808 | 
809 | # Average number of US visitors
810 | 
811 | 
812 | ```
813 | 
814 | *** =solution
815 | ```{r}
816 | # Star Wars box office in millions (!) 
817 | # Construct matrix 
818 | box_office_all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8)
819 | movie_names <- c("A New Hope","The Empire Strikes Back","Return of the Jedi")
820 | col_titles <- c("US","non-US")
821 | star_wars_matrix <- matrix(box_office_all, nrow = 3, byrow = TRUE, dimnames = list(movie_names, col_titles))
822 | ticket_prices_matrix <- matrix(c(5, 5, 6, 6, 7, 7), nrow = 3, byrow = TRUE, dimnames = list(movie_names,col_titles))
823 | 
824 | # Estimated number of visitors
825 | visitors <- star_wars_matrix / ticket_prices_matrix
826 | 
827 | # Average number of US visitors
828 | average_us_visitors <- mean(visitors[ ,1])
829 | ```
830 | 
831 | *** =sct
832 | ```{r}
833 | msg <- "Do not change anything about the preset variables!"
834 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
835 | test_object("ticket_prices_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
836 | test_object("visitors", incorrect_msg = "It looks like `visitors` does not contain the correct value. Remember that you can divide two matrices.")
837 | test_object("average_us_visitors", incorrect_msg = "It looks like `average_us_visitors` does not contain the average of the US visitors. Use [`mean()`](http://www.rdocumentation.org/packages/base/functions/mean) in combination with a subset of `visitors`.")
838 | success_msg("It's a fact: the R force is with you! You are now ready for the final exercise on matrices!")
839 | ```
840 | 
841 | 


--------------------------------------------------------------------------------
/chapter4.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title_meta  : Chapter 4
  3 | title       : Factors
  4 | description : Very often, data falls into a limited number of categories. In R, categorical data is stored in factors. Given the importance of these factors in data analysis, you should start learning how to create, subset and compare them now!
  5 | attachments :
  6 |   slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch4_slides.pdf
  7 | 
  8 | --- type:VideoExercise lang:r xp:50 skills:1 key:c9cff655dd
  9 | ## Factors
 10 | 
 11 | *** =video_link
 12 | //player.vimeo.com/video/138173962
 13 | 
 14 | *** =video_hls
 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch4_1.master.m3u8
 16 | 
 17 | 
 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:36ba783482
 19 | ## Vector to factor
 20 | 
 21 | In the following exercises, we'll be working with handedness as a categorical variable that can be either "Left" or "Right". In general, you're either left-handed or you're right-handed (not both), so this complies with the conditions of a categorical variable.
 22 | 
 23 | To create factors in R, you make use of the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor). The first thing you do is create a vector that contains all the observations that belong to a limited number of categories. For example, `hand_vector` contains the handedness of 5 different individuals. Next, you pass this vector to the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor).
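
As a rough sketch of these two steps on different data, the example below uses a made-up vector of blood types. The names `blood_vector` and `blood_factor` are assumptions for illustration only.

```r
# Step 1: a character vector with a limited number of categories
blood_vector <- c("A", "O", "B", "O", "A")

# Step 2: pass the vector to factor()
blood_factor <- factor(blood_vector)

# The distinct categories become the levels, sorted alphabetically
levels(blood_factor)

# Internally, each observation is stored as the integer code of its level
as.integer(blood_factor)
```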
 24 | 
 25 | *** =instructions
 26 | - Assign to `hand_factor` the character vector `hand_vector` converted to a factor. Look at the console and note that R prints out the factor levels below the actual values.
 27 | - Have a look at the underlying structure of `hand_factor` using [`str()`](http://www.rdocumentation.org/packages/utils/functions/str).
 28 | 
 29 | *** =hint
 30 | Simply use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) on `hand_vector`. Have a look at the assignment, the answer is already there somewhere...
 31 | 
 32 | *** =pre_exercise_code
 33 | ```{r}
 34 | # no pec
 35 | ```
 36 | 
 37 | *** =sample_code
 38 | ```{r}
 39 | # Definition of hand_vector
 40 | hand_vector <- c("Right", "Left", "Left", "Right", "Left")
 41 | 
 42 | # Convert hand_vector to a factor: hand_factor
 43 | 
 44 | 
 45 | # Display the structure of hand_factor
 46 | ```
 47 | 
 48 | *** =solution
 49 | ```{r}
 50 | # Definition of hand_vector
 51 | hand_vector <- c("Right", "Left", "Left", "Right", "Left")
 52 | 
 53 | # Convert hand_vector to a factor: hand_factor
 54 | hand_factor <- factor(hand_vector)
 55 | 
 56 | # Display the structure of hand_factor
 57 | str(hand_factor)
 58 | ```
 59 | 
 60 | *** =sct
 61 | ```{r}
 62 | test_error()
 63 | test_object("hand_factor",
 64 |             incorrect_msg = "Did you correctly convert `hand_vector` to a factor and assign the result to `hand_factor`?")
 65 | test_function("str", "object",
 66 |               not_called_msg = "Make sure to use [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) to show the structure of `hand_factor`.",
 67 |               incorrect_msg = "Have you correctly passed `hand_factor` to [`str()`](http://www.rdocumentation.org/packages/utils/functions/str)?")
 68 | success_msg("Great! If you want to find out more about the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function, do not hesitate to type `?factor` in the console.");
 69 | ```
 70 | 
 71 | 
 72 | --- type:NormalExercise lang:r xp:100 skills:1 key:56d9d5409e
 73 | ## Factor levels
 74 | 
 75 | When you first get a data set, you will often notice that it contains factors with specific factor levels. Of course, you can also change these factor levels. You can either do this with the [`levels()`](http://www.rdocumentation.org/packages/base/functions/levels) function, after you defined the factor, or using the `labels` argument inside [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor). The following options are equivalent:
 76 | 
 77 | ```
 78 | my_vector <- c("L", "S", "L", "M", "M")
 79 | 
 80 | # Option 1
 81 | my_factor <- factor(my_vector)
 82 | levels(my_factor) <- c("Large", "Medium", "Small")
 83 | 
 84 | # Option 2
 85 | my_factor <- factor(my_vector,
 86 |                     levels = c("S", "M", "L"),
 87 |                     labels = c("Small", "Medium", "Large"))
 88 | ```
 89 | 
 90 | In the first option, you have to pass the levels in alphabetical order. To avoid mistakes, you'd better use the second option, but that's up to you!
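
The sketch below shows what can go wrong with the first option. It reuses `my_vector` from the example above; the names `wrong_factor` and `safe_factor` are made up for illustration.

```r
my_vector <- c("L", "S", "L", "M", "M")

# Without a levels argument, factor() sorts the levels alphabetically
my_factor <- factor(my_vector)
levels(my_factor)

# Renaming in the order S, M, L silently mislabels the data,
# because the existing levels are in the order L, M, S
wrong_factor <- my_factor
levels(wrong_factor) <- c("Small", "Medium", "Large")
as.character(wrong_factor)

# Pairing levels and labels explicitly avoids the problem
safe_factor <- factor(my_vector,
                      levels = c("S", "M", "L"),
                      labels = c("Small", "Medium", "Large"))
as.character(safe_factor)
```

In `wrong_factor`, every "L" observation ends up labeled "Small", while `safe_factor` labels it "Large" as intended.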
 91 | 
 92 | You performed a street questionnaire, and recorded the respondents' handedness using the letters "R" and "L". This information is stored in a vector `survey_vector`, which is already coded on the right.
 93 | 
 94 | *** =instructions
 95 | - Convert the character vector `survey_vector` into a factor vector, `survey_factor`, with the levels "Right" and "Left".
 96 | - Print `survey_factor` to inspect its contents.
 97 | 
 98 | *** =hint
 99 | If you're using the first approach outlined above, mind the order in which you have to type in the factor levels.
100 | 
101 | *** =pre_exercise_code
102 | ```{r}
103 | # no pec
104 | ```
105 | 
106 | *** =sample_code
107 | ```{r}
108 | # Definition of survey_vector
109 | survey_vector <- c("R", "L", "L", "R", "R")
110 | 
111 | # Encode survey_vector as a factor with the correct names: survey_factor
112 | 
113 | 
114 | # Print survey_factor
115 | 
116 | ```
117 | 
118 | *** =solution
119 | ```{r}
120 | # Definition of survey_vector
121 | survey_vector <- c("R", "L", "L", "R", "R")
122 | 
123 | # Encode survey_vector as a factor with the correct names: survey_factor
124 | survey_factor <- factor(survey_vector, levels = c("R", "L"), labels = c("Right", "Left"))
125 | survey_factor_2 <- factor(survey_vector, levels = c("L", "R"), labels = c("Left", "Right")) # also possible
126 | 
127 | # Print survey_factor
128 | survey_factor
129 | ```
130 | 
131 | *** =sct
132 | ```{r}
133 | msg <- "Do not change the definition of `survey_vector`!"
134 | test_object("survey_vector", undefined_msg = msg, incorrect_msg = msg)
135 | msg <- "Have you correctly converted `survey_vector` to a factor? Make sure to correctly specify the new factor levels (R is case sensitive!)."
136 | test_object("survey_factor")
137 | test_object("survey_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
138 | test_output_contains("survey_factor", incorrect_msg = "Don't forget to print `survey_factor`!")
139 | success_msg("Wonderful! Proceed to the next exercise.")
140 | ```
141 | 
142 | 
143 | --- type:NormalExercise lang:r xp:100 skills:1 key:e09996c475
144 | ## Nominal versus Ordinal, Unordered versus Ordered
145 | 
146 | Remember that there are two types of categorical variables? On the one hand there's the _nominal categorical variable_, which does not have an implied order. The _ordinal categorical variable_, on the other hand, does have a natural ordering.
147 | 
148 | By default, the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) transforms a vector into an unordered factor. To create an ordered factor, you have to add two additional arguments: `ordered` and `levels`.
149 | 
150 | ```
151 | factor(some_vector,
152 |        ordered = TRUE,
153 |        levels = c("Level_1", "Level_2", ...))
154 | ```
155 | 
156 | By setting the argument `ordered` to `TRUE`, you indicate that the factor is ordered. With the argument `levels` you give the values of the factor in the correct order.
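
Here is a small sketch of the difference, using a made-up vector of clothing sizes. The names `size_vector`, `plain_factor` and `size_factor` are assumptions for illustration.

```r
size_vector <- c("Medium", "Small", "Large", "Small")

# Default: an unordered factor with alphabetically sorted levels
plain_factor <- factor(size_vector)
is.ordered(plain_factor)
levels(plain_factor)

# Ordered factor: the levels are listed from smallest to largest
size_factor <- factor(size_vector,
                      ordered = TRUE,
                      levels = c("Small", "Medium", "Large"))
is.ordered(size_factor)
levels(size_factor)
```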
157 | 
158 | *** =instructions
159 | - Convert `animal_vector` to a factor, `animal_factor`. Make it an ordered factor if that's appropriate.
160 | - Encode `temperature_vector` as a factor called `temperature_factor`. Again, order this factor if that makes sense.
161 | - Print out both factors and compare the outputs.
162 | 
163 | *** =hint
164 | `animal_factor` should not be ordered; `temperature_factor` should!
165 | 
166 | *** =pre_exercise_code
167 | ```{r}
168 | # no pec
169 | ```
170 | 
171 | *** =sample_code
172 | ```{r}
173 | # Definition of animal_vector and temperature_vector
174 | animal_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
175 | temperature_vector <- c("High", "Low", "High", "Low", "Medium")
176 | 
177 | # Convert animal_vector to a factor: animal_factor
178 | 
179 | 
180 | # Encode temperature_vector as a factor: temperature_factor
181 | 
182 | 
183 | # Print out animal_factor and temperature_factor
184 | 
185 | ```
186 | 
187 | *** =solution
188 | ```{r}
189 | # Definition of animal_vector and temperature_vector
190 | animal_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
191 | temperature_vector <- c("High", "Low", "High", "Low", "Medium")
192 | 
193 | # Convert animal_vector to a factor: animal_factor
194 | animal_factor <- factor(animal_vector)
195 | 
196 | # Encode temperature_vector as a factor: temperature_factor
197 | temperature_factor <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
198 | 
199 | # Print out animal_factor and temperature_factor
200 | animal_factor
201 | temperature_factor
202 | ```
203 | 
204 | *** =sct
205 | ```{r}
206 | msg <- "Do not change or remove the code that defines `animal_vector` and `temperature_vector`!"
207 | test_object("animal_vector", undefined_msg = msg, incorrect_msg = msg)
208 | test_object("temperature_vector", undefined_msg = msg, incorrect_msg = msg)
209 | 
210 | test_object("animal_factor", eq_condition = "equal",
211 |             incorrect_msg = "Have you correctly converted `animal_vector` to a factor? You shouldn't order this factor!")
212 | 
213 | test_object("temperature_factor", eq_condition = "equal",
214 |             incorrect_msg = "Have you correctly encoded `temperature_vector` as a factor? In this case, an ordered factor makes sense!")
215 | 
216 | msg <- "Don't forget to print out both `animal_factor` and `temperature_factor`."
217 | test_output_contains("animal_factor", incorrect_msg = msg)
218 | test_output_contains("temperature_factor", incorrect_msg = msg)
219 | 
220 | success_msg("Awesome! You correctly figured out that `animal_factor` shouldn't be ordered while `temperature_factor` should.")
221 | ```
222 | 
223 | 
224 | --- type:NormalExercise lang:r xp:100 skills:1 key:f6eb0a5c5f
225 | ## Left better than right?
226 | 
227 | In `survey_factor` you have a two-level factor, containing "Left" and "Right". But how does R value these relative to each other? In other words, who does R think is better, left or right?
228 | 
229 | *** =instructions
230 | - Select the first element from `survey_factor` and store it in the variable `right`.
231 | - Select the second element from `survey_factor` and store it in the variable `left`.
232 | - Using the greater than sign, find out whether `right` is greater than `left`.
233 | 
234 | *** =hint
235 | You can subset factors exactly the same as you subset vectors: using square brackets. To select the fourth element from `survey_factor`, you can type `survey_factor[4]`.
236 | 
237 | *** =pre_exercise_code
238 | ```{r}
239 | # no pec
240 | ```
241 | 
242 | *** =sample_code
243 | ```{r}
244 | # Definition of survey_vector and survey_factor
245 | survey_vector <- c("R", "L", "L", "R", "R")
246 | survey_factor <- factor(survey_vector, levels = c("R", "L"), labels = c("Right", "Left"))
247 | 
248 | # First element from survey_factor: right
249 | 
250 | 
251 | # Second element from survey_factor: left
252 | 
253 | 
254 | # Right 'greater than' left?
255 | 
256 | ```
257 | 
258 | *** =solution
259 | ```{r}
260 | # Definition of survey_vector and survey_factor
261 | survey_vector <- c("R", "L", "L", "R", "R")
262 | survey_factor <- factor(survey_vector, levels = c("R", "L"), labels = c("Right", "Left"))
263 | 
264 | # First element from survey_factor: right
265 | right <- survey_factor[1]
266 | 
267 | # Second element from survey_factor: left
268 | left <- survey_factor[2]
269 | 
270 | # Right 'greater than' left?
271 | right > left
272 | ```
273 | 
274 | *** =sct
275 | ```{r}
276 | msg = "Do not change anything about the first lines that define `survey_vector` and `survey_factor`."
277 | test_object("survey_vector", undefined_msg = msg, incorrect_msg = msg)
278 | test_object("survey_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
279 | test_output_contains("survey_factor[1] > survey_factor[2]", incorrect_msg = "Make sure to correctly perform the comparison between right and left; we want the battle of dexterity to be settled once and for all!")
280 | success_msg("Phew, it seems that R is neutral ;-).")
281 | ```
282 | 
283 | 
284 | --- type:NormalExercise lang:r xp:100 skills:1 key:d65e71390a
285 | ## Ordered factors
286 | 
287 | Sometimes you will deal with factors that do have a natural ordering between their categories. In this case, you have to tell R about it.
288 | 
289 | Suppose you're leading a research team of five data analysts and that you want to evaluate their performance. To do this, you track their speed, evaluate each analyst as `"Slow"`, `"OK"` or `"Fast"`, and save the results in `speed_vector`.
290 | 
291 | `speed_vector` should be converted to an ordinal factor since its categories have a natural ordering. You already know how to do this. Here's a template to define an ordered factor once more:
292 | 
293 | ```
294 | factor(some_vector, ordered = TRUE, levels = c("Level_1", "Level_2", ...))
295 | ```
296 | 
297 | *** =instructions
298 | - Use the example code above to define `speed_factor`, which contains the speed information as an ordered factor. You can start from `speed_vector`, which is already created for you.
299 | - Print `speed_factor` to the console.
300 | - Use [`summary()`](http://www.rdocumentation.org/packages/base/functions/summary) to generate a summary of `speed_factor`: automagically, R prints the factor levels in the right order.
301 | 
302 | *** =hint
303 | - Use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) to create `speed_factor` based on `speed_vector`.
304 | - The argument `ordered` should be set to `TRUE` since there is a natural ordering.
305 | - The argument `levels` should be equal to `c("Slow", "OK", "Fast")`.
306 | 
307 | *** =pre_exercise_code
308 | ```{r}
309 | # no pec
310 | ```
311 | 
312 | *** =sample_code
313 | ```{r}
314 | # Create speed_vector
315 | speed_vector <- c("OK", "Slow", "Slow", "OK", "Fast")
316 | 
317 | # Convert speed_vector to ordered speed_factor
318 | 
319 | 
320 | # Print speed_factor
321 | 
322 | 
323 | # Summarize speed_factor
324 | 
325 | ```
326 | 
327 | *** =solution
328 | ```{r}
329 | # Create speed_vector
330 | speed_vector <- c("OK", "Slow", "Slow", "OK", "Fast")
331 | 
332 | # Convert speed_vector to ordered speed_factor
333 | speed_factor <- factor(speed_vector, ordered = TRUE, levels = c("Slow", "OK", "Fast"))
334 | 
335 | # Print speed_factor
336 | speed_factor
337 | 
338 | # Summarize speed_factor
339 | summary(speed_factor)
340 | ```
341 | 
342 | *** =sct
343 | ```{r}
344 | test_error()
345 | msg <- "Do not change anything about the command that defines `speed_vector`."
346 | test_object("speed_vector", undefined_msg = msg, incorrect_msg = msg)
347 | test_correct({
348 |   test_object("speed_factor", eq_condition = "equal",
349 |               incorrect_msg = "Make sure that you assigned the correct factor to `speed_factor`. Pay attention to the correct order of the `levels` argument.")
350 | },{
351 |   test_function("factor", "x")
352 |   test_function("factor", "levels")
353 |   test_function("factor", "ordered")
354 | })
355 | test_output_contains("summary(speed_factor)", incorrect_msg = "Don't forget to summarise `speed_factor`. Use [`summary()`](http://www.rdocumentation.org/packages/base/functions/summary).")
356 | success_msg("Great! Have a look at the console. It is now indicated that the Levels indeed have an order associated, with the `<` sign. Continue to the next exercise.")
358 | ```
359 | 
360 | 
361 | --- type:NormalExercise lang:r xp:100 skills:1 key:e23011c42b
362 | ## Comparing ordered factors
363 | 
364 | 'Data analyst number two' is having a bad day at work. He enters your office and starts complaining that 'data analyst number five' is slowing down the entire project. Since you know that 'data analyst number two' has the reputation of being a smarty-pants, you first decide to check if his statement is true.
365 | 
366 | The fact that `speed_factor` is now ordered enables us to compare different elements (the data analysts in this case). You can simply do this by using a well-known operator: `>`.
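
A minimal sketch of such comparisons on a made-up ordered factor follows. The name `priority_factor` and its values are assumptions for illustration and are unrelated to `speed_factor`.

```r
priority_factor <- factor(c("Low", "High", "Medium", "Low"),
                          ordered = TRUE,
                          levels = c("Low", "Medium", "High"))

# Compare two elements: is the first lower than the second?
first_vs_second <- priority_factor[1] < priority_factor[2]
first_vs_second

# Compare every element against one of the levels
above_low <- priority_factor > "Low"
above_low

# Ordered factors also support min() and max()
highest <- max(priority_factor)
highest
```

The comparison follows the order given in `levels`, not the alphabetical order of the labels.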
367 | 
368 | *** =instructions
369 | Check whether data analyst 2 is faster than data analyst 5. Simply print out the result, which should be a logical.
370 | 
371 | *** =hint
372 | `vector[1] > vector[2]` checks whether the first element of vector is larger than the second element.
373 | 
374 | *** =pre_exercise_code
375 | ```{r}
376 | # no pec
377 | ```
378 | 
379 | *** =sample_code
380 | ```{r}
381 | # Definition of speed_factor
382 | speed_factor <- factor(c("Fast", "Slow", "Slow", "Fast", "Ultra-fast"),
383 |                        ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
384 | 
385 | # Compare data analyst 2 with data analyst 5
386 | 
387 | ```
388 | 
389 | *** =solution
390 | ```{r}
391 | # Definition of speed_factor
392 | speed_factor <- factor(c("Fast", "Slow", "Slow", "Fast", "Ultra-fast"),
393 |                        ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
394 | 
395 | # Compare data analyst 2 with data analyst 5
396 | speed_factor[2] > speed_factor[5]
397 | ```
398 | 
399 | *** =sct
400 | ```{r}
401 | msg <- "Do not change anything about the command that defines `speed_factor`!"
402 | test_object("speed_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
403 | test_output_contains("speed_factor[2] > speed_factor[5]",
404 |                      incorrect_msg = paste("Have you correctly compared data analyst 2 to data analyst 5?",
405 |                                            "Use subsetting in combination with the `>` operator."))
406 | success_msg("Bellissimo! What does the result tell you? Data analyst two is complaining about data analyst five while in fact he or she is the one slowing everything down!")
407 | ```
408 | 


--------------------------------------------------------------------------------
/chapter5.Rmd:
--------------------------------------------------------------------------------
  1 | --- 
  2 | title_meta  : Chapter 5
  3 | title       : Lists
  4 | description : Lists, as opposed to vectors, can hold components of different types, just like your to-do list at home or at work. This chapter will teach you how to create, name and subset these lists!
  5 | attachments : 
  6 |   slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch5_slides.pdf
  7 | 
  8 | --- type:VideoExercise lang:r xp:50 skills:1 key:d4fe5a84ef
  9 | ## Create and Name Lists
 10 | 
 11 | *** =video_link
 12 | //player.vimeo.com/video/138173972
 13 | 
 14 | *** =video_hls
 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch5_1.master.m3u8
 16 | 
 17 | 
 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:b14f9472dd
 19 | ## Create a list
 20 | 
 21 | Just a quick refresher: A list in R allows you to gather a variety of objects in an ordered way. These objects can be matrices, vectors, factors, data frames, even other lists, etc. It is not even required that these objects are related to each other. You can construct a list with the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function:
 22 | 
 23 | ```
 24 | my_list <- list(comp1, comp2, ...)
 25 | ```
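
As a small sketch of this template, the example below gathers three unrelated objects of different types. The names `numbers`, `label`, `flags` and `example_list` are made up for illustration.

```r
# Three objects of different types and lengths
numbers <- c(2, 4, 6)
label <- "even numbers"
flags <- c(TRUE, FALSE)

# Gather them in a single list; the order of the components is preserved
example_list <- list(numbers, label, flags)

# Number of components and the class of each one
length(example_list)
component_classes <- sapply(example_list, class)
component_classes
```

A vector created with `c()` would coerce these values to a single type, whereas the list keeps each component as it is.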
 26 | 
 27 | *** =instructions
 28 | Construct a list, named `my_list`, that contains the variables `my_vector`, `my_matrix` and `my_factor` as list components.
 29 | 
 30 | *** =hint
 31 | Use the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function with `my_vector`, `my_matrix` and `my_factor` as arguments separated by a comma.
 32 | 
 33 | *** =pre_exercise_code
 34 | ```{r}
 35 | # no pec
 36 | ```
 37 | 
 38 | *** =sample_code
 39 | ```{r}
 40 | # Numeric vector: 1 up to 10
 41 | my_vector <- 1:10 
 42 | 
 43 | # Numeric matrix: 1 up to 9
 44 | my_matrix <- matrix(1:9, ncol = 3)
 45 | 
 46 | # Factor of sizes
 47 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
 48 | 
 49 | # Construct my_list with these different elements
 50 | 
 51 | ```
 52 | 
 53 | *** =solution
 54 | ```{r}
 55 | # Numeric vector: 1 up to 10
 56 | my_vector <- 1:10 
 57 | 
 58 | # Numeric matrix: 1 up to 9
 59 | my_matrix <- matrix(1:9, ncol = 3)
 60 | 
 61 | # Factor of sizes
 62 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
 63 | 
 64 | # Construct my_list with these different elements
 65 | my_list <- list(my_vector, my_matrix, my_factor)
 66 | ```
 67 | 
 68 | *** =sct
 69 | ```{r}
 70 | test_error()
 71 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix` or `my_factor`!"
 72 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg)
 73 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg)
 74 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg)
 75 | test_object("my_list", incorrect_msg = "It looks like `my_list` does not contain the correct elements. Have another look.")
 76 | success_msg("Wonderful! Your skillset is growing at a staggering pace! Head over to the next exercise.")
 77 | ```
 78 | 
 79 | 
 80 | --- type:NormalExercise lang:r xp:100 skills:1 key:849f04f218
 81 | ## Listception: lists in lists
 82 | 
 83 | As mentioned before, lists can also contain other lists. This works just the same as storing other types of R objects in a list. Next to the variables `my_vector`, `my_matrix` and `my_factor` from the previous exercise, now also `my_list` is predefined. Up to you to merge them in a new list; a super list!
 84 | 
 85 | *** =instructions
 86 | - Construct a list, named `my_super_list`, that now contains the four predefined variables listed in the sample code (in the same order).
 87 | - Print the structure of `my_super_list` with the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function. Be sure to enter the variables in the following order: `my_vector`, `my_matrix`, `my_factor`, `my_list`.
 88 | 
 89 | *** =hint
 90 | Just as in the previous exercise, use the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function. This time you have to add four components.
 91 | 
 92 | *** =pre_exercise_code
 93 | ```{r}
 94 | # no pec
 95 | ```
 96 | 
 97 | *** =sample_code
 98 | ```{r}
 99 | # Numeric vector: 1 up to 10
100 | my_vector <- 1:10 
101 | 
102 | # Numeric matrix: 1 up to 9
103 | my_matrix <- matrix(1:9, ncol = 3)
104 | 
105 | # Factor of sizes
106 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
107 | 
108 | # List containing vector, matrix and factor
109 | my_list <- list(my_vector, my_matrix, my_factor)
110 | 
111 | # Construct my_super_list with the four data structures above
112 | 
113 | 
114 | # Display structure of my_super_list
115 | 
116 | ```
117 | 
118 | *** =solution
119 | ```{r}
120 | # Numeric vector: 1 up to 10
121 | my_vector <- 1:10 
122 | 
123 | # Numeric matrix: 1 up to 9
124 | my_matrix <- matrix(1:9, ncol = 3)
125 | 
126 | # Factor of sizes
127 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
128 | 
129 | # List containing vector, matrix and factor
130 | my_list <- list(my_vector, my_matrix, my_factor)
131 | 
132 | # Construct my_super_list with the four data structures above
133 | my_super_list <- list(my_vector, my_matrix, my_factor, my_list)
134 | 
135 | # Display structure of my_super_list
136 | str(my_super_list)
137 | ```
138 | 
139 | *** =sct
140 | ```{r}
141 | test_error()
142 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix`, `my_factor` or `my_list`!"
143 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg)
144 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg)
145 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg)
146 | test_object("my_list", undefined_msg = msg, incorrect_msg = msg)
147 | test_object("my_super_list",
148 |             incorrect_msg = "It looks like `my_super_list` does not contain the correct elements. It's also possible that the order is not correct. Have another look.")
149 | test_output_contains("str(my_super_list)", incorrect_msg = "Don't forget to display the structure of `my_super_list` using the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.")
150 | success_msg("Nice one! Can you see from the displayed structure how the vector, matrix and ordered factor appear twice: once in the top-level list and once in the embedded list? Next!")
151 | ```
152 | 
153 | 
154 | --- type:NormalExercise lang:r xp:100 skills:1 key:f37117a766
155 | ## Create a named list (1)
156 | 
157 | Well done! Let us keep this train going! To make the elements of your list clearer, you'll often want to name them: 
158 | 
159 | ```
160 | my_list <- list(name1 = your_comp1, 
161 |                 name2 = your_comp2)
162 | ``` 
163 | 
164 | If you want to name your lists after you've created them, you can use the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function as you did with vectors. The following commands are fully equivalent to the assignment above:
165 | 
166 | ```
167 | my_list <- list(your_comp1, your_comp2)
168 | names(my_list) <- c("name1", "name2")
169 | ```
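
If you want to convince yourself that both approaches give the same result, you can check it with `identical()`. In this sketch, `your_comp1` and `your_comp2` are filled with made-up values, and `list_a` and `list_b` are illustrative names.

```r
# Made-up components
your_comp1 <- 1:3
your_comp2 <- c("a", "b")

# Approach 1: name the components while creating the list
list_a <- list(name1 = your_comp1, name2 = your_comp2)

# Approach 2: create the list first, then assign names
list_b <- list(your_comp1, your_comp2)
names(list_b) <- c("name1", "name2")

# Both lists have the same components and the same names
same_result <- identical(list_a, list_b)
same_result
```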
170 | 
171 | *** =instructions
172 | - Change the code that builds `my_list` by adding names to the components. Use for `my_matrix` the name `mat`, for `my_vector` the name `vec` and for `my_factor` the name `fac`.
173 | - Print the list to the console and inspect the output.
174 | 
175 | *** =hint
176 | The first method of assigning names to your list components is the easiest. It starts like this: 
177 | ```
178 | my_list <- list(vec = my_vector)
179 | ```
180 | Add the other two components in a similar fashion.
181 | 
182 | *** =pre_exercise_code
183 | ```{r}
184 | # no pec
185 | ```
186 | 
187 | *** =sample_code
188 | ```{r}
189 | # Numeric vector: 1 up to 10
190 | my_vector <- 1:10 
191 | 
192 | # Numeric matrix: 1 up to 9
193 | my_matrix <- matrix(1:9, ncol = 3)
194 | 
195 | # Factor of sizes
196 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
197 | 
198 | # Adapt code to add names to elements in my_list
199 | my_list <- list(my_vector, my_matrix, my_factor)
200 | 
201 | # Print my_list to the console
202 | 
203 | ```
204 | 
205 | *** =solution
206 | ```{r}
207 | # Numeric vector: 1 up to 10
208 | my_vector <- 1:10 
209 | 
210 | # Numeric matrix: 1 up to 9
211 | my_matrix <- matrix(1:9, ncol = 3)
212 | 
213 | # Factor of sizes
214 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L"))
215 | 
216 | # Adapt code to add names to elements in my_list
217 | my_list <- list(vec = my_vector, mat = my_matrix, fac = my_factor)
218 | 
219 | # Print my_list to the console
220 | my_list
221 | ```
222 | 
223 | *** =sct
224 | ```{r}
225 | test_error()
226 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix` or `my_factor`!"
227 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg)
228 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg)
229 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg)
230 | test_object("my_list",
231 |             incorrect_msg = "It looks like `my_list` does not contain the correct elements.")
232 | test_object("my_list", eq_condition = "equal",
233 |             incorrect_msg = "It looks like `my_list` does not contain the correct naming for the components.");
234 | test_output_contains("my_list", 
235 |                      incorrect_msg = "Don't forget to print `my_list` to the console! Simply add `my_list` on a new line in the script.")
236 | success_msg("Great! Not only do you know how to construct lists now, you can also name them; a skill that will prove most useful in practice. Continue to the next exercise.")
237 | ```
238 | 
239 | 
240 | --- type:NormalExercise lang:r xp:100 skills:1 key:dc5f4f6f30
241 | ## Create a named list (2)
242 | 
243 | Being a huge movie fan, you decide to start storing information on good movies with the help of lists. 
244 | 
245 | Start by creating a list for the movie "The Shining". The variables `actors_vector` and `reviews_factor` that you'll need have already been coded in the sample code.
246 | 
247 | *** =instructions
248 | Create the variable `shining_list`. The list contains the movie title first as "title", then the actor names as "actors", and finally the review scores factor as "reviews". Make sure to adopt the same order, and pay attention to the correct naming!
249 | 
250 | *** =hint
251 | Let's get you started with a chunk of code:
252 | 
253 |         shining_list <- list(title = "The Shining")
254 |         
255 | Can you complete the rest? You still have to add `actors_vector` and `reviews_factor` with the appropriate names.
256 | 
257 | *** =pre_exercise_code
258 | ```{r}
259 | # no pec
260 | ```
261 | 
262 | *** =sample_code
263 | ```{r}
264 | # Create actors and reviews
265 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson")
266 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"), 
267 |                   ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect"))
268 | 
269 | # Create shining_list
270 | 
271 | ```
272 | 
273 | *** =solution
274 | ```{r}
275 | # Create actors and reviews
276 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson")
277 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"), 
278 |                   ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect"))
279 | 
280 | # Create the list 'shining_list'
281 | shining_list <- list(title = "The Shining", actors = actors_vector, reviews = reviews_factor)
282 | ```
283 | 
284 | *** =sct
285 | ```{r}
286 | test_error()
287 | msg = "Do not remove or change the definition of the pre-set variables!"
288 | test_object("actors_vector", undefined_msg = msg, incorrect_msg = msg)
289 | test_object("reviews_factor", undefined_msg = msg, incorrect_msg = msg)
290 | test_object("shining_list",
291 |             incorrect_msg = "It looks like `shining_list` does not contain the correct elements.")
292 | test_object("shining_list", eq_condition = "equal",
293 |             incorrect_msg = "It looks like `shining_list` does not contain the correct naming for the components.")
294 | success_msg("Perfect!")
295 | ```
296 | 
297 | 
298 | --- type:VideoExercise lang:r xp:50 skills:1 key:a5e5ff2680
299 | ## Subset and Extend Lists
300 | 
301 | *** =video_link
302 | //player.vimeo.com/video/138173990
303 | 
304 | *** =video_hls
305 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch5_2.master.m3u8
306 | 
307 | 
308 | --- type:NormalExercise lang:r xp:100 skills:1 key:1e3b5f4b0a
309 | ## Selecting elements from a list
310 | 
311 | Your list will often be built out of numerous elements and components. Therefore, getting a single element, multiple elements, or a component out of it is not always straightforward.
312 | 
313 | To select a single element from a list, for example the first element from `shining_list`, you can use any one of the following commands:
314 | 
315 | ```
316 | shining_list[[1]]
317 | shining_list[["title"]]
318 | shining_list$title
319 | ```
320 | 
321 | If you perform selection with single square brackets, you'll end up with another list that contains the specified elements:
322 | 
323 | ```
324 | shining_list[c(2,3)]
325 | shining_list[c(FALSE, TRUE, TRUE)]
326 | ```
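
To make the difference concrete, here is a minimal sketch. It builds a small stand-in list (the name `demo_list` and its shortened contents are assumptions for illustration, not the `shining_list` from the workspace) and compares what `[[` and `[` return.

```r
# Hypothetical stand-in for shining_list (illustration only)
demo_list <- list(title = "The Shining",
                  actors = c("Jack Nicholson", "Shelley Duvall"),
                  reviews = factor(c("Good", "OK", "Perfect")))

# Double brackets return the component itself: here a character vector
class(demo_list[["actors"]])

# Single brackets always return a list, even when you select one element
class(demo_list["actors"])

# Single brackets with a vector select several components at once
names(demo_list[c(1, 3)])
```

If you need to work with the contents of a component, reach for `[[` or `$`; if you want a smaller list, use `[`.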
327 | 
328 | *** =instructions
329 | - Select the actors from `shining_list` and assign the result to `act`.
330 | - Create a new list containing only the title and the reviews of `shining_list`; save the new list in `sublist`.
331 | - Display the structure of `sublist`.
332 | 
333 | *** =hint
334 | For the first instruction you need double brackets (or the `$`), for the second one the single brackets will do.
335 | 
336 | *** =pre_exercise_code
337 | ```{r}
338 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData"))
339 | ```
340 | 
341 | *** =sample_code
342 | ```{r}
343 | # shining_list is already defined in the workspace
344 | 
345 | # Actors from shining_list: act
346 | 
347 | 
348 | # List containing title and reviews from shining_list: sublist
349 | 
350 | 
351 | # Display structure of sublist
352 | 
353 | ```
354 | 
355 | *** =solution
356 | ```{r}
357 | # shining_list is already defined in the workspace
358 | 
359 | # Actors from shining_list: act
360 | act <- shining_list[["actors"]]
361 | 
362 | # List containing title and reviews from shining_list: sublist
363 | sublist <- shining_list[c("title", "reviews")]
364 | 
365 | # Display structure of sublist
366 | str(sublist)
367 | ```
368 | 
369 | *** =sct
370 | ```{r}
371 | test_error()
372 | msg = "Do not remove or override `shining_list`."
373 | test_object("shining_list", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
374 | 
375 | test_object("act", incorrect_msg = "Have you correctly selected the actors from `shining_list`?")
376 | test_object("sublist", incorrect_msg = "Have you correctly selected the title and reviews from `shining_list`? Use single brackets in combination with a vector to select multiple elements.")
377 | test_output_contains("str(sublist)", incorrect_msg = "Don't forget to display the structure of `sublist`!")
378 | 
379 | success_msg("Nice! That was still pretty easy, right? Always be aware of this difference between `[` and `[[`!")
380 | ```
381 | 
382 | 
383 | --- type:NormalExercise lang:r xp:100 skills:1 key:cbda2853c3
384 | ## Chaining your selections
385 | 
386 | Besides selecting entire list elements, you will often need to access specific parts of these components themselves. In R, you can do this by _chaining your subsetting operations_.
387 | 
388 | For example, with 
389 | 
390 | ```
391 | shining_list[[2]][1]
392 | ```
393 | 
394 | you select from the second component, actors (`shining_list[[2]]`), the first element (`[1]`). When you type this in the console, you will see the answer is Jack Nicholson.
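
The chained form is simply two selections written back to back. The sketch below spells out both steps on a stand-in list (`demo_list` and the helper variables `actors` and `first_actor` are hypothetical names used only for this illustration).

```r
# Hypothetical stand-in for shining_list (illustration only)
demo_list <- list(title = "The Shining",
                  actors = c("Jack Nicholson", "Shelley Duvall", "Danny Lloyd"))

# Step 1: extract the component; step 2: index into the resulting vector
actors <- demo_list[["actors"]]
first_actor <- actors[1]

# The chained forms do both steps in a single expression
identical(demo_list[["actors"]][1], first_actor)
identical(demo_list$actors[1], first_actor)
```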
395 | 
396 | *** =instructions
397 | - Select from the `shining_list` the last actor and assign the result to `last_actor`.
398 | - Select from the `shining_list` the second review score (which is a factor). Store the result in `second_review`.
399 | 
400 | *** =hint
401 | - If you want to do things nicely: `length(shining_list$actors)` gives you the number of actors, and thus the element to select.
402 | - You can select the information of the second review with `shining_list$reviews[2]`.
403 | 
404 | *** =pre_exercise_code
405 | ```{r}
406 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData"))
407 | ```
408 | 
409 | *** =sample_code
410 | ```{r}
411 | # shining_list is already defined in the workspace
412 | 
413 | # Select the last actor: last_actor
414 | 
415 | 
416 | # Select the second review: second_review
417 | 
418 | ```
419 | 
420 | *** =solution
421 | ```{r}
422 | # shining_list is already defined in the workspace
423 | 
424 | # Select the last actor: last_actor
425 | last_actor <- shining_list$actors[length(shining_list$actors)]
426 | 
427 | # Select the second review: second_review
428 | second_review <- shining_list$reviews[2]
429 | ```
430 | 
431 | *** =sct
432 | ```{r}
433 | test_error()
434 | msg = "Do not remove or override `shining_list`."
435 | test_object("shining_list", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg)
436 | 
437 | test_object("last_actor", 
438 |             incorrect_msg = "Looks like `last_actor` does not equal the last actor.")
439 | test_object("second_review",
440 |             incorrect_msg = "It looks like `second_review` does not contain the factor that corresponds to the second review.")
441 | success_msg("Great! Selecting elements from lists is rather easy, isn't it? Continue to the next exercise.")
442 | ```
443 | 
444 | --- type:NormalExercise lang:r xp:100 skills:1 key:35ba0d6f0d
445 | ## Extending a list
446 | 
447 | You already know that the `$` as well as `[[` can be used to select elements from lists. They can also be used to extend lists. To extend `shining_list` with some personal opinion, you could do one of the following things:
448 | 
449 | ```
450 | shining_list$my_opinion <- "Love it!"
451 | shining_list[["my_opinion"]] <- "Love it!"
452 | ```
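
As a small sketch of how this behaves, the code below extends a stand-in list (`demo_list` and the `rating` component are hypothetical, chosen only for illustration). Note that assigning to a name that already exists overwrites that component instead of adding a new one.

```r
# Hypothetical stand-in for shining_list (illustration only)
demo_list <- list(title = "The Shining")

# Both forms add a new named component to the end of the list
demo_list$my_opinion <- "Love it!"
demo_list[["rating"]] <- 9

names(demo_list)

# Assigning to an existing name replaces the value; the length stays the same
demo_list$my_opinion <- "Still love it!"
length(demo_list)
```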
453 | 
454 | Being proud of your first list, you shared it with the members of your movie hobby club. However, one of the senior members, a guy named M. McDowell, noted that you forgot to add the release year (1980). Given your ambitions to become next year's president of the club, you decide to add this information to the list. To fully make up for your mistake, you also decide to add the name of the director (Stanley Kubrick). 
455 | 
456 | *** =instructions
457 | - Add the release year as a numeric to `shining_list` under the name `year`.
458 | - Add the director to the list, `"Stanley Kubrick"`, with the name `director`.
459 | - Finally, inspect the structure of `shining_list`.
460 | 
461 | *** =hint
462 | Let the examples in the assignment guide you to list extension mastery! Remember that R is case sensitive!
463 | 
464 | *** =pre_exercise_code
465 | ```{r}
466 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData"))
467 | ```
468 | 
469 | *** =sample_code
470 | 
471 | ```{r}
472 | # shining_list is already defined in the workspace
473 | 
474 | # Add the release year to shining_list
475 | 
476 | 
477 | # Add the director to shining_list
478 | 
479 | 
480 | # Inspect the structure of shining_list
481 | 
482 | ```
483 | 
484 | *** =solution
485 | ```{r}
486 | # shining_list is already defined in the workspace
487 | 
488 | # Add the release year to shining_list
489 | shining_list$year <- 1980
490 | 
491 | # Add the director to shining_list
492 | shining_list$director <- "Stanley Kubrick"
493 | 
494 | # Inspect the structure of shining_list
495 | str(shining_list)
496 | ```
497 | 
498 | *** =sct
499 | ```{r}
500 | test_error()
501 | test_object("shining_list", 
502 |             incorrect_msg = paste("Have you correctly added both the year and the director to `shining_list`?",
503 |                                   "Make sure to use the correct names (\"year\" and \"director\"). Remember that R is case sensitive!"))
504 | 
505 | test_output_contains("str(shining_list)", 
506 |                      incorrect_msg = "Do not forget to inspect the structure of `shining_list` using the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.")
507 | 
508 | success_msg("Congratulations on finishing this chapter!")
509 | ```
510 | 


--------------------------------------------------------------------------------
/chapter6.Rmd:
--------------------------------------------------------------------------------
  1 | --- 
  2 | title_meta  : Chapter 6
  3 | title       : Data Frames
  4 | description : Most data sets you will be working with will be stored as a data frame. By the end of this chapter, you will be able to create a data frame, select interesting parts of a data frame and order a data frame according to certain variables.
  5 | attachments : 
  6 |   slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch6_slides.pdf
  7 | 
  8 | --- type:VideoExercise lang:r xp:50 skills:1 key:d4bde604ab
  9 | ## Explore the Data Frame
 10 | 
 11 | *** =video_link
 12 | //player.vimeo.com/video/138173996
 13 | 
 14 | *** =video_hls
 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch6_1.master.m3u8
 16 | 
 17 | --- type:NormalExercise lang:r xp:100 skills:1 key:d4ddcf9a7d
 18 | ## Have a look at your data set
 19 | 
 20 | Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set. 
 21 | 
 22 | There are several ways to do this in R. The function [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) enables you to show the first observations of a data frame (or any R object you pass to it). Unoriginally, the function [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) prints out the last observations in your data set. You can also use the function [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim) to show the dimensions of your data set.
 23 | 
 24 | In this exercise, you'll be working with the `mtcars` data set, which is available in R by default.
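
As a quick sketch of how these functions fit together: both `head()` and `tail()` accept an `n` argument that controls how many observations are shown, and `dim()` returns the number of rows followed by the number of columns. The variable names `first_three`, `last_three` and `size` are only used for this illustration.

```r
# Show only the first and last three observations instead of the default six
first_three <- head(mtcars, n = 3)
last_three <- tail(mtcars, n = 3)
first_three
last_three

# dim() returns the number of rows, then the number of columns
size <- dim(mtcars)
size

# nrow() and ncol() give the same two numbers separately
nrow(mtcars)
ncol(mtcars)
```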
 25 | 
 26 | *** =instructions
 27 | - Print the first observations of the [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) data set.
 28 | - Use the [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) function to display the last observations.
 29 | - Finally, display the overall dimensions of the [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) data frame with [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim).
 30 | 
 31 | *** =hint
 32 | You'll need [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) to show the first observations in [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars).
 33 | 
 34 | *** =pre_exercise_code
 35 | ```{r}
 36 | # no pec
 37 | ```
 38 | 
 39 | *** =sample_code
 40 | ```{r}
 41 | # Print the first observations of mtcars
 42 | 
 43 | 
 44 | # Print the last observations of mtcars
 45 | 
 46 | 
 47 | # Print the dimensions of mtcars
 48 | ```
 49 | 
 50 | *** =solution
 51 | ```{r}
 52 | # Print the first observations of mtcars
 53 | head(mtcars)
 54 | 
 55 | # Print the last observations of mtcars
 56 | tail(mtcars)
 57 | 
 58 | # Print the dimensions of mtcars
 59 | dim(mtcars)
 60 | ```
 61 | 
 62 | *** =sct
 63 | ```{r}
 64 | test_error()
 65 | test_function("head", "x", incorrect_msg = "Have you specified the correct parameter in the [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) function? Make sure to pass it a data set you want to inspect, `mtcars` in this case.")
 66 | test_function("tail", "x", incorrect_msg = "Have you specified the correct parameter in the [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) function? Make sure to pass it a data set you want to inspect, `mtcars` in this case.")
 67 | test_output_contains("dim(mtcars)", incorrect_msg = "Don't forget to also call the [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim) function on `mtcars`!")
 68 | 
 69 | success_msg("Wonderful! So, do you now have a good idea about what we have in the data set? For a full overview of the variables' meaning, type `?mtcars` in the console and read the help page. Continue to the next exercise!")
 70 | ```
 71 | 
 72 | 
 73 | --- type:NormalExercise lang:r xp:100 skills:1 key:c8f389fdbd
 74 | ## Have a look at the structure
 75 | 
 76 | Another method that is often used to get a rapid overview of your data is the function [`str()`](http://www.rdocumentation.org/packages/utils/functions/str). The function [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) shows you the structure of your data set. For a data frame it tells you:
 77 | 
 78 | - The total number of observations (e.g. 32 car types)
 79 | - The total number of variables (e.g. 11 car features)
 80 | - A full list of the variable names (e.g. mpg, cyl ... )
 81 | - The data type of each variable (e.g. num for car features)
 82 | - The first observations
 83 | 
 84 | Applying the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function will often be the first thing that you do when receiving a new data set or data frame. It is a great way to get more insight into your data set before diving into the real analysis.
 85 | 
 86 | *** =instructions
 87 | Investigate the structure of [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars). Make sure that you see the same numbers, variables and data types as mentioned above.
 88 | 
 89 | *** =hint
 90 | Use the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function with [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) as input!
 91 | 
 92 | *** =pre_exercise_code
 93 | ```{r}
 94 | # no pec
 95 | ```
 96 | 
 97 | *** =sample_code
 98 | ```{r}
 99 | # Investigate the structure of the mtcars data set
100 | ```
101 | 
102 | *** =solution
103 | ```{r}
104 | # Investigate the structure of the mtcars data set
105 | str(mtcars)
106 | ```
107 | 
108 | *** =sct
109 | ```{r}
110 | test_function("str","object",incorrect_msg = "Make sure to check the structure of the `mtcars` data set.")
111 | test_output_contains("str(mtcars)", incorrect_msg = "Make sure that you use the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function on `mtcars`.")
112 | success_msg("Nice work! Can you find all the information that is listed in the exercise's assignment? Continue to the next exercise.")
113 | ```
114 | 
115 | 
116 | --- type:NormalExercise lang:r xp:100 skills:1 key:f09c0189ac
117 | ## Creating a data frame (1)
118 | 
119 | Since using built-in data sets is not even half the fun of creating your own data sets, the next exercises are based on your personally developed data set. So put your jet pack on because it is time for some good old-fashioned space exploration! 
120 | 
121 | As a first goal, you want to construct a data frame that describes the main characteristics of eight planets in our solar system. According to your good friend Buzz, the main features of a planet are:
122 | 
123 | - The type of the planet (Terrestrial or Gas Giant).
124 | - The planet's diameter relative to the diameter of the Earth.
125 | - The planet's rotation period (about its own axis) relative to that of the Earth.
126 | - If the planet has rings or not (TRUE or FALSE).
127 | 
128 | After doing some high-quality research on [Wikipedia](http://en.wikipedia.org/wiki/Planet), you feel confident enough to create the necessary vectors: `planets`, `type`, `diameter`, `rotation` and `rings`. Can you correctly use the [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) function to create a data set from this information?
129 | 
130 | *** =instructions
131 | - Use the function [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) to construct `planets_df`.
132 | - Make sure that you've actually created a data frame with 8 observations and 5 variables with [`str()`](http://www.rdocumentation.org/packages/utils/functions/str).
133 | 
134 | *** =hint
135 | The [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) function takes as arguments the vectors that will become the columns of the data frame, separated by commas. The columns in this case are (in this order): `planets`, `type`, `diameter`, `rotation` and `rings`.
136 | 
137 | *** =pre_exercise_code
138 | ```{r}
139 | # no pec
140 | ```
141 | 
142 | *** =sample_code
143 | ```{r}
144 | # Definition of vectors
145 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
146 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
147 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
148 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
149 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
150 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
151 | 
152 | # Create a data frame: planets_df
153 | 
154 | 
155 | # Display the structure of planets_df
156 | 
157 | ```
158 | 
159 | *** =solution
160 | ```{r}
161 | # Definition of vectors
162 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
163 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
164 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
165 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
166 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
167 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
168 | 
169 | # Create a data frame: planets_df
170 | planets_df <- data.frame(planets, type, diameter, rotation, rings)
171 | 
172 | # Display the structure of planets_df
173 | str(planets_df)
174 | ```
175 | 
176 | *** =sct
177 | ```{r}
178 | test_correct({
179 |   test_object("planets_df",
180 |               undefined_msg = "Please make sure to define a variable `planets_df`.",
181 |               incorrect_msg = "Make sure to assign the correct order of arguments to the data.frame `planets_df`. The correct order is planets, type, diameter, rotation and rings.")  
182 | }, {
183 |   msg = "Do not change anything about the definition of the vectors. Only add code to create the `planets_df` data frame!"
184 |   test_object("planets", undefined_msg = msg, incorrect_msg = msg)
185 |   test_object("type", undefined_msg = msg, incorrect_msg = msg)
186 |   test_object("diameter", undefined_msg = msg, incorrect_msg = msg)
187 |   test_object("rotation", undefined_msg = msg, incorrect_msg = msg)
188 |   test_object("rings", undefined_msg = msg, incorrect_msg = msg)
189 | })
190 | 
191 | test_output_contains("str(planets_df)", incorrect_msg = "Don't forget to display the structure of `planets_df`!")
192 | 
193 | success_msg("Great job! The structure of `planets_df` reveals that both the `planets` and the `type` columns are factors, and not character vectors. That's not really what you want, right?")
194 | ```
195 | 
196 | --- type:NormalExercise lang:r xp:100 skills:1 key:7090dc3538
197 | ## Creating a data frame (2)
198 | 
199 | In the previous exercise, you found out that both the `planets` and `type` columns of `planets_df` are factors. For the `type` column this makes sense, because a planet type is a category. For the `planets` column, which contains the planet names, it makes less sense.
200 | 
201 | You can set the `stringsAsFactors` argument inside [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) to prevent R from automatically converting character vectors to factors:
202 | 
203 | ```
204 | data.frame(vec1, vec2, ..., stringsAsFactors = FALSE)
205 | ```
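
The sketch below shows the effect of the argument on a tiny example. The vectors `names_vec` and `kind_vec` and the data frames `df_chr` and `df_fct` are hypothetical names used only here. The argument is set explicitly in both calls, so the result does not depend on the default, which changed to `FALSE` in R 4.0.0.

```r
# Two small character vectors (illustration only)
names_vec <- c("Mercury", "Venus")
kind_vec <- c("Terrestrial planet", "Terrestrial planet")

# Character vectors stay character vectors
df_chr <- data.frame(names_vec, kind_vec, stringsAsFactors = FALSE)
class(df_chr$names_vec)

# Character vectors are converted to factors
df_fct <- data.frame(names_vec, kind_vec, stringsAsFactors = TRUE)
class(df_fct$names_vec)
```

Encoding only the columns that really are categories with `factor()` beforehand, as this exercise asks you to do, gives you control per column.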
206 | 
207 | Up to you now to adapt the way `planets_df` is constructed!
208 | 
209 | *** =instructions
210 | - Encode the `type` vector in a factor, called `type_factor`.
211 | - Next use `planets`, `type_factor`, `diameter`, `rotation` and `rings` to construct `planets_df`. This time, make sure that strings are not converted to factors, by setting `stringsAsFactors = FALSE`.
212 | - Display the structure of `planets_df` to check you coded things correctly.
213 | 
214 | *** =hint
215 | Use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) to encode `type` as a factor.
216 | 
217 | *** =pre_exercise_code
218 | ```{r}
219 | # no pec
220 | ```
221 | 
222 | *** =sample_code
223 | ```{r}
224 | # Definition of vectors
225 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
226 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
227 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant","Gas giant")
228 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
229 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
230 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
231 | 
232 | # Encode type as a factor: type_factor
233 | 
234 | 
235 | # Construct planets_df: strings are not converted to factors!
236 | 
237 | 
238 | # Display the structure of planets_df
239 | 
240 | ```
241 | 
242 | *** =solution
243 | ```{r}
244 | # Definition of vectors
245 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
246 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
247 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
248 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
249 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
250 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
251 | 
252 | # Encode type as a factor: type_factor
253 | type_factor <- factor(type)
254 | 
255 | # Construct planets_df: strings are not converted to factors!
256 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
257 | 
258 | # Display the structure of planets_df
259 | str(planets_df)
260 | ```
261 | 
262 | *** =sct
263 | ```{r}
264 | msg <- "Do not remove or change the definition of all the vectors!"
265 | test_object("planets", undefined_msg = msg, incorrect_msg = msg)
266 | test_object("type", undefined_msg = msg, incorrect_msg = msg)
267 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg)
268 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg)
269 | test_object("rings", undefined_msg = msg, incorrect_msg = msg)
270 | 
271 | test_object("type_factor", incorrect_msg = "Have you correctly created `type_factor`? Simply use the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function on `type`.")
272 | test_object("planets_df", incorrect_msg = "Have you correctly created `planets_df`? Make sure to use `type_factor` instead of `type` and set `stringsAsFactors` to `FALSE` inside [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame).")
273 | test_output_contains("str(planets_df)", incorrect_msg = "Don't forget to display the structure of `planets_df`.")
274 | 
275 | success_msg("That looks more like it! Head over to the next exercise.")
276 | ```
277 | 
278 | 
279 | --- type:NormalExercise lang:r xp:100 skills:1 key:a80ae7fbe8
280 | ## Rename the data frame columns
281 | 
282 | As a data frame is actually a list containing same-length vectors under the hood, it's possible to name and rename data frames just as you did with lists. To create a data frame and name it in one and the same call you can use:
283 | 
284 | ```
285 | data.frame(name1 = vec1, name2 = vec2, ...)
286 | ```
287 | 
288 | You can also name a data frame after creating it:
289 | 
290 | ```
291 | my_df <- data.frame(vec1, vec2, ...)
292 | names(my_df) <- c("name1", "name2", ...)
293 | ```
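
Here is a minimal runnable version of the second approach, using a small made-up data frame (`my_df` and its column names `id`, `label` and `tag` are assumptions for illustration). It also shows that you can rename a single column by indexing into `names()`.

```r
# A small data frame built from unnamed vectors (illustration only)
my_df <- data.frame(c(1, 2), c("a", "b"), stringsAsFactors = FALSE)

# Replace all column names at once
names(my_df) <- c("id", "label")
names(my_df)

# Rename only the second column
names(my_df)[2] <- "tag"
names(my_df)
```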
294 | 
295 | Very proud of your first ever data frame, you show it to your friend Buzz. He's pretty impressed that you managed to include both factor and character columns, but he still finds the column names pretty odd. Time to make some improvements! The code that constructs the improved data frame, as you coded in the previous exercise, is already included.
296 | 
297 | *** =instructions
298 | Rename the columns of `planets_df`. As `planets_df` is already created, you'll want to use the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function. 
299 | 
300 | - Name the `planets` column `name`.
301 | - Name the `type_factor` column `type`.
302 | - You can keep the names `diameter` and `rotation`.
303 | - Change the name `rings` to `has_rings`.
304 | 
305 | Finally, print `planets_df` after you renamed it (not its structure!).
306 | 
307 | *** =hint
308 | You'll need the vector containing `"name"`, `"type"`, `"diameter"`, `"rotation"` and `"has_rings"`.
309 | 
310 | *** =pre_exercise_code
311 | ```{r}
312 | # no pec
313 | ```
314 | 
315 | *** =sample_code
316 | ```{r}
317 | # Construct improved planets_df
318 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
319 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
320 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
321 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
322 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
323 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
324 | type_factor <- factor(type)
325 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
326 | 
327 | # Improve the names of planets_df
328 | 
329 | 
330 | # Print planets_df
331 | 
332 | ```
333 | 
334 | *** =solution
335 | ```{r}
336 | # Construct improved planets_df
337 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
338 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
339 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
340 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
341 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
342 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
343 | type_factor <- factor(type)
344 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
345 | 
346 | # Improve the names of planets_df
347 | names(planets_df) <- c("name", "type", "diameter", "rotation", "has_rings")
348 | 
349 | # Print planets_df
350 | planets_df
351 | ```
352 | 
353 | *** =sct
354 | ```{r}
355 | 
356 | msg <- "Do not remove or change the definition of the predefined variables!"
357 | test_object("planets", undefined_msg = msg, incorrect_msg = msg)
358 | test_object("type", undefined_msg = msg, incorrect_msg = msg)
359 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg)
360 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg)
361 | test_object("rings", undefined_msg = msg, incorrect_msg = msg)
362 | test_object("type_factor", undefined_msg = msg, incorrect_msg = msg)
363 | test_object("planets_df", undefined_msg = msg, incorrect_msg = "Don't change the contents of `planets_df`, only change the column names!")
364 | test_object("planets_df", eq_condition = "equal",
365 |             undefined_msg = msg, incorrect_msg = "Are you sure you have correctly renamed the columns of `planets_df`? The hint might be able to help you out.")
366 | 
367 | test_output_contains("planets_df", incorrect_msg = "Don't forget to print `planets_df`.")
368 | success_msg("Nice one! This is going fast!")
369 | ```
370 | 
371 | --- type:VideoExercise lang:r xp:50 skills:1 key:9a2f941de8
372 | ## Subset, Extend & Sort Data Frames
373 | 
374 | *** =video_link
375 | //player.vimeo.com/video/138174008
376 | 
377 | *** =video_hls
378 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch6_2.master.m3u8
379 | 
380 | 
381 | --- type:NormalExercise lang:r xp:100 skills:1 key:b6125af738
382 | ## Selection of data frame elements
383 | 
384 | Similar to matrices, you select elements from a data frame with the help of square brackets `[ ]`. By using a comma, you can indicate what to select from the rows and the columns respectively:
385 | 
386 | ```
387 | # first row, second column
388 | my_df[1,2]
389 | 
390 | # rows 1 to 3, columns 2 to 4
391 | my_df[1:3,2:4]
392 | 
393 | # Entire first row
394 | my_df[1, ]
395 | 
396 | # rows 1 to 3 of "type" column
397 | planets_df[1:3,2]
398 | planets_df[1:3,"type"]
399 | ```
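
The kind of object you get back depends on what you select. The sketch below uses a small stand-in data frame (`demo_df` is a hypothetical name with only three rows, not the pre-loaded `planets_df`) to show the common cases.

```r
# Hypothetical stand-in for planets_df (illustration only)
demo_df <- data.frame(name = c("Mercury", "Venus", "Earth"),
                      diameter = c(0.382, 0.949, 1),
                      has_rings = c(FALSE, FALSE, FALSE),
                      stringsAsFactors = FALSE)

# One row and one column: a single value
demo_df[2, "diameter"]

# All rows of one column: a vector
demo_df[, "diameter"]

# Several rows, all columns: still a data frame
demo_df[1:2, ]

# drop = FALSE keeps a single column as a data frame
demo_df[, "diameter", drop = FALSE]
```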
400 | 
401 | Let us now apply this technique to `planets_df`! This data frame is already available in the workspace.
402 | 
403 | *** =instructions
404 | - Select the type of Mars; store the factor in `mars_type`.
405 | - Store the entire rotation column in `rotation` as a vector.
406 | - Create a data frame, `closest_planets_df`, that contains all data on the first three planets.
407 | - Likewise, build the data frame `furthest_planets_df` that contains all data on the last three planets.
408 | 
409 | *** =hint
410 | `planets_df[1:3,]` will select all elements of the first three rows.
411 | 
412 | *** =pre_exercise_code
413 | ```{r}
414 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
415 | ```
416 | 
417 | *** =sample_code
418 | ```{r}
419 | # planets_df is pre-loaded
420 | 
421 | # The type of Mars: mars_type
422 | 
423 | 
424 | # Entire rotation column: rotation
425 | 
426 | 
427 | # First three planets: closest_planets_df
428 | 
429 | 
430 | # Last three planets: furthest_planets_df
431 | 
432 | 
433 | ```
434 | 
435 | *** =solution
436 | ```{r}
437 | # planets_df is pre-loaded
438 | 
439 | # The type of Mars: mars_type
440 | mars_type <- planets_df[4, 2]
441 | 
442 | # Entire rotation column: rotation
443 | rotation <- planets_df[ ,4]
444 | 
445 | # First three planets: closest_planets_df
446 | closest_planets_df <- planets_df[1:3, ]
447 | 
448 | # Last three planets: furthest_planets_df
449 | furthest_planets_df <- planets_df[6:8, ]
450 | ```
451 | 
452 | *** =sct
453 | ```{r}
454 | 
455 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
456 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
457 | 
458 | test_object("mars_type", 
459 |             incorrect_msg = "Are you sure you correctly selected the type of Mars?")
460 | test_object("rotation", 
461 |             incorrect_msg = "Have another look at the command to define `rotation`. You'll want to select the fourth column.")
462 | test_object("closest_planets_df", 
463 |             incorrect_msg = "Did you select the first three rows of `planets_df` to create `closest_planets_df`?")
464 | test_object("furthest_planets_df",
465 |             incorrect_msg = "Make sure that you select the last three rows of `planets_df` to build `furthest_planets_df`.")
466 | 
467 | success_msg("Great! Feel free to have a look at the variables you've just created. Apart from selecting elements from your data frame by index, you can also use the column names.")
468 | ```
469 | 
470 | --- type:NormalExercise lang:r xp:100 skills:1 key:7aa0a94261
471 | ## Only planets with rings (1)
472 | 
473 | You will often want to select an entire column, that is, one specific variable from a data frame. If you want to select the column `diameter` from `planets_df`, you can use any one of the following:
474 | 
475 | ```
476 | planets_df[, 3]
477 | planets_df[, "diameter"]
478 | planets_df$diameter
479 | ```
480 | 
481 | *** =instructions
482 | - Make use of the `$` sign to create the variable `rings_vector` that contains the entire `has_rings` column in the `planets_df` data frame.
483 | - Print the `rings_vector`; it should be a vector.
484 | 
485 | *** =hint
486 | `my_df$col_name` is the most convenient way to select a column from a data frame. In this case, the data frame is `planets_df` and the variable is `has_rings`.
487 | 
488 | *** =pre_exercise_code
489 | ```{r}
490 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
491 | ```
492 | 
493 | *** =sample_code
494 | ```{r}
495 | # planets_df is pre-loaded in your workspace
496 | 
497 | # Create rings_vector
498 | 
499 | 
500 | # Print rings_vector
501 | 
502 | ```
503 | 
504 | *** =solution
505 | ```{r}
506 | # planets_df is pre-loaded in your workspace
507 | 
508 | # Create rings_vector
509 | rings_vector <- planets_df$has_rings
510 | 
511 | # Print rings_vector
512 | rings_vector
513 | ```
514 | 
515 | *** =sct
516 | ```{r}
517 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
518 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
519 | 
520 | test_object("rings_vector", incorrect_msg = "It looks like `rings_vector` does not contain all the elements of the `has_rings` variable of `planets_df`.")
521 | 
522 | test_output_contains("rings_vector", incorrect_msg = "Don't forget to print `rings_vector`!")
523 | 
524 | success_msg("Great! Continue to the next exercise and discover yet another way of subsetting!")
525 | ```
526 | 
527 | 
528 | --- type:NormalExercise lang:r xp:100 skills:1 key:d6245c1bb1
529 | ## Only planets with rings (2)
530 | 
531 | You probably remember from high school that some planets in our solar system have rings and others do not. But due to other priorities at that time (read: puberty) you cannot recall their names, let alone their rotation speed, etc. Could R help you out?
532 | 
533 | The `rings_vector` you've coded before is a logical vector. It's `TRUE` when the corresponding planets have rings and `FALSE` when they don't. To select those observations from `planets_df` that have rings, you can use the `rings_vector` and perform subsetting by logicals!
534 | 
535 | To subset observations by logicals, put the logical vector before the comma inside square brackets, similar to this:
536 | 
537 | ```
538 | df[logical_vector, ]
539 | ```
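
As a minimal sketch of this idea, the example below builds a small, hypothetical data frame `toy_df` (it is not part of the course data) and keeps only the rows for which the logical vector `keep` is `TRUE`:

```r
# Hypothetical toy data, only for illustration
toy_df <- data.frame(id = 1:4, score = c(10, 25, 7, 31))
keep <- c(FALSE, TRUE, FALSE, TRUE)

# Logical vector before the comma: select rows, keep all columns
kept_df <- toy_df[keep, ]
kept_df
```

Because the logical vector sits before the comma, it filters rows; leaving the part after the comma empty keeps every column.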
540 | 
541 | *** =instructions
542 | - Assign to `planets_with_rings_df` all data in the `planets_df` data set for the planets with rings, that is, where `rings_vector` is `TRUE`.
543 | - Print `planets_with_rings_df`.
544 | 
545 | *** =hint
546 | Select elements from `planets_df` by using the square brackets. The `rings_vector` contains boolean values and R will select only those rows/columns where the vector element is `TRUE`. In this case, you want to select rows based on `rings_vector` and select all the columns.
547 | 
548 | *** =pre_exercise_code
549 | ```{r}
550 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
551 | ```
552 | 
553 | *** =sample_code
554 | ```{r}
555 | # planets_df pre-loaded in your workspace
556 | 
557 | # Create rings_vector
558 | rings_vector <- planets_df$has_rings
559 | 
560 | # Select the information on planets with rings: planets_with_rings_df
561 | 
562 | 
563 | # Print planets_with_rings_df
564 | ```
565 | 
566 | *** =solution
567 | ```{r}
568 | # planets_df pre-loaded in your workspace
569 | 
570 | # Create rings_vector
571 | rings_vector <- planets_df$has_rings
572 | 
573 | # Select the information on planets with rings: planets_with_rings_df
574 | planets_with_rings_df <- planets_df[rings_vector,]
575 | 
576 | # Print planets_with_rings_df
577 | planets_with_rings_df
578 | ```
579 | 
580 | *** =sct
581 | ```{r}
582 | 
583 | msg <- "Do not remove or overwrite `planets_df` or `rings_vector`!"
584 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
585 | test_object("rings_vector", undefined_msg = msg, incorrect_msg = msg)
586 | 
587 | test_object("planets_with_rings_df",
588 |             incorrect_msg = "It looks like `planets_with_rings_df` does not contain all the data of the planets with rings. Make sure to not specify any column selector, to keep all columns.")
589 | 
590 | test_output_contains("planets_with_rings_df",
591 |                      incorrect_msg = "Don't forget to print `planets_with_rings_df`!")
592 | success_msg("Nice work, but this is a rather tedious solution. The next exercise will teach you how to do it in a more concise way.")
593 | ```
594 | 
595 | 
596 | --- type:NormalExercise lang:r xp:100 skills:1 key:c1a08e245c
597 | ## Only planets with rings but shorter
598 | 
599 | So what exactly did you learn in the previous exercises? You selected a subset from a data frame (`planets_df`) based on whether or not a certain condition was true (rings or no rings), and you managed to pull out all relevant data. Pretty awesome! By now, NASA is probably already flirting with your CV!
600 | 
601 | Instead of having to define a vector `rings_vector`, which you then use to subset `planets_df`, you could've also used either one of these:
602 | 
603 | ```
604 | planets_df[planets_df$has_rings, ]
605 | planets_df[planets_df$has_rings == TRUE, ]
606 | ```
607 | 
608 | *** =instructions
609 | - Create a data frame `small_planets_df` with planets that have a diameter smaller than the Earth. This means that the `diameter` variable should be smaller than 1, since diameter is a relative measure of the planet's diameter in relation to planet Earth.
610 | - Build another data frame, `slow_planets_df`, with the observations that have a longer rotation period than Earth. This means that the absolute value (use the function [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun)) of the `rotation` variable should be greater than 1.
611 | 
612 | *** =hint
613 | Make use of the logical operators `>` and `<`. Use the [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun) function for absolute values. 
614 | 
615 | *** =pre_exercise_code
616 | ```{r}
617 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
618 | ```
619 | 
620 | *** =sample_code
621 | ```{r}
622 | # planets_df is pre-loaded in your workspace
623 | 
624 | # Planets that are smaller than planet Earth: small_planets_df
625 | 
626 | 
627 | # Planets that rotate slower than planet Earth: slow_planets_df
628 | 
629 | ```
630 | 
631 | *** =solution
632 | ```{r}
633 | # planets_df is pre-loaded in your workspace
634 | 
635 | # Planets that are smaller than planet Earth: small_planets_df
636 | small_planets_df <- planets_df[planets_df$diameter < 1, ]        # option 1
637 | small_planets_df  <- subset(planets_df, subset = diameter < 1)   # option 2
638 | 
639 | # Planets that rotate slower than planet Earth: slow_planets_df
640 | slow_planets_df <- planets_df[abs(planets_df$rotation) > 1, ]      # option 1
641 | slow_planets_df <- subset(planets_df, subset = abs(rotation) > 1)  # option 2
642 | ```
643 | 
644 | *** =sct
645 | ```{r}
646 | 
647 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
648 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
649 | 
650 | test_object("small_planets_df",
651 |             incorrect_msg = "It looks like `small_planets_df` does not contain the correct subset of `planets_df`.")
652 | 
653 | test_object("slow_planets_df",
654 |             incorrect_msg = "It looks like `slow_planets_df` does not contain the correct subset of `planets_df`. Make sure to use the [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun) function for absolute values.")
655 | 
656 | success_msg("Great! Not only is the [`subset()`](http://www.rdocumentation.org/packages/base/functions/subset) function more concise, it is probably also more understandable for people who read your code. Continue to the next exercise.")
657 | ```
658 | 
659 | 
660 | --- type:NormalExercise lang:r xp:100 skills:1 key:e9ca3eeb99
661 | ## Add variable/column
662 | 
663 | There are many cases in which you'll want to add more variables to your data frame. This comes down to adding a column to the data frame. The exact same techniques to select columns from a data frame can be used here. To add a column `new_column` to `my_df`, with data from `my_vec`, you can use one of the following calls:
664 | 
665 | ```
666 | my_df$new_column <- my_vec
667 | my_df[["new_column"]] <- my_vec
668 | my_df <- cbind(my_df, new_column = my_vec)
669 | ```
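
To see that these three calls lead to the same result, here is a small sketch on a hypothetical data frame `toy_df` with a hypothetical vector `extra` (neither is part of the course workspace):

```r
# Hypothetical toy data, only for illustration
toy_df <- data.frame(id = 1:3)
extra <- c(2.5, 4.0, 1.5)

# Option 1: dollar sign
with_dollar <- toy_df
with_dollar$extra <- extra

# Option 2: double square brackets
with_brackets <- toy_df
with_brackets[["extra"]] <- extra

# Option 3: cbind() returns a new data frame with the extra column
with_cbind <- cbind(toy_df, extra = extra)

# Compare the three results
all.equal(with_dollar, with_brackets)
all.equal(with_dollar, with_cbind)
```

Note that the first two options modify the data frame in place, while `cbind()` returns a new data frame that you have to assign yourself.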
670 | 
671 | You've browsed [Wikipedia](https://en.wikipedia.org/wiki/Planet) and decided to add a column that lists the number of moons each planet has. The planets' masses would be a cool addition as well. The `moons` and `masses` vectors are already included in the workspace; it is up to you to add them to `planets_df`.
672 | 
673 | *** =instructions
674 | - Add `moons` to `planets_df` under the variable name `"moon"`.
675 | - In a similar fashion, add `masses` under the variable name `"mass"`.
676 | 
677 | *** =hint
678 | To add a new column called "moon", you can use: `planets_df$moon <- moons`.
679 | 
680 | *** =pre_exercise_code
681 | ```{r}
682 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
683 | ```
684 | 
685 | *** =sample_code
686 | ```{r}
687 | # planets_df is already pre-loaded in your workspace
688 | 
689 | # Definition of moons and masses
690 | moons <- c(0, 0, 1, 2, 67, 62, 27, 14)
691 | masses <- c(0.06, 0.82, 1.00, 0.11, 317.8, 95.2, 14.6, 17.2)
692 | 
693 | # Add moons to planets_df under the name "moon"
694 | 
695 | 
696 | # Add masses to planets_df under the name "mass"
697 | 
698 | ```
699 | 
700 | *** =solution
701 | ```{r}
702 | # planets_df is already pre-loaded in your workspace
703 | 
704 | # Definition of moons and masses
705 | moons <- c(0, 0, 1, 2, 67, 62, 27, 14)
706 | masses <- c(0.06, 0.82, 1.00, 0.11, 317.8, 95.2, 14.6, 17.2)
707 | 
708 | # Add moons to planets_df under the name "moon"
709 | planets_df$moon <- moons
710 | 
711 | # Add masses to planets_df under the name "mass"
712 | planets_df$mass <- masses
713 | ```
714 | 
715 | *** =sct
716 | ```{r}
717 | 
718 | undef_msg <- "Do not remove `planets_df`!"
719 | msg <- "Do not change anything about the columns that were already in `planets_df`; you should only <i>add</i> columns."
720 | test_data_frame(name = "planets_df", 
721 |                 columns = c("name", "type", "diameter", "rotation", "has_rings"),
722 |                 undefined_msg = undef_msg, undefined_cols_msg = msg, incorrect_msg = msg)
723 | 
724 | test_data_frame(name = "planets_df",
725 |                 columns = "moon",
726 |                 undefined_cols_msg = "Make sure to name the column to contain the moon information \"moon\".",
727 |                 incorrect_msg = "The \"moon\" column does not contain the correct information. Try again.")
728 | 
729 | test_data_frame(name = "planets_df",
730 |                 columns = "mass",
731 |                 undefined_cols_msg = "Make sure to name the column to contain the mass information \"mass\".",
732 |                 incorrect_msg = "The \"mass\" column does not contain the correct information. Try again.")
733 | 
734 | test_object("planets_df", incorrect_msg = "It appears that you have correctly specified the \"moon\" and \"mass\" columns, but there's still something wrong with the resulting `planets_df`. Make sure that you add exactly these two columns and nothing else!")
735 | success_msg("Nice one! This data frame is beginning to contain quite some information!")
736 | ```
737 | 
738 | 
739 | --- type:NormalExercise lang:r xp:100 skills:1 key:8e5ade7078
740 | ## Sorting
741 | 
742 | In data analysis you will often sort your data according to a certain variable in the data set. In R, this is done with the help of the function [`order()`](http://www.rdocumentation.org/packages/base/functions/order). 
743 | 
744 | [`order()`](http://www.rdocumentation.org/packages/base/functions/order) is a function that, when applied to a variable such as a vector, gives you the positions of its elements in sorted order, starting with the position of the smallest element:
745 | 
746 | ```
747 | a <- c(100,9,101)
748 | order(a)
749 | ``` 
750 | 
751 | This code returns the vector containing 2, 1 and 3; that's because the smallest element, 9, is at position 2, the next smallest, 100, is at position 1, and the largest, 101, is at position 3.
752 |     
753 | ```
754 | a[order(a)]
755 | ```
756 | 
757 | will thus give you the ordered vector (9, 100, 101), since it first picks the second element of `a`, then the first and then the last. Got it? If you are not sure, use the console to play with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function. 
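
A short sketch can make the difference between positions and ranks concrete. The vector `b` below is a hypothetical example, chosen so that [`order()`](http://www.rdocumentation.org/packages/base/functions/order) and `rank()` return different results; the `decreasing` argument is shown as one possible way to reverse the sort:

```r
b <- c(30, 10, 20)   # hypothetical vector, only for illustration

order(b)   # indices that sort b: 2 3 1
rank(b)    # rank of each element: 3 1 2

b[order(b)]                     # 10 20 30
b[order(b, decreasing = TRUE)]  # 30 20 10
```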
758 | 
759 | *** =instructions
760 | Experiment with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function in the console. Click 'Submit Answer' when you are ready to continue.
761 | 
762 | *** =hint
763 | Just play with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function in the console!
764 | 
765 | *** =pre_exercise_code
766 | ```{r}
767 | # no pec
768 | ```
769 | 
770 | *** =sample_code
771 | ```{r}
772 | # Just play around with the order function in the console to see how it works!
773 | ```
774 | 
775 | *** =solution
776 | ```{r}
777 | # Just play around with the order function in the console to see how it works!
778 | # Some examples:
779 | order(1:10)
780 | order(2:11)
781 | order(c(5,4,6,7))
782 | ```
783 | 
784 | *** =sct
785 | ```{r}
786 | success_msg("Great! Now let's use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function to sort your data frame!")
787 | ```
788 | 
789 | 
790 | --- type:NormalExercise lang:r xp:100 skills:1 key:ec87541ef1
791 | ## Sorting your data frame
792 | 
793 | Alright, now let us do something useful with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function! You would like to rearrange your data frame such that it starts with the smallest planet and ends with the largest one. A sort on the `diameter` column, if you will.
794 | 
795 | Suppose you have a data frame `df`, with three columns `a`, `b` and `c`. The following code will print a version of df that is sorted on the column `a`.
796 | 
797 | ```
798 | pos <- order(df$a)
799 | df[pos, ]
800 | ```
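
As a hedged illustration of this pattern, the sketch below uses a hypothetical data frame `toy_df` with columns `a`, `b` and `c` (the values are made up):

```r
# Hypothetical toy data, only for illustration
toy_df <- data.frame(a = c(3, 1, 2),
                     b = c("x", "y", "z"),
                     c = c(TRUE, FALSE, TRUE))

pos <- order(toy_df$a)      # row indices, smallest value of a first
sorted_df <- toy_df[pos, ]  # reorder the rows, keep all columns
sorted_df
```

The values in `b` and `c` move together with their row, and the original row names (2, 3, 1) remain visible in the printed result.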
801 | 
802 | *** =instructions
803 | - Assign to the variable `positions` the desired ordering for the new data frame that you will create in the next step. You can use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function for that.
804 | - Now create the data frame `smallest_first_df`, which contains the same information as `planets_df`, but with the planets in increasing order of diameter. Use the previously created variable `positions` as row indices inside square brackets to achieve this.
805 | - Print `smallest_first_df` to see what you've accomplished.
806 | 
807 | *** =hint
808 | - `order(planets_df$diameter)` will give you the ordering of the variable diameter from smallest to largest. This is what you should assign to `positions`.
809 | - Use the variable positions then to select from the data frame `planets_df`: `planets_df[positions, ]`.
810 | 
811 | *** =pre_exercise_code
812 | ```{r}
813 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData"))
814 | ```
815 | 
816 | *** =sample_code
817 | ```{r}
818 | # planets_df is pre-loaded in your workspace
819 | 
820 | # Create a desired ordering for planets_df: positions
821 | 
822 | 
823 | # Create a new, ordered data frame: smallest_first_df
824 | 
825 | 
826 | # Print smallest_first_df
827 | ```
828 | 
829 | *** =solution
830 | ```{r}
831 | # planets_df is pre-loaded in your workspace
832 | 
833 | # Create a desired ordering for planets_df: positions
834 | positions <- order(planets_df$diameter)
835 | 
836 | # Create a new, ordered data frame: smallest_first_df
837 | smallest_first_df <- planets_df[positions, ]
838 | 
839 | # Print smallest_first_df
840 | smallest_first_df
841 | ```
842 | 
843 | *** =sct
844 | ```{r}
845 | msg <- "Do not remove or overwrite the `planets_df` data frame!"
846 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg)
847 | test_object("positions",
848 |             incorrect_msg = "It looks like `positions` does not contain the correct ordering of the diameter column.")
849 | test_object("smallest_first_df",
850 |             incorrect_msg = "It looks like `smallest_first_df` does not contain the rows of `planets_df` in the order given by `positions`.")
851 | test_output_contains("smallest_first_df", incorrect_msg = "Finish off by printing `smallest_first_df`.")
852 | success_msg("Wonderful! What does the resulting data frame look like? Order prevailed!")
853 | ```
854 | 
855 | 


--------------------------------------------------------------------------------
/course.yml:
--------------------------------------------------------------------------------
 1 | id: 732
 2 | title: Introduction to R (beta)
 3 | author_field: Filip Schouwenaars
 4 | university: DataCamp
 5 | author_bio: Next to being the main developer of DataCamp's interactive courses, Filip
 6 |   is responsible for everything related to R. Under the motto 'Eat your own dog food',
 7 |   he leverages the techniques DataCamp teaches its students to perform data analysis
 8 |   for DataCamp. Filip holds degrees in Electrical Engineering and Artificial Intelligence.
 9 |   <br><br>This course is a rework of the earlier introduction to R course,
10 |   built by Jonathan Cornelissen and Martijn Theuwissen, co-founders of DataCamp.
11 | description: With over 2 million users worldwide R is rapidly becoming the leading
12 |   programming language in statistics and data science. Every year, the number of R
13 |   users grows by 40%, and an increasing number of organizations are using it in their
14 |   day-to-day activities.<br>In this introduction to R, you will master the basics
15 |   of this beautiful open source language such as factors, lists and data frames. With
16 |   the knowledge gained in this course, you will be ready to undertake your first very
17 |   own data analysis.
18 | chapters:
19 |   chapter1.Rmd: 1720
20 |   chapter2.Rmd: 1721
21 |   chapter3.Rmd: 1722
22 |   chapter4.Rmd: 1723
23 |   chapter5.Rmd: 1724
24 |   chapter6.Rmd: 1725
25 | 
26 | 


--------------------------------------------------------------------------------
/datasets/chapter5.R:
--------------------------------------------------------------------------------
1 | # Create shining_list
2 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson")
3 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"), 
4 |                          ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect"))
5 | shining_list <- list(title = "The Shining", actors = actors_vector, reviews = reviews_factor)
6 | rm(actors_vector, reviews_factor)
7 | save(shining_list, file = "datasets/chapter5.RData")


--------------------------------------------------------------------------------
/datasets/chapter5.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/courses-intro-to-r-beta/b028fe1f2dcdc1eba49e8e4135f6b061ae7dd394/datasets/chapter5.RData


--------------------------------------------------------------------------------
/datasets/chapter6.R:
--------------------------------------------------------------------------------
 1 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
 2 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
 3 |           "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
 4 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
 5 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
 6 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
 7 | type_factor <- factor(type)
 8 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE)
 9 | names(planets_df) <- c("name", "type", "diameter", "rotation", "has_rings")
10 | rm(planets, type, diameter, rotation, rings, type_factor)
11 | save(planets_df, file = "datasets/chapter6.RData")


--------------------------------------------------------------------------------
/datasets/chapter6.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datacamp/courses-intro-to-r-beta/b028fe1f2dcdc1eba49e8e4135f6b061ae7dd394/datasets/chapter6.RData


--------------------------------------------------------------------------------
/refguides/chapter1_refguide.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: html_document
  3 | ---
  4 | 
  5 | ## R: The true basics  
  6 | 
  7 | 
  8 | ### R as calculator
  9 | 
 10 | After R is started, a console is waiting for your input. At the prompt (`>`), you can enter numbers and perform calculations. 
 11 | 
 12 | ```{r}
 13 |  3*2
 14 | ```
 15 | A few arithmetic operators are:  
 16 | 
 17 | - Addition: `+`
 18 | - Subtraction: `-`
 19 | - Multiplication: `*`
 20 | - Division: `/`
 21 | - Exponentiation: `^`
 22 | - Modulo: `%%`
 23 | 
 24 | 
 25 | ### Variable assignment and Operations
 26 | 
 27 | You can assign values to variables with the assignment operator `<-`. Just typing the variable by itself at the prompt will print out the value. 
 28 | 
 29 | ```{r}
 30 |  x <- 3
 31 |  x
 32 |  y <- 9
 33 |  y
 34 | ```
 35 | 
 36 | You can also perform arithmetic operations with variables. Look at the result of multiplying `x` and `y`, which we defined previously:
 37 | 
 38 | ```{r}
 39 |  y * x
 40 | ```
 41 | 
 42 | As you work in R and create new variables it can be easy to lose track of what variables you have defined. To get a list of all the variables that have been defined use [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls). And if you need to remove variables, you can use `rm(<var_name>)`. 
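
A minimal sketch of how these two functions work together, reusing the variable names `x` and `y` from above:

```r
x <- 3
y <- 9

"x" %in% ls()   # x is currently defined in the workspace

rm(x)           # remove x

"x" %in% ls()   # x is gone
"y" %in% ls()   # y is still there
```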
 43 | 
 44 | 
 45 | ### Comment your code
 46 | 
 47 | Adding comments to your code helps others understand it. Comments in R are ignored by the parser. Any text after the `#` character on the same line is treated as a comment, unless the `#` character is inside a quoted string. For example,
 48 | 
 49 | ```{r}
 50 |  x <- 24 # this is a comment 
 51 |  y <- "  #... but this is not."
 52 | ```
 53 | 
 54 | 
 55 | ## Basic Data Types 
 56 | 
 57 | There are several basic R data types that occur frequently in routine R calculations. We will try to understand a few of them better by using the [`class()`](http://www.rdocumentation.org/packages/base/functions/class) function. 
 58 | 
 59 | - Decimal values are called numerics in R. You can perform arithmetic operations on them.
 60 | ```{r}
 61 |  x <- 12.3
 62 |  x
 63 |  class(x)
 64 | ```
 65 | 
 66 | - A special type of numeric is an integer. You can specify that a number is an integer using the following syntax.
 67 | ```{r}
 68 |  y <- 3L
 69 |  y
 70 |  class(y)
 71 | ```
 72 | 
 73 | Another way is to invoke the [`as.integer()`](http://www.rdocumentation.org/packages/base/functions/integer) function. 
 74 | 
 75 | ```{r}
 76 |  y <- as.integer(3)
 77 |  y
 78 |  class(y)
 79 | ```
 80 | 
 81 | And one can convert an integer value to a numeric value by [`as.numeric()`](http://www.rdocumentation.org/packages/base/functions/numeric).  
 82 | 
 83 | - A character object is used to represent string values in R.
 84 | 
 85 | ```{r}
 86 |  z <- "Good morning!"
 87 |  z
 88 |  class(z)
 89 | ```
 90 | 
 91 | You can convert objects into character values with the [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) function. 
 92 | 
 93 | - Another important data type is the logical type. It has two possible values, written as the reserved constants `TRUE` and `FALSE`. 
 94 | 
 95 | ```{r}
 96 |  a <- TRUE
 97 |  a
 98 |  class(a)
 99 | ```
100 | 
101 | You can also check the data type of a variable with one of the following `is.*()` functions. The result is a logical value, `TRUE` or `FALSE`. 
102 | 
103 | ```{r, eval=FALSE}
104 |  is.numeric()      #to evaluate if type = numeric
105 |  is.integer()      #to evaluate if type = integer
106 |  is.character()    #to evaluate if type = character
107 | ```
108 | 
109 | 
110 | 
111 | 
112 | 
113 | 


--------------------------------------------------------------------------------
/refguides/chapter2_refguide.Rmd:
--------------------------------------------------------------------------------
  1 | 
  2 | ## Create and Name Vectors
  3 | 
  4 | A vector is a sequence of data elements of the same basic type. You can have character, numeric and logical vectors, and many more. To create one, you can use the [`c()`](http://www.rdocumentation.org/packages/base/functions/c) function. Here is a numeric vector:
  5 | 
  6 | ```{r}
  7 | c(12, 7, 4) 
  8 | ```
  9 | 
 10 | And a character one, assigned to the variable `types_water`.
 11 | 
 12 | ```{r}
 13 | types_water <- c("Fresh", "Brackish", "Seawater")
 14 | ```
 15 | 
 16 | As in the previous chapter, you can verify that `types_water` is a vector with:
 17 | 
 18 | ```{r}
 19 | is.vector(types_water)
 20 | ```
 21 | 
 22 | You can also name the elements of your vector with the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function or create a named vector to begin with. 
 23 | 
 24 | ```{r}
 25 | numeric_vector <- c(12, 7, 4) 
 26 | name <- c("months/year", "days/week", "weeks/month")
 27 | names(numeric_vector) <- name
 28 | numeric_vector 
 29 | ```
 30 | 
 31 | In our example, the `numeric_vector` contains three elements. To see how many elements your vector contains, use:
 32 | 
 33 | ```{r}
 34 | length(numeric_vector)
 35 | ```
 36 | 
 37 | ### Important Note
 38 | A vector can only contain elements of the same type. If you try to build a vector of different data types, R performs coercion: it transforms all the elements to the same data type.  
 39 | 
 40 | ```{r}
 41 | new_vector <- c("ice-cream", TRUE, 2)
 42 | new_vector
 43 | ```
 44 | 
 45 | You can verify that now `new_vector` is a character vector by invoking: 
 46 | 
 47 | ```{r}
 48 | class(new_vector)
 49 | ```
 50 | 
 51 | 
 52 | ## Vector Calculus
 53 | 
 54 | Computations on vectors are performed in an element-wise fashion. Check it out.
 55 | 
 56 | ```{r}
 57 | vector1 <- c(5, 2, 39, 106)
 58 | vector2 <- c(300, 5, 1 , 0)
 59 | vector3 <- vector1 + vector2
 60 | vector3
 61 | ```
 62 | 
 63 | You can use all the arithmetic operations you learned in Chapter 1!
 64 | 
 65 | If you want to sum up all the elements of your vector you can use
 66 | 
 67 | ```{r}
 68 | sum(vector3)
 69 | ```
 70 | 
 71 | Moreover, you can use relational operators like `<` and `>` to compare two vectors. Remember that the comparison is again performed element-wise.
 72 | 
 73 | ```{r}
 74 | vector1 < vector2
 75 | ```
 76 | 
 77 | ## Vector Subsetting
 78 | 
 79 | Suppose you need to select an element of your vector. You can use `[...]`.
 80 | 
 81 | ```{r}
 82 | numeric_vector[1]
 83 | ```
 84 | 
 85 | The number inside the brackets corresponds to the element you want to select, here the first one has been selected. 
 86 | 
 87 | If you are dealing with a named vector you can use the names of the elements to select them. 
 88 | 
 89 | ```{r}
 90 | numeric_vector["months/year"]
 91 | ```
 92 | 
 93 | If you need to select more than one element, you can use another vector! Take a minute to understand the following syntax:
 94 | 
 95 | ```{r}
 96 | numeric_vector[c(1,3)]
 97 | ```
 98 | 
 99 | The order matters, too!
100 | 
101 | ```{r}
102 | numeric_vector[c(3,1)]
103 | ```
104 | 
105 | If you want to select all but one element, use a negative index:
106 | 
107 | ```{r}
108 | numeric_vector[-2]
109 | ```
110 | 
111 | Notice that `numeric_vector[-2]` and `numeric_vector[c(1,3)]` give the same result! 
112 | 
113 | Another way to subset a vector is to use a logical vector. The logical vector should normally have the same length as the vector you want to subset. Only the elements that correspond to `TRUE` will be kept. 
114 | 
115 | ```{r}
116 | numeric_vector[c(TRUE, FALSE, TRUE)]
117 | ```
118 | 
119 | Again we get the same result! If your logical vector is shorter than the vector you are subsetting, R recycles the logical vector that you passed until it has the same length as the one you subset. For further explanations about how recycling works, go to the videos. 
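
As a small sketch of recycling, the example below uses a hypothetical four-element vector `v` and a logical vector of length two:

```r
v <- c(12, 7, 4, 9)   # hypothetical vector, only for illustration

# c(TRUE, FALSE) is recycled to c(TRUE, FALSE, TRUE, FALSE),
# so the first and third elements are kept
picked <- v[c(TRUE, FALSE)]
picked
```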
120 | 
121 | 
122 | 


--------------------------------------------------------------------------------
/refguides/chapter3_refguide.Rmd:
--------------------------------------------------------------------------------
 1 | ## Create and Name Matrices
 2 | 
 3 | Matrices are not very different from vectors; both are data structures that store elements of the same type.
 4 | 
 5 | A matrix is a 2-dimensional array consisting of rows and columns. Matrices and dataframes are different: a matrix can only contain elements of a single type and can be considered a natural extension of a vector, whereas a dataframe can contain columns of different types.
 6 | You can build matrices easily with the function [`matrix()`](http://www.rdocumentation.org/packages/gmp/functions/matrix).
 7 | 
 8 | ```{r, eval= F }
 9 | matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) 
10 | ```
11 | 
12 | Typically you specify the data together with the number of rows `nrow` and columns `ncol`. The argument `byrow` controls how the matrix is filled: column-wise by default, or row-wise with `byrow = TRUE`.
13 | 
14 | ```{r}
15 | my_matrix <- matrix(c(9,2,5, 1,3,4, 1,2,7), nrow = 3, ncol = 3, byrow = TRUE)
16 | my_matrix
17 | ```
18 | 
19 | 
20 | By default, the rows and columns do not have names. The argument `dimnames` changes that: pass a list of two character vectors, such as `dimnames = list(c("r1", "r2", ...), c("c1", "c2", ...))`, with one name per row and one name per column.
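
A minimal sketch of `dimnames` in action, reusing the values of `my_matrix`. The row and column names are made up for illustration.

```r
# Row and column names below are assumptions chosen for illustration
named_matrix <- matrix(c(9, 2, 5, 1, 3, 4, 1, 2, 7),
                       nrow = 3, ncol = 3, byrow = TRUE,
                       dimnames = list(c("r1", "r2", "r3"),
                                       c("c1", "c2", "c3")))
named_matrix

# The names can then be used for subsetting
named_matrix["r2", "c3"]
```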
21 | 
22 | Since matrices are just several vectors put together, they can also be built by pasting rows or columns together with [`rbind()`](http://www.rdocumentation.org/packages/dplyr/functions/rbind) or [`cbind()`](http://www.rdocumentation.org/packages/marray/functions/cbind) instead of using the function [`matrix()`](http://www.rdocumentation.org/packages/gmp/functions/matrix).
23 | 
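24 | A matrix is an atomic vector with dimensions. Thus, it is possible to create a matrix based on two matrices that do not necessarily contain numbers, as seen in the first video exercise. The result is still a matrix, not a dataframe or a list: R applies coercion so that all elements end up with a single common type (for example, numbers become character strings when they are combined with characters).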
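
The following sketch shows both ideas with small made-up vectors: building a matrix with `rbind()` and `cbind()`, and what coercion does when numbers and characters are combined.

```r
# Build a matrix by stacking vectors as rows or as columns
row_wise <- rbind(c(1, 2, 3), c(4, 5, 6))   # 2 rows, 3 columns
col_wise <- cbind(c(1, 2, 3), c(4, 5, 6))   # 3 rows, 2 columns
dim(row_wise)
dim(col_wise)

# Combining numbers with characters coerces everything to character;
# the result is still a matrix
mixed <- cbind(c(1, 2), c("a", "b"))
class(mixed[1, 1])
is.matrix(mixed)
```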
25 | 
26 | 
27 | ## Subsetting Matrices
28 | Like any other data object, you can draw subsets from matrices. Use the square brackets `[]` on the matrix object and specify the row and the column to be extracted. This way you obtain a single matrix element:
29 | 
30 | ```{r}
31 | my_matrix[1, 2]
32 | ```
33 | 
34 | Entire rows or columns can be selected by specifying only the row number or only the column number:
35 | 
36 | ```{r}
37 | my_matrix[1,]
38 | my_matrix[, 2]
39 | ```
40 | 
41 | If only a single number is given, R treats the matrix as one long vector and returns the value at that position:
42 | 
43 | ```{r}
44 | my_matrix[1]
45 | ```
46 | 
47 | Remark: R counts the positions inside a matrix column by column: from the first to the last row of the first column, then from the first to the last row of the second column, and so on.
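
To make the counting order concrete, here is a short sketch based on `my_matrix` as defined above. The helper names `second` and `fourth` are only illustrative.

```r
my_matrix <- matrix(c(9, 2, 5, 1, 3, 4, 1, 2, 7), nrow = 3, ncol = 3, byrow = TRUE)

# Position 2 is the second row of the first column
second <- my_matrix[2]
# Position 4 is the first row of the second column
fourth <- my_matrix[4]

# as.vector() shows the full column-by-column order
as.vector(my_matrix)
```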
48 | 
49 | Furthermore, you can select multiple elements of a matrix by specifying several rows or columns at once. As seen in the video lectures, you can use the combine function [`c()`](http://www.rdocumentation.org/packages/onion/functions/c) to select several values from one row or column, or an entire sub-matrix.
50 | 
51 | ```{r}
52 | my_matrix[2, c(2, 3)]
53 | my_matrix[c(2, 3), c(2, 3)]
54 | ```
55 | 
56 | In a similar manner, matrices can be subsetted by the names of rows and columns instead of their indices.
57 | Alternatively, logical vectors can be used for the rows, the columns, or both:
58 | 
59 | ```{r}
60 | my_matrix[c(F, F, T), c(F, F, T)]
61 | ```
62 | 
63 | ## Matrix calculus
64 | 
65 | R has two convenient functions to sum the values of each row and each column:
66 | 
67 | * [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) 
68 | * [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums)
69 | 
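70 | And of course any arithmetic operation can be performed on a matrix as well:
71 | 
72 | * multiplication by a scalar (`*`)
73 | * any other operation (`/`, `+`, `-`)
74 | 
75 | In general, these arithmetic operators work element-wise. In particular, `*` multiplies element by element; the matrix product uses the separate operator `%*%`.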
76 | 
77 | Remark: Recycling is done automatically when a calculation involves a matrix and a shorter vector. This has to be handled very carefully, since R might recycle in a way you don't want. Two matrices must have the same dimensions; otherwise R reports an error instead of recycling.
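
The operations above can be sketched as follows, again using the values of `my_matrix`. The result names are illustrative only.

```r
my_matrix <- matrix(c(9, 2, 5, 1, 3, 4, 1, 2, 7), nrow = 3, ncol = 3, byrow = TRUE)

row_totals <- rowSums(my_matrix)  # one sum per row
col_totals <- colSums(my_matrix)  # one sum per column

doubled <- my_matrix * 2          # every element is multiplied by 2

# A vector of length 3 is recycled down the columns:
# 10 is added to row 1, 20 to row 2 and 30 to row 3
shifted <- my_matrix + c(10, 20, 30)
shifted
```

Because the vector is recycled column by column, a vector whose length matches the number of rows is applied per row. This is often not what you expect if you intended a per-column adjustment.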
78 | 
79 | 
80 | 
81 | 
82 | 
83 | 
84 | 


--------------------------------------------------------------------------------
/refguides/chapter4_refguide.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | output: html_document
 3 | ---
 4 | ## Factors
 5 | 
 6 | Factors represent categorical variables, i.e. variables that take on a limited number of distinct values. To define a variable as categorical, use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor).
 7 | 
 8 | What does R do?
 9 | 
10 | * It scans the vector for all distinct values and defines them as the levels of the factor.
11 | * It sorts the levels alphabetically.
12 | * It stores each value as an integer code that corresponds to a level (handy in the case of long character strings).
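
The three steps can be seen in a short sketch. The vector `answers` is a hypothetical example, not part of the course data.

```r
# Hypothetical survey answers, used only for illustration
answers <- c("yes", "no", "maybe", "yes", "no")
answers_factor <- factor(answers)

levels(answers_factor)       # distinct values, sorted alphabetically
as.integer(answers_factor)   # integer code of each value
```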
13 | 
14 | ## Rename factors
15 | 
16 | By default the levels are sorted alphabetically. To use a different order, specify it manually with the `levels` argument of the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function.
17 | 
18 | ```{r, eval=FALSE}
19 | factor(my_var, levels = c("xy","xz","zy"))
20 | ```
21 | 
22 | The level names can be changed using the [`levels()`](http://www.rdocumentation.org/packages/base/functions/levels) function:
23 | 
24 | ```{r, eval=FALSE}
25 | levels(my_var) <- c("na_xy", "na_xz", "na_zy")
26 | ```
27 | 
28 | or by using the `labels` argument inside of the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor)
29 | 
30 | ```{r, eval=F}
31 | factor(my_var, labels = c("na_xy","na_xz","na_zy"))
32 | 
33 | ```
34 | 
35 | Remark: To rename levels, you always have to follow the original order of the levels. To avoid confusion and misspecification, it is suggested to use both `levels` and `labels` inside [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor).
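
A minimal sketch of using `levels` and `labels` together, with a made-up vector `sizes`. The `levels` argument fixes the order, and `labels` renames the levels in that same order.

```r
# Hypothetical size codes; name and values are assumptions for illustration
sizes <- c("M", "S", "L", "M")

# `levels` fixes the order, `labels` renames the levels in that same order
sizes_factor <- factor(sizes,
                       levels = c("S", "M", "L"),
                       labels = c("Small", "Medium", "Large"))
sizes_factor
levels(sizes_factor)
```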
36 | 
37 | 
38 | ## Nominal vs Ordinal
39 | 
40 | **Ordinal** variables have a natural order among their levels, whereas **nominal** variables do not have any such order.
41 | 
42 | **Ordinal** variables are also defined with [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor), but with the argument `ordered` set to `TRUE`.
43 | R orders the levels alphabetically, unless specified otherwise with `levels`. The ordinal structure has two consequences:
44 | 
45 | * it is respected in comparisons and operations
46 | * it is reflected by `<` and `>` signs: the levels are printed with `<` between them, and elements can be compared with these operators
47 | 
48 | For example:
49 | ```{r, eval=F}
50 | factor(my_ordinal, ordered = TRUE, levels = c(1, 2, 3))
51 | ```
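
As an illustration of how the order is used in comparisons, consider the following sketch. The vector `temps` and its levels are assumptions made up for this example.

```r
# Hypothetical temperature ratings, used only for illustration
temps <- c("low", "high", "medium", "low")
temps_ordered <- factor(temps, ordered = TRUE,
                        levels = c("low", "medium", "high"))

temps_ordered                          # levels are printed as low < medium < high
temps_ordered[1] < temps_ordered[2]    # is "low" less than "high"?
max(temps_ordered)                     # largest level that occurs
```

The same comparison on a nominal factor (without `ordered = TRUE`) is not meaningful, and R returns `NA` with a warning.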
52 | 
53 | 
54 | 
55 | 
56 | 


--------------------------------------------------------------------------------
/refguides/chapter5_refguide.Rmd:
--------------------------------------------------------------------------------
 1 | 
 2 | ## Lists
 3 | 
 4 | A list is a generic vector containing other objects. Unlike with atomic vectors, the objects do not need to be of the same type. For example, a list could consist of a numeric vector, a logical value, a matrix, other lists, and so on.
 5 | 
 6 | ```{r}
 7 | my_family <- list("Ryan", "Mary", 3, TRUE)
 8 | my_family
 9 | ``` 
10 | 
11 | Components of lists may also be named. You can assign names to list elements by the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function or at the time of creation.
12 | 
13 | ```{r}
14 | my_family <- list(father="Ryan", mother="Mary", siblings=3, divorced=TRUE)
15 | my_family
16 | ```
17 | 
18 | If you want to know whether your object, `my_family` in our case, is a list, you can use the following:
19 | 
20 | ```{r}
21 | is.list(my_family)
22 | ```
23 | 
24 | Finally, you can use `str()` to display the structure of your list.
25 | 
26 | ```{r}
27 | str(my_family)
28 | ```
29 | 
30 | 
31 | ## Subset and Extend Lists
32 | 
33 | If you need to isolate parts of your list, you can use `[...]` and `[[...]]`. Indexing with `[...]`, as used to subset vectors, gives you a sublist, not the content of the element. To retrieve the content, you need to use `[[...]]`, which accesses a single element at a time.
34 | 
35 | ```{r}
36 | my_family[1]
37 | my_family[[1]]
38 | ```
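
The difference matters as soon as you want to compute with the result. A short sketch, using the named `my_family` list from above (the names `sub_list` and `content` are illustrative):

```r
my_family <- list(father = "Ryan", mother = "Mary", siblings = 3, divorced = TRUE)

sub_list <- my_family[3]     # a list of length 1
content  <- my_family[[3]]   # the number stored in that element

class(sub_list)
class(content)

content + 1   # works on the content; `sub_list + 1` would be an error
```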
39 | 
40 | To retrieve several elements of your list, pass a vector of indices to `[...]`:
41 | 
42 | ```{r}
43 | my_family[c(1,3)]
44 | ```
45 | 
46 | If the list is named, its elements can be referred to by name instead of by numeric index.
47 | 
48 | ```{r}
49 | my_family[["father"]]
50 | ```
51 | 
52 | Alternatively, you can use the `$` operator. 
53 | ```{r}
54 | my_family$father
55 | ```
56 | 
57 | Another way to subset is with a logical vector of `TRUE` and `FALSE` values.
58 | 
59 | ```{r}
60 | my_family[c(TRUE, FALSE, TRUE, FALSE)]
61 | ```
62 | 
63 | Adding new elements is easy. You simply assign a value to a new name and the element is added to the list.
64 | 
65 | ```{r}
66 | grandparents <- c("Arthur","Josephin")
67 | my_family$grandparents <- grandparents
68 | my_family
69 | ```
70 | 
71 | or equivalently
72 | ```{r}
73 | my_family[["grandparents"]] <- grandparents
74 | ```
75 | 
76 | 
77 | 
78 | 


--------------------------------------------------------------------------------
/refguides/chapter6_refguide.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: html_document
  3 | ---
  4 | ## Explore dataframes
  5 | 
  6 | Data sets
  7 | 
  8 | * consist of observations
  9 | * corresponding to variables
 10 | * stored in a dataframe
 11 | 
 12 | Matrices, on the other hand, can only store elements of a single type, and lists would require too much coding.
 13 | 
 14 | What is a dataframe?
 15 | 
 16 | * built to specifically store data
 17 | * matrix form: with rows as observations and columns as variables
 18 | * allows for elements of all types (logicals, numerics, characters)
 19 | 
 20 | How to create a dataframe?
 21 | 
 22 | * import data from CSV files
 23 | * import from a database (e.g. SQL)
 24 | * import from other statistical software etc...
 25 | 
 26 | Remark: Dataframes are basically lists with n elements, one for each column of the dataframe. Each element is a vector whose length is the number of observations, BUT all elements must have the same length.
 27 | 
 28 | In general, you can define a data frame inside R using the function [`data.frame()`](http://www.rdocumentation.org/packages/R.utils/functions/dataFrame).
 29 | 
 30 | ```{r, eval = F}
 31 | data.frame(..., row.names = NULL, check.rows = FALSE, check.names = TRUE, 
 32 |            stringsAsFactors = default.stringsAsFactors())
 33 | ```
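
A minimal sketch of building a data frame by hand. The column names and values are assumptions made up for illustration; `stringsAsFactors = FALSE` is set explicitly so that the result does not depend on the R version.

```r
# Hypothetical data; column names and values are assumptions for illustration
people <- data.frame(
  name   = c("Ann", "Bob", "Cleo"),
  age    = c(28, 35, 41),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep `name` as character in every R version
)

str(people)    # one line per column, with its type
dim(people)    # 3 observations, 3 variables
```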
 34 | 
 35 | ## Subset a dataframe
 36 | 
 37 | Due to the nature of a dataframe, you can use the subsetting syntax of both lists and matrices.
 38 | 
 39 | As with a matrix, you can use square brackets and choose a row and a column:
 40 | 
 41 | ```{r, eval = FALSE}
 42 | my_df[3, 2]
 43 | ```
 44 | 
 45 | The indices can be column names as well!
 46 | As before, to get only one of the rows, specify which one you want to keep and leave the column argument empty. The same applies for keeping only one variable but all the observations.
 47 | 
 48 | ```{r, eval=FALSE}
 49 | my_df[3,  ]  # only the third row is selected
 50 | my_df[ , 2]  # only the second column is selected
 51 | 
 52 | ```
 53 | 
 54 | This can be generalized to the situation where you want to select only some variables, only some observations, or both:
 55 | 
 56 | ```{r, eval=FALSE}
 57 | my_df[c(3,2), c(3,2)]
 58 | ```
 59 | 
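 60 | Remark: A subset that spans more than one column is a _new dataframe_ and not a vector, as it was the case before. Selecting a single column with `my_df[ , 2]` returns a vector unless you add `drop = FALSE`.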
 61 | 
 62 | How to use the list syntax to select elements?
 63 | 
 64 | * Either by using the dollar sign (`$`)
 65 | 
 66 | ```{r, eval= F}
 67 | my_df$variable1
 68 | ```
 69 | 
 70 | * Or by using double brackets (`[[...]]`)
 71 | 
 72 | ```{r, eval = F}
 73 | my_df[["variable1"]]
 74 | ```
 75 | 
 76 | Remark: Now, the result is a vector. If single square brackets are used instead of double square brackets, as in `my_df["variable1"]`, a _new dataframe_ with a single column is returned (a dataframe is itself a list).
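
The different return types are easy to check in a small sketch. The data frame below is hypothetical and only serves to illustrate the syntax.

```r
# Hypothetical data frame, used only for illustration
my_df <- data.frame(variable1 = c(3, 1, 2),
                    variable2 = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

as_vector    <- my_df[["variable1"]]                 # list syntax: a numeric vector
also_vector  <- my_df[, "variable1"]                 # matrix syntax drops to a vector
as_dataframe <- my_df["variable1"]                   # single brackets: one-column data frame
kept_df      <- my_df[, "variable1", drop = FALSE]   # drop = FALSE keeps the data frame

class(as_vector)
class(as_dataframe)
```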
 77 | 
 78 | ## Extend your dataframe
 79 | 
 80 | You can extend your dataframe by adding a column
 81 |  
 82 |  * by using the dollar sign (`$`)
 83 | ```{r, eval=F}
 84 | my_df$new_column <- new_column
 85 | ```
 86 | 
 87 |  * by using double square brackets (`[[...]]`)
 88 | ```{r, eval=F}
 89 | my_df[["new_column"]] <- new_column
 90 | ```
 91 | 
 92 |  * using [`cbind()`](http://www.rdocumentation.org/packages/marray/functions/cbind)
 93 | ```{r, eval=FALSE}
 94 | cbind(my_df, new_column)
 95 | ```
 96 | 
 97 | The dataframe can also be extended by adding rows. Since each row corresponds to a list (its values can have different types), it is necessary to create a new dataframe with [`data.frame()`](http://www.rdocumentation.org/packages/R.utils/functions/dataFrame) and combine the original one with the new one.
 98 | 
 99 | * by using [`rbind()`](http://www.rdocumentation.org/packages/dplyr/functions/rbind) 
100 | ```{r, eval=FALSE}
101 | rbind(my_df, new_df)
102 | ```
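
Putting both directions together, a minimal sketch with made-up data could look like this. Note that `rbind()` expects the new data frame to use the same column names as the original.

```r
# Hypothetical data; names and values are assumptions for illustration
my_df <- data.frame(name = c("Ann", "Bob"), age = c(28, 35),
                    stringsAsFactors = FALSE)

# Add a column; the new vector needs one value per row
my_df$member <- c(TRUE, FALSE)

# Add a row; the new data frame must use the same column names
new_df <- data.frame(name = "Cleo", age = 41, member = TRUE,
                     stringsAsFactors = FALSE)
my_df <- rbind(my_df, new_df)
my_df
```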
103 | 
104 | ## Sort a dataframe
105 | 
106 | In general, the function [`sort()`](http://www.rdocumentation.org/packages/arules/functions/sort)
107 | can be applied to sort a vector. However, to sort the rows of a data frame, you can use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function.
108 | 
109 | ```{r, eval= F}
110 | ordering <- order(my_df$variable1)
111 | ```
112 | 
113 | The order function
114 | 
115 | * returns a vector of positions (indices), not the sorted values themselves
116 | * the first value is the position of the smallest element, the second value is the position of the next smallest element, and so on
117 | * [`order()`](http://www.rdocumentation.org/packages/base/functions/order) can be used inside of a subset
118 | 
119 | ```{r, eval=FALSE}
120 | my_df[order(my_df$variable1, decreasing = TRUE), ]
121 | ```
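
A small sketch shows what `order()` returns and how it is used to sort rows. The data frame and the names `positions`, `sorted_df` and `sorted_desc` are assumptions for illustration.

```r
# Hypothetical data; names and values are assumptions for illustration
my_df <- data.frame(name = c("Ann", "Bob", "Cleo"), age = c(35, 28, 41),
                    stringsAsFactors = FALSE)

positions <- order(my_df$age)   # row indices from youngest to oldest
positions

sorted_df   <- my_df[positions, ]                            # ascending by age
sorted_desc <- my_df[order(my_df$age, decreasing = TRUE), ]  # descending by age
sorted_desc
```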
122 | 
123 | For more information, have a look at the exercises!
124 | 


--------------------------------------------------------------------------------
/refguides/chapter7_refguide.Rmd:
--------------------------------------------------------------------------------
  1 | 
  2 | ## Basic Graphics
  3 | 
  4 | One of the most frequently used plotting functions in R is [`plot()`](http://www.rdocumentation.org/packages/graphics/functions/plot). This is a generic function: the type of plot produced depends on the type or class of the argument(s).
  5 | 
  6 | ```{r, eval=FALSE}
  7 | x <- c(1, 2, 3, 4)
  8 | plot(x) # this generates a plot of the values in the variable against their index
  9 | 
 10 | x <- factor(c("Black", "White", "Green"))
 11 | plot(x) # this generates a bar chart
 12 | ```
 13 | 
 14 | ```{r, eval=FALSE}
 15 | x <- c(1, 2, 3)
 16 | y <- c(1, 2, 3)
 17 | plot(x, y) # this generates a scatter plot
 18 | 
 19 | x <- factor(c("Black", "White", "Green"))
 20 | y <- c(1, 2, 3)
 21 | plot(x, y) # this generates boxplots of y for each level of x
 22 | 
 23 | x <- factor(c("Black", "White", "Green"))
 24 | y <- factor(c("Left", "Right", "Centre"))
 25 | plot(x, y) # this generates a spine plot, a kind of stacked bar chart
 26 | ```
 27 | 
 28 | Histograms can be created using the [`hist()`](http://www.rdocumentation.org/packages/graphics/functions/hist) function. This function takes in a continuous variable, `x`, for which the histogram is plotted. 
 29 | 
 30 | ```{r, eval=FALSE}
 31 | hist(x, breaks = 10)  # for example, ask for about 10 bins
 32 | ```
 33 | 
 34 | With the `breaks` argument you can suggest the number of bins you want in the histogram; R may adjust the number to get round break points.
 35 | 
 36 | You can also check other graphics functions such as [`boxplot()`](http://www.rdocumentation.org/packages/graphics/functions/boxplot) and [`barplot()`](http://www.rdocumentation.org/packages/raster/functions/barplot).
 37 | 
 38 | 
 39 | ## Customizing Plots
 40 | 
 41 | Now you can customize your plot!
 42 | 
 43 | ```{r,eval=FALSE}
 44 | plot(x,y, 
 45 |       xlab = " ",     # changes the label of the horizontal axis 
 46 |       ylab = " ",     # changes the label of the vertical axis 
 47 |       main = " ",     # specifies the title of the plot 
 48 |       type = " ",     # specifies the type of the plot, e.g. lines or points
 49 |       col = " ")      # specifies the color of the plot 
 50 | ```
 51 | 
 52 | Type `?par` in your console to take a peek at the graphical parameters you can specify.
 53 |  
 54 | A few of them are:
 55 | 
 56 | ```{r,eval=FALSE}
 57 | plot(x,y, 
 58 |       xlab = " ",   
 59 |       ylab = " ",  
 60 |       main = " ",   
 61 |       type = " ",   
 62 |       col = " ",
 63 |       col.main = " ",  # specifies the color of the main title
 64 |       cex.axis = 1,    # specifies the size of the axis annotation (a number; 1 is the default)
 65 |       lty = 1,         # specifies the line type (1 is a solid line)
 66 |       pch = 1)         # specifies the plot symbol (1 is an open circle)
 67 | ```
 68 | 
 69 | ### Important Note
 70 | Since all the arguments are specified inside the [`plot()`](http://www.rdocumentation.org/packages/graphics/functions/plot)
 71 | function they are valid only for the specific plot. It is possible, though, to set the parameters of the graphs globally by using the [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par) function. 
 72 | 
 73 | 
 74 | ## Multiple Plots
 75 | 
 76 | R makes it easy to combine multiple plots into one overall graph, using either the [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par)
 77 | or [`layout()`](http://www.rdocumentation.org/packages/graphics/functions/layout) function.
 78 | 
 79 | With [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par), you can include the option `mfrow` to create a grid of `nrows` by `ncols` plots that is filled in by row.
 80 | 
 81 | ```{r,eval=FALSE}
 82 | par(mfrow = c(nrows, ncols))
 83 | ```
 84 | 
 85 | If you use `mfcol`, it fills in the grid by columns.
 86 | 
 87 | ```{r,eval=FALSE}
 88 | par(mfcol = c(nrows, ncols))
 89 | ```
 90 | 
 91 | In order to reset the graphical parameters you can use: 
 92 | 
 93 | ```{r,eval=FALSE}
 94 | par(mfrow = c(1, 1))
 95 | ```
 96 | 
 97 | Alternatively, save the graphical parameters before you change them,
 98 | 
 99 | ```{r,eval=FALSE}
100 | old_par <- par(no.readonly = TRUE)
101 | ```
102 | 
103 | and call `par(old_par)` when you need to reset the parameters.
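
The save-and-restore pattern can be sketched as follows. The call `pdf(NULL)` is an assumption added so that the example runs as a plain script without opening a window or writing a file; leave it out (together with `dev.off()`) when you work interactively.

```r
pdf(NULL)   # assumption: draw to a null device so no file or window is needed

old_par <- par(no.readonly = TRUE)   # remember the current settings

par(mfrow = c(1, 2))                 # one row, two columns
plot(c(1, 2, 3), c(1, 4, 9), main = "Left")
hist(c(1, 2, 2, 3, 3, 3), main = "Right")
grid_in_use <- par("mfrow")

par(old_par)                         # restore the saved settings
grid_after_reset <- par("mfrow")

invisible(dev.off())                 # close the null device
```

Using `no.readonly = TRUE` saves only the parameters that can be set again, which avoids warnings when the settings are restored.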
104 | 
105 | Another way is to use the [`layout()`](http://www.rdocumentation.org/packages/graphics/functions/layout) function which divides the plotting space into as many rows and columns as there are in matrix `mat`.
106 | 
107 | ```{r,eval=FALSE}
108 | layout(mat, ...)
109 | ```
110 | 
111 | One more way to reset the layout is to use:
112 | 
113 | ```{r,eval=FALSE}
114 | layout(1)
115 | ```
116 | 
117 | 
118 | In order to add more information to your plot, you can use the following functions.
119 | 
120 | ```{r,eval=FALSE}
121 | plot(x, y)
122 | abline()      # adds one or more straight lines
123 | lines()       # adds lines (careful how to specify the arguments, watch video for more info)
124 | points()      # adds points 
125 | text()        # adds text 
126 | segments()    # adds line segments between pairs of points
127 | ```
128 | 
129 | Take a look at the documentation to get more insight into these functions: [`abline()`](http://www.rdocumentation.org/packages/graphics/functions/abline),
130 | [`lines()`](http://www.rdocumentation.org/packages/graphics/functions/lines),
131 | [`points()`](http://www.rdocumentation.org/packages/graphics/functions/points),
132 | [`text()`](http://www.rdocumentation.org/packages/graphics/functions/text),
133 | and [`segments()`](http://www.rdocumentation.org/packages/graphics/functions/segments).
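
As a rough sketch of how these functions are typically called, the example below annotates a scatter plot of made-up data. The values, labels and the use of `pdf(NULL)` (so that the code runs as a script without opening a window) are assumptions for illustration.

```r
pdf(NULL)   # assumption: null device so the sketch runs without opening a window

# Hypothetical data, used only for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

plot(x, y, main = "Annotated plot")
abline(h = mean(y), lty = 2)               # horizontal dashed line at the mean of y
lines(x, y)                                # connect the points in order
points(3, 5, pch = 19, col = "red")        # highlight one point
text(3, 5, labels = "middle", pos = 3)     # label above that point
segments(x0 = 1, y0 = 2, x1 = 5, y1 = 6)   # straight segment from (1, 2) to (5, 6)

invisible(dev.off())                       # close the null device
```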
134 | 
135 | 
136 | 
137 | 


--------------------------------------------------------------------------------
/scripts/chapter1_script.md:
--------------------------------------------------------------------------------
 1 | ## chapter_1_1 script: R, the true basics
 2 | 
 3 | Hi! My name is Filip and I'm a data scientist at DataCamp. DataCamp is an online data science school. You'll take fun video lessons, like the one you're watching now, and solve interactive coding challenges, where you receive instant and detailed feedback. All this happens in the comfort of your browser, so you can immediately start learning the skill of the future.
 4 | 
 5 | In this introduction to R course you will learn about the basics of R, as well as the most common data structures it uses to store data. By the end of this course, you will know how to create these data structures, manipulate them and perform calculations on them to get surprising insights.
 6 | 
 7 | But first things first: the basics of R. It's also called the language for statistical computing, and is one of the most popular languages to do data science, used by tons of companies and universities around the globe in all sorts of fields. Optimizing a financial portfolio? Mapping marketing data? Analyzing outcomes of clinical trials? You name it, R can handle it.
 8 | 
 9 | But why did R become so popular? Well, first of all, it's free to use! Next, R's visualization capabilities are top notch, making it easy to build beautiful plots. It's also easy to create so-called packages, which are extensions to R. R's very active community has created thousands of these packages for many different fields. Last but not least, R is an actual programming language, with a command-line interface for executing code. This is a big plus compared to other point-and-click programs out there. It might take some energy to fully get the hang of it, but fear not: DataCamp is here to help you master R in no time! Let's get started.
10 | 
11 | An important component of R is the console. It's a place where you can execute R commands. In DataCamp's interactive interface, the console can be found here. Let's try to calculate the sum of 1 and 2. We simply type 1 + 2 at the prompt in the console and hit Enter. R interprets what you typed and prints the result.
12 | 
13 | R is more than a scientific calculator, though. You can also create so-called variables. A variable allows you to store data in R for later use. You can use the less than sign followed by a dash to create a variable. Suppose the height of a rectangle is 2. Let's assign this value 2 to a variable height. In the console, we type height, less than sign, dash, 2:
14 | 
15 | This time, R does not print anything, because it assumes that you will be using this variable in the future. If you now simply type and execute height in the console, R returns 2:
16 | 
17 | We can do a similar thing for the width of our imaginary rectangle. We assign the value 4 to a variable width.
18 | 
19 | Typing width gives us 4, great.
20 | 
21 | As you're assigning variables in the R console, you're actually building up the R workspace. It's the place where R variables 'live'. You can list all variables with the `ls()` function. Simply type ls followed by empty parentheses and hit enter.
22 | 
23 | This shows you a list of all the variables you have created up to now. There are two objects in your workspace at the moment, height and width. If we try to access a variable that's not in the workspace, depth for example, R throws an error.
24 | 
25 | Suppose you now want to find out the area of our imaginary rectangle, which is height multiplied by width. height equals 2, and width equals 4, so the result is 8. Let's also assign this result to a new variable, area.
26 | 
27 | Inspecting the workspace again with ls shows that the workspace contains three objects now: area, height and width.
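
For reference, the console commands described so far would look roughly like this (a sketch reconstructed from the narration above, not the exact on-screen code):

```r
height <- 2              # assign 2 to the variable height
width <- 4               # assign 4 to the variable width

area <- height * width   # area of the rectangle
area                     # prints the stored result

ls()                     # lists the variables in the workspace
```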
28 | 
29 | Now, this is all great, but what if you want to recalculate the area of your imaginary rectangle when the height is 3 and the width is 6? You'd have to reassign the variables width and height in the console, and then recalculate the area. That's quite some coding you'd have to redo, isn't it? 
30 | 
31 | This is the place where R scripts come in! An R script is simply a text file with successive lines of R code. Let's create such a script, "rectangle.R", that contains the code that we've written up to now.
32 | 
33 | Next, you can run this script. In the DataCamp interface, you can do this with the 'Submit Answer' button. R goes through your code, line by line, executing every command one by one in the console, just as if you were typing each command yourself. The cool thing is that if you want to change your code, you can simply adapt your script and run it again. Let's change the height to 3 and the width to 6, and rerun the script. The variables are given different values this time, and the output changes accordingly.
34 | 
35 | Now it's time for some interactive exercises! Use the console for experimentation, and the R script editor for coding the actual answer. When you hit Submit Answer, your script will be executed, and checked for correctness. DataCamp's tailored feedback will guide you to R mastery!


--------------------------------------------------------------------------------