├── .gitignore ├── README.md ├── chapter1.Rmd ├── chapter2.Rmd ├── chapter3.Rmd ├── chapter4.Rmd ├── chapter5.Rmd ├── chapter6.Rmd ├── course.yml ├── datasets ├── chapter5.R ├── chapter5.RData ├── chapter6.R └── chapter6.RData ├── refguides ├── chapter1_refguide.Rmd ├── chapter2_refguide.Rmd ├── chapter3_refguide.Rmd ├── chapter4_refguide.Rmd ├── chapter5_refguide.Rmd ├── chapter6_refguide.Rmd └── chapter7_refguide.Rmd └── scripts └── chapter1_script.md /.gitignore: -------------------------------------------------------------------------------- 1 | * 2 | !*.Rmd 3 | !*.yml 4 | !README.md 5 | !.gitignore 6 | .Rproj.user 7 | !removed/ 8 | !removed/*.Rmd 9 | !scripts/ 10 | !scripts/*.md 11 | !datasets/ 12 | !datasets/*.R 13 | !datasets/*.RData 14 | !refguides/ 15 | !refguides/* 16 | 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction to R (beta) 2 | 3 | Source files for the improved introduction to R course (still in beta) 4 | 5 | [**Link to Course**](https://www.datacamp.com/courses/732) 6 | 7 | This course should be updated through [DataCamp Teach](https://www.datacamp.com/teach). 8 | -------------------------------------------------------------------------------- /chapter1.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta : Chapter 1 3 | title : Intro to basics 4 | description : "In this chapter, you will take your first steps with R. You will learn how to use the console as a calculator and how to assign variables. You will also get to know the basic data types in R. Let's get started!" 
5 | attachments : 6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch1_slides_v2.pdf 7 | 8 | --- type:VideoExercise lang:r xp:50 skills:1 key:1a1ba28cd5 9 | ## Meet R 10 | 11 | *** =video_link 12 | //player.vimeo.com/video/144351865 13 | 14 | *** =video_hls 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch1_1.master.m3u8 16 | 17 | --- type:NormalExercise lang:r xp:100 skills:1 key:5714863ba8 18 | ## Your first R script 19 | 20 | In the script on the right you should type R code to solve the exercises. When you hit the _Submit Answer_ button, every line of code in the script is interpreted and executed by R and you get a message that indicates whether or not your code was correct. The output of your submission is shown in the R console. 21 | 22 | You can also execute R commands straight in the console. When you type in the console, your submission will not be checked for correctness! Try, for example, to type in `3 * 4` and hit Enter. R should return `[1] 12`. 23 | 24 | *** =instructions 25 | In the script, add another line of code that calculates the sum of 6 and 12, and hit the _Submit Answer_ button. 26 | 27 | *** =hint 28 | Simply add a line of R code that calculates the sum of 6 and 12, just like the example in the sample code! 29 | 30 | *** =pre_exercise_code 31 | ```{r} 32 | # no pec 33 | ``` 34 | 35 | *** =sample_code 36 | ```{r} 37 | 3 + 4 38 | ``` 39 | 40 | *** =solution 41 | ```{r} 42 | 3 + 4 43 | 6 + 12 44 | ``` 45 | 46 | *** =sct 47 | ```{r} 48 | test_output_contains("18", incorrect_msg = "Make sure to add a line of R code that calculates the sum of 6 and 12.") 49 | success_msg("Awesome! 
See how the console shows the result of the R code you submitted?") 50 | ``` 51 | 52 | 53 | --- type:NormalExercise lang:r xp:100 skills:1 key:ab37530088 54 | ## Documenting your code 55 | 56 | Adding comments to your code is extremely important to make sure that you and others can understand what your code is about. R makes use of the `#` sign to add comments, just like Twitter! 57 | 58 | Comments are not run as R code, so they will not influence your result. For example, _Calculate 3 + 4_ in the script on the right is a comment and is ignored during execution. 59 | 60 | *** =instructions 61 | Add another comment in the script on the right, _Calculate 6 + 12_, at the appropriate location. 62 | 63 | *** =hint 64 | Simply add the line `# Calculate 6 + 12` above the R code that calculates 6 + 12. 65 | 66 | *** =pre_exercise_code 67 | ```{r} 68 | # no pec 69 | ``` 70 | 71 | *** =sample_code 72 | ```{r} 73 | # Calculate 3 + 4 74 | 3 + 4 75 | 76 | 77 | 6 + 12 78 | ``` 79 | 80 | *** =solution 81 | ```{r} 82 | # Calculate 3 + 4 83 | 3 + 4 84 | 85 | # Calculate 6 + 12 86 | 6 + 12 87 | ``` 88 | 89 | *** =sct 90 | ```{r} 91 | test_output_contains("7", incorrect_msg = "Do not remove the code that calculates 3 + 4.") 92 | test_student_typed("# Calculate 3 + 4", not_typed_msg = "Do not remove the comment for the code that calculates 3 + 4.") 93 | test_output_contains("18", incorrect_msg = "Do not remove the code that calculates 6 + 12.") 94 | test_student_typed(c("# Calculate 6 + 12", "# calculate 6 + 12", "#Calculate 6 + 12", "#calculate 6 + 12", 95 | "# Calculate 6+12", "# calculate 6+12", "#Calculate 6+12", "#calculate 6+12"), 96 | not_typed_msg = "Make sure to add the comment: `# Calculate 6 + 12`") 97 | success_msg("Great! Looks better, doesn't it?") 98 | ``` 99 | 100 | 101 | --- type:NormalExercise lang:r xp:100 skills:1 key:9d8a3d0b88 102 | ## R as a calculator 103 | 104 | In its most basic form R can be used as a scientific calculator. 
Consider the following arithmetic operators: 105 | 106 | - Addition: `+` 107 | - Subtraction: `-` 108 | - Multiplication: `*` 109 | - Division: `/` 110 | - Exponentiation: `^` 111 | - Modulo: `%%` 112 | 113 | The last two might need some explaining: 114 | - The `^` operator raises the number to its left to the power of the number to its right: for example, `3^2` equals 9. 115 | - The modulo operator returns the remainder of dividing the number on its left by the number on its right: for example, 5 modulo 3, written `5 %% 3`, equals 2. 116 | 117 | *** =instructions 118 | - Type `2^5` in the script to calculate 2 to the power 5. 119 | - Type `28 %% 6` to calculate 28 modulo 6. 120 | - Click _Submit Answer_ and have a look at the R output in the console. 121 | 122 | *** =hint 123 | Another example of the modulo operator: `9 %% 2` equals `1`. 124 | 125 | *** =pre_exercise_code 126 | ```{r} 127 | # no pec 128 | ``` 129 | 130 | *** =sample_code 131 | ```{r} 132 | # Addition 133 | 5 + 5 134 | 135 | # Subtraction 136 | 5 - 5 137 | 138 | # Multiplication 139 | 3 * 5 140 | 141 | # Division 142 | (5 + 5) / 2 143 | 144 | # Exponentiation 145 | 146 | 147 | # Modulo 148 | 149 | ``` 150 | 151 | *** =solution 152 | ```{r} 153 | # Addition 154 | 5 + 5 155 | 156 | # Subtraction 157 | 5 - 5 158 | 159 | # Multiplication 160 | 3 * 5 161 | 162 | # Division 163 | (5 + 5) / 2 164 | 165 | # Exponentiation 166 | 2 ^ 5 167 | 168 | # Modulo 169 | 28 %% 6 170 | ``` 171 | 172 | *** =sct 173 | ```{r} 174 | msg <- "Do not remove the examples that have already been coded for you!" 175 | test_output_contains("5 + 5", incorrect_msg = msg) 176 | test_output_contains("5 - 5", incorrect_msg = msg) 177 | test_output_contains("3 * 5", incorrect_msg = msg) 178 | test_output_contains("(5 + 5)/2", incorrect_msg = msg) 179 | test_output_contains("2^5", incorrect_msg = "Have another look at the exponentiation.
Read the instructions carefully.") 180 | test_output_contains("28 %% 6", incorrect_msg = "Have another look at the use of the `%%` operator. Read the instructions carefully.") 181 | success_msg("Nice one!") 182 | ``` 183 | 184 | 185 | --- type:MultipleChoiceExercise xp:50 skills:1 key:9d8819fb2e 186 | ## R's pros and cons 187 | 188 | As Filip explained in the video, several things make R the awesome and immensely popular language that it is today. On the other hand, some aspects of R are less attractive. Which of the following statements about this statistical programming language, developed by Ihaka and Gentleman in the nineties, are true? 189 | 190 | 1. As opposed to SAS and SPSS, R is completely open-source. 191 | 2. R is open-source, but it's hard to share your code with others since R uses a command-line interface. 192 | 3. It typically takes a long time for new and updated R packages to be released and made available to the public. 193 | 4. R is easy to use, but this comes at the cost of limited graphical abilities. 194 | 5. R works well with large data sets if the code is properly written and the data fits into the working memory. 195 | 196 | *** =instructions 197 | - Statements (1) and (2) are correct; the others are false. 198 | - Statements (1) and (4) are correct; the others are false. 199 | - Statements (1) and (5) are correct; the others are false. 200 | - Statements (2) and (4) are correct; the others are false. 201 | - Statements (3) and (5) are correct; the others are false. 202 | 203 | *** =hint 204 | Remember that your data has to fit in the working memory for R to be able to process it. 205 | 206 | *** =pre_exercise_code 207 | ```{r} 208 | # no pec 209 | ``` 210 | 211 | *** =sct 212 | ```{r} 213 | msg1 = "Remember that R's command-line interface does not make it hard to share code. On the contrary, sharing your results is very straightforward because you can easily share R scripts."
214 | msg2 = "R is the perfect tool for creating neat and insightful visualizations. Try again." 215 | msg3 = "Great! Head over to the next exercise and get your hands dirty!" 216 | msg4 = "R uses a command-line interface, which makes it very easy to share one's code. Also, R is very suitable for creating visualizations. Try again." 217 | msg5 = "It's fairly straightforward to write, maintain and share R packages. Try again." 218 | test_mc(3, feedback_msgs = c(msg1, msg2, msg3, msg4, msg5)) 219 | ``` 220 | 221 | 222 | --- type:NormalExercise lang:r xp:100 skills:1 key:6b6fb4974c 223 | ## Variable assignment (1) 224 | 225 | A variable allows you to store a value or an object in R. You can then use the variable's name to access the value or object stored in it. You use `<-` to assign a value to a variable: 226 | 227 | ``` 228 | my_variable <- 4 229 | ``` 230 | 231 | *** =instructions 232 | Complete the code in the editor so that it assigns the value 42 to the variable `x`. Click _Submit Answer_. Notice that when you ask R to print `x`, the value 42 appears. 233 | 234 | *** =hint 235 | Look at how the value 4 was assigned to `my_variable` in the exercise's assignment. Do the exact same thing in the editor, but now assign 42 to the variable `x`. 236 | 237 | *** =pre_exercise_code 238 | ```{r} 239 | # no pec 240 | ``` 241 | 242 | *** =sample_code 243 | ```{r} 244 | # Assign the value 42 to x 245 | x <- 246 | 247 | # Print out the value of the variable x 248 | x 249 | ``` 250 | 251 | *** =solution 252 | ```{r} 253 | # Assign the value 42 to x 254 | x <- 42 255 | 256 | # Print out the value of the variable x 257 | x 258 | ``` 259 | 260 | *** =sct 261 | ```{r} 262 | test_error() 263 | test_object("x", 264 | undefined_msg = "Make sure to define a variable `x`.", 265 | incorrect_msg = "Make sure that you assign the correct value to `x`.") 266 | success_msg("Good job!
Notice that R does not print the value of a variable to the console when you do the assignment. `x <- 42` did not generate any output because R assumes that you will need this variable later. Otherwise you wouldn't have stored the value in a variable in the first place, right? Proceed to the next exercise!") 267 | ``` 268 | 269 | 270 | 271 | 272 | --- type:NormalExercise lang:r xp:100 skills:1 key:a5b8028834 273 | ## Variable assignment (2) 274 | 275 | Suppose you have a fruit basket with five apples. You want to store the number of apples in a variable with the name `my_apples`. 276 | 277 | *** =instructions 278 | - Using `<-`, assign the value 5 to `my_apples` below the first comment. 279 | - Type `my_apples` below the second comment. This will print out the value of `my_apples`. 280 | - After clicking _Submit Answer_, have a look at the console: the number 5 is printed, so R now links the variable `my_apples` to the value 5. 281 | 282 | *** =hint 283 | Remember that if you want to assign a number or an object to a variable in R, you can make use of the assignment operator `<-`. Alternatively, you can use `=`, but `<-` is widely preferred in the R community. 284 | 285 | *** =pre_exercise_code 286 | ```{r} 287 | ``` 288 | 289 | *** =sample_code 290 | ```{r} 291 | # Assign the value 5 to the variable called my_apples 292 | 293 | 294 | # Print out the value of the variable my_apples 295 | 296 | ``` 297 | 298 | *** =solution 299 | ```{r} 300 | # Assign the value 5 to the variable called my_apples 301 | my_apples <- 5 302 | 303 | # Print out the value of the variable my_apples 304 | my_apples 305 | ``` 306 | 307 | *** =sct 308 | ```{r} 309 | test_object("my_apples", incorrect_msg = "Have you correctly assigned 5 to `my_apples`? Write `my_apples <- 5` on a new line in the script.") 310 | test_output_contains("my_apples", incorrect_msg = "Have you explicitly told R to print out the `my_apples` variable to the console?
Simply type `my_apples` on a new line.") 311 | success_msg("Great! You could also use `=` for variable assignment, but `<-` is typically preferred.") 312 | ``` 313 | 314 | 315 | --- type:NormalExercise lang:r xp:100 skills:1 key:a0cb1bea96 316 | ## Variable assignment (3) 317 | 318 | Every tasty fruit basket needs oranges, so you decide to add six oranges. To keep track of them, you create the variable `my_oranges` and assign the value 6 to it. Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now code this in a clear way: 319 | 320 | ``` 321 | my_apples + my_oranges 322 | ``` 323 | 324 | *** =instructions 325 | - Assign the value 6 to `my_oranges`. 326 | - Add the variables `my_apples` and `my_oranges` and have R simply print the result. 327 | - Add the variables `my_apples` and `my_oranges` again, and assign the result to a new variable, `my_fruit`, which holds the total number of pieces of fruit in your basket. 328 | 329 | *** =hint 330 | `my_fruit` is just the sum of `my_apples` and `my_oranges`. You can use the `+` operator to sum the two and `<-` to assign that value to the variable `my_fruit`.
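A minimal sketch of the reusability idea, assuming a fresh R session. Note that `my_fruit` stores the value computed when its line ran (11), not the formula itself, so a new value for `my_apples` only reaches `my_fruit` once you rerun the line that computes it:

```r
# Values from the fruit-basket exercise
my_apples <- 5
my_oranges <- 6

# my_fruit stores the result of the sum, not the expression
my_fruit <- my_apples + my_oranges
my_fruit    # 11

# Changing my_apples does not update my_fruit on its own...
my_apples <- 12
my_fruit    # still 11

# ...but recomputing it (for example, by rerunning the script) does
my_fruit <- my_apples + my_oranges
my_fruit    # 18
```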
331 | 332 | *** =pre_exercise_code 333 | ```{r} 334 | # no pec 335 | ``` 336 | 337 | *** =sample_code 338 | ```{r} 339 | # Assign 5 to my_apples 340 | my_apples <- 5 341 | 342 | # Assign 6 to my_oranges 343 | 344 | 345 | # Add my_apples and my_oranges: print the result 346 | 347 | 348 | # Add my_apples and my_oranges: assign to my_fruit 349 | 350 | ``` 351 | 352 | *** =solution 353 | ```{r} 354 | # Assign 5 to my_apples 355 | my_apples <- 5 356 | 357 | # Assign 6 to my_oranges 358 | my_oranges <- 6 359 | 360 | # Add my_apples and my_oranges: print the result 361 | my_apples + my_oranges 362 | 363 | # Add my_apples and my_oranges: assign to my_fruit 364 | my_fruit <- my_apples + my_oranges 365 | ``` 366 | 367 | *** =sct 368 | ```{r} 369 | test_object("my_apples", incorrect_msg = "Do not change the assignment of the `my_apples` variable!") 370 | test_object("my_oranges") 371 | test_output_contains("my_apples + my_oranges", 372 | incorrect_msg = "The output does not contain the result of adding `my_apples` and `my_oranges` (second instruction). Try again.") 373 | test_object("my_fruit") 374 | success_msg("Nice one! The great advantage of doing calculations with variables is reusability. If you just change `my_apples` to equal 12 instead of 5 and rerun the script, `my_fruit` will automatically update as well.") 375 | ``` 376 | 377 | 378 | --- type:NormalExercise lang:r xp:100 skills:1 key:6192f64167 379 | ## The workspace 380 | 381 | If you assign a value to a variable, this variable is stored in the workspace. It's the place where all user-defined variables live. The command [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) lists the contents of this workspace. 382 | 383 | ``` 384 | a <- 1 385 | b <- 2 386 | ls() 387 | ``` 388 | 389 | The first two lines create the variables `a` and `b`. Calling [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) now shows you that `a` and `b` are in the workspace. 
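Because `ls()` returns the variable names as an ordinary character vector (sorted alphabetically by default), you can store its result and inspect it like any other value. A minimal sketch; `workspace` is just an illustrative variable name, and `exists()` is a base R function not otherwise used in this chapter:

```r
# Create two variables in the workspace
a <- 1
b <- 2

# ls() returns the names of the variables as a character vector
workspace <- ls()
workspace
class(workspace)   # "character"

# exists() checks whether a single variable is defined
exists("a")        # TRUE
```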
390 | 391 | You can also remove variables from the workspace. You do this with [`rm()`](http://www.rdocumentation.org/packages/base/functions/rm). `rm(a)`, for example, would remove `a` from the workspace again. `rm(list = ls())`, which appears at the beginning of the script on the right, clears everything from the workspace. 392 | 393 | *** =instructions 394 | - Create a variable `horses` equal to 3 and a variable `dogs` equal to 7. 395 | - List the contents of your workspace with [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) to confirm that these two variables are indeed in there. 396 | 397 | *** =hint 398 | Keep the `rm(list = ls())` line as it is, create `horses` and `dogs` with `<-`, and then call [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) to list the contents of the workspace. Give it a try and let the feedback messages guide you. 399 | 400 | *** =pre_exercise_code 401 | ```{r} 402 | # no pec 403 | ``` 404 | 405 | *** =sample_code 406 | ```{r} 407 | # Clear the entire workspace 408 | rm(list = ls()) 409 | 410 | # Create the variables horses and dogs 411 | 412 | 413 | # List the contents of your workspace 414 | 415 | 416 | ``` 417 | 418 | *** =solution 419 | ```{r} 420 | # Clear the entire workspace 421 | rm(list = ls()) 422 | 423 | # Create the variables horses and dogs 424 | horses <- 3 425 | dogs <- 7 426 | 427 | # List the contents of your workspace 428 | ls() 429 | ``` 430 | 431 | *** =sct 432 | ```{r} 433 | test_student_typed("rm(list = ls())", not_typed_msg = "Do not remove the line `rm(list = ls())`.") 434 | test_object("horses") 435 | test_object("dogs") 436 | test_output_contains('c("dogs", "horses")', 437 | incorrect_msg = "Make sure to inspect the objects in your workspace after creating `horses` and `dogs`.") 438 | success_msg("Awesome!
You can now build up and inspect your workspace, great!") 439 | ``` 440 | 441 | 442 | --- type:VideoExercise lang:r xp:50 skills:1 key:9f9019501e 443 | ## Basic Data Types 444 | 445 | *** =video_link 446 | //player.vimeo.com/video/138173888 447 | 448 | *** =video_hls 449 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch1_2.master.m3u8 450 | 451 | 452 | 453 | --- type:NormalExercise lang:r xp:100 skills:1 key:1866cdd202 454 | ## Discover Basic Data Types 455 | 456 | To get started, here are some of R's most basic data types: 457 | 458 | - Decimal values like `4.5` are called **numerics**. 459 | - Natural numbers like `4L` are called **integers**. Integers are also numerics. 460 | - Boolean values (`TRUE` or `FALSE`) are called **logical**. Capital letters are important here; `true` and `false` are not valid. 461 | - Text (or string) values are called **characters**. 462 | 463 | Note how the quotation marks on the right indicate that `"some text"` is of type character. 464 | 465 | *** =instructions 466 | Change the value of the: 467 | 468 | - `my_numeric` variable to `42`. 469 | - `my_character` variable to `"forty-two"`. Note that the quotation marks indicate that `"forty-two"` is a character. 470 | - `my_logical` variable to `FALSE`. 471 | 472 | *** =hint 473 | Replace the values in the script with the values that are provided in the exercise. 474 | ``` 475 | my_numeric <- 42 476 | ``` 477 | assigns the value 42 to the variable `my_numeric`. 478 | 479 | *** =pre_exercise_code 480 | ```{r} 481 | # no pec 482 | ``` 483 | 484 | *** =sample_code 485 | ```{r} 486 | # What is the answer to the universe? 487 | my_numeric <- 42.5 488 | 489 | # The quotation marks indicate that the variable is of type character 490 | my_character <- "some text" 491 | 492 | # Change the value of my_logical 493 | my_logical <- TRUE 494 | ``` 495 | 496 | *** =solution 497 | ```{r} 498 | # What is the answer to the universe? 
499 | my_numeric <- 42 500 | 501 | # The quotation marks indicate that the variable is of type character 502 | my_character <- "forty-two" 503 | 504 | # Change the value of my_logical 505 | my_logical <- FALSE 506 | ``` 507 | 508 | *** =sct 509 | ```{r} 510 | test_object("my_numeric", 511 | incorrect_msg = "Make sure that you assign the correct value to `my_numeric`.") 512 | test_object("my_character", 513 | incorrect_msg = paste("Make sure that you assign the correct value to `my_character`.", 514 | "Do not forget the quotes and beware of capitalization! R is case sensitive!")) 515 | test_object("my_logical", 516 | undefined_msg = "Please make sure to define a variable `my_logical`.", 517 | incorrect_msg = "Make sure that you assign the correct value to `my_logical`.") 518 | success_msg("Great work! Continue to the next exercise.") 519 | ``` 520 | 521 | --- type:NormalExercise lang:r xp:100 skills:1 key:c52153af0b 522 | ## Back to Apples and Oranges 523 | 524 | Common knowledge tells you not to add apples and oranges. But hey, that is exactly what you just did! The `my_apples` and `my_oranges` variables both contained a number in the previous exercise. The `+` operator works with numeric variables in R. 525 | 526 | However, if you try to add a numeric and a character string, R throws an error. 527 | 528 | *** =instructions 529 | - Click _Submit Answer_ and read the error message. Make sure you understand why this did not work. 530 | - Adjust `my_oranges <- "six"` so that R knows you have 6 oranges and thus a fruit basket with 11 pieces of fruit. Click _Submit Answer_ again. 531 | 532 | *** =hint 533 | You have to assign the numeric value `6` to the `my_oranges` variable instead of the character value `"six"`. Notice how the quotation marks are used to indicate that `"six"` is a character.
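Before adding two variables, you can check that both hold numbers with `is.numeric()`, one of the `is.*()` functions that come up again in the coercion exercise. A minimal sketch using the values from this exercise:

```r
my_apples <- 5
my_oranges <- "six"

# The quotation marks make "six" a character value, not a number
is.numeric(my_apples)   # TRUE
is.numeric(my_oranges)  # FALSE

# Storing the numeric value 6 instead makes the addition work
my_oranges <- 6
is.numeric(my_oranges)  # TRUE
my_fruit <- my_apples + my_oranges
my_fruit                # 11
```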
534 | 535 | *** =pre_exercise_code 536 | ```{r} 537 | # no pec 538 | ``` 539 | 540 | *** =sample_code 541 | ```{r} 542 | # Assign a value to the variable my_apples and print it out 543 | my_apples <- 5 544 | my_apples 545 | 546 | # Assign a value to the variable my_oranges and print it out 547 | my_oranges <- "six" 548 | my_oranges 549 | 550 | # New variable that contains the total amount of fruit 551 | my_fruit <- my_apples + my_oranges 552 | my_fruit 553 | ``` 554 | 555 | *** =solution 556 | ```{r} 557 | # Assign a value to the variable my_apples and print it out 558 | my_apples <- 5 559 | my_apples 560 | 561 | # Assign a value to the variable my_oranges and print it out 562 | my_oranges <- 6 563 | my_oranges 564 | 565 | # New variable that contains the total amount of fruit 566 | my_fruit <- my_apples + my_oranges 567 | my_fruit 568 | ``` 569 | 570 | *** =sct 571 | ```{r} 572 | test_object("my_apples", incorrect_msg = "Don't change the code that assigns 5 to `my_apples`.") 573 | test_object("my_oranges", incorrect_msg = "Change the assignment of the `my_oranges` variable such that the code runs without errors.") 574 | test_object("my_fruit", 575 | undefined_msg = "Please make sure to define a variable `my_fruit`.", 576 | incorrect_msg = "Make sure that you assign the correct value to `my_fruit`.") 577 | test_output_contains("my_fruit", incorrect_msg = "The output does not contain the result of adding `my_apples` and `my_oranges`.") 578 | success_msg("Awesome, keep up the good work!") 579 | ``` 580 | 581 | --- type:MultipleChoiceExercise lang:r xp:50 skills:1 key:7806ca24d2 582 | ## What's that data type? 583 | 584 | When you added the variables containing `5` and `"six"`, you got an error due to a mismatch in data types. 
You can avoid such embarrassing situations by checking the data type of a variable beforehand: 585 | 586 | ``` 587 | class(my_var) 588 | ``` 589 | 590 | In the workspace (you can see what it contains by typing [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls) in the console), some variables have already been defined. Which statement about these variables is correct? 591 | 592 | *** =instructions 593 | - `a`'s class is `integer`, `b` is a `character`, `c` is a `boolean`. 594 | - `a`'s class is `character`, `b` is a `character` as well, `c` is a `logical`. 595 | - `a`'s class is `numeric`, `b` is a `string`, `c` is a `logical`. 596 | - `a`'s class is `numeric`, `b` is a `character`, `c` is a `logical`. 597 | 598 | *** =hint 599 | You can find out the data type of the `a` variable, for example, by typing `class(a)`. You can do the same for `b` and `c`. 600 | 601 | *** =pre_exercise_code 602 | ```{r} 603 | a <- 42 604 | b <- "forty-two" 605 | c <- FALSE 606 | ``` 607 | 608 | *** =sct 609 | ```{r} 610 | msg1 <- "`boolean` is not the class for logical values. Try again." 611 | msg2 <- "`a` is of class `numeric`. Give it another go." 612 | msg3 <- "`string` is not a class in R. `character` is!" 613 | msg4 <- "Nice one. Let's step it up a notch and start coercing variables!" 614 | test_mc(correct = 4, feedback_msgs = c(msg1, msg2, msg3, msg4)) 615 | ``` 616 | 617 | --- type:NormalExercise lang:r xp:100 skills:1 key:c75fe45544 618 | ## Coercion: Taming your data 619 | 620 | As Filip explained in the video, you can use coercion to transform your data from one type to another. Besides the [`class()`](http://www.rdocumentation.org/packages/base/functions/class) function and the `is.*()` functions, which inspect the type of your data, you can use the `as.*()` functions to convert data to a different type.
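A small sketch of the three families side by side, using only base R: `class()` reports the type, `is.logical()` and `is.numeric()` each test for one specific type, and `as.numeric()` converts. `flag` and `bad` are illustrative names. The last lines show a common pitfall: text that R cannot read as a number becomes `NA`, and R normally prints a warning, which `suppressWarnings()` hides here.

```r
# A logical value
flag <- TRUE

class(flag)        # "logical"
is.logical(flag)   # TRUE
is.numeric(flag)   # FALSE

# Logical values coerce to numbers: TRUE becomes 1, FALSE becomes 0
as.numeric(flag)   # 1
as.numeric(FALSE)  # 0

# Text that R cannot read as a number becomes NA; R normally warns about this
bad <- suppressWarnings(as.numeric("three"))
is.na(bad)         # TRUE
```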
621 | 622 | Take this example: 623 | 624 | ``` 625 | var <- "3" 626 | var_num <- as.numeric(var) 627 | ``` 628 | 629 | `var`, a character string, is converted into a numeric using [`as.numeric()`](http://www.rdocumentation.org/packages/base/functions/numeric). The resulting numeric is stored as `var_num`. 630 | 631 | *** =instructions 632 | - Convert `var`, a logical, to a character. Assign the resulting character string to the variable `var_char`. 633 | - Inspect the class of `var_char` by using [`class()`](http://www.rdocumentation.org/packages/base/functions/class) on it. 634 | 635 | *** =hint 636 | Use the [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) function to convert `var` to a character. 637 | 638 | *** =pre_exercise_code 639 | ```{r} 640 | ``` 641 | 642 | *** =sample_code 643 | ```{r} 644 | # Definition of var 645 | var <- TRUE 646 | 647 | # Convert var to a character: var_char 648 | 649 | 650 | # Display the class of var_char 651 | 652 | 653 | ``` 654 | 655 | *** =solution 656 | ```{r} 657 | # Definition of var 658 | var <- TRUE 659 | 660 | # Convert var to a character: var_char 661 | var_char <- as.character(var) 662 | 663 | # Display the class of var_char 664 | class(var_char) 665 | ``` 666 | 667 | *** =sct 668 | ```{r} 669 | test_error() 670 | msg <- "Do not remove or change the definition of the variable `var`."
671 | test_object("var", undefined_msg = msg, incorrect_msg = msg) 672 | test_function("as.character", "x", 673 | not_called_msg = "Make sure to call the function [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) to convert `var` to a character.", 674 | incorrect_msg = "Have you passed the correct variable to the function [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character)?") 675 | test_object("var_char") 676 | test_function("class", "x", 677 | not_called_msg = "Make sure to call the function `class()` to inspect the class of `var_char`.", 678 | incorrect_msg = "Have you passed the correct variable to the function `class()`?") 679 | success_msg("Bellissimo!") 680 | ``` 681 | -------------------------------------------------------------------------------- /chapter2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta : Chapter 2 3 | title : Vectors 4 | description : We take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R! After completing this chapter, you will be able to create vectors in R, name them, select elements from them and compare different vectors. 5 | attachments : 6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch2_slides.pdf 7 | 8 | --- type:VideoExercise lang:r xp:50 skills:1 key:b91dd847a0 9 | ## Create and Name Vectors 10 | 11 | *** =video_link 12 | //player.vimeo.com/video/138173896 13 | 14 | *** =video_hls 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_1.master.m3u8 16 | 17 | 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:2d1cb04427 19 | ## Create a vector (1) 20 | 21 | Feeling lucky? You'd better be, because we'll take you on a trip to Las Vegas! 22 | 23 | Thanks to R and your new data science skills, you will learn how to boost your performance at the tables and kick off your career as a professional gambler.
This chapter shows how you can easily keep track of your betting progress and run some simple analyses on your past results. 24 | 25 | You will use vectors. As Filip explained, vectors are one-dimensional arrays that can hold numeric data, character data or logical data. You create a vector with the combine function [`c()`](http://www.rdocumentation.org/packages/base/functions/c). You place the vector elements, separated by commas, between the parentheses. For example: 26 | 27 | ``` 28 | numeric_vector <- c(1, 2, 3) 29 | character_vector <- c("a", "b", "c") 30 | logical_vector <- c(TRUE, FALSE) 31 | ``` 32 | 33 | *** =instructions 34 | Create a vector, `logical_vector`, that contains the three elements: `TRUE`, `FALSE` and `TRUE` (in that order). 35 | 36 | *** =hint 37 | Assign `c(TRUE, FALSE, TRUE)` to the variable `logical_vector` with the `<-` operator. 38 | 39 | *** =pre_exercise_code 40 | ```{r} 41 | # no pec 42 | ``` 43 | 44 | *** =sample_code 45 | ```{r} 46 | numeric_vector <- c(1, 10, 49) 47 | character_vector <- c("x", "y", "z") 48 | 49 | # Create logical_vector 50 | 51 | ``` 52 | 53 | *** =solution 54 | ```{r} 55 | numeric_vector <- c(1, 10, 49) 56 | character_vector <- c("x", "y", "z") 57 | 58 | # Create logical_vector 59 | logical_vector <- c(TRUE, FALSE, TRUE) 60 | ``` 61 | 62 | *** =sct 63 | ```{r} 64 | msg <- "Do not change how `numeric_vector` and `character_vector` are created!" 65 | lapply(c("numeric_vector", "character_vector"), test_object, undefined_msg = msg, incorrect_msg = msg) 66 | test_object("logical_vector", incorrect_msg = "Make sure that you assign the correct values to `logical_vector`. The order matters!") 67 | success_msg("Perfect!
Let's practice some more with vector creation.") 68 | ``` 69 | 70 | --- type:NormalExercise lang:r xp:100 skills:1 key:c6e056b9c3 71 | ## Create a vector (2) 72 | 73 | After one week in Las Vegas and still zero Ferraris in your garage, you decide that it is time to start using your data science superpowers. 74 | 75 | Before doing your first analysis, you decide to collect all the winnings and losses for the last week: 76 | 77 | For `poker_vector`: 78 | - On Monday you won \$140 79 | - Tuesday you **lost** \$50 80 | - Wednesday you won \$20 81 | - Thursday you **lost** \$120 82 | - Friday you won \$240 83 | 84 | For `roulette_vector`: 85 | - On Monday you **lost** \$24 86 | - Tuesday you **lost** \$50 87 | - Wednesday you won \$100 88 | - Thursday you **lost** \$350 89 | - Friday you won \$10 90 | 91 | To be able to use this data in R, you decide to create the variables `poker_vector` and `roulette_vector`. 92 | 93 | *** =instructions 94 | Assign the winnings/losses for roulette as a vector to the variable `roulette_vector`. Make sure to use the correct order. 95 | 96 | *** =hint 97 | To help you with this step, the script already contains the code for creating `poker_vector`. Assign the correct values to `roulette_vector` based on the numbers in the assignment. Do not forget that losses are negative numbers. 
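One way to double-check your typing is to confirm that a vector holds one value per weekday and is still numeric after you add the minus signs. This sketch uses `length()`, a base R function that this chapter has not introduced yet, on the `poker_vector` already given in the script:

```r
# Poker winnings from Monday to Friday (losses are negative)
poker_vector <- c(140, -50, 20, -120, 240)

# One value for each day from Monday to Friday
length(poker_vector)      # 5

# Negative numbers are still numbers, so the vector stays numeric
is.numeric(poker_vector)  # TRUE
```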
98 | 99 | 100 | *** =pre_exercise_code 101 | ```{r} 102 | ``` 103 | 104 | *** =sample_code 105 | ```{r} 106 | # Poker winnings from Monday to Friday 107 | poker_vector <- c(140, -50, 20, -120, 240) 108 | 109 | # Roulette winnings from Monday to Friday: roulette_vector 110 | 111 | ``` 112 | 113 | *** =solution 114 | 115 | ```{r} 116 | # Poker winnings from Monday to Friday 117 | poker_vector <- c(140, -50, 20, -120, 240) 118 | 119 | # Roulette winnings from Monday to Friday: roulette_vector 120 | roulette_vector <- c(-24, -50, 100, -350, 10) 121 | ``` 122 | 123 | *** =sct 124 | ```{r} 125 | test_object("poker_vector", 126 | incorrect_msg = "Don't change how `poker_vector` is defined.") 127 | test_object("roulette_vector", 128 | incorrect_msg = paste("Make sure that you assign a vector with the correct values to `roulette_vector`.", 129 | "If you lost money, you should use a `-` sign.")) 130 | success_msg("Very good! To check out the contents of your vectors, remember that you can always simply type the variable in the console and hit Enter. Proceed to the next exercise!") 131 | ``` 132 | 133 | 134 | --- type:NormalExercise lang:r xp:100 skills:1 key:ebb5aae2ff 135 | ## Naming a vector (1) 136 | 137 | As a data analyst, it is important to have a clear view of the data that you are using. Understanding what each element refers to is essential. 138 | 139 | In the previous exercise, we created a vector with your winnings over the week. Each vector element refers to a day of the week, but it is hard to tell which element belongs to which day. It would be nice if you could show that in the vector itself. Remember the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function? It lets you name the elements of a vector: 140 | 141 | ``` 142 | some_vector <- c("Johnny", "Poker Player") 143 | names(some_vector) <- c("Name", "Profession") 144 | ``` 145 | 146 | *** =instructions 147 | The elements of `poker_vector` have already been named with the days of the week.
Do the same thing for `roulette_vector`. Beware: R is case-sensitive! 148 | 149 | *** =hint 150 | Assign `c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")` to `names(roulette_vector)`. 151 | 152 | *** =pre_exercise_code 153 | ```{r} 154 | ``` 155 | 156 | *** =sample_code 157 | ```{r} 158 | # Poker winnings from Monday to Friday 159 | poker_vector <- c(140, -50, 20, -120, 240) 160 | 161 | # Roulette winnings from Monday to Friday 162 | roulette_vector <- c(-24, -50, 100, -350, 10) 163 | 164 | # Add names to poker_vector 165 | names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 166 | 167 | # Add names to roulette_vector 168 | 169 | ``` 170 | 171 | *** =solution 172 | ```{r} 173 | # Poker winnings from Monday to Friday 174 | poker_vector <- c(140, -50, 20, -120, 240) 175 | 176 | # Roulette winnings from Monday to Friday 177 | roulette_vector <- c(-24, -50, 100, -350, 10) 178 | 179 | # Add names to poker_vector 180 | names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 181 | 182 | # Add names to roulette_vector 183 | names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 184 | ``` 185 | 186 | *** =sct 187 | ```{r} 188 | msg <- "Do not change the values inside `%s`; they were already coded for you." 189 | test_object("poker_vector", incorrect_msg = sprintf(msg, "poker_vector")) 190 | test_object("roulette_vector", incorrect_msg = sprintf(msg, "roulette_vector")) 191 | msg <- "Make sure that you assign the correct names vector to `%s`. The names of the days should start with a capital letter!"
192 | test_object("poker_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "poker_vector")) 193 | test_object("roulette_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "roulette_vector")) 194 | success_msg("Well done!") 195 | ``` 196 | 197 | 198 | --- type:NormalExercise lang:r xp:100 skills:1 key:5c026ed9fb 199 | ## Naming a vector (2) 200 | 201 | If you want to become a good statistician, you have to become lazy. (If you are already lazy, chances are high you are one of those exceptional, natural-born statistical talents!) 202 | 203 | In the previous exercises you probably experienced that it is boring and frustrating to type and retype information such as the days of the week. However, there is a more efficient way to do this, namely, to assign the days of the week vector to a variable! 204 | 205 | Just like you did with your poker and roulette returns, you can also create a variable that contains the days of the week. This way you can use and re-use it. This variable, `days_vector`, has already been coded for you. 206 | 207 | *** =instructions 208 | - Use the variable `days_vector` to set the names of `poker_vector`. 209 | - Use the variable `days_vector` to set the names of `roulette_vector`. 210 | 211 | *** =hint 212 | You can use `names(poker_vector) <- ` to set the names of the variable `poker_vector`. 
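A small sketch of the same idea, using made-up weekend results: `darts_vector`, `bingo_vector` and `weekend_vector` are hypothetical and not part of the exercise. The names are typed once, stored in a variable, and reused for both vectors.

```r
# Hypothetical weekend results (not part of the exercise)
darts_vector <- c(15, -5)
bingo_vector <- c(-10, 25)

# Store the names once ...
weekend_vector <- c("Saturday", "Sunday")

# ... and reuse them for both vectors
names(darts_vector) <- weekend_vector
names(bingo_vector) <- weekend_vector

darts_vector
bingo_vector
```

If you ever need to fix a typo in a day name, you only have to change it in one place.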
213 | 214 | *** =pre_exercise_code 215 | ```{r} 216 | # no pec 217 | ``` 218 | 219 | *** =sample_code 220 | ```{r} 221 | # Poker winnings from Monday to Friday 222 | poker_vector <- c(140, -50, 20, -120, 240) 223 | 224 | # Roulette winnings from Monday to Friday 225 | roulette_vector <- c(-24, -50, 100, -350, 10) 226 | 227 | # Create the variable days_vector 228 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 229 | 230 | # Use days_vector to name poker_vector 231 | 232 | 233 | # Use days_vector to name roulette_vector 234 | ``` 235 | 236 | *** =solution 237 | ```{r} 238 | # Poker winnings from Monday to Friday 239 | poker_vector <- c(140, -50, 20, -120, 240) 240 | 241 | # Roulette winnings from Monday to Friday 242 | roulette_vector <- c(-24, -50, 100, -350, 10) 243 | 244 | # Create the variable days_vector 245 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 246 | 247 | # Use days_vector to name poker_vector 248 | names(poker_vector) <- days_vector 249 | 250 | # Use days_vector to name roulette_vector 251 | names(roulette_vector) <- days_vector 252 | ``` 253 | 254 | *** =sct 255 | ```{r} 256 | msg <- "Do not change the values inside `%s`; they were already coded for you." 257 | test_object("poker_vector", incorrect_msg = sprintf(msg, "poker_vector")) 258 | test_object("roulette_vector", incorrect_msg = sprintf(msg, "roulette_vector")) 259 | test_object("days_vector", incorrect_msg = sprintf(msg, "days_vector")) 260 | 261 | msg <- "Make sure that you assign `days_vector` to the names of `%s`. Use the `names()` function." 262 | test_object("poker_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "poker_vector")) 263 | test_object("roulette_vector", eq_condition = "equal", incorrect_msg = sprintf(msg, "roulette_vector")) 264 | 265 | success_msg("Nice one! 
A word of advice: try to avoid code duplication at all times.") 266 | ``` 267 | 268 | --- type:VideoExercise lang:r xp:50 skills:1 key:b47466f033 269 | ## Vector Arithmetic 270 | 271 | *** =video_link 272 | //player.vimeo.com/video/141163398 273 | 274 | *** =video_hls 275 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_2.master.m3u8 276 | 277 | 278 | --- type:NormalExercise lang:r xp:100 skills:1 key:6b17fc50b9 279 | ## Calculate your earnings 280 | 281 | Now that you understand how R does arithmetic calculations with vectors, it is time to get those Ferraris in your garage! First, you need to understand what the overall profit or loss per day of the week was. The total daily profit is the sum of the profit/loss you realized on poker that day and the profit/loss you realized on roulette that day. 282 | 283 | Remember that vector calculations happen element-wise; the following three statements all return the same vector: 284 | 285 | ``` 286 | c(1, 2, 3) + c(4, 5, 6) 287 | c(1 + 4, 2 + 5, 3 + 6) 288 | c(5, 7, 9) 289 | ``` 290 | 291 | *** =instructions 292 | - Assign to the variable `total_daily` how much you won or lost on each day in total (poker and roulette combined). `total_daily` should be a vector with 5 values. 293 | - Print out `total_daily`. 294 | 295 | *** =hint 296 | Use `+` to add `poker_vector` and `roulette_vector`, and assign the resulting vector to a new variable, `total_daily`.
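To see element-wise addition on named vectors first, here is a sketch with made-up numbers. `darts_vector`, `bingo_vector` and `combined_vector` are hypothetical and not part of the exercise.

```r
# Two hypothetical vectors of daily results (not part of the exercise)
darts_vector <- c(10, -20, 30)
bingo_vector <- c(-5, 5, 15)
names(darts_vector) <- c("Monday", "Tuesday", "Wednesday")
names(bingo_vector) <- c("Monday", "Tuesday", "Wednesday")

# `+` works element by element: first with first, second with second, ...
combined_vector <- darts_vector + bingo_vector
combined_vector
```

The result holds 5, -15 and 45, and it carries over the day names from the first vector.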
297 | 298 | *** =pre_exercise_code 299 | ```{r} 300 | # no pec 301 | ``` 302 | 303 | *** =sample_code 304 | ```{r} 305 | # Casino winnings from Monday to Friday 306 | poker_vector <- c(140, -50, 20, -120, 240) 307 | roulette_vector <- c(-24, -50, 100, -350, 10) 308 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 309 | names(poker_vector) <- days_vector 310 | names(roulette_vector) <- days_vector 311 | 312 | # Calculate your daily earnings: total_daily 313 | 314 | 315 | # Print out total_daily 316 | ``` 317 | 318 | *** =solution 319 | ```{r} 320 | # Casino winnings from Monday to Friday 321 | poker_vector <- c(140, -50, 20, -120, 240) 322 | roulette_vector <- c(-24, -50, 100, -350, 10) 323 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 324 | names(poker_vector) <- days_vector 325 | names(roulette_vector) <- days_vector 326 | 327 | # Calculate your daily earnings: total_daily 328 | total_daily <- poker_vector + roulette_vector 329 | 330 | # Print out total_daily 331 | total_daily 332 | ``` 333 | 334 | *** =sct 335 | ```{r} 336 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 337 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 338 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 339 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 340 | test_object("total_daily", 341 | incorrect_msg = "Make sure that you assign the sum of `poker_vector` and `roulette_vector` to `total_daily`. Simply use `+`.") 342 | test_output_contains("total_daily", incorrect_msg = "Don't forget to print out `total_daily`.") 343 | success_msg("Great! 
Continue to the next exercise.") 344 | ``` 345 | 346 | 347 | --- type:NormalExercise lang:r xp:100 skills:1 key:a9a1a50a31 348 | ## Calculate total winnings: sum() 349 | 350 | Based on the previous analysis, it looks like you had a mix of good and bad days. This is not what your ego expected, and you wonder whether there is a (very, very, very) tiny chance that you lost money over the week in total. 351 | 352 | You can answer this question using the [`sum()`](http://www.rdocumentation.org/packages/base/functions/sum) function. As mentioned in the video, it calculates the sum of all elements of a vector. 353 | 354 | *** =instructions 355 | - Calculate the total amount of money that you have won/lost with poker and assign it to the variable `total_poker`. 356 | - Do the same thing for roulette and assign the result to `total_roulette`. 357 | - Use `+` to add `total_poker` and `total_roulette`; the result is the sum of all gains and losses of the week. Simply print the result to the console. 358 | 359 | *** =hint 360 | Use the [`sum()`](http://www.rdocumentation.org/packages/base/functions/sum) function to get the total of `poker_vector`. Do the same thing for `roulette_vector`.
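A sketch of `sum()` on made-up numbers, assuming two hypothetical vectors, `darts_vector` and `bingo_vector`, that are not part of the exercise:

```r
# Hypothetical daily results (not part of the exercise)
darts_vector <- c(10, -20, 30)
bingo_vector <- c(-5, 5, 15)

# sum() adds up all elements of a single vector
total_darts <- sum(darts_vector)   # 10 - 20 + 30
total_bingo <- sum(bingo_vector)   # -5 + 5 + 15

# Adding the two totals gives the overall result
total_darts + total_bingo
```

You get the same overall number with `sum(darts_vector + bingo_vector)`, because adding the daily results first and summing afterwards gives the same total.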
361 | 362 | *** =pre_exercise_code 363 | ```{r} 364 | # no pec 365 | ``` 366 | 367 | *** =sample_code 368 | ```{r} 369 | # Casino winnings from Monday to Friday 370 | poker_vector <- c(140, -50, 20, -120, 240) 371 | roulette_vector <- c(-24, -50, 100, -350, 10) 372 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 373 | names(poker_vector) <- days_vector 374 | names(roulette_vector) <- days_vector 375 | 376 | # Total winnings with poker: total_poker 377 | 378 | 379 | # Total winnings with roulette: total_roulette 380 | 381 | 382 | # Total winnings overall: print out the result 383 | 384 | ``` 385 | 386 | *** =solution 387 | ```{r} 388 | # Casino winnings from Monday to Friday 389 | poker_vector <- c(140, -50, 20, -120, 240) 390 | roulette_vector <- c(-24, -50, 100, -350, 10) 391 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 392 | names(poker_vector) <- days_vector 393 | names(roulette_vector) <- days_vector 394 | 395 | # Total winnings with poker: total_poker 396 | total_poker <- sum(poker_vector) 397 | 398 | # Total winnings with roulette: total_roulette 399 | total_roulette <- sum(roulette_vector) 400 | 401 | # Total winnings overall: print out the result 402 | total_roulette + total_poker 403 | ``` 404 | 405 | *** =sct 406 | ```{r} 407 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 
408 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 409 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 410 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 411 | test_object("total_poker", 412 | undefined_msg = "Please make sure to define a variable `total_poker`.", 413 | incorrect_msg = "Make sure that you assign to `total_poker` the sum of the `poker_vector`.") 414 | test_object("total_roulette", 415 | undefined_msg = "Please make sure to define a variable `total_roulette`.", 416 | incorrect_msg = "Make sure that you assign to `total_roulette` the sum of the `roulette_vector`.") 417 | test_output_contains("total_poker + total_roulette", incorrect_msg = "Print the sum of `total_poker` and `total_roulette` to the console.") 418 | success_msg("Oops, it seems like you are losing money. Time to rethink and adapt your strategy! This will require some deeper analysis...") 419 | ``` 420 | 421 | 422 | --- type:VideoExercise lang:r xp:50 skills:1 key:513029f4ac 423 | ## Vector Subsetting 424 | 425 | *** =video_link 426 | //player.vimeo.com/video/138173916 427 | 428 | *** =video_hls 429 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch2_3.master.m3u8 430 | 431 | 432 | --- type:NormalExercise lang:r xp:100 skills:1 key:6112e74425 433 | ## Selection by index (1) 434 | 435 | After figuring out that roulette is not your forte, you decide to compare your performance at the beginning of the week to your performance at the end of the week. You did have a couple of Margarita cocktails at the end of the week... 436 | 437 | To make this comparison, you only want to focus on a selection of `poker_vector` and `roulette_vector`. In other words, your goal is to select specific elements of these vectors. 438 | 439 | *** =instructions 440 | - Assign the poker results of Wednesday to the variable `poker_wednesday`.
441 | - Assign the roulette results of Friday to the variable `roulette_friday`. 442 | 443 | *** =hint 444 | Wednesday is the third element of `poker_vector`, and can thus be selected with `poker_vector[3]`. 445 | 446 | *** =pre_exercise_code 447 | ```{r} 448 | # no pec 449 | ``` 450 | 451 | *** =sample_code 452 | ```{r} 453 | # Casino winnings from Monday to Friday 454 | poker_vector <- c(140, -50, 20, -120, 240) 455 | roulette_vector <- c(-24, -50, 100, -350, 10) 456 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 457 | names(poker_vector) <- days_vector 458 | names(roulette_vector) <- days_vector 459 | 460 | # Poker results of Wednesday: poker_wednesday 461 | 462 | 463 | # Roulette results of Friday: roulette_friday 464 | 465 | ``` 466 | 467 | *** =solution 468 | ```{r} 469 | # Casino winnings from Monday to Friday 470 | poker_vector <- c(140, -50, 20, -120, 240) 471 | roulette_vector <- c(-24, -50, 100, -350, 10) 472 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 473 | names(poker_vector) <- days_vector 474 | names(roulette_vector) <- days_vector 475 | 476 | # Poker results of Wednesday: poker_wednesday 477 | poker_wednesday <- poker_vector[3] 478 | 479 | # Roulette results of Friday: roulette_friday 480 | roulette_friday <- roulette_vector[5] 481 | ``` 482 | 483 | *** =sct 484 | ```{r} 485 | 486 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 
487 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 488 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 489 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 490 | test_object("poker_wednesday", 491 | incorrect_msg = "It looks like `poker_wednesday` does not contain the correct value of `poker_vector`.") 492 | test_object("roulette_friday", 493 | incorrect_msg = "It looks like `roulette_friday` does not contain the correct value of `roulette_vector`.") 494 | success_msg("Great! R also makes it possible to select multiple elements from a vector at once, remember? Put the theory to practice in the next exercise!") 495 | ``` 496 | 497 | 498 | --- type:NormalExercise lang:r xp:100 skills:1 key:ae2832fbd1 499 | ## Selection by index (2) 500 | 501 | How about analyzing your midweek results? 502 | 503 | Instead of using a single number to select a single element, you can also select multiple elements by passing a vector inside the square brackets. For example, 504 | 505 | ``` 506 | poker_vector[c(1,5)] 507 | ``` 508 | 509 | selects the first and the fifth element of `poker_vector`. 510 | 511 | 512 | *** =instructions 513 | - Assign the poker results of Tuesday, Wednesday and Thursday to the variable `poker_midweek`. 514 | - Assign the roulette results of Thursday and Friday to the variable `roulette_endweek`. 515 | 516 | *** =hint 517 | Use the vector `c(2,3,4)` between square brackets to select the correct elements of `poker_vector`. 
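A sketch of selecting several elements at once, using a made-up `darts_vector` that is not part of the exercise. It also shows that `2:4` is shorthand for `c(2, 3, 4)`.

```r
# Hypothetical vector with five daily results (not part of the exercise)
darts_vector <- c(10, -20, 30, -40, 50)
names(darts_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Pass a vector of positions between the square brackets
darts_vector[c(2, 3, 4)]

# The colon operator builds the same positions: 2:4 is 2, 3, 4
darts_vector[2:4]
```

Both selections return Tuesday, Wednesday and Thursday, with their names kept.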
518 | 519 | *** =pre_exercise_code 520 | ```{r} 521 | # no pec 522 | ``` 523 | 524 | *** =sample_code 525 | ```{r} 526 | # Casino winnings from Monday to Friday 527 | poker_vector <- c(140, -50, 20, -120, 240) 528 | roulette_vector <- c(-24, -50, 100, -350, 10) 529 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 530 | names(poker_vector) <- days_vector 531 | names(roulette_vector) <- days_vector 532 | 533 | # Mid-week poker results: poker_midweek 534 | 535 | 536 | # End-of-week roulette results: roulette_endweek 537 | 538 | 539 | ``` 540 | 541 | *** =solution 542 | ```{r} 543 | # Casino winnings from Monday to Friday 544 | poker_vector <- c(140, -50, 20, -120, 240) 545 | roulette_vector <- c(-24, -50, 100, -350, 10) 546 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 547 | names(poker_vector) <- days_vector 548 | names(roulette_vector) <- days_vector 549 | 550 | # Mid-week poker results: poker_midweek 551 | poker_midweek <- poker_vector[c(2, 3, 4)] 552 | 553 | # End-of-week roulette results: roulette_endweek 554 | roulette_endweek <- roulette_vector[c(4,5)] 555 | ``` 556 | 557 | *** =sct 558 | ```{r} 559 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 560 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 561 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 562 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 563 | 564 | msg <- "It looks like `%s` does not contain the correct elements from `%s`." 565 | test_object("poker_midweek", 566 | incorrect_msg = sprintf(msg, "poker_midweek", "poker_vector")) 567 | test_object("roulette_endweek", 568 | incorrect_msg = sprintf(msg, "roulette_endweek", "roulette_vector")) 569 | 570 | success_msg("Well done! Another way to find the mid-week results is `poker_vector[2:4]`. 
Continue to the next exercise to practice vector selection some more!"); 571 | ``` 572 | 573 | --- type:NormalExercise lang:r xp:100 skills:1 key:5919f3fc05 574 | ## Selection by name 575 | 576 | Another way to tackle the previous exercise is by using the names of the vector elements (Monday, Tuesday, ...) instead of their numeric positions. For example, 577 | 578 | ``` 579 | poker_vector["Monday"] 580 | ``` 581 | 582 | will select the first element of `poker_vector`, since `"Monday"` is the name of that first element. 583 | 584 | *** =instructions 585 | - Select the fourth element, corresponding to Thursday, from `roulette_vector`. Name it `roulette_thursday`. 586 | - Select Tuesday's poker gains using subsetting by name. Assign the result to `poker_tuesday`. 587 | 588 | *** =hint 589 | Put the name of the day, in quotes, between square brackets: `roulette_vector["Thursday"]` selects Thursday's roulette result. Use the same approach with `"Tuesday"` on `poker_vector`. 590 | 591 | *** =pre_exercise_code 592 | ```{r} 593 | # no pec 594 | ``` 595 | 596 | *** =sample_code 597 | ```{r} 598 | # Casino winnings from Monday to Friday 599 | poker_vector <- c(140, -50, 20, -120, 240) 600 | roulette_vector <- c(-24, -50, 100, -350, 10) 601 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 602 | names(poker_vector) <- days_vector 603 | names(roulette_vector) <- days_vector 604 | 605 | # Select Thursday's roulette gains: roulette_thursday 606 | 607 | 608 | # Select Tuesday's poker gains: poker_tuesday 609 | 610 | ``` 611 | 612 | *** =solution 613 | ```{r} 614 | # Casino winnings from Monday to Friday 615 | poker_vector <- c(140, -50, 20, -120, 240) 616 | roulette_vector <- c(-24, -50, 100, -350, 10) 617 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 618 | names(poker_vector) <- days_vector 619 | names(roulette_vector) <- days_vector 620 | 621 | # Select Thursday's roulette gains: roulette_thursday 622 | roulette_thursday <- roulette_vector["Thursday"] 623 | 624 | # Select Tuesday's poker gains: poker_tuesday 625 | poker_tuesday <-
poker_vector["Tuesday"] 626 | ``` 627 | 628 | *** =sct 629 | ```{r} 630 | msg <- "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 631 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 632 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 633 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 634 | 635 | test_object("roulette_thursday") 636 | test_object("poker_tuesday") 637 | success_msg("Good job! Head over to the next exercise."); 638 | ``` 639 | 640 | --- type:NormalExercise lang:r xp:100 skills:1 key:22121c6c46 641 | ## Selection by logicals (1) 642 | 643 | There are basically three ways to subset vectors: by using the indices, by using the names (if the vectors are named) and by using logical vectors. Filip already told you about the internals in the instructional video. As a refresher, have a look at the following statements to select elements from `poker_vector`, which are all equivalent: 644 | 645 | ``` 646 | # selection by index 647 | poker_vector[c(1,3)] 648 | 649 | # selection by name 650 | poker_vector[c("Monday", "Wednesday")] 651 | 652 | # selection by logicals 653 | poker_vector[c(TRUE, FALSE, TRUE, FALSE, FALSE)] 654 | ``` 655 | 656 | *** =instructions 657 | - Assign the roulette results from the first, third and fifth day to `roulette_subset`. 658 | - Select the first three days from `poker_vector` using a vector of logicals. Assign the result to `poker_start`. 659 | 660 | *** =hint 661 | The logical vector to use inside square brackets for the first instruction is `c(TRUE, FALSE, TRUE, FALSE, TRUE)`. 
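A sketch that puts the three selection styles side by side, using a made-up `darts_vector` that is not part of the exercise. Each one picks out Tuesday and Thursday.

```r
# Hypothetical vector (not part of the exercise)
darts_vector <- c(10, -20, 30, -40, 50)
names(darts_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Selection by logicals: TRUE keeps an element, FALSE drops it (one logical per element)
darts_vector[c(FALSE, TRUE, FALSE, TRUE, FALSE)]

# The same selection by index and by name
darts_vector[c(2, 4)]
darts_vector[c("Tuesday", "Thursday")]
```

Give the logical vector the same length as the vector you subset. If it is shorter, R recycles it, which can select elements you did not intend.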
662 | 663 | *** =pre_exercise_code 664 | ```{r} 665 | # no pec 666 | ``` 667 | 668 | *** =sample_code 669 | ```{r} 670 | # Casino winnings from Monday to Friday 671 | poker_vector <- c(140, -50, 20, -120, 240) 672 | roulette_vector <- c(-24, -50, 100, -350, 10) 673 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 674 | names(poker_vector) <- days_vector 675 | names(roulette_vector) <- days_vector 676 | 677 | # Roulette results for day 1, 3 and 5: roulette_subset 678 | 679 | 680 | # Poker results for first three days: poker_start 681 | ``` 682 | 683 | *** =solution 684 | ```{r} 685 | # Casino winnings from Monday to Friday 686 | poker_vector <- c(140, -50, 20, -120, 240) 687 | roulette_vector <- c(-24, -50, 100, -350, 10) 688 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 689 | names(poker_vector) <- days_vector 690 | names(roulette_vector) <- days_vector 691 | 692 | # Roulette results for day 1, 3 and 5: roulette_subset 693 | roulette_subset <- roulette_vector[c(TRUE, FALSE, TRUE, FALSE, TRUE)] 694 | 695 | # Poker results for first three days: poker_start 696 | poker_start <- poker_vector[c(TRUE, TRUE, TRUE, FALSE, FALSE)] 697 | ``` 698 | 699 | *** =sct 700 | ```{r} 701 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 702 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 703 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 704 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 705 | test_object("roulette_subset") 706 | test_object("poker_start") 707 | success_msg("Nice one!
Using logical vectors to perform subsetting might seem somewhat tedious, but its true power will become clear in the next exercise!") 708 | ``` 709 | 710 | 711 | --- type:NormalExercise lang:r xp:100 skills:1 key:aa2e5f6e97 712 | ## Selection by logicals (2) 713 | 714 | By combining comparison operators with subsetting by logicals, you can investigate your casino performance in a more proactive way. 715 | 716 | The (logical) comparison operators known to R are: 717 | - `<` for less than 718 | - `>` for greater than 719 | - `<=` for less than or equal to 720 | - `>=` for greater than or equal to 721 | - `==` for equal to each other 722 | - `!=` for not equal to each other 723 | 724 | Experiment with these operators in the console: 725 | 726 | ``` 727 | lost_roulette_days <- roulette_vector < 0 728 | lost_roulette_days 729 | ``` 730 | 731 | The result will be a logical vector, which you can use to perform subsetting, like this example: 732 | 733 | ``` 734 | roulette_vector[lost_roulette_days] 735 | ``` 736 | 737 | The result is a subset of `roulette_vector` that contains only your losses in roulette. 738 | 739 | *** =instructions 740 | - Check whether your poker winnings are positive on the different days of the week (i.e. > 0), and assign the result to `selection_vector`. 741 | - Assign the amounts that you won on the profitable days to the variable `poker_profits` by using `selection_vector`. 742 | 743 | *** =hint 744 | - In order to check for which days your poker gains are positive, R should check for each element of `poker_vector` whether it is larger than zero. `some_vector > 0` is the way to tell R what you are after. 745 | - After creating `selection_vector`, you can use it to subset `poker_vector` like this: `poker_vector[selection_vector]`.
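A sketch of comparison operators followed by logical subsetting, using a made-up `darts_vector` (with a break-even day of 0) that is not part of the exercise:

```r
# Hypothetical vector (not part of the exercise)
darts_vector <- c(10, -20, 0, -40, 50)
names(darts_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Each comparison is done element by element and returns a logical vector
darts_vector > 0    # winning days
darts_vector >= 0   # days without a loss (the 0 on Wednesday counts here)
darts_vector != 0   # days where you won or lost something

# Use a logical vector to keep only the matching elements
darts_wins <- darts_vector[darts_vector > 0]
darts_wins
```

`darts_wins` keeps only Monday and Friday. Note how the choice between `>` and `>=` decides whether the break-even day is included.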
746 | 747 | *** =pre_exercise_code 748 | ```{r} 749 | # no pec 750 | ``` 751 | 752 | *** =sample_code 753 | ```{r} 754 | # Casino winnings from Monday to Friday 755 | poker_vector <- c(140, -50, 20, -120, 240) 756 | roulette_vector <- c(-24, -50, 100, -350, 10) 757 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 758 | names(poker_vector) <- days_vector 759 | names(roulette_vector) <- days_vector 760 | 761 | # Create logical vector corresponding to profitable poker days: selection_vector 762 | 763 | 764 | # Select amounts for profitable poker days: poker_profits 765 | 766 | ``` 767 | 768 | *** =solution 769 | ```{r} 770 | # Casino winnings from Monday to Friday 771 | poker_vector <- c(140, -50, 20, -120, 240) 772 | roulette_vector <- c(-24, -50, 100, -350, 10) 773 | days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") 774 | names(poker_vector) <- days_vector 775 | names(roulette_vector) <- days_vector 776 | 777 | # Create logical vector corresponding to profitable poker days: selection_vector 778 | selection_vector <- poker_vector > 0 779 | 780 | # Select amounts for profitable poker days: poker_profits 781 | poker_profits <- poker_vector[selection_vector] 782 | ``` 783 | 784 | *** =sct 785 | ```{r} 786 | msg = "Do not change anything about the definition and naming of `poker_vector` and `roulette_vector`." 787 | test_object("days_vector", undefined_msg = msg, incorrect_msg = msg) 788 | test_object("poker_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 789 | test_object("roulette_vector", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 790 | test_object("selection_vector", 791 | undefined_msg = "Please make sure to define a variable `selection_vector`.", 792 | incorrect_msg = "It looks like `selection_vector` does not contain the correct result. 
Remember that R uses element-wise operations for vectors.") 793 | test_object("poker_profits", 794 | undefined_msg = "Please make sure to define a variable `poker_profits`.", 795 | incorrect_msg = "It looks like `poker_profits` does not contain the correct result. Remember that R uses element-wise operations for vectors.") 796 | success_msg("Great! Move on to the Matrices chapter!") 797 | ``` 798 | 799 | -------------------------------------------------------------------------------- /chapter3.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta : Chapter 3 3 | title : Matrices 4 | description : In this chapter you will learn how to work with matrices in R. By the end of the chapter, you will be able to create matrices and to understand how you can do basic computations with them. You will analyze the box office numbers of Star Wars to illustrate the use of matrices in R. May the force be with you! 5 | attachments : 6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch3_slides.pdf 7 | 8 | --- type:VideoExercise lang:r xp:50 skills:1 key:82d8734b17 9 | ## Create and Name Matrices 10 | 11 | *** =video_link 12 | //player.vimeo.com/video/138173926 13 | 14 | *** =video_hls 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch3_1.master.m3u8 16 | 17 | 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:834a0e546c 19 | ## Analyzing matrices, you shall (1) 20 | 21 | It is now time to get your hands dirty. In the following exercises you will analyze the box office numbers of the Star Wars franchise. May the force be with you! 22 | 23 | As a reminder, look at this line of code that constructs a matrix with numbers 1 through 9, filled row by row: 24 | 25 | ``` 26 | matrix(1:9, byrow = TRUE, nrow = 3) 27 | ``` 28 | 29 | In the script, a vector `box` is defined that represents the box office numbers from the first three Star Wars movies.
The first, third and fifth elements correspond to the US box office revenue of the movies; the second, fourth and sixth elements represent the non-US box office revenue. 30 | 31 | *** =instructions 32 | Construct a matrix `star_wars_matrix`: 33 | 34 | - Each row represents a movie. 35 | - The first column is for the US box office revenue, and the second column for the non-US box office revenue. 36 | - Use the function `matrix()` with `box` as the first input, and the additional arguments `nrow` and `byrow`. 37 | 38 | *** =hint 39 | Set `nrow` to `3` and `byrow` to `TRUE` inside `matrix()`. 40 | 41 | *** =pre_exercise_code 42 | ```{r} 43 | # no pec 44 | ``` 45 | 46 | *** =sample_code 47 | ```{r} 48 | # Star Wars box office in millions (!) 49 | box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8) 50 | 51 | # Create star_wars_matrix 52 | 53 | ``` 54 | 55 | *** =solution 56 | ```{r} 57 | # Star Wars box office in millions (!) 58 | box <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8) 59 | 60 | # Create star_wars_matrix 61 | star_wars_matrix <- matrix(box, nrow = 3, byrow = TRUE) 62 | ``` 63 | 64 | *** =sct 65 | ```{r} 66 | test_error() 67 | msg <- "Do not change or remove the definition of `box`!" 68 | test_object("box", undefined_msg = msg, incorrect_msg = msg) 69 | 70 | test_correct({ 71 | test_object("star_wars_matrix", 72 | undefined_msg = "Please make sure to define a variable `star_wars_matrix`.", 73 | incorrect_msg = "Did you assign the correct matrix containing the vector that holds all three movies to `star_wars_matrix`?") 74 | }, { 75 | test_function("matrix", "data") 76 | test_function("matrix", "nrow") 77 | test_function("matrix", "byrow") 78 | }) 79 | success_msg("Great job!") 80 | ``` 81 | 82 | 83 | --- type:NormalExercise lang:r xp:100 skills:1 key:0dfb4c5e70 84 | ## Analyzing matrices, you shall (2) 85 | 86 | This time, the box office numbers for the three Star Wars movies are represented as three separate vectors instead of a single vector.
Remember the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function to paste together different vectors as if they were rows of a matrix? Take this example, which pastes together two vectors as the rows of a matrix: 87 | 88 | ``` 89 | a <- c(1, 2, 3) 90 | b <- c(4, 5, 6) 91 | rbind(a, b) 92 | ``` 93 | 94 | Try a similar thing on the astronomical Star Wars numbers! 95 | 96 | *** =instructions 97 | Again, construct the matrix `star_wars_matrix` with one row for each movie. 98 | 99 | *** =hint 100 | Simply pass the three vectors to the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function. 101 | 102 | *** =pre_exercise_code 103 | ```{r} 104 | # no pec 105 | ``` 106 | 107 | *** =sample_code 108 | ```{r} 109 | # Star Wars box office in millions (!) 110 | new_hope <- c(460.998, 314.4) 111 | empire_strikes <- c(290.475, 247.900) 112 | return_jedi <- c(309.306, 165.8) 113 | 114 | # Create star_wars_matrix 115 | 116 | ``` 117 | 118 | *** =solution 119 | ```{r} 120 | # Star Wars box office in millions (!) 121 | new_hope <- c(460.998, 314.4) 122 | empire_strikes <- c(290.475, 247.900) 123 | return_jedi <- c(309.306, 165.8) 124 | 125 | # Create star_wars_matrix 126 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi) 127 | ``` 128 | 129 | *** =sct 130 | ```{r} 131 | test_error() 132 | msg = "Do not change anything about the box office variables `new_hope`, `empire_strikes` and `return_jedi`!"
133 | test_object("new_hope", undefined_msg = msg, incorrect_msg = msg) 134 | test_object("empire_strikes", undefined_msg = msg, incorrect_msg = msg) 135 | test_object("return_jedi", undefined_msg = msg, incorrect_msg = msg) 136 | 137 | test_correct({ 138 | test_object("star_wars_matrix", 139 | incorrect_msg = "Did you assign the correct matrix containing the vector that holds all three movies to `star_wars_matrix`?") 140 | }, { 141 | test_function("rbind", not_called_msg = "You should use the [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) function to create the matrix.") 142 | }) 143 | success_msg("The force is actually with you! Continue to the next exercise.") 144 | ``` 145 | 146 | 147 | --- type:NormalExercise lang:r xp:100 skills:1 key:ca3dbb8a9f 148 | ## Naming a matrix 149 | 150 | To help you remember what is stored in `star_wars_matrix`, you would like to add the movie names as row names. This not only makes the data easier to read, but also makes it easier to select certain elements from the matrix later. 151 | 152 | Just as with vectors, you can add names for the rows and the columns of a matrix: 153 | 154 | ``` 155 | rownames(my_matrix) <- row_names_vector 156 | colnames(my_matrix) <- col_names_vector 157 | ``` 158 | 159 | *** =instructions 160 | - Two vectors containing the row names and column names have been created for you: `movie_names` and `col_titles`. 161 | - Name the rows of `star_wars_matrix` with `movie_names`. 162 | - Name the columns of `star_wars_matrix` with the pre-defined `col_titles`. 163 | 164 | *** =hint 165 | To name the rows, start with `rownames(star_wars_matrix) <-`; can you finish the command? 166 | 167 | *** =pre_exercise_code 168 | ```{r} 169 | # no pec 170 | ``` 171 | 172 | *** =sample_code 173 | ```{r} 174 | # Star Wars box office in millions (!)
175 | new_hope <- c(460.998, 314.4) 176 | empire_strikes <- c(290.475, 247.900) 177 | return_jedi <- c(309.306, 165.8) 178 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi) 179 | 180 | # Build movie_names and col_titles 181 | movie_names <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi") 182 | col_titles <- c("US", "non-US") 183 | 184 | # Add row names to star_wars_matrix 185 | 186 | 187 | # Add column names to star_wars_matrix 188 | 189 | ``` 190 | 191 | *** =solution 192 | ```{r} 193 | # Star Wars box office in millions (!) 194 | new_hope <- c(460.998, 314.4) 195 | empire_strikes <- c(290.475, 247.900) 196 | return_jedi <- c(309.306, 165.8) 197 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi) 198 | 199 | # Build movie_names and col_titles 200 | movie_names <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi") 201 | col_titles <- c("US", "non-US") 202 | 203 | # Add row names to star_wars_matrix 204 | rownames(star_wars_matrix) <- movie_names 205 | 206 | # Add column names to star_wars_matrix 207 | colnames(star_wars_matrix) <- col_titles 208 | ``` 209 | 210 | *** =sct 211 | ```{r} 212 | msg = "Do not change anything about the box office variables `new_hope`, `empire_strikes` and `return_jedi`!" 213 | lapply(c("new_hope", "empire_strikes", "return_jedi"), test_object, undefined_msg = msg, incorrect_msg = msg) 214 | msg <- "Do not change anything about the creation of `star_wars_matrix`."
215 | test_object("star_wars_matrix", undefined_msg = msg, incorrect_msg = msg) 216 | msg <- paste("Do not change or remove the vectors `movie_names` and `col_titles`;", 217 | "you can use them to name the rows and columns of `star_wars_matrix`.") 218 | lapply(c("movie_names", "col_titles"), test_object, undefined_msg = msg, incorrect_msg = msg) 219 | test_object("star_wars_matrix", eq_condition = "equal", 220 | incorrect_msg = paste("Did you set the row and column names of `star_wars_matrix` correctly?", 221 | "Have another look at your code.")) 222 | success_msg("Great! You're on your way to becoming an R jedi!") 223 | ``` 224 | 225 | 226 | --- type:NormalExercise lang:r xp:100 skills:1 key:3b60b1a49a 227 | ## Calculating the worldwide box office 228 | 229 | The single most important thing for a movie in order to become an instant legend in Tinseltown is its worldwide box office figures. 230 | 231 | To calculate the total box office revenue of each of the three Star Wars movies, you have to add up its US and non-US revenue, that is, sum the two columns row by row. 232 | 233 | In R, the function [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) conveniently calculates the totals for each row of a matrix. It returns a new vector with one total per row: 234 | 235 | ``` 236 | sum_of_rows_vector <- rowSums(my_matrix) 237 | ``` 238 | 239 | *** =instructions 240 | Calculate the worldwide box office figures for the three movies and put these in the vector named `worldwide_vector`. 241 | 242 | *** =hint 243 | The [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) function calculates the total box office for each row of `star_wars_matrix` if you supply `star_wars_matrix` as an argument by putting it between the parentheses. 244 | 245 | *** =pre_exercise_code 246 | ```{r} 247 | # no pec 248 | ``` 249 | 250 | *** =sample_code 251 | ```{r} 252 | # Star Wars box office in millions (!)
253 | new_hope <- c(460.998, 314.4) 254 | empire_strikes <- c(290.475, 247.900) 255 | return_jedi <- c(309.306, 165.8) 256 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi) 257 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi") 258 | colnames(star_wars_matrix) <- c("US", "non-US") 259 | 260 | # Calculate the worldwide box office: worldwide_vector 261 | 262 | ``` 263 | 264 | *** =solution 265 | ```{r} 266 | # Star Wars box office in millions (!) 267 | new_hope <- c(460.998, 314.4) 268 | empire_strikes <- c(290.475, 247.900) 269 | return_jedi <- c(309.306, 165.8) 270 | star_wars_matrix <- rbind(new_hope, empire_strikes, return_jedi) 271 | rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi") 272 | colnames(star_wars_matrix) <- c("US", "non-US") 273 | 274 | # Calculate the worldwide box office: worldwide_vector 275 | worldwide_vector <- rowSums(star_wars_matrix) 276 | ``` 277 | 278 | *** =sct 279 | ```{r} 280 | msg = "Do not change anything about the construction and naming of `star_wars_matrix`!" 281 | test_object("star_wars_matrix", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 282 | 283 | test_function("rowSums", "x", 284 | not_called_msg = "Have you used the function `rowSums() left 272 | ``` 273 | 274 | *** =sct 275 | ```{r} 276 | msg = "Do not change anything about the first lines that define `survey_vector` and `survey_factor`." 
277 | test_object("survey_vector", undefined_msg = msg, incorrect_msg = msg) 278 | test_object("survey_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 279 | test_output_contains("survey_factor[1] > survey_factor[2]", incorrect_msg = "Make sure to correctly perform the comparison between right and left; we want the battle of dexterity to be settled once and for all!") 280 | success_msg("Phew, it seems that R is neutral ;-).") 281 | ``` 282 | 283 | 284 | --- type:NormalExercise lang:r xp:100 skills:1 key:d65e71390a 285 | ## Ordered factors 286 | 287 | Sometimes you will deal with factors whose categories do have a natural ordering. In that case, you have to tell R about it. 288 | 289 | Suppose you're leading a research team of five data analysts and you want to evaluate their performance. To do this, you track their speed, evaluate each analyst as `"Slow"`, `"OK"` or `"Fast"`, and save the results in `speed_vector`. 290 | 291 | `speed_vector` should be converted to an ordinal factor since its categories have a natural ordering. You already know how to do this. Here's the template for defining an ordered factor once more: 292 | 293 | ``` 294 | factor(some_vector, ordered = TRUE, levels = c("Level_1", "Level_2", ...)) 295 | ``` 296 | 297 | *** =instructions 298 | - Use the example code above to define `speed_factor`, which contains the speed information as an ordered factor. You can start from `speed_vector`, which is already created for you. 299 | - Print `speed_factor` to the console. 300 | - Use [`summary()`](http://www.rdocumentation.org/packages/base/functions/summary) to generate a summary of `speed_factor`: automagically, R prints the factor levels in the right order. 301 | 302 | *** =hint 303 | - Use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) to create `speed_factor` based on `speed_vector`.
304 | - The argument `ordered` should be set to `TRUE` since there is a natural ordering. 305 | - The argument `levels` should be equal to `c("Slow", "OK", "Fast")`. 306 | 307 | *** =pre_exercise_code 308 | ```{r} 309 | # no pec 310 | ``` 311 | 312 | *** =sample_code 313 | ```{r} 314 | # Create speed_vector 315 | speed_vector <- c("OK", "Slow", "Slow", "OK", "Fast") 316 | 317 | # Convert speed_vector to ordered speed_factor 318 | 319 | 320 | # Print speed_factor 321 | 322 | 323 | # Summarize speed_factor 324 | 325 | ``` 326 | 327 | *** =solution 328 | ```{r} 329 | # Create speed_vector 330 | speed_vector <- c("OK", "Slow", "Slow", "OK", "Fast") 331 | 332 | # Convert speed_vector to ordered speed_factor 333 | speed_factor <- factor(speed_vector, ordered = TRUE, levels = c("Slow", "OK", "Fast")) 334 | 335 | # Print speed_factor 336 | speed_factor 337 | 338 | # Summarize speed_factor 339 | summary(speed_factor) 340 | ``` 341 | 342 | *** =sct 343 | ```{r} 344 | test_error() 345 | msg <- "Do not change anything about the command that defines `speed_vector`." 346 | test_object("speed_vector", undefined_msg = msg, incorrect_msg = msg) 347 | test_correct({ 348 | test_object("speed_factor", eq_condition = "equal", 349 | incorrect_msg = "Make sure that you assigned the correct factor to `speed_factor`. Pay attention to the correct order of the `levels` argument.") 350 | },{ 351 | test_function("factor", "x") 352 | test_function("factor", "levels") 353 | test_function("factor", "ordered") 354 | }) 355 | test_output_contains("summary(speed_factor)", incorrect_msg = "Don't forget to summarize `speed_factor`. Use [`summary()`](http://www.rdocumentation.org/packages/base/functions/summary).") 356 | success_msg("Great! Have a look at the console: the `<` signs between the levels show that they now have an order.
Continue to the next exercise.") 358 | ``` 359 | 360 | 361 | --- type:NormalExercise lang:r xp:100 skills:1 key:e23011c42b 362 | ## Comparing ordered factors 363 | 364 | 'Data analyst number two' is having a bad day at work. He enters your office and starts complaining that 'data analyst number five' is slowing down the entire project. Since you know that 'data analyst number two' has the reputation of being a smarty-pants, you first decide to check whether his statement is true. 365 | 366 | Because `speed_factor` is now ordered, you can compare its elements (the data analysts in this case). You can do this with a well-known operator: `>`. 367 | 368 | *** =instructions 369 | Check whether data analyst 2 is faster than data analyst 5. Simply print out the result, which should be a logical. 370 | 371 | *** =hint 372 | `vector[1] > vector[2]` checks whether the first element of `vector` is larger than the second element. 373 | 374 | *** =pre_exercise_code 375 | ```{r} 376 | # no pec 377 | ``` 378 | 379 | *** =sample_code 380 | ```{r} 381 | # Definition of speed_factor 382 | speed_factor <- factor(c("Fast", "Slow", "Slow", "Fast", "Ultra-fast"), 383 | ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast")) 384 | 385 | # Compare data analyst 2 with data analyst 5 386 | 387 | ``` 388 | 389 | *** =solution 390 | ```{r} 391 | # Definition of speed_factor 392 | speed_factor <- factor(c("Fast", "Slow", "Slow", "Fast", "Ultra-fast"), 393 | ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast")) 394 | 395 | # Compare data analyst 2 with data analyst 5 396 | speed_factor[2] > speed_factor[5] 397 | ``` 398 | 399 | *** =sct 400 | ```{r} 401 | msg <- "Do not change anything about the command that defines `speed_factor`!"
402 | test_object("speed_factor", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 403 | test_output_contains("speed_factor[2] > speed_factor[5]", 404 | incorrect_msg = paste("Have you correctly compared data analyst 2 to data analyst 5?", 405 | "Use subsetting in combination with the `>` operator.")) 406 | success_msg("Bellissimo! What does the result tell you? Data analyst two is complaining about data analyst five, while in fact he is the one slowing everything down!") 407 | ``` 408 | -------------------------------------------------------------------------------- /chapter5.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta : Chapter 5 3 | title : Lists 4 | description : Lists, as opposed to vectors, can hold components of different types, just like your to-do list at home or at work. This chapter will teach you how to create, name and subset these lists! 5 | attachments : 6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch5_slides.pdf 7 | 8 | --- type:VideoExercise lang:r xp:50 skills:1 key:d4fe5a84ef 9 | ## Create and Name Lists 10 | 11 | *** =video_link 12 | //player.vimeo.com/video/138173972 13 | 14 | *** =video_hls 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch5_1.master.m3u8 16 | 17 | 18 | --- type:NormalExercise lang:r xp:100 skills:1 key:b14f9472dd 19 | ## Create a list 20 | 21 | Just a quick refresher: a list in R allows you to gather a variety of objects in an ordered way. These objects can be vectors, matrices, factors, data frames or even other lists. The objects don't even have to be related to each other. You can construct a list with the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function: 22 | 23 | ``` 24 | my_list <- list(comp1, comp2, ...)
25 | ``` 26 | 27 | *** =instructions 28 | Construct a list, named `my_list`, that contains the variables `my_vector`, `my_matrix` and `my_factor` as list components. 29 | 30 | *** =hint 31 | Use the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function with `my_vector`, `my_matrix` and `my_factor` as arguments separated by a comma. 32 | 33 | *** =pre_exercise_code 34 | ```{r} 35 | # no pec 36 | ``` 37 | 38 | *** =sample_code 39 | ```{r} 40 | # Numeric vector: 1 up to 10 41 | my_vector <- 1:10 42 | 43 | # Numeric matrix: 1 up to 9 44 | my_matrix <- matrix(1:9, ncol = 3) 45 | 46 | # Factor of sizes 47 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L")) 48 | 49 | # Construct my_list with these different elements 50 | 51 | ``` 52 | 53 | *** =solution 54 | ```{r} 55 | # Numeric vector: 1 up to 10 56 | my_vector <- 1:10 57 | 58 | # Numeric matrix: 1 up to 9 59 | my_matrix <- matrix(1:9, ncol = 3) 60 | 61 | # Factor of sizes 62 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L")) 63 | 64 | # Construct my_list with these different elements 65 | my_list <- list(my_vector, my_matrix, my_factor) 66 | ``` 67 | 68 | *** =sct 69 | ```{r} 70 | test_error() 71 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix` or `my_factor`!" 72 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg) 73 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg) 74 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg) 75 | test_object("my_list", incorrect_msg = "It looks like `my_list` does not contain the correct elements. Have another look.") 76 | success_msg("Wonderful! Your skillset is growing at a staggering pace! 
Head over to the next exercise.") 77 | ``` 78 | 79 | 80 | --- type:NormalExercise lang:r xp:100 skills:1 key:849f04f218 81 | ## Listception: lists in lists 82 | 83 | As mentioned before, lists can also contain other lists. This works just like storing any other type of R object in a list. Besides the variables `my_vector`, `my_matrix` and `my_factor` from the previous exercise, `my_list` is now also predefined. It's up to you to merge them into a new list: a super list! 84 | 85 | *** =instructions 86 | - Construct a list, named `my_super_list`, that contains the four predefined variables listed in the sample code, in the same order: `my_vector`, `my_matrix`, `my_factor`, `my_list`. 87 | - Print the structure of `my_super_list` with the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function. 88 | 89 | *** =hint 90 | Just as in the previous exercise, use the [`list()`](http://www.rdocumentation.org/packages/base/functions/list) function. This time you have to add four components.
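As a rough illustration of nesting, here is a small example. The names `inner` and `outer` are invented for this example and are not part of the exercise. A list stored inside another list counts as a single component of the outer list, and you can reach into it by chaining `[[`:

```r
# Hypothetical inner list with two components
inner <- list(1:3, "a")

# The outer list holds a logical vector and the whole inner list
outer <- list(c(TRUE, FALSE), inner)

# The inner list counts as one component, so outer has length 2
length(outer)

# Chain [[ to reach the second component of the inner list
outer[[2]][[2]]

# str() shows the nesting through indentation
str(outer)
```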
91 | 92 | *** =pre_exercise_code 93 | ```{r} 94 | # no pec 95 | ``` 96 | 97 | *** =sample_code 98 | ```{r} 99 | # Numeric vector: 1 up to 10 100 | my_vector <- 1:10 101 | 102 | # Numeric matrix: 1 up to 9 103 | my_matrix <- matrix(1:9, ncol = 3) 104 | 105 | # Factor of sizes 106 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L")) 107 | 108 | # List containing vector, matrix and factor 109 | my_list <- list(my_vector, my_matrix, my_factor) 110 | 111 | # Construct my_super_list with the four data structures above 112 | 113 | 114 | # Display structure of my_super_list 115 | 116 | ``` 117 | 118 | *** =solution 119 | ```{r} 120 | # Numeric vector: 1 up to 10 121 | my_vector <- 1:10 122 | 123 | # Numeric matrix: 1 up to 9 124 | my_matrix <- matrix(1:9, ncol = 3) 125 | 126 | # Factor of sizes 127 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L")) 128 | 129 | # List containing vector, matrix and factor 130 | my_list <- list(my_vector, my_matrix, my_factor) 131 | 132 | # Construct my_super_list with the four data structures above 133 | my_super_list <- list(my_vector, my_matrix, my_factor, my_list) 134 | 135 | # Display structure of my_super_list 136 | str(my_super_list) 137 | ``` 138 | 139 | *** =sct 140 | ```{r} 141 | test_error() 142 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix`, `my_factor` or `my_list`!" 143 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg) 144 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg) 145 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg) 146 | test_object("my_list", undefined_msg = msg, incorrect_msg = msg) 147 | test_object("my_super_list", 148 | incorrect_msg = "It looks like `my_super_list` does not contain the correct elements. It's also possible that the order is not correct. 
Have another look.") 149 | test_output_contains("str(my_super_list)", incorrect_msg = "Don't forget to display the structure of `my_super_list` using the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.") 150 | success_msg("Nice one! Look at the displayed structure: the vector, matrix and ordered factor appear twice, once in the top-level list and once in the embedded list. Next!") 151 | ``` 152 | 153 | 154 | --- type:NormalExercise lang:r xp:100 skills:1 key:f37117a766 155 | ## Create a named list (1) 156 | 157 | Well done! Let's keep this train going! To make the elements of your list clearer, you'll often want to name them: 158 | 159 | ``` 160 | my_list <- list(name1 = your_comp1, 161 | name2 = your_comp2) 162 | ``` 163 | 164 | If you want to name your lists after you've created them, you can use the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function as you did with vectors. The following commands are fully equivalent to the assignment above: 165 | 166 | ``` 167 | my_list <- list(your_comp1, your_comp2) 168 | names(my_list) <- c("name1", "name2") 169 | ``` 170 | 171 | *** =instructions 172 | - Change the code that builds `my_list` by adding names to the components. Use the name `vec` for `my_vector`, `mat` for `my_matrix` and `fac` for `my_factor`. 173 | - Print the list to the console and inspect the output. 174 | 175 | *** =hint 176 | The first method of assigning names to your list components is the easiest. It starts like this: 177 | ``` 178 | my_list <- list(vec = my_vector) 179 | ``` 180 | Add the other two components in a similar fashion.
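To check that the two naming approaches really give the same result, you can compare them with `identical()`. The components `comp1` and `comp2` below are placeholder values made up for this sketch:

```r
# Two placeholder components (hypothetical values)
comp1 <- 1:3
comp2 <- "hello"

# Method 1: name the components while creating the list
list_a <- list(name1 = comp1, name2 = comp2)

# Method 2: create the list first, then name it with names()
list_b <- list(comp1, comp2)
names(list_b) <- c("name1", "name2")

# Both approaches produce the same object
identical(list_a, list_b)
```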
181 | 182 | *** =pre_exercise_code 183 | ```{r} 184 | # no pec 185 | ``` 186 | 187 | *** =sample_code 188 | ```{r} 189 | # Numeric vector: 1 up to 10 190 | my_vector <- 1:10 191 | 192 | # Numeric matrix: 1 up to 9 193 | my_matrix <- matrix(1:9, ncol = 3) 194 | 195 | # Factor of sizes 196 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L")) 197 | 198 | # Adapt code to add names to elements in my_list 199 | my_list <- list(my_vector, my_matrix, my_factor) 200 | 201 | # Print my_list to the console 202 | 203 | ``` 204 | 205 | *** =solution 206 | ```{r} 207 | # Numeric vector: 1 up to 10 208 | my_vector <- 1:10 209 | 210 | # Numeric matrix: 1 up to 9 211 | my_matrix <- matrix(1:9, ncol = 3) 212 | 213 | # Factor of sizes 214 | my_factor <- factor(c("M","S","L","L","M"), ordered = TRUE, levels = c("S","M","L")) 215 | 216 | # Adapt code to add names to elements in my_list 217 | my_list <- list(vec = my_vector, mat = my_matrix, fac = my_factor) 218 | 219 | # Print my_list to the console 220 | my_list 221 | ``` 222 | 223 | *** =sct 224 | ```{r} 225 | test_error() 226 | msg = "Do not remove or change the definition of the variables `my_vector`, `my_matrix` or `my_factor`!" 227 | test_object("my_vector", undefined_msg = msg, incorrect_msg = msg) 228 | test_object("my_matrix", undefined_msg = msg, incorrect_msg = msg) 229 | test_object("my_factor", undefined_msg = msg, incorrect_msg = msg) 230 | test_object("my_list", 231 | incorrect_msg = "It looks like `my_list` does not contain the correct elements.") 232 | test_object("my_list", eq_condition = "equal", 233 | incorrect_msg = "It looks like `my_list` does not contain the correct naming for the components."); 234 | test_output_contains("my_list", 235 | incorrect_msg = "Don't forget to print `my_list` to the console! Simply add `my_list` on a new line in the script.") 236 | success_msg("Great! 
Not only do you know how to construct lists now, you can also name them, a skill that will prove most useful in practice. Continue to the next exercise.") 237 | ``` 238 | 239 | 240 | --- type:NormalExercise lang:r xp:100 skills:1 key:dc5f4f6f30 241 | ## Create a named list (2) 242 | 243 | Being a huge movie fan, you decide to start storing information on good movies with the help of lists. 244 | 245 | Start by creating a list for the movie "The Shining". The variables `actors_vector` and `reviews_factor` that you'll need are already defined in the sample code. 246 | 247 | *** =instructions 248 | Create the variable `shining_list`. The list should contain the movie title first, named "title", then the actor names, named "actors", and finally the review scores factor, named "reviews". Make sure to keep this order, and pay attention to the correct naming! 249 | 250 | *** =hint 251 | Let's get you started with a chunk of code: 252 | 253 | shining_list <- list(title = "The Shining") 254 | 255 | Can you complete the rest? You still have to add `actors_vector` and `reviews_factor` with the appropriate names.
256 | 257 | *** =pre_exercise_code 258 | ```{r} 259 | # no pec 260 | ``` 261 | 262 | *** =sample_code 263 | ```{r} 264 | # Create actors and reviews 265 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson") 266 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"), 267 | ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect")) 268 | 269 | # Create shining_list 270 | 271 | ``` 272 | 273 | *** =solution 274 | ```{r} 275 | # Create actors and reviews 276 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson") 277 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"), 278 | ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect")) 279 | 280 | # Create the list 'shining_list' 281 | shining_list <- list(title = "The Shining", actors = actors_vector, reviews = reviews_factor) 282 | ``` 283 | 284 | *** =sct 285 | ```{r} 286 | test_error() 287 | msg = "Do not remove or change the definition of the pre-set variables!" 
288 | test_object("actors_vector", undefined_msg = msg, incorrect_msg = msg) 289 | test_object("reviews_factor", undefined_msg = msg, incorrect_msg = msg) 290 | test_object("shining_list", 291 | incorrect_msg = "It looks like `shining_list` does not contain the correct elements.") 292 | test_object("shining_list", eq_condition = "equal", 293 | incorrect_msg = "It looks like `shining_list` does not use the correct names for the components.") 294 | success_msg("Perfect!") 295 | ``` 296 | 297 | 298 | --- type:VideoExercise lang:r xp:50 skills:1 key:a5e5ff2680 299 | ## Subset and Extend Lists 300 | 301 | *** =video_link 302 | //player.vimeo.com/video/138173990 303 | 304 | *** =video_hls 305 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch5_2.master.m3u8 306 | 307 | 308 | --- type:NormalExercise lang:r xp:100 skills:1 key:1e3b5f4b0a 309 | ## Selecting elements from a list 310 | 311 | Your lists will often contain numerous components. Therefore, getting a single element, multiple elements, or a component out of them is not always straightforward. 312 | 313 | To select a single element from a list, for example the first element from `shining_list`, you can use any one of the following commands: 314 | 315 | ``` 316 | shining_list[[1]] 317 | shining_list[["title"]] 318 | shining_list$title 319 | ``` 320 | 321 | If you perform selection with single square brackets, you'll end up with another list that contains the specified elements: 322 | 323 | ``` 324 | shining_list[c(2, 3)] 325 | shining_list[c(FALSE, TRUE, TRUE)] 326 | ``` 327 | 328 | *** =instructions 329 | - Select the actors from `shining_list` and assign the result to `act`. 330 | - Create a new list containing only the title and the reviews of `shining_list`; save the new list in `sublist`. 331 | - Display the structure of `sublist`. 332 | 333 | *** =hint 334 | For the first instruction you need double brackets (or `$`); for the second one, single brackets will do.
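The difference between `[` and `[[` is easiest to see by checking the class of what comes back. Here is a small sketch. The list `movie` is a hypothetical stand-in loosely modeled on `shining_list`; it is not the object loaded in the exercise workspace.

```r
# A small hypothetical list, loosely modeled on shining_list
movie <- list(title = "The Shining",
              actors = c("Jack Nicholson", "Shelley Duvall"),
              year = 1980)

# Double brackets return the component itself (a character vector here)
class(movie[["actors"]])   # "character"

# Single brackets return a list that contains the selected components
class(movie["actors"])     # "list"
length(movie[c("title", "year")])

# $ with a name behaves like [[ with the same name
identical(movie$actors, movie[["actors"]])
```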
335 | 336 | *** =pre_exercise_code 337 | ```{r} 338 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData")) 339 | ``` 340 | 341 | *** =sample_code 342 | ```{r} 343 | # shining_list is already defined in the workspace 344 | 345 | # Actors from shining_list: act 346 | 347 | 348 | # List containing title and reviews from shining_list: sublist 349 | 350 | 351 | # Display structure of sublist 352 | 353 | ``` 354 | 355 | *** =solution 356 | ```{r} 357 | # shining_list is already defined in the workspace 358 | 359 | # Actors from shining_list: act 360 | act <- shining_list[["actors"]] 361 | 362 | # List containing title and reviews from shining_list: sublist 363 | sublist <- shining_list[c("title", "reviews")] 364 | 365 | # Display structure of sublist 366 | str(sublist) 367 | ``` 368 | 369 | *** =sct 370 | ```{r} 371 | test_error() 372 | msg = "Do not remove or overwrite `shining_list`." 373 | test_object("shining_list", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 374 | 375 | test_object("act", incorrect_msg = "Have you correctly selected the actors from `shining_list`?") 376 | test_object("sublist", incorrect_msg = "Have you correctly selected the title and reviews from `shining_list`? Use single brackets in combination with a vector to select multiple elements.") 377 | test_output_contains("str(sublist)", incorrect_msg = "Don't forget to display the structure of `sublist`!") 378 | 379 | success_msg("Nice! That was still pretty easy, right? Always be aware of this difference between `[` and `[[`!") 380 | ``` 381 | 382 | 383 | --- type:NormalExercise lang:r xp:100 skills:1 key:cbda2853c3 384 | ## Chaining your selections 385 | 386 | Besides selecting entire list components, you will often need to access specific parts of these components themselves. To do this, you can _chain your subsetting operations_ in R.
387 | 388 | For example, with 389 | 390 | ``` 391 | shining_list[[2]][1] 392 | ``` 393 | 394 | you select the first element (`[1]`) of the second component, the actors (`shining_list[[2]]`). When you type this in the console, you will see that the result is Jack Nicholson. 395 | 396 | *** =instructions 397 | - Select the last actor from `shining_list` and assign the result to `last_actor`. 398 | - Select the second review score (which is a factor) from `shining_list`. Store the result in `second_review`. 399 | 400 | *** =hint 401 | - If you want to do things nicely: `length(shining_list$actors)` gives you the number of actors, and thus the index of the element to select. 402 | - You can select the second review with `shining_list$reviews[2]`. 403 | 404 | *** =pre_exercise_code 405 | ```{r} 406 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData")) 407 | ``` 408 | 409 | *** =sample_code 410 | ```{r} 411 | # shining_list is already defined in the workspace 412 | 413 | # Select the last actor: last_actor 414 | 415 | 416 | # Select the second review: second_review 417 | 418 | ``` 419 | 420 | *** =solution 421 | ```{r} 422 | # shining_list is already defined in the workspace 423 | 424 | # Select the last actor: last_actor 425 | last_actor <- shining_list$actors[length(shining_list$actors)] 426 | 427 | # Select the second review: second_review 428 | second_review <- shining_list$reviews[2] 429 | ``` 430 | 431 | *** =sct 432 | ```{r} 433 | test_error() 434 | msg = "Do not remove or overwrite `shining_list`." 435 | test_object("shining_list", eq_condition = "equal", undefined_msg = msg, incorrect_msg = msg) 436 | 437 | test_object("last_actor", 438 | incorrect_msg = "It looks like `last_actor` does not equal the last actor.") 439 | test_object("second_review", 440 | incorrect_msg = "It looks like `second_review` does not contain the factor that corresponds to the second review.") 441 | success_msg("Great!
Selecting elements from lists is rather easy, isn't it? Continue to the next exercise.") 442 | ``` 443 | 444 | --- type:NormalExercise lang:r xp:100 skills:1 key:35ba0d6f0d 445 | ## Extending a list 446 | 447 | You already know that both `$` and `[[` can be used to select elements from lists. They can also be used to extend lists. To extend `shining_list` with your personal opinion, you could use either of the following commands: 448 | 449 | ``` 450 | shining_list$my_opinion <- "Love it!" 451 | shining_list[["my_opinion"]] <- "Love it!" 452 | ``` 453 | 454 | Being proud of your first list, you shared it with the members of your movie hobby club. However, one of the senior members, a guy named M. McDowell, noted that you forgot to add the release year (1980). Given your ambitions to become next year's president of the club, you decide to add this information to the list. To fully make up for your mistake, you also decide to add the name of the director (Stanley Kubrick). 455 | 456 | *** =instructions 457 | - Add the release year as a numeric to `shining_list` under the name `year`. 458 | - Add the director, `"Stanley Kubrick"`, to the list under the name `director`. 459 | - Finally, inspect the structure of `shining_list`. 460 | 461 | *** =hint 462 | Let the examples in the exercise description guide you to list extension mastery! Remember that R is case sensitive!
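Here is a minimal sketch showing that both forms extend a list in the same way. The list `movie` and the variable `field` are hypothetical names used only for illustration. The last lines show one practical difference: `[[` accepts a component name stored in a variable, whereas `$` always uses the literal name written after it.

```r
# Hypothetical starting list
movie <- list(title = "The Shining")

# Extend a copy with $ ...
movie_a <- movie
movie_a$my_opinion <- "Love it!"

# ... or extend another copy with [[ and a character name
movie_b <- movie
movie_b[["my_opinion"]] <- "Love it!"

# Both copies end up identical
identical(movie_a, movie_b)

# [[ can take the new component's name from a variable
field <- "year"
movie_b[[field]] <- 1980
names(movie_b)
```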
463 | 464 | *** =pre_exercise_code 465 | ```{r} 466 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter5.RData")) 467 | ``` 468 | 469 | *** =sample_code 470 | 471 | ```{r} 472 | # shining_list is already defined in the workspace 473 | 474 | # Add the release year to shining_list 475 | 476 | 477 | # Add the director to shining_list 478 | 479 | 480 | # Inspect the structure of shining_list 481 | 482 | ``` 483 | 484 | *** =solution 485 | ```{r} 486 | # shining_list is already defined in the workspace 487 | 488 | # Add the release year to shining_list 489 | shining_list$year <- 1980 490 | 491 | # Add the director to shining_list 492 | shining_list$director <- "Stanley Kubrick" 493 | 494 | # Inspect the structure of shining_list 495 | str(shining_list) 496 | ``` 497 | 498 | *** =sct 499 | ```{r} 500 | test_error() 501 | test_object("shining_list", 502 | incorrect_msg = paste("Have you correctly added both the year and the director to `shining_list`?", 503 | "Make sure to use the correct names (\"year\" and \"director\"). Remember that R is case sensitive!")) 504 | 505 | test_output_contains("str(shining_list)", 506 | incorrect_msg = "Do not forget to inspect the structure of `shining_list` using the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function.") 507 | 508 | success_msg("Congratulations on finishing this chapter!") 509 | ``` 510 | -------------------------------------------------------------------------------- /chapter6.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta : Chapter 6 3 | title : Data Frames 4 | description : Most data sets you will work with are stored as data frames. By the end of this chapter, you will be able to create a data frame, select interesting parts of a data frame and order a data frame according to certain variables. 
5 | attachments : 6 | slides_link: https://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/slides/ch6_slides.pdf 7 | 8 | --- type:VideoExercise lang:r xp:50 skills:1 key:d4bde604ab 9 | ## Explore the Data Frame 10 | 11 | *** =video_link 12 | //player.vimeo.com/video/138173996 13 | 14 | *** =video_hls 15 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch6_1.master.m3u8 16 | 17 | --- type:NormalExercise lang:r xp:100 skills:1 key:d4ddcf9a7d 18 | ## Have a look at your data set 19 | 20 | Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set. 21 | 22 | There are several ways to do this in R. The function [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) enables you to show the first observations of a data frame (or any R object you pass to it). Unoriginally, the function [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) prints out the last observations in your data set. You can also use the function [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim) to show the dimensions of your data set, that is, its number of rows and columns. 23 | 24 | In this exercise, you'll be working with the `mtcars` data set, which is available in R by default. 25 | 26 | *** =instructions 27 | - Print the first observations of the [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) data set. 28 | - Use the [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) function to display the last observations. 29 | - Finally, display the overall dimensions of the [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) data frame with [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim). 
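As a rough sketch of these three functions on `mtcars`: the `n` argument of `head()` and `tail()` (the number of rows to show, 6 by default) and the helpers `nrow()` and `ncol()` go beyond what this exercise asks for and are shown only for illustration:

```r
# mtcars is one of R's built-in data sets
head(mtcars, n = 3)  # first 3 rows (the default is 6)
tail(mtcars, n = 2)  # last 2 rows

dim(mtcars)          # rows and columns: 32 11
nrow(mtcars)         # number of rows only
ncol(mtcars)         # number of columns only
```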
30 | 31 | *** =hint 32 | You'll need [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) to show the first observations in [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars). 33 | 34 | *** =pre_exercise_code 35 | ```{r} 36 | # no pec 37 | ``` 38 | 39 | *** =sample_code 40 | ```{r} 41 | # Print the first observations of mtcars 42 | 43 | 44 | # Print the last observations of mtcars 45 | 46 | 47 | # Print the dimensions of mtcars 48 | ``` 49 | 50 | *** =solution 51 | ```{r} 52 | # Print the first observations of mtcars 53 | head(mtcars) 54 | 55 | # Print the last observations of mtcars 56 | tail(mtcars) 57 | 58 | # Print the dimensions of mtcars 59 | dim(mtcars) 60 | ``` 61 | 62 | *** =sct 63 | ```{r} 64 | test_error() 65 | test_function("head", "x", incorrect_msg = "Have you specified the correct parameter in the [`head()`](http://www.rdocumentation.org/packages/utils/functions/head) function? Make sure to pass it a data set you want to inspect, `mtcars` in this case.") 66 | test_function("tail", "x", incorrect_msg = "Have you specified the correct parameter in the [`tail()`](http://www.rdocumentation.org/packages/utils/functions/head) function? Make sure to pass it a data set you want to inspect, `mtcars` in this case.") 67 | test_output_contains("dim(mtcars)", incorrect_msg = "Don't forget to also call the [`dim()`](http://www.rdocumentation.org/packages/base/functions/dim) function on `mtcars`!") 68 | 69 | success_msg("Wonderful! So, do you now have a good idea about what we have in the data set? For a full overview of the variables' meaning, type `?mtcars` in the console and read the help page. Continue to the next exercise!") 70 | ``` 71 | 72 | 73 | --- type:NormalExercise lang:r xp:100 skills:1 key:c8f389fdbd 74 | ## Have a look at the structure 75 | 76 | Another method that is often used to get a rapid overview of your data is the function [`str()`](http://www.rdocumentation.org/packages/utils/functions/str). 
The function [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) shows you the structure of your data set. For a data frame, it tells you: 77 | 78 | - The total number of observations (e.g. 32 car types) 79 | - The total number of variables (e.g. 11 car features) 80 | - A full list of the variable names (e.g. mpg, cyl, ...) 81 | - The data type of each variable (e.g. num for car features) 82 | - The first values of each variable 83 | 84 | Applying the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function will often be the first thing that you do when receiving a new data set or data frame. It is a great way to get more insight into your data set before diving into the real analysis. 85 | 86 | *** =instructions 87 | Investigate the structure of [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars). Make sure that you see the same numbers, variables and data types as mentioned above. 88 | 89 | *** =hint 90 | Use the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function with [`mtcars`](http://www.rdocumentation.org/packages/datasets/functions/mtcars) as input! 91 | 92 | *** =pre_exercise_code 93 | ```{r} 94 | # no pec 95 | ``` 96 | 97 | *** =sample_code 98 | ```{r} 99 | # Investigate the structure of the mtcars data set 100 | ``` 101 | 102 | *** =solution 103 | ```{r} 104 | # Investigate the structure of the mtcars data set 105 | str(mtcars) 106 | ``` 107 | 108 | *** =sct 109 | ```{r} 110 | test_function("str","object",incorrect_msg = "Make sure to check the structure of the `mtcars` data set.") 111 | test_output_contains("str(mtcars)", incorrect_msg = "Make sure that you use the [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) function on `mtcars`.") 112 | success_msg("Nice work! Can you find all the information that is listed in the exercise's assignment? 
Continue to the next exercise.") 113 | ``` 114 | 115 | 116 | --- type:NormalExercise lang:r xp:100 skills:1 key:f09c0189ac 117 | ## Creating a data frame (1) 118 | 119 | Since using built-in data sets is not even half the fun of creating your own data sets, the next exercises are based on your personally developed data set. So put your jet pack on, because it is time for some good old-fashioned space exploration! 120 | 121 | As a first goal, you want to construct a data frame that describes the main characteristics of eight planets in our solar system. According to your good friend Buzz, the main features of a planet are: 122 | 123 | - The type of the planet (Terrestrial or Gas Giant). 124 | - The planet's diameter relative to the diameter of the Earth. 125 | - The planet's rotation period relative to that of the Earth. 126 | - Whether the planet has rings or not (TRUE or FALSE). 127 | 128 | After doing some high-quality research on [Wikipedia](http://en.wikipedia.org/wiki/Planet), you feel confident enough to create the necessary vectors: `planets`, `type`, `diameter`, `rotation` and `rings`. Can you correctly use the [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) function to create a data set from this information? 129 | 130 | *** =instructions 131 | - Use the function [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) to construct `planets_df`. 132 | - Use [`str()`](http://www.rdocumentation.org/packages/utils/functions/str) to make sure that you've actually created a data frame with 8 observations and 5 variables. 133 | 134 | *** =hint 135 | The [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) function takes as arguments the vectors that will become the columns of the data frame, separated by commas. The columns in this case are (in this order): `planets`, `type`, `diameter`, `rotation` and `rings`. 
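A minimal sketch of how `data.frame()` turns same-length vectors into columns, using three hypothetical vectors instead of the planet data. When you pass bare variables, their names become the column names. Depending on your R version, `str()` may report `fruit` as a factor rather than a character vector:

```r
# Three hypothetical same-length vectors: one element per observation
fruit   <- c("apple", "banana", "cherry")
weight  <- c(150, 120, 5)
has_pit <- c(FALSE, FALSE, TRUE)

# Each vector becomes a column; the variable names become the column names
fruit_df <- data.frame(fruit, weight, has_pit)

dim(fruit_df)    # 3 observations of 3 variables
names(fruit_df)  # "fruit" "weight" "has_pit"
str(fruit_df)
```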
136 | 137 | *** =pre_exercise_code 138 | ```{r} 139 | # no pec 140 | ``` 141 | 142 | *** =sample_code 143 | ```{r} 144 | # Definition of vectors 145 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 146 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 147 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 148 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883) 149 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 150 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 151 | 152 | # Create a data frame: planets_df 153 | 154 | 155 | # Display the structure of planets_df 156 | 157 | ``` 158 | 159 | *** =solution 160 | ```{r} 161 | # Definition of vectors 162 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 163 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 164 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 165 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883) 166 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 167 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 168 | 169 | # Create a data frame: planets_df 170 | planets_df <- data.frame(planets, type, diameter, rotation, rings) 171 | 172 | # Display the structure of planets_df 173 | str(planets_df) 174 | ``` 175 | 176 | *** =sct 177 | ```{r} 178 | test_correct({ 179 | test_object("planets_df", 180 | undefined_msg = "Please make sure to define a variable `planets_df`.", 181 | incorrect_msg = "Make sure to pass the vectors to `data.frame()` in the correct order: planets, type, diameter, rotation and rings.") 182 | }, { 183 | msg = "Do not change the definitions of the vectors. Only add code to create the `planets_df` data frame!" 
184 | test_object("planets", undefined_msg = msg, incorrect_msg = msg) 185 | test_object("type", undefined_msg = msg, incorrect_msg = msg) 186 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg) 187 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg) 188 | test_object("rings", undefined_msg = msg, incorrect_msg = msg) 189 | }) 190 | 191 | test_output_contains("str(planets_df)", incorrect_msg = "Don't forget to display the structure of `planets_df`!") 192 | 193 | success_msg("Great job! The structure of `planets_df` reveals that both the `planets` and the `type` columns are factors, not character vectors. That's not really what you want, right?") 194 | ``` 195 | 196 | --- type:NormalExercise lang:r xp:100 skills:1 key:7090dc3538 197 | ## Creating a data frame (2) 198 | 199 | In the previous exercise, you found out that both the `planets` and `type` columns of `planets_df` are factors. For the `type` column this makes sense, because a planet type is a category. For the `planets` column, however, which contains the planet names, this makes less sense. 200 | 201 | You can set the `stringsAsFactors` argument inside [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame) to prevent R from automatically converting character vectors to factors (this conversion is the default in R versions before 4.0.0; since R 4.0.0, `stringsAsFactors` defaults to `FALSE`): 202 | 203 | ``` 204 | data.frame(vec1, vec2, ..., stringsAsFactors = FALSE) 205 | ``` 206 | 207 | Up to you now to adapt the way `planets_df` is constructed! 208 | 209 | *** =instructions 210 | - Encode the `type` vector as a factor called `type_factor`. 211 | - Next, use `planets`, `type_factor`, `diameter`, `rotation` and `rings` to construct `planets_df`. This time, make sure that strings are not converted to factors by setting `stringsAsFactors = FALSE`. 212 | - Display the structure of `planets_df` to check that you coded things correctly. 
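To see what `stringsAsFactors = FALSE` does and does not change, here is a minimal sketch on two hypothetical vectors: it only affects character vectors, so a vector that you already encoded with `factor()` stays a factor:

```r
# Hypothetical vectors: a label per row and a category per row
label    <- c("a", "b", "c")
category <- factor(c("x", "y", "x"))

# stringsAsFactors = FALSE keeps the character vector 'label' as character;
# 'category' is already a factor, so it stays a factor
example_df <- data.frame(label, category, stringsAsFactors = FALSE)

class(example_df$label)     # "character"
class(example_df$category)  # "factor"
str(example_df)
```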
213 | 214 | *** =hint 215 | Use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) to encode `type` as a factor. 216 | 217 | *** =pre_exercise_code 218 | ```{r} 219 | # no pec 220 | ``` 221 | 222 | *** =sample_code 223 | ```{r} 224 | # Definition of vectors 225 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 226 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 227 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 228 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883) 229 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 230 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 231 | 232 | # Encode type as a factor: type_factor 233 | 234 | 235 | # Construct planets_df: strings are not converted to factors! 236 | 237 | 238 | # Display the structure of planets_df 239 | 240 | ``` 241 | 242 | *** =solution 243 | ```{r} 244 | # Definition of vectors 245 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 246 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 247 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 248 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883) 249 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 250 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 251 | 252 | # Encode type as a factor: type_factor 253 | type_factor <- factor(type) 254 | 255 | # Construct planets_df: strings are not converted to factors! 
256 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE) 257 | 258 | # Display the structure of planets_df 259 | str(planets_df) 260 | ``` 261 | 262 | *** =sct 263 | ```{r} 264 | msg <- "Do not remove or change the definitions of the vectors!" 
265 | test_object("planets", undefined_msg = msg, incorrect_msg = msg) 266 | test_object("type", undefined_msg = msg, incorrect_msg = msg) 267 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg) 268 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg) 269 | test_object("rings", undefined_msg = msg, incorrect_msg = msg) 270 | 271 | test_object("type_factor", incorrect_msg = "Have you correctly created `type_factor`? Simply use the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function on `type`.") 272 | test_object("planets_df", incorrect_msg = "Have you correctly created `planets_df`? Make sure to use `type_factor` instead of `type` and set `stringsAsFactors` to `FALSE` inside [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame).") 273 | test_output_contains("str(planets_df)", incorrect_msg = "Don't forget to display the structure of `planets_df`.") 274 | 275 | success_msg("That looks more like it! Head over to the next exercise.") 276 | ``` 277 | 278 | 279 | --- type:NormalExercise lang:r xp:100 skills:1 key:a80ae7fbe8 280 | ## Rename the data frame columns 281 | 282 | Because a data frame is, under the hood, a list of same-length vectors, you can name and rename its columns just as you named the elements of a list. To create a data frame and name its columns in one and the same call, you can use: 283 | 284 | ``` 285 | data.frame(name1 = vec1, name2 = vec2, ...) 286 | ``` 287 | 288 | You can also name the columns of a data frame after creating it: 289 | 290 | ``` 291 | my_df <- data.frame(vec1, vec2, ...) 292 | names(my_df) <- c("name1", "name2", ...) 293 | ``` 294 | 295 | Very proud of your first-ever data frame, you show it to your friend Buzz. He's pretty impressed that you managed to include both factor and character columns, but he still finds the column names pretty odd. Time to make some improvements! 
The code that constructs the improved data frame from the previous exercise is already included. 296 | 297 | *** =instructions 298 | Rename the columns of `planets_df`. Because `planets_df` already exists, you'll want to use the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function. 299 | 300 | - Name the `planets` column `name`. 301 | - Name the `type_factor` column `type`. 302 | - You can keep the names `diameter` and `rotation`. 303 | - Change the name `rings` to `has_rings`. 304 | 305 | Finally, print `planets_df` after renaming its columns (not its structure!). 306 | 307 | *** =hint 308 | You'll need a character vector containing `"name"`, `"type"`, `"diameter"`, `"rotation"` and `"has_rings"`. 309 | 310 | *** =pre_exercise_code 311 | ```{r} 312 | # no pec 313 | ``` 314 | 315 | *** =sample_code 316 | ```{r} 317 | # Construct improved planets_df 318 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 319 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 320 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 321 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883) 322 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 323 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 324 | type_factor <- factor(type) 325 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE) 326 | 327 | # Improve the names of planets_df 328 | 329 | 330 | # Print planets_df 331 | 332 | ``` 333 | 334 | *** =solution 335 | ```{r} 336 | # Construct improved planets_df 337 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 338 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 339 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 340 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 
4.007, 3.883) 341 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 342 | rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 343 | type_factor <- factor(type) 344 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE) 345 | 346 | # Improve the names of planets_df 347 | names(planets_df) <- c("name", "type", "diameter", "rotation", "has_rings") 348 | 349 | # Print planets_df 350 | planets_df 351 | ``` 352 | 353 | *** =sct 354 | ```{r} 355 | 356 | msg <- "Do not remove or change the definition of the predefined variables!" 357 | test_object("planets", undefined_msg = msg, incorrect_msg = msg) 358 | test_object("type", undefined_msg = msg, incorrect_msg = msg) 359 | test_object("diameter", undefined_msg = msg, incorrect_msg = msg) 360 | test_object("rotation", undefined_msg = msg, incorrect_msg = msg) 361 | test_object("rings", undefined_msg = msg, incorrect_msg = msg) 362 | test_object("type_factor", undefined_msg = msg, incorrect_msg = msg) 363 | test_object("planets_df", undefined_msg = msg, incorrect_msg = "Don't change the contents of `planets_df`, only change the column names!") 364 | test_object("planets_df", eq_condition = "equal", 365 | undefined_msg = msg, incorrect_msg = "Are you sure you have correctly renamed the columns of `planets_df`? The hint might be able to help you out.") 366 | 367 | test_output_contains("planets_df", incorrect_msg = "Don't forget to print `planets_df`.") 368 | success_msg("Nice one! 
This is going fast!") 369 | ``` 370 | 371 | --- type:VideoExercise lang:r xp:50 skills:1 key:9a2f941de8 372 | ## Subset, Extend & Sort Data Frames 373 | 374 | *** =video_link 375 | //player.vimeo.com/video/138174008 376 | 377 | *** =video_hls 378 | //videos.datacamp.com/transcoded/732_intro_to_r/v1/hls-ch6_2.master.m3u8 379 | 380 | 381 | --- type:NormalExercise lang:r xp:100 skills:1 key:b6125af738 382 | ## Selection of data frame elements 383 | 384 | Similar to matrices, you select elements from a data frame with the help of square brackets `[ ]`. By using a comma, you indicate what to select from the rows (before the comma) and the columns (after the comma), respectively: 385 | 386 | ``` 387 | # first row, second column 388 | my_df[1, 2] 389 | 390 | # rows 1 to 3, columns 2 to 4 391 | my_df[1:3, 2:4] 392 | 393 | # entire first row 394 | my_df[1, ] 395 | 396 | # rows 1 to 3 of "type" column 397 | planets_df[1:3, 2] 398 | planets_df[1:3, "type"] 399 | ``` 400 | 401 | Let us now apply this technique to `planets_df`! This data frame is already available in the workspace. 402 | 403 | *** =instructions 404 | - Select the type of Mars; store the factor in `mars_type`. 405 | - Store the entire rotation column in `rotation` as a vector. 406 | - Create a data frame, `closest_planets_df`, that contains all data on the first three planets. 407 | - Likewise, build the data frame `furthest_planets_df` that contains all data on the last three planets. 408 | 409 | *** =hint 410 | `planets_df[1:3, ]` will select all elements of the first three rows. 
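The row-before-comma, column-after-comma rule can be sketched on a tiny hypothetical data frame `scores_df` (not part of the workspace). Note that selecting a single column returns a plain vector by default, whereas selecting a row returns a one-row data frame:

```r
# A small hypothetical data frame with 3 rows and 2 columns
scores_df <- data.frame(player = c("Ann", "Bob", "Cid"),
                        points = c(10, 7, 12),
                        stringsAsFactors = FALSE)

scores_df[2, 2]           # second row, second column: 7
scores_df[2, ]            # entire second row, still a data frame
scores_df[, "points"]     # entire column, simplified to a plain vector
scores_df[2:3, "player"]  # rows 2 and 3 of the "player" column
```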
411 | 412 | *** =pre_exercise_code 413 | ```{r} 414 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData")) 415 | ``` 416 | 417 | *** =sample_code 418 | ```{r} 419 | # planets_df is pre-loaded 420 | 421 | # The type of Mars: mars_type 422 | 423 | 424 | # Entire rotation column: rotation 425 | 426 | 427 | # First three planets: closest_planets_df 428 | 429 | 430 | # Last three planets: furthest_planets_df 431 | 432 | 433 | ``` 434 | 435 | *** =solution 436 | ```{r} 437 | # planets_df is pre-loaded 438 | 439 | # The type of Mars: mars_type 440 | mars_type <- planets_df[4, 2] 441 | 442 | # Entire rotation column: rotation 443 | rotation <- planets_df[ ,4] 444 | 445 | # First three planets: closest_planets_df 446 | closest_planets_df <- planets_df[1:3, ] 447 | 448 | # Last three planets: furthest_planets_df 449 | furthest_planets_df <- planets_df[6:8, ] 450 | ``` 451 | 452 | *** =sct 453 | ```{r} 454 | 455 | msg <- "Do not remove or overwrite the `planets_df` data frame!" 456 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg) 457 | 458 | test_object("mars_type", 459 | incorrect_msg = "Are you sure you correctly selected the type of Mars?") 460 | test_object("rotation", 461 | incorrect_msg = "Have another look at the command to define `rotation`. You'll want to select the fourth column.") 462 | test_object("closest_planets_df", 463 | incorrect_msg = "Did you select the first three rows of `planets_df` to create `closest_planets_df`?") 464 | test_object("furthest_planets_df", 465 | incorrect_msg = "Make sure that you select the last three rows of `planets_df` to build `furthest_planets_df`.") 466 | 467 | success_msg("Great! Feel free to have a look at the variables you've just created. 
Apart from selecting elements from your data frame by index, you can also use the column names.") 468 | ``` 469 | 470 | --- type:NormalExercise lang:r xp:100 skills:1 key:7aa0a94261 471 | ## Only planets with rings (1) 472 | 473 | You will often want to select an entire column, that is, one specific variable from a data frame. If you want to select the column `diameter` from `planets_df`, you can use any one of the following: 474 | 475 | ``` 476 | planets_df[, 3] 477 | planets_df[, "diameter"] 478 | planets_df$diameter 479 | ``` 480 | 481 | *** =instructions 482 | - Make use of the `$` sign to create the variable `rings_vector` that contains the entire `has_rings` column in the `planets_df` data frame. 483 | - Print `rings_vector`; it should be a vector. 484 | 485 | *** =hint 486 | `my_df$col_name` is the most convenient way to select a column from a data frame. In this case, the data frame is `planets_df` and the variable is `has_rings`. 487 | 488 | *** =pre_exercise_code 489 | ```{r} 490 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData")) 491 | ``` 492 | 493 | *** =sample_code 494 | ```{r} 495 | # planets_df is pre-loaded in your workspace 496 | 497 | # Create rings_vector 498 | 499 | 500 | # Print rings_vector 501 | 502 | ``` 503 | 504 | *** =solution 505 | ```{r} 506 | # planets_df is pre-loaded in your workspace 507 | 508 | # Create rings_vector 509 | rings_vector <- planets_df$has_rings 510 | 511 | # Print rings_vector 512 | rings_vector 513 | ``` 514 | 515 | *** =sct 516 | ```{r} 517 | msg <- "Do not remove or overwrite the `planets_df` data frame!" 
518 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg) 519 | 520 | test_object("rings_vector", incorrect_msg = "It looks like `rings_vector` does not contain all the elements of the `has_rings` variable of `planets_df`.") 521 | 522 | test_output_contains("rings_vector", incorrect_msg = "Don't forget to print `rings_vector`!") 523 | 524 | success_msg("Great! Continue to the next exercise and discover yet another way of subsetting!") 525 | ``` 526 | 527 | 528 | --- type:NormalExercise lang:r xp:100 skills:1 key:d6245c1bb1 529 | ## Only planets with rings (2) 530 | 531 | You probably remember from high school that some planets in our solar system have rings and others do not. But due to other priorities at that time (read: puberty), you cannot recall their names, let alone their rotation speed, etc. Could R help you out? 532 | 533 | The `rings_vector` you've coded before is a logical vector. It's `TRUE` when the corresponding planet has rings and `FALSE` when it doesn't. To select those observations from `planets_df` that have rings, you can use `rings_vector` to perform subsetting by logicals! 534 | 535 | To subset observations (rows) by logicals, put the logical vector, followed by a comma, inside square brackets. Leaving the spot after the comma empty keeps all columns: 536 | 537 | ``` 538 | df[logical_vector, ] 539 | ``` 540 | 541 | *** =instructions 542 | - Assign to `planets_with_rings_df` all data in the `planets_df` data set for the planets with rings, that is, where `rings_vector` is `TRUE`. 543 | - Print `planets_with_rings_df`. 544 | 545 | *** =hint 546 | Select elements from `planets_df` by using the square brackets. The `rings_vector` contains logical values and R will select only those rows/columns where the vector element is `TRUE`. In this case, you want to select rows based on `rings_vector` and select all the columns. 
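Where you put the logical vector matters. Here is a minimal sketch on a hypothetical `pets_df` (not part of the workspace): before the comma it filters rows, after the comma it filters columns:

```r
# A hypothetical data frame with a logical column
pets_df <- data.frame(pet = c("cat", "dog", "fish"),
                      has_fur = c(TRUE, TRUE, FALSE),
                      stringsAsFactors = FALSE)
fur_vector <- pets_df$has_fur

# Logical vector BEFORE the comma: keep the rows where it is TRUE,
# and (empty spot after the comma) all columns
pets_df[fur_vector, ]

# Logical vector AFTER the comma selects columns instead:
# here it keeps only the first column, "pet"
pets_df[, c(TRUE, FALSE)]
```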
547 | 548 | *** =pre_exercise_code 549 | ```{r} 550 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData")) 551 | ``` 552 | 553 | *** =sample_code 554 | ```{r} 555 | # planets_df pre-loaded in your workspace 556 | 557 | # Create rings_vector 558 | rings_vector <- planets_df$has_rings 559 | 560 | # Select the information on planets with rings: planets_with_rings_df 561 | 562 | 563 | # Print planets_with_rings_df 564 | ``` 565 | 566 | *** =solution 567 | ```{r} 568 | # planets_df pre-loaded in your workspace 569 | 570 | # Create rings_vector 571 | rings_vector <- planets_df$has_rings 572 | 573 | # Select the information on planets with rings: planets_with_rings_df 574 | planets_with_rings_df <- planets_df[rings_vector,] 575 | 576 | # Print planets_with_rings_df 577 | planets_with_rings_df 578 | ``` 579 | 580 | *** =sct 581 | ```{r} 582 | 583 | msg <- "Do not remove or overwrite `planets_df` or `rings_vector`!" 584 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg) 585 | test_object("rings_vector", undefined_msg = msg, incorrect_msg = msg) 586 | 587 | test_object("planets_with_rings_df", 588 | incorrect_msg = "It looks like `planets_with_rings_df` does not contain all the data of the planets with rings. Make sure to not specify any column selector, to keep all columns.") 589 | 590 | test_output_contains("planets_with_rings_df", 591 | incorrect_msg = "Don't forget to print `planets_with_rings_df`!") 592 | success_msg("Nice work, but this is a rather tedious solution. The next exercise will teach you how to do it in a more concise way.") 593 | ``` 594 | 595 | 596 | --- type:NormalExercise lang:r xp:100 skills:1 key:c1a08e245c 597 | ## Only planets with rings but shorter 598 | 599 | So what exactly did you learn in the previous exercises? 
You selected a subset from a data frame (`planets_df`) based on whether or not a certain condition was true (rings or no rings), and you managed to pull out all relevant data. Pretty awesome! By now, NASA is probably already flirting with your CV! 600 | 601 | Instead of first defining a vector `rings_vector` and then using it to subset `planets_df`, you could also have used either of these: 602 | 603 | ``` 604 | planets_df[planets_df$has_rings, ] 605 | planets_df[planets_df$has_rings == TRUE, ] 606 | ``` 607 | 608 | *** =instructions 609 | - Create a data frame `small_planets_df` with the planets that have a smaller diameter than Earth. This means that the `diameter` variable should be smaller than 1, since diameter is measured relative to Earth's diameter. 610 | - Build another data frame, `slow_planets_df`, with the observations that have a longer rotation period than Earth. This means that the absolute value (use the function [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun)) of the `rotation` variable should be greater than 1; negative values indicate retrograde rotation, as for Venus and Uranus. 611 | 612 | *** =hint 613 | Make use of the comparison operators `>` and `<`. Use the [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun) function for absolute values. Alternatively, the [`subset()`](http://www.rdocumentation.org/packages/base/functions/subset) function lets you refer to the columns directly by name, as in `subset(planets_df, subset = has_rings)`. 
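The bracket approach and the `subset()` function can be compared on a small hypothetical `pets_df` (not part of the workspace). With brackets you spell out `pets_df$weight`, while `subset()` looks up `weight` inside the data frame itself:

```r
# A hypothetical data frame
pets_df <- data.frame(pet = c("cat", "dog", "fish"),
                      weight = c(4, 20, 0.1),
                      stringsAsFactors = FALSE)

# Square brackets: the condition has to spell out pets_df$weight
light_1 <- pets_df[pets_df$weight < 5, ]

# subset(): weight is looked up as a column inside pets_df
light_2 <- subset(pets_df, subset = weight < 5)

light_2  # the rows for "cat" and "fish"
```

One difference worth knowing is how missing values are handled. `subset()` drops rows for which the condition is `NA`, whereas square-bracket subsetting with an `NA` in the logical vector returns a row filled with `NA`s.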
614 | 615 | *** =pre_exercise_code 616 | ```{r} 617 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData")) 618 | ``` 619 | 620 | *** =sample_code 621 | ```{r} 622 | # planets_df is pre-loaded in your workspace 623 | 624 | # Planets that are smaller than planet Earth: small_planets_df 625 | 626 | 627 | # Planets that rotate slower than planet Earth: slow_planets_df 628 | 629 | ``` 630 | 631 | *** =solution 632 | ```{r} 633 | # planets_df is pre-loaded in your workspace 634 | 635 | # Planets that are smaller than planet Earth: small_planets_df 636 | small_planets_df <- planets_df[planets_df$diameter < 1, ] # option 1 637 | small_planets_df <- subset(planets_df, subset = diameter < 1) # option 2 638 | 639 | # Planets that rotate slower than planet Earth: slow_planets_df 640 | slow_planets_df <- planets_df[abs(planets_df$rotation) > 1, ] # option 1 641 | slow_planets_df <- subset(planets_df, subset = abs(rotation) > 1) # option 2 642 | ``` 643 | 644 | *** =sct 645 | ```{r} 646 | 647 | msg <- "Do not remove or overwrite the `planets_df` data frame!" 648 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg) 649 | 650 | test_object("small_planets_df", 651 | incorrect_msg = "It looks like `small_planets_df` does not contain the correct subset of `planets_df`.") 652 | 653 | test_object("slow_planets_df", 654 | incorrect_msg = "It looks like `slow_planets_df` does not contain the correct subset of `planets_df`. Make sure to use the [`abs()`](http://www.rdocumentation.org/packages/base/functions/MathFun) function for absolute values.") 655 | 656 | success_msg("Great! Not only is the [`subset()`](http://www.rdocumentation.org/packages/base/functions/subset) function more concise, it is probably also more understandable for people who read your code. 
Continue to the next exercise.") 657 | ``` 658 | 659 | 660 | --- type:NormalExercise lang:r xp:100 skills:1 key:e9ca3eeb99 661 | ## Add variable/column 662 | 663 | There are many cases in which you'll want to add more variables to your data frame. This comes down to adding a column to the data frame. The same techniques that you use to select columns from a data frame can be used here. To add a column `new_column` to `my_df`, with data from `my_vec`, you can use one of the following calls: 664 | 665 | ``` 666 | my_df$new_column <- my_vec 667 | my_df[["new_column"]] <- my_vec 668 | my_df <- cbind(my_df, new_column = my_vec) 669 | ``` 670 | 671 | You've browsed [Wikipedia](https://en.wikipedia.org/wiki/Planet) and decided to add a column that lists the number of moons each planet has. The planets' masses would also be a cool addition. The `moons` and `masses` vectors are defined in the sample code; it's up to you to add them to `planets_df`. 672 | 673 | *** =instructions 674 | - Add `moons` to `planets_df` under the variable name `"moon"`. 675 | - In a similar fashion, add `masses` under the variable name `"mass"`. 676 | 677 | *** =hint 678 | To add a new column called `"moon"`, you can use: `planets_df$moon <- moons`.
679 | 680 | *** =pre_exercise_code 681 | ```{r} 682 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData")) 683 | ``` 684 | 685 | *** =sample_code 686 | ```{r} 687 | # planets_df is already pre-loaded in your workspace 688 | 689 | # Definition of moons and masses 690 | moons <- c(0, 0, 1, 2, 67, 62, 27, 14) 691 | masses <- c(0.06, 0.82, 1.00, 0.11, 317.8, 95.2, 14.6, 17.2) 692 | 693 | # Add moons to planets_df under the name "moon" 694 | 695 | 696 | # Add masses to planets_df under the name "mass" 697 | 698 | ``` 699 | 700 | *** =solution 701 | ```{r} 702 | # planets_df is already pre-loaded in your workspace 703 | 704 | # Definition of moons and masses 705 | moons <- c(0, 0, 1, 2, 67, 62, 27, 14) 706 | masses <- c(0.06, 0.82, 1.00, 0.11, 317.8, 95.2, 14.6, 17.2) 707 | 708 | # Add moons to planets_df under the name "moon" 709 | planets_df$moon <- moons 710 | 711 | # Add masses to planets_df under the name "mass" 712 | planets_df$mass <- masses 713 | ``` 714 | 715 | *** =sct 716 | ```{r} 717 | 718 | undef_msg <- "Do not remove `planets_df`!" 719 | msg <- "Do not change anything about the columns that were already in `planets_df`; you should only add columns." 720 | test_data_frame(name = "planets_df", 721 | columns = c("name", "type", "diameter", "rotation", "has_rings"), 722 | undefined_msg = undef_msg, undefined_cols_msg = msg, incorrect_msg = msg) 723 | 724 | test_data_frame(name = "planets_df", 725 | columns = "moon", 726 | undefined_cols_msg = "Make sure to name the column that contains the moon information \"moon\".", 727 | incorrect_msg = "The \"moon\" column does not contain the correct information. Try again.") 728 | 729 | test_data_frame(name = "planets_df", 730 | columns = "mass", 731 | undefined_cols_msg = "Make sure to name the column that contains the mass information \"mass\".", 732 | incorrect_msg = "The \"mass\" column does not contain the correct information.
Try again.") 733 | 734 | test_object("planets_df", incorrect_msg = "It appears that you have correctly specified the \"moon\" and \"mass\" columns, but there's still something wrong with the resulting `planets_df`. Make sure not to add any column twice!") 735 | success_msg("Nice one! This data frame is beginning to contain quite a bit of information!") 736 | ``` 737 | 738 | 739 | --- type:NormalExercise lang:r xp:100 skills:1 key:8e5ade7078 740 | ## Sorting 741 | 742 | In data analysis you will often sort your data according to a certain variable in the data set. In R, this is done with the help of the function [`order()`](http://www.rdocumentation.org/packages/base/functions/order). 743 | 744 | [`order()`](http://www.rdocumentation.org/packages/base/functions/order) is a function that, when applied to a variable such as a vector, returns the positions of its elements in the order that would sort them from smallest to largest: 745 | 746 | ``` 747 | a <- c(100,9,101) 748 | order(a) 749 | ``` 750 | 751 | This code returns the vector containing 2, 1 and 3: the smallest element, 9, is at position 2, the next one, 100, is at position 1, and the largest, 101, is at position 3. 752 | 753 | ``` 754 | a[order(a)] 755 | ``` 756 | 757 | will thus give you the ordered vector (9, 100, 101), since it first picks the second element of `a`, then the first and then the last. Got it? If you are not sure, use the console to play with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function. 758 | 759 | *** =instructions 760 | Experiment with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function in the console. Click 'Submit Answer' when you are ready to continue. 761 | 762 | *** =hint 763 | Just play with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function in the console!
764 | 765 | *** =pre_exercise_code 766 | ```{r} 767 | # no pec 768 | ``` 769 | 770 | *** =sample_code 771 | ```{r} 772 | # Just play around with the order function in the console to see how it works! 773 | ``` 774 | 775 | *** =solution 776 | ```{r} 777 | # Just play around with the order function in the console to see how it works! 778 | # Some examples: 779 | order(1:10) 780 | order(2:11) 781 | order(c(5,4,6,7)) 782 | ``` 783 | 784 | *** =sct 785 | ```{r} 786 | success_msg("Great! Now let's use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function to sort your data frame!") 787 | ``` 788 | 789 | 790 | --- type:NormalExercise lang:r xp:100 skills:1 key:ec87541ef1 791 | ## Sorting your data frame 792 | 793 | Alright, now let us do something useful with the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function! You would like to rearrange your data frame such that it starts with the smallest planet and ends with the largest one. A sort on the `diameter` column, if you will. 794 | 795 | Suppose you have a data frame `df`, with three columns `a`, `b` and `c`. The following code will print a version of `df` that is sorted on the column `a`. 796 | 797 | ``` 798 | pos <- order(df$a) 799 | df[pos, ] 800 | ``` 801 | 802 | *** =instructions 803 | - Assign to the variable `positions` the desired ordering for the new data frame that you will create in the next step. You can use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function for that. 804 | - Now create the data frame `smallest_first_df`, which contains the same information as `planets_df`, but with the planets in increasing order of diameter. Use the previously created variable `positions` as row indices inside square brackets to achieve this. 805 | - Print `smallest_first_df` to see what you've accomplished.
806 | 807 | *** =hint 808 | - `order(planets_df$diameter)` will give you the ordering of the `diameter` variable from smallest to largest. This is what you should assign to `positions`. 809 | - Then use the variable `positions` to select rows from the data frame `planets_df`: `planets_df[positions, ]`. 810 | 811 | *** =pre_exercise_code 812 | ```{r} 813 | load(url("http://s3.amazonaws.com/assets.datacamp.com/course/introduction_to_r/chapter6.RData")) 814 | ``` 815 | 816 | *** =sample_code 817 | ```{r} 818 | # planets_df is pre-loaded in your workspace 819 | 820 | # Create a desired ordering for planets_df: positions 821 | 822 | 823 | # Create a new, ordered data frame: smallest_first_df 824 | 825 | 826 | # Print smallest_first_df 827 | ``` 828 | 829 | *** =solution 830 | ```{r} 831 | # planets_df is pre-loaded in your workspace 832 | 833 | # Create a desired ordering for planets_df: positions 834 | positions <- order(planets_df$diameter) 835 | 836 | # Create a new, ordered data frame: smallest_first_df 837 | smallest_first_df <- planets_df[positions, ] 838 | 839 | # Print smallest_first_df 840 | smallest_first_df 841 | ``` 842 | 843 | *** =sct 844 | ```{r} 845 | msg <- "Do not remove or overwrite the `planets_df` data frame!" 846 | test_object("planets_df", undefined_msg = msg, incorrect_msg = msg) 847 | test_object("positions", 848 | incorrect_msg = "It looks like `positions` does not contain the correct ordering of the diameter column.") 849 | test_object("smallest_first_df", 850 | incorrect_msg = "It looks like `smallest_first_df` does not contain the rows of `planets_df` in the order given by `positions`.") 851 | test_output_contains("smallest_first_df", incorrect_msg = "Finish off by printing `smallest_first_df`.") 852 | success_msg("Wonderful! What does the resulting data frame look like?
Order prevailed!") 853 | ``` 854 | 855 | -------------------------------------------------------------------------------- /course.yml: -------------------------------------------------------------------------------- 1 | id: 732 2 | title: Introduction to R (beta) 3 | author_field: Filip Schouwenaars 4 | university: DataCamp 5 | author_bio: Next to being the main developer of DataCamp's interactive courses, Filip 6 | is responsible for everything related to R. Under the motto 'Eat your own dog food', 7 | he leverages the techniques DataCamp teaches its students to perform data analysis 8 | for DataCamp. Filip holds degrees in Electrical Engineering and Artificial Intelligence. 9 |

This course is a rework of the earlier introduction to R course, 10 | built by Jonathan Cornelissen and Martijn Theuwissen, co-founders of DataCamp. 11 | description: With over 2 million users worldwide R is rapidly becoming the leading 12 | programming language in statistics and data science. Every year, the number of R 13 | users grows by 40%, and an increasing number of organizations are using it in their 14 | day-to-day activities.
In this introduction to R, you will master the basics 15 | of this beautiful open source language such as factors, lists and data frames. With 16 | the knowledge gained in this course, you will be ready to undertake your first very 17 | own data analysis. 18 | chapters: 19 | chapter1.Rmd: 1720 20 | chapter2.Rmd: 1721 21 | chapter3.Rmd: 1722 22 | chapter4.Rmd: 1723 23 | chapter5.Rmd: 1724 24 | chapter6.Rmd: 1725 25 | 26 | -------------------------------------------------------------------------------- /datasets/chapter5.R: -------------------------------------------------------------------------------- 1 | # Create shining_list 2 | actors_vector <- c("Jack Nicholson","Shelley Duvall","Danny Lloyd","Scatman Crothers","Barry Nelson") 3 | reviews_factor <- factor(c("Good", "OK", "Good", "Perfect", "Bad", "Perfect", "Good"), 4 | ordered = TRUE, levels = c("Bad", "OK", "Good", "Perfect")) 5 | shining_list <- list(title = "The Shining", actors = actors_vector, reviews = reviews_factor) 6 | rm(actors_vector, reviews_factor) 7 | save(shining_list, file = "datasets/chapter5.RData") -------------------------------------------------------------------------------- /datasets/chapter5.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacamp/courses-intro-to-r-beta/b028fe1f2dcdc1eba49e8e4135f6b061ae7dd394/datasets/chapter5.RData -------------------------------------------------------------------------------- /datasets/chapter6.R: -------------------------------------------------------------------------------- 1 | planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune") 2 | type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 3 | "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant") 4 | diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883) 5 | rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67) 6 | rings <- 
c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE) 7 | type_factor <- factor(type) 8 | planets_df <- data.frame(planets, type_factor, diameter, rotation, rings, stringsAsFactors = FALSE) 9 | names(planets_df) <- c("name", "type", "diameter", "rotation", "has_rings") 10 | rm(planets, type, diameter, rotation, rings, type_factor) 11 | save(planets_df, file = "datasets/chapter6.RData") -------------------------------------------------------------------------------- /datasets/chapter6.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacamp/courses-intro-to-r-beta/b028fe1f2dcdc1eba49e8e4135f6b061ae7dd394/datasets/chapter6.RData -------------------------------------------------------------------------------- /refguides/chapter1_refguide.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: html_document 3 | --- 4 | 5 | ## R: The true basics 6 | 7 | 8 | ### R as calculator 9 | 10 | After R is started, a console is waiting for your input. At the prompt (`>`), you can enter numbers and perform calculations. 11 | 12 | ```{r} 13 | 3*2 14 | ``` 15 | A few arithmetic operators are: 16 | 17 | - Addition: `+` 18 | - Subtraction: `-` 19 | - Multiplication: `*` 20 | - Division: `/` 21 | - Exponentiation: `^` 22 | - Modulo: `%%` 23 | 24 | 25 | ### Variable assignment and Operations 26 | 27 | You can assign values to variables with the assignment operator `<-`. Just typing the variable by itself at the prompt will print out the value. 28 | 29 | ```{r} 30 | x <- 3 31 | x 32 | y <- 9 33 | y 34 | ``` 35 | 36 | You can also perform arithmetic operations with variables. Look at the result of multiplying `x` and `y`, which we defined previously: 37 | 38 | ```{r} 39 | y * x 40 | ``` 41 | 42 | As you work in R and create new variables, it can be easy to lose track of what variables you have defined.
To get a list of all the variables that have been defined, use [`ls()`](http://www.rdocumentation.org/packages/base/functions/ls). And if you need to remove variables, you can use `rm()`. 43 | 44 | 45 | ### Comment your code 46 | 47 | Adding comments to your code helps others understand it. Comments in R are ignored by the parser. Any text after the `#` character on the same line is treated as a comment, unless the `#` character is inside a quoted string. For example: 48 | 49 | ```{r} 50 | x <- 24 # this is a comment 51 | y <- " #... but this is not." 52 | ``` 53 | 54 | 55 | ## Basic Data Types 56 | 57 | R has several basic data types that you will use frequently. We will try to understand a few of them better by using the [`class()`](http://www.rdocumentation.org/packages/base/functions/class) function. 58 | 59 | - Decimal values are called numerics in R. You can perform arithmetic operations on them. 60 | ```{r} 61 | x <- 12.3 62 | x 63 | class(x) 64 | ``` 65 | 66 | - A special type of numeric is an integer. You can specify that a number is an integer by appending `L` to it: 67 | ```{r} 68 | y <- 3L 69 | y 70 | class(y) 71 | ``` 72 | 73 | Another way is to invoke the [`as.integer()`](http://www.rdocumentation.org/packages/base/functions/integer) function. 74 | 75 | ```{r} 76 | y <- as.integer(3) 77 | y 78 | class(y) 79 | ``` 80 | 81 | You can convert an integer value to a numeric value with [`as.numeric()`](http://www.rdocumentation.org/packages/base/functions/numeric). 82 | 83 | - A character object is used to represent string values in R. 84 | 85 | ```{r} 86 | z <- "Good morning!" 87 | z 88 | class(z) 89 | ``` 90 | 91 | You can convert objects into character values with the [`as.character()`](http://www.rdocumentation.org/packages/base/functions/character) function. 92 | 93 | - Another important data type is the logical type. A logical value is either `TRUE` or `FALSE`.
94 | 95 | ```{r} 96 | a <- TRUE 97 | a 98 | class(a) 99 | ``` 100 | 101 | You can also check the data type of a variable with one of the following `is.*()` functions. Each one returns a logical value, `TRUE` or `FALSE`. 102 | 103 | ```{r} 104 | is.numeric(x) # TRUE if x is numeric 105 | is.integer(y) # TRUE if y is an integer 106 | is.character(z) # TRUE if z is a character string 107 | ``` 108 | 109 | 110 | 111 | 112 | 113 | -------------------------------------------------------------------------------- /refguides/chapter2_refguide.Rmd: -------------------------------------------------------------------------------- 1 | 2 | ## Create and Name Vectors 3 | 4 | A vector is a sequence of data elements of the same basic type. You can have character, numeric, logical vectors and many more. To create one, you can use the [`c()`](http://www.rdocumentation.org/packages/base/functions/c) function. Here is a numeric vector: 5 | 6 | ```{r} 7 | c(12, 7, 4) 8 | ``` 9 | 10 | And a character one, assigned to the variable `types_water`. 11 | 12 | ```{r} 13 | types_water <- c("Fresh", "Brackish", "Seawater") 14 | ``` 15 | 16 | As in the previous chapter, you can verify whether `types_water` is a vector with 17 | 18 | ```{r} 19 | is.vector(types_water) 20 | ``` 21 | 22 | You can also name the elements of your vector with the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function or create a named vector to begin with. 23 | 24 | ```{r} 25 | numeric_vector <- c(12, 7, 4) 26 | name <- c("months/year", "days/week", "weeks/month") 27 | names(numeric_vector) <- name 28 | numeric_vector 29 | ``` 30 | 31 | In our example, `numeric_vector` contains three elements. To see how many elements your vector contains, use: 32 | 33 | ```{r} 34 | length(numeric_vector) 35 | ``` 36 | 37 | ### Important Note 38 | A vector can only contain elements of the same type. If you try to build a vector of different data types, R performs coercion.
It transforms all the elements to the same data type. 39 | 40 | ```{r} 41 | new_vector <- c("ice-cream", TRUE, 2) 42 | new_vector 43 | ``` 44 | 45 | You can verify that `new_vector` is now a character vector by invoking: 46 | 47 | ```{r} 48 | class(new_vector) 49 | ``` 50 | 51 | 52 | ## Vector Calculus 53 | 54 | Computations on vectors are performed in an element-wise fashion. Check it out. 55 | 56 | ```{r} 57 | vector1 <- c(5, 2, 39, 106) 58 | vector2 <- c(300, 5, 1, 0) 59 | vector3 <- vector1 + vector2 60 | vector3 61 | ``` 62 | 63 | You can use all the arithmetic operations you learned in Chapter 1! 64 | 65 | If you want to sum up all the elements of your vector, you can use 66 | 67 | ```{r} 68 | sum(vector3) 69 | ``` 70 | 71 | Moreover, you can use relational operators like `<` and `>` to compare two vectors. Remember that this comparison is also performed element-wise. 72 | 73 | ```{r} 74 | vector1 < vector2 75 | ``` 76 | 77 | ## Vector Subsetting 78 | 79 | Suppose you need to select an element of your vector. You can use `[...]`. 80 | 81 | ```{r} 82 | numeric_vector[1] 83 | ``` 84 | 85 | The number inside the brackets corresponds to the element you want to select; here, the first one has been selected. 86 | 87 | If you are dealing with a named vector, you can use the names of the elements to select them. 88 | 89 | ```{r} 90 | numeric_vector["months/year"] 91 | ``` 92 | 93 | If you need to select more than one element, you can use another vector! Take a minute to understand the following syntax: 94 | 95 | ```{r} 96 | numeric_vector[c(1,3)] 97 | ``` 98 | 99 | The order matters, too! 100 | 101 | ```{r} 102 | numeric_vector[c(3,1)] 103 | ``` 104 | 105 | If you want to select all but one element, use a negative index: 106 | 107 | ```{r} 108 | numeric_vector[-2] 109 | ``` 110 | 111 | Notice that the last and the [ante-penultimate](https://en.wiktionary.org/wiki/antepenultimate) examples give the same result! 
112 | 113 | Another way to subset a vector is to use a logical vector.
The logical vector should normally have the same length as the one you want to subset. Only the elements that correspond to `TRUE` will be kept. 114 | 115 | ```{r} 116 | numeric_vector[c(TRUE, FALSE, TRUE)] 117 | ``` 118 | 119 | Again we get the same result! If your logical vector is shorter than the vector you are subsetting, R recycles the logical vector that you passed until it has the same length as the one you subset. For further explanations about how recycling works, go to the videos. 120 | 121 | 122 | -------------------------------------------------------------------------------- /refguides/chapter3_refguide.Rmd: -------------------------------------------------------------------------------- 1 | ## Create and Name Matrices 2 | 3 | Matrices are not very different from vectors; both are data structures that store elements of the same type. 4 | 5 | A matrix is a 2-dimensional array consisting of rows and columns. Unlike a dataframe, whose columns can have different types, a matrix can only contain elements of a single type, so it can be considered a natural extension of a vector. 6 | You can build one easily with the function [`matrix()`](http://www.rdocumentation.org/packages/base/functions/matrix). 7 | 8 | ```{r, eval= F } 9 | matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) 10 | ``` 11 | 12 | Besides the `data`, you usually only need to specify the number of rows `nrow` and/or columns `ncol`. The argument `byrow` specifies whether the matrix is filled row-wise (`TRUE`) or column-wise (`FALSE`, the default). 13 | 14 | ```{r} 15 | my_matrix <- matrix(c(9,2,5, 1,3,4, 1,2,7), nrow = 3, ncol = 3, byrow = TRUE) 16 | my_matrix 17 | ``` 18 | 19 | 20 | By default, the rows and columns do not have names. You can change that with the `dimnames` argument: a list containing a vector of row names and a vector of column names, such as `dimnames = list(c("r1", "r2", ...), c("c1", "c2", ...))`, with one name per row and one per column.
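The `byrow` and `dimnames` arguments can be sketched as follows. This is a minimal example that reuses the values of `my_matrix`; the object name `named_matrix` and the names `r1` to `r3` and `c1` to `c3` are made up for illustration.

```{r}
# Same nine values as my_matrix, filled row by row, with illustrative names
named_matrix <- matrix(c(9,2,5, 1,3,4, 1,2,7), nrow = 3, ncol = 3, byrow = TRUE,
                       dimnames = list(c("r1", "r2", "r3"),   # row names
                                       c("c1", "c2", "c3")))  # column names
named_matrix

# With the default byrow = FALSE, the same values fill the matrix column by column
matrix(c(9,2,5, 1,3,4, 1,2,7), nrow = 3, ncol = 3)

# Names can be used to select elements: row "r2", column "c3" holds 4
named_matrix["r2", "c3"]
```

Comparing the two printed matrices is a quick way to see which filling order a given `byrow` value produces.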
21 | 22 | Since matrices are just several vectors put together, they can also be built by binding rows or columns together with [`rbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) or [`cbind()`](http://www.rdocumentation.org/packages/base/functions/cbind) instead of using the function [`matrix()`](http://www.rdocumentation.org/packages/base/functions/matrix). 23 | 24 | Under the hood, a matrix is an atomic vector with dimensions, so all of its elements must have the same type. If you combine matrices that do not all contain numbers, as seen in the first video exercise, R applies coercion and converts all elements to a common type (for example, character). To keep columns of different types, use a dataframe or a list instead. 25 | 26 | 27 | ## Subsetting Matrices 28 | Like any other data object, you can draw subsets from a matrix. Use square brackets `[]` on the matrix object and specify the row and the column you want to select. To select a single matrix element: 29 | 30 | ```{r} 31 | my_matrix[1, 2] 32 | ``` 33 | 34 | To select an entire row or column, specify only the row index or only the column index and leave the other one empty: 35 | 36 | ```{r} 37 | my_matrix[1,] 38 | my_matrix[, 2] 39 | ``` 40 | 41 | If you use a single index without a comma, R treats the matrix as a vector and returns the element at that position: 42 | 43 | ```{r} 44 | my_matrix[1] 45 | ``` 46 | 47 | Remark: R counts positions inside a matrix column by column: from the first row to the last row of the first column, then from the first row to the last row of the second column, and so on. 48 | 49 | Furthermore, you can select multiple elements by passing vectors of row and/or column indices. As seen in the video lectures, you can use the [`c()`](http://www.rdocumentation.org/packages/base/functions/c) function to select part of a row or a sub-matrix:
50 | 51 | ```{r} 52 | my_matrix[2, c(2, 3)] 53 | my_matrix[c(2, 3), c(2, 3)] 54 | ``` 55 | 56 | In a similar manner, matrices can be subset by row and column names instead of indices. 57 | Logical vectors can be used as well, one for the rows and one for the columns: 58 | 59 | ```{r} 60 | my_matrix[c(FALSE, FALSE, TRUE), c(FALSE, FALSE, TRUE)] 61 | ``` 62 | 63 | ## Matrix calculus 64 | 65 | R has two handy functions to sum up the values of the rows and columns: 66 | 67 | * [`rowSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) 68 | * [`colSums()`](http://www.rdocumentation.org/packages/base/functions/colSums) 69 | 70 | Of course, arithmetic operators work on matrices as well: 71 | 72 | * multiplying by a scalar (`*`) 73 | * any other operation (`/`, `+`, `-`) 74 | 75 | These operations are all performed element-wise. 76 | 77 | Remark: When you combine a matrix and a vector in a calculation, R automatically recycles the vector, filling it in column by column. Handle this very carefully, since R might recycle in a way you did not intend. Two matrices with different dimensions cannot be combined this way; R throws a "non-conformable arrays" error. 78 | 79 | 80 | 81 | 82 | 83 | 84 | -------------------------------------------------------------------------------- /refguides/chapter4_refguide.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: html_document 3 | --- 4 | ## Factors 5 | 6 | Factors are R's data type for categorical variables, which can take on only a limited number of distinct values. To define a variable as categorical, use the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor). 7 | 8 | What does R do? 9 | 10 | * it scans all values and uses the distinct ones as the levels of the factor
11 | * it sorts the levels alphabetically 12 | * it stores the values as integer codes that refer to the levels (handy in the case of long character strings) 13 | 14 | ## Rename factors 15 | 16 | If you want a different order of the levels, you have to specify it manually with the `levels` argument of the [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) function. 17 | 18 | ```{r, eval=FALSE} 19 | factor(my_var, levels = c("xy","xz","zy")) 20 | ``` 21 | 22 | The level names can be changed with the [`levels()`](http://www.rdocumentation.org/packages/base/functions/levels) function 23 | 24 | ```{r, eval=FALSE} 25 | levels(my_var) <- c("na_xy", "na_xz", "na_zy") 26 | ``` 27 | 28 | or by using the `labels` argument inside of the function [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor) 29 | 30 | ```{r, eval=F} 31 | factor(my_var, labels = c("na_xy","na_xz","na_zy")) 32 | 33 | ``` 34 | 35 | Remark: When renaming levels, you always have to give the new names in the same order as the existing levels. To avoid confusion and mistakes, it is best to specify both `levels` and `labels` inside [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor). 36 | 37 | 38 | ## Nominal vs Ordinal 39 | 40 | **Ordinal** variables have a natural order among their levels, whereas **nominal** variables do not have any such order. 41 | 42 | **Ordinal** variables are also created with [`factor()`](http://www.rdocumentation.org/packages/base/functions/factor), but with the argument `ordered` set to `TRUE`. 43 | Unless you specify the levels yourself, R orders them alphabetically.
44 | The order of an ordinal factor has consequences: 45 | * it is taken into account in comparisons and operations, such as `<` and `>` 46 | * it is displayed with `<` signs between the levels when the factor is printed 47 | 48 | For example: 49 | ```{r, eval=F} 50 | factor(my_ordinal, ordered = TRUE, levels = c(1, 2, 3)) 51 | ``` 52 | 53 | 54 | 55 | 56 | -------------------------------------------------------------------------------- /refguides/chapter5_refguide.Rmd: -------------------------------------------------------------------------------- 1 | 2 | ## Lists 3 | 4 | A list is a generic vector containing other objects. Unlike the elements of an atomic vector, these objects do not need to be of the same type. For example, a list could consist of a numeric vector, a logical value, a matrix, other lists, and so on. 5 | 6 | ```{r} 7 | my_family <- list("Ryan", "Mary", 3, TRUE) 8 | my_family 9 | ``` 10 | 11 | Components of lists may also be named. You can assign names to list elements with the [`names()`](http://www.rdocumentation.org/packages/base/functions/names) function or at the time of creation. 12 | 13 | ```{r} 14 | my_family <- list(father="Ryan", mother="Mary", siblings=3, divorced=TRUE) 15 | my_family 16 | ``` 17 | 18 | To check whether an object, `my_family` in our case, is a list, use: 19 | 20 | ```{r} 21 | is.list(my_family) 22 | ``` 23 | 24 | Finally, you can use `str()` to display the structure of your list. 25 | 26 | ```{r} 27 | str(my_family) 28 | ``` 29 | 30 | 31 | ## Subset and Extend Lists 32 | 33 | If you need to isolate parts of your list, you can use `[...]` and `[[...]]`. Indexing with `[...]`, as used to subset vectors, gives you a sublist, not the content inside the element. To retrieve the content, you need `[[...]]`, which accesses a single element at a time. 34 | 35 | ```{r} 36 | my_family[1] 37 | my_family[[1]] 38 | ``` 39 | 40 | To retrieve several elements of your list, pass a vector of indices to `[...]`:
41 | 42 | ```{r} 43 | my_family[c(1,3)] 44 | ``` 45 | 46 | If the list is named, its elements can be referred to by name instead of by numeric index. 47 | 48 | ```{r} 49 | my_family[["father"]] 50 | ``` 51 | 52 | Alternatively, you can use the `$` operator. 53 | ```{r} 54 | my_family$father 55 | ``` 56 | 57 | Another way to subset is with a logical vector of `TRUE` and `FALSE` values. 58 | 59 | ```{r} 60 | my_family[c(TRUE, FALSE, TRUE, FALSE)] 61 | ``` 62 | 63 | Adding new elements is easy: simply assign a value to a new name, and the element is added to the list. 64 | 65 | ```{r} 66 | grandparents <- c("Arthur","Josephin") 67 | my_family$grandparents <- grandparents 68 | my_family 69 | ``` 70 | 71 | or equivalently 72 | ```{r} 73 | my_family[["grandparents"]] <- grandparents 74 | ``` 75 | 76 | 77 | 78 | -------------------------------------------------------------------------------- /refguides/chapter6_refguide.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: html_document 3 | --- 4 | ## Explore dataframes 5 | 6 | Data sets 7 | 8 | * consist of observations 9 | * each described by several variables 10 | * stored in a dataframe 11 | 12 | Matrices, on the other hand, can only hold elements of a single type, and lists would require too much coding. 13 | 14 | What is a dataframe? 15 | 16 | * built specifically to store data 17 | * matrix form: with rows as observations and columns as variables 18 | * allows for columns of different types (logicals, numerics, characters), although all elements within one column share the same type 19 | 20 | How to create a dataframe? 21 | 22 | * import data from CSV files 23 | * import from a database (e.g. SQL) 24 | * import from other statistical software, etc. 25 | 26 | Remark: Dataframes are basically lists with one element for each column of the dataframe. Each element is a vector whose length is the number of observations, so all columns must have the same length.
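To see the remark above in practice, here is a minimal sketch with a small, made-up dataframe (`df`, `result` and the values are illustrative only). It checks that a dataframe really is a list with one element per column, and that columns of incompatible lengths are rejected.

```{r}
# A small dataframe built from three vectors of equal length
df <- data.frame(name = c("Mercury", "Venus", "Earth"),
                 diameter = c(0.382, 0.949, 1),
                 has_rings = c(FALSE, FALSE, FALSE))

is.list(df)   # TRUE: a dataframe is a list under the hood
length(df)    # 3: one list element per column
nrow(df)      # 3: every column holds one value per observation

# Columns whose lengths cannot be recycled to a common number of rows
# (here 3 and 2) are rejected with an error
result <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
class(result) # "try-error"
```

Note that `data.frame()` does recycle a column whose length divides the number of rows exactly (for example, lengths 4 and 2), which is why the failing example uses lengths 3 and 2.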
27 | 28 | In general, you can define a data frame inside R using the function [`data.frame()`](http://www.rdocumentation.org/packages/base/functions/data.frame). 29 | 30 | ```{r, eval = F} 31 | data.frame(..., row.names = NULL, check.rows = FALSE, check.names = TRUE, 32 | stringsAsFactors = default.stringsAsFactors()) 33 | ``` 34 | 35 | ## Subset a dataframe 36 | 37 | Because a dataframe is both list-like and matrix-like, you can use the subsetting syntax of both lists and matrices. 38 | 39 | With the matrix syntax, you use square brackets and choose a row and a column: 40 | 41 | ```{r, eval = FALSE} 42 | my_df[3, 2] 43 | ``` 44 | 45 | The indices can be column names as well! 46 | As with matrices, to keep only one row, specify the row you want and leave the column index empty. The same applies to keeping only one variable for all the observations: leave the row index empty. 47 | 48 | ```{r, eval=FALSE} 49 | my_df[3, ] # only the third row is selected 50 | my_df[ , 2] # only the second column is selected 51 | 52 | ``` 53 | 54 | This can be generalized to selecting several observations and several variables at once: 55 | 56 | ```{r, eval=FALSE} 57 | my_df[c(3,2), c(3,2)] 58 | ``` 59 | 60 | Remark: Unlike with matrices, selecting rows (or several columns) gives you a _new dataframe_, not a vector. Selecting a single column, as in `my_df[ , 2]`, does return a vector. 61 | 62 | How to use the list syntax to select elements? 63 | 64 | * Either by using the dollar sign (`$`) 65 | 66 | ```{r, eval= F} 67 | my_df$variable1 68 | ``` 69 | 70 | * Or by using double brackets (`[[...]]`) 71 | 72 | ```{r, eval = F} 73 | my_df[["variable1"]] 74 | ``` 75 | 76 | Remark: Both return the column as a vector. If you use single square brackets instead, as in `my_df["variable1"]`, you get a _new dataframe_ (which is itself a list) containing only that column.
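The difference between `$`, `[[...]]` and `[...]` can be checked with a minimal sketch. The dataframe `my_df` and its columns `variable1` and `variable2` are made up here, mirroring the placeholder names used above.

```{r}
# A made-up dataframe with two columns
my_df <- data.frame(variable1 = c(10, 20, 30),
                    variable2 = c("a", "b", "c"))

my_df$variable1            # numeric vector: 10 20 30
my_df[["variable1"]]       # the same vector; the column name is quoted
my_df["variable1"]         # single brackets: a one-column dataframe
class(my_df["variable1"])  # "data.frame"
```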
77 | 78 | ## Extend your dataframe 79 | 80 | You can extend your dataframe by adding a column: 81 | 82 | * by using the dollar sign (`$`) 83 | ```{r, eval=F} 84 | my_df$new_column <- new_column 85 | ``` 86 | 87 | * by using double square brackets (`[[...]]`) 88 | ```{r, eval=F} 89 | my_df[["new_column"]] <- new_column 90 | ``` 91 | 92 | * using [`cbind()`](http://www.rdocumentation.org/packages/marray/functions/cbind) 93 | ```{r, eval=FALSE} 94 | my_df <- cbind(my_df, new_column) 95 | ``` 96 | 97 | The dataframe can also be extended by adding rows. Since a row can hold elements of different types, you create a new dataframe with the same columns using [`data.frame()`](http://www.rdocumentation.org/packages/R.utils/functions/dataFrame) and combine the original one with the new one: 98 | 99 | * by using [`rbind()`](http://www.rdocumentation.org/packages/dplyr/functions/rbind) 100 | ```{r, eval=FALSE} 101 | my_df <- rbind(my_df, new_df) 102 | ``` 103 | 104 | ## Sort a dataframe 105 | 106 | For vectors, the function [`sort()`](http://www.rdocumentation.org/packages/arules/functions/sort) 107 | can be applied. To sort the rows of a data frame, however, use the [`order()`](http://www.rdocumentation.org/packages/base/functions/order) function. 108 | 109 | ```{r, eval= F} 110 | positions <- order(my_df$variable1) 111 | ``` 112 | 113 | The order function 114 | 115 | * returns a vector of positions (indices) that put the elements in sorted order 116 | * its first value is the position of the smallest element (or of the largest one, with `decreasing = TRUE`) 117 | * [`order()`](http://www.rdocumentation.org/packages/base/functions/order) can be used inside of a subset 118 | 119 | ```{r, eval=FALSE} 120 | my_df[order(my_df$variable1, decreasing = TRUE), ] 121 | ``` 122 | 123 | For more information, have a look at the exercises!
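The steps for extending and sorting can be combined in one sketch. The dataframe and its columns (`name`, `score`, `passed`, `bonus`, `group`) are hypothetical. Note that `cbind()` and `rbind()` return a new dataframe, so the result is reassigned to `my_df`.

```r
# Hypothetical starting dataframe
my_df <- data.frame(name  = c("Ann", "Bob", "Cid"),
                    score = c(72, 95, 58),
                    stringsAsFactors = FALSE)

# Add a column: three interchangeable ways
my_df$passed     <- my_df$score >= 60          # with $
my_df[["bonus"]] <- c(5, 0, 10)                # with [[ ]]
my_df <- cbind(my_df, group = c(1, 2, 1))      # with cbind(), reassigned

# Add a row: a one-row dataframe with the same column names, then rbind()
new_df <- data.frame(name = "Dee", score = 81, passed = TRUE,
                     bonus = 0, group = 2, stringsAsFactors = FALSE)
my_df <- rbind(my_df, new_df)

# order() returns the row positions that put `score` in increasing order
positions <- order(my_df$score)
positions                                       # 3 1 4 2

# Use those positions as the row index to sort the whole dataframe
my_df[order(my_df$score, decreasing = TRUE), ]  # highest score first
```

Because `order()` returns positions rather than sorted values, it can be passed straight into the row index of `[ , ]`, which reorders every column consistently.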
124 | -------------------------------------------------------------------------------- /refguides/chapter7_refguide.Rmd: -------------------------------------------------------------------------------- 1 | 2 | ## Basic Graphics 3 | 4 | One of the most frequently used plotting functions in R is [`plot()`](http://www.rdocumentation.org/packages/graphics/functions/plot). This is a generic function: the type of plot produced depends on the type or class of the argument(s). 5 | 6 | ```{r, eval=FALSE} 7 | x <- c(1, 2, 3, 4) 8 | plot(x) # this generates a plot of the values in the variable against their index 9 | 10 | x <- factor(c("Black", "White", "Green")) 11 | plot(x) # this generates a bar chart 12 | ``` 13 | 14 | ```{r, eval=FALSE} 15 | x <- c(1, 2, 3) 16 | y <- c(1, 2, 3) 17 | plot(x, y) # this generates a scatter plot 18 | 19 | x <- factor(c("Black", "White", "Green")) 20 | y <- c(1, 2, 3) 21 | plot(x, y) # this generates boxplots of y for each level of x 22 | 23 | x <- factor(c("Black", "White", "Green")) 24 | y <- factor(c("Left", "Right", "Centre")) 25 | plot(x, y) # this generates a spineplot, a kind of stacked bar chart 26 | ``` 27 | 28 | Histograms can be created using the [`hist()`](http://www.rdocumentation.org/packages/graphics/functions/hist) function. This function takes a numeric (continuous) variable, `x`, for which the histogram is plotted. 29 | 30 | ```{r, eval=FALSE} 31 | hist(x, breaks = 10) 32 | ``` 33 | 34 | With the `breaks` argument you can suggest the number of bins you want in the histogram; R treats a single number as a suggestion only and may choose a slightly different number of "pretty" breakpoints. 35 | 36 | You can also check out other graphics functions such as [`boxplot()`](http://www.rdocumentation.org/packages/graphics/functions/boxplot) and [`barplot()`](http://www.rdocumentation.org/packages/raster/functions/barplot). 37 | 38 | 39 | ## Customizing Plots 40 | 41 | Now you can customize your plot!
42 | 43 | ```{r,eval=FALSE} 44 | plot(x,y, 45 | xlab = " ", # changes the label of the horizontal axis 46 | ylab = " ", # changes the label of the vertical axis 47 | main = " ", # specifies the title of the plot 48 | type = " ", # specifies the type of the plot, e.g. lines, points, etc. 49 | col = " ") # specifies the color of the plot 50 | ``` 51 | 52 | Type `?par` in your console to take a peek at the graphical parameters you can specify. 53 | 54 | A few of them are: 55 | 56 | ```{r,eval=FALSE} 57 | plot(x,y, 58 | xlab = " ", 59 | ylab = " ", 60 | main = " ", 61 | type = " ", 62 | col = " ", 63 | col.main = " ", # specifies the color of the main title 64 | cex.axis = 1, # specifies the size of the axis annotation, relative to the default 65 | lty = 1, # specifies the line type 66 | pch = 1) # specifies the plot symbol 67 | ``` 68 | 69 | ### Important Note 70 | Since all the arguments are specified inside the [`plot()`](http://www.rdocumentation.org/packages/graphics/functions/plot) 71 | function, they are valid only for that specific plot. It is possible, though, to set the graphical parameters globally by using the [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par) function. 72 | 73 | 74 | ## Multiple Plots 75 | 76 | R makes it easy to combine multiple plots into one overall graph, using either the [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par) 77 | or [`layout()`](http://www.rdocumentation.org/packages/graphics/functions/layout) function. 78 | 79 | With [`par()`](http://www.rdocumentation.org/packages/graphics/functions/par), you can include the option `mfrow` to create a grid of `nrows` by `ncols` plots that is filled in by row. 80 | 81 | ```{r,eval=FALSE} 82 | par(mfrow = c(nrows, ncols)) 83 | ``` 84 | 85 | If you use `mfcol`, it fills in the grid by columns.
86 | 87 | ```{r,eval=FALSE} 88 | par(mfcol = c(nrows, ncols)) 89 | ``` 90 | 91 | To go back to a single plot per figure, you can use: 92 | 93 | ```{r,eval=FALSE} 94 | par(mfrow = c(1, 1)) 95 | ``` 96 | 97 | Alternatively, save the current settings before changing them, 98 | 99 | ```{r,eval=FALSE} 100 | old_par <- par(no.readonly = TRUE) 101 | ``` 102 | 103 | and call `par(old_par)` when you need to restore them. 104 | 105 | Another way is to use the [`layout()`](http://www.rdocumentation.org/packages/graphics/functions/layout) function, which divides the plotting area into as many rows and columns as the matrix `mat` has; each entry of `mat` gives the number of the plot that fills that cell. 106 | 107 | ```{r,eval=FALSE} 108 | layout(mat, ...) 109 | ``` 110 | 111 | One more way to reset the layout to a single plot is to use: 112 | 113 | ```{r,eval=FALSE} 114 | layout(1) 115 | ``` 116 | 117 | 118 | To add more information to your plot, you can use the following functions. 119 | 120 | ```{r,eval=FALSE} 121 | plot(x, y) 122 | abline() # adds one or more straight lines 123 | lines() # adds lines (careful how to specify the arguments, watch video for more info) 124 | points() # adds points 125 | text() # adds text 126 | segments() # adds line segments between pairs of points 127 | ``` 128 | 129 | Take a look at the documentation to get more insight into these functions: [`abline()`](http://www.rdocumentation.org/packages/graphics/functions/abline), 130 | [`lines()`](http://www.rdocumentation.org/packages/graphics/functions/lines), 131 | [`points()`](http://www.rdocumentation.org/packages/graphics/functions/points), 132 | [`text()`](http://www.rdocumentation.org/packages/graphics/functions/text), 133 | and [`segments()`](http://www.rdocumentation.org/packages/graphics/functions/segments). 134 | 135 | 136 | 137 | -------------------------------------------------------------------------------- /scripts/chapter1_script.md: -------------------------------------------------------------------------------- 1 | ## chapter_1_1 script: R, the true basics 2 | 3 | Hi!
My name is Filip and I'm a data scientist at DataCamp. DataCamp is an online data science school. You'll take fun video lessons, like the one you're watching now, and solve interactive coding challenges, where you receive instant and detailed feedback. All this happens in the comfort of your browser, so you can immediately start learning the skill of the future. 4 | 5 | In this Introduction to R course, you will learn about the basics of R, as well as the most common data structures it uses to store data. By the end of this course, you will know how to create these data structures, manipulate them and perform calculations on them to get surprising insights. 6 | 7 | But first things first: the basics of R. R is often called the language for statistical computing, and it is one of the most popular languages for data science, used by tons of companies and universities around the globe in all sorts of fields. Optimizing a financial portfolio? Mapping marketing data? Analyzing outcomes of clinical trials? You name it, R can handle it. 8 | 9 | But why did R become so popular? Well, first of all, it's free to use! Next, R's visualization capabilities are top notch, making it easy to build beautiful plots. It's also easy to create so-called packages, which are extensions to R. R's very active community has created thousands of these packages for many different fields. Last but not least, R is an actual programming language, with a command-line interface for executing code. This is a big plus compared to other point-and-click programs out there. It might take some energy to fully get the hang of it, but fear not: DataCamp is here to help you master R in no time! Let's get started. 10 | 11 | An important component of R is the console. It's a place where you can execute R commands. In DataCamp's interactive interface, the console can be found here. Let's try to calculate the sum of 1 and 2. We simply type 1 + 2 at the prompt in the console and hit Enter.
R interprets what you typed and prints the result. 12 | 13 | R is more than a scientific calculator, though. You can also create so-called variables. A variable allows you to store data in R for later use. You can use the less than sign followed by a dash to create a variable. Suppose the height of a rectangle is 2. Let's assign this value 2 to a variable height. In the console, we type height, less than sign, dash, 2: 14 | 15 | This time, R does not print anything, because it assumes that you will be using this variable in the future. If you now simply type and execute height in the console, R returns 2: 16 | 17 | We can do a similar thing for the width of our imaginary rectangle. We assign the value 4 to a variable width. 18 | 19 | Typing width gives us 4, great. 20 | 21 | As you're assigning variables in the R console, you're actually building up the R workspace. It's the place where R variables 'live'. You can list all variables with the `ls()` function. Simply type ls followed by empty parentheses and hit Enter. 22 | 23 | This shows you a list of all the variables you have created up to now. There are two objects in your workspace at the moment, height and width. If we try to access a variable that's not in the workspace, depth for example, R throws an error. 24 | 25 | Suppose you now want to find out the area of our imaginary rectangle, which is height multiplied by width. height equals 2, and width equals 4, so the result is 8. Let's also assign this result to a new variable, area. 26 | 27 | Inspecting the workspace again with ls shows that the workspace now contains three objects: area, height and width. 28 | 29 | Now, this is all great, but what if you want to recalculate the area of your imaginary rectangle when the height is 3 and the width is 6? You'd have to reassign the variables width and height in the console, and then recalculate the area. That's quite some coding you'd have to redo, isn't it? 30 | 31 | This is the place where R scripts come in!
An R script is simply a text file with successive lines of R code. Let's create such a script, "rectangle.R", that contains the code that we've written up to now. 32 | 33 | Next, you can run this script. In the DataCamp interface, you can do this with the 'Submit Answer' button. R goes through your code, line by line, executing every command one by one in the console, just as if you were typing each command yourself. The cool thing is that if you want to change your code, you can simply adapt your script and run it again. Let's change the height to 3 and the width to 6, and rerun the script. The variables are given different values this time, and the output changes accordingly. 34 | 35 | Now it's time for some interactive exercises! Use the console for experimentation, and the R script editor for coding the actual answer. When you hit Submit Answer, your script will be executed and checked for correctness. DataCamp's tailored feedback will guide you to R mastery! --------------------------------------------------------------------------------