├── .gitignore ├── README.md ├── chapter1.md ├── chapter2.md ├── chapter3.md ├── chapter4.md ├── course.yml ├── courses-introduction-to-python.Rproj ├── datasets ├── baseball.csv ├── fifa.csv └── references.md ├── img └── shield_image.png ├── intro-to-python-keynotes.zip ├── requirements.sh ├── scripts ├── chapter1_script.md ├── chapter2_script.md ├── chapter3_script.md └── chapter4_script.md └── slides ├── ch4_slides.pdf ├── chapter_1_433dcfcfedaee070cbf440491c402e3b.md ├── chapter_1_d8fcd4c930027fa4e1c3870c7e7e0ff1.md ├── chapter_2_355ed52d2fb0d67508c6a311b7cbc6d3.md ├── chapter_2_a0530c4542f10988847b2dbb91f717c3.md ├── chapter_2_fc15ba5cb9485456df8589130b519ea3.md ├── chapter_3_1204d914b0e53100529827e07441ee6c.md ├── chapter_3_8e387776f3a264a745128b68aa8d8f83.md ├── chapter_3_cedcfb34350be8545599768f96695cdd.md ├── chapter_4_34495ba457d74296794d2a122c9b6e19.md ├── chapter_4_a0487c26210f6b71ea98f917734cea3a.md ├── chapter_4_ae3238dcc7feb9adecfee0c395fc8dc8.md └── timings.json /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user/* 2 | .Rproj.user 3 | .cache 4 | .DS_STORE 5 | .Rhistory 6 | *.html 7 | *.Rproj 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Intro to Python for Data Science 2 | 3 | - Teach: https://www.datacamp.com/teach/repositories/288 4 | - Campus: https://www.datacamp.com/courses/intro-to-python-for-data-science 5 | - Docs: https://instructor-support.datacamp.com 6 | 7 | This repository contains the source files for the interactive course "Intro to Python for Data Science", hosted at www.datacamp.com. Feel free to suggest improvements! 8 | 9 | Want to create your own DataCamp course? Everybody can teach on DataCamp! Visit https://www.datacamp.com/teach. 
10 | -------------------------------------------------------------------------------- /chapter1.md: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta: Chapter 1 3 | title: Python Basics 4 | description: >- 5 | An introduction to the basic concepts of Python. Learn how to use Python 6 | interactively and by using a script. Create your first variables and acquaint 7 | yourself with Python's basic data types. 8 | attachments: 9 | slides_link: 'https://projector-video-pdf-converter.datacamp.com/735/chapter1.pdf' 10 | free_preview: true 11 | lessons: 12 | - nb_of_exercises: 3 13 | title: Hello Python! 14 | - nb_of_exercises: 5 15 | title: Variables and Types 16 | --- 17 | 18 | ## Hello Python! 19 | 20 | ```yaml 21 | type: VideoExercise 22 | key: f644a48d5d 23 | xp: 50 24 | ``` 25 | 26 | `@projector_key` 27 | d8fcd4c930027fa4e1c3870c7e7e0ff1 28 | 29 | --- 30 | 31 | ## Your first Python code 32 | 33 | ```yaml 34 | type: NormalExercise 35 | key: bdc52f0e19 36 | lang: python 37 | xp: 100 38 | skills: 39 | - 2 40 | ``` 41 | 42 | It's time to run your first Python code! 43 | 44 | Head to the code and hit the run code button to see the output. 45 | 46 | `@instructions` 47 | - Hit the run code button to see the output of `print(5 / 8)`. 48 | 49 | `@hint` 50 | - Run the code first before submitting your answer so you have time to explore the output. 51 | 52 | `@pre_exercise_code` 53 | ```{python} 54 | 55 | ``` 56 | 57 | `@sample_code` 58 | ```{python} 59 | # Hit run code to see the output! 60 | print(5 / 8) 61 | ``` 62 | 63 | `@solution` 64 | ```{python} 65 | # Hit run code to see the output! 66 | print(5 / 8) 67 | ``` 68 | 69 | `@sct` 70 | ```{python} 71 | Ex().has_printout(0, not_printed_msg = "__JINJA__:Have you used `{{sol_call}}` to print out `5 / 8`?") 72 | success_msg("Great! 
On to the next one!") 73 | ``` 74 | 75 | --- 76 | 77 | ## Python as a calculator 78 | 79 | ```yaml 80 | type: NormalExercise 81 | key: 0f7c039428 82 | lang: python 83 | xp: 100 84 | skills: 85 | - 2 86 | ``` 87 | 88 | Python is perfectly suited to do basic calculations. It can do addition, subtraction, multiplication and division. 89 | 90 | The code in the script gives some examples. 91 | 92 | Now it's your turn to practice by writing some code yourself. 93 | 94 | `@instructions` 95 | - Print the result of subtracting `5` from `5` under `# Subtraction` using `print()`. 96 | - Print the result of multiplying `3` by `5` under `# Multiplication`. 97 | 98 | `@hint` 99 | - You'll need to use `print()` to generate an output. 100 | - You can subtract with `-` and multiply with `*`. 101 | 102 | `@pre_exercise_code` 103 | ```{python} 104 | 105 | ``` 106 | 107 | `@sample_code` 108 | ```{python} 109 | # Addition and division 110 | print(4 + 5) 111 | print(10 / 2) 112 | 113 | # Subtraction 114 | print() 115 | 116 | # Multiplication 117 | 118 | ``` 119 | 120 | `@solution` 121 | ```{python} 122 | # Addition and division 123 | print(4 + 5) 124 | print(10 / 2) 125 | 126 | # Subtraction 127 | print(5 - 5) 128 | 129 | # Multiplication 130 | print(3 * 5) 131 | ``` 132 | 133 | `@sct` 134 | ```{python} 135 | Ex().has_printout(0, not_printed_msg = "Have you used `print(4 + 5)` to print out the result of your sum?") 136 | 137 | Ex().has_printout(1, not_printed_msg = "Have you used `print(10 / 2)` to print out the result of your division?") 138 | 139 | Ex().has_printout(2, not_printed_msg = "Have you used `print(5 - 5)` to print out the result of your subtraction?") 140 | 141 | Ex().has_printout(3, not_printed_msg = "Have you used `print(3 * 5)` to print out the result of your multiplication?") 142 | 143 | success_msg("That's correct! 
Python can help you do the math, a characteristic that will be helpful for analysis as we grow our data skills.") 144 | ``` 145 | 146 | --- 147 | 148 | ## Variables and Types 149 | 150 | ```yaml 151 | type: VideoExercise 152 | key: c2e396792e 153 | xp: 50 154 | ``` 155 | 156 | `@projector_key` 157 | 433dcfcfedaee070cbf440491c402e3b 158 | 159 | --- 160 | 161 | ## Variable Assignment 162 | 163 | ```yaml 164 | type: NormalExercise 165 | key: 4bf65ad83e 166 | lang: python 167 | xp: 100 168 | skills: 169 | - 2 170 | ``` 171 | 172 | In Python, a variable allows you to refer to a value with a name. To create a variable `x` with a value of `5`, you use `=`, like this example: 173 | 174 | ``` 175 | x = 5 176 | ``` 177 | 178 | You can now use the name of this variable, `x`, instead of the actual value, `5`. 179 | 180 | Remember, `=` in Python means _assignment_, it doesn't test equality! Try it in the exercise by replacing `____` with your code. 181 | 182 | `@instructions` 183 | - Create a variable `savings` with the value of `100`. 184 | - Check out this variable by typing `print(savings)` in the script. 185 | 186 | `@hint` 187 | - Type `savings = 100` to create the variable `savings`. 188 | - After creating the variable `savings`, you can type `print(savings)`. 189 | - Your final code should not include any `____`. 190 | 191 | `@pre_exercise_code` 192 | ```{python} 193 | 194 | ``` 195 | 196 | `@sample_code` 197 | ```{python} 198 | # Create a variable savings 199 | ____ 200 | 201 | # Print out savings 202 | ____ 203 | ``` 204 | 205 | `@solution` 206 | ```{python} 207 | # Create a variable savings 208 | savings = 100 209 | 210 | # Print out savings 211 | print(savings) 212 | ``` 213 | 214 | `@sct` 215 | ```{python} 216 | Ex().check_object("savings").has_equal_value(incorrect_msg="Assign `100` to the variable `savings`.") 217 | Ex().has_printout(0, not_printed_msg = "Print out `savings`, the variable you created, with `print(savings)`.") 218 | success_msg("Great! 
Let's try to do some calculations with this variable now!") 219 | ``` 220 | 221 | --- 222 | 223 | ## Calculations with variables 224 | 225 | ```yaml 226 | type: NormalExercise 227 | key: ff06cedeb4 228 | lang: python 229 | xp: 100 230 | skills: 231 | - 2 232 | ``` 233 | 234 | You've now created a savings variable, so let's start saving! 235 | 236 | Instead of calculating with the actual values, you can use variables instead. 237 | 238 | How much money would you have saved four months from now, if you saved $10 each month? 239 | 240 | `@instructions` 241 | - Create a variable `monthly_savings`, equal to `10` and `num_months`, equal to `4`. 242 | - Multiply `monthly_savings` by `num_months` and assign it to `new_savings`. 243 | - Print the value of `new_savings`. 244 | 245 | `@hint` 246 | - You can do calculations with variables the same way as with numbers so instead of `10 * 4`, replace the numbers with the variables! 247 | - Use `print()` to see the amount in `new_savings`. 248 | - Take care to spell the variables correctly! 
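As a rough sketch of the same idea with different, made-up numbers: calculating with variables works just like calculating with the values themselves. The names `weekly_allowance`, `num_weeks` and `total_allowance` are hypothetical and are not part of this exercise.

```python
# Hypothetical variables, chosen only for illustration
weekly_allowance = 15
num_weeks = 3

# Multiplying the variables gives the same result as 15 * 3
total_allowance = weekly_allowance * num_weeks

print(total_allowance)  # prints 45
```

If you later change `weekly_allowance`, you only need to rerun the calculation; the formula itself stays the same.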
249 | 250 | `@pre_exercise_code` 251 | ```{python} 252 | 253 | ``` 254 | 255 | `@sample_code` 256 | ```{python} 257 | # Create the variables monthly_savings and num_months 258 | 259 | 260 | 261 | # Multiply monthly_savings and num_months 262 | new_savings = ____ 263 | 264 | # Print new_savings 265 | 266 | ``` 267 | 268 | `@solution` 269 | ```{python} 270 | # Create the variables monthly_savings and num_months 271 | monthly_savings = 10 272 | num_months = 4 273 | 274 | # Multiply monthly_savings and num_months 275 | new_savings = monthly_savings * num_months 276 | 277 | # Print new_savings 278 | print(new_savings) 279 | ``` 280 | 281 | `@sct` 282 | ```{python} 283 | Ex().check_object("monthly_savings").has_equal_value(incorrect_msg = "Did you save `10` to `monthly_savings` using `monthly_savings = 10`?") 284 | Ex().check_object("num_months").has_equal_value(incorrect_msg = "Did you save `4` to `num_months` using `num_months = 4`?") 285 | Ex().check_object("new_savings").has_equal_value(incorrect_msg = "Did you use the correct variables and symbols to multiply? Expected `monthly_savings * num_months` but got something else.") 286 | # Ex().check_object("total_savings").has_equal_value(incorrect_msg = "Did you use the correct variables and symbols to add? Expected `savings + new_savings` but got something else.") 287 | 288 | Ex().has_printout(0, not_printed_msg="Remember to print out `new_savings` at the end of your script.") 289 | 290 | success_msg("You have $40 in new savings!") 291 | ``` 292 | 293 | --- 294 | 295 | ## Other variable types 296 | 297 | ```yaml 298 | type: NormalExercise 299 | key: 006b48561f 300 | lang: python 301 | xp: 100 302 | skills: 303 | - 2 304 | ``` 305 | 306 | In the previous exercise, you worked with the integer Python data type: 307 | 308 | - `int`, or integer: a number without a fractional part. `savings`, with the value `100`, is an example of an integer. 
309 | 310 | Besides integers, there are three other very common data types: 311 | 312 | - `float`, or floating point: a number that has both an integer and fractional part, separated by a point. `1.1` is an example of a float. 313 | - `str`, or string: a type to represent text. You can use single or double quotes to build a string. 314 | - `bool`, or boolean: a type to represent logical values. It can only be `True` or `False` (the capitalization is important!). 315 | 316 | `@instructions` 317 | - Create a new float, `half`, with the value `0.5`. 318 | - Create a new string, `intro`, with the value `"Hello! How are you?"`. 319 | - Create a new boolean, `is_good`, with the value `True`. 320 | 321 | `@hint` 322 | - To create a variable in Python, use `=`. Make sure to wrap your string in single or double quotes. 323 | - Only two boolean values exist in Python: `True` and `False`. `TRUE`, `true`, `FALSE`, `false` and other versions will not be accepted. 324 | 325 | `@pre_exercise_code` 326 | ```{python} 327 | 328 | ``` 329 | 330 | `@sample_code` 331 | ```{python} 332 | # Create a variable half 333 | 334 | 335 | # Create a variable intro 336 | 337 | 338 | # Create a variable is_good 339 | 340 | ``` 341 | 342 | `@solution` 343 | ```{python} 344 | # Create a variable half 345 | half = 0.5 346 | 347 | # Create a variable intro 348 | intro = "Hello! How are you?" 349 | 350 | # Create a variable is_good 351 | is_good = True 352 | ``` 353 | 354 | `@sct` 355 | ```{python} 356 | Ex().check_object("half").has_equal_value(incorrect_msg = "Did you save the float `0.5` to `half`?") 357 | 358 | Ex().check_object("intro").has_equal_value(incorrect_msg = "Hmm, something is incorrect in your `intro` variable. Double check the spelling and make sure you've used quotation marks.") 359 | 360 | Ex().check_object("is_good").has_equal_value(incorrect_msg = "Did you capitalize the boolean value? 
Remember you don't need to use quotation marks here.") 361 | 362 | success_msg("Nice!") 363 | ``` 364 | 365 | --- 366 | 367 | ## Operations with other types 368 | 369 | ```yaml 370 | type: BulletExercise 371 | key: 4d0d83cc02 372 | xp: 100 373 | ``` 374 | 375 | Variables come in different types in Python. You can see the type of a variable by using `type()`. For example, to see type of `a`, execute: `type(a)`. 376 | 377 | Different types behave differently in Python. When you sum two strings, for example, you'll get different behavior than when you sum two integers or two booleans. 378 | 379 | Time for you to test this out. 380 | 381 | `@pre_exercise_code` 382 | ```{python} 383 | 384 | ``` 385 | 386 | *** 387 | 388 | ```yaml 389 | type: NormalExercise 390 | key: f4e91c4ae9 391 | xp: 50 392 | ``` 393 | 394 | `@instructions` 395 | - Add `savings` and `new_savings` and assign it to `total_savings`. 396 | - Use `type()` to print the resulting type of `total_savings`. 397 | 398 | `@hint` 399 | - Assign `savings + new_savings` to a new variable, `total_savings`. 400 | - To print the type of a variable `x`, use `print(type(x))`. 401 | 402 | `@sample_code` 403 | ```{python} 404 | savings = 100 405 | new_savings = 40 406 | 407 | # Calculate total_savings using savings and new_savings 408 | ____ 409 | print(total_savings) 410 | 411 | # Print the type of total_savings 412 | print(____) 413 | ``` 414 | 415 | `@solution` 416 | ```{python} 417 | savings = 100 418 | new_savings = 40 419 | 420 | # Calculate total_savings using savings and new_savings 421 | total_savings = savings + new_savings 422 | print(total_savings) 423 | 424 | # Print the type of total_savings 425 | print(type(total_savings)) 426 | ``` 427 | 428 | `@sct` 429 | ```{python} 430 | # predefined 431 | msg = "You don't have to change or remove the predefined variables." 
432 | 433 | Ex().multi( 434 | check_object('savings', missing_msg=msg).has_equal_value(incorrect_msg=msg), 435 | check_object('new_savings', missing_msg=msg).has_equal_value(incorrect_msg=msg) 436 | ) 437 | 438 | Ex().multi( 439 | check_object("total_savings").has_equal_value(incorrect_msg="Add `savings` and `new_savings` to create the `total_savings` variable."), 440 | has_printout(1, not_printed_msg = "__JINJA__:Use `{{sol_call}}` to print out the type of `total_savings`.") 441 | ) 442 | ``` 443 | 444 | *** 445 | 446 | ```yaml 447 | type: NormalExercise 448 | key: f54fbf9bd9 449 | xp: 50 450 | ``` 451 | 452 | `@instructions` 453 | - Calculate the sum of `intro` and `intro` and assign the result to `doubleintro`. 454 | - Print out `doubleintro`. Did you expect this? 455 | 456 | `@hint` 457 | - Assign `intro + intro` to a new variable, `doubleintro`. 458 | - To print a variable `x`, write `print(x)` in the script. 459 | 460 | `@sample_code` 461 | ```{python} 462 | intro = "Hello! How are you?" 463 | 464 | # Assign sum of intro and intro to doubleintro 465 | ____ 466 | 467 | # Print out doubleintro 468 | print(____) 469 | ``` 470 | 471 | `@solution` 472 | ```{python} 473 | intro = "Hello! How are you?" 474 | 475 | # Assign sum of intro and intro to doubleintro 476 | doubleintro = intro + intro 477 | 478 | # Print out doubleintro 479 | print(doubleintro) 480 | ``` 481 | 482 | `@sct` 483 | ```{python} 484 | # predefined 485 | msg = "You don't have to change or remove the predefined variables." 486 | 487 | Ex().check_object('intro', missing_msg=msg).has_equal_value(incorrect_msg=msg) 488 | 489 | Ex().multi( 490 | check_object("doubleintro").has_equal_value(incorrect_msg = "Have you stored the result of `intro + intro` in `doubleintro`?"), 491 | has_printout(0, not_printed_msg = "Don't forget to print out `doubleintro`.") 492 | ) 493 | 494 | success_msg("Nice. Notice how `intro + intro` causes `\"Hello! How are you?\"` and `\"Hello! 
How are you?\"` to be pasted together.") 495 | ``` 496 | -------------------------------------------------------------------------------- /chapter2.md: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta: Chapter 2 3 | title: Python Lists 4 | description: >- 5 | Learn to store, access, and manipulate data in lists: the first step toward 6 | efficiently working with huge amounts of data. 7 | attachments: 8 | slides_link: 'https://projector-video-pdf-converter.datacamp.com/735/chapter2.pdf' 9 | lessons: 10 | - nb_of_exercises: 4 11 | title: Python Lists 12 | - nb_of_exercises: 4 13 | title: Subsetting Lists 14 | - nb_of_exercises: 5 15 | title: Manipulating Lists 16 | --- 17 | 18 | ## Python Lists 19 | 20 | ```yaml 21 | type: VideoExercise 22 | key: a5886d213f 23 | xp: 50 24 | ``` 25 | 26 | `@projector_key` 27 | a0530c4542f10988847b2dbb91f717c3 28 | 29 | --- 30 | 31 | ## Create a list 32 | 33 | ```yaml 34 | type: NormalExercise 35 | key: e6c527bf41 36 | lang: python 37 | xp: 100 38 | skills: 39 | - 2 40 | ``` 41 | 42 | A list is a **compound data type**; you can group values together, like this: 43 | 44 | ``` 45 | a = "is" 46 | b = "nice" 47 | my_list = ["my", "list", a, b] 48 | ``` 49 | 50 | After measuring the height of your family, you decide to collect some information on the house you're living in. The areas of the different parts of your house are stored in separate variables in the exercise. 51 | 52 | `@instructions` 53 | - Create a list, `areas`, that contains the area of the hallway (`hall`), kitchen (`kit`), living room (`liv`), bedroom (`bed`) and bathroom (`bath`), in this order. Use the predefined variables. 54 | - Print `areas` with the `print()` function. 55 | 56 | `@hint` 57 | - You can use the variables that have already been created to build the list: `areas = [hall, kit, ...]`. 58 | - Make sure to use square brackets `[]` rather than parentheses `()`. 
59 | - You don't need to use quotation marks when storing variables within a list. 60 | - Type `print(areas)` to print out the list when submitting. 61 | 62 | `@pre_exercise_code` 63 | ```{python} 64 | 65 | ``` 66 | 67 | `@sample_code` 68 | ```{python} 69 | hall = 11.25 70 | kit = 18.0 71 | liv = 20.0 72 | bed = 10.75 73 | bath = 9.50 74 | 75 | # Create list areas 76 | 77 | 78 | # Print areas 79 | 80 | ``` 81 | 82 | `@solution` 83 | ```{python} 84 | hall = 11.25 85 | kit = 18.0 86 | liv = 20.0 87 | bed = 10.75 88 | bath = 9.50 89 | 90 | # Create list areas 91 | areas = [hall, kit, liv, bed, bath] 92 | 93 | # Print areas 94 | print(areas) 95 | ``` 96 | 97 | `@sct` 98 | ```{python} 99 | predef_msg = "Don't remove or edit the predefined variables!" 100 | areas_msg = "Define `areas` as the list containing all the area variables, in the correct order: `[hall, kit, liv, bed, bath]`. Watch out for typos. The list shouldn't contain anything else!" 101 | 102 | Ex().check_correct( 103 | has_printout(0, not_printed_msg = "__JINJA__:Have you used `{{sol_call}}` to print out the `areas` list at the end of your script?"), 104 | check_correct( 105 | check_object("areas").has_equal_value(incorrect_msg = areas_msg), 106 | multi( 107 | check_object('hall', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 108 | check_object('kit', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 109 | check_object('liv', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 110 | check_object('bed', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 111 | check_object('bath', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg) 112 | ) 113 | ) 114 | ) 115 | 116 | success_msg("Nice! 
A list is way better here, isn't it?") 117 | ``` 118 | 119 | --- 120 | 121 | ## Create lists with different types 122 | 123 | ```yaml 124 | type: NormalExercise 125 | key: 1702a8bcdc 126 | lang: python 127 | xp: 100 128 | skills: 129 | - 2 130 | ``` 131 | 132 | Although it's not really common, a list can also contain a mix of Python types including strings, floats, and booleans. 133 | 134 | You're now going to add the room names to your list, so you can easily see both the room name and size together. 135 | 136 | Some of the code has been provided for you to get you started. Pay attention here! `"bathroom"` is a string, while `bath` is a variable that represents the float `9.50` you specified earlier. 137 | 138 | `@instructions` 139 | - Finish the code that creates the `areas` list. Build the list so that the list first contains the name of each room as a string and then its area. In other words, add the strings `"hallway"`, `"kitchen"` and `"bedroom"` at the appropriate locations. 140 | - Print `areas` again; is the printout more informative this time? 141 | 142 | `@hint` 143 | - The first four elements of the list `areas` are coded as `["hallway", hall, "kitchen", kit, ...`. 144 | - A string will need to be in quotation marks `""`. 
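Here is a minimal sketch of a mixed-type list built from made-up data, to show the difference between a string and a variable inside a list. The names `weight` and `pet_info` are hypothetical and only serve as an illustration.

```python
# Hypothetical variable holding a float
weight = 4.5

# "weight" (with quotes) is a string; weight (no quotes) is the variable
pet_info = ["name", "Rex", "weight", weight, "vaccinated", True]

print(pet_info)  # prints ['name', 'Rex', 'weight', 4.5, 'vaccinated', True]
```

Notice that the printout shows the value `4.5` where the variable was used, while the quoted `"weight"` stays text.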
145 | 146 | `@pre_exercise_code` 147 | ```{python} 148 | 149 | ``` 150 | 151 | `@sample_code` 152 | ```{python} 153 | hall = 11.25 154 | kit = 18.0 155 | liv = 20.0 156 | bed = 10.75 157 | bath = 9.50 158 | 159 | # Adapt list areas 160 | areas = [____, hall, ____, kit, "living room", liv, ____, bed, "bathroom", bath] 161 | 162 | # Print areas 163 | ____ 164 | ``` 165 | 166 | `@solution` 167 | ```{python} 168 | hall = 11.25 169 | kit = 18.0 170 | liv = 20.0 171 | bed = 10.75 172 | bath = 9.50 173 | 174 | # Adapt list areas 175 | areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath] 176 | 177 | # Print areas 178 | print(areas) 179 | ``` 180 | 181 | `@sct` 182 | ```{python} 183 | objs = ["hall", "kit", "liv", "bed", "bath"] 184 | predef_msg = "Don't remove or edit the predefined variables!" 185 | areas_msg = "You didn't assign the correct value to `areas`. Have another look at the instructions. Make sure to place the room name before the variable containing the area each time. The order matters here! Watch out for typos." 186 | 187 | Ex().check_correct( 188 | check_object("areas").has_equal_value(incorrect_msg = areas_msg), 189 | multi([ check_object(obj, missing_msg = predef_msg).has_equal_value(incorrect_msg = predef_msg) for obj in objs]) 190 | ) 191 | 192 | Ex().has_printout(0, not_printed_msg = "__JINJA__:Have you used `{{sol_call}}` to print out the `areas` list at the end of your script?") 193 | 194 | success_msg("Nice! This list contains both strings and floats, but that's not a problem for Python!") 195 | ``` 196 | 197 | --- 198 | 199 | ## List of lists 200 | 201 | ```yaml 202 | type: NormalExercise 203 | key: 9158c577b0 204 | lang: python 205 | xp: 100 206 | skills: 207 | - 2 208 | ``` 209 | 210 | As a data scientist, you'll often be dealing with a lot of data, and it will make sense to group some of this data. 
211 | 212 | Instead of creating a list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists. 213 | 214 | Remember: `"hallway"` is a string, while `hall` is a variable that represents the float `11.25` you specified earlier. 215 | 216 | `@instructions` 217 | - Finish the list of lists so that it also contains the bedroom and bathroom data. Make sure you enter these in order! 218 | - Print out `house`; does this way of structuring your data make more sense? 219 | 220 | `@hint` 221 | - Add _sublists_ to the `house` list by adding `["bedroom", bed]` and `["bathroom", bath]` inside the square brackets. 222 | - Remember to include a comma `,` after each sublist. 223 | - To print a variable `x`, write `print(x)` on a new line. 224 | 225 | `@pre_exercise_code` 226 | ```{python} 227 | 228 | ``` 229 | 230 | `@sample_code` 231 | ```{python} 232 | hall = 11.25 233 | kit = 18.0 234 | liv = 20.0 235 | bed = 10.75 236 | bath = 9.50 237 | 238 | # House information as list of lists 239 | house = [["hallway", hall], 240 | ["kitchen", kit], 241 | ["living room", liv], 242 | ____, 243 | ____] 244 | 245 | # Print out house 246 | ____ 247 | ``` 248 | 249 | `@solution` 250 | ```{python} 251 | hall = 11.25 252 | kit = 18.0 253 | liv = 20.0 254 | bed = 10.75 255 | bath = 9.50 256 | 257 | # House information as list of lists 258 | house = [["hallway", hall], 259 | ["kitchen", kit], 260 | ["living room", liv], 261 | ["bedroom", bed], 262 | ["bathroom", bath]] 263 | 264 | # Print out house 265 | print(house) 266 | ``` 267 | 268 | `@sct` 269 | ```{python} 270 | predef_msg = "Don't remove or edit the predefined variables!" 271 | house_msg = "You didn't assign the correct value to `house`. Have another look at the instructions. Extend the list of lists so it incorporates a list for each pair of room name and room area. Mind the order and typos!" 
272 | 273 | Ex().check_correct( 274 | check_object("house").has_equal_value(incorrect_msg = house_msg), 275 | multi( 276 | check_object('hall', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 277 | check_object('kit', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 278 | check_object('liv', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 279 | check_object('bed', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 280 | check_object('bath', missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg) 281 | ) 282 | ) 283 | 284 | Ex().has_printout(0, not_printed_msg = "__JINJA__:Have you used `{{sol_call}}` to print out the contents of `house`?") 285 | 286 | success_msg("Great! Get ready to learn about list subsetting!") 287 | ``` 288 | 289 | --- 290 | 291 | ## Subsetting Lists 292 | 293 | ```yaml 294 | type: VideoExercise 295 | key: c076b5a69c 296 | xp: 50 297 | ``` 298 | 299 | `@projector_key` 300 | fc15ba5cb9485456df8589130b519ea3 301 | 302 | --- 303 | 304 | ## Subset and conquer 305 | 306 | ```yaml 307 | type: NormalExercise 308 | key: c3ce582e32 309 | lang: python 310 | xp: 100 311 | skills: 312 | - 2 313 | ``` 314 | 315 | Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list `x` and then selects "b" from it. Remember that this is the second element, so it has index 1. You can also use negative indexing. 316 | 317 | ``` 318 | x = ["a", "b", "c", "d"] 319 | x[1] 320 | x[-3] # same result! 321 | ``` 322 | 323 | Remember the `areas` list from before, containing both strings and floats? Its definition is already in the script. Can you add the correct code to do some Python subsetting? 324 | 325 | `@instructions` 326 | - Print out the second element from the `areas` list (it has the value `11.25`). 327 | - Subset and print out the last element of `areas`, being `9.50`. Using a negative index makes sense here! 
328 | - Select the number representing the area of the living room (`20.0`) and print it out. 329 | 330 | `@hint` 331 | - Use `x[1]` to select the second element of a list `x`. 332 | - Use `x[-1]` to select the last element of a list `x`. 333 | - Make sure to wrap your subsetting operations in a `print()` call. 334 | - The number representing the area of the living room is the 6th element in the list, so you'll need `[5]` here. `areas[4]` would give you the string `"living room"` instead! 335 | 336 | `@pre_exercise_code` 337 | ```{python} 338 | 339 | ``` 340 | 341 | `@sample_code` 342 | ```{python} 343 | # Create the areas list 344 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 345 | 346 | # Print out second element from areas 347 | print(areas[____]) 348 | 349 | # Print out last element from areas 350 | print(areas[____]) 351 | 352 | # Print out the area of the living room 353 | print(areas[____]) 354 | ``` 355 | 356 | `@solution` 357 | ```{python} 358 | # Create the areas list 359 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 360 | 361 | # Print out second element from areas 362 | print(areas[1]) 363 | 364 | # Print out last element from areas 365 | print(areas[-1]) 366 | 367 | # Print out the area of the living room 368 | print(areas[5]) 369 | ``` 370 | 371 | `@sct` 372 | ```{python} 373 | msg = "Don't remove or edit the predefined `areas` list." 374 | Ex().check_object("areas", missing_msg = msg).has_equal_value(incorrect_msg = msg) 375 | Ex().has_printout(0, not_printed_msg = "Have another look at your code to print out the second element in `areas`, which is at index `1`.") 376 | Ex().has_printout(1, not_printed_msg = "Have another look at your code to print out the last element in `areas`, which is at index `-1`.") 377 | Ex().has_printout(2, not_printed_msg = "Have another look at your code to print out the area of the living room. 
It's at index `5`.") 378 | success_msg("Good job!") 379 | ``` 380 | 381 | --- 382 | 383 | ## Slicing and dicing 384 | 385 | ```yaml 386 | type: NormalExercise 387 | key: 7f08642d18 388 | lang: python 389 | xp: 100 390 | skills: 391 | - 2 392 | ``` 393 | 394 | Selecting single values from a list is just one part of the story. It's also possible to _slice_ your list, which means selecting multiple elements from your list. Use the following syntax: 395 | 396 | ``` 397 | my_list[start:end] 398 | ``` 399 | 400 | The `start` index is included, while the `end` index is _not_. You can also leave out either index. If you don't specify the `start` index, Python starts your slice at the beginning of your list; if you don't specify the `end` index, the slice runs through the last element of your list. 401 | 402 | `@instructions` 403 | - Use slicing to create a list, `downstairs`, that contains the first `6` elements of `areas`. 404 | - Create `upstairs`, as the last `4` elements of `areas`. This time, simplify the slicing by omitting the `end` index. 405 | - Print both `downstairs` and `upstairs` using `print()`. 406 | 407 | `@hint` 408 | - Use the brackets `[0:6]` to get the first six elements of a list. 409 | - To get everything except the first 5 elements of a list, `l`, you would use `l[5:]`. 410 | - Add two `print()` calls to print out `downstairs` and `upstairs`. 
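The slicing rules can be sketched on a small, made-up list. The name `letters` and the variables derived from it are hypothetical and not part of the exercise; the sketch only shows which elements each form of `my_list[start:end]` selects.

```python
# Hypothetical list, chosen only for illustration
letters = ["a", "b", "c", "d", "e", "f"]

# start index 0 is included, end index 3 is not
first_three = letters[0:3]

# Omitting the start index begins the slice at the start of the list
also_first_three = letters[:3]

# Omitting the end index runs the slice through the last element
last_two = letters[4:]

print(first_three)       # prints ['a', 'b', 'c']
print(also_first_three)  # prints ['a', 'b', 'c']
print(last_two)          # prints ['e', 'f']
```

A slice always gives you a new list, so `letters` itself is unchanged after these lines.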
411 | 412 | `@pre_exercise_code` 413 | ```{python} 414 | 415 | ``` 416 | 417 | `@sample_code` 418 | ```{python} 419 | # Create the areas list 420 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 421 | 422 | # Use slicing to create downstairs 423 | downstairs = areas[____] 424 | 425 | # Use slicing to create upstairs 426 | upstairs = areas[____] 427 | 428 | # Print out downstairs and upstairs 429 | ____ 430 | ____ 431 | ``` 432 | 433 | `@solution` 434 | ```{python} 435 | # Create the areas list 436 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 437 | 438 | # Use slicing to create downstairs 439 | downstairs = areas[0:6] 440 | 441 | # Use slicing to create upstairs 442 | upstairs = areas[6:] 443 | 444 | # Print out downstairs and upstairs 445 | print(downstairs) 446 | print(upstairs) 447 | ``` 448 | 449 | `@sct` 450 | ```{python} 451 | msg = "Don't remove or edit the predefined `areas` list." 452 | Ex().check_object("areas", missing_msg = msg).has_equal_value(incorrect_msg = msg) 453 | 454 | patt = "`%s` is incorrect. Use `areas[%s]` and slicing to select the elements you want, or something equivalent." 455 | Ex().check_object("downstairs").has_equal_value(incorrect_msg = patt % ('downstairs', '0:6')) 456 | Ex().check_object("upstairs").has_equal_value(incorrect_msg = patt % ("upstairs", "6:")) 457 | 458 | Ex().has_printout(0, not_printed_msg="Have you printed out `downstairs` after calculating it?") 459 | Ex().has_printout(1, not_printed_msg="Have you printed out `upstairs` after calculating it?") 460 | 461 | success_msg("Great!") 462 | ``` 463 | 464 | --- 465 | 466 | ## Subsetting lists of lists 467 | 468 | ```yaml 469 | type: NormalExercise 470 | key: dbbbd306cf 471 | xp: 100 472 | ``` 473 | 474 | A Python list can also contain other lists. 475 | 476 | To subset lists of lists, you can use the same technique as before: square brackets. 
This would look something like this for a list, `house`: 477 | 478 | ``` 479 | house[2][0] 480 | ``` 481 | 482 | `@instructions` 483 | - Subset the `house` list to get the float `9.5`. 484 | 485 | `@hint` 486 | - Break this down step by step. First you want to get to the last element of the list, `["bathroom", 9.50]`. Recall the index of the last element is `-1`. 487 | - Next you want to get the second element of `["bathroom", 9.50]`, which is at index `1`. 488 | 489 | `@pre_exercise_code` 490 | ```{python} 491 | 492 | ``` 493 | 494 | `@sample_code` 495 | ```{python} 496 | house = [["hallway", 11.25], 497 | ["kitchen", 18.0], 498 | ["living room", 20.0], 499 | ["bedroom", 10.75], 500 | ["bathroom", 9.50]] 501 | 502 | # Subset the house list 503 | house___ 504 | ``` 505 | 506 | `@solution` 507 | ```{python} 508 | house = [["hallway", 11.25], 509 | ["kitchen", 18.0], 510 | ["living room", 20.0], 511 | ["bedroom", 10.75], 512 | ["bathroom", 9.50]] 513 | 514 | # Subset the house list 515 | house[-1][1] 516 | ``` 517 | 518 | `@sct` 519 | ```{python} 520 | Ex().check_or( 521 | has_code("house[-1][1]", pattern=False), 522 | has_code("house[4][1]", pattern=False) 523 | ) 524 | 525 | success_msg("Correctomundo! The last piece of the list puzzle is manipulation.") 526 | ``` 527 | 528 | --- 529 | 530 | ## Manipulating Lists 531 | 532 | ```yaml 533 | type: VideoExercise 534 | key: d7fe818b3a 535 | xp: 50 536 | ``` 537 | 538 | `@projector_key` 539 | 355ed52d2fb0d67508c6a311b7cbc6d3 540 | 541 | --- 542 | 543 | ## Replace list elements 544 | 545 | ```yaml 546 | type: NormalExercise 547 | key: 4e1bba1b55 548 | lang: python 549 | xp: 100 550 | skills: 551 | - 2 552 | ``` 553 | 554 | To replace list elements, you subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once. 
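
Both kinds of replacement mentioned above, a single element and an entire slice, can be sketched as follows. The list `x` is a hypothetical example and is not the `areas` list used in the exercise.

```python
# Hypothetical list used only for illustration.
x = ["a", "b", "c", "d"]

# Replace a single element by assigning to its index
x[1] = "r"

# Replace a whole slice by assigning a list to it
x[2:] = ["s", "t"]

print(x)  # ['a', 'r', 's', 't']
```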
555 | 556 | For this and the following exercises, you'll continue working on the `areas` list that contains the names and areas of different rooms in a house. 557 | 558 | `@instructions` 559 | - Update the area of the bathroom to be `10.50` square meters instead of `9.50` using negative indexing. 560 | - Make the `areas` list more trendy! Change `"living room"` to `"chill zone"`. Don't use negative indexing this time. 561 | 562 | `@hint` 563 | - To update the bathroom area, identify the subset of the bathroom area (it's the last item of the list!). 564 | - Then, replace the value with the new bathroom area by assigning it to this subset. 565 | - Do the same to update the `"living room"` name, which is at index 4. 566 | 567 | `@pre_exercise_code` 568 | ```{python} 569 | 570 | ``` 571 | 572 | `@sample_code` 573 | ```{python} 574 | # Create the areas list 575 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 576 | 577 | # Correct the bathroom area 578 | 579 | 580 | # Change "living room" to "chill zone" 581 | 582 | ``` 583 | 584 | `@solution` 585 | ```{python} 586 | # Create the areas list 587 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 588 | 589 | # Correct the bathroom area 590 | areas[-1] = 10.50 591 | 592 | # Change "living room" to "chill zone" 593 | areas[4] = "chill zone" 594 | ``` 595 | 596 | `@sct` 597 | ```{python} 598 | bathroom_msg = 'You can use `areas[-1] = 10.50` to update the bathroom area.' 599 | chillzone_msg = 'You can use `areas[4] = "chill zone"` to update the living room name.' 600 | Ex().check_correct( 601 | check_object('areas').has_equal_value(incorrect_msg = 'Your changes to `areas` did not result in the correct list. Are you sure you used the correct subset operations? 
When in doubt, you can use a hint!'), 602 | multi( 603 | has_equal_value(expr_code='areas[-1]', override=10.50, incorrect_msg = bathroom_msg), 604 | has_equal_value(expr_code='areas[4]', override='chill zone', incorrect_msg = chillzone_msg), 605 | ) 606 | ) 607 | success_msg('Sweet! As the code sample showed, you can also slice a list and replace it with another list to update multiple elements in a single command.') 608 | ``` 609 | 610 | --- 611 | 612 | ## Extend a list 613 | 614 | ```yaml 615 | type: NormalExercise 616 | key: ff0fe8d967 617 | lang: python 618 | xp: 100 619 | skills: 620 | - 2 621 | ``` 622 | 623 | If you can change elements in a list, you sure want to be able to add elements to it, right? You can use the `+` operator: 624 | 625 | ``` 626 | x = ["a", "b", "c", "d"] 627 | y = x + ["e", "f"] 628 | ``` 629 | 630 | You just won the lottery, awesome! You decide to build a poolhouse and a garage. Can you add the information to the `areas` list? 631 | 632 | `@instructions` 633 | - Use the `+` operator to paste the list `["poolhouse", 24.5]` to the end of the `areas` list. Store the resulting list as `areas_1`. 634 | - Further extend `areas_1` by adding data on your garage. Add the string `"garage"` and float `15.45`. Name the resulting list `areas_2`. 635 | 636 | `@hint` 637 | - Follow the code sample in the assignment. `x` is `areas` here, and `["e", "f"]` is `["poolhouse", 24.5]`. 638 | - To add more elements to `areas_1`, use `areas_1 + ["element", 123]`. 
639 | 640 | `@pre_exercise_code` 641 | ```{python} 642 | 643 | ``` 644 | 645 | `@sample_code` 646 | ```{python} 647 | # Create the areas list and make some changes 648 | areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0, 649 | "bedroom", 10.75, "bathroom", 10.50] 650 | 651 | # Add poolhouse data to areas, new list is areas_1 652 | areas_1 = ____ 653 | 654 | # Add garage data to areas_1, new list is areas_2 655 | areas_2 = ____ 656 | ``` 657 | 658 | `@solution` 659 | ```{python} 660 | # Create the areas list (updated version) 661 | areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0, 662 | "bedroom", 10.75, "bathroom", 10.50] 663 | 664 | # Add poolhouse data to areas, new list is areas_1 665 | areas_1 = areas + ["poolhouse", 24.5] 666 | 667 | # Add garage data to areas_1, new list is areas_2 668 | areas_2 = areas_1 + ["garage", 15.45] 669 | ``` 670 | 671 | `@sct` 672 | ```{python} 673 | msg = "Don't remove or edit the predefined `areas` list." 674 | Ex().check_object("areas", missing_msg = msg).has_equal_value(incorrect_msg = msg) 675 | Ex().check_object("areas_1").has_equal_value(incorrect_msg = "Use `areas + [\"poolhouse\", 24.5]` to create `areas_1`. Watch out for typos!") 676 | Ex().check_object("areas_2").has_equal_value(incorrect_msg = "Use `areas_1 + [\"garage\", 15.45]` to create `areas_2`. Watch out for typos!") 677 | success_msg("Cool! The list is shaping up nicely!") 678 | ``` 679 | 680 | --- 681 | 682 | ## Delete list elements 683 | 684 | ```yaml 685 | type: NormalExercise 686 | key: 85f792356e 687 | xp: 100 688 | ``` 689 | 690 | Finally, you can also remove elements from your list. You can do this with the `del` statement: 691 | 692 | ``` 693 | x = ["a", "b", "c", "d"] 694 | del x[1] 695 | ``` 696 | 697 | Pay attention here: as soon as you remove an element from a list, the indexes of the elements that come after the deleted element all change! 
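
The index shift described above can be sketched with a small, hypothetical list (`x` and `y` are illustration names only). Deleting index `1` twice in a row removes two different elements, because everything after the deleted element moves one position to the left.

```python
# Hypothetical list used only for illustration.
x = ["a", "b", "c", "d"]

del x[1]       # removes "b"
print(x)       # ['a', 'c', 'd']
print(x[1])    # "c" has moved from index 2 to index 1

del x[1]       # the same index now removes "c"
print(x)       # ['a', 'd']

# A slice removes several neighbouring elements in one statement
y = ["a", "b", "c", "d"]
del y[1:3]     # removes "b" and "c"
print(y)       # ['a', 'd']
```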
698 | 699 | Unfortunately, the amount you won with the lottery is not that big after all and it looks like the poolhouse isn't going to happen. You'll need to remove it from the list. You decide to remove the corresponding string and float from the `areas` list. 700 | 701 | `@instructions` 702 | - Delete the string and float for the `"poolhouse"` from your `areas` list. 703 | - Print the updated `areas` list. 704 | 705 | `@hint` 706 | - You'll need to use `del` twice to delete two elements. Be careful about changing indexes though! 707 | 708 | `@pre_exercise_code` 709 | ```{python} 710 | 711 | ``` 712 | 713 | `@sample_code` 714 | ```{python} 715 | areas = ["hallway", 11.25, "kitchen", 18.0, 716 | "chill zone", 20.0, "bedroom", 10.75, 717 | "bathroom", 10.50, "poolhouse", 24.5, 718 | "garage", 15.45] 719 | 720 | # Delete the poolhouse items from the list 721 | 722 | 723 | # Print the updated list 724 | 725 | ``` 726 | 727 | `@solution` 728 | ```{python} 729 | areas = ["hallway", 11.25, "kitchen", 18.0, 730 | "chill zone", 20.0, "bedroom", 10.75, 731 | "bathroom", 10.50, "poolhouse", 24.5, 732 | "garage", 15.45] 733 | 734 | # Delete the poolhouse items from the list 735 | del areas[10] 736 | del areas[10] 737 | 738 | # Print the updated list 739 | print(areas) 740 | ``` 741 | 742 | `@sct` 743 | ```{python} 744 | Ex().check_or( 745 | multi( 746 | has_code("del areas[10]", pattern=False), 747 | has_code("del areas[10]", pattern=False) 748 | ), 749 | has_code("del areas[-4:-2]", pattern=False), 750 | has_code("del(areas[-4:-2])", pattern=False), 751 | multi( 752 | has_code("del(areas[10])", pattern=False), 753 | has_code("del(areas[10])", pattern=False) 754 | ), 755 | has_code("del areas[10:12]", pattern=False), 756 | has_code("del(areas[10:12])", pattern=False), 757 | multi( 758 | has_code("del areas[-4]", pattern=False), 759 | has_code("del areas[-3]", pattern=False) 760 | ), 761 | multi( 762 | has_code("del(areas[-4])", pattern=False), 763 | 
has_code("del(areas[-3])", pattern=False) 764 | ) 765 | ) 766 | 767 | Ex().has_printout(0, not_printed_msg="Have you printed out `areas` after removing the poolhouse string and float?") 768 | success_msg("Correct! You'll learn about easier ways to remove specific elements from Python lists later on.") 769 | ``` 770 | 771 | --- 772 | 773 | ## Inner workings of lists 774 | 775 | ```yaml 776 | type: NormalExercise 777 | key: af72db9915 778 | lang: python 779 | xp: 100 780 | skills: 781 | - 2 782 | ``` 783 | 784 | Some code has been provided for you in this exercise: a list with the name `areas` and a copy named `areas_copy`. 785 | 786 | Currently, the first element in the `areas_copy` list is changed and the `areas` list is printed out. If you hit the run code button you'll see that, although you've changed `areas_copy`, the change also takes effect in the `areas` list. That's because `areas` and `areas_copy` point to the same list. 787 | 788 | If you want to prevent changes in `areas_copy` from also taking effect in `areas`, you'll have to do a more explicit copy of the `areas` list with `list()` or by using `[:]`. 789 | 790 | `@instructions` 791 | - Change the second command, that creates the variable `areas_copy`, such that `areas_copy` is an explicit copy of `areas`. After your edit, changes made to `areas_copy` shouldn't affect `areas`. Submit the answer to check this. 792 | 793 | `@hint` 794 | - Change the `areas_copy = areas` call. Instead of assigning `areas`, you can assign `list(areas)` or `areas[:]`. 
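
The difference between a second name for the same list and an explicit copy can be sketched like this. The names `x`, `alias`, `copy_1` and `copy_2` are hypothetical and chosen only for illustration.

```python
# Hypothetical list used only for illustration.
x = ["a", "b", "c"]

alias = x         # a second name for the same list
copy_1 = list(x)  # a new list with the same elements
copy_2 = x[:]     # also a new list with the same elements

alias[0] = "z"    # this changes the list that x refers to as well

print(x)          # ['z', 'b', 'c']
print(copy_1)     # ['a', 'b', 'c']
print(copy_2)     # ['a', 'b', 'c']
print(alias is x)   # True
print(copy_1 is x)  # False
```

Both `list(x)` and `x[:]` make shallow copies: the outer list is new, but any nested lists inside it would still be shared with the original.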
795 | 796 | `@pre_exercise_code` 797 | ```{python} 798 | 799 | ``` 800 | 801 | `@sample_code` 802 | ```{python} 803 | # Create list areas 804 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 805 | 806 | # Change this command 807 | areas_copy = areas 808 | 809 | # Change areas_copy 810 | areas_copy[0] = 5.0 811 | 812 | # Print areas 813 | print(areas) 814 | ``` 815 | 816 | `@solution` 817 | ```{python} 818 | # Create list areas 819 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 820 | 821 | # Change this command 822 | areas_copy = list(areas) 823 | 824 | # Change areas_copy 825 | areas_copy[0] = 5.0 826 | 827 | # Print areas 828 | print(areas) 829 | ``` 830 | 831 | `@sct` 832 | ```{python} 833 | Ex().check_correct( 834 | check_object("areas_copy").has_equal_value(incorrect_msg = "It seems that `areas_copy` has not been updated correctly."), 835 | check_function("list", missing_msg = "Make sure to use `list(areas)` to create an `areas_copy`.") 836 | ) 837 | 838 | mmsg = "Don't remove the predefined `areas` list." 839 | imsg = "Be sure to edit ONLY the copy, not the original `areas` list. Have another look at the exercise description if you're unsure how to create a copy." 840 | Ex().check_correct( 841 | check_object("areas", missing_msg = mmsg).has_equal_value(incorrect_msg = imsg), 842 | check_function("list", missing_msg = "Make sure to use `list(areas)` to create an `areas_copy`.") 843 | ) 844 | 845 | success_msg("Nice! The difference between explicit and reference-based copies is subtle, but can be really important. 
Try to keep in mind how a list is stored in the computer's memory.") 846 | ``` 847 | -------------------------------------------------------------------------------- /chapter3.md: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta: Chapter 3 3 | title: Functions and Packages 4 | description: >- 5 | You'll learn how to use functions, methods, and packages to efficiently 6 | leverage the code that brilliant Python developers have written. The goal is 7 | to reduce the amount of code you need to solve challenging problems! 8 | attachments: 9 | slides_link: 'https://projector-video-pdf-converter.datacamp.com/735/chapter3.pdf' 10 | lessons: 11 | - nb_of_exercises: 4 12 | title: Functions 13 | - nb_of_exercises: 4 14 | title: Methods 15 | - nb_of_exercises: 4 16 | title: Packages 17 | --- 18 | 19 | ## Functions 20 | 21 | ```yaml 22 | type: VideoExercise 23 | key: 5c5f365930 24 | xp: 50 25 | ``` 26 | 27 | `@projector_key` 28 | 1204d914b0e53100529827e07441ee6c 29 | 30 | --- 31 | 32 | ## Familiar functions 33 | 34 | ```yaml 35 | type: NormalExercise 36 | key: c422ee929b 37 | lang: python 38 | xp: 100 39 | skills: 40 | - 2 41 | ``` 42 | 43 | Out of the box, Python offers a bunch of built-in functions to make your life as a data scientist easier. You already know two such functions: `print()` and `type()`. There are also functions like `str()`, `int()`, `bool()` and `float()` to switch between data types. You can find out about them [here.](https://docs.python.org/3/library/functions.html) These are built-in functions as well. 44 | 45 | Calling a function is easy. To get the type of `3.0` and store the output as a new variable, `result`, you can use the following: 46 | 47 | ``` 48 | result = type(3.0) 49 | ``` 50 | 51 | `@instructions` 52 | - Use `print()` in combination with `type()` to print out the type of `var1`. 
53 | - Use `len()` to get the [length of the list](https://docs.python.org/3/library/functions.html#len) `var1`. Wrap it in a `print()` call to directly print it out. 54 | - Use `int()` to convert `var2` to an [integer](https://docs.python.org/3/library/functions.html#int). Store the output as `out2`. 55 | 56 | `@hint` 57 | - Call the `type()` function like this: `type(var1)`. 58 | - Call `print()` like you did so many times before. Simply put the variable you want to print in parentheses. 59 | - `int(x)` will convert `x` to an integer. 60 | 61 | `@pre_exercise_code` 62 | ```{python} 63 | 64 | ``` 65 | 66 | `@sample_code` 67 | ```{python} 68 | # Create variables var1 and var2 69 | var1 = [1, 2, 3, 4] 70 | var2 = True 71 | 72 | # Print out type of var1 73 | ____ 74 | 75 | # Print out length of var1 76 | ____ 77 | 78 | # Convert var2 to an integer: out2 79 | out2 = ____ 80 | ``` 81 | 82 | `@solution` 83 | ```{python} 84 | # Create variables var1 and var2 85 | var1 = [1, 2, 3, 4] 86 | var2 = True 87 | 88 | # Print out type of var1 89 | print(type(var1)) 90 | 91 | # Print out length of var1 92 | print(len(var1)) 93 | 94 | # Convert var2 to an integer: out2 95 | out2 = int(var2) 96 | ``` 97 | 98 | `@sct` 99 | ```{python} 100 | msg = "You don't have to change or remove the predefined variables." 101 | Ex().check_object("var1", missing_msg=msg).has_equal_value(incorrect_msg=msg) 102 | Ex().check_object("var2", missing_msg=msg).has_equal_value(incorrect_msg=msg) 103 | 104 | patt = "__JINJA__:Make sure to print out the %s of `var1` with `{{sol_call}}`." 105 | Ex().has_printout(0, not_printed_msg = patt % 'type') 106 | Ex().has_printout(1, not_printed_msg = patt % 'length') 107 | 108 | int_miss_msg = "Have you used `int()` to make an integer of `var2`?" 109 | int_incorr_msg = "Have you passed `var2` to `int()`?" 
110 | Ex().check_correct( 111 | check_object("out2").has_equal_value(incorrect_msg="You called `int()` correctly; now make sure to assign the result of this call to `out2`."), 112 | check_function("int", missing_msg=int_miss_msg).has_equal_value(incorrect_msg=int_incorr_msg) 113 | ) 114 | success_msg("Great job! The `len()` function is extremely useful; it also works on strings to count the number of characters!") 115 | ``` 116 | 117 | --- 118 | 119 | ## Help! 120 | 121 | ```yaml 122 | type: MultipleChoiceExercise 123 | key: 679b852978 124 | lang: python 125 | xp: 50 126 | skills: 127 | - 2 128 | ``` 129 | 130 | Maybe you already know the name of a Python function, but you still have to figure out how to use it. Ironically, you have to ask for information about a function with another function: `help()`. In IPython specifically, you can also use `?` before the function name. 131 | 132 | To get help on the `max()` function, for example, you can use one of these calls: 133 | 134 | ``` 135 | help(max) 136 | ?max 137 | ``` 138 | 139 | Use the IPython Shell to open up the [documentation](https://docs.python.org/3/library/functions.html#pow) on `pow()`. Do this by typing `?pow` or `help(pow)` and hitting **Enter**. 140 | 141 | Which of the following statements is true? 142 | 143 | `@possible_answers` 144 | - `pow()` takes three arguments: `base`, `exp`, and `mod`. Without `mod`, the function will return an error. 145 | - `pow()` takes three required arguments: `base`, `exp`, and `None`. 146 | - `pow()` requires `base` and `exp` arguments; `mod` is optional. 147 | - `pow()` takes two arguments: `exp` and `mod`. Missing `exp` results in an error. 148 | 149 | `@hint` 150 | - Optional arguments are set `=` to a default value, which the function will use if that argument is not specified. 151 | 152 | `@pre_exercise_code` 153 | ```{python} 154 | 155 | ``` 156 | 157 | `@sct` 158 | ```{python} 159 | msg1 = "Not quite. 
`mod` has a default value that will be used if you don't specify a value." 160 | msg2 = "Incorrect. `None` is the default value for the `mod` argument." 161 | msg3 = "Perfect! Using `help()` can help you understand how functions work, unleashing their full potential!" 162 | msg4 = "Incorrect. `pow()` takes three arguments, one of which has a default value." 163 | Ex().has_chosen(3, [msg1, msg2, msg3, msg4]) 164 | ``` 165 | 166 | --- 167 | 168 | ## Multiple arguments 169 | 170 | ```yaml 171 | type: NormalExercise 172 | key: e30486d7c1 173 | lang: python 174 | xp: 100 175 | skills: 176 | - 2 177 | ``` 178 | 179 | In the previous exercise, you identified optional arguments by viewing the documentation with `help()`. You'll now apply this to change the behavior of the `sorted()` function. 180 | 181 | Have a look at the [documentation](https://docs.python.org/3/library/functions.html#sorted) of `sorted()` by typing `help(sorted)` in the IPython Shell. 182 | 183 | You'll see that `sorted()` takes three arguments: `iterable`, `key`, and `reverse`. In this exercise, you'll only have to specify `iterable` and `reverse`, not `key`. 184 | 185 | Two lists have been created for you. 186 | 187 | Can you paste them together and sort them in descending order? 188 | 189 | `@instructions` 190 | - Use `+` to merge the contents of `first` and `second` into a new list: `full`. 191 | - Call `sorted()` on `full` and set the `reverse` argument to `True`. Save the sorted list as `full_sorted`. 192 | - Finish off by printing out `full_sorted`. 193 | 194 | `@hint` 195 | - Add `first` and `second` with `+`, as if they were two numbers, and assign the result to `full`. 196 | - Use `sorted()` with two inputs: `full` and `reverse=True`. 197 | - To print out a variable, use `print()`.
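
As a rough sketch of how the optional arguments of `sorted()` change its behavior, consider the hypothetical list `words` below (it is not part of the exercise). Both `key` and `reverse` have default values, so they can be left out.

```python
# Hypothetical list used only for illustration.
words = ["kitchen", "hall", "bathroom"]

default_order = sorted(words)                  # alphabetical, ascending
by_length = sorted(words, key=len)             # shortest word first
longest_first = sorted(words, key=len, reverse=True)

print(default_order)   # ['bathroom', 'hall', 'kitchen']
print(by_length)       # ['hall', 'kitchen', 'bathroom']
print(longest_first)   # ['bathroom', 'kitchen', 'hall']
print(words)           # the original list is left unchanged
```

`sorted()` returns a new list each time, which is why `words` still has its original order at the end.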
198 | 199 | `@pre_exercise_code` 200 | ```{python} 201 | 202 | ``` 203 | 204 | `@sample_code` 205 | ```{python} 206 | # Create lists first and second 207 | first = [11.25, 18.0, 20.0] 208 | second = [10.75, 9.50] 209 | 210 | # Paste together first and second: full 211 | full = ____ + ____ 212 | 213 | # Sort full in descending order: full_sorted 214 | full_sorted = ____ 215 | 216 | # Print out full_sorted 217 | ____ 218 | ``` 219 | 220 | `@solution` 221 | ```{python} 222 | # Create lists first and second 223 | first = [11.25, 18.0, 20.0] 224 | second = [10.75, 9.50] 225 | 226 | # Paste together first and second: full 227 | full = first + second 228 | 229 | # Sort full in descending order: full_sorted 230 | full_sorted = sorted(full, reverse=True) 231 | 232 | # Print out full_sorted 233 | print(full_sorted) 234 | ``` 235 | 236 | `@sct` 237 | ```{python} 238 | msg = "You don't have to change or remove the predefined variables `first` and `second`." 239 | Ex().multi( 240 | check_object("first", missing_msg=msg).has_equal_value(incorrect_msg=msg), 241 | check_object("second", missing_msg=msg).has_equal_value(incorrect_msg=msg) 242 | ) 243 | Ex().check_correct( 244 | check_object("full_sorted").has_equal_value(incorrect_msg="Make sure you assign the result of calling `sorted()` to `full_sorted`."), 245 | check_function("sorted").multi( 246 | check_args(0).has_equal_value(), 247 | check_args('reverse').has_equal_value() 248 | ) 249 | ) 250 | 251 | success_msg("Cool! Head over to the video on Python methods.") 252 | ``` 253 | 254 | --- 255 | 256 | ## Methods 257 | 258 | ```yaml 259 | type: VideoExercise 260 | key: 2b66cb66b1 261 | xp: 50 262 | ``` 263 | 264 | `@projector_key` 265 | 8e387776f3a264a745128b68aa8d8f83 266 | 267 | --- 268 | 269 | ## String Methods 270 | 271 | ```yaml 272 | type: NormalExercise 273 | key: 4039302ee0 274 | lang: python 275 | xp: 100 276 | skills: 277 | - 2 278 | ``` 279 | 280 | Strings come with a bunch of methods.
Follow the instructions closely to discover some of them. If you want to discover them in more detail, you can always type `help(str)` in the IPython Shell. 281 | 282 | A string `place` has already been created for you to experiment with. 283 | 284 | `@instructions` 285 | - Use the `.upper()` [method](https://docs.python.org/3/library/stdtypes.html#str.upper) on `place` and store the result in `place_up`. Use the syntax for calling methods that you learned in the previous video. 286 | - Print out `place` and `place_up`. Did both change? 287 | - Print out the number of o's in the variable `place` by calling `.count()` on `place` and passing the letter `'o'` as an input to the method. We're talking about the variable `place`, not the word `"place"`! 288 | 289 | `@hint` 290 | - You can call the `.upper()` method on `place` without any additional inputs. 291 | - To print out a variable `x`, you can write `print(x)`. 292 | - Make sure to wrap your `place.count(____)` call in a `print()` function so that you print it out. 293 | 294 | `@pre_exercise_code` 295 | ```{python} 296 | 297 | ``` 298 | 299 | `@sample_code` 300 | ```{python} 301 | # string to experiment with: place 302 | place = "poolhouse" 303 | 304 | # Use upper() on place 305 | place_up = 306 | 307 | # Print out place and place_up 308 | 309 | 310 | 311 | # Print out the number of o's in place 312 | 313 | ``` 314 | 315 | `@solution` 316 | ```{python} 317 | # string to experiment with: place 318 | place = "poolhouse" 319 | 320 | # Use upper() on place 321 | place_up = place.upper() 322 | 323 | # Print out place and place_up 324 | print(place) 325 | print(place_up) 326 | 327 | # Print out the number of o's in place 328 | print(place.count('o')) 329 | ``` 330 | 331 | `@sct` 332 | ```{python} 333 | msg = "You don't have to change or remove the predefined variables." 334 | Ex().check_object("place", missing_msg=msg).has_equal_value(incorrect_msg=msg) 335 | 336 | patt = "Don't forget to print out `%s`."
337 | Ex().has_printout(0, not_printed_msg=patt % "place") 338 | Ex().check_correct( 339 | has_printout(1, not_printed_msg=patt % "place_up"), 340 | check_correct( 341 | check_object("place_up").has_equal_value(incorrect_msg="Assign the result of your `place.upper()` call to `place_up`."), 342 | check_function("place.upper", signature=False) 343 | ) 344 | ) 345 | 346 | # check count of place 347 | Ex().check_correct( 348 | has_printout(2, not_printed_msg = "You have calculated the number of o's in `place` fine; now make sure to wrap `place.count('o')` call in a `print()` function to print out the result."), 349 | check_function("place.count", signature=False).check_args(0).has_equal_value() 350 | ) 351 | 352 | success_msg("Nice! Notice from the printouts that the `upper()` method does not change the object it is called on. This will be different for lists in the next exercise!") 353 | ``` 354 | 355 | --- 356 | 357 | ## List Methods 358 | 359 | ```yaml 360 | type: NormalExercise 361 | key: 0dbe8ed695 362 | lang: python 363 | xp: 100 364 | skills: 365 | - 2 366 | ``` 367 | 368 | Strings are not the only Python types that have methods associated with them. Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. In this exercise, you'll be experimenting with: 369 | 370 | - `.index()`, to get the index of the first element of a list that matches its input and 371 | - `.count()`, to get the number of times an element appears in a list. 372 | 373 | You'll be working on the list with the area of different parts of a house: `areas`. 374 | 375 | `@instructions` 376 | - Use the `.index()` method to get the index of the element in `areas` that is equal to `20.0`. Print out this index. 377 | - Call `.count()` on `areas` to find out how many times `9.50` appears in the list. Again, simply print out this number. 378 | 379 | `@hint` 380 | - To print out the index, wrap the `areas.index(___)` call in a `print()` function. 
381 | - To print out the number of times an element `x` occurs in the list, wrap the `areas.count(___)` call in a `print()` function. 382 | 383 | `@pre_exercise_code` 384 | ```{python} 385 | 386 | ``` 387 | 388 | `@sample_code` 389 | ```{python} 390 | # Create list areas 391 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 392 | 393 | # Print out the index of the element 20.0 394 | 395 | 396 | # Print out how often 9.50 appears in areas 397 | 398 | ``` 399 | 400 | `@solution` 401 | ```{python} 402 | # Create list areas 403 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 404 | 405 | # Print out the index of the element 20.0 406 | print(areas.index(20.0)) 407 | 408 | # Print out how often 9.50 appears in areas 409 | print(areas.count(9.50)) 410 | ``` 411 | 412 | `@sct` 413 | ```{python} 414 | predef_msg = "You don't have to change or remove the predefined list `areas`." 415 | 416 | Ex().check_object("areas", missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg) 417 | 418 | Ex().check_function("print", index=0).check_args(0).check_function('areas.index', signature=False).check_args(0).has_equal_value() 419 | 420 | 421 | Ex().check_function("print", index=1).check_args(0).check_function('areas.count', signature=False).has_equal_value() 422 | 423 | success_msg("Nice! These were examples of `list` methods that did not change the list they were called on.") 424 | ``` 425 | 426 | --- 427 | 428 | ## List Methods (2) 429 | 430 | ```yaml 431 | type: NormalExercise 432 | key: 1fbeab82d0 433 | lang: python 434 | xp: 100 435 | skills: 436 | - 2 437 | ``` 438 | 439 | Most list methods will change the list they're called on. 
Examples are: 440 | 441 | - `.append()`, that adds an element to the list it is called on, 442 | - `.remove()`, that [removes](https://docs.python.org/3/library/stdtypes.html#typesseq-mutable) the first element of a list that matches the input, and 443 | - `.reverse()`, that [reverses](https://docs.python.org/3/library/stdtypes.html#typesseq-mutable) the order of the elements in the list it is called on. 444 | 445 | You'll be working on the list with the area of different parts of the house: `areas`. 446 | 447 | `@instructions` 448 | - Use `.append()` twice to add the size of the poolhouse and the garage again: `24.5` and `15.45`, respectively. Make sure to add them in this order. 449 | - Print out `areas` 450 | - Use the `.reverse()` method to reverse the order of the elements in `areas`. 451 | - Print out `areas` once more. 452 | 453 | `@hint` 454 | - For the first instruction, use the `areas.append(___)` call twice. 455 | - To print out a variable `x`, simply write `print(x)`. 456 | - The `.reverse()` method does not require additional inputs; just use the dot notation and empty parentheses: `.reverse()`. 457 | - To print out a variable `x`, simply write `print(x)`. 
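
The in-place behavior of these list methods can be sketched on a hypothetical list, `letters`, which is not part of the exercise. The methods change the list itself and return `None`, so their result should not be assigned back to the variable.

```python
# Hypothetical list used only for illustration.
letters = ["a", "b", "c"]

result = letters.append("d")  # changes letters in place
print(result)                 # None: append() does not return the list
print(letters)                # ['a', 'b', 'c', 'd']

letters.remove("b")           # removes the first element equal to "b"
letters.reverse()             # reverses the order in place
print(letters)                # ['d', 'c', 'a']
```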
458 | 459 | `@pre_exercise_code` 460 | ```{python} 461 | 462 | ``` 463 | 464 | `@sample_code` 465 | ```{python} 466 | # Create list areas 467 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 468 | 469 | # Use append twice to add poolhouse and garage size 470 | 471 | 472 | 473 | # Print out areas 474 | 475 | 476 | # Reverse the orders of the elements in areas 477 | 478 | 479 | # Print out areas 480 | 481 | ``` 482 | 483 | `@solution` 484 | ```{python} 485 | # Create list areas 486 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 487 | 488 | # Use append twice to add poolhouse and garage size 489 | areas.append(24.5) 490 | areas.append(15.45) 491 | 492 | # Print out areas 493 | print(areas) 494 | 495 | # Reverse the orders of the elements in areas 496 | areas.reverse() 497 | 498 | # Print out areas 499 | print(areas) 500 | ``` 501 | 502 | `@sct` 503 | ```{python} 504 | Ex().multi( 505 | check_function("areas.append", index=0, signature=False).check_args(0).has_equal_value(), 506 | check_function("areas.append", index=1, signature=False).check_args(0).has_equal_value(), 507 | check_function("print", index=0).check_args(0).has_equal_ast(), 508 | check_function("areas.reverse", index=0, signature=False), 509 | check_function("print", index=1).check_args(0).has_equal_ast() 510 | ) 511 | 512 | success_msg("Great!") 513 | ``` 514 | 515 | --- 516 | 517 | ## Packages 518 | 519 | ```yaml 520 | type: VideoExercise 521 | key: ab96a17c5e 522 | xp: 50 523 | ``` 524 | 525 | `@projector_key` 526 | cedcfb34350be8545599768f96695cdd 527 | 528 | --- 529 | 530 | ## Import package 531 | 532 | ```yaml 533 | type: NormalExercise 534 | key: 7432a6376f 535 | lang: python 536 | xp: 100 537 | skills: 538 | - 2 539 | ``` 540 | 541 | Let's say you wanted to calculate the circumference and area of a circle. 
Here's what those formulas look like: 542 | 543 | $$C = 2 \pi r$$ 544 | $$A = \pi r^2 $$ 545 | 546 | Rather than typing the value of `pi` by hand, you can use the `math` package, which provides it as `math.pi`. 547 | 548 | For reference, `**` is the symbol for exponentiation. For example, `3**4` is `3` to the power of `4` and will give `81`. 549 | 550 | `@instructions` 551 | - Import the `math` package. 552 | - Use `math.pi` to calculate the circumference of the circle and store it in `C`. 553 | - Use `math.pi` to calculate the area of the circle and store it in `A`. 554 | 555 | `@hint` 556 | - You can simply use `import math`, and then refer to `pi` with `math.pi`. 557 | - Use the equation in the assignment text to find `C`. Use `*`. 558 | - Use the equation in the assignment text to find `A`. Use `*` and `**`. 559 | 560 | `@pre_exercise_code` 561 | ```{python} 562 | 563 | ``` 564 | 565 | `@sample_code` 566 | ```{python} 567 | # Import the math package 568 | import ____ 569 | 570 | # Calculate C 571 | C = 2 * 0.43 * ____ 572 | 573 | # Calculate A 574 | A = ____ * 0.43 ** 2 575 | 576 | print("Circumference: " + str(C)) 577 | print("Area: " + str(A)) 578 | ``` 579 | 580 | `@solution` 581 | ```{python} 582 | # Import the math package 583 | import math 584 | 585 | # Calculate C 586 | C = 2 * 0.43 * math.pi 587 | 588 | # Calculate A 589 | A = math.pi * 0.43 ** 2 590 | 591 | print("Circumference: " + str(C)) 592 | print("Area: " + str(A)) 593 | ``` 594 | 595 | `@sct` 596 | ```{python} 597 | patt = "Your calculation of `%s` is not quite correct. Make sure to use `math.pi`."
598 | Ex().multi( 599 | has_import('math', same_as=False), 600 | check_object('C').has_equal_value(incorrect_msg=patt%'C'), 601 | check_object('A').has_equal_value(incorrect_msg=patt%'A') 602 | ) 603 | 604 | Ex().multi( 605 | has_printout(0, not_printed_msg = "__JINJA__:Keep `{{sol_call}}` in there to print out the circumference."), 606 | has_printout(1, not_printed_msg = "__JINJA__:Keep `{{sol_call}}` in there to print out the area.") 607 | ) 608 | 609 | success_msg("Nice! If you know how to deal with functions from packages, the power of a lot of Python programmers is at your fingertips!") 610 | ``` 611 | 612 | --- 613 | 614 | ## Selective import 615 | 616 | ```yaml 617 | type: NormalExercise 618 | key: fe65eff50a 619 | lang: python 620 | xp: 100 621 | skills: 622 | - 2 623 | ``` 624 | 625 | General imports, like `import math`, make **all** functionality from the `math` package available to you. However, if you decide to only use a specific part of a package, you can always make your import more selective: 626 | 627 | ``` 628 | from math import pi 629 | ``` 630 | 631 | Try the same thing again, but this time only use `pi`. 632 | 633 | `@instructions` 634 | - Perform a selective import from the `math` package where you only import the `pi` constant. 635 | - Use `pi` to calculate the circumference of the circle and store it in `C`. 636 | - Use `pi` to calculate the area of the circle and store it in `A`. 637 | 638 | `@hint` 639 | - Use `from math import pi` to do the selective import. 640 | - Now, you can use `pi` on its own!
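
The two import styles can be compared side by side in a short sketch. The `radius` value below is a hypothetical number chosen for illustration and is not the one used in the exercise.

```python
import math            # general import: refer to the constant as math.pi
from math import pi    # selective import: refer to it as pi

# Both names point to the same value
print(math.pi == pi)   # True

# Hypothetical radius, chosen only for this illustration
radius = 2.0
circumference = 2 * pi * radius
area = pi * radius ** 2

print("Circumference: " + str(circumference))
print("Area: " + str(area))
```

With the selective import alone, only the name `pi` is defined; writing `math.pi` would then raise a `NameError`, because the name `math` itself was never imported.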
641 | 642 | `@pre_exercise_code` 643 | ```{python} 644 | 645 | ``` 646 | 647 | `@sample_code` 648 | ```{python} 649 | # Import pi from the math package 650 | from math import ____ 651 | 652 | # Calculate C 653 | C = 2 * 0.43 * ____ 654 | 655 | # Calculate A 656 | A = ____ * 0.43 ** 2 657 | 658 | print("Circumference: " + str(C)) 659 | print("Area: " + str(A)) 660 | ``` 661 | 662 | `@solution` 663 | ```{python} 664 | # Import pi from the math package 665 | from math import pi 666 | 667 | # Calculate C 668 | C = 2 * 0.43 * pi 669 | 670 | # Calculate A 671 | A = pi * 0.43 ** 2 672 | 673 | print("Circumference: " + str(C)) 674 | print("Area: " + str(A)) 675 | ``` 676 | 677 | `@sct` 678 | ```{python} 679 | patt = "Your calculation of `%s` is not quite correct. Make sure to use only `pi`." 680 | 681 | Ex().has_import("math.pi", not_imported_msg = "Be sure to import `pi` from the `math` package. You should use the `from ___ import ___` notation.") 682 | 683 | Ex().multi( 684 | check_object('C').has_equal_value(incorrect_msg=patt%'C'), 685 | check_object('A').has_equal_value(incorrect_msg=patt%'A') 686 | ) 687 | 688 | Ex().multi( 689 | has_printout(0, not_printed_msg = "__JINJA__:Keep `{{sol_call}}` in there to print out the circumference."), 690 | has_printout(1, not_printed_msg = "__JINJA__:Keep `{{sol_call}}` in there to print out the area.") 691 | ) 692 | 693 | success_msg("Nice! Head over to the next exercise.") 694 | ``` 695 | 696 | --- 697 | 698 | ## Different ways of importing 699 | 700 | ```yaml 701 | type: MultipleChoiceExercise 702 | key: f1b2675a2a 703 | lang: python 704 | xp: 50 705 | skills: 706 | - 2 707 | ``` 708 | 709 | There are several ways to import packages and modules into Python. Depending on the import call, you'll have to use different Python code. 710 | 711 | Suppose you want to use the [function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.inv.html) `inv()`, which is in the `linalg` subpackage of the `scipy` package. 
You want to be able to use this function as follows: 712 | 713 | ``` 714 | my_inv([[1,2], [3,4]]) 715 | ``` 716 | 717 | Which `import` statement will you need in order to run the above code without an error? 718 | 719 | `@possible_answers` 720 | - `import scipy` 721 | - `import scipy.linalg` 722 | - `from scipy.linalg import my_inv` 723 | - `from scipy.linalg import inv as my_inv` 724 | 725 | `@hint` 726 | - Try the different import statements in the IPython shell and see which one causes the line `my_inv([[1, 2], [3, 4]])` to run without errors. Hit **enter** to run the code you have typed. 727 | 728 | `@pre_exercise_code` 729 | ```{python} 730 | 731 | ``` 732 | 733 | `@sct` 734 | ```{python} 735 | msg1 = msg2 = msg3 = "Incorrect, try again. Try the different import statements in the IPython shell and see which one causes the line `my_inv([[1, 2], [3, 4]])` to run without errors." 736 | msg4 = "Correct! The `as` word allows you to create a local name for the function you're importing: `inv()` is now available as `my_inv()`." 737 | Ex().has_chosen(4, [msg1, msg2, msg3, msg4]) 738 | ``` 739 | -------------------------------------------------------------------------------- /chapter4.md: -------------------------------------------------------------------------------- 1 | --- 2 | title_meta: Chapter 4 3 | title: NumPy 4 | description: >- 5 | NumPy is a fundamental Python package to efficiently practice data science. 6 | Learn to work with powerful tools in the NumPy array, and get started with 7 | data exploration. 
8 | attachments: 9 | slides_link: 'https://projector-video-pdf-converter.datacamp.com/735/chapter4.pdf' 10 | lessons: 11 | - nb_of_exercises: 5 12 | title: Numpy 13 | - nb_of_exercises: 5 14 | title: 2D Numpy Arrays 15 | - nb_of_exercises: 3 16 | title: 'Numpy: Basic Statistics' 17 | --- 18 | 19 | ## NumPy 20 | 21 | ```yaml 22 | type: VideoExercise 23 | key: f4545baa53 24 | xp: 50 25 | ``` 26 | 27 | `@projector_key` 28 | a0487c26210f6b71ea98f917734cea3a 29 | 30 | --- 31 | 32 | ## Your First NumPy Array 33 | 34 | ```yaml 35 | type: NormalExercise 36 | key: 84cab9d170 37 | lang: python 38 | xp: 100 39 | skills: 40 | - 2 41 | ``` 42 | 43 | You're now going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of `numpy`, a powerful package to do data science. 44 | 45 | A list `baseball` has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code to create a `numpy` array from it? 46 | 47 | `@instructions` 48 | - Import the `numpy` package as `np`, so that you can refer to `numpy` with `np`. 49 | - Use `np.array()` to create a `numpy` array from `baseball`. Name this array `np_baseball`. 50 | - Print out the type of `np_baseball` to check that you got it right. 51 | 52 | `@hint` 53 | - `import numpy as np` will do the trick. Now, you have to use `np.fun_name()` whenever you want to use a `numpy` function. 54 | - `np.array()` should take `baseball` as input. Assign the result of the function call to `np_baseball`. 55 | - To print out the type of a variable `x`, simply type `print(type(x))`. 
56 | 57 | `@pre_exercise_code` 58 | ```{python} 59 | import numpy as np 60 | ``` 61 | 62 | `@sample_code` 63 | ```{python} 64 | # Import the numpy package as np 65 | 66 | 67 | baseball = [180, 215, 210, 210, 188, 176, 209, 200] 68 | 69 | # Create a numpy array from baseball: np_baseball 70 | 71 | 72 | # Print out type of np_baseball 73 | 74 | ``` 75 | 76 | `@solution` 77 | ```{python} 78 | # Import the numpy package as np 79 | import numpy as np 80 | 81 | baseball = [180, 215, 210, 210, 188, 176, 209, 200] 82 | 83 | # Create a NumPy array from baseball: np_baseball 84 | np_baseball = np.array(baseball) 85 | 86 | # Print out type of np_baseball 87 | print(type(np_baseball)) 88 | ``` 89 | 90 | `@sct` 91 | ```{python} 92 | predef_msg = "You don't have to change or remove the predefined variables." 93 | Ex().has_import("numpy") 94 | Ex().check_correct( 95 | check_object("np_baseball"), 96 | multi( 97 | check_object("baseball", missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg), 98 | check_function("numpy.array").check_args(0).has_equal_ast() 99 | ) 100 | ) 101 | 102 | Ex().has_printout(0) 103 | success_msg("Great job!") 104 | ``` 105 | 106 | --- 107 | 108 | ## Baseball players' height 109 | 110 | ```yaml 111 | type: NormalExercise 112 | key: e7e25a89ea 113 | lang: python 114 | xp: 100 115 | skills: 116 | - 2 117 | ``` 118 | 119 | You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: `height_in`. The height is expressed in inches. Can you make a `numpy` array out of it and convert the units to meters? 120 | 121 | `height_in` is already available and the `numpy` package is loaded, so you can start straight away (Source: stat.ucla.edu). 122 | 123 | `@instructions` 124 | - Create a `numpy` array from `height_in`. Name this new array `np_height_in`. 
125 | - Print `np_height_in`. 126 | - Multiply `np_height_in` by `0.0254` to convert all height measurements from inches to meters. Store the new values in a new array, `np_height_m`. 127 | - Print out `np_height_m` and check if the output makes sense. 128 | 129 | `@hint` 130 | - Use `np.array()` and pass it `height_in`. Store the result in `np_height_in`. 131 | - To print out a variable `x`, type `print(x)` in the Python script. 132 | - Perform calculations as if `np_height_in` is a single number: `np_height_in * conversion_factor` is part of the answer. 133 | - To print out a variable `x`, type `print(x)` in the Python script. 134 | 135 | `@pre_exercise_code` 136 | ```{python} 137 | import pandas as pd 138 | mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv") 139 | height_in = mlb['Height'].tolist() 140 | import numpy as np 141 | ``` 142 | 143 | `@sample_code` 144 | ```{python} 145 | # Import numpy 146 | import numpy as np 147 | 148 | # Create a numpy array from height_in: np_height_in 149 | 150 | 151 | # Print out np_height_in 152 | 153 | 154 | # Convert np_height_in to m: np_height_m 155 | 156 | 157 | # Print np_height_m 158 | 159 | ``` 160 | 161 | `@solution` 162 | ```{python} 163 | # Import numpy 164 | import numpy as np 165 | 166 | # Create a numpy array from height_in: np_height_in 167 | np_height_in = np.array(height_in) 168 | 169 | # Print out np_height_in 170 | print(np_height_in) 171 | 172 | # Convert np_height_in to m: np_height_m 173 | np_height_m = np_height_in * 0.0254 174 | 175 | # Print np_height_m 176 | print(np_height_m) 177 | ``` 178 | 179 | `@sct` 180 | ```{python} 181 | Ex().has_import("numpy", same_as = False) 182 | 183 | Ex().check_correct( 184 | has_printout(0), 185 | check_correct( 186 | check_object('np_height_in').has_equal_value(), 187 | check_function('numpy.array').check_args(0).has_equal_ast() 188 | ) 189 | ) 190 | 191 | Ex().check_correct( 192 | has_printout(1), 193 | 
check_object("np_height_m").has_equal_value(incorrect_msg = "Use `np_height_in * 0.0254` to calculate `np_height_m`.") 194 | ) 195 | 196 | success_msg("Nice! In the blink of an eye, `numpy` performs multiplications on more than 1000 height measurements.") 197 | ``` 198 | 199 | --- 200 | 201 | ## NumPy Side Effects 202 | 203 | ```yaml 204 | type: MultipleChoiceExercise 205 | key: 3662ff6637 206 | lang: python 207 | xp: 50 208 | skills: 209 | - 2 210 | ``` 211 | 212 | `numpy` is great for doing vector arithmetic. If you compare its functionality with regular Python lists, however, some things have changed. 213 | 214 | First of all, `numpy` arrays cannot contain elements with different types. 215 | Second, the typical arithmetic operators, such as `+`, `-`, `*` and `/` have a different meaning for regular Python lists and `numpy` arrays. 216 | 217 | Some lines of code have been provided for you. Try these out and select the one that would match this: 218 | 219 | ``` 220 | np.array([True, 1, 2]) + np.array([3, 4, False]) 221 | ``` 222 | 223 | The `numpy` package is already imported as `np`. 224 | 225 | `@possible_answers` 226 | - `np.array([True, 1, 2, 3, 4, False])` 227 | - `np.array([4, 3, 0]) + np.array([0, 2, 2])` 228 | - `np.array([1, 1, 2]) + np.array([3, 4, -1])` 229 | - `np.array([0, 1, 2, 3, 4, 5])` 230 | 231 | `@hint` 232 | - Copy the different code chunks and paste them in the IPython Shell. Hit **enter** to run the code and see which output matches the one generated by `np.array([True, 1, 2]) + np.array([3, 4, False])`. 233 | 234 | `@pre_exercise_code` 235 | ```{python} 236 | import numpy as np 237 | ``` 238 | 239 | `@sct` 240 | ```{python} 241 | msg1 = msg3 = msg4 = "Incorrect. Try out the different code chunks and see which one matches the target code chunk." 242 | msg2 = "Great job! `True` is converted to 1, `False` is converted to 0." 
243 | Ex().has_chosen(2, [msg1, msg2, msg3, msg4]) 244 | ``` 245 | 246 | --- 247 | 248 | ## Subsetting NumPy Arrays 249 | 250 | ```yaml 251 | type: NormalExercise 252 | key: fcb2a9007b 253 | lang: python 254 | xp: 100 255 | skills: 256 | - 2 257 | ``` 258 | 259 | Subsetting (using the square bracket notation on lists or arrays) works exactly the same with both lists and arrays. 260 | 261 | This exercise already has two lists, `height_in` and `weight_lb`, loaded in the background for you. These contain the height and weight of the MLB players as regular lists. It also has two `numpy` arrays, `np_weight_lb` and `np_height_in`, prepared for you. 262 | 263 | `@instructions` 264 | - Subset `np_weight_lb` by printing out the element at index 50. 265 | - Print out a sub-array of `np_height_in` that contains the elements at index 100 up to **and including** index 110. 266 | 267 | `@hint` 268 | - Make sure to wrap a `print()` call around your subsetting operations. 269 | - Use `[100:111]` to get the elements from index 100 up to and including index 110. 
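To see why the stop value has to be one past the last index you want, here is a minimal sketch on made-up data (the name `values` and its contents are assumptions for illustration only). It shows that a slice `start:stop` leaves out the element at `stop`, for lists and 1D arrays alike:

```python
import numpy as np

# Hypothetical data, unrelated to the MLB lists
values = [10, 20, 30, 40, 50, 60]
np_values = np.array(values)

# A single index returns one element
print(values[2], np_values[2])

# A slice start:stop excludes the stop index,
# so 1:4 covers indexes 1, 2 and 3
print(values[1:4])
print(np_values[1:4])
```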
270 | 271 | `@pre_exercise_code` 272 | ```{python} 273 | import pandas as pd 274 | mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv") 275 | height_in = mlb['Height'].tolist() 276 | weight_lb = mlb['Weight'].tolist() 277 | ``` 278 | 279 | `@sample_code` 280 | ```{python} 281 | import numpy as np 282 | 283 | np_weight_lb = np.array(weight_lb) 284 | np_height_in = np.array(height_in) 285 | 286 | # Print out the weight at index 50 287 | 288 | 289 | # Print out sub-array of np_height_in: index 100 up to and including index 110 290 | 291 | ``` 292 | 293 | `@solution` 294 | ```{python} 295 | import numpy as np 296 | 297 | np_weight_lb = np.array(weight_lb) 298 | np_height_in = np.array(height_in) 299 | 300 | # Print out the weight at index 50 301 | print(np_weight_lb[50]) 302 | 303 | # Print out sub-array of np_height_in: index 100 up to and including index 110 304 | print(np_height_in[100:111]) 305 | ``` 306 | 307 | `@sct` 308 | ```{python} 309 | Ex().has_import("numpy", same_as=False) 310 | msg = "You don't have to change or remove the predefined variables." 311 | Ex().multi( 312 | check_object("np_height_in", missing_msg=msg).has_equal_value(incorrect_msg = msg), 313 | check_object("np_weight_lb", missing_msg=msg).has_equal_value(incorrect_msg = msg) 314 | ) 315 | 316 | Ex().has_printout(0) 317 | Ex().has_printout(1) 318 | 319 | success_msg("Nice! Time to learn something new: 2D NumPy arrays!") 320 | ``` 321 | 322 | --- 323 | 324 | ## 2D NumPy Arrays 325 | 326 | ```yaml 327 | type: VideoExercise 328 | key: 1241efac7a 329 | xp: 50 330 | ``` 331 | 332 | `@projector_key` 333 | ae3238dcc7feb9adecfee0c395fc8dc8 334 | 335 | --- 336 | 337 | ## Your First 2D NumPy Array 338 | 339 | ```yaml 340 | type: NormalExercise 341 | key: 5cb045bb13 342 | lang: python 343 | xp: 100 344 | skills: 345 | - 2 346 | ``` 347 | 348 | Before working on the actual MLB data, let's try to create a 2D `numpy` array from a small list of lists. 
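For a toy illustration of the idea (the name `grid` and its values are made up for this sketch), each inner list becomes one row of the resulting 2D array, and `shape` reports rows first, then columns:

```python
import numpy as np

# Hypothetical data: 3 inner lists, each holding 2 measurements
grid = [[1.0, 2.0],
        [3.0, 4.0],
        [5.0, 6.0]]

# Each inner list becomes one row of the 2D array
np_grid = np.array(grid)

print(type(np_grid))   # the array type
print(np_grid.shape)   # (number of rows, number of columns)
```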
349 | 350 | In this exercise, `baseball` is a list of lists. The main list contains 4 elements. Each of these elements is a list containing the height and the weight of one of 4 baseball players, in this order. `baseball` is already coded for you in the script. 351 | 352 | `@instructions` 353 | - Use `np.array()` to create a 2D `numpy` array from `baseball`. Name it `np_baseball`. 354 | - Print out the type of `np_baseball`. 355 | - Print out the `shape` attribute of `np_baseball`. Use `np_baseball.shape`. 356 | 357 | `@hint` 358 | - `baseball` is already coded for you in the script. Call `np.array()` on it and store the resulting 2D `numpy` array in `np_baseball`. 359 | - Use `print()` in combination with `type()` for the second instruction. 360 | - `np_baseball.shape` will give you the dimensions of `np_baseball`. Make sure to wrap a `print()` call around it. 361 | 362 | `@pre_exercise_code` 363 | ```{python} 364 | 365 | ``` 366 | 367 | `@sample_code` 368 | ```{python} 369 | import numpy as np 370 | 371 | baseball = [[180, 78.4], 372 | [215, 102.7], 373 | [210, 98.5], 374 | [188, 75.2]] 375 | 376 | # Create a 2D numpy array from baseball: np_baseball 377 | 378 | 379 | # Print out the type of np_baseball 380 | 381 | 382 | # Print out the shape of np_baseball 383 | 384 | ``` 385 | 386 | `@solution` 387 | ```{python} 388 | import numpy as np 389 | 390 | baseball = [[180, 78.4], 391 | [215, 102.7], 392 | [210, 98.5], 393 | [188, 75.2]] 394 | 395 | # Create a 2D numpy array from baseball: np_baseball 396 | np_baseball = np.array(baseball) 397 | 398 | # Print out the type of np_baseball 399 | print(type(np_baseball)) 400 | 401 | # Print out the shape of np_baseball 402 | print(np_baseball.shape) 403 | ``` 404 | 405 | `@sct` 406 | ```{python} 407 | msg = "You don't have to change or remove the predefined variables." 
408 | Ex().check_object("baseball", missing_msg=msg).has_equal_value(incorrect_msg = msg) 409 | Ex().has_import("numpy", same_as = False) 410 | 411 | Ex().check_correct( 412 | multi( 413 | has_printout(0), 414 | has_printout(1) 415 | ), 416 | check_correct( 417 | check_object('np_baseball').has_equal_value(), 418 | check_function('numpy.array').check_args(0).has_equal_ast() 419 | ) 420 | ) 421 | 422 | success_msg("Great! You're ready to convert the actual MLB data to a 2D `numpy` array now!") 423 | ``` 424 | 425 | --- 426 | 427 | ## Baseball data in 2D form 428 | 429 | ```yaml 430 | type: NormalExercise 431 | key: 5df25d0b7b 432 | lang: python 433 | xp: 100 434 | skills: 435 | - 2 436 | ``` 437 | 438 | You realize that it makes more sense to restructure all this information in a 2D `numpy` array. 439 | 440 | You have a Python list of lists. In this list of lists, each sublist represents the height and weight of a single baseball player. The name of this list is `baseball` and it has been loaded for you already (although you can't see it). 441 | 442 | Store the data as a 2D array to unlock `numpy`'s extra functionality. 443 | 444 | `@instructions` 445 | - Use `np.array()` to create a 2D `numpy` array from `baseball`. Name it `np_baseball`. 446 | - Print out the `shape` attribute of `np_baseball`. 447 | 448 | `@hint` 449 | - `baseball` is already available in the Python environment. Call `np.array()` on it and store the resulting 2D `numpy` array in `np_baseball`. 450 | - `np_baseball.shape` will give the dimensions of `np_baseball`. Make sure to wrap a `print()` call around it. 
451 | 452 | `@pre_exercise_code` 453 | ```{python} 454 | import pandas as pd 455 | baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight']].to_numpy().tolist() 456 | import numpy as np 457 | ``` 458 | 459 | `@sample_code` 460 | ```{python} 461 | import numpy as np 462 | 463 | # Create a 2D numpy array from baseball: np_baseball 464 | np_baseball = 465 | 466 | # Print out the shape of np_baseball 467 | 468 | ``` 469 | 470 | `@solution` 471 | ```{python} 472 | import numpy as np 473 | 474 | # Create a 2D numpy array from baseball: np_baseball 475 | np_baseball = np.array(baseball) 476 | 477 | # Print out the shape of np_baseball 478 | print(np_baseball.shape) 479 | ``` 480 | 481 | `@sct` 482 | ```{python} 483 | Ex().has_import("numpy", same_as = False) 484 | 485 | Ex().check_correct( 486 | has_printout(0), 487 | check_correct( 488 | check_object('np_baseball').has_equal_value(), 489 | check_function('numpy.array').check_args(0).has_equal_ast() 490 | ) 491 | ) 492 | 493 | success_msg("Slick! Time to show off some killer features of multi-dimensional `numpy` arrays!") 494 | ``` 495 | 496 | --- 497 | 498 | ## Subsetting 2D NumPy Arrays 499 | 500 | ```yaml 501 | type: NormalExercise 502 | key: aeca4977f0 503 | lang: python 504 | xp: 100 505 | skills: 506 | - 2 507 | ``` 508 | 509 | If your 2D `numpy` array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below, where the elements `"a"` and `"c"` are extracted from a list of lists `x` (for example, `x = [["a", "b"], ["c", "d"]]`). 510 | 511 | ``` 512 | # numpy 513 | import numpy as np 514 | np_x = np.array(x) 515 | np_x[:, 0] 516 | ``` 517 | 518 | The indexes before the comma refer to the rows, while those after the comma refer to the columns. The `:` is for slicing; in this example, it tells Python to include all rows. 519 | 520 | `@instructions` 521 | - Print out the 50th row of `np_baseball`. 
522 | - Make a new variable, `np_weight_lb`, containing the entire second column of `np_baseball`. 523 | - Select the height (first column) of the 124th baseball player in `np_baseball` and print it out. 524 | 525 | `@hint` 526 | - You need row index 49 in the first instruction! More specifically, you'll want to use `[49, :]`. 527 | - To select the entire second column, you'll need `[:, 1]`. 528 | - For the last instruction, use `[123, 0]`; don't forget to wrap it all in a `print()` statement. 529 | 530 | `@pre_exercise_code` 531 | ```{python} 532 | import pandas as pd 533 | baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight']].to_numpy().tolist() 534 | import numpy as np 535 | ``` 536 | 537 | `@sample_code` 538 | ```{python} 539 | import numpy as np 540 | 541 | np_baseball = np.array(baseball) 542 | 543 | # Print out the 50th row of np_baseball 544 | 545 | 546 | # Select the entire second column of np_baseball: np_weight_lb 547 | 548 | 549 | # Print out height of 124th player 550 | 551 | ``` 552 | 553 | `@solution` 554 | ```{python} 555 | import numpy as np 556 | 557 | np_baseball = np.array(baseball) 558 | 559 | # Print out the 50th row of np_baseball 560 | print(np_baseball[49,:]) 561 | 562 | # Select the entire second column of np_baseball: np_weight_lb 563 | np_weight_lb = np_baseball[:,1] 564 | 565 | # Print out height of 124th player 566 | print(np_baseball[123, 0]) 567 | ``` 568 | 569 | `@sct` 570 | ```{python} 571 | msg = "You don't have to change or remove the predefined variables." 572 | Ex().multi( 573 | has_import("numpy", same_as = False), 574 | check_object("np_baseball", missing_msg=msg).has_equal_value(incorrect_msg = msg) 575 | ) 576 | 577 | Ex().has_printout(0) 578 | 579 | Ex().check_object('np_weight_lb').has_equal_value(incorrect_msg = "You can use `np_baseball[:,1]` to define `np_weight_lb`. 
This will select the entire second column.") 580 | 581 | Ex().has_printout(1) 582 | 583 | success_msg("This is going well!") 584 | ``` 585 | 586 | --- 587 | 588 | ## 2D Arithmetic 589 | 590 | ```yaml 591 | type: NormalExercise 592 | key: 1c2378b677 593 | lang: python 594 | xp: 100 595 | skills: 596 | - 2 597 | ``` 598 | 599 | 2D `numpy` arrays can perform calculations element by element, just like 1D `numpy` arrays. 600 | 601 | `np_baseball` is coded for you; it's again a 2D `numpy` array with 3 columns representing height (in inches), weight (in pounds) and age (in years). `baseball` is available as a regular list of lists and `updated` is available as a 2D `numpy` array. 602 | 603 | `@instructions` 604 | - You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D `numpy` array, `updated`. Add `np_baseball` and `updated` and print out the result. 605 | - You want to convert the units of height and weight to metric (meters and kilograms, respectively). As a first step, create a `numpy` array with three values: `0.0254`, `0.453592` and `1`. Name this array `conversion`. 606 | - Multiply `np_baseball` with `conversion` and print out the result. 607 | 608 | `@hint` 609 | - `np_baseball + updated` will do an element-wise summation of the two `numpy` arrays. 610 | - Create a `numpy` array with `np.array()`; the input is a regular Python list with three elements. 611 | - `np_baseball * conversion` will work, without extra work. Try it out! Make sure to wrap it in a `print()` call. 
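The reason a three-element array can be multiplied with a three-column table is NumPy broadcasting: the 1D array is applied to every row. A minimal sketch on made-up data (the names `table`, `delta` and `factors`, and their units, are assumptions for illustration only):

```python
import numpy as np

# Hypothetical table: 2 rows, 3 columns (length in cm, mass in g, count)
table = np.array([[100.0, 500.0, 2.0],
                  [250.0, 1500.0, 4.0]])

# Two arrays of the same shape are added element by element
delta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 10.0, 1.0]])
print(table + delta)

# One factor per column: cm -> m, g -> kg, count unchanged.
# The 1D array is broadcast across every row of the 2D array.
factors = np.array([0.01, 0.001, 1])
converted = table * factors
print(converted)
```

Broadcasting only works here because the length of `factors` matches the number of columns; a two-element array would raise an error.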
612 | 613 | `@pre_exercise_code` 614 | ```{python} 615 | import pandas as pd 616 | import numpy as np 617 | baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight', 'Age']].to_numpy().tolist() 618 | n = len(baseball) 619 | updated = np.array(pd.read_csv("https://assets.datacamp.com/course/intro_to_python/update.csv", header = None)) 620 | import numpy as np 621 | ``` 622 | 623 | `@sample_code` 624 | ```{python} 625 | import numpy as np 626 | 627 | np_baseball = np.array(baseball) 628 | 629 | # Print out addition of np_baseball and updated 630 | 631 | 632 | # Create numpy array: conversion 633 | 634 | 635 | # Print out product of np_baseball and conversion 636 | 637 | ``` 638 | 639 | `@solution` 640 | ```{python} 641 | import numpy as np 642 | 643 | np_baseball = np.array(baseball) 644 | 645 | # Print out addition of np_baseball and updated 646 | print(np_baseball + updated) 647 | 648 | # Create numpy array: conversion 649 | conversion = np.array([0.0254, 0.453592, 1]) 650 | 651 | # Print out product of np_baseball and conversion 652 | print(np_baseball * conversion) 653 | ``` 654 | 655 | `@sct` 656 | ```{python} 657 | Ex().has_import("numpy") 658 | 659 | msg = "You don't have to change or remove the predefined variables." 660 | Ex().check_object("np_baseball", missing_msg=msg).has_equal_value(incorrect_msg = msg) 661 | 662 | Ex().has_printout(0) 663 | 664 | Ex().check_correct( 665 | has_printout(1), 666 | check_correct( 667 | check_object('conversion').has_equal_value(), 668 | check_function('numpy.array', index = 1).check_args(0).has_equal_value() 669 | ) 670 | ) 671 | 672 | success_msg("Great job! Notice how with very little code, you can change all values in your `numpy` data structure in a very specific way. 
This will be very useful in your future as a data scientist!") 673 | ``` 674 | 675 | --- 676 | 677 | ## NumPy: Basic Statistics 678 | 679 | ```yaml 680 | type: VideoExercise 681 | key: 287995e488 682 | xp: 50 683 | ``` 684 | 685 | `@projector_key` 686 | 34495ba457d74296794d2a122c9b6e19 687 | 688 | --- 689 | 690 | ## Average versus median 691 | 692 | ```yaml 693 | type: NormalExercise 694 | key: 509c588eb6 695 | lang: python 696 | xp: 100 697 | skills: 698 | - 2 699 | ``` 700 | 701 | You now know how to use `numpy` functions to get a better feeling for your data. 702 | 703 | The baseball data is available as a 2D `numpy` array with 3 columns (height, weight, age) and 1015 rows. The name of this `numpy` array is `np_baseball`. After restructuring the data, however, you notice that some height values are abnormally high. Follow the instructions and discover which summary statistic is best suited if you're dealing with so-called _outliers_. `np_baseball` is available. 704 | 705 | `@instructions` 706 | - Create a `numpy` array `np_height_in` that is equal to the first column of `np_baseball`. 707 | - Print out the mean of `np_height_in`. 708 | - Print out the median of `np_height_in`. 709 | 710 | `@hint` 711 | - Use 2D `numpy` subsetting: `[:,0]` is a part of the solution. 712 | - If `numpy` is imported as `np`, you can use `np.mean()` to get the mean of a NumPy array. Don't forget to throw in a `print()` call. 713 | - For the last instruction, use `np.median()`. 
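To get a feel for how differently the two statistics react to a bad value, here is a small sketch on made-up numbers (the `heights` array is an assumption for illustration, not the MLB data):

```python
import numpy as np

# Hypothetical heights in inches; the last value is a data-entry error
heights = np.array([70, 72, 73, 74, 75, 74000])

# The mean is pulled far upward by the single outlier
print(np.mean(heights))

# The median only depends on the middle values, so it barely moves
print(np.median(heights))
```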
714 | 715 | `@pre_exercise_code` 716 | ```{python} 717 | import pandas as pd 718 | np_baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight', 'Age']].to_numpy() 719 | np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000 720 | import numpy as np 721 | ``` 722 | 723 | `@sample_code` 724 | ```{python} 725 | import numpy as np 726 | 727 | # Create np_height_in from np_baseball 728 | 729 | 730 | # Print out the mean of np_height_in 731 | 732 | 733 | # Print out the median of np_height_in 734 | 735 | ``` 736 | 737 | `@solution` 738 | ```{python} 739 | import numpy as np 740 | 741 | # Create np_height_in from np_baseball 742 | np_height_in = np_baseball[:,0] 743 | 744 | # Print out the mean of np_height_in 745 | print(np.mean(np_height_in)) 746 | 747 | # Print out the median of np_height_in 748 | print(np.median(np_height_in)) 749 | ``` 750 | 751 | `@sct` 752 | ```{python} 753 | Ex().has_import("numpy", same_as = False) 754 | 755 | Ex().check_object("np_height_in").has_equal_value(incorrect_msg = "You can use `np_baseball[:,0]` to select the first column from `np_baseball`"), 756 | 757 | Ex().check_correct( 758 | has_printout(0), 759 | check_function('numpy.mean').has_equal_value() 760 | ) 761 | 762 | Ex().check_correct( 763 | has_printout(1), 764 | check_function('numpy.median').has_equal_value() 765 | ) 766 | 767 | success_msg("An average height of 1586 inches, that doesn't sound right, does it? However, the median does not seem affected by the outliers: 74 inches makes perfect sense. 
It's always a good idea to check both the median and the mean, to get an idea about the overall distribution of the entire dataset.") 768 | ``` 769 | 770 | --- 771 | 772 | ## Explore the baseball data 773 | 774 | ```yaml 775 | type: NormalExercise 776 | key: '4409948807' 777 | lang: python 778 | xp: 100 779 | skills: 780 | - 2 781 | ``` 782 | 783 | Because the mean and median are so far apart, you decide to complain to the MLB. They find the error and send the corrected data over to you. It's again available as a 2D NumPy array `np_baseball`, with three columns. 784 | 785 | The Python script in the editor already includes code to print out informative messages with the different summary statistics and `numpy` is already loaded as `np`. Can you finish the job? `np_baseball` is available. 786 | 787 | `@instructions` 788 | - The code to print out the mean height is already included. Complete the code for the median height. 789 | - Use `np.std()` on the first column of `np_baseball` to calculate `stddev`. 790 | - Do taller players tend to be heavier? Use `np.corrcoef()` to store the correlation between the first and second column of `np_baseball` in `corr`. 791 | 792 | `@hint` 793 | - Use `np.median()` to calculate the median. Make sure to select the correct column first! 794 | - Subset the same column when calculating the standard deviation with `np.std()`. 795 | - Use `np_baseball[:, 0]` and `np_baseball[:, 1]` to select the first and second columns; these are the inputs to `np.corrcoef()`. 
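One thing that can be surprising is that `np.corrcoef()` returns a matrix rather than a single number. A minimal sketch on made-up measurements (the `height` and `weight` arrays below are assumptions for illustration, not the MLB data):

```python
import numpy as np

# Hypothetical measurements for five players
height = np.array([70, 72, 74, 76, 78])
weight = np.array([170, 185, 190, 210, 220])

# corrcoef returns a 2 x 2 matrix: the diagonal holds each variable's
# correlation with itself, the off-diagonal entries hold the
# correlation between the two variables
corr = np.corrcoef(height, weight)
print(corr.shape)
print(corr[0, 1])

# By default, np.std() uses the population formula (ddof=0)
print(np.std(height))
```

If only the single coefficient is needed, indexing with `[0, 1]` (or `[1, 0]`, which holds the same value) extracts it.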
796 | 797 | `@pre_exercise_code` 798 | ```{python} 799 | import pandas as pd 800 | np_baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight', 'Age']].to_numpy() 801 | import numpy as np 802 | ``` 803 | 804 | `@sample_code` 805 | ```{python} 806 | avg = np.mean(np_baseball[:,0]) 807 | print("Average: " + str(avg)) 808 | 809 | # Print median height 810 | med = ____ 811 | print("Median: " + str(med)) 812 | 813 | # Print out the standard deviation on height 814 | stddev = ____ 815 | print("Standard Deviation: " + str(stddev)) 816 | 817 | # Print out correlation between first and second column 818 | corr = ____ 819 | print("Correlation: " + str(corr)) 820 | ``` 821 | 822 | `@solution` 823 | ```{python} 824 | avg = np.mean(np_baseball[:,0]) 825 | print("Average: " + str(avg)) 826 | 827 | # Print median height 828 | med = np.median(np_baseball[:,0]) 829 | print("Median: " + str(med)) 830 | 831 | # Print out the standard deviation on height 832 | stddev = np.std(np_baseball[:,0]) 833 | print("Standard Deviation: " + str(stddev)) 834 | 835 | # Print out correlation between first and second column 836 | corr = np.corrcoef(np_baseball[:,0], np_baseball[:,1]) 837 | print("Correlation: " + str(corr)) 838 | ``` 839 | 840 | `@sct` 841 | ```{python} 842 | msg = "You shouldn't change or remove the predefined `avg` variable." 843 | Ex().check_object("avg", missing_msg=msg).has_equal_value(incorrect_msg=msg) 844 | 845 | missing = "Have you used `np.median()` to calculate the median?" 846 | incorrect = "To calculate `med`, pass the first column of `np_baseball` to `numpy.median()`. The example of `np.mean()` shows how it's done." 847 | Ex().check_correct( 848 | check_object("med").has_equal_value(), 849 | check_function("numpy.median", index=0, missing_msg=missing).check_args(0).has_equal_value(incorrect_msg=incorrect) 850 | ) 851 | 852 | missing = "Have you used `np.std()` to calculate the standard deviation?" 
853 | incorrect = "To calculate `stddev`, pass the first column of `np_baseball` to `numpy.std()`. The example of `np.mean()` shows how it's done." 854 | Ex().check_correct( 855 | check_object("stddev").has_equal_value(), 856 | check_function("numpy.std", index=0, missing_msg=missing).check_args(0).has_equal_value(incorrect_msg=incorrect) 857 | ) 858 | 859 | missing = "Have you used `np.corrcoef()` to calculate the correlation?" 860 | incorrect1 = "To calculate `corr`, the first argument to `np.corrcoef()` should be the first column of `np_baseball`, similar to how you did it before." 861 | incorrect2 = "To calculate `corr`, the second argument to `np.corrcoef()` should be the second column of `np_baseball`. Instead of `[:,0]`, use `[:,1]` this time." 862 | Ex().check_correct( 863 | check_object("corr").has_equal_value(), 864 | check_function("numpy.corrcoef", index=0, missing_msg=missing).multi( 865 | check_args(0, missing_msg=incorrect1).has_equal_value(incorrect_msg=incorrect1), 866 | check_args(1, missing_msg=incorrect2).has_equal_value(incorrect_msg=incorrect2) 867 | ) 868 | ) 869 | 870 | success_msg("Great work! You've built a solid foundation - now it's time to use all of your new data science skills to solve more challenges and make an impact.") 871 | ``` 872 | -------------------------------------------------------------------------------- /course.yml: -------------------------------------------------------------------------------- 1 | id: 735 2 | title: Introduction to Python 3 | programming_language: python 4 | description: >- 5 | Python is a general-purpose programming language that is becoming ever more 6 | popular for data science. Companies worldwide are using Python to harvest 7 | insights from their data and gain a competitive edge. Unlike other Python 8 | tutorials, this course focuses on Python specifically for data science. 
In our 9 | Introduction to Python course, you’ll learn about powerful ways to store and 10 | manipulate data, and helpful data science tools to begin conducting your own 11 | analyses. Start DataCamp’s online Python curriculum now. 12 | from: 'python-base-prod:v2.0.0' 13 | practice_pool_id: 107 14 | datasets: 15 | baseball.csv: MLB (baseball) 16 | fifa.csv: FIFA (soccer) 17 | -------------------------------------------------------------------------------- /courses-introduction-to-python.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /datasets/references.md: -------------------------------------------------------------------------------- 1 | # Sources of datasets 2 | 3 | - MLB data: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights 4 | - Soccer data: https://github.com/jokecamp/FootballData -------------------------------------------------------------------------------- /img/shield_image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacamp/courses-introduction-to-python/4deee0de7c9fb66ce9687590847413b01862a7a0/img/shield_image.png -------------------------------------------------------------------------------- /intro-to-python-keynotes.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacamp/courses-introduction-to-python/4deee0de7c9fb66ce9687590847413b01862a7a0/intro-to-python-keynotes.zip -------------------------------------------------------------------------------- /requirements.sh: 
-------------------------------------------------------------------------------- 1 | # pip3 install numpy==1.11.0 2 | # pip3 install scipy==0.18.1 3 | 4 | -------------------------------------------------------------------------------- /scripts/chapter1_script.md: -------------------------------------------------------------------------------- 1 | --- video_exercise_key:d5509896f7 2 | 3 | ## Hello Python! 4 | 5 | Hi, my name is Filip and I'll be your host for Introduction to Python for Data Science. It's a long name, but that's to stress something: this is not just another Python tutorial. Instead, the focus will be on using Python specifically for data science. By the end of this course, you'll know about powerful ways to store and manipulate data and to deploy cool data science tools for your own analyses. 6 | 7 | You will learn Python for Data Science through video lessons, like this one, and interactive exercises. You get your own Python session where you can experiment and try to come up with the correct code to solve the instructions. You're learning by doing, while receiving customized and instant feedback on your work. 8 | 9 | Python was conceived by Guido Van Rossum. What started as a hobby project, soon became a general purpose programming language: nowadays, you can use Python to build practically any piece of software. But how did this happen? Well, first of all, Python is open source. It's free to use. Second, it's very easy to build packages in Python, which is code that you can share with other people to solve specific problems. Throughout time, more and more of these packages specifically built for data science have been developed. Suppose you want to make some fancy visualizations of your company's sales. There's a package for that. Or what about connecting to a database to analyze sensor measurements? There's also a package for that. 10 | 11 | Currently, there are two common versions of Python, version 2.7 and 3.5 and later. 
Apart from some syntactical differences, they are pretty similar, but as support for version 2 will fade over time, our courses focus on Python 3. To install Python 3 on your own system, follow the steps at this URL. 12 | 13 | Now that you're all eyes and ears for Python, let's start experimenting. I'll start with the Python shell, a place where you can type Python code and immediately see the results. In DataCamp's exercise interface, this shell is embedded here. Let's start off simple and use Python as a calculator. Let me type 4 + 5 and hit Enter. Python interprets what you typed and prints the result of your calculation, 9. The Python shell that's used here is actually not the original one; we're using IPython, short for Interactive Python, which is some kind of juiced up version of regular Python that'll be useful later on. 14 | 15 | Apart from interactively working with Python, you can also have Python run so-called Python scripts. These Python scripts are simply text files with the extension (dot) py. A script is basically a list of Python commands that are executed, almost as if you were typing the commands in the shell yourself, line by line. Let's put the command from before in a script now, that can be found here in DataCamp's interface. The next step is executing the script, by clicking 'Submit Answer'. 16 | 17 | If you execute this script in the DataCamp interface, there's nothing in the output pane. That's because you have to explicitly use print() inside scripts if you want to generate output during execution. Let's wrap our previous calculation in a print() call, and rerun the script. This time, the same output as before is generated, great! 18 | 19 | Putting your code in Python scripts instead of manually retyping every step interactively will help you to keep structure and avoid retyping everything over and over again if you want to make a change; you simply make the change in the script, and rerun the entire thing.
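
As a minimal sketch of the difference between the shell and a script, the snippet below shows what such a script could look like. The file name `calc.py` and the variable name `result` are assumptions made for illustration; they do not come from the video.

```python
# Contents of a hypothetical script file, for example calc.py.

# A bare expression is evaluated when the script runs, but a script
# does not display its value the way the interactive shell does.
4 + 5

# Wrapping the calculation in print() makes the result visible.
result = 4 + 5  # "result" is an illustrative name
print(result)
```

Run as a script, only the `print()` call produces output; typed line by line in the IPython shell, the bare `4 + 5` would also echo its value.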
20 | 21 | Now that you've got an idea about different ways of working with Python, I suggest you head over to the exercises. Use the IPython Shell for experimentation, and use the Python script editor to code the actual answer. If you click Submit Answer, your script will be executed and checked for correctness. Have fun! 22 | 23 | --- video_exercise_key:ef8356fb92 24 | 25 | ## Variables and types 26 | 27 | It's clear that Python is a great calculator. If you want to do more complex calculations though, you will want to "save" values while you're coding along. You can do this by defining a variable, with a specific, case-sensitive name. Once you create (or declare) such a variable, you can later call up its value by typing the variable name. 28 | 29 | Suppose you measure your height and weight, in metric units: you are 1.79 meters tall, and weigh 68.7 kilograms. You can assign these values to two variables, named height and weight, with an equals sign: 30 | 31 | If you now type the name of the variable, height, 32 | 33 | Python looks for the variable name, retrieves its value, and prints it out. 34 | 35 | Let's now calculate the Body Mass Index, or BMI, which is calculated as follows, with weight in kilograms and height in meters. You can do this with the actual values, but you can just as well use the variables height and weight, like in here. Every time you type the variable's name, you are asking Python to change it with the actual value of the variable. weight corresponds to 68.7, and height to 1.79. 36 | 37 | Finally, this version has Python store the result in a new variable, bmi. bmi now contains the same value as the one you calculated earlier. 38 | 39 | In Python, variables are used all the time. They help to make your code reproducible. Suppose the code to create the height, weight and bmi variable are in a script, like this. 
If you now want to recalculate the bmi for another weight, you can simply change the declaration of the weight variable, and rerun the script. The bmi changes accordingly, because the value of the variable weight has changed as well. 40 | 41 | So far, we've only worked with numerical values, such as height and weight. In Python, these numbers all have a specific type. You can check out the type of a value with the type() function. To see the type of our bmi value, simply write type and then bmi inside parentheses. You can see that it's a float, which is python's way of representing a real number, so a number which can have both an integer part and a fractional part. Python also has a type for integers: int, like this example. 42 | 43 | To do data science, you'll need more than ints and floats, though. Python features tons of other data types. The most common ones are strings and booleans. 44 | 45 | A string is Python's way to represent text. You can use both double and single quotes to build a string, as you can see from these examples. If you print the type of the last variable here, you see that it's str, short for string. 46 | 47 | The Boolean is a type that can either be True or False. You can think of it as 'Yes' and 'No' in everyday language. Booleans will be very useful in the future, to perform filtering operations on your data for example. 48 | 49 | There's something special about Python data types. Have a look at this line of code, that sums two integers, and then this line of code, that sums two strings. 50 | 51 | For the integers, the values were summed, while for the strings, the strings were pasted together. The plus operator behaved differently for different data types. This is a general principle: how the code behaves depends on the types you're working with. 52 | 53 | In the exercises that follow, you'll create your first variables and experiment with some of Python's data types. I'll see you in the next video to explain all about lists. 
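
The ideas in this script can be sketched in a few lines. The height and weight are the ones mentioned above; the values used to show the plus operator (`2 + 3` and `"ab" + "cd"`) and the name `is_tall` are assumptions chosen for illustration, since the transcript does not say which examples appear on screen.

```python
# Height in meters and weight in kilograms, as in the script.
height = 1.79
weight = 68.7

# BMI is weight divided by height squared.
bmi = weight / height ** 2
print(bmi)

# type() reports the type of a value.
print(type(bmi))       # float
print(type(5))         # int
print(type("body"))    # str
is_tall = True         # illustrative boolean
print(type(is_tall))   # bool

# The plus operator behaves differently depending on the types.
int_sum = 2 + 3        # numbers are added
str_sum = "ab" + "cd"  # strings are pasted together
print(int_sum, str_sum)
```

Changing `weight` at the top and rerunning the script updates `bmi` without touching any other line, which is the reproducibility point made above.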
54 | -------------------------------------------------------------------------------- /scripts/chapter2_script.md: -------------------------------------------------------------------------------- 1 | --- video_exercise_key:f366e876d8 2 | 3 | ## Lists 4 | 5 | By now, you've played around with different data types. On the numbers side, there's the float, to represent a real number, and the int, to represent an integer. Next, we also have str, short for string, to represent text in Python, and bool, which can be either True or False. You can save these values as a variable, like these examples show. Each variable then represents a single value. 6 | 7 | As a data scientist, you'll often want to work with many data points. If you for example want to measure the height of everybody in your family, and store this information in Python, it would be inconvenient to create a new python variable for each point you collected right? 8 | 9 | What you can do instead, is store all this information in a Python list. You can build such a list with square brackets. Suppose you asked your two sisters and parents for their height, in meters. You can build the list as follows: 10 | 11 | Of course, also this data structure can be referenced to with a variable. Simply put the variable name and the equals sign in front, like here. 12 | 13 | A list is a way to give a single name to a collection of values. These values, or elements, can have any type; they can be floats, integer, booleans, strings, but also more advanced Python types, even lists. 14 | 15 | It's perfectly possible for a list to contain different types as well. Suppose, for example, that you want to add the names of your sisters and parents to the list, so that you know which height belongs to who. You can throw in some strings without issues. 16 | 17 | But that's not all. I just told you that lists can also contain lists themselves. 
Instead of putting the strings in between the numbers, you can create little sublists for each member of the family. One for liz, one for emma and so on. Now, you can tell Python that these sublists are the elements of another list, that I named fam2: the little lists are wrapped in square brackets and separated with commas. If you now print out fam2, you see that we have a list of lists. The main list contains 4 sub-lists. 18 | 19 | We're dealing with a new Python type here, next to the strings, booleans, integers and floats you already know about: the list. These calls show that both fam and fam2 are lists. Remember that I told you that each type has specific functionality and behavior associated? Well, for lists, this is also true. Python lists host a bunch of tools to subset and adapt them. But let's take this step by step, and have you experiment with list creation first! 20 | 21 | --- video_exercise_key:9e15e5b8a0 22 | 23 | ## Subsetting lists 24 | 25 | After you've created your very own Python list, you might wonder how you can access information in the list. Python uses the index to do this. Have a look at the fam list again here. The first element in the list has index 0, the second element has index 1, and so on. Suppose that you want to select the height of emma, the float 1.68. It's the fourth element, so it has index 3. To select it, you use 3 inside square brackets. 26 | 27 | Similarly, to select the string "dad" from the list, which is the seventh element in the list, you'll need to put the index 6 inside square brackets. 28 | 29 | You can also count backwards, using negative indexes. This is useful if you want to get some elements at the end of your list. To get your dad's height, for example, you'll need the index -1. These are the negative indexes for all list elements. 30 | 31 | This means that this line and this line, return the exact same result. 
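
A short sketch of the indexing described above, using the `fam` list as it is described across these scripts (names alternating with heights). The variable names on the left-hand side are illustrative only.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# Indexing starts at 0, so the fourth element has index 3.
emma_height = fam[3]   # 1.68
dad_name = fam[6]      # "dad", the seventh element

# Negative indexes count backwards from the end of the list.
dad_height = fam[-1]   # 1.89

# For this eight-element list, index 7 and index -1 select the same element.
print(fam[7] == fam[-1])
```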
32 | 33 | Apart from indexing, there's also something called slicing, which allows you to select multiple elements from a list, thus creating a new list. You can do this by specifying a range, using a colon. Let's first have another look at the list, and then try this piece of code. 34 | 35 | Can you guess what it'll return? A list with the float 1.68, the string "mom", and the float 1.71, corresponding to the 4th, 5th and 6th element in the list maybe? Let's see what the output is. 36 | 37 | Apparently, only the elements with index 3 and 4 get returned. The element with index 5 is not included. In general, this is the syntax: the index you specify before the colon, so where the slice starts, is included, while the index you specify after the colon, where the slice ends, is not. 38 | 39 | With this in mind, can you tell what this call will return? 40 | 41 | You probably guessed correctly that this call gives you a list with three elements, corresponding to the elements with index 1, 2 and 3 of the fam list. 42 | 43 | You can also choose to just leave out the index before or after the colon. If you leave out the index where the slice should begin, you're telling Python to start the slice from index 0, like this example. 44 | 45 | If you leave out the index where the slice should end, you include all elements up to and including the last element in the list, like here. 46 | 47 | Now it's time to head over to the exercises, where you will continue to work on the list you've created yourself before. You'll use different subsetting methods to get exactly the piece of information you need! 48 | 49 | --- video_exercise_key:fbdaaec22a 50 | 51 | ## Manipulating lists 52 | 53 | After creation and subsetting, the final piece of the Python lists puzzle is manipulation, so ways to change elements in your list, or to add elements to and remove elements from your list. 54 | 55 | Changing list elements is pretty straightforward.
You use the same square brackets that we've used to subset lists, and then assign new elements to it using the equals sign. Suppose that after another look at `fam`, you realize that your dad's height is not up to date anymore, as he's shrinking with age. Instead of 1.89 meters, it should be 1.86 meters. To change this list element, which is at index 7, you can use this line of code. 56 | 57 | If you now check out fam, you'll see that the value is updated. 58 | 59 | You can even change an entire list slice at once. To change the elements "liz" and 1.73, you access the first two elements with 0:2, and then assign a new list to it. 60 | 61 | Do you still remember how the plus operator was different for strings and integers? Well, it's again different for lists. If you use the plus sign with two lists, Python simply pastes together their contents in a single list. Suppose you want to add your own name and height to the fam height list. This will do the trick. 62 | 63 | Of course, you can also store this new list in a variable, `fam_ext` for example. 64 | 65 | Finally, deleting elements from a list is also pretty straightforward, you'll have to use `del` here. Take this line, for example, that deletes the element with index 2, so "emma", from the list. 66 | 67 | If you check out fam now, you'll see that the "emma" string is gone. Because you've removed an index, all elements that came after "emma" scooted over by one index. If you again run the same line, you're again removing the element at index 2, which is emma's height, 1.68 meters now. 68 | 69 | Understanding how Python lists actually work behind the scenes becomes pretty important now. What actually happens when you create a new list, `x`, like this? 70 | 71 | Well, in a simplified sense, you're storing a list in your computer memory, and store the 'address' of that list, so where the list is in your computer memory, in `x`. 
This means that `x` does not actually contain all the list elements, it rather contains a reference to the list. For basic operations, the difference is not that important, but it becomes more so when you start copying lists. Let me clarify this with an example. 72 | 73 | Let's store the list `x` as a new variable `y`, by simply using the equals sign. 74 | 75 | Let's now change the element with index one in the list `y`, as follows. 76 | 77 | The funky thing is that if you now check out `x` again, also here the second element was changed. 78 | 79 | That's because when you copied x to y with the equals sign, you copied the reference to the list, not the actual values themselves. When you update an element in the list, you're changing one and the same list in the computer memory. Both `x` and `y` point to this list, so the update is visible from both variables. 80 | 81 | If you want to create a list `y` that points to a new list in the memory with the same values, you'll need to use something other than the equals sign. You can use the `list()` function, like this, or use slicing to select all list elements explicitly. 82 | 83 | If you now make a change to the list `y` points to, `x` is not affected. 84 | 85 | If this was a bit too much to take in, don't worry. The exercises will help you understand list manipulation and the subtle inner workings of lists. I'm sure you'll do great! 86 | -------------------------------------------------------------------------------- /scripts/chapter3_script.md: -------------------------------------------------------------------------------- 1 | --- video_exercise_key:2dde2f90b8 2 | 3 | ## Functions, what are they? 4 | 5 | In this video, I'm going to introduce you to functions. Functions aren't entirely new for you actually: you've already used them. type(), for example, is a function that returns the type of a value. But what is a function? Simply put, a function is a piece of reusable code, aimed at solving a particular task.
You can call functions instead of having to write code yourself. Maybe an example can clarify things here. 6 | 7 | Suppose you have the list containing only the heights of your family, fam: 8 | 9 | Say that you want to get the maximum value in this list. Instead of writing your own piece of Python code that goes through the list and finds the highest value, you can also use Python's max() function. This is one of Python's built-in functions, just like type(). We simply pass fam to max() inside parentheses. 10 | 11 | The output makes sense: 1.89, the highest number in the list. 12 | 13 | max() worked kind of like a black box here: you passed it a list, then the implementation of `max()`, that you don't know, did its magic, and produced an output. How max() actually did this, is not important to you, it just does what it's supposed to, and you didn't have to write your own code, which made your life easier. 14 | 15 | Of course, it's possible to also assign the result of a function call to a new variable, like here. Now `tallest` is just like any other variable; you can use it to continue your fancy calculations. 16 | 17 | Another one of these built-in functions is round(). It takes two inputs: first, a number you want to round, and second, the precision with which to round, so how many digits behind the decimal point you want to keep. Say you want to round 1.68 to one decimal place. The first input is 1.68, the second input is 1. You separate the inputs with a comma. 18 | 19 | But there's more. It's perfectly possible to call the round() function with only one input, like this. This time, Python figured out that you didn't specify the second input, and automatically chooses to round the number to the closest integer. 20 | 21 | To understand why both approaches work, let's open up the documentation. You can do this with yet another function, `help`, as follows. 22 | 23 | It appears that round() takes two inputs.
In Python, these inputs, also called arguments, have names: number and ndigits. When you call the function round(), with these two inputs, Python matches the inputs to the arguments: number is set to 1.68 and ndigits is set to 1. Next, The round() function does its calculations with number and ndigits as if they are variables in a Python script. We don't know exactly what code Python executes. What is important, though, is that the function produces an output, namely the number 1.68 rounded to 1 decimal place. 24 | 25 | If you call the function round() with only one input, Python again tries to match the inputs to the arguments. There's no input to match to the ndigits argument though. Luckily, the internal machinery of the round() function knows how to handle this. When ndigits is not specified, the function simply rounds to the closest integer and returns that integer. That's why we got the number 2. 26 | 27 | How was I so sure that calling the function with a single input would work? Well, in the documentation, there are square brackets around the comma and the ndigits here. This tells us that you can call round() in this form, as well as in this one. In other words, ndigits is an optional argument. Actually, Python offers yet another way to show that a function has optional arguments, but that's something for the exercises. 28 | 29 | By now, you have an idea about how to use max() and round(), but how could you know that a function such as round() exists in Python in the first place? Well, this is something you will learn with time. Whenever you are doing a rather standard task in Python, you can be pretty sure that there's already a function that can do this for you. In that case, you should definitely use it! Just do a quick internet search and you'll find the function you need with a nice usage example. And there is of course DataCamp, where you'll also learn about powerful functions and how to use them. Get straight to it in the interactive exercises! 
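
The function calls discussed in this video can be sketched as follows. The heights-only `fam` list uses the four heights mentioned in these scripts; the names `rounded_one` and `rounded_int` are illustrative.

```python
# Heights only, in meters.
fam = [1.73, 1.68, 1.71, 1.89]

# max() returns the highest value; the result can be stored in a variable.
tallest = max(fam)
print(tallest)

# round() with two inputs: number=1.68, ndigits=1.
rounded_one = round(1.68, 1)
print(rounded_one)

# round() with one input: ndigits is left out, so the result is an integer.
rounded_int = round(1.68)
print(rounded_int)

# help(round) would display the documentation for round() in the shell.
```

Leaving out `ndigits` works because it is an optional argument, which is what the square brackets in the documentation signal.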
30 | 31 | --- video_exercise_key:e1aaeb300b 32 | 33 | ## Methods 34 | 35 | Built-in functions are only one part of the Python story. You already know about functions such as max(), to get the maximum of a list, len(), to get the length of a list or a string, and so on. But what about other basic things, such as getting the index of a specific element in the list, or reversing a list? You can look very hard for built-in functions that do this, but you won't find them. 36 | 37 | In the past exercises, you've already created a bunch of variables. Among other Python types, you've created strings, floats and lists, like the ones you see here. Each one of these values or data structures is a so-called Python object. This string is an object, this float is an object, but this list is also an object. These objects have a specific type, that you already know: string, float, and list, and of course they represent the values you gave them, such as "liz", 1.73 and an entire list. But next to that, Python objects also come with a bunch of so-called "methods". You can think of methods as functions that "belong to" Python objects. A Python object of type string has methods, such as capitalize and replace, but also objects of type float and list have specific methods depending on the type. 38 | 39 | Enough for the theory now; let's try to use a method! Suppose you want to get the index of the string "mom" in the fam list. fam is a Python object with the type list, and has a method named index(). To call the method, you use the dot notation, like this. The only input is the string "mom", the element you want to get the index for. 40 | 41 | Python returns 4, which indeed is the index of the string "mom". I called the index() method "on" the fam list here, and the output was 4. Similarly, I can use the count() method on the fam list to count the number of times 1.73 occurs in the list. 42 | 43 | Python gives me 1, which makes sense, because only liz is 1.73 meters tall.
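
A minimal sketch of the two list method calls described above, using the `fam` list as it is described in these scripts. The names `mom_index` and `liz_count` are illustrative.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# index() is called "on" the list with dot notation and returns the
# position of the first matching element.
mom_index = fam.index("mom")
print(mom_index)

# count() returns how many times a value occurs in the list.
liz_count = fam.count(1.73)
print(liz_count)
```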
44 | 45 | 46 | But lists are not the only Python objects that have methods associated. Also floats, integers, booleans and strings are Python objects that have specific methods associated with them. Take the variable `sister` for example, that represents a string. 47 | 48 | You can call the method capitalize() on sister, without any inputs. It returns a string where the first letter is capitalized now. 49 | 50 | Or what if you want to replace some parts of the string with other parts? Not a problem. Just call the method replace on sister, with two appropriate inputs. 51 | 52 | In the output, "z" is replaced with "sa". 53 | 54 | I guess it's clear by now: in Python, everything is an object, and each object has specific methods associated. Depending on the type of the object, list, string, float, whatever, the available methods are different. A string object like sister has a replace method, but a list like fam doesn't have this, as you can see from this error. Objects of different types can have methods with the same name: Take the index() method. It's available for both strings and lists. If you call it on a string, you get the index of the letters in the string; If you call it on a list, you get the index of the element in the list. This means that, depending on the type of the object, the methods behave differently. 55 | 56 | Before I unleash you on some exercises on methods, there's one more thing I want to tell you. Some methods can change the objects they are called on. Let's retake the fam list, and call the append() method on it. As the input, we pass a string we want to add to the list. 57 | 58 | Python doesn't generate an output, but if we check the `fam` list again, we see that it has been extended with the string "me". 59 | 60 | Let's do this again, this time to add my height to the list. 61 | 62 | Again, the fam list was extended.
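
The string and list method calls described above can be sketched like this. The value `"liz"` for `sister` and the height `1.79` appended at the end are assumptions for illustration; the transcript does not state the exact values shown on screen.

```python
sister = "liz"  # assumed value for illustration

# String methods return a new string; sister itself is unchanged.
sister_cap = sister.capitalize()
sister_new = sister.replace("z", "sa")
print(sister, sister_cap, sister_new)

fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# append() changes the list it is called on and returns nothing useful.
fam.append("me")
fam.append(1.79)  # assumed height for illustration
print(fam)
```

The contrast is the point: `capitalize()` and `replace()` leave `sister` as it was, while `append()` modifies `fam` in place.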
63 | 64 | This is pretty cool, because you can write very concise code to update your data structures on the fly, but it can also be pretty dangerous. Some method calls don't change the object they're called on, while others do, so watch out. 65 | 66 | Let's take a step back here and summarise this. You have Python functions, like type(), max() and round(), that you can call like this. 67 | There are also methods, which are functions that are specific to Python objects. Depending on the type of the Python object you're dealing with, you'll be able to use different methods and they behave differently. You can call methods on the objects with the dot notation, like this, for example. 68 | 69 | There's much more to tell about Python objects, methods and how Python works internally, but for now, let's stick to what I've talked about here. It's time to get some exercises and add methods to your evergrowing skillset! 70 | 71 | --- video_exercise_key:2b89c5a9d8 72 | 73 | ## Packages 74 | 75 | By now, I hope you're convinced that Python functions and methods are extremely powerful: you can basically use other people's code to solve your own problems. However, adding all functions and methods that have been written up to now to the same Python distribution would be a mess. There would be tons and tons of code in there, that you'll never use. Also, maintaining all of this code would be a real pain. 76 | 77 | This is where packages come into play. You can think of packages as a directory of Python scripts. Each such script is a so-called module. These modules specify functions, methods and new Python types aimed at solving particular problems. There are thousands of Python packages available from the internet. Among them are packages for data science: there's numpy to efficiently work with arrays, matplotlib for data visualization, and scikit-learn for machine learning. 78 | 79 | Not all these packages are available in Python by default.
To use Python packages, you'll first have to install them on your own system, and then put code in your script to tell Python that you want to use these packages. 80 | 81 | Datacamp already has all necessary packages installed for you, but if you want to install them on your own system, you'll want to use pip, a package maintenance system for Python. If you go to this URL, you can download the file get-pip.py. Next, you go to the terminal, and execute python3 get-pip.py. Now you can use pip to actually install a Python package of your choosing. Suppose we want to install the numpy package, which you'll learn about in the next chapter. You type pip3 install numpy. You have to use the commands python3 and pip3 here to tell our system that we're working with Python version 3. 82 | 83 | Now that the package is installed, you can actually start using it in one of your Python scripts. Before you can do this, you should import the package, or a specific module of the package. You can do this with the import statement. 84 | 85 | To import the entire numpy package, you can do import numpy, like this. 86 | 87 | A commonly used function in Numpy is array(). It takes a list as input. Simply calling the array function like this, will generate an error. 88 | 89 | To refer to the array function from the numpy package, you'll need this. 90 | 91 | This time it works. The Numpy array is very useful to do data science, but more on that later. 92 | 93 | Using this numpy dot prefix all the time can become pretty tiring, so you can also import the package and refer to it with a different name. You can do this by extending your import statement with as, like this. 94 | 95 | Now, instead of numpy.array(), you'll have to use np.array() to use Numpy's array function. 96 | 97 | There are cases in which you only need one specific function of a package. Python allows you to make this explicit in your code. Suppose that we only want to use the array() function from the Numpy package. 
Instead of doing import numpy, you can instead do from numpy import array, like this. 98 | 99 | This time, you can simply call the array function like this, no need to use numpy dot here. 100 | 101 | This from import version to use specific parts of a package can be useful to limit the amount of coding, but you're also losing some of the context. Suppose you're working in a long Python script. You import the array function from numpy at the very top, and way later, you actually use this array function. Somebody else who's reading your code might have forgotten that this array function is a specific Numpy function; it's not clear from the function call. In that respect, the more standard import numpy call is preferred: In this case, your function call is numpy.array(), making it very clear that you're working with Numpy. At the end of the day, it's a matter of personal preference; up to you to decide what you think is most convenient! 102 | 103 | Off to the exercises now, where you can practice on different ways of importing packages and modules yourself! 104 | -------------------------------------------------------------------------------- /scripts/chapter4_script.md: -------------------------------------------------------------------------------- 1 | --- video_exercise_key:ed471f4b00 2 | 3 | ## Intro to Numpy 4 | 5 | By now, you are aware that the Python list is pretty powerful. A list can hold any type and can hold different types at the same time. You can also change, add and remove elements. This is wonderful, but one feature is missing, a feature that is super important for aspiring data scientists as yourself. When analyzing data, you'll often want to carry out operations over entire collections of values, and you want to do this fast. With lists, this is a problem. 6 | 7 | Let's retake the heights of your family and yourself. Suppose you've also asked for everybody's weight. It's not very polite, but everything for science, right?
You end up with two lists, height and weight. The first person is 1.73 meters tall and weighs 65.4 kilograms. 8 | 9 | If you now want to calculate the Body Mass Index for each family member, you'd hope that this call can work, making the calculations element-wise. 10 | 11 | Unfortunately, Python throws an error, because lists don't support this kind of element-wise arithmetic. You could solve this by going through each list element one after the other, and calculating the BMI for each person separately, but this is terribly inefficient and tiresome to write. 12 | 13 | A way more elegant solution is to use NumPy, or Numerical Python. It's a Python package that, among other things, provides an alternative to the regular Python list: the Numpy array. The Numpy array is pretty similar to the list, but has one additional feature: you can perform calculations over entire arrays. It's really easy, and super-fast as well. 14 | 15 | The Numpy package is already installed on DataCamp's servers, but if you want to work with it on your own system, go to the command line and execute pip3 install numpy. 16 | 17 | Next, to actually use Numpy in your Python session, you can import the numpy package, like this. 18 | 19 | Let's start with creating a numpy array. You do this with Numpy's array() function: the input is a regular Python list. I'm using array() twice here, to create Numpy versions of the height and weight lists from before: np_height and np_weight: 20 | 21 | Let's try to calculate everybody's BMI with a single call again. 22 | 23 | This time, it worked fine: the calculations were performed element-wise. The first person's BMI was calculated by dividing the first element in np_weight by the square of the first element in np_height, the second person's BMI was calculated with the second height and weight elements, and so on. 24 | 25 | Let's do a quick comparison here.
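
That comparison can be sketched roughly as follows, assuming NumPy is installed. The first height and weight (1.73 meters, 65.4 kilograms) come from the example above; the other values are illustrative numbers picked for this sketch, and `list_error` is a hypothetical helper variable.

```python
import numpy as np

# Illustrative family data: only the first pair (1.73 m, 65.4 kg) is from the script
height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Attempt 1: regular lists. Lists do not support ** or /, so this raises a TypeError.
list_error = None
try:
    bmi = weight / height ** 2
except TypeError as err:
    list_error = err
    print("Lists failed:", err)

# Attempt 2: convert the lists to Numpy arrays and repeat the same calculation
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2  # calculated element-wise, one BMI per person
print(bmi)
```

The only difference between the two attempts is the type of the objects involved; the formula itself is unchanged.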
First, we tried to do calculations with regular lists, like this, but this gave us an error, because Python doesn't know how to do calculations with lists the way we want it to. Next, these regular lists were converted to Numpy arrays. The same operations now work without any problem: Numpy knows how to work with arrays as if they are single values, which is pretty awesome if you ask me. 26 | 27 | You should still pay attention, though. First of all, Numpy can do all of this so easily because it assumes that your Numpy array can only contain values of a single type. It's either an array of floats, or an array of booleans, and so on. If you do try to create an array with different types, like this for example, the resulting Numpy array will contain a single type, string in this case. The boolean and the float were both converted to strings. 28 | 29 | Second, you should know that a Numpy array is simply a new kind of Python type, like the float, string and list types from before. This means that it comes with its own methods, which can behave differently than you'd expect. Take this Python list and this numpy array, for example. 30 | 31 | If you do python_list + python_list, the list elements are pasted together, generating a list with 6 elements. If you do this with the numpy arrays, on the other hand, Python will do an element-wise sum of the arrays. 32 | 33 | Just make sure to pay attention when you're juggling different Python types, because the outcomes can differ a lot! 34 | 35 | Apart from these subtleties, you can work with Numpy arrays pretty much the same as you can with regular Python lists. When you want to get elements from your array, for example, you can use square brackets. Suppose you want to get the `bmi` for the second person, so at index 1. This will do the trick. 36 | 37 | Specifically for Numpy, there's also another way to do subsetting: using an array of booleans.
Say you want to get all BMI values in the bmi array that are over 23. A first step is using the greater than sign, like this: 38 | 39 | The result is a Numpy array containing booleans: True if the corresponding bmi is above 23, False otherwise. Next, you can use this boolean array inside square brackets to do subsetting. Only the elements in bmi that are above 23, so for which the corresponding boolean value is True, are selected. There's only one BMI that's above 23, so we end up with a Numpy array with a single value, that specific BMI. 40 | 41 | Using the result of a comparison to make a selection of your data is a very common way to get surprising insights. Learn all about it and the other Numpy basics in the exercises! 42 | 43 | --- video_exercise_key:84e9f3c38d 44 | 45 | ## 2D Numpy arrays 46 | 47 | Let's recreate the numpy arrays from the previous video. 48 | 49 | If you ask for the type of these arrays, Python tells you that they are numpy.ndarray. numpy dot tells you it's a type that was defined in the numpy package. ndarray stands for n-dimensional array. The arrays np_height and np_weight are one-dimensional arrays, but it's perfectly possible to create two-dimensional, three-dimensional, heck, even seven-dimensional arrays! Let's stick to 2 in this video though. 50 | 51 | You can create a 2D numpy array from a regular Python list of lists. Let's try to create one numpy array for all height and weight data of your family, like this. 52 | 53 | If you print out np_2d now, you'll see that it is a rectangular data structure: each sublist in the list corresponds to a row in the two-dimensional numpy array. From np_2d.shape, you can see that we indeed have 2 rows and 5 columns. shape is a so-called attribute of the np_2d array, that can give you more information about what the data structure looks like. 54 | 55 | Also for 2D arrays, the Numpy rule applies: an array can only contain a single type.
If you change one float to be a string, all the array elements will be coerced to strings, to end up with a homogeneous array. 56 | 57 | You can think of the 2D numpy array as an improved list of lists: you can perform calculations on the arrays, like I showed before, and you can do more advanced ways of subsetting. 58 | 59 | Suppose you want the first row, and then the third element in that row. To select the row, you need the index 0 in square brackets. 60 | 61 | To then select the third element, you can extend the same call with another pair of brackets, this time with the index 2, like this. Basically you're selecting the row, and then from that row doing another selection. 62 | 63 | There's also an alternative way of subsetting, using single square brackets and a comma. This call returns the exact same value as before. The value before the comma specifies the row, the value after the comma specifies the column. The intersection of the rows and columns you specified is returned. 64 | 65 | Once you get used to it, this syntax is more intuitive and opens up more possibilities. Suppose you want to select the height and weight of the second and third family member. You want both rows, so you put in a colon before the comma. You only want the second and third column, so you put in the indices 1 to 3 after the comma. Remember that the end index, 3, is not included here. The intersection gives us a 2D array with 2 rows and 2 columns: 66 | 67 | Similarly, you can select the weight of all family members like this: you only want the second row, so put 1 before the comma. You want all columns, so you use a colon after the comma. The intersection gives us the entire second row. 68 | 69 | Finally, 2D numpy arrays enable you to do element-wise calculations, the same way you did it with 1D numpy arrays. That's something you can experiment with in the exercises, along with creating and subsetting 2D numpy arrays! Exciting...
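
The subsetting calls described in this video can be sketched as follows, assuming NumPy is installed. The numbers are illustrative family values (only the first height and weight, 1.73 and 65.4, are stated in the script), and variable names such as `second_and_third` are hypothetical names used for this example only.

```python
import numpy as np

# Illustrative data: first sublist holds heights (m), second sublist holds weights (kg)
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

print(np_2d.shape)  # 2 rows, 5 columns

# First row, then the third element of that row: two pairs of brackets
chained = np_2d[0][2]

# Same element with single brackets and a comma: row first, column second
with_comma = np_2d[0, 2]

# Both rows, columns at index 1 and 2 (the end index 3 is not included)
second_and_third = np_2d[:, 1:3]

# Entire second row: the weights of all family members
all_weights = np_2d[1, :]

print(chained, with_comma)
print(second_and_third)
print(all_weights)
```

Both `np_2d[0][2]` and `np_2d[0, 2]` pick out the same element; the comma form is the one that extends naturally to selecting several rows and columns at once.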
70 | 71 | --- video_exercise_key:16403c5a74 72 | 73 | ## Basic Statistics with Numpy 74 | 75 | A typical first step in analyzing your data is getting to know your data in the first place. For the Numpy arrays from before, this is pretty easy, because it isn't a lot of data. However, as a data scientist, you'll be crunching thousands, if not millions or billions of numbers. 76 | 77 | Imagine you conduct a city-wide survey where you ask 5000 adults about their height and weight. You end up with something like this: a 2D numpy array, which I named np_city, that has 5000 rows, corresponding to the 5000 people, and two columns, corresponding to the height and the weight. 78 | 79 | Simply staring at these numbers like a zombie won't give you any insights. What you can do, though, is generate summarizing statistics about your data. Aside from offering an efficient data structure for number crunching, Numpy is also good at doing these kinds of things. 80 | 81 | For starters, you can try to find out the average height of these 5000 people, with Numpy's mean function. Because it's a function from the Numpy package, don't forget to start with np and a dot. 82 | 83 | Of course, I first had to do a subsetting operation to get the height column from the 2D array. It appears that on average, people are 1.75 meters tall. What about the median height? This is the height of the middle person if you sort all persons from shortest to tallest. Instead of writing complicated Python code to figure this out, you can simply use Numpy's median() function: 84 | 85 | You can do similar things for the weight column in np_city. Often, these summarizing statistics will provide you with a "sanity check" of your data. If you end up with an average weight of 2000 kilograms, your measurements are most likely incorrect. 86 | 87 | Apart from mean() and median(), there are also other functions, like corrcoef() to check if for example height and weight are correlated, 88 | 89 | and std(), for standard deviation.
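
A minimal sketch of these summary statistics, assuming NumPy is installed. The survey data is not part of this script, so the sketch simulates a stand-in `np_city` array; the seed and the distribution parameters are assumptions made for illustration, not the values used in the video. Note that NumPy spells the correlation function `np.corrcoef`.

```python
import numpy as np

# Simulated stand-in for the survey data (seed and parameters are assumptions)
rng = np.random.default_rng(0)
height = np.round(rng.normal(1.75, 0.20, 5000), 2)  # heights in meters
weight = np.round(rng.normal(60.0, 15.0, 5000), 2)  # weights in kilograms
np_city = np.column_stack((height, weight))         # 5000 rows, 2 columns

# Subset the height column first, then summarize it
mean_height = np.mean(np_city[:, 0])
median_height = np.median(np_city[:, 0])
std_height = np.std(np_city[:, 0])

# corrcoef returns a 2x2 correlation matrix for the two columns
corr = np.corrcoef(np_city[:, 0], np_city[:, 1])

print(mean_height, median_height, std_height)
print(corr)
```

The off-diagonal entries of `corr` hold the correlation between height and weight; because the two columns are simulated independently here, that value should be close to zero.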
90 | 91 | Numpy also features more basic functions, such as sum() and sort(), which also exist in the basic Python distribution. However, the big difference here is speed. Because Numpy enforces a single data type in an array, it can drastically speed up the calculations. 92 | 93 | Just a sidenote here: If you're wondering how I came up with the data in this video: I simulated it with Numpy functions! I sampled two random distributions 5000 times to create the height and weight arrays, and then used column_stack to paste them together as two columns. Another thing that Numpy can do! 94 | 95 | Another great tool to get some sense of your data is to visualize it, but that's something for later. First, head over to the exercises to learn how to explore your Numpy arrays! 96 | -------------------------------------------------------------------------------- /slides/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacamp/courses-introduction-to-python/4deee0de7c9fb66ce9687590847413b01862a7a0/slides/ch4_slides.pdf -------------------------------------------------------------------------------- /slides/chapter_1_433dcfcfedaee070cbf440491c402e3b.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: 433dcfcfedaee070cbf440491c402e3b 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch1_2.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch1_2.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Variables and Types 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: dc8b62f1c8 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Well done and welcome back! It's clear that Python is a great calculator. 
If you want to do more complex calculations though, you will want to "save" values while you're coding along. 27 | 28 | --- 29 | 30 | ## Variable 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 36ec318b41 35 | ``` 36 | 37 | `@part1` 38 | - Specific, case-sensitive name 39 | 40 | - Call up value through variable name{{1}} 41 | 42 | - 1.79 m - 68.7 kg{{2}} 43 | 44 | ```py 45 | height = 1.79 46 | weight = 68.7 47 | ```{{3}} 48 | ```py 49 | height 50 | ```{{4}} 51 | 52 | ```out 53 | 1.79 54 | ```{{4}} 55 | 56 | `@script` 57 | You can do this by defining a variable, with a specific, case-sensitive name. Once you create (or declare) such a variable, you can later call up its value by typing the variable name. 58 | 59 | Suppose you measure your height and weight, in metric units: you are 1.79 meters tall, and weigh 68.7 kilograms. You can assign these values to two variables, named height and weight, with an equals sign: 60 | 61 | If you now type the name of the variable, height, 62 | 63 | Python looks for the variable name, retrieves its value, and prints it out. 64 | 65 | --- 66 | 67 | ## Calculate BMI 68 | 69 | ```yaml 70 | type: TwoColumns 71 | key: fe1b10a93b 72 | code_zoom: 80 73 | ``` 74 | 75 | `@part1` 76 | ```py 77 | height = 1.79 78 | weight = 68.7 79 | ``` 80 | ```py 81 | height 82 | ``` 83 | 84 | ```out 85 | 1.79 86 | ``` 87 | 88 | $$ \text{BMI} = \frac{\text{weight}}{\text{height}^2} $${{1}} 89 | 90 | `@part2` 91 | ```py 92 | 68.7 / 1.79 ** 2 93 | ```{{2}} 94 | 95 | ```out 96 | 21.4413 97 | ```{{2}} 98 | 99 | ```py 100 | weight / height ** 2 101 | ```{{3}} 102 | 103 | ```out 104 | 21.4413 105 | ```{{3}} 106 | 107 | ```py 108 | bmi = weight / height ** 2 109 | bmi 110 | ```{{4}} 111 | 112 | ```out 113 | 21.4413 114 | ```{{4}} 115 | 116 | `@script` 117 | Let's now calculate the Body Mass Index, or BMI, which is calculated as follows, with weight in kilograms and height in meters. 
You can do this with the actual values, but you can just as well use the variables height and weight, like in here. Every time you type the variable's name, you are asking Python to change it with the actual value of the variable. weight corresponds to 68.7, and height to 1.79. 118 | 119 | Finally, this version has Python store the result in a new variable, bmi. bmi now contains the same value as the one you calculated earlier. 120 | 121 | In Python, variables are used all the time. They help to make your code reproducible. 122 | 123 | --- 124 | 125 | ## Reproducibility 126 | 127 | ```yaml 128 | type: FullSlide 129 | key: 9980f47f9d 130 | ``` 131 | 132 | `@part1` 133 | ```py 134 | height = 1.79 135 | weight = 68.7 136 | bmi = weight / height ** 2 137 | print(bmi) 138 | ``` 139 | 140 | ```out 141 | 21.4413 142 | ``` 143 | 144 | `@script` 145 | Suppose the code to create the height, weight and bmi variable are in a script, like this. If you now want to recalculate the bmi for another weight, 146 | 147 | --- 148 | 149 | ## Reproducibility 150 | 151 | ```yaml 152 | type: FullSlide 153 | key: a4e899f00f 154 | disable_transition: true 155 | ``` 156 | 157 | `@part1` 158 | ```py 159 | height = 1.79 160 | weight = 74.2 # <- 161 | bmi = weight / height ** 2 162 | print(bmi) 163 | ``` 164 | 165 | ```out 166 | 23.1578 167 | ``` 168 | 169 | `@script` 170 | you can simply change the declaration of the weight variable, and rerun the script. The bmi changes accordingly, because the value of the variable weight has changed as well. 171 | 172 | So far, we've only worked with numerical values, such as height and weight. 
173 | 174 | --- 175 | 176 | ## Python Types 177 | 178 | ```yaml 179 | type: FullSlide 180 | key: 9d86084ad4 181 | ``` 182 | 183 | `@part1` 184 | ```py 185 | type(bmi) 186 | ```{{1}} 187 | 188 | ```out 189 | float 190 | ```{{1}} 191 | 192 | ```py 193 | day_of_week = 5 194 | type(day_of_week) 195 | ```{{2}} 196 | 197 | ```out 198 | int 199 | ```{{2}} 200 | 201 | `@script` 202 | In Python, these numbers all have a specific type. You can check out the type of a value with the type function. To see the type of our bmi value, simply write type and then bmi inside parentheses. You can see that it's a float, which is python's way of representing a real number, so a number which can have both an integer part and a fractional part. Python also has a type for integers: int, like this example. 203 | 204 | To do data science, you'll need more than ints and floats, though. 205 | 206 | --- 207 | 208 | ## Python Types (2) 209 | 210 | ```yaml 211 | type: FullSlide 212 | key: d971d34e6a 213 | ``` 214 | 215 | `@part1` 216 | ```py 217 | x = "body mass index" 218 | y = 'this works too' 219 | ```{{1}} 220 | ```py 221 | type(y) 222 | ```{{2}} 223 | 224 | ```out 225 | str 226 | ```{{2}} 227 | 228 | ```py 229 | z = True 230 | type(z) 231 | ```{{3}} 232 | 233 | ```out 234 | bool 235 | ```{{3}} 236 | 237 | `@script` 238 | Python features tons of other data types. The most common ones are strings and booleans. 239 | 240 | A string is Python's way to represent text. You can use both double and single quotes to build a string, as you can see from these examples. If you print the type of the last variable here, you see that it's str, short for string. 241 | 242 | The Boolean is a type that can either be True or False. You can think of it as 'Yes' and 'No' in everyday language. Booleans will be very useful in the future, to perform filtering operations on your data for example. 243 | 244 | There's something special about Python data types. 
245 | 246 | --- 247 | 248 | ## Python Types (3) 249 | 250 | ```yaml 251 | type: FullSlide 252 | key: 24601e2af0 253 | ``` 254 | 255 | `@part1` 256 | ```py 257 | 2 + 3 258 | ```{{1}} 259 | 260 | ```out 261 | 5 262 | ```{{1}} 263 | 264 | ```py 265 | 'ab' + 'cd' 266 | ```{{2}} 267 | 268 | ```out 269 | 'abcd' 270 | ```{{2}} 271 | 272 | - Different type = different behavior!{{3}} 273 | 274 | `@script` 275 | Have a look at this line of code, that sums two integers, and then this line of code, that sums two strings. 276 | 277 | For the integers, the values were summed, while for the strings, the strings were pasted together. The plus operator behaved differently for different data types. This is a general principle: how the code behaves depends on the types you're working with. 278 | 279 | In the exercises that follow, you'll create your first variables and experiment with some of Python's data types. I'll see you in the next video to explain all about lists. 280 | 281 | --- 282 | 283 | ## Let's practice! 284 | 285 | ```yaml 286 | type: FinalSlide 287 | key: b7fc40db4d 288 | ``` 289 | 290 | `@script` 291 | Let's get you coding and I can't wait to see you in the next chapter where you'll build even more awesome python charts. 292 | -------------------------------------------------------------------------------- /slides/chapter_1_d8fcd4c930027fa4e1c3870c7e7e0ff1.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: d8fcd4c930027fa4e1c3870c7e7e0ff1 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v8/735_ch1_1.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v8/hls-735_ch1_1.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Hello Python! 
15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: f743ca8c41 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Hi, my name is Hugo and I'll be your host for Introduction to Python for Data Science. 27 | 28 | I'm a data scientist and educator at DataCamp. 29 | 30 | --- 31 | 32 | ## How you will learn 33 | 34 | ```yaml 35 | type: FullSlide 36 | key: 30ee08a725 37 | disable_transition: true 38 | ``` 39 | 40 | `@part1` 41 | ![DataCamp Interface](https://assets.datacamp.com/production/repositories/288/datasets/729574d2168960686381caefe79baf5978e27d0d/liveexercise.gif) 42 | 43 | `@script` 44 | In this course, you will learn Python for Data Science through video lessons, like this one, and interactive exercises. You get your own Python session where you can experiment and try to come up with the correct code to solve the instructions. You're learning by doing, while receiving customized and instant feedback on your work. 45 | 46 | --- 47 | 48 | ## Python 49 | 50 | ```yaml 51 | type: FullSlide 52 | key: 3f23b93572 53 | ``` 54 | 55 | `@part1` 56 | ![guido-hba.png](https://assets.datacamp.com/production/repositories/288/datasets/fb3e4b8dc114529dafffb37d33f2b2244210d40f/guido-hba.png = 38){{1}} 57 | 58 | - General purpose: build anything{{2}} 59 | 60 | - Open source! Free!{{3}} 61 | 62 | - Python packages, also for data science{{4}} 63 | 64 | - Many applications and fields{{5}} 65 | 66 | `@script` 67 | Python was conceived by Guido Van Rossum. Here, you can see a photo of me with Guido. What started as a hobby project, soon became a general purpose programming language: nowadays, you can use Python to build practically any piece of software. But how did this happen? Well, first of all, Python is open source. It's free to use. Second, it's very easy to build packages in Python, which is code that you can share with other people to solve specific problems. 
Throughout time, more and more of these packages specifically built for data science have been developed. Suppose you want to make some fancy visualizations of your company's sales. There's a package for that. Or what about connecting to a database to analyze sensor measurements? There's also a package for that. 68 | People often refer to Python as the swiss army knife of programming languages as you can do almost anything with it. 69 | In this course, we'll start to build up your data science coding skills bit by bit, so make sure to stick around to see how powerful the language can be. 70 | 71 | --- 72 | 73 | ## IPython Shell 74 | 75 | ```yaml 76 | type: FullSlide 77 | key: 43a91a7217 78 | ``` 79 | 80 | `@part1` 81 | **Execute Python commands** 82 | 83 | ![ipython_shell.png](https://assets.datacamp.com/production/repositories/288/datasets/a9e8440bb8fbd49e4a73e4c36ef1cd677c0dd55f/pyexercise.png = 95) 84 | 85 | `@script` 86 | Now that you're all eyes and ears for Python, let's start experimenting. I'll start with the 87 | 88 | --- 89 | 90 | ## IPython Shell 91 | 92 | ```yaml 93 | type: FullSlide 94 | key: 9c51ee700d 95 | disable_transition: true 96 | ``` 97 | 98 | `@part1` 99 | **Execute Python commands** 100 | 101 | ![ipython_shell_highlighted.png](https://assets.datacamp.com/production/repositories/288/datasets/dd43cc0183b15b43a072eb0fbab4caa72dee9250/pyexercise_shell.jpg = 95) 102 | 103 | `@script` 104 | Python shell, a place where you can type Python code and immediately see the results. In DataCamp's exercise interface, this shell is embedded here. Let's start off simple and use Python as a calculator. 
105 | 106 | --- 107 | 108 | ## IPython Shell 109 | 110 | ```yaml 111 | type: FullSlide 112 | key: 524e4c20a7 113 | disable_transition: true 114 | ``` 115 | 116 | `@part1` 117 |   118 | 119 | ![Calculations in DataCamp's IPython shell](https://assets.datacamp.com/production/repositories/288/datasets/cee32b788a62e4b9a1234ccde56ac9ebb49cfa72/shelladdition.gif = 95) 120 | 121 | `@script` 122 | Let me type 4 + 5, and hit Enter. Python interprets what you typed and prints the result of your calculation, 9. The Python shell that's used here is actually not the original one; we're using IPython, short for Interactive Python, which is a kind of juiced-up version of regular Python that'll be useful later on. 123 | 124 | IPython was created by Fernando Pérez and is part of the broader Jupyter ecosystem. Apart from interactively working with Python, you can also have Python run so-called 125 | 126 | --- 127 | 128 | ## Python Script 129 | 130 | ```yaml 131 | type: FullSlide 132 | key: 78ef256bc0 133 | ``` 134 | 135 | `@part1` 136 | - Text files - `.py`{{1}} 137 | 138 | - List of Python commands{{2}} 139 | 140 | - Similar to typing in IPython Shell{{3}} 141 | 142 | ![Python script in DataCamp](https://assets.datacamp.com/production/repositories/288/datasets/59f196e96536543a4fb8801228019fc4106f3791/pyexercise_script.jpg = 78){{3}} 143 | 144 | `@script` 145 | Python scripts. These Python scripts are simply text files with the extension (dot) py. It's basically a list of Python commands that are executed, almost as if you were typing the commands in the shell yourself, line by line. 146 | 147 | --- 148 | 149 | ## Python Script 150 | 151 | ```yaml 152 | type: FullSlide 153 | key: 717d124175 154 | disable_transition: true 155 | ``` 156 | 157 | `@part1` 158 | ![GIF: typing 4 + 5 in the script and hitting submit answer.
No output is shown.](https://assets.datacamp.com/production/repositories/288/datasets/2f96e979012e15329cc158d1e0f496aac3539f45/scriptnoprint.gif = 95) 159 | 160 | `@script` 161 | Let's put the command from before in a script now, which can be found here in DataCamp's interface. The next step is executing the script, by clicking 'Submit Answer'. If you execute this script in the DataCamp interface, there's nothing in the output pane. That's because you have to explicitly use print inside scripts if you want to generate output during execution. 162 | 163 | --- 164 | 165 | ## Python Script 166 | 167 | ```yaml 168 | type: FullSlide 169 | key: c7a9d02fb6 170 | disable_transition: true 171 | code_zoom: 90 172 | ``` 173 | 174 | `@part1` 175 | ![python_script_print.gif](https://assets.datacamp.com/production/repositories/288/datasets/8b13d046bb54dcb11aa49f0da7363781129d1561/scriptwithprint.gif = 95) 176 | 177 | - Use `print()` to generate output from script 178 | 179 | `@script` 180 | Let's wrap our previous calculation in a print call, and rerun the script. This time, the same output as before is generated, great! Putting your code in Python scripts instead of manually retyping every step interactively will help you to keep structure and avoid retyping everything over and over again if you want to make a change; you simply make the change in the script, and rerun the entire thing. 181 | 182 | --- 183 | 184 | ## DataCamp Interface 185 | 186 | ```yaml 187 | type: FullSlide 188 | key: 693ba1cd14 189 | ``` 190 | 191 | `@part1` 192 | ![Screenshot of DataCamp interface](https://assets.datacamp.com/production/repositories/288/datasets/a9e8440bb8fbd49e4a73e4c36ef1cd677c0dd55f/pyexercise.png) 193 | 194 | `@script` 195 | Now that you've got an idea about different ways of working with Python, I suggest you head over to the exercises. Use the IPython Shell for experimentation, and use the Python script editor to code the actual answer. 
If you click Submit Answer, your script will be executed and checked for correctness. 196 | 197 | --- 198 | 199 | ## Let's practice! 200 | 201 | ```yaml 202 | type: FinalSlide 203 | key: 7445cd202e 204 | ``` 205 | 206 | `@script` 207 | Get coding and don't forget to have fun! 208 | -------------------------------------------------------------------------------- /slides/chapter_2_355ed52d2fb0d67508c6a311b7cbc6d3.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: 355ed52d2fb0d67508c6a311b7cbc6d3 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch2_3.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch2_3.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Manipulating Lists 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: 6484e4d1f6 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Wow, you're doing super well. So now, after creation and subsetting, the final piece of the Python lists puzzle is 27 | 28 | --- 29 | 30 | ## List Manipulation 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 5b83249ee9 35 | ``` 36 | 37 | `@part1` 38 | - Change list elements{{1}} 39 | 40 | - Add list elements{{2}} 41 | 42 | - Remove list elements{{3}} 43 | 44 | `@script` 45 | manipulation, so ways to change elements in your list, or to add elements to and remove elements from your list. 
46 | 47 | --- 48 | 49 | ## Changing list elements 50 | 51 | ```yaml 52 | type: FullSlide 53 | key: c1d58a3c4c 54 | code_zoom: 64 55 | ``` 56 | 57 | `@part1` 58 | ```py 59 | fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89] 60 | fam 61 | ``` 62 | 63 | ```out 64 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 65 | ``` 66 | 67 | ```py 68 | fam[7] = 1.86 69 | fam 70 | ```{{1}} 71 | 72 | ```out 73 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86] 74 | ```{{1}} 75 | 76 | ```py 77 | fam[0:2] = ["lisa", 1.74] 78 | fam 79 | ```{{2}} 80 | 81 | ```out 82 | ['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86] 83 | ```{{2}} 84 | 85 | `@script` 86 | Changing list elements is pretty straightforward. You use the same square brackets that we've used to subset lists, and then assign new elements to it using the equals sign. Suppose that after another look at fam, you realize that your dad's height is not up to date anymore, as he's shrinking with age. Instead of 1.89 meters, it should be 1.86 meters. To change this list element, which is at index 7, you can use this line of code. 87 | 88 | If you now check out fam, you'll see that the value is updated. 89 | 90 | You can even change an entire list slice at once. To change the elements "liz" and 1.73, you access the first two elements with 0:2, and then assign a new list to it. 91 | 92 | Do you still remember how the plus operator was different for strings and integers? 
93 | 94 | --- 95 | 96 | ## Adding and removing elements 97 | 98 | ```yaml 99 | type: FullSlide 100 | key: a66d56cb46 101 | code_zoom: 74 102 | ``` 103 | 104 | `@part1` 105 | ```py 106 | fam + ["me", 1.79] 107 | ```{{1}} 108 | 109 | ```out 110 | ['lisa', 1.74,'emma', 1.68, 'mom', 1.71, 'dad', 1.86, 'me', 1.79] 111 | ```{{1}} 112 | 113 | ```py 114 | fam_ext = fam + ["me", 1.79] 115 | ```{{2}} 116 | ```py 117 | del fam[2] 118 | ```{{3}} 119 | ```py 120 | fam 121 | ```{{4}} 122 | 123 | ```out 124 | ['lisa', 1.74, 1.68, 'mom', 1.71, 'dad', 1.86] 125 | ```{{4}} 126 | 127 | `@script` 128 | Well, it's again different for lists. If you use the plus sign with two lists, Python simply pastes together their contents in a single list. Suppose you want to add your own name and height to the fam height list. This will do the trick. 129 | 130 | Of course, you can also store this new list in a variable, fam_ext for example. 131 | 132 | Finally, deleting elements from a list is also pretty straightforward, you'll have to use del here. Take this line, for example, that deletes the element with index 2, so "emma", from the list. 133 | 134 | If you check out fam now, you'll see that the "emma" string is gone. Because you've removed an index, all elements that came after "emma" scooted over by one index. If you again run the same line, you're again removing the element at index 2, which is emma's height, 1.68 meters now. 135 | 136 | Understanding how Python lists actually work 137 | 138 | --- 139 | 140 | ## Behind the scenes (1) 141 | 142 | ```yaml 143 | type: TwoColumns 144 | key: ef5370967a 145 | code_zoom: 100 146 | ``` 147 | 148 | `@part1` 149 | ```py 150 | x = ["a", "b", "c"] 151 | ```{{1}} 152 | 153 | `@part2` 154 | ![ch_2_3_slides.024.png](https://assets.datacamp.com/production/repositories/288/datasets/e91761036b6647fa635fe8493b4ff3379587f5d5/ch_2_3_slides.024.png = 70){{2}} 155 | 156 | `@script` 157 | behind the scenes becomes pretty important now. 
What actually happens when you create a new list, x, like this? 158 | 159 | Well, in a simplified sense, you're storing a list in your computer memory, and store the 'address' of that list, so 160 | 161 | --- 162 | 163 | ## Behind the scenes (1) 164 | 165 | ```yaml 166 | type: TwoColumns 167 | key: 4d48163f25 168 | disable_transition: true 169 | code_zoom: 100 170 | ``` 171 | 172 | `@part1` 173 | ```py 174 | x = ["a", "b", "c"] 175 | ``` 176 | ```py 177 | y = x 178 | ```{{1}} 179 | ```py 180 | y[1] = "z" 181 | y 182 | ```{{2}} 183 | 184 | ```out 185 | ['a', 'z', 'c'] 186 | ```{{2}} 187 | 188 | ```py 189 | x 190 | ```{{3}} 191 | 192 | ```out 193 | ['a', 'z', 'c'] 194 | ```{{3}} 195 | 196 | `@part2` 197 | ![ch_2_3_slides.025.png](https://assets.datacamp.com/production/repositories/288/datasets/03d95d40b2e0d631ea89f07cadf12e66babd3693/ch_2_3_slides.025.png = 70) 198 | 199 | `@script` 200 | where the list is in your computer memory, in x. This means that x does not actually contain all the list elements, it rather contains a reference to the list. For basic operations, the difference is not that important, but it becomes more so when you start copying lists. Let me clarify this with an example. 201 | 202 | Let's store the list x as a new variable y, by simply using the equals sign. 203 | 204 | Let's now change the element with index one in the list y, like this. 205 | 206 | The funky thing is that if you now check out x again, also here the second element was changed. 
207 | 208 | That's because when you copied x to y with the equals sign, 209 | 210 | --- 211 | 212 | ## Behind the scenes (1) 213 | 214 | ```yaml 215 | type: TwoColumns 216 | key: 4a5827f664 217 | disable_transition: true 218 | code_zoom: 100 219 | ``` 220 | 221 | `@part1` 222 | ```py 223 | x = ["a", "b", "c"] 224 | ``` 225 | ```py 226 | y = x 227 | ``` 228 | ```py 229 | y[1] = "z" 230 | y 231 | ``` 232 | 233 | ```out 234 | ['a', 'z', 'c'] 235 | ``` 236 | 237 | ```py 238 | x 239 | ``` 240 | 241 | ```out 242 | ['a', 'z', 'c'] 243 | ``` 244 | 245 | `@part2` 246 | ![ch_2_3_slides.030.png](https://assets.datacamp.com/production/repositories/288/datasets/cee01ad8680d8cd824bab998aed4c5e5f74521bb/ch_2_3_slides.030.png = 70) 247 | 248 | `@script` 249 | you copied the reference to the list, not the actual values themselves. 250 | 251 | --- 252 | 253 | ## Behind the scenes (1) 254 | 255 | ```yaml 256 | type: TwoColumns 257 | key: ef3476e2fc 258 | disable_transition: true 259 | ``` 260 | 261 | `@part1` 262 | ```py 263 | x = ["a", "b", "c"] 264 | ``` 265 | ```py 266 | y = x 267 | ``` 268 | ```py 269 | y[1] = "z" 270 | y 271 | ``` 272 | 273 | ```out 274 | ['a', 'z', 'c'] 275 | ``` 276 | 277 | ```py 278 | x 279 | ``` 280 | 281 | ```out 282 | ['a', 'z', 'c'] 283 | ``` 284 | 285 | `@part2` 286 | ![ch_2_3_slides.031.png](https://assets.datacamp.com/production/repositories/288/datasets/fff4d255ec69a9a6e4d64394bdb92464390498c4/ch_2_3_slides.031.png = 70) 287 | 288 | `@script` 289 | When you're updating an element of the list, it's one and the same list in the computer memory you're changing. Both x and y point to this list, so the update is visible from both variables.
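If you'd like to check this yourself, here is a minimal sketch of the same x and y example. The `is` check at the end is an addition that does not appear on the slides; it only confirms that both names refer to one and the same list.

```python
# Two names, one list: the equals sign copies the reference, not the elements.
x = ["a", "b", "c"]
y = x             # y now refers to the very same list as x

y[1] = "z"        # change the list through y

print(x)          # ['a', 'z', 'c'] -> the change shows up through x too
print(y is x)     # True -> both names point to one list in memory
```

Nothing was copied here except the reference, which is why a change made through one name is visible through the other.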
290 | 291 | If you want to create a list y that points to a new list in the memory with the same values, 292 | 293 | --- 294 | 295 | ## Behind the scenes (2) 296 | 297 | ```yaml 298 | type: TwoColumns 299 | key: 05f37e881d 300 | code_zoom: 100 301 | ``` 302 | 303 | `@part1` 304 | ```py 305 | x = ["a", "b", "c"] 306 | ``` 307 | 308 | `@part2` 309 | ![ch_2_3_slides.033.png](https://assets.datacamp.com/production/repositories/288/datasets/97dc873ce995a7fb3cf83305c56a6a9b4f23de51/ch_2_3_slides.033.png) 310 | 311 | `@script` 312 | you'll need to use something else than the equals sign. You can use the list function, 313 | 314 | --- 315 | 316 | ## Behind the scenes (2) 317 | 318 | ```yaml 319 | type: TwoColumns 320 | key: 678dfa958a 321 | disable_transition: true 322 | code_zoom: 100 323 | ``` 324 | 325 | `@part1` 326 | ```py 327 | x = ["a", "b", "c"] 328 | ``` 329 | ```py 330 | y = list(x) 331 | y = x[:] 332 | ``` 333 | 334 | `@part2` 335 | ![ch_2_3_slides.034.png](https://assets.datacamp.com/production/repositories/288/datasets/ec9a50129117c16795d74c53b070c34c0015f6d1/ch_2_3_slides.034.png) 336 | 337 | `@script` 338 | like this, or use slicing to select all list elements explicitly. 339 | 340 | If you now 341 | 342 | --- 343 | 344 | ## Behind the scenes (2) 345 | 346 | ```yaml 347 | type: TwoColumns 348 | key: d211be5714 349 | disable_transition: true 350 | code_zoom: 100 351 | ``` 352 | 353 | `@part1` 354 | ```py 355 | x = ["a", "b", "c"] 356 | ``` 357 | ```py 358 | y = list(x) 359 | y = x[:] 360 | ``` 361 | ```py 362 | y[1] = "z" 363 | x 364 | ``` 365 | 366 | ```out 367 | ['a', 'b', 'c'] 368 | ``` 369 | 370 | `@part2` 371 | ![ch_2_3_slides.036.png](https://assets.datacamp.com/production/repositories/288/datasets/3f6b4d36a70007385ff752d07fa842a1e3a7f878/ch_2_3_slides.036.png) 372 | 373 | `@script` 374 | make a change to the list y points to, x is not affected. 375 | 376 | If this was a bit too much to take in, don't worry. 
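As a rough sketch of the two copying approaches shown above, the snippet below makes one copy with the list function and one with slicing. The name `z` is only introduced here so both copies can be kept side by side; the slides reuse `y` for both.

```python
x = ["a", "b", "c"]

y = list(x)       # copy made with the list function
z = x[:]          # copy made by slicing out all elements

y[1] = "z"        # change only the copy that y points to

print(x)          # ['a', 'b', 'c'] -> the original is untouched
print(y)          # ['a', 'z', 'c']
print(z)          # ['a', 'b', 'c']
print(y is x)     # False -> y points to a different list in memory
```

Both approaches build a new list holding the same elements, so later changes to `y` or `z` leave `x` alone.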
377 | 378 | --- 379 | 380 | ## Let's practice! 381 | 382 | ```yaml 383 | type: FinalSlide 384 | key: 934a5be348 385 | ``` 386 | 387 | `@script` 388 | The exercises will help you understand list manipulation and the subtle inner workings of lists. I'm sure you'll do great! 389 | -------------------------------------------------------------------------------- /slides/chapter_2_a0530c4542f10988847b2dbb91f717c3.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: a0530c4542f10988847b2dbb91f717c3 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch2_1.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch2_1.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Python Lists 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: 30d2c57d4e 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Welcome back aspiring Pythonista. By now, you've played around with different data types, and I hope you've had as much fun as I have. 27 | 28 | --- 29 | 30 | ## Python Data Types 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 2b9e2d1529 35 | ``` 36 | 37 | `@part1` 38 | - float - real numbers{{1}} 39 | 40 | - int - integer numbers{{2}} 41 | 42 | - str - string, text{{3}} 43 | 44 | - bool - True, False{{4}} 45 | 46 | ```py 47 | height = 1.73 48 | tall = True 49 | ```{{5}} 50 | 51 | - Each variable represents single value{{6}} 52 | 53 | `@script` 54 | On the numbers side, there's the float, to represent a real number, and the int, to represent an integer. Next, we also have str, short for string, to represent text in Python, and bool, which can be either True or False. You can save these values as a variable, like these examples show. Each variable then represents a single value. 
55 | 56 | As a data scientist, 57 | 58 | --- 59 | 60 | ## Problem 61 | 62 | ```yaml 63 | type: FullSlide 64 | key: a6e5aa6c25 65 | ``` 66 | 67 | `@part1` 68 | - Data Science: many data points{{1}} 69 | 70 | - Height of entire family{{2}} 71 | 72 | ```py 73 | height1 = 1.73 74 | height2 = 1.68 75 | height3 = 1.71 76 | height4 = 1.89 77 | ```{{3}} 78 | 79 | - Inconvenient{{4}} 80 | 81 | `@script` 82 | you'll often want to work with many data points. If, for example, you want to measure the height of everybody in your family, and store this information in Python, it would be inconvenient to create a new Python variable for each point you collected, right? 83 | 84 | What you can do instead is store all this information in a Python list. 85 | 86 | --- 87 | 88 | ## Python List 89 | 90 | ```yaml 91 | type: FullSlide 92 | key: e0a7e67ef6 93 | code_zoom: 66 94 | ``` 95 | 96 | `@part1` 97 | - `[a, b, c]` 98 | 99 | 100 | ```py 101 | [1.73, 1.68, 1.71, 1.89] 102 | ```{{1}} 103 | 104 | ```out 105 | [1.73, 1.68, 1.71, 1.89] 106 | ```{{1}} 107 | 108 | ```py 109 | fam = [1.73, 1.68, 1.71, 1.89] 110 | fam 111 | ```{{2}} 112 | 113 | ```out 114 | [1.73, 1.68, 1.71, 1.89] 115 | ```{{2}} 116 | 117 | - Name a collection of values{{3}} 118 | 119 | - Contain any type{{4}} 120 | 121 | - Contain different types{{5}} 122 | 123 | `@script` 124 | You can build such a list with square brackets. Suppose you asked your two sisters and parents for their height, in meters. You can build the list as follows: 125 | 126 | Of course, this data structure can also be referenced with a variable. Simply put the variable name and the equals sign in front, like here. 127 | 128 | A list is a way to give a single name to a collection of values. These values, or elements, can have any type; they can be floats, integers, booleans, strings, but also more advanced Python types, even lists. 129 | 130 | It's perfectly possible for a list to contain different types as well.
131 | 132 | --- 133 | 134 | ## Python List 135 | 136 | ```yaml 137 | type: FullSlide 138 | key: 35d6825cd6 139 | code_zoom: 68 140 | ``` 141 | 142 | `@part1` 143 | - `[a, b, c]` 144 | 145 | ```py 146 | fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89] 147 | ``` 148 | ```py 149 | fam 150 | ``` 151 | 152 | ```out 153 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 154 | ``` 155 | 156 | ```py 157 | fam2 = [["liz", 1.73], 158 | ["emma", 1.68], 159 | ["mom", 1.71], 160 | ["dad", 1.89]] 161 | ```{{1}} 162 | ```py 163 | fam2 164 | ```{{2}} 165 | 166 | ```out 167 | [['liz', 1.73], ['emma', 1.68], ['mom', 1.71], ['dad', 1.89]] 168 | ```{{2}} 169 | 170 | `@script` 171 | Suppose, for example, that you want to add the names of your sisters and parents to the list, so that you know which height belongs to who. You can throw in some strings without issues. 172 | 173 | But that's not all. I just told you that lists can also contain lists themselves. Instead of putting the strings in between the numbers, you can create little sublists for each member of the family. One for liz, one for emma and so on. Now, you can tell Python that these sublists are the elements of another list, that I named fam2: the little lists are wrapped in square brackets and separated with commas. If you now print out fam2, you see that we have a list of lists. The main list contains 4 sub-lists. 174 | 175 | We're dealing with a new Python type here, next to the strings, booleans, integers and floats you already know about: 176 | 177 | --- 178 | 179 | ## List type 180 | 181 | ```yaml 182 | type: FullSlide 183 | key: 2dd9765326 184 | code_zoom: 80 185 | ``` 186 | 187 | `@part1` 188 | ```py 189 | type(fam) 190 | ``` 191 | 192 | ```out 193 | list 194 | ``` 195 | 196 | ```py 197 | type(fam2) 198 | ``` 199 | 200 | ```out 201 | list 202 | ``` 203 | 204 | - Specific functionality{{1}} 205 | 206 | - Specific behavior{{1}} 207 | 208 | `@script` 209 | the list. 
These calls show that both fam and fam2 are lists. Remember that I told you that each type has specific functionality and behavior associated? Well, for lists, this is also true. Python lists host a bunch of tools to subset and adapt them. But let's take this step by step, 210 | 211 | --- 212 | 213 | ## Let's practice! 214 | 215 | ```yaml 216 | type: FinalSlide 217 | key: de08280f5e 218 | ``` 219 | 220 | `@script` 221 | and have you experiment with list creation first! 222 | -------------------------------------------------------------------------------- /slides/chapter_2_fc15ba5cb9485456df8589130b519ea3.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: fc15ba5cb9485456df8589130b519ea3 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch2_2.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch2_2.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Subsetting Lists 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: e4c1e2cc21 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | After you've created your very own Python list, you'll need to know how you can access information in the list. 27 | 28 | --- 29 | 30 | ## Subsetting lists 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 3c299aff4c 35 | code_zoom: 70 36 | ``` 37 | 38 | `@part1` 39 | ```py 40 | fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89] 41 | fam 42 | ```{{1}} 43 | 44 | ```out 45 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 46 | ```{{1}} 47 | 48 | ```py 49 | fam[3] 50 | ```{{2}} 51 | 52 | ```out 53 | 1.68 54 | ```{{2}} 55 | 56 | `@script` 57 | Python uses the index to do this. Have a look at the fam list again here. The first element in the list has index 0, the second element has index 1, and so on. 
Suppose that you want to select the height of emma, the float 1.68. It's the fourth element, so it has index 3. To select it, you use 3 inside square brackets. 58 | 59 | Similarly, to select the string "dad" from the list, 60 | 61 | --- 62 | 63 | ## Subsetting lists 64 | 65 | ```yaml 66 | type: FullSlide 67 | key: e036a40a08 68 | code_zoom: 70 69 | ``` 70 | 71 | `@part1` 72 | ```out 73 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 74 | ``` 75 | 76 | ```py 77 | fam[6] 78 | ```{{1}} 79 | 80 | ```out 81 | 'dad' 82 | ```{{1}} 83 | 84 | ```py 85 | fam[-1] 86 | ```{{2}} 87 | 88 | ```out 89 | 1.89 90 | ```{{2}} 91 | 92 | ```py 93 | fam[7] 94 | ```{{3}} 95 | 96 | ```out 97 | 1.89 98 | ```{{3}} 99 | 100 | `@script` 101 | which is the seventh element in the list, you'll need to put the index 6 inside square brackets. 102 | 103 | You can also count backwards, using negative indexes. This is useful if you want to get some elements at the end of your list. To get your dad's height, for example, you'll need the index -1. These are the negative indexes for all list elements. 104 | 105 | --- 106 | 107 | ## Subsetting lists 108 | 109 | ```yaml 110 | type: FullSlide 111 | key: 06e85623c2 112 | disable_transition: true 113 | code_zoom: 70 114 | ``` 115 | 116 | `@part1` 117 | ```out 118 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 119 | ``` 120 | 121 | ```py 122 | fam[6] 123 | ``` 124 | 125 | ```out 126 | 'dad' 127 | ``` 128 | 129 | ```py 130 | fam[-1] # <- 131 | ``` 132 | 133 | ```out 134 | 1.89 135 | ``` 136 | 137 | ```py 138 | fam[7] # <- 139 | ``` 140 | 141 | ```out 142 | 1.89 143 | ``` 144 | 145 | `@script` 146 | This means that both these lines return the exact same result. 
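The indexing examples above can be put together in one short sketch. The variable names on the left (`emma_height`, `last_by_negative` and so on) are made up for this illustration and are not part of the course code.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

emma_height = fam[3]          # fourth element, so index 3
dad_name = fam[6]             # seventh element, so index 6

last_by_negative = fam[-1]    # count backwards from the end
last_by_positive = fam[7]     # same element, counted from the start

print(emma_height)                            # 1.68
print(dad_name)                               # dad
print(last_by_negative == last_by_positive)   # True

# In general, a negative index -k lands on position len(fam) - k.
print(fam[-2] == fam[len(fam) - 2])           # True
```

Negative indexes are handy when you care about the end of the list and don't want to work out its length first.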
147 | 148 | Apart from indexing, there's also something called slicing, 149 | 150 | --- 151 | 152 | ## List slicing 153 | 154 | ```yaml 155 | type: FullSlide 156 | key: 125c4cb6c9 157 | code_zoom: 70 158 | ``` 159 | 160 | `@part1` 161 | ```py 162 | fam 163 | ``` 164 | 165 | ```out 166 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 167 | ``` 168 | 169 | ```py 170 | fam[3:5] 171 | ```{{1}} 172 | 173 | ```out 174 | [1.68, 'mom'] 175 | ```{{2}} 176 | 177 | ```py 178 | fam[1:4] 179 | ```{{4}} 180 | 181 | ```out 182 | [1.73, 'emma', 1.68] 183 | ```{{5}} 184 | 185 | ![The slicing syntax for Python lists, showing that the start value is included in the subset, while the stop value is excluded.](https://assets.datacamp.com/production/repositories/288/datasets/83dd2f807be0d4d08a187935eed11667c18fcfe3/slicing-syntax.png = 40){{3}} 186 | 187 | `@script` 188 | which allows you to select multiple elements from a list, thus creating a new list. You can do this by specifying a range, using a colon. Let's first have another look at the list, and then try this piece of code. 189 | 190 | Can you guess what it'll return? A list with the float 1.68, the string "mom", and the float 1.71, corresponding to the 4th, 5th and 6th elements in the list, maybe? Let's see what the output is. 191 | 192 | Apparently, only the elements with indexes 3 and 4 get returned. The element with index 5 is not included. In general, this is the syntax: the index you specify before the colon, so where the slice starts, is included, while the index you specify after the colon, where the slice ends, is not. 193 | 194 | With this in mind, can you tell what this call will return? 195 | 196 | You probably guessed correctly that this call gives you a list with three elements, corresponding to the elements with index 1, 2 and 3 of the fam list. 197 | 198 | You can also choose to just leave out the index before or after the colon.
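To make the "start included, end excluded" rule concrete, here is a small sketch of the two slices from this slide. The names `two_elements` and `three_elements` are just labels chosen for this example.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

two_elements = fam[3:5]     # indexes 3 and 4; index 5 is left out
three_elements = fam[1:4]   # indexes 1, 2 and 3; index 4 is left out

print(two_elements)         # [1.68, 'mom']
print(three_elements)       # [1.73, 'emma', 1.68]

# A handy consequence: the slice holds (end - start) elements.
print(len(fam[3:5]))        # 2
print(len(fam[1:4]))        # 3
```

Because the end index is excluded, subtracting the start from the end tells you how many elements you get back, as long as both indexes fall inside the list.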
199 | 200 | --- 201 | 202 | ## List slicing 203 | 204 | ```yaml 205 | type: FullSlide 206 | key: 8207b3255e 207 | disable_transition: true 208 | code_zoom: 70 209 | ``` 210 | 211 | `@part1` 212 | ```py 213 | fam 214 | ``` 215 | 216 | ```out 217 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 218 | ``` 219 | 220 | ```py 221 | fam[:4] 222 | ```{{1}} 223 | 224 | ```out 225 | ['liz', 1.73, 'emma', 1.68] 226 | ```{{1}} 227 | 228 | ```py 229 | fam[5:] 230 | ```{{2}} 231 | 232 | ```out 233 | [1.71, 'dad', 1.89] 234 | ```{{2}} 235 | 236 | `@script` 237 | If you leave out the index where the slice should begin, you're telling Python to start the slice from index 0, like this example. 238 | 239 | If you leave out the index where the slice should end, you include all elements up to and including the last element in the list, like here. 240 | 241 | Now it's time to head over to the exercises, 242 | 243 | --- 244 | 245 | ## Let's practice! 246 | 247 | ```yaml 248 | type: FinalSlide 249 | key: 048b2b774f 250 | ``` 251 | 252 | `@script` 253 | where you will continue to work on the list you've created yourself before. You'll use different subsetting methods to get exactly the piece of information you need! 
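Before you start, here is a quick sketch of the slices with a missing start or end index from this video. The names `head` and `tail` are hypothetical labels for this example; the last line is an extra check, not something shown on the slides.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

head = fam[:4]    # no start index: begin at index 0, stop before index 4
tail = fam[5:]    # no end index: run up to and including the last element

print(head)       # ['liz', 1.73, 'emma', 1.68]
print(tail)       # [1.71, 'dad', 1.89]

# Splitting at the same index and gluing the parts together gives the full list back.
print(fam[:4] + fam[4:] == fam)   # True
```

The last check works because the element at index 4 is excluded from the first slice and included in the second, so nothing is lost or repeated.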
254 | -------------------------------------------------------------------------------- /slides/chapter_3_1204d914b0e53100529827e07441ee6c.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: 1204d914b0e53100529827e07441ee6c 4 | transformations: 5 | translateX: 50 6 | translateY: 0 7 | scale: 1 8 | video_link: 9 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v8/735_ch3_1.mp4' 10 | hls: >- 11 | https://videos.datacamp.com/transcoded/735_intro_to_python/v8/hls-735_ch3_1.master.m3u8 12 | --- 13 | 14 | ## Functions 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: 6d7066bcd2 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | In this video, I'm going to introduce you to functions. Once you learn about them you won't be able to stop using them. I sure can't. 27 | 28 | --- 29 | 30 | ## Functions 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 5f508018d7 35 | ``` 36 | 37 | `@part1` 38 | - Nothing new!{{1}} 39 | 40 | - `type()`{{2}} 41 | 42 | - Piece of reusable code{{3}} 43 | 44 | - Solves particular task{{4}} 45 | 46 | - Call function instead of writing code yourself{{5}} 47 | 48 | `@script` 49 | Functions aren't entirely new for you actually: you've already used them. type, for example, is a function that returns the type of a value. But what is a function? Simply put, a function is a piece of reusable code, aimed at solving a particular task. You can call functions instead of having to write code yourself. Maybe an example can clarify things here. 
50 | 51 | --- 52 | 53 | ## Example 54 | 55 | ```yaml 56 | type: FullSlide 57 | key: c2afbb6435 58 | code_zoom: 75 59 | ``` 60 | 61 | `@part1` 62 | ```py 63 | fam = [1.73, 1.68, 1.71, 1.89] 64 | fam 65 | ``` 66 | 67 | ```out 68 | [1.73, 1.68, 1.71, 1.89] 69 | ``` 70 | 71 | ```py 72 | max(fam) 73 | ```{{1}} 74 | 75 | ```out 76 | 1.89 77 | ```{{1}} 78 | 79 | ![ch_3_1_slides.012.png](https://assets.datacamp.com/production/repositories/288/datasets/efef98eb50aba2b36df52166f7c4b18fd89c62e1/ch_3_1_slides.012.png){{2}} 80 | 81 | `@script` 82 | Suppose you have the list containing only the heights of your family, fam: 83 | 84 | Say that you want to get the maximum value in this list. Instead of writing your own piece of Python code that goes through the list and finds the highest value, you can also use Python's max function. This is one of Python's built-in functions, just like type. We simply pass fam to max inside parentheses. 85 | 86 | The output makes sense: 1.89, the highest number in the list. 
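To see what max saves you from writing, here is one possible hand-written version next to the built-in call. This is only a sketch: it uses a for loop and an if statement, which this course has not covered yet, and it is not a claim about how max is implemented internally. The name `tallest_manual` is made up for the comparison.

```python
fam = [1.73, 1.68, 1.71, 1.89]

# Hand-written approach: walk through the list and remember the largest value so far.
tallest_manual = fam[0]
for height in fam:
    if height > tallest_manual:
        tallest_manual = height

# Built-in approach: one function call does the same job.
tallest = max(fam)

print(tallest_manual)   # 1.89
print(tallest)          # 1.89
```

Both give the same answer, but the function call is shorter and easier to read.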
87 | 88 | max worked kind of like a black box here: 89 | 90 | --- 91 | 92 | ## Example 93 | 94 | ```yaml 95 | type: FullSlide 96 | key: 46af509641 97 | disable_transition: true 98 | code_zoom: 75 99 | ``` 100 | 101 | `@part1` 102 | ```py 103 | fam = [1.73, 1.68, 1.71, 1.89] 104 | fam 105 | ``` 106 | 107 | ```out 108 | [1.73, 1.68, 1.71, 1.89] 109 | ``` 110 | 111 | ```py 112 | max(fam) 113 | ``` 114 | 115 | ```out 116 | 1.89 117 | ``` 118 | 119 | ![ch_3_1_slides.013.png](https://assets.datacamp.com/production/repositories/288/datasets/65f70092ec124c8f29a082f9409e473496806aaa/ch_3_1_slides.013.png) 120 | 121 | `@script` 122 | you passed it a list, then the implementation of max, that you don't know, did its magic, 123 | 124 | --- 125 | 126 | ## Example 127 | 128 | ```yaml 129 | type: FullSlide 130 | key: c575524d98 131 | disable_transition: true 132 | code_zoom: 75 133 | ``` 134 | 135 | `@part1` 136 | ```py 137 | fam = [1.73, 1.68, 1.71, 1.89] 138 | fam 139 | ``` 140 | 141 | ```out 142 | [1.73, 1.68, 1.71, 1.89] 143 | ``` 144 | 145 | ```py 146 | max(fam) 147 | ``` 148 | 149 | ```out 150 | 1.89 151 | ``` 152 | 153 | ![ch_3_1_slides.014.png](https://assets.datacamp.com/production/repositories/288/datasets/404545609ab865031039dcfd81ea2d2962126f72/ch_3_1_slides.014.png) 154 | 155 | `@script` 156 | and produced an output. How max actually did this, is not important to you, it just does what it's supposed to, and you didn't have to write your own code, which made your life easier. 
157 | 158 | --- 159 | 160 | ## Example 161 | 162 | ```yaml 163 | type: FullSlide 164 | key: bed6186ee9 165 | disable_transition: true 166 | code_zoom: 75 167 | ``` 168 | 169 | `@part1` 170 | ```py 171 | fam = [1.73, 1.68, 1.71, 1.89] 172 | fam 173 | ``` 174 | 175 | ```out 176 | [1.73, 1.68, 1.71, 1.89] 177 | ``` 178 | 179 | ```py 180 | max(fam) 181 | ``` 182 | 183 | ```out 184 | 1.89 185 | ``` 186 | 187 | ```py 188 | tallest = max(fam) 189 | tallest 190 | ```{{1}} 191 | 192 | ```out 193 | 1.89 194 | ```{{1}} 195 | 196 | `@script` 197 | Of course, it's possible to also assign the result of a function call to a new variable, like here. Now tallest is just like any other variable; you can use it to continue your fancy calculations. 198 | 199 | --- 200 | 201 | ## round() 202 | 203 | ```yaml 204 | type: FullSlide 205 | key: b6626f6bff 206 | code_zoom: 62 207 | ``` 208 | 209 | `@part1` 210 | ```py 211 | round(1.68, 1) 212 | ```{{1}} 213 | 214 | ```out 215 | 1.7 216 | ```{{1}} 217 | 218 | ```py 219 | round(1.68) 220 | ```{{2}} 221 | 222 | ```out 223 | 2 224 | ```{{2}} 225 | 226 | ```py 227 | help(round) # Open up documentation 228 | ```{{3}} 229 | 230 | ```out 231 | Help on built-in function round in module builtins: 232 | 233 | round(number, ndigits=None) 234 | Round a number to a given precision in decimal digits. 235 | 236 | The return value is an integer if ndigits is omitted or None. 237 | Otherwise the return value has the same type as the number. ndigits may be negative. 238 | ```{{3}} 239 | 240 | `@script` 241 | Another one of these built-in functions is round. It takes two inputs: first, a number you want to round, and second, the precision with which to round, which is how many digits after the decimal point you want to keep. Say you want to round 1.68 to one decimal place. The first input is 1.68, the second input is 1. You separate the inputs with a comma. 242 | 243 | But there's more. 
It's perfectly possible to call the round function with only one input, like this. This time, Python figured out that you didn't specify the second input, and automatically chooses to round the number to the closest integer. 244 | 245 | To understand why both approaches work, let's open up the documentation. You can do this with yet another function, help, like this. 246 | 247 | It appears that round takes two inputs. 248 | 249 | --- 250 | 251 | ## round() 252 | 253 | ```yaml 254 | type: FullSlide 255 | key: c8119a3588 256 | code_zoom: 63 257 | ``` 258 | 259 | `@part1` 260 | ```py 261 | help(round) 262 | ``` 263 | 264 | ```out 265 | Help on built-in function round in module builtins: 266 | 267 | round(number, ndigits=None) 268 | Round a number to a given precision in decimal digits. 269 | 270 | The return value is an integer if ndigits is omitted or None. 271 | Otherwise the return value has the same type as the number. ndigits may be negative. 272 | ``` 273 | 274 | 275 | 276 | ![ch_3_1_slides.026.png](https://assets.datacamp.com/production/repositories/288/datasets/27ffd63d62347e84e5471ee64a8652616e575616/ch_3_1_slides.026.png){{1}} 277 | 278 | `@script` 279 | In Python, these inputs, also called arguments, have names: number and ndigits. When you call the function round, 280 | 281 | --- 282 | 283 | ## round() 284 | 285 | ```yaml 286 | type: FullSlide 287 | key: 8aacabb9b1 288 | disable_transition: true 289 | code_zoom: 63 290 | ``` 291 | 292 | `@part1` 293 | ```py 294 | help(round) 295 | ``` 296 | 297 | ```out 298 | Help on built-in function round in module builtins: 299 | 300 | round(number, ndigits=None) 301 | Round a number to a given precision in decimal digits. 302 | 303 | The return value is an integer if ndigits is omitted or None. 304 | Otherwise the return value has the same type as the number. ndigits may be negative. 
305 | ``` 306 | 307 | ![ch_3_1_slides.027.png](https://assets.datacamp.com/production/repositories/288/datasets/0b07066836b79b6c2539ddda423da6ff6352ddf6/ch_3_1_slides.027.png) 308 | 309 | `@script` 310 | with these two inputs, Python matches the inputs to the arguments: 311 | 312 | --- 313 | 314 | ## round() 315 | 316 | ```yaml 317 | type: FullSlide 318 | key: 0ae8191d5a 319 | disable_transition: true 320 | code_zoom: 63 321 | ``` 322 | 323 | `@part1` 324 | ```py 325 | help(round) 326 | ``` 327 | 328 | ```out 329 | Help on built-in function round in module builtins: 330 | 331 | round(number, ndigits=None) 332 | Round a number to a given precision in decimal digits. 333 | 334 | The return value is an integer if ndigits is omitted or None. 335 | Otherwise the return value has the same type as the number. ndigits may be negative. 336 | ``` 337 | 338 | ![ch_3_1_slides.028.png](https://assets.datacamp.com/production/repositories/288/datasets/4c257fe9ca0994487db6141be7376a370a81d25f/ch_3_1_slides.028.png) 339 | 340 | `@script` 341 | number is set to 1.68 and 342 | 343 | --- 344 | 345 | ## round() 346 | 347 | ```yaml 348 | type: FullSlide 349 | key: 061bc680d8 350 | disable_transition: true 351 | code_zoom: 63 352 | ``` 353 | 354 | `@part1` 355 | ```py 356 | help(round) 357 | ``` 358 | 359 | ```out 360 | Help on built-in function round in module builtins: 361 | 362 | round(number, ndigits=None) 363 | Round a number to a given precision in decimal digits. 364 | 365 | The return value is an integer if ndigits is omitted or None. 366 | Otherwise the return value has the same type as the number. ndigits may be negative. 367 | ``` 368 | 369 | ![ch_3_1_slides.029.png](https://assets.datacamp.com/production/repositories/288/datasets/26344efb7eb778da4d8c3350f79dadd82a8a6fd1/ch_3_1_slides.029.png) 370 | 371 | `@script` 372 | ndigits is set to 1. 
Next, 373 | 374 | --- 375 | 376 | ## round() 377 | 378 | ```yaml 379 | type: FullSlide 380 | key: 7289eaeb61 381 | disable_transition: true 382 | code_zoom: 63 383 | ``` 384 | 385 | `@part1` 386 | ```py 387 | help(round) 388 | ``` 389 | 390 | ```out 391 | Help on built-in function round in module builtins: 392 | 393 | round(number, ndigits=None) 394 | Round a number to a given precision in decimal digits. 395 | 396 | The return value is an integer if ndigits is omitted or None. 397 | Otherwise the return value has the same type as the number. ndigits may be negative. 398 | ``` 399 | 400 | ![ch_3_1_slides.030.png](https://assets.datacamp.com/production/repositories/288/datasets/a7d825885a0519ffec79b0763684cd8b16822d6e/ch_3_1_slides.030.png) 401 | 402 | `@script` 403 | The round function does its calculations with number and ndigits as if they are Python variables in a script. We don't know exactly what code Python executes. What is important, though, is that the function produces an output, 404 | 405 | --- 406 | 407 | ## round() 408 | 409 | ```yaml 410 | type: FullSlide 411 | key: b5ef829b0c 412 | disable_transition: true 413 | code_zoom: 63 414 | ``` 415 | 416 | `@part1` 417 | ```py 418 | help(round) 419 | ``` 420 | 421 | ```out 422 | Help on built-in function round in module builtins: 423 | 424 | round(number, ndigits=None) 425 | Round a number to a given precision in decimal digits. 426 | 427 | The return value is an integer if ndigits is omitted or None. 428 | Otherwise the return value has the same type as the number. ndigits may be negative. 429 | ``` 430 | 431 | ![ch_3_1_slides.031.png](https://assets.datacamp.com/production/repositories/288/datasets/c4e016f38f0612354160324d2e2abe0ce922a4f3/ch_3_1_slides.031.png) 432 | 433 | `@script` 434 | namely the number 1.68 rounded to 1 decimal place. 
435 | 436 | --- 437 | 438 | ## round() 439 | 440 | ```yaml 441 | type: FullSlide 442 | key: c02d6edac7 443 | disable_transition: true 444 | code_zoom: 63 445 | ``` 446 | 447 | `@part1` 448 | ```py 449 | help(round) 450 | ``` 451 | 452 | ```out 453 | Help on built-in function round in module builtins: 454 | 455 | round(number, ndigits=None) 456 | Round a number to a given precision in decimal digits. 457 | 458 | The return value is an integer if ndigits is omitted or None. 459 | Otherwise the return value has the same type as the number. ndigits may be negative. 460 | ``` 461 | 462 | 463 | 464 | ![ch_3_1_slides.032.png](https://assets.datacamp.com/production/repositories/288/datasets/27ffd63d62347e84e5471ee64a8652616e575616/ch_3_1_slides.032.png) 465 | 466 | `@script` 467 | If you call the function round with only one input, 468 | 469 | --- 470 | 471 | ## round() 472 | 473 | ```yaml 474 | type: FullSlide 475 | key: 7c246e950e 476 | disable_transition: true 477 | code_zoom: 63 478 | ``` 479 | 480 | `@part1` 481 | ```py 482 | help(round) 483 | ``` 484 | 485 | ```out 486 | Help on built-in function round in module builtins: 487 | 488 | round(number, ndigits=None) 489 | Round a number to a given precision in decimal digits. 490 | 491 | The return value is an integer if ndigits is omitted or None. 492 | Otherwise the return value has the same type as the number. ndigits may be negative. 
493 | ``` 494 | 495 | ![ch_3_1_slides.033.png](https://assets.datacamp.com/production/repositories/288/datasets/29fb0a5fe3ca2ea269fc4c82815591b9bca55d5e/ch_3_1_slides.033.png) 496 | 497 | `@script` 498 | Python again tries to 499 | 500 | --- 501 | 502 | ## round() 503 | 504 | ```yaml 505 | type: FullSlide 506 | key: 51e45534c3 507 | disable_transition: true 508 | code_zoom: 63 509 | ``` 510 | 511 | `@part1` 512 | ```py 513 | help(round) 514 | ``` 515 | 516 | ```out 517 | Help on built-in function round in module builtins: 518 | 519 | round(number, ndigits=None) 520 | Round a number to a given precision in decimal digits. 521 | 522 | The return value is an integer if ndigits is omitted or None. 523 | Otherwise the return value has the same type as the number. ndigits may be negative. 524 | ``` 525 | 526 | ![ch_3_1_slides.034.png](https://assets.datacamp.com/production/repositories/288/datasets/4167c94aecf6b66c78efaf5f8ac9232187fb23df/ch_3_1_slides.034.png) 527 | 528 | `@script` 529 | match the inputs to 530 | 531 | --- 532 | 533 | ## round() 534 | 535 | ```yaml 536 | type: FullSlide 537 | key: e33598e422 538 | disable_transition: true 539 | code_zoom: 63 540 | ``` 541 | 542 | `@part1` 543 | ```py 544 | help(round) 545 | ``` 546 | 547 | ```out 548 | Help on built-in function round in module builtins: 549 | 550 | round(number, ndigits=None) 551 | Round a number to a given precision in decimal digits. 552 | 553 | The return value is an integer if ndigits is omitted or None. 554 | Otherwise the return value has the same type as the number. ndigits may be negative. 555 | ``` 556 | 557 | ![ch_3_1_slides.035.png](https://assets.datacamp.com/production/repositories/288/datasets/1218fd4989e4f6dfd8471d5cf0f88da0189efc27/ch_3_1_slides.035.png) 558 | 559 | `@script` 560 | the arguments. There's no input to match to the ndigits argument though. 
Luckily, 561 | 562 | --- 563 | 564 | ## round() 565 | 566 | ```yaml 567 | type: FullSlide 568 | key: 767966a5a9 569 | disable_transition: true 570 | code_zoom: 63 571 | ``` 572 | 573 | `@part1` 574 | ```py 575 | help(round) 576 | ``` 577 | 578 | ```out 579 | Help on built-in function round in module builtins: 580 | 581 | round(number, ndigits=None) 582 | Round a number to a given precision in decimal digits. 583 | 584 | The return value is an integer if ndigits is omitted or None. 585 | Otherwise the return value has the same type as the number. ndigits may be negative. 586 | ``` 587 | 588 | ![ch_3_1_slides.036.png](https://assets.datacamp.com/production/repositories/288/datasets/b8f1b94ac3acfdd400bbdf1fca652f772cee4ae6/ch_3_1_slides.036.png) 589 | 590 | `@script` 591 | the internal machinery of the round function knows how to handle this. When ndigits is not specified, the function simply rounds to the closest integer and 592 | 593 | --- 594 | 595 | ## round() 596 | 597 | ```yaml 598 | type: FullSlide 599 | key: 93b669c9cb 600 | disable_transition: true 601 | code_zoom: 63 602 | ``` 603 | 604 | `@part1` 605 | ```py 606 | help(round) 607 | ``` 608 | 609 | ```out 610 | Help on built-in function round in module builtins: 611 | 612 | round(number, ndigits=None) 613 | Round a number to a given precision in decimal digits. 614 | 615 | The return value is an integer if ndigits is omitted or None. 616 | Otherwise the return value has the same type as the number. ndigits may be negative. 617 | ``` 618 | 619 | ![ch_3_1_slides.037.png](https://assets.datacamp.com/production/repositories/288/datasets/0a9ca09bc0b46f05f77483d00fb1eadadfc75033/ch_3_1_slides.037.png) 620 | 621 | `@script` 622 | returns that integer. That's why we got the number 2. 
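The two ways of calling round can be compared side by side in a short sketch. The variable names are chosen for this example only; the last call passes ndigits by name, which is an extra illustration using the argument name shown in the help output.

```python
one_decimal = round(1.68, 1)      # number=1.68, ndigits=1
nearest_int = round(1.68)         # ndigits left out

print(one_decimal)                # 1.7
print(nearest_int)                # 2

# Matching the documentation: without ndigits you get an integer back,
# with ndigits you get the same type as the number you passed in.
print(type(nearest_int))          # <class 'int'>
print(type(one_decimal))          # <class 'float'>

# The second input can also be passed by its name.
print(round(1.68, ndigits=1))     # 1.7
```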
623 | 624 | --- 625 | 626 | ## round() 627 | 628 | ```yaml 629 | type: FullSlide 630 | key: eed1e60402 631 | disable_transition: true 632 | code_zoom: 63 633 | ``` 634 | 635 | `@part1` 636 | ```py 637 | help(round) 638 | ``` 639 | 640 | ```out 641 | Help on built-in function round in module builtins: 642 | 643 | round(number, ndigits=None) 644 | Round a number to a given precision in decimal digits. 645 | 646 | The return value is an integer if ndigits is omitted or None. 647 | Otherwise the return value has the same type as the number. ndigits may be negative. 648 | ``` 649 | 650 | - `round(number)`{{1}} 651 | - `round(number, ndigits)`{{2}} 652 | 653 | `@script` 654 | In other words, ndigits is an optional argument. This tells us that you can call round in this form, as well as in this one. 655 | 656 | --- 657 | 658 | ## Find functions 659 | 660 | ```yaml 661 | type: FullSlide 662 | key: a9853a9d66 663 | ``` 664 | 665 | `@part1` 666 | - How to know?{{1}} 667 | 668 | - Standard task -> probably function exists!{{2}} 669 | 670 | - The internet is your friend{{3}} 671 | 672 | `@script` 673 | By now, you have an idea about how to use max and round, but how could you know that a function such as round exists in Python in the first place? Well, this is something you will learn with time. Whenever you are doing a rather standard task in Python, you can be pretty sure that there's already a function that can do this for you. In that case, you should definitely use it! Just do a quick internet search and you'll find the function you need with a nice usage example. And there is of course DataCamp, where you'll also learn about powerful functions and how to use them. 674 | 675 | --- 676 | 677 | ## Let's practice! 678 | 679 | ```yaml 680 | type: FinalSlide 681 | key: dbac5490bd 682 | ``` 683 | 684 | `@script` 685 | Get straight to it in the interactive exercises, and I'll see you back here soon! 
686 | -------------------------------------------------------------------------------- /slides/chapter_3_8e387776f3a264a745128b68aa8d8f83.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: 8e387776f3a264a745128b68aa8d8f83 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch3_2.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch3_2.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Methods 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: c536df1034 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Built-in functions are only 27 | 28 | --- 29 | 30 | ## Built-in Functions 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 45877294bd 35 | ``` 36 | 37 | `@part1` 38 | - Maximum of list: max(){{1}} 39 | 40 | - Length of list or string: len(){{2}} 41 | 42 | - Get index in list: ?{{3}} 43 | 44 | - Reversing a list: ?{{4}} 45 | 46 | `@script` 47 | one part of the Python story. You already know about functions such as max, to get the maximum of a list, len, to get the length of a list or a string, and so on. But what about other basic things, such getting the index of a specific element in the list, or reversing a list? You can look very hard for built-in functions that do this, but you won't find them. 
48 | 49 | --- 50 | 51 | ## Back 2 Basics 52 | 53 | ```yaml 54 | type: TwoColumns 55 | key: a3e45f6524 56 | code_zoom: 75 57 | ``` 58 | 59 | `@part1` 60 |   61 | 62 | ```py 63 | sister = "liz" 64 | ``` 65 | 66 | ```py 67 | height = 1.73 68 | ```{{1}} 69 | 70 | 71 | ```py 72 | fam = ["liz", 1.73, "emma", 1.68, 73 | "mom", 1.71, "dad", 1.89] 74 | ```{{2}} 75 | 76 | `@part2` 77 | ![ch_3_2_slides.020.png](https://assets.datacamp.com/production/repositories/288/datasets/c7a9757fa49f8396eb025ef221823441d6e66ced/ch_3_2_slides.020.png = 85){{3}} 78 | 79 | `@script` 80 | In the past exercises, you've already created a bunch of variables. Among other Python types, you've created strings, floats and lists, like the ones you see here. Each one of these values or data structures are so-called Python objects. This string is an object, this float is an object, but this list is also, you got it, an object. These objects have a specific type, that you already know: 81 | 82 | --- 83 | 84 | ## Back 2 Basics 85 | 86 | ```yaml 87 | type: TwoColumns 88 | key: 6a5eddd6ea 89 | disable_transition: true 90 | code_zoom: 75 91 | ``` 92 | 93 | `@part1` 94 |   95 | 96 | ```py 97 | sister = "liz" 98 | ``` 99 | 100 | ```py 101 | height = 1.73 102 | ``` 103 | 104 | 105 | ```py 106 | fam = ["liz", 1.73, "emma", 1.68, 107 | "mom", 1.71, "dad", 1.89] 108 | ``` 109 | 110 | - Methods: Functions that belong to objects{{1}} 111 | 112 | `@part2` 113 | ![ch_3_2_slides.024.png](https://assets.datacamp.com/production/repositories/288/datasets/6d444348823438f856363d02d093318f2ed457a3/ch_3_2_slides.024.png = 85) 114 | 115 | `@script` 116 | string, float, and list, and of course they represent the values you gave them, such as "liz", 1.73 and an entire list. But in addition to this, Python objects also come with a bunch of so-called "methods". You can think of methods as functions that "belong to" Python objects. 
A Python object of type string has methods, 117 | 118 | --- 119 | 120 | ## Back 2 Basics 121 | 122 | ```yaml 123 | type: TwoColumns 124 | key: ff540e522c 125 | disable_transition: true 126 | code_zoom: 75 127 | ``` 128 | 129 | `@part1` 130 |   131 | 132 | ```py 133 | sister = "liz" 134 | ``` 135 | 136 | ```py 137 | height = 1.73 138 | ``` 139 | 140 | 141 | ```py 142 | fam = ["liz", 1.73, "emma", 1.68, 143 | "mom", 1.71, "dad", 1.89] 144 | ``` 145 | 146 | - Methods: Functions that belong to objects 147 | 148 | `@part2` 149 | ![ch_3_2_slides.028.png](https://assets.datacamp.com/production/repositories/288/datasets/80891dbbb1a9b4f759540c5d601cbfb661a894d9/ch_3_2_slides.028.png = 85) 150 | 151 | `@script` 152 | such as capitalize and replace, but objects of type float and list also have specific methods, depending on the type. 153 | 154 | Enough for the theory now; let's try to use a method! 155 | 156 | --- 157 | 158 | ## list methods 159 | 160 | ```yaml 161 | type: FullSlide 162 | key: 431cae8707 163 | code_zoom: 85 164 | ``` 165 | 166 | `@part1` 167 | ```py 168 | fam 169 | ``` 170 | 171 | ```out 172 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 173 | ``` 174 | 175 | ```py 176 | fam.index("mom") # "Call method index() on fam" 177 | ```{{1}} 178 | 179 | ```out 180 | 4 181 | ```{{2}} 182 | 183 | ```py 184 | fam.count(1.73) 185 | ```{{3}} 186 | 187 | ```out 188 | 1 189 | ```{{4}} 190 | 191 | `@script` 192 | Suppose you want to get the index of the string "mom" in the fam list. fam is a Python object with the type list, and has a method named index. To call the method, you use the dot notation, like this. The only input is the string "mom", the element you want to get the index for. 193 | 194 | Python returns 4, which indeed is the index of the string "mom". I called the index method "on" the fam list here, and the output was 4. Similarly, I can use the count method on the fam list to count the number of times 1.73 occurs in the list.
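The two method calls described above can be tried in a short script. This is a minimal sketch that reuses the fam list from the slides; the lookup of "grandpa" is an added, hypothetical case that shows what index does when the element is missing.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# Call the index() method "on" fam: position of the first match.
mom_position = fam.index("mom")

# Call the count() method on fam: how often a value occurs.
times_173 = fam.count(1.73)

print(mom_position)
print(times_173)

# index() raises a ValueError when the element is not in the list.
try:
    fam.index("grandpa")  # hypothetical element that is not in fam
    found_grandpa = True
except ValueError:
    found_grandpa = False

print(found_grandpa)
```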
195 | 196 | Python gives me 1, which makes sense, because only liz is 1.73 meters tall. 197 | 198 | But lists are not the only Python objects that have methods associated. Also floats, integers, booleans and strings 199 | 200 | --- 201 | 202 | ## str methods 203 | 204 | ```yaml 205 | type: FullSlide 206 | key: 73c6a6ff3a 207 | code_zoom: 80 208 | ``` 209 | 210 | `@part1` 211 | ```py 212 | sister 213 | ```{{1}} 214 | 215 | ```out 216 | 'liz' 217 | ```{{1}} 218 | 219 | ```py 220 | sister.capitalize() 221 | ```{{2}} 222 | 223 | ```out 224 | 'Liz' 225 | ```{{3}} 226 | 227 | ```py 228 | sister.replace("z", "sa") 229 | ```{{4}} 230 | 231 | ```out 232 | 'lisa' 233 | ```{{5}} 234 | 235 | `@script` 236 | are Python objects that have specific methods associated with them. Take the variable sister for example, that represents a string. 237 | 238 | You can call the method capitalize on sister, without any inputs. It returns a string where the first letter is capitalized. 239 | 240 | Or what if you want to replace some parts of the string with other parts? Not a problem. Just call the method replace on sister, with two appropriate inputs. 241 | 242 | In the output, "z" is replaced with "sa". 243 | 244 | --- 245 | 246 | ## Methods 247 | 248 | ```yaml 249 | type: FullSlide 250 | key: 346697c688 251 | code_zoom: 80 252 | ``` 253 | 254 | `@part1` 255 | - Everything = object{{1}} 256 | 257 | - Objects have methods associated, depending on type{{2}} 258 | 259 | ```py 260 | sister.replace("z", "sa") 261 | ```{{3}} 262 | 263 | ```out 264 | 'lisa' 265 | ```{{3}} 266 | 267 | ```py 268 | fam.replace("mom", "mommy") 269 | ```{{4}} 270 | 271 | ```out 272 | AttributeError: 'list' object has no attribute 'replace' 273 | ```{{4}} 274 | 275 | `@script` 276 | To be absolutely clear: in Python, everything is an object, and each object has specific methods associated. Depending on the type of the object, list, string, float, whatever, the available methods are different.
A string object like sister has a replace method, but a list like fam doesn't have this, as you can see from this error. 277 | 278 | --- 279 | 280 | ## Methods 281 | 282 | ```yaml 283 | type: FullSlide 284 | key: c0100c8d69 285 | disable_transition: true 286 | code_zoom: 80 287 | ``` 288 | 289 | `@part1` 290 | ```py 291 | sister.index("z") 292 | ```{{1}} 293 | 294 | ```out 295 | 2 296 | ```{{1}} 297 | 298 | ```py 299 | fam.index("mom") 300 | ```{{1}} 301 | 302 | ```out 303 | 4 304 | ```{{1}} 305 | 306 | `@script` 307 | Objects of different types can have methods with the same name: Take the index method. It's available for both strings and lists. If you call it on a string, you get the index of the letters in the string; If you call it on a list, you get the index of the element in the list. This means that, depending on the type of the object, the methods behave differently. 308 | 309 | Before I unleash you on some exercises on methods, 310 | 311 | --- 312 | 313 | ## Methods (2) 314 | 315 | ```yaml 316 | type: FullSlide 317 | key: f03ac21e34 318 | code_zoom: 75 319 | ``` 320 | 321 | `@part1` 322 | ```py 323 | fam 324 | ```{{1}} 325 | 326 | ```out 327 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89] 328 | ```{{1}} 329 | 330 | ```py 331 | fam.append("me") 332 | ```{{2}} 333 | ```py 334 | fam 335 | ```{{3}} 336 | 337 | ```out 338 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me'] 339 | ```{{3}} 340 | 341 | ```py 342 | fam.append(1.79) 343 | fam 344 | ```{{4}} 345 | 346 | ```out 347 | ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', 1.79] 348 | ```{{4}} 349 | 350 | `@script` 351 | there's one more thing I want to tell you. Some methods can change the objects they are called on. Let's retake the fam list, and call the append method on it. As the input, we pass a string we want to add to the list. 352 | 353 | Python doesn't generate an output, but if we check the fam list again, we see that it has been extended with the string "me". 
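To make the difference between methods that change their object and methods that do not more concrete, here is a minimal sketch using the sister and fam variables from the slides. The names capitalized and append_result are introduced only for this illustration.

```python
sister = "liz"
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# capitalize() returns a new string; sister itself stays the same.
capitalized = sister.capitalize()

# append() changes fam in place and returns None.
append_result = fam.append("me")

print(sister, capitalized)
print(append_result)
print(fam)
```

Storing the return value of append, as done here for demonstration, is rarely useful in practice, because the change happens on the list itself.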
354 | 355 | Let's do this again, this time to add my height to the list. 356 | 357 | Again, the fam list was extended. 358 | 359 | This is pretty cool, because you can write very concise code to update your data structures on the fly, but it can also be pretty dangerous. Some method calls don't change the object they're called on, while others do, so watch out. 360 | 361 | --- 362 | 363 | ## Summary 364 | 365 | ```yaml 366 | type: FullSlide 367 | key: eecd826650 368 | code_zoom: 80 369 | ``` 370 | 371 | `@part1` 372 | Functions{{1}} 373 | 374 | ```py 375 | type(fam) 376 | ```{{2}} 377 | 378 | ```out 379 | list 380 | ```{{2}} 381 | 382 | Methods: call functions on objects{{3}} 383 | 384 | ```py 385 | fam.index("dad") 386 | ```{{4}} 387 | 388 | ```out 389 | 6 390 | ```{{4}} 391 | 392 | `@script` 393 | Let's take a step back here and summarize this. You have Python functions, like type, max and round, that you can call like this. There are also methods, which are functions that are specific to Python objects. Depending on the type of the Python object you're dealing with, you'll be able to use different methods and they behave differently. You can call methods on the objects with the dot notation, like this, for example. 394 | 395 | There's much more to tell about Python objects, methods and how Python works internally, 396 | 397 | --- 398 | 399 | ## Let's practice! 400 | 401 | ```yaml 402 | type: FinalSlide 403 | key: cefb86a284 404 | ``` 405 | 406 | `@script` 407 | but for now, let's stick to what I've talked about here. It's time to get some exercises and add methods to your ever-growing skillset!
408 | -------------------------------------------------------------------------------- /slides/chapter_3_cedcfb34350be8545599768f96695cdd.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: cedcfb34350be8545599768f96695cdd 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch3_3.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch3_3.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## Packages 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: de661f5035 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | By now, I hope you're convinced 27 | 28 | --- 29 | 30 | ## Motivation 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 63ee37e52b 35 | ``` 36 | 37 | `@part1` 38 | - Functions and methods are powerful{{1}} 39 | 40 | - All code in Python distribution?{{2}} 41 | 42 | - Huge code base: messy{{2}} 43 | 44 | - Lots of code you won’t use{{3}} 45 | 46 | - Maintenance problem{{4}} 47 | 48 | `@script` 49 | that python functions and methods are extremely powerful: you can basically use other people's code to solve your own problems. That's amazing! However, adding all functions and methods that have been written up to now to the same Python distribution would be a mess. There would be tons and tons of code in there, that you'll never use. Also, maintaining all of this code would be a real pain. 
50 | 51 | --- 52 | 53 | ## Packages 54 | 55 | ```yaml 56 | type: TwoColumns 57 | key: fe3a37611e 58 | ``` 59 | 60 | `@part1` 61 | - Directory of Python Scripts{{1}} 62 | 63 | - Each script = module{{2}} 64 | 65 | - Specify functions, methods, types{{3}} 66 | 67 | - Thousands of packages available{{4}} 68 | 69 | - NumPy{{5}} 70 | 71 | - Matplotlib{{6}} 72 | 73 | - scikit-learn{{7}} 74 | 75 | `@part2` 76 | ![Screen Shot 2019-09-08 at 9.18.56 AM.png](https://assets.datacamp.com/production/repositories/288/datasets/4763cadd79023a264f2e25c85c8344817ec13c55/Screen%20Shot%202019-09-08%20at%209.18.56%20AM.png = 60) 77 | 78 | `@script` 79 | This is where packages come into play. You can think of packages as a directory of Python scripts. Each such script is a so-called module. These modules specify functions, methods and new Python types aimed at solving particular problems. There are thousands of Python packages available from the internet. Among them are packages for data science: there's NumPy to efficiently work with arrays, Matplotlib for data visualization, and scikit-learn for machine learning. 80 | 81 | Not all these packages are available in Python by default. 82 | 83 | --- 84 | 85 | ## Install package 86 | 87 | ```yaml 88 | type: FullSlide 89 | key: a198cbb666 90 | ``` 91 | 92 | `@part1` 93 | - https://pip.pypa.io/en/stable/installation/{{1}} 94 | 95 | - Download `get-pip.py`{{2}} 96 | 97 | - Terminal:{{3}} 98 | 99 | - `python3 get-pip.py`{{4}} 100 | 101 | - `pip3 install numpy`{{5}} 102 | 103 | `@script` 104 | To use Python packages, you'll first have to install them on your own system, and then put code in your script to tell Python that you want to use these packages. 105 | 106 | DataCamp already has all necessary packages installed for you, but if you want to install them on your own system, you'll want to use pip, a package management system for Python. If you go to this URL, you can download the file get-pip.py.
Next, you go to the terminal, and execute python3 get-pip.py. Now you can use pip to actually install a Python package of your choosing. Suppose we want to install the numpy package, which you'll learn about in the next chapter. You type pip3 install numpy. You have to use the commands python3 and pip3 here to tell our system that we're working with Python version 3. 107 | 108 | Now that the package is installed, you can actually start using it in one of your Python scripts. 109 | 110 | --- 111 | 112 | ## Import package 113 | 114 | ```yaml 115 | type: TwoColumns 116 | key: d87a9581e9 117 | code_zoom: 68 118 | ``` 119 | 120 | `@part1` 121 | ```py 122 | import numpy 123 | ```{{1}} 124 | ```py 125 | array([1, 2, 3]) 126 | ```{{2}} 127 | 128 | ```out 129 | NameError: name 'array' is not defined 130 | ```{{3}} 131 | 132 | ```py 133 | numpy.array([1, 2, 3]) 134 | ```{{4}} 135 | 136 | ```out 137 | array([1, 2, 3]) 138 | ```{{5}} 139 | 140 | `@part2` 141 | ```py 142 | import numpy as np 143 | ```{{6}} 144 | ```py 145 | np.array([1, 2, 3]) 146 | ```{{7}} 147 | 148 | ```out 149 | array([1, 2, 3]) 150 | ```{{8}} 151 | 152 | ```py 153 | from numpy import array 154 | ```{{9}} 155 | ```py 156 | array([1, 2, 3]) 157 | ```{{10}} 158 | 159 | ```out 160 | array([1, 2, 3]) 161 | ```{{11}} 162 | 163 | `@script` 164 | Before you can do this, you should import the package, or a specific module of the package. You can do this with the import statement. 165 | 166 | To import the entire numpy package, you can do import numpy, like this. 167 | 168 | A commonly used function in NumPy is array. It takes a list as input. Simply calling the array function like this, will generate an error. 169 | 170 | To refer to the array function from the numpy package, you'll need this. 171 | 172 | This time it works. The NumPy array is very useful to do data science, but more on that later. 
173 | 174 | Using this numpy dot prefix all the time can become pretty tiring, so you can also import the package and refer to it with a different name. You can do this by extending your import statement with as, like this. 175 | 176 | Now, instead of numpy.array, you'll have to use np.array to use NumPy's array function. 177 | 178 | There are cases in which you only need one specific function of a package. Python allows you to make this explicit in your code. Suppose that we only want to use the array function from the NumPy package. Instead of doing import numpy, you can instead do from numpy import array, like this. 179 | 180 | This time, you can simply call the array function like this, no need to use numpy dot here. 181 | 182 | This from import version to use specific parts of a package can be useful to limit the amount of coding, but you're also losing some of the context. 183 | 184 | --- 185 | 186 | ## from numpy import array 187 | 188 | ```yaml 189 | type: FullSlide 190 | key: e17caa7b57 191 | code_zoom: 70 192 | ``` 193 | 194 | `@part1` 195 | - `my_script.py` 196 | 197 | ```py 198 | from numpy import array 199 | ``` 200 | ```py 201 | 202 | fam = ["liz", 1.73, "emma", 1.68, 203 | "mom", 1.71, "dad", 1.89] 204 | 205 | ... 206 | ``` 207 | ```py 208 | fam_ext = fam + ["me", 1.79] 209 | 210 | ... 211 | ``` 212 | ```py 213 | print(str(len(fam_ext)) + " elements in fam_ext") 214 | 215 | ... 216 | ``` 217 | ```py 218 | np_fam = array(fam_ext) 219 | ```{{1}} 220 | 221 | - Using NumPy, but not very clear{{2}} 222 | 223 | `@script` 224 | Suppose you're working in a long Python script. You import the array function from numpy at the very top, and way later, you actually use this array function. Somebody else who's reading your code might have forgotten that this array function is a specific NumPy function; it's not clear from the function call.
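The three import styles discussed so far can be placed side by side. The sketch below assumes NumPy is installed; the variable names a, b and c are chosen only for this illustration.

```python
# Style 1: import the whole package and use the full prefix.
import numpy

# Style 2: import the package under a shorter alias.
import numpy as np

# Style 3: import one specific function; no prefix needed afterwards.
from numpy import array

a = numpy.array([1, 2, 3])
b = np.array([1, 2, 3])
c = array([1, 2, 3])

# All three names refer to the very same function object.
same_function = (array is np.array) and (np.array is numpy.array)
print(same_function)
print(a, b, c)
```

All three calls build the same array; the styles differ only in how much context the reader gets at the call site.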
225 | 226 | --- 227 | 228 | ## import numpy 229 | 230 | ```yaml 231 | type: FullSlide 232 | key: b287cdae79 233 | code_zoom: 70 234 | ``` 235 | 236 | `@part1` 237 | ```py 238 | import numpy as np 239 | 240 | fam = ["liz", 1.73, "emma", 1.68, 241 | "mom", 1.71, "dad", 1.89] 242 | 243 | ... 244 | ``` 245 | ```py 246 | fam_ext = fam + ["me", 1.79] 247 | 248 | ... 249 | ``` 250 | ```py 251 | print(str(len(fam_ext)) + " elements in fam_ext") 252 | 253 | ... 254 | ``` 255 | ```py 256 | np_fam = np.array(fam_ext) # Clearly using NumPy 257 | ```{{1}} 258 | 259 | `@script` 260 | In that respect, the more standard import numpy call is preferred: In this case, your function call is numpy.array, making it very clear that you're working with NumPy. 261 | 262 | --- 263 | 264 | ## Let's practice! 265 | 266 | ```yaml 267 | type: FinalSlide 268 | key: 570affae26 269 | ``` 270 | 271 | `@script` 272 | Off to the exercises now, where you can practice different ways of importing packages and modules yourself. You're well on your way to becoming a pythonista data science ninja. 
273 | -------------------------------------------------------------------------------- /slides/chapter_4_34495ba457d74296794d2a122c9b6e19.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: 34495ba457d74296794d2a122c9b6e19 4 | video_link: 5 | hls: >- 6 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch4_3.master.m3u8 7 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch4_3.mp4' 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## NumPy: Basic Statistics 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: 5d21c4b49f 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | A typical first step in analyzing your data, 27 | 28 | --- 29 | 30 | ## Data analysis 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 32899f8a31 35 | ``` 36 | 37 | `@part1` 38 | - Get to know your data{{1}} 39 | 40 | - Little data -> simply look at it{{2}} 41 | 42 | - Big data -> ?{{3}} 43 | 44 | `@script` 45 | is getting to know your data in the first place. For the NumPy arrays from before, this is pretty easy, because it isn't a lot of data. However, as a data scientist, you'll be crunching thousands, if not millions or billions of numbers. 46 | 47 | --- 48 | 49 | ## City-wide survey 50 | 51 | ```yaml 52 | type: FullSlide 53 | key: df02059657 54 | ``` 55 | 56 | `@part1` 57 | ```py 58 | import numpy as np 59 | np_city = ... # Implementation left out 60 | np_city 61 | ```{{1}} 62 | 63 | ```out 64 | array([[1.64, 71.78], 65 | [1.37, 63.35], 66 | [1.6 , 55.09], 67 | ..., 68 | [2.04, 74.85], 69 | [2.04, 68.72], 70 | [2.01, 73.57]]) 71 | ```{{1}} 72 | 73 | `@script` 74 | Imagine you conduct a city-wide survey where you ask 5000 adults about their height and weight. 
You end up with something like this: a 2D numpy array, which I named np_city, that has 5000 rows, corresponding to the 5000 people, and two columns, corresponding to the height and the weight. 75 | 76 | Simply staring at these numbers like a zombie won't give you any insights. What you can do, though, is generate summarizing statistics about your data. 77 | 78 | --- 79 | 80 | ## NumPy 81 | 82 | ```yaml 83 | type: FullSlide 84 | key: d3c991b91f 85 | code_zoom: 90 86 | ``` 87 | 88 | `@part1` 89 | ```py 90 | np.mean(np_city[:, 0]) 91 | ```{{1}} 92 | 93 | ```out 94 | 1.7472 95 | ```{{1}} 96 | 97 | ```py 98 | np.median(np_city[:, 0]) 99 | ```{{2}} 100 | 101 | ```out 102 | 1.75 103 | ```{{2}} 104 | 105 | `@script` 106 | Aside from an efficient data structure for number crunching, it happens that NumPy is also good at doing these kinds of things. 107 | 108 | For starters, you can try to find out the average height of these 5000 people, with NumPy's mean function. Because it's a function from the NumPy package, don't forget to start with the np. prefix. 109 | 110 | Of course, I first had to do a subsetting operation to get the height column from the 2D array. It appears that on average, people are about 1.75 meters tall. What about the median height? This is the height of the middle person if you sort all persons from small to tall. Instead of writing complicated Python code to figure this out, you can simply use NumPy's median function: 111 | 112 | You can do similar things for the weight column in np_city. Often, these summarizing statistics will provide you with a "sanity check" of your data. If you end up with an average weight of 2000 kilograms, your measurements are most likely incorrect.
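The same summary statistics can be tried on a tiny array. The sketch below is an assumption-laden stand-in: it uses only the six rows visible in the slide output, not the full 5000-row survey, so its results differ from the numbers on the slide.

```python
import numpy as np

# Stand-in data: only the six rows shown on the slide, not the real survey.
np_city = np.array([[1.64, 71.78],
                    [1.37, 63.35],
                    [1.60, 55.09],
                    [2.04, 74.85],
                    [2.04, 68.72],
                    [2.01, 73.57]])

# Subset the first column (height) and the second column (weight).
heights = np_city[:, 0]
weights = np_city[:, 1]

mean_height = np.mean(heights)
median_height = np.median(heights)
mean_weight = np.mean(weights)

print(mean_height, median_height, mean_weight)
```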
113 | 114 | Apart from mean and median, there are also other functions, 115 | 116 | --- 117 | 118 | ## NumPy 119 | 120 | ```yaml 121 | type: FullSlide 122 | key: a66131c711 123 | ``` 124 | 125 | `@part1` 126 | ```py 127 | np.corrcoef(np_city[:, 0], np_city[:, 1]) 128 | ``` 129 | 130 | ```out 131 | array([[ 1. , -0.01802], 132 | [-0.01803, 1. ]]) 133 | ``` 134 | 135 | ```py 136 | np.std(np_city[:, 0]) 137 | ```{{1}} 138 | 139 | ```out 140 | 0.1992 141 | ```{{1}} 142 | 143 | - sum(), sort(), ...{{2}} 144 | 145 | - Enforce single data type: speed!{{3}} 146 | 147 | `@script` 148 | like corrcoef to check if for example height and weight are correlated, 149 | 150 | and std, for standard deviation. 151 | 152 | NumPy also features more basic functions, such as sum and sort, which also exist in the basic Python distribution. However, the big difference here is speed. Because NumPy enforces a single data type in an array, it can drastically speed up the calculations. 153 | 154 | --- 155 | 156 | ## Generate data 157 | 158 | ```yaml 159 | type: FullSlide 160 | key: 0c27803967 161 | code_zoom: 80 162 | ``` 163 | 164 | `@part1` 165 | - Arguments for `np.random.normal()` {{1}} 166 | - distribution mean{{1}} 167 | - distribution standard deviation{{1}} 168 | - number of samples{{1}} 169 | 170 | ```py 171 | height = np.round(np.random.normal(1.75, 0.20, 5000), 2) 172 | 173 | weight = np.round(np.random.normal(60.32, 15, 5000), 2) 174 | 175 | ```{{1}} 176 | ```py 177 | np_city = np.column_stack((height, weight)) 178 | ```{{2}} 179 | 180 | `@script` 181 | Just a sidenote here: If you're wondering how I came up with the data in this video: We simulated it with NumPy functions! I sampled two random distributions 5000 times to create the height and weight arrays, and then used column_stack to paste them together as two columns. Another awesome thing that NumPy can do!
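The simulation from the slide can be run as a self-contained script. In the sketch below, the call to np.random.seed and the seed value 42 are assumptions added only to make the run repeatable; the video does not mention a seed, so the numbers will not match the slide output exactly.

```python
import numpy as np

# Assumption: fix the seed so repeated runs give the same sample.
np.random.seed(42)

# Arguments: distribution mean, standard deviation, number of samples.
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)

# Paste the two 1D arrays together as the columns of one 2D array.
np_city = np.column_stack((height, weight))

print(np_city.shape)
print(np.std(np_city[:, 0]))

# 2x2 correlation matrix between the height and weight columns.
correlation = np.corrcoef(np_city[:, 0], np_city[:, 1])
print(correlation)
```

Because the two columns are sampled independently, the off-diagonal correlation values should come out close to zero.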
182 | 183 | Another great tool to get some sense of your data is to visualize it, but that's something for the next course also. 184 | 185 | --- 186 | 187 | ## Let's practice! 188 | 189 | ```yaml 190 | type: FinalSlide 191 | key: c4df18cfc1 192 | ``` 193 | 194 | `@script` 195 | First, head over to the exercises to learn how to explore your NumPy arrays! 196 | -------------------------------------------------------------------------------- /slides/chapter_4_a0487c26210f6b71ea98f917734cea3a.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: a0487c26210f6b71ea98f917734cea3a 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch4_1.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch4_1.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## NumPy 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: 1062fb4e4c 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Wow, you've done well and by now, you are aware 27 | 28 | --- 29 | 30 | ## Lists Recap 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 819dc4dd09 35 | ``` 36 | 37 | `@part1` 38 | - Powerful{{1}} 39 | 40 | - Collection of values{{2}} 41 | 42 | - Hold different types{{3}} 43 | 44 | - Change, add, remove{{4}} 45 | 46 | - Need for Data Science{{5}} 47 | 48 | - Mathematical operations over collections{{6}} 49 | 50 | - Speed{{7}} 51 | 52 | `@script` 53 | that the Python list is pretty powerful. A list can hold any type and can hold different types at the same time. You can also change, add and remove elements. This is wonderful, but one feature is missing, a feature that is super important for aspiring data scientists as yourself. 
When analyzing data, you'll often want to carry out operations over entire collections of values, and you want to do this fast. With lists, this is a problem. 54 | 55 | --- 56 | 57 | ## Illustration 58 | 59 | ```yaml 60 | type: FullSlide 61 | key: c038185807 62 | code_zoom: 64 63 | ``` 64 | 65 | `@part1` 66 | ```py 67 | height = [1.73, 1.68, 1.71, 1.89, 1.79] 68 | height 69 | ``` 70 | 71 | ```out 72 | [1.73, 1.68, 1.71, 1.89, 1.79] 73 | ``` 74 | 75 | ```py 76 | weight = [65.4, 59.2, 63.6, 88.4, 68.7] 77 | weight 78 | ```{{1}} 79 | 80 | ```out 81 | [65.4, 59.2, 63.6, 88.4, 68.7] 82 | ```{{1}} 83 | 84 | ```py 85 | weight / height ** 2 86 | ```{{2}} 87 | 88 | ```out 89 | TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int' 90 | ```{{3}} 91 | 92 | `@script` 93 | Let's retake the heights of your family and yourself. Suppose you've also asked for everybody's weight. It's not very polite, but everything for science, right? You end up with two lists, height, and weight. The first person is 1.73 meters tall and weighs 65.4 kilograms. 94 | 95 | If you now want to calculate the Body Mass Index for each family member, you'd hope that this call can work, making the calculations element-wise. 96 | 97 | Unfortunately, Python throws an error, because it has no idea how to do calculations on lists. You could solve this by going through each list element one after the other, and calculating the BMI for each person separately, but this is terribly inefficient and tiresome to write. 98 | 99 | --- 100 | 101 | ## Solution: NumPy 102 | 103 | ```yaml 104 | type: FullSlide 105 | key: 7d3d0276cb 106 | ``` 107 | 108 | `@part1` 109 | - Numeric Python{{1}} 110 | 111 | - Alternative to Python List: NumPy Array{{2}} 112 | 113 | - Calculations over entire arrays{{3}} 114 | 115 | - Easy and Fast{{4}} 116 | 117 | - Installation{{5}} 118 | 119 | - In the terminal: `pip3 install numpy`{{6}} 120 | 121 | `@script` 122 | A way more elegant solution is to use NumPy, or Numeric Python. 
It's a Python package that, among other things, provides an alternative to the regular Python list: the NumPy array. The NumPy array is pretty similar to the list, but has one additional feature: you can perform calculations over entire arrays. It's really easy, and super-fast as well. 123 | 124 | The NumPy package is already installed on DataCamp's servers, but if you want to work with it on your own system, go to the command line and execute pip3 install numpy. 125 | 126 | Next, 127 | 128 | --- 129 | 130 | ## NumPy 131 | 132 | ```yaml 133 | type: FullSlide 134 | key: b227a9dc4f 135 | code_zoom: 75 136 | ``` 137 | 138 | `@part1` 139 | ```py 140 | import numpy as np 141 | ``` 142 | ```py 143 | np_height = np.array(height) 144 | np_height 145 | ```{{1}} 146 | 147 | ```out 148 | array([1.73, 1.68, 1.71, 1.89, 1.79]) 149 | ```{{1}} 150 | 151 | ```py 152 | np_weight = np.array(weight) 153 | np_weight 154 | ```{{1}} 155 | 156 | ```out 157 | array([65.4, 59.2, 63.6, 88.4, 68.7]) 158 | ```{{1}} 159 | 160 | ```py 161 | bmi = np_weight / np_height ** 2 162 | bmi 163 | ```{{2}} 164 | 165 | ```out 166 | array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836]) 167 | ```{{2}} 168 | 169 | `@script` 170 | to actually use NumPy in your Python session, you can import the numpy package, like this. 171 | 172 | Let's start with creating a numpy array. You do this with NumPy's array function: the input is a regular Python list. I'm using array twice here, to create NumPy versions of the height and weight lists from before: np_height and np_weight: 173 | 174 | Let's try to calculate everybody's BMI with a single call again. 175 | 176 | This time, it worked fine: the calculations were performed element-wise. The first person's BMI was calculated by dividing the first element in np_weight by the square of the first element in np_height, the second person's BMI was calculated with the second height and weight elements, and so on.
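To see what "element-wise" means in practice, the sketch below computes the BMI twice: once by walking through the plain lists pair by pair, and once with NumPy arrays in a single expression. The list comprehension and the name bmi_loop are additions made for this comparison and are not part of the slides.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Plain lists: go through the height/weight pairs one by one.
bmi_loop = [w / h ** 2 for w, h in zip(weight, height)]

# NumPy arrays: one expression, applied to every element.
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2

print(bmi_loop)
print(bmi)
```

Both approaches give the same five values; the NumPy version simply moves the loop inside the library.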
177 | 178 | --- 179 | 180 | ## Comparison 181 | 182 | ```yaml 183 | type: FullSlide 184 | key: b0247dd81c 185 | code_zoom: 77 186 | ``` 187 | 188 | `@part1` 189 | ```py 190 | height = [1.73, 1.68, 1.71, 1.89, 1.79] 191 | weight = [65.4, 59.2, 63.6, 88.4, 68.7] 192 | weight / height ** 2 193 | ```{{1}} 194 | 195 | ```out 196 | TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int' 197 | ```{{1}} 198 | 199 | ```py 200 | np_height = np.array(height) 201 | np_weight = np.array(weight) 202 | np_weight / np_height ** 2 203 | ```{{2}} 204 | 205 | ```out 206 | array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836]) 207 | ```{{2}} 208 | 209 | `@script` 210 | Let's do a quick comparison here. First, we tried to do calculations with regular lists, like this, but this gave us an error, because Python doesn't know how to do calculations with lists the way we want it to. Next, these regular lists were converted to NumPy arrays. The same operations now work without any problem: NumPy knows how to work with arrays as if they are single values, which is pretty awesome if you ask me. 211 | 212 | --- 213 | 214 | ## NumPy: remarks 215 | 216 | ```yaml 217 | type: FullSlide 218 | key: f9882b091b 219 | code_zoom: 90 220 | ``` 221 | 222 | `@part1` 223 | ```py 224 | np.array([1.0, "is", True]) 225 | ```{{1}} 226 | 227 | ```out 228 | array(['1.0', 'is', 'True'], dtype=' 23 310 | ```{{3}} 311 | 312 | ```out 313 | array([False, False, False, True, False]) 314 | ```{{3}} 315 | 316 | ```py 317 | bmi[bmi > 23] 318 | ```{{4}} 319 | 320 | ```out 321 | array([24.7473475]) 322 | ```{{4}} 323 | 324 | `@script` 325 | you can work with NumPy arrays pretty much the same as you can with regular Python lists. When you want to get elements from your array, for example, you can use square brackets. Suppose you want to get the bmi for the second person, so at index 1. This will do the trick.
326 | 327 | Specifically for NumPy, there's also another way to do subsetting: using an array of booleans. Say you want to get all BMI values in the bmi array that are over 23. A first step is using the greater than sign, like this: 328 | 329 | The result is a NumPy array containing booleans: True if the corresponding bmi is above 23, False otherwise. Next, you can use this boolean array inside square brackets to do subsetting. Only the elements in bmi that are above 23, so for which the corresponding boolean value is True, are selected. There's only one BMI that's above 23, so we end up with a NumPy array with a single value, that specific BMI. 330 | 331 | Using the result of a comparison to make a selection of your data is a very common way to get surprising insights. 332 | 333 | --- 334 | 335 | ## Let's practice! 336 | 337 | ```yaml 338 | type: FinalSlide 339 | key: 1138fd29b8 340 | ``` 341 | 342 | `@script` 343 | Learn all about it and the other NumPy basics in the exercises! 344 | -------------------------------------------------------------------------------- /slides/chapter_4_ae3238dcc7feb9adecfee0c395fc8dc8.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Insert title here 3 | key: ae3238dcc7feb9adecfee0c395fc8dc8 4 | video_link: 5 | mp4: 'https://videos.datacamp.com/raw/735_intro_to_python/v6/735_ch4_2.mp4' 6 | hls: >- 7 | https://videos.datacamp.com/transcoded/735_intro_to_python/v6/hls-735_ch4_2.master.m3u8 8 | transformations: 9 | translateX: 50 10 | translateY: 0 11 | scale: 1 12 | --- 13 | 14 | ## 2D NumPy Arrays 15 | 16 | ```yaml 17 | type: TitleSlide 18 | key: 0cc8abf493 19 | ``` 20 | 21 | `@lower_third` 22 | name: Hugo Bowne-Anderson 23 | title: Data Scientist at DataCamp 24 | 25 | `@script` 26 | Well done, you legend! Let's now recreate the NumPy arrays from the previous video.
27 | 28 | --- 29 | 30 | ## Type of NumPy Arrays 31 | 32 | ```yaml 33 | type: FullSlide 34 | key: 1b9db47fd2 35 | code_zoom: 100 36 | ``` 37 | 38 | `@part1` 39 | ```py 40 | import numpy as np 41 | np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79]) 42 | np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7]) 43 | ``` 44 | 45 | ```py 46 | type(np_height) 47 | ``` 48 | 49 | ```out 50 | numpy.ndarray 51 | ``` 52 | 53 | ```py 54 | type(np_weight) 55 | ``` 56 | 57 | ```out 58 | numpy.ndarray 59 | ``` 60 | 61 | `@script` 62 | If you ask for the type of these arrays, Python tells you that they are numpy.ndarray. numpy dot tells you it's a type that was defined in the numpy package. ndarray stands for n-dimensional array. The arrays np_height and np_weight are one-dimensional arrays, but it's perfectly possible to create 2 dimensional, three dimensional, heck even seven dimensional arrays! Let's stick to 2 in this video though. 63 | 64 | --- 65 | 66 | ## 2D NumPy Arrays 67 | 68 | ```yaml 69 | type: FullSlide 70 | key: ebb550dcba 71 | code_zoom: 71 72 | ``` 73 | 74 | `@part1` 75 | ```py 76 | np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79], 77 | [65.4, 59.2, 63.6, 88.4, 68.7]]) 78 | ```{{1}} 79 | ```py 80 | np_2d 81 | ```{{2}} 82 | 83 | ```out 84 | array([[ 1.73, 1.68, 1.71, 1.89, 1.79], 85 | [65.4 , 59.2 , 63.6 , 88.4 , 68.7 ]]) 86 | ```{{2}} 87 | 88 | ```py 89 | np_2d.shape 90 | ```{{3}} 91 | 92 | ```out 93 | (2, 5) # 2 rows, 5 columns 94 | ```{{3}} 95 | 96 | ```py 97 | np.array([[1.73, 1.68, 1.71, 1.89, 1.79], 98 | [65.4, 59.2, 63.6, 88.4, "68.7"]]) 99 | ```{{4}} 100 | 101 | ```out 102 | array([['1.73', '1.68', '1.71', '1.89', '1.79'], 103 | ['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='