├── .ipynb_checkpoints ├── class_one-checkpoint.ipynb └── class_two-checkpoint.ipynb ├── README.md ├── alice_in_wonderland.txt ├── class_five.ipynb ├── class_four.ipynb ├── class_one.ipynb ├── class_three.ipynb ├── class_two.ipynb ├── continuing_the_journey.md ├── enigma.py └── new_file.txt /.ipynb_checkpoints/class_one-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Class One - A Gentle Introduction\n", 8 | "\n", 9 | "The Python language is a high-level, object-oriented programming language with influences from many other programming languages. It exists within the spectrum of possible computer languages.\n", 10 | "\n", 11 | "## First Point\n", 12 | "\n", 13 | "Python, like all computer languages, is a written language. We can think of this language, like others, as having nouns and verbs.\n", 14 | "\n", 15 | "Today, we will learn a few nouns and verbs for this language." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "# First Nouns" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "metadata": { 29 | "collapsed": true 30 | }, 31 | "outputs": [], 32 | "source": [ 33 | "string_variable = \"\"\n", 34 | "integer_variable = 0\n", 35 | "floating_point_variable = 0.0\n", 36 | "boolean_variable = True" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "The above four examples show us the four most common types of variables - strings, integers, floats, and booleans. \n", 44 | "\n", 45 | "**Definition** _Integer_ - An integer is a whole number that can be treated as a mathematical object.
It can be added to, subtracted from, multiplied by, or divided by other numbers to get new numbers.\n", 46 | "\n", 47 | "**Definition** _Floating Point Number_ - A floating point number is a whole number plus some digits after the decimal point. Floating point numbers can be thought of as approximations of the real numbers from mathematics. Like integers, they can be added, subtracted, multiplied and divided.\n", 48 | "\n", 49 | "**Definition** _String_ - A string is literal: whatever exists between the open quote (`\"`) and the close quote (`\"`) will be treated as is. Strings are different from numbers in a fundamental way. In most high level computer languages they can be manipulated, but not with arithmetic: for example, `+` joins (concatenates) two strings instead of adding them, and strings cannot be subtracted from or divided by anything.\n", 50 | "\n", 51 | "**Definition** _Boolean_ - A boolean is a binary value. It can only take on one of two values, True and False. The notion of being able to state True or False with semantic interpretability is a powerful construct. \n", 52 | "\n", 53 | "_Aside_: Being able to speak in terms of absolute truth will lend us the ability to do some extraordinary things."
54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "# First Verbs" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 5, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "name": "stdout", 70 | "output_type": "stream", 71 | "text": [ 72 | "Adding 5 + 7 = 12\n", 73 | "Dividing 6.3 by 17 = 0.37058823529411766\n", 74 | "Concatenating the literal 5 and the literal 7 yields 57\n", 75 | "The truth value of True AND False is False\n" 76 | ] 77 | } 78 | ], 79 | "source": [ 80 | "integer_result = 5 + 7\n", 81 | "print(\"Adding 5 + 7 = {}\".format(integer_result))\n", 82 | "floating_point_result = 6.3 / 17\n", 83 | "print(\"Dividing 6.3 by 17 = {}\".format(floating_point_result))\n", 84 | "string_result = \"5\" + \"7\"\n", 85 | "print(\"Concatenating the literal 5 and the literal 7 yields {}\".format(string_result))\n", 86 | "boolean_result = True and False\n", 87 | "print(\"The truth value of True AND False is {}\".format(boolean_result))\n" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "# Understanding how to use verbs and nouns together\n", 95 | "\n", 96 | "In this last example we used each of the nouns we defined, which are often called types of data, or types for short. Notice how the types interacted with verbs:\n", 97 | "\n", 98 | "* \"+\" - addition for integers and floating point numbers, concatenation for strings.\n", 99 | "* \"/\" - division for integers and floating point numbers.\n", 100 | "* \"and\" - logical AND for booleans.\n", 101 | "* `print()` - prints strings to the screen.\n", 102 | "\n", 103 | "Notice that we are seeing two types of verbs, or, as computer programmers often call them, functions. The first type works by placing its inputs on either side of the function (these are usually called operators).
For instance:\n", 104 | "\n", 105 | "`True and False`\n", 106 | "\n", 107 | "Or\n", 108 | "\n", 109 | "`5 + 7`\n", 110 | "\n", 111 | "However, the `print()` function doesn't work like that: it takes its inputs in order, as a list of inputs inside parentheses. Most functions in Python work this way. There are a lot of good reasons for this, but the simplest one is the following:\n", 112 | "\n", 113 | "Try to add 12 integers using \"+\" - `1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12`. Let's see the result just for fun:\n" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 6, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "78" 125 | ] 126 | }, 127 | "execution_count": 6, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "That was a lot of typing! We had to write the plus operator 11 times! Let's look at the same example written the other way, like the print function:\n", 141 | "\n", 142 | "`add(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)`\n", 143 | "\n", 144 | "Way less typing!\n", 145 | "\n", 146 | "But wait, we don't have an add function already defined for us in Python. Is there a way we could make our own? " 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "# Defining Our Own Functions\n", 154 | "\n", 155 | "It turns out defining our own functions in Python is very, very easy, unlike in some other languages.\n", 156 | "\n", 157 | "Here's the general syntax for doing so:\n", 158 | "\n", 159 | "```\n", 160 | "def function_name(first_input, second_input, third_input):\n", 161 | " # ...
some code goes here\n", 162 | " return result_of_code\n", 163 | "```\n", 164 | "\n", 165 | "So let's see what it's like to define our own function:" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 8, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": [ 177 | "5 + 7 is 12\n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "def add(first_input, second_input):\n", 183 | " return first_input + second_input\n", 184 | "\n", 185 | "print(\"5 + 7 is {}\".format(add(5,7)))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "Later in the course we'll learn how to write Python functions that take an arbitrary number of inputs. But for now, just take my word that it's possible to do this." 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "# The Power Of Booleans\n", 200 | "\n", 201 | "Earlier in the lesson I said that boolean values were AWESOME. And now I'm going to show you how awesome they are. With booleans we can do a lot of things, like:\n", 202 | "\n", 203 | "* Write a program that never ends\n", 204 | "* Write expressions that evaluate to True or False\n", 205 | "* Write functions that evaluate to True or False" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": { 212 | "collapsed": true 213 | }, 214 | "outputs": [], 215 | "source": [ 216 | "# A program that never ends\n", 217 | "\n", 218 | "while True:\n", 219 | " print(\"This is the song that never ends, it goes on and on my friend\")\n", 220 | " print(\"some people started singing it, not knowing what it was.\")\n", 221 | " print(\"And now they keep on singing it and that is just because,\")\n", 222 | "# Don't run this!!!!"
223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "We'll get into the `while` keyword in a later lecture, along with all this other crazy stuff. But for now, just think of the statement as doing the following:\n", 230 | "\n", 231 | "```\n", 232 | "while [some statement remains True]:\n", 233 | " do everything inside of the while statement\n", 234 | "```\n", 235 | "\n", 236 | "So this means things will _keep_ happening _forever_ if the statement never becomes false. Well, `True` is always `True`, and we can check that:" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 9, 242 | "metadata": {}, 243 | "outputs": [ 244 | { 245 | "data": { 246 | "text/plain": [ 247 | "True" 248 | ] 249 | }, 250 | "execution_count": 9, 251 | "metadata": {}, 252 | "output_type": "execute_result" 253 | } 254 | ], 255 | "source": [ 256 | "True == True" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "Yup, checks out! \n", 264 | "\n", 265 | "So as long as `True` is `True`, we are guaranteed that the body of this loop keeps being executed. That's going to be super useful for us later on, because we may want to write programs that never end." 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "Let's look at another case now, writing expressions that evaluate to `True` or `False`. \n", 273 | "\n", 274 | "It turns out that if you make use of a boolean function (more precisely, a comparison or logical operator), your expression will evaluate to one of these.
The list of built-in boolean functions includes:\n", 275 | "\n", 276 | "* `<` - less than\n", 277 | "* `<=` - less than or equal to\n", 278 | "* `>` - greater than\n", 279 | "* `>=` - greater than or equal to\n", 280 | "* `==` - equal to\n", 281 | "* `and` - Logical AND\n", 282 | "* `or` - Logical OR\n", 283 | "* `not` - reverses the truth value of the statement\n", 284 | "* `in` - checks if an element is in a collection (we'll get to this)\n", 285 | "* `is` - checks if two things are the very same object; it's similar to equal to, but not the same (don't worry too much about this for now).\n", 286 | "\n", 287 | "We'll only concern ourselves today with everything up to and including `not`; the rest will be discussed later on.\n", 288 | "\n", 289 | "So let's see our first example: \n", 290 | "\n", 291 | "Let's check if 5 is less than 7:" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 11, 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "True\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "print(5 < 7)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "Great! Now what can we do with that? It turns out Python has a built-in statement that lets you check `if` an expression evaluates to `True`." 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 12, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "turns out, 5 is less than 7\n" 328 | ] 329 | } 330 | ], 331 | "source": [ 332 | "if 5 < 7:\n", 333 | " print(\"turns out, 5 is less than 7\")" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "We can also do something else when an expression isn't `True`, via an `else` statement."
341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 13, 346 | "metadata": {}, 347 | "outputs": [ 348 | { 349 | "name": "stdout", 350 | "output_type": "stream", 351 | "text": [ 352 | "turns out, 5 is less than 7\n" 353 | ] 354 | } 355 | ], 356 | "source": [ 357 | "if 5 < 7:\n", 358 | " print(\"turns out, 5 is less than 7\")\n", 359 | "else:\n", 360 | " print(\"uh oh, looks like we implemented Math wrong!\")" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "There is a whole list of checks we can do to express many powerful ideas, just by making use of this high level syntax. Let's look at a few more examples now." 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 14, 373 | "metadata": {}, 374 | "outputs": [ 375 | { 376 | "name": "stdout", 377 | "output_type": "stream", 378 | "text": [ 379 | "looks like lots of the math we know works in programs\n" 380 | ] 381 | } 382 | ], 383 | "source": [ 384 | "if 5 < 7 and 14 > 12:\n", 385 | " print(\"looks like lots of the math we know works in programs\")\n", 386 | "else:\n", 387 | " print(\"hmm maybe I don't remember how to check for less than after all\")" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 15, 393 | "metadata": {}, 394 | "outputs": [ 395 | { 396 | "name": "stdout", 397 | "output_type": "stream", 398 | "text": [ 399 | "I guess it makes sense that this is true\n" 400 | ] 401 | } 402 | ], 403 | "source": [ 404 | "if 5 < 7 or 12 > 14:\n", 405 | " print(\"I guess it makes sense that this is true\")\n", 406 | "else:\n", 407 | " print(\"woah, learned something new!\")" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 16, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "name": "stdout", 417 | "output_type": "stream", 418 | "text": [ 419 | "phew, not going crazy, yet.\n" 420 | ] 421 | } 422 | ], 423 | "source": [ 424 | "if 5
== 7:\n", 425 | " print(\"what? When did that happen\")\n", 426 | "else:\n", 427 | " print(\"phew, not going crazy, yet.\")" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "# Assignment\n", 435 | "\n", 436 | "https://github.com/18F/an_introduction_to_python#assignment-for-class-1" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": { 442 | "collapsed": true 443 | }, 444 | "source": [ 445 | "For the assignment you'll need to know how to read input from a user. Here we'll see an example of this." 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 17, 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "name": "stdout", 455 | "output_type": "stream", 456 | "text": [ 457 | "What is your name?Eric\n", 458 | "Hello Eric\n" 459 | ] 460 | } 461 | ], 462 | "source": [ 463 | "name = str(input(\"What is your name?\"))\n", 464 | "print(\"Hello {}\".format(name))" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [] 471 | } 472 | ], 473 | "metadata": { 474 | "kernelspec": { 475 | "display_name": "Python 3", 476 | "language": "python", 477 | "name": "python3" 478 | }, 479 | "language_info": { 480 | "codemirror_mode": { 481 | "name": "ipython", 482 | "version": 3 483 | }, 484 | "file_extension": ".py", 485 | "mimetype": "text/x-python", 486 | "name": "python", 487 | "nbconvert_exporter": "python", 488 | "pygments_lexer": "ipython3", 489 | "version": "3.6.1" 490 | } 491 | }, 492 | "nbformat": 4, 493 | "nbformat_minor": 2 494 | } 495 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/class_two-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Class Two - Dealing with the file system and standing on the shoulders of giants\n", 8 | "\n", 9 
| "The reason the Python language is so good is not its syntax. Its syntax is wonderful! But almost all modern programming languages are similar: Python is just one of many languages that look and feel the way it does. This is not by accident; the good ideas in computer languages are not owned by any one language. \n", 10 | "\n", 11 | "The reason the Python language is the powerhouse that it is, is simple: it's the community. Python has more built with it than almost any other language I know of, and only a few languages, looking at you, Java, have more.\n", 12 | "\n", 13 | "So how do we leverage the true power of Python? With one simple statement, `import`.\n", 14 | "\n", 15 | "Let's see an example to understand this better." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 3, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "8.0\n" 28 | ] 29 | } 30 | ], 31 | "source": [ 32 | "import math\n", 33 | "\n", 34 | "print(math.pow(2,3))" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "So what did we just see here? First, we imported the `math` library and then we used a function from it to do some computation. In order to get a better sense of this, let's introduce some definitions.\n", 42 | "\n", 43 | "**Definition** _library_ - a library (also sometimes called a module) is a collection of code that can be used in your own program. In Python you access a library by first importing it into your program.\n", 44 | "\n", 45 | "**Definition** _class_ - a class is a collection of functions and data, all accessed via a dot operator. You call functions by adding opening and closing parentheses after the name of the function, with possibly some input variables in between.
\n", 46 | "\n", 47 | "**Definition** _dot operator_ - a dot is used to break up paths to specific pieces of code. We already know another piece of notation to break up semantically different pieces in a long string - `/` when used in the context of the file system.\n", 48 | "\n", 49 | "Oftentimes the library you are importing will contain smaller pieces (submodules and classes) - collections of code - that you'll want to access individually. Here's an example of using such a piece of code:" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 4, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "NormaltestResult(statistic=32.008023157040626, pvalue=1.1208463532086409e-07)" 61 | ] 62 | }, 63 | "execution_count": 4, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "from scipy import stats\n", 70 | "import random\n", 71 | "\n", 72 | "set_of_values = [random.randint(0,100) for _ in range(100)]\n", 73 | "stats.normaltest(set_of_values)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "The above piece of code has two import statements: first it imports the `stats` module from `scipy`, and then it imports `random`, a module that generates random data. The remaining lines of code aren't super important to understand in detail. But just note that I'm calling `stats.normaltest` and `random.randint`: two functions defined in the modules `stats` and `random`. That's the important takeaway. \n", 81 | "\n", 82 | "Of course, I could probably write these functions myself, and these modules for that matter. But that would take a lot of time. I'd have to look up the algorithms, internalize how they work, and then go about writing them. It would probably take many, many tries to actually get them right, even after I understand them. And then, maybe 3 months from now, I'd have code that does what the above 4 lines do.
\n", 83 | "\n", 84 | "This is what I mean when I say Python lets you stand on the shoulders of giants. The Python community has already done the hard work for you! All you have to do is `import` [whatever] and go! \n", 85 | "\n", 86 | "Now of course things are slightly more complicated than that. But more on that later!" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## Understanding the File System\n", 94 | "\n", 95 | "Before we can move on to the next part of the course, we need to understand how the file system works, so I'm going to be switching to the terminal for this part of the class.\n", 96 | "\n", 97 | "[DEMO]\n", 98 | "\n", 99 | "**Definition** _a file_ - a named piece of storage that holds information, such as text.\n", 100 | "\n", 101 | "**Definition** _a folder_ - a collection of files and other folders.\n", 102 | "\n", 103 | "**Definition** _current working directory_ - the current directory that your terminal or program is referencing.\n", 104 | "\n", 105 | "**Definition** _command line commands_ - functions you can call from your command line to act on files and folders. You can think of these as the world's first \"apps\", similar to the ones you use on your smartphone.\n", 106 | "\n", 107 | "**Definition** _root directory_ - the top of your file system. This is the directory you hit when you type `cd /`.\n", 108 | "\n", 109 | "**Definition** _home directory_ - the top of your user's part of the file system.
This is the directory you hit when you type `cd ~`.\n", 110 | "\n", 111 | "**Definition** _full file path_ - the full path from your root directory to a given file or folder.\n", 112 | "\n", 113 | "**Definition** _relative file path_ - the file path to the directory or file you want to reference, relative to the directory you are currently in.\n", 114 | "\n", 115 | "\n", 116 | "And now a list of commands:\n", 117 | "\n", 118 | "* `ls` - see all the files and folders in your current working directory\n", 119 | "* `cd` - change directories by passing this command a folder.\n", 120 | "* `pwd` - print out the full path to the current directory.\n", 121 | "\n", 122 | "Now that we understand the basics of the file system, let's see how Python manipulates the file system with code." 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 5, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "/Users/ericschles/Documents/projects/python_courses/an_introduction_to_python\n", 135 | "/Users/ericschles/Documents/projects/python_courses\n", 136 | "/Users/ericschles/Documents/projects/python_courses/an_introduction_to_python\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "import os\n", 142 | "\n", 143 | "current_directory = os.getcwd()  # remember where we started\n", 144 | "print(os.getcwd())  # equivalent to pwd\n", 145 | "os.chdir(\"..\")  # equivalent to cd ..\n", 146 | "print(os.getcwd())\n", 147 | "os.chdir(current_directory)  # go back to where we started\n", 148 | "print(os.getcwd())" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 8, 154 | "metadata": {}, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "False\n", 161 | "True\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "import os\n", 167 | "\n", 168 | "print(os.path.isdir(\"class_two.ipynb\"))  # is it a folder?\n", 169 | "print(os.path.isfile(\"class_two.ipynb\"))  # is it a file?" 170 | ] 171 | }, 172 | { 173 | "cell_type":
"markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "Okay, so we can mess with the file system and change between directories programmatically. So what? Well, this becomes extremely powerful if we add the ability to read and write files! Think about how many tasks we can automate with that :) \n", 177 | "\n", 178 | "Here's just _an_ example: say your boss asks you for a report every Friday at 4pm of what happened that week. All of the information is recorded in a database. Right now you click some buttons, download some data from the database, and then copy/paste the results into a template. What if a program automatically pulled the data directly from the database and then emailed the report to your boss? Then you could work on other stuff! No more doing the same boring task every Friday, right before the end of the work day!\n", 179 | "\n", 180 | "Now imagine you could do that for every task like that. Imagine how much time you'd save! You could focus on high-impact, stimulating work, instead of the boring stuff. \n", 181 | "\n", 182 | "Hopefully this motivates the next example well enough for your needs (I know that folks asked me for this kinda stuff all the time in my first job)." 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 10, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "Hello there!\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "new_file = open(\"new_file.txt\", \"w\")\n", 200 | "\n", 201 | "new_file.write(\"Hello there!\")\n", 202 | "new_file.close()\n", 203 | "\n", 204 | "just_created_file = open(\"new_file.txt\", \"r\")\n", 205 | "print(just_created_file.read())" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "What did we just do?!?! We created a file with `open`, wrote a string to it, and then closed it.
And finally we re-opened the file and read its contents.\n", 213 | "\n", 214 | "You'll notice that the `open` function takes a second string. The first time we used it, it looked like this:\n", 215 | "\n", 216 | "`new_file = open(\"new_file.txt\", \"w\")`\n", 217 | "\n", 218 | "The string \"w\" tells `open` to open the file for writing.\n", 219 | "\n", 220 | "Later on we used `open` like so:\n", 221 | "\n", 222 | "`just_created_file = open(\"new_file.txt\", \"r\")` - this tells `open` to open the file for reading, with \"r\" for reading.\n", 223 | "\n", 224 | "There is a third mode we'll be interested in today: appending.\n", 225 | "\n", 226 | "Calling `open` with \"w\" as the second input opens a file for writing, and if the file doesn't exist yet, it creates it for you! However, if the file already exists, it overwrites any content already in the file. That's clearly not always what we want.\n", 227 | "\n", 228 | "So there is also `open(\"new_file.txt\", \"a\")`, which opens a file for appending (and also creates it if it doesn't exist yet). This means new content is added to the end of the file instead of overwriting it. \n", 229 | "\n", 230 | "So let's try it out!" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 11, 236 | "metadata": {}, 237 | "outputs": [ 238 | { 239 | "name": "stdout", 240 | "output_type": "stream", 241 | "text": [ 242 | "Hello there!\n", 243 | " yo!\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "file_handle = open(\"new_file.txt\", \"a\")\n", 249 | "file_handle.write(\"\\n yo!\")\n", 250 | "file_handle.close()\n", 251 | "\n", 252 | "file_handle = open(\"new_file.txt\", \"r\")\n", 253 | "print(file_handle.read())" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "Now we'll try the same thing, except we'll use \"w\" instead of \"a\" for the second parameter."
261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 12, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "\n", 273 | " yo!\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "file_handle = open(\"new_file.txt\", \"w\")\n", 279 | "file_handle.write(\"\\n yo!\")\n", 280 | "file_handle.close()\n", 281 | "\n", 282 | "file_handle = open(\"new_file.txt\", \"r\")\n", 283 | "print(file_handle.read())" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "As you can see, \"Hello there!\" is gone." 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": { 297 | "collapsed": true 298 | }, 299 | "outputs": [], 300 | "source": [] 301 | } 302 | ], 303 | "metadata": { 304 | "kernelspec": { 305 | "display_name": "Python 3", 306 | "language": "python", 307 | "name": "python3" 308 | }, 309 | "language_info": { 310 | "codemirror_mode": { 311 | "name": "ipython", 312 | "version": 3 313 | }, 314 | "file_extension": ".py", 315 | "mimetype": "text/x-python", 316 | "name": "python", 317 | "nbconvert_exporter": "python", 318 | "pygments_lexer": "ipython3", 319 | "version": "3.6.1" 320 | } 321 | }, 322 | "nbformat": 4, 323 | "nbformat_minor": 2 324 | } 325 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Welcome to Introduction to Python! 4 | 5 | This three-week class will prepare you to read and write basic Python scripts. No prior programming experience required! 6 | 7 | # Help with the homework? 8 | 9 | Email `eric.schles@gsa.gov` 10 | 11 | # Administrivia 12 | 13 | Some homework assignments are from HackerRank. There are a few reasons for this: 14 | 15 | 1. Teaches you how to write code that will pass tests. 16 | 2.
No installation required - you don't need Python locally in order to write code here. 17 | 3. Solve real problems. 18 | 19 | # Course Breakdown 20 | 21 | ## Week 1 22 | 23 | * Introduction to Python - Class 1 (class time 30 minutes) - [CLASS ONE NOTES](https://github.com/18F/an_introduction_to_python/blob/master/class_one.ipynb) 24 | * writing your first program 25 | * understanding Python data types 26 | * if/else statements in Python 27 | * functions in Python 28 | * printing to the screen 29 | * string processing basics - the `.format` method. 30 | 31 | * Standing on the shoulders of giants - Class 2 (class time 45 minutes) - [CLASS TWO NOTES](https://github.com/18F/an_introduction_to_python/blob/master/class_two.ipynb) 32 | * reading and writing files 33 | * introduction to import statements 34 | * introduction to the `os` module 35 | 36 | Homework: 37 | 38 | ## Assignment for class 1: 39 | 40 | 1. https://www.hackerrank.com/challenges/py-hello-world 41 | 2. https://www.hackerrank.com/challenges/python-raw-input 42 | 3. https://www.hackerrank.com/challenges/py-if-else 43 | 4. https://www.hackerrank.com/challenges/python-arithmetic-operators 44 | 5. https://www.hackerrank.com/challenges/python-division 45 | 6. https://www.hackerrank.com/challenges/write-a-function 46 | 7. https://www.hackerrank.com/challenges/python-print 47 | 48 | 49 | ## Assignment for class 2: 50 | 51 | For this assignment you will either be creating an account on https://www.pythonanywhere.com 52 | or downloading and installing Python locally. If at all possible, it is better to install Python locally. 53 | 54 | But if you are restricted from doing so, PythonAnywhere works. 55 | 56 | 57 | 1. Write a Python program called create_file.py 58 | 59 | The program should create a file called practicing_file_writing.txt. The file should contain the following text: 60 | 61 | Hello! You've successfully created this file with a program! Congratulations!!!! 62 | 63 | 2.
Write a Python program called edit_file.py 64 | 65 | The program should open practicing_file_writing.txt and then add the following lines to the end of the file: 66 | 67 | Sincerely, 68 | Python 69 | 70 | So the whole file should now look like: 71 | 72 | Hello! You've successfully created this file with a program! Congratulations!!!! 73 | Sincerely, 74 | Python 75 | 76 | The file should then be written back out as practicing_file_updating.txt. 77 | 78 | 3. Create a directory called to_traverse and put the two files created above in the directory. Then, in the directory above to_traverse, put a Python program called traverse_and_open.py. This Python program should change directories, read both files into memory, and then check whether the two files' contents are equal. 79 | 80 | ## Week 2 81 | 82 | 83 | * Introduction to Python Data Structures - Class 1 (class time 30 minutes) 84 | * lists 85 | * while loops 86 | * for loops 87 | * dictionaries 88 | * sets 89 | 90 | * More file processing - Class 2 (class time 45 minutes) 91 | * string processing 92 | * using for loops to read a bunch of text from a file 93 | * using for loops to write a bunch of text to a file 94 | * using for loops to move from one directory to the next 95 | * working with `os.walk` 96 | 97 | ## Assignment for class 3: 98 | 99 | 1. https://www.hackerrank.com/challenges/python-lists 100 | 2. https://www.hackerrank.com/challenges/find-second-maximum-number-in-a-list 101 | 3. https://www.hackerrank.com/challenges/finding-the-percentage 102 | 4. https://www.hackerrank.com/challenges/py-introduction-to-sets 103 | 5. https://www.hackerrank.com/challenges/symmetric-difference 104 | 6. https://www.hackerrank.com/challenges/python-loops 105 | 106 | ## Assignment for class 4: 107 | 108 | 1. Write a program called traversal.py - this program will traverse the file system, starting at the directory you give it. 109 | 110 | 2.
Write a new program called print_traversal.py - this program will traverse the file system, starting at the directory you give it, and print the full path of every file it visits. 111 | 112 | 3. Write a new program called print_analyze_traversal.py - this program will traverse the file system in the same way, starting at the directory you give it, and print the full path of every file it visits. It will also count how many of those files are Python files and how many are not. At the end, the program should print how many Python files and how many non-Python files were found, both as counts and as percentages of all files found. Example: 113 | 114 | total number of python files found: 5 115 | total number of non-python files found: 500 116 | percentage of python files: 1% 117 | percentage of non-python files: 99% 118 | 119 | ## Week 3 120 | 121 | * Introduction to methods - Class 1 (class time 30 minutes) 122 | * methods on integers 123 | * methods on strings 124 | * methods on lists 125 | * working with dictionaries 126 | 127 | * Making use of methods on strings with files - Class 2 (class time 45 minutes) 128 | * advanced string processing 129 | * reading files and processing them as strings in Python 130 | * traversing directories to look for patterns 131 | * introduction to very basic regular expressions 132 | 133 | ## Assignment for class 5: 134 | 135 | * Do every question in this section - https://www.hackerrank.com/domains/python/py-strings 136 | * Do every question in this section - https://www.hackerrank.com/domains/python/py-math 137 | 138 | ## Assignment for class 6: 139 | 140 | Now that you have an understanding of how to iterate over a set of things, let's make use of that to really do something interesting! 141 | 142 | 1. Write a program called find.py. This program will start at whatever directory you give it and traverse until it finds a specific string.
In this case, the string will be Hello there! If the string is never found after looking through all subfolders and files, the program should exit with the message: couldn't find the string. 143 | 144 | How you'll call the program: python find.py dir_name 145 | 146 | 2. Now you'll be creating a new program called find_replace.py. This program will do the same traversal as the last program, except that it will also open any file containing Hello there!, replace Hello there! with Hi on that line, and write the file back out to the file system. 147 | 148 | 3. Now you'll be going even further - instead of a simple find and replace, you'll be reporting descriptive statistics for any files you find. The descriptive statistics you'll be reporting are: 149 | 150 | 1. the number of lines in the file 151 | 2. the five most commonly used words in the file 152 | 3. if the file is a program, what language it is written in 153 | 154 | Note: you should write a separate function for each of these subtasks. Here's what a typical output should look like: 155 | 156 | This file has 10 lines 157 | The five most common words used are: print, if, else, import, and 158 | This file is written in Python 159 | -------------------------------------------------------------------------------- /class_five.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "* Classes and objects\n", 8 | "* advanced string processing\n", 9 | "* installing Python locally\n", 10 | "* using pip\n", 11 | "* installing jupyter notebook\n", 12 | "* intro to pandas with basic examples\n", 13 | "* intro to docx" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": { 20 | "collapsed": true 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "# creating a Word document and saving it to disk\n", 25 | "from docx import Document\n", 26 | "doc = Document()\n", 27 |
"doc.add_paragraph('Lorem ipsum dolor sit amet.')\n", 28 | "doc.save(\"test.docx\")\n" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 4, 34 | "metadata": {}, 35 | "outputs": [ 36 | { 37 | "name": "stdout", 38 | "output_type": "stream", 39 | "text": [ 40 | "Lorem ipsum dolor sit amet.\n" 41 | ] 42 | } 43 | ], 44 | "source": [ 45 | "from docx import Document\n", 46 | "document = Document('test.docx')\n", 47 | "for paragraph in document.paragraphs:\n", 48 | " print(paragraph.text)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 5, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "data": { 58 | "text/html": [ 59 | "
[HTML table omitted: same five rows as the text/plain output below]
" 115 | ], 116 | "text/plain": [ 117 | " Unnamed: 0 a b c d\n", 118 | "0 0 -0.591023 16 90.296527 'molly'\n", 119 | "1 1 0.626541 13 89.357373 'dolly'\n", 120 | "2 2 -0.121086 23 95.636287 'molly'\n", 121 | "3 3 0.691377 19 95.360784 'molly'\n", 122 | "4 4 -0.403012 5 96.244526 'dolly'" 123 | ] 124 | }, 125 | "execution_count": 5, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "import pandas as pd\n", 132 | "data_frame = pd.read_csv(\"data.csv\")\n", 133 | "data_frame.head()" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 7, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "name": "stdout", 143 | "output_type": "stream", 144 | "text": [ 145 | "['T', '_AXIS_ALIASES', '_AXIS_IALIASES', '_AXIS_LEN', '_AXIS_NAMES', '_AXIS_NUMBERS', '_AXIS_ORDERS', '_AXIS_REVERSED', '_AXIS_SLICEMAP', '__abs__', '__add__', '__and__', '__array__', '__array_wrap__', '__bool__', '__bytes__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__div__', '__doc__', '__eq__', '__finalize__', '__floordiv__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__invert__', '__ipow__', '__isub__', '__iter__', '__itruediv__', '__le__', '__len__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__or__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__unicode__', '__weakref__', '__xor__', '_accessors', '_add_numeric_operations', '_add_series_only_operations', '_add_series_or_dataframe_operations', '_agg_by_level', '_align_frame', '_align_series', 
'_apply_broadcast', '_apply_empty_result', '_apply_raw', '_apply_standard', '_at', '_box_col_values', '_box_item_values', '_check_inplace_setting', '_check_is_chained_assignment_possible', '_check_percentile', '_check_setitem_copy', '_clear_item_cache', '_combine_const', '_combine_frame', '_combine_match_columns', '_combine_match_index', '_combine_series', '_combine_series_infer', '_compare_frame', '_compare_frame_evaluate', '_consolidate_inplace', '_construct_axes_dict', '_construct_axes_dict_for_slice', '_construct_axes_dict_from', '_construct_axes_from_arguments', '_constructor', '_constructor_expanddim', '_constructor_sliced', '_convert', '_count_level', '_create_indexer', '_dir_additions', '_dir_deletions', '_ensure_valid_index', '_expand_axes', '_flex_compare_frame', '_from_arrays', '_from_axes', '_get_agg_axis', '_get_axis', '_get_axis_name', '_get_axis_number', '_get_axis_resolvers', '_get_block_manager_axis', '_get_bool_data', '_get_cacher', '_get_index_resolvers', '_get_item_cache', '_get_numeric_data', '_get_values', '_getitem_array', '_getitem_column', '_getitem_frame', '_getitem_multilevel', '_getitem_slice', '_iat', '_iget_item_cache', '_iloc', '_indexed_same', '_info_axis', '_info_axis_name', '_info_axis_number', '_info_repr', '_init_dict', '_init_mgr', '_init_ndarray', '_internal_names', '_internal_names_set', '_is_cached', '_is_datelike_mixed_type', '_is_mixed_type', '_is_numeric_mixed_type', '_is_view', '_ix', '_ixs', '_join_compat', '_loc', '_maybe_cache_changed', '_maybe_update_cacher', '_metadata', '_needs_reindex_multi', '_nsorted', '_protect_consolidate', '_reduce', '_reindex_axes', '_reindex_axis', '_reindex_columns', '_reindex_index', '_reindex_multi', '_reindex_with_indexers', '_repr_fits_horizontal_', '_repr_fits_vertical_', '_repr_html_', '_repr_latex_', '_reset_cache', '_reset_cacher', '_sanitize_column', '_series', '_set_as_cached', '_set_axis', '_set_axis_name', '_set_is_copy', '_set_item', '_setitem_array', '_setitem_frame', 
'_setitem_slice', '_setup_axes', '_slice', '_stat_axis', '_stat_axis_name', '_stat_axis_number', '_typ', '_unpickle_frame_compat', '_unpickle_matrix_compat', '_update_inplace', '_validate_dtype', '_values', '_xs', 'a', 'abs', 'add', 'add_prefix', 'add_suffix', 'align', 'all', 'any', 'append', 'apply', 'applymap', 'as_blocks', 'as_matrix', 'asfreq', 'assign', 'astype', 'at', 'at_time', 'axes', 'b', 'between_time', 'bfill', 'blocks', 'bool', 'boxplot', 'c', 'clip', 'clip_lower', 'clip_upper', 'columns', 'combine', 'combineAdd', 'combineMult', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'corrwith', 'count', 'cov', 'cummax', 'cummin', 'cumprod', 'cumsum', 'd', 'describe', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'dtypes', 'duplicated', 'empty', 'eq', 'equals', 'eval', 'ewm', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'first_valid_index', 'floordiv', 'from_csv', 'from_dict', 'from_items', 'from_records', 'ftypes', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'head', 'hist', 'iat', 'icol', 'idxmax', 'idxmin', 'iget_value', 'iloc', 'index', 'info', 'insert', 'interpolate', 'irow', 'is_copy', 'isin', 'isnull', 'items', 'iteritems', 'iterkv', 'iterrows', 'itertuples', 'ix', 'join', 'keys', 'kurt', 'kurtosis', 'last', 'last_valid_index', 'le', 'loc', 'lookup', 'lt', 'mad', 'mask', 'max', 'mean', 'median', 'memory_usage', 'merge', 'min', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'nlargest', 'notnull', 'nsmallest', 'pct_change', 'pipe', 'pivot', 'pivot_table', 'plot', 'pop', 'pow', 'prod', 'product', 'quantile', 'query', 'radd', 'rank', 'rdiv', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'replace', 'resample', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rolling', 'round', 'rpow', 'rsub', 'rtruediv', 'sample', 'select', 'select_dtypes', 'sem', 'set_axis', 'set_index', 'set_value', 'shape', 'shift', 'size', 
'skew', 'slice_shift', 'sort', 'sort_index', 'sort_values', 'sortlevel', 'squeeze', 'stack', 'std', 'style', 'sub', 'subtract', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_clipboard', 'to_csv', 'to_dense', 'to_dict', 'to_excel', 'to_gbq', 'to_hdf', 'to_html', 'to_json', 'to_latex', 'to_msgpack', 'to_panel', 'to_period', 'to_pickle', 'to_records', 'to_sparse', 'to_sql', 'to_stata', 'to_string', 'to_timestamp', 'to_wide', 'to_xarray', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unstack', 'update', 'values', 'var', 'where', 'xs']\n" 146 | ] 147 | } 148 | ], 149 | "source": [ 150 | "print(dir(data_frame))" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": { 157 | "collapsed": true 158 | }, 159 | "outputs": [], 160 | "source": [] 161 | } 162 | ], 163 | "metadata": { 164 | "kernelspec": { 165 | "display_name": "Python 3", 166 | "language": "python", 167 | "name": "python3" 168 | }, 169 | "language_info": { 170 | "codemirror_mode": { 171 | "name": "ipython", 172 | "version": 3 173 | }, 174 | "file_extension": ".py", 175 | "mimetype": "text/x-python", 176 | "name": "python", 177 | "nbconvert_exporter": "python", 178 | "pygments_lexer": "ipython3", 179 | "version": "3.6.2" 180 | } 181 | }, 182 | "nbformat": 4, 183 | "nbformat_minor": 2 184 | } 185 | -------------------------------------------------------------------------------- /class_four.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## More File Processing\n", 8 | "\n", 9 | "So far we've learned looping, but what is it good for really? Today we will see our first application of a for loop, that will make sense! Today, we will be reading a file into memory, treating each of the lines of the file as strings and applying various processing techniques to them. 
We'll first go over reading text into memory and then we'll go over some string processing techniques. \n", 10 | "\n", 11 | "\n" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "```\n", 19 | "with open([FILE NAME], [MODE]) as [NAME OF FILE OBJECT]:\n", 20 | " any processing code goes here\n", 21 | "```" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 6, 27 | "metadata": { 28 | "collapsed": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 33 | " text = f.read()" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "The above code reads our text into memory for processing. We are now ready to start processing and analyzing our piece of text!\n", 41 | "\n", 42 | "There are lots of methods built into the Python language for processing strings:" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "_Definition_ **Context Management** := Context management is how Python sets something up when you enter a block of code and cleans it up when you leave that block. With `with open(...) as f:`, the file is opened when the block starts and closed automatically when the block ends, even if an error occurs inside it. Note that a `with` block is not a separate scope: names assigned inside it (like `text` above) are still available after the block ends, unlike names assigned inside a function.
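Here is a minimal sketch of what `with open(...)` does on entry and exit. It uses only the standard library. `tempfile` and `os` are there just to create a throwaway file, and the name `example.txt` is hypothetical. Inside the block the file object reports that it is open. Once the block ends, it is closed. The name `text`, assigned inside the block, is still usable afterwards.

```python
import os
import tempfile

# TemporaryDirectory is itself a context manager: the folder is deleted on exit.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")  # hypothetical file name

    # Write a line so there is something to read back.
    with open(path, "w") as f:
        f.write("Down the Rabbit-Hole\n")

    with open(path, "r") as f:
        text = f.read()
        closed_inside = f.closed  # the file is still open here

    closed_after = f.closed  # leaving the block closed the file for us

print(closed_inside, closed_after)  # False True
print(text.strip())                 # Down the Rabbit-Hole
```

Only the file is cleaned up when the block ends. The string already read into `text` stays in memory, even after the temporary folder itself is deleted.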
" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 5, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "1\n" 62 | ] 63 | }, 64 | { 65 | "ename": "NameError", 66 | "evalue": "name 'z' is not defined", 67 | "output_type": "error", 68 | "traceback": [ 69 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 70 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 71 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m6\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 14\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtmp_var\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 15\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mz\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 72 | "\u001b[0;31mNameError\u001b[0m: name 'z' is not defined" 73 | ] 74 | } 75 | ], 76 | "source": [ 77 | "# with in python\n", 78 | "# a reference for you to read on your own - \n", 79 | "# https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/\n", 80 | "\n", 81 | "def func(x, y):\n", 82 | " z = x + y\n", 83 | "\n", 84 | "with open(\"alice_in_wonderland.txt\", \"r\") as file_handle:\n", 85 | " tmp_var = 0\n", 86 | " text = file_handle.read()\n", 87 | " tmp_var += 1\n", 88 | "\n", 89 | "func(5, 6)\n", 90 | "print(tmp_var)\n", 91 | "print(z)" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 7, 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "name": "stdout", 101 | "output_type": "stream", 102 | "text": [ 103 | "True\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "listing = []\n", 109 | "for i in range(100):\n", 110 | " listing.append(i)\n", 111 | 
"print(listing == [i for i in range(100)])" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 8, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "name": "stdout", 121 | "output_type": "stream", 122 | "text": [ 123 | "True\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "listing = []\n", 129 | "for i in range(100):\n", 130 | " if i < 10:\n", 131 | " listing.append(i)\n", 132 | "print(listing == [i for i in range(100) if i < 10])" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "`[transformation of the loop variable; for loop goes here; optional if condition goes here]`" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 1, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "data": { 149 | "text/plain": [ 150 | "['capitalize',\n", 151 | " 'casefold',\n", 152 | " 'center',\n", 153 | " 'count',\n", 154 | " 'encode',\n", 155 | " 'endswith',\n", 156 | " 'expandtabs',\n", 157 | " 'find',\n", 158 | " 'format',\n", 159 | " 'format_map',\n", 160 | " 'index',\n", 161 | " 'isalnum',\n", 162 | " 'isalpha',\n", 163 | " 'isdecimal',\n", 164 | " 'isdigit',\n", 165 | " 'isidentifier',\n", 166 | " 'islower',\n", 167 | " 'isnumeric',\n", 168 | " 'isprintable',\n", 169 | " 'isspace',\n", 170 | " 'istitle',\n", 171 | " 'isupper',\n", 172 | " 'join',\n", 173 | " 'ljust',\n", 174 | " 'lower',\n", 175 | " 'lstrip',\n", 176 | " 'maketrans',\n", 177 | " 'partition',\n", 178 | " 'replace',\n", 179 | " 'rfind',\n", 180 | " 'rindex',\n", 181 | " 'rjust',\n", 182 | " 'rpartition',\n", 183 | " 'rsplit',\n", 184 | " 'rstrip',\n", 185 | " 'split',\n", 186 | " 'splitlines',\n", 187 | " 'startswith',\n", 188 | " 'strip',\n", 189 | " 'swapcase',\n", 190 | " 'title',\n", 191 | " 'translate',\n", 192 | " 'upper',\n", 193 | " 'zfill']" 194 | ] 195 | }, 196 | "execution_count": 1, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "source": [ 202 |
"[elem for elem in dir(str()) if \"__\" not in elem]" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "Today we will be going over:\n", 210 | "\n", 211 | "* lower\n", 212 | "* upper\n", 213 | "* islower\n", 214 | "* isnumeric\n", 215 | "* isupper\n", 216 | "* split\n", 217 | "* splitlines\n", 218 | "* translate\n", 219 | "* strip\n", 220 | "* lstrip\n", 221 | "* rstrip\n", 222 | "* isalnum\n", 223 | "* isalpha\n", 224 | "* isdecimal\n", 225 | "* isspace\n", 226 | "* isdigit\n", 227 | "* find\n", 228 | "* startswith\n", 229 | "* endswith\n", 230 | "* capitalize\n", 231 | "\n", 232 | "This may seem like a lot but don't worry! It's really only a few categories of things :)\n", 233 | "\n", 234 | "Let's get started with `split` and then move on to the `startswith` and `endswith` methods, because those are pretty easy to work with." 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 9, 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "name": "stdout", 244 | "output_type": "stream", 245 | "text": [ 246 | "<class 'list'>\n" 247 | ] 248 | }, 249 | { 250 | "data": { 251 | "text/plain": [ 252 | "\"Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll\"" 253 | ] 254 | }, 255 | "execution_count": 9, 256 | "metadata": {}, 257 | "output_type": "execute_result" 258 | } 259 | ], 260 | "source": [ 261 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 262 | " text = f.read()\n", 263 | "lines = text.split(\"\\n\")\n", 264 | "print(type(lines))\n", 265 | "lines[0]" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 12, 271 | "metadata": {}, 272 | "outputs": [ 273 | { 274 | "data": { 275 | "text/plain": [ 276 | "'my'" 277 | ] 278 | }, 279 | "execution_count": 12, 280 | "metadata": {}, 281 | "output_type": "execute_result" 282 | } 283 | ], 284 | "source": [ 285 | "listing = \"Hello there my name is Eric\".split()\n", 286 | "listing[2]" 287 | ] 288 | }, 289 | { 290 |
"cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "The `split` method lets us specify a separator to split the text on. Every time that separator appears, the string is cut at that point, and the pieces are returned as a list. Here's a worked example:" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 5, 299 | "metadata": {}, 300 | "outputs": [ 301 | { 302 | "name": "stdout", 303 | "output_type": "stream", 304 | "text": [ 305 | "['For this string', \" we'll split\", ' on', ' commas', ' okay?']\n", 306 | "For this string\n", 307 | " we'll split\n", 308 | " on\n", 309 | " commas\n", 310 | " okay?\n" 311 | ] 312 | } 313 | ], 314 | "source": [ 315 | "string = \"For this string, we'll split, on, commas, okay?\"\n", 316 | "listing = string.split(\",\")\n", 317 | "print(listing)\n", 318 | "for elem in listing:\n", 319 | " print(elem)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "So the string is split at every comma. By default, if you don't pass in anything, `split` splits on runs of whitespace (spaces, tabs and newlines) and drops any empty strings.\n", 327 | "\n", 328 | "Now that we can split on specific strings, let's look at the `startswith` and `endswith` methods. These methods will be used to do some baseline analysis of the text! 
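Here is a short sketch of two details of these methods, using made-up sample strings. First, calling `split()` with no argument treats any run of whitespace as a single separator, while `split(" ")` cuts at every single space. Second, `startswith` and `endswith` also accept a tuple and return `True` if any entry matches, which is a compact alternative to the `any([...])` pattern. The `greeting_words` tuple here is illustrative.

```python
# With no argument, split() treats any run of whitespace as one separator
# and never returns empty strings.
text = "Hello  there\tmy name"
by_whitespace = text.split()

# With an explicit " ", every single space is a cut point: the double space
# leaves an empty string behind, and the tab is not treated as a separator.
by_single_space = text.split(" ")

print(by_whitespace)    # ['Hello', 'there', 'my', 'name']
print(by_single_space)  # ['Hello', '', 'there\tmy', 'name']

# startswith/endswith accept a tuple and return True if any entry matches,
# the same result as any([line.startswith(w) for w in greeting_words]).
greeting_words = ("Hello", "Hi", "How are you?")
print("Hi there, Alice".startswith(greeting_words))  # True
print("Goodbye, Alice".startswith(greeting_words))   # False
print("See you!".endswith(("!", "?")))               # True
```

The tuple form only works with a tuple, not a list, so a list of words would need to be wrapped with `tuple(...)` first.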
" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 22, 334 | "metadata": {}, 335 | "outputs": [ 336 | { 337 | "name": "stdout", 338 | "output_type": "stream", 339 | "text": [ 340 | "Hello there my name is Eric\n", 341 | "\n", 342 | "Hello\n", 343 | "there\n", 344 | "my\n", 345 | "name\n", 346 | "is\n", 347 | "Eric\n" 348 | ] 349 | } 350 | ], 351 | "source": [ 352 | "listing = \"Hello there my name is Eric\".split()\n", 353 | "string_with_spaces = \" \".join(listing)\n", 354 | "print(string_with_spaces)\n", 355 | "string_with_newlines = \"\\n\".join(listing)\n", 356 | "print()\n", 357 | "print(string_with_newlines)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 1, 363 | "metadata": {}, 364 | "outputs": [ 365 | { 366 | "name": "stdout", 367 | "output_type": "stream", 368 | "text": [ 369 | "yup\n" 370 | ] 371 | } 372 | ], 373 | "source": [ 374 | "string = \"Hello there\"\n", 375 | "if string.startswith(\"Hello\"):\n", 376 | " print(\"yup\")\n", 377 | "else:\n", 378 | " print(\"what?!\")" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 23, 384 | "metadata": {}, 385 | "outputs": [ 386 | { 387 | "name": "stdout", 388 | "output_type": "stream", 389 | "text": [ 390 | "phew\n" 391 | ] 392 | } 393 | ], 394 | "source": [ 395 | "string = \"Hello there\"\n", 396 | "if string.endswith(\"Hello\"):\n", 397 | " print(\"what happened?\")\n", 398 | "else:\n", 399 | " print(\"phew\")" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 27, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "name": "stdout", 409 | "output_type": "stream", 410 | "text": [ 411 | "True\n", 412 | "True\n", 413 | "True\n", 414 | "False\n", 415 | "True\n", 416 | "False\n" 417 | ] 418 | } 419 | ], 420 | "source": [ 421 | "listing = [True, True]\n", 422 | "print(any(listing)) #if any booleans are True\n", 423 | "print(all(listing)) # if all booleans are True\n", 424 | "number_listing = [0, 
2]\n", 425 | "print(any(number_listing))\n", 426 | "print(all(number_listing))\n", 427 | "string_listing = [\"\", \"stuff\"]\n", 428 | "print(any(string_listing))\n", 429 | "print(all(string_listing))" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 3, 435 | "metadata": {}, 436 | "outputs": [ 437 | { 438 | "name": "stdout", 439 | "output_type": "stream", 440 | "text": [ 441 | "True\n", 442 | "False\n" 443 | ] 444 | } 445 | ], 446 | "source": [ 447 | "listing = [False, True]\n", 448 | "print(any(listing)) #if any booleans are True\n", 449 | "print(all(listing)) # if all booleans are True" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 4, 455 | "metadata": {}, 456 | "outputs": [ 457 | { 458 | "name": "stdout", 459 | "output_type": "stream", 460 | "text": [ 461 | "False\n", 462 | "False\n" 463 | ] 464 | } 465 | ], 466 | "source": [ 467 | "listing = [False, False]\n", 468 | "print(any(listing)) #if any booleans are True\n", 469 | "print(all(listing)) # if all booleans are True" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 6, 475 | "metadata": {}, 476 | "outputs": [ 477 | { 478 | "name": "stdout", 479 | "output_type": "stream", 480 | "text": [ 481 | "[1, 2, 3, 4, 5, 6, 7, 8, 9]\n", 482 | "[2, 3, 4, 5, 6, 7, 8, 9, 10]\n" 483 | ] 484 | } 485 | ], 486 | "source": [ 487 | "print([x for x in range(1,10)])\n", 488 | "print([x+1 for x in range(1,10)])" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 30, 494 | "metadata": {}, 495 | "outputs": [ 496 | { 497 | "data": { 498 | "text/plain": [ 499 | "(0, 0)" 500 | ] 501 | }, 502 | "execution_count": 30, 503 | "metadata": {}, 504 | "output_type": "execute_result" 505 | } 506 | ], 507 | "source": [ 508 | "greeting_words = [\"Hello\", \"Hi\", \"How are you?\"]\n", 509 | "goodbye_words = [\"Goodbye\", \"See you\", \"See you!\"]\n", 510 | "\n", 511 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 512 | " text 
= f.read()\n", 513 | "lines = text.split(\"\\n\")\n", 514 | "greetings = 0\n", 515 | "goodbyes = 0\n", 516 | "for line in lines:\n", 517 | " if any([line.startswith(elem) for elem in greeting_words]):\n", 518 | " greetings += 1\n", 519 | " if any([line.startswith(elem) for elem in goodbye_words]):\n", 520 | " goodbyes += 1\n", 521 | "greetings, goodbyes" 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": 9, 527 | "metadata": {}, 528 | "outputs": [ 529 | { 530 | "name": "stdout", 531 | "output_type": "stream", 532 | "text": [ 533 | "False\n", 534 | "woah\n" 535 | ] 536 | } 537 | ], 538 | "source": [ 539 | "print(\"See you\" == \"See you!\")\n", 540 | "see_you = \"See you\"\n", 541 | "see_you_exclam = \"See you!\"\n", 542 | "if \"See you!\".startswith(see_you):\n", 543 | " print(\"woah\")" 544 | ] 545 | }, 546 | { 547 | "cell_type": "code", 548 | "execution_count": 7, 549 | "metadata": {}, 550 | "outputs": [ 551 | { 552 | "data": { 553 | "text/plain": [ 554 | "(0, 0)" 555 | ] 556 | }, 557 | "execution_count": 7, 558 | "metadata": {}, 559 | "output_type": "execute_result" 560 | } 561 | ], 562 | "source": [ 563 | "greeting_words = [\"Hello\", \"Hi\", \"How are you?\"]\n", 564 | "goodbye_words = [\"Goodbye\", \"See you\", \"See you!\"]\n", 565 | "\n", 566 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 567 | " text = f.read()\n", 568 | "lines = text.split(\"\\n\")\n", 569 | "greetings = 0\n", 570 | "goodbyes = 0\n", 571 | "for line in lines:\n", 572 | " if any([line.endswith(elem) for elem in greeting_words]):\n", 573 | " greetings += 1\n", 574 | " if any([line.endswith(elem) for elem in goodbye_words]):\n", 575 | " goodbyes += 1\n", 576 | "greetings, goodbyes" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "Looks like our set of greetings and goodbyes didn't yield any results. I guess folks aren't very friendly! Let's widen the search by ignoring case, but how do we do that?!
Enter `upper` and `lower`." 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 8, 589 | "metadata": {}, 590 | "outputs": [ 591 | { 592 | "data": { 593 | "text/plain": [ 594 | "\"LET'S MAKE THIS TEXT ALL UPPERCASE\"" 595 | ] 596 | }, 597 | "execution_count": 8, 598 | "metadata": {}, 599 | "output_type": "execute_result" 600 | } 601 | ], 602 | "source": [ 603 | "string = \"Let's make this text all uppercase\"\n", 604 | "string.upper()" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": 9, 610 | "metadata": {}, 611 | "outputs": [ 612 | { 613 | "data": { 614 | "text/plain": [ 615 | "\"let's make this text all lowercase\"" 616 | ] 617 | }, 618 | "execution_count": 9, 619 | "metadata": {}, 620 | "output_type": "execute_result" 621 | } 622 | ], 623 | "source": [ 624 | "string = \"Let's make ThIs tExT all lowercase\"\n", 625 | "string.lower()" 626 | ] 627 | }, 628 | { 629 | "cell_type": "markdown", 630 | "metadata": {}, 631 | "source": [ 632 | "Now let's lowercase each line and check again whether any line starts or ends with one of our greeting or goodbye words!"
633 | ] 634 | }, 635 | { 636 | "cell_type": "code", 637 | "execution_count": 10, 638 | "metadata": {}, 639 | "outputs": [ 640 | { 641 | "data": { 642 | "text/plain": [ 643 | "(9, 1)" 644 | ] 645 | }, 646 | "execution_count": 10, 647 | "metadata": {}, 648 | "output_type": "execute_result" 649 | } 650 | ], 651 | "source": [ 652 | "greeting_words = [\"hello\", \"hi\", \"how are you?\"]\n", 653 | "goodbye_words = [\"goodbye\", \"see you\", \"see you!\"]\n", 654 | "\n", 655 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 656 | " text = f.read()\n", 657 | "lines = text.split(\"\\n\")\n", 658 | "greetings = 0\n", 659 | "goodbyes = 0\n", 660 | "for line in lines:\n", 661 | " line = line.lower()\n", 662 | " if any([line.startswith(elem) for elem in greeting_words]):\n", 663 | " greetings += 1\n", 664 | " if any([line.startswith(elem) for elem in goodbye_words]):\n", 665 | " goodbyes += 1\n", 666 | " if any([line.endswith(elem) for elem in greeting_words]):\n", 667 | " greetings += 1\n", 668 | " if any([line.endswith(elem) for elem in goodbye_words]):\n", 669 | " goodbyes += 1\n", 670 | "greetings, goodbyes" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "Pay dirt! We got some greetings and goodbyes! It turns out people are sometimes friendly in Alice in Wonderland after all! Now let's see whether any of our greeting or goodbye words appear anywhere on any line in the text.
We'll use `find` to do this :)" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 12, 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "name": "stdout", 687 | "output_type": "stream", 688 | "text": [ 689 | "6\n", 690 | "-1\n" 691 | ] 692 | } 693 | ], 694 | "source": [ 695 | "string = \"Hello there friend\"\n", 696 | "print(string.find(\"there\"))\n", 697 | "print(string.find(\"whatever\"))" 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "So `find` either returns the index where the substring first appears (6 in the example above) or returns -1 if the substring isn't found." 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": 14, 710 | "metadata": {}, 711 | "outputs": [ 712 | { 713 | "data": { 714 | "text/plain": [ 715 | "(736, 4)" 716 | ] 717 | }, 718 | "execution_count": 14, 719 | "metadata": {}, 720 | "output_type": "execute_result" 721 | } 722 | ], 723 | "source": [ 724 | "greeting_words = [\"hello\", \"hi\", \"how are you?\"]\n", 725 | "goodbye_words = [\"goodbye\", \"see you\", \"see you!\"]\n", 726 | "\n", 727 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 728 | " text = f.read()\n", 729 | "lines = text.split(\"\\n\")\n", 730 | "greetings = 0\n", 731 | "goodbyes = 0\n", 732 | "for line in lines:\n", 733 | " line = line.lower()\n", 734 | " greetings_found = [elem for elem in greeting_words if line.find(elem) != -1]\n", 735 | " goodbyes_found = [elem for elem in goodbye_words if line.find(elem) != -1]\n", 736 | " greetings += len(greetings_found)\n", 737 | " goodbyes += len(goodbyes_found)\n", 738 | "greetings, goodbyes" 739 | ] 740 | }, 741 | { 742 | "cell_type": "code", 743 | "execution_count": 34, 744 | "metadata": {}, 745 | "outputs": [ 746 | { 747 | "name": "stdout", 748 | "output_type": "stream", 749 | "text": [ 750 | "Rabbit\n" 751 | ] 752 | }, 753 | { 754 | "data": { 755 | "text/plain": [ 756 | "51" 757 | ] 758 | }, 759 | "execution_count": 34, 760 | "metadata": {}, 761 | "output_type": "execute_result"
762 | } 763 | ], 764 | "source": [ 765 | "search_terms = str(input())\n", 766 | "search_terms = search_terms.lower()\n", 767 | "search_terms = search_terms.split()\n", 768 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 769 | " text = f.read()\n", 770 | "lines = text.split(\"\\n\")\n", 771 | "search_term_frequency = 0\n", 772 | "for line in lines:\n", 773 | " line = line.lower()\n", 774 | " terms_found = [elem for elem in search_terms if line.find(elem) != -1]\n", 775 | " search_term_frequency += len(terms_found)\n", 776 | "search_term_frequency" 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": 36, 782 | "metadata": {}, 783 | "outputs": [ 784 | { 785 | "name": "stdout", 786 | "output_type": "stream", 787 | "text": [ 788 | "6\n" 789 | ] 790 | } 791 | ], 792 | "source": [ 793 | "string = \"hello his stuff\"\n", 794 | "print(string.find(\"hi\"))" 795 | ] 796 | }, 797 | { 798 | "cell_type": "code", 799 | "execution_count": 2, 800 | "metadata": {}, 801 | "outputs": [ 802 | { 803 | "name": "stdout", 804 | "output_type": "stream", 805 | "text": [ 806 | "Alice\n" 807 | ] 808 | }, 809 | { 810 | "data": { 811 | "text/plain": [ 812 | "2745" 813 | ] 814 | }, 815 | "execution_count": 2, 816 | "metadata": {}, 817 | "output_type": "execute_result" 818 | } 819 | ], 820 | "source": [ 821 | "search_terms = str(input())\n", 822 | "search_terms = search_terms.lower()\n", 823 | "search_terms = search_terms.split()\n", 824 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 825 | " text = f.read()\n", 826 | "lines = text.split(\"\\n\")\n", 827 | "search_term_frequency = 0\n", 828 | "for line in lines:\n", 829 | " line = line.lower()\n", 830 | " tokens = line.split()\n", 831 | " # count how many tokens on this line exactly match each search term\n", 832 | " for term in search_terms:\n", 833 | " search_term_frequency += tokens.count(term)\n", 834 | "search_term_frequency" 835 | ] 836 | }, 837 | { 838 | "cell_type": "code", 839 | "execution_count": 41, 840 | "metadata": {}, 841 | "outputs": [ 842 |
{ 843 | "data": { 844 | "text/plain": [ 845 | "['__add__',\n", 846 | " '__class__',\n", 847 | " '__contains__',\n", 848 | " '__delattr__',\n", 849 | " '__delitem__',\n", 850 | " '__dir__',\n", 851 | " '__doc__',\n", 852 | " '__eq__',\n", 853 | " '__format__',\n", 854 | " '__ge__',\n", 855 | " '__getattribute__',\n", 856 | " '__getitem__',\n", 857 | " '__gt__',\n", 858 | " '__hash__',\n", 859 | " '__iadd__',\n", 860 | " '__imul__',\n", 861 | " '__init__',\n", 862 | " '__init_subclass__',\n", 863 | " '__iter__',\n", 864 | " '__le__',\n", 865 | " '__len__',\n", 866 | " '__lt__',\n", 867 | " '__mul__',\n", 868 | " '__ne__',\n", 869 | " '__new__',\n", 870 | " '__reduce__',\n", 871 | " '__reduce_ex__',\n", 872 | " '__repr__',\n", 873 | " '__reversed__',\n", 874 | " '__rmul__',\n", 875 | " '__setattr__',\n", 876 | " '__setitem__',\n", 877 | " '__sizeof__',\n", 878 | " '__str__',\n", 879 | " '__subclasshook__',\n", 880 | " 'append',\n", 881 | " 'clear',\n", 882 | " 'copy',\n", 883 | " 'count',\n", 884 | " 'extend',\n", 885 | " 'index',\n", 886 | " 'insert',\n", 887 | " 'pop',\n", 888 | " 'remove',\n", 889 | " 'reverse',\n", 890 | " 'sort']" 891 | ] 892 | }, 893 | "execution_count": 41, 894 | "metadata": {}, 895 | "output_type": "execute_result" 896 | } 897 | ], 898 | "source": [] 899 | }, 900 | { 901 | "cell_type": "markdown", 902 | "metadata": {}, 903 | "source": [ 904 | "Using the above method we are able to search the text, in its entirety, for occurrences of the six phrases of interest! At this point, we've likely found all the instances of those words, although `find` also matches inside longer words (`hi` is found inside `his`), so some of those hits aren't really greetings. But this leads us to a general point. We can search text for words or phrases!!!\n", 905 | "\n", 906 | "Now let's use our newfound powers to do something really cool: let's spell-correct a bunch of text."
907 | ] 908 | }, 909 | { 910 | "cell_type": "code", 911 | "execution_count": 21, 912 | "metadata": { 913 | "collapsed": true 914 | }, 915 | "outputs": [], 916 | "source": [ 917 | "from autocorrect import spell\n", 918 | "import time\n", 919 | "with open(\"alice_in_wonderland.txt\",\"r\") as f:\n", 920 | " text = f.read()\n", 921 | "lines = text.split(\"\\n\")\n", 922 | "new_lines = []\n", 923 | "total_misspellings = 0\n", 924 | "misspellings_per_line = []\n", 925 | "start = time.time()\n", 926 | "for index,line in enumerate(lines):\n", 927 | " tokens = line.split()\n", 928 | " new_tokens = []\n", 929 | " misspellings = 0\n", 930 | " for token in tokens:\n", 931 | " correct_spelling = spell(token)\n", 932 | " if correct_spelling != token:\n", 933 | " total_misspellings += 1\n", 934 | " misspellings += 1\n", 935 | " new_tokens.append(correct_spelling)\n", 936 | " misspellings_per_line.append(misspellings)\n", 937 | " new_string = \" \".join(new_tokens)\n", 938 | " new_lines.append(new_string)\n", 939 | "new_text = \"\\n\".join(new_lines)\n", 940 | "with open(\"correctly_spelled_alice_in_wonderland.txt\", \"w\") as f:\n", 941 | " f.write(new_text)" 942 | ] 943 | }, 944 | { 945 | "cell_type": "markdown", 946 | "metadata": {}, 947 | "source": [ 948 | "Just for fun, let's do a little bit of analysis on the number of misspellings in our text:" 949 | ] 950 | }, 951 | { 952 | "cell_type": "code", 953 | "execution_count": 23, 954 | "metadata": {}, 955 | "outputs": [ 956 | { 957 | "name": "stdout", 958 | "output_type": "stream", 959 | "text": [ 960 | "1.6383832976445396\n", 961 | "6121\n" 962 | ] 963 | } 964 | ], 965 | "source": [ 966 | "import statistics\n", 967 | "print(statistics.mean(misspellings_per_line))\n", 968 | "print(total_misspellings)" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "So we made a ton of corrections to this document! Pretty good. Okay, okay. 
So if we turn this into code, we'll still need to point it at each of a bunch of files, one at a time. Can we do better?\n", 976 | "\n", 977 | "Turns out we can! Remember last week, when we learned how to move between directories? Let's apply some of that knowledge now!" 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": 1, 983 | "metadata": {}, 984 | "outputs": [ 985 | { 986 | "name": "stdout", 987 | "output_type": "stream", 988 | "text": [ 989 | "currently process /Users/ericschles/Documents/projects/python_courses/an_introduction_to_python\n", 990 | "beginning file processing\n" 991 | ] 992 | }, 993 | { 994 | "ename": "KeyboardInterrupt", 995 | "evalue": "", 996 | "output_type": "error", 997 | "traceback": [ 998 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 999 | "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", 1000 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"beginning file processing\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mfile\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mfiles\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 30\u001b[0;31m \u001b[0mcorrectly_spelled_file\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mspell_correct\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 31\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfile\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\"w\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 32\u001b[0m 
\u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwrite\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcorrectly_spelled_file\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1001 | "\u001b[0;32m\u001b[0m in \u001b[0;36mspell_correct\u001b[0;34m(file_path)\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0mnew_tokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtoken\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtokens\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 14\u001b[0;31m \u001b[0mnew_tokens\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mspell\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 15\u001b[0m \u001b[0mnew_line\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\" \"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnew_tokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0mnew_lines\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnew_line\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1002 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/autocorrect/__init__.py\u001b[0m in \u001b[0;36mspell\u001b[0;34m(word)\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0mw\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mWord\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mword\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 22\u001b[0m candidates = (common([word]) or exact([word]) or known([word]) or\n\u001b[0;32m---> 23\u001b[0;31m 
\u001b[0mknown\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mw\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtypos\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mcommon\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mw\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdouble_typos\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mor\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 24\u001b[0m [word])\n\u001b[1;32m 25\u001b[0m \u001b[0mcorrection\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcandidates\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mNLP_COUNTS\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1003 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/autocorrect/word.py\u001b[0m in \u001b[0;36mdouble_typos\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 69\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdouble_typos\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 70\u001b[0m \u001b[0;34m\"\"\"letter combinations two typos away from word\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 71\u001b[0;31m return {e2 for e1 in self.typos()\n\u001b[0m\u001b[1;32m 72\u001b[0m for e2 in Word(e1).typos()}\n\u001b[1;32m 73\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1004 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/autocorrect/word.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 70\u001b[0m \u001b[0;34m\"\"\"letter combinations two typos away from word\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 71\u001b[0m return {e2 for e1 in self.typos()\n\u001b[0;32m---> 72\u001b[0;31m for e2 in Word(e1).typos()}\n\u001b[0m\u001b[1;32m 73\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 74\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1005 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/autocorrect/word.py\u001b[0m in \u001b[0;36mtypos\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0;34m\"\"\"letter combinations one typo away from word\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 66\u001b[0m return (self._deletes() | self._transposes() |\n\u001b[0;32m---> 67\u001b[0;31m self._replaces() | self._inserts())\n\u001b[0m\u001b[1;32m 68\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 69\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdouble_typos\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1006 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/autocorrect/word.py\u001b[0m in \u001b[0;36m_replaces\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 53\u001b[0m \u001b[0;34m\"\"\"tge\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 54\u001b[0m return {concat(a, c, b[1:])\n\u001b[0;32m---> 55\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mslices\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 56\u001b[0m for c in ALPHABET}\n\u001b[1;32m 57\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1007 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/autocorrect/word.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 52\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_replaces\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 53\u001b[0m 
\u001b[0;34m\"\"\"tge\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 54\u001b[0;31m return {concat(a, c, b[1:])\n\u001b[0m\u001b[1;32m 55\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mslices\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 56\u001b[0m for c in ALPHABET}\n", 1008 | "\u001b[0;31mKeyboardInterrupt\u001b[0m: " 1009 | ] 1010 | } 1011 | ], 1012 | "source": [ 1013 | "import os\n", 1014 | "from glob import glob\n", 1015 | "from autocorrect import spell\n", 1016 | "\n", 1017 | "def spell_correct(file_path):\n", 1018 | " with open(file_path, \"r\") as f:\n", 1019 | " text = f.read()\n", 1020 | " lines = text.split(\"\\n\")\n", 1021 | " new_lines = []\n", 1022 | " for line in lines:\n", 1023 | " tokens = line.split()\n", 1024 | " new_tokens = []\n", 1025 | " for token in tokens:\n", 1026 | " new_tokens.append(spell(token))\n", 1027 | " new_line = \" \".join(new_tokens)\n", 1028 | " new_lines.append(new_line)\n", 1029 | " return \"\\n\".join(new_lines)\n", 1030 | "\n", 1031 | "# directories still waiting to be processed, starting with the one we're in\n", 1032 | "dirs = [os.getcwd()]\n", 1033 | "traversed_dirs = []\n", 1034 | "while dirs: # stop once there are no directories left to visit\n", 1035 | " current_dir = dirs.pop()\n", 1036 | " traversed_dirs.append(current_dir)\n", 1037 | " os.chdir(current_dir)\n", 1038 | " print(\"currently process\", current_dir)\n", 1039 | " # only spell correct .txt files, so notebooks and code aren't overwritten\n", 1040 | " files = [os.path.abspath(file) for file in glob(\"*.txt\") if os.path.isfile(file)]\n", 1041 | " dirs += [os.path.abspath(directory) for directory in glob(\"*\") if os.path.isdir(directory)]\n", 1042 | " dirs = [directory for directory in dirs if directory not in traversed_dirs]\n", 1043 | " print(\"beginning file processing\")\n", 1044 | " for file in files:\n", 
1045 | " correctly_spelled_file = spell_correct(file)\n", 1046 | " with open(file, \"w\") as f:\n", 1047 | " f.write(correctly_spelled_file)\n", 1048 | " print(\"wrote out all files with spell correction\")\n", 1049 | "os.chdir(traversed_dirs[0]) # go back to the directory we started in" 1050 | ] 1051 | }, 1052 | { 1053 | "cell_type": "code", 1054 | "execution_count": null, 1055 | "metadata": { 1056 | "collapsed": true 1057 | }, 1058 | "outputs": [], 1059 | "source": [] 1060 | } 1061 | ], 1062 | "metadata": { 1063 | "kernelspec": { 1064 | "display_name": "Python 3", 1065 | "language": "python", 1066 | "name": "python3" 1067 | }, 1068 | "language_info": { 1069 | "codemirror_mode": { 1070 | "name": "ipython", 1071 | "version": 3 1072 | }, 1073 | "file_extension": ".py", 1074 | "mimetype": "text/x-python", 1075 | "name": "python", 1076 | "nbconvert_exporter": "python", 1077 | "pygments_lexer": "ipython3", 1078 | "version": "3.6.2" 1079 | } 1080 | }, 1081 | "nbformat": 4, 1082 | "nbformat_minor": 2 1083 | } 1084 | -------------------------------------------------------------------------------- /class_one.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Class One - A Gentle Introduction\n", 8 | "\n", 9 | "Python is a high-level, object-oriented programming language with influences from many other programming languages. It is just one of the many possible computer languages.\n", 10 | "\n", 11 | "## First Point\n", 12 | "\n", 13 | "Python, like all computer languages, is a written language. We can think of it, like other languages, as having nouns and verbs.\n", 14 | "\n", 15 | "Today, we will learn a few nouns and verbs for this language."
16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "# First Nouns" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "metadata": { 29 | "collapsed": true 30 | }, 31 | "outputs": [], 32 | "source": [ 33 | "string_variable = \"\"\n", 34 | "integer_variable = 0\n", 35 | "floating_point_variable = 0.0\n", 36 | "boolean_variable = True" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "The four examples above show the four most common types of variables: strings, integers, floats, and booleans.\n", 44 | "\n", 45 | "**Definition** _Integer_ - An integer is a whole number with no fractional part, such as `-3`, `0`, or `42`, that can be treated as a mathematical object. It can be added to, subtracted from, multiplied by, or divided by other numbers to get new numbers.\n", 46 | "\n", 47 | "**Definition** _Floating Point Number_ - A floating point number is a number with digits after the decimal point, such as `6.3`. Floating point numbers can be thought of as (approximations of) the real numbers from mathematics. Like integers, they can be added, subtracted, multiplied and divided.\n", 48 | "\n", 49 | "**Definition** _String_ - A string is literal text: whatever exists between the open quote (`\"`) and the close quote (`\"`) is treated as is. Strings are fundamentally different from numbers. They can be manipulated in most high level computer languages, but they don't support arithmetic the way numbers do: `\"5\" + \"7\"` joins the two strings into `\"57\"` rather than adding 5 and 7, and subtracting or dividing one string by another is an error.\n", 50 | "\n", 51 | "**Definition** _Boolean_ - A boolean is a binary variable. It can only take on one of two values, `True` and `False`. Being able to state that something is `True` or `False`, with a clear meaning, is a powerful construct.\n", 52 | "\n", 53 | "_Aside_: Being able to speak in terms of absolute truth will lend us the ability to do some extraordinary things."
54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "# First Verbs" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 5, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "name": "stdout", 70 | "output_type": "stream", 71 | "text": [ 72 | "Adding 5 + 7 = 12\n", 73 | "Dividing 6.3 by 17 = 0.37058823529411766\n", 74 | "Concatenating the literal 5 and the literal 7 yields 57\n", 75 | "The truth value of True AND False is False\n" 76 | ] 77 | } 78 | ], 79 | "source": [ 80 | "integer_result = 5 + 7\n", 81 | "print(\"Adding 5 + 7 = {}\".format(integer_result))\n", 82 | "floating_point_result = 6.3 / 17\n", 83 | "print(\"Dividing 6.3 by 17 = {}\".format(floating_point_result))\n", 84 | "string_result = \"5\" + \"7\"\n", 85 | "print(\"Concatenating the literal 5 and the literal 7 yields {}\".format(string_result))\n", 86 | "boolean_result = True and False\n", 87 | "print(\"The truth value of True AND False is {}\".format(boolean_result))\n" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "# Understanding how to use verbs and nouns together\n", 95 | "\n", 96 | "In this last example we used each of the nouns we defined, which are often called data types, or types for short. Notice how the types interacted with verbs:\n", 97 | "\n", 98 | "* \"+\" - addition for integers and floating point numbers, concatenation for strings.\n", 99 | "* \"/\" - division for integers and floating point numbers.\n", 100 | "* \"and\" - logical AND for booleans.\n", 101 | "* `print()` - prints strings to the screen.\n", 102 | "\n", 103 | "Notice that we are seeing two kinds of verbs, or, as computer programmers often call them, functions. The first kind (usually called operators) works by placing its inputs on either side of the function. 
For instance:\n", 104 | "\n", 105 | "`True and False`\n", 106 | "\n", 107 | "Or\n", 108 | "\n", 109 | "`5 + 7`\n", 110 | "\n", 111 | "However, the `print()` function doesn't work like that: it takes its inputs in order, as a list of inputs inside its parentheses. Most functions in Python work this way. There are a lot of good reasons for this, but the simplest one is the following:\n", 112 | "\n", 113 | "Try adding 12 integers using \"+\": `1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12`. Let's see the result, just for fun:\n" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 6, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "78" 125 | ] 126 | }, 127 | "execution_count": 6, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "That was a lot of typing! We had to write the plus operator 11 times! Let's look at the same example written the other way, like the `print` function:\n", 141 | "\n", 142 | "`add(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)`\n", 143 | "\n", 144 | "Way less typing!\n", 145 | "\n", 146 | "But wait, we don't have an add function already defined for us in Python. Is there a way we could make our own? " 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "# Defining Our Own Functions\n", 154 | "\n", 155 | "It turns out that defining our own functions in Python is very, very easy, unlike in some other languages.\n", 156 | "\n", 157 | "Here's the general syntax for doing so:\n", 158 | "\n", 159 | "```\n", 160 | "def function_name(first_input, second_input, third_input):\n", 161 | " # ... 
some code goes here\n", 162 | " return result_of_code\n", 163 | "```\n", 164 | "\n", 165 | "So let's see what it's like to define our own function:" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 8, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": [ 177 | "5 + 7 is 12\n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "def add(first_input, second_input):\n", 183 | " return first_input + second_input\n", 184 | "\n", 185 | "print(\"5 + 7 is {}\".format(add(5,7)))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "One thing that happened above is that we defined a function, which has its own scope, separate from the rest of our program. What does scope mean? Let's investigate by example." 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 2, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "name": "stdout", 202 | "output_type": "stream", 203 | "text": [ 204 | "The result of something 16\n" 205 | ] 206 | }, 207 | { 208 | "ename": "NameError", 209 | "evalue": "name 'y' is not defined", 210 | "output_type": "error", 211 | "traceback": [ 212 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 213 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 214 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m12\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"The result of something {}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msomething\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m 
\u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"The value of y {}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 215 | "\u001b[0;31mNameError\u001b[0m: name 'y' is not defined" 216 | ] 217 | } 218 | ], 219 | "source": [ 220 | "def something(x):\n", 221 | " y = 7\n", 222 | " return x + z\n", 223 | "\n", 224 | "z = 4 # something can see z\n", 225 | "x = 12\n", 226 | "print(\"The result of something {}\".format(something(x)))\n", 227 | "print(\"The value of y {}\".format(y))" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Extracting some general rules:\n", 235 | "\n", 236 | "* functions can see any variables defined in the main scope\n", 237 | "* the main scope cannot see any variables defined inside a function scope" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "Later on in the course, we'll learn how to write Python functions that take an arbitrary number of inputs. For now, just take my word that it's possible to do this." 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "# The Power Of Booleans\n", 252 | "\n", 253 | "Earlier in the lesson I said that boolean values were AWESOME. Now I'm going to show you how awesome they are. 
With booleans we can do a lot of things, like:\n", 254 | "\n", 255 | "* Write a program that never ends\n", 256 | "* Write expressions that evaluate to True or False\n", 257 | "* Write functions that evaluate to True or False" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": { 264 | "collapsed": true 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "# A program that never ends\n", 269 | "\n", 270 | "while True:\n", 271 | " print(\"This is the song that never ends, it goes on and on my friend\")\n", 272 | " print(\"some people started singing it, not knowing what it was.\")\n", 273 | " print(\"And now they keep on singing it and that is just because,\")\n", 274 | "# Don't run this!!!!" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "We'll get into the `while` keyword, and all this other crazy stuff, in a later lecture. For now, just think of the statement as doing the following:\n", 282 | "\n", 283 | "```\n", 284 | "while [some statement remains True]:\n", 285 | " do everything inside of the while statement\n", 286 | "```\n", 287 | "\n", 288 | "This means things will _keep_ happening _forever_ if the statement never becomes false. Well, `True` is always `True`, and we can check that:" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 9, 294 | "metadata": {}, 295 | "outputs": [ 296 | { 297 | "data": { 298 | "text/plain": [ 299 | "True" 300 | ] 301 | }, 302 | "execution_count": 9, 303 | "metadata": {}, 304 | "output_type": "execute_result" 305 | } 306 | ], 307 | "source": [ 308 | "True == True" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "Yup, checks out! \n", 316 | "\n", 317 | "So as long as `True` is `True`, we are guaranteed that the body of the loop keeps being executed. 
That's going to be super useful for us later on, because we may want to write programs that never end." 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "Let's look at another case now: writing expressions that evaluate to `True` or `False`. \n", 325 | "\n", 326 | "It turns out that if you make use of a boolean operator, your expression will evaluate to one of these. The builtin boolean operators include:\n", 327 | "\n", 328 | "* `<` - less than\n", 329 | "* `<=` - less than or equal to\n", 330 | "* `>` - greater than\n", 331 | "* `>=` - greater than or equal to\n", 332 | "* `==` - equal to (and `!=` - not equal to)\n", 333 | "* `and` - Logical AND\n", 334 | "* `or` - Logical OR\n", 335 | "* `not` - reverses the truth value of the statement\n", 336 | "* `in` - checks if an element is in a collection (we'll get to this)\n", 337 | "* `is` - checks whether two things are the very same object; it's similar to `==`, but not the same (don't worry too much about this for now).\n", 338 | "\n", 339 | "Today we'll only concern ourselves with everything up to `not`; the other operators will be discussed later on.\n", 340 | "\n", 341 | "So let's see our first example: \n", 342 | "\n", 343 | "Let's check if 5 is less than 7" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 11, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "name": "stdout", 353 | "output_type": "stream", 354 | "text": [ 355 | "True\n" 356 | ] 357 | } 358 | ], 359 | "source": [ 360 | "print(5 < 7)" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "Great! Now what can we do with that? It turns out Python has a builtin statement, `if`, that lets you check whether an expression evaluates to `True`."
368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 12, 373 | "metadata": {}, 374 | "outputs": [ 375 | { 376 | "name": "stdout", 377 | "output_type": "stream", 378 | "text": [ 379 | "turns out, 5 is less than 7\n" 380 | ] 381 | } 382 | ], 383 | "source": [ 384 | "if 5 < 7:\n", 385 | " print(\"turns out, 5 is less than 7\")" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": {}, 391 | "source": [ 392 | "We can also do something else when an expression isn't `True`, via an `else` statement." 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 13, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "name": "stdout", 402 | "output_type": "stream", 403 | "text": [ 404 | "turns out, 5 is less than 7\n" 405 | ] 406 | } 407 | ], 408 | "source": [ 409 | "if 5 < 7:\n", 410 | " print(\"turns out, 5 is less than 7\")\n", 411 | "else:\n", 412 | " print(\"uh oh, looks like we implemented Math wrong!\")" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "There is a whole list of checks we can do to express many powerful ideas, just by making use of this high level syntax. Let's look at a few more examples now."
420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 14, 425 | "metadata": {}, 426 | "outputs": [ 427 | { 428 | "name": "stdout", 429 | "output_type": "stream", 430 | "text": [ 431 | "looks like lots of the math we know works in programs\n" 432 | ] 433 | } 434 | ], 435 | "source": [ 436 | "if 5 < 7 and 14 > 12:\n", 437 | " print(\"looks like lots of the math we know works in programs\")\n", 438 | "else:\n", 439 | " print(\"hmm maybe I don't remember how to check for less than after all\")" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": 15, 445 | "metadata": {}, 446 | "outputs": [ 447 | { 448 | "name": "stdout", 449 | "output_type": "stream", 450 | "text": [ 451 | "I guess it makes sense that this is true\n" 452 | ] 453 | } 454 | ], 455 | "source": [ 456 | "if 5 < 7 or 12 > 14:\n", 457 | " print(\"I guess it makes sense that this is true\")\n", 458 | "else:\n", 459 | " print(\"woah, learned something new!\")" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": 16, 465 | "metadata": {}, 466 | "outputs": [ 467 | { 468 | "name": "stdout", 469 | "output_type": "stream", 470 | "text": [ 471 | "phew, not going crazy, yet.\n" 472 | ] 473 | } 474 | ], 475 | "source": [ 476 | "if 5 == 7:\n", 477 | " print(\"what? When did that happen\")\n", 478 | "else:\n", 479 | " print(\"phew, not going crazy, yet.\")" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": {}, 485 | "source": [ 486 | "# Assignment\n", 487 | "\n", 488 | "https://github.com/18F/an_introduction_to_python#assignment-for-class-1" 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": { 494 | "collapsed": true 495 | }, 496 | "source": [ 497 | "For the assignment you'll need to know how to read input from a user. Here we'll see an example of this." 
498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 4, 503 | "metadata": {}, 504 | "outputs": [ 505 | { 506 | "name": "stdout", 507 | "output_type": "stream", 508 | "text": [ 509 | "What is your name?Eric\n", 510 | "Hello Eric\n" 511 | ] 512 | } 513 | ], 514 | "source": [ 515 | "name = str(input(\"What is your name?\"))\n", 516 | "print(\"Hello {}\".format(name))" 517 | ] 518 | } 519 | ], 520 | "metadata": { 521 | "kernelspec": { 522 | "display_name": "Python 3", 523 | "language": "python", 524 | "name": "python3" 525 | }, 526 | "language_info": { 527 | "codemirror_mode": { 528 | "name": "ipython", 529 | "version": 3 530 | }, 531 | "file_extension": ".py", 532 | "mimetype": "text/x-python", 533 | "name": "python", 534 | "nbconvert_exporter": "python", 535 | "pygments_lexer": "ipython3", 536 | "version": "3.6.1" 537 | } 538 | }, 539 | "nbformat": 4, 540 | "nbformat_minor": 2 541 | } 542 | -------------------------------------------------------------------------------- /class_three.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Class three - Dealing with iteration\n", 8 | "\n", 9 | "So far we've covered a lot of ground! Talking about types and math and imports, My Word! But the greatest strength of computers has not yet been discussed. This foundational powerhouse is central to why computers are as transformative as they are. And all of it comes down to one simple idea: iteration.\n", 10 | "\n", 11 | "To get a true sense of the real power of iteration, let's look at a historical example:\n", 12 | "\n", 13 | "This comes to us from World War II and the work of Alan Turing, one of the first computer scientists.\n", 14 | "\n", 15 | "He used iteration to crack the Germans' Enigma machine. The specifics of how he did it are out of scope for the course. 
But I can say iteration is at the core of the solution. \n", 16 | "\n", 17 | "The problem being solved is well portrayed in the movie _The Imitation Game_. Here is an approximate description of that problem:\n", 18 | "\n", 19 | "The Germans sent messages encoded with Enigma over open radio airwaves. Anyone could intercept them, but the messages were encoded, and you needed an Enigma machine to decode them. Polish cryptographers had figured out how to decode Enigma messages if they knew the configuration of the machine. Unfortunately, the configuration space for Enigma was impossibly large:" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 4, 25 | "metadata": {}, 26 | "outputs": [ 27 | { 28 | "data": { 29 | "text/plain": [ 30 | "107458687327250619360000" 31 | ] 32 | }, 33 | "execution_count": 4, 34 | "metadata": {}, 35 | "output_type": "execute_result" 36 | } 37 | ], 38 | "source": [ 39 | "60 * 17576 * 676 * 150738274937250" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "possible configurations, to be exact (60 rotor orderings × 17,576 rotor starting positions × 676 ring settings × 150,738,274,937,250 plugboard wirings). You can't know in advance which configuration was used, so in the worst case you have to check all of them. Before Turing built his machine, this had to be done _by hand_.\n", 47 | "\n", 48 | "Here is some \"semi\" real code that sketches what Turing's machine did."
49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": { 55 | "collapsed": true 56 | }, 57 | "outputs": [], 58 | "source": [ 59 | "def combine(config, message):\n", 60 | "    # placeholder: the real Enigma decryption step is out of scope\n", 61 | "    return message\n", 62 | "\n", 63 | "def check_config(config, message):\n", 64 | "    result = combine(config, message)\n", 65 | "    # a word we expect in the plaintext (a \"crib\") means the config worked\n", 66 | "    return \"Hitler\" in result\n", 67 | "\n", 68 | "def find_config(configurations, message):\n", 69 | "    # try every possible configuration, one after another\n", 70 | "    for config in configurations:\n", 71 | "        if check_config(config, message):\n", 72 | "            return config\n", 73 | "    # no configuration produced a readable message\n", 74 | "    return \"config not found\"" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "Granted, the `combine` function, which I haven't really implemented, is complex, and enumerating all the possible configurations is also complex. But other than that, the algorithm is really pretty simple, _because of iteration_. That's what makes computers so powerful - they can iterate through an extremely large number of possibilities, very, very fast.\n", 82 | "\n", 83 | "This of course isn't the only example of iteration changing the course of history. The next example comes to us from NASA in the 1960s.\n", 84 | "\n", 85 | "The problem here was also portrayed in another major film, _Hidden Figures_:\n", 86 | "\n", 87 | "The rocket they were trying to send into space needed dynamic calculations of certain forces. If the calculations were wrong, the space capsule would not come down correctly on reentry, killing the pilot inside. 
So you needed a method to dynamically calculate the forces on the capsule, in order to readjust them during the flight.\n", 88 | "\n", 89 | "Enter Euler's method:" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": { 96 | "collapsed": true 97 | }, 98 | "outputs": [], 99 | "source": [ 100 | "from __future__ import division  # must be the first statement; a no-op in Python 3\n", 101 | "import numpy as np\n", 102 | "import matplotlib.pyplot as plt\n", 103 | "\n", 104 | "# Exact solution: concentration over time, N(t) = N0 * e^(-k t)\n", 105 | "N = lambda t: N0 * np.exp(-k * t)\n", 106 | "# dN/dt = -k * N\n", 107 | "def dx_dt(x):\n", 108 | "    return -k * x\n", 109 | "\n", 110 | "k = .5  # decay rate\n", 111 | "h = 0.001  # step size\n", 112 | "N0 = 100.  # starting value\n", 113 | "\n", 114 | "t = np.arange(0, 10, h)\n", 115 | "y = np.zeros(len(t))\n", 116 | "\n", 117 | "y[0] = N0\n", 118 | "for i in range(1, len(t)):\n", 119 | "    # Euler's method: step along the slope at the previous point\n", 120 | "    y[i] = y[i-1] + dx_dt(y[i-1]) * h\n", 121 | "\n", 122 | "max_error = abs(y-N(t)).max()\n", 123 | "print(\"Max difference between the exact solution and Euler's approximation with step size h=0.001:\")\n", 124 | "\n", 125 | "print('{0:.15}'.format(max_error))" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "There is a lot of heavy math in this cell, but don't focus on it. Just recognize that there is a loop here:\n", 133 | "\n", 134 | "```\n", 135 | "for i in range(1, len(t)):\n", 136 | "    # Euler's method\n", 137 | "    y[i] = y[i-1] + dx_dt(y[i-1]) * h\n", 138 | "```\n", 139 | "\n", 140 | "Each pass through the loop takes one small step of size `h`; here that is roughly 10,000 steps. Without iteration, we never could have made these calculations fast enough to get accurate force estimates." 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "Hopefully this motivates iteration and its power: helping defeat the Nazis in World War II and safely returning astronauts home from space. 
That's just some of the power of iteration.\n", 148 | "\n", 149 | "Enough talking about it! Here's how to do it.\n", 150 | "\n", 151 | "# The While Loop\n", 152 | "\n", 153 | "The while loop is the simplest form of iteration and is easiest to reason about, so we'll start with that. The general form of a while loop is:\n", 154 | "\n", 155 | "1. initialize a value\n", 156 | "2. write the while loop with a condition that says when to stop\n", 157 | "3. write the block of code to repeat while the condition is `True`" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 34, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "by hand\n", 170 | "0\n", 171 | "1\n", 172 | "2\n", 173 | "using iteration\n", 174 | "0\n", 175 | "1\n", 176 | "2\n", 177 | "3\n", 178 | "4\n", 179 | "5\n", 180 | "6\n", 181 | "7\n", 182 | "8\n", 183 | "9\n" 184 | ] 185 | } 186 | ], 187 | "source": [ 188 | "print(\"by hand\")\n", 189 | "x = 0\n", 190 | "print(x)\n", 191 | "x = x + 1\n", 192 | "print(x)\n", 193 | "x = x + 1\n", 194 | "print(x)\n", 195 | "print(\"using iteration\")\n", 196 | "x = 0\n", 197 | "while x < 10:\n", 198 | "    print(x)\n", 199 | "    x = x + 1" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "The above shows how simple the computation is. Something important to note: we have to update the value of `x` in our loop. If we didn't, then `x < 10` would never become `False` and our loop would continue forever. That's because a while loop is like an `if` statement that gets re-checked before every pass: the block keeps running until the condition evaluates to `False`.\n", 207 | "\n", 208 | "Let's rewrite our while loop with some shorthand from the Python language."
209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 6, 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "name": "stdout", 218 | "output_type": "stream", 219 | "text": [ 220 | "0\n", 221 | "1\n", 222 | "2\n", 223 | "3\n", 224 | "4\n", 225 | "5\n", 226 | "6\n", 227 | "7\n", 228 | "8\n", 229 | "9\n" 230 | ] 231 | } 232 | ], 233 | "source": [ 234 | "x = 0\n", 235 | "while x < 10:\n", 236 | "    print(x)\n", 237 | "    x += 1  # equivalent to x = x + 1" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "Notice this code is more or less the same; the big difference is the shift from `x = x + 1` to `x += 1`. Even `x = x + 1` is kind of a weird thing in general and you have to get used to it, I certainly did. What's actually happening is you are saying:\n", 245 | "\n", 246 | "compute `x + 1`, then store the result back in `x`, overwriting its old value.\n", 247 | "So you actually read the code `x = x + 1` from right to left, instead of left to right. Kinda weird, right?\n", 248 | "\n", 249 | "The shorthand and the original code, `x += 1` and `x = x + 1`, mean the same thing for numbers. There is no deep logical stuff going on here; this is just a bit of syntax for brevity and readability." 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "## Introduction to Lists\n", 257 | "\n", 258 | "Now that we've talked about iteration, the next thing to cover is lists, so we have something to iterate over! A list is just a collection of objects stored together. We can think of this kind of like a list of nouns.\n", 259 | "\n", 260 | "Let's see an example of a human list:\n", 261 | "\n", 262 | "* dough\n", 263 | "* bread\n", 264 | "* cereal\n", 265 | "* rice\n", 266 | "* pasta\n", 267 | "\n", 268 | "This is a list of grain-based foods, something many of us should already be familiar with. 
Here's another example:\n", 269 | "\n", 270 | "* Eric\n", 271 | "* Japan\n", 272 | "* telephone\n", 273 | "* 5\n", 274 | "* False\n", 275 | "* running\n", 276 | "\n", 277 | "This is a list of words with no shared meaning. So we can say, in general, a list is just a collection of things. Whether they are related or not doesn't matter; in English we can group anything together. Now let's look at a Python list." 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 36, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "[1, 2, 3, 4, 5, 6]\n", 290 | "[]\n" 291 | ] 292 | } 293 | ], 294 | "source": [ 295 | "listing = [1, 2, 3, 4, 5, 6]\n", 296 | "print(listing)\n", 297 | "listing_two = list()\n", 298 | "print(listing_two)" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "Here we see that we can actually group Python nouns, like integers, together! The syntax for creating a list is as follows:\n", 306 | "\n", 307 | "`variable_name = []`\n", 308 | "\n", 309 | "OR\n", 310 | "\n", 311 | "`variable_name = list()`\n", 312 | "\n", 313 | "Both of the above create empty lists. We can also start with lists that have elements in them! \n", 314 | "\n", 315 | "Like the above example, or this one:" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 3, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "data": { 325 | "text/plain": [ 326 | "['hello', 'there', 'friends']" 327 | ] 328 | }, 329 | "execution_count": 3, 330 | "metadata": {}, 331 | "output_type": "execute_result" 332 | } 333 | ], 334 | "source": [ 335 | "second_list = [\"hello\", \"there\", \"friends\"]\n", 336 | "second_list" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "A list can store collections of any Python basic type we might like! 
Even ones we dream up, but more on that later. \n", 344 | "\n", 345 | "Let's see another example!" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 5, 351 | "metadata": {}, 352 | "outputs": [ 353 | { 354 | "data": { 355 | "text/plain": [ 356 | "['hello', 5, 7.2, True]" 357 | ] 358 | }, 359 | "execution_count": 5, 360 | "metadata": {}, 361 | "output_type": "execute_result" 362 | } 363 | ], 364 | "source": [ 365 | "third_list = [\"hello\", 5, 7.2, True]\n", 366 | "third_list" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "We can even store multiple types in one list! That's _crazy_!!!\n", 374 | "\n", 375 | "Now let's look at adding an element to a list and taking one away." 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 7, 381 | "metadata": {}, 382 | "outputs": [ 383 | { 384 | "name": "stdout", 385 | "output_type": "stream", 386 | "text": [ 387 | "After calling the append method the list looks like this [1, 2, 3, 4, 5, 6]\n", 388 | "After calling the remove method the list looks like this [1, 3, 4, 5, 6]\n" 389 | ] 390 | } 391 | ], 392 | "source": [ 393 | "listing = [1, 2, 3, 4, 5]\n", 394 | "listing.append(6)\n", 395 | "print(\"After calling the append method the list looks like this {}\".format(listing))\n", 396 | "listing.remove(2)\n", 397 | "print(\"After calling the remove method the list looks like this {}\".format(listing))" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "In the above example we added an element to the list, by appending it to the end of the list and we removed an element from the list, by calling the remove method. 
 Notice that `append` adds to the end of the list, while `remove` deletes the first occurrence of the given value, wherever it is in the list.\n", 405 | "\n", 406 | "Let's look at the full list of methods available on lists:" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": 10, 412 | "metadata": {}, 413 | "outputs": [ 414 | { 415 | "data": { 416 | "text/plain": [ 417 | "['append',\n", 418 | " 'clear',\n", 419 | " 'copy',\n", 420 | " 'count',\n", 421 | " 'extend',\n", 422 | " 'index',\n", 423 | " 'insert',\n", 424 | " 'pop',\n", 425 | " 'remove',\n", 426 | " 'reverse',\n", 427 | " 'sort']" 428 | ] 429 | }, 430 | "execution_count": 10, 431 | "metadata": {}, 432 | "output_type": "execute_result" 433 | } 434 | ], 435 | "source": [ 436 | "[elem for elem in dir(list()) if \"__\" not in elem]" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 48, 442 | "metadata": {}, 443 | "outputs": [ 444 | { 445 | "name": "stdout", 446 | "output_type": "stream", 447 | "text": [ 448 | "<class 'int'>\n", 449 | "<class 'str'>\n" 450 | ] 451 | }, 452 | { 453 | "data": { 454 | "text/plain": [ 455 | "[5, 'five']" 456 | ] 457 | }, 458 | "execution_count": 48, 459 | "metadata": {}, 460 | "output_type": "execute_result" 461 | } 462 | ], 463 | "source": [ 464 | "5 < 7  # ints are comparable with each other!\n", 465 | "# [] < {}  # but a list and a dict are not: this would raise a TypeError\n", 466 | "listing = [5, \"five\"]\n", 467 | "print(type(listing[0]))\n", 468 | "print(type(listing[1]))\n", 469 | "listing.insert(0, \"happy\")  # put \"happy\" at index 0\n", 470 | "del listing[0]  # delete the element at index 0\n", 471 | "listing" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "One of the most powerful things about lists is that, if the elements are all comparable to one another (for example, all numbers or all strings), you can call `list.sort()`, which sorts the elements of your list in place, from lowest to highest. 
How sorting works is out of scope for this course, but I'll say that the sorting algorithm Python uses, Timsort, is very fast.\n", 479 | "\n", 480 | "Some other important methods are `count`, which counts the number of occurrences of an element in the list:" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 40, 486 | "metadata": {}, 487 | "outputs": [ 488 | { 489 | "data": { 490 | "text/plain": [ 491 | "2" 492 | ] 493 | }, 494 | "execution_count": 40, 495 | "metadata": {}, 496 | "output_type": "execute_result" 497 | } 498 | ], 499 | "source": [ 500 | "listing = [1, 2, 1, 2, 3, 3, 3, 3]\n", 501 | "listing.count(1)" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "There is also the `index` method, which tells you the index of the first occurrence of an element:" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": 42, 514 | "metadata": {}, 515 | "outputs": [ 516 | { 517 | "data": { 518 | "text/plain": [ 519 | "0" 520 | ] 521 | }, 522 | "execution_count": 42, 523 | "metadata": {}, 524 | "output_type": "execute_result" 525 | } 526 | ], 527 | "source": [ 528 | "listing = [1, 2, 3, 4, 5, 1]\n", 529 | "listing.index(1)" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "So at this point you may be asking yourself, what's an index? An index is a number that lets you access a specific element by its position in the list. Often programmers describe this as \"indexing into the list at a specific element\". 
This way of describing it makes sense when you see that, syntactically, you reference an element by its index in the list:" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 15, 542 | "metadata": {}, 543 | "outputs": [ 544 | { 545 | "name": "stdout", 546 | "output_type": "stream", 547 | "text": [ 548 | "The element at index 0 of the list is 1\n", 549 | "The element at index 3 of the list is 4\n" 550 | ] 551 | } 552 | ], 553 | "source": [ 554 | "listing = [1, 2, 3, 4, 5, 6, 7, 8]\n", 555 | "print(\"The element at index 0 of the list is {}\".format(listing[0]))\n", 556 | "print(\"The element at index 3 of the list is {}\".format(listing[3]))" 557 | ] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": {}, 562 | "source": [ 563 | "## An introduction to for loops\n", 564 | "\n", 565 | "Notice that the first index, or first place in the list, is 0, not 1. This is a point of confusion for lots of folks. One way to think about it: an index is the offset from the start of the list, so the first element is zero steps from the start. \n", 566 | "\n", 567 | "The next thing we'll look at is making use of our indexing to iterate over the list. This will be just like what we did with while loops, except with a different syntax. First we'll see the example with a while loop and then we'll see it with a for loop."
568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 19, 573 | "metadata": {}, 574 | "outputs": [ 575 | { 576 | "name": "stdout", 577 | "output_type": "stream", 578 | "text": [ 579 | "Iterating with a while loop\n", 580 | "1\n", 581 | "2\n", 582 | "3\n", 583 | "4\n", 584 | "5\n", 585 | "6\n", 586 | "7\n", 587 | "8\n", 588 | "\n", 589 | "Iterating with a for loop\n", 590 | "1\n", 591 | "2\n", 592 | "3\n", 593 | "4\n", 594 | "5\n", 595 | "6\n", 596 | "7\n", 597 | "8\n" 598 | ] 599 | } 600 | ], 601 | "source": [ 602 | "listing = [1, 2, 3, 4, 5, 6, 7, 8]\n", 603 | "\n", 604 | "print(\"Iterating with a while loop\")\n", 605 | "iterator = 0\n", 606 | "while iterator < len(listing):\n", 607 | "    print(listing[iterator])\n", 608 | "    iterator += 1\n", 609 | "print()\n", 610 | "print(\"Iterating with a for loop\")\n", 611 | "for elem in listing:\n", 612 | "    print(elem)" 613 | ] 614 | }, 615 | { 616 | "cell_type": "markdown", 617 | "metadata": {}, 618 | "source": [ 619 | "As you can see, the two ways of looping produce the same result. However, the for loop is the preferred way of doing this. At first, it may seem less explicit, but you'll find that it leads to fewer errors in your programs. This is because you don't need to explicitly set the boolean condition for your loop to terminate. Infinite loops, loops that never end, are a big problem with while loops, and they can be very, very hard to debug, depending on how complex your terminating condition is.\n", 620 | "\n", 621 | "For loops solve this nicely: you don't tell the loop when to terminate; the for loop figures it out, because it just goes over every element in the list once.\n", 622 | "\n", 623 | "If you want to do something more sophisticated with your list, you can still explicitly set up a counter alongside the for loop, like we did before, so you don't lose any of the expressive power of the while loop while using the for loop."
624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 22, 629 | "metadata": {}, 630 | "outputs": [ 631 | { 632 | "name": "stdout", 633 | "output_type": "stream", 634 | "text": [ 635 | "The element of the list at index 0 is 1\n", 636 | "The element of the list at index 1 is 2\n", 637 | "The element of the list at index 2 is 3\n", 638 | "The element of the list at index 3 is 4\n", 639 | "The element of the list at index 4 is 5\n", 640 | "The element of the list at index 5 is 6\n", 641 | "The element of the list at index 6 is 7\n", 642 | "The element of the list at index 7 is 8\n" 643 | ] 644 | } 645 | ], 646 | "source": [ 647 | "iterator = 0\n", 648 | "listing = [1, 2, 3, 4, 5, 6, 7, 8]\n", 649 | "\n", 650 | "for elem in listing:\n", 651 | "    print(\"The element of the list at index {} is {}\".format(iterator, elem))\n", 652 | "    iterator += 1" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "This is a powerful and flexible syntax, but having to initialize and update the iterator yourself is also kind of annoying, so Python comes with a built-in function, `enumerate`, that does this for you automatically:" 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": 50, 665 | "metadata": {}, 666 | "outputs": [ 667 | { 668 | "name": "stdout", 669 | "output_type": "stream", 670 | "text": [ 671 | "The element of the list at index 0 is 1\n", 672 | "The element of the list at index 1 is 2\n", 673 | "The element of the list at index 2 is 3\n", 674 | "The element of the list at index 3 is 4\n", 675 | "The element of the list at index 4 is 5\n", 676 | "The element of the list at index 5 is 6\n", 677 | "The element of the list at index 6 is 7\n", 678 | "The element of the list at index 7 is 8\n" 679 | ] 680 | } 681 | ], 682 | "source": [ 683 | "listing = [1, 2, 3, 4, 5, 6, 7, 8]\n", 684 | "for iterator, elem in enumerate(listing):\n", 685 | "    print(\"The element of the list at index {} is {}\".format(iterator, elem))"
686 | ] 687 | }, 688 | { 689 | "cell_type": "markdown", 690 | "metadata": {}, 691 | "source": [ 692 | "No more pesky updating of iterators! Now everything is clear, high level and clean. \n", 693 | "\n", 694 | "So with all this talk of indexing, you may wonder: do you always have to index your collections by the counting numbers, also known as the natural numbers?\n", 695 | "\n", 696 | "## An introduction to dictionaries\n", 697 | "\n", 698 | "The answer is no! It's time to meet our next kind of collection - the dictionary!" 699 | ] 700 | }, 701 | { 702 | "cell_type": "code", 703 | "execution_count": 24, 704 | "metadata": {}, 705 | "outputs": [ 706 | { 707 | "data": { 708 | "text/plain": [ 709 | "{1: 2, 2: 3, 4: 7}" 710 | ] 711 | }, 712 | "execution_count": 24, 713 | "metadata": {}, 714 | "output_type": "execute_result" 715 | } 716 | ], 717 | "source": [ 718 | "dicter = {1:2, 2:3, 4:7}\n", 719 | "dicter" 720 | ] 721 | }, 722 | { 723 | "cell_type": "markdown", 724 | "metadata": {}, 725 | "source": [ 726 | "In this dictionary, the index of each pair, called its key, is the number before the colon, and the value is the number after it. In this collection we defined the keys to be 1, 2, and 4 and the stored values to be 2, 3, and 7. Certainly not indexed by the counting numbers, incremented one at a time!\n", 727 | "\n", 728 | "We can do a lot more than just have a weird indexing strategy with dictionaries. 
We can _even_ create mappings or lookups from words to other words.\n" 729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": 26, 734 | "metadata": {}, 735 | "outputs": [ 736 | { 737 | "data": { 738 | "text/plain": [ 739 | "\"Hi my name is Teacher I'm friends with Person and he has a Cat named Cat\"" 740 | ] 741 | }, 742 | "execution_count": 26, 743 | "metadata": {}, 744 | "output_type": "execute_result" 745 | } 746 | ], 747 | "source": [ 748 | "dicter_two = {\"Eric\": \"Teacher\", \"Bob\": \"Person\", \"Dog\":\"Cat\"}\n", 749 | "sentence = \"Hi my name is Eric I'm friends with Bob and he has a Dog named Dog\"\n", 750 | "new_sentence = []\n", 751 | "for word in sentence.split():\n", 752 | "    if word in dicter_two:\n", 753 | "        new_sentence.append(dicter_two[word])\n", 754 | "    else:\n", 755 | "        new_sentence.append(word)\n", 756 | "\" \".join(new_sentence)" 757 | ] 758 | }, 759 | { 760 | "cell_type": "markdown", 761 | "metadata": {}, 762 | "source": [ 763 | "There is a lot going on here! The first thing to see is that we can create a list by calling `split()` on a string! This method is pretty amazing: it splits a string on whitespace characters like `' '` and turns it into a list of words. Then I use a for loop to go over each word: if the word is one of the dictionary's keys, I swap in the value it maps to. Otherwise I append the word to the list as is. Finally, I join all the words back together.\n", 764 | "\n", 765 | "The `join` method on a string is the opposite operation of `split`. Instead of splitting a string apart into a list of strings, it takes a list of strings and stitches them all together, putting whatever string you call `join` on between them; in this case, a single space.\n", 766 | "\n", 767 | "## An introduction to sets\n", 768 | "The final thing we are going to discuss today is sets."
769 | ] 770 | }, 771 | { 772 | "cell_type": "markdown", 773 | "metadata": {}, 774 | "source": [ 775 | "A set is the least interesting, but still useful, collection we will be looking at today. You can't index into it, and it can't even hold more than one element with the same value: adding a duplicate does nothing!\n", 776 | "\n", 777 | "But this turns out to be a powerful tool nonetheless. Here we will see some of the power of sets, such as counting the unique values in a list:" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": 29, 783 | "metadata": {}, 784 | "outputs": [ 785 | { 786 | "name": "stdout", 787 | "output_type": "stream", 788 | "text": [ 789 | "This list has 10000 elements of which 10 are unique\n" 790 | ] 791 | } 792 | ], 793 | "source": [ 794 | "import random\n", 795 | "listing = [random.randint(1,10) for _ in range(10000)]\n", 796 | "print(\"This list has {} elements of which {} are unique\".format(len(listing), len(set(listing))))" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "metadata": {}, 802 | "source": [ 803 | "We can also iterate over sets, like so (note that a set does not guarantee any particular order):\n" 804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": 30, 809 | "metadata": {}, 810 | "outputs": [ 811 | { 812 | "name": "stdout", 813 | "output_type": "stream", 814 | "text": [ 815 | "1\n", 816 | "2\n", 817 | "3\n", 818 | "4\n", 819 | "5\n", 820 | "6\n", 821 | "7\n", 822 | "8\n", 823 | "9\n", 824 | "10\n" 825 | ] 826 | } 827 | ], 828 | "source": [ 829 | "set_list = set(listing)\n", 830 | "for elem in set_list:\n", 831 | "    print(elem)" 832 | ] 833 | }, 834 | { 835 | "cell_type": "code", 836 | "execution_count": null, 837 | "metadata": { 838 | "collapsed": true 839 | }, 840 | "outputs": [], 841 | "source": [] 842 | } 843 | ], 844 | "metadata": { 845 | "kernelspec": { 846 | "display_name": "Python 3", 847 | "language": "python", 848 | "name": "python3" 849 | }, 850 | "language_info": { 851 | "codemirror_mode": { 852 | "name": "ipython", 853 | "version": 3 854 | }, 855 | "file_extension": ".py", 856 | 
"mimetype": "text/x-python", 857 | "nbconvert_exporter": "python", 858 | "pygments_lexer": "ipython3", 859 | "version": "3.6.1" 860 | } 861 | }, 862 | "nbformat": 4, 863 | "nbformat_minor": 2 864 | } 865 | -------------------------------------------------------------------------------- /class_two.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Class Two - Dealing with the file system and standing on the shoulders of giants\n", 8 | "\n", 9 | "The reason the Python language is so good is not because of its syntax. Its syntax is wonderful! But almost all modern programming languages are similar. Python is just one of many languages that look and feel the way Python does. This is not by accident: the good ideas in computer languages are not owned by any one language. \n", 10 | "\n", 11 | "The reason the Python language is the powerhouse that it is, is simple: it's because of the community. Python has more built with it than almost any other language I know of; only a few, looking at you, Java, have more.\n", 12 | "\n", 13 | "So how do we leverage the true power of Python? With one simple statement, `import`.\n", 14 | "\n", 15 | "Let's see an example to understand this better." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 3, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "8.0\n" 28 | ] 29 | } 30 | ], 31 | "source": [ 32 | "import math\n", 33 | "\n", 34 | "print(math.pow(2,3))" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "So what did we just see here? First, we imported the `math` library and then we used one of its functions, `math.pow`, to do some computation. 
In order to get a better sense of this, let's introduce some definitions.\n", 42 | "\n", 43 | "**Definition** _library_ - a library (also sometimes called a module) is a collection of code that can be used in your existing program. In Python you access these libraries by first importing them into your program.\n", 44 | "\n", 45 | "**Definition** _class_ - a class is a collection of functions and data, all accessed via a dot operator. You call functions by adding opening and closing parentheses after the name of the function, with possibly some input variables in between. \n", 46 | "\n", 47 | "**Definition** _dot operator_ - a dot is used to break up paths to specific pieces of code. We already know another piece of notation to break up semantically different pieces in a long string - `/` when used in the context of the file system.\n", 48 | "\n", 49 | "Often the library you are importing will have multiple submodules, smaller collections of code, that you'll want to access individually. Here's an example of using such a piece of code:" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 4, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "NormaltestResult(statistic=32.008023157040626, pvalue=1.1208463532086409e-07)" 61 | ] 62 | }, 63 | "execution_count": 4, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "from scipy import stats\n", 70 | "import random\n", 71 | "\n", 72 | "set_of_values = [random.randint(0,100) for _ in range(100)]\n", 73 | "stats.normaltest(set_of_values)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "The above piece of code has two import statements: first it imports the `stats` submodule from `scipy`, and then it imports `random`, a module that generates random data. The remaining lines of code aren't super important to understand in detail. 
But just note that I'm calling `stats.normaltest` and `random.randint`. I'm calling two functions defined in the modules `stats` and `random`. That's the important takeaway. \n", 81 | "\n", 82 | "Of course, I could probably write these functions myself, and these modules for that matter. But that would take a lot of time. I'd have to look up the algorithms, internalize how they work, and then go about writing them. It would probably take many, many tries to actually get them right, even after I understand them. And then, maybe 3 months from now, I'd have code that does what the above 4 lines do. \n", 83 | "\n", 84 | "This is what I mean when I say Python lets you stand on the shoulders of giants. The Python community has already done the hard work for you! All you have to do is `import` [whatever] and go! \n", 85 | "\n", 86 | "Now of course things are slightly more complicated than that. But more on that later!" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "## Understanding the File System\n", 94 | "\n", 95 | "Before we can move on to the next part of the course, we need to understand how the file system works, so I'm going to be switching to the terminal for this part of the class.\n", 96 | "\n", 97 | "[DEMO]\n", 98 | "\n", 99 | "**Definition** _a file_ - a named piece of storage (usually on disk) that holds information, such as text.\n", 100 | "\n", 101 | "**Definition** _a folder_ - a container that holds files and other folders.\n", 102 | "\n", 103 | "**Definition** _current working directory_ - the directory that your terminal or program is currently working in.\n", 104 | "\n", 105 | "**Definition** _command line commands_ - functions you can call from your command line to act on files and folders. You can think of these like the world's first \"apps\", similar to the ones you use on your smartphone.\n", 106 | "\n", 107 | "**Definition** _root directory_ - the top of your file system. 
This is the directory you end up in when you type `cd /`.\n", 108 | "\n", 109 | "**Definition** _home directory_ - the top of your user's part of the file system. This is the directory you end up in when you type `cd ~`.\n", 110 | "\n", 111 | "**Definition** _full file path_ - the complete path from the root directory to a file or folder, such as your current folder.\n", 112 | "\n", 113 | "**Definition** _relative file path_ - the file path to the directory or file you want to reference, relative to the directory you are currently in.\n", 114 | "\n", 115 | "\n", 116 | "And now a list of commands:\n", 117 | "\n", 118 | "* `ls` - see all the files and folders in your current working directory.\n", 119 | "* `cd` - change directories by passing this command a folder.\n", 120 | "* `pwd` - print out the full path to the current directory.\n", 121 | "\n", 122 | "Now that we understand the basics of the file system, let's see how Python manipulates the file system with code." 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 5, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "/Users/ericschles/Documents/projects/python_courses/an_introduction_to_python\n", 135 | "/Users/ericschles/Documents/projects/python_courses\n", 136 | "/Users/ericschles/Documents/projects/python_courses/an_introduction_to_python\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "import os\n", 142 | "\n", 143 | "current_directory = os.getcwd()\n", 144 | "print(os.getcwd()) # equivalent to pwd\n", 145 | "os.chdir(\"..\") # equivalent to cd ..\n", 146 | "print(os.getcwd())\n", 147 | "os.chdir(current_directory)\n", 148 | "print(os.getcwd())" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 8, 154 | "metadata": {}, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "False\n", 161 | "True\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "import os\n", 167 | "\n", 168 | 
"print(os.path.isdir(\"class_two.ipynb\"))\n", 169 | "print(os.path.isfile(\"class_two.ipynb\"))" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "Okay, so we can mess with the file system and change between directories programmatically. So what? Well, this becomes extremely powerful if we add the ability to read and write files! Think about how many tasks we can automate with that :) \n", 177 | "\n", 178 | "Here's just _an_ example - say your boss asks you every Friday at 4pm for a report on what happened over the past week. All of the information is recorded in a database. Right now you just click some buttons, download some data from the database and then copy/paste the results into a template. What if a program automatically generated the report directly from the database and then emailed it to your boss? Then you could work on other stuff! No more having to do the same boring task every Friday, right before the end of the work day!\n", 179 | "\n", 180 | "Now imagine you could do that for every task like that. Imagine how much time you'd save! You could focus on high-impact, stimulating work instead of the boring stuff. \n", 181 | "\n", 182 | "Hopefully this motivates the next example well enough (I know folks asked me for this kinda stuff all the time at my first job)."
183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 10, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "Hello there!\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "new_file = open(\"new_file.txt\", \"w\")\n", 200 | "\n", 201 | "new_file.write(\"Hello there!\")\n", 202 | "new_file.close()\n", 203 | "\n", 204 | "just_created_file = open(\"new_file.txt\", \"r\")\n", 205 | "print(just_created_file.read())" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "What did we just do?!?! We created a file with `open`, wrote a string to it, and then closed it. Finally, we re-opened the file and read its contents.\n", 213 | "\n", 214 | "You'll notice that `open` takes a second string. The first time we used it, it looked like this:\n", 215 | "\n", 216 | "`new_file = open(\"new_file.txt\", \"w\")`\n", 217 | "\n", 218 | "The string \"w\" tells `open` to open the file for writing.\n", 219 | "\n", 220 | "Later on we used `open` like so:\n", 221 | "\n", 222 | "`just_created_file = open(\"new_file.txt\", \"r\")` - this tells `open` to open the file for reading, with \"r\" for reading.\n", 223 | "\n", 224 | "There is a third mode we'll be interested in today: appending.\n", 225 | "\n", 226 | "Calling `open` with \"w\" as the second input opens a file for writing, and if it doesn't exist yet, it creates the file for you! However, if the file already exists, it erases any content already in the file. That's clearly not always what we want.\n", 227 | "\n", 228 | "So there is also `open(\"new_file.txt\", \"a\")`, which opens a file for appending (and also creates it if it doesn't exist yet). This means new content is added to the end of the file instead of overwriting it. \n", 229 | "\n", 230 | "So let's try it out!"
231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 11, 236 | "metadata": {}, 237 | "outputs": [ 238 | { 239 | "name": "stdout", 240 | "output_type": "stream", 241 | "text": [ 242 | "Hello there!\n", 243 | " yo!\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "file_handle = open(\"new_file.txt\", \"a\")\n", 249 | "file_handle.write(\"\\n yo!\")\n", 250 | "file_handle.close()\n", 251 | "\n", 252 | "file_handle = open(\"new_file.txt\", \"r\")\n", 253 | "print(file_handle.read())" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "Now we'll try the same thing, except we'll use \"w\" instead of \"a\" for the second parameter." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 12, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "\n", 273 | " yo!\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "file_handle = open(\"new_file.txt\", \"w\")\n", 279 | "file_handle.write(\"\\n yo!\")\n", 280 | "file_handle.close()\n", 281 | "\n", 282 | "file_handle = open(\"new_file.txt\", \"r\")\n", 283 | "print(file_handle.read())" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "As you can see, \"Hello there!\" is gone: opening the file with \"w\" erased its old contents before writing."
291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": { 297 | "collapsed": true 298 | }, 299 | "outputs": [], 300 | "source": [] 301 | } 302 | ], 303 | "metadata": { 304 | "kernelspec": { 305 | "display_name": "Python 3", 306 | "language": "python", 307 | "name": "python3" 308 | }, 309 | "language_info": { 310 | "codemirror_mode": { 311 | "name": "ipython", 312 | "version": 3 313 | }, 314 | "file_extension": ".py", 315 | "mimetype": "text/x-python", 316 | "name": "python", 317 | "nbconvert_exporter": "python", 318 | "pygments_lexer": "ipython3", 319 | "version": "3.6.1" 320 | } 321 | }, 322 | "nbformat": 4, 323 | "nbformat_minor": 2 324 | } 325 | -------------------------------------------------------------------------------- /continuing_the_journey.md: -------------------------------------------------------------------------------- 1 | ## Data Analytics 2 | 3 | * [an introduction to pandas](https://www.youtube.com/playlist?list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy) 4 | * [data analysis course I teach](https://github.com/EricSchles/data_analysis_nyu_python_class) 5 | * [PyData](https://www.youtube.com/user/PyDataTV) 6 | * [Google's Developers Page](https://www.youtube.com/user/GoogleDevelopers) 7 | * [Assorted Technical Talks](https://www.youtube.com/user/NextDayVideo) 8 | * [Python for Research](http://online-learning.harvard.edu/course/using-python-research) 9 | * [Stanford science class](https://stanford.edu/~arbenson/cme193.html) 10 | 11 | ## Programming More Generally 12 | 13 | * [MIT CS intro class](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-01sc-introduction-to-electrical-engineering-and-computer-science-i-spring-2011/) 14 | * [MIT gentle intro to Python](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-189-a-gentle-introduction-to-programming-using-python-january-iap-2011/index.htm) 15 | * [Brown Programming languages
class](https://cs.brown.edu/courses/cs173/2012/) 16 | * [Math && Python - Slides](http://codingthematrix.com/) and [Math && Python - Resources](http://resources.codingthematrix.com/) 17 | * [An introduction to Object Oriented Programming](http://blog.thedigitalcatonline.com/blog/2014/08/20/python-3-oop-part-1-objects-and-types/#.WYIOXtPyuCQ) 18 | * [Stanford Python Course](http://stanfordpython.com/) 19 | * [CMU Python class](https://www.cs.cmu.edu/~112/) 20 | 21 | ## Web Dev with Python 22 | 23 | * [Mega Intro to Flask (best tutorial I have found)](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world) 24 | * [Django Intro](https://www.djangoproject.com/start/) 25 | * [Django Girls Intro](https://tutorial.djangogirls.org/en/python_introduction/) 26 | * [Django Book](https://djangobook.com/) 27 | * [Django Test Driven Development - BEST BOOK FOR THIS](http://chimera.labs.oreilly.com/books/1234000000754/) -------------------------------------------------------------------------------- /enigma.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | def Ord(ch): return ord(ch) - ord('A') # convert A-Z to 0-25 4 | def Chr(ch): return chr(ch + ord('A')) # convert 0-25 to A-Z 5 | def Text(s): return "".join(ch for ch in s if ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ") 6 | 7 | Rotors = { # name: (wiring, notches) 8 | "I": ("EKMFLGDQVZNTOWYHXUSPAIBRCJ", "Q"), # 1930 Enigma I 9 | "II": ("AJDKSIRUXBLHWTMCQGZNPYFVOE", "E"), # 1930 Enigma I 10 | "III": ("BDFHJLCPRTXVZNYEIWGAKMUSQO", "V"), # 1930 Enigma I 11 | "IV": ("ESOVPZJAYQUIRHXLNFTGKDCMWB", "J"), # December 1938 M3 Army 12 | "V": ("VZBRGITYUPSDNHLXAWMJQOFECK", "Z"), # December 1938 M3 Army 13 | "VI": ("JPGVOUMFYQBENHZRDKASXLICTW", "ZM"), # 1939 M3 & M4 Naval (FEB 1942) 14 | "VII": ("NZJHGRCXMYSWBOUFAIVLPEKQDT", "ZM"), # 1939 M3 & M4 Naval (FEB 1942) 15 | "VIII": ("FKQHTLXOCBJSPDZRAMEWNIUYGV", "ZM"), # 1939 M3 & M4 Naval (FEB 1942) 16 | 
"Beta": ("LEYJVCNIXWPBQMDRTAKZGFUHOS", ""), # Spring 1941 M4 R2 17 | "Gamma":("FSOKANUERHMBTIYCWLQPZXVGJD", ""), # Spring 1942 M4 R2 18 | "A": ("EJMZALYXVBWFCRQUONTSPIKHGD", ""), 19 | "B": ("YRUHQSLDPXNGOKMIEBFZCWVJAT", ""), 20 | "C": ("FVPJIAOYEDRZXWGCTKUQSBNMHL", ""), 21 | "B Thin": ("ENKQAUYWJICOPBLMDXZVFTHRGS", ""), # 1940 M4 R1 (M3 + Thin) 22 | "C Thin": ("RDOBJNTKVEHMLFCWZAXGYIPSUQ", ""), # 1940 M4 R1 (M3 + Thin) 23 | } 24 | 25 | class Rotor: 26 | def __init__(self, table, ofs='A', ring=1): 27 | self.rotor, (table, notches) = table, Rotors[table] 28 | self.table = list(map(Ord, table)) 29 | self.reciprocal = [self.table.index(i) for i in range(26)] 30 | self.notches = [(n - ring + 1) % 26 for n in map(Ord, notches)] 31 | self.ofs = Ord(ofs) - ring + 1 32 | self.ring = ring - 1 33 | def knocks(self): return (self.ofs % 26) in self.notches 34 | def advance(self): self.ofs += 1 35 | def enter(self, ch, ofs): return self.table[(ch + self.ofs - ofs) % 26] 36 | def exit(self, ch, ofs): return self.reciprocal[(ch + self.ofs - ofs) % 26] 37 | 38 | class Machine: 39 | def __init__(self, plugboard, *rotors): 40 | self.plugboard = list(range(26)) 41 | for pair in plugboard.split(): 42 | a, b = list(map(Ord, pair)) 43 | self.plugboard[a] = b 44 | self.plugboard[b] = a 45 | self.rotors = rotors # natural order is reflector, left, middle, right 46 | def transcode(self, message, expected=None): 47 | out = "" 48 | for ch in message: 49 | if not Text(ch): # whitespace etc 50 | out += ch 51 | continue 52 | # advance rotors; the three rightmost can rotate 53 | left, middle, right = self.rotors[-3:] 54 | if middle.knocks(): 55 | left.advance() 56 | middle.advance() 57 | elif right.knocks(): 58 | middle.advance() 59 | right.advance() 60 | # transcode character 61 | ofs = 0 62 | ch = self.plugboard[Ord(ch)] # through the plugboard 63 | for rotor in self.rotors[::-1]: # through the rotors right to left and then reflector 64 | ch, ofs = rotor.enter(ch, ofs), rotor.ofs 65 | for rotor 
in self.rotors[1:]: # and back through the rotors left to right 66 | ch, ofs = rotor.exit(ch, ofs), rotor.ofs 67 | ch = (ch - rotor.ofs) % 26 # unmap it 68 | ch = self.plugboard[ch] # and back through the plugboard 69 | out += Chr(ch) 70 | if expected: assert Text(expected) == Text(out), "\nEXP: %s\nGOT: %s" % (expected, out) 71 | return out 72 | 73 | # from http://wiki.franklinheath.co.uk/index.php/Enigma/Paper_Enigma 74 | print(Machine("", Rotor("B"), Rotor("I", 'A', 1), Rotor("II", 'B', 1), Rotor("III", 'C', 1)).\ 75 | transcode("AEFAE JXXBN XYJTY", "CONGRATULATIONS")) 76 | print(Machine("", Rotor("B"), Rotor("I", 'A', 1), Rotor("II", 'B', 1), Rotor("III", 'R', 1)).\ 77 | transcode("MABEK GZXSG", "TURN MIDDLE")) 78 | print(Machine("", Rotor("B"), Rotor("I", 'A', 1), Rotor("II", 'D', 1), Rotor("III", 'S', 1)).\ 79 | transcode("RZFOG FYHPL", "TURNS THREE")) 80 | print(Machine("", Rotor("B"), Rotor("I", 'X', 10), Rotor("II", 'Y', 14), Rotor("III", 'Z', 21)).\ 81 | transcode("QKTPE BZIUK", "GOOD RESULT")) 82 | print(Machine("AP BR CM FZ GJ IL NT OV QS WX", Rotor("B"), Rotor("I", 'V', 10), Rotor("II", 'Q', 14), Rotor("III", 'Q', 21)).\ 83 | transcode("HABHV HLYDF NADZY", "THATS IT WELL DONE")) 84 | 85 | # from http://wiki.franklinheath.co.uk/index.php/Enigma/Sample_Messages 86 | # Enigma Instruction Manual, 1930 87 | print(Machine("AM FI NV PS TU WZ", Rotor("A"), Rotor("II", 'A', 24), Rotor("I", 'B', 13), Rotor("III", 'L', 22)).\ 88 | transcode("GCDSE AHUGW TQGRK VLFGX UCALX VYMIG MMNMF DXTGN VHVRM MEVOU YFZSL RHDRR XFJWC FHUHM UNZEF RDISI KBGPM YVXUZ", 89 | "FEIND LIQEI NFANT ERIEK OLONN EBEOB AQTET XANFA NGSUE DAUSG ANGBA ERWAL DEXEN DEDRE IKMOS TWAER TSNEU STADT")) 90 | # Operation Barbarossa, 1941 91 | print(Machine("AV BS CG DL FU HZ IN KM OW RX", Rotor("B"), Rotor("II", 'B', 2), Rotor("IV", 'L', 21), Rotor("V", 'A', 12)).\ 92 | transcode("EDPUD NRGYS ZRCXN UYTPO MRMBO FKTBZ REZKM LXLVE FGUEY SIOZV EQMIK UBPMM YLKLT TDEIS MDICA GYKUA CTCDO MOHWX MUUIA UBSTS 
LRNBZ SZWNR FXWFY SSXJZ VIJHI DISHP RKLKA YUPAD TXQSP INQMA TLPIF SVKDA SCTAC DPBOP VHJK-", 93 | "AUFKL XABTE ILUNG XVONX KURTI NOWAX KURTI NOWAX NORDW ESTLX SEBEZ XSEBE ZXUAF FLIEG ERSTR ASZER IQTUN GXDUB ROWKI XDUBR OWKIX OPOTS CHKAX OPOTS CHKAX UMXEI NSAQT DREIN ULLXU HRANG ETRET ENXAN GRIFF XINFX RGTX-")) 94 | print(Machine("AV BS CG DL FU HZ IN KM OW RX", Rotor("B"), Rotor("II", 'L', 2), Rotor("IV", 'S', 21), Rotor("V", 'D', 12)).\ 95 | transcode("SFBWD NJUSE GQOBH KRTAR EEZMW KPPRB XOHDR OEQGB BGTQV PGVKB VVGBI MHUSZ YDAJQ IROAX SSSNR EHYGG RPISE ZBOVM QIEMM ZCYSG QDGRE RVBIL EKXYQ IRGIR QNRDN VRXCY YTNJR", 96 | "DREIG EHTLA NGSAM ABERS IQERV ORWAE RTSXE INSSI EBENN ULLSE QSXUH RXROE MXEIN SXINF RGTXD REIXA UFFLI EGERS TRASZ EMITA NFANG XEINS SEQSX KMXKM XOSTW XKAME NECXK")) 97 | # U-264 (Kapitänleutnant Hartwig Looks), 1942 98 | print(Machine("AT BL DF GJ HM NW OP QY RZ VX", Rotor("B Thin"), Rotor("Beta", 'V', 1), Rotor("II", 'J', 1), Rotor("IV", 'N', 1), Rotor("I", 'A', 22)).\ 99 | transcode("NCZW VUSX PNYM INHZ XMQX SFWX WLKJ AHSH NMCO CCAK UQPM KCSM HKSE INJU SBLK IOSX CKUB HMLL XCSJ USRR DVKO HULX WCCB GVLI YXEO AHXR HKKF VDRE WEZL XOBA FGYU JQUK GRTV UKAM EURB VEKS UHHV OYHA BCJW MAKL FKLM YFVN RIZR VVRT KOFD ANJM OLBG FFLE OPRG TFLV RHOW OPBE KVWM UQFM PWPA RMFH AGKX IIBG", 100 | "VONV ONJL OOKS JHFF TTTE INSE INSD REIZ WOYY QNNS NEUN INHA LTXX BEIA NGRI FFUN TERW ASSE RGED RUEC KTYW ABOS XLET ZTER GEGN ERST ANDN ULAC HTDR EINU LUHR MARQ UANT ONJO TANE UNAC HTSE YHSD REIY ZWOZ WONU LGRA DYAC HTSM YSTO SSEN ACHX EKNS VIER MBFA ELLT YNNN NNNO OOVI ERYS ICHT EINS NULL")) 101 | # Scharnhorst (Konteradmiral Erich Bey), 1943 102 | print(Machine("AN EZ HK IJ LR MQ OT PV SW UX", Rotor("B"), Rotor("III", 'U', 1), Rotor("VI", 'Z', 8), Rotor("VIII", 'V', 13)).\ 103 | transcode("YKAE NZAP MSCH ZBFO CUVM RMDP YCOF HADZ IZME FXTH FLOL PZLF GGBO TGOX GRET DWTJ IQHL MXVJ WKZU ASTR", 104 | "STEUE REJTA NAFJO RDJAN STAND ORTQU AAACC CVIER NEUNN EUNZW OFAHR TZWON ULSMX 
XSCHA RNHOR STHCO")) 105 | -------------------------------------------------------------------------------- /new_file.txt: -------------------------------------------------------------------------------- 1 | 2 | yo! --------------------------------------------------------------------------------
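The file-handling cells in class_two.ipynb above open a file, write or read, and close it by hand, and the handles opened for reading are never closed. As a hedged sketch (not part of the original course files), the same three modes, "w", "a" and "r", can be written with Python's `with` statement, which closes each file automatically when its block ends. The helper name `demo_modes` and the use of a temporary directory are illustrative assumptions, chosen so that running the sketch does not overwrite the repository's own new_file.txt.

```python
import os
import tempfile


def demo_modes(path):
    """Hypothetical helper (not from the course notebooks): write, append,
    then overwrite a file, returning what was read back after each step."""
    snapshots = []

    # "w" creates the file (or empties it if it already exists) and writes to it.
    with open(path, "w") as f:
        f.write("Hello there!")
    with open(path, "r") as f:  # "r" opens the file for reading only
        snapshots.append(f.read())

    # "a" adds to the end of the file instead of erasing what is there.
    with open(path, "a") as f:
        f.write("\n yo!")
    with open(path, "r") as f:
        snapshots.append(f.read())

    # Opening with "w" again wipes the previous contents before writing.
    with open(path, "w") as f:
        f.write("\n yo!")
    with open(path, "r") as f:
        snapshots.append(f.read())

    return snapshots


# Work inside a throwaway directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    for contents in demo_modes(os.path.join(tmp, "new_file.txt")):
        print(repr(contents))
```

Run as a script, it should print `'Hello there!'`, `'Hello there!\n yo!'` and `'\n yo!'` on three lines, matching what the notebook cells show after the "w", "a" and second "w" steps. Each file is closed as soon as its `with` block ends, even if an exception is raised inside it.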