├── README.md
├── class-notes
    ├── class-1.md
    ├── class-2.md
    ├── class-3.md
    ├── class-4.md
    ├── class-5.md
    ├── class-6.md
    ├── class-7.md
    └── examples
    │   ├── class7
    │       ├── email.py
    │       ├── vidbot.py
    │       └── webapp.py
    │   ├── natural-language-processing
    │       ├── classify.py
    │       ├── manifesto.txt
    │       ├── part-of-speech.py
    │       ├── pride.txt
    │       ├── regexp.py
    │       ├── similarity.py
    │       └── translate.py
    │   └── video
    │       ├── combine_videos.py
    │       ├── random_overlay.py
    │       └── randomize.py
├── reader-01-the-command-line.md
└── reader-02-python-basics.md


/README.md:
--------------------------------------------------------------------------------
  1 | # Scrapism
  2 | 
  3 | (draft syllabus)
  4 | 
  5 | **Instructor:** [Sam Lavigne](http://lav.io) | [splavigne@gmail.com](mailto:splavigne@gmail.com)  
  6 | **Teaching Assistant:** TBD  
  7 | **Track:** Code Poetry, Fall 2018  
  8 | **Location:** [School for Poetic Computation](http://sfpc.io/) | 155 Bank St, New York, NY 10014  
  9 | **Time:** Tuesdays 10am to 1pm  
 10 | **Office Hours:** Tuesdays 2pm to 4pm (or by appointment)  
 11 | **Class Notes:** [link](https://paper.dropbox.com/folder/show/Class-Notes-e.1gg8YzoPEhbTkrhvQwJ2zz3XJBcZkbceseDnY854qf9k5dPQtUC2)
 12 | 
 13 | Scrapism is the artistic practice of web scraping, or of automatically collecting and transforming found digital material. It hinges upon a combination of curatorial practice, reverse engineering, and hoarding mentality. In this class students will learn how to scrape massive quantities of material from the internet with Python, and then use that material to make poetic, satirical, critical, political projects. Each session we will cover a different web scraping technique, with production assignments relating to text, image and video. We will explore surrealist, dadaist, situationist techniques such as detournement, collage, and cut-ups, and apply them to a contemporary digital context.
 14 | 
 15 | ## Schedule
 16 | 
 17 | ### 1. September 18th
 18 | 
 19 | Introductions. Using the terminal. Basic python. Reading lines.
 20 | 
 21 | #### Readings  
 22 | * [Intro to the command line](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-01-the-command-line.md)
 23 | * [Python basics](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-02-python-basics.md)
 24 | * [Artificial Hells (introduction and chapter 1)](https://selforganizedseminar.files.wordpress.com/2011/08/bishop-claire-artificial-hells-participatory-art-and-politics-spectatorship.pdf) By Claire Bishop
 25 | * [A User’s Guide to Détournement](http://www.bopsecrets.org/SI/detourn.htm)
 26 | 
 27 | #### Assignment
 28 | * Find three sentences (or phrases) in the wild. Your sentences could come from the internet or the real world, from a book, a store sign, a facebook post, a news article, product packaging, or from a restaurant menu. Anything is fine, but you must not write it yourself. Be prepared to recite what you have found next week in class.  
 29 | 
 30 | ---
 31 | 
 32 | ### 2. September 25th
 33 | 
 34 | Python part 2. Manipulating text. Automating writing.
 35 | 
 36 | #### Readings  
 37 | * Tech reading 2 TBD  
 38 | * [The Cut Up Method](http://www.writing.upenn.edu/~afilreis/88v/burroughs-cutup.html) by William Burroughs
 39 | 
 40 | #### Assignment
 41 | * Transform a non-poetic text into a poetic text using Python. It is up to you to determine how and why a text is poetic or non-poetic. If you are stuck, try techniques like sorting, randomizing, filtering, deleting, or replacing.
 42 | 
 43 | ---
 44 | 
 45 | ### 3. October 2nd
 46 | 
 47 | Web scraping basics.  Making big lists.
 48 | 
 49 | #### Readings  
 50 | * Tech reading 3
 51 | * [Uncreative Writing](https://www.chronicle.com/article/Uncreative-Writing/128908) by Kenneth Goldsmith
 52 | 
 53 | ---
 54 | 
 55 | ### 4. October 9th
 56 | 
 57 | Web scraping part 2. APIs. Advanced text manipulation and parsing.
 58 | 
 59 | #### Readings  
 60 | * Tech reading 4
 61 | * [Digital Divide](https://www.artforum.com/print/201207/digital-divide-contemporary-art-and-new-media-31944) by Claire Bishop
 62 | * [Montage](https://lucian.uchicago.edu/blogs/mediatheory/keywords/montage/) by Jared Leibowich
 63 | 
 64 | ---
 65 | 
 66 | ### 5. October 16th
 67 | 
 68 | Automating collage.
 69 | 
 70 | #### Readings  
 71 | * Tech reading 5
 72 | * [Too Much World: Is the Internet Dead?](https://www.e-flux.com/journal/49/60004/too-much-world-is-the-internet-dead/) by Hito Steyerl
 73 | 
 74 | ---
 75 | 
 76 | ### 6. October 23rd
 77 | 
 78 | Automating video.
 79 | 
 80 | #### Readings  
 81 | * Tech reading 6
 82 | * [Surrealism: the Last Snapshot of the European Intelligentsia](https://monoskop.org/images/a/a0/Benjamin_Walter_1929_1978_Surrealism_The_Last_Snapshot_of_the_European_Intelligentsia.pdf) by Walter Benjamin
 83 | 
 84 | ---
 85 | 
 86 | ### 7. October 30th
 87 | 
 88 | Bots and project work.
 89 | 
 90 | ---
 91 | 
 92 | 
 93 | ## Fun/useful Python Libraries
 94 | * [moviepy](http://zulko.github.io/moviepy/) - edit video
 95 | * [vidpy](http://antiboredom.github.com/vidpy/) - edit video (my library)
 96 | * [videogrep](http://antiboredom.github.com/videogrep/) - make supercuts (my library)
 97 | * [youtube-dl](https://rg3.github.io/youtube-dl/) - download videos
 98 | * [pillow](https://python-pillow.org/) - edit images
 99 | * [flask](http://flask.pocoo.org/) - web server
100 | * [twython](https://github.com/ryanmcgrath/twython) - use the twitter api
101 | * [spacy](https://spacy.io/) - natural language processing
102 | * [requests](http://docs.python-requests.org/en/master/) - easy http requests
103 | * [envelopes](http://tomekwojcik.github.io/envelopes/) - send email
104 | * [opencv](http://opencv.org/) - computer vision
105 | * [asciimatics](https://github.com/peterbrittain/asciimatics) - text-based interfaces and animation
106 | * [colorama](https://github.com/tartley/colorama) - easy color in the terminal
107 | 
108 | 


--------------------------------------------------------------------------------
/class-notes/class-1.md:
--------------------------------------------------------------------------------
  1 | # Sept 18 - The Command Line
  2 | **Instructor**: Sam Lavigne | [splavigne@gmail.com](mailto:splavigne@gmail.com) 
  3 | **Teaching Assistant**: Fernando Ramallo | [fernando.ramallo@gmail.com](mailto:fernando.ramallo@gmail.com) 
  4 | **Track**: Code Poetry, Fall 2018 
  5 | **Location**: School for Poetic Computation | 155 Bank St, New York, NY 10014 **Time**: Tuesdays 10am to 1pm 
  6 | **Office Hours**: Tuesdays 2pm to 4pm (or by appointment)
  7 | 
  8 | Slack channel: #2018-fall-scrapism
  9 | Sam’s office hours Sign-up sheet: [+Sam Office Hours](https://paper.dropbox.com/doc/Sam-Office-Hours-gaKmWg2Qo7jnn2FbO7F5b) 
 10 | Fernando’s office hours sign-up sheet: [+Fernando (TA) Office Hours](https://paper.dropbox.com/doc/Fernando-TA-Office-Hours-p8FxDav0hzpIjrJ4rtfeX) 
 11 |  
 12 | 
 13 | # Reader
 14 | - [Intro to the command line](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-01-the-command-line.md)
 15 | - [Python basics](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-02-python-basics.md)
 16 | # Notes
 17 | 
 18 | 
 19 | 
 20 | - We all introduced ourselves, again!
 21 | - We’re gonna assume no technical knowledge, feel free to reach out for questions.
 22 | - Sam will record himself giving the class, put it in a private link
 23 | 
 24 | 
 25 | ## Sam’s work
 26 | 
 27 | http://lav.io/
 28 | 
 29 | How can we make a critical statement without saying specifically what that statement is?
 30 | 
 31 | https://lav.io/projects/white-collar-crime-risk-zones/
 32 | https://lav.io/projects/baabaa/ - An index of selected commodities listed for sale on alibaba.com. Items are arranged by price and minimum order quantity and are search results for terms like “riot gear” and “human labor”.
 33 | https://lav.io/projects/cspan-5/ - most frequently stated phrases turned into a video
 34 | 
 35 | 
 36 | 
 37 | ## Scrapism
 38 | 
 39 | Q of this class: how do we make something new by using material that already exists / 
 40 | What new things are sayable today? .. by means of these tools that wouldn’t be sayable otherwise
 41 | 
 42 | Objectives
 43 | 
 44 | - learn python
 45 | - use it to collect material and manipulate it
 46 | - use text: how do we create automatic *poetry*
 47 | - image: how do we create automatic *collage*
 48 | - video: automatic *montage*
 49 | 
 50 | Look at groups and individuals from the past that used rule-based techniques / almost automatically / surrealists, dadaists, situationists
 51 | We’re gonna be making critiques, satires, commentaries, poetry.
 52 | Process:
 53 | 
 54 | - find a good source material
 55 | - figure out how to get that source material (get a lot of it)
 56 | - figure out how to parse it and transform it / take something that is a big mess from the internet, take unstructured information / transform it into something you can use
 57 | - figure out how to present what you’ve collected to the world / something new
 58 | 
 59 | We’re gonna treat everything *as a text*, looking at images *as if they were* text, e.g. [C-SPAN5 bot](https://twitter.com/cspanfive) (treating video as text that is cut and put together). 
 60 | 
 61 | How do these techniques work in a post-Trump environment?
 62 | All information is out in the open, does that make this work superfluous?
 63 | 
 64 | I saw a horrible website today!
 65 | https://anti-captcha.com
 66 | 
 67 | 
 68 | ## Class today
 69 | 
 70 | All the things we’re gonna talk about today are gonna be in these readers:
 71 | 
 72 | - [Intro to the command line](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-01-the-command-line.md)
 73 | - [Python basics](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-02-python-basics.md)
 74 | 
 75 | Every class will have a series of readings (technical and non-technical):
 76 | 
 77 | - Technical readings are what we talked about in the class, for reference / when you forget
 78 | - The readings are the ones listed in the [syllabus](https://github.com/antiboredom/sfpc-scrapism), *in the slot for the previous class* 
 79 | 
 80 | 
 81 | 
 82 | ## The Terminal
 83 | 
 84 | Applications > Utilities > Terminal
 85 | Cmd+Space > “Terminal”
 86 | 
 87 | 
 88 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_DB93935784C30DFE0319F4DADC3823BE454C5CF94C07DCD9BB4B5FA46EC71A23_1537283005399_image.png)
 89 | 
 90 | 
 91 | The terminal is a text-based way of navigating folders
 92 | 
 93 | **Print the directory you’re in:**
 94 | 
 95 |     pwd
 96 | 
 97 | 
 98 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_DB93935784C30DFE0319F4DADC3823BE454C5CF94C07DCD9BB4B5FA46EC71A23_1537283131928_image.png)
 99 | 
100 | 
101 | 
102 | 
103 | **See what’s in the folder you’re in**
104 | 
105 |     ls
106 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_DB93935784C30DFE0319F4DADC3823BE454C5CF94C07DCD9BB4B5FA46EC71A23_1537283236404_image.png)
107 | 
108 | 
109 | **Change the directory you’re in**
110 | 
111 |     cd [folder you want to enter]
112 |     
113 |     cd Desktop
114 | 
115 | 
116 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_DB93935784C30DFE0319F4DADC3823BE454C5CF94C07DCD9BB4B5FA46EC71A23_1537283219373_image.png)
117 | 
118 | 
119 | **The terminal treats a space as the separator between arguments. Wrap names in quotes " to access folders and files with spaces.**
120 | 
121 | 
122 |     cd Creative Cloud Files   # doesn't work
123 |     cd "Creative Cloud Files"
124 |     
125 | 
126 | 
127 | **Going up: to go up one directory, to the parent folder: cd ..**
128 | 
129 |     cd ..     # goes up to the parent folder
130 | 
131 | 
132 | **Making directories: mkdir**
133 | 
134 |     mkdir [name of the directory]
135 |     
136 |     mkdir newfolder      # makes a folder called 'newfolder'
137 | 
138 | 
139 | 
140 | **Move files and folders and rename them: mv**
141 | 
142 | 
143 |     mv [old name] [new name]
144 |     
145 |     mv newfolder/ newnamedfolder     #renames folder 'newfolder' to 'newnamedfolder'
146 | 
147 | slash means folder, it’s optional
148 | 
149 | can be used for moving a file, but also be used for renaming
150 | 
151 | 
152 | **Creating new files: touch**
153 | Updates the last date modified tag for a file or folder,  to be right now.
154 | If that file doesn’t exist, it **creates that file**
155 | a fast way of making files
156 | 
157 | 
158 |     touch [name of file or folder]
159 |     
160 |     touch coolfile.txt       #makes an empty file called 'coolfile.txt'
161 | 
162 | 
163 | 
164 | **Delete**
165 | 
166 |     rm [name of file]
167 |     
168 |     rm coolfile.txt
169 | 
170 | 
171 | **Hit tab to autocomplete a file or folder**
172 | 
173 |     cd Des[HIT TAB]       # autocompletes to cd Desktop
174 | 
175 | 
176 | 
177 | 
178 | ## Manipulating text
179 | 
180 | **Use gutenberg for source text**
181 | A good external source to work with is [Project Gutenberg](http://gutenberg.org/), which hosts more than 57,000 free, public domain eBooks.
182 | 
183 | - Download files in Plain Text format
184 | 
185 | Moby Dick text: https://www.gutenberg.org/cache/epub/15/pg15.txt
186 | The Trial by Kafka: https://www.gutenberg.org/cache/epub/7849/pg7849.txt
187 | 
188 | Save file as Plain Text Document (or Page Source in Safari)
189 | 
190 | **See information about the file**
191 | 
192 |     file [name of file]
193 |     
194 |     file mobydick.txt 
195 |     Output: mobydick.txt: UTF-8 Unicode (with BOM) text, with CRLF line terminators
196 | 
197 | **Looking inside the contents of a file**
198 | 
199 |     cat [name of file]   # prints content of the file on the screen
200 |     
201 |     cat mobydick.txt 
202 |     # .... will print the entire text
203 | 
204 | **Use the ‘more’ command to actually read through the text with scrolling**
205 | 
206 |     more [name of file]
207 |     
208 |     more mobydick.txt
209 |     # ... scroll through the text
210 |     # ... type Q to exit
211 | 
212 | 
213 | 
214 | **Best command: say**
215 | 
216 |     say hello
217 |     
218 |     say this is your computer i am going to murder you
219 |     
220 | 
221 | 
222 | All the commands have a structure
223 | ***name of command + argument (usually file or folder)***
224 | 
225 | **But most commands have additional options**
226 | Almost every command has a built-in manual. Access it with the **man** command
227 | 
228 |     man say
229 |     # will go to the manual about the say command, 
230 |     # exit by typing Q
231 | 
232 | For example, say has -v to change the voice, -f to read from a file, and -r to set the speaking rate.
233 | There are usually two ways of writing an option, a short form and a long form, e.g.
234 | 
235 | - -r rate
236 | - --rate=rate
237 |     say whatever
238 |     # says 'whatever' at normal rate
239 |     
240 |     say -r 500 whatever
241 |     # says 'whatever' at the rate of 500 words per minute
242 |     
243 |     # use -f option to read a file
244 |     say -f mobydick.txt
245 |     # says the entirety of Moby Dick out loud. Poetic!
246 |     
247 |     
248 | 
249 | 
250 | **To stop a command**
251 | 
252 | - Ctrl + C: Stops the command
253 | - Cmd + Q (Alt + F4 in windows): Closes the terminal entirely
254 | 
255 | 
256 | **Use grep command to print every line of a text file that contains a certain word**
257 | A line ends wherever the text has a line break, i.e. wherever someone hit enter while writing it.
258 | 
259 |     
260 |     grep trial thetrial.txt
261 |     # prints all the lines of the text file that contain the word 'trial'
262 |     
263 |     grep whale mobydick.txt
264 |     
265 |     # to search for a phrase of more than one word, put it in quotes
266 |     grep "the whale" mobydick.txt
267 |     
268 |     
269 | 
270 | 
271 | **Sort command** 
272 | sorts every line
273 | 
274 |     sort thetrial.txt
275 |     # returns the trial, alphabetically ordered
276 |     
277 |     sort -u # only uniques
278 |     sort -r # reverse
279 | 
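For comparison, here is a minimal Python sketch of what `sort`, `sort -u` and `sort -r` do to a list of lines. The sample lines are made up for illustration. Python's default ordering will not always match your terminal's `sort`, which depends on your locale settings, so treat this as an approximation.

```python
# Made-up sample lines standing in for the lines of a text file.
lines = ["whale", "ahab", "whale", "sea", "ahab"]

# Like `sort`: every line, in alphabetical order.
in_order = sorted(lines)

# Like `sort -u`: drop duplicate lines, then sort.
unique = sorted(set(lines))

# Like `sort -r`: the same ordering, reversed.
reversed_order = sorted(lines, reverse=True)

print(in_order)
print(unique)
print(reversed_order)
```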
280 | 
281 | 
282 | 
283 | **Save the output of the command line to a new file, with the > sign**
284 | *this is called a redirect*
285 | 
286 |     [command] > [file name to save to]
287 |     
288 |     sort thetrial.txt > thetrial_sorted.txt
289 |     # instead of printing, save whatever output to thetrial_sorted.txt file
290 |     
291 | 
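A rough Python equivalent of the redirect, as a sketch: instead of printing the sorted lines, write them to a file. The sample lines and the file name `sorted_lines.txt` are invented for this example, and the file goes into a temporary folder so nothing on your Desktop is touched.

```python
import os
import tempfile

# Made-up sample lines standing in for a text file.
lines = ["the trial", "a door", "k. waited"]

# A throwaway folder so the sketch does not touch your real files.
folder = tempfile.mkdtemp()
out_path = os.path.join(folder, "sorted_lines.txt")

# Opening with "w" creates the file, or overwrites it if it already
# exists, which is also how `>` behaves.
with open(out_path, "w") as out_file:
    for line in sorted(lines):
        out_file.write(line + "\n")

# Read the file back to check what was saved.
with open(out_path) as in_file:
    saved = in_file.read()

print(saved)
```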
292 | 
293 | **You can combine commands together**
294 | take the output of one command, pipe it to another command, and chain things together
295 | e.g. do the sort and grep at the same time
296 | 
297 |     
298 |     # use the vertical bar character (pipe) | to chain commands 
299 |     
300 |     grep whale mobydick.txt | sort
301 |     # take the output of the lines from grep, into the sort command, finally to the screen
302 |     
303 |     grep whale mobydick.txt | sort > sorted_whales.txt
304 |     # make a text file with the lines that include "whale", sorted alphabetically
305 |     
306 | 
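The same chain can be sketched in Python: one step filters the lines, the next step sorts whatever the first step produced. The `grep` function and the sample lines below are hypothetical stand-ins, not part of any library.

```python
def grep(lines, word):
    """Keep only the lines that contain `word`, like `grep word file`."""
    return [line for line in lines if word in line]


# Made-up sample lines standing in for mobydick.txt.
sample_lines = [
    "the whale surfaced",
    "call me ishmael",
    "a white whale",
    "the sea was calm",
]

# Step 1: grep. Step 2: sort the output of step 1.
whale_lines = grep(sample_lines, "whale")
sorted_whales = sorted(whale_lines)

for line in sorted_whales:
    print(line)
```

Feeding the result of one function into the next plays the same role as the `|` character in the terminal.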
307 | 
308 | **Other fun commands**
309 | 
310 | **use cut to separate words**
311 | 
312 |     cut # breaks every line in the file by a delimiter, 
313 |     # e.g. break the lines by spaces, 
314 |     # -d delimiter
315 |     # -f field
316 |     
317 |     cut -d " " -f 1 mobydick.txt
318 |     # separate the lines by empty spaces (therefore separating each word), get the first field (the first instance, ie. the first word), of mobydick.txt
319 | 
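As a sketch of what `cut -d " " -f 1` is doing, here is the same idea in Python: split each line on the delimiter and keep the first piece. The function name `first_field` and the sample lines are assumptions made for this example.

```python
def first_field(line, delimiter=" "):
    """Split a line on the delimiter and return the first piece."""
    return line.split(delimiter)[0]


# Made-up sample lines, including an empty one and a one-word one.
sample_lines = ["Call me Ishmael.", "Some years ago", "", "Loomings"]

first_words = [first_field(line) for line in sample_lines]
print(first_words)
```

A line with no spaces comes back whole, and an empty line stays empty.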
320 | **use a wildcard to access multiple files**
321 | 
322 |     ls *.txt
323 |     # lists any file that ends with .txt
324 | 
325 | 
326 | **clear to clear the screen**
327 | 
328 |     clear
329 |     # empties the terminal window
330 | 
331 | 
332 | 
333 | 
334 | ## How the file system works
335 | 
336 | Files and folders,
337 | Every folder has exactly one parent folder, except the very top (the root)
338 | 
339 | The root folder (the hard drive) is described as a forward slash /
340 | 
341 |     cd /
342 |     # goes to the root folder
343 | 
344 | Some files and folders are **hidden**
345 | 
346 |     cd /
347 |     ls
348 |     # will list all the files and folders in the root, you'll see some that are hidden in the Finder / Folder viewer
349 | 
350 | Each file/folder has a unique path
351 | You can go to a specific folder and access a file inside it
352 | 
353 |     cd /Users/sam/Desktop/
354 |     # go to the desktop
355 |     more thetrial.txt
356 |     # if there's a file called thetrial.txt in Desktop, it gets printed out
357 |     # otherwise, an error
358 | 
359 | But you can also access a file by its **unique path**, from any other folder
360 | 
361 |     cd /
362 |     more /Users/sam/Desktop/thetrial.txt
363 | 
364 | *Tip: Drag a folder or file from the Finder to the terminal and get its unique path without having to type it*
365 | 
366 | cd can be used to navigate the file system easily
367 | 
368 |     cd
369 |     # cd with no argument goes to your home folder (e.g. /Users/sam), not the root
370 |     
371 |     cd ../Documents
372 |     # .. means one level up
373 |     # goes one level up, and then down into the Documents folder, if it exists
374 |     # can be combined:
375 |     cd ../../../Desktop   #go three levels up and then into Desktop
376 |     
377 |     cd ./Desktop
378 |     # . means the folder we are currently in
379 |     
380 | 
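To see how `.` and `..` combine with the folder you are in, here is a small Python sketch that works out where a `cd` would land. It only manipulates the path as text: it does not check that the folders exist. The helper name `cd` and the folder names are hypothetical.

```python
import posixpath


def cd(current_folder, target):
    """Return the path you would end up in after `cd target`."""
    # join() ignores current_folder when target starts with "/",
    # and normpath() resolves the "." and ".." parts.
    return posixpath.normpath(posixpath.join(current_folder, target))


here = "/Users/sam/Desktop"

print(cd(here, ".."))                # one level up
print(cd(here, "../Documents"))      # up, then into Documents
print(cd(here, "./projects"))        # a folder inside the current one
print(cd(here, "../../../Desktop"))  # three levels up, then into Desktop
print(cd(here, "/"))                 # an absolute path: the root
```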
381 | **open** opens a file in its default application
382 | 
383 |     open mobydick.txt
384 |     # opens the text file in TextEdit or notepad
385 |     
386 |     open .
387 |     # opens the folder we currently are in, in the folder viewer (eg. Finder)
388 | 
389 | 
390 | **Some tricks to move the typing cursor quickly**
391 | **Shortcuts:**
392 | 
393 | - Ctrl + A: brings the cursor to the beginning of the line
394 | - Ctrl + E: brings the cursor to the end of the line
395 | - Tab: for autocomplete of commands
396 |   - “gr” + Tab: show all commands that start with gr
397 | - Cmd + D: splits screen to have multiple terminals
398 | - Cmd + N: makes a new terminal window
399 | - Cmd + T: makes a new tab
400 | 
401 | **Another terminal program**
402 | 
403 | - iTerm
404 | 
405 | 
406 | 
407 | ## Install python + text editor
408 | 
409 | **Installing python**
410 | Your computer comes with python, but we need a different version.
411 | 
412 | There are tons of ways to install python.
413 | We’re gonna use a tool called **brew** to install stuff with:
414 | https://brew.sh/
415 | 
416 | Take the main example line, copy and paste it into a Terminal, and hit enter.
417 | 
418 |     /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
419 | 
420 | “It should just work”
421 | 
422 | Once brew is installed, install python, on a terminal:
423 | 
424 |     brew install python3
425 | 
426 | 
427 | **Installing a text editor**
428 | 
429 | Doesn’t matter what text editor you use, but a few good ones
430 | 
431 | - Sublime https://www.sublimetext.com/ **paid but fast!** 
432 | - Visual Studio Code https://code.visualstudio.com/   **free/open source**
433 | - Atom https://atom.io/ **free/open source**
434 | 
435 | See [Python basics](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-02-python-basics.md) for install instructions
436 | 
437 | Text editors will color-code a python file to show you different parts.
438 | 
439 | You can also edit Python files in an **IDE** (“integrated development environment”), a full platform for programming with lots of features. For the purpose of this class we’ll stick to plain text editors.
440 | 
441 | **Using python**
442 | 
443 | python is just a command line program (a program that you can use in the Terminal)
444 | 
445 | you might have more than one python version,
446 | to use the one we’re using type **python3**
447 | 
448 | **Way ONE to use python: without arguments**
449 | 
450 | In a terminal window:
451 | 
452 |     python3
453 | 
454 | 
455 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_DB93935784C30DFE0319F4DADC3823BE454C5CF94C07DCD9BB4B5FA46EC71A23_1537288255592_image.png)
456 | 
457 | 
458 | 
459 |     >>> 2+1
460 |     3   # output
461 | 
462 | 
463 | To exit the python console, press Ctrl + D or type exit()
464 | 
465 |     Ctrl + D
466 |     
467 |     >>> exit()
468 | 
469 | 
470 | **Way TWO: next week!**
471 | 
472 | 
473 | 
474 | 
475 | ## Works to look for / Works we’re basing our work on
476 | 
477 | **Allison Parrish**
478 | http://www.decontextualize.com/
479 | 
480 | https://twitter.com/everyword
481 | a twitter bot that tweets every single word of the english language in alphabetical order
482 | 
483 | / when you make a work for this, what is it you’re doing?
484 | / closer to a performance
485 | / the lens of performance can help us understand this work
486 | 
487 | not just about the bot itself, about the reactions to the bot
488 | 
489 | / related to Claire Bishop’s reading
490 | 
491 | Responses to éclair
492 | [https://twitter.com/everyword/status/475170297776447488](https://twitter.com/everyword/status/475170297776447488)
493 | 
494 | **Nick Montfort**
495 | 256-character, one-line terminal commands that generate poetry
496 | https://nickm.com/poems/ppg256.html
497 | 
498 | **Everest Pipkin -** Cloud OCR
499 | http://ifyoulived.org/translations.html
500 | Misusing image conversion / analysis
501 | https://procedural-generation.tumblr.com/
502 | 
503 | what does the cloud say according to the computer
504 | poem
505 | 
506 | / it’s broken / a natural lifespan/limit
507 | 
508 | **Daniel Temkin - Internet Directory**
509 | http://danieltemkin.com/InternetDirectory
510 | A 37k+ page loose-leaf book containing all 115 million .COM domains in alphabetical order, along with current IP addresses.
511 | 
512 | 
513 | **Sam’s own - Patent Generator**
514 | http://lav.io/2014/05/transform-any-text-into-a-patent-application/
515 | Output: https://saaaam.s3.amazonaws.com/communist.pdf
516 | 
517 | 
518 | **Kate Compton - Tracery**
519 | http://www.tracery.io/
520 | Text generation
521 | 
522 | 
523 | / You can make tools
524 | / You can share those tools, see what other people make with it
525 | / You are making a form / with constraints
526 | 
527 | 
528 | **Kyle McDonald - Keytweeter**
529 | [https://vimeo.com/9922212](https://vimeo.com/9922212)
530 | Tweets everything you type
531 | 
532 | 
533 | 
534 | **Great book for learning python**
535 | Learn Python the Hard Way
536 | https://www.learnpythonthehardway.org/
537 | 
538 | **Other resources for learning python**
539 | Automate the Boring Stuff
540 | https://automatetheboringstuff.com/
541 | 
542 | Python for Everybody
543 | https://books.trinket.io/pfe/
544 | 
545 | 
546 | ## Assignment for next week
547 | 
548 | Look at python basics
549 | Read 
550 | 
551 | - [Python basics](https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-02-python-basics.md)
552 | - [Artificial Hells (introduction and chapter 1)](https://selforganizedseminar.files.wordpress.com/2011/08/bishop-claire-artificial-hells-participatory-art-and-politics-spectatorship.pdf) By Claire Bishop
553 | - [A User’s Guide to Détournement](http://www.bopsecrets.org/SI/detourn.htm)
554 | 
555 | 
556 | **Find 3 sentences**
557 | You’re gonna assign them to the rest of the class
558 | not too long
559 | they can come from anywhere / internet real world facebook post product packaging menu
560 | as long as you don’t write them yourself
561 | 
562 | Combine them (one after the other)
563 | either make sense together or not
564 | that creates new possibilities when put together
565 | 
566 | 
567 | 
568 | 
569 | ## WordHack this Thursday! 
570 | 
571 | [[link]](https://www.facebook.com/events/713754025655700/)
572 | WordHack is a monthly evening of performances and talks exploring the intersection of language and technology. Code poetry, digital literature, e-lit, language games, coders interested in the creative side, writers interested in new forms writing can take, all are welcome here.
573 | 
574 | This month we will feature talks and performances by:
575 | JOANNE MCNEIL ([http://www.joannemcneil.com/](http://www.joannemcneil.com/))
576 | MARTIN O'LEARY ([http://mewo2.com/](http://mewo2.com/))
577 | ESTHER SEYFFARTH ([https://user.phil.hhu.de/~seyffarth/index.html](https://user.phil.hhu.de/~seyffarth/index.html))
578 | 
579 | 
580 | ## Synchrony NYC
581 | 
582 | Synchrony NYC
583 | http://synchrony.nyc/2019/index.html
584 | Synchrony is a DEMOPARTY that begins in NEW YORK CITY, continues on an Amtrak train, and concludes in MONTREAL.
585 | 
586 | Synchrony is about being creative with computers, and seeing how computers can produce amazing sorts of animation, graphics, music, and other experiences. At the end we have COMPOS (competitions) that are voted on by those who are there at the party. Some people may work on their entries for these compos for months beforehand; some, just on the train ride up. People are welcome to enter remotely, even if they are unable to attend.
587 | 
588 | 
589 | 


--------------------------------------------------------------------------------
/class-notes/class-2.md:
--------------------------------------------------------------------------------
  1 | # Sept 25 - Python part 2. Manipulating text. Automating writing
  2 | 
  3 | **Instructor**: Sam Lavigne | [splavigne@gmail.com](mailto:splavigne@gmail.com) 
  4 | **Teaching Assistant**: Fernando Ramallo | [fernando.ramallo@gmail.com](mailto:fernando.ramallo@gmail.com) 
  5 | **Track**: Code Poetry, Fall 2018 
  6 | **Location**: School for Poetic Computation | 155 Bank St, New York, NY 10014 **Time**: Tuesdays 10am to 1pm 
  7 | **Office Hours**: Tuesdays 2pm to 4pm (or by appointment)
  8 | 
  9 | Syllabus: http://github.com/antiboredom/sfpc-scrapism
 10 | Slack channel: #2018-fall-scrapism
 11 | Sam’s office hours Sign-up sheet: [+Sam Office Hours](https://paper.dropbox.com/doc/Sam-Office-Hours-gaKmWg2Qo7jnn2FbO7F5b) 
 12 | Fernando’s office hours sign-up sheet: [+Fernando (TA) Office Hours](https://paper.dropbox.com/doc/Fernando-TA-Office-Hours-p8FxDav0hzpIjrJ4rtfeX) 
 13 | 
 14 | 
 15 | # Notes
 16 | 
 17 | 
 18 | 
 19 | Fernando gave a presentation about his work
 20 | 
 21 | - His website http://byfernando.com/
 22 | - His games https://fernandoramallo.itch.io/ (get in touch if you want a free copy of any)
 23 | 
 24 | We all went through our assignments
 25 | 
 26 | 
 27 | Get you in the mood of using language that you find around.
 28 | Juxtaposition
 29 | 
 30 | 
 31 | # Readings
 32 | 
 33 | Claire Bishop
 34 | 
 35 | - Good survey of things that have been done around reutilizing existing language
 36 | - attempts to create art that are not commodifiable, a characteristic of social art
 37 | - frequently dealing with ethical political concerns
 38 | - creating a social space, rather than making an object
 39 | - social art isn’t held to same standards as normal art, when judged. is it good art? good activism? sometimes neither. important to take note of / be aware of, when making work that’s aesthetic and activist. 
 40 |   - is making an art project the best way to achieve activist goals?
 41 |   - is doing an activist project the best way to achieve the artistic goals?
 42 |   - set your own ideas for how your work is judged / sometimes it’s not quantifiable
 43 | 
 44 | 
 45 | Detournement 
 46 | 
 47 | - different interpretations of it
 48 |   - what is the source text advocating for / in using it you’re eradicating its context
 49 |   - you’re renewing its value
 50 |   - as a practitioner, what would your desired goal be? make a new thing and destroy the old? give new value to the old through that act?
 51 | 
 52 | 
 53 | 
 54 | # Python
 55 | 
 56 | 
 57 | # See Sam’s reader with more examples here:
 58 | 
 59 | https://github.com/antiboredom/sfpc-scrapism/blob/master/reader-02-python-basics.md
 60 | 
 61 | 
 62 | 
 63 | ## Using the right version
 64 | 
 65 | when you installed python 3, it didn’t remove your old version
 66 | if you type python, sometimes it runs the **older** version that comes with Mac, not the one we’ll use
 67 | 
 68 | Depending on your settings, you might be able to type ***python*** in the terminal and get the right version, but to make sure you can type **python3**
 69 | 
 70 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_B93F4A161C5A44CDAEDBB62D1CDA4B91AEE6CCE1A00E6E4521CF0698C91A37EA_1537886949835_image.png)
 71 | 
 72 | 
 73 | 
 74 | **To exit the python console, press Ctrl + D**
 75 | 
 76 | 
 77 | ## Creating a file with the Terminal
 78 | 
 79 | On the terminal:
 80 | 
 81 | 
 82 | 1. Make a new folder with **mkdir python_lesson_1**
 83 | 2. Enter it with **cd python_lesson_1**
 84 | 3. Create a file with **touch hello.py**
 85 |   1. **touch** updates a file’s modified date if it exists, otherwise it creates it
 86 | 4. Open the file with the default editor with **open hello.py**
 87 |   1. To change the default editor: right click the file in Finder > Get Info > Change the default app in the Open With section
 88 | 
 89 | 
 90 | 
 91 | ## Writing our first program
 92 | 
 93 | **Print something to the screen:**
 94 | 
 95 | In the text editor, in hello.py:
 96 | 
 97 |     
 98 |     print("a specter is haunting europe")
 99 |     
100 | 
101 | 
102 | Hit save on your editor
103 | 
104 | Run the program to see its output.
105 | In the terminal:
106 | 
107 |     
108 |     $ python3 hello.py
109 |     
110 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_B93F4A161C5A44CDAEDBB62D1CDA4B91AEE6CCE1A00E6E4521CF0698C91A37EA_1537887306088_image.png)
111 | 
112 | 
113 | 
114 | **Expressions**
115 | 
116 | python replaces mathematical operations with the value of that operation
117 | 
118 | 
119 |     
120 |     # print mathematical operations
121 |     print( 1 + 1 )   # outputs 2
122 |     print( 5 / 33 )
123 |     print( 1 + 7 / 25 * 5 ) 
124 |     
125 | 
126 | 
127 | You can compare expressions
128 | 
129 | 
130 |     
131 |     print( 1 == 2 )  # returns True or False depending on if 1 equals 2
132 |     
133 |     print( 1 < 2 ) # less than
134 |     print( 1 > 2 ) # greater than
135 |     print( 1 >= 2 ) # greater than or equal to
136 |     print( 1 <= 2 ) # less than or equal to
137 |     print( 1 != 2 ) # not equal
138 |     
139 | 
140 | 
141 | You can **comment** parts of code out with # so they’re in your file but they don’t run
142 | 
143 | 
144 |     
145 |     print(1+2)
146 |     # print("Hello")
147 |     
148 | 
149 | Some editors let you comment the code you select with **Ctrl + /**
150 | 
151 | 
152 | You can save the value of an expression with **variables**, where you assign a name to an expression or value
153 | 
154 | 
155 |     
156 |     some_number = 100
157 |     
158 | 
159 | the value 100 is now stored in the variable some_number. We can see its value with print()
160 | 
161 | 
162 |     
163 |     print(some_number) 
164 |     # Output: 100
165 |     
166 | 
167 | There’s different **kinds** of values:
168 | 
169 | - Integer: a whole number (1, 2, 3, 5, 1000)
170 | - Float: a number with decimals (1.55345, 2.0)
171 | - String: a piece of text, defined between quotes (“hello”, “a spectre… “)
172 | - Boolean: True or False
173 | - Lists: a list of items
174 |     
175 |     some_number = 100
176 |     some_float = 10.5
177 |     some_string = "a spectre is haunting europe"
178 |     some_boolean = False
179 |     a_list = [ 1, 100, 20, 25, -305 ]   # a list of integers
180 |     # You can combine types, not a good idea but..
181 |     another_list = [ "hi", 1, 1.53242, False ]
182 |     
183 | 
184 | The most important type for us is going to be strings
185 | 
186 | 
187 | ## Strings
188 | 
189 | A string is a series of characters
190 | 
191 | we can make a variable that stores a string
192 | 
193 | we can combine variables to make new values.
194 | 
195 | If we add two strings together, it **concatenates** them
196 | 
197 | 
198 |     
199 |     first_name = "Karl"
200 |     last_name = "Marx"
201 |     
202 |     full_name = first_name + last_name
203 |     
204 |     print(full_name) # Output: KarlMarx
205 |     
206 |     # To put a space between the values
207 |     full_name = first_name + " " + last_name
208 |     print(full_name) # Output: Karl Marx
209 |     
210 | 
211 | Each character in our string has a numerical index
212 | If we want the first letter, we **access it with brackets and an index** (starting from zero)
213 | 
214 |     
215 |     first_letter = full_name[0]
216 |     second_letter = full_name[1]
217 |     
218 |     print(first_letter) #Output: K
219 |     
220 | 
221 | If we use an **index outside of the length of the string**, we get an error
222 | 
223 |     
224 |     print(full_name[1000]) # Output: IndexError: string index out of range
225 |     
226 | 
227 | 
228 | We can use **indices with negative numbers** to start at the end and walk our way backwards:
229 | 
230 | 
231 |     
232 |     # Get the last letter
233 |     last_letter = full_name[-1]
234 |     second_to_last_letter = full_name[-2]
235 |     
236 | 
237 | We can also get **ranges of characters**, this makes python very powerful for our kind of work
238 | 
239 | 
240 |     
241 |     print(full_name[0:3]) # Outputs the first three characters: Kar
242 | 
243 | 
244 | We can combine everything we've seen so far:
245 | 
246 | 
247 |     
248 |     print(full_name[4:-1]) # Gets a range from the fifth character up to (but not including) the last one
249 |     
250 | 
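To see how the index and range rules play out, here is a small sketch that prints a few slices of the same string. It only assumes the full_name value from above. Leaving out the start or the end of a range (as in `[:4]` and `[5:]`) is a Python shorthand that isn't covered in the notes above.

```python
full_name = "Karl Marx"

first_three = full_name[0:3]   # characters at index 0, 1 and 2
middle = full_name[4:-1]       # from index 4 up to, but not including, the last character
first_name = full_name[:4]     # leaving out the start means "from the beginning"
last_name = full_name[5:]      # leaving out the end means "until the end"

print(first_three)  # Kar
print(middle)       #  Mar (with a leading space, the character at index 4)
print(first_name)   # Karl
print(last_name)    # Marx
```

The end of a range is never included, which is why `[0:3]` gives three characters and `[4:-1]` stops before the final x.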
251 | 
252 | We can check for the length of a string, with **len()**
253 | 
254 | 
255 |     
256 |     total_characters = len(full_name)
257 |     
258 | 
259 | 
260 | I can determine if a string contains another string, using the **in** keyword
261 | 
262 | - my_string in another_string: return True or False
263 |     
264 |     sentence = "A spectre is haunting Europe"
265 |     
266 |     # is "spectre" inside the sentence?
267 |     print("spectre" in sentence) #Output: True
268 |     
269 |     print("specter" in sentence) #Output: False
270 |     
271 |     
272 |     # it's case-sensitive
273 |     print("Spectre" in sentence) #Output: False
274 |     
275 |     # to make the check case-insensitive, we turn it into lowercase
276 |     print("europe" in sentence) #Output: False
277 |     print("europe" in sentence.lower()) #Output: True (note: lower() doesn't modify sentence)
278 | 
279 | 
280 | **String methods** let us manipulate strings in interesting ways:
281 | 
282 | 
283 |     
284 |     sentence = "A spectre is haunting Europe"
285 |     
286 |     # Make every character upper case
287 |     print(sentence.upper()) # Outputs: A SPECTRE IS HAUNTING EUROPE
288 |     
289 |     # or lower case
290 |     print(sentence.lower()) #Outputs: a spectre is haunting europe
291 |     
292 |     # capitalize the first letter of each word
293 |     print(sentence.title()) #Outputs: A Spectre Is Haunting Europe
294 |     
295 |     # Use replace to find a word and replace it with another
296 |     print(sentence.replace("is", "was")) #Outputs: A spectre was haunting Europe
297 |     
298 |     # We can chain these operations together
299 |     print(sentence.replace("is", "was").upper()) #Outputs: A SPECTRE WAS HAUNTING EUROPE
300 |     
301 | 
302 | None of these examples **modify the original value**. If we want to actually change it, we assign the result back to the variable
303 | 
304 |     
305 |     sentence.upper() # only returns the upper case sentence, doesn't modify the variable
306 |     sentence = sentence.upper() # assigns the variable to the newer upper case version
307 |     
308 | 
309 | 
310 | You can go through **more string methods** here:
311 | https://docs.python.org/3.7/library/stdtypes.html#string-methods
312 | like center
313 | 
314 |     
315 |     sentence = sentence.center(30, "*") #puts the character * around the sentence until it's 30 characters long
316 |     
317 | 
318 | 
319 | You can also do fun things like **multiplication**
320 | 
321 |     
322 |     hello = "Hello" * 100
323 |     print(hello) # Outputs: HelloHelloHelloHelloHello ...
324 |     
325 |     hello = "hello" + "o" * 100
326 |     print(hello) # Outputs: helloooooooooooooo
327 |     
328 |     hello = "he" + "l" * 1000 + "o"
329 |     
330 |     
331 | 
332 | 
333 | We can **combine different types**, but there are different ways
334 | 
335 | The bad way:
336 | 
337 |     
338 |     number = 10
339 |     message = "The number is " + number
340 |     # This raises a TypeError: Python 3 won't concatenate a str and an int
341 |     
342 | 
343 | The OK way, convert a number to a string:
344 | 
345 |     
346 |     message = "The number is " + str(number)
347 |     
348 | 
349 | The better way, especially if you have lots of values: use format(). It replaces each {} with the corresponding value
350 | 
351 |     
352 |     # one value
353 |     message = "The number is {}".format(number)
354 |     
355 |     # two values
356 |     message = "The number is {} and the 2nd number is {}.".format(number, 100)
357 |     print(message) # Outputs: The number is 10 and the 2nd number is 100.
358 |     
359 | 
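A side-by-side sketch of the ways to mix numbers into strings. The price variable is made up for illustration, and the last form, the f-string, is not covered in these notes: it is an alternative that needs Python 3.6 or later.

```python
number = 10
price = 3.14159

# str() converts a single value so it can be concatenated
message_a = "The number is " + str(number)

# format() fills each {} in order, whatever the type of the value
message_b = "The number is {} and the price is {}".format(number, price)

# {:.2f} asks format() to round a float to two decimals
message_c = "The price is {:.2f}".format(price)

# f-strings (Python 3.6 and later) put the variable names directly inside the braces
message_d = f"The number is {number}"

print(message_a)  # The number is 10
print(message_b)  # The number is 10 and the price is 3.14159
print(message_c)  # The price is 3.14
print(message_d)  # The number is 10
```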
360 | 
361 | ## Make the computer say it
362 | 
363 | In the terminal
364 | 
365 |     
366 |     python3 strings.py | say
367 |     
368 | 
369 | 
370 | 
371 | ## Save the output of a python file to a text file from the terminal
372 | 
373 | 
374 |     
375 |     python3 strings.py > strings.txt
376 |     
377 | 
378 | 
379 | ## Lists
380 | 
381 | Make an empty lists.py file
382 | In the terminal:
383 | 
384 |     touch lists.py
385 |     open lists.py
386 | 
387 | 
388 | A lot of the things we did with strings (len(), indexes, ranges, adding, multiplying) also work with lists.
389 | 
390 | 
391 |     # Declaring a list
392 |     names = ["Marx", "Trotsky", "Lenin", "Engels"]
393 |     
394 |     # Get the length with len(names)
395 |     print("Total names: ", len(names)) # Outputs: Total names:  4
396 |     
397 |     # Add items to a list
398 |     names.append("Stravinsky")
399 |     
400 |     # We can declare an empty list
401 |     some_list = []
402 |     
403 |     # You can multiply a list
404 |     print(names * 10) # Outputs a list with the content of the names list 10 times
405 |     
406 |     # You can add lists together
407 |     print(names + some_list)
408 |     
409 |     # You can access individual items by their index starting with zero
410 |     print(names[0]) # First item
411 |     print(names[-1]) # Last item
412 |     print(names[0:3]) # A list with the first three items (the item at index 3 is not included)
413 |     
414 |     
415 | 
416 | 
417 | We can go through every item in our list, called **iteration,** using the **for** keyword
418 | 
419 |     
420 |     # declare a variable name first, then the list we're going through second
421 |     # it'll temporarily store each of the values in the variable name
422 |     for name in names:
423 |       print(name)
424 |     # Outputs: calls print for every item, outputs its value
425 | 
426 | In other languages a *block* is defined with brackets { }, but in python it’s defined by **white space, using indentation**
427 | Anything that shares the same indentation (e.g. a Tab), is part of the same block
428 | 
429 |     
430 |     for name in names:
431 |       print(name)
432 |       print("is a dead white guy") # also inside the loop
433 |       
434 |       print("and so is:")  # Still inside the loop
435 |       
436 |     print("That's all the dead white guys in our list")  # Outside of the loop
437 |       
438 | 
439 | 
440 | 
441 | 
442 | ## More 
443 | 
444 | We’ll grab Kafka’s Metamorphosis from Project Gutenberg
445 | https://www.gutenberg.org/cache/epub/5200/pg5200.txt
446 | 
447 | Save it as kafka.txt, next to our python script
448 | 
449 | We’ll read the text file and store it as a variable
450 | In our python script:
451 | 
452 |     
453 |     text = open("kafka.txt").read() # the name of the file, relative to the folder you run the script from
454 |     
455 |     print(text) # Outputs: the entire text
456 |     
457 |     
458 | 
459 | Now we can do stuff with it
460 | 
461 | 
462 |     
463 |     print(text.upper())
464 |     
465 | 
466 | 
467 | To read the file line by line, we use readlines() instead of read()
468 | 
469 |     
470 |     text = open("kafka.txt").readlines()
471 |     # text is now a list of string items, with each line from the file
472 |     
473 | 
474 | Now we can iterate over the lines
475 | 
476 | 
477 |     
478 |     for line in text:
479 |       print(line) #Outputs each line
480 |       
481 | 
482 | The problem: it’s printing a blank line in between each line.
483 | This is because each line from the file still ends with an invisible newline character, and print() adds a second one
484 | We can get rid of that with strip()
485 | 
486 | 
487 |     
488 |     for line in text:
489 |       line = line.strip()
490 |       print(line) # Outputs each line without whitespace or extra line breaks
491 |       
492 | 
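To make the newline character visible, this sketch uses repr(), which shows a string with its special characters written out. The two-item list stands in for what readlines() returns, and the sample text is made up for illustration.

```python
# Stand-in for the list readlines() gives us (each line still ends with "\n")
lines = ["One morning\n", "  Gregor Samsa woke\n"]

stripped = []
kept_indent = []
for line in lines:
    print(repr(line))                     # shows the hidden \n at the end
    stripped.append(line.strip())         # removes whitespace from BOTH ends
    kept_indent.append(line.rstrip("\n")) # removes only the newline at the end

print(stripped)     # ['One morning', 'Gregor Samsa woke']
print(kept_indent)  # ['One morning', '  Gregor Samsa woke']
```

Note that strip() also removes spaces at the start of a line. If the indentation matters for your text, rstrip("\n") is one way to keep it.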
493 | 
494 | Each of the lines is a string, so we can print parts of each line
495 | 
496 | 
497 |     
498 |     for line in text:
499 |       line = line.strip()
500 |       print(line[0:4])
501 |     
502 |     # Output is the first four characters of each line
503 |     
504 | 
505 | Or do fun stuff like replacing
506 | 
507 | 
508 |     
509 |     for line in text:
510 |       line = line.strip()
511 |       print(line.replace('e', 'eeeeeee'))
512 |       
513 | 
514 | 
515 | 
516 | ## Processing text
517 | 
518 | We’re gonna use a method called split() to break down a string according to a delimiter character.
519 | You can use split() to turn a string into a list, cutting it at each occurrence of that character
520 | You can use join() to join a list back into a string
521 | 
522 |     
523 |     for line in text:
524 |       line = line.strip()
525 |       words = line.split(" ") # Separates the lines by an empty space, getting a list of words
526 |       
527 |       print(words[0]) # Outputs the first word of each sentence
528 |       
529 |       # Chain it all together!
530 |       print(words[0].center(30, '~').upper())
531 |       
532 | 
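One detail worth knowing: split(" ") cuts at every single space, so double spaces and empty lines produce empty strings in the list. Calling split() with no argument cuts at any run of whitespace instead. A small sketch with a made-up sample line:

```python
line = "One  morning, as Gregor Samsa"  # note the double space after "One"

by_space = line.split(" ")    # cuts at every single space
by_whitespace = line.split()  # cuts at any run of whitespace

print(by_space)       # ['One', '', 'morning,', 'as', 'Gregor', 'Samsa']
print(by_whitespace)  # ['One', 'morning,', 'as', 'Gregor', 'Samsa']

# An empty line behaves differently too
print("".split(" "))  # [''] (a list with one empty string)
print("".split())     # [] (an empty list)

# join() is the reverse of split(): it glues a list back into one string
rejoined = "-".join(by_whitespace)
print(rejoined)       # One-morning,-as-Gregor-Samsa
```

With split(" "), words[0] on an empty line is an empty string. With split(), the list is empty and words[0] would raise an IndexError, so check the list before indexing it.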
533 | 
534 | We can use the **random** methods to do interesting stuff
535 | 
536 | Sometimes you have to tell python to add **modules** with the **import** keyword to add functionality you need. Here we’ll import the [random module](https://docs.python.org/3.5/library/random.html). 
537 | 
538 | - Use the documentation to find what you can do with a module
539 | - Make sure you’re seeing the documentation of the python version you’re using (e.g. 3.5)
540 |     # Import the module
541 |     import random
542 |     
543 |     text = open("kafka.txt").readlines()
544 |     
545 |     for line in text:
546 |       line = line.strip()
547 |       words = line.split(" ")
548 |       
549 |       random_word = random.choice(words)  #Get a random item from the word list
550 |       
551 |       random.shuffle(words) # Randomizes the order of the items in the list
552 |     
553 | 
554 | 
555 | We use the join() method to join the randomized word list into a string
556 | 
557 |     
558 |     
559 |     for line in text:
560 |       line = line.strip()
561 |       words = line.split(" ")
562 |       random.shuffle(words)
563 |       
564 |       new_line = " ".join(words) # Joins each element in the list by sticking the space character in between the words, outputs a string
565 |       
566 | 
567 | 
568 | We can sort with sorted()
569 | 
570 | 
571 |     
572 |     for line in text:
573 |       line = line.strip()
574 |       words = line.split(" ")
575 |       random.shuffle(words)
576 |       
577 |       words = sorted(words) # Sort the words list alphabetically
578 |       
579 |       new_line = " ".join(words)
580 |       
581 | 
582 | 
583 | Final script
584 | 
585 |     # Import the module
586 |     import random
587 |     
588 |     text = open("kafka.txt").readlines()
589 |     for line in text:
590 |       line = line.strip()
591 |       words = line.split(" ")
592 |       random.shuffle(words)
593 |       words = sorted(words)
594 |       new_line = " ".join(words)
595 |       print(new_line)
596 | 
597 | 
598 | ## List comprehension
599 | 
600 | Make a new file comps.py
601 | 
602 | 
603 | We can make a list of upper case’d items
604 | 
605 |     names = ["Trotsky", "Marx", "Lenin", "Engels"]
606 |     
607 |     uppercase_names = []
608 |     for name in names:
609 |       uppercase_names.append(name.upper())
610 |     
611 |     
612 | 
613 | There’s a handier way of doing this in python, called **list comprehension.**
614 | This does the same thing as the example above
615 | 
616 |     names = ["Trotsky", "Marx", "Lenin", "Engels"]
617 |     
618 |     uppercase_names = [name.upper() for name in names]
619 |     
620 | 
621 | It’s saying: for every value in the list **names** temporarily store it as a variable **name**, make that upper case and store it in a new list called **uppercase_names**
622 | 
623 | 
624 |     
625 |     names = [name.replace('r', 'arrrrr') for name in names]
626 |     
627 | 
628 | We can filter too, by adding **if statements** inside too:
629 | 
630 |     
631 |     names = [name for name in names if name[0] == "L"]
632 |     # returns elements inside of the list whose first letter is L (the check is case-sensitive)
633 |     
634 | 
635 | 
636 | We can add this filtering technique to the words in our previous example
637 | 
638 |     import random
639 |     
640 |     text = open("kafka.txt").readlines()
641 |     for line in text:
642 |       line = line.strip()
643 |       words = line.split(" ")
644 |     
645 |       words = [word for word in words if word.startswith("a")]
646 |     
647 |       new_line = " ".join(words)
648 |         
649 |       print(new_line)
650 |       # prints all the words that start with a  
651 | 
652 | OR more:
653 | 
654 |       words = [word for word in words if len(word) > 5]
655 |       # all the words that have more than 5 characters in them
656 | 
657 | 
658 |       words = [word for word in words if word.endswith("ing")]
659 |       # all the words that end in ing
660 | 
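A comprehension can transform and filter at the same time. This sketch uses a made-up sentence instead of the Kafka file so it runs on its own, and it writes the same operation as a regular loop for comparison.

```python
words = "a spectre is haunting europe and all the powers are trembling".split()

# Keep only the words ending in "ing", and upper case the ones we keep
shouted = [word.upper() for word in words if word.endswith("ing")]
print(shouted)  # ['HAUNTING', 'TREMBLING']

# The same thing written as a regular loop
shouted_loop = []
for word in words:
    if word.endswith("ing"):
        shouted_loop.append(word.upper())

print(shouted == shouted_loop)  # True

# len(word) > 5 keeps words with 6 or more characters
long_words = [word for word in words if len(word) > 5]
print(long_words)  # ['spectre', 'haunting', 'europe', 'powers', 'trembling']
```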
661 | 
662 | 
663 | # Assignment for next week
664 | 
665 | Also available in: https://github.com/antiboredom/sfpc-scrapism
666 | 
667 | Transform a non-poetic text into a poetic text
668 | 
669 | - up to you to determine what’s poetic
670 | 
671 | Read some file, or if the text is short you can just put that text directly into python as a variable
672 | 
673 | If you don’t know what to do, try stuff like sorting, randomizing, replacing, deleting things
674 | 
675 | By taking something that exists and using these methods we can reformat it and rework it. You can use whatever is at your disposal. You’re not bound by the command line, so you can take the output of that text and you’re welcome to format it into something interesting, put it into openFrameworks, whatever you want to do
676 | 
677 | Take something that exists, do something that transforms it.
678 | 
679 | If you’re more advanced, you can start to get into using third party libraries to analyze text.
680 | If you’re feeling ambitious, make this program so that it can deal with any text. Make this poetic operation so it can work with any text that you feed it.
681 | 
682 | 
683 | 


--------------------------------------------------------------------------------
/class-notes/class-3.md:
--------------------------------------------------------------------------------
  1 | # 10/02 - Dictionaries, scraping the web
  2 | 
  3 | 
  4 | 
  5 | # Dictionaries
  6 | 
  7 | List = collection of items ordered numerically
  8 | Dictionary = collection of items indexed by a key (usually a String) instead of by position
  9 | 
 10 | 
 11 | 
 12 | On the terminal
 13 | Make a new file and open it
 14 | 
 15 |     
 16 |     $ touch dicts.py
 17 |     $ open dicts.py
 18 | 
 19 | 
 20 | **Dictionaries are Key and Value pairs**
 21 | They’re used to represent structures of data
 22 | 
 23 | In python, you define dictionaries with curly brackets { }
 24 | 
 25 |     
 26 |     person = { } # empty dictionary
 27 |     
 28 |     person = { "first_name": "Karl", "last_name": "Marx", "age": 235 }
 29 |     
 30 |     # An easier way to look at it:
 31 |     person = { 
 32 |       "first_name": "Karl", 
 33 |       "last_name": "Marx", 
 34 |       "age": 235 
 35 |     }
 36 |   
 37 | 
 38 | “first_name” is the **Key**, “Karl” is the **value**
 39 | 
 40 | the values can be of any type: int, float, boolean, Strings, or even other dictionaries
 41 | 
 42 | **Dictionaries can contain any type, including dictionaries and lists**
 43 | 
 44 |     
 45 |     person = { 
 46 |       "first_name": "Karl", 
 47 |       "last_name": "Marx", 
 48 |       "age": 235,
 49 |       "pet": {
 50 |         "name": "Proleterry",
 51 |         "species": "parrot",
 52 |         "age": 12
 53 |       },
 54 |       "favorite_books": ["Ethics", "Twilight"]
 55 |     }
 56 |     
 57 | 
 58 | 
 59 | You’ll want to do things with values in the dictionary
 60 | 
 61 | 
 62 | ## Getting values
 63 | 
 64 | **You can get a value from a dictionary using brackets and accessing the key**
 65 | The key has to be exactly the name of the key, e.g. first_name
 66 | If it doesn’t exist, an error halts the program
 67 | 
 68 |     
 69 |     # 1. access the value using brackets by referencing the key
 70 |     print( person["first_name!"] ) #Outputs: KeyError, there is no key named first_name!
 71 |     
 72 |     print( person["first_name"] ) #Outputs: Karl
 73 |     
 74 | 
 75 | **A safer way is to use the get method**
 76 | It returns None instead of raising an error if the key isn’t present
 77 | 
 78 |     
 79 |     name = person.get("first_name")
 80 |     
 81 | 
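get() also accepts a second argument: a default value to return when the key is missing. A short sketch, using a trimmed-down version of the person dictionary (the "n/a" default is just an example value):

```python
person = {"first_name": "Karl", "last_name": "Marx"}

first = person.get("first_name")
missing = person.get("middle_name")
with_default = person.get("middle_name", "n/a")

print(first)         # Karl
print(missing)       # None, because the key doesn't exist
print(with_default)  # n/a, the default we passed in

# The in keyword checks whether a key exists before we use it
if "middle_name" in person:
    print(person["middle_name"])
else:
    print("no middle name")
```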
 82 | Sometimes dictionaries will have nested values, like a list and dictionaries, so you’ll **iterate** through the values
 83 | 
 84 |     
 85 |     for book in person["favorite_books"]:
 86 |       print(book)
 87 |       
 88 | 
 89 | 
 90 | ## You can iterate through a dictionary 
 91 | 
 92 | and get all its properties
 93 | 
 94 |     
 95 |     for key in person:
 96 |       print(key) # prints all the keys
 97 |       print(person[key]) # prints all the values
 98 |       
 99 | 
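Dictionaries also have an items() method that hands you the key and the value together on each pass of the loop, which saves the person[key] lookup. A sketch with a smaller dictionary:

```python
person = {"first_name": "Karl", "last_name": "Marx", "age": 235}

lines = []
for key, value in person.items():
    line = "{}: {}".format(key, value)
    lines.append(line)
    print(line)

# keys() and values() give you just one side of each pair
print(list(person.keys()))    # ['first_name', 'last_name', 'age']
print(list(person.values()))  # ['Karl', 'Marx', 235]
```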
100 | 
101 | ## Adding and modifying the dictionary
102 | 
103 | Accessing a key and modifying its value will override the value for that key:
104 | 
105 |     
106 |     # replaces the value for first_name
107 |     person["first_name"] = "Lenin"
108 |     
109 | 
110 | If the key doesn’t exist, you can create it and assign a value
111 | 
112 |     
113 |     person["middle_name"] = "Terry"
114 |     # now there's a new key middle_name with value Terry
115 |     
116 | 
117 | 
118 | 
119 | 
120 | # Intro to HTML
121 | 
122 | HTML is the markup language that web pages are written in.
123 | 
124 | 
125 | ## Tags
126 | 
127 | Works as a series of **tags**
128 | A tag looks like
129 | 
130 |     <tagname>some stuff</tagname>
131 | 
132 | The beginning of the tag, the contents of it, and the closing of a tag
133 | 
134 | There’s different types for different things
135 | 
136 | - \<p\> paragraph
137 | - \<strong\> makes text bold
138 |   - this text is normal and \<strong\>this text is bold\</strong\>
139 | - \<a\> makes a link
140 |   - \<a href="http://www.google.com"\>go to google\</a\>
141 | - \<h1\> makes a header
142 |   - \<h1\>My Header\</h1\>
143 | - \<div\> represents a generic division of the page
144 |   - \<div\>I’m a div\</div\>
145 | 
146 | 
147 | ## Attributes
148 | 
149 | Each tag can have a series of attributes, a set of **key** and **value** pairs
150 | The two most important ones for scraping are
151 | 
152 | - **id** attribute 
153 |   - gives a unique identifier to a particular tag
154 |     - \<p id="the-most-important-paragraph"\>Hi I’m very important\</p\>
155 |   - an id can only be applied to one tag
156 | - **class** attribute
157 |   - designates a category of tag, that the author of the page uses to find or group
158 |   - you can have multiple tags with the same class
159 |     - \<p class="moderately-important"\>I am somewhat important\</p\>
160 |     - \<p class="moderately-important"\>I am also somewhat important\</p\>
161 | 
162 | 
163 | ## Specific attributes
164 | 
165 | There’s some attributes that can only be applied to certain tags
166 | 
167 | - **href** is applied to \<a\> to indicate where to go when you click on a link
168 |   - \<a href="http://www.google.com"\>google\</a\>
169 | - **src** is applied to \<img\> to indicate which image to show
170 |   - \<img src="logo.png"\>
171 | 
172 | 
173 | ## Structure
174 | 
175 | A web page looks like this
176 | 
177 |     
178 |     <html>
179 |       <head>
180 |         <title>My page title</title>
181 |       </head>
182 |       <body>
183 |         <h1>Hello i am header</h1>
184 |     
185 |         <p>a paragraph</p>
186 |         
187 |       </body>
188 |     </html>
189 | 
190 | 
191 | ## CSS
192 | 
193 | CSS stands for Cascading Style Sheets.
194 | Just know that CSS is used to apply style to a page,
195 | so the HTML stays the same for the content but the CSS indicates text color, sizes, etc.
196 | 
197 | A CSS rule is made of a selector, which references a part of the page, and curly brackets that contain the style
198 | 
199 | A CSS style sheet looks like this
200 | 
201 |     /* this sets all the p tags to have a red border */
202 |     p {
203 |       border: 1px solid red;
204 |     }
205 |     
206 | 
207 | Different selectors
208 | 
209 |     
210 |     /* style all the p tags and all the strong tags */
211 |     p, strong {
212 |     }
213 |     
214 |     /* style all the <a> tags inside all the <p> tags */
215 |     p a {
216 |     
217 |     }
218 |     
219 |     /* style everything with a certain class name, preceded by a period */
220 |     .moderately-important {
221 |     
222 |     }
223 |     
224 |     /* style an id, using #. e.g. style this <p id="logo">logo text</p> */
225 |     #logo {
226 |     
227 |     }
228 |     
229 |     /* style the <a> tags inside <p> tags, but only if they're a certain class */
230 |     p a.moderately-important {
231 |     
232 |     }
233 |     
234 | # Web scraping
235 | 
236 | 
237 | Open Chrome
238 | 
239 | 
240 | ## View source 
241 | 
242 | go to a website
243 | e.g. https://newyork.craigslist.org/d/antiques/search/ata
244 | 
245 | Right click \> View Source
246 | to see the source code
247 | 
248 | 
249 | 
250 | ## See source code for specific elements
251 | 
252 | Right click \> Inspect
253 | 
254 | Highlights the part of the website as you hover over the source code.
255 | 
256 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_D67961504E4563B95DDF29A2542D190EAEAA0F940919FBD5CB2C6591C6D3326E_1538491358282_image.png)
257 | 
258 | 
259 | 
260 | 
261 | 
262 | ## To scrape you want to figure out how to find a certain element
263 | 
264 | We right click a header and inspect the structure of the page where that element is.
265 | 
266 | We see that it’s a specific **class**, so we can find all elements of that class to see if that gives us all the headers.
267 | 
268 | We right click a craigslist header and find:
269 | 
270 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_D67961504E4563B95DDF29A2542D190EAEAA0F940919FBD5CB2C6591C6D3326E_1538491758062_image.png)
271 | 
272 | 
273 | we see that its class attribute says **result-title,** and that it’s inside an **\<a\>** tag
274 | so we’ll try to find **all the \<a\> tags with the result-title class, to find all the headers**
275 | 
276 | ## Testing inside the browser
277 | 
278 | You can quickly find elements inside the browser using the **Console.**
279 | 
280 | You can use document.querySelectorAll(), which takes one argument: a CSS selector
281 | 
282 | 
283 |     
284 |     document.querySelectorAll("h2") // finds all the h2 tags
285 |     
286 | 
287 | 
288 | 
289 | 
290 | ## Getting the CSS selector for an element automatically
291 | 
292 | In the Elements panel (the one Inspect opens):
293 | Right click the element \> Copy \> Copy selector
294 | gets you the CSS selector
295 | **but this only helps sometimes**
296 | 
297 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_D67961504E4563B95DDF29A2542D190EAEAA0F940919FBD5CB2C6591C6D3326E_1538491844671_image.png)
298 | 
299 | 
300 | 
301 | 
302 | # How do we translate this into Python and make it automatic?
303 | 
304 | We’ll use a library called requests-html
305 | 
306 | 
307 | - Documentation, How Tos
308 |   - https://html.python-requests.org/
309 |   
310 | 
311 | It’s a library and you can scrape HTML pages with it
312 | 
313 | To scrape a page:
314 | 
315 | - First you download the page, then you convert it into a python data structure you can manipulate
316 | - Getting the HTML involves downloading the page and getting all the text
317 | - The second part is called **parsing**, going through the text and getting data from it
318 | 
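To make the parsing step concrete without installing anything, here is a sketch of parsing alone, using html.parser from Python's standard library. This is not how requests-html works internally, and the TitleParser class and the sample HTML are made up for illustration. The point is only that parsing turns a string of HTML into data you can loop over.

```python
from html.parser import HTMLParser

# A made-up snippet standing in for a downloaded page
page = """
<ul>
  <li><a class="result-title" href="/post/1">Antique lamp</a></li>
  <li><a class="other" href="/about">About</a></li>
  <li><a class="result-title" href="/post/2">Old radio</a></li>
</ul>
"""

class TitleParser(HTMLParser):
    """Collects the text and href of every <a class="result-title"> tag."""

    def __init__(self):
        super().__init__()
        self.inside_title = False
        self.current_href = None
        self.results = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        # Simplification: this only matches tags whose class is exactly "result-title"
        if tag == "a" and attributes.get("class") == "result-title":
            self.inside_title = True
            self.current_href = attributes.get("href")

    def handle_data(self, data):
        if self.inside_title:
            self.results.append({"text": data, "href": self.current_href})

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside_title = False

parser = TitleParser()
parser.feed(page)

for result in parser.results:
    print(result["text"], result["href"])
```

A library like requests-html does this kind of bookkeeping for you, so finding elements takes one CSS selector instead of a whole class.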
319 | 
320 | ## Installing the library
321 | 
322 | You can install libraries in self-contained environments, or globally. 
323 | 
324 | On the terminal:
325 | 
326 |     $ pip3 install requests-html
327 | 
328 | 
329 | 
330 | 
331 | ## Using it
332 | 
333 | On a new python file
334 | 
335 |     
336 |     #import the library
337 |     from requests_html import HTMLSession
338 |     
339 |     # create a new session
340 |     session = HTMLSession()
341 |     
342 |     # open a website
343 |     r = session.get("https://newyork.craigslist.org/d/missed-connections/search/mis")
344 |     
345 |     print(r) # prints the response status, e.g. <Response [200]> if it was able to open the page
346 |     
347 |     
348 | 
349 | **You can find items in the page using css selectors**
350 | 
351 |     
352 |     titles = r.html.find(".result-title")
353 |     
354 |     for title in titles:
355 |       print(title) # prints out the entire tag
356 |       
357 |       print(title.text) # prints out the text inside the tag
358 |       
359 |       
360 | 
361 | **You can access attributes of the items in the page**
362 | 
363 |     
364 |     for title in titles:
365 |       print(title.attrs["href"]) # gets the URL in the href attribute
366 |       
367 |       print(title.attrs.get("href")) # a safer way, since some tags might not have the attribute
368 |       
369 | 
370 | **We want to also get the description, so we can tell the computer to click on a link and get a part of that other page**
371 | 
372 | **We also want to get a single element of a page**
373 | 
374 |     
375 |     
376 |     titles = r.html.find(".result-title")
377 |     for title in titles:
378 |       url = title.attrs.get("href")
379 |       name = title.text
380 |       
381 |       # open the URL we found
382 |       r = session.get(url)
383 |       # we found that the part of the article page we want has the id "postingbody"
384 |       # so we get the part of the page with the id (ids are prefaced by #)
385 |       content = r.html.find("#postingbody", first=True)
386 |       
387 |       if content is not None: # only if we found something
388 |         print(content.text)
389 |       
390 |       # without first=True we'd get a list, and content.text would throw an error
391 |       content = r.html.find("#postingbody")
392 |       
393 | 
394 | **Errors from r.html.find()**
395 | You might get an error for a few reasons
396 | 
397 | - It couldn’t find that element: with first=True, find() then returns None, and asking for content.text raises an error
398 | 
399 | **Slow down your requests with the time module so you don’t get banned**
400 | 
401 |     
402 |     # at the top of the page, import the time module
403 |     import time
404 |     
405 |     # use the sleep method to stop the script
406 |     for title in titles:
407 |       time.sleep(0.2) # stop the script for 0.2 seconds
408 |       #...
409 |       
410 | 
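A sketch of the same idea with a slightly random pause, so the requests don't arrive at a perfectly regular rhythm. The polite_pause name, the stand-in list of URLs and the delay values are all made up. Pick delays that suit the site you're scraping.

```python
import random
import time

def polite_pause(shortest=0.2, longest=0.5):
    """Sleep for a random time between shortest and longest seconds."""
    delay = random.uniform(shortest, longest)
    time.sleep(delay)
    return delay

# Stand-in for the list of links we'd loop through
urls = ["page-1", "page-2", "page-3"]

delays = []
for url in urls:
    # ... download and parse the page here ...
    delays.append(polite_pause(0.01, 0.05))  # tiny values so the demo runs quickly

print(delays)
```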
411 | 
412 | 
413 | ## Full script
414 |     import time
415 |     from requests_html import HTMLSession
416 |     
417 |     session = HTMLSession()
418 |     r = session.get("https://newyork.craigslist.org/d/missed-connections/search/mis")
419 |     
420 |     titles = r.html.find(".result-title")
421 |     for title in titles:
422 |       url = title.attrs.get("href")
423 |       name = title.text
424 |       
425 |       r = session.get(url)
426 |       content = r.html.find("#postingbody", first=True)
427 |       
428 |       if content is not None: # only if we found something
429 |         print(content.text)
430 |       
431 |       time.sleep(0.2)
432 |       
433 | 
434 | 
435 | 
436 | ## Other ways of parsing a page
437 | 
438 | Instead of requesting each page yourself, you can have a for loop that goes through the html object
441 | 
442 | - It’ll use **intelligent pagination** where it automatically looks for “next” links, and gives you all the subsequent pages, e.g for search results
443 | - This is easier sometimes
444 |     
445 |     for html in r.html:
446 |       titles = html.find(".result-title")
447 |       
448 |       for title in titles:
449 |         print(title)
450 |       
451 | 
452 | 
453 | 
454 | ## Getting multiple items
455 | 
456 | In Alibaba search results, we might want several elements of each post (like the title and the price). Instead of finding each kind of element separately, we can get the whole post and then look inside it.
457 | 
458 | alibaba.py
459 | 
460 |     
461 |     from requests_html import HTMLSession
462 |     session = HTMLSession()
463 |     
464 |     r = session.get("https://www.alibaba.com/trade/search?fsb=y&IndexArea=product_en&CatId=&SearchText=drugs")
465 |     
466 |     
467 | 
468 | We inspect the title we want
469 | 
470 | 
471 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_D67961504E4563B95DDF29A2542D190EAEAA0F940919FBD5CB2C6591C6D3326E_1538495567641_image.png)
472 | 
473 | 
474 | we go up the hierarchy, hovering over the code and find what fills up the entire post
475 | 
476 | we find \<div class="item-content"\> contains the entire box
477 | 
478 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_D67961504E4563B95DDF29A2542D190EAEAA0F940919FBD5CB2C6591C6D3326E_1538495625813_image.png)
479 | 
480 | 
481 | so we get the box on our python script
482 | 
483 |     
484 |     # find the box by using the class we found
485 |     items = r.html.find(".item-content")
486 |     
487 |     for item in items:
488 |       # now we have the whole item
489 |       print(item) # prints <Element 'div' class=('item-content',)> showing we're getting an element
490 |       
491 |       # we find the price class in the item
492 |       price = item.find(".price", first=True).text
493 |       # we find the title
494 |       title = item.find(".title", first=True).text
495 |       
496 |       print(title + " costs " + price) # prints Pain Pills Raw... costs $1.00
497 |       
498 | 
499 | 
500 | 
501 | 
502 | ## Downloading images
503 | 
504 | https://github.com/antiboredom/detourning-the-web-2018/blob/master/week_04/shutterstock.py
505 | 
506 | 
507 |     import requests
508 |     
509 |     def download_file(url):
510 |         local_filename = url.split('/')[-1]
511 |         # NOTE the stream=True parameter
512 |         r = requests.get(url, stream=True)
513 |         with open(local_filename, 'wb') as f:
514 |             for chunk in r.iter_content(chunk_size=1024): 
515 |                 if chunk: # filter out keep-alive new chunks
516 |                     f.write(chunk)
517 |                     #f.flush() commented by recommendation from J.F.Sebastian
518 |         return local_filename
519 |     
520 | 
521 | 
522 | 
523 | 
524 | ## Some websites have barebones HTML without the content
525 | 
526 | If the HTML is barebones (like Facebook), that means the content is loaded AFTER the HTML is downloaded
527 | 
528 | To help with that we can use the render() function to grab the full text of the page the way you’d see it in Chrome
529 | 
530 |     r.html.render()
531 | 
532 | 
533 | 
534 | # Homework
535 | 
536 | Make a big list using this technique
537 | 
538 | of whatever you want
539 | 
540 | There should be a reason to make that big list (a poetic, political, satirical, surrealist reason)
541 | 
542 | You’re welcome to manipulate that list in some way
543 | 
544 | 
545 | 
546 | # Next week
547 | 
548 | More advanced tools for analyzing language
549 | natural language processing
550 | 
551 | if you want a head start, the libraries to look at are:
552 | 
553 |   - TextBlob https://textblob.readthedocs.io/en/dev/
554 |   - Spacy https://spacy.io/
555 | 
556 | 
557 | 
558 | 
559 | # More resources
560 | 
561 | **Understanding Word Vectors by Allison Parrish**
562 | [https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469](https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469)
563 | 
564 | 
565 | 
566 | 
567 | 


--------------------------------------------------------------------------------
/class-notes/class-4.md:
--------------------------------------------------------------------------------
  1 | # 10/09 - Natural Language Processing
  2 | 
  3 | Raise your hand if you’ve had problems scraping
  4 | 
  5 | Scraping is more art than science
  6 | 
  7 | We’ll see other ways of scraping you can try
  8 | 
  9 | We’ll get into basics of natural language processing, using TextBlob, maybe a bit of SpaCy.
 10 | 
 11 | Who wants to share their lists?
 12 | 
 13 | - Elizabeth scraped foundation colors and sorted them
 14 | - Edgardo showed their Playlist Of The Chilean Dictatorship, scraping Billboard’s Top 1 hit during the dictatorship years
 15 | - Tomoya scraped Youtube and sorted thousands of thumbnails of recommended videos
 16 | - Tim scraped pictures of Plan-B
 17 | 
 18 | 
 19 | 
 20 | ## Problem: Scraping google images, would return HTML without my content
 21 | 
 22 | Things to try:
 23 | 
 24 | **View source**
 25 | 
 26 | **Turn off Javascript**
 27 | 
 28 | - Turn off Javascript, then load a page
 29 |   - you’ll get a legacy page in simple HTML, sometimes for certain sites
 30 | 
 31 | **Use the Network tools**
 32 | 
 33 | - View > Developer Tools > Network
 34 | - You can see all the network requests from the browser to the server
 35 |   - So you can see, eg. images that are being loaded
 36 |   
 37 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_EF94E54173BB9AD30672FE5689D53FCD672AC2999755231FC9590FA02D6C9AFF_1539096267653_image.png)
 38 | 
 39 | - You can get links from there, and content in JSON format
 40 | - Sometimes it’s easier than parsing the HTML itself
 41 | 
 42 | For getting results of a search:
 43 | 
 44 | 1. Look for what looks like it has results, filter by XHR
 45 | 2. right click > Copy > Copy link address
 46 |   1. sometimes will give you a link to a JSON with all the results
 47 |     - a Chrome extension for Beautifying JSON: JSON Formatter
 48 | 3. You can hit Next, see what the new URL is, see what changed, maybe you can see how to get different pages
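Step 3 can be sketched in Python: once you have spotted which query parameter changes when you hit Next, you can generate the URL for any page. This is a minimal sketch using only the standard library. The example URL and the parameter name `page` are made up for illustration, so swap in whatever you actually see in the Network tab.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


def with_page(url, page, param="page"):
    """Return url with the query parameter `param` set to `page`."""
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[param] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# hypothetical search URL copied from the Network tab
base = "https://example.com/api/search?q=umbrellas&page=1"

for page in range(1, 4):
    print(with_page(base, page))
```

Each printed URL can then be passed to `requests.get(...)` in a loop, ideally with a short `sleep` between requests.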
 49 | 
 50 | **Read the JSON in python:**
 51 | 
 52 |     import requests
 53 |     
 54 |     r = requests.get("....... crazy URL you got from network")
 55 |     
 56 |     # convert it to a JSON object
 57 |     data = r.json()
 58 |     
 59 |     # access its elements
 60 |     print(data["results"]["total_num_results"])
 61 |     
 62 |     # it gets bonkers
 63 |     print(data["results"]["cluster"][0]["patent"]["result"][0]["title"])
 64 |     
 65 | 
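Long chains of keys like the one above break with a `KeyError` or `IndexError` as soon as one result is missing a field. A small helper can walk the path and return a default instead. This is only a sketch: the `dig` function is our own, and the sample data is a made-up stand-in shaped like the response above, not real output.

```python
def dig(data, path, default=None):
    """Follow a list of keys/indexes into nested dicts and lists."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# made-up data, shaped like the response in the notes
data = {
    "results": {
        "total_num_results": 1,
        "cluster": [{"patent": {"result": [{"title": "Umbrella hat"}]}}],
    }
}

print(dig(data, ["results", "total_num_results"]))
print(dig(data, ["results", "cluster", 0, "patent", "result", 0, "title"]))
print(dig(data, ["results", "cluster", 5, "patent"], default="nothing here"))
```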
 66 | 
 67 | Network doesn’t work in cases like Twitter, where there’s no Next button, you just scroll down and it loads automatically
 68 | 
 69 | **When the JSON link gives you an error, use cURL**
 70 | 
 71 | 
 72 | 1. Right click > Copy > Copy as CURL
 73 | 2. Paste on a Terminal
 74 |   1. Now you’ll get the actual result of a query
 75 | 
 76 | You can paste a curl command on this online tool, and return a python requests command:
 77 | 
 78 |   https://curl.trillworks.com/
 79 | 
 80 | 
 81 | 
 82 | 
 83 | # Natural Language Processing
 84 | 
 85 | What is it?
 86 | 
 87 |   get computers to understand language, extract meaning from text
 88 |   Convert characters into some kind of data the computer understands and we can do something with.
 89 | 
 90 | We can get computers to:
 91 | 
 92 | - extract sentences
 93 | - get words
 94 | - for each word figure out what kind of word it is (Part of speech)
 95 | - Understand the sentiment of a text
 96 | - Classify text
 97 |   - here’s a bunch of negative sounding sentences, positive sentences, here’s a new sentence, is it positive or negative?
 98 | 
 99 | It’s all based on rules, whether learned with machine learning or written as if/else statements.
100 | There’s a lot of biases
101 | 
102 | - what the computer determines has some form of ideology, coming from the creator’s intention, consciously or not
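To make the rule-based approach concrete, here is a toy sentiment scorer built from nothing but word lists and if/else. It is a sketch, not how TextBlob works internally: the word lists are invented for this example, and choosing them is exactly where the author's bias gets in.

```python
# invented word lists: whoever writes these decides what counts as "positive"
POSITIVE = {"good", "happy", "love", "well"}
NEGATIVE = {"bad", "sad", "hate", "frustrating"}


def rule_based_sentiment(text):
    """Label text 'pos', 'neg' or 'neutral' by counting listed words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "pos"
    elif score < 0:
        return "neg"
    else:
        return "neutral"


print(rule_based_sentiment("This is a good burger."))
print(rule_based_sentiment("Things are bad, so frustrating!"))
print(rule_based_sentiment("A specter is haunting this classroom."))
```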
103 | 
104 | 
105 | 
106 | ## TextBlob
107 | 
108 | We’ll use a library called TextBlob
109 | https://textblob.readthedocs.io
110 | 
111 | What it offers:
112 | 
113 | - basic NLP tasks
114 | - easy-to-use
115 | - tradeoff is that it’s less accurate than other libraries
116 |   - other library, better, but more annoying to use: SpaCy
117 |     - https://spacy.io/
118 | 
119 | **Installing**
120 | On the terminal:
121 | 
122 |     
123 |     $ pip3 install textblob
124 |     
125 | 
126 | Install the data set
127 | 
128 |     
129 |     $ python3 -m textblob.download_corpora
130 |     
131 | 
132 | **Basic usage**
133 | 
134 | Breaking into sentences:
135 | 
136 |     
137 |     from textblob import TextBlob
138 |     
139 |     blob = TextBlob("A specter is haunting this classroom. The specter of sleepiness.")
140 |     
141 |     print(blob.sentences)
142 |     # Outputs a list of Sentence objects
143 |     
144 |     # Iterate through all the sentences
145 |     for sentence in blob.sentences:
146 |       print(sentence)
147 |       
148 |     # Get all the words, removing the punctuation
149 |     for word in blob.words:
150 |       print(word)
151 |     
152 |     # Get the POS / part of speech (nouns, adjectives, etc.)
153 |     for tag in blob.tags:
154 |       print(tag) # Outputs a tuple (list you can't change) ('specter', u'NN'): the word, the part of speech
155 |       print(tag[0]) # Word
156 |       print(tag[1]) # POS
157 |     
158 |     # Get all the singular nouns (tag NN)
159 |     nouns = []
160 |     for tag in blob.tags:
161 |       if tag[1] == "NN":
162 |         nouns.append(tag[0])
163 |     print(nouns)
164 |     
165 | 
166 | 
167 | Tags for Part of Speech
168 | 
169 | - Penn Treebank part of speech tagging system
170 |   https://www.clips.uantwerpen.be/pages/mbsp-tags
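In the Penn Treebank system every noun tag starts with `NN` (`NN` singular, `NNS` plural, `NNP` proper, `NNPS` proper plural), so checking `tag[1] == "NN"` only catches singular common nouns. Here is a small sketch of catching them all. The tagged list is typed by hand in the same (word, tag) shape as `blob.tags`, so it runs without TextBlob.

```python
# hand-typed stand-in for blob.tags: a list of (word, tag) tuples
tags = [
    ("A", "DT"),
    ("specter", "NN"),
    ("is", "VBZ"),
    ("haunting", "VBG"),
    ("these", "DT"),
    ("classrooms", "NNS"),
    ("in", "IN"),
    ("Manhattan", "NNP"),
]


def words_with_tag(tags, prefix):
    """Return the words whose part-of-speech tag starts with prefix."""
    return [word for word, tag in tags if tag.startswith(prefix)]


print(words_with_tag(tags, "NN"))  # all kinds of nouns
print(words_with_tag(tags, "VB"))  # all kinds of verbs
```

With TextBlob installed, `words_with_tag(blob.tags, "NN")` should work the same way.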
171 | 
172 | 
173 | Pluralize words
174 | 
175 |     
176 |     for word in blob.words:
177 |       print(word.pluralize())
178 |     
179 | 
180 | 
181 | Classify sentences between positive and negative ones, using a simple training set
182 | 
183 | 
184 |     from textblob import TextBlob
185 |     from textblob.classifiers import NaiveBayesClassifier
186 |     train = [
187 |         ('i am happy today', 'pos'),
188 |         ('this is a good burger', 'pos'),
189 |         ('you\'re a good boy', 'pos'),
190 |         ('you are doing well', 'pos'),
191 |         
192 |         ('i do not like you', 'neg'),
193 |         ("don't go there", 'neg'),
194 |         ('this is so frustrating', 'neg'),
195 |         ('things are bad', 'neg')
196 |     ]
197 |     cl = NaiveBayesClassifier(train)
198 |     sentence = "I feel really bad"
199 |     # Classify a sentence
200 |     print(sentence,"is",cl.classify(sentence))
201 |     # Get the probability
202 |     prob = cl.prob_classify("I don't like tings")
203 |     print("The probability that this sentence is negative is", prob.prob("neg"))
204 |     print("The probability that this sentence is positive is", prob.prob("pos"))
205 |     
206 |     # Get all sentences of a certain category (blob is a TextBlob, like in Basic usage)
207 |     for sentence in blob.sentences:
208 |       if cl.classify(str(sentence)) == "pos":
209 |         print(sentence)
210 |       
211 |       # only if they have fewer than three words
212 |       if len(sentence.words) < 3:
213 |         print(sentence)
214 |         
215 | 
216 | 
217 | Get the sentiment (biased, unreliable)
218 | 
219 |     
220 |     for sentence in blob.sentences:
221 |       print(sentence.sentiment) # Returns a Sentiment object with a polarity value and a subjectivity value
222 |       
223 |       # Print only the positive sentences
224 |       if sentence.sentiment.polarity > 0.8:
225 |         print(sentence)
226 |         
227 | 
228 | 
229 | 
230 | # Natural Language Processing Examples
231 | 
232 | https://github.com/antiboredom/sfpc-scrapism/tree/master/class-notes/examples
233 | 
234 | # More Tools for NLP
235 | ## SpaCy
236 | - https://spacy.io/usage/
237 | 
238 | 
239 | ## Concept Net
240 | - http://conceptnet.io/
241 | - synonyms, related terms, used for…, types
242 |   - Use the same URL but add api., to see what you’d get with the API
243 |     - http://conceptnet.io/c/en/glass
244 |     - http://api.conceptnet.io/c/en/glass
245 | 
246 | Using the concept net API
247 | 
248 |     
249 |     import requests
250 |     
251 |     word = "glass"
252 |     r = requests.get("http://api.conceptnet.io/c/en/" + word)
253 |     data = r.json()
254 |     
255 |     
256 | 
257 | Click a header, e.g. used for
258 | 
259 | http://conceptnet.io/c/en/glass?rel=/r/UsedFor&limit=1000
260 | 
261 | Get the API URL for all things glass is used for
262 | 
263 | http://api.conceptnet.io/c/en/glass?rel=/r/UsedFor&limit=1000
264 | 
265 | *edges* = the ways in which this word relates to other words
266 | 
267 | - each *edge* has a *start* and an *end* ([start] is used for [end]); we want the *end*
268 |     
269 |     for edge in data["edges"]:
270 |       if edge["rel"]["label"] == "UsedFor":
271 |         print(edge["end"]["label"])
272 |       
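Collecting the *end* labels can be wrapped in a reusable function, so it works for any relation. This is a sketch: `sample` is a hand-made stand-in with only the fields used here (`rel`, `start`, `end`, each with a `label`), not a real API response.

```python
def related_labels(edges, relation):
    """Collect the end label of every edge with the given relation."""
    labels = []
    for edge in edges:
        if edge["rel"]["label"] == relation:
            labels.append(edge["end"]["label"])
    return labels


# hand-made stand-in for data["edges"]
sample = [
    {"rel": {"label": "UsedFor"}, "start": {"label": "glass"}, "end": {"label": "drinking"}},
    {"rel": {"label": "IsA"}, "start": {"label": "glass"}, "end": {"label": "material"}},
    {"rel": {"label": "UsedFor"}, "start": {"label": "glass"}, "end": {"label": "windows"}},
]

print(related_labels(sample, "UsedFor"))
```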
273 | 
274 | A project made with this:
275 | 
276 | - Darius Kazemi’s expanding mind bot
277 |   - https://twitter.com/expandingbot
278 | 
279 | 
280 | 
281 | 
282 | # Tips
283 | ## how to publish code to github without personal information
284 | - Make a secrets.py that has your logins, or passwords or API keys, that you’re using in your code
285 | - add it to .gitignore so it doesn’t get pushed to the server
286 | 
287 | 
288 | 
289 | ## a prettier print() statement
290 | 
291 | 
292 |     from pprint import pprint
293 |     
294 |     pprint(data) # data is whatever you loaded, e.g. with r.json()
295 |     
296 | 
297 | 
298 | 
299 | # Homework TBD
300 | 
301 | 


--------------------------------------------------------------------------------
/class-notes/class-5.md:
--------------------------------------------------------------------------------
  1 | # 10/16 - Photos 📸
  2 | 
  3 | # 📒 Agenda
  4 | - image manipulation 
  5 |   - getting images
  6 |   - photo manipulation
  7 |     - Image Magick
  8 |     - TLDR
  9 |     - Subprocess
 10 |     - [Pillow](https://pillow.readthedocs.io/en/5.3.x/) 🛌 😴
 11 | - video manipulation (if we have time!)
 12 | # 🤔 Who did homework?!?!
 13 | - Ilona has a WIP: who is coming out day for?
 14 |   - [AI weirdness](http://aiweirdness.com/), training neural networks with a sense of humor
 15 | - Tim made a lyric 🎶 generator using TextBlob
 16 |   - Kanye & Ayn Rand 🙃 
 17 | # 🖼️ How do we get images off a website?
 18 | - [Shutterstock](https://www.shutterstock.com/), [Pexels](https://www.pexels.com/), good image resources 
 19 | - Check whether website still loads by toggling javascript on/off
 20 |   - If it does, it should be easy to scrape
 21 | - When searching, make sure you select only Photos
 22 | ![](https://d2mxuefqeaa7sj.cloudfront.net/s_760AA603A36DCE03DC9C80E71E2F81C1E2EDCAAED6138F202B76C43EC5789514_1539700121631_image.png)
 23 | 
 24 | 
 25 | **Use Requests_Html library to help scrape**
 26 | Gets all images. Generalizable to any website, but will pull all images. 
 27 | 
 28 |     from requests_html import HTMLSession
 29 |     #requests is library requests_html is based off, use to download image
 30 |     import requests
 31 |     #subprocesses are how python can call other command line tools 
 32 |     import subprocess
 33 |     url = 'https://www.shutterstock.com/search?searchterm=existential+despair&search_source=base_search_form&language=en&page=1&sort=popular&image_type=all&measurement=px&safe=true'
 34 |     
 35 |     # searches through all the html tags and grabs specified tags
 36 |     session = HTMLSession()
 37 |     r = session.get(url)
 38 |     
 39 |     # Gets all images on the page
 40 |     images = r.html.find('img')
 41 |     for img in images:
 42 |       print(img)
 43 | 
 44 | To get image source
 45 | 
 46 |     for img in images:
 47 |       src = img.attrs.get('src') # Gets image source
 48 |       title = img.attrs.get('alt') # Gets image title
 49 | 
 50 | Get a filename for each image using split, then download it
 51 | 
 52 |     # Gets end of url for img name
 53 |     imgname = src.split('/')[-1]
 54 |     imgdata = requests.get(src).content
 55 |     
 56 |     #wb is the mode for 'write binary'
 57 |     open(imgname, 'wb').write(imgdata)
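Image URLs often end in a query string (for example `?w=500`), and `src.split('/')[-1]` would keep that in the filename. Here is a hedged sketch of a more careful version using the standard library. The function name, the fallback name and the example URLs are made up for illustration.

```python
import os
from urllib.parse import urlparse, unquote


def filename_from_url(url, fallback="image.jpg"):
    """Return the last path segment of url, without query string or fragment."""
    path = urlparse(url).path
    name = os.path.basename(unquote(path))
    return name or fallback


print(filename_from_url("https://example.com/photos/despair-01.jpg?w=500&h=300"))
print(filename_from_url("https://example.com/photos/"))
```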
 58 | 
 59 | 
 60 | # Basic photo manipulation in command line
 61 | ## [IMAGE MAGICK](https://www.imagemagick.org/script/index.php) 🧙 🧙‍♂️ ✨ 
 62 | 
 63 | *For command line photo manipulation!*
 64 | 
 65 | Run this command in terminal
 66 | 
 67 |     brew install imagemagick
 68 | 
 69 | Examples of Image Magick Functionality: Creating GIF, converting file types, resize, etc. 
 70 | 
 71 |     #Rename and convert file type
 72 |     convert <filename.abc> <newfilename.xyz>
 73 |     
 74 |     #Whoa! Maintains the aspect ratio 😱
 75 |     convert <filename.abc> -resize 1000x1000 <bigfilename.abc>
 76 |     
 77 |     #Rotate!
 78 |     convert <filename.abc> -rotate 90 <rotated.abc>
 79 |     
 80 |     #Invert photo
 81 |     convert <filename.abc> -negate <negative.abc>
 82 |     
 83 |     #You can combine them 🤝
 84 |     convert <filename.abc> -negate -rotate 90 <combined.abc>
 85 |     
 86 |     #Creating GIF from a folder of images
 87 |     convert -delay 0 images/*.jpg animation.gif
 88 |     
 89 |     #Montage tiles images into a grid 
 90 |     montage images/*.jpg montage.jpg
 91 | 
 92 | 👉 All the Image Magick command line options [**here**](https://imagemagick.org/script/command-line-options.php)❗
 93 | 
 94 | 
 95 | ## [TLDR](https://tldr.sh/)
 96 | 
 97 | Shows short help pages with examples for common terminal commands
 98 | 
 99 |     brew install tldr
100 |     tldr <command line tool>
101 | 
102 | 
103 | ## [Subprocess](https://docs.python.org/2/library/subprocess.html)
104 | 
105 | Allows you to run command line tools from Python
106 | 
107 |     import subprocess
108 |     
109 |     # Same as: say hi
110 |     subprocess.call(["say", "hi"])
111 |     
112 |     # Same as: say -r 300 "a specter is haunting this python script"
113 |     # No need to double quote things!
114 |     subprocess.call(["say", "-r", "300", "a specter is haunting this python script"])
115 | 
116 | **Subprocess example #1:** Says title of each image
117 | 
118 |     for img in images:
119 |       src = img.attrs.get('src') # Gets image source
120 |       title = img.attrs.get('alt') # Gets image title
121 |       subprocess.call(["say", title])
122 | 
123 | **Subprocess example #2:** Downloads each file and converts it to negative
124 | 
125 |     subprocess.call(["convert", imgname, "-negate", imgname + ".neg.jpg"])
126 | 
127 | **Subprocess example #3:** Takes all images in folder and makes an animated gif
128 | 
129 |     subprocess.call(["convert", "-delay", "0", "*.jpg", "animation.gif"])
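`subprocess.call` with a list does not go through the shell, so nothing expands `*.jpg` before the program sees it. Whether the pattern works then depends on the program handling wildcards itself. One way to be explicit is to expand the pattern in Python with `glob` and pass the real filenames. This is a sketch under that assumption: the helper name is ours, and the command is only built and printed here, not run.

```python
from glob import glob


def gif_command(pattern, output, delay=0):
    """Build the argument list for ImageMagick's convert."""
    files = sorted(glob(pattern))  # sorted so the frame order is predictable
    # -delay goes before the images it should apply to
    return ["convert", "-delay", str(delay)] + files + [output]


command = gif_command("images/*.jpg", "animation.gif")
print(command)
# to actually run it: subprocess.call(command)
```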
130 | 
131 | 
132 | ## [Pillow](https://pillow.readthedocs.io/en/5.3.x/)
133 | 
134 | **Install using pip**
135 | 
136 |     pip3 install pillow
137 | 
138 | **Basic example**
139 | 
140 |     from PIL import Image, ImageFilter
141 |     #If image & file are in separate folders, provide file path
142 |     img = Image.open("<filename.jpg>")
143 |     
144 |     #Resizing images
145 |     #Thumbnail respects original aspect ratio, takes a new size as a tuple (width, height). It also doesn't make things bigger than they already are. Changes original image.
146 |     img.thumbnail((100, 100))
147 |     img.save("<small_filename.jpg>")
148 |     
149 |     #Resize doesn't respect original aspect ratio. Does not change original image.
150 |     img = img.resize((1000, 1000))
151 |     
152 |     #Rotates image
153 |     img = img.rotate(45)
154 |     
155 |     #Apply a  filter
156 |     img = img.filter(ImageFilter.BLUR)
157 | 
158 | **Image draw**
159 | 
160 |     from PIL import Image, ImageFilter, ImageDraw
161 |     img = Image.open("<filename.jpg>")
162 |     
163 |     #Draws on image!
164 |     draw = ImageDraw.Draw(img)
165 |     draw.text((10, 10), "HELLO!")
166 |     draw.ellipse((0, 0, 500, 500), fill=(255, 255, 255))
167 | 
168 | **Collaging images together**
169 | 
170 |     from PIL import Image, ImageFilter, ImageDraw
171 |     #glob syntax allows you to use /* to easily refer to all items at a path
172 |     from glob import glob
173 |     import random
174 |     
175 |     #a list of all the file names
176 |     files = glob("images/*.jpg")
177 |     
178 |     #Takes two parameters: kind of image (mode), and size as a (width, height) tuple
179 |     canvas = Image.new("RGB", (1000, 1000))
180 |     
181 |     #Loop through all files and stick them on image
182 |     for filename in files:
183 |       img = Image.open(filename)
184 |       
185 |       #generates random location
186 |       x = random.randint(-100, 1000)
187 |       y = random.randint(-100, 1000)
188 |       
189 |       #takes an image pastes it on something else
190 |       canvas.paste(img, (x, y))
191 |     
192 |     canvas.save("collage.jpg")
193 |     
194 | 
195 | **Get rid of labels**
196 | *Crops image*
197 | 
198 |     img = img.crop((0, 0, img.size[0], img.size[1]-20))
199 | 
200 | To use transparency, use the RGBA colorspace. Can’t combine images in Pillow that don’t have the same colorspace
201 | 
202 |     canvas = Image.new("RGBA", (1000, 1000))
203 |     img = img.convert("RGBA")
204 |     canvas.save("collage.png")
205 | 
206 | 
207 | ## [Open CV and Python](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html)
208 | 
209 | Installation
210 | 
211 |     pip3 install opencv-python
212 | 
213 | **What is a Haar Cascade?**
214 | The computer is looking for patterns. We can tell OpenCV to grab anything: a face, an eyeball, a smile, etc. [**Download the XML file**](https://github.com/opencv/opencv/tree/master/data/haarcascades) for whatever you want to detect.
215 | 
216 | 1 - Eyes
217 | 
218 | 2 - Eyeglasses
219 | 
220 | 3 - Front of face
221 | 
222 | 4 - Profile
223 | 
224 | 5 - Full body
225 | 
226 | 6 - Left eye
227 | 
228 | 7 - Right eye
229 | 
230 | 8 - Lower body
231 | 
232 | 
233 |     import cv2
234 |     import numpy as np
235 |     
236 |     cascade = cv2.CascadeClassifier("eye.xml")
237 |     
238 |     #makes video object, looks for video camera on 💻
239 |     video_capture = cv2.VideoCapture(0)
240 |     video_capture.set(3, 1280)
241 |     video_capture.set(4, 720)
242 |     index = 0 #counts the saved eye images
243 |     #creates an infinite loop 
244 |     while True:
245 |       ret, frame = video_capture.read()
246 |       gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
247 |       #becomes a list of coordinates with eyeballs
248 |       eyeballs = cascade.detectMultiScale(gray)
249 |       
250 |       for (x,y,w,h) in eyeballs:
251 |         #to grab part of the frame (rows first: y, then x)
252 |         eye_img = frame[y:y+h, x:x+w]
253 |         #to keep each frame an increasing number
254 |         outname = "eye_" + str(index) + ".jpg"
255 |         index += 1
256 |         cv2.imwrite(outname, eye_img) #filename first, then the image
257 |         #draws rectangle onto image: (image, top left, bottom right, color, stroke width)
258 |         cv2.rectangle(frame, (x, y), (x+w, y+h),(0,255,0),2)
259 |       
260 |       #makes window on computer
261 |       cv2.imshow("Video", frame)
262 |       
263 |       #looking for an exit key
264 |       if cv2.waitKey(1) & 0xFF == ord('q'):
265 |         break
266 |       
267 |     #Releases the camera when we're done
268 |     video_capture.release()
269 |     cv2.destroyAllWindows()
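A frame from OpenCV is a grid of pixels indexed rows first, so a crop is `frame[y:y+h, x:x+w]`: the vertical range comes before the horizontal one. Here is the same idea in plain Python lists, as a sketch that runs without OpenCV. The tiny `frame` of letters stands in for pixels.

```python
def crop(frame, x, y, w, h):
    """Cut a w-by-h box out of a list of rows, top left corner at (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


# a 4x3 "image": each string is one row of pixels
frame = [
    "abcd",
    "efgh",
    "ijkl",
]

# 2 wide, 2 tall, starting at column 1, row 1
print(crop(frame, x=1, y=1, w=2, h=2))
```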
270 | 
271 | Instead of using a live video, you can read an image in.
272 | 
273 |     from glob import glob
274 |     
275 |     files = glob('images/*.jpg')
276 |     for filename in files:
277 |       frame = cv2.imread(filename)
278 |       
279 |       #everything else same as above!
280 | 
281 | detectMultiScale has a few options ([list of params](https://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html))
282 | 
283 |     #eyeballs have to be at least 100, 100
284 |     cascade.detectMultiScale(gray, minSize=(100,100))
285 | # Libraries & Add-Ons
286 | 
287 | [Open CV and Python](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html)
288 | 
289 |   - Open CV is easiest to implement
290 | 
291 | [Dark Flow Library](https://github.com/thtrieu/darkflow) can find people and other objects (like apples, oranges, etc) 
292 | [Image2Text](https://github.com/tensorflow/models/tree/master/research/im2txt) 
293 | 
294 | 


--------------------------------------------------------------------------------
/class-notes/class-6.md:
--------------------------------------------------------------------------------
  1 | # 10/23 Video
  2 | 
  3 | 
  4 | https://www.youtube.com/watch?v=KgbSjRMqyjc&
  5 | 
  6 | 
  7 | Today
  8 | 
  9 | - download video
 10 | - manipulate them
 11 | - python library called MoviePy, lets you edit video in python
 12 | 
 13 | 
 14 | # Example scripts from this class
 15 | 
 16 | https://github.com/antiboredom/sfpc-scrapism/tree/master/class-notes/examples/video
 17 | 
 18 | 
 19 | 
 20 | 
 21 | # Getting material with Youtube-dl
 22 | 
 23 | scraping web for video is difficult, so we use **youtube-dl** to download videos from the terminal
 24 | https://youtube-dl.org/
 25 | Documentation:
 26 | https://github.com/rg3/youtube-dl/blob/master/README.md#output-template-examples
 27 | 
 28 | To install, in terminal:
 29 | 
 30 |     $ brew install youtube-dl
 31 | 
 32 | It’s a web scraper just for video. For every website they reverse engineered how to get a video file
 33 | 
 34 | 
 35 | ## To use it
 36 |     $ youtube-dl [URL]
 37 | 
 38 | 
 39 | 
 40 | ## See the possible formats with -F
 41 |     $ youtube-dl [URL] -F
 42 |     # outputs the format available for that file, and resolutions
 43 |     
 44 |     # you can choose a specific format to download, with -f and the code for that format
 45 |     $ youtube-dl [URL] -f 22
 46 | 
 47 | 
 48 | - Some formats let you download only the video, or only the audio
 49 | 
 50 | 
 51 | 
 52 | ## Download every single video from a youtube user
 53 | 
 54 | Paste the URL from the **youtube channel**
 55 | Get every TED talk:
 56 | 
 57 |     
 58 |     $ youtube-dl https://www.youtube.com/user/TEDtalksDirector 
 59 |     
 60 | 
 61 | Also works for, e.g. playlists
 62 | https://www.youtube.com/user/TEDtalksDirector/playlists
 63 | 
 64 | 
 65 | 
 66 | ## Decide what the output name will be with -o
 67 |     
 68 |     $ youtube-dl [URL] -o [FILENAME]
 69 |     
 70 | 
 71 | 
 72 | 
 73 | ## Try and always save with a specific format (like mp4)
 74 |     
 75 |     $ youtube-dl [URL] --merge-output-format mp4
 76 |     
 77 | 
 78 | 
 79 | 
 80 | ## Download subtitles
 81 | - add “,cc” to your search, to only get videos with closed captions
 82 | 
 83 |      --write-auto-sub Download the automatically generated subtitles
 84 |      --skip-download Download only subtitles and not the video
 85 |     
 86 |     $ youtube-dl [URL] --write-auto-sub --skip-download
 87 |     
 88 | 
 89 | 
 90 | - check sam’s tool for parsing subtitle files
 91 | ## Get the URLs of the video and audio
 92 |     
 93 |     # --get-url
 94 |     $ youtube-dl [URL] --get-url
 95 |     
 96 | 
 97 | 
 98 | 
 99 | ## Tips
100 | - Avoid URLs with “&”
101 |   - Make sure the URL for youtube is just https://www.youtube.com/watch?v=[ID]
102 | - limit max downloads with --max-downloads to save space
103 | - call the program from python with subprocess.call(…)
104 | - Use it for more sites:
105 |   - full list: https://rg3.github.io/youtube-dl/supportedsites.html
107 | 
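Two of these tips can be combined in a short sketch: strip everything but the `v` parameter from a YouTube link, then build the argument list for `subprocess.call`. The helper names and the example link are made up; the flags are the ones covered above. The command is only built and printed, so nothing is downloaded.

```python
from urllib.parse import urlparse, parse_qs


def clean_youtube_url(url):
    """Reduce a YouTube watch link to https://www.youtube.com/watch?v=[ID]."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError("no video id found in " + url)
    return "https://www.youtube.com/watch?v=" + query["v"][0]


def youtube_dl_command(url, max_downloads=None):
    """Build the argument list for calling youtube-dl from Python."""
    command = ["youtube-dl", clean_youtube_url(url), "--merge-output-format", "mp4"]
    if max_downloads is not None:
        command += ["--max-downloads", str(max_downloads)]
    return command


# made-up link with extra parameters after the video id
messy = "https://www.youtube.com/watch?v=abc123XYZ_0&list=PL12345&t=42s"
print(clean_youtube_url(messy))
print(youtube_dl_command(messy, max_downloads=1))
# to actually download: subprocess.call(youtube_dl_command(messy))
```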
108 | 
109 | 
110 | 
111 | 
112 | 
113 | 
114 | 
115 | 
116 | # VLC video player
117 | 
118 | Use VLC to play every video format imaginable
119 | https://www.videolan.org/vlc/
120 | 
121 | 
122 | 
123 | 
124 | 
125 | # Use ffmpeg to convert formats / edit video
126 | 
127 | https://ffmpeg.org/
128 | Documentation: https://ffmpeg.org/ffmpeg.html (insane and confusing, just google stuff)
129 | 
130 | Usage:
131 | 
132 |     
133 |     # ffmpeg -i [Some kind of input] [parameters (optional)] [Some kind of output]
134 |     
135 |     # convert my mp4 video to mov format
136 |     $ ffmpeg -i mycatvideo.mp4 mycatvideo.mov
137 |     
138 | 
139 | 
140 | ## Turn things into animated GIFs
141 |     
142 |     $ ffmpeg -i kitten.mp4 kitten.gif
143 |     
144 |     # use -r 3 to set it to 3 frames a second
145 |     $ ffmpeg -i kitten.mp4 -r 3 kitten.gif
146 |     
147 | 
148 | 
149 | 
150 | ## Output frames to images
151 | 
152 | 
153 |     
154 |     # 1. set the output format to an image format (eg jpg)
155 |     # 2. put %d in the output name, it'll be replaced by the frame number
156 |     
157 |     $ ffmpeg -i kitten.mp4 kitten_frame_%d.jpg
158 |     
159 | 
160 | 
161 | ## Turn a bunch of images into a video
162 |     # 1. set the INPUT format to the image format (eg jpg)
163 |     # 2. put %d in the INPUT name, it'll be replaced by the frame number, make sure those files exist
164 |     
165 |     $ ffmpeg -i kitten_frame_%d.jpg kitten.mp4
166 |     
167 | 
168 | 
169 | ## Cut/trim video (e.g. get rid of the intro)
170 |     
171 |     # BEFORE the input, use -ss [timestamp]
172 |     # AFTER the input, use -t [length in seconds]
173 |     
174 |     # start at 10 seconds, and go for 10 seconds
175 |     $ ffmpeg -ss 00:00:10 -i kitten.mp4 -t 10 output.mp4
176 |     
177 | 
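When calling ffmpeg from Python it helps to build the timestamp and the argument list with a small function. This is a sketch only: the helper names are ours, and the command is printed rather than run.

```python
def timestamp(seconds):
    """Format a number of whole seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, secs)


def trim_command(infile, outfile, start, length):
    """Build an ffmpeg command: -ss before the input, -t after it."""
    return ["ffmpeg", "-ss", timestamp(start), "-i", infile, "-t", str(length), outfile]


# start at 10 seconds, and go for 10 seconds
print(trim_command("kitten.mp4", "output.mp4", start=10, length=10))
# to actually run it: subprocess.call(trim_command(...))
```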
178 | 
179 | ## Get info about a video with ffprobe
180 |     
181 |     $ ffprobe kitten.mp4
182 |     # outputs a bunch of info, like duration, bitrate, etc.
183 |     
184 | 
185 | 
186 | ## Combine all this
187 |     
188 |     # Start at 10 seconds, get a 10 seconds video, turn into a GIF
189 |     $ ffmpeg -ss 00:00:10 -i kitten.mp4 -t 10 output.gif
190 |     
191 | 
192 | 
193 | ## Use ffmpeg to download a video URL you got from youtube-dl (!!!)
194 |     
195 |     # Get only the URL
196 |     $ youtube-dl --get-url [YOUTUBE LINK]
197 |     
198 |     # copy that URL to ffmpeg, get only 5 seconds of it
199 |     $ ffmpeg -i "[THE URL WE GOT]" -t 5 short_fan.mp4
200 |     
201 | 
202 | 
203 | ## Stupid filters
204 | 
205 | https://ffmpeg.org/ffmpeg-filters.html
206 | 
207 | **vflip - flip the video vertically**
208 | 
209 |     
210 |     $ ffmpeg -i in.mp4 -vf "vflip" out.mp4
211 |     
212 | 
213 | **edgedetect**
214 | 
215 | - Add parameters to the filter after the -vf, in quotes
216 |   - [option]=[value], separated with **:**
217 |     
218 |     $ ffmpeg -i in.mp4 -vf "edgedetect=low=0.1:high=0.4" out.mp4
219 |     
220 | 
221 | 
222 | 
223 | ## Change the speed of the video
224 | 
225 | The command is stupid
226 | 
227 | - PTS = Presentation Time Stamp, the time stamp eg. 00:10:00
228 | - you tell it to make the PTS, e.g. twice what it is, therefore it slows down
229 | - to make it faster you multiply the PTS by a decimal number, e.g. twice as fast = 0.5
230 | - ugh just google it
231 |     
232 |     # slow it down, twice the length
233 |     $ ffmpeg -i short_fan.mp4 -vf "setpts=2*PTS" slow_short_fan.mp4
234 |     
235 |     # speed it up, twice as fast
236 |     $ ffmpeg -i short_fan.mp4 -vf "setpts=0.5*PTS" fast_short_fan.mp4
237 |     
238 | 
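Since the PTS multiplier is the inverse of the speed, a tiny helper can do the arithmetic. This is a sketch; the function name is ours, and it only builds the filter string to pass after `-vf`.

```python
def setpts_filter(speed):
    """Return the setpts filter for playing a video `speed` times as fast."""
    if speed <= 0:
        raise ValueError("speed must be greater than 0")
    multiplier = 1 / speed
    return "setpts={:g}*PTS".format(multiplier)


print(setpts_filter(2))    # twice as fast
print(setpts_filter(0.5))  # half speed, twice the length
```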
239 | **Interpolation**
240 | 
241 | - When slowing down you can use motion interpolation (ffmpeg’s minterpolate filter) to create the frames in between, to make smoother slow-mo videos
242 |   - https://cloudacm.com/?p=3055
243 | 
244 | 
245 | 
246 | ## Use the camera
247 |     
248 |     $ ffmpeg -f avfoundation -pixel_format yuyv422 -framerate 30 -video_size 1280x720 -i 0:0 recording.mp4
249 |     
250 | 
251 | Keeps recording until we hit Ctrl+C
252 | 
253 | **use -t to only record 5 seconds**
254 | 
255 |     
256 |     $ ffmpeg -t 5 -f avfoundation -pixel_format yuyv422 -framerate 30 -video_size 1280x720 -i 0:0 recording.mp4
257 |     
258 | 
259 | 
260 | 
261 | ## Tips
262 | - abort a long operation with Ctrl+C
263 | - when stuck, search for specific uses, eg. “ffmpeg make optimized gif”
264 | - delete all files in a folder with terminal (if we downloaded too many files)
265 |   - rm *.jpg
266 | - use youtube-dl on a livestream to capture it
267 | 
268 | 
269 | 
270 | 
271 | # Using MoviePy
272 | 
273 | We’ll use MoviePy for video editing
274 | https://zulko.github.io/moviepy/
275 | Documentation: https://zulko.github.io/moviepy/ref/ref.html
276 | 
277 | 
278 | WIP library from Sam
279 | https://antiboredom.github.io/vidpy/
280 | 
281 | Install
282 | 
283 |     
284 |     $ pip3 install moviepy
285 |     
286 | 
287 | 
288 | 
289 | ## Put two videos together
290 | 
291 | **By concatenating**
292 | 
293 |     
294 |     # import the editor library but call it mp to make it shorter
295 |     import moviepy.editor as mp
296 |     # another way is to only import the functions we need, a bit faster
297 |     # from moviepy.editor import VideoFileClip, concatenate_videoclips
298 |     
299 |     # Join videos together
300 |     clip1 = mp.VideoFileClip("fan_upside_down.mp4")
301 |     clip2 = mp.VideoFileClip("prancercise.mp4")
302 |     
303 |     # the function takes a list of video objects
304 |     final_clip = mp.concatenate_videoclips([clip1, clip2])
305 |     # output the video to a file
306 |     final_clip.write_videofile("output.mp4")
307 |     
308 | 
309 | **By compositing videos together (like layers in photoshop/premiere)**
310 | 
311 | 
312 | ## Get only a segment of a video clip, with subclip()
313 |     # from 10 seconds to 13.5 seconds
314 |     clip1 = mp.VideoFileClip("prancercise.mp4").subclip(10, 13.5)
315 |     
316 |     # also works like this
317 |     clip1 = mp.VideoFileClip("prancercise.mp4")
318 |     tiny_clip = clip1.subclip(10, 13.5)
319 |     
320 | 
321 | 
322 | ## Resize a video file
323 | - if the videos are of different sizes, the output might be broken, so we resize them
324 |     
325 |     clip1 = clip1.resize((1280, 720))
326 |     
327 | 
328 | 
329 | 
330 | 
331 | ## Get random subclips from a video and combine them together
332 | 
333 | 
334 |     import random
335 |     import moviepy.editor as mp
336 |     
337 |     video = mp.VideoFileClip("dance.mp4")
338 |     
339 |     # get the duration of the video, in seconds
340 |     video_duration = video.duration
341 |     # define the duration of the subclips we're gonna take
342 |     clip_duration = 0.5
343 |     # make a list we'll populate with subclips
344 |     clips = []
345 |     
346 |     for i in range(0,10): # do this 10 times
347 |       # get a random start point
348 |       # make it so the end can never be past the full video duration
349 |       start = random.uniform(0, video_duration - clip_duration)
350 |       # the end point is whatever start time plus the subclip duration
351 |       end = start + clip_duration
352 |       # add a subclip to the list, between our start and end points
353 |       clips.append(video.subclip(start,end))
354 |     
355 |     # create the final video out of all the clips
356 |     final_clip = mp.concatenate_videoclips(clips)
357 |     # write it to a file
358 |     final_clip.write_videofile("random_dance.mp4")
359 | 
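The start-time arithmetic in the loop above can be tried on its own, without loading a video. This is a minimal sketch, assuming a hypothetical helper called `random_windows` (it is not part of MoviePy) and a made-up video duration of 60 seconds:

```python
import random

def random_windows(video_duration, clip_duration, count, seed=None):
    """Return `count` (start, end) pairs that fit inside the video."""
    rng = random.Random(seed)  # the seed only makes the result repeatable
    windows = []
    for _ in range(count):
        # the latest allowed start leaves room for a full clip before the end
        start = rng.uniform(0, video_duration - clip_duration)
        windows.append((start, start + clip_duration))
    return windows

# hypothetical numbers: a 60 second video, ten half-second clips
windows = random_windows(video_duration=60.0, clip_duration=0.5, count=10, seed=1)
for start, end in windows:
    print(round(start, 2), round(end, 2))
```

Each pair could then be handed to `video.subclip(start, end)` as in the script above.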
360 | 
361 | 
362 | ## Play a bunch of clips overlaid on top of each other, for reasons
363 | 
364 | It’s like what we just did but instead of concatenate_videoclips:
365 | 
366 |     
367 |     # use CompositeVideoClip to make a video that's all the clips layered together
368 |     final_clip = mp.CompositeVideoClip(clips)
369 |     
370 | 
371 | 
372 | 
373 |     
374 |     import random
375 |     import moviepy.editor as mp
376 |     
377 |     video = mp.VideoFileClip("dance.mp4")
378 |     
379 |     # get the duration of the video, in seconds
380 |     video_duration = video.duration
381 |     # define the duration of the subclips we're gonna take
382 |     clip_duration = 1.5
383 |     # make a list we'll populate with subclips
384 |     clips = []
385 |     
386 |     for i in range(0,10): # do this 10 times
387 |       # get a random start point
388 |       # make it so the end can never be past the full video duration
389 |       start = random.uniform(0, video_duration - clip_duration)
390 |       # the end point is whatever start time plus the subclip duration
391 |       end = start + clip_duration
392 |     
393 |       # store the subclip so we can do things to it
394 |       clip = video.subclip(start,end)
395 |     
396 |       # set the position of the clip in our canvas
397 |       clip = clip.set_position((random.randint(-100, 800), random.randint(-100, 400)))
398 |       # set the start time to half the loop index (i / 2), just for fun
399 |       clip = clip.set_start( i / 2.0 )
400 |       # add to the list
401 |       clips.append(clip)
402 |     
403 |     # make a video that is a composite of all the subclips
404 |     final_clip = mp.CompositeVideoClip(clips)
405 |     
406 |     # write it to a file
407 |     final_clip.write_videofile("random_dance.mp4", codec="libx264", temp_audiofile="something.m4a", remove_temp=True, audio_codec="aac")
408 | ## Tips
409 | - Functions on video clips don’t change the original clip, they return a new clip
410 |   - so always do new_clip = clip1.resize((1280,720))
411 | - To export with an audio codec that mac understands:
412 |     
413 |     write_videofile("random_dance.mp4", codec="libx264", temp_audiofile="something.m4a", remove_temp=True, audio_codec="aac")
414 |     
415 | 
416 | 
417 | 
418 | 
419 | # Videogrep
420 | 
421 | Videogrep is a command line tool that searches through dialog in video files and makes supercuts based on what it finds.
422 | 
423 | https://antiboredom.github.io/videogrep/
424 | 
425 | 
426 | - Needs a video file with a subtitle or transcription file associated with it, with the same name
427 |   - most youtube videos have them
428 |   - we can use youtube-dl to get them
429 | 
430 | 
431 | 
432 | ## Download a video, then use videogrep to get all the instances of a word and make a supercut
433 |     
434 |     # get the version 18 that's a smaller video, and download subtitles
435 |     $ youtube-dl "[URL]" -f 18 --write-auto-sub
436 |     
437 |     # if our subtitles are in .vtt format, we add --use-vtt
438 |     $ videogrep -i [name_of_the_video_file] --use-vtt --search "Korea"
439 |     
440 | 
441 | 
442 | ## Get only individual words
443 |     
444 |     # use --search-type word
445 |     $ videogrep -i [name_of_the_video_file] --use-vtt --search "Korea" --search-type word
446 |     
447 | 
448 | 
449 | ## Set the output file
450 |     
451 |     # use -o output_name.mp4
452 |     
453 | 
454 | 
455 | ## Add padding
456 |     
457 |     # add 300 ms of space between words
458 |     # --padding 300
459 |     
460 | 
461 | 
462 | ## Use regular expressions to search for multiple things
463 |     
464 |     # use the pipe character | to search for either text
465 |     # search for "Korea" or any word with "nucl" in it
466 |     # "Korea|nucl"
467 |     $ videogrep -i [name_of_the_video_file] --use-vtt --search "Korea|nucl" --search-type word
468 |     
469 |     # the ^ character means the start of the word
470 |     # ^a = all the words that begin with the letter a
471 |     
472 |     # the $ means the end of the word
473 |     # ing$ = all the words that end with ing
474 |     
475 |     # 
476 |     
477 | 
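To see what these patterns match before running videogrep, they can be tried against a list of words with Python's `re` module. This is a minimal sketch with a made-up word list; it only illustrates the regular expressions and makes no claim about how videogrep applies them internally:

```python
import re

# made-up words standing in for a transcript
words = ["Korea", "nuclear", "denuclearization", "apple", "about", "running", "sing", "ringer"]

def matching(pattern, words):
    """Return the words in which the pattern is found."""
    return [w for w in words if re.search(pattern, w)]

print(matching("Korea|nucl", words))  # "Korea", or anything containing "nucl"
print(matching("^a", words))          # words that start with a
print(matching("ing$", words))        # words that end with ing
```

Note that "ringer" contains "ing" but does not end with it, so `ing$` skips it.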
478 | 
479 | ## Export n-grams
480 |     
481 |     # -n 1
482 |     # outputs the most used words
483 |     
484 |     # -n 2
485 |     # outputs the most used couplings of words
486 |     
487 | 
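For intuition about what an n-gram count is, here is a minimal sketch using `collections.Counter` on a made-up sentence. The `ngrams` helper is hypothetical and this is not videogrep's own implementation:

```python
from collections import Counter

def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# made-up transcript text
text = "we will see what happens we will see"
words = text.lower().split()

# n = 1: most used words
print(Counter(ngrams(words, 1)).most_common(2))
# n = 2: most used pairs of words
print(Counter(ngrams(words, 2)).most_common(2))
```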
488 | 
489 | ## Use sphinx to transcribe videos
490 |     
491 |     # install sphinx
492 |     brew tap watsonbox/cmu-sphinx
494 |     brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxbase
495 |     brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxtrain # optional
496 |     brew install --HEAD watsonbox/cmu-sphinx/cmu-pocketsphinx
497 |     
498 |     # transcribe the video
499 |     videogrep -i pompeo.mp4 --transcribe
500 |     
501 | 
502 | 
503 | 
504 | # Next
505 | 
506 | 
507 | - Upcoming workshop from hardware class TA
508 |   - A servo motor + a camera + control it with python
509 | 
510 | 
511 | ## Homework
512 | - explore these tools and make a python script that makes a new video every time you run it
513 | 
514 | 
515 | 
516 | 


--------------------------------------------------------------------------------
/class-notes/class-7.md:
--------------------------------------------------------------------------------
  1 | # 10/30 - Bots
  2 | 
  3 | 
  4 | 
  5 | # Troubleshooting videogrep
  6 | 
  7 | Some students had problems with videogrep!
  8 | 
  9 | 
 10 |             “The right attitude is… it’s amazing it works at all!” - Sam
 11 | 
 12 | 
 13 | - **bool is not iterable** might mean it didn’t find the subtitle or video file
 14 | 
 15 | Make sure
 16 | 
 17 | - Subtitle file name is the same as video file name, and it can end in .vtt OR .en.vtt
 18 | - Videos might need to be the same size
 19 | - Keep all videos the same format and size
 20 |   - Make sure youtube-dl gets the same format and size by using the -f setting (-f 22 gives you a 1280x720 video on youtube, use capital F (-F) to see what formats there are)
 21 |   - Convert using ffmpeg: (-vcodec copy means the video doesn’t get reencoded = faster)
 22 |     - ffmpeg -i myvideo.mkv -vcodec copy myvideo.mp4 
 23 | - If a video has youtube auto-generated subtitles, it’ll have **word by word** timings. If the subtitles were uploaded by the user, they might have **sentence by sentence** timings.
 24 |   - look at the subtitle file, look for <c> tags surrounding each word, and a timestamp tag next to it
 25 |   - you can use sphinx to transcribe the videos so they’ll all have the same format
 26 | - **The word I’m searching for is in the vtt, but videogrep doesn’t find it**
 27 |   - check the vtt: if that word doesn’t have its own timing tags and only appears inside a timed sentence, videogrep might not find it
 28 |     - use sphinx to transcribe the video
 29 | 
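A quick way to check which kind of subtitle file you have is to look for a timestamp tag followed by a `<c>` tag. This is a minimal sketch: the `has_word_timings` helper is hypothetical, and the exact tag layout in the pattern is an assumption based on the description above, so compare it against your own .vtt file:

```python
import re

# assumed layout: a timestamp tag directly followed by a <c> tag,
# e.g. <00:00:01.500><c> word</c>
WORD_TIMING = re.compile(r"<\d{2}:\d{2}:\d{2}\.\d{3}><c>")

def has_word_timings(vtt_text):
    """Guess whether subtitle text carries word by word timings."""
    return bool(WORD_TIMING.search(vtt_text))

# made-up cue lines, for illustration only
auto_generated = "we<00:00:01.500><c> will</c><00:00:01.900><c> see</c>"
user_uploaded = "We will see what happens."

print(has_word_timings(auto_generated))
print(has_word_timings(user_uploaded))
```

For a real file, pass in its contents, for example `has_word_timings(open("myvideo.en.vtt").read())`.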
 30 | 
 31 | 
 32 | To test a regular expression
 33 | https://regex101.com/
 34 | 
 35 | 
 36 | 
 37 | # Bots bots beep boop
 38 | 
 39 | Today:
 40 | 
 41 | 1. How we can write python code we can use in multiple contexts
 42 |   1. You want to write code that’s reusable / acts like a tool
 43 |   2. We’ll be able to write one file that does something with any kind of input
 44 |   3. Can be used from another script
 45 |   4. Applicable to other programming languages
 46 | 
 47 | We’ll make an example script, and then we’ll make it modular.
 48 | 
 49 | - Take a video of a sunset
 50 | - overlay a word
 51 |   - the word will be different
 52 | 
 53 | 
 54 | 
 55 | ## The script
 56 | 
 57 | We got a video of a sunset from youtube
 58 | 
 59 | https://www.youtube.com/watch?v=Nl3S8VhUxfY&
 60 | 
 61 | 
 62 | We’ll clip it to start at 01:48 until 01:52
 63 | 
 64 |     
 65 |     # Import VideoFileClip to load a video, 
 66 |     # TextClip to overlay text, and CompositeVideoClip 
 67 |     # to take two or more clips and overlay them as layers
 68 |     from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
 69 |     
 70 |     text = "A specter is haunting this sunset"
 71 |     
 72 |     # Load the video, get a subclip that's just our small range, in seconds
 73 |     clip1 = VideoFileClip("sunset.mp4").subclip(108, 112)
 74 |     # Make a text clip, give it a duration
 75 |     clip2 = TextClip(text).set_duration(4)
 76 |     
 77 |     # Make a composite video, takes a list of clips
 78 |     composition = CompositeVideoClip( [ clip1, clip2 ] )
 79 |     
 80 |     # Export the video
 81 |     composition.write_videofile("sunset_words.mp4")
 82 |     
 83 | 
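The 108 and 112 passed to subclip() are just 01:48 and 01:52 written in seconds. A minimal sketch of that conversion, using a hypothetical `to_seconds` helper that is not part of MoviePy:

```python
def to_seconds(timestamp):
    """Convert "MM:SS" or "HH:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        # each step to the left is worth 60 times more
        seconds = seconds * 60 + int(part)
    return seconds

start = to_seconds("01:48")
end = to_seconds("01:52")
print(start, end, end - start)
```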
 84 | Problems with this code
 85 | 
 86 | - The text is tiny
 87 |   - We didn’t tell the text what size to make it
 88 | - the video is a bit big, it’ll be faster if we resize it
 89 | - we want the text clip to be the same size as the video
 90 | 
 91 | Options for TextClip: https://zulko.github.io/moviepy/ref/VideoClip/VideoClip.html?highlight=textclip#textclip
 92 | 
 93 | 
 94 | ## Making it modular
 95 | 
 96 | We want to reuse this code, so we’ll turn all of it into a **function** with def, indenting the code into a block under it
 97 | 
 98 | Also, we want to pass arguments from the terminal so we can **reuse the script with any text we want**, by using sys.argv
 99 | 
100 | **vidbot.py**
101 | 
102 |     # Import VideoFileClip to load a video, TextClip to overlay text, and
103 |     # CompositeVideoClip to take two or more clips and overlay them as layers
104 |     from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
105 |     import sys  # needed for sys.argv below
106 |     
107 |     def compose(text):
108 |       # Load the video, get a subclip that's just our small range, in seconds
109 |       # e.g. start time is one minute, 48 seconds = 60 + 48 = 108
110 |       # Then resize it to make it faster, resize takes a tuple
111 |       clip1 = VideoFileClip("sunset.mp4").subclip(108, 112).resize( (1920/2, 1080/2) )
112 |       # Make a text clip, give it a duration
113 |       # We'll also make it the size of the video clip
114 |       clip2 = TextClip(text, size=clip1.size).set_duration(4)
115 |       
116 |       # Make a composite video, takes a list of clips
117 |       composition = CompositeVideoClip( [ clip1, clip2 ] )
118 |       
119 |       # Export the video
120 |       composition.write_videofile("sunset_words.mp4")
121 |     
122 |     # sys.argv gives us the arguments from the terminal, in a list (first item is the script name)
123 |     text = sys.argv[1]
124 |     # call our function using the text from the terminal
125 |     compose(text)
126 |     
127 | 
128 | 
129 | Now we have a tool to create these videos from the terminal. IT’S AMAZING.
130 | 
131 | 
132 | ## Adding more options
133 | 
134 | We want to be able to change the duration
135 | 
136 | - We’ll make our function take a duration parameter,
137 |   - the start and end of the clip will get calculated based on that
138 | - We’ll make the duration optional
139 |   - by adding a default value on the parameter with *duration=4.0*
140 |   - by checking the sys.argv parameter is actually there, avoid throwing an error if it’s not
141 | 
142 | boop
143 | 
144 |     # import VideoFileClip to load a video, TextClip to overlay text, and CompositeVideoClip to take two or more clips and overlay them as layers
145 |     from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
146 |     import sys
147 |     import argparse
148 |     
149 |     # our function takes a text and duration. duration is optional and defaults to 4.0
150 |     def compose(text, duration=4.0):
151 |       # define when our video starts (one minute, 48 seconds = 60 + 48 = 108)
152 |       start = 108
153 |       # and calculate when it ends
154 |       end = start + duration
155 |       # Load the video, get a subclip that's just our calculated range
156 |       # Then resize it to make it faster, resize takes a tuple
157 |       clip1 = VideoFileClip("sunset.mp4").subclip(start, end).resize( (1920/2, 1080/2) )
158 |       # Make a text clip, give it the same duration as the video clip
159 |       # We'll also make it the size of the video clip
160 |       clip2 = TextClip(text, size=clip1.size).set_duration(duration)
161 |       
162 |       # Make a composite video, takes a list of clips
163 |       composition = CompositeVideoClip( [ clip1, clip2 ] )
164 |       
165 |       # Export the video
166 |       composition.write_videofile("sunset_words.mp4")
167 |     
168 |     # sys.argv gives us the arguments from the terminal, in a list (first item is the script name)
169 |     text = sys.argv[1]
170 |     # get the duration from the second parameter, if it's there, otherwise use a default
171 |     if len(sys.argv) > 2:
172 |       # it comes as a String, we need to turn it into a number with float()
173 |       duration = float(sys.argv[2])
174 |     else:
175 |       duration = 3
176 |     # call our function
177 |     compose(text, duration)
178 |     
179 | 
180 | 
181 | 
182 | ## Tips
183 | - Use the argparse module to simplify parsing sys.argv
184 |   - https://docs.python.org/3/library/argparse.html
185 | 
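As a rough idea of what that could look like for vidbot, here is a minimal sketch. The `build_parser` function and the `--duration` flag name are assumptions for illustration (the class script reads the duration as a second positional argument instead):

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Overlay text on the sunset clip")
    # required positional argument: the text to overlay
    parser.add_argument("text")
    # optional flag, converted to a float for us, with a default value
    parser.add_argument("--duration", type=float, default=4.0)
    return parser

parser = build_parser()
# we pass a list here instead of reading the real terminal arguments
args = parser.parse_args(["A specter is haunting this sunset", "--duration", "2"])
print(args.text, args.duration)

# leaving the flag out falls back to the default
defaults = parser.parse_args(["hello"])
print(defaults.duration)
```

In a real script you would call `parser.parse_args()` with no arguments so it reads sys.argv, then pass `args.text` and `args.duration` to compose().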
186 | 
187 | ## Another script that sends the video
188 | 
189 | We’ll make a script that sends you the video in an email (?)
190 | 
191 | in the terminal:
192 | 
193 |     pip3 install emails
194 | 
195 | python email.py:
196 | 
197 |     import emails
198 |     
199 |     # Create our message object 
200 |     message = emails.html(
201 |       html="Hello friend!", 
202 |       subject="Specter blah blah", 
203 |       mail_from=("Scrap Ism", "scrapism.sfpc@gmail.com")
204 |     )
205 |     # Attach the file
206 |     # Read the video file in binary form (using rb mode)
207 |     message.attach(data=open("sunset_words.mp4", "rb"), filename="sunset_words.mp4")
208 |     
209 |     # Send the email
210 |     message.send(
211 |       to=("Sam", "splavigne@gmail.com"), 
212 |       # A bunch of email server stuff from google
213 |       smtp={
214 |         "host": "smtp.gmail.com",
215 |         "port": 465,
216 |         "ssl": True,
217 |         # You'd use an actual email login info (maybe not your own)
218 |         "user": "scrapism.sfpc@gmail.com",
219 |         "password": "scrapismscrapism"
220 |       }
221 |     )
222 | 
223 | SO COOL
224 | 
225 | 
226 | ## Making the script import the other script 
227 | 
228 | Using **import vidbot** we can import the functions from the other script 🤯 
229 | 
230 | - “vidbot” is whatever name of the other script, without .py
231 | 
232 | When we import another script, everything at the lowest indentation level gets executed, including the part that reads sys.argv. To avoid this, we only run that code if the script is run directly from the terminal.
233 | We do that by wrapping it in this hacky python thing:
234 | 
235 |     if __name__ == "__main__":
236 |       # our code
237 | 
238 | So our resulting vidbot.py:
239 | 
240 |     
241 |     from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
242 |     import sys
243 |     import argparse
244 |     
245 |     def compose(text, duration=4.0):
246 |       start = 108
247 |       end = start + duration
248 |       clip1 = VideoFileClip("sunset.mp4").subclip(start, end).resize( (1920/2, 1080/2) )
249 |       clip2 = TextClip(text, size=clip1.size).set_duration(duration)
250 |       composition = CompositeVideoClip( [ clip1, clip2 ] )
251 |       composition.write_videofile("sunset_words.mp4")
252 |     
253 |     if __name__ == "__main__":
254 |       text = sys.argv[1]
255 |       if len(sys.argv) > 2:
256 |         duration = float(sys.argv[2])
257 |       else:
258 |         duration = 3
259 |       compose(text, duration)
260 |       
261 | 
262 | 
263 | and in our **email.py** we call the vidbot compose function, by adding
264 | 
265 |     
266 |     import vidbot
267 |     
268 |     # ...
269 |     
270 |     vidbot.compose("cool emailz", 1)
271 |     
272 | 
273 | 
274 | We’ll also turn the video into a gif before sending it
275 | 
276 |     
277 |     import subprocess
278 |     
279 |     subprocess.call(["ffmpeg", "-i", "sunset_words.mp4", "sunset_words.gif"])
280 |     
281 | 
282 | 
283 | And we’ll take the message from:
284 | 
285 | - Corpora https://github.com/dariusk/corpora
286 | 
287 | resulting email.py
288 | 
289 |     import emails
290 |     import vidbot
291 |     import subprocess
292 |     
293 |     isms = [
294 |       "abstract expressionism",
295 |       "academic",
296 |       "action painting",
297 |       "aestheticism",
298 |       "art deco",
299 |       "art nouveau",
300 |       # ...
301 |     ]
302 |     
303 |     # Create our message object 
304 |     message = emails.html(
305 |       html="Hello friend!", 
306 |       subject="Specter blah blah", 
307 |       mail_from=("Scrap Ism", "scrapism.sfpc@gmail.com")
308 |     )
309 |     # Turn into a gif
310 |     subprocess.call(["ffmpeg", "-i", "sunset_words.mp4", "sunset_words.gif"])
311 |     # Attach the file
312 |     # Read the video file in binary form (using rb mode)
313 |     message.attach(data=open("sunset_words.gif", "rb"), filename="sunset_words.gif")
314 |     
315 |     # Send the email
316 |     message.send(
317 |       to=("Sam", "splavigne@gmail.com"), 
318 |       # A bunch of email server stuff from google
319 |       smtp={
320 |         "host": "smtp.gmail.com",
321 |         "port": 465,
322 |         "ssl": True,
323 |         # You'd use an actual email login info (maybe not your own)
324 |         "user": "scrapism.sfpc@gmail.com",
325 |         "password": "scrapismscrapism"
326 |       }
327 |     )
328 | 
329 | 
330 | 
331 | 
332 | ## Making a twitter bot
333 | 
334 | Now you have to apply to post to twitter
335 | https://developer.twitter.com/en/apply-for-access
336 | 
337 | 
338 | - install a python library to interface with twitter
339 | - make an application with twitter dev
340 | - get a set of keys
341 |   - consumer key
342 |   - consumer secret
343 |   - access token and access token secret
344 | 
345 | boop
346 | 
347 |     from twython import Twython
348 |     import vidbot
349 |     
350 |     # parameters are APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET
351 |     twitter = Twython("dslfgjlskdfgjdflsgjdslfgsdffjldisfdf","dslfgjlskdfgjdflsgjdslfgsdffjldisfdf","dslfgjlskdfgjdflsgjdslfgsdffjldisfdf","dslfgjlskdfgjdflsgjdslfgsdffjldisfdf",)
352 |     
353 |     vidbot.compose("A spectre blah blah", 1)
354 |     video = open("sunset_words.mp4", "rb")
355 |     response = twitter.upload_video(media=video, media_type="video/mp4")
356 |     twitter.update_status(media_ids=[response["media_id"]])
357 |     
358 | 
359 | 
360 | ## See also:
361 | - instagram picture of plunger
362 |   - https://www.instagram.com/samepicofplunger/
363 | - Post to tumblr using their API
364 | - instagram api for python unofficial
365 |   - https://github.com/LevPasha/Instagram-API-python
366 | 
367 | 
368 | # Turn a python script into a website!
369 | 
370 | 
371 | ## Running a server
372 | 
373 | install flask, make webservers with it
374 | http://flask.pocoo.org/
375 | 
376 | 
377 |     pip3 install flask
378 | 
379 | make webapp.py
380 | 
381 | boop
382 | 
383 |     from flask import Flask
384 |     app = Flask(__name__)
385 |     
386 |     # In webservers
387 |     # You set up routes: when the user goes to this URL, show them this thing
388 |     # using a 'decorator'
389 |     # when user enters the base URL /, perform the function on the next line
390 |     @app.route("/")
391 |     def home():
392 |       return "hello"
393 |     
394 |     # run the web server when we run the script and get the web server going
395 |     if __name__ == "__main__":
396 |       #run in debug mode to update the server when we change the script
397 |       app.run(debug=True)
398 | 
399 | When running the script, a local web server will be created and you can open your browser to the address the script gives you (http://127.0.0.1:5000)
400 | 
401 | When you make a change to your script, you need to **stop and restart the server**
402 | To make it easier, you can tell flask to restart the server every time there’s a change to the file by running it in debug mode:
403 | 
404 |     app.run(debug=True)
405 | 
406 | 
407 | ## Getting data from the URL bar
408 | 
409 | We can get information about the request the user made in the url
410 | 
411 | - 127.0.0.1:5000/?text=lol
412 | 
413 | we get that with the flask request object
414 | 
415 |     # ... (the import becomes: from flask import Flask, request)
416 |     @app.route("/")
417 |     def home():
418 |       text = request.args.get("text", "")
419 |       return "hello!! " + text
420 |     # ...
421 | 
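To see what the `?text=lol` part of the address actually carries, the same lookup can be sketched with Python's standard library alone. This is only an illustration with a hypothetical `get_arg` helper, not a description of how flask does it internally:

```python
from urllib.parse import urlparse, parse_qs

def get_arg(url, name, default=""):
    """Return the first value of a query parameter, or the default."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each name to a list of values
    return query.get(name, [default])[0]

print(get_arg("http://127.0.0.1:5000/?text=lol", "text"))
print(get_arg("http://127.0.0.1:5000/?text=hello%20world", "text"))
print(get_arg("http://127.0.0.1:5000/", "text", "nothing here"))
```

The default plays the same role as the second argument of `request.args.get`: it is what you get back when the parameter is missing from the URL.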
422 | 
423 | 
424 | ## Compositing the video from the website
425 | 
426 | We need to
427 | 
428 | - call the compose function of vidbot
429 |   - but we need to make the output file name unique, since there might be more than one user at the same time
430 | - Show the user a preview of the file
431 |   - using flask static server
432 |   - showing the video in a <video> tag
433 | 
434 | First we’ll add an output file option to vidbot.py on the compose() function
435 | 
436 | Making a unique filename
437 | 
438 | - use the current time, in seconds including the fraction: two people could land on the exact same instant, but it’s unlikely enough to be good enough
439 | 
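Here is a minimal sketch of the timestamp approach next to an alternative that was not used in class: a random UUID from the standard library, which makes a name clash between two users practically impossible. Both helper names are hypothetical:

```python
import time
import uuid

def timestamp_name(folder="static"):
    """File name built from the current time, as described above."""
    return folder + "/" + str(time.time()) + ".mp4"

def uuid_name(folder="static"):
    """File name built from a random UUID (swapped-in alternative)."""
    return folder + "/" + uuid.uuid4().hex + ".mp4"

print(timestamp_name())
print(uuid_name())
```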
440 | boop
441 | 
442 |     import vidbot
443 |     from flask import Flask, request
444 |     import time
445 |     
446 |     app = Flask(__name__)
447 |     
448 |     # In webservers
449 |     # You set up routes: when the user goes to this URL, show them this thing
450 |     # using a 'decorator'
451 |     # when user enters the base URL /, perform the function on the next line
452 |     @app.route("/")
453 |     def home():
454 |       # get the text parameter from the URL. First parameter is the id, second is what to return if it's not there
455 |       text = request.args.get("text", "")
456 |       # Get a unique timestamp 
457 |       ts = str(time.time())
458 |       # Get the output video file name
459 |       outname = "static/" + ts + ".mp4"
460 |       if text:
461 |               # Create the output video
462 |               vidbot.compose(text, 1, outname)
463 |               # Show an html tag with the video in it
464 |               return '<video autoplay loop src=' + outname + '></video>'
465 |       else:
466 |               # If there's no text in the URL, prompt the user
467 |               return "Type text into the url"
468 |     
469 |     # run the web server when we run the script and get the web server going
470 |     if __name__ == "__main__":
471 |             #run in debug mode to update the server when we change the script
472 |             app.run(debug=True)
473 |             
474 | 
475 | 
476 | 
477 | ## See also
478 | - Deploy a website using flask
479 |   - Heroku
480 |     - https://duckduckgo.com/?q=heroku+flask+app&atb=v125-5__&ia=qa
481 |     - doesn’t host static files
482 |   - DigitalOcean
483 |     - https://www.digitalocean.com/
484 |     - more complicated to set it up
485 | 
486 | 
487 | Proper way to have modules
488 | 
489 | - Make different environments to keep the right modules for your project
490 | - using virtualenv
491 | - on the terminal:
492 |   - virtualenv [folder to store the modules] -p [version of python]
493 | 
494 |     virtualenv env -p python3
495 |     source env/bin/activate # set the shell to use this environment
496 |     
497 | now when you use pip install, the packages get installed to that environment
498 |     pip freeze #shows what packages are installed for this environment
499 | 
500 | 
501 | 
502 | # CLASS DONE WE MADE IT
503 | 
504 | 


--------------------------------------------------------------------------------
/class-notes/examples/class7/email.py:
--------------------------------------------------------------------------------
  1 | import emails
  2 | import vidbot
  3 | import subprocess
  4 | 
  5 | isms = [
  6 |   "abstract expressionism",
  7 |     "academic",
  8 |     "action painting",
  9 |     "aestheticism",
 10 |     "art deco",
 11 |     "art nouveau",
 12 |     "avant-garde",
 13 |     "barbizon school",
 14 |     "baroque",
 15 |     "bauhaus",
 16 |     "biedermeier",
 17 |     "caravaggisti",
 18 |     "carolingian",
 19 |     "classicism",
 20 |     "cloisonnism",
 21 |     "cobra",
 22 |     "color field painting",
 23 |     "conceptual art",
 24 |     "cubism",
 25 |     "cubo-futurism",
 26 |     "dada",
 27 |     "dadaism",
 28 |     "de stijl",
 29 |     "deformalism",
 30 |     "der blaue reiter",
 31 |     "die brücke",
 32 |     "divisionism",
 33 |     "eclecticism",
 34 |     "ego-futurism",
 35 |     "existentialism",
 36 |     "expressionism",
 37 |     "fauvism",
 38 |     "fluxus",
 39 |     "formalism",
 40 |     "futurism",
 41 |     "geometric abstraction",
 42 |     "gothic art",
 43 |     "gründerzeit",
 44 |     "hard-edge painting",
 45 |     "historicism",
 46 |     "hudson river school",
 47 |     "humanism",
 48 |     "hyperrealism",
 49 |     "idealism",
 50 |     "illusionism",
 51 |     "immagine&poesia",
 52 |     "impressionism",
 53 |     "incoherents",
 54 |     "installation art",
 55 |     "international gothic",
 56 |     "intervention art",
 57 |     "jugendstil",
 58 |     "kinetic art",
 59 |     "land art",
 60 |     "les nabis",
 61 |     "lettrism",
 62 |     "lowbrow",
 63 |     "luminism",
 64 |     "lyrical abstraction",
 65 |     "mail art",
 66 |     "manierism",
 67 |     "mannerism",
 68 |     "maximalism",
 69 |     "merovingian",
 70 |     "metaphysical art ",
 71 |     "minimalism",
 72 |     "modern art",
 73 |     "modernism",
 74 |     "monumentalism",
 75 |     "multiculturalism",
 76 |     "naturalism",
 77 |     "neo-classicism",
 78 |     "neo-dada",
 79 |     "neo-expressionism",
 80 |     "neo-fauvism",
 81 |     "neo-geo",
 82 |     "neo-impressionism",
 83 |     "neo-minimalism",
 84 |     "neoclassicism",
 85 |     "neoism",
 86 |     "neue slowenische kunst",
 87 |     "new media art",
 88 |     "new objectivity",
 89 |     "nonconformism",
 90 |     "nouveau realisme",
 91 |     "op art",
 92 |     "orphism",
 93 |     "ottonian",
 94 |     "outsider art",
 95 |     "performance art",
 96 |     "perspectivism",
 97 |     "photorealism",
 98 |     "pointilism",
 99 |     "pop art",
100 |     "post-conceptualism",
101 |     "post-impressionism",
102 |     "post-minimalism",
103 |     "post-painterly abstraction",
104 |     "post-structuralism",
105 |     "postminimalism",
106 |     "postmodern art",
107 |     "postmodernism",
108 |     "pre-raphaelites",
109 |     "precisionism",
110 |     "primitivism",
111 |     "purism",
112 |     "rayonism",
113 |     "realism",
114 |     "relational art",
115 |     "remodernism",
116 |     "renaissance",
117 |     "rococo",
118 |     "romanesque",
119 |     "romanticism",
120 |     "russian futurism",
121 |     "russian symbolism",
122 |     "scuola romana",
123 |     "secularism",
124 |     "situationist international",
125 |     "social realism",
126 |     "socialist realism",
127 |     "sound art",
128 |     "street art",
129 |     "structuralism",
130 |     "stuckism international",
131 |     "stuckism",
132 |     "superflat",
133 |     "superstroke",
134 |     "suprematism",
135 |     "surrealism",
136 |     "symbolism",
137 |     "synchromism",
138 |     "synthetism",
139 |     "systems art",
140 |     "tachism",
141 |     "tachisme",
142 |     "tonalism",
143 |     "video art",
144 |     "video game art",
145 |     "vorticism",
146 |     "young british artists"
147 | ]
148 | 
149 | 
150 | # Create our message object 
151 | message = emails.html(
152 |   html="Hello friend!", 
153 |   subject="Specter blah blah", 
154 |   mail_from=("Scrap Ism", "scrapism.sfpc@gmail.com")
155 | )
156 | # Turn into a gif
157 | subprocess.call(["ffmpeg", "-i", "sunset_words.mp4", "sunset_words.gif"])
158 | # Attach the file
159 | # Read the video file in binary form (using rb mode)
160 | message.attach(data=open("sunset_words.gif", "rb"), filename="sunset_words.gif")
161 | 
162 | # Send the email
163 | message.send(
164 |   to=("Sam", "splavigne@gmail.com"), 
165 |   # A bunch of email server stuff from google
166 |   smtp={
167 |     "host": "smtp.gmail.com",
168 |     "port": 465,
169 |     "ssl": True,
170 |     # You'd use an actual email login info (maybe not your own)
171 |     "user": "scrapism.sfpc@gmail.com",
172 |     "password": "scrapismscrapism"
173 |   }
174 | )
175 | 


--------------------------------------------------------------------------------
/class-notes/examples/class7/vidbot.py:
--------------------------------------------------------------------------------
 1 | 
 2 | # Import VideoFileClip to load a video, 
 3 | # TextClip to overlay text, and 
 4 | # CompositeVideoClip to take two or more clips and overlay them as layers
 5 | from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
 6 | import sys
 7 | import argparse
 8 | 
 9 | # our function takes a text and duration. duration is optional and defaults to 4.0
10 | def compose(text, duration=4.0, outname="sunset_words.mp4"):
11 |   # define when our video starts (one minute, 48 seconds = 60 + 48 = 108)
12 |   start = 108
13 |   # and calculate when it ends
14 |   end = start + duration
15 |   # Load the video, get a subclip that's just our calculated range
16 |   # Then resize it to make it faster, resize takes a tuple
17 |   clip1 = VideoFileClip("sunset.mp4").subclip(start, end).resize( (1920/2, 1080/2) )
18 |   # Make a text clip, give it a duration
19 |   # We'll also make it the size of the video clip
20 |   clip2 = TextClip(text, size=clip1.size).set_duration(duration)
21 |   
22 |   # Make a composite video, takes a list of clips
23 |   composition = CompositeVideoClip( [ clip1, clip2 ] )
24 |   
25 |   # Export the video
26 |   composition.write_videofile(outname)
27 | 
28 | # only run this when the script is run directly from the terminal, not when it's imported
29 | if __name__ == "__main__":
30 |   # sys.argv is the list of arguments from the terminal (first item is the script name)
31 |   text = sys.argv[1]
32 |   # the duration is optional and comes as a String, so we convert it with float()
33 |   if len(sys.argv) > 2:
34 |     duration = float(sys.argv[2])
35 |   else:
36 |     duration = 3
37 |   compose(text, duration)
38 | 


--------------------------------------------------------------------------------
/class-notes/examples/class7/webapp.py:
--------------------------------------------------------------------------------
 1 | 
 2 | import vidbot
 3 | from flask import Flask, request
 4 | import time
 5 | 
 6 | app = Flask(__name__)
 7 | 
 8 | # In web servers you set up routes:
 9 | # when the user goes to this URL, show them this thing.
10 | # Routes are set up using a 'decorator':
11 | # when the user enters the base URL /, run the function on the next line
12 | @app.route("/")
13 | def home():
14 |   # get the text parameter from the URL. First parameter is the id, second is what to return if it's not there
15 |   text = request.args.get("text", "")
16 |   # Get a unique timestamp 
17 |   ts = str(time.time())
18 |   # Get the output video file name
19 |   outname = "static/" + ts + ".mp4"
20 |   if text:
21 |     # Create the output video
22 |     vidbot.compose(text, 1, outname)
23 |     # Show an html tag with the video in it
24 |     return '<video autoplay loop src=' + outname + '></video>'
25 |   else:
26 |     # If there's no text in the URL, prompt the user
27 |     return "Type text into the url"
28 | 
29 | # start the web server when we run this script directly
30 | if __name__ == "__main__":
31 |   # run in debug mode to update the server when we change the script
32 |   app.run(debug=True)
33 | 


--------------------------------------------------------------------------------
/class-notes/examples/natural-language-processing/classify.py:
--------------------------------------------------------------------------------
 1 | from textblob import TextBlob
 2 | from textblob.classifiers import NaiveBayesClassifier
 3 | 
 4 | train = [
 5 |     ('i am happy today', 'pos'),
 6 |     ('this is a good burger', 'pos'),
 7 |     ('you\'re a good boy', 'pos'),
 8 |     ('you are doing well', 'pos'),
 9 |     
10 |     ('i do not like you', 'neg'),
11 |     ("don't go there", 'neg'),
12 |     ('this is so frustrating', 'neg'),
13 |     ('things are bad', 'neg')
14 | ]
15 | 
16 | cl = NaiveBayesClassifier(train)
17 | 
18 | sentence = "I feel really bad"
19 | 
20 | # Classify a sentence
21 | print(sentence,"is",cl.classify(sentence))
22 | 
23 | # Get the probability
24 | prob = cl.prob_classify("I don't like things")
25 | print("The probability that this sentence is negative is", prob.prob("neg"))
26 | print("The probability that this sentence is positive is", prob.prob("pos"))
27 | 
28 | 
29 | # Analyze a text
30 | text = open("pride.txt").read()
31 | blob = TextBlob(text)
32 | 
33 | results = {
34 |   "pos": [],
35 |   "neg": []
36 | }
37 | for sentence in blob.sentences:
38 |   cat = cl.classify(str(sentence))
39 |   results[cat].append(sentence)
40 | 
41 | print("There are", len(results["pos"]), "positive sentences and", len(results["neg"]), "negative sentences.")
42 |   
43 | 


--------------------------------------------------------------------------------
/class-notes/examples/natural-language-processing/part-of-speech.py:
--------------------------------------------------------------------------------
 1 | from textblob import TextBlob
 2 | import random
 3 | 
 4 | pride_blob = TextBlob(open("pride.txt").read())
 5 | 
 6 | # pos = "JJ" # adjectives
 7 | pos = "NN" # nouns
 8 | 
 9 | 
10 | # List of POS tags https://www.clips.uantwerpen.be/pages/mbsp-tags
11 | parts = []
12 | print("Finding all the parts...")
13 | for word, tag in pride_blob.tags:
14 |   if tag == pos:
15 |     parts.append(word)
16 | 
17 | output = ""
18 | print ("Replacing all the parts...")
19 | for line in open("manifesto.txt").readlines():
20 |   # If it's a noun, replace it with one of Jane Austen's nouns
21 |   for word, tag in TextBlob(line).tags:
22 |     if tag == pos:
23 |       line = line.replace(word, random.choice(parts))
24 |   output += line
25 | 
26 | # Save to a file
27 | f = open("output_compare.txt", "w")
28 | f.write(output)
29 | f.close()
30 | 
31 | print("Done!")
32 | 


--------------------------------------------------------------------------------
/class-notes/examples/natural-language-processing/regexp.py:
--------------------------------------------------------------------------------
1 | import re
2 | 
3 | text = "Mr. Hurst and Mr. Bingley were at piquet, and Mrs. Darcy was observing their game."
4 | print(text)
5 | 
6 | p = re.compile(r'Mr[a-z]?\. ([a-z]*)', re.IGNORECASE)
7 | for match in p.finditer(text):
8 |   print("Found name: ",match.group(1))
9 | 


--------------------------------------------------------------------------------
/class-notes/examples/natural-language-processing/similarity.py:
--------------------------------------------------------------------------------
 1 | # python3 -m pip install spacy
 2 | # python3 -m spacy download en_core_web_sm
 3 | import spacy
 4 | 
 5 | pairs = [
 6 |   ["A specter is haunting Europe--the specter of Communism.", "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."],
 7 |   ["The discovery of America, the rounding of the Cape, opened up fresh ground for the rising bourgeoisie.", "The discovery of Mrs. Philips, the rounding of the daughters, opened up fresh ground for the ball."]
 8 | ]
 9 | 
10 | # Load the English model - Other languages: https://spacy.io/usage/models
11 | nlp = spacy.load('en_core_web_sm')
12 | 
13 | # Similarity
14 | for pair in pairs:
15 |   nlp1 = nlp(pair[0])
16 |   nlp2 = nlp(pair[1])
17 |   print("Phrase 1:  ", nlp1)
18 |   print("Phrase 2:  ", nlp2)
19 |   print("--- Similarity: ", nlp1.similarity(nlp2))
20 | 


--------------------------------------------------------------------------------
/class-notes/examples/natural-language-processing/translate.py:
--------------------------------------------------------------------------------
 1 | from textblob import TextBlob
 2 | 
 3 | text = open("pride.txt").read()[0:2000]
 4 | en_blob = TextBlob(text)
 5 | 
 6 | # Translate
 7 | # Language codes in: https://cloud.google.com/translate/docs/languages
 8 | # es_blob = en_blob.translate(to="es")
 9 | 
10 | # Translate there and back
11 | print("Translating...")
12 | blob = en_blob.translate(to="es").translate(from_lang="es", to="en").translate(to="sw").translate(from_lang="sw", to="en")
13 | output = str(blob)
14 | 
15 | # print(str(sentence),">>>",crazy_sentence)  
16 | 
17 | # Save to a file
18 | f = open("output_parts-of-speech.txt", "w")
19 | f.write(output)
20 | f.close()
21 | 
22 | print("Done!")
23 | 


--------------------------------------------------------------------------------
/class-notes/examples/video/combine_videos.py:
--------------------------------------------------------------------------------
 1 | 
 2 | # Combine two videos using moviepy
 3 | 
 4 | # import the editor library but call it mp to make it shorter
 5 | import moviepy.editor as mp
 6 | # another way is to only import the functions we need, a bit faster
 7 | # from moviepy.editor import VideoFileClip, concatenate_videoclips
 8 | 
 9 | # Join videos together
10 | clip1 = mp.VideoFileClip("fan_upside_down.mp4")
11 | clip2 = mp.VideoFileClip("prancercise.mp4")
12 | # get a segment
13 | clip2 = clip2.subclip(10, 13.5)
14 | 
15 | # the function takes a list of video objects
16 | final_clip = mp.concatenate_videoclips([clip1, clip2])
17 | # output the video to a file
18 | final_clip.write_videofile("output.mp4")
19 | 


--------------------------------------------------------------------------------
/class-notes/examples/video/random_overlay.py:
--------------------------------------------------------------------------------
 1 | 
 2 | import random
 3 | import moviepy.editor as mp
 4 | 
 5 | video = mp.VideoFileClip("dance.mp4")
 6 | 
 7 | # get the duration of the video, in seconds
 8 | video_duration = video.duration
 9 | # define the duration of the subclips we're gonna take
10 | clip_duration = 1.5
11 | # make a list we'll populate with subclips
12 | clips = []
13 | 
14 | for i in range(0,10): # do this 10 times
15 |   # get a random start point
16 |   # make it so the end can never be past the full video duration
17 |   start = random.uniform(0, video_duration - clip_duration)
18 |   # the end point is whatever start time plus the subclip duration
19 |   end = start + clip_duration
20 | 
21 |   # store the subclip so we can do things to it
22 |   clip = video.subclip(start,end)
23 | 
24 |   # set the position of the clip in our canvas
25 |   clip = clip.set_position((random.randint(-100, 800), random.randint(-100, 400)))
26 |   # set the start time, to whatever position we're in (i) by half, just for fun
27 |   clip = clip.set_start( i / 2.0 )
28 |   # add to the list
29 |   clips.append(clip)
30 | 
31 | # make a video that is a composite of all the subclips
32 | final_clip = mp.CompositeVideoClip(clips)
33 | 
34 | # write it to a file
35 | final_clip.write_videofile("random_dance.mp4")
36 | # use this to write a video file with audio that mac understands
37 | #final_clip.write_videofile("random_dance.mp4", codec="libx264", temp_audiofile="something.m4a", remove_temp=True, audio_codec="aac")
38 | 


--------------------------------------------------------------------------------
/class-notes/examples/video/randomize.py:
--------------------------------------------------------------------------------
 1 | 
 2 | # Get random segments from a video and make a new video from all those subclips
 3 | 
 4 | import random
 5 | import moviepy.editor as mp
 6 | 
 7 | video = mp.VideoFileClip("dance.mp4")
 8 | 
 9 | # get the duration of the video, in seconds
10 | video_duration = video.duration
11 | # define the duration of the subclips we're gonna take
12 | clip_duration = 0.5
13 | # make a list we'll populate with subclips
14 | clips = []
15 | 
16 | for i in range(0,10): # do this 10 times
17 |   # get a random start point
18 |   # make it so the end can never be past the full video duration
19 |   start = random.uniform(0, video_duration - clip_duration)
20 |   # the end point is whatever start time plus the subclip duration
21 |   end = start + clip_duration
22 |   # add a subclip to the list, between our start and end points
23 |   clips.append(video.subclip(start,end))
24 | 
25 | # create the final video out of all the clips
26 | final_clip = mp.concatenate_videoclips(clips)
27 | # write it to a file
28 | final_clip.write_videofile("random_dance.mp4")
29 | 
30 | # use this to write a video file with audio that mac understands
31 | #final_clip.write_videofile("random_dance.mp4", codec="libx264", temp_audiofile="something.m4a", remove_temp=True, audio_codec="aac")
32 | 


--------------------------------------------------------------------------------
/reader-01-the-command-line.md:
--------------------------------------------------------------------------------
  1 | # Reader 01 - The Command Line
  2 | 
  3 | The command line is a text-based interface for interacting with your computer. From the command line you can launch programs, view files, and manipulate your file system by making, moving, and copying files and directories. You can think of it as the Finder in Mac, without the graphic interface, but much more powerful.
  4 | 
  5 | ## Setup
  6 | 
  7 | On a Mac you can access the command line by opening up the `Terminal` application, located in `/Applications/Utilities/Terminal`
  8 | 
  9 | To get started on Windows you will need to set up the Windows Subsystem for Linux, which allows you to run Ubuntu (a Linux distribution) from within your current Windows 10 installation.  [Follow this guide to do so](https://tutorials.ubuntu.com/tutorial/tutorial-ubuntu-on-windows).
 10 | 
 11 | 
 12 | ## The Prompt
 13 | 
 14 | When you open up your terminal application you'll see something like this:
 15 | 
 16 | ```bash
 17 | SamsComputer:~ sam$
 18 | ```
 19 | 
 20 | This is called the "prompt". By default (on a Mac) it shows the name of the computer, the directory that you are currently in, your username, and then a $ sign.
 21 | 
 22 | The basic use of the command line is: 1) you type a command, 2) you hit return, and 3) some output of the command is printed to the screen.
 23 | 
 24 | ## Basic Navigation & File Operations
 25 | 
 26 | *Please note I use the words "directory" and "folder" interchangeably.*
 27 | 
 28 | When you open a new terminal window, you are placed inside your home folder. On a Mac this is `/Users/myusername` and on Linux, `/home/myusername`. 
 29 | 
 30 | To see the folder you are currently in, type: `pwd` and hit return. `pwd` stands for "print working directory", or in other words, "show me the directory I am currently working from".
 31 | 
 32 | #### Here are some basic commands for getting around, making, deleting and copying files and folders.
 33 | 
 34 | 
 35 | **`pwd`** stands for "print working directory". It prints out where you are:
 36 | 
 37 | ```bash
 38 | pwd
 39 | ```
 40 | 
 41 | **`ls`** stands for "list". It lists the contents of the current directory:
 42 | 
 43 | ```bash
 44 | ls
 45 | ```
 46 | 
 47 | **`cd`** stands for "change directory". Type `cd` and then the directory you want to go to. For example, change to the Desktop from your home folder:
 48 | 
 49 | ```bash
 50 | cd Desktop
 51 | ```
 52 | 
 53 | To go into the parent folder, up one level in the file structure, type `..` or `../` instead of a folder name, like so:
 54 | 
 55 | ```bash
 56 | cd ..
 57 | ```
 58 | 
 59 | If you type `cd` without a folder name after, it takes you back to your home folder.
 60 | 
 61 | 
 62 | **`mkdir`** stands for "make directory". Type `mkdir` and then a name to make a folder. For example, make a folder called "cool_project":
 63 | 
 64 | ```bash
 65 | mkdir cool_project
 66 | ```
 67 | 
 68 | **`mv`** stands for "move". It lets you move files and folders and also rename them. To rename a file:
 69 | 
 70 | ```bash
 71 | mv oldname.txt newname.txt
 72 | ```
 73 | 
 74 | **`cp`** stands for "copy". It lets you duplicate files:
 75 | 
 76 | ```bash
 77 | cp draft.txt draft_copy.txt
 78 | ```
 79 | 
 80 | **`rm`** stands for "remove". It lets you delete files:
 81 | 
 82 | ```bash
 83 | rm bad_selfie.jpg
 84 | ```
 85 | 
 86 | Please note, `rm` will **not** ask for confirmation, and it will not move files to the trash. It'll just delete them immediately, so be careful.
 87 | 
 88 | **`cat`** stands for "concatenate" and it shows you the contents of a file and also allows you to join two files together. For example, to print out the entirety of Moby Dick:
 89 | 
 90 | ```bash
 91 | cat mobydick.txt
 92 | ```
 93 | 
 94 | **`more`** is like `cat` but will paginate the output if it is larger than the size of your terminal window:
 95 | 
 96 | ```bash
 97 | more mobydick.txt
 98 | ```
 99 | (now use the up and down arrows to go up or down by a line, the space to go down by a page and `q` to exit if needed)
100 | 
101 | **`file`** provides basic info about a file:
102 | 
103 | ```bash
104 | file mysteryfile.what
105 | ```
106 | 
107 | **`sort`** sorts a file alphabetically by line and prints the output to the screen:
108 | 
109 | ```bash
110 | sort names.txt
111 | ```
112 | 
113 | **`grep`** searches each line of a file for some input, and prints those lines to the screen. For example, the following searches for all lines in Moby Dick containing the word "whale".
114 | 
115 | ```bash
116 | grep whale mobydick.txt
117 | ```
118 | 
119 | ## Command Line Options and Getting Help
120 | 
121 | Most commands have extra options that you can input when you run the command.  They are usually preceded by either one or two dashes (`-` or `--`). 
122 | 
123 | The structure of a typical command looks like this:
124 | 
125 | ```bash
126 | command_name [options] arguments
127 | ```
128 | 
129 | ("arguments" refers to the file or files you are running the command on)
130 | 
131 | For example, the `sort` command outputs in ascending order by default, but you can have it use reverse order with the `-r` option, like so:
132 | 
133 | ```bash
134 | sort -r mobydick.txt
135 | ```
136 | 
137 | You can also tell `sort` to only output unique lines (ie, to remove any duplicate lines) with the `-u` option:
138 | 
139 | ```bash
140 | sort -u mobydick.txt
141 | ```
142 | 
143 | Finally you can combine options:
144 | 
145 | ```bash
146 | sort -u -r mobydick.txt
147 | ```
148 | 
149 | Sometimes, options have parameters. For example, the `cut` command cuts out portions of each line of a file. To use it you must specify a delimiter character with the `-d` option and a field number to extract with the `-f` option. To get the first word of every line in Moby Dick I might enter:
150 | 
151 | ```bash
152 | cut -d " " -f 1 mobydick.txt
153 | ```
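
For comparison, here is a rough sketch of the same idea in Python (introduced in Reader 02). It is not how `cut` itself is implemented; the sample lines and the `first_field` helper are made up for illustration.

```python
# A rough Python equivalent of: cut -d " " -f 1
# (the sample lines are hypothetical stand-ins for a real file)
lines = [
    "Call me Ishmael.",
    "Some years ago never mind how long precisely",
]

def first_field(line, delimiter=" ", field=1):
    # split the line on the delimiter, like the -d option
    parts = line.split(delimiter)
    # cut counts fields starting at 1, Python lists start at 0
    return parts[field - 1]

first_words = [first_field(line) for line in lines]
print(first_words)
```

The delimiter and the field number play the same role as the parameters passed to `-d` and `-f`.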
154 | 
155 | To see all the options and view a manual for any command, use the `man` tool (short for "manual")
156 | 
157 | ```bash
158 | man cut
159 | ```
160 | 
161 | Use the arrow keys to navigate, and `q` to exit.
162 | 
163 | ## Piping and Directing Output
164 | 
165 | Most commands will produce output on the screen. However we can also automatically save that output to the filesystem using the `>` character followed by a filename.
166 | 
167 | Sort a file called "names.txt", and save the output to a new file called "sorted_names.txt":
168 | 
169 | ```bash
170 | sort names.txt > sorted_names.txt
171 | ```
172 | 
173 | `>` will create a file if it does not already exist, or overwrite one if it does. You can use `>>` instead to append to a file.
174 | 
175 | Unix also has a very powerful concept called "pipes" which allow us to chain commands together, effectively feeding the output of one command into the input of another. To do so, we use the `|` symbol.
176 | 
177 | Extract all lines of Moby Dick containing "whale", then sort them and remove duplicates:
178 | 
179 | ```bash
180 | grep whale mobydick.txt | sort -u
181 | ```
182 | 
183 | The `|` here means "take the output of the grep command and send it to sort -u". You can use as many pipes as you desire, and combine this technique with the output redirection.
184 | 
185 | Extract all lines of Moby Dick containing "whale", then sort them and remove duplicates, then save the result to a new file called "sorted_whales.txt":
186 | 
187 | ```bash
188 | grep whale mobydick.txt | sort -u > sorted_whales.txt
189 | ```
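
To make the chaining idea concrete, here is a minimal Python sketch of the same two steps (Python is covered in Reader 02). The sample lines are invented, and a real script would read them from a file instead.

```python
# A rough Python version of: grep whale mobydick.txt | sort -u
# (the sample lines below are made up for illustration)
lines = [
    "the whale surfaced",
    "a calm sea",
    "the whale surfaced",
    "a white whale",
]

# "grep whale": keep only the lines that contain the word
matches = [line for line in lines if "whale" in line]

# "sort -u": remove duplicates with set(), then sort what is left
sorted_whales = sorted(set(matches))

for line in sorted_whales:
    print(line)
```

Each step takes the output of the previous one as its input, which is exactly what the `|` does on the command line.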
190 | 
191 | 
192 | ## The Structure of the Filesystem
193 | 
194 | Everything on your computer is either a file or a folder, and these files and folders are organized hierarchically, like a tree. At the very bottom of the tree is the "root folder", indicated by a single forward slash, like so `/`. Here's a basic example of directory structure:
195 | 
196 | ```
197 | /
198 |     Users/
199 |         sam/
200 |             Desktop/
201 |                 trotsky.jpg
202 |                 the_man_without_qualities.txt
203 |             Documents/
204 |             Downloads/
205 |         Guest/
206 |     Applications/
207 |     Volumes/
208 | ```
209 | 
210 | Each file and folder has a unique location on the filesystem. This location is called a "path". You can reference files and folders either by their **relative** path, or by their **absolute** or **full** path. In the previous examples I have been using the relative path - that is, I have been referencing files relative to where I currently am. **A path is absolute if it begins with a `/`**
211 | 
212 | For example the absolute path to `the_man_without_qualities.txt` in the above filesystem is `/Users/sam/Desktop/the_man_without_qualities.txt`. I can look inside the contents of this file, from any working directory, with this command:
213 | 
214 | ```bash
215 | more /Users/sam/Desktop/the_man_without_qualities.txt
216 | ```
217 | 
218 | There are a few shortcuts for dealing with paths as well.
219 | 
220 | `.` (single dot) or `./` (single dot with slash) means the current folder that I am in.
221 | 
222 | `..` (two dots) or `../` (two dots with slash) means the parent folder. For example, if I am in my Desktop folder and I want to list the contents of my Downloads folder I could type:
223 | 
224 | ```bash
225 | ls ../Downloads/
226 | ``` 
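
The difference between absolute and relative paths can also be sketched in Python (see Reader 02) with the standard `pathlib` module. This is only an illustration: the working directory below is an assumption taken from the example tree, and nothing here touches your real filesystem.

```python
from pathlib import PurePosixPath

# assumption: this is the folder we are "in", as in the example tree
working_directory = PurePosixPath("/Users/sam/Desktop")

# an absolute path begins with "/"
absolute = PurePosixPath("/Users/sam/Desktop/trotsky.jpg")
print(absolute.is_absolute())   # True

# a relative path does not
relative = PurePosixPath("../Downloads")
print(relative.is_absolute())   # False

# joining the working directory and the relative path shows where it points
joined = working_directory / relative
print(joined)                   # /Users/sam/Desktop/../Downloads

# ".." means "the parent folder", so that is the same place as:
same_place = working_directory.parent / "Downloads"
print(same_place)               # /Users/sam/Downloads
```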
227 | 
228 | ## Wildcards
229 | 
230 | It's also possible to reference multiple files using the `*` character in combination with other characters. This can be really useful in a lot of situations.
231 | 
232 | For example, you can list all files that begin with the word "the" like so:
233 | 
234 | ```bash
235 | ls the*
236 | ```
237 | 
238 | List all jpg images:
239 | 
240 | ```bash
241 | ls *.jpg
242 | ```
243 | 
244 | Make a folder called `images` and move all jpeg images into it:
245 | 
246 | ```bash
247 | mkdir images
248 | mv *.jpg images/
249 | ```
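
Python's standard `fnmatch` module understands the same style of `*` patterns, which makes it a handy way to see what a wildcard will match. A minimal sketch, using invented file names rather than a real folder:

```python
import fnmatch

# hypothetical file names, standing in for the contents of a folder
names = [
    "the_man_without_qualities.txt",
    "trotsky.jpg",
    "theory.jpg",
    "notes.txt",
]

# like: ls the*
starts_with_the = fnmatch.filter(names, "the*")
print(starts_with_the)

# like: ls *.jpg
jpgs = fnmatch.filter(names, "*.jpg")
print(jpgs)
```

The `*` stands for "any run of characters", so `the*` matches anything that starts with "the" and `*.jpg` matches anything that ends in ".jpg".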
250 | 
251 | 
252 | ## Tips
253 | 
254 | It can take a while to get used to the command line, but there are a few tips and tricks that make it much easier to use.
255 | 
256 | * Use the up and down arrows to view a history of the commands you have entered. 
257 | * Hit the tab key to autocomplete commands and file paths
258 | * Type `open` and then a filename to open the file in its default program (on a Mac)
259 | * Drag a folder or file onto the terminal to fill in its absolute path
260 | * Type ctrl-a to move your cursor to the beginning of the line, and ctrl-e to the end
261 | 
262 | 


--------------------------------------------------------------------------------
/reader-02-python-basics.md:
--------------------------------------------------------------------------------
  1 | # Reader 02 - Intro to Python
  2 | 
  3 | ## Installation
  4 | 
  5 | There are two major versions of Python, Python 2 and Python 3. For this class we will be using Python 3, which you will need to install on your computer. The easiest way to do this, on a Mac, is with another program called Homebrew, a command line tool that allows you to install and manage other programs.
  6 | 
  7 | ### Install Homebrew
  8 | 
  9 | Visit [Homebrew's website](https://brew.sh/) and follow the instructions there, or just copy and paste the following into your terminal:
 10 | 
 11 | ```bash
 12 | /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
 13 | ```
 14 | 
 15 | ### Install Python 3
 16 | 
 17 | Once Homebrew is done, you can use it to install Python and a number of other extremely useful command line tools.
 18 | 
 19 | ```bash
 20 | brew install python3
 21 | ```
 22 | 
 23 | ### Install a Text Editor
 24 | 
 25 | To create and edit Python files you'll need a good text editor, specifically designed to edit code. Here are a few good options:
 26 | 
 27 | * [Visual Studio Code](https://code.visualstudio.com/)
 28 | * [Atom](https://atom.io/)
 29 | * [Sublime](https://www.sublimetext.com/) (not free)
 30 | 
 31 | 
 32 | ## Basics
 33 | 
 34 | Python is a command line application, just like `cat`, `grep` and `sort`.
 35 | 
 36 | To execute Python code you run the `python3` command with a text file as an argument.
 37 | 
 38 | To start, let's make a simple program that prints a message on the terminal. To do this we will use the `print` function.
 39 | 
 40 | Create a new file called `hello.py` and put this in it:
 41 | 
 42 | ```python
 43 | print("Hello comrade")
 44 | ```
 45 | 
 46 | Open your terminal and navigate to the directory where the file is saved, and then type:
 47 | 
 48 | ```bash
 49 | python3 hello.py
 50 | ```
 51 | You should see "Hello comrade" printed on the screen.
 52 | 
 53 | ### Expressions
 54 | 
 55 | An "expression" is a piece of code that produces a value. Python will read or evaluate your expressions and return a result. For example you can do arithmetic:
 56 | 
 57 | ```python
 58 | print(1+1)
 59 | print(10/2)
 60 | print(100 * 6.2 - 70/3.5)
 61 | ```
 62 | 
 63 | You can also test to see how different expressions relate to each other. 
 64 | 
 65 | `==` tests for equality  
 66 | `<` less than  
 67 | `>` greater than  
 68 | `<=` less than or equal  
 69 | `>=` greater than or equal  
 70 | 
 71 | ```python
 72 | print(1 == 1)
 73 | print(1 == 2)
 74 | print(1 < 2)
 75 | print(5 * 20 >= 100/13)
 76 | ```
 77 | All of these expressions will evaluate to either `True` or `False`.
 78 | 
 79 | ### Variables
 80 | 
 81 | You can store the value of expressions inside named variables using the `=` symbol.
 82 | 
 83 | ```python
 84 | x = 2
 85 | y = 5
 86 | z = x + y
 87 | print(x * 100)
 88 | print(z)
 89 | ```
 90 | 
 91 | #### Types
 92 | 
 93 | Values have different "types" or categories. For example, 1 is an integer, 1.5 is a float.
 94 | 
 95 | You can see what type a value is by using the `type` function:
 96 | 
 97 | ```python
 98 | print(type(1))
 99 | ```
100 | 
101 | Some important types are:
102 | 
103 | ```python
104 | a_number = 1 				# an integer
105 | another_number = 5.1 		# a float
106 | some_string = "Hello!" 		# a string
107 | some_boolean = True 		# a boolean (notice the capitalization)
108 | a_list = ["a bunch", "of", "stuff", a_number, some_string] # a list
109 | a_dictionary = {"key1": 10, "key2": "a string"} # a dictionary (key/value pairs)
110 | ```
111 | 
112 | In Python you do not need to declare variable types, or even state that you are declaring a variable. You simply type a name, the equals sign, and then a value or expression.
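
A small sketch of what this means in practice: the same name can hold values of different types over time, and `str()` and `int()` convert between types when they need to be combined. The variable names here are just examples.

```python
# the same variable name can point at different types of values
thing = 1
print(type(thing))   # <class 'int'>

thing = 1.5
print(type(thing))   # <class 'float'>

thing = "Hello!"
print(type(thing))   # <class 'str'>

# to combine a string and a number, convert one of them first
count = 3
label = "cats: " + str(count)
print(label)         # cats: 3

# int() turns a string of digits into a number
number = int("42") + 1
print(number)        # 43
```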
113 | 
114 | ### Strings
115 | 
116 | Strings are a variable type that stores text. To create a string, surround some text within quotation marks. It doesn't matter if you use single or double quotes as long as you are consistent.
117 | 
118 | ```python
119 | first_name = "Karl"
120 | last_name = 'Marx'
121 | 
122 | print(first_name)
123 | print(last_name)
124 | ```
125 | 
126 | If you add two or more strings together, Python will combine them into a new string for you.
127 | 
128 | ```python
129 | first_name = "Karl"
130 | last_name = 'Marx'
131 | 
132 | print(first_name + last_name)
133 | 
134 | print(first_name + " " + last_name)
135 | ```
136 | 
137 | Each character in a string is indexed numerically, and you can access individual characters using `[]` square brackets.
138 | 
139 | ```python
140 | name = "Karl Marx"
141 | first_letter = name[0]
142 | print(first_letter)
143 | 
144 | second_letter = name[1]
145 | print(second_letter)
146 | ```
147 | 
148 | The character index begins with the number 0. If you wish to access the last character, you use `-1`. The second to last, `-2` and so on.
149 | 
150 | ```python
151 | name = "Karl Marx"
152 | last_letter = name[-1]
153 | print(last_letter)
154 | ```
155 | 
156 | You can also get a range of characters in a string by entering a starting and ending index in your square brackets:
157 | 
158 | 
159 | ```python
160 | name = "Karl Marx"
161 | first_three_letters = name[0:3]
162 | print(first_three_letters)
163 | ```
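
A few more slicing variations, as a quick sketch. The character at the starting index is included but the one at the ending index is not, and leaving out either number means "from the beginning" or "to the end".

```python
name = "Karl Marx"

# the end index is not included: this gives characters 0, 1 and 2
print(name[0:3])       # Kar

# leave out the start to begin at the first character
first_name = name[:4]
print(first_name)      # Karl

# leave out the end to go all the way to the last character
last_name = name[5:]
print(last_name)       # Marx

# negative numbers count from the end
last_three = name[-3:]
print(last_three)      # arx
```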
164 | 
165 | To get the total length of a string, use the `len()` function.
166 | 
167 | ```python
168 | print(len("hello!"))
169 | ```
170 | 
171 | You can also determine if a string exists within another string with the `in` keyword.
172 | 
173 | ```python
174 | sentence = "A spectre is haunting Europe"
175 | print("spectre" in sentence)
176 | ```
177 | 
178 | #### String methods
179 | 
180 | Python's string implementation comes with many useful methods that allow you to transform and get information about strings.
181 | 
182 | For example, to make a string uppercase:
183 | 
184 | ```python
185 | sentence = "hello there!"
186 | uppercase = sentence.upper()
187 | print(uppercase)
188 | ```
189 | 
190 | Here are a few more examples of things that you can do:
191 | 
192 | ```python
193 | sentence = "   HELLO THERE   "
194 | 
195 | # make it lowercase
196 | lowercase_sentence = sentence.lower()
197 | 
198 | # make it title case
199 | titlecase_sentence = sentence.title()
200 | 
201 | # remove white space at the start and end
202 | stripped = sentence.strip()
203 | 
204 | # replace one set of characters with another
205 | goodbye_sentence = sentence.replace("HELLO", "GOODBYE")
206 | ```
207 | 
208 | Here's a full list: [https://docs.python.org/3.7/library/stdtypes.html#string-methods](https://docs.python.org/3.7/library/stdtypes.html#string-methods)
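
Two more methods from that list, `split()` and `join()`, are especially useful when working with text. The sketch below also shows that string methods return a new string and leave the original untouched, and that methods can be chained. The example sentence is arbitrary.

```python
sentence = "  workers of the world, unite!  "

# methods can be chained: strip the white space, then title-case the result
cleaned = sentence.strip().title()
print(cleaned)       # Workers Of The World, Unite!

# the original string has not changed
print(sentence == "  workers of the world, unite!  ")   # True

# split() breaks a string into a list of words
words = sentence.split()
print(words)         # ['workers', 'of', 'the', 'world,', 'unite!']

# join() does the opposite: it glues a list of strings together
rejoined = "-".join(words)
print(rejoined)      # workers-of-the-world,-unite!
```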
209 | 
210 | ### Lists
211 | 
212 | A list is an ordered collection of values, each with a numeric index. Lists are known as arrays in some other languages.
213 | 
214 | ```python
215 | # make an empty list
216 | my_list = []
217 | 
218 | # add something to our list with the "append" method
219 | my_list.append("hi") # the list will now look like this: ["hi"]
220 | 
221 | # add some more stuff
222 | my_list.append(45)
223 | my_list.append(100.2)
224 | my_list.append("whatever")
225 | 
226 | # now our list will look like this:
227 | # ["hi", 45, 100.2, "whatever"]
228 | 
229 | # get the length of a list
230 | len(my_list) # will be 4
231 | 
232 | # you can access individual items in the list by referring to their index value
233 | print(my_list[0]) # prints "hi"
234 | print(my_list[2]) # prints 100.2
235 | 
236 | # use negative numbers to start at the back
237 | print(my_list[-1]) # prints "whatever" - the last item
238 | 
239 | # you can access part of a list with a ":"
240 | my_list[1:3] # will be [45, 100.2] (the end index is not included)
241 | ```
242 | 
243 | You can iterate through every value in a list with the `for` keyword:
244 | 
245 | ```python
246 | for item in my_list:
247 | 	print(item)
248 | ```
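
A common pattern is to loop over one list and build a new list as you go. Here is a minimal sketch with made-up names; it also uses the built-in `enumerate()` function, which gives you each item's index alongside the item.

```python
names = ["karl", "rosa", "emma"]

# build a new list by transforming each item
shouted = []
for name in names:
    shouted.append(name.upper())

print(shouted)   # ['KARL', 'ROSA', 'EMMA']

# enumerate() gives you the index and the item together
for index, name in enumerate(names):
    print(index, name)
```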
249 | 
250 | ### Reading files
251 | 
252 | To open a file in Python, use the `open()` function. The function takes two arguments. The first is the name of the file to open, and the second is a flag that states whether we are opening the file with the intent of *reading* from it (use "r") or *writing* to it (use "w").
253 | 
254 | Once we have opened a file, we use the `read` method to grab its contents and return them as a string.
255 | 
256 | In this example, we open a file and store its contents in a string. We then uppercase the entire file and print it to the screen.
257 | 
258 | ```python
259 | content = open("communist_manifesto.txt", "r").read()
260 | loud_manifesto = content.upper()
261 | print(loud_manifesto)
262 | ```
263 | 
264 | You can also store a file as a list of lines using `readlines()` instead of `read()`.
265 | 
266 | This example prints the first 5 characters of each line of a text file.
267 | 
268 | ```python
269 | all_lines = open("communist_manifesto.txt", "r").readlines()
270 | for line in all_lines:
271 | 	print(line[0:5])
272 | ```
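
Writing works the same way with the "w" flag. The sketch below writes two lines to a file and reads them back. It uses a `with` block, which closes the file automatically when the block ends. The file name `notes.txt` is just an example, and the file is created in a temporary folder (an assumption made here so the example does not clutter your working directory).

```python
import os
import tempfile

# make a temporary folder and build a path to a new file inside it
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")

# "w" opens the file for writing; "with" closes it for us afterwards
with open(path, "w") as f:
    f.write("first line\n")
    f.write("second line\n")

# "r" opens the same file for reading
with open(path, "r") as f:
    lines = f.readlines()

print(lines)   # ['first line\n', 'second line\n']
```

Remember that opening an existing file with "w" replaces whatever was in it.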
273 | 


--------------------------------------------------------------------------------