├── .gitignore ├── 01-02.Course Notes └── Course Notes - Web Scraping and API Fundamentals in Python.pdf ├── 03.Working with APIs ├── Currency Exchange API │ ├── Section 3 - Additional API functionalities.ipynb │ ├── Section 3 - Creating a simple currency converter.ipynb │ ├── Section 3 - Exchange rates API GETting a JSON reply.ipynb │ ├── Section 3 - Incorporating parameters in a GET request.ipynb │ ├── additional_API_functionalities.py │ ├── currency_converter.py │ ├── exchange_rate_API.py │ └── exchange_rate_API_with_paremeters.py ├── EDAMAM API │ ├── EDAMAM_API.py │ ├── RoastedChicken_nutrients.csv │ ├── Section 3 - Downloading files with requests.ipynb │ ├── Section 3 - EDAMAM API - Initial setup and registration.ipynb │ └── Section 3 - EDAMAM API - Sending a POST request.ipynb ├── GitHub API │ ├── Section 3 - GitHub API - Pagination.ipynb │ └── github_API.py └── iTune API │ ├── Section 3 - iTunes API - Exercise Solution.ipynb │ ├── Section 3 - iTunes API - Exrecise Setup.ipynb │ ├── Section 3 - iTunes API - Structuring and exporting the data.ipynb │ ├── Section 3 - iTunes API.ipynb │ ├── iTunes_API.py │ ├── iTunes_API_structuring_exporting.py │ ├── songs_info.csv │ └── songs_info.xlsx ├── 04.HTML Overview ├── Section 4 - CSS and JavaScript.html ├── Section 4 - CSS style tag.html ├── Section 4 - Character encoding - Euro sign.html └── Section 4 - My First Webpage.html ├── 05.Web Scraping with Beautiful Soup ├── Section 5 - Extracting data from nested HTML tags.ipynb ├── Section 5 - Extracting data from the HTML tree.ipynb ├── Section 5 - Extracting text from an HTML tag.ipynb ├── Section 5 - Practical example - Exercise Setup-MyWork.ipynb ├── Section 5 - Practical example - Exercise Setup.ipynb ├── Section 5 - Practical example - Exercise Solution.ipynb ├── Section 5 - Practical example - dealing with links.ipynb ├── Section 5 - Scraping multiple pages automatically.ipynb ├── Section 5 - Searching and navigating the HTML tree.ipynb ├── Section 5 - Searching the HTML tree by attributes.ipynb ├── Section 5 - Setting up your first scraper.ipynb ├── scraper.py ├── scraper2_extracting_data.py ├── scraper3_extracting_text.py ├── scraper4_dealing_links.py ├── scraper5_extracting_nestedHTML.py ├── scraper6_scraping_multiple_pages.py └── wiki_music.html ├── 06.Project Scraping - Rotten Tomatoes ├── Rotten_tomatoes_page_2_HTML_Parser.html ├── Rotten_tomatoes_page_2_LXML_Parser.html ├── Scraper_RottenTomatoes.ipynb ├── Section 6 - Dealing with the cast.ipynb ├── Section 6 - Extracting the rest of the information - Exercise - Setup.ipynb ├── Section 6 - Extracting the rest of the information.ipynb ├── Section 6 - Extracting the score - Setup.ipynb ├── Section 6 - Extracting the score - Solution.ipynb ├── Section 6 - Extracting the title and year of each movie.ipynb ├── Section 6 - Setting up your scraper.ipynb ├── Section 6 - Storing the data in a structured form.ipynb ├── Section 6 -Extracting the rest of the information - Exercise - Solution.ipynb ├── movies_info.csv └── movies_info.xlsx ├── 07.Scraping HTML Tables with Pandas ├── Scraper_HTMLtables.ipynb └── Section 7 - Scraping HTML Tables with the help of Pandas.ipynb ├── 08.Scraping Steam Project ├── New_Trending_Games_Info.csv ├── Scraper Steam - My Work.ipynb ├── Section 8 - Scraping Steam - Setup.ipynb ├── Top_Rated_Games.info.csv ├── Top_Sellers_Games_info.csv ├── Trending_Games_info.csv └── steam.html ├── 08.Scraping Youtube Project ├── Scraper YouTube - MyWork.ipynb ├── Section 8 - Scraping YouTube - Setup.ipynb ├── 
searched_video.html ├── stairway_to_heaven.html └── youtube.html ├── 09.Common roadblocks when Web Scraping ├── RequestHeaders.ipynb ├── Section 9 - Sample HTML login Form.html ├── Section 9 - Sample login code.ipynb ├── Section 9 - Scraping multiple pages automatically - rate limitting.ipynb └── Sessions.ipynb ├── 10.The Requests-HTML Package ├── Scraper_CSS_Selectors.ipynb ├── Scraper_JavaScript.ipynb ├── Scraper_withRequestsHTML.ipynb ├── Section 10 - CSS Selectors.ipynb ├── Section 10 - Exploring the package capabilities.ipynb ├── Section 10 - Scraping JavaScript.ipynb └── Section 10 - Searching for text.ipynb ├── 11.Scraping JavaScript - SoundCloud Project ├── Scraper SoundCloud - My Work.ipynb └── Section 10 - Scraping SoundCloud - Setup.ipynb ├── LICENSE └── readme.md /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | # Created by https://www.gitignore.io/api/python,vagrant,virtualenv,jupyternotebooks 3 | # Edit at https://www.gitignore.io/?templates=python,vagrant,virtualenv,jupyternotebooks 4 | 5 | ### JupyterNotebooks ### 6 | # gitignore template for Jupyter Notebooks 7 | # website: http://jupyter.org/ 8 | 9 | .ipynb_checkpoints 10 | */.ipynb_checkpoints/* 11 | 12 | # IPython 13 | profile_default/ 14 | ipython_config.py 15 | 16 | # Remove previous ipynb_checkpoints 17 | # git rm -r .ipynb_checkpoints/ 18 | 19 | ### Python ### 20 | # Byte-compiled / optimized / DLL files 21 | __pycache__/ 22 | *.py[cod] 23 | *$py.class 24 | 25 | # C extensions 26 | *.so 27 | 28 | # Distribution / packaging 29 | .Python 30 | build/ 31 | develop-eggs/ 32 | dist/ 33 | downloads/ 34 | eggs/ 35 | .eggs/ 36 | lib/ 37 | lib64/ 38 | parts/ 39 | sdist/ 40 | var/ 41 | wheels/ 42 | pip-wheel-metadata/ 43 | share/python-wheels/ 44 | *.egg-info/ 45 | .installed.cfg 46 | *.egg 47 | MANIFEST 48 | 49 | # PyInstaller 50 | # Usually these files are written by a python script from a template 51 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 52 | *.manifest 53 | *.spec 54 | 55 | # Installer logs 56 | pip-log.txt 57 | pip-delete-this-directory.txt 58 | 59 | # Unit test / coverage reports 60 | htmlcov/ 61 | .tox/ 62 | .nox/ 63 | .coverage 64 | .coverage.* 65 | .cache 66 | nosetests.xml 67 | coverage.xml 68 | *.cover 69 | .hypothesis/ 70 | .pytest_cache/ 71 | 72 | # Translations 73 | *.mo 74 | *.pot 75 | 76 | # Scrapy stuff: 77 | .scrapy 78 | 79 | # Sphinx documentation 80 | docs/_build/ 81 | 82 | # PyBuilder 83 | target/ 84 | 85 | # pyenv 86 | .python-version 87 | 88 | # pipenv 89 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 90 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 91 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 92 | # install all needed dependencies. 
93 | #Pipfile.lock 94 | 95 | # celery beat schedule file 96 | celerybeat-schedule 97 | 98 | # SageMath parsed files 99 | *.sage.py 100 | 101 | # Spyder project settings 102 | .spyderproject 103 | .spyproject 104 | 105 | # Rope project settings 106 | .ropeproject 107 | 108 | # Mr Developer 109 | .mr.developer.cfg 110 | .project 111 | .pydevproject 112 | 113 | # mkdocs documentation 114 | /site 115 | 116 | # mypy 117 | .mypy_cache/ 118 | .dmypy.json 119 | dmypy.json 120 | 121 | # Pyre type checker 122 | .pyre/ 123 | 124 | ### Vagrant ### 125 | # General 126 | .vagrant/* 127 | 128 | # Log files (if you are creating logs in debug mode, uncomment this) 129 | # *.log 130 | 131 | ### Vagrant Patch ### 132 | *.box 133 | 134 | ### VirtualEnv ### 135 | # Virtualenv 136 | # http://iamzed.com/2009/05/07/a-primer-on-virtualenv/ 137 | pyvenv.cfg 138 | .env 139 | .venv 140 | env/ 141 | venv/ 142 | ENV/ 143 | env.bak/ 144 | venv.bak/ 145 | pip-selfcheck.json 146 | 147 | # End of https://www.gitignore.io/api/python,vagrant,virtualenv,jupyternotebooks 148 | 149 | __pycache__ 150 | *.pyc 151 | .vagrant 152 | -------------------------------------------------------------------------------- /01-02.Course Notes/Course Notes - Web Scraping and API Fundamentals in Python.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ptyadana/Web-Scraping-and-API-in-Python/9595bc418866642143eaf4a1f700dd646d81d427/01-02.Course Notes/Course Notes - Web Scraping and API Fundamentals in Python.pdf -------------------------------------------------------------------------------- /03.Working with APIs/Currency Exchange API/Section 3 - Exchange rates API GETting a JSON reply.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Pulling data from public APIs (without registration) - GET request" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# loading the packages\n", 17 | "# requests provides us with the capabilities of sending an HTTP request to a server\n", 18 | "import requests" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Extracting data on currency exchange rates" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "# We will use an API containing currency exchange rates as published by the European Central Bank\n", 35 | "# Documentation at https://exchangeratesapi.io" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### Sending a GET request" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 3, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "# Define the base URL\n", 52 | "# Base URL: the part of the URL common to all requests, not containing the parameters\n", 53 | "base_url = \"https://api.exchangeratesapi.io/latest\"" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 4, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# We can make a GET request to this API endpoint with requests.get\n", 63 | "response = requests.get(base_url)\n", 64 | "\n", 65 | "# This method returns the response from the server\n", 66 | "# We store this response in a variable for future processing" 67 | ] 68 | }, 
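A more defensive version of this GET request — a minimal sketch added for reference, not a cell from the original notebook — would cap the wait time and turn HTTP error statuses into exceptions; both the timeout argument and raise_for_status() are standard requests features:

import requests

base_url = "https://api.exchangeratesapi.io/latest"

# Give up after 10 seconds instead of waiting indefinitely on an unresponsive server
response = requests.get(base_url, timeout=10)

# Raise requests.exceptions.HTTPError for any 4xx/5xx status code
response.raise_for_status()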
69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "### Investigating the response" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 5, 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "data": { 83 | "text/plain": [ 84 | "True" 85 | ] 86 | }, 87 | "execution_count": 5, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "# Checking if the request went through ok\n", 94 | "response.ok" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 6, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "200" 106 | ] 107 | }, 108 | "execution_count": 6, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "# Checking the status code of the response\n", 115 | "response.status_code" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 7, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "data": { 125 | "text/plain": [ 126 | "'{\"rates\":{\"CAD\":1.5613,\"HKD\":8.9041,\"ISK\":145.0,\"PHP\":58.013,\"DKK\":7.4695,\"HUF\":336.25,\"CZK\":25.504,\"AUD\":1.733,\"RON\":4.8175,\"SEK\":10.7203,\"IDR\":16488.05,\"INR\":84.96,\"BRL\":5.4418,\"RUB\":85.1553,\"HRK\":7.55,\"JPY\":117.12,\"THB\":36.081,\"CHF\":1.0594,\"SGD\":1.5841,\"PLN\":4.3132,\"BGN\":1.9558,\"TRY\":7.0002,\"CNY\":7.96,\"NOK\":10.89,\"NZD\":1.8021,\"ZAR\":18.2898,\"USD\":1.1456,\"MXN\":24.3268,\"ILS\":4.0275,\"GBP\":0.87383,\"KRW\":1374.71,\"MYR\":4.8304},\"base\":\"EUR\",\"date\":\"2020-03-09\"}'" 127 | ] 128 | }, 129 | "execution_count": 7, 130 | "metadata": {}, 131 | "output_type": "execute_result" 132 | } 133 | ], 134 | "source": [ 135 | "# Inspecting the content body of the response (as a regular 'string')\n", 136 | "response.text" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 8, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "b'{\"rates\":{\"CAD\":1.5613,\"HKD\":8.9041,\"ISK\":145.0,\"PHP\":58.013,\"DKK\":7.4695,\"HUF\":336.25,\"CZK\":25.504,\"AUD\":1.733,\"RON\":4.8175,\"SEK\":10.7203,\"IDR\":16488.05,\"INR\":84.96,\"BRL\":5.4418,\"RUB\":85.1553,\"HRK\":7.55,\"JPY\":117.12,\"THB\":36.081,\"CHF\":1.0594,\"SGD\":1.5841,\"PLN\":4.3132,\"BGN\":1.9558,\"TRY\":7.0002,\"CNY\":7.96,\"NOK\":10.89,\"NZD\":1.8021,\"ZAR\":18.2898,\"USD\":1.1456,\"MXN\":24.3268,\"ILS\":4.0275,\"GBP\":0.87383,\"KRW\":1374.71,\"MYR\":4.8304},\"base\":\"EUR\",\"date\":\"2020-03-09\"}'" 148 | ] 149 | }, 150 | "execution_count": 8, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "# Inspecting the content of the response (in 'bytes' format)\n", 157 | "response.content" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 9, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "# The data is presented in JSON format" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "### Handling the JSON" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 10, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "{'rates': {'CAD': 1.5613,\n", 185 | " 'HKD': 8.9041,\n", 186 | " 'ISK': 145.0,\n", 187 | " 'PHP': 58.013,\n", 188 | " 'DKK': 7.4695,\n", 189 | " 'HUF': 336.25,\n", 190 | " 'CZK': 25.504,\n", 191 | " 'AUD': 1.733,\n", 192 | " 'RON': 4.8175,\n", 193 | " 'SEK': 10.7203,\n", 194 | 
" 'IDR': 16488.05,\n", 195 | " 'INR': 84.96,\n", 196 | " 'BRL': 5.4418,\n", 197 | " 'RUB': 85.1553,\n", 198 | " 'HRK': 7.55,\n", 199 | " 'JPY': 117.12,\n", 200 | " 'THB': 36.081,\n", 201 | " 'CHF': 1.0594,\n", 202 | " 'SGD': 1.5841,\n", 203 | " 'PLN': 4.3132,\n", 204 | " 'BGN': 1.9558,\n", 205 | " 'TRY': 7.0002,\n", 206 | " 'CNY': 7.96,\n", 207 | " 'NOK': 10.89,\n", 208 | " 'NZD': 1.8021,\n", 209 | " 'ZAR': 18.2898,\n", 210 | " 'USD': 1.1456,\n", 211 | " 'MXN': 24.3268,\n", 212 | " 'ILS': 4.0275,\n", 213 | " 'GBP': 0.87383,\n", 214 | " 'KRW': 1374.71,\n", 215 | " 'MYR': 4.8304},\n", 216 | " 'base': 'EUR',\n", 217 | " 'date': '2020-03-09'}" 218 | ] 219 | }, 220 | "execution_count": 10, 221 | "metadata": {}, 222 | "output_type": "execute_result" 223 | } 224 | ], 225 | "source": [ 226 | "# Requests has in-build method to directly convert the response to JSON format\n", 227 | "response.json()" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 11, 233 | "metadata": {}, 234 | "outputs": [ 235 | { 236 | "data": { 237 | "text/plain": [ 238 | "dict" 239 | ] 240 | }, 241 | "execution_count": 11, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "# In Python, this JSON is stored as a dictionary\n", 248 | "type(response.json())" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 12, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "# A useful library for JSON manipulation and pretty print\n", 258 | "import json\n", 259 | "\n", 260 | "# It has two main methods:\n", 261 | "# .loads(), which creates a Python dictionary from a JSON format string (just as response.json() does)\n", 262 | "# .dumps(), which creates a JSON format string out of a Python dictionary " 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 13, 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "'{\\n \"rates\": {\\n \"CAD\": 1.5613,\\n \"HKD\": 8.9041,\\n \"ISK\": 145.0,\\n \"PHP\": 58.013,\\n \"DKK\": 7.4695,\\n \"HUF\": 336.25,\\n \"CZK\": 25.504,\\n \"AUD\": 1.733,\\n \"RON\": 4.8175,\\n \"SEK\": 10.7203,\\n \"IDR\": 16488.05,\\n \"INR\": 84.96,\\n \"BRL\": 5.4418,\\n \"RUB\": 85.1553,\\n \"HRK\": 7.55,\\n \"JPY\": 117.12,\\n \"THB\": 36.081,\\n \"CHF\": 1.0594,\\n \"SGD\": 1.5841,\\n \"PLN\": 4.3132,\\n \"BGN\": 1.9558,\\n \"TRY\": 7.0002,\\n \"CNY\": 7.96,\\n \"NOK\": 10.89,\\n \"NZD\": 1.8021,\\n \"ZAR\": 18.2898,\\n \"USD\": 1.1456,\\n \"MXN\": 24.3268,\\n \"ILS\": 4.0275,\\n \"GBP\": 0.87383,\\n \"KRW\": 1374.71,\\n \"MYR\": 4.8304\\n },\\n \"base\": \"EUR\",\\n \"date\": \"2020-03-09\"\\n}'" 274 | ] 275 | }, 276 | "execution_count": 13, 277 | "metadata": {}, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "# .dumps() has options to make the string 'prettier', more readable\n", 283 | "# We can choose the number of spaces to be used as indentation\n", 284 | "json.dumps(response.json(), indent=4)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 14, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "name": "stdout", 294 | "output_type": "stream", 295 | "text": [ 296 | "{\n", 297 | " \"rates\": {\n", 298 | " \"CAD\": 1.5613,\n", 299 | " \"HKD\": 8.9041,\n", 300 | " \"ISK\": 145.0,\n", 301 | " \"PHP\": 58.013,\n", 302 | " \"DKK\": 7.4695,\n", 303 | " \"HUF\": 336.25,\n", 304 | " \"CZK\": 25.504,\n", 305 | " \"AUD\": 1.733,\n", 306 | " \"RON\": 4.8175,\n", 307 | " \"SEK\": 
10.7203,\n", 308 | " \"IDR\": 16488.05,\n", 309 | " \"INR\": 84.96,\n", 310 | " \"BRL\": 5.4418,\n", 311 | " \"RUB\": 85.1553,\n", 312 | " \"HRK\": 7.55,\n", 313 | " \"JPY\": 117.12,\n", 314 | " \"THB\": 36.081,\n", 315 | " \"CHF\": 1.0594,\n", 316 | " \"SGD\": 1.5841,\n", 317 | " \"PLN\": 4.3132,\n", 318 | " \"BGN\": 1.9558,\n", 319 | " \"TRY\": 7.0002,\n", 320 | " \"CNY\": 7.96,\n", 321 | " \"NOK\": 10.89,\n", 322 | " \"NZD\": 1.8021,\n", 323 | " \"ZAR\": 18.2898,\n", 324 | " \"USD\": 1.1456,\n", 325 | " \"MXN\": 24.3268,\n", 326 | " \"ILS\": 4.0275,\n", 327 | " \"GBP\": 0.87383,\n", 328 | " \"KRW\": 1374.71,\n", 329 | " \"MYR\": 4.8304\n", 330 | " },\n", 331 | " \"base\": \"EUR\",\n", 332 | " \"date\": \"2020-03-09\"\n", 333 | "}\n" 334 | ] 335 | } 336 | ], 337 | "source": [ 338 | "# In order to visualize these changes, we need to print the string\n", 339 | "print(json.dumps(response.json(), indent=4))" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 15, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "dict_keys(['rates', 'base', 'date'])" 351 | ] 352 | }, 353 | "execution_count": 15, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "# It contains 3 keys; the value for the 'rates' key is another dictionary\n", 360 | "response.json().keys()" 361 | ] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.7.3" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 2 385 | } 386 | -------------------------------------------------------------------------------- /03.Working with APIs/Currency Exchange API/Section 3 - Incorporating parameters in a GET request.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Pulling data from public APIs (without registration) - GET request" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# loading the packages\n", 17 | "# requests provides us with the capabilities of sending an HTTP request to a server\n", 18 | "import requests" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Extracting data on currency exchange rates" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "# We will use an API containing currency exchange rates as published by the European Central Bank\n", 35 | "# Documentation at https://exchangeratesapi.io" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### Sending a GET request" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 3, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "# Define the base URL\n", 52 | "# Base URL: the part of the URL common to all requests, not containing the parameters\n", 53 | "base_url = \"https://api.exchangeratesapi.io/latest\"" 54 | ] 55 | }, 56 | { 57 | 
"cell_type": "code", 58 | "execution_count": 4, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# We can make a GET request to this API endpoint with requests.get\n", 63 | "response = requests.get(base_url)\n", 64 | "\n", 65 | "# This method returns the response from the server\n", 66 | "# We store this response in a variable for future processing" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "### Investigating the response" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 5, 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "data": { 83 | "text/plain": [ 84 | "True" 85 | ] 86 | }, 87 | "execution_count": 5, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "# Checking if the request went through ok\n", 94 | "response.ok" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 6, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "200" 106 | ] 107 | }, 108 | "execution_count": 6, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "# Checking the status code of the response\n", 115 | "response.status_code" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 7, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "data": { 125 | "text/plain": [ 126 | "'{\"rates\":{\"CAD\":1.5613,\"HKD\":8.9041,\"ISK\":145.0,\"PHP\":58.013,\"DKK\":7.4695,\"HUF\":336.25,\"CZK\":25.504,\"AUD\":1.733,\"RON\":4.8175,\"SEK\":10.7203,\"IDR\":16488.05,\"INR\":84.96,\"BRL\":5.4418,\"RUB\":85.1553,\"HRK\":7.55,\"JPY\":117.12,\"THB\":36.081,\"CHF\":1.0594,\"SGD\":1.5841,\"PLN\":4.3132,\"BGN\":1.9558,\"TRY\":7.0002,\"CNY\":7.96,\"NOK\":10.89,\"NZD\":1.8021,\"ZAR\":18.2898,\"USD\":1.1456,\"MXN\":24.3268,\"ILS\":4.0275,\"GBP\":0.87383,\"KRW\":1374.71,\"MYR\":4.8304},\"base\":\"EUR\",\"date\":\"2020-03-09\"}'" 127 | ] 128 | }, 129 | "execution_count": 7, 130 | "metadata": {}, 131 | "output_type": "execute_result" 132 | } 133 | ], 134 | "source": [ 135 | "# Inspecting the content body of the response (as a regular 'string')\n", 136 | "response.text" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 8, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "b'{\"rates\":{\"CAD\":1.5613,\"HKD\":8.9041,\"ISK\":145.0,\"PHP\":58.013,\"DKK\":7.4695,\"HUF\":336.25,\"CZK\":25.504,\"AUD\":1.733,\"RON\":4.8175,\"SEK\":10.7203,\"IDR\":16488.05,\"INR\":84.96,\"BRL\":5.4418,\"RUB\":85.1553,\"HRK\":7.55,\"JPY\":117.12,\"THB\":36.081,\"CHF\":1.0594,\"SGD\":1.5841,\"PLN\":4.3132,\"BGN\":1.9558,\"TRY\":7.0002,\"CNY\":7.96,\"NOK\":10.89,\"NZD\":1.8021,\"ZAR\":18.2898,\"USD\":1.1456,\"MXN\":24.3268,\"ILS\":4.0275,\"GBP\":0.87383,\"KRW\":1374.71,\"MYR\":4.8304},\"base\":\"EUR\",\"date\":\"2020-03-09\"}'" 148 | ] 149 | }, 150 | "execution_count": 8, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "# Inspecting the content of the response (in 'bytes' format)\n", 157 | "response.content" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 9, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "# The data is presented in JSON format" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "### Handling the JSON" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | 
"execution_count": 10, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "data": { 183 | "text/plain": [ 184 | "{'rates': {'CAD': 1.5613,\n", 185 | " 'HKD': 8.9041,\n", 186 | " 'ISK': 145.0,\n", 187 | " 'PHP': 58.013,\n", 188 | " 'DKK': 7.4695,\n", 189 | " 'HUF': 336.25,\n", 190 | " 'CZK': 25.504,\n", 191 | " 'AUD': 1.733,\n", 192 | " 'RON': 4.8175,\n", 193 | " 'SEK': 10.7203,\n", 194 | " 'IDR': 16488.05,\n", 195 | " 'INR': 84.96,\n", 196 | " 'BRL': 5.4418,\n", 197 | " 'RUB': 85.1553,\n", 198 | " 'HRK': 7.55,\n", 199 | " 'JPY': 117.12,\n", 200 | " 'THB': 36.081,\n", 201 | " 'CHF': 1.0594,\n", 202 | " 'SGD': 1.5841,\n", 203 | " 'PLN': 4.3132,\n", 204 | " 'BGN': 1.9558,\n", 205 | " 'TRY': 7.0002,\n", 206 | " 'CNY': 7.96,\n", 207 | " 'NOK': 10.89,\n", 208 | " 'NZD': 1.8021,\n", 209 | " 'ZAR': 18.2898,\n", 210 | " 'USD': 1.1456,\n", 211 | " 'MXN': 24.3268,\n", 212 | " 'ILS': 4.0275,\n", 213 | " 'GBP': 0.87383,\n", 214 | " 'KRW': 1374.71,\n", 215 | " 'MYR': 4.8304},\n", 216 | " 'base': 'EUR',\n", 217 | " 'date': '2020-03-09'}" 218 | ] 219 | }, 220 | "execution_count": 10, 221 | "metadata": {}, 222 | "output_type": "execute_result" 223 | } 224 | ], 225 | "source": [ 226 | "# Requests has in-build method to directly convert the response to JSON format\n", 227 | "response.json()" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 11, 233 | "metadata": {}, 234 | "outputs": [ 235 | { 236 | "data": { 237 | "text/plain": [ 238 | "dict" 239 | ] 240 | }, 241 | "execution_count": 11, 242 | "metadata": {}, 243 | "output_type": "execute_result" 244 | } 245 | ], 246 | "source": [ 247 | "# In Python, this JSON is stored as a dictionary\n", 248 | "type(response.json())" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 12, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "# A useful library for JSON manipulation and pretty print\n", 258 | "import json\n", 259 | "\n", 260 | "# It has two main methods:\n", 261 | "# .loads(), which creates a Python dictionary from a JSON format string (just as response.json() does)\n", 262 | "# .dumps(), which creates a JSON format string out of a Python dictionary " 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 13, 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "'{\\n \"rates\": {\\n \"CAD\": 1.5613,\\n \"HKD\": 8.9041,\\n \"ISK\": 145.0,\\n \"PHP\": 58.013,\\n \"DKK\": 7.4695,\\n \"HUF\": 336.25,\\n \"CZK\": 25.504,\\n \"AUD\": 1.733,\\n \"RON\": 4.8175,\\n \"SEK\": 10.7203,\\n \"IDR\": 16488.05,\\n \"INR\": 84.96,\\n \"BRL\": 5.4418,\\n \"RUB\": 85.1553,\\n \"HRK\": 7.55,\\n \"JPY\": 117.12,\\n \"THB\": 36.081,\\n \"CHF\": 1.0594,\\n \"SGD\": 1.5841,\\n \"PLN\": 4.3132,\\n \"BGN\": 1.9558,\\n \"TRY\": 7.0002,\\n \"CNY\": 7.96,\\n \"NOK\": 10.89,\\n \"NZD\": 1.8021,\\n \"ZAR\": 18.2898,\\n \"USD\": 1.1456,\\n \"MXN\": 24.3268,\\n \"ILS\": 4.0275,\\n \"GBP\": 0.87383,\\n \"KRW\": 1374.71,\\n \"MYR\": 4.8304\\n },\\n \"base\": \"EUR\",\\n \"date\": \"2020-03-09\"\\n}'" 274 | ] 275 | }, 276 | "execution_count": 13, 277 | "metadata": {}, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "# .dumps() has options to make the string 'prettier', more readable\n", 283 | "# We can choose the number of spaces to be used as indentation\n", 284 | "json.dumps(response.json(), indent=4)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 14, 290 | "metadata": {}, 291 | "outputs": [ 292 | 
{ 293 | "name": "stdout", 294 | "output_type": "stream", 295 | "text": [ 296 | "{\n", 297 | " \"rates\": {\n", 298 | " \"CAD\": 1.5613,\n", 299 | " \"HKD\": 8.9041,\n", 300 | " \"ISK\": 145.0,\n", 301 | " \"PHP\": 58.013,\n", 302 | " \"DKK\": 7.4695,\n", 303 | " \"HUF\": 336.25,\n", 304 | " \"CZK\": 25.504,\n", 305 | " \"AUD\": 1.733,\n", 306 | " \"RON\": 4.8175,\n", 307 | " \"SEK\": 10.7203,\n", 308 | " \"IDR\": 16488.05,\n", 309 | " \"INR\": 84.96,\n", 310 | " \"BRL\": 5.4418,\n", 311 | " \"RUB\": 85.1553,\n", 312 | " \"HRK\": 7.55,\n", 313 | " \"JPY\": 117.12,\n", 314 | " \"THB\": 36.081,\n", 315 | " \"CHF\": 1.0594,\n", 316 | " \"SGD\": 1.5841,\n", 317 | " \"PLN\": 4.3132,\n", 318 | " \"BGN\": 1.9558,\n", 319 | " \"TRY\": 7.0002,\n", 320 | " \"CNY\": 7.96,\n", 321 | " \"NOK\": 10.89,\n", 322 | " \"NZD\": 1.8021,\n", 323 | " \"ZAR\": 18.2898,\n", 324 | " \"USD\": 1.1456,\n", 325 | " \"MXN\": 24.3268,\n", 326 | " \"ILS\": 4.0275,\n", 327 | " \"GBP\": 0.87383,\n", 328 | " \"KRW\": 1374.71,\n", 329 | " \"MYR\": 4.8304\n", 330 | " },\n", 331 | " \"base\": \"EUR\",\n", 332 | " \"date\": \"2020-03-09\"\n", 333 | "}\n" 334 | ] 335 | } 336 | ], 337 | "source": [ 338 | "# In order to visualize these changes, we need to print the string\n", 339 | "print(json.dumps(response.json(), indent=4))" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 15, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "dict_keys(['rates', 'base', 'date'])" 351 | ] 352 | }, 353 | "execution_count": 15, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "# It contains 3 keys; the value for the 'rates' key is another dictionary\n", 360 | "response.json().keys()" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "### Incorporating parameters in the GET request" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 16, 373 | "metadata": {}, 374 | "outputs": [ 375 | { 376 | "data": { 377 | "text/plain": [ 378 | "'https://api.exchangeratesapi.io/latest?symbols=USD,GBP'" 379 | ] 380 | }, 381 | "execution_count": 16, 382 | "metadata": {}, 383 | "output_type": "execute_result" 384 | } 385 | ], 386 | "source": [ 387 | "# Request parameters are added to the URL after a question mark '?'\n", 388 | "# In this case, we request for the exchange rates of the US Dollar (USD) and Pound Sterling (GBP) only\n", 389 | "param_url = base_url + \"?symbols=USD,GBP\"\n", 390 | "param_url" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 17, 396 | "metadata": {}, 397 | "outputs": [ 398 | { 399 | "data": { 400 | "text/plain": [ 401 | "200" 402 | ] 403 | }, 404 | "execution_count": 17, 405 | "metadata": {}, 406 | "output_type": "execute_result" 407 | } 408 | ], 409 | "source": [ 410 | "# Making a request to the server with the new URL, containing the parameters\n", 411 | "response = requests.get(param_url)\n", 412 | "response.status_code" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 18, 418 | "metadata": {}, 419 | "outputs": [ 420 | { 421 | "data": { 422 | "text/plain": [ 423 | "{'rates': {'USD': 1.1456, 'GBP': 0.87383}, 'base': 'EUR', 'date': '2020-03-09'}" 424 | ] 425 | }, 426 | "execution_count": 18, 427 | "metadata": {}, 428 | "output_type": "execute_result" 429 | } 430 | ], 431 | "source": [ 432 | "# Saving the response data\n", 433 | "data = response.json()\n", 434 | "data" 435 | ] 436 | }, 
437 | { 438 | "cell_type": "code", 439 | "execution_count": 19, 440 | "metadata": {}, 441 | "outputs": [ 442 | { 443 | "data": { 444 | "text/plain": [ 445 | "'EUR'" 446 | ] 447 | }, 448 | "execution_count": 19, 449 | "metadata": {}, 450 | "output_type": "execute_result" 451 | } 452 | ], 453 | "source": [ 454 | "# 'data' is a dictionary\n", 455 | "data['base']" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 20, 461 | "metadata": {}, 462 | "outputs": [ 463 | { 464 | "data": { 465 | "text/plain": [ 466 | "'2020-03-09'" 467 | ] 468 | }, 469 | "execution_count": 20, 470 | "metadata": {}, 471 | "output_type": "execute_result" 472 | } 473 | ], 474 | "source": [ 475 | "data['date']" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": 21, 481 | "metadata": {}, 482 | "outputs": [ 483 | { 484 | "data": { 485 | "text/plain": [ 486 | "{'USD': 1.1456, 'GBP': 0.87383}" 487 | ] 488 | }, 489 | "execution_count": 21, 490 | "metadata": {}, 491 | "output_type": "execute_result" 492 | } 493 | ], 494 | "source": [ 495 | "data['rates']" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 22, 501 | "metadata": {}, 502 | "outputs": [], 503 | "source": [ 504 | "# As per the documentation of this API, we can change the base with the parameter 'base'\n", 505 | "param_url = base_url + \"?symbols=GBP&base=USD\"" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 23, 511 | "metadata": {}, 512 | "outputs": [ 513 | { 514 | "data": { 515 | "text/plain": [ 516 | "{'rates': {'GBP': 0.7627706006}, 'base': 'USD', 'date': '2020-03-09'}" 517 | ] 518 | }, 519 | "execution_count": 23, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "# Sending a request and saving the response JSON, all at once\n", 526 | "data = requests.get(param_url).json()\n", 527 | "data" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 24, 533 | "metadata": {}, 534 | "outputs": [ 535 | { 536 | "data": { 537 | "text/plain": [ 538 | "0.7627706006" 539 | ] 540 | }, 541 | "execution_count": 24, 542 | "metadata": {}, 543 | "output_type": "execute_result" 544 | } 545 | ], 546 | "source": [ 547 | "usd_to_gbp = data['rates']['GBP']\n", 548 | "usd_to_gbp" 549 | ] 550 | } 551 | ], 552 | "metadata": { 553 | "kernelspec": { 554 | "display_name": "Python 3", 555 | "language": "python", 556 | "name": "python3" 557 | }, 558 | "language_info": { 559 | "codemirror_mode": { 560 | "name": "ipython", 561 | "version": 3 562 | }, 563 | "file_extension": ".py", 564 | "mimetype": "text/x-python", 565 | "name": "python", 566 | "nbconvert_exporter": "python", 567 | "pygments_lexer": "ipython3", 568 | "version": "3.7.3" 569 | } 570 | }, 571 | "nbformat": 4, 572 | "nbformat_minor": 2 573 | } 574 | -------------------------------------------------------------------------------- /03.Working with APIs/Currency Exchange API/additional_API_functionalities.py: -------------------------------------------------------------------------------- 1 | #Obtaining Historical Exchange Rates 2 | import requests 3 | import json 4 | 5 | base_url = "https://api.exchangeratesapi.io" 6 | 7 | historical_date_url = base_url + "/2020-04-12" 8 | 9 | response = requests.get(historical_date_url) 10 | data = response.json() 11 | 12 | # data = {'rates': {'CAD': 1.5265, 'HKD': 8.4259, 'ISK': 155.9, 'PHP': 54.939, 'DKK': 7.4657, 'HUF': 354.76, 'CZK': 26.909, 'AUD': 1.7444, 'RON': 4.833, 'SEK': 10.9455, 'IDR': 17243.21, 'INR': 82.9275, 
'BRL': 5.5956, 'RUB': 80.69, 'HRK': 7.6175, 'JPY': 118.33, 'THB': 35.665, 'CHF': 1.0558, 'SGD': 1.5479, 'PLN': 4.5586, 'BGN': 1.9558, 'TRY': 7.3233, 'CNY': 7.6709, 'NOK': 11.2143, 'NZD': 1.8128, 'ZAR': 19.6383, 'USD': 1.0867, 'MXN': 26.0321, 'ILS': 3.8919, 'GBP': 0.87565, 'KRW': 1322.49, 'MYR': 4.7136}, 'base': 'EUR', 'date': '2020-04-09'} 13 | 14 | print(json.dumps(data, indent=4, sort_keys=True)) 15 | 16 | # Invalid URL 17 | invalid_url = base_url + "/2019-12-01" + "?symbols=USB" 18 | response = requests.get(invalid_url) 19 | 20 | print(response.status_code) 21 | print(response.json()) 22 | # 400 for bad request 23 | #invalid response = {'error': "Symbols 'USB' are invalid for date 2019-12-01."} -------------------------------------------------------------------------------- /03.Working with APIs/Currency Exchange API/currency_converter.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | 4 | base_url = "https://api.exchangeratesapi.io/" 5 | 6 | print("***** Welcome to Currency Converter *****") 7 | date = input("Please enter the date (in the format 'yyyy-mm-dd') OR type 'latest' : ") 8 | base_currency = input("Currency converted from (example: 'USD') : ") 9 | to_currency = input("Currency converted to (example: 'JPY') : ") 10 | amount = input(f"How much {base_currency} do you want to convert? : ") 11 | 12 | if date and base_currency and to_currency and amount: 13 | 14 | param_url = base_url + date + "?symbols=" + base_currency + "," + to_currency 15 | 16 | if date == 'latest': 17 | param_url = base_url + "latest?symbols=" + base_currency + "," + to_currency 18 | 19 | response = requests.get(param_url) 20 | 21 | if not response.ok: 22 | print(f"Oops! Seems like there was an error {response.status_code}. Please try again.") 23 | print(f"{response.json()['error']}") 24 | 25 | else: 26 | data = response.json() 27 | 28 | #testing 29 | # base_currency = 'USD' 30 | # to_currency = 'JPY' 31 | # amount = 100 32 | # data = {'rates': {'JPY': 117.55, 'USD': 1.0936}, 'base': 'EUR', 'date': '2020-04-01'} 33 | 34 | converted_amount = (float(amount) / float(data['rates'][base_currency])) * float(data['rates'][to_currency]) 35 | converted_amount = round(converted_amount, 2) 36 | 37 | print(f"The amount equivalent to {base_currency} {amount} is {to_currency} {converted_amount}") 38 | 39 | else: 40 | print("You have provided invalid information. Please try again.") 41 | -------------------------------------------------------------------------------- /03.Working with APIs/Currency Exchange API/exchange_rate_API.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | 4 | base_url = 'https://api.exchangeratesapi.io/latest' 5 | 6 | #request to the API 7 | response = requests.get(base_url) 8 | 9 | #investigating response 10 | print(response.ok) 11 | print(response.status_code) 12 | print(response.text) 13 | 14 | #handling JSON 15 | json_response = response.json() 16 | 17 | #to avoid calling the API so many times, just for testing purposes 18 | # json_response = {'rates': {'CAD': 1.5265, 'HKD': 8.4259, 'ISK': 155.9, 'PHP': 54.939, 'DKK': 7.4657, 'HUF': 354.76, 'CZK': 26.909, 'AUD': 1.7444, 'RON': 4.833, 'SEK': 10.9455, 'IDR': 17243.21, 'INR': 82.9275, 'BRL': 5.5956, 'RUB': 80.69, 'HRK': 7.6175, 'JPY': 118.33, 'THB': 35.665, 'CHF': 1.0558, 'SGD': 1.5479, 'PLN': 4.5586, 'BGN': 1.9558, 'TRY': 7.3233, 'CNY': 7.6709, 'NOK': 11.2143, 'NZD': 1.8128, 'ZAR': 19.6383, 'USD': 1.0867, 'MXN': 26.0321, 'ILS': 3.8919, 'GBP': 0.87565, 'KRW': 1322.49, 'MYR': 4.7136}, 'base': 'EUR', 'date': '2020-04-09'} 19 | 20 | #Python built-in package json 21 | #loads(string): converts a JSON formatted string to a Python object 22 | #dumps(obj): converts a Python object to a regular string, with options to make the string prettier 23 | print(json.dumps(json_response, indent=4)) -------------------------------------------------------------------------------- /03.Working with APIs/Currency Exchange API/exchange_rate_API_with_paremeters.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | 4 | base_url = 'https://api.exchangeratesapi.io/latest' 5 | 6 | param_url = base_url + '?symbols=USD,GBP' 7 | 8 | response = requests.get(param_url) 9 | data = response.json() 10 | 11 | # data = {'rates': {'USD': 1.0867, 'GBP': 0.87565}, 'base': 'EUR', 'date': '2020-04-09'} 12 | 13 | print(type(data)) 14 | print(data) 15 | 16 | rates = data['rates']['USD'] 17 | print(rates) 18 | -------------------------------------------------------------------------------- /03.Working with APIs/EDAMAM API/EDAMAM_API.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | import pandas as pd 4 | 5 | api_endpoint = "https://api.edamam.com/api/nutrition-details" 6 | 7 | app_id = "d5df2415" 8 | app_key = "b87fbe096f386ba8d6b2ad10dcc672d5" 9 | url = api_endpoint + "?app_id=" + app_id + "&app_key=" + app_key 10 | 11 | #Preparing POST request 12 | headers = { 13 | "Content-Type": "application/json" 14 | } 15 | 16 | recipe = { 17 | "title" : "roasted chicken", 18 | "ingr" : ["1 (5 to 6 pound) roasting chicken", "Kosher salt", "Freshly ground black pepper"] 19 | } 20 | 21 | #Sending POST request 22 | response = requests.post(url, headers=headers, json=recipe) 23 | print(response.status_code) 24 | 25 | info = response.json() 26 | print(info.keys()) 27 | 28 | # data frame using pandas 29 | nutrients = pd.DataFrame(info['totalNutrients']).transpose() 30 | print(nutrients) 31 | 32 | # export to csv 33 | nutrients.to_csv("RoastedChicken_nutrients.csv") -------------------------------------------------------------------------------- /03.Working with APIs/EDAMAM API/RoastedChicken_nutrients.csv: -------------------------------------------------------------------------------- 1 | ,label,quantity,unit 2 | 
ENERC_KCAL,Energy,3897.8847966250505,kcal 3 | FAT,Fat,281.7973896498531,g 4 | FASAT,Saturated,80.41792651629662,g 5 | FATRN,Trans,0.0,g 6 | FAMS,Monounsaturated,116.42828684428095,g 7 | FAPU,Polyunsaturated,60.71976612838291,g 8 | CHOCDF,Carbs,6.425249319142501,g 9 | FIBTG,Fiber,1.8935213485650002,g 10 | SUGAR,Sugars,0.047899354272000004,g 11 | PROCNT,Protein,312.01614425200455,g 12 | CHOLE,Cholesterol,1566.2090943730002,mg 13 | NA,Sodium,5818.914444977497,mg 14 | CA,Calcium,218.09684626793904,mg 15 | MG,Magnesium,358.9387221502079,mg 16 | K,Potassium,3669.9071911427136,mg 17 | FE,Iron,25.715630535762603,mg 18 | ZN,Zinc,23.411849338505288,mg 19 | P,Phosphorus,3016.7612062434005,mg 20 | VITA_RAE,Vitamin A,4609.589368849851,µg 21 | VITC,Vitamin C,43.7081607732,mg 22 | THIA,Thiamin (B1),1.0825753017079,mg 23 | RIBF,Riboflavin (B2),3.1276781484795007,mg 24 | NIA,Niacin (B3),117.13235745691864,mg 25 | VITB6A,Vitamin B6,5.84953400740555,mg 26 | FOLDFE,Folate equivalent (total),474.77740164085003,µg 27 | FOLFD,Folate (food),474.77740164085003,µg 28 | FOLAC,Folic acid,0.0,µg 29 | VITB12,Vitamin B12,18.029616318945003,µg 30 | VITD,Vitamin D,0.0,µg 31 | TOCPHA,Vitamin E,0.07783645069200001,mg 32 | VITK1,Vitamin K,12.251756709885,µg 33 | WATER,Water,1202.5662619386048,g 34 | -------------------------------------------------------------------------------- /03.Working with APIs/EDAMAM API/Section 3 - Downloading files with requests.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Downloading Files with Requests" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# The requests package can also be used to download files from the web.\n", 17 | "import requests" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Naive downloading" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "# One way to 'download' a file is to send a request to it.\n", 34 | "# Then, export the content of the response to a local file" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 3, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "# Let's use an image from wikipedia for this purpose\n", 44 | "file_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Collage_of_Nine_Dogs.jpg/1024px-Collage_of_Nine_Dogs.jpg\"" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 4, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "200" 56 | ] 57 | }, 58 | "execution_count": 4, 59 | "metadata": {}, 60 | "output_type": "execute_result" 61 | } 62 | ], 63 | "source": [ 64 | "response = requests.get(file_url)\n", 65 | "response.status_code" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 5, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/plain": [ 76 | "b'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01\\x01\\x00H\\x00H\\x00\\x00\\xff\\xfe\\x00OFile source: https://commons.wikimedia.org/wiki/File:Collage_of_Nine_Dogs.jpg\\xff\\xe2\\x02\\x1cICC_PROFILE\\x00\\x01\\x01\\x00\\x00\\x02\\x0clcms\\x02\\x10\\x00\\x00mntrRGB XYZ 
\\x07\\xdc\\x00\\x01\\x00\\x19\\x00\\x03\\x00)\\x009acspAPPL\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xf6\\xd6\\x00\\x01\\x00\\x00\\x00\\x00\\xd3-lcms\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\ndesc\\x00\\x00\\x00\\xfc\\x00\\x00\\x00^cprt\\x00\\x00\\x01\\\\\\x00\\x00\\x00\\x0bwtpt\\x00\\x00\\x01h\\x00\\x00\\x00\\x14bkpt\\x00\\x00\\x01|\\x00\\x00\\x00\\x14rXYZ\\x00\\x00\\x01\\x90\\x00\\x00\\x00\\x14gXYZ\\x00\\x00\\x01\\xa4\\x00\\x00\\x00\\x14bXYZ\\x00\\x00\\x01\\xb8\\x00\\x00\\x00\\x14rTRC\\x00\\x00\\x01\\xcc\\x00\\x00\\x00@gTRC\\x00\\x00\\x01\\xcc\\x00\\x00\\x00@bTRC\\x00\\x00\\x01\\xcc\\x00\\x00\\x00@desc\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03c2\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00text\\x00\\x00\\x00\\x00FB\\x00\\x00XYZ \\x00\\x00\\x00\\x00\\x00\\x00\\xf6\\xd6\\x00\\x01\\x00\\x00\\x00\\x00\\xd3-X'" 77 | ] 78 | }, 79 | "execution_count": 5, 80 | "metadata": {}, 81 | "output_type": "execute_result" 82 | } 83 | ], 84 | "source": [ 85 | "# Printing out the begining of the content of the response\n", 86 | "# It is in a binary-encoded format, thus it looks like gibberish\n", 87 | "response.content[:500]" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 6, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "# We need to export this to an image file (jpg, png, gif...)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "### Writing to a file" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 7, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "# We open/create a file with the function 'open()'\n", 113 | "file = open(\"dog_image.jpg\", \"wb\")\n", 114 | "\n", 115 | "# Then, write to it\n", 116 | "file.write(response.content)\n", 117 | "\n", 118 | "# And close the file after finishing\n", 119 | "file.close()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 8, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "# The two parameters in the function open() are:\n", 129 | "# - the name of the file (along with a path to it if it is not in the same directory as our program)\n", 130 | "# - the mode in wich we want to edit the file\n", 131 | "\n", 132 | "# Some popular modes are:\n", 133 | "# - 'r' : Opens the file in read-only mode;\n", 134 | "# - 'rb' : Opens the file as read-only in binary format;\n", 135 | "# - 'w' : Creates a file in write-only mode. 
If the file already exists, it will overwrite it;\n", 136 | "# - 'wb': Write-only mode in binary format;\n", 137 | "# - 'a' : Opens the file for appending new information to the end;\n", 138 | "# - 'w+' : Opens the file for writing and reading;\n", 139 | "\n", 140 | "# We have used 'wb' in this example, since we want to export the data to a file (thus, write to it)\n", 141 | "# and response.content is in bytes\n", 142 | "\n", 143 | "# Never forget to close the file!" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 9, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "# To ensure the file will always be closed, use the 'with' statement\n", 153 | "# This automatically calls file.close() at the end" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 10, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "with open(\"dog_image_2.jpg\", \"wb\") as file:\n", 163 | " file.write(response.content)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 11, 176 | "metadata": {}, 177 | "outputs": [], 178 | "source": [ 179 | "# Here, we first receive the whole file and store it in the RAM, then export it to the hard disk\n", 180 | "# This method is really inefficient, especially for bigger files\n", 181 | "# In effect we download the file to the RAM\n", 182 | "\n", 183 | "# We can fix that with a couple of small changes to our code" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "## Streaming the download to a file" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 12, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "# Instead of reading the whole response immidiatelly, \n", 200 | "# we can signal the program to only read part of the response when we tell it to.\n", 201 | "\n", 202 | "# This is achieved with the 'stream' parameter" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 13, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "# I will use test video files provided by file-examples.com\n", 212 | "url = \"https://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_480_1_5MG.mp4\"" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 14, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "r = requests.get(url, stream = True)\n", 222 | "\n", 223 | "with open(\"Sample_video_1,5_MB.mp4\", \"wb\") as f:\n", 224 | " \n", 225 | " # Now we iterate over the response in chunks\n", 226 | " for chunk in r.iter_content(chunk_size = 16*1024):\n", 227 | " f.write(chunk)" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 15, 233 | "metadata": {}, 234 | "outputs": [], 235 | "source": [ 236 | "# You can change the chunk size to optimize the fastest download speed for your system" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 16, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "# However, when using 'stream=True' requests will not close the connection to the server until all data has been read\n", 246 | "# Thus, sometimes the connection needs to be closed manually\n", 247 | "\n", 248 | "# Again, that is best done using the 'with' statement" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | 
"execution_count": 17, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "# So, the final code for file download is\n", 258 | "url = \"https://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_1920_18MG.mp4\"\n", 259 | "\n", 260 | "with requests.get(url, stream = True) as r:\n", 261 | " with open(\"Sample_video_18_MB.mp4\", \"wb\") as f:\n", 262 | " for chunk in r.iter_content(chunk_size = 16*1024):\n", 263 | " f.write(chunk)\n" 264 | ] 265 | } 266 | ], 267 | "metadata": { 268 | "kernelspec": { 269 | "display_name": "Python 3", 270 | "language": "python", 271 | "name": "python3" 272 | }, 273 | "language_info": { 274 | "codemirror_mode": { 275 | "name": "ipython", 276 | "version": 3 277 | }, 278 | "file_extension": ".py", 279 | "mimetype": "text/x-python", 280 | "name": "python", 281 | "nbconvert_exporter": "python", 282 | "pygments_lexer": "ipython3", 283 | "version": "3.7.3" 284 | } 285 | }, 286 | "nbformat": 4, 287 | "nbformat_minor": 2 288 | } 289 | -------------------------------------------------------------------------------- /03.Working with APIs/EDAMAM API/Section 3 - EDAMAM API - Initial setup and registration.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# API requiring registration - POST request" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### Registering to the API" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "# We will use a nutritional analysis API\n", 24 | "# It requires registration (we need an API key to validate ourselves)\n", 25 | "# Many APIs require this kind of registration" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "# You can sign-up for the Developer (Free) edition here: \n", 35 | "# https://developer.edamam.com/edamam-nutrition-api\n", 36 | "\n", 37 | "# API documentation: \n", 38 | "# https://developer.edamam.com/edamam-docs-nutrition-api" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "### Initial Setup" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "# loading the packages\n", 55 | "import requests\n", 56 | "import json" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 4, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "# Store the ID and Key in variables\n", 66 | "\n", 67 | "#APP_ID = \"your_API_ID_here\"\n", 68 | "#APP_KEY = \"your_API_key_here\"\n", 69 | "\n", 70 | "# Note: Those are not real ID and Key,\n", 71 | "# Replace the string with your own ones that you recieved upon registration" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 5, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "# Setting up the request URL\n", 81 | "api_endpoint = \"https://api.edamam.com/api/nutrition-details\"\n", 82 | "\n", 83 | "url = api_endpoint + \"?app_id=\" + APP_ID + \"&app_key=\" + APP_KEY" 84 | ] 85 | } 86 | ], 87 | "metadata": { 88 | "kernelspec": { 89 | "display_name": "Python 3", 90 | "language": "python", 91 | "name": "python3" 92 | }, 93 | "language_info": { 94 | "codemirror_mode": { 95 | "name": "ipython", 96 | "version": 3 97 | }, 98 | 
"file_extension": ".py", 99 | "mimetype": "text/x-python", 100 | "name": "python", 101 | "nbconvert_exporter": "python", 102 | "pygments_lexer": "ipython3", 103 | "version": "3.7.3" 104 | } 105 | }, 106 | "nbformat": 4, 107 | "nbformat_minor": 2 108 | } 109 | -------------------------------------------------------------------------------- /03.Working with APIs/GitHub API/github_API.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | 4 | base_url = "https://jobs.github.com/positions.json" 5 | 6 | #Extracting results from multiple pages 7 | results = [] 8 | 9 | for index in range(10): 10 | response = requests.get(base_url, params= {"description":"python", "location":"new york","page": index+1}) 11 | 12 | print(response.url) 13 | # print(response.json()) 14 | if len(response.json()) == 0: 15 | break 16 | 17 | results.extend(response.json()) 18 | 19 | print(len(results)) 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | data = response.json() 29 | data = json.dumps(data, indent=4) 30 | 31 | -------------------------------------------------------------------------------- /03.Working with APIs/iTune API/iTunes_API.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | 4 | base_site = "https://itunes.apple.com/search" 5 | 6 | response = requests.get(base_site,params={"term":"fifth harmony", "country":"us","limit": 200}) 7 | 8 | print(response.url) 9 | print(response.status_code) 10 | 11 | info = response.json() 12 | print(json.dumps(info, indent=4)) 13 | 14 | #name and release dates of the songs 15 | for result in info['results']: 16 | print(result['trackName']) 17 | print(result['releaseDate']) 18 | 19 | -------------------------------------------------------------------------------- /03.Working with APIs/iTune API/iTunes_API_structuring_exporting.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | import pandas as pd 4 | 5 | base_site = "https://itunes.apple.com/search" 6 | 7 | response = requests.get(base_site,params={"term":"fifth harmony", "country":"us","limit": 200}) 8 | 9 | info = response.json() 10 | 11 | #dataframe with pandas 12 | songs_df = pd.DataFrame(info['results']) 13 | print(songs_df) 14 | 15 | #export to csv or excel 16 | songs_df.to_csv('songs_info.csv') 17 | 18 | songs_df.to_excel('songs_info.xlsx') 19 | 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /03.Working with APIs/iTune API/songs_info.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ptyadana/Web-Scraping-and-API-in-Python/9595bc418866642143eaf4a1f700dd646d81d427/03.Working with APIs/iTune API/songs_info.xlsx -------------------------------------------------------------------------------- /04.HTML Overview/Section 4 - CSS and JavaScript.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | CSS and JavaScript 6 | 7 | 8 | 9 | 10 | 11 |

12 | Come to the dark side, we have cookies! 13 |

14 | 15 | 18 | 19 | 20 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /04.HTML Overview/Section 4 - CSS style tag.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 21 | 22 | 23 | 24 |

This is a heading

25 |

This is a paragraph.

26 |

I am different

27 | 28 | -------------------------------------------------------------------------------- /04.HTML Overview/Section 4 - Character encoding - Euro sign.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Character encoding in HTML 6 | 7 | 8 | 9 | 10 |

This is the Euro sign: € (method 1)

11 |

This is the Euro sign: € (method 2)

12 |

This is the Euro sign: € (method 3)

13 | 14 | 15 | 16 | 17 | -------------------------------------------------------------------------------- /04.HTML Overview/Section 4 - My First Webpage.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | My First Webpage 6 | 7 | 8 | 9 |

This is not the web page you are looking for. Move along, move along!

10 | 11 | 12 | 13 | Click here for high-quality music. 14 | 15 | 16 | 17 | Click here for high-quality music in a new tab. 18 | 19 | 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /05.Web Scraping with Beautiful Soup/Section 5 - Practical example - Exercise Setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Importing the packages" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Load the packages\n", 17 | "import requests\n", 18 | "from bs4 import BeautifulSoup" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Making a get request" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "# Defining the url of the site\n", 35 | "base_site = \"https://en.wikipedia.org/wiki/Music\"\n", 36 | "\n", 37 | "# Making a get request\n", 38 | "response = requests.get(base_site)\n", 39 | "response" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "# Extracting the HTML\n", 49 | "html = response.content" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "### Making the soup" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "# Convert HTML to a BeautifulSoup object. This will allow us to parse out content from the HTML more easily.\n", 66 | "# Using the default parser as it is included in Python\n", 67 | "soup = BeautifulSoup(html, \"html.parser\")" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "### 1. Extract all existing titles of links" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": { 81 | "scrolled": true 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "# Find all links on the page \n", 86 | "links = soup.find_all('a')\n", 87 | "links" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "# Dropping the links without 'href' attribute" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "# Getting all titles" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "# Removing the 'None' titles" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "### 2. Extract all heading 2 strings." 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "# Inspect all h2 tags" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "# Get the text" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### 3. Print the whole footer text." 
147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": { 153 | "scrolled": true 154 | }, 155 | "outputs": [], 156 | "source": [ 157 | "# By inspection: we see that the footer is contained inside a ..." 158 | ] 159 | } 160 | ], 161 | "metadata": { 162 | "kernelspec": { 163 | "display_name": "Python 3", 164 | "language": "python", 165 | "name": "python3" 166 | }, 167 | "language_info": { 168 | "codemirror_mode": { 169 | "name": "ipython", 170 | "version": 3 171 | }, 172 | "file_extension": ".py", 173 | "mimetype": "text/x-python", 174 | "name": "python", 175 | "nbconvert_exporter": "python", 176 | "pygments_lexer": "ipython3", 177 | "version": "3.7.3" 178 | } 179 | }, 180 | "nbformat": 4, 181 | "nbformat_minor": 2 182 | } 183 | -------------------------------------------------------------------------------- /05.Web Scraping with Beautiful Soup/Section 5 - Setting up your first scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Set-up and Workflow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### Importing the packages" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "# Load the packages\n", 24 | "import requests\n", 25 | "from bs4 import BeautifulSoup" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### Making a GET request" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "data": { 42 | "text/plain": [ 43 | "200" 44 | ] 45 | }, 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "output_type": "execute_result" 49 | } 50 | ], 51 | "source": [ 52 | "# Defining the url of the site\n", 53 | "base_site = \"https://en.wikipedia.org/wiki/Music\"\n", 54 | "\n", 55 | "# Making a get request\n", 56 | "response = requests.get(base_site)\n", 57 | "response.status_code" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/plain": [ 68 | "b'\\n\\n\\n\\n13 Assassins (2011) 95%,\n", 92 | "

Full Contact (1992) 88%,\n", 93 | "
Indiana Jones and the Last Crusade (1989) 88%,\n", 94 | "
Kung Fu Hustle (2005) 90%,\n", 95 | "
A Better Tomorrow (2010) 93%,\n", 96 | "
Iron Man (2008) 94%,\n", 97 | "
The Night Comes For Us (2018) 90%,\n", 98 | "
Logan (2017) 93%,\n", 99 | "
Goldfinger (1964) 97%,\n", 100 | "
Assault on Precinct 13 (1976) 98%,\n", 101 | "
Wonder Woman (2017) 93%,\n", 102 | "
Fist of Fury (Jing wu men) (1972) 92%,\n", 103 | "
Captain America: The Winter Soldier (2014) 90%,\n", 104 | "
Oldboy (2005) 82%,\n", 105 | "
The French Connection (1971) 98%,\n", 106 | "
Furious 7 (2015) 81%,\n", 107 | "
La Femme Nikita (Nikita) (1990) 88%,\n", 108 | "
Supercop (1996) 96%,\n", 109 | "
Dirty Harry (1971) 91%,\n", 110 | "
Live Die Repeat: Edge of Tomorrow (2014) 90%,\n", 111 | "
X2: X-Men United (2003) 85%,\n", 112 | "
The Fugitive (1993) 96%,\n", 113 | "
Black Panther (2018) 97%,\n", 114 | "
Inception (2010) 87%,\n", 115 | "
Braveheart (1995) 77%,\n", 116 | "
Minority Report (2002) 90%,\n", 117 | "
Avengers: Endgame (2019) 94%,\n", 118 | "
Dredd (2012) 79%,\n", 119 | "
The Bourne Identity (2002) 83%,\n", 120 | "
Ip Man (2010) 85%,\n", 121 | "
Face/Off (1997) 92%,\n", 122 | "
To Live and Die in L.A. (1985) 91%,\n", 123 | "
The Dark Knight (2008) 94%,\n", 124 | "
Mission: Impossible Ghost Protocol (2011) 93%,\n", 125 | "
Fast Five (2011) 77%,\n", 126 | "
Lethal Weapon (1987) 82%,\n", 127 | "
The Rock (1996) 66%,\n", 128 | "
RoboCop (1987) 89%,\n", 129 | "
John Wick: Chapter 2 (2017) 89%,\n", 130 | "
Casino Royale (2006) 95%,\n", 131 | "
Baby Driver (2017) 93%,\n", 132 | "
Fist of Legend (Jing wu ying xiong) (1994) 100%,\n", 133 | "
The Killer (1989) 98%,\n", 134 | "
The Raid 2 (2014) 80%,\n", 135 | "
Enter the Dragon (1973) 94%,\n", 136 | "
Commando (1985) 70%,\n", 137 | "
First Blood (1982) 87%,\n", 138 | "
Mission: Impossible Rogue Nation (2015) 93%,\n", 139 | "
The Terminator (1984) 100%,\n", 140 | "
Gladiator (2000) 76%,\n", 141 | "
Kill Bill: Volume 1 (2003) 85%,\n", 142 | "
Léon: The Professional (1994) 73%,\n", 143 | "
Speed (1994) 94%,\n", 144 | "
The Legend of Drunken Master (Jui kuen II) (Drunken Fist II) (1994) 83%,\n", 145 | "
John Wick (2014) 86%,\n", 146 | "
Crouching Tiger, Hidden Dragon (2001) 97%,\n", 147 | "
Predator (1987) 81%,\n", 148 | "
The Bourne Ultimatum (2007) 92%,\n", 149 | "
Total Recall (1990) 82%,\n", 150 | "
Mad Max 2: The Road Warrior (1982) 95%,\n", 151 | "
Heat (1995) 86%,\n", 152 | "
The Raid: Redemption (2012) 86%,\n", 153 | "
Mission: Impossible - Fallout (2018) 97%,\n", 154 | "
Raiders of the Lost Ark (1981) 95%,\n", 155 | "
Aliens (1986) 99%,\n", 156 | "
Lat sau san taam (Hard-Boiled) (1992) 94%,\n", 157 | "
The Matrix (1999) 88%,\n", 158 | "
Terminator 2: Judgment Day (1991) 93%,\n", 159 | "
Die Hard (1988) 93%,\n", 160 | "
Mad Max: Fury Road (2015) 97%
]" 161 | ] 162 | }, 163 | "execution_count": 7, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "# Extracting all 'h2' tags\n", 170 | "headings = [div.find(\"h2\") for div in divs]\n", 171 | "headings" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "## Extracting the scores" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 8, 184 | "metadata": { 185 | "scrolled": true 186 | }, 187 | "outputs": [ 188 | { 189 | "data": { 190 | "text/plain": [ 191 | "[95%,\n", 192 | " 88%,\n", 193 | " 88%,\n", 194 | " 90%,\n", 195 | " 93%,\n", 196 | " 94%,\n", 197 | " 90%,\n", 198 | " 93%,\n", 199 | " 97%,\n", 200 | " 98%,\n", 201 | " 93%,\n", 202 | " 92%,\n", 203 | " 90%,\n", 204 | " 82%,\n", 205 | " 98%,\n", 206 | " 81%,\n", 207 | " 88%,\n", 208 | " 96%,\n", 209 | " 91%,\n", 210 | " 90%,\n", 211 | " 85%,\n", 212 | " 96%,\n", 213 | " 97%,\n", 214 | " 87%,\n", 215 | " 77%,\n", 216 | " 90%,\n", 217 | " 94%,\n", 218 | " 79%,\n", 219 | " 83%,\n", 220 | " 85%,\n", 221 | " 92%,\n", 222 | " 91%,\n", 223 | " 94%,\n", 224 | " 93%,\n", 225 | " 77%,\n", 226 | " 82%,\n", 227 | " 66%,\n", 228 | " 89%,\n", 229 | " 89%,\n", 230 | " 95%,\n", 231 | " 93%,\n", 232 | " 100%,\n", 233 | " 98%,\n", 234 | " 80%,\n", 235 | " 94%,\n", 236 | " 70%,\n", 237 | " 87%,\n", 238 | " 93%,\n", 239 | " 100%,\n", 240 | " 76%,\n", 241 | " 85%,\n", 242 | " 73%,\n", 243 | " 94%,\n", 244 | " 83%,\n", 245 | " 86%,\n", 246 | " 97%,\n", 247 | " 81%,\n", 248 | " 92%,\n", 249 | " 82%,\n", 250 | " 95%,\n", 251 | " 86%,\n", 252 | " 86%,\n", 253 | " 97%,\n", 254 | " 95%,\n", 255 | " 99%,\n", 256 | " 94%,\n", 257 | " 88%,\n", 258 | " 93%,\n", 259 | " 93%,\n", 260 | " 97%]" 261 | ] 262 | }, 263 | "execution_count": 8, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "# Filtering only the spans containing the score\n", 270 | "[heading.find(\"span\", class_ = 'tMeterScore') for heading in headings]" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 9, 276 | "metadata": { 277 | "scrolled": true 278 | }, 279 | "outputs": [ 280 | { 281 | "data": { 282 | "text/plain": [ 283 | "['95%',\n", 284 | " '88%',\n", 285 | " '88%',\n", 286 | " '90%',\n", 287 | " '93%',\n", 288 | " '94%',\n", 289 | " '90%',\n", 290 | " '93%',\n", 291 | " '97%',\n", 292 | " '98%',\n", 293 | " '93%',\n", 294 | " '92%',\n", 295 | " '90%',\n", 296 | " '82%',\n", 297 | " '98%',\n", 298 | " '81%',\n", 299 | " '88%',\n", 300 | " '96%',\n", 301 | " '91%',\n", 302 | " '90%',\n", 303 | " '85%',\n", 304 | " '96%',\n", 305 | " '97%',\n", 306 | " '87%',\n", 307 | " '77%',\n", 308 | " '90%',\n", 309 | " '94%',\n", 310 | " '79%',\n", 311 | " '83%',\n", 312 | " '85%',\n", 313 | " '92%',\n", 314 | " '91%',\n", 315 | " '94%',\n", 316 | " '93%',\n", 317 | " '77%',\n", 318 | " '82%',\n", 319 | " '66%',\n", 320 | " '89%',\n", 321 | " '89%',\n", 322 | " '95%',\n", 323 | " '93%',\n", 324 | " '100%',\n", 325 | " '98%',\n", 326 | " '80%',\n", 327 | " '94%',\n", 328 | " '70%',\n", 329 | " '87%',\n", 330 | " '93%',\n", 331 | " '100%',\n", 332 | " '76%',\n", 333 | " '85%',\n", 334 | " '73%',\n", 335 | " '94%',\n", 336 | " '83%',\n", 337 | " '86%',\n", 338 | " '97%',\n", 339 | " '81%',\n", 340 | " '92%',\n", 341 | " '82%',\n", 342 | " '95%',\n", 343 | " '86%',\n", 344 | " '86%',\n", 345 | " '97%',\n", 346 | " '95%',\n", 347 | " '99%',\n", 348 | " '94%',\n", 349 | " '88%',\n", 350 | " '93%',\n", 351 | " 
'93%',\n", 352 | " '97%']" 353 | ] 354 | }, 355 | "execution_count": 9, 356 | "metadata": {}, 357 | "output_type": "execute_result" 358 | } 359 | ], 360 | "source": [ 361 | "# Extracting the score string\n", 362 | "scores = [heading.find(\"span\", class_ = 'tMeterScore').string for heading in headings]\n", 363 | "scores" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 10, 369 | "metadata": { 370 | "scrolled": true 371 | }, 372 | "outputs": [ 373 | { 374 | "data": { 375 | "text/plain": [ 376 | "['95',\n", 377 | " '88',\n", 378 | " '88',\n", 379 | " '90',\n", 380 | " '93',\n", 381 | " '94',\n", 382 | " '90',\n", 383 | " '93',\n", 384 | " '97',\n", 385 | " '98',\n", 386 | " '93',\n", 387 | " '92',\n", 388 | " '90',\n", 389 | " '82',\n", 390 | " '98',\n", 391 | " '81',\n", 392 | " '88',\n", 393 | " '96',\n", 394 | " '91',\n", 395 | " '90',\n", 396 | " '85',\n", 397 | " '96',\n", 398 | " '97',\n", 399 | " '87',\n", 400 | " '77',\n", 401 | " '90',\n", 402 | " '94',\n", 403 | " '79',\n", 404 | " '83',\n", 405 | " '85',\n", 406 | " '92',\n", 407 | " '91',\n", 408 | " '94',\n", 409 | " '93',\n", 410 | " '77',\n", 411 | " '82',\n", 412 | " '66',\n", 413 | " '89',\n", 414 | " '89',\n", 415 | " '95',\n", 416 | " '93',\n", 417 | " '100',\n", 418 | " '98',\n", 419 | " '80',\n", 420 | " '94',\n", 421 | " '70',\n", 422 | " '87',\n", 423 | " '93',\n", 424 | " '100',\n", 425 | " '76',\n", 426 | " '85',\n", 427 | " '73',\n", 428 | " '94',\n", 429 | " '83',\n", 430 | " '86',\n", 431 | " '97',\n", 432 | " '81',\n", 433 | " '92',\n", 434 | " '82',\n", 435 | " '95',\n", 436 | " '86',\n", 437 | " '86',\n", 438 | " '97',\n", 439 | " '95',\n", 440 | " '99',\n", 441 | " '94',\n", 442 | " '88',\n", 443 | " '93',\n", 444 | " '93',\n", 445 | " '97']" 446 | ] 447 | }, 448 | "execution_count": 10, 449 | "metadata": {}, 450 | "output_type": "execute_result" 451 | } 452 | ], 453 | "source": [ 454 | "# Removing the '%' sign\n", 455 | "scores = [s.strip('%') for s in scores]\n", 456 | "scores" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 11, 462 | "metadata": { 463 | "scrolled": true 464 | }, 465 | "outputs": [ 466 | { 467 | "data": { 468 | "text/plain": [ 469 | "[95,\n", 470 | " 88,\n", 471 | " 88,\n", 472 | " 90,\n", 473 | " 93,\n", 474 | " 94,\n", 475 | " 90,\n", 476 | " 93,\n", 477 | " 97,\n", 478 | " 98,\n", 479 | " 93,\n", 480 | " 92,\n", 481 | " 90,\n", 482 | " 82,\n", 483 | " 98,\n", 484 | " 81,\n", 485 | " 88,\n", 486 | " 96,\n", 487 | " 91,\n", 488 | " 90,\n", 489 | " 85,\n", 490 | " 96,\n", 491 | " 97,\n", 492 | " 87,\n", 493 | " 77,\n", 494 | " 90,\n", 495 | " 94,\n", 496 | " 79,\n", 497 | " 83,\n", 498 | " 85,\n", 499 | " 92,\n", 500 | " 91,\n", 501 | " 94,\n", 502 | " 93,\n", 503 | " 77,\n", 504 | " 82,\n", 505 | " 66,\n", 506 | " 89,\n", 507 | " 89,\n", 508 | " 95,\n", 509 | " 93,\n", 510 | " 100,\n", 511 | " 98,\n", 512 | " 80,\n", 513 | " 94,\n", 514 | " 70,\n", 515 | " 87,\n", 516 | " 93,\n", 517 | " 100,\n", 518 | " 76,\n", 519 | " 85,\n", 520 | " 73,\n", 521 | " 94,\n", 522 | " 83,\n", 523 | " 86,\n", 524 | " 97,\n", 525 | " 81,\n", 526 | " 92,\n", 527 | " 82,\n", 528 | " 95,\n", 529 | " 86,\n", 530 | " 86,\n", 531 | " 97,\n", 532 | " 95,\n", 533 | " 99,\n", 534 | " 94,\n", 535 | " 88,\n", 536 | " 93,\n", 537 | " 93,\n", 538 | " 97]" 539 | ] 540 | }, 541 | "execution_count": 11, 542 | "metadata": {}, 543 | "output_type": "execute_result" 544 | } 545 | ], 546 | "source": [ 547 | "# Converting each score to an integer\n", 548 | "scores = [int(s) 
for s in scores]\n", 549 | "scores" 550 | ] 551 | } 552 | ], 553 | "metadata": { 554 | "kernelspec": { 555 | "display_name": "Python 3", 556 | "language": "python", 557 | "name": "python3" 558 | }, 559 | "language_info": { 560 | "codemirror_mode": { 561 | "name": "ipython", 562 | "version": 3 563 | }, 564 | "file_extension": ".py", 565 | "mimetype": "text/x-python", 566 | "name": "python", 567 | "nbconvert_exporter": "python", 568 | "pygments_lexer": "ipython3", 569 | "version": "3.7.3" 570 | } 571 | }, 572 | "nbformat": 4, 573 | "nbformat_minor": 2 574 | } 575 | -------------------------------------------------------------------------------- /06.Project Scraping - Rotten Tomatoes/Section 6 - Setting up your scraper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Set-up" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# load packages\n", 17 | "import requests\n", 18 | "from bs4 import BeautifulSoup" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Define the URL of the site\n", 28 | "base_site = \"https://editorial.rottentomatoes.com/guide/140-essential-action-movies-to-watch-now/2/\"" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 3, 34 | "metadata": {}, 35 | "outputs": [ 36 | { 37 | "data": { 38 | "text/plain": [ 39 | "200" 40 | ] 41 | }, 42 | "execution_count": 3, 43 | "metadata": {}, 44 | "output_type": "execute_result" 45 | } 46 | ], 47 | "source": [ 48 | "# sending a request to the webpage\n", 49 | "response = requests.get(base_site)\n", 50 | "response.status_code" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 4, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "# get the HTML from the webpage\n", 60 | "html = response.content" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "## Choosing a parser" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "### html.parser" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 5, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "# convert the HTML to a Beautiful Soup object\n", 84 | "soup = BeautifulSoup(html, 'html.parser')" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 6, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "# Exporting the HTML to a file\n", 94 | "with open('Rotten_tomatoes_page_2_HTML_Parser.html', 'wb') as file:\n", 95 | " file.write(soup.prettify('utf-8'))" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 7, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "# When inspecting the file, we see that the HTML element is closed at the beginning -- it parsed incorrectly!\n", 105 | "# Let's check another parser" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "### lxml" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 8, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "# convert the HTML to a BeautifulSoup object\n", 122 | "soup = BeautifulSoup(html, 'lxml')" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 9, 128 | "metadata": {}, 
129 | "outputs": [], 130 | "source": [ 131 | "# Exporting the HTML to a file\n", 132 | "with open('Rotten_tomatoes_page_2_LXML_Parser.html', 'wb') as file:\n", 133 | " file.write(soup.prettify('utf-8'))" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 10, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "# By first accounts of inspecting the file everything seems fine" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "### A word of caution" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 11, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "# Beautiful Soup ranks the lxml parser as the best one.\n", 159 | "\n", 160 | "# If a parser is not explicitly stated in the Beautiful Soup constructor,\n", 161 | "# the best one available on the current machine is chosen.\n", 162 | "\n", 163 | "# This means that the same piece of code can give different results on different computers." 164 | ] 165 | } 166 | ], 167 | "metadata": { 168 | "kernelspec": { 169 | "display_name": "Python 3", 170 | "language": "python", 171 | "name": "python3" 172 | }, 173 | "language_info": { 174 | "codemirror_mode": { 175 | "name": "ipython", 176 | "version": 3 177 | }, 178 | "file_extension": ".py", 179 | "mimetype": "text/x-python", 180 | "name": "python", 181 | "nbconvert_exporter": "python", 182 | "pygments_lexer": "ipython3", 183 | "version": "3.7.3" 184 | } 185 | }, 186 | "nbformat": 4, 187 | "nbformat_minor": 2 188 | } 189 | -------------------------------------------------------------------------------- /06.Project Scraping - Rotten Tomatoes/movies_info.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ptyadana/Web-Scraping-and-API-in-Python/9595bc418866642143eaf4a1f700dd646d81d427/06.Project Scraping - Rotten Tomatoes/movies_info.xlsx -------------------------------------------------------------------------------- /08.Scraping Steam Project/New_Trending_Games_Info.csv: -------------------------------------------------------------------------------- 1 | Title,Price,Tags 2 | Dreamscaper: Prologue,Free,"Action, Indie, RPG, Free to Play" 3 | RESIDENT EVIL 3,$59.99,"Action, Zombies, Horror, Survival Horror" 4 | ONE PIECE: PIRATE WARRIORS 4,$59.99,"Action, Anime, Co-op, Online Co-Op" 5 | Eternal Radiance,$16.19,"Action, Adventure, RPG, Anime" 6 | Deadside,$19.99,"Massively Multiplayer, Action, Adventure, Indie" 7 | Conqueror's Blade,Free to Play,"Strategy, Massively Multiplayer, Action, Simulation" 8 | Borderlands 3,$59.99,"RPG, Action, Online Co-Op, Looter Shooter" 9 | Granblue Fantasy: Versus,$59.99,"Action, Anime, Fighting, 2D Fighter" 10 | Receiver 2,$17.99,"Simulation, Indie, Action, Shooter" 11 | Rakion Chaos Force,Free,"Action, RPG, Free to Play, Strategy" 12 | Mount & Blade II: Bannerlord,$49.99,"Early Access, Medieval, Strategy, Open World" 13 | Half-Life: Alyx,$59.99,"Masterpiece, Action, VR, Adventure" 14 | Last Oasis,$29.99,"Massively Multiplayer, Survival, Action, Adventure" 15 | DOOM Eternal,$59.99,"Action, Masterpiece, Great Soundtrack, FPS" 16 | Disaster Report 4: Summer Memories,$59.99,"Adventure, Action, Survival, VR" 17 | -------------------------------------------------------------------------------- /08.Scraping Steam Project/Section 8 - Scraping Steam - Setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | 
"cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Extracting data from Steam " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Initial Setup" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "from bs4 import BeautifulSoup\n", 24 | "import requests" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Connect to Steam webpage" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "r = requests.get(\"https://store.steampowered.com/tags/en/Action/\")\n", 41 | "r.status_code" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "html = r.content" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "soup = BeautifulSoup(html, \"lxml\")" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "## What can we scrape from this webpage?\n", 74 | "## 1) Try extracting the names of the top games from this page.\n", 75 | "## 2) What tags contain the prices? Can you extract the price information?\n", 76 | "## 3) Get all of the header tags on the page\n", 77 | "## 4) Can you get the text from each span tag with class equal to \"top_tag\"?\n", 78 | "## 5) Under the \"Narrow by Tag\" section, there are a collection of tags (e.g. \"Indie\", \"Adventure\", etc.). Write code to return these tags.\n", 79 | "## 6) What else can be scraped from this webpage or others on the site?" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "## Now is your turn!" 
87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [] 123 | } 124 | ], 125 | "metadata": { 126 | "kernelspec": { 127 | "display_name": "Python 3", 128 | "language": "python", 129 | "name": "python3" 130 | }, 131 | "language_info": { 132 | "codemirror_mode": { 133 | "name": "ipython", 134 | "version": 3 135 | }, 136 | "file_extension": ".py", 137 | "mimetype": "text/x-python", 138 | "name": "python", 139 | "nbconvert_exporter": "python", 140 | "pygments_lexer": "ipython3", 141 | "version": "3.7.3" 142 | } 143 | }, 144 | "nbformat": 4, 145 | "nbformat_minor": 2 146 | } 147 | -------------------------------------------------------------------------------- /08.Scraping Steam Project/Top_Rated_Games.info.csv: -------------------------------------------------------------------------------- 1 | Title,Price,Tags 2 | Counter-Strike: Global Offensive,Free to Play,"FPS, Shooter, Multiplayer, Competitive" 3 | Tom Clancy's Rainbow Six® Siege,$19.99,"FPS, Hero Shooter, Multiplayer, Tactical" 4 | Warframe,Free to Play,"Looter Shooter, Free to Play, Action, Co-op" 5 | Left 4 Dead 2,$9.99,"Zombies, Co-op, FPS, Multiplayer" 6 | Counter-Strike,$9.99,"Action, FPS, Multiplayer, Shooter" 7 | Borderlands 2,$19.99,"Loot, Shooter, Action, Multiplayer" 8 | Tomb Raider,$19.99,"Adventure, Action, Female Protagonist, Third Person" 9 | PAYDAY 2,$9.99,"Co-op, Action, FPS, Heist" 10 | Counter-Strike: Source,$9.99,"Shooter, Action, FPS, Multiplayer" 11 | Destiny 2,Free To Play,"Free to Play, Looter Shooter, FPS, Multiplayer" 12 | Half-Life 2,$9.99,"FPS, Action, Sci-fi, Classic" 13 | BioShock Infinite,$29.99,"FPS, Story Rich, Action, Singleplayer" 14 | Mount & Blade: Warband,$19.99,"Medieval, RPG, Open World, Strategy" 15 | Risk of Rain 2,$19.99,"Third-Person Shooter, Action Roguelike, Action, Multiplayer" 16 | MONSTER HUNTER: WORLD,$29.99,"Co-op, Multiplayer, Action, Open World" 17 | -------------------------------------------------------------------------------- /08.Scraping Steam Project/Top_Sellers_Games_info.csv: -------------------------------------------------------------------------------- 1 | Title,Price,Tags 2 | Counter-Strike: Global Offensive,Free to Play,"FPS, Shooter, Multiplayer, Competitive" 3 | Tom Clancy's Rainbow Six® Siege,$19.99,"FPS, Hero Shooter, Multiplayer, Tactical" 4 | Warframe,Free to Play,"Looter Shooter, Free to Play, Action, Co-op" 5 | Left 4 Dead 2,$9.99,"Zombies, Co-op, FPS, Multiplayer" 6 | Counter-Strike,$9.99,"Action, FPS, Multiplayer, Shooter" 7 | Borderlands 2,$19.99,"Loot, Shooter, Action, Multiplayer" 8 | Tomb Raider,$19.99,"Adventure, Action, Female Protagonist, Third Person" 9 | PAYDAY 2,$9.99,"Co-op, Action, FPS, Heist" 10 | Counter-Strike: Source,$9.99,"Shooter, Action, FPS, Multiplayer" 11 | Destiny 2,Free To Play,"Free to Play, Looter Shooter, FPS, Multiplayer" 12 | Half-Life 2,$9.99,"FPS, Action, Sci-fi, Classic" 13 | BioShock Infinite,$29.99,"FPS, Story Rich, 
Action, Singleplayer" 14 | Mount & Blade: Warband,$19.99,"Medieval, RPG, Open World, Strategy" 15 | Risk of Rain 2,$19.99,"Third-Person Shooter, Action Roguelike, Action, Multiplayer" 16 | MONSTER HUNTER: WORLD,$29.99,"Co-op, Multiplayer, Action, Open World" 17 | -------------------------------------------------------------------------------- /08.Scraping Steam Project/Trending_Games_info.csv: -------------------------------------------------------------------------------- 1 | Title,Price,Tags 2 | Counter-Strike: Global Offensive,Free to Play,"FPS, Shooter, Multiplayer, Competitive" 3 | Tom Clancy's Rainbow Six® Siege,$19.99,"FPS, Hero Shooter, Multiplayer, Tactical" 4 | Warframe,Free to Play,"Looter Shooter, Free to Play, Action, Co-op" 5 | Left 4 Dead 2,$9.99,"Zombies, Co-op, FPS, Multiplayer" 6 | Counter-Strike,$9.99,"Action, FPS, Multiplayer, Shooter" 7 | Borderlands 2,$19.99,"Loot, Shooter, Action, Multiplayer" 8 | Tomb Raider,$19.99,"Adventure, Action, Female Protagonist, Third Person" 9 | PAYDAY 2,$9.99,"Co-op, Action, FPS, Heist" 10 | Counter-Strike: Source,$9.99,"Shooter, Action, FPS, Multiplayer" 11 | Destiny 2,Free To Play,"Free to Play, Looter Shooter, FPS, Multiplayer" 12 | Half-Life 2,$9.99,"FPS, Action, Sci-fi, Classic" 13 | BioShock Infinite,$29.99,"FPS, Story Rich, Action, Singleplayer" 14 | Mount & Blade: Warband,$19.99,"Medieval, RPG, Open World, Strategy" 15 | Risk of Rain 2,$19.99,"Third-Person Shooter, Action Roguelike, Action, Multiplayer" 16 | MONSTER HUNTER: WORLD,$29.99,"Co-op, Multiplayer, Action, Open World" 17 | -------------------------------------------------------------------------------- /08.Scraping Youtube Project/Section 8 - Scraping YouTube - Setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Scraping YouTube" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Initial Setup" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "from bs4 import BeautifulSoup\n", 24 | "import requests" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Connect to webpage" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "r = requests.get(\"https://www.youtube.com/\")\n", 41 | "r.status_code" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "# get HTML\n", 51 | "html = resp.content" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "# convert HTML to BeautifulSoup object\n", 61 | "soup = BeautifulSoup(html)" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "## 1) Scrape the text from each span tag\n", 76 | "## 2) How many images are on YouTube'e homepage?\n", 77 | "## 3) Can you find the URL of the link with title = \"Movies\"? Music? 
Sports?\n", 78 | "## 4) Now, try connecting to and scraping https://www.youtube.com/results?search_query=stairway+to+heaven\n", 79 | "## a) Can you get the names of the first few videos in the search results?\n", 80 | "## b) Next, connect to one of the search result videos - https://www.youtube.com/watch?v=qHFxncb1gRY\n", 81 | "## c) Can you find the \"related\" videos? What are their titles? Durations? URLs? Number of views?\n", 82 | "## d) Try finding (and scraping) the Twitter description of the video." 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": null, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [] 119 | } 120 | ], 121 | "metadata": { 122 | "kernelspec": { 123 | "display_name": "Python 3", 124 | "language": "python", 125 | "name": "python3" 126 | }, 127 | "language_info": { 128 | "codemirror_mode": { 129 | "name": "ipython", 130 | "version": 3 131 | }, 132 | "file_extension": ".py", 133 | "mimetype": "text/x-python", 134 | "name": "python", 135 | "nbconvert_exporter": "python", 136 | "pygments_lexer": "ipython3", 137 | "version": "3.7.3" 138 | } 139 | }, 140 | "nbformat": 4, 141 | "nbformat_minor": 2 142 | } 143 | -------------------------------------------------------------------------------- /09.Common roadblocks when Web Scraping/RequestHeaders.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import requests" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "headers = {\"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36\"}" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 4, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "r = requests.get('https://www.youtube.com', headers = headers)" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 5, 33 | "metadata": {}, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/plain": [ 38 | "200" 39 | ] 40 | }, 41 | "execution_count": 5, 42 | "metadata": {}, 43 | "output_type": "execute_result" 44 | } 45 | ], 46 | "source": [ 47 | "r.status_code" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [] 56 | } 57 | ], 58 | "metadata": { 59 | "kernelspec": { 60 | "display_name": "Python 3", 61 | "language": "python", 62 | "name": "python3" 63 | }, 64 | "language_info": { 65 | "codemirror_mode": { 66 | "name": "ipython", 67 | "version": 3 68 | }, 69 | "file_extension": ".py", 70 | "mimetype": "text/x-python", 71 | "name": "python", 72 | "nbconvert_exporter": "python", 73 | "pygments_lexer": "ipython3", 74 | "version": "3.7.6" 75 | } 76 | }, 77 | "nbformat": 4, 78 | "nbformat_minor": 4 79 | } 80 | 
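A quick aside on the technique in RequestHeaders.ipynb above: many sites inspect the User-Agent header and may block, or serve stripped-down pages to, the default python-requests identifier, which is why the notebook supplies a browser-like string. Below is a minimal sketch of how to verify what is actually being sent; note that httpbin.org is used here purely as an echo service and is not part of the course materials.

import requests

# The default User-Agent announces the script, e.g. "python-requests/2.22.0"
r = requests.get("https://httpbin.org/headers")
print(r.json()["headers"]["User-Agent"])

# A browser-like User-Agent makes the request look like regular browser traffic
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
r = requests.get("https://httpbin.org/headers", headers=headers)
print(r.json()["headers"]["User-Agent"])

# The headers that were actually sent are also stored on the response object
print(r.request.headers)

Printing r.request.headers is often the quickest way to debug a scraper that works in a browser but gets a 403 from requests.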
-------------------------------------------------------------------------------- /09.Common roadblocks when Web Scraping/Section 9 - Sample HTML login Form.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | HTML Form 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /09.Common roadblocks when Web Scraping/Section 9 - Sample login code.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import requests" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "# URL of the POST request - need to inspect the HTML or use devtools to obtain\n", 19 | "url = \"target_url_of_post_request\"" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 3, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "# Define parameters sent with the POST request\n", 29 | "# (if there are additional ones, define them as well)\n", 30 | "user = \"Your username goes here\"\n", 31 | "password = \"Your password goes here\"" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 4, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# Arrange all parameters in a dictionary format with the right names\n", 41 | "payload = {\n", 42 | " \"user[email]\": user,\n", 43 | " \"user[password]\": password\n", 44 | "}" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 5, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# Create a session so that we have consistent cookies\n", 54 | "s = requests.Session()" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 6, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "200" 66 | ] 67 | }, 68 | "execution_count": 6, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "# Submit the POST request through the session\n", 75 | "p = s.post(url, data = payload)\n", 76 | "p.status_code" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 7, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "# You are now logged in and can proceed with scraping the data\n", 86 | "# .\n", 87 | "# .\n", 88 | "# .\n", 89 | "\n", 90 | "# Don't forget to close the session when you are done\n", 91 | "s.close()" 92 | ] 93 | } 94 | ], 95 | "metadata": { 96 | "kernelspec": { 97 | "display_name": "Python 3", 98 | "language": "python", 99 | "name": "python3" 100 | }, 101 | "language_info": { 102 | "codemirror_mode": { 103 | "name": "ipython", 104 | "version": 3 105 | }, 106 | "file_extension": ".py", 107 | "mimetype": "text/x-python", 108 | "name": "python", 109 | "nbconvert_exporter": "python", 110 | "pygments_lexer": "ipython3", 111 | "version": "3.7.6" 112 | } 113 | }, 114 | "nbformat": 4, 115 | "nbformat_minor": 2 116 | } 117 | -------------------------------------------------------------------------------- /09.Common roadblocks when Web Scraping/Sessions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import requests" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "#initialize a session\n", 19 | "s = requests.Session()\n", 20 | "\n", 21 | "#request made through that session\n", 22 | "#related cookies are handled through each session\n", 23 | "r1 = 
s.post(url1, data = payload)\n", 24 | "\n", 25 | "#request made through that session\n", 26 | "r2 = s.get(url2)\n", 27 | "\n", 28 | "s.close()" 29 | ] 30 | } 31 | ], 32 | "metadata": { 33 | "kernelspec": { 34 | "display_name": "Python 3", 35 | "language": "python", 36 | "name": "python3" 37 | }, 38 | "language_info": { 39 | "codemirror_mode": { 40 | "name": "ipython", 41 | "version": 3 42 | }, 43 | "file_extension": ".py", 44 | "mimetype": "text/x-python", 45 | "name": "python", 46 | "nbconvert_exporter": "python", 47 | "pygments_lexer": "ipython3", 48 | "version": "3.7.6" 49 | } 50 | }, 51 | "nbformat": 4, 52 | "nbformat_minor": 4 53 | } 54 | -------------------------------------------------------------------------------- /10.The Requests-HTML Package/Scraper_JavaScript.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Set up\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from requests_html import AsyncHTMLSession" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 3, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "session = AsyncHTMLSession()" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 4, 31 | "metadata": {}, 32 | "outputs": [ 33 | { 34 | "data": { 35 | "text/plain": [ 36 | "200" 37 | ] 38 | }, 39 | "execution_count": 4, 40 | "metadata": {}, 41 | "output_type": "execute_result" 42 | } 43 | ], 44 | "source": [ 45 | "r = await session.get('https://www.reddit.com')\n", 46 | "r.status_code" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 5, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "divs = r.html.find('div')\n", 56 | "links = r.html.find('a')\n", 57 | "urls = r.html.absolute_links" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "## Need to render the JavaScript, as the HTML is generated dynamically with JS" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "This will install Chromium on the PC; it acts like a web browser, but is only used by the program" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 10, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "import pyppdf.patch_pyppeteer  # patches pyppeteer's Chromium download (see note above)\n", 81 | "await r.html.arender()  # async render: executes the page's JavaScript in headless Chromium" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 11, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "new_divs = r.html.find('div')\n", 91 | "new_links = r.html.find('a')\n", 92 | "new_urls = r.html.absolute_links" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 12, 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "data": { 102 | "text/plain": [ 103 | "(504, 1649)" 104 | ] 105 | }, 106 | "execution_count": 12, 107 | "metadata": {}, 108 | "output_type": "execute_result" 109 | } 110 | ], 111 | "source": [ 112 | "len(divs) , len(new_divs)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 13, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "data": { 122 | "text/plain": [ 123 | "(80, 661)" 124 | ] 125 | }, 126 | "execution_count": 13, 127 | "metadata": {}, 128 | "output_type": "execute_result" 129 | } 130 | ], 131 | "source": [ 132 | "len(links), len(new_links)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 
14, 138 | "metadata": {}, 139 | "outputs": [ 140 | { 141 | "data": { 142 | "text/plain": [ 143 | "(57, 627)" 144 | ] 145 | }, 146 | "execution_count": 14, 147 | "metadata": {}, 148 | "output_type": "execute_result" 149 | } 150 | ], 151 | "source": [ 152 | "len(urls), len(new_urls)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "### Check the difference between first html and rendered version html" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 15, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "data": { 169 | "text/plain": [ 170 | "{'https://www.reddit.com/r/1200isplenty/',\n", 171 | " 'https://www.reddit.com/r/2007scape/',\n", 172 | " 'https://www.reddit.com/r/49ers/',\n", 173 | " 'https://www.reddit.com/r/90DayFiance/',\n", 174 | " 'https://www.reddit.com/r/ACMilan/',\n", 175 | " 'https://www.reddit.com/r/Adelaide/',\n", 176 | " 'https://www.reddit.com/r/Amd/',\n", 177 | " 'https://www.reddit.com/r/Android/',\n", 178 | " 'https://www.reddit.com/r/Animesuggest/',\n", 179 | " 'https://www.reddit.com/r/AnthemTheGame/',\n", 180 | " 'https://www.reddit.com/r/AskCulinary/',\n", 181 | " 'https://www.reddit.com/r/AskMen/',\n", 182 | " 'https://www.reddit.com/r/AskNYC/',\n", 183 | " 'https://www.reddit.com/r/AskReddit/',\n", 184 | " 'https://www.reddit.com/r/AskWomen/',\n", 185 | " 'https://www.reddit.com/r/Astros/',\n", 186 | " 'https://www.reddit.com/r/Atlanta/',\n", 187 | " 'https://www.reddit.com/r/AtlantaUnited/',\n", 188 | " 'https://www.reddit.com/r/Austria/',\n", 189 | " 'https://www.reddit.com/r/Barca/',\n", 190 | " 'https://www.reddit.com/r/BattlefieldV/',\n", 191 | " 'https://www.reddit.com/r/BeautyBoxes/',\n", 192 | " 'https://www.reddit.com/r/BeautyGuruChatter/',\n", 193 | " 'https://www.reddit.com/r/Berserk/',\n", 194 | " 'https://www.reddit.com/r/BigBrother/',\n", 195 | " 'https://www.reddit.com/r/BlackClover/',\n", 196 | " 'https://www.reddit.com/r/Blackops4/',\n", 197 | " 'https://www.reddit.com/r/BoJackHorseman/',\n", 198 | " 'https://www.reddit.com/r/BokuNoHeroAcademia/',\n", 199 | " 'https://www.reddit.com/r/Boruto/',\n", 200 | " 'https://www.reddit.com/r/BostonBruins/',\n", 201 | " 'https://www.reddit.com/r/Boxing/',\n", 202 | " 'https://www.reddit.com/r/Braves/',\n", 203 | " 'https://www.reddit.com/r/BravoRealHousewives/',\n", 204 | " 'https://www.reddit.com/r/Brawlstars/',\n", 205 | " 'https://www.reddit.com/r/Breath_of_the_Wild/',\n", 206 | " 'https://www.reddit.com/r/Brogress/',\n", 207 | " 'https://www.reddit.com/r/Browns/',\n", 208 | " 'https://www.reddit.com/r/C25K/',\n", 209 | " 'https://www.reddit.com/r/CFB/',\n", 210 | " 'https://www.reddit.com/r/CHIBears/',\n", 211 | " 'https://www.reddit.com/r/CHICubs/',\n", 212 | " 'https://www.reddit.com/r/Calgary/',\n", 213 | " 'https://www.reddit.com/r/CampingGear/',\n", 214 | " 'https://www.reddit.com/r/CampingandHiking/',\n", 215 | " 'https://www.reddit.com/r/Cardinals/',\n", 216 | " 'https://www.reddit.com/r/CasualUK/',\n", 217 | " 'https://www.reddit.com/r/Charlotte/',\n", 218 | " 'https://www.reddit.com/r/China/',\n", 219 | " 'https://www.reddit.com/r/ClashOfClans/',\n", 220 | " 'https://www.reddit.com/r/ClashRoyale/',\n", 221 | " 'https://www.reddit.com/r/CoDCompetitive/',\n", 222 | " 'https://www.reddit.com/r/CoachellaValley/',\n", 223 | " 'https://www.reddit.com/r/CollegeBasketball/',\n", 224 | " 'https://www.reddit.com/r/Columbus/',\n", 225 | " 'https://www.reddit.com/r/Competitiveoverwatch/',\n", 226 | " 
'https://www.reddit.com/r/Cooking/',\n", 227 | " 'https://www.reddit.com/r/Cricket/',\n", 228 | " 'https://www.reddit.com/r/CrohnsDisease/',\n", 229 | " 'https://www.reddit.com/r/CrusaderKings/',\n", 230 | " 'https://www.reddit.com/r/DBZDokkanBattle/',\n", 231 | " 'https://www.reddit.com/r/DMAcademy/',\n", 232 | " 'https://www.reddit.com/r/Dallas/',\n", 233 | " 'https://www.reddit.com/r/DanLeBatardShow/',\n", 234 | " 'https://www.reddit.com/r/DaysGone/',\n", 235 | " 'https://www.reddit.com/r/Denmark/',\n", 236 | " 'https://www.reddit.com/r/Denver/',\n", 237 | " 'https://www.reddit.com/r/Destiny/',\n", 238 | " 'https://www.reddit.com/r/DestinyTheGame/',\n", 239 | " 'https://www.reddit.com/r/Detroit/',\n", 240 | " 'https://www.reddit.com/r/Disneyland/',\n", 241 | " 'https://www.reddit.com/r/DnD/',\n", 242 | " 'https://www.reddit.com/r/Dodgers/',\n", 243 | " 'https://www.reddit.com/r/DotA2/',\n", 244 | " 'https://www.reddit.com/r/DuelLinks/',\n", 245 | " 'https://www.reddit.com/r/DunderMifflin/',\n", 246 | " 'https://www.reddit.com/r/DynastyFF/',\n", 247 | " 'https://www.reddit.com/r/EDH/',\n", 248 | " 'https://www.reddit.com/r/EDanonymemes/',\n", 249 | " 'https://www.reddit.com/r/EOOD/',\n", 250 | " 'https://www.reddit.com/r/EatCheapAndHealthy/',\n", 251 | " 'https://www.reddit.com/r/Edmonton/',\n", 252 | " 'https://www.reddit.com/r/EliteDangerous/',\n", 253 | " 'https://www.reddit.com/r/EscapefromTarkov/',\n", 254 | " 'https://www.reddit.com/r/Eve/',\n", 255 | " 'https://www.reddit.com/r/FFBraveExvius/',\n", 256 | " 'https://www.reddit.com/r/FIFA/',\n", 257 | " 'https://www.reddit.com/r/FORTnITE/',\n", 258 | " 'https://www.reddit.com/r/FUTMobile/',\n", 259 | " 'https://www.reddit.com/r/Fallout/',\n", 260 | " 'https://www.reddit.com/r/FantasyPL/',\n", 261 | " 'https://www.reddit.com/r/FireEmblemHeroes/',\n", 262 | " 'https://www.reddit.com/r/Fishing/',\n", 263 | " 'https://www.reddit.com/r/Fitness/',\n", 264 | " 'https://www.reddit.com/r/FixedGearBicycle/',\n", 265 | " 'https://www.reddit.com/r/FlashTV/',\n", 266 | " 'https://www.reddit.com/r/FortNiteBR/',\n", 267 | " 'https://www.reddit.com/r/FortniteCompetitive/',\n", 268 | " 'https://www.reddit.com/r/Frugal/',\n", 269 | " 'https://www.reddit.com/r/GameOfThronesMemes/',\n", 270 | " 'https://www.reddit.com/r/Gamingcirclejerk/',\n", 271 | " 'https://www.reddit.com/r/GetMotivated/',\n", 272 | " 'https://www.reddit.com/r/Glitch_in_the_Matrix/',\n", 273 | " 'https://www.reddit.com/r/GlobalOffensive/',\n", 274 | " 'https://www.reddit.com/r/GlobalOffensiveTrade/',\n", 275 | " 'https://www.reddit.com/r/GooglePixel/',\n", 276 | " 'https://www.reddit.com/r/GreenBayPackers/',\n", 277 | " 'https://www.reddit.com/r/Grimdank/',\n", 278 | " 'https://www.reddit.com/r/Guildwars2/',\n", 279 | " 'https://www.reddit.com/r/Gundam/',\n", 280 | " 'https://www.reddit.com/r/HBOGameofThrones/',\n", 281 | " 'https://www.reddit.com/r/Hair/',\n", 282 | " 'https://www.reddit.com/r/HealthyFood/',\n", 283 | " 'https://www.reddit.com/r/HomeImprovement/',\n", 284 | " 'https://www.reddit.com/r/IASIP/',\n", 285 | " 'https://www.reddit.com/r/IAmA/',\n", 286 | " 'https://www.reddit.com/r/IWantOut/',\n", 287 | " 'https://www.reddit.com/r/ImaginaryWesteros/',\n", 288 | " 'https://www.reddit.com/r/Indiemakeupandmore/',\n", 289 | " 'https://www.reddit.com/r/Instagram/',\n", 290 | " 'https://www.reddit.com/r/Israel/',\n", 291 | " 'https://www.reddit.com/r/JapanTravel/',\n", 292 | " 'https://www.reddit.com/r/Jeopardy/',\n", 293 | " 'https://www.reddit.com/r/Kitsap/',\n", 294 | " 
'https://www.reddit.com/r/Konosuba/',\n", 295 | " 'https://www.reddit.com/r/LearnJapanese/',\n", 296 | " 'https://www.reddit.com/r/LegendsOfTomorrow/',\n", 297 | " 'https://www.reddit.com/r/LifeProTips/',\n", 298 | " 'https://www.reddit.com/r/LigaMX/',\n", 299 | " 'https://www.reddit.com/r/LiverpoolFC/',\n", 300 | " 'https://www.reddit.com/r/LivestreamFail/',\n", 301 | " 'https://www.reddit.com/r/LosAngelesRams/',\n", 302 | " 'https://www.reddit.com/r/LushCosmetics/',\n", 303 | " 'https://www.reddit.com/r/MCFC/',\n", 304 | " 'https://www.reddit.com/r/MLBTheShow/',\n", 305 | " 'https://www.reddit.com/r/MLS/',\n", 306 | " 'https://www.reddit.com/r/MMA/',\n", 307 | " 'https://www.reddit.com/r/MTB/',\n", 308 | " 'https://www.reddit.com/r/MUAontheCheap/',\n", 309 | " 'https://www.reddit.com/r/MagicArena/',\n", 310 | " 'https://www.reddit.com/r/Makeup/',\n", 311 | " 'https://www.reddit.com/r/MakeupAddiction/',\n", 312 | " 'https://www.reddit.com/r/MakingaMurderer/',\n", 313 | " 'https://www.reddit.com/r/Market76/',\n", 314 | " 'https://www.reddit.com/r/MarvelStrikeForce/',\n", 315 | " 'https://www.reddit.com/r/Mavericks/',\n", 316 | " 'https://www.reddit.com/r/Minecraft/',\n", 317 | " 'https://www.reddit.com/r/Minneapolis/',\n", 318 | " 'https://www.reddit.com/r/MkeBucks/',\n", 319 | " 'https://www.reddit.com/r/ModernMagic/',\n", 320 | " 'https://www.reddit.com/r/MonsterHunterWorld/',\n", 321 | " 'https://www.reddit.com/r/Mordhau/',\n", 322 | " 'https://www.reddit.com/r/MortalKombat/',\n", 323 | " 'https://www.reddit.com/r/MtvChallenge/',\n", 324 | " 'https://www.reddit.com/r/Music/',\n", 325 | " 'https://www.reddit.com/r/NBA2k/',\n", 326 | " 'https://www.reddit.com/r/NBASpurs/',\n", 327 | " 'https://www.reddit.com/r/NFA/',\n", 328 | " 'https://www.reddit.com/r/NHLHUT/',\n", 329 | " 'https://www.reddit.com/r/NYKnicks/',\n", 330 | " 'https://www.reddit.com/r/NYYankees/',\n", 331 | " 'https://www.reddit.com/r/Naruto/',\n", 332 | " 'https://www.reddit.com/r/Nationals/',\n", 333 | " 'https://www.reddit.com/r/Nerf/',\n", 334 | " 'https://www.reddit.com/r/NetflixBestOf/',\n", 335 | " 'https://www.reddit.com/r/NewOrleans/',\n", 336 | " 'https://www.reddit.com/r/NewSkaters/',\n", 337 | " 'https://www.reddit.com/r/NewYorkMets/',\n", 338 | " 'https://www.reddit.com/r/NintendoSwitch/',\n", 339 | " 'https://www.reddit.com/r/NoMansSkyTheGame/',\n", 340 | " 'https://www.reddit.com/r/NoStupidQuestions/',\n", 341 | " 'https://www.reddit.com/r/OnePiece/',\n", 342 | " 'https://www.reddit.com/r/OutOfTheLoop/',\n", 343 | " 'https://www.reddit.com/r/Overwatch/',\n", 344 | " 'https://www.reddit.com/r/PS4/',\n", 345 | " 'https://www.reddit.com/r/PSVR/',\n", 346 | " 'https://www.reddit.com/r/PUBATTLEGROUNDS/',\n", 347 | " 'https://www.reddit.com/r/PUBGMobile/',\n", 348 | " 'https://www.reddit.com/r/Paladins/',\n", 349 | " 'https://www.reddit.com/r/PanPorn/',\n", 350 | " 'https://www.reddit.com/r/PandR/',\n", 351 | " 'https://www.reddit.com/r/Patriots/',\n", 352 | " 'https://www.reddit.com/r/Persona5/',\n", 353 | " 'https://www.reddit.com/r/Philippines/',\n", 354 | " 'https://www.reddit.com/r/Planetside/',\n", 355 | " 'https://www.reddit.com/r/Polska/',\n", 356 | " 'https://www.reddit.com/r/Portland/',\n", 357 | " 'https://www.reddit.com/r/Quebec/',\n", 358 | " 'https://www.reddit.com/r/RWBY/',\n", 359 | " 'https://www.reddit.com/r/Rainbow6/',\n", 360 | " 'https://www.reddit.com/r/RedDeadOnline/',\n", 361 | " 'https://www.reddit.com/r/RedditLaqueristas/',\n", 362 | " 'https://www.reddit.com/r/RepLadiesBST/',\n", 363 | " 
'https://www.reddit.com/r/Repsneakers/',\n", 364 | " 'https://www.reddit.com/r/RimWorld/',\n", 365 | " 'https://www.reddit.com/r/RocketLeague/',\n", 366 | " 'https://www.reddit.com/r/RocketLeagueExchange/',\n", 367 | " 'https://www.reddit.com/r/Romania/',\n", 368 | " 'https://www.reddit.com/r/Rowing/',\n", 369 | " 'https://www.reddit.com/r/SFGiants/',\n", 370 | " 'https://www.reddit.com/r/SWGalaxyOfHeroes/',\n", 371 | " 'https://www.reddit.com/r/Sacramento/',\n", 372 | " 'https://www.reddit.com/r/SaltLakeCity/',\n", 373 | " 'https://www.reddit.com/r/SanJoseSharks/',\n", 374 | " 'https://www.reddit.com/r/SarahSnark/',\n", 375 | " 'https://www.reddit.com/r/Scotland/',\n", 376 | " 'https://www.reddit.com/r/Seaofthieves/',\n", 377 | " 'https://www.reddit.com/r/Seattle/',\n", 378 | " 'https://www.reddit.com/r/SequelMemes/',\n", 379 | " 'https://www.reddit.com/r/ShingekiNoKyojin/',\n", 380 | " 'https://www.reddit.com/r/Shoestring/',\n", 381 | " 'https://www.reddit.com/r/Showerthoughts/',\n", 382 | " 'https://www.reddit.com/r/Smite/',\n", 383 | " 'https://www.reddit.com/r/Sneakers/',\n", 384 | " 'https://www.reddit.com/r/Spiderman/',\n", 385 | " 'https://www.reddit.com/r/SpoiledDragRace/',\n", 386 | " 'https://www.reddit.com/r/SquaredCircle/',\n", 387 | " 'https://www.reddit.com/r/StLouis/',\n", 388 | " 'https://www.reddit.com/r/StarVStheForcesofEvil/',\n", 389 | " 'https://www.reddit.com/r/StarWarsBattlefront/',\n", 390 | " 'https://www.reddit.com/r/StardewValley/',\n", 391 | " 'https://www.reddit.com/r/Steam/',\n", 392 | " 'https://www.reddit.com/r/Stellaris/',\n", 393 | " 'https://www.reddit.com/r/StrangerThings/',\n", 394 | " 'https://www.reddit.com/r/Stronglifts5x5/',\n", 395 | " 'https://www.reddit.com/r/Suomi/',\n", 396 | " 'https://www.reddit.com/r/Supplements/',\n", 397 | " 'https://www.reddit.com/r/TeenMomOGandTeenMom2/',\n", 398 | " 'https://www.reddit.com/r/Terraria/',\n", 399 | " 'https://www.reddit.com/r/TheAmazingRace/',\n", 400 | " 'https://www.reddit.com/r/TheBlackList/',\n", 401 | " 'https://www.reddit.com/r/TheDickShow/',\n", 402 | " 'https://www.reddit.com/r/TheHandmaidsTale/',\n", 403 | " 'https://www.reddit.com/r/TheLastAirbender/',\n", 404 | " 'https://www.reddit.com/r/TheSimpsons/',\n", 405 | " 'https://www.reddit.com/r/Tinder/',\n", 406 | " 'https://www.reddit.com/r/Torontobluejays/',\n", 407 | " 'https://www.reddit.com/r/Turkey/',\n", 408 | " 'https://www.reddit.com/r/TurkeyJerky/',\n", 409 | " 'https://www.reddit.com/r/Twitch/',\n", 410 | " 'https://www.reddit.com/r/TwoBestFriendsPlay/',\n", 411 | " 'https://www.reddit.com/r/VictoriaBC/',\n", 412 | " 'https://www.reddit.com/r/WWE/',\n", 413 | " 'https://www.reddit.com/r/WWEGames/',\n", 414 | " 'https://www.reddit.com/r/WaltDisneyWorld/',\n", 415 | " 'https://www.reddit.com/r/Warframe/',\n", 416 | " 'https://www.reddit.com/r/Warhammer40k/',\n", 417 | " 'https://www.reddit.com/r/Warthunder/',\n", 418 | " 'https://www.reddit.com/r/Watches/',\n", 419 | " 'https://www.reddit.com/r/Watchexchange/',\n", 420 | " 'https://www.reddit.com/r/Wellington/',\n", 421 | " 'https://www.reddit.com/r/Wetshaving/',\n", 422 | " 'https://www.reddit.com/r/Windows10/',\n", 423 | " 'https://www.reddit.com/r/Winnipeg/',\n", 424 | " 'https://www.reddit.com/r/WorldOfWarships/',\n", 425 | " 'https://www.reddit.com/r/WorldofTanks/',\n", 426 | " 'https://www.reddit.com/r/Youniqueamua/',\n", 427 | " 'https://www.reddit.com/r/aSongOfMemesAndRage/',\n", 428 | " 'https://www.reddit.com/r/acne/',\n", 429 | " 'https://www.reddit.com/r/adventuretime/',\n", 
430 | " 'https://www.reddit.com/r/airsoft/',\n", 431 | " 'https://www.reddit.com/r/amateur_boxing/',\n", 432 | " 'https://www.reddit.com/r/anime/',\n", 433 | " 'https://www.reddit.com/r/anime_irl/',\n", 434 | " 'https://www.reddit.com/r/antelopevalley/',\n", 435 | " 'https://www.reddit.com/r/apple/',\n", 436 | " 'https://www.reddit.com/r/argentina/',\n", 437 | " 'https://www.reddit.com/r/arrow/',\n", 438 | " 'https://www.reddit.com/r/askTO/',\n", 439 | " 'https://www.reddit.com/r/askscience/',\n", 440 | " 'https://www.reddit.com/r/asoiaf/',\n", 441 | " 'https://www.reddit.com/r/australia/',\n", 442 | " 'https://www.reddit.com/r/awardtravel/',\n", 443 | " 'https://www.reddit.com/r/backpacking/',\n", 444 | " 'https://www.reddit.com/r/balisong/',\n", 445 | " 'https://www.reddit.com/r/barstoolsports/',\n", 446 | " 'https://www.reddit.com/r/baseball/',\n", 447 | " 'https://www.reddit.com/r/batman/',\n", 448 | " 'https://www.reddit.com/r/battlestations/',\n", 449 | " 'https://www.reddit.com/r/bayarea/',\n", 450 | " 'https://www.reddit.com/r/beards/',\n", 451 | " 'https://www.reddit.com/r/beauty/',\n", 452 | " 'https://www.reddit.com/r/berkeley/',\n", 453 | " 'https://www.reddit.com/r/bicycling/',\n", 454 | " 'https://www.reddit.com/r/bikecommuting/',\n", 455 | " 'https://www.reddit.com/r/bikewrench/',\n", 456 | " 'https://www.reddit.com/r/bjj/',\n", 457 | " 'https://www.reddit.com/r/blackmirror/',\n", 458 | " 'https://www.reddit.com/r/bleach/',\n", 459 | " 'https://www.reddit.com/r/boardgames/',\n", 460 | " 'https://www.reddit.com/r/bodybuilding/',\n", 461 | " 'https://www.reddit.com/r/bodyweightfitness/',\n", 462 | " 'https://www.reddit.com/r/books/',\n", 463 | " 'https://www.reddit.com/r/boostedboards/',\n", 464 | " 'https://www.reddit.com/r/bostonceltics/',\n", 465 | " 'https://www.reddit.com/r/brasil/',\n", 466 | " 'https://www.reddit.com/r/brasilivre/',\n", 467 | " 'https://www.reddit.com/r/breakingbad/',\n", 468 | " 'https://www.reddit.com/r/brisbane/',\n", 469 | " 'https://www.reddit.com/r/brooklynninenine/',\n", 470 | " 'https://www.reddit.com/r/buildapc/',\n", 471 | " 'https://www.reddit.com/r/burlington/',\n", 472 | " 'https://www.reddit.com/r/camping/',\n", 473 | " 'https://www.reddit.com/r/canada/',\n", 474 | " 'https://www.reddit.com/r/canucks/',\n", 475 | " 'https://www.reddit.com/r/cars/',\n", 476 | " 'https://www.reddit.com/r/chelseafc/',\n", 477 | " 'https://www.reddit.com/r/chile/',\n", 478 | " 'https://www.reddit.com/r/cirkeltrek/',\n", 479 | " 'https://www.reddit.com/r/classicwow/',\n", 480 | " 'https://www.reddit.com/r/climbing/',\n", 481 | " 'https://www.reddit.com/r/community/',\n", 482 | " 'https://www.reddit.com/r/confession/',\n", 483 | " 'https://www.reddit.com/r/cordcutters/',\n", 484 | " 'https://www.reddit.com/r/cowboys/',\n", 485 | " 'https://www.reddit.com/r/coys/',\n", 486 | " 'https://www.reddit.com/r/criterion/',\n", 487 | " 'https://www.reddit.com/r/croatia/',\n", 488 | " 'https://www.reddit.com/r/crossfit/',\n", 489 | " 'https://www.reddit.com/r/cscareerquestions/',\n", 490 | " 'https://www.reddit.com/r/curlyhair/',\n", 491 | " 'https://www.reddit.com/r/cycling/',\n", 492 | " 'https://www.reddit.com/r/danganronpa/',\n", 493 | " 'https://www.reddit.com/r/dauntless/',\n", 494 | " 'https://www.reddit.com/r/dbz/',\n", 495 | " 'https://www.reddit.com/r/de/',\n", 496 | " 'https://www.reddit.com/r/deadbydaylight/',\n", 497 | " 'https://www.reddit.com/r/denvernuggets/',\n", 498 | " 'https://www.reddit.com/r/destiny2/',\n", 499 | " 
'https://www.reddit.com/r/detroitlions/',\n", 500 | " 'https://www.reddit.com/r/diabetes/',\n", 501 | " 'https://www.reddit.com/r/diabetes_t1/',\n", 502 | " 'https://www.reddit.com/r/discgolf/',\n", 503 | " 'https://www.reddit.com/r/discordapp/',\n", 504 | " 'https://www.reddit.com/r/disney/',\n", 505 | " 'https://www.reddit.com/r/dndmemes/',\n", 506 | " 'https://www.reddit.com/r/dndnext/',\n", 507 | " 'https://www.reddit.com/r/doctorwho/',\n", 508 | " 'https://www.reddit.com/r/dubai/',\n", 509 | " 'https://www.reddit.com/r/eagles/',\n", 510 | " 'https://www.reddit.com/r/ehlersdanlos/',\n", 511 | " 'https://www.reddit.com/r/elderscrollsonline/',\n", 512 | " 'https://www.reddit.com/r/eu4/',\n", 513 | " 'https://www.reddit.com/r/europe/',\n", 514 | " 'https://www.reddit.com/r/explainlikeimfive/',\n", 515 | " 'https://www.reddit.com/r/fairytail/',\n", 516 | " 'https://www.reddit.com/r/fantasybaseball/',\n", 517 | " 'https://www.reddit.com/r/fantasyfootball/',\n", 518 | " 'https://www.reddit.com/r/fasting/',\n", 519 | " 'https://www.reddit.com/r/femalefashionadvice/',\n", 520 | " 'https://www.reddit.com/r/femalehairadvice/',\n", 521 | " 'https://www.reddit.com/r/ffxiv/',\n", 522 | " 'https://www.reddit.com/r/findfashion/',\n", 523 | " 'https://www.reddit.com/r/fireemblem/',\n", 524 | " 'https://www.reddit.com/r/fivenightsatfreddys/',\n", 525 | " 'https://www.reddit.com/r/flexibility/',\n", 526 | " 'https://www.reddit.com/r/flightsim/',\n", 527 | " 'https://www.reddit.com/r/flyfishing/',\n", 528 | " 'https://www.reddit.com/r/fo76/',\n", 529 | " 'https://www.reddit.com/r/footballmanagergames/',\n", 530 | " 'https://www.reddit.com/r/forhonor/',\n", 531 | " 'https://www.reddit.com/r/formula1/',\n", 532 | " 'https://www.reddit.com/r/fragrance/',\n", 533 | " 'https://www.reddit.com/r/france/',\n", 534 | " 'https://www.reddit.com/r/freefolk/',\n", 535 | " 'https://www.reddit.com/r/frugalmalefashion/',\n", 536 | " 'https://www.reddit.com/r/futurama/',\n", 537 | " 'https://www.reddit.com/r/future_fight/',\n", 538 | " 'https://www.reddit.com/r/gainit/',\n", 539 | " 'https://www.reddit.com/r/gameofthrones/',\n", 540 | " 'https://www.reddit.com/r/germany/',\n", 541 | " 'https://www.reddit.com/r/girlsfrontline/',\n", 542 | " 'https://www.reddit.com/r/golf/',\n", 543 | " 'https://www.reddit.com/r/goodyearwelt/',\n", 544 | " 'https://www.reddit.com/r/grandorder/',\n", 545 | " 'https://www.reddit.com/r/greece/',\n", 546 | " 'https://www.reddit.com/r/greysanatomy/',\n", 547 | " 'https://www.reddit.com/r/gtaonline/',\n", 548 | " 'https://www.reddit.com/r/halifax/',\n", 549 | " 'https://www.reddit.com/r/halo/',\n", 550 | " 'https://www.reddit.com/r/headphones/',\n", 551 | " 'https://www.reddit.com/r/hearthstone/',\n", 552 | " 'https://www.reddit.com/r/heroesofthestorm/',\n", 553 | " 'https://www.reddit.com/r/hiking/',\n", 554 | " 'https://www.reddit.com/r/hockey/',\n", 555 | " 'https://www.reddit.com/r/hockeyjerseys/',\n", 556 | " 'https://www.reddit.com/r/hockeyplayers/',\n", 557 | " 'https://www.reddit.com/r/houston/',\n", 558 | " 'https://www.reddit.com/r/howardstern/',\n", 559 | " 'https://www.reddit.com/r/hungary/',\n", 560 | " 'https://www.reddit.com/r/india/',\n", 561 | " 'https://www.reddit.com/r/indonesia/',\n", 562 | " 'https://www.reddit.com/r/intermittentfasting/',\n", 563 | " 'https://www.reddit.com/r/iphone/',\n", 564 | " 'https://www.reddit.com/r/ireland/',\n", 565 | " 'https://www.reddit.com/r/italy/',\n", 566 | " 'https://www.reddit.com/r/jailbreak/',\n", 567 | " 
'https://www.reddit.com/r/japanesestreetwear/',\n", 568 | " 'https://www.reddit.com/r/japanlife/',\n", 569 | " 'https://www.reddit.com/r/jobs/',\n", 570 | " 'https://www.reddit.com/r/kansascity/',\n", 571 | " 'https://www.reddit.com/r/keto/',\n", 572 | " 'https://www.reddit.com/r/korea/',\n", 573 | " 'https://www.reddit.com/r/lakers/',\n", 574 | " 'https://www.reddit.com/r/leafs/',\n", 575 | " 'https://www.reddit.com/r/leagueoflegends/',\n", 576 | " 'https://www.reddit.com/r/leangains/',\n", 577 | " 'https://www.reddit.com/r/learnprogramming/',\n", 578 | " 'https://www.reddit.com/r/learnpython/',\n", 579 | " 'https://www.reddit.com/r/legaladvice/',\n", 580 | " 'https://www.reddit.com/r/longboarding/',\n", 581 | " 'https://www.reddit.com/r/loseit/',\n", 582 | " 'https://www.reddit.com/r/lucifer/',\n", 583 | " 'https://www.reddit.com/r/makeupexchange/',\n", 584 | " 'https://www.reddit.com/r/malaysia/',\n", 585 | " 'https://www.reddit.com/r/malefashion/',\n", 586 | " 'https://www.reddit.com/r/malefashionadvice/',\n", 587 | " 'https://www.reddit.com/r/malehairadvice/',\n", 588 | " 'https://www.reddit.com/r/malelivingspace/',\n", 589 | " 'https://www.reddit.com/r/marvelmemes/',\n", 590 | " 'https://www.reddit.com/r/marvelstudios/',\n", 591 | " 'https://www.reddit.com/r/medical_advice/',\n", 592 | " 'https://www.reddit.com/r/melbourne/',\n", 593 | " 'https://www.reddit.com/r/memes/',\n", 594 | " 'https://www.reddit.com/r/mexico/',\n", 595 | " 'https://www.reddit.com/r/migraine/',\n", 596 | " 'https://www.reddit.com/r/minnesotatwins/',\n", 597 | " 'https://www.reddit.com/r/minnesotavikings/',\n", 598 | " 'https://www.reddit.com/r/mw4/',\n", 599 | " 'https://www.reddit.com/r/mylittlepony/',\n", 600 | " 'https://www.reddit.com/r/nashville/',\n", 601 | " 'https://www.reddit.com/r/nattyorjuice/',\n", 602 | " 'https://www.reddit.com/r/nba/',\n", 603 | " 'https://www.reddit.com/r/nbadiscussion/',\n", 604 | " 'https://www.reddit.com/r/netflix/',\n", 605 | " 'https://www.reddit.com/r/newsokur/',\n", 606 | " 'https://www.reddit.com/r/newzealand/',\n", 607 | " 'https://www.reddit.com/r/nfl/',\n", 608 | " 'https://www.reddit.com/r/nhl/',\n", 609 | " 'https://www.reddit.com/r/norge/',\n", 610 | " 'https://www.reddit.com/r/nosleep/',\n", 611 | " 'https://www.reddit.com/r/nova/',\n", 612 | " 'https://www.reddit.com/r/nrl/',\n", 613 | " 'https://www.reddit.com/r/nunavut/',\n", 614 | " 'https://www.reddit.com/r/nutrition/',\n", 615 | " 'https://www.reddit.com/r/nvidia/',\n", 616 | " 'https://www.reddit.com/r/nyjets/',\n", 617 | " 'https://www.reddit.com/r/omad/',\n", 618 | " 'https://www.reddit.com/r/orangecounty/',\n", 619 | " 'https://www.reddit.com/r/orangetheory/',\n", 620 | " 'https://www.reddit.com/r/osugame/',\n", 621 | " 'https://www.reddit.com/r/ottawa/',\n", 622 | " 'https://www.reddit.com/r/overlord/',\n", 623 | " 'https://www.reddit.com/r/pathofexile/',\n", 624 | " 'https://www.reddit.com/r/pcmasterrace/',\n", 625 | " 'https://www.reddit.com/r/peloton/',\n", 626 | " 'https://www.reddit.com/r/pesmobile/',\n", 627 | " 'https://www.reddit.com/r/philadelphia/',\n", 628 | " 'https://www.reddit.com/r/phillies/',\n", 629 | " 'https://www.reddit.com/r/phoenix/',\n", 630 | " 'https://www.reddit.com/r/pics/',\n", 631 | " 'https://www.reddit.com/r/pics/?f=flair_name%3A%22Politics%22',\n", 632 | " 'https://www.reddit.com/r/pics/comments/g1k7qr/well_america_this_explains_it/',\n", 633 | " 'https://www.reddit.com/r/piercing/',\n", 634 | " 'https://www.reddit.com/r/pittsburgh/',\n", 635 | " 
'https://www.reddit.com/r/playrust/',\n", 636 | " 'https://www.reddit.com/r/podemos/',\n", 637 | " 'https://www.reddit.com/r/pokemon/',\n", 638 | " 'https://www.reddit.com/r/pokemongo/',\n", 639 | " 'https://www.reddit.com/r/pokemontrades/',\n", 640 | " 'https://www.reddit.com/r/portugal/',\n", 641 | " 'https://www.reddit.com/r/poshmark/',\n", 642 | " 'https://www.reddit.com/r/powerlifting/',\n", 643 | " 'https://www.reddit.com/r/progresspics/',\n", 644 | " 'https://www.reddit.com/r/raleigh/',\n", 645 | " 'https://www.reddit.com/r/ravens/',\n", 646 | " 'https://www.reddit.com/r/rawdenim/',\n", 647 | " 'https://www.reddit.com/r/realmadrid/',\n", 648 | " 'https://www.reddit.com/r/reddeadredemption/',\n", 649 | " 'https://www.reddit.com/r/reddevils/',\n", 650 | " 'https://www.reddit.com/r/redsox/',\n", 651 | " 'https://www.reddit.com/r/relationship_advice/',\n", 652 | " 'https://www.reddit.com/r/rickandmorty/',\n", 653 | " 'https://www.reddit.com/r/ripcity/',\n", 654 | " 'https://www.reddit.com/r/riverdale/',\n", 655 | " 'https://www.reddit.com/r/roadtrip/',\n", 656 | " 'https://www.reddit.com/r/rolex/',\n", 657 | " 'https://www.reddit.com/r/rollercoasters/',\n", 658 | " 'https://www.reddit.com/r/rpdrcringe/',\n", 659 | " 'https://www.reddit.com/r/rugbyunion/',\n", 660 | " 'https://www.reddit.com/r/runescape/',\n", 661 | " 'https://www.reddit.com/r/running/',\n", 662 | " 'https://www.reddit.com/r/rupaulsdragrace/',\n", 663 | " 'https://www.reddit.com/r/rva/',\n", 664 | " 'https://www.reddit.com/r/sanantonio/',\n", 665 | " 'https://www.reddit.com/r/sandiego/',\n", 666 | " 'https://www.reddit.com/r/sanfrancisco/',\n", 667 | " 'https://www.reddit.com/r/saskatoon/',\n", 668 | " 'https://www.reddit.com/r/scifi/',\n", 669 | " 'https://www.reddit.com/r/seinfeld/',\n", 670 | " 'https://www.reddit.com/r/serbia/',\n", 671 | " 'https://www.reddit.com/r/shield/',\n", 672 | " 'https://www.reddit.com/r/singapore/',\n", 673 | " 'https://www.reddit.com/r/sixers/',\n", 674 | " 'https://www.reddit.com/r/skiing/',\n", 675 | " 'https://www.reddit.com/r/skyrim/',\n", 676 | " 'https://www.reddit.com/r/smashbros/',\n", 677 | " 'https://www.reddit.com/r/sneakermarket/',\n", 678 | " 'https://www.reddit.com/r/snowboarding/',\n", 679 | " 'https://www.reddit.com/r/soccer/',\n", 680 | " 'https://www.reddit.com/r/solotravel/',\n", 681 | " 'https://www.reddit.com/r/southpark/',\n", 682 | " 'https://www.reddit.com/r/sports/',\n", 683 | " 'https://www.reddit.com/r/sportsbook/',\n", 684 | " 'https://www.reddit.com/r/starbucks/',\n", 685 | " 'https://www.reddit.com/r/starcitizen/',\n", 686 | " 'https://www.reddit.com/r/startrek/',\n", 687 | " 'https://www.reddit.com/r/steelers/',\n", 688 | " 'https://www.reddit.com/r/stevenuniverse/',\n", 689 | " 'https://www.reddit.com/r/stlouisblues/',\n", 690 | " 'https://www.reddit.com/r/streetwearstartup/',\n", 691 | " 'https://www.reddit.com/r/summonerswar/',\n", 692 | " 'https://www.reddit.com/r/suns/',\n", 693 | " 'https://www.reddit.com/r/survivor/',\n", 694 | " 'https://www.reddit.com/r/sweden/',\n", 695 | " 'https://www.reddit.com/r/swoleacceptance/',\n", 696 | " 'https://www.reddit.com/r/sydney/',\n", 697 | " 'https://www.reddit.com/r/sysadmin/',\n", 698 | " 'https://www.reddit.com/r/tampabayrays/',\n", 699 | " 'https://www.reddit.com/r/tattoos/',\n", 700 | " 'https://www.reddit.com/r/techsupport/',\n", 701 | " 'https://www.reddit.com/r/tennis/',\n", 702 | " 'https://www.reddit.com/r/tf2/',\n", 703 | " 'https://www.reddit.com/r/the100/',\n", 704 | " 
'https://www.reddit.com/r/thebachelor/',\n", 705 | " 'https://www.reddit.com/r/thedivision/',\n", 706 | " 'https://www.reddit.com/r/thenetherlands/',\n", 707 | " 'https://www.reddit.com/r/thesims/',\n", 708 | " 'https://www.reddit.com/r/thesopranos/',\n", 709 | " 'https://www.reddit.com/r/thewalkingdead/',\n", 710 | " 'https://www.reddit.com/r/tipofmytongue/',\n", 711 | " 'https://www.reddit.com/r/titanfolk/',\n", 712 | " 'https://www.reddit.com/r/todayilearned/',\n", 713 | " 'https://www.reddit.com/r/torontoraptors/',\n", 714 | " 'https://www.reddit.com/r/totalwar/',\n", 715 | " 'https://www.reddit.com/r/touhou/',\n", 716 | " 'https://www.reddit.com/r/trailerparkboys/',\n", 717 | " 'https://www.reddit.com/r/translator/',\n", 718 | " 'https://www.reddit.com/r/travel/',\n", 719 | " 'https://www.reddit.com/r/vagabond/',\n", 720 | " 'https://www.reddit.com/r/vancouver/',\n", 721 | " 'https://www.reddit.com/r/vanderpumprules/',\n", 722 | " 'https://www.reddit.com/r/vegan/',\n", 723 | " 'https://www.reddit.com/r/videos/',\n", 724 | " 'https://www.reddit.com/r/vzla/',\n", 725 | " 'https://www.reddit.com/r/warriors/',\n", 726 | " 'https://www.reddit.com/r/weightroom/',\n", 727 | " 'https://www.reddit.com/r/westworld/',\n", 728 | " 'https://www.reddit.com/r/wicked_edge/',\n", 729 | " 'https://www.reddit.com/r/wow/',\n", 730 | " 'https://www.reddit.com/r/xboxone/',\n", 731 | " 'https://www.reddit.com/r/xxfitness/',\n", 732 | " 'https://www.reddit.com/r/yeezys/',\n", 733 | " 'https://www.reddit.com/r/yoga/',\n", 734 | " 'https://www.reddit.com/r/yugioh/',\n", 735 | " 'https://www.reddit.com/r/zerocarb/',\n", 736 | " 'https://www.reddit.com/rpan/',\n", 737 | " 'https://www.reddit.com/subreddits/leaderboard/up-and-coming',\n", 738 | " 'https://www.reddit.com/user/Barknuckle/',\n", 739 | " 'https://www.reddit.com/user/Frocharocha/',\n", 740 | " 'https://www.reddit.com/user/Magistrex/',\n", 741 | " 'https://www.reddit.com/user/PoliticsModeratorBot/',\n", 742 | " 'https://www.reddit.com/user/Ra75b/',\n", 743 | " 'https://www.reddit.com/user/TheVirginVibes/',\n", 744 | " 'https://www.reddit.com/user/frozenHelen/'}" 745 | ] 746 | }, 747 | "execution_count": 15, 748 | "metadata": {}, 749 | "output_type": "execute_result" 750 | } 751 | ], 752 | "source": [ 753 | "new_urls.difference(urls)" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": 16, 759 | "metadata": {}, 760 | "outputs": [ 761 | { 762 | "data": { 763 | "text/plain": [ 764 | "" 765 | ] 766 | }, 767 | "execution_count": 16, 768 | "metadata": {}, 769 | "output_type": "execute_result" 770 | } 771 | ], 772 | "source": [ 773 | "session.close()" 774 | ] 775 | } 776 | ], 777 | "metadata": { 778 | "kernelspec": { 779 | "display_name": "Python 3", 780 | "language": "python", 781 | "name": "python3" 782 | }, 783 | "language_info": { 784 | "codemirror_mode": { 785 | "name": "ipython", 786 | "version": 3 787 | }, 788 | "file_extension": ".py", 789 | "mimetype": "text/x-python", 790 | "name": "python", 791 | "nbconvert_exporter": "python", 792 | "pygments_lexer": "ipython3", 793 | "version": "3.7.6" 794 | } 795 | }, 796 | "nbformat": 4, 797 | "nbformat_minor": 4 798 | } 799 | -------------------------------------------------------------------------------- /10.The Requests-HTML Package/Section 10 - Scraping JavaScript.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Scraping data generated by 
JavaScript" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# When coding in Jupyter and Spyder, we need to use the class AsyncHTMLSession to make JavaScript work\n", 17 | "# In other environments you can use the normal HTMLSession\n", 18 | "from requests_html import AsyncHTMLSession" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# establish a new asynchronous session\n", 28 | "session = AsyncHTMLSession()\n", 29 | "\n", 30 | "# The only difference we will experience between the regular HTML Session and the asynchronous one,\n", 31 | "# is the need to write the keyword 'await' in front of some statements" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# In this example we're going to use Nike's homepage: https://www.reddit.com/\n", 41 | "# Several of the links on this page, as well as other elements, are generated by JavaScript\n", 42 | "# We will compare the result of scraping those before and after running the JavaScript code" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 4, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "data": { 52 | "text/plain": [ 53 | "200" 54 | ] 55 | }, 56 | "execution_count": 4, 57 | "metadata": {}, 58 | "output_type": "execute_result" 59 | } 60 | ], 61 | "source": [ 62 | "# Since we used async session, we need to use the keyword 'await'\n", 63 | "# If you use the regular HTMLSession, there is no need for 'await'\n", 64 | "r = await session.get(\"https://www.reddit.com/\")\n", 65 | "r.status_code" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 5, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "# So far, nothing different from our previous example has happened\n", 75 | "# The JavaScript code has not yet been executed" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 6, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "# Here are some tags obtained before rendering the JavaScript code, i.e. extarcted from the raw HTML\n", 85 | "divs = r.html.find(\"div\")\n", 86 | "links = r.html.find(\"a\")\n", 87 | "urls = r.html.absolute_links" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 7, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "# Now, we need to execute the JavaScript code that will generate additional tags" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 8, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "# The requests-html package provides a very simple interface for that - just use the 'render()' method\n", 106 | "# ('arender()' when using async session)\n", 107 | "# It runs the JavaScript code which updates the HTML. 
This may take a bit\n", 108 | "# The updated HTML is stored in the old variable 'r.html' - you do not need to assign the result to a new variable\n", 109 | "# As before, the 'await' keyword is supplied only because of the async session\n", 110 | "await r.html.arender()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 9, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# NOTE: The first time you run 'render()' (or 'arender()'), Chromium will be downloaded and installed on your computer" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 10, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "# Now the HTML is updated and we can search for the same tags again\n", 129 | "new_divs = r.html.find(\"div\")\n", 130 | "new_links = r.html.find(\"a\")\n", 131 | "new_urls = r.html.absolute_links" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 11, 137 | "metadata": {}, 138 | "outputs": [], 139 | "source": [ 140 | "# We can see the difference in the number of found elements before and after the JavaScript was executed" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 12, 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "data": { 150 | "text/plain": [ 151 | "(543, 1728)" 152 | ] 153 | }, 154 | "execution_count": 12, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "len(divs), len(new_divs)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 13, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "data": { 170 | "text/plain": [ 171 | "(87, 681)" 172 | ] 173 | }, 174 | "execution_count": 13, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "len(links), len(new_links)" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 14, 186 | "metadata": {}, 187 | "outputs": [ 188 | { 189 | "data": { 190 | "text/plain": [ 191 | "(58, 640)" 192 | ] 193 | }, 194 | "execution_count": 14, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "len(urls), len(new_urls)" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 15, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "# Remember that 'urls' is a set, and not a list?\n", 210 | "# Well, there is a useful feature of sets that we will now take advantage of\n", 211 | "# It takes two sets and selects only those items from the first set that are not present in the second one" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 16, 217 | "metadata": { 218 | "scrolled": true 219 | }, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "{'https://i.imgur.com/nMhodgS.gifv',\n", 225 | " 'https://www.reddit.com/r/1200isplenty/',\n", 226 | " 'https://www.reddit.com/r/2007scape/',\n", 227 | " 'https://www.reddit.com/r/49ers/',\n", 228 | " 'https://www.reddit.com/r/90DayFiance/',\n", 229 | " 'https://www.reddit.com/r/ACMilan/',\n", 230 | " 'https://www.reddit.com/r/Adelaide/',\n", 231 | " 'https://www.reddit.com/r/Amd/',\n", 232 | " 'https://www.reddit.com/r/Android/',\n", 233 | " 'https://www.reddit.com/r/Animesuggest/',\n", 234 | " 'https://www.reddit.com/r/AnthemTheGame/',\n", 235 | " 'https://www.reddit.com/r/AskCulinary/',\n", 236 | " 'https://www.reddit.com/r/AskMen/',\n", 237 | " 'https://www.reddit.com/r/AskNYC/',\n", 238 | " 
'https://www.reddit.com/r/AskReddit/',\n", 239 | " 'https://www.reddit.com/r/AskWomen/',\n", 240 | " 'https://www.reddit.com/r/Astros/',\n", 241 | " 'https://www.reddit.com/r/Atlanta/',\n", 242 | " 'https://www.reddit.com/r/AtlantaUnited/',\n", 243 | " 'https://www.reddit.com/r/Augusta/',\n", 244 | " 'https://www.reddit.com/r/Austria/',\n", 245 | " 'https://www.reddit.com/r/Barca/',\n", 246 | " 'https://www.reddit.com/r/BattlefieldV/',\n", 247 | " 'https://www.reddit.com/r/BeautyBoxes/',\n", 248 | " 'https://www.reddit.com/r/BeautyGuruChatter/',\n", 249 | " 'https://www.reddit.com/r/Bend/',\n", 250 | " 'https://www.reddit.com/r/Berserk/',\n", 251 | " 'https://www.reddit.com/r/BigBrother/',\n", 252 | " 'https://www.reddit.com/r/BlackClover/',\n", 253 | " 'https://www.reddit.com/r/Blackops4/',\n", 254 | " 'https://www.reddit.com/r/BoJackHorseman/',\n", 255 | " 'https://www.reddit.com/r/BokuNoHeroAcademia/',\n", 256 | " 'https://www.reddit.com/r/Boruto/',\n", 257 | " 'https://www.reddit.com/r/BostonBruins/',\n", 258 | " 'https://www.reddit.com/r/Boxing/',\n", 259 | " 'https://www.reddit.com/r/Braves/',\n", 260 | " 'https://www.reddit.com/r/BravoRealHousewives/',\n", 261 | " 'https://www.reddit.com/r/Brawlstars/',\n", 262 | " 'https://www.reddit.com/r/Breath_of_the_Wild/',\n", 263 | " 'https://www.reddit.com/r/Brogress/',\n", 264 | " 'https://www.reddit.com/r/Browns/',\n", 265 | " 'https://www.reddit.com/r/C25K/',\n", 266 | " 'https://www.reddit.com/r/CFB/',\n", 267 | " 'https://www.reddit.com/r/CHIBears/',\n", 268 | " 'https://www.reddit.com/r/CHICubs/',\n", 269 | " 'https://www.reddit.com/r/Calgary/',\n", 270 | " 'https://www.reddit.com/r/CampingGear/',\n", 271 | " 'https://www.reddit.com/r/CampingandHiking/',\n", 272 | " 'https://www.reddit.com/r/Cardinals/',\n", 273 | " 'https://www.reddit.com/r/CasualUK/',\n", 274 | " 'https://www.reddit.com/r/Charlotte/',\n", 275 | " 'https://www.reddit.com/r/China/',\n", 276 | " 'https://www.reddit.com/r/ClashOfClans/',\n", 277 | " 'https://www.reddit.com/r/ClashRoyale/',\n", 278 | " 'https://www.reddit.com/r/CoDCompetitive/',\n", 279 | " 'https://www.reddit.com/r/CollegeBasketball/',\n", 280 | " 'https://www.reddit.com/r/Columbus/',\n", 281 | " 'https://www.reddit.com/r/Competitiveoverwatch/',\n", 282 | " 'https://www.reddit.com/r/Cooking/',\n", 283 | " 'https://www.reddit.com/r/Cricket/',\n", 284 | " 'https://www.reddit.com/r/CrohnsDisease/',\n", 285 | " 'https://www.reddit.com/r/CrusaderKings/',\n", 286 | " 'https://www.reddit.com/r/DBZDokkanBattle/',\n", 287 | " 'https://www.reddit.com/r/DMAcademy/',\n", 288 | " 'https://www.reddit.com/r/Dallas/',\n", 289 | " 'https://www.reddit.com/r/DanLeBatardShow/',\n", 290 | " 'https://www.reddit.com/r/DaysGone/',\n", 291 | " 'https://www.reddit.com/r/Denmark/',\n", 292 | " 'https://www.reddit.com/r/Denver/',\n", 293 | " 'https://www.reddit.com/r/Destiny/',\n", 294 | " 'https://www.reddit.com/r/DestinyTheGame/',\n", 295 | " 'https://www.reddit.com/r/Detroit/',\n", 296 | " 'https://www.reddit.com/r/Disneyland/',\n", 297 | " 'https://www.reddit.com/r/DnD/',\n", 298 | " 'https://www.reddit.com/r/Dodgers/',\n", 299 | " 'https://www.reddit.com/r/DotA2/',\n", 300 | " 'https://www.reddit.com/r/DuelLinks/',\n", 301 | " 'https://www.reddit.com/r/DunderMifflin/',\n", 302 | " 'https://www.reddit.com/r/DynastyFF/',\n", 303 | " 'https://www.reddit.com/r/EDH/',\n", 304 | " 'https://www.reddit.com/r/EDanonymemes/',\n", 305 | " 'https://www.reddit.com/r/EOOD/',\n", 306 | " 'https://www.reddit.com/r/EatCheapAndHealthy/',\n", 
307 | " 'https://www.reddit.com/r/Edmonton/',\n", 308 | " 'https://www.reddit.com/r/EliteDangerous/',\n", 309 | " 'https://www.reddit.com/r/EscapefromTarkov/',\n", 310 | " 'https://www.reddit.com/r/Eve/',\n", 311 | " 'https://www.reddit.com/r/FFBraveExvius/',\n", 312 | " 'https://www.reddit.com/r/FIFA/',\n", 313 | " 'https://www.reddit.com/r/FORTnITE/',\n", 314 | " 'https://www.reddit.com/r/FUTMobile/',\n", 315 | " 'https://www.reddit.com/r/Fallout/',\n", 316 | " 'https://www.reddit.com/r/FantasyPL/',\n", 317 | " 'https://www.reddit.com/r/FireEmblemHeroes/',\n", 318 | " 'https://www.reddit.com/r/Fishing/',\n", 319 | " 'https://www.reddit.com/r/Fitness/',\n", 320 | " 'https://www.reddit.com/r/FixedGearBicycle/',\n", 321 | " 'https://www.reddit.com/r/FlashTV/',\n", 322 | " 'https://www.reddit.com/r/FortNiteBR/',\n", 323 | " 'https://www.reddit.com/r/FortniteCompetitive/',\n", 324 | " 'https://www.reddit.com/r/Frugal/',\n", 325 | " 'https://www.reddit.com/r/GameOfThronesMemes/',\n", 326 | " 'https://www.reddit.com/r/Gamingcirclejerk/',\n", 327 | " 'https://www.reddit.com/r/GetMotivated/',\n", 328 | " 'https://www.reddit.com/r/Glitch_in_the_Matrix/',\n", 329 | " 'https://www.reddit.com/r/GlobalOffensive/',\n", 330 | " 'https://www.reddit.com/r/GlobalOffensiveTrade/',\n", 331 | " 'https://www.reddit.com/r/GooglePixel/',\n", 332 | " 'https://www.reddit.com/r/GreenBayPackers/',\n", 333 | " 'https://www.reddit.com/r/Grimdank/',\n", 334 | " 'https://www.reddit.com/r/Guildwars2/',\n", 335 | " 'https://www.reddit.com/r/Gundam/',\n", 336 | " 'https://www.reddit.com/r/HBOGameofThrones/',\n", 337 | " 'https://www.reddit.com/r/Hair/',\n", 338 | " 'https://www.reddit.com/r/HealthyFood/',\n", 339 | " 'https://www.reddit.com/r/HomeImprovement/',\n", 340 | " 'https://www.reddit.com/r/IASIP/',\n", 341 | " 'https://www.reddit.com/r/IAmA/',\n", 342 | " 'https://www.reddit.com/r/IWantOut/',\n", 343 | " 'https://www.reddit.com/r/ImaginaryWesteros/',\n", 344 | " 'https://www.reddit.com/r/Indiemakeupandmore/',\n", 345 | " 'https://www.reddit.com/r/Instagram/',\n", 346 | " 'https://www.reddit.com/r/Israel/',\n", 347 | " 'https://www.reddit.com/r/JapanTravel/',\n", 348 | " 'https://www.reddit.com/r/Jeopardy/',\n", 349 | " 'https://www.reddit.com/r/JoshuaTree/',\n", 350 | " 'https://www.reddit.com/r/Konosuba/',\n", 351 | " 'https://www.reddit.com/r/LearnJapanese/',\n", 352 | " 'https://www.reddit.com/r/LegendsOfTomorrow/',\n", 353 | " 'https://www.reddit.com/r/LifeProTips/',\n", 354 | " 'https://www.reddit.com/r/LigaMX/',\n", 355 | " 'https://www.reddit.com/r/LiverpoolFC/',\n", 356 | " 'https://www.reddit.com/r/LivestreamFail/',\n", 357 | " 'https://www.reddit.com/r/LosAngelesRams/',\n", 358 | " 'https://www.reddit.com/r/LushCosmetics/',\n", 359 | " 'https://www.reddit.com/r/MCFC/',\n", 360 | " 'https://www.reddit.com/r/MLBTheShow/',\n", 361 | " 'https://www.reddit.com/r/MLS/',\n", 362 | " 'https://www.reddit.com/r/MMA/',\n", 363 | " 'https://www.reddit.com/r/MTB/',\n", 364 | " 'https://www.reddit.com/r/MUAontheCheap/',\n", 365 | " 'https://www.reddit.com/r/MagicArena/',\n", 366 | " 'https://www.reddit.com/r/Makeup/',\n", 367 | " 'https://www.reddit.com/r/MakeupAddiction/',\n", 368 | " 'https://www.reddit.com/r/MakingaMurderer/',\n", 369 | " 'https://www.reddit.com/r/Market76/',\n", 370 | " 'https://www.reddit.com/r/MarvelStrikeForce/',\n", 371 | " 'https://www.reddit.com/r/Mavericks/',\n", 372 | " 'https://www.reddit.com/r/Minecraft/',\n", 373 | " 'https://www.reddit.com/r/Minneapolis/',\n", 374 | " 
'https://www.reddit.com/r/MkeBucks/',\n", 375 | " 'https://www.reddit.com/r/ModernMagic/',\n", 376 | " 'https://www.reddit.com/r/MonsterHunterWorld/',\n", 377 | " 'https://www.reddit.com/r/Mordhau/',\n", 378 | " 'https://www.reddit.com/r/MortalKombat/',\n", 379 | " 'https://www.reddit.com/r/MtvChallenge/',\n", 380 | " 'https://www.reddit.com/r/Music/',\n", 381 | " 'https://www.reddit.com/r/NBA2k/',\n", 382 | " 'https://www.reddit.com/r/NBASpurs/',\n", 383 | " 'https://www.reddit.com/r/NFA/',\n", 384 | " 'https://www.reddit.com/r/NHLHUT/',\n", 385 | " 'https://www.reddit.com/r/NYKnicks/',\n", 386 | " 'https://www.reddit.com/r/NYYankees/',\n", 387 | " 'https://www.reddit.com/r/Naruto/',\n", 388 | " 'https://www.reddit.com/r/Nationals/',\n", 389 | " 'https://www.reddit.com/r/Nerf/',\n", 390 | " 'https://www.reddit.com/r/NetflixBestOf/',\n", 391 | " 'https://www.reddit.com/r/NewOrleans/',\n", 392 | " 'https://www.reddit.com/r/NewSkaters/',\n", 393 | " 'https://www.reddit.com/r/NewYorkMets/',\n", 394 | " 'https://www.reddit.com/r/NintendoSwitch/',\n", 395 | " 'https://www.reddit.com/r/NoMansSkyTheGame/',\n", 396 | " 'https://www.reddit.com/r/NoStupidQuestions/',\n", 397 | " 'https://www.reddit.com/r/OnePiece/',\n", 398 | " 'https://www.reddit.com/r/OutOfTheLoop/',\n", 399 | " 'https://www.reddit.com/r/Overwatch/',\n", 400 | " 'https://www.reddit.com/r/PS4/',\n", 401 | " 'https://www.reddit.com/r/PSVR/',\n", 402 | " 'https://www.reddit.com/r/PUBATTLEGROUNDS/',\n", 403 | " 'https://www.reddit.com/r/PUBGMobile/',\n", 404 | " 'https://www.reddit.com/r/Paladins/',\n", 405 | " 'https://www.reddit.com/r/PanPorn/',\n", 406 | " 'https://www.reddit.com/r/PandR/',\n", 407 | " 'https://www.reddit.com/r/Patriots/',\n", 408 | " 'https://www.reddit.com/r/Persona5/',\n", 409 | " 'https://www.reddit.com/r/Philippines/',\n", 410 | " 'https://www.reddit.com/r/Planetside/',\n", 411 | " 'https://www.reddit.com/r/Polska/',\n", 412 | " 'https://www.reddit.com/r/Portland/',\n", 413 | " 'https://www.reddit.com/r/Quebec/',\n", 414 | " 'https://www.reddit.com/r/RWBY/',\n", 415 | " 'https://www.reddit.com/r/Rainbow6/',\n", 416 | " 'https://www.reddit.com/r/RedDeadOnline/',\n", 417 | " 'https://www.reddit.com/r/RedditLaqueristas/',\n", 418 | " 'https://www.reddit.com/r/RepLadiesBST/',\n", 419 | " 'https://www.reddit.com/r/Repsneakers/',\n", 420 | " 'https://www.reddit.com/r/RimWorld/',\n", 421 | " 'https://www.reddit.com/r/RocketLeague/',\n", 422 | " 'https://www.reddit.com/r/RocketLeagueExchange/',\n", 423 | " 'https://www.reddit.com/r/Romania/',\n", 424 | " 'https://www.reddit.com/r/Rowing/',\n", 425 | " 'https://www.reddit.com/r/SFGiants/',\n", 426 | " 'https://www.reddit.com/r/SWGalaxyOfHeroes/',\n", 427 | " 'https://www.reddit.com/r/Sacramento/',\n", 428 | " 'https://www.reddit.com/r/SaltLakeCity/',\n", 429 | " 'https://www.reddit.com/r/SanJoseSharks/',\n", 430 | " 'https://www.reddit.com/r/SantaFe/',\n", 431 | " 'https://www.reddit.com/r/SarahSnark/',\n", 432 | " 'https://www.reddit.com/r/Scotland/',\n", 433 | " 'https://www.reddit.com/r/Scottsdale/',\n", 434 | " 'https://www.reddit.com/r/Seaofthieves/',\n", 435 | " 'https://www.reddit.com/r/Seattle/',\n", 436 | " 'https://www.reddit.com/r/SequelMemes/',\n", 437 | " 'https://www.reddit.com/r/ShingekiNoKyojin/',\n", 438 | " 'https://www.reddit.com/r/Shoestring/',\n", 439 | " 'https://www.reddit.com/r/Showerthoughts/',\n", 440 | " 'https://www.reddit.com/r/Smite/',\n", 441 | " 'https://www.reddit.com/r/Sneakers/',\n", 442 | " 'https://www.reddit.com/r/Spiderman/',\n", 
443 | " 'https://www.reddit.com/r/SpoiledDragRace/',\n", 444 | " 'https://www.reddit.com/r/SquaredCircle/',\n", 445 | " 'https://www.reddit.com/r/StLouis/',\n", 446 | " 'https://www.reddit.com/r/StarVStheForcesofEvil/',\n", 447 | " 'https://www.reddit.com/r/StarWarsBattlefront/',\n", 448 | " 'https://www.reddit.com/r/StardewValley/',\n", 449 | " 'https://www.reddit.com/r/Steam/',\n", 450 | " 'https://www.reddit.com/r/Stellaris/',\n", 451 | " 'https://www.reddit.com/r/StrangerThings/',\n", 452 | " 'https://www.reddit.com/r/Stronglifts5x5/',\n", 453 | " 'https://www.reddit.com/r/Suomi/',\n", 454 | " 'https://www.reddit.com/r/Supplements/',\n", 455 | " 'https://www.reddit.com/r/TeenMomOGandTeenMom2/',\n", 456 | " 'https://www.reddit.com/r/Terraria/',\n", 457 | " 'https://www.reddit.com/r/TheAmazingRace/',\n", 458 | " 'https://www.reddit.com/r/TheBlackList/',\n", 459 | " 'https://www.reddit.com/r/TheDickShow/',\n", 460 | " 'https://www.reddit.com/r/TheHandmaidsTale/',\n", 461 | " 'https://www.reddit.com/r/TheLastAirbender/',\n", 462 | " 'https://www.reddit.com/r/TheSimpsons/',\n", 463 | " 'https://www.reddit.com/r/Tinder/',\n", 464 | " 'https://www.reddit.com/r/Torontobluejays/',\n", 465 | " 'https://www.reddit.com/r/Turkey/',\n", 466 | " 'https://www.reddit.com/r/TurkeyJerky/',\n", 467 | " 'https://www.reddit.com/r/Twitch/',\n", 468 | " 'https://www.reddit.com/r/TwoBestFriendsPlay/',\n", 469 | " 'https://www.reddit.com/r/VictoriaBC/',\n", 470 | " 'https://www.reddit.com/r/WWE/',\n", 471 | " 'https://www.reddit.com/r/WWEGames/',\n", 472 | " 'https://www.reddit.com/r/WaltDisneyWorld/',\n", 473 | " 'https://www.reddit.com/r/Warframe/',\n", 474 | " 'https://www.reddit.com/r/Warhammer40k/',\n", 475 | " 'https://www.reddit.com/r/Warthunder/',\n", 476 | " 'https://www.reddit.com/r/Watches/',\n", 477 | " 'https://www.reddit.com/r/Watchexchange/',\n", 478 | " 'https://www.reddit.com/r/Wellington/',\n", 479 | " 'https://www.reddit.com/r/Wetshaving/',\n", 480 | " 'https://www.reddit.com/r/Windows10/',\n", 481 | " 'https://www.reddit.com/r/Winnipeg/',\n", 482 | " 'https://www.reddit.com/r/WorldOfWarships/',\n", 483 | " 'https://www.reddit.com/r/WorldofTanks/',\n", 484 | " 'https://www.reddit.com/r/Youniqueamua/',\n", 485 | " 'https://www.reddit.com/r/aSongOfMemesAndRage/',\n", 486 | " 'https://www.reddit.com/r/acne/',\n", 487 | " 'https://www.reddit.com/r/adventuretime/',\n", 488 | " 'https://www.reddit.com/r/airsoft/',\n", 489 | " 'https://www.reddit.com/r/amateur_boxing/',\n", 490 | " 'https://www.reddit.com/r/anime/',\n", 491 | " 'https://www.reddit.com/r/anime_irl/',\n", 492 | " 'https://www.reddit.com/r/apple/',\n", 493 | " 'https://www.reddit.com/r/argentina/',\n", 494 | " 'https://www.reddit.com/r/arrow/',\n", 495 | " 'https://www.reddit.com/r/askTO/',\n", 496 | " 'https://www.reddit.com/r/askscience/',\n", 497 | " 'https://www.reddit.com/r/asoiaf/',\n", 498 | " 'https://www.reddit.com/r/australia/',\n", 499 | " 'https://www.reddit.com/r/awardtravel/',\n", 500 | " 'https://www.reddit.com/r/backpacking/',\n", 501 | " 'https://www.reddit.com/r/balisong/',\n", 502 | " 'https://www.reddit.com/r/barstoolsports/',\n", 503 | " 'https://www.reddit.com/r/baseball/',\n", 504 | " 'https://www.reddit.com/r/batman/',\n", 505 | " 'https://www.reddit.com/r/battlestations/',\n", 506 | " 'https://www.reddit.com/r/bayarea/',\n", 507 | " 'https://www.reddit.com/r/beards/',\n", 508 | " 'https://www.reddit.com/r/beauty/',\n", 509 | " 'https://www.reddit.com/r/berkeley/',\n", 510 | " 
'https://www.reddit.com/r/bicycling/',\n", 511 | " 'https://www.reddit.com/r/bikecommuting/',\n", 512 | " 'https://www.reddit.com/r/bikewrench/',\n", 513 | " 'https://www.reddit.com/r/bjj/',\n", 514 | " 'https://www.reddit.com/r/blackmirror/',\n", 515 | " 'https://www.reddit.com/r/bleach/',\n", 516 | " 'https://www.reddit.com/r/boardgames/',\n", 517 | " 'https://www.reddit.com/r/bodybuilding/',\n", 518 | " 'https://www.reddit.com/r/bodyweightfitness/',\n", 519 | " 'https://www.reddit.com/r/books/',\n", 520 | " 'https://www.reddit.com/r/boostedboards/',\n", 521 | " 'https://www.reddit.com/r/bostonceltics/',\n", 522 | " 'https://www.reddit.com/r/brasil/',\n", 523 | " 'https://www.reddit.com/r/brasilivre/',\n", 524 | " 'https://www.reddit.com/r/breakingbad/',\n", 525 | " 'https://www.reddit.com/r/brisbane/',\n", 526 | " 'https://www.reddit.com/r/brooklynninenine/',\n", 527 | " 'https://www.reddit.com/r/buildapc/',\n", 528 | " 'https://www.reddit.com/r/camping/',\n", 529 | " 'https://www.reddit.com/r/canada/',\n", 530 | " 'https://www.reddit.com/r/canucks/',\n", 531 | " 'https://www.reddit.com/r/cars/',\n", 532 | " 'https://www.reddit.com/r/chelseafc/',\n", 533 | " 'https://www.reddit.com/r/chile/',\n", 534 | " 'https://www.reddit.com/r/cirkeltrek/',\n", 535 | " 'https://www.reddit.com/r/classicwow/',\n", 536 | " 'https://www.reddit.com/r/climbing/',\n", 537 | " 'https://www.reddit.com/r/community/',\n", 538 | " 'https://www.reddit.com/r/confession/',\n", 539 | " 'https://www.reddit.com/r/cordcutters/',\n", 540 | " 'https://www.reddit.com/r/cowboys/',\n", 541 | " 'https://www.reddit.com/r/coys/',\n", 542 | " 'https://www.reddit.com/r/criterion/',\n", 543 | " 'https://www.reddit.com/r/croatia/',\n", 544 | " 'https://www.reddit.com/r/crossfit/',\n", 545 | " 'https://www.reddit.com/r/cscareerquestions/',\n", 546 | " 'https://www.reddit.com/r/curlyhair/',\n", 547 | " 'https://www.reddit.com/r/cycling/',\n", 548 | " 'https://www.reddit.com/r/danganronpa/',\n", 549 | " 'https://www.reddit.com/r/dataisbeautiful/',\n", 550 | " 'https://www.reddit.com/r/dataisbeautiful/?f=flair_name%3A%22OC%22',\n", 551 | " 'https://www.reddit.com/r/dataisbeautiful/comments/g0o65a/oc_a_full_year_of_income_and_expenses_through_my/',\n", 552 | " 'https://www.reddit.com/r/dauntless/',\n", 553 | " 'https://www.reddit.com/r/dbz/',\n", 554 | " 'https://www.reddit.com/r/de/',\n", 555 | " 'https://www.reddit.com/r/deadbydaylight/',\n", 556 | " 'https://www.reddit.com/r/denvernuggets/',\n", 557 | " 'https://www.reddit.com/r/destiny2/',\n", 558 | " 'https://www.reddit.com/r/detroitlions/',\n", 559 | " 'https://www.reddit.com/r/diabetes/',\n", 560 | " 'https://www.reddit.com/r/diabetes_t1/',\n", 561 | " 'https://www.reddit.com/r/discgolf/',\n", 562 | " 'https://www.reddit.com/r/discordapp/',\n", 563 | " 'https://www.reddit.com/r/disney/',\n", 564 | " 'https://www.reddit.com/r/dndmemes/',\n", 565 | " 'https://www.reddit.com/r/dndnext/',\n", 566 | " 'https://www.reddit.com/r/doctorwho/',\n", 567 | " 'https://www.reddit.com/r/dubai/',\n", 568 | " 'https://www.reddit.com/r/eagles/',\n", 569 | " 'https://www.reddit.com/r/ehlersdanlos/',\n", 570 | " 'https://www.reddit.com/r/elderscrollsonline/',\n", 571 | " 'https://www.reddit.com/r/eu4/',\n", 572 | " 'https://www.reddit.com/r/europe/',\n", 573 | " 'https://www.reddit.com/r/explainlikeimfive/',\n", 574 | " 'https://www.reddit.com/r/fairytail/',\n", 575 | " 'https://www.reddit.com/r/fantasybaseball/',\n", 576 | " 'https://www.reddit.com/r/fantasyfootball/',\n", 577 | " 
'https://www.reddit.com/r/fasting/',\n", 578 | " 'https://www.reddit.com/r/femalefashionadvice/',\n", 579 | " 'https://www.reddit.com/r/femalehairadvice/',\n", 580 | " 'https://www.reddit.com/r/ffxiv/',\n", 581 | " 'https://www.reddit.com/r/findfashion/',\n", 582 | " 'https://www.reddit.com/r/fireemblem/',\n", 583 | " 'https://www.reddit.com/r/fivenightsatfreddys/',\n", 584 | " 'https://www.reddit.com/r/flexibility/',\n", 585 | " 'https://www.reddit.com/r/flightsim/',\n", 586 | " 'https://www.reddit.com/r/flyfishing/',\n", 587 | " 'https://www.reddit.com/r/fo76/',\n", 588 | " 'https://www.reddit.com/r/footballmanagergames/',\n", 589 | " 'https://www.reddit.com/r/forhonor/',\n", 590 | " 'https://www.reddit.com/r/formula1/',\n", 591 | " 'https://www.reddit.com/r/fragrance/',\n", 592 | " 'https://www.reddit.com/r/france/',\n", 593 | " 'https://www.reddit.com/r/freefolk/',\n", 594 | " 'https://www.reddit.com/r/frugalmalefashion/',\n", 595 | " 'https://www.reddit.com/r/futurama/',\n", 596 | " 'https://www.reddit.com/r/future_fight/',\n", 597 | " 'https://www.reddit.com/r/gainit/',\n", 598 | " 'https://www.reddit.com/r/gameofthrones/',\n", 599 | " 'https://www.reddit.com/r/germany/',\n", 600 | " 'https://www.reddit.com/r/gifs/',\n", 601 | " 'https://www.reddit.com/r/gifs/comments/g0tzwn/disney_tried_editing_out_darryl_hannahs_butt_by/',\n", 602 | " 'https://www.reddit.com/r/girlsfrontline/',\n", 603 | " 'https://www.reddit.com/r/golf/',\n", 604 | " 'https://www.reddit.com/r/goodyearwelt/',\n", 605 | " 'https://www.reddit.com/r/grandorder/',\n", 606 | " 'https://www.reddit.com/r/greece/',\n", 607 | " 'https://www.reddit.com/r/greysanatomy/',\n", 608 | " 'https://www.reddit.com/r/gtaonline/',\n", 609 | " 'https://www.reddit.com/r/halifax/',\n", 610 | " 'https://www.reddit.com/r/halo/',\n", 611 | " 'https://www.reddit.com/r/headphones/',\n", 612 | " 'https://www.reddit.com/r/hearthstone/',\n", 613 | " 'https://www.reddit.com/r/heroesofthestorm/',\n", 614 | " 'https://www.reddit.com/r/hiking/',\n", 615 | " 'https://www.reddit.com/r/hockey/',\n", 616 | " 'https://www.reddit.com/r/hockeyjerseys/',\n", 617 | " 'https://www.reddit.com/r/hockeyplayers/',\n", 618 | " 'https://www.reddit.com/r/houston/',\n", 619 | " 'https://www.reddit.com/r/howardstern/',\n", 620 | " 'https://www.reddit.com/r/hungary/',\n", 621 | " 'https://www.reddit.com/r/india/',\n", 622 | " 'https://www.reddit.com/r/indonesia/',\n", 623 | " 'https://www.reddit.com/r/intermittentfasting/',\n", 624 | " 'https://www.reddit.com/r/iphone/',\n", 625 | " 'https://www.reddit.com/r/ireland/',\n", 626 | " 'https://www.reddit.com/r/italy/',\n", 627 | " 'https://www.reddit.com/r/jailbreak/',\n", 628 | " 'https://www.reddit.com/r/japanesestreetwear/',\n", 629 | " 'https://www.reddit.com/r/japanlife/',\n", 630 | " 'https://www.reddit.com/r/jobs/',\n", 631 | " 'https://www.reddit.com/r/kansascity/',\n", 632 | " 'https://www.reddit.com/r/keto/',\n", 633 | " 'https://www.reddit.com/r/korea/',\n", 634 | " 'https://www.reddit.com/r/lakers/',\n", 635 | " 'https://www.reddit.com/r/leafs/',\n", 636 | " 'https://www.reddit.com/r/leagueoflegends/',\n", 637 | " 'https://www.reddit.com/r/leangains/',\n", 638 | " 'https://www.reddit.com/r/learnprogramming/',\n", 639 | " 'https://www.reddit.com/r/learnpython/',\n", 640 | " 'https://www.reddit.com/r/legaladvice/',\n", 641 | " 'https://www.reddit.com/r/longboarding/',\n", 642 | " 'https://www.reddit.com/r/loseit/',\n", 643 | " 'https://www.reddit.com/r/lucifer/',\n", 644 | " 
'https://www.reddit.com/r/makeupexchange/',\n", 645 | " 'https://www.reddit.com/r/malaysia/',\n", 646 | " 'https://www.reddit.com/r/malefashion/',\n", 647 | " 'https://www.reddit.com/r/malefashionadvice/',\n", 648 | " 'https://www.reddit.com/r/malehairadvice/',\n", 649 | " 'https://www.reddit.com/r/malelivingspace/',\n", 650 | " 'https://www.reddit.com/r/marvelmemes/',\n", 651 | " 'https://www.reddit.com/r/marvelstudios/',\n", 652 | " 'https://www.reddit.com/r/medical_advice/',\n", 653 | " 'https://www.reddit.com/r/melbourne/',\n", 654 | " 'https://www.reddit.com/r/memes/',\n", 655 | " 'https://www.reddit.com/r/mexico/',\n", 656 | " 'https://www.reddit.com/r/migraine/',\n", 657 | " 'https://www.reddit.com/r/minnesotatwins/',\n", 658 | " 'https://www.reddit.com/r/minnesotavikings/',\n", 659 | " 'https://www.reddit.com/r/mw4/',\n", 660 | " 'https://www.reddit.com/r/mylittlepony/',\n", 661 | " 'https://www.reddit.com/r/nashville/',\n", 662 | " 'https://www.reddit.com/r/nattyorjuice/',\n", 663 | " 'https://www.reddit.com/r/nba/',\n", 664 | " 'https://www.reddit.com/r/nbadiscussion/',\n", 665 | " 'https://www.reddit.com/r/netflix/',\n", 666 | " 'https://www.reddit.com/r/newsokur/',\n", 667 | " 'https://www.reddit.com/r/newzealand/',\n", 668 | " 'https://www.reddit.com/r/nfl/',\n", 669 | " 'https://www.reddit.com/r/nhl/',\n", 670 | " 'https://www.reddit.com/r/norge/',\n", 671 | " 'https://www.reddit.com/r/nosleep/',\n", 672 | " 'https://www.reddit.com/r/nova/',\n", 673 | " 'https://www.reddit.com/r/nrl/',\n", 674 | " 'https://www.reddit.com/r/nutrition/',\n", 675 | " 'https://www.reddit.com/r/nvidia/',\n", 676 | " 'https://www.reddit.com/r/nyjets/',\n", 677 | " 'https://www.reddit.com/r/omad/',\n", 678 | " 'https://www.reddit.com/r/orangecounty/',\n", 679 | " 'https://www.reddit.com/r/orangetheory/',\n", 680 | " 'https://www.reddit.com/r/osugame/',\n", 681 | " 'https://www.reddit.com/r/ottawa/',\n", 682 | " 'https://www.reddit.com/r/overlord/',\n", 683 | " 'https://www.reddit.com/r/pathofexile/',\n", 684 | " 'https://www.reddit.com/r/pcmasterrace/',\n", 685 | " 'https://www.reddit.com/r/peloton/',\n", 686 | " 'https://www.reddit.com/r/pesmobile/',\n", 687 | " 'https://www.reddit.com/r/philadelphia/',\n", 688 | " 'https://www.reddit.com/r/phillies/',\n", 689 | " 'https://www.reddit.com/r/phoenix/',\n", 690 | " 'https://www.reddit.com/r/pics/',\n", 691 | " 'https://www.reddit.com/r/piercing/',\n", 692 | " 'https://www.reddit.com/r/pittsburgh/',\n", 693 | " 'https://www.reddit.com/r/playrust/',\n", 694 | " 'https://www.reddit.com/r/podemos/',\n", 695 | " 'https://www.reddit.com/r/pokemon/',\n", 696 | " 'https://www.reddit.com/r/pokemongo/',\n", 697 | " 'https://www.reddit.com/r/pokemontrades/',\n", 698 | " 'https://www.reddit.com/r/portugal/',\n", 699 | " 'https://www.reddit.com/r/poshmark/',\n", 700 | " 'https://www.reddit.com/r/powerlifting/',\n", 701 | " 'https://www.reddit.com/r/progresspics/',\n", 702 | " 'https://www.reddit.com/r/raleigh/',\n", 703 | " 'https://www.reddit.com/r/ravens/',\n", 704 | " 'https://www.reddit.com/r/rawdenim/',\n", 705 | " 'https://www.reddit.com/r/realmadrid/',\n", 706 | " 'https://www.reddit.com/r/reddeadredemption/',\n", 707 | " 'https://www.reddit.com/r/reddevils/',\n", 708 | " 'https://www.reddit.com/r/redsox/',\n", 709 | " 'https://www.reddit.com/r/relationship_advice/',\n", 710 | " 'https://www.reddit.com/r/rickandmorty/',\n", 711 | " 'https://www.reddit.com/r/ripcity/',\n", 712 | " 'https://www.reddit.com/r/riverdale/',\n", 713 | " 
'https://www.reddit.com/r/roadtrip/',\n", 714 | " 'https://www.reddit.com/r/rolex/',\n", 715 | " 'https://www.reddit.com/r/rollercoasters/',\n", 716 | " 'https://www.reddit.com/r/rpdrcringe/',\n", 717 | " 'https://www.reddit.com/r/rugbyunion/',\n", 718 | " 'https://www.reddit.com/r/runescape/',\n", 719 | " 'https://www.reddit.com/r/running/',\n", 720 | " 'https://www.reddit.com/r/rupaulsdragrace/',\n", 721 | " 'https://www.reddit.com/r/rva/',\n", 722 | " 'https://www.reddit.com/r/sanantonio/',\n", 723 | " 'https://www.reddit.com/r/sandiego/',\n", 724 | " 'https://www.reddit.com/r/sanfrancisco/',\n", 725 | " 'https://www.reddit.com/r/saskatoon/',\n", 726 | " 'https://www.reddit.com/r/scifi/',\n", 727 | " 'https://www.reddit.com/r/seinfeld/',\n", 728 | " 'https://www.reddit.com/r/serbia/',\n", 729 | " 'https://www.reddit.com/r/shield/',\n", 730 | " 'https://www.reddit.com/r/singapore/',\n", 731 | " 'https://www.reddit.com/r/sixers/',\n", 732 | " 'https://www.reddit.com/r/skiing/',\n", 733 | " 'https://www.reddit.com/r/skyrim/',\n", 734 | " 'https://www.reddit.com/r/smashbros/',\n", 735 | " 'https://www.reddit.com/r/sneakermarket/',\n", 736 | " 'https://www.reddit.com/r/snowboarding/',\n", 737 | " 'https://www.reddit.com/r/soccer/',\n", 738 | " 'https://www.reddit.com/r/solotravel/',\n", 739 | " 'https://www.reddit.com/r/southpark/',\n", 740 | " 'https://www.reddit.com/r/sports/',\n", 741 | " 'https://www.reddit.com/r/sportsbook/',\n", 742 | " 'https://www.reddit.com/r/starbucks/',\n", 743 | " 'https://www.reddit.com/r/starcitizen/',\n", 744 | " 'https://www.reddit.com/r/startrek/',\n", 745 | " 'https://www.reddit.com/r/steelers/',\n", 746 | " 'https://www.reddit.com/r/stevenuniverse/',\n", 747 | " 'https://www.reddit.com/r/stlouisblues/',\n", 748 | " 'https://www.reddit.com/r/streetwearstartup/',\n", 749 | " 'https://www.reddit.com/r/summonerswar/',\n", 750 | " 'https://www.reddit.com/r/suns/',\n", 751 | " 'https://www.reddit.com/r/survivor/',\n", 752 | " 'https://www.reddit.com/r/sweden/',\n", 753 | " 'https://www.reddit.com/r/swoleacceptance/',\n", 754 | " 'https://www.reddit.com/r/sydney/',\n", 755 | " 'https://www.reddit.com/r/sysadmin/',\n", 756 | " 'https://www.reddit.com/r/tampabayrays/',\n", 757 | " 'https://www.reddit.com/r/tattoos/',\n", 758 | " 'https://www.reddit.com/r/techsupport/',\n", 759 | " 'https://www.reddit.com/r/tennis/',\n", 760 | " 'https://www.reddit.com/r/tf2/',\n", 761 | " 'https://www.reddit.com/r/the100/',\n", 762 | " 'https://www.reddit.com/r/thebachelor/',\n", 763 | " 'https://www.reddit.com/r/thedivision/',\n", 764 | " 'https://www.reddit.com/r/thenetherlands/',\n", 765 | " 'https://www.reddit.com/r/thesims/',\n", 766 | " 'https://www.reddit.com/r/thesopranos/',\n", 767 | " 'https://www.reddit.com/r/thewalkingdead/',\n", 768 | " 'https://www.reddit.com/r/tipofmytongue/',\n", 769 | " 'https://www.reddit.com/r/titanfolk/',\n", 770 | " 'https://www.reddit.com/r/todayilearned/',\n", 771 | " 'https://www.reddit.com/r/torontoraptors/',\n", 772 | " 'https://www.reddit.com/r/totalwar/',\n", 773 | " 'https://www.reddit.com/r/touhou/',\n", 774 | " 'https://www.reddit.com/r/trailerparkboys/',\n", 775 | " 'https://www.reddit.com/r/translator/',\n", 776 | " 'https://www.reddit.com/r/travel/',\n", 777 | " 'https://www.reddit.com/r/vagabond/',\n", 778 | " 'https://www.reddit.com/r/vancouver/',\n", 779 | " 'https://www.reddit.com/r/vanderpumprules/',\n", 780 | " 'https://www.reddit.com/r/vegan/',\n", 781 | " 'https://www.reddit.com/r/videos/',\n", 782 | " 
'https://www.reddit.com/r/vzla/',\n", 783 | " 'https://www.reddit.com/r/warriors/',\n", 784 | " 'https://www.reddit.com/r/weightroom/',\n", 785 | " 'https://www.reddit.com/r/westworld/',\n", 786 | " 'https://www.reddit.com/r/wicked_edge/',\n", 787 | " 'https://www.reddit.com/r/worldnews/',\n", 788 | " 'https://www.reddit.com/r/wow/',\n", 789 | " 'https://www.reddit.com/r/xboxone/',\n", 790 | " 'https://www.reddit.com/r/xxfitness/',\n", 791 | " 'https://www.reddit.com/r/yeezys/',\n", 792 | " 'https://www.reddit.com/r/yoga/',\n", 793 | " 'https://www.reddit.com/r/yugioh/',\n", 794 | " 'https://www.reddit.com/r/zerocarb/',\n", 795 | " 'https://www.reddit.com/search?q=dune&source=trending',\n", 796 | " 'https://www.reddit.com/search?q=fauci&source=trending',\n", 797 | " 'https://www.reddit.com/search?q=kyle%20larson&source=trending',\n", 798 | " 'https://www.reddit.com/search?q=nascar&source=trending',\n", 799 | " 'https://www.reddit.com/search?q=rick%20may&source=trending',\n", 800 | " 'https://www.reddit.com/search?q=tornado&source=trending',\n", 801 | " 'https://www.reddit.com/subreddits/leaderboard/up-and-coming',\n", 802 | " 'https://www.reddit.com/user/ItsBOOM/',\n", 803 | " 'https://www.reddit.com/user/SPM8/',\n", 804 | " 'https://www.reddit.com/user/con_commenter/',\n", 805 | " 'https://www.reddit.com/user/jesq/',\n", 806 | " 'https://www.reddit.com/user/memezzer/',\n", 807 | " 'https://www.reddit.com/user/mtlgrems/',\n", 808 | " 'https://www.reddit.com/user/notsure500/',\n", 809 | " 'https://www.reddit.com/user/skinkbaa/',\n", 810 | " 'https://www.reddit.com/user/steven5it/'}" 811 | ] 812 | }, 813 | "execution_count": 16, 814 | "metadata": {}, 815 | "output_type": "execute_result" 816 | } 817 | ], 818 | "source": [ 819 | "# Take only the new items in the first set\n", 820 | "new_urls.difference(urls)" 821 | ] 822 | }, 823 | { 824 | "cell_type": "code", 825 | "execution_count": 17, 826 | "metadata": {}, 827 | "outputs": [ 828 | { 829 | "data": { 830 | "text/plain": [ 831 | "" 832 | ] 833 | }, 834 | "execution_count": 17, 835 | "metadata": {}, 836 | "output_type": "execute_result" 837 | } 838 | ], 839 | "source": [ 840 | "# Finally, close the session\n", 841 | "session.close()" 842 | ] 843 | }, 844 | { 845 | "cell_type": "code", 846 | "execution_count": 18, 847 | "metadata": {}, 848 | "outputs": [ 849 | { 850 | "name": "stdout", 851 | "output_type": "stream", 852 | "text": [ 853 | "Reloads the response in Chromium, and replaces HTML content\n", 854 | " with an updated version, with JavaScript executed.\n", 855 | "\n", 856 | " :param retries: The number of times to retry loading the page in Chromium.\n", 857 | " :param script: JavaScript to execute upon page load (optional).\n", 858 | " :param wait: The number of seconds to wait before loading the page, preventing timeouts (optional).\n", 859 | " :param scrolldown: Integer, if provided, of how many times to page down.\n", 860 | " :param sleep: Integer, if provided, of how many long to sleep after initial render.\n", 861 | " :param reload: If ``False``, content will not be loaded from the browser, but will be provided from memory.\n", 862 | " :param keep_page: If ``True`` will allow you to interact with the browser page through ``r.html.page``.\n", 863 | "\n", 864 | " If ``scrolldown`` is specified, the page will scrolldown the specified\n", 865 | " number of times, after sleeping the specified amount of time\n", 866 | " (e.g. 
``scrolldown=10, sleep=1``.\n", 867 | "\n", 868 | " If just ``sleep`` is provided, the rendering will wait *n* seconds, before\n", 869 | " returning.\n", 870 | "\n", 871 | " If ``script`` is specified, it will execute the provided JavaScript at\n", 872 | " runtime. Example:\n", 873 | "\n", 874 | " .. code-block:: python\n", 875 | "\n", 876 | " script = \"\"\"\n", 877 | " () => {\n", 878 | " return {\n", 879 | " width: document.documentElement.clientWidth,\n", 880 | " height: document.documentElement.clientHeight,\n", 881 | " deviceScaleFactor: window.devicePixelRatio,\n", 882 | " }\n", 883 | " }\n", 884 | " \"\"\"\n", 885 | "\n", 886 | " Returns the return value of the executed ``script``, if any is provided:\n", 887 | "\n", 888 | " .. code-block:: python\n", 889 | "\n", 890 | " >>> r.html.render(script=script)\n", 891 | " {'width': 800, 'height': 600, 'deviceScaleFactor': 1}\n", 892 | "\n", 893 | " Warning: the first time you run this method, it will download\n", 894 | " Chromium into your home directory (``~/.pyppeteer``).\n", 895 | " \n" 896 | ] 897 | } 898 | ], 899 | "source": [ 900 | "# You can check the documentation directly inside Jupyter\n", 901 | "print(r.html.render.__doc__)" 902 | ] 903 | }, 904 | { 905 | "cell_type": "code", 906 | "execution_count": null, 907 | "metadata": {}, 908 | "outputs": [], 909 | "source": [] 910 | } 911 | ], 912 | "metadata": { 913 | "kernelspec": { 914 | "display_name": "Python 3", 915 | "language": "python", 916 | "name": "python3" 917 | }, 918 | "language_info": { 919 | "codemirror_mode": { 920 | "name": "ipython", 921 | "version": 3 922 | }, 923 | "file_extension": ".py", 924 | "mimetype": "text/x-python", 925 | "name": "python", 926 | "nbconvert_exporter": "python", 927 | "pygments_lexer": "ipython3", 928 | "version": "3.7.3" 929 | } 930 | }, 931 | "nbformat": 4, 932 | "nbformat_minor": 2 933 | } 934 | -------------------------------------------------------------------------------- /11.Scraping JavaScript - SoundCloud Project/Section 10 - Scraping SoundCloud - Setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Scraping SoundCloud" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Initial Setup" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "# import packages\n", 24 | "import requests\n", 25 | "from bs4 import BeautifulSoup" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "from requests_html import AsyncHTMLSession" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## Connect to SoundCloud" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "# make connection to webpage\n", 58 | "resp = requests.get(\"https://soundcloud.com/discover\")" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "# get HTML from response object\n", 68 | "html = resp.content" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 
| "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "# convert HTML to BeautifulSoup object\n", 78 | "soup = BeautifulSoup(html, \"lxml\")" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "## Get links on the webpage. Notice how this doesn't extract all the links visible on the webpage...what can we do about that?" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "soup.find_all(\"a\")" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "## 1) Use requests-html to extract other links on the page by executing JavaScript. How many links do you see now?\n", 116 | "## 2) After you complete 1), get the text of the new paragraphs now visible in the HTML.\n", 117 | "## 3) Try out a few other tags - what else appears after executing the JavaScript?\n", 118 | "## 4) Using a CSS selector, extract the meta tag with name = \"keywords\". Can you get this tag's attributes?\n", 119 | "## 5) Links that automatically open to a new a tab are identified by having a \"target\" attribute equal to \"_blank\". Try extracting these links and their URLs." 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [] 156 | } 157 | ], 158 | "metadata": { 159 | "kernelspec": { 160 | "display_name": "Python 3", 161 | "language": "python", 162 | "name": "python3" 163 | }, 164 | "language_info": { 165 | "codemirror_mode": { 166 | "name": "ipython", 167 | "version": 3 168 | }, 169 | "file_extension": ".py", 170 | "mimetype": "text/x-python", 171 | "name": "python", 172 | "nbconvert_exporter": "python", 173 | "pygments_lexer": "ipython3", 174 | "version": "3.7.3" 175 | } 176 | }, 177 | "nbformat": 4, 178 | "nbformat_minor": 2 179 | } 180 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Phone Thiri Yadana 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions 
of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Web Scraping and API in Python 2 | 3 | Python project for integrations with different APIs and for web scraping with the Beautiful Soup and requests-HTML libraries, covering multiple scraping projects such as YouTube, the dynamically generated JavaScript pages of SoundCloud, and many more. 4 | 5 | ## Built With 6 | * [Python 3](https://www.python.org/) 7 | * [requests](https://requests.readthedocs.io/en/master/) - Requests is an elegant and simple HTTP library for Python. 8 | * [pandas](https://pandas.pydata.org/) - fast, powerful, flexible and easy to use open source data analysis and manipulation tool 9 | * [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) - library for pulling data out of HTML and XML files (screen scraping) 10 | * [requests-HTML](https://requests.readthedocs.io/projects/requests-html/en/latest/) - makes parsing HTML as simple and intuitive as possible, with full JavaScript support 11 | * [python html.parser](https://docs.python.org/3/library/html.parser.html) - HTML parser from the Python standard library 12 | * [lxml parser](https://lxml.de/parsing.html) - fast, feature-rich XML and HTML parser 13 | * [html5lib parser](https://github.com/html5lib/html5lib-python) - pure-Python HTML parser that parses pages the way modern web browsers do 14 | * [urllib](https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse) - URL handling module 15 | 16 | ## API Projects 17 | * [Currency Exchange Rate API](https://exchangeratesapi.io/) 18 | * [iTunes API](https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/iTuneSearchAPI/Searching.html#//apple_ref/doc/uid/TP40017632-CH5-SW1) 19 | * [GitHub Jobs API](https://jobs.github.com/api) 20 | * [Official Joke API](https://github.com/15Dkatz/official_joke_api) 21 | * [Joke API](https://sv443.net/jokeapi) 22 | 23 | ## Web Scraping Projects 24 | * [Rotten Tomatoes](https://www.rottentomatoes.com/) 25 | * [Steam](https://store.steampowered.com/games/) 26 | * [YouTube](https://www.youtube.com/) 27 | * [SoundCloud](https://soundcloud.com/) 28 | 29 | ## License 30 | 31 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 32 | 33 | ## References 34 | 35 | * The challenges are part of the [Web Scraping and API Fundamentals in Python course](https://365datascience.com/courses/web-scraping-and-api-fundamentals-in-python/) by 365 Data Science. 36 | --------------------------------------------------------------------------------
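The `render()` docstring captured in the notebook above documents the `scrolldown`, `sleep`, and `script` parameters, but not how they combine in a standalone script. Below is a minimal sketch of that usage, assuming only that requests-HTML is installed; the URL is just an illustrative JavaScript-heavy page from this repo's projects, and the first call downloads Chromium into `~/.pyppeteer`, as the docstring warns. The synchronous `HTMLSession` shown here works in a plain script; inside Jupyter, which already runs an event loop, the course notebooks import `AsyncHTMLSession` instead.

```python
# Minimal sketch: HTMLSession.render() with scrolldown, sleep, and script.
# Assumes requests-html is installed; the first run downloads Chromium.
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://soundcloud.com/discover")  # illustrative JS-heavy page

# JavaScript evaluated inside the headless browser; its return value is
# handed back to Python (here, a dict describing the viewport).
script = """
() => {
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight,
        deviceScaleFactor: window.devicePixelRatio,
    }
}
"""

# scrolldown/sleep give lazily loaded content time to appear before
# the rendered DOM is captured.
result = r.html.render(script=script, scrolldown=10, sleep=1)
print(result)  # e.g. {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
```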
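For the numbered SoundCloud exercises in the setup notebook, a hedged solution sketch follows. It uses `AsyncHTMLSession` (already imported in that notebook) because Jupyter's running event loop blocks the synchronous session; the CSS selectors come straight from the exercise text, while the printed counts depend on whatever the live page returns and are not verified output.

```python
# Sketch of the exercise flow: render the page's JavaScript, then re-query the DOM.
from requests_html import AsyncHTMLSession

session = AsyncHTMLSession()

async def get_rendered_page():
    r = await session.get("https://soundcloud.com/discover")
    await r.html.arender(sleep=2)  # execute the JavaScript before parsing
    return r

r = session.run(get_rendered_page)[0]

# 1) Far more <a> tags should be present than plain requests + BeautifulSoup saw
print(len(r.html.find("a")))

# 2) Paragraph text that only exists in the rendered DOM
print([p.text for p in r.html.find("p")][:5])

# 4) CSS selector for the keywords meta tag; .attrs exposes its attributes
meta = r.html.find('meta[name="keywords"]', first=True)
print(meta.attrs if meta is not None else "no keywords meta tag found")

# 5) Links that open in a new tab carry target="_blank"
for link in r.html.find('a[target="_blank"]'):
    print(link.attrs.get("href"))
```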