├── README.md
├── data-directory
│   ├── books_data.csv
│   ├── cars.csv
│   ├── movies.csv
│   ├── real_estate.csv
│   ├── salary_data.csv
│   └── transfers_data.csv
└── notebooks
    ├── Amazon.ipynb
    ├── Carvago.ipynb
    ├── Cineb_movies.ipynb
    ├── Real estate.ipynb
    ├── Salaries.ipynb
    ├── Transfermarkt.ipynb
    └── Understat.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Web Scraping Projects With Python
2 |
3 | This repository contains a collection of Python web scraping projects, several of which also include crawling and exploratory data visualisation.
4 |
5 | ## Contents
6 |
7 | Table of Contents
8 |
9 | * About the project
10 | * Prerequisites
11 | * Folder Structure
12 | * Projects
13 |     * Scraping salary data from Salary.com
14 |     * Scraping car data and crawling to specific URLs
15 |     * Scraping transfers data
16 |     * Scraping different types of football data from Understat.com
17 |     * Scraping movie data from Cineb.com
18 |     * Scraping real-estate data and crawling to apartment pages
19 |     * Scraping Amazon data by keyword search
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 | ## About the Project
30 | This repository is a collection of web scraping projects. I scraped a range of websites in order to deal with different page structures and collect different kinds of data (cars, salaries, sports...). Some of these projects also use crawling techniques and include exploratory data visualization. Note that websites change over time, so the approach I use to scrape a specific website today may stop working in the future.
31 |
32 | I recommend starting with the notebook that scrapes movie data from Cineb.com since it provides an understanding of how the scraping is done.
33 |
34 |
35 | ## Prerequisites
36 |
37 |
38 | [Python](https://www.python.org/)
39 | [Jupyter](https://jupyter.org/try)
40 |
41 |
42 |
43 |
44 | The following packages are used in this project (`csv` and `json` are part of the Python standard library):
45 |
46 | * Pandas
47 |
48 | * Matplotlib
49 | * bs4
50 | * requests
51 | * csv
52 | * json
53 |
54 |
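As a rough illustration of how the two standard-library modules in this list fit into a scraping workflow, the sketch below cleans a price string, writes records to CSV text with `csv`, and serialises the same records with `json`. The sample records mirror values shown in the Carvago notebook output; the helper names `price_to_int` and `records_to_csv` are hypothetical and do not appear in the notebooks.

```python
import csv
import io
import json

# Sample records, shaped like rows a scraper might collect (values taken
# from the Carvago notebook output; the dict layout is an assumption).
records = [
    {"make": "Seat", "model": "Leon", "price": "3 049 €"},
    {"make": "Opel", "model": "Crossland X", "price": "15 899 €"},
]


def price_to_int(text):
    """Keep only the digits of a price string such as '15 899 €'."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def records_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Replace the raw price text with an integer before saving.
cleaned = [dict(row, price=price_to_int(row["price"])) for row in records]

csv_text = records_to_csv(cleaned, ["make", "model", "price"])
json_text = json.dumps(cleaned, ensure_ascii=False)

print(csv_text)
print(json_text)
```

Storing the price as an integer rather than as the raw string should make later sorting and plotting with Pandas simpler.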
55 |
56 | ## Folder structure
57 |
58 |     |-- web-scraping-projects
59 |         |-- README.md
60 |         |-- data-directory
61 |         |   |-- books_data.csv
62 |         |   |-- cars.csv
63 |         |   |-- movies.csv
64 |         |   |-- real_estate.csv
65 |         |   |-- salary_data.csv
66 |         |   |-- transfers_data.csv
67 |         |-- notebooks
68 |             |-- Amazon.ipynb
69 |             |-- Carvago.ipynb
70 |             |-- Cineb_movies.ipynb
71 |             |-- Real estate.ipynb
72 |             |-- Salaries.ipynb
73 |             |-- Transfermarkt.ipynb
74 |             |-- Understat.ipynb
75 |             |-- .ipynb_checkpoints
76 |
77 |
78 |
79 |
80 |
81 |
--------------------------------------------------------------------------------
/data-directory/salary_data.csv:
--------------------------------------------------------------------------------
1 | Title,Description,link,Percentile10,Percentile25,Percentile50,Percentile75,Percentile90
2 | Project Manager - Construction,"Project Manager - Construction oversees and directs all phases of a construction project. Designs and implements project plans. Being a Project Manager - Construction communicates directly with contractors/designers concerning project cost, staffing, and scheduling. Prepares project status reports and works to ensure plans adhere to contract specifications. Additionally, Project Manager - Construction requires a bachelor's degree of engineering. Typically reports to a manager. The Project Manager - Construction manages subordinate staff in the day-to-day performance of their jobs. True first level manager. Ensures that project/department milestones/goals are met and adhering to approved budgets. Has full authority for personnel actions. To be a Project Manager - Construction typically requires 5 years experience in the related area as an individual contributor. 1 - 3 years supervisory experience may be required. Extensive knowledge of the function and department processes.",https://www.salary.com/tools/salary-calculator/project-manager-construction,"$84,577","$97,670","$112,050","$126,274","$139,224"
3 | Project Accounting Manager,"Project Accounting Manager manages a team of project accountants that perform accounting and financial activities to measure and monitor project financial performance. Oversees the creation and maintenance of project level accounts. Being a Project Accounting Manager ensures that all postings, allocations, accruals and payments for the project are completed according to schedule. Reviews monthly project status reports prior to project reviews. Additionally, Project Accounting Manager supports project leaders to address changes in project scope or timeline and resolve any financial issues that might impede project timelines. Tracks and analyzes consolidated financial results for all projects. Requires a bachelor's degree in accounting or finance. Typically reports to a director. The Project Accounting Manager manages subordinate staff in the day-to-day performance of their jobs. True first level manager. Ensures that project/department milestones/goals are met and adhering to approved budgets. Has full authority for personnel actions. To be a Project Accounting Manager typically requires 5 years experience in the related area as an individual contributor. 1 - 3 years supervisory experience may be required. Extensive knowledge of the function and department processes.",https://www.salary.com/tools/salary-calculator/project-accounting-manager,"$86,153","$104,751","$125,179","$148,657","$170,032"
4 | IT Project Manager IV,"IT Project Manager IV manages and oversees all aspects of a technology project to ensure it is completed on-time and within budget. Has overall responsibility for managing scope, cost, schedule, internal staffing, vendors, and contractual deliverables. Being an IT Project Manager IV develops detailed project plans. Monitors project milestones and generate periodic status reports. Additionally, IT Project Manager IV evaluates and manages risk. Incorporates quality measures and standards to project deliverables. Possesses strong knowledge of technology. Typically requires a bachelor's degree or equivalent. May require project management certification. Typically reports to a manager or head of a unit/department. The IT Project Manager IV work is highly independent. May assume a team lead role for the work group. A specialist on complex technical and business matters. To be an IT Project Manager IV typically requires 7+ years of related experience.",https://www.salary.com/tools/salary-calculator/it-project-manager-iv,"$110,759","$121,122","$132,504","$143,264","$153,060"
5 | Project Controls Manager,"Project Controls Manager manages and oversees the project controls for engineering, construction, and other projects. Establishes controls and operating policies that identify, monitor, and mitigate risk factors that could impact the success of a project. Being a Project Controls Manager is responsible for developing a reporting structure and process to share project information with stakeholders. Ensures project control reporting documents are produced and that they clearly reflect the schedule and timeline status, cost or budget considerations, changes, supplier performance, and other risk levels. Additionally, Project Controls Manager provides guidance and consultation to project managers. Typically requires a bachelor's degree or equivalent. Typically reports to a manager or head of a unit/department. The Project Controls Manager manages subordinate staff in the day-to-day performance of their jobs. True first level manager. Ensures that project/department milestones/goals are met and adhering to approved budgets. Has full authority for personnel actions. To be a Project Controls Manager typically requires 5 years experience in the related area as an individual contributor. 1 - 3 years supervisory experience may be required. Extensive knowledge of the function and department processes.",https://www.salary.com/tools/salary-calculator/project-controls-manager,"$108,483","$127,458","$148,301","$163,739","$177,795"
6 | Project Manager Sr. - Construction,"Project Manager Sr. - Construction is responsible for all phases of a construction project. Designs and implements project plans. Being a Project Manager Sr. - Construction communicates directly with contractors/designers concerning project cost, staffing and scheduling. Prepares project status reports and works to ensure plans adhere to contract specifications. Additionally, Project Manager Sr. - Construction requires a bachelor's degree of engineering. Typically reports to senior manager. The Project Manager Sr. - Construction typically manages through subordinate managers and professionals in larger groups of moderate complexity. Provides input to strategic decisions that affect the functional area of responsibility. May give input into developing the budget. To be a Project Manager Sr. - Construction typically requires 3+ years of managerial experience. Capable of resolving escalated issues arising from operations and requiring coordination with other departments.",https://www.salary.com/tools/salary-calculator/project-manager-sr-construction,"$118,505","$136,352","$155,955","$174,966","$192,274"
7 | IT Project Manager V,"IT Project Manager V manages and oversees all aspects of a technology project to ensure it is completed on-time and within budget. Has overall responsibility for managing scope, cost, schedule, internal staffing, vendors, and contractual deliverables. Being an IT Project Manager V develops detailed project plans. Monitors project milestones and generate periodic status reports. Additionally, IT Project Manager V evaluates and manages risk. Incorporates quality measures and standards to project deliverables. Possesses strong knowledge of technology. Typically requires a bachelor's degree or equivalent. May require Project Management Certification. Typically reports to a manager or head of a unit/department. The IT Project Manager V works autonomously. Goals are generally communicated in ""solution"" or project goal terms. May provide a leadership role for the work group through knowledge in the area of specialization. Works on advanced, complex technical projects or business issues requiring state of the art technical or industry knowledge. To be an IT Project Manager V typically requires 10+ years of related experience.",https://www.salary.com/tools/salary-calculator/it-project-manager-v,"$126,798","$139,248","$152,922","$167,797","$181,340"
8 | Project Controls Senior Manager,"Project Controls Senior Manager manages and oversees the project controls for engineering, construction, and other projects. Establishes controls and operating policies that identify, monitor, and mitigate risk factors that could impact the success of a project. Being a Project Controls Senior Manager is responsible for developing a reporting structure and process to share project information with stakeholders. Ensures project control reporting documents are produced and that they clearly reflect the schedule and timeline status, cost or budget considerations, changes, supplier performance, and other risk levels. Additionally, Project Controls Senior Manager provides guidance and consultation to project managers. Typically requires a bachelor's degree or equivalent. Typically reports to a manager or head of a unit/department. The Project Controls Senior Manager typically manages through subordinate managers and professionals in larger groups of moderate complexity. Provides input to strategic decisions that affect the functional area of responsibility. May give input into developing the budget. To be a Project Controls Senior Manager typically requires 3+ years of managerial experience. Capable of resolving escalated issues arising from operations and requiring coordination with other departments.",https://www.salary.com/tools/salary-calculator/project-controls-senior-manager,"$114,939","$140,487","$168,548","$187,370","$204,505"
9 | SAP Project Manager,SAP Project Manager manages all activities related to SAP implementation projects. Ensures that all SAP project goals are accomplished according to specifications and business objectives. Being a SAP Project Manager requires a bachelor's degree in area of specialty. Typically reports to top management. The SAP Project Manager typically manages through subordinate managers and professionals in larger groups of moderate complexity. Provides input to strategic decisions that affect the functional area of responsibility. May give input into developing the budget. Capable of resolving escalated issues arising from operations and requiring coordination with other departments. To be a SAP Project Manager typically requires 3+ years of managerial experience.,https://www.salary.com/tools/salary-calculator/sap-project-manager,"$90,447","$100,376","$111,281","$125,175","$137,824"
10 | Project Management Manager,"Project Management Manager manages and directs the work of project managers and provides managerial oversight for multiple projects. Monitors project scopes, costs, schedules, staffing, communications, outside vendors, and contractual deliverables. Being a Project Management Manager develops standards, processes, and tools used for effective project scheduling and to set and manage quality targets. Addresses internal or vendor issues that may impede project delivery and develops solutions. Additionally, Project Management Manager tracks at risk metrics and facilitates actions to keep projects on track. Establishes data collection and reporting processes to capture key metrics of project activities and to provide periodic reporting. Requires a bachelor's degree. May require a project management certification. Typically reports to a director. The Project Management Manager manages subordinate staff in the day-to-day performance of their jobs. True first level manager. Ensures that project/department milestones/goals are met and adhering to approved budgets. Has full authority for personnel actions. To be a Project Management Manager typically requires 5 years experience in the related area as an individual contributor. 1 - 3 years supervisory experience may be required. Extensive knowledge of the function and department processes.",https://www.salary.com/tools/salary-calculator/project-management-manager,"$108,056","$121,233","$135,707","$151,638","$166,141"
11 | Project Manager IV,"Project Manager IV manages complex projects from planning through delivery. Liaises between project members, cross-functional teams, external vendors, and other stakeholders to ensure deliverables, requirements, schedules, cost, and meeting plans are communicated. Being a Project Manager IV utilizes appropriate tools to plan project timelines, tasks, milestones, and deadlines. Communicates schedule and changes to all stakeholders. Additionally, Project Manager IV plans and facilitates project meetings to align the project team to methods and goals and to track project tasks. Prepares agendas, meeting notes, and project summaries. Monitors task completion status to Identify at risk project tasks and to develop mitigation plans. Allocates resources, budgets, and hours to the project and adjusts allocations when necessary. Typically requires a bachelor's degree or equivalent. May require a project management certification. Typically reports to a manager or head of a unit/department. The Project Manager IV work is highly independent. May assume a team lead role for the work group. A specialist on complex technical and business matters. To be a Project Manager IV typically requires 7+ years of related experience.",https://www.salary.com/tools/salary-calculator/project-manager-iv,"$93,205","$106,563","$121,236","$135,775","$149,012"
12 |
--------------------------------------------------------------------------------
/notebooks/Carvago.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "057b35d3",
6 | "metadata": {},
7 | "source": [
8 | "Scraping car data by crawling to specific URLs"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "f8336a25",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "# Imports\n",
19 | "\n",
20 | "from bs4 import BeautifulSoup as bs\n",
21 | "import requests\n",
22 | "import pandas as pd\n",
23 | "import csv\n",
24 | "\n",
25 | "# To print its version only\n",
26 | "import bs4"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "id": "13f0bfa6",
33 | "metadata": {},
34 | "outputs": [
35 | {
36 | "name": "stdout",
37 | "output_type": "stream",
38 | "text": [
39 | "pandas version: 1.4.2\n",
40 | "bs4 version: 4.11.1\n",
41 | "requests version: 2.27.1\n",
42 | "csv version: 1.0\n"
43 | ]
44 | }
45 | ],
46 | "source": [
47 | "# Setup version\n",
48 | "\n",
49 | "print('pandas version: {}'.format(pd.__version__))\n",
50 | "print('bs4 version: {}'.format(bs4.__version__))\n",
51 | "print('requests version: {}'.format(requests.__version__))\n",
52 | "print('csv version: {}'.format(csv.__version__))"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 3,
58 | "id": "7304e2da",
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "# Fetch a web page and return it parsed as a BeautifulSoup object\n",
63 | "def get_souped_page(url):\n",
64 | "    \n",
65 | "    page= requests.get(url)\n",
66 | "    soup= bs(page.content, 'html.parser')\n",
67 | "    \n",
68 | "    return soup\n"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": 4,
74 | "id": "dba18938",
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "# Build the URL of a given results page\n",
79 | "def get_page_url(page):\n",
80 | "    \n",
81 | "    template= 'https://carvago.com/cars?page={}'\n",
82 | "    url= template.format(int(page))\n",
83 | "    \n",
84 | "    return url\n"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 5,
90 | "id": "59527d2e",
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "# Scrape the number of pages that we have to loop over\n",
95 | "def get_nb_pages(url):\n",
96 | "    \n",
97 | "    page= requests.get(url)\n",
98 | "    soup= bs(page.content, 'html.parser')\n",
99 | "    ul= soup.find_all('ul', class_= 'Pagination-container')[0]\n",
100 | "    li= ul.find_all('li', class_= 'Pagination-item')\n",
101 | "    final_page= int(li[5].text)  # assumes the sixth pagination item holds the last page number\n",
102 | "    \n",
103 | "    return final_page\n"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 6,
109 | "id": "988080b5",
110 | "metadata": {},
111 | "outputs": [],
112 | "source": [
113 | "# scrape urls of articles per page\n",
114 | "def scrape_urls(nb_pages):\n",
115 | "    \n",
116 | "    links= []\n",
117 | "    for page in range(nb_pages):\n",
118 | "        page_url= get_page_url(page)\n",
119 | "        soup= get_souped_page(page_url)\n",
120 | "        div= soup.find('div', class_='css-1f3egm3 e1qmtrzl0')\n",
121 | "        articles= div.find_all('div', class_= 'css-j0dexk e1oahio80')\n",
122 | "        for article in articles:\n",
123 | "            try:\n",
124 | "                link= 'https://carvago.com' + article.a['href']\n",
125 | "                links.append(link)\n",
126 | "            except (TypeError, KeyError):\n",
127 | "                pass  # article has no <a> tag or no href attribute\n",
128 | "    \n",
129 | "    return links\n"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 7,
135 | "id": "da461bde",
136 | "metadata": {},
137 | "outputs": [],
138 | "source": [
139 | "# Scrape the details of every car page and save them to cars.csv\n",
140 | "def scrape_cars_data(urls):\n",
141 | "    \n",
142 | "    records= []\n",
143 | "    for url in urls:\n",
144 | "        \n",
145 | "        soup= get_souped_page(url)\n",
146 | "        try:\n",
147 | "            # get location (not stored in the record; a failure here still skips the car)\n",
148 | "            description= soup.find('div', {'data-test-id' : \"feature.car_detail.car_description\"})\n",
149 | "            location_div= description.find_all('div', class_= 'css-cslgjn evubtux3')[3]\n",
150 | "            location= location_div.find('div', class_= 'css-o5n6vn evubtux1').text\n",
151 | "\n",
152 | "            # get price (price_div is looked up but not used)\n",
153 | "            price_div= soup.find('div', class_='css-tztcmb ejteidz8')\n",
154 | "            price= soup.find('div', class_='css-1mbrvie e1hgzarh2').text\n",
155 | "\n",
156 | "            # get the overall details div\n",
157 | "            details= soup.find('div', class_='css-hl133e e16182je4')\n",
158 | "            details= details.find_all('div', class_='css-l5ry6c ebz328c2')\n",
159 | "\n",
160 | "            # get vehicle detail\n",
161 | "            v_detail= details[0]\n",
162 | "            rows= v_detail.find_all('div', class_='sc-gsnTZi gPmveL css-1su3ehq e18uvu5d8')\n",
163 | "            # rows are read by position, so an extra or missing row shifts values into the wrong columns\n",
164 | "            make= rows[0].a.text\n",
165 | "            model= rows[1].a.text\n",
166 | "            try:\n",
167 | "                body_color= rows[2].find('div', class_= 'css-s5xdrg exn7x430').text\n",
168 | "            except (AttributeError, IndexError):\n",
169 | "                body_color= 'None'\n",
170 | "            interior_color= rows[3].find('div', class_= 'sc-dkzDqf edLfRl css-1k9kkyi e18uvu5d2').text\n",
171 | "            interior_material= rows[4].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
172 | "            body= rows[5].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
173 | "            doors= rows[6].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
174 | "            seats= rows[7].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
175 | "            vin= rows[8].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
176 | "            if vin == 'not published by the seller':\n",
177 | "                vin= 'None'\n",
178 | "\n",
179 | "            # get steering\n",
180 | "            v_steering= details[1]\n",
181 | "            rows_s= v_steering.find_all('div', class_='sc-gsnTZi gPmveL css-1su3ehq e18uvu5d8')\n",
182 | "\n",
183 | "            fuel= rows_s[0].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
184 | "            transmission= rows_s[1].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
185 | "            drive_type= rows_s[2].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
186 | "            power= rows_s[3].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
187 | "            engine_capacity= rows_s[4].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
188 | "\n",
189 | "            co2_emission= rows_s[5].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
190 | "            emission_class= rows_s[6].find('div', class_= 'sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
191 | "\n",
192 | "            # get vehicle condition\n",
193 | "            v_condition= details[2]\n",
194 | "            rows_c= v_condition.find_all('div', class_= 'sc-gsnTZi gPmveL css-1su3ehq e18uvu5d8')\n",
195 | "\n",
196 | "            driven_distance= rows_c[0].find('div', class_='sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
197 | "            first_registration= rows_c[1].find('div', class_='sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
198 | "            condition= rows_c[2].find('div', class_='sc-dkzDqf edLfRl css-1b7garx e18uvu5d4').text\n",
199 | "            record=(make, model, price, body_color, interior_color, interior_material, body, doors, seats, vin, fuel, transmission, drive_type,\n",
200 | "                    power, engine_capacity, co2_emission, emission_class, driven_distance, first_registration, condition )\n",
201 | "            records.append(record)\n",
202 | "        except Exception:\n",
203 | "            pass  # skip cars whose page does not match the expected layout\n",
204 | "\n",
205 | "    with open(\"cars.csv\", 'w', newline='',encoding= 'utf-8') as f:\n",
206 | "        writer= csv.writer(f)\n",
207 | "        writer.writerow(['make', 'model', 'price', 'body_color', 'interior_color', 'interior_material', 'body', 'doors', 'seats', 'vin', 'fuel', 'transmission', 'drive_type',\n",
208 | "                         'power', 'engine_capacity', 'co2_emission', 'emission_class', 'driven_distance', 'first_registration', 'condition'])\n",
209 | "        writer.writerows(records)\n",
210 | "\n",
211 | "    return records\n"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 8,
217 | "id": "ac17c491",
218 | "metadata": {},
219 | "outputs": [],
220 | "source": [
221 | "# Let's try what we have built\n",
222 | "nb_pages= get_nb_pages('https://carvago.com/cars?page=35132') \n",
223 | "# generate links from 50 result pages (nb_pages above is not used here)\n",
224 | "links= scrape_urls(50)\n",
225 | "# scrape each car web page\n",
226 | "records= scrape_cars_data(links)"
227 | ]
228 | },
229 | {
230 | "cell_type": "code",
231 | "execution_count": 9,
232 | "id": "f99ee77c",
233 | "metadata": {},
234 | "outputs": [],
235 | "source": []
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": 10,
240 | "id": "eb053fc8",
241 | "metadata": {
242 | "scrolled": true
243 | },
244 | "outputs": [
245 | {
246 | "data": {
247 | "text/html": [
248 | "\n",
249 | "\n",
262 | "
\n",
263 | " \n",
264 | " \n",
265 | " \n",
266 | " make \n",
267 | " model \n",
268 | " price \n",
269 | " body_color \n",
270 | " interior_color \n",
271 | " interior_material \n",
272 | " body \n",
273 | " doors \n",
274 | " seats \n",
275 | " vin \n",
276 | " fuel \n",
277 | " transmission \n",
278 | " drive_type \n",
279 | " power \n",
280 | " engine_capacity \n",
281 | " co2_emission \n",
282 | " emission_class \n",
283 | " driven_distance \n",
284 | " first_registration \n",
285 | " condition \n",
286 | " \n",
287 | " \n",
288 | " \n",
289 | " \n",
290 | " 0 \n",
291 | " Seat \n",
292 | " Leon \n",
293 | " 3 049 € \n",
294 | " None \n",
295 | " Interior color \n",
296 | " Other interior material \n",
297 | " Compact \n",
298 | " 4/5 doors \n",
299 | " 5 \n",
300 | " 1249 \n",
301 | " Petrol \n",
302 | " Manual \n",
303 | " 4x2 \n",
304 | " 92 kW \n",
305 | " 1 390 cc \n",
306 | " 149 \n",
307 | " Euro 4 \n",
308 | " 158 500 km \n",
309 | " 6/2009 \n",
310 | " Used \n",
311 | " \n",
312 | " \n",
313 | " 1 \n",
314 | " Opel \n",
315 | " Crossland X \n",
316 | " 15 899 € \n",
317 | " Orange \n",
318 | " Type of finish \n",
319 | " Black interior \n",
320 | " Cloth interior \n",
321 | " SUV / offroad \n",
322 | " 4/5 doors \n",
323 | " 5 \n",
324 | " Diesel \n",
325 | " Manual \n",
326 | " 4x2 \n",
327 | " 88 kW \n",
328 | " 15 600 cc \n",
329 | " 103 \n",
330 | " Euro 6 \n",
331 | " 69 000 km \n",
332 | " 8/2017 \n",
333 | " 2 \n",
334 | " \n",
335 | " \n",
336 | " 2 \n",
337 | " Kia \n",
338 | " XCeed \n",
339 | " 24 649 € \n",
340 | " Silver \n",
341 | " Type of finish \n",
342 | " Black interior \n",
343 | " Cloth interior \n",
344 | " SUV / offroad \n",
345 | " 4/5 doors \n",
346 | " 5 \n",
347 | " Petrol \n",
348 | " Manual \n",
349 | " 4x2 \n",
350 | " 103 kW \n",
351 | " 13 534 cc \n",
352 | " 134 \n",
353 | " Euro 6d-TEMP \n",
354 | " 19 500 km \n",
355 | " 10/2020 \n",
356 | " Used \n",
357 | " \n",
358 | " \n",
359 | " 3 \n",
360 | " Renault \n",
361 | " Clio \n",
362 | " 15 099 € \n",
363 | " Grey \n",
364 | " Interior color \n",
365 | " Other interior material \n",
366 | " Compact \n",
367 | " 4/5 doors \n",
368 | " 1090 \n",
369 | " None \n",
370 | " Diesel \n",
371 | " Manual \n",
372 | " 4x2 \n",
373 | " 66 kW \n",
374 | " 14 641 cc \n",
375 | " 135 \n",
376 | " Euro 6 \n",
377 | " 13 884 km \n",
378 | " 11/2019 \n",
379 | " Used \n",
380 | " \n",
381 | " \n",
382 | " 4 \n",
383 | " Mercedes-Benz \n",
384 | " E 350 \n",
385 | " 10 899 € \n",
386 | " Black \n",
387 | " Type of finish \n",
388 | " Beige interior \n",
389 | " Full leather interior \n",
390 | " Station Wagon \n",
391 | " 4/5 doors \n",
392 | " 5 \n",
393 | " Petrol \n",
394 | " Automatic \n",
395 | " 4x4 \n",
396 | " 200 kW \n",
397 | " 3 498 cc \n",
398 | " 261 \n",
399 | " Euro 4 \n",
400 | " 163 000 km \n",
401 | " 1/2009 \n",
402 | " 2 \n",
403 | " \n",
404 | " \n",
405 | " ... \n",
406 | " ... \n",
407 | " ... \n",
408 | " ... \n",
409 | " ... \n",
410 | " ... \n",
411 | " ... \n",
412 | " ... \n",
413 | " ... \n",
414 | " ... \n",
415 | " ... \n",
416 | " ... \n",
417 | " ... \n",
418 | " ... \n",
419 | " ... \n",
420 | " ... \n",
421 | " ... \n",
422 | " ... \n",
423 | " ... \n",
424 | " ... \n",
425 | " ... \n",
426 | " \n",
427 | " \n",
428 | " 593 \n",
429 | " Mercedes-Benz \n",
430 | " E 220 \n",
431 | " 32 049 € \n",
432 | " Silver \n",
433 | " Type of finish \n",
434 | " Black interior \n",
435 | " Part leather interior \n",
436 | " Station Wagon \n",
437 | " 4/5 doors \n",
438 | " 5 \n",
439 | " Diesel \n",
440 | " Automatic \n",
441 | " 4x2 \n",
442 | " 143 kW \n",
443 | " 1 950 cc \n",
444 | " 159 \n",
445 | " Euro 6d-TEMP \n",
446 | " 73 170 km \n",
447 | " 6/2019 \n",
448 | " 1 \n",
449 | " \n",
450 | " \n",
451 | " 594 \n",
452 | " BMW \n",
453 | " 220 \n",
454 | " 49 549 € \n",
455 | " Black \n",
456 | " Type of finish \n",
457 | " Black interior \n",
458 | " Full leather interior \n",
459 | " Sedans / saloons \n",
460 | " 4/5 doors \n",
461 | " 5 \n",
462 | " Petrol \n",
463 | " Automatic \n",
464 | " 4x2 \n",
465 | " 135 kW \n",
466 | " 1 998 cc \n",
467 | " 144 \n",
468 | " No emission class \n",
469 | " 0 km \n",
470 | " 7/2022 \n",
471 | " New \n",
472 | " \n",
473 | " \n",
474 | " 595 \n",
475 | " Dodge \n",
476 | " Durango \n",
477 | " 54 399 € \n",
478 | " Grey \n",
479 | " Type of finish \n",
480 | " Black interior \n",
481 | " Full leather interior \n",
482 | " SUV / offroad \n",
483 | " 4/5 doors \n",
484 | " 6 \n",
485 | " Petrol \n",
486 | " Automatic \n",
487 | " 4x4 \n",
488 | " 268 kW \n",
489 | " 5 654 cc \n",
490 | " 387 \n",
491 | " Euro 5 \n",
492 | " 21 958 km \n",
493 | " 7/2021 \n",
494 | " 2 \n",
495 | " \n",
496 | " \n",
497 | " 596 \n",
498 | " BMW \n",
499 | " X3 M40 \n",
500 | " 60 649 € \n",
501 | " Black \n",
502 | " Type of finish \n",
503 | " Black interior \n",
504 | " Full leather interior \n",
505 | " SUV / offroad \n",
506 | " 4/5 doors \n",
507 | " 5 \n",
508 | " Diesel \n",
509 | " Automatic \n",
510 | " 4x4 \n",
511 | " 250 kW \n",
512 | " 2 993 cc \n",
513 | " 184 \n",
514 | " Euro 6d \n",
515 | " 29 699 km \n",
516 | " 8/2021 \n",
517 | " Used \n",
518 | " \n",
519 | " \n",
520 | " 597 \n",
521 | " Audi \n",
522 | " SQ7 \n",
523 | " 62 149 € \n",
524 | " Silver \n",
525 | " Type of finish \n",
526 | " Black interior \n",
527 | " Full leather interior \n",
528 | " SUV / offroad \n",
529 | " 4/5 doors \n",
530 | " 2345 \n",
531 | " Diesel \n",
532 | " Automatic \n",
533 | " 4x4 \n",
534 | " 320 kW \n",
535 | " 3 956 cc \n",
536 | " 199 \n",
537 | " Euro 6 \n",
538 | " 62 412 km \n",
539 | " 6/2017 \n",
540 | " Used \n",
541 | " \n",
542 | " \n",
543 | "
\n",
544 | "
598 rows × 20 columns
\n",
545 | "
"
546 | ],
547 | "text/plain": [
548 | " make model price body_color interior_color \\\n",
549 | "0 Seat Leon 3 049 € None Interior color \n",
550 | "1 Opel Crossland X 15 899 € Orange Type of finish \n",
551 | "2 Kia XCeed 24 649 € Silver Type of finish \n",
552 | "3 Renault Clio 15 099 € Grey Interior color \n",
553 | "4 Mercedes-Benz E 350 10 899 € Black Type of finish \n",
554 | ".. ... ... ... ... ... \n",
555 | "593 Mercedes-Benz E 220 32 049 € Silver Type of finish \n",
556 | "594 BMW 220 49 549 € Black Type of finish \n",
557 | "595 Dodge Durango 54 399 € Grey Type of finish \n",
558 | "596 BMW X3 M40 60 649 € Black Type of finish \n",
559 | "597 Audi SQ7 62 149 € Silver Type of finish \n",
560 | "\n",
561 | " interior_material body doors \\\n",
562 | "0 Other interior material Compact 4/5 doors \n",
563 | "1 Black interior Cloth interior SUV / offroad \n",
564 | "2 Black interior Cloth interior SUV / offroad \n",
565 | "3 Other interior material Compact 4/5 doors \n",
566 | "4 Beige interior Full leather interior Station Wagon \n",
567 | ".. ... ... ... \n",
568 | "593 Black interior Part leather interior Station Wagon \n",
569 | "594 Black interior Full leather interior Sedans / saloons \n",
570 | "595 Black interior Full leather interior SUV / offroad \n",
571 | "596 Black interior Full leather interior SUV / offroad \n",
572 | "597 Black interior Full leather interior SUV / offroad \n",
573 | "\n",
574 | " seats vin fuel transmission drive_type power engine_capacity \\\n",
575 | "0 5 1249 Petrol Manual 4x2 92 kW 1 390 cc \n",
576 | "1 4/5 doors 5 Diesel Manual 4x2 88 kW 15 600 cc \n",
577 | "2 4/5 doors 5 Petrol Manual 4x2 103 kW 13 534 cc \n",
578 | "3 1090 None Diesel Manual 4x2 66 kW 14 641 cc \n",
579 | "4 4/5 doors 5 Petrol Automatic 4x4 200 kW 3 498 cc \n",
580 | ".. ... ... ... ... ... ... ... \n",
581 | "593 4/5 doors 5 Diesel Automatic 4x2 143 kW 1 950 cc \n",
582 | "594 4/5 doors 5 Petrol Automatic 4x2 135 kW 1 998 cc \n",
583 | "595 4/5 doors 6 Petrol Automatic 4x4 268 kW 5 654 cc \n",
584 | "596 4/5 doors 5 Diesel Automatic 4x4 250 kW 2 993 cc \n",
585 | "597 4/5 doors 2345 Diesel Automatic 4x4 320 kW 3 956 cc \n",
586 | "\n",
587 | " co2_emission emission_class driven_distance first_registration \\\n",
588 | "0 149 Euro 4 158 500 km 6/2009 \n",
589 | "1 103 Euro 6 69 000 km 8/2017 \n",
590 | "2 134 Euro 6d-TEMP 19 500 km 10/2020 \n",
591 | "3 135 Euro 6 13 884 km 11/2019 \n",
592 | "4 261 Euro 4 163 000 km 1/2009 \n",
593 | ".. ... ... ... ... \n",
594 | "593 159 Euro 6d-TEMP 73 170 km 6/2019 \n",
595 | "594 144 No emission class 0 km 7/2022 \n",
596 | "595 387 Euro 5 21 958 km 7/2021 \n",
597 | "596 184 Euro 6d 29 699 km 8/2021 \n",
598 | "597 199 Euro 6 62 412 km 6/2017 \n",
599 | "\n",
600 | " condition \n",
601 | "0 Used \n",
602 | "1 2 \n",
603 | "2 Used \n",
604 | "3 Used \n",
605 | "4 2 \n",
606 | ".. ... \n",
607 | "593 1 \n",
608 | "594 New \n",
609 | "595 2 \n",
610 | "596 Used \n",
611 | "597 Used \n",
612 | "\n",
613 | "[598 rows x 20 columns]"
614 | ]
615 | },
616 | "execution_count": 10,
617 | "metadata": {},
618 | "output_type": "execute_result"
619 | }
620 | ],
621 | "source": [
622 | "# Load the result of our scraping\n",
623 | "df = pd.read_csv('cars.csv')\n",
624 | "df"
625 | ]
626 | },
627 | {
628 | "cell_type": "code",
629 | "execution_count": 11,
630 | "id": "863d74f7",
631 | "metadata": {},
632 | "outputs": [
633 | {
634 | "data": {
635 | "text/plain": [
636 | "(598, 20)"
637 | ]
638 | },
639 | "execution_count": 11,
640 | "metadata": {},
641 | "output_type": "execute_result"
642 | }
643 | ],
644 | "source": [
645 | "df.shape"
646 | ]
647 | },
648 | {
649 | "cell_type": "code",
650 | "execution_count": null,
651 | "id": "15e1c1a4",
652 | "metadata": {},
653 | "outputs": [],
654 | "source": []
655 | }
656 | ],
657 | "metadata": {
658 | "kernelspec": {
659 | "display_name": "Python 3 (ipykernel)",
660 | "language": "python",
661 | "name": "python3"
662 | },
663 | "language_info": {
664 | "codemirror_mode": {
665 | "name": "ipython",
666 | "version": 3
667 | },
668 | "file_extension": ".py",
669 | "mimetype": "text/x-python",
670 | "name": "python",
671 | "nbconvert_exporter": "python",
672 | "pygments_lexer": "ipython3",
673 | "version": "3.9.12"
674 | }
675 | },
676 | "nbformat": 4,
677 | "nbformat_minor": 5
678 | }
679 |
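The `cars.csv` output above stores numeric fields as display strings (`3 049 €`, `158 500 km`, `92 kW`, `1 390 cc`). Below is a minimal sketch of one way such values could be turned into integers before analysis. The helper name `to_int` is hypothetical and not part of the notebook, and the sketch assumes the values keep the space-separated format shown in the output. It does not address rows whose columns appear shifted (for example `4/5 doors` under `seats`).

```python
import re
from typing import Optional


def to_int(text: object) -> Optional[int]:
    """Return the leading integer of strings like '3 049 €' or '158 500 km'.

    Hypothetical helper: returns None when the value does not start with a digit
    (for example 'Used', None, or a missing value read as NaN).
    """
    if text is None:
        return None
    # Match a run of digits and whitespace at the start of the string.
    match = re.match(r"\s*(\d[\d\s]*)", str(text))
    if match is None:
        return None
    # Drop the thousands separators (spaces) and convert what is left.
    digits = re.sub(r"\D", "", match.group(1))
    return int(digits)


# Sample values copied from the DataFrame output shown above.
samples = ["3 049 €", "158 500 km", "92 kW", "1 390 cc", "Used", None]
parsed = [to_int(s) for s in samples]
print(parsed)
```

With pandas loaded as in the notebook, the helper could be applied to a single column with something like `df["price"].map(to_int)`. It is only meant for the unit-suffixed columns, since a value such as `6/2009` would be reduced to its first number.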
--------------------------------------------------------------------------------
/notebooks/Cineb_movies.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "5a53369b",
6 | "metadata": {},
7 | "source": [
8 | "Scraping movie data from Cineb.com"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "d1e52b22",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "# Imports\n",
19 | "import pandas as pd\n",
20 | "import bs4\n",
21 | "from bs4 import BeautifulSoup\n",
22 | "import csv\n",
23 | "import requests\n"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 2,
29 | "id": "de166047",
30 | "metadata": {},
31 | "outputs": [
32 | {
33 | "name": "stdout",
34 | "output_type": "stream",
35 | "text": [
36 | "pandas version: 1.4.2\n",
37 | "bs4 version: 4.11.1\n",
38 | "requests version: 2.27.1\n",
39 | "csv version: 1.0\n"
40 | ]
41 | }
42 | ],
43 | "source": [
44 | "# Print the versions of the libraries used\n",
45 | "\n",
46 | "print(f'pandas version: {pd.__version__}')\n",
47 | "print(f'bs4 version: {bs4.__version__}')\n",
48 | "print(f'requests version: {requests.__version__}')\n",
49 | "print(f'csv version: {csv.__version__}')"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 3,
55 | "id": "4876ac71",
56 | "metadata": {},
57 | "outputs": [
58 | {
59 | "data": {
60 | "text/plain": [
61 | "['https://cineb.net/movie?page=0',\n",
62 | " 'https://cineb.net/movie?page=1',\n",
63 | " 'https://cineb.net/movie?page=2',\n",
64 | " 'https://cineb.net/movie?page=3',\n",
65 | " 'https://cineb.net/movie?page=4',\n",
66 | " 'https://cineb.net/movie?page=5',\n",
67 | " 'https://cineb.net/movie?page=6',\n",
68 | " 'https://cineb.net/movie?page=7',\n",
69 | " 'https://cineb.net/movie?page=8',\n",
70 | " 'https://cineb.net/movie?page=9',\n",
71 | " 'https://cineb.net/movie?page=10',\n",
72 | " 'https://cineb.net/movie?page=11',\n",
73 | " 'https://cineb.net/movie?page=12',\n",
74 | " 'https://cineb.net/movie?page=13',\n",
75 | " 'https://cineb.net/movie?page=14',\n",
76 | " 'https://cineb.net/movie?page=15',\n",
77 | " 'https://cineb.net/movie?page=16',\n",
78 | " 'https://cineb.net/movie?page=17',\n",
79 | " 'https://cineb.net/movie?page=18',\n",
80 | " 'https://cineb.net/movie?page=19',\n",
81 | " 'https://cineb.net/movie?page=20',\n",
82 | " 'https://cineb.net/movie?page=21',\n",
83 | " 'https://cineb.net/movie?page=22',\n",
84 | " 'https://cineb.net/movie?page=23',\n",
85 | " 'https://cineb.net/movie?page=24',\n",
86 | " 'https://cineb.net/movie?page=25',\n",
87 | " 'https://cineb.net/movie?page=26',\n",
88 | " 'https://cineb.net/movie?page=27',\n",
89 | " 'https://cineb.net/movie?page=28',\n",
90 | " 'https://cineb.net/movie?page=29',\n",
91 | " 'https://cineb.net/movie?page=30',\n",
92 | " 'https://cineb.net/movie?page=31',\n",
93 | " 'https://cineb.net/movie?page=32',\n",
94 | " 'https://cineb.net/movie?page=33',\n",
95 | " 'https://cineb.net/movie?page=34',\n",
96 | " 'https://cineb.net/movie?page=35',\n",
97 | " 'https://cineb.net/movie?page=36',\n",
98 | " 'https://cineb.net/movie?page=37',\n",
99 | " 'https://cineb.net/movie?page=38',\n",
100 | " 'https://cineb.net/movie?page=39',\n",
101 | " 'https://cineb.net/movie?page=40',\n",
102 | " 'https://cineb.net/movie?page=41',\n",
103 | " 'https://cineb.net/movie?page=42',\n",
104 | " 'https://cineb.net/movie?page=43',\n",
105 | " 'https://cineb.net/movie?page=44',\n",
106 | " 'https://cineb.net/movie?page=45',\n",
107 | " 'https://cineb.net/movie?page=46',\n",
108 | " 'https://cineb.net/movie?page=47',\n",
109 | " 'https://cineb.net/movie?page=48',\n",
110 | " 'https://cineb.net/movie?page=49',\n",
111 | " 'https://cineb.net/movie?page=50',\n",
112 | " 'https://cineb.net/movie?page=51',\n",
113 | " 'https://cineb.net/movie?page=52',\n",
114 | " 'https://cineb.net/movie?page=53',\n",
115 | " 'https://cineb.net/movie?page=54',\n",
116 | " 'https://cineb.net/movie?page=55',\n",
117 | " 'https://cineb.net/movie?page=56',\n",
118 | " 'https://cineb.net/movie?page=57',\n",
119 | " 'https://cineb.net/movie?page=58',\n",
120 | " 'https://cineb.net/movie?page=59',\n",
121 | " 'https://cineb.net/movie?page=60',\n",
122 | " 'https://cineb.net/movie?page=61',\n",
123 | " 'https://cineb.net/movie?page=62',\n",
124 | " 'https://cineb.net/movie?page=63',\n",
125 | " 'https://cineb.net/movie?page=64',\n",
126 | " 'https://cineb.net/movie?page=65',\n",
127 | " 'https://cineb.net/movie?page=66',\n",
128 | " 'https://cineb.net/movie?page=67',\n",
129 | " 'https://cineb.net/movie?page=68',\n",
130 | " 'https://cineb.net/movie?page=69',\n",
131 | " 'https://cineb.net/movie?page=70',\n",
132 | " 'https://cineb.net/movie?page=71',\n",
133 | " 'https://cineb.net/movie?page=72',\n",
134 | " 'https://cineb.net/movie?page=73',\n",
135 | " 'https://cineb.net/movie?page=74',\n",
136 | " 'https://cineb.net/movie?page=75',\n",
137 | " 'https://cineb.net/movie?page=76',\n",
138 | " 'https://cineb.net/movie?page=77',\n",
139 | " 'https://cineb.net/movie?page=78',\n",
140 | " 'https://cineb.net/movie?page=79',\n",
141 | " 'https://cineb.net/movie?page=80',\n",
142 | " 'https://cineb.net/movie?page=81',\n",
143 | " 'https://cineb.net/movie?page=82',\n",
144 | " 'https://cineb.net/movie?page=83',\n",
145 | " 'https://cineb.net/movie?page=84',\n",
146 | " 'https://cineb.net/movie?page=85',\n",
147 | " 'https://cineb.net/movie?page=86',\n",
148 | " 'https://cineb.net/movie?page=87',\n",
149 | " 'https://cineb.net/movie?page=88',\n",
150 | " 'https://cineb.net/movie?page=89',\n",
151 | " 'https://cineb.net/movie?page=90',\n",
152 | " 'https://cineb.net/movie?page=91',\n",
153 | " 'https://cineb.net/movie?page=92',\n",
154 | " 'https://cineb.net/movie?page=93',\n",
155 | " 'https://cineb.net/movie?page=94',\n",
156 | " 'https://cineb.net/movie?page=95',\n",
157 | " 'https://cineb.net/movie?page=96',\n",
158 | " 'https://cineb.net/movie?page=97',\n",
159 | " 'https://cineb.net/movie?page=98',\n",
160 | " 'https://cineb.net/movie?page=99',\n",
161 | " 'https://cineb.net/movie?page=100',\n",
162 | " 'https://cineb.net/movie?page=101',\n",
163 | " 'https://cineb.net/movie?page=102',\n",
164 | " 'https://cineb.net/movie?page=103',\n",
165 | " 'https://cineb.net/movie?page=104',\n",
166 | " 'https://cineb.net/movie?page=105',\n",
167 | " 'https://cineb.net/movie?page=106',\n",
168 | " 'https://cineb.net/movie?page=107',\n",
169 | " 'https://cineb.net/movie?page=108',\n",
170 | " 'https://cineb.net/movie?page=109',\n",
171 | " 'https://cineb.net/movie?page=110',\n",
172 | " 'https://cineb.net/movie?page=111',\n",
173 | " 'https://cineb.net/movie?page=112',\n",
174 | " 'https://cineb.net/movie?page=113',\n",
175 | " 'https://cineb.net/movie?page=114',\n",
176 | " 'https://cineb.net/movie?page=115',\n",
177 | " 'https://cineb.net/movie?page=116',\n",
178 | " 'https://cineb.net/movie?page=117',\n",
179 | " 'https://cineb.net/movie?page=118',\n",
180 | " 'https://cineb.net/movie?page=119',\n",
181 | " 'https://cineb.net/movie?page=120',\n",
182 | " 'https://cineb.net/movie?page=121',\n",
183 | " 'https://cineb.net/movie?page=122',\n",
184 | " 'https://cineb.net/movie?page=123',\n",
185 | " 'https://cineb.net/movie?page=124',\n",
186 | " 'https://cineb.net/movie?page=125',\n",
187 | " 'https://cineb.net/movie?page=126',\n",
188 | " 'https://cineb.net/movie?page=127',\n",
189 | " 'https://cineb.net/movie?page=128',\n",
190 | " 'https://cineb.net/movie?page=129',\n",
191 | " 'https://cineb.net/movie?page=130',\n",
192 | " 'https://cineb.net/movie?page=131',\n",
193 | " 'https://cineb.net/movie?page=132',\n",
194 | " 'https://cineb.net/movie?page=133',\n",
195 | " 'https://cineb.net/movie?page=134',\n",
196 | " 'https://cineb.net/movie?page=135',\n",
197 | " 'https://cineb.net/movie?page=136',\n",
198 | " 'https://cineb.net/movie?page=137',\n",
199 | " 'https://cineb.net/movie?page=138',\n",
200 | " 'https://cineb.net/movie?page=139',\n",
201 | " 'https://cineb.net/movie?page=140',\n",
202 | " 'https://cineb.net/movie?page=141',\n",
203 | " 'https://cineb.net/movie?page=142',\n",
204 | " 'https://cineb.net/movie?page=143',\n",
205 | " 'https://cineb.net/movie?page=144',\n",
206 | " 'https://cineb.net/movie?page=145',\n",
207 | " 'https://cineb.net/movie?page=146',\n",
208 | " 'https://cineb.net/movie?page=147',\n",
209 | " 'https://cineb.net/movie?page=148',\n",
210 | " 'https://cineb.net/movie?page=149',\n",
211 | " 'https://cineb.net/movie?page=150',\n",
212 | " 'https://cineb.net/movie?page=151',\n",
213 | " 'https://cineb.net/movie?page=152',\n",
214 | " 'https://cineb.net/movie?page=153',\n",
215 | " 'https://cineb.net/movie?page=154',\n",
216 | " 'https://cineb.net/movie?page=155',\n",
217 | " 'https://cineb.net/movie?page=156',\n",
218 | " 'https://cineb.net/movie?page=157',\n",
219 | " 'https://cineb.net/movie?page=158',\n",
220 | " 'https://cineb.net/movie?page=159',\n",
221 | " 'https://cineb.net/movie?page=160',\n",
222 | " 'https://cineb.net/movie?page=161',\n",
223 | " 'https://cineb.net/movie?page=162',\n",
224 | " 'https://cineb.net/movie?page=163',\n",
225 | " 'https://cineb.net/movie?page=164',\n",
226 | " 'https://cineb.net/movie?page=165',\n",
227 | " 'https://cineb.net/movie?page=166',\n",
228 | " 'https://cineb.net/movie?page=167',\n",
229 | " 'https://cineb.net/movie?page=168',\n",
230 | " 'https://cineb.net/movie?page=169',\n",
231 | " 'https://cineb.net/movie?page=170',\n",
232 | " 'https://cineb.net/movie?page=171',\n",
233 | " 'https://cineb.net/movie?page=172',\n",
234 | " 'https://cineb.net/movie?page=173',\n",
235 | " 'https://cineb.net/movie?page=174',\n",
236 | " 'https://cineb.net/movie?page=175',\n",
237 | " 'https://cineb.net/movie?page=176',\n",
238 | " 'https://cineb.net/movie?page=177',\n",
239 | " 'https://cineb.net/movie?page=178',\n",
240 | " 'https://cineb.net/movie?page=179',\n",
241 | " 'https://cineb.net/movie?page=180',\n",
242 | " 'https://cineb.net/movie?page=181',\n",
243 | " 'https://cineb.net/movie?page=182',\n",
244 | " 'https://cineb.net/movie?page=183',\n",
245 | " 'https://cineb.net/movie?page=184',\n",
246 | " 'https://cineb.net/movie?page=185',\n",
247 | " 'https://cineb.net/movie?page=186',\n",
248 | " 'https://cineb.net/movie?page=187',\n",
249 | " 'https://cineb.net/movie?page=188',\n",
250 | " 'https://cineb.net/movie?page=189',\n",
251 | " 'https://cineb.net/movie?page=190',\n",
252 | " 'https://cineb.net/movie?page=191',\n",
253 | " 'https://cineb.net/movie?page=192',\n",
254 | " 'https://cineb.net/movie?page=193',\n",
255 | " 'https://cineb.net/movie?page=194',\n",
256 | " 'https://cineb.net/movie?page=195',\n",
257 | " 'https://cineb.net/movie?page=196',\n",
258 | " 'https://cineb.net/movie?page=197',\n",
259 | " 'https://cineb.net/movie?page=198',\n",
260 | " 'https://cineb.net/movie?page=199',\n",
261 | " 'https://cineb.net/movie?page=200',\n",
262 | " 'https://cineb.net/movie?page=201',\n",
263 | " 'https://cineb.net/movie?page=202',\n",
264 | " 'https://cineb.net/movie?page=203',\n",
265 | " 'https://cineb.net/movie?page=204',\n",
266 | " 'https://cineb.net/movie?page=205',\n",
267 | " 'https://cineb.net/movie?page=206',\n",
268 | " 'https://cineb.net/movie?page=207',\n",
269 | " 'https://cineb.net/movie?page=208',\n",
270 | " 'https://cineb.net/movie?page=209',\n",
271 | " 'https://cineb.net/movie?page=210',\n",
272 | " 'https://cineb.net/movie?page=211',\n",
273 | " 'https://cineb.net/movie?page=212',\n",
274 | " 'https://cineb.net/movie?page=213',\n",
275 | " 'https://cineb.net/movie?page=214',\n",
276 | " 'https://cineb.net/movie?page=215',\n",
277 | " 'https://cineb.net/movie?page=216',\n",
278 | " 'https://cineb.net/movie?page=217',\n",
279 | " 'https://cineb.net/movie?page=218',\n",
280 | " 'https://cineb.net/movie?page=219',\n",
281 | " 'https://cineb.net/movie?page=220',\n",
282 | " 'https://cineb.net/movie?page=221',\n",
283 | " 'https://cineb.net/movie?page=222',\n",
284 | " 'https://cineb.net/movie?page=223',\n",
285 | " 'https://cineb.net/movie?page=224',\n",
286 | " 'https://cineb.net/movie?page=225',\n",
287 | " 'https://cineb.net/movie?page=226',\n",
288 | " 'https://cineb.net/movie?page=227',\n",
289 | " 'https://cineb.net/movie?page=228',\n",
290 | " 'https://cineb.net/movie?page=229',\n",
291 | " 'https://cineb.net/movie?page=230',\n",
292 | " 'https://cineb.net/movie?page=231',\n",
293 | " 'https://cineb.net/movie?page=232',\n",
294 | " 'https://cineb.net/movie?page=233',\n",
295 | " 'https://cineb.net/movie?page=234',\n",
296 | " 'https://cineb.net/movie?page=235',\n",
297 | " 'https://cineb.net/movie?page=236',\n",
298 | " 'https://cineb.net/movie?page=237',\n",
299 | " 'https://cineb.net/movie?page=238',\n",
300 | " 'https://cineb.net/movie?page=239',\n",
301 | " 'https://cineb.net/movie?page=240',\n",
302 | " 'https://cineb.net/movie?page=241',\n",
303 | " 'https://cineb.net/movie?page=242',\n",
304 | " 'https://cineb.net/movie?page=243',\n",
305 | " 'https://cineb.net/movie?page=244',\n",
306 | " 'https://cineb.net/movie?page=245',\n",
307 | " 'https://cineb.net/movie?page=246',\n",
308 | " 'https://cineb.net/movie?page=247',\n",
309 | " 'https://cineb.net/movie?page=248',\n",
310 | " 'https://cineb.net/movie?page=249',\n",
311 | " 'https://cineb.net/movie?page=250',\n",
312 | " 'https://cineb.net/movie?page=251',\n",
313 | " 'https://cineb.net/movie?page=252',\n",
314 | " 'https://cineb.net/movie?page=253',\n",
315 | " 'https://cineb.net/movie?page=254',\n",
316 | " 'https://cineb.net/movie?page=255',\n",
317 | " 'https://cineb.net/movie?page=256',\n",
318 | " 'https://cineb.net/movie?page=257',\n",
319 | " 'https://cineb.net/movie?page=258',\n",
320 | " 'https://cineb.net/movie?page=259',\n",
321 | " 'https://cineb.net/movie?page=260',\n",
322 | " 'https://cineb.net/movie?page=261',\n",
323 | " 'https://cineb.net/movie?page=262',\n",
324 | " 'https://cineb.net/movie?page=263',\n",
325 | " 'https://cineb.net/movie?page=264',\n",
326 | " 'https://cineb.net/movie?page=265',\n",
327 | " 'https://cineb.net/movie?page=266',\n",
328 | " 'https://cineb.net/movie?page=267',\n",
329 | " 'https://cineb.net/movie?page=268',\n",
330 | " 'https://cineb.net/movie?page=269',\n",
331 | " 'https://cineb.net/movie?page=270',\n",
332 | " 'https://cineb.net/movie?page=271',\n",
333 | " 'https://cineb.net/movie?page=272',\n",
334 | " 'https://cineb.net/movie?page=273',\n",
335 | " 'https://cineb.net/movie?page=274',\n",
336 | " 'https://cineb.net/movie?page=275',\n",
337 | " 'https://cineb.net/movie?page=276',\n",
338 | " 'https://cineb.net/movie?page=277',\n",
339 | " 'https://cineb.net/movie?page=278',\n",
340 | " 'https://cineb.net/movie?page=279',\n",
341 | " 'https://cineb.net/movie?page=280',\n",
342 | " 'https://cineb.net/movie?page=281',\n",
343 | " 'https://cineb.net/movie?page=282',\n",
344 | " 'https://cineb.net/movie?page=283',\n",
345 | " 'https://cineb.net/movie?page=284',\n",
346 | " 'https://cineb.net/movie?page=285',\n",
347 | " 'https://cineb.net/movie?page=286',\n",
348 | " 'https://cineb.net/movie?page=287',\n",
349 | " 'https://cineb.net/movie?page=288',\n",
350 | " 'https://cineb.net/movie?page=289',\n",
351 | " 'https://cineb.net/movie?page=290',\n",
352 | " 'https://cineb.net/movie?page=291',\n",
353 | " 'https://cineb.net/movie?page=292',\n",
354 | " 'https://cineb.net/movie?page=293',\n",
355 | " 'https://cineb.net/movie?page=294',\n",
356 | " 'https://cineb.net/movie?page=295',\n",
357 | " 'https://cineb.net/movie?page=296',\n",
358 | " 'https://cineb.net/movie?page=297',\n",
359 | " 'https://cineb.net/movie?page=298',\n",
360 | " 'https://cineb.net/movie?page=299',\n",
361 | " 'https://cineb.net/movie?page=300',\n",
362 | " 'https://cineb.net/movie?page=301',\n",
363 | " 'https://cineb.net/movie?page=302',\n",
364 | " 'https://cineb.net/movie?page=303',\n",
365 | " 'https://cineb.net/movie?page=304',\n",
366 | " 'https://cineb.net/movie?page=305',\n",
367 | " 'https://cineb.net/movie?page=306',\n",
368 | " 'https://cineb.net/movie?page=307',\n",
369 | " 'https://cineb.net/movie?page=308',\n",
370 | " 'https://cineb.net/movie?page=309',\n",
371 | " 'https://cineb.net/movie?page=310',\n",
372 | " 'https://cineb.net/movie?page=311',\n",
373 | " 'https://cineb.net/movie?page=312',\n",
374 | " 'https://cineb.net/movie?page=313',\n",
375 | " 'https://cineb.net/movie?page=314',\n",
376 | " 'https://cineb.net/movie?page=315',\n",
377 | " 'https://cineb.net/movie?page=316',\n",
378 | " 'https://cineb.net/movie?page=317',\n",
379 | " 'https://cineb.net/movie?page=318',\n",
380 | " 'https://cineb.net/movie?page=319',\n",
381 | " 'https://cineb.net/movie?page=320',\n",
382 | " 'https://cineb.net/movie?page=321',\n",
383 | " 'https://cineb.net/movie?page=322',\n",
384 | " 'https://cineb.net/movie?page=323',\n",
385 | " 'https://cineb.net/movie?page=324',\n",
386 | " 'https://cineb.net/movie?page=325',\n",
387 | " 'https://cineb.net/movie?page=326',\n",
388 | " 'https://cineb.net/movie?page=327',\n",
389 | " 'https://cineb.net/movie?page=328',\n",
390 | " 'https://cineb.net/movie?page=329',\n",
391 | " 'https://cineb.net/movie?page=330',\n",
392 | " 'https://cineb.net/movie?page=331',\n",
393 | " 'https://cineb.net/movie?page=332',\n",
394 | " 'https://cineb.net/movie?page=333',\n",
395 | " 'https://cineb.net/movie?page=334',\n",
396 | " 'https://cineb.net/movie?page=335',\n",
397 | " 'https://cineb.net/movie?page=336',\n",
398 | " 'https://cineb.net/movie?page=337',\n",
399 | " 'https://cineb.net/movie?page=338',\n",
400 | " 'https://cineb.net/movie?page=339',\n",
401 | " 'https://cineb.net/movie?page=340',\n",
402 | " 'https://cineb.net/movie?page=341',\n",
403 | " 'https://cineb.net/movie?page=342',\n",
404 | " 'https://cineb.net/movie?page=343',\n",
405 | " 'https://cineb.net/movie?page=344',\n",
406 | " 'https://cineb.net/movie?page=345',\n",
407 | " 'https://cineb.net/movie?page=346',\n",
408 | " 'https://cineb.net/movie?page=347',\n",
409 | " 'https://cineb.net/movie?page=348',\n",
410 | " 'https://cineb.net/movie?page=349',\n",
411 | " 'https://cineb.net/movie?page=350',\n",
412 | " 'https://cineb.net/movie?page=351',\n",
413 | " 'https://cineb.net/movie?page=352',\n",
414 | " 'https://cineb.net/movie?page=353',\n",
415 | " 'https://cineb.net/movie?page=354',\n",
416 | " 'https://cineb.net/movie?page=355',\n",
417 | " 'https://cineb.net/movie?page=356',\n",
418 | " 'https://cineb.net/movie?page=357',\n",
419 | " 'https://cineb.net/movie?page=358',\n",
420 | " 'https://cineb.net/movie?page=359',\n",
421 | " 'https://cineb.net/movie?page=360',\n",
422 | " 'https://cineb.net/movie?page=361',\n",
423 | " 'https://cineb.net/movie?page=362',\n",
424 | " 'https://cineb.net/movie?page=363',\n",
425 | " 'https://cineb.net/movie?page=364',\n",
426 | " 'https://cineb.net/movie?page=365',\n",
427 | " 'https://cineb.net/movie?page=366',\n",
428 | " 'https://cineb.net/movie?page=367',\n",
429 | " 'https://cineb.net/movie?page=368',\n",
430 | " 'https://cineb.net/movie?page=369',\n",
431 | " 'https://cineb.net/movie?page=370',\n",
432 | " 'https://cineb.net/movie?page=371',\n",
433 | " 'https://cineb.net/movie?page=372',\n",
434 | " 'https://cineb.net/movie?page=373',\n",
435 | " 'https://cineb.net/movie?page=374',\n",
436 | " 'https://cineb.net/movie?page=375',\n",
437 | " 'https://cineb.net/movie?page=376',\n",
438 | " 'https://cineb.net/movie?page=377',\n",
439 | " 'https://cineb.net/movie?page=378',\n",
440 | " 'https://cineb.net/movie?page=379',\n",
441 | " 'https://cineb.net/movie?page=380',\n",
442 | " 'https://cineb.net/movie?page=381',\n",
443 | " 'https://cineb.net/movie?page=382',\n",
444 | " 'https://cineb.net/movie?page=383',\n",
445 | " 'https://cineb.net/movie?page=384',\n",
446 | " 'https://cineb.net/movie?page=385',\n",
447 | " 'https://cineb.net/movie?page=386',\n",
448 | " 'https://cineb.net/movie?page=387',\n",
449 | " 'https://cineb.net/movie?page=388',\n",
450 | " 'https://cineb.net/movie?page=389',\n",
451 | " 'https://cineb.net/movie?page=390',\n",
452 | " 'https://cineb.net/movie?page=391',\n",
453 | " 'https://cineb.net/movie?page=392',\n",
454 | " 'https://cineb.net/movie?page=393',\n",
455 | " 'https://cineb.net/movie?page=394',\n",
456 | " 'https://cineb.net/movie?page=395',\n",
457 | " 'https://cineb.net/movie?page=396',\n",
458 | " 'https://cineb.net/movie?page=397',\n",
459 | " 'https://cineb.net/movie?page=398',\n",
460 | " 'https://cineb.net/movie?page=399',\n",
461 | " 'https://cineb.net/movie?page=400',\n",
462 | " 'https://cineb.net/movie?page=401',\n",
463 | " 'https://cineb.net/movie?page=402',\n",
464 | " 'https://cineb.net/movie?page=403',\n",
465 | " 'https://cineb.net/movie?page=404',\n",
466 | " 'https://cineb.net/movie?page=405',\n",
467 | " 'https://cineb.net/movie?page=406',\n",
468 | " 'https://cineb.net/movie?page=407',\n",
469 | " 'https://cineb.net/movie?page=408',\n",
470 | " 'https://cineb.net/movie?page=409',\n",
471 | " 'https://cineb.net/movie?page=410',\n",
472 | " 'https://cineb.net/movie?page=411',\n",
473 | " 'https://cineb.net/movie?page=412',\n",
474 | " 'https://cineb.net/movie?page=413',\n",
475 | " 'https://cineb.net/movie?page=414',\n",
476 | " 'https://cineb.net/movie?page=415',\n",
477 | " 'https://cineb.net/movie?page=416',\n",
478 | " 'https://cineb.net/movie?page=417',\n",
479 | " 'https://cineb.net/movie?page=418',\n",
480 | " 'https://cineb.net/movie?page=419',\n",
481 | " 'https://cineb.net/movie?page=420',\n",
482 | " 'https://cineb.net/movie?page=421',\n",
483 | " 'https://cineb.net/movie?page=422',\n",
484 | " 'https://cineb.net/movie?page=423',\n",
485 | " 'https://cineb.net/movie?page=424',\n",
486 | " 'https://cineb.net/movie?page=425',\n",
487 | " 'https://cineb.net/movie?page=426',\n",
488 | " 'https://cineb.net/movie?page=427',\n",
489 | " 'https://cineb.net/movie?page=428',\n",
490 | " 'https://cineb.net/movie?page=429',\n",
491 | " 'https://cineb.net/movie?page=430',\n",
492 | " 'https://cineb.net/movie?page=431',\n",
493 | " 'https://cineb.net/movie?page=432',\n",
494 | " 'https://cineb.net/movie?page=433',\n",
495 | " 'https://cineb.net/movie?page=434',\n",
496 | " 'https://cineb.net/movie?page=435',\n",
497 | " 'https://cineb.net/movie?page=436',\n",
498 | " 'https://cineb.net/movie?page=437',\n",
499 | " 'https://cineb.net/movie?page=438',\n",
500 | " 'https://cineb.net/movie?page=439',\n",
501 | " 'https://cineb.net/movie?page=440',\n",
502 | " 'https://cineb.net/movie?page=441',\n",
503 | " 'https://cineb.net/movie?page=442',\n",
504 | " 'https://cineb.net/movie?page=443',\n",
505 | " 'https://cineb.net/movie?page=444',\n",
506 | " 'https://cineb.net/movie?page=445',\n",
507 | " 'https://cineb.net/movie?page=446',\n",
508 | " 'https://cineb.net/movie?page=447',\n",
509 | " 'https://cineb.net/movie?page=448',\n",
510 | " 'https://cineb.net/movie?page=449',\n",
511 | " 'https://cineb.net/movie?page=450',\n",
512 | " 'https://cineb.net/movie?page=451',\n",
513 | " 'https://cineb.net/movie?page=452',\n",
514 | " 'https://cineb.net/movie?page=453',\n",
515 | " 'https://cineb.net/movie?page=454',\n",
516 | " 'https://cineb.net/movie?page=455',\n",
517 | " 'https://cineb.net/movie?page=456',\n",
518 | " 'https://cineb.net/movie?page=457',\n",
519 | " 'https://cineb.net/movie?page=458',\n",
520 | " 'https://cineb.net/movie?page=459',\n",
521 | " 'https://cineb.net/movie?page=460',\n",
522 | " 'https://cineb.net/movie?page=461',\n",
523 | " 'https://cineb.net/movie?page=462',\n",
524 | " 'https://cineb.net/movie?page=463',\n",
525 | " 'https://cineb.net/movie?page=464',\n",
526 | " 'https://cineb.net/movie?page=465',\n",
527 | " 'https://cineb.net/movie?page=466',\n",
528 | " 'https://cineb.net/movie?page=467',\n",
529 | " 'https://cineb.net/movie?page=468',\n",
530 | " 'https://cineb.net/movie?page=469',\n",
531 | " 'https://cineb.net/movie?page=470',\n",
532 | " 'https://cineb.net/movie?page=471',\n",
533 | " 'https://cineb.net/movie?page=472',\n",
534 | " 'https://cineb.net/movie?page=473',\n",
535 | " 'https://cineb.net/movie?page=474',\n",
536 | " 'https://cineb.net/movie?page=475',\n",
537 | " 'https://cineb.net/movie?page=476',\n",
538 | " 'https://cineb.net/movie?page=477',\n",
539 | " 'https://cineb.net/movie?page=478',\n",
540 | " 'https://cineb.net/movie?page=479',\n",
541 | " 'https://cineb.net/movie?page=480',\n",
542 | " 'https://cineb.net/movie?page=481',\n",
543 | " 'https://cineb.net/movie?page=482',\n",
544 | " 'https://cineb.net/movie?page=483',\n",
545 | " 'https://cineb.net/movie?page=484',\n",
546 | " 'https://cineb.net/movie?page=485',\n",
547 | " 'https://cineb.net/movie?page=486',\n",
548 | " 'https://cineb.net/movie?page=487',\n",
549 | " 'https://cineb.net/movie?page=488',\n",
550 | " 'https://cineb.net/movie?page=489',\n",
551 | " 'https://cineb.net/movie?page=490',\n",
552 | " 'https://cineb.net/movie?page=491',\n",
553 | " 'https://cineb.net/movie?page=492',\n",
554 | " 'https://cineb.net/movie?page=493',\n",
555 | " 'https://cineb.net/movie?page=494',\n",
556 | " 'https://cineb.net/movie?page=495',\n",
557 | " 'https://cineb.net/movie?page=496',\n",
558 | " 'https://cineb.net/movie?page=497',\n",
559 | " 'https://cineb.net/movie?page=498',\n",
560 | " 'https://cineb.net/movie?page=499',\n",
561 | " 'https://cineb.net/movie?page=500',\n",
562 | " 'https://cineb.net/movie?page=501',\n",
563 | " 'https://cineb.net/movie?page=502',\n",
564 | " 'https://cineb.net/movie?page=503',\n",
565 | " 'https://cineb.net/movie?page=504',\n",
566 | " 'https://cineb.net/movie?page=505',\n",
567 | " 'https://cineb.net/movie?page=506',\n",
568 | " 'https://cineb.net/movie?page=507',\n",
569 | " 'https://cineb.net/movie?page=508',\n",
570 | " 'https://cineb.net/movie?page=509',\n",
571 | " 'https://cineb.net/movie?page=510',\n",
572 | " 'https://cineb.net/movie?page=511',\n",
573 | " 'https://cineb.net/movie?page=512',\n",
574 | " 'https://cineb.net/movie?page=513',\n",
575 | " 'https://cineb.net/movie?page=514',\n",
576 | " 'https://cineb.net/movie?page=515',\n",
577 | " 'https://cineb.net/movie?page=516',\n",
578 | " 'https://cineb.net/movie?page=517',\n",
579 | " 'https://cineb.net/movie?page=518',\n",
580 | " 'https://cineb.net/movie?page=519',\n",
581 | " 'https://cineb.net/movie?page=520',\n",
582 | " 'https://cineb.net/movie?page=521',\n",
583 | " 'https://cineb.net/movie?page=522',\n",
584 | " 'https://cineb.net/movie?page=523',\n",
585 | " 'https://cineb.net/movie?page=524',\n",
586 | " 'https://cineb.net/movie?page=525',\n",
587 | " 'https://cineb.net/movie?page=526',\n",
588 | " 'https://cineb.net/movie?page=527',\n",
589 | " 'https://cineb.net/movie?page=528',\n",
590 | " 'https://cineb.net/movie?page=529',\n",
591 | " 'https://cineb.net/movie?page=530',\n",
592 | " 'https://cineb.net/movie?page=531',\n",
593 | " 'https://cineb.net/movie?page=532',\n",
594 | " 'https://cineb.net/movie?page=533',\n",
595 | " 'https://cineb.net/movie?page=534',\n",
596 | " 'https://cineb.net/movie?page=535',\n",
597 | " 'https://cineb.net/movie?page=536',\n",
598 | " 'https://cineb.net/movie?page=537',\n",
599 | " 'https://cineb.net/movie?page=538',\n",
600 | " 'https://cineb.net/movie?page=539',\n",
601 | " 'https://cineb.net/movie?page=540',\n",
602 | " 'https://cineb.net/movie?page=541',\n",
603 | " 'https://cineb.net/movie?page=542',\n",
604 | " 'https://cineb.net/movie?page=543',\n",
605 | " 'https://cineb.net/movie?page=544',\n",
606 | " 'https://cineb.net/movie?page=545',\n",
607 | " 'https://cineb.net/movie?page=546',\n",
608 | " 'https://cineb.net/movie?page=547',\n",
609 | " 'https://cineb.net/movie?page=548',\n",
610 | " 'https://cineb.net/movie?page=549',\n",
611 | " 'https://cineb.net/movie?page=550',\n",
612 | " 'https://cineb.net/movie?page=551',\n",
613 | " 'https://cineb.net/movie?page=552',\n",
614 | " 'https://cineb.net/movie?page=553',\n",
615 | " 'https://cineb.net/movie?page=554',\n",
616 | " 'https://cineb.net/movie?page=555',\n",
617 | " 'https://cineb.net/movie?page=556',\n",
618 | " 'https://cineb.net/movie?page=557',\n",
619 | " 'https://cineb.net/movie?page=558',\n",
620 | " 'https://cineb.net/movie?page=559',\n",
621 | " 'https://cineb.net/movie?page=560',\n",
622 | " 'https://cineb.net/movie?page=561',\n",
623 | " 'https://cineb.net/movie?page=562',\n",
624 | " 'https://cineb.net/movie?page=563',\n",
625 | " 'https://cineb.net/movie?page=564',\n",
626 | " 'https://cineb.net/movie?page=565',\n",
627 | " 'https://cineb.net/movie?page=566',\n",
628 | " 'https://cineb.net/movie?page=567',\n",
629 | " 'https://cineb.net/movie?page=568',\n",
630 | " 'https://cineb.net/movie?page=569',\n",
631 | " 'https://cineb.net/movie?page=570',\n",
632 | " 'https://cineb.net/movie?page=571',\n",
633 | " 'https://cineb.net/movie?page=572',\n",
634 | " 'https://cineb.net/movie?page=573',\n",
635 | " 'https://cineb.net/movie?page=574',\n",
636 | " 'https://cineb.net/movie?page=575',\n",
637 | " 'https://cineb.net/movie?page=576',\n",
638 | " 'https://cineb.net/movie?page=577',\n",
639 | " 'https://cineb.net/movie?page=578',\n",
640 | " 'https://cineb.net/movie?page=579',\n",
641 | " 'https://cineb.net/movie?page=580',\n",
642 | " 'https://cineb.net/movie?page=581',\n",
643 | " 'https://cineb.net/movie?page=582',\n",
644 | " 'https://cineb.net/movie?page=583',\n",
645 | " 'https://cineb.net/movie?page=584',\n",
646 | " 'https://cineb.net/movie?page=585',\n",
647 | " 'https://cineb.net/movie?page=586',\n",
648 | " 'https://cineb.net/movie?page=587',\n",
649 | " 'https://cineb.net/movie?page=588',\n",
650 | " 'https://cineb.net/movie?page=589',\n",
651 | " 'https://cineb.net/movie?page=590',\n",
652 | " 'https://cineb.net/movie?page=591',\n",
653 | " 'https://cineb.net/movie?page=592',\n",
654 | " 'https://cineb.net/movie?page=593',\n",
655 | " 'https://cineb.net/movie?page=594',\n",
656 | " 'https://cineb.net/movie?page=595',\n",
657 | " 'https://cineb.net/movie?page=596',\n",
658 | " 'https://cineb.net/movie?page=597',\n",
659 | " 'https://cineb.net/movie?page=598',\n",
660 | " 'https://cineb.net/movie?page=599',\n",
661 | " 'https://cineb.net/movie?page=600',\n",
662 | " 'https://cineb.net/movie?page=601',\n",
663 | " 'https://cineb.net/movie?page=602',\n",
664 | " 'https://cineb.net/movie?page=603',\n",
665 | " 'https://cineb.net/movie?page=604',\n",
666 | " 'https://cineb.net/movie?page=605',\n",
667 | " 'https://cineb.net/movie?page=606',\n",
668 | " 'https://cineb.net/movie?page=607',\n",
669 | " 'https://cineb.net/movie?page=608',\n",
670 | " 'https://cineb.net/movie?page=609',\n",
671 | " 'https://cineb.net/movie?page=610',\n",
672 | " 'https://cineb.net/movie?page=611',\n",
673 | " 'https://cineb.net/movie?page=612',\n",
674 | " 'https://cineb.net/movie?page=613',\n",
675 | " 'https://cineb.net/movie?page=614',\n",
676 | " 'https://cineb.net/movie?page=615',\n",
677 | " 'https://cineb.net/movie?page=616',\n",
678 | " 'https://cineb.net/movie?page=617',\n",
679 | " 'https://cineb.net/movie?page=618',\n",
680 | " 'https://cineb.net/movie?page=619',\n",
681 | " 'https://cineb.net/movie?page=620',\n",
682 | " 'https://cineb.net/movie?page=621',\n",
683 | " 'https://cineb.net/movie?page=622',\n",
684 | " 'https://cineb.net/movie?page=623',\n",
685 | " 'https://cineb.net/movie?page=624',\n",
686 | " 'https://cineb.net/movie?page=625',\n",
687 | " 'https://cineb.net/movie?page=626',\n",
688 | " 'https://cineb.net/movie?page=627',\n",
689 | " 'https://cineb.net/movie?page=628',\n",
690 | " 'https://cineb.net/movie?page=629',\n",
691 | " 'https://cineb.net/movie?page=630',\n",
692 | " 'https://cineb.net/movie?page=631',\n",
693 | " 'https://cineb.net/movie?page=632',\n",
694 | " 'https://cineb.net/movie?page=633',\n",
695 | " 'https://cineb.net/movie?page=634',\n",
696 | " 'https://cineb.net/movie?page=635',\n",
697 | " 'https://cineb.net/movie?page=636',\n",
698 | " 'https://cineb.net/movie?page=637',\n",
699 | " 'https://cineb.net/movie?page=638',\n",
700 | " 'https://cineb.net/movie?page=639',\n",
701 | " 'https://cineb.net/movie?page=640',\n",
702 | " 'https://cineb.net/movie?page=641',\n",
703 | " 'https://cineb.net/movie?page=642',\n",
704 | " 'https://cineb.net/movie?page=643',\n",
705 | " 'https://cineb.net/movie?page=644',\n",
706 | " 'https://cineb.net/movie?page=645',\n",
707 | " 'https://cineb.net/movie?page=646',\n",
708 | " 'https://cineb.net/movie?page=647',\n",
709 | " 'https://cineb.net/movie?page=648',\n",
710 | " 'https://cineb.net/movie?page=649',\n",
711 | " 'https://cineb.net/movie?page=650',\n",
712 | " 'https://cineb.net/movie?page=651',\n",
713 | " 'https://cineb.net/movie?page=652',\n",
714 | " 'https://cineb.net/movie?page=653',\n",
715 | " 'https://cineb.net/movie?page=654',\n",
716 | " 'https://cineb.net/movie?page=655',\n",
717 | " 'https://cineb.net/movie?page=656',\n",
718 | " 'https://cineb.net/movie?page=657',\n",
719 | " 'https://cineb.net/movie?page=658',\n",
720 | " 'https://cineb.net/movie?page=659',\n",
721 | " 'https://cineb.net/movie?page=660',\n",
722 | " 'https://cineb.net/movie?page=661',\n",
723 | " 'https://cineb.net/movie?page=662',\n",
724 | " 'https://cineb.net/movie?page=663',\n",
725 | " 'https://cineb.net/movie?page=664',\n",
726 | " 'https://cineb.net/movie?page=665',\n",
727 | " 'https://cineb.net/movie?page=666',\n",
728 | " 'https://cineb.net/movie?page=667',\n",
729 | " 'https://cineb.net/movie?page=668',\n",
730 | " 'https://cineb.net/movie?page=669',\n",
731 | " 'https://cineb.net/movie?page=670',\n",
732 | " 'https://cineb.net/movie?page=671',\n",
733 | " 'https://cineb.net/movie?page=672',\n",
734 | " 'https://cineb.net/movie?page=673',\n",
735 | " 'https://cineb.net/movie?page=674',\n",
736 | " 'https://cineb.net/movie?page=675',\n",
737 | " 'https://cineb.net/movie?page=676',\n",
738 | " 'https://cineb.net/movie?page=677',\n",
739 | " 'https://cineb.net/movie?page=678',\n",
740 | " 'https://cineb.net/movie?page=679',\n",
741 | " 'https://cineb.net/movie?page=680',\n",
742 | " 'https://cineb.net/movie?page=681',\n",
743 | " 'https://cineb.net/movie?page=682',\n",
744 | " 'https://cineb.net/movie?page=683',\n",
745 | " 'https://cineb.net/movie?page=684',\n",
746 | " 'https://cineb.net/movie?page=685',\n",
747 | " 'https://cineb.net/movie?page=686',\n",
748 | " 'https://cineb.net/movie?page=687',\n",
749 | " 'https://cineb.net/movie?page=688',\n",
750 | " 'https://cineb.net/movie?page=689',\n",
751 | " 'https://cineb.net/movie?page=690',\n",
752 | " 'https://cineb.net/movie?page=691',\n",
753 | " 'https://cineb.net/movie?page=692',\n",
754 | " 'https://cineb.net/movie?page=693',\n",
755 | " 'https://cineb.net/movie?page=694',\n",
756 | " 'https://cineb.net/movie?page=695',\n",
757 | " 'https://cineb.net/movie?page=696',\n",
758 | " 'https://cineb.net/movie?page=697',\n",
759 | " 'https://cineb.net/movie?page=698',\n",
760 | " 'https://cineb.net/movie?page=699',\n",
761 | " 'https://cineb.net/movie?page=700',\n",
762 | " 'https://cineb.net/movie?page=701',\n",
763 | " 'https://cineb.net/movie?page=702',\n",
764 | " 'https://cineb.net/movie?page=703',\n",
765 | " 'https://cineb.net/movie?page=704',\n",
766 | " 'https://cineb.net/movie?page=705',\n",
767 | " 'https://cineb.net/movie?page=706',\n",
768 | " 'https://cineb.net/movie?page=707',\n",
769 | " 'https://cineb.net/movie?page=708',\n",
770 | " 'https://cineb.net/movie?page=709',\n",
771 | " 'https://cineb.net/movie?page=710',\n",
772 | " 'https://cineb.net/movie?page=711',\n",
773 | " 'https://cineb.net/movie?page=712',\n",
774 | " 'https://cineb.net/movie?page=713',\n",
775 | " 'https://cineb.net/movie?page=714',\n",
776 | " 'https://cineb.net/movie?page=715',\n",
777 | " 'https://cineb.net/movie?page=716',\n",
778 | " 'https://cineb.net/movie?page=717',\n",
779 | " 'https://cineb.net/movie?page=718',\n",
780 | " 'https://cineb.net/movie?page=719',\n",
781 | " 'https://cineb.net/movie?page=720',\n",
782 | " 'https://cineb.net/movie?page=721',\n",
783 | " 'https://cineb.net/movie?page=722',\n",
784 | " 'https://cineb.net/movie?page=723',\n",
785 | " 'https://cineb.net/movie?page=724',\n",
786 | " 'https://cineb.net/movie?page=725',\n",
787 | " 'https://cineb.net/movie?page=726',\n",
788 | " 'https://cineb.net/movie?page=727',\n",
789 | " 'https://cineb.net/movie?page=728',\n",
790 | " 'https://cineb.net/movie?page=729',\n",
791 | " 'https://cineb.net/movie?page=730',\n",
792 | " 'https://cineb.net/movie?page=731',\n",
793 | " 'https://cineb.net/movie?page=732',\n",
794 | " 'https://cineb.net/movie?page=733',\n",
795 | " 'https://cineb.net/movie?page=734',\n",
796 | " 'https://cineb.net/movie?page=735',\n",
797 | " 'https://cineb.net/movie?page=736',\n",
798 | " 'https://cineb.net/movie?page=737',\n",
799 | " 'https://cineb.net/movie?page=738',\n",
800 | " 'https://cineb.net/movie?page=739',\n",
801 | " 'https://cineb.net/movie?page=740',\n",
802 | " 'https://cineb.net/movie?page=741',\n",
803 | " 'https://cineb.net/movie?page=742',\n",
804 | " 'https://cineb.net/movie?page=743',\n",
805 | " 'https://cineb.net/movie?page=744',\n",
806 | " 'https://cineb.net/movie?page=745',\n",
807 | " 'https://cineb.net/movie?page=746',\n",
808 | " 'https://cineb.net/movie?page=747',\n",
809 | " 'https://cineb.net/movie?page=748',\n",
810 | " 'https://cineb.net/movie?page=749',\n",
811 | " 'https://cineb.net/movie?page=750',\n",
812 | " 'https://cineb.net/movie?page=751',\n",
813 | " 'https://cineb.net/movie?page=752',\n",
814 | " 'https://cineb.net/movie?page=753',\n",
815 | " 'https://cineb.net/movie?page=754',\n",
816 | " 'https://cineb.net/movie?page=755',\n",
817 | " 'https://cineb.net/movie?page=756',\n",
818 | " 'https://cineb.net/movie?page=757',\n",
819 | " 'https://cineb.net/movie?page=758',\n",
820 | " 'https://cineb.net/movie?page=759',\n",
821 | " 'https://cineb.net/movie?page=760',\n",
822 | " 'https://cineb.net/movie?page=761',\n",
823 | " 'https://cineb.net/movie?page=762',\n",
824 | " 'https://cineb.net/movie?page=763',\n",
825 | " 'https://cineb.net/movie?page=764',\n",
826 | " 'https://cineb.net/movie?page=765',\n",
827 | " 'https://cineb.net/movie?page=766',\n",
828 | " 'https://cineb.net/movie?page=767',\n",
829 | " 'https://cineb.net/movie?page=768',\n",
830 | " 'https://cineb.net/movie?page=769',\n",
831 | " 'https://cineb.net/movie?page=770',\n",
832 | " 'https://cineb.net/movie?page=771',\n",
833 | " 'https://cineb.net/movie?page=772',\n",
834 | " 'https://cineb.net/movie?page=773',\n",
835 | " 'https://cineb.net/movie?page=774',\n",
836 | " 'https://cineb.net/movie?page=775',\n",
837 | " 'https://cineb.net/movie?page=776',\n",
838 | " 'https://cineb.net/movie?page=777',\n",
839 | " 'https://cineb.net/movie?page=778',\n",
840 | " 'https://cineb.net/movie?page=779',\n",
841 | " 'https://cineb.net/movie?page=780',\n",
842 | " 'https://cineb.net/movie?page=781',\n",
843 | " 'https://cineb.net/movie?page=782',\n",
844 | " 'https://cineb.net/movie?page=783',\n",
845 | " 'https://cineb.net/movie?page=784',\n",
846 | " 'https://cineb.net/movie?page=785',\n",
847 | " 'https://cineb.net/movie?page=786',\n",
848 | " 'https://cineb.net/movie?page=787',\n",
849 | " 'https://cineb.net/movie?page=788',\n",
850 | " 'https://cineb.net/movie?page=789',\n",
851 | " 'https://cineb.net/movie?page=790',\n",
852 | " 'https://cineb.net/movie?page=791',\n",
853 | " 'https://cineb.net/movie?page=792',\n",
854 | " 'https://cineb.net/movie?page=793',\n",
855 | " 'https://cineb.net/movie?page=794',\n",
856 | " 'https://cineb.net/movie?page=795',\n",
857 | " 'https://cineb.net/movie?page=796',\n",
858 | " 'https://cineb.net/movie?page=797',\n",
859 | " 'https://cineb.net/movie?page=798',\n",
860 | " 'https://cineb.net/movie?page=799',\n",
861 | " 'https://cineb.net/movie?page=800',\n",
862 | " 'https://cineb.net/movie?page=801',\n",
863 | " 'https://cineb.net/movie?page=802',\n",
864 | " 'https://cineb.net/movie?page=803',\n",
865 | " 'https://cineb.net/movie?page=804',\n",
866 | " 'https://cineb.net/movie?page=805',\n",
867 | " 'https://cineb.net/movie?page=806',\n",
868 | " 'https://cineb.net/movie?page=807',\n",
869 | " 'https://cineb.net/movie?page=808',\n",
870 | " 'https://cineb.net/movie?page=809',\n",
871 | " 'https://cineb.net/movie?page=810',\n",
872 | " 'https://cineb.net/movie?page=811',\n",
873 | " 'https://cineb.net/movie?page=812',\n",
874 | " 'https://cineb.net/movie?page=813',\n",
875 | " 'https://cineb.net/movie?page=814',\n",
876 | " 'https://cineb.net/movie?page=815',\n",
877 | " 'https://cineb.net/movie?page=816',\n",
878 | " 'https://cineb.net/movie?page=817',\n",
879 | " 'https://cineb.net/movie?page=818',\n",
880 | " 'https://cineb.net/movie?page=819',\n",
881 | " 'https://cineb.net/movie?page=820',\n",
882 | " 'https://cineb.net/movie?page=821',\n",
883 | " 'https://cineb.net/movie?page=822',\n",
884 | " 'https://cineb.net/movie?page=823',\n",
885 | " 'https://cineb.net/movie?page=824',\n",
886 | " 'https://cineb.net/movie?page=825',\n",
887 | " 'https://cineb.net/movie?page=826',\n",
888 | " 'https://cineb.net/movie?page=827',\n",
889 | " 'https://cineb.net/movie?page=828',\n",
890 | " 'https://cineb.net/movie?page=829',\n",
891 | " 'https://cineb.net/movie?page=830',\n",
892 | " 'https://cineb.net/movie?page=831',\n",
893 | " 'https://cineb.net/movie?page=832',\n",
894 | " 'https://cineb.net/movie?page=833',\n",
895 | " 'https://cineb.net/movie?page=834',\n",
896 | " 'https://cineb.net/movie?page=835',\n",
897 | " 'https://cineb.net/movie?page=836',\n",
898 | " 'https://cineb.net/movie?page=837',\n",
899 | " 'https://cineb.net/movie?page=838',\n",
900 | " 'https://cineb.net/movie?page=839',\n",
901 | " 'https://cineb.net/movie?page=840',\n",
902 | " 'https://cineb.net/movie?page=841',\n",
903 | " 'https://cineb.net/movie?page=842',\n",
904 | " 'https://cineb.net/movie?page=843',\n",
905 | " 'https://cineb.net/movie?page=844',\n",
906 | " 'https://cineb.net/movie?page=845',\n",
907 | " 'https://cineb.net/movie?page=846',\n",
908 | " 'https://cineb.net/movie?page=847',\n",
909 | " 'https://cineb.net/movie?page=848',\n",
910 | " 'https://cineb.net/movie?page=849',\n",
911 | " 'https://cineb.net/movie?page=850',\n",
912 | " 'https://cineb.net/movie?page=851',\n",
913 | " 'https://cineb.net/movie?page=852',\n",
914 | " 'https://cineb.net/movie?page=853',\n",
915 | " 'https://cineb.net/movie?page=854',\n",
916 | " 'https://cineb.net/movie?page=855',\n",
917 | " 'https://cineb.net/movie?page=856',\n",
918 | " 'https://cineb.net/movie?page=857',\n",
919 | " 'https://cineb.net/movie?page=858',\n",
920 | " 'https://cineb.net/movie?page=859',\n",
921 | " 'https://cineb.net/movie?page=860',\n",
922 | " 'https://cineb.net/movie?page=861',\n",
923 | " 'https://cineb.net/movie?page=862',\n",
924 | " 'https://cineb.net/movie?page=863',\n",
925 | " 'https://cineb.net/movie?page=864',\n",
926 | " 'https://cineb.net/movie?page=865',\n",
927 | " 'https://cineb.net/movie?page=866',\n",
928 | " 'https://cineb.net/movie?page=867',\n",
929 | " 'https://cineb.net/movie?page=868',\n",
930 | " 'https://cineb.net/movie?page=869',\n",
931 | " 'https://cineb.net/movie?page=870',\n",
932 | " 'https://cineb.net/movie?page=871',\n",
933 | " 'https://cineb.net/movie?page=872',\n",
934 | " 'https://cineb.net/movie?page=873',\n",
935 | " 'https://cineb.net/movie?page=874',\n",
936 | " 'https://cineb.net/movie?page=875',\n",
937 | " 'https://cineb.net/movie?page=876',\n",
938 | " 'https://cineb.net/movie?page=877',\n",
939 | " 'https://cineb.net/movie?page=878',\n",
940 | " 'https://cineb.net/movie?page=879',\n",
941 | " 'https://cineb.net/movie?page=880',\n",
942 | " 'https://cineb.net/movie?page=881',\n",
943 | " 'https://cineb.net/movie?page=882',\n",
944 | " 'https://cineb.net/movie?page=883',\n",
945 | " 'https://cineb.net/movie?page=884',\n",
946 | " 'https://cineb.net/movie?page=885',\n",
947 | " 'https://cineb.net/movie?page=886',\n",
948 | " 'https://cineb.net/movie?page=887',\n",
949 | " 'https://cineb.net/movie?page=888',\n",
950 | " 'https://cineb.net/movie?page=889',\n",
951 | " 'https://cineb.net/movie?page=890',\n",
952 | " 'https://cineb.net/movie?page=891',\n",
953 | " 'https://cineb.net/movie?page=892',\n",
954 | " 'https://cineb.net/movie?page=893',\n",
955 | " 'https://cineb.net/movie?page=894',\n",
956 | " 'https://cineb.net/movie?page=895',\n",
957 | " 'https://cineb.net/movie?page=896',\n",
958 | " 'https://cineb.net/movie?page=897',\n",
959 | " 'https://cineb.net/movie?page=898',\n",
960 | " 'https://cineb.net/movie?page=899',\n",
961 | " 'https://cineb.net/movie?page=900',\n",
962 | " 'https://cineb.net/movie?page=901',\n",
963 | " 'https://cineb.net/movie?page=902',\n",
964 | " 'https://cineb.net/movie?page=903',\n",
965 | " 'https://cineb.net/movie?page=904',\n",
966 | " 'https://cineb.net/movie?page=905',\n",
967 | " 'https://cineb.net/movie?page=906',\n",
968 | " 'https://cineb.net/movie?page=907',\n",
969 | " 'https://cineb.net/movie?page=908',\n",
970 | " 'https://cineb.net/movie?page=909',\n",
971 | " 'https://cineb.net/movie?page=910',\n",
972 | " 'https://cineb.net/movie?page=911',\n",
973 | " 'https://cineb.net/movie?page=912',\n",
974 | " 'https://cineb.net/movie?page=913',\n",
975 | " 'https://cineb.net/movie?page=914',\n",
976 | " 'https://cineb.net/movie?page=915',\n",
977 | " 'https://cineb.net/movie?page=916',\n",
978 | " 'https://cineb.net/movie?page=917',\n",
979 | " 'https://cineb.net/movie?page=918',\n",
980 | " 'https://cineb.net/movie?page=919',\n",
981 | " 'https://cineb.net/movie?page=920',\n",
982 | " 'https://cineb.net/movie?page=921',\n",
983 | " 'https://cineb.net/movie?page=922',\n",
984 | " 'https://cineb.net/movie?page=923',\n",
985 | " 'https://cineb.net/movie?page=924',\n",
986 | " 'https://cineb.net/movie?page=925',\n",
987 | " 'https://cineb.net/movie?page=926',\n",
988 | " 'https://cineb.net/movie?page=927',\n",
989 | " 'https://cineb.net/movie?page=928',\n",
990 | " 'https://cineb.net/movie?page=929',\n",
991 | " 'https://cineb.net/movie?page=930',\n",
992 | " 'https://cineb.net/movie?page=931',\n",
993 | " 'https://cineb.net/movie?page=932',\n",
994 | " 'https://cineb.net/movie?page=933',\n",
995 | " 'https://cineb.net/movie?page=934',\n",
996 | " 'https://cineb.net/movie?page=935',\n",
997 | " 'https://cineb.net/movie?page=936',\n",
998 | " 'https://cineb.net/movie?page=937',\n",
999 | " 'https://cineb.net/movie?page=938',\n",
1000 | " 'https://cineb.net/movie?page=939',\n",
1001 | " 'https://cineb.net/movie?page=940',\n",
1002 | " 'https://cineb.net/movie?page=941',\n",
1003 | " 'https://cineb.net/movie?page=942',\n",
1004 | " 'https://cineb.net/movie?page=943',\n",
1005 | " 'https://cineb.net/movie?page=944',\n",
1006 | " 'https://cineb.net/movie?page=945',\n",
1007 | " 'https://cineb.net/movie?page=946',\n",
1008 | " 'https://cineb.net/movie?page=947',\n",
1009 | " 'https://cineb.net/movie?page=948',\n",
1010 | " 'https://cineb.net/movie?page=949',\n",
1011 | " 'https://cineb.net/movie?page=950',\n",
1012 | " 'https://cineb.net/movie?page=951',\n",
1013 | " 'https://cineb.net/movie?page=952',\n",
1014 | " 'https://cineb.net/movie?page=953',\n",
1015 | " 'https://cineb.net/movie?page=954',\n",
1016 | " 'https://cineb.net/movie?page=955',\n",
1017 | " 'https://cineb.net/movie?page=956',\n",
1018 | " 'https://cineb.net/movie?page=957',\n",
1019 | " 'https://cineb.net/movie?page=958',\n",
1020 | " 'https://cineb.net/movie?page=959',\n",
1021 | " 'https://cineb.net/movie?page=960',\n",
1022 | " 'https://cineb.net/movie?page=961',\n",
1023 | " 'https://cineb.net/movie?page=962',\n",
1024 | " 'https://cineb.net/movie?page=963',\n",
1025 | " 'https://cineb.net/movie?page=964',\n",
1026 | " 'https://cineb.net/movie?page=965',\n",
1027 | " 'https://cineb.net/movie?page=966',\n",
1028 | " 'https://cineb.net/movie?page=967',\n",
1029 | " 'https://cineb.net/movie?page=968',\n",
1030 | " 'https://cineb.net/movie?page=969',\n",
1031 | " 'https://cineb.net/movie?page=970',\n",
1032 | " 'https://cineb.net/movie?page=971',\n",
1033 | " 'https://cineb.net/movie?page=972',\n",
1034 | " 'https://cineb.net/movie?page=973',\n",
1035 | " 'https://cineb.net/movie?page=974',\n",
1036 | " 'https://cineb.net/movie?page=975',\n",
1037 | " 'https://cineb.net/movie?page=976',\n",
1038 | " 'https://cineb.net/movie?page=977',\n",
1039 | " 'https://cineb.net/movie?page=978',\n",
1040 | " 'https://cineb.net/movie?page=979',\n",
1041 | " 'https://cineb.net/movie?page=980',\n",
1042 | " 'https://cineb.net/movie?page=981',\n",
1043 | " 'https://cineb.net/movie?page=982',\n",
1044 | " 'https://cineb.net/movie?page=983',\n",
1045 | " 'https://cineb.net/movie?page=984',\n",
1046 | " 'https://cineb.net/movie?page=985',\n",
1047 | " 'https://cineb.net/movie?page=986',\n",
1048 | " 'https://cineb.net/movie?page=987',\n",
1049 | " 'https://cineb.net/movie?page=988',\n",
1050 | " 'https://cineb.net/movie?page=989',\n",
1051 | " 'https://cineb.net/movie?page=990',\n",
1052 | " 'https://cineb.net/movie?page=991',\n",
1053 | " 'https://cineb.net/movie?page=992',\n",
1054 | " 'https://cineb.net/movie?page=993',\n",
1055 | " 'https://cineb.net/movie?page=994',\n",
1056 | " 'https://cineb.net/movie?page=995',\n",
1057 | " 'https://cineb.net/movie?page=996',\n",
1058 | " 'https://cineb.net/movie?page=997',\n",
1059 | " 'https://cineb.net/movie?page=998',\n",
1060 | " 'https://cineb.net/movie?page=999',\n",
1061 | " ...]"
1062 | ]
1063 | },
1064 | "execution_count": 3,
1065 | "metadata": {},
1066 | "output_type": "execute_result"
1067 | }
1068 | ],
1069 | "source": [
1070 | "# Generate the listing-page links\n",
1071 | "base_url= 'https://cineb.net/movie?page={}'\n",
1072 | "pages= 1172\n",
1073 | "links= []\n",
1074 | "for i in range(pages):\n",
1075 | "    links.append(base_url.format(i))\n",
1076 | "links"
1077 | ]
1078 | },
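The cell above builds its list with `range(pages)`, which counts from 0 to `pages - 1`, so the list starts at `page=0` and stops one short of `page=1172`. Below is a minimal sketch of an inclusive variant. It assumes the site numbers its listing pages from 1, which the notebook does not confirm, and `build_page_urls` is a hypothetical helper name.

```python
def build_page_urls(base_url, first_page, last_page):
    """Return one URL per page number, with both ends included."""
    return [base_url.format(n) for n in range(first_page, last_page + 1)]


# Assumption: pagination starts at 1 and 1172 is the last page.
links = build_page_urls('https://cineb.net/movie?page={}', 1, 1172)
print(len(links), links[0], links[-1])
```

If the site does accept `page=0`, passing `0` as `first_page` keeps the original starting point while still including the last page.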
1079 | {
1080 | "cell_type": "code",
1081 | "execution_count": 4,
1082 | "id": "aaa36ba2",
1083 | "metadata": {},
1084 | "outputs": [
1085 | {
1086 | "data": {
1087 | "text/plain": [
1099 | "videocam Trailer \n",
1100 | "N/A \n",
1101 | "IMDB: 7.6 \n",
1104 | " The story of Oakland Athletics general manager Billy Beane's successful attempt to put together a baseball team on a budget, by employing computer-generated analysis to draft his players.\n",
1108 | "Released: 2011-09-22\n",
1111 | "Genre: \n",
1112 | "Drama \n",
1115 | "Casts: \n",
1116 | "Brad Pitt , \n",
1118 | "Jonah Hill , \n",
1120 | "Philip Seymour Hoffman , \n",
1122 | "Robin Wright , \n",
1124 | "Chris Pratt \n",
1127 | "Duration: 134\n",
1128 | " min\n",
1144 | "Watch Moneyball Online\n",
1145 | " Free \n",
1146 | "Moneyball Online Free \n",
1147 | "Where to watch Moneyball \n",
1148 | "Moneyball movie free\n",
1149 | " online \n",
1150 | "Moneyball free online "
1169 | ]
1170 | },
1171 | "execution_count": 4,
1172 | "metadata": {},
1173 | "output_type": "execute_result"
1174 | }
1175 | ],
1176 | "source": [
1177 | "# Prototype Scraping: Moneyball\n",
1178 | "url= 'https://cineb.net/watch-movie/watch-moneyball-free-18763.5349739'\n",
1179 | "page= requests.get(url)\n",
1180 | "soup= BeautifulSoup(page.content, 'html.parser')\n",
1181 | "\n",
1182 | "# Find the div tag that contains the data to be scraped\n",
1183 | "div= soup.find('div', {'class': 'col-xl-6 col-lg-12 col-infor'})\n",
1184 | "div"
1185 | ]
1186 | },
1187 | {
1188 | "cell_type": "code",
1189 | "execution_count": 5,
1190 | "id": "7cf090d9",
1191 | "metadata": {},
1192 | "outputs": [
1193 | {
1194 | "name": "stdout",
1195 | "output_type": "stream",
1196 | "text": [
1197 | "('Moneyball', '7.6', \"The story of Oakland Athletics general manager Billy Beane's successful attempt to put together a baseball team on a budget, by employing computer-generated analysis to draft his players.\", '2011-09-22', 'Drama', ['Brad Pitt', 'Jonah Hill', 'Philip Seymour Hoffman', 'Robin Wright', 'Chris Pratt'], '134', 'United States of America', ['Columbia Pictures', 'Scott Rudin Productions'])\n"
1198 | ]
1199 | }
1200 | ],
1201 | "source": [
1202 | "# Now let's scrape the data and assign it to variables\n",
1203 | "# First div: image\n",
1204 | "image_tag= div.find('img', {'class': 'film-poster-img'})\n",
1205 | "image_link= image_tag['src']\n",
1206 | "\n",
1207 | "# Title\n",
1208 | "title= image_tag['title']\n",
1209 | "\n",
1210 | "# Rating \n",
1211 | "rating= div.find('button', {'class': 'btn btn-sm btn-imdb'}).text.strip('IMDB: ')\n",
1212 | "\n",
1213 | "# Link\n",
1214 | "link= 'https://cineb.net'+ div.find('h2', {'class': 'heading-name'}).a['href']\n",
1215 | "\n",
1216 | "# Description\n",
1217 | "description= div.find('div', {'class': 'description'}).text.strip('\\n ')\n",
1218 | "\n",
1219 | "# elements div \n",
1220 | "elements_div= div.find('div', { 'class': 'elements'})\n",
1221 | "elements= elements_div.find_all('div', {'class': 'row-line'})\n",
1222 | "\n",
1223 | "# Release date\n",
1224 | "release_date= elements[0].text.strip('\\nReleased: ')\n",
1225 | "\n",
1226 | "# Genre: I advise against using non-scalar values as it makes it hard to filter\n",
1227 | "# and clean data, but the goal here is to scrape as much data as we can.\n",
1228 | "\n",
1229 | "genre_a_tags= elements[1].find_all('a')\n",
1230 | "if len(genre_a_tags) > 1:\n",
1231 | "    genre= []\n",
1232 | "    for genre_a_tag in genre_a_tags:\n",
1233 | "        genre.append(genre_a_tag.text)\n",
1234 | "else:\n",
1235 | "    genre= genre_a_tags[0].text\n",
1236 | "# Casts\n",
1237 | "casts=[]\n",
1238 | "a_tags= elements[2].find_all('a')\n",
1239 | "for a_tag in a_tags:\n",
1240 | "    casts.append(a_tag.text)\n",
1241 | " \n",
1242 | "# Duration\n",
1243 | "duration= elements[3].text.strip('\\nDuration: ').replace('\\n m','')\n",
1244 | "\n",
1245 | "# Country\n",
1246 | "try:\n",
1247 | "    country= elements[4].a['title']\n",
1248 | "except (TypeError, IndexError, KeyError):\n",
1249 | "    country= 'None'\n",
1250 | " \n",
1251 | "# Production\n",
1252 | "production= []\n",
1253 | "a_tags= elements[5].find_all('a')\n",
1254 | "for a_tag in a_tags:\n",
1255 | "    production.append(a_tag.text)\n",
1256 | "\n",
1257 | "# Movie data\n",
1258 | "movie= (title, rating, description, release_date, genre, casts, duration, country, production)\n",
1259 | "print(movie)\n"
1260 | ]
1261 | },
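The prototype cell removes labels such as `Released:` and `Duration:` with `str.strip`. `str.strip` treats its argument as a set of characters to trim from both ends, not as a prefix, so it works here only because the values begin and end with characters outside that set. Below is a minimal sketch of a prefix-based alternative. `after_label` is a hypothetical helper, and the `Country: Turkey` string is a made-up value used only to show the pitfall.

```python
def after_label(text, label):
    """Return the text that follows `label`, with whitespace collapsed."""
    cleaned = ' '.join(text.split())      # collapse newlines and repeated spaces
    if cleaned.startswith(label):
        cleaned = cleaned[len(label):]    # drop the label as a prefix
    return cleaned.strip()


# Pitfall: the trailing 'y' is in the character set, so it is trimmed too.
stripped = 'Country: Turkey'.strip('Country: ')

released = after_label('\nReleased: 2011-09-22\n', 'Released:')
duration = after_label('\nDuration: 134\n min\n', 'Duration:')
print(repr(stripped), released, duration)
```

Unlike the notebook's version, this sketch keeps the unit in the duration (`134 min`); splitting on the space would recover the bare number.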
1262 | {
1263 | "cell_type": "code",
1264 | "execution_count": 6,
1265 | "id": "ce3f62df",
1266 | "metadata": {},
1267 | "outputs": [
1268 | {
1269 | "data": {
1270 | "text/plain": [
1271 | "'United States of America'"
1272 | ]
1273 | },
1274 | "execution_count": 6,
1275 | "metadata": {},
1276 | "output_type": "execute_result"
1277 | }
1278 | ],
1279 | "source": [
1280 | "country"
1281 | ]
1282 | },
1283 | {
1284 | "cell_type": "code",
1285 | "execution_count": 7,
1286 | "id": "0af219c5",
1287 | "metadata": {},
1288 | "outputs": [
1289 | {
1290 | "data": {
1291 | "text/plain": [
1292 | "['https://cineb.to/movie/watch-two-for-joy-free-14737',\n",
1293 | " '/movie/watch-journey-back-to-christmas-free-14736',\n",
1294 | " '/movie/watch-armadillo-free-14735',\n",
1295 | " '/movie/watch-the-night-watchmen-free-14734',\n",
1296 | " '/movie/watch-last-vermont-christmas-free-14733',\n",
1297 | " '/movie/watch-the-walking-deceased-free-14731',\n",
1298 | " '/movie/watch-the-hatton-garden-job-free-14729',\n",
1299 | " '/movie/watch-last-dance-free-14726',\n",
1300 | " '/movie/watch-crazy-kind-of-love-free-14722',\n",
1301 | " 'https://cineb.to/movie/watch-loose-cannons-free-14721',\n",
1302 | " '/movie/watch-gangnam-blues-free-14720',\n",
1303 | " '/movie/watch-feedback-free-14718',\n",
1304 | " '/movie/watch-proxy-free-14717',\n",
1305 | " 'https://cineb.to/movie/watch-the-nun-free-14716',\n",
1306 | " '/movie/watch-the-summer-of-sangaile-free-14714',\n",
1307 | " '/movie/watch-fred-armisen-standup-for-drummers-free-14713',\n",
1308 | " '/movie/watch-jack-goes-boating-free-14712',\n",
1309 | " '/movie/watch-the-thief-of-bagdad-free-14711',\n",
1310 | " '/movie/watch-nadine-free-14708',\n",
1311 | " '/movie/watch-vip-free-14707',\n",
1312 | " '/movie/watch-vagabond-free-14706',\n",
1313 | " '/movie/watch-no-mercy-free-14705',\n",
1314 | " '/movie/watch-evil-angel-free-14702',\n",
1315 | " '/movie/watch-the-wilde-wedding-free-14701',\n",
1316 | " '/movie/watch-varathan-free-14700',\n",
1317 | " '/movie/watch-very-very-valentine-free-14699',\n",
1318 | " '/movie/watch-sands-of-oblivion-free-14698',\n",
1319 | " '/movie/watch-american-loser-free-14696',\n",
1320 | " '/movie/watch-home-run-free-14695',\n",
1321 | " '/movie/watch-a-or-b-free-14692',\n",
1322 | " '/movie/watch-paradise-free-14691',\n",
1323 | " '/movie/watch-cold-hell-free-14689']"
1324 | ]
1325 | },
1326 | "execution_count": 7,
1327 | "metadata": {},
1328 | "output_type": "execute_result"
1329 | }
1330 | ],
1331 | "source": [
1332 | "# Scrape the movie links from a single listing page\n",
1333 | "url= 'https://cineb.net/movie?page=816'\n",
1334 | "page= requests.get(url)\n",
1335 | "soup= BeautifulSoup(page.content, 'html.parser')\n",
1336 | "\n",
1337 | "# Find the div tag that contains the data to be scraped\n",
1338 | "posters= soup.find_all('div', {'class': 'film-poster'})\n",
1339 | "m_links= []\n",
1340 | "for poster in posters:\n",
1341 | "    m_links.append(poster.a['href'])\n",
1342 | "m_links"
1343 | ]
1344 | },
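The output above mixes absolute links (`https://cineb.to/...`) with relative ones (`/movie/...`). Below is a minimal sketch of normalising both kinds with the standard library's `urllib.parse.urljoin`, which leaves absolute URLs untouched and resolves relative paths against a root. `SITE_ROOT` and `absolute_link` are hypothetical names, and the sample links are copied from the output above.

```python
from urllib.parse import urljoin

SITE_ROOT = 'https://cineb.net'  # assumption: same root as the notebook's base_url


def absolute_link(href, root=SITE_ROOT):
    """Resolve a possibly relative href against the site root."""
    return urljoin(root, href)


sample = [
    'https://cineb.to/movie/watch-two-for-joy-free-14737',
    '/movie/watch-armadillo-free-14735',
]
resolved = [absolute_link(href) for href in sample]
print(resolved)
```

This avoids relying on a substring test such as `'cineb' in m_link`, which would misclassify a relative path that happened to contain that word.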
1345 | {
1346 | "cell_type": "markdown",
1347 | "id": "8fd71c4e",
1348 | "metadata": {},
1349 | "source": [
1350 | "Now let's bring all of this together."
1351 | ]
1352 | },
1353 | {
1354 | "cell_type": "code",
1355 | "execution_count": 9,
1356 | "id": "bf1f5985",
1357 | "metadata": {},
1358 | "outputs": [],
1359 | "source": [
1360 | "\n",
1361 | "base_url= 'https://cineb.net/movie?page={}'\n",
1362 | "\n",
1363 | "# The movies section contains more than 1700 pages, but for this tutorial we'll scrape far fewer pages\n",
1364 | "pages= 50\n",
1365 | "urls= []\n",
1366 | "for i in range(pages):\n",
1367 | "    urls.append(base_url.format(i))\n",
1368 | " \n",
1369 | "records= []\n",
1370 | "for url in urls:\n",
1371 | "\n",
1372 | "    page= requests.get(url)\n",
1373 | "    soup= BeautifulSoup(page.content, 'html.parser')\n",
1374 | "\n",
1375 | "    # Find the div tags that contain the movie links\n",
1376 | "    posters= soup.find_all('div', {'class': 'film-poster'})\n",
1377 | "    for poster in posters:\n",
1378 | "\n",
1379 | "        m_link=(poster.a['href'])\n",
1380 | "        if 'cineb' in m_link:\n",
1381 | "            pass\n",
1382 | "        else:\n",
1383 | "            m_link= 'https://cineb.net'+ m_link\n",
1384 | "        page= requests.get(m_link)\n",
1385 | "        soup= BeautifulSoup(page.content, 'html.parser')\n",
1386 | "\n",
1387 | "        # Find the div tag that contains the data to be scraped\n",
1388 | "        div= soup.find('div', {'class': 'col-xl-6 col-lg-12 col-infor'})\n",
1389 | "\n",
1390 | "        # Image\n",
1391 | "        image_tag= div.find('img', {'class': 'film-poster-img'})\n",
1392 | "        image_link= image_tag['src']\n",
1393 | "\n",
1394 | "        # Title\n",
1395 | "        title= image_tag['title']\n",
1396 | "\n",
1397 | "        # Rating \n",
1398 | "        rating= div.find('button', {'class': 'btn btn-sm btn-imdb'}).text.strip('IMDB: ')\n",
1399 | "\n",
1400 | "        # Description\n",
1401 | "        description= div.find('div', {'class': 'description'}).text.strip('\\n ')\n",
1402 | "\n",
1403 | "        # elements div \n",
1404 | "        elements_div= div.find('div', { 'class': 'elements'})\n",
1405 | "        elements= elements_div.find_all('div', {'class': 'row-line'})\n",
1406 | "\n",
1407 | "        # Release date\n",
1408 | "        release_date= elements[0].text.strip('\\nReleased: ')\n",
1409 | "\n",
1410 | "        # Genre: keep only the first genre so the column stays scalar\n",
1411 | "        genre_a_tags= elements[1].find_all('a')\n",
1412 | "        if len(genre_a_tags) > 0:\n",
1413 | "            genre= genre_a_tags[0].text\n",
1414 | "        else:\n",
1415 | "            genre= 'Undefined'\n",
1416 | " \n",
1417 | "        # Casts\n",
1418 | "        casts=[]\n",
1419 | "        a_tags= elements[2].find_all('a')\n",
1420 | "        for a_tag in a_tags:\n",
1421 | "            casts.append(a_tag.text)\n",
1422 | "\n",
1423 | "        # Duration\n",
1424 | "        duration= elements[3].text.strip('\\nDuration: ').replace('\\n m','')\n",
1425 | "\n",
1426 | "        # Country\n",
1427 | "        try:\n",
1428 | "            country= elements[4].a['title']\n",
1429 | "        except (TypeError, IndexError, KeyError):\n",
1430 | "            country= 'None'\n",
1431 | "        # Production\n",
1432 | "        production= []\n",
1433 | "        a_tags= elements[5].find_all('a')\n",
1434 | "        for a_tag in a_tags:\n",
1435 | "            production.append(a_tag.text)\n",
1436 | "\n",
1437 | "        record= (title, rating, description, release_date, genre, casts, duration, country, production, m_link)\n",
1438 | "        records.append(record)\n",
1439 | " "
1440 | ]
1441 | },
1442 | {
1443 | "cell_type": "code",
1444 | "execution_count": 11,
1445 | "id": "c0d0856f",
1446 | "metadata": {},
1447 | "outputs": [],
1448 | "source": [
1449 | "# Let's write those records in a csv file\n",
1450 | "with open(\"movies.csv\", 'w', newline='', encoding='utf-8') as f:\n",
1451 | "    writer= csv.writer(f)\n",
1452 | "    writer.writerow(['title', 'rating', 'description', 'release_date', 'genre', 'casts', 'duration', 'country', 'production', 'm_link'])\n",
1453 | "    writer.writerows(records)"
1454 | ]
1455 | },
1456 | {
1457 | "cell_type": "code",
1458 | "execution_count": 12,
1459 | "id": "b1afba35",
1460 | "metadata": {},
1461 | "outputs": [],
1462 | "source": [
1463 | "df= pd.read_csv('movies.csv')"
1464 | ]
1465 | },
1466 | {
1467 | "cell_type": "code",
1468 | "execution_count": 13,
1469 | "id": "c5456d89",
1470 | "metadata": {
1471 | "scrolled": true
1472 | },
1473 | "outputs": [
1474 | {
1475 | "data": {
1656 | "text/plain": [
1657 | " title rating \\\n",
1658 | "0 Where Are You NaN \n",
1659 | "1 The Domestic NaN \n",
1660 | "2 A Chance Encounter NaN \n",
1661 | "3 V/H/S/99 NaN \n",
1662 | "4 Raymond & Ray NaN \n",
1663 | "... ... ... \n",
1664 | "1595 Lilith 6.8 \n",
1665 | "1596 Where the Scary Things Are 3.0 \n",
1666 | "1597 What The Nanny Saw 4.8 \n",
1667 | "1598 Les Girls 6.6 \n",
1668 | "1599 The Storms of Jeremy Thomas 6.4 \n",
1669 | "\n",
1670 | " description release_date \\\n",
1671 | "0 A photographer suffering under artist decline ... 2022-10-21 \n",
1672 | "1 An upper-class couple hire the daughter of the... 2022-10-21 \n",
1673 | "2 In the Sicilian town of Taormina, Italy, an as... 2022-10-28 \n",
1674 | "3 A teenager’s home video leads to a series of h... 2022-09-15 \n",
1675 | "4 Half brothers Raymond and Ray reunite when the... 2022-10-14 \n",
1676 | "... ... ... \n",
1677 | "1595 Vincent Bruce, a war veteran, begins working a... 1965-01-15 \n",
1678 | "1596 The horror begins as Ayla and her high-school ... 2022-06-28 \n",
1679 | "1597 When nanny Kim accidentally discovers that her... 2022-04-29 \n",
1680 | "1598 After writing a tell-all book about her days i... 1957-10-03 \n",
1681 | "1599 Joining Oscar-winning producer Jeremy Thomas o... 2021-12-10 \n",
1682 | "\n",
1683 | " genre casts \\\n",
1684 | "0 Thriller ['Brad Greenquist', 'Madeline Brewer', 'Ray Ni... \n",
1685 | "1 Thriller ['Amanda Du-Pont.'] \n",
1686 | "2 Drama ['Vincenzo Vivenzio', 'Kenny Burns', 'Jason Ed... \n",
1687 | "3 Horror ['Jackson Kelly', 'Verona Blue', 'Archelaus Cr... \n",
1688 | "4 Drama ['Gina Jun', 'Sophie Okonedo', 'Vondie Curtis-... \n",
1689 | "... ... ... \n",
1690 | "1595 Drama ['Warren Beatty', 'Kim Hunter', 'Anne Meacham'... \n",
1691 | "1596 Horror ['Paul Cottman', 'Michael Cervantes', 'Selina ... \n",
1692 | "1597 Thriller ['Lindsay Hartley', 'Ryan Francis', 'Laurie Fo... \n",
1693 | "1598 Music ['Gene Kelly', 'Jacques Bergerac', 'Mitzi Gayn... \n",
1694 | "1599 Documentary ['Tilda Swinton', 'Mark Cousins', 'Debra Winge... \n",
1695 | "\n",
1696 | " duration country \\\n",
1697 | "0 94.0 United States of America \n",
1698 | "1 110.0 South Africa \n",
1699 | "2 91.0 United States of America \n",
1700 | "3 99.0 United States of America \n",
1701 | "4 106.0 United States of America \n",
1702 | "... ... ... \n",
1703 | "1595 114.0 United States of America \n",
1704 | "1596 93.0 United States of America \n",
1705 | "1597 85.0 United States of America \n",
1706 | "1598 114.0 United States of America \n",
1707 | "1599 94.0 United Kingdom \n",
1708 | "\n",
1709 | " production \\\n",
1710 | "0 ['Carte Blanche'] \n",
1711 | "1 ['Mandeville Films'] \n",
1712 | "2 ['Bespoke Works'] \n",
1713 | "3 ['Bloody Disgusting', 'Radio Silence', 'Studio... \n",
1714 | "4 ['Esperanto Filmoj', 'Apple Inc.', 'Mockingbir... \n",
1715 | "... ... \n",
1716 | "1595 ['Columbia Pictures', 'Centaur Enterprises'] \n",
1717 | "1596 ['Lionsgate'] \n",
1718 | "1597 ['Almost Never Films Inc'] \n",
1719 | "1598 ['Sol C. Siegel Productions', 'Metro-Goldwyn-M... \n",
1720 | "1599 ['Visit Films', 'Creative Scotland'] \n",
1721 | "\n",
1722 | " m_link \n",
1723 | "0 https://cineb.net/movie/watch-where-are-you-fr... \n",
1724 | "1 https://cineb.net/movie/watch-the-domestic-fre... \n",
1725 | "2 https://cineb.net/movie/watch-a-chance-encount... \n",
1726 | "3 https://cineb.to/movie/watch-vhs99-free-89401 \n",
1727 | "4 https://cineb.net/movie/watch-raymond-ray-free... \n",
1728 | "... ... \n",
1729 | "1595 https://cineb.net/movie/watch-lilith-free-83005 \n",
1730 | "1596 https://cineb.net/movie/watch-where-the-scary-... \n",
1731 | "1597 https://cineb.net/movie/watch-what-the-nanny-s... \n",
1732 | "1598 https://cineb.net/movie/watch-les-girls-free-8... \n",
1733 | "1599 https://cineb.net/movie/watch-the-storms-of-je... \n",
1734 | "\n",
1735 | "[1600 rows x 10 columns]"
1736 | ]
1737 | },
1738 | "execution_count": 13,
1739 | "metadata": {},
1740 | "output_type": "execute_result"
1741 | }
1742 | ],
1743 | "source": [
1744 | "df"
1745 | ]
1746 | },
1747 | {
1748 | "cell_type": "code",
1749 | "execution_count": 14,
1750 | "id": "937ecc57",
1751 | "metadata": {},
1752 | "outputs": [
1753 | {
1754 | "data": {
1755 | "text/plain": [
1756 | "(1600, 10)"
1757 | ]
1758 | },
1759 | "execution_count": 14,
1760 | "metadata": {},
1761 | "output_type": "execute_result"
1762 | }
1763 | ],
1764 | "source": [
1765 | "df.shape"
1766 | ]
1767 | },
1768 | {
1769 | "cell_type": "code",
1770 | "execution_count": null,
1771 | "id": "4a7de796",
1772 | "metadata": {},
1773 | "outputs": [],
1774 | "source": []
1775 | }
1776 | ],
1777 | "metadata": {
1778 | "kernelspec": {
1779 | "display_name": "Python 3 (ipykernel)",
1780 | "language": "python",
1781 | "name": "python3"
1782 | },
1783 | "language_info": {
1784 | "codemirror_mode": {
1785 | "name": "ipython",
1786 | "version": 3
1787 | },
1788 | "file_extension": ".py",
1789 | "mimetype": "text/x-python",
1790 | "name": "python",
1791 | "nbconvert_exporter": "python",
1792 | "pygments_lexer": "ipython3",
1793 | "version": "3.9.12"
1794 | }
1795 | },
1796 | "nbformat": 4,
1797 | "nbformat_minor": 5
1798 | }
1799 |
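The Cineb notebook above cleans scraped fields with calls such as `.text.strip('\nReleased: ')` and `.text.strip('IMDB: ')`. `str.strip` treats its argument as a set of characters to trim from both ends, not as a prefix, so a value that itself begins or ends with one of those characters can lose part of its content. The sketch below shows one way to drop a label instead. `strip_label` is a hypothetical helper that is not part of the notebook, and the sample strings are made up for illustration.

```python
def strip_label(text: str, label: str) -> str:
    """Trim whitespace, then remove one leading label if it is present.

    Hypothetical helper: the notebook itself uses str.strip with a
    character set instead.
    """
    cleaned = text.strip()
    if cleaned.startswith(label):
        cleaned = cleaned[len(label):]
    return cleaned.strip()


# Illustrative inputs shaped like the text of the scraped elements.
released = strip_label("\nReleased: 2022-10-21\n", "Released:")
rating = strip_label("IMDB: BIM", "IMDB:")

# str.strip removes every leading/trailing character found in the set,
# so a value made only of those characters disappears entirely.
eaten = "IMDB: BIM".strip("IMDB: ")

print(repr(released), repr(rating), repr(eaten))
```

For the numeric ratings and ISO dates seen in the notebook output the two approaches should give the same result; the difference only matters for values that share characters with the label.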
--------------------------------------------------------------------------------
/notebooks/Real estate.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "id": "81f1f273",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "# Imports\n",
11 | "\n",
12 | "import pandas as pd\n",
13 | "import bs4\n",
14 | "from bs4 import BeautifulSoup\n",
15 | "import requests\n",
16 | "import csv\n"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": null,
22 | "id": "06a9473c",
23 | "metadata": {},
24 | "outputs": [],
25 | "source": [
26 | "# Setup version\n",
27 | "\n",
28 | "print('pandas version: {}'.format(pd.__version__))\n",
29 | "print('bs4 version: {}'.format(bs4.__version__))\n",
30 | "print('requests version: {}'.format(requests.__version__))\n",
31 | "print('csv version: {}'.format(csv.__version__))"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": null,
37 | "id": "5087b16f",
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "def get_url(city, pages):\n",
42 | " \n",
43 | " \"\"\"\n",
44 | " The get_url function returns a list of urls for the searched phrase and the number of pages\n",
45 | " \n",
46 | " :city: The name of the city whose rental listings you want to scrape \n",
47 | " :pages: The number of pages you want to scrape (< maximum number of web pages shown)\n",
48 | " \n",
49 | " \"\"\" \n",
50 | " \n",
51 | " template1= 'https://www.pararius.com/apartments/{}/page-{}' \n",
52 | " urls= []\n",
53 | " for i in range(1,pages+1):\n",
54 | " url= template1.format(city, i)\n",
55 | " urls.append(url)\n",
56 | " return urls\n"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "id": "be85af19",
63 | "metadata": {},
64 | "outputs": [],
65 | "source": [
66 | "def scrape_results(urls):\n",
67 | " \n",
68 | " \"\"\"\n",
69 | " \n",
70 | " The scrape_results function loops over the urls and scrapes all real estate data \n",
71 | " \n",
72 | " :urls: list of urls generated by calling the get_url function\n",
73 | " \n",
74 | " \"\"\"\n",
75 | " \n",
76 | " records= []\n",
77 | " \n",
78 | " for url in urls:\n",
79 | " page= requests.get(url)\n",
80 | " soup= BeautifulSoup(page.content, 'html.parser')\n",
81 | " items= soup.find_all('section', {'class':'listing-search-item'})\n",
82 | " template2= 'https://www.pararius.com{}'\n",
83 | " \n",
84 | " for item in items:\n",
85 | " \n",
86 | " title= item.find('a', {'class':'listing-search-item__link listing-search-item__link--title'}).text.strip()\n",
87 | " rent_price= item.find('div', {'class':'listing-search-item__price'}).text.strip().replace('per month','')[1:]\n",
88 | " adress= item.find('div', {'class':'listing-search-item__sub-title'}).text.strip()\n",
89 | " surface= item.find('li', {'class':'illustrated-features__item illustrated-features__item--surface-area'}).text.strip().replace('m²','')\n",
90 | " rooms= item.find('li', {'class':'illustrated-features__item illustrated-features__item--number-of-rooms'}).text.strip().replace('rooms','')\n",
91 | " \n",
92 | " try:\n",
93 | " interior_status= item.find('li', {'class': 'illustrated-features__item illustrated-features__item--interior'}).text.strip()\n",
94 | " except AttributeError:\n",
95 | " interior_status= 'Undefined'\n",
96 | " \n",
97 | " try:\n",
98 | " agency= item.find('div', class_='listing-search-item__info').text.strip()\n",
99 | " except AttributeError:\n",
100 | " agency='None'\n",
101 | " \n",
102 | " link= template2.format(item.a['href'])\n",
103 | " contact= scrape_contact(link)\n",
104 | "\n",
105 | " record= (title, adress, rent_price, surface, rooms, interior_status, agency, contact, link)\n",
106 | " records.append(record)\n",
107 | " \n",
108 | " with open(\"data/real_estate.csv\", 'w', newline='', encoding='utf-8') as f:\n",
109 | " writer= csv.writer(f)\n",
110 | " writer.writerow(['Title', 'Address', 'Rent Price', 'Surface', 'Rooms', 'Interior Status', 'Agency', 'Contact', 'Link'])\n",
111 | " writer.writerows(records)\n",
112 | " \n",
113 | " return records\n"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "id": "67a86665",
120 | "metadata": {},
121 | "outputs": [],
122 | "source": [
123 | "def scrape_contact(url):\n",
124 | " \n",
125 | " \"\"\"\n",
126 | " \n",
127 | " Scrape the agent's contact phone number from a listing page \n",
128 | " \n",
129 | " :url: URL of a single listing page, as built in scrape_results\n",
130 | " \n",
131 | " \"\"\" \n",
132 | " page= requests.get(url)\n",
133 | " soup= BeautifulSoup(page.content, 'html.parser')\n",
134 | " div= soup.find('section', class_='agent-summary')\n",
135 | " contact= div.find('div', class_='agent-summary__links').a['href'].replace('tel:','')\n",
136 | " \n",
137 | " return contact\n"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": null,
143 | "id": "fae626c4",
144 | "metadata": {},
145 | "outputs": [],
146 | "source": [
147 | "# Let's try to scrape Amsterdam's properties\n",
148 | "\n",
149 | "urls= get_url('amsterdam', 17)\n",
150 | "records= scrape_results(urls)"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "id": "16bdcdcd",
157 | "metadata": {},
158 | "outputs": [],
159 | "source": [
160 | "df= pd.read_csv('data/real_estate.csv')\n",
161 | "df"
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": null,
167 | "id": "fc28abcf",
168 | "metadata": {},
169 | "outputs": [],
170 | "source": [
171 | "df.shape"
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "id": "b4c01702",
178 | "metadata": {},
179 | "outputs": [],
180 | "source": []
181 | }
182 | ],
183 | "metadata": {
184 | "kernelspec": {
185 | "display_name": "Python 3 (ipykernel)",
186 | "language": "python",
187 | "name": "python3"
188 | },
189 | "language_info": {
190 | "codemirror_mode": {
191 | "name": "ipython",
192 | "version": 3
193 | },
194 | "file_extension": ".py",
195 | "mimetype": "text/x-python",
196 | "name": "python",
197 | "nbconvert_exporter": "python",
198 | "pygments_lexer": "ipython3",
199 | "version": "3.9.12"
200 | }
201 | },
202 | "nbformat": 4,
203 | "nbformat_minor": 5
204 | }
205 |
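The `get_url` cell of the Real estate notebook builds one search URL per result page. A minimal sketch of the same idea follows, using only the standard library. The URL template is copied from the notebook; `build_page_urls` is a hypothetical name, and the slug rule (lower-case, spaces replaced by hyphens) is an assumption about how the site names cities, not something the notebook confirms.

```python
from urllib.parse import quote

# URL template taken from the get_url cell of the notebook.
TEMPLATE = "https://www.pararius.com/apartments/{}/page-{}"


def build_page_urls(city: str, pages: int) -> list:
    """Return the search-result URLs for pages 1..pages (inclusive)."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # Assumption: city slugs are lower-case with hyphens instead of spaces.
    slug = quote(city.strip().lower().replace(" ", "-"))
    return [TEMPLATE.format(slug, page) for page in range(1, pages + 1)]


urls = build_page_urls("amsterdam", 2)
for url in urls:
    print(url)
```

Rejecting `pages < 1` up front makes an accidental empty run visible instead of silently writing a CSV that contains only the header row.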
--------------------------------------------------------------------------------
/notebooks/Salaries.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "id": "717aca5c",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "# %% Imports\n",
11 | "\n",
12 | "import pandas as pd\n",
13 | "import bs4\n",
14 | "from bs4 import BeautifulSoup as bs\n",
15 | "import requests\n",
16 | "import csv"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 2,
22 | "id": "fa813adf",
23 | "metadata": {},
24 | "outputs": [
25 | {
26 | "name": "stdout",
27 | "output_type": "stream",
28 | "text": [
29 | "pandas version: 1.4.2\n",
30 | "bs4 version: 4.11.1\n",
31 | "requests version: 2.27.1\n",
32 | "csv version: 1.0\n"
33 | ]
34 | }
35 | ],
36 | "source": [
37 | "# %% Setup version\n",
38 | "\n",
39 | "print('pandas version: {}'.format(pd.__version__))\n",
40 | "print('bs4 version: {}'.format(bs4.__version__))\n",
41 | "print('requests version: {}'.format(requests.__version__))\n",
42 | "print('csv version: {}'.format(csv.__version__))"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 3,
48 | "id": "86b4663f",
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "def get_url(search_term:str, nb_pages:int):\n",
53 | " \n",
54 | " \"\"\"\n",
55 | " The get_url function returns a list of urls for the searched phrase and the number of pages\n",
56 | " \n",
57 | " :search_term: The name or expression of the job you're looking for \n",
58 | " :nb_pages: The number of pages you want to scrape (< maximum number of web pages shown)\n",
59 | " \n",
60 | " \"\"\"\n",
61 | " \n",
62 | " links=[]\n",
63 | " search_term= search_term.replace(' ','%20')\n",
64 | " template= 'https://www.salary.com/tools/salary-calculator/search?keyword={}&location=&page={}&selectedjobcodes='\n",
65 | " for page in range(1, nb_pages + 1):\n",
66 | " links.append(template.format(search_term, page))\n",
67 | " \n",
68 | " return links\n"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": 4,
74 | "id": "17788884",
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "def scrape_desc(template):\n",
79 | " \n",
80 | " link= template + '-job-description?isshowmore=more&statistics=0'\n",
81 | " page= requests.get(link)\n",
82 | " soup= bs(page.content, 'html.parser')\n",
83 | " p= soup.find('p', class_='sal-p')\n",
84 | " description= p.text.strip()\n",
85 | " \n",
86 | " return description\n"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": 5,
92 | "id": "50f96d54",
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "\n",
97 | "headers= {\"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"Accept-Encoding\":\"gzip, deflate\", \"Accept\":\"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\", \"DNT\":\"1\",\"Connection\":\"close\", \"Upgrade-Insecure-Requests\":\"1\"}\n",
98 | "\n",
99 | "def scrape_salary(link):\n",
100 | " \n",
101 | " \"\"\"\n",
102 | " \n",
103 | " scrape_salary scrapes the salary percentiles of a particular job\n",
104 | " \n",
105 | " :link: the URL of salary's web page \n",
106 | " \n",
107 | " \"\"\"\n",
108 | "\n",
109 | " page= requests.get(link, headers= headers)\n",
110 | " soup= bs(page.content, 'html.parser')\n",
111 | " div= soup.find('table', class_='table-chart')\n",
112 | " \n",
113 | " tds= div.find_all('td')\n",
114 | " percentile10= tds[5].text\n",
115 | " percentile25= tds[9].text\n",
116 | " percentile50= tds[13].text\n",
117 | " percentile75= tds[17].text\n",
118 | " percentile90= tds[21].text\n",
119 | " \n",
120 | " \n",
121 | " return [percentile10, percentile25, percentile50, percentile75, percentile90]\n"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 6,
127 | "id": "bc72f09f",
128 | "metadata": {},
129 | "outputs": [],
130 | "source": [
131 | "def scrape_all_links(links):\n",
132 | " \n",
133 | " \"\"\"\n",
134 | " \n",
135 | " The scrape_all_links function loops over the urls and scrapes the information for every job \n",
136 | " \n",
137 | " :links: list of urls generated by calling the get_url function\n",
138 | " \n",
139 | " \"\"\"\n",
140 | " records= []\n",
141 | " \n",
142 | " for link in links:\n",
143 | " page= requests.get(link)\n",
144 | " soup= bs(page.content, 'html.parser')\n",
145 | " divs= soup.find_all('div', class_='sal-popluar-skills margin-top30')\n",
146 | " \n",
147 | " for div in divs:\n",
148 | " title = div.find('a', class_='a-color font-semibold margin-right10').text\n",
149 | " link= div.a['href']\n",
150 | " start= link.find('https://www.salary.com/tools/salary-calculator/')\n",
151 | " end= link.find(\"'\", start)\n",
152 | " link= link[start:end]\n",
153 | " \n",
154 | " salaries= scrape_salary(link)\n",
155 | " description= scrape_desc(link)\n",
156 | " record= [title, description, link] + salaries\n",
157 | " records.append(record)\n",
158 | " \n",
159 | " with open(\"data/salary_data.csv\", 'w', newline='', encoding='utf-8') as f:\n",
160 | " writer= csv.writer(f)\n",
161 | " writer.writerow(['Title', 'Description', 'link', 'Percentile10', 'Percentile25', 'Percentile50', 'Percentile75', 'Percentile90'])\n",
162 | " writer.writerows(records)\n",
163 | " \n",
164 | " return records\n"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": 7,
170 | "id": "b4db8eb7",
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "# To scrape the salaries of project managers\n",
175 | "\n",
176 | "links= get_url('project manager', 2)\n",
177 | "records= scrape_all_links(links)"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 8,
183 | "id": "298ea970",
184 | "metadata": {},
185 | "outputs": [
186 | {
187 | "data": {
277 | "text/plain": [
278 | " Title \\\n",
279 | "0 Project Manager - Construction \n",
280 | "1 Project Accounting Manager \n",
281 | "2 IT Project Manager IV \n",
282 | "3 Project Controls Manager \n",
283 | "4 Project Manager Sr. - Construction \n",
284 | "\n",
285 | " Description \\\n",
286 | "0 Project Manager - Construction oversees and di... \n",
287 | "1 Project Accounting Manager manages a team of p... \n",
288 | "2 IT Project Manager IV manages and oversees all... \n",
289 | "3 Project Controls Manager manages and oversees ... \n",
290 | "4 Project Manager Sr. - Construction is responsi... \n",
291 | "\n",
292 | " link Percentile10 \\\n",
293 | "0 https://www.salary.com/tools/salary-calculator... $84,577 \n",
294 | "1 https://www.salary.com/tools/salary-calculator... $86,153 \n",
295 | "2 https://www.salary.com/tools/salary-calculator... $110,759 \n",
296 | "3 https://www.salary.com/tools/salary-calculator... $108,483 \n",
297 | "4 https://www.salary.com/tools/salary-calculator... $118,505 \n",
298 | "\n",
299 | " Percentile25 Percentile50 Percentile75 Percentile90 \n",
300 | "0 $97,670 $112,050 $126,274 $139,224 \n",
301 | "1 $104,751 $125,179 $148,657 $170,032 \n",
302 | "2 $121,122 $132,504 $143,264 $153,060 \n",
303 | "3 $127,458 $148,301 $163,739 $177,795 \n",
304 | "4 $136,352 $155,955 $174,966 $192,274 "
305 | ]
306 | },
307 | "execution_count": 8,
308 | "metadata": {},
309 | "output_type": "execute_result"
310 | }
311 | ],
312 | "source": [
313 | "# Let's check the results\n",
314 | "\n",
315 | "df= pd.read_csv('data/salary_data.csv')\n",
316 | "df.head()"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 10,
322 | "id": "741f3640",
323 | "metadata": {},
324 | "outputs": [
325 | {
326 | "data": {
327 | "text/plain": [
328 | "(10, 8)"
329 | ]
330 | },
331 | "execution_count": 10,
332 | "metadata": {},
333 | "output_type": "execute_result"
334 | }
335 | ],
336 | "source": [
337 | "df.shape"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "id": "09409fe0",
344 | "metadata": {},
345 | "outputs": [],
346 | "source": []
347 | }
348 | ],
349 | "metadata": {
350 | "kernelspec": {
351 | "display_name": "Python 3 (ipykernel)",
352 | "language": "python",
353 | "name": "python3"
354 | },
355 | "language_info": {
356 | "codemirror_mode": {
357 | "name": "ipython",
358 | "version": 3
359 | },
360 | "file_extension": ".py",
361 | "mimetype": "text/x-python",
362 | "name": "python",
363 | "nbconvert_exporter": "python",
364 | "pygments_lexer": "ipython3",
365 | "version": "3.9.12"
366 | }
367 | },
368 | "nbformat": 4,
369 | "nbformat_minor": 5
370 | }
371 |
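The Salaries notebook stores the percentiles as display strings such as `$84,577`, so they cannot be sorted or averaged numerically as they are. The sketch below shows one possible way to convert them; `parse_usd` is a hypothetical helper that is not part of the notebook, and it assumes whole-dollar amounts (a decimal part would be merged into the number). The sample values are the first row of the output recorded in the notebook.

```python
import re


def parse_usd(value: str) -> int:
    """Convert a string such as '$84,577' to the integer 84577.

    Assumes whole-dollar amounts: every non-digit character is dropped.
    """
    digits = re.sub(r"\D", "", value)
    if not digits:
        raise ValueError(f"no digits found in {value!r}")
    return int(digits)


# First row of the salary table recorded in the notebook output.
row = ["$84,577", "$97,670", "$112,050", "$126,274", "$139,224"]
percentiles = [parse_usd(v) for v in row]
print(percentiles)
```

The same function could be applied column by column after `pd.read_csv`, for example with `df['Percentile50'].map(parse_usd)`, assuming the column holds strings in this format.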
--------------------------------------------------------------------------------
/notebooks/Transfermarkt.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "id": "31572ff8",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "# Imports\n",
11 | "import pandas as pd\n",
12 | "import bs4\n",
13 | "from bs4 import BeautifulSoup \n",
14 | "import requests\n",
15 | "import csv"
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 2,
21 | "id": "beae77b1",
22 | "metadata": {},
23 | "outputs": [
24 | {
25 | "name": "stdout",
26 | "output_type": "stream",
27 | "text": [
28 | "pandas version: 1.4.2\n",
29 | "bs4 version: 4.11.1\n",
30 | "requests version: 2.27.1\n",
31 | "csv version: 1.0\n"
32 | ]
33 | }
34 | ],
35 | "source": [
36 | "# %% Setup version\n",
37 | "\n",
38 | "print('pandas version: {}'.format(pd.__version__))\n",
39 | "print('bs4 version: {}'.format(bs4.__version__))\n",
40 | "print('requests version: {}'.format(requests.__version__))\n",
41 | "print('csv version: {}'.format(csv.__version__))"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 3,
47 | "id": "00abf501",
48 | "metadata": {},
49 | "outputs": [],
50 | "source": [
51 | "def get_url(league_id):\n",
52 | " template= 'https://www.transfermarkt.com/ligue-1/transfers/wettbewerb/{}'\n",
53 | " url= template.format(league_id)\n",
54 | " return url\n",
55 | "\n",
56 | "def get_souped_page(url):\n",
57 | " ''' \n",
58 | " \n",
59 | " Scraping is prohibited on several websites. When scraping, we request the HTML pages the way \n",
60 | " a browser does, by passing headers as a parameter to the get function, in order to avoid rejection.\n",
61 | " \n",
62 | " '''\n",
63 | " \n",
64 | " headers= {\"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\",\n",
65 | " \"Accept-Encoding\":\"gzip, deflate\", \"Accept\":\"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\", \"DNT\":\"1\",\n",
66 | " \"Connection\":\"close\", \"Upgrade-Insecure-Requests\":\"1\"}\n",
67 | " \n",
68 | " page= requests.get(url, headers= headers)\n",
69 | " soup= BeautifulSoup(page.content, 'html.parser')\n",
70 | " return soup\n",
71 | "\n",
72 | "# Convert the league_id to league name\n",
73 | "def id_to_name(league_id):\n",
74 | " \n",
75 | " if league_id == 'GB1':\n",
76 | " return 'Premier League'\n",
77 | " elif league_id == 'ES1':\n",
78 | " return 'La Liga'\n",
79 | " elif league_id == 'L1':\n",
80 | " return 'Bundesliga'\n",
81 | " elif league_id == 'IT1':\n",
82 | " return 'Serie A'\n",
83 | " elif league_id == 'FR1':\n",
84 | " return 'Ligue 1'\n",
85 | " \n",
86 | "# leagues ids and number of clubs competing\n",
87 | "leagues_dict= {'GB1':20,\n",
88 | " 'ES1':20,\n",
89 | " 'L1':18,\n",
90 | " 'IT1':20,\n",
91 | " 'FR1':20}"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 4,
97 | "id": "7dfd5f1a",
98 | "metadata": {},
99 | "outputs": [],
100 | "source": [
101 | "\n",
102 | "def scrape_all_transfers(leagues_dict):\n",
103 | " \n",
104 | " records= []\n",
105 | " for key, value in leagues_dict.items():\n",
106 | " \n",
107 | " url= get_url(key)\n",
108 | " soup= get_souped_page(url)\n",
109 | " divs= soup.find_all('div', class_='box')\n",
110 | " clubs= divs[4:value+4]\n",
111 | " \n",
112 | " for club in clubs:\n",
113 | " header= club.find('h2')\n",
114 | " try:\n",
115 | " name= header.a.text\n",
116 | " except:\n",
117 | " pass\n",
118 | " table_bodies= club.find_all('tbody')\n",
119 | " \n",
120 | " for table in table_bodies:\n",
121 | " trs= table.find_all('tr') \n",
122 | " \n",
123 | " for tr in trs:\n",
124 | " \n",
125 | " player= tr.a.text\n",
126 | " age= tr.find('td', class_='zentriert alter-transfer-cell').text\n",
127 | " nationality= tr.find('td', class_='zentriert nat-transfer-cell').img['alt']\n",
128 | " position= tr.find('td', class_='pos-transfer-cell').text\n",
129 | " \n",
130 | " market_value= tr.find('td', class_='rechts mw-transfer-cell').text\n",
131 | " \n",
132 | " # For the sake of cleaning the data\n",
133 | " if \"m\" in market_value:\n",
134 | " market_value = int( float(market_value.strip().replace(\"€\",\"\").replace(\"m\",\"\")) * 1000000 )\n",
135 | " elif \"Th.\" in market_value:\n",
136 | " market_value = int(market_value.strip().replace(\"€\",\"\").replace(\"Th.\",\"\")) * 1000\n",
137 | " elif \"-\":\n",
138 | " market_value = 0\n",
139 | " \n",
140 | " try:\n",
141 | " left= tr.find('td', class_='no-border-links verein-flagge-transfer-cell').a.text\n",
142 | " except:\n",
143 | " pass\n",
144 | " \n",
145 | " fee= tr.find_all('td', class_='rechts')[1].a.text\n",
146 | " if \"Loan fee\" in fee:\n",
147 | " type_transfer= 'Loan'\n",
148 | " fee= fee.replace('Loan fee:','')\n",
149 | " else:\n",
150 | " type_transfer= 'Transfer'\n",
151 | " \n",
152 | " if \"m\" in fee:\n",
153 | " fee = int( float(fee.strip().replace(\"€\",\"\").replace(\"m\",\"\")) * 1000000 )\n",
154 | " elif \"Th.\" in fee:\n",
155 | " fee = int(fee.strip().replace(\"€\",\"\").replace(\"Th.\",\"\")) * 1000 \n",
156 | " elif \"-\":\n",
157 | " fee = 0\n",
158 | "\n",
159 | " \n",
160 | " competition= id_to_name(key)\n",
161 | " if table_bodies.index(table)==0:\n",
162 | " record= [player, age, nationality, position, market_value, left, name, fee, type_transfer, competition]\n",
163 | " else:\n",
164 | " record= [player, age, nationality, position, market_value, name, left, fee, type_transfer, competition]\n",
165 | "\n",
166 | " records.append(record)\n",
167 | " \n",
168 | " with open(\"data/transfers_data.csv\", 'w', newline='', encoding='utf-8') as f:\n",
169 | " writer= csv.writer(f)\n",
170 | " writer.writerow(['Player', 'Age', 'Nationality', 'Position', 'Market Value', 'From',\n",
171 | " 'To', 'Fee', 'Type of Transfer','Competition'])\n",
172 | " writer.writerows(records)\n",
173 | " \n",
174 | " return records\n"
175 | ]
176 | },
177 | {
178 | "cell_type": "code",
179 | "execution_count": 5,
180 | "id": "73f299d6",
181 | "metadata": {},
182 | "outputs": [],
183 | "source": [
184 | "# Let's scrape some transfers data\n",
185 | "data= scrape_all_transfers(leagues_dict)"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 8,
191 | "id": "4e3b3b33",
192 | "metadata": {},
193 | "outputs": [
194 | {
195 | "data": {
297 | "text/plain": [
298 | " Player Age Nationality Position Market Value \\\n",
299 | "0 Diego Carlos 29 Brazil Centre-Back 30000000 \n",
300 | "1 Philippe Coutinho 30 Brazil Left Winger 20000000 \n",
301 | "2 Leander Dendoncker 27 Belgium Defensive Midfield 28000000 \n",
302 | "3 Robin Olsen 32 Sweden Goalkeeper 2000000 \n",
303 | "4 Ludwig Augustinsson 28 Sweden Left-Back 5000000 \n",
304 | "\n",
305 | " From To Fee Type of Transfer Competition \n",
306 | "0 Sevilla FC Aston Villa 31000000 Transfer Premier League \n",
307 | "1 Barcelona Aston Villa 20000000 Transfer Premier League \n",
308 | "2 Wolves Aston Villa 15000000 Transfer Premier League \n",
309 | "3 AS Roma Aston Villa 3500000 Transfer Premier League \n",
310 | "4 Sevilla FC Aston Villa 500000 Loan Premier League "
311 | ]
312 | },
313 | "execution_count": 8,
314 | "metadata": {},
315 | "output_type": "execute_result"
316 | }
317 | ],
318 | "source": [
319 | "df= pd.read_csv('data/transfers_data.csv')\n",
320 | "df.head()"
321 | ]
322 | },
323 | {
324 | "cell_type": "code",
325 | "execution_count": 9,
326 | "id": "cd8e0ecc",
327 | "metadata": {},
328 | "outputs": [
329 | {
330 | "data": {
331 | "text/plain": [
332 | "(3079, 10)"
333 | ]
334 | },
335 | "execution_count": 9,
336 | "metadata": {},
337 | "output_type": "execute_result"
338 | }
339 | ],
340 | "source": [
341 | "df.shape"
342 | ]
343 | },
344 | {
345 | "cell_type": "code",
346 | "execution_count": null,
347 | "id": "5202567a",
348 | "metadata": {},
349 | "outputs": [],
350 | "source": []
351 | }
352 | ],
353 | "metadata": {
354 | "kernelspec": {
355 | "display_name": "Python 3 (ipykernel)",
356 | "language": "python",
357 | "name": "python3"
358 | },
359 | "language_info": {
360 | "codemirror_mode": {
361 | "name": "ipython",
362 | "version": 3
363 | },
364 | "file_extension": ".py",
365 | "mimetype": "text/x-python",
366 | "name": "python",
367 | "nbconvert_exporter": "python",
368 | "pygments_lexer": "ipython3",
369 | "version": "3.9.12"
370 | }
371 | },
372 | "nbformat": 4,
373 | "nbformat_minor": 5
374 | }
375 |
--------------------------------------------------------------------------------