├── LICENSE
├── README.md
├── ornek-verisetleri
│   ├── amazon-ornek-veriseti.xlsx
│   ├── beyazperde-ornek-veriseti.xlsx
│   ├── ciceksepeti-ornek-veriseti.xlsx
│   ├── eksi-sozluk-ornek-veriseti.xlsx
│   ├── gittigidiyor-ornek-veriseti.xlsx
│   ├── hepsiburada-ornek-veriseti.xlsx
│   ├── incehesap-ornek-veriseti.xlsx
│   ├── kitapyurdu-ornek-veriseti.xlsx
│   ├── mediamarkt-ornek-veriseti.xlsx
│   ├── n11-ornek-veriseti.xlsx
│   ├── trendyol-ornek-veriseti.xlsx
│   ├── yemeksepeti-ornek-veriseti.xlsx
│   └── youtube-ornek-veriseti.xlsx
├── requirements.txt
└── scrapers
    ├── amazon_scraper
    │   ├── amazon-ornek-veriseti.xlsx
    │   ├── amazon_scraper.py
    │   └── requirements.txt
    ├── beyazperde_scraper
    │   ├── beyazperde-ornek-veriseti.xlsx
    │   ├── beyazperde_scraper.py
    │   └── requirements.txt
    ├── ciceksepeti_scraper
    │   ├── ciceksepeti-ornek-veriseti.xlsx
    │   ├── ciceksepeti_scraper.py
    │   └── requirements.txt
    ├── eksi_scraper
    │   ├── eksi-sozluk-ornek-veriseti.xlsx
    │   ├── eksi_scraper.py
    │   └── requirements.txt
    ├── gittigidiyor_scraper
    │   ├── gittigidiyor-ornek-veriseti.xlsx
    │   ├── gittigidiyor_scraper.py
    │   └── requirements.txt
    ├── hepsiburada_scraper
    │   ├── hepsiburada-ornek-veriseti.xlsx
    │   ├── hepsiburada_scraper.py
    │   └── requirements.txt
    ├── incehesap_scraper
    │   ├── incehesap-ornek-veriseti.xlsx
    │   ├── incehesap_scraper.py
    │   └── requirements.txt
    ├── kitapyurdu_scraper
    │   ├── kitapyurdu-ornek-veriseti.xlsx
    │   ├── kitapyurdu_scraper.py
    │   └── requirements.txt
    ├── main-scraper.py
    ├── mediamarkt_scraper
    │   ├── mediamarkt-ornek-veriseti.xlsx
    │   ├── mediamarkt_scraper.py
    │   └── requirements.txt
    ├── n11_scraper
    │   ├── n11-ornek-veriseti.xlsx
    │   ├── n11_scraper.py
    │   └── requirements.txt
    ├── trendyol_scraper
    │   ├── requirements.txt
    │   ├── trendyol-ornek-veriseti.xlsx
    │   └── trendyol_scraper.py
    ├── yemeksepeti_scraper
    │   ├── requirements.txt
    │   ├── yemeksepeti-ornek-veriseti.xlsx
    │   └── yemeksepeti_scraper.py
    └── youtube_scraper
        ├── requirements.txt
        ├── youtube-ornek-veriseti.xlsx
        └── youtube_scraper.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Arda Uzunoğlu
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TRScraper
2 |
3 | TRScraper is a tool developed for use in **natural language processing** applications; it performs **text mining** on major platforms that host Turkish-language content.
4 |
5 | ### Method and Libraries Used
6 |
7 | TRScraper scrapes various major platforms with the **selenium** library and exports the collected data to an .xlsx Excel file via the **pandas** library.
8 |
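The selenium-to-pandas pipeline described above can be sketched in a few lines. This is a minimal illustration with hypothetical review texts; in the real scrapers the list is filled by selenium while walking the review pages:

```python
import pandas as pd

# Hypothetical review texts; the real scrapers collect these with selenium.
reviews = ["Harika bir ürün, tavsiye ederim.", "Kargo biraz geç geldi."]

# Each scraper builds one DataFrame column per scraped field...
df = pd.DataFrame({"Yorum": reviews})

# ...and exports it like this (requires an Excel engine such as openpyxl):
# df.to_excel("ornek.xlsx", header=True, index=False)
```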
9 | ### Requirements
10 | > You can install the required libraries with the `pip install -r requirements.txt` command.
11 |
12 | > Chromedriver must be downloaded. Once the download is complete, assign Chromedriver's location to the `path` variable inside the '.py' files.
13 |
14 | [Download ChromeDriver](https://chromedriver.storage.googleapis.com/index.html?path=84.0.4147.30/)
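Concretely, the `path` variable in each scraper is just the chromedriver location. The path below is a placeholder; point it at wherever you unpacked the download:

```python
# Placeholder location; replace with your own chromedriver binary's path.
path = "/usr/local/bin/chromedriver"  # e.g. r"C:\tools\chromedriver.exe" on Windows

# Selenium 3.x (the version pinned in requirements.txt) takes the path positionally:
# from selenium import webdriver
# driver = webdriver.Chrome(path)
```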
15 |
16 |
17 | ### Features in Development
18 |
19 | - Additional platforms
20 | - A user interface
21 | - Increasing the number of usable drivers
22 |
23 | ### Gallery - Ekşi Sözlük Example
24 |
25 | 
26 | 
27 | 
28 | 
29 |
30 | ### Developer
31 |
32 | My GitHub account: [ardauzunoglu](https://github.com/ardauzunoglu)
33 | My LinkedIn account: [Arda Uzunoğlu](https://www.linkedin.com/in/arda-uzunoğlu/)
34 |
--------------------------------------------------------------------------------
/ornek-verisetleri/amazon-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/amazon-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/beyazperde-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/beyazperde-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/ciceksepeti-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/ciceksepeti-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/eksi-sozluk-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/eksi-sozluk-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/gittigidiyor-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/gittigidiyor-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/hepsiburada-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/hepsiburada-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/incehesap-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/incehesap-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/kitapyurdu-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/kitapyurdu-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/mediamarkt-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/mediamarkt-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/n11-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/n11-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/trendyol-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/trendyol-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/yemeksepeti-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/yemeksepeti-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/ornek-verisetleri/youtube-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/ornek-verisetleri/youtube-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4==4.9.1
2 | numpy==1.18.5
3 | pandas==1.1.4
4 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/amazon_scraper/amazon-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/amazon_scraper/amazon-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/amazon_scraper/amazon_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import requests
4 | import pandas as pd
5 | from bs4 import BeautifulSoup
6 | from selenium import webdriver
7 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
8 | from selenium.webdriver.common.keys import Keys
9 |
10 | def amazon_scraper():
11 | def selenium():
12 | def selenium_initialize():
13 | def preference(scrape_input, question):
14 | while True:  # loop until the user gives a valid y/n answer
15 | if scrape_input.lower() == "y":
16 | output = True
17 | break
18 |
19 | elif scrape_input.lower() == "n":
20 | output = False
21 | break
22 |
23 | else:
24 | print("Geçersiz yanıt.")
25 | scrape_input = input(question)
26 |
27 | return output
28 |
29 | def delay_check(delay):
30 | while type(delay) != int:
31 | try:
32 | delay = int(delay)
33 | except ValueError:
34 | print("Lütfen bir sayı değeri giriniz.")
35 | delay = input("Bekleme süresi: ")
36 |
37 | return delay
38 |
39 | global product_name, file, delay, review_texts, review_headlines, review_useful, customer_name_texts, date_texts, scrape_headlines, scrape_useful, scrape_customer_names, scrape_dates, path
40 |
41 | product_name = input("İncelemelerin çekileceği ürün adı: ")
42 | file = input("Oluşturulacak Excel dosyasının adı: ")
43 | file = file + ".xlsx"
44 | delay = delay_check(input("Bekleme süresi(sn): "))
45 |
46 | review_texts = []
47 | review_useful = []
48 | review_headlines = []
49 | customer_name_texts = []
50 | date_texts = []
51 |
52 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
53 | scrape_useful_input = input(scrape_useful_question)
54 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
55 |
56 | scrape_headlines_question = "İncelemenin başlığı çekilsin mi(y/n): "
57 | scrape_headlines_input = input(scrape_headlines_question)
58 | scrape_headlines = preference(scrape_headlines_input, scrape_headlines_question)
59 |
60 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
61 | scrape_customer_name_input = input(scrape_customer_name_question)
62 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
63 |
64 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
65 | scrape_date_input = input(scrape_date_question)
66 | scrape_dates = preference(scrape_date_input, scrape_date_question)
67 |
68 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
69 |
70 | def selenium_scrape():
71 | try:
72 | print("Chromedriver'a erişiliyor...")
73 | driver = webdriver.Chrome(path)
74 | time.sleep(delay)
75 | print("Chromedriver'a erişildi.")
76 |
77 | except WebDriverException:
78 | print("Chromedriver kullanılamıyor.")
79 | sys.exit()
80 |
81 | try:
82 | print("Amazon adresine gidiliyor...")
83 | driver.get("https://www.amazon.com.tr")
84 | time.sleep(delay)
85 | driver.maximize_window()
86 | time.sleep(delay)
87 | print("Amazon adresine gidildi.")
88 |
89 | except:
90 | print("Amazon'a erişilemiyor.")
91 | sys.exit()
92 |
93 | try:
94 | print("Ürün aranıyor...")
95 | search_bar = driver.find_element_by_id("twotabsearchtextbox")
96 | search_bar.send_keys(product_name)
97 | search_bar.send_keys(Keys.ENTER)
98 | time.sleep(delay)
99 |
100 | product = driver.find_element_by_class_name("s-result-item")
101 | product.click()
102 | time.sleep(delay)
103 | print("Ürün bulundu.")
104 |
105 | except NoSuchElementException:
106 | print("Ürün bulunamadı.")
107 | sys.exit()
108 |
109 | try:
110 | time.sleep(delay)
111 | review_count = driver.find_element_by_id("acrCustomerReviewText")
112 | review_count = int(review_count.text.replace("değerlendirme", "").replace(".", "").strip())  # strip the label and the thousands separator
113 |
114 | except NoSuchElementException:
115 | print("İnceleme bulunamadı.")
116 | sys.exit()
117 |
118 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
119 | match = False
120 |
121 | while not match:
122 | lastCount = lenOfPage
123 | time.sleep(delay)
124 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
125 | if lastCount == lenOfPage:
126 | match = True
127 |
128 | see_all_reviews = driver.find_element_by_xpath("//*[@id='reviews-medley-footer']/div[2]/a")
129 | see_all_reviews.click()
130 |
131 | if review_count % 10 == 0:
132 | length_of_page = review_count // 10
133 |
134 | else:
135 | length_of_page = (review_count // 10) + 1
136 |
137 | l = 1
138 |
139 | while l <= length_of_page:
140 |
141 | print("İncelemeler çekiliyor...")
142 | print("Sayfa: " + str(l))
143 |
144 | time.sleep(delay)
145 |
146 | reviews = driver.find_elements_by_class_name("review")
147 | for review in reviews:
148 | review_text = review.find_element_by_class_name("review-text-content").text
149 | review_texts.append(review_text)
150 |
151 | try:
152 | headline = review.find_element_by_class_name("review-title").text
153 | review_headlines.append(headline)
154 |
155 | except:
156 | review_headlines.append("BOŞ")
157 |
158 | try:
159 | useful = review.find_element_by_class_name("cr-vote-text")
160 | useful = useful.text.split()[0]
161 | if useful.lower() == "bir":
162 | useful = "1"
163 | review_useful.append(useful)
164 |
165 | except:
166 | review_useful.append("0")
167 |
168 | customer = review.find_element_by_class_name("a-profile-name").text
169 | customer_name_texts.append(customer)
170 |
171 | date = review.find_element_by_class_name("review-date")
172 | date = date.text.split()
173 | date = date[1:4]
174 | date = " ".join(date)
175 | date_texts.append(date)
176 |
177 | driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
178 | try:
179 | next_page = driver.find_element_by_class_name("a-last")
180 | next_page.click()
181 |
182 | except:
183 | pass
184 |
185 | l += 1
186 |
187 | driver.close()
188 |
189 | length_list = [review_texts, review_useful, review_headlines, customer_name_texts, date_texts]
190 | limit = map(len, length_list)
191 | limit = min(list(limit))
192 | limit -= 1
193 |
194 | review_texts_fin = review_texts[:limit]
195 | df = pd.DataFrame({"Yorum": review_texts_fin})
196 |
197 | if scrape_useful:
198 | review_useful_fin = review_useful[:limit]
199 | df["Yorum Beğeni Sayısı"] = review_useful_fin
201 |
202 | if scrape_headlines:
203 | review_headlines_fin = review_headlines[:limit]
204 | df["Yorumun Başlığı"] = review_headlines_fin
205 |
206 | if scrape_customer_names:
207 | customer_name_texts_fin = customer_name_texts[:limit]
208 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
209 |
210 | if scrape_dates:
211 | date_texts_fin = date_texts[:limit]
212 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
213 |
214 | df.to_excel(file, header = True, index = False)
215 |
216 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
217 | print(x)
218 | print("""
219 | --------------------------------------------------------------------------
220 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
221 | - Github Hesabım: ardauzunoglu -
222 | --------------------------------------------------------------------------
223 | """)
224 | selenium_initialize()
225 | selenium_scrape()
226 |
227 | def beautifulsoup():
228 | def preference(scrape_input, question):
229 | while True:  # loop until the user gives a valid y/n answer
230 | if scrape_input.lower() == "y":
231 | output = True
232 | break
233 |
234 | elif scrape_input.lower() == "n":
235 | output = False
236 | break
237 |
238 | else:
239 | print("Geçersiz yanıt.")
240 | scrape_input = input(question)
241 |
242 | return output
243 |
244 | def get_soup(url):
245 | r = requests.get(url)
246 | soup = BeautifulSoup(r.text, "html.parser")
247 | return soup
248 |
249 | def get_length_of_pages():
250 | soup = get_soup(page_url)
251 | review_count = int(soup.find("div", {"data-hook":"cr-filter-info-review-rating-count"}).text.split()[4])
252 | if review_count % 10 == 0:
253 | length_of_pages = review_count // 10
254 | else:
255 | length_of_pages = review_count // 10 + 1
256 | return length_of_pages
257 |
258 | def get_reviews(soup):
259 | reviews = soup.find_all("div", {"data-hook":"review"})
260 | for review in reviews:
261 | try:
262 | like_count = review.find("span", {"data-hook":"helpful-vote-statement"}).text.split()[0] if review.find("span", {"data-hook":"helpful-vote-statement"}).text.split()[0] != "Bir" else "1"
263 | except:
264 | like_count = 0
265 | review = {
266 | "Yorum":review.find("span", {"data-hook":"review-body"}).text.replace("\n", ""),
267 | "Yorumun Beğeni Sayısı":like_count,
268 | "Yorumun Başlığı":review.find("a", {"data-hook":"review-title"}).text,
269 | "Yorum Yazan Müşteri":review.find("span", {"class":"a-profile-name"}).text.replace("\n", ""),
270 | "Yorumun Yazıldığı Tarih":" ".join(review.find("span", {"data-hook":"review-date"}).text.split()[1:4])
271 | }
272 | review_list.append(review)
273 |
274 | def list_to_excel(reviews):
275 | df = pd.DataFrame(reviews)
276 | if not scrape_useful:
277 | df = df.drop(columns=["Yorumun Beğeni Sayısı"])
278 |
279 | if not scrape_headlines:
280 | df = df.drop(columns=["Yorumun Başlığı"])
281 |
282 | if not scrape_customer_names:
283 | df = df.drop(columns=["Yorum Yazan Müşteri"])
284 |
285 | if not scrape_dates:
286 | df = df.drop(columns=["Yorumun Yazıldığı Tarih"])
287 |
288 | df.to_excel(file, header = True, index = False)
289 | print("Excele kaydedildi.")
290 |
291 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
292 | scrape_useful_input = input(scrape_useful_question)
293 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
294 |
295 | scrape_headlines_question = "İncelemenin başlığı çekilsin mi(y/n): "
296 | scrape_headlines_input = input(scrape_headlines_question)
297 | scrape_headlines = preference(scrape_headlines_input, scrape_headlines_question)
298 |
299 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
300 | scrape_customer_name_input = input(scrape_customer_name_question)
301 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
302 |
303 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
304 | scrape_date_input = input(scrape_date_question)
305 | scrape_dates = preference(scrape_date_input, scrape_date_question)
306 |
307 | review_list = []
308 | page_url = input("Ürün linki: ")
309 | file = input("Oluşturulacak Excel dosyasının adı: ") + ".xlsx"
310 |
311 | current_page = 1
312 |
313 | while True:
314 | if current_page == 1:
315 | page_url = page_url + "&reviewerType=all_reviews&pageNumber=" + str(current_page)
316 | else:
317 | page_url = page_url[:page_url.rfind("=") + 1] + str(current_page)  # swap in the new page number (handles multi-digit pages)
318 | soup = get_soup(page_url)
319 | print(page_url)
320 | print("Veriler çekiliyor...")
321 | print("Sayfa: " + str(current_page))
322 | get_reviews(soup)
323 | current_page += 1
324 | length_of_pages = get_length_of_pages()
325 | if current_page > length_of_pages:
326 | break
327 |
328 | list_to_excel(review_list)
329 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
330 | print(x)
331 | print("""
332 | --------------------------------------------------------------------------
333 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
334 | - Github Hesabım: ardauzunoglu -
335 | --------------------------------------------------------------------------
336 | """)
337 |
338 | print("""
339 | ---------------------------------------------------------
340 | - Amazon Scraper'a hoş geldiniz! -
341 | - Geliştirici: Arda Uzunoğlu -
342 | ---------------------------------------------------------
343 | """)
344 |
345 | s_or_bs = input("Kullanılacak kütüphane(s/bs): ")
346 | if s_or_bs.lower() == "bs":
347 | beautifulsoup()
348 |
349 | elif s_or_bs.lower() == "s":
350 | selenium()
351 |
352 | else:
353 | print("Geçersiz yanıt.")
354 |
355 | if __name__ == "__main__":
356 | amazon_scraper()
--------------------------------------------------------------------------------
/scrapers/amazon_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4==4.9.1
2 | numpy==1.18.5
3 | pandas==1.1.4
4 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/beyazperde_scraper/beyazperde-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/beyazperde_scraper/beyazperde-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/beyazperde_scraper/beyazperde_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 |
7 | def beyazperde_scrape():
8 | def initialize():
9 | def preference(scrape_input, question):
10 | while True:  # loop until the user gives a valid y/n answer
11 | if scrape_input.lower() == "y":
12 | output = True
13 | break
14 |
15 | elif scrape_input.lower() == "n":
16 | output = False
17 | break
18 |
19 | else:
20 | print("Geçersiz yanıt.")
21 | scrape_input = input(question)
22 |
23 | return output
24 |
25 | def delay_check(delay):
26 | while type(delay) != int:
27 | try:
28 | delay = int(delay)
29 | except ValueError:
30 | print("Lütfen bir sayı değeri giriniz.")
31 | delay = input("Bekleme süresi: ")
32 |
33 | return delay
34 |
35 | print("""
36 | ---------------------------------------------------------
37 | - Beyazperde Scraper'a hoş geldiniz! -
38 | - Geliştirici: Arda Uzunoğlu -
39 | ---------------------------------------------------------
40 | """)
41 |
42 | global film, file, delay, review_texts, review_useful, review_not_useful, review_scores, member_name_texts, date_texts, scrape_useful, scrape_scores, scrape_member_name, scrape_date, path
43 |
44 | film = input("İncelemelerin Çekileceği Film: ")
45 | file = input("Oluşturulacak Excel dosyasının adı: ") + ".xlsx"
46 | delay = delay_check(input("Bekleme süresi(sn): "))
47 |
48 | review_texts = []
49 | review_useful = []
50 | review_not_useful = []
51 | review_scores = []
52 | member_name_texts = []
53 | date_texts = []
54 |
55 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
56 | scrape_useful_input = input(scrape_useful_question)
57 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
58 |
59 | scrape_scores_question = "Filme verilen puan çekilsin mi(y/n): "
60 | scrape_scores_input = input(scrape_scores_question)
61 | scrape_scores = preference(scrape_scores_input, scrape_scores_question)
62 |
63 | scrape_member_name_question = "Kullanıcı isimleri çekilsin mi(y/n): "
64 | scrape_member_name_input = input(scrape_member_name_question)
65 | scrape_member_name = preference(scrape_member_name_input, scrape_member_name_question)
66 |
67 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
68 | scrape_date_input = input(scrape_date_question)
69 | scrape_date = preference(scrape_date_input, scrape_date_question)
70 |
71 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
72 |
73 | def scrape():
74 | try:
75 | print("Chromedriver'a erişiliyor...")
76 | driver = webdriver.Chrome(path)
77 | time.sleep(delay)
78 | print("Chromedriver'a erişildi.")
79 |
80 | except WebDriverException:
81 | print("Chromedriver kullanılamıyor.")
82 | sys.exit()
83 |
84 | try:
85 | print("Beyazperde adresine gidiliyor...")
86 | driver.get("http://www.beyazperde.com")
87 | time.sleep(delay)
88 | driver.maximize_window()
89 | time.sleep(delay)
90 | print("Beyazperde adresine gidildi.")
91 |
92 | except:
93 | print("Beyazperde'ye erişilemiyor.")
94 | sys.exit()
95 |
96 | try:
97 | print("Film aranıyor...")
98 | search = driver.find_element_by_class_name("header-search-input")
99 | search.send_keys(film)
100 | time.sleep(delay+3)
101 |
102 | auto_complete = driver.find_element_by_class_name("autocomplete-result-title")
103 | auto_complete.click()
104 | time.sleep(delay)
105 | print("Film bulundu.")
106 |
107 | except NoSuchElementException:
108 | print("Film bulunamadı.")
109 | sys.exit()
110 |
111 | try:
112 | member_reviews = driver.find_element_by_link_text("Üye Eleştirileri")
113 | member_reviews.click()
114 | time.sleep(delay)
115 |
116 | review_count = driver.find_element_by_class_name("titlebar-title.titlebar-title-md").text
117 | review_count = int(review_count.replace(" kullanıcı eleştirisi",""))
118 | time.sleep(delay)
119 |
120 | except NoSuchElementException:
121 | print("Film incelemeleri bulunamadı.")
122 | sys.exit()
123 |
124 | try:
125 | close_banner = driver.find_element_by_id("creativeClose")
126 | close_banner.click()
127 | time.sleep(delay)
128 |
129 | except NoSuchElementException:
130 | pass
131 |
132 | if (review_count % 20) == 0:
133 | review_page_count = review_count // 20
134 |
135 | else:
136 | review_page_count = (review_count // 20) + 1
137 |
138 | constant_url = driver.current_url
139 |
140 | l = 1
141 | while l <= review_page_count:
142 |
143 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
144 | match = False
145 |
146 | while not match:
147 | lastCount = lenOfPage
148 | time.sleep(delay)
149 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
150 | if lastCount == lenOfPage:
151 | match = True
152 |
153 | print("Veriler çekiliyor...")
154 | print("Sayfa: " + str(l))
155 |
156 | reviews = driver.find_elements_by_class_name("review-card-content")
157 | for review in reviews:
158 | review = review.text
159 | review_texts.append(review)
160 |
161 | usefuls = driver.find_elements_by_class_name("reviews-users-comment-useful")
162 | for useful_unp in usefuls:
163 | useful_unp = useful_unp.text
164 | useful_unp = useful_unp.split()
165 |
166 | useful = useful_unp[0][0]
167 | not_useful = useful_unp[0][1]
168 |
169 | review_useful.append(useful)
170 | review_not_useful.append(not_useful)
171 |
172 | scores = driver.find_elements_by_class_name("stareval-note")
173 | for score in scores:
174 | score = score.text.replace(",0","")
175 | review_scores.append(score)
176 |
177 | member_names = driver.find_elements_by_class_name("review-card-user-infos.cf")
178 | for member_name in member_names:
179 | separation = member_name.text.index("\n")
180 | member_name = member_name.text[:separation]
181 | member_name_texts.append(member_name)
182 |
183 | dates = driver.find_elements_by_class_name("review-card-meta-date")
184 | for date in dates:
185 | date = date.text.split()[:3]
186 | date = " ".join(date)
187 | date_texts.append(date)
188 |
189 | l += 1
190 |
191 | url = constant_url + "?page=" + str(l)
192 | driver.get(url)
193 |
194 | driver.close()
195 |
196 | length_list = [review_texts, review_useful, review_not_useful, review_scores, member_name_texts, date_texts]
197 | limit = map(len, length_list)
198 | limit = min(list(limit))
199 | limit -= 1
200 |
201 | review_texts_fin = review_texts[:limit]
202 | df = pd.DataFrame({"İncelemeler":review_texts_fin})
203 |
204 | if scrape_useful:
205 | review_useful_fin = review_useful[:limit]
206 | review_not_useful_fin = review_not_useful[:limit]
207 | df["İncelemeyi Yararlı Bulanlar"] = review_useful_fin
208 | df["İncelemeyi Yararlı Bulmayanlar"] = review_not_useful_fin
209 |
210 | if scrape_scores:
211 | review_scores_fin = review_scores[:limit]
212 | df["İnceleme Puanları"] = review_scores_fin
213 |
214 | if scrape_member_name:
215 | member_name_texts_fin = member_name_texts[:limit]
216 | df["İncelemeyi Yayınlayan Kişi"] = member_name_texts_fin
217 |
218 | if scrape_date:
219 | date_texts_fin = date_texts[:limit]
220 | df["İncelemenin Yayınlanma Tarihi"] = date_texts_fin
221 |
222 | df.to_excel(file, header = True, index = False)
223 | x = "Çektiğiniz veriler "+ file + " adlı excel dosyasına kaydedildi."
224 | print(x)
225 | print("""
226 | --------------------------------------------------------------------------
227 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
228 | - Github Hesabım: ardauzunoglu -
229 | --------------------------------------------------------------------------
230 | """)
231 | time.sleep(3)
232 |
233 | initialize()
234 | scrape()
235 |
236 | if __name__ == "__main__":
237 | beyazperde_scrape()
--------------------------------------------------------------------------------
/scrapers/beyazperde_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/ciceksepeti_scraper/ciceksepeti-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/ciceksepeti_scraper/ciceksepeti-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/ciceksepeti_scraper/ciceksepeti_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def ciceksepeti_scraper():
9 | def initialize():
10 | def preference(scrape_input, question):
11 |             while True:  # repeat until a valid y/n answer breaks the loop
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Çiçeksepeti Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, customer_province_texts, customer_name_texts, date_texts, scrape_province, scrape_customer_names, scrape_dates, path
44 |
45 | product_name = input("İncelemelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | customer_province_texts = []
52 | customer_name_texts = []
53 | date_texts = []
54 |
55 | scrape_province_question = "Müşterinin konumu çekilsin mi(y/n): "
56 | scrape_province_input = input(scrape_province_question)
57 | scrape_province = preference(scrape_province_input, scrape_province_question)
58 |
59 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
60 | scrape_customer_name_input = input(scrape_customer_name_question)
61 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
62 |
63 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
64 | scrape_date_input = input(scrape_date_question)
65 | scrape_dates = preference(scrape_date_input, scrape_date_question)
66 |
67 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
68 |
69 | def scrape():
70 | try:
71 | print("Chromedriver'a erişiliyor...")
72 | driver = webdriver.Chrome(path)
73 | time.sleep(delay)
74 | print("Chromedriver'a erişildi.")
75 |
76 | except WebDriverException:
77 | print("Chromedriver kullanılamıyor.")
78 | sys.exit()
79 |
80 | try:
81 | print("Çiçeksepeti adresine gidiliyor...")
82 | driver.get("https://www.ciceksepeti.com")
83 | time.sleep(delay)
84 | driver.maximize_window()
85 | time.sleep(delay)
86 | print("Çiçeksepeti adresine gidildi.")
87 |
88 |         except WebDriverException:
89 | print("Çiçeksepeti'ne erişilemiyor.")
90 | sys.exit()
91 |
92 | try:
93 | print("Ürün aranıyor...")
94 | search_bar = driver.find_element_by_class_name("product-search__input")
95 | search_bar.send_keys(product_name)
96 | search_bar.send_keys(Keys.ENTER)
97 | time.sleep(delay)
98 |
99 | product = driver.find_element_by_class_name("products__item-inner")
100 | product.click()
101 | time.sleep(delay)
102 | print("Ürün bulundu.")
103 |
104 | except NoSuchElementException:
105 | print("Ürün bulunamadı.")
106 | sys.exit()
107 |
108 | see_all_reviews = driver.find_element_by_class_name("comments__all-comments")
109 | see_all_reviews.click()
110 |
111 | review_count = driver.find_element_by_class_name("page-comments__product-evaluation__comment-count").text.replace("Yorum", "")
112 | review_count = int(review_count.strip("()"))
113 |
114 | if review_count % 20 == 0:
115 | length_of_page = review_count // 20
116 |
117 | else:
118 | length_of_page = (review_count // 20) + 1
119 |
120 | l = 1
121 |
122 | while l <= length_of_page:
123 | print("İncelemeler çekiliyor...")
124 | print("Sayfa: " + str(l))
125 |
126 | time.sleep(delay)
127 |
128 | reviews = driver.find_elements_by_class_name("page-comments__list__item")
129 | for review in reviews:
130 | review_text = review.find_element_by_class_name("page-comments__list__item__text").text
131 | if review_text == "":
132 | review_text = "BOŞ"
133 | review_texts.append(review_text)
134 |
135 | customer_name = review.find_element_by_class_name("page-comments__list__item__name").text
136 | customer_name_texts.append(customer_name)
137 |
138 | try:
139 | review = review.text.replace(review_text, "")
140 |
141 | except:
142 | pass
143 |
144 | review = review.replace(customer_name, "")
145 | review = review.replace(" | ", "").split()
146 | customer_province = review[0]
147 | date = review[1]
148 |
149 | customer_province_texts.append(customer_province)
150 | date_texts.append(date)
151 |
152 | try:
153 | driver.execute_script("window.scrollTo(0, 2160)")
154 | next_page = driver.find_element_by_class_name("cs-next")
155 | next_page.click()
156 |
157 | except:
158 | pass
159 |
160 | l += 1
161 |
162 | driver.close()
163 |
164 | length_list = [review_texts, customer_province_texts, customer_name_texts, date_texts]
165 | limit = map(len, length_list)
166 | limit = min(list(limit))
167 |             limit -= 1  # leave out the last record, which may have been scraped incompletely
168 |
169 | review_texts_fin = review_texts[:limit]
170 | df = pd.DataFrame({"Yorum": review_texts_fin})
171 |
172 | if scrape_province:
173 | customer_province_texts_fin = customer_province_texts[:limit]
174 |                 df["Müşterinin Konumu"] = customer_province_texts_fin
176 |
177 | if scrape_customer_names:
178 | customer_name_texts_fin = customer_name_texts[:limit]
179 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
180 |
181 | if scrape_dates:
182 | date_texts_fin = date_texts[:limit]
183 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
184 |
185 | df.to_excel(file, header = True, index = False)
186 |
187 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
188 | print(x)
189 |
190 | print("""
191 | --------------------------------------------------------------------------
192 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
193 | - Github Hesabım: ardauzunoglu -
194 | --------------------------------------------------------------------------
195 | """)
196 |
197 | initialize()
198 | scrape()
199 |
200 | if __name__ == "__main__":
201 | ciceksepeti_scraper()
--------------------------------------------------------------------------------
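The paging logic in `ciceksepeti_scraper.py` (20 reviews per page, with a remainder adding one extra page) is ceiling division. A minimal standalone sketch of that arithmetic:

```python
import math

def page_count(review_count: int, per_page: int = 20) -> int:
    # Ceiling division: a partially filled last page still counts as a page.
    return math.ceil(review_count / per_page)
```

For example, `page_count(40)` is 2 while `page_count(41)` is 3, matching the `//`-plus-remainder branches in the scraper.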
/scrapers/ciceksepeti_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/eksi_scraper/eksi-sozluk-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/eksi_scraper/eksi-sozluk-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/eksi_scraper/eksi_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import NoSuchElementException, WebDriverException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def eksisozluk_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 |             while True:  # repeat until a valid y/n answer breaks the loop
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Ekşi Sözlük Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global title, file, delay, entry_texts, author_texts, date_texts, scrape_author_input, scrape_date_input, scrape_author, scrape_date, path
44 |
45 | title = input("Entrylerin çekileceği başlık: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | entry_texts = []
51 | author_texts = []
52 | date_texts = []
53 |
54 | scrape_author_question = "Yazar isimleri çekilsin mi(y/n): "
55 | scrape_author_input = input(scrape_author_question)
56 | scrape_author = preference(scrape_author_input, scrape_author_question)
57 |
58 | scrape_date_question = "Entry tarihleri çekilsin mi(y/n): "
59 | scrape_date_input = input(scrape_date_question)
60 | scrape_date = preference(scrape_date_input, scrape_date_question)
61 |
62 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
63 |
64 | def scrape():
65 | try:
66 | print("Chromedriver'a erişiliyor...")
67 | driver = webdriver.Chrome(path)
68 | time.sleep(delay)
69 | print("Chromedriver'a erişildi.")
70 |
71 | except WebDriverException:
72 | print("Chromedriver kullanılamıyor.")
73 | sys.exit()
74 |
75 | try:
76 | print("Ekşi Sözlük adresine gidiliyor...")
77 | driver.get("https://eksisozluk.com")
78 | time.sleep(delay)
79 | driver.maximize_window()
80 | time.sleep(delay)
81 | print("Ekşi Sözlük adresine gidildi.")
82 |
83 |         except WebDriverException:
84 | print("Ekşi Sözlük'e erişilemiyor.")
85 | sys.exit()
86 |
87 | try:
88 | print("Başlık aranıyor...")
89 | search_bar = driver.find_element_by_id("search-textbox")
90 | search_bar.send_keys(title)
91 | search_bar.send_keys(Keys.ENTER)
92 | time.sleep(delay)
93 | print("Başlık bulundu.")
94 |
95 | except NoSuchElementException:
96 | print("Başlık bulunamadı.")
97 | sys.exit()
98 |
99 | try:
100 | length_of_title = driver.find_element_by_class_name("last")
101 | length_of_title = int(length_of_title.text)
102 |
103 | except NoSuchElementException:
104 | length_of_title = 1
105 |
106 | l = 1
107 |
108 | while l <= length_of_title:
109 |
110 | print("Veriler çekiliyor...")
111 | print("Sayfa: " + str(l))
112 |
113 | time.sleep(delay)
114 |
115 | entries = driver.find_elements_by_css_selector(".content")
116 | for entry in entries:
117 | entry = entry.text
118 | entry_texts.append(entry)
119 |
120 | time.sleep(delay)
121 |
122 | dates = driver.find_elements_by_class_name("entry-date")
123 | for date in dates:
124 | date = date.text
125 | date_texts.append(date)
126 |
127 | time.sleep(delay)
128 |
129 | authors = driver.find_elements_by_class_name("entry-author")
130 | for author in authors:
131 | author = author.text
132 | author_texts.append(author)
133 |
134 | l += 1
135 |
136 |             try:
137 |                 close_ad = driver.find_element_by_id("interstitial-close-link-tag")
138 |                 close_ad.click()
139 |                 time.sleep(delay)
140 |             except NoSuchElementException:
141 |                 pass
142 |
143 |             try:
144 |                 next_page = driver.find_element_by_class_name("next")
145 |                 next_page.click()
146 |             except NoSuchElementException:
147 |                 pass
148 |
149 | driver.close()
150 |
151 | length_list = [entry_texts, author_texts, date_texts]
152 | limit = map(len, length_list)
153 | limit = min(list(limit))
154 | limit -= 1
155 |
156 | entry_texts_fin = entry_texts[:limit]
157 | df = pd.DataFrame({"Entryler": entry_texts_fin})
158 |
159 | if scrape_date:
160 | date_texts_fin = date_texts[:limit]
161 | df["Tarihler"] = date_texts_fin
162 |
163 | if scrape_author:
164 | author_texts_fin = author_texts[:limit]
165 | df["Yazarlar"] = author_texts_fin
166 |
167 | df.to_excel(file, header = True, index = False)
168 |
169 | print("Başlık kazıması tamamlandı.")
170 | print("Çektiğiniz veriler "+ file + " adlı excel dosyasına kaydedildi.")
171 | print("""
172 | --------------------------------------------------------------------------
173 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
174 | - Github Hesabım: ardauzunoglu -
175 | --------------------------------------------------------------------------
176 | """)
177 |
178 | time.sleep(3)
179 |
180 | initialize()
181 | scrape()
182 |
183 | if __name__ == "__main__":
184 | eksisozluk_scrape()
--------------------------------------------------------------------------------
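Every scraper trims its collected lists to a common length before building the DataFrame, since the columns must align row-for-row. A standalone sketch of that step (note: unlike the scrapers, which also subtract one, this keeps all complete rows):

```python
def truncate_to_common_length(*columns):
    # Trim every column to the shortest length so they align row-for-row.
    limit = min(len(col) for col in columns)
    return [col[:limit] for col in columns]
```

This generalizes the `map(len, ...)` / `min(...)` / slicing sequence repeated in each `scrape()` function.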
/scrapers/eksi_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/gittigidiyor_scraper/gittigidiyor-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/gittigidiyor_scraper/gittigidiyor-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/gittigidiyor_scraper/gittigidiyor_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def gittigidiyor_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 |             while True:  # repeat until a valid y/n answer breaks the loop
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Gittigidiyor Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, review_headlines, review_useful, customer_name_texts, date_texts, scrape_headlines, scrape_useful, scrape_customer_names, scrape_dates, path
44 |
45 | product_name = input("İncelemelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | review_headlines = []
53 | customer_name_texts = []
54 | date_texts = []
55 |
56 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
57 | scrape_useful_input = input(scrape_useful_question)
58 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
59 |
60 | scrape_headlines_question = "İncelemenin başlığı çekilsin mi(y/n): "
61 | scrape_headlines_input = input(scrape_headlines_question)
62 | scrape_headlines = preference(scrape_headlines_input, scrape_headlines_question)
63 |
64 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
65 | scrape_customer_name_input = input(scrape_customer_name_question)
66 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
67 |
68 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
69 | scrape_date_input = input(scrape_date_question)
70 | scrape_dates = preference(scrape_date_input, scrape_date_question)
71 |
72 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
73 |
74 | def scrape():
75 | try:
76 | print("Chromedriver'a erişiliyor...")
77 | driver = webdriver.Chrome(path)
78 | time.sleep(delay)
79 | print("Chromedriver'a erişildi.")
80 |
81 | except WebDriverException:
82 | print("Chromedriver kullanılamıyor.")
83 | sys.exit()
84 |
85 | try:
86 | print("Gittigidiyor adresine gidiliyor...")
87 | driver.get("https://www.gittigidiyor.com")
88 | time.sleep(delay)
89 | driver.maximize_window()
90 | time.sleep(delay)
91 | print("Gittigidiyor adresine gidildi.")
92 |
93 |         except WebDriverException:
94 | print("Gittigidiyor'a erişilemiyor.")
95 | sys.exit()
96 |
97 | try:
98 | print("Ürün aranıyor...")
99 | search_bar = driver.find_element_by_xpath("//*[@id='__next']/header/div[3]/div/div/div/div[2]/form/div/div[1]/div[2]/input")
100 | search_bar.send_keys(product_name)
101 | search_bar.send_keys(Keys.ENTER)
102 | time.sleep(delay)
103 |
104 | product = driver.find_element_by_class_name("srp-item-list")
105 | product.click()
106 | time.sleep(delay)
107 | print("Ürün bulundu.")
108 |
109 | except NoSuchElementException:
110 | print("Ürün bulunamadı.")
111 | sys.exit()
112 |
113 | url = driver.current_url
114 | root = url.index("_")
115 | url = url[:root]
116 | url = url + "/yorumlari"
117 | driver.get(url)
118 |
119 | review_counts = driver.find_element_by_class_name("catalog-point-content").text
120 | review_counts = int(review_counts.replace("Kullanıcı Değerlendirmesi", ""))
121 |
122 | if review_counts % 10 == 0:
123 | length_of_page = review_counts // 10
124 | else:
125 | length_of_page = (review_counts // 10) + 1
126 |
127 | l = 1
128 |
129 | while l <= length_of_page:
130 |
131 | print("İncelemeler çekiliyor...")
132 | print("Sayfa: " + str(l))
133 |
134 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
135 | match = False
136 |
137 | while match == False:
138 | lastCount = lenOfPage
139 | time.sleep(delay)
140 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
141 | if lastCount == lenOfPage:
142 | match = True
143 |
144 | time.sleep(delay)
145 |
146 | reviews = driver.find_elements_by_class_name("user-catalog-review-comment-detail")
147 | for review in reviews:
148 | review = review.text
149 | if review == "":
150 | review = "BOŞ"
151 | review_texts.append(review)
152 |
153 | print("Veriler çekiliyor...")
154 | print("İnceleme: " + str(len(review_texts)))
155 |
156 | time.sleep(delay)
157 |
158 | usefuls = driver.find_elements_by_class_name("point-count")
159 | for useful in usefuls:
160 | useful = useful.text
161 | if useful == "":
162 | useful = "0"
163 | review_useful.append(useful)
164 |
165 | headlines = driver.find_elements_by_class_name("user-catalog-review-header")
166 | for headline in headlines:
167 | headline = headline.text
168 | if headline == "":
169 | headline = "BOŞ"
170 | review_headlines.append(headline)
171 |
172 | customers = driver.find_elements_by_class_name("user-detail-container")
173 | for customer in customers:
174 | customer = customer.text
175 | customer = customer.split()
176 |
177 | customer_name = customer[0]
178 | customer_name_texts.append(customer_name)
179 |
180 | date = customer[1]
181 | date_texts.append(date)
182 |
183 | try:
184 | next_button = driver.find_element_by_class_name("next-link")
185 | next_button.click()
186 |
187 | except:
188 | pass
189 |
190 | l += 1
191 |
192 | driver.close()
193 |
194 | length_list = [review_texts, review_useful, review_headlines, customer_name_texts, date_texts]
195 | limit = map(len, length_list)
196 | limit = min(list(limit))
197 | limit -= 1
198 |
199 | review_texts_fin = review_texts[:limit]
200 | df = pd.DataFrame({"Yorum": review_texts_fin})
201 |
202 | if scrape_useful:
203 | review_useful_fin = review_useful[:limit]
204 | df["Yorum Beğeni Sayısı"] = review_useful_fin
206 |
207 | if scrape_headlines:
208 | review_headlines_fin = review_headlines[:limit]
209 | df["Yorumun Başlığı"] = review_headlines_fin
210 |
211 | if scrape_customer_names:
212 | customer_name_texts_fin = customer_name_texts[:limit]
213 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
214 |
215 | if scrape_dates:
216 | date_texts_fin = date_texts[:limit]
217 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
218 |
219 | df.to_excel(file, header = True, index = False)
220 |
221 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
222 | print(x)
223 |
224 | print("""
225 | --------------------------------------------------------------------------
226 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
227 | - Github Hesabım: ardauzunoglu -
228 | --------------------------------------------------------------------------
229 | """)
230 |
231 | time.sleep(3)
232 | initialize()
233 | scrape()
234 |
235 | if __name__ == "__main__":
236 | gittigidiyor_scrape()
--------------------------------------------------------------------------------
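The `preference` helper duplicated in every scraper re-prompts until a valid `y`/`n` answer arrives. A compact, testable refactor of the same behavior; the `ask` parameter is an assumption added here for testability and is not part of the original code:

```python
def preference(question: str, ask=input) -> bool:
    # Re-prompt until the answer is y or n; return True for y.
    answer = ask(question)
    while answer.lower() not in ("y", "n"):
        print("Geçersiz yanıt.")
        answer = ask(question)
    return answer.lower() == "y"
```

Injecting `ask` lets the loop be driven by a scripted sequence of answers instead of stdin.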
/scrapers/gittigidiyor_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/hepsiburada_scraper/hepsiburada-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/hepsiburada_scraper/hepsiburada-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/hepsiburada_scraper/hepsiburada_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import NoSuchElementException, WebDriverException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def hepsiburada_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 |             while True:  # repeat until a valid y/n answer breaks the loop
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Hepsiburada Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, review_useful, review_not_useful, customer_name_texts, customer_province_texts, customer_age_texts, date_texts, scrape_useful, scrape_customer_name, scrape_customer_province, scrape_customer_age, scrape_date, path
44 |
45 | product_name = input("Değerlendirmelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | review_not_useful = []
53 | customer_name_texts = []
54 | customer_province_texts = []
55 | customer_age_texts = []
56 | date_texts = []
57 |
58 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
59 | scrape_useful_input = input(scrape_useful_question)
60 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
61 |
62 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
63 | scrape_customer_name_input = input(scrape_customer_name_question)
64 | scrape_customer_name = preference(scrape_customer_name_input, scrape_customer_name_question)
65 |
66 | scrape_customer_province_question = "Müşteri konumları çekilsin mi(y/n): "
67 | scrape_customer_province_input = input(scrape_customer_province_question)
68 | scrape_customer_province = preference(scrape_customer_province_input, scrape_customer_province_question)
69 |
70 | scrape_customer_age_question = "Müşteri yaşları çekilsin mi(y/n): "
71 | scrape_customer_age_input = input(scrape_customer_age_question)
72 | scrape_customer_age = preference(scrape_customer_age_input, scrape_customer_age_question)
73 |
74 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
75 | scrape_date_input = input(scrape_date_question)
76 | scrape_date = preference(scrape_date_input, scrape_date_question)
77 |
78 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
79 |
80 | def scrape():
81 | try:
82 | print("Chromedriver'a erişiliyor...")
83 | driver = webdriver.Chrome(path)
84 | time.sleep(delay)
85 | print("Chromedriver'a erişildi.")
86 |
87 | except WebDriverException:
88 | print("Chromedriver kullanılamıyor.")
89 | sys.exit()
90 |
91 | try:
92 | print("Hepsiburada adresine gidiliyor...")
93 | driver.get("https://www.hepsiburada.com")
94 | time.sleep(delay)
95 | driver.maximize_window()
96 | time.sleep(delay)
97 | print("Hepsiburada adresine gidildi.")
98 |
99 |         except WebDriverException:
100 | print("Hepsiburada'ya erişilemiyor.")
101 | sys.exit()
102 |
103 | try:
104 | print("Ürün aranıyor...")
105 | search_bar = driver.find_element_by_class_name("desktopOldAutosuggestTheme-input")
106 | search_bar.send_keys(product_name)
107 | search_bar.send_keys(Keys.ENTER)
108 | time.sleep(delay)
109 |
110 | product = driver.find_element_by_class_name("search-item")
111 | product.click()
112 | time.sleep(delay)
113 | print("Ürün bulundu.")
114 |
115 | except NoSuchElementException:
116 | print("Ürün bulunamadı.")
117 | sys.exit()
118 |
119 | try:
120 | review_count = driver.find_element_by_id("productReviewsTab").text
121 | review_count = review_count.replace("Değerlendirmeler ", "")
122 | review_count = review_count.replace("(","")
123 | review_count = review_count.replace(")","")
124 | review_count = int(review_count)
125 |                 if review_count % 10 == 0:
126 | review_page_count = review_count // 10
127 |
128 | else:
129 | review_page_count = (review_count // 10) + 1
130 |
131 | constant_url = driver.current_url
132 |
133 | except NoSuchElementException:
134 | print("İnceleme bulunamadı.")
135 | sys.exit()
136 |
137 | try:
138 | index_of_question_mark = constant_url.index("?")
139 | constant_url = constant_url[:index_of_question_mark]
140 |
141 |             except ValueError:  # str.index raises ValueError when "?" is absent
142 | pass
143 |
144 | i = 1
145 | while i <= review_page_count:
146 |
147 | url = constant_url + "-yorumlari?sayfa=" + str(i)
148 | driver.get(url)
149 |
150 | print("Veriler çekiliyor...")
151 | print("Sayfa: " + str(i))
152 |
153 | reviews = driver.find_elements_by_xpath("//*[@id='hermes-voltran-comments']//span[@itemprop='description']")
154 | for review in reviews:
155 | review = review.text
156 | review_texts.append(review)
157 |
158 | customer_names = driver.find_elements_by_xpath("//*[@id='hermes-voltran-comments']//span[@itemprop='author']")
159 | for customer_name in customer_names:
160 | customer_name = customer_name.text
161 | customer_name_texts.append(customer_name)
162 |
163 | customer_ages = driver.find_elements_by_xpath("//*[@class='hermes-ReviewCard-module-1-Wp3']//span[2]")
164 | for customer_age in customer_ages:
165 | customer_age = customer_age.text
166 | customer_age = customer_age.replace("(", "")
167 | customer_age = customer_age.replace(")", "")
168 |
169 | if customer_age == "":
170 | customer_age = "Boş"
171 |
172 | customer_age_texts.append(customer_age)
173 |
174 | customer_provinces = driver.find_elements_by_xpath("//*[@class='hermes-ReviewCard-module-1-Wp3']//span[3]")
175 | for customer_province in customer_provinces:
176 | customer_province = customer_province.text
177 | customer_province = customer_province.replace("-", "")
178 | customer_province = customer_province.replace(" ", "")
179 | customer_province_texts.append(customer_province)
180 |
181 | dates = driver.find_elements_by_xpath("//*[@id='hermes-voltran-comments']//span[@itemprop='datePublished']")
182 | for date in dates:
183 | date = date.text
184 | date = date.replace(",", "")
185 | date = date.split()
186 |
187 | day_conv = {
188 | "Pts":"Pazartesi",
189 | "Sal":"Salı",
190 | "Çar":"Çarşamba",
191 | "Per":"Perşembe",
192 | "Cum":"Cuma",
193 | "Cts":"Cumartesi",
194 | "Paz":"Pazar",
195 | "Pazartesi":"Pazartesi",
196 | "Salı":"Salı",
197 | "Çarşamba":"Çarşamba",
198 | "Perşembe":"Perşembe",
199 | "Cuma":"Cuma",
200 | "Cumartesi":"Cumartesi",
201 | "Pazar":"Pazar"
202 | }
203 |
204 | years = ["2020", "2019", "2018", "2017", "2016", "2015", "2014", "2013", "2012", "2011", "2010", "2009", "2008", "2007", "2006", "2005", "2004", "2003", "2002", "2001", "2000"]
205 |
206 | if date[2] not in years:
207 | date.insert(2, "2021")
208 |
209 | date[-1] = day_conv[date[-1]]
210 | date = " ".join(date)
211 | date_texts.append(date)
212 |
213 | usefuls = driver.find_elements_by_xpath("//*[@id='hermes-voltran-comments']//button[@class='hermes-ReviewCard-module-1MoiF']")
214 | not_usefuls = driver.find_elements_by_xpath("//*[@id='hermes-voltran-comments']//button[@class='hermes-ReviewCard-module-39K0Y']")
215 |
216 | for useful in usefuls:
217 | useful = useful.text
218 | useful = useful.replace("Evet", "")
219 | useful = useful.replace("(", "")
220 | useful = useful.replace(")", "")
221 | review_useful.append(useful)
222 |
223 | for not_useful in not_usefuls:
224 | not_useful = not_useful.text
225 | not_useful = not_useful.replace("Hayır", "")
226 | not_useful = not_useful.replace("(", "")
227 | not_useful = not_useful.replace(")", "")
228 | review_not_useful.append(not_useful)
229 |
230 | while len(review_useful) < len(date_texts):
231 | review_useful.append("0")
232 | review_not_useful.append("0")
233 |
234 | while len(review_texts) < len(date_texts):
235 | review_texts.append("Boş")
236 |
237 | i += 1
238 |
239 | driver.close()
240 |
241 | length_list = [review_texts, review_useful, review_not_useful, date_texts, customer_name_texts, customer_age_texts, customer_province_texts]
242 | limit = map(len, length_list)
243 | limit = min(list(limit))
244 | limit -= 1
245 |
246 | review_texts_fin = review_texts[:limit]
247 | df = pd.DataFrame({"Değerlendirme: ":review_texts_fin})
248 |
249 | if scrape_useful:
250 | review_useful_fin = review_useful[:limit]
251 | review_not_useful_fin = review_not_useful[:limit]
252 | df["Değerlendirmeyi Yararlı Bulanlar"] = review_useful_fin
253 | df["Değerlendirmeyi Yararlı Bulmayanlar"] = review_not_useful_fin
254 |
255 | if scrape_date:
256 | date_texts_fin = date_texts[:limit]
257 | df["Değerlendirme Tarihi:"] = date_texts_fin
258 |
259 | if scrape_customer_name:
260 | customer_name_texts_fin = customer_name_texts[:limit]
261 | df["Müşterinin Adı Soyadı"] = customer_name_texts_fin
262 |
263 | if scrape_customer_age:
264 | customer_age_texts_fin = customer_age_texts[:limit]
265 | df["Müşterinin Yaşı"] = customer_age_texts_fin
266 |
267 | if scrape_customer_province:
268 | customer_province_texts_fin = customer_province_texts[:limit]
269 | df["Müşterinin Konumu"] = customer_province_texts_fin
270 |
271 | df.to_excel(file, header = True, index = False)
272 |
273 | x = "Çektiğiniz veriler "+ file + " adlı excel dosyasına kaydedildi."
274 | print(x)
275 |
276 | print("""
277 | --------------------------------------------------------------------------
278 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
279 | - Github Hesabım: ardauzunoglu -
280 | --------------------------------------------------------------------------
281 | """)
282 |
283 | time.sleep(3)
284 |
285 | initialize()
286 | scrape()
287 |
288 | if __name__ == "__main__":
289 | hepsiburada_scrape()
--------------------------------------------------------------------------------
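The date handling in `hepsiburada_scraper.py` inserts a default year when one is missing and expands abbreviated Turkish day names via a lookup table. A condensed sketch of that normalization; the `default_year` parameter generalizes the hard-coded `"2021"`:

```python
DAY_CONV = {"Pts": "Pazartesi", "Sal": "Salı", "Çar": "Çarşamba",
            "Per": "Perşembe", "Cum": "Cuma", "Cts": "Cumartesi", "Paz": "Pazar"}

def normalize_date(raw: str, default_year: str = "2021") -> str:
    # "12 Mayıs Sal" -> "12 Mayıs 2021 Salı"; an explicit year is kept as-is.
    parts = raw.replace(",", "").split()
    if len(parts) < 3 or not parts[2].isdigit():
        parts.insert(2, default_year)
    parts[-1] = DAY_CONV.get(parts[-1], parts[-1])  # expand abbreviations only
    return " ".join(parts)
```

Checking `parts[2].isdigit()` replaces the scraper's fixed `years` list, so the sketch is not limited to years 2000-2020.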
/scrapers/hepsiburada_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
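Every scraper in this repo finishes the same way: it computes the length of the shortest collected list and slices every list to that length before building the DataFrame (the `length_list` / `limit` block above). A minimal standalone sketch of that alignment step — the names here are illustrative, not taken from the repo, and unlike the scrapers it keeps all aligned rows instead of also dropping the last one:

```python
import pandas as pd

def align_columns(columns):
    # Trim every list to the shortest one so all DataFrame columns have equal length.
    limit = min(map(len, columns.values()))
    return pd.DataFrame({name: values[:limit] for name, values in columns.items()})

df = align_columns({
    "Değerlendirme": ["çok iyi", "idare eder", "kötü"],
    "Tarih": ["01.01.2021", "02.01.2021"],  # a partially scraped column
})
```

With one column scraped short, the DataFrame ends up with two rows and no ragged columns, which is exactly why the scrapers slice before calling `to_excel`.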
/scrapers/incehesap_scraper/incehesap-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/incehesap_scraper/incehesap-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/incehesap_scraper/incehesap_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def incehesap_scraper():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - İncehesap Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, review_headlines, review_useful, customer_name_texts, date_texts, scrape_headlines, scrape_useful, scrape_customer_names, scrape_dates, path
44 |
45 | product_name = input("İncelemelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | review_headlines = []
53 | customer_name_texts = []
54 | date_texts = []
55 |
56 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
57 | scrape_useful_input = input(scrape_useful_question)
58 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
59 |
60 | scrape_headlines_question = "İncelemenin başlığı çekilsin mi(y/n): "
61 | scrape_headlines_input = input(scrape_headlines_question)
62 | scrape_headlines = preference(scrape_headlines_input, scrape_headlines_question)
63 |
64 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
65 | scrape_customer_name_input = input(scrape_customer_name_question)
66 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
67 |
68 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
69 | scrape_date_input = input(scrape_date_question)
70 | scrape_dates = preference(scrape_date_input, scrape_date_question)
71 |
72 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
73 |
74 | def scrape():
75 | try:
76 | print("Chromedriver'a erişiliyor...")
77 | driver = webdriver.Chrome(path)
78 | time.sleep(delay)
79 | print("Chromedriver'a erişildi.")
80 |
81 | except WebDriverException:
82 | print("Chromedriver kullanılamıyor.")
83 | sys.exit()
84 |
85 | try:
86 | print("İncehesap adresine gidiliyor...")
87 | driver.get("https://www.incehesap.com")
88 | time.sleep(delay)
89 | driver.maximize_window()
90 | time.sleep(delay)
91 | print("İncehesap adresine gidildi.")
92 |
93 | except:
94 | print("İncehesap'a erişilemiyor.")
95 | sys.exit()
96 |
97 | try:
98 | print("Ürün aranıyor...")
99 | search_bar = driver.find_element_by_id("query")
100 | search_bar.send_keys(product_name)
101 | search_bar.send_keys(Keys.ENTER)
102 | time.sleep(delay)
103 |
104 | product = driver.find_element_by_class_name("product-link")
105 | product.click()
106 | time.sleep(delay)
107 | print("Ürün bulundu.")
108 |
109 | except NoSuchElementException:
110 | print("Ürün bulunamadı.")
111 | sys.exit()
112 |
113 | try:
114 | cancel_cookie_pop = driver.find_element_by_xpath("/html/body/div[4]/div/button")
115 | cancel_cookie_pop.click()
116 | except:
117 | pass
118 |
119 | try:
120 | time.sleep(delay)
121 | review_count = driver.find_element_by_xpath("/html/body/div[2]/div[1]/main/section[1]/div[2]/div[3]/div[4]/div[2]/a[2]")
122 | review_count.click()
123 | review_count = review_count.text.replace("(", "")
124 | review_count = review_count.replace(")", "")
125 | review_count = int(review_count.replace("Yorumlar", ""))
126 |
127 | except NoSuchElementException:
128 | print("İnceleme bulunamadı.")
129 | sys.exit()
130 |
131 | try:
132 | time.sleep(delay)
133 | load_all_comments = driver.find_element_by_class_name("all-comments")
134 | load_all_comments.click()
135 |
136 | except:
137 | pass
138 |
139 | while len(review_texts) <= review_count:
140 |
141 | time.sleep(delay)
142 | driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
143 |
144 | comments = driver.find_elements_by_class_name("item")
145 | for comment in comments:
146 | try:
147 | customer = comment.find_element_by_xpath("//*[@id='comment-list']/div["+str(len(review_texts)+2)+"]/div[1]/strong").text
148 | customer_name_texts.append(customer)
149 |
150 | date = comment.find_element_by_xpath("//*[@id='comment-list']/div["+str(len(review_texts)+2)+"]/div[1]/span").text.split()
151 | date = " ".join(date[:3])
152 | date_texts.append(date)
153 |
154 | headline = comment.find_element_by_xpath("//*[@id='comment-list']/div["+str(len(review_texts)+2)+"]/div[3]/b").text
155 | if headline == "":
156 | review_headlines.append("BOŞ")
157 | else:
158 | review_headlines.append(headline)
159 |
160 | useful = comment.find_element_by_xpath("//*[@id='comment-list']/div["+str(len(review_texts)+2)+"]/div[4]/a[1]").text
161 | useful = useful.replace("Evet", "").replace("(", "").replace(")", "")
162 | review_useful.append(useful)
163 |
164 | review = comment.find_element_by_xpath("//*[@id='comment-list']/div["+str(len(review_texts)+2)+"]/div[3]/span").text
165 | review_texts.append(review)
166 |
167 | print("İncelemeler çekiliyor...")
168 | print("İnceleme: " + str(len(review_texts)))
169 |
170 | except:
171 | break
172 |
173 | break
174 |
175 | driver.close()
176 |
177 | length_list = [review_texts, review_useful, review_headlines, customer_name_texts, date_texts]
178 | limit = map(len, length_list)
179 | limit = min(list(limit))
180 | limit -= 1
181 |
182 | review_texts_fin = review_texts[:limit]
183 | df = pd.DataFrame({"Yorum": review_texts_fin})
184 |
185 | if scrape_useful:
186 | review_useful_fin = review_useful[:limit]
187 | df["Yorum Beğeni Sayısı"] = review_useful_fin
188 |
189 | if scrape_headlines:
190 | review_headlines_fin = review_headlines[:limit]
191 | df["Yorumun Başlığı"] = review_headlines_fin
192 |
193 | if scrape_customer_names:
194 | customer_name_texts_fin = customer_name_texts[:limit]
195 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
196 |
197 | if scrape_dates:
198 | date_texts_fin = date_texts[:limit]
199 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
200 |
201 | df.to_excel(file, header = True, index = False)
202 |
203 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
204 | print(x)
205 |
206 | print("""
207 | --------------------------------------------------------------------------
208 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
209 | - Github Hesabım: ardauzunoglu -
210 | --------------------------------------------------------------------------
211 | """)
212 |
213 | initialize()
214 | scrape()
215 |
216 | if __name__ == "__main__":
217 | incehesap_scraper()
--------------------------------------------------------------------------------
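Several scrapers in this repo repeat an inline scroll loop: jump to the bottom, wait, read `document.body.scrollHeight` again, and stop once the height no longer grows. A sketch of that pattern as a reusable helper — `driver` is assumed to be any object exposing Selenium's `execute_script`:

```python
import time

def scroll_to_end(driver, delay):
    # Scroll to the bottom repeatedly; stop when the page height stops growing.
    last = driver.execute_script(
        "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")
    while True:
        time.sleep(delay)
        height = driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")
        if height == last:
            return height
        last = height
```

The termination condition is the same one the scrapers spell out with `lenOfPage` / `lastCount` / `match`; extracting it avoids copying that block into every site module.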
/scrapers/incehesap_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/kitapyurdu_scraper/kitapyurdu-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/kitapyurdu_scraper/kitapyurdu-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/kitapyurdu_scraper/kitapyurdu_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import NoSuchElementException, WebDriverException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def kitapyurdu_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Kitapyurdu Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global book, file, delay, review_texts, review_useful, review_not_useful, author_texts, date_texts, scrape_useful, scrape_author, scrape_date, path
44 |
45 | book = input("İncelemelerin Çekileceği Kitap Adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | review_not_useful = []
53 | author_texts = []
54 | date_texts = []
55 |
56 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
57 | scrape_useful_input = input(scrape_useful_question)
58 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
59 |
60 | scrape_author_question = "Kullanıcı isimleri çekilsin mi(y/n): "
61 | scrape_author_input = input(scrape_author_question)
62 | scrape_author = preference(scrape_author_input, scrape_author_question)
63 |
64 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
65 | scrape_date_input = input(scrape_date_question)
66 | scrape_date = preference(scrape_date_input, scrape_date_question)
67 |
68 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
69 |
70 | def scrape():
71 | try:
72 | print("Chromedriver'a erişiliyor...")
73 | driver = webdriver.Chrome(path)
74 | time.sleep(delay)
75 | print("Chromedriver'a erişildi.")
76 |
77 | except WebDriverException:
78 | print("Chromedriver kullanılamıyor.")
79 | sys.exit()
80 |
81 | try:
82 | print("Kitapyurdu adresine gidiliyor...")
83 | driver.get("https://www.kitapyurdu.com")
84 | time.sleep(delay)
85 | driver.maximize_window()
86 | time.sleep(delay)
87 | print("Kitapyurdu adresine gidildi.")
88 |
89 | except:
90 | print("Kitapyurdu'na erişilemiyor.")
91 | sys.exit()
92 |
93 | try:
94 | print("Kitap aranıyor...")
95 | search = driver.find_element_by_id("search-input")
96 | search.send_keys(book)
97 | search.send_keys(Keys.ENTER)
98 |
99 | time.sleep(delay)
100 |
101 | try:
102 | close_notification = driver.find_element_by_class_name("opt-in-disallow-button")
103 | close_notification.click()
104 |
105 | except NoSuchElementException:
106 | pass
107 |
108 | time.sleep(delay)
109 |
110 | product = driver.find_element_by_class_name("name.ellipsis")
111 | product.click()
112 | time.sleep(delay)
113 | print("Kitap bulundu.")
114 |
115 | except NoSuchElementException:
116 | print("Kitap bulunamadı.")
117 | sys.exit()
118 |
119 | try:
120 | reviewsTab = driver.find_element_by_class_name("pr__htabs-review-text")
121 | reviewsTab.click()
122 | time.sleep(delay)
123 |
124 | except NoSuchElementException:
125 | print("Kitap incelemeleri bulunamadı.")
126 | sys.exit()
127 |
128 | l = 1
129 | review_length = reviewsTab.text.replace("Yorumlar","")
130 |
131 | try:
132 | review_length = int(review_length.replace(".", ""))
133 |
134 | except ValueError:
135 | print("İnceleme sayısı okunamadı.")
136 | sys.exit()
137 |
138 | if review_length % 5 == 0:
139 | review_length = review_length // 5
140 | else:
141 | review_length = (review_length // 5) + 1
142 |
143 | while l <= review_length:
144 |
145 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
146 | match = False
147 |
148 | while match == False:
149 | lastCount = lenOfPage
150 | time.sleep(delay)
151 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
152 | if lastCount == lenOfPage:
153 | match = True
154 |
155 | print("Veriler çekiliyor...")
156 | print("Sayfa: " + str(l))
157 |
158 | try:
159 | reviews = driver.find_elements_by_class_name("review-text")
160 | for review in reviews:
161 | review = review.text
162 | review_texts.append(review)
163 |
164 | authors = driver.find_elements_by_xpath("//a[@class ='alt']//span[@itemprop='name']")
165 | for author in authors:
166 | author = author.text
167 | author_texts.append(author)
168 |
169 | dates = driver.find_elements_by_class_name("review-date")
170 | for date in dates:
171 | date = date.text
172 | date_texts.append(date)
173 |
174 | usefuls = driver.find_elements_by_xpath("//div[@class ='agree']//span[@class='count']")
175 | for useful in usefuls:
176 | useful = useful.text
177 | review_useful.append(useful)
178 |
179 | not_usefuls = driver.find_elements_by_xpath("//div[@class ='disagree']//span[@class='count']")
180 | for not_useful in not_usefuls:
181 | not_useful = not_useful.text
182 | review_not_useful.append(not_useful)
183 |
184 | except NoSuchElementException:
185 | time.sleep(delay)
186 |
187 | l += 1
188 |
189 | try:
190 | next_page = driver.find_element_by_link_text(str(l))
191 | next_page.click()
192 |
193 | except NoSuchElementException:
194 | time.sleep(delay)
195 |
196 | driver.close()
197 |
198 | length_list = [review_texts, review_useful, review_not_useful, author_texts, date_texts]
199 | limit = map(len, length_list)
200 | limit = min(list(limit))
201 | limit -= 1
202 |
203 | review_texts_fin = review_texts[:limit]
204 |
205 | df = pd.DataFrame({"Yorumlar": review_texts_fin})
206 |
207 | if scrape_author:
208 | author_texts_fin = author_texts[:limit]
209 | df["Müşteriler"] = author_texts_fin
210 |
211 | if scrape_date:
212 | date_texts_fin = date_texts[:limit]
213 | df["İnceleme Tarihi"] = date_texts_fin
214 |
215 | if scrape_useful:
216 | review_useful_fin = review_useful[:limit]
217 | review_not_useful_fin = review_not_useful[:limit]
218 | df["İncelemeyi Yararlı Bulan Kişi Sayısı"] = review_useful_fin
219 | df["İncelemeyi Yararlı Bulmayan Kişi Sayısı"] = review_not_useful_fin
220 |
221 | df.to_excel(file, header = True, index = False)
222 |
223 | x = "Çektiğiniz veriler "+ file + " adlı excel dosyasına kaydedildi."
224 | print(x)
225 |
226 | print("""
227 | --------------------------------------------------------------------------
228 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
229 | - Github Hesabım: ardauzunoglu -
230 | --------------------------------------------------------------------------
231 | """)
232 |
233 | time.sleep(3)
234 | initialize()
235 | scrape()
236 |
237 | if __name__ == "__main__":
238 | kitapyurdu_scrape()
--------------------------------------------------------------------------------
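The kitapyurdu scraper derives the number of review pages from the tab label (e.g. "Yorumlar 1.234", with a Turkish thousands separator) and then does a manual ceiling division by five reviews per page. The same two steps, sketched as pure functions — the names are illustrative, not the repo's:

```python
def parse_review_count(label):
    # "Yorumlar 1.234" -> 1234: drop the label text and the thousands separator.
    return int(label.replace("Yorumlar", "").replace(".", "").strip())

def page_count(review_count, per_page=5):
    # Ceiling division without math.ceil: -(-a // b).
    return -(-review_count // per_page)
```

Keeping the parsing pure makes the off-by-one cases (`review_count % 5 == 0` vs. not) trivial to check without driving a browser.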
/scrapers/kitapyurdu_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/main-scraper.py:
--------------------------------------------------------------------------------
1 | from amazon_scraper import amazon_scraper
2 | from beyazperde_scraper import beyazperde_scrape
3 | from ciceksepeti_scraper import ciceksepeti_scraper
4 | from eksi_scraper import eksisozluk_scrape
5 | from gittigidiyor_scraper import gittigidiyor_scrape
6 | from hepsiburada_scraper import hepsiburada_scrape
7 | from incehesap_scraper import incehesap_scraper
8 | from kitapyurdu_scraper import kitapyurdu_scrape
9 | from mediamarkt_scraper import mediamarkt_scraper
10 | from n11_scraper import n11_scraper
11 | from trendyol_scraper import trendyol_scrape
12 | from yemeksepeti_scraper import yemeksepeti_scrape
13 | from youtube_scraper import youtube_scrape
14 |
15 | libs = {"amazon": amazon_scraper, "beyazperde": beyazperde_scrape, "çiçeksepeti": ciceksepeti_scraper, "ekşi sözlük": eksisozluk_scrape, "gittigidiyor": gittigidiyor_scrape,
16 |         "hepsiburada": hepsiburada_scrape, "incehesap": incehesap_scraper, "kitapyurdu": kitapyurdu_scrape, "mediamarkt": mediamarkt_scraper, "n11": n11_scraper,
17 |         "trendyol": trendyol_scrape, "yemeksepeti": yemeksepeti_scrape, "youtube": youtube_scrape}
18 |
19 | choice = input("Kullanacağınız scraper: ").lower()
20 |
21 | while choice not in libs:
22 |     print("Geçersiz yanıt.")
23 |     choice = input("Kullanacağınız scraper: ").lower()
24 |
25 | scraper = libs[choice]
26 | scraper()
--------------------------------------------------------------------------------
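main-scraper.py is a dictionary dispatch: the lowercased answer is used as a key into a name-to-function table. The lookup itself can be sketched (and tested) independently of the scraper imports — the registry below is a stand-in, not the real one:

```python
def pick_scraper(choice, registry):
    # Normalize the answer and look it up; None signals an invalid choice.
    return registry.get(choice.strip().lower())

calls = []
registry = {"n11": lambda: calls.append("n11"),
            "amazon": lambda: calls.append("amazon")}

scraper = pick_scraper("  N11 ", registry)
if scraper is not None:
    scraper()
```

Using `dict.get` instead of `if choice in choices` plus a separate lookup keeps the valid-name list and the dispatch table from drifting apart.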
/scrapers/mediamarkt_scraper/mediamarkt-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/mediamarkt_scraper/mediamarkt-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/mediamarkt_scraper/mediamarkt_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def mediamarkt_scraper():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - MediaMarkt Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, review_headlines, review_useful, customer_name_texts, date_texts, scrape_headlines, scrape_useful, scrape_customer_names, scrape_dates, path
44 |
45 | product_name = input("İncelemelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | review_headlines = []
53 | customer_name_texts = []
54 | date_texts = []
55 |
56 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
57 | scrape_useful_input = input(scrape_useful_question)
58 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
59 |
60 | scrape_headlines_question = "İncelemenin başlığı çekilsin mi(y/n): "
61 | scrape_headlines_input = input(scrape_headlines_question)
62 | scrape_headlines = preference(scrape_headlines_input, scrape_headlines_question)
63 |
64 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
65 | scrape_customer_name_input = input(scrape_customer_name_question)
66 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
67 |
68 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
69 | scrape_date_input = input(scrape_date_question)
70 | scrape_dates = preference(scrape_date_input, scrape_date_question)
71 |
72 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
73 |
74 | def scrape():
75 | try:
76 | print("Chromedriver'a erişiliyor...")
77 | driver = webdriver.Chrome(path)
78 | time.sleep(delay)
79 | print("Chromedriver'a erişildi.")
80 |
81 | except WebDriverException:
82 | print("Chromedriver kullanılamıyor.")
83 | sys.exit()
84 |
85 | try:
86 | print("MediaMarkt adresine gidiliyor...")
87 | driver.get("https://www.mediamarkt.com.tr")
88 | time.sleep(delay)
89 | driver.maximize_window()
90 | time.sleep(delay)
91 | print("MediaMarkt adresine gidildi.")
92 |
93 | except:
94 | print("MediaMarkt'a erişilemiyor.")
95 | sys.exit()
96 |
97 | try:
98 | print("Ürün aranıyor...")
99 | search_bar = driver.find_element_by_xpath("//*[@id='search-autocomplete']/form/input[1]")
100 | search_bar.send_keys(product_name)
101 | search_bar.send_keys(Keys.ENTER)
102 | time.sleep(delay)
103 |
104 | product = driver.find_element_by_class_name("clickable")
105 | product.click()
106 | time.sleep(delay)
107 | print("Ürün bulundu.")
108 |
109 | except NoSuchElementException:
110 | print("Ürün bulunamadı.")
111 | sys.exit()
112 |
113 | time.sleep(delay)
114 | review_count = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/div[1]/h2").text.replace("Yorumlar ", "")
115 | review_count = review_count.replace("(", "")
116 | review_count = review_count.replace(")", "")
117 | review_count = int(review_count)
118 | driver.execute_script("window.scrollTo(0, 1080)")
119 |
120 | while len(review_texts) < review_count:
121 | if len(review_texts) <= 2:
122 | try:
123 | useful = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[1]/li["+str(len(review_texts) + 1)+"]/article/div[1]").text
124 | useful_prep = useful.split()[0]
125 | review_useful.append(useful_prep)
126 |
127 | headline = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[1]/li["+str(len(review_texts) + 1)+"]/article/h3").text
128 | review_headlines.append(headline)
129 |
130 | customer_name = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[1]/li["+str(len(review_texts) + 1)+"]/aside/strong").text
131 | customer_name_texts.append(customer_name)
132 |
133 | date = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[1]/li["+str(len(review_texts) + 1)+"]/aside/small").text
134 | date_texts.append(date)
135 |
136 | review = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[1]/li["+str(len(review_texts) + 1)+"]/article").text
137 | question = driver.find_element_by_class_name("review-rate").text
138 |
139 | review = review.replace(useful, "").replace(headline, "").replace(question, "")
140 | review_texts.append(review)
141 | print("Veriler çekiliyor...")
142 | print("İnceleme: " + str(len(review_texts)))
143 |
144 | except:
145 | break
146 |
147 | try:
148 | cookie_accept_button = driver.find_element_by_xpath("//*[@id='cookie-info-layer']/div[1]/div/div[2]/a")
149 | cookie_accept_button.click()
150 |
151 | except:
152 | pass
153 |
154 | try:
155 | load_all_reviews = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/div[3]/a[1]")
156 | load_all_reviews.click()
157 |
158 | except:
159 | pass
160 |
161 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
162 | match = False
163 |
164 | while match == False:
165 | lastCount = lenOfPage
166 | time.sleep(delay)
167 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
168 | if lastCount == lenOfPage:
169 | match = True
170 |
171 | time.sleep(delay)
172 |
173 | else:
174 | try:
175 | useful = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[2]/li["+str(len(review_texts) + 1)+"]/article/div[1]").text
176 | useful_prep = useful.split()[0]
177 | review_useful.append(useful_prep)
178 |
179 | headline = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[2]/li["+str(len(review_texts) + 1)+"]/article/h3").text
180 | review_headlines.append(headline)
181 |
182 | customer_name = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[2]/li["+str(len(review_texts) + 1)+"]/aside/strong").text
183 | customer_name_texts.append(customer_name)
184 |
185 | date = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[2]/li["+str(len(review_texts) + 1)+"]/aside/small").text
186 | date_texts.append(date)
187 |
188 | review = driver.find_element_by_xpath("//*[@id='yorumlar-']/div/ul[2]/li["+str(len(review_texts) + 1)+"]/article").text
189 | question = driver.find_element_by_class_name("review-rate").text
190 |
191 | review = review.replace(useful, "").replace(headline, "").replace(question, "")
192 | review_texts.append(review)
193 | print("Veriler çekiliyor...")
194 | print("İnceleme: " + str(len(review_texts)))
195 |
196 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
197 | match = False
198 |
199 | while match == False:
200 | lastCount = lenOfPage
201 | time.sleep(delay)
202 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
203 | if lastCount == lenOfPage:
204 | match = True
205 |
206 | except:
207 | break
208 |
209 | time.sleep(delay)
210 |
211 | driver.close()
212 |
213 | length_list = [review_texts, review_useful, review_headlines, customer_name_texts, date_texts]
214 | limit = map(len, length_list)
215 | limit = min(list(limit))
216 | limit -= 1
217 |
218 | review_texts_fin = review_texts[:limit]
219 | df = pd.DataFrame({"Yorum": review_texts_fin})
220 |
221 | if scrape_useful:
222 | review_useful_fin = review_useful[:limit]
223 | df["Yorum Beğeni Sayısı"] = review_useful_fin
224 |
225 | if scrape_headlines:
226 | review_headlines_fin = review_headlines[:limit]
227 | df["Yorumun Başlığı"] = review_headlines_fin
228 |
229 | if scrape_customer_names:
230 | customer_name_texts_fin = customer_name_texts[:limit]
231 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
232 |
233 | if scrape_dates:
234 | date_texts_fin = date_texts[:limit]
235 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
236 |
237 | df.to_excel(file, header = True, index = False)
238 |
239 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
240 | print(x)
241 |
242 | print("""
243 | --------------------------------------------------------------------------
244 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
245 | - Github Hesabım: ardauzunoglu -
246 | --------------------------------------------------------------------------
247 | """)
248 | initialize()
249 | scrape()
250 |
251 | if __name__ == "__main__":
252 | mediamarkt_scraper()
--------------------------------------------------------------------------------
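Every scraper defines the same `preference` y/n prompt loop inline. Split into a pure mapping function, the decision becomes testable without stubbing `input()` — this is a sketch of that refactoring, not the repo's exact helper:

```python
def parse_yes_no(answer):
    # "y"/"n" (any case, surrounding spaces ignored) -> bool; anything else -> None.
    answer = answer.strip().lower()
    if answer == "y":
        return True
    if answer == "n":
        return False
    return None
```

The re-prompting loop then only has to keep asking while `parse_yes_no` returns `None`, so validation logic lives in one tested place instead of twelve copies.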
/scrapers/mediamarkt_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/n11_scraper/n11-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/n11_scraper/n11-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/n11_scraper/n11_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def n11_scraper():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while type(delay) != int:
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - N11 Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, review_headlines, review_useful, customer_name_texts, date_texts, scrape_headlines, scrape_useful, scrape_customer_names, scrape_dates, path
44 |
45 | product_name = input("İncelemelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | review_headlines = []
53 | customer_name_texts = []
54 | date_texts = []
55 |
56 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
57 | scrape_useful_input = input(scrape_useful_question)
58 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
59 |
60 | scrape_headlines_question = "İncelemenin başlığı çekilsin mi(y/n): "
61 | scrape_headlines_input = input(scrape_headlines_question)
62 | scrape_headlines = preference(scrape_headlines_input, scrape_headlines_question)
63 |
64 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
65 | scrape_customer_name_input = input(scrape_customer_name_question)
66 | scrape_customer_names = preference(scrape_customer_name_input, scrape_customer_name_question)
67 |
68 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
69 | scrape_date_input = input(scrape_date_question)
70 | scrape_dates = preference(scrape_date_input, scrape_date_question)
71 |
72 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
73 |
74 | def scrape():
75 | try:
76 | print("Chromedriver'a erişiliyor...")
77 | driver = webdriver.Chrome(path)
78 | time.sleep(delay)
79 | print("Chromedriver'a erişildi.")
80 |
81 | except WebDriverException:
82 | print("Chromedriver kullanılamıyor.")
83 | sys.exit()
84 |
85 | try:
86 | print("N11 adresine gidiliyor...")
87 | driver.get("https://www.n11.com")
88 | time.sleep(delay)
89 | driver.maximize_window()
90 | time.sleep(delay)
91 | print("N11 adresine gidildi.")
92 |
93 | except WebDriverException:
94 | print("N11'e erişilemiyor.")
95 | sys.exit()
96 |
97 | try:
98 | print("Ürün aranıyor...")
99 | search_bar = driver.find_element_by_id("searchData")
100 | search_bar.send_keys(product_name)
101 | search_bar.send_keys(Keys.ENTER)
102 | time.sleep(delay)
103 |
104 | product = driver.find_element_by_class_name("productName")
105 | product.click()
106 | time.sleep(delay)
107 | print("Ürün bulundu.")
108 |
109 | except NoSuchElementException:
110 | print("Ürün bulunamadı.")
111 | sys.exit()
112 |
113 | review_count = driver.find_element_by_class_name("reviewNum").text
114 | review_count = int(review_count)
115 |
116 | go_to_reviews = driver.find_element_by_id("readReviews")
117 | go_to_reviews.click()
118 |
119 | if review_count % 10 == 0:
120 | length_of_page = review_count // 10
121 | else:
122 | length_of_page = (review_count // 10) + 1
123 |
124 | l = 1
125 |
126 | while l <= length_of_page:
127 |
128 | print("İncelemeler çekiliyor...")
129 | print("Sayfa: " + str(l))
130 |
131 | time.sleep(delay)
132 | driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
133 |
134 | comments = driver.find_elements_by_class_name("comment")
135 | for comment in comments:
136 |
137 | customer = comment.find_element_by_class_name("userName").text
138 | customer_name_texts.append(customer)
139 |
140 | date = comment.find_element_by_class_name("commentDate").text
141 | date_texts.append(date)
142 |
143 | try:
144 | headline = comment.find_element_by_class_name("commentTitle").text
145 | review_headlines.append(headline)
146 |
147 | except NoSuchElementException:
148 | review_headlines.append("BOŞ")
149 |
150 | useful = comment.find_element_by_class_name("btnComment.yesBtn").text
151 | useful = useful.replace("Evet", "").replace("(", "").replace(")", "")
152 | review_useful.append(useful)
153 |
154 | replaced_useful = comment.find_element_by_class_name("btnComment.yesBtn").text
155 | review = comment.text
156 | review = review.replace(customer, "").replace(date, "").replace(replaced_useful, "").replace("Bu yorumu faydalı buldunuz mu?", "")
157 | review_texts.append(review)
158 |
159 | try:
160 | next_button = driver.find_element_by_xpath("//*[@id='tabPanelProComments']/div/div[2]/div[2]/a[11]")
161 | next_button.click()
162 |
163 | except WebDriverException:
164 | pass
165 |
166 | l += 1
167 |
168 | driver.close()
169 |
170 | length_list = [review_texts, review_useful, review_headlines, customer_name_texts, date_texts]
171 | limit = map(len, length_list)
172 | limit = min(list(limit))
173 |
174 |
175 | review_texts_fin = review_texts[:limit]
176 | df = pd.DataFrame({"Yorum": review_texts_fin})
177 |
178 | if scrape_useful:
179 | review_useful_fin = review_useful[:limit]
180 | df["Yorum Beğeni Sayısı"] = review_useful_fin
181 |
182 | if scrape_headlines:
183 | review_headlines_fin = review_headlines[:limit]
184 | df["Yorumun Başlığı"] = review_headlines_fin
185 |
186 | if scrape_customer_names:
187 | customer_name_texts_fin = customer_name_texts[:limit]
188 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
189 |
190 | if scrape_dates:
191 | date_texts_fin = date_texts[:limit]
192 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
193 |
194 | df.to_excel(file, header = True, index = False)
195 |
196 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
197 | print(x)
198 |
199 | print("""
200 | --------------------------------------------------------------------------
201 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
202 | - Github Hesabım: ardauzunoglu -
203 | --------------------------------------------------------------------------
204 | """)
205 | initialize()
206 | scrape()
207 |
208 | if __name__ == "__main__":
209 | n11_scraper()
--------------------------------------------------------------------------------
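Every scraper in the repo duplicates the same y/n prompt loop (`preference`), whose `while (x != "y") or (x != "n")` condition is always true and only works because of the `break` statements. A standalone sketch of that helper with the condition replaced by a plain `while True:`; the `ask` parameter is an addition made here so the loop can be driven without stdin:

```python
def preference(scrape_input, question, ask=input):
    """Return True for 'y', False for 'n'; re-prompt on anything else."""
    while True:
        answer = scrape_input.lower()
        if answer == "y":
            return True
        if answer == "n":
            return False
        print("Geçersiz yanıt.")
        scrape_input = ask(question)
```

In the scrapers it would be called exactly as before, `preference(scrape_useful_input, scrape_useful_question)`, since `ask` defaults to `input`.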
/scrapers/n11_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/trendyol_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/trendyol_scraper/trendyol-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/trendyol_scraper/trendyol-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/trendyol_scraper/trendyol_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def trendyol_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while not isinstance(delay, int):
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Trendyol Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global product_name, file, delay, review_texts, review_useful, customer_name_texts, date_texts, scrape_useful, scrape_customer_name, scrape_date, path
44 |
45 | product_name = input("İncelemelerin çekileceği ürün adı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | review_texts = []
51 | review_useful = []
52 | customer_name_texts = []
53 | date_texts = []
54 |
55 | scrape_useful_question = "İncelemenin aldığı beğeni sayısı çekilsin mi(y/n): "
56 | scrape_useful_input = input(scrape_useful_question)
57 | scrape_useful = preference(scrape_useful_input, scrape_useful_question)
58 |
59 | scrape_customer_name_question = "Müşteri isimleri çekilsin mi(y/n): "
60 | scrape_customer_name_input = input(scrape_customer_name_question)
61 | scrape_customer_name = preference(scrape_customer_name_input, scrape_customer_name_question)
62 |
63 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
64 | scrape_date_input = input(scrape_date_question)
65 | scrape_date = preference(scrape_date_input, scrape_date_question)
66 |
67 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
68 |
69 | def scrape():
70 | try:
71 | print("Chromedriver'a erişiliyor...")
72 | driver = webdriver.Chrome(path)
73 | time.sleep(delay)
74 | print("Chromedriver'a erişildi.")
75 |
76 | except WebDriverException:
77 | print("Chromedriver kullanılamıyor.")
78 | sys.exit()
79 |
80 | try:
81 | print("Trendyol adresine gidiliyor...")
82 | driver.get("https://www.trendyol.com")
83 | time.sleep(delay)
84 | driver.maximize_window()
85 | time.sleep(delay)
86 | print("Trendyol adresine gidildi.")
87 |
88 | except WebDriverException:
89 | print("Trendyol'a erişilemiyor.")
90 | sys.exit()
91 |
92 | try:
93 | print("Ürün aranıyor...")
94 | search_bar = driver.find_element_by_class_name("search-box")
95 | search_bar.send_keys(product_name)
96 | search_bar.send_keys(Keys.ENTER)
97 | time.sleep(delay)
98 |
99 | product = driver.find_element_by_class_name("prdct-desc-cntnr")
100 | product.click()
101 | time.sleep(delay)
102 | print("Ürün bulundu.")
103 |
104 | except NoSuchElementException:
105 | print("Ürün bulunamadı.")
106 | sys.exit()
107 |
108 | url = driver.current_url
109 | index_of_question_mark = url.index("?")
110 | url = url[:index_of_question_mark]
111 | url = url + "/yorumlar"
112 | driver.get(url)
113 |
114 | review_count = driver.find_element_by_class_name("pr-rnr-sm-p-s").text
115 | review_count = review_count.replace("Değerlendirme", "")
116 | review_count = review_count.replace("Yorum", "")
117 | review_count = review_count.split()
118 | review_count = int(review_count[1])
119 |
120 | while len(review_texts) < review_count:
121 |
122 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
123 | match = False
124 |
125 | while match == False:
126 | lastCount = lenOfPage
127 | time.sleep(delay)
128 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
129 | if lastCount == lenOfPage:
130 | match = True
131 |
132 | time.sleep(delay)
133 |
134 | reviews = driver.find_elements_by_class_name("rnr-com-tx")
135 | for review in reviews:
136 | review = review.text
137 | review_texts.append(review)
138 |
139 | print("Veriler çekiliyor...")
140 | print("İnceleme: " + str(len(review_texts)))
141 |
142 | usefuls = driver.find_elements_by_xpath("//*[@class='tooltip-wrp']//span[2]")
143 | for useful in usefuls:
144 | useful = useful.text
145 | useful = useful.strip("()")
146 | review_useful.append(useful)
147 |
148 | customers = driver.find_elements_by_xpath("//*[@class='rnr-com-bt']//span[@class = 'rnr-com-usr']")
149 | for customer in customers:
150 | customer = customer.text
151 | customer = customer.replace("|","")
152 | customer = customer.split()
153 |
154 | customer_name = customer[-3:]
155 | customer_name = " ".join(customer_name)
156 | customer_name_texts.append(customer_name)
157 |
158 | date = customer[:-3]
159 | date = " ".join(date)
160 | date_texts.append(date)
161 |
162 | driver.close()
163 |
164 | length_list = [review_texts, review_useful, customer_name_texts, date_texts]
165 | limit = map(len, length_list)
166 | limit = min(list(limit))
167 |
168 |
169 | review_texts_fin = review_texts[:limit]
170 | df = pd.DataFrame({"Yorum": review_texts_fin})
171 |
172 | if scrape_useful:
173 | review_useful_fin = review_useful[:limit]
174 | df["Yorum Beğeni Sayısı"] = review_useful_fin
175 |
176 | if scrape_customer_name:
177 | customer_name_texts_fin = customer_name_texts[:limit]
178 | df["Yorum Yazan Müşteri"] = customer_name_texts_fin
179 |
180 | if scrape_date:
181 | date_texts_fin = date_texts[:limit]
182 | df["Yorumun Yazıldığı Tarih"] = date_texts_fin
183 |
184 | df.to_excel(file, header = True, index = False)
185 |
186 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
187 | print(x)
188 | print("""
189 | --------------------------------------------------------------------------
190 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
191 | - Github Hesabım: ardauzunoglu -
192 | --------------------------------------------------------------------------
193 | """)
194 |
195 | time.sleep(3)
196 | initialize()
197 | scrape()
198 |
199 | if __name__ == "__main__":
200 | trendyol_scrape()
--------------------------------------------------------------------------------
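The Trendyol and Yemeksepeti scrapers both poll `document.body.scrollHeight` in a `lastCount`/`lenOfPage` loop until the height stops changing. That pattern can be factored into one helper; passing `execute_script` in as a parameter is an assumption made here so the loop can be exercised without a live `webdriver` instance:

```python
import time

# Same JS snippet the scrapers already use: scroll down, report new height.
SCROLL_JS = ("window.scrollTo(0, document.body.scrollHeight);"
             " var lenOfPage=document.body.scrollHeight; return lenOfPage;")

def scroll_until_stable(execute_script, delay=0):
    """Keep scrolling to the bottom until the reported page height
    stops growing, then return the final height."""
    last_height = execute_script(SCROLL_JS)
    while True:
        time.sleep(delay)
        height = execute_script(SCROLL_JS)
        if height == last_height:
            return height
        last_height = height
```

Inside `scrape()` this would be invoked as `scroll_until_stable(driver.execute_script, delay)`.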
/scrapers/yemeksepeti_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/yemeksepeti_scraper/yemeksepeti-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/yemeksepeti_scraper/yemeksepeti-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/yemeksepeti_scraper/yemeksepeti_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def yemeksepeti_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while not isinstance(delay, int):
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Yemeksepeti Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global restaurant_info, username_info, password_info, city_info, file, delay, review_texts, author_texts, date_texts, speed_ratings, service_ratings, flavour_ratings, scrape_author, scrape_date, scrape_speed, scrape_service, scrape_flavour, path
44 |
45 | restaurant_info = input("Yorumların Çekileceği Restoran: ")
46 | username_info = input("Yemeksepeti kullanıcı adı: ")
47 | password_info = input("Yemeksepeti parola: ")
48 | city_info = input("Yemeksepeti Şehir: ")
49 | file = input("Oluşturulacak Excel dosyasının adı: ")
50 | file = file + ".xlsx"
51 | delay = delay_check(input("Bekleme süresi(sn): "))
52 |
53 | review_texts = []
54 | author_texts = []
55 | date_texts = []
56 | speed_ratings = []
57 | service_ratings = []
58 | flavour_ratings = []
59 |
60 | scrape_author_question = "Müşteri isimleri çekilsin mi(y/n): "
61 | scrape_author_input = input(scrape_author_question)
62 | scrape_author = preference(scrape_author_input, scrape_author_question)
63 |
64 | scrape_date_question = "İnceleme tarihleri çekilsin mi(y/n): "
65 | scrape_date_input = input(scrape_date_question)
66 | scrape_date = preference(scrape_date_input, scrape_date_question)
67 |
68 | scrape_speed_question = "İncelemedeki hız puanı çekilsin mi(y/n): "
69 | scrape_speed_input = input(scrape_speed_question)
70 | scrape_speed = preference(scrape_speed_input, scrape_speed_question)
71 |
72 | scrape_service_question = "İncelemedeki servis puanı çekilsin mi(y/n): "
73 | scrape_service_input = input(scrape_service_question)
74 | scrape_service = preference(scrape_service_input, scrape_service_question)
75 |
76 | scrape_flavour_question = "İncelemedeki lezzet puanı çekilsin mi(y/n): "
77 | scrape_flavour_input = input(scrape_flavour_question)
78 | scrape_flavour = preference(scrape_flavour_input, scrape_flavour_question)
79 |
80 | path = "BURAYA CHROMEDRIVER KONUMUNU GİRİNİZ"
81 |
82 | tr_chars = ["ğ", "ş", "ı", "ü", "ö", "ç"]
83 | tr2eng = {
84 | "ğ":"g",
85 | "ş":"s",
86 | "ı":"i",
87 | "ü":"u",
88 | "ö":"o",
89 | "ç":"c"
90 | }
91 |
92 | city_info = city_info.lower()
93 | for harf in city_info:
94 | if harf in tr_chars:
95 | city_info = city_info.replace(harf, tr2eng[harf])
96 |
97 | else:
98 | pass
99 |
100 | def scrape():
101 | try:
102 | print("Chromedriver'a erişiliyor...")
103 | driver = webdriver.Chrome(path)
104 | time.sleep(delay)
105 | print("Chromedriver'a erişildi.")
106 |
107 | except WebDriverException:
108 | print("Chromedriver kullanılamıyor.")
109 | sys.exit()
110 |
111 | try:
112 | print("Yemeksepeti adresine gidiliyor...")
113 | url = "https://www.yemeksepeti.com/" + city_info
114 | driver.get(url)
115 | time.sleep(delay)
116 | driver.maximize_window()
117 | time.sleep(delay)
118 | print("Yemeksepeti adresine gidildi.")
119 |
120 | except WebDriverException:
121 | print("Yemeksepeti'ne ulaşılamıyor.")
122 | sys.exit()
123 |
124 | try:
125 | print("Yemeksepeti hesabına giriş yapılıyor...")
126 | username = driver.find_element_by_id("UserName")
127 | username.send_keys(username_info)
128 | time.sleep(delay)
129 |
130 | password = driver.find_element_by_id("password")
131 | password.send_keys(password_info)
132 | password.send_keys(Keys.ENTER)
133 | time.sleep(delay)
134 | print("Yemeksepeti hesabına giriş yapıldı.")
135 |
136 | except NoSuchElementException:
137 | print("Kullanıcı adı ve/veya parola hatalı.")
138 | sys.exit()
139 |
140 | try:
141 | address_area = driver.find_element_by_class_name("address-area")
142 | address_area.click()
143 | time.sleep(delay)
144 |
145 | except NoSuchElementException:
146 | print("Kayıtlı adres bulunamadı.")
147 | sys.exit()
148 |
149 | try:
150 | print("Restoran aranıyor...")
151 | search_box = driver.find_element_by_class_name("search-box")
152 | search_box.send_keys(restaurant_info)
153 | search_box.send_keys(Keys.ENTER)
154 | time.sleep(delay+3)
155 |
156 | restaurant = driver.find_element_by_class_name("restaurantName")
157 | restaurant.click()
158 | time.sleep(delay)
159 | print("Restoran bulundu.")
160 |
161 | except NoSuchElementException:
162 | print("Restoran bulunamadı.")
163 | sys.exit()
164 |
165 | try:
166 | yorumlar_section = driver.find_element_by_xpath("//*[@id='restaurantDetail']/div[2]/div[1]/ul/li[4]/a")
167 | yorumlar_section.click()
168 | time.sleep(delay)
169 |
170 | except NoSuchElementException:
171 | print("Yorum bulunamadı.")
172 | sys.exit()
173 |
174 | l = 1
175 | review_count = yorumlar_section.text
176 | review_count = review_count.replace("Yorumlar", "")
177 | review_count = review_count.replace("(","")
178 | review_count = review_count.replace(")","")
179 | review_count = int(review_count)
180 |
181 | if review_count % 30 == 0:
182 | review_count = review_count // 30
183 |
184 | else:
185 | review_count = (review_count // 30) + 1
186 |
187 | while l < review_count:
188 |
189 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
190 | match = False
191 |
192 | while match == False:
193 | lastCount = lenOfPage
194 | time.sleep(delay)
195 | lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
196 | if lastCount == lenOfPage:
197 | match = True
198 |
199 | print("Veriler çekiliyor...")
200 | print("Sayfa: " + str(l))
201 |
202 | yorumlar = driver.find_elements_by_class_name("comment.row")
203 | for yorum in yorumlar:
204 | yorum = yorum.text
205 | yorum = yorum.replace("\n", " ")
206 | yorum = yorum.split()
207 |
208 | if "." in yorum[0]:
209 | yorum = yorum[1:]
210 |
211 | else:
212 | pass
213 |
214 | yorum = " ".join(yorum)
215 | review_texts.append(yorum)
216 |
217 | authors = driver.find_elements_by_class_name("userName")
218 | for author in authors:
219 | author = author.text
220 | author_texts.append(author)
221 |
222 | dates = driver.find_elements_by_class_name("commentDate")
223 | for date in dates:
224 | date = date.text
225 | date_texts.append(date)
226 |
227 | speeds = driver.find_elements_by_class_name("speed")
228 | for speed in speeds:
229 | speed = speed.text
230 | speed = speed.replace("Hız: ", "")
231 | speed_ratings.append(speed)
232 |
233 | services = driver.find_elements_by_class_name("serving")
234 | for service in services:
235 | service = service.text
236 | service = service.replace("Servis: ", "")
237 | service_ratings.append(service)
238 |
239 | flavours = driver.find_elements_by_class_name("flavour")
240 | for flavour in flavours:
241 | flavour = flavour.text
242 | flavour = flavour.replace("Lezzet: ", "")
243 | flavour_ratings.append(flavour)
244 |
245 | l += 1
246 | next_page = driver.find_elements_by_link_text(str(l))
247 | if next_page:
248 | next_page[0].click()
249 | driver.close()
250 |
251 | def duplicates(lst, item):
252 | return [i for i, x in enumerate(lst) if x == item]
253 |
254 | if "Restoran Cevabı" in author_texts:
255 | girilecek_rating_indexleri = duplicates(author_texts,"Restoran Cevabı")
256 | for i in girilecek_rating_indexleri:
257 | date_texts.insert(i, "Restoran Cevabı")
258 | speed_ratings.insert(i, "Restoran Cevabı")
259 | service_ratings.insert(i, "Restoran Cevabı")
260 | flavour_ratings.insert(i, "Restoran Cevabı")
261 |
262 | elif "Yemeksepeti" in author_texts:
263 | girilecek_rating_indexleri = duplicates(author_texts,"Yemeksepeti")
264 | for i in girilecek_rating_indexleri:
265 | date_texts.insert(i, "Yemeksepeti")
266 | speed_ratings.insert(i, "Yemeksepeti")
267 | service_ratings.insert(i, "Yemeksepeti")
268 | flavour_ratings.insert(i, "Yemeksepeti")
269 |
270 | else:
271 | pass
272 |
273 | length_list = [review_texts, author_texts, date_texts, speed_ratings, service_ratings, flavour_ratings]
274 | limit = map(len, length_list)
275 | limit = min(list(limit))
276 |
277 |
278 | review_texts = review_texts[:limit]
279 | df = pd.DataFrame({"Yorumlar": review_texts})
280 |
281 | if scrape_author:
282 | author_texts_fin = author_texts[:limit]
283 | df["Müşteriler"] = author_texts_fin
284 |
285 | if scrape_date:
286 | date_texts_fin = date_texts[:limit]
287 | df["Yorum Tarihi"] = date_texts_fin
288 |
289 | if scrape_speed:
290 | speed_ratings_fin = speed_ratings[:limit]
291 | df["Hız Değerlendirmesi"] = speed_ratings_fin
292 |
293 | if scrape_service:
294 | service_ratings_fin = service_ratings[:limit]
295 | df["Servis Değerlendirmesi"] = service_ratings_fin
296 |
297 | if scrape_flavour:
298 | flavour_ratings_fin = flavour_ratings[:limit]
299 | df["Lezzet Değerlendirmesi"] = flavour_ratings_fin
300 |
301 | df.to_excel(file, header = True, index = False)
302 |
303 | x = "Çektiğiniz veriler "+ file + " adlı excel dosyasına kaydedildi."
304 | print(x)
305 | print("""
306 | --------------------------------------------------------------------------
307 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
308 | - Github Hesabım: ardauzunoglu -
309 | --------------------------------------------------------------------------
310 | """)
311 |
312 | time.sleep(3)
313 | initialize()
314 | scrape()
315 |
316 | if __name__ == "__main__":
317 | yemeksepeti_scrape()
--------------------------------------------------------------------------------
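The character-by-character Turkish-to-ASCII replacement in `initialize` can be expressed with `str.maketrans`/`str.translate` in one pass. A sketch (the function name `city_slug` is an invention here, not part of the repo):

```python
# Map Turkish-specific lowercase letters to ASCII for the URL slug.
TR2ENG = str.maketrans("ğşıüöç", "gsiuoc")

def city_slug(city):
    """Lower-case a city name and strip Turkish diacritics,
    matching the yemeksepeti.com/<city> URL scheme."""
    return city.lower().translate(TR2ENG)
```

Note one caveat: in Python, a dotted capital "İ" lower-cases to "i" plus a combining dot, so names like "İstanbul" may need an extra replacement before translation.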
/scrapers/youtube_scraper/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.18.5
2 | pandas==1.1.4
3 | selenium==3.141.0
--------------------------------------------------------------------------------
/scrapers/youtube_scraper/youtube-ornek-veriseti.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ardauzunoglu/TRScraper/85ef82834d225756b2727ad2e0aa3683d0dd9333/scrapers/youtube_scraper/youtube-ornek-veriseti.xlsx
--------------------------------------------------------------------------------
/scrapers/youtube_scraper/youtube_scraper.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import pandas as pd
4 | from selenium import webdriver
5 | from selenium.common.exceptions import WebDriverException, NoSuchElementException
6 | from selenium.webdriver.common.keys import Keys
7 |
8 | def youtube_scrape():
9 | def initialize():
10 | def preference(scrape_input, question):
11 | while True:
12 | if scrape_input.lower() == "y":
13 | output = True
14 | break
15 |
16 | elif scrape_input.lower() == "n":
17 | output = False
18 | break
19 |
20 | else:
21 | print("Geçersiz yanıt.")
22 | scrape_input = input(question)
23 |
24 | return output
25 |
26 | def delay_check(delay):
27 | while not isinstance(delay, int):
28 | try:
29 | delay = int(delay)
30 | except ValueError:
31 | print("Lütfen bir sayı değeri giriniz.")
32 | delay = input("Bekleme süresi: ")
33 |
34 | return delay
35 |
36 | print("""
37 | ---------------------------------------------------------
38 | - Youtube Scraper'a hoş geldiniz! -
39 | - Geliştirici: Arda Uzunoğlu -
40 | ---------------------------------------------------------
41 | """)
42 |
43 | global url, file, delay, comment_texts, author_texts, date_texts, title_text, like_texts, scrape_author, scrape_date, scrape_title, scrape_like, path
44 |
45 | url = input("Yorumların çekileceği Youtube videosunun bağlantısı: ")
46 | file = input("Oluşturulacak Excel dosyasının adı: ")
47 | file = file + ".xlsx"
48 | delay = delay_check(input("Bekleme süresi(sn): "))
49 |
50 | comment_texts = []
51 | author_texts = []
52 | date_texts = []
53 | title_text = []
54 | like_texts = []
55 |
56 | scrape_author_question = "Kullanıcı isimleri çekilsin mi(y/n): "
57 | scrape_author_input = input(scrape_author_question)
58 | scrape_author = preference(scrape_author_input, scrape_author_question)
59 |
60 | scrape_date_question = "Yorum tarihleri çekilsin mi(y/n): "
61 | scrape_date_input = input(scrape_date_question)
62 | scrape_date = preference(scrape_date_input, scrape_date_question)
63 |
64 | scrape_title_question = "Video başlığı çekilsin mi(y/n): "
65 | scrape_title_input = input(scrape_title_question)
66 | scrape_title = preference(scrape_title_input, scrape_title_question)
67 |
68 | scrape_like_question = "Yorumun aldığı beğeni sayısı çekilsin mi(y/n): "
69 | scrape_like_input = input(scrape_like_question)
70 | scrape_like = preference(scrape_like_input, scrape_like_question)
71 |
72 | path = r"C:\chromedriver.exe"
73 |
74 | def scrape():
75 | try:
76 | print("Chromedriver'a erişiliyor...")
77 | driver = webdriver.Chrome(path)
78 | time.sleep(delay)
79 | print("Chromedriver'a erişildi.")
80 |
81 | except WebDriverException:
82 | print("Chromedriver kullanılamıyor.")
83 | sys.exit()
84 |
85 | try:
86 | print("Video bağlantısına gidiliyor...")
87 | driver.get(url)
88 | time.sleep(delay)
89 | driver.maximize_window()
90 | time.sleep(delay)
91 | print("Video bağlantısına gidildi.")
92 |
93 | except WebDriverException:
94 | print("Youtube'a erişilemiyor.")
95 | sys.exit()
96 |
97 | time.sleep(delay+2)
98 | comment_section = driver.find_element_by_xpath('//*[@id="comments"]')
99 | title = driver.find_element_by_class_name("title").text
100 | time.sleep(delay)
101 |
102 | driver.execute_script("arguments[0].scrollIntoView();", comment_section)
103 | time.sleep(delay+2)
104 |
105 | comment_count = driver.find_element_by_class_name("count-text.ytd-comments-header-renderer").text
106 | comment_count = comment_count.replace(" Yorum","")
107 | comment_count = comment_count.replace(".","")
108 | comment_count = int(comment_count)
109 |
110 | last_height = driver.execute_script("return document.documentElement.scrollHeight")
111 |
112 | while True:
113 | driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
114 | time.sleep(delay)
115 | new_height = driver.execute_script("return document.documentElement.scrollHeight")
116 | if new_height == last_height:
117 | break
118 | last_height = new_height
119 |
120 | comments = driver.find_elements_by_xpath("//*[@id='contents']/ytd-comment-thread-renderer")
121 | l = 1
122 | for comment in comments:
123 | try:
124 | print("Veri çekiliyor...")
125 | print("Yorum: " + str(l))
126 | author = comment.find_element_by_id("author-text").text
127 | date = comment.find_element_by_class_name("published-time-text").text
128 | comment_text = comment.find_element_by_id("content-text").text
129 | likes = comment.find_element_by_id("vote-count-middle").text
130 |
131 | author_texts.append(author)
132 | date_texts.append(date)
133 | comment_texts.append(comment_text)
134 | like_texts.append(likes)
135 | title_text.append(title)
136 |
137 | l += 1
138 |
139 | except NoSuchElementException:
140 | continue
141 |
142 | driver.close()
143 |
144 | length_list = [comment_texts, author_texts, date_texts, like_texts, title_text]
145 | limit = map(len, length_list)
146 | limit = min(list(limit))
147 |
148 |
149 | comment_texts_fin = comment_texts[:limit]
150 | df = pd.DataFrame({"Yorumlar":comment_texts_fin})
151 | if scrape_author:
152 | author_texts_fin = author_texts[:limit]
153 | df["Kullanıcı"] = author_texts_fin
154 |
155 | if scrape_date:
156 | date_texts_fin = date_texts[:limit]
157 | df["Yorum Tarihi"] = date_texts_fin
158 |
159 | if scrape_like:
160 | like_texts_fin = like_texts[:limit]
161 | df["Yorumun Aldığı Beğeni Sayısı"] = like_texts_fin
162 |
163 | if scrape_title:
164 | title_text_fin = title_text[:limit]
165 | df["Video Başlığı"] = title_text_fin
166 |
167 | df.to_excel(file, header = True, index = False)
168 | x = "Çektiğiniz veriler " + file + " adlı excel dosyasına kaydedildi."
169 | print(x)
170 |
171 | print("""
172 | --------------------------------------------------------------------------
173 | - Projeden memnun kaldıysanız Github üzerinden yıldızlamayı unutmayın. -
174 | - Github Hesabım: ardauzunoglu -
175 | --------------------------------------------------------------------------
176 | """)
177 |
178 | time.sleep(3)
179 | initialize()
180 | scrape()
181 |
182 | if __name__ == "__main__":
183 | youtube_scrape()
--------------------------------------------------------------------------------
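All four scrapers end the same way: truncate every result list to the shortest one, then build a `pd.DataFrame` column by column. That step can be isolated into a small pure function (the name `align_columns` is made up here for illustration):

```python
def align_columns(columns):
    """Truncate every column list to the shortest length so that
    pd.DataFrame accepts them as equal-length columns."""
    limit = min(map(len, columns.values()))
    return {name: values[:limit] for name, values in columns.items()}
```

In a scraper this would replace the `length_list`/`limit` boilerplate, e.g. `df = pd.DataFrame(align_columns({"Yorum": review_texts, "Yorumun Yazıldığı Tarih": date_texts}))`.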