├── README.md
├── requirements.txt
└── search.py

/README.md:
--------------------------------------------------------------------------------
# Scraping Google Starter Script

Simple script to scrape Google using `requests` and `bs4`.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

### Prerequisites

Python 3+

```
https://www.python.org/downloads/
```

### Installing

```bash
# clone the repo
git clone https://github.com/getlinksc/scrape_google.git
# install requirements
pip install -r requirements.txt
# modify the query in search.py and run the script
python search.py
```

* [Python3](https://www.python.org/) - Python is a programming language that lets you work quickly and integrate systems more effectively.
* [Pip](https://pip.pypa.io/en/stable/) - the Python package installer
* [Requests](https://requests.readthedocs.io/en/master/) - HTTP for Humans
* [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) - a Python library for pulling data out of HTML and XML files
* [Google Search API](https://link.sc) - an API for scraping Google search results

## Contributing

## Versioning

## Authors

* **Link.sc** - *Initial work* - [Link.sc](https://github.com/getlinksc/)

## License

## Acknowledgments

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
requests
bs4
--------------------------------------------------------------------------------
/search.py:
--------------------------------------------------------------------------------
import requests
from bs4 import BeautifulSoup

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent (unused below; swap it into the headers to get mobile results)
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"

query = "hackernoon How To Scrape Google With Python"
query = query.replace(' ', '+')
URL = f"https://google.com/search?q={query}"

headers = {"user-agent": USER_AGENT}
resp = requests.get(URL, headers=headers)

if resp.status_code == 200:
    soup = BeautifulSoup(resp.content, "html.parser")
    results = []
    for g in soup.find_all('div', class_='g'):
        # result container div
        rc = g.find('div', class_='rc')
        if rc:
            divs = rc.find_all('div', recursive=False)
            if len(divs) >= 2:
                # first child div holds the anchor with the title
                anchor = divs[0].find('a')
                title_tag = anchor.find('h3') if anchor else None
                if anchor and title_tag:
                    results.append({
                        "title": title_tag.text,
                        "link": anchor['href'],
                    })
    print(results)
--------------------------------------------------------------------------------
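A note on query encoding: `search.py` builds the URL by replacing spaces with `+`, which breaks for queries containing characters like `&` or `+`. A more robust sketch uses the standard library's `urllib.parse.quote_plus`, which escapes those characters as well:

```python
from urllib.parse import quote_plus

# a query with characters that a bare space-to-plus swap would mangle
query = "C++ tips & tricks"
url = f"https://google.com/search?q={quote_plus(query)}"
print(url)  # https://google.com/search?q=C%2B%2B+tips+%26+tricks
```

`quote_plus` is a drop-in replacement for the `query.replace(' ', '+')` line in the script.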
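Because Google's result markup changes frequently, the selector logic in `search.py` is worth exercising offline against canned HTML before making live requests. The snippet below uses a simplified, hypothetical stand-in for a results page (not Google's real markup) to demonstrate the same find-and-extract pattern:

```python
from bs4 import BeautifulSoup

# hypothetical, simplified stand-in for one Google result block
html = """
<div class="g"><div class="rc">
  <div><a href="https://example.com"><h3>Example Title</h3></a></div>
  <div>Snippet text</div>
</div></div>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for g in soup.find_all('div', class_='g'):
    rc = g.find('div', class_='rc')
    if rc:
        divs = rc.find_all('div', recursive=False)
        if len(divs) >= 2:
            anchor = divs[0].find('a')
            # guard against missing anchors or titles before extracting
            if anchor and anchor.find('h3'):
                results.append({"title": anchor.find('h3').text,
                                "link": anchor['href']})
print(results)
```

Running this against live responses saved to disk is a cheap way to catch markup changes without repeatedly hitting Google.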