├── README.md
├── requirements.txt
└── search.py

/README.md:
--------------------------------------------------------------------------------
# Scraping Google Starter Script

Simple script to scrape Google using `requests` and `bs4`.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

### Prerequisites

Python 3+

```
https://www.python.org/downloads/
```

### Installing

```bash
# clone the repo
git clone https://github.com/getlinksc/scrape_google.git
# install requirements
pip install -r requirements.txt
# modify the query in search.py and run the script
python search.py
```

* [Python3](https://www.python.org/) - Python is a programming language that lets you work quickly and integrate systems more effectively.
* [Pip](https://pip.pypa.io/en/stable/) - the Python package installer
* [Requests](https://requests.readthedocs.io/en/master/) - HTTP for Humans
* [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) - a Python library for pulling data out of HTML and XML files
* [Google Search API](https://link.sc) - an API for scraping Google search results

## Contributing

## Versioning

## Authors

* **Link.sc** - *Initial work* - [Link.sc](https://github.com/getlinksc/)

## License

## Acknowledgments

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
requests
bs4
--------------------------------------------------------------------------------
/search.py:
--------------------------------------------------------------------------------
import requests
from bs4 import BeautifulSoup

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent (unused below; swap it into the headers to get mobile results)
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"

query = "hackernoon How To Scrape Google With Python"
query = query.replace(' ', '+')
URL = f"https://google.com/search?q={query}"

headers = {"user-agent": USER_AGENT}
resp = requests.get(URL, headers=headers)

if resp.status_code == 200:
    soup = BeautifulSoup(resp.content, "html.parser")
    results = []
    for g in soup.find_all('div', class_='g'):
        # result container div
        rc = g.find('div', class_='rc')
        if rc:
            divs = rc.find_all('div', recursive=False)
            if len(divs) >= 2:
                # first child div holds the anchor with the title
                anchor = divs[0].find('a')
                title_tag = anchor.find('h3') if anchor else None
                if anchor and title_tag:
                    results.append({
                        "title": title_tag.text,
                        "link": anchor['href'],
                    })
    print(results)
--------------------------------------------------------------------------------
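A note on query encoding: `search.py` builds the URL by replacing spaces with `+`, which breaks for queries containing characters like `&` or `+`. A more robust sketch uses the standard library's `urllib.parse.quote_plus`, which escapes those characters as well:

```python
from urllib.parse import quote_plus

# a query with characters that a bare space-to-plus swap would mangle
query = "C++ tips & tricks"
url = f"https://google.com/search?q={quote_plus(query)}"
print(url)  # https://google.com/search?q=C%2B%2B+tips+%26+tricks
```

`quote_plus` is a drop-in replacement for the `query.replace(' ', '+')` line in the script.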
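Because Google's result markup changes frequently, the selector logic in `search.py` is worth exercising offline against canned HTML before making live requests. The snippet below uses a simplified, hypothetical stand-in for a results page (not Google's real markup) to demonstrate the same find-and-extract pattern:

```python
from bs4 import BeautifulSoup

# hypothetical, simplified stand-in for one Google result block
html = """
<div class="g"><div class="rc">
  <div><a href="https://example.com"><h3>Example Title</h3></a></div>
  <div>Snippet text</div>
</div></div>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for g in soup.find_all('div', class_='g'):
    rc = g.find('div', class_='rc')
    if rc:
        divs = rc.find_all('div', recursive=False)
        if len(divs) >= 2:
            anchor = divs[0].find('a')
            # guard against missing anchors or titles before extracting
            if anchor and anchor.find('h3'):
                results.append({"title": anchor.find('h3').text,
                                "link": anchor['href']})
print(results)
```

Running this against live responses saved to disk is a cheap way to catch markup changes without repeatedly hitting Google.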