├── readme.md
├── requirements.txt
├── results.json
└── scraper.py

/readme.md:
--------------------------------------------------------------------------------
# Google Maps Scraper

This Python application uses Selenium WebDriver to scrape data from Google Maps for a given search keyword. The data includes the titles, links, websites, ratings, review counts, and phone numbers of the listed businesses.

## Proxy Recommendation

Using good-quality proxies is crucial for scraping. I recommend NodeMaven. Use the code `Michael` at checkout to get +2GB of free traffic when purchasing any package (except trials) - [https://go.nodemaven.com/scrape](https://go.nodemaven.com/scrape)

## Features

- Anonymized proxy usage (via selenium-wire) to help prevent blocking.
- Automated browser navigation and data scraping from Google Maps.
- Extraction of detailed information including business title, website, contact information, ratings, and reviews.
- Automated scrolling to load all search results.
- Data saved in an easy-to-read JSON format.

## Prerequisites

Before running this project, ensure you have the following installed:
- Python (3.6 or higher recommended)
- pip (usually comes with Python)

## Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/michaelkitas/Google-Maps-Leads-Scraper-Selenium
   ```
2. Navigate to the project directory:
   ```bash
   cd Google-Maps-Leads-Scraper-Selenium
   ```
3. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```

## Usage

To run the script, execute the following command in your terminal:

```bash
python scraper.py
```

Before running the script, set the search keyword and proxy settings in the script file:

```python
keyword = "your_search_keyword"
proxy = "http://username:password@proxy-host:proxy-port"
```

## Output

After running the script, the scraped data is saved to a file named `results.json` in the root directory of the project. This file contains an array of objects, each representing a business or entity found in the Google Maps search results.
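
As a quick sketch of how you might consume the output (the field names below match what `scraper.py` writes; any field may be missing if it wasn't found on a listing):

```python
import json

# Load the scraped listings produced by scraper.py
with open("results.json", encoding="utf-8") as f:
    results = json.load(f)

# Example: keep only listings that have both a website and a phone number
leads = [r for r in results if r.get("website") and r.get("phone")]

for lead in leads:
    print(f"{lead['title']}: {lead.get('stars', 'n/a')} stars "
          f"({lead.get('reviews', 0)} reviews) - {lead['phone']}")
```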

## Configuration

You can modify the script to search for different keywords or to change the behavior of the scraping process. Configuration is done directly in the script:

- Change the `keyword` variable to your desired search term.
- Modify the `proxy` variable to use different proxy settings.
- Adjust the scrolling behavior, timeout settings, and other parameters as needed.

## Disclaimer

This script is for educational purposes only. Scraping data from websites may be against their terms of service. Use this script responsibly and ethically, and ensure you comply with Google Maps' terms of service and any relevant laws or regulations.

## Contributing

Contributions to the project are welcome! Please feel free to fork the repository, make changes, and submit pull requests.

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
selenium-wire
selenium
webdriver_manager
--------------------------------------------------------------------------------
/results.json:
--------------------------------------------------------------------------------
[]
--------------------------------------------------------------------------------
/scraper.py:
--------------------------------------------------------------------------------
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
import json
import re

chrome_options = webdriver.ChromeOptions()

# Download (if needed) and configure a matching ChromeDriver automatically
service = Service(
    ChromeDriverManager().install()
)

# Set your proxy here (see readme.md); selenium-wire routes all browser
# traffic through it
proxy = 'http://username:password@proxy-host:proxy-port'

options = {
    'proxy': {
        'http': proxy,
        'https': proxy,
        'no_proxy': 'localhost,127.0.0.1'
    }
}

driver = webdriver.Chrome(
    service=service, options=chrome_options, seleniumwire_options=options
)

try:
    keyword = "lawyer"

    driver.get(f'https://www.google.com/maps/search/{keyword}/')

    # Dismiss the cookie-consent dialog if it appears (e.g. for EU IPs)
    try:
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "form:nth-child(2)"))).click()
    except Exception:
        pass

    # Scroll the results feed until its height stops growing, so that every
    # listing is loaded into the DOM before scraping
    scrollable_div = driver.find_element(By.CSS_SELECTOR, 'div[role="feed"]')
    driver.execute_script("""
        var scrollableDiv = arguments[0];
        function scrollWithinElement(scrollableDiv) {
            return new Promise((resolve, reject) => {
                var totalHeight = 0;
                var distance = 1000;
                var scrollDelay = 3000;

                var timer = setInterval(() => {
                    var scrollHeightBefore = scrollableDiv.scrollHeight;
                    scrollableDiv.scrollBy(0, distance);
                    totalHeight += distance;

                    if (totalHeight >= scrollHeightBefore) {
                        totalHeight = 0;
                        setTimeout(() => {
                            var scrollHeightAfter = scrollableDiv.scrollHeight;
                            if (scrollHeightAfter > scrollHeightBefore) {
                                return; // more results loaded, keep scrolling
                            } else {
                                clearInterval(timer); // height stopped growing: done
                                resolve();
                            }
                        }, scrollDelay);
                    }
                }, 200);
            });
        }
        return scrollWithinElement(scrollableDiv);
    """, scrollable_div)
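
    # Note: with W3C-compliant drivers such as modern ChromeDriver,
    # execute_script waits for a returned Promise to resolve, so the call
    # above blocks until the scrolling routine has finished loading results.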

    # Each result card in the feed
    items = driver.find_elements(By.CSS_SELECTOR, 'div[role="feed"] > div > div[jsaction]')

    results = []
    for item in items:
        data = {}

        try:
            data['title'] = item.find_element(By.CSS_SELECTOR, ".fontHeadlineSmall").text
        except Exception:
            pass

        try:
            data['link'] = item.find_element(By.CSS_SELECTOR, "a").get_attribute('href')
        except Exception:
            pass

        try:
            # The secondary link nested inside the card (the website button),
            # as opposed to the card's main listing link
            data['website'] = item.find_element(By.CSS_SELECTOR, 'div[role="feed"] > div > div[jsaction] div > a').get_attribute('href')
        except Exception:
            pass

        try:
            # The aria-label looks like "4.5 stars 123 Reviews"; pull out the
            # numbers, treating a comma as a decimal separator
            rating_text = item.find_element(By.CSS_SELECTOR, '.fontBodyMedium > span[role="img"]').get_attribute('aria-label')
            rating_numbers = [float(piece.replace(",", ".")) for piece in rating_text.split(" ") if piece.replace(",", ".").replace(".", "", 1).isdigit()]

            if rating_numbers:
                data['stars'] = rating_numbers[0]
                data['reviews'] = int(rating_numbers[1]) if len(rating_numbers) > 1 else 0
        except Exception:
            pass

        try:
            # Match common international and local phone number formats in the
            # card's visible text
            text_content = item.text
            phone_pattern = r'((\+?\d{1,2}[ -]?)?(\(?\d{3}\)?[ -]?\d{3,4}[ -]?\d{4}|\(?\d{2,3}\)?[ -]?\d{2,3}[ -]?\d{2,3}[ -]?\d{2,3}))'
            matches = re.findall(phone_pattern, text_content)

            phone_numbers = [match[0] for match in matches]
            unique_phone_numbers = list(set(phone_numbers))

            data['phone'] = unique_phone_numbers[0] if unique_phone_numbers else None
        except Exception:
            pass

        # Only keep entries where at least a title was found
        if data.get('title'):
            results.append(data)

    with open('results.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

finally:
    # Keep the browser open for a minute for inspection, then clean up
    time.sleep(60)
    driver.quit()
--------------------------------------------------------------------------------