├── readme.md
├── requirements.txt
├── results.json
└── scraper.py

/readme.md:
--------------------------------------------------------------------------------
# Google Maps Scraper

This Python application uses Selenium WebDriver to scrape data from Google Maps for a given search keyword. The data includes the titles, links, websites, ratings, review counts, and phone numbers of the listed businesses.

## Proxy Recommendation

Using good-quality proxies is crucial for scraping. I recommend NodeMaven. Use the code `Michael` at checkout to get +2GB of free traffic when purchasing any package (except trials) - [https://go.nodemaven.com/scrape](https://go.nodemaven.com/scrape)

## Features

- Anonymized proxy usage (via selenium-wire) to help prevent blocking.
- Automated browser navigation and data scraping from Google Maps.
- Extraction of detailed information including business title, website, contact information, ratings, and reviews.
- Automated scrolling to load all search results.
- Data saved in an easy-to-read JSON format.

## Prerequisites

Before running this project, ensure you have the following installed:
- Python (3.6 or higher recommended)
- pip (usually comes with Python)

## Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/michaelkitas/Google-Maps-Leads-Scraper-Selenium
   ```
2. Navigate to the project directory:
   ```bash
   cd Google-Maps-Leads-Scraper-Selenium
   ```
3. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```

## Usage

To run the script, execute the following command in your terminal:

```bash
python scraper.py
```

Before running the script, set the search keyword and proxy settings in the script file:

```python
keyword = "your_search_keyword"
proxy = "http://username:password@proxy-host:proxy-port"
```

## Output

After running the script, the scraped data is saved to a file named `results.json` in the root directory of the project. This file contains an array of objects, each representing a business or entity found in the Google Maps search results.
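
As a quick sketch of how you might consume the output (the field names below match what `scraper.py` writes; any field may be missing if it wasn't found on a listing):

```python
import json

# Load the scraped listings produced by scraper.py
with open("results.json", encoding="utf-8") as f:
    results = json.load(f)

# Example: keep only listings that have both a website and a phone number
leads = [r for r in results if r.get("website") and r.get("phone")]

for lead in leads:
    print(f"{lead['title']}: {lead.get('stars', 'n/a')} stars "
          f"({lead.get('reviews', 0)} reviews) - {lead['phone']}")
```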

## Configuration

You can modify the script to search for different keywords or to change the behavior of the scraping process. Configuration is done directly in the script:

- Change the `keyword` variable to your desired search term.
- Modify the `proxy` variable to use different proxy settings.
- Adjust the scrolling behavior, timeout settings, and other parameters as needed.

## Disclaimer

This script is for educational purposes only. Scraping data from websites may be against their terms of service. Use this script responsibly and ethically, and ensure you comply with Google Maps' terms of service and any relevant laws or regulations.

## Contributing

Contributions to the project are welcome! Please feel free to fork the repository, make changes, and submit pull requests.

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
selenium-wire
selenium
webdriver_manager
--------------------------------------------------------------------------------
/results.json:
--------------------------------------------------------------------------------
[]
--------------------------------------------------------------------------------
/scraper.py:
--------------------------------------------------------------------------------
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
import json
import re

chrome_options = webdriver.ChromeOptions()

# Download (if needed) and configure a matching ChromeDriver automatically
service = Service(
    ChromeDriverManager().install()
)

# Set your proxy here (see readme.md); selenium-wire routes all browser
# traffic through it
proxy = 'http://username:password@proxy-host:proxy-port'

options = {
    'proxy': {
        'http': proxy,
        'https': proxy,
        'no_proxy': 'localhost,127.0.0.1'
    }
}

driver = webdriver.Chrome(
    service=service, options=chrome_options, seleniumwire_options=options
)

try:
    keyword = "lawyer"

    driver.get(f'https://www.google.com/maps/search/{keyword}/')

    # Dismiss the cookie-consent dialog if it appears (e.g. for EU IPs)
    try:
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "form:nth-child(2)"))).click()
    except Exception:
        pass

    # Scroll the results feed until its height stops growing, so that every
    # listing is loaded into the DOM before scraping
    scrollable_div = driver.find_element(By.CSS_SELECTOR, 'div[role="feed"]')
    driver.execute_script("""
        var scrollableDiv = arguments[0];
        function scrollWithinElement(scrollableDiv) {
            return new Promise((resolve, reject) => {
                var totalHeight = 0;
                var distance = 1000;
                var scrollDelay = 3000;

                var timer = setInterval(() => {
                    var scrollHeightBefore = scrollableDiv.scrollHeight;
                    scrollableDiv.scrollBy(0, distance);
                    totalHeight += distance;

                    if (totalHeight >= scrollHeightBefore) {
                        totalHeight = 0;
                        setTimeout(() => {
                            var scrollHeightAfter = scrollableDiv.scrollHeight;
                            if (scrollHeightAfter > scrollHeightBefore) {
                                return; // more results loaded, keep scrolling
                            } else {
                                clearInterval(timer); // height stopped growing: done
                                resolve();
                            }
                        }, scrollDelay);
                    }
                }, 200);
            });
        }
        return scrollWithinElement(scrollableDiv);
    """, scrollable_div)
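
    # Note: with W3C-compliant drivers such as modern ChromeDriver,
    # execute_script waits for a returned Promise to resolve, so the call
    # above blocks until the scrolling routine has finished loading results.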

    # Each result card in the feed
    items = driver.find_elements(By.CSS_SELECTOR, 'div[role="feed"] > div > div[jsaction]')

    results = []
    for item in items:
        data = {}

        try:
            data['title'] = item.find_element(By.CSS_SELECTOR, ".fontHeadlineSmall").text
        except Exception:
            pass

        try:
            data['link'] = item.find_element(By.CSS_SELECTOR, "a").get_attribute('href')
        except Exception:
            pass

        try:
            # The secondary link nested inside the card (the website button),
            # as opposed to the card's main listing link
            data['website'] = item.find_element(By.CSS_SELECTOR, 'div[role="feed"] > div > div[jsaction] div > a').get_attribute('href')
        except Exception:
            pass

        try:
            # The aria-label looks like "4.5 stars 123 Reviews"; pull out the
            # numbers, treating a comma as a decimal separator
            rating_text = item.find_element(By.CSS_SELECTOR, '.fontBodyMedium > span[role="img"]').get_attribute('aria-label')
            rating_numbers = [float(piece.replace(",", ".")) for piece in rating_text.split(" ") if piece.replace(",", ".").replace(".", "", 1).isdigit()]

            if rating_numbers:
                data['stars'] = rating_numbers[0]
                data['reviews'] = int(rating_numbers[1]) if len(rating_numbers) > 1 else 0
        except Exception:
            pass

        try:
            # Match common international and local phone number formats in the
            # card's visible text
            text_content = item.text
            phone_pattern = r'((\+?\d{1,2}[ -]?)?(\(?\d{3}\)?[ -]?\d{3,4}[ -]?\d{4}|\(?\d{2,3}\)?[ -]?\d{2,3}[ -]?\d{2,3}[ -]?\d{2,3}))'
            matches = re.findall(phone_pattern, text_content)

            phone_numbers = [match[0] for match in matches]
            unique_phone_numbers = list(set(phone_numbers))

            data['phone'] = unique_phone_numbers[0] if unique_phone_numbers else None
        except Exception:
            pass

        # Only keep entries where at least a title was found
        if data.get('title'):
            results.append(data)

    with open('results.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

finally:
    # Keep the browser open for a minute for inspection, then clean up
    time.sleep(60)
    driver.quit()
--------------------------------------------------------------------------------