├── README.md
├── properties-02126.csv
└── zillow.py


/README.md:
--------------------------------------------------------------------------------
# Zillow Real Estate Listing Scraper

This script scrapes Zillow.com, an online real estate database, to extract the real estate listings available for a given zip code. To learn more about this scraper, see our blog post: https://www.scrapehero.com/how-to-scrape-real-estate-listings-on-zillow-com-using-python-and-lxml/

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

### Fields

This Zillow scraper can extract the fields below:

1. Title
2. Street Name
3. City
4. State
5. Zip Code
6. Price
7. Facts and Features
8. Real Estate Provider
9. URL

### Prerequisites

This web scraping tutorial uses Python 3 and needs a few packages for downloading and parsing the HTML.
Below are the package requirements:

- lxml
- requests

### Installation

You need pip to install the following packages in Python (https://pip.pypa.io/en/stable/installing/).

Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/).

To install the Requests module:

```
pip3 install requests
```

Python LXML, for parsing the HTML tree structure using XPaths (see http://lxml.de/installation.html for installation instructions).
Installing lxml:

```
pip3 install lxml
```


## Running the scraper
Run the script with Python, passing a zip code and, optionally, a sort order. The sort argument accepts `newest` (latest listings first) and `cheapest` (lowest-priced listings first).
As an example, to find the newest properties up for sale in Boston, Massachusetts, we would run the
script as:

```
python3 zillow.py 02126 newest
```
## Sample Output

This creates a CSV file named `properties-02126.csv`:

[Sample Output](https://raw.githubusercontent.com/scrapehero/zillow_real_estate/master/properties-02126.csv)


--------------------------------------------------------------------------------
/properties-02126.csv:
--------------------------------------------------------------------------------
Title,Address,City,State,Zip Code,Price,Facts and Features,Real Estate Provider,URL
House For Sale,45 Faunce Rd,Boston,MA,2126,"$349,000 ","3 bds , 2 ba , 956 sqft",Pondside Realty,https://www.zillow.com/homedetails/45-Faunce-Rd-Boston-MA-02126/59137709_zpid/
House For Sale,60 Temple St,Boston,MA,2126,"$540,000 ","4 bds , 2 ba , 1,696 sqft",Gibson Sotheby's International Realty,https://www.zillow.com/homedetails/60-Temple-St-Boston-MA-02126/59138896_zpid/
House For Sale,29 Violante St,Boston,MA,2126,"$259,000 ","3 bds , 2 ba , 1,632 sqft",,https://www.zillow.com/homedetails/29-Violante-St-Boston-MA-02126/59139388_zpid/
Condo For Sale,53 River St,Boston,MA,2126,"$529,000 ","3 bds , 3 ba , 1,589 sqft","The Galvin Group, LLC",https://www.zillow.com/homedetails/53-River-St-Boston-MA-02126/2092600080_zpid/
Condo For Sale,6 Temple St,Boston,MA,2126,"$659,000 ","3 bds , 3 ba , 1,697 sqft","The Galvin Group, LLC",https://www.zillow.com/homedetails/6-Temple-St-Boston-MA-02126/2124803167_zpid/
Condo For Sale,15 Groveland St # 3,Boston,MA,2126,"$349,900 ","3 bds , 1 ba , 830 sqft",Newbury Street Real Estate,https://www.zillow.com/homedetails/15-Groveland-St-3-Boston-MA-02126/2092921175_zpid/
House For Sale,20 Almont St,Boston,MA,2126,"$550,000 ","8 bds , 3 ba , 3,770 sqft",,https://www.zillow.com/homedetails/20-Almont-St-Boston-MA-02126/59138113_zpid/
House For Sale,21 Kennebec St,Boston,MA,2126,"$420,000 ","2 bds , 2 ba , 908 sqft",Success! Real Estate,https://www.zillow.com/homedetails/21-Kennebec-St-Boston-MA-02126/59138042_zpid/
Apartment For Sale,40 Tennis Rd,Boston,MA,2126,"$629,900 ","6 bds , 2 ba , 4,452 sqft",M7 Real Estate Group,https://www.zillow.com/homedetails/40-Tennis-Rd-Boston-MA-02126/59137978_zpid/
House For Sale,3 Duxbury Rd,Boston,MA,2126,"$475,000 ","4 bds , 1.5 ba , 1,296 sqft",,https://www.zillow.com/homedetails/3-Duxbury-Rd-Boston-MA-02126/59137470_zpid/
Condo For Sale,51 River St,Mattapan,MA,2126,"$599,000 ","3 bds , 3 ba , 1,485 sqft","The Galvin Group, LLC",https://www.zillow.com/homedetails/51-River-St-Mattapan-MA-02126/2092719719_zpid/
Condo For Sale,51A River St,Boston,MA,2126,"$579,000 ","3 bds , 3 ba , 1,491 sqft","The Galvin Group, LLC",https://www.zillow.com/homedetails/51A-River-St-Boston-MA-02126/2092760832_zpid/
Apartment For Sale,4-6 Greendale Rd,Boston,MA,2126,"$599,900 ","6 bds , 3 ba , 2,496 sqft","WEICHERT, REALTORS® - Briarwood Real Estate",https://www.zillow.com/homedetails/4-6-Greendale-Rd-Boston-MA-02126/2092835887_zpid/
Condo For Sale,27 Hosmer St APT 1,Boston,MA,2126,"$199,000 ","2 bds , 1 ba , 896 sqft",,https://www.zillow.com/homedetails/27-Hosmer-St-APT-1-Boston-MA-02126/81378796_zpid/
House For Sale,64 Tampa St,Boston,MA,2126,"$230,000 ","1 bd , 1 ba , 729 sqft",Rudy & Associates REALTORS®,https://www.zillow.com/homedetails/64-Tampa-St-Boston-MA-02126/2118499072_zpid/
House For Sale,151 Hebron St,Boston,MA,2126,"$299,900 ","2 bds , 1 ba , 960 sqft",Pondside Realty,https://www.zillow.com/homedetails/151-Hebron-St-Boston-MA-02126/59138658_zpid/
Condo For Sale,622 Morton St # 1,Boston,MA,2126,"$325,000 ","3 bds , 1 ba , 1,200 sqft",Alpha Realty,https://www.zillow.com/homedetails/622-Morton-St-1-Boston-MA-02126/2093444356_zpid/

--------------------------------------------------------------------------------
/zillow.py:
--------------------------------------------------------------------------------
from lxml import etree, html
import requests
import csv
import argparse


def parse(zipcode, sort_order=None):
    """Scrape for-sale listings for a zip code and return them as a list of dicts."""
    # Pick the search URL that matches the requested sort order.
    if sort_order == "newest":
        url = "https://www.zillow.com/homes/for_sale/{0}/0_singlestory/days_sort".format(zipcode)
    elif sort_order == "cheapest":
        url = "https://www.zillow.com/homes/for_sale/{0}/0_singlestory/pricea_sort/".format(zipcode)
    else:
        url = "https://www.zillow.com/homes/for_sale/{0}_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy".format(zipcode)

    # Browser-like request headers. Only encodings that requests decodes
    # without extra packages are advertised.
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'accept-encoding': 'gzip, deflate',
        'accept-language': 'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
        'cache-control': 'max-age=0',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }

    # Retry the request up to five times before giving up.
    for attempt in range(5):
        try:
            response = requests.get(url, headers=headers)
            print(response.status_code)
            parser = html.fromstring(response.text)
        except (requests.exceptions.RequestException, etree.ParserError):
            print("Failed to process the page", url)
            continue

        search_results = parser.xpath("//div[@id='search-results']//article")
        properties_list = []

        for listing in search_results:
            # Raw XPath results are lists of text nodes (possibly empty).
            raw_address = listing.xpath(".//span[@itemprop='address']//span[@itemprop='streetAddress']//text()")
            raw_city = listing.xpath(".//span[@itemprop='address']//span[@itemprop='addressLocality']//text()")
            raw_state = listing.xpath(".//span[@itemprop='address']//span[@itemprop='addressRegion']//text()")
            raw_postal_code = listing.xpath(".//span[@itemprop='address']//span[@itemprop='postalCode']//text()")
            raw_price = listing.xpath(".//span[@class='zsg-photo-card-price']//text()")
            raw_info = listing.xpath(".//span[@class='zsg-photo-card-info']//text()")
            raw_broker_name = listing.xpath(".//span[@class='zsg-photo-card-broker-name']//text()")
            # Kept separate from `url` so a retry still requests the search page.
            raw_url = listing.xpath(".//a[contains(@class,'overlay-link')]/@href")
            raw_title = listing.xpath(".//h4//text()")

            # Join the text nodes and normalize whitespace.
            address = ' '.join(' '.join(raw_address).split()) if raw_address else None
            city = ''.join(raw_city).strip() if raw_city else None
            state = ''.join(raw_state).strip() if raw_state else None
            postal_code = ''.join(raw_postal_code).strip() if raw_postal_code else None
            price = ''.join(raw_price).strip() if raw_price else None
            info = ' '.join(' '.join(raw_info).split()).replace(u"\xb7", ',')
            broker = ''.join(raw_broker_name).strip() if raw_broker_name else None
            title = ''.join(raw_title) if raw_title else None
            property_url = "https://www.zillow.com" + raw_url[0] if raw_url else None
            is_forsale = listing.xpath('.//span[@class="zsg-icon-for-sale"]')
            properties = {
                'address': address,
                'city': city,
                'state': state,
                'postal_code': postal_code,
                'price': price,
                'facts and features': info,
                'real estate provider': broker,
                'url': property_url,
                'title': title
            }
            # Keep only listings that carry the for-sale icon.
            if is_forsale:
                properties_list.append(properties)
        return properties_list

    # Every attempt failed; return an empty list so only the CSV header is written.
    return []


if __name__ == "__main__":
    argparser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
    argparser.add_argument('zipcode', help='zip code to search, e.g. 02126')
    sortorder_help = """
    available sort orders are :
    newest : Latest property details,
    cheapest : Properties with cheapest price
    """
    argparser.add_argument('sort', nargs='?', help=sortorder_help, default='Homes For You')
    args = argparser.parse_args()
    zipcode = args.zipcode
    sort = args.sort
    print("Fetching data for %s" % (zipcode))
    scraped_data = parse(zipcode, sort)
    print("Writing data to output file")
    # Text mode with newline='' is what the standard csv module expects in Python 3.
    with open("properties-%s.csv" % (zipcode), 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['title', 'address', 'city', 'state', 'postal_code', 'price', 'facts and features', 'real estate provider', 'url']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in scraped_data:
            writer.writerow(row)

--------------------------------------------------------------------------------
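The sample rows in properties-02126.csv keep the price and the facts as display strings (`"$349,000 "`, `"3 bds , 2 ba , 956 sqft"`), and the zip code appears as `2126`, without its leading zero. A minimal sketch of how such a row could be normalized after scraping is shown below. `normalize_row` is a hypothetical helper that is not part of this repository, and it assumes the column names and string formats seen in the sample file.

```python
import re


def normalize_row(row):
    """Return a copy of a CSV row (dict) with numeric fields parsed.

    Hypothetical helper, not part of zillow.py. Assumes the column names
    and string formats seen in properties-02126.csv.
    """
    out = dict(row)

    # "2126" -> "02126": pad the zip code back to five digits.
    out["Zip Code"] = str(row["Zip Code"]).strip().zfill(5)

    # "$349,000 " -> 349000; None when the price holds no digits.
    digits = re.sub(r"\D", "", row["Price"])
    out["price_usd"] = int(digits) if digits else None

    # "4 bds , 1.5 ba , 1,296 sqft" -> beds 4.0, baths 1.5, sqft 1296
    facts = row["Facts and Features"]
    beds = re.search(r"(\d+(?:\.\d+)?)\s*bds?\b", facts)
    baths = re.search(r"(\d+(?:\.\d+)?)\s*ba\b", facts)
    sqft = re.search(r"(\d[\d,]*)\s*sqft", facts)
    out["beds"] = float(beds.group(1)) if beds else None
    out["baths"] = float(baths.group(1)) if baths else None
    out["sqft"] = int(sqft.group(1).replace(",", "")) if sqft else None
    return out


if __name__ == "__main__":
    sample = {
        "Zip Code": "2126",
        "Price": "$475,000 ",
        "Facts and Features": "4 bds , 1.5 ba , 1,296 sqft",
    }
    print(normalize_row(sample))
```

Rows read with `csv.DictReader` from the sample file can be passed to this helper one at a time. The original columns stay in place, and the parsed values are added under new keys (`price_usd`, `beds`, `baths`, `sqft`), so nothing from the scraped output is lost.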