├── house_outputGBK编码,可用excle打开,.csv ├── lianjia ├── .idea │ ├── .gitignore │ ├── encodings.xml │ ├── inspectionProfiles │ │ └── profiles_settings.xml │ ├── lianjia.iml │ ├── misc.xml │ ├── modules.xml │ └── vcs.xml ├── 2021211338-郭柏彤-爬虫小作业-源代码 │ ├── .idea │ │ ├── .gitignore │ │ ├── inspectionProfiles │ │ │ └── profiles_settings.xml │ │ ├── lianjia.iml │ │ ├── misc.xml │ │ └── modules.xml │ ├── lianjia │ │ ├── __init__.py │ │ ├── begin.py │ │ ├── items.py │ │ ├── middlewares.py │ │ ├── pipelines.py │ │ ├── settings.py │ │ └── spiders │ │ │ ├── __init__.py │ │ │ ├── spider1.py │ │ │ └── spider2.py │ └── scrapy.cfg ├── 2021211338-郭柏彤-爬虫小作业-爬取的数据文件 │ ├── scrapy-test-firsthand.json │ └── scrapy-test-secondhand.json ├── 2021211338-郭柏彤-爬虫小作业-说明文档.docx ├── datachange.py ├── house_output.csv ├── house_show.py ├── house_show2.py ├── house_show3.py ├── lianjia │ ├── .idea │ │ ├── .gitignore │ │ ├── inspectionProfiles │ │ │ └── profiles_settings.xml │ │ ├── lianjia.iml │ │ ├── misc.xml │ │ └── modules.xml │ ├── __init__.py │ ├── begin.py │ ├── items.py │ ├── middlewares.py │ ├── pipelines.py │ ├── settings.py │ └── spiders │ │ ├── __init__.py │ │ ├── spider1.py │ │ └── spider2.py ├── scrapy-test-firsthand.json ├── scrapy-test-secondhand.json └── scrapy.cfg ├── test1_house ├── datachange.py ├── house_output.csv ├── house_outputGBK编码,可用excle打开,.csv ├── house_show.py ├── house_show2.py ├── house_show3.py ├── scrapy-test-firsthand.json ├── 单价-总价散点图绘制效果.png ├── 单价直方图绘制效果.png └── 总价直方图绘制效果.png ├── zufang ├── .idea │ ├── .gitignore │ ├── inspectionProfiles │ │ └── profiles_settings.xml │ ├── misc.xml │ ├── modules.xml │ └── zufang.iml ├── GDP_price_show.py ├── chromedriver.exe ├── face_price_show.py ├── pos_price_show.py ├── room_price_show.py ├── salary_price_show.py ├── scrapy-beijing-zufang.json ├── scrapy-guangzhou-zufang.json ├── scrapy-shanghai-zufang.json ├── scrapy-shenzhen-zufang.json ├── scrapy-xian-zufang.json ├── scrapy.cfg ├── total_price_show.py └── zufang │ ├── 
__init__.py │ ├── items.py │ ├── middlewares.py │ ├── pipelines.py │ ├── settings.py │ └── spiders │ ├── __init__.py │ └── spider1.py ├── 实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤.zip ├── 实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤 ├── zufang │ ├── .idea │ │ ├── .gitignore │ │ ├── inspectionProfiles │ │ │ └── profiles_settings.xml │ │ ├── misc.xml │ │ ├── modules.xml │ │ └── zufang.iml │ ├── GDP_price_show.py │ ├── chromedriver.exe │ ├── face_price_show.py │ ├── pos_price_show.py │ ├── room_price_show.py │ ├── salary_price_show.py │ ├── scrapy-beijing-zufang.json │ ├── scrapy-guangzhou-zufang.json │ ├── scrapy-shanghai-zufang.json │ ├── scrapy-shenzhen-zufang.json │ ├── scrapy-xian-zufang.json │ ├── scrapy.cfg │ ├── total_price_show.py │ └── zufang │ │ ├── __init__.py │ │ ├── items.py │ │ ├── middlewares.py │ │ ├── pipelines.py │ │ ├── settings.py │ │ └── spiders │ │ ├── __init__.py │ │ └── spider1.py ├── 实验报告 │ ├── 租房数据分析实验报告-2021211338-郭柏彤.docx │ └── 租房数据分析实验报告-2021211338-郭柏彤.pdf ├── 爬取下来的数据 │ ├── scrapy-beijing-zufang.json │ ├── scrapy-guangzhou-zufang.json │ ├── scrapy-shanghai-zufang.json │ ├── scrapy-shenzhen-zufang.json │ └── scrapy-xian-zufang.json └── 生成的图表 │ ├── 五个城市租房总价和单位面积价格分析.png │ ├── 单位面积价格和GDP的关系.png │ ├── 单位面积价格和人均月薪的关系.png │ ├── 均价和居室的关系.png │ ├── 均价和板块的关系.png │ └── 均价和面向的关系.png ├── 租房数据分析实验报告-2021211338-郭柏彤.docx ├── 租房数据分析实验报告-2021211338-郭柏彤.pdf └── 题目要求.pdf /house_outputGBK编码,可用excle打开,.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/house_outputGBK编码,可用excle打开,.csv -------------------------------------------------------------------------------- /lianjia/.idea/.gitignore: -------------------------------------------------------------------------------- 1 | # 默认忽略的文件 2 | /shelf/ 3 | /workspace.xml 4 | -------------------------------------------------------------------------------- /lianjia/.idea/encodings.xml: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 |
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/misc.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 |
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/modules.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/__init__.py
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/begin.py:
--------------------------------------------------------------------------------
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

from lianjia.spiders.spider1 import firsthandspider
from lianjia.spiders.spider2 import secondhandspider

configure_logging()
runner = CrawlerRunner()
runner.crawl(firsthandspider)
runner.crawl(secondhandspider)
d = runner.join()
d.addBoth(lambda _: reactor.stop())

reactor.run()
"""
from scrapy import cmdline

cmdline.execute("scrapy crawl spider1".split())
cmdline.execute("scrapy crawl spider2".split())

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()

crawler = CrawlerProcess(settings)

crawler.crawl('spider1')
crawler.crawl('spider2')

crawler.start()"""
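The launcher above builds `CrawlerRunner()` without passing any settings, so the values in settings.py (for example `DOWNLOAD_DELAY` and the user-agent middleware) are probably not applied to those crawls; only each spider's `custom_settings` are. A minimal sketch of one alternative follows, assuming the script is run from the directory that contains scrapy.cfg. The names `SPIDER_NAMES` and `run_spiders` are illustrative and not part of the repository; the spider names are the `name` attributes declared in spider1.py and spider2.py, not the `spider1`/`spider2` strings used in the commented-out block.

```python
# Sketch only: one way to run both spiders with the project settings loaded.
# SPIDER_NAMES and run_spiders are hypothetical helpers, not repository code.

# `name` attributes declared by firsthandspider and secondhandspider.
SPIDER_NAMES = ("lianjia1", "lianjia2")


def run_spiders(names=SPIDER_NAMES):
    """Schedule every named spider, then block until all crawls finish."""
    # Imported lazily so the module can be imported without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # get_project_settings() locates settings.py through scrapy.cfg, so the
    # download delay and downloader middlewares apply to both crawls.
    process = CrawlerProcess(get_project_settings())
    for name in names:
        # A string is resolved to a spider class via SPIDER_MODULES.
        process.crawl(name)
    # Starts the Twisted reactor and returns once every crawl has ended.
    process.start()
```

Calling `run_spiders()` once from a script replaces the manual `reactor.run()` / `reactor.stop()` handling, because `CrawlerProcess` manages the reactor itself.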
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/items.py:
--------------------------------------------------------------------------------
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


# Fields scraped for a new-build (first-hand) listing.
class firsthanditem(scrapy.Item):
    name = scrapy.Field()
    position = scrapy.Field()
    types = scrapy.Field()
    houseType = scrapy.Field()
    space = scrapy.Field()
    unitPrice = scrapy.Field()
    totalPrice = scrapy.Field()

# Fields scraped for a resale (second-hand) listing.
class secondhanditem(scrapy.Item):
    name = scrapy.Field()
    position = scrapy.Field()
    types = scrapy.Field()
    unitPrice = scrapy.Field()
    totalPrice = scrapy.Field()
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/middlewares.py:
--------------------------------------------------------------------------------
# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals

# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter


import random
class RandomUserAgentMiddleware(object):
    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Read the MY_USER_AGENT list from settings.py
        s = cls(user_agents=crawler.settings.get('MY_USER_AGENT'))
        return s

    def process_request(self, request, spider):
        # Pick a random User-Agent for every outgoing request
        agent = random.choice(self.user_agents)
        request.headers['User-Agent'] = agent
        return None


class LianjiaSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request or item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn't have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)


class LianjiaDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either:
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)

--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/pipelines.py:
--------------------------------------------------------------------------------
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import json

class firsthandline(object):
    def open_spider(self, spider):
        # Let a failing open() propagate. Catching and printing the error
        # would only defer the failure to the first self.file.write() call.
        self.file = open('scrapy-test-firsthand.json', "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Write one JSON object per line, keeping Chinese text unescaped
        dict_item = dict(item)
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()


class secondhandline(object):
    def open_spider(self, spider):
        # Same handling as firsthandline: an unopenable file is a hard error
        self.file = open('scrapy-test-secondhand.json', "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Write one JSON object per line, keeping Chinese text unescaped
        dict_item = dict(item)
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/settings.py:
-------------------------------------------------------------------------------- 1 | # Scrapy settings for lianjia project 2 | # 3 | # For simplicity, this file contains only settings considered important or 4 | # commonly used. You can find more settings consulting the documentation: 5 | # 6 | # https://docs.scrapy.org/en/latest/topics/settings.html 7 | # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html 8 | # https://docs.scrapy.org/en/latest/topics/spider-middleware.html 9 | 10 | BOT_NAME = "lianjia" 11 | #2403:a200:a200:13f1:183:84:18:11 12 | 13 | SPIDER_MODULES = ["lianjia.spiders"] 14 | NEWSPIDER_MODULE = "lianjia.spiders" 15 | 16 | # Crawl responsibly by identifying yourself (and your website) on the user-agent 17 | #USER_AGENT = 'python_qm (+http://www.yourdomain.com)' 18 | #USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36" 19 | #USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0" 20 | DOWNLOADER_MIDDLEWARES = { 21 | # 'lianjia.middlewares.LianjiaDownloaderMiddleware': 543, 22 | 'lianjia.middlewares.RandomUserAgentMiddleware': 900, 23 | } 24 | 25 | MY_USER_AGENT = [ 26 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)", 27 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)", 28 | "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)", 29 | "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)", 30 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)", 31 | "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 
2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)", 32 | "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)", 33 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)", 34 | "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6", 35 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1", 36 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0", 37 | "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5", 38 | "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6", 39 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11", 40 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20", 41 | "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52", 42 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11", 43 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER", 44 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)", 45 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)", 46 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER", 47 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; 
.NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)", 48 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)", 49 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)", 50 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)", 51 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)", 52 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)", 53 | "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1", 54 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1", 55 | "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5", 56 | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre", 57 | "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0", 58 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11", 59 | "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10", 60 | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36", 61 | ] 62 | 63 | # Obey robots.txt rules 64 | ROBOTSTXT_OBEY = False 65 | 66 | #LOG_LEVEL = 'WARNING' 67 | 68 | #LOG_LEVEL = "WARNING" 69 | # Configure maximum concurrent requests performed by Scrapy (default: 16) 70 | #CONCURRENT_REQUESTS = 8 71 | 72 | # Configure a delay for requests for the same 
website (default: 0) 73 | # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay 74 | # See also autothrottle settings and docs 75 | DOWNLOAD_DELAY = 3 76 | RANDOMIZE_DOWNLOAD_DELAY = True 77 | #CONCURRENT_REQUESTS_PER_DOMAIN = 2 78 | # The download delay setting will honor only one of: 79 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16 80 | #CONCURRENT_REQUESTS_PER_IP = 16 81 | 82 | # Disable cookies (enabled by default) 83 | #COOKIES_ENABLED = False 84 | 85 | # Disable Telnet Console (enabled by default) 86 | #TELNETCONSOLE_ENABLED = False 87 | 88 | # Override the default request headers: 89 | #DEFAULT_REQUEST_HEADERS = { 90 | # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 91 | # "Accept-Language": "en", 92 | # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36" 93 | #} 94 | 95 | # Enable or disable spider middlewares 96 | # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html 97 | #SPIDER_MIDDLEWARES = { 98 | # "lianjia.middlewares.LianjiaSpiderMiddleware": 543, 99 | #} 100 | 101 | # Enable or disable downloader middlewares 102 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html 103 | #DOWNLOADER_MIDDLEWARES = { 104 | # "lianjia.middlewares.LianjiaDownloaderMiddleware": 543, 105 | #} 106 | 107 | # Enable or disable extensions 108 | # See https://docs.scrapy.org/en/latest/topics/extensions.html 109 | #EXTENSIONS = { 110 | # "scrapy.extensions.telnet.TelnetConsole": None, 111 | #} 112 | 113 | # Configure item pipelines 114 | # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html 115 | ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300, 'lianjia.pipelines.secondhandline': 300,} 116 | #ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300} 117 | # Enable and configure the AutoThrottle extension (disabled by default) 118 | # See https://docs.scrapy.org/en/latest/topics/autothrottle.html 119 | 
#AUTOTHROTTLE_ENABLED = True 120 | # The initial download delay 121 | #AUTOTHROTTLE_START_DELAY = 5 122 | # The maximum download delay to be set in case of high latencies 123 | #AUTOTHROTTLE_MAX_DELAY = 60 124 | # The average number of requests Scrapy should be sending in parallel to 125 | # each remote server 126 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 127 | # Enable showing throttling stats for every response received: 128 | #AUTOTHROTTLE_DEBUG = False 129 | 130 | # Enable and configure HTTP caching (disabled by default) 131 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings 132 | #HTTPCACHE_ENABLED = True 133 | #HTTPCACHE_EXPIRATION_SECS = 0 134 | #HTTPCACHE_DIR = "httpcache" 135 | #HTTPCACHE_IGNORE_HTTP_CODES = [] 136 | #HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage" 137 | 138 | # Set settings whose default value is deprecated to a future-proof value 139 | REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7" 140 | TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor" 141 | FEED_EXPORT_ENCODING = "utf-8" 142 | -------------------------------------------------------------------------------- /lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/spiders/__init__.py: -------------------------------------------------------------------------------- 1 | # This package will contain the spiders of your Scrapy project 2 | # 3 | # Please refer to the documentation for information on how to create and manage 4 | # your spiders. 
5 |
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/spiders/spider1.py:
--------------------------------------------------------------------------------
import scrapy
from scrapy import Selector, Request
from scrapy.http import HtmlResponse
from lianjia.items import firsthanditem
class firsthandspider(scrapy.spiders.Spider):
    name = "lianjia1"
    allowed_domains = ["bj.fang.lianjia.com"]
    # Crawl listing pages 3 to 7 of the Beijing new-build section
    start_urls = []
    for page in range(3, 8):
        url1 = 'https://bj.fang.lianjia.com/loupan/pg{}/'.format(page)
        start_urls.append(url1)
    #for page in range(3, 8):
    #    url1 = 'https://bj.lianjia.com/ershoufang/pg{}/'.format(page)
    #    start_urls.append(url1)

    # Route this spider's items to its own pipeline only
    custom_settings = {
        'ITEM_PIPELINES': {'lianjia.pipelines.firsthandline': 300},
    }

    def parse(self, response):
        div_list = response.xpath("/html/body/div[3]/ul[2]/li")
        #div_list = response.xpath("//*")
        #print(div_list)
        for each in div_list:
            # Build a fresh item per listing. Reusing one instance across
            # yields lets later assignments overwrite items already yielded.
            item = firsthanditem()
            item['name'] = each.xpath('./div[@class="resblock-desc-wrapper"]/div[@class="resblock-name"]/a/text()').extract_first()
            item['types'] = each.xpath('./div[@class="resblock-desc-wrapper"]/div[@class="resblock-name"]/span[@class="resblock-type"]/text()').extract_first()
            item['position'] = each.xpath('./div[@class="resblock-desc-wrapper"]/div[@class="resblock-location"]/a/text()').extract_first()
            item['houseType'] = each.xpath('./div[@class="resblock-desc-wrapper"]/a[@class="resblock-room"]/span/text()').extract_first()
            item['space'] = each.xpath('./div[@class="resblock-desc-wrapper"]/div[@class="resblock-area"]/span/text()').extract_first()
            item['unitPrice'] = each.xpath('./div[@class="resblock-desc-wrapper"]/div[@class="resblock-price"]/div[@class="main-price"]/span[@class = "number"]/text()').extract_first()
            item['totalPrice'] = each.xpath('./div[@class="resblock-desc-wrapper"]/div[@class="resblock-price"]/div[@class="second"]/text()').extract_first()
            yield item

--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/spiders/spider2.py:
--------------------------------------------------------------------------------
import scrapy
from scrapy import Selector, Request
from scrapy.http import HtmlResponse
from lianjia.items import secondhanditem
class secondhandspider(scrapy.spiders.Spider):
    name = "lianjia2"
    allowed_domains = ["bj.lianjia.com"]
    # Crawl listing pages 3 to 7 of the Beijing resale section
    start_urls = []
    for page in range(3, 8):
        url1 = 'https://bj.lianjia.com/ershoufang/pg{}/'.format(page)
        start_urls.append(url1)

    # Route this spider's items to its own pipeline only
    custom_settings = {
        'ITEM_PIPELINES': {'lianjia.pipelines.secondhandline': 300},
    }

    def parse(self, response):
        div_list = response.xpath('//*[@id="content"]/div[1]/ul/li')

        #print(div_list)
        for each in div_list:
            # Build a fresh item per listing. Reusing one instance across
            # yields lets later assignments overwrite items already yielded.
            item = secondhanditem()
            item['name'] = each.xpath('./div[@class="info clear"]/div[@class="flood"]/div/a[1]/text()').extract_first()
            item['position'] = each.xpath('./div[@class="info clear"]/div[@class="flood"]/div/a[2]/text()').extract_first()
            item['types'] = each.xpath('./div[@class="info clear"]/div[@class="address"]/div/text()').extract_first()
            item['unitPrice'] = each.xpath('./div[@class="info clear"]/div[@class="priceInfo"]/div[2]/span/text()').extract_first()
            item['totalPrice'] = each.xpath('./div[@class="info clear"]/div[@class="priceInfo"]/div[1]/span/text()').extract_first()
            yield item

--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/scrapy.cfg:
--------------------------------------------------------------------------------
1 | # Automatically created by: scrapy startproject 2 | # 3 | # For more
information about the [deploy] section see: 4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html 5 | 6 | [settings] 7 | default = lianjia.settings 8 | 9 | [deploy] 10 | #url = http://localhost:6800/ 11 | project = lianjia 12 | -------------------------------------------------------------------------------- /lianjia/2021211338-郭柏彤-爬虫小作业-爬取的数据文件/scrapy-test-firsthand.json: -------------------------------------------------------------------------------- 1 | {"name": "北辰墅院1900", "types": "住宅", "position": "顺兴街11号院望尊园", "houseType": "3室", "space": "建面 83-135㎡", "unitPrice": "36000", "totalPrice": "总价430(万/套)"} 2 | {"name": "燕西华府", "types": "别墅", "position": "王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)", "houseType": "3室", "space": "建面 350-851㎡", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"} 3 | {"name": "京西悦府", "types": "住宅", "position": "燕房线阎村地铁站东南角约189米", "houseType": null, "space": "建面 120-135㎡", "unitPrice": "33000", "totalPrice": "总价440(万/套)"} 4 | {"name": "福景苑", "types": "住宅", "position": "亮马桥路46号", "houseType": "1室", "space": "建面 145-268㎡", "unitPrice": "83000", "totalPrice": "总价1150-2400(万/套)"} 5 | {"name": "合景寰汇公馆", "types": "住宅", "position": "北京市通州区滨河中路西侧(合景寰汇公馆)", "houseType": "2室", "space": "建面 77-117㎡", "unitPrice": "35000", "totalPrice": "总价280-490(万/套)"} 6 | {"name": "K2十里春风", "types": "住宅", "position": "北京市通州区", "houseType": "2室", "space": "建面 74-90㎡", "unitPrice": "23500", "totalPrice": "总价188-212(万/套)"} 7 | {"name": "K2十里春风", "types": "别墅", "position": "北京市通州区", "houseType": "3室", "space": "建面 155-156㎡", "unitPrice": "28000", "totalPrice": "总价440-460(万/套)"} 8 | {"name": "玺萌壹號院", "types": "别墅", "position": "西南三环嘉园路与镇国寺北街交叉口", "houseType": "5室", "space": "建面 320-464㎡", "unitPrice": "90000", "totalPrice": "总价3650-3940(万/套)"} 9 | {"name": "北京书院", "types": "住宅", "position": "北京市朝阳区北土城东路辅路", "houseType": "1室", "space": "建面 79-139㎡", "unitPrice": "155000", "totalPrice": "总价1066(万/套)"} 10 | {"name": "中铁华侨城和园", "types": "住宅", "position": 
"南五环南海子公园西侧约500米", "houseType": "3室", "space": "建面 154-184㎡", "unitPrice": "60000", "totalPrice": "总价930-980(万/套)"} 11 | {"name": "顺鑫颐和天璟", "types": "住宅", "position": "北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心", "houseType": "4室", "space": "建面 110-220㎡", "unitPrice": "28000", "totalPrice": "总价400-420(万/套)"} 12 | {"name": "顺鑫颐和天璟", "types": "别墅", "position": "新城右堤路与昌金路交汇处向北200米", "houseType": "4室", "space": "建面 278-486㎡", "unitPrice": "28000", "totalPrice": "总价950-1200(万/套)"} 13 | {"name": "永旺19街", "types": "商业", "position": "地铁生物医药基地站向南200米", "houseType": null, "space": null, "unitPrice": "24000", "totalPrice": "总价299(万/套)"} 14 | {"name": "北京城建北京合院", "types": "住宅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "houseType": "3室", "space": "建面 95-130㎡", "unitPrice": "46000", "totalPrice": "总价556-566(万/套)"} 15 | {"name": "复地运河公馆", "types": "住宅", "position": "通州运河核心区临滨河西路", "houseType": "2室", "space": "建面 89-145㎡", "unitPrice": "43000", "totalPrice": "总价450-650(万/套)"} 16 | {"name": "北京城建北京合院", "types": "别墅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "houseType": "4室", "space": "建面 210-330㎡", "unitPrice": "39000", "totalPrice": "总价1000-1300(万/套)"} 17 | {"name": "月亮河七星公馆", "types": "住宅", "position": "通燕高速耿庄桥出口南200米月亮河,河滨路1号", "houseType": "1室", "space": "建面 55-109㎡", "unitPrice": "68000", "totalPrice": "总价374-800(万/套)"} 18 | {"name": "天润福熙大道", "types": "住宅", "position": "清河营东路1号院, 清河营东路3号院", "houseType": "1室", "space": "建面 65-374㎡", "unitPrice": "108000", "totalPrice": "总价750-3316(万/套)"} 19 | {"name": "京贸国际公馆", "types": "住宅", "position": "怡乐中路299号院(广渠快速路二期出口向南1000米)", "houseType": "1室", "space": "建面 72-147㎡", "unitPrice": "64000", "totalPrice": "总价495-950(万/套)"} 20 | {"name": "凯德麓语", "types": "别墅", "position": "兴寿镇京承高速G11出口向西怀昌路北侧", "houseType": "3室", "space": "建面 280-863㎡", "unitPrice": "35000", "totalPrice": "总价850-3450(万/套)"} 21 | {"name": "京贸国际城·峰景", "types": "住宅", "position": "芙蓉东路1号(通燕高速耿庄桥北出口向南300米)", "houseType": "1室", "space": "建面 69-140㎡", "unitPrice": "68000", "totalPrice": 
"总价460-980(万/套)"} 22 | {"name": "观唐云鼎", "types": "别墅", "position": "溪翁庄镇密溪路39号院(云佛山度假村对面)", "houseType": "3室", "space": "建面 346-613㎡", "unitPrice": "30000", "totalPrice": "总价1068-1850(万/套)"} 23 | {"name": "硅谷SOHO", "types": "商业类", "position": "京藏高速科技园出口(28出口)凉水河路", "houseType": "1室", "space": "建面 49-68㎡", "unitPrice": "20000", "totalPrice": "总价85-180(万/套)"} 24 | {"name": "旭辉城", "types": "住宅", "position": "北京市房山区良锦街6号院旭辉城营销中心", "houseType": "2室", "space": "建面 75-116㎡", "unitPrice": "28500", "totalPrice": "总价219-330(万/套)"} 25 | {"name": "檀香府", "types": "住宅", "position": "京潭大街与潭柘十街交叉口", "houseType": "3室", "space": "建面 124-170㎡", "unitPrice": "42000", "totalPrice": "总价530-750(万/套)"} 26 | {"name": "泰禾金府大院", "types": "别墅", "position": "南四环地铁新宫站南800米", "houseType": "4室", "space": "建面 362-504㎡", "unitPrice": "75000", "totalPrice": "总价2700-3700(万/套)"} 27 | {"name": "和棠瑞著", "types": "别墅", "position": "金海湖景区坝前广场西侧500米", "houseType": "3室", "space": "建面 305-360㎡", "unitPrice": "16000", "totalPrice": "总价530-560(万/套)"} 28 | {"name": "尊悦光华", "types": "住宅", "position": "北京市朝阳区光华东里甲1号院3号楼", "houseType": "3室", "space": "建面 133-171㎡", "unitPrice": "150000", "totalPrice": "总价2500(万/套)"} 29 | {"name": "首创·河著", "types": "别墅", "position": "京承高速11出口(昌金路)向东900 米路北", "houseType": "4室", "space": "建面 248-310㎡", "unitPrice": "38000", "totalPrice": "总价1200-1900(万/套)"} 30 | {"name": "华萃西山", "types": "住宅", "position": "永定镇地铁S1号线石厂西南700米", "houseType": "3室", "space": "建面 115-122㎡", "unitPrice": "48000", "totalPrice": "总价560-600(万/套)"} 31 | {"name": "京西悦府", "types": "别墅", "position": "北京市房山区燕房线阎村地铁站东南角约189米", "houseType": "3室", "space": "建面 175-176㎡", "unitPrice": "40000", "totalPrice": "总价700-780(万/套)"} 32 | {"name": "中粮天恒天悦壹号", "types": "别墅", "position": "南四环地铁新宫站南500米", "houseType": "4室", "space": "建面 220-340㎡", "unitPrice": "80000", "totalPrice": "总价2000-2360(万/套)"} 33 | {"name": "龙湾别墅", "types": "住宅", "position": "后沙峪镇龙湾别墅", "houseType": "4室", "space": "建面 218-317㎡", "unitPrice": "70000", 
"totalPrice": "总价2300(万/套)"} 34 | {"name": "京投发展·锦悦府", "types": "住宅", "position": "檀营乡檀东路西侧", "houseType": "3室", "space": "建面 90㎡", "unitPrice": "25607", "totalPrice": "总价220(万/套)"} 35 | {"name": "京投发展·锦悦府", "types": "别墅", "position": "檀营乡檀东路西侧", "houseType": "3室", "space": "建面 187-285㎡", "unitPrice": "25000", "totalPrice": "总价400-560(万/套)"} 36 | {"name": "金辰府", "types": "住宅", "position": "北京市昌平区北七家镇政府东南100米", "houseType": "3室", "space": "建面 89-143㎡", "unitPrice": "55000", "totalPrice": "总价490-790(万/套)"} 37 | {"name": "建邦·顺颐府", "types": "住宅", "position": "空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心", "houseType": "3室", "space": "建面 89-147㎡", "unitPrice": "55583", "totalPrice": "总价480-845(万/套)"} 38 | {"name": "葛洲坝中国府", "types": "住宅", "position": "北京市丰台东路46号", "houseType": "3室", "space": "建面 168-240㎡", "unitPrice": "125000", "totalPrice": "总价2200-3000(万/套)"} 39 | {"name": "华萃西山", "types": "别墅", "position": "门头沟永定镇地铁S1号线石厂站西南700米", "houseType": "4室", "space": "建面 135-245㎡", "unitPrice": "48000", "totalPrice": "总价760-1060(万/套)"} 40 | {"name": "北京东湾", "types": "住宅", "position": "通惠北路98号", "houseType": "1室", "space": "建面 58-130㎡", "unitPrice": "68500", "totalPrice": "总价410-900(万/套)"} 41 | {"name": "富兴首府", "types": "住宅", "position": "东坝路9号东北60米", "houseType": "3室", "space": "建面 144-356㎡", "unitPrice": "85000", "totalPrice": "总价1706-2240(万/套)"} 42 | {"name": "中铁诺德阅墅", "types": "别墅", "position": "顺义区后沙峪镇裕园路762乡龙湖滟澜山对面", "houseType": "4室", "space": "建面 235-320㎡", "unitPrice": "50000", "totalPrice": "总价1150-1700(万/套)"} 43 | {"name": "中铁华侨城和园", "types": "别墅", "position": "南五环南海子公园西侧约500米", "houseType": "4室", "space": "建面 288-370㎡", "unitPrice": "50000", "totalPrice": "总价1870(万/套)"} 44 | {"name": "懋源·璟岳", "types": "别墅", "position": "南三环西路99号院", "houseType": "4室", "space": "建面 465-590㎡", "unitPrice": "140000", "totalPrice": "总价6500-9000(万/套)"} 45 | {"name": "懋源·璟玺", "types": "别墅", "position": "孙河京密路与京平辅路交叉口西行1000米", "houseType": "5室", "space": "建面 500-716㎡", "unitPrice": "100000", 
"totalPrice": "总价4380-6778(万/套)"} 46 | {"name": "万科雲庐", "types": "住宅", "position": "魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置", "houseType": "4室", "space": "建面 104-300㎡", "unitPrice": "39000", "totalPrice": "总价656-820(万/套)"} 47 | {"name": "万科雲庐", "types": "别墅", "position": "魏各庄路万科雲庐", "houseType": "4室", "space": "建面 200-330㎡", "unitPrice": "30000", "totalPrice": "总价852-950(万/套)"} 48 | {"name": "金茂北京国际社区", "types": "住宅", "position": "顺义新城北小营昌金路水色时光路西", "houseType": "1室", "space": "建面 50-118㎡", "unitPrice": "30000", "totalPrice": "总价160-360(万/套)"} 49 | {"name": "住总如院", "types": "住宅", "position": "北京市大兴区采华路(波尔多小镇南区西南侧约250米)", "houseType": "2室", "space": "建面 98-233㎡", "unitPrice": "31136", "totalPrice": "总价280-475(万/套)"} 50 | {"name": "郎府书苑", "types": "住宅", "position": "西集镇京哈高速郎府出口南侧300米", "houseType": "3室", "space": "建面 89-116㎡", "unitPrice": "25800", "totalPrice": "总价273-300(万/套)"} 51 | -------------------------------------------------------------------------------- /lianjia/2021211338-郭柏彤-爬虫小作业-说明文档.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/lianjia/2021211338-郭柏彤-爬虫小作业-说明文档.docx -------------------------------------------------------------------------------- /lianjia/datachange.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import json 3 | import codecs 4 | 5 | ''' 6 | 将json文件格式转为csv文件格式并保存。 7 | ''' 8 | 9 | class Json_Csv(): 10 | 11 | # 初始化方法,创建csv文件。 12 | def __init__(self): 13 | self.save_csv = open('house_output.csv', 'w', encoding='utf-8', newline='') 14 | 15 | self.write_csv = csv.writer(self.save_csv, delimiter=',') # 以,为分隔符 16 | def trans(self, filename): 17 | with codecs.open(filename, 'r', encoding='utf-8') as f: #读取json文件 18 | read = f.readlines() 19 | flag = True 20 | for index, info in enumerate(read): 21 | data = json.loads(info) 22 | if flag: # 第一行当做head 23 | 
keys = list(data.keys()) # wrap the keys in a list so they can be written to the CSV 24 | self.write_csv.writerow(keys) # write the header row, comma-separated 25 | flag = False # header written, so clear the flag 26 | value = list(data.values()) # the values must also be a list before they are written 27 | 28 | temp = value[6] # keep only the minimum area and convert it to int 29 | if isinstance(temp, str): 30 | list_temp = temp.split(' ') 31 | list_temp = list_temp[1].split('-') 32 | list_temp = list_temp[0].split('㎡') 33 | value[6] = int(list_temp[0]) 34 | 35 | value[7] = int(value[7]) # convert the unit price to int, in yuan 36 | 37 | temp = value[8] # keep only the minimum total price, as int, in units of 10,000 yuan 38 | if isinstance(temp, str): 39 | list_temp = temp.split('价') 40 | list_temp = list_temp[1].split('-') 41 | list_temp = list_temp[0].split('(') 42 | value[8] = int(list_temp[0]) 43 | 44 | self.write_csv.writerow(value) # write the data row, comma-separated 45 | self.save_csv.close() # close the file once everything is written 46 | 47 | 48 | if __name__ == '__main__': 49 | json_csv = Json_Csv() 50 | path = 'scrapy-test-firsthand.json' 51 | json_csv.trans(path) -------------------------------------------------------------------------------- /lianjia/house_output.csv: -------------------------------------------------------------------------------- 1 | name,types,position,position1,position2,houseType,space,unitPrice,totalPrice 2 | 北辰墅院1900,住宅,顺兴街11号院望尊园,顺义,马坡,3室,83,36000,430 3 | 燕西华府,别墅,"王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)",丰台,丰台其它,3室,350,47000,1400 4 | 京西悦府,住宅,燕房线阎村地铁站东南角约189米,房山,阎村,,120,33000,440 5 | 福景苑,住宅,亮马桥路46号,朝阳,燕莎,1室,145,83000,1150 6 | 合景寰汇公馆,住宅,北京市通州区滨河中路西侧(合景寰汇公馆),通州,武夷花园,2室,77,35000,280 7 | K2十里春风,住宅,北京市通州区,通州,通州其它,2室,74,23500,188 8 | K2十里春风,别墅,北京市通州区,通州,通州其它,3室,155,28000,440 9 | 玺萌壹號院,别墅,西南三环嘉园路与镇国寺北街交叉口,丰台,草桥,5室,320,90000,3650 10 | 北京书院,住宅,北京市朝阳区北土城东路辅路,朝阳,惠新西街,1室,79,155000,1066 11 | 中铁华侨城和园,住宅,南五环南海子公园西侧约500米,大兴,瀛海,3室,154,60000,930 12 | 顺鑫颐和天璟,住宅,北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心,顺义,顺义其它,4室,110,28000,400 13 | 顺鑫颐和天璟,别墅,新城右堤路与昌金路交汇处向北200米,顺义,顺义其它,4室,278,28000,950 14 | 永旺19街,商业,地铁生物医药基地站向南200米,大兴,天宫院,,,24000,299 15 | 北京城建北京合院,住宅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,3室,95,46000,556 16 | 
复地运河公馆,住宅,通州运河核心区临滨河西路,通州,武夷花园,2室,89,43000,450 17 | 北京城建北京合院,别墅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,4室,210,39000,1000 18 | 月亮河七星公馆,住宅,通燕高速耿庄桥出口南200米月亮河,河滨路1号,通州,武夷花园,1室,55,68000,374 19 | 天润福熙大道,住宅,"清河营东路1号院, 清河营东路3号院",朝阳,北苑,1室,65,108000,750 20 | 京贸国际公馆,住宅,怡乐中路299号院(广渠快速路二期出口向南1000米),通州,九棵树(家乐福),1室,72,64000,495 21 | 凯德麓语,别墅,兴寿镇京承高速G11出口向西怀昌路北侧,昌平,昌平其它,3室,280,35000,850 22 | 京贸国际城·峰景,住宅,芙蓉东路1号(通燕高速耿庄桥北出口向南300米),通州,武夷花园,1室,69,68000,460 23 | 观唐云鼎,别墅,溪翁庄镇密溪路39号院(云佛山度假村对面),密云,溪翁庄镇,3室,346,30000,1068 24 | 旭辉城,住宅,北京市房山区良锦街6号院旭辉城营销中心,房山,房山其它,2室,75,28500,219 25 | 檀香府,住宅,京潭大街与潭柘十街交叉口,门头沟,门头沟其它,3室,124,42000,530 26 | 泰禾金府大院,别墅,南四环地铁新宫站南800米,丰台,新宫,4室,362,75000,2700 27 | 和棠瑞著,别墅,金海湖景区坝前广场西侧500米,平谷,平谷其它,3室,305,16000,530 28 | 尊悦光华,住宅,北京市朝阳区光华东里甲1号院3号楼,朝阳,CBD,3室,133,150000,2500 29 | 首创·河著,别墅,京承高速11出口(昌金路)向东900 米路北,顺义,顺义其它,4室,248,38000,1200 30 | 华萃西山,住宅,永定镇地铁S1号线石厂西南700米,门头沟,门头沟其它,3室,115,48000,560 31 | 京西悦府,别墅,北京市房山区燕房线阎村地铁站东南角约189米,房山,阎村,3室,175,40000,700 32 | 中粮天恒天悦壹号,别墅,南四环地铁新宫站南500米,丰台,新宫,4室,220,80000,2000 33 | 龙湾别墅,住宅,后沙峪镇龙湾别墅,顺义,中央别墅区,4室,218,70000,2300 34 | 京投发展·锦悦府,住宅,檀营乡檀东路西侧,密云,鼓楼街道,3室,90,25607,220 35 | 京投发展·锦悦府,别墅,檀营乡檀东路西侧,密云,鼓楼街道,3室,187,25000,400 36 | 金辰府,住宅,北京市昌平区北七家镇政府东南100米,昌平,北七家,3室,89,55000,490 37 | 建邦·顺颐府,住宅,空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心,顺义,后沙峪,3室,89,55583,480 38 | 葛洲坝中国府,住宅,北京市丰台东路46号,丰台,玉泉营,3室,168,125000,2200 39 | 华萃西山,别墅,门头沟永定镇地铁S1号线石厂站西南700米,门头沟,门头沟其它,4室,135,48000,760 40 | 富兴首府,住宅,东坝路9号东北60米,朝阳,东坝,3室,144,85000,1706 41 | 中铁诺德阅墅,别墅,顺义区后沙峪镇裕园路762乡龙湖滟澜山对面,顺义,中央别墅区,4室,235,50000,1150 42 | 中铁华侨城和园,别墅,南五环南海子公园西侧约500米,大兴,瀛海,4室,288,50000,1870 43 | 懋源·璟岳,别墅,南三环西路99号院,丰台,玉泉营,4室,465,140000,6500 44 | 合景泰富天汇,住宅,顺义区昌金路与通顺路交汇处,顺义,马坡,2室,70,33000,230 45 | 懋源·璟玺,别墅,孙河京密路与京平辅路交叉口西行1000米,朝阳,中央别墅区,5室,500,100000,4380 46 | 万科雲庐,住宅,魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置,丰台,丰台其它,4室,104,39000,656 47 | 万科雲庐,别墅,魏各庄路万科雲庐,丰台,丰台其它,4室,200,30000,852 48 | 金茂北京国际社区,住宅,顺义新城北小营昌金路水色时光路西,顺义,顺义其它,1室,50,30000,160 49 | 住总如院,住宅,北京市大兴区采华路(波尔多小镇南区西南侧约250米),大兴,大兴新机场洋房别墅区,2室,98,31136,280 50 | 
郎府书苑,住宅,西集镇京哈高速郎府出口南侧300米,通州,通州其它,3室,89,25800,273 51 | 建邦·顺颐府,别墅,空港B区裕民大街30号,顺义,后沙峪,3室,270,55583,1300 52 | -------------------------------------------------------------------------------- /lianjia/house_show.py: -------------------------------------------------------------------------------- 1 | # Visualizes the new-home data in the CSV file as a scatter plot of unit price against total price 2 | 3 | import matplotlib 4 | import matplotlib.pyplot as plt 5 | import csv 6 | 7 | filename = 'house_output.csv' 8 | with open(filename,"r",encoding='utf-8') as f: # the file must be opened as utf-8 9 | data = csv.reader(f) 10 | unit_price = [] 11 | total_price = [] 12 | house_type = [] 13 | test_f = 1 14 | for i in data: 15 | if test_f == 1: 16 | test_f = 0 17 | else: 18 | unit_price.append(int(i[7])) 19 | total_price.append(int(i[8])) 20 | house_type.append(i[1]) 21 | up1 = [] 22 | up2 = [] 23 | up3 = [] 24 | tp1 = [] 25 | tp2 = [] 26 | tp3 = [] 27 | for i in range(len(unit_price)): 28 | if house_type[i] == '住宅': 29 | up1.append(int(unit_price[i])) 30 | tp1.append(int(total_price[i])) 31 | if house_type[i] == '别墅': 32 | up2.append(int(unit_price[i])) 33 | tp2.append(int(total_price[i])) 34 | if house_type[i] == '商业': 35 | up3.append(int(unit_price[i])) 36 | tp3.append(int(total_price[i])) 37 | 38 | for i in range(len(tp1)): 39 | cur_index = i 40 | while cur_index - 1 >= 0 and tp1[cur_index - 1] > tp1[cur_index]: 41 | tp1[cur_index], tp1[cur_index - 1] = tp1[cur_index - 1], tp1[cur_index] 42 | up1[cur_index], up1[cur_index - 1] = up1[cur_index - 1], up1[cur_index] 43 | cur_index -= 1 44 | for i in range(len(tp2)): 45 | cur_index = i 46 | while cur_index - 1 >= 0 and tp2[cur_index - 1] > tp2[cur_index]: 47 | tp2[cur_index], tp2[cur_index - 1] = tp2[cur_index - 1], tp2[cur_index] 48 | up2[cur_index], up2[cur_index - 1] = up2[cur_index - 1], up2[cur_index] 49 | cur_index -= 1 50 | for i in range(len(tp3)): 51 | cur_index = i 52 | while cur_index - 1 >= 0 and tp3[cur_index - 1] > tp3[cur_index]: 53 | tp3[cur_index], tp3[cur_index - 1] = tp3[cur_index 
- 1], tp3[cur_index] 54 | up3[cur_index], up3[cur_index - 1] = up3[cur_index - 1], up3[cur_index] 55 | cur_index -= 1 56 | #print(unit_price) 57 | #print(total_price) 58 | #print(house_type) 59 | 60 | color_list = ['#FF8C00', '#00FF00', '#0000FF'] # residence (住宅), villa (别墅), commercial (商业) 61 | types = ['residence', 'villa', 'commercial'] 62 | 63 | plt.figure(figsize=(30, 10), dpi=70) 64 | plt.title('total_price vs unit_price by house type') 65 | plt.scatter(tp1, up1, s=30, c=color_list[0]) 66 | plt.scatter(tp2, up2, s=30, c=color_list[1]) 67 | plt.scatter(tp3, up3, s=30, c=color_list[2]) 68 | plt.xlabel('total_price/10000 yuan') 69 | plt.ylabel('unit_price/yuan') 70 | plt.legend(loc='lower right',title='house_type',labels=types) 71 | plt.show() 72 | -------------------------------------------------------------------------------- /lianjia/house_show2.py: -------------------------------------------------------------------------------- 1 | # Draws the unit-price chart: average unit price per district, as a bar chart 2 | import matplotlib 3 | import matplotlib.pyplot as plt 4 | import csv 5 | 6 | filename = 'house_output.csv' 7 | with open(filename,"r",encoding='utf-8') as f: # the file must be opened as utf-8 8 | data = csv.reader(f) 9 | unit_price = [] 10 | pos = [] # district 11 | house_num = [] # number of developments per district 12 | price_sum = [] # sum of unit prices per district (divided into an average later) 13 | test_f = 1 14 | for i in data: 15 | if test_f == 1: 16 | test_f = 0 17 | else: 18 | unit_price.append(int(i[7])) 19 | pos.append(str(i[3])) 20 | for i in range(0,10): 21 | house_num.append(int(0)) 22 | price_sum.append(int(0)) 23 | 24 | for i in range(len(pos)): 25 | if pos[i] == '朝阳': 26 | house_num[0] = house_num[0] + 1 27 | price_sum[0] = price_sum[0] + unit_price[i] 28 | if pos[i] == '丰台': 29 | house_num[1] = house_num[1] + 1 30 | price_sum[1] = price_sum[1] + unit_price[i] 31 | if pos[i] == '顺义': 32 | house_num[2] = house_num[2] + 1 33 | price_sum[2] = price_sum[2] + unit_price[i] 34 | if pos[i] == '通州': 35 | house_num[3] = house_num[3] + 1 36 | price_sum[3] = price_sum[3] + unit_price[i] 37 | if pos[i] == '大兴': 38 | house_num[4] 
= house_num[4] + 1 39 | price_sum[4] = price_sum[4] + unit_price[i] 40 | if pos[i] == '昌平': 41 | house_num[5] = house_num[5] + 1 42 | price_sum[5] = price_sum[5] + unit_price[i] 43 | if pos[i] == '门头沟': 44 | house_num[6] = house_num[6] + 1 45 | price_sum[6] = price_sum[6] + unit_price[i] 46 | if pos[i] == '房山': 47 | house_num[7] = house_num[7] + 1 48 | price_sum[7] = price_sum[7] + unit_price[i] 49 | if pos[i] == '密云': 50 | house_num[8] = house_num[8] + 1 51 | price_sum[8] = price_sum[8] + unit_price[i] 52 | if pos[i] == '平谷': 53 | house_num[9] = house_num[9] + 1 54 | price_sum[9] = price_sum[9] + unit_price[i] 55 | print(house_num) 56 | bins_num = [] 57 | count = 3 58 | for i in range(0, 11): 59 | if i != 0: 60 | price_sum[i-1] = price_sum[i-1] / house_num[i-1] # compute the average unit price 61 | count = count + house_num[i-1] 62 | bins_num.append(count) 63 | 64 | 65 | position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu'] 66 | print(bins_num) 67 | print(price_sum) 68 | plt.figure(figsize=(30, 10), dpi=70) 69 | plt.title('unit_price_show', fontsize=30) 70 | plt.xlabel('position', fontsize=15) 71 | plt.ylabel('avg_unit_price/yuan', fontsize=15) 72 | for i in range(0,10): 73 | plt.bar(position_qu[i], price_sum[i], width=(house_num[i]/15)) 74 | for x,y in zip(position_qu,price_sum): 75 | plt.text(x,y,'%d' % y, ha='center', va='bottom', fontsize=15) 76 | plt.show() 77 | 78 | 79 | -------------------------------------------------------------------------------- /lianjia/house_show3.py: -------------------------------------------------------------------------------- 1 | # Draws the total-price chart: average total price per district, as a bar chart 2 | import matplotlib 3 | import matplotlib.pyplot as plt 4 | import csv 5 | 6 | filename = 'house_output.csv' 7 | with open(filename,"r",encoding='utf-8') as f: # the file must be opened as utf-8 8 | data = csv.reader(f) 9 | total_price = [] 10 | pos = [] # district 11 | house_num = [] # number of developments per district 12 | price_sum = [] # sum of total prices per district (divided into an average later) 13 | test_f = 1 14 | for i in data: 
15 | if test_f == 1: 16 | test_f = 0 17 | else: 18 | total_price.append(int(i[8])) 19 | pos.append(str(i[3])) 20 | for i in range(0,10): 21 | house_num.append(int(0)) 22 | price_sum.append(int(0)) 23 | 24 | for i in range(len(pos)): 25 | if pos[i] == '朝阳': 26 | house_num[0] = house_num[0] + 1 27 | price_sum[0] = price_sum[0] + total_price[i] 28 | if pos[i] == '丰台': 29 | house_num[1] = house_num[1] + 1 30 | price_sum[1] = price_sum[1] + total_price[i] 31 | if pos[i] == '顺义': 32 | house_num[2] = house_num[2] + 1 33 | price_sum[2] = price_sum[2] + total_price[i] 34 | if pos[i] == '通州': 35 | house_num[3] = house_num[3] + 1 36 | price_sum[3] = price_sum[3] + total_price[i] 37 | if pos[i] == '大兴': 38 | house_num[4] = house_num[4] + 1 39 | price_sum[4] = price_sum[4] + total_price[i] 40 | if pos[i] == '昌平': 41 | house_num[5] = house_num[5] + 1 42 | price_sum[5] = price_sum[5] + total_price[i] 43 | if pos[i] == '门头沟': 44 | house_num[6] = house_num[6] + 1 45 | price_sum[6] = price_sum[6] + total_price[i] 46 | if pos[i] == '房山': 47 | house_num[7] = house_num[7] + 1 48 | price_sum[7] = price_sum[7] + total_price[i] 49 | if pos[i] == '密云': 50 | house_num[8] = house_num[8] + 1 51 | price_sum[8] = price_sum[8] + total_price[i] 52 | if pos[i] == '平谷': 53 | house_num[9] = house_num[9] + 1 54 | price_sum[9] = price_sum[9] + total_price[i] 55 | print(house_num) 56 | bins_num = [] 57 | count = 3 58 | for i in range(0, 11): 59 | if i != 0: 60 | price_sum[i-1] = price_sum[i-1] / house_num[i-1] # compute the average total price 61 | count = count + house_num[i-1] 62 | bins_num.append(count) 63 | 64 | 65 | position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu'] 66 | print(bins_num) 67 | print(price_sum) 68 | plt.figure(figsize=(30, 10), dpi=70) 69 | plt.title('total_price_show', fontsize=30) 70 | plt.xlabel('position', fontsize=15) 71 | plt.ylabel('avg_total_price/10000 yuan', fontsize=15) 72 | for i in range(0,10): 73 | 
plt.bar(position_qu[i], price_sum[i], width=(house_num[i]/15)) 74 | for x,y in zip(position_qu,price_sum): 75 | plt.text(x,y,'%d' % y, ha='center', va='bottom', fontsize=15) 76 | plt.show() 77 | 78 | 79 | -------------------------------------------------------------------------------- /lianjia/lianjia/.idea/.gitignore: -------------------------------------------------------------------------------- 1 | # Files ignored by default 2 | /shelf/ 3 | /workspace.xml 4 | -------------------------------------------------------------------------------- /lianjia/lianjia/.idea/inspectionProfiles/profiles_settings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 6 | -------------------------------------------------------------------------------- /lianjia/lianjia/.idea/lianjia.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /lianjia/lianjia/.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | -------------------------------------------------------------------------------- /lianjia/lianjia/.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /lianjia/lianjia/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/lianjia/lianjia/__init__.py -------------------------------------------------------------------------------- /lianjia/lianjia/begin.py: -------------------------------------------------------------------------------- 1 | from scrapy.crawler import CrawlerRunner 2 | from scrapy.utils.log import configure_logging 3 | from 
twisted.internet import reactor 4 | 5 | from lianjia.spiders.spider1 import firsthandspider 6 | from lianjia.spiders.spider2 import secondhandspider 7 | 8 | configure_logging() 9 | runner = CrawlerRunner() 10 | runner.crawl(firsthandspider) 11 | runner.crawl(secondhandspider) 12 | d = runner.join() 13 | d.addBoth(lambda _: reactor.stop()) 14 | 15 | reactor.run() 16 | """ 17 | from scrapy import cmdline 18 | 19 | cmdline.execute("scrapy crawl spider1".split()) 20 | cmdline.execute("scrapy crawl spider2".split()) 21 | 22 | from scrapy.crawler import CrawlerProcess 23 | from scrapy.utils.project import get_project_settings 24 | 25 | settings = get_project_settings() 26 | 27 | crawler = CrawlerProcess(settings) 28 | 29 | crawler.crawl('spider1') 30 | crawler.crawl('spider2') 31 | 32 | crawler.start()""" -------------------------------------------------------------------------------- /lianjia/lianjia/items.py: -------------------------------------------------------------------------------- 1 | # Define here the models for your scraped items 2 | # 3 | # See documentation in: 4 | # https://docs.scrapy.org/en/latest/topics/items.html 5 | 6 | import scrapy 7 | 8 | 9 | class firsthanditem(scrapy.Item): 10 | name = scrapy.Field() 11 | position = scrapy.Field() 12 | position1 = scrapy.Field() 13 | position2 = scrapy.Field() 14 | types = scrapy.Field() 15 | houseType = scrapy.Field() 16 | space = scrapy.Field() 17 | unitPrice = scrapy.Field() 18 | totalPrice = scrapy.Field() 19 | 20 | class secondhanditem(scrapy.Item): 21 | name = scrapy.Field() 22 | position = scrapy.Field() 23 | types = scrapy.Field() 24 | unitPrice = scrapy.Field() 25 | totalPrice = scrapy.Field() -------------------------------------------------------------------------------- /lianjia/lianjia/middlewares.py: -------------------------------------------------------------------------------- 1 | # Define here the models for your spider middleware 2 | # 3 | # See documentation in: 4 | # 
https://docs.scrapy.org/en/latest/topics/spider-middleware.html 5 | 6 | from scrapy import signals 7 | 8 | # useful for handling different item types with a single interface 9 | from itemadapter import is_item, ItemAdapter 10 | 11 | 12 | import random 13 | class RandomUserAgentMiddleware(object): 14 | def __init__(self, user_agents): 15 | self.user_agents = user_agents 16 | 17 | @classmethod 18 | def from_crawler(cls, crawler): 19 | # read the MY_USER_AGENT list from settings.py 20 | s = cls(user_agents=crawler.settings.get('MY_USER_AGENT')) 21 | return s 22 | 23 | def process_request(self, request, spider): 24 | agent = random.choice(self.user_agents) 25 | request.headers['User-Agent'] = agent 26 | return None 27 | 28 | 29 | class LianjiaSpiderMiddleware: 30 | # Not all methods need to be defined. If a method is not defined, 31 | # scrapy acts as if the spider middleware does not modify the 32 | # passed objects. 33 | 34 | @classmethod 35 | def from_crawler(cls, crawler): 36 | # This method is used by Scrapy to create your spiders. 37 | s = cls() 38 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) 39 | return s 40 | 41 | def process_spider_input(self, response, spider): 42 | # Called for each response that goes through the spider 43 | # middleware and into the spider. 44 | 45 | # Should return None or raise an exception. 46 | return None 47 | 48 | def process_spider_output(self, response, result, spider): 49 | # Called with the results returned from the Spider, after 50 | # it has processed the response. 51 | 52 | # Must return an iterable of Request, or item objects. 53 | for i in result: 54 | yield i 55 | 56 | def process_spider_exception(self, response, exception, spider): 57 | # Called when a spider or process_spider_input() method 58 | # (from other spider middleware) raises an exception. 59 | 60 | # Should return either None or an iterable of Request or item objects. 
61 | pass 62 | 63 | def process_start_requests(self, start_requests, spider): 64 | # Called with the start requests of the spider, and works 65 | # similarly to the process_spider_output() method, except 66 | # that it doesn’t have a response associated. 67 | 68 | # Must return only requests (not items). 69 | for r in start_requests: 70 | yield r 71 | 72 | def spider_opened(self, spider): 73 | spider.logger.info("Spider opened: %s" % spider.name) 74 | 75 | 76 | class LianjiaDownloaderMiddleware: 77 | # Not all methods need to be defined. If a method is not defined, 78 | # scrapy acts as if the downloader middleware does not modify the 79 | # passed objects. 80 | 81 | @classmethod 82 | def from_crawler(cls, crawler): 83 | # This method is used by Scrapy to create your spiders. 84 | s = cls() 85 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) 86 | return s 87 | 88 | def process_request(self, request, spider): 89 | # Called for each request that goes through the downloader 90 | # middleware. 91 | 92 | # Must either: 93 | # - return None: continue processing this request 94 | # - or return a Response object 95 | # - or return a Request object 96 | # - or raise IgnoreRequest: process_exception() methods of 97 | # installed downloader middleware will be called 98 | return None 99 | 100 | def process_response(self, request, response, spider): 101 | # Called with the response returned from the downloader. 102 | 103 | # Must either; 104 | # - return a Response object 105 | # - return a Request object 106 | # - or raise IgnoreRequest 107 | return response 108 | 109 | def process_exception(self, request, exception, spider): 110 | # Called when a download handler or a process_request() 111 | # (from other downloader middleware) raises an exception. 
112 | 113 | # Must either: 114 | # - return None: continue processing this exception 115 | # - return a Response object: stops process_exception() chain 116 | # - return a Request object: stops process_exception() chain 117 | pass 118 | 119 | def spider_opened(self, spider): 120 | spider.logger.info("Spider opened: %s" % spider.name) 121 | -------------------------------------------------------------------------------- /lianjia/lianjia/pipelines.py: -------------------------------------------------------------------------------- 1 | # Define your item pipelines here 2 | # 3 | # Don't forget to add your pipeline to the ITEM_PIPELINES setting 4 | # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html 5 | 6 | 7 | # useful for handling different item types with a single interface 8 | from itemadapter import ItemAdapter 9 | import json 10 | 11 | class firsthandline(object): 12 | def open_spider(self, spider): 13 | try: 14 | self.file = open('scrapy-test-firsthand.json',"w",encoding="utf-8") 15 | except Exception as err: 16 | print(err) 17 | 18 | def process_item(self, item, spider): 19 | dict_item = dict(item) 20 | json_str = json.dumps(dict_item, ensure_ascii=False) + "\n" 21 | self.file.write(json_str) 22 | return item 23 | 24 | def close_spider(self, spider): 25 | self.file.close() 26 | 27 | 28 | class secondhandline(object): 29 | def open_spider(self, spider): 30 | try: 31 | self.file = open('scrapy-test-secondhand.json', "w", encoding="utf-8") 32 | except Exception as err: 33 | print(err) 34 | 35 | def process_item(self, item, spider): 36 | dict_item = dict(item) 37 | json_str = json.dumps(dict_item, ensure_ascii=False) + "\n" 38 | self.file.write(json_str) 39 | return item 40 | 41 | def close_spider(self, spider): 42 | self.file.close() -------------------------------------------------------------------------------- /lianjia/lianjia/settings.py: -------------------------------------------------------------------------------- 1 | # Scrapy settings for 
lianjia project 2 | # 3 | # For simplicity, this file contains only settings considered important or 4 | # commonly used. You can find more settings consulting the documentation: 5 | # 6 | # https://docs.scrapy.org/en/latest/topics/settings.html 7 | # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html 8 | # https://docs.scrapy.org/en/latest/topics/spider-middleware.html 9 | 10 | BOT_NAME = "lianjia" 11 | #2403:a200:a200:13f1:183:84:18:11 12 | 13 | SPIDER_MODULES = ["lianjia.spiders"] 14 | NEWSPIDER_MODULE = "lianjia.spiders" 15 | 16 | # Crawl responsibly by identifying yourself (and your website) on the user-agent 17 | #USER_AGENT = 'python_qm (+http://www.yourdomain.com)' 18 | #USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36" 19 | #USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0" 20 | DOWNLOADER_MIDDLEWARES = { 21 | # 'lianjia.middlewares.LianjiaDownloaderMiddleware': 543, 22 | 'lianjia.middlewares.RandomUserAgentMiddleware': 900, 23 | } 24 | 25 | MY_USER_AGENT = [ 26 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)", 27 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)", 28 | "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)", 29 | "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)", 30 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)", 31 | "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)", 32 | "Mozilla/4.0 
(compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
]

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

#LOG_LEVEL = 'WARNING'

#LOG_LEVEL = "WARNING"
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 8

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
RANDOMIZE_DOWNLOAD_DELAY = True
#CONCURRENT_REQUESTS_PER_DOMAIN = 2
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
#    "Accept-Language": "en",
#    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    "lianjia.middlewares.LianjiaDownloaderMiddleware": 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    "scrapy.extensions.telnet.TelnetConsole": None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300, 'lianjia.pipelines.secondhandline': 300,}
#ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = "httpcache"
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Set settings whose default value is deprecated to a future-proof value
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"

--------------------------------------------------------------------------------
/lianjia/lianjia/spiders/__init__.py:
--------------------------------------------------------------------------------
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
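
Note on the user-agent list in settings.py: `DOWNLOADER_MIDDLEWARES` is commented out in this excerpt, so a list of user-agent strings stored under a custom setting name would presumably not be applied to requests on its own. The following is a minimal sketch of how such a list could be used, not the project's actual middleware. The setting name `USER_AGENT_LIST` and the class name `RandomUserAgentMiddleware` are assumptions made for illustration; the real variable name is not visible here.

```python
import random

# Hypothetical setting name: the actual name of the list in settings.py
# is not shown in this excerpt.
USER_AGENT_LIST_SETTING = "USER_AGENT_LIST"


class RandomUserAgentMiddleware:
    """Downloader-middleware sketch: pick a random User-Agent per request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when it builds the middleware;
        # Settings.getlist returns the setting as a Python list.
        return cls(crawler.settings.getlist(USER_AGENT_LIST_SETTING))

    def process_request(self, request, spider):
        # Overwrite the header only when at least one string is configured.
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets Scrapy continue processing the request.
        return None
```

To try it, the class would need to be registered in `DOWNLOADER_MIDDLEWARES` under its import path (for example under `lianjia.middlewares`, if it were placed there) with a priority number, in the same form as the commented-out template entry.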

--------------------------------------------------------------------------------
/lianjia/lianjia/spiders/spider1.py:
--------------------------------------------------------------------------------
import scrapy

from lianjia.items import firsthanditem


class firsthandspider(scrapy.spiders.Spider):
    name = "lianjia1"
    allowed_domains = ["bj.fang.lianjia.com"]
    # Listing pages 3-7 of new-build (first-hand) homes in Beijing.
    start_urls = [
        'https://bj.fang.lianjia.com/loupan/pg{}/'.format(page)
        for page in range(3, 8)
    ]

    custom_settings = {
        'ITEM_PIPELINES': {'lianjia.pipelines.firsthandline': 300},
    }

    def parse(self, response):
        div_list = response.xpath("/html/body/div[3]/ul[2]/li")
        for each in div_list:
            # Create a fresh item per listing so yielded items do not share state.
            item = firsthanditem()
            desc = each.xpath('./div[@class="resblock-desc-wrapper"]')
            item['name'] = desc.xpath('./div[@class="resblock-name"]/a/text()').extract_first()
            item['types'] = desc.xpath('./div[@class="resblock-name"]/span[@class="resblock-type"]/text()').extract_first()
            item['position'] = desc.xpath('./div[@class="resblock-location"]/a/text()').extract_first()
            item['position1'] = desc.xpath('./div[@class="resblock-location"]/span[1]/text()').extract_first()
            item['position2'] = desc.xpath('./div[@class="resblock-location"]/span[2]/text()').extract_first()
            item['houseType'] = desc.xpath('./a[@class="resblock-room"]/span/text()').extract_first()
            item['space'] = desc.xpath('./div[@class="resblock-area"]/span/text()').extract_first()
            item['unitPrice'] = desc.xpath('./div[@class="resblock-price"]/div[@class="main-price"]/span[@class="number"]/text()').extract_first()
            item['totalPrice'] = desc.xpath('./div[@class="resblock-price"]/div[@class="second"]/text()').extract_first()
            yield item

--------------------------------------------------------------------------------
/lianjia/lianjia/spiders/spider2.py:
--------------------------------------------------------------------------------
import scrapy

from lianjia.items import secondhanditem


class secondhandspider(scrapy.spiders.Spider):
    name = "lianjia2"
    allowed_domains = ["bj.lianjia.com"]
    # Listing pages 3-7 of second-hand homes in Beijing.
    start_urls = [
        'https://bj.lianjia.com/ershoufang/pg{}/'.format(page)
        for page in range(3, 8)
    ]

    custom_settings = {
        'ITEM_PIPELINES': {'lianjia.pipelines.secondhandline': 300},
    }

    def parse(self, response):
        div_list = response.xpath('//*[@id="content"]/div[1]/ul/li')
        for each in div_list:
            # Create a fresh item per listing so yielded items do not share state.
            item = secondhanditem()
            info = each.xpath('./div[@class="info clear"]')
            item['name'] = info.xpath('./div[@class="flood"]/div/a[1]/text()').extract_first()
            item['position'] = info.xpath('./div[@class="flood"]/div/a[2]/text()').extract_first()
            item['types'] = info.xpath('./div[@class="address"]/div/text()').extract_first()
            item['unitPrice'] = info.xpath('./div[@class="priceInfo"]/div[2]/span/text()').extract_first()
            item['totalPrice'] = info.xpath('./div[@class="priceInfo"]/div[1]/span/text()').extract_first()
            yield
item 30 | -------------------------------------------------------------------------------- /lianjia/scrapy-test-firsthand.json: -------------------------------------------------------------------------------- 1 | {"name": "北辰墅院1900", "types": "住宅", "position": "顺兴街11号院望尊园", "position1": "顺义", "position2": "马坡", "houseType": "3室", "space": "建面 83-135㎡", "unitPrice": "36000", "totalPrice": "总价430(万/套)"} 2 | {"name": "燕西华府", "types": "别墅", "position": "王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)", "position1": "丰台", "position2": "丰台其它", "houseType": "3室", "space": "建面 350-851㎡", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"} 3 | {"name": "京西悦府", "types": "住宅", "position": "燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": null, "space": "建面 120-135㎡", "unitPrice": "33000", "totalPrice": "总价440(万/套)"} 4 | {"name": "福景苑", "types": "住宅", "position": "亮马桥路46号", "position1": "朝阳", "position2": "燕莎", "houseType": "1室", "space": "建面 145-268㎡", "unitPrice": "83000", "totalPrice": "总价1150-2400(万/套)"} 5 | {"name": "合景寰汇公馆", "types": "住宅", "position": "北京市通州区滨河中路西侧(合景寰汇公馆)", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 77-117㎡", "unitPrice": "35000", "totalPrice": "总价280-490(万/套)"} 6 | {"name": "K2十里春风", "types": "住宅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "2室", "space": "建面 74-90㎡", "unitPrice": "23500", "totalPrice": "总价188-212(万/套)"} 7 | {"name": "K2十里春风", "types": "别墅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 155-156㎡", "unitPrice": "28000", "totalPrice": "总价440-460(万/套)"} 8 | {"name": "玺萌壹號院", "types": "别墅", "position": "西南三环嘉园路与镇国寺北街交叉口", "position1": "丰台", "position2": "草桥", "houseType": "5室", "space": "建面 320-464㎡", "unitPrice": "90000", "totalPrice": "总价3650-3940(万/套)"} 9 | {"name": "北京书院", "types": "住宅", "position": "北京市朝阳区北土城东路辅路", "position1": "朝阳", "position2": "惠新西街", "houseType": "1室", "space": "建面 79-139㎡", "unitPrice": 
"155000", "totalPrice": "总价1066(万/套)"} 10 | {"name": "中铁华侨城和园", "types": "住宅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "3室", "space": "建面 154-184㎡", "unitPrice": "60000", "totalPrice": "总价930-980(万/套)"} 11 | {"name": "顺鑫颐和天璟", "types": "住宅", "position": "北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 110-220㎡", "unitPrice": "28000", "totalPrice": "总价400-420(万/套)"} 12 | {"name": "顺鑫颐和天璟", "types": "别墅", "position": "新城右堤路与昌金路交汇处向北200米", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 278-486㎡", "unitPrice": "28000", "totalPrice": "总价950-1200(万/套)"} 13 | {"name": "永旺19街", "types": "商业", "position": "地铁生物医药基地站向南200米", "position1": "大兴", "position2": "天宫院", "houseType": null, "space": null, "unitPrice": "24000", "totalPrice": "总价299(万/套)"} 14 | {"name": "北京城建北京合院", "types": "住宅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "3室", "space": "建面 95-130㎡", "unitPrice": "46000", "totalPrice": "总价556-566(万/套)"} 15 | {"name": "复地运河公馆", "types": "住宅", "position": "通州运河核心区临滨河西路", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 89-145㎡", "unitPrice": "43000", "totalPrice": "总价450-650(万/套)"} 16 | {"name": "北京城建北京合院", "types": "别墅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 210-330㎡", "unitPrice": "39000", "totalPrice": "总价1000-1300(万/套)"} 17 | {"name": "月亮河七星公馆", "types": "住宅", "position": "通燕高速耿庄桥出口南200米月亮河,河滨路1号", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 55-109㎡", "unitPrice": "68000", "totalPrice": "总价374-800(万/套)"} 18 | {"name": "天润福熙大道", "types": "住宅", "position": "清河营东路1号院, 清河营东路3号院", "position1": "朝阳", "position2": "北苑", "houseType": "1室", "space": "建面 65-374㎡", "unitPrice": "108000", "totalPrice": "总价750-3316(万/套)"} 19 | {"name": "京贸国际公馆", "types": "住宅", "position": 
"怡乐中路299号院(广渠快速路二期出口向南1000米)", "position1": "通州", "position2": "九棵树(家乐福)", "houseType": "1室", "space": "建面 72-147㎡", "unitPrice": "64000", "totalPrice": "总价495-950(万/套)"} 20 | {"name": "凯德麓语", "types": "别墅", "position": "兴寿镇京承高速G11出口向西怀昌路北侧", "position1": "昌平", "position2": "昌平其它", "houseType": "3室", "space": "建面 280-863㎡", "unitPrice": "35000", "totalPrice": "总价850-3450(万/套)"} 21 | {"name": "京贸国际城·峰景", "types": "住宅", "position": "芙蓉东路1号(通燕高速耿庄桥北出口向南300米)", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 69-140㎡", "unitPrice": "68000", "totalPrice": "总价460-980(万/套)"} 22 | {"name": "观唐云鼎", "types": "别墅", "position": "溪翁庄镇密溪路39号院(云佛山度假村对面)", "position1": "密云", "position2": "溪翁庄镇", "houseType": "3室", "space": "建面 346-613㎡", "unitPrice": "30000", "totalPrice": "总价1068-1850(万/套)"} 23 | {"name": "旭辉城", "types": "住宅", "position": "北京市房山区良锦街6号院旭辉城营销中心", "position1": "房山", "position2": "房山其它", "houseType": "2室", "space": "建面 75-116㎡", "unitPrice": "28500", "totalPrice": "总价219-330(万/套)"} 24 | {"name": "檀香府", "types": "住宅", "position": "京潭大街与潭柘十街交叉口", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 124-170㎡", "unitPrice": "42000", "totalPrice": "总价530-750(万/套)"} 25 | {"name": "泰禾金府大院", "types": "别墅", "position": "南四环地铁新宫站南800米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 362-504㎡", "unitPrice": "75000", "totalPrice": "总价2700-3700(万/套)"} 26 | {"name": "和棠瑞著", "types": "别墅", "position": "金海湖景区坝前广场西侧500米", "position1": "平谷", "position2": "平谷其它", "houseType": "3室", "space": "建面 305-360㎡", "unitPrice": "16000", "totalPrice": "总价530-560(万/套)"} 27 | {"name": "尊悦光华", "types": "住宅", "position": "北京市朝阳区光华东里甲1号院3号楼", "position1": "朝阳", "position2": "CBD", "houseType": "3室", "space": "建面 133-171㎡", "unitPrice": "150000", "totalPrice": "总价2500(万/套)"} 28 | {"name": "首创·河著", "types": "别墅", "position": "京承高速11出口(昌金路)向东900 米路北", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 248-310㎡", 
"unitPrice": "38000", "totalPrice": "总价1200-1900(万/套)"} 29 | {"name": "华萃西山", "types": "住宅", "position": "永定镇地铁S1号线石厂西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 115-122㎡", "unitPrice": "48000", "totalPrice": "总价560-600(万/套)"} 30 | {"name": "京西悦府", "types": "别墅", "position": "北京市房山区燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": "3室", "space": "建面 175-176㎡", "unitPrice": "40000", "totalPrice": "总价700-780(万/套)"} 31 | {"name": "中粮天恒天悦壹号", "types": "别墅", "position": "南四环地铁新宫站南500米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 220-340㎡", "unitPrice": "80000", "totalPrice": "总价2000-2360(万/套)"} 32 | {"name": "龙湾别墅", "types": "住宅", "position": "后沙峪镇龙湾别墅", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 218-317㎡", "unitPrice": "70000", "totalPrice": "总价2300(万/套)"} 33 | {"name": "京投发展·锦悦府", "types": "住宅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 90㎡", "unitPrice": "25607", "totalPrice": "总价220(万/套)"} 34 | {"name": "京投发展·锦悦府", "types": "别墅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 187-285㎡", "unitPrice": "25000", "totalPrice": "总价400-560(万/套)"} 35 | {"name": "金辰府", "types": "住宅", "position": "北京市昌平区北七家镇政府东南100米", "position1": "昌平", "position2": "北七家", "houseType": "3室", "space": "建面 89-143㎡", "unitPrice": "55000", "totalPrice": "总价490-790(万/套)"} 36 | {"name": "建邦·顺颐府", "types": "住宅", "position": "空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 89-147㎡", "unitPrice": "55583", "totalPrice": "总价480-845(万/套)"} 37 | {"name": "葛洲坝中国府", "types": "住宅", "position": "北京市丰台东路46号", "position1": "丰台", "position2": "玉泉营", "houseType": "3室", "space": "建面 168-240㎡", "unitPrice": "125000", "totalPrice": "总价2200-3000(万/套)"} 38 | {"name": "华萃西山", "types": "别墅", "position": "门头沟永定镇地铁S1号线石厂站西南700米", "position1": "门头沟", 
"position2": "门头沟其它", "houseType": "4室", "space": "建面 135-245㎡", "unitPrice": "48000", "totalPrice": "总价760-1060(万/套)"} 39 | {"name": "富兴首府", "types": "住宅", "position": "东坝路9号东北60米", "position1": "朝阳", "position2": "东坝", "houseType": "3室", "space": "建面 144-356㎡", "unitPrice": "85000", "totalPrice": "总价1706-2240(万/套)"} 40 | {"name": "中铁诺德阅墅", "types": "别墅", "position": "顺义区后沙峪镇裕园路762乡龙湖滟澜山对面", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 235-320㎡", "unitPrice": "50000", "totalPrice": "总价1150-1700(万/套)"} 41 | {"name": "中铁华侨城和园", "types": "别墅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "4室", "space": "建面 288-370㎡", "unitPrice": "50000", "totalPrice": "总价1870(万/套)"} 42 | {"name": "懋源·璟岳", "types": "别墅", "position": "南三环西路99号院", "position1": "丰台", "position2": "玉泉营", "houseType": "4室", "space": "建面 465-590㎡", "unitPrice": "140000", "totalPrice": "总价6500-9000(万/套)"} 43 | {"name": "合景泰富天汇", "types": "住宅", "position": "顺义区昌金路与通顺路交汇处", "position1": "顺义", "position2": "马坡", "houseType": "2室", "space": "建面 70-117㎡", "unitPrice": "33000", "totalPrice": "总价230-390(万/套)"} 44 | {"name": "懋源·璟玺", "types": "别墅", "position": "孙河京密路与京平辅路交叉口西行1000米", "position1": "朝阳", "position2": "中央别墅区", "houseType": "5室", "space": "建面 500-716㎡", "unitPrice": "100000", "totalPrice": "总价4380-6778(万/套)"} 45 | {"name": "万科雲庐", "types": "住宅", "position": "魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 104-300㎡", "unitPrice": "39000", "totalPrice": "总价656-820(万/套)"} 46 | {"name": "万科雲庐", "types": "别墅", "position": "魏各庄路万科雲庐", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 200-330㎡", "unitPrice": "30000", "totalPrice": "总价852-950(万/套)"} 47 | {"name": "金茂北京国际社区", "types": "住宅", "position": "顺义新城北小营昌金路水色时光路西", "position1": "顺义", "position2": "顺义其它", "houseType": "1室", "space": "建面 50-118㎡", "unitPrice": "30000", "totalPrice": "总价160-360(万/套)"} 48 | {"name": 
"住总如院", "types": "住宅", "position": "北京市大兴区采华路(波尔多小镇南区西南侧约250米)", "position1": "大兴", "position2": "大兴新机场洋房别墅区", "houseType": "2室", "space": "建面 98-233㎡", "unitPrice": "31136", "totalPrice": "总价280-475(万/套)"} 49 | {"name": "郎府书苑", "types": "住宅", "position": "西集镇京哈高速郎府出口南侧300米", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 89-116㎡", "unitPrice": "25800", "totalPrice": "总价273-300(万/套)"} 50 | {"name": "建邦·顺颐府", "types": "别墅", "position": "空港B区裕民大街30号", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 270㎡", "unitPrice": "55583", "totalPrice": "总价1300(万/套)"} 51 | -------------------------------------------------------------------------------- /lianjia/scrapy-test-secondhand.json: -------------------------------------------------------------------------------- 1 | {"name": "人民日报社家属区 ", "position": "红庙", "types": "2室1厅 | 65.64平米 | 东南 | 简装 | 低楼层(共16层) | 1991年 | 塔楼", "unitPrice": "83,791元/平", "totalPrice": "550"} 2 | {"name": "中建国际港 ", "position": "枣园", "types": "2室1厅 | 86.96平米 | 南 | 简装 | 低楼层(共33层) | 板楼", "unitPrice": "49,219元/平", "totalPrice": "428"} 3 | {"name": "延静里 ", "position": "甜水园", "types": "2室1厅 | 64平米 | 南 北 | 精装 | 低楼层(共6层) | 1979年 | 板楼", "unitPrice": "62,344元/平", "totalPrice": "399"} 4 | {"name": "铁东小区 ", "position": "军博", "types": "2室1厅 | 50.3平米 | 南 | 简装 | 中楼层(共6层) | 板楼", "unitPrice": "100,398元/平", "totalPrice": "505"} 5 | {"name": "红联北村 ", "position": "小西天", "types": "3室1厅 | 81平米 | 东 北 | 毛坯 | 中楼层(共16层) | 1993年 | 塔楼", "unitPrice": "76,544元/平", "totalPrice": "620"} 6 | {"name": "秀水园 ", "position": "甜水园", "types": "2室1厅 | 58.04平米 | 东南 | 精装 | 12层 | 1994年 | 板楼", "unitPrice": "56,169元/平", "totalPrice": "326"} 7 | {"name": "华纺易城 ", "position": "朝青", "types": "3室1厅 | 138.03平米 | 南 北 | 简装 | 13层 | 2006年 | 板楼", "unitPrice": "81,142元/平", "totalPrice": "1120"} 8 | {"name": "南庭新苑北区 ", "position": "新宫", "types": "2室1厅 | 58.7平米 | 东 | 简装 | 低楼层(共16层) | 2012年 | 板塔结合", "unitPrice": "54,174元/平", "totalPrice": "318"} 9 | {"name": 
"中建二局家属院 ", "position": "梨园", "types": "3室1厅 | 119.89平米 | 东南 | 精装 | 中楼层(共18层) | 塔楼", "unitPrice": "32,947元/平", "totalPrice": "395"} 10 | {"name": "门矿西山楼 ", "position": "门头沟其它", "types": "2室2厅 | 62.78平米 | 南 北 | 简装 | 低楼层(共6层) | 1992年 | 板楼", "unitPrice": "17,522元/平", "totalPrice": "110"} 11 | {"name": "建邦华庭东区 ", "position": "长阳", "types": "2室1厅 | 89.96平米 | 南 | 精装 | 中楼层(共16层) | 2013年 | 板塔结合", "unitPrice": "39,462元/平", "totalPrice": "355"} 12 | {"name": "龙博苑二区 ", "position": "回龙观", "types": "3室1厅 | 87.43平米 | 南 北 | 简装 | 中楼层(共7层) | 2004年 | 板楼", "unitPrice": "52,500元/平", "totalPrice": "459"} 13 | {"name": "丽景长安二期 ", "position": "冯村", "types": "2室1厅 | 87平米 | 南 | 精装 | 中楼层(共27层) | 板塔结合", "unitPrice": "37,357元/平", "totalPrice": "325"} 14 | {"name": "双榆树东里 ", "position": "双榆树", "types": "2室1厅 | 49.5平米 | 南 | 精装 | 底层(共6层) | 1981年 | 板楼", "unitPrice": "106,061元/平", "totalPrice": "525"} 15 | {"name": "永泰东里 ", "position": "清河", "types": "2室1厅 | 73.12平米 | 东 西 | 简装 | 顶层(共6层) | 板楼", "unitPrice": "59,902元/平", "totalPrice": "438"} 16 | {"name": "双桥六号井 ", "position": "双桥", "types": "2室1厅 | 58.69平米 | 南 北 | 简装 | 中楼层(共4层) | 1987年 | 板楼", "unitPrice": "35,100元/平", "totalPrice": "206"} 17 | {"name": "和平街十二区 ", "position": "和平里", "types": "2室1厅 | 59.13平米 | 南 北 | 简装 | 中楼层(共6层) | 1992年 | 板楼", "unitPrice": "77,457元/平", "totalPrice": "458"} 18 | {"name": "万科新里程57号院 ", "position": "长阳", "types": "2室2厅 | 90.98平米 | 南 北 | 精装 | 低楼层(共8层) | 2013年 | 板楼", "unitPrice": "38,361元/平", "totalPrice": "349"} 19 | {"name": "角门东里 ", "position": "角门", "types": "2室1厅 | 59.6平米 | 南 北 | 精装 | 中楼层(共6层) | 1993年 | 板楼", "unitPrice": "48,994元/平", "totalPrice": "292"} 20 | {"name": "百万庄午区 ", "position": "阜成门", "types": "2室0厅 | 55.3平米 | 东 南 北 | 简装 | 底层(共5层) | 板楼", "unitPrice": "124,774元/平", "totalPrice": "690"} 21 | {"name": "南礼士路甲62号院 ", "position": "月坛", "types": "2室1厅 | 61.5平米 | 南 北 | 简装 | 顶层(共6层) | 1982年 | 板楼", "unitPrice": "121,952元/平", "totalPrice": "750"} 22 | {"name": "保利嘉园三号院 ", "position": "常营", "types": "2室2厅 | 89平米 | 
西南 | 精装 | 中楼层(共26层) | 塔楼", "unitPrice": "49,326元/平", "totalPrice": "439"} 23 | {"name": "美景东方 ", "position": "华威桥", "types": "2室2厅 | 77.15平米 | 西南 | 精装 | 22层 | 塔楼", "unitPrice": "73,883元/平", "totalPrice": "570"} 24 | {"name": "月季园 ", "position": "武夷花园", "types": "2室1厅 | 99.57平米 | 南 | 精装 | 中楼层(共27层) | 板塔结合", "unitPrice": "41,680元/平", "totalPrice": "415"} 25 | {"name": "南三环东路 ", "position": "刘家窑", "types": "2室1厅 | 68.92平米 | 南 北 | 简装 | 12层 | 1999年 | 板塔结合", "unitPrice": "47,592元/平", "totalPrice": "328"} 26 | {"name": "富卓苑 ", "position": "马家堡", "types": "2室1厅 | 69.7平米 | 东 西 | 简装 | 6层 | 2001年 | 板楼", "unitPrice": "47,203元/平", "totalPrice": "329"} 27 | {"name": "伟业嘉园西里 ", "position": "良乡", "types": "2室1厅 | 97.34平米 | 南 北 | 简装 | 中楼层(共7层) | 2010年 | 板楼", "unitPrice": "25,786元/平", "totalPrice": "251"} 28 | {"name": "顺五条 ", "position": "刘家窑", "types": "2室1厅 | 62.67平米 | 东南 | 精装 | 中楼层(共6层) | 板楼", "unitPrice": "55,051元/平", "totalPrice": "345"} 29 | {"name": "西黄新村北里 ", "position": "苹果园", "types": "3室1厅 | 111.89平米 | 南 西 | 简装 | 23层 | 2003年 | 塔楼", "unitPrice": "42,542元/平", "totalPrice": "476"} 30 | {"name": "国际花都 ", "position": "密云其它", "types": "2室1厅 | 86.15平米 | 南 北 | 精装 | 底层(共20层) | 板楼", "unitPrice": "20,430元/平", "totalPrice": "176"} 31 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "2室1厅 | 76.91平米 | 南 北 | 精装 | 中楼层(共7层) | 2001年 | 板楼", "unitPrice": "43,688元/平", "totalPrice": "336"} 32 | {"name": "万年花城四期 ", "position": "玉泉营", "types": "2室2厅 | 86.95平米 | 东 西 | 精装 | 顶层(共6层) | 板楼", "unitPrice": "61,990元/平", "totalPrice": "539"} 33 | {"name": "万年花城四期 ", "position": "玉泉营", "types": "2室1厅 | 97.01平米 | 南 | 简装 | 中楼层(共27层) | 塔楼", "unitPrice": "70,096元/平", "totalPrice": "680"} 34 | {"name": "莲香园 ", "position": "六里桥", "types": "4室2厅 | 197.23平米 | 南 北 | 简装 | 顶层(共6层) | 板楼", "unitPrice": "43,097元/平", "totalPrice": "850"} 35 | {"name": "英特公寓 ", "position": "西坝河", "types": "3室1厅 | 227.33平米 | 南 北 | 精装 | 高楼层(共19层) | 板楼", "unitPrice": "57,186元/平", "totalPrice": "1300"} 36 | {"name": "观湖国际 ", "position": 
"朝阳公园", "types": "4室2厅 | 288.65平米 | 南 | 精装 | 中楼层(共27层) | 板楼", "unitPrice": "103,240元/平", "totalPrice": "2980"} 37 | {"name": "天伦锦城 ", "position": "花乡", "types": "3室2厅 | 110.55平米 | 东 西 | 精装 | 顶层(共13层) | 板楼", "unitPrice": "44,777元/平", "totalPrice": "495"} 38 | {"name": "怡锦园 ", "position": "科技园区", "types": "3室1厅 | 136.94平米 | 西南 | 精装 | 高楼层(共30层) | 2003年 | 塔楼", "unitPrice": "53,674元/平", "totalPrice": "735"} 39 | {"name": "曙光里 ", "position": "三元桥", "types": "3室1厅 | 77.4平米 | 南 北 | 精装 | 中楼层(共6层) | 板楼", "unitPrice": "65,892元/平", "totalPrice": "510"} 40 | {"name": "中国铁建花语金郡 ", "position": "瀛海", "types": "4室1厅 | 128.49平米 | 南 北 | 简装 | 中楼层(共18层) | 2018年 | 板楼", "unitPrice": "59,149元/平", "totalPrice": "760"} 41 | {"name": "左安漪园 ", "position": "左安门", "types": "3室1厅 | 133.04平米 | 南 北 | 简装 | 中楼层(共10层) | 板塔结合", "unitPrice": "105,232元/平", "totalPrice": "1400"} 42 | {"name": "银枫家园 ", "position": "大山子", "types": "4室2厅 | 215.29平米 | 东南 西北 | 简装 | 中楼层(共10层) | 板楼", "unitPrice": "41,758元/平", "totalPrice": "899"} 43 | {"name": "天通苑中苑 ", "position": "天通苑", "types": "3室1厅 | 158.07平米 | 南 北 | 简装 | 低楼层(共19层) | 2008年 | 板楼", "unitPrice": "38,464元/平", "totalPrice": "608"} 44 | {"name": "黄庄小区 ", "position": "中关村", "types": "3室2厅 | 127.1平米 | 东 南 北 | 精装 | 高楼层(共16层) | 1987年 | 塔楼", "unitPrice": "133,753元/平", "totalPrice": "1700"} 45 | {"name": "永金里小区 ", "position": "五棵松", "types": "3室1厅 | 120.99平米 | 东 西 | 精装 | 低楼层(共6层) | 板楼", "unitPrice": "73,478元/平", "totalPrice": "889"} 46 | {"name": "DBC加州小镇C区 ", "position": "临河里", "types": "3室2厅 | 133.83平米 | 南 北 | 精装 | 高楼层(共15层) | 2010年 | 板楼", "unitPrice": "43,563元/平", "totalPrice": "583"} 47 | {"name": "龙泽苑东区 ", "position": "回龙观", "types": "2室1厅 | 100.55平米 | 东南 | 简装 | 11层 | 塔楼", "unitPrice": "56,191元/平", "totalPrice": "565"} 48 | {"name": "尚家楼48号院 ", "position": "三元桥", "types": "3室2厅 | 88.04平米 | 南 西 北 | 精装 | 高楼层(共12层) | 板塔结合", "unitPrice": "83,712元/平", "totalPrice": "737"} 49 | {"name": "西山艺境3号院 ", "position": "大峪", "types": "3室2厅 | 140.62平米 | 南 北 | 精装 | 高楼层(共9层) | 
2015年 | 板楼", "unitPrice": "45,442元/平", "totalPrice": "639"} 50 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "3室1厅 | 130.14平米 | 南 北 | 简装 | 中楼层(共7层) | 板楼", "unitPrice": "35,962元/平", "totalPrice": "468"} 51 | {"name": "朝阳旺角 ", "position": "双桥", "types": "3室2厅 | 137.37平米 | 南 北 | 精装 | 中楼层(共16层) | 板楼", "unitPrice": "45,862元/平", "totalPrice": "630"} 52 | {"name": "蓝调沙龙西区 ", "position": "九棵树(家乐福)", "types": "2室1厅 | 98.46平米 | 南 北 | 简装 | 高楼层(共6层) | 2004年 | 板楼", "unitPrice": "37,376元/平", "totalPrice": "368"} 53 | {"name": "天通苑东二区 ", "position": "天通苑", "types": "3室1厅 | 141.41平米 | 南 北 | 简装 | 高楼层(共7层) | 2001年 | 板楼", "unitPrice": "26,873元/平", "totalPrice": "380"} 54 | {"name": "跃城 ", "position": "赵公口", "types": "3室2厅 | 162.29平米 | 南 北 | 精装 | 高楼层(共20层) | 板塔结合", "unitPrice": "46,830元/平", "totalPrice": "760"} 55 | {"name": "DBC加州小镇 ", "position": "临河里", "types": "3室1厅 | 124.74平米 | 南 北 | 精装 | 中楼层(共11层) | 板楼", "unitPrice": "39,924元/平", "totalPrice": "498"} 56 | {"name": "慧忠北里第三社区 ", "position": "亚运村", "types": "3室2厅 | 115.78平米 | 东 西北 | 简装 | 低楼层(共25层) | 塔楼", "unitPrice": "64,779元/平", "totalPrice": "750"} 57 | {"name": "泰中花园 ", "position": "高米店", "types": "5室2厅 | 204平米 | 南 北 | 简装 | 高楼层(共7层) | 板楼", "unitPrice": "22,010元/平", "totalPrice": "449"} 58 | {"name": "首城国际D区 ", "position": "双井", "types": "3室1厅 | 89.77平米 | 南 北 | 精装 | 中楼层(共28层) | 2010年 | 板楼", "unitPrice": "103,599元/平", "totalPrice": "930"} 59 | {"name": "弘善家园 ", "position": "潘家园", "types": "3室1厅 | 89.08平米 | 西北 | 简装 | 高楼层(共26层) | 板塔结合", "unitPrice": "48,833元/平", "totalPrice": "435"} 60 | {"name": "富力又一城A区 ", "position": "豆各庄", "types": "3室2厅 | 169.59平米 | 南 北 | 精装 | 高楼层(共22层) | 板楼", "unitPrice": "47,114元/平", "totalPrice": "799"} 61 | {"name": "田村山南路9号院 ", "position": "玉泉路", "types": "3室1厅 | 83.71平米 | 南 北 | 简装 | 顶层(共4层) | 板楼", "unitPrice": "73,827元/平", "totalPrice": "618"} 62 | {"name": "金色漫香郡北区 ", "position": "南中轴机场商务区", "types": "1室1厅 | 57.97平米 | 南 | 精装 | 中楼层(共9层) | 板楼", "unitPrice": "32,604元/平", "totalPrice": "189"} 63 | 
{"name": "富强东里 ", "position": "黄村中", "types": "2室1厅 | 75.2平米 | 南 北 | 简装 | 低楼层(共6层) | 1993年 | 板楼", "unitPrice": "28,591元/平", "totalPrice": "215"} 64 | {"name": "望泉家园 ", "position": "顺义城", "types": "2室1厅 | 80.88平米 | 南 北 | 精装 | 中楼层(共6层) | 板楼", "unitPrice": "29,550元/平", "totalPrice": "239"} 65 | {"name": "和谐家园二区 ", "position": "回龙观", "types": "3室1厅 | 124.47平米 | 南 北 | 精装 | 低楼层(共6层) | 2006年 | 板楼", "unitPrice": "42,501元/平", "totalPrice": "529"} 66 | {"name": "牛奶宿舍 ", "position": "牡丹园", "types": "2室1厅 | 62.6平米 | 南 北 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "95,847元/平", "totalPrice": "600"} 67 | {"name": "当代采育满庭春MOMA ", "position": "大兴新机场洋房别墅区", "types": "2室1厅 | 84.67平米 | 南 | 简装 | 中楼层(共18层) | 板楼", "unitPrice": "17,598元/平", "totalPrice": "149"} 68 | {"name": "金泰先锋北区 ", "position": "百子湾", "types": "3室2厅 | 129.49平米 | 南 北 | 精装 | 18层 | 板楼", "unitPrice": "84,949元/平", "totalPrice": "1100"} 69 | {"name": "龙泽苑东区 ", "position": "回龙观", "types": "5室1厅 | 158.4平米 | 南 北 | 毛坯 | 7层 | 2005年 | 板楼", "unitPrice": "34,723元/平", "totalPrice": "550"} 70 | {"name": "模式口东里 ", "position": "苹果园", "types": "2室2厅 | 101.54平米 | 南 北 | 精装 | 底层(共3层) | 1993年 | 板楼", "unitPrice": "43,235元/平", "totalPrice": "439"} 71 | {"name": "枣园小区 ", "position": "枣园", "types": "3室2厅 | 115.51平米 | 南 北 | 简装 | 低楼层(共6层) | 板楼", "unitPrice": "38,525元/平", "totalPrice": "445"} 72 | {"name": "天通西苑二区 ", "position": "天通苑", "types": "4室2厅 | 176.18平米 | 南 西 北 | 简装 | 高楼层(共32层) | 塔楼", "unitPrice": "26,451元/平", "totalPrice": "466"} 73 | {"name": "知春路82号院 ", "position": "双榆树", "types": "4室1厅 | 90.2平米 | 南 北 | 精装 | 中楼层(共5层) | 板楼", "unitPrice": "136,364元/平", "totalPrice": "1230"} 74 | {"name": "金汉绿港二区 ", "position": "顺义城", "types": "4室1厅 | 167.7平米 | 南 北 | 精装 | 中楼层(共17层) | 板楼", "unitPrice": "33,990元/平", "totalPrice": "570"} 75 | {"name": "领秀慧谷D区 ", "position": "回龙观", "types": "3室2厅 | 108.55平米 | 南 北 | 精装 | 中楼层(共11层) | 2016年 | 板楼", "unitPrice": "70,475元/平", "totalPrice": "765"} 76 | {"name": "汇园公寓 ", "position": "亚运村", "types": "3室2厅 | 134.54平米 | 南 北 | 精装 
| 顶层(共15层) | 1990年 | 板楼", "unitPrice": "73,585元/平", "totalPrice": "990"} 77 | {"name": "乐府江南 ", "position": "田村", "types": "3室2厅 | 137.23平米 | 南 北 | 精装 | 中楼层(共9层) | 2005年 | 板塔结合", "unitPrice": "107,849元/平", "totalPrice": "1480"} 78 | {"name": "名都园 ", "position": "中央别墅区", "types": "4室2厅 | 228.54平米 | 东 南 西 北 | 简装 | 3层 | 板楼 | 独栋别墅 ", "unitPrice": "74,386元/平", "totalPrice": "1700"} 79 | {"name": "名流花园 ", "position": "北七家", "types": "4室2厅 | 240.55平米 | 南 北 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "19,913元/平", "totalPrice": "479"} 80 | {"name": "维多莉亚花园公寓 ", "position": "农展馆", "types": "3室1厅 | 153.92平米 | 南 北 | 精装 | 中楼层(共11层) | 板楼", "unitPrice": "92,906元/平", "totalPrice": "1430"} 81 | {"name": "欧陆经典 ", "position": "亚运村小营", "types": "3室1厅 | 135.41平米 | 东南 | 简装 | 低楼层(共26层) | 塔楼", "unitPrice": "78,798元/平", "totalPrice": "1067"} 82 | {"name": "恩济庄46号院 ", "position": "定慧寺", "types": "3室2厅 | 117.1平米 | 南 西 北 | 精装 | 高楼层(共14层) | 1995年 | 塔楼", "unitPrice": "69,172元/平", "totalPrice": "810"} 83 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "3室1厅 | 108.31平米 | 南 西 北 | 简装 | 中楼层(共7层) | 板楼", "unitPrice": "36,008元/平", "totalPrice": "390"} 84 | {"name": "美丽园 ", "position": "四季青", "types": "3室2厅 | 144.1平米 | 南 北 | 精装 | 中楼层(共7层) | 板楼", "unitPrice": "114,504元/平", "totalPrice": "1650"} 85 | {"name": "林栖园 ", "position": "青塔", "types": "3室2厅 | 125.52平米 | 南 北 | 精装 | 顶层(共6层) | 2006年 | 板楼", "unitPrice": "51,785元/平", "totalPrice": "650"} 86 | {"name": "北蜂窝63号院 ", "position": "军博", "types": "3室1厅 | 81.13平米 | 南 北 | 简装 | 低楼层(共6层) | 1986年 | 板楼", "unitPrice": "101,073元/平", "totalPrice": "820"} 87 | {"name": "清枫华景园 ", "position": "学院路", "types": "3室2厅 | 129.66平米 | 南 北 | 简装 | 中楼层(共16层) | 2005年 | 板楼", "unitPrice": "101,420元/平", "totalPrice": "1315"} 88 | {"name": "西山枫林三期 ", "position": "苹果园", "types": "3室1厅 | 125.89平米 | 南 北 | 精装 | 高楼层(共10层) | 板楼", "unitPrice": "56,240元/平", "totalPrice": "708"} 89 | {"name": "爱民里小区 ", "position": "西四", "types": "3室1厅 | 96.8平米 | 南 北 | 简装 | 高楼层(共6层) | 1990年 | 板楼", "unitPrice": 
"133,265元/平", "totalPrice": "1290"} 90 | {"name": "阜成门外北四巷 ", "position": "阜成门", "types": "4室0厅 | 121.2平米 | 南 北 | 简装 | 中楼层(共4层) | 1950年 | 板楼", "unitPrice": "109,736元/平", "totalPrice": "1330"} 91 | {"name": "密西花园一期 ", "position": "果园街道", "types": "3室2厅 | 121.4平米 | 南 北 | 其他 | 6层 | 2003年 | 板楼", "unitPrice": "17,958元/平", "totalPrice": "218"} 92 | {"name": "东会新村 ", "position": "双桥", "types": "2室1厅 | 66.32平米 | 南 北 | 精装 | 低楼层(共6层) | 1997年 | 板楼", "unitPrice": "46,291元/平", "totalPrice": "307"} 93 | {"name": "石景嘉园 ", "position": "八角", "types": "2室1厅 | 66.53平米 | 西 北 | 简装 | 低楼层(共15层) | 塔楼", "unitPrice": "39,832元/平", "totalPrice": "265"} 94 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "2室1厅 | 76.3平米 | 南 北 | 简装 | 低楼层(共7层) | 板楼", "unitPrice": "44,430元/平", "totalPrice": "339"} 95 | {"name": "龙潭西里 ", "position": "左安门", "types": "1室1厅 | 46.98平米 | 东 西 | 精装 | 6层 | 2000年 | 板楼", "unitPrice": "105,364元/平", "totalPrice": "495"} 96 | {"name": "阜成路甲52号院 ", "position": "定慧寺", "types": "2室1厅 | 81.61平米 | 西南 | 简装 | 低楼层(共18层) | 1999年 | 板楼", "unitPrice": "81,486元/平", "totalPrice": "665"} 97 | {"name": "龙华园 ", "position": "回龙观", "types": "2室1厅 | 67平米 | 南 北 | 简装 | 6层 | 板楼", "unitPrice": "52,687元/平", "totalPrice": "353"} 98 | {"name": "保利首开熙悦春天 ", "position": "天宫院", "types": "2室1厅 | 83.82平米 | 南 | 其他 | 高楼层(共21层) | 板楼", "unitPrice": "36,746元/平", "totalPrice": "308"} 99 | {"name": "天通苑东二区 ", "position": "天通苑", "types": "2室1厅 | 106.67平米 | 东南 | 简装 | 中楼层(共17层) | 板楼", "unitPrice": "40,124元/平", "totalPrice": "428"} 100 | {"name": "农光南路 ", "position": "劲松", "types": "2室1厅 | 54.05平米 | 南 北 | 精装 | 顶层(共6层) | 板楼", "unitPrice": "52,729元/平", "totalPrice": "285"} 101 | {"name": "鑫兆雅园北区 ", "position": "刘家窑", "types": "2室1厅 | 93.81平米 | 南 北 | 简装 | 中楼层(共6层) | 2005年 | 板楼", "unitPrice": "74,086元/平", "totalPrice": "695"} 102 | {"name": "禧瑞都 ", "position": "红庙", "types": "1室1厅 | 108.88平米 | 西 | 精装 | 低楼层(共28层) | 2010年 | 塔楼", "unitPrice": "96,437元/平", "totalPrice": "1050"} 103 | {"name": "大方居 ", "position": "九棵树(家乐福)", 
"types": "2室1厅 | 88.44平米 | 西 | 精装 | 低楼层(共22层) | 板塔结合", "unitPrice": "29,173元/平", "totalPrice": "258"} 104 | {"name": "电建北院 ", "position": "定福庄", "types": "2室1厅 | 70.96平米 | 北 南 | 简装 | 底层(共7层) | 板楼", "unitPrice": "48,619元/平", "totalPrice": "345"} 105 | {"name": "芳星园三区 ", "position": "方庄", "types": "3室2厅 | 95.27平米 | 西南 | 简装 | 中楼层(共12层) | 2000年 | 板塔结合", "unitPrice": "63,084元/平", "totalPrice": "601"} 106 | {"name": "北京新天地二期 ", "position": "常营", "types": "2室2厅 | 101.64平米 | 南 | 精装 | 低楼层(共28层) | 2008年 | 板塔结合", "unitPrice": "53,031元/平", "totalPrice": "539"} 107 | {"name": "美然动力A2区 ", "position": "定福庄", "types": "1室0厅 | 41.44平米 | 南 | 简装 | 低楼层(共14层) | 2003年 | 板楼", "unitPrice": "53,089元/平", "totalPrice": "220"} 108 | {"name": "安贞西里 ", "position": "安贞", "types": "2室1厅 | 57.15平米 | 南 北 | 简装 | 中楼层(共6层) | 1984年 | 板楼", "unitPrice": "89,939元/平", "totalPrice": "514"} 109 | {"name": "西宏苑 ", "position": "西红门", "types": "2室1厅 | 59.92平米 | 南 北 | 简装 | 顶层(共6层) | 1995年 | 板塔结合", "unitPrice": "33,045元/平", "totalPrice": "198"} 110 | {"name": "安慧北里秀园 ", "position": "亚运村", "types": "2室1厅 | 63.03平米 | 西南 | 简装 | 低楼层(共20层) | 1994年 | 塔楼", "unitPrice": "74,568元/平", "totalPrice": "470"} 111 | {"name": "金隅康惠园1号院 ", "position": "双桥", "types": "2室2厅 | 88.13平米 | 南 北 | 简装 | 中楼层(共9层) | 2010年 | 板楼", "unitPrice": "44,821元/平", "totalPrice": "395"} 112 | {"name": "晨光家园A区 ", "position": "石佛营", "types": "2室1厅 | 82.58平米 | 西南 | 精装 | 低楼层(共30层) | 2001年 | 塔楼", "unitPrice": "59,942元/平", "totalPrice": "495"} 113 | {"name": "鼎顺嘉园东区 ", "position": "顺义其它", "types": "2室1厅 | 73.73平米 | 南 | 精装 | 低楼层(共13层) | 板塔结合", "unitPrice": "27,805元/平", "totalPrice": "205"} 114 | {"name": "模式口西里 ", "position": "苹果园", "types": "2室1厅 | 54.02平米 | 南 | 简装 | 高楼层(共6层) | 板楼", "unitPrice": "36,468元/平", "totalPrice": "197"} 115 | {"name": "丽水嘉园 ", "position": "朝阳公园", "types": "2室1厅 | 95.36平米 | 南 西南 | 精装 | 中楼层(共29层) | 2000年 | 塔楼", "unitPrice": "98,469元/平", "totalPrice": "939"} 116 | {"name": "西罗园四区 ", "position": "西罗园", "types": "2室1厅 | 61.82平米 | 南 北 | 
精装 | 高楼层(共6层) | 板楼", "unitPrice": "44,484元/平", "totalPrice": "275"} 117 | {"name": "北京金科天籁城 ", "position": "天宫院", "types": "3室1厅 | 89.79平米 | 南 | 简装 | 中楼层(共29层) | 板塔结合", "unitPrice": "41,096元/平", "totalPrice": "369"} 118 | {"name": "金谷园 ", "position": "知春路", "types": "2室1厅 | 95.04平米 | 南 北 | 精装 | 底层(共6层) | 2002年 | 板楼", "unitPrice": "101,958元/平", "totalPrice": "969"} 119 | {"name": "芳古园一区 ", "position": "方庄", "types": "2室1厅 | 56.3平米 | 南 | 简装 | 低楼层(共14层) | 1992年 | 塔楼", "unitPrice": "66,253元/平", "totalPrice": "373"} 120 | {"name": "雅丽世居 ", "position": "果园", "types": "2室2厅 | 99.14平米 | 南 北 | 精装 | 高楼层(共18层) | 2005年 | 板楼", "unitPrice": "34,295元/平", "totalPrice": "340"} 121 | {"name": "金隅康惠园1号院 ", "position": "双桥", "types": "2室2厅 | 88.13平米 | 南 北 | 简装 | 中楼层(共9层) | 2010年 | 板楼", "unitPrice": "44,821元/平", "totalPrice": "395"} 122 | {"name": "华远铭悦园 ", "position": "临河里", "types": "2室1厅 | 76.48平米 | 南 | 精装 | 28层 | 2014年 | 板塔结合", "unitPrice": "37,919元/平", "totalPrice": "290"} 123 | {"name": "鸭子桥南里 ", "position": "菜户营", "types": "3室1厅 | 60.39平米 | 南 北 | 简装 | 顶层(共6层) | 1982年 | 板楼", "unitPrice": "80,312元/平", "totalPrice": "485"} 124 | {"name": "新外大街31号院 ", "position": "小西天", "types": "3室1厅 | 66.3平米 | 东 南 北 | 简装 | 高楼层(共6层) | 1979年 | 板楼", "unitPrice": "77,678元/平", "totalPrice": "515"} 125 | {"name": "水仙园 ", "position": "武夷花园", "types": "2室1厅 | 93.75平米 | 南 北 | 简装 | 低楼层(共6层) | 2000年 | 板楼", "unitPrice": "46,720元/平", "totalPrice": "438"} 126 | {"name": "天通西苑二区 ", "position": "天通苑", "types": "2室1厅 | 95.62平米 | 东 南 | 简装 | 低楼层(共32层) | 塔楼", "unitPrice": "39,532元/平", "totalPrice": "378"} 127 | {"name": "永居东里 ", "position": "天宁寺", "types": "2室1厅 | 43.9平米 | 东 | 简装 | 中楼层(共5层) | 板楼", "unitPrice": "97,723元/平", "totalPrice": "429"} 128 | {"name": "澜西园三区 ", "position": "顺义城", "types": "2室1厅 | 74.57平米 | 南 北 | 简装 | 底层(共9层) | 2009年 | 板楼", "unitPrice": "30,844元/平", "totalPrice": "230"} 129 | {"name": "富锦嘉园五区 ", "position": "科技园区", "types": "2室1厅 | 91.9平米 | 南 北 | 精装 | 高楼层(共6层) | 2008年 | 板楼", "unitPrice": 
"54,843元/平", "totalPrice": "504"} 130 | {"name": "鹏润家园 ", "position": "菜户营", "types": "2室1厅 | 79.98平米 | 东 西 | 精装 | 高楼层(共16层) | 板楼", "unitPrice": "61,266元/平", "totalPrice": "490"} 131 | {"name": "海特花园东区 ", "position": "苹果园", "types": "2室1厅 | 98.89平米 | 西 | 精装 | 18层 | 2004年 | 板塔结合", "unitPrice": "46,011元/平", "totalPrice": "455"} 132 | {"name": "金顶阳光 ", "position": "苹果园", "types": "2室1厅 | 89.49平米 | 南 北 | 精装 | 中楼层(共21层) | 2009年 | 板楼", "unitPrice": "55,873元/平", "totalPrice": "500"} 133 | {"name": "黄金苑 ", "position": "奥林匹克公园", "types": "3室2厅 | 122.69平米 | 西南 | 精装 | 低楼层(共18层) | 塔楼", "unitPrice": "45,644元/平", "totalPrice": "560"} 134 | {"name": "福润四季A区 ", "position": "东坝", "types": "2室1厅 | 75.3平米 | 南 | 精装 | 16层 | 板塔结合", "unitPrice": "44,489元/平", "totalPrice": "335"} 135 | {"name": "凯景铭座 ", "position": "安定门", "types": "2室1厅 | 137.93平米 | 东南 | 精装 | 中楼层(共19层) | 2001年 | 塔楼", "unitPrice": "76,706元/平", "totalPrice": "1058"} 136 | {"name": "北京经开汀塘 ", "position": "通州其它", "types": "2室1厅 | 83.76平米 | 南 北 | 精装 | 高楼层(共15层) | 2019年 | 板楼", "unitPrice": "61,963元/平", "totalPrice": "519"} 137 | {"name": "玉带河西街 ", "position": "万达", "types": "2室1厅 | 56平米 | 南 北 | 简装 | 顶层(共5层) | 板楼", "unitPrice": "33,215元/平", "totalPrice": "186"} 138 | {"name": "芳星园三区 ", "position": "方庄", "types": "1室1厅 | 45.05平米 | 南 北 | 精装 | 低楼层(共6层) | 1987年 | 板楼", "unitPrice": "61,932元/平", "totalPrice": "279"} 139 | {"name": "南顶小区 ", "position": "赵公口", "types": "1室1厅 | 45.6平米 | 南 | 简装 | 高楼层(共6层) | 1992年 | 板楼", "unitPrice": "45,615元/平", "totalPrice": "208"} 140 | {"name": "花家地北里 ", "position": "望京", "types": "1室1厅 | 44.38平米 | 南 | 精装 | 高楼层(共18层) | 1994年 | 塔楼", "unitPrice": "68,725元/平", "totalPrice": "305"} 141 | {"name": "惠泽家园 ", "position": "门头沟其它", "types": "2室1厅 | 67.95平米 | 南 北 | 精装 | 顶层(共6层) | 板楼", "unitPrice": "26,785元/平", "totalPrice": "182"} 142 | {"name": "新外大街3号院 ", "position": "小西天", "types": "3室1厅 | 76.6平米 | 东 | 简装 | 顶层(共3层) | 板塔结合", "unitPrice": "86,162元/平", "totalPrice": "660"} 143 | {"name": "辛勤胡同 ", "position": 
"德胜门", "types": "3室1厅 | 64.3平米 | 东 南 西 | 简装 | 中楼层(共6层) | 板楼", "unitPrice": "127,528元/平", "totalPrice": "820"} 144 | {"name": "尚家楼48号院 ", "position": "三元桥", "types": "2室1厅 | 59.04平米 | 南 北 | 精装 | 顶层(共12层) | 1997年 | 板塔结合", "unitPrice": "73,679元/平", "totalPrice": "435"} 145 | {"name": "京投发展公园悦府一区 ", "position": "回龙观", "types": "2室2厅 | 81.68平米 | 南 | 精装 | 中楼层(共22层) | 板楼", "unitPrice": "64,888元/平", "totalPrice": "530"} 146 | {"name": "华威西里 ", "position": "潘家园", "types": "1室1厅 | 44.3平米 | 南 | 简装 | 高楼层(共18层) | 1993年 | 塔楼", "unitPrice": "55,531元/平", "totalPrice": "246"} 147 | {"name": "南平里 ", "position": "首都机场", "types": "1室1厅 | 41.47平米 | 南 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "33,760元/平", "totalPrice": "140"} 148 | {"name": "鸿坤理想城五期 ", "position": "西红门", "types": "2室1厅 | 70.31平米 | 南 | 简装 | 中楼层(共18层) | 板塔结合", "unitPrice": "39,682元/平", "totalPrice": "279"} 149 | {"name": "洋桥西里 ", "position": "洋桥", "types": "2室1厅 | 53.2平米 | 南 北 | 简装 | 6层 | 1990年 | 板楼", "unitPrice": "53,760元/平", "totalPrice": "286"} 150 | {"name": "次渠南里十一区 ", "position": "通州其它", "types": "1室1厅 | 57.95平米 | 南 | 简装 | 低楼层(共18层) | 暂无数据", "unitPrice": "34,858元/平", "totalPrice": "202"} 151 | -------------------------------------------------------------------------------- /lianjia/scrapy.cfg: -------------------------------------------------------------------------------- 1 | # Automatically created by: scrapy startproject 2 | # 3 | # For more information about the [deploy] section see: 4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html 5 | 6 | [settings] 7 | default = lianjia.settings 8 | 9 | [deploy] 10 | #url = http://localhost:6800/ 11 | project = lianjia 12 | -------------------------------------------------------------------------------- /test1_house/datachange.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import json 3 | import codecs 4 | 5 | ''' 6 | 将json文件格式转为csv文件格式并保存。 7 | ''' 8 | 9 | class Json_Csv(): 10 | 11 | # 初始化方法,创建csv文件。 12 | def 
__init__(self): 13 | self.save_csv = open('house_output.csv', 'w', encoding='utf-8', newline='') 14 | 15 | self.write_csv = csv.writer(self.save_csv, delimiter=',') # use ',' as the delimiter 16 | def trans(self, filename): 17 | with codecs.open(filename, 'r', encoding='utf-8') as f: # read the JSON file, one record per line 18 | read = f.readlines() 19 | flag = True 20 | for index, info in enumerate(read): 21 | data = json.loads(info) 22 | if flag: # use the first record's keys as the header 23 | keys = list(data.keys()) # writerow needs a list, so wrap the keys in one 24 | self.write_csv.writerow(keys) # write the header row to the CSV 25 | flag = False # header written; do not write it again 26 | value = list(data.values()) # the values must also be a list 27 | 28 | temp = value[6] # keep only the smallest floor area and convert it to int 29 | if isinstance(temp, str): 30 | list_temp = temp.split(' ') 31 | list_temp = list_temp[1].split('-') 32 | list_temp = list_temp[0].split('㎡') 33 | value[6] = int(list_temp[0]) 34 | 35 | value[7] = int(value[7]) # convert the unit price to int (yuan) 36 | 37 | temp = value[8] # keep only the lowest total price and convert it to int (10,000 yuan) 38 | if isinstance(temp, str): 39 | list_temp = temp.split('价') 40 | list_temp = list_temp[1].split('-') 41 | list_temp = list_temp[0].split('(') 42 | value[8] = int(list_temp[0]) 43 | 44 | self.write_csv.writerow(value) # write the data row to the CSV 45 | self.save_csv.close() # close the file once everything is written 46 | 47 | 48 | if __name__ == '__main__': 49 | json_csv = Json_Csv() 50 | path = 'scrapy-test-firsthand.json' 51 | json_csv.trans(path)
--------------------------------------------------------------------------------
/test1_house/house_output.csv:
--------------------------------------------------------------------------------
1 | name,types,position,position1,position2,houseType,space,unitPrice,totalPrice 2 | 北辰墅院1900,住宅,顺兴街11号院望尊园,顺义,马坡,3室,83,36000,430 3 | 燕西华府,别墅,"王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)",丰台,丰台其它,3室,350,47000,1400 4 | 京西悦府,住宅,燕房线阎村地铁站东南角约189米,房山,阎村,,120,33000,440 5 | 福景苑,住宅,亮马桥路46号,朝阳,燕莎,1室,145,83000,1150 6 | 合景寰汇公馆,住宅,北京市通州区滨河中路西侧(合景寰汇公馆),通州,武夷花园,2室,77,35000,280 7 | K2十里春风,住宅,北京市通州区,通州,通州其它,2室,74,23500,188 8 | K2十里春风,别墅,北京市通州区,通州,通州其它,3室,155,28000,440 9 |
玺萌壹號院,别墅,西南三环嘉园路与镇国寺北街交叉口,丰台,草桥,5室,320,90000,3650 10 | 北京书院,住宅,北京市朝阳区北土城东路辅路,朝阳,惠新西街,1室,79,155000,1066 11 | 中铁华侨城和园,住宅,南五环南海子公园西侧约500米,大兴,瀛海,3室,154,60000,930 12 | 顺鑫颐和天璟,住宅,北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心,顺义,顺义其它,4室,110,28000,400 13 | 顺鑫颐和天璟,别墅,新城右堤路与昌金路交汇处向北200米,顺义,顺义其它,4室,278,28000,950 14 | 永旺19街,商业,地铁生物医药基地站向南200米,大兴,天宫院,,,24000,299 15 | 北京城建北京合院,住宅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,3室,95,46000,556 16 | 复地运河公馆,住宅,通州运河核心区临滨河西路,通州,武夷花园,2室,89,43000,450 17 | 北京城建北京合院,别墅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,4室,210,39000,1000 18 | 月亮河七星公馆,住宅,通燕高速耿庄桥出口南200米月亮河,河滨路1号,通州,武夷花园,1室,55,68000,374 19 | 天润福熙大道,住宅,"清河营东路1号院, 清河营东路3号院",朝阳,北苑,1室,65,108000,750 20 | 京贸国际公馆,住宅,怡乐中路299号院(广渠快速路二期出口向南1000米),通州,九棵树(家乐福),1室,72,64000,495 21 | 凯德麓语,别墅,兴寿镇京承高速G11出口向西怀昌路北侧,昌平,昌平其它,3室,280,35000,850 22 | 京贸国际城·峰景,住宅,芙蓉东路1号(通燕高速耿庄桥北出口向南300米),通州,武夷花园,1室,69,68000,460 23 | 观唐云鼎,别墅,溪翁庄镇密溪路39号院(云佛山度假村对面),密云,溪翁庄镇,3室,346,30000,1068 24 | 旭辉城,住宅,北京市房山区良锦街6号院旭辉城营销中心,房山,房山其它,2室,75,28500,219 25 | 檀香府,住宅,京潭大街与潭柘十街交叉口,门头沟,门头沟其它,3室,124,42000,530 26 | 泰禾金府大院,别墅,南四环地铁新宫站南800米,丰台,新宫,4室,362,75000,2700 27 | 和棠瑞著,别墅,金海湖景区坝前广场西侧500米,平谷,平谷其它,3室,305,16000,530 28 | 尊悦光华,住宅,北京市朝阳区光华东里甲1号院3号楼,朝阳,CBD,3室,133,150000,2500 29 | 首创·河著,别墅,京承高速11出口(昌金路)向东900 米路北,顺义,顺义其它,4室,248,38000,1200 30 | 华萃西山,住宅,永定镇地铁S1号线石厂西南700米,门头沟,门头沟其它,3室,115,48000,560 31 | 京西悦府,别墅,北京市房山区燕房线阎村地铁站东南角约189米,房山,阎村,3室,175,40000,700 32 | 中粮天恒天悦壹号,别墅,南四环地铁新宫站南500米,丰台,新宫,4室,220,80000,2000 33 | 龙湾别墅,住宅,后沙峪镇龙湾别墅,顺义,中央别墅区,4室,218,70000,2300 34 | 京投发展·锦悦府,住宅,檀营乡檀东路西侧,密云,鼓楼街道,3室,90,25607,220 35 | 京投发展·锦悦府,别墅,檀营乡檀东路西侧,密云,鼓楼街道,3室,187,25000,400 36 | 金辰府,住宅,北京市昌平区北七家镇政府东南100米,昌平,北七家,3室,89,55000,490 37 | 建邦·顺颐府,住宅,空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心,顺义,后沙峪,3室,89,55583,480 38 | 葛洲坝中国府,住宅,北京市丰台东路46号,丰台,玉泉营,3室,168,125000,2200 39 | 华萃西山,别墅,门头沟永定镇地铁S1号线石厂站西南700米,门头沟,门头沟其它,4室,135,48000,760 40 | 富兴首府,住宅,东坝路9号东北60米,朝阳,东坝,3室,144,85000,1706 41 | 中铁诺德阅墅,别墅,顺义区后沙峪镇裕园路762乡龙湖滟澜山对面,顺义,中央别墅区,4室,235,50000,1150 42 | 中铁华侨城和园,别墅,南五环南海子公园西侧约500米,大兴,瀛海,4室,288,50000,1870 43 | 
懋源·璟岳,别墅,南三环西路99号院,丰台,玉泉营,4室,465,140000,6500 44 | 合景泰富天汇,住宅,顺义区昌金路与通顺路交汇处,顺义,马坡,2室,70,33000,230 45 | 懋源·璟玺,别墅,孙河京密路与京平辅路交叉口西行1000米,朝阳,中央别墅区,5室,500,100000,4380 46 | 万科雲庐,住宅,魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置,丰台,丰台其它,4室,104,39000,656 47 | 万科雲庐,别墅,魏各庄路万科雲庐,丰台,丰台其它,4室,200,30000,852 48 | 金茂北京国际社区,住宅,顺义新城北小营昌金路水色时光路西,顺义,顺义其它,1室,50,30000,160 49 | 住总如院,住宅,北京市大兴区采华路(波尔多小镇南区西南侧约250米),大兴,大兴新机场洋房别墅区,2室,98,31136,280 50 | 郎府书苑,住宅,西集镇京哈高速郎府出口南侧300米,通州,通州其它,3室,89,25800,273 51 | 建邦·顺颐府,别墅,空港B区裕民大街30号,顺义,后沙峪,3室,270,55583,1300 52 |
--------------------------------------------------------------------------------
/test1_house/house_outputGBK编码,可用excle打开,.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/house_outputGBK编码,可用excle打开,.csv
--------------------------------------------------------------------------------
/test1_house/house_show.py:
--------------------------------------------------------------------------------
# Visualises the new-home data in house_output.csv as a scatter plot of
# unit price against total price, one colour per property type.

import csv

import matplotlib.pyplot as plt

filename = 'house_output.csv'

# (total_price, unit_price) points per property type:
# 住宅 = residence, 别墅 = villa, 商业 = commercial.
points = {'住宅': [], '别墅': [], '商业': []}

with open(filename, 'r', encoding='utf-8') as f:  # the CSV must be read as UTF-8
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        if row[1] in points:
            points[row[1]].append((int(row[8]), int(row[7])))

# Order each group by total price.
for group in points.values():
    group.sort(key=lambda point: point[0])

color_list = ['#FF8C00', '#00FF00', '#0000FF']  # residence, villa, commercial
types = ['residence', 'villa', 'commercial']

plt.figure(figsize=(30, 10), dpi=70)
plt.title('total_price and unit_price for different type house')
for group, color, label in zip(points.values(), color_list, types):
    total_price = [point[0] for point in group]
    unit_price = [point[1] for point in group]
    plt.scatter(total_price, unit_price, s=30, c=color, label=label)
plt.xlabel('total_price/10000 yuan')
plt.ylabel('unit_price/yuan')
plt.legend(loc='lower right', title='house_type')
plt.show()
--------------------------------------------------------------------------------
/test1_house/house_show2.py:
--------------------------------------------------------------------------------
# Draws a bar chart of the average unit price per district.

import csv

import matplotlib.pyplot as plt

filename = 'house_output.csv'

# Districts in plotting order, with the pinyin labels used on the x axis.
districts = ['朝阳', '丰台', '顺义', '通州', '大兴', '昌平', '门头沟', '房山', '密云', '平谷']
position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu']

house_num = [0] * len(districts)  # number of developments per district
price_sum = [0] * len(districts)  # sum of unit prices, turned into averages below

with open(filename, 'r', encoding='utf-8') as f:  # the CSV must be read as UTF-8
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        if row[3] in districts:
            idx = districts.index(row[3])
            house_num[idx] += 1
            price_sum[idx] += int(row[7])
print(house_num)

# Cumulative development counts, starting from 3; printed for inspection only.
bins_num = [3]
for num in house_num:
    bins_num.append(bins_num[-1] + num)

# Average unit price per district (0 if a district has no developments).
price_sum = [total / num if num else 0 for total, num in zip(price_sum, house_num)]

print(bins_num)
print(price_sum)
plt.figure(figsize=(30, 10), dpi=70)
plt.title('unit_price_show', fontsize=30)
plt.xlabel('position', fontsize=15)
plt.ylabel('avg_unit_price/yuan', fontsize=15)
# Bar width is proportional to the number of developments in the district.
for i in range(len(districts)):
    plt.bar(position_qu[i], price_sum[i], width=(house_num[i] / 15))
for x, y in zip(position_qu, price_sum):
    plt.text(x, y, '%d' % y, ha='center', va='bottom', fontsize=15)
plt.show()
--------------------------------------------------------------------------------
/test1_house/house_show3.py:
--------------------------------------------------------------------------------
# Draws a bar chart of the average total price per district.

import csv

import matplotlib.pyplot as plt

filename = 'house_output.csv'

# Districts in plotting order, with the pinyin labels used on the x axis.
districts = ['朝阳', '丰台', '顺义', '通州', '大兴', '昌平', '门头沟', '房山', '密云', '平谷']
position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu']

house_num = [0] * len(districts)  # number of developments per district
price_sum = [0] * len(districts)  # sum of total prices, turned into averages below

with open(filename, 'r', encoding='utf-8') as f:  # the CSV must be read as UTF-8
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        if row[3] in districts:
            idx = districts.index(row[3])
            house_num[idx] += 1
            price_sum[idx] += int(row[8])
print(house_num)

# Cumulative development counts, starting from 3; printed for inspection only.
bins_num = [3]
for num in house_num:
    bins_num.append(bins_num[-1] + num)

# Average total price per district (0 if a district has no developments).
price_sum = [total / num if num else 0 for total, num in zip(price_sum, house_num)]

print(bins_num)
print(price_sum)
plt.figure(figsize=(30, 10), dpi=70)
plt.title('total_price_show', fontsize=30)
plt.xlabel('position', fontsize=15)
plt.ylabel('avg_total_price/10000 yuan', fontsize=15)
# Bar width is proportional to the number of developments in the district.
for i in range(len(districts)):
    plt.bar(position_qu[i], price_sum[i], width=(house_num[i] / 15))
for x, y in zip(position_qu, price_sum):
    plt.text(x, y, '%d' % y, ha='center', va='bottom', fontsize=15)
plt.show()
--------------------------------------------------------------------------------
/test1_house/scrapy-test-firsthand.json:
--------------------------------------------------------------------------------
1 | {"name": "北辰墅院1900", "types": "住宅", "position": "顺兴街11号院望尊园", "position1": "顺义", "position2": "马坡", "houseType": "3室", "space": "建面 83-135㎡", "unitPrice": "36000", "totalPrice": "总价430(万/套)"} 2 | {"name": "燕西华府", "types": "别墅", "position": "王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)", "position1": "丰台", "position2": "丰台其它", "houseType": "3室", "space": "建面 350-851㎡", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"} 3 | {"name": "京西悦府", "types": "住宅", "position": "燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": null, "space": "建面 120-135㎡", "unitPrice": "33000", "totalPrice": "总价440(万/套)"} 4 | {"name": "福景苑", "types": "住宅", "position": "亮马桥路46号", "position1": "朝阳", "position2": "燕莎", "houseType": "1室", "space": "建面 145-268㎡", "unitPrice": "83000", "totalPrice": "总价1150-2400(万/套)"} 5 |
{"name": "合景寰汇公馆", "types": "住宅", "position": "北京市通州区滨河中路西侧(合景寰汇公馆)", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 77-117㎡", "unitPrice": "35000", "totalPrice": "总价280-490(万/套)"} 6 | {"name": "K2十里春风", "types": "住宅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "2室", "space": "建面 74-90㎡", "unitPrice": "23500", "totalPrice": "总价188-212(万/套)"} 7 | {"name": "K2十里春风", "types": "别墅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 155-156㎡", "unitPrice": "28000", "totalPrice": "总价440-460(万/套)"} 8 | {"name": "玺萌壹號院", "types": "别墅", "position": "西南三环嘉园路与镇国寺北街交叉口", "position1": "丰台", "position2": "草桥", "houseType": "5室", "space": "建面 320-464㎡", "unitPrice": "90000", "totalPrice": "总价3650-3940(万/套)"} 9 | {"name": "北京书院", "types": "住宅", "position": "北京市朝阳区北土城东路辅路", "position1": "朝阳", "position2": "惠新西街", "houseType": "1室", "space": "建面 79-139㎡", "unitPrice": "155000", "totalPrice": "总价1066(万/套)"} 10 | {"name": "中铁华侨城和园", "types": "住宅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "3室", "space": "建面 154-184㎡", "unitPrice": "60000", "totalPrice": "总价930-980(万/套)"} 11 | {"name": "顺鑫颐和天璟", "types": "住宅", "position": "北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 110-220㎡", "unitPrice": "28000", "totalPrice": "总价400-420(万/套)"} 12 | {"name": "顺鑫颐和天璟", "types": "别墅", "position": "新城右堤路与昌金路交汇处向北200米", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 278-486㎡", "unitPrice": "28000", "totalPrice": "总价950-1200(万/套)"} 13 | {"name": "永旺19街", "types": "商业", "position": "地铁生物医药基地站向南200米", "position1": "大兴", "position2": "天宫院", "houseType": null, "space": null, "unitPrice": "24000", "totalPrice": "总价299(万/套)"} 14 | {"name": "北京城建北京合院", "types": "住宅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "3室", "space": "建面 95-130㎡", 
"unitPrice": "46000", "totalPrice": "总价556-566(万/套)"} 15 | {"name": "复地运河公馆", "types": "住宅", "position": "通州运河核心区临滨河西路", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 89-145㎡", "unitPrice": "43000", "totalPrice": "总价450-650(万/套)"} 16 | {"name": "北京城建北京合院", "types": "别墅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 210-330㎡", "unitPrice": "39000", "totalPrice": "总价1000-1300(万/套)"} 17 | {"name": "月亮河七星公馆", "types": "住宅", "position": "通燕高速耿庄桥出口南200米月亮河,河滨路1号", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 55-109㎡", "unitPrice": "68000", "totalPrice": "总价374-800(万/套)"} 18 | {"name": "天润福熙大道", "types": "住宅", "position": "清河营东路1号院, 清河营东路3号院", "position1": "朝阳", "position2": "北苑", "houseType": "1室", "space": "建面 65-374㎡", "unitPrice": "108000", "totalPrice": "总价750-3316(万/套)"} 19 | {"name": "京贸国际公馆", "types": "住宅", "position": "怡乐中路299号院(广渠快速路二期出口向南1000米)", "position1": "通州", "position2": "九棵树(家乐福)", "houseType": "1室", "space": "建面 72-147㎡", "unitPrice": "64000", "totalPrice": "总价495-950(万/套)"} 20 | {"name": "凯德麓语", "types": "别墅", "position": "兴寿镇京承高速G11出口向西怀昌路北侧", "position1": "昌平", "position2": "昌平其它", "houseType": "3室", "space": "建面 280-863㎡", "unitPrice": "35000", "totalPrice": "总价850-3450(万/套)"} 21 | {"name": "京贸国际城·峰景", "types": "住宅", "position": "芙蓉东路1号(通燕高速耿庄桥北出口向南300米)", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 69-140㎡", "unitPrice": "68000", "totalPrice": "总价460-980(万/套)"} 22 | {"name": "观唐云鼎", "types": "别墅", "position": "溪翁庄镇密溪路39号院(云佛山度假村对面)", "position1": "密云", "position2": "溪翁庄镇", "houseType": "3室", "space": "建面 346-613㎡", "unitPrice": "30000", "totalPrice": "总价1068-1850(万/套)"} 23 | {"name": "旭辉城", "types": "住宅", "position": "北京市房山区良锦街6号院旭辉城营销中心", "position1": "房山", "position2": "房山其它", "houseType": "2室", "space": "建面 75-116㎡", "unitPrice": "28500", "totalPrice": "总价219-330(万/套)"} 24 | {"name": "檀香府", "types": 
"住宅", "position": "京潭大街与潭柘十街交叉口", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 124-170㎡", "unitPrice": "42000", "totalPrice": "总价530-750(万/套)"} 25 | {"name": "泰禾金府大院", "types": "别墅", "position": "南四环地铁新宫站南800米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 362-504㎡", "unitPrice": "75000", "totalPrice": "总价2700-3700(万/套)"} 26 | {"name": "和棠瑞著", "types": "别墅", "position": "金海湖景区坝前广场西侧500米", "position1": "平谷", "position2": "平谷其它", "houseType": "3室", "space": "建面 305-360㎡", "unitPrice": "16000", "totalPrice": "总价530-560(万/套)"} 27 | {"name": "尊悦光华", "types": "住宅", "position": "北京市朝阳区光华东里甲1号院3号楼", "position1": "朝阳", "position2": "CBD", "houseType": "3室", "space": "建面 133-171㎡", "unitPrice": "150000", "totalPrice": "总价2500(万/套)"} 28 | {"name": "首创·河著", "types": "别墅", "position": "京承高速11出口(昌金路)向东900 米路北", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 248-310㎡", "unitPrice": "38000", "totalPrice": "总价1200-1900(万/套)"} 29 | {"name": "华萃西山", "types": "住宅", "position": "永定镇地铁S1号线石厂西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 115-122㎡", "unitPrice": "48000", "totalPrice": "总价560-600(万/套)"} 30 | {"name": "京西悦府", "types": "别墅", "position": "北京市房山区燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": "3室", "space": "建面 175-176㎡", "unitPrice": "40000", "totalPrice": "总价700-780(万/套)"} 31 | {"name": "中粮天恒天悦壹号", "types": "别墅", "position": "南四环地铁新宫站南500米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 220-340㎡", "unitPrice": "80000", "totalPrice": "总价2000-2360(万/套)"} 32 | {"name": "龙湾别墅", "types": "住宅", "position": "后沙峪镇龙湾别墅", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 218-317㎡", "unitPrice": "70000", "totalPrice": "总价2300(万/套)"} 33 | {"name": "京投发展·锦悦府", "types": "住宅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 90㎡", "unitPrice": "25607", 
"totalPrice": "总价220(万/套)"} 34 | {"name": "京投发展·锦悦府", "types": "别墅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 187-285㎡", "unitPrice": "25000", "totalPrice": "总价400-560(万/套)"} 35 | {"name": "金辰府", "types": "住宅", "position": "北京市昌平区北七家镇政府东南100米", "position1": "昌平", "position2": "北七家", "houseType": "3室", "space": "建面 89-143㎡", "unitPrice": "55000", "totalPrice": "总价490-790(万/套)"} 36 | {"name": "建邦·顺颐府", "types": "住宅", "position": "空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 89-147㎡", "unitPrice": "55583", "totalPrice": "总价480-845(万/套)"} 37 | {"name": "葛洲坝中国府", "types": "住宅", "position": "北京市丰台东路46号", "position1": "丰台", "position2": "玉泉营", "houseType": "3室", "space": "建面 168-240㎡", "unitPrice": "125000", "totalPrice": "总价2200-3000(万/套)"} 38 | {"name": "华萃西山", "types": "别墅", "position": "门头沟永定镇地铁S1号线石厂站西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "4室", "space": "建面 135-245㎡", "unitPrice": "48000", "totalPrice": "总价760-1060(万/套)"} 39 | {"name": "富兴首府", "types": "住宅", "position": "东坝路9号东北60米", "position1": "朝阳", "position2": "东坝", "houseType": "3室", "space": "建面 144-356㎡", "unitPrice": "85000", "totalPrice": "总价1706-2240(万/套)"} 40 | {"name": "中铁诺德阅墅", "types": "别墅", "position": "顺义区后沙峪镇裕园路762乡龙湖滟澜山对面", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 235-320㎡", "unitPrice": "50000", "totalPrice": "总价1150-1700(万/套)"} 41 | {"name": "中铁华侨城和园", "types": "别墅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "4室", "space": "建面 288-370㎡", "unitPrice": "50000", "totalPrice": "总价1870(万/套)"} 42 | {"name": "懋源·璟岳", "types": "别墅", "position": "南三环西路99号院", "position1": "丰台", "position2": "玉泉营", "houseType": "4室", "space": "建面 465-590㎡", "unitPrice": "140000", "totalPrice": "总价6500-9000(万/套)"} 43 | {"name": "合景泰富天汇", "types": "住宅", "position": "顺义区昌金路与通顺路交汇处", "position1": "顺义", "position2": "马坡", 
"houseType": "2室", "space": "建面 70-117㎡", "unitPrice": "33000", "totalPrice": "总价230-390(万/套)"} 44 | {"name": "懋源·璟玺", "types": "别墅", "position": "孙河京密路与京平辅路交叉口西行1000米", "position1": "朝阳", "position2": "中央别墅区", "houseType": "5室", "space": "建面 500-716㎡", "unitPrice": "100000", "totalPrice": "总价4380-6778(万/套)"} 45 | {"name": "万科雲庐", "types": "住宅", "position": "魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 104-300㎡", "unitPrice": "39000", "totalPrice": "总价656-820(万/套)"} 46 | {"name": "万科雲庐", "types": "别墅", "position": "魏各庄路万科雲庐", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 200-330㎡", "unitPrice": "30000", "totalPrice": "总价852-950(万/套)"} 47 | {"name": "金茂北京国际社区", "types": "住宅", "position": "顺义新城北小营昌金路水色时光路西", "position1": "顺义", "position2": "顺义其它", "houseType": "1室", "space": "建面 50-118㎡", "unitPrice": "30000", "totalPrice": "总价160-360(万/套)"} 48 | {"name": "住总如院", "types": "住宅", "position": "北京市大兴区采华路(波尔多小镇南区西南侧约250米)", "position1": "大兴", "position2": "大兴新机场洋房别墅区", "houseType": "2室", "space": "建面 98-233㎡", "unitPrice": "31136", "totalPrice": "总价280-475(万/套)"} 49 | {"name": "郎府书苑", "types": "住宅", "position": "西集镇京哈高速郎府出口南侧300米", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 89-116㎡", "unitPrice": "25800", "totalPrice": "总价273-300(万/套)"} 50 | {"name": "建邦·顺颐府", "types": "别墅", "position": "空港B区裕民大街30号", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 270㎡", "unitPrice": "55583", "totalPrice": "总价1300(万/套)"} 51 | -------------------------------------------------------------------------------- /test1_house/单价-总价散点图绘制效果.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/单价-总价散点图绘制效果.png -------------------------------------------------------------------------------- /test1_house/单价直方图绘制效果.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/单价直方图绘制效果.png -------------------------------------------------------------------------------- /test1_house/总价直方图绘制效果.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/总价直方图绘制效果.png -------------------------------------------------------------------------------- /zufang/.idea/.gitignore: -------------------------------------------------------------------------------- 1 | # 默认忽略的文件 2 | /shelf/ 3 | /workspace.xml 4 | -------------------------------------------------------------------------------- /zufang/.idea/inspectionProfiles/profiles_settings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 6 | -------------------------------------------------------------------------------- /zufang/.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | -------------------------------------------------------------------------------- /zufang/.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /zufang/.idea/zufang.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /zufang/GDP_price_show.py: -------------------------------------------------------------------------------- 1 | #由于之前写好的程序中有展示各个城市单位面积均价,直接在这里存入列表中。 2 | #单位面积均价: 北京:103.2元/平米/月 3 | # 广州:63.9元/平米/月 4 | # 上海:103.5元/平米/月 5 | # 深圳:88.5元/平米/月 6 | # 西安:36.0元/平米/月 7 | 
# Per-capita GDP for each city, looked up on Baidu (unit: 10,000 yuan per person):
# Beijing: 19.03  Guangzhou: 15.36  Shanghai: 17.99  Shenzhen: 18.33  Xi'an: 8.88
# The chart shows the ratio of rent per unit area to per-capita GDP. A smaller ratio is better,
# because it means a higher GDP for the same rent per square metre.

import matplotlib.pyplot as plt

GDP = [19.03, 15.36, 17.99, 18.33, 8.88]
space_price = [103.2, 63.9, 103.5, 88.5, 36.0]
# Ratio of rent per unit area to per-capita GDP, one value per city
count = [price / gdp for price, gdp in zip(space_price, GDP)]

city_name = ['北京', '广州', '上海', '深圳', '西安']
colors = ['steelblue', 'brown', 'green', 'darkorange', 'skyblue']
plt.figure(figsize=(20, 10), dpi=70)
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # font that can render the Chinese labels

# Bar chart of the ratios
plt.subplot(121)
plt.title("单位面积价格与GDP的比值直方图", fontsize=20)
plt.bar(city_name, count, width=0.4)
plt.ylabel("单位面积价格/人均GDP(万元)", fontsize=15)

# Scatter plot; each city is drawn separately so it gets its own colour and legend entry
plt.subplot(122)
for gdp, price, name, color in zip(GDP, space_price, city_name, colors):
    plt.scatter(gdp, price, s=60, label=name, color=color)
plt.title("GDP-单位面积均价散点图", fontsize=20)
plt.xlabel("GDP 单位:万元", fontsize=15)
# Unit corrected: the values are yuan per square metre per month
plt.ylabel("单位面积均价 单位:元/平方米/月", fontsize=15)
plt.legend(fontsize=15)

plt.show()
--------------------------------------------------------------------------------
/zufang/chromedriver.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/zufang/chromedriver.exe
--------------------------------------------------------------------------------
/zufang/face_price_show.py:
--------------------------------------------------------------------------------
# Compares average rent by the direction a home faces, across the five cities

import json
import codecs
import matplotlib.pyplot as plt
import numpy as np

avg_price = []   # average price per orientation, one list per city
count_list = []  # number of listings per orientation, one list per city

class Read_Json_and_show():
    def reads(self, filename, city, avg_price, count_list):  # read one city's data and append the results to the given lists
        # Indices 0, 1, 2, 3 stand for east, south, west, north
        temp_price = [0] * 4  # running price totals
        count = [0] * 4       # running listing counts

        with codecs.open(filename, 'r', encoding='utf-8') as f:
            read = f.readlines()
        # The file holds one JSON object per line
        for index, info in enumerate(read):
            data = json.loads(info)
            value = list(data.values())
            # value holds this row's field values as a list

            # Keep splitting until only the orientation is left
            temp_price_read = value[1].split('-')
            price = int(temp_price_read[0])
            temp_face_read = value[5].split('㎡ /')
            temp_face_read = temp_face_read[1].split(' / ')
            temp_face_read = temp_face_read[0].split('在')

            if len(temp_face_read) == 1:
                temp_face_read = temp_face_read[0].split('室')
                if len(temp_face_read) == 1:
                    temp_face_read = temp_face_read[0].split(' ')
            # A listing can face several directions; "东 南" counts once for east and once for south
            for face in temp_face_read:
                if face == '东':
                    temp_price[0] += price
                    count[0] += 1
                if face == '南':
                    temp_price[1] += price
                    count[1] += 1
                if face == '西':
                    temp_price[2] += price
                    count[2] += 1
                if face == '北':
                    temp_price[3] += price
                    count[3] += 1
        # Average price per orientation; an orientation with no listings stays at 0 instead of dividing by zero
        for i in range(len(count)):
            temp_price[i] = temp_price[i] / count[i] if count[i] else 0

        # Store this city's results
        avg_price.append(temp_price)
        count_list.append(count)


if __name__ == '__main__':
    read_json = Read_Json_and_show()
    # Input files, one per city
    paths = ['scrapy-beijing-zufang.json',
             'scrapy-guangzhou-zufang.json',
             'scrapy-shanghai-zufang.json',
             'scrapy-shenzhen-zufang.json',
             'scrapy-xian-zufang.json']
    # Read and analyse each of the five cities
    for city, path in enumerate(paths):
        read_json.reads(path, city, avg_price, count_list)
    # Inspect the results
    print(avg_price)
    print(count_list)
    # Plot one group of bars per orientation, one bar per city
    face_name = ['东', '南', '西', '北']
    city_name = ['北京', '广州', '上海', '深圳', '西安']
    colors = ['steelblue', 'brown', 'greenyellow', 'darkorange', 'skyblue']
    plt.figure(figsize=(30, 30), dpi=70)
    bar_width = 0.1  # horizontal offset between neighbouring bars
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

    # Shift each city's bars sideways so one x position shows all five cities,
    # each with its own colour and legend entry
    plt.title('各城市面向以及价格比较', fontsize=25)
    x_base = np.arange(len(face_name))
    for i, (name, color) in enumerate(zip(city_name, colors)):
        plt.bar(x=x_base + bar_width * i, height=avg_price[i], width=bar_width, label=name, color=color)
        # Print the value above each bar
        for x, y in enumerate(avg_price[i]):
            plt.text(x + bar_width * i, y + 0.2, "%s" % round(y, 1), ha='center')
    plt.xticks(x_base + 0.2, face_name, fontsize=20)
    plt.ylabel('价格:元/月', fontsize=20)
    # Show the legend
    plt.legend(fontsize=10)

    plt.show()
--------------------------------------------------------------------------------
/zufang/pos_price_show.py:
--------------------------------------------------------------------------------
# Shows the average rent per block (板块) for each city

import json
import codecs
import matplotlib.pyplot as plt

avg_price = []   # average price per block, one list per city
pos_list = []    # block names, one list per city
count_list = []  # number of blocks, one entry per city

class Read_Json_and_show():
    def reads(self, filename, city, avg_price, pos_list, count_list):  # read one city's data and append the results to the given lists
        # block name -> [price total, listing count]; a dict keeps first-seen order
        # and avoids the fixed 3000-slot lists that had to be trimmed afterwards
        totals = {}

        with codecs.open(filename, 'r', encoding='utf-8') as f:
            read = f.readlines()
        for index, info in enumerate(read):
            data = json.loads(info)
            value = list(data.values())
            # value holds this row's field values as a list
            # Extract the price
            temp_price_read = value[1].split('-')
            price = int(temp_price_read[0])
            # Extract the block name
            temp_pos_read = value[5].split('-')
            if len(temp_pos_read) > 1:  # locate the block first, then discard rows where that slot holds an area instead
                temp_pos_read = temp_pos_read[1].split('㎡')
                if len(temp_pos_read) == 1:
                    pos = str(temp_pos_read[0])
                    # Add to the block's running totals, creating the entry on first sight
                    entry = totals.setdefault(pos, [0, 0])
                    entry[0] += price
                    entry[1] += 1
        pos_name = list(totals)
        temp_price = [total / n for total, n in totals.values()]
        # Store this city's results
        avg_price.append(temp_price)
        pos_list.append(pos_name)
        count_list.append(len(temp_price))


if __name__ == '__main__':
    read_json = Read_Json_and_show()
    paths = ['scrapy-beijing-zufang.json',
             'scrapy-guangzhou-zufang.json',
             'scrapy-shanghai-zufang.json',
             'scrapy-shenzhen-zufang.json',
             'scrapy-xian-zufang.json']
    # Read and store the data city by city
    for city, path in enumerate(paths):
        read_json.reads(path, city, avg_price, pos_list, count_list)
    # Print the results for inspection
    print(avg_price)
    print(pos_list)
    print(count_list)

    # There are too many blocks to plot, so keep only the first 15 per city
    for i in range(5):
        del avg_price[i][15:]
        del pos_list[i][15:]

    city_name = ['北京', '广州', '上海', '深圳', '西安']
    # Figure settings
    plt.figure(figsize=(50, 50), dpi=70)
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

    # One subplot per city, laid out on a 3 x 2 grid
    for i, name in enumerate(city_name):
        plt.subplot(3, 2, i + 1)
        plt.bar(pos_list[i], avg_price[i])
        plt.title(name + "不同板块均价展示图")
        plt.ylabel("价格:元/月")

    plt.show()
--------------------------------------------------------------------------------
/zufang/room_price_show.py:
--------------------------------------------------------------------------------
# Breakdown by number of rooms ("N居" means N rooms)

import json
import codecs
import matplotlib.pyplot as plt
import numpy as np

avg_price = [[] for _ in range(3)]   # mean
high_price = [[] for _ in range(3)]  # maximum
mid_price = [[] for _ in range(3)]   # median
low_price = [[] for _ in range(3)]   # minimum

class Read_Json_and_show():
    def reads(self, filename, city, avg_price, high_price, mid_price, low_price):  # read one city's data and store the results in the given lists
        # Index 0, 1, 2 holds the figures for 1-, 2- and 3-room listings
        temp_price = [0] * 3                           # running price totals
        temp_high_price = [0] * 3                      # running maximum
        temp_low_price = [99999999] * 3                # running minimum
        temp_mid_price_count = [[] for _ in range(3)]  # all prices, kept for the median
        count = [0] * 3                                # listing counts

        with codecs.open(filename, 'r', encoding='utf-8') as f:
            read = f.readlines()
        for index, info in enumerate(read):
            data = json.loads(info)
            value = list(data.values())
            # value holds this row's field values as a list
            # Extract the price
            temp_price_read = value[1].split('-')
            price = int(temp_price_read[0])
            # Extract the room count: the token just before the first '室'
            temp_house_type = value[5].split('室')
            temp_house_type = temp_house_type[0].split('/ ')
            room_token = temp_house_type[-1]
            # Skip rows that are not 1-, 2- or 3-room listings. The room type is decided per row,
            # so such a row no longer inherits the type of the row before it.
            if room_token not in ('1', '2', '3'):
                continue
            k = int(room_token) - 1

            count[k] += 1
            if price > temp_high_price[k]:  # maximum
                temp_high_price[k] = price
            if price < temp_low_price[k]:   # minimum
                temp_low_price[k] = price
            temp_price[k] += price
            temp_mid_price_count[k].append(price)

        for i in range(0, 3):
            # Store the finished figures in the result lists
            temp_price[i] = float(temp_price[i] / count[i])  # mean
            avg_price[i][city] = temp_price[i]
            high_price[i][city] = temp_high_price[i]
            low_price[i][city] = temp_low_price[i]
            temp_mid_price_count[i].sort()
            mid_price[i][city] = temp_mid_price_count[i][len(temp_mid_price_count[i]) // 2]


if __name__ == '__main__':
    read_json = Read_Json_and_show()
    # Input files, processed in this order
    paths = ['scrapy-beijing-zufang.json',
             'scrapy-guangzhou-zufang.json',
             'scrapy-shanghai-zufang.json',
             'scrapy-shenzhen-zufang.json',
             'scrapy-xian-zufang.json']
    # Pre-fill the nested lists so results can be assigned by city index
    for i in range(0, 3):
        for j in range(0, 5):
            avg_price[i].append(0)
            high_price[i].append(0)
            low_price[i].append(0)
            mid_price[i].append(0)

    # Read and process the data
    for city, path in enumerate(paths):
        read_json.reads(path, city, avg_price, high_price, mid_price, low_price)

    print(avg_price)
    print(high_price)
    print(low_price)
    print(mid_price)

    # Plotting
    city_name = ['北京', '广州', '上海', '深圳', '西安']
    room_labels = ['1居', '2居', '3居']
    colors = ['steelblue', 'brown', 'darkorange']
    plt.figure(figsize=(30, 30), dpi=70)
    bar_width = 0.23
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']

    # Four subplots: mean, maximum, minimum, median
    panels = [('平均价格 ', avg_price), ('最高价格 ', high_price),
              ('最低价格 ', low_price), ('中位数价格 ', mid_price)]
    for n, (title, data) in enumerate(panels):
        plt.subplot(2, 2, n + 1)
        plt.title(title, fontsize=25)
        # Group the three room types at each city's x position so they are easy to compare
        for k in range(3):
            plt.bar(x=np.arange(len(city_name)) + bar_width * k, height=data[k], width=bar_width,
                    label=room_labels[k], color=colors[k])
            # Print the value above each bar
            for x, y in enumerate(data[k]):
                plt.text(x + bar_width * k, y + 0.2, "%s" % round(y, 1), ha='center')
        plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
        plt.ylabel('价格:元/月')
        plt.legend()

    plt.show()
--------------------------------------------------------------------------------
/zufang/salary_price_show.py:
--------------------------------------------------------------------------------
# The average rent per unit area for each city was already computed by an earlier script, so the values are stored directly in a list here.
# Average rent per unit area: Beijing:   103.2 yuan/m²/month
#                             Guangzhou: 63.9 yuan/m²/month
#                             Shanghai:  103.5 yuan/m²/month
#                             Shenzhen:  88.5 yuan/m²/month
#                             Xi'an:     36.0 yuan/m²/month
# Average monthly salary for each city, looked up on Baidu:
# Beijing: 13567 yuan  Guangzhou: 11300 yuan  Shanghai: 12183 yuan  Shenzhen: 12300 yuan  Xi'an: 9011 yuan

import matplotlib.pyplot
as plt

space_price = [103.2, 63.9, 103.5, 88.5, 36.0]
salary = [135.67, 113.00, 121.83, 123.00, 90.11]  # in units of 100 yuan, which keeps the ratios easy to read
# Ratio of rent per unit area to salary; a lower ratio means rent takes a smaller share of income,
# so the cost of living is relatively lower
count = [price / pay for price, pay in zip(space_price, salary)]

# Plotting
city_name = ['北京', '广州', '上海', '深圳', '西安']
colors = ['steelblue', 'brown', 'green', 'darkorange', 'skyblue']
plt.figure(figsize=(20, 10), dpi=70)
# Font that can render the Chinese labels
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
# Bar chart of the ratios
plt.subplot(121)
plt.title("单位面积价格与平均月薪的比值直方图", fontsize=20)
plt.bar(city_name, count, width=0.4)
plt.ylabel("单位面积价格/人均月薪(百元)", fontsize=15)
# Scatter plot; the five points are drawn separately so each gets its own colour
plt.subplot(122)
for pay, price, name, color in zip(salary, space_price, city_name, colors):
    plt.scatter(pay, price, s=60, label=name, color=color)
plt.title("人均月薪-单位面积均价散点图", fontsize=20)
plt.xlabel("人均月薪 单位:百元", fontsize=15)
# Unit corrected: the values are yuan per square metre per month
plt.ylabel("单位面积均价 单位:元/平方米/月", fontsize=15)
plt.legend(fontsize=15)

plt.show()
--------------------------------------------------------------------------------
/zufang/scrapy.cfg:
--------------------------------------------------------------------------------
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html

[settings]
default = zufang.settings

[deploy]
#url = http://localhost:6800/
project = zufang
--------------------------------------------------------------------------------
/zufang/total_price_show.py:
--------------------------------------------------------------------------------
# Compares the five cities on mean, median, maximum and minimum monthly rent,
# both in total and per unit area, and plots the results
import json
import codecs
import re
import matplotlib.pyplot as plt


total_avg_price = []   # mean
total_high_price = []  # maximum
total_mid_price = []   # median
total_low_price = []   # minimum
space_avg_price = []   # mean (per unit area)
space_high_price = []  # maximum (per unit area)
space_low_price = []   # minimum (per unit area)
space_mid_price = []   # median (per unit area)

class Read_Json_and_show():
    def reads(self, filename, city, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
              space_high_price, space_low_price, space_mid_price):  # read one city's data and append the results to the given lists
        temp_total_price = 0
        temp_space_price = 0
        temp_space_high_price = 0
        temp_space_low_price = 9999999
        temp_total_high_price = 0
        temp_total_low_price = 9999999
        number = re.compile(r'^[-+]?[0-9]+\.[0-9]+$')  # matches a decimal number; compiled once, outside the loop
        with codecs.open(filename, 'r', encoding='utf-8') as f:
            # Open and read the file
            read = f.readlines()
        total_mid_price_count = []
        space_mid_price_count = []
        for index, info in enumerate(read):  # one JSON object per line
            data = json.loads(info)
            value = list(data.values())
            # value holds this row's field values as a list

            # Extract the price
            temp_price = value[1].split('-')
            price = int(temp_price[0])
            # Extract the floor area
            temp_space = value[5].split("㎡")
            temp_space = temp_space[0].split("/ ")
            if len(temp_space) == 1:
                temp_space = temp_space[0].split("-")
                space = float(temp_space[0])
            else:
                temp_space2 = temp_space[1].split("-")
                # A decimal here is the area, and for a range the smaller bound is used;
                # anything else is stray data in that slot and is skipped over
                result = number.match(temp_space2[0])
                if result:
                    space = float(temp_space2[0])
                else:  # take the area from the next slot, again the smaller bound (20-25㎡ is treated as 20)
                    temp_space = temp_space[2].split("-")
                    space = float(temp_space[0])
            if price > temp_total_high_price:  # maximum
                temp_total_high_price = price
            if price < temp_total_low_price:   # minimum
                temp_total_low_price = price
            temp_total_price += price            # running total; the mean is computed after the loop
            total_mid_price_count.append(price)  # kept for the median, also computed after the loop

            space_price = float(price / space)
            if space_price > temp_space_high_price:  # maximum (per unit area)
                temp_space_high_price = space_price
            if space_price < temp_space_low_price:   # minimum (per unit area)
                temp_space_low_price = space_price
            temp_space_price += space_price            # running total (per unit area)
            space_mid_price_count.append(space_price)  # kept for the median (per unit area)

        # Use the actual number of rows instead of a hard-coded 3000
        n = len(total_mid_price_count)
        # Mean
        total_avg_price.append(float(temp_total_price / n))
        space_avg_price.append(float(temp_space_price / n))
        # Maximum
        total_high_price.append(float(temp_total_high_price))
        space_high_price.append(float(temp_space_high_price))
        # Minimum
        total_low_price.append(float(temp_total_low_price))
        space_low_price.append(float(temp_space_low_price))
        # Median; (n - 1) // 2 equals the original index 1499 when n is 3000
        total_mid_price_count.sort()
        space_mid_price_count.sort()
        total_mid_price.append(float(total_mid_price_count[(n - 1) // 2]))
        space_mid_price.append(float(space_mid_price_count[(n - 1) // 2]))


if __name__ == '__main__':
    read_json = Read_Json_and_show()
    paths = ['scrapy-beijing-zufang.json',
             'scrapy-guangzhou-zufang.json',
             'scrapy-shanghai-zufang.json',
             'scrapy-shenzhen-zufang.json',
             'scrapy-xian-zufang.json']
    # Read and process the data
    for city, path in enumerate(paths, start=1):
        read_json.reads(path, city, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
                        space_high_price, space_low_price, space_mid_price)

    # Print the results
    print(total_avg_price)
    print(total_high_price)
    print(total_low_price)
    print(total_mid_price)
    print(space_avg_price)
    print(space_high_price)
    print(space_low_price)
    print(space_mid_price)

    # Bar charts
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
    city_name = ['北京', '广州', '上海', '深圳', '西安']
    plt.figure(figsize=(30, 30), dpi=70)

    # Eight subplots: four for total rent, four for rent per unit area
    # (the per-unit-area panels are labelled 元/㎡/月 rather than 元/月)
    panels = [
        ('总价平均价格', total_avg_price, '价格:元/月'),
        ('总价最高价格', total_high_price, '价格:元/月'),
        ('总价最低价格', total_low_price, '价格:元/月'),
        ('总价中位数价格', total_mid_price, '价格:元/月'),
        ('单位面积均价', space_avg_price, '价格:元/㎡/月'),
        ('单位面积最高价格', space_high_price, '价格:元/㎡/月'),
        ('单位面积最低价格', space_low_price, '价格:元/㎡/月'),
        ('单位面积中位数价格', space_mid_price, '价格:元/㎡/月'),
    ]
    for n, (title, data, ylabel) in enumerate(panels):
        plt.subplot(2, 4, n + 1)
        plt.title(title, fontsize=25)
        plt.bar(city_name, data)
        plt.ylabel(ylabel)
        # Print the value above each bar
        for x, y in enumerate(data):
            plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')

    plt.show()
--------------------------------------------------------------------------------
/zufang/zufang/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/zufang/zufang/__init__.py
--------------------------------------------------------------------------------
/zufang/zufang/items.py:
--------------------------------------------------------------------------------
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class zufangitem(scrapy.Item):
    title = scrapy.Field()        # listing title
    price = scrapy.Field()        # monthly rent
    position0 = scrapy.Field()    # address part 1
    position1 = scrapy.Field()    # address part 2
    position2 = scrapy.Field()    # address part 3
    information = scrapy.Field()  # other details

--------------------------------------------------------------------------------
/zufang/zufang/middlewares.py:
--------------------------------------------------------------------------------
from scrapy import signals
from
selenium import webdriver
from scrapy.http import HtmlResponse

# Only the downloader middleware needs changing; the spider middleware is left alone


class ZufangDownloaderMiddleware:
    # Open a browser automatically when the downloader middleware starts

    def __init__(self):
        self.driver = webdriver.Chrome()

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        # Added by hand: hooks up the function that closes the browser
        crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
        return s

    # Called for every request the spider sends; returns the rendered source of the requested page

    def process_request(self, request, spider):
        self.driver.get(request.url)    # open the requested URL in the browser
        body = self.driver.page_source  # grab the page's HTML source
        return HtmlResponse(url=self.driver.current_url, body=body, encoding='utf-8', request=request)

    def process_response(self, request, response, spider):
        return response

    def process_exception(self, request, exception, spider):
        pass

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)

    # Added by hand: shuts the browser down

    def spider_closed(self, spider):
        self.driver.quit()  # quit() also ends the chromedriver process; close() only closes the current window
        spider.logger.info("Spider closed: %s" % spider.name)
--------------------------------------------------------------------------------
/zufang/zufang/pipelines.py:
--------------------------------------------------------------------------------
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import json


class zufangline(object):
    def open_spider(self, spider):
        # The output file name is hard-coded; change it before crawling another city
        try:
            self.file = open('scrapy-xian-zufang.json', "w", encoding="utf-8")
        except Exception as err:
            print(err)
            raise  # without a file handle process_item would fail later with a less helpful error

    def process_item(self, item, spider):
        # Write each item as one JSON object per line, keeping Chinese text readable
        dict_item = dict(item)
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()
--------------------------------------------------------------------------------
/zufang/zufang/settings.py:
--------------------------------------------------------------------------------
# Scrapy settings for zufang project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = "zufang"
#2403:a200:a200:13f1:183:84:18:11

SPIDER_MODULES = ["zufang.spiders"]
NEWSPIDER_MODULE = "zufang.spiders"

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'python_qm (+http://www.yourdomain.com)'
#USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
#USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
DOWNLOADER_MIDDLEWARES = {
    # 'lianjia.middlewares.LianjiaDownloaderMiddleware': 543,
    # This project's middlewares.py defines only ZufangDownloaderMiddleware. The entry
    # 'zufang.middlewares.RandomUserAgentMiddleware' that stood here names a class that
    # middlewares.py does not contain, so Scrapy could not load it.
    'zufang.middlewares.ZufangDownloaderMiddleware': 543,
}

MY_USER_AGENT = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows
NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)", 29 | "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)", 30 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)", 31 | "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)", 32 | "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)", 33 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)", 34 | "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6", 35 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1", 36 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0", 37 | "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5", 38 | "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6", 39 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11", 40 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20", 41 | "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52", 42 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11", 43 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER", 44 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; 
.NET4.0C; .NET4.0E; LBBROWSER)", 45 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)", 46 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER", 47 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)", 48 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)", 49 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)", 50 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)", 51 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)", 52 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)", 53 | "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1", 54 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1", 55 | "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5", 56 | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre", 57 | "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0", 58 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11", 59 | "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10", 60 | "Mozilla/5.0 (Windows NT 10.0; 
Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
61 | ]
62 | 
63 | # Obey robots.txt rules
64 | ROBOTSTXT_OBEY = False
65 | 
66 | LOG_LEVEL = 'WARNING'
67 | 
68 | #LOG_LEVEL = "WARNING"
69 | # Configure maximum concurrent requests performed by Scrapy (default: 16)
70 | #CONCURRENT_REQUESTS = 8
71 | 
72 | # Configure a delay for requests for the same website (default: 0)
73 | # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
74 | # See also autothrottle settings and docs
75 | DOWNLOAD_DELAY = 3
76 | RANDOMIZE_DOWNLOAD_DELAY = True
77 | #CONCURRENT_REQUESTS_PER_DOMAIN = 2
78 | # The download delay setting will honor only one of:
79 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16
80 | #CONCURRENT_REQUESTS_PER_IP = 16
81 | 
82 | # Disable cookies (enabled by default)
83 | #COOKIES_ENABLED = False
84 | 
85 | # Disable Telnet Console (enabled by default)
86 | #TELNETCONSOLE_ENABLED = False
87 | 
88 | # Override the default request headers:
89 | #DEFAULT_REQUEST_HEADERS = {
90 | # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
91 | # "Accept-Language": "en",
92 | # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
93 | #}
94 | 
95 | # Enable or disable spider middlewares
96 | # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
97 | #SPIDER_MIDDLEWARES = {
98 | # "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
99 | #}
100 | 
101 | # Enable or disable downloader middlewares
102 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
103 | DOWNLOADER_MIDDLEWARES = {  # NOTE: this rebinds the name and replaces the dict assigned earlier in this file, so 'zufang.middlewares.RandomUserAgentMiddleware' is not enabled
104 |     "zufang.middlewares.ZufangDownloaderMiddleware": 543,
105 | }
106 | 
107 | # Enable or disable extensions
108 | # See https://docs.scrapy.org/en/latest/topics/extensions.html
109 | #EXTENSIONS = {
110 | # "scrapy.extensions.telnet.TelnetConsole": None,
111 | #}
112 | 
113 | # Configure item pipelines
114 | # See
https://docs.scrapy.org/en/latest/topics/item-pipeline.html 115 | ITEM_PIPELINES = {'zufang.pipelines.zufangline': 300, } 116 | #ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300} 117 | # Enable and configure the AutoThrottle extension (disabled by default) 118 | # See https://docs.scrapy.org/en/latest/topics/autothrottle.html 119 | #AUTOTHROTTLE_ENABLED = True 120 | # The initial download delay 121 | #AUTOTHROTTLE_START_DELAY = 5 122 | # The maximum download delay to be set in case of high latencies 123 | #AUTOTHROTTLE_MAX_DELAY = 60 124 | # The average number of requests Scrapy should be sending in parallel to 125 | # each remote server 126 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 127 | # Enable showing throttling stats for every response received: 128 | #AUTOTHROTTLE_DEBUG = False 129 | 130 | # Enable and configure HTTP caching (disabled by default) 131 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings 132 | #HTTPCACHE_ENABLED = True 133 | #HTTPCACHE_EXPIRATION_SECS = 0 134 | #HTTPCACHE_DIR = "httpcache" 135 | #HTTPCACHE_IGNORE_HTTP_CODES = [] 136 | #HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage" 137 | 138 | # Set settings whose default value is deprecated to a future-proof value 139 | REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7" 140 | TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor" 141 | FEED_EXPORT_ENCODING = "utf-8" 142 | -------------------------------------------------------------------------------- /zufang/zufang/spiders/__init__.py: -------------------------------------------------------------------------------- 1 | # This package will contain the spiders of your Scrapy project 2 | # 3 | # Please refer to the documentation for information on how to create and manage 4 | # your spiders. 
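In `zufang/zufang/settings.py` above, `DOWNLOADER_MIDDLEWARES` is assigned twice. A settings module is ordinary Python, so the later assignment replaces the earlier dict rather than merging with it. The sketch below reproduces that behaviour with the two middleware paths taken from the file. The `merged` dict is only an assumption about what a combined configuration could look like; whether both middlewares are meant to run together is not stated in the source.

```python
# Two assignments to the same name, as in settings.py: the second dict
# replaces the first one instead of being merged into it.
DOWNLOADER_MIDDLEWARES = {
    "zufang.middlewares.RandomUserAgentMiddleware": 900,
}
DOWNLOADER_MIDDLEWARES = {
    "zufang.middlewares.ZufangDownloaderMiddleware": 543,
}

# This is the only mapping that remains bound to the name afterwards.
effective = dict(DOWNLOADER_MIDDLEWARES)

# Assumption: if both middlewares should be active, declare them in one dict.
merged = {
    "zufang.middlewares.ZufangDownloaderMiddleware": 543,
    "zufang.middlewares.RandomUserAgentMiddleware": 900,
}

if __name__ == "__main__":
    print("effective:", sorted(effective))
    print("merged:", sorted(merged, key=merged.get))
```

Running `scrapy settings --get DOWNLOADER_MIDDLEWARES` inside the project is one way to check which mapping the project actually ends up with.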
5 | 
--------------------------------------------------------------------------------
/zufang/zufang/spiders/spider1.py:
--------------------------------------------------------------------------------
1 | import scrapy
2 | 
3 | from zufang.items import zufangitem
4 | 
5 | 
6 | class Zufangspider(scrapy.spiders.Spider):
7 |     name = "xian"  # spider names used: beijing, shanghai, guangzhou, shenzhen, xian
8 |     allowed_domains = ["xa.lianjia.com"]  # domains the spider is allowed to crawl
9 |     start_urls = []
10 |     for page in range(1, 101):  # 100 pages in total, so build one start URL per page in a loop
11 |         url1 = 'https://xa.lianjia.com/zufang/pg{}/'.format(page)
12 |         start_urls.append(url1)
13 | 
14 |     custom_settings = {
15 |         'ITEM_PIPELINES': {'zufang.pipelines.zufangline': 300},
16 |     }
17 | 
18 |     def parse(self, response, **kwargs):
19 | 
20 |         div_list = response.xpath("//*[@id=\"content\"]/div[1]/div[1]/div")
21 | 
22 |         # Use XPath to walk the listing blocks and extract the fields we need
23 |         for each in div_list:
24 |             item = zufangitem()  # create a fresh item per listing so yielded items do not share state
25 |             item['title'] = each.xpath("normalize-space(./div/p[1]/a/text())").extract_first()
26 |             item['price'] = each.xpath("normalize-space(./div/span/em/text())").extract_first()
27 |             item['position0'] = each.xpath("./div/p[2]/a[1]/text()").extract_first()
28 |             item['position1'] = each.xpath("./div/p[2]/a[2]/text()").extract_first()
29 |             item['position2'] = each.xpath("./div/p[2]/a[3]/text()").extract_first()
30 |             item['information'] = each.xpath("normalize-space(./div/p[2])").extract_first()
31 |             yield item
32 | 
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤.zip
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/.gitignore:
-------------------------------------------------------------------------------- 1 | # 默认忽略的文件 2 | /shelf/ 3 | /workspace.xml 4 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/inspectionProfiles/profiles_settings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 6 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/zufang.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/GDP_price_show.py: -------------------------------------------------------------------------------- 1 | #由于之前写好的程序中有展示各个城市单位面积均价,直接在这里存入列表中。 2 | #单位面积均价: 北京:103.2元/平米/月 3 | # 广州:63.9元/平米/月 4 | # 上海:103.5元/平米/月 5 | # 深圳:88.5元/平米/月 6 | # 西安:36.0元/平米/月 7 | #通过百度查询各个城市的人均GDP可知: 8 | #人均GDP:北京:19.03万元/人 广州:15.36万元/人 上海:17.99万元/人 深圳:18.33万元/人 西安:8.88万元/人 9 | #采用比值的形式来展示单位面积均价/GDP ,比值越小越好,因为对于同样的单位面积,GDP越大越好 10 | 11 | import matplotlib.pyplot as plt 12 | 13 | GDP = [19.03, 15.36, 17.99, 18.33, 8.88] 14 | space_price = [103.2, 63.9, 103.5, 88.5, 36.0] 15 | count = [] 16 | # 求比值 17 | for i in range(0,5): 18 | count.append(space_price[i]/GDP[i]) 19 | 20 | city_name = ['北京', '广州', '上海', '深圳', '西安'] 21 | plt.figure(figsize=(20, 10), 
dpi=70) 22 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei'] 23 | 24 | # 绘制直方图 25 | plt.subplot(121) 26 | plt.title("单位面积价格与GDP的比值直方图",fontsize=20) 27 | plt.bar(city_name, count,width=0.4) 28 | plt.ylabel("单位面积价格/人均GDP(万元)",fontsize=15) 29 | 30 | # 绘制散点图 31 | plt.subplot(122) 32 | plt.scatter(GDP[0],space_price[0],s=60, label='北京', color='steelblue') 33 | plt.scatter(GDP[1],space_price[1],s=60, label='广州', color='brown') 34 | plt.scatter(GDP[2],space_price[2],s=60, label='上海',color='green') 35 | plt.scatter(GDP[3],space_price[3],s=60, label='深圳',color='darkorange') 36 | plt.scatter(GDP[4],space_price[4],s=60, label='西安',color='skyblue') 37 | plt.title("GDP-单位面积均价散点图",fontsize=20) 38 | plt.xlabel("GDP 单位:万元",fontsize=15) 39 | plt.ylabel("单位面积均价 单位:平方米/元/月",fontsize=15) 40 | plt.legend(fontsize=15) 41 | 42 | plt.show() 43 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/chromedriver.exe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/chromedriver.exe -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/face_price_show.py: -------------------------------------------------------------------------------- 1 | # 该文件实现了对五个不同城市按照房屋面向来进行均价的比较 2 | 3 | import json 4 | import codecs 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | 8 | avg_price = [] # 均价 9 | count_list = [] # 各种面向的数量 10 | 11 | class Read_Json_and_show(): 12 | def reads(self,filename,city, avg_price,count_list): # 读取城市数据,并分别存储到需要的数组中 13 | temp_price = [] # 临时用存储价格的列表 14 | count = [] # 临时用存储各个面向计数的列表 15 | for i in range(0,4): # 0123分别表示东南西北 16 | count.append(0) 17 | temp_price.append(0) 18 | 19 | with codecs.open(filename, 'r', encoding='utf-8') as f: 20 | read = 
f.readlines() 21 | # 打开文件并逐行读取 22 | for index, info in enumerate(read): 23 | data = json.loads(info) 24 | value = list(data.values()) 25 | # value保存了每一行的数据的值,以列表形式。 26 | 27 | # 不断拆分,最后拆出来需要的面向 28 | temp_price_read = value[1].split('-') 29 | price = temp_price_read[0] 30 | price = int(price) 31 | temp_face_read = value[5].split('㎡ /') 32 | temp_face_read = temp_face_read[1].split(' / ') 33 | temp_face_read = temp_face_read[0].split('在') 34 | 35 | if len(temp_face_read) == 1: 36 | temp_face_read = temp_face_read[0].split('室') 37 | if len(temp_face_read) == 1: 38 | temp_face_read = temp_face_read[0].split(' ') 39 | # 由于同时一组数据可能有多个面向,故如面向东 南,则东,南均各统计一次。 40 | for face in temp_face_read: 41 | if face == '东': 42 | temp_price[0] += price 43 | count[0] += 1 44 | if face == '南': 45 | temp_price[1] += price 46 | count[1] += 1 47 | if face == '西': 48 | temp_price[2] += price 49 | count[2] += 1 50 | if face == '北': 51 | temp_price[3] += price 52 | count[3] += 1 53 | # 计算平均价格 54 | for i in range(len(count)): 55 | temp_price[i] = temp_price[i]/count[i] 56 | 57 | # 将价格放入列表中 58 | avg_price.append(temp_price) 59 | count_list.append(count) 60 | 61 | 62 | if __name__ == '__main__': 63 | read_json = Read_Json_and_show() 64 | # 存储路径 65 | path1 = 'scrapy-beijing-zufang.json' 66 | path2 = 'scrapy-guangzhou-zufang.json' 67 | path3 = 'scrapy-shanghai-zufang.json' 68 | path4 = 'scrapy-shenzhen-zufang.json' 69 | path5 = 'scrapy-xian-zufang.json' 70 | # 分别对五个城市进行读取和分析的操作 71 | read_json.reads(path1, 0, avg_price, count_list) 72 | read_json.reads(path2, 1, avg_price, count_list) 73 | read_json.reads(path3, 2, avg_price, count_list) 74 | read_json.reads(path4, 3, avg_price, count_list) 75 | read_json.reads(path5, 4, avg_price, count_list) 76 | # 查看分析结果 77 | print(avg_price) 78 | print(count_list) 79 | # 绘图,分别将五个城市的直方图进行绘制 80 | face_name = ['东','南','西','北'] 81 | plt.figure(figsize=(30, 30), dpi=70) 82 | bar_width = 0.1 # 条宽偏移 83 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei'] 84 | 85 | # 
通过偏移来绘制直方图,以达到一个横坐标能显示多个直方图的效果,同时便于区分颜色和图例 86 | plt.title('各城市面向以及价格比较', fontsize=25) 87 | plt.bar(x=np.arange(len(face_name)), height=avg_price[0], width=bar_width, label='北京', color='steelblue') 88 | plt.bar(x=np.arange(len(face_name)) + bar_width, height=avg_price[1], width=bar_width, label='广州', color='brown') 89 | plt.bar(x=np.arange(len(face_name)) + bar_width * 2, height=avg_price[2], width=bar_width, label='上海',color='greenyellow') 90 | plt.bar(x=np.arange(len(face_name)) + bar_width * 3, height=avg_price[3], width=bar_width, label='深圳',color='darkorange') 91 | plt.bar(x=np.arange(len(face_name)) + bar_width * 4, height=avg_price[4], width=bar_width, label='西安',color='skyblue') 92 | plt.xticks(np.arange(4) + 0.2, face_name,fontsize=20) 93 | plt.ylabel('价格:元/月',fontsize=20) 94 | # 显示图例 95 | plt.legend(fontsize=10) 96 | # 为柱状图在顶部添加数据信息 97 | for x, y in enumerate(avg_price[0]): 98 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center') 99 | for x, y in enumerate(avg_price[1]): 100 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center') 101 | for x, y in enumerate(avg_price[2]): 102 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center') 103 | for x, y in enumerate(avg_price[3]): 104 | plt.text(x + bar_width * 3, y + 0.2, "%s" % round(y, 1), ha='center') 105 | for x, y in enumerate(avg_price[4]): 106 | plt.text(x + bar_width * 4, y + 0.2, "%s" % round(y, 1), ha='center') 107 | 108 | plt.show() 109 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/pos_price_show.py: -------------------------------------------------------------------------------- 1 | # 该文件实现了按照板块来展示不同城市不同板块的均价 2 | 3 | import json 4 | import codecs 5 | import matplotlib.pyplot as plt 6 | 7 | avg_price = [] # 均价 8 | pos_list = [] # 板块名 9 | count_list = [] # 板块数量 10 | 11 | class Read_Json_and_show(): 12 | def reads(self,filename,city, avg_price, pos_list,count_list): # 读取城市数据,并分别存储到需要的数组中 13 | 
temp_price = [] 14 | pos_name = [] 15 | count = [] 16 | for i in range(0,3000): 17 | pos_name.append("") 18 | count.append(0) 19 | temp_price.append(0) 20 | 21 | with codecs.open(filename, 'r', encoding='utf-8') as f: 22 | read = f.readlines() 23 | for index, info in enumerate(read): 24 | data = json.loads(info) 25 | value = list(data.values()) 26 | # value保存了每一行的数据的值,以列表形式。 27 | # 划分出价格 28 | temp_price_read = value[1].split('-') 29 | price = temp_price_read[0] 30 | price = int(price) 31 | # 划分出板块 32 | temp_pos_read = value[5].split('-') 33 | if len(temp_pos_read) > 1: # 先确定板块的位置,再将干扰的数据剔除 34 | temp_pos_read = temp_pos_read[1].split('㎡') 35 | if len(temp_pos_read) == 1: 36 | pos = str(temp_pos_read[0]) 37 | # 通过遍历的方式来检查已经建立的板块名列表,重复则数量加一,无重复则将这个板块名加入板块列表 38 | flag = -1 39 | for j in range(len(pos_name)): 40 | if pos_name[j] == pos or pos_name[j] == "": 41 | flag = j 42 | break 43 | 44 | pos_name[flag] = pos 45 | count[flag] += 1 46 | temp_price[flag] += price 47 | # 将之前预设的列表里多余的空项去除掉 48 | while temp_price[-1] == 0: 49 | temp_price.pop() 50 | while pos_name[-1] == "": 51 | pos_name.pop() 52 | while count[-1] == 0: 53 | count.pop() 54 | for i in range(len(count)): 55 | temp_price[i] = temp_price[i]/count[i] 56 | # 将最后得到的列表写入总列表中 57 | avg_price.append(temp_price) 58 | pos_list.append(pos_name) 59 | count_list.append(len(temp_price)) 60 | 61 | 62 | if __name__ == '__main__': 63 | read_json = Read_Json_and_show() 64 | path1 = 'scrapy-beijing-zufang.json' 65 | path2 = 'scrapy-guangzhou-zufang.json' 66 | path3 = 'scrapy-shanghai-zufang.json' 67 | path4 = 'scrapy-shenzhen-zufang.json' 68 | path5 = 'scrapy-xian-zufang.json' 69 | # 按照不同城市进行数据的读取和存储操作 70 | read_json.reads(path1, 0, avg_price, pos_list, count_list) 71 | read_json.reads(path2, 1, avg_price, pos_list, count_list) 72 | read_json.reads(path3, 2, avg_price, pos_list, count_list) 73 | read_json.reads(path4, 3, avg_price, pos_list, count_list) 74 | read_json.reads(path5, 4, avg_price, pos_list, count_list) 75 | # 
输出读取结果进行观察 76 | print(avg_price) 77 | print(pos_list) 78 | print(count_list) 79 | 80 | # 数据太多了,只保留一部分 ,这里只保留15个板块的数据 81 | for i in range(5): 82 | while len(avg_price[i]) > 15: 83 | avg_price[i].pop() 84 | while len(pos_list[i]) > 15: 85 | pos_list[i].pop() 86 | 87 | city_name = ['北京', '广州', '上海', '深圳', '西安'] 88 | # 一些预设参数 89 | plt.figure(figsize=(50, 50), dpi=70) 90 | bar_width = 0.23 91 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei'] 92 | 93 | # 分开画不同城市的不同板块的展示图 94 | # 北京 95 | plt.subplot(321) 96 | plt.bar(pos_list[0],avg_price[0]) 97 | plt.title("北京不同板块均价展示图") 98 | plt.ylabel("价格:元/月") 99 | # 广州 100 | plt.subplot(322) 101 | plt.bar(pos_list[1], avg_price[1]) 102 | plt.title("广州不同板块均价展示图") 103 | plt.ylabel("价格:元/月") 104 | # 上海 105 | plt.subplot(323) 106 | plt.bar(pos_list[2], avg_price[2]) 107 | plt.title("上海不同板块均价展示图") 108 | plt.ylabel("价格:元/月") 109 | # 深圳 110 | plt.subplot(324) 111 | plt.bar(pos_list[3], avg_price[3]) 112 | plt.title("深圳不同板块均价展示图") 113 | plt.ylabel("价格:元/月") 114 | # 西安 115 | plt.subplot(325) 116 | plt.bar(pos_list[4], avg_price[4]) 117 | plt.title("西安不同板块均价展示图") 118 | plt.ylabel("价格:元/月") 119 | 120 | plt.show() -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/room_price_show.py: -------------------------------------------------------------------------------- 1 | # 按照居室展示,几居即几室 2 | 3 | import json 4 | import codecs 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | 8 | avg_price = [[]for _ in range(3)] # 均价 9 | high_price = [[]for _ in range(3)]# 最高价 10 | mid_price = [[]for _ in range(3)]# 中位数 11 | low_price = [[]for _ in range(3)]# 最低价 12 | 13 | class Read_Json_and_show(): 14 | def reads(self,filename,city, avg_price, high_price, mid_price, low_price): # 读取城市数据,并分别存储到需要的数组中 15 | temp_price = [] # 临时用的总价 16 | temp_high_price = [] # 临时用的最高价 17 | temp_low_price = [] # 临时用的最低价 18 | temp_mid_price_count = [[]for _ in range(3)] 19 | house_type = 0 # 1表示1居 
2表示2居 3表示3居
20 |         count = []  # number of listings seen for each room count
21 |         for i in range(0, 3):  # initial values
22 |             count.append(0)
23 |             temp_price.append(0)
24 |             temp_low_price.append(99999999)
25 |             temp_high_price.append(0)
26 | 
27 |         with codecs.open(filename, 'r', encoding='utf-8') as f:
28 |             read = f.readlines()
29 |             for index, info in enumerate(read):
30 |                 data = json.loads(info)
31 |                 value = list(data.values())
32 |                 # value holds the field values of the current row as a list
33 |                 # Parse the price (the part before '-' when the price is given as a range)
34 |                 temp_price_read = value[1].split('-')
35 |                 price = temp_price_read[0]
36 |                 price = int(price)
37 |                 # Parse the room count from the information field
38 |                 temp_house_type = value[5].split('室')
39 |                 temp_house_type = temp_house_type[0].split('/ ')
40 |                 if temp_house_type[-1] == '1':
41 |                     house_type = 1
42 |                 elif temp_house_type[-1] == '2':
43 |                     house_type = 2
44 |                 elif temp_house_type[-1] == '3':
45 |                     house_type = 3
46 |                 else:  # any other layout: reset so the previous row's room count is not reused
47 |                     house_type = 0
48 |                 if house_type == 1:  # one-room listings
49 |                     count[0] += 1
50 |                     if price > temp_high_price[0]:  # highest price
51 |                         temp_high_price[0] = price
52 |                     if price < temp_low_price[0]:  # lowest price
53 |                         temp_low_price[0] = price
54 |                     temp_price[0] += price
55 |                     temp_mid_price_count[0].append(price)
56 | 
57 |                 if house_type == 2:  # two-room listings
58 |                     count[1] += 1
59 |                     if price > temp_high_price[1]:  # highest price
60 |                         temp_high_price[1] = price
61 |                     if price < temp_low_price[1]:  # lowest price
62 |                         temp_low_price[1] = price
63 | 
64 |                     temp_price[1] += price
65 |                     temp_mid_price_count[1].append(price)
66 | 
67 |                 if house_type == 3:  # three-room listings
68 |                     count[2] += 1
69 |                     if price > temp_high_price[2]:  # highest price
70 |                         temp_high_price[2] = price
71 |                     if price < temp_low_price[2]:  # lowest price
72 |                         temp_low_price[2] = price
73 |                     temp_price[2] += price
74 |                     temp_mid_price_count[2].append(price)
75 | 
76 |         for i in range(0, 3):
77 |             # Store the processed statistics in the shared result lists
78 |             temp_price[i] = float(temp_price[i]/count[i])  # average price
79 |             avg_price[i][city] = temp_price[i]
80 |             high_price[i][city] = temp_high_price[i]
81 |             low_price[i][city] = temp_low_price[i]
82 | 
temp_mid_price_count[i].sort() 83 | mid_price[i][city] = temp_mid_price_count[i][int(len(temp_mid_price_count[i])/2)] 84 | 85 | 86 | if __name__ == '__main__': 87 | read_json = Read_Json_and_show() 88 | # 读文件并按照路径依次处理数据 89 | path1 = 'scrapy-beijing-zufang.json' 90 | path2 = 'scrapy-guangzhou-zufang.json' 91 | path3 = 'scrapy-shanghai-zufang.json' 92 | path4 = 'scrapy-shenzhen-zufang.json' 93 | path5 = 'scrapy-xian-zufang.json' 94 | # 给嵌套列表一个初值,这样方便后续数据处理 95 | for i in range(0, 3): 96 | for j in range(0, 5): 97 | avg_price[i].append(0) 98 | high_price[i].append(0) 99 | low_price[i].append(0) 100 | mid_price[i].append(0) 101 | 102 | # 读取并处理数据 103 | read_json.reads(path1, 0, avg_price, high_price, mid_price, low_price) 104 | read_json.reads(path2, 1, avg_price, high_price, mid_price, low_price) 105 | read_json.reads(path3, 2, avg_price, high_price, mid_price, low_price) 106 | read_json.reads(path4, 3, avg_price, high_price, mid_price, low_price) 107 | read_json.reads(path5, 4, avg_price, high_price, mid_price, low_price) 108 | 109 | print(avg_price) 110 | print(high_price) 111 | print(low_price) 112 | print(mid_price) 113 | 114 | # 绘图步骤 115 | city_name = ['北京', '广州', '上海', '深圳', '西安'] 116 | plt.figure(figsize=(30, 30), dpi=70) 117 | bar_width = 0.23 118 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei'] 119 | 120 | # 均价 121 | plt.subplot(221) 122 | plt.title('平均价格 ', fontsize=25) 123 | # 绘制成一个X坐标对应多个直方图的,这样方便数据的对比和展示观看 124 | plt.bar(x=np.arange(len(city_name)), height=avg_price[0], width=bar_width,label='1居',color='steelblue') 125 | plt.bar(x=np.arange(len(city_name))+bar_width, height=avg_price[1],width=bar_width, label='2居',color='brown') 126 | plt.bar(x=np.arange(len(city_name))+bar_width*2, height=avg_price[2], width=bar_width, label='3居',color='darkorange') 127 | plt.xticks(np.arange(5)+0.2,city_name,fontsize=15) 128 | plt.ylabel('价格:元/月') 129 | # 绘制图例 130 | plt.legend() 131 | # 给图像上端增添数据显示 132 | for x, y in enumerate(avg_price[0]): 133 | plt.text(x, y + 
0.2, "%s" % round(y, 1), ha='center') 134 | for x, y in enumerate(avg_price[1]): 135 | plt.text(x+bar_width, y + 0.2, "%s" % round(y, 1), ha='center') 136 | for x, y in enumerate(avg_price[2]): 137 | plt.text(x+bar_width*2, y + 0.2, "%s" % round(y, 1), ha='center') 138 | 139 | # 最高价 140 | 141 | plt.subplot(222) 142 | plt.title('最高价格 ', fontsize=25) 143 | # 绘制成一个X坐标对应多个直方图的,这样方便数据的对比和展示观看 144 | plt.bar(x=np.arange(len(city_name)), height=high_price[0], width=bar_width, label='1居', color='steelblue') 145 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=high_price[1], width=bar_width, label='2居', color='brown') 146 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=high_price[2], width=bar_width, label='3居', 147 | color='darkorange') 148 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15) 149 | plt.ylabel('价格:元/月') 150 | plt.legend() 151 | # 给图像上端增添数据显示 152 | for x, y in enumerate(high_price[0]): 153 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center') 154 | for x, y in enumerate(high_price[1]): 155 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center') 156 | for x, y in enumerate(high_price[2]): 157 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center') 158 | 159 | # 最低价 160 | plt.subplot(223) 161 | plt.title('最低价格 ', fontsize=25) 162 | # 绘制成一个X坐标对应多个直方图的,这样方便数据的对比和展示观看 163 | plt.bar(x=np.arange(len(city_name)), height=low_price[0], width=bar_width, label='1居', color='steelblue') 164 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=low_price[1], width=bar_width, label='2居', color='brown') 165 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=low_price[2], width=bar_width, label='3居', 166 | color='darkorange') 167 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15) 168 | plt.ylabel('价格:元/月') 169 | plt.legend() 170 | # 给图像上端增添数据显示 171 | for x, y in enumerate(low_price[0]): 172 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center') 173 | for x, y in enumerate(low_price[1]): 174 | 
plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center') 175 | for x, y in enumerate(low_price[2]): 176 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center') 177 | 178 | # 中位数 179 | plt.subplot(224) 180 | plt.title('中位数价格 ', fontsize=25) 181 | plt.bar(x=np.arange(len(city_name)), height=mid_price[0], width=bar_width, label='1居', color='steelblue') 182 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=mid_price[1], width=bar_width, label='2居', color='brown') 183 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=mid_price[2], width=bar_width, label='3居', 184 | color='darkorange') 185 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15) 186 | plt.ylabel('价格:元/月') 187 | plt.legend() 188 | # 给图像上端增添数据显示 189 | for x, y in enumerate(mid_price[0]): 190 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center') 191 | for x, y in enumerate(mid_price[1]): 192 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center') 193 | for x, y in enumerate(mid_price[2]): 194 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center') 195 | 196 | plt.show() 197 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/salary_price_show.py: -------------------------------------------------------------------------------- 1 | #由于之前写好的程序中有展示各个城市单位面积均价,直接在这里存入列表中。 2 | #单位面积均价: 北京:103.2元/平米/月 3 | # 广州:63.9元/平米/月 4 | # 上海:103.5元/平米/月 5 | # 深圳:88.5元/平米/月 6 | # 西安:36.0元/平米/月 7 | #查询百度各个城市的人均工资可知: 8 | #人均工资: 北京:13567元/月 广州:11300元/月 上海:12183元/月 深圳:12300元/月 西安9011元/月 9 | 10 | import matplotlib.pyplot as plt 11 | 12 | space_price = [103.2, 63.9, 103.5, 88.5, 36.0] 13 | salary = [135.67, 113.00, 121.83, 123.00, 90.11] # 单位为百元,这样好计算一点 14 | count = [] 15 | # 计算比值 16 | for i in range(0,5): 17 | count.append(space_price[i]/salary[i]) # 用比值来表示,比值越低说明房租占工资占比小,生活成本相对低一点 18 | 19 | # 绘图过程 20 | city_name = ['北京', '广州', '上海', '深圳', '西安'] 21 | plt.figure(figsize=(20, 10), dpi=70) 
22 | # 消除中文乱码用的 23 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei'] 24 | # 绘制直方图 25 | plt.subplot(121) 26 | plt.title("单位面积价格与平均月薪的比值直方图",fontsize=20) 27 | plt.bar(city_name, count,width=0.4) 28 | plt.ylabel("单位面积价格/人均月薪(百元)",fontsize=15) 29 | # 绘制散点图,由于要用不同的点的颜色来表示,因此分开绘制五个点 30 | plt.subplot(122) 31 | plt.scatter(salary[0],space_price[0],s=60, label='北京', color='steelblue') 32 | plt.scatter(salary[1],space_price[1],s=60, label='广州', color='brown') 33 | plt.scatter(salary[2],space_price[2],s=60, label='上海',color='green') 34 | plt.scatter(salary[3],space_price[3],s=60, label='深圳',color='darkorange') 35 | plt.scatter(salary[4],space_price[4],s=60, label='西安',color='skyblue') 36 | plt.title("人均月薪-单位面积均价散点图",fontsize=20) 37 | plt.xlabel("人均月薪 单位:百元",fontsize=15) 38 | plt.ylabel("单位面积均价 单位:平方米/元/月",fontsize=15) 39 | plt.legend(fontsize=15) 40 | 41 | plt.show() 42 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/scrapy.cfg: -------------------------------------------------------------------------------- 1 | # Automatically created by: scrapy startproject 2 | # 3 | # For more information about the [deploy] section see: 4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html 5 | 6 | [settings] 7 | default = zufang.settings 8 | 9 | [deploy] 10 | #url = http://localhost:6800/ 11 | project = zufang 12 | -------------------------------------------------------------------------------- /实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/total_price_show.py: -------------------------------------------------------------------------------- 1 | # 该文件实现了对五个城市的租房价格的均价,中位数,最高价,最低价和单位面积的均价,中位数,最高价,最低价的比较分析和图表绘制 2 | import json 3 | import codecs 4 | import re 5 | import matplotlib.pyplot as plt 6 | 7 | 8 | total_avg_price = [] # 均价 9 | total_high_price = [] # 最高价 10 | total_mid_price = [] # 中位数 11 | total_low_price = [] # 最低价 12 | space_avg_price = [] # 均价(单位面积) 13 | space_high_price = [] # 最高价(单位面积) 14 | 
space_low_price = [] # 最低价(单位面积) 15 | space_mid_price = [] # 中位数 (单位面积) 16 | 17 | class Read_Json_and_show(): 18 | def reads(self,filename,city, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price, 19 | space_high_price, space_low_price, space_mid_price): # 读取城市数据,并分别存储到需要的数组中 20 | temp_total_price = 0 21 | temp_space_price = 0 22 | temp_space_high_price = 0 23 | temp_space_low_price = 9999999 24 | temp_total_high_price = 0 25 | temp_total_low_price = 9999999 26 | with codecs.open(filename,'r',encoding='utf-8') as f: 27 | # 打开并读取文件 28 | read = f.readlines() 29 | total_mid_price_count = [] 30 | space_mid_price_count = [] 31 | for index, info in enumerate(read): # 逐行读取 32 | data = json.loads(info) 33 | value = list(data.values()) 34 | # value保存了每一行的数据的值,以列表形式。 35 | 36 | number = re.compile(r'^[-+]?[0-9]+\.[0-9]+$') # 正则式判断是否是小数 37 | # 划分出价格 38 | temp_price = value[1].split('-') 39 | price = temp_price[0] 40 | price = int(price) 41 | # 划分出面积 42 | temp_space = value[5].split("㎡") 43 | temp_space = temp_space[0].split("/ ") 44 | if len(temp_space) == 1: 45 | temp_space = temp_space[0].split("-") 46 | space = temp_space[0] 47 | space = float(space) 48 | else: 49 | temp_space2 = temp_space[1].split("-") 50 | # 判断这个数是否为小数,是则说明是面积,这里保留了最小面积作为参考(否则就是异常数据,需要被忽略) 51 | result = number.match(temp_space2[0]) 52 | if result: 53 | space = temp_space2[0] 54 | space = float(space) 55 | else: # 保留最小的面积(如20-25㎡则将space看做20) 56 | temp_space = temp_space[2].split("-") 57 | space = temp_space[0] 58 | space = float(space) 59 | # print(space) 60 | if price > temp_total_high_price: # 最高价 61 | temp_total_high_price = price 62 | if price < temp_total_low_price: # 最低价 63 | temp_total_low_price = price 64 | temp_total_price += price # 总价,均价出循环了计算 65 | total_mid_price_count.append(price) # 中位数,同样出循环了计算 66 | 67 | space_price = float(price/space) 68 | if space_price > temp_space_high_price: # 最高价(单位面积) 69 | temp_space_high_price = space_price 70 | if space_price < 
temp_space_low_price:  # lowest price (per unit area)
71 |                     temp_space_low_price = space_price
72 |                 temp_space_price += space_price  # running total (per unit area)
73 |                 space_mid_price_count.append(space_price)  # collected for the median (per unit area)
74 | 
75 |         # Averages: divide by the number of rows actually read instead of a hard-coded 3000
76 |         total_avg_price.append(float(temp_total_price/len(total_mid_price_count)))
77 |         space_avg_price.append(float(temp_space_price/len(space_mid_price_count)))
78 |         # Highest price
79 |         total_high_price.append(float(temp_total_high_price))
80 |         space_high_price.append(float(temp_space_high_price))
81 |         # Lowest price
82 |         total_low_price.append(float(temp_total_low_price))
83 |         space_low_price.append(float(temp_space_low_price))
84 |         # Median: lower-middle element of the sorted list (index 1499 when there are 3000 rows)
85 |         total_mid_price_count.sort()
86 |         space_mid_price_count.sort()
87 |         total_mid_price.append(float(total_mid_price_count[(len(total_mid_price_count) - 1)//2]))
88 |         space_mid_price.append(float(space_mid_price_count[(len(space_mid_price_count) - 1)//2]))
89 | 
90 | 
91 | if __name__ == '__main__':
92 |     read_json = Read_Json_and_show()
93 |     path1 = 'scrapy-beijing-zufang.json'
94 |     path2 = 'scrapy-guangzhou-zufang.json'
95 |     path3 = 'scrapy-shanghai-zufang.json'
96 |     path4 = 'scrapy-shenzhen-zufang.json'
97 |     path5 = 'scrapy-xian-zufang.json'
98 |     # Read and process the data for each city
99 |     read_json.reads(path1, 1, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
100 |                     space_high_price, space_low_price, space_mid_price)
101 |     read_json.reads(path2, 2, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
102 |                     space_high_price, space_low_price, space_mid_price)
103 |     read_json.reads(path3, 3, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
104 |                     space_high_price, space_low_price, space_mid_price)
105 |     read_json.reads(path4, 4, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
106 |                     space_high_price, space_low_price, space_mid_price)
107 |     read_json.reads(path5, 5, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
108 |                     space_high_price, space_low_price, space_mid_price)
109 | 
110 |     # Print the processed results for inspection
111 | 
    print(total_avg_price)
    print(total_high_price)
    print(total_low_price)
    print(total_mid_price)
    print(space_avg_price)
    print(space_high_price)
    print(space_low_price)
    print(space_mid_price)

    # Draw the bar charts
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
    city_name = ['北京', '广州', '上海', '深圳', '西安']
    plt.figure(figsize=(30, 30), dpi=70)
    bar_width = 0.4

    # Average total rent; eight subplots are drawn in all
    plt.subplot(241)
    plt.title('总价平均价格', fontsize=25)
    plt.bar(city_name, total_avg_price)
    plt.ylabel('价格:元/月')
    for x, y in enumerate(total_avg_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Highest total rent
    plt.subplot(242)
    plt.title('总价最高价格', fontsize=25)
    plt.bar(city_name, total_high_price)
    plt.ylabel('价格:元/月')
    for x, y in enumerate(total_high_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Lowest total rent
    plt.subplot(243)
    plt.title('总价最低价格', fontsize=25)
    plt.bar(city_name, total_low_price)
    plt.ylabel('价格:元/月')
    for x, y in enumerate(total_low_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Median total rent
    plt.subplot(244)
    plt.title('总价中位数价格', fontsize=25)
    plt.bar(city_name, total_mid_price)
    plt.ylabel('价格:元/月')
    for x, y in enumerate(total_mid_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Average rent per unit area (y axis is yuan per square metre per month)
    plt.subplot(245)
    plt.title('单位面积均价', fontsize=25)
    plt.bar(city_name, space_avg_price)
    plt.ylabel('价格:元/㎡/月')
    for x, y in enumerate(space_avg_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Highest rent per unit area
    plt.subplot(246)
    plt.title('单位面积最高价格', fontsize=25)
    plt.bar(city_name, space_high_price)
    plt.ylabel('价格:元/㎡/月')
    for x, y in enumerate(space_high_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Lowest rent per unit area
    plt.subplot(247)
    plt.title('单位面积最低价格', fontsize=25)
    plt.bar(city_name, space_low_price)
    plt.ylabel('价格:元/㎡/月')
    for x, y in enumerate(space_low_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
    # Median rent per unit area
    plt.subplot(248)
    plt.title('单位面积中位数价格', fontsize=25)
    plt.bar(city_name, space_mid_price)
    plt.ylabel('价格:元/㎡/月')
    for x, y in enumerate(space_mid_price):
        plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')

    plt.show()

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/__init__.py
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/items.py:
--------------------------------------------------------------------------------
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class zufangitem(scrapy.Item):
    title = scrapy.Field()  # Title
    price = scrapy.Field()  # Monthly rent
    position0 = scrapy.Field()  # Address part 1
    position1 = scrapy.Field()  # Address part 2
    position2 = scrapy.Field()  # Address part 3
    information = scrapy.Field()  # Other information

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/middlewares.py:
--------------------------------------------------------------------------------
from scrapy import signals
from selenium import webdriver
from scrapy.http import HtmlResponse

# Only the downloader middleware needs changes; the spider middleware is left alone


class ZufangDownloaderMiddleware:
    # Open a browser automatically when the downloader middleware starts

    def __init__(self):
        self.driver = webdriver.Chrome()

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this method to create the middleware instance.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        # The next line is added by hand; it hooks up the function that closes the browser
        crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
        return s

    # Called for every request the spider sends; it returns the page source for that URL

    def process_request(self, request, spider):
        self.driver.get(request.url)  # Open the requested URL in the browser
        body = self.driver.page_source  # Get the rendered HTML source
        return HtmlResponse(url=self.driver.current_url, body=body, encoding='utf-8', request=request)

    def process_response(self, request, response, spider):
        return response

    def process_exception(self, request, exception, spider):
        pass

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)

    # This function is added by hand; it shuts the browser down

    def spider_closed(self, spider):
        # quit() ends the whole WebDriver session; close() would only close the current window
        self.driver.quit()
        spider.logger.info("Spider closed: %s" % spider.name)

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/pipelines.py:
--------------------------------------------------------------------------------
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import json


class zufangline(object):
    def open_spider(self, spider):
        # The output name is hard-coded per city; a failure to open the file is
        # allowed to propagate so the crawl does not continue without an output file.
        self.file = open('scrapy-xian-zufang.json', "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Write one JSON object per line
        dict_item = dict(item)
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/settings.py:
--------------------------------------------------------------------------------
# Scrapy settings for zufang project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = "zufang"

SPIDER_MODULES = ["zufang.spiders"]
NEWSPIDER_MODULE = "zufang.spiders"

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'python_qm (+http://www.yourdomain.com)'
#USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
#USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"

# Superseded by the DOWNLOADER_MIDDLEWARES assignment further down;
# middlewares.py does not define RandomUserAgentMiddleware.
#DOWNLOADER_MIDDLEWARES = {
#    # 'lianjia.middlewares.LianjiaDownloaderMiddleware': 543,
#    'zufang.middlewares.RandomUserAgentMiddleware': 900,
#}

# User-agent pool for a random user-agent middleware; nothing in middlewares.py reads it.
MY_USER_AGENT = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
]

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

LOG_LEVEL = 'WARNING'

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 8

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
RANDOMIZE_DOWNLOAD_DELAY = True
#CONCURRENT_REQUESTS_PER_DOMAIN = 2
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
#    "Accept-Language": "en",
#    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    "zufang.middlewares.ZufangDownloaderMiddleware": 543,
}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    "scrapy.extensions.telnet.TelnetConsole": None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {'zufang.pipelines.zufangline': 300, }
#ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = "httpcache"
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Set settings whose default value is deprecated to a future-proof value
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/spiders/__init__.py:
--------------------------------------------------------------------------------
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/spiders/spider1.py:
--------------------------------------------------------------------------------
import scrapy

from zufang.items import zufangitem


class Zufangspider(scrapy.spiders.Spider):
    name = "xian"  # Spider names used for each city: beijing, shanghai, guangzhou, shenzhen, xian
    allowed_domains = ["xa.lianjia.com"]  # Domain the spider is allowed to crawl
    # 100 listing pages in total, so one start URL is generated per page
    start_urls = ['https://xa.lianjia.com/zufang/pg{}/'.format(page) for page in range(1, 101)]

    custom_settings = {
        'ITEM_PIPELINES': {'zufang.pipelines.zufangline': 300},
    }

    def parse(self, response, **kwargs):
        div_list = response.xpath("//*[@id=\"content\"]/div[1]/div[1]/div")

        # Use XPath to pick the needed fields out of each listing
        for each in div_list:
            item = zufangitem()  # A fresh item per listing, so yielded items do not share state
            item['title'] = each.xpath("normalize-space(./div/p[1]/a/text())").extract_first()
            item['price'] = each.xpath("normalize-space(./div/span/em/text())").extract_first()
            item['position0'] = each.xpath("./div/p[2]/a[1]/text()").extract_first()
            item['position1'] = each.xpath("./div/p[2]/a[2]/text()").extract_first()
            item['position2'] = each.xpath("./div/p[2]/a[3]/text()").extract_first()
            item['information'] = each.xpath("normalize-space(./div/p[2])").extract_first()
            yield item

--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.docx
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.pdf
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/五个城市租房总价和单位面积价格分析.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/五个城市租房总价和单位面积价格分析.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和GDP的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和GDP的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和人均月薪的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和人均月薪的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和居室的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和居室的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和板块的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和板块的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和面向的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和面向的关系.png
--------------------------------------------------------------------------------
/租房数据分析实验报告-2021211338-郭柏彤.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/租房数据分析实验报告-2021211338-郭柏彤.docx
--------------------------------------------------------------------------------
/租房数据分析实验报告-2021211338-郭柏彤.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/租房数据分析实验报告-2021211338-郭柏彤.pdf
--------------------------------------------------------------------------------
/题目要求.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/题目要求.pdf
--------------------------------------------------------------------------------