├── house_outputGBK编码,可用excle打开,.csv
├── lianjia
│   ├── .idea
│   │   ├── .gitignore
│   │   ├── encodings.xml
│   │   ├── inspectionProfiles
│   │   │   └── profiles_settings.xml
│   │   ├── lianjia.iml
│   │   ├── misc.xml
│   │   ├── modules.xml
│   │   └── vcs.xml
│   ├── 2021211338-郭柏彤-爬虫小作业-源代码
│   │   ├── .idea
│   │   │   ├── .gitignore
│   │   │   ├── inspectionProfiles
│   │   │   │   └── profiles_settings.xml
│   │   │   ├── lianjia.iml
│   │   │   ├── misc.xml
│   │   │   └── modules.xml
│   │   ├── lianjia
│   │   │   ├── __init__.py
│   │   │   ├── begin.py
│   │   │   ├── items.py
│   │   │   ├── middlewares.py
│   │   │   ├── pipelines.py
│   │   │   ├── settings.py
│   │   │   └── spiders
│   │   │       ├── __init__.py
│   │   │       ├── spider1.py
│   │   │       └── spider2.py
│   │   └── scrapy.cfg
│   ├── 2021211338-郭柏彤-爬虫小作业-爬取的数据文件
│   │   ├── scrapy-test-firsthand.json
│   │   └── scrapy-test-secondhand.json
│   ├── 2021211338-郭柏彤-爬虫小作业-说明文档.docx
│   ├── datachange.py
│   ├── house_output.csv
│   ├── house_show.py
│   ├── house_show2.py
│   ├── house_show3.py
│   ├── lianjia
│   │   ├── .idea
│   │   │   ├── .gitignore
│   │   │   ├── inspectionProfiles
│   │   │   │   └── profiles_settings.xml
│   │   │   ├── lianjia.iml
│   │   │   ├── misc.xml
│   │   │   └── modules.xml
│   │   ├── __init__.py
│   │   ├── begin.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       ├── spider1.py
│   │       └── spider2.py
│   ├── scrapy-test-firsthand.json
│   ├── scrapy-test-secondhand.json
│   └── scrapy.cfg
├── test1_house
│   ├── datachange.py
│   ├── house_output.csv
│   ├── house_outputGBK编码,可用excle打开,.csv
│   ├── house_show.py
│   ├── house_show2.py
│   ├── house_show3.py
│   ├── scrapy-test-firsthand.json
│   ├── 单价-总价散点图绘制效果.png
│   ├── 单价直方图绘制效果.png
│   └── 总价直方图绘制效果.png
├── zufang
│   ├── .idea
│   │   ├── .gitignore
│   │   ├── inspectionProfiles
│   │   │   └── profiles_settings.xml
│   │   ├── misc.xml
│   │   ├── modules.xml
│   │   └── zufang.iml
│   ├── GDP_price_show.py
│   ├── chromedriver.exe
│   ├── face_price_show.py
│   ├── pos_price_show.py
│   ├── room_price_show.py
│   ├── salary_price_show.py
│   ├── scrapy-beijing-zufang.json
│   ├── scrapy-guangzhou-zufang.json
│   ├── scrapy-shanghai-zufang.json
│   ├── scrapy-shenzhen-zufang.json
│   ├── scrapy-xian-zufang.json
│   ├── scrapy.cfg
│   ├── total_price_show.py
│   └── zufang
│       ├── __init__.py
│       ├── items.py
│       ├── middlewares.py
│       ├── pipelines.py
│       ├── settings.py
│       └── spiders
│           ├── __init__.py
│           └── spider1.py
├── 实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤.zip
├── 实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤
│   ├── zufang
│   │   ├── .idea
│   │   │   ├── .gitignore
│   │   │   ├── inspectionProfiles
│   │   │   │   └── profiles_settings.xml
│   │   │   ├── misc.xml
│   │   │   ├── modules.xml
│   │   │   └── zufang.iml
│   │   ├── GDP_price_show.py
│   │   ├── chromedriver.exe
│   │   ├── face_price_show.py
│   │   ├── pos_price_show.py
│   │   ├── room_price_show.py
│   │   ├── salary_price_show.py
│   │   ├── scrapy-beijing-zufang.json
│   │   ├── scrapy-guangzhou-zufang.json
│   │   ├── scrapy-shanghai-zufang.json
│   │   ├── scrapy-shenzhen-zufang.json
│   │   ├── scrapy-xian-zufang.json
│   │   ├── scrapy.cfg
│   │   ├── total_price_show.py
│   │   └── zufang
│   │       ├── __init__.py
│   │       ├── items.py
│   │       ├── middlewares.py
│   │       ├── pipelines.py
│   │       ├── settings.py
│   │       └── spiders
│   │           ├── __init__.py
│   │           └── spider1.py
│   ├── 实验报告
│   │   ├── 租房数据分析实验报告-2021211338-郭柏彤.docx
│   │   └── 租房数据分析实验报告-2021211338-郭柏彤.pdf
│   ├── 爬取下来的数据
│   │   ├── scrapy-beijing-zufang.json
│   │   ├── scrapy-guangzhou-zufang.json
│   │   ├── scrapy-shanghai-zufang.json
│   │   ├── scrapy-shenzhen-zufang.json
│   │   └── scrapy-xian-zufang.json
│   └── 生成的图表
│       ├── 五个城市租房总价和单位面积价格分析.png
│       ├── 单位面积价格和GDP的关系.png
│       ├── 单位面积价格和人均月薪的关系.png
│       ├── 均价和居室的关系.png
│       ├── 均价和板块的关系.png
│       └── 均价和面向的关系.png
├── 租房数据分析实验报告-2021211338-郭柏彤.docx
├── 租房数据分析实验报告-2021211338-郭柏彤.pdf
└── 题目要求.pdf
/house_outputGBK编码,可用excle打开,.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/house_outputGBK编码,可用excle打开,.csv
--------------------------------------------------------------------------------
/lianjia/.idea/.gitignore:
--------------------------------------------------------------------------------
# Default ignored files
/shelf/
/workspace.xml
--------------------------------------------------------------------------------
/lianjia/.idea/encodings.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/.idea/lianjia.iml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/.idea/misc.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/.idea/modules.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/.idea/vcs.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/.gitignore:
--------------------------------------------------------------------------------
# Default ignored files
/shelf/
/workspace.xml
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/lianjia.iml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/misc.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/.idea/modules.xml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/__init__.py
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/begin.py:
--------------------------------------------------------------------------------
"""Run both Lianjia spiders in one process and stop the reactor when they finish."""
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

from lianjia.spiders.spider1 import firsthandspider
from lianjia.spiders.spider2 import secondhandspider

configure_logging()
# NOTE: CrawlerRunner() is created without the project settings, so values from
# settings.py (download delay, user-agent middleware) are not loaded here; only
# each spider's custom_settings are applied on top of Scrapy's defaults.
runner = CrawlerRunner()
runner.crawl(firsthandspider)
runner.crawl(secondhandspider)
d = runner.join()  # fires once both crawls have finished
d.addBoth(lambda _: reactor.stop())

reactor.run()  # blocks until reactor.stop() is called

# Earlier attempts, kept for reference (not executed). The spider names used
# below would have to be "lianjia1" and "lianjia2" to match the spiders' `name`.
#
# from scrapy import cmdline
#
# cmdline.execute("scrapy crawl spider1".split())
# cmdline.execute("scrapy crawl spider2".split())
#
# from scrapy.crawler import CrawlerProcess
# from scrapy.utils.project import get_project_settings
#
# settings = get_project_settings()
# crawler = CrawlerProcess(settings)
# crawler.crawl('spider1')
# crawler.crawl('spider2')
# crawler.start()
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/items.py:
--------------------------------------------------------------------------------
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class firsthanditem(scrapy.Item):
    """One new-build listing scraped by spider1."""
    name = scrapy.Field()
    position = scrapy.Field()
    types = scrapy.Field()
    houseType = scrapy.Field()
    space = scrapy.Field()
    unitPrice = scrapy.Field()
    totalPrice = scrapy.Field()


class secondhanditem(scrapy.Item):
    """One second-hand listing scraped by spider2."""
    name = scrapy.Field()
    position = scrapy.Field()
    types = scrapy.Field()
    unitPrice = scrapy.Field()
    totalPrice = scrapy.Field()
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/middlewares.py:
--------------------------------------------------------------------------------
# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

import random

from scrapy import signals

# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on every request."""

    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Load the MY_USER_AGENT list from settings.py
        s = cls(user_agents=crawler.settings.get('MY_USER_AGENT'))
        return s

    def process_request(self, request, spider):
        agent = random.choice(self.user_agents)
        request.headers['User-Agent'] = agent
        return None


class LianjiaSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request or item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)


class LianjiaDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either:
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info("Spider opened: %s" % spider.name)
--------------------------------------------------------------------------------
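The user-agent rotation done by RandomUserAgentMiddleware can be tried outside Scrapy. The sketch below is only an illustration under assumptions: `FakeRequest` and `pick_user_agent` are hypothetical names (not Scrapy classes or functions), and the two agent strings are copied from MY_USER_AGENT in settings.py.

```python
import random


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request; it only carries a headers dict."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


def pick_user_agent(request, user_agents, rng=random):
    """Set a randomly chosen User-Agent header, mirroring process_request()."""
    request.headers["User-Agent"] = rng.choice(user_agents)
    return request


# Two entries taken from MY_USER_AGENT in settings.py
AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]

req = pick_user_agent(FakeRequest("https://bj.lianjia.com/ershoufang/pg3/"), AGENTS)
print(req.headers["User-Agent"])
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which can help when debugging which agent a given request used.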
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/pipelines.py:
--------------------------------------------------------------------------------
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

import json

# useful for handling different item types with a single interface
from itemadapter import ItemAdapter


class _JsonLinesPipeline:
    """Write every item as one JSON object per line to `filename`."""

    filename = None  # set by each subclass

    def open_spider(self, spider):
        # Let an open() failure propagate: printing and continuing would only
        # defer the error to the first process_item() call (self.file unset).
        self.file = open(self.filename, "w", encoding="utf-8")

    def process_item(self, item, spider):
        dict_item = ItemAdapter(item).asdict()
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()


class firsthandline(_JsonLinesPipeline):
    filename = 'scrapy-test-firsthand.json'


class secondhandline(_JsonLinesPipeline):
    filename = 'scrapy-test-secondhand.json'
--------------------------------------------------------------------------------
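Both pipelines write one JSON object per line, so the resulting .json files are JSON Lines rather than a single JSON document, and calling json.load() on the whole file would fail. A minimal reading sketch follows; `read_json_lines` is a hypothetical helper name, and the two demo records reuse a few fields from scrapy-test-firsthand.json.

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo: write two records the way process_item() does, then read them back.
records = [
    {"name": "旭辉城", "types": "住宅", "unitPrice": "28500"},
    {"name": "檀香府", "types": "住宅", "unitPrice": "42000"},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    for record in records:
        tmp.write(json.dumps(record, ensure_ascii=False) + "\n")

loaded = list(read_json_lines(tmp.name))
os.remove(tmp.name)  # clean up the temporary demo file
print(len(loaded), "records read")
```

Reading line by line also keeps memory use flat if a crawl produces a large file.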
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/settings.py:
--------------------------------------------------------------------------------
# Scrapy settings for lianjia project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = "lianjia"

SPIDER_MODULES = ["lianjia.spiders"]
NEWSPIDER_MODULE = "lianjia.spiders"

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'python_qm (+http://www.yourdomain.com)'
#USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
#USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"

# The User-Agent header is set per request by RandomUserAgentMiddleware,
# which picks one entry from MY_USER_AGENT below.
DOWNLOADER_MIDDLEWARES = {
    # 'lianjia.middlewares.LianjiaDownloaderMiddleware': 543,
    'lianjia.middlewares.RandomUserAgentMiddleware': 900,
}

MY_USER_AGENT = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
]

# robots.txt rules are NOT obeyed in this project
ROBOTSTXT_OBEY = False

#LOG_LEVEL = "WARNING"

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 8

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
RANDOMIZE_DOWNLOAD_DELAY = True
#CONCURRENT_REQUESTS_PER_DOMAIN = 2
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
#    "Accept-Language": "en",
#    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    "lianjia.middlewares.LianjiaDownloaderMiddleware": 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    "scrapy.extensions.telnet.TelnetConsole": None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# Both pipelines are registered here; each spider narrows this to its own
# pipeline through custom_settings.
ITEM_PIPELINES = {
    'lianjia.pipelines.firsthandline': 300,
    'lianjia.pipelines.secondhandline': 300,
}
#ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = "httpcache"
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Set settings whose default value is deprecated to a future-proof value
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"
--------------------------------------------------------------------------------
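With DOWNLOAD_DELAY = 3 and RANDOMIZE_DOWNLOAD_DELAY = True, the Scrapy documentation describes the wait between requests to the same site as a random value between 0.5 and 1.5 times DOWNLOAD_DELAY. The sketch below only illustrates that range; `sample_delay` is a hypothetical helper and is not Scrapy's internal code.

```python
import random

DOWNLOAD_DELAY = 3  # value from settings.py


def sample_delay(base_delay, rng=random):
    """Draw a delay uniformly from [0.5 * base_delay, 1.5 * base_delay]."""
    return rng.uniform(0.5 * base_delay, 1.5 * base_delay)


# Draw many samples to see the spread: every value falls between 1.5 s and 4.5 s.
delays = [sample_delay(DOWNLOAD_DELAY) for _ in range(1000)]
print(f"min={min(delays):.2f}s max={max(delays):.2f}s")
```

For the five pages each spider requests, this works out to a few seconds of waiting per page, assuming these settings are actually loaded by the process that runs the crawl.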
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/spiders/__init__.py:
--------------------------------------------------------------------------------
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/spiders/spider1.py:
--------------------------------------------------------------------------------
import scrapy

from lianjia.items import firsthanditem


class firsthandspider(scrapy.Spider):
    """Scrape new-build listings from pages 3 to 7 of bj.fang.lianjia.com."""

    name = "lianjia1"
    allowed_domains = ["bj.fang.lianjia.com"]
    # range(3, 8) covers pages 3 through 7
    start_urls = [
        'https://bj.fang.lianjia.com/loupan/pg{}/'.format(page)
        for page in range(3, 8)
    ]

    custom_settings = {
        'ITEM_PIPELINES': {'lianjia.pipelines.firsthandline': 300},
    }

    def parse(self, response):
        for each in response.xpath("/html/body/div[3]/ul[2]/li"):
            # Build a fresh item per listing so yielded items never share state
            item = firsthanditem()
            desc = each.xpath('./div[@class="resblock-desc-wrapper"]')
            item['name'] = desc.xpath('./div[@class="resblock-name"]/a/text()').get()
            item['types'] = desc.xpath('./div[@class="resblock-name"]/span[@class="resblock-type"]/text()').get()
            item['position'] = desc.xpath('./div[@class="resblock-location"]/a/text()').get()
            item['houseType'] = desc.xpath('./a[@class="resblock-room"]/span/text()').get()
            item['space'] = desc.xpath('./div[@class="resblock-area"]/span/text()').get()
            item['unitPrice'] = desc.xpath('./div[@class="resblock-price"]/div[@class="main-price"]/span[@class="number"]/text()').get()
            item['totalPrice'] = desc.xpath('./div[@class="resblock-price"]/div[@class="second"]/text()').get()
            yield item
--------------------------------------------------------------------------------
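A generator such as parse() should build a fresh item for every listing: if one object is created once and refilled on each iteration, every yielded reference points at the same object, and a consumer that collects the results sees only the last listing's values. The sketch below shows the difference with plain dicts standing in for firsthanditem; `parse_shared` and `parse_fresh` are hypothetical names, and the three names come from scrapy-test-firsthand.json.

```python
def parse_shared(rows):
    """Reuse one dict for every row: all yielded references alias the same object."""
    item = {}
    for row in rows:
        item["name"] = row
        yield item


def parse_fresh(rows):
    """Build a new dict per row, so each yielded item keeps its own values."""
    for row in rows:
        yield {"name": row}


rows = ["北辰墅院1900", "燕西华府", "京西悦府"]
shared = list(parse_shared(rows))
fresh = list(parse_fresh(rows))
print([i["name"] for i in shared])  # the last row's name, three times
print([i["name"] for i in fresh])   # one distinct name per row
```

A pipeline that writes each item immediately would hide the aliasing, which is why it tends to surface only once items are buffered or post-processed.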
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/lianjia/spiders/spider2.py:
--------------------------------------------------------------------------------
import scrapy

from lianjia.items import secondhanditem


class secondhandspider(scrapy.Spider):
    """Scrape second-hand listings from pages 3 to 7 of bj.lianjia.com."""

    name = "lianjia2"
    allowed_domains = ["bj.lianjia.com"]
    # range(3, 8) covers pages 3 through 7
    start_urls = [
        'https://bj.lianjia.com/ershoufang/pg{}/'.format(page)
        for page in range(3, 8)
    ]

    custom_settings = {
        'ITEM_PIPELINES': {'lianjia.pipelines.secondhandline': 300},
    }

    def parse(self, response):
        for each in response.xpath('//*[@id="content"]/div[1]/ul/li'):
            # Build a fresh item per listing so yielded items never share state
            item = secondhanditem()
            info = each.xpath('./div[@class="info clear"]')
            item['name'] = info.xpath('./div[@class="flood"]/div/a[1]/text()').get()
            item['position'] = info.xpath('./div[@class="flood"]/div/a[2]/text()').get()
            item['types'] = info.xpath('./div[@class="address"]/div/text()').get()
            item['unitPrice'] = info.xpath('./div[@class="priceInfo"]/div[2]/span/text()').get()
            item['totalPrice'] = info.xpath('./div[@class="priceInfo"]/div[1]/span/text()').get()
            yield item
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-源代码/scrapy.cfg:
--------------------------------------------------------------------------------
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html

[settings]
default = lianjia.settings

[deploy]
#url = http://localhost:6800/
project = lianjia
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-爬取的数据文件/scrapy-test-firsthand.json:
--------------------------------------------------------------------------------
1 | {"name": "北辰墅院1900", "types": "住宅", "position": "顺兴街11号院望尊园", "houseType": "3室", "space": "建面 83-135㎡", "unitPrice": "36000", "totalPrice": "总价430(万/套)"}
2 | {"name": "燕西华府", "types": "别墅", "position": "王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)", "houseType": "3室", "space": "建面 350-851㎡", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"}
3 | {"name": "京西悦府", "types": "住宅", "position": "燕房线阎村地铁站东南角约189米", "houseType": null, "space": "建面 120-135㎡", "unitPrice": "33000", "totalPrice": "总价440(万/套)"}
4 | {"name": "福景苑", "types": "住宅", "position": "亮马桥路46号", "houseType": "1室", "space": "建面 145-268㎡", "unitPrice": "83000", "totalPrice": "总价1150-2400(万/套)"}
5 | {"name": "合景寰汇公馆", "types": "住宅", "position": "北京市通州区滨河中路西侧(合景寰汇公馆)", "houseType": "2室", "space": "建面 77-117㎡", "unitPrice": "35000", "totalPrice": "总价280-490(万/套)"}
6 | {"name": "K2十里春风", "types": "住宅", "position": "北京市通州区", "houseType": "2室", "space": "建面 74-90㎡", "unitPrice": "23500", "totalPrice": "总价188-212(万/套)"}
7 | {"name": "K2十里春风", "types": "别墅", "position": "北京市通州区", "houseType": "3室", "space": "建面 155-156㎡", "unitPrice": "28000", "totalPrice": "总价440-460(万/套)"}
8 | {"name": "玺萌壹號院", "types": "别墅", "position": "西南三环嘉园路与镇国寺北街交叉口", "houseType": "5室", "space": "建面 320-464㎡", "unitPrice": "90000", "totalPrice": "总价3650-3940(万/套)"}
9 | {"name": "北京书院", "types": "住宅", "position": "北京市朝阳区北土城东路辅路", "houseType": "1室", "space": "建面 79-139㎡", "unitPrice": "155000", "totalPrice": "总价1066(万/套)"}
10 | {"name": "中铁华侨城和园", "types": "住宅", "position": "南五环南海子公园西侧约500米", "houseType": "3室", "space": "建面 154-184㎡", "unitPrice": "60000", "totalPrice": "总价930-980(万/套)"}
11 | {"name": "顺鑫颐和天璟", "types": "住宅", "position": "北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心", "houseType": "4室", "space": "建面 110-220㎡", "unitPrice": "28000", "totalPrice": "总价400-420(万/套)"}
12 | {"name": "顺鑫颐和天璟", "types": "别墅", "position": "新城右堤路与昌金路交汇处向北200米", "houseType": "4室", "space": "建面 278-486㎡", "unitPrice": "28000", "totalPrice": "总价950-1200(万/套)"}
13 | {"name": "永旺19街", "types": "商业", "position": "地铁生物医药基地站向南200米", "houseType": null, "space": null, "unitPrice": "24000", "totalPrice": "总价299(万/套)"}
14 | {"name": "北京城建北京合院", "types": "住宅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "houseType": "3室", "space": "建面 95-130㎡", "unitPrice": "46000", "totalPrice": "总价556-566(万/套)"}
15 | {"name": "复地运河公馆", "types": "住宅", "position": "通州运河核心区临滨河西路", "houseType": "2室", "space": "建面 89-145㎡", "unitPrice": "43000", "totalPrice": "总价450-650(万/套)"}
16 | {"name": "北京城建北京合院", "types": "别墅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "houseType": "4室", "space": "建面 210-330㎡", "unitPrice": "39000", "totalPrice": "总价1000-1300(万/套)"}
17 | {"name": "月亮河七星公馆", "types": "住宅", "position": "通燕高速耿庄桥出口南200米月亮河,河滨路1号", "houseType": "1室", "space": "建面 55-109㎡", "unitPrice": "68000", "totalPrice": "总价374-800(万/套)"}
18 | {"name": "天润福熙大道", "types": "住宅", "position": "清河营东路1号院, 清河营东路3号院", "houseType": "1室", "space": "建面 65-374㎡", "unitPrice": "108000", "totalPrice": "总价750-3316(万/套)"}
19 | {"name": "京贸国际公馆", "types": "住宅", "position": "怡乐中路299号院(广渠快速路二期出口向南1000米)", "houseType": "1室", "space": "建面 72-147㎡", "unitPrice": "64000", "totalPrice": "总价495-950(万/套)"}
20 | {"name": "凯德麓语", "types": "别墅", "position": "兴寿镇京承高速G11出口向西怀昌路北侧", "houseType": "3室", "space": "建面 280-863㎡", "unitPrice": "35000", "totalPrice": "总价850-3450(万/套)"}
21 | {"name": "京贸国际城·峰景", "types": "住宅", "position": "芙蓉东路1号(通燕高速耿庄桥北出口向南300米)", "houseType": "1室", "space": "建面 69-140㎡", "unitPrice": "68000", "totalPrice": "总价460-980(万/套)"}
22 | {"name": "观唐云鼎", "types": "别墅", "position": "溪翁庄镇密溪路39号院(云佛山度假村对面)", "houseType": "3室", "space": "建面 346-613㎡", "unitPrice": "30000", "totalPrice": "总价1068-1850(万/套)"}
23 | {"name": "硅谷SOHO", "types": "商业类", "position": "京藏高速科技园出口(28出口)凉水河路", "houseType": "1室", "space": "建面 49-68㎡", "unitPrice": "20000", "totalPrice": "总价85-180(万/套)"}
24 | {"name": "旭辉城", "types": "住宅", "position": "北京市房山区良锦街6号院旭辉城营销中心", "houseType": "2室", "space": "建面 75-116㎡", "unitPrice": "28500", "totalPrice": "总价219-330(万/套)"}
25 | {"name": "檀香府", "types": "住宅", "position": "京潭大街与潭柘十街交叉口", "houseType": "3室", "space": "建面 124-170㎡", "unitPrice": "42000", "totalPrice": "总价530-750(万/套)"}
26 | {"name": "泰禾金府大院", "types": "别墅", "position": "南四环地铁新宫站南800米", "houseType": "4室", "space": "建面 362-504㎡", "unitPrice": "75000", "totalPrice": "总价2700-3700(万/套)"}
27 | {"name": "和棠瑞著", "types": "别墅", "position": "金海湖景区坝前广场西侧500米", "houseType": "3室", "space": "建面 305-360㎡", "unitPrice": "16000", "totalPrice": "总价530-560(万/套)"}
28 | {"name": "尊悦光华", "types": "住宅", "position": "北京市朝阳区光华东里甲1号院3号楼", "houseType": "3室", "space": "建面 133-171㎡", "unitPrice": "150000", "totalPrice": "总价2500(万/套)"}
29 | {"name": "首创·河著", "types": "别墅", "position": "京承高速11出口(昌金路)向东900 米路北", "houseType": "4室", "space": "建面 248-310㎡", "unitPrice": "38000", "totalPrice": "总价1200-1900(万/套)"}
30 | {"name": "华萃西山", "types": "住宅", "position": "永定镇地铁S1号线石厂西南700米", "houseType": "3室", "space": "建面 115-122㎡", "unitPrice": "48000", "totalPrice": "总价560-600(万/套)"}
31 | {"name": "京西悦府", "types": "别墅", "position": "北京市房山区燕房线阎村地铁站东南角约189米", "houseType": "3室", "space": "建面 175-176㎡", "unitPrice": "40000", "totalPrice": "总价700-780(万/套)"}
32 | {"name": "中粮天恒天悦壹号", "types": "别墅", "position": "南四环地铁新宫站南500米", "houseType": "4室", "space": "建面 220-340㎡", "unitPrice": "80000", "totalPrice": "总价2000-2360(万/套)"}
33 | {"name": "龙湾别墅", "types": "住宅", "position": "后沙峪镇龙湾别墅", "houseType": "4室", "space": "建面 218-317㎡", "unitPrice": "70000", "totalPrice": "总价2300(万/套)"}
34 | {"name": "京投发展·锦悦府", "types": "住宅", "position": "檀营乡檀东路西侧", "houseType": "3室", "space": "建面 90㎡", "unitPrice": "25607", "totalPrice": "总价220(万/套)"}
35 | {"name": "京投发展·锦悦府", "types": "别墅", "position": "檀营乡檀东路西侧", "houseType": "3室", "space": "建面 187-285㎡", "unitPrice": "25000", "totalPrice": "总价400-560(万/套)"}
36 | {"name": "金辰府", "types": "住宅", "position": "北京市昌平区北七家镇政府东南100米", "houseType": "3室", "space": "建面 89-143㎡", "unitPrice": "55000", "totalPrice": "总价490-790(万/套)"}
37 | {"name": "建邦·顺颐府", "types": "住宅", "position": "空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心", "houseType": "3室", "space": "建面 89-147㎡", "unitPrice": "55583", "totalPrice": "总价480-845(万/套)"}
38 | {"name": "葛洲坝中国府", "types": "住宅", "position": "北京市丰台东路46号", "houseType": "3室", "space": "建面 168-240㎡", "unitPrice": "125000", "totalPrice": "总价2200-3000(万/套)"}
39 | {"name": "华萃西山", "types": "别墅", "position": "门头沟永定镇地铁S1号线石厂站西南700米", "houseType": "4室", "space": "建面 135-245㎡", "unitPrice": "48000", "totalPrice": "总价760-1060(万/套)"}
40 | {"name": "北京东湾", "types": "住宅", "position": "通惠北路98号", "houseType": "1室", "space": "建面 58-130㎡", "unitPrice": "68500", "totalPrice": "总价410-900(万/套)"}
41 | {"name": "富兴首府", "types": "住宅", "position": "东坝路9号东北60米", "houseType": "3室", "space": "建面 144-356㎡", "unitPrice": "85000", "totalPrice": "总价1706-2240(万/套)"}
42 | {"name": "中铁诺德阅墅", "types": "别墅", "position": "顺义区后沙峪镇裕园路762乡龙湖滟澜山对面", "houseType": "4室", "space": "建面 235-320㎡", "unitPrice": "50000", "totalPrice": "总价1150-1700(万/套)"}
43 | {"name": "中铁华侨城和园", "types": "别墅", "position": "南五环南海子公园西侧约500米", "houseType": "4室", "space": "建面 288-370㎡", "unitPrice": "50000", "totalPrice": "总价1870(万/套)"}
44 | {"name": "懋源·璟岳", "types": "别墅", "position": "南三环西路99号院", "houseType": "4室", "space": "建面 465-590㎡", "unitPrice": "140000", "totalPrice": "总价6500-9000(万/套)"}
45 | {"name": "懋源·璟玺", "types": "别墅", "position": "孙河京密路与京平辅路交叉口西行1000米", "houseType": "5室", "space": "建面 500-716㎡", "unitPrice": "100000", "totalPrice": "总价4380-6778(万/套)"}
46 | {"name": "万科雲庐", "types": "住宅", "position": "魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置", "houseType": "4室", "space": "建面 104-300㎡", "unitPrice": "39000", "totalPrice": "总价656-820(万/套)"}
47 | {"name": "万科雲庐", "types": "别墅", "position": "魏各庄路万科雲庐", "houseType": "4室", "space": "建面 200-330㎡", "unitPrice": "30000", "totalPrice": "总价852-950(万/套)"}
48 | {"name": "金茂北京国际社区", "types": "住宅", "position": "顺义新城北小营昌金路水色时光路西", "houseType": "1室", "space": "建面 50-118㎡", "unitPrice": "30000", "totalPrice": "总价160-360(万/套)"}
49 | {"name": "住总如院", "types": "住宅", "position": "北京市大兴区采华路(波尔多小镇南区西南侧约250米)", "houseType": "2室", "space": "建面 98-233㎡", "unitPrice": "31136", "totalPrice": "总价280-475(万/套)"}
50 | {"name": "郎府书苑", "types": "住宅", "position": "西集镇京哈高速郎府出口南侧300米", "houseType": "3室", "space": "建面 89-116㎡", "unitPrice": "25800", "totalPrice": "总价273-300(万/套)"}
51 |
--------------------------------------------------------------------------------
/lianjia/2021211338-郭柏彤-爬虫小作业-说明文档.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/lianjia/2021211338-郭柏彤-爬虫小作业-说明文档.docx
--------------------------------------------------------------------------------
/lianjia/datachange.py:
--------------------------------------------------------------------------------
"""Convert the JSON Lines output of the first-hand spider to CSV and save it."""

import csv
import json


class Json_Csv():

    # Create the output CSV file and its writer.
    def __init__(self):
        self.save_csv = open('house_output.csv', 'w', encoding='utf-8', newline='')
        self.write_csv = csv.writer(self.save_csv, delimiter=',')  # comma-separated

    def trans(self, filename):
        with open(filename, 'r', encoding='utf-8') as f:  # one JSON object per line
            header_written = False
            for info in f:
                if not info.strip():  # skip blank lines
                    continue
                data = json.loads(info)
                if not header_written:  # the keys of the first record become the header row
                    self.write_csv.writerow(list(data.keys()))
                    header_written = True

                # Fields are looked up by key rather than by position, so the
                # conversion does not depend on how many fields the spider emits.

                # Area: keep only the minimum as an int, e.g. '建面 79-139㎡' -> 79.
                temp = data['space']
                if isinstance(temp, str):
                    list_temp = temp.split(' ')
                    list_temp = list_temp[1].split('-')
                    list_temp = list_temp[0].split('㎡')
                    data['space'] = int(list_temp[0])

                # Unit price as an int, in yuan.
                data['unitPrice'] = int(data['unitPrice'])

                # Total price: keep only the minimum as an int, in units of
                # 10,000 yuan, e.g. '总价930-980(万/套)' -> 930.
                temp = data['totalPrice']
                if isinstance(temp, str):
                    list_temp = temp.split('价')
                    list_temp = list_temp[1].split('-')
                    list_temp = list_temp[0].split('(')
                    data['totalPrice'] = int(list_temp[0])

                self.write_csv.writerow(list(data.values()))
        self.save_csv.close()  # close the CSV once everything is written


if __name__ == '__main__':
    json_csv = Json_Csv()
    path = 'scrapy-test-firsthand.json'
    json_csv.trans(path)
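
The chained `split` calls in datachange.py depend on the exact layout of strings such as '建面 79-139㎡' and '总价930-980(万/套)'. As a minimal sketch of an alternative, assuming the minimum of each range is always the first run of digits in the string, a regular expression can extract it. The helper name `min_number` is hypothetical and is not part of the repository.

```python
import re


def min_number(text):
    """Return the first run of digits in text as an int, or None.

    Hypothetical helper: it assumes the lower bound of a range such as
    '79-139' is the first number that appears in the string.
    """
    if not isinstance(text, str):  # the spider emits null for missing fields
        return None
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


if __name__ == '__main__':
    print(min_number('建面 79-139㎡'))
    print(min_number('总价930-980(万/套)'))
    print(min_number(None))
```

This version returns None for missing or digit-free values instead of raising, so the caller decides how to write them to the CSV. It truncates a decimal such as '79.5' to 79, which matches the integer columns used here.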
--------------------------------------------------------------------------------
/lianjia/house_output.csv:
--------------------------------------------------------------------------------
1 | name,types,position,position1,position2,houseType,space,unitPrice,totalPrice
2 | 北辰墅院1900,住宅,顺兴街11号院望尊园,顺义,马坡,3室,83,36000,430
3 | 燕西华府,别墅,"王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)",丰台,丰台其它,3室,350,47000,1400
4 | 京西悦府,住宅,燕房线阎村地铁站东南角约189米,房山,阎村,,120,33000,440
5 | 福景苑,住宅,亮马桥路46号,朝阳,燕莎,1室,145,83000,1150
6 | 合景寰汇公馆,住宅,北京市通州区滨河中路西侧(合景寰汇公馆),通州,武夷花园,2室,77,35000,280
7 | K2十里春风,住宅,北京市通州区,通州,通州其它,2室,74,23500,188
8 | K2十里春风,别墅,北京市通州区,通州,通州其它,3室,155,28000,440
9 | 玺萌壹號院,别墅,西南三环嘉园路与镇国寺北街交叉口,丰台,草桥,5室,320,90000,3650
10 | 北京书院,住宅,北京市朝阳区北土城东路辅路,朝阳,惠新西街,1室,79,155000,1066
11 | 中铁华侨城和园,住宅,南五环南海子公园西侧约500米,大兴,瀛海,3室,154,60000,930
12 | 顺鑫颐和天璟,住宅,北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心,顺义,顺义其它,4室,110,28000,400
13 | 顺鑫颐和天璟,别墅,新城右堤路与昌金路交汇处向北200米,顺义,顺义其它,4室,278,28000,950
14 | 永旺19街,商业,地铁生物医药基地站向南200米,大兴,天宫院,,,24000,299
15 | 北京城建北京合院,住宅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,3室,95,46000,556
16 | 复地运河公馆,住宅,通州运河核心区临滨河西路,通州,武夷花园,2室,89,43000,450
17 | 北京城建北京合院,别墅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,4室,210,39000,1000
18 | 月亮河七星公馆,住宅,通燕高速耿庄桥出口南200米月亮河,河滨路1号,通州,武夷花园,1室,55,68000,374
19 | 天润福熙大道,住宅,"清河营东路1号院, 清河营东路3号院",朝阳,北苑,1室,65,108000,750
20 | 京贸国际公馆,住宅,怡乐中路299号院(广渠快速路二期出口向南1000米),通州,九棵树(家乐福),1室,72,64000,495
21 | 凯德麓语,别墅,兴寿镇京承高速G11出口向西怀昌路北侧,昌平,昌平其它,3室,280,35000,850
22 | 京贸国际城·峰景,住宅,芙蓉东路1号(通燕高速耿庄桥北出口向南300米),通州,武夷花园,1室,69,68000,460
23 | 观唐云鼎,别墅,溪翁庄镇密溪路39号院(云佛山度假村对面),密云,溪翁庄镇,3室,346,30000,1068
24 | 旭辉城,住宅,北京市房山区良锦街6号院旭辉城营销中心,房山,房山其它,2室,75,28500,219
25 | 檀香府,住宅,京潭大街与潭柘十街交叉口,门头沟,门头沟其它,3室,124,42000,530
26 | 泰禾金府大院,别墅,南四环地铁新宫站南800米,丰台,新宫,4室,362,75000,2700
27 | 和棠瑞著,别墅,金海湖景区坝前广场西侧500米,平谷,平谷其它,3室,305,16000,530
28 | 尊悦光华,住宅,北京市朝阳区光华东里甲1号院3号楼,朝阳,CBD,3室,133,150000,2500
29 | 首创·河著,别墅,京承高速11出口(昌金路)向东900 米路北,顺义,顺义其它,4室,248,38000,1200
30 | 华萃西山,住宅,永定镇地铁S1号线石厂西南700米,门头沟,门头沟其它,3室,115,48000,560
31 | 京西悦府,别墅,北京市房山区燕房线阎村地铁站东南角约189米,房山,阎村,3室,175,40000,700
32 | 中粮天恒天悦壹号,别墅,南四环地铁新宫站南500米,丰台,新宫,4室,220,80000,2000
33 | 龙湾别墅,住宅,后沙峪镇龙湾别墅,顺义,中央别墅区,4室,218,70000,2300
34 | 京投发展·锦悦府,住宅,檀营乡檀东路西侧,密云,鼓楼街道,3室,90,25607,220
35 | 京投发展·锦悦府,别墅,檀营乡檀东路西侧,密云,鼓楼街道,3室,187,25000,400
36 | 金辰府,住宅,北京市昌平区北七家镇政府东南100米,昌平,北七家,3室,89,55000,490
37 | 建邦·顺颐府,住宅,空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心,顺义,后沙峪,3室,89,55583,480
38 | 葛洲坝中国府,住宅,北京市丰台东路46号,丰台,玉泉营,3室,168,125000,2200
39 | 华萃西山,别墅,门头沟永定镇地铁S1号线石厂站西南700米,门头沟,门头沟其它,4室,135,48000,760
40 | 富兴首府,住宅,东坝路9号东北60米,朝阳,东坝,3室,144,85000,1706
41 | 中铁诺德阅墅,别墅,顺义区后沙峪镇裕园路762乡龙湖滟澜山对面,顺义,中央别墅区,4室,235,50000,1150
42 | 中铁华侨城和园,别墅,南五环南海子公园西侧约500米,大兴,瀛海,4室,288,50000,1870
43 | 懋源·璟岳,别墅,南三环西路99号院,丰台,玉泉营,4室,465,140000,6500
44 | 合景泰富天汇,住宅,顺义区昌金路与通顺路交汇处,顺义,马坡,2室,70,33000,230
45 | 懋源·璟玺,别墅,孙河京密路与京平辅路交叉口西行1000米,朝阳,中央别墅区,5室,500,100000,4380
46 | 万科雲庐,住宅,魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置,丰台,丰台其它,4室,104,39000,656
47 | 万科雲庐,别墅,魏各庄路万科雲庐,丰台,丰台其它,4室,200,30000,852
48 | 金茂北京国际社区,住宅,顺义新城北小营昌金路水色时光路西,顺义,顺义其它,1室,50,30000,160
49 | 住总如院,住宅,北京市大兴区采华路(波尔多小镇南区西南侧约250米),大兴,大兴新机场洋房别墅区,2室,98,31136,280
50 | 郎府书苑,住宅,西集镇京哈高速郎府出口南侧300米,通州,通州其它,3室,89,25800,273
51 | 建邦·顺颐府,别墅,空港B区裕民大街30号,顺义,后沙峪,3室,270,55583,1300
52 |
--------------------------------------------------------------------------------
/lianjia/house_show.py:
--------------------------------------------------------------------------------
# Visualise the new-home data in the CSV file as a scatter plot of unit price against total price.

import matplotlib.pyplot as plt
import csv

filename = 'house_output.csv'
unit_price = []
total_price = []
house_type = []
with open(filename, "r", encoding='utf-8') as f:  # the CSV is UTF-8 encoded, so open it as UTF-8
    data = csv.reader(f)
    next(data, None)  # skip the header row
    for i in data:
        if not i:  # skip empty rows
            continue
        unit_price.append(int(i[7]))
        total_price.append(int(i[8]))
        house_type.append(i[1])

# Split the prices by property type: 1 = residence, 2 = villa, 3 = commercial.
up1 = []
up2 = []
up3 = []
tp1 = []
tp2 = []
tp3 = []
for i in range(len(unit_price)):
    if house_type[i] == '住宅':
        up1.append(unit_price[i])
        tp1.append(total_price[i])
    if house_type[i] == '别墅':
        up2.append(unit_price[i])
        tp2.append(total_price[i])
    if house_type[i] == '商业':
        up3.append(unit_price[i])
        tp3.append(total_price[i])


def sort_by_total_price(tp, up):
    # Insertion sort on total price that keeps each unit price paired with it.
    for i in range(len(tp)):
        cur_index = i
        # Check the index bound first so that tp[-1] is never compared.
        while cur_index - 1 >= 0 and tp[cur_index - 1] > tp[cur_index]:
            tp[cur_index], tp[cur_index - 1] = tp[cur_index - 1], tp[cur_index]
            up[cur_index], up[cur_index - 1] = up[cur_index - 1], up[cur_index]
            cur_index -= 1


sort_by_total_price(tp1, up1)
sort_by_total_price(tp2, up2)
sort_by_total_price(tp3, up3)
#print(unit_price)
#print(total_price)
#print(house_type)

color_list = ['#FF8C00', '#00FF00', '#0000FF']  # residence, villa, commercial
types = ['residence', 'villa', 'commercial']

plt.figure(figsize=(30, 10), dpi=70)
plt.title('total_price and unit_price for different type house')
plt.scatter(tp1, up1, s=30, c=color_list[0])
plt.scatter(tp2, up2, s=30, c=color_list[1])
plt.scatter(tp3, up3, s=30, c=color_list[2])
plt.xlabel('total_price/10000 yuan')
plt.ylabel('unit_price/yuan')
plt.legend(loc='lower right', title='house_type', labels=types)
plt.show()
--------------------------------------------------------------------------------
/lianjia/house_show2.py:
--------------------------------------------------------------------------------
# Plot a bar chart of the average unit price in each district.
import matplotlib.pyplot as plt
import csv

filename = 'house_output.csv'
unit_price = []
pos = []  # district of each listing
with open(filename, "r", encoding='utf-8') as f:  # the CSV is UTF-8 encoded, so open it as UTF-8
    data = csv.reader(f)
    next(data, None)  # skip the header row
    for i in data:
        if not i:  # skip empty rows
            continue
        unit_price.append(int(i[7]))
        pos.append(str(i[3]))

# Districts in plotting order; position_qu holds the matching pinyin labels.
districts = ['朝阳', '丰台', '顺义', '通州', '大兴', '昌平', '门头沟', '房山', '密云', '平谷']
position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu']
house_num = [0] * len(districts)  # number of listings per district
price_sum = [0] * len(districts)  # sum of unit prices, replaced by the average below

for i in range(len(pos)):
    if pos[i] in districts:
        k = districts.index(pos[i])
        house_num[k] = house_num[k] + 1
        price_sum[k] = price_sum[k] + unit_price[i]
print(house_num)

bins_num = []
count = 3
bins_num.append(count)
for k in range(len(districts)):
    if house_num[k] > 0:  # avoid dividing by zero for a district with no listings
        price_sum[k] = price_sum[k] / house_num[k]  # average unit price
    count = count + house_num[k]
    bins_num.append(count)

print(bins_num)
print(price_sum)
plt.figure(figsize=(30, 10), dpi=70)
plt.title('unit_price_show', fontsize=30)
plt.xlabel('position', fontsize=15)
plt.ylabel('avg_unit_price/yuan', fontsize=15)
for i in range(0, 10):
    # The bar width is proportional to the number of listings in the district.
    plt.bar(position_qu[i], price_sum[i], width=(house_num[i] / 15))
for x, y in zip(position_qu, price_sum):
    plt.text(x, y, '%d' % y, ha='center', va='bottom', fontsize=15)
plt.show()
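
house_show2.py and house_show3.py both count listings per district and then divide a running sum by that count. As a minimal sketch of the same aggregation, assuming the goal is only a per-district mean, a dictionary keyed by district avoids the fixed list of ten districts and never divides by zero, because a district appears only once it has at least one listing. The function name `average_by_group` is hypothetical; the sample values are taken from three rows of house_output.csv.

```python
from collections import defaultdict


def average_by_group(groups, values):
    """Return {group: mean of its values}; hypothetical helper, not in the repo."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for group, value in zip(groups, values):
        totals[group] += value
        counts[group] += 1
    # Every key in totals has a count of at least 1, so the division is safe.
    return {group: totals[group] / counts[group] for group in totals}


if __name__ == '__main__':
    sample_pos = ['朝阳', '丰台', '朝阳']
    sample_price = [83000, 47000, 155000]
    print(average_by_group(sample_pos, sample_price))
```

The returned dictionary keeps the districts in first-seen order, so its keys and values can be passed to `plt.bar` directly if the plotting order does not matter.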
--------------------------------------------------------------------------------
/lianjia/house_show3.py:
--------------------------------------------------------------------------------
# Plot a bar chart of the average total price in each district.
import matplotlib.pyplot as plt
import csv

filename = 'house_output.csv'
total_price = []
pos = []  # district of each listing
with open(filename, "r", encoding='utf-8') as f:  # the CSV is UTF-8 encoded, so open it as UTF-8
    data = csv.reader(f)
    next(data, None)  # skip the header row
    for i in data:
        if not i:  # skip empty rows
            continue
        total_price.append(int(i[8]))
        pos.append(str(i[3]))

# Districts in plotting order; position_qu holds the matching pinyin labels.
districts = ['朝阳', '丰台', '顺义', '通州', '大兴', '昌平', '门头沟', '房山', '密云', '平谷']
position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu']
house_num = [0] * len(districts)  # number of listings per district
price_sum = [0] * len(districts)  # sum of total prices, replaced by the average below

for i in range(len(pos)):
    if pos[i] in districts:
        k = districts.index(pos[i])
        house_num[k] = house_num[k] + 1
        price_sum[k] = price_sum[k] + total_price[i]
print(house_num)

bins_num = []
count = 3
bins_num.append(count)
for k in range(len(districts)):
    if house_num[k] > 0:  # avoid dividing by zero for a district with no listings
        price_sum[k] = price_sum[k] / house_num[k]  # average total price
    count = count + house_num[k]
    bins_num.append(count)

print(bins_num)
print(price_sum)
plt.figure(figsize=(30, 10), dpi=70)
plt.title('total_price_show', fontsize=30)
plt.xlabel('position', fontsize=15)
plt.ylabel('avg_total_price/10000 yuan', fontsize=15)
for i in range(0, 10):
    # The bar width is proportional to the number of listings in the district.
    plt.bar(position_qu[i], price_sum[i], width=(house_num[i] / 15))
for x, y in zip(position_qu, price_sum):
    plt.text(x, y, '%d' % y, ha='center', va='bottom', fontsize=15)
plt.show()
--------------------------------------------------------------------------------
/lianjia/lianjia/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/lianjia/lianjia/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
--------------------------------------------------------------------------------
/lianjia/lianjia/.idea/lianjia.iml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/lianjia/lianjia/.idea/misc.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
--------------------------------------------------------------------------------
/lianjia/lianjia/.idea/modules.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/lianjia/lianjia/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/lianjia/lianjia/__init__.py
--------------------------------------------------------------------------------
/lianjia/lianjia/begin.py:
--------------------------------------------------------------------------------
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

from lianjia.spiders.spider1 import firsthandspider
from lianjia.spiders.spider2 import secondhandspider

configure_logging()
# Pass the project settings explicitly: a bare CrawlerRunner() does not load
# settings.py, so the project's middlewares and download delay would not apply.
runner = CrawlerRunner(get_project_settings())
runner.crawl(firsthandspider)
runner.crawl(secondhandspider)
d = runner.join()  # fires once both crawls have finished
d.addBoth(lambda _: reactor.stop())

reactor.run()  # blocks here until both spiders are done
"""
from scrapy import cmdline

cmdline.execute("scrapy crawl spider1".split())
cmdline.execute("scrapy crawl spider2".split())

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()

crawler = CrawlerProcess(settings)

crawler.crawl('spider1')
crawler.crawl('spider2')

crawler.start()"""
--------------------------------------------------------------------------------
/lianjia/lianjia/items.py:
--------------------------------------------------------------------------------
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class firsthanditem(scrapy.Item):
    name = scrapy.Field()
    position = scrapy.Field()
    position1 = scrapy.Field()
    position2 = scrapy.Field()
    types = scrapy.Field()
    houseType = scrapy.Field()
    space = scrapy.Field()
    unitPrice = scrapy.Field()
    totalPrice = scrapy.Field()


class secondhanditem(scrapy.Item):
    name = scrapy.Field()
    position = scrapy.Field()
    types = scrapy.Field()
    unitPrice = scrapy.Field()
    totalPrice = scrapy.Field()
--------------------------------------------------------------------------------
/lianjia/lianjia/middlewares.py:
--------------------------------------------------------------------------------
1 | # Define here the models for your spider middleware
2 | #
3 | # See documentation in:
4 | # https://docs.scrapy.org/en/latest/topics/spider-middleware.html
5 |
6 | from scrapy import signals
7 |
8 | # useful for handling different item types with a single interface
9 | from itemadapter import is_item, ItemAdapter
10 |
11 |
12 | import random
13 | class RandomUserAgentMiddleware(object):
14 | def __init__(self, user_agents):
15 | self.user_agents = user_agents
16 |
17 | @classmethod
18 | def from_crawler(cls, crawler):
19 | # Read MY_USER_AGENT from settings.py
20 | s = cls(user_agents=crawler.settings.get('MY_USER_AGENT'))
21 | return s
22 |
23 | def process_request(self, request, spider):
24 | agent = random.choice(self.user_agents)
25 | request.headers['User-Agent'] = agent
26 | return None
27 |
28 |
29 | class LianjiaSpiderMiddleware:
30 | # Not all methods need to be defined. If a method is not defined,
31 | # scrapy acts as if the spider middleware does not modify the
32 | # passed objects.
33 |
34 | @classmethod
35 | def from_crawler(cls, crawler):
36 | # This method is used by Scrapy to create your spiders.
37 | s = cls()
38 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
39 | return s
40 |
41 | def process_spider_input(self, response, spider):
42 | # Called for each response that goes through the spider
43 | # middleware and into the spider.
44 |
45 | # Should return None or raise an exception.
46 | return None
47 |
48 | def process_spider_output(self, response, result, spider):
49 | # Called with the results returned from the Spider, after
50 | # it has processed the response.
51 |
52 | # Must return an iterable of Request, or item objects.
53 | for i in result:
54 | yield i
55 |
56 | def process_spider_exception(self, response, exception, spider):
57 | # Called when a spider or process_spider_input() method
58 | # (from other spider middleware) raises an exception.
59 |
60 | # Should return either None or an iterable of Request or item objects.
61 | pass
62 |
63 | def process_start_requests(self, start_requests, spider):
64 | # Called with the start requests of the spider, and works
65 | # similarly to the process_spider_output() method, except
66 | # that it doesn’t have a response associated.
67 |
68 | # Must return only requests (not items).
69 | for r in start_requests:
70 | yield r
71 |
72 | def spider_opened(self, spider):
73 | spider.logger.info("Spider opened: %s" % spider.name)
74 |
75 |
76 | class LianjiaDownloaderMiddleware:
77 | # Not all methods need to be defined. If a method is not defined,
78 | # scrapy acts as if the downloader middleware does not modify the
79 | # passed objects.
80 |
81 | @classmethod
82 | def from_crawler(cls, crawler):
83 | # This method is used by Scrapy to create your spiders.
84 | s = cls()
85 | crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
86 | return s
87 |
88 | def process_request(self, request, spider):
89 | # Called for each request that goes through the downloader
90 | # middleware.
91 |
92 | # Must either:
93 | # - return None: continue processing this request
94 | # - or return a Response object
95 | # - or return a Request object
96 | # - or raise IgnoreRequest: process_exception() methods of
97 | # installed downloader middleware will be called
98 | return None
99 |
100 | def process_response(self, request, response, spider):
101 | # Called with the response returned from the downloader.
102 |
103 | # Must either;
104 | # - return a Response object
105 | # - return a Request object
106 | # - or raise IgnoreRequest
107 | return response
108 |
109 | def process_exception(self, request, exception, spider):
110 | # Called when a download handler or a process_request()
111 | # (from other downloader middleware) raises an exception.
112 |
113 | # Must either:
114 | # - return None: continue processing this exception
115 | # - return a Response object: stops process_exception() chain
116 | # - return a Request object: stops process_exception() chain
117 | pass
118 |
119 | def spider_opened(self, spider):
120 | spider.logger.info("Spider opened: %s" % spider.name)
121 |
--------------------------------------------------------------------------------
/lianjia/lianjia/pipelines.py:
--------------------------------------------------------------------------------
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import json


class firsthandline(object):
    def open_spider(self, spider):
        # Let a failed open() propagate: swallowing it would leave self.file
        # undefined and fail later in process_item with an AttributeError.
        self.file = open('scrapy-test-firsthand.json', "w", encoding="utf-8")

    def process_item(self, item, spider):
        dict_item = dict(item)
        # One JSON object per line; ensure_ascii=False keeps Chinese text readable.
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()


class secondhandline(object):
    def open_spider(self, spider):
        # Same reasoning as in firsthandline.open_spider.
        self.file = open('scrapy-test-secondhand.json', "w", encoding="utf-8")

    def process_item(self, item, spider):
        dict_item = dict(item)
        json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
        self.file.write(json_str)
        return item

    def close_spider(self, spider):
        self.file.close()
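
The two pipeline classes above differ only in the output file name. As a minimal sketch, assuming the output path is the only thing that varies, the shared logic can live in one class that takes the path as an argument. `JsonLinesPipeline` and `demo` are hypothetical names; the class is plain Python and is not wired into Scrapy here (doing so would need a `from_crawler` hook or one small subclass per file).

```python
import json
import os
import tempfile


class JsonLinesPipeline:
    """Hypothetical stand-in that writes one JSON object per line."""

    def __init__(self, path):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        self.file = open(self.path, 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable in the output file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()


def demo():
    """Write one sample item to a temporary file and read it back."""
    path = os.path.join(tempfile.mkdtemp(), 'demo.json')
    pipeline = JsonLinesPipeline(path)
    pipeline.open_spider(spider=None)
    pipeline.process_item({'name': '北京书院', 'unitPrice': '155000'}, spider=None)
    pipeline.close_spider(spider=None)
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f]


if __name__ == '__main__':
    print(demo())
```

Because each line is a complete JSON object, the file can be read back line by line, which is how datachange.py consumes it.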
--------------------------------------------------------------------------------
/lianjia/lianjia/settings.py:
--------------------------------------------------------------------------------
1 | # Scrapy settings for lianjia project
2 | #
3 | # For simplicity, this file contains only settings considered important or
4 | # commonly used. You can find more settings consulting the documentation:
5 | #
6 | # https://docs.scrapy.org/en/latest/topics/settings.html
7 | # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
8 | # https://docs.scrapy.org/en/latest/topics/spider-middleware.html
9 |
10 | BOT_NAME = "lianjia"
11 | #2403:a200:a200:13f1:183:84:18:11
12 |
13 | SPIDER_MODULES = ["lianjia.spiders"]
14 | NEWSPIDER_MODULE = "lianjia.spiders"
15 |
16 | # Crawl responsibly by identifying yourself (and your website) on the user-agent
17 | #USER_AGENT = 'python_qm (+http://www.yourdomain.com)'
18 | #USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
19 | #USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
20 | DOWNLOADER_MIDDLEWARES = {
21 | # 'lianjia.middlewares.LianjiaDownloaderMiddleware': 543,
22 | 'lianjia.middlewares.RandomUserAgentMiddleware': 900,
23 | }
24 |
25 | MY_USER_AGENT = [
26 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
27 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
28 | "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
29 | "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
30 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
31 | "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
32 | "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
33 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
34 | "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
35 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
36 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
37 | "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
38 | "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
39 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
40 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
41 | "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
42 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
43 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
44 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
45 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
46 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
47 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
48 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
49 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
50 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
51 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
52 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
53 | "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
54 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
55 | "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
56 | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
57 | "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
58 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
59 | "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
60 | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
61 | ]
62 |
63 | # Do not obey robots.txt rules (the Scrapy project template sets this to True)
64 | ROBOTSTXT_OBEY = False
65 |
66 | #LOG_LEVEL = 'WARNING'
67 |
68 | #LOG_LEVEL = "WARNING"
69 | # Configure maximum concurrent requests performed by Scrapy (default: 16)
70 | #CONCURRENT_REQUESTS = 8
71 |
72 | # Configure a delay for requests for the same website (default: 0)
73 | # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
74 | # See also autothrottle settings and docs
75 | DOWNLOAD_DELAY = 3
76 | RANDOMIZE_DOWNLOAD_DELAY = True
77 | #CONCURRENT_REQUESTS_PER_DOMAIN = 2
78 | # The download delay setting will honor only one of:
79 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16
80 | #CONCURRENT_REQUESTS_PER_IP = 16
81 |
82 | # Disable cookies (enabled by default)
83 | #COOKIES_ENABLED = False
84 |
85 | # Disable Telnet Console (enabled by default)
86 | #TELNETCONSOLE_ENABLED = False
87 |
88 | # Override the default request headers:
89 | #DEFAULT_REQUEST_HEADERS = {
90 | # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
91 | # "Accept-Language": "en",
92 | # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
93 | #}
94 |
95 | # Enable or disable spider middlewares
96 | # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
97 | #SPIDER_MIDDLEWARES = {
98 | # "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
99 | #}
100 |
101 | # Enable or disable downloader middlewares
102 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
103 | #DOWNLOADER_MIDDLEWARES = {
104 | # "lianjia.middlewares.LianjiaDownloaderMiddleware": 543,
105 | #}
106 |
107 | # Enable or disable extensions
108 | # See https://docs.scrapy.org/en/latest/topics/extensions.html
109 | #EXTENSIONS = {
110 | # "scrapy.extensions.telnet.TelnetConsole": None,
111 | #}
112 |
113 | # Configure item pipelines
114 | # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
115 | ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300, 'lianjia.pipelines.secondhandline': 300}
116 | # Each spider overrides ITEM_PIPELINES in its custom_settings, so only its own pipeline runs.
117 | # Enable and configure the AutoThrottle extension (disabled by default)
118 | # See https://docs.scrapy.org/en/latest/topics/autothrottle.html
119 | #AUTOTHROTTLE_ENABLED = True
120 | # The initial download delay
121 | #AUTOTHROTTLE_START_DELAY = 5
122 | # The maximum download delay to be set in case of high latencies
123 | #AUTOTHROTTLE_MAX_DELAY = 60
124 | # The average number of requests Scrapy should be sending in parallel to
125 | # each remote server
126 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
127 | # Enable showing throttling stats for every response received:
128 | #AUTOTHROTTLE_DEBUG = False
129 |
130 | # Enable and configure HTTP caching (disabled by default)
131 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
132 | #HTTPCACHE_ENABLED = True
133 | #HTTPCACHE_EXPIRATION_SECS = 0
134 | #HTTPCACHE_DIR = "httpcache"
135 | #HTTPCACHE_IGNORE_HTTP_CODES = []
136 | #HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
137 |
138 | # Set settings whose default value is deprecated to a future-proof value
139 | REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
140 | TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
141 | FEED_EXPORT_ENCODING = "utf-8"
142 |
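A note on the throttling settings above: with `DOWNLOAD_DELAY = 3` and `RANDOMIZE_DOWNLOAD_DELAY = True`, the Scrapy documentation describes the wait between requests to the same site as a random value between 0.5 and 1.5 times the configured delay. The sketch below only illustrates that arithmetic. `randomized_delay` is a hypothetical helper written for this note, not a Scrapy function.

```python
import random

DOWNLOAD_DELAY = 3  # value taken from settings.py above


def randomized_delay(base_delay, rng=random):
    """Hypothetical helper: draw one wait time in [0.5 * base, 1.5 * base]."""
    return rng.uniform(0.5 * base_delay, 1.5 * base_delay)


if __name__ == "__main__":
    # With a base delay of 3 seconds every sample lies between 1.5 and 4.5 seconds.
    samples = [randomized_delay(DOWNLOAD_DELAY) for _ in range(5)]
    print(["{:.2f}s".format(s) for s in samples])
```

Under these settings the five listing pages of each spider are therefore fetched roughly 1.5 to 4.5 seconds apart.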
--------------------------------------------------------------------------------
/lianjia/lianjia/spiders/__init__.py:
--------------------------------------------------------------------------------
1 | # This package will contain the spiders of your Scrapy project
2 | #
3 | # Please refer to the documentation for information on how to create and manage
4 | # your spiders.
5 |
--------------------------------------------------------------------------------
/lianjia/lianjia/spiders/spider1.py:
--------------------------------------------------------------------------------
1 | import scrapy
2 | from scrapy import Selector, Request
3 | from scrapy.http import HtmlResponse
4 | from lianjia.items import firsthanditem
5 | class firsthandspider(scrapy.spiders.Spider):
6 |     name = "lianjia1"
7 |     allowed_domains = ["bj.fang.lianjia.com"]
8 |     start_urls = []
9 |     for page in range(3, 8):
10 |         url1 = 'https://bj.fang.lianjia.com/loupan/pg{}/'.format(page)
11 |         start_urls.append(url1)
12 |     #for page in range(3, 8):
13 |     #    url1 = 'https://bj.lianjia.com/ershoufang/pg{}/'.format(page)
14 |     #    start_urls.append(url1)
15 |
16 |     custom_settings = {
17 |         'ITEM_PIPELINES': {'lianjia.pipelines.firsthandline': 300},
18 |     }
19 |
20 |     def parse(self, response):
21 |         # Each <li> in the result list describes one new-home development.
22 |         div_list = response.xpath("/html/body/div[3]/ul[2]/li")
23 |         #div_list = response.xpath("//*")
24 |         #print(div_list)
25 |         for each in div_list:
26 |             # Build a fresh item per listing so yielded items never share state.
27 |             item = firsthanditem()
28 |             item['name'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-name\"]/a/text()").extract_first()
29 |             item['types'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-name\"]/span[@class=\"resblock-type\"]/text()").extract_first()
30 |             item['position'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-location\"]/a/text()").extract_first()
31 |             item['position1'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-location\"]/span[1]/text()").extract_first()
32 |             item['position2'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-location\"]/span[2]/text()").extract_first()
33 |             item['houseType'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/a[@class=\"resblock-room\"]/span/text()").extract_first()
34 |             item['space'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-area\"]/span/text()").extract_first()
35 |             item['unitPrice'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-price\"]/div[@class=\"main-price\"]/span[@class = \"number\"]/text()").extract_first()
36 |             item['totalPrice'] = each.xpath("./div[@class=\"resblock-desc-wrapper\"]/div[@class=\"resblock-price\"]/div[@class=\"second\"]/text()").extract_first()
37 |             yield item
38 |
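A parse callback that yields inside a loop generally wants a new item object on every iteration. The sketch below uses plain dicts as stand-ins for Scrapy items (an assumption made so the example runs without Scrapy) to show what can happen when one object is reused: anything that keeps references to the yielded objects ends up seeing only the last row. The function names are hypothetical.

```python
def parse_reusing_one_item(rows):
    """Mimics a callback that creates one container and mutates it per row."""
    item = {}
    for row in rows:
        item["name"] = row
        yield item  # the same dict object is handed out every time


def parse_with_fresh_items(rows):
    """Creates a new container per row, so each yielded item is independent."""
    for row in rows:
        item = {}
        item["name"] = row
        yield item


if __name__ == "__main__":
    rows = ["北辰墅院1900", "燕西华府"]
    # Collecting the shared-object generator yields two references to one dict.
    print(list(parse_reusing_one_item(rows)))
    # The fresh-object generator keeps both names.
    print(list(parse_with_fresh_items(rows)))
```

Whether the shared object causes visible problems in a real crawl depends on how downstream components hold on to items, so treat this as an illustration of the hazard rather than a description of Scrapy internals.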
--------------------------------------------------------------------------------
/lianjia/lianjia/spiders/spider2.py:
--------------------------------------------------------------------------------
1 | import scrapy
2 | from scrapy import Selector, Request
3 | from scrapy.http import HtmlResponse
4 | from lianjia.items import secondhanditem
5 | class secondhandspider(scrapy.spiders.Spider):
6 |     name = "lianjia2"
7 |     allowed_domains = ["bj.lianjia.com"]
8 |     start_urls = []
9 |     for page in range(3, 8):
10 |         url1 = 'https://bj.lianjia.com/ershoufang/pg{}/'.format(page)
11 |         start_urls.append(url1)
12 |
13 |     custom_settings = {
14 |         'ITEM_PIPELINES': {'lianjia.pipelines.secondhandline': 300},
15 |     }
16 |
17 |     def parse(self, response):
18 |         # Each <li> in the result list describes one second-hand listing.
19 |         div_list = response.xpath("//*[@id=\"content\"]/div[1]/ul/li")
20 |         #print(div_list)
21 |         for each in div_list:
22 |             # Build a fresh item per listing so yielded items never share state.
23 |             item = secondhanditem()
24 |             item['name'] = each.xpath("./div[@class=\"info clear\"]/div[@class=\"flood\"]/div/a[1]/text()").extract_first()
25 |             item['position'] = each.xpath("./div[@class=\"info clear\"]/div[@class=\"flood\"]/div/a[2]/text()").extract_first()
26 |             item['types'] = each.xpath("./div[@class=\"info clear\"]/div[@class=\"address\"]/div/text()").extract_first()
27 |             item['unitPrice'] = each.xpath("./div[@class=\"info clear\"]/div[@class=\"priceInfo\"]/div[2]/span/text()").extract_first()
28 |             item['totalPrice'] = each.xpath("./div[@class=\"info clear\"]/div[@class=\"priceInfo\"]/div[1]/span/text()").extract_first()
29 |             yield item
30 |
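The spider above stores `types` and `unitPrice` as raw page text, for example `"2室1厅 | 65.64平米 | 东南 | 简装 | 低楼层(共16层) | 1991年 | 塔楼"` and `"83,791元/平"`. A minimal sketch of how such strings could be normalized before analysis follows. The helper names and the output keys are hypothetical, and the sketch assumes that layout and floor area are always the first two `|`-separated parts, which holds for the scraped samples but is not guaranteed by the site.

```python
import re


def parse_unit_price(text):
    """Turn a string like '83,791元/平' into an int (yuan per square metre)."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def parse_types(text):
    """Split the '|'-separated description into layout, area and build year."""
    parts = [p.strip() for p in (text or "").split("|")]
    layout = parts[0] if parts[0] else None
    area = None
    if len(parts) > 1:
        m = re.search(r"\d+(?:\.\d+)?", parts[1])
        area = float(m.group()) if m else None
    year = None
    for part in parts:
        # The build year is optional and appears as e.g. '1991年' when present.
        m = re.fullmatch(r"(\d{4})年", part)
        if m:
            year = int(m.group(1))
    return {"layout": layout, "area_m2": area, "year_built": year}


if __name__ == "__main__":
    raw = "2室1厅 | 65.64平米 | 东南 | 简装 | 低楼层(共16层) | 1991年 | 塔楼"
    print(parse_types(raw))
    print(parse_unit_price("83,791元/平"))
```

The `name` values also carry a trailing space in the scraped data, so calling `.strip()` on them before grouping by name is likely worthwhile.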
--------------------------------------------------------------------------------
/lianjia/scrapy-test-firsthand.json:
--------------------------------------------------------------------------------
1 | {"name": "北辰墅院1900", "types": "住宅", "position": "顺兴街11号院望尊园", "position1": "顺义", "position2": "马坡", "houseType": "3室", "space": "建面 83-135㎡", "unitPrice": "36000", "totalPrice": "总价430(万/套)"}
2 | {"name": "燕西华府", "types": "别墅", "position": "王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)", "position1": "丰台", "position2": "丰台其它", "houseType": "3室", "space": "建面 350-851㎡", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"}
3 | {"name": "京西悦府", "types": "住宅", "position": "燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": null, "space": "建面 120-135㎡", "unitPrice": "33000", "totalPrice": "总价440(万/套)"}
4 | {"name": "福景苑", "types": "住宅", "position": "亮马桥路46号", "position1": "朝阳", "position2": "燕莎", "houseType": "1室", "space": "建面 145-268㎡", "unitPrice": "83000", "totalPrice": "总价1150-2400(万/套)"}
5 | {"name": "合景寰汇公馆", "types": "住宅", "position": "北京市通州区滨河中路西侧(合景寰汇公馆)", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 77-117㎡", "unitPrice": "35000", "totalPrice": "总价280-490(万/套)"}
6 | {"name": "K2十里春风", "types": "住宅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "2室", "space": "建面 74-90㎡", "unitPrice": "23500", "totalPrice": "总价188-212(万/套)"}
7 | {"name": "K2十里春风", "types": "别墅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 155-156㎡", "unitPrice": "28000", "totalPrice": "总价440-460(万/套)"}
8 | {"name": "玺萌壹號院", "types": "别墅", "position": "西南三环嘉园路与镇国寺北街交叉口", "position1": "丰台", "position2": "草桥", "houseType": "5室", "space": "建面 320-464㎡", "unitPrice": "90000", "totalPrice": "总价3650-3940(万/套)"}
9 | {"name": "北京书院", "types": "住宅", "position": "北京市朝阳区北土城东路辅路", "position1": "朝阳", "position2": "惠新西街", "houseType": "1室", "space": "建面 79-139㎡", "unitPrice": "155000", "totalPrice": "总价1066(万/套)"}
10 | {"name": "中铁华侨城和园", "types": "住宅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "3室", "space": "建面 154-184㎡", "unitPrice": "60000", "totalPrice": "总价930-980(万/套)"}
11 | {"name": "顺鑫颐和天璟", "types": "住宅", "position": "北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 110-220㎡", "unitPrice": "28000", "totalPrice": "总价400-420(万/套)"}
12 | {"name": "顺鑫颐和天璟", "types": "别墅", "position": "新城右堤路与昌金路交汇处向北200米", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 278-486㎡", "unitPrice": "28000", "totalPrice": "总价950-1200(万/套)"}
13 | {"name": "永旺19街", "types": "商业", "position": "地铁生物医药基地站向南200米", "position1": "大兴", "position2": "天宫院", "houseType": null, "space": null, "unitPrice": "24000", "totalPrice": "总价299(万/套)"}
14 | {"name": "北京城建北京合院", "types": "住宅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "3室", "space": "建面 95-130㎡", "unitPrice": "46000", "totalPrice": "总价556-566(万/套)"}
15 | {"name": "复地运河公馆", "types": "住宅", "position": "通州运河核心区临滨河西路", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 89-145㎡", "unitPrice": "43000", "totalPrice": "总价450-650(万/套)"}
16 | {"name": "北京城建北京合院", "types": "别墅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 210-330㎡", "unitPrice": "39000", "totalPrice": "总价1000-1300(万/套)"}
17 | {"name": "月亮河七星公馆", "types": "住宅", "position": "通燕高速耿庄桥出口南200米月亮河,河滨路1号", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 55-109㎡", "unitPrice": "68000", "totalPrice": "总价374-800(万/套)"}
18 | {"name": "天润福熙大道", "types": "住宅", "position": "清河营东路1号院, 清河营东路3号院", "position1": "朝阳", "position2": "北苑", "houseType": "1室", "space": "建面 65-374㎡", "unitPrice": "108000", "totalPrice": "总价750-3316(万/套)"}
19 | {"name": "京贸国际公馆", "types": "住宅", "position": "怡乐中路299号院(广渠快速路二期出口向南1000米)", "position1": "通州", "position2": "九棵树(家乐福)", "houseType": "1室", "space": "建面 72-147㎡", "unitPrice": "64000", "totalPrice": "总价495-950(万/套)"}
20 | {"name": "凯德麓语", "types": "别墅", "position": "兴寿镇京承高速G11出口向西怀昌路北侧", "position1": "昌平", "position2": "昌平其它", "houseType": "3室", "space": "建面 280-863㎡", "unitPrice": "35000", "totalPrice": "总价850-3450(万/套)"}
21 | {"name": "京贸国际城·峰景", "types": "住宅", "position": "芙蓉东路1号(通燕高速耿庄桥北出口向南300米)", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 69-140㎡", "unitPrice": "68000", "totalPrice": "总价460-980(万/套)"}
22 | {"name": "观唐云鼎", "types": "别墅", "position": "溪翁庄镇密溪路39号院(云佛山度假村对面)", "position1": "密云", "position2": "溪翁庄镇", "houseType": "3室", "space": "建面 346-613㎡", "unitPrice": "30000", "totalPrice": "总价1068-1850(万/套)"}
23 | {"name": "旭辉城", "types": "住宅", "position": "北京市房山区良锦街6号院旭辉城营销中心", "position1": "房山", "position2": "房山其它", "houseType": "2室", "space": "建面 75-116㎡", "unitPrice": "28500", "totalPrice": "总价219-330(万/套)"}
24 | {"name": "檀香府", "types": "住宅", "position": "京潭大街与潭柘十街交叉口", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 124-170㎡", "unitPrice": "42000", "totalPrice": "总价530-750(万/套)"}
25 | {"name": "泰禾金府大院", "types": "别墅", "position": "南四环地铁新宫站南800米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 362-504㎡", "unitPrice": "75000", "totalPrice": "总价2700-3700(万/套)"}
26 | {"name": "和棠瑞著", "types": "别墅", "position": "金海湖景区坝前广场西侧500米", "position1": "平谷", "position2": "平谷其它", "houseType": "3室", "space": "建面 305-360㎡", "unitPrice": "16000", "totalPrice": "总价530-560(万/套)"}
27 | {"name": "尊悦光华", "types": "住宅", "position": "北京市朝阳区光华东里甲1号院3号楼", "position1": "朝阳", "position2": "CBD", "houseType": "3室", "space": "建面 133-171㎡", "unitPrice": "150000", "totalPrice": "总价2500(万/套)"}
28 | {"name": "首创·河著", "types": "别墅", "position": "京承高速11出口(昌金路)向东900 米路北", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 248-310㎡", "unitPrice": "38000", "totalPrice": "总价1200-1900(万/套)"}
29 | {"name": "华萃西山", "types": "住宅", "position": "永定镇地铁S1号线石厂西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 115-122㎡", "unitPrice": "48000", "totalPrice": "总价560-600(万/套)"}
30 | {"name": "京西悦府", "types": "别墅", "position": "北京市房山区燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": "3室", "space": "建面 175-176㎡", "unitPrice": "40000", "totalPrice": "总价700-780(万/套)"}
31 | {"name": "中粮天恒天悦壹号", "types": "别墅", "position": "南四环地铁新宫站南500米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 220-340㎡", "unitPrice": "80000", "totalPrice": "总价2000-2360(万/套)"}
32 | {"name": "龙湾别墅", "types": "住宅", "position": "后沙峪镇龙湾别墅", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 218-317㎡", "unitPrice": "70000", "totalPrice": "总价2300(万/套)"}
33 | {"name": "京投发展·锦悦府", "types": "住宅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 90㎡", "unitPrice": "25607", "totalPrice": "总价220(万/套)"}
34 | {"name": "京投发展·锦悦府", "types": "别墅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 187-285㎡", "unitPrice": "25000", "totalPrice": "总价400-560(万/套)"}
35 | {"name": "金辰府", "types": "住宅", "position": "北京市昌平区北七家镇政府东南100米", "position1": "昌平", "position2": "北七家", "houseType": "3室", "space": "建面 89-143㎡", "unitPrice": "55000", "totalPrice": "总价490-790(万/套)"}
36 | {"name": "建邦·顺颐府", "types": "住宅", "position": "空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 89-147㎡", "unitPrice": "55583", "totalPrice": "总价480-845(万/套)"}
37 | {"name": "葛洲坝中国府", "types": "住宅", "position": "北京市丰台东路46号", "position1": "丰台", "position2": "玉泉营", "houseType": "3室", "space": "建面 168-240㎡", "unitPrice": "125000", "totalPrice": "总价2200-3000(万/套)"}
38 | {"name": "华萃西山", "types": "别墅", "position": "门头沟永定镇地铁S1号线石厂站西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "4室", "space": "建面 135-245㎡", "unitPrice": "48000", "totalPrice": "总价760-1060(万/套)"}
39 | {"name": "富兴首府", "types": "住宅", "position": "东坝路9号东北60米", "position1": "朝阳", "position2": "东坝", "houseType": "3室", "space": "建面 144-356㎡", "unitPrice": "85000", "totalPrice": "总价1706-2240(万/套)"}
40 | {"name": "中铁诺德阅墅", "types": "别墅", "position": "顺义区后沙峪镇裕园路762乡龙湖滟澜山对面", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 235-320㎡", "unitPrice": "50000", "totalPrice": "总价1150-1700(万/套)"}
41 | {"name": "中铁华侨城和园", "types": "别墅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "4室", "space": "建面 288-370㎡", "unitPrice": "50000", "totalPrice": "总价1870(万/套)"}
42 | {"name": "懋源·璟岳", "types": "别墅", "position": "南三环西路99号院", "position1": "丰台", "position2": "玉泉营", "houseType": "4室", "space": "建面 465-590㎡", "unitPrice": "140000", "totalPrice": "总价6500-9000(万/套)"}
43 | {"name": "合景泰富天汇", "types": "住宅", "position": "顺义区昌金路与通顺路交汇处", "position1": "顺义", "position2": "马坡", "houseType": "2室", "space": "建面 70-117㎡", "unitPrice": "33000", "totalPrice": "总价230-390(万/套)"}
44 | {"name": "懋源·璟玺", "types": "别墅", "position": "孙河京密路与京平辅路交叉口西行1000米", "position1": "朝阳", "position2": "中央别墅区", "houseType": "5室", "space": "建面 500-716㎡", "unitPrice": "100000", "totalPrice": "总价4380-6778(万/套)"}
45 | {"name": "万科雲庐", "types": "住宅", "position": "魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 104-300㎡", "unitPrice": "39000", "totalPrice": "总价656-820(万/套)"}
46 | {"name": "万科雲庐", "types": "别墅", "position": "魏各庄路万科雲庐", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 200-330㎡", "unitPrice": "30000", "totalPrice": "总价852-950(万/套)"}
47 | {"name": "金茂北京国际社区", "types": "住宅", "position": "顺义新城北小营昌金路水色时光路西", "position1": "顺义", "position2": "顺义其它", "houseType": "1室", "space": "建面 50-118㎡", "unitPrice": "30000", "totalPrice": "总价160-360(万/套)"}
48 | {"name": "住总如院", "types": "住宅", "position": "北京市大兴区采华路(波尔多小镇南区西南侧约250米)", "position1": "大兴", "position2": "大兴新机场洋房别墅区", "houseType": "2室", "space": "建面 98-233㎡", "unitPrice": "31136", "totalPrice": "总价280-475(万/套)"}
49 | {"name": "郎府书苑", "types": "住宅", "position": "西集镇京哈高速郎府出口南侧300米", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 89-116㎡", "unitPrice": "25800", "totalPrice": "总价273-300(万/套)"}
50 | {"name": "建邦·顺颐府", "types": "别墅", "position": "空港B区裕民大街30号", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 270㎡", "unitPrice": "55583", "totalPrice": "总价1300(万/套)"}
51 |
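The file above holds one JSON object per line (JSON Lines) rather than a single JSON array, so calling `json.load` on the whole file would fail; each line has to be decoded separately. The sketch below shows one way to read such records and to turn a `totalPrice` string like `"总价1400-3500(万/套)"` into a numeric range in units of 10,000 yuan. `load_json_lines` and `parse_total_price` are hypothetical helpers, and the two sample lines are abbreviated copies of records from the file.

```python
import json
import re


def load_json_lines(lines):
    """Decode an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_total_price(text):
    """Return (low, high) from e.g. '总价1400-3500(万/套)', or None if no number."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text or "")]
    if not numbers:
        return None
    # A single price such as '总价430(万/套)' becomes a degenerate range.
    return (numbers[0], numbers[-1])


if __name__ == "__main__":
    sample = [
        '{"name": "燕西华府", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"}',
        '{"name": "北辰墅院1900", "unitPrice": "36000", "totalPrice": "总价430(万/套)"}',
    ]
    for record in load_json_lines(sample):
        print(record["name"], int(record["unitPrice"]), parse_total_price(record["totalPrice"]))
```

For the real file, passing an open file handle (opened with `encoding="utf-8"`) to `load_json_lines` should work the same way, since iterating a file yields its lines.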
--------------------------------------------------------------------------------
/lianjia/scrapy-test-secondhand.json:
--------------------------------------------------------------------------------
1 | {"name": "人民日报社家属区 ", "position": "红庙", "types": "2室1厅 | 65.64平米 | 东南 | 简装 | 低楼层(共16层) | 1991年 | 塔楼", "unitPrice": "83,791元/平", "totalPrice": "550"}
2 | {"name": "中建国际港 ", "position": "枣园", "types": "2室1厅 | 86.96平米 | 南 | 简装 | 低楼层(共33层) | 板楼", "unitPrice": "49,219元/平", "totalPrice": "428"}
3 | {"name": "延静里 ", "position": "甜水园", "types": "2室1厅 | 64平米 | 南 北 | 精装 | 低楼层(共6层) | 1979年 | 板楼", "unitPrice": "62,344元/平", "totalPrice": "399"}
4 | {"name": "铁东小区 ", "position": "军博", "types": "2室1厅 | 50.3平米 | 南 | 简装 | 中楼层(共6层) | 板楼", "unitPrice": "100,398元/平", "totalPrice": "505"}
5 | {"name": "红联北村 ", "position": "小西天", "types": "3室1厅 | 81平米 | 东 北 | 毛坯 | 中楼层(共16层) | 1993年 | 塔楼", "unitPrice": "76,544元/平", "totalPrice": "620"}
6 | {"name": "秀水园 ", "position": "甜水园", "types": "2室1厅 | 58.04平米 | 东南 | 精装 | 12层 | 1994年 | 板楼", "unitPrice": "56,169元/平", "totalPrice": "326"}
7 | {"name": "华纺易城 ", "position": "朝青", "types": "3室1厅 | 138.03平米 | 南 北 | 简装 | 13层 | 2006年 | 板楼", "unitPrice": "81,142元/平", "totalPrice": "1120"}
8 | {"name": "南庭新苑北区 ", "position": "新宫", "types": "2室1厅 | 58.7平米 | 东 | 简装 | 低楼层(共16层) | 2012年 | 板塔结合", "unitPrice": "54,174元/平", "totalPrice": "318"}
9 | {"name": "中建二局家属院 ", "position": "梨园", "types": "3室1厅 | 119.89平米 | 东南 | 精装 | 中楼层(共18层) | 塔楼", "unitPrice": "32,947元/平", "totalPrice": "395"}
10 | {"name": "门矿西山楼 ", "position": "门头沟其它", "types": "2室2厅 | 62.78平米 | 南 北 | 简装 | 低楼层(共6层) | 1992年 | 板楼", "unitPrice": "17,522元/平", "totalPrice": "110"}
11 | {"name": "建邦华庭东区 ", "position": "长阳", "types": "2室1厅 | 89.96平米 | 南 | 精装 | 中楼层(共16层) | 2013年 | 板塔结合", "unitPrice": "39,462元/平", "totalPrice": "355"}
12 | {"name": "龙博苑二区 ", "position": "回龙观", "types": "3室1厅 | 87.43平米 | 南 北 | 简装 | 中楼层(共7层) | 2004年 | 板楼", "unitPrice": "52,500元/平", "totalPrice": "459"}
13 | {"name": "丽景长安二期 ", "position": "冯村", "types": "2室1厅 | 87平米 | 南 | 精装 | 中楼层(共27层) | 板塔结合", "unitPrice": "37,357元/平", "totalPrice": "325"}
14 | {"name": "双榆树东里 ", "position": "双榆树", "types": "2室1厅 | 49.5平米 | 南 | 精装 | 底层(共6层) | 1981年 | 板楼", "unitPrice": "106,061元/平", "totalPrice": "525"}
15 | {"name": "永泰东里 ", "position": "清河", "types": "2室1厅 | 73.12平米 | 东 西 | 简装 | 顶层(共6层) | 板楼", "unitPrice": "59,902元/平", "totalPrice": "438"}
16 | {"name": "双桥六号井 ", "position": "双桥", "types": "2室1厅 | 58.69平米 | 南 北 | 简装 | 中楼层(共4层) | 1987年 | 板楼", "unitPrice": "35,100元/平", "totalPrice": "206"}
17 | {"name": "和平街十二区 ", "position": "和平里", "types": "2室1厅 | 59.13平米 | 南 北 | 简装 | 中楼层(共6层) | 1992年 | 板楼", "unitPrice": "77,457元/平", "totalPrice": "458"}
18 | {"name": "万科新里程57号院 ", "position": "长阳", "types": "2室2厅 | 90.98平米 | 南 北 | 精装 | 低楼层(共8层) | 2013年 | 板楼", "unitPrice": "38,361元/平", "totalPrice": "349"}
19 | {"name": "角门东里 ", "position": "角门", "types": "2室1厅 | 59.6平米 | 南 北 | 精装 | 中楼层(共6层) | 1993年 | 板楼", "unitPrice": "48,994元/平", "totalPrice": "292"}
20 | {"name": "百万庄午区 ", "position": "阜成门", "types": "2室0厅 | 55.3平米 | 东 南 北 | 简装 | 底层(共5层) | 板楼", "unitPrice": "124,774元/平", "totalPrice": "690"}
21 | {"name": "南礼士路甲62号院 ", "position": "月坛", "types": "2室1厅 | 61.5平米 | 南 北 | 简装 | 顶层(共6层) | 1982年 | 板楼", "unitPrice": "121,952元/平", "totalPrice": "750"}
22 | {"name": "保利嘉园三号院 ", "position": "常营", "types": "2室2厅 | 89平米 | 西南 | 精装 | 中楼层(共26层) | 塔楼", "unitPrice": "49,326元/平", "totalPrice": "439"}
23 | {"name": "美景东方 ", "position": "华威桥", "types": "2室2厅 | 77.15平米 | 西南 | 精装 | 22层 | 塔楼", "unitPrice": "73,883元/平", "totalPrice": "570"}
24 | {"name": "月季园 ", "position": "武夷花园", "types": "2室1厅 | 99.57平米 | 南 | 精装 | 中楼层(共27层) | 板塔结合", "unitPrice": "41,680元/平", "totalPrice": "415"}
25 | {"name": "南三环东路 ", "position": "刘家窑", "types": "2室1厅 | 68.92平米 | 南 北 | 简装 | 12层 | 1999年 | 板塔结合", "unitPrice": "47,592元/平", "totalPrice": "328"}
26 | {"name": "富卓苑 ", "position": "马家堡", "types": "2室1厅 | 69.7平米 | 东 西 | 简装 | 6层 | 2001年 | 板楼", "unitPrice": "47,203元/平", "totalPrice": "329"}
27 | {"name": "伟业嘉园西里 ", "position": "良乡", "types": "2室1厅 | 97.34平米 | 南 北 | 简装 | 中楼层(共7层) | 2010年 | 板楼", "unitPrice": "25,786元/平", "totalPrice": "251"}
28 | {"name": "顺五条 ", "position": "刘家窑", "types": "2室1厅 | 62.67平米 | 东南 | 精装 | 中楼层(共6层) | 板楼", "unitPrice": "55,051元/平", "totalPrice": "345"}
29 | {"name": "西黄新村北里 ", "position": "苹果园", "types": "3室1厅 | 111.89平米 | 南 西 | 简装 | 23层 | 2003年 | 塔楼", "unitPrice": "42,542元/平", "totalPrice": "476"}
30 | {"name": "国际花都 ", "position": "密云其它", "types": "2室1厅 | 86.15平米 | 南 北 | 精装 | 底层(共20层) | 板楼", "unitPrice": "20,430元/平", "totalPrice": "176"}
31 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "2室1厅 | 76.91平米 | 南 北 | 精装 | 中楼层(共7层) | 2001年 | 板楼", "unitPrice": "43,688元/平", "totalPrice": "336"}
32 | {"name": "万年花城四期 ", "position": "玉泉营", "types": "2室2厅 | 86.95平米 | 东 西 | 精装 | 顶层(共6层) | 板楼", "unitPrice": "61,990元/平", "totalPrice": "539"}
33 | {"name": "万年花城四期 ", "position": "玉泉营", "types": "2室1厅 | 97.01平米 | 南 | 简装 | 中楼层(共27层) | 塔楼", "unitPrice": "70,096元/平", "totalPrice": "680"}
34 | {"name": "莲香园 ", "position": "六里桥", "types": "4室2厅 | 197.23平米 | 南 北 | 简装 | 顶层(共6层) | 板楼", "unitPrice": "43,097元/平", "totalPrice": "850"}
35 | {"name": "英特公寓 ", "position": "西坝河", "types": "3室1厅 | 227.33平米 | 南 北 | 精装 | 高楼层(共19层) | 板楼", "unitPrice": "57,186元/平", "totalPrice": "1300"}
36 | {"name": "观湖国际 ", "position": "朝阳公园", "types": "4室2厅 | 288.65平米 | 南 | 精装 | 中楼层(共27层) | 板楼", "unitPrice": "103,240元/平", "totalPrice": "2980"}
37 | {"name": "天伦锦城 ", "position": "花乡", "types": "3室2厅 | 110.55平米 | 东 西 | 精装 | 顶层(共13层) | 板楼", "unitPrice": "44,777元/平", "totalPrice": "495"}
38 | {"name": "怡锦园 ", "position": "科技园区", "types": "3室1厅 | 136.94平米 | 西南 | 精装 | 高楼层(共30层) | 2003年 | 塔楼", "unitPrice": "53,674元/平", "totalPrice": "735"}
39 | {"name": "曙光里 ", "position": "三元桥", "types": "3室1厅 | 77.4平米 | 南 北 | 精装 | 中楼层(共6层) | 板楼", "unitPrice": "65,892元/平", "totalPrice": "510"}
40 | {"name": "中国铁建花语金郡 ", "position": "瀛海", "types": "4室1厅 | 128.49平米 | 南 北 | 简装 | 中楼层(共18层) | 2018年 | 板楼", "unitPrice": "59,149元/平", "totalPrice": "760"}
41 | {"name": "左安漪园 ", "position": "左安门", "types": "3室1厅 | 133.04平米 | 南 北 | 简装 | 中楼层(共10层) | 板塔结合", "unitPrice": "105,232元/平", "totalPrice": "1400"}
42 | {"name": "银枫家园 ", "position": "大山子", "types": "4室2厅 | 215.29平米 | 东南 西北 | 简装 | 中楼层(共10层) | 板楼", "unitPrice": "41,758元/平", "totalPrice": "899"}
43 | {"name": "天通苑中苑 ", "position": "天通苑", "types": "3室1厅 | 158.07平米 | 南 北 | 简装 | 低楼层(共19层) | 2008年 | 板楼", "unitPrice": "38,464元/平", "totalPrice": "608"}
44 | {"name": "黄庄小区 ", "position": "中关村", "types": "3室2厅 | 127.1平米 | 东 南 北 | 精装 | 高楼层(共16层) | 1987年 | 塔楼", "unitPrice": "133,753元/平", "totalPrice": "1700"}
45 | {"name": "永金里小区 ", "position": "五棵松", "types": "3室1厅 | 120.99平米 | 东 西 | 精装 | 低楼层(共6层) | 板楼", "unitPrice": "73,478元/平", "totalPrice": "889"}
46 | {"name": "DBC加州小镇C区 ", "position": "临河里", "types": "3室2厅 | 133.83平米 | 南 北 | 精装 | 高楼层(共15层) | 2010年 | 板楼", "unitPrice": "43,563元/平", "totalPrice": "583"}
47 | {"name": "龙泽苑东区 ", "position": "回龙观", "types": "2室1厅 | 100.55平米 | 东南 | 简装 | 11层 | 塔楼", "unitPrice": "56,191元/平", "totalPrice": "565"}
48 | {"name": "尚家楼48号院 ", "position": "三元桥", "types": "3室2厅 | 88.04平米 | 南 西 北 | 精装 | 高楼层(共12层) | 板塔结合", "unitPrice": "83,712元/平", "totalPrice": "737"}
49 | {"name": "西山艺境3号院 ", "position": "大峪", "types": "3室2厅 | 140.62平米 | 南 北 | 精装 | 高楼层(共9层) | 2015年 | 板楼", "unitPrice": "45,442元/平", "totalPrice": "639"}
50 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "3室1厅 | 130.14平米 | 南 北 | 简装 | 中楼层(共7层) | 板楼", "unitPrice": "35,962元/平", "totalPrice": "468"}
51 | {"name": "朝阳旺角 ", "position": "双桥", "types": "3室2厅 | 137.37平米 | 南 北 | 精装 | 中楼层(共16层) | 板楼", "unitPrice": "45,862元/平", "totalPrice": "630"}
52 | {"name": "蓝调沙龙西区 ", "position": "九棵树(家乐福)", "types": "2室1厅 | 98.46平米 | 南 北 | 简装 | 高楼层(共6层) | 2004年 | 板楼", "unitPrice": "37,376元/平", "totalPrice": "368"}
53 | {"name": "天通苑东二区 ", "position": "天通苑", "types": "3室1厅 | 141.41平米 | 南 北 | 简装 | 高楼层(共7层) | 2001年 | 板楼", "unitPrice": "26,873元/平", "totalPrice": "380"}
54 | {"name": "跃城 ", "position": "赵公口", "types": "3室2厅 | 162.29平米 | 南 北 | 精装 | 高楼层(共20层) | 板塔结合", "unitPrice": "46,830元/平", "totalPrice": "760"}
55 | {"name": "DBC加州小镇 ", "position": "临河里", "types": "3室1厅 | 124.74平米 | 南 北 | 精装 | 中楼层(共11层) | 板楼", "unitPrice": "39,924元/平", "totalPrice": "498"}
56 | {"name": "慧忠北里第三社区 ", "position": "亚运村", "types": "3室2厅 | 115.78平米 | 东 西北 | 简装 | 低楼层(共25层) | 塔楼", "unitPrice": "64,779元/平", "totalPrice": "750"}
57 | {"name": "泰中花园 ", "position": "高米店", "types": "5室2厅 | 204平米 | 南 北 | 简装 | 高楼层(共7层) | 板楼", "unitPrice": "22,010元/平", "totalPrice": "449"}
58 | {"name": "首城国际D区 ", "position": "双井", "types": "3室1厅 | 89.77平米 | 南 北 | 精装 | 中楼层(共28层) | 2010年 | 板楼", "unitPrice": "103,599元/平", "totalPrice": "930"}
59 | {"name": "弘善家园 ", "position": "潘家园", "types": "3室1厅 | 89.08平米 | 西北 | 简装 | 高楼层(共26层) | 板塔结合", "unitPrice": "48,833元/平", "totalPrice": "435"}
60 | {"name": "富力又一城A区 ", "position": "豆各庄", "types": "3室2厅 | 169.59平米 | 南 北 | 精装 | 高楼层(共22层) | 板楼", "unitPrice": "47,114元/平", "totalPrice": "799"}
61 | {"name": "田村山南路9号院 ", "position": "玉泉路", "types": "3室1厅 | 83.71平米 | 南 北 | 简装 | 顶层(共4层) | 板楼", "unitPrice": "73,827元/平", "totalPrice": "618"}
62 | {"name": "金色漫香郡北区 ", "position": "南中轴机场商务区", "types": "1室1厅 | 57.97平米 | 南 | 精装 | 中楼层(共9层) | 板楼", "unitPrice": "32,604元/平", "totalPrice": "189"}
63 | {"name": "富强东里 ", "position": "黄村中", "types": "2室1厅 | 75.2平米 | 南 北 | 简装 | 低楼层(共6层) | 1993年 | 板楼", "unitPrice": "28,591元/平", "totalPrice": "215"}
64 | {"name": "望泉家园 ", "position": "顺义城", "types": "2室1厅 | 80.88平米 | 南 北 | 精装 | 中楼层(共6层) | 板楼", "unitPrice": "29,550元/平", "totalPrice": "239"}
65 | {"name": "和谐家园二区 ", "position": "回龙观", "types": "3室1厅 | 124.47平米 | 南 北 | 精装 | 低楼层(共6层) | 2006年 | 板楼", "unitPrice": "42,501元/平", "totalPrice": "529"}
66 | {"name": "牛奶宿舍 ", "position": "牡丹园", "types": "2室1厅 | 62.6平米 | 南 北 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "95,847元/平", "totalPrice": "600"}
67 | {"name": "当代采育满庭春MOMA ", "position": "大兴新机场洋房别墅区", "types": "2室1厅 | 84.67平米 | 南 | 简装 | 中楼层(共18层) | 板楼", "unitPrice": "17,598元/平", "totalPrice": "149"}
68 | {"name": "金泰先锋北区 ", "position": "百子湾", "types": "3室2厅 | 129.49平米 | 南 北 | 精装 | 18层 | 板楼", "unitPrice": "84,949元/平", "totalPrice": "1100"}
69 | {"name": "龙泽苑东区 ", "position": "回龙观", "types": "5室1厅 | 158.4平米 | 南 北 | 毛坯 | 7层 | 2005年 | 板楼", "unitPrice": "34,723元/平", "totalPrice": "550"}
70 | {"name": "模式口东里 ", "position": "苹果园", "types": "2室2厅 | 101.54平米 | 南 北 | 精装 | 底层(共3层) | 1993年 | 板楼", "unitPrice": "43,235元/平", "totalPrice": "439"}
71 | {"name": "枣园小区 ", "position": "枣园", "types": "3室2厅 | 115.51平米 | 南 北 | 简装 | 低楼层(共6层) | 板楼", "unitPrice": "38,525元/平", "totalPrice": "445"}
72 | {"name": "天通西苑二区 ", "position": "天通苑", "types": "4室2厅 | 176.18平米 | 南 西 北 | 简装 | 高楼层(共32层) | 塔楼", "unitPrice": "26,451元/平", "totalPrice": "466"}
73 | {"name": "知春路82号院 ", "position": "双榆树", "types": "4室1厅 | 90.2平米 | 南 北 | 精装 | 中楼层(共5层) | 板楼", "unitPrice": "136,364元/平", "totalPrice": "1230"}
74 | {"name": "金汉绿港二区 ", "position": "顺义城", "types": "4室1厅 | 167.7平米 | 南 北 | 精装 | 中楼层(共17层) | 板楼", "unitPrice": "33,990元/平", "totalPrice": "570"}
75 | {"name": "领秀慧谷D区 ", "position": "回龙观", "types": "3室2厅 | 108.55平米 | 南 北 | 精装 | 中楼层(共11层) | 2016年 | 板楼", "unitPrice": "70,475元/平", "totalPrice": "765"}
76 | {"name": "汇园公寓 ", "position": "亚运村", "types": "3室2厅 | 134.54平米 | 南 北 | 精装 | 顶层(共15层) | 1990年 | 板楼", "unitPrice": "73,585元/平", "totalPrice": "990"}
77 | {"name": "乐府江南 ", "position": "田村", "types": "3室2厅 | 137.23平米 | 南 北 | 精装 | 中楼层(共9层) | 2005年 | 板塔结合", "unitPrice": "107,849元/平", "totalPrice": "1480"}
78 | {"name": "名都园 ", "position": "中央别墅区", "types": "4室2厅 | 228.54平米 | 东 南 西 北 | 简装 | 3层 | 板楼 | 独栋别墅 ", "unitPrice": "74,386元/平", "totalPrice": "1700"}
79 | {"name": "名流花园 ", "position": "北七家", "types": "4室2厅 | 240.55平米 | 南 北 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "19,913元/平", "totalPrice": "479"}
80 | {"name": "维多莉亚花园公寓 ", "position": "农展馆", "types": "3室1厅 | 153.92平米 | 南 北 | 精装 | 中楼层(共11层) | 板楼", "unitPrice": "92,906元/平", "totalPrice": "1430"}
81 | {"name": "欧陆经典 ", "position": "亚运村小营", "types": "3室1厅 | 135.41平米 | 东南 | 简装 | 低楼层(共26层) | 塔楼", "unitPrice": "78,798元/平", "totalPrice": "1067"}
82 | {"name": "恩济庄46号院 ", "position": "定慧寺", "types": "3室2厅 | 117.1平米 | 南 西 北 | 精装 | 高楼层(共14层) | 1995年 | 塔楼", "unitPrice": "69,172元/平", "totalPrice": "810"}
83 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "3室1厅 | 108.31平米 | 南 西 北 | 简装 | 中楼层(共7层) | 板楼", "unitPrice": "36,008元/平", "totalPrice": "390"}
84 | {"name": "美丽园 ", "position": "四季青", "types": "3室2厅 | 144.1平米 | 南 北 | 精装 | 中楼层(共7层) | 板楼", "unitPrice": "114,504元/平", "totalPrice": "1650"}
85 | {"name": "林栖园 ", "position": "青塔", "types": "3室2厅 | 125.52平米 | 南 北 | 精装 | 顶层(共6层) | 2006年 | 板楼", "unitPrice": "51,785元/平", "totalPrice": "650"}
86 | {"name": "北蜂窝63号院 ", "position": "军博", "types": "3室1厅 | 81.13平米 | 南 北 | 简装 | 低楼层(共6层) | 1986年 | 板楼", "unitPrice": "101,073元/平", "totalPrice": "820"}
87 | {"name": "清枫华景园 ", "position": "学院路", "types": "3室2厅 | 129.66平米 | 南 北 | 简装 | 中楼层(共16层) | 2005年 | 板楼", "unitPrice": "101,420元/平", "totalPrice": "1315"}
88 | {"name": "西山枫林三期 ", "position": "苹果园", "types": "3室1厅 | 125.89平米 | 南 北 | 精装 | 高楼层(共10层) | 板楼", "unitPrice": "56,240元/平", "totalPrice": "708"}
89 | {"name": "爱民里小区 ", "position": "西四", "types": "3室1厅 | 96.8平米 | 南 北 | 简装 | 高楼层(共6层) | 1990年 | 板楼", "unitPrice": "133,265元/平", "totalPrice": "1290"}
90 | {"name": "阜成门外北四巷 ", "position": "阜成门", "types": "4室0厅 | 121.2平米 | 南 北 | 简装 | 中楼层(共4层) | 1950年 | 板楼", "unitPrice": "109,736元/平", "totalPrice": "1330"}
91 | {"name": "密西花园一期 ", "position": "果园街道", "types": "3室2厅 | 121.4平米 | 南 北 | 其他 | 6层 | 2003年 | 板楼", "unitPrice": "17,958元/平", "totalPrice": "218"}
92 | {"name": "东会新村 ", "position": "双桥", "types": "2室1厅 | 66.32平米 | 南 北 | 精装 | 低楼层(共6层) | 1997年 | 板楼", "unitPrice": "46,291元/平", "totalPrice": "307"}
93 | {"name": "石景嘉园 ", "position": "八角", "types": "2室1厅 | 66.53平米 | 西 北 | 简装 | 低楼层(共15层) | 塔楼", "unitPrice": "39,832元/平", "totalPrice": "265"}
94 | {"name": "天通苑东一区 ", "position": "天通苑", "types": "2室1厅 | 76.3平米 | 南 北 | 简装 | 低楼层(共7层) | 板楼", "unitPrice": "44,430元/平", "totalPrice": "339"}
95 | {"name": "龙潭西里 ", "position": "左安门", "types": "1室1厅 | 46.98平米 | 东 西 | 精装 | 6层 | 2000年 | 板楼", "unitPrice": "105,364元/平", "totalPrice": "495"}
96 | {"name": "阜成路甲52号院 ", "position": "定慧寺", "types": "2室1厅 | 81.61平米 | 西南 | 简装 | 低楼层(共18层) | 1999年 | 板楼", "unitPrice": "81,486元/平", "totalPrice": "665"}
97 | {"name": "龙华园 ", "position": "回龙观", "types": "2室1厅 | 67平米 | 南 北 | 简装 | 6层 | 板楼", "unitPrice": "52,687元/平", "totalPrice": "353"}
98 | {"name": "保利首开熙悦春天 ", "position": "天宫院", "types": "2室1厅 | 83.82平米 | 南 | 其他 | 高楼层(共21层) | 板楼", "unitPrice": "36,746元/平", "totalPrice": "308"}
99 | {"name": "天通苑东二区 ", "position": "天通苑", "types": "2室1厅 | 106.67平米 | 东南 | 简装 | 中楼层(共17层) | 板楼", "unitPrice": "40,124元/平", "totalPrice": "428"}
100 | {"name": "农光南路 ", "position": "劲松", "types": "2室1厅 | 54.05平米 | 南 北 | 精装 | 顶层(共6层) | 板楼", "unitPrice": "52,729元/平", "totalPrice": "285"}
101 | {"name": "鑫兆雅园北区 ", "position": "刘家窑", "types": "2室1厅 | 93.81平米 | 南 北 | 简装 | 中楼层(共6层) | 2005年 | 板楼", "unitPrice": "74,086元/平", "totalPrice": "695"}
102 | {"name": "禧瑞都 ", "position": "红庙", "types": "1室1厅 | 108.88平米 | 西 | 精装 | 低楼层(共28层) | 2010年 | 塔楼", "unitPrice": "96,437元/平", "totalPrice": "1050"}
103 | {"name": "大方居 ", "position": "九棵树(家乐福)", "types": "2室1厅 | 88.44平米 | 西 | 精装 | 低楼层(共22层) | 板塔结合", "unitPrice": "29,173元/平", "totalPrice": "258"}
104 | {"name": "电建北院 ", "position": "定福庄", "types": "2室1厅 | 70.96平米 | 北 南 | 简装 | 底层(共7层) | 板楼", "unitPrice": "48,619元/平", "totalPrice": "345"}
105 | {"name": "芳星园三区 ", "position": "方庄", "types": "3室2厅 | 95.27平米 | 西南 | 简装 | 中楼层(共12层) | 2000年 | 板塔结合", "unitPrice": "63,084元/平", "totalPrice": "601"}
106 | {"name": "北京新天地二期 ", "position": "常营", "types": "2室2厅 | 101.64平米 | 南 | 精装 | 低楼层(共28层) | 2008年 | 板塔结合", "unitPrice": "53,031元/平", "totalPrice": "539"}
107 | {"name": "美然动力A2区 ", "position": "定福庄", "types": "1室0厅 | 41.44平米 | 南 | 简装 | 低楼层(共14层) | 2003年 | 板楼", "unitPrice": "53,089元/平", "totalPrice": "220"}
108 | {"name": "安贞西里 ", "position": "安贞", "types": "2室1厅 | 57.15平米 | 南 北 | 简装 | 中楼层(共6层) | 1984年 | 板楼", "unitPrice": "89,939元/平", "totalPrice": "514"}
109 | {"name": "西宏苑 ", "position": "西红门", "types": "2室1厅 | 59.92平米 | 南 北 | 简装 | 顶层(共6层) | 1995年 | 板塔结合", "unitPrice": "33,045元/平", "totalPrice": "198"}
110 | {"name": "安慧北里秀园 ", "position": "亚运村", "types": "2室1厅 | 63.03平米 | 西南 | 简装 | 低楼层(共20层) | 1994年 | 塔楼", "unitPrice": "74,568元/平", "totalPrice": "470"}
111 | {"name": "金隅康惠园1号院 ", "position": "双桥", "types": "2室2厅 | 88.13平米 | 南 北 | 简装 | 中楼层(共9层) | 2010年 | 板楼", "unitPrice": "44,821元/平", "totalPrice": "395"}
112 | {"name": "晨光家园A区 ", "position": "石佛营", "types": "2室1厅 | 82.58平米 | 西南 | 精装 | 低楼层(共30层) | 2001年 | 塔楼", "unitPrice": "59,942元/平", "totalPrice": "495"}
113 | {"name": "鼎顺嘉园东区 ", "position": "顺义其它", "types": "2室1厅 | 73.73平米 | 南 | 精装 | 低楼层(共13层) | 板塔结合", "unitPrice": "27,805元/平", "totalPrice": "205"}
114 | {"name": "模式口西里 ", "position": "苹果园", "types": "2室1厅 | 54.02平米 | 南 | 简装 | 高楼层(共6层) | 板楼", "unitPrice": "36,468元/平", "totalPrice": "197"}
115 | {"name": "丽水嘉园 ", "position": "朝阳公园", "types": "2室1厅 | 95.36平米 | 南 西南 | 精装 | 中楼层(共29层) | 2000年 | 塔楼", "unitPrice": "98,469元/平", "totalPrice": "939"}
116 | {"name": "西罗园四区 ", "position": "西罗园", "types": "2室1厅 | 61.82平米 | 南 北 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "44,484元/平", "totalPrice": "275"}
117 | {"name": "北京金科天籁城 ", "position": "天宫院", "types": "3室1厅 | 89.79平米 | 南 | 简装 | 中楼层(共29层) | 板塔结合", "unitPrice": "41,096元/平", "totalPrice": "369"}
118 | {"name": "金谷园 ", "position": "知春路", "types": "2室1厅 | 95.04平米 | 南 北 | 精装 | 底层(共6层) | 2002年 | 板楼", "unitPrice": "101,958元/平", "totalPrice": "969"}
119 | {"name": "芳古园一区 ", "position": "方庄", "types": "2室1厅 | 56.3平米 | 南 | 简装 | 低楼层(共14层) | 1992年 | 塔楼", "unitPrice": "66,253元/平", "totalPrice": "373"}
120 | {"name": "雅丽世居 ", "position": "果园", "types": "2室2厅 | 99.14平米 | 南 北 | 精装 | 高楼层(共18层) | 2005年 | 板楼", "unitPrice": "34,295元/平", "totalPrice": "340"}
121 | {"name": "金隅康惠园1号院 ", "position": "双桥", "types": "2室2厅 | 88.13平米 | 南 北 | 简装 | 中楼层(共9层) | 2010年 | 板楼", "unitPrice": "44,821元/平", "totalPrice": "395"}
122 | {"name": "华远铭悦园 ", "position": "临河里", "types": "2室1厅 | 76.48平米 | 南 | 精装 | 28层 | 2014年 | 板塔结合", "unitPrice": "37,919元/平", "totalPrice": "290"}
123 | {"name": "鸭子桥南里 ", "position": "菜户营", "types": "3室1厅 | 60.39平米 | 南 北 | 简装 | 顶层(共6层) | 1982年 | 板楼", "unitPrice": "80,312元/平", "totalPrice": "485"}
124 | {"name": "新外大街31号院 ", "position": "小西天", "types": "3室1厅 | 66.3平米 | 东 南 北 | 简装 | 高楼层(共6层) | 1979年 | 板楼", "unitPrice": "77,678元/平", "totalPrice": "515"}
125 | {"name": "水仙园 ", "position": "武夷花园", "types": "2室1厅 | 93.75平米 | 南 北 | 简装 | 低楼层(共6层) | 2000年 | 板楼", "unitPrice": "46,720元/平", "totalPrice": "438"}
126 | {"name": "天通西苑二区 ", "position": "天通苑", "types": "2室1厅 | 95.62平米 | 东 南 | 简装 | 低楼层(共32层) | 塔楼", "unitPrice": "39,532元/平", "totalPrice": "378"}
127 | {"name": "永居东里 ", "position": "天宁寺", "types": "2室1厅 | 43.9平米 | 东 | 简装 | 中楼层(共5层) | 板楼", "unitPrice": "97,723元/平", "totalPrice": "429"}
128 | {"name": "澜西园三区 ", "position": "顺义城", "types": "2室1厅 | 74.57平米 | 南 北 | 简装 | 底层(共9层) | 2009年 | 板楼", "unitPrice": "30,844元/平", "totalPrice": "230"}
129 | {"name": "富锦嘉园五区 ", "position": "科技园区", "types": "2室1厅 | 91.9平米 | 南 北 | 精装 | 高楼层(共6层) | 2008年 | 板楼", "unitPrice": "54,843元/平", "totalPrice": "504"}
130 | {"name": "鹏润家园 ", "position": "菜户营", "types": "2室1厅 | 79.98平米 | 东 西 | 精装 | 高楼层(共16层) | 板楼", "unitPrice": "61,266元/平", "totalPrice": "490"}
131 | {"name": "海特花园东区 ", "position": "苹果园", "types": "2室1厅 | 98.89平米 | 西 | 精装 | 18层 | 2004年 | 板塔结合", "unitPrice": "46,011元/平", "totalPrice": "455"}
132 | {"name": "金顶阳光 ", "position": "苹果园", "types": "2室1厅 | 89.49平米 | 南 北 | 精装 | 中楼层(共21层) | 2009年 | 板楼", "unitPrice": "55,873元/平", "totalPrice": "500"}
133 | {"name": "黄金苑 ", "position": "奥林匹克公园", "types": "3室2厅 | 122.69平米 | 西南 | 精装 | 低楼层(共18层) | 塔楼", "unitPrice": "45,644元/平", "totalPrice": "560"}
134 | {"name": "福润四季A区 ", "position": "东坝", "types": "2室1厅 | 75.3平米 | 南 | 精装 | 16层 | 板塔结合", "unitPrice": "44,489元/平", "totalPrice": "335"}
135 | {"name": "凯景铭座 ", "position": "安定门", "types": "2室1厅 | 137.93平米 | 东南 | 精装 | 中楼层(共19层) | 2001年 | 塔楼", "unitPrice": "76,706元/平", "totalPrice": "1058"}
136 | {"name": "北京经开汀塘 ", "position": "通州其它", "types": "2室1厅 | 83.76平米 | 南 北 | 精装 | 高楼层(共15层) | 2019年 | 板楼", "unitPrice": "61,963元/平", "totalPrice": "519"}
137 | {"name": "玉带河西街 ", "position": "万达", "types": "2室1厅 | 56平米 | 南 北 | 简装 | 顶层(共5层) | 板楼", "unitPrice": "33,215元/平", "totalPrice": "186"}
138 | {"name": "芳星园三区 ", "position": "方庄", "types": "1室1厅 | 45.05平米 | 南 北 | 精装 | 低楼层(共6层) | 1987年 | 板楼", "unitPrice": "61,932元/平", "totalPrice": "279"}
139 | {"name": "南顶小区 ", "position": "赵公口", "types": "1室1厅 | 45.6平米 | 南 | 简装 | 高楼层(共6层) | 1992年 | 板楼", "unitPrice": "45,615元/平", "totalPrice": "208"}
140 | {"name": "花家地北里 ", "position": "望京", "types": "1室1厅 | 44.38平米 | 南 | 精装 | 高楼层(共18层) | 1994年 | 塔楼", "unitPrice": "68,725元/平", "totalPrice": "305"}
141 | {"name": "惠泽家园 ", "position": "门头沟其它", "types": "2室1厅 | 67.95平米 | 南 北 | 精装 | 顶层(共6层) | 板楼", "unitPrice": "26,785元/平", "totalPrice": "182"}
142 | {"name": "新外大街3号院 ", "position": "小西天", "types": "3室1厅 | 76.6平米 | 东 | 简装 | 顶层(共3层) | 板塔结合", "unitPrice": "86,162元/平", "totalPrice": "660"}
143 | {"name": "辛勤胡同 ", "position": "德胜门", "types": "3室1厅 | 64.3平米 | 东 南 西 | 简装 | 中楼层(共6层) | 板楼", "unitPrice": "127,528元/平", "totalPrice": "820"}
144 | {"name": "尚家楼48号院 ", "position": "三元桥", "types": "2室1厅 | 59.04平米 | 南 北 | 精装 | 顶层(共12层) | 1997年 | 板塔结合", "unitPrice": "73,679元/平", "totalPrice": "435"}
145 | {"name": "京投发展公园悦府一区 ", "position": "回龙观", "types": "2室2厅 | 81.68平米 | 南 | 精装 | 中楼层(共22层) | 板楼", "unitPrice": "64,888元/平", "totalPrice": "530"}
146 | {"name": "华威西里 ", "position": "潘家园", "types": "1室1厅 | 44.3平米 | 南 | 简装 | 高楼层(共18层) | 1993年 | 塔楼", "unitPrice": "55,531元/平", "totalPrice": "246"}
147 | {"name": "南平里 ", "position": "首都机场", "types": "1室1厅 | 41.47平米 | 南 | 精装 | 高楼层(共6层) | 板楼", "unitPrice": "33,760元/平", "totalPrice": "140"}
148 | {"name": "鸿坤理想城五期 ", "position": "西红门", "types": "2室1厅 | 70.31平米 | 南 | 简装 | 中楼层(共18层) | 板塔结合", "unitPrice": "39,682元/平", "totalPrice": "279"}
149 | {"name": "洋桥西里 ", "position": "洋桥", "types": "2室1厅 | 53.2平米 | 南 北 | 简装 | 6层 | 1990年 | 板楼", "unitPrice": "53,760元/平", "totalPrice": "286"}
150 | {"name": "次渠南里十一区 ", "position": "通州其它", "types": "1室1厅 | 57.95平米 | 南 | 简装 | 低楼层(共18层) | 暂无数据", "unitPrice": "34,858元/平", "totalPrice": "202"}
151 |
--------------------------------------------------------------------------------
/lianjia/scrapy.cfg:
--------------------------------------------------------------------------------
1 | # Automatically created by: scrapy startproject
2 | #
3 | # For more information about the [deploy] section see:
4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html
5 |
6 | [settings]
7 | default = lianjia.settings
8 |
9 | [deploy]
10 | #url = http://localhost:6800/
11 | project = lianjia
12 |
--------------------------------------------------------------------------------
/test1_house/datachange.py:
--------------------------------------------------------------------------------
import csv
import json

'''
Convert a JSON Lines file to CSV format and save it.
'''


class Json_Csv():

    # Initializer: create the output CSV file.
    def __init__(self):
        self.save_csv = open('house_output.csv', 'w', encoding='utf-8', newline='')
        self.write_csv = csv.writer(self.save_csv, delimiter=',')  # comma-separated

    def trans(self, filename):
        with open(filename, 'r', encoding='utf-8') as f:  # read the JSON file, one object per line
            read = f.readlines()
        flag = True
        for info in read:
            if not info.strip():  # skip blank lines, which json.loads would reject
                continue
            data = json.loads(info)
            if flag:  # use the keys of the first record as the header row
                keys = list(data.keys())  # csv.writer expects a sequence
                self.write_csv.writerow(keys)  # write the header to the CSV
                flag = False
            value = list(data.values())  # row values, also as a list

            temp = value[6]  # space: keep only the smallest area, as an int
            if isinstance(temp, str):
                list_temp = temp.split(' ')
                list_temp = list_temp[1].split('-')
                list_temp = list_temp[0].split('㎡')
                value[6] = int(list_temp[0])

            value[7] = int(value[7])  # unitPrice as an int, in yuan

            temp = value[8]  # totalPrice: keep only the lowest value, as an int (unit: 10,000 yuan)
            if isinstance(temp, str):
                list_temp = temp.split('价')
                list_temp = list_temp[1].split('-')
                list_temp = list_temp[0].split('(')
                value[8] = int(list_temp[0])

            self.write_csv.writerow(value)  # write the row to the CSV
        self.save_csv.close()  # close the file once everything is written


if __name__ == '__main__':
    json_csv = Json_Csv()
    path = 'scrapy-test-firsthand.json'
    json_csv.trans(path)
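# Note: the chained split() calls in trans() depend on the exact layout of
# strings such as "建面 83-135㎡" and "总价1400-3500(万/套)". A minimal sketch of
# a more tolerant alternative follows, assuming the value wanted is always the
# first integer in the string. The helper name first_int is hypothetical and
# not part of the repository.

```python
import re
from typing import Optional

# Hypothetical helper (an assumption, not repository code): matches the first
# run of digits in a listing string.
_FIRST_INT = re.compile(r"\d+")


def first_int(text: Optional[str]) -> Optional[int]:
    """Return the first integer found in text, or None if there is none."""
    if not text:
        return None
    match = _FIRST_INT.search(text)
    return int(match.group()) if match else None


# Sample strings copied from scrapy-test-firsthand.json.
min_area = first_int("建面 83-135㎡")          # lower bound of the area range
min_total = first_int("总价1400-3500(万/套)")  # lower bound of the total price
missing = first_int(None)                      # a null "space" field
```

# With a helper like this, null fields and single values such as "建面 90㎡"
# go through the same code path as ranges.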
--------------------------------------------------------------------------------
/test1_house/house_output.csv:
--------------------------------------------------------------------------------
1 | name,types,position,position1,position2,houseType,space,unitPrice,totalPrice
2 | 北辰墅院1900,住宅,顺兴街11号院望尊园,顺义,马坡,3室,83,36000,430
3 | 燕西华府,别墅,"王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)",丰台,丰台其它,3室,350,47000,1400
4 | 京西悦府,住宅,燕房线阎村地铁站东南角约189米,房山,阎村,,120,33000,440
5 | 福景苑,住宅,亮马桥路46号,朝阳,燕莎,1室,145,83000,1150
6 | 合景寰汇公馆,住宅,北京市通州区滨河中路西侧(合景寰汇公馆),通州,武夷花园,2室,77,35000,280
7 | K2十里春风,住宅,北京市通州区,通州,通州其它,2室,74,23500,188
8 | K2十里春风,别墅,北京市通州区,通州,通州其它,3室,155,28000,440
9 | 玺萌壹號院,别墅,西南三环嘉园路与镇国寺北街交叉口,丰台,草桥,5室,320,90000,3650
10 | 北京书院,住宅,北京市朝阳区北土城东路辅路,朝阳,惠新西街,1室,79,155000,1066
11 | 中铁华侨城和园,住宅,南五环南海子公园西侧约500米,大兴,瀛海,3室,154,60000,930
12 | 顺鑫颐和天璟,住宅,北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心,顺义,顺义其它,4室,110,28000,400
13 | 顺鑫颐和天璟,别墅,新城右堤路与昌金路交汇处向北200米,顺义,顺义其它,4室,278,28000,950
14 | 永旺19街,商业,地铁生物医药基地站向南200米,大兴,天宫院,,,24000,299
15 | 北京城建北京合院,住宅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,3室,95,46000,556
16 | 复地运河公馆,住宅,通州运河核心区临滨河西路,通州,武夷花园,2室,89,43000,450
17 | 北京城建北京合院,别墅,燕京街与通顺路交汇口东800米(仁和公园南),顺义,顺义其它,4室,210,39000,1000
18 | 月亮河七星公馆,住宅,通燕高速耿庄桥出口南200米月亮河,河滨路1号,通州,武夷花园,1室,55,68000,374
19 | 天润福熙大道,住宅,"清河营东路1号院, 清河营东路3号院",朝阳,北苑,1室,65,108000,750
20 | 京贸国际公馆,住宅,怡乐中路299号院(广渠快速路二期出口向南1000米),通州,九棵树(家乐福),1室,72,64000,495
21 | 凯德麓语,别墅,兴寿镇京承高速G11出口向西怀昌路北侧,昌平,昌平其它,3室,280,35000,850
22 | 京贸国际城·峰景,住宅,芙蓉东路1号(通燕高速耿庄桥北出口向南300米),通州,武夷花园,1室,69,68000,460
23 | 观唐云鼎,别墅,溪翁庄镇密溪路39号院(云佛山度假村对面),密云,溪翁庄镇,3室,346,30000,1068
24 | 旭辉城,住宅,北京市房山区良锦街6号院旭辉城营销中心,房山,房山其它,2室,75,28500,219
25 | 檀香府,住宅,京潭大街与潭柘十街交叉口,门头沟,门头沟其它,3室,124,42000,530
26 | 泰禾金府大院,别墅,南四环地铁新宫站南800米,丰台,新宫,4室,362,75000,2700
27 | 和棠瑞著,别墅,金海湖景区坝前广场西侧500米,平谷,平谷其它,3室,305,16000,530
28 | 尊悦光华,住宅,北京市朝阳区光华东里甲1号院3号楼,朝阳,CBD,3室,133,150000,2500
29 | 首创·河著,别墅,京承高速11出口(昌金路)向东900 米路北,顺义,顺义其它,4室,248,38000,1200
30 | 华萃西山,住宅,永定镇地铁S1号线石厂西南700米,门头沟,门头沟其它,3室,115,48000,560
31 | 京西悦府,别墅,北京市房山区燕房线阎村地铁站东南角约189米,房山,阎村,3室,175,40000,700
32 | 中粮天恒天悦壹号,别墅,南四环地铁新宫站南500米,丰台,新宫,4室,220,80000,2000
33 | 龙湾别墅,住宅,后沙峪镇龙湾别墅,顺义,中央别墅区,4室,218,70000,2300
34 | 京投发展·锦悦府,住宅,檀营乡檀东路西侧,密云,鼓楼街道,3室,90,25607,220
35 | 京投发展·锦悦府,别墅,檀营乡檀东路西侧,密云,鼓楼街道,3室,187,25000,400
36 | 金辰府,住宅,北京市昌平区北七家镇政府东南100米,昌平,北七家,3室,89,55000,490
37 | 建邦·顺颐府,住宅,空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心,顺义,后沙峪,3室,89,55583,480
38 | 葛洲坝中国府,住宅,北京市丰台东路46号,丰台,玉泉营,3室,168,125000,2200
39 | 华萃西山,别墅,门头沟永定镇地铁S1号线石厂站西南700米,门头沟,门头沟其它,4室,135,48000,760
40 | 富兴首府,住宅,东坝路9号东北60米,朝阳,东坝,3室,144,85000,1706
41 | 中铁诺德阅墅,别墅,顺义区后沙峪镇裕园路762乡龙湖滟澜山对面,顺义,中央别墅区,4室,235,50000,1150
42 | 中铁华侨城和园,别墅,南五环南海子公园西侧约500米,大兴,瀛海,4室,288,50000,1870
43 | 懋源·璟岳,别墅,南三环西路99号院,丰台,玉泉营,4室,465,140000,6500
44 | 合景泰富天汇,住宅,顺义区昌金路与通顺路交汇处,顺义,马坡,2室,70,33000,230
45 | 懋源·璟玺,别墅,孙河京密路与京平辅路交叉口西行1000米,朝阳,中央别墅区,5室,500,100000,4380
46 | 万科雲庐,住宅,魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置,丰台,丰台其它,4室,104,39000,656
47 | 万科雲庐,别墅,魏各庄路万科雲庐,丰台,丰台其它,4室,200,30000,852
48 | 金茂北京国际社区,住宅,顺义新城北小营昌金路水色时光路西,顺义,顺义其它,1室,50,30000,160
49 | 住总如院,住宅,北京市大兴区采华路(波尔多小镇南区西南侧约250米),大兴,大兴新机场洋房别墅区,2室,98,31136,280
50 | 郎府书苑,住宅,西集镇京哈高速郎府出口南侧300米,通州,通州其它,3室,89,25800,273
51 | 建邦·顺颐府,别墅,空港B区裕民大街30号,顺义,后沙峪,3室,270,55583,1300
52 |
--------------------------------------------------------------------------------
/test1_house/house_outputGBK编码,可用excle打开,.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/house_outputGBK编码,可用excle打开,.csv
--------------------------------------------------------------------------------
/test1_house/house_show.py:
--------------------------------------------------------------------------------
# Visualizes the new-home data in the CSV file as a scatter plot of
# unit price against total price.

import matplotlib.pyplot as plt
import csv

filename = 'house_output.csv'
with open(filename, "r", encoding='utf-8') as f:  # the CSV must be opened as UTF-8
    data = csv.reader(f)
    unit_price = []
    total_price = []
    house_type = []
    test_f = 1
    for i in data:
        if test_f == 1:  # skip the header row
            test_f = 0
        else:
            unit_price.append(int(i[7]))
            total_price.append(int(i[8]))
            house_type.append(i[1])

# Split the prices by property type: 住宅 (residence), 别墅 (villa), 商业 (commercial).
up1, up2, up3 = [], [], []
tp1, tp2, tp3 = [], [], []
for i in range(len(unit_price)):
    if house_type[i] == '住宅':
        up1.append(unit_price[i])
        tp1.append(total_price[i])
    if house_type[i] == '别墅':
        up2.append(unit_price[i])
        tp2.append(total_price[i])
    if house_type[i] == '商业':
        up3.append(unit_price[i])
        tp3.append(total_price[i])


def sort_by_total_price(tp, up):
    # Insertion sort on tp that applies the same swaps to up, so pairs stay aligned.
    for i in range(len(tp)):
        cur_index = i
        # Check the bound first so that tp[-1] is never compared.
        while cur_index - 1 >= 0 and tp[cur_index - 1] > tp[cur_index]:
            tp[cur_index], tp[cur_index - 1] = tp[cur_index - 1], tp[cur_index]
            up[cur_index], up[cur_index - 1] = up[cur_index - 1], up[cur_index]
            cur_index -= 1


sort_by_total_price(tp1, up1)
sort_by_total_price(tp2, up2)
sort_by_total_price(tp3, up3)
#print(unit_price)
#print(total_price)
#print(house_type)

color_list = ['#FF8C00', '#00FF00', '#0000FF']  # residence, villa, commercial
types = ['residence', 'villa', 'commercial']

plt.figure(figsize=(30, 10), dpi=70)
plt.title('total_price and unit_price for different type house')
plt.scatter(tp1, up1, s=30, c=color_list[0])
plt.scatter(tp2, up2, s=30, c=color_list[1])
plt.scatter(tp3, up3, s=30, c=color_list[2])
plt.xlabel('total_price/10000 yuan')
plt.ylabel('unit_price/yuan')
plt.legend(loc='lower right', title='house_type', labels=types)
plt.show()
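# Note: the script sorts each (total price, unit price) pair list by hand.
# A minimal sketch of the same pairing with the built-in sorted() follows;
# the function name sort_pairs_by_first is hypothetical and not part of the
# repository. sorted() is stable, so pairs with equal total prices keep
# their original relative order.

```python
def sort_pairs_by_first(xs, ys):
    """Sort two parallel lists by the values in xs, keeping pairs aligned."""
    if not xs:
        return [], []
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    sorted_xs, sorted_ys = zip(*pairs)
    return list(sorted_xs), list(sorted_ys)


# Three rows from house_output.csv: total price (10,000 yuan) and unit price (yuan).
total_price = [430, 1150, 230]
unit_price = [36000, 83000, 33000]
tp_sorted, up_sorted = sort_pairs_by_first(total_price, unit_price)
```

# A scatter plot does not depend on point order, so the sort only matters if
# the lists are later drawn as a line or printed.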
--------------------------------------------------------------------------------
/test1_house/house_show2.py:
--------------------------------------------------------------------------------
# Draws a bar chart of the average unit price per district.
import matplotlib.pyplot as plt
import csv

filename = 'house_output.csv'
with open(filename, "r", encoding='utf-8') as f:  # the CSV must be opened as UTF-8
    data = csv.reader(f)
    unit_price = []
    pos = []  # district
    test_f = 1
    for i in data:
        if test_f == 1:  # skip the header row
            test_f = 0
        else:
            unit_price.append(int(i[7]))
            pos.append(str(i[3]))

# District names as they appear in the position1 column, in plotting order.
districts = ['朝阳', '丰台', '顺义', '通州', '大兴', '昌平', '门头沟', '房山', '密云', '平谷']
house_num = [0] * len(districts)  # number of developments per district
price_sum = [0] * len(districts)  # sum of unit prices per district, averaged below

for i in range(len(pos)):
    if pos[i] in districts:
        index = districts.index(pos[i])
        house_num[index] = house_num[index] + 1
        price_sum[index] = price_sum[index] + unit_price[i]
print(house_num)

bins_num = [3]  # running total of listings, starting from the original offset of 3; only printed
for i in range(len(districts)):
    if house_num[i] != 0:  # avoid dividing by zero for a district with no listings
        price_sum[i] = price_sum[i] / house_num[i]  # average unit price
    bins_num.append(bins_num[-1] + house_num[i])


position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu']
print(bins_num)
print(price_sum)
plt.figure(figsize=(30, 10), dpi=70)
plt.title('unit_price_show', fontsize=30)
plt.xlabel('position', fontsize=15)
plt.ylabel('avg_unit_price/yuan', fontsize=15)
for i in range(0, 10):
    # Bar width scales with the number of listings in the district.
    plt.bar(position_qu[i], price_sum[i], width=(house_num[i] / 15))
for x, y in zip(position_qu, price_sum):
    plt.text(x, y, '%d' % y, ha='center', va='bottom', fontsize=15)
plt.show()
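# Note: the per-district averaging above can also be written without a fixed
# list of districts. A minimal sketch follows, assuming rows are read with
# csv.DictReader and the district is in the position1 column; the function
# name average_by_district is hypothetical and not part of the repository.

```python
import csv
import io
from collections import defaultdict


def average_by_district(rows, value_column):
    """Return {district: mean of value_column} for an iterable of CSV dict rows."""
    totals = defaultdict(lambda: [0, 0])  # district -> [sum, count]
    for row in rows:
        entry = totals[row["position1"]]
        entry[0] += int(row[value_column])
        entry[1] += 1
    return {district: total / count for district, (total, count) in totals.items()}


# Header and three rows copied from house_output.csv.
SAMPLE = """name,types,position,position1,position2,houseType,space,unitPrice,totalPrice
北辰墅院1900,住宅,顺兴街11号院望尊园,顺义,马坡,3室,83,36000,430
福景苑,住宅,亮马桥路46号,朝阳,燕莎,1室,145,83000,1150
合景泰富天汇,住宅,顺义区昌金路与通顺路交汇处,顺义,马坡,2室,70,33000,230
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
avg_unit_price = average_by_district(rows, "unitPrice")
avg_total_price = average_by_district(rows, "totalPrice")
```

# Districts that do not occur in the data never enter the dictionary, so no
# division by zero can happen; the same function serves the total-price chart
# by passing "totalPrice".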
--------------------------------------------------------------------------------
/test1_house/house_show3.py:
--------------------------------------------------------------------------------
# Draws a bar chart of the average total price per district.
import matplotlib.pyplot as plt
import csv

filename = 'house_output.csv'
with open(filename, "r", encoding='utf-8') as f:  # the CSV must be opened as UTF-8
    data = csv.reader(f)
    total_price = []
    pos = []  # district
    test_f = 1
    for i in data:
        if test_f == 1:  # skip the header row
            test_f = 0
        else:
            total_price.append(int(i[8]))
            pos.append(str(i[3]))

# District names as they appear in the position1 column, in plotting order.
districts = ['朝阳', '丰台', '顺义', '通州', '大兴', '昌平', '门头沟', '房山', '密云', '平谷']
house_num = [0] * len(districts)  # number of developments per district
price_sum = [0] * len(districts)  # sum of total prices per district, averaged below

for i in range(len(pos)):
    if pos[i] in districts:
        index = districts.index(pos[i])
        house_num[index] = house_num[index] + 1
        price_sum[index] = price_sum[index] + total_price[i]
print(house_num)

bins_num = [3]  # running total of listings, starting from the original offset of 3; only printed
for i in range(len(districts)):
    if house_num[i] != 0:  # avoid dividing by zero for a district with no listings
        price_sum[i] = price_sum[i] / house_num[i]  # average total price
    bins_num.append(bins_num[-1] + house_num[i])


position_qu = ['chaoyang', 'fengtai', 'shunyi', 'tongzhou', 'daxing', 'changping', 'mentougou', 'fangshan', 'miyun', 'pinggu']
print(bins_num)
print(price_sum)
plt.figure(figsize=(30, 10), dpi=70)
plt.title('total_price_show', fontsize=30)
plt.xlabel('position', fontsize=15)
plt.ylabel('avg_total_price/10000 yuan', fontsize=15)
for i in range(0, 10):
    # Bar width scales with the number of listings in the district.
    plt.bar(position_qu[i], price_sum[i], width=(house_num[i] / 15))
for x, y in zip(position_qu, price_sum):
    plt.text(x, y, '%d' % y, ha='center', va='bottom', fontsize=15)
plt.show()
--------------------------------------------------------------------------------
/test1_house/scrapy-test-firsthand.json:
--------------------------------------------------------------------------------
1 | {"name": "北辰墅院1900", "types": "住宅", "position": "顺兴街11号院望尊园", "position1": "顺义", "position2": "马坡", "houseType": "3室", "space": "建面 83-135㎡", "unitPrice": "36000", "totalPrice": "总价430(万/套)"}
2 | {"name": "燕西华府", "types": "别墅", "position": "王佐镇青龙湖公园东1500米, 泉湖西路1号院(七区), 泉湖西路1号院(六区)", "position1": "丰台", "position2": "丰台其它", "houseType": "3室", "space": "建面 350-851㎡", "unitPrice": "47000", "totalPrice": "总价1400-3500(万/套)"}
3 | {"name": "京西悦府", "types": "住宅", "position": "燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": null, "space": "建面 120-135㎡", "unitPrice": "33000", "totalPrice": "总价440(万/套)"}
4 | {"name": "福景苑", "types": "住宅", "position": "亮马桥路46号", "position1": "朝阳", "position2": "燕莎", "houseType": "1室", "space": "建面 145-268㎡", "unitPrice": "83000", "totalPrice": "总价1150-2400(万/套)"}
5 | {"name": "合景寰汇公馆", "types": "住宅", "position": "北京市通州区滨河中路西侧(合景寰汇公馆)", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 77-117㎡", "unitPrice": "35000", "totalPrice": "总价280-490(万/套)"}
6 | {"name": "K2十里春风", "types": "住宅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "2室", "space": "建面 74-90㎡", "unitPrice": "23500", "totalPrice": "总价188-212(万/套)"}
7 | {"name": "K2十里春风", "types": "别墅", "position": "北京市通州区", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 155-156㎡", "unitPrice": "28000", "totalPrice": "总价440-460(万/套)"}
8 | {"name": "玺萌壹號院", "types": "别墅", "position": "西南三环嘉园路与镇国寺北街交叉口", "position1": "丰台", "position2": "草桥", "houseType": "5室", "space": "建面 320-464㎡", "unitPrice": "90000", "totalPrice": "总价3650-3940(万/套)"}
9 | {"name": "北京书院", "types": "住宅", "position": "北京市朝阳区北土城东路辅路", "position1": "朝阳", "position2": "惠新西街", "houseType": "1室", "space": "建面 79-139㎡", "unitPrice": "155000", "totalPrice": "总价1066(万/套)"}
10 | {"name": "中铁华侨城和园", "types": "住宅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "3室", "space": "建面 154-184㎡", "unitPrice": "60000", "totalPrice": "总价930-980(万/套)"}
11 | {"name": "顺鑫颐和天璟", "types": "住宅", "position": "北京市顺义区牛栏山镇牛富路顺鑫颐和天璟禧润售楼中心", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 110-220㎡", "unitPrice": "28000", "totalPrice": "总价400-420(万/套)"}
12 | {"name": "顺鑫颐和天璟", "types": "别墅", "position": "新城右堤路与昌金路交汇处向北200米", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 278-486㎡", "unitPrice": "28000", "totalPrice": "总价950-1200(万/套)"}
13 | {"name": "永旺19街", "types": "商业", "position": "地铁生物医药基地站向南200米", "position1": "大兴", "position2": "天宫院", "houseType": null, "space": null, "unitPrice": "24000", "totalPrice": "总价299(万/套)"}
14 | {"name": "北京城建北京合院", "types": "住宅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "3室", "space": "建面 95-130㎡", "unitPrice": "46000", "totalPrice": "总价556-566(万/套)"}
15 | {"name": "复地运河公馆", "types": "住宅", "position": "通州运河核心区临滨河西路", "position1": "通州", "position2": "武夷花园", "houseType": "2室", "space": "建面 89-145㎡", "unitPrice": "43000", "totalPrice": "总价450-650(万/套)"}
16 | {"name": "北京城建北京合院", "types": "别墅", "position": "燕京街与通顺路交汇口东800米(仁和公园南)", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 210-330㎡", "unitPrice": "39000", "totalPrice": "总价1000-1300(万/套)"}
17 | {"name": "月亮河七星公馆", "types": "住宅", "position": "通燕高速耿庄桥出口南200米月亮河,河滨路1号", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 55-109㎡", "unitPrice": "68000", "totalPrice": "总价374-800(万/套)"}
18 | {"name": "天润福熙大道", "types": "住宅", "position": "清河营东路1号院, 清河营东路3号院", "position1": "朝阳", "position2": "北苑", "houseType": "1室", "space": "建面 65-374㎡", "unitPrice": "108000", "totalPrice": "总价750-3316(万/套)"}
19 | {"name": "京贸国际公馆", "types": "住宅", "position": "怡乐中路299号院(广渠快速路二期出口向南1000米)", "position1": "通州", "position2": "九棵树(家乐福)", "houseType": "1室", "space": "建面 72-147㎡", "unitPrice": "64000", "totalPrice": "总价495-950(万/套)"}
20 | {"name": "凯德麓语", "types": "别墅", "position": "兴寿镇京承高速G11出口向西怀昌路北侧", "position1": "昌平", "position2": "昌平其它", "houseType": "3室", "space": "建面 280-863㎡", "unitPrice": "35000", "totalPrice": "总价850-3450(万/套)"}
21 | {"name": "京贸国际城·峰景", "types": "住宅", "position": "芙蓉东路1号(通燕高速耿庄桥北出口向南300米)", "position1": "通州", "position2": "武夷花园", "houseType": "1室", "space": "建面 69-140㎡", "unitPrice": "68000", "totalPrice": "总价460-980(万/套)"}
22 | {"name": "观唐云鼎", "types": "别墅", "position": "溪翁庄镇密溪路39号院(云佛山度假村对面)", "position1": "密云", "position2": "溪翁庄镇", "houseType": "3室", "space": "建面 346-613㎡", "unitPrice": "30000", "totalPrice": "总价1068-1850(万/套)"}
23 | {"name": "旭辉城", "types": "住宅", "position": "北京市房山区良锦街6号院旭辉城营销中心", "position1": "房山", "position2": "房山其它", "houseType": "2室", "space": "建面 75-116㎡", "unitPrice": "28500", "totalPrice": "总价219-330(万/套)"}
24 | {"name": "檀香府", "types": "住宅", "position": "京潭大街与潭柘十街交叉口", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 124-170㎡", "unitPrice": "42000", "totalPrice": "总价530-750(万/套)"}
25 | {"name": "泰禾金府大院", "types": "别墅", "position": "南四环地铁新宫站南800米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 362-504㎡", "unitPrice": "75000", "totalPrice": "总价2700-3700(万/套)"}
26 | {"name": "和棠瑞著", "types": "别墅", "position": "金海湖景区坝前广场西侧500米", "position1": "平谷", "position2": "平谷其它", "houseType": "3室", "space": "建面 305-360㎡", "unitPrice": "16000", "totalPrice": "总价530-560(万/套)"}
27 | {"name": "尊悦光华", "types": "住宅", "position": "北京市朝阳区光华东里甲1号院3号楼", "position1": "朝阳", "position2": "CBD", "houseType": "3室", "space": "建面 133-171㎡", "unitPrice": "150000", "totalPrice": "总价2500(万/套)"}
28 | {"name": "首创·河著", "types": "别墅", "position": "京承高速11出口(昌金路)向东900 米路北", "position1": "顺义", "position2": "顺义其它", "houseType": "4室", "space": "建面 248-310㎡", "unitPrice": "38000", "totalPrice": "总价1200-1900(万/套)"}
29 | {"name": "华萃西山", "types": "住宅", "position": "永定镇地铁S1号线石厂西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "3室", "space": "建面 115-122㎡", "unitPrice": "48000", "totalPrice": "总价560-600(万/套)"}
30 | {"name": "京西悦府", "types": "别墅", "position": "北京市房山区燕房线阎村地铁站东南角约189米", "position1": "房山", "position2": "阎村", "houseType": "3室", "space": "建面 175-176㎡", "unitPrice": "40000", "totalPrice": "总价700-780(万/套)"}
31 | {"name": "中粮天恒天悦壹号", "types": "别墅", "position": "南四环地铁新宫站南500米", "position1": "丰台", "position2": "新宫", "houseType": "4室", "space": "建面 220-340㎡", "unitPrice": "80000", "totalPrice": "总价2000-2360(万/套)"}
32 | {"name": "龙湾别墅", "types": "住宅", "position": "后沙峪镇龙湾别墅", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 218-317㎡", "unitPrice": "70000", "totalPrice": "总价2300(万/套)"}
33 | {"name": "京投发展·锦悦府", "types": "住宅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 90㎡", "unitPrice": "25607", "totalPrice": "总价220(万/套)"}
34 | {"name": "京投发展·锦悦府", "types": "别墅", "position": "檀营乡檀东路西侧", "position1": "密云", "position2": "鼓楼街道", "houseType": "3室", "space": "建面 187-285㎡", "unitPrice": "25000", "totalPrice": "总价400-560(万/套)"}
35 | {"name": "金辰府", "types": "住宅", "position": "北京市昌平区北七家镇政府东南100米", "position1": "昌平", "position2": "北七家", "houseType": "3室", "space": "建面 89-143㎡", "unitPrice": "55000", "totalPrice": "总价490-790(万/套)"}
36 | {"name": "建邦·顺颐府", "types": "住宅", "position": "空港B区裕民大街30号千里马国际一层建邦·顺颐府售楼中心", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 89-147㎡", "unitPrice": "55583", "totalPrice": "总价480-845(万/套)"}
37 | {"name": "葛洲坝中国府", "types": "住宅", "position": "北京市丰台东路46号", "position1": "丰台", "position2": "玉泉营", "houseType": "3室", "space": "建面 168-240㎡", "unitPrice": "125000", "totalPrice": "总价2200-3000(万/套)"}
38 | {"name": "华萃西山", "types": "别墅", "position": "门头沟永定镇地铁S1号线石厂站西南700米", "position1": "门头沟", "position2": "门头沟其它", "houseType": "4室", "space": "建面 135-245㎡", "unitPrice": "48000", "totalPrice": "总价760-1060(万/套)"}
39 | {"name": "富兴首府", "types": "住宅", "position": "东坝路9号东北60米", "position1": "朝阳", "position2": "东坝", "houseType": "3室", "space": "建面 144-356㎡", "unitPrice": "85000", "totalPrice": "总价1706-2240(万/套)"}
40 | {"name": "中铁诺德阅墅", "types": "别墅", "position": "顺义区后沙峪镇裕园路762乡龙湖滟澜山对面", "position1": "顺义", "position2": "中央别墅区", "houseType": "4室", "space": "建面 235-320㎡", "unitPrice": "50000", "totalPrice": "总价1150-1700(万/套)"}
41 | {"name": "中铁华侨城和园", "types": "别墅", "position": "南五环南海子公园西侧约500米", "position1": "大兴", "position2": "瀛海", "houseType": "4室", "space": "建面 288-370㎡", "unitPrice": "50000", "totalPrice": "总价1870(万/套)"}
42 | {"name": "懋源·璟岳", "types": "别墅", "position": "南三环西路99号院", "position1": "丰台", "position2": "玉泉营", "houseType": "4室", "space": "建面 465-590㎡", "unitPrice": "140000", "totalPrice": "总价6500-9000(万/套)"}
43 | {"name": "合景泰富天汇", "types": "住宅", "position": "顺义区昌金路与通顺路交汇处", "position1": "顺义", "position2": "马坡", "houseType": "2室", "space": "建面 70-117㎡", "unitPrice": "33000", "totalPrice": "总价230-390(万/套)"}
44 | {"name": "懋源·璟玺", "types": "别墅", "position": "孙河京密路与京平辅路交叉口西行1000米", "position1": "朝阳", "position2": "中央别墅区", "houseType": "5室", "space": "建面 500-716㎡", "unitPrice": "100000", "totalPrice": "总价4380-6778(万/套)"}
45 | {"name": "万科雲庐", "types": "住宅", "position": "魏各庄路万科雲庐。售楼处位置在山湖路与泉湖西湖交叉位置", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 104-300㎡", "unitPrice": "39000", "totalPrice": "总价656-820(万/套)"}
46 | {"name": "万科雲庐", "types": "别墅", "position": "魏各庄路万科雲庐", "position1": "丰台", "position2": "丰台其它", "houseType": "4室", "space": "建面 200-330㎡", "unitPrice": "30000", "totalPrice": "总价852-950(万/套)"}
47 | {"name": "金茂北京国际社区", "types": "住宅", "position": "顺义新城北小营昌金路水色时光路西", "position1": "顺义", "position2": "顺义其它", "houseType": "1室", "space": "建面 50-118㎡", "unitPrice": "30000", "totalPrice": "总价160-360(万/套)"}
48 | {"name": "住总如院", "types": "住宅", "position": "北京市大兴区采华路(波尔多小镇南区西南侧约250米)", "position1": "大兴", "position2": "大兴新机场洋房别墅区", "houseType": "2室", "space": "建面 98-233㎡", "unitPrice": "31136", "totalPrice": "总价280-475(万/套)"}
49 | {"name": "郎府书苑", "types": "住宅", "position": "西集镇京哈高速郎府出口南侧300米", "position1": "通州", "position2": "通州其它", "houseType": "3室", "space": "建面 89-116㎡", "unitPrice": "25800", "totalPrice": "总价273-300(万/套)"}
50 | {"name": "建邦·顺颐府", "types": "别墅", "position": "空港B区裕民大街30号", "position1": "顺义", "position2": "后沙峪", "houseType": "3室", "space": "建面 270㎡", "unitPrice": "55583", "totalPrice": "总价1300(万/套)"}
51 |
--------------------------------------------------------------------------------
/test1_house/单价-总价散点图绘制效果.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/单价-总价散点图绘制效果.png
--------------------------------------------------------------------------------
/test1_house/单价直方图绘制效果.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/单价直方图绘制效果.png
--------------------------------------------------------------------------------
/test1_house/总价直方图绘制效果.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/test1_house/总价直方图绘制效果.png
--------------------------------------------------------------------------------
/zufang/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # 默认忽略的文件
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/zufang/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
--------------------------------------------------------------------------------
/zufang/.idea/misc.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
--------------------------------------------------------------------------------
/zufang/.idea/modules.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/zufang/.idea/zufang.iml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/zufang/GDP_price_show.py:
--------------------------------------------------------------------------------
1 | # An earlier script already reports each city's average rent per unit area, so the values are hard-coded here.
2 | # Average rent per unit area: Beijing: 103.2 CNY/m²/month
3 | # Guangzhou: 63.9 CNY/m²/month
4 | # Shanghai: 103.5 CNY/m²/month
5 | # Shenzhen: 88.5 CNY/m²/month
6 | # Xi'an: 36.0 CNY/m²/month
7 | # Per-capita GDP of each city, looked up via Baidu:
8 | # Per-capita GDP (10,000 CNY per person): Beijing 19.03, Guangzhou 15.36, Shanghai 17.99, Shenzhen 18.33, Xi'an 8.88
9 | # Plot the ratio (average rent per unit area) / (per-capita GDP); a lower ratio is better, because at the same rent per unit area a higher per-capita GDP makes the rent more affordable.
10 |
11 | import matplotlib.pyplot as plt
12 |
13 | GDP = [19.03, 15.36, 17.99, 18.33, 8.88]
14 | space_price = [103.2, 63.9, 103.5, 88.5, 36.0]
15 | count = []
16 | # Compute the ratio for each city
17 | for i in range(0,5):
18 |     count.append(space_price[i]/GDP[i])
19 |
20 | city_name = ['北京', '广州', '上海', '深圳', '西安']
21 | plt.figure(figsize=(20, 10), dpi=70)
22 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
23 |
24 | # Bar chart of the ratio
25 | plt.subplot(121)
26 | plt.title("单位面积价格与GDP的比值直方图",fontsize=20)
27 | plt.bar(city_name, count,width=0.4)
28 | plt.ylabel("单位面积价格/人均GDP(万元)",fontsize=15)
29 |
30 | # Scatter plot, one point per city
31 | plt.subplot(122)
32 | plt.scatter(GDP[0],space_price[0],s=60, label='北京', color='steelblue')
33 | plt.scatter(GDP[1],space_price[1],s=60, label='广州', color='brown')
34 | plt.scatter(GDP[2],space_price[2],s=60, label='上海',color='green')
35 | plt.scatter(GDP[3],space_price[3],s=60, label='深圳',color='darkorange')
36 | plt.scatter(GDP[4],space_price[4],s=60, label='西安',color='skyblue')
37 | plt.title("GDP-单位面积均价散点图",fontsize=20)
38 | plt.xlabel("人均GDP 单位:万元",fontsize=15)
39 | plt.ylabel("单位面积均价 单位:元/平方米/月",fontsize=15)
40 | plt.legend(fontsize=15)
41 |
42 | plt.show()
43 |
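
The ratio step in `GDP_price_show.py` can also be written without index arithmetic. The following is a minimal sketch, not code from the repository: `rent_to_gdp_ratio` and `ranking` are hypothetical names introduced for illustration, and the numbers are the ones hard-coded in the script.

```python
# Values copied from GDP_price_show.py.
GDP = [19.03, 15.36, 17.99, 18.33, 8.88]          # per-capita GDP, 10,000 CNY per person
space_price = [103.2, 63.9, 103.5, 88.5, 36.0]    # average rent, CNY per m² per month
city_name = ['北京', '广州', '上海', '深圳', '西安']


def rent_to_gdp_ratio(prices, gdps):
    """Return price / gdp for each pair (hypothetical helper, not in the repo)."""
    if len(prices) != len(gdps):
        raise ValueError("prices and gdps must have the same length")
    return [p / g for p, g in zip(prices, gdps)]


# Map each city to its ratio, then order the cities from lowest to highest ratio.
ratios = dict(zip(city_name, rent_to_gdp_ratio(space_price, GDP)))
ranking = sorted(ratios, key=ratios.get)
```

Pairing the lists with `zip` removes the hard-coded `range(0,5)`, so adding a sixth city only requires extending the three lists.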
--------------------------------------------------------------------------------
/zufang/chromedriver.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/zufang/chromedriver.exe
--------------------------------------------------------------------------------
/zufang/face_price_show.py:
--------------------------------------------------------------------------------
1 | # Compare the average rent by orientation (the direction a unit faces) across five cities
2 |
3 | import json
4 | import codecs
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 |
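8 | avg_price = [] # Average price per orientation, one list per city
9 | count_list = [] # Number of listings per orientation, one list per city
10 |
11 | class Read_Json_and_show():
12 | def reads(self,filename,city, avg_price,count_list): # Read one city's data and append the results to the given lists
13 | temp_price = [] # Running price total per orientation
14 | count = [] # Listing count per orientation
15 | for i in range(0,4): # Indices 0, 1, 2, 3 stand for east, south, west, north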
16 | count.append(0)
17 | temp_price.append(0)
18 |
19 | with codecs.open(filename, 'r', encoding='utf-8') as f:
20 | read = f.readlines()
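21 | # Process the file line by line; each line is one JSON record
22 | for index, info in enumerate(read):
23 | data = json.loads(info)
24 | value = list(data.values())
25 | # value holds the field values of the current record as a list
26 |
27 | # Split the information field step by step until only the orientation remains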
28 | temp_price_read = value[1].split('-')
29 | price = temp_price_read[0]
30 | price = int(price)
31 | temp_face_read = value[5].split('㎡ /')
32 | temp_face_read = temp_face_read[1].split(' / ')
33 | temp_face_read = temp_face_read[0].split('在')
34 |
35 | if len(temp_face_read) == 1:
36 | temp_face_read = temp_face_read[0].split('室')
37 | if len(temp_face_read) == 1:
38 | temp_face_read = temp_face_read[0].split(' ')
39 | # A listing can face several directions, so e.g. '东 南' is counted once under east and once under south
40 | for face in temp_face_read:
41 | if face == '东':
42 | temp_price[0] += price
43 | count[0] += 1
44 | if face == '南':
45 | temp_price[1] += price
46 | count[1] += 1
47 | if face == '西':
48 | temp_price[2] += price
49 | count[2] += 1
50 | if face == '北':
51 | temp_price[3] += price
52 | count[3] += 1
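53 | # Compute the average price per orientation (0 when no listing has that orientation, which avoids a ZeroDivisionError)
54 | for i in range(len(count)):
55 | temp_price[i] = temp_price[i]/count[i] if count[i] else 0
56 |
57 | # Append this city's results to the shared lists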
58 | avg_price.append(temp_price)
59 | count_list.append(count)
60 |
61 |
62 | if __name__ == '__main__':
63 | read_json = Read_Json_and_show()
64 | # Data file paths
65 | path1 = 'scrapy-beijing-zufang.json'
66 | path2 = 'scrapy-guangzhou-zufang.json'
67 | path3 = 'scrapy-shanghai-zufang.json'
68 | path4 = 'scrapy-shenzhen-zufang.json'
69 | path5 = 'scrapy-xian-zufang.json'
70 | # Read and analyze each of the five cities
71 | read_json.reads(path1, 0, avg_price, count_list)
72 | read_json.reads(path2, 1, avg_price, count_list)
73 | read_json.reads(path3, 2, avg_price, count_list)
74 | read_json.reads(path4, 3, avg_price, count_list)
75 | read_json.reads(path5, 4, avg_price, count_list)
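76 | # Print the results for inspection
77 | print(avg_price)
78 | print(count_list)
79 | # Plot the bars for all five cities in one chart
80 | face_name = ['东','南','西','北']
81 | plt.figure(figsize=(30, 30), dpi=70)
82 | bar_width = 0.1 # Horizontal offset between adjacent bars
83 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
84 |
85 | # Offset each city's bars so that every x position shows one bar per city, distinguished by color and legend entry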
86 | plt.title('各城市面向以及价格比较', fontsize=25)
87 | plt.bar(x=np.arange(len(face_name)), height=avg_price[0], width=bar_width, label='北京', color='steelblue')
88 | plt.bar(x=np.arange(len(face_name)) + bar_width, height=avg_price[1], width=bar_width, label='广州', color='brown')
89 | plt.bar(x=np.arange(len(face_name)) + bar_width * 2, height=avg_price[2], width=bar_width, label='上海',color='greenyellow')
90 | plt.bar(x=np.arange(len(face_name)) + bar_width * 3, height=avg_price[3], width=bar_width, label='深圳',color='darkorange')
91 | plt.bar(x=np.arange(len(face_name)) + bar_width * 4, height=avg_price[4], width=bar_width, label='西安',color='skyblue')
92 | plt.xticks(np.arange(4) + 0.2, face_name,fontsize=20)
93 | plt.ylabel('价格:元/月',fontsize=20)
94 | # Show the legend
95 | plt.legend(fontsize=10)
96 | # Label each bar with its value
97 | for x, y in enumerate(avg_price[0]):
98 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
99 | for x, y in enumerate(avg_price[1]):
100 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
101 | for x, y in enumerate(avg_price[2]):
102 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
103 | for x, y in enumerate(avg_price[3]):
104 | plt.text(x + bar_width * 3, y + 0.2, "%s" % round(y, 1), ha='center')
105 | for x, y in enumerate(avg_price[4]):
106 | plt.text(x + bar_width * 4, y + 0.2, "%s" % round(y, 1), ha='center')
107 |
108 | plt.show()
109 |
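
The per-orientation averaging in `face_price_show.py` can be sketched separately from the string parsing. The helper below is an illustration under stated assumptions, not the repository's code: `mean_price_by_face` is a hypothetical name, and it assumes each record has already been reduced to a price and a list of orientation characters.

```python
from collections import defaultdict

FACES = ['东', '南', '西', '北']  # east, south, west, north


def mean_price_by_face(records):
    """Average the price per orientation.

    records: iterable of (price, faces), where faces is a list such as ['东', '南'].
    Returns {face: mean price}, with None for an orientation that never occurs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for price, faces in records:
        for face in faces:
            if face in FACES:  # ignore tokens that are not one of the four directions
                totals[face] += price
                counts[face] += 1
    # Guard the division so an orientation with no listings does not raise.
    return {f: (totals[f] / counts[f] if counts[f] else None) for f in FACES}


sample = [(3000, ['南']), (5000, ['东', '南']), (4000, ['北'])]
result = mean_price_by_face(sample)
```

Returning `None` for an empty group keeps "no data" distinguishable from an average of zero; a plotting step would need to map it to 0 or skip the bar.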
--------------------------------------------------------------------------------
/zufang/pos_price_show.py:
--------------------------------------------------------------------------------
1 | # Show the average rent of each district (板块) in each city
2 |
3 | import json
4 | import codecs
5 | import matplotlib.pyplot as plt
6 |
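7 | avg_price = [] # Average price per district, one list per city
8 | pos_list = [] # District names, one list per city
9 | count_list = [] # Number of districts per city
10 |
11 | class Read_Json_and_show():
12 | def reads(self,filename,city, avg_price, pos_list,count_list): # Read one city's data and append the results to the given lists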
13 | temp_price = []
14 | pos_name = []
15 | count = []
16 | for i in range(0,3000):
17 | pos_name.append("")
18 | count.append(0)
19 | temp_price.append(0)
20 |
21 | with codecs.open(filename, 'r', encoding='utf-8') as f:
22 | read = f.readlines()
23 | for index, info in enumerate(read):
24 | data = json.loads(info)
25 | value = list(data.values())
26 | # value holds the field values of the current record as a list
27 | # Extract the price
28 | temp_price_read = value[1].split('-')
29 | price = temp_price_read[0]
30 | price = int(price)
31 | # Extract the district
32 | temp_pos_read = value[5].split('-')
33 | if len(temp_pos_read) > 1: # Locate the district first, then filter out records that do not fit the pattern
34 | temp_pos_read = temp_pos_read[1].split('㎡')
35 | if len(temp_pos_read) == 1:
36 | pos = str(temp_pos_read[0])
37 | # Scan the district list built so far: if the name is already present, update its count; otherwise store it in the first empty slot
38 | flag = -1
39 | for j in range(len(pos_name)):
40 | if pos_name[j] == pos or pos_name[j] == "":
41 | flag = j
42 | break
43 |
44 | pos_name[flag] = pos
45 | count[flag] += 1
46 | temp_price[flag] += price
47 | # Drop the unused trailing entries of the preallocated lists
48 | while temp_price[-1] == 0:
49 | temp_price.pop()
50 | while pos_name[-1] == "":
51 | pos_name.pop()
52 | while count[-1] == 0:
53 | count.pop()
54 | for i in range(len(count)):
55 | temp_price[i] = temp_price[i]/count[i]
56 | # Append this city's lists to the shared lists
57 | avg_price.append(temp_price)
58 | pos_list.append(pos_name)
59 | count_list.append(len(temp_price))
60 |
61 |
62 | if __name__ == '__main__':
63 | read_json = Read_Json_and_show()
64 | path1 = 'scrapy-beijing-zufang.json'
65 | path2 = 'scrapy-guangzhou-zufang.json'
66 | path3 = 'scrapy-shanghai-zufang.json'
67 | path4 = 'scrapy-shenzhen-zufang.json'
68 | path5 = 'scrapy-xian-zufang.json'
69 | # Read and store the data for each city
70 | read_json.reads(path1, 0, avg_price, pos_list, count_list)
71 | read_json.reads(path2, 1, avg_price, pos_list, count_list)
72 | read_json.reads(path3, 2, avg_price, pos_list, count_list)
73 | read_json.reads(path4, 3, avg_price, pos_list, count_list)
74 | read_json.reads(path5, 4, avg_price, pos_list, count_list)
75 | # Print the results for inspection
76 | print(avg_price)
77 | print(pos_list)
78 | print(count_list)
79 |
80 | # There are too many districts to plot, so keep only the first 15 per city
81 | for i in range(5):
82 | while len(avg_price[i]) > 15:
83 | avg_price[i].pop()
84 | while len(pos_list[i]) > 15:
85 | pos_list[i].pop()
86 |
87 | city_name = ['北京', '广州', '上海', '深圳', '西安']
88 | # Plot settings
89 | plt.figure(figsize=(50, 50), dpi=70)
90 | bar_width = 0.23
91 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
92 |
93 | # Draw one chart of district averages per city
94 | # Beijing
95 | plt.subplot(321)
96 | plt.bar(pos_list[0],avg_price[0])
97 | plt.title("北京不同板块均价展示图")
98 | plt.ylabel("价格:元/月")
99 | # Guangzhou
100 | plt.subplot(322)
101 | plt.bar(pos_list[1], avg_price[1])
102 | plt.title("广州不同板块均价展示图")
103 | plt.ylabel("价格:元/月")
104 | # Shanghai
105 | plt.subplot(323)
106 | plt.bar(pos_list[2], avg_price[2])
107 | plt.title("上海不同板块均价展示图")
108 | plt.ylabel("价格:元/月")
109 | # Shenzhen
110 | plt.subplot(324)
111 | plt.bar(pos_list[3], avg_price[3])
112 | plt.title("深圳不同板块均价展示图")
113 | plt.ylabel("价格:元/月")
114 | # Xi'an
115 | plt.subplot(325)
116 | plt.bar(pos_list[4], avg_price[4])
117 | plt.title("西安不同板块均价展示图")
118 | plt.ylabel("价格:元/月")
119 |
120 | plt.show()
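
`pos_price_show.py` preallocates 3000 slots per city and scans them linearly for every record. A dictionary gives the same per-district averages with less bookkeeping. This is a hedged sketch rather than the author's method: `mean_price_by_district` is a hypothetical helper, and it assumes the district name and price have already been parsed out of each record.

```python
def mean_price_by_district(records):
    """Average price per district.

    records: iterable of (district, price) pairs.
    Returns a dict that keeps districts in first-seen order (dicts preserve
    insertion order in Python 3.7+), matching the order of the original lists.
    """
    totals = {}
    for district, price in records:
        total, count = totals.get(district, (0, 0))
        totals[district] = (total + price, count + 1)
    return {d: total / count for d, (total, count) in totals.items()}


sample = [('望京', 6000), ('回龙观', 4000), ('望京', 8000)]
averages = mean_price_by_district(sample)

# Keep only the first 15 districts for plotting, as the script does.
top = list(averages.items())[:15]
```

Because every key in the dictionary has at least one record, no trailing-entry cleanup or division guard is needed.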
--------------------------------------------------------------------------------
/zufang/room_price_show.py:
--------------------------------------------------------------------------------
1 | # Compare rents by number of bedrooms (几居 and 几室 both refer to the bedroom count)
2 |
3 | import json
4 | import codecs
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 |
8 | avg_price = [[]for _ in range(3)] # Average price
9 | high_price = [[]for _ in range(3)]# Highest price
10 | mid_price = [[]for _ in range(3)]# Median price
11 | low_price = [[]for _ in range(3)]# Lowest price
12 |
13 | class Read_Json_and_show():
14 | def reads(self,filename,city, avg_price, high_price, mid_price, low_price): # Read one city's data and store the results in the given lists
15 | temp_price = [] # Running price total
16 | temp_high_price = [] # Running highest price
17 | temp_low_price = [] # Running lowest price
18 | temp_mid_price_count = [[]for _ in range(3)]
19 | house_type = 0 # 1 = one bedroom, 2 = two bedrooms, 3 = three bedrooms
20 | count = [] # Listing count per bedroom type
21 | for i in range(0, 3): # Set initial values
22 | count.append(0)
23 | temp_price.append(0)
24 | temp_low_price.append(99999999)
25 | temp_high_price.append(0)
26 |
27 | with codecs.open(filename, 'r', encoding='utf-8') as f:
28 | read = f.readlines()
29 | for index, info in enumerate(read):
30 | data = json.loads(info)
31 | value = list(data.values())
32 | # value holds the field values of the current record as a list
33 | # Extract the price
34 | temp_price_read = value[1].split('-')
35 | price = temp_price_read[0]
36 | price = int(price)
37 | # Extract the bedroom count
38 | temp_house_type = value[5].split('室')
39 | temp_house_type = temp_house_type[0].split('/ ')
40 | house_type = 0 # Reset per record so a listing outside 1-3 bedrooms is not counted under the previous listing's type
41 | if temp_house_type[-1] == '1':
42 | house_type = 1
43 | if temp_house_type[-1] == '2':
44 | house_type = 2
45 | if temp_house_type[-1] == '3':
46 | house_type = 3
47 |
48 | if house_type == 1: # One-bedroom listing
49 | count[0] += 1
50 | if price > temp_high_price[0]: # Highest price
51 | temp_high_price[0] = price
52 | if price < temp_low_price[0]: # Lowest price
53 | temp_low_price[0] = price
54 | temp_price[0] += price
55 | temp_mid_price_count[0].append(price)
56 |
57 | if house_type == 2: # Two-bedroom listing
58 | count[1] += 1
59 | if price > temp_high_price[1]: # Highest price
60 | temp_high_price[1] = price
61 | if price < temp_low_price[1]: # Lowest price
62 | temp_low_price[1] = price
63 |
64 | temp_price[1] += price
65 | temp_mid_price_count[1].append(price)
66 |
67 | if house_type == 3: # Three-bedroom listing
68 | count[2] += 1
69 | if price > temp_high_price[2]: # Highest price
70 | temp_high_price[2] = price
71 | if price < temp_low_price[2]: # Lowest price
72 | temp_low_price[2] = price
73 | temp_price[2] += price
74 | temp_mid_price_count[2].append(price)
75 |
76 | for i in range(0, 3):
77 | # Store the finished results in the shared lists
78 | temp_price[i] = float(temp_price[i]/count[i]) # Average price
79 | avg_price[i][city] = temp_price[i]
80 | high_price[i][city] = temp_high_price[i]
81 | low_price[i][city] = temp_low_price[i]
82 | temp_mid_price_count[i].sort()
83 | mid_price[i][city] = temp_mid_price_count[i][int(len(temp_mid_price_count[i])/2)]
84 |
85 |
86 | if __name__ == '__main__':
87 | read_json = Read_Json_and_show()
88 | # Data file paths, processed in order
89 | path1 = 'scrapy-beijing-zufang.json'
90 | path2 = 'scrapy-guangzhou-zufang.json'
91 | path3 = 'scrapy-shanghai-zufang.json'
92 | path4 = 'scrapy-shenzhen-zufang.json'
93 | path5 = 'scrapy-xian-zufang.json'
94 | # Pre-fill the nested lists so results can be assigned by index later
95 | for i in range(0, 3):
96 | for j in range(0, 5):
97 | avg_price[i].append(0)
98 | high_price[i].append(0)
99 | low_price[i].append(0)
100 | mid_price[i].append(0)
101 |
102 | # Read and process the data
103 | read_json.reads(path1, 0, avg_price, high_price, mid_price, low_price)
104 | read_json.reads(path2, 1, avg_price, high_price, mid_price, low_price)
105 | read_json.reads(path3, 2, avg_price, high_price, mid_price, low_price)
106 | read_json.reads(path4, 3, avg_price, high_price, mid_price, low_price)
107 | read_json.reads(path5, 4, avg_price, high_price, mid_price, low_price)
108 |
109 | print(avg_price)
110 | print(high_price)
111 | print(low_price)
112 | print(mid_price)
113 |
114 | # Plotting
115 | city_name = ['北京', '广州', '上海', '深圳', '西安']
116 | plt.figure(figsize=(30, 30), dpi=70)
117 | bar_width = 0.23
118 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
119 |
120 | # Average price
121 | plt.subplot(221)
122 | plt.title('平均价格 ', fontsize=25)
123 | # Grouped bars: several bars per x position, which makes the values easy to compare
124 | plt.bar(x=np.arange(len(city_name)), height=avg_price[0], width=bar_width,label='1居',color='steelblue')
125 | plt.bar(x=np.arange(len(city_name))+bar_width, height=avg_price[1],width=bar_width, label='2居',color='brown')
126 | plt.bar(x=np.arange(len(city_name))+bar_width*2, height=avg_price[2], width=bar_width, label='3居',color='darkorange')
127 | plt.xticks(np.arange(5)+0.2,city_name,fontsize=15)
128 | plt.ylabel('价格:元/月')
129 | # Draw the legend
130 | plt.legend()
131 | # Label each bar with its value
132 | for x, y in enumerate(avg_price[0]):
133 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
134 | for x, y in enumerate(avg_price[1]):
135 | plt.text(x+bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
136 | for x, y in enumerate(avg_price[2]):
137 | plt.text(x+bar_width*2, y + 0.2, "%s" % round(y, 1), ha='center')
138 |
139 | # Highest price
140 |
141 | plt.subplot(222)
142 | plt.title('最高价格 ', fontsize=25)
143 | # Grouped bars: several bars per x position, which makes the values easy to compare
144 | plt.bar(x=np.arange(len(city_name)), height=high_price[0], width=bar_width, label='1居', color='steelblue')
145 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=high_price[1], width=bar_width, label='2居', color='brown')
146 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=high_price[2], width=bar_width, label='3居',
147 | color='darkorange')
148 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
149 | plt.ylabel('价格:元/月')
150 | plt.legend()
151 | # Label each bar with its value
152 | for x, y in enumerate(high_price[0]):
153 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
154 | for x, y in enumerate(high_price[1]):
155 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
156 | for x, y in enumerate(high_price[2]):
157 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
158 |
159 | # Lowest price
160 | plt.subplot(223)
161 | plt.title('最低价格 ', fontsize=25)
162 | # Grouped bars: several bars per x position, which makes the values easy to compare
163 | plt.bar(x=np.arange(len(city_name)), height=low_price[0], width=bar_width, label='1居', color='steelblue')
164 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=low_price[1], width=bar_width, label='2居', color='brown')
165 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=low_price[2], width=bar_width, label='3居',
166 | color='darkorange')
167 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
168 | plt.ylabel('价格:元/月')
169 | plt.legend()
170 | # Label each bar with its value
171 | for x, y in enumerate(low_price[0]):
172 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
173 | for x, y in enumerate(low_price[1]):
174 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
175 | for x, y in enumerate(low_price[2]):
176 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
177 |
178 | # Median price
179 | plt.subplot(224)
180 | plt.title('中位数价格 ', fontsize=25)
181 | plt.bar(x=np.arange(len(city_name)), height=mid_price[0], width=bar_width, label='1居', color='steelblue')
182 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=mid_price[1], width=bar_width, label='2居', color='brown')
183 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=mid_price[2], width=bar_width, label='3居',
184 | color='darkorange')
185 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
186 | plt.ylabel('价格:元/月')
187 | plt.legend()
188 | # Label each bar with its value
189 | for x, y in enumerate(mid_price[0]):
190 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
191 | for x, y in enumerate(mid_price[1]):
192 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
193 | for x, y in enumerate(mid_price[2]):
194 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
195 |
196 | plt.show()
197 |
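
The grouping in `room_price_show.py` can be expressed with a regular expression and the standard `statistics` module. The sketch below is an assumption-laden illustration: `room_count` and `stats_by_rooms` are hypothetical helpers, and the sample `information` strings only approximate the scraped format, which is not shown in this section. Note that `statistics.median` averages the two middle values for an even count, whereas the script takes a single middle element.

```python
import re
from statistics import mean, median

ROOM_RE = re.compile(r'(\d+)室')  # digits directly before 室 (bedrooms)


def room_count(information):
    """Return the bedroom count found in the text, or None if there is none."""
    match = ROOM_RE.search(information)
    return int(match.group(1)) if match else None


def stats_by_rooms(records, wanted=(1, 2, 3)):
    """Group prices by bedroom count and summarize each non-empty group.

    records: iterable of (price, information) pairs.
    """
    groups = {n: [] for n in wanted}
    for price, information in records:
        n = room_count(information)  # computed fresh for every record
        if n in groups:
            groups[n].append(price)
    return {
        n: {'avg': mean(p), 'high': max(p), 'low': min(p), 'mid': median(p)}
        for n, p in groups.items() if p
    }


sample = [
    (3000, '朝阳-望京 / 50.00㎡ /南 / 1室1厅1卫'),
    (5000, '朝阳-望京 / 70.00㎡ /南 / 2室1厅1卫'),
    (7000, '海淀-中关村 / 80.00㎡ /东 / 2室1厅1卫'),
    (9000, '海淀-中关村 / 150.00㎡ /南 / 4室2厅2卫'),
]
stats = stats_by_rooms(sample)
```

Skipping empty groups avoids both the division by zero and the index error that a city with no listings of a given size would otherwise trigger.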
--------------------------------------------------------------------------------
/zufang/salary_price_show.py:
--------------------------------------------------------------------------------
1 | # An earlier script already reports each city's average rent per unit area, so the values are hard-coded here.
2 | # Average rent per unit area: Beijing: 103.2 CNY/m²/month
3 | # Guangzhou: 63.9 CNY/m²/month
4 | # Shanghai: 103.5 CNY/m²/month
5 | # Shenzhen: 88.5 CNY/m²/month
6 | # Xi'an: 36.0 CNY/m²/month
7 | # Average monthly salary of each city, looked up via Baidu:
8 | # Average salary (CNY/month): Beijing 13567, Guangzhou 11300, Shanghai 12183, Shenzhen 12300, Xi'an 9011
9 |
10 | import matplotlib.pyplot as plt
11 |
12 | space_price = [103.2, 63.9, 103.5, 88.5, 36.0]
13 | salary = [135.67, 113.00, 121.83, 123.00, 90.11] # In units of 100 CNY, which keeps the ratio readable
14 | count = []
15 | # Compute the ratio for each city
16 | for i in range(0,5):
17 |     count.append(space_price[i]/salary[i]) # A lower ratio means rent takes a smaller share of salary, i.e. a relatively lower cost of living
18 |
19 | # Plotting
20 | city_name = ['北京', '广州', '上海', '深圳', '西安']
21 | plt.figure(figsize=(20, 10), dpi=70)
22 | # Use a font with CJK glyphs so the Chinese labels render correctly
23 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
24 | # Bar chart of the ratio
25 | plt.subplot(121)
26 | plt.title("单位面积价格与平均月薪的比值直方图",fontsize=20)
27 | plt.bar(city_name, count,width=0.4)
28 | plt.ylabel("单位面积价格/人均月薪(百元)",fontsize=15)
29 | # Scatter plot; each city is drawn separately so that it gets its own color and legend entry
30 | plt.subplot(122)
31 | plt.scatter(salary[0],space_price[0],s=60, label='北京', color='steelblue')
32 | plt.scatter(salary[1],space_price[1],s=60, label='广州', color='brown')
33 | plt.scatter(salary[2],space_price[2],s=60, label='上海',color='green')
34 | plt.scatter(salary[3],space_price[3],s=60, label='深圳',color='darkorange')
35 | plt.scatter(salary[4],space_price[4],s=60, label='西安',color='skyblue')
36 | plt.title("人均月薪-单位面积均价散点图",fontsize=20)
37 | plt.xlabel("人均月薪 单位:百元",fontsize=15)
38 | plt.ylabel("单位面积均价 单位:元/平方米/月",fontsize=15)
39 | plt.legend(fontsize=15)
40 |
41 | plt.show()
42 |
--------------------------------------------------------------------------------
/zufang/scrapy.cfg:
--------------------------------------------------------------------------------
1 | # Automatically created by: scrapy startproject
2 | #
3 | # For more information about the [deploy] section see:
4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html
5 |
6 | [settings]
7 | default = zufang.settings
8 |
9 | [deploy]
10 | #url = http://localhost:6800/
11 | project = zufang
12 |
--------------------------------------------------------------------------------
/zufang/total_price_show.py:
--------------------------------------------------------------------------------
1 | # Compare five cities by average, median, highest and lowest rent, both per listing and per unit area, and plot the results
2 | import json
3 | import codecs
4 | import re
5 | import matplotlib.pyplot as plt
6 |
7 |
8 | total_avg_price = [] # Average price
9 | total_high_price = [] # Highest price
10 | total_mid_price = [] # Median price
11 | total_low_price = [] # Lowest price
12 | space_avg_price = [] # Average price (per unit area)
13 | space_high_price = [] # Highest price (per unit area)
14 | space_low_price = [] # Lowest price (per unit area)
15 | space_mid_price = [] # Median price (per unit area)
16 |
17 | class Read_Json_and_show():
18 | def reads(self,filename,city, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
19 | space_high_price, space_low_price, space_mid_price): # Read one city's data and append the results to the given lists
20 | temp_total_price = 0
21 | temp_space_price = 0
22 | temp_space_high_price = 0
23 | temp_space_low_price = 9999999
24 | temp_total_high_price = 0
25 | temp_total_low_price = 9999999
26 | with codecs.open(filename,'r',encoding='utf-8') as f:
27 | # Open and read the file
28 | read = f.readlines()
29 | total_mid_price_count = []
30 | space_mid_price_count = []
31 | for index, info in enumerate(read): # Process line by line
32 | data = json.loads(info)
33 | value = list(data.values())
34 | # value holds the field values of the current record as a list
35 |
36 | number = re.compile(r'^[-+]?[0-9]+\.[0-9]+$') # Regex that matches a decimal number
37 | # Extract the price
38 | temp_price = value[1].split('-')
39 | price = temp_price[0]
40 | price = int(price)
41 | # Extract the floor area
42 | temp_space = value[5].split("㎡")
43 | temp_space = temp_space[0].split("/ ")
44 | if len(temp_space) == 1:
45 | temp_space = temp_space[0].split("-")
46 | space = temp_space[0]
47 | space = float(space)
48 | else:
49 | temp_space2 = temp_space[1].split("-")
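50 | # Check whether this token is a decimal number; if so it is the area, and the minimum area is kept for reference (otherwise it is anomalous data that has to be skipped over)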
51 | result = number.match(temp_space2[0])
52 | if result:
53 | space = temp_space2[0]
54 | space = float(space)
55 | else: # Keep the smallest area (e.g. treat 20-25㎡ as 20)
56 | temp_space = temp_space[2].split("-")
57 | space = temp_space[0]
58 | space = float(space)
59 | # print(space)
60 | if price > temp_total_high_price: # Highest price
61 | temp_total_high_price = price
62 | if price < temp_total_low_price: # Lowest price
63 | temp_total_low_price = price
64 | temp_total_price += price # Running total; the average is computed after the loop
65 | total_mid_price_count.append(price) # Collected for the median, also computed after the loop
66 |
67 | space_price = float(price/space)
68 | if space_price > temp_space_high_price: # Highest price (per unit area)
69 | temp_space_high_price = space_price
70 | if space_price < temp_space_low_price: # Lowest price (per unit area)
71 | temp_space_low_price = space_price
72 | temp_space_price += space_price # Running total (per unit area)
73 | space_mid_price_count.append(space_price) # Collected for the median (per unit area)
74 |
75 | # Average price: divide by the number of records actually read rather than a hard-coded 3000
76 | total_avg_price.append(float(temp_total_price/len(total_mid_price_count)))
77 | space_avg_price.append(float(temp_space_price/len(space_mid_price_count)))
78 | # Highest price
79 | total_high_price.append(float(temp_total_high_price))
80 | space_high_price.append(float(temp_space_high_price))
81 | # Lowest price
82 | total_low_price.append(float(temp_total_low_price))
83 | space_low_price.append(float(temp_space_low_price))
84 | # Median: sort and take the middle element (index 1499 when there are 3000 records)
85 | total_mid_price_count.sort()
86 | space_mid_price_count.sort()
87 | total_mid_price.append(float(total_mid_price_count[(len(total_mid_price_count)-1)//2]))
88 | space_mid_price.append(float(space_mid_price_count[(len(space_mid_price_count)-1)//2]))
89 |
90 |
91 | if __name__ == '__main__':
92 | read_json = Read_Json_and_show()
93 | path1 = 'scrapy-beijing-zufang.json'
94 | path2 = 'scrapy-guangzhou-zufang.json'
95 | path3 = 'scrapy-shanghai-zufang.json'
96 | path4 = 'scrapy-shenzhen-zufang.json'
97 | path5 = 'scrapy-xian-zufang.json'
98 | # Read and process the data
99 | read_json.reads(path1, 1, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
100 | space_high_price, space_low_price, space_mid_price)
101 | read_json.reads(path2, 2, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
102 | space_high_price, space_low_price, space_mid_price)
103 | read_json.reads(path3, 3, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
104 | space_high_price, space_low_price, space_mid_price)
105 | read_json.reads(path4, 4, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
106 | space_high_price, space_low_price, space_mid_price)
107 | read_json.reads(path5, 5, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
108 | space_high_price, space_low_price, space_mid_price)
109 |
110 | # Print the results
111 | print(total_avg_price)
112 | print(total_high_price)
113 | print(total_low_price)
114 | print(total_mid_price)
115 | print(space_avg_price)
116 | print(space_high_price)
117 | print(space_low_price)
118 | print(space_mid_price)
119 |
120 | # Draw the bar charts
121 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
122 | city_name = ['北京', '广州', '上海', '深圳', '西安']
123 | plt.figure(figsize=(30, 30), dpi=70)
124 | bar_width = 0.4
125 |
126 | # Average total rent; there are eight subplots in all
127 | plt.subplot(241)
128 | plt.title('总价平均价格', fontsize=25)
129 | plt.bar(city_name,total_avg_price)
130 | plt.ylabel('价格:元/月')
131 | for x, y in enumerate(total_avg_price):
132 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
133 | # Highest total rent
134 | plt.subplot(242)
135 | plt.title('总价最高价格', fontsize=25)
136 | plt.bar(city_name, total_high_price)
137 | plt.ylabel('价格:元/月')
138 | for x, y in enumerate(total_high_price):
139 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
140 | # Lowest total rent
141 | plt.subplot(243)
142 | plt.title('总价最低价格', fontsize=25)
143 | plt.bar(city_name, total_low_price)
144 | plt.ylabel('价格:元/月')
145 | for x, y in enumerate(total_low_price):
146 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
147 | # Median total rent
148 | plt.subplot(244)
149 | plt.title('总价中位数价格', fontsize=25)
150 | plt.bar(city_name, total_mid_price)
151 | plt.ylabel('价格:元/月')
152 | for x, y in enumerate(total_mid_price):
153 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
154 | # Average rent per unit area
155 | plt.subplot(245)
156 | plt.title('单位面积均价', fontsize=25)
157 | plt.bar(city_name, space_avg_price)
158 | plt.ylabel('价格:元/平米/月')
159 | for x, y in enumerate(space_avg_price):
160 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
161 | # Highest rent per unit area
162 | plt.subplot(246)
163 | plt.title('单位面积最高价格', fontsize=25)
164 | plt.bar(city_name, space_high_price)
165 | plt.ylabel('价格:元/平米/月')
166 | for x, y in enumerate(space_high_price):
167 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
168 | # Lowest rent per unit area
169 | plt.subplot(247)
170 | plt.title('单位面积最低价格', fontsize=25)
171 | plt.bar(city_name, space_low_price)
172 | plt.ylabel('价格:元/平米/月')
173 | for x, y in enumerate(space_low_price):
174 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
175 | # Median rent per unit area
176 | plt.subplot(248)
177 | plt.title('单位面积中位数价格', fontsize=25)
178 | plt.bar(city_name, space_mid_price)
179 | plt.ylabel('价格:元/平米/月')
180 | for x, y in enumerate(space_mid_price):
181 | plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
182 |
183 | plt.show()
184 |
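
The statistics that `total_price_show.py` accumulates by hand can be computed from the collected values in one place, independent of how many records a file contains. This is a minimal sketch under stated assumptions: `summarize` and `min_area` are hypothetical helpers, and the sample strings only approximate the scraped `information` field. The median here is the lower middle element, which matches index 1499 for 3000 records.

```python
import re
from statistics import mean

# A number, optionally followed by "-<number>", directly before ㎡.
AREA_RE = re.compile(r'(\d+(?:\.\d+)?)(?:-\d+(?:\.\d+)?)?㎡')


def min_area(information):
    """Return the smallest area mentioned before ㎡ (20.0 for '20.00-25.00㎡'), or None."""
    match = AREA_RE.search(information)
    return float(match.group(1)) if match else None


def summarize(values):
    """Return the average, highest, lowest and lower-median of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    return {
        'avg': mean(ordered),
        'high': ordered[-1],
        'low': ordered[0],
        'mid': ordered[(len(ordered) - 1) // 2],
    }


summary = summarize([3, 1, 2, 4])
area_range = min_area('精选 / 20.00-25.00㎡ /南')
area_single = min_area('望京 / 89.50㎡ /南 / 2室1厅1卫')
```

Calling `summarize` once on the total prices and once on the per-area prices would yield all eight values that the script plots.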
--------------------------------------------------------------------------------
/zufang/zufang/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/zufang/zufang/__init__.py
--------------------------------------------------------------------------------
/zufang/zufang/items.py:
--------------------------------------------------------------------------------
1 | # Define here the models for your scraped items
2 | #
3 | # See documentation in:
4 | # https://docs.scrapy.org/en/latest/topics/items.html
5 |
6 | import scrapy
7 |
8 |
9 | class zufangitem(scrapy.Item):
10 |     title = scrapy.Field()  # Listing title
11 |     price = scrapy.Field()  # Monthly rent
12 |     position0 = scrapy.Field()  # Address part 1
13 |     position1 = scrapy.Field()  # Address part 2
14 |     position2 = scrapy.Field()  # Address part 3
15 |     information = scrapy.Field()  # Other information
16 |
17 |
--------------------------------------------------------------------------------
/zufang/zufang/middlewares.py:
--------------------------------------------------------------------------------
1 | from scrapy import signals
2 | from selenium import webdriver
3 | from scrapy.http import HtmlResponse
4 |
5 | # Only the downloader middleware needs changes; the spider middleware is left untouched.
6 |
7 |
8 | class ZufangDownloaderMiddleware:
9 |     # Open a browser automatically when the downloader middleware starts working.
10 |
11 |     def __init__(self):
12 |         self.driver = webdriver.Chrome()
13 |
14 |     @classmethod
15 |     def from_crawler(cls, crawler):
16 |         # This method is used by Scrapy to create your spiders.
17 |         s = cls()
18 |         crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
19 |         # Added by hand: register the handler that shuts the browser down.
20 |         crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
21 |         return s
22 |
23 |     # Called for every request the spider sends; returns the page source as rendered by the browser.
24 |
25 |     def process_request(self, request, spider):
26 |         self.driver.get(request.url)  # open the requested URL in the browser
27 |         body = self.driver.page_source  # HTML source of the rendered page
28 |         return HtmlResponse(url=self.driver.current_url, body=body, encoding='utf-8', request=request)
29 |
30 |     def process_response(self, request, response, spider):
31 |         return response
32 |
33 |     def process_exception(self, request, exception, spider):
34 |         pass
35 |
36 |     def spider_opened(self, spider):
37 |         spider.logger.info("Spider opened: %s" % spider.name)
38 |
39 |     # Added by hand: shuts the browser down when the spider finishes.
40 |
41 |     def spider_closed(self, spider):
42 |         self.driver.quit()  # quit() ends the whole WebDriver session; close() only closes the current window
43 |         spider.logger.info("Spider closed: %s" % spider.name)
44 |
--------------------------------------------------------------------------------
/zufang/zufang/pipelines.py:
--------------------------------------------------------------------------------
1 | # Define your item pipelines here
2 | #
3 | # Don't forget to add your pipeline to the ITEM_PIPELINES setting
4 | # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
5 |
6 |
7 | # useful for handling different item types with a single interface
8 | from itemadapter import ItemAdapter
9 | import json
10 |
11 |
12 | class zufangline(object):
13 |     def open_spider(self, spider):
14 |         # The output file name is hard-coded per city; change it to match the spider being run.
15 |         # An OSError is allowed to propagate: swallowing it here would only defer the failure
16 |         # to process_item, where self.file would not exist.
17 |         self.file = open('scrapy-xian-zufang.json', "w", encoding="utf-8")
18 |
19 |     def process_item(self, item, spider):
20 |         dict_item = dict(item)
21 |         json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
22 |         self.file.write(json_str)
23 |         return item
24 |
25 |     def close_spider(self, spider):
26 |         self.file.close()
27 |
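The pipeline above always writes to scrapy-xian-zufang.json, while the spider comment lists five spider names (beijing, shanghai, guangzhou, shenzhen, xian) and the data files follow the pattern scrapy-<city>-zufang.json. A minimal sketch of deriving the file name from the spider name, assuming that pattern; `CityJsonLinesPipeline`, `out_dir`, `FakeSpider` and `demo` are hypothetical names that do not exist in the repository:

```python
import json
import os
import tempfile


class CityJsonLinesPipeline:
    """Hypothetical variant of zufangline: one JSON-lines file per spider name."""

    def __init__(self, out_dir="."):
        self.out_dir = out_dir
        self.file = None

    def open_spider(self, spider):
        # e.g. spider.name == "xian" -> scrapy-xian-zufang.json
        name = "scrapy-{}-zufang.json".format(spider.name)
        self.file = open(os.path.join(self.out_dir, name), "w", encoding="utf-8")

    def process_item(self, item, spider):
        # One JSON object per line, keeping Chinese text readable.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()


class FakeSpider:
    """Stand-in for a Scrapy spider; only the name attribute is used."""
    name = "xian"


def demo():
    # Run the pipeline once in a temporary directory and read the file back.
    with tempfile.TemporaryDirectory() as tmp:
        pipeline = CityJsonLinesPipeline(out_dir=tmp)
        spider = FakeSpider()
        pipeline.open_spider(spider)
        pipeline.process_item({"title": "示例房源", "price": "2500"}, spider)
        pipeline.close_spider(spider)
        path = os.path.join(tmp, "scrapy-xian-zufang.json")
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

With a pipeline shaped like this, switching cities would only require changing the spider, not the pipeline source.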
--------------------------------------------------------------------------------
/zufang/zufang/settings.py:
--------------------------------------------------------------------------------
1 | # Scrapy settings for zufang project
2 | #
3 | # For simplicity, this file contains only settings considered important or
4 | # commonly used. You can find more settings consulting the documentation:
5 | #
6 | # https://docs.scrapy.org/en/latest/topics/settings.html
7 | # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
8 | # https://docs.scrapy.org/en/latest/topics/spider-middleware.html
9 |
10 | BOT_NAME = "zufang"
11 | #2403:a200:a200:13f1:183:84:18:11
12 |
13 | SPIDER_MODULES = ["zufang.spiders"]
14 | NEWSPIDER_MODULE = "zufang.spiders"
15 |
16 | # Crawl responsibly by identifying yourself (and your website) on the user-agent
17 | #USER_AGENT = 'python_qm (+http://www.yourdomain.com)'
18 | #USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
19 | #USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
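20 | # Disabled: the DOWNLOADER_MIDDLEWARES assignment further down replaces this dict, and
21 | # zufang/middlewares.py does not define RandomUserAgentMiddleware (so MY_USER_AGENT is unused).
22 | # DOWNLOADER_MIDDLEWARES = {'zufang.middlewares.RandomUserAgentMiddleware': 900}
23 |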
24 |
25 | MY_USER_AGENT = [
26 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
27 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
28 | "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
29 | "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
30 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
31 | "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
32 | "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
33 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
34 | "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
35 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
36 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
37 | "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
38 | "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
39 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
40 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
41 | "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
42 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
43 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
44 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
45 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
46 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
47 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
48 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
49 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
50 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
51 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
52 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
53 | "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
54 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
55 | "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
56 | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
57 | "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
58 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
59 | "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
60 | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
61 | ]
62 |
63 | # robots.txt rules are deliberately not obeyed here (the Scrapy project template sets this to True)
64 | ROBOTSTXT_OBEY = False
65 |
66 | LOG_LEVEL = 'WARNING'
67 |
68 | #LOG_LEVEL = "WARNING"
69 | # Configure maximum concurrent requests performed by Scrapy (default: 16)
70 | #CONCURRENT_REQUESTS = 8
71 |
72 | # Configure a delay for requests for the same website (default: 0)
73 | # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
74 | # See also autothrottle settings and docs
75 | DOWNLOAD_DELAY = 3
76 | RANDOMIZE_DOWNLOAD_DELAY = True
77 | #CONCURRENT_REQUESTS_PER_DOMAIN = 2
78 | # The download delay setting will honor only one of:
79 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16
80 | #CONCURRENT_REQUESTS_PER_IP = 16
81 |
82 | # Disable cookies (enabled by default)
83 | #COOKIES_ENABLED = False
84 |
85 | # Disable Telnet Console (enabled by default)
86 | #TELNETCONSOLE_ENABLED = False
87 |
88 | # Override the default request headers:
89 | #DEFAULT_REQUEST_HEADERS = {
90 | # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
91 | # "Accept-Language": "en",
92 | # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
93 | #}
94 |
95 | # Enable or disable spider middlewares
96 | # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
97 | #SPIDER_MIDDLEWARES = {
98 | # "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
99 | #}
100 |
101 | # Enable or disable downloader middlewares
102 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
103 | DOWNLOADER_MIDDLEWARES = {
104 | "zufang.middlewares.ZufangDownloaderMiddleware": 543,
105 | }
106 |
107 | # Enable or disable extensions
108 | # See https://docs.scrapy.org/en/latest/topics/extensions.html
109 | #EXTENSIONS = {
110 | # "scrapy.extensions.telnet.TelnetConsole": None,
111 | #}
112 |
113 | # Configure item pipelines
114 | # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
115 | ITEM_PIPELINES = {'zufang.pipelines.zufangline': 300, }
116 | #ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300}
117 | # Enable and configure the AutoThrottle extension (disabled by default)
118 | # See https://docs.scrapy.org/en/latest/topics/autothrottle.html
119 | #AUTOTHROTTLE_ENABLED = True
120 | # The initial download delay
121 | #AUTOTHROTTLE_START_DELAY = 5
122 | # The maximum download delay to be set in case of high latencies
123 | #AUTOTHROTTLE_MAX_DELAY = 60
124 | # The average number of requests Scrapy should be sending in parallel to
125 | # each remote server
126 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
127 | # Enable showing throttling stats for every response received:
128 | #AUTOTHROTTLE_DEBUG = False
129 |
130 | # Enable and configure HTTP caching (disabled by default)
131 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
132 | #HTTPCACHE_ENABLED = True
133 | #HTTPCACHE_EXPIRATION_SECS = 0
134 | #HTTPCACHE_DIR = "httpcache"
135 | #HTTPCACHE_IGNORE_HTTP_CODES = []
136 | #HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
137 |
138 | # Set settings whose default value is deprecated to a future-proof value
139 | REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
140 | TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
141 | FEED_EXPORT_ENCODING = "utf-8"
142 |
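settings.py names `zufang.middlewares.RandomUserAgentMiddleware` and defines a `MY_USER_AGENT` list, but middlewares.py as listed contains no such class. A minimal sketch of what such a middleware could look like, assuming it picks one entry of `MY_USER_AGENT` per request; the class body and the `Fake*` helpers are hypothetical and only mimic the attributes Scrapy would provide:

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical sketch; not part of the repository as listed."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy's Settings object offers getlist(); MY_USER_AGENT is the custom key from settings.py.
        return cls(crawler.settings.getlist("MY_USER_AGENT"))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # let the request continue through the remaining middlewares


class FakeSettings:
    """Stand-in for scrapy.settings.Settings, just enough for the demo."""

    def __init__(self, values):
        self.values = values

    def getlist(self, name, default=None):
        return list(self.values.get(name, default or []))


class FakeCrawler:
    def __init__(self, settings):
        self.settings = settings


class FakeRequest:
    def __init__(self):
        self.headers = {}


agents = ["agent-a", "agent-b"]  # placeholder strings, not real User-Agent values
middleware = RandomUserAgentMiddleware.from_crawler(FakeCrawler(FakeSettings({"MY_USER_AGENT": agents})))
request = FakeRequest()
result = middleware.process_request(request, spider=None)
```

Note that ZufangDownloaderMiddleware (priority 543) returns a response from `process_request`, so a middleware registered at 900 would most likely never see the request, and the Selenium-driven browser sends its own User-Agent in any case. Rotating the User-Agent in this project would probably have to happen through the browser options instead.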
--------------------------------------------------------------------------------
/zufang/zufang/spiders/__init__.py:
--------------------------------------------------------------------------------
1 | # This package will contain the spiders of your Scrapy project
2 | #
3 | # Please refer to the documentation for information on how to create and manage
4 | # your spiders.
5 |
--------------------------------------------------------------------------------
/zufang/zufang/spiders/spider1.py:
--------------------------------------------------------------------------------
1 | import scrapy
2 |
3 | from zufang.items import zufangitem
4 |
5 |
6 | class Zufangspider(scrapy.spiders.Spider):
7 |     name = "xian"  # one spider name per city: beijing, shanghai, guangzhou, shenzhen, xian
8 |     allowed_domains = ["xa.lianjia.com"]  # domains the spider is allowed to crawl
9 |     # The listing has 100 pages, so build one start URL per page.
10 |     start_urls = [
11 |         'https://xa.lianjia.com/zufang/pg{}/'.format(page) for page in range(1, 101)
12 |     ]
13 |
14 |     custom_settings = {
15 |         'ITEM_PIPELINES': {'zufang.pipelines.zufangline': 300},
16 |     }
17 |
18 |     def parse(self, response, **kwargs):
19 |         div_list = response.xpath("//*[@id=\"content\"]/div[1]/div[1]/div")
20 |
21 |         # Walk the listing blocks with XPath and extract the fields we need.
22 |         for each in div_list:
23 |             # Create a fresh item per listing instead of mutating one shared instance.
24 |             item = zufangitem()
25 |             item['title'] = each.xpath("normalize-space(./div/p[1]/a/text())").extract_first()
26 |             item['price'] = each.xpath("normalize-space(./div/span/em/text())").extract_first()
27 |             item['position0'] = each.xpath("./div/p[2]/a[1]/text()").extract_first()
28 |             item['position1'] = each.xpath("./div/p[2]/a[2]/text()").extract_first()
29 |             item['position2'] = each.xpath("./div/p[2]/a[3]/text()").extract_first()
30 |             item['information'] = each.xpath("normalize-space(./div/p[2])").extract_first()
31 |             yield item
32 |
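The spider above is written for one city at a time: the name, the allowed domain and the start URLs all have to be edited by hand for each of the five cities. A small sketch of generating the start URLs from the spider name instead. Only the "xa" subdomain appears in the source; the other four subdomains and the helper name `build_start_urls` are assumptions that should be checked against the site before use:

```python
# Subdomain per spider name. "xa" is taken from spider1.py; the remaining
# entries are assumed and not confirmed by the repository.
CITY_SUBDOMAINS = {
    "beijing": "bj",
    "shanghai": "sh",
    "guangzhou": "gz",
    "shenzhen": "sz",
    "xian": "xa",
}


def build_start_urls(city, pages=100):
    """Return one rental listing URL per result page for the given city."""
    sub = CITY_SUBDOMAINS[city]
    return [
        "https://{}.lianjia.com/zufang/pg{}/".format(sub, page)
        for page in range(1, pages + 1)
    ]


urls = build_start_urls("xian")
```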
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤.zip
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Files ignored by default
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/misc.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/modules.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/.idea/zufang.iml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/GDP_price_show.py:
--------------------------------------------------------------------------------
1 | # The average rent per unit area of each city was already produced by an earlier script, so it is stored in a list here directly.
2 | # Average rent per unit area: Beijing: 103.2 yuan/m²/month
3 | #                             Guangzhou: 63.9 yuan/m²/month
4 | #                             Shanghai: 103.5 yuan/m²/month
5 | #                             Shenzhen: 88.5 yuan/m²/month
6 | #                             Xi'an: 36.0 yuan/m²/month
7 | # Per-capita GDP of each city, looked up via Baidu:
8 | # Per-capita GDP (unit: 10,000 yuan per person): Beijing 19.03, Guangzhou 15.36, Shanghai 17.99, Shenzhen 18.33, Xi'an 8.88
9 | # The chart shows the ratio of rent per unit area to per-capita GDP; a smaller ratio is better, since for the same rent a higher GDP is preferable.
10 |
11 | import matplotlib.pyplot as plt
12 |
13 | GDP = [19.03, 15.36, 17.99, 18.33, 8.88]
14 | space_price = [103.2, 63.9, 103.5, 88.5, 36.0]
15 | count = []
16 | # Compute the ratios
17 | for i in range(0,5):
18 |     count.append(space_price[i]/GDP[i])
19 |
20 | city_name = ['北京', '广州', '上海', '深圳', '西安']
21 | plt.figure(figsize=(20, 10), dpi=70)
22 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
23 |
24 | # Bar chart of the ratios
25 | plt.subplot(121)
26 | plt.title("单位面积价格与GDP的比值直方图",fontsize=20)
27 | plt.bar(city_name, count,width=0.4)
28 | plt.ylabel("单位面积价格/人均GDP(万元)",fontsize=15)
29 |
30 | # Scatter plot of GDP against rent per unit area
31 | plt.subplot(122)
32 | plt.scatter(GDP[0],space_price[0],s=60, label='北京', color='steelblue')
33 | plt.scatter(GDP[1],space_price[1],s=60, label='广州', color='brown')
34 | plt.scatter(GDP[2],space_price[2],s=60, label='上海',color='green')
35 | plt.scatter(GDP[3],space_price[3],s=60, label='深圳',color='darkorange')
36 | plt.scatter(GDP[4],space_price[4],s=60, label='西安',color='skyblue')
37 | plt.title("GDP-单位面积均价散点图",fontsize=20)
38 | plt.xlabel("GDP 单位:万元",fontsize=15)
39 | plt.ylabel("单位面积均价 单位:元/平方米/月",fontsize=15)  # unit order fixed: yuan per m² per month
40 | plt.legend(fontsize=15)
41 |
42 | plt.show()
43 |
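The ratio computed in GDP_price_show.py can be checked without plotting. A minimal sketch using the same figures as the script; the names `gdp_per_capita`, `ratios` and `ranked` are introduced here for illustration only:

```python
city_name = ['北京', '广州', '上海', '深圳', '西安']
gdp_per_capita = [19.03, 15.36, 17.99, 18.33, 8.88]  # unit: 10,000 yuan per person
space_price = [103.2, 63.9, 103.5, 88.5, 36.0]       # unit: yuan per m² per month

# Rent per unit area divided by per-capita GDP, keyed by city.
ratios = {
    city: price / gdp
    for city, price, gdp in zip(city_name, space_price, gdp_per_capita)
}

# Cities ordered from the lowest ratio to the highest.
ranked = sorted(ratios, key=ratios.get)
```

Pairing the lists with `zip` avoids the hard-coded `range(0, 5)` and keeps each ratio attached to its city name.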
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/chromedriver.exe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/chromedriver.exe
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/face_price_show.py:
--------------------------------------------------------------------------------
1 | # Compares average rent across the five cities by the direction each home faces.
2 |
3 | import json
4 | import codecs
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 |
8 | avg_price = []  # average rent per direction, one list per city
9 | count_list = []  # number of listings per direction, one list per city
10 |
11 | class Read_Json_and_show():
12 |     def reads(self,filename,city, avg_price,count_list):  # read one city's data and store it in the given lists
13 |         temp_price = []  # running rent totals per direction
14 |         count = []  # running listing counts per direction
15 |         for i in range(0,4):  # indices 0, 1, 2, 3 stand for east, south, west, north
16 |             count.append(0)
17 |             temp_price.append(0)
18 |
19 |         with codecs.open(filename, 'r', encoding='utf-8') as f:
20 |             read = f.readlines()
21 |             # The file holds one JSON object per line.
22 |             for index, info in enumerate(read):
23 |                 data = json.loads(info)
24 |                 value = list(data.values())
25 |                 # value holds the fields of this row as a list.
26 |
27 |                 # Keep splitting until only the facing direction is left.
28 |                 temp_price_read = value[1].split('-')
29 |                 price = temp_price_read[0]
30 |                 price = int(price)
31 |                 temp_face_read = value[5].split('㎡ /')
32 |                 temp_face_read = temp_face_read[1].split(' / ')
33 |                 temp_face_read = temp_face_read[0].split('在')
34 |
35 |                 if len(temp_face_read) == 1:
36 |                     temp_face_read = temp_face_read[0].split('室')
37 |                     if len(temp_face_read) == 1:
38 |                         temp_face_read = temp_face_read[0].split(' ')
39 |                         # A listing can face several directions; "东 南" is counted once for east and once for south.
40 |                         for face in temp_face_read:
41 |                             if face == '东':
42 |                                 temp_price[0] += price
43 |                                 count[0] += 1
44 |                             if face == '南':
45 |                                 temp_price[1] += price
46 |                                 count[1] += 1
47 |                             if face == '西':
48 |                                 temp_price[2] += price
49 |                                 count[2] += 1
50 |                             if face == '北':
51 |                                 temp_price[3] += price
52 |                                 count[3] += 1
53 |         # Compute the averages; a direction with no listings is reported as 0 instead of raising ZeroDivisionError.
54 |         for i in range(len(count)):
55 |             temp_price[i] = temp_price[i]/count[i] if count[i] else 0
56 |
57 |         # Append this city's results to the shared lists.
58 |         avg_price.append(temp_price)
59 |         count_list.append(count)
60 |
61 |
62 | if __name__ == '__main__':
63 | read_json = Read_Json_and_show()
64 | # 存储路径
65 | path1 = 'scrapy-beijing-zufang.json'
66 | path2 = 'scrapy-guangzhou-zufang.json'
67 | path3 = 'scrapy-shanghai-zufang.json'
68 | path4 = 'scrapy-shenzhen-zufang.json'
69 | path5 = 'scrapy-xian-zufang.json'
70 | # 分别对五个城市进行读取和分析的操作
71 | read_json.reads(path1, 0, avg_price, count_list)
72 | read_json.reads(path2, 1, avg_price, count_list)
73 | read_json.reads(path3, 2, avg_price, count_list)
74 | read_json.reads(path4, 3, avg_price, count_list)
75 | read_json.reads(path5, 4, avg_price, count_list)
76 | # 查看分析结果
77 | print(avg_price)
78 | print(count_list)
79 | # 绘图,分别将五个城市的直方图进行绘制
80 | face_name = ['东','南','西','北']
81 | plt.figure(figsize=(30, 30), dpi=70)
82 | bar_width = 0.1 # 条宽偏移
83 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
84 |
85 | # 通过偏移来绘制直方图,以达到一个横坐标能显示多个直方图的效果,同时便于区分颜色和图例
86 | plt.title('各城市面向以及价格比较', fontsize=25)
87 | plt.bar(x=np.arange(len(face_name)), height=avg_price[0], width=bar_width, label='北京', color='steelblue')
88 | plt.bar(x=np.arange(len(face_name)) + bar_width, height=avg_price[1], width=bar_width, label='广州', color='brown')
89 | plt.bar(x=np.arange(len(face_name)) + bar_width * 2, height=avg_price[2], width=bar_width, label='上海',color='greenyellow')
90 | plt.bar(x=np.arange(len(face_name)) + bar_width * 3, height=avg_price[3], width=bar_width, label='深圳',color='darkorange')
91 | plt.bar(x=np.arange(len(face_name)) + bar_width * 4, height=avg_price[4], width=bar_width, label='西安',color='skyblue')
92 | plt.xticks(np.arange(4) + 0.2, face_name,fontsize=20)
93 | plt.ylabel('价格:元/月',fontsize=20)
94 | # 显示图例
95 | plt.legend(fontsize=10)
96 | # 为柱状图在顶部添加数据信息
97 | for x, y in enumerate(avg_price[0]):
98 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
99 | for x, y in enumerate(avg_price[1]):
100 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
101 | for x, y in enumerate(avg_price[2]):
102 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
103 | for x, y in enumerate(avg_price[3]):
104 | plt.text(x + bar_width * 3, y + 0.2, "%s" % round(y, 1), ha='center')
105 | for x, y in enumerate(avg_price[4]):
106 | plt.text(x + bar_width * 4, y + 0.2, "%s" % round(y, 1), ha='center')
107 |
108 | plt.show()
109 |
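The analysis scripts recover area, facing direction and room count from the `information` string through chains of `split` calls, which fail with an IndexError as soon as a separator is missing. A hedged sketch of a more tolerant parser follows. The sample string is hypothetical and only imitates the shape implied by the split calls; the exact format of the crawled data is not shown in the source, and `parse_information` is an illustrative name:

```python
import re

DIRECTIONS = {"东", "南", "西", "北"}  # east, south, west, north


def parse_information(info):
    """Extract area, room count and facing directions; missing parts become None or []."""
    area = re.search(r"(\d+(?:\.\d+)?)\s*㎡", info)
    rooms = re.search(r"(\d+)\s*室", info)
    faces = []
    for field in info.split("/"):
        tokens = field.split()
        # A direction field consists only of single direction characters,
        # matching the exact comparisons used in face_price_show.py.
        if tokens and all(tok in DIRECTIONS for tok in tokens):
            faces = tokens
            break
    return {
        "area": float(area.group(1)) if area else None,
        "rooms": int(rooms.group(1)) if rooms else None,
        "faces": faces,
    }


# Hypothetical sample row, not taken from the crawled data.
sample = "雁塔-电子城-示例小区 / 85.00㎡ /南 北 / 2室1厅1卫 / 中楼层 (33层)"
parsed = parse_information(sample)
incomplete = parse_information("仅剩1间")
```

Rows that lack a field come back with `None` or an empty list, so the caller can skip them explicitly.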
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/pos_price_show.py:
--------------------------------------------------------------------------------
1 | # Shows the average rent of the different blocks (板块) in each city.
2 |
3 | import json
4 | import codecs
5 | import matplotlib.pyplot as plt
6 |
7 | avg_price = []  # average rent per block
8 | pos_list = []  # block names
9 | count_list = []  # number of blocks per city
10 |
11 | class Read_Json_and_show():
12 |     def reads(self,filename,city, avg_price, pos_list,count_list):  # read one city's data and store it in the given lists
13 |         temp_price = []
14 |         pos_name = []
15 |         count = []
16 |         for i in range(0,3000):
17 |             pos_name.append("")
18 |             count.append(0)
19 |             temp_price.append(0)
20 |
21 |         with codecs.open(filename, 'r', encoding='utf-8') as f:
22 |             read = f.readlines()
23 |             for index, info in enumerate(read):
24 |                 data = json.loads(info)
25 |                 value = list(data.values())
26 |                 # value holds the fields of this row as a list.
27 |                 # Extract the price.
28 |                 temp_price_read = value[1].split('-')
29 |                 price = temp_price_read[0]
30 |                 price = int(price)
31 |                 # Extract the block name.
32 |                 temp_pos_read = value[5].split('-')
33 |                 if len(temp_pos_read) > 1:  # locate the block field first, then discard rows polluted by other fields
34 |                     temp_pos_read = temp_pos_read[1].split('㎡')
35 |                     if len(temp_pos_read) == 1:
36 |                         pos = str(temp_pos_read[0])
37 |                         # Scan the block-name list: reuse the slot of a known block, otherwise take the first empty slot.
38 |                         flag = -1
39 |                         for j in range(len(pos_name)):
40 |                             if pos_name[j] == pos or pos_name[j] == "":
41 |                                 flag = j
42 |                                 break
43 |
44 |                         pos_name[flag] = pos
45 |                         count[flag] += 1
46 |                         temp_price[flag] += price
47 |         # Drop the unused placeholder slots at the end of the preallocated lists.
48 |         while temp_price[-1] == 0:
49 |             temp_price.pop()
50 |         while pos_name[-1] == "":
51 |             pos_name.pop()
52 |         while count[-1] == 0:
53 |             count.pop()
54 |         for i in range(len(count)):
55 |             temp_price[i] = temp_price[i]/count[i]
56 |         # Append this city's results to the shared lists.
57 |         avg_price.append(temp_price)
58 |         pos_list.append(pos_name)
59 |         count_list.append(len(temp_price))
60 |
61 |
62 | if __name__ == '__main__':
63 | read_json = Read_Json_and_show()
64 | path1 = 'scrapy-beijing-zufang.json'
65 | path2 = 'scrapy-guangzhou-zufang.json'
66 | path3 = 'scrapy-shanghai-zufang.json'
67 | path4 = 'scrapy-shenzhen-zufang.json'
68 | path5 = 'scrapy-xian-zufang.json'
69 | # 按照不同城市进行数据的读取和存储操作
70 | read_json.reads(path1, 0, avg_price, pos_list, count_list)
71 | read_json.reads(path2, 1, avg_price, pos_list, count_list)
72 | read_json.reads(path3, 2, avg_price, pos_list, count_list)
73 | read_json.reads(path4, 3, avg_price, pos_list, count_list)
74 | read_json.reads(path5, 4, avg_price, pos_list, count_list)
75 | # 输出读取结果进行观察
76 | print(avg_price)
77 | print(pos_list)
78 | print(count_list)
79 |
80 | # 数据太多了,只保留一部分 ,这里只保留15个板块的数据
81 | for i in range(5):
82 | while len(avg_price[i]) > 15:
83 | avg_price[i].pop()
84 | while len(pos_list[i]) > 15:
85 | pos_list[i].pop()
86 |
87 | city_name = ['北京', '广州', '上海', '深圳', '西安']
88 | # 一些预设参数
89 | plt.figure(figsize=(50, 50), dpi=70)
90 | bar_width = 0.23
91 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
92 |
93 | # 分开画不同城市的不同板块的展示图
94 | # 北京
95 | plt.subplot(321)
96 | plt.bar(pos_list[0],avg_price[0])
97 | plt.title("北京不同板块均价展示图")
98 | plt.ylabel("价格:元/月")
99 | # 广州
100 | plt.subplot(322)
101 | plt.bar(pos_list[1], avg_price[1])
102 | plt.title("广州不同板块均价展示图")
103 | plt.ylabel("价格:元/月")
104 | # 上海
105 | plt.subplot(323)
106 | plt.bar(pos_list[2], avg_price[2])
107 | plt.title("上海不同板块均价展示图")
108 | plt.ylabel("价格:元/月")
109 | # 深圳
110 | plt.subplot(324)
111 | plt.bar(pos_list[3], avg_price[3])
112 | plt.title("深圳不同板块均价展示图")
113 | plt.ylabel("价格:元/月")
114 | # 西安
115 | plt.subplot(325)
116 | plt.bar(pos_list[4], avg_price[4])
117 | plt.title("西安不同板块均价展示图")
118 | plt.ylabel("价格:元/月")
119 |
120 | plt.show()
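
pos_price_show.py groups prices by block using three preallocated 3000-slot lists and a linear scan per row. A minimal sketch of the same grouping with a dictionary, which needs no preallocation or trimming; `average_by_block` and the sample rows are hypothetical and stand in for the (block, price) pairs the script extracts:

```python
from collections import defaultdict


def average_by_block(rows):
    """Return {block name: average price}, in order of first appearance."""
    totals = defaultdict(lambda: [0, 0])  # block -> [price sum, listing count]
    for block, price in rows:
        totals[block][0] += price
        totals[block][1] += 1
    return {block: total / n for block, (total, n) in totals.items()}


# Hypothetical sample rows, not taken from the crawled data.
rows = [("电子城", 2000), ("电子城", 3000), ("曲江", 4000)]
averages = average_by_block(rows)
```

Because dictionaries keep insertion order, the blocks come out in the order they were first seen, which matches the order produced by the list-based version.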
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/room_price_show.py:
--------------------------------------------------------------------------------
1 | # Rent statistics by room count; an N-居 listing is one with N rooms (N室).
2 |
3 | import json
4 | import codecs
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 |
8 | avg_price = [[]for _ in range(3)]  # average rent
9 | high_price = [[]for _ in range(3)]  # highest rent
10 | mid_price = [[]for _ in range(3)]  # median rent
11 | low_price = [[]for _ in range(3)]  # lowest rent
12 |
13 | class Read_Json_and_show():
14 |     def reads(self,filename,city, avg_price, high_price, mid_price, low_price):  # read one city's data and store it in the given lists
15 |         temp_price = []  # running rent totals
16 |         temp_high_price = []  # running maxima
17 |         temp_low_price = []  # running minima
18 |         temp_mid_price_count = [[]for _ in range(3)]  # every rent per room count, used for the median
19 |         house_type = 0  # 1, 2, 3 = one, two, three rooms; 0 = anything else
20 |         count = []  # number of listings per room count
21 |         for i in range(0, 3):  # initial values
22 |             count.append(0)
23 |             temp_price.append(0)
24 |             temp_low_price.append(99999999)
25 |             temp_high_price.append(0)
26 |
27 |         with codecs.open(filename, 'r', encoding='utf-8') as f:
28 |             read = f.readlines()
29 |             for index, info in enumerate(read):
30 |                 data = json.loads(info)
31 |                 value = list(data.values())
32 |                 # value holds the fields of this row as a list.
33 |                 # Extract the price.
34 |                 temp_price_read = value[1].split('-')
35 |                 price = temp_price_read[0]
36 |                 price = int(price)
37 |                 # Extract the room count.
38 |                 temp_house_type = value[5].split('室')
39 |                 temp_house_type = temp_house_type[0].split('/ ')
40 |                 # Reset on every row so that listings outside 1-3 rooms are skipped
41 |                 # instead of silently inheriting the previous row's room count.
42 |                 house_type = 0
43 |                 last_field = temp_house_type[-1]
44 |                 if last_field in ('1', '2', '3'):
45 |                     house_type = int(last_field)
46 |                 # print(house_type)
47 |
48 |                 if house_type == 1:  # one room
49 |                     count[0] += 1
50 |                     if price > temp_high_price[0]:  # highest rent
51 |                         temp_high_price[0] = price
52 |                     if price < temp_low_price[0]:  # lowest rent
53 |                         temp_low_price[0] = price
54 |                     temp_price[0] += price
55 |                     temp_mid_price_count[0].append(price)
56 |
57 |                 if house_type == 2:  # two rooms
58 |                     count[1] += 1
59 |                     if price > temp_high_price[1]:  # highest rent
60 |                         temp_high_price[1] = price
61 |                     if price < temp_low_price[1]:  # lowest rent
62 |                         temp_low_price[1] = price
63 |
64 |                     temp_price[1] += price
65 |                     temp_mid_price_count[1].append(price)
66 |
67 |                 if house_type == 3:  # three rooms
68 |                     count[2] += 1
69 |                     if price > temp_high_price[2]:  # highest rent
70 |                         temp_high_price[2] = price
71 |                     if price < temp_low_price[2]:  # lowest rent
72 |                         temp_low_price[2] = price
73 |                     temp_price[2] += price
74 |                     temp_mid_price_count[2].append(price)
75 |
76 |         for i in range(0, 3):
77 |             # Store the finished statistics in the result lists.
78 |             temp_price[i] = float(temp_price[i]/count[i])  # average rent
79 |             avg_price[i][city] = temp_price[i]
80 |             high_price[i][city] = temp_high_price[i]
81 |             low_price[i][city] = temp_low_price[i]
82 |             temp_mid_price_count[i].sort()
83 |             mid_price[i][city] = temp_mid_price_count[i][int(len(temp_mid_price_count[i])/2)]
85 |
86 | if __name__ == '__main__':
87 | read_json = Read_Json_and_show()
88 | # 读文件并按照路径依次处理数据
89 | path1 = 'scrapy-beijing-zufang.json'
90 | path2 = 'scrapy-guangzhou-zufang.json'
91 | path3 = 'scrapy-shanghai-zufang.json'
92 | path4 = 'scrapy-shenzhen-zufang.json'
93 | path5 = 'scrapy-xian-zufang.json'
94 | # 给嵌套列表一个初值,这样方便后续数据处理
95 | for i in range(0, 3):
96 | for j in range(0, 5):
97 | avg_price[i].append(0)
98 | high_price[i].append(0)
99 | low_price[i].append(0)
100 | mid_price[i].append(0)
101 |
102 | # 读取并处理数据
103 | read_json.reads(path1, 0, avg_price, high_price, mid_price, low_price)
104 | read_json.reads(path2, 1, avg_price, high_price, mid_price, low_price)
105 | read_json.reads(path3, 2, avg_price, high_price, mid_price, low_price)
106 | read_json.reads(path4, 3, avg_price, high_price, mid_price, low_price)
107 | read_json.reads(path5, 4, avg_price, high_price, mid_price, low_price)
108 |
109 | print(avg_price)
110 | print(high_price)
111 | print(low_price)
112 | print(mid_price)
113 |
114 | # 绘图步骤
115 | city_name = ['北京', '广州', '上海', '深圳', '西安']
116 | plt.figure(figsize=(30, 30), dpi=70)
117 | bar_width = 0.23
118 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
119 |
120 | # 均价
121 | plt.subplot(221)
122 | plt.title('平均价格 ', fontsize=25)
123 | # 绘制成一个X坐标对应多个直方图的,这样方便数据的对比和展示观看
124 | plt.bar(x=np.arange(len(city_name)), height=avg_price[0], width=bar_width,label='1居',color='steelblue')
125 | plt.bar(x=np.arange(len(city_name))+bar_width, height=avg_price[1],width=bar_width, label='2居',color='brown')
126 | plt.bar(x=np.arange(len(city_name))+bar_width*2, height=avg_price[2], width=bar_width, label='3居',color='darkorange')
127 | plt.xticks(np.arange(5)+0.2,city_name,fontsize=15)
128 | plt.ylabel('价格:元/月')
129 | # 绘制图例
130 | plt.legend()
131 | # 给图像上端增添数据显示
132 | for x, y in enumerate(avg_price[0]):
133 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
134 | for x, y in enumerate(avg_price[1]):
135 | plt.text(x+bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
136 | for x, y in enumerate(avg_price[2]):
137 | plt.text(x+bar_width*2, y + 0.2, "%s" % round(y, 1), ha='center')
138 |
139 | # 最高价
140 |
141 | plt.subplot(222)
142 | plt.title('最高价格 ', fontsize=25)
143 | # 绘制成一个X坐标对应多个直方图的,这样方便数据的对比和展示观看
144 | plt.bar(x=np.arange(len(city_name)), height=high_price[0], width=bar_width, label='1居', color='steelblue')
145 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=high_price[1], width=bar_width, label='2居', color='brown')
146 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=high_price[2], width=bar_width, label='3居',
147 | color='darkorange')
148 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
149 | plt.ylabel('价格:元/月')
150 | plt.legend()
151 | # 给图像上端增添数据显示
152 | for x, y in enumerate(high_price[0]):
153 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
154 | for x, y in enumerate(high_price[1]):
155 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
156 | for x, y in enumerate(high_price[2]):
157 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
158 |
159 | # 最低价
160 | plt.subplot(223)
161 | plt.title('最低价格 ', fontsize=25)
162 | # 绘制成一个X坐标对应多个直方图的,这样方便数据的对比和展示观看
163 | plt.bar(x=np.arange(len(city_name)), height=low_price[0], width=bar_width, label='1居', color='steelblue')
164 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=low_price[1], width=bar_width, label='2居', color='brown')
165 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=low_price[2], width=bar_width, label='3居',
166 | color='darkorange')
167 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
168 | plt.ylabel('价格:元/月')
169 | plt.legend()
170 | # 给图像上端增添数据显示
171 | for x, y in enumerate(low_price[0]):
172 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
173 | for x, y in enumerate(low_price[1]):
174 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
175 | for x, y in enumerate(low_price[2]):
176 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
177 |
178 | # Median
179 | plt.subplot(224)
180 | plt.title('中位数价格 ', fontsize=25)
181 | plt.bar(x=np.arange(len(city_name)), height=mid_price[0], width=bar_width, label='1居', color='steelblue')
182 | plt.bar(x=np.arange(len(city_name)) + bar_width, height=mid_price[1], width=bar_width, label='2居', color='brown')
183 | plt.bar(x=np.arange(len(city_name)) + bar_width * 2, height=mid_price[2], width=bar_width, label='3居',
184 | color='darkorange')
185 | plt.xticks(np.arange(5) + 0.2, city_name, fontsize=15)
186 | plt.ylabel('价格:元/月')
187 | plt.legend()
188 | # Label the top of each bar with its value
189 | for x, y in enumerate(mid_price[0]):
190 | plt.text(x, y + 0.2, "%s" % round(y, 1), ha='center')
191 | for x, y in enumerate(mid_price[1]):
192 | plt.text(x + bar_width, y + 0.2, "%s" % round(y, 1), ha='center')
193 | for x, y in enumerate(mid_price[2]):
194 | plt.text(x + bar_width * 2, y + 0.2, "%s" % round(y, 1), ha='center')
195 |
196 | plt.show()
197 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/salary_price_show.py:
--------------------------------------------------------------------------------
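1 | # The average rent per unit area of each city was already computed by an earlier script, so the values are stored directly in a list here.
2 | # Average rent per unit area: Beijing: 103.2 CNY/m²/month
3 | # Guangzhou: 63.9 CNY/m²/month
4 | # Shanghai: 103.5 CNY/m²/month
5 | # Shenzhen: 88.5 CNY/m²/month
6 | # Xi'an: 36.0 CNY/m²/month
7 | # Average salaries of each city, as found through a Baidu search:
8 | # Average salary: Beijing: 13567 CNY/month, Guangzhou: 11300 CNY/month, Shanghai: 12183 CNY/month, Shenzhen: 12300 CNY/month, Xi'an: 9011 CNY/month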
9 |
10 | import matplotlib.pyplot as plt
11 |
12 | space_price = [103.2, 63.9, 103.5, 88.5, 36.0]
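13 | salary = [135.67, 113.00, 121.83, 123.00, 90.11]  # in hundreds of CNY, which keeps the ratios easy to read
14 | count = []
15 | # Compute the ratio for each city
16 | for i in range(len(space_price)):
17 |     count.append(space_price[i] / salary[i])  # a lower ratio means rent takes a smaller share of salary, i.e. a lower cost of living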
18 |
19 | # Plotting
20 | city_name = ['北京', '广州', '上海', '深圳', '西安']
21 | plt.figure(figsize=(20, 10), dpi=70)
22 | # Use a font with Chinese glyphs so the labels do not render as garbled text
23 | plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
24 | # Draw the bar chart
25 | plt.subplot(121)
26 | plt.title("单位面积价格与平均月薪的比值直方图",fontsize=20)
27 | plt.bar(city_name, count,width=0.4)
28 | plt.ylabel("单位面积价格/人均月薪(百元)",fontsize=15)
29 | # Draw the scatter plot; the five points are plotted separately so each city gets its own colour
30 | plt.subplot(122)
31 | plt.scatter(salary[0],space_price[0],s=60, label='北京', color='steelblue')
32 | plt.scatter(salary[1],space_price[1],s=60, label='广州', color='brown')
33 | plt.scatter(salary[2],space_price[2],s=60, label='上海',color='green')
34 | plt.scatter(salary[3],space_price[3],s=60, label='深圳',color='darkorange')
35 | plt.scatter(salary[4],space_price[4],s=60, label='西安',color='skyblue')
36 | plt.title("人均月薪-单位面积均价散点图",fontsize=20)
37 | plt.xlabel("人均月薪 单位:百元",fontsize=15)
38 | plt.ylabel("单位面积均价 单位:元/平方米/月",fontsize=15)
39 | plt.legend(fontsize=15)
40 |
41 | plt.show()
42 |
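As a hedged aside, the ratio step above can be written as a small function that pairs the two lists with `zip`. The helper name `rent_to_salary_ratio` is hypothetical and not part of the original script; the figures are the ones hard-coded in it.

```python
# Illustrative sketch only: rent_to_salary_ratio is a hypothetical helper,
# not part of salary_price_show.py.

def rent_to_salary_ratio(space_price, salary):
    """Divide rent per square metre by salary, city by city."""
    if len(space_price) != len(salary):
        raise ValueError("both lists must cover the same cities")
    return [price / pay for price, pay in zip(space_price, salary)]


city_name = ['北京', '广州', '上海', '深圳', '西安']
space_price = [103.2, 63.9, 103.5, 88.5, 36.0]    # CNY per square metre per month
salary = [135.67, 113.00, 121.83, 123.00, 90.11]  # hundreds of CNY per month

ratios = dict(zip(city_name, rent_to_salary_ratio(space_price, salary)))
lowest = min(ratios, key=ratios.get)   # city where rent weighs least against salary
highest = max(ratios, key=ratios.get)  # city where rent weighs most against salary
```

Keying the ratios by city name keeps each value attached to its city, instead of relying on two lists staying in the same order.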
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/scrapy.cfg:
--------------------------------------------------------------------------------
1 | # Automatically created by: scrapy startproject
2 | #
3 | # For more information about the [deploy] section see:
4 | # https://scrapyd.readthedocs.io/en/latest/deploy.html
5 |
6 | [settings]
7 | default = zufang.settings
8 |
9 | [deploy]
10 | #url = http://localhost:6800/
11 | project = zufang
12 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/total_price_show.py:
--------------------------------------------------------------------------------
1 | # This file compares the rents of five cities (mean, median, highest and lowest total rent, and the same four statistics per unit area) and draws the charts
2 | import json
3 | import codecs
4 | import re
5 | import matplotlib.pyplot as plt
6 |
7 |
8 | total_avg_price = [] # mean
9 | total_high_price = [] # highest
10 | total_mid_price = [] # median
11 | total_low_price = [] # lowest
12 | space_avg_price = [] # mean (per unit area)
13 | space_high_price = [] # highest (per unit area)
14 | space_low_price = [] # lowest (per unit area)
15 | space_mid_price = [] # median (per unit area)
16 |
17 | class Read_Json_and_show():
18 |     def reads(self, filename, city, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
19 |               space_high_price, space_low_price, space_mid_price):  # Read one city's data and append its statistics to the given lists
20 |         temp_total_price = 0
21 |         temp_space_price = 0
22 |         temp_space_high_price = 0
23 |         temp_space_low_price = 9999999
24 |         temp_total_high_price = 0
25 |         temp_total_low_price = 9999999
26 |         with codecs.open(filename, 'r', encoding='utf-8') as f:
27 |             # Open the file and read it (one JSON object per line)
28 |             read = f.readlines()
29 |         total_mid_price_count = []
30 |         space_mid_price_count = []
31 |         number = re.compile(r'^[-+]?[0-9]+\.[0-9]+$')  # Regex that matches a decimal number; compiled once, outside the loop
32 |         for index, info in enumerate(read):  # Process the file line by line
33 |             data = json.loads(info)
34 |             value = list(data.values())
35 |             # value holds the field values of this record as a list
36 |
37 |             # Extract the price (the lower bound if a range is given)
38 |             temp_price = value[1].split('-')
39 |             price = int(temp_price[0])
40 |             # Extract the area
41 |             temp_space = value[5].split("㎡")
42 |             temp_space = temp_space[0].split("/ ")
43 |             if len(temp_space) == 1:
44 |                 temp_space = temp_space[0].split("-")
45 |                 space = temp_space[0]
46 |                 space = float(space)
47 |             else:
48 |                 temp_space2 = temp_space[1].split("-")
49 |                 # If this token is a decimal number it is the area, and the smallest area is kept; otherwise this field is not the area and the next field is used
50 |                 result = number.match(temp_space2[0])
51 |                 if result:
52 |                     space = temp_space2[0]
53 |                     space = float(space)
54 |                 else:  # Keep the smallest area (for 20-25㎡, space is taken as 20)
55 |                     temp_space = temp_space[2].split("-")
56 |                     space = temp_space[0]
57 |                     space = float(space)
58 |             # print(space)
59 |             if price > temp_total_high_price:  # highest price
60 |                 temp_total_high_price = price
61 |             if price < temp_total_low_price:  # lowest price
62 |                 temp_total_low_price = price
63 |             temp_total_price += price  # running total; the mean is computed after the loop
64 |             total_mid_price_count.append(price)  # the median is also computed after the loop
65 |
66 |             space_price = float(price/space)
67 |             if space_price > temp_space_high_price:  # highest price (per unit area)
68 |                 temp_space_high_price = space_price
69 |             if space_price < temp_space_low_price:  # lowest price (per unit area)
70 |                 temp_space_low_price = space_price
71 |             temp_space_price += space_price  # running total (per unit area)
72 |             space_mid_price_count.append(space_price)  # median (per unit area)
73 |
74 |         # Mean
75 |         count = len(total_mid_price_count)  # number of records actually read, instead of a hard-coded 3000
76 |         total_avg_price.append(float(temp_total_price/count))
77 |         space_avg_price.append(float(temp_space_price/count))
78 |         # Highest price
79 |         total_high_price.append(float(temp_total_high_price))
80 |         space_high_price.append(float(temp_space_high_price))
81 |         # Lowest price
82 |         total_low_price.append(float(temp_total_low_price))
83 |         space_low_price.append(float(temp_space_low_price))
84 |         # Median (lower middle element; index 1499 when there are 3000 records)
85 |         total_mid_price_count.sort()
86 |         space_mid_price_count.sort()
87 |         total_mid_price.append(float(total_mid_price_count[(count - 1) // 2]))
88 |         space_mid_price.append(float(space_mid_price_count[(count - 1) // 2]))
89 |
90 |
91 | if __name__ == '__main__':
92 |     read_json = Read_Json_and_show()
93 |     path1 = 'scrapy-beijing-zufang.json'
94 |     path2 = 'scrapy-guangzhou-zufang.json'
95 |     path3 = 'scrapy-shanghai-zufang.json'
96 |     path4 = 'scrapy-shenzhen-zufang.json'
97 |     path5 = 'scrapy-xian-zufang.json'
98 |     # Read and process the data of each city
99 |     read_json.reads(path1, 1, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
100 |                     space_high_price, space_low_price, space_mid_price)
101 |     read_json.reads(path2, 2, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
102 |                     space_high_price, space_low_price, space_mid_price)
103 |     read_json.reads(path3, 3, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
104 |                     space_high_price, space_low_price, space_mid_price)
105 |     read_json.reads(path4, 4, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
106 |                     space_high_price, space_low_price, space_mid_price)
107 |     read_json.reads(path5, 5, total_avg_price, total_high_price, total_mid_price, total_low_price, space_avg_price,
108 |                     space_high_price, space_low_price, space_mid_price)
109 |
110 |     # Print the computed statistics
111 |     print(total_avg_price)
112 |     print(total_high_price)
113 |     print(total_low_price)
114 |     print(total_mid_price)
115 |     print(space_avg_price)
116 |     print(space_high_price)
117 |     print(space_low_price)
118 |     print(space_mid_price)
119 |
120 |     # Draw the bar charts
121 |     plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
122 |     city_name = ['北京', '广州', '上海', '深圳', '西安']
123 |     plt.figure(figsize=(30, 30), dpi=70)
124 |     bar_width = 0.4
125 |
126 |     # Mean total rent; eight subplots are drawn in all
127 |     plt.subplot(241)
128 |     plt.title('总价平均价格', fontsize=25)
129 |     plt.bar(city_name,total_avg_price)
130 |     plt.ylabel('价格:元/月')
131 |     for x, y in enumerate(total_avg_price):
132 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
133 |     # Highest total rent
134 |     plt.subplot(242)
135 |     plt.title('总价最高价格', fontsize=25)
136 |     plt.bar(city_name, total_high_price)
137 |     plt.ylabel('价格:元/月')
138 |     for x, y in enumerate(total_high_price):
139 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
140 |     # Lowest total rent
141 |     plt.subplot(243)
142 |     plt.title('总价最低价格', fontsize=25)
143 |     plt.bar(city_name, total_low_price)
144 |     plt.ylabel('价格:元/月')
145 |     for x, y in enumerate(total_low_price):
146 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
147 |     # Median total rent
148 |     plt.subplot(244)
149 |     plt.title('总价中位数价格', fontsize=25)
150 |     plt.bar(city_name, total_mid_price)
151 |     plt.ylabel('价格:元/月')
152 |     for x, y in enumerate(total_mid_price):
153 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
154 |     # Mean rent per unit area
155 |     plt.subplot(245)
156 |     plt.title('单位面积均价', fontsize=25)
157 |     plt.bar(city_name, space_avg_price)
158 |     plt.ylabel('价格:元/平米/月')
159 |     for x, y in enumerate(space_avg_price):
160 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
161 |     # Highest rent per unit area
162 |     plt.subplot(246)
163 |     plt.title('单位面积最高价格', fontsize=25)
164 |     plt.bar(city_name, space_high_price)
165 |     plt.ylabel('价格:元/平米/月')
166 |     for x, y in enumerate(space_high_price):
167 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
168 |     # Lowest rent per unit area
169 |     plt.subplot(247)
170 |     plt.title('单位面积最低价格', fontsize=25)
171 |     plt.bar(city_name, space_low_price)
172 |     plt.ylabel('价格:元/平米/月')
173 |     for x, y in enumerate(space_low_price):
174 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
175 |     # Median rent per unit area
176 |     plt.subplot(248)
177 |     plt.title('单位面积中位数价格', fontsize=25)
178 |     plt.bar(city_name, space_mid_price)
179 |     plt.ylabel('价格:元/平米/月')
180 |     for x, y in enumerate(space_mid_price):
181 |         plt.text(x, y + 0.1, "%s" % round(y, 1), ha='center')
182 |
183 |     plt.show()
184 |
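The area extraction and the four statistics computed in this file could also be sketched more compactly, as below. This is only an illustration under assumptions: `parse_min_area`, `summarize` and `AREA_PATTERN` are hypothetical names, and the sample strings in the usage lines are made up to resemble the `information` field rather than taken from the crawled data.

```python
import re
import statistics

# Hypothetical helpers for illustration; they are not part of total_price_show.py.
# The pattern matches "89.00㎡" as well as a range such as "20.00-25.00㎡",
# capturing the first (smallest) number.
AREA_PATTERN = re.compile(r'(\d+(?:\.\d+)?)(?:-\d+(?:\.\d+)?)?\s*㎡')


def parse_min_area(information):
    """Return the smallest area in square metres found in the text, or None."""
    match = AREA_PATTERN.search(information)
    return float(match.group(1)) if match else None


def summarize(values):
    """Mean, highest, lowest and median of a non-empty list.

    The size comes from the data itself, so nothing depends on there being
    exactly 3000 records. median_low returns the lower middle element,
    which is the element the script picks with a fixed index.
    """
    return {
        'avg': statistics.mean(values),
        'high': max(values),
        'low': min(values),
        'mid': statistics.median_low(values),
    }


# Usage with made-up sample strings (assumed format of the information field)
single = parse_min_area("雁塔区-电子城 / 89.00㎡ / 南 / 2室1厅1卫")
ranged = parse_min_area("精选 / 20.00-25.00㎡ / 1室0厅1卫")
stats = summarize([3000, 1000, 2000, 4000])
```

Searching for the number directly in front of `㎡` avoids depending on how many ` / `-separated fields come before the area.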
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/__init__.py
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/items.py:
--------------------------------------------------------------------------------
1 | # Define here the models for your scraped items
2 | #
3 | # See documentation in:
4 | # https://docs.scrapy.org/en/latest/topics/items.html
5 |
6 | import scrapy
7 |
8 |
9 | class zufangitem(scrapy.Item):
10 |     title = scrapy.Field()  # listing title
11 |     price = scrapy.Field()  # monthly rent
12 |     position0 = scrapy.Field()  # address part 1
13 |     position1 = scrapy.Field()  # address part 2
14 |     position2 = scrapy.Field()  # address part 3
15 |     information = scrapy.Field()  # other information
16 |
17 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/middlewares.py:
--------------------------------------------------------------------------------
1 | from scrapy import signals
2 | from selenium import webdriver
3 | from scrapy.http import HtmlResponse
4 |
5 | # Only the downloader middleware needs to be changed; the spider middleware can be left alone
6 |
7 |
8 | class ZufangDownloaderMiddleware:
9 |     # Open a browser automatically when the downloader middleware starts working
10 |
11 |     def __init__(self):
12 |         self.driver = webdriver.Chrome()
13 |
14 |     @classmethod
15 |     def from_crawler(cls, crawler):
16 |         # This method is used by Scrapy to create your spiders.
17 |         s = cls()
18 |         crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
19 |         # The next line has to be added by hand; it hooks up the function that closes the browser
20 |         crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
21 |         return s
22 |
23 |     # Called for every request the spider sends to the target site; it returns the page source for that URL
24 |
25 |     def process_request(self, request, spider):
26 |         self.driver.get(request.url)  # open the requested URL in the browser
27 |         body = self.driver.page_source  # HTML source of the page
28 |         return HtmlResponse(url=self.driver.current_url, body=body, encoding='utf-8', request=request)
29 |
30 |     def process_response(self, request, response, spider):
31 |         return response
32 |
33 |     def process_exception(self, request, exception, spider):
34 |         pass
35 |
36 |     def spider_opened(self, spider):
37 |         spider.logger.info("Spider opened: %s" % spider.name)
38 |
39 |     # This function has to be added by hand; it shuts the browser down
40 |
41 |     def spider_closed(self, spider):
42 |         self.driver.quit()  # quit() ends the whole browser session; close() only closes the current window
43 |         spider.logger.info("Spider closed: %s" % spider.name)
44 |
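settings.py also names a `zufang.middlewares.RandomUserAgentMiddleware`, but no such class appears in this file. A minimal sketch of what it might look like follows; the class body is an assumption, not the author's code, and it reads the `MY_USER_AGENT` list from settings.py. Because `ZufangDownloaderMiddleware.process_request` returns a response itself, a header set by a middleware that runs after it would likely have no effect on the pages fetched through Selenium.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that sets a random User-Agent header."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # MY_USER_AGENT is the custom list defined in settings.py
        return cls(crawler.settings.getlist('MY_USER_AGENT'))

    def process_request(self, request, spider):
        # Pick one User-Agent per request; do nothing if the list is empty
        if self.user_agents:
            request.headers['User-Agent'] = random.choice(self.user_agents)
        return None  # let the request continue through the remaining middlewares
```

Returning `None` from `process_request` tells Scrapy to keep processing the request normally, which is what a header-only middleware wants.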
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/pipelines.py:
--------------------------------------------------------------------------------
1 | # Define your item pipelines here
2 | #
3 | # Don't forget to add your pipeline to the ITEM_PIPELINES setting
4 | # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
5 |
6 |
7 | # useful for handling different item types with a single interface
8 | from itemadapter import ItemAdapter
9 | import json
10 |
11 |
12 | class zufangline(object):
13 |     def open_spider(self, spider):
14 |         # Let a failure of open() propagate: if it were swallowed, self.file
15 |         # would not exist and process_item would fail later with AttributeError.
16 |         # The file name is hard-coded for Xi'an; change it when crawling another city.
17 |         self.file = open('scrapy-xian-zufang.json', "w", encoding="utf-8")
18 |
19 |     def process_item(self, item, spider):
20 |         dict_item = dict(item)
21 |         json_str = json.dumps(dict_item, ensure_ascii=False) + "\n"
22 |         self.file.write(json_str)
23 |         return item
24 |
25 |     def close_spider(self, spider):
26 |         self.file.close()
27 |
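The pipeline above writes to a fixed file name, so it has to be edited for every city. One possible variant, sketched here under assumptions, derives the file name from the spider's `name` attribute instead. `JsonLinesPipeline` and its `directory` argument are hypothetical and not part of the project.

```python
import json
import os


class JsonLinesPipeline:
    """Hypothetical variant of zufangline: one JSON-lines file per spider name."""

    def __init__(self, directory='.'):
        # directory is an illustrative parameter; the default writes to the working directory
        self.directory = directory
        self.file = None

    def open_spider(self, spider):
        # spider.name is e.g. "xian", giving scrapy-xian-zufang.json
        path = os.path.join(self.directory, 'scrapy-{}-zufang.json'.format(spider.name))
        self.file = open(path, 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps the Chinese text readable in the output file
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()
```

With this shape, the five output files named in the repository would follow from the five spider names without touching the pipeline.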
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/settings.py:
--------------------------------------------------------------------------------
1 | # Scrapy settings for zufang project
2 | #
3 | # For simplicity, this file contains only settings considered important or
4 | # commonly used. You can find more settings consulting the documentation:
5 | #
6 | # https://docs.scrapy.org/en/latest/topics/settings.html
7 | # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
8 | # https://docs.scrapy.org/en/latest/topics/spider-middleware.html
9 |
10 | BOT_NAME = "zufang"
11 | #2403:a200:a200:13f1:183:84:18:11
12 |
13 | SPIDER_MODULES = ["zufang.spiders"]
14 | NEWSPIDER_MODULE = "zufang.spiders"
15 |
16 | # Crawl responsibly by identifying yourself (and your website) on the user-agent
17 | #USER_AGENT = 'python_qm (+http://www.yourdomain.com)'
18 | #USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
19 | #USER_AGENT ="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
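20 | # Disabled: this assignment was overridden by the DOWNLOADER_MIDDLEWARES setting further down, and RandomUserAgentMiddleware is not defined in zufang/middlewares.py
21 | #DOWNLOADER_MIDDLEWARES = {
22 | #    'zufang.middlewares.RandomUserAgentMiddleware': 900,
23 | #}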
24 |
25 | MY_USER_AGENT = [
26 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
27 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
28 | "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
29 | "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
30 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
31 | "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
32 | "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
33 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
34 | "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
35 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
36 | "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
37 | "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
38 | "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
39 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
40 | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
41 | "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
42 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
43 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
44 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
45 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
46 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
47 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
48 | "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
49 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
50 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
51 | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
52 | "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
53 | "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
54 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
55 | "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
56 | "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
57 | "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
58 | "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
59 | "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
60 | "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
61 | ]
62 |
63 | # Obey robots.txt rules
64 | ROBOTSTXT_OBEY = False
65 |
66 | LOG_LEVEL = 'WARNING'
67 |
68 | #LOG_LEVEL = "WARNING"
69 | # Configure maximum concurrent requests performed by Scrapy (default: 16)
70 | #CONCURRENT_REQUESTS = 8
71 |
72 | # Configure a delay for requests for the same website (default: 0)
73 | # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
74 | # See also autothrottle settings and docs
75 | DOWNLOAD_DELAY = 3
76 | RANDOMIZE_DOWNLOAD_DELAY = True
77 | #CONCURRENT_REQUESTS_PER_DOMAIN = 2
78 | # The download delay setting will honor only one of:
79 | #CONCURRENT_REQUESTS_PER_DOMAIN = 16
80 | #CONCURRENT_REQUESTS_PER_IP = 16
81 |
82 | # Disable cookies (enabled by default)
83 | #COOKIES_ENABLED = False
84 |
85 | # Disable Telnet Console (enabled by default)
86 | #TELNETCONSOLE_ENABLED = False
87 |
88 | # Override the default request headers:
89 | #DEFAULT_REQUEST_HEADERS = {
90 | # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
91 | # "Accept-Language": "en",
92 | # "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4389.82 Safari/537.36"
93 | #}
94 |
95 | # Enable or disable spider middlewares
96 | # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
97 | #SPIDER_MIDDLEWARES = {
98 | # "lianjia.middlewares.LianjiaSpiderMiddleware": 543,
99 | #}
100 |
101 | # Enable or disable downloader middlewares
102 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
103 | DOWNLOADER_MIDDLEWARES = {
104 | "zufang.middlewares.ZufangDownloaderMiddleware": 543,
105 | }
106 |
107 | # Enable or disable extensions
108 | # See https://docs.scrapy.org/en/latest/topics/extensions.html
109 | #EXTENSIONS = {
110 | # "scrapy.extensions.telnet.TelnetConsole": None,
111 | #}
112 |
113 | # Configure item pipelines
114 | # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
115 | ITEM_PIPELINES = {'zufang.pipelines.zufangline': 300, }
116 | #ITEM_PIPELINES = {'lianjia.pipelines.firsthandline': 300}
117 | # Enable and configure the AutoThrottle extension (disabled by default)
118 | # See https://docs.scrapy.org/en/latest/topics/autothrottle.html
119 | #AUTOTHROTTLE_ENABLED = True
120 | # The initial download delay
121 | #AUTOTHROTTLE_START_DELAY = 5
122 | # The maximum download delay to be set in case of high latencies
123 | #AUTOTHROTTLE_MAX_DELAY = 60
124 | # The average number of requests Scrapy should be sending in parallel to
125 | # each remote server
126 | #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
127 | # Enable showing throttling stats for every response received:
128 | #AUTOTHROTTLE_DEBUG = False
129 |
130 | # Enable and configure HTTP caching (disabled by default)
131 | # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
132 | #HTTPCACHE_ENABLED = True
133 | #HTTPCACHE_EXPIRATION_SECS = 0
134 | #HTTPCACHE_DIR = "httpcache"
135 | #HTTPCACHE_IGNORE_HTTP_CODES = []
136 | #HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
137 |
138 | # Set settings whose default value is deprecated to a future-proof value
139 | REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
140 | TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
141 | FEED_EXPORT_ENCODING = "utf-8"
142 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/spiders/__init__.py:
--------------------------------------------------------------------------------
1 | # This package will contain the spiders of your Scrapy project
2 | #
3 | # Please refer to the documentation for information on how to create and manage
4 | # your spiders.
5 |
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/zufang/zufang/spiders/spider1.py:
--------------------------------------------------------------------------------
1 | import scrapy
2 |
3 | from zufang.items import zufangitem
4 |
5 |
6 | class Zufangspider(scrapy.spiders.Spider):
7 |     name = "xian"  # spider name; one of beijing, shanghai, guangzhou, shenzhen, xian
8 |     allowed_domains = ["xa.lianjia.com"]  # domains the spider is allowed to crawl
9 |     start_urls = []
10 |     for page in range(1, 101):  # 100 pages in total, so the start URLs are built in a loop
11 |         url1 = 'https://xa.lianjia.com/zufang/pg{}/'.format(page)
12 |         start_urls.append(url1)
13 |
14 |     custom_settings = {
15 |         'ITEM_PIPELINES': {'zufang.pipelines.zufangline': 300},
16 |     }
17 |
18 |     def parse(self, response, **kwargs):
19 |         div_list = response.xpath("//*[@id=\"content\"]/div[1]/div[1]/div")
20 |
21 |         # Use XPath to analyse the downloaded page and extract the required fields
22 |         for each in div_list:
23 |             # Create a fresh item for every listing so yielded items do not share state
24 |             item = zufangitem()
25 |             item['title'] = each.xpath("normalize-space(./div/p[1]/a/text())").extract_first()
26 |             item['price'] = each.xpath("normalize-space(./div/span/em/text())").extract_first()
27 |             item['position0'] = each.xpath("./div/p[2]/a[1]/text()").extract_first()
28 |             item['position1'] = each.xpath("./div/p[2]/a[2]/text()").extract_first()
29 |             item['position2'] = each.xpath("./div/p[2]/a[3]/text()").extract_first()
30 |             item['information'] = each.xpath("normalize-space(./div/p[2])").extract_first()
31 |             yield item
32 |
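The spider hard-codes the Xi'an subdomain and is edited by hand for each city. As a sketch under assumptions, the start URLs could instead be generated from the city name. `CITY_SUBDOMAINS` and `build_start_urls` are hypothetical; only the `xa` subdomain appears in the original code, and the other four subdomains are assumed, not verified here.

```python
# Hypothetical mapping from spider name to Lianjia city subdomain.
# Only "xa" is taken from the spider above; the other four are assumptions.
CITY_SUBDOMAINS = {
    'beijing': 'bj',
    'shanghai': 'sh',
    'guangzhou': 'gz',
    'shenzhen': 'sz',
    'xian': 'xa',
}


def build_start_urls(city, pages=100):
    """Return the rental listing URLs for pages 1..pages of the given city."""
    subdomain = CITY_SUBDOMAINS[city]
    return [
        'https://{}.lianjia.com/zufang/pg{}/'.format(subdomain, page)
        for page in range(1, pages + 1)
    ]


# Usage: the same 100 URLs that the class-level loop in the spider builds
xian_urls = build_start_urls('xian')
```

A spider could assign the result to `start_urls` and build `allowed_domains` from the same subdomain, so switching cities would only mean changing `name`.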
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.docx
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/实验报告/租房数据分析实验报告-2021211338-郭柏彤.pdf
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/五个城市租房总价和单位面积价格分析.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/五个城市租房总价和单位面积价格分析.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和GDP的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和GDP的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和人均月薪的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/单位面积价格和人均月薪的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和居室的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和居室的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和板块的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和板块的关系.png
--------------------------------------------------------------------------------
/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和面向的关系.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/实验源代码,爬取到的数据和生成的图表-2021211338-郭柏彤/生成的图表/均价和面向的关系.png
--------------------------------------------------------------------------------
/租房数据分析实验报告-2021211338-郭柏彤.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/租房数据分析实验报告-2021211338-郭柏彤.docx
--------------------------------------------------------------------------------
/租房数据分析实验报告-2021211338-郭柏彤.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/租房数据分析实验报告-2021211338-郭柏彤.pdf
--------------------------------------------------------------------------------
/题目要求.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Revenger666/python-house-scrapy/bbff08f70092d39ef0d19d3f0d9723aa4c8fd1b8/题目要求.pdf
--------------------------------------------------------------------------------