├── ChilisauceComment
    ├── README.md
    ├── 李子柒辣椒酱评论.xlsx
    └── 评价分析代码.ipynb
├── Comments
    ├── README.md
    ├── 刷单鉴定评价数据.xlsx
    ├── 生姜防脱洗发水.xlsx
    └── 评价刷单鉴定两板斧代码.ipynb
├── DoubanMovies
    ├── README.md
    ├── 最终电影排名结果.xlsx
    ├── 清洗分析详细步骤.ipynb
    ├── 电影基本信息大全.xlsx
    ├── 电影详细信息.xlsx
    └── 豆瓣电影爬取.ipynb
├── Hair
    ├── README.md
    ├── 防脱洗发水评价.xlsx
    └── 防脱洗发水评价爬取+分析.ipynb
├── Python+excel
    ├── Python批量处理Excel表格.ipynb
    ├── README.md
    └── 源数据128张表格
    │   ├── 专项户外运动装备&冰爪.xlsx
    │   ├── 专项户外运动装备&呼吸管-呼吸器.xlsx
    │   ├── 专项户外运动装备&安全带.xlsx
    │   ├── 专项户外运动装备&救生衣.xlsx
    │   ├── 专项户外运动装备&气瓶.xlsx
    │   ├── 专项户外运动装备&滑雪头盔.xlsx
    │   ├── 专项户外运动装备&滑雪护具.xlsx
    │   ├── 专项户外运动装备&滑雪板.xlsx
    │   ├── 专项户外运动装备&滑雪眼镜.xlsx
    │   ├── 专项户外运动装备&潜水箱包.xlsx
    │   ├── 专项户外运动装备&潜水袜.xlsx
    │   ├── 专项户外运动装备&皮划艇充气艇.xlsx
    │   ├── 专项户外运动装备&绳索.xlsx
    │   ├── 专项户外运动装备&脚蹼.xlsx
    │   ├── 专项户外运动装备&面镜.xlsx
    │   ├── 垂钓装备&其他垂钓用品.xlsx
    │   ├── 垂钓装备&垂钓小配件.xlsx
    │   ├── 垂钓装备&垂钓装备.xlsx
    │   ├── 垂钓装备&太空豆.xlsx
    │   ├── 垂钓装备&打水桶.xlsx
    │   ├── 垂钓装备&抄网.xlsx
    │   ├── 垂钓装备&抄网头.xlsx
    │   ├── 垂钓装备&抄网杆.xlsx
    │   ├── 垂钓装备&探鱼器.xlsx
    │   ├── 垂钓装备&支架.xlsx
    │   ├── 垂钓装备&止血钳.xlsx
    │   ├── 垂钓装备&浮漂.xlsx
    │   ├── 垂钓装备&渔具包.xlsx
    │   ├── 垂钓装备&绑钩器.xlsx
    │   ├── 垂钓装备&装鱼桶.xlsx
    │   ├── 垂钓装备&钓台.xlsx
    │   ├── 垂钓装备&钓竿.xlsx
    │   ├── 垂钓装备&钓箱.xlsx
    │   ├── 垂钓装备&钓鱼伞.xlsx
    │   ├── 垂钓装备&钓鱼帽.xlsx
    │   ├── 垂钓装备&钓鱼手套.xlsx
    │   ├── 垂钓装备&钓鱼椅、凳.xlsx
    │   ├── 垂钓装备&钓鱼鞋.xlsx
    │   ├── 垂钓装备&铅坠.xlsx
    │   ├── 垂钓装备&铅皮.xlsx
    │   ├── 垂钓装备&饵料盒.xlsx
    │   ├── 垂钓装备&鱼护.xlsx
    │   ├── 垂钓装备&鱼线.xlsx
    │   ├── 垂钓装备&鱼线轮.xlsx
    │   ├── 垂钓装备&鱼网-虾笼-其它渔具.xlsx
    │   ├── 垂钓装备&鱼钩.xlsx
    │   ├── 垂钓装备&鱼饵.xlsx
    │   ├── 户外休闲家具&充气床.xlsx
    │   ├── 户外休闲家具&吊床.xlsx
    │   ├── 户外休闲家具&户外休闲家具.xlsx
    │   ├── 户外休闲家具&户外床-折叠床.xlsx
    │   ├── 户外休闲家具&户外桌子.xlsx
    │   ├── 户外休闲家具&户外桌椅套装.xlsx
    │   ├── 户外休闲家具&户外椅子凳子.xlsx
    │   ├── 户外休闲家具&野餐垫.xlsx
    │   ├── 户外服装&一次性内裤.xlsx
    │   ├── 户外服装&其他户外服装.xlsx
    │   ├── 户外服装&内衣裤套装.xlsx
    │   ├── 户外服装&冲锋衣.xlsx
    │   ├── 户外服装&冲锋衣裤套装.xlsx
    │   ├── 户外服装&冲锋裤.xlsx
    │   ├── 户外服装&功能内衣上装.xlsx
    │   ├── 户外服装&功能内衣下装.xlsx
    │   ├── 户外服装&功能内裤.xlsx
    │   ├── 户外服装&户外休闲衣.xlsx
    │   ├── 户外服装&户外休闲衣裤套装.xlsx
    │   ├── 户外服装&户外休闲裤.xlsx
    │   ├── 户外服装&户外服装.xlsx
    │   ├── 户外服装&抓绒衣.xlsx
    │   ├── 户外服装&抓绒裤.xlsx
    │   ├── 户外服装&滑雪衣.xlsx
    │   ├── 户外服装&滑雪衣裤套装.xlsx
    │   ├── 户外服装&滑雪裤.xlsx
    │   ├── 户外服装&潜水服.xlsx
    │   ├── 户外服装&羽绒衣.xlsx
    │   ├── 户外服装&软壳衣.xlsx
    │   ├── 户外服装&软壳裤.xlsx
    │   ├── 户外服装&运动户外风衣.xlsx
    │   ├── 户外服装&速干T恤.xlsx
    │   ├── 户外服装&速干背心.xlsx
    │   ├── 户外服装&速干衣裤套装.xlsx
    │   ├── 户外服装&速干衬衣.xlsx
    │   ├── 户外服装&速干裤.xlsx
    │   ├── 户外服装&钓鱼服.xlsx
    │   ├── 户外照明&信号灯-发光棒-救生灯.xlsx
    │   ├── 户外照明&充电器.xlsx
    │   ├── 户外照明&其他.xlsx
    │   ├── 户外照明&头灯.xlsx
    │   ├── 户外照明&户外照明.xlsx
    │   ├── 户外照明&手电筒.xlsx
    │   ├── 户外照明&电池-燃料.xlsx
    │   ├── 户外照明&营地灯-帐篷灯.xlsx
    │   ├── 户外照明&钓鱼灯.xlsx
    │   ├── 户外鞋靴&其他户外鞋.xlsx
    │   ├── 户外鞋靴&户外休闲鞋.xlsx
    │   ├── 户外鞋靴&户外鞋靴.xlsx
    │   ├── 户外鞋靴&攀岩鞋.xlsx
    │   ├── 户外鞋靴&沙滩鞋-凉鞋-拖鞋.xlsx
    │   ├── 户外鞋靴&溯溪鞋.xlsx
    │   ├── 户外鞋靴&滑雪鞋-雪地靴.xlsx
    │   ├── 户外鞋靴&登山鞋-徒步鞋.xlsx
    │   ├── 户外鞋靴&越野跑鞋.xlsx
    │   ├── 旅行便携装备&其他.xlsx
    │   ├── 旅行便携装备&其他安全防盗产品.xlsx
    │   ├── 旅行便携装备&旅行便携装备.xlsx
    │   ├── 旅行便携装备&普通密码锁.xlsx
    │   ├── 旅行便携装备&晾衣绳.xlsx
    │   ├── 旅行便携装备&转换插头.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&垂钓望远镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&户外眼镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&普通望远镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&望远镜-夜视仪-户外眼镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&望远镜配件.xlsx
    │   ├── 洗漱清洁-护理用品&防虫-防蚊用品.xlsx
    │   ├── 登山杖-手杖&登山杖-手杖.xlsx
    │   ├── 睡袋&睡袋.xlsx
    │   ├── 防护-救生装备&其他防护救生装备.xlsx
    │   ├── 防护-救生装备&急救包-急救箱.xlsx
    │   ├── 防护-救生装备&急救护理用品.xlsx
    │   ├── 防护-救生装备&求生哨.xlsx
    │   ├── 防护-救生装备&求生绳-逃生绳.xlsx
    │   ├── 防护-救生装备&求生锯-绳锯-线锯.xlsx
    │   ├── 防护-救生装备&防护-救生装备.xlsx
    │   ├── 防护-救生装备&防护面罩.xlsx
    │   ├── 防潮垫-地席-枕头&地布-地席.xlsx
    │   ├── 防潮垫-地席-枕头&枕头.xlsx
    │   ├── 防潮垫-地席-枕头&防潮垫-地席-枕头.xlsx
    │   └── 防潮垫-地席-枕头&防潮垫.xlsx
├── README.md
├── RFM
    ├── PYTHON-RFM实战数据.xlsx
    ├── README.md
    └── RFM模型实战案例代码.ipynb
├── TGI
    ├── README.md
    ├── TGI分析代码.ipynb
    └── TGI指数案例数据.xlsx
├── Weather+Email
    ├── README.md
    └── 天气爬虫+邮件发送.ipynb
└── Zhihu
    ├── README.md
    ├── 数据清洗.ipynb
    ├── 知乎爬取代码.ipynb
    ├── 第二个问题源数据.xlsx
    └── 过年工作问题.xlsx
/ChilisauceComment/README.md:
--------------------------------------------------------------------------------
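1 | # Taking Data Analysis a Step Further: A Hands-On Project #
2 |
3 | ----------
4 |
5 | This project starts from a concrete scenario and uses a hands-on case study to explore how to take an analysis a step further.
6 |
7 | Main files:
8 |
9 | - Chili sauce review data (`李子柒辣椒酱评论.xlsx`)
10 | - Complete review-analysis code (`评价分析代码.ipynb`)
11 |
12 | Follow the WeChat official account: 数据不吹牛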
13 |
14 |
--------------------------------------------------------------------------------
/ChilisauceComment/李子柒辣椒酱评论.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/ChilisauceComment/李子柒辣椒酱评论.xlsx
--------------------------------------------------------------------------------
/Comments/README.md:
--------------------------------------------------------------------------------
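1 | # How to Spot Fake Orders Gracefully #
2 |
3 | ----------
4 |
5 | This project uses two simple, blunt methods to detect fake orders (fake reviews).
6 |
7 | Main files:
8 |
9 | - Review data for fake-order detection (`刷单鉴定评价数据.xlsx`)
10 | - Ginger anti-hair-loss shampoo review data (`生姜防脱洗发水.xlsx`)
11 | - Fake-order detection code (`评价刷单鉴定两板斧代码.ipynb`)
12 |
13 | Follow the WeChat official account: 数据不吹牛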
14 |
15 |
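
The two checks this project relies on, a review-to-sales ratio and the share of duplicated review texts, can be sketched in plain Python. This is a minimal illustration rather than the notebook's pandas code: the function names are hypothetical, and the threshold of 40 and the minimum review length of 15 characters are simply the values used in `评价刷单鉴定两板斧代码.ipynb`.

```python
from collections import Counter


def review_to_sales_ratio(total_reviews, total_sales):
    """Reviews per 100 units sold (总评价数 / 总销量 * 100)."""
    return total_reviews / total_sales * 100


def is_suspicious(total_reviews, total_sales, threshold=40):
    """Flag a product whose ratio exceeds the threshold (40 in the notebook)."""
    return review_to_sales_ratio(total_reviews, total_sales) > threshold


def duplicate_share(reviews, min_length=15):
    """Percentage of reviews longer than min_length whose exact text repeats."""
    long_reviews = [text for text in reviews if len(text) > min_length]
    if not long_reviews:
        return 0.0
    counts = Counter(long_reviews)
    # Count every occurrence of a text that appears more than once.
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(long_reviews) * 100
```

For the first product shown in the notebook (12,269 reviews on 22,153 sales) the ratio is about 55.4, so `is_suspicious(12269, 22153)` returns `True`.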
--------------------------------------------------------------------------------
/Comments/刷单鉴定评价数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Comments/刷单鉴定评价数据.xlsx
--------------------------------------------------------------------------------
/Comments/生姜防脱洗发水.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Comments/生姜防脱洗发水.xlsx
--------------------------------------------------------------------------------
/Comments/评价刷单鉴定两板斧代码.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### WeChat official account: 数据不吹牛. More case studies and fun analyses are waiting for you."
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 33,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import pandas as pd\n",
17 | "import os\n",
18 | "import matplotlib.pyplot as plt\n",
19 | "\n",
20 | "%matplotlib inline"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "### Load the data"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 16,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "os.chdir('C:\\\\Users\\\\Administrator\\\\Desktop\\\\JC数据集')"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 17,
42 | "metadata": {},
43 | "outputs": [
44 | {
45 | "data": {
117 | "text/plain": [
118 | " 产品ID 价格 总销量 总评价数 规格类型\n",
119 | "0 59497802 189.0 22153 12269 套装\n",
120 | "1 55594403 95.0 227064 53842 NaN\n",
121 | "2 56419172 79.0 733418 130106 正常规格\n",
122 | "3 58567235 89.0 480040 103975 常规单品\n",
123 | "4 53625235 59.0 253606 49611 常规单品"
124 | ]
125 | },
126 | "execution_count": 17,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "df = pd.read_excel('刷单鉴定评价数据.xlsx')\n",
133 | "df.head()"
134 | ]
135 | },
136 | {
137 | "cell_type": "markdown",
138 | "metadata": {},
139 | "source": [
140 | "### Compute the review-to-sales ratio (评销比)"
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": 18,
146 | "metadata": {},
147 | "outputs": [
148 | {
149 | "data": {
227 | "text/plain": [
228 | " 产品ID 价格 总销量 总评价数 规格类型 评销比\n",
229 | "0 59497802 189.0 22153 12269 套装 55.383018\n",
230 | "1 55594403 95.0 227064 53842 NaN 23.712257\n",
231 | "2 56419172 79.0 733418 130106 正常规格 17.739679\n",
232 | "3 58567235 89.0 480040 103975 常规单品 21.659653\n",
233 | "4 53625235 59.0 253606 49611 常规单品 19.562234"
234 | ]
235 | },
236 | "execution_count": 18,
237 | "metadata": {},
238 | "output_type": "execute_result"
239 | }
240 | ],
241 | "source": [
242 | "df['评销比'] = df['总评价数'] / df['总销量'] * 100\n",
243 | "df.head()"
244 | ]
245 | },
246 | {
247 | "cell_type": "markdown",
248 | "metadata": {},
249 | "source": [
250 | "### Inspect the distribution of the review-to-sales ratio"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": 19,
256 | "metadata": {},
257 | "outputs": [
258 | {
259 | "data": {
260 | "text/plain": [
261 | ""
262 | ]
263 | },
264 | "execution_count": 19,
265 | "metadata": {},
266 | "output_type": "execute_result"
267 | },
268 | {
269 | "data": {
270 | "image/png": "[base64 PNG data omitted: histogram of 评销比]",
271 | "text/plain": [
272 | ""
273 | ]
274 | },
275 | "metadata": {
276 | "needs_background": "light"
277 | },
278 | "output_type": "display_data"
279 | }
280 | ],
281 | "source": [
282 | "import seaborn as sns\n",
283 | "import matplotlib.pyplot as plt\n",
284 | "\n",
285 | "fig,ax = plt.subplots(1,1,figsize = (10,5))\n",
286 | "sns.distplot(df['评销比'],color = 'red',kde = False)\n",
287 | "\n",
288 | "plt.yticks(fontsize=11)\n",
289 | "plt.xticks(fontsize=11)\n",
290 | "\n",
291 | "ax.set_xlabel('评销比', fontsize=14)"
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "metadata": {},
297 | "source": [
298 | "### Flag products suspected of fake orders"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": 20,
304 | "metadata": {},
305 | "outputs": [
306 | {
307 | "data": {
308 | "text/plain": [
309 | "False 166\n",
310 | "True 22\n",
311 | "Name: 是否有刷单嫌疑, dtype: int64"
312 | ]
313 | },
314 | "execution_count": 20,
315 | "metadata": {},
316 | "output_type": "execute_result"
317 | }
318 | ],
319 | "source": [
320 | "df['是否有刷单嫌疑'] = df['评销比'] > 40\n",
321 | "df['是否有刷单嫌疑'].value_counts()"
322 | ]
323 | },
324 | {
325 | "cell_type": "markdown",
326 | "metadata": {},
327 | "source": [
328 | "### Load the review data"
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": 28,
334 | "metadata": {},
335 | "outputs": [
336 | {
337 | "data": {
403 | "text/plain": [
404 | " 买家 初评内容 评价日期 追评\n",
405 | "0 摈**唉 昨天晚上用了一次, 姜味很浓,用过一段时间再看看效果吧,好用会再回购的! 2019-11-29 -\n",
406 | "1 t**4 最近脱发特别严重,鬓角的头发最是损失惨重,抱着试试看的态度来的,目前我用了1个疗程感觉恢复得... 2019-11-29 -\n",
407 | "2 露**发 最近头发大把大把的脱,特别是洗头的时候!刚开始是抱着试试的心态,每次都会隔断时间拍照自己对比... 2019-11-29 -\n",
408 | "3 t**6 质量很好,效果不错 2019-11-29 -\n",
409 | "4 去**5 这次放假回家看到老爸的大脑门,莫名的揪心,老爸为家庭操心了太多,头发一直在掉,这次买了这款防... 2019-11-29 -"
410 | ]
411 | },
412 | "execution_count": 28,
413 | "metadata": {},
414 | "output_type": "execute_result"
415 | }
416 | ],
417 | "source": [
418 | "comments = pd.read_excel('生姜防脱洗发水.xlsx')\n",
419 | "comments.head()"
420 | ]
421 | },
422 | {
423 | "cell_type": "markdown",
424 | "metadata": {},
425 | "source": [
426 | "### Filter by review length"
427 | ]
428 | },
429 | {
430 | "cell_type": "code",
431 | "execution_count": 29,
432 | "metadata": {},
433 | "outputs": [
434 | {
435 | "name": "stdout",
436 | "output_type": "stream",
437 | "text": [
438 | "(1200, 5)\n"
439 | ]
440 | },
441 | {
442 | "data": {
514 | "text/plain": [
515 | " 买家 初评内容 评价日期 追评 评价长度\n",
516 | "0 摈**唉 昨天晚上用了一次, 姜味很浓,用过一段时间再看看效果吧,好用会再回购的! 2019-11-29 - 36\n",
517 | "1 t**4 最近脱发特别严重,鬓角的头发最是损失惨重,抱着试试看的态度来的,目前我用了1个疗程感觉恢复得... 2019-11-29 - 80\n",
518 | "2 露**发 最近头发大把大把的脱,特别是洗头的时候!刚开始是抱着试试的心态,每次都会隔断时间拍照自己对比... 2019-11-29 - 85\n",
519 | "4 去**5 这次放假回家看到老爸的大脑门,莫名的揪心,老爸为家庭操心了太多,头发一直在掉,这次买了这款防... 2019-11-29 - 76\n",
520 | "5 德**艺 以前就用过这款生姜洗发水防脱发效果真的很好,这次这个疗程是买来巩固的用过之后脱发已经很少了,... 2019-11-29 - 60"
521 | ]
522 | },
523 | "execution_count": 29,
524 | "metadata": {},
525 | "output_type": "execute_result"
526 | }
527 | ],
528 | "source": [
529 | "comments['评价长度'] = comments['初评内容'].apply(len)\n",
530 | "comments = comments.loc[comments['评价长度'] > 15,:]\n",
531 | "print(comments.shape)\n",
532 | "comments.head()"
533 | ]
534 | },
535 | {
536 | "cell_type": "markdown",
537 | "metadata": {},
538 | "source": [
539 | "### Sort by content to find suspicious reviews"
540 | ]
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": 31,
545 | "metadata": {},
546 | "outputs": [
547 | {
548 | "data": {
628 | "text/plain": [
629 | " 买家 初评内容 评价日期 \\\n",
630 | "1307 你**个 感觉越洗头发掉得越多,每次洗必须要用洗发水两次以上,还要搓按5分钟,这样洗下去头发本来就少,... 2019-07-11 \n",
631 | "1147 y**8 使用了第二次才来评价的,我头发很长(齐膝)掉得特别厉害。之前使用防脱洗发水用完之后呢换成了潘... 2019-09-02 \n",
632 | "629 0**b 1客服小海马说寄来的品牌是柏诗春天,我下单购买的海洋诗韵,俩不同品牌都是一个厂家生产的,让我... 2019-10-22 \n",
633 | "151 t**1 自从高考那个紧张的阶段后,我的头发就很会掉,每天都房间里,床铺上地上都可以看到我的掉发,每次... 2019-11-21 \n",
634 | "587 女**8 自从高考那个紧张的阶段后,我的头发就很会掉,每天都房间里,床铺上地上都可以看到我的掉发,每次... 2019-10-24 \n",
635 | "674 e**1 自从高考那个紧张的阶段后,我的头发就很会掉,每天都房间里,床铺上地上都可以看到我的掉发,每次... 2019-10-16 \n",
636 | "\n",
637 | " 追评 评价长度 \n",
638 | "1307 我是短发,洗一次掉这么多,以前洗只掉几根,洗了之后头痒的要死,当初客服说用了不适应可以退,现... 348 \n",
639 | "1147 长头发的妹子可以试试这款洗发水哦!我现在掉发已经开始在变少了,开心 290 \n",
640 | "629 - 290 \n",
641 | "151 - 177 \n",
642 | "587 - 177 \n",
643 | "674 - 177 "
644 | ]
645 | },
646 | "execution_count": 31,
647 | "metadata": {},
648 | "output_type": "execute_result"
649 | }
650 | ],
651 | "source": [
652 | "comments = comments.sort_values(['评价长度','初评内容'],ascending = False)\n",
653 | "comments.head(6)"
654 | ]
655 | },
656 | {
657 | "cell_type": "markdown",
658 | "metadata": {},
659 | "source": [
660 | "### Count duplicate reviews"
661 | ]
662 | },
663 | {
664 | "cell_type": "code",
665 | "execution_count": 27,
666 | "metadata": {},
667 | "outputs": [
668 | {
669 | "name": "stdout",
670 | "output_type": "stream",
671 | "text": [
672 | "总评价数: 1200\n",
673 | "重复的评价数占比:31.5%\n"
674 | ]
675 | }
676 | ],
677 | "source": [
678 | "# Group by review text and count how many times each review appears\n",
679 | "filt = comments.groupby('初评内容')['买家'].count().reset_index()\n",
680 | "filt.columns = ['初评内容','重复次数']\n",
681 | "\n",
682 | "# Sum the occurrences of reviews whose text appears more than once\n",
683 | "reap = filt.loc[filt['重复次数'] > 1,'重复次数'].sum()\n",
684 | "\n",
685 | "print('总评价数:',len(comments))\n",
686 | "print('重复的评价数占比:{}%'.format(reap / len(comments) * 100))"
687 | ]
688 | },
689 | {
690 | "cell_type": "code",
691 | "execution_count": null,
692 | "metadata": {},
693 | "outputs": [],
694 | "source": []
695 | }
696 | ],
697 | "metadata": {
698 | "kernelspec": {
699 | "display_name": "Python 3",
700 | "language": "python",
701 | "name": "python3"
702 | },
703 | "language_info": {
704 | "codemirror_mode": {
705 | "name": "ipython",
706 | "version": 3
707 | },
708 | "file_extension": ".py",
709 | "mimetype": "text/x-python",
710 | "name": "python",
711 | "nbconvert_exporter": "python",
712 | "pygments_lexer": "ipython3",
713 | "version": "3.5.3"
714 | }
715 | },
716 | "nbformat": 4,
717 | "nbformat_minor": 2
718 | }
719 |
--------------------------------------------------------------------------------
/DoubanMovies/README.md:
--------------------------------------------------------------------------------
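1 | # Crawling Douban Movies and Building a By-Decade Ranking #
2 |
3 | ----------
4 |
5 | Using Douban Movies as the example, this project walks through the full crawl, clean, and analyze process, laying out the chain of analysis logic in as much detail as possible.
6 |
7 | Main files:
8 |
9 | - Crawler code for 9,000 Douban movies
10 | - Step-by-step code for cleaning and analyzing the crawled data
11 | - Two raw crawled datasets and the final movie ranking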
12 |
13 |
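
The crawler pages through Douban's search endpoint by changing only the `start` query parameter in steps of 20. A minimal sketch of that URL construction follows. The base URL is copied from `豆瓣电影爬取.ipynb`; the helper names `page_urls` and `pages_needed` are hypothetical, and whether the endpoint still responds in this form is not guaranteed.

```python
# Base URL taken from the crawler notebook; only `start` varies between pages.
BASE_URL = (
    'https://movie.douban.com/j/new_search_subjects'
    '?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start={}'
)
PAGE_SIZE = 20  # the notebook advances `start` by 20 per page


def page_urls(num_pages, page_size=PAGE_SIZE):
    """Return one URL per page, with start = 0, 20, 40, ..."""
    return [BASE_URL.format(start)
            for start in range(0, page_size * num_pages, page_size)]


def pages_needed(total_items, page_size=PAGE_SIZE):
    """Number of pages required to cover total_items (ceiling division)."""
    return -(-total_items // page_size)
```

Assuming 20 movies per page, covering roughly 9,000 movies takes `pages_needed(9000)` pages, which is 450.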
--------------------------------------------------------------------------------
/DoubanMovies/最终电影排名结果.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/DoubanMovies/最终电影排名结果.xlsx
--------------------------------------------------------------------------------
/DoubanMovies/电影基本信息大全.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/DoubanMovies/电影基本信息大全.xlsx
--------------------------------------------------------------------------------
/DoubanMovies/电影详细信息.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/DoubanMovies/电影详细信息.xlsx
--------------------------------------------------------------------------------
/DoubanMovies/豆瓣电影爬取.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 3,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "import os\n",
12 | "import requests\n",
13 | "import pandas as pd\n",
14 | "import numpy as np\n",
15 | "import json\n",
16 | "import time\n",
17 | "import random\n",
18 | "from lxml import etree"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "metadata": {},
24 | "source": [
25 | "### Movie information is loaded dynamically; everything comes from the same base URL, and the only parameter that changes is start"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 5,
31 | "metadata": {
32 | "collapsed": true
33 | },
34 | "outputs": [],
35 | "source": [
36 | "url1 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=0'\n",
37 | "url2 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=20'\n",
38 | "url3 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=40'\n",
39 | "url4 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=60'"
40 | ]
41 | },
42 | {
43 | "cell_type": "markdown",
44 | "metadata": {},
45 | "source": [
46 | "### Build the URLs to crawl"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 9,
52 | "metadata": {
53 | "collapsed": true
54 | },
55 | "outputs": [],
56 | "source": [
57 | "# Build the list of page URLs\n",
58 | "def format_url(num):\n",
59 | "    urls = []\n",
60 | "    base_url = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start={}'\n",
61 | "    for i in range(0,20 * num,20):\n",
62 | "        url = base_url.format(i)\n",
63 | "        urls.append(url)\n",
64 | "    return urls\n",
65 | "\n",
66 | "# Crawl 10 pages here; change the argument as needed\n",
67 | "urls = format_url(10)"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {
73 | "collapsed": true
74 | },
75 | "source": [
76 | "### Spoof the request headers"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 8,
82 | "metadata": {
83 | "collapsed": true
84 | },
85 | "outputs": [],
86 | "source": [
87 | "headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "### Parse a single page"
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "execution_count": 11,
100 | "metadata": {
101 | "collapsed": true
102 | },
103 | "outputs": [],
104 | "source": [
105 | "def parse_base_info(url,headers):\n",
106 | "    html = requests.get(url,headers = headers)\n",
107 | "    bs = json.loads(html.text)\n",
108 | "    df = pd.DataFrame()\n",
109 | "    for i in bs['data']:\n",
110 | "        casts = i['casts'] # lead actors\n",
111 | "        cover = i['cover'] # poster\n",
112 | "        directors = i['directors'] # directors\n",
113 | "        m_id = i['id'] # ID\n",
114 | "        rate = i['rate'] # rating\n",
115 | "        star = i['star'] # star field (stored in the 标记 column)\n",
116 | "        title = i['title'] # title\n",
117 | "        url = i['url'] # page URL\n",
118 | "        cache = pd.DataFrame({'主演':[casts],'海报':[cover],'导演':[directors],\n",
119 | "                              'ID':[m_id],'评分':[rate],'标记':[star],'片名':[title],'网址':[url]})\n",
120 | "        df = pd.concat([df,cache])\n",
121 | "    return df"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {
127 | "collapsed": true
128 | },
129 | "source": [
130 | "### Crawl the movies in a loop"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 12,
136 | "metadata": {
137 | "collapsed": false
138 | },
139 | "outputs": [
140 | {
141 | "name": "stdout",
142 | "output_type": "stream",
143 | "text": [
144 | "I had crawled page of:1\n",
145 | "I had crawled page of:2\n",
146 | "I had crawled page of:3\n",
147 | "I had crawled page of:4\n",
148 | "I had crawled page of:5\n",
149 | "I had crawled page of:6\n",
150 | "I had crawled page of:7\n",
151 | "I had crawled page of:8\n",
152 | "I had crawled page of:9\n",
153 | "I had crawled page of:10\n"
154 | ]
155 | }
156 | ],
157 | "source": [
158 | "result = pd.DataFrame()\n",
159 | "\n",
160 | "count = 1\n",
161 | "for url in urls:\n",
162 | "    df = parse_base_info(url, headers=headers)\n",
163 | "    result = pd.concat([result, df])\n",
164 | "    time.sleep(random.random() + 2)\n",
165 | "    print('I had crawled page of:%d' % count)\n",
166 | "    count += 1"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": 13,
172 | "metadata": {
173 | "collapsed": false
174 | },
175 | "outputs": [
176 | {
177 | "data": {
267 | "text/plain": [
268 | " ID 主演 导演 标记 \\\n",
269 | "0 26752088 [徐峥, 王传君, 周一围, 谭卓, 章宇] [文牧野] 45 \n",
270 | "0 1295644 [让·雷诺, 娜塔莉·波特曼, 加里·奥德曼, 丹尼·爱罗, 彼得·阿佩尔] [吕克·贝松] 45 \n",
271 | "0 1292052 [蒂姆·罗宾斯, 摩根·弗里曼, 鲍勃·冈顿, 威廉姆·赛德勒, 克兰西·布朗] [弗兰克·德拉邦特] 50 \n",
272 | "0 26266893 [屈楚萧, 吴京, 李光洁, 吴孟达, 赵今麦] [郭帆] 40 \n",
273 | "0 1292720 [汤姆·汉克斯, 罗宾·怀特, 加里·西尼斯, 麦凯尔泰·威廉逊, 莎莉·菲尔德] [罗伯特·泽米吉斯] 45 \n",
274 | "\n",
275 | " 海报 片名 \\\n",
276 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 我不是药神 \n",
277 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 这个杀手不太冷 \n",
278 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 肖申克的救赎 \n",
279 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 流浪地球 \n",
280 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 阿甘正传 \n",
281 | "\n",
282 | " 网址 评分 \n",
283 | "0 https://movie.douban.com/subject/26752088/ 9.0 \n",
284 | "0 https://movie.douban.com/subject/1295644/ 9.4 \n",
285 | "0 https://movie.douban.com/subject/1292052/ 9.7 \n",
286 | "0 https://movie.douban.com/subject/26266893/ 7.9 \n",
287 | "0 https://movie.douban.com/subject/1292720/ 9.4 "
288 | ]
289 | },
290 | "execution_count": 13,
291 | "metadata": {},
292 | "output_type": "execute_result"
293 | }
294 | ],
295 | "source": [
296 | "result.head()"
297 | ]
298 | },
299 | {
300 | "cell_type": "markdown",
301 | "metadata": {
302 | "collapsed": false
303 | },
304 | "source": [
305 | "### Parse a single movie page for detailed information"
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": 14,
311 | "metadata": {
312 | "collapsed": false
313 | },
314 | "outputs": [],
315 | "source": [
316 | "def parse_movie_info(url, headers=headers, ip=''):\n",
317 | "    if ip == '':\n",
318 | "        html = requests.get(url, headers=headers)\n",
319 | "    else:\n",
320 | "        html = requests.get(url, headers=headers, proxies=ip)\n",
321 | "    bs = etree.HTML(html.text)\n",
322 | "    # Title\n",
323 | "    title = bs.xpath('//div[@id = \"wrapper\"]/div/h1/span')[0].text\n",
324 | "    # Release year\n",
325 | "    year = bs.xpath('//div[@id = \"wrapper\"]/div/h1/span')[1].text\n",
326 | "    # Genres\n",
327 | "    m_type = []\n",
328 | "    for t in bs.xpath('//span[@property = \"v:genre\"]'):\n",
329 | "        m_type.append(t.text)\n",
330 | "    a = bs.xpath('//div[@id= \"info\"]')[0].xpath('string()')\n",
331 | "    # Runtime: the text between '片长: ' and '分钟\\n'; multi-version runtimes are not split out\n",
332 | "    m_time = a[a.find('片长: ') + 4:a.find('分钟\\n')]\n",
333 | "    # Country/region of production\n",
334 | "    area = a[a.find('制片国家/地区:') + 9:a.find('\\n 语言')]\n",
335 | "    # Number of raters\n",
336 | "    try:\n",
337 | "        people = bs.xpath('//a[@class = \"rating_people\"]/span')[0].text\n",
338 | "        # Rating distribution\n",
339 | "        rating = {}\n",
340 | "        rate_count = bs.xpath('//div[@class = \"ratings-on-weight\"]/div')\n",
341 | "        for rate in rate_count:\n",
342 | "            rating[rate.xpath('span/@title')[0]] = rate.xpath('span[@class = \"rating_per\"]')[0].text\n",
343 | "    except Exception:\n",
344 | "        people = 'None'\n",
345 | "        rating = {}\n",
346 | "    # Synopsis\n",
347 | "    try:\n",
348 | "        brief = bs.xpath('//span[@property = \"v:summary\"]')[0].text.strip('\\n \\u3000\\u3000')\n",
349 | "    except Exception:\n",
350 | "        brief = 'None'\n",
351 | "    try:  # top hot comment\n",
352 | "        hot_comment = bs.xpath('//div[@id = \"hot-comments\"]/div/div/p/span')[0].text\n",
353 | "    except Exception:\n",
354 | "        hot_comment = 'None'\n",
355 | "    cache = pd.DataFrame({'片名':[title],'上映时间':[year],'电影类型':[m_type],'片长':[m_time],\n",
356 | "                          '地区':[area],'评分人数':[people],'评分分布':[rating],'简介':[brief],'热评':[hot_comment],'网址':[url]})\n",
357 | "    return cache"
358 | ]
359 | },
360 | {
361 | "cell_type": "markdown",
362 | "metadata": {
363 | "collapsed": true
364 | },
365 | "source": [
366 | "### Visit each movie page in batch"
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": 15,
372 | "metadata": {
373 | "collapsed": false
374 | },
375 | "outputs": [
376 | {
377 | "name": "stdout",
378 | "output_type": "stream",
379 | "text": [
380 | "我们爬取了第:1部电影-------我不是药神\n",
381 | "我们爬取了第:2部电影-------这个杀手不太冷\n",
382 | "我们爬取了第:3部电影-------肖申克的救赎\n",
383 | "我们爬取了第:4部电影-------流浪地球\n",
384 | "我们爬取了第:5部电影-------阿甘正传\n",
385 | "我们爬取了第:6部电影-------复仇者联盟3:无限战争\n",
386 | "我们爬取了第:7部电影-------盗梦空间\n",
387 | "我们爬取了第:8部电影-------西虹市首富\n",
388 | "我们爬取了第:9部电影-------泰坦尼克号\n",
389 | "我们爬取了第:10部电影-------千与千寻\n",
390 | "我们爬取了第:11部电影-------霸王别姬\n",
391 | "我们爬取了第:12部电影-------三傻大闹宝莱坞\n",
392 | "我们爬取了第:13部电影-------让子弹飞\n",
393 | "我们爬取了第:14部电影-------怦然心动\n",
394 | "我们爬取了第:15部电影-------摔跤吧!爸爸\n",
395 | "我们爬取了第:16部电影-------毒液:致命守护者\n",
396 | "我们爬取了第:17部电影-------疯狂动物城\n",
397 | "我们爬取了第:18部电影-------忠犬八公的故事\n",
398 | "我们爬取了第:19部电影-------一出好戏\n",
399 | "我们爬取了第:20部电影-------当幸福来敲门\n",
400 | "我们爬取了第:21部电影-------海上钢琴师\n",
401 | "我们爬取了第:22部电影-------大话西游之大圣娶亲\n",
402 | "我们爬取了第:23部电影-------海王\n",
403 | "我们爬取了第:24部电影-------楚门的世界\n",
404 | "我们爬取了第:25部电影-------你的名字。\n",
405 | "我们爬取了第:26部电影-------阿凡达\n",
406 | "我们爬取了第:27部电影-------少年派的奇幻漂流\n",
407 | "我们爬取了第:28部电影-------星际穿越\n",
408 | "我们爬取了第:29部电影-------头号玩家\n",
409 | "我们爬取了第:30部电影-------无双\n",
410 | "我们爬取了第:31部电影-------放牛班的春天\n",
411 | "我们爬取了第:32部电影-------飞屋环游记\n",
412 | "我们爬取了第:33部电影-------机器人总动员\n",
413 | "我们爬取了第:34部电影-------那些年,我们一起追的女孩\n",
414 | "我们爬取了第:35部电影-------龙猫\n",
415 | "我们爬取了第:36部电影-------寻梦环游记\n",
416 | "我们爬取了第:37部电影-------红海行动\n",
417 | "我们爬取了第:38部电影-------初恋这件小事\n",
418 | "我们爬取了第:39部电影-------大话西游之月光宝盒\n",
419 | "我们爬取了第:40部电影-------无名之辈\n",
420 | "我们爬取了第:41部电影-------无间道\n",
421 | "我们爬取了第:42部电影-------天使爱美丽\n",
422 | "我们爬取了第:43部电影-------碟中谍6:全面瓦解\n",
423 | "我们爬取了第:44部电影-------剪刀手爱德华\n",
424 | "我们爬取了第:45部电影-------复仇者联盟\n",
425 | "我们爬取了第:46部电影-------战狼2\n",
426 | "我们爬取了第:47部电影-------美丽人生\n",
427 | "我们爬取了第:48部电影-------绿皮书\n",
428 | "我们爬取了第:49部电影-------飞驰人生\n",
429 | "我们爬取了第:50部电影-------罗马假日\n",
430 | "我们爬取了第:51部电影-------V字仇杀队\n",
431 | "我们爬取了第:52部电影-------唐伯虎点秋香\n",
432 | "我们爬取了第:53部电影-------夏洛特烦恼\n",
433 | "我们爬取了第:54部电影-------唐人街探案2\n",
434 | "我们爬取了第:55部电影-------动物世界\n",
435 | "我们爬取了第:56部电影-------辛德勒的名单\n",
436 | "我们爬取了第:57部电影-------芳华\n",
437 | "我们爬取了第:58部电影-------人再囧途之泰囧\n",
438 | "我们爬取了第:59部电影-------老炮儿\n",
439 | "我们爬取了第:60部电影-------釜山行\n",
440 | "我们爬取了第:61部电影-------蝴蝶效应\n",
441 | "我们爬取了第:62部电影-------神偷奶爸\n",
442 | "我们爬取了第:63部电影-------七宗罪\n",
443 | "我们爬取了第:64部电影-------邪不压正\n",
444 | "我们爬取了第:65部电影-------疯狂的外星人\n",
445 | "我们爬取了第:66部电影-------哈尔的移动城堡\n",
446 | "我们爬取了第:67部电影-------复仇者联盟4:终局之战\n",
447 | "我们爬取了第:68部电影-------蚁人2:黄蜂女现身\n",
448 | "我们爬取了第:69部电影-------失恋33天\n",
449 | "我们爬取了第:70部电影-------看不见的客人\n",
450 | "我们爬取了第:71部电影-------蝙蝠侠:黑暗骑士\n",
451 | "我们爬取了第:72部电影-------湄公河行动\n",
452 | "我们爬取了第:73部电影-------加勒比海盗\n",
453 | "我们爬取了第:74部电影-------本杰明·巴顿奇事\n",
454 | "我们爬取了第:75部电影-------喜剧之王\n",
455 | "我们爬取了第:76部电影-------西西里的美丽传说\n",
456 | "我们爬取了第:77部电影-------美人鱼\n",
457 | "我们爬取了第:78部电影-------中国合伙人\n",
458 | "我们爬取了第:79部电影-------小偷家族\n",
459 | "我们爬取了第:80部电影-------疯狂原始人\n",
460 | "我们爬取了第:81部电影-------触不可及\n",
461 | "我们爬取了第:82部电影-------钢铁侠\n",
462 | "我们爬取了第:83部电影-------后会无期\n",
463 | "我们爬取了第:84部电影-------超能陆战队\n",
464 | "我们爬取了第:85部电影-------黑天鹅\n",
465 | "我们爬取了第:86部电影-------北京遇上西雅图\n",
466 | "我们爬取了第:87部电影-------情书\n",
467 | "我们爬取了第:88部电影-------奇异博士\n",
468 | "我们爬取了第:89部电影-------教父\n",
469 | "我们爬取了第:90部电影-------血战钢锯岭\n",
470 | "我们爬取了第:91部电影-------天空之城\n",
471 | "我们爬取了第:92部电影-------功夫\n",
472 | "我们爬取了第:93部电影-------超时空同居\n",
473 | "我们爬取了第:94部电影-------禁闭岛\n",
474 | "我们爬取了第:95部电影-------银河护卫队\n",
475 | "我们爬取了第:96部电影-------倩女幽魂\n",
476 | "我们爬取了第:97部电影-------无问西东\n",
477 | "我们爬取了第:98部电影-------唐人街探案\n",
478 | "我们爬取了第:99部电影-------羞羞的铁拳\n",
479 | "我们爬取了第:100部电影-------复仇者联盟2:奥创纪元\n",
480 | "我们爬取了第:101部电影-------贫民窟的百万富翁\n",
481 | "我们爬取了第:102部电影-------搏击俱乐部\n",
482 | "我们爬取了第:103部电影-------源代码\n",
483 | "我们爬取了第:104部电影-------爱乐之城\n",
484 | "我们爬取了第:105部电影-------七月与安生\n",
485 | "我们爬取了第:106部电影-------闻香识女人\n",
486 | "我们爬取了第:107部电影-------狮子王\n",
487 | "我们爬取了第:108部电影-------沉默的羔羊\n",
488 | "我们爬取了第:109部电影-------穿普拉达的女王\n",
489 | "我们爬取了第:110部电影-------驴得水\n",
490 | "我们爬取了第:111部电影-------黑客帝国\n",
491 | "我们爬取了第:112部电影-------疯狂的石头\n",
492 | "我们爬取了第:113部电影-------哈利·波特与魔法石\n",
493 | "我们爬取了第:114部电影-------妖猫传\n",
494 | "我们爬取了第:115部电影-------美国队长3\n",
495 | "我们爬取了第:116部电影-------天才枪手\n",
496 | "我们爬取了第:117部电影-------我的少女时代\n",
497 | "我们爬取了第:118部电影-------敦刻尔克\n",
498 | "我们爬取了第:119部电影-------重庆森林\n",
499 | "我们爬取了第:120部电影-------低俗小说\n",
500 | "我们爬取了第:121部电影-------西游记之大圣归来\n",
501 | "我们爬取了第:122部电影-------人在囧途\n",
502 | "我们爬取了第:123部电影-------消失的爱人\n",
503 | "我们爬取了第:124部电影-------王牌特工:特工学院\n",
504 | "我们爬取了第:125部电影-------国王的演讲\n",
505 | "我们爬取了第:126部电影-------美国队长2\n",
506 | "我们爬取了第:127部电影-------美丽心灵\n",
507 | "我们爬取了第:128部电影-------熔炉\n",
508 | "我们爬取了第:129部电影-------影\n",
509 | "我们爬取了第:130部电影-------钢铁侠3\n",
510 | "我们爬取了第:131部电影-------指环王1:魔戒再现\n",
511 | "我们爬取了第:132部电影-------火星救援\n",
512 | "我们爬取了第:133部电影-------钢铁侠2\n",
513 | "我们爬取了第:134部电影-------蚁人\n",
514 | "我们爬取了第:135部电影-------傲慢与偏见\n",
515 | "我们爬取了第:136部电影-------致命魔术\n",
516 | "我们爬取了第:137部电影-------三块广告牌\n",
517 | "我们爬取了第:138部电影-------布达佩斯大饭店\n",
518 | "我们爬取了第:139部电影-------东邪西毒\n",
519 | "我们爬取了第:140部电影-------断背山\n",
520 | "我们爬取了第:141部电影-------银河护卫队2\n",
521 | "我们爬取了第:142部电影-------寻龙诀\n",
522 | "我们爬取了第:143部电影-------西游降魔篇\n",
523 | "我们爬取了第:144部电影-------秒速5厘米\n",
524 | "我们爬取了第:145部电影-------指环王3:王者无敌\n",
525 | "我们爬取了第:146部电影-------活着\n",
526 | "我们爬取了第:147部电影-------2012\n",
527 | "我们爬取了第:148部电影-------恐怖游轮\n",
528 | "我们爬取了第:149部电影-------蜘蛛侠:英雄归来\n",
529 | "我们爬取了第:150部电影-------告白\n",
530 | "我们爬取了第:151部电影-------功夫熊猫\n",
531 | "我们爬取了第:152部电影-------被嫌弃的松子的一生\n",
532 | "我们爬取了第:153部电影-------驯龙高手\n",
533 | "我们爬取了第:154部电影-------神奇动物:格林德沃之罪\n",
534 | "我们爬取了第:155部电影-------蝙蝠侠:黑暗骑士崛起\n",
535 | "我们爬取了第:156部电影-------冰雪奇缘\n",
536 | "我们爬取了第:157部电影-------哈利·波特与死亡圣器(下)\n",
537 | "我们爬取了第:158部电影-------冰川时代\n",
538 | "我们爬取了第:159部电影-------志明与春娇\n",
539 | "我们爬取了第:160部电影-------致命ID\n",
540 | "我们爬取了第:161部电影-------乘风破浪\n",
541 | "我们爬取了第:162部电影-------金陵十三钗\n",
542 | "我们爬取了第:163部电影-------美国队长\n",
543 | "我们爬取了第:164部电影-------黑豹\n",
544 | "我们爬取了第:165部电影-------哪吒之魔童降世\n",
545 | "我们爬取了第:166部电影-------雷神3:诸神黄昏\n",
546 | "我们爬取了第:167部电影-------指环王2:双塔奇兵\n",
547 | "我们爬取了第:168部电影-------勇敢的心\n",
548 | "我们爬取了第:169部电影-------天堂电影院\n",
549 | "我们爬取了第:170部电影-------惊天魔盗团\n",
550 | "我们爬取了第:171部电影-------致我们终将逝去的青春\n",
551 | "我们爬取了第:172部电影-------真爱至上\n",
552 | "我们爬取了第:173部电影-------射雕英雄传之东成西就\n",
553 | "我们爬取了第:174部电影-------大黄蜂\n",
554 | "我们爬取了第:175部电影-------怪兽电力公司\n",
555 | "我们爬取了第:176部电影-------捉妖记\n",
556 | "我们爬取了第:177部电影-------了不起的盖茨比\n",
557 | "我们爬取了第:178部电影-------速度与激情7\n",
558 | "我们爬取了第:179部电影-------死亡诗社\n",
559 | "我们爬取了第:180部电影-------阳光姐妹淘\n",
560 | "我们爬取了第:181部电影-------乱世佳人\n",
561 | "我们爬取了第:182部电影-------入殓师\n",
562 | "我们爬取了第:183部电影-------岁月神偷\n",
563 | "我们爬取了第:184部电影-------心灵捕手\n",
564 | "我们爬取了第:185部电影-------色,戒\n",
565 | "我们爬取了第:186部电影-------猫鼠游戏\n",
566 | "我们爬取了第:187部电影-------超体\n",
567 | "我们爬取了第:188部电影-------阳光灿烂的日子\n",
568 | "我们爬取了第:189部电影-------烈日灼心\n",
569 | "我们爬取了第:190部电影-------拯救大兵瑞恩\n",
570 | "我们爬取了第:191部电影-------蜘蛛侠:平行宇宙\n",
571 | "我们爬取了第:192部电影-------神奇动物在哪里\n",
572 | "我们爬取了第:193部电影-------X战警:逆转未来\n",
573 | "我们爬取了第:194部电影-------雷神\n",
574 | "我们爬取了第:195部电影-------调音师\n",
575 | "我们爬取了第:196部电影-------恋恋笔记本\n",
576 | "我们爬取了第:197部电影-------记忆碎片\n",
577 | "我们爬取了第:198部电影-------暮光之城\n",
578 | "我们爬取了第:199部电影-------雷神2:黑暗世界\n",
579 | "我们爬取了第:200部电影-------两小无猜\n"
580 | ]
581 | }
582 | ],
583 | "source": [
584 | "movie_result = pd.DataFrame()\n",
585 | "ip = ''  # build your own proxy IP pool here\n",
586 | "count2 = 1\n",
587 | "cw = 1\n",
588 | "\n",
589 | "for url, name in zip(result['网址'].values, result['片名'].values):\n",
590 | "#for name,url in wrongs.items():\n",
591 | "    try:\n",
592 | "        cache = parse_movie_info(url, headers=headers, ip=ip)\n",
593 | "        movie_result = pd.concat([movie_result, cache])\n",
594 | "        #time.sleep(random.random())\n",
595 | "        print('我们爬取了第:%d部电影-------%s' % (count2,name))\n",
596 | "        count2 += 1\n",
597 | "    except Exception:\n",
598 | "        print('Beep beep beep, error number {}'.format(cw))\n",
599 | "        print('ip is:{}'.format(ip))\n",
600 | "        cw += 1\n",
601 | "        time.sleep(2)\n",
602 | "        continue"
603 | ]
604 | },
605 | {
606 | "cell_type": "code",
607 | "execution_count": 17,
608 | "metadata": {
609 | "collapsed": false
610 | },
611 | "outputs": [
612 | {
613 | "data": {
715 | "text/plain": [
716 | " 上映时间 地区 热评 \\\n",
717 | "0 (2018) 中国大陆 “你敢保证你一辈子不得病?”纯粹、直接、有力!常常感叹:电影只能是电影。但每看到这样的佳作,... \n",
718 | "0 (1994) 法国 他们居然没做爱 \n",
719 | "0 (1994) 美国 关于希望最强有力的注释。 \n",
720 | "0 (2019) 中国大陆 1.终于,轮到我们仰望星空。2.后启示录死亡废墟,赛博朋克地下城,以及烟波浩渺的末日想象,缔... \n",
721 | "0 (1994) 美国 我生命里最温暖的一部电影 \n",
722 | "\n",
723 | " 片名 \\\n",
724 | "0 我不是药神 \n",
725 | "0 这个杀手不太冷 Léon \n",
726 | "0 肖申克的救赎 The Shawshank Redemption \n",
727 | "0 流浪地球 \n",
728 | "0 阿甘正传 Forrest Gump \n",
729 | "\n",
730 | " 片长 电影类型 \\\n",
731 | "0 117 [剧情, 喜剧] \n",
732 | "0 110分钟(剧场版) / 133分钟(国际版)\\n 又名: 杀手莱昂 / 终极... [剧情, 动作, 犯罪] \n",
733 | "0 142 [剧情, 犯罪] \n",
734 | "0 125 [科幻, 灾难] \n",
735 | "0 142 [剧情, 爱情] \n",
736 | "\n",
737 | " 简介 \\\n",
738 | "0 普通中年男子程勇(徐峥 饰)经营着一家保健品店,失意又失婚。不速之客吕受益(王传君 饰)的到... \n",
739 | "0 里昂(让·雷诺饰)是名孤独的职业杀手,受人雇佣。一天,邻居家小姑娘马蒂尔达(纳塔丽·波特曼饰... \n",
740 | "0 20世纪40年代末,小有成就的青年银行家安迪(蒂姆·罗宾斯 Tim Robbins 饰)因涉... \n",
741 | "0 近未来,科学家们发现太阳急速衰老膨胀,短时间内包括地球在内的整个太阳系都将被太阳所吞没。为了... \n",
742 | "0 阿甘(汤姆·汉克斯 饰)于二战结束后不久出生在美国南方阿拉巴马州一个闭塞的小镇,他先天弱智,... \n",
743 | "\n",
744 | " 网址 评分人数 \\\n",
745 | "0 https://movie.douban.com/subject/26752088/ 1174897 \n",
746 | "0 https://movie.douban.com/subject/1295644/ 1380628 \n",
747 | "0 https://movie.douban.com/subject/1292052/ 1525345 \n",
748 | "0 https://movie.douban.com/subject/26266893/ 1264654 \n",
749 | "0 https://movie.douban.com/subject/1292720/ 1192711 \n",
750 | "\n",
751 | " 评分分布 \n",
752 | "0 {'还行': '7.0%', '力荐': '57.4%', '较差': '0.5%', '很... \n",
753 | "0 {'还行': '3.2%', '力荐': '74.2%', '较差': '0.2%', '很... \n",
754 | "0 {'还行': '1.5%', '力荐': '84.6%', '较差': '0.1%', '很... \n",
755 | "0 {'还行': '22.0%', '力荐': '33.1%', '较差': '4.7%', '... \n",
756 | "0 {'还行': '2.9%', '力荐': '76.0%', '较差': '0.2%', '很... "
757 | ]
758 | },
759 | "execution_count": 17,
760 | "metadata": {},
761 | "output_type": "execute_result"
762 | }
763 | ],
764 | "source": [
765 | "movie_result.head()"
766 | ]
767 | },
768 | {
769 | "cell_type": "markdown",
770 | "metadata": {
771 | "collapsed": true
772 | },
773 | "source": [
774 | "### Save to files"
775 | ]
776 | },
777 | {
778 | "cell_type": "code",
779 | "execution_count": 18,
780 | "metadata": {
781 | "collapsed": false
782 | },
783 | "outputs": [],
784 | "source": [
785 | "result.to_excel('电影基本信息大全.xlsx')\n",
786 | "movie_result.to_excel('电影详细信息.xlsx')"
787 | ]
788 | },
942 | {
943 | "cell_type": "code",
944 | "execution_count": null,
945 | "metadata": {
946 | "collapsed": true
947 | },
948 | "outputs": [],
949 | "source": []
950 | }
951 | ],
952 | "metadata": {
953 | "anaconda-cloud": {},
954 | "kernelspec": {
955 | "display_name": "Python [default]",
956 | "language": "python",
957 | "name": "python3"
958 | },
959 | "language_info": {
960 | "codemirror_mode": {
961 | "name": "ipython",
962 | "version": 3
963 | },
964 | "file_extension": ".py",
965 | "mimetype": "text/x-python",
966 | "name": "python",
967 | "nbconvert_exporter": "python",
968 | "pygments_lexer": "ipython3",
969 | "version": "3.5.2"
970 | }
971 | },
972 | "nbformat": 4,
973 | "nbformat_minor": 2
974 | }
975 |
--------------------------------------------------------------------------------
/Hair/README.md:
--------------------------------------------------------------------------------
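1 | # Anti-Hair-Loss Shampoo Analysis #
2 |
3 | ----------
4 |
5 | This is a sad story about hairlines.
6 |
7 | The main files are:
8 |
9 | - Taobao review crawler (selenium)
10 | - Source data of anti-hair-loss shampoo reviews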
11 |
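The crawler notebook pages through reviews 20 at a time and, as written, never requests more than 90 pages. The sketch below isolates that page-count arithmetic. The helper name `pages_to_crawl` and its parameters `per_page` and `page_cap` are hypothetical (the notebook inlines this logic rather than defining a helper), and a non-negative `max_num` is assumed.

```python
def pages_to_crawl(max_num, per_page=20, page_cap=90):
    """Return how many review pages to visit for max_num reviews.

    Hypothetical helper: mirrors int(max_num / 20) followed by the
    cap at 90 pages used in the notebook's crawl loop.
    """
    # Only whole pages are counted, so a partial last page is dropped.
    pages = max_num // per_page
    # Never ask for more pages than the cap allows.
    return min(pages, page_cap)


if __name__ == "__main__":
    for total in (19, 100, 8980):
        print(total, "reviews ->", pages_to_crawl(total), "pages")
```

Keeping the cap as a parameter makes it easy to adjust if the pager limit on the site differs from the value assumed here.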
12 |
13 |
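The notebook scores each first review with SnowNLP (a value between 0 and 1) and then summarizes the scores with `describe()`. As a rough, dependency-free illustration of what that summary reports, the sketch below computes the same kind of statistics with the Python standard library. The function name `summarize_scores` and the sample scores are made up for illustration and are not taken from the data set.

```python
import statistics


def summarize_scores(scores):
    """Summarize sentiment scores in the style of a describe() table.

    Hypothetical helper: expects at least two numeric scores.
    """
    data = sorted(scores)
    # The "inclusive" method interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": data[-1],
    }


if __name__ == "__main__":
    sample = [0.0, 0.25, 0.5, 0.75, 1.0]  # made-up scores
    for key, value in summarize_scores(sample).items():
        print(key, value)
```

A mean close to 0.5 with widely spread quartiles, as in the notebook's own summary, suggests the scores are spread across the whole range rather than clustered at one end.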
--------------------------------------------------------------------------------
/Hair/防脱洗发水评价.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Hair/防脱洗发水评价.xlsx
--------------------------------------------------------------------------------
/Hair/防脱洗发水评价爬取+分析.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### WeChat official account: 数据不吹牛. More cases and fun analyses are waiting for you"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import pandas as pd\n",
17 | "from selenium import webdriver\n",
18 | "import random\n",
19 | "import os\n",
20 | "import time"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "### Crawl one page of reviews (20 per page)"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 216,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "def get_page(driver):\n",
37 | "\n",
38 | "    result = pd.DataFrame()\n",
39 | "    for i in driver.find_elements_by_xpath('//div[@class = \"rate-grid\"]/table/tbody/tr'):\n",
40 | "        try:\n",
41 | "            content = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-content\"]').text\n",
42 | "            # Review date\n",
43 | "            date = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-date\"]').text\n",
44 | "            # Product purchased\n",
45 | "            sku = i.find_element_by_xpath('td[@class = \"col-meta\"]/div[@class = \"rate-sku\"]').text\n",
46 | "\n",
47 | "            # Username\n",
48 | "            username = i.find_element_by_xpath('td[@class = \"col-author\"]/div[@class = \"rate-user-info\"]').text\n",
49 | "            append_time = None\n",
50 | "            append_content = None\n",
51 | "\n",
52 | "        except Exception:\n",
53 | "            content = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-premiere\"]/div[@class = \"tm-rate-content\"]').text\n",
54 | "            # Review date\n",
55 | "            date = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-premiere\"]/div[@class = \"tm-rate-tag\"]/div[@class = \"tm-rate-date\"]').text\n",
56 | "            # Product purchased\n",
57 | "            sku = i.find_element_by_xpath('td[@class = \"col-meta\"]/div[@class = \"rate-sku\"]').text\n",
58 | "            # Username\n",
59 | "            username = i.find_element_by_xpath('td[@class = \"col-author\"]/div[@class = \"rate-user-info\"]').text\n",
60 | "\n",
61 | "            append_time = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-append\"]/div[1]').text\n",
62 | "            append_content = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-append\"]/div[2]').text\n",
63 | "\n",
64 | "        df = pd.DataFrame({'用户名':[username],'购买产品':[sku],'评价日期':[date],'初评内容':[content],\n",
65 | "                           '追评时间':[append_time],'追评内容':[append_content]})\n",
66 | "\n",
67 | "        result = pd.concat([result, df])\n",
68 | "\n",
69 | "    return result, driver"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "### Crawl in a loop; the URL and the total review count must be set beforehand"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 107,
82 | "metadata": {},
83 | "outputs": [],
84 | "source": [
85 | "url = 'https://detail.tmall.com/item.htm?spm=a230r.1.14.1.70f65edadaPTn3&id=521921506095&ns=1&abbucket=18'\n",
86 | "\n",
87 | "def carwl_product_comment(driver, url, max_num=100):\n",
88 | "    driver.get(url)\n",
89 | "\n",
90 | "    time.sleep(5)\n",
91 | "    # Close the login prompt so reviews can be crawled without logging in\n",
92 | "    driver.find_element_by_xpath('//div[@class = \"sufei-dialog-close\"]').click()\n",
93 | "\n",
94 | "    driver.implicitly_wait(5)\n",
95 | "    # Switch to the reviews tab\n",
96 | "    try:\n",
97 | "        driver.find_element_by_xpath('//ul[@class = \"tabbar tm-clear\"]/li[2]').click()\n",
98 | "    except Exception:\n",
99 | "        driver.implicitly_wait(5)\n",
100 | "        driver.find_element_by_xpath('//ul[@class = \"tabbar tm-clear\"]/li[2]').click()\n",
101 | "\n",
102 | "    max_page = int(max_num / 20)  # 20 reviews per page\n",
103 | "\n",
104 | "    if max_page > 90:  # cap the crawl at 90 pages\n",
105 | "        max_page = 90\n",
106 | "    else:\n",
107 | "        pass\n",
108 | "\n",
109 | "    c = 1\n",
110 | "    final_re = pd.DataFrame()\n",
111 | "\n",
112 | "    while c <= max_page:\n",
113 | "        result, driver = get_page(driver)\n",
114 | "        final_re = pd.concat([final_re, result])\n",
115 | "        print('Bro, finished crawling page {}'.format(c))\n",
116 | "\n",
117 | "        # Click through to the next page\n",
118 | "        driver.find_element_by_link_text('下一页>>').click()\n",
119 | "        c += 1\n",
120 | "        time.sleep(random.random() + 3)\n",
121 | "    return final_re"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {},
127 | "source": [
128 | "### Run\n",
129 | "#### This uses PhantomJS through selenium. You can also try Chrome; installation has a fair number of pitfalls, but fixes for them can be found online"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 217,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": [
138 | "driver = webdriver.PhantomJS()\n",
139 | "final_re = carwl_product_comment(driver,url)"
140 | ]
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "### Sentiment scoring"
147 | ]
148 | },
149 | {
150 | "cell_type": "code",
151 | "execution_count": 208,
152 | "metadata": {},
153 | "outputs": [
154 | {
155 | "data": {
233 | "text/plain": [
234 | " 品牌 买家 评价日期 初评内容 \\\n",
235 | "0 三个魔发匠 t**4 2019-11-21 用了之后真的是促进每个神经细胞的,洗发水的泡沫比较丰富的,不需要拿沐浴球搓,随便用手搓搓就可... \n",
236 | "1 三个魔发匠 t**3 2019-11-23 生姜洗发水用着很舒服,根据使用方法来洗一个地方就按一次,把头皮都按了个遍,很舒服,而且泡泡很... \n",
237 | "2 三个魔发匠 喃**y 2019-11-22 朋友推荐来买的这款生姜洗发水,根据她们说效果很好!生姜发水一大瓶超级划算,九零后已经走上了防... \n",
238 | "3 三个魔发匠 f**0 2019-11-22 脱发的故事是缓缓写到结局,这款生姜洗发水是生姜提取物成分,使用也是非常的放心,而且这款洗发水... \n",
239 | "4 三个魔发匠 z**珍 2019-11-24 水油平衡也改善了,洗发水用着很舒服,洗一个地方就要按一次,而且泡泡很容易冲洗干净,吹干头发超... \n",
240 | "\n",
241 | " 追评 \\\n",
242 | "0 姗姗来迟的生姜洗发水,在关键时刻起到了最大的作用,洗头发顺顺滑滑的,特别清新,留香持久,洗的... \n",
243 | "1 淡淡的姜香味,闻着挺舒服的,而且这个控油去屑的效果真是无敌,不用再看到满头的头皮屑的感觉真好... \n",
244 | "2 我闻着洗发水有很浓的生姜味道,但又不会很刺鼻,蛮好的,洗发后头皮不痒、没有头皮屑,至于生发效... \n",
245 | "3 唠唠叨叨的用这款洗发水洗头发非常舒服,洗发水温和的一点刺激感都没有,而且发质还变好了,包装的... \n",
246 | "4 太值了,这款三个魔发匠生姜洗发水的香味我很喜欢,使用了以后感觉头发很顺,也不用油了,可以除螨... \n",
247 | "\n",
248 | " SKU \n",
249 | "0 化妆品净含量:(2瓶)生姜洗发水500ml+500ml \n",
250 | "1 化妆品净含量:(1瓶)生姜洗发水500ml \n",
251 | "2 化妆品净含量:(1瓶)生姜洗发水500ml \n",
252 | "3 化妆品净含量:【1瓶】生姜洗发水500ml+【1瓶】香氛护发素500ml \n",
253 | "4 化妆品净含量:【1瓶】生姜洗发水500ml+【1瓶】香氛护发素500ml "
254 | ]
255 | },
256 | "execution_count": 208,
257 | "metadata": {},
258 | "output_type": "execute_result"
259 | }
260 | ],
261 | "source": [
262 | "from snownlp import SnowNLP\n",
263 | "\n",
264 | "sens = []\n",
265 | "\n",
266 | "for text in final_re['初评内容']:\n",
267 | "    s = SnowNLP(text)\n",
268 | "    sens.append(s.sentiments)\n",
269 | "\n",
270 | "final_re['初评情感评分'] = sens"
271 | ]
272 | },
273 | {
274 | "cell_type": "markdown",
275 | "metadata": {},
276 | "source": [
277 | "### Sentiment score analysis"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": 245,
283 | "metadata": {},
284 | "outputs": [
285 | {
286 | "data": {
287 | "text/plain": [
288 | "0.49948609661261906"
289 | ]
290 | },
291 | "execution_count": 245,
292 | "metadata": {},
293 | "output_type": "execute_result"
294 | }
295 | ],
296 | "source": [
297 | "final_re['初评情感评分'].mean()"
298 | ]
299 | },
300 | {
301 | "cell_type": "code",
302 | "execution_count": 234,
303 | "metadata": {},
304 | "outputs": [
305 | {
306 | "data": {
375 | "text/plain": [
376 | " 情感评分 初评情感评分\n",
377 | "count 8980.000000 8980.000000\n",
378 | "mean 0.499448 0.499448\n",
379 | "std 0.347043 0.347043\n",
380 | "min 0.000000 0.000000\n",
381 | "25% 0.145513 0.145513\n",
382 | "50% 0.489421 0.489421\n",
383 | "75% 0.847504 0.847504\n",
384 | "max 1.000000 1.000000"
385 | ]
386 | },
387 | "execution_count": 234,
388 | "metadata": {},
389 | "output_type": "execute_result"
390 | }
391 | ],
392 | "source": [
393 | "final_re.describe()"
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": 243,
399 | "metadata": {},
400 | "outputs": [
401 | {
402 | "data": {
403 | "text/plain": [
404 | ""
405 | ]
406 | },
407 | "execution_count": 243,
408 | "metadata": {},
409 | "output_type": "execute_result"
410 | },
411 | {
412 | "data": {
413 | "image/png": "\n",
414 | "text/plain": [
415 | ""
416 | ]
417 | },
418 | "metadata": {
419 | "needs_background": "light"
420 | },
421 | "output_type": "display_data"
422 | }
423 | ],
424 | "source": [
425 | "import seaborn as sns\n",
426 | "import matplotlib.pyplot as plt\n",
427 | "\n",
428 | "fig,ax = plt.subplots(1,1,figsize = (12,5))\n",
429 | "sns.histplot(final_re['初评情感评分'], color='red', kde=True, stat='density')\n",
430 | "\n",
431 | "plt.yticks(fontsize=11)\n",
432 | "plt.xticks(fontsize=11)\n",
433 | "\n",
434 | "ax.set_xlabel('情感评分', fontsize=14)"
435 | ]
436 | },
437 | {
438 | "cell_type": "code",
439 | "execution_count": 248,
440 | "metadata": {},
441 | "outputs": [
442 | {
443 | "data": {
497 | "text/plain": [
498 | " 品牌 初评情感评分\n",
499 | "0 三个魔发匠 0.547250\n",
500 | "1 有情生姜 0.630458\n",
501 | "2 白云山敬修堂 0.573973\n",
502 | "3 章光101 0.145513\n",
503 | "4 霸王防脱 0.615317"
504 | ]
505 | },
506 | "execution_count": 248,
507 | "metadata": {},
508 | "output_type": "execute_result"
509 | }
510 | ],
511 | "source": [
512 | "final_re.groupby('品牌')['初评情感评分'].median().reset_index()"
513 | ]
514 | },
515 | {
516 | "cell_type": "code",
517 | "execution_count": null,
518 | "metadata": {},
519 | "outputs": [],
520 | "source": []
521 | },
522 | {
523 | "cell_type": "code",
524 | "execution_count": null,
525 | "metadata": {},
526 | "outputs": [],
527 | "source": []
528 | },
529 | {
530 | "cell_type": "code",
531 | "execution_count": null,
532 | "metadata": {},
533 | "outputs": [],
534 | "source": []
535 | }
536 | ],
537 | "metadata": {
538 | "kernelspec": {
539 | "display_name": "Python 3",
540 | "language": "python",
541 | "name": "python3"
542 | },
543 | "language_info": {
544 | "codemirror_mode": {
545 | "name": "ipython",
546 | "version": 3
547 | },
548 | "file_extension": ".py",
549 | "mimetype": "text/x-python",
550 | "name": "python",
551 | "nbconvert_exporter": "python",
552 | "pygments_lexer": "ipython3",
553 | "version": "3.5.3"
554 | }
555 | },
556 | "nbformat": 4,
557 | "nbformat_minor": 2
558 | }
559 |
--------------------------------------------------------------------------------
/Python+excel/Python批量处理Excel表格.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Original author: 周志鹏\n",
8 | "### WeChat official account: 数据不吹牛 (more case studies and fun analyses)"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import os\n",
18 | "import time\n",
19 | "import pandas as pd\n",
20 | "\n",
21 | "os.chdir('C:\\\\Users\\\\Administrator\\\\Desktop\\\\data')"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "### Open a single sheet"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 4,
34 | "metadata": {},
35 | "outputs": [
36 | {
37 | "data": {
115 | "text/plain": [
116 | " 日期 转化率 访客数 三级类目 客单价 品牌\n",
117 | "0 2019-08 0.025806 221402 绑钩器 33.284283 品牌-17\n",
118 | "1 2019-08 0.019638 14074 绑钩器 233.995330 品牌-12\n",
119 | "2 2019-08 0.065407 75392 绑钩器 11.938785 品牌-20\n",
120 | "3 2019-08 0.015905 85529 绑钩器 41.059966 品牌-13\n",
121 | "4 2019-08 0.039033 23839 绑钩器 44.502008 品牌-1"
122 | ]
123 | },
124 | "execution_count": 4,
125 | "metadata": {},
126 | "output_type": "execute_result"
127 | }
128 | ],
129 | "source": [
130 | "name = '垂钓装备&绑钩器.xlsx'\n",
131 | "df = pd.read_excel(name)\n",
132 | "df.head()"
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "### Check the date range"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 6,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "data": {
149 | "text/plain": [
150 | "array(['2019-08', '2019-07', '2019-06', '2019-05', '2019-04', '2019-03',\n",
151 | " '2019-02', '2019-01', '2018-12', '2018-11', '2018-10', '2018-09'], dtype=object)"
152 | ]
153 | },
154 | "execution_count": 6,
155 | "metadata": {},
156 | "output_type": "execute_result"
157 | }
158 | ],
159 | "source": [
160 | "df['日期'].unique()"
161 | ]
162 | },
163 | {
164 | "cell_type": "markdown",
165 | "metadata": {},
166 | "source": [
167 | "### Compute sales"
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 8,
173 | "metadata": {},
174 | "outputs": [
175 | {
176 | "data": {
260 | "text/plain": [
261 | " 日期 转化率 访客数 三级类目 客单价 品牌 销售额\n",
262 | "0 2019-08 0.025806 221402 绑钩器 33.284283 品牌-17 190167.455681\n",
263 | "1 2019-08 0.019638 14074 绑钩器 233.995330 品牌-12 64673.807815\n",
264 | "2 2019-08 0.065407 75392 绑钩器 11.938785 品牌-20 58871.997672\n",
265 | "3 2019-08 0.015905 85529 绑钩器 41.059966 品牌-13 55856.842507\n",
266 | "4 2019-08 0.039033 23839 绑钩器 44.502008 品牌-1 41409.600947"
267 | ]
268 | },
269 | "execution_count": 8,
270 | "metadata": {},
271 | "output_type": "execute_result"
272 | }
273 | ],
274 | "source": [
275 | "df['销售额'] = df['访客数'] * df['转化率'] * df['客单价']\n",
276 | "df.head()"
277 | ]
278 | },
279 | {
280 | "cell_type": "markdown",
281 | "metadata": {},
282 | "source": [
283 | "### Aggregate sales by brand within a single sheet"
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": 10,
289 | "metadata": {},
290 | "outputs": [
291 | {
292 | "data": {
346 | "text/plain": [
347 | " 品牌 销售额\n",
348 | "0 品牌-1 529837.745358\n",
349 | "1 品牌-10 217976.661847\n",
350 | "2 品牌-11 327093.079507\n",
351 | "3 品牌-12 485635.295843\n",
352 | "4 品牌-13 438391.195855"
353 | ]
354 | },
355 | "execution_count": 10,
356 | "metadata": {},
357 | "output_type": "execute_result"
358 | }
359 | ],
360 | "source": [
361 | "df_sum = df.groupby('品牌')['销售额'].sum().reset_index()\n",
362 | "df_sum.head()"
363 | ]
364 | },
365 | {
366 | "cell_type": "markdown",
367 | "metadata": {},
368 | "source": [
369 | "### Add an industry label"
370 | ]
371 | },
372 | {
373 | "cell_type": "code",
374 | "execution_count": 12,
375 | "metadata": {},
376 | "outputs": [
377 | {
378 | "data": {
438 | "text/plain": [
439 | " 品牌 销售额 行业\n",
440 | "0 品牌-1 529837.745358 垂钓装备&绑钩器\n",
441 | "1 品牌-10 217976.661847 垂钓装备&绑钩器\n",
442 | "2 品牌-11 327093.079507 垂钓装备&绑钩器\n",
443 | "3 品牌-12 485635.295843 垂钓装备&绑钩器\n",
444 | "4 品牌-13 438391.195855 垂钓装备&绑钩器"
445 | ]
446 | },
447 | "execution_count": 12,
448 | "metadata": {},
449 | "output_type": "execute_result"
450 | }
451 | ],
452 | "source": [
453 | "df_sum['行业'] = name.replace('.xlsx','')\n",
454 | "df_sum.head()"
455 | ]
456 | },
457 | {
458 | "cell_type": "markdown",
459 | "metadata": {},
460 | "source": [
461 | "### With a single file handled, batch processing only needs a loop"
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": 13,
467 | "metadata": {},
468 | "outputs": [
469 | {
470 | "name": "stdout",
471 | "output_type": "stream",
472 | "text": [
473 | "用Python操作所花费时间:2.6550002098083496 s\n"
474 | ]
475 | }
476 | ],
477 | "source": [
478 | "import time\n",
479 | "\n",
480 | "# Start time\n",
481 | "start = time.time()\n",
482 | "\n",
483 | "# Accumulates the per-file summaries\n",
484 | "result = pd.DataFrame()\n",
485 | "\n",
486 | "# Loop over the workbook names, skipping anything that is not an .xlsx file\n",
487 | "for name in [f for f in os.listdir() if f.endswith('.xlsx')]:\n",
488 | "    df = pd.read_excel(name)\n",
489 | "    # Compute the sales column: visitors * conversion rate * average order value\n",
490 | "    df['销售额'] = df['访客数'] * df['转化率'] * df['客单价']\n",
491 | "    # Sum sales by brand within this sub-category\n",
492 | "    df_sum = df.groupby('品牌')['销售额'].sum().reset_index()\n",
493 | "    df_sum['类目'] = name.replace('.xlsx','')\n",
494 | "    result = pd.concat([result,df_sum])\n",
495 | "\n",
496 | "# Sum per brand across all files, then sort by sales in descending order\n",
497 | "final = result.groupby('品牌')['销售额'].sum().reset_index().sort_values('销售额',ascending = False)\n",
498 | "\n",
499 | "# End time\n",
500 | "end = time.time()\n",
501 | "print('用Python操作所花费时间:{} s'.format(end-start))"
502 | ]
503 | },
504 | {
505 | "cell_type": "code",
506 | "execution_count": 15,
507 | "metadata": {},
508 | "outputs": [
509 | {
510 | "data": {
564 | "text/plain": [
565 | " 品牌 销售额\n",
566 | "15 品牌-5 1.226224e+09\n",
567 | "8 品牌-17 1.195281e+09\n",
568 | "2 品牌-11 1.151829e+09\n",
569 | "4 品牌-13 1.150687e+09\n",
570 | "3 品牌-12 1.143520e+09"
571 | ]
572 | },
573 | "execution_count": 15,
574 | "metadata": {},
575 | "output_type": "execute_result"
576 | }
577 | ],
578 | "source": [
579 | "final.head()"
580 | ]
581 | },
582 | {
583 | "cell_type": "markdown",
584 | "metadata": {},
585 | "source": [
586 | "### Turn off scientific notation and keep two decimal places"
587 | ]
588 | },
589 | {
590 | "cell_type": "code",
591 | "execution_count": 16,
592 | "metadata": {},
593 | "outputs": [
594 | {
595 | "data": {
649 | "text/plain": [
650 | " 品牌 销售额\n",
651 | "15 品牌-5 1226223640.73\n",
652 | "8 品牌-17 1195280571.60\n",
653 | "2 品牌-11 1151829215.73\n",
654 | "4 品牌-13 1150687029.66\n",
655 | "3 品牌-12 1143519788.23"
656 | ]
657 | },
658 | "execution_count": 16,
659 | "metadata": {},
660 | "output_type": "execute_result"
661 | }
662 | ],
663 | "source": [
664 | "pd.set_option('display.float_format', lambda x: '%.2f' % x)\n",
665 | "final.head()"
666 | ]
667 | }
668 | ],
669 | "metadata": {
670 | "kernelspec": {
671 | "display_name": "Python 3",
672 | "language": "python",
673 | "name": "python3"
674 | },
675 | "language_info": {
676 | "codemirror_mode": {
677 | "name": "ipython",
678 | "version": 3
679 | },
680 | "file_extension": ".py",
681 | "mimetype": "text/x-python",
682 | "name": "python",
683 | "nbconvert_exporter": "python",
684 | "pygments_lexer": "ipython3",
685 | "version": "3.5.3"
686 | }
687 | },
688 | "nbformat": 4,
689 | "nbformat_minor": 2
690 | }
691 |
--------------------------------------------------------------------------------
/Python+excel/README.md:
--------------------------------------------------------------------------------
1 | # Batch-processing 128 Excel sheets with Python #
2 |
3 | ----------
4 |
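5 | This project uses one task, processing 128 Excel sheets with Python, as a case study to show how Python can automate spreadsheet and office work and improve efficiency.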
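
As a rough illustration of the calculation the notebook applies to each sheet, the sketch below derives sales as visitors × conversion rate × average order value and sums it per brand. It uses plain Python and made-up rows. The English keys and the `total_sales_by_brand` helper are hypothetical stand-ins for the sheet columns 访客数, 转化率, 客单价, and 品牌, not part of the project code.

```python
from collections import defaultdict


def total_sales_by_brand(rows):
    """Sum visitors * conversion_rate * avg_order_value per brand.

    `rows` is an iterable of dicts with hypothetical English keys that
    mirror the sheet columns 品牌, 访客数, 转化率 and 客单价.
    """
    totals = defaultdict(float)
    for row in rows:
        sales = row["visitors"] * row["conversion_rate"] * row["avg_order_value"]
        totals[row["brand"]] += sales
    # Highest-selling brand first, matching the notebook's descending sort.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows, not taken from the real data.
sample_rows = [
    {"brand": "brand-1", "visitors": 1000, "conversion_rate": 0.05, "avg_order_value": 20.0},
    {"brand": "brand-2", "visitors": 500, "conversion_rate": 0.10, "avg_order_value": 30.0},
    {"brand": "brand-1", "visitors": 2000, "conversion_rate": 0.02, "avg_order_value": 25.0},
]

ranking = total_sales_by_brand(sample_rows)
for brand, sales in ranking:
    print(f"{brand}: {sales:.2f}")
```

The notebook does the same thing with pandas (`groupby('品牌')['销售额'].sum()`), which is the more natural choice once the rows come from real workbooks.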
6 |
7 | Main files:
8 |
9 | - 128 sales sheets
10 | - Python code for batch-processing the Excel files
12 |
13 |
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&冰爪.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&冰爪.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&呼吸管-呼吸器.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&呼吸管-呼吸器.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&安全带.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&安全带.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&救生衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&救生衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&气瓶.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&气瓶.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&滑雪头盔.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪头盔.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&滑雪护具.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪护具.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&滑雪板.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪板.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&滑雪眼镜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪眼镜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&潜水箱包.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&潜水箱包.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&潜水袜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&潜水袜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&皮划艇充气艇.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&皮划艇充气艇.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&绳索.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&绳索.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&脚蹼.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&脚蹼.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/专项户外运动装备&面镜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&面镜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&其他垂钓用品.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&其他垂钓用品.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&垂钓小配件.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&垂钓小配件.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&垂钓装备.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&垂钓装备.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&太空豆.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&太空豆.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&打水桶.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&打水桶.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&抄网.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&抄网.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&抄网头.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&抄网头.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&抄网杆.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&抄网杆.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&探鱼器.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&探鱼器.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&支架.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&支架.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&止血钳.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&止血钳.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&浮漂.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&浮漂.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&渔具包.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&渔具包.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&绑钩器.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&绑钩器.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&装鱼桶.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&装鱼桶.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓台.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓台.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓竿.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓竿.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓箱.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓箱.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓鱼伞.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼伞.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓鱼帽.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼帽.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓鱼手套.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼手套.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓鱼椅、凳.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼椅、凳.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&钓鱼鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&铅坠.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&铅坠.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&铅皮.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&铅皮.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&饵料盒.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&饵料盒.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&鱼护.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼护.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&鱼线.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼线.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&鱼线轮.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼线轮.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&鱼网-虾笼-其它渔具.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼网-虾笼-其它渔具.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&鱼钩.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼钩.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/垂钓装备&鱼饵.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼饵.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&充气床.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&充气床.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&吊床.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&吊床.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&户外休闲家具.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外休闲家具.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&户外床-折叠床.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外床-折叠床.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&户外桌子.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外桌子.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&户外桌椅套装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外桌椅套装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&户外椅子凳子.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外椅子凳子.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外休闲家具&野餐垫.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&野餐垫.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&一次性内裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&一次性内裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&其他户外服装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&其他户外服装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&内衣裤套装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&内衣裤套装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&冲锋衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&冲锋衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&冲锋衣裤套装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&冲锋衣裤套装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&冲锋裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&冲锋裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&功能内衣上装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&功能内衣上装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&功能内衣下装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&功能内衣下装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&功能内裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&功能内裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&户外休闲衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外休闲衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&户外休闲衣裤套装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外休闲衣裤套装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&户外休闲裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外休闲裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&户外服装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外服装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&抓绒衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&抓绒衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&抓绒裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&抓绒裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&滑雪衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&滑雪衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&滑雪衣裤套装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&滑雪衣裤套装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&滑雪裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&滑雪裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&潜水服.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&潜水服.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&羽绒衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&羽绒衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&软壳衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&软壳衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&软壳裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&软壳裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&运动户外风衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&运动户外风衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&速干T恤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干T恤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&速干背心.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干背心.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&速干衣裤套装.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干衣裤套装.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&速干衬衣.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干衬衣.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&速干裤.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干裤.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外服装&钓鱼服.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&钓鱼服.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&信号灯-发光棒-救生灯.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&信号灯-发光棒-救生灯.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&充电器.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&充电器.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&其他.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&其他.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&头灯.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&头灯.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&户外照明.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&户外照明.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&手电筒.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&手电筒.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&电池-燃料.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&电池-燃料.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&营地灯-帐篷灯.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&营地灯-帐篷灯.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外照明&钓鱼灯.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&钓鱼灯.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&其他户外鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&其他户外鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&户外休闲鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&户外休闲鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&户外鞋靴.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&户外鞋靴.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&攀岩鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&攀岩鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&沙滩鞋-凉鞋-拖鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&沙滩鞋-凉鞋-拖鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&溯溪鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&溯溪鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&滑雪鞋-雪地靴.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&滑雪鞋-雪地靴.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&登山鞋-徒步鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&登山鞋-徒步鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/户外鞋靴&越野跑鞋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&越野跑鞋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/旅行便携装备&其他.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&其他.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/旅行便携装备&其他安全防盗产品.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&其他安全防盗产品.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/旅行便携装备&旅行便携装备.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&旅行便携装备.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/旅行便携装备&普通密码锁.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&普通密码锁.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/旅行便携装备&晾衣绳.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&晾衣绳.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/旅行便携装备&转换插头.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&转换插头.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&垂钓望远镜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&垂钓望远镜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&户外眼镜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&户外眼镜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&普通望远镜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&普通望远镜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜-夜视仪-户外眼镜.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜-夜视仪-户外眼镜.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜配件.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜配件.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/洗漱清洁-护理用品&防虫-防蚊用品.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/洗漱清洁-护理用品&防虫-防蚊用品.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/登山杖-手杖&登山杖-手杖.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/登山杖-手杖&登山杖-手杖.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/睡袋&睡袋.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/睡袋&睡袋.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&其他防护救生装备.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&其他防护救生装备.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&急救包-急救箱.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&急救包-急救箱.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&急救护理用品.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&急救护理用品.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&求生哨.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&求生哨.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&求生绳-逃生绳.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&求生绳-逃生绳.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&求生锯-绳锯-线锯.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&求生锯-绳锯-线锯.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&防护-救生装备.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&防护-救生装备.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防护-救生装备&防护面罩.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&防护面罩.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防潮垫-地席-枕头&地布-地席.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&地布-地席.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防潮垫-地席-枕头&枕头.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&枕头.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫-地席-枕头.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫-地席-枕头.xlsx
--------------------------------------------------------------------------------
/Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫.xlsx
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
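1 | # Hello, and welcome to 数据不吹牛 #
2 |
3 | ----------
4 |
5 | A dark-haired fisherman and woodcutter on the river isle,
6 |
7 | long used to watching the autumn moon and the spring breeze.
8 |
9 | Over a pot of data we gladly meet;
10 |
11 | so many matters, past and present,
12 |
13 | are all there in the analysis.
14 |
15 | ·
16 |
17 | The data sources and code here are best enjoyed alongside Xiao Z's WeChat official account 《数据不吹牛》.
18 |
19 | When I was first learning Python and data analysis, I came across many excellent case studies, but often could not reproduce them for lack of data sources and detailed code.
20 |
21 | So, when sharing later on, Xiao Z has taken particular care to keep the data sources and step-by-step code together, in the hope that what is shared here can be that little bit more helpful to anyone who needs it.
22 |
23 |
24 | Your likes are the greatest encouragement for Xiao Z.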
25 |
26 |
--------------------------------------------------------------------------------
/RFM/PYTHON-RFM实战数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/RFM/PYTHON-RFM实战数据.xlsx
--------------------------------------------------------------------------------
/RFM/README.md:
--------------------------------------------------------------------------------
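1 | # RFM Analysis in Practice #
2 |
3 | ----------
4 |
5 | This project sets out to answer two questions: what the RFM model is, and how to implement it in Python.
6 |
7 | Main files:
8 |
9 | - Anonymized case-study source data
10 | - RFM analysis practice code
11 |
12 | Feel free to follow the WeChat official account: 数据不吹牛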
13 |
14 |
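The README above describes implementing the RFM model (Recency, Frequency, Monetary) in Python. What follows is only a minimal sketch of the idea, not the code from this repository: the order records are made up, the function names `rfm_table` and `rfm_segments` are hypothetical, and splitting customers at the mean of each metric is an assumption chosen for brevity.

```python
from datetime import date

# Hypothetical order records: (customer_id, order_date, amount).
# These rows are invented for illustration; they are not the repository's data.
orders = [
    ("A", date(2019, 6, 1), 120.0),
    ("A", date(2019, 6, 20), 80.0),
    ("B", date(2019, 3, 5), 300.0),
    ("C", date(2019, 1, 15), 40.0),
]


def rfm_table(orders, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in orders:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the gap to the most recent order, so keep the minimum.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


def rfm_segments(table):
    """Label each customer with three 0/1 flags (R, F, M) relative to the mean.

    Using the mean as the cut-off is an assumption of this sketch.
    """
    n = len(table)
    mean_r = sum(r for r, _, _ in table.values()) / n
    mean_f = sum(f for _, f, _ in table.values()) / n
    mean_m = sum(m for _, _, m in table.values()) / n
    segments = {}
    for customer, (r, f, m) in table.items():
        # A smaller recency is better, so the R flag is set when r is below the mean.
        flags = (int(r < mean_r), int(f > mean_f), int(m > mean_m))
        segments[customer] = "".join(str(flag) for flag in flags)
    return segments


table = rfm_table(orders, today=date(2019, 7, 1))
segments = rfm_segments(table)
print(segments)
```

A label such as `111` marks a customer who bought recently, buys often, and spends a lot; `000` marks the opposite. Real analyses often use quantile scores or business-defined thresholds instead of the mean.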
--------------------------------------------------------------------------------
/TGI/README.md:
--------------------------------------------------------------------------------
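1 | # TGI Analysis in Practice #
2 |
3 | ----------
4 |
5 | This project covers what the TGI index is and how to carry out a basic TGI index analysis in Python using case-study data.
6 |
7 | Main files:
8 |
9 | - Anonymized case-study source data
10 | - TGI analysis practice code
11 |
12 | Feel free to follow the WeChat official account: 数据不吹牛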
13 |
14 |
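As a rough illustration of the TGI index mentioned above: TGI (Target Group Index) compares the share of a target group that has some trait with the share of the overall population that has the same trait, scaled by 100. The sketch below assumes that standard definition; the function name `tgi` and the counts are hypothetical and are not taken from the case data in this repository.

```python
def tgi(target_with_trait, target_total, overall_with_trait, overall_total):
    """Target Group Index: target-group share / overall share * 100."""
    if target_total <= 0 or overall_total <= 0 or overall_with_trait <= 0:
        raise ValueError("totals and the overall trait count must be positive")
    target_share = target_with_trait / target_total
    overall_share = overall_with_trait / overall_total
    return target_share / overall_share * 100


# Hypothetical counts: 250 of 1,000 customers overall are high spenders,
# while 50 of the 100 customers in one city are high spenders.
city_tgi = tgi(target_with_trait=50, target_total=100,
               overall_with_trait=250, overall_total=1000)
print(city_tgi)
```

A value above 100 means the trait is over-represented in the target group relative to the whole population; a value below 100 means it is under-represented.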
--------------------------------------------------------------------------------
/TGI/TGI指数案例数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/TGI/TGI指数案例数据.xlsx
--------------------------------------------------------------------------------
/Weather+Email/README.md:
--------------------------------------------------------------------------------
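1 | # Weather Scraping + Email Sending #
2 |
3 | ----------
4 |
5 | This project covers scraping a weather website and sending email with concise code, matching the idea explored in the accompanying official-account article.
6 |
7 | Main files:
8 |
9 | - Complete code for the weather scraper + email sending
10 |
11 | Come say hi on the WeChat official account: 数据不吹牛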
12 |
13 |
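The email half of this project can be sketched with Python's standard library alone. This is an assumption-laden illustration rather than the notebook's own code: the helper names `build_weather_email` and `send_email`, the addresses, and the SMTP host, port, and credentials are all placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_weather_email(sender, recipient, forecast):
    """Wrap a forecast string in a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = "Weather forecast"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(forecast)
    return msg


def send_email(msg, host, port, user, password):
    """Send the message over an SSL-encrypted SMTP connection.

    host, port, user and password are placeholders for the mail provider's
    real settings; this function is defined here but not called.
    """
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


# Placeholder addresses and forecast text, for illustration only.
msg = build_weather_email(
    "me@example.com", "you@example.com", "Tomorrow: sunny, 12-20 degrees"
)
print(msg["Subject"])
```

Keeping message construction separate from sending makes it possible to inspect or test the message without contacting a mail server.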
--------------------------------------------------------------------------------
/Weather+Email/天气爬虫+邮件发送.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Original work: WeChat official account 《数据不吹牛》"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 15,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import pandas as pd\n",
17 | "import numpy as np"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "### Weather scraper"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 11,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "import requests\n",
34 | "from lxml import etree\n",
35 | "\n",
36 | "def parse(url = 'https://www.tianqi.com/hangzhou'):\n",
37 | "    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}\n",
38 | "    html = requests.get(url,headers = headers)\n",
39 | "    bs = etree.HTML(html.text)\n",
40 | "\n",
41 | "    # Today's weather: date, weekday, conditions, low temperature, high temperature\n",
42 | "    today_date = bs.xpath('//ul[@class = \"week\"]/li[1]/b/text()')[0]\n",
43 | "    today_week = bs.xpath('//ul[@class = \"week\"]/li[1]/span/text()')[0]\n",
44 | "    today_weather = bs.xpath('//ul[@class = \"txt txt2\"]/li[1]/text()')[0]\n",
45 | "    today_low = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[1]/b/text()')[0]\n",
46 | "    today_high = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[1]/span/text()')[0]\n",
47 | "\n",
48 | "    # Tomorrow's weather, same fields as above\n",
49 | "    tomorrow_date = bs.xpath('//ul[@class = \"week\"]/li[2]/b/text()')[0]\n",
50 | "    tomorrow_week = bs.xpath('//ul[@class = \"week\"]/li[2]/span/text()')[0]\n",
51 | "    tomorrow_weather = bs.xpath('//ul[@class = \"txt txt2\"]/li[2]/text()')[0]\n",
52 | "    tomorrow_low = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[2]/b/text()')[0]\n",
53 | "    tomorrow_high = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[2]/span/text()')[0]\n",
54 | "    # Summary text: tomorrow's date, weekday, conditions, low-high in degrees, and the daily range\n",
55 | "    tomorrow = ('明天是%s,%s,%s,%s-%s度,温差%d度')% \\\n",
56 | "        (tomorrow_date,tomorrow_week,tomorrow_weather,tomorrow_low,tomorrow_high,int(tomorrow_high)-int(tomorrow_low))\n",
57 | "\n",
58 | "    print(tomorrow)\n",
59 | "\n",
60 | "    # Compare tomorrow with today;\n",
61 | "    # the high temperatures are used here\n",
62 | "    temperature_distance = int(tomorrow_high) - int(today_high)\n",
63 | "\n",
64 | "    if temperature_distance > 0:\n",
65 | "        a = '明日升温%d' % temperature_distance  # warmer tomorrow by N degrees\n",
66 | "        print(a)\n",
67 | "    elif temperature_distance < 0:  # elif, so the else branch cannot overwrite the warming message\n",
68 | "        a = '明日降温%d' % abs(temperature_distance)  # cooler tomorrow by N degrees, reported as a positive number\n",
69 | "        print(a)\n",
70 | "    else:\n",
71 | "        a = '最高气温不变'  # high temperature unchanged\n",
72 | "        print(a)\n",
73 | "    content = tomorrow,a\n",
74 | "    return content"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "### Display the scraped result"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 12,
87 | "metadata": {},
88 | "outputs": [
89 | {
90 | "name": "stdout",
91 | "output_type": "stream",
92 | "text": [
93 | "明天是11月19日,星期二,晴转多云,5-14度,温差9度\n",
94 | "明日降温-3\n"
95 | ]
96 | }
97 | ],
98 | "source": [
99 | "# Scrapes Hangzhou by default; pass the URL of your own city's page instead\n",
100 | "weather = parse()"
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 | "### Sending email"
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": 13,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "import yagmail\n",
117 | "\n",
118 | "def send_email(contents,send_to = 'receiver_email@xx.com'):\n",
119 | "    # Log in to the mailbox: set the account, password, host and port\n",
120 | "    yag = yagmail.SMTP(user = 'youremail@sohu.com',password = 'yourpass',\n",
121 | "                       host = 'smtp.sohu.com',port = '465')\n",
122 | "\n",
123 | "    # Once logged in, a single call sends the mail: set the recipient, subject and body\n",
124 | "    yag.send(to = send_to,\n",
125 | "             subject = '天气关怀',\n",
126 | "             contents = contents)\n",
127 | "    print('发送成功!~')"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "### Final run: set your own email account, password, host and port, and choose the recipient"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 9,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "send_email(weather,send_to = 'xxxx')"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": null,
149 | "metadata": {},
150 | "outputs": [],
151 | "source": []
152 | }
153 | ],
154 | "metadata": {
155 | "kernelspec": {
156 | "display_name": "Python 3",
157 | "language": "python",
158 | "name": "python3"
159 | },
160 | "language_info": {
161 | "codemirror_mode": {
162 | "name": "ipython",
163 | "version": 3
164 | },
165 | "file_extension": ".py",
166 | "mimetype": "text/x-python",
167 | "name": "python",
168 | "nbconvert_exporter": "python",
169 | "pygments_lexer": "ipython3",
170 | "version": "3.5.3"
171 | }
172 | },
173 | "nbformat": 4,
174 | "nbformat_minor": 2
175 | }
176 |
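For readers who prefer not to depend on yagmail, the same kind of mail can be assembled with the standard library. This is a sketch under assumptions, not the notebook's method: `build_message` and `send` are hypothetical helpers, the addresses are the placeholders used in the notebook, and `send` is defined but never called because it needs real credentials.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, lines):
    """Assemble a plain-text email from a list of body lines."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("\n".join(lines))
    return msg

def send(msg, host, port, user, password):
    """Send over implicit TLS; not called here because it needs real credentials."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)

msg = build_message(
    "youremail@sohu.com",
    "receiver_email@xx.com",
    "Weather care",
    ["Tomorrow: sunny turning cloudy, 5-14 degrees", "Cooler tomorrow by 3 degrees"],
)
print(msg["Subject"])
print(msg.get_content())
```

Building the message separately from sending it makes the content easy to inspect before any network call is made.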
--------------------------------------------------------------------------------
/Zhihu/README.md:
--------------------------------------------------------------------------------
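1 | # Zhihu Scraping and Cleaning #
2 |
3 | ----------
4 |
5 | This folder holds the scraping and cleaning code, plus the source data, for three Zhihu questions about the Chinese New Year covered by the WeChat official account 《数据不吹牛》.
6 |
7 | Main files:
8 |
9 | - Scraping of Zhihu answers and individual user profiles
10 | - Cleaning and simple visualization of the scraped data
11 | - Two source data files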
12 |
13 |
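Answer scraping of this kind usually walks a paged API by filling `offset` and `limit` slots in a URL template. The sketch below shows only that pagination step. The `page_urls` helper and the `example.com` endpoint are hypothetical stand-ins for the real Zhihu URL.

```python
def page_urls(template, total, page_size=5):
    """Yield one URL per page by filling the template's offset and limit slots."""
    for offset in range(0, total, page_size):
        yield template.format(offset, page_size)

# Hypothetical endpoint with two positional slots: offset, then limit
template = "https://example.com/api/answers?offset={}&limit={}"
urls = list(page_urls(template, total=12))
for u in urls:
    print(u)
```

With `total=12` and a page size of 5, the offsets are 0, 5 and 10, so the last page may return fewer rows than the limit.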
--------------------------------------------------------------------------------
/Zhihu/知乎爬取代码.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### WeChat official account: 数据不吹牛\n",
8 | "### More case studies and fun analyses await"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 5,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import pandas as pd\n",
18 | "import numpy as np\n",
19 | "import os\n",
20 | "import json\n",
21 | "import requests\n",
22 | "import time\n",
23 | "import random\n",
24 | "from lxml import etree"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "## Set the base URLs and headers"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 29,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "# Base URLs for the two questions\n",
41 | "gangwei = 'https://www.zhihu.com/api/v4/questions/266817891/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&offset={}&limit={}&platform=desktop&sort_by=default'\n",
42 | "sanwen = 'https://www.zhihu.com/api/v4/questions/27329739/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&offset={}&limit={}&platform=desktop&sort_by=default'\n",
43 | "\n",
44 | "headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}"
45 | ]
46 | },
47 | {
48 | "cell_type": "markdown",
49 | "metadata": {
50 | "collapsed": true
51 | },
52 | "source": [
53 | "## Parse a single page"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 30,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "def parse_page(url,headers):\n",
63 | "    html = requests.get(url,headers = headers)\n",
64 | "    bs = json.loads(html.text)\n",
65 | "    result = pd.DataFrame()  # the columns below keep their original Chinese names\n",
66 | "    for i in bs['data']:\n",
67 | "        headline = i['author']['headline'] # profile headline\n",
68 | "        gender = i['author']['gender'] # gender\n",
69 | "        user_type = i['author']['user_type']\n",
70 | "        user_id = i['author']['id']\n",
71 | "        user_token = i['author']['url_token']\n",
72 | "        follower_count = i['author']['follower_count'] # number of followers\n",
73 | "        name = i['author']['name'] # user nickname\n",
74 | "        vote_up = i['voteup_count'] # upvotes\n",
75 | "        updated_time = i['updated_time'] # last update time\n",
76 | "        title = i['question']['title'] # question title\n",
77 | "        created_time = i['created_time'] # creation time\n",
78 | "        comment_count = i['comment_count'] # number of comments\n",
79 | "        can_comment = i['can_comment']['status'] # whether commenting is allowed\n",
80 | "        content = i['content'] # answer body, still needs cleaning\n",
81 | "        cache = pd.DataFrame({'用户ID':[user_id],'用户名':[name],'性别':[gender],'token':[user_token],'用户类型':[user_type],'签名':[headline],\n",
82 | "                              '被关注人数':[follower_count],'创建时间':[created_time],'更新时间':[updated_time],'评论数':[comment_count],\n",
83 | "                              '点赞数':[vote_up],'是否可以评论':[can_comment],'内容':[content],'问题':[title]})\n",
84 | "        result = pd.concat([result,cache])\n",
85 | "    return result"
86 | ]
87 | },
88 | {
89 | "cell_type": "markdown",
90 | "metadata": {},
91 | "source": [
92 | "## Set the number of answers to scrape and fetch them in batches"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 31,
98 | "metadata": {},
99 | "outputs": [],
100 | "source": [
101 | "def run_all(url,headers,num = 200):\n",
102 | "    final_result = pd.DataFrame()\n",
103 | "    for i in range(0,num,5):\n",
104 | "        try:\n",
105 | "            result = parse_page(url.format(i,5),headers)\n",
106 | "            final_result = pd.concat([final_result,result])\n",
107 | "            time.sleep(random.random())\n",
108 | "            print('i had parsed:',i)\n",
109 | "        except Exception:\n",
110 | "            try:\n",
111 | "                time.sleep(5)  # wait, then retry the same page once\n",
112 | "                result = parse_page(url.format(i,5),headers)\n",
113 | "                final_result = pd.concat([final_result,result])\n",
114 | "                time.sleep(random.random())\n",
115 | "                print('i had parsed:',i)\n",
116 | "            except Exception:\n",
117 | "                print(i,'is wrong~~~')\n",
118 | "    return final_result"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": 35,
124 | "metadata": {
125 | "scrolled": true
126 | },
127 | "outputs": [
128 | {
129 | "name": "stdout",
130 | "output_type": "stream",
131 | "text": [
132 | "i had parsed: 0\n",
133 | "i had parsed: 5\n",
134 | "i had parsed: 10\n",
135 | "i had parsed: 15\n",
136 | "i had parsed: 20\n",
137 | "i had parsed: 25\n",
138 | "i had parsed: 30\n",
139 | "i had parsed: 35\n",
140 | "i had parsed: 40\n",
141 | "i had parsed: 45\n",
142 | "i had parsed: 50\n",
143 | "i had parsed: 55\n",
144 | "i had parsed: 60\n",
145 | "i had parsed: 65\n",
146 | "i had parsed: 70\n",
147 | "i had parsed: 75\n",
148 | "i had parsed: 80\n",
149 | "i had parsed: 85\n",
150 | "i had parsed: 90\n",
151 | "i had parsed: 95\n",
152 | "i had parsed: 100\n",
153 | "i had parsed: 105\n",
154 | "i had parsed: 110\n",
155 | "i had parsed: 115\n",
156 | "i had parsed: 120\n",
157 | "i had parsed: 125\n",
158 | "i had parsed: 130\n",
159 | "i had parsed: 135\n",
160 | "i had parsed: 140\n",
161 | "i had parsed: 145\n",
162 | "i had parsed: 150\n",
163 | "i had parsed: 155\n",
164 | "i had parsed: 160\n",
165 | "i had parsed: 165\n",
166 | "i had parsed: 170\n",
167 | "i had parsed: 175\n",
168 | "i had parsed: 180\n",
169 | "i had parsed: 185\n",
170 | "i had parsed: 190\n",
171 | "i had parsed: 195\n",
172 | "i had parsed: 200\n",
173 | "i had parsed: 205\n",
174 | "i had parsed: 210\n",
175 | "i had parsed: 215\n",
176 | "i had parsed: 220\n",
177 | "i had parsed: 225\n",
178 | "i had parsed: 230\n",
179 | "i had parsed: 235\n",
180 | "i had parsed: 240\n",
181 | "i had parsed: 245\n",
182 | "i had parsed: 250\n",
183 | "i had parsed: 255\n",
184 | "i had parsed: 260\n",
185 | "i had parsed: 265\n",
186 | "i had parsed: 270\n",
187 | "i had parsed: 275\n",
188 | "i had parsed: 280\n",
189 | "i had parsed: 285\n",
190 | "i had parsed: 290\n",
191 | "i had parsed: 295\n",
192 | "i had parsed: 300\n",
193 | "i had parsed: 305\n",
194 | "i had parsed: 310\n",
195 | "i had parsed: 315\n",
196 | "i had parsed: 320\n",
197 | "i had parsed: 325\n",
198 | "i had parsed: 330\n",
199 | "i had parsed: 335\n",
200 | "i had parsed: 340\n",
201 | "i had parsed: 345\n",
202 | "i had parsed: 350\n",
203 | "i had parsed: 355\n",
204 | "i had parsed: 360\n",
205 | "i had parsed: 365\n",
206 | "i had parsed: 370\n",
207 | "i had parsed: 375\n",
208 | "i had parsed: 380\n",
209 | "i had parsed: 385\n",
210 | "i had parsed: 390\n",
211 | "i had parsed: 395\n",
212 | "i had parsed: 400\n",
213 | "i had parsed: 405\n",
214 | "i had parsed: 410\n",
215 | "i had parsed: 415\n",
216 | "i had parsed: 420\n",
217 | "i had parsed: 425\n",
218 | "i had parsed: 430\n",
219 | "i had parsed: 435\n",
220 | "i had parsed: 440\n",
221 | "i had parsed: 445\n",
222 | "i had parsed: 450\n",
223 | "i had parsed: 455\n",
224 | "i had parsed: 460\n",
225 | "i had parsed: 465\n",
226 | "i had parsed: 470\n",
227 | "i had parsed: 475\n",
228 | "i had parsed: 480\n",
229 | "i had parsed: 485\n",
230 | "i had parsed: 490\n",
231 | "i had parsed: 495\n",
232 | "i had parsed: 500\n",
233 | "i had parsed: 505\n",
234 | "i had parsed: 510\n",
235 | "i had parsed: 515\n",
236 | "i had parsed: 520\n",
237 | "i had parsed: 525\n",
238 | "i had parsed: 530\n",
239 | "i had parsed: 535\n",
240 | "i had parsed: 540\n",
241 | "i had parsed: 545\n",
242 | "i had parsed: 550\n",
243 | "i had parsed: 555\n",
244 | "i had parsed: 560\n",
245 | "i had parsed: 565\n",
246 | "i had parsed: 570\n",
247 | "i had parsed: 575\n",
248 | "i had parsed: 580\n",
249 | "i had parsed: 585\n",
250 | "i had parsed: 590\n",
251 | "i had parsed: 595\n",
252 | "i had parsed: 600\n",
253 | "i had parsed: 605\n",
254 | "i had parsed: 610\n",
255 | "i had parsed: 615\n",
256 | "i had parsed: 620\n",
257 | "i had parsed: 625\n",
258 | "i had parsed: 630\n",
259 | "i had parsed: 635\n",
260 | "i had parsed: 640\n",
261 | "i had parsed: 645\n",
262 | "i had parsed: 650\n",
263 | "i had parsed: 655\n",
264 | "i had parsed: 660\n",
265 | "i had parsed: 665\n",
266 | "i had parsed: 670\n",
267 | "i had parsed: 675\n",
268 | "i had parsed: 680\n",
269 | "i had parsed: 685\n",
270 | "i had parsed: 690\n",
271 | "i had parsed: 695\n",
272 | "i had parsed: 700\n",
273 | "i had parsed: 705\n",
274 | "i had parsed: 710\n",
275 | "i had parsed: 715\n",
276 | "i had parsed: 720\n",
277 | "i had parsed: 725\n",
278 | "i had parsed: 730\n",
279 | "i had parsed: 735\n",
280 | "i had parsed: 740\n",
281 | "i had parsed: 745\n",
282 | "i had parsed: 750\n",
283 | "i had parsed: 755\n",
284 | "i had parsed: 760\n",
285 | "i had parsed: 765\n",
286 | "i had parsed: 770\n"
287 | ]
288 | }
289 | ],
290 | "source": [
291 | "final_result = run_all(sanwen,headers,775)"
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "metadata": {},
297 | "source": [
298 | "### Scraping user profiles one at a time can get your IP banned briefly; for a stable long run, prepare a pool of proxy IPs"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": [
307 | "def get_ips():\n",
308 | "    \"\"\"Over to you: prepare proxy IPs beforehand and return one proxies dict per call.\"\"\""
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": 20,
314 | "metadata": {},
315 | "outputs": [
316 | {
317 | "data": {
318 | "text/plain": [
319 | "{'https': 'https://117.57.35.166:4512'}"
320 | ]
321 | },
322 | "execution_count": 20,
323 | "metadata": {},
324 | "output_type": "execute_result"
325 | }
326 | ],
327 | "source": [
328 | "get_ips()"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {},
334 | "source": [
335 | "### Scrape each user's profile for industry, position and similar details"
336 | ]
337 | },
338 | {
339 | "cell_type": "code",
340 | "execution_count": 1,
341 | "metadata": {},
342 | "outputs": [],
343 | "source": [
344 | "def get_user_info(user,headers,ip):\n",
345 | "    user_url = 'https://www.zhihu.com/people/{}/activities'\n",
346 | "    ht = requests.get(user_url.format(user),headers = headers,proxies = ip)\n",
347 | "    if ht.text.find('安全验证') == -1:  # no security-check page, so the request went through\n",
348 | "\n",
349 | "        bs = etree.HTML(ht.text)\n",
350 | "        try:\n",
351 | "            hangye = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][1]/text()')\n",
352 | "        except Exception:\n",
353 | "            hangye = None\n",
354 | "        try:\n",
355 | "            school = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[0]\n",
356 | "        except Exception:\n",
357 | "            school = None\n",
358 | "        try:\n",
359 | "            prof = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[1]\n",
360 | "        except Exception:\n",
361 | "            prof = None\n",
362 | "        df = pd.DataFrame({'token':[user],'行业':[hangye],'教育经历':[school],'专业':[prof]})\n",
363 | "\n",
364 | "    else:\n",
365 | "        # Blocked by the security check: switch to a new proxy and request again\n",
366 | "        ip = get_ips()\n",
367 | "        try:\n",
368 | "            ht = requests.get(user_url.format(user),headers = headers,proxies = ip)\n",
369 | "        except Exception:\n",
370 | "            ip = get_ips()\n",
371 | "            ht = requests.get(user_url.format(user),headers = headers,proxies = ip)\n",
372 | "        bs = etree.HTML(ht.text)\n",
373 | "        try:\n",
374 | "            hangye = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][1]/text()')\n",
375 | "        except Exception:\n",
376 | "            hangye = None\n",
377 | "        try:\n",
378 | "            school = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[0]\n",
379 | "        except Exception:\n",
380 | "            school = None\n",
381 | "        try:\n",
382 | "            prof = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[1]\n",
383 | "        except Exception:\n",
384 | "            prof = None\n",
385 | "        df = pd.DataFrame({'token':[user],'行业':[hangye],'教育经历':[school],'专业':[prof]})\n",
386 | "        print('ip changes')\n",
387 | "\n",
388 | "    return df,ip"
389 | ]
390 | },
391 | {
392 | "cell_type": "markdown",
393 | "metadata": {},
394 | "source": [
395 | "### Loop over the users and scrape their profiles"
396 | ]
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": null,
401 | "metadata": {},
402 | "outputs": [],
403 | "source": [
404 | "ip = get_ips()\n",
405 | "ct = 1\n",
406 | "user_info2 = pd.DataFrame()\n",
407 | "for i in final_result['token']:\n",
408 | "    df,ip = get_user_info(i,headers,ip)\n",
409 | "    user_info2 = pd.concat([user_info2,df])\n",
410 | "    time.sleep(random.random() / 2)\n",
411 | "    print('i had parsed:{}'.format(ct))\n",
412 | "    ct += 1"
413 | ]
414 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": null,
612 | "metadata": {
613 | "collapsed": true
614 | },
615 | "outputs": [],
616 | "source": []
617 | }
618 | ],
619 | "metadata": {
620 | "anaconda-cloud": {},
621 | "kernelspec": {
622 | "display_name": "Python 3",
623 | "language": "python",
624 | "name": "python3"
625 | },
626 | "language_info": {
627 | "codemirror_mode": {
628 | "name": "ipython",
629 | "version": 3
630 | },
631 | "file_extension": ".py",
632 | "mimetype": "text/x-python",
633 | "name": "python",
634 | "nbconvert_exporter": "python",
635 | "pygments_lexer": "ipython3",
636 | "version": "3.5.3"
637 | }
638 | },
639 | "nbformat": 4,
640 | "nbformat_minor": 2
641 | }
642 |
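The nested `try`/`except` used for batch scraping amounts to "retry once after a pause". A small helper can express the same idea for any fetch function. This is a sketch, not the notebook's code: `with_retry` and `flaky` are hypothetical names, and the `sleep` parameter exists only so the example runs without real delays.

```python
import time

def with_retry(fetch, retries=1, wait=5.0, sleep=time.sleep):
    """Call fetch(); on failure wait and try again, returning None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception as exc:
            if attempt == retries:
                print("giving up:", exc)
                return None
            sleep(wait)

# A stand-in for a page request that fails once, then succeeds
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 2:
        raise RuntimeError("temporary failure")
    return "page data"

result = with_retry(flaky, sleep=lambda seconds: None)
print(result, len(calls))
```

Returning `None` on final failure lets the caller skip a bad page and continue, which mirrors how the scraping loop logs a failed offset and moves on.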
--------------------------------------------------------------------------------
/Zhihu/第二个问题源数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Zhihu/第二个问题源数据.xlsx
--------------------------------------------------------------------------------
/Zhihu/过年工作问题.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Zhihu/过年工作问题.xlsx
--------------------------------------------------------------------------------