├── ChilisauceComment
    ├── README.md
    ├── 李子柒辣椒酱评论.xlsx
    └── 评价分析代码.ipynb
├── Comments
    ├── README.md
    ├── 刷单鉴定评价数据.xlsx
    ├── 生姜防脱洗发水.xlsx
    └── 评价刷单鉴定两板斧代码.ipynb
├── DoubanMovies
    ├── README.md
    ├── 最终电影排名结果.xlsx
    ├── 清洗分析详细步骤.ipynb
    ├── 电影基本信息大全.xlsx
    ├── 电影详细信息.xlsx
    └── 豆瓣电影爬取.ipynb
├── Hair
    ├── README.md
    ├── 防脱洗发水评价.xlsx
    └── 防脱洗发水评价爬取+分析.ipynb
├── Python+excel
    ├── Python批量处理Excel表格.ipynb
    ├── README.md
    └── 源数据128张表格
    │   ├── 专项户外运动装备&冰爪.xlsx
    │   ├── 专项户外运动装备&呼吸管-呼吸器.xlsx
    │   ├── 专项户外运动装备&安全带.xlsx
    │   ├── 专项户外运动装备&救生衣.xlsx
    │   ├── 专项户外运动装备&气瓶.xlsx
    │   ├── 专项户外运动装备&滑雪头盔.xlsx
    │   ├── 专项户外运动装备&滑雪护具.xlsx
    │   ├── 专项户外运动装备&滑雪板.xlsx
    │   ├── 专项户外运动装备&滑雪眼镜.xlsx
    │   ├── 专项户外运动装备&潜水箱包.xlsx
    │   ├── 专项户外运动装备&潜水袜.xlsx
    │   ├── 专项户外运动装备&皮划艇充气艇.xlsx
    │   ├── 专项户外运动装备&绳索.xlsx
    │   ├── 专项户外运动装备&脚蹼.xlsx
    │   ├── 专项户外运动装备&面镜.xlsx
    │   ├── 垂钓装备&其他垂钓用品.xlsx
    │   ├── 垂钓装备&垂钓小配件.xlsx
    │   ├── 垂钓装备&垂钓装备.xlsx
    │   ├── 垂钓装备&太空豆.xlsx
    │   ├── 垂钓装备&打水桶.xlsx
    │   ├── 垂钓装备&抄网.xlsx
    │   ├── 垂钓装备&抄网头.xlsx
    │   ├── 垂钓装备&抄网杆.xlsx
    │   ├── 垂钓装备&探鱼器.xlsx
    │   ├── 垂钓装备&支架.xlsx
    │   ├── 垂钓装备&止血钳.xlsx
    │   ├── 垂钓装备&浮漂.xlsx
    │   ├── 垂钓装备&渔具包.xlsx
    │   ├── 垂钓装备&绑钩器.xlsx
    │   ├── 垂钓装备&装鱼桶.xlsx
    │   ├── 垂钓装备&钓台.xlsx
    │   ├── 垂钓装备&钓竿.xlsx
    │   ├── 垂钓装备&钓箱.xlsx
    │   ├── 垂钓装备&钓鱼伞.xlsx
    │   ├── 垂钓装备&钓鱼帽.xlsx
    │   ├── 垂钓装备&钓鱼手套.xlsx
    │   ├── 垂钓装备&钓鱼椅、凳.xlsx
    │   ├── 垂钓装备&钓鱼鞋.xlsx
    │   ├── 垂钓装备&铅坠.xlsx
    │   ├── 垂钓装备&铅皮.xlsx
    │   ├── 垂钓装备&饵料盒.xlsx
    │   ├── 垂钓装备&鱼护.xlsx
    │   ├── 垂钓装备&鱼线.xlsx
    │   ├── 垂钓装备&鱼线轮.xlsx
    │   ├── 垂钓装备&鱼网-虾笼-其它渔具.xlsx
    │   ├── 垂钓装备&鱼钩.xlsx
    │   ├── 垂钓装备&鱼饵.xlsx
    │   ├── 户外休闲家具&充气床.xlsx
    │   ├── 户外休闲家具&吊床.xlsx
    │   ├── 户外休闲家具&户外休闲家具.xlsx
    │   ├── 户外休闲家具&户外床-折叠床.xlsx
    │   ├── 户外休闲家具&户外桌子.xlsx
    │   ├── 户外休闲家具&户外桌椅套装.xlsx
    │   ├── 户外休闲家具&户外椅子凳子.xlsx
    │   ├── 户外休闲家具&野餐垫.xlsx
    │   ├── 户外服装&一次性内裤.xlsx
    │   ├── 户外服装&其他户外服装.xlsx
    │   ├── 户外服装&内衣裤套装.xlsx
    │   ├── 户外服装&冲锋衣.xlsx
    │   ├── 户外服装&冲锋衣裤套装.xlsx
    │   ├── 户外服装&冲锋裤.xlsx
    │   ├── 户外服装&功能内衣上装.xlsx
    │   ├── 户外服装&功能内衣下装.xlsx
    │   ├── 户外服装&功能内裤.xlsx
    │   ├── 户外服装&户外休闲衣.xlsx
    │   ├── 户外服装&户外休闲衣裤套装.xlsx
    │   ├── 户外服装&户外休闲裤.xlsx
    │   ├── 户外服装&户外服装.xlsx
    │   ├── 户外服装&抓绒衣.xlsx
    │   ├── 户外服装&抓绒裤.xlsx
    │   ├── 户外服装&滑雪衣.xlsx
    │   ├── 户外服装&滑雪衣裤套装.xlsx
    │   ├── 户外服装&滑雪裤.xlsx
    │   ├── 户外服装&潜水服.xlsx
    │   ├── 户外服装&羽绒衣.xlsx
    │   ├── 户外服装&软壳衣.xlsx
    │   ├── 户外服装&软壳裤.xlsx
    │   ├── 户外服装&运动户外风衣.xlsx
    │   ├── 户外服装&速干T恤.xlsx
    │   ├── 户外服装&速干背心.xlsx
    │   ├── 户外服装&速干衣裤套装.xlsx
    │   ├── 户外服装&速干衬衣.xlsx
    │   ├── 户外服装&速干裤.xlsx
    │   ├── 户外服装&钓鱼服.xlsx
    │   ├── 户外照明&信号灯-发光棒-救生灯.xlsx
    │   ├── 户外照明&充电器.xlsx
    │   ├── 户外照明&其他.xlsx
    │   ├── 户外照明&头灯.xlsx
    │   ├── 户外照明&户外照明.xlsx
    │   ├── 户外照明&手电筒.xlsx
    │   ├── 户外照明&电池-燃料.xlsx
    │   ├── 户外照明&营地灯-帐篷灯.xlsx
    │   ├── 户外照明&钓鱼灯.xlsx
    │   ├── 户外鞋靴&其他户外鞋.xlsx
    │   ├── 户外鞋靴&户外休闲鞋.xlsx
    │   ├── 户外鞋靴&户外鞋靴.xlsx
    │   ├── 户外鞋靴&攀岩鞋.xlsx
    │   ├── 户外鞋靴&沙滩鞋-凉鞋-拖鞋.xlsx
    │   ├── 户外鞋靴&溯溪鞋.xlsx
    │   ├── 户外鞋靴&滑雪鞋-雪地靴.xlsx
    │   ├── 户外鞋靴&登山鞋-徒步鞋.xlsx
    │   ├── 户外鞋靴&越野跑鞋.xlsx
    │   ├── 旅行便携装备&其他.xlsx
    │   ├── 旅行便携装备&其他安全防盗产品.xlsx
    │   ├── 旅行便携装备&旅行便携装备.xlsx
    │   ├── 旅行便携装备&普通密码锁.xlsx
    │   ├── 旅行便携装备&晾衣绳.xlsx
    │   ├── 旅行便携装备&转换插头.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&垂钓望远镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&户外眼镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&普通望远镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&望远镜-夜视仪-户外眼镜.xlsx
    │   ├── 望远镜-夜视仪-户外眼镜&望远镜配件.xlsx
    │   ├── 洗漱清洁-护理用品&防虫-防蚊用品.xlsx
    │   ├── 登山杖-手杖&登山杖-手杖.xlsx
    │   ├── 睡袋&睡袋.xlsx
    │   ├── 防护-救生装备&其他防护救生装备.xlsx
    │   ├── 防护-救生装备&急救包-急救箱.xlsx
    │   ├── 防护-救生装备&急救护理用品.xlsx
    │   ├── 防护-救生装备&求生哨.xlsx
    │   ├── 防护-救生装备&求生绳-逃生绳.xlsx
    │   ├── 防护-救生装备&求生锯-绳锯-线锯.xlsx
    │   ├── 防护-救生装备&防护-救生装备.xlsx
    │   ├── 防护-救生装备&防护面罩.xlsx
    │   ├── 防潮垫-地席-枕头&地布-地席.xlsx
    │   ├── 防潮垫-地席-枕头&枕头.xlsx
    │   ├── 防潮垫-地席-枕头&防潮垫-地席-枕头.xlsx
    │   └── 防潮垫-地席-枕头&防潮垫.xlsx
├── README.md
├── RFM
    ├── PYTHON-RFM实战数据.xlsx
    ├── README.md
    └── RFM模型实战案例代码.ipynb
├── TGI
    ├── README.md
    ├── TGI分析代码.ipynb
    └── TGI指数案例数据.xlsx
├── Weather+Email
    ├── README.md
    └── 天气爬虫+邮件发送.ipynb
└── Zhihu
    ├── README.md
    ├── 数据清洗.ipynb
    ├── 知乎爬取代码.ipynb
    ├── 第二个问题源数据.xlsx
    └── 过年工作问题.xlsx

/ChilisauceComment/README.md:
--------------------------------------------------------------------------------
1 | # Taking Data Analysis a Step Further: A Hands-On Project #
2 | 
3 | ----------
4 | 
5 | This project starts from a concrete scenario and uses a hands-on case to explore how to take an analysis one step further.
6 | 
7 | Main files:
8 | 
9 | - Chili sauce review data
10 | - Complete review analysis code
11 | 
12 | Follow the WeChat official account: 数据不吹牛
13 | 
14 | 
--------------------------------------------------------------------------------
/ChilisauceComment/李子柒辣椒酱评论.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/ChilisauceComment/李子柒辣椒酱评论.xlsx
--------------------------------------------------------------------------------
/Comments/README.md:
--------------------------------------------------------------------------------
1 | # How to Spot Fake Reviews Gracefully #
2 | 
3 | ----------
4 | 
5 | This project uses two simple, blunt methods to detect brushing (fake orders placed to pad reviews).
6 | 
7 | Main files:
8 | 
9 | - Review data for fake-review detection
10 | - Review data for a ginger anti-hair-loss shampoo
11 | - Fake-review detection code
12 | 
13 | Follow the WeChat official account: 数据不吹牛
14 | 
15 | 
--------------------------------------------------------------------------------
/Comments/刷单鉴定评价数据.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Comments/刷单鉴定评价数据.xlsx
--------------------------------------------------------------------------------
/Comments/生姜防脱洗发水.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Comments/生姜防脱洗发水.xlsx
--------------------------------------------------------------------------------
/Comments/评价刷单鉴定两板斧代码.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "### WeChat official account: 数据不吹牛, with more cases and fun analyses"
8 |    ]
9 |   },
10 |   {
11 |    "cell_type": "code",
12 |    "execution_count": 33,
13 |    "metadata": {},
14 |    "outputs": [],
15 |    "source": [
16 |     "import pandas as pd\n",
17 |     "import os\n",
18 |     "import matplotlib.pyplot as plt\n",
19 |     "\n",
20 |     "%matplotlib inline"
21 |    ]
22 |   },
23 |   {
24 |    "cell_type": "markdown",
25 |    "metadata": {},
26 |    "source": [
27 |     "### Load the data"
28 |    ]
29 |   },
30 |   {
31 |    "cell_type": "code",
32 |    "execution_count": 16,
33 |    "metadata": {},
34 |    "outputs": [],
35 |    "source": 
[
36 |     "os.chdir('C:\\\\Users\\\\Administrator\\\\Desktop\\\\JC数据集')"
37 |    ]
38 |   },
39 |   {
40 |    "cell_type": "code",
41 |    "execution_count": 17,
42 |    "metadata": {},
43 |    "outputs": [
44 |     {
45 |      "data": {
46 |       "text/html": [
47 |        "
[HTML table rendering omitted; the same rows appear in the text/plain output below]"
116 |       ],
117 |       "text/plain": [
118 |        " 产品ID 价格 总销量 总评价数 规格类型\n",
119 |        "0 59497802 189.0 22153 12269 套装\n",
120 |        "1 55594403 95.0 227064 53842 NaN\n",
121 |        "2 56419172 79.0 733418 130106 正常规格\n",
122 |        "3 58567235 89.0 480040 103975 常规单品\n",
123 |        "4 53625235 59.0 253606 49611 常规单品"
124 |       ]
125 |      },
126 |      "execution_count": 17,
127 |      "metadata": {},
128 |      "output_type": "execute_result"
129 |     }
130 |    ],
131 |    "source": [
132 |     "df = pd.read_excel('刷单鉴定评价数据.xlsx')\n",
133 |     "df.head()"
134 |    ]
135 |   },
136 |   {
137 |    "cell_type": "markdown",
138 |    "metadata": {},
139 |    "source": [
140 |     "### Compute the review-to-sales ratio (评销比)"
141 |    ]
142 |   },
143 |   {
144 |    "cell_type": "code",
145 |    "execution_count": 18,
146 |    "metadata": {},
147 |    "outputs": [
148 |     {
149 |      "data": {
150 |       "text/html": [
151 |        "
[HTML table rendering omitted; the same rows appear in the text/plain output below]"
226 |       ],
227 |       "text/plain": [
228 |        " 产品ID 价格 总销量 总评价数 规格类型 评销比\n",
229 |        "0 59497802 189.0 22153 12269 套装 55.383018\n",
230 |        "1 55594403 95.0 227064 53842 NaN 23.712257\n",
231 |        "2 56419172 79.0 733418 130106 正常规格 17.739679\n",
232 |        "3 58567235 89.0 480040 103975 常规单品 21.659653\n",
233 |        "4 53625235 59.0 253606 49611 常规单品 19.562234"
234 |       ]
235 |      },
236 |      "execution_count": 18,
237 |      "metadata": {},
238 |      "output_type": "execute_result"
239 |     }
240 |    ],
241 |    "source": [
242 |     "df['评销比'] = df['总评价数'] / df['总销量'] * 100\n",
243 |     "df.head()"
244 |    ]
245 |   },
246 |   {
247 |    "cell_type": "markdown",
248 |    "metadata": {},
249 |    "source": [
250 |     "### Inspect the distribution of the review-to-sales ratio"
251 |    ]
252 |   },
253 |   {
254 |    "cell_type": "code",
255 |    "execution_count": 19,
256 |    "metadata": {},
257 |    "outputs": [
258 |     {
259 |      "data": {
260 |       "text/plain": [
261 |        ""
262 |       ]
263 |      },
264 |      "execution_count": 19,
265 |      "metadata": {},
266 |      "output_type": "execute_result"
267 |     },
268 |     {
269 |      "data": {
270 |       "image/png": "[base64-encoded PNG omitted: histogram of the 评销比 column]\n",
271 |       "text/plain": [
272 |        ""
273 |       ]
274 |      },
275 |      "metadata": {
276 |       "needs_background": "light"
277 |      },
278 |      "output_type": "display_data"
279 |     }
280 |    ],
281 |    "source": [
282 |     "import seaborn as sns\n",
283 |     "import matplotlib.pyplot as plt\n",
284 |     "\n",
285 |     "fig,ax = plt.subplots(1,1,figsize = (10,5))\n",
286 |     "sns.distplot(df['评销比'],color = 'red',kde = False)\n",
287 |     "\n",
288 |     "plt.yticks(fontsize=11)\n",
289 |     "plt.xticks(fontsize=11)\n",
290 |     "\n",
291 |     "ax.set_xlabel('评销比', fontsize=14)"
292 |    ]
293 |   },
294 |   {
295 |    "cell_type": "markdown",
296 |    "metadata": {},
297 |    "source": [
298 |     "### Flag products suspected of fake reviews"
299 |    ]
300 |   },
301 |   {
302 |    "cell_type": "code",
303 |    "execution_count": 20,
304 |    "metadata": {},
305 |    "outputs": [
306 |     {
307 |      "data": {
308 |       "text/plain": [
309 |        "False 166\n",
310 |        "True 22\n",
311 |        "Name: 是否有刷单嫌疑, dtype: int64"
312 |       ]
313 |      },
314 |      "execution_count": 20,
315 |      "metadata": {},
316 |      "output_type": "execute_result"
317 |     }
318 |    ],
319 |    "source": [
320 |     "df['是否有刷单嫌疑'] = df['评销比'] > 40\n",
321 |     "df['是否有刷单嫌疑'].value_counts()"
322 |    ]
323 |   
},
324 |   {
325 |    "cell_type": "markdown",
326 |    "metadata": {},
327 |    "source": [
328 |     "### Load the review data"
329 |    ]
330 |   },
331 |   {
332 |    "cell_type": "code",
333 |    "execution_count": 28,
334 |    "metadata": {},
335 |    "outputs": [
336 |     {
337 |      "data": {
338 |       "text/html": [
339 |        "
[HTML table rendering omitted; the same rows appear in the text/plain output below]"
402 |       ],
403 |       "text/plain": [
404 |        " 买家 初评内容 评价日期 追评\n",
405 |        "0 摈**唉 昨天晚上用了一次, 姜味很浓,用过一段时间再看看效果吧,好用会再回购的! 2019-11-29 -\n",
406 |        "1 t**4 最近脱发特别严重,鬓角的头发最是损失惨重,抱着试试看的态度来的,目前我用了1个疗程感觉恢复得... 2019-11-29 -\n",
407 |        "2 露**发 最近头发大把大把的脱,特别是洗头的时候!刚开始是抱着试试的心态,每次都会隔断时间拍照自己对比... 2019-11-29 -\n",
408 |        "3 t**6 质量很好,效果不错 2019-11-29 -\n",
409 |        "4 去**5 这次放假回家看到老爸的大脑门,莫名的揪心,老爸为家庭操心了太多,头发一直在掉,这次买了这款防... 2019-11-29 -"
410 |       ]
411 |      },
412 |      "execution_count": 28,
413 |      "metadata": {},
414 |      "output_type": "execute_result"
415 |     }
416 |    ],
417 |    "source": [
418 |     "comments = pd.read_excel('生姜防脱洗发水.xlsx')\n",
419 |     "comments.head()"
420 |    ]
421 |   },
422 |   {
423 |    "cell_type": "markdown",
424 |    "metadata": {},
425 |    "source": [
426 |     "### Filter by review length"
427 |    ]
428 |   },
429 |   {
430 |    "cell_type": "code",
431 |    "execution_count": 29,
432 |    "metadata": {},
433 |    "outputs": [
434 |     {
435 |      "name": "stdout",
436 |      "output_type": "stream",
437 |      "text": [
438 |       "(1200, 5)\n"
439 |      ]
440 |     },
441 |     {
442 |      "data": {
443 |       "text/html": [
444 |        "
[HTML table rendering omitted; the same rows appear in the text/plain output below]"
513 |       ],
514 |       "text/plain": [
515 |        " 买家 初评内容 评价日期 追评 评价长度\n",
516 |        "0 摈**唉 昨天晚上用了一次, 姜味很浓,用过一段时间再看看效果吧,好用会再回购的! 2019-11-29 - 36\n",
517 |        "1 t**4 最近脱发特别严重,鬓角的头发最是损失惨重,抱着试试看的态度来的,目前我用了1个疗程感觉恢复得... 2019-11-29 - 80\n",
518 |        "2 露**发 最近头发大把大把的脱,特别是洗头的时候!刚开始是抱着试试的心态,每次都会隔断时间拍照自己对比... 2019-11-29 - 85\n",
519 |        "4 去**5 这次放假回家看到老爸的大脑门,莫名的揪心,老爸为家庭操心了太多,头发一直在掉,这次买了这款防... 2019-11-29 - 76\n",
520 |        "5 德**艺 以前就用过这款生姜洗发水防脱发效果真的很好,这次这个疗程是买来巩固的用过之后脱发已经很少了,... 2019-11-29 - 60"
521 |       ]
522 |      },
523 |      "execution_count": 29,
524 |      "metadata": {},
525 |      "output_type": "execute_result"
526 |     }
527 |    ],
528 |    "source": [
529 |     "comments['评价长度'] = comments['初评内容'].apply(len)\n",
530 |     "comments = comments.loc[comments['评价长度'] > 15,:]\n",
531 |     "print(comments.shape)\n",
532 |     "comments.head()"
533 |    ]
534 |   },
535 |   {
536 |    "cell_type": "markdown",
537 |    "metadata": {},
538 |    "source": [
539 |     "### Sort by content to find suspicious reviews"
540 |    ]
541 |   },
542 |   {
543 |    "cell_type": "code",
544 |    "execution_count": 31,
545 |    "metadata": {},
546 |    "outputs": [
547 |     {
548 |      "data": {
549 |       "text/html": [
550 |        "
[HTML table rendering omitted; the same rows appear in the text/plain output below]"
627 |       ],
628 |       "text/plain": [
629 |        " 买家 初评内容 评价日期 \\\n",
630 |        "1307 你**个 感觉越洗头发掉得越多,每次洗必须要用洗发水两次以上,还要搓按5分钟,这样洗下去头发本来就少,... 2019-07-11 \n",
631 |        "1147 y**8 使用了第二次才来评价的,我头发很长(齐膝)掉得特别厉害。之前使用防脱洗发水用完之后呢换成了潘... 2019-09-02 \n",
632 |        "629 0**b 1客服小海马说寄来的品牌是柏诗春天,我下单购买的海洋诗韵,俩不同品牌都是一个厂家生产的,让我... 2019-10-22 \n",
633 |        "151 t**1 自从高考那个紧张的阶段后,我的头发就很会掉,每天都房间里,床铺上地上都可以看到我的掉发,每次... 2019-11-21 \n",
634 |        "587 女**8 自从高考那个紧张的阶段后,我的头发就很会掉,每天都房间里,床铺上地上都可以看到我的掉发,每次... 2019-10-24 \n",
635 |        "674 e**1 自从高考那个紧张的阶段后,我的头发就很会掉,每天都房间里,床铺上地上都可以看到我的掉发,每次... 2019-10-16 \n",
636 |        "\n",
637 |        " 追评 评价长度 \n",
638 |        "1307 我是短发,洗一次掉这么多,以前洗只掉几根,洗了之后头痒的要死,当初客服说用了不适应可以退,现... 348 \n",
639 |        "1147 长头发的妹子可以试试这款洗发水哦!我现在掉发已经开始在变少了,开心 290 \n",
640 |        "629 - 290 \n",
641 |        "151 - 177 \n",
642 |        "587 - 177 \n",
643 |        "674 - 177 "
644 |       ]
645 |      },
646 |      "execution_count": 31,
647 |      "metadata": {},
648 |      "output_type": "execute_result"
649 |     }
650 |    ],
651 |    "source": [
652 |     "comments = comments.sort_values(['评价长度','初评内容'],ascending = False)\n",
653 |     "comments.head(6)"
654 |    ]
655 |   },
656 |   {
657 |    "cell_type": "markdown",
658 |    "metadata": {},
659 |    "source": [
660 |     "### Count duplicated reviews"
661 |    ]
662 |   },
663 |   {
664 |    "cell_type": "code",
665 |    "execution_count": 27,
666 |    "metadata": {},
667 |    "outputs": [
668 |     {
669 |      "name": "stdout",
670 |      "output_type": "stream",
671 |      "text": [
672 |       "Total reviews: 1200\n",
673 |       "Share of duplicated reviews: 31.5%\n"
674 |      ]
675 |     }
676 |    ],
677 |    "source": [
678 |     "# Group by review text and count how many times each text appears\n",
679 |     "filt = comments.groupby('初评内容')['买家'].count().reset_index()\n",
680 |     "filt.columns = ['初评内容','重复次数']\n",
681 |     "\n",
682 |     "# Total number of reviews whose text appears more than once\n",
683 |     "reap = filt.loc[filt['重复次数'] > 1,'重复次数'].sum()\n",
684 |     "\n",
685 |     "print('Total reviews:',len(comments))\n",
686 |     "print('Share of duplicated reviews: {}%'.format(reap / len(comments) * 100))"
687 |    ]
688 |   },
689 |   {
690 |    "cell_type": "code",
691 |    "execution_count": null,
692 |    "metadata": {},
693 |    "outputs": [],
694 |    "source": []
695 |   }
696 |  ],
697 |  "metadata": {
698 |   "kernelspec": {
699 |    "display_name": "Python 3",
700 |    "language": "python",
701 |    "name": "python3"
702 |   },
703 |   "language_info": {
704 |    "codemirror_mode": {
705 |     "name": "ipython",
706 |     "version": 3
707 |    },
708 |    "file_extension": ".py",
709 |    "mimetype": "text/x-python",
710 |    "name": "python",
711 |    "nbconvert_exporter": "python",
712 |    "pygments_lexer": "ipython3",
713 |    "version": "3.5.3"
714 |   }
715 |  },
716 |  "nbformat": 4,
717 |  "nbformat_minor": 2
718 | }
719 | 
--------------------------------------------------------------------------------
/DoubanMovies/README.md:
--------------------------------------------------------------------------------
1 | # Douban Movie Scraping and a Custom By-Decade Ranking #
2 | 
3 | ----------
4 | 
5 | Using Douban movies as an example, this project walks through the full scraping, cleaning, and analysis process, laying out the chain of reasoning behind the analysis in as much detail as possible.
6 | 
7 | Main files:
8 | 
9 | - Scraping code for 9,000 Douban movies
10 | - Detailed cleaning and analysis code for the scraped data
11 | - Two raw scraped datasets and the final movie ranking
12 | 
13 | 
--------------------------------------------------------------------------------
/DoubanMovies/最终电影排名结果.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/DoubanMovies/最终电影排名结果.xlsx
--------------------------------------------------------------------------------
/DoubanMovies/电影基本信息大全.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/DoubanMovies/电影基本信息大全.xlsx
--------------------------------------------------------------------------------
/DoubanMovies/电影详细信息.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/DoubanMovies/电影详细信息.xlsx
--------------------------------------------------------------------------------
/DoubanMovies/豆瓣电影爬取.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 3,
6 |    "metadata": {
7 |     "collapsed": true
8 |    },
9 |    "outputs": [],
10 |    "source": [
11 |     "import os\n",
12 |     "import requests\n",
13 |     "import pandas as pd\n",
14 |     "import numpy as np\n",
15 |     "import json\n",
16 |     "import time\n",
17 |     "import random\n",
18 |     "from lxml import etree"
19 |    ]
20 |   },
21 |   {
22 |    "cell_type": "markdown",
23 |    "metadata": {},
24 |    "source": [
25 |     "### Movie data is loaded dynamically; every request uses the same base URL and only `start` changes"
26 |    ]
27 |   },
28 |   {
29 |    "cell_type": "code",
30 |    "execution_count": 5,
31 |    "metadata": {
32 |     "collapsed": true
33 |    },
34 |    "outputs": [],
35 |    "source": [
36 |     "url1 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=0'\n",
37 |     "url2 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=20'\n",
38 |     "url3 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=40'\n",
39 |     "url4 = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start=60'"
40 |    ]
41 |   },
42 |   {
43 |    "cell_type": "markdown",
44 |    "metadata": {},
45 |    "source": [
46 |     "### Build the URLs to crawl"
47 |    ]
48 |   },
49 |   {
50 |    "cell_type": "code",
51 |    "execution_count": 9,
52 |    "metadata": {
53 |     "collapsed": true
54 |    },
55 |    "outputs": [],
56 |    "source": [
57 |     "# Build the page URLs\n",
58 |     "def format_url(num):\n",
59 |     "    urls = []\n",
60 |     "    base_url = 'https://movie.douban.com/j/new_search_subjects?sort=T&range=0,10&tags=%E7%94%B5%E5%BD%B1&start={}'\n",
61 |     "    for i in range(0,20 * num,20):\n",
62 |     "        url = base_url.format(i)\n",
63 |     "        urls.append(url)\n",
64 |     "    return urls\n",
65 |     "\n",
66 |     "# Crawl 10 pages here; change the argument as needed\n",
67 |     "urls = format_url(10)"
68 |    ]
69 |   },
70 |   {
71 |    "cell_type": "markdown",
72 |    "metadata": {
73 |     "collapsed": true
74 |    },
75 |    "source": [
76 |     "### Spoof the request headers"
77 |    ]
78 |   },
79 |   {
80 |    "cell_type": "code",
81 |    "execution_count": 8,
82 |    "metadata": {
83 |     "collapsed": true
84 |    },
85 |    "outputs": [],
86 |    "source": [
87 |     "headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}"
88 |    ]
89 |   },
90 |   {
91 |    "cell_type": "markdown",
92 |    "metadata": {},
93 |    "source": [
94 |     "### Parse a single page"
95 |    ]
96 |   },
97 |   {
98 |    "cell_type": "code",
99 |    "execution_count": 11,
100 |    "metadata": {
101 |     "collapsed": true
102 |    },
103 |    "outputs": [],
104 |    "source": [
105 |     "def parse_base_info(url,headers):\n",
106 |     "    html = requests.get(url,headers = headers)\n",
107 |     "    bs = json.loads(html.text)\n",
108 |     "    df = pd.DataFrame()\n",
109 |     "    for i in bs['data']:\n",
110 |     "        casts = i['casts'] # cast\n",
111 |     "        cover = i['cover'] # poster\n",
112 |     "        directors = i['directors'] # directors\n",
113 |     "        m_id = i['id'] # ID\n",
114 |     "        rate = i['rate'] # rating\n",
115 |     "        star = i['star'] # number of marks\n",
116 |     "        title = i['title'] # title\n",
117 |     "        url = i['url'] # URL\n",
118 |     "        cache = pd.DataFrame({'主演':[casts],'海报':[cover],'导演':[directors],\n",
119 |     "                              'ID':[m_id],'评分':[rate],'标记':[star],'片名':[title],'网址':[url]})\n",
120 |     "        df = pd.concat([df,cache])\n",
121 |     "    return df"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "markdown",
126 |    "metadata": {
127 |     "collapsed": true
128 |    },
129 |    "source": [
130 |     "### Crawl the movies in a loop"
131 |    ]
132 |   },
133 |   {
134 |    "cell_type": "code",
135 |    "execution_count": 12,
136 |    "metadata": {
137 |     "collapsed": false
138 |    },
139 |    "outputs": [
140 |     {
141 |      "name": "stdout",
142 |      "output_type": "stream",
143 |      "text": [
144 |       "I had crawled page of:1\n",
145 |       "I had crawled page of:2\n",
146 |       "I had crawled page of:3\n",
147 |       "I had crawled page of:4\n",
148 |       "I had crawled page of:5\n",
149 |       "I had crawled page of:6\n",
150 |       "I had crawled page of:7\n",
151 |       "I had crawled page of:8\n",
152 |       "I had crawled page of:9\n",
153 |       "I had crawled page of:10\n"
154 |      ]
155 |     }
156 |    ],
157 |    "source": [
158 |     "result = pd.DataFrame()\n",
159 |     "\n",
160 |     "count = 1\n",
161 |     "for url in urls:\n",
162 |     "    df = parse_base_info(url,headers = headers)\n",
163 |     "    result = pd.concat([result,df])\n",
164 |     "    time.sleep(random.random() + 2)\n",
165 |     "    print('I had crawled page of:%d' % count)\n",
166 |     "    count += 1"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "code",
171 |    "execution_count": 13,
172 |    "metadata": {
173 |     "collapsed": false
174 |    },
175 |    "outputs": [
176 |     {
177 |      "data": {
178 |       "text/html": [
179 |        "
| | ID | 主演 | 导演 | 标记 | 海报 | 片名 | 网址 | 评分 |
|---|---|---|---|---|---|---|---|---|
| 0 | 26752088 | [徐峥, 王传君, 周一围, 谭卓, 章宇] | [文牧野] | 45 | https://img3.doubanio.com/view/photo/s_ratio_p... | 我不是药神 | https://movie.douban.com/subject/26752088/ | 9.0 |
| 0 | 1295644 | [让·雷诺, 娜塔莉·波特曼, 加里·奥德曼, 丹尼·爱罗, 彼得·阿佩尔] | [吕克·贝松] | 45 | https://img3.doubanio.com/view/photo/s_ratio_p... | 这个杀手不太冷 | https://movie.douban.com/subject/1295644/ | 9.4 |
| 0 | 1292052 | [蒂姆·罗宾斯, 摩根·弗里曼, 鲍勃·冈顿, 威廉姆·赛德勒, 克兰西·布朗] | [弗兰克·德拉邦特] | 50 | https://img3.doubanio.com/view/photo/s_ratio_p... | 肖申克的救赎 | https://movie.douban.com/subject/1292052/ | 9.7 |
| 0 | 26266893 | [屈楚萧, 吴京, 李光洁, 吴孟达, 赵今麦] | [郭帆] | 40 | https://img3.doubanio.com/view/photo/s_ratio_p... | 流浪地球 | https://movie.douban.com/subject/26266893/ | 7.9 |
| 0 | 1292720 | [汤姆·汉克斯, 罗宾·怀特, 加里·西尼斯, 麦凯尔泰·威廉逊, 莎莉·菲尔德] | [罗伯特·泽米吉斯] | 45 | https://img3.doubanio.com/view/photo/s_ratio_p... | 阿甘正传 | https://movie.douban.com/subject/1292720/ | 9.4 |
" 266 | ], 267 | "text/plain": [ 268 | " ID 主演 导演 标记 \\\n", 269 | "0 26752088 [徐峥, 王传君, 周一围, 谭卓, 章宇] [文牧野] 45 \n", 270 | "0 1295644 [让·雷诺, 娜塔莉·波特曼, 加里·奥德曼, 丹尼·爱罗, 彼得·阿佩尔] [吕克·贝松] 45 \n", 271 | "0 1292052 [蒂姆·罗宾斯, 摩根·弗里曼, 鲍勃·冈顿, 威廉姆·赛德勒, 克兰西·布朗] [弗兰克·德拉邦特] 50 \n", 272 | "0 26266893 [屈楚萧, 吴京, 李光洁, 吴孟达, 赵今麦] [郭帆] 40 \n", 273 | "0 1292720 [汤姆·汉克斯, 罗宾·怀特, 加里·西尼斯, 麦凯尔泰·威廉逊, 莎莉·菲尔德] [罗伯特·泽米吉斯] 45 \n", 274 | "\n", 275 | " 海报 片名 \\\n", 276 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 我不是药神 \n", 277 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 这个杀手不太冷 \n", 278 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 肖申克的救赎 \n", 279 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 流浪地球 \n", 280 | "0 https://img3.doubanio.com/view/photo/s_ratio_p... 阿甘正传 \n", 281 | "\n", 282 | " 网址 评分 \n", 283 | "0 https://movie.douban.com/subject/26752088/ 9.0 \n", 284 | "0 https://movie.douban.com/subject/1295644/ 9.4 \n", 285 | "0 https://movie.douban.com/subject/1292052/ 9.7 \n", 286 | "0 https://movie.douban.com/subject/26266893/ 7.9 \n", 287 | "0 https://movie.douban.com/subject/1292720/ 9.4 " 288 | ] 289 | }, 290 | "execution_count": 13, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "result.head()" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": { 302 | "collapsed": false 303 | }, 304 | "source": [ 305 | "### 解析单个页面,获取详细的电影信息" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 14, 311 | "metadata": { 312 | "collapsed": false 313 | }, 314 | "outputs": [], 315 | "source": [ 316 | "def parse_movie_info(url,headers = headers,ip = ''):\n", 317 | " if ip == '':\n", 318 | " html = requests.get(url,headers = headers)\n", 319 | " else:\n", 320 | " html = requests.get(url,headers = headers,proxies = ip)\n", 321 | " bs = etree.HTML(html.text)\n", 322 | " #片名\n", 323 | " title = bs.xpath('//div[@id = \"wrapper\"]/div/h1/span')[0].text \n", 324 | " #上映时间\n", 325 | " year 
= bs.xpath('//div[@id = \"wrapper\"]/div/h1/span')[1].text\n", 326 | "    # genres\n", 327 | "    m_type = []\n", 328 | "    for t in bs.xpath('//span[@property = \"v:genre\"]'):\n", 329 | "        m_type.append(t.text)\n", 330 | "    a = bs.xpath('//div[@id= \"info\"]')[0].xpath('string()')\n", 331 | "    # runtime\n", 332 | "    m_time =a[a.find('片长: ') + 4:a.find('分钟\\n')] # duration\n", 333 | "    # region\n", 334 | "    area = a[a.find('制片国家/地区:') + 9:a.find('\\n 语言')] # region\n", 335 | "    # number of raters\n", 336 | "    try:\n", 337 | "        people = bs.xpath('//a[@class = \"rating_people\"]/span')[0].text\n", 338 | "        # rating distribution\n", 339 | "        rating = {}\n", 340 | "        rate_count = bs.xpath('//div[@class = \"ratings-on-weight\"]/div')\n", 341 | "        for rate in rate_count:\n", 342 | "            rating[rate.xpath('span/@title')[0]] = rate.xpath('span[@class = \"rating_per\"]')[0].text\n", 343 | "    except Exception:\n", 344 | "        people = 'None'\n", 345 | "        rating = {}\n", 346 | "    # synopsis\n", 347 | "    try:\n", 348 | "        brief = bs.xpath('//span[@property = \"v:summary\"]')[0].text.strip('\\n \\u3000\\u3000')\n", 349 | "    except Exception:\n", 350 | "        brief = 'None'\n", 351 | "    try:\n", 352 | "        hot_comment = bs.xpath('//div[@id = \"hot-comments\"]/div/div/p/span')[0].text\n", 353 | "    except Exception:\n", 354 | "        hot_comment = 'None'\n", 355 | "    cache = pd.DataFrame({'片名':[title],'上映时间':[year],'电影类型':[m_type],'片长':[m_time],\n", 356 | "                          '地区':[area],'评分人数':[people],'评分分布':[rating],'简介':[brief],'热评':[hot_comment],'网址':[url]})\n", 357 | "    return cache" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": { 363 | "collapsed": true 364 | }, 365 | "source": [ 366 | "### Visit each movie page in a batch" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 15, 372 | "metadata": { 373 | "collapsed": false 374 | }, 375 | "outputs": [ 376 | { 377 | "name": "stdout", 378 | "output_type": "stream", 379 | "text": [ 380 | "我们爬取了第:1部电影-------我不是药神\n", 381 | "我们爬取了第:2部电影-------这个杀手不太冷\n", 382 | "我们爬取了第:3部电影-------肖申克的救赎\n", 383 | "我们爬取了第:4部电影-------流浪地球\n", 384 | "我们爬取了第:5部电影-------阿甘正传\n", 
385 | "我们爬取了第:6部电影-------复仇者联盟3:无限战争\n", 386 | "我们爬取了第:7部电影-------盗梦空间\n", 387 | "我们爬取了第:8部电影-------西虹市首富\n", 388 | "我们爬取了第:9部电影-------泰坦尼克号\n", 389 | "我们爬取了第:10部电影-------千与千寻\n", 390 | "我们爬取了第:11部电影-------霸王别姬\n", 391 | "我们爬取了第:12部电影-------三傻大闹宝莱坞\n", 392 | "我们爬取了第:13部电影-------让子弹飞\n", 393 | "我们爬取了第:14部电影-------怦然心动\n", 394 | "我们爬取了第:15部电影-------摔跤吧!爸爸\n", 395 | "我们爬取了第:16部电影-------毒液:致命守护者\n", 396 | "我们爬取了第:17部电影-------疯狂动物城\n", 397 | "我们爬取了第:18部电影-------忠犬八公的故事\n", 398 | "我们爬取了第:19部电影-------一出好戏\n", 399 | "我们爬取了第:20部电影-------当幸福来敲门\n", 400 | "我们爬取了第:21部电影-------海上钢琴师\n", 401 | "我们爬取了第:22部电影-------大话西游之大圣娶亲\n", 402 | "我们爬取了第:23部电影-------海王\n", 403 | "我们爬取了第:24部电影-------楚门的世界\n", 404 | "我们爬取了第:25部电影-------你的名字。\n", 405 | "我们爬取了第:26部电影-------阿凡达\n", 406 | "我们爬取了第:27部电影-------少年派的奇幻漂流\n", 407 | "我们爬取了第:28部电影-------星际穿越\n", 408 | "我们爬取了第:29部电影-------头号玩家\n", 409 | "我们爬取了第:30部电影-------无双\n", 410 | "我们爬取了第:31部电影-------放牛班的春天\n", 411 | "我们爬取了第:32部电影-------飞屋环游记\n", 412 | "我们爬取了第:33部电影-------机器人总动员\n", 413 | "我们爬取了第:34部电影-------那些年,我们一起追的女孩\n", 414 | "我们爬取了第:35部电影-------龙猫\n", 415 | "我们爬取了第:36部电影-------寻梦环游记\n", 416 | "我们爬取了第:37部电影-------红海行动\n", 417 | "我们爬取了第:38部电影-------初恋这件小事\n", 418 | "我们爬取了第:39部电影-------大话西游之月光宝盒\n", 419 | "我们爬取了第:40部电影-------无名之辈\n", 420 | "我们爬取了第:41部电影-------无间道\n", 421 | "我们爬取了第:42部电影-------天使爱美丽\n", 422 | "我们爬取了第:43部电影-------碟中谍6:全面瓦解\n", 423 | "我们爬取了第:44部电影-------剪刀手爱德华\n", 424 | "我们爬取了第:45部电影-------复仇者联盟\n", 425 | "我们爬取了第:46部电影-------战狼2\n", 426 | "我们爬取了第:47部电影-------美丽人生\n", 427 | "我们爬取了第:48部电影-------绿皮书\n", 428 | "我们爬取了第:49部电影-------飞驰人生\n", 429 | "我们爬取了第:50部电影-------罗马假日\n", 430 | "我们爬取了第:51部电影-------V字仇杀队\n", 431 | "我们爬取了第:52部电影-------唐伯虎点秋香\n", 432 | "我们爬取了第:53部电影-------夏洛特烦恼\n", 433 | "我们爬取了第:54部电影-------唐人街探案2\n", 434 | "我们爬取了第:55部电影-------动物世界\n", 435 | "我们爬取了第:56部电影-------辛德勒的名单\n", 436 | "我们爬取了第:57部电影-------芳华\n", 437 | "我们爬取了第:58部电影-------人再囧途之泰囧\n", 438 | "我们爬取了第:59部电影-------老炮儿\n", 439 | "我们爬取了第:60部电影-------釜山行\n", 440 | 
"我们爬取了第:61部电影-------蝴蝶效应\n", 441 | "我们爬取了第:62部电影-------神偷奶爸\n", 442 | "我们爬取了第:63部电影-------七宗罪\n", 443 | "我们爬取了第:64部电影-------邪不压正\n", 444 | "我们爬取了第:65部电影-------疯狂的外星人\n", 445 | "我们爬取了第:66部电影-------哈尔的移动城堡\n", 446 | "我们爬取了第:67部电影-------复仇者联盟4:终局之战\n", 447 | "我们爬取了第:68部电影-------蚁人2:黄蜂女现身\n", 448 | "我们爬取了第:69部电影-------失恋33天\n", 449 | "我们爬取了第:70部电影-------看不见的客人\n", 450 | "我们爬取了第:71部电影-------蝙蝠侠:黑暗骑士\n", 451 | "我们爬取了第:72部电影-------湄公河行动\n", 452 | "我们爬取了第:73部电影-------加勒比海盗\n", 453 | "我们爬取了第:74部电影-------本杰明·巴顿奇事\n", 454 | "我们爬取了第:75部电影-------喜剧之王\n", 455 | "我们爬取了第:76部电影-------西西里的美丽传说\n", 456 | "我们爬取了第:77部电影-------美人鱼\n", 457 | "我们爬取了第:78部电影-------中国合伙人\n", 458 | "我们爬取了第:79部电影-------小偷家族\n", 459 | "我们爬取了第:80部电影-------疯狂原始人\n", 460 | "我们爬取了第:81部电影-------触不可及\n", 461 | "我们爬取了第:82部电影-------钢铁侠\n", 462 | "我们爬取了第:83部电影-------后会无期\n", 463 | "我们爬取了第:84部电影-------超能陆战队\n", 464 | "我们爬取了第:85部电影-------黑天鹅\n", 465 | "我们爬取了第:86部电影-------北京遇上西雅图\n", 466 | "我们爬取了第:87部电影-------情书\n", 467 | "我们爬取了第:88部电影-------奇异博士\n", 468 | "我们爬取了第:89部电影-------教父\n", 469 | "我们爬取了第:90部电影-------血战钢锯岭\n", 470 | "我们爬取了第:91部电影-------天空之城\n", 471 | "我们爬取了第:92部电影-------功夫\n", 472 | "我们爬取了第:93部电影-------超时空同居\n", 473 | "我们爬取了第:94部电影-------禁闭岛\n", 474 | "我们爬取了第:95部电影-------银河护卫队\n", 475 | "我们爬取了第:96部电影-------倩女幽魂\n", 476 | "我们爬取了第:97部电影-------无问西东\n", 477 | "我们爬取了第:98部电影-------唐人街探案\n", 478 | "我们爬取了第:99部电影-------羞羞的铁拳\n", 479 | "我们爬取了第:100部电影-------复仇者联盟2:奥创纪元\n", 480 | "我们爬取了第:101部电影-------贫民窟的百万富翁\n", 481 | "我们爬取了第:102部电影-------搏击俱乐部\n", 482 | "我们爬取了第:103部电影-------源代码\n", 483 | "我们爬取了第:104部电影-------爱乐之城\n", 484 | "我们爬取了第:105部电影-------七月与安生\n", 485 | "我们爬取了第:106部电影-------闻香识女人\n", 486 | "我们爬取了第:107部电影-------狮子王\n", 487 | "我们爬取了第:108部电影-------沉默的羔羊\n", 488 | "我们爬取了第:109部电影-------穿普拉达的女王\n", 489 | "我们爬取了第:110部电影-------驴得水\n", 490 | "我们爬取了第:111部电影-------黑客帝国\n", 491 | "我们爬取了第:112部电影-------疯狂的石头\n", 492 | "我们爬取了第:113部电影-------哈利·波特与魔法石\n", 493 | "我们爬取了第:114部电影-------妖猫传\n", 494 | "我们爬取了第:115部电影-------美国队长3\n", 495 | 
"我们爬取了第:116部电影-------天才枪手\n", 496 | "我们爬取了第:117部电影-------我的少女时代\n", 497 | "我们爬取了第:118部电影-------敦刻尔克\n", 498 | "我们爬取了第:119部电影-------重庆森林\n", 499 | "我们爬取了第:120部电影-------低俗小说\n", 500 | "我们爬取了第:121部电影-------西游记之大圣归来\n", 501 | "我们爬取了第:122部电影-------人在囧途\n", 502 | "我们爬取了第:123部电影-------消失的爱人\n", 503 | "我们爬取了第:124部电影-------王牌特工:特工学院\n", 504 | "我们爬取了第:125部电影-------国王的演讲\n", 505 | "我们爬取了第:126部电影-------美国队长2\n", 506 | "我们爬取了第:127部电影-------美丽心灵\n", 507 | "我们爬取了第:128部电影-------熔炉\n", 508 | "我们爬取了第:129部电影-------影\n", 509 | "我们爬取了第:130部电影-------钢铁侠3\n", 510 | "我们爬取了第:131部电影-------指环王1:魔戒再现\n", 511 | "我们爬取了第:132部电影-------火星救援\n", 512 | "我们爬取了第:133部电影-------钢铁侠2\n", 513 | "我们爬取了第:134部电影-------蚁人\n", 514 | "我们爬取了第:135部电影-------傲慢与偏见\n", 515 | "我们爬取了第:136部电影-------致命魔术\n", 516 | "我们爬取了第:137部电影-------三块广告牌\n", 517 | "我们爬取了第:138部电影-------布达佩斯大饭店\n", 518 | "我们爬取了第:139部电影-------东邪西毒\n", 519 | "我们爬取了第:140部电影-------断背山\n", 520 | "我们爬取了第:141部电影-------银河护卫队2\n", 521 | "我们爬取了第:142部电影-------寻龙诀\n", 522 | "我们爬取了第:143部电影-------西游降魔篇\n", 523 | "我们爬取了第:144部电影-------秒速5厘米\n", 524 | "我们爬取了第:145部电影-------指环王3:王者无敌\n", 525 | "我们爬取了第:146部电影-------活着\n", 526 | "我们爬取了第:147部电影-------2012\n", 527 | "我们爬取了第:148部电影-------恐怖游轮\n", 528 | "我们爬取了第:149部电影-------蜘蛛侠:英雄归来\n", 529 | "我们爬取了第:150部电影-------告白\n", 530 | "我们爬取了第:151部电影-------功夫熊猫\n", 531 | "我们爬取了第:152部电影-------被嫌弃的松子的一生\n", 532 | "我们爬取了第:153部电影-------驯龙高手\n", 533 | "我们爬取了第:154部电影-------神奇动物:格林德沃之罪\n", 534 | "我们爬取了第:155部电影-------蝙蝠侠:黑暗骑士崛起\n", 535 | "我们爬取了第:156部电影-------冰雪奇缘\n", 536 | "我们爬取了第:157部电影-------哈利·波特与死亡圣器(下)\n", 537 | "我们爬取了第:158部电影-------冰川时代\n", 538 | "我们爬取了第:159部电影-------志明与春娇\n", 539 | "我们爬取了第:160部电影-------致命ID\n", 540 | "我们爬取了第:161部电影-------乘风破浪\n", 541 | "我们爬取了第:162部电影-------金陵十三钗\n", 542 | "我们爬取了第:163部电影-------美国队长\n", 543 | "我们爬取了第:164部电影-------黑豹\n", 544 | "我们爬取了第:165部电影-------哪吒之魔童降世\n", 545 | "我们爬取了第:166部电影-------雷神3:诸神黄昏\n", 546 | "我们爬取了第:167部电影-------指环王2:双塔奇兵\n", 547 | "我们爬取了第:168部电影-------勇敢的心\n", 548 | 
"我们爬取了第:169部电影-------天堂电影院\n", 549 | "我们爬取了第:170部电影-------惊天魔盗团\n", 550 | "我们爬取了第:171部电影-------致我们终将逝去的青春\n", 551 | "我们爬取了第:172部电影-------真爱至上\n", 552 | "我们爬取了第:173部电影-------射雕英雄传之东成西就\n", 553 | "我们爬取了第:174部电影-------大黄蜂\n", 554 | "我们爬取了第:175部电影-------怪兽电力公司\n", 555 | "我们爬取了第:176部电影-------捉妖记\n", 556 | "我们爬取了第:177部电影-------了不起的盖茨比\n", 557 | "我们爬取了第:178部电影-------速度与激情7\n", 558 | "我们爬取了第:179部电影-------死亡诗社\n", 559 | "我们爬取了第:180部电影-------阳光姐妹淘\n", 560 | "我们爬取了第:181部电影-------乱世佳人\n", 561 | "我们爬取了第:182部电影-------入殓师\n", 562 | "我们爬取了第:183部电影-------岁月神偷\n", 563 | "我们爬取了第:184部电影-------心灵捕手\n", 564 | "我们爬取了第:185部电影-------色,戒\n", 565 | "我们爬取了第:186部电影-------猫鼠游戏\n", 566 | "我们爬取了第:187部电影-------超体\n", 567 | "我们爬取了第:188部电影-------阳光灿烂的日子\n", 568 | "我们爬取了第:189部电影-------烈日灼心\n", 569 | "我们爬取了第:190部电影-------拯救大兵瑞恩\n", 570 | "我们爬取了第:191部电影-------蜘蛛侠:平行宇宙\n", 571 | "我们爬取了第:192部电影-------神奇动物在哪里\n", 572 | "我们爬取了第:193部电影-------X战警:逆转未来\n", 573 | "我们爬取了第:194部电影-------雷神\n", 574 | "我们爬取了第:195部电影-------调音师\n", 575 | "我们爬取了第:196部电影-------恋恋笔记本\n", 576 | "我们爬取了第:197部电影-------记忆碎片\n", 577 | "我们爬取了第:198部电影-------暮光之城\n", 578 | "我们爬取了第:199部电影-------雷神2:黑暗世界\n", 579 | "我们爬取了第:200部电影-------两小无猜\n" 580 | ] 581 | } 582 | ], 583 | "source": [ 584 | "movie_result = pd.DataFrame()\n", 585 | "ip = '' #这里构建自己的IP池\n", 586 | "count2 = 1\n", 587 | "cw = 1\n", 588 | "\n", 589 | "for url,name in zip(result['网址'].values,result['片名'].values):\n", 590 | "#for name,url in wrongs.items():\n", 591 | " try:\n", 592 | " cache = parse_movie_info(url,headers = headers,ip = ip)\n", 593 | " movie_result = pd.concat([movie_result,cache])\n", 594 | " #time.sleep(random.random())\n", 595 | " print('我们爬取了第:%d部电影-------%s' % (count2,name))\n", 596 | " count2 += 1\n", 597 | " except:\n", 598 | " print('滴滴滴滴滴,第{}次报错'.format(cw))\n", 599 | " print('ip is:{}'.format(ip))\n", 600 | " cw += 1\n", 601 | " time.sleep(2)\n", 602 | " continue" 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": 17, 608 | "metadata": 
{ 609 | "collapsed": false 610 | }, 611 | "outputs": [ 612 | { 613 | "data": { 614 | "text/html": [ 615 | "
| | 上映时间 | 地区 | 热评 | 片名 | 片长 | 电影类型 | 简介 | 网址 | 评分人数 | 评分分布 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (2018) | 中国大陆 | “你敢保证你一辈子不得病?”纯粹、直接、有力!常常感叹:电影只能是电影。但每看到这样的佳作,... | 我不是药神 | 117 | [剧情, 喜剧] | 普通中年男子程勇(徐峥 饰)经营着一家保健品店,失意又失婚。不速之客吕受益(王传君 饰)的到... | https://movie.douban.com/subject/26752088/ | 1174897 | {'还行': '7.0%', '力荐': '57.4%', '较差': '0.5%', '很... |
| 0 | (1994) | 法国 | 他们居然没做爱 | 这个杀手不太冷 Léon | 110分钟(剧场版) / 133分钟(国际版)\\n 又名: 杀手莱昂 / 终极... | [剧情, 动作, 犯罪] | 里昂(让·雷诺饰)是名孤独的职业杀手,受人雇佣。一天,邻居家小姑娘马蒂尔达(纳塔丽·波特曼饰... | https://movie.douban.com/subject/1295644/ | 1380628 | {'还行': '3.2%', '力荐': '74.2%', '较差': '0.2%', '很... |
| 0 | (1994) | 美国 | 关于希望最强有力的注释。 | 肖申克的救赎 The Shawshank Redemption | 142 | [剧情, 犯罪] | 20世纪40年代末,小有成就的青年银行家安迪(蒂姆·罗宾斯 Tim Robbins 饰)因涉... | https://movie.douban.com/subject/1292052/ | 1525345 | {'还行': '1.5%', '力荐': '84.6%', '较差': '0.1%', '很... |
| 0 | (2019) | 中国大陆 | 1.终于,轮到我们仰望星空。2.后启示录死亡废墟,赛博朋克地下城,以及烟波浩渺的末日想象,缔... | 流浪地球 | 125 | [科幻, 灾难] | 近未来,科学家们发现太阳急速衰老膨胀,短时间内包括地球在内的整个太阳系都将被太阳所吞没。为了... | https://movie.douban.com/subject/26266893/ | 1264654 | {'还行': '22.0%', '力荐': '33.1%', '较差': '4.7%', '... |
| 0 | (1994) | 美国 | 我生命里最温暖的一部电影 | 阿甘正传 Forrest Gump | 142 | [剧情, 爱情] | 阿甘(汤姆·汉克斯 饰)于二战结束后不久出生在美国南方阿拉巴马州一个闭塞的小镇,他先天弱智,... | https://movie.douban.com/subject/1292720/ | 1192711 | {'还行': '2.9%', '力荐': '76.0%', '较差': '0.2%', '很... |
" 714 | ], 715 | "text/plain": [ 716 | " 上映时间 地区 热评 \\\n", 717 | "0 (2018) 中国大陆 “你敢保证你一辈子不得病?”纯粹、直接、有力!常常感叹:电影只能是电影。但每看到这样的佳作,... \n", 718 | "0 (1994) 法国 他们居然没做爱 \n", 719 | "0 (1994) 美国 关于希望最强有力的注释。 \n", 720 | "0 (2019) 中国大陆 1.终于,轮到我们仰望星空。2.后启示录死亡废墟,赛博朋克地下城,以及烟波浩渺的末日想象,缔... \n", 721 | "0 (1994) 美国 我生命里最温暖的一部电影 \n", 722 | "\n", 723 | " 片名 \\\n", 724 | "0 我不是药神 \n", 725 | "0 这个杀手不太冷 Léon \n", 726 | "0 肖申克的救赎 The Shawshank Redemption \n", 727 | "0 流浪地球 \n", 728 | "0 阿甘正传 Forrest Gump \n", 729 | "\n", 730 | " 片长 电影类型 \\\n", 731 | "0 117 [剧情, 喜剧] \n", 732 | "0 110分钟(剧场版) / 133分钟(国际版)\\n 又名: 杀手莱昂 / 终极... [剧情, 动作, 犯罪] \n", 733 | "0 142 [剧情, 犯罪] \n", 734 | "0 125 [科幻, 灾难] \n", 735 | "0 142 [剧情, 爱情] \n", 736 | "\n", 737 | " 简介 \\\n", 738 | "0 普通中年男子程勇(徐峥 饰)经营着一家保健品店,失意又失婚。不速之客吕受益(王传君 饰)的到... \n", 739 | "0 里昂(让·雷诺饰)是名孤独的职业杀手,受人雇佣。一天,邻居家小姑娘马蒂尔达(纳塔丽·波特曼饰... \n", 740 | "0 20世纪40年代末,小有成就的青年银行家安迪(蒂姆·罗宾斯 Tim Robbins 饰)因涉... \n", 741 | "0 近未来,科学家们发现太阳急速衰老膨胀,短时间内包括地球在内的整个太阳系都将被太阳所吞没。为了... \n", 742 | "0 阿甘(汤姆·汉克斯 饰)于二战结束后不久出生在美国南方阿拉巴马州一个闭塞的小镇,他先天弱智,... \n", 743 | "\n", 744 | " 网址 评分人数 \\\n", 745 | "0 https://movie.douban.com/subject/26752088/ 1174897 \n", 746 | "0 https://movie.douban.com/subject/1295644/ 1380628 \n", 747 | "0 https://movie.douban.com/subject/1292052/ 1525345 \n", 748 | "0 https://movie.douban.com/subject/26266893/ 1264654 \n", 749 | "0 https://movie.douban.com/subject/1292720/ 1192711 \n", 750 | "\n", 751 | " 评分分布 \n", 752 | "0 {'还行': '7.0%', '力荐': '57.4%', '较差': '0.5%', '很... \n", 753 | "0 {'还行': '3.2%', '力荐': '74.2%', '较差': '0.2%', '很... \n", 754 | "0 {'还行': '1.5%', '力荐': '84.6%', '较差': '0.1%', '很... \n", 755 | "0 {'还行': '22.0%', '力荐': '33.1%', '较差': '4.7%', '... \n", 756 | "0 {'还行': '2.9%', '力荐': '76.0%', '较差': '0.2%', '很... 
" 757 | ] 758 | }, 759 | "execution_count": 17, 760 | "metadata": {}, 761 | "output_type": "execute_result" 762 | } 763 | ], 764 | "source": [ 765 | "movie_result.head()" 766 | ] 767 | }, 768 | { 769 | "cell_type": "markdown", 770 | "metadata": { 771 | "collapsed": true 772 | }, 773 | "source": [ 774 | "### 文件存储" 775 | ] 776 | }, 777 | { 778 | "cell_type": "code", 779 | "execution_count": 18, 780 | "metadata": { 781 | "collapsed": false 782 | }, 783 | "outputs": [], 784 | "source": [ 785 | "result.to_excel('电影基本信息大全.xlsx')\n", 786 | "movie_result.to_excel('电影详细信息.xlsx')" 787 | ] 788 | }, 789 | { 790 | "cell_type": "code", 791 | "execution_count": null, 792 | "metadata": { 793 | "collapsed": true 794 | }, 795 | "outputs": [], 796 | "source": [] 797 | }, 798 | { 799 | "cell_type": "code", 800 | "execution_count": null, 801 | "metadata": { 802 | "collapsed": false 803 | }, 804 | "outputs": [], 805 | "source": [] 806 | }, 807 | { 808 | "cell_type": "code", 809 | "execution_count": null, 810 | "metadata": { 811 | "collapsed": true 812 | }, 813 | "outputs": [], 814 | "source": [] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "execution_count": null, 819 | "metadata": { 820 | "collapsed": false 821 | }, 822 | "outputs": [], 823 | "source": [] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": null, 828 | "metadata": { 829 | "collapsed": true 830 | }, 831 | "outputs": [], 832 | "source": [] 833 | }, 834 | { 835 | "cell_type": "code", 836 | "execution_count": null, 837 | "metadata": { 838 | "collapsed": false 839 | }, 840 | "outputs": [], 841 | "source": [] 842 | }, 843 | { 844 | "cell_type": "code", 845 | "execution_count": null, 846 | "metadata": { 847 | "collapsed": true 848 | }, 849 | "outputs": [], 850 | "source": [] 851 | }, 852 | { 853 | "cell_type": "code", 854 | "execution_count": null, 855 | "metadata": { 856 | "collapsed": true 857 | }, 858 | "outputs": [], 859 | "source": [] 860 | }, 861 | { 862 | "cell_type": "code", 863 | 
"execution_count": null, 864 | "metadata": { 865 | "collapsed": false 866 | }, 867 | "outputs": [], 868 | "source": [] 869 | }, 870 | { 871 | "cell_type": "code", 872 | "execution_count": null, 873 | "metadata": { 874 | "collapsed": true 875 | }, 876 | "outputs": [], 877 | "source": [] 878 | }, 879 | { 880 | "cell_type": "code", 881 | "execution_count": 24, 882 | "metadata": { 883 | "collapsed": true 884 | }, 885 | "outputs": [], 886 | "source": [] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 37, 891 | "metadata": { 892 | "collapsed": true 893 | }, 894 | "outputs": [], 895 | "source": [] 896 | }, 897 | { 898 | "cell_type": "code", 899 | "execution_count": null, 900 | "metadata": { 901 | "collapsed": true 902 | }, 903 | "outputs": [], 904 | "source": [] 905 | }, 906 | { 907 | "cell_type": "code", 908 | "execution_count": 39, 909 | "metadata": { 910 | "collapsed": true 911 | }, 912 | "outputs": [], 913 | "source": [] 914 | }, 915 | { 916 | "cell_type": "code", 917 | "execution_count": null, 918 | "metadata": { 919 | "collapsed": true 920 | }, 921 | "outputs": [], 922 | "source": [] 923 | }, 924 | { 925 | "cell_type": "code", 926 | "execution_count": null, 927 | "metadata": { 928 | "collapsed": true 929 | }, 930 | "outputs": [], 931 | "source": [] 932 | }, 933 | { 934 | "cell_type": "code", 935 | "execution_count": null, 936 | "metadata": { 937 | "collapsed": true 938 | }, 939 | "outputs": [], 940 | "source": [] 941 | }, 942 | { 943 | "cell_type": "code", 944 | "execution_count": null, 945 | "metadata": { 946 | "collapsed": true 947 | }, 948 | "outputs": [], 949 | "source": [] 950 | } 951 | ], 952 | "metadata": { 953 | "anaconda-cloud": {}, 954 | "kernelspec": { 955 | "display_name": "Python [default]", 956 | "language": "python", 957 | "name": "python3" 958 | }, 959 | "language_info": { 960 | "codemirror_mode": { 961 | "name": "ipython", 962 | "version": 3 963 | }, 964 | "file_extension": ".py", 965 | "mimetype": "text/x-python", 966 | "name": 
"python", 967 | "nbconvert_exporter": "python", 968 | "pygments_lexer": "ipython3", 969 | "version": "3.5.2" 970 | } 971 | }, 972 | "nbformat": 4, 973 | "nbformat_minor": 2 974 | } 975 | -------------------------------------------------------------------------------- /Hair/README.md: -------------------------------------------------------------------------------- 1 | # 防脱洗发水分析 # 2 | 3 | ---------- 4 | 5 | 这是一个关于发际线的悲伤故事。 6 | 7 | 主要文件为: 8 | 9 | - 淘宝评价爬取(selenium) 10 | - 防脱发洗发水评价源数据 11 | 12 | 13 | -------------------------------------------------------------------------------- /Hair/防脱洗发水评价.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Hair/防脱洗发水评价.xlsx -------------------------------------------------------------------------------- /Hair/防脱洗发水评价爬取+分析.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 公众号:数据不吹牛,更多案例和有趣分析等你来撩" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "from selenium import webdriver\n", 18 | "import random\n", 19 | "import os\n", 20 | "import time" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "### 爬取单页评价(每页20条)" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 216, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "def get_page(driver):\n", 37 | " \n", 38 | " result = pd.DataFrame()\n", 39 | " for i in driver.find_elements_by_xpath('//div[@class = \"rate-grid\"]/table/tbody/tr'):\n", 40 | " try:\n", 41 | " content = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-content\"]').text\n", 42 | " #评价日期\n", 43 | " date = 
i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-date\"]').text\n", 44 | "            # product purchased\n", 45 | "            sku = i.find_element_by_xpath('td[@class = \"col-meta\"]/div[@class = \"rate-sku\"]').text\n", 46 | "\n", 47 | "            # username\n", 48 | "            username = i.find_element_by_xpath('td[@class = \"col-author\"]/div[@class = \"rate-user-info\"]').text\n", 49 | "            append_time = None\n", 50 | "            append_content = None\n", 51 | "\n", 52 | "        except Exception:\n", 53 | "            content = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-premiere\"]/div[@class = \"tm-rate-content\"]').text\n", 54 | "            # review date\n", 55 | "            date = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-premiere\"]/div[@class = \"tm-rate-tag\"]/div[@class = \"tm-rate-date\"]').text\n", 56 | "            # product purchased\n", 57 | "            sku = i.find_element_by_xpath('td[@class = \"col-meta\"]/div[@class = \"rate-sku\"]').text\n", 58 | "            # username\n", 59 | "            username = i.find_element_by_xpath('td[@class = \"col-author\"]/div[@class = \"rate-user-info\"]').text\n", 60 | "\n", 61 | "            append_time = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-append\"]/div[1]').text\n", 62 | "            append_content = i.find_element_by_xpath('td[@class = \"tm-col-master\"]/div[@class = \"tm-rate-append\"]/div[2]').text\n", 63 | "\n", 64 | "        df = pd.DataFrame({'用户名':[username],'购买产品':[sku],'评价日期':[date],'初评内容':[content],\n", 65 | "                           '追评时间':[append_time],'追评内容':[append_content]})\n", 66 | "\n", 67 | "        result = pd.concat([result,df])\n", 68 | "    \n", 69 | "    return result,driver" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### Crawl in a loop; the URL and total number of comments must be specified in advance" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 107, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "url = 'https://detail.tmall.com/item.htm?spm=a230r.1.14.1.70f65edadaPTn3&id=521921506095&ns=1&abbucket=18'\n", 86 | "\n", 87 | "def 
carwl_product_comment(driver,url,max_num = 100):\n", 88 | "    driver.get(url)\n", 89 | "    \n", 90 | "    time.sleep(5)\n", 91 | "    # close the login pop-up so the page can be crawled without logging in\n", 92 | "    driver.find_element_by_xpath('//div[@class = \"sufei-dialog-close\"]').click()\n", 93 | "    \n", 94 | "    driver.implicitly_wait(5)\n", 95 | "    # switch to the comments tab\n", 96 | "    try:\n", 97 | "        driver.find_element_by_xpath('//ul[@class = \"tabbar tm-clear\"]/li[2]').click()\n", 98 | "    except Exception:\n", 99 | "        driver.implicitly_wait(5)\n", 100 | "        driver.find_element_by_xpath('//ul[@class = \"tabbar tm-clear\"]/li[2]').click()\n", 101 | "    \n", 102 | "    max_page = int(max_num / 20)\n", 103 | "    \n", 104 | "    if max_page > 90:\n", 105 | "        max_page = 90\n", 106 | "    else:\n", 107 | "        pass\n", 108 | "    \n", 109 | "    c = 1\n", 110 | "    final_re = pd.DataFrame()\n", 111 | "\n", 112 | "    while c <= max_page:\n", 113 | "        result,driver = get_page(driver)\n", 114 | "        final_re = pd.concat([final_re,result])\n", 115 | "        print('Bro, finished crawling page {}'.format(c))\n", 116 | "\n", 117 | "        # click through to the next page\n", 118 | "        driver.find_element_by_link_text('下一页>>').click()\n", 119 | "        c += 1\n", 120 | "        time.sleep(random.random() + 3)\n", 121 | "    return final_re" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "### Run\n", 129 | "#### This uses PhantomJS through selenium; you can also try Chrome. Installing it has quite a few pitfalls, but fixes for them can be found online" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 217, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "driver = webdriver.PhantomJS()\n", 139 | "final_re = carwl_product_comment(driver,url)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Sentiment analysis" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 208, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "data": { 156 | "text/html": [ 157 | "
| | 品牌 | 买家 | 评价日期 | 初评内容 | 追评 | SKU |
|---|---|---|---|---|---|---|
| 0 | 三个魔发匠 | t**4 | 2019-11-21 | 用了之后真的是促进每个神经细胞的,洗发水的泡沫比较丰富的,不需要拿沐浴球搓,随便用手搓搓就可... | 姗姗来迟的生姜洗发水,在关键时刻起到了最大的作用,洗头发顺顺滑滑的,特别清新,留香持久,洗的... | 化妆品净含量:(2瓶)生姜洗发水500ml+500ml |
| 1 | 三个魔发匠 | t**3 | 2019-11-23 | 生姜洗发水用着很舒服,根据使用方法来洗一个地方就按一次,把头皮都按了个遍,很舒服,而且泡泡很... | 淡淡的姜香味,闻着挺舒服的,而且这个控油去屑的效果真是无敌,不用再看到满头的头皮屑的感觉真好... | 化妆品净含量:(1瓶)生姜洗发水500ml |
| 2 | 三个魔发匠 | 喃**y | 2019-11-22 | 朋友推荐来买的这款生姜洗发水,根据她们说效果很好!生姜发水一大瓶超级划算,九零后已经走上了防... | 我闻着洗发水有很浓的生姜味道,但又不会很刺鼻,蛮好的,洗发后头皮不痒、没有头皮屑,至于生发效... | 化妆品净含量:(1瓶)生姜洗发水500ml |
| 3 | 三个魔发匠 | f**0 | 2019-11-22 | 脱发的故事是缓缓写到结局,这款生姜洗发水是生姜提取物成分,使用也是非常的放心,而且这款洗发水... | 唠唠叨叨的用这款洗发水洗头发非常舒服,洗发水温和的一点刺激感都没有,而且发质还变好了,包装的... | 化妆品净含量:【1瓶】生姜洗发水500ml+【1瓶】香氛护发素500ml |
| 4 | 三个魔发匠 | z**珍 | 2019-11-24 | 水油平衡也改善了,洗发水用着很舒服,洗一个地方就要按一次,而且泡泡很容易冲洗干净,吹干头发超... | 太值了,这款三个魔发匠生姜洗发水的香味我很喜欢,使用了以后感觉头发很顺,也不用油了,可以除螨... | 化妆品净含量:【1瓶】生姜洗发水500ml+【1瓶】香氛护发素500ml |
" 232 | ], 233 | "text/plain": [ 234 | " 品牌 买家 评价日期 初评内容 \\\n", 235 | "0 三个魔发匠 t**4 2019-11-21 用了之后真的是促进每个神经细胞的,洗发水的泡沫比较丰富的,不需要拿沐浴球搓,随便用手搓搓就可... \n", 236 | "1 三个魔发匠 t**3 2019-11-23 生姜洗发水用着很舒服,根据使用方法来洗一个地方就按一次,把头皮都按了个遍,很舒服,而且泡泡很... \n", 237 | "2 三个魔发匠 喃**y 2019-11-22 朋友推荐来买的这款生姜洗发水,根据她们说效果很好!生姜发水一大瓶超级划算,九零后已经走上了防... \n", 238 | "3 三个魔发匠 f**0 2019-11-22 脱发的故事是缓缓写到结局,这款生姜洗发水是生姜提取物成分,使用也是非常的放心,而且这款洗发水... \n", 239 | "4 三个魔发匠 z**珍 2019-11-24 水油平衡也改善了,洗发水用着很舒服,洗一个地方就要按一次,而且泡泡很容易冲洗干净,吹干头发超... \n", 240 | "\n", 241 | " 追评 \\\n", 242 | "0 姗姗来迟的生姜洗发水,在关键时刻起到了最大的作用,洗头发顺顺滑滑的,特别清新,留香持久,洗的... \n", 243 | "1 淡淡的姜香味,闻着挺舒服的,而且这个控油去屑的效果真是无敌,不用再看到满头的头皮屑的感觉真好... \n", 244 | "2 我闻着洗发水有很浓的生姜味道,但又不会很刺鼻,蛮好的,洗发后头皮不痒、没有头皮屑,至于生发效... \n", 245 | "3 唠唠叨叨的用这款洗发水洗头发非常舒服,洗发水温和的一点刺激感都没有,而且发质还变好了,包装的... \n", 246 | "4 太值了,这款三个魔发匠生姜洗发水的香味我很喜欢,使用了以后感觉头发很顺,也不用油了,可以除螨... \n", 247 | "\n", 248 | " SKU \n", 249 | "0 化妆品净含量:(2瓶)生姜洗发水500ml+500ml \n", 250 | "1 化妆品净含量:(1瓶)生姜洗发水500ml \n", 251 | "2 化妆品净含量:(1瓶)生姜洗发水500ml \n", 252 | "3 化妆品净含量:【1瓶】生姜洗发水500ml+【1瓶】香氛护发素500ml \n", 253 | "4 化妆品净含量:【1瓶】生姜洗发水500ml+【1瓶】香氛护发素500ml " 254 | ] 255 | }, 256 | "execution_count": 208, 257 | "metadata": {}, 258 | "output_type": "execute_result" 259 | } 260 | ], 261 | "source": [ 262 | "from snownlp import SnowNLP\n", 263 | "\n", 264 | "sens = []\n", 265 | "\n", 266 | "for text in final_re['初评内容']:\n", 267 | " s = SnowNLP(text)\n", 268 | " sens.append(s.sentiments)\n", 269 | " \n", 270 | "final_re['初评情感评分'] = sens" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "### 情感评分分析" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 245, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "data": { 287 | "text/plain": [ 288 | "0.49948609661261906" 289 | ] 290 | }, 291 | "execution_count": 245, 292 | "metadata": {}, 293 | "output_type": "execute_result" 294 | } 295 | ], 296 | "source": [ 297 | "final_re['初评情感评分'].mean()" 298 | ] 299 | }, 300 | { 301 | "cell_type": 
"code", 302 | "execution_count": 234, 303 | "metadata": {}, 304 | "outputs": [ 305 | { 306 | "data": { 307 | "text/html": [ 308 | "
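| | 情感评分 | 初评情感评分 |
|---|---|---|
| count | 8980.000000 | 8980.000000 |
| mean | 0.499448 | 0.499448 |
| std | 0.347043 | 0.347043 |
| min | 0.000000 | 0.000000 |
| 25% | 0.145513 | 0.145513 |
| 50% | 0.489421 | 0.489421 |
| 75% | 0.847504 | 0.847504 |
| max | 1.000000 | 1.000000 |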
\n", 373 | "
" 374 | ], 375 | "text/plain": [ 376 | " 情感评分 初评情感评分\n", 377 | "count 8980.000000 8980.000000\n", 378 | "mean 0.499448 0.499448\n", 379 | "std 0.347043 0.347043\n", 380 | "min 0.000000 0.000000\n", 381 | "25% 0.145513 0.145513\n", 382 | "50% 0.489421 0.489421\n", 383 | "75% 0.847504 0.847504\n", 384 | "max 1.000000 1.000000" 385 | ] 386 | }, 387 | "execution_count": 234, 388 | "metadata": {}, 389 | "output_type": "execute_result" 390 | } 391 | ], 392 | "source": [ 393 | "final_re.describe()" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 243, 399 | "metadata": {}, 400 | "outputs": [ 401 | { 402 | "data": { 403 | "text/plain": [ 404 | "" 405 | ] 406 | }, 407 | "execution_count": 243, 408 | "metadata": {}, 409 | "output_type": "execute_result" 410 | }, 411 | { 412 | "data": { 413 | "image/png": "\n", 414 | "text/plain": [ 415 | "" 416 | ] 417 | }, 418 | "metadata": { 419 | "needs_background": "light" 420 | }, 421 | "output_type": "display_data" 422 | } 423 | ], 424 | "source": [ 425 | "import seaborn as sns\n", 426 | "import matplotlib.pyplot as plt\n", 427 | "\n", 428 | "fig,ax = plt.subplots(1,1,figsize = (12,5))\n", 429 | "sns.distplot(final_re['初评情感评分'],color = 'red')\n", 430 | "\n", 431 | "plt.yticks(fontsize=11)\n", 432 | "plt.xticks(fontsize=11)\n", 433 | "\n", 434 | "ax.set_xlabel('情感评分', fontsize=14)" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 248, 440 | "metadata": {}, 441 | "outputs": [ 442 | { 443 | "data": { 444 | "text/html": [ 445 | "
\n", 446 | "\n", 459 | "\n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | "
   品牌        初评情感评分
0  三个魔发匠    0.547250
1  有情生姜      0.630458
2  白云山敬修堂  0.573973
3  章光101       0.145513
4  霸王防脱      0.615317
\n", 495 | "
" 496 | ], 497 | "text/plain": [ 498 | " 品牌 初评情感评分\n", 499 | "0 三个魔发匠 0.547250\n", 500 | "1 有情生姜 0.630458\n", 501 | "2 白云山敬修堂 0.573973\n", 502 | "3 章光101 0.145513\n", 503 | "4 霸王防脱 0.615317" 504 | ] 505 | }, 506 | "execution_count": 248, 507 | "metadata": {}, 508 | "output_type": "execute_result" 509 | } 510 | ], 511 | "source": [ 512 | "final_re.groupby('品牌')['初评情感评分'].median().reset_index()" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": null, 525 | "metadata": {}, 526 | "outputs": [], 527 | "source": [] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [] 535 | } 536 | ], 537 | "metadata": { 538 | "kernelspec": { 539 | "display_name": "Python 3", 540 | "language": "python", 541 | "name": "python3" 542 | }, 543 | "language_info": { 544 | "codemirror_mode": { 545 | "name": "ipython", 546 | "version": 3 547 | }, 548 | "file_extension": ".py", 549 | "mimetype": "text/x-python", 550 | "name": "python", 551 | "nbconvert_exporter": "python", 552 | "pygments_lexer": "ipython3", 553 | "version": "3.5.3" 554 | } 555 | }, 556 | "nbformat": 4, 557 | "nbformat_minor": 2 558 | } 559 | -------------------------------------------------------------------------------- /Python+excel/Python批量处理Excel表格.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Original work by 周志鹏\n", 8 | "### WeChat official account: 数据不吹牛, with more case studies and fun analyses" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import os\n", 18 | "import time\n", 19 | "import pandas as pd\n", 20 | "\n", 21 | "os.chdir('C:\\\\Users\\\\Administrator\\\\Desktop\\\\data')" 22 | ] 23 | }, 24 | { 25 
| "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Open a single sheet" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 4, 34 | "metadata": {}, 35 | "outputs": [ 36 | { 37 | "data": { 38 | "text/html": [ 39 | "
\n", 40 | "\n", 53 | "\n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | "
   日期       转化率     访客数   三级类目  客单价       品牌
0  2019-08  0.025806  221402  绑钩器    33.284283   品牌-17
1  2019-08  0.019638  14074   绑钩器    233.995330  品牌-12
2  2019-08  0.065407  75392   绑钩器    11.938785   品牌-20
3  2019-08  0.015905  85529   绑钩器    41.059966   品牌-13
4  2019-08  0.039033  23839   绑钩器    44.502008   品牌-1
\n", 113 | "
" 114 | ], 115 | "text/plain": [ 116 | " 日期 转化率 访客数 三级类目 客单价 品牌\n", 117 | "0 2019-08 0.025806 221402 绑钩器 33.284283 品牌-17\n", 118 | "1 2019-08 0.019638 14074 绑钩器 233.995330 品牌-12\n", 119 | "2 2019-08 0.065407 75392 绑钩器 11.938785 品牌-20\n", 120 | "3 2019-08 0.015905 85529 绑钩器 41.059966 品牌-13\n", 121 | "4 2019-08 0.039033 23839 绑钩器 44.502008 品牌-1" 122 | ] 123 | }, 124 | "execution_count": 4, 125 | "metadata": {}, 126 | "output_type": "execute_result" 127 | } 128 | ], 129 | "source": [ 130 | "name = '垂钓装备&绑钩器.xlsx'\n", 131 | "df = pd.read_excel(name)\n", 132 | "df.head()" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "### Check the date range" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 6, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "data": { 149 | "text/plain": [ 150 | "array(['2019-08', '2019-07', '2019-06', '2019-05', '2019-04', '2019-03',\n", 151 | " '2019-02', '2019-01', '2018-12', '2018-11', '2018-10', '2018-09'], dtype=object)" 152 | ] 153 | }, 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "df['日期'].unique()" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "### Compute sales revenue" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 8, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/html": [ 178 | "
\n", 179 | "\n", 192 | "\n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | "
   日期       转化率     访客数   三级类目  客单价       品牌      销售额
0  2019-08  0.025806  221402  绑钩器    33.284283   品牌-17  190167.455681
1  2019-08  0.019638  14074   绑钩器    233.995330  品牌-12  64673.807815
2  2019-08  0.065407  75392   绑钩器    11.938785   品牌-20  58871.997672
3  2019-08  0.015905  85529   绑钩器    41.059966   品牌-13  55856.842507
4  2019-08  0.039033  23839   绑钩器    44.502008   品牌-1   41409.600947
\n", 258 | "
" 259 | ], 260 | "text/plain": [ 261 | " 日期 转化率 访客数 三级类目 客单价 品牌 销售额\n", 262 | "0 2019-08 0.025806 221402 绑钩器 33.284283 品牌-17 190167.455681\n", 263 | "1 2019-08 0.019638 14074 绑钩器 233.995330 品牌-12 64673.807815\n", 264 | "2 2019-08 0.065407 75392 绑钩器 11.938785 品牌-20 58871.997672\n", 265 | "3 2019-08 0.015905 85529 绑钩器 41.059966 品牌-13 55856.842507\n", 266 | "4 2019-08 0.039033 23839 绑钩器 44.502008 品牌-1 41409.600947" 267 | ] 268 | }, 269 | "execution_count": 8, 270 | "metadata": {}, 271 | "output_type": "execute_result" 272 | } 273 | ], 274 | "source": [ 275 | "df['销售额'] = df['访客数'] * df['转化率'] * df['客单价']\n", 276 | "df.head()" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "### Aggregate sales per brand within a single sheet" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 10, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "data": { 293 | "text/html": [ 294 | "
\n", 295 | "\n", 308 | "\n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | "
   品牌      销售额
0  品牌-1   529837.745358
1  品牌-10  217976.661847
2  品牌-11  327093.079507
3  品牌-12  485635.295843
4  品牌-13  438391.195855
\n", 344 | "
" 345 | ], 346 | "text/plain": [ 347 | " 品牌 销售额\n", 348 | "0 品牌-1 529837.745358\n", 349 | "1 品牌-10 217976.661847\n", 350 | "2 品牌-11 327093.079507\n", 351 | "3 品牌-12 485635.295843\n", 352 | "4 品牌-13 438391.195855" 353 | ] 354 | }, 355 | "execution_count": 10, 356 | "metadata": {}, 357 | "output_type": "execute_result" 358 | } 359 | ], 360 | "source": [ 361 | "df_sum = df.groupby('品牌')['销售额'].sum().reset_index()\n", 362 | "df_sum.head()" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "### Add an industry label" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 12, 375 | "metadata": {}, 376 | "outputs": [ 377 | { 378 | "data": { 379 | "text/html": [ 380 | "
\n", 381 | "\n", 394 | "\n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | "
   品牌      销售额            行业
0  品牌-1   529837.745358  垂钓装备&绑钩器
1  品牌-10  217976.661847  垂钓装备&绑钩器
2  品牌-11  327093.079507  垂钓装备&绑钩器
3  品牌-12  485635.295843  垂钓装备&绑钩器
4  品牌-13  438391.195855  垂钓装备&绑钩器
\n", 436 | "
" 437 | ], 438 | "text/plain": [ 439 | " 品牌 销售额 行业\n", 440 | "0 品牌-1 529837.745358 垂钓装备&绑钩器\n", 441 | "1 品牌-10 217976.661847 垂钓装备&绑钩器\n", 442 | "2 品牌-11 327093.079507 垂钓装备&绑钩器\n", 443 | "3 品牌-12 485635.295843 垂钓装备&绑钩器\n", 444 | "4 品牌-13 438391.195855 垂钓装备&绑钩器" 445 | ] 446 | }, 447 | "execution_count": 12, 448 | "metadata": {}, 449 | "output_type": "execute_result" 450 | } 451 | ], 452 | "source": [ 453 | "df_sum['行业'] = name.replace('.xlsx','')\n", 454 | "df_sum.head()" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": {}, 460 | "source": [ 461 | "### With a single file handled, batch processing is just a loop" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 13, 467 | "metadata": {}, 468 | "outputs": [ 469 | { 470 | "name": "stdout", 471 | "output_type": "stream", 472 | "text": [ 473 | "Time spent in Python: 2.6550002098083496 s\n" 474 | ] 475 | } 476 | ], 477 | "source": [ 478 | "import time\n", 479 | "\n", 480 | "# start time\n", 481 | "start = time.time()\n", 482 | "\n", 483 | "# accumulates the aggregated results\n", 484 | "result = pd.DataFrame()\n", 485 | "\n", 486 | "# loop over the workbook file names, skipping anything that is not an .xlsx file\n", 487 | "for name in [f for f in os.listdir() if f.endswith('.xlsx')]:\n", 488 | " df = pd.read_excel(name)\n", 489 | " # compute the sales revenue column\n", 490 | " df['销售额'] = df['访客数'] * df['转化率'] * df['客单价']\n", 491 | " # total sales per brand within this sub-category\n", 492 | " df_sum = df.groupby('品牌')['销售额'].sum().reset_index()\n", 493 | " df_sum['类目'] = name.replace('.xlsx','')\n", 494 | " result = pd.concat([result,df_sum])\n", 495 | "\n", 496 | "# total per brand across all sheets, sorted by sales in descending order\n", 497 | "final = result.groupby('品牌')['销售额'].sum().reset_index().sort_values('销售额',ascending = False)\n", 498 | "\n", 499 | "# end time\n", 500 | "end = time.time()\n", 501 | "print('Time spent in Python: {} s'.format(end-start))" 502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "execution_count": 15, 507 | "metadata": {}, 508 | "outputs": [ 509 | { 510 | "data": { 511 | "text/html": [ 512 | "
\n", 513 | "\n", 526 | "\n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | "
    品牌      销售额
15  品牌-5   1.226224e+09
8   品牌-17  1.195281e+09
2   品牌-11  1.151829e+09
4   品牌-13  1.150687e+09
3   品牌-12  1.143520e+09
\n", 562 | "
" 563 | ], 564 | "text/plain": [ 565 | " 品牌 销售额\n", 566 | "15 品牌-5 1.226224e+09\n", 567 | "8 品牌-17 1.195281e+09\n", 568 | "2 品牌-11 1.151829e+09\n", 569 | "4 品牌-13 1.150687e+09\n", 570 | "3 品牌-12 1.143520e+09" 571 | ] 572 | }, 573 | "execution_count": 15, 574 | "metadata": {}, 575 | "output_type": "execute_result" 576 | } 577 | ], 578 | "source": [ 579 | "final.head()" 580 | ] 581 | }, 582 | { 583 | "cell_type": "markdown", 584 | "metadata": {}, 585 | "source": [ 586 | "### Disable scientific notation and show two decimal places" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": 16, 592 | "metadata": {}, 593 | "outputs": [ 594 | { 595 | "data": { 596 | "text/html": [ 597 | "
\n", 598 | "\n", 611 | "\n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | "
    品牌      销售额
15  品牌-5   1226223640.73
8   品牌-17  1195280571.60
2   品牌-11  1151829215.73
4   品牌-13  1150687029.66
3   品牌-12  1143519788.23
\n", 647 | "
" 648 | ], 649 | "text/plain": [ 650 | " 品牌 销售额\n", 651 | "15 品牌-5 1226223640.73\n", 652 | "8 品牌-17 1195280571.60\n", 653 | "2 品牌-11 1151829215.73\n", 654 | "4 品牌-13 1150687029.66\n", 655 | "3 品牌-12 1143519788.23" 656 | ] 657 | }, 658 | "execution_count": 16, 659 | "metadata": {}, 660 | "output_type": "execute_result" 661 | } 662 | ], 663 | "source": [ 664 | "pd.set_option('display.float_format', lambda x: '%.2f' % x)\n", 665 | "final.head()" 666 | ] 667 | } 668 | ], 669 | "metadata": { 670 | "kernelspec": { 671 | "display_name": "Python 3", 672 | "language": "python", 673 | "name": "python3" 674 | }, 675 | "language_info": { 676 | "codemirror_mode": { 677 | "name": "ipython", 678 | "version": 3 679 | }, 680 | "file_extension": ".py", 681 | "mimetype": "text/x-python", 682 | "name": "python", 683 | "nbconvert_exporter": "python", 684 | "pygments_lexer": "ipython3", 685 | "version": "3.5.3" 686 | } 687 | }, 688 | "nbformat": 4, 689 | "nbformat_minor": 2 690 | } 691 | -------------------------------------------------------------------------------- /Python+excel/README.md: -------------------------------------------------------------------------------- 1 | # Batch-processing 128 Excel sheets with Python # 2 | 3 | ---------- 4 | 5 | This project uses a single case, processing 128 Excel sheets with Python, to show how Python can automate spreadsheet and office work and improve efficiency. 6 | 7 | Main files: 8 | 9 | - 128 sales sheets (`源数据128张表格`) 10 | - Python code for batch-processing Excel (`Python批量处理Excel表格.ipynb`) 11 | 12 | 13 | -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&冰爪.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&冰爪.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&呼吸管-呼吸器.xlsx: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&呼吸管-呼吸器.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&安全带.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&安全带.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&救生衣.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&救生衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&气瓶.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&气瓶.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&滑雪头盔.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪头盔.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&滑雪护具.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪护具.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&滑雪板.xlsx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪板.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&滑雪眼镜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&滑雪眼镜.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&潜水箱包.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&潜水箱包.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&潜水袜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&潜水袜.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&皮划艇充气艇.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&皮划艇充气艇.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&绳索.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&绳索.xlsx 
-------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&脚蹼.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&脚蹼.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/专项户外运动装备&面镜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/专项户外运动装备&面镜.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&其他垂钓用品.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&其他垂钓用品.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&垂钓小配件.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&垂钓小配件.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&垂钓装备.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&垂钓装备.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&太空豆.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&太空豆.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&打水桶.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&打水桶.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&抄网.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&抄网.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&抄网头.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&抄网头.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&抄网杆.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&抄网杆.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&探鱼器.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&探鱼器.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&支架.xlsx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&支架.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&止血钳.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&止血钳.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&浮漂.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&浮漂.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&渔具包.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&渔具包.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&绑钩器.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&绑钩器.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&装鱼桶.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&装鱼桶.xlsx -------------------------------------------------------------------------------- 
/Python+excel/源数据128张表格/垂钓装备&钓台.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓台.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓竿.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓竿.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓箱.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓箱.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓鱼伞.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼伞.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓鱼帽.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼帽.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓鱼手套.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼手套.xlsx 
-------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓鱼椅、凳.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼椅、凳.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&钓鱼鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&钓鱼鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&铅坠.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&铅坠.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&铅皮.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&铅皮.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&饵料盒.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&饵料盒.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&鱼护.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼护.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&鱼线.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼线.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&鱼线轮.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼线轮.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&鱼网-虾笼-其它渔具.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼网-虾笼-其它渔具.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&鱼钩.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼钩.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/垂钓装备&鱼饵.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/垂钓装备&鱼饵.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&充气床.xlsx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&充气床.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&吊床.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&吊床.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&户外休闲家具.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外休闲家具.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&户外床-折叠床.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外床-折叠床.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&户外桌子.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外桌子.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&户外桌椅套装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外桌椅套装.xlsx 
-------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&户外椅子凳子.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&户外椅子凳子.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外休闲家具&野餐垫.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外休闲家具&野餐垫.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&一次性内裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&一次性内裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&其他户外服装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&其他户外服装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&内衣裤套装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&内衣裤套装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&冲锋衣.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&冲锋衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&冲锋衣裤套装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&冲锋衣裤套装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&冲锋裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&冲锋裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&功能内衣上装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&功能内衣上装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&功能内衣下装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&功能内衣下装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&功能内裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&功能内裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&户外休闲衣.xlsx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外休闲衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&户外休闲衣裤套装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外休闲衣裤套装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&户外休闲裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外休闲裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&户外服装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&户外服装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&抓绒衣.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&抓绒衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&抓绒裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&抓绒裤.xlsx -------------------------------------------------------------------------------- 
/Python+excel/源数据128张表格/户外服装&滑雪衣.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&滑雪衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&滑雪衣裤套装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&滑雪衣裤套装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&滑雪裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&滑雪裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&潜水服.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&潜水服.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&羽绒衣.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&羽绒衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&软壳衣.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&软壳衣.xlsx 
-------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&软壳裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&软壳裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&运动户外风衣.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&运动户外风衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&速干T恤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干T恤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&速干背心.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干背心.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&速干衣裤套装.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干衣裤套装.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&速干衬衣.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干衬衣.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&速干裤.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&速干裤.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外服装&钓鱼服.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外服装&钓鱼服.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&信号灯-发光棒-救生灯.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&信号灯-发光棒-救生灯.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&充电器.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&充电器.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&其他.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&其他.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&头灯.xlsx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&头灯.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&户外照明.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&户外照明.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&手电筒.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&手电筒.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&电池-燃料.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&电池-燃料.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&营地灯-帐篷灯.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&营地灯-帐篷灯.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外照明&钓鱼灯.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外照明&钓鱼灯.xlsx -------------------------------------------------------------------------------- 
/Python+excel/源数据128张表格/户外鞋靴&其他户外鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&其他户外鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&户外休闲鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&户外休闲鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&户外鞋靴.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&户外鞋靴.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&攀岩鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&攀岩鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&沙滩鞋-凉鞋-拖鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&沙滩鞋-凉鞋-拖鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&溯溪鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&溯溪鞋.xlsx 
-------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&滑雪鞋-雪地靴.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&滑雪鞋-雪地靴.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&登山鞋-徒步鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&登山鞋-徒步鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/户外鞋靴&越野跑鞋.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/户外鞋靴&越野跑鞋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/旅行便携装备&其他.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&其他.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/旅行便携装备&其他安全防盗产品.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&其他安全防盗产品.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/旅行便携装备&旅行便携装备.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&旅行便携装备.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/旅行便携装备&普通密码锁.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&普通密码锁.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/旅行便携装备&晾衣绳.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&晾衣绳.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/旅行便携装备&转换插头.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/旅行便携装备&转换插头.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&垂钓望远镜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&垂钓望远镜.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&户外眼镜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&户外眼镜.xlsx -------------------------------------------------------------------------------- 
/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&普通望远镜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&普通望远镜.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜-夜视仪-户外眼镜.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜-夜视仪-户外眼镜.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜配件.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/望远镜-夜视仪-户外眼镜&望远镜配件.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/洗漱清洁-护理用品&防虫-防蚊用品.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/洗漱清洁-护理用品&防虫-防蚊用品.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/登山杖-手杖&登山杖-手杖.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/登山杖-手杖&登山杖-手杖.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/睡袋&睡袋.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/睡袋&睡袋.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&其他防护救生装备.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&其他防护救生装备.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&急救包-急救箱.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&急救包-急救箱.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&急救护理用品.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&急救护理用品.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&求生哨.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&求生哨.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&求生绳-逃生绳.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&求生绳-逃生绳.xlsx -------------------------------------------------------------------------------- 
/Python+excel/源数据128张表格/防护-救生装备&求生锯-绳锯-线锯.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&求生锯-绳锯-线锯.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&防护-救生装备.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&防护-救生装备.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防护-救生装备&防护面罩.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防护-救生装备&防护面罩.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防潮垫-地席-枕头&地布-地席.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&地布-地席.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防潮垫-地席-枕头&枕头.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&枕头.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫-地席-枕头.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫-地席-枕头.xlsx -------------------------------------------------------------------------------- /Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Python+excel/源数据128张表格/防潮垫-地席-枕头&防潮垫.xlsx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Hello, and welcome to 数据不吹牛 # 2 | 3 | ---------- 4 | 5 | Dark-haired fishermen and woodcutters on the river shoal 6 | 7 | Long used to autumn moons and spring breezes 8 | 9 | Meeting gladly over a jug of data 10 | 11 | So many matters, past and present 12 | 13 | All lie in the analysis 14 | 15 | · 16 | 17 | The data sources and code here are best enjoyed alongside 小Z's WeChat official account 《数据不吹牛》. 18 | 19 | When I was first learning Python and data analysis, I came across many excellent case studies, but was often frustrated that there was no data source or detailed code to reproduce them with. 20 | 21 | So when 小Z later started sharing, special care went into preserving the data sources and the step-by-step code, in the hope that the shared material helps those who need it just that little bit more. 22 | 23 | 24 | Your likes are the greatest encouragement for 小Z. 25 | 26 | -------------------------------------------------------------------------------- /RFM/PYTHON-RFM实战数据.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/RFM/PYTHON-RFM实战数据.xlsx -------------------------------------------------------------------------------- /RFM/README.md: -------------------------------------------------------------------------------- 1 | # RFM Analysis in Practice # 2 | 3 | ---------- 4 | 5 | This project sets out to answer two questions: what the RFM model is, and how to implement it in Python. 6 | 7 | Main files: 8 | 9 | - Anonymized case source data 10 | - RFM analysis walkthrough code 11 | 12 | You are welcome to follow the WeChat official account: 数据不吹牛 13 | 14 | -------------------------------------------------------------------------------- /TGI/README.md: -------------------------------------------------------------------------------- 1 | # TGI Analysis in Practice # 2 | 3 | ---------- 4 | 5 | This project covers what the TGI index is and how to carry out a basic TGI index analysis in Python on the case data. 6 | 7 | Main files: 8 | 9 | - Anonymized case source data 10 | - TGI analysis walkthrough code 11 | 12 | You are welcome to follow the WeChat official account: 数据不吹牛 13
| 14 | -------------------------------------------------------------------------------- /TGI/TGI指数案例数据.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/TGI/TGI指数案例数据.xlsx -------------------------------------------------------------------------------- /Weather+Email/README.md: -------------------------------------------------------------------------------- 1 | # 天气爬取+邮件发送 # 2 | 3 | ---------- 4 | 5 | 项目主要介绍了天气网站的爬取和如何用简洁的代码发送邮件,对应了公众号文章中的脑洞。 6 | 7 | 主要文件为: 8 | 9 | -天气爬虫 + 邮件发送完整代码 10 | 11 | 欢迎来撩公众号:数据不吹牛 12 | 13 | -------------------------------------------------------------------------------- /Weather+Email/天气爬虫+邮件发送.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 原创:公众号《数据不吹牛》" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 15, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "import numpy as np" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "### 天气爬虫" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 11, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import requests\n", 34 | "from lxml import etree\n", 35 | "\n", 36 | "def parse(url = 'https://www.tianqi.com/hangzhou'):\n", 37 | " headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}\n", 38 | " html = requests.get(url,headers = headers)\n", 39 | " bs = etree.HTML(html.text)\n", 40 | " \n", 41 | " #今天天气相关数据:日期,星期几,天气,最低气温,最高气温\n", 42 | " today_date = bs.xpath('//ul[@class = \"week\"]/li[1]/b/text()')[0]\n", 43 | " today_week = bs.xpath('//ul[@class = \"week\"]/li[1]/span/text()')[0]\n", 44 | " today_weather = 
bs.xpath('//ul[@class = \"txt txt2\"]/li[1]/text()')[0]\n", 45 | " today_low = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[1]/b/text()')[0]\n", 46 | " today_high = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[1]/span/text()')[0]\n", 47 | "\n", 48 | " #明天天气相关数据,维度和上述一致\n", 49 | " tomorrow_date = bs.xpath('//ul[@class = \"week\"]/li[2]/b/text()')[0]\n", 50 | " tomorrow_week = bs.xpath('//ul[@class = \"week\"]/li[2]/span/text()')[0]\n", 51 | " tomorrow_weather = bs.xpath('//ul[@class = \"txt txt2\"]/li[2]/text()')[0]\n", 52 | " tomorrow_low = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[2]/b/text()')[0]\n", 53 | " tomorrow_high = bs.xpath('//div[@class = \"zxt_shuju\"]/ul/li[2]/span/text()')[0]\n", 54 | " \n", 55 | " tomorrow = ('明天是%s,%s,%s,%s-%s度,温差%d度')% \\\n", 56 | " (tomorrow_date,tomorrow_week,tomorrow_weather,tomorrow_low,tomorrow_high,int(int(tomorrow_high)-int(tomorrow_low)))\n", 57 | " \n", 58 | " print(('明天是%s,%s,%s,%s-%s度,温差%d度')% \\\n", 59 | " (tomorrow_date,tomorrow_week,tomorrow_weather,tomorrow_low,tomorrow_high,int(int(tomorrow_high)-int(tomorrow_low))))\n", 60 | " \n", 61 | " #计算今明两天温度差异,这里用的是最高温度\n", 62 | " temperature_distance = int(tomorrow_high) - int(today_high)\n", 63 | " \n", 64 | " if temperature_distance > 0:\n", 65 | " a = '明日升温%d' % temperature_distance\n", 66 | " print('明日升温%d' % temperature_distance)\n", 67 | " if temperature_distance < 0:\n", 68 | " a = '明日降温%d' % temperature_distance\n", 69 | " print('明日降温%d' % temperature_distance)\n", 70 | " else:\n", 71 | " a = '最高气温不变'\n", 72 | " print('最高气温不变')\n", 73 | " content = tomorrow,a\n", 74 | " return content" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### 展示爬取结果" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 12, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "name": "stdout", 91 | "output_type": "stream", 92 | "text": [ 93 | "明天是11月19日,星期二,晴转多云,5-14度,温差9度\n", 94 | "明日降温-3\n" 95 | ] 96 | } 97 | ], 98 | 
"source": [ 99 | "# Scrapes Hangzhou by default; swap in the URL for your own city\n", 100 | "weather = parse()" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "### Send the email" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 13, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "import yagmail\n", 117 | "\n", 118 | "def send_email(contents,send_to = 'receiver_email@xx.com'):\n", 119 | " # Log in to the mailbox: set the account, password, host and port\n", 120 | " yag = yagmail.SMTP(user = 'youremail@sohu.com',password = 'yourpass',\n", 121 | " host = 'smtp.sohu.com',port = '465')\n", 122 | " \n", 123 | " # Once logged in, send in one step: set the recipient, subject and body\n", 124 | " yag.send(to = send_to,\n", 125 | " subject = '天气关怀',\n", 126 | " contents = contents)\n", 127 | " print('发送成功!~')" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "### Final run: set your own email address, password, host and port, and the recipient" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 9, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "send_email(weather,send_to = 'xxxx')" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [] 152 | } 153 | ], 154 | "metadata": { 155 | "kernelspec": { 156 | "display_name": "Python 3", 157 | "language": "python", 158 | "name": "python3" 159 | }, 160 | "language_info": { 161 | "codemirror_mode": { 162 | "name": "ipython", 163 | "version": 3 164 | }, 165 | "file_extension": ".py", 166 | "mimetype": "text/x-python", 167 | "name": "python", 168 | "nbconvert_exporter": "python", 169 | "pygments_lexer": "ipython3", 170 | "version": "3.5.3" 171 | } 172 | }, 173 | "nbformat": 4, 174 | "nbformat_minor": 2 175 | } 176 | -------------------------------------------------------------------------------- /Zhihu/README.md: -------------------------------------------------------------------------------- 1 | # Zhihu Scraping and Cleaning # 2 | 3 | ---------- 4 | 
5 | This folder holds the Zhihu scraping and cleaning code, plus the source data, behind the WeChat official account 《数据不吹牛》's look at three questions about the Chinese New Year. 6 | 7 | Main files: 8 | 9 | - Scraping of Zhihu answers and individual user profiles 10 | - Cleaning and simple visualization of the scraped data 11 | - Two source datasets 12 | 13 | -------------------------------------------------------------------------------- /Zhihu/知乎爬取代码.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### WeChat official account: 数据不吹牛\n", 8 | "### More case studies and fun analyses await" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 5, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "import pandas as pd\n", 18 | "import numpy as np\n", 19 | "import os\n", 20 | "import json\n", 21 | "import requests\n", 22 | "import time\n", 23 | "import random\n", 24 | "from lxml import etree" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Set the base URLs and headers" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 29, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# Base URLs for the two questions\n", 41 | "gangwei = 'https://www.zhihu.com/api/v4/questions/266817891/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&offset={}&limit={}&platform=desktop&sort_by=default'\n", 42 | "sanwen = 
'https://www.zhihu.com/api/v4/questions/27329739/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&offset={}&limit={}&platform=desktop&sort_by=default'\n", 43 | "\n", 44 | "headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "collapsed": true 51 | }, 52 | "source": [ 53 | "## Parse a single page" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 30, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "def parse_page(url,headers):\n", 63 | " html = requests.get(url,headers = headers)\n", 64 | " bs = json.loads(html.text)\n", 65 | " result = pd.DataFrame()\n", 66 | " for i in bs['data']:\n", 67 | " headline = i['author']['headline'] # profile headline\n", 68 | " gender = i['author']['gender'] # gender\n", 69 | " user_type = i['author']['user_type']\n", 70 | " user_id = i['author']['id']\n", 71 | " user_token = i['author']['url_token']\n", 72 | " follwer_count = i['author']['follower_count'] # follower count\n", 73 | " name = i['author']['name'] # display name\n", 74 | " vote_up = i['voteup_count'] # upvotes\n", 75 | " updated_time = i['updated_time'] # last updated time\n", 76 | " title = i['question']['title'] # question title\n", 77 | " created_time = i['created_time'] # creation time\n", 78 | " comment_count = i['comment_count'] # comment count\n", 79 | " can_comment = i['can_comment']['status'] # whether commenting is allowed\n", 
80 | " content = i['content'] # answer body; still needs cleaning\n", 81 | " cache = pd.DataFrame({'用户ID':[user_id],'用户名':[name],'性别':[gender],'token':[user_token],'用户类型':[user_type],'签名':[headline],\n", 82 | " '被关注人数':[follwer_count],'创建时间':[created_time],'更新时间':[updated_time],'评论数':[comment_count],\n", 83 | " '点赞数':[vote_up],'是否可以评论':[can_comment],'内容':[content],'问题':[title]})\n", 84 | " result = pd.concat([result,cache])\n", 85 | " return result" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "## Set the number of answers to scrape and fetch them in batches" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 31, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "def run_all(url,headers,num = 200):\n", 102 | " final_result = pd.DataFrame()\n", 103 | " for i in range(0,num,5):\n", 104 | " try:\n", 105 | " result = parse_page(url.format(i,5),headers)\n", 106 | " final_result = pd.concat([final_result,result])\n", 107 | " time.sleep(random.random())\n", 108 | " print('i had parsed:',i)\n", 109 | " except Exception:\n", 110 | " try:\n", 111 | " time.sleep(5)\n", 112 | " result = parse_page(url.format(i,5),headers)\n", 113 | " final_result = pd.concat([final_result,result])\n", 114 | " time.sleep(random.random())\n", 115 | " print('i had parsed:',i)\n", 116 | " except Exception:\n", 117 | " print(i,'is wrong~~~') \n", 118 | " return final_result" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 35, 124 | "metadata": { 125 | "scrolled": true 126 | }, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "i had parsed: 0\n", 133 | "i had parsed: 5\n", 134 | "i had parsed: 10\n", 135 | "i had parsed: 15\n", 136 | "i had parsed: 20\n", 137 | "i had parsed: 25\n", 138 | "i had parsed: 30\n", 139 | "i had parsed: 35\n", 140 | "i had parsed: 40\n", 141 | "i had parsed: 45\n", 142 | "i had parsed: 50\n", 143 | "i had parsed: 55\n", 144 | "i had parsed: 60\n", 145 | "i had parsed: 65\n", 146 | "i had parsed: 
70\n", 147 | "i had parsed: 75\n", 148 | "i had parsed: 80\n", 149 | "i had parsed: 85\n", 150 | "i had parsed: 90\n", 151 | "i had parsed: 95\n", 152 | "i had parsed: 100\n", 153 | "i had parsed: 105\n", 154 | "i had parsed: 110\n", 155 | "i had parsed: 115\n", 156 | "i had parsed: 120\n", 157 | "i had parsed: 125\n", 158 | "i had parsed: 130\n", 159 | "i had parsed: 135\n", 160 | "i had parsed: 140\n", 161 | "i had parsed: 145\n", 162 | "i had parsed: 150\n", 163 | "i had parsed: 155\n", 164 | "i had parsed: 160\n", 165 | "i had parsed: 165\n", 166 | "i had parsed: 170\n", 167 | "i had parsed: 175\n", 168 | "i had parsed: 180\n", 169 | "i had parsed: 185\n", 170 | "i had parsed: 190\n", 171 | "i had parsed: 195\n", 172 | "i had parsed: 200\n", 173 | "i had parsed: 205\n", 174 | "i had parsed: 210\n", 175 | "i had parsed: 215\n", 176 | "i had parsed: 220\n", 177 | "i had parsed: 225\n", 178 | "i had parsed: 230\n", 179 | "i had parsed: 235\n", 180 | "i had parsed: 240\n", 181 | "i had parsed: 245\n", 182 | "i had parsed: 250\n", 183 | "i had parsed: 255\n", 184 | "i had parsed: 260\n", 185 | "i had parsed: 265\n", 186 | "i had parsed: 270\n", 187 | "i had parsed: 275\n", 188 | "i had parsed: 280\n", 189 | "i had parsed: 285\n", 190 | "i had parsed: 290\n", 191 | "i had parsed: 295\n", 192 | "i had parsed: 300\n", 193 | "i had parsed: 305\n", 194 | "i had parsed: 310\n", 195 | "i had parsed: 315\n", 196 | "i had parsed: 320\n", 197 | "i had parsed: 325\n", 198 | "i had parsed: 330\n", 199 | "i had parsed: 335\n", 200 | "i had parsed: 340\n", 201 | "i had parsed: 345\n", 202 | "i had parsed: 350\n", 203 | "i had parsed: 355\n", 204 | "i had parsed: 360\n", 205 | "i had parsed: 365\n", 206 | "i had parsed: 370\n", 207 | "i had parsed: 375\n", 208 | "i had parsed: 380\n", 209 | "i had parsed: 385\n", 210 | "i had parsed: 390\n", 211 | "i had parsed: 395\n", 212 | "i had parsed: 400\n", 213 | "i had parsed: 405\n", 214 | "i had parsed: 410\n", 215 | "i had parsed: 
415\n", 216 | "i had parsed: 420\n", 217 | "i had parsed: 425\n", 218 | "i had parsed: 430\n", 219 | "i had parsed: 435\n", 220 | "i had parsed: 440\n", 221 | "i had parsed: 445\n", 222 | "i had parsed: 450\n", 223 | "i had parsed: 455\n", 224 | "i had parsed: 460\n", 225 | "i had parsed: 465\n", 226 | "i had parsed: 470\n", 227 | "i had parsed: 475\n", 228 | "i had parsed: 480\n", 229 | "i had parsed: 485\n", 230 | "i had parsed: 490\n", 231 | "i had parsed: 495\n", 232 | "i had parsed: 500\n", 233 | "i had parsed: 505\n", 234 | "i had parsed: 510\n", 235 | "i had parsed: 515\n", 236 | "i had parsed: 520\n", 237 | "i had parsed: 525\n", 238 | "i had parsed: 530\n", 239 | "i had parsed: 535\n", 240 | "i had parsed: 540\n", 241 | "i had parsed: 545\n", 242 | "i had parsed: 550\n", 243 | "i had parsed: 555\n", 244 | "i had parsed: 560\n", 245 | "i had parsed: 565\n", 246 | "i had parsed: 570\n", 247 | "i had parsed: 575\n", 248 | "i had parsed: 580\n", 249 | "i had parsed: 585\n", 250 | "i had parsed: 590\n", 251 | "i had parsed: 595\n", 252 | "i had parsed: 600\n", 253 | "i had parsed: 605\n", 254 | "i had parsed: 610\n", 255 | "i had parsed: 615\n", 256 | "i had parsed: 620\n", 257 | "i had parsed: 625\n", 258 | "i had parsed: 630\n", 259 | "i had parsed: 635\n", 260 | "i had parsed: 640\n", 261 | "i had parsed: 645\n", 262 | "i had parsed: 650\n", 263 | "i had parsed: 655\n", 264 | "i had parsed: 660\n", 265 | "i had parsed: 665\n", 266 | "i had parsed: 670\n", 267 | "i had parsed: 675\n", 268 | "i had parsed: 680\n", 269 | "i had parsed: 685\n", 270 | "i had parsed: 690\n", 271 | "i had parsed: 695\n", 272 | "i had parsed: 700\n", 273 | "i had parsed: 705\n", 274 | "i had parsed: 710\n", 275 | "i had parsed: 715\n", 276 | "i had parsed: 720\n", 277 | "i had parsed: 725\n", 278 | "i had parsed: 730\n", 279 | "i had parsed: 735\n", 280 | "i had parsed: 740\n", 281 | "i had parsed: 745\n", 282 | "i had parsed: 750\n", 283 | "i had parsed: 755\n", 284 | "i had 
parsed: 760\n", 285 | "i had parsed: 765\n", 286 | "i had parsed: 770\n" 287 | ] 288 | } 289 | ], 290 | "source": [ 291 | "final_result = run_all(sanwen,headers,775)" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "### Scraping users one at a time can get your IP temporarily banned; for sustained, stable runs, prepare proxy IPs in advance" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [ 307 | "def get_ips():\n", 308 | " return None # Placeholder, over to you: prepare proxy IPs in advance and return one per call, e.g. {'https': 'https://host:port'}; None means no proxy" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 20, 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "data": { 318 | "text/plain": [ 319 | "{'https': 'https://117.57.35.166:4512'}" 320 | ] 321 | }, 322 | "execution_count": 20, 323 | "metadata": {}, 324 | "output_type": "execute_result" 325 | } 326 | ], 327 | "source": [ 328 | "get_ips()" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "### Scrape user profiles one by one to get industry, job and related info" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": 1, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [ 344 | "def get_user_info(user,headers,ip):\n", 345 | " user_url = 'https://www.zhihu.com/people/{}/activities'\n", 346 | " ht = requests.get(user_url.format(user),headers = headers,proxies = ip)\n", 347 | " if ht.text.find('安全验证') == -1:\n", 348 | " \n", 349 | " bs = etree.HTML(ht.text)\n", 350 | " try:\n", 351 | " hangye = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][1]/text()')\n", 352 | " except:\n", 353 | " hangye = None\n", 354 | " try:\n", 355 | " school = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[0]\n", 356 | " except:\n", 357 | " school = None\n", 358 | " try:\n", 359 | " prof = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[1]\n", 360 | " except:\n", 361 | " prof = None\n", 362 | " df = pd.DataFrame({'token':[user],'行业':[hangye],'教育经历':[school],'专业':[prof]})\n", 
363 | " \n", 364 | " else:\n", 365 | " \n", 366 | " ip = get_ips()\n", 367 | " try:\n", 368 | " ht = requests.get(user_url.format(user),headers = headers,proxies = ip)\n", 369 | " except:\n", 370 | " ip = get_ips()\n", 371 | " ht = requests.get(user_url.format(user),headers = headers,proxies = ip)\n", 372 | " bs = etree.HTML(ht.text)\n", 373 | " try:\n", 374 | " hangye = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][1]/text()')\n", 375 | " except:\n", 376 | " hangye = None\n", 377 | " try:\n", 378 | " school = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[0]\n", 379 | " except:\n", 380 | " school = None\n", 381 | " try:\n", 382 | " prof = bs.xpath('//div[@class = \"ProfileHeader-infoItem\"][2]/text()')[1]\n", 383 | " except:\n", 384 | " prof = None\n", 385 | " df = pd.DataFrame({'token':[user],'行业':[hangye],'教育经历':[school],'专业':[prof]})\n", 386 | " print('ip changes')\n", 387 | " \n", 388 | " return df,ip" 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": {}, 394 | "source": [ 395 | "### Scrape user info in a loop" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": null, 401 | "metadata": {}, 402 | "outputs": [], 403 | "source": [ 404 | "ip = get_ips()\n", 405 | "ct = 1\n", 406 | "user_info2 = pd.DataFrame()\n", 407 | "for i in final_result['token']:\n", 408 | " df,ip = get_user_info(i,headers,ip)\n", 409 | " user_info2 = pd.concat([user_info2,df])\n", 410 | " time.sleep(random.random() / 2)\n", 411 | " print('i had parsed:{}'.format(ct))\n", 412 | " ct += 1" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 15, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "metadata": {}, 426 | "outputs": [], 427 | "source": [] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": 26, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [] 435 | }, 436 | { 437 | "cell_type": 
"code", 438 | "execution_count": 27, 439 | "metadata": {}, 440 | "outputs": [], 441 | "source": [] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": null, 446 | "metadata": {}, 447 | "outputs": [], 448 | "source": [] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 128, 453 | "metadata": {}, 454 | "outputs": [], 455 | "source": [] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": null, 460 | "metadata": {}, 461 | "outputs": [], 462 | "source": [] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": null, 467 | "metadata": { 468 | "collapsed": true 469 | }, 470 | "outputs": [], 471 | "source": [] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": null, 476 | "metadata": {}, 477 | "outputs": [], 478 | "source": [] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": null, 490 | "metadata": { 491 | "collapsed": true 492 | }, 493 | "outputs": [], 494 | "source": [] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": { 500 | "collapsed": true 501 | }, 502 | "outputs": [], 503 | "source": [] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": { 509 | "collapsed": true 510 | }, 511 | "outputs": [], 512 | "source": [] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": null, 517 | "metadata": { 518 | "collapsed": true 519 | }, 520 | "outputs": [], 521 | "source": [] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": null, 526 | "metadata": { 527 | "collapsed": true 528 | }, 529 | "outputs": [], 530 | "source": [] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": null, 535 | "metadata": { 536 | "collapsed": true 537 | }, 538 | "outputs": [], 539 | "source": [] 540 | }, 541 | { 542 | "cell_type": 
"code", 543 | "execution_count": null, 544 | "metadata": {}, 545 | "outputs": [], 546 | "source": [] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": null, 551 | "metadata": { 552 | "collapsed": true 553 | }, 554 | "outputs": [], 555 | "source": [] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": null, 560 | "metadata": { 561 | "collapsed": true 562 | }, 563 | "outputs": [], 564 | "source": [] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": null, 569 | "metadata": {}, 570 | "outputs": [], 571 | "source": [] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": null, 576 | "metadata": { 577 | "collapsed": true 578 | }, 579 | "outputs": [], 580 | "source": [] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": null, 585 | "metadata": { 586 | "collapsed": true 587 | }, 588 | "outputs": [], 589 | "source": [] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": null, 594 | "metadata": { 595 | "collapsed": true 596 | }, 597 | "outputs": [], 598 | "source": [] 599 | }, 600 | { 601 | "cell_type": "code", 602 | "execution_count": null, 603 | "metadata": { 604 | "collapsed": true 605 | }, 606 | "outputs": [], 607 | "source": [] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": { 613 | "collapsed": true 614 | }, 615 | "outputs": [], 616 | "source": [] 617 | } 618 | ], 619 | "metadata": { 620 | "anaconda-cloud": {}, 621 | "kernelspec": { 622 | "display_name": "Python 3", 623 | "language": "python", 624 | "name": "python3" 625 | }, 626 | "language_info": { 627 | "codemirror_mode": { 628 | "name": "ipython", 629 | "version": 3 630 | }, 631 | "file_extension": ".py", 632 | "mimetype": "text/x-python", 633 | "name": "python", 634 | "nbconvert_exporter": "python", 635 | "pygments_lexer": "ipython3", 636 | "version": "3.5.3" 637 | } 638 | }, 639 | "nbformat": 4, 640 | "nbformat_minor": 2 641 | } 642 | 
-------------------------------------------------------------------------------- /Zhihu/第二个问题源数据.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Zhihu/第二个问题源数据.xlsx -------------------------------------------------------------------------------- /Zhihu/过年工作问题.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seizeeveryday/DA-cases/aed586b90f809e5243e5d18918233d35d216ba4d/Zhihu/过年工作问题.xlsx --------------------------------------------------------------------------------