├── .gitignore ├── 1. 报告自动化的前提.ipynb ├── 10. 人工分析部分自定义和ipynb交互.ipynb ├── 2. 报告结构规定.ipynb ├── 3. 报告的内容添加.ipynb ├── 4.1 模板的制作:Jinja2模板介绍.ipynb ├── 4.2 模板的制作:Jinja2模板中的递归.ipynb ├── 5. 图表绘制 ├── 1. 理论基础.ipynb ├── 2. 作图模块化及模板:以柱状图为例.ipynb ├── 3. 作图模板的使用与扩展 : 以多线图为例.ipynb ├── 4. 作图模板的扩展 : 添加副轴及细分类.ipynb ├── 5. 作图模板的练习 :Pie图|饼图.ipynb ├── Images.py ├── font │ ├── Calibri.ttf │ └── MSYHMONO.ttf └── source │ ├── anatomy.png │ ├── multi_line.png │ ├── plot_bar.png │ └── twinx.png ├── 6. 报告内容生成.ipynb ├── 7. 数据处理 ├── 1. 数据的读取.ipynb ├── 2. 数据的去重.ipynb ├── __init__.py ├── datapipeline.py └── tools.py ├── 8. 文件结构设计与模块化.ipynb ├── 9. 多人协作和版本管理.ipynb ├── ExampleCode ├── __init__.py ├── model.py └── models.py ├── Example_1 ├── ImageFactory.py ├── PS20180528_bestsellers.pkl ├── PS20180528_hotnewreleases.pkl ├── PS20180528_moversandshakers.pkl ├── PS20180528_product.pkl ├── Pet Supplies品类爆款报告.html ├── __init__.py ├── analysis.py ├── configs.py ├── image │ ├── chapter1_subchapter0.png │ ├── chapter1_subchapter1.png │ ├── chapter1_subchapter2.png │ ├── chapter1_subchapter3.png │ ├── chapter2_subchapter0.png │ ├── chapter2_subchapter1.png │ ├── chapter2_subchapter2.png │ ├── chapter2_subchapter3.png │ ├── chapter3_subchapter0.png │ ├── chapter3_subchapter1.png │ ├── chapter3_subchapter2.png │ └── chapter3_subchapter3.png ├── items.py ├── models.py └── template.html ├── Example_2 ├── configs.py ├── main.ipynb └── reports │ └── Report1 │ ├── configs.py │ └── main.ipynb ├── Example_3 ├── ImageFactory.py ├── __init__.py ├── analysis.py ├── configs.py ├── datapipeline.py ├── items.py ├── main.ipynb ├── models.py ├── reports │ └── Report1 │ │ ├── Report1.html │ │ ├── configs.py │ │ ├── data │ │ ├── PS20180528_bestsellers.pkl │ │ ├── PS20180528_hotnewreleases.pkl │ │ ├── PS20180528_moversandshakers.pkl │ │ └── PS20180528_product.pkl │ │ ├── image │ │ ├── chapter1_subchapter0.png │ │ ├── chapter1_subchapter1.png │ │ ├── chapter1_subchapter2.png │ │ ├── chapter1_subchapter3.png │ │ ├── 
chapter2_subchapter0.png │ │ ├── chapter2_subchapter1.png │ │ ├── chapter2_subchapter2.png │ │ ├── chapter2_subchapter3.png │ │ ├── chapter3_subchapter0.png │ │ ├── chapter3_subchapter1.png │ │ ├── chapter3_subchapter2.png │ │ └── chapter3_subchapter3.png │ │ └── main.ipynb ├── template │ ├── template.html │ └── template_recursive.html └── tools.py ├── __init__.py ├── font ├── Calibri.ttf └── MSYHMONO.ttf ├── readme.md ├── report.html ├── requirements.txt ├── 报告自动化的思路分享.zip └── 相关知识积累书目 ├── Mastering Python Design Patterns.pdf ├── Pro Python.pdf ├── sicp.pdf └── 计算机程序的构造和解释中文版.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | .AppleDouble 3 | .LSOverride 4 | 5 | # Icon must end with two \r 6 | Icon 7 | 8 | 9 | # Thumbnails 10 | ._* 11 | 12 | # Files that might appear in the root of a volume 13 | .DocumentRevisions-V100 14 | .fseventsd 15 | .Spotlight-V100 16 | .TemporaryItems 17 | .Trashes 18 | .VolumeIcon.icns 19 | .com.apple.timemachine.donotpresent 20 | 21 | # Directories potentially created on remote AFP share 22 | .AppleDB 23 | .AppleDesktop 24 | Network Trash Folder 25 | Temporary Items 26 | .apdisk 27 | 28 | __pycache__ 29 | *.pyc 30 | *.csv 31 | image/* 32 | .ipynb_checkpoints -------------------------------------------------------------------------------- /1. 
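报告自动化的前提.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### This tutorial is available on [Github](https://github.com/LinshuZhang/automate-report)\n", 8 | "\n", 9 | "### Overview\n", 10 | "Automating a report is an upgrade to a mature data analysis report, and the report being mature is the necessary and sufficient condition for automating it.\n", 11 | "A mature data analysis report meets the following requirements:\n", 12 | "- A stable data source\n", 13 | " - Data is obtained with a crawler or through API calls\n", 14 | " - There is an established way to fill in or filter out missing data\n", 15 | " \n", 16 | "- A fixed set of analysis methods\n", 17 | " - Different situations may call for different lines of analysis, but all of them must form a finite set; an automated report has no creativity\n", 18 | "- Fairly fixed language for describing the analysis\n", 19 | " - Results are described in natural language, using fairly fixed sentence patterns that are easy to automate\n", 20 | " - Information is consolidated with fairly fixed logic, so the report summary can be written automatically\n", 21 | " \n", 22 | "A report is worth automating only if automation saves a large amount of manual and mental effort and speeds up report production. Such reports share the following traits:\n", 23 | "- The same analysis method is applied repeatedly to different data\n", 24 | "- Charts of the same style have to be redrawn for different data\n", 25 | "- The same analytical descriptions have to be rewritten for every report\n", 26 | ".....\n", 27 | "The point is that the line of analysis has already been designed, yet the same tedious analysis has to be carried out again and again, and routine work eats up a great deal of time and energy.\n", 28 | "\n", 29 | "What you get once the report is automated:\n", 30 | "- Only the small part that needs human analysis has to be written by hand\n", 31 | "- The main body of the report is generated with one click\n", 32 | "- Reports can be produced in modules, which makes the whole production process more efficient" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "collapsed": true 39 | }, 40 | "source": [ 41 | "### Background reading for this framework:\n", 42 | "The following is the background reading the author had done before writing this report framework. Some of it is available as [PDF documents](https://github.com/LinshuZhang/automate-report/tree/master/%E7%9B%B8%E5%85%B3%E7%9F%A5%E8%AF%86%E7%A7%AF%E7%B4%AF%E4%B9%A6%E7%9B%AE) in the GitHub repository.\n", 43 | "- Pro Python\n", 44 | "- Mastering Python Design Patterns\n", 45 | "- SICP: Structure and Interpretation of Computer Programs (Chinese edition)\n", 46 | "- Python for Data Analysis\n", 47 | "- Official documentation for Numpy, Pandas, and Matplotlib\n", 48 | "- W3School HTML and CSS tutorials" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": { 55 | "collapsed": true 56 | }, 57 | "outputs": [], 58 | "source": [] 59 | } 60 | ], 61 | "metadata": { 62 | "hide_input": false, 63 | "kernelspec": { 64 | "display_name": "Python 3", 65 | "language": "python", 66 | "name": "python3" 67 | }, 68 | "language_info": { 69 | "codemirror_mode": { 70 | "name": 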
"ipython", 71 | "version": 3 72 | }, 73 | "file_extension": ".py", 74 | "mimetype": "text/x-python", 75 | "name": "python", 76 | "nbconvert_exporter": "python", 77 | "pygments_lexer": "ipython3", 78 | "version": "3.6.1" 79 | } 80 | }, 81 | "nbformat": 4, 82 | "nbformat_minor": 2 83 | } 84 | -------------------------------------------------------------------------------- /2. 报告结构规定.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 知识基础\n", 8 | "\n", 9 | "- Python类的基础了解\n", 10 | "- 代码来自于[Example_1](https://github.com/LinshuZhang/automate-report/tree/master/Example_1)" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "### 学习目标\n", 18 | "\n", 19 | "- 使用Python类构造文章结构" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### 1. 文章建立" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "collapsed": true 33 | }, 34 | "source": [ 35 | "首先,我们先建立基础的文章组成构建:文档,章,节。" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 14, 41 | "metadata": { 42 | "collapsed": true 43 | }, 44 | "outputs": [], 45 | "source": [ 46 | "class Document(object):\n", 47 | " def __init__(self):\n", 48 | " # 文章标题\n", 49 | " self.title = None\n", 50 | " # 子标题\n", 51 | " self.subtitle = None\n", 52 | " # 前言\n", 53 | " self.foreword = None\n", 54 | " # 章节\n", 55 | " self.chapters = []\n", 56 | "\n", 57 | "class Chapter(Document):\n", 58 | " def __init__(self):\n", 59 | " # 文章标题\n", 60 | " self.title = None\n", 61 | " # 子标题\n", 62 | " self.subtitle = None\n", 63 | " # 前言\n", 64 | " self.foreword = None\n", 65 | " # 章节\n", 66 | " self.chapters = []\n", 67 | " # 章节内表格\n", 68 | " self.table = None\n", 69 | " # 章节内有多个图表时\n", 70 | " self.tables = []\n", 71 | " # 文本内容\n", 72 | " self.content = None\n", 73 | " # 图片\n", 74 | " self.image = None\n", 75 
| " # 多个图片\n", 76 | " self.images = []\n", 77 | " \n", 78 | "# 子章节定义可以直接使用Chapter定义,利用了Python类的继承,也为之后自定义子章节提供了留白\n", 79 | "class Subchapter(Chapter):\n", 80 | " pass" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "### 2. 建立结构 " 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 15, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "document = Document()" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "以上我们对Document类做了基础的定义,下一步我们需要根据所要制作报告确定Document的结构,假如我们使用`Example 1`中的报告,报告由5个章节组成,其中最后一个总结的章节无标题,每个章节中有至多5个子章节,子章节中有单个图片和表格,章节中会有两个位置有文字" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "#### 基于内容定制Chapter" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "以上我们对Document类做了基础的定义,下一步我们需要根据所要制作报告确定Document的结构,假如我们使用`Example 1`中的报告,报告分为四个大章,每个大章最多有4个小章,把每个小章节的内容类别做一个集合,{标题,前言,表格,图表,说明}会是这个集合的超集,所以我们可以把章节定义如下。" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 16, 125 | "metadata": { 126 | "collapsed": true 127 | }, 128 | "outputs": [], 129 | "source": [ 130 | "class Chapter(Document):\n", 131 | " def __init__(self, subchapter_number=0):\n", 132 | " \"\"\"\n", 133 | " subchapter_number : 子章节数\n", 134 | " \"\"\"\n", 135 | " # 文章标题\n", 136 | " self.title = None\n", 137 | " # 子标题\n", 138 | " self.subtitle = None\n", 139 | " # 前言\n", 140 | " self.foreword = None\n", 141 | " # 章节\n", 142 | " self.chapters = []\n", 143 | " self.table = None\n", 144 | " # 图表会先做出图片形式,此处保存图片的路径\n", 145 | " self.image = None\n", 146 | " self.content = None\n", 147 | " self.rank_list_change = None\n", 148 | " self.subchapters= [Chapter(subchapter_number=0) for i in range(subchapter_number)]\n", 149 | " " 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | 
"#### 为章节加入 添加图片的方法\n", 157 | "使用self.image或者self.images保存的为图片路径,为了方便我们可以直接把matplotlib的fig直接加入chapter中,只需要设定加入规则" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 17, 163 | "metadata": { 164 | "collapsed": true 165 | }, 166 | "outputs": [], 167 | "source": [ 168 | "class Chapter(Document):\n", 169 | " def __init__(self, subchapter_number=0, number=''):\n", 170 | " \"\"\"\n", 171 | " subchapter_number : 子章节数\n", 172 | " number : 章节编号,基于文档结构生成,需要是唯一的\n", 173 | " \"\"\"\n", 174 | " # 文章标题\n", 175 | " self.title = None\n", 176 | " # 子标题\n", 177 | " self.subtitle = None\n", 178 | " # 前言\n", 179 | " self.foreword = None\n", 180 | " # 章节\n", 181 | " self.chapters = []\n", 182 | " self.content = None\n", 183 | " self.table = None\n", 184 | " # 图表会先做出图片形式,此处保存图片的路径\n", 185 | " self.image = None\n", 186 | " self.rank_list_change = None\n", 187 | " self.number = number\n", 188 | " self.chapters= [Chapter(subchapter_number=0, number=\"{}_subchapter{}\".format(self.number, i)) \n", 189 | " for i in range(subchapter_number)]\n", 190 | " # 设定图片保存的链接\n", 191 | " self.image_path = './image/'\n", 192 | " \n", 193 | "\n", 194 | " def set_image(self, fig):\n", 195 | " image_filename = '{}{}.png'.format(self.image_path, self.number)\n", 196 | " fig.savefig(image_filename, dpi=160, bbox_inches='tight')\n", 197 | " self.image = image_filename" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 18, 203 | "metadata": { 204 | "collapsed": true 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "# 我们知道报告中有5个章节,每个章节至多有4个子章节\n", 209 | "# 此处代码可以加入document的__init__函数中,在实例化过程中直接完成\n", 210 | "document = Document()\n", 211 | "for i in range(5):\n", 212 | " document.chapters.append(Chapter(subchapter_number=4, number=\"chapter{}\".format(i)))" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 19, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "name": "stdout", 222 | "output_type": "stream", 223 | "text": [ 
224 | "chapter3_subchapter0\n" 225 | ] 226 | } 227 | ], 228 | "source": [ 229 | "print(document.chapters[3].chapters[0].number)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "#### 添加函数直接打印document文档结构" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 20, 242 | "metadata": { 243 | "collapsed": true 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "def print_structure(chapter, deep):\n", 248 | " if chapter.chapters:\n", 249 | " for subchapter in chapter.chapters:\n", 250 | " print('--'*deep+subchapter.number)\n", 251 | " print_structure(subchapter, deep+1)\n", 252 | " else:\n", 253 | " return" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 21, 259 | "metadata": { 260 | "collapsed": true 261 | }, 262 | "outputs": [], 263 | "source": [ 264 | "## print_structure(document, 0)" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "把此文件中做的报告结构保存入./ExampleCode/models.py\n", 272 | "并且把print_structure方法放入Document" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 22, 278 | "metadata": { 279 | "collapsed": true, 280 | "scrolled": true 281 | }, 282 | "outputs": [], 283 | "source": [ 284 | "from ExampleCode.models import Document" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 23, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "name": "stdout", 294 | "output_type": "stream", 295 | "text": [ 296 | "chapter0\n", 297 | "--chapter0_subchapter0\n", 298 | "--chapter0_subchapter1\n", 299 | "--chapter0_subchapter2\n", 300 | "--chapter0_subchapter3\n", 301 | "chapter1\n", 302 | "--chapter1_subchapter0\n", 303 | "--chapter1_subchapter1\n", 304 | "--chapter1_subchapter2\n", 305 | "--chapter1_subchapter3\n", 306 | "chapter2\n", 307 | "--chapter2_subchapter0\n", 308 | "--chapter2_subchapter1\n", 309 | "--chapter2_subchapter2\n", 310 | "--chapter2_subchapter3\n", 
311 | "chapter3\n", 312 | "--chapter3_subchapter0\n", 313 | "--chapter3_subchapter1\n", 314 | "--chapter3_subchapter2\n", 315 | "--chapter3_subchapter3\n", 316 | "chapter4\n", 317 | "--chapter4_subchapter0\n", 318 | "--chapter4_subchapter1\n", 319 | "--chapter4_subchapter2\n", 320 | "--chapter4_subchapter3\n" 321 | ] 322 | } 323 | ], 324 | "source": [ 325 | "document = Document()\n", 326 | "document.print_structure(0)" 327 | ] 328 | } 329 | ], 330 | "metadata": { 331 | "hide_input": false, 332 | "kernelspec": { 333 | "display_name": "Python 3", 334 | "language": "python", 335 | "name": "python3" 336 | }, 337 | "language_info": { 338 | "codemirror_mode": { 339 | "name": "ipython", 340 | "version": 3 341 | }, 342 | "file_extension": ".py", 343 | "mimetype": "text/x-python", 344 | "name": "python", 345 | "nbconvert_exporter": "python", 346 | "pygments_lexer": "ipython3", 347 | "version": "3.6.1" 348 | } 349 | }, 350 | "nbformat": 4, 351 | "nbformat_minor": 2 352 | } 353 | -------------------------------------------------------------------------------- /3. 
报告的内容添加.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Prerequisites\n", 8 | "\n", 9 | "- Basic understanding of Python classes\n", 10 | "- The code comes from [Example_1](https://github.com/LinshuZhang/automate-report/tree/master/Example_1)" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "### Learning goals\n", 18 | "\n", 19 | "- Add content to a document whose structure has already been built" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": { 26 | "collapsed": true 27 | }, 28 | "outputs": [], 29 | "source": [ 30 | "from ExampleCode.models import Document\n", 31 | "document = Document()" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "Content is added according to the report's own analysis: the output of the analysis is attached to the document instance" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "# Change the working directory to the Example_1 folder\n", 50 | "import os\n", 51 | "import logging\n", 52 | "os.chdir(\"./Example_1\")\n", 53 | "logging.info(\"Current directory : {}\".format(os.getcwd()))" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 3, 59 | "metadata": { 60 | "collapsed": true 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "# Run the analysis part of the report\n", 65 | "%run \"analysis.py\"" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "metadata": { 72 | "collapsed": true 73 | }, 74 | "outputs": [], 75 | "source": [ 76 | "document.author = author\n", 77 | "document.title = '%s-%s%s品类报告'%(info.start_time, info.end_time,info.category_name)\n", 78 | "document.foreword = \"\"\"\n", 79 | "这里是前言\n", 80 | "\"\"\".format(Preface[info.category_name],\n", 81 | "           info.start_time,\n", 82 | "           info.end_time,\n", 83 | "           info.category_name,\n", 84 | "           author)\n", 85 | "document.chapters[0].title = '一、总体情况'\n", 86 | "document.chapters[0].foreword = \"\"\"\n", 
87 | "{.start_time}至{.end_time}共有{.commodity_total_count}款\n", 88 | "商品登榜亚马逊{.category_name}品类的相关榜单,\n", 89 | "其中涉及品牌数{.brand_total_count:.0f}个。\n", 90 | "截至{.end_time},登榜商品总评论数达{.review_total_count:.0f}条,\n", 91 | "平均星级{.rating_mean:.2f}星。\\n\\n\n", 92 | "榜单的登榜商品及品牌数量情况如下表所示:\n", 93 | "\"\"\".format(info,info,info,info,info,info,info,info)\n", 94 | "document.chapters[0].table = pd.DataFrame(data = {'榜单':list(lists.keys()),\n", 95 | " '登榜商品数':[rank_list.commodity.count for rank_list in lists.values()],\n", 96 | " '登榜品牌数':[rank_list.brand.count for rank_list in lists.values()]\n", 97 | " })" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "分析报告的结构发现,报告内容的第二到第四章使用相同的文章结构和分析方法,可以直接重复使用" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 7, 110 | "metadata": { 111 | "collapsed": true 112 | }, 113 | "outputs": [], 114 | "source": [ 115 | "for i in range(3):\n", 116 | " chinese_number = ['一', '二', '三', '四']\n", 117 | " rank_name = list(lists.keys())[i]\n", 118 | " rank_list = lists[rank_name]\n", 119 | " \n", 120 | " chapter = document.chapters[i+1]\n", 121 | " chapter.title = \"{}、{}榜单分析:\".format(chinese_number[i+1], rank_name)\n", 122 | " \n", 123 | " chapter.chapters[0].title = '1.价格分布'\n", 124 | " chapter.chapters[0].set_image(rank_list.price.image.fig)\n", 125 | " chapter.chapters[0].content = \"\"\"\n", 126 | " 上周 {} 品类 {} 榜单商品最高价${:.2f},\n", 127 | " 最低价${:.2f},平均价格${:.2f}。{}发现,在上榜的商品中,\n", 128 | " \"\"\".format(info.category_name, rank_name, rank_list.price.max, rank_list.price.min, \n", 129 | " rank_list.price.mean, author)\n", 130 | " \n", 131 | " chapter.chapters[1].title = '2.评论量分布'\n", 132 | " chapter.chapters[1].set_image(rank_list.review.image.fig)\n", 133 | " chapter.chapters[1].content = \"\"\"\n", 134 | " 在上周登榜 {} 品类 {} 榜单的商品中,\n", 135 | " 单个商品拥有的最大评论数为{:.2f}条,最小评论数为{:.2f}条,平均评论数{:.2f}条,评论平均星级{:.2f}星。\n", 136 | " \"\"\".format(info.category_name, rank_name, 
rank_list.review.max, rank_list.review.min, \n", 137 | " rank_list.review.mean, rank_list.review.rating_mean)\n", 138 | " \n", 139 | " chapter.chapters[2].title = '3.商品排行'\n", 140 | " chapter.chapters[2].foreword = '''根据上周 {} 榜单上商品的登榜次数及平均排名,\n", 141 | " {}对登榜的商品进行了排序,其中排名靠前的五款商品如下所示:'''.format(rank_name, author)\n", 142 | " chapter.chapters[2].table = rank_list.commodity.table\n", 143 | " chapter.chapters[2].set_image(rank_list.commodity.image.fig)\n", 144 | " chapter.chapters[2].content = \"\"\"\n", 145 | " 可以看到,在上周{}品类的{}榜单中,\n", 146 | " \"\"\".format(info.category_name, rank_name)\n", 147 | " \n", 148 | " chapter.chapters[3].title = '4.品牌排行'\n", 149 | " chapter.chapters[3].table = rank_list.brand.table\n", 150 | " chapter.chapters[3].set_image(rank_list.brand.image.fig)\n", 151 | " chapter.chapters[3].foreword = \"\"\"根据上周 {} 榜单上各个品牌下商品的登榜次数,{}对登榜的品牌进行了排序,\n", 152 | " 其中排名靠前的五个品牌如下所示:\"\"\".format(rank_name, author)\n", 153 | " chapter.chapters[3].content = \"\"\"\n", 154 | " 可以看到,在上周{}品类的{}榜单中,\n", 155 | " \"\"\".format(info.category_name, rank_name)\n", 156 | " \n", 157 | "document.chapters[4].title = ' '\n", 158 | "document.chapters[4].content = \"\"\"\n", 159 | "这里是总结\n", 160 | "\"\"\"" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 8, 166 | "metadata": { 167 | "collapsed": true 168 | }, 169 | "outputs": [], 170 | "source": [ 171 | "def print_structure(chapter, deep, show_contents = []):\n", 172 | " if show_contents:\n", 173 | " for content in show_contents:\n", 174 | " if chapter.__dict__[content]:\n", 175 | " content_value = chapter.__dict__[content]\n", 176 | " if isinstance(content_value, str):\n", 177 | " print('--'*deep+content+\" : \" + chapter.__dict__[content])\n", 178 | " else:\n", 179 | " print('--'*deep+content)\n", 180 | " if chapter.chapters:\n", 181 | " for subchapter in chapter.chapters:\n", 182 | " if subchapter.__dict__['title']:\n", 183 | " print('\\n--'*deep+'-'+subchapter.number)\n", 184 | " 
 print_structure(subchapter, deep+1, show_contents = show_contents)\n", 185 | " else:\n", 186 | " return" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 9, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "title : 2018年5月21日-2018年5月27日亚马逊美站\t Pet Supplies \t品类爆款分析\n", 199 | "-chapter0\n", 200 | "--title : 一、总体情况\n", 201 | "-chapter1\n", 202 | "--title : 二、Best Seller榜单分析:\n", 203 | "\n", 204 | "---chapter1_subchapter0\n", 205 | "----title : 1.价格分布\n", 206 | "\n", 207 | "---chapter1_subchapter1\n", 208 | "----title : 2.评论量分布\n", 209 | "\n", 210 | "---chapter1_subchapter2\n", 211 | "----title : 3.商品排行\n", 212 | "\n", 213 | "---chapter1_subchapter3\n", 214 | "----title : 4.品牌排行\n", 215 | "-chapter2\n", 216 | "--title : 三、Hot New Releases榜单分析:\n", 217 | "\n", 218 | "---chapter2_subchapter0\n", 219 | "----title : 1.价格分布\n", 220 | "\n", 221 | "---chapter2_subchapter1\n", 222 | "----title : 2.评论量分布\n", 223 | "\n", 224 | "---chapter2_subchapter2\n", 225 | "----title : 3.商品排行\n", 226 | "\n", 227 | "---chapter2_subchapter3\n", 228 | "----title : 4.品牌排行\n", 229 | "-chapter3\n", 230 | "--title : 四、Movers & Shakers榜单分析:\n", 231 | "\n", 232 | "---chapter3_subchapter0\n", 233 | "----title : 1.价格分布\n", 234 | "\n", 235 | "---chapter3_subchapter1\n", 236 | "----title : 2.评论量分布\n", 237 | "\n", 238 | "---chapter3_subchapter2\n", 239 | "----title : 3.商品排行\n", 240 | "\n", 241 | "---chapter3_subchapter3\n", 242 | "----title : 4.品牌排行\n", 243 | "-chapter4\n", 244 | "--title : \n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "print_structure(document, 0, show_contents = ['title'])" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": { 255 | "collapsed": true 256 | }, 257 | "source": [ 258 | "## Exercise\n", 259 | "Move the updated print_structure method into Document in ./Example_1/models.py" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | 
"metadata": { 266 | "collapsed": true 267 | }, 268 | "outputs": [], 269 | "source": [] 270 | } 271 | ], 272 | "metadata": { 273 | "hide_input": false, 274 | "kernelspec": { 275 | "display_name": "Python 3", 276 | "language": "python", 277 | "name": "python3" 278 | }, 279 | "language_info": { 280 | "codemirror_mode": { 281 | "name": "ipython", 282 | "version": 3 283 | }, 284 | "file_extension": ".py", 285 | "mimetype": "text/x-python", 286 | "name": "python", 287 | "nbconvert_exporter": "python", 288 | "pygments_lexer": "ipython3", 289 | "version": "3.6.1" 290 | } 291 | }, 292 | "nbformat": 4, 293 | "nbformat_minor": 2 294 | } 295 | -------------------------------------------------------------------------------- /4.1 模板的制作:Jinja2模板介绍.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### 学习目标\n", 8 | "\n", 9 | "- 了解jinja2模板常用语句" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### 前期设置\n", 17 | "- 初始化document\n", 18 | "- 定义直接显示jinja2渲染结果的函数render_templ" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 3, 24 | "metadata": { 25 | "collapsed": true 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "from jinja2 import Template" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 12, 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "class NameDict(object):\n", 41 | " def __getattr__(self, name):\n", 42 | " try:\n", 43 | " return self.name\n", 44 | " except:\n", 45 | " logging.error(\"Attribute is not exist\")\n", 46 | " return None\n", 47 | " \n", 48 | " def __getitem__(self, name):\n", 49 | " return self.__dict__[name]\n", 50 | " \n", 51 | " def __setitem__(self, name, value):\n", 52 | " self.__dict__[name] = value" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 13, 58 | "metadata": 
{ 59 | "collapsed": true 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "document = NameDict()" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 19, 69 | "metadata": { 70 | "collapsed": true 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "def render_templ(templ):\n", 75 | "    t = Template(templ)\n", 76 | "    html = t.render(document=document)\n", 77 | "    return html" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "#### Generating content directly with {{}}" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 21, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "data": { 94 | "text/plain": [ 95 | "'This is a title'" 96 | ] 97 | }, 98 | "execution_count": 21, 99 | "metadata": {}, 100 | "output_type": "execute_result" 101 | } 102 | ], 103 | "source": [ 104 | "document['title'] = \"This is a title\"\n", 105 | "templ = \"{{ document.title }}\"\n", 106 | "render_templ(templ)" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "#### Generating content in a loop\n", 114 | "```\n", 115 | "{% for element in element_list %}\n", 116 | "{{ element }}\n", 117 | "{% endfor %}\n", 118 | "```" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 36, 124 | "metadata": { 125 | "scrolled": true 126 | }, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "the begin\n", 133 | "\n", 134 | "T\n", 135 | "\n", 136 | "h\n", 137 | "\n", 138 | "i\n", 139 | "\n", 140 | "s\n", 141 | "\n", 142 | " \n", 143 | "\n", 144 | "i\n", 145 | "\n", 146 | "s\n", 147 | "\n", 148 | " \n", 149 | "\n", 150 | "a\n", 151 | "\n", 152 | " \n", 153 | "\n", 154 | "t\n", 155 | "\n", 156 | "i\n", 157 | "\n", 158 | "t\n", 159 | "\n", 160 | "l\n", 161 | "\n", 162 | "e\n", 163 | "\n", 164 | "the end\n" 165 | ] 166 | } 167 | ], 168 | "source": [ 169 | "templ = \"\"\"the begin\n", 170 | "{% for element in document['title'] %}\n", 171 | "{{ element 
}}\n", 172 | "{% endfor %}\n", 173 | "the end\"\"\"\n", 174 | "print(render_templ(templ))" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "这里出现了大量的空白,是用于{}外的空白符号并不会在render使自动忽略,若想自动忽略空白,需要在`{%`加入`-`,比如\n", 182 | "```\n", 183 | "{% for element in element_list -%}\n", 184 | "{{ element }}\n", 185 | "{%- endfor %}\n", 186 | "```\n", 187 | "就可以去除循环内部的空白。同理\n", 188 | "```\n", 189 | "{%- for element in element_list %}\n", 190 | "{{ element }}\n", 191 | "{% endfor -%}\n", 192 | "```\n", 193 | "可以去除外部的空白,有效防止模板排版对生成内容的影响。\n" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 32, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "name": "stdout", 203 | "output_type": "stream", 204 | "text": [ 205 | "the begin\n", 206 | "This is a title\n", 207 | "the end\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "templ = \"\"\"the begin\n", 213 | "{% for element in document['title'] -%}\n", 214 | "{{ element }}\n", 215 | "{%- endfor %}\n", 216 | "the end\"\"\"\n", 217 | "print(render_templ(templ))" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 35, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "name": "stdout", 227 | "output_type": "stream", 228 | "text": [ 229 | "the beginThis is a titlethe end\n" 230 | ] 231 | } 232 | ], 233 | "source": [ 234 | "templ = \"\"\"the begin\n", 235 | "{%- for element in document['title'] -%}\n", 236 | "{{ element }}\n", 237 | "{%- endfor -%}\n", 238 | "the end\"\"\"\n", 239 | "print(render_templ(templ))" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "#### 赋值" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 38, 252 | "metadata": {}, 253 | "outputs": [ 254 | { 255 | "name": "stdout", 256 | "output_type": "stream", 257 | "text": [ 258 | "the begin\n", 259 | "This is a title\n", 260 | "the end\n" 261 | ] 262 | } 263 | ], 264 | "source": [ 265 | 
"templ = \"\"\"the begin\n", 266 | "{% set element = document['title'] -%}\n", 267 | "{{ element }}\n", 268 | "the end\"\"\"\n", 269 | "print(render_templ(templ))" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "#### 条件语句" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 42, 282 | "metadata": {}, 283 | "outputs": [ 284 | { 285 | "name": "stdout", 286 | "output_type": "stream", 287 | "text": [ 288 | "the begin\n", 289 | "This is a title\n", 290 | "the end\n" 291 | ] 292 | } 293 | ], 294 | "source": [ 295 | "document.has_title = True\n", 296 | "templ = \"\"\"the begin\n", 297 | "{% if document.has_title -%}\n", 298 | "{{ document['title'] }}\n", 299 | "{%- endif %}\n", 300 | "the end\"\"\"\n", 301 | "print(render_templ(templ))" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 41, 307 | "metadata": {}, 308 | "outputs": [ 309 | { 310 | "name": "stdout", 311 | "output_type": "stream", 312 | "text": [ 313 | "the begin\n", 314 | "\n", 315 | "the end\n" 316 | ] 317 | } 318 | ], 319 | "source": [ 320 | "document.has_title = False\n", 321 | "templ = \"\"\"the begin\n", 322 | "{% if document.has_title -%}\n", 323 | "{{ document['title'] }}\n", 324 | "{%- endif %}\n", 325 | "the end\"\"\"\n", 326 | "print(render_templ(templ))" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "collapsed": true 334 | }, 335 | "outputs": [], 336 | "source": [] 337 | } 338 | ], 339 | "metadata": { 340 | "hide_input": false, 341 | "kernelspec": { 342 | "display_name": "Python 3", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.6.1" 
357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 2 361 | } 362 | -------------------------------------------------------------------------------- /4.2 模板的制作:Jinja2模板中的递归.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Learning goals\n", 8 | "\n", 9 | "- Refactor the report template using jinja2's recursive loops" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "### Setup\n", 17 | "- Initialize document\n", 18 | "- Define render_templ, a function that shows the jinja2 rendering result directly" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Recursive Example For Jinja2" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "First, let's look at a recursive Jinja2 template. Note that rendering works differently here: an Environment has to be set up beforehand." 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "name": "stdout", 42 | "output_type": "stream", 43 | "text": [ 44 | "\n", 45 | " depth=1. idx=1. pidx=0. title=a\n", 46 | " depth=2. idx=1. pidx=1. title=a_a\n", 47 | " depth=3. idx=1. pidx=1. title=a_a_a\n", 48 | " depth=2. idx=2. pidx=1. title=a_b\n", 49 | " depth=3. idx=1. pidx=2. title=a_b_a\n", 50 | " depth=4. idx=1. pidx=1. title=a_b_a_0\n", 51 | " depth=1. idx=2. pidx=0. title=b\n" 52 | ] 53 | } 54 | ], 55 | "source": [ 56 | "import jinja2\n", 57 | "\n", 58 | "template = \"\"\"\n", 59 | "{%- set idxs = [0] -%}\n", 60 | "{%- for item in sitemap recursive %}\n", 61 | " depth={{idxs|length}}. idx={{loop.index}}. pidx={{idxs[-1]}}. 
title={{item.title}}\n", 62 | " {%- if item.children -%}\n", 63 | " {%- do idxs.append(loop.index) -%}\n", 64 | " {{ loop(item.children) }}\n", 65 | " {%- do idxs.pop() -%}\n", 66 | " {%- endif %}\n", 67 | "{%- endfor %}\n", 68 | "\"\"\"\n", 69 | "\n", 70 | "class Node():\n", 71 | " def __init__(self, title, children=None):\n", 72 | " self.title = title\n", 73 | " self.children = children if children is not None else [] # avoid a shared mutable default\n", 74 | "\n", 75 | "sitemap = [\n", 76 | " Node('a', [\n", 77 | " Node('a_a', [\n", 78 | " Node('a_a_a'),\n", 79 | " ]),\n", 80 | " Node('a_b', [\n", 81 | " Node('a_b_a', [\n", 82 | " Node('a_b_a_0'),\n", 83 | " ]),\n", 84 | " ]),\n", 85 | " ]),\n", 86 | " Node('b'),\n", 87 | " ]\n", 88 | "\n", 89 | "env = jinja2.Environment(extensions=['jinja2.ext.do'])\n", 90 | "print(env.from_string(template).render(sitemap=sitemap))" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "### Converting the earlier report template to recursive form" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "Our Document class is also a tree, so its template can be generated recursively. Every Chapter and SubChapter has a similar structure and can be handled the same way, so the earlier jinja2 template can be rewritten as:\n", 105 | "```\n", 106 | "\n", 107 | "
\n", 108 | "

{{ document.title }}

\n", 109 | "

{{ document.foreword }}

\n", 110 | "\n", 111 | "{% for chapter in document.chapters recursive %}\n", 112 | " {%- if not chapter.title is none %}\n", 113 | "

{{ chapter.title }}

\n", 114 | " {%- endif -%}\n", 115 | " {%- if not chapter.foreword is none %}\n", 116 | "

{{ chapter.foreword }}

\n", 117 | " {%- endif -%}\n", 118 | " {% if not chapter.table is none %}\n", 119 | " {% set table = chapter.table %}\n", 120 | " \n", 121 | " \n", 122 | " {% for column in table.columns %}\n", 123 | " \n", 124 | " {% endfor %}\n", 125 | " \n", 126 | " {% for index in table.index %}\n", 127 | " \n", 128 | " {% for column in table.columns %}\n", 129 | " \n", 130 | " {% endfor %}\n", 131 | " \n", 132 | " {% endfor %}\n", 133 | "
{{column}}
{{ table[column][index] }}
\n", 134 | " {% endif %}\n", 135 | " {% if not chapter.image is none %}\n", 136 | " {% set image = chapter.image %}\n", 137 | "
\n", 138 | " {% endif %}\n", 139 | " {% if not chapter.content is none %}\n", 140 | "

{{ chapter.content }}

\n", 141 | " {% endif %}\n", 142 | " {% if chapter.chapters.__len__() > 0 %}\n", 143 | " {{ loop(chapter.chapters) }}\n", 144 | " {% endif %}\n", 145 | "\n", 146 | "{% endfor %}\n", 147 | "
\n", 148 | "\n", 149 | "```" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "## 作业\n", 157 | "上面的递归会把所有的章标题记录为`

`,对于更深一层的子章节我们需要使用更小的标签,比如`

`,这时候就可以使用递归深度来做规定。请自行做出作为练习。\n", 158 | "想要直接查看结果可以参照[Example_3](https://github.com/LinshuZhang/automate-report/tree/master/Example_3)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "outputs": [], 168 | "source": [] 169 | } 170 | ], 171 | "metadata": { 172 | "hide_input": false, 173 | "kernelspec": { 174 | "display_name": "Python 3", 175 | "language": "python", 176 | "name": "python3" 177 | }, 178 | "language_info": { 179 | "codemirror_mode": { 180 | "name": "ipython", 181 | "version": 3 182 | }, 183 | "file_extension": ".py", 184 | "mimetype": "text/x-python", 185 | "name": "python", 186 | "nbconvert_exporter": "python", 187 | "pygments_lexer": "ipython3", 188 | "version": "3.6.1" 189 | } 190 | }, 191 | "nbformat": 4, 192 | "nbformat_minor": 2 193 | } 194 | -------------------------------------------------------------------------------- /5. 图表绘制/Images.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | import matplotlib as mpl 5 | import matplotlib.font_manager as mfm 6 | import matplotlib.gridspec as gridspec 7 | import matplotlib.ticker as plticker 8 | 9 | class Image(object): 10 | font_path = {} 11 | prop = {} 12 | font_path['hei'] = './font/MSYHMONO.ttf' 13 | font_path['english'] = './font/Calibri.ttf' 14 | for font_name in list(font_path): 15 | prop[font_name] = mfm.FontProperties(fname=font_path[font_name]) 16 | 17 | title_font = prop['hei'].copy() 18 | title_font.set_size(14) 19 | xticks_font = prop['hei'].copy() 20 | xticks_font.set_size(9) 21 | ylable_font = prop['hei'].copy() 22 | ylable_font.set_size(10) 23 | legend_font = prop['hei'].copy() 24 | legend_font.set_size(9) 25 | default_colors = {} 26 | default_colors['blue'] = '#6CADDD' 27 | default_colors['yellow'] = '#F3903F' 28 | default_colors['red'] = '#CB615A' 29 | 
default_colors['orange'] = '#F3903F' 30 | default_colors['gray'] = '#B4B4B4' 31 | default_colors['lightyellow'] = '#FCC900' 32 | default_colors['royalblue'] = '#5488CF' 33 | 34 | IMAGE_WIDTH = 5.708 35 | IMAGE_HIGH = 2.756 36 | 37 | def __init__(self, title=None, labels=None, data=None, 38 | image_path=None, title_y=1.1, xticks_rotation='horizontal', legend_name=[]): 39 | self.length = len(data) 40 | self.x = np.arange(self.length) 41 | self.y = data 42 | self.data = data 43 | self.title_y = title_y 44 | self.title = title 45 | self.labels = labels 46 | self.legend_name = legend_name 47 | self.xticks_rotation = xticks_rotation 48 | self.image_path = image_path # stored so that save() knows where to write 49 | def init(self): 50 | self.fig = plt.figure(figsize=(self.IMAGE_WIDTH, self.IMAGE_HIGH)) 51 | self.gs = gridspec.GridSpec(1, 1) 52 | self.ax = self.fig.add_subplot(self.gs[0]) 53 | self.set_xticks() 54 | # hook for adding a secondary axis later 55 | self.add_ax() 56 | self.add_title() 57 | self.plot() 58 | self.set_spines() 59 | self.set_tick_marks() 60 | self.add_legend() 61 | # hook for additional settings later 62 | self.config_add() 63 | self.tight_layout() 64 | self.set_grid() 65 | plt.close() 66 | 67 | def add_ax(self): 68 | pass 69 | 70 | def add_legend(self): 71 | if self.legend_name is not None: 72 | self.ax.legend(self.legend_name, loc='upper right', bbox_to_anchor=(1, 1.2), prop=self.legend_font, frameon=True) 73 | 74 | def tight_layout(self, **karg): 75 | self.gs.tight_layout(self.fig, **karg) 76 | 77 | def plot(self): 78 | rects = plt.bar(self.x, self.y, 0.4, zorder=3, color=self.default_colors['blue']) 79 | for rect in rects: 80 | height = rect.get_height() 81 | self.ax.text(rect.get_x() + rect.get_width()/2., 1.05*height, 82 | '%d' % int(height), 83 | ha='center', va='bottom') 84 | def add_title(self): 85 | if self.title: 86 | self.ax.set_title(self.title, fontproperties=self.title_font, y=self.title_y) 87 | 88 | def set_grid(self): 89 | get_ax_space = lambda x: x.get_ylim()[1] - x.get_ylim()[0] 90 | self.ax_space = get_ax_space(self.ax) 91 | def 
get_interval(ax_space, space_number=5): 92 | digit_number = len(str((ax_space))) 93 | intervals = int((ax_space)/(space_number*(10**digit_number))) 94 | while intervals == 0: 95 | digit_number -= 1 96 | intervals = int((ax_space)/(space_number*(10**digit_number))) 97 | linshi = round((ax_space)/(space_number*(10**digit_number))) 98 | intervals = linshi*(10**digit_number) 99 | return intervals 100 | 101 | if 'intervals' not in self.__dict__: 102 | self.intervals = get_interval(self.ax_space) 103 | loc = plticker.MultipleLocator(base=self.intervals) 104 | self.ax.yaxis.set_major_locator(loc) 105 | self.ax.grid(axis='y', zorder=0) 106 | if 'ax2' in self.__dict__: 107 | print("Twin axes detected; setting the secondary-axis grid") 108 | self.ax_space2 = get_ax_space(self.ax2) 109 | self.intervals2 = get_interval(self.ax_space2, space_number=5) 110 | loc2 = plticker.MultipleLocator(base=self.intervals2) 111 | self.ax2.yaxis.set_major_locator(loc2) 112 | 113 | def set_spines(self): 114 | self.ax.spines['right'].set_visible(False) 115 | self.ax.spines['top'].set_visible(False) 116 | self.ax.spines['left'].set_visible(False) 117 | 118 | def set_tick_marks(self): 119 | self.ax.tick_params(axis='both', which='both', bottom=False, top=False, 120 | labelbottom=True, left=False, right=False, labelleft=True) 121 | 122 | def set_xticks(self): 123 | plt.xticks(self.x, self.labels, fontproperties=self.xticks_font, rotation=self.xticks_rotation) # set the x-axis tick labels 124 | 125 | def show(self): 126 | plt.show() 127 | 128 | def save(self): 129 | import logging # local import: the module header does not import logging 130 | image_path = getattr(self, 'image_path', None) # tolerate instances that never set it 131 | if image_path: 132 | self.fig.savefig(image_path) 133 | else: 134 | logging.warning("Please set image_path first") 135 | 136 | def config_add(self): 137 | """ 138 | Reserved for extending functionality or settings 139 | """ 140 | pass -------------------------------------------------------------------------------- /5. 
图表绘制/font/Calibri.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/5. 图表绘制/font/Calibri.ttf -------------------------------------------------------------------------------- /5. 图表绘制/font/MSYHMONO.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/5. 图表绘制/font/MSYHMONO.ttf -------------------------------------------------------------------------------- /5. 图表绘制/source/anatomy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/5. 图表绘制/source/anatomy.png -------------------------------------------------------------------------------- /5. 图表绘制/source/multi_line.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/5. 图表绘制/source/multi_line.png -------------------------------------------------------------------------------- /5. 图表绘制/source/plot_bar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/5. 图表绘制/source/plot_bar.png -------------------------------------------------------------------------------- /5. 图表绘制/source/twinx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/5. 图表绘制/source/twinx.png -------------------------------------------------------------------------------- /6. 
报告内容生成.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Prerequisites\n", 8 | "\n", 9 | "- Basics of Python classes\n", 10 | "- Familiarity with jinja2 templates\n", 11 | "- Familiarity with page layout using HTML and CSS\n", 12 | "- The code comes from [Example_1](https://github.com/LinshuZhang/automate-report/tree/master/Example_1)" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "### Learning objectives\n", 20 | "\n", 21 | "- Generate an HTML document from a document instance that already has its content added" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Here we use a jinja2 template to generate the report directly as HTML: HTML and CSS define the formatting, and jinja2 renders the content" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### The jinja2 template" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "Start with the usual HTML head:\n", 43 | "```\n", 44 | "\n", 45 | "\n", 46 | "\n", 47 | "\n", 48 | "\n", 49 | "```\n", 50 | "Then define the formatting for each part of the content:\n", 51 | "```\n", 52 | "\n", 72 | "```" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "Generate the chapters of the report one by one\n", 80 | "```\n", 81 | "\n", 82 | "
\n", 83 | "

{{ document.title }}

\n", 84 | "

{{ document.foreword }}

\n", 85 | "\n", 86 | "{% for chapter in document.chapters %}\n", 87 | "

{{ chapter.title }}

\n", 88 | " {%- if chapter.foreword -%}\n", 89 | "

{{ chapter.foreword }}

\n", 90 | " {%- endif -%}\n", 91 | "

{{ chapter.content }}

\n", 92 | " {% if not chapter.table is none %}\n", 93 | " {% set table = chapter.table %}\n", 94 | " \n", 95 | " \n", 96 | " {% for column in table.columns %}\n", 97 | " \n", 98 | " {% endfor %}\n", 99 | " \n", 100 | " {% for index in table.index %}\n", 101 | " \n", 102 | " {% for column in table.columns %}\n", 103 | " \n", 104 | " {% endfor %}\n", 105 | " \n", 106 | " {% endfor %}\n", 107 | "
{{column}}
{{ table[column][index] }}
\n", 108 | " {% endif %}\n", 109 | " {% for subchapter in chapter.chapters %}\n", 110 | " {% if not subchapter.title is none %}\n", 111 | "

{{ subchapter.title }}

\n", 112 | "

{{ subchapter.foreword }}

\n", 113 | " {% if not subchapter.table is none %}\n", 114 | " {% set table = subchapter.table %}\n", 115 | " \n", 116 | " \n", 117 | " {% for column in table.columns %}\n", 118 | " \n", 119 | " {% endfor %}\n", 120 | " \n", 121 | " {% for index in table.index %}\n", 122 | " \n", 123 | " {% for column in table.columns %}\n", 124 | " \n", 125 | " {% endfor %}\n", 126 | " \n", 127 | " {% endfor %}\n", 128 | "
{{column}}
{{ table[column][index] }}
\n", 129 | " {% endif %}\n", 130 | "\n", 131 | " {% if not subchapter.image is none %}\n", 132 | " {% set image = subchapter.image %}\n", 133 | "
\n", 134 | " {% endif %}\n", 135 | "\n", 136 | "

{{ subchapter.content }}

\n", 137 | " {% endif %}\n", 138 | " {% endfor %}\n", 139 | "\n", 140 | "{% endfor %}\n", 141 | "
\n", 142 | "\n", 143 | "\n", 144 | "```" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "执行教程 `3. 报告的内容添加.ipynb` ,可以得到已经添加报告内容的document实例 " 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 1, 157 | "metadata": { 158 | "scrolled": true 159 | }, 160 | "outputs": [ 161 | { 162 | "name": "stdout", 163 | "output_type": "stream", 164 | "text": [ 165 | "chapter3_subchapter0\n", 166 | "title : 2018年5月21日-2018年5月27日亚马逊美站\t Pet Supplies \t品类爆款分析\n", 167 | "-chapter0\n", 168 | "--title : 一、总体情况\n", 169 | "-chapter1\n", 170 | "--title : 二、Best Seller榜单分析:\n", 171 | "\n", 172 | "---chapter1_subchapter0\n", 173 | "----title : 1.价格分布\n", 174 | "\n", 175 | "---chapter1_subchapter1\n", 176 | "----title : 2.评论量分布\n", 177 | "\n", 178 | "---chapter1_subchapter2\n", 179 | "----title : 3.商品排行\n", 180 | "\n", 181 | "---chapter1_subchapter3\n", 182 | "----title : 4.品牌排行\n", 183 | "-chapter2\n", 184 | "--title : 三、Hot New Releases榜单分析:\n", 185 | "\n", 186 | "---chapter2_subchapter0\n", 187 | "----title : 1.价格分布\n", 188 | "\n", 189 | "---chapter2_subchapter1\n", 190 | "----title : 2.评论量分布\n", 191 | "\n", 192 | "---chapter2_subchapter2\n", 193 | "----title : 3.商品排行\n", 194 | "\n", 195 | "---chapter2_subchapter3\n", 196 | "----title : 4.品牌排行\n", 197 | "-chapter3\n", 198 | "--title : 四、Movers & Shakers榜单分析:\n", 199 | "\n", 200 | "---chapter3_subchapter0\n", 201 | "----title : 1.价格分布\n", 202 | "\n", 203 | "---chapter3_subchapter1\n", 204 | "----title : 2.评论量分布\n", 205 | "\n", 206 | "---chapter3_subchapter2\n", 207 | "----title : 3.商品排行\n", 208 | "\n", 209 | "---chapter3_subchapter3\n", 210 | "----title : 4.品牌排行\n", 211 | "-chapter4\n", 212 | "--title : \n" 213 | ] 214 | } 215 | ], 216 | "source": [ 217 | "%run \"3. 
报告的内容添加.ipynb\"" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "Render the document instance with the jinja2 template to generate the report `report.html`" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 10, 230 | "metadata": { 231 | "collapsed": true 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "from jinja2 import Template\n", 236 | "html_name = 'report.html'\n", 237 | "with open('./template.html', encoding='utf-8') as f: # explicit encoding: the report text is Chinese\n", 238 | " templ = f.read()\n", 239 | "t = Template(templ)\n", 240 | "html = t.render(document=document)\n", 241 | "with open(html_name, 'w', encoding='utf-8') as f:\n", 242 | " f.write(html)" 243 | ] 244 | } 245 | ], 246 | "metadata": { 247 | "hide_input": false, 248 | "kernelspec": { 249 | "display_name": "Python 3", 250 | "language": "python", 251 | "name": "python3" 252 | }, 253 | "language_info": { 254 | "codemirror_mode": { 255 | "name": "ipython", 256 | "version": 3 257 | }, 258 | "file_extension": ".py", 259 | "mimetype": "text/x-python", 260 | "name": "python", 261 | "nbconvert_exporter": "python", 262 | "pygments_lexer": "ipython3", 263 | "version": "3.6.1" 264 | } 265 | }, 266 | "nbformat": 4, 267 | "nbformat_minor": 2 268 | } 269 | -------------------------------------------------------------------------------- /7. 数据处理/1. 
数据的读取.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Prerequisites\n", 8 | "\n", 9 | "- pandas basics: pd.read_csv\n", 10 | "- Regular expression basics" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "Automated reporting places higher demands on data quality, yet errors and omissions are entirely normal in practice. Rather than only fixing bugs after problems appear, we should set rules that are as strict as possible from the very start, and report and handle unexpected cases." 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Reading CSV files\n", 25 | "CSV files are a data source we use often, so a CSV file serves as the example here" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### First, inspect the data to be read" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 18, 38 | "metadata": { 39 | "collapsed": true 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "import pandas as pd\n", 44 | "import numpy as np" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 5, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/html": [ 55 | "
\n", 56 | "\n", 69 | "\n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | "
| # | Product Name | Brand | Price | Category | Rank | Sales | Revenue | Reviews | Rating | Seller | LQS | ASIN | Link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mind Reader Adjustable Height Ergonomic Foot R... | Mind Reader | $14.99 | Office Products | 286 | 4,440 | $66,556 | 309 | 4 | AMZ | N.A. | B07FMGMVT8 | https://www.amazon.com/dp/B07FMGMVT8 |
| 2 | AmazonBasics Foot Rest - Black | AmazonBasics | $13.19 | Office Products | 539 | 3,115 | $41,087 | 657 | 4 | N.A. | 5 | B01DN8TG46 | https://www.amazon.com/dp/B01DN8TG46 |
| 3 | Sleepy Ride - Airplane Footrest Made with Prem... | Sleepy Ride | $19.97 | Office Products | 1,067 | 2,075 | $41,438 | 386 | 4.5 | FBA | 5 | B01M35M87O | https://www.amazon.com/dp/B01M35M87O |
| 4 | Rest My Sole - Foot Rest Cushion for Under Des... | Well Desk | $26.95 | Office Products | 1,159 | 1,661 | $44,764 | 188 | 4.5 | FBA | 8 | B075RYDWZH | https://www.amazon.com/dp/B075RYDWZH |
| 5 | Andyer Andyer Foot Rest, Portable Travel Footr... | Andyer | $10.99 | Home & Kitchen | 6,169 | 1,384 | $15,210 | 215 | 4 | FBA | 6 | B072VJ9BKX | https://www.amazon.com/dp/B072VJ9BKX |
\n", 187 | "
" 188 | ], 189 | "text/plain": [ 190 | " Product Name Brand Price \\\n", 191 | "# \n", 192 | "1 Mind Reader Adjustable Height Ergonomic Foot R... Mind Reader $14.99 \n", 193 | "2 AmazonBasics Foot Rest - Black AmazonBasics $13.19 \n", 194 | "3 Sleepy Ride - Airplane Footrest Made with Prem... Sleepy Ride $19.97 \n", 195 | "4 Rest My Sole - Foot Rest Cushion for Under Des... Well Desk $26.95 \n", 196 | "5 Andyer Andyer Foot Rest, Portable Travel Footr... Andyer $10.99 \n", 197 | "\n", 198 | " Category Rank Sales Revenue Reviews Rating Seller LQS \\\n", 199 | "# \n", 200 | "1 Office Products 286 4,440 $66,556 309 4 AMZ N.A. \n", 201 | "2 Office Products 539 3,115 $41,087 657 4 N.A. 5 \n", 202 | "3 Office Products 1,067 2,075 $41,438 386 4.5 FBA 5 \n", 203 | "4 Office Products 1,159 1,661 $44,764 188 4.5 FBA 8 \n", 204 | "5 Home & Kitchen 6,169 1,384 $15,210 215 4 FBA 6 \n", 205 | "\n", 206 | " ASIN Link \n", 207 | "# \n", 208 | "1 B07FMGMVT8 https://www.amazon.com/dp/B07FMGMVT8 \n", 209 | "2 B01DN8TG46 https://www.amazon.com/dp/B01DN8TG46 \n", 210 | "3 B01M35M87O https://www.amazon.com/dp/B01M35M87O \n", 211 | "4 B075RYDWZH https://www.amazon.com/dp/B075RYDWZH \n", 212 | "5 B072VJ9BKX https://www.amazon.com/dp/B072VJ9BKX " 213 | ] 214 | }, 215 | "execution_count": 5, 216 | "metadata": {}, 217 | "output_type": "execute_result" 218 | } 219 | ], 220 | "source": [ 221 | "# 可以发现第8行才是头部,于是设置header参数\n", 222 | "data = pd.read_csv('data.csv', header=7, index_col=0)\n", 223 | "data.head()" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "### 对读取目标列进行格式规定" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 7, 236 | "metadata": {}, 237 | "outputs": [ 238 | { 239 | "data": { 240 | "text/plain": [ 241 | "Product Name object\n", 242 | "Brand object\n", 243 | "Price object\n", 244 | "Category object\n", 245 | "Rank object\n", 246 | "Sales object\n", 247 | "Revenue object\n", 248 | "Reviews int64\n", 249 
| "Rating object\n", 250 | "Seller object\n", 251 | "LQS object\n", 252 | "ASIN object\n", 253 | "Link object\n", 254 | "dtype: object" 255 | ] 256 | }, 257 | "execution_count": 7, 258 | "metadata": {}, 259 | "output_type": "execute_result" 260 | } 261 | ], 262 | "source": [ 263 | "data.dtypes" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "可以看到在列:Price, Rank, Sales, Revenue, Reviews, Rating, LQS都应该是数值,但是只有Review列被默认读取为数值" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "#### 使用dtype进行格式规定" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 19, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "invalid literal for int() with base 10: '1,067'\n" 290 | ] 291 | } 292 | ], 293 | "source": [ 294 | "dtype = {'#':int,\n", 295 | " 'Product Name':str,\n", 296 | " 'Brand':str,\n", 297 | " 'Price':float,\n", 298 | " 'Category':str,\n", 299 | " 'Rank':int,\n", 300 | " 'Sales':int,\n", 301 | " 'Revenue':int,\n", 302 | " 'Reviews':int,\n", 303 | " 'Rating':float,\n", 304 | " 'Seller':str,\n", 305 | " 'LQS':int,\n", 306 | " 'ASIN':str,\n", 307 | " 'Link':str\n", 308 | " }\n", 309 | "try:\n", 310 | " data = pd.read_csv('data.csv', dtype=dtype, header=7, index_col=0)\n", 311 | "except BaseException as e:\n", 312 | " print(e)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "可以看到使用dtype并不能直接忽略非数字符号进行转换,我们需要更强的格式规定" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "#### 使用converters进行格式转化" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 50, 332 | "metadata": {}, 333 | "outputs": [ 334 | { 335 | "data": { 336 | "text/html": [ 337 | "
\n", 338 | "\n", 351 | "\n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | "
| # | Product Name | Brand | Price | Category | Rank | Sales | Revenue | Reviews | Rating | Seller | LQS | ASIN | Link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mind Reader Adjustable Height Ergonomic Foot R... | Mind Reader | 14.99 | Office Products | 286.0 | 4440.0 | 66556.0 | 309.0 | 4.0 | AMZ | N.A. | B07FMGMVT8 | https://www.amazon.com/dp/B07FMGMVT8 |
| 2 | AmazonBasics Foot Rest - Black | AmazonBasics | 13.19 | Office Products | 539.0 | 3115.0 | 41087.0 | 657.0 | 4.0 | N.A. | 5 | B01DN8TG46 | https://www.amazon.com/dp/B01DN8TG46 |
| 3 | Sleepy Ride - Airplane Footrest Made with Prem... | Sleepy Ride | 19.97 | Office Products | 1067.0 | 2075.0 | 41438.0 | 386.0 | 4.5 | FBA | 5 | B01M35M87O | https://www.amazon.com/dp/B01M35M87O |
| 4 | Rest My Sole - Foot Rest Cushion for Under Des... | Well Desk | 26.95 | Office Products | 1159.0 | 1661.0 | 44764.0 | 188.0 | 4.5 | FBA | 8 | B075RYDWZH | https://www.amazon.com/dp/B075RYDWZH |
| 5 | Andyer Andyer Foot Rest, Portable Travel Footr... | Andyer | 10.99 | Home & Kitchen | 6169.0 | 1384.0 | 15210.0 | 215.0 | 4.0 | FBA | 6 | B072VJ9BKX | https://www.amazon.com/dp/B072VJ9BKX |
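The numeric columns in this preview (`Price`, `Rank`, `Sales`, `Revenue`, `Reviews`, `Rating`) are floats because each raw string passes through the notebook's `str2num` converter. A minimal standalone sketch of that cleaning step follows; it uses a raw-string pattern, and `NUMBER_PATTERN` and `samples` are names introduced here for illustration, with sample strings taken from the raw CSV preview.

```python
import math
import re

# Compiled once; matches an unsigned integer or decimal such as "14.99" or "4440".
NUMBER_PATTERN = re.compile(r'\d+\.?\d*')


def str2num(value):
    """Return the first number found in `value` as a float, or NaN if there is none."""
    text = str(value).replace(',', '')  # drop thousands separators: "1,067" -> "1067"
    match = NUMBER_PATTERN.search(text)
    return float(match.group()) if match else float('nan')


# Illustrative inputs copied from the raw preview (currency signs, commas, "N.A.").
samples = ['$14.99', '1,067', '$66,556', '4.5', 'N.A.']
cleaned = [str2num(s) for s in samples]
print(cleaned)
```

Note that this pattern ignores a leading minus sign, which is fine for prices and ranks but would need extending for columns that can be negative.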
\n", 469 | "
" 470 | ], 471 | "text/plain": [ 472 | " Product Name Brand Price \\\n", 473 | "# \n", 474 | "1 Mind Reader Adjustable Height Ergonomic Foot R... Mind Reader 14.99 \n", 475 | "2 AmazonBasics Foot Rest - Black AmazonBasics 13.19 \n", 476 | "3 Sleepy Ride - Airplane Footrest Made with Prem... Sleepy Ride 19.97 \n", 477 | "4 Rest My Sole - Foot Rest Cushion for Under Des... Well Desk 26.95 \n", 478 | "5 Andyer Andyer Foot Rest, Portable Travel Footr... Andyer 10.99 \n", 479 | "\n", 480 | " Category Rank Sales Revenue Reviews Rating Seller LQS \\\n", 481 | "# \n", 482 | "1 Office Products 286.0 4440.0 66556.0 309.0 4.0 AMZ N.A. \n", 483 | "2 Office Products 539.0 3115.0 41087.0 657.0 4.0 N.A. 5 \n", 484 | "3 Office Products 1067.0 2075.0 41438.0 386.0 4.5 FBA 5 \n", 485 | "4 Office Products 1159.0 1661.0 44764.0 188.0 4.5 FBA 8 \n", 486 | "5 Home & Kitchen 6169.0 1384.0 15210.0 215.0 4.0 FBA 6 \n", 487 | "\n", 488 | " ASIN Link \n", 489 | "# \n", 490 | "1 B07FMGMVT8 https://www.amazon.com/dp/B07FMGMVT8 \n", 491 | "2 B01DN8TG46 https://www.amazon.com/dp/B01DN8TG46 \n", 492 | "3 B01M35M87O https://www.amazon.com/dp/B01M35M87O \n", 493 | "4 B075RYDWZH https://www.amazon.com/dp/B075RYDWZH \n", 494 | "5 B072VJ9BKX https://www.amazon.com/dp/B072VJ9BKX " 495 | ] 496 | }, 497 | "execution_count": 50, 498 | "metadata": {}, 499 | "output_type": "execute_result" 500 | } 501 | ], 502 | "source": [ 503 | "import re\n", 504 | "# 使用正则表达式进行数字提取\n", 505 | "def str2num(string):\n", 506 | " if not isinstance(string, str):\n", 507 | " string = str(string)\n", 508 | " string = string.replace(',','')\n", 509 | " regular_expression = '\\d+\\.?\\d*'\n", 510 | " pattern = re.compile(regular_expression)\n", 511 | " match = pattern.search(string)\n", 512 | " if match:\n", 513 | " return float(match.group())\n", 514 | " else:\n", 515 | " return float('nan')\n", 516 | "converters = {'Price':str2num,\n", 517 | " 'Rank':str2num,\n", 518 | " 'Rating':str2num,\n", 519 | " 'Sales':str2num,\n", 520 | " 
'Revenue':str2num,\n", 521 | " 'Reviews':str2num\n", 522 | " }\n", 523 | "try:\n", 524 | " data = pd.read_csv('data.csv', converters=converters, header=7, index_col=0)\n", 525 | "except Exception as e: # Exception, not BaseException, so Ctrl-C still interrupts\n", 526 | " print(e)\n", 527 | "data.head()" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "Decouple the data-processing functions: move str2num into the tools module and the data reading into the datapipeline module" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": null, 540 | "metadata": { 541 | "collapsed": true 542 | }, 543 | "outputs": [], 544 | "source": [] 545 | } 546 | ], 547 | "metadata": { 548 | "hide_input": false, 549 | "kernelspec": { 550 | "display_name": "Python 3", 551 | "language": "python", 552 | "name": "python3" 553 | }, 554 | "language_info": { 555 | "codemirror_mode": { 556 | "name": "ipython", 557 | "version": 3 558 | }, 559 | "file_extension": ".py", 560 | "mimetype": "text/x-python", 561 | "name": "python", 562 | "nbconvert_exporter": "python", 563 | "pygments_lexer": "ipython3", 564 | "version": "3.6.1" 565 | } 566 | }, 567 | "nbformat": 4, 568 | "nbformat_minor": 2 569 | } 570 | -------------------------------------------------------------------------------- /7. 数据处理/2. 
数据的去重.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Prerequisites\n", 8 | "\n", 9 | "- pandas basics: pd.read_csv\n", 10 | "- Regular expression basics" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "During data collection, network latency or the scraping rules often lead to duplicated records, so the data needs to be checked for duplicates and deduplicated." 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "First, each row must be unique with respect to one or more attributes, or a rule that defines uniqueness must be fixed; the data is then checked for duplicates and deduplicated. We continue with `data.csv` as the example." 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Importing the data" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 1, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "from datapipeline import data" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 2, 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "data": { 50 | "text/html": [ 51 | "
\n", 52 | "\n", 65 | "\n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | "
| # | Product Name | Brand | Price | Category | Rank | Sales | Revenue | Reviews | Rating | Seller | LQS | ASIN | Link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mind Reader Adjustable Height Ergonomic Foot R... | Mind Reader | 14.99 | Office Products | 286.0 | 4440.0 | 66556.0 | 309.0 | 4.0 | AMZ | N.A. | B07FMGMVT8 | https://www.amazon.com/dp/B07FMGMVT8 |
| 2 | AmazonBasics Foot Rest - Black | AmazonBasics | 13.19 | Office Products | 539.0 | 3115.0 | 41087.0 | 657.0 | 4.0 | N.A. | 5 | B01DN8TG46 | https://www.amazon.com/dp/B01DN8TG46 |
| 3 | Sleepy Ride - Airplane Footrest Made with Prem... | Sleepy Ride | 19.97 | Office Products | 1067.0 | 2075.0 | 41438.0 | 386.0 | 4.5 | FBA | 5 | B01M35M87O | https://www.amazon.com/dp/B01M35M87O |
| 4 | Rest My Sole - Foot Rest Cushion for Under Des... | Well Desk | 26.95 | Office Products | 1159.0 | 1661.0 | 44764.0 | 188.0 | 4.5 | FBA | 8 | B075RYDWZH | https://www.amazon.com/dp/B075RYDWZH |
| 5 | Andyer Andyer Foot Rest, Portable Travel Footr... | Andyer | 10.99 | Home & Kitchen | 6169.0 | 1384.0 | 15210.0 | 215.0 | 4.0 | FBA | 6 | B072VJ9BKX | https://www.amazon.com/dp/B072VJ9BKX |
\n", 183 | "
" 184 | ], 185 | "text/plain": [ 186 | " Product Name Brand Price \\\n", 187 | "# \n", 188 | "1 Mind Reader Adjustable Height Ergonomic Foot R... Mind Reader 14.99 \n", 189 | "2 AmazonBasics Foot Rest - Black AmazonBasics 13.19 \n", 190 | "3 Sleepy Ride - Airplane Footrest Made with Prem... Sleepy Ride 19.97 \n", 191 | "4 Rest My Sole - Foot Rest Cushion for Under Des... Well Desk 26.95 \n", 192 | "5 Andyer Andyer Foot Rest, Portable Travel Footr... Andyer 10.99 \n", 193 | "\n", 194 | " Category Rank Sales Revenue Reviews Rating Seller LQS \\\n", 195 | "# \n", 196 | "1 Office Products 286.0 4440.0 66556.0 309.0 4.0 AMZ N.A. \n", 197 | "2 Office Products 539.0 3115.0 41087.0 657.0 4.0 N.A. 5 \n", 198 | "3 Office Products 1067.0 2075.0 41438.0 386.0 4.5 FBA 5 \n", 199 | "4 Office Products 1159.0 1661.0 44764.0 188.0 4.5 FBA 8 \n", 200 | "5 Home & Kitchen 6169.0 1384.0 15210.0 215.0 4.0 FBA 6 \n", 201 | "\n", 202 | " ASIN Link \n", 203 | "# \n", 204 | "1 B07FMGMVT8 https://www.amazon.com/dp/B07FMGMVT8 \n", 205 | "2 B01DN8TG46 https://www.amazon.com/dp/B01DN8TG46 \n", 206 | "3 B01M35M87O https://www.amazon.com/dp/B01M35M87O \n", 207 | "4 B075RYDWZH https://www.amazon.com/dp/B075RYDWZH \n", 208 | "5 B072VJ9BKX https://www.amazon.com/dp/B072VJ9BKX " 209 | ] 210 | }, 211 | "execution_count": 2, 212 | "metadata": {}, 213 | "output_type": "execute_result" 214 | } 215 | ], 216 | "source": [ 217 | "data.head()" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "There are no duplicate rows yet, so we first add some by hand" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 10, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "name": "stdout", 234 | "output_type": "stream", 235 | "text": [ 236 | "110\n" 237 | ] 238 | } 239 | ], 240 | "source": [ 241 | "data = data.append(data[0:10])\n", 242 | "print(len(data))" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | 
"ASIN is clearly the unique attribute in data, so we can deduplicate on ASIN" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 11, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "name": "stdout", 259 | "output_type": "stream", 260 | "text": [ 261 | "100\n" 262 | ] 263 | } 264 | ], 265 | "source": [ 266 | "data = data.drop_duplicates(subset='ASIN')\n", 267 | "print(len(data))" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "**Tip:**\n", 275 | "\n", 276 | "In practice, deduplication has to follow the characteristics of the data and the business requirements, and you may even need to write your own deduplication rules; that is not covered here." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "collapsed": true 284 | }, 285 | "outputs": [], 286 | "source": [] 287 | } 288 | ], 289 | "metadata": { 290 | "hide_input": false, 291 | "kernelspec": { 292 | "display_name": "Python 3", 293 | "language": "python", 294 | "name": "python3" 295 | }, 296 | "language_info": { 297 | "codemirror_mode": { 298 | "name": "ipython", 299 | "version": 3 300 | }, 301 | "file_extension": ".py", 302 | "mimetype": "text/x-python", 303 | "name": "python", 304 | "nbconvert_exporter": "python", 305 | "pygments_lexer": "ipython3", 306 | "version": "3.6.1" 307 | } 308 | }, 309 | "nbformat": 4, 310 | "nbformat_minor": 2 311 | } 312 | -------------------------------------------------------------------------------- /7. 数据处理/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/7. 数据处理/__init__.py -------------------------------------------------------------------------------- /7. 
数据处理/datapipeline.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | 4 | from tools import str2num 5 | 6 | converters = {'Price':str2num, # parse the numeric columns while reading 7 | 'Rank':str2num, 8 | 'Rating':str2num, 9 | 'Sales':str2num, 10 | 'Revenue':str2num, 11 | 'Reviews':str2num 12 | } 13 | try: 14 | data = pd.read_csv('data.csv', converters=converters, header=7, index_col=0) 15 | except Exception as e: # report read errors without swallowing KeyboardInterrupt 16 | print(e) -------------------------------------------------------------------------------- /7. 数据处理/tools.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | def str2num(string): 4 | if not isinstance(string, str): 5 | string = str(string) 6 | string = string.replace(',','') # drop thousands separators 7 | regular_expression = r'\d+\.?\d*' # first run of digits with an optional decimal part 8 | pattern = re.compile(regular_expression) 9 | match = pattern.search(string) 10 | if match: 11 | return float(match.group()) 12 | else: 13 | return float('nan') -------------------------------------------------------------------------------- /8. 
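文件结构设计与模块化.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Prerequisites" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "- For the code, see [Example_3](https://github.com/LinshuZhang/automate-report/tree/master/Example_3)" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### Learning objectives\n", 22 | "\n", 23 | "- Organize the file structure of a report automation script around the content and form of the report\n", 24 | "- Build up reusable techniques while writing the data analysis modules for different reports\n", 25 | "- Generalize the analysis modules so that more general-purpose modules accumulate over time" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "In section 7.1 on reading data, we already sorted functions into separate modules:\n", 33 | "`str2num` went into the tools module and data reading into the datapipeline module. Here we do the same kind of work and sort the remaining functions by category." 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "### 1. 
File organization\n", 41 | "An automated report is generally organized as follows:" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "Legend:\n", 49 | "- ***Folder*** : bold italic\n", 50 | "- *Data file* : italic\n", 51 | "- **Executable module** : bold\n", 52 | "- Unclassified file : no formatting\n", 53 | "\n", 54 | "\n", 55 | "- ***report*** Working directory of the automated report\n", 56 | " - **\__init__.py**\n", 57 | " - main.ipynb Main file for running and debugging; shows content and results and holds the customizations\n", 58 | " - ***data*** Stores data files that are used or generated\n", 59 | " - *dynamic_data.csv*\n", 60 | " \n", 61 | " *dynamic_data.csv* changes often; every report uses different data\n", 62 | " - *static_data.csv*\n", 63 | " \n", 64 | " *static_data.csv* is updated rarely, or is updated but shared by all reports. Examples are the fitted parameters used for forecasting, a text association matrix, or overall means and variances that evolve as reports are produced and are used for comparison over time. When several people collaborate, this kind of data is better kept in a shared database.\n", 65 | " - ***image*** Stores the generated image files\n", 66 | " - ***report*** Stores the generated reports\n", 67 | " - ***template*** Stores the templates needed to generate the report\n", 68 | " - template.html Template for the HTML report\n", 69 | " - template.tex Template for the PDF report; Python first generates the TeX file, which is then compiled to PDF\n", 70 | " \n", 71 | " The template's file extension only affects how an editor displays it, not how Jinja2 renders it, because Python simply reads the file content either way\n", 72 | " - **tools.py** Utility functions that are reusable and each do one thing\n", 
73 | " - **datapipeline.py** Reads and preprocesses the data; cleaned data can be imported directly from it\n", 74 | " - **ImageFactory.py** Chart toolkit; finished chart templates are kept here for reuse\n", 75 | " - **models.py** Defines the document structure\n", 76 | " - **items.py** Defines the data structures used in the analysis\n", 77 | " - **analysis.py** Holds mature analysis methods, mostly extracted from main.ipynb\n", 78 | " - **configs.py** Report parameters such as author, date and data source; changing only these settings is enough to generate a different report\n", 79 | " - readme.txt Description of the report\n", 80 | " \n", 81 | " A description of the file structure like this one can go into the readme, explaining what each file in the directory does and, optionally, documenting the methods they contain in more detail\n", 82 | " - requirements.txt Lists the packages and versions of the Python environment used to generate the report, so that the environment can later be set up with `pip install -r requirements.txt`\n", 83 | " " 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "### 2. How a report is produced\n", 91 | "In report making, data is the raw material, each analysis is a processing step, and the finished report, or its conclusions, is the final product. This section describes how data becomes that product within this framework." 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "- After the data is obtained it is stored in ***data***. 
How data is obtained differs from business to business and is not covered here; tutorials on web scraping and databases should be easy to find online\n", 99 | "- The code in **datapipeline.py** cleans the data in ***data*** automatically and may need settings from **configs.py**\n", 100 | "- The data structures defined in **models.py** further process the cleaned data imported from **datapipeline.py**\n", 101 | "- main.ipynb\n", 102 | " \n", 103 | " Import the settings from **configs.py**\n", 104 | " \n", 105 | " Import the document structure and data structures from **models.py**\n", 106 | "\n", 107 | " Import the image templates from **ImageFactory.py**\n", 108 | "\n", 109 | " Import the processed data from **datapipeline.py**\n", 110 | " \n", 111 | " Import the data analysis tools from **tools.py**\n", 112 | " \n", 113 | " Analyze the data\n", 114 | " \n", 115 | " Generate the images\n", 116 | " \n", 117 | " Add explanations and manual analysis\n", 118 | " \n", 119 | " Save everything to the document class\n", 120 | " \n", 121 | " Render the document with a template from ***template*** to generate the report \n", 122 | " \n", 123 | "- Once the methods in main.ipynb are stable and mature, modularize them: move the data structures into **items.py** and the analysis methods into **analysis.py**, and keep the customized parts of a semi-automated report in main.ipynb" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "### 3. Workflow" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "1. 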
Put the data files you need into the data folder.\n", 138 | "There are two kinds of data. One is supporting data used to generate reports: it changes slowly but is needed for every report, for example industry means and variances or the parameters of a fit; for this kind, a database managed with SQL is recommended. The other kind belongs to exactly one report and fits well in the data folder.\n", 139 | "2. Edit the settings in configs.py\n", 140 | "3. Run main.ipynb to display the image and text results in the notebook; for a semi-automated report, add the manual text at this point\n", 141 | "4. Export report.html from main.ipynb" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "### 4. Adding comments" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "We again use the example from section 7.1 on reading data. There we said only roughly to “put str2num into the tools module and data reading into the datapipeline module”. In practice, copying and pasting is not enough: to make the code easy to reuse and generalize, it also needs comments." 
156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 8, 161 | "metadata": { 162 | "collapsed": true 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "def str2num(string):\n", 167 | " \"\"\"\n", 168 | " str2num(string)\n", 169 | " \n", 170 | " Extract the first number from a string.\n", 171 | " \n", 172 | " Parameters\n", 173 | " ----------\n", 174 | " string : a string such as '$2.1' or '$1,333' (commas are stripped first)\n", 175 | " \n", 176 | " Returns\n", 177 | " -------\n", 178 | " out : float, or nan when no number is found\n", 179 | " \n", 180 | " Examples\n", 181 | " --------\n", 182 | " >>> str2num('$2.3')\n", 183 | " 2.3\n", 184 | " \"\"\"\n", 185 | " if not isinstance(string, str):\n", 186 | " string = str(string)\n", 187 | " string = string.replace(',','')\n", 188 | " regular_expression = r'\\d+\\.?\\d*'\n", 189 | " pattern = re.compile(regular_expression)\n", 190 | " match = pattern.search(string)\n", 191 | " if match:\n", 192 | " return float(match.group())\n", 193 | " else:\n", 194 | " return float('nan')" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "Use `?str2num` to view the docstring" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 3, 207 | "metadata": {}, 208 | "outputs": [ 209 | { 210 | "name": "stdout", 211 | "output_type": "stream", 212 | "text": [ 213 | "Object `str2num` not found.\n" 214 | ] 215 | } 216 | ], 217 | 
"source": [ 218 | "?str2num" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "### 5. Keeping data reading separate" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "The usual data-reading steps can simply be placed in datapipeline.py in the order the data is read; afterwards the data can be imported directly from datapipeline:\n", 233 | "```\n", 234 | "from datapipeline import data\n", 235 | "```\n", 236 | "Data preprocessing is done in datapipeline.py as well" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 14, 242 | "metadata": { 243 | "collapsed": true 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "import pandas as pd\n", 248 | "converters = {'Price':str2num,\n", 249 | " 'Rank':str2num,\n", 250 | " 'Rating':str2num,\n", 251 | " 'Sales':str2num,\n", 252 | " 'Revenue':str2num,\n", 253 | " 'Reviews':str2num\n", 254 | " }\n", 255 | "try:\n", 256 | " data = pd.read_csv('./data/data.csv', converters=converters, header=7, index_col=0)\n", 257 | "except Exception as e:\n", 258 | " print(e)" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": { 264 | "collapsed": true 265 | }, 266 | "source": [ 267 | "In [Example_1](https://github.com/LinshuZhang/automate-report/tree/master/Example_1) we put data reading inside the analysis module. The data there is already processed pkl data, so reading it directly like that works, but decoupling it into its own module is better." 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### 6. Moving single-purpose functions into a class" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "A method such as print_structure from `2. 报告结构规定`, which prints the structure of a document, depends on the document it is used with, so it does not belong in tools.py; it fits better in Document, bound to the instance.\n", 282 | "The same goes for the HTML generation method from `6. 
报告内容生成`, which also goes into Document" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "For the result after these changes, see [Example_3](https://github.com/LinshuZhang/automate-report/tree/master/Example_3)" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": { 296 | "collapsed": true 297 | }, 298 | "outputs": [], 299 | "source": [] 300 | } 301 | ], 302 | "metadata": { 303 | "hide_input": false, 304 | "kernelspec": { 305 | "display_name": "Python 3", 306 | "language": "python", 307 | "name": "python3" 308 | }, 309 | "language_info": { 310 | "codemirror_mode": { 311 | "name": "ipython", 312 | "version": 3 313 | }, 314 | "file_extension": ".py", 315 | "mimetype": "text/x-python", 316 | "name": "python", 317 | "nbconvert_exporter": "python", 318 | "pygments_lexer": "ipython3", 319 | "version": "3.6.1" 320 | } 321 | }, 322 | "nbformat": 4, 323 | "nbformat_minor": 2 324 | } 325 | -------------------------------------------------------------------------------- /9. 多人协作和版本管理.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Prerequisites\n", 8 | "\n", 9 | "- Git basics\n", 10 | "- For the code, see [Example_3](https://github.com/LinshuZhang/automate-report/tree/master/Example_3)" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "### Learning objectives\n", 18 | "\n", 19 | "- Use Git for version management of the automation scripts\n", 20 | "- Use Git to keep the team's report automation scripts in sync" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "### 1. 
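Adjusting the file structure" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "When several people collaborate, we want to keep the templates in sync as far as possible and gather everyone's improvements to them in one place, without the content of individual pieces of work getting in the way. So we collect the frequently changing files together and have them ignored during synchronization.\n", 35 | "In the file structure from chapter 8, the parts that change often are:\n", 36 | "- ***report*** Working directory of the automated report\n", 37 | " - main.ipynb Main file for running and debugging; changes on every run\n", 38 | " - ***data***\n", 39 | " - *dynamic_data.csv*\n", 40 | " - *static_data.csv*\n", 41 | " - ***image*** Stores the generated image files\n", 42 | " - ***report*** Stores the generated reports\n", 43 | " - **configs.py** Report parameters such as author, date and data source; changing only these settings is enough to generate a different report\n", 44 | " \n", 45 | "These can be bundled together under the ***reports*** folder, changing the file structure to:\n", 46 | "- ***report*** Working directory of the automated report\n", 47 | " - ... 
Other files stay as they are\n", 48 | " - **configs.py** A report ID must now be set here, and it must be unique\n", 49 | " - main.ipynb Main file for running and debugging; it is not executed here and serves only as the master copy\n", 50 | " - ***data***\n", 51 | " - *static_data.csv*\n", 52 | " - ***reports*** Stores the generated reports\n", 53 | " - ***report1*** Report 1\n", 54 | " - ***report2*** Report 2\n", 55 | " - main.ipynb Copied from the folder two levels up; changes on every run\n", 56 | " - **configs.py** Generated from **configs.py** in the folder two levels up; serves as the record of this report's settings\n", 57 | " - ***data***\n", 58 | " - *dynamic_data.csv*\n", 59 | " - ***image*** Stores the generated image files" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "**Note:**\n", 67 | "\n", 68 | "After adjusting the file structure we also need a small helper script that:\n", 69 | "1. Creates the working folder for a report under ***reports*** based on **configs.py**, with the following directory structure\n", 70 | " - ***report1*** Report 1\n", 71 | " - main.ipynb Copied from the folder two levels up; changes on every run\n", 72 | " - **configs.py** Generated from **configs.py** in the folder two levels up; serves as the record of this report's settings\n", 73 | " - ***data***\n", 74 | " - ***image*** Stores the generated image files\n", 75 | "2. 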
Checks that the directory structure under ***reports*** is correct" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "**A small example of checking directories:**\n", 83 | "\n", 84 | "Write a more complete version yourself to suit the form of your report" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 1, 90 | "metadata": { 91 | "collapsed": true, 92 | "scrolled": false 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "import os\n", 97 | "import logging\n", 98 | "logging.info(\"Current directory : {}\".format(os.getcwd()))" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 2, 104 | "metadata": { 105 | "collapsed": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "def check_folder(folder_name):\n", 110 | " if os.path.exists(folder_name):\n", 111 | " logging.info(\"Folder already exists: {}\".format(folder_name))\n", 112 | " else:\n", 113 | " logging.warning(\"Folder does not exist yet: {}\".format(folder_name))\n", 114 | " os.makedirs(folder_name)\n", 115 | " logging.info(\"Folder created: {}\".format(folder_name))" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 3, 121 | "metadata": { 122 | "collapsed": true 123 | }, 124 | "outputs": [], 125 | "source": [ 126 | "check_folder(\"./Example_2\")\n", 127 | "# change the current working directory\n", 128 | "os.chdir(\"./Example_2\")\n", 129 | "logging.info(\"Current directory : {}\".format(os.getcwd()))" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 4, 135 | "metadata": {}, 136 | "outputs": [ 137 | { 138 | "name": "stderr", 139 | "output_type": "stream", 140 | "text": [ 141 | "WARNING:root:Folder does not exist yet: Report1\n" 142 | ] 143 | } 144 | ], 145 | "source": [ 146 | "import configs\n", 147 | "check_folder(configs.report_name)" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 5, 153 | "metadata": { 154 | "collapsed": true 155 | }, 156 | "outputs": [], 157 | "source": [ 158 | "# create the folders\n", 159 | "for folder in ['data', 'image']:\n", 160 | " check_folder(r\"./reports/{}/{}\".format(configs.report_name, folder))" 161 | ] 162 | }, 163 | { 164 | "cell_type": 
"code", 165 | "execution_count": 6, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "name": "stderr", 170 | "output_type": "stream", 171 | "text": [ 172 | "WARNING:root:File already exists, not copied: ./reports/Report1/main.ipynb\n", 173 | "WARNING:root:File already exists, not copied: ./reports/Report1/configs.py\n" 174 | ] 175 | } 176 | ], 177 | "source": [ 178 | "# copy the files\n", 179 | "import shutil\n", 180 | "for filename in ['main.ipynb', 'configs.py']: \n", 181 | " if os.path.exists(r\"./reports/{}/{}\".format(configs.report_name, filename)):\n", 182 | " logging.warning(\"File already exists, not copied: ./reports/{}/{}\".format(configs.report_name, filename))\n", 183 | " else:\n", 184 | " shutil.copy(filename, r\"./reports/{}/{}\".format(configs.report_name, filename))\n", 185 | " logging.info(\"File copied: {}\".format(filename))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "### 2. Splitting main.ipynb\n", 193 | "There are two cases:\n", 194 | "1. The report is not yet fully automated and still needs manual intervention\n", 195 | "Because main.ipynb produces different results on every run, it cannot be kept in sync as it changes. Its content can therefore be split into a methods part and a manual settings part, so that the methods are shared while the manual settings are entered separately for each report.\n", 196 | "2. The report is already fully automated\n", 197 | "In this case main.ipynb is only a debugging tool: save its methods as main.py and from then on run main.py directly\n", 198 | "\n", 199 | "\n", 200 | "\n", 201 | "**TODO :**\n", 202 | "The concrete way of splitting and organizing main.ipynb will be added later." 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": { 208 | "collapsed": true 209 | }, 210 | "source": [ 211 | "### 3. 
Configuring the files Git ignores" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "So the shared files used to generate reports go under Git version control, while the settings, data and generated files that differ from report to report must be ignored by Git, without changing the layout of the files." 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "For example, for the file structure before the adjustment, add the following to .gitignore\n", 226 | "```\n", 227 | "__pycache__\n", 228 | "*.pyc\n", 229 | ".ipynb_checkpoints\n", 230 | "data/data.csv\n", 231 | "image/*\n", 232 | "report/*\n", 233 | "```\n", 234 | "These entries ignore:\n", 235 | "- Files generated when Python imports modules: `__pycache__`, `*.pyc`\n", 236 | "- Notebook checkpoint files: `.ipynb_checkpoints`\n", 237 | "- Data and generated files that differ for every report: `data/data.csv`, `image/*`, `report/*`\n", 238 | "\n", 239 | "For the file structure after the adjustment, the .gitignore entries can be changed to\n", 240 | "```\n", 241 | "__pycache__\n", 242 | "*.pyc\n", 243 | ".ipynb_checkpoints\n", 244 | "reports/*\n", 245 | "```" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "For the result after these changes, see [Example_3](https://github.com/LinshuZhang/automate-report/tree/master/Example_3)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": { 259 | "collapsed": true 260 | }, 261 | "outputs": [], 262 | "source": [] 263 | } 264 | ], 265 | "metadata": { 266 | "hide_input": false, 267 | "kernelspec": { 268 | "display_name": "Python 3", 269 | "language": "python", 270 | "name": "python3" 271 | }, 272 | "language_info": { 273 | "codemirror_mode": { 274 | "name": "ipython", 275 | "version": 3 276 | }, 277 | "file_extension": ".py", 278 | "mimetype": "text/x-python", 279 | "name": "python", 280 | "nbconvert_exporter": "python", 281 | "pygments_lexer": "ipython3", 282 | "version": "3.6.1" 283 | } 284 | }, 285 | "nbformat": 4, 286 | "nbformat_minor": 2 287 | } 288 | -------------------------------------------------------------------------------- /ExampleCode/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/ExampleCode/__init__.py -------------------------------------------------------------------------------- /ExampleCode/model.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import logging 3 | 4 | class Document(object): 5 | def __init__(self): 6 | # Document title 7 | self.title = None 8 | # Subtitle 9 | self.subtitle = None 10 | # Foreword 11 | self.foreword = None 12 | # Chapters 13 | self.chapters = [] 14 | for i in range(5): 15 | self.chapters.append(Chapter(subchapter_number=4, number="chapter{}".format(i))) 16 | 17 | def print_structure(self, deep): 18 | if self.chapters: 19 | for subchapter in self.chapters: 20 | print('--'*deep+subchapter.number) 21 | subchapter.print_structure(deep+1) 22 | else: 23 | return 24 | 25 | class Chapter(Document): 26 | def __init__(self, subchapter_number=0, number=''): 27 | """ 28 | subchapter_number : number of subchapters 29 | number : chapter number, generated from the document structure; must be unique 30 | """ 31 | self.title = None 32 | self.content1 = None 33 | self.table = None 34 | # Charts are rendered to image files first; this stores the image path 35 | self.image = None 36 | self.content2 = None 37 | self.rank_list_change = None 38 | self.number = number 39 | self.chapters = [Chapter(subchapter_number=0, number="{}_subchapter{}".format(self.number, i)) 40 | for i in range(subchapter_number)] 41 | # Directory where images are saved 42 | self.image_path = './image/' 43 | 44 | 45 | def set_image(self, fig): 46 | image_filename = '{}{}.png'.format(self.image_path, self.number) 47 | fig.savefig(image_filename, dpi=160, bbox_inches='tight') 48 | self.image = image_filename 49 | 50 | def __getattr__(self, name): 51 | # __getattr__ only runs after normal lookup has failed, so evaluating 52 | # self.name here would recurse; log the missing attribute and 53 | # fall back to None instead. 54 | logging.error("Attribute does not exist: {}".format(name)) 55 | return None 56 | 57 | # A subchapter can reuse the Chapter definition through inheritance, which also leaves room for custom subchapters later 58 | class Subchapter(Chapter): 59 | pass --------------------------------------------------------------------------------
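The recursive outline that `print_structure` walks in the model above can be tried in isolation with the sketch below. It is a minimal illustration, not the repository's code: `Node` is a hypothetical, simplified stand-in for `Document`/`Chapter`, and `structure` is an assumed variant of `print_structure` that returns the lines instead of printing them, which makes the result easy to check.

```python
class Node:
    """Hypothetical, simplified stand-in for Document/Chapter."""

    def __init__(self, number="", children=0):
        self.number = number
        # Child numbers are derived from the parent's number, so they stay unique.
        self.chapters = [
            Node(number="{}_subchapter{}".format(number, i)) for i in range(children)
        ]

    def structure(self, deep=0):
        """Return one indented line per descendant, depth first."""
        lines = []
        for child in self.chapters:
            lines.append("--" * deep + child.number)
            lines.extend(child.structure(deep + 1))
        return lines


# Build a small document: two chapters with two subchapters each.
root = Node()
root.chapters = [Node(number="chapter{}".format(i), children=2) for i in range(2)]
outline = root.structure()
print("\n".join(outline))
```

Returning a list rather than printing keeps the traversal testable and lets a template or logger decide how to display the outline.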
/ExampleCode/models.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import logging 3 | 4 | class Document(object): 5 | def __init__(self): 6 | # Document title 7 | self.title = None 8 | # Subtitle 9 | self.subtitle = None 10 | # Foreword 11 | self.foreword = None 12 | # Chapters 13 | self.chapters = [] 14 | for i in range(5): 15 | self.chapters.append(Chapter(subchapter_number=4, number="chapter{}".format(i))) 16 | 17 | def print_structure(self, deep): 18 | if self.chapters: 19 | for subchapter in self.chapters: 20 | print('--'*deep+subchapter.number) 21 | subchapter.print_structure(deep+1) 22 | else: 23 | return 24 | 25 | class Chapter(Document): 26 | def __init__(self, subchapter_number=0, number=''): 27 | """ 28 | subchapter_number : number of subchapters 29 | number : chapter number, generated from the document structure; must be unique 30 | """ 31 | self.title = None 32 | self.content1 = None 33 | self.table = None 34 | # Charts are rendered to image files first; this stores the image path 35 | self.image = None 36 | self.content2 = None 37 | self.rank_list_change = None 38 | self.number = number 39 | self.chapters = [Chapter(subchapter_number=0, number="{}_subchapter{}".format(self.number, i)) 40 | for i in range(subchapter_number)] 41 | # Directory where images are saved 42 | self.image_path = './image/' 43 | 44 | 45 | def set_image(self, fig): 46 | image_filename = '{}{}.png'.format(self.image_path, self.number) 47 | fig.savefig(image_filename, dpi=160, bbox_inches='tight') 48 | self.image = image_filename 49 | 50 | def __getattr__(self, name): 51 | # __getattr__ only runs after normal lookup has failed, so evaluating 52 | # self.name here would recurse; log the missing attribute and 53 | # fall back to None instead. 54 | logging.error("Attribute does not exist: {}".format(name)) 55 | return None 56 | 57 | # A subchapter can reuse the Chapter definition through inheritance, which also leaves room for custom subchapters later 58 | class Subchapter(Chapter): 59 | pass -------------------------------------------------------------------------------- /Example_1/ImageFactory.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import matplotlib.pyplot as plt 3 | import matplotlib as mpl 4 | import 
matplotlib.font_manager as mfm 5 | import matplotlib.gridspec as gridspec 6 | import matplotlib.ticker as plticker 7 | import numpy as np 8 | import types 9 | from itertools import cycle 10 | 11 | class Image(object): 12 | font_path = {} 13 | prop = {} 14 | font_path['hei'] = '../font/MSYHMONO.ttf' 15 | font_path['english'] = '../font/Calibri.ttf' 16 | for font_name in list(font_path): 17 | prop[font_name] = mfm.FontProperties(fname=font_path[font_name]) 18 | 19 | title_font = prop['hei'].copy() 20 | title_font.set_size(14) 21 | xticks_font = prop['hei'].copy() 22 | xticks_font.set_size(9) 23 | ylable_font = prop['hei'].copy() 24 | ylable_font.set_size(10) 25 | legend_font = prop['hei'].copy() 26 | legend_font.set_size(9) 27 | default_colors = {} 28 | default_colors['blue'] = '#6CADDD' 29 | default_colors['yellow'] = '#F3903F' 30 | default_colors['red'] = '#CB615A' 31 | default_colors['orange'] = '#F3903F' 32 | default_colors['gray'] = '#B4B4B4' 33 | default_colors['lightyellow'] = '#FCC900' 34 | default_colors['royalblue'] = '#5488CF' 35 | 36 | IMAGE_WIDTH = 5.708 37 | IMAGE_HIGH = 2.756 38 | 39 | def __init__(self, title=None, labels=None, data=None, 40 | image_path=None, title_y=1.1, xticks_rotation='vertical', legend_name=[]): 41 | self.length = len(data) 42 | self.x = np.arange(self.length) 43 | self.y = data 44 | self.data = data 45 | self.title_y = title_y 46 | self.title = title 47 | self.labels = labels 48 | self.legend_name = legend_name 49 | self.xticks_rotation = xticks_rotation 50 | 51 | def init(self): 52 | self.fig = plt.figure(figsize=(self.IMAGE_WIDTH, self.IMAGE_HIGH)) 53 | self.gs = gridspec.GridSpec(1, 1) 54 | self.ax = self.fig.add_subplot(self.gs[0]) 55 | self.init_plus() 56 | self.set_margins() 57 | self.set_xticks() 58 | self.add_ax() 59 | self.add_title() 60 | self.plot() 61 | self.set_spines() 62 | self.set_tick_marks() 63 | self.add_legend() 64 | self.config_add() 65 | self.tight_layout() 66 | self.set_grid() 67 | plt.close() 68 | 69 | 
def config_add(self): 70 | pass 71 | 72 | def add_ax(self): 73 | pass 74 | 75 | def init_plus(self): 76 | pass 77 | 78 | def add_legend(self): 79 | if not (self.legend_name is None): 80 | if len(self.legend_name)==1: 81 | plt.legend(self.legend_name, loc='upper right', bbox_to_anchor=(1, 1.2), prop=self.legend_font, frameon=True) 82 | elif len(self.legend_name)==2: 83 | lines1, labels1 = self.ax.get_legend_handles_labels() 84 | self.ax.legend(lines1, labels1, loc='upper center', ncol=2, bbox_to_anchor=(0.5, 1.2), prop=self.legend_font, frameon=False) 85 | 86 | 87 | def set_ylabel(self): 88 | pass 89 | 90 | def tight_layout(self, **karg): 91 | self.gs.tight_layout(self.fig, **karg) 92 | 93 | def plot(self): 94 | self.ax.fill_between(self.x, self.y.min()*0.9, 95 | self.y, zorder=3, color=self.default_colors['blue']) 96 | 97 | def add_title(self): 98 | if self.title: 99 | plt.title(self.title, fontproperties=self.title_font, y=self.title_y) 100 | 101 | def set_grid(self): 102 | get_ax_space = lambda x: x.get_ylim()[1] - x.get_ylim()[0] 103 | self.ax_space = get_ax_space(self.ax) 104 | def get_interval(ax_space, space_number=5): 105 | digit_number = len(str((ax_space))) 106 | intervals = int((ax_space)/(space_number*(10**digit_number))) 107 | while intervals == 0: 108 | digit_number -= 1 109 | intervals = int((ax_space)/(space_number*(10**digit_number))) 110 | linshi = round((ax_space)/(space_number*(10**digit_number))) 111 | intervals = linshi*(10**digit_number) 112 | return intervals 113 | 114 | if not 'intervals' in self.__dict__.keys(): 115 | self.intervals = get_interval(self.ax_space) 116 | loc = plticker.MultipleLocator(base=self.intervals) 117 | self.ax.yaxis.set_major_locator(loc) 118 | self.ax.grid(axis='y', zorder=0) 119 | if 'ax2' in self.__dict__.keys(): 120 | print("Twin axes detected: setting the grid for the secondary axis") 121 | self.ax_space2 = get_ax_space(self.ax2) 122 | self.intervals2 = get_interval(self.ax_space2, space_number=5) 123 | loc2 = plticker.MultipleLocator(base=self.intervals2) 
124 | self.ax2.yaxis.set_major_locator(loc2) 125 | 126 | def set_margins(self): 127 | self.ax.margins(0.013, 0.073) 128 | 129 | def set_spines(self): 130 | self.ax.spines['right'].set_visible(False) 131 | self.ax.spines['top'].set_visible(False) 132 | # self.ax.spines['bottom'].set_visible(False) 133 | self.ax.spines['left'].set_visible(False) 134 | 135 | def set_tick_marks(self): 136 | self.ax.tick_params(axis='both', which='both', bottom=False, top=False, 137 | labelbottom=True, left=False, right=False, labelleft=True) 138 | 139 | def set_xticks(self): 140 | plt.xticks(self.x, self.labels, fontproperties=self.xticks_font, rotation=self.xticks_rotation) # set the x-axis tick labels 141 | 142 | def show(self): 143 | plt.show() 144 | 145 | def save(self, image_path=None): 146 | if image_path: 147 | self.fig.savefig(image_path) 148 | else: 149 | raise ValueError("Please set the image path first") 150 | 151 | class ImageFill(Image): 152 | # def __init__(self): 153 | # pass 154 | 155 | def set_spines(self): 156 | self.ax.spines['right'].set_visible(False) 157 | self.ax.spines['top'].set_visible(False) 158 | self.ax.spines['bottom'].set_visible(False) 159 | self.ax.spines['left'].set_visible(False) 160 | 161 | class ImageBar(Image): 162 | def __init__(self, title=None, labels=None, data=None, image_path=None, xticks_rotation = 40, 163 | legend_name=[], y2=None, title_y=1.2): 164 | self.y2 = y2 165 | super(ImageBar, self).__init__(title=title, labels=labels, data=data, image_path=image_path, 166 | title_y=title_y, xticks_rotation=xticks_rotation, legend_name=legend_name) 167 | 168 | def plot(self): 169 | rects = plt.bar(self.x, self.y, 0.4, zorder=3, color=self.default_colors['blue']) 170 | for rect in rects: 171 | height = rect.get_height() 172 | self.ax.text(rect.get_x() + rect.get_width()/2., 1.05*height, 173 | '%d' % int(height), 174 | ha='center', va='bottom') 175 | 176 | # def set_xticks(self): 177 | # plt.xticks(self.x, self.labels, fontproperties=self.xticks_font, rotation= self.xticks_rotation, 
wrap=True) 178 | 179 | class ImageTwinx(Image): 180 | def __init__(self, title=None, labels=None, data=None, image_path=None, xticks_rotation=40, 181 | legend_name=[], y2=None, title_y=1.2, ylabel_show=True): 182 | self.ylabel_show = ylabel_show 183 | self.legend_name = legend_name 184 | self.marker_style = dict(color=self.default_colors['yellow'], linestyle='-', marker='o') 185 | self.y2 = y2 186 | super(ImageTwinx, self).__init__(title=title, labels=labels, data=data, 187 | image_path=image_path, xticks_rotation=xticks_rotation, title_y=title_y, legend_name=legend_name) 188 | 189 | def config_add(self): 190 | self.set_ylabel() 191 | 192 | def set_ylabel(self): 193 | if self.ylabel_show: 194 | self.ax.set_ylabel(self.legend_name[0], fontproperties=self.ylable_font) 195 | self.ax2.set_ylabel(self.legend_name[1], fontproperties=self.ylable_font) 196 | 197 | def add_ax(self): 198 | self.ax2 = self.ax.twinx() 199 | 200 | def tight_layout(self, **karg): 201 | self.gs.tight_layout(self.fig, **karg) 202 | 203 | def plot(self): 204 | self.ln1 = self.ax.bar(self.x, self.y, 0.4, zorder=3, label=self.legend_name[0], color=self.default_colors['blue']) 205 | self.ax2.plot(self.x, self.y2, label=self.legend_name[1], **self.marker_style) 206 | 207 | def set_spines(self): 208 | for _ax in [self.ax, self.ax2]: 209 | _ax.margins(0) # set the margins 210 | # spines 211 | _ax.spines['right'].set_visible(False) 212 | _ax.spines['top'].set_visible(False) 213 | _ax.spines['left'].set_visible(False) 214 | 215 | def set_tick_marks(self): 216 | self.ax.tick_params(axis='both', which='both', bottom=False, top=False, 217 | labelbottom=True, left=False, right=False, labelleft=True) 218 | self.ax2.tick_params(axis='both', which='both', bottom=False, top=False, 219 | labelbottom=True, left=False, right=False) 220 | 221 | def add_legend(self): 222 | if not (self.legend_name is None): 223 | if len(self.legend_name) == 2: 224 | lines1, labels1 = self.ax.get_legend_handles_labels() 225 | lines2, labels2 = 
self.ax2.get_legend_handles_labels() 226 | self.ax.legend(lines1+lines2, labels1+labels2, loc='upper center', ncol=2, bbox_to_anchor=(0.5, 1.27), prop=self.legend_font, frameon=False) 227 | 228 | class ImageLine(Image): 229 | def __init__(self, title=None, labels=None, data=None, image_path=None, 230 | title_y = 1.08, xticks_rotation='horizontal', legend_name=[]): 231 | self.marker_style = dict(color=Image.default_colors['blue'], linestyle='-') 232 | super(ImageLine, self).__init__(title=title, labels=labels, data=data, 233 | image_path=image_path, title_y=title_y, xticks_rotation= xticks_rotation, 234 | legend_name=legend_name) 235 | self.init_plus = self.config_add 236 | 237 | def config_add(self): 238 | self.set_ylabel() 239 | self.ax.set_ylim(top = round(np.max(self.y)*1.1)) 240 | 241 | def plot(self): 242 | self.ax.plot(self.x, self.y, **self.marker_style) 243 | 244 | class ImagePie(Image): 245 | colors = ['#F3903F','#B4B4B4','#FCC900','#6CADDD', 246 | '#D9D4CF', '#7C7877', '#ABD0CE', '#F0E5DE', '#6AAFE6', 247 | '#D09E88', '#D4DFE6'] 248 | 249 | def plot(self): 250 | self.explode = np.ones(self.length)*0.03 251 | self.patches = self.ax.pie(self.y, 252 | explode=self.explode, 253 | labels=self.labels, 254 | colors=self.colors, 255 | autopct='%d%%') 256 | 257 | def set_grid(self): 258 | pass 259 | 260 | def add_legend(self): 261 | handles = [] 262 | for i, l in enumerate(self.labels): 263 | handles.append(mpl.patches.Patch(color=self.colors[i], label=l)) 264 | self.ax.legend(handles, self.labels, loc="center right", frameon=False) 265 | 266 | class ImageFluctuation(ImageTwinx): 267 | def plot(self): 268 | self.ax.plot(self.x, self.y, label=self.legend_name[0], **self.marker_style) 269 | self.ax2.bar(self.x, self.y2, 0.4, zorder=3, label=self.legend_name[1], color=self.default_colors['red']) 270 | 271 | def init_plus(self): 272 | self.marker_style = dict(color=self.default_colors['blue'], linestyle='-') 273 | 274 | def set_xticks(self): 275 | 
plt.xticks(range(0,self.length,30), self.labels.loc[[0, 30, 60, 90, 120, 150, 180]], fontproperties=self.xticks_font, rotation=self.xticks_rotation) 276 | 277 | def set_ylim(self, top=None, bottom=None): 278 | if not top: 279 | top=int(np.max(self.y)*1.1) 280 | if not bottom: 281 | bottom=int(np.min(self.y)*0.8) 282 | if top and bottom: 283 | try: 284 | self.ax.set_ylim(top=top, bottom=bottom) 285 | except: 286 | top=int(np.max(self.y)*1.1) 287 | bottom=int(np.min(self.y)*0.8) 288 | self.ax.set_ylim(top=top, bottom=bottom) 289 | 290 | def config_add(self): 291 | self.set_ylabel() 292 | self.set_ylim() 293 | 294 | class ImageDoubleLine(ImageTwinx): 295 | def init_plus(self): 296 | self.marker_style1 = dict(color=self.default_colors['red'], linestyle='-') 297 | self.marker_style2 = dict(color=self.default_colors['blue'], linestyle='-') 298 | 299 | def plot(self): 300 | self.ax.plot(self.x, self.y, label=self.legend_name[0], **self.marker_style1) 301 | self.ax2.plot(self.x, self.y2, label=self.legend_name[1], **self.marker_style2) 302 | 303 | def set_xticks(self): 304 | plt.xticks(range(0,self.length,30), self.labels.loc[[0, 30, 60, 90, 120, 150, 180]], fontproperties=self.xticks_font, rotation=self.xticks_rotation) 305 | 306 | class ImageMultiLine(Image): 307 | IMAGE_HIGH = 2.756 308 | color_cycle = cycle(['blue', 'orange', 'red', 'lightyellow', 'royalblue']) 309 | 310 | def plot(self): 311 | self.marker_style = [] 312 | for asin in self.y.columns: 313 | data = self.y[asin] 314 | marker_style = dict(color=self.default_colors[next(self.color_cycle)], 315 | linestyle='-', marker='o') 316 | self.marker_style.append(marker_style) 317 | self.ax.plot(self.x, data, zorder=3, label=asin, **marker_style) 318 | 319 | def set_tick_marks(self): 320 | self.ax.tick_params(axis='both', zorder=1, which='both', bottom=True, top=False, 321 | labelbottom=True, left=False, right=False, labelleft=False) 322 | 323 | def set_margins(self): 324 | self.ax.margins(0.013, 0.073) 325 | 326 | 
def set_yticks(self): 327 | self.ax.yaxis.tick_right() 328 | self.set_tick_marks() 329 | 330 | def config_add(self): 331 | self.set_yticks() 332 | data_max = self.y.max().max() 333 | if round(data_max*1.1)-round(data_max)>1: 334 | top = round(data_max*1.1) 335 | else: 336 | top = round(data_max)+1 337 | self.ax.set_ylim(top=top) 338 | 339 | def add_legend(self): 340 | handles, labels = self.ax.get_legend_handles_labels() 341 | self.ax.legend(handles, labels, loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.4), prop=self.legend_font, frameon=False) -------------------------------------------------------------------------------- /Example_1/PS20180528_bestsellers.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/PS20180528_bestsellers.pkl -------------------------------------------------------------------------------- /Example_1/PS20180528_hotnewreleases.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/PS20180528_hotnewreleases.pkl -------------------------------------------------------------------------------- /Example_1/PS20180528_moversandshakers.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/PS20180528_moversandshakers.pkl -------------------------------------------------------------------------------- /Example_1/PS20180528_product.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/PS20180528_product.pkl -------------------------------------------------------------------------------- 
/Example_1/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/__init__.py -------------------------------------------------------------------------------- /Example_1/analysis.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import pandas as pd 3 | import numpy as np 4 | import logging 5 | 6 | from configs import filepath, date, bd, author 7 | from ImageFactory import ImageMultiLine, ImageBar 8 | 9 | class Product(object): 10 | def __init__(self): 11 | self.data = self.read_data() 12 | 13 | def read_data(self): 14 | df = pd.read_pickle(filepath+bd+date+'_product.pkl') 15 | # 修复价格显示百倍 16 | df.loc[:, 'lowest_price'] = df['lowest_price']/100 17 | return df 18 | 19 | product = Product() 20 | 21 | class TopList(object): 22 | def __init__(self, list_name): 23 | self.list_name = list_name 24 | self.data = self.read_data() 25 | self.product_id = self.data['product_id'] 26 | self.price = Price(self.data.groupby(['product_id'])['lowest_price'].mean()) 27 | self.review = Review(self.data) 28 | self.commodity = Commodity(self.data) 29 | self.brand = Brand(self.data) 30 | 31 | def read_data(self): 32 | df_ = pd.read_pickle('{}{}{}_{}.pkl'.format(filepath, bd, date, self.list_name)) 33 | df = pd.merge(df_, product.data, how='left', on = 'product_id') 34 | return df 35 | 36 | class Price(object): 37 | def __init__(self, data): 38 | self.data = data 39 | self.mean = np.nanmean(self.data) 40 | self.max = np.max(self.data) 41 | self.min = np.min(self.data) 42 | self.count = len(self.data) 43 | self.space = 5 44 | self.table = self.distribution_table(50, space=self.space) 45 | # self.image = self.plot_distribution() 46 | 47 | @property 48 | def image(self): 49 | labels = ["\\"+label for label in self.table['labels']] 50 | image = ImageBar(data=self.table['count'], 51 | 
labels=labels, 52 | title='榜单商品价格分布', 53 | xticks_rotation=0, 54 | legend_name=['商品数量']) 55 | image.init() 56 | return image 57 | 58 | def distribution_table(self, 59 | truncation_number:int, 60 | space_auto=True, 61 | space=10, 62 | space_power=1, 63 | unit_symbol='$' 64 | ): 65 | """ 66 | Split the data into intervals and count how many values fall into each one. 67 | 68 | Parameters 69 | ---------- 70 | truncation_number : cut-off value of the chart, i.e. the upper bound of the binned range 71 | space_auto : whether to choose the bin width automatically 72 | space : bin width, i.e. the span of each interval 73 | space_power : multiplier for the bin-width step; by default the width changes by 1 per iteration while it is below 5 and by 5 once it reaches 5, 74 | so with a multiplier of 100 it changes by 100 while below 500 and by 500 from 500 upward 75 | unit_symbol : unit symbol for the x-axis labels; with unit_symbol='$' the 0-10 interval is labelled $0-$10, 76 | while with unit_symbol='' it is labelled 0-10 77 | """ 78 | 79 | repeat_times = 0 80 | while True: 81 | data_last = 0 82 | data_count = {} 83 | labels = [] 84 | for data in range(0, int(self.max+1), space): 85 | if data < truncation_number: 86 | data_count['{}{}-{}{}'.format(unit_symbol, data_last, unit_symbol, data)]\ 87 | = sum((self.data < data) & (self.data >= data_last)) 88 | data_last = data 89 | else: 90 | data_count['{}{}以上'.format(unit_symbol, truncation_number)]\ 91 | = sum((self.data >= truncation_number)) 92 | 93 | if space_auto: 94 | if sum(map(lambda x : x != 0, data_count.values()))>14: 95 | space += 1*space_power if space<5*space_power else 5*space_power 96 | elif sum(map(lambda x : x != 0, data_count.values()))>8: 97 | break 98 | else: 99 | space -= 1*space_power if space<5*space_power else 5*space_power 100 | 101 | repeat_times += 1 102 | if repeat_times > 30: 103 | logging.warning("Too many iterations; could not determine the bin width automatically") 104 | break 105 | else: 106 | break 107 | 108 | if sum(map(lambda x : x != 0, data_count.values()))>14: 109 | logging.warning("More than 14 non-empty bins; please adjust the bin width and cut-off value manually") 110 | df_data = pd.DataFrame({'labels':list(data_count.keys()), 111 | 'count':list(data_count.values())} 112 | ) 113 | 114 | df_data = df_data.dropna() 115 | df_data = df_data[df_data['count'] != 0] 116 | return df_data.assign(percentage = lambda x: x['count']/sum(x['count'])) 117 | 118 
| class Review(object): 119 | def __init__(self, data): 120 | self.data = data.groupby(['product_id'])['review_count'].mean() 121 | self.review_rating = data[data['review_count']!=0].review_rating 122 | self.rating_mean = np.nanmean(self.review_rating) 123 | self.mean = np.nanmean(self.data) 124 | self.max = np.max(self.data) 125 | self.min = np.min(self.data) 126 | self.sum = np.sum(self.data) 127 | self.count = len(self.data) 128 | 129 | # self.distribution_table = Price.distribution_table 130 | try: 131 | if self.mean>1000: 132 | self.table = self.distribution_table(10000, 133 | space=500, 134 | space_power=100, 135 | unit_symbol='') 136 | else: 137 | self.table = self.distribution_table(1000, 138 | space=50, 139 | space_power=2, 140 | space_auto = True, 141 | unit_symbol='') 142 | except BaseException as e: 143 | print(e) 144 | 145 | def distribution_table(self, *arg, **karg): 146 | return Price.distribution_table(self, *arg, **karg) 147 | 148 | @property 149 | def image(self): 150 | labels = self.table['labels'] 151 | image = ImageBar(data=self.table['count'], 152 | labels=labels, 153 | title='榜单商品评论数分布', 154 | xticks_rotation=20, 155 | legend_name=['商品数量'] 156 | ) 157 | image.init() 158 | return image 159 | 160 | class Commodity(object): 161 | show_number = 5 162 | def __init__(self, data): 163 | self.data = data 164 | self.product_id = self.data['product_id'] 165 | self.set = set(self.data['product_id']) 166 | self.count = len(self.set) 167 | self.table = self.commodity_rank() 168 | self.rank_change_table = ( 169 | self.data[ 170 | self.product_id.isin( 171 | self.cp['product_id'].head(self.show_number) 172 | ) 173 | ].pivot_table( 174 | index = ['date'], 175 | columns = ['product_id'], 176 | values = ['rank'] 177 | )['rank'][self.cp['product_id'].head(self.show_number)] 178 | )[self.table['ASIN'].head()] 179 | 180 | @property 181 | def image(self): 182 | image = ImageMultiLine(data=self.rank_change_table, 183 | labels=self.rank_change_table.index, 184 | 
title='商品排名变化情况', 185 | xticks_rotation=0, 186 | title_y = 1.1 187 | ) 188 | image.init() 189 | if image.intervals<1: 190 | image.intervals = 1 191 | image.init() 192 | image.ax.invert_yaxis() 193 | return image 194 | 195 | def commodity_rank(self): 196 | groups = self.data.groupby(['product_id']) 197 | # 上榜次数 198 | cp_count = pd.DataFrame(groups['date'].nunique()) 199 | # 计算每个商品的平均bsr 200 | cp_mean = pd.DataFrame(groups['rank'].mean()) 201 | #计算各个商品的最好bsr 202 | cp_min = pd.DataFrame(groups['rank'].min()) 203 | # 整理成一个表 204 | self.cp = pd.merge(cp_min.reset_index(), 205 | pd.merge(cp_count.reset_index(), 206 | cp_mean.reset_index(), 207 | on = 'product_id'), 208 | on = 'product_id') 209 | # 对列进行命名 210 | self.cp.columns = ['product_id','highest_rank','counts','avg_rank'] 211 | # 根据平均排名进行排序 212 | self.cp = self.cp.sort_values(['counts','avg_rank'],ascending = (False,True)) 213 | # 合并所需要的商品排行表 214 | self.cp = pd.merge(self.cp,product.data,on = 'product_id') 215 | 216 | rank_form = self.cp[['product_id', 'image_url', 'counts', 'avg_rank', 'highest_rank', 'brand', 'review_count', 'review_rating','lowest_price']].head(self.show_number).copy() 217 | rank_form.index = range(1,self.show_number + 1) 218 | # 将图片链接改为在html中可以直接显示的类型 219 | for i in range(len(rank_form)): 220 | if rank_form.image_url.notnull()[i + 1]: 221 | if "images-na.ssl" in rank_form.loc[i+1, 'image_url']: 222 | rank_form.loc[i+1, 'image_url'] = '' 224 | # 遇到图片缺失的情况时的处理 225 | else: 226 | logging.warning("ASIN -> {} : Can't get image_url, save url as {}.jpg instead".format(rank_form.product_id, rank_form.product_id)) 227 | rank_form.loc[i+1, 'image_url'] = '' 229 | #调整小数位数 230 | rank_form.avg_rank = rank_form.avg_rank.round(2) 231 | 232 | rank_form.columns = ['ASIN', '图片', '登榜次数', '平均排名', '最高排名', '品牌', '评论量', '平均星级', '当前价格($)'] 233 | return rank_form 234 | 235 | class Brand(object): 236 | show_number = 5 237 | def __init__(self, data): 238 | self.data = data 239 | self.set = set(self.data['brand']) 240 | 
self.count = len(self.set) 241 | self.table = self.commodity_rank() 242 | self.count_change_table = self.count_change() 243 | 244 | @property 245 | def image(self): 246 | image = ImageMultiLine(data=self.count_change_table, 247 | labels=self.count_change_table.index, 248 | title='品牌登榜商品数变化情况', 249 | xticks_rotation=0, 250 | title_y = 1.1 251 | ) 252 | image.init() 253 | if image.intervals<1: 254 | image.intervals = 1 255 | image.init() 256 | return image 257 | 258 | def count_change(self): 259 | df_brand_product_counts = pd.DataFrame(self.data.groupby(['brand','date'])['product_id'].count()).reset_index() 260 | df_brand_productcounts_change = df_brand_product_counts[df_brand_product_counts['brand'].isin(self.table['品牌'].head())].pivot_table(index = ['date'],columns = ['brand'],values = ['product_id']) 261 | try: 262 | df_brand_productcounts_change.columns = df_brand_productcounts_change.columns.levels[1] 263 | except: 264 | print('') 265 | df_brand_productcounts_change = df_brand_productcounts_change[self.table['品牌'].head()] 266 | return df_brand_productcounts_change.fillna(0) 267 | 268 | def commodity_rank(self): 269 | product.data_id = pd.DataFrame({'product_id':list(set(list(self.data['product_id'])))}) 270 | df_cp_product = pd.merge(product.data_id,product.data,on = 'product_id') 271 | cp_count = pd.DataFrame(self.data.groupby(['brand'])['rank'].count()) 272 | cp_mean = pd.DataFrame(self.data.groupby(['brand'])['rank'].mean()).round(2) 273 | cp_productcount = pd.DataFrame(df_cp_product.groupby(['brand'])['product_id'].count()) 274 | cp_max = pd.DataFrame(self.data.groupby(['brand'])['rank'].min()) 275 | cp_reviewcount = pd.DataFrame(df_cp_product.groupby(['brand'])['review_count'].sum()).fillna(0).astype(int) 276 | cp_reviewrating = pd.DataFrame(df_cp_product[df_cp_product['review_rating']!= 0].groupby(['brand'])['review_rating'].mean()).round(2) 277 | cp_price = pd.DataFrame(df_cp_product.groupby(['brand'])['lowest_price'].mean()).round(2) 278 | cp = 
pd.merge(cp_count.reset_index(),cp_mean.reset_index(),on = 'brand') 279 | cp = pd.merge(cp,cp_productcount.reset_index(),on = 'brand') 280 | cp = pd.merge(cp,cp_max.reset_index(),on = 'brand') 281 | cp = pd.merge(cp,cp_reviewcount.reset_index(),on = 'brand') 282 | cp = pd.merge(cp,cp_reviewrating.reset_index(),on = 'brand', how = 'outer') 283 | cp = pd.merge(cp,cp_price.reset_index(),on = 'brand') 284 | cp.columns = ['brand','counts','avg_rank','prodcut_count','max_rank','review_count','avg_rating','avg_price'] 285 | cp = cp.sort_values(['counts','avg_rank'],ascending = (False,True)) 286 | cp[['brand','counts','prodcut_count','avg_rank','max_rank','review_count','avg_rating','avg_price']].head(self.show_number) 287 | rank_form = cp[['brand','counts','prodcut_count','avg_rank','max_rank','review_count','avg_rating','avg_price']].head(self.show_number) 288 | rank_form['avg_price'] = rank_form['avg_price'].map(lambda x:float('%.2f' % x)) 289 | rank_form.columns = ['品牌', '登榜次数', '登榜商品数', '平均排名', '最高排名', '评论量', '平均星级', '平均价格($)'] 290 | rank_form.index = range(1,self.show_number + 1) 291 | return rank_form 292 | 293 | lists = {'Best Seller':TopList('bestsellers'), 294 | 'Hot New Releases':TopList('hotnewreleases'), 295 | 'Movers & Shakers':TopList('moversandshakers')} 296 | bs = lists['Best Seller'] 297 | hnr = lists['Hot New Releases'] 298 | mns = lists['Movers & Shakers'] 299 | 300 | class Info(object): 301 | def __init__(self): 302 | self.rank_lists = list(lists.values()) 303 | self.review_total_count = sum([rank_list.review.sum for rank_list in self.rank_lists]) 304 | sorted_time = sorted(list(set(bs.data['date']))) 305 | start_time = sorted_time[0] 306 | self.start_time = str(start_time.year) + '年' + str(start_time.month) + '月' + str(start_time.day) + '日' 307 | end_time = sorted_time[-1] 308 | self.end_time = str(end_time.year) + '年' + str(end_time.month) + '月' + str(end_time.day) + '日' 309 | self.category_name = bs.data['dept_name'][0] 310 | 311 | @property 312 | 
def commodity_total_count(self): 313 | commodity_total = bs.commodity.set | hnr.commodity.set | mns.commodity.set # set union; `and` would return only the last operand 314 | commodity_total_count = len(commodity_total) 315 | return commodity_total_count 316 | 317 | @property 318 | def brand_total_count(self): 319 | brand_total = bs.brand.set | hnr.brand.set | mns.brand.set # set union; `and` would return only the last operand 320 | brand_total_count = len(brand_total) 321 | return brand_total_count 322 | 323 | @property 324 | def rating_mean(self): 325 | rating = np.array([rank_list.review.rating_mean for rank_list in self.rank_lists]) 326 | count = np.array([rank_list.review.count for rank_list in self.rank_lists]) 327 | rating_mean = np.dot(rating, count)/np.sum(count) 328 | return rating_mean 329 | 330 | info = Info() -------------------------------------------------------------------------------- /Example_1/configs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | filepath = './' 4 | date = u'20180528' 5 | bd = u'PS' 6 | author = u'作者君' 7 | 8 | 9 | -------------------------------------------------------------------------------- /Example_1/image/chapter1_subchapter0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter1_subchapter0.png -------------------------------------------------------------------------------- /Example_1/image/chapter1_subchapter1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter1_subchapter1.png -------------------------------------------------------------------------------- /Example_1/image/chapter1_subchapter2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter1_subchapter2.png -------------------------------------------------------------------------------- /Example_1/image/chapter1_subchapter3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter1_subchapter3.png -------------------------------------------------------------------------------- /Example_1/image/chapter2_subchapter0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter2_subchapter0.png -------------------------------------------------------------------------------- /Example_1/image/chapter2_subchapter1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter2_subchapter1.png -------------------------------------------------------------------------------- /Example_1/image/chapter2_subchapter2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter2_subchapter2.png -------------------------------------------------------------------------------- /Example_1/image/chapter2_subchapter3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter2_subchapter3.png -------------------------------------------------------------------------------- /Example_1/image/chapter3_subchapter0.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter3_subchapter0.png -------------------------------------------------------------------------------- /Example_1/image/chapter3_subchapter1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter3_subchapter1.png -------------------------------------------------------------------------------- /Example_1/image/chapter3_subchapter2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter3_subchapter2.png -------------------------------------------------------------------------------- /Example_1/image/chapter3_subchapter3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_1/image/chapter3_subchapter3.png -------------------------------------------------------------------------------- /Example_1/items.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import pandas as pd 3 | import numpy as np 4 | import logging 5 | 6 | from configs import date, bd, author 7 | from ImageFactory import ImageMultiLine, ImageBar 8 | 9 | Preface = {"Cell Phones & Accessories": "随着现代科技的进步发展,各式各样的移动设备正逐渐成为人们学习工作生活中的必需品,对于跨境电商而言,手机及其附件的市场规模同样不容小觑。", 10 | 11 | "Health & Personal Care": "健康,向来是人类生活追求的永恒主题之一,而健康及个人护理类商品,也是跨境电商市场中的重要组成部分。", 12 | 13 | "Pet Supplies": "随着人们生活水平的提高与社会压力的加大,养一只小宠物作为自己的动物伴侣正成为当今时代下时髦的减压方式,由此也促进了宠物用品相关跨境电商市场的产生与发展。", 14 | 15 | "Kitchen & Dining": 
"幸福美好的家居生活离不开各式各样厨具用品的陪伴,随着网络科技的发展,人们通过跨境电商平台来选购一系列的厨房餐饮工具,也逐渐成为了生活中常见的现象。", 16 | 17 | "Sports & Outdoors": "近年来,伴随着社会科技水平的提高与工业设计的人性化发展,各式各样的户外运动产品层出不穷,而通过网购来获得最新潮的户外运动装置,也成为了许多时尚人士的不二之选。", 18 | 19 | "Tools & Home Improvement": "家装灯具、五金配件以及各类家电,是现代人家居生活中必不可少的工具助手,同时也是跨境电商市场中的重要组成部分。", 20 | 21 | "Clothing, Shoes & Jewelry": "衣着与装饰,作为人们物质生活中必不可少的一环,一直是广大网购爱好者最为关心的品类之一,而在跨境电商平台中,各式各样的衣着与装饰类商品争奇斗艳,也形成了一道亮丽的风景线。", 22 | 23 | "Toys & Games": "说起玩具,大多数人也许首先会联想到家中的小孩或是自己童真的孩提时代,而随着生活节奏的加快与社会压力的增大,如今,通过网购的方式在第一时间淘来一些新奇的玩具也成为了一部分时髦人士的解压方式。", 24 | 25 | "Office Products": "档案夹、订书机、签字笔等形形色色的办公用品,是现代人日常工作中必不可少的辅助工具。通过电商平台来购买这些工作中的得力助手,也成为了当今时代下一部分企业或是个人节省时间、提高工作效率的方法之一。" 26 | } 27 | 28 | class Product(object): 29 | def __init__(self): 30 | self.data = self.read_data() 31 | 32 | def read_data(self): 33 | df = pd.read_pickle(bd+date+'_product.pkl') 34 | # 修复价格显示百倍 35 | df.loc[:, 'lowest_price'] = df['lowest_price']/100 36 | return df 37 | 38 | product = Product() 39 | 40 | class TopList(object): 41 | def __init__(self, list_name): 42 | self.list_name = list_name 43 | self.data = self.read_data() 44 | self.product_id = self.data['product_id'] 45 | self.price = Price(self.data.groupby(['product_id'])['lowest_price'].mean()) 46 | self.review = Review(self.data) 47 | self.commodity = Commodity(self.data) 48 | self.brand = Brand(self.data) 49 | 50 | def read_data(self): 51 | df_ = pd.read_pickle('{}{}_{}.pkl'.format(bd, date, self.list_name)) 52 | df = pd.merge(df_, product.data, how='left', on = 'product_id') 53 | return df 54 | 55 | class Price(object): 56 | def __init__(self, data): 57 | self.data = data 58 | self.mean = np.nanmean(self.data) 59 | self.max = np.max(self.data) 60 | self.min = np.min(self.data) 61 | self.count = len(self.data) 62 | self.space = 5 63 | self.table = self.distribution_table(50, space=self.space) 64 | # self.image = self.plot_distribution() 65 | 66 | @property 67 | def image(self): 68 | labels = ["\\"+label for label in self.table['labels']] 69 | image = 
ImageBar(data=self.table['count'], 70 | labels=labels, 71 | title='榜单商品价格分布', 72 | xticks_rotation=0, 73 | legend_name=['商品数量']) 74 | image.init() 75 | return image 76 | 77 | def distribution_table(self, 78 | truncation_number:int, 79 | space_auto=True, 80 | space=10, 81 | space_power=1, 82 | unit_symbol='$' 83 | ): 84 | """ 85 | Split the data into intervals and count how many values fall into each one. 86 | 87 | Parameters 88 | ---------- 89 | truncation_number : cut-off value of the chart, i.e. the upper bound of the binned range 90 | space_auto : whether to choose the bin width automatically 91 | space : bin width, i.e. the span of each interval 92 | space_power : multiplier for the bin-width step; by default the width changes by 1 per iteration while it is below 5 and by 5 once it reaches 5, 93 | so with a multiplier of 100 it changes by 100 while below 500 and by 500 from 500 upward 94 | unit_symbol : unit symbol for the x-axis labels; with unit_symbol='$' the 0-10 interval is labelled $0-$10, 95 | while with unit_symbol='' it is labelled 0-10 96 | """ 97 | 98 | repeat_times = 0 99 | while True: 100 | data_last = 0 101 | data_count = {} 102 | labels = [] 103 | for data in range(0, int(self.max+1), space): 104 | if data < truncation_number: 105 | data_count['{}{}-{}{}'.format(unit_symbol, data_last, unit_symbol, data)]\ 106 | = sum((self.data < data) & (self.data >= data_last)) 107 | data_last = data 108 | else: 109 | data_count['{}{}以上'.format(unit_symbol, truncation_number)]\ 110 | = sum((self.data >= truncation_number)) 111 | 112 | if space_auto: 113 | if sum(map(lambda x : x != 0, data_count.values()))>14: 114 | space += 1*space_power if space<5*space_power else 5*space_power 115 | elif sum(map(lambda x : x != 0, data_count.values()))>8: 116 | break 117 | else: 118 | space -= 1*space_power if space<5*space_power else 5*space_power 119 | 120 | repeat_times += 1 121 | if repeat_times > 30: 122 | logging.warning("Too many iterations; could not determine the bin width automatically") 123 | break 124 | else: 125 | break 126 | 127 | if sum(map(lambda x : x != 0, data_count.values()))>14: 128 | logging.warning("More than 14 non-empty bins; please adjust the bin width and cut-off value manually") 129 | df_data = pd.DataFrame({'labels':list(data_count.keys()), 130 | 'count':list(data_count.values())} 131 | ) 132 | 133 | df_data = df_data.dropna() 134 | df_data = df_data[df_data['count'] != 0] 135 | return 
df_data.assign(percentage = lambda x: x['count']/sum(x['count'])) 136 | 137 | class Review(object): 138 | def __init__(self, data): 139 | self.data = data.groupby(['product_id'])['review_count'].mean() 140 | self.review_rating = data[data['review_count']!=0].review_rating 141 | self.rating_mean = np.nanmean(self.review_rating) 142 | self.mean = np.nanmean(self.data) 143 | self.max = np.max(self.data) 144 | self.min = np.min(self.data) 145 | self.sum = np.sum(self.data) 146 | self.count = len(self.data) 147 | 148 | # self.distribution_table = Price.distribution_table 149 | try: 150 | if self.mean>1000: 151 | self.table = self.distribution_table(10000, 152 | space=500, 153 | space_power=100, 154 | unit_symbol='') 155 | else: 156 | self.table = self.distribution_table(1000, 157 | space=50, 158 | space_power=2, 159 | space_auto = True, 160 | unit_symbol='') 161 | except BaseException as e: 162 | print(e) 163 | 164 | def distribution_table(self, *arg, **karg): 165 | return Price.distribution_table(self, *arg, **karg) 166 | 167 | @property 168 | def image(self): 169 | labels = self.table['labels'] 170 | image = ImageBar(data=self.table['count'], 171 | labels=labels, 172 | title='榜单商品评论数分布', 173 | xticks_rotation=20, 174 | legend_name=['商品数量'] 175 | ) 176 | image.init() 177 | return image 178 | 179 | class Commodity(object): 180 | show_number = 5 181 | def __init__(self, data): 182 | self.data = data 183 | self.product_id = self.data['product_id'] 184 | self.set = set(self.data['product_id']) 185 | self.count = len(self.set) 186 | self.table = self.commodity_rank() 187 | self.rank_change_table = ( 188 | self.data[ 189 | self.product_id.isin( 190 | self.cp['product_id'].head(self.show_number) 191 | ) 192 | ].pivot_table( 193 | index = ['date'], 194 | columns = ['product_id'], 195 | values = ['rank'] 196 | )['rank'][self.cp['product_id'].head(self.show_number)] 197 | )[self.table['ASIN'].head()] 198 | 199 | @property 200 | def image(self): 201 | image = 
ImageMultiLine(data=self.rank_change_table, 202 | labels=self.rank_change_table.index, 203 | title='商品排名变化情况', 204 | xticks_rotation=0, 205 | title_y = 1.1 206 | ) 207 | image.init() 208 | if image.intervals<1: 209 | image.intervals = 1 210 | image.init() 211 | image.ax.invert_yaxis() 212 | return image 213 | 214 | def commodity_rank(self): 215 | groups = self.data.groupby(['product_id']) 216 | # 上榜次数 217 | cp_count = pd.DataFrame(groups['date'].nunique()) 218 | # 计算每个商品的平均bsr 219 | cp_mean = pd.DataFrame(groups['rank'].mean()) 220 | #计算各个商品的最好bsr 221 | cp_min = pd.DataFrame(groups['rank'].min()) 222 | # 整理成一个表 223 | self.cp = pd.merge(cp_min.reset_index(), 224 | pd.merge(cp_count.reset_index(), 225 | cp_mean.reset_index(), 226 | on = 'product_id'), 227 | on = 'product_id') 228 | # 对列进行命名 229 | self.cp.columns = ['product_id','highest_rank','counts','avg_rank'] 230 | # 根据平均排名进行排序 231 | self.cp = self.cp.sort_values(['counts','avg_rank'],ascending = (False,True)) 232 | # 合并所需要的商品排行表 233 | self.cp = pd.merge(self.cp,product.data,on = 'product_id') 234 | 235 | rank_form = self.cp[['product_id', 'image_url', 'counts', 'avg_rank', 'highest_rank', 'brand', 'review_count', 'review_rating','lowest_price']].head(self.show_number).copy() 236 | rank_form.index = range(1,self.show_number + 1) 237 | # 将图片链接改为在html中可以直接显示的类型 238 | for i in range(len(rank_form)): 239 | if rank_form.image_url.notnull()[i + 1]: 240 | if "images-na.ssl" in rank_form.loc[i+1, 'image_url']: 241 | rank_form.loc[i+1, 'image_url'] = '' 243 | # 遇到图片缺失的情况时的处理 244 | else: 245 | logging.warning("ASIN -> {} : Can't get image_url, save url as {}.jpg instead".format(rank_form.product_id, rank_form.product_id)) 246 | rank_form.loc[i+1, 'image_url'] = '' 248 | #调整小数位数 249 | rank_form.avg_rank = rank_form.avg_rank.round(2) 250 | 251 | rank_form.columns = ['ASIN', '图片', '登榜次数', '平均排名', '最高排名', '品牌', '评论量', '平均星级', '当前价格($)'] 252 | return rank_form 253 | 254 | class Brand(object): 255 | show_number = 5 256 | def 
__init__(self, data): 257 | self.data = data 258 | self.set = set(self.data['brand']) 259 | self.count = len(self.set) 260 | self.table = self.commodity_rank() 261 | self.count_change_table = self.count_change() 262 | 263 | @property 264 | def image(self): 265 | image = ImageMultiLine(data=self.count_change_table, 266 | labels=self.count_change_table.index, 267 | title='品牌登榜商品数变化情况', 268 | xticks_rotation=0, 269 | title_y = 1.1 270 | ) 271 | image.init() 272 | if image.intervals<1: 273 | image.intervals = 1 274 | image.init() 275 | return image 276 | 277 | def count_change(self): 278 | df_brand_product_counts = pd.DataFrame(self.data.groupby(['brand','date'])['product_id'].count()).reset_index() 279 | df_brand_productcounts_change = df_brand_product_counts[df_brand_product_counts['brand'].isin(self.table['品牌'].head())].pivot_table(index = ['date'],columns = ['brand'],values = ['product_id']) 280 | try: 281 | df_brand_productcounts_change.columns = df_brand_productcounts_change.columns.levels[1] 282 | except: 283 | print('') 284 | df_brand_productcounts_change = df_brand_productcounts_change[self.table['品牌'].head()] 285 | return df_brand_productcounts_change.fillna(0) 286 | 287 | def commodity_rank(self): 288 | product.data_id = pd.DataFrame({'product_id':list(set(list(self.data['product_id'])))}) 289 | df_cp_product = pd.merge(product.data_id,product.data,on = 'product_id') 290 | cp_count = pd.DataFrame(self.data.groupby(['brand'])['rank'].count()) 291 | cp_mean = pd.DataFrame(self.data.groupby(['brand'])['rank'].mean()).round(2) 292 | cp_productcount = pd.DataFrame(df_cp_product.groupby(['brand'])['product_id'].count()) 293 | cp_max = pd.DataFrame(self.data.groupby(['brand'])['rank'].min()) 294 | cp_reviewcount = pd.DataFrame(df_cp_product.groupby(['brand'])['review_count'].sum()).fillna(0).astype(int) 295 | cp_reviewrating = pd.DataFrame(df_cp_product[df_cp_product['review_rating']!= 0].groupby(['brand'])['review_rating'].mean()).round(2) 296 | cp_price = 
pd.DataFrame(df_cp_product.groupby(['brand'])['lowest_price'].mean()).round(2) 297 | cp = pd.merge(cp_count.reset_index(),cp_mean.reset_index(),on = 'brand') 298 | cp = pd.merge(cp,cp_productcount.reset_index(),on = 'brand') 299 | cp = pd.merge(cp,cp_max.reset_index(),on = 'brand') 300 | cp = pd.merge(cp,cp_reviewcount.reset_index(),on = 'brand') 301 | cp = pd.merge(cp,cp_reviewrating.reset_index(),on = 'brand', how = 'outer') 302 | cp = pd.merge(cp,cp_price.reset_index(),on = 'brand') 303 | cp.columns = ['brand','counts','avg_rank','prodcut_count','max_rank','review_count','avg_rating','avg_price'] 304 | cp = cp.sort_values(['counts','avg_rank'],ascending = (False,True)) 305 | cp[['brand','counts','prodcut_count','avg_rank','max_rank','review_count','avg_rating','avg_price']].head(self.show_number) 306 | rank_form = cp[['brand','counts','prodcut_count','avg_rank','max_rank','review_count','avg_rating','avg_price']].head(self.show_number) 307 | rank_form['avg_price'] = rank_form['avg_price'].map(lambda x:float('%.2f' % x)) 308 | rank_form.columns = ['品牌', '登榜次数', '登榜商品数', '平均排名', '最高排名', '评论量', '平均星级', '平均价格($)'] 309 | rank_form.index = range(1,self.show_number + 1) 310 | return rank_form -------------------------------------------------------------------------------- /Example_1/models.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import logging 3 | 4 | class Document(object): 5 | def __init__(self): 6 | # 文章标题 7 | self.title = None 8 | # 子标题 9 | self.subtitle = None 10 | # 前言 11 | self.foreword = None 12 | # 章节 13 | self.chapters = [] 14 | for i in range(5): 15 | self.chapters.append(Chapter(subchapter_number=4, number="chapter{}".format(i))) 16 | 17 | def print_structure(self, deep): 18 | if self.chapters: 19 | for subchapter in self.chapters: 20 | print('--'*deep+subchapter.number) 21 | subchapter.print_structure(deep+1) 22 | else: 23 | return 24 | 25 | class Chapter(Document): 26 | def 
__init__(self, subchapter_number=0, number=''): 27 | """ 28 | subchapter_number : number of subchapters 29 | number : chapter identifier, generated from the document structure; must be unique 30 | """ 31 | self.title = None 32 | self.content1 = None 33 | self.table = None 34 | # Charts are rendered to image files first; this stores the image path 35 | self.image = None 36 | self.content2 = None 37 | self.rank_list_change = None 38 | self.number = number 39 | self.chapters= [Chapter(subchapter_number=0, number="{}_subchapter{}".format(self.number, i)) 40 | for i in range(subchapter_number)] 41 | # Directory where chart images are saved 42 | self.image_path = './image/' 43 | 44 | 45 | def set_image(self, fig): 46 | image_filename = '{}{}.png'.format(self.image_path, self.number) 47 | fig.savefig(image_filename, dpi=160, bbox_inches='tight') 48 | self.image = image_filename 49 | 50 | def __getattr__(self, name): 51 | # Called only after normal lookup fails, so do not read self.<name> here: 52 | # the original `return self.name` recursed until RecursionError. 53 | # Missing attributes resolve to None so the template can test for them. 54 | logging.error("Attribute %r does not exist", name) 55 | return None 56 | 57 | # A subchapter can reuse the Chapter definition through inheritance, which also leaves room for custom subchapters later 58 | class Subchapter(Chapter): 59 | pass -------------------------------------------------------------------------------- /Example_1/template.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 22 | 23 |
24 |

{{ document.title }}

25 |

{{ document.foreword }}

26 | 27 | {% for chapter in document.chapters %} 28 |

{{ chapter.title }}

29 | {%- if chapter.foreword -%} 30 |

{{ chapter.foreword }}

31 | {%- endif -%} 32 |

{{ chapter.content }}

33 | {% if not chapter.table is none %} 34 | {% set table = chapter.table %} 35 | 36 | 37 | {% for column in table.columns %} 38 | 39 | {% endfor %} 40 | 41 | {% for index in table.index %} 42 | 43 | {% for column in table.columns %} 44 | 45 | {% endfor %} 46 | 47 | {% endfor %} 48 |
{{column}}
{{ table[column][index] }}
49 | {% endif %} 50 | {% for subchapter in chapter.chapters %} 51 | {% if not subchapter.title is none %} 52 |

{{ subchapter.title }}

53 |

{{ subchapter.foreword }}

54 | {% if not subchapter.table is none %} 55 | {% set table = subchapter.table %} 56 | 57 | 58 | {% for column in table.columns %} 59 | 60 | {% endfor %} 61 | 62 | {% for index in table.index %} 63 | 64 | {% for column in table.columns %} 65 | 66 | {% endfor %} 67 | 68 | {% endfor %} 69 |
{{column}}
{{ table[column][index] }}
70 | {% endif %} 71 | 72 | {% if not subchapter.image is none %} 73 | {% set image = subchapter.image %} 74 |
75 | {% endif %} 76 | 77 |

{{ subchapter.content }}

78 | {% endif %} 79 | {% endfor %} 80 | 81 | {% endfor %} 82 |
83 | 84 | -------------------------------------------------------------------------------- /Example_2/configs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | report_name = "Report1" -------------------------------------------------------------------------------- /Example_2/main.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [] 11 | } 12 | ], 13 | "metadata": { 14 | "hide_input": false, 15 | "kernelspec": { 16 | "display_name": "Python 3", 17 | "language": "python", 18 | "name": "python3" 19 | } 20 | }, 21 | "nbformat": 4, 22 | "nbformat_minor": 2 23 | } 24 | -------------------------------------------------------------------------------- /Example_2/reports/Report1/configs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | report_name = "Report1" -------------------------------------------------------------------------------- /Example_2/reports/Report1/main.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [] 11 | } 12 | ], 13 | "metadata": { 14 | "hide_input": false, 15 | "kernelspec": { 16 | "display_name": "Python 3", 17 | "language": "python", 18 | "name": "python3" 19 | } 20 | }, 21 | "nbformat": 4, 22 | "nbformat_minor": 2 23 | } 24 | -------------------------------------------------------------------------------- /Example_3/ImageFactory.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import matplotlib.pyplot as plt 3 | import matplotlib as mpl 4 | import 
matplotlib.font_manager as mfm 5 | import matplotlib.gridspec as gridspec 6 | import matplotlib.ticker as plticker 7 | import numpy as np 8 | import types 9 | from itertools import cycle 10 | 11 | class Image(object): 12 | font_path = {} 13 | prop = {} 14 | font_path['hei'] = '../font/MSYHMONO.ttf' 15 | font_path['english'] = '../font/Calibri.ttf' 16 | for font_name in list(font_path): 17 | prop[font_name] = mfm.FontProperties(fname=font_path[font_name]) 18 | 19 | title_font = prop['hei'].copy() 20 | title_font.set_size(14) 21 | xticks_font = prop['hei'].copy() 22 | xticks_font.set_size(9) 23 | ylable_font = prop['hei'].copy() 24 | ylable_font.set_size(10) 25 | legend_font = prop['hei'].copy() 26 | legend_font.set_size(9) 27 | default_colors = {} 28 | default_colors['blue'] = '#6CADDD' 29 | default_colors['yellow'] = '#F3903F' 30 | default_colors['red'] = '#CB615A' 31 | default_colors['orange'] = '#F3903F' 32 | default_colors['gray'] = '#B4B4B4' 33 | default_colors['lightyellow'] = '#FCC900' 34 | default_colors['royalblue'] = '#5488CF' 35 | 36 | IMAGE_WIDTH = 5.708 37 | IMAGE_HIGH = 2.756 38 | 39 | def __init__(self, title=None, labels=None, data=None, 40 | image_path=None, title_y=1.1, xticks_rotation='vertical', legend_name=[]): 41 | self.length = len(data) 42 | self.x = np.arange(self.length) 43 | self.y = data 44 | self.data = data 45 | self.title_y = title_y 46 | self.title = title 47 | self.labels = labels 48 | self.legend_name = legend_name 49 | self.xticks_rotation = xticks_rotation 50 | 51 | def init(self): 52 | self.fig = plt.figure(figsize=(self.IMAGE_WIDTH, self.IMAGE_HIGH)) 53 | self.gs = gridspec.GridSpec(1, 1) 54 | self.ax = self.fig.add_subplot(self.gs[0]) 55 | self.init_plus() 56 | self.set_margins() 57 | self.set_xticks() 58 | self.add_ax() 59 | self.add_title() 60 | self.plot() 61 | self.set_spines() 62 | self.set_tick_marks() 63 | self.add_legend() 64 | self.config_add() 65 | self.tight_layout() 66 | self.set_grid() 67 | plt.close() 68 | 69 | 
def config_add(self): 70 | pass 71 | 72 | def add_ax(self): 73 | pass 74 | 75 | def init_plus(self): 76 | pass 77 | 78 | def add_legend(self): 79 | if not (self.legend_name is None): 80 | if len(self.legend_name)==1: 81 | plt.legend(self.legend_name, loc='upper right', bbox_to_anchor=(1, 1.2), prop=self.legend_font, frameon=True) 82 | elif len(self.legend_name)==2: 83 | lines1, labels1 = self.ax.get_legend_handles_labels() 84 | self.ax.legend(lines1, labels1, loc='upper center', ncol=2, bbox_to_anchor=(0.5, 1.2), prop=self.legend_font, frameon=False) 85 | 86 | 87 | def set_ylabel(self): 88 | pass 89 | 90 | def tight_layout(self, **karg): 91 | self.gs.tight_layout(self.fig, **karg) 92 | 93 | def plot(self): 94 | self.ax.fill_between(self.x, self.y.min()*0.9, 95 | self.y, zorder=3, color=self.default_colors['blue']) 96 | 97 | def add_title(self): 98 | if self.title: 99 | plt.title(self.title, fontproperties=self.title_font, y=self.title_y) 100 | 101 | def set_grid(self): 102 | get_ax_space = lambda x: x.get_ylim()[1] - x.get_ylim()[0] 103 | self.ax_space = get_ax_space(self.ax) 104 | def get_interval(ax_space, space_number=5): 105 | digit_number = len(str((ax_space))) 106 | intervals = int((ax_space)/(space_number*(10**digit_number))) 107 | while intervals == 0: 108 | digit_number -= 1 109 | intervals = int((ax_space)/(space_number*(10**digit_number))) 110 | linshi = round((ax_space)/(space_number*(10**digit_number))) 111 | intervals = linshi*(10**digit_number) 112 | return intervals 113 | 114 | if not 'intervals' in self.__dict__.keys(): 115 | self.intervals = get_interval(self.ax_space) 116 | loc = plticker.MultipleLocator(base=self.intervals) 117 | self.ax.yaxis.set_major_locator(loc) 118 | self.ax.grid(axis='y', zorder=0) 119 | if 'ax2' in self.__dict__.keys(): 120 | print("有双轴需要设置副轴grid") 121 | self.ax_space2 = get_ax_space(self.ax2) 122 | self.intervals2 = get_interval(self.ax_space2, space_number=5) 123 | loc2 = plticker.MultipleLocator(base=self.intervals2) 
124 | self.ax2.yaxis.set_major_locator(loc2) 125 | 126 | def set_margins(self): 127 | self.ax.margins(0.013, 0.073) 128 | 129 | def set_spines(self): 130 | self.ax.spines['right'].set_visible(False) 131 | self.ax.spines['top'].set_visible(False) 132 | # self.ax.spines['bottom'].set_visible(False) 133 | self.ax.spines['left'].set_visible(False) 134 | 135 | def set_tick_marks(self): 136 | self.ax.tick_params(axis='both', which='both', bottom=False, top=False, 137 | labelbottom=True, left=False, right=False, labelleft=True) 138 | 139 | def set_xticks(self): 140 | plt.xticks(self.x, self.labels, fontproperties=self.xticks_font, rotation=self.xticks_rotation) # set the x-axis tick labels 141 | 142 | def show(self): 143 | plt.show() 144 | 145 | def save(self, image_path=None): 146 | if image_path: 147 | return self.fig.savefig(image_path) 148 | import logging # not imported at the top of this module 149 | logging.warning("Please set the image path first") 150 | 151 | class ImageFill(Image): 152 | # def __init__(self): 153 | # pass 154 | 155 | def set_spines(self): 156 | self.ax.spines['right'].set_visible(False) 157 | self.ax.spines['top'].set_visible(False) 158 | self.ax.spines['bottom'].set_visible(False) 159 | self.ax.spines['left'].set_visible(False) 160 | 161 | class ImageBar(Image): 162 | def __init__(self, title=None, labels=None, data=None, image_path=None, xticks_rotation = 40, 163 | legend_name=[], y2=None, title_y=1.2): 164 | self.y2 = y2 165 | super(ImageBar, self).__init__(title=title, labels=labels, data=data, image_path=image_path, 166 | title_y=title_y, xticks_rotation=xticks_rotation, legend_name=legend_name) 167 | 168 | def plot(self): 169 | rects = plt.bar(self.x, self.y, 0.4, zorder=3, color=self.default_colors['blue']) 170 | for rect in rects: 171 | height = rect.get_height() 172 | self.ax.text(rect.get_x() + rect.get_width()/2., 1.05*height, 173 | '%d' % int(height), 174 | ha='center', va='bottom') 175 | 176 | # def set_xticks(self): 177 | # plt.xticks(self.x, self.labels, fontproperties=self.xticks_font, rotation= self.xticks_rotation, 
wrap=True) 178 | 179 | class ImageTwinx(Image): 180 | def __init__(self, title=None, labels=None, data=None, image_path=None, xticks_rotation=40, 181 | legend_name=[], y2=None, title_y=1.2, ylabel_show=True): 182 | self.ylabel_show = ylabel_show 183 | self.legend_name = legend_name 184 | self.marker_style = dict(color=self.default_colors['yellow'], linestyle='-', marker='o') 185 | self.y2 = y2 186 | super(ImageTwinx, self).__init__(title=title, labels=labels, data=data, 187 | image_path=image_path, xticks_rotation=xticks_rotation, title_y=title_y, legend_name=legend_name) 188 | 189 | def config_add(self): 190 | self.set_ylabel() 191 | 192 | def set_ylabel(self): 193 | if self.ylabel_show: 194 | self.ax.set_ylabel(self.legend_name[0], fontproperties=self.ylable_font) 195 | self.ax2.set_ylabel(self.legend_name[1], fontproperties=self.ylable_font) 196 | 197 | def add_ax(self): 198 | self.ax2 = self.ax.twinx() 199 | 200 | def tight_layout(self, **karg): 201 | self.gs.tight_layout(self.fig, **karg) 202 | 203 | def plot(self): 204 | self.ln1 = self.ax.bar(self.x, self.y, 0.4, zorder=3, label=self.legend_name[0], color=self.default_colors['blue']) 205 | self.ax2.plot(self.x, self.y2, label=self.legend_name[1], **self.marker_style) 206 | 207 | def set_spines(self): 208 | for _ax in [self.ax, self.ax2]: 209 | _ax.margins(0) # 设置留白 210 | # spines 211 | _ax.spines['right'].set_visible(False) 212 | _ax.spines['top'].set_visible(False) 213 | _ax.spines['left'].set_visible(False) 214 | 215 | def set_tick_marks(self): 216 | self.ax.tick_params(axis='both', which='both', bottom=False, top=False, 217 | labelbottom=True, left=False, right=False, labelleft=True) 218 | self.ax2.tick_params(axis='both', which='both', bottom=False, top=False, 219 | labelbottom=True, left=False, right=False) 220 | 221 | def add_legend(self): 222 | if not (self.legend_name is None): 223 | if len(self.legend_name) == 2: 224 | lines1, labels1 = self.ax.get_legend_handles_labels() 225 | lines2, labels2 = 
self.ax2.get_legend_handles_labels() 226 | self.ax.legend(lines1+lines2, labels1+labels2, loc='upper center', ncol=2, bbox_to_anchor=(0.5, 1.27), prop=self.legend_font, frameon=False) 227 | 228 | class ImageLine(Image): 229 | def __init__(self, title=None, labels=None, data=None, image_path=None, 230 | title_y = 1.08, xticks_rotation='horizontal', legend_name=[]): 231 | self.marker_style = dict(color=Image.default_colors['blue'], linestyle='-') 232 | super(ImageLine, self).__init__(title=title, labels=labels, data=data, 233 | image_path=image_path, title_y=title_y, xticks_rotation= xticks_rotation, 234 | legend_name=legend_name) 235 | self.init_plus = self.config_add 236 | 237 | def config_add(self): 238 | self.set_ylabel() 239 | self.ax.set_ylim(top = round(np.max(self.y)*1.1)) 240 | 241 | def plot(self): 242 | self.ax.plot(self.x, self.y, **self.marker_style) 243 | 244 | class ImagePie(Image): 245 | colors = ['#F3903F','#B4B4B4','#FCC900','#6CADDD', 246 | '#D9D4CF', '#7C7877', '#ABD0CE', '#F0E5DE', '#6AAFE6', 247 | '#D09E88', '#D4DFE6'] 248 | 249 | def plot(self): 250 | self.explode = np.ones(self.length)*0.03 251 | self.patches = self.ax.pie(self.y, 252 | explode=self.explode, 253 | labels=self.labels, 254 | colors=self.colors, 255 | autopct='%d%%') 256 | 257 | def set_grid(self): 258 | pass 259 | 260 | def add_legend(self): 261 | handles = [] 262 | for i, l in enumerate(self.labels): 263 | handles.append(mpl.patches.Patch(color=self.colors[i], label=l)) 264 | self.ax.legend(handles, self.labels, loc="center right", frameon=False) 265 | 266 | class ImageFluctuation(ImageTwinx): 267 | def plot(self): 268 | self.ax.plot(self.x, self.y, label=self.legend_name[0], **self.marker_style) 269 | self.ax2.bar(self.x, self.y2, 0.4, zorder=3, label=self.legend_name[1], color=self.default_colors['red']) 270 | 271 | def init_plus(self): 272 | self.marker_style = dict(color=self.default_colors['blue'], linestyle='-') 273 | 274 | def set_xticks(self): 275 | 
plt.xticks(range(0,self.length,30), self.labels.loc[[0, 30, 60, 90, 120, 150, 180]], fontproperties=self.xticks_font, rotation=self.xticks_rotation) 276 | 277 | def set_ylim(self, top=None, bottom=None): 278 | if not top: 279 | top=int(np.max(self.y)*1.1) 280 | if not bottom: 281 | bottom=int(np.min(self.y)*0.8) 282 | if top and bottom: 283 | try: 284 | self.ax.set_ylim(top=top, bottom=bottom) 285 | except: 286 | top=int(np.max(self.y)*1.1) 287 | bottom=int(np.min(self.y)*0.8) 288 | self.ax.set_ylim(top=top, bottom=bottom) 289 | 290 | def config_add(self): 291 | self.set_ylabel() 292 | self.set_ylim() 293 | 294 | class ImageDoubleLine(ImageTwinx): 295 | def init_plus(self): 296 | self.marker_style1 = dict(color=self.default_colors['red'], linestyle='-') 297 | self.marker_style2 = dict(color=self.default_colors['blue'], linestyle='-') 298 | 299 | def plot(self): 300 | self.ax.plot(self.x, self.y, label=self.legend_name[0], **self.marker_style1) 301 | self.ax2.plot(self.x, self.y2, label=self.legend_name[1], **self.marker_style2) 302 | 303 | def set_xticks(self): 304 | plt.xticks(range(0,self.length,30), self.labels.loc[[0, 30, 60, 90, 120, 150, 180]], fontproperties=self.xticks_font, rotation=self.xticks_rotation) 305 | 306 | class ImageMultiLine(Image): 307 | IMAGE_HIGH = 2.756 308 | color_cycle = cycle(['blue', 'orange', 'red', 'lightyellow', 'royalblue']) 309 | 310 | def plot(self): 311 | self.marker_style = [] 312 | for asin in self.y.columns: 313 | data = self.y[asin] 314 | marker_style = dict(color=self.default_colors[next(self.color_cycle)], 315 | linestyle='-', marker='o') 316 | self.marker_style.append(marker_style) 317 | self.ax.plot(self.x, data, zorder=3, label=asin, **marker_style) 318 | 319 | def set_tick_marks(self): 320 | self.ax.tick_params(axis='both', zorder=1, which='both', bottom=True, top=False, 321 | labelbottom=True, left=False, right=False, labelleft=False) 322 | 323 | def set_margins(self): 324 | self.ax.margins(0.013, 0.073) 325 | 326 | 
def set_yticks(self): 327 | self.ax.yaxis.tick_right() 328 | self.set_tick_marks() 329 | 330 | def config_add(self): 331 | self.set_yticks() 332 | data_max = self.y.max().max() 333 | if round(data_max*1.1)-round(data_max)>1: 334 | top = round(data_max*1.1) 335 | else: 336 | top = round(data_max)+1 337 | self.ax.set_ylim(top=top) 338 | 339 | def add_legend(self): 340 | handles, labels = self.ax.get_legend_handles_labels() 341 | self.ax.legend(handles, labels, loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.4), prop=self.legend_font, frameon=False) -------------------------------------------------------------------------------- /Example_3/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/__init__.py -------------------------------------------------------------------------------- /Example_3/analysis.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import pandas as pd 3 | import numpy as np 4 | import logging 5 | 6 | from datapipeline import product_data, list_data 7 | from items import product, TopList 8 | 9 | lists = {'Best Seller':TopList(list_data['bestsellers']), 10 | 'Hot New Releases':TopList(list_data['hotnewreleases']), 11 | 'Movers & Shakers':TopList(list_data['moversandshakers'])} 12 | bs = lists['Best Seller'] 13 | hnr = lists['Hot New Releases'] 14 | mns = lists['Movers & Shakers'] 15 | 16 | 17 | class Info(object): 18 | def __init__(self): 19 | self.rank_lists = list(lists.values()) 20 | self.review_total_count = sum([rank_list.review.sum for rank_list in self.rank_lists]) 21 | sorted_time = sorted(list(set(bs.data['date']))) 22 | start_time = sorted_time[0] 23 | self.start_time = str(start_time.year) + '年' + str(start_time.month) + '月' + str(start_time.day) + '日' 24 | end_time = sorted_time[-1] 25 | self.end_time = str(end_time.year) 
+ '年' + str(end_time.month) + '月' + str(end_time.day) + '日' 26 | self.category_name = bs.data['dept_name'][0] 27 | 28 | @property 29 | def commodity_total_count(self): 30 | commodity_total = bs.commodity.set | hnr.commodity.set | mns.commodity.set # union of the three lists; `and` would return only the last set 31 | commodity_total_count = len(commodity_total) 32 | return commodity_total_count 33 | 34 | @property 35 | def brand_total_count(self): 36 | brand_total = bs.brand.set | hnr.brand.set | mns.brand.set # union of the three lists; `and` would return only the last set 37 | brand_total_count = len(brand_total) 38 | return brand_total_count 39 | 40 | @property 41 | def rating_mean(self): 42 | rating = np.array([rank_list.review.rating_mean for rank_list in self.rank_lists]) 43 | count = np.array([rank_list.review.count for rank_list in self.rank_lists]) 44 | rating_mean = np.dot(rating, count)/np.sum(count) 45 | return rating_mean 46 | 47 | info = Info() -------------------------------------------------------------------------------- /Example_3/configs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | report_name = 'Report1' 3 | filepath = './' 4 | date = u'20180528' 5 | bd = u'PS' 6 | author = u'作者君' 7 | 8 | 9 | -------------------------------------------------------------------------------- /Example_3/datapipeline.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import pandas as pd 3 | from configs import bd, date, filepath, report_name 4 | 5 | def read_data(): 6 | df = pd.read_pickle('./reports/{}/data/{}{}_product.pkl'.format( 7 | report_name, bd, date)) 8 | # Prices are stored multiplied by 100; convert them back 9 | df.loc[:, 'lowest_price'] = df['lowest_price']/100 10 | return df 11 | 12 | product_data = read_data() 13 | 14 | def read_list_data(list_name): 15 | df_ = pd.read_pickle('./reports/{}/data/{}{}_{}.pkl'.format(report_name, bd, date, list_name)) 16 | df = pd.merge(df_, product_data, how='left', on = 'product_id') 17 | return df 18 | 19 | list_data = {} 20 | for list_name in 
['bestsellers', 'hotnewreleases', 'moversandshakers']: 21 | list_data[list_name] = read_list_data(list_name) -------------------------------------------------------------------------------- /Example_3/items.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import pandas as pd 3 | import numpy as np 4 | import logging 5 | 6 | from configs import filepath, date, bd, author 7 | from ImageFactory import ImageMultiLine, ImageBar 8 | from datapipeline import product_data 9 | 10 | class Product(object): 11 | def __init__(self, data): 12 | self.data = data 13 | 14 | product = Product(product_data) 15 | 16 | class TopList(object): 17 | def __init__(self, data): 18 | self.data = data 19 | self.product_id = self.data['product_id'] 20 | self.price = Price(self.data.groupby(['product_id'])['lowest_price'].mean()) 21 | self.review = Review(self.data) 22 | self.commodity = Commodity(self.data) 23 | self.brand = Brand(self.data) 24 | 25 | class Price(object): 26 | def __init__(self, data): 27 | self.data = data 28 | self.mean = np.nanmean(self.data) 29 | self.max = np.max(self.data) 30 | self.min = np.min(self.data) 31 | self.count = len(self.data) 32 | self.space = 5 33 | self.table = self.distribution_table(50, space=self.space) 34 | # self.image = self.plot_distribution() 35 | 36 | @property 37 | def image(self): 38 | labels = ["\\"+label for label in self.table['labels']] 39 | image = ImageBar(data=self.table['count'], 40 | labels=labels, 41 | title='榜单商品价格分布', 42 | xticks_rotation=0, 43 | legend_name=['商品数量']) 44 | image.init() 45 | return image 46 | 47 | def distribution_table(self, 48 | truncation_number:int, 49 | space_auto=True, 50 | space=10, 51 | space_power=1, 52 | unit_symbol='$' 53 | ): 54 | """ 55 | 把数据划分出不同区间,然后计算各个区间的频率 56 | 57 | Parameters 58 | ---------- 59 | truncation_number : 图表截断值,即区间范围的最大值 60 | space_auto : 是否自动确定组间距离 61 | space : 组间距离,即各个区间的范围 62 | space_power : 
multiplier for the bin-width step. By default the bin width changes by 1 while it is <= 5 and by 5 once it is > 5; 63 | with a multiplier of 100 it changes by 100 while it is <= 500 and by 500 once it is > 500 64 | unit_symbol : unit symbol for the x-axis labels; with unit_symbol='$' the 0-10 bin is labelled $0-$10, 65 | while with unit_symbol='' the 0-10 bin is labelled 0-10 66 | """ 67 | 68 | repeat_times = 0 69 | while True: 70 | data_last = 0 71 | data_count = {} 72 | labels = [] 73 | for data in range(0, int(self.max+1), space): 74 | if data < truncation_number: 75 | data_count['{}{}-{}{}'.format(unit_symbol, data_last, unit_symbol, data)]\ 76 | = sum((self.data < data) & (self.data >= data_last)) 77 | data_last = data 78 | else: 79 | data_count['{}{}以上'.format(unit_symbol, truncation_number)]\ 80 | = sum((self.data >= truncation_number)) 81 | 82 | if space_auto: 83 | if sum(map(lambda x : x != 0, data_count.values()))>14: 84 | space += 1*space_power if space<5*space_power else 5*space_power 85 | elif sum(map(lambda x : x != 0, data_count.values()))>8: 86 | break 87 | else: 88 | space -= 1*space_power if space<5*space_power else 5*space_power 89 | 90 | repeat_times += 1 91 | if repeat_times > 30: 92 | logging.warning("循环次数过多,无法自动确认组间距") 93 | break 94 | else: 95 | break 96 | 97 | if sum(map(lambda x : x != 0, data_count.values()))>14: 98 | logging.warning("存在分组数大于14,请手动调节组间距和最大截断值") 99 | df_data = pd.DataFrame({'labels':list(data_count.keys()), 100 | 'count':list(data_count.values())} 101 | ) 102 | 103 | df_data = df_data.dropna() 104 | df_data = df_data[df_data['count'] != 0] 105 | return df_data.assign(percentage = lambda x: x['count']/sum(x['count'])) 106 | 107 | class Review(object): 108 | def __init__(self, data): 109 | self.data = data.groupby(['product_id'])['review_count'].mean() 110 | self.review_rating = data[data['review_count']!=0].review_rating 111 | self.rating_mean = np.nanmean(self.review_rating) 112 | self.mean = np.nanmean(self.data) 113 | self.max = np.max(self.data) 114 | self.min = np.min(self.data) 115 | self.sum = np.sum(self.data) 116 | self.count = len(self.data) 117 | 118 | # self.distribution_table = 
Price.distribution_table 119 | try: 120 | if self.mean>1000: 121 | self.table = self.distribution_table(10000, 122 | space=500, 123 | space_power=100, 124 | unit_symbol='') 125 | else: 126 | self.table = self.distribution_table(1000, 127 | space=50, 128 | space_power=2, 129 | space_auto = True, 130 | unit_symbol='') 131 | except BaseException as e: 132 | print(e) 133 | 134 | def distribution_table(self, *arg, **karg): 135 | return Price.distribution_table(self, *arg, **karg) 136 | 137 | @property 138 | def image(self): 139 | labels = self.table['labels'] 140 | image = ImageBar(data=self.table['count'], 141 | labels=labels, 142 | title='榜单商品评论数分布', 143 | xticks_rotation=20, 144 | legend_name=['商品数量'] 145 | ) 146 | image.init() 147 | return image 148 | 149 | class Commodity(object): 150 | show_number = 5 151 | def __init__(self, data): 152 | self.data = data 153 | self.product_id = self.data['product_id'] 154 | self.set = set(self.data['product_id']) 155 | self.count = len(self.set) 156 | self.table = self.commodity_rank() 157 | self.rank_change_table = ( 158 | self.data[ 159 | self.product_id.isin( 160 | self.cp['product_id'].head(self.show_number) 161 | ) 162 | ].pivot_table( 163 | index = ['date'], 164 | columns = ['product_id'], 165 | values = ['rank'] 166 | )['rank'][self.cp['product_id'].head(self.show_number)] 167 | )[self.table['ASIN'].head()] 168 | 169 | @property 170 | def image(self): 171 | image = ImageMultiLine(data=self.rank_change_table, 172 | labels=self.rank_change_table.index, 173 | title='商品排名变化情况', 174 | xticks_rotation=0, 175 | title_y = 1.1 176 | ) 177 | image.init() 178 | if image.intervals<1: 179 | image.intervals = 1 180 | image.init() 181 | image.ax.invert_yaxis() 182 | return image 183 | 184 | def commodity_rank(self): 185 | groups = self.data.groupby(['product_id']) 186 | # 上榜次数 187 | cp_count = pd.DataFrame(groups['date'].nunique()) 188 | # 计算每个商品的平均bsr 189 | cp_mean = pd.DataFrame(groups['rank'].mean()) 190 | #计算各个商品的最好bsr 191 | cp_min = 
pd.DataFrame(groups['rank'].min()) 192 | # Combine the three statistics into one table 193 | self.cp = pd.merge(cp_min.reset_index(), 194 | pd.merge(cp_count.reset_index(), 195 | cp_mean.reset_index(), 196 | on = 'product_id'), 197 | on = 'product_id') 198 | # Name the columns 199 | self.cp.columns = ['product_id','highest_rank','counts','avg_rank'] 200 | # Sort by appearance count (descending), then by average rank (ascending) 201 | self.cp = self.cp.sort_values(['counts','avg_rank'],ascending = (False,True)) 202 | # Join the product details needed for the ranking table 203 | self.cp = pd.merge(self.cp,product.data,on = 'product_id') 204 | 205 | rank_form = self.cp[['product_id', 'image_url', 'counts', 'avg_rank', 'highest_rank', 'brand', 'review_count', 'review_rating','lowest_price']].head(self.show_number).copy() 206 | rank_form.index = range(1,self.show_number + 1) 207 | # Convert the image link into a form that displays directly in HTML 208 | for i in range(len(rank_form)): 209 | if rank_form.image_url.notnull()[i + 1]: 210 | if "images-na.ssl" in rank_form.loc[i+1, 'image_url']: 211 | rank_form.loc[i+1, 'image_url'] = '' 213 | # Handle a missing image: log the ASIN of this row only 214 | else: 215 | logging.warning("ASIN -> {0} : Can't get image_url, save url as {0}.jpg instead".format(rank_form.loc[i+1, 'product_id'])) 216 | rank_form.loc[i+1, 'image_url'] = '' 218 | # Round the average rank to two decimals 219 | rank_form.avg_rank = rank_form.avg_rank.round(2) 220 | 221 | rank_form.columns = ['ASIN', '图片', '登榜次数', '平均排名', '最高排名', '品牌', '评论量', '平均星级', '当前价格($)'] 222 | return rank_form 223 | 224 | class Brand(object): 225 | show_number = 5 226 | def __init__(self, data): 227 | self.data = data 228 | self.set = set(self.data['brand']) 229 | self.count = len(self.set) 230 | self.table = self.commodity_rank() 231 | self.count_change_table = self.count_change() 232 | 233 | @property 234 | def image(self): 235 | image = ImageMultiLine(data=self.count_change_table, 236 | labels=self.count_change_table.index, 237 | title='品牌登榜商品数变化情况', 238 | xticks_rotation=0, 239 | title_y = 1.1 240 | ) 241 | image.init() 242 | if image.intervals<1: 243 | image.intervals = 1 244 | image.init() 245 | return image 246 | 247 | def 
count_change(self): 248 | df_brand_product_counts = pd.DataFrame(self.data.groupby(['brand','date'])['product_id'].count()).reset_index() 249 | df_brand_productcounts_change = df_brand_product_counts[df_brand_product_counts['brand'].isin(self.table['品牌'].head())].pivot_table(index = ['date'],columns = ['brand'],values = ['product_id']) 250 | try: 251 | df_brand_productcounts_change.columns = df_brand_productcounts_change.columns.levels[1] 252 | except: 253 | print('') 254 | df_brand_productcounts_change = df_brand_productcounts_change[self.table['品牌'].head()] 255 | return df_brand_productcounts_change.fillna(0) 256 | 257 | def commodity_rank(self): 258 | product.data_id = pd.DataFrame({'product_id':list(set(list(self.data['product_id'])))}) 259 | df_cp_product = pd.merge(product.data_id,product.data,on = 'product_id') 260 | cp_count = pd.DataFrame(self.data.groupby(['brand'])['rank'].count()) 261 | cp_mean = pd.DataFrame(self.data.groupby(['brand'])['rank'].mean()).round(2) 262 | cp_productcount = pd.DataFrame(df_cp_product.groupby(['brand'])['product_id'].count()) 263 | cp_max = pd.DataFrame(self.data.groupby(['brand'])['rank'].min()) 264 | cp_reviewcount = pd.DataFrame(df_cp_product.groupby(['brand'])['review_count'].sum()).fillna(0).astype(int) 265 | cp_reviewrating = pd.DataFrame(df_cp_product[df_cp_product['review_rating']!= 0].groupby(['brand'])['review_rating'].mean()).round(2) 266 | cp_price = pd.DataFrame(df_cp_product.groupby(['brand'])['lowest_price'].mean()).round(2) 267 | cp = pd.merge(cp_count.reset_index(),cp_mean.reset_index(),on = 'brand') 268 | cp = pd.merge(cp,cp_productcount.reset_index(),on = 'brand') 269 | cp = pd.merge(cp,cp_max.reset_index(),on = 'brand') 270 | cp = pd.merge(cp,cp_reviewcount.reset_index(),on = 'brand') 271 | cp = pd.merge(cp,cp_reviewrating.reset_index(),on = 'brand', how = 'outer') 272 | cp = pd.merge(cp,cp_price.reset_index(),on = 'brand') 273 | cp.columns = 
['brand', 'counts', 'avg_rank', 'product_count', 'max_rank', 'review_count', 'avg_rating', 'avg_price']
        # Sort by appearance count (descending), then by average rank (ascending)
        cp = cp.sort_values(['counts', 'avg_rank'], ascending=(False, True))
        # Copy the slice so the column assignment below does not write into a view of cp
        rank_form = cp[['brand', 'counts', 'product_count', 'avg_rank', 'max_rank', 'review_count', 'avg_rating', 'avg_price']].head(self.show_number).copy()
        rank_form['avg_price'] = rank_form['avg_price'].map(lambda x: float('%.2f' % x))
        # Display headers: brand, appearances, listed products, average rank, best rank, review count, average rating, average price ($)
        rank_form.columns = ['品牌', '登榜次数', '登榜商品数', '平均排名', '最高排名', '评论量', '平均星级', '平均价格($)']
        # Number the rows from 1; use the real row count in case fewer than show_number brands exist
        rank_form.index = range(1, len(rank_form) + 1)
        return rank_form


--------------------------------------------------------------------------------
/Example_3/main.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# -*- coding: utf-8 -*-\n",
    "import pandas as pd\n",
    "import logging, os, shutil\n",
    "\n",
    "\n",
    "import configs\n",
    "from models import Document\n",
    "from analysis import product, info, lists, bs, hnr, mns\n",
    "from tools import check_folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Check the folder structure and create any missing folders\n",
    "check_folder(r\"./reports/{}\".format(configs.report_name))\n",
    "for folder in ['data', 'image']:\n",
    "    check_folder(r\"./reports/{}/{}\".format(configs.report_name, folder))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "document = Document()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
"collapsed": true 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "document.author = configs.author\n", 54 | "document.title = '%s-%s%s品类报告'%(info.start_time, info.end_time,info.category_name)\n", 55 | "document.foreword = \"\"\"\n", 56 | "这里是前言\n", 57 | "\"\"\"\n", 58 | "document.chapters[0].title = '一、总体情况'\n", 59 | "document.chapters[0].foreword = \"\"\"\n", 60 | "{.start_time}至{.end_time}共有{.commodity_total_count}款\n", 61 | "商品登榜亚马逊{.category_name}品类的相关榜单,\n", 62 | "其中涉及品牌数{.brand_total_count:.0f}个。\n", 63 | "截至{.end_time},登榜商品总评论数达{.review_total_count:.0f}条,\n", 64 | "平均星级{.rating_mean:.2f}星。\\n\\n\n", 65 | "榜单的登榜商品及品牌数量情况如下表所示:\n", 66 | "\"\"\".format(info,info,info,info,info,info,info,info)\n", 67 | "document.chapters[0].table = pd.DataFrame(data = {'榜单':list(lists.keys()),\n", 68 | " '登榜商品数':[rank_list.commodity.count for rank_list in lists.values()],\n", 69 | " '登榜品牌数':[rank_list.brand.count for rank_list in lists.values()]\n", 70 | " })" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 5, 76 | "metadata": { 77 | "collapsed": true 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "for i in range(3):\n", 82 | " chinese_number = ['一', '二', '三', '四']\n", 83 | " rank_name = list(lists.keys())[i]\n", 84 | " rank_list = lists[rank_name]\n", 85 | " \n", 86 | " chapter = document.chapters[i+1]\n", 87 | " chapter.title = \"{}、{}榜单分析:\".format(chinese_number[i+1], rank_name)\n", 88 | " \n", 89 | " chapter.chapters[0].title = '1.价格分布'\n", 90 | " chapter.chapters[0].set_image(rank_list.price.image.fig)\n", 91 | " chapter.chapters[0].content = \"\"\"\n", 92 | " 上周 {} 品类 {} 榜单商品最高价${:.2f},\n", 93 | " 最低价${:.2f},平均价格${:.2f}。{}发现,在上榜的商品中,\n", 94 | " \"\"\".format(info.category_name, rank_name, rank_list.price.max, rank_list.price.min, \n", 95 | " rank_list.price.mean, configs.author)\n", 96 | " \n", 97 | " chapter.chapters[1].title = '2.评论量分布'\n", 98 | " chapter.chapters[1].set_image(rank_list.review.image.fig)\n", 99 | " 
chapter.chapters[1].content = \"\"\"\n", 100 | " 在上周登榜 {} 品类 {} 榜单的商品中,\n", 101 | " 单个商品拥有的最大评论数为{:.2f}条,最小评论数为{:.2f}条,平均评论数{:.2f}条,评论平均星级{:.2f}星。\n", 102 | " \"\"\".format(info.category_name, rank_name, rank_list.review.max, rank_list.review.min, \n", 103 | " rank_list.review.mean, rank_list.review.rating_mean)\n", 104 | " \n", 105 | " chapter.chapters[2].title = '3.商品排行'\n", 106 | " chapter.chapters[2].foreword = '''根据上周 {} 榜单上商品的登榜次数及平均排名,\n", 107 | " {}对登榜的商品进行了排序,其中排名靠前的五款商品如下所示:'''.format(rank_name, configs.author)\n", 108 | " chapter.chapters[2].table = rank_list.commodity.table\n", 109 | " chapter.chapters[2].set_image(rank_list.commodity.image.fig)\n", 110 | " chapter.chapters[2].content = \"\"\"\n", 111 | " 可以看到,在上周{}品类的{}榜单中,\n", 112 | " \"\"\".format(info.category_name, rank_name)\n", 113 | " \n", 114 | " chapter.chapters[3].title = '4.品牌排行'\n", 115 | " chapter.chapters[3].table = rank_list.brand.table\n", 116 | " chapter.chapters[3].set_image(rank_list.brand.image.fig)\n", 117 | " chapter.chapters[3].foreword = \"\"\"根据上周 {} 榜单上各个品牌下商品的登榜次数,{}对登榜的品牌进行了排序,\n", 118 | " 其中排名靠前的五个品牌如下所示:\"\"\".format(rank_name, configs.author)\n", 119 | " chapter.chapters[3].content = \"\"\"\n", 120 | " 可以看到,在上周{}品类的{}榜单中,\n", 121 | " \"\"\".format(info.category_name, rank_name)\n", 122 | " \n", 123 | "document.chapters[4].title = ' '\n", 124 | "document.chapters[4].content = \"\"\"\n", 125 | "这里是总结\n", 126 | "\"\"\"" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 6, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "document.save_html(\"./reports/{}/{}.html\".format(configs.report_name, configs.report_name))" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 7, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "name": "stderr", 145 | "output_type": "stream", 146 | "text": [ 147 | "WARNING:root:文件已添加:./reports/Report1/main.ipynb\n", 148 | "WARNING:root:文件已添加:./reports/Report1/configs.py\n" 149 | 
] 150 | } 151 | ], 152 | "source": [ 153 | "# 把main.ipynb和configs.py备份入报告文件夹\n", 154 | "for filename in ['main.ipynb', 'configs.py']: \n", 155 | " if os.path.exists(r\"./reports/{}/{}\".format(configs.report_name, filename)):\n", 156 | " logging.warning(\"文件已添加:./reports/{}/{}\".format(configs.report_name, filename))\n", 157 | " shutil.copy(filename, r\"./reports/{}/{}\".format(configs.report_name, filename))\n", 158 | " logging.info(\"已复制文件:{}\".format(filename))" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "outputs": [], 168 | "source": [] 169 | } 170 | ], 171 | "metadata": { 172 | "hide_input": false, 173 | "kernelspec": { 174 | "display_name": "Python 3", 175 | "language": "python", 176 | "name": "python3" 177 | }, 178 | "language_info": { 179 | "codemirror_mode": { 180 | "name": "ipython", 181 | "version": 3 182 | }, 183 | "file_extension": ".py", 184 | "mimetype": "text/x-python", 185 | "name": "python", 186 | "nbconvert_exporter": "python", 187 | "pygments_lexer": "ipython3", 188 | "version": "3.6.1" 189 | } 190 | }, 191 | "nbformat": 4, 192 | "nbformat_minor": 2 193 | } 194 | -------------------------------------------------------------------------------- /Example_3/models.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import logging 3 | import jinja2 4 | import configs 5 | 6 | class Document(object): 7 | def __init__(self): 8 | # 文章标题 9 | self.title = None 10 | # 子标题 11 | self.subtitle = None 12 | # 前言 13 | self.foreword = None 14 | # 章节 15 | self.chapters = [] 16 | for i in range(5): 17 | self.chapters.append(Chapter(subchapter_number=4, number="chapter{}".format(i))) 18 | 19 | def print_structure(self, deep): 20 | if self.chapters: 21 | for subchapter in self.chapters: 22 | print('--'*deep+subchapter.number) 23 | subchapter.print_structure(deep+1) 24 | else: 25 | return 26 | 27 | def 
save_html(self, html_name):
        # Read and write as UTF-8 explicitly, since the template and the report contain Chinese text
        with open('./template/template.html', encoding='utf-8') as f:
            templ = f.read()
        # The 'do' extension is required by the {% do %} statements in the template
        env = jinja2.Environment(extensions=['jinja2.ext.do'])
        html = env.from_string(templ).render(document=self)
        with open(html_name, 'w', encoding='utf-8') as f:
            f.write(html)

class Chapter(Document):
    def __init__(self, subchapter_number=0, number=''):
        """
        subchapter_number : number of subchapters
        number : chapter identifier, generated from the document structure; must be unique
        """
        # Chapter title
        self.title = None
        # Subtitle
        self.subtitle = None
        # Foreword
        self.foreword = None

        self.content = None
        self.table = None
        # Charts are rendered to image files first; this stores the image path
        self.image = None
        self.rank_list_change = None
        self.number = number
        self.chapters = [Chapter(subchapter_number=0, number="{}_subchapter{}".format(self.number, i))
                         for i in range(subchapter_number)]
        # Directory in which the images are saved
        self.image_path = './reports/{}/image/'.format(configs.report_name)

    def set_image(self, fig):
        image_filename = '{}{}.png'.format(self.image_path, self.number)
        fig.savefig(image_filename, dpi=160, bbox_inches='tight')
        # Path relative to the report HTML file
        self.image = './image/{}.png'.format(self.number)

    def __getattr__(self, name):
        # Called only when normal lookup fails: missing attributes read as None
        try:
            return self.__dict__[name]
        except KeyError:
            logging.info("Attribute does not exist: {}".format(name))
            return None

# A subchapter can reuse the Chapter definition directly through class inheritance,
# which also leaves room for adding custom subchapters later
class Subchapter(Chapter):
    pass
--------------------------------------------------------------------------------
/Example_3/reports/Report1/configs.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
report_name = 'Report1'
filepath = './'
date = u'20180528'
bd = u'PS'
author = u'作者君'


--------------------------------------------------------------------------------
/Example_3/reports/Report1/data/PS20180528_bestsellers.pkl:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/data/PS20180528_bestsellers.pkl -------------------------------------------------------------------------------- /Example_3/reports/Report1/data/PS20180528_hotnewreleases.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/data/PS20180528_hotnewreleases.pkl -------------------------------------------------------------------------------- /Example_3/reports/Report1/data/PS20180528_moversandshakers.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/data/PS20180528_moversandshakers.pkl -------------------------------------------------------------------------------- /Example_3/reports/Report1/data/PS20180528_product.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/data/PS20180528_product.pkl -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter1_subchapter0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter1_subchapter0.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter1_subchapter1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter1_subchapter1.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter1_subchapter2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter1_subchapter2.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter1_subchapter3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter1_subchapter3.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter2_subchapter0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter2_subchapter0.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter2_subchapter1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter2_subchapter1.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter2_subchapter2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter2_subchapter2.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter2_subchapter3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter2_subchapter3.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter3_subchapter0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter3_subchapter0.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter3_subchapter1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter3_subchapter1.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter3_subchapter2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter3_subchapter2.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/image/chapter3_subchapter3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/Example_3/reports/Report1/image/chapter3_subchapter3.png -------------------------------------------------------------------------------- /Example_3/reports/Report1/main.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "# -*- coding: utf-8 -*-\n", 12 | "import pandas as pd\n", 13 | "import logging, os, shutil\n", 14 | "\n", 15 | "\n", 16 | "import configs\n", 17 | "from models import Document\n", 18 | "from analysis import product, info, lists, bs, hnr, mns\n", 19 | "from tools import check_folder" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": {}, 26 | "outputs": [ 27 | { 28 | "name": "stderr", 29 | "output_type": "stream", 30 | "text": [ 31 | "WARNING:root:文件夹未创立:Report1\n" 32 | ] 33 | } 34 | ], 35 | "source": [ 36 | "## 检查文件结构,并创建缺少的文件夹\n", 37 | "check_folder(r\"./reports/{}\".format(configs.report_name))\n", 38 | "for folder in ['data', 'image']:\n", 39 | " check_folder(r\"./reports/{}/{}\".format(configs.report_name, folder))" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "document = Document()" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 4, 56 | "metadata": { 57 | "collapsed": true 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "document.author = configs.author\n", 62 | "document.title = '%s-%s%s品类报告'%(info.start_time, info.end_time,info.category_name)\n", 63 | "document.foreword = \"\"\"\n", 64 | "这里是前言\n", 65 | "\"\"\"\n", 66 | "document.chapters[0].title = '一、总体情况'\n", 67 | "document.chapters[0].foreword = \"\"\"\n", 68 | 
"{.start_time}至{.end_time}共有{.commodity_total_count}款\n", 69 | "商品登榜亚马逊{.category_name}品类的相关榜单,\n", 70 | "其中涉及品牌数{.brand_total_count:.0f}个。\n", 71 | "截至{.end_time},登榜商品总评论数达{.review_total_count:.0f}条,\n", 72 | "平均星级{.rating_mean:.2f}星。\\n\\n\n", 73 | "榜单的登榜商品及品牌数量情况如下表所示:\n", 74 | "\"\"\".format(info,info,info,info,info,info,info,info)\n", 75 | "document.chapters[0].table = pd.DataFrame(data = {'榜单':list(lists.keys()),\n", 76 | " '登榜商品数':[rank_list.commodity.count for rank_list in lists.values()],\n", 77 | " '登榜品牌数':[rank_list.brand.count for rank_list in lists.values()]\n", 78 | " })" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 5, 84 | "metadata": { 85 | "collapsed": true 86 | }, 87 | "outputs": [], 88 | "source": [ 89 | "for i in range(3):\n", 90 | " chinese_number = ['一', '二', '三', '四']\n", 91 | " rank_name = list(lists.keys())[i]\n", 92 | " rank_list = lists[rank_name]\n", 93 | " \n", 94 | " chapter = document.chapters[i+1]\n", 95 | " chapter.title = \"{}、{}榜单分析:\".format(chinese_number[i+1], rank_name)\n", 96 | " \n", 97 | " chapter.chapters[0].title = '1.价格分布'\n", 98 | " chapter.chapters[0].set_image(rank_list.price.image.fig)\n", 99 | " chapter.chapters[0].content = \"\"\"\n", 100 | " 上周 {} 品类 {} 榜单商品最高价${:.2f},\n", 101 | " 最低价${:.2f},平均价格${:.2f}。{}发现,在上榜的商品中,\n", 102 | " \"\"\".format(info.category_name, rank_name, rank_list.price.max, rank_list.price.min, \n", 103 | " rank_list.price.mean, configs.author)\n", 104 | " \n", 105 | " chapter.chapters[1].title = '2.评论量分布'\n", 106 | " chapter.chapters[1].set_image(rank_list.review.image.fig)\n", 107 | " chapter.chapters[1].content = \"\"\"\n", 108 | " 在上周登榜 {} 品类 {} 榜单的商品中,\n", 109 | " 单个商品拥有的最大评论数为{:.2f}条,最小评论数为{:.2f}条,平均评论数{:.2f}条,评论平均星级{:.2f}星。\n", 110 | " \"\"\".format(info.category_name, rank_name, rank_list.review.max, rank_list.review.min, \n", 111 | " rank_list.review.mean, rank_list.review.rating_mean)\n", 112 | " \n", 113 | " chapter.chapters[2].title = '3.商品排行'\n", 114 | " 
chapter.chapters[2].foreword = '''根据上周 {} 榜单上商品的登榜次数及平均排名,\n", 115 | " {}对登榜的商品进行了排序,其中排名靠前的五款商品如下所示:'''.format(rank_name, configs.author)\n", 116 | " chapter.chapters[2].table = rank_list.commodity.table\n", 117 | " chapter.chapters[2].set_image(rank_list.commodity.image.fig)\n", 118 | " chapter.chapters[2].content = \"\"\"\n", 119 | " 可以看到,在上周{}品类的{}榜单中,\n", 120 | " \"\"\".format(info.category_name, rank_name)\n", 121 | " \n", 122 | " chapter.chapters[3].title = '4.品牌排行'\n", 123 | " chapter.chapters[3].table = rank_list.brand.table\n", 124 | " chapter.chapters[3].set_image(rank_list.brand.image.fig)\n", 125 | " chapter.chapters[3].foreword = \"\"\"根据上周 {} 榜单上各个品牌下商品的登榜次数,{}对登榜的品牌进行了排序,\n", 126 | " 其中排名靠前的五个品牌如下所示:\"\"\".format(rank_name, configs.author)\n", 127 | " chapter.chapters[3].content = \"\"\"\n", 128 | " 可以看到,在上周{}品类的{}榜单中,\n", 129 | " \"\"\".format(info.category_name, rank_name)\n", 130 | " \n", 131 | "document.chapters[4].title = ' '\n", 132 | "document.chapters[4].content = \"\"\"\n", 133 | "这里是总结\n", 134 | "\"\"\"" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 6, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "ename": "NameError", 144 | "evalue": "name 'Template' is not defined", 145 | "output_type": "error", 146 | "traceback": [ 147 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 148 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 149 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdocument\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msave_html\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"./reports/{}/{}.html\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconfigs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreport_name\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mconfigs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreport_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 150 | "\u001b[0;32m/Users/zhanglinshu/Documents/automate report/Example_3/models.py\u001b[0m in \u001b[0;36msave_html\u001b[0;34m(self, html_name)\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'./template/template.html'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0mtempl\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 30\u001b[0;31m \u001b[0mt\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTemplate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtempl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 31\u001b[0m \u001b[0menv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjinja2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mEnvironment\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mextensions\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'jinja2.ext.do'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 32\u001b[0m \u001b[0mhtml\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0menv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_string\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtempl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrender\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdocument\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 151 | "\u001b[0;31mNameError\u001b[0m: name 'Template' is not defined" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "document.save_html(\"./reports/{}/{}.html\".format(configs.report_name, configs.report_name))" 157 | ] 158 | }, 159 | { 
160 | "cell_type": "code", 161 | "execution_count": null, 162 | "metadata": {}, 163 | "outputs": [], 164 | "source": [ 165 | "# 把main.ipynb和configs.py备份入报告文件夹\n", 166 | "for filename in ['main.ipynb', 'configs.py']: \n", 167 | " if os.path.exists(r\"./reports/{}/{}\".format(configs.report_name, filename)):\n", 168 | " logging.warning(\"文件已添加:./reports/{}/{}\".format(configs.report_name, filename))\n", 169 | " shutil.copy(filename, r\"./reports/{}/{}\".format(configs.report_name, filename))\n", 170 | " logging.info(\"已复制文件:{}\".format(filename))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": { 177 | "collapsed": true 178 | }, 179 | "outputs": [], 180 | "source": [] 181 | } 182 | ], 183 | "metadata": { 184 | "hide_input": false, 185 | "kernelspec": { 186 | "display_name": "Python 3", 187 | "language": "python", 188 | "name": "python3" 189 | }, 190 | "language_info": { 191 | "codemirror_mode": { 192 | "name": "ipython", 193 | "version": 3 194 | }, 195 | "file_extension": ".py", 196 | "mimetype": "text/x-python", 197 | "name": "python", 198 | "nbconvert_exporter": "python", 199 | "pygments_lexer": "ipython3", 200 | "version": "3.6.1" 201 | } 202 | }, 203 | "nbformat": 4, 204 | "nbformat_minor": 2 205 | } 206 | -------------------------------------------------------------------------------- /Example_3/template/template.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 22 | 23 |
24 |

{{ document.title }}

25 |

{{ document.foreword }}

26 | 27 | {%- set idxs = [0] -%} 28 | {% for chapter in document.chapters recursive %} 29 | {%- set depth=idxs|length -%} 30 | {%- if not chapter.title is none %} 31 | {{ chapter.title }} 32 | {%- endif -%} 33 | {%- if not chapter.foreword is none %} 34 |

{{ chapter.foreword }}

35 | {%- endif -%} 36 | {% if not chapter.table is none %} 37 | {% set table = chapter.table %} 38 | 39 | 40 | {% for column in table.columns %} 41 | 42 | {% endfor %} 43 | 44 | {% for index in table.index %} 45 | 46 | {% for column in table.columns %} 47 | 48 | {% endfor %} 49 | 50 | {% endfor %} 51 |
{{column}}
{{ table[column][index] }}
52 | {% endif %} 53 | {% if not chapter.image is none %} 54 | {% set image = chapter.image %} 55 |
56 | {% endif %} 57 | {% if not chapter.content is none %} 58 |

{{ chapter.content }}

59 | {% endif %} 60 | {% if chapter.chapters.__len__() > 0 %} 61 | {%- do idxs.append(loop.index) -%} 62 | {{ loop(chapter.chapters) }} 63 | {%- do idxs.pop() -%} 64 | {% endif %} 65 | 66 | {% endfor %} 67 |
68 | 69 | -------------------------------------------------------------------------------- /Example_3/template/template_recursive.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 22 | 23 |
24 |

{{ document.title }}

25 |

{{ document.foreword }}

26 | 27 | {%- set idxs = [0] -%} 28 | {% for chapter in document.chapters recursive %} 29 | {%- set depth=idxs|length -%} 30 | {%- if not chapter.title is none %} 31 | {{ chapter.title }} 32 | {%- endif -%} 33 | {%- if not chapter.foreword is none %} 34 |

{{ chapter.foreword }}

35 | {%- endif -%} 36 | {% if not chapter.table is none %} 37 | {% set table = chapter.table %} 38 | 39 | 40 | {% for column in table.columns %} 41 | 42 | {% endfor %} 43 | 44 | {% for index in table.index %} 45 | 46 | {% for column in table.columns %} 47 | 48 | {% endfor %} 49 | 50 | {% endfor %} 51 |
{{column}}
{{ table[column][index] }}
52 | {% endif %} 53 | {% if not chapter.image is none %} 54 | {% set image = chapter.image %} 55 |
56 | {% endif %} 57 | {% if not chapter.content is none %} 58 |

{{ chapter.content }}

59 | {% endif %} 60 | {% if chapter.chapters.__len__() > 0 %} 61 | {%- do idxs.append(loop.index) -%} 62 | {{ loop(chapter.chapters) }} 63 | {%- do idxs.pop() -%} 64 | {% endif %} 65 | 66 | {% endfor %} 67 |


--------------------------------------------------------------------------------
/Example_3/tools.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
import re
import logging, os

def str2num(string):
    """
    str2num(string)

    Extract the first number found in a string.

    Parameters
    ----------
    string : a string with a format like '$2.1', '$1,333' or '&4,3'
        (commas are removed before matching)

    Returns
    -------
    out : float
        NaN if the string contains no digits.

    Examples
    --------
    >>> str2num('$2.3')
    2.3
    """
    if not isinstance(string, str):
        string = str(string)
    string = string.replace(',', '')
    # Raw string, so the backslashes reach the regex engine unchanged
    regular_expression = r'\d+\.?\d*'
    pattern = re.compile(regular_expression)
    match = pattern.search(string)
    if match:
        return float(match.group())
    else:
        return float('nan')

def check_folder(folder_name):
    # Create the folder if it does not exist yet, logging what was done
    if os.path.exists(folder_name):
        logging.info("文件夹已创立:{}".format(folder_name))
    else:
        logging.warning("文件夹未创立:{}".format(folder_name))
        os.makedirs(folder_name)
        logging.info("文件夹创立完毕:{}".format(folder_name))
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/__init__.py
--------------------------------------------------------------------------------
/font/Calibri.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/font/Calibri.ttf
--------------------------------------------------------------------------------
/font/MSYHMONO.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/font/MSYHMONO.ttf
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
## Overview
This report has been uploaded to [github](https://github.com/LinshuZhang/automate-report); it is easier to read after cloning it locally.


## Environment setup (the commands can be pasted directly into a terminal)
### 1. Create a dedicated environment for report building (replace env_name with a name of your choice)
conda create -n env_name python=3.6
### 2. Activate the environment
source activate env_name
### 3. Install the dependencies from requirements.txt
pip install -r requirements.txt

## Usage
### 1. Enter the working directory
In a terminal, use the cd command to move to the directory where this report-automation tutorial should be saved.
For example, to keep it in the Documents folder (macOS), run the following in order:
```
cd ~
cd Documents
```
You are now in the working directory.
### 2. Download the tutorial
From the working directory, clone the tutorial from github so that it can be run and inspected directly.
On macOS, run the following in the terminal:
```
git clone https://github.com/LinshuZhang/automate-report.git
```
The tutorial folder automate-report has now been downloaded into Documents.

### 3. Activate the environment
Run the following to activate the dedicated python3 environment used for building reports:
```
source activate env_name
```

### 4. Start the ipython notebook
```
jupyter notebook
```
### 5. In the notebook interface that opens in the browser, open the ipynb files to read and run this tutorial


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
ipython==5.3.0
matplotlib==2.0.0
jinja2==2.9.6
Jupyter
requests
numpy
pandas
types


--------------------------------------------------------------------------------
/报告自动化的思路分享.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/报告自动化的思路分享.zip
--------------------------------------------------------------------------------
/相关知识积累书目/Mastering Python Design Patterns.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/相关知识积累书目/Mastering Python Design Patterns.pdf
--------------------------------------------------------------------------------
/相关知识积累书目/Pro Python.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/相关知识积累书目/Pro Python.pdf
--------------------------------------------------------------------------------
/相关知识积累书目/sicp.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/相关知识积累书目/sicp.pdf
--------------------------------------------------------------------------------
/相关知识积累书目/计算机程序的构造和解释中文版.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ssty247/automate-report/27b3fecf5bdd5dd3ae7a8362443d835c4825b1a6/相关知识积累书目/计算机程序的构造和解释中文版.pdf
--------------------------------------------------------------------------------
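As a quick check of the `str2num` helper from `/Example_3/tools.py`, the sketch below restates the function so that it runs on its own and tries it on a few inputs. The raw-string pattern, the module-level `_NUMBER` name, and the sample inputs `'$1,333'` and `'N/A'` are assumptions added for illustration; they are not taken from the repository.

```python
import math
import re

# Hypothetical restatement of str2num from Example_3/tools.py.
# Assumption: the pattern is written as a raw string and compiled once.
_NUMBER = re.compile(r'\d+\.?\d*')


def str2num(string):
    """Return the first number found in `string` as a float, or NaN if none."""
    text = str(string).replace(',', '')  # drop thousands separators
    match = _NUMBER.search(text)
    return float(match.group()) if match else float('nan')


print(str2num('$2.3'))             # 2.3
print(str2num('$1,333'))           # 1333.0
print(math.isnan(str2num('N/A')))  # True
```

Only the first run of digits is used, so a value written with a space after the comma, such as `'$1, 333'`, would be read as `1.0` rather than `1333.0`.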