├── ML_roadmap.png
├── FakeDataGenerator.py
├── README.md
├── 第3期-学习方法及资料分享.ipynb
├── .gitignore
├── 学习资料推荐.md
├── 第0期-如何用机器学习预测职业生涯.ipynb
├── 第5期-OOP_keras.ipynb
├── 第2期-kaggle新冠预测竞赛高赞代码解读.ipynb
└── 第1期-了解这3组概念,就可以开始机器学习实战了.ipynb
/ML_roadmap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LmYjQ/AI_for_everyone/HEAD/ML_roadmap.png
--------------------------------------------------------------------------------
/FakeDataGenerator.py:
--------------------------------------------------------------------------------
1 | import yaml
2 | import random
3 | import pandas as pd
4 |
5 |
6 | class FakeDataGenerator(object):
7 |     def __init__(self, conf_file):
8 |         with open(conf_file, 'r') as f:
9 |             # safe_load parses plain data only; yaml.load(f) without a Loader fails on PyYAML >= 6
10 |             self.conf = yaml.safe_load(f)
11 |         self.data = self._make_fake_data()
12 |
13 |     def _make_fake_data(self):
14 |         data = {}
15 |         self.num = int(self.conf['num_samples'])
16 |         data['label'] = self._random_col(self.conf['label'])
17 |         for field, value_str in self.conf['fields'].items():
18 |             data[field] = self._random_col(value_str)
19 |         print(data)
20 |         return data
21 |
22 |     # Randomly generate one column of data, discrete or continuous
23 |     def _random_col(self, value_str):
24 |         if ',' in value_str:  # discrete
25 |             value_set = value_str.split(',')
26 |             return [int(random.choice(value_set)) for i in range(self.num)]
27 |         else:  # continuous
28 |             value_set = value_str.split('-')
29 |             return [random.uniform(int(value_set[0]), int(value_set[1])) for i in range(self.num)]
30 |
31 |     def get_dataframe(self):
32 |         # Return the generated columns as a pandas DataFrame
33 |         return pd.DataFrame(self.data)
34 |
35 |
36 | if __name__ == '__main__':
37 |     fdg = FakeDataGenerator('./conf/classification.yaml')
38 |     fdg._make_fake_data()
39 |
--------------------------------------------------------------------------------
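The value-string convention parsed by `_random_col` in FakeDataGenerator.py can be tried without a config file. The following is a minimal standalone sketch, not part of the repository: the function name `random_col`, the `rng` parameter, and the sample strings `'0,1'` and `'18-60'` are assumptions made for illustration. A comma-separated string is treated as a discrete set of integers, and an `a-b` string as a continuous range.

```python
import random


def random_col(value_str, num, rng=random):
    """Generate `num` random values from a value string.

    'a,b,c' -> discrete: pick integers from the listed choices.
    'lo-hi' -> continuous: draw floats uniformly between lo and hi.
    """
    if ',' in value_str:  # discrete
        choices = [int(v) for v in value_str.split(',')]
        return [rng.choice(choices) for _ in range(num)]
    lo, hi = (int(v) for v in value_str.split('-'))  # continuous
    return [rng.uniform(lo, hi) for _ in range(num)]


rng = random.Random(0)  # seeded so repeated runs give the same data
labels = random_col('0,1', 5, rng)
ages = random_col('18-60', 5, rng)
print(labels)
print(ages)
```

Because `-` is the separator, this format cannot express negative bounds; a config that needs them would require a different delimiter.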
/README.md:
--------------------------------------------------------------------------------
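1 | ## Overview
2 | An attempt to explain machine learning algorithms and artificial intelligence in the simplest, most accessible way possible.
3 |
4 | Planned episodes:
5 |
6 | - Episode 0: What kinds of problems can machine learning algorithms solve? (published)
7 | - Episode 1: The background you need before hands-on machine learning, in 15 minutes
8 | - Episode 2: A first taste of a real Kaggle competition
9 | - Episode 3: Recommended learning resources
10 | - Not decided yet; suggestions are welcome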
11 |
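12 | ## What is wrong with existing material
13 | - Theory-heavy material is overly rigorous, so it is hard going and takes a long time to work through
14 | - Quick-start material is too formulaic, so after reading it you still don't know how to apply it to your own problem
15 | - I want to draw on the pitfalls I have hit myself and explain things so that you can start using them right away, then fill in the theory gradually once you are up and running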
16 |
17 | ## Who this is for
18 | Anyone interested in machine learning algorithms or artificial intelligence, whether or not it is their field.
19 |
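20 | ## FAQ
21 | - 1. I come from an xxx background. Can I do machine learning?
22 |
23 | A: Anyone can learn and use machine learning. The existing libraries are mature, and once you have a basic grasp of the principles, calling them is easy. In many scenarios, simply calling off-the-shelf libraries already delivers a lot of value; not everyone needs to master the theory and invent new algorithms.
24 |
25 | - 2. I want to learn machine learning. How should I start?
26 |
27 | A: (1) If you have plenty of time and are confident in your math, you can study systematically. Start with the courses by Andrew Ng, Hung-yi Lee (李宏毅), and Hsuan-Tien Lin (林轩田). For books, see Li Hang's "Statistical Learning Methods" (《统计学习方法》) and the "watermelon book" (Zhou Zhihua's 《机器学习》), plus English textbooks such as PRML. A search on Zhihu turns up plenty of these resources. (2) If you are short on time or dislike deriving formulas, go straight to hands-on practice. Learn some basic Python syntax, then head to Kaggle, pick a competition, and read the most-upvoted kernels. A kernel is code someone else wrote; it covers reading the data, preprocessing, model training, and prediction, so you can quickly experience the whole workflow without understanding the model in depth. Get the full workflow working first, enjoy the positive feedback, and then fill in the theory bit by bit. (More resources will be collected later.)
28 |
29 | Personally, I recommend option 2 for most people. Unless you are pursuing an academic career, if you only want to solve practical problems or find a job, it is enough to understand the general idea behind each model and the effect of each parameter; models written by others will already give you decent results. Option 1, pushing past the SOTA with something new, is best left to the few top experts.
30 |
31 | - 3. Can people in a job as "prestigious" as algorithm engineering really get laid off?
32 |
33 | A: Every job is grunt work. When your boss decides you cannot deliver enough output (or that the probability of you delivering enough is too low), being laid off is only to be expected. The fact is that algorithm work pays off only under many preconditions: the business must be large enough, it must produce high-quality data, and the initial exploration phase loses money for a while... Development is different: whatever the boss asks for (as long as it is a reasonable request), you just build it, so (at least in the boss's eyes) the output is clear-cut.
34 |
35 | - 4. How hard is it to switch into algorithm work from a non-CS background?
36 |
37 | A: Algorithm work is indeed hard, but no line of work is easy; it comes down to which kind of hardship you are willing to take on. If you are serious about it, try option 2 from Q2 yourself. Online resources are so plentiful now that you can do this in your spare time, without quitting your job, spending money, or retaking the college entrance exam. More beginner-friendly tutorials will keep being added here.
38 |
39 | - 5. On finding an algorithm job or internship
40 |
41 | A: Practical experience matters far more than academic credentials. Many people feel that credentials matter a lot because, when you have no practical experience, credentials are all an employer can look at. As for how to get practical experience: Kaggle and many similar platforms host plenty of competitions to work through (a list of competition platforms will follow).
42 |
43 | - Why doesn't the uploader's hairline look like a programmer's?
44 |
45 | A: Probably because I'm not good enough yet... working on it.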
46 |
--------------------------------------------------------------------------------
/第3期-学习方法及资料分享.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Contents\n",
8 | "- Misconceptions\n",
9 | "- Learning goals\n",
10 | "- Learning resources"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
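17 | "## 0. Misconceptions\n",
18 | "### Q0: Do I need to read a lot of material before trying machine learning?\n",
19 | "A0: Not necessarily. A little background knowledge is enough to start experimenting hands-on.\n",
20 | "\n",
21 | "### Q1: Do I need a few published papers to land an algorithm job?\n",
22 | "A1: There are different kinds of algorithm jobs. Research-oriented roles do require degrees and papers, but applied roles place more weight on practical experience.\n",
23 | "\n",
24 | "### Q2: Do algorithm roles care a lot about school / degree / competition awards / ...?\n",
25 | "A2: Two principles. 1. Having it beats not having it: don't ask whether xx is useful; knowing it is better than not knowing it. 2. The matching principle: when comparing A and B, what matters is fit with the role. For example, research roles value theory more, while business roles value application more."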
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## 1. Learning goals"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": [
39 | "\n"
40 | ]
41 | },
42 | {
43 | "cell_type": "markdown",
44 | "metadata": {},
45 | "source": [
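46 | " | Research | Applied\n",
47 | "-|-|-\n",
48 | "Linux (using servers / installing packages) | 🌟🌟🌟 | 🌟🌟🌟🌟\n",
49 | "Data processing / SQL | 🌟🌟 | 🌟🌟🌟🌟\n",
50 | "Algorithm theory | 🌟🌟🌟🌟🌟 | 🌟🌟🌟\n",
51 | "Coding skill (LeetCode) | 🌟🌟🌟 | 🌟🌟🌟\n",
52 | "Coding skill (design patterns) | 🌟 | 🌟🌟🌟\n",
53 | "Business understanding | 🌟🌟 | 🌟🌟🌟🌟🌟\n",
54 | "Computer fundamentals (operating systems / networking) | 🌟🌟 | 🌟🌟🌟"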
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "## 2. Resources\n",
62 | "See the Markdown document (学习资料推荐.md)"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": []
71 | }
72 | ],
73 | "metadata": {
74 | "kernelspec": {
75 | "display_name": "Python 3",
76 | "language": "python",
77 | "name": "python3"
78 | },
79 | "language_info": {
80 | "codemirror_mode": {
81 | "name": "ipython",
82 | "version": 3
83 | },
84 | "file_extension": ".py",
85 | "mimetype": "text/x-python",
86 | "name": "python",
87 | "nbconvert_exporter": "python",
88 | "pygments_lexer": "ipython3",
89 | "version": "3.6.4"
90 | }
91 | },
92 | "nbformat": 4,
93 | "nbformat_minor": 2
94 | }
95 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
--------------------------------------------------------------------------------
/学习资料推荐.md:
--------------------------------------------------------------------------------
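1 | ## 1. Algorithm theory
2 | - Video courses
3 |
4 | Hsuan-Tien Lin (林轩田), "Machine Learning Foundations" (《机器学习基石》) and "Machine Learning Techniques" (《机器学习技法》) https://www.bilibili.com/video/BV1Cx411i7op?from=search&seid=7223292421632721688 https://www.bilibili.com/video/BV1ix411i7yp?from=search&seid=7223292421632721688
5 |
6 | Hung-yi Lee (李宏毅), deep learning https://www.bilibili.com/video/BV1JE411g7XF
7 |
8 | Whiteboard derivations of machine learning (白板推导机器学习) https://www.bilibili.com/video/BV1aE411o7qd
9 |
10 | - Books
11 | Li Hang (李航), "Statistical Learning Methods" (《统计学习方法》)
12 |
13 | Zhou Zhihua (周志华), "Machine Learning" (《机器学习》, the "watermelon book")
14 |
15 | Deep Learning (the "flower book")
16 |
17 | Pattern Recognition and Machine Learning
18 |
19 | The Elements of Statistical Learning
20 |
21 | - Blogs
22 |
23 | Su Jianlin (苏剑林): outstanding in both theory and practice, and often offers original insights https://spaces.ac.cn/
24 |
25 | 52nlp: very comprehensive NLP content http://www.52nlp.cn/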
26 |
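27 | ## 2. Computer science fundamentals
28 | Taking an algorithm from theory to real development and deployment depends on computer science fundamentals. This section is mainly for readers from non-CS backgrounds.
29 |
30 | - Python programming
31 | Liao Xuefeng's (廖雪峰) tutorial. The site also has SQL and Git tutorials, worth a look if you are unfamiliar with them https://www.liaoxuefeng.com/wiki/1016959663602400
32 |
33 | - Computer networking
34 | By Liu Chao (刘超), chief architect at NetEase. The course explains dry, fiddly material through stories and concrete cases, and finally ties the whole set of network protocols together with a Double 11 (Singles' Day) story; you come away feeling that it all clicks https://time.geekbang.org/column/intro/85
35 |
36 | - Linux operating system
37 | Liu Chao of NetEase again. To be honest, I bought this course on sale and haven't finished it, but on the strength of his networking course I trust him https://time.geekbang.org/column/intro/164
38 |
39 | - Design patterns
40 | Compared with the fundamentals above, design patterns matter less (especially for beginners): the problems can still be solved without them, just in a more cumbersome, less elegant, more error-prone way. I suggest getting some real-world scars first and then taking this course, though skimming it once to get a rough impression does no harm either. I also bought this one on sale and haven't finished it; I trust the author on the strength of Wang Zheng's (王争) data structures and algorithms course https://time.geekbang.org/column/intro/250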
41 |
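42 | ## 3. Industry experience & surveys
43 | The industry content here focuses on recommender systems, the uploader's former main line of work, with some coverage of other areas.
44 |
45 | - Zhang Junlin (张俊林), a recommendation expert at Sina Weibo. His survey articles are excellent and every one deserves a close read. They combine theory and practice, so you can quickly grasp the history and current state of a family of techniques without getting lost in detail https://www.zhihu.com/people/zhang-jun-lin-76/posts
46 |
47 | - Wang Zhe (王喆), also an expert in recommendation/advertising. His articles are very close to real industrial practice and rich in detail, so they are less suited to newcomers who haven't hit the pitfalls yet https://www.zhihu.com/people/wang-zhe-58
48 |
49 | - WeChat official account datafuntalk: a huge number of industrial-practice talks covering every aspect of AI / machine learning / big data
50 |
51 | - WeChat official account 小小挖掘机: the "Recommender Systems Meet Deep Learning" (《推荐系统遇上深度学习》) series, theory explained alongside code
52 |
53 | - 浅梦, author of the open-source project DeepCTR. The graph embedding series is very well written https://zhuanlan.zhihu.com/weichennote
54 |
55 | - 夕小瑶, a star algorithm engineer working mainly on NLP. Her theory content is very strong, but two articles in particular are recommended here for a picture of what algorithm work is really like. 1. How does algorithm work differ between big and small companies? https://mp.weixin.qq.com/s/T9peHEuauLVxsRMemR5AcQ 2. Don't follow the crowd: the differences between several kinds of algorithm roles, and what they are like https://mp.weixin.qq.com/s/Y0NkZFxVued3L1izu-6bCA
56 |
57 | - WeChat official account 有三AI: an expert working mainly on CV, with both free and paid content. I have only used the free content, and its quality is high. I expect the paid content is good too; look into it yourself if you need it.
58 |
59 | - WeChat official account 王的机器: an expert working mainly on financial engineering, with plenty of content on machine learning and Python basics as well.
60 |
61 | - pelhans blog: specializes in kaldi/ASR, with machine learning / deep learning basics too http://pelhans.com/
62 |
63 | - WeChat official account intro2musictech, a painless introduction to music technology by 贝茨, a PhD working on music AI https://mp.weixin.qq.com/s/GNIyDXrYV9zgxJ9Ii3IRhA
64 |
65 | - 软绵绵的小熊猫: interview experience at 10+ foreign companies including Google/Facebook/Amazon https://b23.tv/BV177411G7Rj
66 |
67 | - Weibo @爱可可-爱生活, Bilibili @fly51fly: the latest and trendiest content from the field, much of it in English. https://space.bilibili.com/23852932/?share_source=copy_link&share_medium=iphone&bbid=Z64D7961336870564B5EA852ABDECE972ADC&ts=1585262999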
68 |
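69 | ## 4. Data structures and algorithms
70 | By 冬瓜哥, 13.6k stars. More than problem walkthroughs: it also explains approaches category by category https://github.com/labuladong/fucking-algorith
71 |
72 | Wang Zheng (王争), "The Beauty of Data Structures and Algorithms" (《数据结构与算法之美》). Its biggest strength is tying things to industrial applications, telling you why a given data structure is the right one for a given scenario https://time.geekbang.org/column/intro/126
73 |
74 | liuyubobobo's algorithms and data structures course, implemented in C++. It walks you through the code step by step, in more detail than Wang Zheng's course, and focuses on performance analysis rather than industrial use cases https://coding.imooc.com/learn/list/71.html
75 |
76 | liuyubobobo's LeetCode course organized by category, implemented in C++. The category-by-category approach is similar to 冬瓜哥's; its strength is that each part goes from simple to complex, so you can feel the step-by-step progression https://coding.imooc.com/learn/list/82.htm
77 |
78 | Solution templates for various problem types https://blog.csdn.net/fuxuemingzhu/article/details/101900729l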
79 |
--------------------------------------------------------------------------------
/第0期-如何用机器学习预测职业生涯.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# [AI in Plain Language] Episode 0 - How to use machine learning to predict your own career"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
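14 | "## 1. Why watch these videos\n",
15 | "### 1. Can I do machine learning?\n",
16 | "\n",
17 | "Anyone can; different people take different routes\n",
18 | "\n",
19 | "\n",
20 | "\n",
21 | "\n",
22 | "### 2. I want to do machine learning. How do I get started?\n",
23 | "\n",
24 | "Option 1: study systematically, the regular-army way\n",
25 | "\n",
26 | "🌟Option 2: take the scrappy route and learn by doing\n",
27 | "\n",
28 | "\n",
29 | "\n",
30 | "### 3. Clearing up common misconceptions\n",
31 | "\n",
32 | "Can fancy machine learning solve any problem, no matter how hard?\n",
33 | "\n",
34 | "Is a more complex model always better?\n",
35 | "\n",
36 | "\n",
37 | "\n",
38 | "## 2. Why this is achievable\n",
39 | "1. More ideas, fewer formula derivations\n",
40 | "\n",
41 | "2. Concrete examples wherever possible\n",
42 | "\n",
43 | "\n",
44 | "## 3. Problem types and their algorithms (or: algorithm types and the problems they solve?)\n",
45 | "### Which comes first, the problem or the algorithm?\n",
46 | "\n",
47 | "1. Regression (supervised learning)\n",
48 | "\n",
49 | "- Given a person's attributes (age / gender / university / skill level...), predict their salary\n",
50 | "\n",
51 | "2. Classification (supervised learning)\n",
52 | "\n",
53 | "- Given a person's attributes (age / gender / company / years of experience...), predict whether they will be promoted\n",
54 | "\n",
55 | "3. Clustering (unsupervised learning)\n",
56 | "\n",
57 | "- A team leader wants to divide their reports into a few groups by their traits (technical skill / communication skill / personality / family background...) and develop each group differently\n",
58 | "\n",
59 | "4. Agent & environment interaction (reinforcement learning)\n",
60 | "\n",
61 | "- AlphaGo playing Go, StarCraft, Flappy Bird\n",
62 | "- A fresh graduate (agent) joins a company (environment), levels up through work and dealings with colleagues (interaction), and improves (the learned policy) based on the experience gained (positive/negative rewards)\n",
63 | "\n",
64 | "\n",
65 | "## 4. How these kinds of algorithms differ\n",
66 | "1. Supervised learning\n",
67 | "\n",
68 | "Learns (trains) by looking at reference answers (samples). Given y (label) and X (features), find an f (model) such that y = f(X).\n",
69 | "Once you have f, you can compute y for any new x\n",
70 | "\n",
71 | "2. Unsupervised learning\n",
72 | "\n",
73 | "Needs no reference answers; it splits the samples into groups based only on the distribution of X (unlike classification, you don't know in advance how many groups there will be)\n",
74 | "\n",
75 | "- Strong technical skill & weaker communication: assign them the hardest tasks\n",
76 | "- Average technical skill & strong communication: have them coordinate with other departments and discuss requirements\n",
77 | "- Poor technical skill & powerful family connections: treat them like royalty\n",
78 | "- ...\n",
79 | "\n",
80 | "3. Reinforcement learning\n",
81 | "\n",
82 | "Define the rules of agent/environment interaction and the rewards and penalties, then let the agent find a good policy through brute-force or clever search\n",
83 | "\n",
84 | "## 5. Classic models for each kind of algorithm\n",
85 | "\n",
86 | "### sklearn (a must-learn for Python) https://scikit-learn.org/\n",
87 | "\n",
88 | "regression\n",
89 | "\n",
90 | "classification\n",
91 | "\n",
92 | "clustering\n",
93 | "\n",
94 | "dimensionality reduction"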
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "## Exercise\n",
102 | "Can you come up with some examples of a problem and its matching algorithm?"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": null,
108 | "metadata": {},
109 | "outputs": [],
110 | "source": []
111 | }
112 | ],
113 | "metadata": {
114 | "kernelspec": {
115 | "display_name": "Python 3",
116 | "language": "python",
117 | "name": "python3"
118 | },
119 | "language_info": {
120 | "codemirror_mode": {
121 | "name": "ipython",
122 | "version": 3
123 | },
124 | "file_extension": ".py",
125 | "mimetype": "text/x-python",
126 | "name": "python",
127 | "nbconvert_exporter": "python",
128 | "pygments_lexer": "ipython3",
129 | "version": "3.6.4"
130 | }
131 | },
132 | "nbformat": 4,
133 | "nbformat_minor": 2
134 | }
135 |
--------------------------------------------------------------------------------
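The supervised-learning description in the notebook above (given labels y and features X, find a model f with y ≈ f(x), then apply f to new x) can be sketched with a one-feature least-squares fit. The salary figures below are made up for illustration, and the helper names `fit_line` and `predict` are assumptions, not part of the notebook.

```python
def fit_line(xs, ys):
    """Fit y = w * x + b by ordinary least squares (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Apply the fitted line to a single input value."""
    w, b = model
    return w * x + b


# Made-up training samples: years of experience -> monthly salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [10, 12, 14, 16, 18]

model = fit_line(years, salary)  # "training": learn f from (X, y)
print(predict(model, 6))         # "prediction": apply f to an unseen x
```

Libraries such as scikit-learn wrap the same two steps behind `fit` and `predict`, which is why the notebook treats the model as a black-box f.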
/第5期-OOP_keras.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
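7 | "## Why: the trade-offs\n",
8 | "- Less repeated work, but the logic is hidden in the base class and takes effort to find\n",
9 | "- More constraints, which prevent silly mistakes"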
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
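16 | "## Why OOP for deep learning\n",
17 | "- The most important thing in deep learning: defining what a layer looks like and what it can do\n",
18 | "- Define the structure (weights), the computation logic, loss, and metrics\n",
19 | "- OOP applied to layer reuse: a perfect fit!"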
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## keras OOP\n",
27 | "\n",
28 | "### [base_layer](https://keras.io/api/layers/base_layer/)\n",
29 | "\n",
30 | "__init__(): Defines custom layer attributes, and creates layer state variables that do not depend on input shapes, using add_weight().\n",
31 | "\n",
32 | "build(self, input_shape): This method can be used to create weights that depend on the shape(s) of the input(s), using add_weight(). __call__() will automatically build the layer (if it has not been built yet) by calling build().\n",
33 | "\n",
34 | "call(self, *args, **kwargs): Called in __call__ after making sure build() has been called. call() performs the logic of applying the layer to the input tensors (which should be passed in as argument). Two reserved keyword arguments you can optionally use in call() are: - training (boolean, whether the call is in inference mode or training mode) - mask (boolean tensor encoding masked timesteps in the input, used in RNN layers)\n",
35 | "\n",
36 | "get_config(self): Returns a dictionary containing the configuration used to initialize this layer. If the keys differ from the arguments in __init__, then override from_config(self) as well. This method is used when saving the layer or a model that contains this layer.\n",
37 | "\n",
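38 | "Lazy loading: create things only when they are needed\n",
39 | "\n",
40 | "Why distinguish weights that depend on the input shape? So that you only hard-code the output dimension, and the input dimension is inferred automatically from the input"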
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "### [实战](https://keras.io/guides/making_new_layers_and_models_via_subclassing/#making-new-layers-and-models-via-subclassing)"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
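54 | "#### Implementing the simplest wx+b (add_weight wraps weight creation and simplifies the code)\n",
55 | "#### Fixed weights that are not updated during training: trainable=False\n",
56 | "#### Inferring dimensions automatically from the input\n",
57 | "#### Recursive composition: layers inside layers (nesting is allowed)\n",
58 | "#### loss\n",
59 | "- loss is updated/accessed as an attribute\n",
60 | "- Reset each round (once backpropagation has finished)\n",
61 | "\n",
62 | "#### metric\n",
63 | "- Used much like loss\n",
64 | "\n",
65 | "#### Serialization\n",
66 | "#### Doing different things in training vs. inference: BatchNormalization / Dropout\n",
67 | "#### Handling sequence data: mask\n",
68 | "#### The Model class: putting layers together\n",
69 | "#### Putting it all together: training a VAE on the MNIST dataset\n",
70 | "#### Functional API\n",
71 | "- One line of code = one layer + one input + one output = two nodes\n",
72 | "- All the nodes together = a DAG"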
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "metadata": {},
78 | "source": [
79 | "## Recommended resources\n",
80 | "https://github.com/shenweichen/DeepMatch\n",
81 | "\n",
82 | "https://github.com/shenweichen/DeepCTR\n",
83 | "\n",
84 | "https://spaces.ac.cn/tag/keras/"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": null,
90 | "metadata": {},
91 | "outputs": [],
92 | "source": []
93 | }
94 | ],
95 | "metadata": {
96 | "kernelspec": {
97 | "display_name": "Python 3",
98 | "language": "python",
99 | "name": "python3"
100 | },
101 | "language_info": {
102 | "codemirror_mode": {
103 | "name": "ipython",
104 | "version": 3
105 | },
106 | "file_extension": ".py",
107 | "mimetype": "text/x-python",
108 | "name": "python",
109 | "nbconvert_exporter": "python",
110 | "pygments_lexer": "ipython3",
111 | "version": "3.7.0"
112 | },
113 | "toc": {
114 | "base_numbering": 1,
115 | "nav_menu": {},
116 | "number_sections": true,
117 | "sideBar": true,
118 | "skip_h1_title": false,
119 | "title_cell": "Table of Contents",
120 | "title_sidebar": "Contents",
121 | "toc_cell": false,
122 | "toc_position": {},
123 | "toc_section_display": true,
124 | "toc_window_display": false
125 | }
126 | },
127 | "nbformat": 4,
128 | "nbformat_minor": 2
129 | }
130 |
--------------------------------------------------------------------------------
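The lazy-loading idea in the notebook above (weights that depend on the input shape are created on first use) can be illustrated without Keras. The following is a plain-Python sketch that only mimics the `__init__` / `build` / `call` split; the class `TinyDense` and its list-based math are assumptions for illustration and are not the Keras implementation.

```python
import random


class TinyDense:
    """Toy dense layer computing y = x @ w + b on plain Python lists."""

    def __init__(self, units, seed=0):
        # Only the output size is fixed up front; no weights exist yet.
        self.units = units
        self.built = False
        self._rng = random.Random(seed)

    def build(self, input_dim):
        # Weights depend on the input size, so they are created here.
        self.w = [[self._rng.uniform(-1, 1) for _ in range(self.units)]
                  for _ in range(input_dim)]
        self.b = [0.0] * self.units
        self.built = True

    def call(self, x):
        # The actual computation, assuming build() has already run.
        return [sum(x[i] * self.w[i][j] for i in range(len(x))) + self.b[j]
                for j in range(self.units)]

    def __call__(self, x):
        # Lazily build on first use, then delegate to call().
        if not self.built:
            self.build(len(x))
        return self.call(x)


layer = TinyDense(units=2)
print(layer.built)            # no weights before the first call
out = layer([1.0, 2.0, 3.0])  # the input size 3 is discovered here
print(layer.built, len(layer.w), len(out))
```

The caller only ever states the output size (`units`); the same layer object would work unchanged for an input of any length, which is the practical payoff of deferring weight creation.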
/第2期-kaggle新冠预测竞赛高赞代码解读.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
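7 | "## 0. Goals\n",
8 | "- Practice learning from highly upvoted code and experience the full machine learning workflow\n",
9 | "- Recommended way to study: look at the inputs and outputs first, then work out how the code gets from one to the other\n",
10 | "- Common data-processing ideas: filter/groupby/agg/transform/join\n",
11 | "- Model training/prediction ideas: fit/predict/fit_predict/fit_transform\n",
12 | "- Model comparison\n",
13 | "- data leakage\n",
14 | "\n",
15 | "## 1. About the competition\n",
16 | "\n",
17 | "https://www.kaggle.com/c/covid19-global-forecasting-week-2\n",
18 | "- Supervised learning\n",
19 | "- Time series"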
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 1,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "import pandas as pd \n",
29 | "train = pd.read_csv(\"~/Downloads/train.csv\")\n",
30 | "test = pd.read_csv(\"~/Downloads/test.csv\")"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "metadata": {},
37 | "outputs": [
38 | {
39 | "data": {
40 | "text/html": [
41 | "(HTML rendering of the train.head() table; markup lost in extraction, the text/plain output below shows the same rows)"
116 | ],
117 | "text/plain": [
118 | " Id Province_State Country_Region Date ConfirmedCases Fatalities\n",
119 | "0 1 NaN Afghanistan 2020-01-22 0.0 0.0\n",
120 | "1 2 NaN Afghanistan 2020-01-23 0.0 0.0\n",
121 | "2 3 NaN Afghanistan 2020-01-24 0.0 0.0\n",
122 | "3 4 NaN Afghanistan 2020-01-25 0.0 0.0\n",
123 | "4 5 NaN Afghanistan 2020-01-26 0.0 0.0"
124 | ]
125 | },
126 | "execution_count": 2,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "train.head()"
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": 3,
138 | "metadata": {},
139 | "outputs": [
140 | {
141 | "data": {
142 | "text/html": [
143 | "(HTML rendering of the test.head() table; markup lost in extraction, the text/plain output below shows the same rows)"
206 | ],
207 | "text/plain": [
208 | " ForecastId Province_State Country_Region Date\n",
209 | "0 1 NaN Afghanistan 2020-03-19\n",
210 | "1 2 NaN Afghanistan 2020-03-20\n",
211 | "2 3 NaN Afghanistan 2020-03-21\n",
212 | "3 4 NaN Afghanistan 2020-03-22\n",
213 | "4 5 NaN Afghanistan 2020-03-23"
214 | ]
215 | },
216 | "execution_count": 3,
217 | "metadata": {},
218 | "output_type": "execute_result"
219 | }
220 | ],
221 | "source": [
222 | "test.head()"
223 | ]
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
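229 | "## 2. Evaluation metric\n",
230 | "\n",
231 | "- RMSLE: like RMSE, but the logarithm log(1 + x) is applied to both predictions and targets before the error is computed\n",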
232 | "\n",
233 | "## 3.EDA(Exploratory data analysis) / Preprocessing"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {},
239 | "source": [
240 | "### filter\n",
241 | "Filter rows by condition"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": null,
247 | "metadata": {},
248 | "outputs": [],
249 | "source": [
250 | "data_pred = data_pred.loc[data_pred['Day_num']>=day_start]\n",
251 | "real_data = train.loc[(train['Country_Region']==country_name) & (train['Date'].isin(dates_list))]"
252 | ]
253 | },
254 | {
255 | "cell_type": "markdown",
256 | "metadata": {},
257 | "source": [
258 | "### groupby/agg/transform\n",
259 | "Grouped statistics"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "metadata": {},
266 | "outputs": [],
267 | "source": [
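268 | "# Grouped aggregation 1: fewer rows (one row per group)\n",
269 | "confirmed_total_date = train.groupby(['Date']).agg({'ConfirmedCases':['sum']})\n",
270 | "\n",
271 | "# Grouped aggregation 2: same number of rows (the group result is broadcast back to every row)\n",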
272 | "train['total_confirm'] = train.groupby(['Date'])['ConfirmedCases'].transform('sum')"
273 | ]
274 | },
275 | {
276 | "cell_type": "markdown",
277 | "metadata": {},
278 | "source": [
279 | "### join\n",
280 | "Equivalent to the VLOOKUP operation in Excel"
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {},
287 | "outputs": [],
288 | "source": [
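289 | "# Suppose country_info is a table with some information about each country, keyed by Country_Region (hypothetical)\n",
290 | "train_extend = pd.merge(train, country_info, on='Country_Region')\n",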
291 | "df_val = pd.merge(pred_data_all,train[['Date','Country_Region','Province_State','ConfirmedCases','Fatalities']],on=['Date','Country_Region','Province_State'], how='left')"
292 | ]
293 | },
294 | {
295 | "cell_type": "markdown",
296 | "metadata": {},
297 | "source": [
298 | "### Visualization"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": [
307 | "import matplotlib.pyplot as plt\n",
308 | "import seaborn as sns\n",
309 | "import plotly"
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "### fit/predict/transform\n",
317 | "Use cases: data transformation / encoding / model training & prediction\n",
318 | "\n",
319 | "Key point: the training and prediction stages may use different data"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": null,
325 | "metadata": {},
326 | "outputs": [],
327 | "source": [
328 | "from sklearn import preprocessing\n",
329 | "# Style 1: fit and transform in one step\n",
330 | "le = preprocessing.LabelEncoder()\n",
331 | "train['State_encode'] = le.fit_transform(train['State'])\n",
332 | "\n",
333 | "# Style 2: two separate steps\n",
334 | "le = preprocessing.LabelEncoder()\n",
335 | "le.fit(train['State'])\n",
336 | "train['State_encode'] = le.transform(train['State'])"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": null,
342 | "metadata": {},
343 | "outputs": [],
344 | "source": [
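345 | "from xgboost import XGBRegressor  # models are usually used in two steps: fit, then predict\n",
346 | "model1 = XGBRegressor(n_estimators=1000)\n",
347 | "model1.fit(X_Train_CS, y1_Train_CS)  # X_Train_CS, y1_Train_CS, X_Test_CS are assumed to be prepared beforehand\n",
348 | "y1_pred = model1.predict(X_Test_CS)"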
349 | ]
350 | },
351 | {
352 | "cell_type": "markdown",
353 | "metadata": {},
354 | "source": [
355 | "## 4. Model comparison and takeaways"
356 | ]
357 | },
358 | {
359 | "cell_type": "markdown",
360 | "metadata": {},
361 | "source": [
362 | "[1. SIR model + linear regression 1.12958](https://www.kaggle.com/saga21/covid-global-forecast-sir-model-ml-regressions#4.-Predictions-with-machine-learning-)\n",
363 | "Assumption: after taking the logarithm, the data is approximately linear. A simple model with mediocre results.\n",
364 | "\n",
365 | "[2. RNN 0.92425](https://www.kaggle.com/frlemarchand/covid-19-forecasting-with-an-rnn)\n",
366 | "The dataset is small, so deep learning can't show its strength\n",
367 | "\n",
368 | "[3. Exponential model / SARIMA 0.09306](https://www.kaggle.com/binhlc/sars-cov-2-exponential-model-week-2) \n",
369 | "Time-series models bring a clear improvement\n",
370 | "\n",
371 | "[4. XGB 0.00006](https://www.kaggle.com/ashora/4ver-arima-2ver-xgboost-newbie)\n",
372 | "Does XGB live up to its reputation as Kaggle's secret weapon? ❌ [Beware of data leakage](https://www.kaggle.com/c/covid19-global-forecasting-week-1/discussion/138025)\n",
373 | "\n",
374 | "5. What can no model do (given the existing data)? Predict the peak. Why? Because it depends heavily on policy"
375 | ]
376 | },
377 | {
378 | "cell_type": "markdown",
379 | "metadata": {},
380 | "source": [
381 | "## Summary\n",
382 | "- The overall supervised learning workflow\n",
383 | "- Common data-processing ideas\n",
384 | "- The fit/predict/transform idea\n",
385 | "- Model comparison: there is no silver bullet\n",
386 | "- data leakage"
387 | ]
388 | },
389 | {
390 | "cell_type": "markdown",
391 | "metadata": {},
392 | "source": [
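393 | "## TODO\n",
394 | "- Collect usage patterns from kernels and organize them by category\n",
395 | "- Learn the theory behind the models, practice on different datasets, and think about why results are good or bad"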
396 | ]
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": null,
401 | "metadata": {},
402 | "outputs": [],
403 | "source": []
404 | }
405 | ],
406 | "metadata": {
407 | "kernelspec": {
408 | "display_name": "Python 3",
409 | "language": "python",
410 | "name": "python3"
411 | },
412 | "language_info": {
413 | "codemirror_mode": {
414 | "name": "ipython",
415 | "version": 3
416 | },
417 | "file_extension": ".py",
418 | "mimetype": "text/x-python",
419 | "name": "python",
420 | "nbconvert_exporter": "python",
421 | "pygments_lexer": "ipython3",
422 | "version": "3.6.4"
423 | }
424 | },
425 | "nbformat": 4,
426 | "nbformat_minor": 2
427 | }
428 |
--------------------------------------------------------------------------------
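The RMSLE metric mentioned in the notebook above can be computed by hand. This is a minimal sketch assuming the usual definition, the square root of the mean of (log(1 + prediction) - log(1 + actual))²; the function names and the sample numbers are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Same as RMSE, but on log(1 + x) of both targets and predictions."""
    return rmse([math.log1p(t) for t in y_true], [math.log1p(p) for p in y_pred])


actual = [10, 100, 1000]
predicted = [20, 200, 2000]  # every prediction is off by a factor of 2

print(rmse(actual, predicted))   # dominated by the largest value
print(rmsle(actual, predicted))  # weighs the relative error of each row more evenly
```

For quantities that grow roughly exponentially, such as cumulative case counts, the log transform keeps the largest regions from dominating the score.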
/第1期-了解这3组概念,就可以开始机器学习实战了.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "attachments": {
5 | "image.png": {
6 | "image/png": ""
7 | }
8 | },
9 | "cell_type": "markdown",
10 | "metadata": {},
11 | "source": [
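12 | "## 0. Which 3 groups of concepts?\n",
13 | "- 1. Evaluation metrics\n",
14 | "- 2. Underfitting / overfitting\n",
15 | "- 3. gridsearch/cross validation\n",
16 | "\n",
17 | "## 1. What is the goal of a machine learning project?\n",
18 | "\n",
19 | "- Define an evaluation metric for the specific problem (accuracy / recall / AUC / RMSE / NDCG...)\n",
20 | "- Train a good model on the available data\n",
21 | "\n",
22 | "## 2. What makes a machine learning model good?\n",
23 | "\n",
24 | "- It scores well on the evaluation metric on the existing data\n",
25 | "- It scores well on the evaluation metric on data it may see in the future\n",
26 | "\n",
27 | "## 3. What is overfitting?\n",
28 | "\n",
29 | "\n",
30 | "From left to right: underfitting, a good fit, overfitting\n",
31 | "\n",
32 | "The essence of overfitting: the model learns the noise in the data as if it were meaningful detail\n",
33 | "\n",
34 | "## 4. Why does overfitting happen?\n",
35 | "\n",
36 | "- Noise in the data\n",
37 | "- High model complexity\n",
38 | "\n",
39 | "## 5. How to curb overfitting\n",
40 | "\n",
41 | "- Overfitting can never be eliminated completely\n",
42 | "- Split the data into train/test sets\n",
43 | "- Choose suitable model parameters\n",
44 | "- Add regularization (no need to dig into the theory for now; just know that there are parameters to tune)\n",
45 | "\n",
46 | "## 6. What are gridsearch and cross validation?\n",
47 | "\n",
48 | "- gridsearch: try every combination of the candidate values of several tunable parameters\n",
49 | "- cross validation: split the dataset into N parts and run N experiments, each time using one part as test and the rest as train\n",
50 | "\n",
51 | "### Ref\n",
52 | "sklearn documentation: https://scikit-learn.org/stable/model_selection.html#model-selection"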
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "## 7. Seeing overfitting in an example"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 18,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "import numpy as np\n",
69 | "import pandas as pd\n",
70 | "import matplotlib.pyplot as pl\n",
71 | "# sklearn.learning_curve and sklearn.cross_validation were removed in scikit-learn 0.20; use sklearn.model_selection\n",
72 | "from sklearn.tree import DecisionTreeRegressor\n",
73 | "from sklearn.model_selection import ShuffleSplit, train_test_split, validation_curve\n",
74 | "\n",
75 | "# Display plots inline in the notebook\n",
76 | "%matplotlib inline\n",
77 | "\n",
78 | "# Load the Boston housing dataset\n",
79 | "data = pd.read_csv('housing.csv')\n",
80 | "prices = data['MEDV']\n",
81 | "features = data.drop('MEDV', axis = 1)"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 23,
87 | "metadata": {},
88 | "outputs": [],
89 | "source": [
90 | "def ModelComplexity(X, y):\n",
91 | "    \"\"\" Calculates the performance of the model as model complexity increases.\n",
92 | "        The training and validation scores are then plotted. \"\"\"\n",
93 | "\n",
94 | "    # Create 10 cross-validation sets for training and testing\n",
95 | "    cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)\n",
96 | "\n",
97 | "    # Vary the max_depth parameter from 1 to 10\n",
98 | "    max_depth = np.arange(1,11)\n",
99 | "\n",
100 | "    # Calculate the training and testing scores\n",
101 | "    train_scores, test_scores = validation_curve(DecisionTreeRegressor(), X, y, \\\n",
102 | "        param_name = \"max_depth\", param_range = max_depth, cv = cv, scoring = 'r2')\n",
103 | "\n",
104 | "    # Find the mean and standard deviation for smoothing\n",
105 | "    train_mean = np.mean(train_scores, axis=1)\n",
106 | "    train_std = np.std(train_scores, axis=1)\n",
107 | "    test_mean = np.mean(test_scores, axis=1)\n",
108 | "    test_std = np.std(test_scores, axis=1)\n",
109 | "\n",
110 | "    # Plot the validation curve\n",
111 | "    pl.figure(figsize=(7, 5))\n",
112 | "    pl.title('Decision Tree Regressor Complexity Performance')\n",
113 | "    pl.plot(max_depth, train_mean, 'o-', color = 'r', label = 'Training Score')\n",
114 | "    pl.plot(max_depth, test_mean, 'o-', color = 'g', label = 'Validation Score')\n",
115 | "    pl.fill_between(max_depth, train_mean - train_std, \\\n",
116 | "        train_mean + train_std, alpha = 0.15, color = 'r')\n",
117 | "    pl.fill_between(max_depth, test_mean - test_std, \\\n",
118 | "        test_mean + test_std, alpha = 0.15, color = 'g')\n",
119 | "\n",
120 | "    # Visual aesthetics\n",
121 | "    pl.legend(loc = 'lower right')\n",
122 | "    pl.xlabel('Maximum Depth')\n",
123 | "    pl.ylabel('Score')\n",
124 | "    pl.ylim([-0.05,1.05])"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 26,
130 | "metadata": {},
131 | "outputs": [
132 | {
133 | "data": {
134 | "image/png": "\n",
135 | "text/plain": [
136 | ""
137 | ]
138 | },
139 | "metadata": {},
140 | "output_type": "display_data"
141 | }
142 | ],
143 | "source": [
144 | "ModelComplexity(features, prices)"
145 | ]
146 | },
147 | {
148 | "cell_type": "markdown",
149 | "metadata": {},
150 | "source": [
151 | "## Summary\n",
152 | "- 1. Evaluation metrics\n",
153 | "- 2. Overfitting\n",
154 | "- 3. gridsearch/cross validation\n",
155 | "- 4. TODO: install Anaconda, read code at https://www.kaggle.com/search?q=boston+housing+in%3Anotebooks and build up a repertoire of pandas and numpy data-processing techniques"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": [
162 | "## Coming up next\n",
163 | "- Comparing evaluation metrics\n",
164 | "- A summary of data processing\n",
165 | "- Applying models in different domains\n",
166 | "- Hands-on Kaggle"
167 | ]
168 | }
169 | ],
170 | "metadata": {
171 | "kernelspec": {
172 | "display_name": "Python 3",
173 | "language": "python",
174 | "name": "python3"
175 | },
176 | "language_info": {
177 | "codemirror_mode": {
178 | "name": "ipython",
179 | "version": 3
180 | },
181 | "file_extension": ".py",
182 | "mimetype": "text/x-python",
183 | "name": "python",
184 | "nbconvert_exporter": "python",
185 | "pygments_lexer": "ipython3",
186 | "version": "3.6.4"
187 | }
188 | },
189 | "nbformat": 4,
190 | "nbformat_minor": 2
191 | }
192 |
--------------------------------------------------------------------------------
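The gridsearch and cross-validation definitions in the notebook above can be sketched with the standard library alone. The helper names `k_fold_indices` and `param_combinations` and the parameter grid are hypothetical; in practice scikit-learn's `GridSearchCV` and `KFold` from `sklearn.model_selection` cover this.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def param_combinations(grid):
    """Yield every combination of the candidate values in `grid`."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}
combos = list(param_combinations(grid))
folds = list(k_fold_indices(n_samples=10, k=5))

print(len(combos), "parameter combinations")
print(len(folds), "folds; first test fold:", folds[0][1])
# A full search would train and score one model per (combination, fold) pair
# and keep the combination with the best average test score.
```

The cost is the product of the two: 6 combinations times 5 folds means 30 model fits, which is why grids are usually kept small.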