├── .gitignore
├── 01_basic
│   ├── README.md
│   └── arg_test.py
├── 02_numpy
│   └── README.md
├── 03_pandas
│   ├── README.md
│   ├── concat.png
│   ├── join.png
│   └── merge.png
├── 04_sklearn
│   └── README.md
├── 05_OOP
│   └── README.md
├── 06_flask_sanic
│   └── README.md
├── 07_database
│   ├── README.md
│   ├── es.md
│   ├── faiss.md
│   ├── imgs
│   │   └── redis_pic.png
│   └── neo4j.md
├── 08_vscode
│   └── README.md
├── 09_remote_ipython
│   └── README.md
├── 10_docker
│   ├── README.md
│   ├── mi_docker_demo
│   │   ├── README.md
│   │   ├── app.py
│   │   ├── docker_build
│   │   └── requirements.txt
│   └── newapi_docker_demo
│       └── README.md
├── 11_rabbitmq
│   └── README.md
├── 12_nginx
│   ├── README.md
│   ├── default.conf
│   ├── default1.conf
│   ├── default2.conf
│   └── default3.conf
├── 13_airflow
│   └── README.md
├── 14_go
│   └── README.md
├── 15_ansible
│   └── README.md
├── 99_pycharm_archive
│   ├── .DS_Store
│   ├── README.md
│   ├── pic
│   │   ├── pycharm_activ.png
│   │   ├── pycharm_git1.png
│   │   ├── pycharm_git2.png
│   │   ├── pycharm_remote1.png
│   │   ├── pycharm_remote2.png
│   │   ├── pycharm_remote3.png
│   │   ├── pycharm_remote4.png
│   │   └── pycharm_remote5.png
│   └── 激活码
│       ├── .DS_Store
│       ├── jetbrains-agent.jar
│       ├── 永久激活码
│       │   ├── 激活码.txt
│       │   ├── 激活码1.txt
│       │   ├── 激活码2.txt
│       │   └── 激活码3.txt
│       └── 非永久激活码
│           └── Pycharm方式一激活码汇总.docx
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | __pycache__
3 | .DS_Store
4 |
--------------------------------------------------------------------------------
/01_basic/README.md:
--------------------------------------------------------------------------------
1 | ## python实用技巧
2 |
3 | [**1. lambda函数**](#lambda函数)
4 |
5 | [**2. map函数**](#map函数)
6 |
7 | [**3. filter函数**](#filter函数)
8 |
9 | [**4. reduce函数**](#reduce函数)
10 |
11 | [**5. apply和applymap函数、transform/agg**](#apply函数)
12 |
13 | [**6. dict转object**](#dict转object)
14 |
15 | [**7. KFold函数**](#kfold函数)
16 |
17 | [**8. sys.defaultencoding**](#sys)
18 |
19 | [**9. pip install error _NamespacePath**](#pip_error)
20 |
21 | [**10. zip(\*xx)用法**](#zip)
22 |
23 | [**11. dataframe中某一列字符串长度为10的进行切片**](#切片)
24 |
25 | [**12. re模块(一些常用的正则轮子)**](#re模块)
26 |
27 | [**13. eval**](#eval)
28 |
29 | [**14. global用法**](#global)
30 |
31 | [**15. 多进程与多线程实现**](#多进程与多线程实现)
32 |
33 | [**16. CV的多进程实现**](#cv的多进程实现)
34 |
35 | [**17. 保存数据(json)**](#保存数据)
36 |
37 | [**18. 保存模型**](#保存模型)
38 |
39 | [**19. enumerate用法**](#enumerate)
40 |
41 | [**20. label数值化方法**](#label数值化方法)
42 |
43 | [**21. 列表推导式中使用if else**](#列表推导式中使用if_else)
44 |
45 | [**22. 将nparray或list中的最多的元素选出**](#将numpy_array中的最多的元素选出)
46 |
47 | [**23. 函数中传入函数demo**](#函数中传入函数demo)
48 |
49 | [**24. getattr**](#getattr)
50 |
51 | [**25. df宽变长及一列变多列**](#df宽变长及一列变多列)
52 |
53 | [**26. groupby使用**](#groupby使用)
54 |
55 | [**27. python画图显示中文**](#python画图及显示中文)
56 |
57 | [**28. 给字典按value排序**](#给字典按value排序)
58 |
59 | [**29. sorted高级用法**](#sorted高级用法)
60 |
61 | [**30. time用法**](#time用法)
62 |
63 | [**31. 两层列表展开平铺**](#两层列表展开平铺)
64 |
65 | [**32. 读取百度百科词向量**](#读取百度百科词向量)
66 |
67 | [**33. logging**](#logging)
68 |
69 | [**34. argparse用法**](#argparse用法)
70 |
71 | [**35. 包管理**](#包管理)
72 |
73 | [**36. 装饰器**](#装饰器)
74 |
75 | [**37. 本地用python起http服务**](#本地用python起http服务)
76 |
77 | [**38. cache**](#cache)
78 |
79 | [**39. 创建文件**](#创建文件)
80 |
81 | [**40. 字典转成对象(骚操作)**](#字典转成对象)
82 |
83 | [**41. lgb[gpu版本]和xgb[gpu版本]安装**](#boost安装)
84 |
85 | [**42. tqdm**](#tqdm)
86 |
87 | [**43. joblib Parallel并行**](#joblib_parallel)
88 |
89 | [**44. 调试神器pysnooper - 丢弃print**](#调试神器pysnooper)
90 |
91 | [**45. 调试神器debugpy**](#调试神器debugpy)
92 |
93 | [**46. 分组计算均值并填充**](#分组计算均值并填充)
94 |
95 | [**47. python日期处理**](#python日期处理)
96 |
97 | [**48. dataclass**](#dataclass)
98 |
99 | [**49. md5 sha256**](#md5_sha256)
100 |
101 | [**50. 查看内存**](#查看内存)
102 |
103 | [**51. `__slots__`用法**](#slots用法)
104 |
105 | ---
106 |
108 |
109 | ```python
110 | %reload_ext autoreload
111 | %autoreload 2
112 | %matplotlib notebook
113 |
114 | import sys
115 | sys.path.append('..')
116 | ```
117 |
118 | ### lambda函数
119 | ```python
120 | # lambda: 快速定义单行的最小函数,inline的匿名函数
121 | (lambda x : x ** 2)(3)
122 | # 或者
123 | f = lambda x : x ** 2
124 | f(3)
125 | ```
126 |
127 | ### map函数
128 | ```python
129 | arr_str = ["hello", "this"]
130 | arr_num = [3,1,6,10,12]
131 |
132 | def f(x):
133 | return x ** 2
134 | list(map(lambda x : x ** 2, arr_num))  # python3中map返回迭代器,要用list()才能看到结果
135 | list(map(f, arr_num))
136 | list(map(len, arr_str))
137 | list(map(lambda x : (x, 1), arr_str))
138 | ```
139 | ```python
140 | # 可以对每个列表对应的元素进行操作,比如加总
141 | f1 = lambda x,y,z:x+y+z
142 | list(map(f1,[1,2,10],[2,3,6],[4,3,5]))
143 | # [7,8,21]
144 | ```
145 |
146 | ### filter函数
147 | ```python
148 | arr_str = ['hello','hi','nice']
149 | arr_num = [1,6,10,12]
150 | filter(lambda x : len(x) >= 5, arr_str)
151 | filter(lambda x : x > 5, arr_num)
152 | [(i.word, 'E') if i.flag =='n' else (i.word, 'P') for i in filter(lambda x: x.flag in ('n', 'v'), a) ]
153 | ```
154 |
155 | ### reduce函数
156 | ```python
157 | # 在python3里,reduce函数已经被从全局命名空间里移除了,它现在被放置在functools模块里
158 | from functools import reduce
159 | arr_num = [1,6,7,10]
160 | reduce(lambda x, y : x + y, arr_num)
161 | ```
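
reduce还可以接收第三个参数作为初始值,下面是一个可直接运行的小例子(纯标准库,仅作示意):

```python
from functools import reduce

# 带初始值的reduce:先从初始值100开始,再依次累加序列里的元素
total = reduce(lambda x, y: x + y, [1, 6, 7, 10], 100)
print(total)  # 124
```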
162 |
163 | ### apply函数
164 |
165 | - apply函数是对行进行操作
166 |
167 | 你可以把apply()当作是一个map()函数,只不过这个函数是专为Pandas的dataframe和series对象打造的。对初学者来说,你可以把series对象想象成类似NumPy里的数组对象。它是一个一维带索引的数据表结构。
168 |
169 | apply() 函数作用是,将一个函数应用到某个数据表中你指定的一行或一列中的每一个元素上。是不是很方便?特别是当你需要对某一列的所有元素都进行格式化或修改的时候,你就不用再一遍遍地循环啦!
170 | ```python
171 | df = pd.DataFrame([[4,9],]*3,columns=['A','B'])
172 | df.apply(np.sqrt)
173 | df.apply(np.sum,axis=0)
174 | df.apply(np.sum,axis=1)
175 | df.apply(lambda x : [1,2], axis=1)
176 | df['A'].astype(str).apply(lambda x : x.split()[0])  # 对字符串列逐元素处理用Series.apply;直接df.apply(lambda x: x.split())会因x是Series而报错
177 | ```
178 | > applymap和apply差不多,不过是全局函数,elementwise,作用于dataframe中的每个元素
179 |
180 | - transform/agg是对一列进行操作
181 |
182 | 由前面分析可以知道,Fare项在测试数据中缺少一个值,所以需要对该值进行填充。
183 | 我们按照一二三等舱各自的均价来填充:
184 | 下面transform将函数np.mean应用到各个group中。
185 | ```python
186 | combined_train_test['Fare'] = combined_train_test[['Fare']].fillna(combined_train_test.groupby('Pclass').transform(np.mean))
187 | ```
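
下面用一个构造的小表示意这种"按组均值填充"的写法(Pclass/Fare为示意数据,不是原训练集):

```python
import numpy as np
import pandas as pd

# 构造一个带缺失值的小表:按Pclass分组,Fare有一个缺失
df = pd.DataFrame({'Pclass': [1, 1, 2, 2], 'Fare': [10.0, np.nan, 30.0, 50.0]})
# transform('mean')把各组均值广播回每一行,再用它填充该组内的缺失值
df['Fare'] = df['Fare'].fillna(df.groupby('Pclass')['Fare'].transform('mean'))
print(df['Fare'].tolist())  # [10.0, 10.0, 30.0, 50.0]
```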
188 |
189 | ### dict转object
190 | ```python
191 | import json
192 | # json格式的str
193 | s = '{"name":{"0":"John","1":"Lily"},"phone_no":{"0":"189101","1":"234220"},"age":{"0":"11","1":"23"}}'
194 | # load成dict
195 | dic = json.loads(s)
196 | dic
197 | # {"name":{"0":"John","1":"Lily"},"phone_no":{"0":"189101","1":"234220"},"age":{"0":"11","1":"23"}}
198 | # 不能使用dic.name, dic.age 只能dic['name'], dic['age']
199 | class p:
200 | def __init__(self, d=None):
201 | self.__dict__ = d
202 | p1 = p(dic)
203 | # 这个时候就可以用p1.name, p1.age了
204 |
205 | # 更详细一点
206 | import six
207 | import pprint
208 | # 现在有个字典
209 | conf = {'base':{'good','medium','bad'},'age':'24'}
210 | # conf.age是不行的
211 | # 定义一个class:
212 | class p:
213 | def __init__(self, d=None):
214 | self.__dict__ = d
215 | def keys(self):
216 | return self.__dict__.keys()
217 | def items(self):
218 | return six.iteritems(self.__dict__)
219 | def __repr__(self):
220 | return pprint.pformat(self.__dict__) # 将dict转成字符串
221 | p1 = p(conf)
222 | # 这个时候就可以p1.base和p1.age
223 | # p1这个实例拥有的属性有:
224 | #   p.__doc__ / p.__init__ / p.__module__ / p.__repr__
225 | #   p.keys / p.items
226 | #   p.age / p.base  (age和base这两个是字典加载进来以后多出来的属性)
232 | ```
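
标准库里的types.SimpleNamespace也可以把dict转成对象,效果类似上面手写的class p,一个小例子:

```python
from types import SimpleNamespace

d = {'name': 'John', 'age': 11}
obj = SimpleNamespace(**d)
print(obj.name, obj.age)  # John 11
# 注意:SimpleNamespace只处理一层,嵌套dict需要递归转换
```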
233 |
234 | ### kfold函数
235 | 新手用cross_val_score比较简单,后期用KFold更灵活。
236 | ```python
237 | skf = StratifiedKFold(n_splits=5,shuffle=True)
238 | for train_idx, val_idx in skf.split(X,y):
239 | pass
240 | train_idx
241 | val_idx
242 | ```
243 | ```python
244 | from sklearn.model_selection import cross_val_score, StratifiedKFold, KFold
245 | forest = RandomForestClassifier(n_estimators = 120,max_depth=5, random_state=42)
246 | cross_val_score(forest,X=train_data_features,y=df.Score,scoring='neg_mean_squared_error',cv=3)
247 | # 这里的scoring可以自己写,比如我想用RMSE则
248 | from sklearn.metrics import make_scorer  # 新版sklearn已移除metrics.scorer模块,直接用make_scorer
249 | def ff(y,y_pred):
250 |     rmse = np.sqrt(sum((y-y_pred)**2)/len(y))
251 |     return rmse
252 | rmse_scoring = make_scorer(ff, greater_is_better=False)  # 误差指标越小越好,置greater_is_better=False
253 | cross_val_score(forest,X=train_data_features,y=df.Score,scoring=rmse_scoring,cv=5)
254 | ```
255 | ```python
256 | # Some useful parameters which will come in handy later on
257 | ntrain = titanic_train_data_X.shape[0]
258 | ntest = titanic_test_data_X.shape[0]
259 | SEED = 42 # for reproducibility
260 | NFOLDS = 5 # set folds for out-of-fold prediction
261 | kf = KFold(n_splits = NFOLDS, random_state=SEED, shuffle=True)
262 |
263 | def get_out_fold(clf, x_train, y_train, x_test): # 若x_train是DataFrame,KFold给出的是位置索引,应该用iloc;若已用x_train.values转成array则直接x_train[train_index]
264 |     oof_train = np.zeros((ntrain,))
265 |     oof_test = np.zeros((ntest,))
266 |     oof_test_skf = np.empty((NFOLDS, ntest))
267 | 
268 |     for i, (train_index, test_index) in enumerate(kf.split(x_train)):
269 |         x_tr = x_train.iloc[train_index]
270 |         y_tr = y_train.iloc[train_index]
271 |         x_te = x_train.iloc[test_index]
272 |
273 | clf.fit(x_tr, y_tr)
274 |
275 | oof_train[test_index] = clf.predict(x_te)
276 | oof_test_skf[i, :] = clf.predict(x_test)
277 |
278 | oof_test[:] = oof_test_skf.mean(axis=0)
279 | return oof_train.reshape(-1, 1), oof_test.reshape(-1, 1)
280 | ```
281 |
282 | ### sys
283 | (注:以下仅适用于Python 2;Python 3已移除sys.setdefaultencoding,默认编码即为utf-8)
283 | ```python
284 | import sys
285 | reload(sys)
286 | sys.setdefaultencoding('utf-8')
287 | #注意:使用此方式,有极大的可能导致print函数无法打印数据!
288 |
289 | #改进方式如下:
290 | import sys #这里只是一个对sys的引用,只能reload才能进行重新加载
291 | stdi,stdo,stde=sys.stdin,sys.stdout,sys.stderr
292 | reload(sys) #通过import引用进来时,setdefaultencoding函数在被系统调用后被删除了,所以必须reload一次
293 | sys.stdin,sys.stdout,sys.stderr=stdi,stdo,stde
294 | sys.setdefaultencoding('utf-8')
295 | ```
296 |
297 | ### pip_error
298 |
299 | 使用pip时出现错误:
300 | AttributeError: '_NamespacePath' object has no attribute 'sort'
301 |
302 | 解决方法:
303 | 1. 关于Anaconda3报错 AttributeError: '_NamespacePath' object has no attribute 'sort' ,先参考下面这篇博客:
304 | http://www.cnblogs.com/newP/p/7149155.html
305 | 按照文中的做法是可以解决conda报错的,总结一下就是:一,把文件夹 D:\ProgramData\Anaconda3\Lib\site-packages\conda\_vendor\auxlib 中的 path.py 中,“except ImportError: ”修改为“except Exception:“;二、找到D:\ProgramData\Anaconda3\lib\site-packages\setuptools-27.2.0-py3.6.egg,删除(不放心的话,剪切到别的地方)
306 |
307 | 2.然而pip报错的问题还没解决。首先要安装setuptools模块,下载地址是:
308 | https://pypi.python.org/pypi/setuptools#files
309 | 下载setuptools-36.5.0.zip解压,命令窗口进入到文件夹然后 python setup.py install
310 |
311 | 3.安装好setuptools模块之后应该能用easy_install了,我们要借助它来重新安装pip。命令窗口输入命令:easy_install pip
312 |
313 | ### zip
314 | zip基本用法
315 | ```python
316 | a = [1,2,3]
317 | b = [4,5,6]
318 | for i,j in zip(a,b):
319 | print(i,j)
320 | # 1 4
321 | # 2 5
322 | # 3 6
323 | ```
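
zip(*xx)相当于"解压",把若干二元组还原成独立的序列,一个自包含的小例子:

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
nums, chars = zip(*pairs)
print(nums)   # (1, 2, 3)
print(chars)  # ('a', 'b', 'c')
```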
324 |
325 | ```python
326 | s = '彩符和大汶口文化陶尊符号是第三阶段的语段文字'
327 | print(synonyms.seg(s))
328 | # (['彩符', '和', '大汶口', '文化', '陶尊', '符号', '是', '第三阶段', '的', '语段', '文字'], ['n', 'c', 'ns', 'n', 'nr', 'n', 'v', 't', 'uj', 'n', 'n'])
329 | [x for x in zip(*synonyms.seg(s))]
330 | # [('彩符', 'n'),
331 | #  ('和', 'c'),
332 | #  ('大汶口', 'ns'),
333 | #  ('文化', 'n'),
334 | #  ('陶尊', 'nr'),
335 | #  ('符号', 'n'),
336 | #  ('是', 'v'),
337 | #  ('第三阶段', 't'),
338 | #  ('的', 'uj'),
339 | #  ('语段', 'n'),
340 | #  ('文字', 'n')]
341 | ```
342 | ### 切片
343 | ```python
344 | data.msg_from = data.msg_from.astype(str)
345 | data[data.msg_from.apply(len)==10]
346 | ```
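
同样的需求也可以用pandas自带的.str.len(),下面用构造数据示意(msg_from为示意列名):

```python
import pandas as pd

data = pd.DataFrame({'msg_from': [1234567890, 555, 9876543210]})
data['msg_from'] = data['msg_from'].astype(str)
# .str.len()与apply(len)等价,且对缺失值更稳妥
out = data[data['msg_from'].str.len() == 10]
print(out['msg_from'].tolist())  # ['1234567890', '9876543210']
```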
347 |
348 | ### re模块
349 |
350 | [常用正则表达式速查手册,Python文本处理必备](https://mp.weixin.qq.com/s/ySsgcrSnkguO2c8D-SQNxw)
351 | [regexlearn](https://github.com/aykutkardas/regexlearn.com)
352 |
353 | ```python
354 | # 1. 将一个问题中的网址、邮箱、手机号、身份证、日期、价格提出来
355 |
356 | # 日期 注:这里的{1,4}指的是匹配1到4位,问号指的是0个或1个
357 | DATE_REG1 = "(?:[一二三四五六七八九零十0-9]{1,4}年[一二三四五六七八九零十0-9]{1,2}月[一二三四五六七八九零十0-9]{1,2}[日|号|天|分]?)|\
358 | (?:[一二三四五六七八九零十0-9]+年[一二三四五六七八九零十0-9]+月)|\
359 | (?:[一二三四五六七八九零十0-9]{1,2}月[一二三四五六七八九零十0-9]{1,2}[号|日|天]?)|\
360 | (?:[一二三四五六七八九零十0-9]+年)|\
361 | (?:[一二三四五六七八九零十0-9]+月)|\
362 | (?:[一二三四五六七八九零十0-9]{1,3}[号|日|天])|\
363 | (?:[一二三四五六七八九零十0-9]+小时[一二三四五六七八九零十0-9]+分钟)|\
364 | (?:[一二三四五六七八九零十0-9]+小时)|\
365 | (?:[一二三四五六七八九零十0-9]+分钟)\
366 | "
367 |
368 | # 网址
369 | URL_REG = "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
370 | # 手机
371 | PHONE_REG = "[+](?:86)[-\s+]*?1[3-8][0-9]{9}"
372 | # 邮箱
373 | MAIL_REG = "[0-9a-zA-Z_]{0,39}@(?:[A-Za-z0-9]+\.)+[A-Za-z]+"
374 | # 身份证
375 | IDCARD_REG = "\d{18}|\d{17}[Xx]"
376 |
377 | # 价格
378 | MONEY_REG1 = "(?:\d+[\.\d+]*万*亿*美*港*元/桶)|\
379 | (?:\d+[\.\d+]*万*亿*美*港*元/吨)|\
380 | (?:\d+[\.\d+]*万*亿*美*港*元/升)|\
381 | (?:\d+[\.\d+]*万*亿*美*港*元/吨)|\
382 | (?:\d+[\.\d+]*万*亿*美*港*元/赛季)|\
383 | (?:\d+[\.\d+]*万*亿*美*港*平方米)|\
384 | (?:\d+[\.\d+]*万*亿*美*港*平方千米)|\
385 | (?:(?:[\d]{1,3},)*(?:[\d]{3})[万亿]*[美港]*元)|\
386 | (?:\d+[\.\d+]*万*亿*美*港*[股|笔|户|辆|倍|桶|吨|升|个|手|点|元|亿|万])"
387 |
388 | MONEY_REG2 = "([一二三四五六七八九零十百千万亿|\d|.]+[万|元|块|毛][一二三四五六七八九零十百千万亿|\d|.]*)+"
389 |
390 | ## add date reg
391 | DATE_REG2 = "(?:[\d]*[-:\.]*\d+[-:\.点]\d+分)|(?:[\d+-]*\d+月份)|(?:\d+[-:\.]\d+[-:\.]\d+)"
392 | # HYPER_REG 2017-09-20
393 | HYPER_REG = "[0-9a-zA-Z]+[-:][0-9a-zA-Z]+[%]*"
394 |
395 | # 2. 具体的正则匹配问题
396 |
397 | ## 2.1 以数字开头后面只能接文字,而且数字后面接的文字不能是【小时、种】
398 | s = '22基本日常生活活动:指食物摄取、大小便始末、穿脱衣服、起居、步行、入浴。'
399 | re.findall(r'^\d+(?![\d*小时*]|[\d*种*])[\u4e00-\u9fa5]+', s)
400 |
401 | # 匹配只留下中文、英文和数字
402 | re.sub(r'[^\u4E00-\u9FA5\s0-9a-zA-Z]+', '', s)
403 |
404 | # 日期解析202206
405 | import cn2an #version 0.5.14
406 | import datetime
407 | import re
408 | def getYearMonth(s):
409 | '''
410 | 【格式说明】
411 | 今年上个月/上月/前一个月/前个月 -> 202204
412 | 今年当月/该月/这月/这个月/本月 -> 202205
413 | 去年5月/去年五月/2021年五月/2021五月/二零二一五月/二零二一 五月 -> 202105
414 | 前年5月/前年五月/2020年五月/2020五月/二零二零五月/二零二零 五月 -> 202005
415 | 2021年7月/二零二一年7月 -> 202107
416 | 5月/五月份 -> 202205
417 | 2021.6/2021.06/2021-6/2021-06/2021 - 6月/2021 ---6月/2021 . 6月/2021...6月, -> 202106
418 | 2021 4月/2021 04 -> 202104
419 | 如果没有提到时间 -> 202205(默认今年当月)
420 | 如果输入的时间有误或月份有误比如输入2021 23, -> 202205(默认今年当月)
421 | 如果输入时间超过当前时间 -> 202205(默认今年当月)
422 | 如果输入时间早于2020年1月 -> 202205(默认今年当月)
423 | '''
424 | cur_date = datetime.datetime.now().strftime('%Y%m')
425 | try:
426 | DATE_REG1 = '(?:[一二三四五六七八九零十0-9]{1,4}年[一二三四五六七八九零十0-9]{1,2}月)|(?:去年[一二三四五六七八九零十0-9]+月)|(?:前年[一二三四五六七八九零十0-9]+月)|(?:[一二三四五六七八九零十0-9]+年[一二三四五六七八九零十0-9]+月)|(?:[一二三四五六七八九零十0-9]{1,2}月)|(?:[一二三四五六七八九零十0-9]+年)|(?:[一二三四五六七八九零十0-9]+月)'
427 | thism_lst = ['当月', '该月', '这个月', '本月']
428 | lastm_lst = ['上月', '上个月', '前一个月', '前个月']
429 | date = ''
430 | def helper(s, pattern):
431 | date = ''
432 | s = cn2an.transform(s, "cn2an") # 转换成阿拉伯数字
433 | res = re.findall(pattern, s)
434 | if res:
435 | res = res[0] # 如果有多个就取第一个
436 | year = '2022' #需要人工维护当年,还有过去两年的一个判断;每年要手动更新这部分
437 | if '去年' in res or '21年' in res:
438 | year = '2021'
439 | elif '前年' in res or '20年' in res:
440 | year = '2020'
441 | month = re.findall('(?:([0-9]+)月)', res)
442 | if month:
443 | month = int(month[0])
444 | if month > 0 and month < 13:
445 | if month < 10:
446 | month = '0' + str(month)
447 | else:
448 | month = str(month)
449 | else:
450 | return ''
451 | date = year + month
452 |                 else:
453 |                     date = year + datetime.datetime.now().strftime('%m')  # 注意月份补零,str(month)会得到'5'而不是'05'
454 | return date
455 | six_d = re.findall(r'2\d{5}', s) #直接识别6位日期比如202110
456 | if six_d:
457 | date = six_d[0]
458 | if not date:
459 | # 针对2021 4月/2021.6/2021.06/2021-6/2021-06/2021 - 6月/2021 ---6月/2021 . 6月/2021...6月这些情况
460 | DATE_REG3 = r'(?:\d{4}\s*\.+\s*\d{1,2})|(?:\d{4}\s*-+\s*\d{1,2})|(?:\d{4}\s*_+\s*\d{1,2})|(?:\d{4}\s+\d{1,2})'
461 | six_d2 = re.findall(DATE_REG3, s)
462 | if six_d2:
463 | _six_d2 = six_d2[0]
464 | try:
465 | int(_six_d2[-2])
466 | _six_d2_m = _six_d2[-2:]
467 | except:
468 | _six_d2_m = _six_d2[-1]
469 | s = _six_d2[:4]+'年'+_six_d2_m+'月'
470 | s = s.replace(' ', '')
471 | if not date:
472 | for i in thism_lst:
473 | if i in s:
474 | date = cur_date
475 | break
476 | if not date:
477 | for i in lastm_lst:
478 | if i in s:
479 | date = (datetime.datetime.now() - datetime.timedelta(days=30, hours=23)).strftime('%Y%m')
480 | break
481 | if not date:
482 | # 判断2021五月这种情况
483 | DATE_REG2 = '(?:[一二三四五六七八九零十0-9]{4}[一二三四五六七八九零十]{1,2}月)'
484 | res = re.findall(DATE_REG2, s)
485 | if res:
486 | s = res[0][:4]+'年'+res[0][4:]
487 | date = helper(s, DATE_REG1)
488 | else:
489 | date = ''
490 | if not date:
491 | date = helper(s, DATE_REG1)
492 | if not date:
493 | date = cur_date
494 | #corner case再判断下,处理下边界问题
495 | if date < '202001' or date[-2:] > '12':
496 | date = cur_date
497 | except:
498 | date = cur_date
499 | return date
500 | ```
501 |
502 | ### eval
503 | ```python
504 | eval("['一','二','三']")
505 | # 输出 ['一', '二', '三']
506 | eval("{'a':1,'b':2}")
507 | # 输出 {'a': 1, 'b': 2}
508 | ```
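
eval会执行任意代码,解析字面量时用标准库的ast.literal_eval更安全:

```python
import ast

# literal_eval只接受字面量(list/dict/str/数字等),传入可执行代码会直接报错
print(ast.literal_eval("['一','二','三']"))  # ['一', '二', '三']
print(ast.literal_eval("{'a':1,'b':2}"))     # {'a': 1, 'b': 2}
```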
509 |
510 | ### global
511 | ```python
512 | a = None
513 |
514 | def f1():
515 | a = 10
516 |
517 | def f2():
518 | global a
519 | a = 10
520 | f1()
521 | print(a)
522 | f2()
523 | print(a)
524 | ```
525 | 运行完f1()后,a还是None;运行完f2()后,a变成了10。一般规范global变量用大写
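类似地,嵌套函数里要修改外层(非全局)变量时用nonlocal,一个小例子:

```python
def outer():
    count = 0
    def inc():
        nonlocal count  # 声明修改的是outer里的count,而不是新建局部变量
        count += 1
    inc()
    inc()
    return count

print(outer())  # 2
```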
526 |
527 | ### 多进程与多线程实现
528 |
529 | ```python
530 | # 多进程实现举例
531 | from multiprocessing import Pool
532 | import os
533 | import time
534 |
535 | def long_time_task(a, b):
536 | print('Run task %s (%s)...' % (a, os.getpid()))
537 | start = time.time()
538 | time.sleep(1)
539 | end = time.time()
540 | print('Task %s runs %0.2f seconds.' % (a, (end - start)))
541 | return str(a) + '__pool__' + str(b)
542 |
543 |
544 | if __name__ == '__main__':
545 |
546 | print('Parent process %s.' % os.getpid())
547 | p = Pool(4)
548 | res = []
549 | for i in range(10):
550 | res.append(p.apply_async(long_time_task, args=(i, i+1)))
551 | print('Waiting for all subprocesses done...')
552 | p.close()
553 | p.join()
554 | print('All subprocesses done.')
555 | # 拿到子进程返回的结果
556 | for i in res:
557 | print('xxx', i.get())
558 | ```
559 | ```python
560 | # 多线程实现举例
561 | def func1(p1, p2, p3):
562 | pass
563 | def func2(p1, p2):
564 | pass
565 | from concurrent.futures import ThreadPoolExecutor, wait
566 | executor = ThreadPoolExecutor(max_workers=4)
567 | tasks = []
568 | tasks.append(executor.submit(func1, param1, param2, param3))
569 | tasks.append(executor.submit(func2, param1, param2))
570 | wait(tasks, return_when='ALL_COMPLETED')
571 | res1, res2 = (x.result() for x in tasks)
572 | ```
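
上面的func1/func2和param1等只是示意,下面给一个可以直接运行的小例子,用executor.map批量提交并按输入顺序取回结果:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# with语句结束时自动shutdown;map会保持输入顺序返回结果
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(5)))
print(results)  # [0, 1, 4, 9, 16]
```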
573 | ```python
574 | # 多进程优化版(推荐用这个)
575 | #!/usr/bin/env python
576 | # -*- coding: utf-8 -*-
577 | import functools
578 | from concurrent.futures import ProcessPoolExecutor
579 | from tqdm import tqdm
580 | import time
581 |
582 | class Pipe(object):
583 | """I am very like a linux pipe"""
584 |
585 | def __init__(self, function):
586 | self.function = function
587 | functools.update_wrapper(self, function)
588 |
589 | def __ror__(self, other):
590 | return self.function(other)
591 |
592 | def __call__(self, *args, **kwargs):
593 | return Pipe(
594 | lambda iterable, *args2, **kwargs2: self.function(
595 | iterable, *args, *args2, **kwargs, **kwargs2
596 | )
597 | )
598 |
599 | @Pipe
600 | def xProcessPoolExecutor(iterable, func, max_workers=5, desc="Processing", unit="it"):
601 | if max_workers > 1:
602 | total = len(iterable) if hasattr(iterable, '__len__') else None
603 |
604 | with ProcessPoolExecutor(max_workers) as pool, tqdm(total=total, desc=desc, unit=unit) as pbar:
605 | for i in pool.map(func, iterable):
606 | yield i
607 | pbar.update()
608 |
609 |     else:
610 |         yield from map(func, iterable)  # 注意:生成器函数里写return map(...)不会产出任何结果,要用yield from
611 |
612 | xtuple, xlist, xset = Pipe(tuple), Pipe(list), Pipe(set)
613 |
614 | def ff(x):
615 | for i in range(x):
616 | a = 1
617 | return x+2
618 |
619 | if __name__ == '__main__':
620 | dfs = []
621 | arr = [100000000,200000000,300000000,400000000]
622 | #without multiprocess
623 | for i in arr:
624 | dfs.append(ff(i))
625 | #with multiprocess
626 | dfs = arr | xProcessPoolExecutor(ff, 16) | xlist #这里的16是进程数,一般cpu有N核就起N-1个进程
627 | print(dfs)
628 | ```
629 | ```python
630 | # 多进程(yuanjie封装meutils) 以多进程读取data下pdf文件为例
631 | from meutils.pipe import *
632 | os.environ['LOG_PATH'] = 'pdf.log'
633 | from meutils.log_utils import *
634 | location = 'output' #pdf文件处理后保存的文件夹
635 | @diskcache(location=location)
636 | def func(file_path):
637 | try:
638 | df = pdf_layout(str(file_path)) #解析成字典 详见https://github.com/binzhouchn/deep_learning/blob/master/4_llm/1_%E5%90%91%E9%87%8F%E6%95%B0%E6%8D%AE%E5%BA%93/es/es.py 中的body字典
639 | with open(f'{location}/{file_path.stem}.txt', 'w', encoding='utf8') as f:
640 | json.dump(df, f, ensure_ascii=False)
641 | except Exception as e:
642 | logger.debug(f"{file_path}: {e}")
643 | logger.debug(f"{file_path}: {traceback.format_exc().strip()}")
644 | if __name__ == '__main__':
645 | ps = Path('./data/').glob('*.pdf') | xlist #将所有pdf文件都列出来
646 | dfs = ps | xProcessPoolExecutor(func, 16) | xlist #这里的16是进程数,一般cpu有N核就起N-1个进程
647 | ```
648 |
649 | ### cv的多进程实现
650 |
651 | ```python
652 | from datetime import datetime
653 | from multiprocessing import Manager, Process
654 | from sklearn.model_selection import KFold
655 | from tqdm import tqdm_notebook
656 | n = 5
654 | kf = KFold(n_splits=n, shuffle=False)
655 | mg = Manager()
656 | mg_list = mg.list()
657 | p_proc = []
658 |
659 | def lr_pred(i,tr,va,mg_list):
660 | print('%s stack:%d/%d'%(str(datetime.now()),i+1,n))
661 | clf = LogisticRegression(C=3)
662 | clf.fit(X[tr],y[tr])
663 | y_pred_va = clf.predict_proba(X[va])
664 | print('va acc:',myAcc(y[va], y_pred_va))
665 | mg_list.append((va, y_pred_va))
666 | # return mg_list # 可以不加
667 |
668 | print('main line')
669 | for i,(tr,va) in tqdm_notebook(enumerate(kf.split(X))):
670 | p = Process(target=lr_pred, args=(i,tr,va,mg_list,))
671 | p.start()
672 | p_proc.append(p)
673 | [p.join() for p in p_proc]
674 | # 最后把mg_list中的元组数据拿出来即可
675 | ```
676 |
677 | ### 保存数据
678 |
679 | ```python
680 | # 这里medical是mongodb的一个集合
681 | import json
682 | with open('../data/medical.json','w',encoding='utf-8') as fp:
683 | for i in medical.find():
684 | i['_id'] = i.get('_id').__str__() # 把bson的ObjectId转成str
685 | json.dump(i,fp, ensure_ascii=False)
686 |         fp.write('\n')
687 | # 注:with代码块结束后文件会自动关闭,无需再fp.close()
688 |
689 | # 使用pickle(保存)
690 | data = (x_train, y_train, x_test)
691 | f_data = open('./data_doc2vec_25.pkl', 'wb')
692 | pickle.dump(data, f_data)
693 | f_data.close()
694 | # 使用pickle(读取)
695 | f = open('./data_doc2vec_25.pkl', 'rb')
696 | x_train, _, x_test = pickle.load(f)
697 | f.close()
698 |
699 | ```
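
按行写入的json(jsonlines)读回来也很简单,逐行json.loads即可,小例子(写到临时目录,路径仅作示意):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'medical_demo.jsonl')
rows = [{'name': '阿司匹林'}, {'name': '布洛芬'}]
# 一行一个json对象地写入
with open(path, 'w', encoding='utf-8') as fp:
    for r in rows:
        json.dump(r, fp, ensure_ascii=False)
        fp.write('\n')
# 逐行读回
with open(path, encoding='utf-8') as fp:
    back = [json.loads(line) for line in fp]
print(back == rows)  # True
```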
700 |
701 | ### 保存模型
702 |
703 | 1. 使用 pickle 保存
704 | ```python
705 | import pickle #pickle模块
706 |
707 | #保存Model(注:save文件夹要预先建立,否则会报错)
708 | with open('save/clf.pickle', 'wb') as f:
709 | pickle.dump(clf, f)
710 |
711 | #读取Model
712 | with open('save/clf.pickle', 'rb') as f:
713 | clf2 = pickle.load(f)
714 | #测试读取后的Model
715 | print(clf2.predict(X[0:1]))
716 | ```
717 | 2. 使用joblib保存
718 | ```python
719 | import joblib  # 新版sklearn已移除sklearn.externals.joblib,直接pip install joblib后import
720 |
721 | #保存Model(注:save文件夹要预先建立,否则会报错)
722 | joblib.dump(clf, 'save/clf.pkl')
723 |
724 | #读取Model
725 | clf3 = joblib.load('save/clf.pkl')
726 |
727 | #测试读取后的Model
728 | print(clf3.predict(X[0:1]))
729 | ```
730 |
731 | 3. 可以使用dataframe自带的to_pickle函数,可以把大的文件存成多个
732 | ```python
733 | import os
734 | import gc
735 | import pandas as pd
736 | from glob import glob
737 | from tqdm import tqdm
738 | from sklearn.model_selection import KFold
736 | def mkdir_p(path):
737 | try:
738 | os.stat(path)
739 | except:
740 | os.mkdir(path)
741 |
742 | def to_pickles(df, path, split_size=3, inplace=True):
743 | """
744 | path = '../output/mydf'
745 |
746 |     write '../output/mydf/0.p'
747 | '../output/mydf/1.p'
748 | '../output/mydf/2.p'
749 |
750 | """
751 | if inplace==True:
752 | df.reset_index(drop=True, inplace=True)
753 | else:
754 | df = df.reset_index(drop=True)
755 | gc.collect()
756 | mkdir_p(path)
757 |
758 | kf = KFold(n_splits=split_size)
759 | for i, (train_index, val_index) in enumerate(tqdm(kf.split(df))):
760 | df.iloc[val_index].to_pickle(f'{path}/{i:03d}.p')
761 | return
762 |
763 | def read_pickles(path, col=None):
764 | if col is None:
765 | df = pd.concat([pd.read_pickle(f) for f in tqdm(sorted(glob(path+'/*')))])
766 | else:
767 | df = pd.concat([pd.read_pickle(f)[col] for f in tqdm(sorted(glob(path+'/*')))])
768 | return df
769 | ```
770 |
771 | ### enumerate
772 |
773 | ```python
774 | tuples = [(2,3),(7,8),(12,25)]
775 | for step, tp in enumerate(tuples):
776 | print(step,tp)
777 | # 0 (2, 3)
778 | # 1 (7, 8)
779 | # 2 (12, 25)
780 | ```
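
enumerate还可以用start参数指定起始序号:

```python
# start=1让序号从1开始计数
for step, ch in enumerate('abc', start=1):
    print(step, ch)
# 1 a
# 2 b
# 3 c
```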
781 |
782 | ### label数值化方法
783 |
784 | 方法一
785 | ```python
786 | # 比如10个类别转成1到10
787 | from sklearn.preprocessing import LabelEncoder
788 | data['label'] = LabelEncoder().fit_transform(data.categ_id)
789 | ```
790 | 方法二
791 | ```python
792 | # 比如10个类别转成onehot形式
793 | import pandas as pd
794 | pd.get_dummies(data.categ_id)
795 | ```
796 |
797 | 方法三
798 | 
799 | ### 列表推导式中使用if_else
800 | 
801 | 1. [x for x in data if condition]
802 | 2. [exp1 if condition else exp2 for x in data]
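
两种写法对应的小例子:

```python
data = [1, 2, 3, 4]
# 形式1:if放在for后面,起过滤作用,只保留满足条件的元素
evens = [x for x in data if x % 2 == 0]
print(evens)  # [2, 4]
# 形式2:if else放在for前面,起映射作用,每个元素都会产出一个值
signs = ['odd' if x % 2 else 'even' for x in data]
print(signs)  # ['odd', 'even', 'odd', 'even']
```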
817 |
818 | ### 将numpy_array中的最多的元素选出
819 |
820 | 将numpy array中的最多的元素选出,如果一样则取最小的那个
821 | ```python
822 | arr = np.array([2,2,2,4,5])
823 | np.bincount(arr).argmax()
824 | # output: 2
825 | arr = np.array([1,2,1,4,2,8])
826 | np.bincount(arr).argmax()
827 | # output: 1
828 | ```
829 |
830 | 将list中最多的元素选出,如果一样则取最小的那个
831 | ```python
832 | # 方法一
833 | arr = [2,2,2,4,5]
834 | max(set(arr),key=arr.count)
835 | # 方法二
836 | from collections import Counter
837 | Counter(arr).most_common(1)[0][0]
838 | ```
839 |
840 | ### 函数中传入函数demo
841 |
842 | ```python
843 | # time_function把时间包装了一下给其他的函数
844 | def time_function(f, *args):
845 | """
846 | Call a function f with args and return the time (in seconds) that it took to execute.
847 | """
848 | import time
849 | tic = time.time()
850 | f(*args)
851 | toc = time.time()
852 | return toc - tic
853 |
854 | two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)
855 | print('Two loop version took %f seconds' % two_loop_time)
856 |
857 | one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)
858 | print('One loop version took %f seconds' % one_loop_time)
859 |
860 | no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)
861 | print('No loop version took %f seconds' % no_loop_time)
862 | ```
863 |
864 | ### getattr
865 |
866 | ```python
867 | class A(object):
868 | def __init__(self):
869 | pass
870 | def xx(self,x):
871 | print('get xx func',x)
872 | a = A()
873 | getattr(a,'xx')(23213) ### 等同于a.xx(23213)
874 | #out[]: get xx func 23213
875 | ```
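
getattr还支持第三个参数作为属性不存在时的默认值,避免抛出AttributeError:

```python
class A(object):
    x = 1

a = A()
print(getattr(a, 'x', 0))  # 1
print(getattr(a, 'y', 0))  # 0,属性不存在时返回默认值
# 也常配合hasattr(a, 'y')先做判断
```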
876 |
877 | ### df宽变长及一列变多列
878 |
879 | (1) df宽变长
880 | ```python
881 | def explode(df, col, pat=None, drop_col=True):
882 | """
883 | :param df:
884 | :param col: col name
885 | :param pat: String or regular expression to split on. If None, splits on whitespace
886 | :param drop_col: drop col is Yes or No
887 | :return: hive explode
888 | """
889 | data = df.copy()
890 | data_temp = data[col].str.split(pat=pat, expand=True).stack().reset_index(level=1, drop=True).rename(col+'_explode')
891 | if drop_col:
892 |         data.drop(columns=col, inplace=True)  # 新版pandas中drop(col, 1)的位置参数写法已弃用
893 | return data.join(data_temp)
894 |
895 | df = pd.DataFrame([[1, 'a b c'],
896 | [2, 'a b'],
897 | [3, np.nan]], columns=['id', 'col'])
898 |
899 | explode(df, 'col', pat=' ')
900 | ```
901 | ```python
902 | # id col_explode
903 | #0 1 a
904 | #0 1 b
905 | #0 1 c
906 | #1 2 a
907 | #1 2 b
908 | #2 3 NaN
909 | ```
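
pandas 0.25以后自带DataFrame.explode,可以替代上面手写的explode函数,小例子:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'col': ['a b', 'c']})
# 先把字符串切成list,再explode成多行
out = df.assign(col=df['col'].str.split(' ')).explode('col')
print(out['col'].tolist())  # ['a', 'b', 'c']
```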
910 | (2) 一列变多列
911 | ```python
912 | df.col.str.split(' ', expand=True)
913 | ```
914 | ```python
915 | # 0 1 2
916 | #0 a b c
917 | #1 a b None
918 | #2 NaN NaN NaN
919 | ```
920 |
921 | ### groupby使用
922 |
923 | 根据df的personid进行groupby,统计一下用户消费consume这一列特征的相关聚合情况;
924 | 比如count, max, kurt
925 |
926 | ```python
927 | gr = df.groupby('personid')['consume']
928 | df_aggr = gr.agg([('_count','count'),('_max',np.max),('_kurt',pd.Series.kurt)]).reset_index()
929 |
930 | # 多个特征聚合统计值拼接
931 | df = df.merge(df_aggr, how='left', on='personid').fillna(0)
932 | ```
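
新版pandas(>=0.25)推荐用命名聚合的写法,效果等价且列名更直观,下面用构造数据示意:

```python
import pandas as pd

df = pd.DataFrame({'personid': [1, 1, 2], 'consume': [10, 30, 5]})
# 关键字参数名即输出列名
df_aggr = df.groupby('personid')['consume'].agg(_count='count', _max='max').reset_index()
print(df_aggr['_max'].tolist())  # [30, 5]
```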
933 |
934 | ### python画图显示中文
935 |
936 | ```python
937 | ## 显示中文解决方法
938 | # 解决方法一
939 | import matplotlib as mpl
940 | mpl.rcParams['font.sans-serif'] = ['SimHei']
941 | mpl.rcParams['font.serif'] = ['SimHei']
942 |
943 | # 如果方法一解决不了
944 | import matplotlib.pyplot as plt
945 | plt.rcParams['font.sans-serif'] = ['SimHei'] # 解决中文显示问题-设置字体为黑体
946 | plt.rcParams['axes.unicode_minus'] = False # 解决保存图像是负号'-'显示为方块的问题
947 |
948 | # 如果方法二解决不了
949 | import matplotlib
950 | zhfont = matplotlib.font_manager.FontProperties(fname='../simsun.ttc')
951 | plt.title("职业分布情况",fontproperties=zhfont)
952 | plt.xlabel("用户职业",fontproperties=zhfont)
953 | plt.ylabel("逾期用户比例",fontproperties=zhfont)
954 | #或者
955 | import seaborn as sns
956 | p = sns.color_palette()
957 | sns.set_style("darkgrid",{"font.sans-serif":['simhei', 'Arial']})
958 | fig = plt.figure(figsize=(20, 20))
959 | ax1 = fig.add_subplot(3, 2, 1) # 总共3行2列6张,这是第一张图
960 | ax1=sns.barplot(职业分布.index, 职业分布.逾期/职业分布.总数, alpha=0.8, color=p[0], label='train')
961 | ax1.legend()
962 | ax1.set_title(u'职业分布情况',fontproperties=zhfont)
963 | ax1.set_xlabel(u'用户职业',fontproperties=zhfont)
964 | ax1.set_ylabel(u'逾期用户比例',fontproperties=zhfont)
965 |
966 | # 杰哥的方法,这个比较好
967 | import os
968 | import matplotlib
969 | from pathlib import Path
970 | from matplotlib.font_manager import _rebuild  # 私有API,新版matplotlib已移除,可改用font_manager.fontManager.addfont
969 | def chinese_setting(url=None):
970 | """
971 | :param url: SimHei字体下载链接
972 | :return:
973 | """
974 | print('开始设置中文...')
975 | matplotlibrc_path = Path(matplotlib.matplotlib_fname())
976 | ttf_path = matplotlibrc_path.parent.__str__() + '/fonts/ttf'
977 | ttf_url = 'https://raw.githubusercontent.com/Jie-Yuan/Jie-Yuan.github.io/master/SimHei.ttf' if url is None else url
978 | if list(Path(ttf_path).glob('SimHei.ttf')):
979 | pass
980 | else:
981 | print('下载字体...')
982 | os.popen("cd %s && wget %s" % (ttf_path, ttf_url))
983 |
984 | print('设置字体...')
985 | setting1 = 'font.family: sans-serif'
986 | setting2 = 'font.sans-serif: SimHei, Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif'
987 | setting3 = 'axes.unicode_minus: False'
988 | os.system('echo > %s' % matplotlibrc_path)
989 | os.system('echo %s >> %s' % (setting1, matplotlibrc_path))
990 | os.system('echo %s >> %s' % (setting2, matplotlibrc_path))
991 | os.system('echo %s >> %s' % (setting3, matplotlibrc_path))
992 | _rebuild()
993 | print('请重启kernel测试...')
994 | chinese_setting()
995 | ```
996 |
997 |
998 | ```bash
999 | # Graphviz 中文乱码
1000 | centos5.x下
1001 | yum install fonts-chinese
1002 | centos6.x或7.x下
1003 | yum install cjkuni-ukai-fonts
1004 |
1005 | fc-cache -f -v 刷新字体缓存
1006 | ```
1007 |
1008 | ### 给字典按value排序
1009 |
1010 | ```python
1011 | model = xgb.train()
1012 | feature_score = model.get_fscore()
1013 | #{'avg_user_date_datereceived_gap': 1207,
1014 | # 'buy_total': 2391,
1015 | # 'buy_use_coupon': 557,
1016 | # 'buy_use_coupon_rate': 1240,
1017 | # 'count_merchant': 1475,
1018 | # 'coupon_rate': 5615,
1019 | # ...
1020 | # }
1021 | ```
1022 |
1023 | 方法一:
1024 | ```python
1025 | sorted(feature_score.items(), key=lambda x:x[1],reverse=True)
1026 | ```
1027 |
1028 | 方法二:
1029 | ```python
1030 | df = pd.DataFrame([(key, value) for key,value in feature_score.items()],columns=['key','value'])
1031 | df.sort_values(by='value',ascending=False,inplace=True)
1032 | ```
1033 |
1034 | ### sorted高级用法
1035 |
1036 | 用法一:
1037 | 这里,列表里的每一个元素都是二元组。key参数传入一个lambda表达式,x代表列表里的每一个元素,用索引取出元组的第一个或第二个元素,即指定sorted()按哪个元素排序;reverse参数起逆序作用,默认为False。
1038 | ```python
1039 | l=[('a', 1), ('b', 2), ('c', 6), ('d', 4), ('e', 3)]
1040 | sorted(l, key=lambda x:x[0], reverse=True)
1041 | # Out[40]: [('e', 3), ('d', 4), ('c', 6), ('b', 2), ('a', 1)]
1042 | sorted(l, key=lambda x:x[1], reverse=True)
1043 | # Out[42]: [('c', 6), ('d', 4), ('e', 3), ('b', 2), ('a', 1)]
1044 | ```
1045 |
1046 | 用法一(衍生):
1054 | ```python
1055 | # 调整数组顺序使奇数位于偶数前面,奇偶相对顺序不变
1056 | # 按照某个键值(即索引)排序,这里相当于对0和1进行排序
1057 | a = [3,2,1,5,8,4,9]
1058 | sorted(a, key=lambda c:c%2, reverse=True)
1059 | # key=a%2得到索引[1,0,1,1,0,0,1] 相当于给a打上索引标签[(1, 3), (0, 2), (1, 1), (1, 5), (0, 8), (0, 4), (1, 9)]
1060 | # 然后根据0和1的索引排序 得到[0,0,0,1,1,1,1]对应的数[2,8,4,3,1,5,9],
1061 | # 最后reverse的时候两块索引整体交换位置[1,1,1,1,0,0,0] 对应的数为[3, 1, 5, 9, 2, 8, 4] 这一系列过程数相对位置不变
1062 | ```
1063 |
1064 | 用法三:
1065 | 需要注意的是,在python3以后,sort方法和sorted函数中的cmp参数被取消,此时如果还需要使用自定义的比较函数,那么可以使用cmp_to_key函数(在functools中)
1066 | ```python
1067 | from functools import cmp_to_key
1068 | arr = [3,5,6,4,2,8,1]
1069 | def comp(x, y):
1070 | if x < y:
1071 | return 1
1072 | elif x > y:
1073 | return -1
1074 | else:
1075 | return 0
1076 |
1077 | sorted(arr, key=cmp_to_key(comp))
1078 | # Out[10]: [8,6,5,4,3,2,1]
1079 | ```
1080 |
1081 | 用法三(衍生):
1082 | 输入一个正整数数组,把数组里所有数字拼接起来排成一个数,打印能拼接出的所有数字中最小的一个。例如输入数组{3,32,321},则打印出这三个数字能排成的最小数字为321323。
1083 | ```python
1084 | # arrange the array into the smallest concatenated number
1085 | from functools import cmp_to_key
1086 | arr = [3, 32, 321]
1087 | arr = map(str, arr) # or [str(x) for x in arr]
1088 | ll = sorted(arr, key=cmp_to_key(lambda x,y:int(x+y)-int(y+x)))
1089 | print(int(''.join(ll)))
1090 | # Out[3]: 321323
1091 | ```
1092 |
1093 | ### time用法
1094 |
1095 | ```python
1096 | import time
1097 | s = 'Jun-96'
1098 | time.mktime(time.strptime(s,'%b-%y'))
1099 | # strptime parses the string into a struct_time according to the given format; mktime then converts the struct_time into a Unix timestamp
1100 | ```
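The reverse direction uses strftime; a quick round trip with the same format string as above:

```python
import time

s = 'Jun-96'
ts = time.mktime(time.strptime(s, '%b-%y'))        # string -> struct_time -> timestamp
back = time.strftime('%b-%y', time.localtime(ts))  # timestamp -> struct_time -> string
print(back)  # Jun-96
```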
1101 |
1102 | ### 两层列表展开平铺
1103 |
1104 | Three ways to flatten a nested list; the first two perform best.
1105 |
1106 | 1. Method 1
1107 | ```python
1108 | C = [[1,2],[3,4,5],[7]]
1109 | [a for b in C for a in b]
1110 | ```
1111 |
1112 | 2. Method 2
1113 | ```python
1114 | from itertools import chain
1115 | list(chain(*C))  # C as defined in method 1; `input` would shadow the builtin
1116 | # or: list(chain.from_iterable(C))
1117 | ```
1118 |
1119 | 3. Method 3
1120 | ```python
1121 | import functools
1122 | import operator
1123 | # using the built-in functools module
1124 | def functools_reduce(a):
1125 |     return functools.reduce(operator.concat, a)
1126 | ```
1127 |
1128 | ### 读取百度百科词向量
1129 |
1130 | ```python
1131 | from bz2 import BZ2File as b2f
1132 | path = 'data/sgns.target.word-ngram.1-2.dynwin5.thr10.neg5.dim300.iter5.bz2'
1133 | fp = b2f(path)
1134 | lines = fp.readlines()
1135 |
1136 | def get_baike_wv(lines):
1137 |     d_ = {}
1138 |     for line in lines:
1139 |         tmp = line.decode('utf-8').split(' ')
1140 |         d_[tmp[0]] = [float(x) for x in tmp[1:-1]]  # [1:-1] drops the trailing token left by the newline
1141 |     return d_
1143 | baike_wv_dict = get_baike_wv(lines)
1144 | ```
1145 |
1146 | ### logging
1147 |
1148 | ```python
1149 | import logging
1150 | #logger
1151 | def get_logger():
1152 |     FORMAT = '[%(levelname)s]%(asctime)s:%(name)s:%(message)s'
1153 |     logging.basicConfig(format=FORMAT)
1154 |     logger = logging.getLogger('main')
1155 |     logger.setLevel(logging.DEBUG)
1156 |     return logger
1157 |
1158 | logger = get_logger()
1159 |
1160 | logger.warning('Input data')
1161 | logger.info('cat treatment')
1162 | ```
1163 |
1164 | ### argparse用法
1165 |
1166 | argparse is the standard way to handle command-line arguments in Python.
1167 |
1168 | [arg_test.py](arg_test.py)
1169 | ```shell
1170 | # run in a shell:
1171 | python arg_test.py --train_path aa --dev_path bb
1172 | # output:
1173 | Namespace(dev_path='bb', log_level='info', train_path='aa')
1174 | aa
1175 | bb
1176 | done.
1177 | ```
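The same parser can also be exercised without a shell by handing parse_args an explicit argument list, which is handy in tests and notebooks; a minimal sketch mirroring arg_test.py:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train_path', help='Path to train data')
parser.add_argument('--dev_path', help='Path to dev data')
parser.add_argument('--log-level', dest='log_level', default='info', help='Logging level.')

# pass a list instead of letting argparse read sys.argv
opt = parser.parse_args(['--train_path', 'aa', '--dev_path', 'bb'])
print(opt.train_path, opt.dev_path, opt.log_level)  # aa bb info
```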
1178 |
1179 | ### 包管理
1180 |
1181 | A package contains three modules: mod1.py, mod2.py and mod3.py. When the package is imported with `from demopack import *`, how do you make sure only mod1 and mod3 are imported?
1182 | Answer: add an `__init__.py` file to the package containing:
1183 | ```python
1184 | __all__ = ['mod1','mod3']
1185 | ```
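To see the effect end to end, the package can be built on the fly in a temporary directory and star-imported (the package and module names below just mirror the example above):

```python
import os
import sys
import tempfile

pkg_root = tempfile.mkdtemp()
pkg = os.path.join(pkg_root, 'demopack')
os.mkdir(pkg)
for mod in ('mod1', 'mod2', 'mod3'):
    with open(os.path.join(pkg, mod + '.py'), 'w') as f:
        f.write(f"name = '{mod}'\n")
# __all__ in __init__.py decides what `from demopack import *` brings in
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write("__all__ = ['mod1', 'mod3']\n")

sys.path.insert(0, pkg_root)
ns = {}
exec('from demopack import *', ns)
print(sorted(k for k in ns if k.startswith('mod')))  # ['mod1', 'mod3']
```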
1186 |
1187 | ### 装饰器
1188 |
1189 | [Decorator reference (reasonably good)](https://blog.csdn.net/qq_41853758/article/details/82853811)
1190 | ```python
1191 | # one example: decorating a function that returns a value
1192 | def function(func):  # the decorator builds a closure
1193 |     def func_in(*args, **kwargs):  # the inner function is what actually runs, so it must accept the original function's arguments
1194 |         print('this is the extra behaviour added by the decorator')
1195 |         num = func(*args, **kwargs)  # call the wrapped function with the same arguments and capture its return value
1196 |         return num  # hand the return value back through the new test()
1197 |     return func_in
1198 | @function
1199 | def test(a, b):
1200 |     return a + b
1201 | print(test(3, 4))
1202 | # this is the extra behaviour added by the decorator
1203 | # 7
1204 | ```
1205 |
1206 | ### 本地用python起http服务
1207 |
1208 | ```shell
1209 | python -m http.server 7777
1210 | ```
1211 |
1212 | ### cache
1213 |
1214 | [好用的cache包](https://github.com/tkem/cachetools)
1215 | ```python
1216 | from cachetools import cached, LRUCache, TTLCache
1217 |
1218 | # speed up calculating Fibonacci numbers with dynamic programming
1219 | @cached(cache={})
1220 | def fib(n):
1221 |     return n if n < 2 else fib(n - 1) + fib(n - 2)
1222 |
1223 | # cache least recently used Python Enhancement Proposals
1224 | @cached(cache=LRUCache(maxsize=32))
1225 | def get_pep(num):
1226 |     url = 'http://www.python.org/dev/peps/pep-%04d/' % num
1227 |     with urllib.request.urlopen(url) as s:
1228 |         return s.read()
1229 |
1230 | # cache weather data for no longer than ten minutes
1231 | @cached(cache=TTLCache(maxsize=1024, ttl=600))
1232 | def get_weather(place):
1233 |     return owm.weather_at_place(place).get_weather()
1234 | ```
1235 | Placed on a function, the decorator caches inputs and return values, so calling it again with the same input returns in about a millisecond; both the cache policy (e.g. LRU) and a data expiry time (ttl) are configurable.
1236 |
1237 | ### 创建文件
1238 |
1239 | Create the output directory if it does not exist (pass parents=True so missing intermediate directories are created as well):
1240 | ```python
1241 | from pathlib import Path
1242 | Path(OUT_DIR).mkdir(parents=True, exist_ok=True)
1243 | ```
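Since the section title is about creating files: `Path.touch` creates an empty file, while `mkdir(parents=True)` builds any missing intermediate directories. A small sketch using a throwaway temp directory (the `a/b/result.txt` names are made up for the demo):

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
out_dir = base / 'a' / 'b'                   # nested path that does not exist yet
out_dir.mkdir(parents=True, exist_ok=True)   # create intermediate directories too
f = out_dir / 'result.txt'
f.touch(exist_ok=True)                       # create the empty file if it is missing
print(f.exists())  # True
```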
1244 |
1245 | ### 字典转成对象
1246 |
1247 | ```python
1248 | class MyDict(dict):
1249 | __setattr__ = dict.__setitem__
1250 | __getattr__ = dict.__getitem__
1251 |
1252 |
1253 | def dict_to_object(_d):
1254 |     if not isinstance(_d, dict):
1255 |         return _d
1256 |     inst = MyDict()
1257 |     for k, v in _d.items():
1258 |         inst[k] = dict_to_object(v)  # recurse into nested dicts
1259 |     return inst
1260 | ```
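Usage sketch (the two definitions above are repeated so the snippet runs on its own; the config keys are made up):

```python
class MyDict(dict):
    __setattr__ = dict.__setitem__
    __getattr__ = dict.__getitem__

def dict_to_object(_d):
    if not isinstance(_d, dict):
        return _d
    inst = MyDict()
    for k, v in _d.items():
        inst[k] = dict_to_object(v)  # recurse into nested dicts
    return inst

cfg = dict_to_object({'model': {'name': 'bert', 'layers': 12}, 'lr': 1e-3})
print(cfg.model.name, cfg.model.layers, cfg.lr)  # bert 12 0.001
```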
1261 |
1262 | ### boost安装
1263 |
1264 | ```shell
1265 | sudo apt-get install libboost-all-dev
1266 | sudo apt install ocl-icd-opencl-dev
1267 | sudo apt install cmake  # or download e.g. cmake-3.14.0.tar.gz from https://cmake.org/files, then ./bootstrap && make && make install
1268 | ```
1269 |
1270 | Installing the GPU build of LightGBM:
1271 | ```shell
1272 | pip install --upgrade pip
1273 | pip install lightgbm --install-option=--gpu
1274 | ```
1275 | Installing the GPU build of XGBoost:
1276 | ```shell
1277 | git clone --recursive https://github.com/dmlc/xgboost
1278 | cd xgboost
1279 | mkdir build
1280 | cd build
1281 | cmake .. -DUSE_CUDA=ON
1282 | make  # make -j4 is faster but may fail
1283 |
1284 | cd ..
1285 | cd python-package
1286 | python setup.py install
1287 | ```
1288 |
1289 | ### tqdm
1290 |
1291 | [当Pytorch遇上tqdm](https://blog.csdn.net/dreaming_coder/article/details/113486645)
1292 | ```python
1293 | import time
1294 | from datetime import datetime
1295 | from random import random
1296 | from tqdm import tqdm
1297 |
1298 | for epoch in range(num_epochs):  # num_epochs defined elsewhere
1299 |     with tqdm(
1300 |         iterable=train_loader,
1301 |         bar_format='{desc} {n_fmt:>4s}/{total_fmt:<4s} {percentage:3.0f}%|{bar}| {postfix}',
1302 |     ) as t:
1303 |         start_time = datetime.now()
1304 |         loss_list = []
1305 |         for batch, data in enumerate(train_loader):
1306 |             t.set_description_str(f"\33[36m【Epoch {epoch + 1:04d}】")
1307 |             # training code goes here
1308 |             time.sleep(1)
1309 |             # compute the current loss
1310 |             loss = random()
1311 |             loss_list.append(loss)
1312 |             cur_time = datetime.now()
1313 |             delta_time = cur_time - start_time
1314 |             t.set_postfix_str(f"train_loss={sum(loss_list) / len(loss_list):.6f}, elapsed: {delta_time}\33[0m")
1315 |             t.update()
1311 | ```
1312 |
1313 | ### joblib_parallel
1314 |
1315 |
1316 | ```python
1317 | # Parallel for-loop; also handy for reading many data files in parallel
1318 | from joblib import Parallel, delayed
1319 | from math import sqrt
1320 | def ff(num):
1321 |     return [sqrt(n ** 3) for n in range(num)]
1322 | # without parallelism: ~7.5s
1323 | res = []
1324 | for i in range(10, 7000):
1325 |     res.append(ff(i))
1326 | # with parallelism (all cores): ~2.75s
1327 | res = Parallel(n_jobs=-1, verbose=1)(delayed(ff)(i) for i in range(10, 7000))
1328 | ```
1329 |
1330 | ### 调试神器pysnooper
1331 |
1332 | ```python
1333 | # pip install pysnooper
1334 | import os
1335 | os.environ['pysnooper'] = '1'  # on/off switch
1336 |
1337 | from pysnooper import snoop
1338 | # if the switch is '0', redefine snoop as a do-nothing decorator
1339 | if os.environ['pysnooper'] == '0':
1340 |     import wrapt
1341 |     def snoop(*args, **kwargs):
1342 |         @wrapt.decorator
1343 |         def wrapper(wrapped, instance, args, kwargs):
1344 |             return wrapped(*args, **kwargs)
1345 |         return wrapper
1346 | ```
1347 |
1348 | ### 调试神器debugpy
1349 |
1350 | Install: pip install debugpy -U
1351 | Then add this at the very top of the Python script:
1352 | ```python
1353 | import debugpy
1354 | try:
1355 |     # 5678 is the default attach port in the VS Code debug configurations. Unless a host and port are specified, host defaults to 127.0.0.1
1356 |     debugpy.listen(("localhost", 9501))
1357 |     print("Waiting for debugger attach")
1358 |     debugpy.wait_for_client()
1359 | except Exception as e:
1360 |     pass
1361 |
1362 | ```
1363 |
1364 | In VS Code, create a .vscode directory in the project root, then add a launch.json; see the configuration that attaches to port 9501:
1365 | ```json
1366 | {
1367 |     // Use IntelliSense to learn about possible attributes.
1368 |     // Hover to view descriptions of existing attributes.
1369 |     // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
1370 |     "version": "0.2.0",
1371 |     "configurations": [
1372 |         {
1373 |             "name": "torchr_ex2",
1374 |             "type": "python",
1375 |             "request": "launch",
1376 |             "program": "/Users/zb/anaconda3/envs/rag/bin/torchrun",
1377 |             "console": "integratedTerminal",
1378 |             "justMyCode": true,
1379 |             "args": [
1380 |                 "--nnodes",
1381 |                 "1",
1382 |                 "--nproc-per-node",
1383 |                 "2",
1384 |                 "${file}",
1385 |                 "--model_name_or_path",
1386 |                 "my_model_bz"
1387 |             ]
1388 |         },
1389 |         {
1390 |             "name": "sh_file_debug",
1391 |             "type": "debugpy",
1392 |             "request": "attach",
1393 |             "connect": {
1394 |                 "host": "localhost",
1395 |                 "port": 9501
1396 |             }
1397 |         }
1398 |     ]
1399 | }
1400 | ```
1401 |
1402 | Use the same port number (e.g. 9501) in both the code and launch.json; don't mix them up!
1403 |
1404 | ### 分组计算均值并填充
1405 |
1406 | ```python
1407 | def pad_mean_by_group(df, gp_col='stock_id'):
1408 |     # keep only the columns that need processing
1409 |     cols = [col for col in df.columns if col not in ["stock_id", "time_id", "target", "row_id"]]
1410 |     # boolean mask of the NaNs
1411 |     df_na = df[cols].isna()
1412 |     # per-group means
1413 |     df_mean = df.groupby(gp_col)[cols].mean()
1414 |
1415 |     # process each column in turn
1416 |     for col in cols:
1417 |         na_series = df_na[col]
1418 |         names = list(df.loc[na_series, gp_col])
1419 |
1420 |         t = df_mean.loc[names, col]
1421 |         t.index = df.loc[na_series, col].index
1422 |
1423 |         # assign via the matching index
1424 |         df.loc[na_series, col] = t
1425 |     return df
1426 | train_pca = pad_mean_by_group(train_pca)
1427 | ```
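The per-column loop above can usually be collapsed into one groupby-transform, which fills each group's NaNs with that group's own mean in a single pass (a sketch on toy data; the stock_id/f1 names follow the example above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stock_id': [1, 1, 2, 2],
    'f1': [1.0, np.nan, 10.0, 30.0],
})
# fill each NaN with the mean of its own stock_id group
df['f1'] = df.groupby('stock_id')['f1'].transform(lambda s: s.fillna(s.mean()))
print(df['f1'].tolist())  # [1.0, 1.0, 10.0, 30.0]
```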
1428 |
1429 | ### python日期处理
1430 |
1431 | [80 examples to fully master date and time handling in Python](https://mp.weixin.qq.com/s/2bJUZBfWS_8ULGrb9tRpmw)
1432 |
1433 | ### dataclass
1434 |
1435 | dataclass offers a concise way to create data classes, generating default __init__(), __repr__() and __eq__() implementations.
1436 | Dataclasses can be nested inside one another.
1437 | Instances can be made immutable with @dataclass(frozen=True).
1438 |
1439 | Without dataclass:
1440 |
1441 | ```python
1442 | class Person:
1443 |     def __init__(self, name, age):
1444 |         self.name = name
1445 |         self.age = age
1446 | p = Person('test', 18)
1447 | q = Person('test', 18)
1448 | #<__main__.Person at 0x7ff4ade66f40>
1449 | str(p)
1450 | repr(p)
1451 | #'<__main__.Person object at 0x7ff4ade66f40>'
1452 | p == q
1453 | #False
1454 | ```
1455 | With dataclass:
1456 | ```python
1456 | from typing import Any
1457 | from dataclasses import dataclass
1458 | @dataclass
1459 | class Person:
1460 |     name: Any
1461 |     age: Any = 18
1462 | p = Person('test', 18)
1463 | q = Person('test', 18)
1464 | #Person(name='test', age=18)
1465 | str(p)
1466 | repr(p)
1467 | #"Person(name='test', age=18)"
1468 | p == q
1469 | #True
1470 | ```
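The frozen=True option mentioned above makes instances read-only; any attribute assignment raises FrozenInstanceError:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str
    age: int = 18

p = Person('test')
try:
    p.age = 30  # assignment is rejected on a frozen dataclass
except FrozenInstanceError:
    print('immutable')  # immutable
```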
1471 |
1472 | ### md5_sha256
1473 |
1474 | ```python
1475 | import hashlib
1476 |
1477 | def enc(s, ed='md5'):
1478 |     if ed == 'md5':
1479 |         hash_object = hashlib.md5(s.encode())
1480 |     elif ed == 'sha256':
1481 |         hash_object = hashlib.sha256(s.encode())
1482 |     else:
1483 |         raise ValueError('unsupported type!')
1484 |     hash_hex = hash_object.hexdigest()
1485 |     return hash_hex
1486 |
1487 | for i in ['13730973320','13802198853','17619520726']:
1488 |     print(enc(i, 'md5'))
1489 | ```
1490 |
1491 | ### 查看内存
1492 |
1493 | There are several ways to measure an object's size in Python: sys.getsizeof() returns the exact size of a single object, objgraph.show_refs() visualizes an object's reference structure, and psutil.Process().memory_info().rss reports all memory currently allocated to the process.
1494 |
1495 | ```python
1496 | >>> import numpy as np
1497 | >>> import sys
1498 | >>> import objgraph
1499 | >>> import psutil
1500 | >>> import pandas as pd
1501 |
1502 | >>> ob = np.ones((1024, 1024, 1024, 3), dtype=np.uint8)
1503 |
1504 | ### Check object 'ob' size
1505 | >>> sys.getsizeof(ob) / (1024 * 1024)
1506 | 3072.0001373291016
1507 |
1508 | ### Check current memory usage of whole process (include ob and installed packages, ...)
1509 | >>> psutil.Process().memory_info().rss / (1024 * 1024)
1510 | 3234.19140625
1511 |
1512 | ### Check structure of 'ob' (Useful for class object)
1513 | >>> objgraph.show_refs([ob], filename='sample-graph.png')
1514 |
1515 | ### Check memory for pandas.DataFrame
1516 | >>> from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; use an older version or substitute another dataset
1517 | >>> data = load_boston()
1518 | >>> data = pd.DataFrame(data['data'])
1519 | >>> print(data.info(verbose=False, memory_usage='deep'))
1520 |
1521 | RangeIndex: 506 entries, 0 to 505
1522 | Columns: 13 entries, 0 to 12
1523 | dtypes: float64(13)
1524 | memory usage: 51.5 KB
1525 |
1526 | ### Check memory for pandas.Series
1527 | >>> data[0].memory_usage(deep=True) # deep=True to include all the memory used by underlying parts that construct the pd.Series
1528 | 4176
1529 | ```
1530 |
1531 | ### slots用法
1532 |
1533 | ```python
1534 | # without __slots__ it is easy to attach an extra job attribute at runtime
1535 | class Author:
1536 |     def __init__(self, name, age):
1537 |         self.name = name
1538 |         self.age = age
1539 |
1540 | me = Author('Yang Zhou', 30)
1541 | me.job = 'Software Engineer'
1542 | print(me.job)
1543 | # Software Engineer
1544 |
1545 | # in most cases we don't need to add attributes or methods to an instance at runtime, and __dict__ shouldn't change after the class is defined; Python provides __slots__ for exactly this
1546 | class Author:
1547 |     __slots__ = ('name', 'age')
1548 |
1549 |     def __init__(self, name, age):
1550 |         self.name = name
1551 |         self.age = age
1552 |
1553 | me = Author('Yang Zhou', 30)
1554 | me.job = 'Software Engineer'
1555 | print(me.job)
1556 | # AttributeError: 'Author' object has no attribute 'job'
1557 | ```
1558 |
1559 |
1560 |
1561 |
--------------------------------------------------------------------------------
/01_basic/arg_test.py:
--------------------------------------------------------------------------------
1 | __author__ = 'binzhou'
2 | __time__ = '20190116'
3 |
4 | import argparse
5 | import logging
6 |
7 | parser = argparse.ArgumentParser()
8 | parser.add_argument('--train_path', action='store', dest='train_path',
9 |                     help='Path to train data')
10 | parser.add_argument('--dev_path', action='store', dest='dev_path',
11 |                     help='Path to dev data')
12 | parser.add_argument('--log-level', dest='log_level',
13 |                     default='info',
14 |                     help='Logging level.')
15 |
16 | opt = parser.parse_args()
17 | print(opt)
18 |
19 | print('----------------------------------------')
20 |
21 | LOG_FORMAT = '%(asctime)s %(name)-12s %(levelname)-8s %(message)s'
22 | logging.basicConfig(format=LOG_FORMAT, level=getattr(logging, opt.log_level.upper()))
23 | # logging.info(opt)
24 |
25 | if opt.train_path is not None:
26 |     print(opt.train_path)
27 | if opt.dev_path is not None:
28 |     print(opt.dev_path)
29 |
30 | print('done.')
31 |
32 |
--------------------------------------------------------------------------------
/02_numpy/README.md:
--------------------------------------------------------------------------------
1 | ## Contents
2 |
3 | [**1. numpy dtypes**](#numpy类型)
4 |
5 | [**2. np.where usage**](#np_where用法)
6 |
7 | [**3. array methods**](#数组方法)
8 |
9 | [**4. copy & deep copy**](#deep_copy)
10 |
11 | [**5. flatten vs ravel**](#flatten_ravel)
12 |
13 | [**6. turning a pandas column of lists into a 2-D numpy array**](#pandas一列数组转numpy二维数组)
14 |
15 | [**7. changing a numpy array's dtype**](#numpy_array改dtype方法)
16 |
17 | [**8. saving and loading numpy data**](#numpy存取数据)
18 |
19 | ---
20 |
21 | ### numpy类型
22 |
23 | The main types:
24 |
25 | |Basic type|Available **NumPy** types|Notes
26 | |--|--|--
27 | |Boolean|`bool_`|1 byte
28 | |Integer|`int8, int16, int32, int64, int_`| `int_` is as large as a C `long`
29 | |Unsigned integer|`uint8, uint16, uint32, uint64, uint`| `uint` is as large as a C `unsigned long`
30 | |Float| `float16, float32, float64, longdouble`|defaults to double precision `float64`; `longdouble` precision is platform-dependent
31 | |Complex| `complex64, complex128, clongdouble`| defaults to `complex128`, i.e. double-precision real and imaginary parts
32 | |String| `str_, bytes_` | e.g. `dtype='S4'` gives an array of 4-byte byte strings
33 | |Object| `object_` |elements can be arbitrary Python objects|
34 | |Records| `void` ||
35 | |Time| `datetime64, timedelta64` ||
36 |
37 |
38 | ### np_where用法
39 | ```python
40 | arr = np.array([ 0.31593257, 0.33837679, 0.38240686, 0.38970056, 0.54940456])
41 | pd.Series(np.where(arr > 0.5, 1, 0), name='result').to_csv(path_save, index=False, header=True)
42 | ```
43 | This assigns 1 where arr is greater than 0.5 and 0 elsewhere, names the resulting Series 'result', and saves it as a CSV without the index but with the header.
44 |
45 | ### 数组方法
46 | ```python
47 | import numpy as np
48 | a = np.array([[1,2,3],
49 |               [4,5,6]])
50 |
51 | # sums: overall, along axis 0, along axis 1; the reductions below all work the same way
52 | a.sum()           # 21
53 | np.sum(a)         # 21
54 | a.sum(axis=0)     # array([5, 7, 9])
55 | a.sum(axis=1)     # array([ 6, 15])
56 | # product
57 | a.prod()          # 720
58 | np.prod(a)        # 720
59 | # min/max and the (flattened) positions where they occur
60 | a.min()           # 1
61 | a.max()           # 6
62 | a.argmin()        # 0
63 | a.argmax()        # 5
64 | # mean and standard deviation
65 | a.mean()
66 | a.std()
67 | # clip limits values to a range
68 | a.clip(3, 5)      # array([[3, 3, 3], [4, 5, 5]])
69 | # ptp is the difference between the max and the min
70 | a.ptp()           # 5
71 |
72 | # a random 3x4 matrix displayed with 3 decimal places
73 | from numpy.random import rand
74 | a = rand(3, 4)
75 | %precision 3  # IPython magic; applies to the whole session
75 | ```
76 |
77 | ### deep_copy
78 | ```python
79 | a = np.array([1,2,3])
80 | b1 = a  # b1 is just another name for a: they share the same memory, so any change made through b1 shows up in a
81 | b2 = a.copy()  # deep copy: b2 gets its own copy of the data elsewhere in memory
82 | ```
83 |
84 | ### flatten_ravel
85 | flatten returns a (deep) copy, while ravel returns a view
86 | ```python
87 | a = np.arange(8).reshape(2,4)
88 | b1 = a.flatten()
89 | b2 = a.ravel()
90 | # changing b1 leaves a untouched; changing b2 changes a too!
91 | ```
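A quick check of that claim: writing through the copy leaves a alone, writing through the view does not:

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
b1 = a.flatten()  # copy
b2 = a.ravel()    # view
b1[0] = 100       # does not touch a
b2[1] = 200       # writes through to a
print(a[0, 0], a[0, 1])  # 0 200
```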
92 |
93 | ### pandas一列数组转numpy二维数组
94 | ```python
95 | print(df)
96 | # nn
97 | # 0 [1,2,3]
98 | # 1 [4,5,6]
99 | # 2 [7,8,9]
100 | ## goal: turn the df.nn column into a 2-D array
101 | np.array([df.nn])[0]
102 | # array([[1, 2, 3],
103 | #        [4, 5, 6],
104 | #        [7, 8, 9]], dtype=object)
105 | ## note the result's dtype is object; cast it to int, float, etc. as needed
106 | ## see the numpy_array改dtype方法 section for how
107 | ```
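Two common alternatives that produce a numeric dtype directly, assuming every row's list has the same length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'nn': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
arr1 = np.array(df.nn.tolist())    # dtype is inferred as integer, not object
arr2 = np.stack(df.nn.to_numpy())  # same result
print(arr1.shape)  # (3, 3)
```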
108 |
109 | ### numpy_array改dtype方法
110 | ```python
111 | # use the astype method
112 | print(arr)
113 | # array([[1, 2, 3],
114 | #        [4, 5, 6],
115 | #        [7, 8, 9]], dtype=object)
116 | arr.astype(float)  # np.float was removed in NumPy 1.24; use float or np.float64
117 | ```
118 |
119 | ### numpy存取数据
120 |
121 | ```python
122 | np.save('xxx.npy',data)
123 | np.load('xxx.npy')
124 | ```
--------------------------------------------------------------------------------
/03_pandas/README.md:
--------------------------------------------------------------------------------
1 | ## Contents
2 |
3 | [300 advanced pandas exercises](https://www.heywhale.com/mw/project/6146c0318447b8001769ff20)
4 |
5 | [Handy parallel data-processing alternatives to pandas](#数据平行处理)
6 |
7 | [**1. pandas parallelism packages**](#pandas并行包)
8 |
9 | [**2. creating a pandas dataframe by hand**](#pandas_dataframe手动创建)
10 |
11 | [**3. apply in pandas dataframes**](#pandas_dataframe中apply用法)
12 |
13 | [**4. map in pandas dataframes**](#pandas_dataframe中map用法)
14 |
15 | [**5. groupby usage**](#groupby用法)
16 |
17 | [**6. explode usage**](#explode用法)
18 |
19 | [**7. sort usage**](#sort用法)
20 |
21 | [**8. left join usage**](#left_join用法)
22 |
23 | [**9. reset_index usage**](#reset_index用法)
24 |
25 | [**10. quoting fields and values in to_csv**](#to_csv字段和值加引号操作)
26 |
27 | [**11. combining tables with concat, merge and join**](#合并数据表)
28 |
29 | [**12. pivot tables**](#数据透视表)
30 |
31 | [**13. shuffle**](#shuffle)
32 |
33 | [**14. reordering dataframe columns**](#dataframe交换列的顺序)
34 |
35 | [**15. selecting rows on two conditions**](#dataframe设置两个条件取值)
36 |
37 | [**16. saving a dataframe in h5 format**](#dataframe用h5格式保存)
38 |
39 | [**17. assign usage**](#assign用法)
40 |
41 | [**18. filling one column's NaNs from another column**](#用一列的非空值填充另一列对应行的空值)
42 |
43 | [**19. modifying dataframe values**](#dataframe修改值)
44 |
45 | [**20. filling a dataframe forward**](#dataframe表格填充)
46 |
47 | [**21. speeding up dataframe reading**](#加快dataframe读取)
48 |
49 | [**22. dataframe heatmaps**](#df热力图)
50 |
51 | [**23. dataframe choropleth maps**](#df热力地图)
52 |
53 | [**24. two pandas EDA plugins**](#eda插件)
54 |
55 | [**25. bulk-inserting into MySQL from Python**](#python批量插入mysql数据库)
57 | ---
58 |
59 | ### 数据平行处理
60 |
61 | [polars]
62 | https://pola-rs.github.io/polars-book/user-guide/quickstart/intro.html
63 | https://pola-rs.github.io/polars/py-polars/html/reference
64 |
65 | [pandarallel](https://nalepae.github.io/pandarallel/)
66 |
67 |
68 | ### pandas_dataframe手动创建
69 |
70 | Create a DataFrame by hand:
71 | ```python
72 | arr = np.array([['John','Lily','Ben'],[11,23,56]])
73 | df = pd.DataFrame(arr.transpose(),columns=['name','age'])
74 | ```
75 |
76 |
77 | ### pandas_dataframe中apply用法
78 |
79 | To find the rows whose address contains both '-' and ',':
80 | ```python
81 | df[df.address.apply(lambda x: ('-' in list(x)) and (',' in list(x)))]
82 | ```
83 |
84 | > See also the apply notes in 01_basic.
85 |
86 | ### pandas_dataframe中map用法
87 |
88 | ```python
89 | df["season"] = df.season.map({1: "Spring", 2 : "Summer", 3 : "Fall", 4 :"Winter" })
90 | # map the numbers to strings
91 | ```
92 |
93 | ### groupby用法
94 |
95 | [**Example 1**]
96 | ```python
97 | gr = df.groupby(by='EID')
98 | gr.agg({'BTBL':'max','BTYEAR':'count'}).reset_index()  # common aggregations: max, min, count, mean, first, nunique
99 | ```
100 | ||EID|BTBL|BTYEAR
101 | |--|--|--|--
102 | |0|4|0.620|2011
103 | |1|38|0.700|2013
104 | |2|51|0.147|2002
105 |
106 | Here df is grouped by EID, the BTBL and BTYEAR columns are aggregated (max and count respectively), and the index is reset.
107 |
108 | [**Example 2**]
109 |
110 | ||EID|ALTERNO|ALTDATE|ALTBE|ALTAF
111 | |--|--|--|--|--|--
112 | |1|399|05|2014-01|10|50
113 | |2|399|12|2015-05|NaN|NaN
114 | |3|399|12|2013-12|NaN|NaN
115 | |4|399|27|2014-01|10|50
116 | |5|399|99|2014-01|NaN|NaN
117 |
118 | Group by EID, then count how many distinct months occur in each group:
119 | ```python
120 | # method 1
121 | def f(ll):
122 |     fun = lambda x: x.split('-')[1]
123 |     return len(set(map(fun, list(ll))))
124 | # the same thing as a lambda wrapping another lambda
125 | f = lambda ll: len(set(map(lambda x: x.split('-')[1], list(ll))))
126 |
127 | p = pd.merge(data0, data2.groupby('EID').agg({'ALTERNO':'nunique','ALTDATE':f}).reset_index().rename(columns={'ALTERNO':'alt_count','ALTDATE':'altdate_nunique'}), how='left', on='EID')
128 |
129 | # method 2
130 | data2['year'] = data2.ALTDATE.apply(lambda x: x.split('-')[0])
131 | data2['month'] = data2.ALTDATE.apply(lambda x: x.split('-')[1])
132 | data2.groupby('EID').agg({'month':'nunique'}).reset_index().rename(columns={'month':'month_nunique'})
133 | ```
134 |
135 | ### explode用法
136 |
137 | **1. Suppose a dataframe looks like this:**
138 |
139 | ||city|community|longitude|latitude|address
140 | |--|--|--|--|--|--
141 | |1|上海|东方庭院|121.044|31.1332|复兴路88弄,珠安路77弄,浦祥路377弄
142 |
143 | Run:
144 | ```python
145 | data.drop('address',axis=1).join(data['address'].str.split(',',expand=True).stack().reset_index(level=1,drop=True).rename('address'))
146 |
147 | # the spark equivalent, using explode
148 | spark_df = spark_df.select(spark_df['city'], spark_df['community_org'], spark_df['community'],
149 |                            spark_df['longitude'], spark_df['latitude'], (explode(split('address',','))).alias('address'), spark_df['villagekey'])
150 | ```
151 | ||city|community|longitude|latitude|address
152 | |--|--|--|--|--|--
153 | |1|上海|东方庭院|121.044|31.1332|复兴路88弄|
154 | |2|上海|东方庭院|121.044|31.1332|珠安路77弄|
155 | |3|上海|东方庭院|121.044|31.1332|浦祥路377弄|
156 |
157 | **2. pandas 0.25+ has a built-in explode function**
158 |
159 | ||col_a|col_b
160 | |--|--|--
161 | |0|10|[111, 222]
162 | |1|11|[333, 444]
163 |
164 | ```python
165 | df.explode('col_b')  # yields the table below
166 | ```
167 |
168 | ||col_a|col_b
169 | |--|--|--
170 | |0|10|111
171 | |0|10|222
172 | |1|11|333
173 | |1|11|444
174 |
175 | **3. Splitting a JSON column into several columns**
176 | ```python
177 | import json
178 | df = pd.DataFrame([[10,'0000003723','{"aa":"001","bb":"002","channel":"c1"}'],\
179 |     [14,'0000003723','{"aa":"001","bb":"002","xxx":"c1"}'],\
180 |     [11,'0092837434','{"aa":"003","bb":"004","cc":"010","channelDetails":{"channel":"c2"}}']],columns=['_idx','userno','detail'])
181 | # step 1: parse the JSON column on its own
182 | def ff(row):
183 |     row = json.loads(row)
184 |     if not ('channel' in row or 'channelDetails' in row):
185 |         return []
186 |     res = [row['aa'],row['bb'],row['channel']] if 'channel' in row else [row['aa'],row['bb'],row['channelDetails']['channel']]
187 |     return res
188 | df['new_col'] = df.detail.apply(ff)
189 | # step 2: drop rows without a channel, then concatenate the expanded columns
190 | df = df[df.new_col.map(lambda x : x!=[])].drop('detail',axis=1).reset_index(drop=True)
191 | df = pd.concat([df[['_idx','userno']], pd.DataFrame(df.new_col.tolist(),columns=['a','b','c'])],axis=1)
192 | ```
193 |
194 |
195 | ### sort用法
196 |
197 | Note: df.sort() is deprecated; use df.sort_values() instead.
198 | ```python
199 | data3.sort_values(['EID','B_REYEAR'], ascending=True)  # ascending by default: sort by EID first, then by B_REYEAR
200 | ```
201 |
202 | ### left_join用法
203 | ```python
204 | data.merge(data1, how='left', on='id_code')
205 | ```
206 |
207 | ### reset_index用法
208 | ```python
209 | data.reset_index(drop=True)
210 | ```
211 |
212 | ### to_csv字段和值加引号操作
213 | The quoting parameter of to_csv (an int or a csv.QUOTE_* constant, default 0)
214 | controls how fields are quoted in the output CSV.
215 | Options: QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
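A minimal sketch of QUOTE_ALL, writing to an in-memory buffer so the result is easy to inspect:

```python
import csv
import io
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Lily'], 'age': [11, 23]})
buf = io.StringIO()
df.to_csv(buf, index=False, quoting=csv.QUOTE_ALL)  # quote every field, numbers included
print(buf.getvalue())
# "name","age"
# "John","11"
# "Lily","23"
```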
216 |
217 | ### 合并数据表
218 | If you know SQL, these concepts will feel like second nature. Either way, these functions are essentially just different ways of combining tables; the hard part is remembering which one fits which situation, so here is a quick recap.
219 |
220 | concat() simply stacks one or more tables along rows or columns, depending on whether you pass axis=0 or axis=1.
221 | ```python
222 | import pandas as pd
223 | df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])
224 | df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])
225 | df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d'])
226 | pd.concat([df1, df2, df3], axis=0, ignore_index=True)
227 | pd.concat([df1, df2, df3], axis=1, ignore_index=True)
228 | ```
229 | 
230 |
231 | merge() aligns two or more tables on a common, user-specified column (the join key) and combines them.
232 | ![](merge.png)
233 |
234 | join() is much like merge(), except that it aligns on the tables' indexes instead of a shared column; where a table has no entry for an index, the result is NaN.
235 | 
236 |
237 | ### 数据透视表
238 | Last and most important: pivot tables. If you've used Microsoft Excel you've probably met (or at least heard of) its "PivotTable" feature. pandas' built-in pivot_table() function works much the same way: it reshapes a table into a spreadsheet-style summary. In practice, the pivot table groups the data by one or more keys, aggregates it with the function passed via aggfunc, and places the results in the output grid.
239 | ```python
240 | from pandas import pivot_table
241 | >>> df
242 | A B C D
243 | 0 foo one small 1
244 | 1 foo one large 2
245 | 2 foo one large 2
246 | 3 foo two small 3
247 | 4 foo two small 3
248 | 5 bar one large 4
249 | 6 bar one small 5
250 | 7 bar two small 6
251 | 8 bar two large 7
252 |
253 | >>> table = pivot_table(df, values='D', index=['A', 'B'],
254 | ... columns=['C'], aggfunc=np.sum)
255 | >>> table
256 | small large
257 | foo one 1 4
258 | two 6 NaN
259 | bar one 5 4
260 | two 6 7
261 | ```
262 |
263 | ### shuffle
264 | ```python
265 | # method 1
266 | from sklearn.utils import shuffle
267 | df = shuffle(df)
268 | # method 2
269 | df.sample(frac=1).reset_index(drop=True)
270 | ```
271 |
272 | ### dataframe交换列的顺序
273 | ```python
274 | reorder_col = ['label','doc','query']
275 | df = df.loc[:, reorder_col]
276 | ```
277 |
278 | ### dataframe设置两个条件取值
279 |
280 | ```python
281 | df[(df.Store == 1) & (df.Dept == 1)]
282 | ```
283 |
284 | ### dataframe用h5格式保存
285 |
286 | ```python
287 | # plain storage
288 | h5 = pd.HDFStore('data/data1_2212.h5','w')
289 | h5['data'] = data1
290 | h5.close()
291 | # compressed storage
292 | h5 = pd.HDFStore('data/data1_2212.h5','w', complevel=4, complib='blosc')
293 | h5['data'] = data1
294 | h5.close()
295 | # read the h5 file back
296 | data = pd.read_hdf('data/data1_2212.h5', key='data')
297 | ```
298 |
299 | ### assign用法
300 |
301 | assign adds one or more columns to a df and returns a new copy:
302 | "Assign new columns to a DataFrame, returning a new object
303 | (a copy) with the new columns added to the original ones."
304 |
305 | ```python
306 | def iv_xy(x, y):
307 |     # good/bad counter
308 |     def goodbad(df):
309 |         names = {'good': (df['y']==0).sum(), 'bad': (df['y']==1).sum()}
310 |         return pd.Series(names)
311 |     # iv calculation
312 |     iv_total = pd.DataFrame({'x': x.astype('str'), 'y': y}) \
313 |         .fillna('missing') \
314 |         .groupby('x') \
315 |         .apply(goodbad) \
316 |         .replace(0, 0.9) \
317 |         .assign(
318 |             DistrBad = lambda x: x.bad/sum(x.bad),
319 |             DistrGood = lambda x: x.good/sum(x.good)
320 |         ) \
321 |         .assign(iv = lambda x: (x.DistrBad-x.DistrGood)*np.log(x.DistrBad/x.DistrGood)) \
322 |         .iv.sum()  # core IV formula; iv.sum() totals the per-group IVs to give the feature's IV
323 |     return iv_total
325 | ```
326 |
327 | ### 用一列的非空值填充另一列对应行的空值
328 |
329 | ```python
330 | df.loc[df['new_subject'].isnull(),'new_subject']=df[df['new_subject'].isnull()]['subject']
331 | ```
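An equivalent and arguably clearer spelling uses fillna (or combine_first) on the whole column; a sketch with made-up subject columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'subject': ['math', 'art', 'bio'],
                   'new_subject': ['MATH', np.nan, np.nan]})
# fill NaNs in new_subject with the same row's subject value
df['new_subject'] = df['new_subject'].fillna(df['subject'])
# df['new_subject'] = df['new_subject'].combine_first(df['subject'])  # same effect
print(df['new_subject'].tolist())  # ['MATH', 'art', 'bio']
```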
332 |
333 | ### dataframe修改值
334 |
335 | ```python
336 | df.loc[df.A < 4,'A'] = [100,120,140]
337 | # or
338 | df.loc[df.content_id=='x6mbO2rHfU3hTej4','sentiment_tmp'] = 1
339 | ```
340 |
341 | ### dataframe表格填充
342 |
343 | ```python
344 | df.ffill(axis=1).ffill()  # fillna(method='ffill') is deprecated in recent pandas
345 | ```
346 |
347 | ### 加快dataframe读取
348 |
349 | Option 1: multithreaded CPU reading (recommended)
350 | ```python
351 | # pip install datatable==0.11.1
352 | import datatable as dtable
353 | train = dtable.fread(path+'train.csv').to_pandas()
354 | ```
355 |
356 | Option 2: GPU reading
357 | ```python
358 | # install cudf (slightly more involved)
359 | import cudf
360 | train = cudf.read_csv(path+'train.csv').to_pandas()
361 | ```
362 |
363 | ### df热力图
364 |
365 | ```python
366 | df.corr().style.background_gradient(cmap='coolwarm').format(precision=2)  # set_precision was deprecated in pandas 1.4
367 | ```
368 |
369 | ### df热力地图
370 |
371 | Use pyecharts to draw a map of how many top-ranked universities each province has:
372 | ```python
373 | from pyecharts import options as opts
374 | from pyecharts.charts import Map
375 | # provinces
376 | list1 = ['北京','江苏','上海','广东','湖北','陕西','浙江','四川','湖南','山东','安徽','辽宁','重庆','福建','天津','吉林','河南','黑龙江','江西','甘肃','云南','河北']
377 | # number of ranked universities in each province
378 | list2 = [18, 15, 10, 9, 7, 7, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
379 | c = (
380 |     Map()
381 |     .add('', [list(z) for z in zip(list1,list2)], "china", is_map_symbol_show=False)
382 |     .set_global_opts(
383 |         title_opts=opts.TitleOpts(title="排名前100高校各省市占比"),
384 |         visualmap_opts=opts.VisualMapOpts(max_=20),
385 |     )
386 | )
389 | c.render_notebook()
390 | ```
391 |
392 | ### eda插件
393 |
394 | ```python
395 | # plugin 1
396 | #!pip install pandas_profiling
397 | import pandas_profiling
398 | pandas_profiling.ProfileReport(df)
399 | # plugin 2
400 | import sweetviz as sv
401 | report = sv.analyze(df)
402 | report.show_html()
403 | ```
404 |
405 | ### python批量插入mysql数据库
406 |
407 | ```python
408 | df.to_numpy()[:5].tolist()
409 | '''
410 | [['25_B', 25, 'B', 0.6, '2024-08-12'],
411 | ['23_C', 23, 'C', 2.2, '2024-08-12'],
412 | ['24_D', 24, 'D', 3.8, '2024-08-12'],
413 | ['29_E', 29, 'E', 1.5, '2024-08-12'],
414 | ['22_F', 22, 'F', 4.1, '2024-08-12']]
415 | '''
416 |
417 | import pymysql
418 | MYSQL_W_CONFIG = {'host': '10.xx.xxx.xx',
419 |                   'port': 3306,
420 |                   'user': 'user',
421 |                   'password': 'passwd',
422 |                   'database': 'mydatabase',
423 |                   'charset': 'utf8'}
424 | conn = pymysql.connect(autocommit=True, **MYSQL_W_CONFIG)
425 | cursor = conn.cursor()
426 | sql = "insert into xx_table(id,cust_id,agcode,score,s_time) values(%s,%s,%s,%s,%s)"
427 | cursor.executemany(sql, df_final.to_numpy().tolist())
428 | conn.commit()
429 | conn.close()
430 | # bulk-inserting 10k rows takes roughly 0.45s
431 | ```
--------------------------------------------------------------------------------
/03_pandas/concat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/03_pandas/concat.png
--------------------------------------------------------------------------------
/03_pandas/join.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/03_pandas/join.png
--------------------------------------------------------------------------------
/03_pandas/merge.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/03_pandas/merge.png
--------------------------------------------------------------------------------
/04_sklearn/README.md:
--------------------------------------------------------------------------------
1 | ## Contents
2 |
3 | [**0. For feature engineering, see the feature_engineering repo**](https://github.com/binzhouchn/feature_engineering)
4 |
5 | [**1. splitting a dataset into train and test**](#将数据集进行train_test分割)
6 |
7 | [**2. random sampling from a dataset**](#对数据集进行随机抽样)
8 |
9 | - [sampling method 1](#抽样方法一)
10 | - [sampling method 2](#抽样方法二)
11 | - [sampling method 3](#抽样方法三)
12 |
13 | [**3. evaluating results with a confusion matrix**](#对结果进行评判用混淆矩阵)
14 |
15 | [**4. model metrics: accuracy, logloss, precision, recall, KS, etc.**](#模型效果评价)
16 |
17 | ---
18 |
19 | ### 将数据集进行train_test分割
20 | ```python
21 | # train/test split; stratify takes the label column so that class proportions are preserved in both splits
22 | from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed long ago
23 | def train_test_sep(X, test_size=0.25, stratify=None, random_state=1001):
24 |     train, test = train_test_split(X, test_size=test_size, stratify=stratify, random_state=random_state)
25 |     return train, test
26 | ```
27 | ### 对数据集进行随机抽样
28 |
29 | #### 抽样方法一
30 |
31 | ```python
32 | from sklearn.model_selection import train_test_split
33 | # cat is one of df's columns
34 | X_train, X_test = train_test_split(df, test_size=0.3, stratify=df.cat)
35 | ```
36 |
37 | #### 抽样方法二
38 | ```python
39 | df.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
40 |
41 | - n: the number of rows to sample (e.g. n=20000 draws 20k rows)
42 | - frac: the fraction to sample, for when you care about a percentage rather than a row count (e.g. frac=0.8 draws 80%)
43 | - replace: whether to sample with replacement
44 | - weights: per-sample weights; see the official docs for details
45 | - random_state: as introduced earlier
46 | - axis: whether to sample rows or columns (axis=0 samples n rows, axis=1 samples n columns)
47 | ```
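A small sketch of these parameters on a toy DataFrame (the column names and values below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'a', 'b', 'b'], 'val': [1, 2, 3, 4]})

rows = df.sample(n=2, random_state=0)           # draw 2 rows
frac = df.sample(frac=0.5, random_state=0)      # draw 50% of the rows
cols = df.sample(n=1, axis=1, random_state=0)   # draw 1 column instead of rows
```

With frac=0.5 on 4 rows, `frac` also ends up with 2 rows; `cols` keeps all 4 rows but only 1 of the 2 columns.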
48 | #### 抽样方法三
 49 | ```python
 50 | import random
 51 | # Python 3: random.sample needs a sequence, and xrange is gone; use range
 52 | random_num_test = random.sample(range(len(df)), 200)
 53 | random_num_train = list(set(range(len(df))) ^ set(random_num_test))
 54 | test = df.iloc[random_num_test]
 55 | train = df.iloc[random_num_train]
 56 | ```
56 |
57 | ### 对结果进行评判用混淆矩阵
58 | ```python
59 | from sklearn.metrics import confusion_matrix
60 | confusion_matrix(y_true, y_pred)
61 | ```
62 |
63 | ### 模型效果评价
64 | ```python
 65 | # accuracy: both y_true and y_pred must be class labels (e.g. 0/1)
66 | from sklearn.metrics import accuracy_score
67 | accuracy_score(y_true, y_pred)
68 | ```
69 | ```python
 70 | # log_loss (cross-entropy): y_true holds class labels; y_pred holds the predicted probability of class 1
71 | from sklearn.metrics import log_loss
72 | logloss = log_loss(y_true, y_pred, eps=1e-15)
73 | ```
74 | ```python
75 | # recall precision
76 | from sklearn.metrics import confusion_matrix
77 | confusion_matrix(y_true, y_pred)
78 | from sklearn.metrics import precision_score
79 | from sklearn.metrics import recall_score
80 | from sklearn.metrics import f1_score
81 | ```
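These metrics can be tried on a tiny hand-made example (the labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
p = precision_score(y_true, y_pred)    # TP / (TP + FP)
r = recall_score(y_true, y_pred)       # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
```

Here there are 2 true positives, 0 false positives and 1 false negative, so precision is 1.0, recall is 2/3 and F1 is 0.8.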
82 | ```python
83 | # KS
84 | '''
 85 | From one angle, the ROC curve and the KS curve are the same thing; only the axes differ.
 86 | Take logistic regression: after training, each sample gets a class probability.
 87 | Sort samples by that probability, split them into 10 equal bins, and compute each bin's
 88 | true-positive and false-positive rates, then their cumulative values. Cumulative TPR vs. cumulative FPR is the ROC curve;
 89 | the 10 bins on the x-axis with cumulative TPR and FPR as two separate curves is the KS curve.
90 | '''
91 | from sklearn import metrics
92 | def ks(y_pred, y_true):
93 | label=y_true
94 | fpr,tpr,thres = metrics.roc_curve(label,y_pred,pos_label=1)
95 | return 'ks',abs(fpr - tpr).max()
96 | ```
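To sanity-check the KS helper above on a toy example (the function is repeated so the snippet runs standalone; the toy scores are made up):

```python
from sklearn import metrics

def ks(y_pred, y_true):
    # KS statistic: maximum gap between the TPR and FPR curves
    fpr, tpr, thres = metrics.roc_curve(y_true, y_pred, pos_label=1)
    return 'ks', abs(fpr - tpr).max()

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
name, value = ks(y_score, y_true)
```

On these four samples the maximum TPR/FPR gap is 0.5, which matches the intuition that the scores separate the classes only moderately well.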
97 |
98 |
--------------------------------------------------------------------------------
/05_OOP/README.md:
--------------------------------------------------------------------------------
1 | ## 目录
2 |
3 | [**1. 继承**](#继承)
4 |
5 | [**2. super函数使用基础**](#super函数使用基础)
6 |
7 | [**3. super函数使用 以LR为例**](#super函数使用_以lr为例)
8 |
9 | [**4. 装饰器@**](#装饰器)
10 |
11 | [**5. 装饰器@property**](#装饰器property)
12 |
13 | [**6. python单例模式**](#单例模式)
14 |
15 | [**7. python deprecated warning**](#deprecationwarning)
16 |
17 | [**8. 定制类**](#定制类)
18 |
19 | [**9. 网络编程**](#网络编程)
20 |
21 | [**各种模式待补充 看设计之禅(第2版)**](#设计模式)
22 |
23 | ---
24 | ### 继承
25 |
26 | ```python
27 | class FooParent(object):
28 | def __init__(self):
29 | self.parent = 'I\'m the parent.'
30 | print ('Parent')
31 |
32 | def bar(self, message):
33 | print ("{} from Parent".format(message))
34 |
35 | class FooChild(FooParent):
36 | def __init__(self):
37 | super(FooChild,self).__init__()
38 | print ('Child')
39 |
40 | def bar(self, message):
41 | super(FooChild, self).bar(message)
42 | print ('Child bar function')
43 | print (self.parent)
44 | ```
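Instantiating the child shows the call order; the classes are repeated so the snippet runs on its own:

```python
class FooParent(object):
    def __init__(self):
        self.parent = "I'm the parent."
        print('Parent')

    def bar(self, message):
        print("{} from Parent".format(message))

class FooChild(FooParent):
    def __init__(self):
        super(FooChild, self).__init__()  # runs FooParent.__init__ first
        print('Child')

    def bar(self, message):
        super(FooChild, self).bar(message)
        print('Child bar function')
        print(self.parent)

child = FooChild()        # prints: Parent, then Child
child.bar('HelloWorld')   # prints: HelloWorld from Parent / Child bar function / I'm the parent.
```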
45 |
46 | ### super函数使用基础
47 |
 48 | In practice, super() is one of the most misunderstood parts of Python. You will sometimes see code call a parent method directly, like this:
49 | ```python
50 | class Base:
51 | def __init__(self):
52 | print('Base.__init__')
53 | class A(Base):
54 | def __init__(self):
55 | Base.__init__(self)
56 | print('A.__init__')
57 | ```
 58 | That works fine most of the time, but in more complex code involving multiple inheritance it can cause very odd behavior. Consider the following:
59 | ```python
60 | class Base:
61 | def __init__(self):
62 | print('Base.__init__')
63 | class A(Base):
64 | def __init__(self):
65 | Base.__init__(self)
66 | print('A.__init__')
67 | class B(Base):
68 | def __init__(self):
69 | Base.__init__(self)
70 | print('B.__init__')
71 | class C(A,B):
72 | def __init__(self):
73 | A.__init__(self)
74 | B.__init__(self)
75 | print('C.__init__')
76 | ```
 77 | Run this and you will find Base.__init__() is called twice:
78 | ```python
79 | >>> c = C()
80 | Base.__init__
81 | A.__init__
82 | Base.__init__
83 | B.__init__
84 | C.__init__
85 | >>>
86 | ```
 87 | Calling Base.__init__() twice may be harmless, but sometimes it is not. On the other hand, switching to super() makes everything work:
88 | ```python
89 | class Base:
90 | def __init__(self):
91 | print('Base.__init__')
92 | class A(Base):
93 | def __init__(self):
94 | super().__init__()
95 | print('A.__init__')
96 | class B(Base):
97 | def __init__(self):
98 | super().__init__()
99 | print('B.__init__')
100 | class C(A,B):
101 | def __init__(self):
102 | super().__init__() # Only one call to super() here
103 | print('C.__init__')
104 | ```
105 | Run this new version and each __init__() method runs exactly once:
106 | ```python
107 | >>> c = C()
108 | Base.__init__
109 | B.__init__
110 | A.__init__
111 | C.__init__
112 | >>>
113 | ```
114 | Keep this pattern in mind:
115 | ```python
116 | class Base(object):
117 | def __init__(self,a=1,b=11):
118 | self.a = a
119 | self.b = b
120 | # bound call via super (recommended)
121 | class B(Base):
122 | def __init__(self, a, b, c):
123 | super().__init__(a, b) # super(B, self).__init__(a, b)
124 | self.c = c
125 | # unbound call (not recommended)
126 | class C(Base):
127 |     def __init__(self, a, b, c):
128 |         Base.__init__(self, a=a, b=1000)
129 |         self.c = c
130 | ```
131 | ```python
132 | B(1,2,3).a, B(1,2,3).b, B(1,2,3).c   # -> (1, 2, 3)
133 | C(1,2,3).a, C(1,2,3).b, C(1,2,3).c   # -> (1, 1000, 3)
134 | ```
137 | ```
138 | 1. super is not a function but a class; super(B, self) calls super's initializer and produces a super object;
139 | 2. super's initializer does nothing special: it just records the class and the concrete instance;
140 | 3. super(B, self).func is not simply "call func on the current class's parent";
141 | 4. with multiple inheritance, Python walks the MRO so that each ancestor's method runs exactly once
142 |    (provided every class uses super);
143 | 5. mixing super with unbound calls is dangerous: a parent method may be skipped, or called more than once.
144 | ```
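The call order behind point 4 is exactly the MRO, which can be inspected directly; a minimal sketch:

```python
class Base: pass
class A(Base): pass
class B(Base): pass
class C(A, B): pass

# C3 linearization puts each class before its bases, left to right, with no duplicates
order = [cls.__name__ for cls in C.__mro__]
print(order)  # ['C', 'A', 'B', 'Base', 'object']
```

Each super().__init__() call hands control to the next class in this list, which is why every __init__ runs once.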
147 |
148 | ### super函数使用_以LR为例
149 |
150 | ```python
151 | from sklearn.linear_model import LogisticRegression
152 |
153 | class LR(LogisticRegression):
154 |
155 | def __init__(self, threshold=0.01, dual=False, tol=1e-4, C=1.0,
156 | fit_intercept=True, intercept_scaling=1, class_weight=None,
157 | random_state=None, solver='liblinear', max_iter=100,
158 | multi_class='ovr', verbose=0, warm_start=False, n_jobs=1):
159 |         # threshold below which weights count as similar
160 | self.threshold = threshold
161 | super(LR, self).__init__(penalty='l1', dual=dual, tol=tol, C=C,
162 | fit_intercept=fit_intercept,
163 | intercept_scaling=intercept_scaling,
164 | class_weight=class_weight,
165 | random_state=random_state,
166 | solver=solver, max_iter=max_iter,
167 | multi_class=multi_class,
168 | verbose=verbose,
169 | warm_start=warm_start,
170 | n_jobs=n_jobs)
171 |         # create an L2 logistic regression with the same parameters
172 | self.l2 = LogisticRegression(penalty='l2', dual=dual, tol=tol, C=C, fit_intercept=fit_intercept, intercept_scaling=intercept_scaling,
173 | class_weight = class_weight, random_state=random_state,
174 | solver=solver,
175 | max_iter=max_iter,
176 | multi_class=multi_class,
177 | verbose=verbose,
178 | warm_start=warm_start,
179 | n_jobs=n_jobs)
180 |
181 |     def fit(self, X, y, sample_weight=None):
182 |         # train the L1 logistic regression
183 |         super(LR, self).fit(X, y, sample_weight=sample_weight)  # calls the parent's fit directly; the parent was initialized above with penalty='l1'
184 |         self.coef_old_ = self.coef_.copy()  # coef_ is inherited from the parent, so self.coef_ works directly
185 |         # train the L2 logistic regression
186 |         self.l2.fit(X, y, sample_weight=sample_weight)
187 |         print(self.coef_)      # print was a Python 2 statement here; use the function form
188 |         print(self.l2.coef_)
189 | ```
190 |
191 | ### 装饰器
192 |
193 | _For a detailed walkthrough of decorators, see the [basic notes](https://github.com/binzhouchn/python_notes/blob/master/00.basic/README.md#装饰器)_
194 |
195 | a. @classmethod: takes no self; its first parameter is cls, the class itself
196 | ```
197 | @classmethod means: when the method is called, the class itself is passed as the first argument
198 | instead of an instance, so you can use the class and its attributes, but not any particular instance.
199 | ```
200 | b. @staticmethod: takes neither self nor cls; it behaves just like a plain function
201 | ```
202 | @staticmethod means: when the method is called, no instance is passed in. You can keep a function
203 | inside a class, but it cannot access any instance (useful when the method doesn't need one).
204 | ```
205 | ```python
206 | class A(object):
207 |     bar = 1
208 |     def foo(self):
209 |         print('foo')
210 | 
211 |     @staticmethod
212 |     def static_foo():
213 |         print('static_foo')
214 |         print(A.bar)
215 | 
216 |     @classmethod
217 |     def class_foo(cls):
218 |         print('class_foo')
219 |         print(cls.bar)
220 |         cls().foo()
221 | 
222 | A.static_foo()
223 | A.class_foo()
224 | ```
225 | ```python
226 | static_foo
227 | 1
228 | class_foo
229 | 1
230 | foo
231 | ```
232 |
233 | A decorator is essentially a higher-order function: it takes a function and returns a function that does the same work plus something extra [(source)](https://mp.weixin.qq.com/s/hsa-kYvL31c1pEtMpkr6bA)
234 | ```python
235 | # decorator without arguments
236 | import logging  # the original snippet was missing this import
237 | 
238 | def use_logging(func):
239 |     def wrapper():
240 |         logging.warning("%s is running" % func.__name__)  # logging.warn is deprecated
241 |         return func()
242 |     return wrapper
242 |
243 | @use_logging
244 | def foo():
245 | print("i am foo")
246 |
247 | foo()
248 |
249 | #----------------------------------------------------------
250 | # decorator with arguments
251 | def use_logging(level):
252 |     def decorator(func):
253 |         def wrapper(*args, **kwargs):
254 |             if level == "warn":
255 |                 logging.warning("%s is running" % func.__name__)
256 |             elif level == "info":
257 |                 logging.info("%s is running" % func.__name__)
258 |             return func(*args, **kwargs)
259 |         return wrapper
260 |     return decorator
261 | 
262 | @use_logging(level="warn")  # arguments can be passed into the decorator
263 | def foo(name, age=None, height=None):
264 |     print("I am %s, age %s, height %s" % (name, age, height))
265 | 
266 | foo('John', 9)  # logs WARNING:root:foo is running, then prints: I am John, age 9, height None
268 |
269 | #---------------------------------------------------
270 | # class-based decorator
271 | class Foo(object):
272 |     def __init__(self, func):
273 |         self._func = func
274 | 
275 |     def __call__(self):
276 |         print('class decorator running')
277 |         self._func()
278 |         print('class decorator ending')
279 | 
280 | @Foo
281 | def bar():
282 |     print('test bar')
283 | 
284 | bar()
285 | # Output:
286 | # class decorator running
287 | # test bar
288 | # class decorator ending
289 | ```
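One detail worth adding: the wrappers above replace the decorated function's metadata (foo.__name__ becomes 'wrapper'). functools.wraps fixes this; a small sketch:

```python
import functools
import logging

def use_logging(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        logging.warning("%s is running", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@use_logging
def foo():
    """say hello"""
    return "i am foo"

# without functools.wraps, foo.__name__ would be 'wrapper' and foo.__doc__ would be None
print(foo.__name__, foo.__doc__)
```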
290 |
291 | ### 装饰器property
292 |
293 | Adding @property turns a getter method into an attribute. @property itself then creates the companion decorator
294 | (here @birth.setter) that turns the setter method into attribute assignment, giving us a fully controlled attribute.
295 | ```python
296 | class Student(object):
297 |
298 | @property
299 | def birth(self):
300 | return self._birth
301 |
302 | @birth.setter
303 | def birth(self, value):
304 | self._birth = value
305 |
306 | @property
307 | def age(self):
308 | return 2015 - self._birth
309 |
310 | a = Student()
311 | a.birth = 22  # the birth.setter decorator makes birth assignable like a plain attribute
312 | print(a.birth)
313 | print(a.age)
314 | ```
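A setter is also a natural place for validation; a minimal sketch (the score attribute and its bounds are made up for illustration):

```python
class Student(object):
    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        # validation happens transparently on plain attribute assignment
        if not 0 <= value <= 100:
            raise ValueError('score must be between 0 and 100')
        self._score = value

s = Student()
s.score = 90        # goes through the setter
# s.score = 101     # would raise ValueError
```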
315 |
316 | ### 单例模式
317 |
318 | ```python
319 | # using __new__
320 | # variant 1
321 | class Singleton(object):
322 |     def __new__(cls, *args, **kw):
323 |         if not hasattr(cls, '_instance'):
324 |             orig = super(Singleton, cls)
325 |             cls._instance = orig.__new__(cls)  # in Python 3, object.__new__ rejects extra arguments
326 |         return cls._instance
327 | # variant 2
328 | class Singleton(object):
329 |     __instance = None
330 |     def __init__(self):
331 |         pass
332 |     def __new__(cls, *args, **kwargs):
333 |         if not cls.__instance:
334 |             cls.__instance = super(Singleton, cls).__new__(cls)
335 |         return cls.__instance
336 | ```
337 | ```python
338 | # any class of your own can inherit Singleton to get singleton behavior
339 | # the MyClass instance is created only once (note: __init__ still runs on every call)
340 | class MyClass(Singleton):
341 |     def __init__(self):
342 |         print('ok')
343 |     def kk(self):
344 |         print('kk called')
345 | ```
346 | > Writing a @singleton decorator also works:
347 | ```python
348 | def singleton(cls, *args, **kw):
349 | instance={}
350 | def _singleton():
351 | if cls not in instance:
352 | instance[cls]=cls(*args, **kw)
353 | return instance[cls]
354 | return _singleton
355 |
356 | @singleton
357 | class A:
358 | def __init__(self):
359 | pass
360 | def test(self,num):
361 | return num*2
362 | ```
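A quick check that the decorator really yields one shared instance (the decorator is repeated so the snippet runs standalone; the test method is illustrative):

```python
def singleton(cls, *args, **kw):
    instance = {}
    def _singleton():
        # construct the class at most once, then always hand back the cached object
        if cls not in instance:
            instance[cls] = cls(*args, **kw)
        return instance[cls]
    return _singleton

@singleton
class A:
    def test(self, num):
        return num * 2

a1, a2 = A(), A()
print(a1 is a2)  # True: both names point at the same cached instance
```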
363 |
364 |
365 | ### DeprecationWarning
366 |
367 | ```python
368 | import warnings
369 | import functools
370 |
371 | def deprecated(func):
372 | """This is a decorator which can be used to mark functions
373 | as deprecated. It will result in a warning being emitted
374 | when the function is used."""
375 | @functools.wraps(func)
376 | def new_func(*args, **kwargs):
377 | warnings.simplefilter('always', DeprecationWarning) # turn off filter
378 | warnings.warn("Call to deprecated function {}.".format(func.__name__),
379 | category=DeprecationWarning,
380 | stacklevel=2)
381 | warnings.simplefilter('default', DeprecationWarning) # reset filter
382 | return func(*args, **kwargs)
383 | return new_func
384 |
385 | @deprecated
386 | def some_old_function(x, y):
387 | return x + y
388 | ```
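To see the warning actually being emitted, it can be captured with warnings.catch_warnings (the decorator is repeated so the snippet runs standalone):

```python
import warnings
import functools

def deprecated(func):
    @functools.wraps(func)
    def new_func(*args, **kwargs):
        warnings.simplefilter('always', DeprecationWarning)  # turn off filter
        warnings.warn("Call to deprecated function {}.".format(func.__name__),
                      category=DeprecationWarning, stacklevel=2)
        warnings.simplefilter('default', DeprecationWarning)  # reset filter
        return func(*args, **kwargs)
    return new_func

@deprecated
def some_old_function(x, y):
    return x + y

# record=True collects emitted warnings instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = some_old_function(1, 2)
```

The call still returns 3, and `caught` holds exactly one DeprecationWarning.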
389 |
390 | ### 定制类
391 |
392 | [廖雪峰 定制类](https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014319098638265527beb24f7840aa97de564ccc7f20f6000)
393 |
394 | ### 网络编程
395 |
396 | [网络编程网址](https://blog.csdn.net/qq_41853758/article/details/82853811)
397 |
--------------------------------------------------------------------------------
/06_flask_sanic/README.md:
--------------------------------------------------------------------------------
1 | [**1. flask**](#flask)
2 |
3 | [**2. sanic**](#sanic)
4 |
5 | [**3. python调用flask post方法**](#python调用flask_post方法)
6 |
7 | # flask
8 |
9 | ## GET方法
10 | ```python
11 | # GET方法
12 | # -*- coding: utf-8 -*-
13 | from flask import Flask, jsonify
14 | app = Flask(__name__)
15 | app.config['JSON_AS_ASCII'] = False  # keep non-ASCII (e.g. Chinese) characters unescaped in JSON responses
16 | tasks = [
17 | {
18 | 'id': 1,
19 | 'title': u'Buy groceries',
20 | 'description': u'Milk, Cheese, Pizza, Fruit, Tylenol',
21 | 'done': False
22 | },
23 | {
24 | 'id': 2,
25 | 'title': u'Learn Python',
26 | 'description': u'Need to find a good Python tutorial on the web',
27 | 'done': False
28 | }
29 | ]
30 | @app.route('/todo/api/v1.0/tasks', methods=['GET'])
31 | def get_tasks():
32 | return jsonify({'tasks': tasks})
33 | if __name__ == '__main__':
34 |     app.run(host='127.0.0.1',
35 |             port=5000)
36 | ```
37 |
38 | ## POST方法
39 | ```python
40 | # -*- coding: utf-8 -*-
41 | from flask import Flask
42 | from flask import request
43 | from flask import make_response,Response
44 | from flask import jsonify
45 | # custom Flask error class
46 | class CustomFlaskErr(Exception):
47 |     # responseCode serves as a coarse-grained error code
48 |     def __init__(self, responseCode=None):
49 |         Exception.__init__(self)
50 |         self.responseCode = responseCode
51 |         self.J_MSG = {9704: '参数不合法!', 9999: '系统内部异常!'}
52 |     # build the dict of error code and message to return
53 |     def get_dict(self):
54 |         rv = dict()
55 |         # add the response code
56 |         rv['responseCode'] = self.responseCode
57 |         # add the message, looked up from responseCode in the constant table
58 |         rv['responseMsg'] = self.J_MSG[self.responseCode]
59 |         return rv
60 | def get_chatterbot_result(xx):
61 |     pass  # should return the matched standard question/answer with their IDs, similar questions, etc.
62 |
63 | app = Flask(__name__)
64 | app.config['JSON_AS_ASCII'] = False
65 |
66 | @app.route('/')
67 | def get_simple_test():
68 | return 'BINZHOU TEST'
69 |
70 | @app.route('/req_message', methods=['POST'])
71 | def req_message():
72 |     if request.method == 'POST':
73 |         sid = request.form.get('sid')
74 |         q = request.form.get('q')
75 |         uid = request.form.get('uid', 'default_uid')  # optional
76 |         businessId = request.form.get('businessId')
77 |         messageId = request.form.get('messageId', 'default_messageId')  # optional
78 |         source = request.form.get('source', 'default_source')  # optional
79 |         requestTime = request.form.get('requestTime')
80 |         requestId = request.form.get('requestId')
81 |         if not (sid and q and businessId and requestTime and requestId):
82 |             raise CustomFlaskErr(responseCode=9704)
83 |         # run the question through our own modules / chatterbot to get the answer
84 |         # bot_answer holds the standard question/answer with IDs, similar questions, etc.; parse it into result
85 |         bot_answer = get_chatterbot_result(q)
86 |         # assemble the full response payload from the bot result
87 | result = {
88 | 'sid':sid,
89 | 'q':q,
90 | 'uid':uid,
91 | 'businessId':businessId,
92 | 'messageId':messageId,
93 | 'type':0,
94 | 'source':source,
95 | 'sqId':12345,
96 | 'stQuestion':'绍兴在哪里?',
97 | 'sm':[
98 | {'smid':12346,'smQuestion':'绍兴?哪里?'},
99 | {'smid':12347,'smQuestion':'绍兴是哪里的啊啊'}],
100 | 'answer':{
101 | "aid":"12345",
102 | "answare_text":"绍兴在浙江",
103 | "atype":1
104 | },
105 | 'responseCode':'0000',
106 | 'responseMsg':'返回成功',
107 | 'responseTime':'20180601'
108 | }
109 | return jsonify(result)
110 |
111 | @app.errorhandler(CustomFlaskErr)
112 | def handle_flask_error(error):
113 |     # the response JSON carries the custom error code and message
114 | response = jsonify(error.get_dict())
115 | response.responseCode = error.responseCode
116 | return response
117 |
118 | if __name__ == '__main__':
119 |     app.run(host='127.0.0.1',
120 |             port=5000)
121 | ```
122 |
123 | ## 解决flask跨域问题
124 |
125 | ```python
126 | # pip install flask-cors
127 | from flask_cors import CORS
128 | app = Flask(__name__,)
129 | # r'/*' is a wildcard: every URL on this server allows cross-origin requests
130 | CORS(app, resources=r'/*')
131 | ```
132 |
133 | ---
134 |
135 | # sanic
136 | ```python
137 | # sanic GET and POST methods
138 | # Sanic is async and runs on uvloop (built on libuv), which is where much of its speed advantage comes from.
139 | import os
140 | from sanic import Sanic, response
141 | from sanic.response import html, json, redirect, text, raw, file, file_stream
142 | 
143 | app = Sanic("demo_app")  # newer Sanic versions require an application name
144 |
145 | @app.route('/get')
146 | async def get_test(request):
147 | title = request.args.get('title')
148 | return response.json([{'model_name': title}])
149 |
150 | if __name__ == '__main__':
151 | # app.run()
152 | app.run(host='127.0.0.1', port=8000)
153 | ```
154 | # python调用flask_post方法
155 |
156 | Method 1: python requests
157 | ```python
158 | # JSON body (read server-side via request.json.get)
159 | import requests
160 | payload = {'id': 1223, 'text': '我是中国人'}  # renamed from `json` to avoid shadowing the json module
161 | r = requests.post('http://0.0.0.0:5000/req_message', json=payload)
162 | r.json()
163 | # form values (read server-side via request.values.get)
164 | r = requests.post('http://0.0.0.0:5000/req_message', data=[('id', 1223), ('text', '我是中国人')])
165 | ```
167 |
168 | Method 2: the Postman tool
169 | In the top-right corner, click Code -> Python Requests -> copy the generated code
170 | 
171 | Method 3: from the terminal
172 | curl, to be added..
173 |
174 |
--------------------------------------------------------------------------------
/07_database/README.md:
--------------------------------------------------------------------------------
1 | # 目录
2 |
3 | ## 1. Mysql/Hive(docker version)
4 | ```
 5 | # pull the image first
 6 | docker pull mysql:5.5
 7 | # run the container (the -v mounts can be dropped to start with)
 8 | docker run -p 3306:3306 --name mymysql -v $PWD/conf:/etc/mysql/conf.d -v $PWD/logs:/logs -v $PWD/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.5
 9 | 
10 | -p 3306:3306: maps container port 3306 to host port 3306.
11 | -v $PWD/conf:/etc/mysql/conf.d: mounts ./conf on the host into the container's /etc/mysql/conf.d.
12 | -v $PWD/logs:/logs: mounts ./logs on the host into /logs in the container.
13 | -v $PWD/data:/var/lib/mysql: mounts ./data on the host into /var/lib/mysql.
14 | -e MYSQL_ROOT_PASSWORD=123456: sets the initial root password.
15 | ```
16 | ```python
17 | # connect with a GUI client (e.g. Navicat) or python; create the database (e.g. test_db) first
18 | import pymysql
19 | # open the connection (keyword arguments; positional args are deprecated in newer pymysql)
20 | db = pymysql.connect(host="localhost", user="root", password="123456", database="test_db")
21 | # create a cursor
22 | cursor = db.cursor()
23 | # parameterized query instead of string formatting (avoids quoting bugs and SQL injection)
24 | sql = "INSERT INTO tt(a, b, date) VALUES (%s, %s, %s)"
25 | data = (306, '插入6', '20190615')
26 | cursor.execute(sql, data)
27 | db.commit()
27 | ```
28 | ```
29 | # once the mysql service is up, to insert from a python docker container:
30 | # first look up the mysql container's IP address (command in 2.8)
31 | # then replace localhost with that container IP; everything else stays the same
32 | ```
33 |
34 | ## 2. Redis(docker version)
35 |
36 | 
37 | ```
38 | # start redis
39 | docker run --name docker-redis-test -p 6379:6379 -d redis:latest --requirepass "123456"
40 | # open a redis client inside the container (the container name was missing in the original command)
41 | docker exec -it docker-redis-test redis-cli
42 | # once inside:
43 | auth 123456
44 | set name zhangsan
45 | get name
46 | quit
47 | ```
48 |
49 | ```python
50 | # connect from python to the dockerized redis
51 | import redis
52 | # connect
53 | redis_conf = redis.Redis(host='xx.xx.xx.xx', port=6379, password='123456')
54 | # list the keys in redis
55 | redis_conf.keys()
56 | # set a string value
57 | redis_conf.set(name='name', value='John')
58 | # set one hash field (use a key of its own: redis keys are typed, and 'name' above already holds a string)
59 | redis_conf.hset('hash1', 'k1', 'John')
60 | # set several hash fields at once (hmset is deprecated in redis-py 3.x; hset('hash1', mapping={...}) is preferred)
61 | redis_conf.hmset('hash1', {'k2': 'v2', 'k3': 'v3'})
62 | # read the whole hash back
63 | redis_conf.hgetall('hash1')
64 | ```
65 | ```
66 | # RDM (Redis Desktop Manager) is a handy redis GUI (already installed)
67 | ```
68 |
69 | ## 3. pymongo(docker version)
70 |
71 | 1. Store the [Tencent word vectors](https://ai.tencent.com/ailab/nlp/embedding.html) in mongodb; [install mongodb](https://blog.csdn.net/weixin_29026283/article/details/82252941) first
72 | 2. After setting up mongodb, create a username and password
73 | ```shell
74 | # start mongo: mongod -f mongodb.conf
75 | # stop mongo:  mongod -f mongodb.conf --shutdown
76 | # log in from the command line: mongo
77 | # add a user
78 | use admin
79 | db.createUser({user: "root", pwd: "xxxxxx", roles:["root"]})
80 | ```
81 |
82 | ```
83 | # start mongodb
84 | docker run -p 27017:27017 -v $PWD/mongo_db:/data/mongo_db -d mongo:4.0.10
85 | # connect to the mongo cli (point --host at the container's IP)
86 | docker run -it mongo:4.0.10 mongo --host <container-ip>
87 | 
88 | # create a database and a collection (e.g. runoob), then insert data
89 | db.runoob.insert({"title": 'MongoDB 教程',
90 | "description": 'MongoDB 是一个 Nosql 数据库',
91 | "by": 'w3cschool',
92 | "url": 'http://www.w3cschool.cn',
93 | "tags": ['mongodb', 'database', 'NoSQL'],
94 | "likes": 100})
95 | db.runoob.find()
96 | ```
97 | ```python
98 | import pymongo
 99 | # connect
100 | client = pymongo.MongoClient(host='xx.xx.xx.xx', port=27017, username='root', password='xxxxxx')
101 | # client = pymongo.MongoClient('mongodb://root:xxxxxx@localhost:27017/')
102 | # get the database (created automatically if it doesn't exist)
103 | db = client.tencent_wv
104 | # get the collection (created automatically if it doesn't exist)
105 | my_set = db.test_set
106 | # drop the collection test_set
107 | db.drop_collection('test_set')
108 | # insert and query data
109 | my_set.insert_one({"name": "zhangsan", "age": 18, 'shuze': [3, 4, 2, 6, 7, 10]})
110 | my_set.find_one({"name": "zhangsan"})
111 | ```
112 | ```python
113 | # example: inserting the Tencent word vectors
114 | from tqdm import tqdm
115 | # a generator over the embedding file
116 | def __reader():
117 |     with open("/opt/common_files/Tencent_AILab_ChineseEmbedding.txt", encoding='utf-8', errors='ignore') as f:
118 |         for idx, line in tqdm(enumerate(f), 'Loading ...'):
119 |             ws = line.strip().split(' ')
120 |             if idx:  # skip the header line
121 |                 vec = [float(i) for i in ws[1:]]
122 |                 if len(vec) != 200:
123 |                     continue
124 |                 yield {'word': ws[0], 'vector': vec}
125 | 
126 | # iterate the generator with a for loop; `while rd: next(rd)` would loop forever and end in an uncaught StopIteration
127 | for doc in __reader():
128 |     my_set.insert_one(doc)
128 | ```
129 |
130 | ## 4. ElasticSearch(docker version)
131 | ```
132 | # Run Elasticsearch
133 | docker run -d --name elasticsearch_for_test -p 9200:9200 -e "discovery.type=single-node" elasticsearch:6.6.0
134 | # install elasticsearch-head (optional web UI)
135 | ```
136 | ```python
137 | # connect from python and do the basic CRUD operations
138 | from elasticsearch import Elasticsearch
139 | from elasticsearch import helpers
140 | # es = Elasticsearch(hosts="localhost:9200", http_auth=('username','passwd'))
141 | esclient = Elasticsearch(['localhost:9200'])
142 | # efficient bulk insert
143 | action1 = {
144 | "_index": "idx111",
145 | "_type": "test",
146 | # "_id": ,
147 | "_source": {
148 | 'ServerIp': '0.1.1.1',
149 | 'SpiderType': 'toxic',
150 | 'Level': 4
151 | }
152 | }
153 | action2 = {
154 | "_index": "idx111",
155 | "_type": "pre",
156 | # "_id": 1,
157 | "_source": {
158 | 'ServerIp': '0.1.1.2',
159 | 'SpiderType': 'non-toxic',
160 | 'Level': 1
161 | }
162 | }
163 | actions = [action1, action2]
164 | helpers.bulk(esclient, actions)
165 |
166 | #---------------------------------------------------
167 | # create a mapping (the schema-like step), then index single documents
168 | # roughly equivalent to creating a schema
169 | answer_index = 'baidu_answer'
170 | answer_type = 'doc22'
171 | esclient.indices.create(answer_index)
172 | answer_mapping = {
173 | "doc22": {
174 | "properties": {
175 | "id": {
176 | "type": "integer",
177 | # "index": True
178 | },
179 | "schoolID":{
180 | "type":"text"
181 | },
182 | "schoolName":{
183 |                 "type": "text",
184 |                 "analyzer": "ik_max_word"  # needs the ik plugin: run the 6.6.0 container, docker exec into /bin/bash, download and unpack ik, exit and restart the container (a new image can then be committed)
185 |                 # "analyzer": "whitespace"
186 | },
187 | "calNum":{
188 | "type":"float"
189 | }
190 | }
191 | }
192 | }
193 | esclient.indices.put_mapping(index=answer_index, doc_type=answer_type, body=answer_mapping)
194 | # insert documents once the mapping exists
195 | doc = {'id': 7, 'schoolID': '007', 'schoolName': '春晖外国语学校', 'calNum': 6.20190624}
196 | esclient.index(index=answer_index, doc_type=answer_type, body=doc, id=doc['id'])
197 | esclient.index(index=answer_index, doc_type=answer_type, body=doc, id=10)
198 | #----------------------------------------------------
199 | 
200 | # delete a single document
201 | # esclient.delete(index='indexName', doc_type='typeName', id='idValue')
202 | esclient.delete(index='pre', doc_type='imagetable2', id=1)
203 | # delete an index
204 | esclient.indices.delete(answer_index)
205 | 
206 | # update
207 | # esclient.update(index='indexName', doc_type='typeName', id='idValue', body={'doc': {fields to update}})
208 | new_doc = {'id': 7, 'schoolId': '007', 'schoolName': '更新名字1'}
209 | esclient.update(index=answer_index, id=7, doc_type=answer_type, body={'doc': new_doc})  # the body must be wrapped in 'doc'; only the fields being updated need to be included
210 | 
211 | # queries
212 | ### fetch a document by id
213 | res = esclient.get(index=answer_index, doc_type=answer_type, id=7)
214 | ### fetch documents by a list of ids
215 | body = {'ids': id_lst}  # e.g. id_lst = [3, 6, 120, 9]
216 | res = esclient.mget(index=index, doc_type=doc_type, body=body)
217 | ### match: returns documents whose schoolName contains the keyword (tokenized here by ik)
218 | # res = esclient.search(index=answer_index, body={'query': {'match': {'schoolName': '春晖外'}}})
219 | res = esclient.search(index=answer_index, body={'query': {'match': {'schoolName': '春晖学校'}}})
220 | ### query via the REST api
221 | # [GET] http://localhost:9200/knowledge_qv/question_vec/20
222 | ```
223 |
224 | [How to fetch more than 10,000 hits in ES](https://blog.csdn.net/xsdxs/article/details/72876703)
225 |
226 | ## 5. neo4j图数据库(docker version)
227 | ```
228 | # start neo4j with docker
229 | docker run \
230 | --publish=7474:7474 --publish=7687:7687 \
231 | --volume=$PWD/neo4j/data:/data \
232 | -d neo4j:latest
233 | 
234 | # then open the web UI at http://localhost:7474
235 | 
236 | # or use the Cypher shell (the container name/id was missing from the original command)
237 | docker exec --interactive --tty <container-name> bin/cypher-shell
238 | # leave the shell with: exit
239 | ```
240 |
241 | ## 6. Stardog RDF数据库
242 |
243 | [stardog官方文档](https://www.stardog.com/docs/)
244 | [RDF入门](https://blog.csdn.net/txlCandy/article/details/50959358)
245 | [OWL语言](https://blog.csdn.net/zycxnanwang/article/details/86557350)
246 |
--------------------------------------------------------------------------------
/07_database/es.md:
--------------------------------------------------------------------------------
 1 | # Storing 768-dim vectors in ES and querying by vector (requires ES 7.3+)
2 |
3 | https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/semantic-search/semantic_search_quora_elasticsearch.py
4 |
5 |
6 | ```python
7 | """
8 | This script contains an example how to perform semantic search with ElasticSearch.
9 |
10 | As dataset, we use the Quora Duplicate Questions dataset, which contains about 500k questions:
11 | https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs
12 |
13 | Questions are indexed to ElasticSearch together with their respective sentence
14 | embeddings.
15 |
16 | The script shows results from BM25 as well as from semantic search with
17 | cosine similarity.
18 |
19 | You need ElasticSearch (https://www.elastic.co/de/elasticsearch/) up and running. Further, you need the Python
20 | ElasticSearch Client installed: https://elasticsearch-py.readthedocs.io/en/master/
21 |
22 | As embeddings model, we use the SBERT model 'quora-distilbert-multilingual',
23 | that it aligned for 100 languages. I.e., you can type in a question in various languages and it will
24 | return the closest questions in the corpus (questions in the corpus are mainly in English).
25 | """
26 |
27 | from sentence_transformers import SentenceTransformer, util
28 | import os
29 | from elasticsearch import Elasticsearch, helpers
30 | import csv
31 | import time
32 | import tqdm.autonotebook
33 |
34 |
35 |
36 | es = Elasticsearch()
37 |
38 | model = SentenceTransformer('quora-distilbert-multilingual')
39 |
40 | url = "http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv"
41 | dataset_path = "quora_duplicate_questions.tsv"
42 | max_corpus_size = 100000
43 |
44 | #Download dataset if needed
45 | if not os.path.exists(dataset_path):
46 | print("Download dataset")
47 | util.http_get(url, dataset_path)
48 |
49 | #Get all unique sentences from the file
50 | all_questions = {}
51 | with open(dataset_path, encoding='utf8') as fIn:
52 | reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_MINIMAL)
53 | for row in reader:
54 | all_questions[row['qid1']] = row['question1']
55 | if len(all_questions) >= max_corpus_size:
56 | break
57 |
58 | all_questions[row['qid2']] = row['question2']
59 | if len(all_questions) >= max_corpus_size:
60 | break
61 |
62 | qids = list(all_questions.keys())
63 | questions = [all_questions[qid] for qid in qids]
64 |
65 | #Index data, if the index does not exists
66 | if not es.indices.exists(index="quora"):
67 | try:
68 | es_index = {
69 | "mappings": {
70 | "properties": {
71 | "question": {
72 | "type": "text"
73 | },
74 | "question_vector": {
75 | "type": "dense_vector",
76 | "dims": 768
77 | }
78 | }
79 | }
80 | }
81 |
82 | es.indices.create(index='quora', body=es_index, ignore=[400])
83 | chunk_size = 500
84 | print("Index data (you can stop it by pressing Ctrl+C once):")
85 | with tqdm.tqdm(total=len(qids)) as pbar:
86 | for start_idx in range(0, len(qids), chunk_size):
87 | end_idx = start_idx+chunk_size
88 |
89 | embeddings = model.encode(questions[start_idx:end_idx], show_progress_bar=False)
90 | bulk_data = []
91 | for qid, question, embedding in zip(qids[start_idx:end_idx], questions[start_idx:end_idx], embeddings):
92 | bulk_data.append({
93 | "_index": 'quora',
94 | "_id": qid,
95 | "_source": {
96 | "question": question,
97 | "question_vector": embedding
98 | }
99 | })
100 |
101 | helpers.bulk(es, bulk_data)
102 | pbar.update(chunk_size)
103 |
104 | except:
105 | print("During index an exception occured. Continue\n\n")
106 |
107 |
108 |
109 |
110 | #Interactive search queries
111 | while True:
112 | inp_question = input("Please enter a question: ")
113 |
114 | encode_start_time = time.time()
115 | question_embedding = model.encode(inp_question)
116 | encode_end_time = time.time()
117 |
118 | #Lexical search
119 | bm25 = es.search(index="quora", body={"query": {"match": {"question": inp_question }}})
120 |
121 | #Sematic search
122 | sem_search = es.search(index="quora", body={
123 | "query": {
124 | "script_score": {
125 | "query": {
126 | "match_all": {}
127 | },
128 | "script": {
129 | "source": "cosineSimilarity(params.queryVector, doc['question_vector']) + 1.0",
130 | "params": {
131 | "queryVector": question_embedding
132 | }
133 | }
134 | }
135 | }
136 | })
137 |
138 | print("Input question:", inp_question)
139 | print("Computing the embedding took {:.3f} seconds, BM25 search took {:.3f} seconds, semantic search with ES took {:.3f} seconds".format(encode_end_time-encode_start_time, bm25['took']/1000, sem_search['took']/1000))
140 |
141 | print("BM25 results:")
142 | for hit in bm25['hits']['hits'][0:5]:
143 | print("\t{}".format(hit['_source']['question']))
144 |
145 | print("\nSemantic Search results:")
146 | for hit in sem_search['hits']['hits'][0:5]:
147 | print("\t{}".format(hit['_source']['question']))
148 |
149 | print("\n\n========\n")
150 | ```
--------------------------------------------------------------------------------
/07_database/faiss.md:
--------------------------------------------------------------------------------
 1 | # faiss vector search library
 2 | 
 3 | Like the ES 7.3+ vector search covered in es.md, faiss is a more specialized vector search tool
4 |
5 | [实战入门faiss搜索bert最邻近句子:docker CPU镜像开箱即用,无需额外安装下载](https://mp.weixin.qq.com/s?__biz=MzA4NzkxNzM3Nw==&mid=2457484515&idx=1&sn=c13b27b09b4a7e2a31a1ee421b362540&chksm=87bc8acdb0cb03db46ca7cc0893e46d4078e925a3b35f717806315c0881f6ad75b2165df4a0f&cur_album_id=2002019450945896449&scene=189#wechat_redirect)
6 |
7 | [semantic_search_quora_faiss.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/semantic-search/semantic_search_quora_faiss.py)
8 |
9 |
10 |
11 | ## todo
--------------------------------------------------------------------------------
/07_database/imgs/redis_pic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/07_database/imgs/redis_pic.png
--------------------------------------------------------------------------------
/07_database/neo4j.md:
--------------------------------------------------------------------------------
1 | # neo4j入门
2 |
3 | [**1. neo4j_python操作**](#neo4j_python操作)
4 |
5 | [**2. neo4j网页版直接操作**](#neo4j网页版直接操作)
6 |
7 | [**3. neo4j-spark-connector操作**](#neo4j-spark-connector操作)
8 |
9 | [**4. neo4j问题整理**](#neo4j问题整理)
10 |
11 |
12 | ### neo4j_python操作
13 |
14 | ```python
15 | import numpy as np
16 | import pandas as pd
17 | import py2neo
18 | from py2neo import Graph,Node,Relationship
19 | import neo4j
20 | from neo4j.v1 import GraphDatabase, basic_auth
21 |
22 | # py2neo操作
23 | test_graph = Graph(
24 | #"http://localhost:7474",
25 | "bolt://localhost:7687",
26 | username="neo4j",
27 | password="z123456789"
28 | )
29 |
30 | # 创建节点
31 | node1 = Node('Customer', name='John',age=18,phone=2232)
32 | node2 = Node('Customer', name='Lily',age=22,phone=9921)
33 | node3 = Node('Customer', name='Cathy',age=52,phone=7100)
34 | test_graph.create(node1)
35 | test_graph.create(node2)
36 | test_graph.create(node3)
37 |
38 | # 创建节点2
39 | arr = np.array([['John','Lily','Ben','Mark'],['189101','234220','019018','330682'],[11,23,56,28]])
40 | df = pd.DataFrame(arr.transpose(),columns=['name','phone_no','age'])
41 | for i, j, k in df.values:
42 | node1 = Node('Person',name=i,phone_no=j,age=k)
43 |     test_graph.create(node1)
44 |
45 | # neo4j.v1操作
46 | driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "z123456789"))
47 | session = driver.session()
48 | # 创建节点3
49 | arr = np.array([['John','Lily','Ben','Mark'],['189101','234220','019018','330682'],[11,23,56,28]])
50 | df = pd.DataFrame(arr.transpose(),columns=['name','phone_no','age'])
51 | # name phone_no age
52 | # 0 John 189101 11
53 | # 1 Lily 234220 23
54 | # 2 Ben 019018 56
55 | # 3 Mark 330682 28
56 | # dataframe to dict操作
57 | dic = {'events':df.to_dict('records')}
58 | session.run("unwind {events} as event merge (n:Person{name:event.name,phone_no2:event.phone_no,age: event.age})",dic)
59 |
60 | # 删除所有节点和边
61 | test_graph.delete_all()
62 | ```
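The UNWIND pattern above passes a list of dicts as the `events` parameter; that list does not require pandas. A minimal stdlib sketch (the commented connection uses the current `neo4j` driver, where the legacy `{events}` placeholder becomes `$events`; host/credentials are illustrative):

```python
# Build the `events` list consumed by UNWIND from plain columns (stdlib only).
names = ['John', 'Lily', 'Ben', 'Mark']
phones = ['189101', '234220', '019018', '330682']
ages = [11, 23, 56, 28]

events = [
    {'name': n, 'phone_no': p, 'age': a}
    for n, p, a in zip(names, phones, ages)
]
print(events[0])  # {'name': 'John', 'phone_no': '189101', 'age': 11}

# With the current neo4j driver the parameter syntax is $events:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     session.run(
#         "UNWIND $events AS event "
#         "MERGE (n:Person {name: event.name, phone_no: event.phone_no, age: event.age})",
#         events=events,
#     )
```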
63 |
64 | ### neo4j网页版直接操作
65 |
66 | 先把jdk改成1.8然后再进入neo4j的文件夹bin中输入neo4j.bat console撑起网页版服务
67 |
68 | 在http://localhost:7474/browser/用户名neo4j密码z1234...中输入命令进行一些简单的节点,关系等操作
69 |
70 | **Neo4j CQL常见的操作有:**
71 |
72 | |S.No|CQL命令/条款|用法
73 | |--|--|--
74 | |1|CREATE[创建节点](#创建节点)|创建节点,关系和属性
75 | |2|CREATE[创建关系1](#创建关系1)|创建关系和属性
76 | |3|CREATE[创建关系2](#创建关系2)|创建关系和属性
77 | |4|CREATE[创建关系3](#创建关系3)|创建关系和属性
78 | |5|MATCH[匹配](#匹配)|检索有关系点,关系和属性数据
79 | |6|RETURN[返回](#返回)|返回查询结果
80 | |7|WHERE[哪里](#哪里)|提供条件过滤检索数据
81 | |8|DELETE[删除](#删除)|删除节点和关系
82 | |9|REMOVE[移除](#移除)|删除节点和关系的属性
83 | |10|ORDER BY以..[排序](#排序)|排序检索数据
84 | |11|SET[设置](#设置)|添加或更新标签
85 | |12|[UNWIND](#unwind)|unwind操作
86 | |13|[INDEX](#index)|index添加,删除和查询
87 | |14|[修改graph.db](#修改)|修改备份Neo4j图数据库
88 |
89 | **Neo4j CQL常见的函数有:**
90 | |S.No|定制列表功能|用法
91 | |--|--|--
92 | |1|String[字符串](#string)|它们用于使用String字面量(UPPER,LOWER,SUBSTRING,REPLACE)
93 | |2|Aggregation[聚合](#聚合)|对CQL查询结果执行一些聚合操作(COUNT,MAX,MIN,SUM,AVG)
94 | |3|Relationship关系|他们用于获取关系的细节如startnode, endnode等
95 |
96 |
97 | ---
98 | **CQL常见的操作**
99 |
100 | - 创建节点
101 | 1)创建节点:
102 | create (e:Customer{id:'1001',name:'Bob',dob:'01/10/1982'})
103 | create (cc:CreditCard{id:'5001',number:'1234567890',cvv:'888',expiredate:'20/17'})
104 |
105 | 2)导入csv文件进行节点的创建
106 | load csv with headers from "file:///shop.csv" as df
107 |   merge(:Shop{name:df.name,cn_name:df.cn_name,age:df.age,sex:df.sex})
108 |
109 | - Import data from a CSV file with a custom field delimiter
110 | 比如 load csv with headers from "file:///shop.csv" as df FIELDTERMINATOR '\t'
111 | - Importing large amounts of data
112 | 比如 USING PERIODIC COMMIT 500
113 | load csv with headers from "file:///shop.csv" as df
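LOAD CSV expects a header row whose names match the properties used in the MERGE; a quick way to generate a compatible shop.csv with the stdlib csv module (note that LOAD CSV hands every field to Cypher as a string, matching the note later in this file):

```python
import csv

# Write a shop.csv whose header matches the properties used in the MERGE above.
rows = [
    {'name': 'Jack', 'cn_name': '杰克', 'age': '22', 'sex': '男'},
    {'name': 'Lily', 'cn_name': '丽丽', 'age': '34', 'sex': '女'},
]
with open('shop.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'cn_name', 'age', 'sex'])
    writer.writeheader()
    writer.writerows(rows)

# Reading it back shows exactly what LOAD CSV sees: all values as strings.
with open('shop.csv', encoding='utf-8') as f:
    back = list(csv.DictReader(f))
print(back[0]['name'])  # Jack
```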
114 |
115 | ----------
116 |
117 | ```
118 | - 创建关系1
119 | 创建两个节点之间的关系:
120 | match(e:Customer),(cc:CreditCard)
121 | create (e)-[r:DO_SHOPPING_WITH]->(cc)
122 |
123 | 创建两个节点之间的关系(加边属性):
124 | match(e:Customer),(cc:CreditCard)
125 | create (e)-[r:DO_SHOPPING_WITH{shopdate:'12/12/2014',price:6666}]->(cc)
126 |
127 | - 创建关系2
128 | 根据两个节点之间的相同属性进行连接1:
129 | match(c:Customer),(p:Phone)
130 | where c.phone = p.phone_no
131 | create (c)-[:Call]->(p)
132 |
133 | 根据两个节点之间的相同属性进行连接2:
134 | match(a:Test),(b:Test22)
135 | where b.ide in a.name
136 | create (a)-[:sssssssssss]->(b)
137 | 这里的a.name是个list
138 |
139 | - 创建关系3
140 | 各自创建节点比如shop和phone两个节点,然后导入一个关系的csv文件进行连接
141 | ```
142 |
143 | **shop.csv**
144 | |name|cn_name|age|sex
145 | |--|--|--|--
146 | |Jack|杰克|22|男
147 | |Lily|丽丽|34|女
148 | |John|约翰|56|男
149 | |Mark|马克|99|男
150 |
151 | **phone.csv**
152 | |phone|id_p|
153 | |--|--
154 | |1223|0
155 | |3432|1
156 | |9011|2
157 |
158 | **关系.csv**
159 | |name|phone
160 | |--|--
161 | |Jack|1223
162 | |Lily|3432
163 | |John|9011
164 | |Mark|3432
165 |
166 | ```
167 | cypher关系语句:
168 | load csv with headers from "file:///test.csv" as df
169 | match(a:Shop{name:df.name}),(b:Phone{phone:df.phone})
170 | create (a)-[:Call{phone_id:df.id_p}]->(b)
171 | ```
172 | *注:neo4j中不能创建双向或者无向的关系,只能单向*
173 |
174 | ```
175 | ### 匹配
176 |
177 | 三层关系:
178 | match (n:企业)-[k*1..3]-(m) return n.company_nm
179 |
180 | ### 返回
181 | match(e:Customer),(cc:CreditCard)
182 | return e.name,cc.cvv
183 |
184 | ### 哪里
185 | match(n:Customer), (cc:CreditCard)
186 | where n.name = 'Lily' and cc.id = '5001'
187 | create (n)-[r:DO_SHOPPING_WITH{shopdate:'1/1/9999', price:100}]->(cc)
188 |
189 | 正则使用:
190 | match(n:Person)
191 | where n.name =~ '(?i)^[a-d].*'
192 | return n
193 |
194 | ### 删除
195 | 删除所有的节点和关系
196 | match (n) detach delete n
197 | 或 match(n) match(n)-[r]-() delete n,r(注意:该写法不会删除孤立节点)
197 |
198 | 删除相关的节点及和这些节点相连的边(一阶)
199 | match(cc:Customer)
200 | *detach* delete cc
201 | 或者
202 | match(cc:Customer) match(cc)-[r]-() delete cc,r
203 |
204 | 删除产品及上下游相连关系和节点,(递归),除3款产品外
205 | match r=(n:Product)-[*]->() where not n.raw_name in ["xx1","xx2","xx3"] detach delete r
206 |
207 | 删除所有孤立节点
208 | match (n) where not (n)--() delete n
209 |
210 | 删除一阶孤立节点,比如保险责任->保险子责任(保险责任上游还有带产品的不删)
211 | match (n)-[r]-(m) where n.raw_name='保险责任' and not (n)-[]-(:Product) detach delete n,r,m;
212 |
213 | 删除条款示例(需要执行三句话)
214 | match(n) where id(n)={id} detach delete n
215 | match (n)-[r]-(m) where n.raw_name='保险责任' and not (n)-[]-(:Product) detach delete n,r,m
216 | match (n) where not (n)--() delete n
217 |
218 | ### 移除
219 | 可以移除节点的属性
220 | match(n:Customer) where n.name = 'Lily'
221 | remove n.dob
222 |
223 | ### 设置
224 | 可以设置节点的属性(增加或者改写)
225 | match(n:Customer) where n.name = 'Bob'
226 | SET n.id = 1003
227 |
228 | 对已经存在的点,进行属性添加操作
229 | **--Person:**
230 | create(:Person{cd:'1223',xx:'er'})
231 | create(:Person{cd:'92223',xx:'iir'})
232 | create(:Person{cd:'6783',xx:'rrrr'})
233 | create(:Person{cd:'555903',xx:'ppppppppppr'})
234 | ```
235 |
236 | **--test.csv:** (注:导入csv的时候会把所有的转成string格式)
237 |
238 | |col_one|col_two|col_three
239 | |--|--|--
240 | |555903|"桂勇"|"良"
241 | |92223|"黎明"|"优"
242 | |1223|"皇家"|"优"
243 | |6783|"汽车"|"良"
244 | 给Person添加两个属性
245 | load csv with headers from "file:///test.csv" as df
246 | match(n:Person) where n.cd = df.col_one
247 | set n.nm = df.col_two
248 | set n.credit = df.col_three
249 |
250 | ```
251 | ### 排序
252 | match(n:Customer)
253 | return n.name, n.id, n.dob
254 | order by n.name desc
255 |
256 | ### UNWIND
257 | 创建节点
258 | unwind ['John','Mark','Peter'] as name
259 | create (n:Customer{name:name})
260 |
261 | unwind [{id:1,name:'Bob',phone:1232},{id:2,name:'Lily',phone:5421},{id:3,name:'John',phone:9011}] as cust
262 | create (n:Customer{name:cust.name,id:cust.id,phone:cust.phone})
263 |
264 | 删除节点
265 | unwind [1,2,3] as id
266 | match (n:Customer) where n.id = id
267 | delete n
268 |
269 | ---
270 |
271 | ### String
272 | match(e:Customer)
273 | return e.id,upper(e.name) as name, e.dob
274 |
275 | ### 聚合
276 | count三种写法:
277 | 1. Match (n:people) where n.age=18 return count(n)
278 | 2. Match (n:people{age:'18'}) return count(n)
279 | 3. Match (n:people) return count(n.age=18) //注意:count(expr)统计的是expr非null的行数,与前两句并不等价
280 |
281 | ### INDEX
282 | 添加 CREATE INDEX ON :Person(name)
283 | 删除 DROP INDEX ON :Person(name)
284 | 查询 call db.indexes()
285 |
286 | ### 修改
287 | 在neo4j的文件夹conf下面,打开文件neo4j.conf,找到一下位置处
288 |
289 | dbms.active_database=graph.db,修改数据库名字,例如graph.db -> graph2.db即可。
290 |
291 | ```
292 |
--------------------------------------------------------------------------------
/08_vscode/README.md:
--------------------------------------------------------------------------------
1 | # vscode使用(版本1.86.2)
2 |
3 | ## 1. 在VScode中添加远程Linux服务器中Docker容器中的Python解释器
4 |
5 | **以dgx.6机器为例**
6 | ```shell
7 | # 第一步 创建容器
8 | nvidia-docker run -d --name myllm -p 8891:22 -v $PWD/llm:/workspace/llm -w /workspace/llm -it 10.xx.xx.xxx/zhoubin/llm:py311-cuda12.1.0-cudnn8-devel-ubuntu22.04 /bin/bash
9 | 注释:
10 | [-p 8891:22]:把docker的端口号22映射到服务器的端口号8891。
11 | [-d]:容器后台运行,避免退出容器后容器自动关闭。
12 | [-v]:挂载和同步目录,服务器和docker内有一个文件夹保持同步。
13 | [-it]:确保docker后台交互运行。
14 | [10.xx.xx.xxx/zhoubin/llm:py311-cuda12.1.0-cudnn8-devel-ubuntu22.04]:镜像名。
15 | [/bin/bash]:docker内要运行的指令。
16 | ```
17 | ```shell
18 | #第二步 在容器内安装ssh服务
19 | docker exec -it [容器ID] /bin/bash
20 | # 更新apt-get
21 | 命令:apt-get update
22 | # 安装vim
23 | 命令:apt-get install vim
24 | # 安装openssh-server
25 | 命令:apt-get install openssh-server
26 | # 设置root密码(docker里面的用户名和密码,我这边账号密码都是root/root)
27 | 命令:passwd
28 | ```
29 | ```shell
30 | # 第三步 配置/etc/ssh/sshd_config文件
31 | # 在文件/etc/ssh/sshd_config中添加下面的代码:
32 | PubkeyAuthentication yes
33 | PermitRootLogin yes
34 |
35 | # 第四步 重启ssh服务(好像每次停止容器后重启都需要运行下)
36 | /etc/init.d/ssh restart
37 | 或 service ssh restart
38 |
39 | # 第五步 退出docker后,验证端口映射
40 | docker ps -a
41 | docker port [容器ID] 22
42 | 若结果输出“0.0.0.0:8891”,则说明端口映射正确。
43 | ```
44 | ```shell
45 | # 第6步 本地电脑连接docker(见Termius dgx6_docker_llm)
46 | ssh root@11.xx.xx.xxx -p 8891 ,密码是root
47 | ```
48 | ```shell
49 | # 使用VSCode连接远程主机上的docker container
50 | # 打开VScode编辑器,按下快捷键“Ctrl+Shift+X”,查找安装“Remote Development”。安装完成后需要点击“reload”,然后按下快捷键“Ctrl+Shift+P”,输入“remote-ssh”,选择“open SSH Configuration file”,在文件xx/username/.ssh/config中添加如下内容:
51 | Host llm_docker #Host随便起名字
52 | HostName 11.xxx.xx.x
53 | User root
54 | Port 8891
55 |
56 | #保存后,按下快捷键"Ctrl+Shift+P",输入"remote-ssh",选择"Connect to Host...",然后点击"llm_docker",接着选择“Linux”,最后按提示输入第三步中设置的root连接密码,在左下角显示"SSH:llm_docker",说明已经成功连接docker。
57 | ```
58 |
59 | ```shell
60 | #内网环境远程如果出现连接不上,大概率是vscode-server无法下载导致,可以手动搞定
61 | https://update.code.visualstudio.com/commit:903b1e9d8990623e3d7da1df3d33db3e42d80eda/server-linux-x64/stable
62 |
63 | 具体参考附录中的[VSCode连不上远程服务器]
64 | ```
65 |
66 |
67 | ## 2. Debugging(自带,不需要额外安装插件)
68 |
69 | 在Visual Studio Code(VSCode)中,[Debug Console](https://code.visualstudio.com/Docs/editor/debugging)是一个用于查看程序调试信息的窗口。它通常用于查看程序在调试过程中输出的日志信息、变量的值等。Debug Console提供了一个方便的方式来查看和分析程序的执行过程,帮助开发人员定位和解决代码中的问题。
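启动调试前需要一个launch configuration;下面是一个最小化的`.vscode/launch.json`示例(字段均为Python扩展的默认值,非本项目特定配置):

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": true
        }
    ]
}
```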
70 |
71 |
72 | ----
73 |
74 | [vscode历史版本下载地址](https://code.visualstudio.com/updates/v1_86)
75 | [vscode扩展应用市场vsix文件手动下载安装](https://marketplace.visualstudio.com/search?target=VSCode&category=All%20categories&sortBy=Installs)
76 | [vscode扩展应用市场vsix文件手动下载历史版本插件包](https://blog.csdn.net/qq_15054345/article/details/133884626)
77 | [在VScode中添加Linux中的Docker容器中的Python解释器](https://blog.csdn.net/weixin_43268590/article/details/129244984)
78 | [VSCode连不上远程服务器](https://blog.csdn.net/qq_42610612/article/details/132782965)
79 | [无网机的vscode中怎么使用jupyter notebook](https://www.bilibili.com/read/cv34411972/?jump_opus=1)
80 |
--------------------------------------------------------------------------------
/09_remote_ipython/README.md:
--------------------------------------------------------------------------------
1 | [**1. pycharm远程配置**](#pycharm远程配置)
2 |
3 | [**2. 远程ipython http版**](#远程ipython_http版)
4 |
5 | [**3. 远程ipython https安全版**](#远程ipython_https安全版)
6 |
7 | [**4. jupyter notebook启动错误总结**](#jupyter_notebook启动错误总结)
8 |
9 | [**5. 添加Anaconda虚拟环境**](#添加anaconda虚拟环境)
10 |
11 | # pycharm远程配置
12 |
13 | pycharm远程配置:
14 | file->Settings->Project Interpreter->加入远程ssh的连接和python的执行文件地址
15 | 然后再加一个path mappings(本地和远程的文件存储地址)
16 |
17 | 文件同步配置:
18 | Tools->Deployment->Configuration->添加一个新SFTP
19 | Root path选远程文件夹
20 | Web server root URL: http:///
21 | Mappings选local path工程目录,其他的都为/
22 |
23 | done!
24 |
25 | # 远程ipython_http版
26 |
27 | 1. 打开ipython
28 | ```python
29 | from IPython.lib import passwd #from notebook.auth import passwd
30 | In [2] : passwd() # 输入密码
31 | Enter password:
32 | Verify password:
33 | Out[2]: 'sha1:f9...'
34 | ```
35 |
36 | 2. 新建jupyter_config.py,输入如下配置。
37 | ```python
38 | c.NotebookApp.password = u'sha1:f9...'
39 | c.NotebookApp.ip = '*'
40 | c.NotebookApp.open_browser = False
41 | c.NotebookApp.port = 8888
42 | ```
43 |
44 | 3. 启动jupyter notebook 并指定配置文件,输入如下命令。
45 | ```bash
46 | jupyter notebook --config=jupyter_config.py
47 | ```
48 |
49 | 4. 若客户端浏览器无法打开jupyter,有可能是防火墙的缘故,输入如下命令开放对应的
50 | 的端口(若使用IPv6,把命令iptables改成ip6tables)
51 | ```bash
52 | iptables -I INPUT -p tcp --dport 8888 -j ACCEPT
53 | service iptables save
54 | ```
55 |
56 | # 远程ipython_https安全版
57 |
58 | 通过mac终端登录:
59 | sudo ssh -p 22 ubuntu@182.254.247.182
60 | z1234..
61 | 安装教程和视频(在本机)
62 | http://blog.csdn.net/hshuihui/article/details/53320144
63 |
64 | 安装ipython notebook on 百度云
65 | ```bash
66 | wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/
67 | ```
68 | PATH in your .bashrc or .bash_profile
69 | ```bash
70 | export PATH="/root/anaconda2/bin:$PATH"
71 | ```
72 | 在服务器上启动IPython,生成自定义密码的sha1
73 | ```python
74 | In [1]: from IPython.lib import passwd
75 | In [2]: passwd()
76 | Enter password:
77 | Verify password:
78 | Out[2]: 'sha1:01f0def65085:059ed81ab3f5658e7d4d266f1ed5394e9885e663'
79 | ```
80 | 创建IPython notebook服务器
81 | ```bash
82 | ipython profile create nbserver
83 | ```
84 | 生成mycert.pem
85 | ```bash
86 | mkdir certs
87 | cd certs
88 | 然后openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
89 | ```
90 | 我们重点要关注的是 cd .ipython/profile_nbserver
91 | ipython_notebook_config.py这个文件,待会儿我们要修改该文件来配置服务器。不过,有时候这个文件不能生成,
92 | 这时候我们自己在这里新建即可,使用vim或者gedit。我自己配置的时候就没有生成ipython_notebook_config.py这个文件,我使用vim新建了一个:
93 | 然后把以下代码复制进去(替换certfile路径和sha1),保存
94 |
95 | ```python
96 | # Configuration file for ipython-notebook
97 | c = get_config()
98 | #Kernel config
99 | c.IPKernelApp.pylab = 'inline'
100 | #Notebook config
101 | c.NotebookApp.certfile = u'/root/certs/mycert.pem'
102 | c.NotebookApp.ip = '*'
103 | c.NotebookApp.open_browser = False
104 | c.NotebookApp.password = u'sha1:375df20c451e:16f5535e55154eb3490dbcb83d8cb930ef3c3799'
105 | c.NotebookApp.port = 8888
106 | ```
107 | 启动命令:
108 | ```bash
109 | ipython notebook --config=/root/.ipython/profile_nbserver/ipython_notebook_config.py
110 | ```
111 | ```bash
112 | nohup ipython notebook --config=/root/.ipython/profile_nbserver/ipython_notebook_config.py
113 | 如果想关闭nohup先lsof nohup.out 然后kill -9 [PID]
114 | 登录ipython notebook:
115 | ```
116 |
117 | 或者建一个jupyter_config.py文件然后输入(http访问)
118 | ```python
119 | c.NotebookApp.password = u'sha1:ebf4c635f6b6:7d6824aa8f863ffbe7c264b28854ec2acf1a0961'
120 | c.NotebookApp.ip = '*'
121 | c.NotebookApp.open_browser = False
122 | c.NotebookApp.port = 8888
123 | ```
124 | 然后用命令行启动
125 | ```shell
126 | nohup jupyter notebook --config=jupyter_config.py
127 | ```
128 |
129 | ---
130 |
131 | Jupyter Notebook 添加目录插件
132 |
133 | ```bash
134 | pip install jupyter_contrib_nbextensions
135 | ```
136 | ```bash
137 | jupyter contrib nbextension install --user --skip-running-check
138 | ```
139 | 注意配置的时候要确保没有打开Jupyter Notebook
140 |
141 | # The installation of the Java Jupyter Kernel
142 |
143 | 要求jdk11及以上,maven3.6.3及以上
144 | ```shell
145 | java --list-modules | grep "jdk.jshell"
146 |
147 | > jdk.jshell@12.0.1
148 | ```
149 | ```shell
150 | git clone https://github.com/frankfliu/IJava.git
151 | cd IJava/
152 | ./gradlew installKernel
153 | ```
154 | 然后启动jupyter notebook即可,选java kernel的notebook
155 |
156 | ### Run docker image
157 |
158 | ```shell
159 | cd jupyter
160 | docker run -itd -p 127.0.0.1:8888:8888 -v $PWD:/home/jupyter deepjavalibrary/jupyter
161 | ```
162 |
163 | # jupyter_notebook启动错误总结
164 |
165 | [Jupyter Notebook "signal only works in main thread"](https://blog.csdn.net/loovelj/article/details/82184223)
166 | 查询了很多网站,最后发现是两个包版本安装不对,重新安装这两个包就就可以了
167 | ```shell
168 | pip install -i https://pypi.tuna.tsinghua.edu.cn/simple "pyzmq==17.0.0" "ipykernel==4.8.2"
169 | ```
170 |
171 | # 添加anaconda虚拟环境
172 |
173 | 把anaconda3整个文件夹拷贝到anaconda3/envs下,然后取名为比如tf-gpu
174 | 然后可以把这个文件夹下的包的版本可以自行替换比如把tf2.0替换成tf1.14(注:不要删除,会有问题)
175 | 然后在jupyter notebook添加Anaconda虚拟环境的python kernel
176 | ```shell
177 | conda create -n tf-gpu python=3.8 # 创建tf-gpu虚拟环境
178 | source activate tf-gpu # 激活tf-gpu环境
179 | conda deactivate # 退出虚拟环境
180 | conda install ipykernel # 安装ipykernel模块(如果是虚拟机没联网,可以去https://anaconda.org/conda-forge/ipykernel/files下载)
181 | python -m ipykernel install --user --name tf-gpu --display-name "tf-gpu" # 进行配置
182 | jupyter notebook # 启动jupyter notebook,然后在"新建"中就会有py3这个kernel了
183 | ```
184 | 虚拟环境启动notebook
185 | ```shell
186 | 1. conda install jupyter notebook(如果不行,主环境的site-package整个拷贝到envs/下的虚拟环境)
187 | 2. 虚拟环境安装jupyter_nbextensions_configurator(https://zodiac911.github.io/blog/jupyter-nbextensions-configurator.html)
188 | 3. 虚拟环境conda install nb_conda(安装好这个则notebook新建的时候会出现该环境)
189 | 4. 进到虚拟环境启动jupyter notebook以后,如果import包有问题则退出并运行conda install nomkl numpy scipy scikit-learn numexpr
190 | ```
191 |
192 |
--------------------------------------------------------------------------------
/10_docker/README.md:
--------------------------------------------------------------------------------
1 | # [docker入门实践](https://yeasy.gitbook.io/docker_practice/)
2 |
3 | ## 1. docker安装及配置Docker镜像站
4 |
5 | 1.1 mac下安装
6 | [mac安装网址](https://hub.docker.com/editions/community/docker-ce-desktop-mac)
7 | [docker docs for mac](https://docs.docker.com/docker-for-mac/)
8 |
9 | 1.2 linux下安装
10 | [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
11 |
12 | 1.3 配置docker镜像站
13 | [docker镜像站网址](https://www.daocloud.io/mirror#accelerator-doc)
14 |
15 | 1.4 配置docker代理
16 |
17 | - windows中右击图标,选settings->Proxies
18 | - [mac](https://www.cnblogs.com/EasonJim/p/9988154.html)
19 | - [linux](https://blog.csdn.net/qq_30034989/article/details/132021346)
20 |
21 | ```shell
22 | # 如果使用HTTP代理服务器时,将为docker服务创建systemd插件目录
23 | mkdir -p /etc/systemd/system/docker.service.d
24 | # 创建一个名为的文件/etc/systemd/system/docker.service.d/http-proxy.conf,添加HTTP_PROXY环境变量
25 | [Service]
26 | Environment="HTTP_PROXY=http://proxy.example.com:80/"
27 | # 或者,如果使用HTTPS代理服务器,那么再创建一个名为/etc/systemd/system/docker.service.d/https-proxy.conf 添加HTTPS_PROXY环境变量
28 | [Service]
29 | Environment="HTTPS_PROXY=https://proxy.example.com:443/"
30 | # 为Docker配置不代理的地址时,可以通过NO_PROXY环境变量指定它们,比如HTTP代理服务器的配置
31 | [Service]
32 | Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"
33 | [Service]
34 | Environment="HTTPS_PROXY=https://proxy.example.com:443/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"
35 | # 重新读取服务的配置文件
36 | systemctl daemon-reload
37 | # 重启Docker
38 | systemctl restart docker #或者sudo service docker restart
39 | # 验证是否已加载配置
40 | systemctl show --property=Environment docker
41 | ```
42 |
43 |
44 | ## 2. docker基本命令
45 |
46 | 2.1 docker查看版本及images
47 | ```shell
48 | docker --version
49 | docker images
50 | ```
51 |
52 | 2.2 docker run
53 | ```shell
54 | docker run hello-world
55 | # run之前如果没有这个images,则会从docker_hub上先pull下来 docker pull hello-world
56 | ```
57 |
58 | 2.3 如果不小心关了container或者重启了电脑
59 | ```
60 | # 先查看container历史
61 | docker ps -a
62 | # 重启container即可,前提是docker run的时候要加-v把数据挂载到本地
63 | docker start <container_id>
64 | ```
65 |
66 | 2.3 docker跑完以后需要删除container再删除image
67 | ```shell
68 | # 查看image对应的container id
69 | docker ps -a
70 | # 删除container
71 | docker rm container_id
72 | # 删除image
73 | docker rmi image_id
74 | # 也可以直接暴力删除image
75 | docker rmi -f image_id
76 | # 如果存在同名同id不同tag的镜像
77 | # 可以使用 docker rmi repository:tag 的组合来删除特定的镜像
78 | ```
79 |
80 | 2.4 docker打开image bash编辑,比如打开python镜像bash下载一些包再保存
81 | ```shell
82 | # 如果原来的镜像已经启动了container,则
83 | docker exec -it <container_id> /bin/bash
84 | # 进去修改完后
85 | docker start <container_id>
86 | #------------------------------------------
87 | docker pull python:3.6
88 | docker run -it python:3.6 /bin/bash #启动镜像并进入到shell页面
89 | docker run -dit python:3.6 /bin/bash #如果只是想启动并后台运行
90 | # 接下去进行一些pip install一些包等操作
91 | docker commit -m="has update" -a="binzhouchn" binzhouchn/python36:1.3
92 | ```
93 |
94 | 2.5 docker保存和读取image(存成tar.gz文件)
95 | ```shell
96 | # 保存
97 | docker save -o helloword_test.tar fce45eedd449(image_id)
98 | #或者docker save -o mydocker.tar.gz mydocker:1.0.0
99 | # 读取
100 | docker load -i helloword_test.tar
101 | ```
102 |
103 | 2.6 docker保存和读取container
104 | ```shell
105 | # 保存
106 | docker export -o helloword_test.tar fce45eedd444(container_id)
107 | # 读取
108 | docker import ...
109 | ```
110 |
111 | 2.7 修改repository和tag名称
112 | ```shell
113 | # 加载images后名称和tag可能都为<none>,可以重新打tag
114 | docker tag [image id] [name]:[版本]
115 | ```
116 |
117 | 2.8 用dockerfile建一个image,并上传到dockerhub
118 | ```
119 | # 建一个dockerfile
120 | cat > Dockerfile <<EOF
...
EOF
142 | ```
143 |
144 | 2.11 批量停止并删除容器
145 | ```shell
146 | docker stop $(docker ps -a -q)
147 | docker rm $(docker ps -a -q)
148 | ```
149 |
150 | 2.12 虚悬镜像
151 |
152 | 上面的镜像列表中,还可以看到一个特殊的镜像,这个镜像既没有仓库名,也没有标签,均为 `<none>`。
153 | ```shell
154 | <none>               <none>              00285df0df87        5 days ago          342 MB
155 | ```
156 | 这个镜像原本是有镜像名和标签的,原来为 mongo:3.2,随着官方镜像维护,发布了新版本后,重新 docker pull mongo:3.2 时,mongo:3.2 这个镜像名被转移到了新下载的镜像身上,而旧的镜像上的这个名称则被取消,从而成为了 `<none>`。除了 docker pull 可能导致这种情况,docker build 也同样可以导致这种现象。由于新旧镜像同名,旧镜像名称被取消,从而出现仓库名、标签均为 `<none>` 的镜像。这类无标签镜像也被称为 虚悬镜像(dangling image) ,可以用下面的命令专门显示这类镜像:
157 | ```shell
158 | $ docker image ls -f dangling=true
159 | REPOSITORY TAG IMAGE ID CREATED SIZE
160 | <none>              <none>              00285df0df87        5 days ago          342 MB
161 | ```
162 | 一般来说,虚悬镜像已经失去了存在的价值,是可以随意删除的,可以用下面的命令删除。
163 | ```shell
164 | $ docker image prune
165 | ```
166 |
167 | 2.13 拷贝宿主机本地文件到docker中,和从docker中拷贝到宿主机
168 | ```shell
169 | #1
170 | docker cp test.txt <container_id>:/home
171 | #2
172 | docker cp <container_id>:/home/xx.txt /opt
173 | ```
174 |
175 | 2.14 根据镜像名定位到已经开启python:3.6镜像容器的id
176 | ```shell
177 | docker ps -a| grep python:3.6 | awk '{print $1}' #方法一
178 | docker ps -aq --filter ancestor=python:3.6 #方法二
179 | # 根据镜像名停止和删除容器
180 | docker stop `docker ps -a| grep python:3.6 | awk '{print $1}'`
181 | docker rm `docker ps -a| grep python:3.6 | awk '{print $1}'`
182 | ```
183 |
184 | 2.15 docker中python print不生效解决办法
185 | ```shell
186 | #方法一 显式调用flush
187 | print("Hello www", flush=True)
188 | #方法二 使用 "-u" 参数执行 python 命令
189 | sudo nvidia-docker run -v $PWD/masr_bz:/workspace/masr_bz -w /workspace/masr_bz binzhouchn/pytorch:1.7-cuda10.1-cudnn7-masr python -u train.py
190 | ```
191 |
192 |
193 | ## 3. docker镜像使用
194 |
195 | 【3.0 工作中】
196 | **方法一(环境和代码独立,代码放外面)**
197 | ```shell
198 |
199 | - 配置好环境镜像比如binzhouchn/python36:1.4
200 | - docker run -d -p 5005:5005 -v $PWD/xx_service:/usr/src/xx_service -w /usr/src/xx_service binzhouchn/python36:1.4 gunicorn -b :5005 server:app
201 | ```
202 |
203 | **方法二(代码放在镜像里面为一个整体)**
204 | ```shell
205 | #构建Dockerfile
206 | FROM binzhouchn/python36:1.4
207 | MAINTAINER zhoubin zhoubin@qq.com
208 | COPY target/xx_service /usr/src/xx_service
209 | WORKDIR /usr/src/xx_service
210 | ENTRYPOINT ["gunicorn", "-b", ":5005", "server:app"]
211 | #run dockerfile
212 | docker build -t binzhouchn/new_img:0.1 .
213 | #run image(后台运行,5005映射出来)
214 | docker run -d -p 5005:5005 new_img:0.1
215 | ```
216 |
217 |
218 | 3.1 docker跑一个helloworld
219 | ```shell
220 | docker run -v $PWD/myapp:/usr/src/myapp -w /usr/src/myapp python:3.5 python helloworld.py
221 | # 本地需要建一个myapp文件夹,把helloworld.py文件放文件夹中,然后返回上一级cd ..
222 | 命令说明:
223 | -v $PWD/myapp:/usr/src/myapp :将主机中当前目录下的myapp挂载到容器的/usr/src/myapp
224 | -w /usr/src/myapp :指定容器的/usr/src/myapp目录为工作目录
225 | python helloworld.py :使用容器的python命令来执行工作目录中的helloworld.py文件
226 | ```
227 |
228 | 3.1 docker跑一个简单的flask demo(用到python3.5镜像)
229 | ```shell
230 | # -d后台运行 -p端口映射
231 | docker run -d -p 5000:5000 -v $PWD/myapp:/usr/src/myapp -w /usr/src/myapp binzhou/python35:v2 python app.py
232 | ```
233 |
234 | 3.2 docker用mysql镜像
235 | ```
236 | # 先下载镜像
237 | docker pull mysql:5.5
238 | # 运行容器 可以先把-v去掉
239 | docker run -p 3306:3306 --name mymysql -v $PWD/conf:/etc/mysql/conf.d -v $PWD/logs:/logs -v $PWD/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.5
240 |
241 | -p 3306:3306:将容器的 3306 端口映射到主机的 3306 端口。
242 | -v -v $PWD/conf:/etc/mysql/conf.d:将主机当前目录下的 conf/my.cnf 挂载到容器的 /etc/mysql/my.cnf。
243 | -v $PWD/logs:/logs:将主机当前目录下的 logs 目录挂载到容器的 /logs。
244 | -v $PWD/data:/var/lib/mysql :将主机当前目录下的data目录挂载到容器的 /var/lib/mysql 。
245 | -e MYSQL_ROOT_PASSWORD=123456:初始化 root 用户的密码。
246 |
247 | # 用三方工具Navicat或者python连接,先建好db比如test_db
248 | import pymysql
249 | # 打开数据库连接
250 | db = pymysql.connect(host="localhost", user="root", password="123456", database="test_db")  # pymysql 1.0+ 需使用关键字参数
251 | # 使用 cursor() 方法创建一个游标对象 cursor
252 | cursor = db.cursor()
253 | sql = "INSERT INTO tt(a, b, date) VALUES ('%d', '%s', '%s')"
254 | data = (306, '插入6', '20190615')
255 | cursor.execute(sql % data)
256 | db.commit()
257 |
258 | # 起了mysql服务以后,在用docker python去插入数据
259 | # 需要先查看docker mysql的容器ip地址,命令看2.8
260 | # 然后localhost改成mysql容器的ip地址即可,其他一样
261 |
262 | ```
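上面的INSERT用%格式化手工拼SQL,既容易把数字错误地加引号,也有SQL注入风险;DB-API驱动应把值作为execute的第二个参数传入。下面用标准库sqlite3演示这一写法(无需起服务;pymysql中占位符是%s而不是?):

```python
import sqlite3

# Parameterized insert: the driver quotes/escapes values itself.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE tt (a INTEGER, b TEXT, date TEXT)")

sql = "INSERT INTO tt (a, b, date) VALUES (?, ?, ?)"  # pymysql写法: VALUES (%s, %s, %s)
cur.execute(sql, (306, '插入6', '20190615'))
conn.commit()

print(cur.execute("SELECT * FROM tt").fetchall())  # [(306, '插入6', '20190615')]
```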
263 |
264 | 3.3 docker用redis镜像
265 | ```
266 | # 启动redis命令
267 | docker run --name docker-redis-test -p 6379:6379 -d redis:latest --requirepass "123456"
268 | # redis客户端连接命令
269 | docker exec -it <container_name> redis-cli
270 | # 进去以后的操作
271 | auth 123456
272 | set name zhangsan
273 | get name
274 | quit
275 |
276 | # python连接docker起的redis服务
277 | import redis
278 | r = redis.Redis(host='localhost', port=6379, password='123456')
279 | r.set('name', 'John')
280 | print(r.get('name'))
281 |
282 | # redis可视化工具RDM(已安装)
283 | ```
284 |
285 | 3.4 docker用mongo镜像
286 | ```
287 | # 启动mongodb命令
288 | docker run -p 27017:27017 -v $PWD/mongo_db:/data/mongo_db -d mongo:4.0.10
289 | # 连接到mongo镜像cli
290 | docker run -it mongo:4.0.10 mongo --host <容器ip>
291 |
292 | # 建database建collection比如runoob然后插入数据
293 | db.runoob.insert({"title": 'MongoDB 教程',
294 | "description": 'MongoDB 是一个 Nosql 数据库',
295 | "by": 'w3cschool',
296 | "url": 'http://www.w3cschool.cn',
297 | "tags": ['mongodb', 'database', 'NoSQL'],
298 | "likes": 100})
299 | db.runoob.find()
300 |
301 | # python连接docker起的mongo服务
302 | import pymongo
303 | mongodb_host = 'localhost'
304 | mongodb_port = 27017
305 | # pymongo.MongoClient(mongodb_host, mongodb_port, username='test', password='123456')
306 | myclient = pymongo.MongoClient('mongodb://localhost:27017/')
307 | myclient.list_database_names()
308 | mydb = myclient["mongo_testdb"]
309 | mydb.list_collection_names()
310 | mycol = mydb["runoob"]
311 | # 创建collection
312 | mydb.create_collection('test2')
313 | # 插入数据
314 | mydict = { "name": "Google", "age": "25", "url": "https://www.google.com" }
315 | mycol.insert_one(mydict)
316 | # 查看数据
317 | list(mycol.find())
318 | ```
319 |
320 | 3.5 docker用elasticsearch镜像
321 | ```
322 | # Run Elasticsearch
323 | docker run -d --name elasticsearch_for_test -p 9200:9200 -e "discovery.type=single-node" elasticsearch:6.6.0
324 | # 安装elasticsearch-head
325 | ```
326 | ```python
327 | # 用python连接,并进行增删改查
328 | from elasticsearch import Elasticsearch
329 | from elasticsearch import helpers
330 | # es = Elasticsearch(hosts="localhost:9200", http_auth=('username','passwd'))
331 | esclient = Elasticsearch(['localhost:9200'])
332 | # 高效插入ES
333 | action1 = {
334 | "_index": "idx111",
335 | "_type": "test",
336 | # "_id": ,
337 | "_source": {
338 | 'ServerIp': '0.1.1.1',
339 | 'SpiderType': 'toxic',
340 | 'Level': 4
341 | }
342 | }
343 | action2 = {
344 | "_index": "idx111",
345 | "_type": "pre",
346 | # "_id": 1,
347 | "_source": {
348 | 'ServerIp': '0.1.1.2',
349 | 'SpiderType': 'non-toxic',
350 | 'Level': 1
351 | }
352 | }
353 | actions = [action1, action2]
354 | helpers.bulk(esclient, actions)
355 |
356 | #---------------------------------------------------
357 | # 创建schema然后单条插入数据
358 | # 类似创建schema
359 | answer_index = 'baidu_answer'
360 | answer_type = 'doc22'
361 | esclient.indices.create(answer_index)
362 | answer_mapping = {
363 | "doc22": {
364 | "properties": {
365 | "id": {
366 | "type": "integer",
367 | # "index": True
368 | },
369 | "schoolID":{
370 | "type":"text"
371 | },
372 | "schoolName":{
373 | "type": "text",
374 | "analyzer": "ik_max_word" # 这个需要安装,先run docker6.6.0然后docker exec -it /bin/bash下载解压ik后exit然后restart这个container即可,之后可以新生成一个image
375 | # "analyzer":"whitespace"
376 | },
377 | "calNum":{
378 | "type":"float"
379 | }
380 | }
381 | }
382 | }
383 | esclient.indices.put_mapping(index=answer_index, doc_type=answer_type, body=answer_mapping)
384 | # 创建完schema以后导入数据
385 | doc = {'id': 7, 'schoolID': '007', 'schoolName': '春晖外国语学校', 'calNum':6.20190624}
386 | esclient.index(index=answer_index ,doc_type=answer_type ,body=doc, id=doc['id'])
387 | esclient.index(index=answer_index ,doc_type=answer_type ,body=doc, id=10)
388 | #----------------------------------------------------
389 |
390 | # 删除单条数据
391 | # esclient.delete(index='indexName', doc_type='typeName', id='idValue')
392 | esclient.delete(index='pre', doc_type='imagetable2', id=1)
393 | # 删除索引
394 | esclient.indices.delete(answer_index)
395 |
396 | # 更新
397 | # esclient.update(index='indexName', doc_type='typeName', id='idValue', body={_type:{待更新字段}})
398 | new_doc = {'id': 7, 'schoolId': '007', 'schoolName': '更新名字1'}
399 | esclient.update(index=answer_index, id=7, doc_type=answer_type, body={'doc': new_doc}) # 注意body中一定要加_type doc,更新的body中不一定要加入所有字段,只要把要更新的几个字段加入即可
400 |
401 | # 查询
402 | ### 根据id查找数据
403 | res = esclient.get(index=answer_index, doc_type=answer_type, id=7)
404 | ### match:在schoolName中包含关键词的都会被搜索出来(这里的分词工具是ik)
405 | # res = esclient.search(index=answer_index,body={'query':{'match':{'schoolName':'春晖外'}}})
406 | res = esclient.search(index=answer_index,body={'query':{'match':{'schoolName':'春晖学校'}}})
407 | ### ids:根据id值
408 | esclient.search(index='baidu_answer',body={'query':{'ids':{'values':'10'}}})
409 | ```
410 |
411 | 3.6 docker用neo4j镜像
412 | ```
413 | # docker启动neo4j服务
414 | docker run \
415 | --publish=7474:7474 --publish=7687:7687 \
416 | --volume=$PWD/neo4j/data:/data \
417 | -d neo4j:latest
418 |
419 | # 然后登陆网页可视化界面
420 |
421 | # 或使用Cypher shell
422 | docker exec --interactive --tty <container_name> bin/cypher-shell
423 | # 退出:exit
424 | ```
425 |
426 | 3.7 stardog
427 |
428 | ```
429 | docker pull stardog/stardog:latest
430 | docker run -v ~/stardog-6.2.2/:/var/opt/stardog -e STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=2g" stardog/stardog:latest
431 |
432 | ```
433 |
434 | 3.8 容器云k8s
435 |
436 | Kubernetes是什么?Kubernetes是一个全新的基于容器技术的分布式架构解决方案,是Google开源的一个容器集群管理系统,Kubernetes简称K8S。Kubernetes 提供了完善的管理工具,这些工具涵盖了开发、部署测试、运维监控在内的各个环节。
437 |
438 | Kubernetes特性
439 | - 自我修复:在节点故障时,重新启动失败的容器,替换和重新部署,保证预期的副本数量;杀死健康检查失败的容器,并且在未准备好之前不会处理用户的请求,确保线上服务不中断。
440 | - 弹性伸缩:使用命令、UI或者基于CPU使用情况自动快速扩容和缩容应用程序实例,保证应用业务高峰并发时的高可用性;业务低峰时回收资源,以最小成本运行服务。
441 | - 自动部署和回滚:K8S采用滚动更新策略更新应用,一次更新一个Pod,而不是同时删除所有Pod,如果更新过程中出现问题,将回滚更改,确保升级不影响业务。
442 | - 服务发现和负载均衡:K8S为多个容器提供一个统一访问入口(内部IP地址和一个DNS名称),并且负载均衡关联的所有容器,使得用户无需考虑容器IP问题。
443 | - 机密和配置管理:管理机密数据和应用程序配置,而不需要把敏感数据暴露在镜像里,提高敏感数据安全性。并可以将一些常用的配置存储在K8S中,方便应用程序使用。
444 | - 存储编排:挂载外部存储系统,无论是来自本地存储,公有云,还是网络存储,都作为集群资源的一部分使用,极大提高存储使用灵活性。
445 | - 批处理:提供一次性任务,定时任务;满足批量数据处理和分析的场景。
446 |
447 | [Kubernetes 深入学习(一) —— 入门和集群安装部署](https://www.cnblogs.com/chiangchou/p/k8s-1.html#_label0_0)
448 | [Kubernetes(一) 跟着官方文档从零搭建K8S](https://juejin.cn/post/6844903943051411469)
449 | [kubeadm部署k8s集群最全最详细](https://blog.csdn.net/Doudou_Mylove/article/details/103901732)
450 |
451 |
452 |
453 | [RDF入门](https://blog.csdn.net/txlCandy/article/details/50959358)
454 | [OWL语言](https://blog.csdn.net/zycxnanwang/article/details/86557350)
455 |
456 |
--------------------------------------------------------------------------------
/10_docker/mi_docker_demo/README.md:
--------------------------------------------------------------------------------
1 | ## flask on docker demo
2 |
3 | 1. 如果用request.json则传入的数据需要json格式
4 | 2. 如果用request.values则传入的数据是[(key, value),()]这种形式
5 |
6 | ### 这里以json为例
7 |
8 | ```python
9 | import requests
10 | payload = {'id': 1223, 'text': '我是中国人'}
11 | r = requests.post('http://127.0.0.1:5000/req_message', json=payload)
12 | r.json()
13 |
14 | # {'responseTime': '20190515120101', 'sid': 1223, 'text_sep': '我 是 中国 人'}
15 | ```
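上面第2点提到request.values拿到的是[(key, value)]形式——因为data=传的是form-encoded体,而json=传的是JSON体。两种线上格式的区别用标准库即可看清(无需起服务):

```python
import json
from urllib.parse import urlencode, parse_qsl

payload = {'id': 1223, 'text': '我是中国人'}

# requests.post(..., json=payload) 发送JSON体,Flask用request.json解析:
body_json = json.dumps(payload, ensure_ascii=False)

# requests.post(..., data=payload) 发送form-encoded体,
# Flask的request.values看到的就是这样的键值对列表:
body_form = urlencode(payload)
print(parse_qsl(body_form))  # [('id', '1223'), ('text', '我是中国人')]
```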
--------------------------------------------------------------------------------
/10_docker/mi_docker_demo/app.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | from flask import Flask
4 | from flask import request
5 | from flask import make_response, Response
6 | from flask import jsonify
7 | import datetime
8 | import jieba
9 | jieba.initialize()
10 |
11 |
12 | # Create a Flask application object
13 | app = Flask(__name__)
14 |
15 |
16 | @app.route('/')
17 | def get_simple_test():
18 | return 'BINZHOU TEST'
19 |
20 | @app.route('/req_message', methods=['POST'])
21 | def req_message():
22 | print(request.json)
23 | if request.method == 'POST':
24 | id_ = request.json.get('id')
25 | text_ = request.json.get('text')
26 | text_sep_str = ' '.join(jieba.lcut(text_))
27 | res = {
28 | 'sid': id_,
29 | 'text_sep': text_sep_str,
30 | 'responseTime': datetime.datetime.now().strftime('%Y%m%d%H%M%S')}
31 | return jsonify(res)
32 |
33 | app.config['JSON_AS_ASCII'] = False
34 | app.run(host='0.0.0.0', port=5000, debug=False)
35 |
--------------------------------------------------------------------------------
/10_docker/mi_docker_demo/docker_build:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | mkdir -p release/
4 | cp -r * /home/work
5 | pip install -r requirements.txt -U -i https://pypi.tuna.tsinghua.edu.cn/simple/
6 |
--------------------------------------------------------------------------------
/10_docker/mi_docker_demo/requirements.txt:
--------------------------------------------------------------------------------
1 | flask
2 | jieba
3 | requests
--------------------------------------------------------------------------------
/10_docker/newapi_docker_demo/README.md:
--------------------------------------------------------------------------------
1 | ## Dockerfile
2 |
3 | ```Dockerfile
4 | FROM node:16 as builder
5 |
6 | WORKDIR /build
7 | COPY web/package.json .
8 | RUN npm install
9 | COPY ./web .
10 | COPY ./VERSION .
11 | RUN DISABLE_ESLINT_PLUGIN='true' VITE_REACT_APP_VERSION=$(cat VERSION) npm run build
12 |
13 | FROM golang AS builder2
14 |
15 | ENV GO111MODULE=on \
16 | CGO_ENABLED=1 \
17 | GOOS=linux
18 |
19 | WORKDIR /build
20 | ADD go.mod go.sum ./
21 | RUN go mod download
22 | COPY . .
23 | COPY --from=builder /build/dist ./web/dist
24 | RUN go build -ldflags "-s -w -X 'one-api/common.Version=$(cat VERSION)' -extldflags '-static'" -o one-api
25 |
26 | FROM alpine
27 |
28 | RUN apk update \
29 | && apk upgrade \
30 | && apk add --no-cache ca-certificates tzdata \
31 | && update-ca-certificates 2>/dev/null || true
32 |
33 | COPY --from=builder2 /build/one-api /
34 | EXPOSE 3000
35 | WORKDIR /data
36 | ENTRYPOINT ["/one-api"]
37 | ```
38 |
39 |
40 | ## Dockerfile walkthrough
41 |
42 | This Dockerfile builds an application with frontend and backend components in multiple stages; each stage uses a different base image and its own steps to accomplish a specific task.
43 |
44 | ### Stage 1: frontend build (Node.js)
45 |
46 | - **Base image**:
47 | - `FROM node:16 as builder`: use the official Node.js 16 image as the base and name this build stage `builder`.
48 | - **Working directory**:
49 | - `WORKDIR /build`: set the working directory to `/build`.
50 | - **Copy files**:
51 | - `COPY web/package.json .`: copy `package.json` from the frontend directory into the working directory.
52 | - **Install dependencies**:
53 | - `RUN npm install`: install the dependencies declared in `package.json`.
54 | - **Copy frontend code and version file**:
55 | - `COPY ./web .`: copy everything under the web folder into the working directory.
56 | - `COPY ./VERSION .`: copy the project version file into the working directory.
57 | - **Build the frontend**:
58 | - `RUN DISABLE_ESLINT_PLUGIN='true' VITE_REACT_APP_VERSION=$(cat VERSION) npm run build`: set environment variables and run the frontend build script, producing the production frontend assets.
59 |
60 | ### Stage 2: backend build (Go)
61 |
62 | - **Base image**:
63 | - `FROM golang AS builder2`: use the official Go image as the base and name this stage `builder2`.
64 | - **Environment variables**:
65 | - Set several environment variables to enable Go modules and ensure the build produces a statically linked Linux binary.
66 | - **Working directory**:
67 | - `WORKDIR /build`: set the working directory.
68 | - **Add Go module files**:
69 | - `ADD go.mod go.sum ./`: add the Go module definition files.
70 | - **Download dependencies**:
71 | - `RUN go mod download`: download the Go dependencies.
72 | - **Copy code and frontend artifacts**:
73 | - `COPY . .`: copy all backend code into the working directory.
74 | - `COPY --from=builder /build/dist ./web/dist`: copy the frontend assets built in stage 1 into the backend service directory.
75 | - **Compile the application**:
76 | - `RUN go build -ldflags "-s -w -X 'one-api/common.Version=$(cat VERSION)' -extldflags '-static'" -o one-api`: build the application with Go, using linker flags to embed the version string and shrink the binary.
77 |
78 | ### Stage 3: runtime environment
79 |
80 | - **Base image**:
81 | - `FROM alpine`: use the lightweight Alpine Linux image as the base.
82 | - **Install certificates and timezone data**:
83 | - Run a series of commands to install the CA certificates and timezone data the application needs for HTTPS connections and correct local time.
84 | - **Copy the compiled application**:
85 | - `COPY --from=builder2 /build/one-api /`: copy the binary compiled in stage 2 to the root directory.
86 | - **Port and working directory**:
87 | - `EXPOSE 3000`: declare that the container listens on port 3000 at runtime.
88 | - `WORKDIR /data`: set the working directory, which the application may use to store data.
89 | - **Entrypoint**:
90 | - `ENTRYPOINT ["/one-api"]`: set the command executed when the container starts.
91 |
92 | ### Summary
93 |
94 | This Dockerfile first builds the frontend assets, then builds the backend service with the frontend assets embedded, and finally runs the compiled binary in a lightweight container, automating the build and deployment of both ends.
95 |
--------------------------------------------------------------------------------
/11_rabbitmq/README.md:
--------------------------------------------------------------------------------
1 | ## Message queues
2 |
3 | - [1. Basic usage steps](#basic-usage-steps)
4 | - [2. Starting RabbitMQ with docker](#starting-rabbitmq-with-docker)
5 | - [3. Minimal producer/consumer demo](#minimal-producerconsumer-demo)
6 | - [4. Broadcasting one message to all consumers (fanout)](#broadcasting-one-message-to-all-consumers-fanout)
7 |
8 |
9 | [rabbitmq tutorial](https://www.rabbitmq.com/tutorials/tutorial-one-python.html)
10 |
11 | ### Basic usage steps
12 |
13 | [Install RabbitMQ Server](https://www.rabbitmq.com/download.html)
14 | Installing it via docker is the easiest option.
15 |
16 | ### Starting RabbitMQ with docker
17 |
18 | ```shell
19 | docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
20 | # AMQP listens on port 5672; the management UI on 15672
21 | ```
22 | This starts RabbitMQ in docker.
23 |
24 | ### Minimal producer/consumer demo
25 |
26 | Consumer: server.py
27 | ```python
28 | #!/usr/bin/env python
29 | import pika
30 | import json
31 |
32 | connection = pika.BlockingConnection(
33 | pika.ConnectionParameters(host='localhost'))
34 | channel = connection.channel()
35 |
36 | channel.queue_declare(queue='hello')
37 |
38 |
39 | def callback(ch, method, properties, body):
40 | vec = json.loads(body)
41 | print(" [x] Received ", vec)
42 |
43 |
44 | channel.basic_consume(
45 | queue='hello', on_message_callback=callback, auto_ack=True)
46 |
47 | print(' [*] Waiting for messages. To exit press CTRL+C')
48 | channel.start_consuming()
49 | ```
50 | Once started, server.py keeps listening on the queue at the given host
51 |
52 | Producer: client.py
53 | ```python
54 | #!/usr/bin/env python
55 | import pika
56 | import json
57 |
58 | connection = pika.BlockingConnection(
59 | pika.ConnectionParameters(host='localhost'))
60 | channel = connection.channel()
61 |
62 | channel.queue_declare(queue='hello')
63 |
64 | channel.basic_publish(exchange='', routing_key='hello', body=json.dumps([1.2,0.99,5.5]))
65 | print(" [x] Sent 'Hello World!'")
66 | connection.close()
67 | ```
68 | Each time client.py sends a message, server.py prints the received message body
69 |
70 |
71 | ### Broadcasting one message to all consumers (fanout)
72 |
73 | With the dockerized RabbitMQ service running, first start the consumer server.py on several machines
74 | ```python
75 | #!/usr/bin/env python
76 | import pika
77 |
78 | connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
79 | channel = connection.channel()
80 | # Declare an exchange named logs with type fanout
81 | channel.exchange_declare(exchange='logs', exchange_type='fanout')
82 | # Declare a queue and bind it to the fanout exchange
83 | queue_name = 'task_queue1'  # use a different queue name on each machine
84 | result = channel.queue_declare(queue=queue_name)
85 | channel.queue_bind(exchange='logs', queue=queue_name)
86 | print(' [*] Waiting for logs. To exit press CTRL+C')
87 | def callback(ch, method, properties, body):
88 | print(" [x] %r" % body)
89 | channel.basic_consume(queue=queue_name, on_message_callback=callback, auto_ack=True)
90 | channel.start_consuming()
91 | ```
92 |
93 | Then run the producer client.py; all the bound consumers receive the message at the same time
94 | ```python
95 | import pika
96 | import sys
97 | connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
98 | channel = connection.channel()
99 | channel.exchange_declare(exchange='logs',exchange_type='fanout')
100 | message = ' '.join(sys.argv[1:]) or "info: Hello World!"
101 | # Publish to the exchange by name (routing_key is ignored for fanout)
102 | channel.basic_publish(exchange='logs', routing_key='', body=message)
103 | print(" [x] Sent %r" % message)
104 | connection.close()
105 | ```
106 |
107 | Note: the host parameter must be the same everywhere: the IP of the machine running the RabbitMQ service
108 |
--------------------------------------------------------------------------------
/12_nginx/README.md:
--------------------------------------------------------------------------------
1 | ## nginx
2 |
3 | [**1. Getting started with nginx**](#getting-started-with-nginx)
4 |
5 | [**2. nginx regex example (updated 2024.4.2)**](#nginx-regex-example)
6 |
7 |
8 |
9 | ---
10 |
11 | ### Getting started with nginx
12 |
13 |
15 |
16 | **1. Pull the nginx docker image**
17 |
18 | ```shell
19 | docker pull nginx:latest
20 | ```
21 |
22 | **2. Start nginx and two flask services (to simulate multiple backend servers)**
23 |
24 | ```shell
25 | # Start nginx
26 | docker run --name=nginx -d -p 4030:80 nginx # the site is served on port 4030
27 | # Start two flask servers
28 | docker run -d -p 5001:5001 -v $PWD/flask_nginx_test:/usr/src/flask_nginx_test -w /usr/src/flask_nginx_test binzhouchn/python36:1.4 python test1.py
29 | docker run -d -p 5002:5002 -v $PWD/flask_nginx_test:/usr/src/flask_nginx_test -w /usr/src/flask_nginx_test binzhouchn/python36:1.4 python test2.py
30 | ```
31 |
32 | Before configuring nginx, each service can be visited on its own:
33 | localhost:4030 shows the nginx welcome page
34 | localhost:5001 shows BINZHOU TEST 1
35 | localhost:5002 shows BINZHOU TEST 2
36 |
37 | **3. Edit the nginx configuration**
38 |
39 | The main file is /etc/nginx/nginx.conf; since it contains `include /etc/nginx/conf.d/*.conf;`, it is enough to edit default.conf under /etc/nginx/conf.d/
40 | [Modified default.conf](default.conf)
41 |
42 | Notes:
43 | Addresses such as 172.17.0.3 are docker's virtual IPs; containers can reach each other through them
44 | Load balancing uses round-robin
45 | 172.17.0.5:5003 is not actually up, so nginx skips it automatically
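
The round-robin behavior can be sketched in a few lines of Python (an illustration only; real nginx additionally tracks and temporarily marks down failed upstreams):

```python
from itertools import cycle

# Upstreams from default.conf; 172.17.0.5:5003 is assumed down
servers = ["172.17.0.3:5001", "172.17.0.4:5002", "172.17.0.5:5003"]
alive = [s for s in servers if s != "172.17.0.5:5003"]  # nginx skips failed peers

picker = cycle(alive)
first_four = [next(picker) for _ in range(4)]
print(first_four)
# ['172.17.0.3:5001', '172.17.0.4:5002', '172.17.0.3:5001', '172.17.0.4:5002']
```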
46 |
47 | **4. Restart nginx after configuring**
48 |
49 | ```shell
50 | # First, enter the nginx container and run nginx -t under /etc/nginx/conf.d to check the config
51 | docker stop nginx
52 | docker start nginx
53 | ```
54 |
55 | With nginx configured and restarted, visit
56 | localhost:4030: the page shows BINZHOU TEST 1; refresh (reload) and it shows BINZHOU TEST 2; refresh again, BINZHOU TEST 1
57 |
58 | **This shows nginx is alternating between the two backend servers**
59 |
60 | **5. More configurations**
61 |
62 | 5.1 One nginx server serving different applications on different ports (e.g. 4030 and 4031)
63 | ```shell
64 | # Start nginx in docker with both ports mapped
65 | docker run --name=nginx -d -p 4030:4030 -p 4031:4031 nginx
66 | ```
67 | [Config file 1](default1.conf)
68 |
69 | 5.2 One nginx server serving different applications under different routes (e.g. /project/guoge)
70 | [Config file 2](default2.conf)
84 |
85 | ### nginx regex example
86 |
87 | ```shell
88 | cd /etc/nginx/conf.d
89 | # reload after modifying
90 | systemctl restart nginx
91 | nginx -s reload
92 | ```
93 | [配置文件3](default3.conf)
94 |
95 | Explanation: the goal of the regex is that when I visit
96 | http://10.28.xx.xx:8000/aimanager_gpu/recsys/,
97 | the request is forwarded to http://localhost:10086 without the /aimanager_gpu/recsys prefix appended
98 | (without the regex, proxy_pass to http://localhost:10086 would append /aimanager_gpu/recsys to the upstream request)
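
The capture in default3.conf ($1 becoming $rightUrl) behaves like this Python regex (an illustration of the rewrite, not nginx itself; the URI is a made-up example):

```python
import re

def strip_prefix(request_uri: str) -> str:
    """Mimic `if ($request_uri ~ /aimanager_gpu/recsys/(.+))` in default3.conf."""
    m = re.search(r"/aimanager_gpu/recsys/(.+)", request_uri)
    # proxy_pass http://recsys/$rightUrl -> keep only the captured tail
    return "http://localhost:10086/" + m.group(1) if m else "http://localhost:10086" + request_uri

print(strip_prefix("/aimanager_gpu/recsys/predict?x=1"))
# http://localhost:10086/predict?x=1
```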
99 |
100 |
101 |
102 |
103 | - References
104 |
105 | [nginx作为http服务器-静态页面的访问](https://www.cnblogs.com/xuyang94/p/12667844.html)
106 | [docker nginx反向代理](https://www.cnblogs.com/dotnet261010/p/12596185.html)
107 | [nginx负载均衡参考1](https://www.jianshu.com/p/4c250c1cd6cd)
108 | [nginx负载均衡参考2](https://www.cnblogs.com/diantong/p/11208508.html)
--------------------------------------------------------------------------------
/12_nginx/default.conf:
--------------------------------------------------------------------------------
1 | upstream nginx-flask-test {
2 | server 172.17.0.3:5001;
3 | server 172.17.0.4:5002;
4 | server 172.17.0.5:5003;
5 | }
6 |
7 | server {
8 | listen 80;
9 | listen [::]:80;
10 | server_name localhost;
11 |
12 | #charset koi8-r;
13 | #access_log /var/log/nginx/host.access.log main;
14 |
15 | location / {
16 | root /usr/share/nginx/html;
17 | index index.html index.htm;
18 | proxy_pass http://nginx-flask-test;
19 | }
20 |
21 | #error_page 404 /404.html;
22 |
23 | # redirect server error pages to the static page /50x.html
24 | #
25 | error_page 500 502 503 504 /50x.html;
26 | location = /50x.html {
27 | root /usr/share/nginx/html;
28 | }
29 |
30 | # proxy the PHP scripts to Apache listening on 127.0.0.1:80
31 | #
32 | #location ~ \.php$ {
33 | # proxy_pass http://127.0.0.1;
34 | #}
35 |
36 | # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
37 | #
38 | #location ~ \.php$ {
39 | # root html;
40 | # fastcgi_pass 127.0.0.1:9000;
41 | # fastcgi_index index.php;
42 | # fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
43 | # include fastcgi_params;
44 | #}
45 | # deny access to .htaccess files, if Apache's document root
46 | # concurs with nginx's one
47 | #
48 | #location ~ /\.ht {
49 | # deny all;
50 | #}
51 | }
52 |
53 |
--------------------------------------------------------------------------------
/12_nginx/default1.conf:
--------------------------------------------------------------------------------
1 | upstream server1 {
2 | server 192.168.0.108:5004;
3 | }
4 |
5 | upstream server2 {
6 | server 192.168.0.108:5007;
7 | }
8 |
9 | server {
10 | listen 4030;
11 | server_name localhost;
12 | client_max_body_size 1024M;
13 |
14 | # the default route goes last
15 | location / {
16 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
17 | proxy_set_header Host $http_host;
18 | proxy_pass http://server1;
19 | }
20 | }
21 |
22 | server {
23 | listen 4031;
24 | server_name localhost;
25 | client_max_body_size 1024M;
26 |
27 | # the default route goes last
28 | location / {
29 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
30 | proxy_set_header Host $http_host;
31 | proxy_pass http://server2;
32 | }
33 | }
34 |
--------------------------------------------------------------------------------
/12_nginx/default2.conf:
--------------------------------------------------------------------------------
1 | upstream server1 {
2 | server 192.168.0.108:5004;
3 | }
4 |
5 | upstream server2 {
6 | server 192.168.0.108:5007;
7 | }
8 |
9 | server {
10 | listen 4030;
11 | server_name localhost;
12 | client_max_body_size 1024M;
13 |
14 | location /project/guoge {
15 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
16 | proxy_set_header Host $http_host;
17 | proxy_pass http://server2;
18 | }
19 |
20 | # the default route goes last
21 | location / {
22 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
23 | proxy_set_header Host $http_host;
24 | proxy_pass http://server1;
25 | }
26 | }
27 |
28 |
29 |
--------------------------------------------------------------------------------
/12_nginx/default3.conf:
--------------------------------------------------------------------------------
1 | upstream recsys {
2 | server localhost:10086;
3 | }
4 |
5 | server {
6 | server_name localhost;
7 | listen 8000;
8 | location ~* /aimanager_gpu/recsys/ {
9 | if ($request_uri ~ /aimanager_gpu/recsys/(.+))
10 | {
11 | set $rightUrl $1;
12 | }
13 | proxy_pass http://recsys/$rightUrl;
14 | }
15 | }
--------------------------------------------------------------------------------
/13_airflow/README.md:
--------------------------------------------------------------------------------
1 | ## airflow
2 |
3 | 任务调度神器airflow之初体验
4 | https://zhuanlan.zhihu.com/p/42052108
5 |
6 | Airflow入门及使用
7 | https://zhuanlan.zhihu.com/p/84332879
8 |
9 | 【调研】Airflow使用
10 | https://www.jianshu.com/p/75c64b63122b
--------------------------------------------------------------------------------
/14_go/README.md:
--------------------------------------------------------------------------------
1 | # Calling golang from python
2 |
3 | ## Example 1: python passes an int and gets an int back
4 |
5 | ```Go
6 | package main
7 |
8 | import (
9 | "C"
10 | )
11 |
12 | func f1(x int) int {
13 | return x*x + 2
14 | }
15 |
16 | //export Fib
17 | func Fib(n int) int {
18 | if n == 1 || n == 2 {
19 | return 1
20 | } else {
21 | return Fib(n-1) + Fib(n-2) + f1(1)
22 | }
23 | }
24 |
25 | func main() {}
26 | ```
27 |
28 | //Build: go build -buildmode=c-shared -o _fib.so fib.go
29 | //Reference: https://blog.csdn.net/cainiao_python/article/details/107724309
30 | //Copy _fib.so into the python working directory
31 |
32 | ```python
33 | import ctypes
34 | import time
35 | from ctypes import *
36 | so = ctypes.cdll.LoadLibrary('./_fib.so')
37 | start = time.time()
38 | result = so.Fib(40)
39 | end = time.time()
40 | print(f'Fib(40) = {result}, took {end - start}s')
41 | ```
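
The same ctypes pattern (load a shared library, declare argtypes/restype, call) works against any C library; a self-contained sketch using the system libc, so it runs without building _fib.so first (assumes a POSIX system where libc can be located):

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (fall back to the current process's symbols)
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the signature explicitly, just as so.Fib implicitly uses int in / int out
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-40))  # 40
```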
42 |
43 | ## Example 2: python passes a string and gets a string back (example 3 is the recommended version)
44 |
45 | ```Go
46 | package main
47 |
48 | import (
49 | "C"
50 | "database/sql"
51 | "log"
52 | "strings"
53 |
54 | _ "github.com/go-sql-driver/mysql"
55 | )
56 |
57 | //export Gdbc
58 | func Gdbc(uri *C.char) string {
59 | log.Println(uri)
60 | db, err := sql.Open("mysql", C.GoString(uri))
61 | if err != nil {
62 | log.Fatalln(err)
63 | }
64 | rows, err := db.Query("SELECT feature_word FROM insurance_qa.feature_words")
65 | if err != nil {
66 | log.Fatalln(err)
67 | }
68 | res := []string{}
69 | for rows.Next() {
70 | var s string
71 | err = rows.Scan(&s)
72 | if err != nil {
73 | log.Fatalln(err)
74 | }
75 | // log.Printf("found row containing %q", s)
76 | res = append(res, s)
77 | }
78 | rows.Close()
79 | return strings.Join(res, ",")
80 | }
81 |
82 | func main() {
83 | // res := Gdbc("username:password@tcp(localhost:3306)/database?charset=utf8")
84 | // fmt.Println(res)
85 | }
86 | ```
87 | //Build: go build -buildmode=c-shared -o _gdbc.so test.go
88 | //Copy _gdbc.so into the python working directory
89 |
90 | ```python
91 | import ctypes
92 | import time
93 | from ctypes import *
94 | class StructPointer(Structure):
95 | _fields_ = [("p", c_char_p), ("n", c_longlong)]
96 |
97 | so = ctypes.cdll.LoadLibrary('./_gdbc.so')
98 | so.Gdbc.restype = StructPointer
99 | start = time.time()
100 | uri = "username:password@tcp(localhost:3306)/database?charset=utf8"
101 | res = so.Gdbc(uri.encode("utf-8"))
102 | print(res.n)
103 | print(res.p[:res.n].decode())  # print(res.p.decode()) also seems to work
104 | end = time.time()
105 | print(f'耗时:{end - start}')
106 | ```
107 |
108 | ## Example 3: python passes a string; go queries the database and returns a JSON string
109 |
110 | ```Go
111 | package main
112 |
113 | import (
114 | "C"
115 | "database/sql"
116 | "encoding/json"
117 | "log"
118 |
119 | _ "github.com/go-sql-driver/mysql"
120 | )
121 |
122 | type Fw struct {
123 | feature_word string
124 | word_type string
125 | id int64
126 | }
127 |
128 | //export Gdbc
129 | func Gdbc(uri *C.char) string {
130 | db, err := sql.Open("mysql", C.GoString(uri))
131 | //limit how long a connection may be reused
132 | db.SetConnMaxLifetime(100)
133 | //set the maximum number of idle connections
134 | db.SetMaxIdleConns(10)
135 | if err != nil {
136 | log.Fatalln(err)
137 | }
138 | rows, err := db.Query("SELECT feature_word,word_type,id FROM insurance_qa.feature_words")
139 | if err != nil {
140 | log.Fatalln(err)
141 | }
142 | res := [][]interface{}{}
143 | var fw Fw
144 | for rows.Next() {
145 | err = rows.Scan(&fw.feature_word, &fw.word_type, &fw.id)
146 | if err != nil {
147 | log.Fatalln(err)
148 | }
149 | // log.Printf("found row containing %q", s)
150 | tmp := []interface{}{}
151 | tmp = append(tmp, fw.feature_word)
152 | tmp = append(tmp, fw.word_type)
153 | tmp = append(tmp, fw.id)
154 | res = append(res, tmp)
155 | // res = append(res, []interface{}{fw.feature_word, fw.word_type, fw.id}) // one-line equivalent of the appends above
156 | }
157 | rows.Close()
158 | b, err := json.Marshal(res)
159 | if err != nil {
160 | panic(err)
161 | }
162 | result := string(b)
163 | return result
164 | }
165 |
166 | func main() {}
167 |
168 | ```
169 |
170 | //Build: go build -buildmode=c-shared -o _gdbc.so test.go
171 | //Copy _gdbc.so into the python working directory
172 |
173 | ```python
174 | import ctypes
175 | import time
176 | import json
177 | from ctypes import *
178 | class StructPointer(Structure):
179 | _fields_ = [("p", c_char_p), ("n", c_longlong)]
180 |
181 | so = ctypes.cdll.LoadLibrary('./_gdbc.so')
182 | so.Gdbc.restype = StructPointer
183 | start = time.time()
184 | uri = "username:password@tcp(localhost:3306)/database?charset=utf8"
185 | res = so.Gdbc(uri.encode("utf-8"))
186 | print(res.n)
187 | print(res.p.decode())
188 | print(json.loads(res.p.decode()))
189 | end = time.time()
190 | ```
191 |
--------------------------------------------------------------------------------
/15_ansible/README.md:
--------------------------------------------------------------------------------
1 | # ansible notes
2 |
3 | ```shell
4 | # Define the [model] host group in the inventory file /etc/ansible/hosts
5 | # ping the group
6 | ansible model -m ping
7 | # Run a playbook
8 | ansible-playbook xxx.yaml
9 | # Copy a file to the hosts
10 | ansible model -m copy -a "src=./test.txt dest=/home/zhoubin"
11 | # Create a file (playbook form)
12 | - hosts: model
13 |   remote_user: zhoubin
14 |   tasks:
15 |   - name: "create test2.txt in the home directory"
16 |     file:
17 |       path: "/home/zhoubin/test2.txt"
18 |       state: "touch"
19 | # Create a directory (playbook form)
20 | - hosts: model
21 |   remote_user: zhoubin
22 |   tasks:
23 |   - name: "create the tmp directory in the home directory"
24 |     file:
25 |       path: "/home/zhoubin/tmp"
26 |       state: "directory"
27 | # Delete a file (playbook form)
28 | - hosts: model
29 |   remote_user: zhoubin
30 |   tasks:
31 |   - name: "delete test.txt in the home directory"
32 |     file:
33 |       path: "/home/zhoubin/test.txt"
34 |       state: "absent"
35 | # Delete multiple files (playbook form)
36 | - hosts: model
37 |   remote_user: zhoubin
38 |   tasks:
39 |   - name: "delete multiple files in the home directory"
40 |     file:
41 |       path: "{{ item }}"
42 |       state: "absent"
43 |     with_items:
44 |     - /home/zhoubin/test1.txt
45 |     - /home/zhoubin/test2.txt
46 | # Fetch a file from the remote hosts to the local machine
47 | ansible model -m fetch -a "src=/home/zhoubin/test.txt dest=./ force=yes backup=yes"
48 |
49 | # A playbook that copies a docker image and loads it; become: yes avoids typing the sudo password!
50 | - hosts: model
51 |   remote_user: zhoubin
52 |   tasks:
53 |   - name: copy docker image
54 |     copy: src=./py37.tar.gz dest=/home/zhoubin
55 |   - name: load image
56 |     shell: docker load -i /home/zhoubin/py37.tar.gz
57 |     become: yes
58 |
59 |
60 | ```
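
The model group targeted above lives in the inventory; a minimal sketch of /etc/ansible/hosts (the IPs are placeholders):

```ini
[model]
192.168.0.101 ansible_user=zhoubin
192.168.0.102 ansible_user=zhoubin
```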
61 |
62 |
63 | ### 附录
64 |
65 | [超简单ansible2.4.2.0与playbook入门教程](https://blog.csdn.net/qq_45206551/article/details/105004233)
66 | [ansible-命令使用说明](https://www.cnblogs.com/scajy/p/11353825.html)
67 |
--------------------------------------------------------------------------------
/99_pycharm_archive/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/.DS_Store
--------------------------------------------------------------------------------
/99_pycharm_archive/README.md:
--------------------------------------------------------------------------------
1 | # 1. pycharm github
2 |
3 | public email: binzhouchn@gmail.com
4 |
5 | **Always pull before pushing to github**
6 |
7 | [github tutorial](https://www.liaoxuefeng.com/wiki/0013739516305929606dd18361248578c67b8067c8c017b000)
8 |
9 | The short version:
10 | **First:** install git; the process differs between macOS and Windows
11 |
12 | **Step 1:** create an SSH key. In your home directory, check whether a .ssh directory exists and whether it contains the files id_rsa and id_rsa.pub.
13 | If both files are there, skip to the next step. If not, open a shell (Git Bash on Windows) and create the SSH key:
14 | ```
15 | ssh-keygen -t rsa -C "binzhouchn@gmail.com"
16 | ```
17 | Replace the email address with your own, then press Enter through the prompts to accept the defaults; since the key is not for military purposes, no passphrase is needed.
18 | macOS key path: /Users/binzhou/.ssh/id_rsa
19 | If all goes well, the .ssh directory in your home directory will contain id_rsa and id_rsa.pub, the SSH key pair: id_rsa is the private key and
20 | must never be shared; id_rsa.pub is the public key and can safely be given to anyone.
21 |
22 | **Step 2:** log in to GitHub and open the "Account settings" -> "SSH Keys" page:
23 | click "Add SSH Key", fill in any Title, and paste the contents of id_rsa.pub into the Key text box.
24 | GitHub allows multiple keys, so if you work on several machines (at the office and at home, say), add each machine's key and you can push from all of them.
25 |
26 | A friendly reminder: free GitHub repositories are visible to everyone (though only you can modify them), so keep sensitive information out.
27 |
28 | **Step 3:** to use git inside pycharm, configure pycharm first (taking Windows as an example)
29 |
30 |
31 |
32 |
33 | # 2. Creating and debugging a project on a remote server from pycharm
34 |
35 | 1. Open pycharm -> File -> Settings -> Project Interpreter
36 | 2. Click the Project Interpreter gear, choose Add Remote -> SSH, fill in the user and password, then Apply and OK
37 | 3. Tools -> Deployment -> Configuration, click the + in the top-left corner (name it v2 and choose SFTP)
38 | 4. In Configuration, set up Connection and Mappings as shown below
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 |
47 | # 3. Assorted pycharm issues
48 |
49 | 1. To run app.py from pycharm, add LANG=en_US.utf-8;LC_ALL=en_US.utf-8 under Configuration -> Environment variables
50 |
51 | # 4. Permanent pycharm activation
52 |
53 | - Download the activation plugin jetbrains-agent.jar (in the 激活码 folder) and put it in PyCharm's bin directory, e.g. /Applications/PyCharm.app/Contents/bin
54 | - On a fresh install, click "Evaluate for free" in the activation window, then create an empty project to reach the main window.
55 | - In the Help menu choose Edit Custom VM Options… and click Create in the dialog
56 | - Append this line at the end: -javaagent:/Applications/PyCharm.app/Contents/bin/jetbrains-agent.jar
57 | - Restart PyCharm, open Help -> Register, and enter the permanent activation code (in the 激活码 folder); note this code is different from the one used in method one
58 | - To check the expiry: Help -> About now shows the license valid until 2089, cheers bro!
59 |
60 |
61 |
62 | intellij idea can be activated the same way; finally, under Help -> Register choose license server activation and enter https://fls.jet...
63 |
65 |
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_activ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_activ.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_git1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_git1.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_git2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_git2.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_remote1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote1.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_remote2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote2.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_remote3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote3.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_remote4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote4.png
--------------------------------------------------------------------------------
/99_pycharm_archive/pic/pycharm_remote5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote5.png
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/激活码/.DS_Store
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/jetbrains-agent.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/激活码/jetbrains-agent.jar
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/永久激活码/激活码.txt:
--------------------------------------------------------------------------------
1 | JQE11SV0BR-eyJsaWNlbnNlSWQiOiJKUUUxMVNWMEJSIiwibGljZW5zZWVOYW1lIjoicGlnNiIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJpY3Rpb24iOiIiLCJjaGVja0NvbmN1cnJlbnRVc2UiOmZhbHNlLCJwcm9kdWN0cyI6W3siY29kZSI6IklJIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkFDIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRQTiIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJQUyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJHTyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJETSIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJDTCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSUzAiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkQiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUEMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUk0iLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiV1MiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREIiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlNVIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9XSwiaGFzaCI6IjEyNzk2ODc3LzAiLCJncmFjZVBlcmlvZERheXMiOjcsImF1dG9Qcm9sb25nYXRlZCI6ZmFsc2UsImlzQXV0b1Byb2xvbmdhdGVkIjpmYWxzZX0=-khgsQrnDiglknF0m+yyoYGJXX4vFE3IIVaoMd0bkpfAlMiYM4FUK1JM7uMnVSN0NBC7qtZjYlNzPscEyKE8634uGuY/uToFQnIOCtyUfBxB6j0wF/DcCjhKMNDbnJ1RKZ2VaALuC9B6d6lhtEKm9+urXWTBq7h2VfIBv5wk1Ul9T/m9Dwkz/LccTqnxO0PP288fF13ZbmcLI1/D0dqp/QxYshW6CLR+2Tvk6QCPoaOTKDU/eL1AssD7/mO1g
2ZJA+k//8qfRMLgdLmLrMdyiaIhrsM/jJk2qDfTaMcCNylkWXLgKwSvEQG95IhitLN9+GQ4pBW3gOTNl82Gem7jEkA==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g==
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/永久激活码/激活码1.txt:
--------------------------------------------------------------------------------
1 | A82DEE284F-eyJsaWNlbnNlSWQiOiJBODJERUUyODRGIiwibGljZW5zZWVOYW1lIjoiaHR0cHM6Ly96aGlsZS5pbyIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJpY3Rpb24iOiJVbmxpbWl0ZWQgbGljZW5zZSB0aWxsIGVuZCBvZiB0aGUgY2VudHVyeS4iLCJjaGVja0NvbmN1cnJlbnRVc2UiOmZhbHNlLCJwcm9kdWN0cyI6W3siY29kZSI6IklJIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlMwIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiV1MiLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSRCIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlJDIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREMiLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJEQiIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlJNIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiRE0iLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJBQyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRQTiIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkdPIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUFMiLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJDTCIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlBDIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlNVIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In1dLCJoYXNoIjoiODkwNzA3MC8wIiwiZ3JhY2VQZXJpb2REYXlzIjowLCJhdXRvUHJvbG9uZ2F0ZWQiOmZhbHNlLCJpc0F1dG9Qcm9sb25nYXRlZCI6ZmFsc2V9-5epo90Xs7KIIBb8ckoxnB/AZQ8Ev7rFrNqwFhBAsQYsQyhvqf1FcYdmlecFWJBHSWZU9b41kvsN4bwAHT5PiznOTmfvGv1MuOzMO0VOXZlc+edepemgpt+t3GUHvfGtzWFYeKeyCk+CLA9BqUzHRTgl2uBoIMNqh5izlDmejIwUHLl39QOyzHiTYNehnVN7GW5+QUeimTr/koVUgK8xofu59Tv8rcdiwIXwTo71LcU2z2P+T3R81fwKkt34evy7kRch4NIQUQUno//Pl3V0rInm3B2oFq9YBygPUdBUbdH/KHROyohZRD8SaZJO6kUT0BNvtDPKF4mCT1saWM38jkw==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzY
qFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g==
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/永久激活码/激活码2.txt:
--------------------------------------------------------------------------------
1 | 3AGXEJXFK9-eyJsaWNlbnNlSWQiOiIzQUdYRUpYRks5IiwibGljZW5zZWVOYW1lIjoiaHR0cHM6Ly96aGlsZS5pbyIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJpY3Rpb24iOiIiLCJjaGVja0NvbmN1cnJlbnRVc2UiOmZhbHNlLCJwcm9kdWN0cyI6W3siY29kZSI6IklJIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkFDIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRQTiIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJQUyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJHTyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJETSIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJDTCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSUzAiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkQiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUEMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUk0iLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiV1MiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREIiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlNVIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9XSwiaGFzaCI6IjEyNzk2ODc3LzAiLCJncmFjZVBlcmlvZERheXMiOjcsImF1dG9Qcm9sb25nYXRlZCI6ZmFsc2UsImlzQXV0b1Byb2xvbmdhdGVkIjpmYWxzZX0=-WGTHs6XpDhr+uumvbwQPOdlxWnQwgnGaL4eRnlpGKApEEkJyYvNEuPWBSrQkPmVpim/8Sab6HV04Dw3IzkJT0yTc29sPEXBf69+7y6Jv718FaJu4MWfsAk/ZGtNIUOczUQ0iGKKnSSsfQ/3UoMv0q/yJcfvj+me5Zd/gfaisCCMUaGjB/lWIPpEPzblDtVJbRexB1MALrLCEoDv3ujcPAZ7xWb54DiZwjYhQvQ+CvpNN
F2jeTku7lbm5v+BoDsdeRq7YBt9ANLUKPr2DahcaZ4gctpHZXhG96IyKx232jYq9jQrFDbQMtVr3E+GsCekMEWSD//dLT+HuZdc1sAIYrw==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g==
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/永久激活码/激活码3.txt:
--------------------------------------------------------------------------------
1 | KNBB2QUUR1-eyJsaWNlbnNlSWQiOiJLTkJCMlFVVVIxIiwibGljZW5zZWVOYW1lIjoiZ2hib2tlIiwiYXNzaWduZWVOYW1lIjoiIiwiYXNzaWduZWVFbWFpbCI6IiIsImxpY2Vuc2VSZXN0cmljdGlvbiI6IiIsImNoZWNrQ29uY3VycmVudFVzZSI6ZmFsc2UsInByb2R1Y3RzIjpbeyJjb2RlIjoiSUkiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiQUMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiRFBOIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlBTIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkdPIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRNIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkNMIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlJTMCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSQyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSRCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJQQyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSTSIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJXUyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJEQiIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJEQyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSU1UiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In1dLCJoYXNoIjoiMTI3OTY4NzcvMCIsImdyYWNlUGVyaW9kRGF5cyI6NywiYXV0b1Byb2xvbmdhdGVkIjpmYWxzZSwiaXNBdXRvUHJvbG9uZ2F0ZWQiOmZhbHNlfQ==-1iV7BA/baNqv0Q5yUnAphUmh66QhkDRX+qPL09ICuEicBqiPOBxmVLLCVUpkxhrNyfmOtat2LcHwcX/NHkYXdoW+6aS0S388xe1PV2oodiPBhFlEaOac42UQLgP4EidfGQSvKwC9tR1zL5b2CJPQKZ7iiHh/iKBQxP6OBMUP1T7j3Fe1rlxfYPc92HRZf6cO+C0+buJP5ERZkyIn5ZrVM4TEnWrRHbpL8SVNq4yqfc+NwoRzRSNC++81
VDS3AXv9c91YeZJz6JXO7AokIk54wltr42FLNuKbozvB/HCxV9PA5vIiM+kZY1K0w5ytgxEYKqA87adA7R5xL/crpaMxHQ==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g==
--------------------------------------------------------------------------------
/99_pycharm_archive/激活码/非永久激活码/Pycharm方式一激活码汇总.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/激活码/非永久激活码/Pycharm方式一激活码汇总.docx
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | [feature_engineering](https://github.com/binzhouchn/feature_engineering)
3 |
4 | # Python Notes
5 | > Version: 0.5
6 | > Author: binzhou
7 | > Email: binzhouchn@gmail.com
8 |
9 | `GitHub` renders `ipynb` notebooks slowly, so viewing this project through [Nbviewer](http://nbviewer.ipython.org/github/lijin-THU/notes-python/blob/master/index.ipynb) is recommended.
10 |
11 | [Python release download archive](https://www.python.org/ftp/python/)
12 |
13 | ---
14 |
15 | ## Introduction
16 |
17 | `Python 3.10` is installed by default, along with the third-party packages `gensim`, `tqdm`, and `flask`.
18 |
19 | To create an Anaconda virtual environment with a downgraded Python version: `conda create -n tableqa python=3.9`
20 |
21 | > Life is short, use Python.
22 |
23 | [Anaconda](http://www.continuum.io/downloads) is recommended; this distribution bundles most of the commonly used packages.
24 |
25 |
26 | Using Chinese PyPI mirrors with pip
27 |
28 | [Configure pip to use a Chinese mirror](https://www.cnblogs.com/wqpkita/p/7248525.html)
29 | ```shell
30 | pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com <package>
31 | pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package>
32 | pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com <package> --use-feature=2020-resolver  # work around dependency conflicts (this resolver became the default in pip 20.3)
33 | ```
34 | ```
35 | One-off usage example:
36 | pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com flask
37 | # On a corporate machine behind a proxy, add the proxy when installing packages inside the docker python3.6 container
38 | pip --proxy=proxyAddress:port install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com flask
39 | ```
40 |
41 |
42 |
43 |
44 | Persistent pip mirror configuration
45 |
46 | pip mirror configuration (Linux)
47 | ```
48 | # Create a .pip directory under your home directory (at the same level as the anaconda folder), add a pip.conf file inside it, and copy the settings below into it
49 | [global]
50 | trusted-host = pypi.tuna.tsinghua.edu.cn
51 | index-url = https://pypi.tuna.tsinghua.edu.cn/simple
52 | ```
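The Linux steps above can be scripted in one go. A minimal sketch, assuming a bash shell and that `~/.pip` is pip's legacy per-user config location (which pip still reads alongside `~/.config/pip`):

```shell
# Create the legacy per-user pip config directory and write the mirror settings
mkdir -p "$HOME/.pip"
cat > "$HOME/.pip/pip.conf" <<'EOF'
[global]
trusted-host = pypi.tuna.tsinghua.edu.cn
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
EOF
# Show the resulting file
cat "$HOME/.pip/pip.conf"
```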
53 | pip mirror configuration (Windows)
54 | ```
55 | # Under C:\Users\Administrator, create a pip folder containing a pip.ini text file with the following content:
56 | [global]
57 | index-url = https://pypi.tuna.tsinghua.edu.cn/simple
58 | [install]
59 | trusted-host = pypi.tuna.tsinghua.edu.cn
60 | or
61 | [global]
62 | index-url = http://mirrors.aliyun.com/pypi/simple/
63 | [install]
64 | trusted-host = mirrors.aliyun.com
65 | ```
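If it is unclear which of these files pip actually reads (the Linux and Windows locations differ), newer pip releases can report their configuration sources themselves. A quick check, assuming pip >= 20.1 for the `debug` subcommand:

```shell
# List every location pip searches for config files, and whether each exists
python -m pip config debug
# Print only the settings currently in effect (empty output means no config was found)
python -m pip config list
```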
66 |
67 |
68 | ## Upgrading to Python 3.12 with conda
69 |
70 | Method 1
71 | (reference: https://qa.1r1g.com/sf/ask/4099772281/)
72 | ```shell
73 | conda update -n base -c defaults conda
74 | conda install -c anaconda python=3.12
75 | # then reinstall the dependency packages
76 | ```
77 | Method 2 (use a virtual environment instead)
78 | ```
79 | $ conda create -p /your_path/env_name python=3.12
80 | # activate the environment
81 | $ source activate /your_path/env_name
82 | # deactivate the environment (deactivate takes no argument)
83 | $ source deactivate
84 | # remove the environment
85 | $ conda env remove -p /your_path/env_name
85 | $ conda env remove -p /your_path/env_name
86 | ```
87 |
88 | ## Other recommended Python repositories
89 |
90 | [All algorithms implemented in Python - for education](https://github.com/TheAlgorithms/Python/)
91 |
--------------------------------------------------------------------------------