├── .gitignore ├── 01_basic ├── README.md └── arg_test.py ├── 02_numpy └── README.md ├── 03_pandas ├── README.md ├── concat.png ├── join.png └── merge.png ├── 04_sklearn └── README.md ├── 05_OOP └── README.md ├── 06_flask_sanic └── README.md ├── 07_database ├── README.md ├── es.md ├── faiss.md ├── imgs │ └── redis_pic.png └── neo4j.md ├── 08_vscode └── README.md ├── 09_remote_ipython └── README.md ├── 10_docker ├── README.md ├── mi_docker_demo │ ├── README.md │ ├── app.py │ ├── docker_build │ └── requirements.txt └── newapi_docker_demo │ └── README.md ├── 11_rabbitmq └── README.md ├── 12_nginx ├── README.md ├── default.conf ├── default1.conf ├── default2.conf └── default3.conf ├── 13_airflow └── README.md ├── 14_go └── README.md ├── 15_ansible └── README.md ├── 99_pycharm_archive ├── .DS_Store ├── README.md ├── pic │ ├── pycharm_activ.png │ ├── pycharm_git1.png │ ├── pycharm_git2.png │ ├── pycharm_remote1.png │ ├── pycharm_remote2.png │ ├── pycharm_remote3.png │ ├── pycharm_remote4.png │ └── pycharm_remote5.png └── 激活码 │ ├── .DS_Store │ ├── jetbrains-agent.jar │ ├── 永久激活码 │ ├── 激活码.txt │ ├── 激活码1.txt │ ├── 激活码2.txt │ └── 激活码3.txt │ └── 非永久激活码 │ └── Pycharm方式一激活码汇总.docx └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | __pycache__ 3 | .DS_Store 4 | -------------------------------------------------------------------------------- /01_basic/README.md: -------------------------------------------------------------------------------- 1 | ## python实用技巧 2 | 3 | [**1. lambda函数**](#lambda函数) 4 | 5 | [**2. map函数**](#map函数) 6 | 7 | [**3. filter函数**](#filter函数) 8 | 9 | [**4. reduce函数**](#reduce函数) 10 | 11 | [**5. apply和applymap函数、transform/agg**](#apply函数) 12 | 13 | [**6. dict转object**](#dict转object) 14 | 15 | [**7. KFold函数**](#kfold函数) 16 | 17 | [**8. sys.defaultencoding**](#sys) 18 | 19 | [**9. pip install error _NamespacePath**](#pip_error) 20 | 21 | [**10. zip(\*xx)用法**](#zip) 22 | 23 | [**11. 
dataframe中某一列字符串长度为10的进行切片**](#切片) 24 | 25 | [**12. re模块(一些常用的正则轮子)**](#re模块) 26 | 27 | [**13. eval**](#eval) 28 | 29 | [**14. global用法**](#global) 30 | 31 | [**15. 多进程与多线程实现**](#多进程与多线程实现) 32 | 33 | [**16. CV的多进程实现**](#cv的多进程实现) 34 | 35 | [**17. 保存数据(json)**](#保存数据) 36 | 37 | [**18. 保存模型**](#保存模型) 38 | 39 | [**19. enumerate用法**](#enumerate) 40 | 41 | [**20. label数值化方法**](#label数值化方法) 42 | 43 | [**21. 列表推导式中使用if else**](#列表推导式中使用if_else) 44 | 45 | [**22. 将nparray或list中的最多的元素选出**](#将numpy_array中的最多的元素选出) 46 | 47 | [**23. 函数中传入函数demo**](#函数中传入函数demo) 48 | 49 | [**24. getattr**](#getattr) 50 | 51 | [**25. df宽变长及一列变多列**](#df宽变长及一列变多列) 52 | 53 | [**26. groupby使用**](#groupby使用) 54 | 55 | [**27. python画图显示中文**](#python画图及显示中文) 56 | 57 | [**28. 给字典按value排序**](#给字典按value排序) 58 | 59 | [**29. sorted高级用法**](#sorted高级用法) 60 | 61 | [**30. time用法**](#time用法) 62 | 63 | [**31. 两层列表展开平铺**](#两层列表展开平铺) 64 | 65 | [**32. 读取百度百科词向量**](#读取百度百科词向量) 66 | 67 | [**33. logging**](#logging) 68 | 69 | [**34. argparse用法**](#argparse用法) 70 | 71 | [**35. 包管理**](#包管理) 72 | 73 | [**36. 装饰器**](#装饰器) 74 | 75 | [**37. 本地用python起http服务**](#本地用python起http服务) 76 | 77 | [**38. cache**](#cache) 78 | 79 | [**39. 创建文件**](#创建文件) 80 | 81 | [**40. 字典转成对象(骚操作)**](#字典转成对象) 82 | 83 | [**41. lgb[gpu版本]和xgb[gpu版本]安装**](#boost安装) 84 | 85 | [**42. tqdm**](#tqdm) 86 | 87 | [**43. joblib Parallel并行**](#joblib_parallel) 88 | 89 | [**44. 调试神器pysnooper - 丢弃print**](#调试神器pysnooper) 90 | 91 | [**45. 调试神器debugpy**](#调试神器debugpy) 92 | 93 | [**46. 分组计算均值并填充**](#分组计算均值并填充) 94 | 95 | [**47. python日期处理**](#python日期处理) 96 | 97 | [**48. dataclass**](#dataclass) 98 | 99 | [**49. md5 sha256**](#md5_sha256) 100 | 101 | [**50. 查看内存**](#查看内存) 102 | 103 | [**51. __slots__用法**](#slots用法) 104 | 105 | --- 106 |
107 | 点击展开 108 | 109 | ```python 110 | %reload_ext autoreload 111 | %autoreload 2 112 | %matplotlib notebook 113 | 114 | import sys 115 | sys.path.append('..') 116 | ``` 117 | 118 | ### lambda函数 119 | ```python 120 | # lambda: 快速定义单行的最小函数,inline的匿名函数 121 | (lambda x : x ** 2)(3) 122 | # 或者 123 | f = lambda x : x ** 2 124 | f(3) 125 | ``` 126 | 127 | ### map函数 128 | ```python 129 | arr_str = ["hello", "this"] 130 | arr_num = [3,1,6,10,12] 131 | 132 | def f(x): 133 | return x ** 2 134 | map(lambda x : x ** 2, arr_num) 135 | map(f, arr_num) 136 | map(len, arr_str) 137 | map(lambda x : (x, 1), arr_str) 138 | ``` 139 | ```python 140 | # 可以对每个列表对应的元素进行操作,比如加总 141 | f1 = lambda x,y,z:x+y+z 142 | list(map(f1,[1,2,10],[2,3,6],[4,3,5])) 143 | # [7,8,21] 144 | ``` 145 | 146 | ### filter函数 147 | ```python 148 | arr_str = ['hello','hi','nice'] 149 | arr_num = [1,6,10,12] 150 | filter(lambda x : len(x) >= 5, arr_str) 151 | filter(lambda x : x > 5, arr_num) 152 | [(i.word, 'E') if i.flag =='n' else (i.word, 'P') for i in filter(lambda x: x.flag in ('n', 'v'), a) ] 153 | ``` 154 | 155 | ### reduce函数 156 | ```python 157 | # 在python3里,reduce函数已经被从全局命名空间里移除了,它现在被放置在functools模块里 158 | from functools import reduce 159 | arr_num = [1,6,7,10] 160 | reduce(lambda x, y : x + y, arr_num) 161 | ``` 162 | 163 | ### apply函数 164 | 165 | - apply函数是对行进行操作 166 | 167 | 你可以把apply()当作是一个map()函数,只不过这个函数是专为Pandas的dataframe和series对象打造的。对初学者来说,你可以把series对象想象成类似NumPy里的数组对象。它是一个一维带索引的数据表结构。
168 |
169 | apply() 函数作用是,将一个函数应用到某个数据表中你指定的一行或一列中的每一个元素上。是不是很方便?特别是当你需要对某一列的所有元素都进行格式化或修改的时候,你就不用再一遍遍地循环啦!
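先用一个最小示例感受一下(假设已按惯例 import pandas as pd;列名 price 为本示例自拟):

```python
import pandas as pd

# 对某一列(Series)的每个元素应用同一个函数
df = pd.DataFrame({'price': [10, 20, 30]})
df['price_double'] = df['price'].apply(lambda x: x * 2)
print(df['price_double'].tolist())  # [20, 40, 60]
```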
170 | ```python 171 | df = pd.DataFrame([[4,9],]*3,columns=['A','B']) 172 | df.apply(np.sqrt) 173 | df.apply(np.sum,axis=0) 174 | df.apply(np.sum,axis=1) 175 | df.apply(lambda x : [1,2], axis=1) 176 | df.apply(lambda x : x.split()[0]) 177 | ``` 178 | > applymap和apply差不多,不过是全局函数,elementwise,作用于dataframe中的每个元素 179 | 180 | - transform/agg是对一列进行操作 181 | 182 | 由前面分析可以知道,Fare项在测试数据中缺少一个值,所以需要对该值进行填充。 183 | 我们按照一二三等舱各自的均价来填充: 184 | 下面transform将函数np.mean应用到各个group中。 185 | ```python 186 | combined_train_test['Fare'] = combined_train_test[['Fare']].fillna(combined_train_test.groupby('Pclass').transform(np.mean)) 187 | ``` 188 | 189 | ### dict转object 190 | ```python 191 | import json 192 | # json格式的str 193 | s = '{"name":{"0":"John","1":"Lily"},"phone_no":{"0":"189101","1":"234220"},"age":{"0":"11","1":"23"}}' 194 | # load成dict 195 | dic = json.loads(s) 196 | dic 197 | # {"name":{"0":"John","1":"Lily"},"phone_no":{"0":"189101","1":"234220"},"age":{"0":"11","1":"23"}} 198 | # 不能使用dic.name, dic.age 只能dic['name'], dic['age'] 199 | class p: 200 | def __init__(self, d=None): 201 | self.__dict__ = d 202 | p1 = p(dic) 203 | # 这个时候就可以用p1.name, p1.age了 204 | 205 | # 更详细一点 206 | import six 207 | import pprint 208 | # 现在有个字典 209 | conf = {'base':{'good','medium','bad'},'age':'24'} 210 | # conf.age是不行的 211 | 定义一个class: 212 | class p: 213 | def __init__(self, d=None): 214 | self.__dict__ = d 215 | def keys(self): 216 | return self.__dict__.keys() 217 | def items(self): 218 | return six.iteritems(self.__dict__) 219 | def __repr__(self): 220 | return pprint.pformat(self.__dict__) # 将dict转成字符串 221 | p1 = p(conf) 222 | 这个时候就可以p1.base和p1.age 223 | p1这个实例拥有的属性有: 224 | p.__doc__ 225 | p.__init__ 226 | p.__module__ 227 | p.__repr__ 228 | p.age * age和base这两个是字典加载进来以后多出来的属性 229 | p.base * 230 | p.items 231 | p.keys 232 | ``` 233 | 234 | ### kfold函数 235 | 新手用cross_val_score比较简单,后期可用KFold更灵活, 236 | ```python 237 | skf = StratifiedKFold(n_splits=5,shuffle=True) 238 | for train_idx, val_idx in 
skf.split(X,y): 239 | pass 240 | train_idx 241 | val_idx 242 | ``` 243 | ```python 244 | from sklearn.model_selection import cross_val_score, StratifiedKFold, KFold 245 | forest = RandomForestClassifier(n_estimators = 120,max_depth=5, random_state=42) 246 | cross_val_score(forest,X=train_data_features,y=df.Score,scoring='neg_mean_squared_error',cv=3) 247 | # 这里的scoring可以自己写,比如我想用RMSE则 248 | from sklearn.metrics import scorer 249 | def ff(y,y_pred): 250 | rmse = np.sqrt(sum((y-y_pred)**2)/len(y)) 251 | return rmse 252 | rmse_scoring = scorer.make_scorer(ff) 253 | cross_val_score(forest,X=train_data_features,y=df.Score,scoring=rmse_scoring,cv=5) 254 | ``` 255 | ```python 256 | # Some useful parameters which will come in handy later on 257 | ntrain = titanic_train_data_X.shape[0] 258 | ntest = titanic_test_data_X.shape[0] 259 | SEED = 42 # for reproducibility 260 | NFOLDS = 5 # set folds for out-of-fold prediction 261 | kf = KFold(n_splits = NFOLDS, random_state=SEED, shuffle=True) 262 | 263 | def get_out_fold(clf, x_train, y_train, x_test): # 这里需要将dataframe转成array,用x_train.values即可 264 | oof_train = np.zeros((ntrain,)) 265 | oof_test = np.zeros((ntest,)) 266 | oof_test_skf = np.empty((NFOLDS, ntest)) 267 | 268 | for i, (train_index, test_index) in enumerate(kf.split(x_train)): 269 | x_tr = x_train.loc[train_index] 270 | y_tr = y_train.loc[train_index] 271 | x_te = x_train.loc[test_index] 272 | 273 | clf.fit(x_tr, y_tr) 274 | 275 | oof_train[test_index] = clf.predict(x_te) 276 | oof_test_skf[i, :] = clf.predict(x_test) 277 | 278 | oof_test[:] = oof_test_skf.mean(axis=0) 279 | return oof_train.reshape(-1, 1), oof_test.reshape(-1, 1) 280 | ``` 281 | 282 | ### sys 283 | ```python 284 | import sys 285 | reload(sys) 286 | sys.setdefaultencoding('utf-8') 287 | #注意:使用此方式,有极大的可能导致print函数无法打印数据! 

#改进方式如下(注意:reload(sys) 与 setdefaultencoding 只存在于 Python 2;Python 3 默认编码即为 utf-8,该函数已被移除,无需也无法这样设置):
import sys #这里只是一个对sys的引用,只能reload才能进行重新加载
stdi, stdo, stde = sys.stdin, sys.stdout, sys.stderr
reload(sys) #通过import引用进来时,setdefaultencoding函数在被系统调用后被删除了,所以必须reload一次
sys.stdin, sys.stdout, sys.stderr = stdi, stdo, stde
sys.setdefaultencoding('utf-8')
```

### pip_error

使用pip时出现错误:
`AttributeError: '_NamespacePath' object has no attribute 'sort'`

解决方法:
303 | 1. 关于Anaconda3报错 AttributeError: '_NamespacePath' object has no attribute 'sort' ,先参考下面这篇博客:
304 | http://www.cnblogs.com/newP/p/7149155.html
按照文中的做法是可以解决conda报错的,总结一下就是:一、把文件夹 D:\ProgramData\Anaconda3\Lib\site-packages\conda\_vendor\auxlib 下 path.py 中的 `except ImportError:` 修改为 `except Exception:`;二、找到 D:\ProgramData\Anaconda3\lib\site-packages\setuptools-27.2.0-py3.6.egg,删除(不放心的话,先剪切到别的地方)

2. 然而pip报错的问题还没解决。首先要安装setuptools模块,下载地址是:
308 | https://pypi.python.org/pypi/setuptools#files
309 | 下载setuptools-36.5.0.zip解压,命令窗口进入到文件夹然后 python setup.py install 310 | 311 | 3.安装好setuptools模块之后应该能用easy_install了,我们要借助它来重新安装pip。命令窗口输入命令:easy_install pip 312 | 313 | ### zip 314 | zip基本用法
315 | ```python 316 | a = [1,2,3] 317 | b = [4,5,6] 318 | for i,j in zip(a,b): 319 | print(i,j) 320 | # 1 4 321 | # 2 5 322 | # 3 6 323 | ``` 324 | 325 | ```python 326 | s = '彩符和大汶口文化陶尊符号是第三阶段的语段文字' 327 | print(synonyms.seg(s)) 328 | # (['彩符', '和', '大汶口', '文化', '陶尊', '符号', '是', '第三阶段', '的', '语段', '文字'], ['n', 'c', 'ns', 'n', 'nr', 'n', 'v', 't', 'uj', 'n', 'n']) 329 | [x for x in zip(*synonyms.seg(s))] 330 | # [('彩符', 'n'), 331 | ('和', 'c'), 332 | ('大汶口', 'ns'), 333 | ('文化', 'n'), 334 | ('陶尊', 'nr'), 335 | ('符号', 'n'), 336 | ('是', 'v'), 337 | ('第三阶段', 't'), 338 | ('的', 'uj'), 339 | ('语段', 'n'), 340 | ('文字', 'n')] 341 | ``` 342 | ### 切片 343 | ```python 344 | data.msg_from = data.msg_from.astype(str) 345 | data[data.msg_from.apply(len)==10] 346 | ``` 347 | 348 | ### re模块 349 | 350 | [常用正则表达式速查手册,Python文本处理必备](https://mp.weixin.qq.com/s/ySsgcrSnkguO2c8D-SQNxw)
351 | [regexlearn](https://github.com/aykutkardas/regexlearn.com)
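动手之前先看一个最小的 re.findall 示例。注意:下面代码里的 URL_DEMO / MAIL_DEMO 是为演示简化过的自拟模式,与本节后面整理的 URL_REG / MAIL_REG 并不等价:

```python
import re

# 简化版模式,仅作演示;更完整的轮子见下方整理
URL_DEMO = r"https?://[^\s,;,;。]+"
MAIL_DEMO = r"[0-9a-zA-Z_.]+@[0-9a-zA-Z]+(?:\.[a-zA-Z]+)+"

s = "联系邮箱 test_01@example.com,文档见 https://github.com/aykutkardas/regexlearn.com"
print(re.findall(URL_DEMO, s))   # ['https://github.com/aykutkardas/regexlearn.com']
print(re.findall(MAIL_DEMO, s))  # ['test_01@example.com']
```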
352 | 353 | ```python 354 | # 1. 将一个问题中的网址、邮箱、手机号、身份证、日期、价格提出来 355 | 356 | # 日期 注:这里的{1,4}指的是匹配1到4位,问号指的是0个或1个 357 | DATE_REG1 = "(?:[一二三四五六七八九零十0-9]{1,4}年[一二三四五六七八九零十0-9]{1,2}月[一二三四五六七八九零十0-9]{1,2}[日|号|天|分]?)|\ 358 | (?:[一二三四五六七八九零十0-9]+年[一二三四五六七八九零十0-9]+月)|\ 359 | (?:[一二三四五六七八九零十0-9]{1,2}月[一二三四五六七八九零十0-9]{1,2}[号|日|天]?)|\ 360 | (?:[一二三四五六七八九零十0-9]+年)|\ 361 | (?:[一二三四五六七八九零十0-9]+月)|\ 362 | (?:[一二三四五六七八九零十0-9]{1,3}[号|日|天])|\ 363 | (?:[一二三四五六七八九零十0-9]+小时[一二三四五六七八九零十0-9]+分钟)|\ 364 | (?:[一二三四五六七八九零十0-9]+小时)|\ 365 | (?:[一二三四五六七八九零十0-9]+分钟)\ 366 | " 367 | 368 | # 网址 369 | URL_REG = "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+" 370 | # 手机 371 | PHONE_REG = "[+](?:86)[-\s+]*?1[3-8][0-9]{9}" 372 | # 邮箱 373 | MAIL_REG = "[0-9a-zA-Z_]{0,39}@(?:[A-Za-z0-9]+\.)+[A-Za-z]+" 374 | # 身份证 375 | IDCARD_REG = "\d{18}|\d{17}[Xx]" 376 | 377 | # 价格 378 | MONEY_REG1 = "(?:\d+[\.\d+]*万*亿*美*港*元/桶)|\ 379 | (?:\d+[\.\d+]*万*亿*美*港*元/吨)|\ 380 | (?:\d+[\.\d+]*万*亿*美*港*元/升)|\ 381 | (?:\d+[\.\d+]*万*亿*美*港*元/吨)|\ 382 | (?:\d+[\.\d+]*万*亿*美*港*元/赛季)|\ 383 | (?:\d+[\.\d+]*万*亿*美*港*平方米)|\ 384 | (?:\d+[\.\d+]*万*亿*美*港*平方千米)|\ 385 | (?:(?:[\d]{1,3},)*(?:[\d]{3})[万亿]*[美港]*元)|\ 386 | (?:\d+[\.\d+]*万*亿*美*港*[股|笔|户|辆|倍|桶|吨|升|个|手|点|元|亿|万])" 387 | 388 | MONEY_REG2 = "([一二三四五六七八九零十百千万亿|\d|.]+[万|元|块|毛][一二三四五六七八九零十百千万亿|\d|.]*)+" 389 | 390 | ## add date reg 391 | DATE_REG2 = "(?:[\d]*[-:\.]*\d+[-:\.点]\d+分)|(?:[\d+-]*\d+月份)|(?:\d+[-:\.]\d+[-:\.]\d+)" 392 | # HYPER_REG 2017-09-20 393 | HYPER_REG = "[0-9a-zA-Z]+[-:][0-9a-zA-Z]+[%]*" 394 | 395 | # 2. 
具体的正则匹配问题 396 | 397 | ## 2.1 以数字开头后面只能接文字,而且数字后面接的文字不能是【小时、种】 398 | s = '22基本日常生活活动:指食物摄取、大小便始末、穿脱衣服、起居、步行、入浴。' 399 | re.findall(r'^\d+(?![\d*小时*]|[\d*种*])[\u4e00-\u9fa5]+', s) 400 | 401 | # 匹配只留下中文、英文和数字 402 | re.sub(r'[^\u4E00-\u9FA5\s0-9a-zA-Z]+', '', s) 403 | 404 | # 日期解析202206 405 | import cn2an #version 0.5.14 406 | import datetime 407 | import re 408 | def getYearMonth(s): 409 | ''' 410 | 【格式说明】 411 | 今年上个月/上月/前一个月/前个月 -> 202204 412 | 今年当月/该月/这月/这个月/本月 -> 202205 413 | 去年5月/去年五月/2021年五月/2021五月/二零二一五月/二零二一 五月 -> 202105 414 | 前年5月/前年五月/2020年五月/2020五月/二零二零五月/二零二零 五月 -> 202005 415 | 2021年7月/二零二一年7月 -> 202107 416 | 5月/五月份 -> 202205 417 | 2021.6/2021.06/2021-6/2021-06/2021 - 6月/2021 ---6月/2021 . 6月/2021...6月, -> 202106 418 | 2021 4月/2021 04 -> 202104 419 | 如果没有提到时间 -> 202205(默认今年当月) 420 | 如果输入的时间有误或月份有误比如输入2021 23, -> 202205(默认今年当月) 421 | 如果输入时间超过当前时间 -> 202205(默认今年当月) 422 | 如果输入时间早于2020年1月 -> 202205(默认今年当月) 423 | ''' 424 | cur_date = datetime.datetime.now().strftime('%Y%m') 425 | try: 426 | DATE_REG1 = '(?:[一二三四五六七八九零十0-9]{1,4}年[一二三四五六七八九零十0-9]{1,2}月)|(?:去年[一二三四五六七八九零十0-9]+月)|(?:前年[一二三四五六七八九零十0-9]+月)|(?:[一二三四五六七八九零十0-9]+年[一二三四五六七八九零十0-9]+月)|(?:[一二三四五六七八九零十0-9]{1,2}月)|(?:[一二三四五六七八九零十0-9]+年)|(?:[一二三四五六七八九零十0-9]+月)' 427 | thism_lst = ['当月', '该月', '这个月', '本月'] 428 | lastm_lst = ['上月', '上个月', '前一个月', '前个月'] 429 | date = '' 430 | def helper(s, pattern): 431 | date = '' 432 | s = cn2an.transform(s, "cn2an") # 转换成阿拉伯数字 433 | res = re.findall(pattern, s) 434 | if res: 435 | res = res[0] # 如果有多个就取第一个 436 | year = '2022' #需要人工维护当年,还有过去两年的一个判断;每年要手动更新这部分 437 | if '去年' in res or '21年' in res: 438 | year = '2021' 439 | elif '前年' in res or '20年' in res: 440 | year = '2020' 441 | month = re.findall('(?:([0-9]+)月)', res) 442 | if month: 443 | month = int(month[0]) 444 | if month > 0 and month < 13: 445 | if month < 10: 446 | month = '0' + str(month) 447 | else: 448 | month = str(month) 449 | else: 450 | return '' 451 | date = year + month 452 | else: 453 | date = year + 
str(datetime.datetime.now().month) 454 | return date 455 | six_d = re.findall(r'2\d{5}', s) #直接识别6位日期比如202110 456 | if six_d: 457 | date = six_d[0] 458 | if not date: 459 | # 针对2021 4月/2021.6/2021.06/2021-6/2021-06/2021 - 6月/2021 ---6月/2021 . 6月/2021...6月这些情况 460 | DATE_REG3 = r'(?:\d{4}\s*\.+\s*\d{1,2})|(?:\d{4}\s*-+\s*\d{1,2})|(?:\d{4}\s*_+\s*\d{1,2})|(?:\d{4}\s+\d{1,2})' 461 | six_d2 = re.findall(DATE_REG3, s) 462 | if six_d2: 463 | _six_d2 = six_d2[0] 464 | try: 465 | int(_six_d2[-2]) 466 | _six_d2_m = _six_d2[-2:] 467 | except: 468 | _six_d2_m = _six_d2[-1] 469 | s = _six_d2[:4]+'年'+_six_d2_m+'月' 470 | s = s.replace(' ', '') 471 | if not date: 472 | for i in thism_lst: 473 | if i in s: 474 | date = cur_date 475 | break 476 | if not date: 477 | for i in lastm_lst: 478 | if i in s: 479 | date = (datetime.datetime.now() - datetime.timedelta(days=30, hours=23)).strftime('%Y%m') 480 | break 481 | if not date: 482 | # 判断2021五月这种情况 483 | DATE_REG2 = '(?:[一二三四五六七八九零十0-9]{4}[一二三四五六七八九零十]{1,2}月)' 484 | res = re.findall(DATE_REG2, s) 485 | if res: 486 | s = res[0][:4]+'年'+res[0][4:] 487 | date = helper(s, DATE_REG1) 488 | else: 489 | date = '' 490 | if not date: 491 | date = helper(s, DATE_REG1) 492 | if not date: 493 | date = cur_date 494 | #corner case再判断下,处理下边界问题 495 | if date < '202001' or date[-2:] > '12': 496 | date = cur_date 497 | except: 498 | date = cur_date 499 | return date 500 | ``` 501 | 502 | ### eval 503 | ```python 504 | eval("['一','二','三']") 505 | 输出 ['一','二','三'] 506 | eval("{'a':1,'b':2}") 507 | 输出 {'a':1,'b':2} 508 | ``` 509 | 510 | ### global 511 | ```python 512 | a = None 513 | 514 | def f1(): 515 | a = 10 516 | 517 | def f2(): 518 | global a 519 | a = 10 520 | f1() 521 | print(a) 522 | f2() 523 | print(a) 524 | ``` 525 | 运行完f1()后,a还是None;运行完f2()后,a变成了10。一般规范global变量用大写 526 | 527 | ### 多进程与多线程实现 528 | 529 | ```python 530 | # 多进程实现举例 531 | from multiprocessing import Pool 532 | import os 533 | import time 534 | 535 | def long_time_task(a, b): 536 | 
print('Run task %s (%s)...' % (a, os.getpid())) 537 | start = time.time() 538 | time.sleep(1) 539 | end = time.time() 540 | print('Task %s runs %0.2f seconds.' % (a, (end - start))) 541 | return str(a) + '__pool__' + str(b) 542 | 543 | 544 | if __name__ == '__main__': 545 | 546 | print('Parent process %s.' % os.getpid()) 547 | p = Pool(4) 548 | res = [] 549 | for i in range(10): 550 | res.append(p.apply_async(long_time_task, args=(i, i+1))) 551 | print('Waiting for all subprocesses done...') 552 | p.close() 553 | p.join() 554 | print('All subprocesses done.') 555 | # 拿到子进程返回的结果 556 | for i in res: 557 | print('xxx', i.get()) 558 | ``` 559 | ```python 560 | # 多线程实现举例 561 | def func1(p1, p2, p3): 562 | pass 563 | def func2(p1, p2): 564 | pass 565 | from concurrent.futures import ThreadPoolExecutor, wait 566 | executor = ThreadPoolExecutor(max_workers=4) 567 | tasks = [] 568 | tasks.append(executor.submit(func1, param1, param2, param3)) 569 | tasks.append(executor.submit(func2, param1, param2)) 570 | wait(tasks, return_when='ALL_COMPLETED') 571 | res1, res2 = (x.result() for x in tasks) 572 | ``` 573 | ```python 574 | # 多进程优化版(推荐用这个) 575 | #!/usr/bin/env python 576 | # -*- coding: utf-8 -*- 577 | import functools 578 | from concurrent.futures import ProcessPoolExecutor 579 | from tqdm import tqdm 580 | import time 581 | 582 | class Pipe(object): 583 | """I am very like a linux pipe""" 584 | 585 | def __init__(self, function): 586 | self.function = function 587 | functools.update_wrapper(self, function) 588 | 589 | def __ror__(self, other): 590 | return self.function(other) 591 | 592 | def __call__(self, *args, **kwargs): 593 | return Pipe( 594 | lambda iterable, *args2, **kwargs2: self.function( 595 | iterable, *args, *args2, **kwargs, **kwargs2 596 | ) 597 | ) 598 | 599 | @Pipe 600 | def xProcessPoolExecutor(iterable, func, max_workers=5, desc="Processing", unit="it"): 601 | if max_workers > 1: 602 | total = len(iterable) if hasattr(iterable, '__len__') else None 603 
| 604 | with ProcessPoolExecutor(max_workers) as pool, tqdm(total=total, desc=desc, unit=unit) as pbar: 605 | for i in pool.map(func, iterable): 606 | yield i 607 | pbar.update() 608 | 609 | else: 610 | return map(func, iterable) 611 | 612 | xtuple, xlist, xset = Pipe(tuple), Pipe(list), Pipe(set) 613 | 614 | def ff(x): 615 | for i in range(x): 616 | a = 1 617 | return x+2 618 | 619 | if __name__ == '__main__': 620 | dfs = [] 621 | arr = [100000000,200000000,300000000,400000000] 622 | #without multiprocess 623 | for i in arr: 624 | dfs.append(ff(i)) 625 | #with multiprocess 626 | dfs = arr | xProcessPoolExecutor(ff, 16) | xlist #这里的16是进程数,一般cpu有N核就起N-1个进程 627 | print(dfs) 628 | ``` 629 | ```python 630 | # 多进程(yuanjie封装meutils) 以多进程读取data下pdf文件为例 631 | from meutils.pipe import * 632 | os.environ['LOG_PATH'] = 'pdf.log' 633 | from meutils.log_utils import * 634 | location = 'output' #pdf文件处理后保存的文件夹 635 | @diskcache(location=location) 636 | def func(file_path): 637 | try: 638 | df = pdf_layout(str(file_path)) #解析成字典 详见https://github.com/binzhouchn/deep_learning/blob/master/4_llm/1_%E5%90%91%E9%87%8F%E6%95%B0%E6%8D%AE%E5%BA%93/es/es.py 中的body字典 639 | with open(f'{location}/{file_path.stem}.txt', 'w', encoding='utf8') as f: 640 | json.dump(df, f, ensure_ascii=False) 641 | except Exception as e: 642 | logger.debug(f"{file_path}: {e}") 643 | logger.debug(f"{file_path}: {traceback.format_exc().strip()}") 644 | if __name__ == '__main__': 645 | ps = Path('./data/').glob('*.pdf') | xlist #将所有pdf文件都列出来 646 | dfs = ps | xProcessPoolExecutor(func, 16) | xlist #这里的16是进程数,一般cpu有N核就起N-1个进程 647 | ``` 648 | 649 | ### cv的多进程实现 650 | 651 | ```python 652 | from multiprocessing import Manager, Process 653 | n = 5 654 | kf = KFold(n_splits=n, shuffle=False) 655 | mg = Manager() 656 | mg_list = mg.list() 657 | p_proc = [] 658 | 659 | def lr_pred(i,tr,va,mg_list): 660 | print('%s stack:%d/%d'%(str(datetime.now()),i+1,n)) 661 | clf = LogisticRegression(C=3) 662 | clf.fit(X[tr],y[tr]) 663 | 
y_pred_va = clf.predict_proba(X[va]) 664 | print('va acc:',myAcc(y[va], y_pred_va)) 665 | mg_list.append((va, y_pred_va)) 666 | # return mg_list # 可以不加 667 | 668 | print('main line') 669 | for i,(tr,va) in tqdm_notebook(enumerate(kf.split(X))): 670 | p = Process(target=lr_pred, args=(i,tr,va,mg_list,)) 671 | p.start() 672 | p_proc.append(p) 673 | [p.join() for p in p_proc] 674 | # 最后把mg_list中的元组数据拿出来即可 675 | ``` 676 | 677 | ### 保存数据 678 | 679 | ```python 680 | # 这里medical是mongodb的一个集合 681 | import json 682 | with open('../data/medical.json','w',encoding='utf-8') as fp: 683 | for i in medical.find(): 684 | i['_id'] = i.get('_id').__str__() # 把bson的ObjectId转成str 685 | json.dump(i,fp, ensure_ascii=False) 686 | fp.write('\n') 687 | fp.close() 688 | 689 | # 使用pickle(保存) 690 | data = (x_train, y_train, x_test) 691 | f_data = open('./data_doc2vec_25.pkl', 'wb') 692 | pickle.dump(data, f_data) 693 | f_data.close() 694 | # 使用pickle(读取) 695 | f = open('./data_doc2vec_25.pkl', 'rb') 696 | x_train, _, x_test = pickle.load(f) 697 | f.close() 698 | 699 | ``` 700 | 701 | ### 保存模型 702 | 703 | 1. 使用 pickle 保存 704 | ```python 705 | import pickle #pickle模块 706 | 707 | #保存Model(注:save文件夹要预先建立,否则会报错) 708 | with open('save/clf.pickle', 'wb') as f: 709 | pickle.dump(clf, f) 710 | 711 | #读取Model 712 | with open('save/clf.pickle', 'rb') as f: 713 | clf2 = pickle.load(f) 714 | #测试读取后的Model 715 | print(clf2.predict(X[0:1])) 716 | ``` 717 | 2. 使用joblib保存 718 | ```python 719 | from sklearn.externals import joblib #jbolib模块 720 | 721 | #保存Model(注:save文件夹要预先建立,否则会报错) 722 | joblib.dump(clf, 'save/clf.pkl') 723 | 724 | #读取Model 725 | clf3 = joblib.load('save/clf.pkl') 726 | 727 | #测试读取后的Model 728 | print(clf3.predict(X[0:1])) 729 | ``` 730 | 731 | 3. 
可以使用dataframe自带的to_pickle函数,可以把大的文件存成多个 732 | ```python 733 | import os 734 | from glob import glob 735 | 736 | def mkdir_p(path): 737 | try: 738 | os.stat(path) 739 | except: 740 | os.mkdir(path) 741 | 742 | def to_pickles(df, path, split_size=3, inplace=True): 743 | """ 744 | path = '../output/mydf' 745 | 746 | wirte '../output/mydf/0.p' 747 | '../output/mydf/1.p' 748 | '../output/mydf/2.p' 749 | 750 | """ 751 | if inplace==True: 752 | df.reset_index(drop=True, inplace=True) 753 | else: 754 | df = df.reset_index(drop=True) 755 | gc.collect() 756 | mkdir_p(path) 757 | 758 | kf = KFold(n_splits=split_size) 759 | for i, (train_index, val_index) in enumerate(tqdm(kf.split(df))): 760 | df.iloc[val_index].to_pickle(f'{path}/{i:03d}.p') 761 | return 762 | 763 | def read_pickles(path, col=None): 764 | if col is None: 765 | df = pd.concat([pd.read_pickle(f) for f in tqdm(sorted(glob(path+'/*')))]) 766 | else: 767 | df = pd.concat([pd.read_pickle(f)[col] for f in tqdm(sorted(glob(path+'/*')))]) 768 | return df 769 | ``` 770 | 771 | ### enumerate 772 | 773 | ```python 774 | tuples = [(2,3),(7,8),(12,25)] 775 | for step, tp in enumerate(tuples): 776 | print(step,tp) 777 | # 0 (2, 3) 778 | # 1 (7, 8) 779 | # 2 (12, 25) 780 | ``` 781 | 782 | ### label数值化方法 783 | 784 | 方法一
785 | ```python 786 | # 比如10个类别转成1到10 787 | from sklearn.preprocessing import LabelEncoder 788 | data['label'] = LabelEncoder().fit_transform(data.categ_id) 789 | ``` 790 | 方法二
791 | ```python 792 | # 比如10个类别转成onehot形式 793 | import pandas as pd 794 | pd.get_dummies(data.categ_id) 795 | ``` 796 | 797 | 方法三
### 列表推导式中使用if_else

1. [x for x in data if condition]
816 | 2. [exp1 if condition else exp2 for x in data] 817 | 818 | ### 将numpy_array中的最多的元素选出 819 | 820 | 将numpy array中的最多的元素选出,如果一样则取最小的那个 821 | ```python 822 | arr = np.array([2,2,2,4,5]) 823 | np.bincount(arr).argmax() 824 | # output: 2 825 | arr = np.array([1,2,1,4,2,8]) 826 | np.bincount(arr).argmax() 827 | # output: 1 828 | ``` 829 | 830 | 将list中最多的元素选出,如果一样则取最小的那个 831 | ```python 832 | # 方法一 833 | arr = [2,2,2,4,5] 834 | max(set(arr),key=arr.count) 835 | # 方法二 836 | from collections import Counter 837 | Counter(arr).most_common(1)[0][0] 838 | ``` 839 | 840 | ### 函数中传入函数demo 841 | 842 | ```python 843 | # time_function把时间包装了一下给其他的函数 844 | def time_function(f, *args): 845 | """ 846 | Call a function f with args and return the time (in seconds) that it took to execute. 847 | """ 848 | import time 849 | tic = time.time() 850 | f(*args) 851 | toc = time.time() 852 | return toc - tic 853 | 854 | two_loop_time = time_function(classifier.compute_distances_two_loops, X_test) 855 | print('Two loop version took %f seconds' % two_loop_time) 856 | 857 | one_loop_time = time_function(classifier.compute_distances_one_loop, X_test) 858 | print('One loop version took %f seconds' % one_loop_time) 859 | 860 | no_loop_time = time_function(classifier.compute_distances_no_loops, X_test) 861 | print('No loop version took %f seconds' % no_loop_time) 862 | ``` 863 | 864 | ### getattr 865 | 866 | ```python 867 | class A(object): 868 | def __init__(self): 869 | pass 870 | def xx(self,x): 871 | print('get xx func',x) 872 | a = A() 873 | getattr(a,'xx')(23213) ### 等同于a.xx(23213) 874 | #out[]: get xx func 23213 875 | ``` 876 | 877 | ### df宽变长及一列变多列 878 | 879 | (1) df宽变长
880 | ```python 881 | def explode(df, col, pat=None, drop_col=True): 882 | """ 883 | :param df: 884 | :param col: col name 885 | :param pat: String or regular expression to split on. If None, splits on whitespace 886 | :param drop_col: drop col is Yes or No 887 | :return: hive explode 888 | """ 889 | data = df.copy() 890 | data_temp = data[col].str.split(pat=pat, expand=True).stack().reset_index(level=1, drop=True).rename(col+'_explode') 891 | if drop_col: 892 | data.drop(col, 1, inplace=True) 893 | return data.join(data_temp) 894 | 895 | df = pd.DataFrame([[1, 'a b c'], 896 | [2, 'a b'], 897 | [3, np.nan]], columns=['id', 'col']) 898 | 899 | explode(df, 'col', pat=' ') 900 | ``` 901 | ```python 902 | # id col_explode 903 | #0 1 a 904 | #0 1 b 905 | #0 1 c 906 | #1 2 a 907 | #1 2 b 908 | #2 3 NaN 909 | ``` 910 | (2) 一列变多列 911 | ```python 912 | df.col.str.split(' ', expand=True) 913 | ``` 914 | ```python 915 | # 0 1 2 916 | #0 a b c 917 | #1 a b None 918 | #2 NaN NaN NaN 919 | ``` 920 | 921 | ### groupby使用 922 | 923 | 根据df的personid进行groupby,统计一下用户消费consume这一列特征的相关聚合情况; 924 | 比如count, max, kurt 925 | 926 | ```python 927 | gr = df.groupby('personid')['consume'] 928 | df_aggr = gr.agg([('_count','count'),('_max',np.max),('_kurt',pd.Series.kurt)]).reset_index() 929 | 930 | # 多个特征聚合统计值拼接 931 | df = df.merge(df_aggr, how='left', on='personid').fillna(0) 932 | ``` 933 | 934 | ### python画图显示中文 935 | 936 | ```python 937 | ## 显示中文解决方法 938 | # 解决方法一 939 | import matplotlib as mpl 940 | mpl.rcParams['font.sans-serif'] = ['SimHei'] 941 | mpl.rcParams['font.serif'] = ['SimHei'] 942 | 943 | # 如果方法一解决不了 944 | import matplotlib.pyplot as plt 945 | plt.rcParams['font.sans-serif'] = ['SimHei'] # 解决中文显示问题-设置字体为黑体 946 | plt.rcParams['axes.unicode_minus'] = False # 解决保存图像是负号'-'显示为方块的问题 947 | 948 | # 如果方法二解决不了 949 | import matplotlib 950 | zhfont = matplotlib.font_manager.FontProperties(fname='../simsun.ttc') 951 | plt.title("职业分布情况",fontproperties=zhfont) 952 | 
plt.xlabel("用户职业",fontproperties=zhfont) 953 | plt.ylabel("逾期用户比例",fontproperties=zhfont) 954 | #或者 955 | import seaborn as sns 956 | p = sns.color_palette() 957 | sns.set_style("darkgrid",{"font.sans-serif":['simhei', 'Arial']}) 958 | fig = plt.figure(figsize=(20, 20)) 959 | ax1 = fig.add_subplot(3, 2, 1) # 总共3行2列6张,这是第一张图 960 | ax1=sns.barplot(职业分布.index, 职业分布.逾期/职业分布.总数, alpha=0.8, color=p[0], label='train') 961 | ax1.legend() 962 | ax1.set_title(u'职业分布情况',fontproperties=zhfont) 963 | ax1.set_xlabel(u'用户职业',fontproperties=zhfont) 964 | ax1.set_ylabel(u'逾期用户比例',fontproperties=zhfont) 965 | 966 | # 杰哥的方法,这个比较好 967 | from pathlib import Path 968 | from matplotlib.font_manager import _rebuild 969 | def chinese_setting(url=None): 970 | """ 971 | :param url: SimHei字体下载链接 972 | :return: 973 | """ 974 | print('开始设置中文...') 975 | matplotlibrc_path = Path(matplotlib.matplotlib_fname()) 976 | ttf_path = matplotlibrc_path.parent.__str__() + '/fonts/ttf' 977 | ttf_url = 'https://raw.githubusercontent.com/Jie-Yuan/Jie-Yuan.github.io/master/SimHei.ttf' if url is None else url 978 | if list(Path(ttf_path).glob('SimHei.ttf')): 979 | pass 980 | else: 981 | print('下载字体...') 982 | os.popen("cd %s && wget %s" % (ttf_path, ttf_url)) 983 | 984 | print('设置字体...') 985 | setting1 = 'font.family: sans-serif' 986 | setting2 = 'font.sans-serif: SimHei, Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif' 987 | setting3 = 'axes.unicode_minus: False' 988 | os.system('echo > %s' % matplotlibrc_path) 989 | os.system('echo %s >> %s' % (setting1, matplotlibrc_path)) 990 | os.system('echo %s >> %s' % (setting2, matplotlibrc_path)) 991 | os.system('echo %s >> %s' % (setting3, matplotlibrc_path)) 992 | _rebuild() 993 | print('请重启kernel测试...') 994 | chinese_setting() 995 | ``` 996 | 997 | 998 | ```bash 999 | # Graphviz 中文乱码 1000 | centos5.x下 1001 | yum install fonts-chinese 1002 | centos6.x或7.x下 1003 | yum install cjkuni-ukai-fonts 1004 | 1005 | 
fc-cache -f -v 刷新字体缓存 1006 | ``` 1007 | 1008 | ### 给字典按value排序 1009 | 1010 | ```python 1011 | model = xgb.train() 1012 | feature_score = model.get_fscore() 1013 | #{'avg_user_date_datereceived_gap': 1207, 1014 | # 'buy_total': 2391, 1015 | # 'buy_use_coupon': 557, 1016 | # 'buy_use_coupon_rate': 1240, 1017 | # 'count_merchant': 1475, 1018 | # 'coupon_rate': 5615, 1019 | # ... 1020 | # } 1021 | ``` 1022 | 1023 | 方法一: 1024 | ```python 1025 | sorted(feature_score.items(), key=lambda x:x[1],reverse=True) 1026 | ``` 1027 | 1028 | 方法二: 1029 | ```python 1030 | df = pd.DataFrame([(key, value) for key,value in feature_score.items()],columns=['key','value']) 1031 | df.sort_values(by='value',ascending=False,inplace=True) 1032 | ``` 1033 | 1034 | ### sorted高级用法 1035 | 1036 | 用法一:
这里,列表里的每一个元素都是二元元组。key 参数传入一个 lambda 表达式,其中 x 代表列表里的每一个元素,通过索引分别取出元组的第一个或第二个元素,也就指定了 sorted() 按哪个元素排序;reverse 参数与上面讲的一样,起到逆序的作用,默认为 False。
1038 | ```python 1039 | l=[('a', 1), ('b', 2), ('c', 6), ('d', 4), ('e', 3)] 1040 | sorted(l, key=lambda x:x[0], reverse=True) 1041 | # Out[40]: [('e', 3), ('d', 4), ('c', 6), ('b', 2), ('a', 1)] 1042 | sorted(l, key=lambda x:x[1], reverse=True) 1043 | # Out[42]: [('c', 6), ('d', 4), ('e', 3), ('b', 2), ('a', 1)] 1044 | ``` 1045 | 1046 | 用法一(衍生):
1054 | ```python 1055 | # 调整数组顺序使奇数位于偶数前面,奇偶相对顺序不变 1056 | # 按照某个键值(即索引)排序,这里相当于对0和1进行排序 1057 | a = [3,2,1,5,8,4,9] 1058 | sorted(a, key=lambda c:c%2, reverse=True) 1059 | # key=a%2得到索引[1,0,1,1,0,0,1] 相当于给a打上索引标签[(1, 3), (0, 2), (1, 1), (1, 5), (0, 8), (0, 4), (1, 9)] 1060 | # 然后根据0和1的索引排序 得到[0,0,0,1,1,1,1]对应的数[2,8,4,3,1,5,9], 1061 | # 最后reverse的时候两块索引整体交换位置[1,1,1,1,0,0,0] 对应的数为[3, 1, 5, 9, 2, 8, 4] 这一系列过程数相对位置不变 1062 | ``` 1063 | 1064 | 用法三:
1065 | 需要注意的是,在python3以后,sort方法和sorted函数中的cmp参数被取消,此时如果还需要使用自定义的比较函数,那么可以使用cmp_to_key函数(在functools中)
1066 | ```python 1067 | from functools import cmp_to_key 1068 | arr = [3,5,6,4,2,8,1] 1069 | def comp(x, y): 1070 | if x < y: 1071 | return 1 1072 | elif x > y: 1073 | return -1 1074 | else: 1075 | return 0 1076 | 1077 | sorted(arr, key=cmp_to_key(comp)) 1078 | # Out[10]: [8,6,5,4,3,2,1] 1079 | ``` 1080 | 1081 | 用法三(衍生):
1082 | 输入一个正整数数组,把数组里所有数字拼接起来排成一个数,打印能拼接出的所有数字中最小的一个。例如输入数组{3,32,321},则打印出这三个数字能排成的最小数字为321323。
1083 | ```python 1084 | # 把数组排成最小的数 1085 | from functools import cmp_to_key 1086 | arr = [3, 32, 321] 1087 | arr = map(str, arr) # or [str(x) for x in arr] 1088 | ll = sorted(arr, key=cmp_to_key(lambda x,y:int(x+y)-int(y+x))) 1089 | print(int(''.join(ll))) 1090 | # Out[3]: 321323 1091 | ``` 1092 | 1093 | ### time用法 1094 | 1095 | ```python 1096 | import time 1097 | s = 'Jun-96' 1098 | time.mktime(time.strptime(s,'%b-%y')) 1099 | # strptime函数是将字符串按照后面的格式转换成时间元组类型;mktime函数则是将时间元组转换成时间戳 1100 | ``` 1101 | 1102 | ### 两层列表展开平铺 1103 | 1104 | 性能最好的两个方法 1105 | 1106 | 1. 方法一 1107 | ```python 1108 | C = [[1,2],[3,4,5],[7]] 1109 | [a for b in C for a in b] 1110 | ``` 1111 | 1112 | 2. 方法二 1113 | ```python 1114 | from itertools import chain 1115 | list(chain(*input)) 1116 | # list(chain.from_iterable(input)) 1117 | ``` 1118 | 1119 | 3. 方法三 1120 | ```python 1121 | import functools 1122 | import operator 1123 | #使用functools內建模块 1124 | def functools_reduce(a): 1125 | return functools.reduce(operator.concat, a) 1126 | ``` 1127 | 1128 | ### 读取百度百科词向量 1129 | 1130 | ```python 1131 | from bz2 import BZ2File as b2f 1132 | import tarfile 1133 | path = 'data/sgns.target.word-ngram.1-2.dynwin5.thr10.neg5.dim300.iter5.bz2' 1134 | fp = b2f(path) 1135 | lines = fp.readlines() 1136 | 1137 | def get_baike_wv(lines): 1138 | d_ = {} 1139 | for line in lines: 1140 | tmp = line.decode('utf-8').split(' ') 1141 | d_[tmp[0]] = [float(x) for x in tmp[1:-1]] 1142 | return d_ 1143 | baike_wv_dict = get_baike_wv(lines) 1144 | ``` 1145 | 1146 | ### logging 1147 | 1148 | ```python 1149 | import logging 1150 | #logger 1151 | def get_logger(): 1152 | FORMAT = '[%(levelname)s]%(asctime)s:%(name)s:%(message)s' 1153 | logging.basicConfig(format=FORMAT) 1154 | logger = logging.getLogger('main') 1155 | logger.setLevel(logging.DEBUG) 1156 | return logger 1157 | 1158 | logger = get_logger() 1159 | 1160 | logger.warning('Input data') 1161 | logger.info('cat treatment') 1162 | ``` 1163 | 1164 | ### argparse用法 1165 | 
1166 | argparse 是在 Python 中处理命令行参数的一种标准方式。 1167 | 1168 | [arg_test.py](arg_test.py) 1169 | ``` 1170 | # 在shell中输入 1171 | python arg_test.py --train_path aa --dev_path bb 1172 | # 打印结果如下 1173 | Namespace(dev_path='bb',log_level='info',train_path='aa') 1174 | aa 1175 | bb 1176 | done. 1177 | ``` 1178 | 1179 | ### 包管理 1180 | 1181 | 一个包里有三个模块,mod1.py, mod2.py, mod3.py,但使用from demopack import *导入模块时,如何保证只有mod1、mod3被导入了。
1182 | 答案:增加`__init__.py`文件,并在文件中增加: 1183 | ```python 1184 | __all__ = ['mod1','mod3'] 1185 | ``` 1186 | 1187 | ### 装饰器 1188 | 1189 | [装饰器参考网址(还可以)](https://blog.csdn.net/qq_41853758/article/details/82853811)
1190 | ```python 1191 | #其中一种举例 装饰带有返回值的函数 1192 | def function(func): #定义了一个闭包 1193 | def func_in(*args,**kwargs): #闭包内的函数,因为装饰器运行的实则是闭包内的函数,所以这里将需要有形参用来接收原函数的参数。 1194 | print('这里是需要装饰的内容,就是需要添加的内容') 1195 | num = func(*args,**kwargs) #调用实参函数,并传入一致的实参,并且用变量来接收原函数的返回值, 1196 | return num #将接受到的返回值再次返回到新的test()函数中。 1197 | return func_in 1198 | @function 1199 | def test(a,b): #定义一个函数 1200 | return a+b #返回实参的和 1201 | print(test(3, 4)) 1202 | # 这里是需要装饰的内容,就是需要添加的内容 1203 | # 7 1204 | ``` 1205 | 1206 | ### 本地用python起http服务 1207 | 1208 | ```shell 1209 | python -m http.server 7777 1210 | ``` 1211 | 1212 | ### cache 1213 | 1214 | [好用的cache包](https://github.com/tkem/cachetools)
1215 | ```python 1216 | from cachetools import cached, LRUCache, TTLCache 1217 | 1218 | # speed up calculating Fibonacci numbers with dynamic programming 1219 | @cached(cache={}) 1220 | def fib(n): 1221 | return n if n < 2 else fib(n - 1) + fib(n - 2) 1222 | 1223 | # cache least recently used Python Enhancement Proposals 1224 | @cached(cache=LRUCache(maxsize=32)) 1225 | def get_pep(num): 1226 | url = 'http://www.python.org/dev/peps/pep-%04d/' % num 1227 | with urllib.request.urlopen(url) as s: 1228 | return s.read() 1229 | 1230 | # cache weather data for no longer than ten minutes 1231 | @cached(cache=TTLCache(maxsize=1024, ttl=600)) 1232 | def get_weather(place): 1233 | return owm.weather_at_place(place).get_weather() 1234 | ``` 1235 | 加在函数之前,主要cache输入和返回的值,下次输入同样的值就会1ms内返回,可以设置cache策略和数据过期时间ttl 1236 | 1237 | ### 创建文件 1238 | 1239 | 如果文件不存在则创建 1240 | ```python 1241 | from pathlib import Path 1242 | Path(OUT_DIR).mkdir(exist_ok=True) 1243 | ``` 1244 | 1245 | ### 字典转成对象 1246 | 1247 | ```python 1248 | class MyDict(dict): 1249 | __setattr__ = dict.__setitem__ 1250 | __getattr__ = dict.__getitem__ 1251 | 1252 | 1253 | def dict_to_object(_d): 1254 | if not isinstance(_d, dict): 1255 | return _d 1256 | inst = MyDict() 1257 | for k, v in _d.items(): 1258 | inst[k] = dict_to_object(v) # 解决嵌套字典问题 1259 | return inst 1260 | ``` 1261 | 1262 | ### boost安装 1263 | 1264 | ```shell 1265 | sudo apt-get install libboost-all-dev 1266 | sudo apt install ocl-icd-opencl-dev 1267 | sudo apt install cmake(可以去https://cmake.org/files下载比如cmake-3.14.0.tar.gz然后执行./bootstrap然后make然后make install) 1268 | ``` 1269 | 1270 | lgb gpu版安装
1271 | ```shell 1272 | pip install --upgrade pip 1273 | pip install lightgbm --install-option=--gpu  # 注:新版pip(>=23.1)已移除--install-option,可尝试 pip install lightgbm --config-settings=cmake.define.USE_GPU=ON 1274 | ``` 1275 | xgb gpu版安装
1276 | ```shell 1277 | git clone --recursive https://github.com/dmlc/xgboost 1278 | cd xgboost 1279 | mkdir build 1280 | cd build 1281 | cmake .. -DUSE_CUDA=ON 1282 | make  # 或者make -j4并行编译,可能会报错 1283 | 1284 | cd .. 1285 | cd python-package 1286 | python setup.py install 1287 | ``` 1288 | 1289 | ### tqdm 1290 | 1291 | [当Pytorch遇上tqdm](https://blog.csdn.net/dreaming_coder/article/details/113486645)
1292 | ```python 1293 | for epoch in range(num_epochs):  # num_epochs为总训练轮数,循环变量不要与之同名 1294 | with tqdm( 1295 | iterable=train_loader, 1296 | bar_format='{desc} {n_fmt:>4s}/{total_fmt:<4s} {percentage:3.0f}%|{bar}| {postfix}', 1297 | ) as t: 1298 | start_time = datetime.now() 1299 | loss_list = [] 1300 | for batch, data in enumerate(train_loader): 1301 | t.set_description_str(f"\33[36m【Epoch {epoch + 1:04d}】") 1302 | # 训练代码 1303 | time.sleep(1) 1304 | # 计算当前损失 1305 | loss = random() 1306 | loss_list.append(loss) 1307 | cur_time = datetime.now() 1308 | delta_time = cur_time - start_time 1309 | t.set_postfix_str(f"train_loss={sum(loss_list) / len(loss_list):.6f}, 执行时长:{delta_time}\33[0m") 1310 | t.update() 1311 | ``` 1312 | 1313 | ### joblib_parallel 1314 | 1315 | 1316 | ```python 1317 | #Parallel for loop 此方法可用于多个文件数据并行读取 1318 | from joblib import Parallel, delayed 1319 | from math import sqrt 1320 | def ff(num): 1321 | return [sqrt(n ** 3) for n in range(num)] 1322 | #不使用并行 7.5s 1323 | res = [] 1324 | for i in range(10,7000): 1325 | res.append(ff(i)) 1326 | #使用并行 2.75s 1327 | res = Parallel(n_jobs = -1, verbose = 1)(delayed(ff)(i) for i in range(10,7000)) 1328 | ``` 1329 | 1330 | ### 调试神器pysnooper 1331 | 1332 | ```python 1333 | #pip install pysnooper 1334 | import os 1335 | os.environ['pysnooper'] = '1' # 开关 1336 | 1337 | from pysnooper import snoop 1338 | #如果为0,则重新定义snoop,这个装饰器就什么都不干 1339 | if os.environ['pysnooper'] == '0': 1340 | import wrapt 1341 | def snoop(*args, **kwargs): 1342 | @wrapt.decorator 1343 | def wrapper(wrapped, instance, args, kwargs): 1344 | return wrapped(*args, **kwargs) 1345 | return wrapper 1346 | ``` 1347 | 1348 | ### 调试神器debugpy 1349 | 1350 | 安装:pip install debugpy -U
1351 | 在python代码里面(最前面)加上下面这段代码
1352 | ```python 1353 | import debugpy 1354 | try: 1355 | # 5678 is the default attach port in the VS Code debug configurations. Unless a host and port are specified, host defaults to 127.0.0.1 1356 | debugpy.listen(("localhost", 9501)) 1357 | print("Waiting for debugger attach") 1358 | debugpy.wait_for_client() 1359 | except Exception as e: 1360 | pass 1361 | 1362 | ``` 1363 | 1364 | 在vscode软件中项目下新建一个.vscode目录,然后创建launch.json,看9501端口那个配置
1365 | ```python 1366 | { 1367 | // 使用 IntelliSense 了解相关属性。 1368 | // 悬停以查看现有属性的描述。 1369 | // 欲了解更多信息,请访问: https://go.microsoft.com/fwlink/?linkid=830387 1370 | "version": "0.2.0", 1371 | "configurations": [ 1372 | { 1373 | "name": "torchr_ex2", 1374 | "type": "python", 1375 | "request": "launch", 1376 | "program": "/Users/zb/anaconda3/envs/rag/bin/torchrun", 1377 | "console": "integratedTerminal", 1378 | "justMyCode": true, 1379 | "args": [ 1380 | "--nnodes", 1381 | "1", 1382 | "--nproc-per-node", 1383 | "2", 1384 | "${file}", 1385 | "--model_name_or_path", 1386 | "my_model_bz" 1387 | ] 1388 | }, 1389 | { 1390 | "name": "sh_file_debug", 1391 | "type": "debugpy", 1392 | "request": "attach", 1393 | "connect": { 1394 | "host": "localhost", 1395 | "port": 9501 1396 | } 1397 | }, 1398 | ] 1399 | } 1400 | ``` 1401 | 1402 | 上面的端口号都写一样比如9501,别搞错了! 1403 | 1404 | ### 分组计算均值并填充 1405 | 1406 | ```python 1407 | def pad_mean_by_group(df, gp_col='stock_id'): 1408 | # 只留下需要处理的列 1409 | cols = [col for col in df.columns if col not in["stock_id", "time_id", "target", "row_id"]] 1410 | # 查询nan的列 1411 | df_na = df[cols].isna() 1412 | # 根据分组计算平均值 1413 | df_mean = df.groupby(gp_col)[cols].mean() 1414 | 1415 | # 依次处理每一列 1416 | for col in cols: 1417 | na_series = df_na[col] 1418 | names = list(df.loc[na_series,gp_col]) 1419 | 1420 | t = df_mean.loc[names,col] 1421 | t.index = df.loc[na_series,col].index 1422 | 1423 | # 相同的index进行赋值 1424 | df.loc[na_series,col] = t 1425 | return df 1426 | train_pca = pad_mean_by_group(train_pca) 1427 | ``` 1428 | 1429 | ### python日期处理 1430 | 1431 | [80个例子,彻底掌握Python日期时间处理](https://mp.weixin.qq.com/s/2bJUZBfWS_8ULGrb9tRpmw)
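除了上面链接里的例子,日常最常用的几个日期操作用标准库datetime就能搞定,这里给个小示例(日期值为随意假设):

```python
from datetime import datetime, timedelta

# 字符串 -> datetime
d = datetime.strptime('2024-08-12 10:30:00', '%Y-%m-%d %H:%M:%S')
# datetime -> 字符串
s = d.strftime('%Y%m%d')
# 日期加减
yesterday = d - timedelta(days=1)
# 两个日期相差的天数
delta = (d - datetime(2024, 8, 1)).days
print(s, yesterday.date(), delta)
```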
1432 | 1433 | ### dataclass 1434 | 1435 | dataclass 提供一个简便的方式创建数据类, 默认实现__init__(), __repr__(), __eq__()方法
1436 | dataclass支持数据类型的嵌套
1437 | 支持将数据设置为不可变:@dataclass(frozen=True)
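上面的嵌套和frozen两个特性可以用下面的小例子说明(示意代码,类名是随意取的):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: int
    y: int

@dataclass
class Line:   # 字段本身也可以是dataclass,即支持嵌套
    start: Point
    end: Point

l = Line(Point(0, 0), Point(1, 1))
print(l)   # Line(start=Point(x=0, y=0), end=Point(x=1, y=1))
try:
    l.start.x = 5   # frozen=True的实例不可修改
except FrozenInstanceError:
    print('Point是不可变的')
```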
1438 | 1439 | 不用dataclass
1440 | 1441 | ```python 1442 | class Person: 1443 | def __init__(self, name, age): 1444 | self.name = name 1445 | self.age = age 1446 | p = Person('test', 18) 1447 | q = Person('test', 18) 1448 | #<__main__.Person at 0x7ff4ade66f40> 1449 | str(p) 1450 | repr(p) 1451 | #'<__main__.Person object at 0x7ff4ade66f40>' 1452 | p == q 1453 | #False 1454 | ``` 1455 | ```python 1456 | from typing import Any 1457 | from dataclasses import dataclass 1458 | @dataclass 1459 | class Person: 1460 | name: Any 1461 | age: Any = 18 1462 | p = Person('test', 18) 1463 | q = Person('test', 18) 1464 | #Person(name='test', age=18) 1465 | str(p) 1466 | repr(p) 1467 | #"Person(name='test', age=18)" 1468 | p == q 1469 | #True 1470 | ``` 1471 | 1472 | ### md5_sha256 1473 | 1474 | ```python 1475 | import hashlib 1476 | 1477 | def enc(s, ed='md5'): 1478 | if ed == 'md5': 1479 | hash_object = hashlib.md5(s.encode()) 1480 | elif ed == 'sha256': 1481 | hash_object = hashlib.sha256(s.encode()) 1482 | else: 1483 | raise ValueError('unsupported type!') 1484 | hash_hex = hash_object.hexdigest() 1485 | return hash_hex 1486 | 1487 | for i in ['13730973320','13802198853','17619520726']: 1488 | print(enc(i,'md5')) 1489 | ``` 1490 | 1491 | ### 查看内存 1492 | 1493 | 有几种方法可以在Python中获取对象的大小。可以使用sys.getsizeof()来获取对象的确切大小,使用objgraph.show_refs()来可视化对象的结构,或者使用psutil.Process().memory_info().rss来获取当前进程实际占用的全部内存(RSS)。 1494 | 1495 | ```python 1496 | >>> import numpy as np 1497 | >>> import sys 1498 | >>> import objgraph 1499 | >>> import psutil 1500 | >>> import pandas as pd 1501 | 1502 | >>> ob = np.ones((1024, 1024, 1024, 3), dtype=np.uint8) 1503 | 1504 | ### Check object 'ob' size 1505 | >>> sys.getsizeof(ob) / (1024 * 1024) 1506 | 3072.0001373291016 1507 | 1508 | ### Check current memory usage of whole process (include ob and installed packages, ...)
1509 | >>> psutil.Process().memory_info().rss / (1024 * 1024) 1510 | 3234.19140625 1511 | 1512 | ### Check structure of 'ob' (Useful for class object) 1513 | >>> objgraph.show_refs([ob], filename='sample-graph.png') 1514 | 1515 | ### Check memory for pandas.DataFrame 1516 | >>> from sklearn.datasets import load_boston 1517 | >>> data = load_boston() 1518 | >>> data = pd.DataFrame(data['data']) 1519 | >>> print(data.info(verbose=False, memory_usage='deep')) 1520 | 1521 | RangeIndex: 506 entries, 0 to 505 1522 | Columns: 13 entries, 0 to 12 1523 | dtypes: float64(13) 1524 | memory usage: 51.5 KB 1525 | 1526 | ### Check memory for pandas.Series 1527 | >>> data[0].memory_usage(deep=True) # deep=True to include all the memory used by underlying parts that construct the pd.Series 1528 | 4176 1529 | ``` 1530 | 1531 | ### slots用法 1532 | 1533 | ```python 1534 | #不使用__slots__时,可以很容易地添加一个额外的job属性 1535 | class Author: 1536 | def __init__(self, name, age): 1537 | self.name = name 1538 | self.age = age 1539 | 1540 | me = Author('Yang Zhou', 30) 1541 | me.job = 'Software Engineer' 1542 | print(me.job) 1543 | # Software Engineer 1544 | 1545 | # 在大多数情况下,我们不需要在运行时更改实例的变量或方法,并且__dict__不会(也不应该)在类定义后更改。所以Python为此提供了一个属性:__slots__ 1546 | class Author: 1547 | __slots__ = ('name', 'age') 1548 | 1549 | def __init__(self, name, age): 1550 | self.name = name 1551 | self.age = age 1552 | 1553 | me = Author('Yang Zhou', 30) 1554 | me.job = 'Software Engineer' 1555 | print(me.job) 1556 | # AttributeError: 'Author' object has no attribute 'job' 1557 | ``` 1558 | 1559 | 1560 | 1561 |
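补充一点:__slots__的另一个好处是实例不再携带__dict__,属性直接存在固定的slot里,大量小对象时能省不少内存。可以这样验证(示意代码,具体内存数值因Python版本而异):

```python
import sys

class PlainAuthor:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class SlotAuthor:
    __slots__ = ('name', 'age')
    def __init__(self, name, age):
        self.name = name
        self.age = age

p = PlainAuthor('Yang Zhou', 30)
s = SlotAuthor('Yang Zhou', 30)
print(hasattr(p, '__dict__'))     # True,普通实例带一个__dict__
print(hasattr(s, '__dict__'))     # False,__slots__实例没有__dict__
print(sys.getsizeof(p.__dict__))  # 这个__dict__本身就要额外占内存
```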
-------------------------------------------------------------------------------- /01_basic/arg_test.py: -------------------------------------------------------------------------------- 1 | __author__ = 'binzhou' 2 | __time__ = '20190116' 3 | 4 | import argparse 5 | import logging 6 | 7 | parser = argparse.ArgumentParser() 8 | parser.add_argument('--train_path', action='store', dest='train_path', 9 | help='Path to train data') 10 | parser.add_argument('--dev_path', action='store', dest='dev_path', 11 | help='Path to dev data') 12 | parser.add_argument('--log-level', dest='log_level', 13 | default='info', 14 | help='Logging level.') 15 | 16 | opt = parser.parse_args() 17 | print(opt) 18 | 19 | print('----------------------------------------') 20 | 21 | LOG_FORMAT = '%(asctime)s %(name)-12s %(levelname)-8s %(message)s' 22 | logging.basicConfig(format=LOG_FORMAT, level=getattr(logging, opt.log_level.upper())) 23 | # logging.info(opt) 24 | 25 | if opt.train_path is not None: 26 | print(opt.train_path) 27 | if opt.dev_path is not None: 28 | print(opt.dev_path) 29 | 30 | print('done.') 31 | 32 | -------------------------------------------------------------------------------- /02_numpy/README.md: -------------------------------------------------------------------------------- 1 | ## 目录 2 | 3 | [**1. numpy类型**](#numpy类型) 4 | 5 | [**2. np.where用法**](#np_where用法) 6 | 7 | [**3. 数组方法**](#数组方法) 8 | 9 | [**4. copy&deep copy**](#deep_copy) 10 | 11 | [**5. flatten和ravel的区别**](#flatten_ravel) 12 | 13 | [**6. pandas一列数组转numpy二维数组**](#pandas一列数组转numpy二维数组) 14 | 15 | [**7. numpy array改dtype方法**](#numpy_array改dtype方法) 16 | 17 | [**8. 
numpy存取数据**](#numpy存取数据) 18 | 19 | --- 20 | 21 | ### numpy类型 22 | 23 | 具体如下: 24 | 25 | |基本类型|可用的**Numpy**类型|备注 26 | |--|--|-- 27 | |布尔型|`bool`|占1个字节 28 | |整型|`int8, int16, int32, int64, int128, int`| `int` 跟**C**语言中的 `long` 一样大 29 | |无符号整型|`uint8, uint16, uint32, uint64, uint128, uint`| `uint` 跟**C**语言中的 `unsigned long` 一样大 30 | |浮点数| `float16, float32, float64, float, longfloat`|默认为双精度 `float64` ,`longfloat` 精度大小与系统有关 31 | |复数| `complex64, complex128, complex, longcomplex`| 默认为 `complex128` ,即实部虚部都为双精度 32 | |字符串| `string, unicode` | 可以使用 `dtype=S4` 表示一个4字节字符串的数组 33 | |对象| `object` |数组中可以使用任意值| 34 | |Records| `void` || 35 | |时间| `datetime64, timedelta64` || 36 | 37 | 38 | ### np_where用法 39 | ```python 40 | arr = array([ 0.31593257, 0.33837679, 0.38240686, 0.38970056, 0.54940456]) 41 | pd.Series(np.where(arr > 0.5, 1, 0), name='result').to_csv(path_save, index=False, header=True) 42 | ``` 43 | 这句话的意思是arr中值大于0.5赋为1,小于等于0.5赋为0,然后把这个series的名字命名为result,最后保存成csv文件去掉index,保留列名 44 | 45 | ### 数组方法 46 | ```python 47 | a = array([[1,2,3], 48 | [4,5,6]]) 49 | 50 | # 求和, 沿第一维求和,沿第二维求和 下面函数都类似 51 | sum(a) 21 52 | a.sum() 21 53 | sum(a,axis=0) array([5, 7, 9]) 54 | sum(a, axis=1) array([ 6, 15]) 55 | # 求积 56 | a.prod() 720 57 | prod(a) 720 58 | # 求最大最小及最大最小值的位置 59 | a.min() 1 60 | a.max() 6 61 | a.argmin() 0 62 | a.argmax() 5 63 | # 求均值、标准差 64 | a.mean() 65 | a.std() 66 | # clip方法,将数值限制在某个范围 67 | a.clip(3,5) array([[3,3,3],[4,5,5]]) 68 | # ptp方法,计算最大值和最小值之差 69 | a.ptp() 5 70 | 71 | # 生成二维随机矩阵并且保留3位小数 72 | from numpy.random import rand 73 | a = rand(3,4) 74 | %precision 3 #这个修饰可以运用在整个IDE上 75 | ``` 76 | 77 | ### deep_copy 78 | ```python 79 | a = np.array([1,2,3]) 80 | b1 = a #简单说, b1的东西全部都是a的东西, 动b1的任何地方, a都会被动到, 因为他们在内存中的位置是一模一样的, 本质上就是自己 81 | b2 = a.copy() # deep copy;则是将a copy了一份, 然后把b2放在内存中的另外的地方 82 | ``` 83 | 84 | ### flatten_ravel 85 | flatten是deep copy,而ravel是view 86 | ```python 87 | a = np.arange(8).reshape(2,4) 88 | b1 = a.flatten() 89 | b2 = a.ravel() 90 | # 
这里如果改变b1的值,a不会改变;而改变b2的值,a会改变! 91 | ``` 92 | 93 | ### pandas一列数组转numpy二维数组 94 | ```python 95 | print(df) 96 | # nn 97 | # 0 [1,2,3] 98 | # 1 [4,5,6] 99 | # 2 [7,8,9] 100 | ## 想把df.nn这一列转成二维数组 101 | np.array([df.nn])[0] 102 | # array([[1, 2, 3], 103 | # [4, 5, 6], 104 | # [7, 8, 9]], dtype=object) 105 | ## 注意这里转换完以后array的dtype是object,需根据情况转成相应的int,float等类型 106 | ## 转换方法看numpy_array改dtype方法 107 | ``` 108 | 109 | ### numpy_array改dtype方法 110 | ```python 111 | ###用astype方法 112 | print(arr) 113 | # array([[1, 2, 3], 114 | # [4, 5, 6], 115 | # [7, 8, 9]], dtype=object) 116 | arr.astype(np.float64)  # 注:np.float在numpy 1.24后已移除,用float或np.float64 117 | ``` 118 | 119 | ### numpy存取数据 120 | 121 | ```python 122 | np.save('xxx.npy',data) 123 | np.load('xxx.npy') 124 | ``` -------------------------------------------------------------------------------- /03_pandas/README.md: -------------------------------------------------------------------------------- 1 | ## 目录 2 | 3 | [pandas进阶修炼300题](https://www.heywhale.com/mw/project/6146c0318447b8001769ff20)
4 | 5 | [可以替代pandas比较好用的数据平行处理包](#数据平行处理) 6 | 7 | [**1. pandas并行包**](#pandas并行包) 8 | 9 | [**2. pandas dataframe手动创建**](#pandas_dataframe手动创建) 10 | 11 | [**3. pandas dataframe中apply用法**](#pandas_dataframe中apply用法) 12 | 13 | [**4. pandas dataframe中map用法**](#pandas_dataframe中map用法) 14 | 15 | [**5. groupby用法**](#groupby用法) 16 | 17 | [**6. explode用法**](#explode用法) 18 | 19 | [**7. sort用法**](#sort用法) 20 | 21 | [**8. left join用法**](#left_join用法) 22 | 23 | [**9. reset_index用法**](#reset_index用法) 24 | 25 | [**10. pandas to_csv字段和值加引号操作**](#to_csv字段和值加引号操作) 26 | 27 | [**11. pd concat、merge、join来合并数据表**](#合并数据表) 28 | 29 | [**12. 数据透视表(Pivot Tables)**](#数据透视表) 30 | 31 | [**13. shuffle**](#shuffle) 32 | 33 | [**14. dataframe交换列的顺序**](#dataframe交换列的顺序) 34 | 35 | [**15. dataframe设置两个条件取值**](#dataframe设置两个条件取值) 36 | 37 | [**16. dataframe用h5格式保存**](#dataframe用h5格式保存) 38 | 39 | [**17. assign用法**](#assign用法) 40 | 41 | [**18. 用一列的非空值填充另一列对应行的空值**](#用一列的非空值填充另一列对应行的空值) 42 | 43 | [**19. dataframe修改值**](#dataframe修改值) 44 | 45 | [**20. dataframe表格填充**](#dataframe表格填充) 46 | 47 | [**21. 加快dataframe读取**](#加快dataframe读取) 48 | 49 | [**22. df热力图**](#df热力图) 50 | 51 | [**23. df热力地图**](#df热力地图) 52 | 53 | [**24. 2个pandas EDA插件**](#eda插件) 54 | 55 | [**25. python批量插入mysql数据库**](#python批量插入mysql数据库) 56 | 57 | --- 58 | 59 | ### 数据平行处理 60 | 61 | [polar]
62 | https://pola-rs.github.io/polars-book/user-guide/quickstart/intro.html
63 | https://pola-rs.github.io/polars/py-polars/html/reference
64 | 65 | [pandarallel](https://nalepae.github.io/pandarallel/)
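pandarallel的基本用法大致如下(示意代码,需要先pip install pandarallel,数据和函数都是随便举的;没装的环境会退回普通apply):

```python
import math
import pandas as pd

try:
    from pandarallel import pandarallel
    pandarallel.initialize(progress_bar=False)  # 初始化worker,默认用满CPU核
    apply_name = 'parallel_apply'
except ImportError:
    apply_name = 'apply'  # 没装pandarallel时退回串行apply,结果一致

df = pd.DataFrame({'a': range(100)})
# 装好并initialize之后,把df.apply换成df.parallel_apply即可并行
res = getattr(df, apply_name)(lambda row: math.sqrt(row.a), axis=1)
```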
66 | 67 | 68 | ### pandas_dataframe手动创建 69 | 70 | 手动创建dataframe 71 | ```python 72 | arr = np.array([['John','Lily','Ben'],[11,23,56]]) 73 | df = pd.DataFrame(arr.transpose(),columns=['name','age']) 74 | ``` 75 | 76 | 77 | ### pandas_dataframe中apply用法 78 | 79 | 现在想看一下地址中含有-和,的数据有哪些可以进行如下操作: 80 | ```python 81 | df[df.address.apply(lambda x: ('-' in list(x)) and (',' in list(x)))] 82 | ``` 83 | 84 | > 可以看basic中apply函数的用法 85 | 86 | ### pandas_dataframe中map用法 87 | 88 | ```python 89 | df["season"] = df.season.map({1: "Spring", 2 : "Summer", 3 : "Fall", 4 :"Winter" }) 90 | # 把数字映射成string 91 | ``` 92 | 93 | ### groupby用法 94 | 95 | [**用法举例一**] 96 | ```python 97 | gr = df.groupby(by='EID') 98 | gr.agg({'BTBL':'max','BTYEAR':'count'}).reset_index() # 常见的max, min, count, mean, first, nunique 99 | ``` 100 | ||EID|BTBL|BTYEAR 101 | |--|--|--|-- 102 | |0|4|0.620|2011 103 | |1|38|0.700|2013 104 | |2|51|0.147|2002 105 | 106 | 这里对df根据EID进行groupby,然后根据字段BTBL, BTYEAR两个字段进行聚合,然后reset_index 107 | 108 | [**用法举例二**] 109 | 110 | ||EID|ALTERNO|ALTDATE|ALTBE|ALTAF 111 | |--|--|--|--|--|-- 112 | |1|399|05|2014-01|10|50 113 | |2|399|12|2015-05|NaN|NaN 114 | |3|399|12|2013-12|NaN|NaN 115 | |4|399|27|2014-01|10|50 116 | |5|399|99|2014-01|NaN|NaN 117 | 118 | groupby EID然后想要统计一些唯一的月份有几个   119 | ```python 120 | # 方法一 121 | def f(ll): 122 | fun = lambda x : x.split('-')[1] 123 | return len(set(map(fun,list(ll)))) 124 | # lambda套lambda写法 125 | f = lambda ll : len(set(map(lambda x : x.split('-')[1],list(ll)))) 126 | 127 | p = pd.merge(data0, data2.groupby('EID').agg({'ALTERNO':'nunique','ALTDATE':f}).reset_index().rename(columns={'ALTERNO':'alt_count','ALTDATE':'altdate_nunique'}), how='left',on='EID') 128 | 129 | #方法二 130 | data2['year'] = data2.ALTDATE.apply(lambda x : x.split('-')[0]) 131 | data2['month'] = data2.ALTDATE.apply(lambda x : x.split('-')[1]) 132 | data2.groupby('EID').agg({'month':'nunique'}).reset_index().rename(columns={'month':'month_nunique'}) 133 | ``` 134 | 135 | ### explode用法 
136 | 137 | **1. 比如有个dataframe的结构如下** 138 | 139 | ||city|community|longitude|latitude|address 140 | |--|--|--|--|--|-- 141 | |1|上海|东方庭院|121.044|31.1332|复兴路88弄,珠安路77弄,浦祥路377弄 142 | 143 | 执行如下语句: 144 | ```python 145 | data.drop('address',axis=1).join(data['address'].str.split(',',expand=True).stack().reset_index(level=1,drop=True).rename('address')) 146 | 147 | # spark中的explode用法 148 | spark_df = spark_df.select(spark_df['city'],spark_df['community_org'],spark_df['community'],\ 149 | spark_df['longitude'],spark_df['latitude'],(explode(split('address',','))).alias('address'),spark_df['villagekey']) 150 | ``` 151 | ||city|community|longitude|latitude|address 152 | |--|--|--|--|--|-- 153 | |1|上海|东方庭院|121.044|31.1332|复兴路88弄| 154 | |2|上海|东方庭院|121.044|31.1332|珠安路77弄| 155 | |3|上海|东方庭院|121.044|31.1332|浦祥路377弄| 156 | 157 | **2. pandas0.25版本以上有explode的函数**
158 | 159 | ||col_a|col_b 160 | |--|--|-- 161 | |0|10|[111, 222] 162 | |1|11|[333, 444] 163 | 164 | ```python 165 | df.explode('col_b') #得到如下表 166 | ``` 167 | 168 | ||col_a|col_b 169 | |--|--|-- 170 | |0|10|111 171 | |0|10|222 172 | |1|11|333 173 | |1|11|444 174 | 175 | **3. 一列json的数据变多列**
176 | ```python 177 | df = pd.DataFrame([[10,'0000003723','{"aa":"001","bb":"002","channel":"c1"}'],\ 178 | [14,'0000003723','{"aa":"001","bb":"002","xxx":"c1"}'],\ 179 | [11,'0092837434','{"aa":"003","bb":"004","cc":"010","channelDetails":{"channel":"c2"}}']],columns=['_idx','userno','detail']) 180 | # 分两步走 181 | # 第一步,单独处理json列 182 | def ff(row): 183 | row = json.loads(row) 184 | if not ('channel' in row or 'channelDetails' in row): 185 | return [] 186 | res = [row['aa'],row['bb'],row['channel']] if 'channel' in row else [row['aa'],row['bb'],row['channelDetails']['channel']] 187 | return res 188 | df['new_col'] = df.detail.apply(ff) 189 | # 第二步,,拼接 190 | df = df[df.new_col.map(lambda x : x!=[])].drop('detail',axis=1).reset_index(drop=True) 191 | df = pd.concat([df[['_idx','userno']], pd.DataFrame(df.new_col.tolist(),columns=['a','b','c'])],axis=1) 192 | ``` 193 | 194 | 195 | ### sort用法 196 | 197 | 注:df.sort()已经deprecated,以后可用df.sort_values() 198 | ```python 199 | data3.sort_values(['EID','B_REYEAR'],ascending=True) #默认是升序排,先根据EID然后再根据B_REYEAR进行排序 200 | ``` 201 | 202 | ### left_join用法 203 | ```python 204 | data.merge(data1, how='left', on='id_code') 205 | ``` 206 | 207 | ### reset_index用法 208 | ```python 209 | data.reset_index(drop=True) 210 | ``` 211 | 212 | ### to_csv字段和值加引号操作 213 | to_csv中的参数quoting: int or csv.QUOTE_* instance, default 0 214 | 控制csv中的引号常量。 215 | 可选 QUOTE_MINIMAL(0), QUOTE_ALL(1), QUOTE_NONNUMERIC(2) OR QUOTE_NONE(3) 216 | 217 | ### 合并数据表 218 | 如果你熟悉SQL,这几个概念对你来说就是小菜一碟。不过不管怎样,这几个函数从本质上来说不过就是合并多个数据表的不同方式而已。当然,要时刻记着什么情况下该用哪个函数也不是一件容易的事,所以,让我们一起再回顾一下吧。
219 |
220 | concat()可以把一个或多个数据表按行(或列)的方向简单堆叠起来(看你传入的axis参数是0还是1咯)。 221 | ```python 222 | import pandas as pd 223 | df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d']) 224 | df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d']) 225 | df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d']) 226 | pd.concat([df1, df2, df3], axis=0, ignore_index=True) 227 | pd.concat([df1, df2, df3], axis=1, ignore_index=True) 228 | ``` 229 | ![concat.png](concat.png) 230 | 231 | merge()将会以用户指定的某个名字相同的列为主键进行对齐,把两个或多个数据表融合到一起。
\ 232 | ![merge.png](merge.png) 233 | 234 | join()和merge()很相似,只不过join()是按数据表的索引进行对齐,而不是按某一个相同的列。当某个表缺少某个索引的时候,对应的值为空(NaN)
235 | ![join.png](join.png) 236 | 237 | ### 数据透视表 238 | 最后也最重要的是数据透视表。如果你对微软的Excel有一定了解的话,你大概也用过(或听过)Excel里的“数据透视表”功能。Pandas里内建的pivot_table()函数的功能也差不多,它能帮你对一个数据表进行格式化,并输出一个像Excel工作表一样的表格。实际使用中,透视表将根据一个或多个键对数据进行分组统计,将函数传入参数aggfunc中,数据将会按你指定的函数进行统计,并将结果分配到表格中。
239 | ```python 240 | from pandas import pivot_table 241 | >>> df 242 | A B C D 243 | 0 foo one small 1 244 | 1 foo one large 2 245 | 2 foo one large 2 246 | 3 foo two small 3 247 | 4 foo two small 3 248 | 5 bar one large 4 249 | 6 bar one small 5 250 | 7 bar two small 6 251 | 8 bar two large 7 252 | 253 | >>> table = pivot_table(df, values='D', index=['A', 'B'], 254 | ... columns=['C'], aggfunc=np.sum) 255 | >>> table 256 | small large 257 | foo one 1 4 258 | two 6 NaN 259 | bar one 5 4 260 | two 6 7 261 | ``` 262 | 263 | ### shuffle 264 | ```python 265 | # 方法一 266 | from sklearn.utils import shuffle 267 | df = shuffle(df) 268 | # 方法二 269 | df.sample(frac=1).reset_index(drop=True) 270 | ``` 271 | 272 | ### dataframe交换列的顺序 273 | ```python 274 | reorder_col = ['label','doc','query'] 275 | df = df.loc[:, reorder_col] 276 | ``` 277 | 278 | ### dataframe设置两个条件取值 279 | 280 | ```python 281 | df[(df.Store == 1) & (df.Dept == 1)] 282 | ``` 283 | 284 | ### dataframe用h5格式保存 285 | 286 | ```python 287 | # 普通格式存储 288 | h5 = pd.HDFStore('data/data1_2212.h5','w') 289 | h5['data'] = data1 290 | h5.close() 291 | # 压缩格式存储 292 | h5 = pd.HDFStore('data/data1_2212.h5','w', complevel=4, complib='blosc') 293 | h5['data'] = data1 294 | h5.close() 295 | # 读取h5文件 296 | data=pd.read_hdf('data/data1_2212.h5',key='data') 297 | ``` 298 | 299 | ### assign用法 300 | 301 | assign相当于给df增加一列,返回新的df copy
302 | Assign new columns to a DataFrame, returning a new object 303 | (a copy) with the new columns added to the original ones. 304 | 305 | ```python 306 | def iv_xy(x, y): 307 | # good bad func 308 | def goodbad(df): 309 | names = {'good': (df['y']==0).sum(),'bad': (df['y']==1).sum()} 310 | return pd.Series(names) 311 | # iv calculation 312 | iv_total = pd.DataFrame({'x':x.astype('str'),'y':y}) \ 313 | .fillna('missing') \ 314 | .groupby('x') \ 315 | .apply(goodbad) \ 316 | .replace(0, 0.9) \ 317 | .assign( 318 | DistrBad = lambda x: x.bad/sum(x.bad), 319 | DistrGood = lambda x: x.good/sum(x.good) 320 | ) \ 321 | .assign(iv = lambda x: (x.DistrBad-x.DistrGood)*np.log(x.DistrBad/x.DistrGood)) \ 322 | .iv.sum() # iv核心公式,最后iv.sum()对每个group加总求和,即为该特征的iv值 323 | # return iv 324 | return iv_total 325 | ``` 326 | 327 | ### 用一列的非空值填充另一列对应行的空值 328 | 329 | ```python 330 | df.loc[df['new_subject'].isnull(),'new_subject']=df[df['new_subject'].isnull()]['subject'] 331 | ``` 332 | 333 | ### dataframe修改值 334 | 335 | ```python 336 | df.loc[df.A < 4,'A'] = [100,120,140] 337 | # or 338 | df.loc[df.content_id=='x6mbO2rHfU3hTej4','sentiment_tmp'] = 1 339 | ``` 340 | 341 | ### dataframe表格填充 342 | 343 | ```python 344 | df.fillna(method='ffill', axis=1).fillna(method='ffill') 345 | ``` 346 | 347 | ### 加快dataframe读取 348 | 349 | 方式一:cpu多线程读取(推荐)
350 | ```python 351 | #安装datatable==0.11.1 352 | import datatable as dtable 353 | train = dtable.fread(path+'train.csv').to_pandas() 354 | ``` 355 | 356 | 方式二:gpu读取
357 | ```python 358 | #安装cudf(稍微有点麻烦) 359 | import cudf 360 | train = cudf.read_csv(path+'train.csv').to_pandas() 361 | ``` 362 | 363 | ### df热力图 364 | 365 | ```python 366 | df.corr().style.background_gradient(cmap='coolwarm').set_precision(2)  # 注:pandas 2.0后set_precision已移除,可改用.format(precision=2) 367 | ``` 368 | 369 | ### df热力地图 370 | 371 | 结合pyecharts将各省市高校上榜数量进行地图可视化
372 | ```python 373 | from pyecharts import options as opts 374 | from pyecharts.charts import Map 375 | #省份 376 | list1 = ['北京','江苏','上海','广东','湖北','陕西','浙江','四川','湖南','山东','安徽','辽宁','重庆','福建','天津','吉林','河南','黑龙江','江西','甘肃','云南','河北'] 377 | #省份对应的高校数量 378 | list2 = [18, 15, 10, 9, 7, 7, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1] 379 | c = ( 380 | Map() 381 | .add('', [list(z) for z in zip(list1,list2)], "china",is_map_symbol_show=False) 382 | .set_global_opts( 383 | title_opts=opts.TitleOpts(title="排名前100高校各省市占比"), 384 | visualmap_opts=opts.VisualMapOpts(max_=20), 385 | 386 | 387 | ) 388 | ) 389 | c.render_notebook() 390 | ``` 391 | 392 | ### eda插件 393 | 394 | ```python 395 | #插件一 396 | #!pip install pandas_profiling 397 | import pandas_profiling 398 | pandas_profiling.ProfileReport(df) 399 | #插件二 400 | import sweetviz as sv 401 | report = sv.analyze(df) 402 | report.show_html() 403 | ``` 404 | 405 | ### python批量插入mysql数据库 406 | 407 | ```python 408 | df.to_numpy()[:5].tolist() 409 | ''' 410 | [['25_B', 25, 'B', 0.6, '2024-08-12'], 411 | ['23_C', 23, 'C', 2.2, '2024-08-12'], 412 | ['24_D', 24, 'D', 3.8, '2024-08-12'], 413 | ['29_E', 29, 'E', 1.5, '2024-08-12'], 414 | ['22_F', 22, 'F', 4.1, '2024-08-12']] 415 | ''' 416 | 417 | import pymysql 418 | MYSQL_W_CONFIG = {'host':'10.xx.xxx.xx', 419 | 'port':3306, 420 | 'user':'user', 421 | 'password':'passwd', 422 | 'database':'mydatabase', 423 | 'charset':'utf8'} 424 | conn = pymysql.connect(autocommit=True, **MYSQL_W_CONFIG) 425 | cursor = conn.cursor() 426 | sql = "insert into xx_table(id,cust_id,agcode,score,s_time) values(%s,%s,%s,%s,%s)" 427 | cursor.executemany(sql, df.to_numpy().tolist()) 428 | conn.commit() 429 | conn.close() 430 | #1w条数据批量插入大概0.45s左右 431 | ```
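补充:如果一次要插入的行数非常多,可以把executemany按批提交,避免单条SQL过大(示意代码,batch大小随意假设):

```python
def insert_in_batches(cursor, sql, rows, batch_size=5000):
    # 把rows按batch_size分批执行executemany,兼顾速度和内存
    for i in range(0, len(rows), batch_size):
        cursor.executemany(sql, rows[i:i + batch_size])

# 用一个假cursor演示分批效果(真实场景传pymysql的cursor即可)
class FakeCursor:
    def __init__(self):
        self.batch_sizes = []
    def executemany(self, sql, rows):
        self.batch_sizes.append(len(rows))

c = FakeCursor()
insert_in_batches(c, 'insert into xx_table values(%s)', [[i] for i in range(12000)])
print(c.batch_sizes)   # [5000, 5000, 2000]
```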
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/03_pandas/concat.png -------------------------------------------------------------------------------- /03_pandas/join.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/03_pandas/join.png -------------------------------------------------------------------------------- /03_pandas/merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/03_pandas/merge.png -------------------------------------------------------------------------------- /04_sklearn/README.md: -------------------------------------------------------------------------------- 1 | ## 目录 2 | 3 | [**0. 特征工程相关查阅feature_engineering仓库**](https://github.com/binzhouchn/feature_engineering) 4 | 5 | [**1. 将数据集进行train, test分割**](#将数据集进行train_test分割) 6 | 7 | [**2. 对数据集进行随机抽样**](#对数据集进行随机抽样) 8 | 9 | - [抽样方法一](#抽样方法一) 10 | - [抽样方法二](#抽样方法二) 11 | - [抽样方法三](#抽样方法三) 12 | 13 | [**3. 对结果进行评判,混淆矩阵**](#对结果进行评判用混淆矩阵) 14 | 15 | [**4. 
模型效果评价accuracy, logloss, precision, recall, ks等**](#模型效果评价) 16 | 17 | --- 18 | 19 | ### 将数据集进行train_test分割 20 | ```python 21 | # 训练测试样本集 stratify可以指定分割是否需要分层,分层的话正负样本在分割后还是保持一致, 输入的label 22 | from sklearn.cross_validation import train_test_split 23 | def train_test_sep(X, test_size = 0.25, stratify = None, random_state = 1001): 24 | train, test = train_test_split(X, test_size = test_size, stratify = stratify, random_state = random_state) 25 | return train, test 26 | ``` 27 | ### 对数据集进行随机抽样 28 | 29 | #### 抽样方法一 30 | 31 | ```python 32 | from sklearn.model_selection import train_test_split 33 | #cat是在df中的某一个属性列 34 | X_train, X_test = train_test_split(df, test_size=0.3, stratify=df.cat) 35 | ``` 36 | 37 | #### 抽样方法二 38 | ```python 39 | df.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None) 40 | 41 | - n是要抽取的行数。(例如n=20000时,抽取其中的2w行) 42 | - frac是抽取的比列。(有一些时候,我们并对具体抽取的行数不关系,我们想抽取其中的百分比,这个时候就可以选择使用frac,例如frac=0.8,就是抽取其中80%) 43 | - replace抽样后的数据是否代替原DataFrame() 44 | - weights这个是每个样本的权重,具体可以看官方文档说明。 45 | - random_state这个在之前的文章已经介绍过了。 46 | - axis是选择抽取数据的行还是列。axis=0的时是抽取行,axis=1时是抽取列(也就是说axis=1时,在列中随机抽取n列,在axis=0时,在行中随机抽取n行) 47 | ``` 48 | #### 抽样方法三 49 | ```python 50 | import random 51 | random_num_test = random.sample(np.arange(0,len(df)),200) 52 | random_num_train = list(set(xrange(len(df)))^set(random_num_test)) 53 | test = df.iloc[random_num_test] 54 | train = df.iloc[random_num_train] 55 | ``` 56 | 57 | ### 对结果进行评判用混淆矩阵 58 | ```python 59 | from sklearn.metrics import confusion_matrix 60 | confusion_matrix(y_true, y_pred) 61 | ``` 62 | 63 | ### 模型效果评价 64 | ```python 65 | # accuracy 准确率是针对y_true和y_pred都是类别的比如0和1 66 | from sklearn.metrics import accuracy_score 67 | accuracy_score(y_true, y_pred) 68 | ``` 69 | ```python 70 | # log_loss 又叫交叉熵,y_true是类别比如0和1,y_pred是属于类别1的概率值 71 | from sklearn.metrics import log_loss 72 | logloss = log_loss(y_true, y_pred, eps=1e-15) 73 | ``` 74 | ```python 75 | # recall precision 76 | from sklearn.metrics import 
confusion_matrix 77 | confusion_matrix(y_true, y_pred) 78 | from sklearn.metrics import precision_score 79 | from sklearn.metrics import recall_score 80 | from sklearn.metrics import f1_score 81 | ``` 82 | ```python 83 | # KS 84 | ''' 85 | 其实从某个角度上来讲ROC曲线和KS曲线是一回事,只是横纵坐标的取法不同而已。 86 | 拿逻辑回归举例,模型训练完成之后每个样本都会得到一个类概率值(注意是属于类1的概率值), 87 | 把样本按这个类概率值排序后分成10等份,每一份单独计算它的真正率和假正率, 88 | 然后计算累计概率值,用真正率和假正率的累计做为坐标画出来的就是ROC曲线; 89 | 用10等分做为横坐标,用真正率和假正率的累计值分别做为纵坐标就得到两个曲线,这就是KS曲线 90 | ''' 91 | from sklearn import metrics 92 | def ks(y_pred, y_true): 93 | label=y_true 94 | fpr,tpr,thres = metrics.roc_curve(label,y_pred,pos_label=1) 95 | return 'ks',abs(fpr - tpr).max() 96 | ``` 97 | 98 | -------------------------------------------------------------------------------- /05_OOP/README.md: -------------------------------------------------------------------------------- 1 | ## 目录 2 | 3 | [**1. 继承**](#继承) 4 | 5 | [**2. super函数使用基础**](#super函数使用基础) 6 | 7 | [**3. super函数使用 以LR为例**](#super函数使用_以lr为例) 8 | 9 | [**4. 装饰器@**](#装饰器) 10 | 11 | [**5. 装饰器@property**](#装饰器property) 12 | 13 | [**6. python单例模式**](#单例模式) 14 | 15 | [**7. python deprecated warning**](#deprecationwarning) 16 | 17 | [**8. 定制类**](#定制类) 18 | 19 | [**9. 网络编程**](#网络编程) 20 | 21 | [**各种模式待补充 看《设计模式之禅》(第2版)**](#设计模式) 22 | 23 | --- 24 | ### 继承 25 | 26 | ```python 27 | class FooParent(object): 28 | def __init__(self): 29 | self.parent = 'I\'m the parent.' 
30 | print ('Parent') 31 | 32 | def bar(self, message): 33 | print ("{} from Parent".format(message)) 34 | 35 | class FooChild(FooParent): 36 | def __init__(self): 37 | super(FooChild,self).__init__() 38 | print ('Child') 39 | 40 | def bar(self, message): 41 | super(FooChild, self).bar(message) 42 | print ('Child bar function') 43 | print (self.parent) 44 | ``` 45 | 46 | ### super函数使用基础 47 | 48 | 实际上,大家对于在Python中如何正确使用super()函数普遍知之甚少。你有时候会看到像下面这样直接调用父类的一个方法: 49 | ```python 50 | class Base: 51 | def __init__(self): 52 | print('Base.__init__') 53 | class A(Base): 54 | def __init__(self): 55 | Base.__init__(self) 56 | print('A.__init__') 57 | ``` 58 | 尽管对于大部分代码而言这么做没什么问题,但是在更复杂的涉及到多继承的代码中就有可能导致很奇怪的问题发生。比如,考虑如下的情况: 59 | ```python 60 | class Base: 61 | def __init__(self): 62 | print('Base.__init__') 63 | class A(Base): 64 | def __init__(self): 65 | Base.__init__(self) 66 | print('A.__init__') 67 | class B(Base): 68 | def __init__(self): 69 | Base.__init__(self) 70 | print('B.__init__') 71 | class C(A,B): 72 | def __init__(self): 73 | A.__init__(self) 74 | B.__init__(self) 75 | print('C.__init__') 76 | ``` 77 | 如果你运行这段代码就会发现`Base.__init__()`被调用两次,如下所示: 78 | ```python 79 | >>> c = C() 80 | Base.__init__ 81 | A.__init__ 82 | Base.__init__ 83 | B.__init__ 84 | C.__init__ 85 | >>> 86 | ``` 87 | 可能两次调用`Base.__init__()`没什么坏处,但有时候却不是。另一方面,假设你在代码中换成使用super(),结果就很完美了: 88 | ```python 89 | class Base: 90 | def __init__(self): 91 | print('Base.__init__') 92 | class A(Base): 93 | def __init__(self): 94 | super().__init__() 95 | print('A.__init__') 96 | class B(Base): 97 | def __init__(self): 98 | super().__init__() 99 | print('B.__init__') 100 | class C(A,B): 101 | def __init__(self): 102 | super().__init__() # Only one call to super() here 103 | print('C.__init__') 104 | ``` 105 | 运行这个新版本后,你会发现每个`__init__()`方法只会被调用一次了: 106 | ```python 107 | >>> c = C() 108 | Base.__init__ 109 | B.__init__ 110 | A.__init__ 111 | C.__init__ 112 | >>> 113 | ``` 114 | 记一下这个: 115 | ```python 116 | class Base(object): 117 
| def __init__(self,a=1,b=11): 118 | self.a = a 119 | self.b = b 120 | # 绑定(推荐) 121 | class B(Base): 122 | def __init__(self, a, b, c): 123 | super().__init__(a, b) # super(B, self).__init__(a, b) 124 | self.c = c 125 | # 未绑定 126 | class C(Base): 127 | def __init__(self, a, b, c): 128 | Base.__init__(self, a=a, b=1000) 129 | ``` 130 | ```python 131 | B(1,2,3).a, B(1,2,3).b, B(1,2,3).c 132 | C(1,2,3).a, C(1,2,3).b, C(1,2,3).c 133 | 134 | (1, 2, 3) 135 | (1, 1000, 3) 136 | ``` 137 | ``` 138 |   1. super并不是一个函数,是一个类名,形如super(B, self)事实上调用了super类的初始化函数, 139 | 产生了一个super对象; 140 |   2. super类的初始化函数并没有做什么特殊的操作,只是简单记录了类类型和具体实例; 141 |   3. super(B, self).func的调用并不是用于调用当前类的父类的func函数; 142 |   4. Python的多继承类是通过mro的方式来保证各个父类的函数被逐一调用,而且保证每个父类函数 143 | 只调用一次(如果每个类都使用super); 144 |   5. 混用super类和非绑定的函数是一个危险行为,这可能导致应该调用的父类函数没有调用或者一 145 | 个父类函数被调用多次。 146 | ``` 147 | 148 | ### super函数使用_以LR为例 149 | 150 | ```python 151 | from sklearn.linear_model import LogisticRegression 152 | 153 | class LR(LogisticRegression): 154 | 155 | def __init__(self, threshold=0.01, dual=False, tol=1e-4, C=1.0, 156 | fit_intercept=True, intercept_scaling=1, class_weight=None, 157 | random_state=None, solver='liblinear', max_iter=100, 158 | multi_class='ovr', verbose=0, warm_start=False, n_jobs=1): 159 | #权值相近的阈值 160 | self.threshold = threshold 161 | super(LR, self).__init__(penalty='l1', dual=dual, tol=tol, C=C, 162 | fit_intercept=fit_intercept, 163 | intercept_scaling=intercept_scaling, 164 | class_weight=class_weight, 165 | random_state=random_state, 166 | solver=solver, max_iter=max_iter, 167 | multi_class=multi_class, 168 | verbose=verbose, 169 | warm_start=warm_start, 170 | n_jobs=n_jobs) 171 | #使用同样的参数创建L2逻辑回归 172 | self.l2 = LogisticRegression(penalty='l2', dual=dual, tol=tol, C=C, fit_intercept=fit_intercept, intercept_scaling=intercept_scaling, 173 | class_weight = class_weight, random_state=random_state, 174 | solver=solver, 175 | max_iter=max_iter, 176 | multi_class=multi_class, 177 | 
verbose=verbose, 178 | warm_start=warm_start, 179 | n_jobs=n_jobs) 180 | 181 | def fit(self, X, y, sample_weight=None): 182 | #训练L1逻辑回归 183 | super(LR, self).fit(X, y, sample_weight=sample_weight) # 这个不需要实例化就直接用父类的方法,父类在之前已经被初始化了penalty = 'l1'那个。 184 | self.coef_old_ = self.coef_.copy() # 继承了父类的,所以可以直接用self.coef_ 185 | #训练L2逻辑回归 186 | self.l2.fit(X, y, sample_weight=sample_weight) 187 | print(self.coef_) 188 | print(self.l2.coef_) 189 | ``` 190 | 191 | ### 装饰器 192 | 193 | _装饰器详解可参照[basic文档](https://github.com/binzhouchn/python_notes/blob/master/00.basic/README.md#装饰器)_ 194 | 195 | a. @classmethod: 不需要self参数,但第一个参数需要是表示自身类的cls参数 196 | ``` 197 | @classmethod意味着:当调用此方法时,我们将该类作为第一个参数传递,而不是该类的实例(正如我们通常使用的方法)。 198 | 这意味着您可以使用该方法中的类及其属性,而不是特定的实例 199 | ``` 200 | b. @staticmethod: 不需要表示自身对象的self和自身类的cls参数,就跟使用函数一样 201 | ``` 202 | @staticmethod意味着:当调用此方法时,我们不会将类的实例传递给它(正如我们通常使用的方法)。 203 | 这意味着你可以在一个类中放置一个函数,但是你无法访问该类的实例(当你的方法不使用实例时这很实用) 204 | ``` 205 | ```python 206 | class A(object): 207 | bar = 1 208 | def foo(self): 209 | print('foo') 210 | 211 | @staticmethod 212 | def static_foo(): 213 | print('static_foo') 214 | print(A.bar) 215 | 216 | @classmethod 217 | def class_foo(cls): 218 | print('class_foo') 219 | print(cls.bar) 220 | cls().foo() 221 | 222 | A.static_foo() 223 | A.class_foo() 224 | ``` 225 | ```python 226 | static_foo 227 | 1 228 | class_foo 229 | 1 230 | foo 231 | ``` 232 | 233 | 装饰器相当于一个高阶函数,传入函数,返回函数,返回的时候这个函数多了一些功能[(原文链接)](https://mp.weixin.qq.com/s/hsa-kYvL31c1pEtMpkr6bA) 234 | ```python 235 | # 无参数的装饰器 236 | import logging 237 | def use_logging(func): 238 | def wrapper(): 239 | logging.warning("%s is running" % func.__name__) 240 | return func() 241 | return wrapper 242 | 243 | @use_logging 244 | def foo(): 245 | print("i am foo") 246 | 247 | foo() 248 | 249 | #---------------------------------------------------------- 250 | # 带参数的装饰器 251 | def use_logging(level): 252 | def decorator(func): 253 | def wrapper(*args, **kwargs): 254 | if level == "warn": 255 | logging.warning("%s is 
running" % func.__name__) 256 | elif level == "info": 257 | logging.info("%s is running" % func.__name__) 258 | return func(*args, **kwargs) 259 | return wrapper 260 | 261 | return decorator 262 | 263 | @use_logging(level="warn") # 可以传参数进装饰器 264 | def foo(name, age=None, height=None): 265 | print("I am %s, age %s, height %s" % (name, age, height)) 266 | 267 | foo('John', 9) # 日志输出 WARNING:root:foo is running,然后打印 I am John, age 9, height None 268 | 269 | #--------------------------------------------------- 270 | # 类装饰器 271 | class Foo(object): 272 | def __init__(self, func): 273 | self._func = func 274 | 275 | def __call__(self): 276 | print ('class decorator running') 277 | self._func() 278 | print ('class decorator ending') 279 | 280 | @Foo 281 | def bar(): 282 | print ('test bar') 283 | 284 | bar() 285 | 输出 286 | class decorator running 287 | test bar 288 | class decorator ending 289 | ``` 290 | 291 | ### 装饰器property 292 | 293 | 把一个getter方法变成属性,只需要加上@property就可以了,此时,@property本身又创建了另一个装饰器@birth.setter, 294 | 负责把一个setter方法变成属性赋值,于是,我们就拥有一个可控的属性操作
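property 的 setter 里通常还会顺便做参数校验,这里先补一个最小示意(非原文示例,Score 类与 0~100 的取值范围均为演示假设):

```python
class Score(object):
    """用@property把score做成一个带校验的属性(仅演示)"""

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        # 校验规则仅为演示假设:必须是0~100的整数
        if not isinstance(value, int):
            raise ValueError('score must be an integer')
        if not 0 <= value <= 100:
            raise ValueError('score must between 0 ~ 100')
        self._score = value

s = Score()
s.score = 60    # 赋值时触发setter里的校验
print(s.score)  # 60
```

这样对外暴露的仍是普通属性的读写方式,但非法赋值会直接抛 ValueError。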
295 | ```python 296 | class Student(object): 297 | 298 | @property 299 | def birth(self): 300 | return self._birth 301 | 302 | @birth.setter 303 | def birth(self, value): 304 | self._birth = value 305 | 306 | @property 307 | def age(self): 308 | return 2015 - self._birth 309 | 310 | a = Student() 311 | a.birth = 22 # 这里birth.setter装饰器相当于把之前的birth方法变成了属性 312 | print(a.birth) 313 | print(a.age) 314 | ``` 315 | 316 | ### 单例模式 317 | 318 | ```python 319 | # 使用__new__方法 320 | #写法一 321 | class Singleton(object): 322 | def __new__(cls, *args, **kw): 323 | if not hasattr(cls, '_instance'): 324 | orig = super(Singleton, cls) 325 | cls._instance = orig.__new__(cls) # 注意:py3中object.__new__不接受额外参数 326 | return cls._instance 327 | #写法二 328 | class Singleton(object): 329 | __instance=None 330 | def __init__(self): 331 | pass 332 | def __new__(cls, *args, **kwargs): 333 | if not cls.__instance: 334 | cls.__instance=super(Singleton, cls).__new__(cls) 335 | return cls.__instance 336 | ``` 337 | ```python 338 | # 这个class是自己定义的class可以继承singleton实现单例模式 339 | # MyClass实例只会创建一次(注意__init__每次调用仍会执行) 340 | class MyClass(Singleton): 341 | def __init__(self): 342 | print('ok') 343 | def kk(self): 344 | print('effwfwsefwefwef') 345 | ``` 346 | > 写一个装饰器@singleton也行 347 | ```python 348 | def singleton(cls, *args, **kw): 349 | instance={} 350 | def _singleton(): 351 | if cls not in instance: 352 | instance[cls]=cls(*args, **kw) 353 | return instance[cls] 354 | return _singleton 355 | 356 | @singleton 357 | class A: 358 | def __init__(self): 359 | pass 360 | def test(self,num): 361 | return num*2 362 | ``` 363 | 364 | 365 | ### DeprecationWarning 366 | 367 | ```python 368 | import warnings 369 | import functools 370 | 371 | def deprecated(func): 372 | """This is a decorator which can be used to mark functions 373 | as deprecated. 
It will result in a warning being emitted 374 | when the function is used.""" 375 | @functools.wraps(func) 376 | def new_func(*args, **kwargs): 377 | warnings.simplefilter('always', DeprecationWarning) # turn off filter 378 | warnings.warn("Call to deprecated function {}.".format(func.__name__), 379 | category=DeprecationWarning, 380 | stacklevel=2) 381 | warnings.simplefilter('default', DeprecationWarning) # reset filter 382 | return func(*args, **kwargs) 383 | return new_func 384 | 385 | @deprecated 386 | def some_old_function(x, y): 387 | return x + y 388 | ``` 389 | 390 | ### 定制类 391 | 392 | [廖雪峰 定制类](https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014319098638265527beb24f7840aa97de564ccc7f20f6000) 393 | 394 | ### 网络编程 395 | 396 | [网络编程网址](https://blog.csdn.net/qq_41853758/article/details/82853811)
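网络编程这一节除了上面的链接,再补一个最小的TCP echo示例帮助理解socket的基本用法(示意代码,端口号50007为随意假设):

```python
import socket
import threading
import time

def echo_server(host='127.0.0.1', port=50007):
    # 最简单的TCP服务端:accept一个连接,收到什么原样返回
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, port))
        s.listen(1)
        conn, addr = s.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data)

# 后台线程起服务端,主线程当客户端
t = threading.Thread(target=echo_server, daemon=True)
t.start()
time.sleep(0.5)  # 等服务端bind/listen完成

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect(('127.0.0.1', 50007))
    c.sendall(b'hello')
    reply = c.recv(1024)
print(reply)  # b'hello'
```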
397 | 398 | -------------------------------------------------------------------------------- /06_flask_sanic/README.md: -------------------------------------------------------------------------------- 1 | [**1. flask**](#flask) 2 | 3 | [**2. sanic**](#sanic) 4 | 5 | [**3. python调用flask post方法**](#python调用flask_post方法) 6 | 7 | # flask 8 | 9 | ## GET方法 10 | ```python 11 | # GET方法 12 | # -*- coding: utf-8 -*- 13 | from flask import Flask, jsonify 14 | app = Flask(__name__) 15 | app.config['JSON_AS_ASCII'] = False # 防止中文乱码 16 | tasks = [ 17 | { 18 | 'id': 1, 19 | 'title': u'Buy groceries', 20 | 'description': u'Milk, Cheese, Pizza, Fruit, Tylenol', 21 | 'done': False 22 | }, 23 | { 24 | 'id': 2, 25 | 'title': u'Learn Python', 26 | 'description': u'Need to find a good Python tutorial on the web', 27 | 'done': False 28 | } 29 | ] 30 | @app.route('/todo/api/v1.0/tasks', methods=['GET']) 31 | def get_tasks(): 32 | return jsonify({'tasks': tasks}) 33 | if __name__ == '__main__': 34 | app.run(host='127.0.0.1', 35 | port=5000) 36 | ``` 37 | 38 | ## POST方法 39 | ```python 40 | # -*- coding: utf-8 -*- 41 | from flask import Flask 42 | from flask import request 43 | from flask import make_response,Response 44 | from flask import jsonify 45 | # 抛出flask异常类 46 | class CustomFlaskErr(Exception): 47 | # 自己定义了一个 responseCode,作为粗粒度的错误代码 48 | def __init__(self, responseCode=None): 49 | Exception.__init__(self) 50 | self.responseCode = responseCode 51 | self.J_MSG = {9704: '参数不合法!',9999:'系统内部异常!'} 52 | # 构造要返回的错误代码和错误信息的 dict 53 | def get_dict(self): 54 | rv = dict() 55 | # 增加 dict key: response code 56 | rv['responseCode'] = self.responseCode 57 | # 增加 dict key: responseMsg, 具体内容由常量定义文件中通过 responseCode 转化而来 58 | rv['responseMsg'] = self.J_MSG[self.responseCode] 59 | return rv 60 | def get_chatterbot_result(xx): 61 | pass # 这里应该return标准问题及ID,标准答案及ID,相似问及ID等之类的东西 62 | 63 | app = Flask(__name__) 64 | app.config['JSON_AS_ASCII'] = False 65 | 66 | @app.route('/') 67 | def get_simple_test(): 
68 | return 'BINZHOU TEST' 69 | 70 | @app.route('/req_message', methods=['POST']) 71 | def req_message(): 72 | if request.method == 'POST': 73 | sid = request.form.get('sid') 74 | q = request.form.get('q') 75 | uid = request.form.get('uid','default_uid') # 可为空 76 | businessId = request.form.get('businessId') 77 | messageId = request.form.get('messageId','default_messageId') # 可为空 78 | source = request.form.get('source','default_source') # 可为空 79 | requestTime = request.form.get('requestTime') 80 | requestId = request.form.get('requestId') 81 | if not (sid and q and businessId and requestTime and requestId): 82 | raise CustomFlaskErr(responseCode=9704) 83 | # 经过我们自己定义的模块和chatterbot返回答案以及我们想要的一些东西等 84 | # bot_answer包含标准问题及ID,标准答案及ID,相似问及ID等之类的东西,需要解析一下然后给result 85 | bot_answer = get_chatterbot_result(q) 86 | # 根据机器人得到的结果将整个返回报文进行组装 87 | result = { 88 | 'sid':sid, 89 | 'q':q, 90 | 'uid':uid, 91 | 'businessId':businessId, 92 | 'messageId':messageId, 93 | 'type':0, 94 | 'source':source, 95 | 'sqId':12345, 96 | 'stQuestion':'绍兴在哪里?', 97 | 'sm':[ 98 | {'smid':12346,'smQuestion':'绍兴?哪里?'}, 99 | {'smid':12347,'smQuestion':'绍兴是哪里的啊啊'}], 100 | 'answer':{ 101 | "aid":"12345", 102 | "answare_text":"绍兴在浙江", 103 | "atype":1 104 | }, 105 | 'responseCode':'0000', 106 | 'responseMsg':'返回成功', 107 | 'responseTime':'20180601' 108 | } 109 | return jsonify(result) 110 | 111 | @app.errorhandler(CustomFlaskErr) 112 | def handle_flask_error(error): 113 | # response的json内容为自定义错误代码和错误信息 114 | response = jsonify(error.get_dict()) 115 | response.responseCode = error.responseCode 116 | return response 117 | 118 | if __name__ == '__main__': 119 | app.run(host='127.0.0.1', 120 | port=5000) 121 | ``` 122 | 123 | ## 解决flask跨域问题 124 | 125 | ```python 126 | # pip install flask-cors 127 | from flask_cors import CORS 128 | app = Flask(__name__,) 129 | # r'/*' 是通配符,让本服务器所有的URL 都允许跨域请求 130 | CORS(app, resources=r'/*') 131 | ``` 132 | 133 | --- 134 | 135 | # sanic 136 | ```python 137 | # sanic get和post方法 
138 | # 使用了异步特性,而且还使用uvloop作为事件循环,其底层使用的是libuv,从而使 Sanic的速度优势更加明显。 139 | import os 140 | from sanic import Sanic, response 141 | from sanic.response import html, json, redirect, text, raw, file, file_stream 142 | 143 | app = Sanic() 144 | 145 | @app.route('/get') 146 | async def get_test(request): 147 | title = request.args.get('title') 148 | return response.json([{'model_name': title}]) 149 | 150 | if __name__ == '__main__': 151 | # app.run() 152 | app.run(host='127.0.0.1', port=8000) 153 | ``` 154 | # python调用flask_post方法 155 | 156 | 方法一:python requests
157 | ```python 158 | # json(request.json.get) 159 | import requests 160 | payload = {'id': 1223, 'text': '我是中国人'} # 注意不要用json作变量名,会遮住json模块 161 | r = requests.post('http://127.0.0.1:5000/req_message', json=payload) 162 | r.json() 163 | # values(request.values.get) 164 | import requests 165 | r = requests.post('http://127.0.0.1:5000/req_message', data=[('id',1223),('text', '我是中国人')]) 166 | ``` 167 | 168 | 方法二:postman工具
169 | 点击右上方,点击Code->选Python Requests->复制代码即可 170 | 171 | 方法三:终端访问
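curl调用上文flask POST接口的写法大致如下(示意:真实场景应先启动上文的flask服务,这里为了能直接跑通,用python自带的http.server临时顶替;各字段取值均为假设):

```shell
# 先在本地起一个测试用的HTTP服务(真实场景换成上文的flask应用,去掉这两行即可)
python3 -m http.server 5000 >/dev/null 2>&1 &
srv=$!; sleep 1

# form表单方式(对应 request.form.get)
curl -s -X POST "http://127.0.0.1:5000/req_message" \
     -d "sid=001" -d "q=绍兴在哪里?" \
     -d "businessId=b001" -d "requestTime=20180601" -d "requestId=r001"

# json方式(对应 request.json.get),注意要带Content-Type头
curl -s -X POST "http://127.0.0.1:5000/req_message" \
     -H "Content-Type: application/json" \
     -d '{"id": 1223, "text": "我是中国人"}'

kill $srv
```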
172 | curl, 待补充.. 173 | 174 | -------------------------------------------------------------------------------- /07_database/README.md: -------------------------------------------------------------------------------- 1 | # 目录 2 | 3 | ## 1. Mysql/Hive(docker version) 4 | ``` 5 | # 先下载镜像 6 | docker pull mysql:5.5 7 | # 运行容器 可以先把-v去掉 8 | docker run -p 3306:3306 --name mymysql -v $PWD/conf:/etc/mysql/conf.d -v $PWD/logs:/logs -v $PWD/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.5 9 | 10 | -p 3306:3306:将容器的 3306 端口映射到主机的 3306 端口。 11 | -v $PWD/conf:/etc/mysql/conf.d:将主机当前目录下的 conf/my.cnf 挂载到容器的 /etc/mysql/my.cnf。 12 | -v $PWD/logs:/logs:将主机当前目录下的 logs 目录挂载到容器的 /logs。 13 | -v $PWD/data:/var/lib/mysql :将主机当前目录下的data目录挂载到容器的 /var/lib/mysql 。 14 | -e MYSQL_ROOT_PASSWORD=123456:初始化 root 用户的密码。 15 | ``` 16 | ```python 17 | # 用三方工具Navicat或者python连接,先建好db比如test_db 18 | import pymysql 19 | # 打开数据库连接 20 | db = pymysql.connect(host="localhost", user="root", password="123456", database="test_db") 21 | # 使用 cursor() 方法创建一个游标对象 cursor 22 | cursor = db.cursor() 23 | # 用参数化查询,避免手工拼SQL带来的注入风险 24 | sql = "INSERT INTO tt(a, b, date) VALUES (%s, %s, %s)" 25 | data = (306, '插入6', '20190615') 26 | cursor.execute(sql, data) 27 | db.commit() 28 | ``` 29 | ``` 30 | # 起了mysql服务以后,再用docker python去插入数据 31 | # 需要先查看docker mysql的容器ip地址,命令看2.8 32 | # 然后localhost改成mysql容器的ip地址即可,其他一样 33 | ``` 34 | 35 | ## 2. 
Redis(docker version) 35 | 36 | ![redis](imgs/redis_pic.png) 37 | ``` 38 | # 启动redis命令 39 | docker run --name docker-redis-test -p 6379:6379 -d redis:latest --requirepass "123456" 40 | # redis客户端连接命令(注意要带上容器名) 41 | docker exec -it docker-redis-test redis-cli 42 | # 进去以后的操作 43 | auth 123456 44 | set name zhangsan 45 | get name 46 | quit 47 | ``` 48 | 49 | ```python 50 | # python连接docker起的redis服务 51 | import redis 52 | # 连接redis 53 | redis_conf = redis.Redis(host='xx.xx.xx.xx', port=6379, password='123456') 54 | # 查看redis中的keys 55 | redis_conf.keys() 56 | # 插入name,value 57 | redis_conf.set(name='name',value='John') 58 | # 插入name,key,value(注意hash的name不要与上面的字符串key重名,否则会报WRONGTYPE) 59 | redis_conf.hset(name='hash0',key='k1',value='John') 60 | # 批量插入name,key,value 61 | redis_conf.hmset('hash1',{'k1':'v1','k2':'v2'}) 62 | # 批量get name 63 | redis_conf.hgetall('hash1') 64 | ``` 65 | ``` 66 | # redis可视化工具RDM(已安装) 67 | ``` 68 | 69 | ## 3. pymongo(docker version) 70 | 71 | 1. 把[腾讯词向量](https://ai.tencent.com/ailab/nlp/embedding.html)存入mongodb中,需先[安装mongodb](https://blog.csdn.net/weixin_29026283/article/details/82252941)
72 | 2. mongodb搭建后创建用户名密码
73 | ```shell 74 | 启动mongo: mongod -f mongodb.conf 75 | 关闭mongo: mongod -f mongodb.conf --shutdown 76 | # 命令行登录mongodb: mongo 77 | # 添加用户名密码 78 | use admin 79 | db.createUser({user: "root", pwd: "xxxxxx", roles:["root"]}) 80 | ``` 81 | 82 | ``` 83 | # 启动mongodb命令 84 | docker run -p 27017:27017 -v $PWD/mongo_db:/data/mongo_db -d mongo:4.0.10 85 | # 连接到mongo镜像cli 86 | docker run -it mongo:4.0.10 mongo --host <容器ip> 87 | 88 | # 建database建collection比如runoob然后插入数据 89 | db.runoob.insert({"title": 'MongoDB 教程', 90 | "description": 'MongoDB 是一个 Nosql 数据库', 91 | "by": 'w3cschool', 92 | "url": 'http://www.w3cschool.cn', 93 | "tags": ['mongodb', 'database', 'NoSQL'], 94 | "likes": 100}) 95 | db.runoob.find() 96 | ``` 97 | ```python 98 | import pymongo 99 | # 连接 100 | client = pymongo.MongoClient(host='xx.xx.xx.xx', port=27017, username='root', password='xxxxxx') 101 | # client = pymongo.MongoClient('mongodb://root:xxxxxx@localhost:27017/') 102 | # 读取数据库(如果没有的话自动创建) 103 | db = client.tencent_wv 104 | # 读取集合(如果没有的话自动创建) 105 | my_set = db.test_set 106 | # 删除集合 test_set 107 | db.drop_collection('test_set') 108 | # 插入数据和查询数据 109 | my_set.insert_one({"name":"zhangsan","age":18,'shuze':[3,4,2,6,7,10]}) 110 | my_set.find_one({"name":"zhangsan"}) 111 | ``` 112 | ```python 113 | # 以插入腾讯词向量为例 114 | from tqdm import tqdm 115 | # 定义一个迭代器 116 | def __reader(): 117 | with open("/opt/common_files/Tencent_AILab_ChineseEmbedding.txt",encoding='utf-8',errors='ignore') as f: 118 | for idx, line in tqdm(enumerate(f), 'Loading ...'): 119 | ws = line.strip().split(' ') 120 | if idx: 121 | vec = [float(i) for i in ws[1:]] 122 | if len(vec) != 200: 123 | continue 124 | yield {'word': ws[0], 'vector': vec} 125 | rd = __reader() 126 | for doc in rd: 127 | my_set.insert_one(doc) 128 | ``` 129 | 130 | ## 4. 
ElasticSearch(docker version) 131 | ``` 132 | # Run Elasticsearch 133 | docker run -d --name elasticsearch_for_test -p 9200:9200 -e "discovery.type=single-node" elasticsearch:6.6.0 134 | # 安装elasticsearch-head 135 | ``` 136 | ```python 137 | # 用python连接,并进行增删改查 138 | from elasticsearch import Elasticsearch 139 | from elasticsearch import helpers 140 | # es = Elasticsearch(hosts="localhost:9200", http_auth=('username','passwd')) 141 | esclient = Elasticsearch(['localhost:9200']) 142 | # 高效插入ES 143 | action1 = { 144 | "_index": "idx111", 145 | "_type": "test", 146 | # "_id": , 147 | "_source": { 148 | 'ServerIp': '0.1.1.1', 149 | 'SpiderType': 'toxic', 150 | 'Level': 4 151 | } 152 | } 153 | action2 = { 154 | "_index": "idx111", 155 | "_type": "pre", 156 | # "_id": 1, 157 | "_source": { 158 | 'ServerIp': '0.1.1.2', 159 | 'SpiderType': 'non-toxic', 160 | 'Level': 1 161 | } 162 | } 163 | actions = [action1, action2] 164 | helpers.bulk(esclient, actions) 165 | 166 | #--------------------------------------------------- 167 | # 创建schema然后单条插入数据 168 | # 类似创建schema 169 | answer_index = 'baidu_answer' 170 | answer_type = 'doc22' 171 | esclient.indices.create(answer_index) 172 | answer_mapping = { 173 | "doc22": { 174 | "properties": { 175 | "id": { 176 | "type": "integer", 177 | # "index": True 178 | }, 179 | "schoolID":{ 180 | "type":"text" 181 | }, 182 | "schoolName":{ 183 | "type": "text", 184 | "analyzer": "ik_max_word" # 这个需要安装,先run docker6.6.0然后docker exec -it /bin/bash下载解压ik后exit然后restart这个container即可,之后可以新生成一个image 185 | # "analyzer":"whitespace" 186 | }, 187 | "calNum":{ 188 | "type":"float" 189 | } 190 | } 191 | } 192 | } 193 | esclient.indices.put_mapping(index=answer_index, doc_type=answer_type, body=answer_mapping) 194 | # 创建完schema以后导入数据 195 | doc = {'id': 7, 'schoolID': '007', 'schoolName': '春晖外国语学校', 'calNum':6.20190624} 196 | esclient.index(index=answer_index ,doc_type=answer_type ,body=doc, id=doc['id']) 197 | esclient.index(index=answer_index 
,doc_type=answer_type ,body=doc, id=10) 198 | #---------------------------------------------------- 199 | 200 | # 删除单条数据 201 | # esclient.delete(index='indexName', doc_type='typeName', id='idValue') 202 | esclient.delete(index='pre', doc_type='imagetable2', id=1) 203 | # 删除索引 204 | esclient.indices.delete(answer_index) 205 | 206 | # 更新 207 | # esclient.update(index='indexName', doc_type='typeName', id='idValue', body={_type:{待更新字段}}) 208 | new_doc = {'id': 7, 'schoolID': '007', 'schoolName': '更新名字1'} 209 | esclient.update(index=answer_index, id=7, doc_type=answer_type, body={'doc': new_doc}) # 注意body中一定要加_type doc,更新的body中不一定要加入所有字段,只要把要更新的几个字段加入即可 210 | 211 | # 查询 212 | ### 根据id查找数据 213 | res = esclient.get(index=answer_index, doc_type=answer_type, id=7) 214 | ### 根据id列表查找数据 215 | body = {'ids': id_lst} # id_lst=[3,6,120,9] 216 | res = esclient.mget(index=index, doc_type=doc_type, body=body) 217 | ### match:在schoolName中包含关键词的都会被搜索出来(这里的分词工具是ik) 218 | # res = esclient.search(index=answer_index,body={'query':{'match':{'schoolName':'春晖外'}}}) 219 | res = esclient.search(index=answer_index,body={'query':{'match':{'schoolName':'春晖学校'}}}) 220 | ### 根据restful api进行查询 221 | # [GET] http://localhost:9200/knowledge_qv/question_vec/20 222 | ``` 223 | 224 | [ES查询大于10000条数据方法](https://blog.csdn.net/xsdxs/article/details/72876703) 225 | 226 | ## 5. neo4j图数据库(docker version) 227 | ``` 228 | # docker启动neo4j服务 229 | docker run \ 230 | --publish=7474:7474 --publish=7687:7687 \ 231 | --volume=$PWD/neo4j/data:/data \ 232 | -d neo4j:latest 233 | 234 | # 然后登录网页可视化界面 235 | 236 | # 或使用Cypher shell(注意要带上容器名) 237 | docker exec --interactive --tty <neo4j容器名> bin/cypher-shell 238 | # 退出:exit 239 | ``` 240 | 241 | ## 6. Stardog RDF数据库 242 | 243 | [stardog官方文档](https://www.stardog.com/docs/)
244 | [RDF入门](https://blog.csdn.net/txlCandy/article/details/50959358)
245 | [OWL语言](https://blog.csdn.net/zycxnanwang/article/details/86557350)
246 | -------------------------------------------------------------------------------- /07_database/es.md: -------------------------------------------------------------------------------- 1 | # es存入768维度向量,以及向量查询(ES版本需要7.3之后) 2 | 3 | https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/semantic-search/semantic_search_quora_elasticsearch.py 4 | 5 | 6 | ```python 7 | """ 8 | This script contains an example how to perform semantic search with ElasticSearch. 9 | 10 | As dataset, we use the Quora Duplicate Questions dataset, which contains about 500k questions: 11 | https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs 12 | 13 | Questions are indexed to ElasticSearch together with their respective sentence 14 | embeddings. 15 | 16 | The script shows results from BM25 as well as from semantic search with 17 | cosine similarity. 18 | 19 | You need ElasticSearch (https://www.elastic.co/de/elasticsearch/) up and running. Further, you need the Python 20 | ElasticSearch Client installed: https://elasticsearch-py.readthedocs.io/en/master/ 21 | 22 | As embeddings model, we use the SBERT model 'quora-distilbert-multilingual', 23 | that it aligned for 100 languages. I.e., you can type in a question in various languages and it will 24 | return the closest questions in the corpus (questions in the corpus are mainly in English). 
25 | """ 26 | 27 | from sentence_transformers import SentenceTransformer, util 28 | import os 29 | from elasticsearch import Elasticsearch, helpers 30 | import csv 31 | import time 32 | import tqdm.autonotebook 33 | 34 | 35 | 36 | es = Elasticsearch() 37 | 38 | model = SentenceTransformer('quora-distilbert-multilingual') 39 | 40 | url = "http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv" 41 | dataset_path = "quora_duplicate_questions.tsv" 42 | max_corpus_size = 100000 43 | 44 | #Download dataset if needed 45 | if not os.path.exists(dataset_path): 46 | print("Download dataset") 47 | util.http_get(url, dataset_path) 48 | 49 | #Get all unique sentences from the file 50 | all_questions = {} 51 | with open(dataset_path, encoding='utf8') as fIn: 52 | reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_MINIMAL) 53 | for row in reader: 54 | all_questions[row['qid1']] = row['question1'] 55 | if len(all_questions) >= max_corpus_size: 56 | break 57 | 58 | all_questions[row['qid2']] = row['question2'] 59 | if len(all_questions) >= max_corpus_size: 60 | break 61 | 62 | qids = list(all_questions.keys()) 63 | questions = [all_questions[qid] for qid in qids] 64 | 65 | #Index data, if the index does not exists 66 | if not es.indices.exists(index="quora"): 67 | try: 68 | es_index = { 69 | "mappings": { 70 | "properties": { 71 | "question": { 72 | "type": "text" 73 | }, 74 | "question_vector": { 75 | "type": "dense_vector", 76 | "dims": 768 77 | } 78 | } 79 | } 80 | } 81 | 82 | es.indices.create(index='quora', body=es_index, ignore=[400]) 83 | chunk_size = 500 84 | print("Index data (you can stop it by pressing Ctrl+C once):") 85 | with tqdm.tqdm(total=len(qids)) as pbar: 86 | for start_idx in range(0, len(qids), chunk_size): 87 | end_idx = start_idx+chunk_size 88 | 89 | embeddings = model.encode(questions[start_idx:end_idx], show_progress_bar=False) 90 | bulk_data = [] 91 | for qid, question, embedding in zip(qids[start_idx:end_idx], questions[start_idx:end_idx], 
embeddings): 92 | bulk_data.append({ 93 | "_index": 'quora', 94 | "_id": qid, 95 | "_source": { 96 | "question": question, 97 | "question_vector": embedding 98 | } 99 | }) 100 | 101 | helpers.bulk(es, bulk_data) 102 | pbar.update(chunk_size) 103 | 104 | except: 105 | print("During index an exception occured. Continue\n\n") 106 | 107 | 108 | 109 | 110 | #Interactive search queries 111 | while True: 112 | inp_question = input("Please enter a question: ") 113 | 114 | encode_start_time = time.time() 115 | question_embedding = model.encode(inp_question) 116 | encode_end_time = time.time() 117 | 118 | #Lexical search 119 | bm25 = es.search(index="quora", body={"query": {"match": {"question": inp_question }}}) 120 | 121 | #Sematic search 122 | sem_search = es.search(index="quora", body={ 123 | "query": { 124 | "script_score": { 125 | "query": { 126 | "match_all": {} 127 | }, 128 | "script": { 129 | "source": "cosineSimilarity(params.queryVector, doc['question_vector']) + 1.0", 130 | "params": { 131 | "queryVector": question_embedding 132 | } 133 | } 134 | } 135 | } 136 | }) 137 | 138 | print("Input question:", inp_question) 139 | print("Computing the embedding took {:.3f} seconds, BM25 search took {:.3f} seconds, semantic search with ES took {:.3f} seconds".format(encode_end_time-encode_start_time, bm25['took']/1000, sem_search['took']/1000)) 140 | 141 | print("BM25 results:") 142 | for hit in bm25['hits']['hits'][0:5]: 143 | print("\t{}".format(hit['_source']['question'])) 144 | 145 | print("\nSemantic Search results:") 146 | for hit in sem_search['hits']['hits'][0:5]: 147 | print("\t{}".format(hit['_source']['question'])) 148 | 149 | print("\n\n========\n") 150 | ``` -------------------------------------------------------------------------------- /07_database/faiss.md: -------------------------------------------------------------------------------- 1 | # faiss向量搜索库 2 | 3 | 与es.md提到的es7.3向量搜索一样,faiss是更加专业的向量搜索工具 4 | 5 | [实战入门faiss搜索bert最邻近句子:docker 
CPU镜像开箱即用,无需额外安装下载](https://mp.weixin.qq.com/s?__biz=MzA4NzkxNzM3Nw==&mid=2457484515&idx=1&sn=c13b27b09b4a7e2a31a1ee421b362540&chksm=87bc8acdb0cb03db46ca7cc0893e46d4078e925a3b35f717806315c0881f6ad75b2165df4a0f&cur_album_id=2002019450945896449&scene=189#wechat_redirect) 6 | 7 | [semantic_search_quora_faiss.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/semantic-search/semantic_search_quora_faiss.py) 8 | 9 | 10 | 11 | ## todo -------------------------------------------------------------------------------- /07_database/imgs/redis_pic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/07_database/imgs/redis_pic.png -------------------------------------------------------------------------------- /07_database/neo4j.md: -------------------------------------------------------------------------------- 1 | # neo4j入门 2 | 3 | [**1. neo4j_python操作**](#neo4j_python操作) 4 | 5 | [**2. neo4j网页版直接操作**](#neo4j网页版直接操作) 6 | 7 | [**3. neo4j-spark-connector操作**](#neo4j-spark-connector操作) 8 | 9 | [**4. 
neo4j问题整理**](#neo4j问题整理) 10 | 11 | 12 | ### neo4j_python操作 13 | 14 | ```python 15 | import numpy as np 16 | import pandas as pd 17 | import py2neo 18 | from py2neo import Graph,Node,Relationship 19 | import neo4j 20 | from neo4j.v1 import GraphDatabase, basic_auth 21 | 22 | # py2neo操作 23 | test_graph = Graph( 24 | #"http://localhost:7474", 25 | "bolt://localhost:7687", 26 | username="neo4j", 27 | password="z123456789" 28 | ) 29 | 30 | # 创建节点 31 | node1 = Node('Customer', name='John',age=18,phone=2232) 32 | node2 = Node('Customer', name='Lily',age=22,phone=9921) 33 | node3 = Node('Customer', name='Cathy',age=52,phone=7100) 34 | test_graph.create(node1) 35 | test_graph.create(node2) 36 | test_graph.create(node3) 37 | 38 | # 创建节点2 39 | arr = np.array([['John','Lily','Ben','Mark'],['189101','234220','019018','330682'],[11,23,56,28]]) 40 | df = pd.DataFrame(arr.transpose(),columns=['name','phone_no','age']) 41 | for i, j, k in df.values: 42 | node1 = Node('Person',name=i,phone_no=j,age=k) 43 | test_graph.create(node1) 44 | 45 | # neo4j.v1操作 46 | driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "z123456789")) 47 | session = driver.session() 48 | # 创建节点3 49 | arr = np.array([['John','Lily','Ben','Mark'],['189101','234220','019018','330682'],[11,23,56,28]]) 50 | df = pd.DataFrame(arr.transpose(),columns=['name','phone_no','age']) 51 | # name phone_no age 52 | # 0 John 189101 11 53 | # 1 Lily 234220 23 54 | # 2 Ben 019018 56 55 | # 3 Mark 330682 28 56 | # dataframe to dict操作 57 | dic = {'events':df.to_dict('records')} 58 | session.run("unwind {events} as event merge (n:Person{name:event.name,phone_no2:event.phone_no,age: event.age})",dic) 59 | 60 | # 删除所有节点和边 61 | test_graph.delete_all() 62 | ``` 63 | 64 | ### neo4j网页版直接操作 65 | 66 | 先把jdk改成1.8然后再进入neo4j的文件夹bin中输入neo4j.bat console撑起网页版服务 67 | 68 | 在http://localhost:7474/browser/用户名neo4j密码z1234...中输入命令进行一些简单的节点,关系等操作 69 | 70 | **Neo4j CQL常见的操作有:** 71 | 72 | |S.No|CQL命令/条款|用法 73 | |--|--|-- 74 | 
|1|CREATE[创建节点](#创建节点)|创建节点,关系和属性 75 | |2|CREATE[创建关系1](#创建关系1)|创建关系和属性 76 | |3|CREATE[创建关系2](#创建关系2)|创建关系和属性 77 | |4|CREATE[创建关系3](#创建关系3)|创建关系和属性 78 | |5|MATCH[匹配](#匹配)|检索有关系点,关系和属性数据 79 | |6|RETURN[返回](#返回)|返回查询结果 80 | |7|WHERE[哪里](#哪里)|提供条件过滤检索数据 81 | |8|DELETE[删除](#删除)|删除节点和关系 82 | |9|REMOVE[移除](#移除)|删除节点和关系的属性 83 | |10|ORDER BY以..[排序](#排序)|排序检索数据 84 | |11|SET[设置](#设置)|添加或更新标签 85 | |12|[UNWIND](#unwind)|unwind操作 86 | |13|[INDEX](#index)|index添加,删除和查询 87 | |14|[修改graph.db](#修改)|修改备份Neo4j图数据库 88 | 89 | **Neo4j CQL常见的函数有:** 90 | |S.No|定制列表功能|用法 91 | |--|--|-- 92 | |1|String[字符串](#string)|它们用于使用String字面量(UPPER,LOWER,SUBSTRING,REPLACE) 93 | |2|Aggregation[聚合](#聚合)|对CQL查询结果执行一些聚合操作(COUNT,MAX,MIN,SUM,AVG) 94 | |3|Relationship关系|他们用于获取关系的细节如startnode, endnode等 95 | 96 | 97 | --- 98 | **CQL常见的操作**
99 | 100 | - 创建节点 101 | 1)创建节点: 102 | create (e:Customer{id:'1001',name:'Bob',dob:'01/10/1982'}) 103 | create (cc:CreditCard{id:'5001',number:'1234567890',cvv:'888',expiredate:'20/17'}) 104 | 105 | 2)导入csv文件进行节点的创建 106 | load csv with headers from "file:///shop.csv" as df 107 | merge(:Shop{name:df.name,cn_name:df.cn_name,age:df.age,sex:df.age}) 108 | 109 | - Import data from a CSV file with a custom field delimiter 110 | 比如 load csv with headers from "file:///shop.csv" as df FIELDTERMINATOR '\t' 111 | - Importing large amounts of data 112 | 比如 USING PERIODIC COMMIT 500 113 | load csv with headers from "file:///shop.csv" as df 114 | 115 | ---------- 116 | 117 | ``` 118 | - 创建关系1 119 | 创建两个节点之间的关系: 120 | match(e:Customer),(cc:CreditCard) 121 | create (e)-[r:DO_SHOPPING_WITH]->(cc) 122 | 123 | 创建两个节点之间的关系(加边属性): 124 | match(e:Customer),(cc:CreditCard) 125 | create (e)-[r:DO_SHOPPING_WITH{shopdate:'12/12/2014',price:6666}]->(cc) 126 | 127 | - 创建关系2 128 | 根据两个节点之间的相同属性进行连接1: 129 | match(c:Customer),(p:Phone) 130 | where c.phone = p.phone_no 131 | create (c)-[:Call]->(p) 132 | 133 | 根据两个节点之间的相同属性进行连接2: 134 | match(a:Test),(b:Test22) 135 | where b.ide in a.name 136 | create (a)-[:sssssssssss]->(b) 137 | 这里的a.name是个list 138 | 139 | - 创建关系3 140 | 各自创建节点比如shop和phone两个节点,然后导入一个关系的csv文件进行连接 141 | ``` 142 | 143 | **shop.csv** 144 | |name|cn_name|age|sex 145 | |--|--|--|-- 146 | |Jack|杰克|22|男 147 | |Lily|丽丽|34|女 148 | |John|约翰|56|男 149 | |Mark|马克|99|男 150 | 151 | **phone.csv** 152 | |phone|id_p| 153 | |--|-- 154 | |1223|0 155 | |3432|1 156 | |9011|2 157 | 158 | **关系.csv** 159 | |name|phone 160 | |--|-- 161 | |Jack|1223 162 | |Lily|3432 163 | |John|9011 164 | |Mark|3432 165 | 166 | ``` 167 | cypher关系语句: 168 | load csv with headers from "file:///test.csv" as df 169 | match(a:Shop{name:df.name}),(b:Phone{phone:df.phone}) 170 | create (a)-[:Call{phone_id:df.id_p}]->(b) 171 | ``` 172 | *注:neo4j中不能创建双向或者无向的关系,只能单向* 173 | 174 | ``` 175 | ### 匹配 176 | 177 | 三层关系: 178 | match 
(n:企业)-[k*1..3]-(m) return n.company_nm 179 | 180 | ### 返回 181 | match(e:Customer),(cc:CreditCard) 182 | return e.name,cc.cvv 183 | 184 | ### 哪里 185 | match(n:Customer), (cc:CreditCard) 186 | where n.name = 'Lily' and cc.id = '5001' 187 | create (n)-[r:DO_SHOPPING_WITH{shopdate:'1/1/9999', price:100}]->(cc) 188 | 189 | 正则使用: 190 | match(n:Person) 191 | where n.name =~ '(?i)^[a-d].*' 192 | return n 193 | 194 | ### 删除 195 | 删除所有的节点和关系 196 | match(n) match(n)-[r]-() delete n,r 197 | 198 | 删除相关的节点及和这些节点相连的边(一阶) 199 | match(cc:Customer) 200 | *detach* delete cc 201 | 或者 202 | match(cc:Customer) match(cc)-[r]-() delete cc,r 203 | 204 | 删除产品及上下游相连关系和节点,(递归),除3款产品外 205 | match r=(n:Product)-[*]->() where not n.raw_name in ["xx1","xx2","xx3"] detach delete r 206 | 207 | 删除所有孤立节点 208 | match (n) where not (n)–-() delete n 209 | 210 | 删除一阶孤立节点,比如保险责任->保险子责任(保险责任上游还有带产品的不删) 211 | match (n)-[r]-(m) where n.raw_name='保险责任' and not (n)–[]-(:Product) detach delete n,r,m; 212 | 213 | 删除条款示例(需要执行三句话) 214 | match(n) where id(n)={id} detach delete n 215 | match (n)-[r]-(m) where n.raw_name='保险责任' and not (n)–[]-(:Product) detach delete n,r,m 216 | match (n) where not (n)–-() delete n 217 | 218 | ### 移除 219 | 可以移除节点的属性 220 | match(n:Customer) where n.name = 'Lily' 221 | remove n.dob 222 | 223 | ### 设置 224 | 可以设置节点的属性(增加或者改写) 225 | match(n:Customer) where n.name = 'Bob' 226 | SET n.id = 1003 227 | 228 | 对已经存在的点,进行属性添加操作 229 | **--Person:** 230 | create(:Person{cd:'1223',xx:'er'}) 231 | create(:Person{cd:'92223',xx:'iir'}) 232 | create(:Person{cd:'6783',xx:'rrrr'}) 233 | create(:Person{cd:'555903',xx:'ppppppppppr'}) 234 | ``` 235 | 236 | **--test.csv:** (注:导入csv的时候会把所有的转成string格式) 237 | 238 | |col_one|col_two|col_three 239 | |--|--|-- 240 | |555903|"桂勇"|"良" 241 | |92223|"黎明"|"优" 242 | |1223|"皇家"|"优" 243 | |6783|"汽车"|"良" 244 | 给Person添加两个属性 245 | load csv with headers from "file:///test.csv" as df 246 | match(n:Person) where n.cd = df.col_one 247 | set n.nm = df.col_two 248 | set n.credit 
= df.col_three 249 | 250 | ``` 251 | ### 排序 252 | match(n:Customer) 253 | return n.name, n.id, n.dob 254 | order by n.name desc 255 | 256 | ### UNWIND 257 | 创建节点 258 | unwind ['John','Mark','Peter'] as name 259 | create (n:Customer{name:name}) 260 | 261 | unwind [{id:1,name:'Bob',phone:1232},{id:2,name:'Lily',phone:5421},{id:3,name:'John',phone:9011}] as cust 262 | create (n:Customer{name:cust.name,id:cust.id,phone:cust.phone}) 263 | 264 | 删除节点 265 | unwind [1,2,3] as id 266 | match (n:Customer) where n.id = id 267 | delete n 268 | 269 | --- 270 | 271 | ### String 272 | match(e:Customer) 273 | return e.id,upper(e.name) as name, e.dob 274 | 275 | ### 聚合 276 | count三种写法: 277 | 1. Match (n:people) where n.age=18 return count(n) 278 | 2. Match (n:people{age:’18’}) return count(n) 279 | 3. Match (n:people) return count(n.age=18) 280 | 281 | ### INDEX 282 | 添加 CREATE INDEX ON :Person(name) 283 | 删除 DROP INDEX ON :Person(name) 284 | 查询 call db.indexes() 285 | 286 | ### 修改 287 | 在neo4j的文件夹conf下面,打开文件neo4j.conf,找到一下位置处 288 | 289 | dbms.active_database=graph.db,修改数据库名字,例如graph.db -> graph2.db即可。 290 | 291 | ``` 292 | -------------------------------------------------------------------------------- /08_vscode/README.md: -------------------------------------------------------------------------------- 1 | # vscode使用(版本1.86.2) 2 | 3 | ## 1. 在VScode中添加远程Linux服务器中Docker容器中的Python解释器 4 | 5 | **以dgx.6机器为例**
6 | ```shell 7 | # 第一步 创建容器 8 | nvidia-docker run -d --name myllm -p 8891:22 -v $PWD/llm:/workspace/llm -w /workspace/llm -it 10.xx.xx.xxx/zhoubin/llm:py311-cuda12.1.0-cudnn8-devel-ubuntu22.04 /bin/bash 9 | 注释: 10 | [-p 8891:22]:把docker的端口号22映射到服务器的端口号8891。 11 | [-d]:容器后台运行,避免退出容器后容器自动关闭。 12 | [-v]:挂载和同步目录,服务器和docker内有一个文件夹保持同步。 13 | [-it]:确保docker后台交互运行。 14 | [10.xx.xx.xxx/zhoubin/llm:py311-cuda12.1.0-cudnn8-devel-ubuntu22.04]:镜像名。 15 | [/bin/bash]:docker内要运行的指令。 16 | ``` 17 | ```shell 18 | #第二步 在容器内安装ssh服务 19 | docker exec -it [容器ID] /bin/bash 20 | # 更新apt-get 21 | 命令:apt-get update 22 | # 安装vim 23 | 命令:apt-get install vim 24 | # 安装openssh-server 25 | 命令:apt-get install openssh-server 26 | # 设置root密码(docker里面的用户名和密码,我这边账号密码都是root/root) 27 | 命令:passwd 28 | ``` 29 | ```shell 30 | # 第三步 配置/etc/ssh/sshd_config文件 31 | # 在文件/etc/ssh/sshd_config中添加下面的代码: 32 | PubkeyAuthentication yes 33 | PermitRootLogin yes 34 | 35 | # 第四步 重启ssh服务(好像每次停止容器后重启都需要运行下) 36 | /etc/init.d/ssh restart 37 | 或 service ssh restart 38 | 39 | # 第五步 退出docker后,验证端口映射 40 | docker ps -a 41 | docker port [容器ID] 22 42 | 若结果输出“0.0.0.0:8891”,则说明端口映射正确。 43 | ``` 44 | ```shell 45 | # 第6步 本地电脑连接docker(见Termius dgx6_docker_llm) 46 | ssh root@11.xx.xx.xxx -p 8891 ,密码是root 47 | ``` 48 | ```shell 49 | # 使用VSCode连接远程主机上的docker container 50 | # 打开VScode编辑器,按下快捷键“Ctrl+Shift+X”,查找安装“Remote Development”。安装完成后需要点击“reload”,然后按下快捷键“Ctrl+Shift+P”,输入“remote-ssh”,选择“open SSH Configuration file”,在文件xx/username/.ssh/config中添加如下内容: 51 | Host llm_docker #Host随便起名字 52 | HostName 11.xxx.xx.x 53 | User root 54 | Port 8891 55 | 56 | #保存后,按下快捷键"Ctrl+Shift+P",输入"remote-ssh",选择"Connect to Host...",然后点击"llm_docker",接着选择“Linux”,最后按提示输入第三步中设置的root连接密码,在左下角显示"SSH:llm_docker",说明已经成功连接docker。 57 | ``` 58 | 59 | ```shell 60 | #内网环境远程如果出现连接不上,大概率是vscode-server无法下载导致,可以手动搞定 61 | https://update.code.visualstudio.com/commit:903b1e9d8990623e3d7da1df3d33db3e42d80eda/server-linux-x64/stable 62 | 63 | 具体参考附录中的[VSCode连不上远程服务器] 64 | ``` 65 | 66 | 67 
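上面第四步提到,每次 docker start 后都要手动重启 ssh 服务,比较繁琐。一个可选做法是把 sshd 固化为容器的启动命令(下面的镜像名、容器名均为示意,仅供参考):

```shell
# 基于已配置好 ssh 的容器提交一个新镜像,并把 sshd 设为前台启动命令
docker commit --change='CMD ["/usr/sbin/sshd","-D"]' myllm llm:py311-ssh
# 之后用新镜像起容器,docker start 后 sshd 会自动运行,无需再手动 service ssh restart
nvidia-docker run -d --name myllm_ssh -p 8891:22 -v $PWD/llm:/workspace/llm -w /workspace/llm llm:py311-ssh
```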
| ## 2. Debugging(自带,不需要额外安装插件) 68 | 69 | 在Visual Studio Code(VSCode)中,[Debug Console](https://code.visualstudio.com/Docs/editor/debugging)是一个用于查看程序调试信息的窗口。它通常用于查看程序在调试过程中输出的日志信息、变量的值等。Debug Console提供了一个方便的方式来查看和分析程序的执行过程,帮助开发人员定位和解决代码中的问题。 70 | 71 | 72 | ---- 73 | 74 | [vscode历史版本下载地址](https://code.visualstudio.com/updates/v1_86)
75 | [vscode扩展应用市场vsix文件手动下载安装](https://marketplace.visualstudio.com/search?target=VSCode&category=All%20categories&sortBy=Installs)
76 | [vscode扩展应用市场vsix文件手动下载历史版本插件包](https://blog.csdn.net/qq_15054345/article/details/133884626)
77 | [在VScode中添加Linux中的Docker容器中的Python解释器](https://blog.csdn.net/weixin_43268590/article/details/129244984)
78 | [VSCode连不上远程服务器](https://blog.csdn.net/qq_42610612/article/details/132782965)
79 | [无网机的vscode中怎么使用jupyter notebook](https://www.bilibili.com/read/cv34411972/?jump_opus=1)
80 | -------------------------------------------------------------------------------- /09_remote_ipython/README.md: -------------------------------------------------------------------------------- 1 | [**1. pycharm远程配置**](#pycharm远程配置) 2 | 3 | [**2. 远程ipython http版**](#远程ipython_http版) 4 | 5 | [**3. 远程ipython https安全版**](#远程ipython_https安全版) 6 | 7 | [**4. jupyter notebook启动错误总结**](#jupyter_notebook启动错误总结) 8 | 9 | [**5. 添加Anaconda虚拟环境**](#添加anaconda虚拟环境) 10 | 11 | # pycharm远程配置 12 | 13 | pycharm远程配置:
14 | file->Settings->Project Interpreter->加入远程ssh的连接和python的执行文件地址
15 | 然后再加一个path mappings(本地和远程的文件存储地址) 16 | 17 | 文件同步配置:
18 | Tools->Deployment->Configuration->添加一个新SFTP
19 | Root path选远程文件夹
20 | Web server root URL: http:///
21 | Mappings选local path工程目录,其他的都为/
22 | 23 | done! 24 | 25 | # 远程ipython_http版 26 | 27 | 1. 打开ipython 28 | ```python 29 | from IPython.lib import passwd #from notebook.auth import passwd 30 | In [2] : passwd() # 输入密码 31 | Enter password: 32 | Verify password: 33 | Out[2]: 'sha1:f9...' 34 | ``` 35 | 36 | 2. 新建jupyter_config.py,输入如下配置。 37 | ```bash 38 | c.NotebookApp.password = u'sha1:f9...' 39 | c.NotebookApp.ip = '*' 40 | c.NotebookApp.open_browser = False 41 | c.NotebookApp.port = 8888 42 | ``` 43 | 44 | 3. 启动jupyter notebook 并指定配置文件,输入如下命令。 45 | ```bash 46 | jupyter notebook --config=jupyter_config.py 47 | ``` 48 | 49 | 4. 若客户端浏览器无法打开jupyter,有可能是防火墙的缘故,输入如下命令开放对应的 50 | 的端口(若使用IPv6,把命令iptables改成ip6tables) 51 | ```bash 52 | iptables -I INPUT -p tcp --dport 8888 -j ACCEPT 53 | iptables save 54 | ``` 55 | 56 | # 远程ipython_https安全版 57 | 58 | 通过mac终端登录:
59 | sudo ssh -p 22 ubuntu@182.254.247.182
60 | z1234..
61 | 安装教程和视频(在本机)
62 | http://blog.csdn.net/hshuihui/article/details/53320144
63 | 64 | 安装ipython notebook on 百度云
65 | ```bash 66 | wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/ 67 | ``` 68 | PATH in your .bashrc or .bash_profile 69 | ```bash 70 | export PATH="/root/anaconda2/bin:$PATH" 71 | ``` 72 | 在服务器上启动IPython,生成自定义密码的sha1 73 | ```python 74 | In [1]: from IPython.lib import passwd 75 | In [2]: passwd() 76 | Enter password: 77 | Verify password: 78 | Out[2]: 'sha1:01f0def65085:059ed81ab3f5658e7d4d266f1ed5394e9885e663' 79 | ``` 80 | 创建IPython notebook服务器 81 | ```bash 82 | ipython profile create nbserver 83 | ``` 84 | 生成mycert.pem 85 | ```bash 86 | mkdir certs 87 | cd certs 88 | 然后openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem 89 | ``` 90 | 我们重点要关注的是 cd .ipython/profile_nbserver
91 | ipython_notebook_config.py这个文件,待会儿我们要修改该文件来配置服务器。不过,有时候这个文件不能生成, 92 | 这时候我们自己在这里新建即可,使用vim或者gedit。我自己配置的时候就没有生成ipython_notebook_config.py这个文件,我使用vim新建了一个: 93 | 然后把以下代码复制进去(替换certfile路径和sha1),保存 94 | 95 | ```bash 96 | # Configuration file for ipython-notebook 97 | c = get_config() 98 | #Kernel config 99 | c.IPKernelApp.pylab = 'inline' 100 | #Notebook config 101 | c.NotebookApp.certfile = u'/root/certs/mycert.pem' 102 | c.NotebookApp.ip = '*' 103 | c.NotebookApp.open_browser = False 104 | c.NotebookApp.password = u'sha1:375df20c451e:16f5535e55154eb3490dbcb83d8cb930ef3c3799' 105 | c.NotebookApp.port = 8888 106 | ``` 107 | 启动命令:
108 | ```bash 109 | ipython notebook --config=/root/.ipython/profile_nbserver/ipython_notebook_config.py 110 | ``` 111 | ```bash 112 | nohup ipython notebook --config=/root/.ipython/profile_nbserver/ipython_notebook_config.py 113 | 如果想关闭nohup先lsof nohup.out 然后kill -9 [PID] 114 | 登录ipython notebook: 115 | ``` 116 | 117 | 或者建一个jupyter_config.py文件然后输入(http访问)
118 | ```python 119 | c.NotebookApp.password = u'sha1:ebf4c635f6b6:7d6824aa8f863ffbe7c264b28854ec2acf1a0961' 120 | c.NotebookApp.ip = '*' 121 | c.NotebookApp.open_browser = False 122 | c.NotebookApp.port = 8888 123 | ``` 124 | 然后用命令行启动 125 | ```shell 126 | nohup jupyter notebook --config=jupyter_config.py 127 | ``` 128 | 129 | --- 130 | 131 | Jupyter Notebook 添加目录插件
132 | 133 | ```bash 134 | pip install jupyter_contrib_nbextensions 135 | ``` 136 | ```bash 137 | jupyter contrib nbextension install --user --skip-running-check 138 | ``` 139 | 注意配置的时候要确保没有打开Jupyter Notebook 140 | 141 | # The installation of the Java Jupyter Kernel 142 | 143 | 要求jdk11及以上,maven3.6.3及以上
144 | ```shell 145 | java --list-modules | grep "jdk.jshell" 146 | 147 | > jdk.jshell@12.0.1 148 | ``` 149 | ```shell 150 | git clone https://github.com/frankfliu/IJava.git 151 | cd IJava/ 152 | ./gradlew installKernel 153 | ``` 154 | 然后启动jupyter notebook即可,选java kernel的notebook 155 | 156 | ### Run docker image 157 | 158 | ```shell 159 | cd jupyter 160 | docker run -itd -p 127.0.0.1:8888:8888 -v $PWD:/home/jupyter deepjavalibrary/jupyter 161 | ``` 162 | 163 | # jupyter_notebook启动错误总结 164 | 165 | [Jupyter Notebook "signal only works in main thread"](https://blog.csdn.net/loovelj/article/details/82184223)
查询了很多网站,最后发现是两个包版本安装不对,重新安装这两个包就可以了
167 | ```shell 168 | pip install -i https://pypi.tuna.tsinghua.edu.cn/simple "pyzmq==17.0.0" "ipykernel==4.8.2" 169 | ``` 170 | 171 | # 添加anaconda虚拟环境 172 | 173 | 把anaconda3整个文件夹拷贝到anaconda3/envs下,然后取名为比如tf-gpu
然后这个文件夹下的包的版本可以自行替换,比如把tf2.0替换成tf1.14(注:不要删除,会有问题)
175 | 然后在jupyter notebook添加Anaconda虚拟环境的python kernel 176 | ```shell 177 | conda create -n tf-gpu python=3.8 # 创建tf-gpu虚拟环境 178 | source activate tf-gpu # 激活tf-gpu环境 179 | conda deactivate # 退出虚拟环境 180 | conda install ipykernel # 安装ipykernel模块(如果是虚拟机没联网,可以去https://anaconda.org/conda-forge/ipykernel/files下载) 181 | python -m ipykernel install --user --name tf-gpu --display-name "tf-gpu" # 进行配置 182 | jupyter notebook # 启动jupyter notebook,然后在"新建"中就会有py3这个kernel了 183 | ``` 184 | 虚拟环境启动notebook
185 | ```shell 186 | 1. conda install jupyter notebook(如果不行,主环境的site-package整个拷贝到envs/下的虚拟环境) 187 | 2. 虚拟环境安装jupyter_nbextensions_configurator(https://zodiac911.github.io/blog/jupyter-nbextensions-configurator.html) 188 | 3. 虚拟环境conda install nb_conda(安装好这个则notebook新建的时候会出现该环境) 189 | 4. 进到虚拟环境启动jupyter notebook以后,如果import包有问题则退出并运行conda install nomkl numpy scipy scikit-learn numexpr 190 | ``` 191 | 192 | -------------------------------------------------------------------------------- /10_docker/README.md: -------------------------------------------------------------------------------- 1 | # [docker入门实践](https://yeasy.gitbook.io/docker_practice/) 2 | 3 | ## 1. docker安装及配置Docker镜像站 4 | 5 | 1.1 mac下安装
6 | [mac安装网址](https://hub.docker.com/editions/community/docker-ce-desktop-mac)
7 | [docker docs for mac](https://docs.docker.com/docker-for-mac/)
8 | 9 | 1.2 linux下安装
10 | [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/) 11 | 12 | 1.3 配置docker镜像站
13 | [docker镜像站网址](https://www.daocloud.io/mirror#accelerator-doc)
14 | 15 | 1.4 配置docker代理
16 | 17 | - windows中右击图标,选settings->Proxies 18 | - [mac](https://www.cnblogs.com/EasonJim/p/9988154.html) 19 | - [linux](https://blog.csdn.net/qq_30034989/article/details/132021346) 20 | 21 | ```shell 22 | # 如果使用HTTP代理服务器时,将为docker服务创建systemd插件目录 23 | mkdir -p /etc/systemd/system/docker.service.d 24 | # 创建一个名为的文件/etc/systemd/system/docker.service.d/http-proxy.conf,添加HTTP_PROXY环境变量 25 | [Service] 26 | Environment="HTTP_PROXY=http://proxy.example.com:80/" 27 | # 或者,如果使用HTTPS代理服务器,那么再创建一个名为/etc/systemd/system/docker.service.d/https-proxy.conf 添加HTTPS_PROXY环境变量 28 | [Service] 29 | Environment="HTTPS_PROXY=https://proxy.example.com:443/" 30 | # 为Docker配置不代理的地址时,可以通过NO_PROXY环境变量指定它们,比如HTTP代理服务器的配置 31 | [Service] 32 | Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com" 33 | [Service] 34 | Environment="HTTPS_PROXY=https://proxy.example.com:443/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com" 35 | # 重新读取服务的配置文件 36 | systemctl daemon-reload 37 | # 重启Docker 38 | systemctl restart docker #或者sudo service docker restart 39 | # 验证是否已加载配置 40 | systemctl show --property=Environment docker 41 | ``` 42 | 43 | 44 | ## 2. 
docker基本命令 45 | 46 | 2.1 docker查看版本及images 47 | ```shell 48 | docker --version 49 | docker images 50 | ``` 51 | 52 | 2.2 docker run 53 | ```shell 54 | docker run hello-world 55 | # run之前如果没有这个images,则会从docker_hub上先pull下来 docker pull hello-world 56 | ``` 57 | 58 | 2.3 如果不小心关了container或者重启了电脑 59 | ``` 60 | # 先查看container历史 61 | docker ps -a 62 | # 重启container即可,前提是docker run的时候要加-volume把数据挂载到本地 63 | docker start 64 | ``` 65 | 66 | 2.3 docker跑完以后需要删除container再删除image 67 | ```shell 68 | # 查看image对应的container id 69 | docker ps -a 70 | # 删除container 71 | docker rm container_id 72 | # 删除image 73 | docker rmi image_id 74 | # 也可以直接暴力删除image 75 | docker rmi -f image_id 76 | # 如果存在同名同id不同tag的镜像 77 | 可以使用repository:tag的组合来删除特殊的镜像 78 | ``` 79 | 80 | 2.4 docker打开image bash编辑,比如打开python镜像bash下载一些包再保存 81 | ```shell 82 | # 如果原来的镜像已经启动了container,则 83 | docker exec -it /bin/bash 84 | # 进去修改完后 85 | docker start 86 | #------------------------------------------ 87 | docker pull python:3.6 88 | docker run -it python:3.6 /bin/bash #启动镜像并进入到shell页面 89 | docker run -dit python:3.6 /bin/bash #如果只是想启动并后台运行 90 | # 接下去进行一些pip install一些包等操作 91 | docker commit -m="has update" -a="binzhouchn" binzhouchn/python36:1.3 92 | ``` 93 | 94 | 2.5 docker保存和读取image(存成tar.gz文件) 95 | ```shell 96 | # 保存 97 | docker save -o helloword_test.tar fce45eedd449(image_id) 98 | #或者docker save -o mydocker.tar.gz mydocker:1.0.0 99 | # 读取 100 | docker load -i helloword_test.tar 101 | ``` 102 | 103 | 2.6 docker保存和读取container 104 | ```shell 105 | # 保存 106 | docker export -o helloword_test.tar fce45eedd444(container_id) 107 | # 读取 108 | docker import ... 
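# 示例(文件名、镜像名均为示意):docker import helloword_test.tar mynewimage:1.0
# 注意:export/import 只保留容器文件系统,会丢失镜像分层历史和 CMD/ENTRYPOINT 等元数据
# 需要时可用 --change 补回,如:docker import --change 'CMD ["/bin/bash"]' helloword_test.tar mynewimage:1.0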
109 | ``` 110 | 111 | 2.7 修改repository和tag名称 112 | ```shell 113 | # 加载images后可以名称都为 114 | docker tag [image id] [name]:[版本] 115 | ``` 116 | 117 | 2.8 用dockerfile建一个image,并上传到dockerhub 118 | ``` 119 | # 建一个dockerfile 120 | cat > Dockerfile <) 142 | ``` 143 | 144 | 2.11 批量停止并删除容器 145 | ```shell 146 | docker stop $(docker ps -a -q) 147 | docker rm $(docker ps -a -q) 148 | ``` 149 | 150 | 2.12 虚悬镜像 151 | 152 | 上面的镜像列表中,还可以看到一个特殊的镜像,这个镜像既没有仓库名,也没有标签,均为
```shell
<none>              <none>              00285df0df87        5 days ago          342 MB
```
这个镜像原本是有镜像名和标签的,原来为 mongo:3.2,随着官方镜像维护,发布了新版本后,重新 docker pull mongo:3.2 时,mongo:3.2 这个镜像名被转移到了新下载的镜像身上,而旧的镜像上的这个名称则被取消,从而成为了 `<none>`。除了 docker pull 可能导致这种情况,docker build 也同样可以导致这种现象。由于新旧镜像同名,旧镜像名称被取消,从而出现仓库名、标签均为 `<none>` 的镜像。这类无标签镜像也被称为虚悬镜像(dangling image),可以用下面的命令专门显示这类镜像:
```shell
$ docker image ls -f dangling=true
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
<none>              <none>              00285df0df87        5 days ago          342 MB
```
162 | 一般来说,虚悬镜像已经失去了存在的价值,是可以随意删除的,可以用下面的命令删除。
163 | ```shell 164 | $ docker image prune 165 | ``` 166 | 167 | 2.13 拷贝宿主机本地文件到docker中,和从docker中拷贝到宿主机 168 | ```shell 169 | #1 170 | docker cp test.txt :/home 171 | #2 172 | docker cp :/home/xx.txt /opt 173 | ``` 174 | 175 | 2.14 根据镜像名定位到已经开启python:3.6镜像容器的id 176 | ```shell 177 | docker ps -a| grep python:3.6 | awk '{print $1}' #方法一 178 | docker ps -aq --filter ancestor=python:3.6 #方法二 179 | # 根据镜像名停止和删除容器 180 | docker stop `docker ps -a| grep python:3.6 | awk '{print $1}'` 181 | docker rm `docker ps -a| grep python:3.6 | awk '{print $1}'` 182 | ``` 183 | 184 | 2.15 docker中python print不生效解决办法 185 | ```shell 186 | #方法一 显式调用flush 187 | print("Hello www", flush=True) 188 | #方法二 使用 "-u" 参数执行 python 命令 189 | sudo nvidia-docker run -v $PWD/masr_bz:/workspace/masr_bz -w /workspace/masr_bz binzhouchn/pytorch:1.7-cuda10.1-cudnn7-masr python -u train.py 190 | ``` 191 | 192 | 193 | ## 3. docker镜像使用 194 | 195 | 【3.0 工作中】
196 | **方法一(环境和代码独立,代码放外面)** 197 | ```shell 198 | 199 | - 配置好环境镜像比如binzhouchn/python36:1.4 200 | - docker run -d -p 5005:5005 -v $PWD/xx_service:/usr/src/xx_service -w /usr/src/xx_service binzhouchn/python36:1.4 gunicorn -b :5005 server:app 201 | ``` 202 | 203 | **方法二(代码放在镜像里面为一个整体)**
204 | ```shell 205 | #构建Dockerfile 206 | FROM binzhouchn/python36:1.4 207 | MAINTAINER zhoubin zhoubin@qq.com 208 | COPY target/xx_service /usr/src/xx_service 209 | WORKDIR /usr/src/xx_service 210 | ENTRYPOINT ["gunicorn", "-b", ":5005", "server:app"] 211 | #run dockerfile 212 | docker build -t binzhouchn/new_img:0.1 . 213 | #run image(后台运行,5005映射出来) 214 | docker run -d -p 5005:5005 new_img:0.1 215 | ``` 216 | 217 | 218 | 3.1 docker跑一个helloworld 219 | ```shell 220 | docker run -v $PWD/myapp:/usr/src/myapp -w /usr/src/myapp python:3.5 python helloworld.py 221 | # 本地需要建一个myapp文件夹,把helloworld.py文件放文件夹中,然后返回上一级cd .. 222 | 命令说明: 223 | -v $PWD/myapp:/usr/src/myapp :将主机中当前目录下的myapp挂载到容器的/usr/src/myapp 224 | -w /usr/src/myapp :指定容器的/usr/src/myapp目录为工作目录 225 | python helloworld.py :使用容器的python命令来执行工作目录中的helloworld.py文件 226 | ``` 227 | 228 | 3.1 docker跑一个简单的flask demo(用到python3.5镜像) 229 | ```shell 230 | # -d后台运行 -p端口映射 231 | docker run -d -p 5000:5000 -v $PWD/myapp:/usr/src/myapp -w /usr/src/myapp binzhou/python35:v2 python app.py 232 | ``` 233 | 234 | 3.2 docker用mysql镜像 235 | ``` 236 | # 先下载镜像 237 | docker pull mysql:5.5 238 | # 运行容器 可以先把-v去掉 239 | docker run -p 3306:3306 --name mymysql -v $PWD/conf:/etc/mysql/conf.d -v $PWD/logs:/logs -v $PWD/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.5 240 | 241 | -p 3306:3306:将容器的 3306 端口映射到主机的 3306 端口。 242 | -v -v $PWD/conf:/etc/mysql/conf.d:将主机当前目录下的 conf/my.cnf 挂载到容器的 /etc/mysql/my.cnf。 243 | -v $PWD/logs:/logs:将主机当前目录下的 logs 目录挂载到容器的 /logs。 244 | -v $PWD/data:/var/lib/mysql :将主机当前目录下的data目录挂载到容器的 /var/lib/mysql 。 245 | -e MYSQL_ROOT_PASSWORD=123456:初始化 root 用户的密码。 246 | 247 | # 用三方工具Navicat或者python连接,先建好db比如test_db 248 | import pymysql 249 | # 打开数据库连接 250 | db = pymysql.connect("localhost","root","123456","test_db") 251 | # 使用 cursor() 方法创建一个游标对象 cursor 252 | cursor = db.cursor() 253 | sql = "INSERT INTO tt(a, b, date) VALUES ('%d', '%s', '%s')" 254 | data = (306, '插入6', '20190615') 255 | cursor.execute(sql % data) 
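# 注意:上面 sql % data 的字符串拼接仅为演示,存在 SQL 注入风险
# 实际使用建议参数化写法:cursor.execute("INSERT INTO tt(a, b, date) VALUES (%s, %s, %s)", data)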
256 | db.commit() 257 | 258 | # 起了mysql服务以后,在用docker python去插入数据 259 | # 需要先查看docker mysql的容器ip地址,命令看2.8 260 | # 然后localhost改成mysql容器的ip地址即可,其他一样 261 | 262 | ``` 263 | 264 | 3.3 docker用redis镜像 265 | ``` 266 | # 启动redis命令 267 | docker run --name docker-redis-test -p 6379:6379 -d redis:latest --requirepass "123456" 268 | # redis客户端连接命令 269 | docker exec -it redis-cli 270 | # 进去以后的操作 271 | auth 123456 272 | set name zhangsan 273 | get name 274 | quit 275 | 276 | # python连接docker起的redis服务 277 | import redis 278 | r = redis.Redis(host='localhost', port=6379, password='123456') 279 | r.set('name', 'John') 280 | print(r.get('name')) 281 | 282 | # redis可视化工具RDM(已安装) 283 | ``` 284 | 285 | 3.4 docker用mongo镜像 286 | ``` 287 | # 启动mongodb命令 288 | docker run -p 27017:27017 -v $PWD/mongo_db:/data/mongo_db -d mongo:4.0.10 289 | # 连接到mongo镜像cli 290 | docker run -it mongo:4.0.10 mongo --host <容器ip> 291 | 292 | # 建database建collection比如runoob然后插入数据 293 | db.runoob.insert({"title": 'MongoDB 教程', 294 | "description": 'MongoDB 是一个 Nosql 数据库', 295 | "by": 'w3cschool', 296 | "url": 'http://www.w3cschool.cn', 297 | "tags": ['mongodb', 'database', 'NoSQL'], 298 | "likes": 100}) 299 | db.runoob.find() 300 | 301 | # python连接docker起的mongo服务 302 | import pymongo 303 | mongodb_host = 'localhost' 304 | mongodb_port = 27017 305 | # pymongo.MongoClient(mongodb_host, mongodb_port, username='test', password='123456') 306 | myclient = pymongo.MongoClient('mongodb://localhost:27017/') 307 | myclient.list_database_names() 308 | mydb = myclient["mongo_testdb"] 309 | mydb.list_collection_names() 310 | mycol = mydb["runoob"] 311 | # 创建collection 312 | mydb.create_collection('test2') 313 | # 插入数据 314 | mydict = { "name": "Google", "age": "25", "url": "https://www.google.com" } 315 | mycol.insert_one(mydict) 316 | # 查看数据 317 | list(mycol.find()) 318 | ``` 319 | 320 | 3.5 docker用elasticsearch镜像 321 | ``` 322 | # Run Elasticsearch 323 | docker run -d --name elasticsearch_for_test -p 9200:9200 -e 
"discovery.type=single-node" elasticsearch:6.6.0 324 | # 安装elasticsearch-head 325 | ``` 326 | ```python 327 | # 用python连接,并进行增删改查 328 | from elasticsearch import Elasticsearch 329 | from elasticsearch import helpers 330 | # es = Elasticsearch(hosts="localhost:9200", http_auth=('username','passwd')) 331 | esclient = Elasticsearch(['localhost:9200']) 332 | # 高效插入ES 333 | action1 = { 334 | "_index": "idx111", 335 | "_type": "test", 336 | # "_id": , 337 | "_source": { 338 | 'ServerIp': '0.1.1.1', 339 | 'SpiderType': 'toxic', 340 | 'Level': 4 341 | } 342 | } 343 | action2 = { 344 | "_index": "idx111", 345 | "_type": "pre", 346 | # "_id": 1, 347 | "_source": { 348 | 'ServerIp': '0.1.1.2', 349 | 'SpiderType': 'non-toxic', 350 | 'Level': 1 351 | } 352 | } 353 | actions = [action1, action2] 354 | helpers.bulk(esclient, actions) 355 | 356 | #--------------------------------------------------- 357 | # 创建schema然后单条插入数据 358 | # 类似创建schema 359 | answer_index = 'baidu_answer' 360 | answer_type = 'doc22' 361 | esclient.indices.create(answer_index) 362 | answer_mapping = { 363 | "doc22": { 364 | "properties": { 365 | "id": { 366 | "type": "integer", 367 | # "index": True 368 | }, 369 | "schoolID":{ 370 | "type":"text" 371 | }, 372 | "schoolName":{ 373 | "type": "text", 374 | "analyzer": "ik_max_word" # 这个需要安装,先run docker6.6.0然后docker exec -it /bin/bash下载解压ik后exit然后restart这个container即可,之后可以新生成一个image 375 | # "analyzer":"whitespace" 376 | }, 377 | "calNum":{ 378 | "type":"float" 379 | } 380 | } 381 | } 382 | } 383 | esclient.indices.put_mapping(index=answer_index, doc_type=answer_type, body=answer_mapping) 384 | # 创建完schema以后导入数据 385 | doc = {'id': 7, 'schoolID': '007', 'schoolName': '春晖外国语学校', 'calNum':6.20190624} 386 | esclient.index(index=answer_index ,doc_type=answer_type ,body=doc, id=doc['id']) 387 | esclient.index(index=answer_index ,doc_type=answer_type ,body=doc, id=10) 388 | #---------------------------------------------------- 389 | 390 | # 删除单条数据 391 | # 
esclient.delete(index='indexName', doc_type='typeName', id='idValue') 392 | esclient.delete(index='pre', doc_type='imagetable2', id=1) 393 | # 删除索引 394 | esclient.indices.delete(answer_index) 395 | 396 | # 更新 397 | # esclient.update(index='indexName', doc_type='typeName', id='idValue', body={_type:{待更新字段}}) 398 | new_doc = {'id': 7, 'schoolId': '007', 'schoolName': '更新名字1'} 399 | esclient.update(index=answer_index, id=7, doc_type=answer_type, body={'doc': new_doc}) # 注意body中一定要加_type doc,更新的body中不一定要加入所有字段,只要把要更新的几个字段加入即可 400 | 401 | # 查询 402 | ### 根据id查找数据 403 | res = esclient.get(index=answer_index, doc_type=answer_type, id=7) 404 | ### match:在schoolName中包含关键词的都会被搜索出来(这里的分词工具是ik) 405 | # res = esclient.search(index=answer_index,body={'query':{'match':{'schoolName':'春晖外'}}}) 406 | res = esclient.search(index=answer_index,body={'query':{'match':{'schoolName':'春晖学校'}}}) 407 | ### ids:根据id值 408 | esclient.search(index='baidu_answer',body={'query':{'ids':{'values':'10'}}}) 409 | ``` 410 | 411 | 3.6 docker用neo4j镜像 412 | ``` 413 | # docker启动neo4j服务 414 | docker run \ 415 | --publish=7474:7474 --publish=7687:7687 \ 416 | --volume=$PWD/neo4j/data:/data \ 417 | -d neo4j:latest 418 | 419 | # 然后登陆网页可视化界面 420 | 421 | # 或使用Cypher shell 422 | docker exec --interactive --tty bin/cypher-shell 423 | # 退出:exit 424 | ``` 425 | 426 | 3.7 stardog 427 | 428 | ``` 429 | docker pull stardog/stardog:latest 430 | docker run -v ~/stardog-6.2.2/:/var/opt/stardog -e STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=2g" stardog/stardog:latest 431 | 432 | ``` 433 | 434 | 3.8 容器云k8s 435 | 436 | Kubernetes是什么?Kubernetes是一个全新的基于容器技术的分布式架构解决方案,是Google开源的一个容器集群管理系统,Kubernetes简称K8S。Kubernetes 提供了完善的管理工具,这些工具涵盖了开发、部署测试、运维监控在内的各个环节。
437 | 438 | Kubernetes特性
439 | - 自我修复:在节点故障时,重新启动失败的容器,替换和重新部署,保证预期的副本数量;杀死健康检查失败的容器,并且在未准备好之前不会处理用户的请求,确保线上服务不中断。 440 | - 弹性伸缩:使用命令、UI或者基于CPU使用情况自动快速扩容和缩容应用程序实例,保证应用业务高峰并发时的高可用性;业务低峰时回收资源,以最小成本运行服务。 441 | - 自动部署和回滚:K8S采用滚动更新策略更新应用,一次更新一个Pod,而不是同时删除所有Pod,如果更新过程中出现问题,将回滚更改,确保升级不影响业务。 442 | - 服务发现和负载均衡:K8S为多个容器提供一个统一访问入口(内部IP地址和一个DNS名称),并且负载均衡关联的所有容器,使得用户无需考虑容器IP问题。 443 | - 机密和配置管理:管理机密数据和应用程序配置,而不需要把敏感数据暴露在镜像里,提高敏感数据安全性。并可以将一些常用的配置存储在K8S中,方便应用程序使用。 444 | - 存储编排:挂载外部存储系统,无论是来自本地存储,公有云,还是网络存储,都作为集群资源的一部分使用,极大提高存储使用灵活性。 445 | - 批处理:提供一次性任务,定时任务;满足批量数据处理和分析的场景。 446 | 447 | [Kubernetes 深入学习(一) —— 入门和集群安装部署](https://www.cnblogs.com/chiangchou/p/k8s-1.html#_label0_0)
448 | [Kubernetes(一) 跟着官方文档从零搭建K8S](https://juejin.cn/post/6844903943051411469)
449 | [kubeadm部署k8s集群最全最详细](https://blog.csdn.net/Doudou_Mylove/article/details/103901732)
450 | 451 | 452 | 453 | [RDF入门](https://blog.csdn.net/txlCandy/article/details/50959358)
454 | [OWL语言](https://blog.csdn.net/zycxnanwang/article/details/86557350)
455 | 456 | -------------------------------------------------------------------------------- /10_docker/mi_docker_demo/README.md: -------------------------------------------------------------------------------- 1 | ## flask on docker demo 2 | 3 | 1. 如果用request.json则传入的数据需要json格式
2. 如果用request.values则传入的数据是表单字段或URL查询参数,对应[(key, value), ...]这种键值对形式
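为了更直观地对比,下面用 Flask 自带的 test_client 演示 request.values 的取值方式(示意代码,其中 /echo 路由是为演示假设的,并非本 demo 已有的接口):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/echo', methods=['POST'])
def echo():
    # request.values 合并了 URL 查询参数和表单字段,取出来的值都是字符串
    return jsonify({'id': request.values.get('id'),
                    'text': request.values.get('text')})

if __name__ == '__main__':
    with app.test_client() as c:
        resp = c.post('/echo?id=1223', data={'text': '我是中国人'})
        print(resp.get_json())  # {'id': '1223', 'text': '我是中国人'}
```

可以看到 id 以查询参数传入、text 以表单传入,两者都能从 request.values 里取到。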
5 | 6 | ### 这里以json为例 7 | 8 | ```python 9 | import requests 10 | json={'id': 1223, 'text': '我是中国人'} 11 | r = requests.post('http://0.0.0.0:5000/req_message', json=json) 12 | r.json() 13 | 14 | # {'responseTime': '20190515120101', 'sid': 1223, 'text_sep': '我 是 中国 人'} 15 | ``` -------------------------------------------------------------------------------- /10_docker/mi_docker_demo/app.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | from flask import Flask 4 | from flask import request 5 | from flask import make_response, Response 6 | from flask import jsonify 7 | import datetime 8 | import jieba 9 | jieba.initialize() 10 | 11 | 12 | # 创建一个falsk对象 13 | app = Flask(__name__) 14 | 15 | 16 | @app.route('/') 17 | def get_simple_test(): 18 | return 'BINZHOU TEST' 19 | 20 | @app.route('/req_message', methods=['POST']) 21 | def req_message(): 22 | print(request.json) 23 | if request.method == 'POST': 24 | id_ = request.json.get('id') 25 | text_ = request.json.get('text') 26 | text_sep_str = ' '.join(jieba.lcut(text_)) 27 | res = { 28 | 'sid': id_, 29 | 'text_sep': text_sep_str, 30 | 'responseTime': datetime.datetime.now().strftime('%Y%m%d%H%M%S')} 31 | return jsonify(res) 32 | 33 | app.config['JSON_AS_ASCII'] = False 34 | app.run(host='0.0.0.0', port=5000, debug=False) 35 | -------------------------------------------------------------------------------- /10_docker/mi_docker_demo/docker_build: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | mkdir -p release/ 4 | cp -r * /home/work 5 | pip install -r requirements.txt -U -i https://pypi.tuna.tsinghua.edu.cn/simple/ 6 | -------------------------------------------------------------------------------- /10_docker/mi_docker_demo/requirements.txt: -------------------------------------------------------------------------------- 1 | flask 2 | jieba 3 | requests 
-------------------------------------------------------------------------------- /10_docker/newapi_docker_demo/README.md: -------------------------------------------------------------------------------- 1 | ## Dockerfile 2 | 3 | ```Dockerfile 4 | FROM node:16 as builder 5 | 6 | WORKDIR /build 7 | COPY web/package.json . 8 | RUN npm install 9 | COPY ./web . 10 | COPY ./VERSION . 11 | RUN DISABLE_ESLINT_PLUGIN='true' VITE_REACT_APP_VERSION=$(cat VERSION) npm run build 12 | 13 | FROM golang AS builder2 14 | 15 | ENV GO111MODULE=on \ 16 | CGO_ENABLED=1 \ 17 | GOOS=linux 18 | 19 | WORKDIR /build 20 | ADD go.mod go.sum ./ 21 | RUN go mod download 22 | COPY . . 23 | COPY --from=builder /build/dist ./web/dist 24 | RUN go build -ldflags "-s -w -X 'one-api/common.Version=$(cat VERSION)' -extldflags '-static'" -o one-api 25 | 26 | FROM alpine 27 | 28 | RUN apk update \ 29 | && apk upgrade \ 30 | && apk add --no-cache ca-certificates tzdata \ 31 | && update-ca-certificates 2>/dev/null || true 32 | 33 | COPY --from=builder2 /build/one-api / 34 | EXPOSE 3000 35 | WORKDIR /data 36 | ENTRYPOINT ["/one-api"] 37 | ``` 38 | 39 | 40 | ## Dockerfile 解析 41 | 42 | 这个 Dockerfile 通过多个阶段构建一个含前端和后端组件的应用。每个阶段使用不同的基础镜像和步骤来完成特定的任务。 43 | 44 | ### 第一阶段:前端构建(Node.js) 45 | 46 | - **基础镜像**: 47 | - `FROM node:16 as builder`:使用 Node.js 16 版本的官方镜像作为基础镜像,并标记此构建阶段为 `builder`。 48 | - **设置工作目录**: 49 | - `WORKDIR /build`:将工作目录设置为 `/build`。 50 | - **复制文件**: 51 | - `COPY web/package.json .`:将前端代码目录下的 `package.json` 文件复制到工作目录中。 52 | - **安装依赖**: 53 | - `RUN npm install`:根据 `package.json` 安装所需依赖。 54 | - **复制前端代码和版本文件**: 55 | - `COPY ./web .`:将web文件夹下所有文件复制到工作目录。 56 | - `COPY ./VERSION .`:将项目版本文件复制到工作目录。 57 | - **构建前端项目**: 58 | - `RUN DISABLE_ESLINT_PLUGIN='true' VITE_REACT_APP_VERSION=$(cat VERSION) npm run build`:设置环境变量并执行前端构建脚本,生成生产环境用的前端文件。 59 | 60 | ### 第二阶段:后端构建(Go) 61 | 62 | - **基础镜像**: 63 | - `FROM golang AS builder2`:使用 Go 的官方镜像作为基础,并标记此阶段为 `builder2`。 64 | - **环境变量**: 65 | - 设置多个环境变量,以支持 Go 
的模块系统和确保生成的是适用于 Linux 的静态链接二进制文件。 66 | - **设置工作目录**: 67 | - `WORKDIR /build`:设置工作目录。 68 | - **添加 Go 模块文件**: 69 | - `ADD go.mod go.sum ./`:添加 Go 模块定义文件。 70 | - **下载依赖**: 71 | - `RUN go mod download`:下载 Go 依赖。 72 | - **复制代码和前端构建产物**: 73 | - `COPY . .`:复制所有后端代码到工作目录。 74 | - `COPY --from=builder /build/dist ./web/dist`:从第一阶段中复制构建好的前端文件到后端服务目录中。 75 | - **编译应用**: 76 | - `RUN go build -ldflags "-s -w -X 'one-api/common.Version=$(cat VERSION)' -extldflags '-static'" -o one-api`:使用 Go 编译命令构建应用,设置链接器选项以嵌入版本信息并优化二进制大小。 77 | 78 | ### 第三阶段:运行环境 79 | 80 | - **基础镜像**: 81 | - `FROM alpine`:使用轻量级的 Alpine Linux 镜像作为基础。 82 | - **安装证书和时区数据**: 83 | - 运行一系列命令以安装必要的证书和时区数据,确保应用可以处理 HTTPS 连接和正确的时间。 84 | - **复制编译好的应用**: 85 | - `COPY --from=builder2 /build/one-api /`:从第二阶段复制编译好的应用到根目录。 86 | - **端口和工作目录**: 87 | - `EXPOSE 3000`:声明容器在运行时会监听 3000 端口。 88 | - `WORKDIR /data`:设置工作目录,应用可能会使用此目录来存储数据。 89 | - **设置入口点**: 90 | - `ENTRYPOINT ["/one-api"]`:设置容器启动时执行的命令。 91 | 92 | ### 总结 93 | 94 | 此 Dockerfile 首先构建前端资源,然后构建后端服务,并将前端资源集成到后端服务中,最后在一个轻量级容器中运行编译好的二进制文件,实现前后端的自动化构建和部署。 95 | -------------------------------------------------------------------------------- /11_rabbitmq/README.md: -------------------------------------------------------------------------------- 1 | ## 消息队列 2 | 3 | - [1. 简单使用步骤](#简单使用步骤) 4 | - [2. 启动rabbitmq docker命令](#启动docker命令) 5 | - [3. 简单的生产者和消费者demo代码](#简单的生产者和消费者demo代码) 6 | - [4. rabbitmq实现一台服务器同时给所有的消费者发送消息](#rabbitmq实现一台服务器同时给所有的消费者发送消息) 7 | 8 | 9 | [rabbitmq tutorial](https://www.rabbitmq.com/tutorials/tutorial-one-python.html)
10 | 11 | ### 简单使用步骤 12 | 13 | [安装RabbitMQ Server](https://www.rabbitmq.com/download.html)
14 | 用docker安装即可
15 | 
16 | ### 启动docker命令
17 | 
18 | ```shell
19 | docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
20 | # 5672是AMQP通信端口,15672是web管理界面端口
21 | ```
22 | 用docker启动RabbitMQ
23 | 
24 | ### 简单的生产者和消费者demo代码
25 | 
26 | 消费者 server.py
27 | ```python
28 | #!/usr/bin/env python
29 | import pika
30 | import json
31 | 
32 | connection = pika.BlockingConnection(
33 |     pika.ConnectionParameters(host='localhost'))
34 | channel = connection.channel()
35 | 
36 | channel.queue_declare(queue='hello')
37 | 
38 | 
39 | def callback(ch, method, properties, body):
40 |     vec = json.loads(body)
41 |     print(" [x] Received ", vec)
42 | 
43 | 
44 | channel.basic_consume(
45 |     queue='hello', on_message_callback=callback, auto_ack=True)
46 | 
47 | print(' [*] Waiting for messages. To exit press CTRL+C')
48 | channel.start_consuming()
49 | ```
50 | server.py启动以后会一直监听host上的hello队列
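上面的消费者用了`auto_ack=True`,消息一投递就算确认,处理失败也不会重发。一个常见的变体是手动确认(示意,未连broker;与上面demo的差别只在callback里加一句`basic_ack`、`basic_consume`不传`auto_ack=True`):

```python
import json

def callback(ch, method, properties, body):
    # body是bytes,json.loads可以直接解析
    vec = json.loads(body)
    print(" [x] Received ", vec)
    # 处理成功后手动确认;如果处理中抛异常没走到这里,消息会被重新投递
    ch.basic_ack(delivery_tag=method.delivery_tag)

# 挂载方式(需要本地rabbitmq在运行,这里只作注释示意):
# channel.basic_consume(queue='hello', on_message_callback=callback)
# channel.start_consuming()
```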
51 | 52 | 生产者 client.py
53 | ```python
54 | #!/usr/bin/env python
55 | import pika
56 | import json
57 | 
58 | connection = pika.BlockingConnection(
59 |     pika.ConnectionParameters(host='localhost'))
60 | channel = connection.channel()
61 | 
62 | channel.queue_declare(queue='hello')
63 | 
64 | channel.basic_publish(exchange='', routing_key='hello', body=json.dumps([1.2,0.99,5.5]))
65 | print(" [x] Sent [1.2,0.99,5.5]")
66 | connection.close()
67 | ```
68 | client.py每发一次,server.py那边会打印出发送的body信息
69 | 
70 | 
71 | ### rabbitmq实现一台服务器同时给所有的消费者发送消息
72 | 
73 | 开了docker版的rabbitmq服务以后,在多台机器上先运行消费者server.py
74 | ```python
75 | #!/usr/bin/env python
76 | import pika
77 | 
78 | connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
79 | channel = connection.channel()
80 | #创建exchange的名称为logs,指定类型为fanout
81 | channel.exchange_declare(exchange='logs', exchange_type='fanout')
82 | #声明一个具名队列并绑定到logs交换机,fanout下每个消费者各用一个队列
83 | queue_name = 'task_queue1' #每台机器上的名字最好不一样
84 | result = channel.queue_declare(queue=queue_name)
85 | channel.queue_bind(exchange='logs', queue=queue_name)
86 | print(' [*] Waiting for logs. To exit press CTRL+C')
87 | def callback(ch, method, properties, body):
88 |     print(" [x] %r" % body)
89 | channel.basic_consume(queue_name, callback, auto_ack=True)
90 | channel.start_consuming()
91 | ```
92 | 
93 | 然后再用生产者client.py发送给消费者,这个时候这些消费者会同时接收到该消息
94 | ```python 95 | import pika 96 | import sys 97 | connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost')) 98 | channel = connection.channel() 99 | channel.exchange_declare(exchange='logs',exchange_type='fanout') 100 | message = ' '.join(sys.argv[1:]) or "info: Hello World!" 101 | #指定exchange的名称 102 | channel.basic_publish(exchange='logs', routing_key='', body=message) 103 | print(" [x] Sent %r" % message) 104 | connection.close() 105 | ``` 106 | 107 | 注:host都要写同一个服务器, rabbitmq服务开启的那台机器ip 108 | 109 | ### xxx -------------------------------------------------------------------------------- /12_nginx/README.md: -------------------------------------------------------------------------------- 1 | ## nginx 2 | 3 | [**1. nginx入门使用**](#nginx入门使用) 4 | 5 | [**2. nginx正则使用1(2024.4.2更新)**](#nginx正则使用1) 6 | 7 | 8 | 9 | --- 10 | 11 | ### nginx入门使用 12 | 13 |
14 | 点击展开
15 | 
16 | **1. 第一步用docker安装nginx**
17 | 
18 | ```shell
19 | docker pull nginx:latest
20 | ```
21 | 
22 | **2. 开启nginx和两个flask服务(用来模拟多个服务器的)**
23 | 
24 | ```shell
25 | # 开启nginx
26 | docker run --name=nginx -d -p 4030:80 nginx #网页访问端口4030
27 | # 开启两个flask server
28 | docker run -d -p 5001:5001 -v $PWD/flask_nginx_test:/usr/src/flask_nginx_test -w /usr/src/flask_nginx_test binzhouchn/python36:1.4 python test1.py
29 | docker run -d -p 5002:5002 -v $PWD/flask_nginx_test:/usr/src/flask_nginx_test -w /usr/src/flask_nginx_test binzhouchn/python36:1.4 python test2.py
30 | ```
31 | 
32 | nginx配置前,开启以后单独访问
33 | localhost:4030会进入nginx欢迎界面
34 | localhost:5001页面显示BINZHOU TEST 1
35 | localhost:5002页面显示BINZHOU TEST 2
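仓库里没有附上flask_nginx_test目录下的test1.py/test2.py。下面给一个只用标准库的等价示意版(假设:原脚本用flask返回固定文本),没有该镜像时也能本地复现同样的效果;test2.py只需把文本换成BINZHOU TEST 2、端口换成5002:

```python
# test1.py的标准库示意版(假设原版用flask实现,行为等价:GET返回固定文本)
from http.server import BaseHTTPRequestHandler, HTTPServer

MESSAGE = 'BINZHOU TEST 1'

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = MESSAGE.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # 不打印访问日志,保持输出干净

if __name__ == '__main__':
    HTTPServer(('0.0.0.0', 5001), Handler).serve_forever()
```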
36 | 37 | **3. 配置nginx配置文件** 38 | 39 | 文件在/etc/nginx/nginx.conf,由于这个文件include /etc/nginx/conf.d/*.conf;所以直接到/etc/nginx/conf.d/下面更改default.conf即可
40 | [更改后的default.conf](default.conf) 41 | 42 | 注: 43 | 这里172.17.0.3这些是docker虚拟ip地址,docker之间通信可以通过这个地址
44 | 负载均衡通过轮询方式
45 | 172.17.0.5:5003这个端口并没有开启,会自动忽略
46 | 
47 | **4. 配置完后重启nginx**
48 | 
49 | ```shell
50 | # 先进到nginx docker里面/etc/nginx/conf.d中运行nginx -t看下是否success
51 | docker stop nginx
52 | docker start nginx
53 | ```
54 | 
55 | 配置完nginx以及重启后,再访问
56 | localhost:4030页面会显示BINZHOU TEST 1;再刷新(重载)会显示BINZHOU TEST 2;再刷新BINZHOU TEST 1 57 | 58 | **说明nginx已经自动转到两个服务器去了**
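轮询效果也可以用脚本确认:连续请求若干次,响应应该在两个后端之间交替。下面的判断函数是纯python、可直接运行,真正访问时把注释里的requests部分打开即可(假设nginx映射在4030端口):

```python
def looks_round_robin(responses):
    """连续响应按首次出现的顺序依次交替(如 1,2,1,2,...)则认为轮询生效"""
    distinct = list(dict.fromkeys(responses))  # 保序去重
    if len(distinct) < 2:
        return False
    expected = [distinct[i % len(distinct)] for i in range(len(responses))]
    return responses == expected

# 实际访问(需要nginx和两个flask容器都在运行):
# import requests
# responses = [requests.get('http://localhost:4030').text for _ in range(6)]
# print(looks_round_robin(responses))

print(looks_round_robin(['BINZHOU TEST 1', 'BINZHOU TEST 2'] * 3))  # True
```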
59 | 60 | **5. 配置文件扩展** 61 | 62 | 5.1 一台nginx服务器,通过指定不同端口(比如4030和4031)来达到访问不同应用的目的
63 | ```shell 64 | # docker开启nginx命令如下,映射两个端口 65 | docker run --name=nginx -d -p 4030:4030 -p 4031:4031 nginx 66 | ``` 67 | [配置文件1](default1.conf) 68 | 69 | 5.2 一台nginx服务器,通过不同的路由(比如/project/guoge)来达到访问不同应用的目的
84 | 85 | ### nginx正则使用1 86 | 87 | ```shell 88 | cd /etc/nginx/conf.d 89 | #修改后重启 90 | systemctl restart nginx 91 | nginx -s reload 92 | ``` 93 | [配置文件3](default3.conf) 94 | 95 | 说明:本次使用正则的目的是当我访问 96 | http://10.28.xx.xx:8000/aimanager_gpu/recsys/时, 97 | 正则匹配后转到http://localhost:10086,后面不加/aimanager_gpu/recsys路由 98 | (如果不走正则那么proxy_pass转到http://localhost:10086后会自动拼接/aimanager_gpu/recsys) 99 | 100 | 101 | 102 | 103 | - 参考资料 104 | 105 | [nginx作为http服务器-静态页面的访问](https://www.cnblogs.com/xuyang94/p/12667844.html)
106 | [docker nginx反向代理](https://www.cnblogs.com/dotnet261010/p/12596185.html)
107 | [nginx负载均衡参考1](https://www.jianshu.com/p/4c250c1cd6cd)
108 | [nginx负载均衡参考2](https://www.cnblogs.com/diantong/p/11208508.html)
-------------------------------------------------------------------------------- /12_nginx/default.conf: -------------------------------------------------------------------------------- 1 | upstream nginx-flask-test { 2 | server 172.17.0.3:5001; 3 | server 172.17.0.4:5002; 4 | server 172.17.0.5:5003; 5 | } 6 | 7 | server { 8 | listen 80; 9 | listen [::]:80; 10 | server_name localhost; 11 | 12 | #charset koi8-r; 13 | #access_log /var/log/nginx/host.access.log main; 14 | 15 | location / { 16 | root /usr/share/nginx/html; 17 | index index.html index.htm; 18 | proxy_pass http://nginx-flask-test; 19 | } 20 | 21 | #error_page 404 /404.html; 22 | 23 | # redirect server error pages to the static page /50x.html 24 | # 25 | error_page 500 502 503 504 /50x.html; 26 | location = /50x.html { 27 | root /usr/share/nginx/html; 28 | } 29 | 30 | # proxy the PHP scripts to Apache listening on 127.0.0.1:80 31 | # 32 | #location ~ \.php$ { 33 | # proxy_pass http://127.0.0.1; 34 | #} 35 | 36 | # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 37 | # 38 | #location ~ \.php$ { 39 | # root html; 40 | # fastcgi_pass 127.0.0.1:9000; 41 | # fastcgi_index index.php; 42 | # fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name; 43 | # include fastcgi_params; 44 | #} 45 | # deny access to .htaccess files, if Apache's document root 46 | # concurs with nginx's one 47 | # 48 | #location ~ /\.ht { 49 | # deny all; 50 | #} 51 | } 52 | 53 | -------------------------------------------------------------------------------- /12_nginx/default1.conf: -------------------------------------------------------------------------------- 1 | upstream server1 { 2 | server 192.168.0.108:5004; 3 | } 4 | 5 | upstream server2 { 6 | server 192.168.0.108:5007; 7 | } 8 | 9 | server { 10 | listen 4030; 11 | server_name localhost; 12 | client_max_body_size 1024M; 13 | 14 | #默认路由放最下面 15 | location / { 16 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 17 | proxy_set_header Host 
$http_host; 18 | proxy_pass http://server1; 19 | } 20 | } 21 | 22 | server { 23 | listen 4031; 24 | server_name localhost; 25 | client_max_body_size 1024M; 26 | 27 | #默认路由放最下面 28 | location / { 29 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 30 | proxy_set_header Host $http_host; 31 | proxy_pass http://server2; 32 | } 33 | } 34 | -------------------------------------------------------------------------------- /12_nginx/default2.conf: -------------------------------------------------------------------------------- 1 | upstream server1 { 2 | server 192.168.0.108:5004; 3 | } 4 | 5 | upstream server2 { 6 | server 192.168.0.108:5007; 7 | } 8 | 9 | server { 10 | listen 4030; 11 | server_name localhost; 12 | client_max_body_size 1024M; 13 | 14 | location /project/guoge { 15 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 16 | proxy_set_header Host $http_host; 17 | proxy_pass http://server2; 18 | } 19 | 20 | #默认路由放最下面 21 | location / { 22 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 23 | proxy_set_header Host $http_host; 24 | proxy_pass http://server1; 25 | } 26 | } 27 | 28 | 29 | -------------------------------------------------------------------------------- /12_nginx/default3.conf: -------------------------------------------------------------------------------- 1 | upstream recsys { 2 | server localhost:10086; 3 | } 4 | 5 | server { 6 | server_name localhost; 7 | listen 8000; 8 | location ~* /aimanager_gpu/recsys/ { 9 | if ($request_uri ~ /aimanager_gpu/recsys/(.+)) 10 | { 11 | set $rightUrl $1; 12 | } 13 | proxy_pass http://recsys/$rightUrl; 14 | } 15 | } -------------------------------------------------------------------------------- /13_airflow/README.md: -------------------------------------------------------------------------------- 1 | ## airflow 气流 2 | 3 | 任务调度神器airflow之初体验
4 | https://zhuanlan.zhihu.com/p/42052108 5 | 6 | Airflow入门及使用
7 | https://zhuanlan.zhihu.com/p/84332879 8 | 9 | 【调研】Airflow使用
10 | https://www.jianshu.com/p/75c64b63122b -------------------------------------------------------------------------------- /14_go/README.md: -------------------------------------------------------------------------------- 1 | # python调用golang 2 | 3 | ## 示例一 python端输入int返回int 4 | 5 | ```Go 6 | package main 7 | 8 | import ( 9 | "C" 10 | ) 11 | 12 | func f1(x int) int { 13 | return x*x + 2 14 | } 15 | 16 | //export Fib 17 | func Fib(n int) int { 18 | if n == 1 || n == 2 { 19 | return 1 20 | } else { 21 | return Fib(n-1) + Fib(n-2) + f1(1) 22 | } 23 | } 24 | 25 | func main() {} 26 | ``` 27 | 28 | //go build -buildmode=c-shared -o _fib.so fib.go
29 | //参考链接https://blog.csdn.net/cainiao_python/article/details/107724309
30 | //将_fib.so文件拷贝到python文件夹下
31 | 32 | ```python 33 | import ctypes 34 | import time 35 | from ctypes import * 36 | so = ctypes.cdll.LoadLibrary('./_fib.so') 37 | start = time.time() 38 | result = so.Fib(40) 39 | end = time.time() 40 | print(f'斐波那契数列第40项:{result},耗时:{end - start}') 41 | ``` 42 | 43 | ## 示例二 python端输入string返回string(推荐看示例三) 44 | 45 | ```Go 46 | package main 47 | 48 | import ( 49 | "C" 50 | "database/sql" 51 | "log" 52 | "strings" 53 | 54 | _ "github.com/go-sql-driver/mysql" 55 | ) 56 | 57 | //export Gdbc 58 | func Gdbc(uri *C.char) string { 59 | log.Println(uri) 60 | db, err := sql.Open("mysql", C.GoString(uri)) 61 | if err != nil { 62 | log.Fatalln(err) 63 | } 64 | rows, err := db.Query("SELECT feature_word FROM insurance_qa.feature_words") 65 | if err != nil { 66 | log.Fatalln(err) 67 | } 68 | res := []string{} 69 | for rows.Next() { 70 | var s string 71 | err = rows.Scan(&s) 72 | if err != nil { 73 | log.Fatalln(err) 74 | } 75 | // log.Printf("found row containing %q", s) 76 | res = append(res, s) 77 | } 78 | rows.Close() 79 | return strings.Join(res, ",") 80 | } 81 | 82 | func main() { 83 | // res := Gdbc("username:password@tcp(localhost:3306)/database?charset=utf8") 84 | // fmt.Println(res) 85 | } 86 | ``` 87 | //go build -buildmode=c-shared -o _gdbc.so test.go
88 | //将_gdbc.so文件拷贝到python文件夹下
89 | 90 | ```python 91 | import ctypes 92 | import time 93 | from ctypes import * 94 | class StructPointer(Structure): 95 | _fields_ = [("p", c_char_p), ("n", c_longlong)] 96 | 97 | so = ctypes.cdll.LoadLibrary('./_gdbc.so') 98 | so.Gdbc.restype = StructPointer 99 | start = time.time() 100 | uri = "username:password@tcp(localhost:3306)/database?charset=utf8" 101 | res = so.Gdbc(uri.encode("utf-8")) 102 | print(res.n) 103 | print(res.p[:res.n].decode())#print(res.p.decode())这样貌似也没问题 104 | end = time.time() 105 | print(f'耗时:{end - start}') 106 | ``` 107 | 108 | ## 示例三 python端输入string,go查询数据库然后返回json str 109 | 110 | ```Go 111 | package main 112 | 113 | import ( 114 | "C" 115 | "database/sql" 116 | "encoding/json" 117 | "log" 118 | 119 | _ "github.com/go-sql-driver/mysql" 120 | ) 121 | 122 | type Fw struct { 123 | feature_word string 124 | word_type string 125 | id int64 126 | } 127 | 128 | //export Gdbc 129 | func Gdbc(uri *C.char) string { 130 | db, err := sql.Open("mysql", C.GoString(uri)) 131 | //设置数据库最大连接数 132 | db.SetConnMaxLifetime(100) 133 | //设置上数据库最大闲置连接数 134 | db.SetMaxIdleConns(10) 135 | if err != nil { 136 | log.Fatalln(err) 137 | } 138 | rows, err := db.Query("SELECT feature_word,word_type,id FROM insurance_qa.feature_words") 139 | if err != nil { 140 | log.Fatalln(err) 141 | } 142 | res := [][]interface{}{} 143 | var fw Fw 144 | for rows.Next() { 145 | err = rows.Scan(&fw.feature_word, &fw.word_type, &fw.id) 146 | if err != nil { 147 | log.Fatalln(err) 148 | } 149 | // log.Printf("found row containing %q", s) 150 | tmp := []interface{}{} 151 | tmp = append(tmp, fw.feature_word) 152 | tmp = append(tmp, fw.word_type) 153 | tmp = append(tmp, fw.id) 154 | res = append(res, tmp) 155 | // res = append(res, []interface{}{fw.feature_word, fw.word_type, fw.id})//上面的一行写法 156 | } 157 | rows.Close() 158 | b, err := json.Marshal(res) 159 | if err != nil { 160 | panic(err) 161 | } 162 | result := string(b) 163 | return result 164 | } 165 | 166 | func main() {} 167 | 
168 | ``` 169 | 170 | //go build -buildmode=c-shared -o _gdbc.so test.go
171 | //将_gdbc.so文件拷贝到python文件夹下
172 | 173 | ```python 174 | import ctypes 175 | import time 176 | import json 177 | from ctypes import * 178 | class StructPointer(Structure): 179 | _fields_ = [("p", c_char_p), ("n", c_longlong)] 180 | 181 | so = ctypes.cdll.LoadLibrary('./_gdbc.so') 182 | so.Gdbc.restype = StructPointer 183 | start = time.time() 184 | uri = "username:password@tcp(localhost:3306)/database?charset=utf8" 185 | res = so.Gdbc(uri.encode("utf-8")) 186 | print(res.n) 187 | print(res.p.decode()) 188 | print(json.loads(res.p.decode())) 189 | end = time.time() 190 | ``` 191 | 192 | ## -------------------------------------------------------------------------------- /15_ansible/README.md: -------------------------------------------------------------------------------- 1 | # ansible笔记 2 | 3 | ```shell 4 | 在/etc/ansible/ansible.cfg下配置[model] 5 | # ping 6 | ansible model -m ping 7 | # ansible-playbook写剧本 8 | ansible-playbook xxx.yaml 9 | # 传文件 10 | ansible model -m copy -a "src=./test.txt dest=/home/zhoubin" 11 | # 创建文件(ansible-playbook形式) 12 | - hosts: model 13 | remote_user: zhoubin 14 | tasks: 15 | - name: "create test2.txt in the /etc directory" 16 | file: 17 | path: "/home/zhoubin/test2.txt" 18 | state: "touch" 19 | # 创建文件夹(ansible-playbook形式) 20 | - hosts: model 21 | remote_user: zhoubin 22 | tasks: 23 | - name: "create tmp file in the /etc directory" 24 | file: 25 | path: "/home/zhoubin/tmp" 26 | state: "directory" 27 | # 删除文件(ansible-playbook形式) 28 | - hosts: model 29 | remote_user: zhoubin 30 | tasks: 31 | - name: "delete test.txt in the /etc directory" 32 | file: 33 | path: "/home/zhoubin/test.txt" 34 | state: "absent" 35 | # 删除多个文件(ansible-playbook形式) 36 | - hosts: model 37 | remote_user: zhoubin 38 | tasks: 39 | - name: "delete multi files in the /etc directory" 40 | file: 41 | path: "{{ item }}" 42 | state: "absent" 43 | with_items: 44 | - /home/zhoubin/test1.txt 45 | - /home/zhoubin/test2.txt 46 | # 将远程服务器文件拷贝到本机 47 | ansible model -m fetch -a "src=/home/zhoubin/test.txt dest=./ 
force=yes backup=yes" 48 | 49 | # 写一个剧本(传docker镜像并且加载) become:yes可以避免sudo输密码! 50 | - hosts: model 51 | remote_user: zhoubin 52 | tasks: 53 | - name: copy docker image 54 | copy: src=./py37.tar.gz dest=/home/zhoubin 55 | - name: load image 56 | shell: docker load -i /home/zhoubin/py37.tar.gz 57 | become: yes 58 | 59 | 60 | ``` 61 | 62 | 63 | ### 附录 64 | 65 | [超简单ansible2.4.2.0与playbook入门教程](https://blog.csdn.net/qq_45206551/article/details/105004233)
66 | [ansible-命令使用说明](https://www.cnblogs.com/scajy/p/11353825.html)
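上面这类ad-hoc命令也可以从python里拼出来批量执行。下面只示意构造argv的部分(`copy_cmd`是笔记里没有的、纯演示用的小函数),真正执行时交给`subprocess.run`即可:

```python
import subprocess  # 实际执行时使用:subprocess.run(cmd, check=True)

def copy_cmd(group, src, dest):
    """构造 ansible copy 模块的ad-hoc命令argv(示意)"""
    return ['ansible', group, '-m', 'copy', '-a', 'src=%s dest=%s' % (src, dest)]

cmd = copy_cmd('model', './test.txt', '/home/zhoubin')
print(' '.join(cmd))
# 需要真正下发时:subprocess.run(cmd, check=True)
```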
67 | -------------------------------------------------------------------------------- /99_pycharm_archive/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/.DS_Store -------------------------------------------------------------------------------- /99_pycharm_archive/README.md: -------------------------------------------------------------------------------- 1 | # 1. pycharm github 2 | 3 | pubilic email: binzhouchn@gmail.com 4 | 5 | **每次github push东西之前,先pull一下** 6 | 7 | [github教程](https://www.liaoxuefeng.com/wiki/0013739516305929606dd18361248578c67b8067c8c017b000) 8 | 9 | 简单的步骤:
10 | **首先:** 安装好git, macOS和windows不一样
11 | 12 | **第1步:** 创建SSH Key。在用户主目录下,看看有没有.ssh目录,如果有,再看看这个目录下有没有id_rsa和id_rsa.pub这两个文件, 13 | 如果已经有了,可直接跳到下一步。如果没有,打开Shell(Windows下打开Git Bash),创建SSH Key:
14 | ``` 15 | ssh-keygen -t rsa -C "binzhouchn@gmail.com" 16 | ``` 17 | 你需要把邮件地址换成你自己的邮件地址,然后一路回车,使用默认值即可,由于这个Key也不是用于军事目的,所以也无需设置密码。 18 | macOS存放密匙路径: /Users/binzhou/.ssh/id_rs
19 | 如果一切顺利的话,可以在用户主目录里找到.ssh目录,里面有id_rsa和id_rsa.pub两个文件,这两个就是SSH Key的秘钥对,id_rsa是私钥, 20 | 不能泄露出去,id_rsa.pub是公钥,可以放心地告诉任何人。 21 | 22 | **第2步:** 登陆GitHub,打开“Account settings”,“SSH Keys”页面: 23 | 然后,点“Add SSH Key”,填上任意Title,在Key文本框里粘贴id_rsa.pub文件的内容:
24 | 当然,GitHub允许你添加多个Key。假定你有若干电脑,你一会儿在公司提交,一会儿在家里提交,只要把每台电脑的Key都添加到GitHub,就可以在每台电脑上往GitHub推送了。 25 | 26 | 最后友情提示,在GitHub上免费托管的Git仓库,任何人都可以看到喔(但只有你自己才能改)。所以,不要把敏感信息放进去。 27 | 28 | **第3步:** 如果想在pycharm中使用git可以先配置pycharm(以windows为例的话)
29 | 30 | 31 | 32 | 33 | # 2. pycharm远程服务器生成项目及调试代码 34 | 35 | 1. 打开pycharm -> File -> Settings -> Project Interpreter
36 | 2. 点Project Interpreter轮子,选择add remote选择SSH然后填入用户密码等apply再OK 37 | 3. Tools -> Deployment -> Configuration点击左上角的加号(取名字v2并选SFTP) 38 | 4. Configuration中配置Connection和Mappings如下图所示 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | # 3. pycharm中遇到的一些问题 48 | 49 | 1. pycharm运行app.py需要在configuration->Environment variables中加入 LANG=en_US.utf-8;LC_ALL=en_US.utf-8 50 | 51 | # 4. pycharm永久激活 52 | 53 | - 下载激活插件:jetbrains-agent.jar(见激活码文件夹),并将jetbrains-agent.jar放到PyCharm安装目录bin下面,例如/Applications/PyCharm.app/Contents/bin 54 | - 首次安装的Pycharm,需要点击激活窗口的Evaluate for free免费试用,然后再创建一个空项目进入主页窗口。 55 | - 在菜单栏Help中选择Edit Custom VM Options… 在弹框中选择Create 56 | - 在最后一行添加:-javaagent:/Applications/PyCharm.app/Contents/bin/jetbrains-agent.jar 57 | - 修改完成后,重启Pycharm,点击菜单栏中的 “Help” -> “Register”,输入永久激活码(见激活码文件夹)完成完成激活,这里的激活码与方法一种激活码不同 58 | - 查看有效期的步骤:点击:Help->About,这里可以看到你的pycharm有效期到2089年了,cheers bro! 59 | 60 | 61 | 62 | intellij idea也是同理激活,最后“Help” -> “Register”,license server激活方式输入https://fls.jet... 63 | 64 | # 5. xxx 65 | -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_activ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_activ.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_git1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_git1.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_git2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_git2.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_remote1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote1.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_remote2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote2.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_remote3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote3.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_remote4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote4.png -------------------------------------------------------------------------------- /99_pycharm_archive/pic/pycharm_remote5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/pic/pycharm_remote5.png -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/.DS_Store: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/激活码/.DS_Store -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/jetbrains-agent.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/激活码/jetbrains-agent.jar -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/永久激活码/激活码.txt: -------------------------------------------------------------------------------- 1 | JQE11SV0BR-eyJsaWNlbnNlSWQiOiJKUUUxMVNWMEJSIiwibGljZW5zZWVOYW1lIjoicGlnNiIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJpY3Rpb24iOiIiLCJjaGVja0NvbmN1cnJlbnRVc2UiOmZhbHNlLCJwcm9kdWN0cyI6W3siY29kZSI6IklJIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkFDIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRQTiIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJQUyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJHTyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJETSIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJDTCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSUzAiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkQiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUEMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUk0iLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3Ii
wicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiV1MiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREIiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlNVIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9XSwiaGFzaCI6IjEyNzk2ODc3LzAiLCJncmFjZVBlcmlvZERheXMiOjcsImF1dG9Qcm9sb25nYXRlZCI6ZmFsc2UsImlzQXV0b1Byb2xvbmdhdGVkIjpmYWxzZX0=-khgsQrnDiglknF0m+yyoYGJXX4vFE3IIVaoMd0bkpfAlMiYM4FUK1JM7uMnVSN0NBC7qtZjYlNzPscEyKE8634uGuY/uToFQnIOCtyUfBxB6j0wF/DcCjhKMNDbnJ1RKZ2VaALuC9B6d6lhtEKm9+urXWTBq7h2VfIBv5wk1Ul9T/m9Dwkz/LccTqnxO0PP288fF13ZbmcLI1/D0dqp/QxYshW6CLR+2Tvk6QCPoaOTKDU/eL1AssD7/mO1g2ZJA+k//8qfRMLgdLmLrMdyiaIhrsM/jJk2qDfTaMcCNylkWXLgKwSvEQG95IhitLN9+GQ4pBW3gOTNl82Gem7jEkA==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g0
2tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g== -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/永久激活码/激活码1.txt: -------------------------------------------------------------------------------- 1 | A82DEE284F-eyJsaWNlbnNlSWQiOiJBODJERUUyODRGIiwibGljZW5zZWVOYW1lIjoiaHR0cHM6Ly96aGlsZS5pbyIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJpY3Rpb24iOiJVbmxpbWl0ZWQgbGljZW5zZSB0aWxsIGVuZCBvZiB0aGUgY2VudHVyeS4iLCJjaGVja0NvbmN1cnJlbnRVc2UiOmZhbHNlLCJwcm9kdWN0cyI6W3siY29kZSI6IklJIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlMwIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiV1MiLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSRCIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlJDIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREMiLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJEQiIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlJNIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiRE0iLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJBQyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRQTiIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkdPIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUFMiLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJDTCIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlBDIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlNVIiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In1dLCJoYXNoIjoiODkwNzA3MC8wIiwiZ3JhY2VQZXJpb2REYXlzIjowLCJhdXRvUHJvbG9uZ2F0ZWQiOmZhbHNlLCJpc0F1dG9Qcm9sb25nYXRlZCI6ZmFsc2V9-5epo90Xs7KIIBb8ckoxnB/AZQ8Ev7rFrNqwFhBAsQYsQyhvqf1FcYdmlecFWJBHSWZU9b41kvsN4bwAHT5PiznOTmfvGv1MuOzMO0VOXZlc+edepemgpt+t3GUHvfGtzWFYeKeyCk+CLA9BqUzHRT
gl2uBoIMNqh5izlDmejIwUHLl39QOyzHiTYNehnVN7GW5+QUeimTr/koVUgK8xofu59Tv8rcdiwIXwTo71LcU2z2P+T3R81fwKkt34evy7kRch4NIQUQUno//Pl3V0rInm3B2oFq9YBygPUdBUbdH/KHROyohZRD8SaZJO6kUT0BNvtDPKF4mCT1saWM38jkw==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g== -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/永久激活码/激活码2.txt: -------------------------------------------------------------------------------- 1 | 
3AGXEJXFK9-eyJsaWNlbnNlSWQiOiIzQUdYRUpYRks5IiwibGljZW5zZWVOYW1lIjoiaHR0cHM6Ly96aGlsZS5pbyIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJpY3Rpb24iOiIiLCJjaGVja0NvbmN1cnJlbnRVc2UiOmZhbHNlLCJwcm9kdWN0cyI6W3siY29kZSI6IklJIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkFDIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRQTiIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJQUyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJHTyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJETSIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJDTCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSUzAiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUkQiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUEMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUk0iLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiV1MiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREIiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiREMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiUlNVIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9XSwiaGFzaCI6IjEyNzk2ODc3LzAiLCJncmFjZVBlcmlvZERheXMiOjcsImF1dG9Qcm9sb25nYXRlZCI6ZmFsc2UsImlzQXV0b1Byb2xvbmdhdGVkIjpmYWxzZX0=-WGTHs6XpDhr+uumvbwQPOdlxWnQwgnGaL4eRnlpGKApEEkJyYvNEuPWBSrQkPmVpim/8Sab6HV04Dw3IzkJT0yTc29sPEXBf69+7y6Jv718FaJu4MWfsAk/ZGtNIUOczUQ0iGKKnSSsfQ/3UoMv0q/yJcfvj+me5Zd/gfaisCCMUaGjB/lWIPpEPzblDtVJbRexB1MALrLCEoDv3ujcPAZ7xWb54DiZwjYhQvQ+CvpNNF2je
Tku7lbm5v+BoDsdeRq7YBt9ANLUKPr2DahcaZ4gctpHZXhG96IyKx232jYq9jQrFDbQMtVr3E+GsCekMEWSD//dLT+HuZdc1sAIYrw==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g== -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/永久激活码/激活码3.txt: -------------------------------------------------------------------------------- 1 | 
KNBB2QUUR1-eyJsaWNlbnNlSWQiOiJLTkJCMlFVVVIxIiwibGljZW5zZWVOYW1lIjoiZ2hib2tlIiwiYXNzaWduZWVOYW1lIjoiIiwiYXNzaWduZWVFbWFpbCI6IiIsImxpY2Vuc2VSZXN0cmljdGlvbiI6IiIsImNoZWNrQ29uY3VycmVudFVzZSI6ZmFsc2UsInByb2R1Y3RzIjpbeyJjb2RlIjoiSUkiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiQUMiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In0seyJjb2RlIjoiRFBOIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlBTIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkdPIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkRNIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IkNMIiwiZmFsbGJhY2tEYXRlIjoiMjA4OS0wNy0wNyIsInBhaWRVcFRvIjoiMjA4OS0wNy0wNyJ9LHsiY29kZSI6IlJTMCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSQyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSRCIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJQQyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSTSIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJXUyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJEQiIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJEQyIsImZhbGxiYWNrRGF0ZSI6IjIwODktMDctMDciLCJwYWlkVXBUbyI6IjIwODktMDctMDcifSx7ImNvZGUiOiJSU1UiLCJmYWxsYmFja0RhdGUiOiIyMDg5LTA3LTA3IiwicGFpZFVwVG8iOiIyMDg5LTA3LTA3In1dLCJoYXNoIjoiMTI3OTY4NzcvMCIsImdyYWNlUGVyaW9kRGF5cyI6NywiYXV0b1Byb2xvbmdhdGVkIjpmYWxzZSwiaXNBdXRvUHJvbG9uZ2F0ZWQiOmZhbHNlfQ==-1iV7BA/baNqv0Q5yUnAphUmh66QhkDRX+qPL09ICuEicBqiPOBxmVLLCVUpkxhrNyfmOtat2LcHwcX/NHkYXdoW+6aS0S388xe1PV2oodiPBhFlEaOac42UQLgP4EidfGQSvKwC9tR1zL5b2CJPQKZ7iiHh/iKBQxP6OBMUP1T7j3Fe1rlxfYPc92HRZf6cO+C0+buJP5ERZkyIn5ZrVM4TEnWrRHbpL8SVNq4yqfc+NwoRzRSNC++81VDS3
AXv9c91YeZJz6JXO7AokIk54wltr42FLNuKbozvB/HCxV9PA5vIiM+kZY1K0w5ytgxEYKqA87adA7R5xL/crpaMxHQ==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA5ndaik1GD0nyTdqkZgURQZGW+RGxCdBITPXIwpjhhaD0SXGa4XSZBEBoiPdY6XV6pOfUJeyfi9dXsY4MmT0D+sKoST3rSw96xaf9FXPvOjn4prMTdj3Ji3CyQrGWeQU2nzYqFrp1QYNLAbaViHRKuJrYHI6GCvqCbJe0LQ8qqUiVMA9wG/PQwScpNmTF9Kp2Iej+Z5OUxF33zzm+vg/nYV31HLF7fJUAplI/1nM+ZG8K+AXWgYKChtknl3sW9PCQa3a3imPL9GVToUNxc0wcuTil8mqveWcSQCHYxsIaUajWLpFzoO2AhK4mfYBSStAqEjoXRTuj17mo8Q6M2SHOcwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQBonMu8oa3vmNAa4RQP8gPGlX3SQaA3WCRUAj6Zrlk8AesKV1YSkh5D2l+yUk6njysgzfr1bIR5xF8eup5xXc4/G7NtVYRSMvrd6rfQcHOyK5UFJLm+8utmyMIDrZOzLQuTsT8NxFpbCVCfV5wNRu4rChrCuArYVGaKbmp9ymkw1PU6+HoO5i2wU3ikTmRv8IRjrlSStyNzXpnPTwt7bja19ousk56r40SmlmC04GdDHErr0ei2UbjUua5kw71Qn9g02tL9fERI2sSRjQrvPbn9INwRWl5+k05mlKekbtbu2ev2woJFZK4WEXAd/GaAdeZZdumv8T2idDFL7cAirJwcrbfpawPeXr52oKTPnXfi0l5+g9Gnt/wfiXCrPElX6ycTR6iL3GC2VR4jTz6YatT4Ntz59/THOT7NJQhr6AyLkhhJCdkzE2cob/KouVp4ivV7Q3Fc6HX7eepHAAF/DpxwgOrg9smX6coXLgfp0b1RU2u/tUNID04rpNxTMueTtrT8WSskqvaJd3RH8r7cnRj6Y2hltkja82HlpDURDxDTRvv+krbwMr26SB/40BjpMUrDRCeKuiBahC0DCoU/4+ze1l94wVUhdkCfL0GpJrMSCDEK+XEurU18Hb7WT+ThXbkdl6VpFdHsRvqAnhR2g4b+Qzgidmuky5NUZVfEaZqV/g== -------------------------------------------------------------------------------- /99_pycharm_archive/激活码/非永久激活码/Pycharm方式一激活码汇总.docx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/binzhouchn/python_notes/52b87285c4147cc2bd43d69d0243a2d6357f4183/99_pycharm_archive/激活码/非永久激活码/Pycharm方式一激活码汇总.docx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | [![Analytics](https://ga-beacon.appspot.com/GA-80121379-2/notes-python)](https://github.com/binzhouchn/feature_engineering) 3 | 4 | # Python Notes 5 | > Version: 0.5
6 | > Author: binzhou
7 | > Email: binzhouchn@gmail.com
8 | 9 | `Github` renders `ipynb` files slowly; it is recommended to view this project on [Nbviewer](http://nbviewer.ipython.org/github/lijin-THU/notes-python/blob/master/index.ipynb) instead. 10 | 11 | [Python release downloads (all versions)](https://www.python.org/ftp/python/)
12 | 13 | --- 14 | 15 | ## Introduction 16 | 17 | `Python 3.10` is installed by default, along with the third-party packages `gensim`, `tqdm`, and `flask`. 18 | 19 | To create an Anaconda virtual environment with a downgraded Python version: `conda create -n tableqa python=3.9` 20 | 21 | > Life is short, use Python. 22 | 23 | [Anaconda](http://www.continuum.io/downloads) is recommended: this distribution bundles most of the commonly used packages. 24 | 25 |
26 | Using Chinese PyPI mirrors with pip 27 | 28 | [Configuring pip to use a Chinese mirror](https://www.cnblogs.com/wqpkita/p/7248525.html) 29 | ```shell 30 | pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com <package> 31 | pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package> 32 | pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com <package> --use-feature=2020-resolver  # resolves dependency conflicts during install 33 | ``` 34 | ``` 35 | One-off example: 36 | pip install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com flask 37 | # On a corporate machine behind a proxy, set the proxy before installing packages inside the docker python3.6 container 38 | pip --proxy=proxyAddress:port install -i http://pypi.douban.com/simple --trusted-host pypi.douban.com flask 39 | ``` 40 | 41 |
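The mirror/proxy invocations above can also be built programmatically, e.g. when automating installs from scripts. A minimal sketch — the helper name `pip_mirror_cmd` and the default Tsinghua index are illustrative choices, not part of the original notes; the function only constructs the command, and you pass it to `subprocess` to actually run it:

```python
import subprocess
import sys

def pip_mirror_cmd(package, index_url="https://pypi.tuna.tsinghua.edu.cn/simple",
                   proxy=None):
    """Build a `pip install` argv pulling from a mirror index.

    Mirrors the shell commands above: -i selects the index, and an
    optional --proxy (e.g. "proxyAddress:port") handles corporate networks.
    """
    cmd = [sys.executable, "-m", "pip"]
    if proxy:
        cmd.append("--proxy=" + proxy)
    cmd += ["install", "-i", index_url, package]
    return cmd

if __name__ == "__main__":
    # Dry illustration: print the command rather than running it.
    print(" ".join(pip_mirror_cmd("flask")))
    # To actually install: subprocess.check_call(pip_mirror_cmd("flask"))
```

Using `sys.executable -m pip` (instead of a bare `pip`) guarantees the package lands in the interpreter that is currently running, which matters when several conda environments coexist.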
42 | 43 |
44 | pip mirror configuration 45 | 46 | pip install mirror configuration (Linux) 47 | ``` 48 | # Create a pip.conf file in the .pip folder under home (or at the same level as the anaconda folder), then copy the contents below into it 49 | [global] 50 | trusted-host = pypi.tuna.tsinghua.edu.cn 51 | index-url = https://pypi.tuna.tsinghua.edu.cn/simple 52 | ``` 53 | pip install mirror configuration (Windows) 54 | ``` 55 | # Under C:\Users\Administrator, create a pip folder containing a pip.ini text file with the following contents: 56 | [global] 57 | index-url = https://pypi.tuna.tsinghua.edu.cn/simple 58 | [install] 59 | trusted-host = pypi.tuna.tsinghua.edu.cn 60 | or 61 | [global] 62 | index-url = http://mirrors.aliyun.com/pypi/simple/ 63 | [install] 64 | trusted-host = mirrors.aliyun.com 65 | ``` 66 |
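Since pip.conf/pip.ini use INI syntax, the config above can be generated with the standard library's `configparser`. A minimal sketch — the helper name `write_pip_conf` is ours, and it writes to a temporary directory purely for illustration (on Linux the real file lives under the `.pip` folder as described above):

```python
import configparser
import tempfile
from pathlib import Path

def write_pip_conf(directory, index_url, trusted_host):
    """Write a pip.conf with the [global]/[install] layout shown above."""
    conf = configparser.ConfigParser()
    conf["global"] = {"index-url": index_url}
    conf["install"] = {"trusted-host": trusted_host}
    path = Path(directory) / "pip.conf"
    with open(path, "w") as f:
        conf.write(f)
    return path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = write_pip_conf(tmp,
                           "https://pypi.tuna.tsinghua.edu.cn/simple",
                           "pypi.tuna.tsinghua.edu.cn")
        print(p.read_text())
```

The same function works for the Windows pip.ini or the Aliyun variant — only the arguments change.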
67 | 68 | ## Upgrading to Python 3.12 with conda 69 | 70 | Method 1
71 | https://qa.1r1g.com/sf/ask/4099772281/
72 | ```shell 73 | conda update -n base -c defaults conda 74 | conda install -c anaconda python=3.12 75 | # then reinstall your dependencies 76 | ``` 77 | Method 2 (or use a virtual environment)
78 | ``` 79 | $ conda create -p /your_path/env_name python=3.12 80 | # activate the environment 81 | $ source activate /your_path/env_name 82 | # deactivate the environment 83 | $ source deactivate 84 | # remove the environment 85 | $ conda env remove -p /your_path/env_name 86 | ``` 87 | 88 | ## Other recommended Python repositories 89 | 90 | [All algorithms implemented in Python - for education](https://github.com/TheAlgorithms/Python/)
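Both conda upgrade paths above target Python 3.12; after switching environments it is easy to end up on the wrong interpreter, so a script can check the version it is actually running under. A minimal sketch — the helper name `meets_min_version` is illustrative, not from the notes:

```python
import sys

def meets_min_version(required=(3, 12), actual=None):
    """Return True if the given (or running) interpreter is at least `required`."""
    if actual is None:
        actual = sys.version_info[:2]
    # Tuple comparison is lexicographic: (3, 9) < (3, 12) as intended.
    return tuple(actual) >= tuple(required)

if __name__ == "__main__":
    ok = meets_min_version()
    print("Python", sys.version.split()[0], "OK" if ok else "too old")
```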
91 | --------------------------------------------------------------------------------