├── README.md
├── code
│   ├── financial_data_visualization.ipynb
│   ├── process.ipynb
│   ├── python_to_neo4j.ipynb
│   ├── stock_data.ipynb
│   ├── stock_text_data_analysis.ipynb
│   └── text_analysis.ipynb
├── cypher_cheetsheet
│   └── README.md
├── data
│   ├── 000001.XSHE.csv
│   ├── 000063.XSHE.csv
│   └── latest_news.csv
├── docs
│   ├── 小型金融知识图谱构建示范.pdf
│   └── 小型金融知识图谱构建示范.pptx
├── font
│   └── FZLTCXHJW.TTF
├── images
│   ├── aaa.png
│   ├── average_price_plot.png
│   ├── close_price_plot.png
│   ├── cn.png
│   ├── corr_node.png
│   ├── corr_node_with_return.png
│   ├── create_node.png
│   ├── create_releationship.png
│   ├── daily_return_plot.png
│   ├── link_prediction_nodes.png
│   ├── monte_2_plot.png
│   ├── monte_3_plot.png
│   ├── monte_plot.png
│   ├── multi_rets_plot.png
│   ├── nodes_pcc.png
│   ├── others_alg.png
│   ├── pcc.png
│   ├── ra.png
│   ├── rets_plot.png
│   ├── return_risk.png
│   ├── return_risk_2.png
│   ├── tn.png
│   ├── volume_plot.png
│   └── word_cloud.png
└── jar
    └── graph-algorithms-algo-3.5.4.0.jar


/README.md:
--------------------------------------------------------------------------------
## Building a Small Financial Knowledge Graph: An Example

![author](https://img.shields.io/static/v1?label=Author&message=junmingguo&color=green)
![language](https://img.shields.io/static/v1?label=Language&message=python3&color=orange)
![topics](https://img.shields.io/static/v1?label=Topics&message=knowledge-graph&color=blue)

- [1. Knowledge graph storage](#1-knowledge-graph-storage)
  - [1.1 Characteristics of the Resource Description Framework](#11-characteristics-of-the-resource-description-framework)
  - [1.2 Characteristics of graph databases](#12-characteristics-of-graph-databases)
- [2. The neo4j graph database](#2-the-neo4j-graph-database)
  - [2.1 Download](#21-download)
  - [2.2 Startup and login](#22-startup-and-login)
    - [2.2.1 Windows](#221-windows)
    - [2.2.2 MacOS](#222-macos)
  - [2.3 Background knowledge](#23-background-knowledge)
  - [2.4 Possible problems when installing on Windows, and fixes](#24-possible-problems-when-installing-on-windows-and-fixes)
- [3. Preparing data for the knowledge graph](#3-preparing-data-for-the-knowledge-graph)
  - [3.1 Free, open financial data APIs](#31-free-open-financial-data-apis)
    - [3.1.1 Tushare](#311-tushare)
    - [3.1.2 JoinQuant](#312-joinquant)
    - [3.1.3 Importing modules](#313-importing-modules)
  - [3.2 Data preprocessing](#32-data-preprocessing)
    - [3.2.1 Basic stock information](#321-basic-stock-information)
    - [3.2.2 Shareholder information](#322-shareholder-information)
    - [3.2.3 Stock concept information](#323-stock-concept-information)
    - [3.2.4 Stock announcements](#324-stock-announcements)
    - [3.2.5 Financial news](#325-financial-news)
    - [3.2.6 Concept information](#326-concept-information)
    - [3.2.7 Shanghai and Shenzhen Stock Connect constituents](#327-shanghai-and-shenzhen-stock-connect-constituents)
    - [3.2.8 Stock price information](#328-stock-price-information)
    - [3.2.9 Fetching stock data through the free interface](#329-fetching-stock-data-through-the-free-interface)
  - [3.3 Data preprocessing](#33-data-preprocessing)
    - [3.3.1 Mode of the number of trading days per stock](#331-mode-of-the-number-of-trading-days-per-stock)
    - [3.3.2 Computing stock log returns](#332-computing-stock-log-returns)
    - [3.3.3 Correlation coefficients of log returns between stocks](#333-correlation-coefficients-of-log-returns-between-stocks)
- [4 Building the financial knowledge graph](#4-building-the-financial-knowledge-graph)
  - [4.1 Connecting from Python](#41-connecting-from-python)
  - [4.2 Reading the data](#42-reading-the-data)
  - [4.3 Filling and deduplication](#43-filling-and-deduplication)
  - [4.4 Creating entities](#44-creating-entities)
  - [4.5 Creating relationships](#45-creating-relationships)
- [5 Visual data queries](#5-visual-data-queries)
  - [5.1 Viewing all related entities](#51-viewing-all-related-entities)
  - [5.2 Limiting the number displayed](#52-limiting-the-number-displayed)
  - [5.3 Log-return correlation coefficients between specified stocks](#53-log-return-correlation-coefficients-between-specified-stocks)
- [6 neo4j graph algorithms](#6-neo4j-graph-algorithms)
  - [6.1 Centrality algorithms](#61-centrality-algorithms)
  - [6.2 Community detection algorithms](#62-community-detection-algorithms)
  - [6.3 Path finding algorithms](#63-path-finding-algorithms)
  - [6.4 Similarity algorithms](#64-similarity-algorithms)
  - [6.5 Link prediction](#65-link-prediction)
  - [6.6 Preprocessing algorithms](#66-preprocessing-algorithms)
  - [6.7 Installing and importing the algorithm library](#67-installing-and-importing-the-algorithm-library)
  - [6.8 Algorithms in practice: link prediction](#68-algorithms-in-practice-link-prediction)
    - [6.8.1 Adamic Adar algorithm](#681-adamic-adar-algorithm)
    - [6.8.2 Common Neighbors](#682-common-neighbors)
    - [6.8.3 Resource Allocation](#683-resource-allocation)
    - [6.8.4 Total Neighbors](#684-total-neighbors)



## 1. Knowledge graph storage

Knowledge graphs are mainly stored in one of two ways: the Resource Description Framework ([RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework)) or a graph database ([Graph Database](https://en.wikipedia.org/wiki/Graph_database)).

### 1.1 Characteristics of the Resource Description Framework

- Data is stored as triples
- Standard reasoning engines are available
- [W3C standard](https://en.wikipedia.org/wiki/World_Wide_Web_Consortium)
- Data is easy to publish
- Used mostly in academic settings

### 1.2 Characteristics of graph databases

- Both nodes and relationships can carry properties
- No standard reasoning engine
- Efficient graph traversal
- Transaction management
- Used mostly in industrial settings

---

## 2. The neo4j graph database

neo4j is a [NoSQL](https://en.wikipedia.org/wiki/NoSQL) graph database with high-performance, scalable reads and writes, built around the efficient graph query language `Cypher`. More information is available on the [neo4j website](https://neo4j.com/), which also provides an [Online Sandbox](https://neo4j.com/sandbox/) for a quick hands-on start.



### 2.1 Download

Download link: https://neo4j.com/download-center/

### 2.2 Startup and login

#### 2.2.1 Windows

- Enter the `neo4j` directory

```
cd neo4j/bin
./neo4j start
```

- Startup has succeeded when the terminal prints the following message

```
Starting Neo4j.Started neo4j (pid 30914). It is available at http://localhost:7474/ There may be a short delay until the server is ready.
```

(1) Open the page: http://localhost:7474

(2) The initial username and password are both `neo4j` (select `bolt` as the `host` type)

(3) Enter the old password, then set a new one. Before starting, make sure a `jdk` is installed locally (`jdk version 11` is recommended): https://www.oracle.com/java/technologies/javase-downloads.html

#### 2.2.2 MacOS

Run Add Local DBMS, then open Neo4j Browser.

### 2.3 Background knowledge

CRUD operations on neo4j are written in the Cypher query language.
- [Official documentation](https://neo4j.com/developer/cypher/)
- [The author's collection of common Cypher commands](https://github.com/jm199504/Financial-Knowledge-Graphs/tree/master/cypher%20cheetsheet)

### 2.4 Possible problems when installing on Windows, and fixes

- Problem: after installing JDK1.8.0_261, starting `neo4j` failed with the following error:

```
Unable to find any JVMs matching version "11"
```

- Solution: the message asks for JDK 11, so `jdk-11.0.8` was downloaded and installed. On `Mac OS`, `ls -la /Library/Java/JavaVirtualMachines/` lists the installed `jdk` versions.

---

## 3. Preparing data for the knowledge graph

### 3.1 Free, open financial data APIs

A free Tushare account may not be able to pull data. See the stock-data retrieval method posted in the issues: https://github.com/jm199504/Financial-Knowledge-Graphs/issues/2#issuecomment-801732782

#### 3.1.1 Tushare

Official site: http://www.tushare.org/

#### 3.1.2 JoinQuant

Official site: https://www.joinquant.com/

#### 3.1.3 Importing modules

```python
import tushare as ts  # install as described on the Tushare website
import csv
import time
import pandas as pd
# Replace the placeholder with your own pro_api token (apply on the Tushare site), or use the free interface
pro = ts.pro_api('YOUR_TUSHARE_TOKEN')
```



### 3.2 Data preprocessing

#### 3.2.1 Basic stock information

```python
stock_basic = pro.stock_basic(list_status='L', fields='ts_code,symbol,name,industry')
# Rename the columns to make the later neo4j import easier
basic_rename = {'ts_code': 'TS代码', 'symbol': '股票代码', 'name': '股票名称', 'industry': '行业'}
stock_basic.rename(columns=basic_rename, inplace=True)
# Save as stock_basic.csv
stock_basic.to_csv('financial_data\\stock_basic.csv', encoding='gbk')
```
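The rename-and-save step can be tried without a Tushare account. The sketch below is a minimal illustration, not the author's code: the two-row frame is hand-built, the temporary output directory is a hypothetical stand-in for `financial_data`, and `index=False` is an added choice. It uses `os.path.join` so the path also works outside Windows, and reads the file back with `dtype=str` so the stock codes keep their leading zeros.

```python
import os
import tempfile

import pandas as pd

# Hand-built stand-in for the frame returned by pro.stock_basic(); illustrative only.
stock_basic = pd.DataFrame({
    'ts_code': ['000001.SZ', '000002.SZ'],
    'symbol': ['000001', '000002'],
    'name': ['平安银行', '万科A'],
    'industry': ['银行', '全国地产'],
})

# Same column mapping as in the snippet above.
basic_rename = {'ts_code': 'TS代码', 'symbol': '股票代码', 'name': '股票名称', 'industry': '行业'}
stock_basic = stock_basic.rename(columns=basic_rename)

# os.path.join picks the right separator on every platform;
# the temporary directory is a hypothetical stand-in for financial_data.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'stock_basic.csv')
stock_basic.to_csv(out_path, encoding='gbk', index=False)

# dtype=str keeps the leading zeros of the stock codes when reading back.
reloaded = pd.read_csv(out_path, encoding='gbk', dtype=str)
print(reloaded.columns.tolist())
```

Without `dtype=str`, pandas would parse the `股票代码` column as integers and `000001` would come back as `1`, which matters later when codes are used as node identifiers.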

| TS代码 | 股票代码 | 股票名称 | 行业 |
|-----------|--------|----------|----------|
| 000001.SZ | 000001 | 平安银行 | 银行 |
| 000002.SZ | 000002 | 万科A | 全国地产 |
| 000004.SZ | 000004 | 国华网安 | 软件服务 |
| 000005.SZ | 000005 | ST星源 | 环境保护 |
| 000006.SZ | 000006 | 深振业A | 区域地产 |


#### 3.2.2 Shareholder information

```python
holders = pd.DataFrame(columns=('ts_code', 'ann_date', 'end_date', 'holder_name', 'hold_amount', 'hold_ratio'))
# Fetch shareholder information for all listed stocks over one year (a single reporting period also works)
for i in range(3610):
    code = stock_basic['TS代码'].values[i]
    # Keep each result in its own variable so the accumulator is not overwritten
    top10 = pro.top10_holders(ts_code=code, start_date='20180101', end_date='20181231')
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    holders = pd.concat([holders, top10])
    if i % 600 == 0:
        print(i)
    time.sleep(0.4)  # rate limit of the data API
# Save as stock_holders.csv
holders.to_csv('financial_data\\stock_holders.csv', encoding='gbk')
# Example query for a single stock
holders = pro.top10_holders(ts_code='000001.SZ', start_date='20180101', end_date='20181231')
```

| ts_code | ann_date | end_date | holder_name | hold_amount | hold_ratio |
|-----------|----------|----------|------------------------------------------------|--------------|-------|
| 000001.SZ | 20200421 | 20200331 | 中国平安保险(集团)股份有限公司-集团本级-自有资金 | 9.618540e+09 | 49.56 |
| 000001.SZ | 20200421 | 20200331 | 香港中央结算有限公司(陆股通) | 1.669875e+09 | 8.60 |
| 000001.SZ | 20200421 | 20200331 | 中国平安人寿保险股份有限公司-自有资金 | 1.186100e+09 | 6.11 |
| 000001.SZ | 20200421 | 20200331 | 中国平安人寿保险股份有限公司-传统-普通保险产品 | 4.404787e+08 | 2.27 |
| 000001.SZ | 20200421 | 20200331 | 中国证券金融股份有限公司 | 4.292327e+08 | 2.21 |

#### 3.2.3 Stock concept information

```python
concept_details = pd.DataFrame(columns=('id', 'concept_name', 'ts_code', 'name'))
for i in range(358):
    # Avoid shadowing the built-in id()
    concept_id = 'TS' + str(i)
    concept_detail = pro.concept_detail(id=concept_id)
    concept_details = pd.concat([concept_details, concept_detail])
    time.sleep(0.4)
# Save as stock_concept.csv
concept_details.to_csv('financial_data\\stock_concept.csv', encoding='gbk')
```

| id | concept_name | ts_code | name |
|-----|----------|-----------|----------|
| TS0 | 密集调研 | 000917.SZ | 电广传媒 |
| TS0 | 密集调研 | 002123.SZ | 梦网集团 |
| TS0 | 密集调研 | 002127.SZ | 南极电商 |
| TS0 | 密集调研 | 002371.SZ | 北方华创 |
| TS0 | 密集调研 | 002460.SZ | 赣锋锂业 |

#### 3.2.4 Stock announcements

```python
for i in range(3610):
    code = stock_basic['TS代码'].values[i]
    notices = pro.anns(ts_code=code, start_date='20180101', end_date='20181231', year='2018')
    notices.to_csv("financial_data\\notices\\"+str(code)+".csv",encoding='utf_8_sig',index=False)
# Example query for a single stock
notices = pro.anns(ts_code='000001.SZ', start_date='20180101', end_date='20181231', year='2018')
```

| ts_code | ann_date | title | content |
|-----------|----------|--------------------------------------------------------------------|--------------------------------------|
| 000001.SZ | 20181227 | 关于公开发行A股可转换公司债券申请获得中国证监会核准批复的公告 | 证券代码:000001 证券简称:平安银行 |
| 000001.SZ | 20181226 | 董事会决议公告 | 证券代码:000001 证券简称:平安银行 ... |
| 000001.SZ | 20181219 | 关于成功发行350亿元金融债券的公告 | 证券代码:000001 证券简称:平安银行 ... |
| 000001.SZ | 20181218 | 关于公开发行A股可转换公司债券申请获中国证监会发行审核委员会审核通过的公告 | 证券代码:000001 证券简称:平安银行 ... |
| 000001.SZ | 20181208 | 关于发行金融债券获准的公告 | 证券代码:000001 证券简称:平安银行 ... |



#### 3.2.5 Financial news

```python
news = pro.news(src='sina', start_date='20180101', end_date='20181231')
news.to_csv("financial_data\\news.csv",encoding='utf_8_sig')
```

| datetime | content | title |
|---------------------|------------------------------------------------------------------------------------------|-------|
| 2018-11-22 10:09:00 | 【北京:10月末非金融企业贷款增速创历史同期新高 】据中国人民银行营业管理部11月22日披露... | |
| 2018-11-22 10:08:29 | 【湖北省积极推动企业向科创板靠拢,年底将向上交所推荐一批优质企业】湖北省上市指导中心相关人士... | |
| 2018-11-22 10:08:04 | 【上海:加快专利审查速度 进一步缩短专利授权周期】上海市人民政府发布关于加快本市高新技术企业... | |
| 2018-11-22 10:04:35 | 【生意社:稀土价格走势或将上涨】11月20日稀土指数为344点,与上一日持平,较2011年1... | |
| 2018-11-22 10:03:40 | 【上海:加快建设具有全球影响力的科技创新中心 做大做强高新技术产业】上海市人民政府发布关于加... | |

#### 3.2.6 Concept information

```python
concept = pro.concept()
concept.to_csv('financial_data\\concept.csv', encoding='gbk')
```

| code | name | src |
|------|------------|-----|
| TS0 | 密集调研 | ts |
| TS1 | 南北船合并 | ts |
| TS2 | 5G | ts |
| TS3 | 机场 | ts |
| TS4 | 高价股 | ts |

#### 3.2.7 Shanghai and Shenzhen Stock Connect constituents

```python
# Fetch the Shanghai Stock Connect constituents
sh = pro.hs_const(hs_type='SH')
sh.to_csv("financial_data\\sh.csv",index=False)
# Fetch the Shenzhen Stock Connect constituents
sz = pro.hs_const(hs_type='SZ')
sz.to_csv("financial_data\\sz.csv",index=False)
```

| ts_code | hs_type | in_date | out_date | is_new |
|-----------|---------|----------|----------|--------|
| 601628.SH | SH | 20141117 | None | 1 |
| 601099.SH | SH | 20141117 | None | 1 |
| 601808.SH | SH | 20141117 | None | 1 |
| 601107.SH | SH | 20141117 | None | 1 |
| 601880.SH | SH | 20141117 | None | 1 |

#### 3.2.8 Stock price information

```python
for i in range(3610):
    code = stock_basic['TS代码'].values[i]
    price = pro.query('daily', ts_code=code, start_date='20180101', end_date='20181231')
    price.to_csv("financial_data\\price\\"+str(code)+".csv",index=False)
```

| ts_code | trade_date | open | high | low | close | pre_close | change | pct_chg | vol | amount |
|-----------|------------|-------|-------|-------|-------|-----------|--------|---------|------------|-------------|
| 000001.SZ | 20200123 | 15.92 | 15.92 | 15.39 | 15.54 | 16.09 | -0.55 | -3.4183 | 1100592.07 | 1723394.336 |
| 000001.SZ | 20200122 | 15.92 | 16.16 | 15.71 | 16.09 | 16.00 | 0.09 | 0.5625 | 719464.91 | 1150933.398 |
| 000001.SZ | 20200121 | 16.34 | 16.34 | 15.93 | 16.00 | 16.45 | -0.45 | -2.7356 | 896603.10 | 1442171.431 |
| 000001.SZ | 20200120 | 16.43 | 16.61 | 16.35 | 16.45 | 16.39 | 0.06 | 0.3661 | 746074.75 | 1226464.649 |
| 000001.SZ | 20200117 | 16.38 | 16.55 | 16.35 | 16.39 | 16.33 | 0.06 | 0.3674 | 605436.69 | 995909.007 |

#### 3.2.9 Fetching stock data through the free interface

```python
import tushare as ts
# Fundamentals
df = ts.get_stock_basics()
# Announcements
ts.get_notices("000001")
# Sina stock forum (guba)
ts.guba_sina()
# Historical prices
ts.get_hist_data("000001")
# Historical prices (weekly granularity)
ts.get_hist_data("000001",ktype="w")
# Historical prices (monthly granularity)
ts.get_hist_data("000001",ktype="m")
# Historical prices (5-minute granularity)
ts.get_hist_data("000001",ktype="5")
```
#### 3.2.10 Index data

- `sh`: SSE Composite Index
- `sz`: SZSE Component Index
- `hs300`: CSI 300
- `sz50`: SSE 50
- `zxb`: SME Board Index
- `cyb`: ChiNext Index

```python
ts.get_hist_data("cyb").head()
```
| date | open | high | close | low | volume | price_change | p_change | ma5 | ma10 | ma20 | v_ma5 | v_ma10 | v_ma20 |
|------------|---------|---------|---------|---------|------------|--------|-------|----------|----------|----------|------------|------------|-------------|
| 2019-06-19 | 1501.25 | 1504.41 | 1469.99 | 1469.59 | 16878786.0 | 14.24 | 0.98 | 1460.376 | 1456.171 | 1466.724 | 13305159.8 | 13847384.5 | 14209395.85 |
| 2019-06-18 | 1443.65 | 1460.19 | 1455.75 | 1437.04 | 9075484.0 | 13.40 | 0.93 | 1461.158 | 1454.799 | 1467.911 | 12853513.4 | 13402555.2 | 14119367.10 |
| 2019-06-17 | 1450.56 | 1459.21 | 1442.35 | 1437.14 | 9968822.0 | -11.61 | -0.80 | 1467.478 | 1456.122 | 1468.589 | 14905515.8 | 13928269.9 | 14493830.70 |
| 2019-06-14 | 1478.53 | 1489.47 | 1453.96 | 1451.66 | 16016380.0 | -25.87 | -1.75 | 1465.276 | 1460.253 | 1470.409 | 15363326.0 | 14201386.5 | 14961172.10 |
| 2019-06-13 | 1474.55 | 1485.72 | 1479.83 | 1464.84 | 14586327.0 | 5.93 | 0.40 | 1457.696 | 1463.381 | 1474.394 | 14821981.6 | 13946315.2 | 14942959.45 |

#### 3.2.11 Macroeconomic data (consumer price index)

```python
ts.get_cpi()
```
| month | cpi |
|--------|--------|
| 2019.5 | 102.74 |
| 2019.4 | 102.54 |
| 2019.3 | 102.28 |
| 2019.2 | 101.49 |
| 2019.1 | 101.74 |

#### 3.2.12 Tick-by-tick trade data

```python
ts.get_tick_data('000001', date='2018-10-08', src='tt')
```


| time | price | change | volume | amount | type |
|----------|-------|--------|--------|----------|------|
| 09:25:04 | 10.70 | -0.35 | 36779 | 39353530 | 卖盘 |
| 09:30:04 | 10.69 | -0.01 | 25165 | 26872673 | 卖盘 |
| 09:30:06 | 10.69 | 0.00 | 11092 | 11853208 | 买盘 |
| 09:30:09 | 10.68 | -0.01 | 2005 | 2142749 | 卖盘 |
| 09:30:13 | 10.68 | 0.00 | 5973 | 6363516 | 买盘 |

#### 3.2.13 Price chart

```python
from pyecharts.charts import Line
from pyecharts import options as opts
import numpy as np
price = pro.query('daily', ts_code='000001.SZ', start_date='20180101',
                  end_date='20181231')
# The API returns the newest rows first, so reverse each series for plotting
(
    Line()
    .add_xaxis(xaxis_data=list(price['trade_date'])[::-1])
    .add_yaxis(series_name="close price",y_axis=list(price['close'])[::-1],symbol="circle")
    .add_yaxis(series_name="open price",y_axis=list(price['open'])[::-1],symbol="circle")
    .add_yaxis(series_name="high price",y_axis=list(price['high'])[::-1],symbol="circle")
    .add_yaxis(series_name="low price",y_axis=list(price['low'])[::-1],symbol="circle")
    .set_global_opts(title_opts=opts.TitleOpts(title="价格走势图"))
    .render_notebook()
)
```



### 3.3 Data preprocessing

#### 3.3.1 Mode of the number of trading days per stock

```python
import os
import numpy as np
import pandas as pd

# Files produced by the log-return step in 3.3.2
listdir = os.listdir("financial_data\\price_logreturn")

yaxis = list()
for i in listdir:
    stock = pd.read_csv("financial_data\\price_logreturn\\"+i)
    yaxis.append(len(stock['logreturn']))
counts = np.bincount(yaxis)

# The most frequent number of trading days
np.argmax(counts)
```



#### 3.3.2 Computing stock log returns

Formulas for stock log returns and the Pearson correlation coefficient:

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/pcc.png?raw=true)

```python
import pandas as pd
import numpy as np
import os

listdir = os.listdir("financial_data\\price")

for l in listdir:
    stock = pd.read_csv('financial_data\\price\\'+l)
    # Close of the following row, aligned with the current row
    stock['next_close'] = stock['close'].shift(-1)
    # The last row has no following row, so drop it
    stock = stock.drop(index=stock.index[-1])
    # Log return between consecutive rows; if the rows are sorted newest-first this is
    # the negative of the usual log return, which leaves correlations unchanged
    stock['logreturn'] = np.log(stock['next_close'] / stock['close'])
    stock.to_csv("financial_data\\price_logreturn\\"+l,index=False)
```

#### 3.3.3 Correlation coefficients of log returns between stocks

```python
from math import sqrt
def multipl(a,b):
    # Sum of element-wise products
    sumofab=0.0
    for i in range(len(a)):
        temp=a[i]*b[i]
        sumofab+=temp
    return sumofab

def corrcoef(x,y):
    n=len(x)
    # Sums
    sum1=sum(x)
    sum2=sum(y)
    # Sum of products
    sumofxy=multipl(x,y)
    # Sums of squares
    sumofx2 = sum([pow(i,2) for i in x])
    sumofy2 = sum([pow(j,2) for j in y])
    num=sumofxy-(float(sum1)*float(sum2)/n)
    # Pearson correlation coefficient
    den=sqrt((sumofx2-float(sum1**2)/n)*(sumofy2-float(sum2**2)/n))
    return num/den
```

Because the raw data contains millions of rows, only the first 300 stocks are used for the correlation analysis to reduce computation.

```python
listdir = os.listdir("financial_data\\300stock_logreturn")
s1 = list()
s2 = list()
corr = list()
for i in listdir:
    for j in listdir:
        stocka = pd.read_csv("financial_data\\300stock_logreturn\\"+i)
        stockb = pd.read_csv("financial_data\\300stock_logreturn\\"+j)
        # Only compare stocks that have exactly 242 log-return observations
        if len(stocka['logreturn']) == 242 and len(stockb['logreturn']) == 242:
            # Compute the coefficient once and reuse it
            c = corrcoef(stocka['logreturn'],stockb['logreturn'])
            s1.append(str(i)[:10])
            s2.append(str(j)[:10])
            corr.append(c)
            print(str(i)[:10],str(j)[:10],c)
corrdf = pd.DataFrame()
corrdf['s1'] = s1
corrdf['s2'] = s2
corrdf['corr'] = corr
corrdf.to_csv("financial_data\\corr.csv")
```

### 3.4 Word clouds and sentiment analysis of text data

#### 3.4.1 Fetching the data

```python
# Commonly used in quantitative finance: pandas (data structures),
# numpy (arrays), matplotlib (visualisation), scipy (statistics)
import tushare as ts
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import jieba
import jieba.analyse
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Render Chinese characters and minus signs correctly in plots
from pylab import mpl
mpl.rcParams['font.sans-serif']=['SimHei']
mpl.rcParams['axes.unicode_minus']=False

# Replace the placeholder with your own Tushare token
ts.set_token('YOUR_TUSHARE_TOKEN')
pro = ts.pro_api()
df = pro.news(src='sina', start_date='20190601', end_date='20190624')
# Financial news for the date range chosen above

# Data cleaning: keep only the needed columns
df=df[['datetime','title','content']]
# Save the data
df.to_csv("latest_news.csv",encoding="utf_8_sig")
# Show the first 5 rows
df.head()
```

#### 3.4.2 Word cloud of financial news headlines

```python
# Extract the news text (the content column) and convert it to a list;
# it is a pandas Series to begin with
mylist = list(df.content.values)

# Segment the text into individual words
word_list = [" ".join(jieba.cut(sentence)) for sentence in mylist]
new_text = ' '.join(word_list)

# Read the mask image
img = plt.imread("black.jpg")

# Configure the word cloud
wc = WordCloud(background_color="white",
               mask=img,            # background image (mask)
               max_font_size=120,   # maximum font size
               random_state=42,     # seed for the random colours
               # Raw string so that \f is not read as a form-feed escape;
               # font_path must point to a font with Chinese glyphs (SimSun here)
               font_path=r"c:\windows\fonts\simsun.ttc")

# Generate the word cloud
wc.generate(new_text)
image_colors = ImageColorGenerator(img)

# Set the figure size
plt.figure(figsize=(14,12))
plt.imshow(wc)
plt.title('财经新闻标题词云\n',fontsize=18)
plt.axis("off")
plt.show()

# Save the image locally
# wc.to_file("财经新闻标题词云.jpg")
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/word_cloud.png?raw=true)

#### 3.4.3 Data cleaning (blacklist / fixed phrases)

```python
# Data cleaning
# Take the text of the first news item
content = df.content[0]
print(content)
# Blacklist of tokens to drop
blacklist = ['个','文本','界面','21','23']

# Register fixed phrases so that jieba keeps them as single tokens
custom_words=['产融对接项目','国际金融交易·博览会']
for word in custom_words:
    jieba.add_word(word)

# Count words, skipping blacklisted tokens
d = {}  # word -> count
for word in jieba.cut(content):
    if word in blacklist:
        continue
    if len(word)<2:  # drop single-character tokens
        continue
    d[word] = d.get(word, 0) + 1

# Extract weighted keywords with jieba.analyse
d=''.join(d)
tags=jieba.analyse.extract_tags(d,topK=100,withWeight=True)
tf=dict((a[0],a[1]) for a in tags)
backgroud_Image = plt.imread('black.jpg')

wc = WordCloud(
    background_color='white',   # background colour
    mask=backgroud_Image,       # background image (mask)
    # Raw string so that \f is not read as a form-feed escape;
    # a font with Chinese glyphs is required for Chinese text
    font_path=r"c:\windows\fonts\simsun.ttc",
    max_words=10,               # maximum number of words displayed
    stopwords=STOPWORDS,        # stop words
    max_font_size=150,          # maximum font size
    random_state=30)
wc.generate_from_frequencies(tf)
plt.figure(figsize=(6,6),facecolor='w',edgecolor='k')
plt.imshow(wc)
plt.title(df.title[0],fontsize=15)
# Hide the x and y axes
plt.axis('off')
plt.show()
```

#### 3.4.4 Sentiment analysis

```python
from snownlp import SnowNLP

def word_processing(text):
    pass
    # Data cleaning; code omitted for brevity

def sentiment_score_list(dataset):
    pass
    # Main function for data processing and sentiment classification;
    # code omitted for brevity

def sentiment_score(senti_text):
    s1 = SnowNLP(senti_text)
    print(senti_text)
    return s1.sentiments
# Sentiment score summary

# Strip whitespace from the news text above and collect it in a list
y=[]
t1=list(df.content)
for i in range(len(t1)):
    x=t1[i].split()
    x=','.join(x)
    # [gap in the source: the rest of the loop body and the start of the
    #  commented-out tally below are missing]
#     ... > 0:
#         p+=1
#     else:
#         n+=1
# print("Positive news items: {0}, negative news items: {1}".format(p,n))
```

Output:

```
【第八届金交会闭幕,意向签约总额近3500亿元】6月21日至23日,为期三天的第八届中国(广州)国际金融交易·博览会在广州举办。本届金交会上,广东新设机构平台9家,收集48个产融对接项目,意向签约总额近3500亿元,有7个粤港澳大湾区项目是首次集中交换签约文本。深圳证券交易所广州服务基地在本届金交会正式授牌成立,将更好地推动广东省企业上市、发行固定收益产品,支持区域内上市公司做优做强。(界面)
情绪得分: 0.921750008323056

【科创板发行接二连三!睿创微纳、天准科技将于7月2日网上申购】6月23日晚间,记者获悉,睿创微纳和天准科技即将在上交所披露招股意向书、上市发行安排及初步询价公告等多个文件。公告显示,睿创微纳股票代码为688002,网上申购代码为787002。天准科技股票代码为688003,网上申购代码为787003。睿创微纳、天准科技网上、网下申购时间均为7月2日,将于7月4日公布中签结果。(证券时报)
情绪得分: 0.33760371797103894

【东盟峰会主席声明反对贸易保护主义】第34届东盟峰会23日在泰国首都曼谷闭幕。当天公布的本届东盟峰会主席声明说,东盟反对贸易保护主义,支持维护多边贸易体制。(新华社)
情绪得分: 0.05343571003642067
```

### 3.5 Data visualization

#### 3.5.1 Fetching the data

```python
# Import the packages used below
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Render Chinese characters correctly in plots
from pylab import mpl
# Use the SimHei font
mpl.rcParams['font.sans-serif']=['SimHei']
# Render minus signs correctly
mpl.rcParams['axes.unicode_minus']=False
import seaborn as sns  # plotting
import tushare as ts
# Jupyter Notebook magic command:
# show figures inline
%matplotlib inline

sh=ts.get_k_data(code='sh',ktype='D',
                 autype='qfq', start='1990-12-20')
# code: stock code; individual stocks use their numeric code, e.g. '600000'
# ktype: 'D' daily data, 'm' monthly data, 'Y' yearly data
# autype: price adjustment; default 'qfq' (forward-adjusted)
# start: start date
# end: defaults to the current date
# Show the first 5 rows
sh.head(5)
```
| date | open | close | high | low | volume | code |
|------------|-------|-------|-------|--------|--------|------|
| 1990-12-20 | 113.1 | 113.5 | 113.5 | 112.85 | 1990.0 | sh |
| 1990-12-21 | 113.5 | 113.5 | 113.5 | 113.4 | 1190.0 | sh |
| 1990-12-24 | 113.5 | 114.0 | 114.0 | 113.3 | 8070.0 | sh |
| 1990-12-25 | 114.0 | 114.1 | 114.2 | 114.0 | 2780.0 | sh |
| 1990-12-26 | 114.4 | 114.3 | 114.4 | 114.2 | 310.0 | sh |

#### 3.5.2 Plotting the closing price trend

```python
# Use the 'date' column (column 0) as the index
sh.index=pd.to_datetime(sh.date)
# Plot the closing price of the SSE Composite Index
sh['close'].plot(figsize=(12,6))
plt.title('上证指数1990-2018年走势图')
plt.xlabel('日期')
plt.show()
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/close_price_plot.png?raw=true)

#### 3.5.3 Descriptive statistics

```python
# pandas describe() returns descriptive statistics of the data
# count: number of samples, mean: mean, std: standard deviation
sh.describe().round(2)
```
| | open | close | high | low | volume |
|-------|---------|---------|---------|---------|--------------|
| count | 6808.00 | 6808.00 | 6808.00 | 6808.00 | 6.808000e+03 |
| mean | 1957.80 | 1959.08 | 1976.41 | 1937.84 | 7.431684e+07 |
| std | 1074.56 | 1075.94 | 1085.71 | 1062.06 | 1.055240e+08 |
| min | 105.50 | 105.50 | 105.50 | 105.50 | 1.000000e+01 |
| 25% | 1186.85 | 1185.01 | 1194.68 | 1171.79 | 5.272635e+06 |
| 50% | 1831.68 | 1832.42 | 1842.81 | 1813.92 | 2.375030e+07 |
| 75% | 2772.39 | 2779.86 | 2807.02 | 2744.36 | 1.144936e+08 |
| max | 6057.43 | 6092.06 | 6124.04 | 6040.71 | 8.571328e+08 |

#### 3.5.4 Plotting the daily trading volume

```python
sh.loc["2007-01-01":]["volume"].plot(figsize=(12,6))
plt.title('上证指数2007-2018年日成交量图')
plt.xlabel('日期')
plt.show()
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/volume_plot.png?raw=true)

#### 3.5.5 Plotting moving averages

```python
# Moving averages computed with rolling(); the windows of 20, 52,
# and 252 days are set manually
ma_day = [20,52,252]

for ma in ma_day:
    column_name = "%s日均线" %(str(ma))
    sh[column_name] =sh["close"].rolling(ma).mean()
# sh.tail(3)
# Plot the closing price and moving averages since 2010
sh.loc['2010-10-8':][["close",
"20日均线","52日均线","252日均线"]].plot(figsize=(12,6))
plt.title('2010-2018上证指数走势图')
plt.xlabel('日期')
plt.show()
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/average_price_plot.png?raw=true)

#### 3.5.6 Plotting the daily return

```python
# Data before 2005 is too noisy, so the analysis focuses on 2005 onwards
sh["日收益率"] = sh["close"].pct_change()
sh["日收益率"].loc['2005-01-01':].plot(figsize=(12,4))
plt.xlabel('日期')
plt.ylabel('收益率')
plt.title('2005-2018年上证指数日收益率')
plt.show()
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/daily_return_plot.png?raw=true)

#### 3.5.7 Analysing several indices

```python
# Look at several common stock indices
stocks={'上证指数':'sh','深证指数':'sz','沪深300':'hs300',
        '上证50':'sz50','中小板指':'zxb','创业板':'cyb'}
stock_index=pd.DataFrame()
for stock in stocks.values():
    stock_index[stock]=ts.get_k_data(stock,ktype='D',
                                     autype='qfq', start='2005-01-01')['close']
# stock_index.head()
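# Illustrative aside (not part of the original notebook): pct_change(), used just
# below, computes the simple return (p[t] - p[t-1]) / p[t-1] for consecutive rows.
# The hypothetical helper simple_returns() spells out the same formula for a plain list.
def simple_returns(prices):
    """Return the simple period-over-period returns of a list of prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]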
# Daily percentage change of each index
tech_rets = stock_index.pct_change()[1:]
print(tech_rets)
# tech_rets.head()
# Descriptive statistics of the returns
tech_rets.describe()
# (output not shown here)
# The means are in fact all greater than 0
tech_rets.mean()*100  # convert to %
```
| | sh | sz | hs300 | sz50 | zxb | cyb |
|------|-----------|-----------|-----------|-----------|-----------|-----------|
| 1 | 0.007379 | 0.009070 | -0.008002 | 0.005272 | 0.004482 | 0.001279 |
| 2 | -0.009992 | -0.007904 | -0.016797 | -0.010741 | -0.025450 | 0.029334 |
| 3 | 0.004292 | 0.002265 | 0.022683 | 0.001362 | 0.009146 | 0.040661 |
| 4 | 0.006146 | 0.008941 | -0.013917 | 0.011377 | -0.044799 | -0.002164 |
| 5 | 0.004040 | 0.002233 | -0.013060 | 0.005846 | 0.014144 | 0.009998 |
| ... | ... | ... | ... | ... | ... | ... |
| 3524 | 0.001933 | 0.007996 | 0.000000 | 0.002929 | 0.000000 | 0.000000 |
| 3525 | -0.025805 | -0.027207 | 0.000000 | -0.021746 | 0.000000 | 0.000000 |
| 3526 | -0.001749 | 0.001361 | 0.000000 | -0.005508 | 0.000000 | 0.000000 |
| 3527 | -0.004416 | -0.003548 | 0.000000 | -0.000961 | 0.000000 | 0.000000 |

#### 3.5.8 Plotting the correlation

```python
# jointplot draws a scatter plot of two indices' returns with their marginal
# distributions, which visualises their (Pearson) correlation.
# Recent seaborn versions require x and y as keyword arguments.
sns.jointplot(x='sh',y='sz',data=tech_rets)
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/rets_plot.png?raw=true)

#### 3.5.9 Plotting correlations across several data sets

```python
# Pairwise comparison of the data sets; the diagonal shows each data set's histogram
sns.pairplot(tech_rets.iloc[:,3:].dropna())
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/multi_rets_plot.png?raw=true)

#### 3.5.10 Return and risk

```python
# Compute the mean daily return and its standard deviation for a set of stocks
# The default start date is '2005-01-01'
def return_risk(stocks,startdate='2005-01-01'):
    close=pd.DataFrame()
    for stock in stocks.values():
        close[stock]=ts.get_k_data(stock,ktype='D',
                                   autype='qfq',
                                   start=startdate)['close']
    tech_rets = close.pct_change()[1:]
    rets = tech_rets.dropna()
    ret_mean=rets.mean()*100
    ret_std=rets.std()*100
    return ret_mean,ret_std

# Plotting function; stocks and startdate are passed through to return_risk
# so that a custom start date actually takes effect
def plot_return_risk(stocks,startdate='2005-01-01'):
    ret,vol=return_risk(stocks,startdate)
    color=np.array([ 0.18, 0.96, 0.75, 0.3, 0.9,0.5])
    plt.scatter(ret, vol, marker = 'o',
                c=color,s = 500,cmap=plt.get_cmap('Spectral'))
    plt.xlabel("日收益率均值%")
    plt.ylabel("标准差%")
    for label,x,y in zip(stocks.keys(),ret,vol):
        plt.annotate(label,xy = (x,y),xytext = (20,20),
                     textcoords = "offset points",
                     ha = "right",va = "bottom",
                     bbox = dict(boxstyle = 'round,pad=0.5',
                                 fc = 'yellow', alpha = 0.5),
                     arrowprops = dict(arrowstyle = "->",
                                       connectionstyle = "arc3,rad=0"))
stocks={'上证指数':'sh','深证指数':'sz','沪深300':'hs300',
        '上证50':'sz50','中小板指数':'zxb','创业板指数':'cyb'}
plot_return_risk(stocks)
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/return_risk.png?raw=true)

```python
stocks={'中国平安':'601318','格力电器':'000651',
        '招商银行':'600036','恒生电子':'600570',
        '中信证券':'600030','贵州茅台':'600519'}
startdate='2018-01-01'
plot_return_risk(stocks,startdate)
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/return_risk_2.png?raw=true)

#### 3.5.11 Monte Carlo simulation

Monte Carlo simulation is a statistical method for simulating how data may evolve. It was developed during the Second World War, in the atomic bomb programme, by the mathematicians John von Neumann and Stanislaw Ulam, among others, to simulate the random diffusion of neutrons in fissile material. The method is named after Monte Carlo in Monaco, a small European state, which was famous at the time for its casino. Gambling is at heart a matter of computing probabilities, and Monte Carlo simulation is a probability-based method, hence the name. Each run draws its inputs at random, and a large number of runs yields a cumulative probability distribution.

```python
df=ts.get_k_data('sh',ktype='D', autype='qfq',
                 start='2005-01-01')
df.index=pd.to_datetime(df.date)
tech_rets = df.close.pct_change()[1:]
rets = tech_rets.dropna()
# rets.head()
# Interpretation: with 95% confidence, the daily loss will not exceed about 2.6%
rets.quantile(0.05)
# -0.02618228439478043
```

#### 3.5.12 Price distribution from the Monte Carlo simulation

```python
def monte_carlo(start_price,days,mu,sigma):
    dt=1/days
    price = np.zeros(days)
    price[0] = start_price
    shock = np.zeros(days)
    drift = np.zeros(days)

    for x in range(1,days):
        shock[x] = np.random.normal(loc=mu * dt,
                                    scale=sigma * np.sqrt(dt))
        drift[x] = mu * dt
        price[x] = price[x-1] + (price[x-1] *
                                 (drift[x] + shock[x]))
    return price
# Number of simulation runs
runs = 10000
start_price = 2641.34  # today's closing price
days = 252
mu=rets.mean()
sigma=rets.std()
simulations = np.zeros(runs)

for run in range(runs):
    simulations[run] = monte_carlo(start_price,
                                   days,mu,sigma)[days-1]
# 1st percentile of the simulated final prices
q = np.percentile(simulations,1)
plt.figure(figsize=(8,6))
plt.hist(simulations,bins=50,color='grey')
plt.figtext(0.6,0.8,s="初始价格: %.2f" % start_price)
plt.figtext(0.6,0.7,"预期价格均值: %.2f" %simulations.mean())
plt.figtext(0.15,0.6,"q(0.99): %.2f" %q)
plt.axvline(x=q,linewidth=6,color="r")
plt.title("经过 %s 天后上证指数模拟价格分布图" %days,weight="bold")
# Text(0.5,1,'经过 252 天后上证指数模拟价格分布图')
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/monte_plot.png?raw=true)

#### 3.5.13 Monte Carlo simulation under the price-dynamics assumption used in option pricing

```python
import numpy as np
from time import time
np.random.seed(2018)
t0=time()
S0=2641.34       # starting index level
T=1.0            # horizon in years
r=0.05           # drift rate
sigma=rets.std()
M=50             # number of time steps
dt=T/M
I=250000         # number of simulated paths
S=np.zeros((M+1,I))
S[0]=S0
for t in range(1,M+1):
    z=np.random.standard_normal(I)
    S[t]=S[t-1]*np.exp((r-0.5*sigma**2)*dt+
                       sigma*np.sqrt(dt)*z)
s_m=np.sum(S[-1])/I
tnp1=time()-t0
# print('Expected average close of the SSE Composite Index after 1 year, over 250000 simulations: %.2f' % s_m)
# Expected average close of the SSE Composite Index after 1 year, over 250000 simulations: 2776.85
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.plot(S[:,:10])
plt.grid(True)
plt.title('上证指数蒙特卡洛模拟其中10条模拟路径图')
plt.xlabel('时间')
plt.ylabel('指数')
plt.show()
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/monte_2_plot.png?raw=true)

#### 3.5.14 Monte Carlo simulation of the SSE Composite Index

```python
plt.figure(figsize=(10,6))
plt.hist(S[-1], bins=120)
plt.grid(True)
plt.xlabel('指数水平')
plt.ylabel('频率')
plt.title('上证指数蒙特卡洛模拟')
# Text(0.5,1,'上证指数蒙特卡洛模拟')
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/monte_3_plot.png?raw=true)

## 4 Building the financial knowledge graph

Install the third-party library

```shell
pip install py2neo
```

### 4.1 Connecting from Python

For the full code, see "3.1 python操作neo4j-连接" (connecting to neo4j from Python).

```python
from pandas import DataFrame
from py2neo import Graph,Node,Relationship,NodeMatcher
import pandas as pd
import numpy as np
import os
# Connect to the Neo4j database
graph = Graph('http://localhost:7474/db/data/',username='neo4j',password='neo4j')
```

### 4.2 Reading the data

```python
stock = pd.read_csv('stock_basic.csv',encoding="gbk")
holder = pd.read_csv('holders.csv')
concept_num = pd.read_csv('concept.csv')
concept = pd.read_csv('stock_concept.csv')
sh = pd.read_csv('sh.csv')
sz = pd.read_csv('sz.csv')
corr = pd.read_csv('corr.csv')
```

### 4.3 Filling and deduplication

```python
# Fill missing industries with '未知' (unknown)
stock['行业'] = stock['行业'].fillna('未知')
# Drop duplicate shareholder rows, keeping the first occurrence
holder = holder.drop_duplicates(subset=None, keep='first', inplace=False)
```

### 4.4 Creating entities

Concepts, stocks, shareholders, and Stock Connect programmes

```python
# Node for the Shenzhen Stock Connect
sz = Node('深股通',名字='深股通')
graph.create(sz)

# Node for the Shanghai Stock Connect
sh = Node('沪股通',名字='沪股通')
graph.create(sh)
for i in concept_num.values:
    a = Node('概念',概念代码=i[1],概念名称=i[2])
    print('Concept code: '+str(i[1]),'Concept name: '+str(i[2]))
    graph.create(a)

for i in stock.values:
    a = Node('股票',TS代码=i[1],股票名称=i[3],行业=i[4])
    print('TS code: '+str(i[1]),'Stock name: '+str(i[3]),'Industry: '+str(i[4]))
    graph.create(a)

for i in holder.values:
    a = Node('股东',TS代码=i[0],股东名称=i[1],持股数量=i[2],持股比例=i[3])
    print('TS code: '+str(i[0]),'Shareholder: '+str(i[1]),'Shares held: '+str(i[2]))
    graph.create(a)
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/create_node.png?raw=true)

### 4.5 Creating relationships

Relationships: stock-shareholder, stock-concept, stock-announcement, and stock-Stock Connect.

```python
matcher = NodeMatcher(graph)

# Shareholder -> stock ('参股', holds shares in)
for i in holder.values:
    a = matcher.match("股票",TS代码=i[0]).first()
    b = matcher.match("股东",TS代码=i[0])
    for j in b:
        r = Relationship(j,'参股',a)
        graph.create(r)
    print('TS',str(i[0]))

# Stock -> concept ('概念属于', belongs to concept)
for i in concept.values:
    a = matcher.match("股票",TS代码=i[3]).first()
    b = matcher.match("概念",概念代码=i[1]).first()
    if a is None or b is None:
        continue
    r = Relationship(a,'概念属于',b)
    graph.create(r)

# Stock -> announcement ('发布公告', publishes announcement), one CSV file per stock
noticesdir = os.listdir("notices")
for n in noticesdir:
    notice = pd.read_csv(os.path.join("notices",n),encoding="utf_8_sig")
    notice['content'] = notice['content'].fillna('空白')
    for i in notice.values:
        a = matcher.match("股票",TS代码=i[0]).first()
        b = Node('公告',日期=i[1],标题=i[2],内容=i[3])
        graph.create(b)
        r = Relationship(a,'发布公告',b)
        graph.create(r)
        print(str(i[0]))

# Stock -> Shenzhen Stock Connect ('成分股属于', is a constituent of)
for i in sz.values:
    a = matcher.match("股票",TS代码=i[0]).first()
    b = matcher.match("深股通").first()
    r = Relationship(a,'成分股属于',b)
    graph.create(r)
    print('TS code: '+str(i[1]),'--Shenzhen Stock Connect')

# Stock -> Shanghai Stock Connect
for i in sh.values:
    a = matcher.match("股票",TS代码=i[0]).first()
    b = matcher.match("沪股通").first()
    r = Relationship(a,'成分股属于',b)
    graph.create(r)
    print('TS code: '+str(i[1]),'--Shanghai Stock Connect')

# Build stock-to-stock correlation relationships (the correlation value is used as the relationship type)
corr = pd.read_csv("corr.csv")
for i in corr.values:
    a = matcher.match("股票",TS代码=i[1][:-1]).first()
    b = matcher.match("股票",TS代码=i[2][:-1]).first()
    r = Relationship(a,str(i[3]),b)
    graph.create(r)
    print(i)
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/create_releationship.png?raw=true)

## 5 Visual data queries

The queries below are written in Cypher and use Ping An Bank (平安银行) as the example.

### 5.1 View all related entities

```cypher
match p=(m)-[]->(n) where m.股票名称="平安银行" or n.股票名称="平安银行" return p;
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/corr_node.png?raw=true)

### 5.2 Limit the number of results

After the log-return correlation coefficients between stocks have been added to the graph, view the entities related to the Ping An Bank stock:

```cypher
match p=(m)-[]->(n) where m.股票名称="平安银行" or n.股票名称="平安银行" return p limit 300;
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/corr_node_with_return.png?raw=true)

### 5.3 Log-return correlation between two specified stocks

```cypher
match p=(m)-[]->(n) where m.股票名称="平安银行" and n.股票名称="万科A" return p;
```

![](https://github.com/jm199504/Financial-Knowledge-Graphs/blob/master/images/nodes_pcc.png?raw=true)

## 6 neo4j graph algorithms

### 6.1 Centrality algorithms

- [PageRank](https://neo4j.com/docs/graph-algorithms/current/algorithms/page-rank/)

- [ArticleRank](https://neo4j.com/docs/graph-algorithms/current/algorithms/article-rank/)

- [Betweenness Centrality](https://neo4j.com/docs/graph-algorithms/current/algorithms/betweenness-centrality/)

- [Closeness Centrality](https://neo4j.com/docs/graph-algorithms/current/algorithms/closeness-centrality/)

- [Harmonic Centrality](https://neo4j.com/docs/graph-algorithms/current/algorithms/harmonic-centrality/)

### 6.2 Community detection algorithms

- [Louvain](https://neo4j.com/docs/graph-data-science/current/algorithms/louvain/)
- [Label Propagation](https://neo4j.com/docs/graph-algorithms/current/algorithms/label-propagation/)
- [Connected Components](https://neo4j.com/docs/graph-algorithms/current/algorithms/connected-components/)
- [Strongly Connected Components](https://neo4j.com/docs/graph-algorithms/current/algorithms/strongly-connected-components/)
- [Triangle Counting / Clustering Coefficient](https://neo4j.com/docs/graph-algorithms/current/algorithms/triangle-counting-clustering-coefficient/)

### 6.3 Path finding algorithms

- [Minimum Weight Spanning Tree](https://neo4j.com/docs/graph-algorithms/current/algorithms/minimum-weight-spanning-tree/)

- [Shortest Path](https://neo4j.com/docs/graph-algorithms/current/algorithms/shortest-path/)

- [Single Source Shortest Path](https://neo4j.com/docs/graph-algorithms/current/algorithms/single-source-shortest-path/)

- [All Pairs Shortest Path](https://neo4j.com/docs/graph-algorithms/current/algorithms/all-pairs-shortest-path/)

- [A*](https://neo4j.com/docs/graph-algorithms/current/algorithms/a_star/)

- [Yen’s K-shortest Paths](https://neo4j.com/docs/graph-algorithms/current/algorithms/yen-s-k-shortest-path/)

- [Random Walk](https://neo4j.com/docs/graph-algorithms/current/algorithms/random-walk/)

### 6.4 Similarity algorithms

- [Jaccard Similarity](https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-jaccard/)

- [Cosine Similarity](https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-cosine/)

- [Pearson Similarity](https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-pearson/)

- [Euclidean Distance](https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-euclidean/)

- [Overlap Similarity](https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-overlap/)

### 6.5 Link prediction

- [Adamic Adar (AA)](https://neo4j.com/docs/graph-algorithms/current/algorithms/linkprediction-adamic-adar/)

- [Common Neighbors](https://neo4j.com/docs/graph-algorithms/current/algorithms/linkprediction-common-neighbors/)

- [Preferential Attachment](https://neo4j.com/docs/graph-algorithms/current/algorithms/linkprediction-preferential-attachment/)

- [Resource Allocation](https://neo4j.com/docs/graph-algorithms/current/algorithms/linkprediction-resource-allocation/)

- [Same Community](https://neo4j.com/docs/graph-algorithms/current/algorithms/linkprediction-same-community/)

- [Total Neighbors](https://neo4j.com/docs/graph-algorithms/current/algorithms/linkprediction-total-neighbors/)

### 6.6 Preprocessing algorithms

- [One Hot Encoding](https://neo4j.com/docs/graph-algorithms/current/algorithms/one-hot-encoding/)

### 6.7 Installing and loading the algorithm library

Taking Windows as an example: the neo4j algorithm library is not bundled with the installer, so the algorithm package has to be downloaded separately:

(1) Download `graph-algorithms-algo-3.5.4.0.jar`

(2) Move `graph-algorithms-algo-3.5.4.0.jar` into the `plugins` directory under the neo4j database root

(3) Edit `neo4j.conf` in the `conf` directory of the neo4j database and add the following setting

```
dbms.security.procedures.unrestricted=algo.*
```

(4) List all available algorithms with the following command

```cypher
CALL algo.list()
```

### 6.8 Algorithms in practice: link prediction

#### 6.8.1 Adamic Adar algorithm

Adamic Adar scores how close two nodes are based on the neighbors they share. It was proposed in 2003 by Lada Adamic and Eytan Adar in [Friends and neighbors on the Web](https://www.sciencedirect.com/science/article/abs/pii/S0378873303000091?via%3Dihub). The closeness of two nodes is computed as:

$$A(x,y)=\sum_{u\in N(x)\cap N(y)}\frac{1}{\log|N(u)|}$$

where `N(u)` is the set of nodes adjacent to node `u`. `A(x,y) = 0` means that nodes x and y are not close; the larger the value, the closer the two nodes are.

Adamic Adar Cypher example:

```cypher
MERGE (zhen:Person {name: "Zhen"})
MERGE (praveena:Person {name: "Praveena"})
MERGE (michael:Person {name: "Michael"})
MERGE (arya:Person {name: "Arya"})
MERGE (karin:Person {name: "Karin"})

MERGE (zhen)-[:FRIENDS]-(arya)
MERGE (zhen)-[:FRIENDS]-(praveena)
MERGE (praveena)-[:WORKS_WITH]-(karin)
MERGE (praveena)-[:FRIENDS]-(michael)
MERGE (michael)-[:WORKS_WITH]-(karin)
MERGE (arya)-[:FRIENDS]-(karin)

// Compute the closeness score between Michael and Karin
MATCH (p1:Person {name: 'Michael'})
MATCH (p2:Person {name: 'Karin'})
RETURN algo.linkprediction.adamicAdar(p1, p2) AS score
// score: 0.910239

// Compute the score between Michael and Karin using only FRIENDS relationships
MATCH (p1:Person {name: 'Michael'})
MATCH (p2:Person {name: 'Karin'})
RETURN algo.linkprediction.adamicAdar(p1, p2, {relationshipQuery: "FRIENDS"}) AS score
// score: 0.0
```

#### 6.8.2 Common Neighbors

Common Neighbors counts the neighbors that two nodes share:

$$CN(x,y)=|N(x)\cap N(y)|$$

where `N(x)` is the set of nodes adjacent to node x and the common neighbors are the intersection of the two sets. The higher CN(x,y) is, the closer nodes x and y are.

Common Neighbors Cypher example:

```cypher
MATCH (p1:Person {name: 'Michael'})
MATCH (p2:Person {name: 'Karin'})
RETURN algo.linkprediction.commonNeighbors(p1, p2) AS score
```

---

#### 6.8.3 Resource Allocation

The resource allocation score is computed as follows:

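A reconstruction of the formula, assuming the standard resource allocation definition used by the Neo4j graph algorithms library (it mirrors Adamic Adar but without the logarithm). As a rough check on the toy graph from 6.8.1, Michael and Karin share a single neighbor, Praveena, who has three neighbors, so the score would be 1/3:

$$RA(x,y)=\sum_{u\in N(x)\cap N(y)}\frac{1}{|N(u)|}$$
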
where `N(u)` is the set of nodes adjacent to node `u`. The higher RA(x,y) is, the closer nodes x and y are.

Resource Allocation Cypher example:

```cypher
MATCH (p1:Person {name: 'Michael'})
MATCH (p2:Person {name: 'Karin'})
RETURN algo.linkprediction.resourceAllocation(p1, p2) AS score
```

#### 6.8.4 Total Neighbors

Total Neighbors is the total number of distinct neighbors of the two nodes:

$$TN(x,y)=|N(x)\cup N(y)|$$

where `N(u)` is the set of nodes adjacent to node `u`.

Total Neighbors Cypher example:

```cypher
MATCH (p1:Person {name: 'Michael'})
MATCH (p2:Person {name: 'Karin'})
RETURN algo.linkprediction.totalNeighbors(p1, p2) AS score
```

Official documentation for the link prediction algorithms (introduction): https://neo4j.com/docs/graph-algorithms/3.5/labs-algorithms/linkprediction/

--------------------------------------------------------------------------------
/code/python_to_neo4j.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 9, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from pandas import DataFrame\n", 10 | "from py2neo import Graph,Node,Relationship,NodeMatcher\n", 11 | "import pandas as pd\n", 12 | "import numpy as np\n", 13 | "import os\n", 14 | "# 连接Neo4j数据库\n", 15 | "graph = Graph('http://localhost:7474/db/data/',username='neo4j',password='123456')" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 11, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "data": { 25 | "text/plain": [ 26 | "" 27 | ] 28 | }, 29 | "execution_count": 11, 30 | "metadata": {}, 31 | "output_type": "execute_result" 32 | } 33 | ], 34 | "source": [ 35 | "a = Node('Person',name='Tom')\n", 36 | "graph.create(a)\n", 37 | "b = Node('Person',name='Bob')\n", 38 | "graph.create(b)\n", 39 | "\n", 40 | "# 创建关系例子\n", 41 | "r = Relationship(a,'KNOWS',b)\n", 42 | "graph.create(r)\n", 43 | "\n", 44 | "# 读取节点信息\n", 45 | "node = 
DataFrame(graph.run('MATCH (n:`Person`) RETURN n LIMIT 25'))\n", 46 | "# print(node)\n", 47 | "\n", 48 | "# 读取关系信息\n", 49 | "relation = DataFrame(graph.run('MATCH (n:`Person`)-[r]->(m:`Person`) return n,m,type(r)'))\n", 50 | "# print(relation)\n", 51 | "\n", 52 | "# 删除所有节点\n", 53 | "graph.run('MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r')" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 4, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# 读取数据\n", 63 | "stock = pd.read_csv('stock_basic.csv',encoding=\"gbk\")\n", 64 | "holder = pd.read_csv('holders.csv')\n", 65 | "concept_num = pd.read_csv('concept.csv')\n", 66 | "concept = pd.read_csv('stock_concept.csv')\n", 67 | "sh = pd.read_csv('sh.csv')\n", 68 | "sz = pd.read_csv('sz.csv')\n", 69 | "corr = pd.read_csv('corr.csv')" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 5, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "# 数据预处理\n", 79 | "stock['行业'] = stock['行业'].fillna('未知')\n", 80 | "holder = holder.drop_duplicates(subset=None, keep='first', inplace=False)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 6, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "# 创建实体(概念、股票、股东、股通)\n", 90 | "\n", 91 | "sz = Node('深股通',名字='深股通')\n", 92 | "graph.create(sz) \n", 93 | " \n", 94 | "sh = Node('沪股通',名字='沪股通')\n", 95 | "graph.create(sh) \n", 96 | "\n", 97 | "for i in concept_num.values:\n", 98 | " a = Node('概念',概念代码=i[1],概念名称=i[2])\n", 99 | " print('概念代码:'+str(i[1]),'概念名称:'+str(i[2]))\n", 100 | " graph.create(a)\n", 101 | "\n", 102 | "for i in stock.values:\n", 103 | " a = Node('股票',TS代码=i[1],股票名称=i[3],行业=i[4])\n", 104 | " print('TS代码:'+str(i[1]),'股票名称:'+str(i[3]),'行业:'+str(i[4]))\n", 105 | " graph.create(a)\n", 106 | "\n", 107 | "for i in holder.values:\n", 108 | " a = Node('股东',TS代码=i[0],股东名称=i[1],持股数量=i[2],持股比例=i[3])\n", 109 | " print('TS代码:'+str(i[0]),'股东名称:'+str(i[1]),'持股数量:'+str(i[2]))\n", 110 | " 
graph.create(a)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 7, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# 创建关系(股票-股东、股票-概念、股票-公告、股票-股通)\n", 120 | "\n", 121 | "matcher = NodeMatcher(graph) \n", 122 | "for i in holder.values: \n", 123 | " a = matcher.match(\"股票\",TS代码=i[0]).first()\n", 124 | " b = matcher.match(\"股东\",TS代码=i[0])\n", 125 | " for j in b:\n", 126 | " r = Relationship(j,'参股',a)\n", 127 | " graph.create(r)\n", 128 | " print('TS',str(i[0]))\n", 129 | " \n", 130 | "for i in concept.values:\n", 131 | " a = matcher.match(\"股票\",TS代码=i[3]).first()\n", 132 | " b = matcher.match(\"概念\",概念代码=i[1]).first()\n", 133 | " if a == None or b == None:\n", 134 | " continue\n", 135 | " r = Relationship(a,'概念属于',b)\n", 136 | " graph.create(r) \n", 137 | "\n", 138 | "noticesdir = os.listdir(\"notices\\\\\")\n", 139 | "for n in noticesdir:\n", 140 | " notice = pd.read_csv(\"notices\\\\\"+n,encoding=\"utf_8_sig\")\n", 141 | " notice['content'] = notice['content'].fillna('空白')\n", 142 | " for i in notice.values:\n", 143 | " a = matcher.match(\"股票\",TS代码=i[0]).first()\n", 144 | " b = Node('公告',日期=i[1],标题=i[2],内容=i[3])\n", 145 | " graph.create(b)\n", 146 | " r = Relationship(a,'发布公告',b)\n", 147 | " graph.create(r)\n", 148 | " print(str(i[0]))\n", 149 | " \n", 150 | "for i in sz.values:\n", 151 | " a = matcher.match(\"股票\",TS代码=i[0]).first()\n", 152 | " b = matcher.match(\"深股通\").first()\n", 153 | " r = Relationship(a,'成分股属于',b)\n", 154 | " graph.create(r)\n", 155 | " print('TS代码:'+str(i[1]),'--深股通')\n", 156 | "\n", 157 | "for i in sh.values:\n", 158 | " a = matcher.match(\"股票\",TS代码=i[0]).first()\n", 159 | " b = matcher.match(\"沪股通\").first()\n", 160 | " r = Relationship(a,'成分股属于',b)\n", 161 | " graph.create(r)\n", 162 | " print('TS代码:'+str(i[1]),'--沪股通')\n", 163 | "\n", 164 | "# 构建股票间关联\n", 165 | "corr = pd.read_csv(\"corr.csv\")\n", 166 | "for i in corr.values:\n", 167 | " a = 
matcher.match(\"股票\",TS代码=i[1][:-1]).first()\n", 168 | " b = matcher.match(\"股票\",TS代码=i[2][:-1]).first()\n", 169 | " r = Relationship(a,str(i[3]),b)\n", 170 | " graph.create(r)\n", 171 | " print(i)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [] 180 | } 181 | ], 182 | "metadata": { 183 | "kernelspec": { 184 | "display_name": "Python 3", 185 | "language": "python", 186 | "name": "python3" 187 | }, 188 | "language_info": { 189 | "codemirror_mode": { 190 | "name": "ipython", 191 | "version": 3 192 | }, 193 | "file_extension": ".py", 194 | "mimetype": "text/x-python", 195 | "name": "python", 196 | "nbconvert_exporter": "python", 197 | "pygments_lexer": "ipython3", 198 | "version": "3.7.8" 199 | } 200 | }, 201 | "nbformat": 4, 202 | "nbformat_minor": 2 203 | } 204 | -------------------------------------------------------------------------------- /code/stock_data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 3, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import tushare as ts\n", 10 | "import csv\n", 11 | "import time\n", 12 | "import pandas as pd\n", 13 | "pro = ts.pro_api('35d8848b876df93910413e8936c40745d7b7da42553ae73920862cd9')" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 24, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "start_date = '20200101'\n", 23 | "end_date = '20200301'\n", 24 | "ts_code = '000001.SZ'\n", 25 | "year='2020'" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### 1. 
股票数据获取(Tushare)" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "#### 1.1 股票基本信息" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 4, 45 | "metadata": {}, 46 | "outputs": [ 47 | { 48 | "data": { 49 | "text/html": [ 50 | "
\n", 51 | "\n", 64 | "\n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | "
\n", 112 | "
" 113 | ], 114 | "text/plain": [ 115 | " TS代码 股票代码 股票名称 行业\n", 116 | "0 000001.SZ 000001 平安银行 银行\n", 117 | "1 000002.SZ 000002 万科A 全国地产\n", 118 | "2 000004.SZ 000004 国华网安 软件服务\n", 119 | "3 000005.SZ 000005 ST星源 环境保护\n", 120 | "4 000006.SZ 000006 深振业A 区域地产" 121 | ] 122 | }, 123 | "execution_count": 4, 124 | "metadata": {}, 125 | "output_type": "execute_result" 126 | } 127 | ], 128 | "source": [ 129 | "stock_basic = pro.stock_basic(list_status='L', fields='ts_code, symbol, name, industry')\n", 130 | "\n", 131 | "# 重命名字段(便于后续导入neo4j)\n", 132 | "basic_rename = {'ts_code': 'TS代码', 'symbol': '股票代码', 'name': '股票名称', 'industry': '行业'}\n", 133 | "stock_basic.rename(columns=basic_rename, inplace=True)\n", 134 | "\n", 135 | "# 保存为stock_basic.csv\n", 136 | "stock_basic.to_csv('financial_data\\\\stock_basic.csv', encoding='gbk')\n", 137 | "\n", 138 | "stock_basic.head()" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "#### 1.2 股票Top10股东信息" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 6, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/html": [ 156 | "
\n", 157 | "\n", 170 | "\n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | "
\n", 230 | "
" 231 | ], 232 | "text/plain": [ 233 | " ts_code ann_date end_date holder_name hold_amount \\\n", 234 | "0 000001.SZ 20200421 20200331 中国平安保险(集团)股份有限公司-集团本级-自有资金 9.618540e+09 \n", 235 | "1 000001.SZ 20200421 20200331 香港中央结算有限公司(陆股通) 1.669875e+09 \n", 236 | "2 000001.SZ 20200421 20200331 中国平安人寿保险股份有限公司-自有资金 1.186100e+09 \n", 237 | "3 000001.SZ 20200421 20200331 中国平安人寿保险股份有限公司-传统-普通保险产品 4.404787e+08 \n", 238 | "4 000001.SZ 20200421 20200331 中国证券金融股份有限公司 4.292327e+08 \n", 239 | "\n", 240 | " hold_ratio \n", 241 | "0 49.56 \n", 242 | "1 8.60 \n", 243 | "2 6.11 \n", 244 | "3 2.27 \n", 245 | "4 2.21 " 246 | ] 247 | }, 248 | "execution_count": 6, 249 | "metadata": {}, 250 | "output_type": "execute_result" 251 | } 252 | ], 253 | "source": [ 254 | "holders = pd.DataFrame(columns=('ts_code', 'ann_date', 'end_date', 'holder_name', 'hold_amount', 'hold_ratio'))\n", 255 | "\n", 256 | "top10_holders = pro.top10_holders(ts_code=ts_code, start_date=start_date, end_date=end_date)\n", 257 | "\n", 258 | "top10_holders.head()" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "#### 1.3 股票概念信息" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 8, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "concept_details = pd.DataFrame(columns=('id', 'concept_name', 'ts_code', 'name'))\n", 275 | "\n", 276 | "# concept_detail = pro.concept_detail(id='TS0') # 该接口 TOKEN 受限\n", 277 | " \n", 278 | "# concept_detail.head()" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "#### 1.4 股票公告信息" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 12, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "# notices = pro.anns(ts_code=ts_code, start_date=start_date, end_date=end_date, year=year) # 该接口 TOKEN 受限\n", 295 | "\n", 296 | "# 
notices.to_csv(\"financial_data\\\\notices\\\\\"+str(code)+\".csv\",encoding='utf_8_sig',index=False)\n", 297 | "\n", 298 | "# notices.head()" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "#### 1.5 财经新闻信息" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 129, 311 | "metadata": {}, 312 | "outputs": [ 313 | { 314 | "data": { 315 | "text/html": [ 316 | "
\n", 317 | "\n", 330 | "\n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | "
\n", 372 | "
" 373 | ], 374 | "text/plain": [ 375 | " datetime content \\\n", 376 | "0 2018-11-22 10:09:00 【北京:10月末非金融企业贷款增速创历史同期新高 】据中国人民银行营业管理部11月22日披露... \n", 377 | "1 2018-11-22 10:08:29 【湖北省积极推动企业向科创板靠拢,年底将向上交所推荐一批优质企业】湖北省上市指导中心相关人士... \n", 378 | "2 2018-11-22 10:08:04 【上海:加快专利审查速度 进一步缩短专利授权周期】上海市人民政府发布关于加快本市高新技术企业... \n", 379 | "3 2018-11-22 10:04:35 【生意社:稀土价格走势或将上涨】11月20日稀土指数为344点,与上一日持平,较2011年1... \n", 380 | "4 2018-11-22 10:03:40 【上海:加快建设具有全球影响力的科技创新中心 做大做强高新技术产业】上海市人民政府发布关于加... \n", 381 | "\n", 382 | " title \n", 383 | "0 \n", 384 | "1 \n", 385 | "2 \n", 386 | "3 \n", 387 | "4 " 388 | ] 389 | }, 390 | "execution_count": 129, 391 | "metadata": {}, 392 | "output_type": "execute_result" 393 | } 394 | ], 395 | "source": [ 396 | "news = pro.news(src='sina', start_date='2018-11-21 09:00:00', end_date='2018-11-22 10:10:00')\n", 397 | "news.head()" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "#### 1.6 概念信息" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 19, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "# concept = pro.concept() # 该接口 TOKEN 受限\n", 414 | "\n", 415 | "# concept.head()" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "#### 1.7 沪股通成分和深股通成分信息 " 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 20, 428 | "metadata": {}, 429 | "outputs": [ 430 | { 431 | "data": { 432 | "text/html": [ 433 | "
\n", 434 | "\n", 447 | "\n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | "
\n", 501 | "
" 502 | ], 503 | "text/plain": [ 504 | " ts_code hs_type in_date out_date is_new\n", 505 | "0 601628.SH SH 20141117 None 1\n", 506 | "1 601099.SH SH 20141117 None 1\n", 507 | "2 601808.SH SH 20141117 None 1\n", 508 | "3 601107.SH SH 20141117 None 1\n", 509 | "4 601880.SH SH 20141117 None 1" 510 | ] 511 | }, 512 | "execution_count": 20, 513 | "metadata": {}, 514 | "output_type": "execute_result" 515 | } 516 | ], 517 | "source": [ 518 | "sh = pro.hs_const(hs_type='SH') # 获取沪股通成分\n", 519 | "\n", 520 | "sz = pro.hs_const(hs_type='SZ') # 获取深股通成分\n", 521 | "\n", 522 | "sh.head()" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "metadata": {}, 528 | "source": [ 529 | "#### 1.8 股票价格信息" 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "execution_count": 29, 535 | "metadata": {}, 536 | "outputs": [ 537 | { 538 | "data": { 539 | "text/html": [ 540 | "
\n", 541 | "\n", 554 | "\n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | "
\n", 644 | "
" 645 | ], 646 | "text/plain": [ 647 | " ts_code trade_date open high low close pre_close change \\\n", 648 | "0 000001.SZ 20200123 15.92 15.92 15.39 15.54 16.09 -0.55 \n", 649 | "1 000001.SZ 20200122 15.92 16.16 15.71 16.09 16.00 0.09 \n", 650 | "2 000001.SZ 20200121 16.34 16.34 15.93 16.00 16.45 -0.45 \n", 651 | "3 000001.SZ 20200120 16.43 16.61 16.35 16.45 16.39 0.06 \n", 652 | "4 000001.SZ 20200117 16.38 16.55 16.35 16.39 16.33 0.06 \n", 653 | "\n", 654 | " pct_chg vol amount \n", 655 | "0 -3.4183 1100592.07 1723394.336 \n", 656 | "1 0.5625 719464.91 1150933.398 \n", 657 | "2 -2.7356 896603.10 1442171.431 \n", 658 | "3 0.3661 746074.75 1226464.649 \n", 659 | "4 0.3674 605436.69 995909.007 " 660 | ] 661 | }, 662 | "execution_count": 29, 663 | "metadata": {}, 664 | "output_type": "execute_result" 665 | } 666 | ], 667 | "source": [ 668 | "end_date = '20200201' # 使用更短数据目的是1.9绘制走势图数值不重叠\n", 669 | "\n", 670 | "price = pro.query('daily', ts_code=ts_code, start_date=start_date, end_date=end_date)\n", 671 | "\n", 672 | "price.head()" 673 | ] 674 | }, 675 | { 676 | "cell_type": "markdown", 677 | "metadata": {}, 678 | "source": [ 679 | "#### 1.9 股票价格走势图" 680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": 34, 685 | "metadata": { 686 | "scrolled": false 687 | }, 688 | "outputs": [ 689 | { 690 | "data": { 691 | "text/html": [ 692 | "\n", 693 | "\n", 700 | "\n", 701 | "
\n", 702 | "\n", 703 | "\n" 946 | ], 947 | "text/plain": [ 948 | "" 949 | ] 950 | }, 951 | "execution_count": 34, 952 | "metadata": {}, 953 | "output_type": "execute_result" 954 | } 955 | ], 956 | "source": [ 957 | "from pyecharts.charts import Line\n", 958 | "from pyecharts import options as opts\n", 959 | "import numpy as np\n", 960 | "\n", 961 | "price = pro.query('daily', ts_code=ts_code, start_date=start_date, end_date=end_date)\n", 962 | "(\n", 963 | " Line()\n", 964 | " .add_xaxis(xaxis_data=list(price['trade_date'])[::-1])\n", 965 | " .add_yaxis(series_name=\"Close\",y_axis=list(price['close'])[::-1],symbol=\"circle\")\n", 966 | "# .add_yaxis(series_name=\"Open\",y_axis=list(price['open'])[::-1],symbol=\"circle\")\n", 967 | "# .add_yaxis(series_name=\"High\",y_axis=list(price['high'])[::-1],symbol=\"circle\")\n", 968 | "# .add_yaxis(series_name=\"Low\",y_axis=list(price['low'])[::-1],symbol=\"circle\")\n", 969 | " .set_global_opts(title_opts=opts.TitleOpts(title=\"Price trend chart\"))\n", 970 | " .render_notebook()\n", 971 | ")" 972 | ] 973 | }, 974 | { 975 | "cell_type": "markdown", 976 | "metadata": {}, 977 | "source": [ 978 | "#### 1.10 Fundamental data" 979 | ] 980 | }, 981 | { 982 | "cell_type": "code", 983 | "execution_count": 110, 984 | "metadata": {}, 985 | "outputs": [ 986 | { 987 | "data": { 988 | "text/html": [ 989 | "
\n", 990 | "\n", 1003 | "\n", 1004 | " \n", 1005 | " \n", 1006 | " \n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | " \n", 1014 | " \n", 1015 | " \n", 1016 | " \n", 1017 | " \n", 1018 | " \n", 1019 | " \n", 1020 | " \n", 1021 | " \n", 1022 | " \n", 1023 | " \n", 1024 | " \n", 1025 | " \n", 1026 | " \n", 1027 | " \n", 1028 | " \n", 1029 | " \n", 1030 | " \n", 1031 | " \n", 1032 | " \n", 1033 | " \n", 1034 | " \n", 1035 | " \n", 1036 | " \n", 1037 | " \n", 1038 | " \n", 1039 | " \n", 1040 | " \n", 1041 | " \n", 1042 | " \n", 1043 | " \n", 1044 | " \n", 1045 | " \n", 1046 | " \n", 1047 | " \n", 1048 | " \n", 1049 | " \n", 1050 | " \n", 1051 | " \n", 1052 | " \n", 1053 | " \n", 1054 | " \n", 1055 | " \n", 1056 | " \n", 1057 | " \n", 1058 | " \n", 1059 | " \n", 1060 | " \n", 1061 | " \n", 1062 | " \n", 1063 | " \n", 1064 | " \n", 1065 | " \n", 1066 | " \n", 1067 | " \n", 1068 | "
\n", 1069 | "
" 1070 | ], 1071 | "text/plain": [ 1072 | " ts_code chairman manager secretary reg_capital setup_date province\n", 1073 | "0 300952.SZ 王咸华 王咸华 张武芬 14492.7653 20040415 江苏\n", 1074 | "1 300268.SZ 汤捷 汤捷 杨振刚 17420.0000 20030508 湖南\n", 1075 | "2 300447.SZ 陈祥楼 何亮 孙璐 29106.9255 20010929 江苏\n", 1076 | "3 300451.SZ 葛航 张吕峥 胡燕 119260.8104 19971210 浙江\n", 1077 | "4 002531.SZ 严俊旭 严俊旭 朱彬 180250.9062 20050118 江苏" 1078 | ] 1079 | }, 1080 | "execution_count": 110, 1081 | "metadata": {}, 1082 | "output_type": "execute_result" 1083 | } 1084 | ], 1085 | "source": [ 1086 | "stock_company = pro.stock_company(exchange='SZSE', fields='ts_code,chairman,manager,secretary,reg_capital,setup_date,province')\n", 1087 | "stock_company.head()" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "code", 1092 | "execution_count": 113, 1093 | "metadata": {}, 1094 | "outputs": [ 1095 | { 1096 | "data": { 1097 | "text/html": [ 1098 | "
\n", 1099 | "\n", 1112 | "\n", 1113 | " \n", 1114 | " \n", 1115 | " \n", 1116 | " \n", 1117 | " \n", 1118 | " \n", 1119 | " \n", 1120 | " \n", 1121 | " \n", 1122 | " \n", 1123 | " \n", 1124 | " \n", 1125 | " \n", 1126 | " \n", 1127 | " \n", 1128 | " \n", 1129 | " \n", 1130 | " \n", 1131 | " \n", 1132 | " \n", 1133 | " \n", 1134 | " \n", 1135 | " \n", 1136 | " \n", 1137 | " \n", 1138 | " \n", 1139 | " \n", 1140 | " \n", 1141 | " \n", 1142 | " \n", 1143 | " \n", 1144 | " \n", 1145 | " \n", 1146 | " \n", 1147 | " \n", 1148 | " \n", 1149 | " \n", 1150 | " \n", 1151 | " \n", 1152 | " \n", 1153 | " \n", 1154 | " \n", 1155 | " \n", 1156 | " \n", 1157 | " \n", 1158 | " \n", 1159 | "
\n", 1160 | "
" 1161 | ], 1162 | "text/plain": [ 1163 | " title type date \\\n", 1164 | "0 平安银行:平安银行股份有限公司2020年年度权益分派实施公告2021-05-07 临时公告 2021-05-07 \n", 1165 | "1 平安银行:2021年第一季度报告全文 一季度报告 2021-04-21 \n", 1166 | "2 平安银行:一季报监事会决议公告 临时公告 2021-04-21 \n", 1167 | "3 平安银行:一季报董事会决议公告 临时公告 2021-04-21 \n", 1168 | "4 平安银行:2021年第一季度报告正文 一季度报告(摘要) 2021-04-21 \n", 1169 | "\n", 1170 | " url \n", 1171 | "0 http://vip.stock.finance.sina.com.cn/corp/view... \n", 1172 | "1 http://vip.stock.finance.sina.com.cn/corp/view... \n", 1173 | "2 http://vip.stock.finance.sina.com.cn/corp/view... \n", 1174 | "3 http://vip.stock.finance.sina.com.cn/corp/view... \n", 1175 | "4 http://vip.stock.finance.sina.com.cn/corp/view... " 1176 | ] 1177 | }, 1178 | "execution_count": 113, 1179 | "metadata": {}, 1180 | "output_type": "execute_result" 1181 | } 1182 | ], 1183 | "source": [ 1184 | "import tushare as ts\n", 1185 | "ts.get_notices(\"000001\").head()" 1186 | ] 1187 | }, 1188 | { 1189 | "cell_type": "code", 1190 | "execution_count": 114, 1191 | "metadata": {}, 1192 | "outputs": [ 1193 | { 1194 | "data": { 1195 | "text/html": [ 1196 | "
\n", 1197 | "\n", 1210 | "\n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | "
\n", 1252 | "
" 1253 | ], 1254 | "text/plain": [ 1255 | " title ptime rcounts\n", 1256 | "0 机构减仓或在战役性转移 05月07日 14:57 360.0\n", 1257 | "1 170家电子行业公司发布年度业绩预告 0.0\n", 1258 | "2 券商开户专属通道:新客专享理财福利多多 0.0\n", 1259 | "3 热门股票如何选择盘中介入点 05月07日 07:40 419.0\n", 1260 | "4 节后综合症再现 依然不乏结构性机会 05月07日 08:03 507.0" 1261 | ] 1262 | }, 1263 | "execution_count": 114, 1264 | "metadata": {}, 1265 | "output_type": "execute_result" 1266 | } 1267 | ], 1268 | "source": [ 1269 | "ts.guba_sina().head()" 1270 | ] 1271 | }, 1272 | { 1273 | "cell_type": "code", 1274 | "execution_count": 115, 1275 | "metadata": {}, 1276 | "outputs": [ 1277 | { 1278 | "name": "stdout", 1279 | "output_type": "stream", 1280 | "text": [ 1281 | "本接口即将停止更新,请尽快使用Pro版接口:https://tushare.pro/document/2\n" 1282 | ] 1283 | }, 1284 | { 1285 | "data": { 1286 | "text/html": [ 1287 | "
\n", 1288 | "\n", 1301 | "\n", 1302 | " \n", 1303 | " \n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | " \n", 1309 | " \n", 1310 | " \n", 1311 | " \n", 1312 | " \n", 1313 | " \n", 1314 | " \n", 1315 | " \n", 1316 | " \n", 1317 | " \n", 1318 | " \n", 1319 | " \n", 1320 | " \n", 1321 | " \n", 1322 | " \n", 1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | " \n", 1357 | " \n", 1358 | " \n", 1359 | " \n", 1360 | " \n", 1361 | " \n", 1362 | " \n", 1363 | " \n", 1364 | " \n", 1365 | " \n", 1366 | " \n", 1367 | " \n", 1368 | " \n", 1369 | " \n", 1370 | " \n", 1371 | " \n", 1372 | " \n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | " \n", 1377 | " \n", 1378 | " \n", 1379 | " \n", 1380 | " \n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | "
\n", 1426 | "
" 1427 | ], 1428 | "text/plain": [ 1429 | " open high close low volume price_change p_change \\\n", 1430 | "date \n", 1431 | "2021-05-07 23.67 24.30 24.05 23.39 802214.25 0.55 2.34 \n", 1432 | "2021-05-06 23.10 23.70 23.50 23.10 500295.19 0.21 0.90 \n", 1433 | "2021-04-30 23.35 23.49 23.29 23.01 561981.31 -0.30 -1.27 \n", 1434 | "2021-04-29 23.34 23.71 23.59 23.11 614836.88 0.24 1.03 \n", 1435 | "2021-04-28 23.29 23.45 23.35 22.78 593837.94 0.41 1.79 \n", 1436 | "\n", 1437 | " ma5 ma10 ma20 v_ma5 v_ma10 v_ma20 turnover \n", 1438 | "date \n", 1439 | "2021-05-07 23.556 23.294 22.153 614633.11 766603.84 700531.86 0.41 \n", 1440 | "2021-05-06 23.334 23.058 22.034 548276.55 769859.69 680476.29 0.26 \n", 1441 | "2021-04-30 23.222 22.823 21.934 622701.05 831058.44 684515.71 0.29 \n", 1442 | "2021-04-29 23.222 22.520 21.859 674950.82 847860.13 683639.09 0.32 \n", 1443 | "2021-04-28 23.100 22.197 21.780 717922.23 859962.95 691748.65 0.31 " 1444 | ] 1445 | }, 1446 | "execution_count": 115, 1447 | "metadata": {}, 1448 | "output_type": "execute_result" 1449 | } 1450 | ], 1451 | "source": [ 1452 | "ts.get_hist_data(\"000001\").head()" 1453 | ] 1454 | }, 1455 | { 1456 | "cell_type": "code", 1457 | "execution_count": 116, 1458 | "metadata": {}, 1459 | "outputs": [ 1460 | { 1461 | "name": "stdout", 1462 | "output_type": "stream", 1463 | "text": [ 1464 | "本接口即将停止更新,请尽快使用Pro版接口:https://tushare.pro/document/2\n" 1465 | ] 1466 | }, 1467 | { 1468 | "data": { 1469 | "text/html": [ 1470 | "
\n", 1471 | "\n", 1484 | "\n", 1485 | " \n", 1486 | " \n", 1487 | " \n", 1488 | " \n", 1489 | " \n", 1490 | " \n", 1491 | " \n", 1492 | " \n", 1493 | " \n", 1494 | " \n", 1495 | " \n", 1496 | " \n", 1497 | " \n", 1498 | " \n", 1499 | " \n", 1500 | " \n", 1501 | " \n", 1502 | " \n", 1503 | " \n", 1504 | " \n", 1505 | " \n", 1506 | " \n", 1507 | " \n", 1508 | " \n", 1509 | " \n", 1510 | " \n", 1511 | " \n", 1512 | " \n", 1513 | " \n", 1514 | " \n", 1515 | " \n", 1516 | " \n", 1517 | " \n", 1518 | " \n", 1519 | " \n", 1520 | " \n", 1521 | " \n", 1522 | " \n", 1523 | " \n", 1524 | " \n", 1525 | " \n", 1526 | " \n", 1527 | " \n", 1528 | " \n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | " \n", 1558 | " \n", 1559 | " \n", 1560 | " \n", 1561 | " \n", 1562 | " \n", 1563 | " \n", 1564 | " \n", 1565 | " \n", 1566 | " \n", 1567 | " \n", 1568 | " \n", 1569 | " \n", 1570 | " \n", 1571 | " \n", 1572 | " \n", 1573 | " \n", 1574 | " \n", 1575 | " \n", 1576 | " \n", 1577 | " \n", 1578 | " \n", 1579 | " \n", 1580 | " \n", 1581 | " \n", 1582 | " \n", 1583 | " \n", 1584 | " \n", 1585 | " \n", 1586 | " \n", 1587 | " \n", 1588 | " \n", 1589 | " \n", 1590 | " \n", 1591 | " \n", 1592 | " \n", 1593 | " \n", 1594 | " \n", 1595 | " \n", 1596 | " \n", 1597 | " \n", 1598 | " \n", 1599 | " \n", 1600 | " \n", 1601 | " \n", 1602 | " \n", 1603 | " \n", 1604 | " \n", 1605 | " \n", 1606 | " \n", 1607 | " \n", 1608 | "
\n", 1609 | "
" 1610 | ], 1611 | "text/plain": [ 1612 | " open high close low volume price_change p_change \\\n", 1613 | "date \n", 1614 | "2021-05-07 23.10 24.30 24.05 23.10 1302509.50 0.76 3.26 \n", 1615 | "2021-04-30 23.87 24.23 23.29 22.78 3113505.25 0.00 0.00 \n", 1616 | "2021-04-23 20.03 23.65 23.29 19.91 5197079.00 3.03 14.96 \n", 1617 | "2021-04-16 21.51 21.51 20.26 19.81 3097949.00 -1.04 -4.88 \n", 1618 | "2021-04-09 21.55 22.09 21.30 21.08 1700697.12 -0.20 -0.93 \n", 1619 | "\n", 1620 | " ma5 ma10 ma20 v_ma5 v_ma10 v_ma20 \\\n", 1621 | "date \n", 1622 | "2021-05-07 22.438 21.912 21.823 2882347.97 3893787.24 4866394.11 \n", 1623 | "2021-04-30 21.928 21.645 21.538 3307222.47 4560954.59 5004209.48 \n", 1624 | "2021-04-23 21.498 21.701 21.300 3544311.92 4492385.46 5036850.99 \n", 1625 | "2021-04-16 20.934 21.754 21.100 3492326.92 4367200.06 5044014.37 \n", 1626 | "2021-04-09 21.178 22.221 21.072 3922926.32 4865955.41 5078637.83 \n", 1627 | "\n", 1628 | " turnover \n", 1629 | "date \n", 1630 | "2021-05-07 0.67 \n", 1631 | "2021-04-30 1.60 \n", 1632 | "2021-04-23 2.68 \n", 1633 | "2021-04-16 1.60 \n", 1634 | "2021-04-09 0.88 " 1635 | ] 1636 | }, 1637 | "execution_count": 116, 1638 | "metadata": {}, 1639 | "output_type": "execute_result" 1640 | } 1641 | ], 1642 | "source": [ 1643 | "ts.get_hist_data(\"000001\",ktype=\"w\").head()" 1644 | ] 1645 | }, 1646 | { 1647 | "cell_type": "code", 1648 | "execution_count": 117, 1649 | "metadata": {}, 1650 | "outputs": [ 1651 | { 1652 | "name": "stdout", 1653 | "output_type": "stream", 1654 | "text": [ 1655 | "本接口即将停止更新,请尽快使用Pro版接口:https://tushare.pro/document/2\n" 1656 | ] 1657 | }, 1658 | { 1659 | "data": { 1660 | "text/html": [ 1661 | "
\n", 1662 | "\n", 1675 | "\n", 1676 | " \n", 1677 | " \n", 1678 | " \n", 1679 | " \n", 1680 | " \n", 1681 | " \n", 1682 | " \n", 1683 | " \n", 1684 | " \n", 1685 | " \n", 1686 | " \n", 1687 | " \n", 1688 | " \n", 1689 | " \n", 1690 | " \n", 1691 | " \n", 1692 | " \n", 1693 | " \n", 1694 | " \n", 1695 | " \n", 1696 | " \n", 1697 | " \n", 1698 | " \n", 1699 | " \n", 1700 | " \n", 1701 | " \n", 1702 | " \n", 1703 | " \n", 1704 | " \n", 1705 | " \n", 1706 | " \n", 1707 | " \n", 1708 | " \n", 1709 | " \n", 1710 | " \n", 1711 | " \n", 1712 | " \n", 1713 | " \n", 1714 | " \n", 1715 | " \n", 1716 | " \n", 1717 | " \n", 1718 | " \n", 1719 | " \n", 1720 | " \n", 1721 | " \n", 1722 | " \n", 1723 | " \n", 1724 | " \n", 1725 | " \n", 1726 | " \n", 1727 | " \n", 1728 | " \n", 1729 | " \n", 1730 | " \n", 1731 | " \n", 1732 | " \n", 1733 | " \n", 1734 | " \n", 1735 | " \n", 1736 | " \n", 1737 | " \n", 1738 | " \n", 1739 | " \n", 1740 | " \n", 1741 | " \n", 1742 | " \n", 1743 | " \n", 1744 | " \n", 1745 | " \n", 1746 | " \n", 1747 | " \n", 1748 | " \n", 1749 | " \n", 1750 | " \n", 1751 | " \n", 1752 | " \n", 1753 | " \n", 1754 | " \n", 1755 | " \n", 1756 | " \n", 1757 | " \n", 1758 | " \n", 1759 | " \n", 1760 | " \n", 1761 | " \n", 1762 | " \n", 1763 | " \n", 1764 | " \n", 1765 | " \n", 1766 | " \n", 1767 | " \n", 1768 | " \n", 1769 | " \n", 1770 | " \n", 1771 | " \n", 1772 | " \n", 1773 | " \n", 1774 | " \n", 1775 | " \n", 1776 | " \n", 1777 | " \n", 1778 | " \n", 1779 | " \n", 1780 | " \n", 1781 | " \n", 1782 | " \n", 1783 | " \n", 1784 | " \n", 1785 | " \n", 1786 | " \n", 1787 | " \n", 1788 | " \n", 1789 | " \n", 1790 | " \n", 1791 | " \n", 1792 | " \n", 1793 | " \n", 1794 | " \n", 1795 | " \n", 1796 | " \n", 1797 | " \n", 1798 | " \n", 1799 | "
\n", 1800 | "
" 1801 | ], 1802 | "text/plain": [ 1803 | " open high close low volume price_change p_change \\\n", 1804 | "date \n", 1805 | "2021-05-07 23.10 24.30 24.05 23.10 1302509.5 0.76 3.26 \n", 1806 | "2021-04-30 22.08 24.23 23.29 19.81 14234763.0 1.28 5.82 \n", 1807 | "2021-03-31 21.54 23.49 22.01 20.28 23400600.0 0.63 2.95 \n", 1808 | "2021-02-26 23.00 25.31 21.38 21.21 22432726.0 -1.71 -7.41 \n", 1809 | "2021-01-29 19.10 23.54 23.09 17.80 27925126.0 3.75 19.39 \n", 1810 | "\n", 1811 | " ma5 ma10 ma20 v_ma5 v_ma10 v_ma20 \\\n", 1812 | "date \n", 1813 | "2021-05-07 22.764 20.090 17.241 17859144.9 20308849.85 20748658.73 \n", 1814 | "2021-04-30 21.822 19.019 16.818 21521885.8 24854576.10 21952860.65 \n", 1815 | "2021-03-31 21.112 17.970 16.361 22789929.6 25258599.60 22737767.70 \n", 1816 | "2021-02-26 20.260 17.069 15.967 22277667.2 24135001.50 22393905.60 \n", 1817 | "2021-01-29 19.018 16.324 15.587 22094764.0 23598847.10 22103348.45 \n", 1818 | "\n", 1819 | " turnover \n", 1820 | "date \n", 1821 | "2021-05-07 0.67 \n", 1822 | "2021-04-30 7.34 \n", 1823 | "2021-03-31 12.06 \n", 1824 | "2021-02-26 11.56 \n", 1825 | "2021-01-29 14.39 " 1826 | ] 1827 | }, 1828 | "execution_count": 117, 1829 | "metadata": {}, 1830 | "output_type": "execute_result" 1831 | } 1832 | ], 1833 | "source": [ 1834 | "ts.get_hist_data(\"000001\",ktype=\"m\").head()" 1835 | ] 1836 | }, 1837 | { 1838 | "cell_type": "code", 1839 | "execution_count": 118, 1840 | "metadata": {}, 1841 | "outputs": [ 1842 | { 1843 | "name": "stdout", 1844 | "output_type": "stream", 1845 | "text": [ 1846 | "本接口即将停止更新,请尽快使用Pro版接口:https://tushare.pro/document/2\n" 1847 | ] 1848 | }, 1849 | { 1850 | "data": { 1851 | "text/html": [ 1852 | "
\n", 1853 | "\n", 1866 | "\n", 1867 | " \n", 1868 | " \n", 1869 | " \n", 1870 | " \n", 1871 | " \n", 1872 | " \n", 1873 | " \n", 1874 | " \n", 1875 | " \n", 1876 | " \n", 1877 | " \n", 1878 | " \n", 1879 | " \n", 1880 | " \n", 1881 | " \n", 1882 | " \n", 1883 | " \n", 1884 | " \n", 1885 | " \n", 1886 | " \n", 1887 | " \n", 1888 | " \n", 1889 | " \n", 1890 | " \n", 1891 | " \n", 1892 | " \n", 1893 | " \n", 1894 | " \n", 1895 | " \n", 1896 | " \n", 1897 | " \n", 1898 | " \n", 1899 | " \n", 1900 | " \n", 1901 | " \n", 1902 | " \n", 1903 | " \n", 1904 | " \n", 1905 | " \n", 1906 | " \n", 1907 | " \n", 1908 | " \n", 1909 | " \n", 1910 | " \n", 1911 | " \n", 1912 | " \n", 1913 | " \n", 1914 | " \n", 1915 | " \n", 1916 | " \n", 1917 | " \n", 1918 | " \n", 1919 | " \n", 1920 | " \n", 1921 | " \n", 1922 | " \n", 1923 | " \n", 1924 | " \n", 1925 | " \n", 1926 | " \n", 1927 | " \n", 1928 | " \n", 1929 | " \n", 1930 | " \n", 1931 | " \n", 1932 | " \n", 1933 | " \n", 1934 | " \n", 1935 | " \n", 1936 | " \n", 1937 | " \n", 1938 | " \n", 1939 | " \n", 1940 | " \n", 1941 | " \n", 1942 | " \n", 1943 | " \n", 1944 | " \n", 1945 | " \n", 1946 | " \n", 1947 | " \n", 1948 | " \n", 1949 | " \n", 1950 | " \n", 1951 | " \n", 1952 | " \n", 1953 | " \n", 1954 | " \n", 1955 | " \n", 1956 | " \n", 1957 | " \n", 1958 | " \n", 1959 | " \n", 1960 | " \n", 1961 | " \n", 1962 | " \n", 1963 | " \n", 1964 | " \n", 1965 | " \n", 1966 | " \n", 1967 | " \n", 1968 | " \n", 1969 | " \n", 1970 | " \n", 1971 | " \n", 1972 | " \n", 1973 | " \n", 1974 | " \n", 1975 | " \n", 1976 | " \n", 1977 | " \n", 1978 | " \n", 1979 | " \n", 1980 | " \n", 1981 | " \n", 1982 | " \n", 1983 | " \n", 1984 | " \n", 1985 | " \n", 1986 | " \n", 1987 | " \n", 1988 | " \n", 1989 | " \n", 1990 | "
\n", 1991 | "
" 1992 | ], 1993 | "text/plain": [ 1994 | " open high close low volume price_change \\\n", 1995 | "date \n", 1996 | "2021-05-07 15:00:00 24.01 24.03 24.01 23.98 14157.80 0.00 \n", 1997 | "2021-05-07 14:55:00 24.03 24.04 24.01 23.96 14808.00 -0.02 \n", 1998 | "2021-05-07 14:50:00 24.08 24.10 24.02 23.96 20615.30 -0.06 \n", 1999 | "2021-05-07 14:45:00 24.01 24.08 24.08 24.01 9387.44 0.07 \n", 2000 | "2021-05-07 14:40:00 24.00 24.03 24.02 23.95 8656.08 0.02 \n", 2001 | "\n", 2002 | " p_change ma5 ma10 ma20 v_ma5 v_ma10 \\\n", 2003 | "date \n", 2004 | "2021-05-07 15:00:00 0.00 24.028 23.998 24.0555 13524.9 12686.8 \n", 2005 | "2021-05-07 14:55:00 -0.08 24.024 24.002 24.0630 12897.5 13060.2 \n", 2006 | "2021-05-07 14:50:00 -0.25 23.998 24.011 24.0670 12849.7 13332.0 \n", 2007 | "2021-05-07 14:45:00 0.29 23.990 24.012 24.0680 10501.1 12905.3 \n", 2008 | "2021-05-07 14:40:00 0.08 23.978 24.012 24.0655 11153.4 12820.4 \n", 2009 | "\n", 2010 | " v_ma20 turnover \n", 2011 | "date \n", 2012 | "2021-05-07 15:00:00 11663.7 0.01 \n", 2013 | "2021-05-07 14:55:00 11526.3 0.01 \n", 2014 | "2021-05-07 14:50:00 11336.4 0.01 \n", 2015 | "2021-05-07 14:45:00 11421.0 0.01 \n", 2016 | "2021-05-07 14:40:00 11752.8 0.01 " 2017 | ] 2018 | }, 2019 | "execution_count": 118, 2020 | "metadata": {}, 2021 | "output_type": "execute_result" 2022 | } 2023 | ], 2024 | "source": [ 2025 | "ts.get_hist_data(\"000001\",ktype=\"5\").head() # 5min (15,30,60)" 2026 | ] 2027 | }, 2028 | { 2029 | "cell_type": "code", 2030 | "execution_count": 119, 2031 | "metadata": {}, 2032 | "outputs": [ 2033 | { 2034 | "name": "stdout", 2035 | "output_type": "stream", 2036 | "text": [ 2037 | "本接口即将停止更新,请尽快使用Pro版接口:https://tushare.pro/document/2\n" 2038 | ] 2039 | }, 2040 | { 2041 | "data": { 2042 | "text/html": [ 2043 | "
\n", 2044 | "\n", 2057 | "\n", 2058 | " \n", 2059 | " \n", 2060 | " \n", 2061 | " \n", 2062 | " \n", 2063 | " \n", 2064 | " \n", 2065 | " \n", 2066 | " \n", 2067 | " \n", 2068 | " \n", 2069 | " \n", 2070 | " \n", 2071 | " \n", 2072 | " \n", 2073 | " \n", 2074 | " \n", 2075 | " \n", 2076 | " \n", 2077 | " \n", 2078 | " \n", 2079 | " \n", 2080 | " \n", 2081 | " \n", 2082 | " \n", 2083 | " \n", 2084 | " \n", 2085 | " \n", 2086 | " \n", 2087 | " \n", 2088 | " \n", 2089 | " \n", 2090 | " \n", 2091 | " \n", 2092 | " \n", 2093 | " \n", 2094 | " \n", 2095 | " \n", 2096 | " \n", 2097 | " \n", 2098 | " \n", 2099 | " \n", 2100 | " \n", 2101 | " \n", 2102 | " \n", 2103 | " \n", 2104 | " \n", 2105 | " \n", 2106 | " \n", 2107 | " \n", 2108 | " \n", 2109 | " \n", 2110 | " \n", 2111 | " \n", 2112 | " \n", 2113 | " \n", 2114 | " \n", 2115 | " \n", 2116 | " \n", 2117 | " \n", 2118 | " \n", 2119 | " \n", 2120 | " \n", 2121 | " \n", 2122 | " \n", 2123 | " \n", 2124 | " \n", 2125 | " \n", 2126 | " \n", 2127 | " \n", 2128 | " \n", 2129 | " \n", 2130 | " \n", 2131 | " \n", 2132 | " \n", 2133 | " \n", 2134 | " \n", 2135 | " \n", 2136 | " \n", 2137 | " \n", 2138 | " \n", 2139 | " \n", 2140 | " \n", 2141 | " \n", 2142 | " \n", 2143 | " \n", 2144 | " \n", 2145 | " \n", 2146 | " \n", 2147 | " \n", 2148 | " \n", 2149 | " \n", 2150 | " \n", 2151 | " \n", 2152 | " \n", 2153 | " \n", 2154 | " \n", 2155 | " \n", 2156 | " \n", 2157 | " \n", 2158 | " \n", 2159 | " \n", 2160 | " \n", 2161 | " \n", 2162 | " \n", 2163 | " \n", 2164 | " \n", 2165 | " \n", 2166 | " \n", 2167 | " \n", 2168 | " \n", 2169 | " \n", 2170 | " \n", 2171 | " \n", 2172 | " \n", 2173 | " \n", 2174 | "
\n", 2175 | "
" 2176 | ], 2177 | "text/plain": [ 2178 | " open high close low volume price_change \\\n", 2179 | "date \n", 2180 | "2021-05-07 3021.87 3026.85 2910.41 2910.41 14489343.0 -104.40 \n", 2181 | "2021-05-06 3059.58 3059.58 3014.81 2961.99 16552269.0 -76.59 \n", 2182 | "2021-04-30 3040.65 3110.16 3091.40 3037.31 16606502.0 39.98 \n", 2183 | "2021-04-29 3053.74 3068.60 3051.42 3014.55 16647988.0 0.84 \n", 2184 | "2021-04-28 2990.70 3052.04 3050.58 2980.65 17925402.0 64.55 \n", 2185 | "\n", 2186 | " p_change ma5 ma10 ma20 v_ma5 v_ma10 \\\n", 2187 | "date \n", 2188 | "2021-05-07 -3.46 3023.724 2992.490 2898.924 16444300.8 16648376.3 \n", 2189 | "2021-05-06 -2.48 3038.848 2991.060 2894.984 16974533.4 16827430.9 \n", 2190 | "2021-04-30 1.31 3029.766 2979.452 2886.855 17752532.8 17050663.6 \n", 2191 | "2021-04-29 0.03 3010.384 2948.649 2873.055 17834241.6 16717391.8 \n", 2192 | "2021-04-28 2.16 2987.978 2922.571 2858.409 17635713.4 16185039.8 \n", 2193 | "\n", 2194 | " v_ma20 \n", 2195 | "date \n", 2196 | "2021-05-07 15426893.45 \n", 2197 | "2021-05-06 15346704.55 \n", 2198 | "2021-04-30 15277921.70 \n", 2199 | "2021-04-29 15101499.00 \n", 2200 | "2021-04-28 14900826.45 " 2201 | ] 2202 | }, 2203 | "execution_count": 119, 2204 | "metadata": {}, 2205 | "output_type": "execute_result" 2206 | } 2207 | ], 2208 | "source": [ 2209 | "# sh上证指数; sz深圳成指; hs300沪深300; sz50上证50; zxb中小板指数; cyb创业板指数\n", 2210 | "ts.get_hist_data(\"cyb\").head()" 2211 | ] 2212 | }, 2213 | { 2214 | "cell_type": "markdown", 2215 | "metadata": {}, 2216 | "source": [ 2217 | "### 2.股票数据获取(巨宽数据源)" 2218 | ] 2219 | }, 2220 | { 2221 | "cell_type": "code", 2222 | "execution_count": 40, 2223 | "metadata": {}, 2224 | "outputs": [], 2225 | "source": [ 2226 | "from jqdatasdk import *\n", 2227 | "\n", 2228 | "auth('18280180192', 'Tencent123') # 请自行前往 https://www.joinquant.com/ 免费申请TOKEN\n", 2229 | "\n", 2230 | "security = '000300.XSHG'\n", 2231 | "start_date = '2020-01-01'\n", 2232 | "end_date = '2020-03-01'" 2233 | ] 2234 
| }, 2235 | { 2236 | "cell_type": "code", 2237 | "execution_count": 42, 2238 | "metadata": {}, 2239 | "outputs": [ 2240 | { 2241 | "data": { 2242 | "text/html": [ 2243 | "
\n", 2244 | "\n", 2257 | "\n", 2258 | " \n", 2259 | " \n", 2260 | " \n", 2261 | " \n", 2262 | " \n", 2263 | " \n", 2264 | " \n", 2265 | " \n", 2266 | " \n", 2267 | " \n", 2268 | " \n", 2269 | " \n", 2270 | " \n", 2271 | " \n", 2272 | " \n", 2273 | " \n", 2274 | " \n", 2275 | " \n", 2276 | " \n", 2277 | " \n", 2278 | " \n", 2279 | " \n", 2280 | " \n", 2281 | " \n", 2282 | " \n", 2283 | " \n", 2284 | " \n", 2285 | " \n", 2286 | " \n", 2287 | " \n", 2288 | " \n", 2289 | " \n", 2290 | " \n", 2291 | " \n", 2292 | " \n", 2293 | " \n", 2294 | " \n", 2295 | " \n", 2296 | " \n", 2297 | " \n", 2298 | " \n", 2299 | " \n", 2300 | " \n", 2301 | " \n", 2302 | " \n", 2303 | " \n", 2304 | " \n", 2305 | " \n", 2306 | " \n", 2307 | " \n", 2308 | " \n", 2309 | " \n", 2310 | " \n", 2311 | " \n", 2312 | " \n", 2313 | " \n", 2314 | " \n", 2315 | " \n", 2316 | "
\n", 2317 | "
" 2318 | ], 2319 | "text/plain": [ 2320 | " open close high low volume money\n", 2321 | "2020-01-02 4121.35 4152.24 4172.66 4121.35 1.821168e+10 2.701055e+11\n", 2322 | "2020-01-03 4161.22 4144.96 4164.30 4131.86 1.428262e+10 2.152163e+11\n", 2323 | "2020-01-06 4120.52 4129.30 4170.64 4102.38 1.753100e+10 2.501821e+11\n", 2324 | "2020-01-07 4137.40 4160.23 4161.25 4135.10 1.394890e+10 1.963891e+11\n", 2325 | "2020-01-08 4139.63 4112.32 4149.81 4101.98 1.675858e+10 2.124063e+11" 2326 | ] 2327 | }, 2328 | "execution_count": 42, 2329 | "metadata": {}, 2330 | "output_type": "execute_result" 2331 | } 2332 | ], 2333 | "source": [ 2334 | "# security 股票代码\n", 2335 | "# frequency 时间粒度(1d=日)\n", 2336 | "# skip_paused 是否跳过缺失交易数据时间点\n", 2337 | "stock_price = get_price(security=security, start_date=start_date, end_date=end_date, frequency='1d',skip_paused=False)\n", 2338 | "\n", 2339 | "stock_price.head()" 2340 | ] 2341 | }, 2342 | { 2343 | "cell_type": "code", 2344 | "execution_count": null, 2345 | "metadata": {}, 2346 | "outputs": [], 2347 | "source": [] 2348 | } 2349 | ], 2350 | "metadata": { 2351 | "kernelspec": { 2352 | "display_name": "Python 3", 2353 | "language": "python", 2354 | "name": "python3" 2355 | }, 2356 | "language_info": { 2357 | "codemirror_mode": { 2358 | "name": "ipython", 2359 | "version": 3 2360 | }, 2361 | "file_extension": ".py", 2362 | "mimetype": "text/x-python", 2363 | "name": "python", 2364 | "nbconvert_exporter": "python", 2365 | "pygments_lexer": "ipython3", 2366 | "version": "3.7.8" 2367 | } 2368 | }, 2369 | "nbformat": 4, 2370 | "nbformat_minor": 2 2371 | } 2372 | -------------------------------------------------------------------------------- /cypher_cheetsheet/README.md: -------------------------------------------------------------------------------- 1 | ## Cypher Cheetsheet基础语法 2 | 3 | 4 | 5 | ### 1 创建节点 6 | 7 | 类型为`Person`(属性:姓名、年龄及性别) 8 | 9 | ```cql 10 | create (:Person{name:"Tom",age:18,sex:"male"}) 11 | create 
(:Person{name:"Jimmy",age:20,sex:"male"})
```

### 2 Create a relationship

Find the two `Person` nodes named Tom and Jimmy, then create a relationship between them of type `Friend` whose `relation` property is `best`.

```cql
match (p1:Person),(p2:Person)
where p1.name="Tom" and p2.name = "Jimmy"
create (p1)-[:Friend{relation:"best"}]->(p2);
```

### 3 Create an index

```cql
// Syntax as of Neo4j 3.5, the version this repository targets
create index on :Person(name)
// Create a unique constraint (property values must be unique)
create constraint on (n:Person) assert n.name is unique
```

### 4 Delete nodes

```cql
// Plain delete (fails if the node still has relationships)
match (p:Person{name:"Jimmy"}) delete p
match (a)-[r:knows]->(b) delete r,b
// Cascading delete (deleting a node also deletes its relationships)
match (n{name: "Mary"}) detach delete n
// Delete all nodes (use detach delete if any relationships remain)
match (m) delete m
```

### 5 Delete relationships

```cql
// Plain delete
match (p1:Person)-[r:Friend]-(p2:Person)
where p1.name="Jimmy" and p2.name="Tom"
delete r
// Delete all relationships (deleting a path would also delete its nodes)
match ()-[r]->() delete r
```

### 6 The merge keyword

If the pattern already exists it is returned directly; otherwise it is created and then returned. In practice it is mostly used to avoid errors when adding properties to a node.

```cql
// Create or fetch a node
merge (p:Person { name: "Jim1" }) return p;

// Create or fetch a node + set a property on creation + return property values
merge (p:Person { name: "Koko" })
on create set p.time = timestamp()
return p.name, p.time

// Create a relationship
match (a:Person {name: "Jim"}),(b:Person {name: "Tom"})
merge (a)-[r:friends]->(b)
```

### 7 Update nodes

#### 7.1 Update property values

```cql
match (n {name:'Jim'})
set n.name='Tom'
set n.age=20
return n
```

#### 7.2 Add a property and its value

```cql
match (n {name:'Mary'}) set n += {age:20} return n
```

#### 7.3 Remove a property

```cql
match (n{name:'Tom'}) remove n.age return n
```

#### 7.4 Update node labels (a node may have several labels)

```cql
// ① Add a single label
match (n{name:'Jim'}) set n:Person return n
// ② Add several labels at once
match (n{name:'Jim'}) set n:Person:Student return n
```

### 8 Matching

#### 8.1 Match by node label and property

```cql
match (n:Person{name:"Jim"}) return n
match (n) where n.name = "Jim" return n
match (n:Person)-[:Relation]->(m:Person) where n.name = 'Mary' return m
```

#### 8.2 Optional match (missing parts are replaced with null)

```cql
optional match (n)-[r]->(m) return m
```

#### 8.3 String prefix match

```cql
match (n) where n.name starts with 'J' return n
```

#### 8.4 String suffix match

```cql
match (n) where n.name ends with 'J' return n
```

#### 8.5 String contains match

```cql
match (n) where n.name contains 'g' return n
```

#### 8.6 String exclusion match

```cql
match (n) where not n.name starts with 'J' return n
```

#### 8.7 Regex match =~ (fuzzy match)

```cql
// Equivalent to SQL like '%J%'
match (n) where n.name =~ '.*J.*' return n
```

#### 8.8 Regex match =~ (case-insensitive)

```cql
// Equivalent to SQL like 'B%' or like 'b%'
match (n) where n.name =~ '(?i)b.*' return n
```

#### 8.9 Property value in a list (IN)

```cql
match (n { name: 'Jim' }),(m) where m.name in ['Tom', 'Koo'] and (n)<--(m) return m
```

#### 8.10 "Or" matching (|)

```cql
match p=(n)-[:knows|:likes]->(m) return p
```

#### 8.11 Any nodes, relationship depth within a given range

```cql
match p=(n)-[*1..3]->(m) return p
```

#### 8.12 Any nodes, relationships of any depth

```cql
match p=(n)-[*]->(m) return p
```

#### 8.13 Return distinct results

```cql
match (n) where n.ptype='book' return distinct n
```

#### 8.14 Return sorted results (desc = descending; asc = ascending)

```cql
match (n) where n.ptype='book' return n order by n.price desc
```

#### 8.15 Return with an alias

```cql
match (n) where n.ptype='book' return n.pname as name
```

#### 8.16 Chained conditions (with): return people whose name starts with 张 and who know more than 10 people

```cql
match (a)-[:knows]-(b)
where a.name =~ '张.*'
with a, count(b) as friends
where friends > 10
return a
```

#### 8.17 Union with duplicates removed (union)

```cql
match (a)-[:knows]->(b) return b.name
union
match (a)-[:likes]->(b) return b.name
```

#### 8.18 Union keeping duplicates (union all)

```cql
match (a)-[:knows]->(b) return b.name
union all
match (a)-[:likes]->(b) return b.name
```

#### 8.19 View a node's property keys / properties / ID

```cql
match (p) where p.name = 'Jim'
return keys(p), properties(p), id(p)
```

#### 8.20 Paged results

```cql
match (n) where n.name='John' return n skip 10 limit 10
```

### 9 Reading files

#### 9.1 Read a CSV file from a network resource

```cql
load csv with headers from 'http://www.download.com/abc.csv' as line
create (:Track{trackId:line.id,name:line.name,length:line.length})
```

#### 9.2 Read a network resource in batches

For example, a CSV file (the default batch size is 1000 rows)

```cql
using periodic commit 800
load csv with headers from 'http://www.download.com/abc.csv' as line
create (:Track{trackId:line.id,name:line.name,length:line.length})
```

#### 9.3 Read a local file

```cql
// fieldterminator sets a custom delimiter
load csv with headers from 'file:///00000.csv' as line fieldterminator ';'
create (:Data{date:line['date'],open:line['open']})
```

#### 9.4 Notes

```
※ Local CSV files must be UTF-8 encoded
※ They must be placed in the import directory under the neo4j database directory
※ If the local CSV has a header row, `with headers` is required
```

### 10 The foreach keyword

---

Personal summary

1. Nodes (and their properties) are written inside `()`
2. Relationships (and their properties) are written inside `[]`
3. `where` clauses compare with `=`
4. Inside `{}`, keys and values are separated by `:`
5. A relationship is created with `(m)-[:r]->(n)`
6. Regex matching uses `=~`
7. A node is written `(variable:Label{property:value})` and a relationship `[variable:Type{property:value}]`
8. When matching relationships, bind the path with `p=(m)-[r]->(n)` and return `p` rather than `r` (returning `r` alone displays nothing)

--------------------------------------------------------------------------------
/docs/小型金融知识图谱构建示范.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/docs/小型金融知识图谱构建示范.pdf

--------------------------------------------------------------------------------
/docs/小型金融知识图谱构建示范.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/docs/小型金融知识图谱构建示范.pptx

--------------------------------------------------------------------------------
/font/FZLTCXHJW.TTF:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/font/FZLTCXHJW.TTF

--------------------------------------------------------------------------------
/images/aaa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/aaa.png

--------------------------------------------------------------------------------
/images/average_price_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/average_price_plot.png

--------------------------------------------------------------------------------
/images/close_price_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/close_price_plot.png

--------------------------------------------------------------------------------
/images/cn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/cn.png

--------------------------------------------------------------------------------
/images/corr_node.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/corr_node.png

--------------------------------------------------------------------------------
/images/corr_node_with_return.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/corr_node_with_return.png

--------------------------------------------------------------------------------
/images/create_node.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/create_node.png

--------------------------------------------------------------------------------
/images/create_releationship.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/create_releationship.png

--------------------------------------------------------------------------------
/images/daily_return_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/daily_return_plot.png

--------------------------------------------------------------------------------
/images/link_prediction_nodes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/link_prediction_nodes.png

--------------------------------------------------------------------------------
/images/monte_2_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/monte_2_plot.png

--------------------------------------------------------------------------------
/images/monte_3_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/monte_3_plot.png

--------------------------------------------------------------------------------
/images/monte_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/monte_plot.png

--------------------------------------------------------------------------------
/images/multi_rets_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/multi_rets_plot.png

--------------------------------------------------------------------------------
/images/nodes_pcc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/nodes_pcc.png

--------------------------------------------------------------------------------
/images/others_alg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/others_alg.png

--------------------------------------------------------------------------------
/images/pcc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/pcc.png

--------------------------------------------------------------------------------
/images/ra.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/ra.png

--------------------------------------------------------------------------------
/images/rets_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/rets_plot.png

--------------------------------------------------------------------------------
/images/return_risk.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/return_risk.png

--------------------------------------------------------------------------------
/images/return_risk_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/return_risk_2.png

--------------------------------------------------------------------------------
/images/tn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/tn.png

--------------------------------------------------------------------------------
/images/volume_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/volume_plot.png

--------------------------------------------------------------------------------
/images/word_cloud.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/images/word_cloud.png

--------------------------------------------------------------------------------
/jar/graph-algorithms-algo-3.5.4.0.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jm199504/Financial-Knowledge-Graphs/879ca6449ea17ab7653a3356448cab7667bc858b/jar/graph-algorithms-algo-3.5.4.0.jar

--------------------------------------------------------------------------------