├── README.md
├── data
│   ├── SmoothNLP36kr新闻数据集10k.xlsx
│   ├── SmoothNLP专栏资讯数据集样本10k.xlsx
│   ├── SmoothNLP工商数据集样本10K.xlsx
│   ├── SmoothNLP投资事件数据集样本2k.xlsx
│   ├── SmoothNLP投资结构数据集样本1k.xlsx
│   └── SmoothNLP金融新闻数据集样本20k.xlsx
└── demo
    ├── 36kr新闻demo.PNG
    ├── 专栏资讯demo.png
    ├── 工商数据demo.png
    ├── 投资事件demo.png
    ├── 投资机构demo.png
    └── 金融新闻demo.png

/README.md:
--------------------------------------------------------------------------------
# FinancialDatasets

SmoothNLP public financial text datasets for NLP research.

[**API service**](https://github.com/smoothnlp/smoothnlp_api)

## Datasets at a Glance

> GitHub storage is limited, so this repository hosts only samples. For the **full** datasets, please **contact**: contact@smoothnlp.com

| Dataset | Fields | Sample size | Full size | Download |
| ----- | ------ | ----- | ----- | ----- |
| Company registration information | `名称` (name), `公司名称` (company name), `公司介绍` (company description), `工商` (business registration details), `地址` (address), `工商注册id` (business registration ID), `成立时间` (founding date), `法人代表` (legal representative), `注册资金` (registered capital), `统一信用代码` (unified social credit code), `网址` (website) | 10k | 500k (listed companies and SMEs) | [Download](https://github.com/smoothnlp/FinancialDatasets/blob/master/data/SmoothNLP%E5%B7%A5%E5%95%86%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%B7%E6%9C%AC10K.xlsx) |
| Financial news | `title` (news title), `content` (news body), `pub_ts` (publication date) | 20k | 2.1M | [Download](https://github.com/smoothnlp/FinancialDatasets/blob/master/data/SmoothNLP%E9%87%91%E8%9E%8D%E6%96%B0%E9%97%BB%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%B7%E6%9C%AC20k.xlsx) |
| Column articles | `title` (news title), `content` (news body), `pub_ts` (publication date) | 10k | 580k | [Download](https://github.com/smoothnlp/FinancialDatasets/blob/master/data/SmoothNLP%E4%B8%93%E6%A0%8F%E8%B5%84%E8%AE%AF%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%B7%E6%9C%AC10k.xlsx) |
| Investment institutions | `机构名称` (institution name), `介绍` (description), `行业` (industries), `规模` (size), `轮次` (investment rounds) | 1k | 30k | [Download](https://github.com/smoothnlp/FinancialDatasets/blob/master/data/SmoothNLP%E6%8A%95%E8%B5%84%E7%BB%93%E6%9E%84%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%B7%E6%9C%AC1k.xlsx) |
| Investment events | `事件资讯` (event news), `投资方` (investor), `融资方` (funded company), `融资事件` (financing event), `轮次` (round), `金额` (amount) | 2k | 70k | [Download](https://github.com/smoothnlp/FinancialDatasets/blob/master/data/SmoothNLP%E6%8A%95%E8%B5%84%E4%BA%8B%E4%BB%B6%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%B7%E6%9C%AC2k.xlsx) |
| 36Kr news | `title` (news title), `content` (news body), `url` (article URL) | 10k | 110k | [Download](https://github.com/smoothnlp/FinancialDatasets/blob/master/data/SmoothNLP36kr新闻数据集10k.xlsx) |

## Suggested Research Directions

* Embeddings (Word2Vec, BERT, etc.)
* Named entity recognition (NER)
* Unsupervised clustering: grouping competing companies based on their descriptions
* Industry classification of companies
* Headline generation (text summarization)
* Sequence classification

## Data Previews

#### Investment institutions
![Investment institutions](/demo/%E6%8A%95%E8%B5%84%E6%9C%BA%E6%9E%84demo.png)

#### Investment events
![Investment events](/demo/%E6%8A%95%E8%B5%84%E4%BA%8B%E4%BB%B6demo.png)

#### Company registration information
![Company registration information](/demo/%E5%B7%A5%E5%95%86%E6%95%B0%E6%8D%AEdemo.png)

#### Financial news
![Financial news](/demo/%E9%87%91%E8%9E%8D%E6%96%B0%E9%97%BBdemo.png)

#### Column articles
![Column articles](/demo/%E4%B8%93%E6%A0%8F%E8%B5%84%E8%AE%AFdemo.png)

#### 36Kr news
![36Kr news](/demo/36kr%E6%96%B0%E9%97%BBdemo.PNG)

--------------------------------------------------------------------------------
/data/SmoothNLP36kr新闻数据集10k.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/data/SmoothNLP36kr新闻数据集10k.xlsx
--------------------------------------------------------------------------------
/data/SmoothNLP专栏资讯数据集样本10k.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/data/SmoothNLP专栏资讯数据集样本10k.xlsx
--------------------------------------------------------------------------------
/data/SmoothNLP工商数据集样本10K.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/data/SmoothNLP工商数据集样本10K.xlsx
--------------------------------------------------------------------------------
/data/SmoothNLP投资事件数据集样本2k.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/data/SmoothNLP投资事件数据集样本2k.xlsx
--------------------------------------------------------------------------------
/data/SmoothNLP投资结构数据集样本1k.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/data/SmoothNLP投资结构数据集样本1k.xlsx
--------------------------------------------------------------------------------
/data/SmoothNLP金融新闻数据集样本20k.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/data/SmoothNLP金融新闻数据集样本20k.xlsx
--------------------------------------------------------------------------------
/demo/36kr新闻demo.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/demo/36kr新闻demo.PNG
--------------------------------------------------------------------------------
/demo/专栏资讯demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/demo/专栏资讯demo.png
--------------------------------------------------------------------------------
/demo/工商数据demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/demo/工商数据demo.png
--------------------------------------------------------------------------------
/demo/投资事件demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/demo/投资事件demo.png
--------------------------------------------------------------------------------
/demo/投资机构demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/demo/投资机构demo.png
--------------------------------------------------------------------------------
/demo/金融新闻demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/5e70262aac12b24b8738a382c7b8055b816496c5/demo/金融新闻demo.png
--------------------------------------------------------------------------------
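The sample workbooks can be read directly from the pinned raw URLs listed above. What follows is a minimal Python sketch, assuming pandas (plus openpyxl for `.xlsx` files) is installed and the network is reachable. The helper names `raw_url` and `load_sample` and the constant `REPO_RAW` are illustrative, not part of this repository. Because the file names contain Chinese characters and urllib-based HTTP fetching (which pandas relies on for plain URLs) generally expects an ASCII request path, the sketch percent-encodes the path first.

```python
from urllib.parse import quote

# Commit hash copied from the raw links listed above (assumption: other
# commits may contain different files or none at all).
REPO_RAW = (
    "https://raw.githubusercontent.com/smoothnlp/FinancialDatasets/"
    "5e70262aac12b24b8738a382c7b8055b816496c5"
)


def raw_url(path: str) -> str:
    """Return an ASCII-only raw URL for a repo-relative path,
    e.g. 'data/SmoothNLP金融新闻数据集样本20k.xlsx'."""
    # quote() keeps '/' by default and escapes non-ASCII characters
    # as percent-encoded UTF-8 bytes.
    return f"{REPO_RAW}/{quote(path)}"


def load_sample(path: str):
    """Download one sample workbook into a pandas DataFrame.

    Hypothetical helper: assumes pandas and openpyxl are installed
    and that raw.githubusercontent.com is reachable.
    """
    import pandas as pd  # imported lazily so raw_url() works without pandas

    # Reads the first sheet of the workbook by default.
    return pd.read_excel(raw_url(path))
```

For example, `load_sample("data/SmoothNLP金融新闻数据集样本20k.xlsx")` should return a DataFrame whose column names can then be compared against the field list in the README table.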