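Hello, I am the financial Q&A assistant.
Welcome to the knowledge computing engine built on a financial knowledge graph. Feel free to ask me anything about stocks.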
├── README.md
├── all_relations.png
├── app.py
├── checkpoints
│   ├── classifier
│   │   └── model.bin
│   └── entity_searcher
│       └── search_tree.pkl
├── dialogue.png
├── neo_db
│   ├── classifier.py
│   ├── fin_config.py
│   ├── graph_matcher.py
│   ├── query_graph.py
│   ├── robot_answer.py
│   └── semantic_parser.py
├── query_node1.png
├── query_node2.png
├── requirement.txt
├── static
│   ├── css
│   │   ├── bootstrap.min.css
│   │   ├── datatables.bootstrap.css
│   │   ├── datatables.responsive.css
│   │   ├── font-awesome.min.css
│   │   ├── ionicons.min.css
│   │   ├── main.css
│   │   ├── nifty-demo-icons.min.css
│   │   ├── nifty-demo.min.css
│   │   ├── nifty.css
│   │   ├── nifty.min.css
│   │   ├── pace.min.css
│   │   └── wiki.css
│   ├── fonts
│   │   ├── cjzkeoubrn4kerxqtauh3acwcynf_cdxxwclxiixg1c.ttf
│   │   ├── fontawesome-webfont.ttf
│   │   ├── fontawesome-webfont.woff
│   │   ├── fontawesome-webfont.woff2
│   │   ├── mtp_ysujh_bn48vbg8snsonf5ufddttmlvmwujdhhgs.ttf
│   │   ├── nifty-demo-icons.ttf
│   │   └── nifty-demo-icons.woff
│   ├── images
│   │   ├── bg-mask.png
│   │   ├── bg-reply-texture.jpg
│   │   ├── bk3.jpg
│   │   ├── btn-search-small.png
│   │   ├── bubble-triangle.png
│   │   ├── qa-banner.png
│   │   ├── user-assistant.png
│   │   └── 北邮LOGO.png
│   └── js
│       ├── bootstrap.min.js
│       ├── echarts.min.js
│       ├── icons.js
│       ├── jquery-2.2.4.min.js
│       ├── nifty-demo.min.js
│       ├── nifty.min.js
│       ├── pace.min.js
│       └── tags.js
├── templates
│   ├── all_relation.html
│   ├── dialogue.html
│   ├── index.html
│   └── query_node.html
├── 封面.png
├── 程序流.png
└── 项目结构.png
/README.md:
--------------------------------------------------------------------------------
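# Building a Knowledge Computing Engine on a Financial Knowledge Graph

The front-end code is adapted from [KGQA_HLM: a knowledge-graph-based visualization and question-answering system for character relationships in *Dream of the Red Chamber*](https://github.com/chizhu/KGQA_HLM).

The multi-turn dialogue mechanism is adapted from [a question-answering system based on a financial knowledge graph](https://github.com/XuekaiChen/FinKnowledgeGraph).

## Project Screenshots

Cover

Node information lookup

Full graph view

Multi-turn dialogue

Program flow
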
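## Preparing the Data
1. Build the financial knowledge graph from structured triples. The data can be downloaded [here](https://pan.baidu.com/s/1UQfu5c1Y7BfdMS_uNGrZug) (extraction code: `sae3`).
2. Download Neo4j by following the instructions under "2.安装环境" (environment setup) in [this project](https://github.com/XuekaiChen/FinKnowledgeGraph).
    * Create a Project: finance_demo
    * Create a database under the Project: db
    * Username: neo4j
    * Password: neo4j123
3. Download `step2_store_to_neo4j.py` from that project and run it to generate the back-end financial knowledge graph.

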
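Once the graph has been imported, a quick connectivity check can confirm that the credentials above work. The following is only a minimal sketch: it assumes the same `http://localhost:7474` address and `py2neo` client that `neo_db/fin_config.py` uses, and the names `NEO4J_SETTINGS`, `neo4j_uri`, and `count_nodes` are hypothetical helpers, not part of this project.

```python
# Hypothetical helper, not part of the repository.
# Assumes the host, port, and credentials listed in the steps above.
NEO4J_SETTINGS = {
    "host": "localhost",
    "port": 7474,
    "user": "neo4j",
    "password": "neo4j123",
}


def neo4j_uri(settings=NEO4J_SETTINGS):
    """Build the HTTP address that is passed to py2neo's Graph."""
    return f"http://{settings['host']}:{settings['port']}"


def count_nodes(settings=NEO4J_SETTINGS):
    """Return the number of nodes in the graph (requires a running Neo4j)."""
    # Imported lazily so the helpers above work without py2neo installed.
    from py2neo import Graph

    graph = Graph(neo4j_uri(settings), auth=(settings["user"], settings["password"]))
    return graph.evaluate("MATCH (n) RETURN count(n)")


if __name__ == "__main__":
    print(neo4j_uri())
```

If `count_nodes()` returns a value greater than zero, the import in step 3 most likely succeeded.
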
## Running
1. Start the Neo4j database finance_demo/db (default port 7474).
2. Run the main program `app.py` and open the URL it prints (default port 5000).

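With the server running, the dialogue endpoint can also be called without the web page. The sketch below assumes Flask's default address `http://127.0.0.1:5000` and relies on the `/dialogue_answer` route in `app.py`, which reads the question from the `name` query parameter and returns JSON with a `data` field. `dialogue_url` and `ask` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed default address of the Flask development server.
BASE_URL = "http://127.0.0.1:5000"


def dialogue_url(question, base_url=BASE_URL):
    """Build the GET URL for the /dialogue_answer route."""
    # urlencode percent-encodes non-ASCII questions as UTF-8.
    return f"{base_url}/dialogue_answer?{urlencode({'name': question})}"


def ask(question, base_url=BASE_URL):
    """Send a question to the running app and return the answer text."""
    with urlopen(dialogue_url(question, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]


if __name__ == "__main__":
    print(dialogue_url("hi"))
```

Calling `ask(...)` only works while both Neo4j and `app.py` are running; `dialogue_url` can be used on its own to inspect the request.
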
### Project structure:

--------------------------------------------------------------------------------
/all_relations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/XuekaiChen/ShowKnowledge/e12284f81bb00db2e1066ad9b74b5f0a1bc11655/all_relations.png
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
from flask import Flask, render_template, request, jsonify, json
from neo_db.query_graph import query, get_details, get_all_graph
from neo_db.robot_answer import get_robot_answer

app = Flask(__name__)


@app.route('/', methods=['GET', 'POST'])
@app.route('/index', methods=['GET', 'POST'])
def index(name=None):
    return render_template('index.html', name=name)


@app.route('/query_node', methods=['GET', 'POST'])
def search_page():
    return render_template('query_node.html')


@app.route('/get_all_relation', methods=['GET', 'POST'])
def get_all_relation():
    return render_template('all_relation.html')


@app.route('/dialogue', methods=['GET', 'POST'])
def dialogue_page():
    # Needs to return JSON data to embed in the HTML
    return render_template('dialogue.html')


@app.route('/get_profile', methods=['GET', 'POST'])
def get_profile():
    choice = request.args.get('choice')
    limit = request.args.get('limit')
    json_data = get_all_graph(choice, limit)
    return jsonify(json_data)


@app.route('/search_node', methods=['GET', 'POST'])
def search_node():
    choice = request.args.get('choice')
    name = request.args.get('name')
    json_data = query(choice, str(name))
    return jsonify(json_data)


@app.route('/get_chart', methods=['GET', 'POST'])
def get_chart():
    # $.ajax sends multiple parameters via POST, so they arrive in request.form
    node_or_edge = json.loads(request.form.get('type'))
    data = json.loads(request.form.get('data'))
    nodes = json.loads(request.form.get('nodes'))
    json_data = get_details(node_or_edge, data, nodes)
    return jsonify(json_data)


@app.route('/dialogue_answer', methods=['GET', 'POST'])
def dialogue_answer():
    question = request.args.get('name')
    robot_answer = get_robot_answer(str(question).strip())
    # str.replace returns a new string, so the result must be kept.
    # The replacement literal was lost as a line break; "<br>" is assumed.
    json_data = {'data': robot_answer.replace("\n", "<br>")}
    return jsonify(json_data)


if __name__ == '__main__':
    app.debug = True
    app.run()

--------------------------------------------------------------------------------
/checkpoints/classifier/model.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/XuekaiChen/ShowKnowledge/e12284f81bb00db2e1066ad9b74b5f0a1bc11655/checkpoints/classifier/model.bin
--------------------------------------------------------------------------------
/checkpoints/entity_searcher/search_tree.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/XuekaiChen/ShowKnowledge/e12284f81bb00db2e1066ad9b74b5f0a1bc11655/checkpoints/entity_searcher/search_tree.pkl
--------------------------------------------------------------------------------
/dialogue.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/XuekaiChen/ShowKnowledge/e12284f81bb00db2e1066ad9b74b5f0a1bc11655/dialogue.png
--------------------------------------------------------------------------------
/neo_db/classifier.py:
--------------------------------------------------------------------------------
import fasttext
import jieba
# from KGQA_HLM.config import classifier_corpus_path, classifier_save_path


def train_classifier(input_file_path, model_save_path):
    """Train the classification model."""

    # Train the model with the fastText API
    # https://fasttext.cc/docs/en/supervised-tutorial.html
    model = fasttext.train_supervised(input=input_file_path, label='__label__', lr=0.5)
    # model.test returns (number of samples, precision, recall)
    result = model.test(input_file_path)
    print(result[1])
    print(result[2])
    model.save_model(model_save_path)


class Classifier:
    """Intent classifier."""

    def __init__(self, model_load_path):
        self.model_load_path = model_load_path
        self.model = self.load_model()

    def load_model(self):
        """Load the model."""
        return fasttext.load_model(self.model_load_path)

    def predict(self, query):
        """Predict the intent of a query."""

        # Predict with the fastText API
        # https://fasttext.cc/docs/en/supervised-tutorial.html
        query_intent = self.model.predict(query)
        # Return the predicted label and its probability
        return query_intent[0][0].replace('__label__', ''), query_intent[1][0]


if __name__ == '__main__':
    # The import at the top is commented out, which leaves both paths undefined.
    # They are assumed to be the ones defined in neo_db/fin_config.py.
    from neo_db.fin_config import classifier_corpus_path, classifier_save_path

    print('开始训练分类器...')

    train_classifier(classifier_corpus_path, classifier_save_path)

    print('分类器训练成功...')

--------------------------------------------------------------------------------
/neo_db/fin_config.py:
--------------------------------------------------------------------------------
from py2neo import Graph

# Connect to the knowledge graph
graph = Graph("http://localhost:7474", auth=("neo4j", "neo4j123"))

# Knowledge corpus path
entity_corpus_path = 'data/knowledge/'

# Save path for the entity searcher
entity_searcher_save_path = 'checkpoints/entity_searcher/search_tree.pkl'

# Load path for the entity searcher
entity_searcher_load_path = 'checkpoints/entity_searcher/search_tree.pkl'

# Classifier corpus path
classifier_corpus_path = 'data/classifier/chat.train'

# Save path for the classifier model
classifier_save_path = 'checkpoints/classifier/model.bin'

# Load path for the classifier model
classifier_load_path = 'checkpoints/classifier/model.bin'

# Small-talk response corpus
chat_responses = {
    'qa': [],
    'greet': [
        'hello,很高兴为您服务,有什么可以为您效劳的呢?',
        '您好,可以输入股票名称或者代码查看详细信息哦',
        '您好,有股票相关的问题可以问我哦'
    ],
    'goodbye': [
        '再见',
        '有什么问题可以下次继续问我哦',
        '拜拜喽,别忘了给个小红心啊',
    ],
    'bot': [
        '没错,我就是集美貌与才智于一身的智能问答机器人',
        '为了防止世界被破坏,为了维护世界的和平,有任何需要我都会帮助你的'
    ],
    'safe': [
        '不好意思,您的问题我没太听懂,可以换一种说法嘛',
        '亲亲,这里好像没有您想要的答案'
    ]
}

# Question types and their trigger keywords
question_types = {
    'concept': ['概念', '特征'],
    'holder': ['股东'],
    'stock': ['股票', '持有', '控股', '控制'],
    'industry': ['行业', '领域'],
}

# Question types and entities from the previous dialogue turn
contexts = {
    'ques_types': None,
    'entities': None
}

# Legend index for each node label
CA_LIST = {"股东": 0, "股票": 1, "概念": 2}

d = {
    "segments": [
        {
            "start": {
                "identity": 0,
                "labels": ["股票"],
                "properties": {
                    "股票名称": "泛海控股",
                    "TS代码": "000046.SZ",
                    "行业": "多元金融",
                    "股票代码": 46
                }
            },
            "relationship": {
                "identity": 9601,
                "start": 0,
                "end": 4301,
                "type": "所属概念",
                "properties": {}
            },
            "end": {
                "identity": 4301,
                "labels": ["概念"],
                "properties": {
                    "概念代码": "TS199",
                    "概念名称": "房地产"
                }
            }
        }
    ],
    "length": 1.0
}

--------------------------------------------------------------------------------
/neo_db/graph_matcher.py:
--------------------------------------------------------------------------------
from py2neo import Graph


class GraphMatcher:
    """Query the database with Cypher statements."""

    def __init__(self):
        self.graph = Graph('http://localhost:7474/finance_demo/db/', auth=('neo4j', 'neo4j123'))

    def parse_graph(self, ques_types, entities):
        """Turn question types and entities into Cypher queries."""

        # Line endings inside the answer strings were lost as line breaks
        # in extraction; "<br>" is assumed wherever that happened.
        response = ""
        for each_ques_type in ques_types:
            # Questions about concepts
            if each_ques_type == 'concept':
                for entity_name, entity_type in entities.items():
                    # 1. Which concepts a stock belongs to
                    if entity_type == '股票':
                        cypher_sql = f'MATCH (s:`股票`)-[r:所属概念]->(c:`概念`) where s.股票名称 = "{entity_name}" return c.概念名称'
                        rtn = self.graph.run(cypher_sql).data()
                        response += f'{entity_name}所属概念有{"、".join([i["c.概念名称"] for i in rtn])}<br>'
                    # 2. Which stocks fall under a concept
                    elif entity_type == '概念':
                        cypher_sql = f'MATCH (s:`股票`)-[r:所属概念]->(c:`概念`) where c.概念名称 = "{entity_name}" return s.股票名称'
                        rtn = self.graph.run(cypher_sql).data()
                        response += f'{entity_name}概念下有{"、".join([i["s.股票名称"] for i in rtn])}这些股票<br>'

            # Which stocks a shareholder holds
            elif each_ques_type == 'stock':
                for entity_name, entity_type in entities.items():
                    cypher_sql = f'MATCH (s:`股东`)-[r:持有]->(c:`股票`) where s.股东名称 = "{entity_name}" return c.股票名称, r.持有量, r.占比'
                    rtn = self.graph.run(cypher_sql).data()
                    # Append instead of overwriting, so earlier answers are kept
                    response += f'{entity_name}持有股票的情况如下:\n'
                    for i in rtn:
                        response += f'{i["c.股票名称"]},持有股份{i["r.持有量"]},占比{i["r.占比"]}%<br>'

            # Which shareholders hold a stock
            elif each_ques_type == 'holder':
                for entity_name, entity_type in entities.items():
                    cypher_sql = f'MATCH (s:`股东`)-[r:持有]->(c:`股票`) where c.股票名称 = "{entity_name}" return s.股东名称, r.持有量, r.占比'
                    rtn = self.graph.run(cypher_sql).data()
                    # Append instead of overwriting, so earlier answers are kept
                    response += f'{entity_name}的股东是:\n'
                    for i in rtn:
                        response += f'{i["s.股东名称"]},持有股份{i["r.持有量"]},占比{i["r.占比"]}%<br>'

            # Questions about industry or sector
            elif each_ques_type == 'industry':
                # Hint: match the stock, return its industry
                for entity_name, entity_type in entities.items():
                    # 1. Industry a stock belongs to
                    if entity_type == '股票':
                        cypher_sql = f'MATCH (s:`股票`) where s.股票名称="{entity_name}" return s.行业'
                        rtn = self.graph.run(cypher_sql).data()
                        # Skip stocks that are not in the graph instead of raising IndexError
                        if rtn:
                            response += f'{entity_name}所属行业是{rtn[0]["s.行业"]}<br>'
        return response.strip()

    def predict(self, semantics):
        """Answer a parsed query."""
        response = self.parse_graph(semantics['ques_types'], semantics['entities'])
        return response

--------------------------------------------------------------------------------
/neo_db/query_graph.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# Date : 2022/5/19
# Author : Chen Xuekai
# Description :

import sys
import json
from neo_db.fin_config import graph
from pprint import pprint


# The data format differs by node type, so each category is handled separately
def query(node_cate, q_content):
    json_data = {'nodes': [], "links": []}

    # Overall layout: 股东 -[持有]-> 股票 -[所属概念]-> 概念
    if node_cate == '0':  # 股东 (shareholder)
        data = graph.run("match(p:`股东`{`股东名称`:'%s'})-[r]->(n) return n" % q_content)
        data = list(data)

        if data:
            # Add the queried node
            node = {
                'id': 0,
                'name': q_content,
                'category': 0  # 股东
            }
            json_data['nodes'].append(node)
            # Add the target nodes
            for idx, n in enumerate(data):
                node = {
                    'id': idx + 1,
                    'name': n['n']['股票名称'],
                    'category': 1  # 股票
                }
                json_data['nodes'].append(node)
            # Add the relationships
            for idx in range(len(data)):
                link = {
                    'source': 0,
                    'target': idx + 1,
                    'value': "持有"
                }
                json_data['links'].append(link)

    elif node_cate == '1':  # 股票 (stock)
        data1 = list(graph.run("match(p)-[r]->(n:`股票`{`股票名称`:'%s'}) return p" % q_content))
        if data1:
            # Add the queried node
            node = {
                'id': 0,
                'name': q_content,
                'category': 1  # 股票
            }
            json_data['nodes'].append(node)
            # Add the shareholder nodes
            for idx, n in enumerate(data1):
                node = {
                    'id': idx + 1,
                    'name': n['p']['股东名称'],
                    'category': 0
                }
                json_data['nodes'].append(node)
            # Add the relationships
            for idx in range(len(data1)):
                link = {
                    'source': idx + 1,
                    'target': 0,
                    'value': "持有"
                }
                json_data['links'].append(link)

        data2 = list(graph.run("match(p:`股票`{`股票名称`:'%s'})-[r]->(n) return n" % q_content))
        if data2:
            if not data1:
                # The queried node has not been added yet
                node = {
                    'id': 0,
                    'name': q_content,
                    'category': 1  # 股票
                }
                json_data['nodes'].append(node)
            # Set in both cases; the original assigned it only in an else branch,
            # which left it undefined when the stock had no shareholders
            existing = len(json_data['nodes'])
            # Add the concept nodes
            for idx, n in enumerate(data2):
                node = {
                    'id': existing + idx,
                    'name': n['n']['概念名称'],
                    'category': 2
                }
                json_data['nodes'].append(node)
            # Add the relationships
            for idx in range(len(data2)):
                link = {
                    'source': 0,
                    'target': existing + idx,
                    'value': "所属概念"
                }
                json_data['links'].append(link)

    elif node_cate == '2':  # 概念 (concept)
        data = graph.run("match(p)-[r]->(n:`概念`{`概念名称`:'%s'}) return p" % q_content)
        data = list(data)

        if data:
            # Add the queried node
            node = {
                'id': 0,
                'name': q_content,
                'category': 2  # 概念
            }
            json_data['nodes'].append(node)
            # Add the stock nodes
            for idx, n in enumerate(data):
                node = {
                    'id': idx + 1,
                    'name': n['p']['股票名称'],
                    'category': 1  # 股票
                }
                json_data['nodes'].append(node)
            # Add the relationships
            for idx in range(len(data)):
                link = {
                    'source': idx + 1,
                    'target': 0,
                    'value': "所属概念"
                }
                json_data['links'].append(link)

    else:
        sys.exit()

    return json_data  # {'nodes': [...], 'links': [...]}


# Fetch the detail table when an echarts element is clicked
def get_details(node_or_edge, data, nodes):  # JSON-decoded p/r/n
    """
    node_or_edge: node
    data: {'category': 0, 'id': 0, 'name': '王石'}
    nodes: [{'category': 0, 'id': 0, 'name': '王石'}, {'category': 1, 'id': 1, 'name': '宝莱特'}]

    edge
    {'source': 0, 'target': 1, 'value': '持有'}
    [{'category': 0, 'id': 0, 'name': '王石'}, {'category': 1, 'id': 1, 'name': '宝莱特'}]
    """
    result = {}
    if node_or_edge == 'node':
        if data['category'] == 0:  # 股东 (shareholder)
            result = {'id': data['id'], '股东名称': data['name']}
        elif data['category'] == 1:  # 股票 (stock)
            result['id'] = data['id']
            response = list(graph.run("MATCH (n:`股票`{`股票名称`:'%s'}) RETURN n" % data['name']))
            for i in response[0]['n']:
                result[i] = str(response[0]['n'][i])

        elif data['category'] == 2:  # 概念 (concept)
            result['id'] = data['id']
            response = list(graph.run("MATCH (n:`概念`{`概念名称`:'%s'}) RETURN n" % data['name']))
            for i in response[0]['n']:
                result[i] = str(response[0]['n'][i])

    elif node_or_edge == 'edge':
        if data['value'] == '持有':
            result['关系类型'] = '持有'
            source = [item['name'] for item in nodes if item['id'] == data['source']][0]
            target = [item['name'] for item in nodes if item['id'] == data['target']][0]
            response = list(
                graph.run("MATCH l=(p:`股东`{`股东名称`:'%s'})-[r:`持有`]->(n:`股票`{`股票名称`:'%s'}) RETURN r" % (source, target))
            )
            for i in response[0]['r']:
                result[i] = response[0]['r'][i]
                if i == '占比':
                    result[i] = str(response[0]['r'][i]) + '%'

        elif data['value'] == '所属概念':
            result = {'关系类型': '所属概念'}

    else:
        sys.exit()

    chart = dict_to_html(result)
    return chart


# Convert the dict returned by get_details into HTML
def dict_to_html(info_dict):  # dict{key: value}
    s = ''
    for key, value in info_dict.items():
        st = "
基于金融知识图谱的知识计算引擎
63 | 开启探索 64 |