├── Query
├── code
│   ├── model.pyc
│   ├── blacklist
│   ├── view.py
│   ├── preprocess_test.py
│   ├── predict.py
│   ├── test.py
│   ├── preprocess_train.py
│   ├── train.py
│   └── model.py
├── temp
│   ├── test_data
│   ├── to_sentence
│   ├── att_sents.pickle
│   ├── att_words.pickle
│   ├── attention.pickle
│   └── predict_y.pickle
├── dictionary
│   ├── 公司简称.xlsx
│   ├── 公告负面词.xlsx
│   ├── 新闻负面词.xlsx
│   ├── 组合管理-持仓清单.xlsx
│   ├── positive.txt
│   ├── negative.txt
│   └── stopwords_CN.dat
├── test_data
│   └── sample.xlsx
├── training_data
│   └── sample.xlsx
└── ReadMe.md
/Query:
--------------------------------------------------------------------------------
1 | $query1$
2 | $query2$
3 | $query3$
4 |
--------------------------------------------------------------------------------
/code/model.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/code/model.pyc
--------------------------------------------------------------------------------
/temp/test_data:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/temp/test_data
--------------------------------------------------------------------------------
/temp/to_sentence:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/temp/to_sentence
--------------------------------------------------------------------------------
/dictionary/公司简称.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/dictionary/公司简称.xlsx
--------------------------------------------------------------------------------
/dictionary/公告负面词.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/dictionary/公告负面词.xlsx
--------------------------------------------------------------------------------
/dictionary/新闻负面词.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/dictionary/新闻负面词.xlsx
--------------------------------------------------------------------------------
/temp/att_sents.pickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/temp/att_sents.pickle
--------------------------------------------------------------------------------
/temp/att_words.pickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/temp/att_words.pickle
--------------------------------------------------------------------------------
/temp/attention.pickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/temp/attention.pickle
--------------------------------------------------------------------------------
/temp/predict_y.pickle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/temp/predict_y.pickle
--------------------------------------------------------------------------------
/test_data/sample.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/test_data/sample.xlsx
--------------------------------------------------------------------------------
/dictionary/组合管理-持仓清单.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/dictionary/组合管理-持仓清单.xlsx
--------------------------------------------------------------------------------
/training_data/sample.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLluoling/FISHQA/HEAD/training_data/sample.xlsx
--------------------------------------------------------------------------------
/code/blacklist:
--------------------------------------------------------------------------------
1 | 年度
2 | 年
3 | 位于
4 | 年内
5 | 日
6 | 月
7 | 他
8 | 她
9 | 它
10 | 上
11 | 一
12 | 二
13 | 三
14 | 四
15 | 五
16 | 六
17 | 气
18 | 七
19 | 八
20 | 九
21 | 十
22 | 成立
23 | 昨日
24 | 今日
25 | 明日
26 | 全天
27 | 在
28 | 网
29 | 名单
30 | 新
31 | 是
32 | 上
33 | 下
34 | 左
35 | 右
36 | !
37 | 拟
--------------------------------------------------------------------------------
/dictionary/positive.txt:
--------------------------------------------------------------------------------
1 | 涨停
2 | 利好
3 | 追加担保
4 | 推荐
5 | 未受影响
6 | 资产注入
7 | 感谢
8 | 拯救
9 | 增持
10 | 拟投
11 | 定增
12 | 利好
13 | 化解
14 | 转型
15 | 看好
16 | 优质
17 | 牛股
18 | 金股
19 | 上调
20 | 升级
21 | 推动
22 | 机遇
23 | 腾飞
24 | 拓展
25 | 整合
26 | 启航
27 | 不减持
28 | 良机
29 | 孕育
30 | 机会
31 | 全额
32 | 推进
33 | 进军
34 | 加强
35 | 补贴
36 | 携手
37 | 振兴
38 | 扭亏为盈
39 | 助力
40 | 改革
41 | 转机
42 | 扫清
43 | 接盘
44 | 跳出
45 | 付息
46 | 逆袭
47 | 潜力
48 | 潜质
49 | 突围
50 | 有望
51 | 护航
52 | 优先
53 | 稳定
54 | 挺进
55 | 低估
56 | 翻番
57 | 在即
58 | 抄底
59 | 回应
60 | 回升
61 | 增补
62 | 加大
63 | 调升
64 | 上调
65 | 反弹
66 | 提高
67 | 持续提高
68 | 大有可为
69 | 战略重组
70 | 重启
71 | 追捧
72 | 积极
73 | 登顶
74 | 强力
75 | 自救成功
76 | 成功转让
77 | 增加
78 | 中标
79 | 领衔
80 | 解冻
81 | 付息
82 | 实现盈利
83 | 亏损收窄
84 | 收购
85 | 激励
86 | 澄清
87 | 盈利
88 | 派息
--------------------------------------------------------------------------------
/ReadMe.md:
--------------------------------------------------------------------------------
1 | # FISHQA ( Financial-Sentiment-Analysis-with-Hierarchical-Query-driven-Attention )
2 | ### This is a Tensorflow implementation of [Beyond Polarity: Interpretable Financial Sentiment Analysis with Hierarchical Query-driven Attention](https://www.ijcai.org/proceedings/2018/0590.pdf)
3 |
4 | ## Requirements
5 | * python 3.6.1
6 | * Tensorflow 1.11.0
7 | * jieba 0.39
8 |
9 | ## Code Introduction
10 |
11 | ### Step 1: Preprocess data
12 | ```bash
13 | python preprocess_train.py
14 | python preprocess_test.py
15 | ```
16 | Preprocess the training and test datasets.
17 | Remember to modify the dictionaries and filter words to fit your own datasets (see the sketch below).
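
For reference, a minimal sketch of where these settings live near the top of `preprocess_train.py` / `preprocess_test.py` (paths and phrases are the ones used in this repo; note that `mydict.txt` is referenced by the scripts but is not included under `dictionary/` here):
```python
import jieba

# extra user dictionaries fed to the jieba segmenter -- swap in your own files
jieba.load_userdict('../dictionary/mydict.txt')
jieba.load_userdict('../dictionary/negative.txt')
jieba.load_userdict('../dictionary/positive.txt')

# boilerplate phrases stripped from every article before segmentation
filterwords = ["责任编辑", "点击查看", "原标题", "进入【新浪财经股吧】讨论"]
```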
18 |
19 | ### Step 2: Train model
20 | ```bash
21 | cd FISHQA/code
22 | python train.py
23 | ```
24 | Set the parameters based on your own datasets and train your own model.
25 | The checkpoint with the best dev accuracy is saved under `runs/<timestamp>/checkpoints/`.
26 |
27 |
28 | ### Step 3: Test model
29 | ```bash
30 | python test.py
31 | ```
32 | Pass `--logdir` to point `test.py` (and `predict.py`) at the `runs/<timestamp>` directory created in Step 2; it defaults to a fixed timestamp.
33 | ### Step 4: Simple attention visualization
34 | ```bash
35 | python view.py
36 | ```
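
`view.py` writes its visualization to `../output/result.xlsx`. A minimal sketch for inspecting that file (the column names are the ones `view.py` writes):
```python
import pandas as pd

# columns written by view.py: title, content, predict_score, attened_sents, attened_words
df = pd.read_excel('../output/result.xlsx')
print(df[['title', 'predict_score', 'attened_sents']].head())
```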
37 |
38 |
39 | ## Data Introduction
40 | * Modify the queries (`FISHQA/Query`) based on your own datasets and prior knowledge. Each query can be specified manually; see the sketch after this list.
41 | * Note that the folder `temp/` contains a subset of our preprocessed data.
42 | * As our dataset is private, we cannot release it. We put two raw samples in the repository, one in `training_data` and one in `test_data`.
43 | * Under the folder `dictionary/`, there are some extra dictionaries compiled by professionals for Chinese financial news.
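
Each non-empty line of `FISHQA/Query` holds one query sentence. A simplified sketch of how `preprocess_train.py` turns each query into word ids (assuming `../model/vocab.pickle` has already been written by that script):
```python
import pickle
import jieba.posseg as pseg

with open('../model/vocab.pickle', 'rb') as f:
    vocab = pickle.load(f)

queries = []
with open('../Query') as f:
    for line in f:
        if not line.strip():
            continue
        words = [p.word for p in pseg.lcut(line.strip())]  # jieba segmentation
        queries.append([vocab.get(w, 0) for w in words])   # 0 = UNKNOW_TOKEN
```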
44 |
--------------------------------------------------------------------------------
/dictionary/negative.txt:
--------------------------------------------------------------------------------
1 | 违约
2 | 实质违约
3 | 不确定性
4 | 不确定
5 | 退市
6 | 未按时
7 | 未按期
8 | 暂停上市
9 | 暂停交易
10 | 终止上市
11 | 逾期
12 | 债务逾期
13 | 贷款逾期
14 | 新增贷款
15 | 亏损
16 | 预亏
17 | 巨亏
18 | 血本无归
19 | 风险
20 | 偿付风险
21 | 兑付风险
22 | 特别风险
23 | 风险提示
24 | 风险警示
25 | 警示
26 | 缩减
27 | 风波不断
28 | 下调
29 | 下跌
30 | 下滑
31 | 重整
32 | 重组
33 | 重大事项
34 | 破产重组
35 | 破产
36 | 清盘
37 | 偿债
38 | 还债
39 | 免职
40 | 免去
41 | 解聘
42 | 判决
43 | 诉讼
44 | 审理
45 | 法律诉讼
46 | 司法
47 | 冻结
48 | 涉嫌
49 | 涉诉
50 | 起诉
51 | 纠纷
52 | 败诉
53 | 查封
54 | 仲裁
55 | 通缉
56 | 查处
57 | 禁止
58 | 扣押
59 | 推迟
60 | 延期
61 | 取消
62 | 终止
63 | 停牌
64 | 停盘
65 | 停产
66 | 停工
67 | 停止
68 | 跌停
69 | 欠息
70 | 降级
71 | 调降
72 | 下调
73 | 观察名单
74 | 无法偿还
75 | 无法兑付
76 | 无力偿还
77 | 未及时兑付
78 | 未足额
79 | 未履行
80 | 不能履行
81 | 大额对外担保
82 | 经理变更
83 | 股东变更
84 | 通报批评
85 | 谴责
86 | 调查
87 | 立案
88 | 处罚
89 | 违规
90 | 违反
91 | 处分
92 | 警告
93 | 警示
94 | 受贿
95 | 强制
96 | 非法
97 | 挪用
98 | 行贿
99 | 洗钱
100 | 内幕交易
101 | 违纪
102 | 违法
103 | 调查
104 | 开除
105 | 贪污
106 | 事故
107 | 会计差错
108 | 自杀
109 | 跳楼
110 | 上吊
111 | 身亡
112 | 溺水
113 | 坠楼
114 | 抑郁
115 | 抑郁症
116 | 自缢
117 | 死者
118 | 负面
119 | 审计署
120 | 整治
121 | 整顿
122 | 资不抵债
123 | 低效
124 | 暴跌
125 | 寒冬
126 | 恶化
127 | 调低
128 | 产能过剩
129 | 低迷
130 | 限用令
131 | 过剩
132 | 缩水
133 | 问询
134 | 危机
135 | 剥离
136 | 跌
137 | 下挫
138 | 走弱
139 | 走低
140 | 造假
141 | 大跌
142 | 临停
143 | 乏力
144 | 悉售
145 | 停产
146 | 补充质押
147 | 打开涨停
--------------------------------------------------------------------------------
/code/view.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | import numpy as np
4 | import pandas as pd
5 | import pickle
6 | #import matplotlib.pyplot as plt
7 | #import seaborn as sns
8 | from functools import reduce
9 | from tqdm import tqdm
10 | import os
11 | import codecs
12 | data_path = "../test_data"
13 | output_path = "../output"
14 | def union_f(x, y = ""):
15 | return x +" "+ y
16 | with codecs.open("blacklist",'r') as fr:
17 | b = fr.readlines()
18 | blacklist = []
19 | for i in b:
20 | blacklist.append(i.strip())
21 | blacklist = set(blacklist)
22 |
23 | f = open(os.path.join(output_path,"deal_list"))
24 | title,content = [],[]
25 | for line in f:
26 | filename = os.path.join(data_path,line.strip())
27 | sheet = pd.read_excel(filename)
28 | title.extend(list(sheet.loc[:,"title"]))
29 | content.extend(list(sheet.loc[:,"content"]))
30 |
31 | # load att_words
32 | with open('../temp/att_words.pickle', 'rb') as f:
33 | att_words = pickle.load(f)
34 | # load att_sents
35 | with open('../temp/att_sents.pickle', 'rb') as f:
36 | att_sents = pickle.load(f)
37 | # load predict_y
38 | with open('../temp/predict_y.pickle', 'rb') as f:
39 | y_pred = pickle.load(f)
40 | # Y = [i.index(1) for i in y_pred]
41 | with open('../temp/to_sentence', 'rb') as f:
42 | to_sentence = pickle.load(f)
43 | with open('../temp/test_data', 'rb') as f:
44 | X,_ = pickle.load(f)
45 |
46 | with open('../model/vocab.pickle', 'rb') as f:
47 | vocab = pickle.load(f)
48 |
49 | new_vocab = dict(map(lambda t:(t[1],t[0]), vocab.items()))
50 |
51 | print("title,content,att_words,att_sents,y_pred: ",len(title),len(content),len(att_words),len(att_sents),len(y_pred))
52 |
53 | # output a file to view attended sentences
54 | S,W = [],[]
55 | for doc in tqdm(range(len(att_sents))):
56 | b = []
57 | for query in range(len(att_sents[0])):
58 | b.append(sorted(range(len(att_sents[doc][query])), key=list(att_sents[doc][query]).__getitem__,reverse=True))
59 | # load sents
60 | try:
61 | tmp = ""
62 | count = 0;i = 0
63 | for i in range(3):
64 | temp_list = []
65 | for sent in b[i]:
66 | if len(to_sentence[str(X[doc][sent])])>3:
67 | temp_list.append(to_sentence[str(X[doc][sent])])
68 | count += 1
69 | if count >=3:
70 | break
71 | tmp += reduce(union_f,temp_list)+"||"
72 | S.append(tmp)
73 | except:
74 | S.append(title[doc])
75 | # load words
76 | try:
77 | tmp = ""
78 | for i in range(30):
79 | word = new_vocab[X[doc][i][att_words[doc][i]]]
80 | if word!="UNKNOW_TOKEN":
81 | tmp += word+" "
82 | W.append(tmp)
83 | except:
84 | pass
85 |
86 |
87 | writer = pd.ExcelWriter('../output/result.xlsx')
88 | df = pd.DataFrame(columns=['title','content',"predict_score","attened_sents","attened_words"])
89 | df.loc[:,"title"] = title
90 | df.loc[:,"content"] = content
91 | df.loc[:,"predict_score"] = list(y_pred)
92 | df.loc[:,"attened_sents"] = S
93 | df.loc[:,"attened_words"] = W
94 |
95 | df.to_excel(writer,'Sheet1')
96 | writer.save()
97 | print("done!")
98 |
--------------------------------------------------------------------------------
/code/preprocess_test.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | import numpy as np
4 | import pandas as pd
5 | import jieba.posseg as pseg
6 | import jieba
7 | import re
8 | import os
9 | import codecs
10 | from collections import defaultdict
11 | from tqdm import tqdm
12 | import pickle
13 | import random
14 | import argparse
15 | import os
16 |
17 |
18 | # load dictionaries and the names of all the companies/bonds of interest
19 | # note that jieba is a Chinese text segmentation library; see https://github.com/fxsjy/jieba
20 | name = ["太平洋资产管理有限责任公司","张家港农商银行","江苏银行","中建投信托有限责任公司","华宝兴业基金"]
21 | corps =set()
22 | for i in range(len(name)):
23 | tmp_sheet = pd.read_excel("../dictionary/组合管理-持仓清单.xlsx",sheetname=name[i])
24 | corps = corps|set(list(tmp_sheet.loc[:,"主体名称"]))
25 | corps = corps|set(list(tmp_sheet.loc[:,"债券名称"]))
26 | sheet2 = pd.read_excel('../dictionary/公司简称.xlsx',sheetname = 0)
27 | corps = corps|(set(list(sheet2.iloc[:,1])))
28 | corps = corps|(set(list(sheet2.iloc[:,0])))
29 | jieba.load_userdict('../dictionary/mydict.txt')
30 | jieba.load_userdict('../dictionary/negative.txt')
31 | jieba.load_userdict('../dictionary/positive.txt')
32 | jieba.load_userdict(corps)
33 |
34 | # load negative words
35 | neg_words = pd.read_excel("../dictionary/新闻负面词.xlsx")
36 | jieba.load_userdict(list(neg_words.loc[:,"NewsClass"]))
37 |
38 | # filter some noisy data
39 | # pattern="[\s\.\!\/_,-:;~{}`^\\\[\]<=>?$%^*()+\"\']+|[+——!·【】‘’“”《》,。:;?、~@#¥%……&*()]+0123456789qwertyuioplkjhgfdsazxcvbnm"
40 | pattern="[\.\\/_,,.:;~{}`^\\\[\]<=>?$%^*()+\"\']+|[+·。:【】‘’“”《》、~@#¥%……&*()]+0123456789"
41 | pat = set(pattern)|set(["\n",'\u3000'," ","\s","","\u2028"])
42 | filterwords = ["\u2028","责任编辑","DF","点击查看","热点栏目 资金流向 千股千评 个股诊断 最新评级 模拟交易 客户端","进入【新浪财经股吧】讨论","记者","鸣谢","报道","重点提示","重大事项","重要内容提示","提示:键盘也能翻页,试试“← →”键","原标题"]
43 | # with codecs.open('../dictionary/stopwords_CN.dat','r') as fr:
44 | # stopwords=fr.readlines()
45 | # stopwords=[i.strip() for i in stopwords]
46 | # stopwords=set(stopwords)
47 |
48 |
49 | # get test data
50 | test_x, test_y = [],[]
51 | # maps each tokenized (padded) sentence in the test set back to its original sentence text
52 | to_sentence = {}
53 | # document index of each sentence
54 | to_document = {}
55 | max_sent_in_doc = 30
56 | max_word_in_sent = 45
57 | UNKNOWN = 0
58 | num_classes =2
59 | with open("../model/vocab.pickle",'rb') as f:
60 | vocab = pickle.load(f)
61 | def FormData(sheet):
62 | for row in tqdm(range(len(sheet))):
63 | doc=np.zeros((30,45), dtype=np.int32)
64 | title = str(sheet.loc[row,"title"])
65 | text = str(sheet.loc[row,"content"])
66 | for item in filterwords:
67 | text = text.replace(item,"")
68 | sents = title +"。"+text
69 | count1 = 0
70 | for i, sent in enumerate(sents.split("。")):
71 | # filter the code in the news
72 | if "function()" in sent:
73 | continue
74 | if count1 < max_sent_in_doc:
75 | count = 0
76 | for j, word in enumerate(pseg.lcut(sent)):
77 | kind = (list(word))[1][0]
78 | tmpword = (list(word))[0]
79 | if (tmpword not in pat) and (tmpword[0] not in pat) and (count < max_word_in_sent):
80 | doc[count1][count] = vocab.get(tmpword, UNKNOWN)
81 | count +=1
82 | to_sentence[str(doc[count1].tolist())] = sent
83 | count1 +=1
84 | # score==1: negative; score==0: positive
85 | # try:
86 | if sheet.loc[row,"score"]==0:
87 | label = 0
88 | else:
89 | label = 1
90 | labels = [0] * num_classes
91 | labels[label] = 1
92 | # except:
93 | # labels = [0] * num_classes
94 | test_y.append(labels)
95 | test_x.append(doc.tolist())
96 |
97 | # deal with every file in test_data
98 | path = "../test_data"
99 | f = open("../output/deal_list","w")
100 | for n_file in os.listdir(path):
101 | try:
102 | file_path = os.path.join(path,n_file)
103 | data = pd.read_excel(file_path)
104 | # dat = data.loc[data.clas=="财经网站"].copy()
105 | # print(len(dat))
106 | f.write(n_file+"\n")
107 | FormData(data)
108 | except:
109 | pass
110 | f.close()
111 | pickle.dump((to_sentence), open('../temp/to_sentence', 'wb'))
112 | pickle.dump((test_x, test_y), open('../temp/test_data', 'wb'))
113 | print("load test_data finished")
114 |
--------------------------------------------------------------------------------
/code/predict.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | from model import FISHQA,read_question,shuffle_data
4 | import tensorflow as tf
5 | import time
6 | import pickle
7 | import numpy as np
8 | from tqdm import tqdm
9 | from tensorflow.contrib import rnn
10 | from tensorflow.contrib import layers
11 | import random
12 | import pandas as pd
13 | import os
14 | import argparse
15 | parser = argparse.ArgumentParser()
16 | parser.add_argument('--logdir', default='1526700733')
17 | args = parser.parse_args()
18 | # Data loading params
19 | tf.flags.DEFINE_string("data_dir", "../data/data.dat", "data directory")
20 | tf.flags.DEFINE_integer("vocab_size", 52812, "vocabulary size")
21 | tf.flags.DEFINE_integer("num_classes", 2, "number of classes")
22 | tf.flags.DEFINE_integer("embedding_size", 200, "Dimensionality of character embedding (default: 200)")
23 | tf.flags.DEFINE_integer("hidden_size", 100, "Dimensionality of GRU hidden layer (default: 50)")
24 | tf.flags.DEFINE_integer("batch_size", 64, "Batch Size (default: 64)")
25 | tf.flags.DEFINE_integer("num_epochs", 20, "Number of training epochs (default: 50)")
26 | tf.flags.DEFINE_integer("checkpoint_every", 100, "Save model after this many steps (default: 100)")
27 | tf.flags.DEFINE_integer("num_checkpoints", 5, "Number of checkpoints to store (default: 5)")
28 | tf.flags.DEFINE_integer("evaluate_every", 10, "evaluate every this many batches")
29 | tf.flags.DEFINE_float("learning_rate", 0.001, "learning rate")
30 | tf.flags.DEFINE_float("grad_clip", 5, "grad clip to prevent gradient explode")
31 | tf.flags.DEFINE_float("sentence_num", 30, "the max number of sentence in a document")
32 | tf.flags.DEFINE_float("sentence_length", 45, "the max length of each sentence")
33 |
34 | with open("../temp/test_data", 'rb') as f:
35 | test_x,test_y = pickle.load(f)
36 |
37 | FLAGS = tf.flags.FLAGS
38 | print("loading test data finished")
39 |
40 | def main():
41 | with tf.Session() as sess:
42 | fishqa = FISHQA(vocab_size=FLAGS.vocab_size,
43 | num_classes=FLAGS.num_classes,
44 | embedding_size=FLAGS.embedding_size,
45 | hidden_size=FLAGS.hidden_size,
46 | dropout_keep_proba=0.5,
47 | query = read_question()
48 | )
49 | with tf.name_scope('loss'):
50 | loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=fishqa.input_y,
51 | logits=fishqa.out,
52 | name='loss'))
53 | with tf.name_scope('accuracy'):
54 | predict = tf.argmax(fishqa.out, axis=1, name='predict')
55 | label = tf.argmax(fishqa.input_y, axis=1, name='label')
56 | acc = tf.reduce_mean(tf.cast(tf.equal(predict, label), tf.float32))
57 |
58 | with tf.name_scope('att_words'):
59 | att_words = tf.reshape(fishqa.att_word,[-1,30])
60 | with tf.name_scope('att_sents'):
61 | att_sents = tf.reshape(fishqa.att_sent,[-1,4,30])
62 | timestamp = str(int(time.time()))
63 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", args.logdir))
64 |
65 |
66 | checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
67 | checkpoint_path = checkpoint_dir + '/my-model.ckpt'
68 |
69 | # saver = tf.train.Saver(tf.global_variables(), max_to_keep=1)
70 | saver = tf.train.Saver(tf.global_variables())
71 | sess.run(tf.global_variables_initializer())
72 | def test_step(x, y):
73 | predictions,labels = [],[]
74 | attend_w,attend_s = [],[]
75 | for i in range(0, len(x), FLAGS.batch_size):
76 |
77 | feed_dict = {
78 | fishqa.input_x: x[i:i + FLAGS.batch_size],
79 | fishqa.input_y: y[i:i + FLAGS.batch_size],
80 | fishqa.max_sentence_num: 30,
81 | fishqa.max_sentence_length: 45,
82 | fishqa.batch_size: 64,
83 | fishqa.is_training:False
84 | }
85 | # step, summaries,cost, accuracy,correctNumber = sess.run([global_step, dev_summary_op,loss,acc,accNUM], feed_dict)
86 | pre,att_w,att_s= sess.run([predict,att_words,att_sents], feed_dict)
87 | attend_w.extend(att_w)
88 | attend_s.extend(att_s)
89 | predictions.extend(pre)
90 |
91 | print("predict score done!")
92 | pickle.dump(attend_w, open('../temp/att_words.pickle', 'wb'))
93 | pickle.dump(attend_s, open('../temp/att_sents.pickle', 'wb'))
94 | pickle.dump(predictions, open('../temp/predict_y.pickle', 'wb'))
95 | print("attention weights saved!")
96 |
97 | saver.restore(sess, checkpoint_path)
98 | test_step(test_x, test_y)
99 |
100 | if __name__ == '__main__':
101 | main()
102 |
--------------------------------------------------------------------------------
/code/test.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | from model import FISHQA,read_question,shuffle_data
4 | import tensorflow as tf
5 | import time
6 | import pickle
7 | import numpy as np
8 | from tqdm import tqdm
9 | from tensorflow.contrib import rnn
10 | from tensorflow.contrib import layers
11 | import random
12 | import pandas as pd
13 | import os
14 | import argparse
15 | parser = argparse.ArgumentParser()
16 | parser.add_argument('--logdir', default='1526700733')
17 | args = parser.parse_args()
18 | # Data loading params
19 | tf.flags.DEFINE_string("data_dir", "../data/data.dat", "data directory")
20 | tf.flags.DEFINE_integer("vocab_size", 52812, "vocabulary size")
21 | tf.flags.DEFINE_integer("num_classes", 2, "number of classes")
22 | tf.flags.DEFINE_integer("embedding_size", 200, "Dimensionality of character embedding (default: 200)")
23 | tf.flags.DEFINE_integer("hidden_size", 100, "Dimensionality of GRU hidden layer (default: 50)")
24 | tf.flags.DEFINE_integer("batch_size", 64, "Batch Size (default: 64)")
25 | tf.flags.DEFINE_integer("num_epochs", 20, "Number of training epochs (default: 50)")
26 | tf.flags.DEFINE_integer("checkpoint_every", 100, "Save model after this many steps (default: 100)")
27 | tf.flags.DEFINE_integer("num_checkpoints", 5, "Number of checkpoints to store (default: 5)")
28 | tf.flags.DEFINE_integer("evaluate_every", 10, "evaluate every this many batches")
29 | tf.flags.DEFINE_float("learning_rate", 0.001, "learning rate")
30 | tf.flags.DEFINE_float("grad_clip", 5, "grad clip to prevent gradient explode")
31 | tf.flags.DEFINE_float("sentence_num", 30, "the max number of sentence in a document")
32 | tf.flags.DEFINE_float("sentence_length", 45, "the max length of each sentence")
33 |
34 | with open("../temp/test_data", 'rb') as f:
35 | test_x,test_y = pickle.load(f)
36 |
37 | FLAGS = tf.flags.FLAGS
38 | print("loading test data finished")
39 |
40 | def main():
41 | with tf.Session() as sess:
42 | fishqa = FISHQA(vocab_size=FLAGS.vocab_size,
43 | num_classes=FLAGS.num_classes,
44 | embedding_size=FLAGS.embedding_size,
45 | hidden_size=FLAGS.hidden_size,
46 | dropout_keep_proba=0.5,
47 | query = read_question()
48 | )
49 | with tf.name_scope('loss'):
50 | loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=fishqa.input_y,
51 | logits=fishqa.out,
52 | name='loss'))
53 | with tf.name_scope('accuracy'):
54 | predict = tf.argmax(fishqa.out, axis=1, name='predict')
55 | label = tf.argmax(fishqa.input_y, axis=1, name='label')
56 | acc = tf.reduce_mean(tf.cast(tf.equal(predict, label), tf.float32))
57 |
58 | with tf.name_scope('att_words'):
59 | att_words = tf.reshape(fishqa.att_word,[-1,30])
60 | with tf.name_scope('att_sents'):
61 | att_sents = tf.reshape(fishqa.att_sent,[-1,4,30])
62 | timestamp = str(int(time.time()))
63 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", args.logdir))
64 |
65 |
66 | checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
67 | checkpoint_path = checkpoint_dir + '/my-model.ckpt'
68 |
69 | # saver = tf.train.Saver(tf.global_variables(), max_to_keep=1)
70 | saver = tf.train.Saver(tf.global_variables())
71 | sess.run(tf.global_variables_initializer())
72 | def test_step(x, y):
73 | predictions,labels = [],[]
74 | attend_w,attend_s = [],[]
75 | for i in range(0, len(x), FLAGS.batch_size):
76 |
77 | feed_dict = {
78 | fishqa.input_x: x[i:i + FLAGS.batch_size],
79 | fishqa.input_y: y[i:i + FLAGS.batch_size],
80 | fishqa.max_sentence_num: 30,
81 | fishqa.max_sentence_length: 45,
82 | fishqa.batch_size: 64,
83 | fishqa.is_training:False
84 | }
85 | pre, groundtruth, att_w, att_s = sess.run([predict,label,att_words,att_sents], feed_dict)
86 | predictions.extend(pre)
87 | labels.extend(groundtruth)
88 | attend_w.extend(att_w)
89 | attend_s.extend(att_s)
90 | df = pd.DataFrame({'predictions': predictions, 'labels': labels})
91 | acc_dev = (df['predictions'] == df['labels']).mean()
92 | print("++++++++++++++++++test++++++++++++++: acc {:g} ".format(acc_dev))
93 | pickle.dump(attend_w, open('../temp/att_words.pickle', 'wb'))
94 | pickle.dump(attend_s, open('../temp/att_sents.pickle', 'wb'))
95 | pickle.dump(predictions, open('../temp/predict_y.pickle', 'wb'))
96 | print("attention weights saved!")
97 |
98 | saver.restore(sess, checkpoint_path)
99 | test_step(test_x, test_y)
100 |
101 | if __name__ == '__main__':
102 | main()
103 |
--------------------------------------------------------------------------------
/code/preprocess_train.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | import numpy as np
4 | import pandas as pd
5 | import jieba.posseg as pseg
6 | import jieba
7 | import re
8 | import os
9 | import codecs
10 | from collections import defaultdict
11 | from tqdm import tqdm
12 | import pickle
13 | import random
14 | import argparse
15 | from model import shuffle_data
16 |
17 |
18 | # load dictionaries and the names of all the companies/bonds of interest
19 | # note that jieba is a Chinese text segmentation library; see https://github.com/fxsjy/jieba
20 | name = ["太平洋资产管理有限责任公司","张家港农商银行","江苏银行","中建投信托有限责任公司","华宝兴业基金"]
21 | corps =set()
22 | for i in range(len(name)):
23 | sheet = pd.read_excel("../dictionary/组合管理-持仓清单.xlsx",sheetname=name[i])
24 | corps = corps|set(list(sheet.loc[:,"主体名称"]))
25 | corps = corps|set(list(sheet.loc[:,"债券名称"]))
26 | sheet2 = pd.read_excel('../dictionary/公司简称.xlsx',sheetname = 0)
27 | corps = corps|(set(list(sheet2.iloc[:,1])))
28 | corps = corps|(set(list(sheet2.iloc[:,0])))
29 | jieba.load_userdict('../dictionary/mydict.txt')
30 | jieba.load_userdict('../dictionary/negative.txt')
31 | jieba.load_userdict('../dictionary/positive.txt')
32 | jieba.load_userdict(corps)
33 |
34 | # load negative words
35 | neg_words = pd.read_excel("../dictionary/新闻负面词.xlsx")
36 | jieba.load_userdict(list(neg_words.loc[:,"NewsClass"]))
37 |
38 | # filter some noisy marks
39 | pattern="[\.\\/_,,.:;~{}`^\\\[\]<=>?$%^*()+\"\']+|[+·。:【】‘’“”《》、~@#¥%……&*()]+0123456789"
40 | pat = set(pattern)|set(["\n",'\u3000'," ","\s","","\u2028"])
41 |
42 | # some noise words in Chinese news
43 | filterwords = ["\u2028","责任编辑","DF","点击查看","热点栏目 资金流向 千股千评 个股诊断 最新评级 模拟交易 客户端","进入【新浪财经股吧】讨论","记者","鸣谢","报道","重点提示","重大事项","重要内容提示","提示:键盘也能翻页,试试“← →”键","原标题"]
44 | # with codecs.open('../dictionary/stopwords_CN.dat','r') as fr:
45 | # stopwords=fr.readlines()
46 | # stopwords=[i.strip() for i in stopwords]
47 | # stopwords=set(stopwords)
48 |
49 |
50 | # count the frequency of each word in documents
51 | print("count word frequency")
52 | word_freq = defaultdict(int)
53 | def Getdata(sheet):
54 | for row in tqdm(range(len(sheet))):
55 | title=str(sheet.loc[row,"title"])
56 | content=str(sheet.loc[row,"content"])
57 | for item in filterwords:
58 | content = content.replace(item,"")
59 | sents = title +" " +content
60 | words=pseg.lcut(sents)
61 | for j, word in enumerate(words):
62 | kind = (list(word))[1][0]
63 | tmpword = (list(word))[0]
64 | #if (kind not in ['e','x','m','u']) and (tmpword not in stopwords):
65 | if (tmpword not in pat) and (tmpword[0] not in pat):
66 | word_freq[tmpword]+=1
67 | path = "../training_data"
68 | for n_file in os.listdir(path):
69 | file_path = os.path.join(path,n_file)
70 | sheet = pd.read_excel(file_path)
71 | Getdata(sheet)
72 |
73 | # count the frequency of each word in query set
74 | q=[]
75 | f = open("../Query")
76 | for line in f:
77 | q.append(line)
78 | words=pseg.lcut(str(line.strip()))
79 | for j, word in enumerate(words):
80 | kind = (list(word))[1][0]
81 | tmpword = (list(word))[0]
82 | if (tmpword not in pat) and (tmpword[0] not in pat):
83 | word_freq[tmpword]+=1
84 | f.close()
85 | print("previous data length:",len(word_freq))
86 |
87 |
88 | # save word frequency
89 | if not os.path.exists("../model"):
90 | os.mkdir("../model")
91 | with open('../model/word_freq.pickle', 'wb') as g:
92 | pickle.dump(word_freq, g)
93 | print(len(word_freq),"word_freq save finished")
94 | # sort by word frequency; words whose frequency is not above 3 are dropped when building the vocab below
95 | sort_words = list(sorted(word_freq.items(), key=lambda x:-x[1]))
96 | print("the 10 most frequent words:",sort_words[:10],"\n the 10 least frequent words:",sort_words[-10:])
97 |
98 |
99 | # build word vocab
100 | vocab = {}
101 | i = 3
102 | vocab['UNKNOW_TOKEN'] = 0
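# word ids start from 3; id 0 is reserved for UNKNOW_TOKEN (ids 1 and 2 are left unused here)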
103 |
104 | for word, freq in sort_words:
105 | if freq > 3:
106 | vocab[word] = i
107 | i += 1
108 | with open('../model/vocab.pickle', 'wb') as f:
109 | pickle.dump(vocab, f)
110 | print(len(vocab),"vocab save finished")
111 | UNKNOWN = 0
112 | num_classes = 2
113 |
114 |
115 | # get training data
116 | data_x,data_y =[],[]
117 | max_sent_in_doc = 30
118 | max_word_in_sent = 45
119 |
120 | # we form 3 queries for our model (the number depends on your dataset and your needs)
121 | question = np.zeros((3,max_word_in_sent), dtype=np.int32)
122 |
123 | for i,ite in enumerate(q):
124 | words=pseg.lcut(ite)
125 | count = 0
126 | for j, word in enumerate(words):
127 | kind = (list(word))[1][0]
128 | tmpword = (list(word))[0]
129 | if (tmpword not in pat) and (tmpword[0] not in pat):
130 | question[i][count] = vocab.get(tmpword, UNKNOWN)
131 | count +=1
132 | def FormData(sheet):
133 | for row in tqdm(range(len(sheet))):
134 | doc=np.zeros((30,45), dtype=np.int32)
135 | title = str(sheet.loc[row,"title"])
136 | text = str(sheet.loc[row,"content"])
137 | for item in filterwords:
138 | text = text.replace(item,"")
139 | sents = title +"。"+text
140 | count1 = 0
141 | for i, sent in enumerate(sents.split("。")):
142 | # filter the code in the news
143 | if "function()" in sent:
144 | continue
145 | if count1 < max_sent_in_doc:
146 | count = 0
147 | for j, word in enumerate(pseg.lcut(sent)):
148 | kind = (list(word))[1][0]
149 | tmpword = (list(word))[0]
150 | if (tmpword not in pat) and (tmpword[0] not in pat) and (count < max_word_in_sent):
151 | doc[count1][count] = vocab.get(tmpword, UNKNOWN)
152 | count +=1
153 | count1 +=1
154 | # 0: non-neg 1: neg
155 | if sheet.loc[row,"score"]==0:
156 | label = 0
157 | else:
158 | label = 1
159 | labels = [0] * num_classes
160 | labels[label] = 1
161 | data_y.append(labels)
162 | data_x.append(doc.tolist())
163 | for n_file in os.listdir(path):
164 | file_path = os.path.join(path,n_file)
165 | sheet = pd.read_excel(file_path)
166 | FormData(sheet)
167 | print("load train_data finished, length: ",len(data_x))
168 |
169 |
170 | # shuffle and split into training/dev sets
171 | data_x,data_y = shuffle_data(data_x,data_y)
172 | train_x,train_y,eval_x,eval_y = [],[],[],[]
173 | for i in range(len(data_x)):
174 | r = random.random()
175 | if r<0.8:
176 | train_x.append(data_x[i])
177 | train_y.append(data_y[i])
178 | else:
179 | eval_x.append(data_x[i])
180 | eval_y.append(data_y[i])
181 |
182 | print("shuffle data finished!")
183 | pickle.dump((train_x,train_y), open('../model/train_data', 'wb'))
184 | pickle.dump((eval_x,eval_y), open('../model/dev_data', 'wb'))
185 | pickle.dump((question[0].tolist()), open('../model/q1_data', 'wb'))
186 | pickle.dump((question[1].tolist()), open('../model/q2_data', 'wb'))
187 | pickle.dump((question[2].tolist()), open('../model/q3_data', 'wb'))
188 | print("store training data finished!")
189 |
--------------------------------------------------------------------------------
/code/train.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | from model import FISHQA,read_question,shuffle_data
4 | import tensorflow as tf
5 | import time
6 | import pickle
7 | import numpy as np
8 | from tqdm import tqdm
9 | from tensorflow.contrib import rnn
10 | from tensorflow.contrib import layers
11 | import random
12 | import pandas as pd
13 | import os
14 |
15 |
16 | # Data loading params
17 | tf.flags.DEFINE_string("data_dir", "../data/data.dat", "data directory")
18 | tf.flags.DEFINE_integer("vocab_size", 52812, "vocabulary size")
19 | tf.flags.DEFINE_integer("num_classes", 2, "number of classes")
20 | tf.flags.DEFINE_integer("embedding_size", 200, "Dimensionality of character embedding (default: 200)")
21 | tf.flags.DEFINE_integer("hidden_size", 100, "Dimensionality of GRU hidden layer (default: 50)")
22 | tf.flags.DEFINE_integer("batch_size", 64, "Batch Size (default: 64)")
23 | tf.flags.DEFINE_integer("num_epochs", 15, "Number of training epochs (default: 50)")
24 | tf.flags.DEFINE_integer("checkpoint_every", 100, "Save model after this many steps (default: 100)")
25 | tf.flags.DEFINE_integer("num_checkpoints", 5, "Number of checkpoints to store (default: 5)")
26 | tf.flags.DEFINE_integer("evaluate_every", 10, "evaluate every this many batches")
27 | tf.flags.DEFINE_float("learning_rate", 0.001, "learning rate")
28 | tf.flags.DEFINE_float("grad_clip", 5, "grad clip to prevent gradient explode")
29 | tf.flags.DEFINE_float("sentence_num", 30, "the max number of sentence in a document")
30 | tf.flags.DEFINE_float("sentence_length", 45, "the max length of each sentence")
31 |
32 | def read_dataset():
33 | train_x, train_y,dev_x, dev_y =[],[],[],[]
34 | with open("../model/train_data", 'rb') as f:
35 | train_x, train_y = pickle.load(f)
36 | with open("../model/dev_data", 'rb') as g:
37 | dev_x, dev_y = pickle.load(g)
38 | return train_x, train_y,dev_x, dev_y
39 |
40 | FLAGS = tf.flags.FLAGS
41 | train_x, train_y,dev_x, dev_y = read_dataset()
42 | acc_record = 0
43 | print("data load finished")
44 |
45 |
46 |
47 | def main():
48 | with tf.Session() as sess:
49 | fishqa = FISHQA(vocab_size=FLAGS.vocab_size,
50 | num_classes=FLAGS.num_classes,
51 | embedding_size=FLAGS.embedding_size,
52 | hidden_size=FLAGS.hidden_size,
53 | dropout_keep_proba=0.5,
54 | query = read_question()
55 | )
56 | with tf.name_scope('loss'):
57 | loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=fishqa.input_y,
58 | logits=fishqa.out,
59 | name='loss'))
60 | with tf.name_scope('accuracy'):
61 | predict = tf.argmax(fishqa.out, axis=1, name='predict')
62 | label = tf.argmax(fishqa.input_y, axis=1, name='label')
63 | acc = tf.reduce_mean(tf.cast(tf.equal(predict, label), tf.float32))
64 |
65 | timestamp = str(int(time.time()))
66 | out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp))
67 | print("Writing model to {}\n".format(out_dir))
68 | global_step = tf.Variable(0, trainable=False)
69 |
70 | optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
71 | #optimizer = tf.train.MomentumOptimizer(FLAGS.learning_rate,0.9)
72 |
73 | tvars = tf.trainable_variables()
74 | grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), FLAGS.grad_clip)
75 | grads_and_vars = tuple(zip(grads, tvars))
76 | train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
77 |
78 | # Keep track of gradient values and sparsity (optional)
79 | # grad_summaries = grad_summary
80 | grad_summaries = []
81 | for g, v in grads_and_vars:
82 | if g is not None:
83 | grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g)
84 | grad_summaries.append(grad_hist_summary)
85 |
86 | # grad_summaries_merged = tf.summary.merge(grad_summaries)
87 |
88 | loss_summary = tf.summary.scalar('loss', loss)
89 | acc_summary = tf.summary.scalar('accuracy', acc)
90 |
91 |
92 | # train_summary_op = tf.summary.merge([loss_summary, acc_summary, grad_summaries_merged])
93 | train_summary_op = tf.summary.merge_all()#tf.merge_all_summaries()
94 | train_summary_dir = os.path.join(out_dir, "summaries", "train")
95 | train_summary_writer = tf.summary.FileWriter(train_summary_dir, sess.graph)
96 |
97 | checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
98 | checkpoint_prefix = os.path.join(checkpoint_dir, "model")
99 | checkpoint_path = checkpoint_dir + '/my-model.ckpt'
100 | if not os.path.exists(checkpoint_dir):
101 | os.makedirs(checkpoint_dir)
102 | # saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)
103 | # saver = tf.train.Saver()
104 | # saver = tf.train.Saver(tf.global_variables(), max_to_keep=1)
105 | saver = tf.train.Saver(tf.global_variables(), max_to_keep=1)
106 | sess.run(tf.global_variables_initializer())
107 |
108 |
109 | def train_step(x_batch, y_batch):
110 | feed_dict = {
111 | fishqa.input_x: x_batch,
112 | fishqa.input_y: y_batch,
113 | fishqa.max_sentence_num: FLAGS.sentence_num,
114 | fishqa.max_sentence_length: FLAGS.sentence_length,
115 | fishqa.batch_size: FLAGS.batch_size,
116 | fishqa.is_training: True
117 | }
118 | _, step, summaries, cost, accuracy = sess.run([train_op, global_step, train_summary_op, loss, acc], feed_dict)
119 |
120 | time_str = str(int(time.time()))
121 | # print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))
122 | train_summary_writer.add_summary(summaries, step)
123 | return step
124 |
125 | def dev_step(x, y):
126 | global acc_record
127 | predictions = []
128 | labels = []
129 | for i in range(0, len(x), FLAGS.batch_size):
130 |
131 | feed_dict = {
132 | fishqa.input_x: x[i:i + FLAGS.batch_size],
133 | fishqa.input_y: y[i:i + FLAGS.batch_size],
134 | fishqa.max_sentence_num: 30,
135 | fishqa.max_sentence_length: 45,
136 | fishqa.batch_size: 64,
137 | fishqa.is_training:False
138 | }
139 | # step, summaries,cost, accuracy,correctNumber = sess.run([global_step, dev_summary_op,loss,acc,accNUM], feed_dict)
140 | step, pre, groundtruth= sess.run([global_step, predict, label], feed_dict)
141 | predictions.extend(pre)
142 | labels.extend(groundtruth)
143 | time_str = str(int(time.time()))
144 | df = pd.DataFrame({'predictions': predictions, 'labels': labels})
145 | acc_dev = (df['predictions'] == df['labels']).mean()
146 | print("++++++++++++++++++dev++++++++++++++{}: step {}, acc {:g} ".format(time_str, step, acc_dev))
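            # save a checkpoint only when the dev accuracy improves, keeping the single best model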
147 | if acc_dev>acc_record:
148 | acc_record = acc_dev
149 | saver.save(sess, checkpoint_path)
150 |
151 | for epoch in range(FLAGS.num_epochs):
152 | X,Y = shuffle_data(train_x,train_y)
153 | print('current epoch %s' % (epoch + 1))
154 | for i in range(0, len(X), FLAGS.batch_size):
155 | x = X[i:i + FLAGS.batch_size]
156 | y = Y[i:i + FLAGS.batch_size]
157 | step = train_step(x, y)
158 | if step % FLAGS.evaluate_every == 0:
159 | dev_step(dev_x, dev_y)
160 |
161 | if __name__ == '__main__':
162 | main()
163 |
--------------------------------------------------------------------------------
/code/model.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | #coding:utf-8
3 | import tensorflow as tf
4 | import time
5 | import pickle
6 | import numpy as np
7 | from tqdm import tqdm
8 | from tensorflow.contrib import rnn
9 | from tensorflow.contrib import layers
10 | import random
11 | import pandas as pd
12 |
13 | # return the length of each sequence
14 | def length(sequences):
15 | used = tf.sign(tf.reduce_max(tf.abs(sequences), reduction_indices=2))
16 | seq_len = tf.reduce_sum(used, reduction_indices=1)
17 | return tf.cast(seq_len, tf.int32)
18 | # load the 3 query vectors (the number of queries is set to 3 here; choose queries based on your dataset and prior knowledge)
19 | def read_question():
20 | with open('../model/q1_data', 'rb') as f:
21 | q1 = pickle.load(f)
22 | with open('../model/q2_data', 'rb') as g:
23 | q2 = pickle.load(g)
24 | with open('../model/q3_data', 'rb') as b:
25 | q3 = pickle.load(b)
26 | return [q1,q2,q3]
27 | def shuffle_data(x,y):
28 | train_x = [];train_y=[]
29 | li = np.random.permutation(len(x))
30 | for i in tqdm(range(len(li))):
31 | train_x.append(x[li[i]])
32 | train_y.append(y[li[i]])
33 | return train_x,train_y
34 | class FISHQA():
35 |
36 | def __init__(self, vocab_size, num_classes, embedding_size=200, hidden_size=50, dropout_keep_proba=0.5,query=[]):
37 |
38 | self.vocab_size = vocab_size
39 | self.num_classes = num_classes
40 | self.embedding_size = embedding_size
41 | self.hidden_size = hidden_size
42 | self.dropout_keep_proba = dropout_keep_proba
43 | self.query = query
44 |
45 | with tf.name_scope('placeholder'):
46 | self.max_sentence_num = tf.placeholder(tf.int32, name='max_sentence_num')
47 | self.max_sentence_length = tf.placeholder(tf.int32, name='max_sentence_length')
48 | self.batch_size = tf.placeholder(tf.int32, name='batch_size')
49 | #x shape [batch_size, sentence_num,word_num ]
50 | #y shape [batch_size, num_classes]
51 | self.input_x = tf.placeholder(tf.int32, [None, None, None], name='input_x')
52 | self.input_y = tf.placeholder(tf.float32, [None, num_classes], name='input_y')
53 | self.is_training = tf.placeholder(dtype=tf.bool, name='is_training')
54 |
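        # hierarchy: word embeddings -> sentence vectors (word-level, query-driven attention)
        #            -> document vector (sentence-level, query-driven attention) -> classifier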
55 | word_embedded,q1_emb,q2_emb,q3_emb = self.word2vec()
56 | sent_vec,att_word = self.sent2vec(word_embedded,q1_emb,q2_emb,q3_emb)
57 | doc_vec,att_sent = self.doc2vec(sent_vec,q1_emb,q2_emb,q3_emb)
58 | out = self.classifer(doc_vec)
59 |
60 | self.out = out
61 | self.att_word = att_word
62 | self.att_sent = att_sent
63 | def word2vec(self):
64 | with tf.name_scope("embedding"):
65 | embedding_mat = tf.Variable(tf.truncated_normal((self.vocab_size, self.embedding_size)))
66 | #shape: [batch_size, sent_in_doc, word_in_sent, embedding_size]
67 | # each query embedding is the sum of its word embeddings divided by 45 (max_word_in_sent)
68 | word_embedded = tf.nn.embedding_lookup(embedding_mat, self.input_x)
69 | q1_emb = tf.reduce_sum(tf.nn.embedding_lookup(embedding_mat, self.query[0]),axis=0)/45
70 | q2_emb = tf.reduce_sum(tf.nn.embedding_lookup(embedding_mat, self.query[1]),axis=0)/45
71 | q3_emb = tf.reduce_sum(tf.nn.embedding_lookup(embedding_mat, self.query[2]),axis=0)/45
72 | return word_embedded,q1_emb,q2_emb,q3_emb
73 |
74 | def sent2vec(self, word_embedded,q1_emb,q2_emb,q3_emb):
75 | with tf.name_scope("sent2vec"):
76 | #GRU input size : [batch_size, max_time, ...]
77 | #shape: [batch_size*sent_in_doc, word_in_sent, embedding_size]
78 | word_embedded = tf.reshape(word_embedded, [-1, self.max_sentence_length, self.embedding_size])
79 | #shape: [batch_size*sent_in_doc, word_in_sent, hidden_size*2]
80 | word_encoded = self.BidirectionalGRUEncoder(word_embedded, name='word_encoder')
81 | #shape: [batch_size*sent_in_doc, hidden_size*2]
82 | sent_temp,att_word = self.AttentionLayer(word_encoded,q1_emb,q2_emb,q3_emb, name='word_attention')
83 | sent_vec = layers.dropout(sent_temp, keep_prob=self.dropout_keep_proba,is_training=self.is_training,)
84 | return sent_vec,att_word
85 |
86 | def doc2vec(self, sent_vec,q1_embedded,q2_embedded,q3_embedded):
87 | # the same with sent2vec
88 | with tf.name_scope("doc2vec"):
89 | sent_vec = tf.reshape(sent_vec, [-1, self.max_sentence_num, self.hidden_size*2])
90 | # shape: [batch_size, sent_in_doc, hidden_size*2]
91 | doc_encoded = self.BidirectionalGRUEncoder(sent_vec, name='sent_encoder')
92 | # shape: [batch_size, hidden_size*2]
93 | doc_temp,att_sent = self.SentenceAttentionLayer(doc_encoded,q1_embedded,q2_embedded,q3_embedded,name='sent_attention')
94 | doc_vec = layers.dropout(doc_temp, keep_prob=self.dropout_keep_proba,is_training=self.is_training,)
95 | return doc_vec,att_sent
96 |
97 | def classifer(self, doc_vec):
98 | with tf.name_scope('doc_classification'):
99 | out = layers.fully_connected(inputs=doc_vec, num_outputs=self.num_classes, activation_fn=None)
100 | return out
101 |
102 | def BidirectionalGRUEncoder(self, inputs, name):
103 | #inputs shape: [batch_size, max_time, voc_size]
104 | with tf.variable_scope(name):
105 | GRU_cell_fw = rnn.GRUCell(self.hidden_size)
106 | GRU_cell_bw = rnn.GRUCell(self.hidden_size)
107 | #fw_outputs, bw_outputs size: [batch_size, max_time, hidden_size]
108 | # time_major=False,
109 | # if time_major = True, tensor shape: `[max_time, batch_size, depth]`.
110 | # if time_major = False, tensor shape`[batch_size, max_time, depth]`.
111 | ((fw_outputs, bw_outputs), (_, _)) = tf.nn.bidirectional_dynamic_rnn(cell_fw=GRU_cell_fw,
112 | cell_bw=GRU_cell_bw,
113 | inputs=inputs,
114 | sequence_length=length(inputs),
115 | dtype=tf.float32)
116 | #outputs size [batch_size, max_time, hidden_size*2]
117 | outputs = tf.concat((fw_outputs, bw_outputs), 2)
118 | return outputs
119 |
120 | def AttentionLayer(self, inputs, q1_emb,q2_emb,q3_emb,name):
121 | #inputs size [batch_size, max_time, encoder_size(hidden_size * 2)]
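        # query-driven attention: four score vectors are computed over the time axis, one
        # against the trainable context vector u_context and one against each of the three
        # query embeddings; their softmax distributions are averaged into alpha below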
122 | with tf.variable_scope(name):
123 | # u_context length is 2×hidden_size
124 | u_context = tf.Variable(tf.truncated_normal([self.hidden_size * 2]), name='u_context')
125 | # output size [batch_size, max_time, hidden_size * 2]
126 | h1 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
127 | h2 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
128 | h3 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
129 | h4 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
130 |
131 | # shape [batch_size, max_time, 1]
132 | t_alpha = tf.nn.softmax(tf.reduce_sum(tf.multiply(h1, u_context), axis=2, keep_dims=True), dim=1)
133 | q_alpha1 = tf.nn.softmax(tf.reduce_sum(tf.multiply(h2, q1_emb), axis=2, keep_dims=True), dim=1)
134 | q_alpha2 = tf.nn.softmax(tf.reduce_sum(tf.multiply(h3, q2_emb), axis=2, keep_dims=True), dim=1)
135 | q_alpha3 = tf.nn.softmax(tf.reduce_sum(tf.multiply(h4, q3_emb), axis=2, keep_dims=True), dim=1)
136 |
137 | alpha = (t_alpha+q_alpha1+q_alpha2+q_alpha3)/4
138 |
139 | a = tf.nn.top_k((tf.reshape(alpha,[-1,self.max_sentence_length])),k=1).indices
140 | # shape [batch_size, max_time, 1]
141 | # alpha = tf.nn.softmax(tf.reduce_sum(tf.multiply(h, u_context), axis=2, keep_dims=True), dim=1)
142 | # reduce_sum [batch_size, max_time, hidden_size*2] ---> [batch_size, hidden_size*2]
143 | atten_output = tf.reduce_sum(tf.multiply(inputs, alpha), axis=1)
144 | # atten_output = tf.reduce_sum(inputs,axis=1)
145 | return atten_output,a
146 | def SentenceAttentionLayer(self, inputs,q1_emb,q2_emb,q3_emb, name):
147 | # inputs size [batch_size, max_time, encoder_size(hidden_size * 2)]
148 | with tf.variable_scope(name):
149 | u_context = tf.Variable(tf.truncated_normal([self.hidden_size * 2]), name='u_context')
150 |
151 | h1 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
152 | h2 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
153 | h3 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
154 | h4 = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
155 |
156 | # shape [batch_size, max_time, 1]
157 | t_alpha = tf.nn.softmax(tf.reduce_sum(tf.multiply(h1, u_context), axis=2, keep_dims=True), dim=1)
158 | q_alpha1 = tf.nn.softmax(tf.reduce_sum(tf.multiply(h2, q1_emb), axis=2, keep_dims=True), dim=1)
159 | q_alpha2 = tf.nn.softmax(tf.reduce_sum(tf.multiply(h3, q2_emb), axis=2, keep_dims=True), dim=1)
160 | q_alpha3 = tf.nn.softmax(tf.reduce_sum(tf.multiply(h4, q3_emb), axis=2, keep_dims=True), dim=1)
161 |
162 | # sents shape [batch_size, sent_in_doc, hidden_size*2]
163 |
164 | alpha = (t_alpha+q_alpha1+q_alpha2+q_alpha3)/4
165 | #tf.add_to_collection('attention_value',alpha)
166 | #reduce_sum [batch_size, max_time, hidden_size*2] ---> [batch_size, hidden_size*2]
167 | atten_output = tf.reduce_sum(tf.multiply(inputs, alpha), axis=1)
168 |
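        # stack the four sentence-level attention distributions (context + three queries);
        # test.py/predict.py reshape this to [-1, 4, 30] and view.py inspects it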
169 | att = tf.concat([t_alpha,q_alpha1,q_alpha2,q_alpha3],0)
170 | #atten_output = tf.reduce_sum(inputs,axis=1)
171 | return atten_output,att
172 |
--------------------------------------------------------------------------------
/dictionary/stopwords_CN.dat:
--------------------------------------------------------------------------------
1 | 责任编辑
2 | 末
3 | 年
4 | 月
5 | 日
6 | 啊
7 | 阿
8 | 哎
9 | 哎呀
10 | 哎哟
11 | 唉
12 | 俺
13 | 俺们
14 | 按
15 | 按照
16 | 吧
17 | 吧哒
18 | 把
19 | 罢了
20 | 被
21 | 本
22 | 本着
23 | 比
24 | 比方
25 | 比如
26 | 鄙人
27 | 彼
28 | 彼此
29 | 边
30 | 别
31 | 别的
32 | 别说
33 | 并
34 | 并且
35 | 不单
36 | 不但
37 | 不独
38 | 不管
39 | 不光
40 | 不过
41 | 不仅
42 | 不拘
43 | 不论
44 | 不怕
45 | 不然
46 | 不特
47 | 不惟
48 | 不问
49 | 不只
50 | 朝
51 | 朝着
52 | 趁
53 | 趁着
54 | 乘
55 | 冲
56 | 除
57 | 除此之外
58 | 除非
59 | 除了
60 | 此
61 | 此间
62 | 此外
63 | 从
64 | 从而
65 | 打
66 | 待
67 | 当
68 | 当着
69 | 到
70 | 得
71 | 的
72 | 的话
73 | 等
74 | 等等
75 | 地
76 | 第
77 | 叮咚
78 | 对
79 | 对于
80 | 多
81 | 多少
82 | 而
83 | 而况
84 | 而且
85 | 而是
86 | 而外
87 | 而言
88 | 而已
89 | 尔后
90 | 反过来
91 | 反过来说
92 | 反之
93 | 非但
94 | 非徒
95 | 否则
96 | 嘎
97 | 嘎登
98 | 该
99 | 赶
100 | 个
101 | 各
102 | 各个
103 | 各位
104 | 各种
105 | 各自
106 | 给
107 | 根据
108 | 跟
109 | 故
110 | 故此
111 | 固然
112 | 关于
113 | 管
114 | 归
115 | 果然
116 | 果真
117 | 过
118 | 哈
119 | 哈哈
120 | 呵
121 | 和
122 | 何
123 | 何处
124 | 何况
125 | 何时
126 | 嘿
127 | 哼
128 | 哼唷
129 | 呼哧
130 | 乎
131 | 哗
132 | 还是
133 | 还有
134 | 换句话说
135 | 换言之
136 | 或
137 | 或是
138 | 或者
139 | 极了
140 | 及
141 | 及其
142 | 及至
143 | 即
144 | 即便
145 | 即或
146 | 即令
147 | 即若
148 | 即使
149 | 几
150 | 几时
151 | 己
152 | 既
153 | 既然
154 | 既是
155 | 继而
156 | 加之
157 | 假如
158 | 假若
159 | 假使
160 | 鉴于
161 | 将
162 | 较
163 | 较之
164 | 叫
165 | 接着
166 | 结果
167 | 借
168 | 紧接着
169 | 进而
170 | 尽
171 | 尽管
172 | 经
173 | 经过
174 | 就
175 | 就是
176 | 就是说
177 | 据
178 | 具体地说
179 | 具体说来
180 | 开始
181 | 开外
182 | 靠
183 | 咳
184 | 可
185 | 可见
186 | 可是
187 | 可以
188 | 况且
189 | 啦
190 | 来
191 | 来着
192 | 离
193 | 例如
194 | 哩
195 | 连
196 | 连同
197 | 两者
198 | 了
199 | 临
200 | 另
201 | 另外
202 | 另一方面
203 | 论
204 | 嘛
205 | 吗
206 | 慢说
207 | 漫说
208 | 冒
209 | 么
210 | 每
211 | 每当
212 | 们
213 | 莫若
214 | 某
215 | 某个
216 | 某些
217 | 拿
218 | 哪
219 | 哪边
220 | 哪儿
221 | 哪个
222 | 哪里
223 | 哪年
224 | 哪怕
225 | 哪天
226 | 哪些
227 | 哪样
228 | 那
229 | 那边
230 | 那儿
231 | 那个
232 | 那会儿
233 | 那里
234 | 那么
235 | 那么些
236 | 那么样
237 | 那时
238 | 那些
239 | 那样
240 | 乃
241 | 乃至
242 | 呢
243 | 能
244 | 你
245 | 你们
246 | 您
247 | 宁
248 | 宁可
249 | 宁肯
250 | 宁愿
251 | 哦
252 | 呕
253 | 啪达
254 | 旁人
255 | 呸
256 | 凭
257 | 凭借
258 | 其
259 | 其次
260 | 其二
261 | 其他
262 | 其它
263 | 其一
264 | 其余
265 | 其中
266 | 起
267 | 起见
268 | 岂但
269 | 恰恰相反
270 | 前后
271 | 前者
272 | 且
273 | 然而
274 | 然后
275 | 然则
276 | 让
277 | 人家
278 | 任
279 | 任何
280 | 任凭
281 | 如
282 | 如此
283 | 如果
284 | 如何
285 | 如其
286 | 如若
287 | 如上所述
288 | 若
289 | 若非
290 | 若是
291 | 啥
292 | 上下
293 | 尚且
294 | 设若
295 | 设使
296 | 甚而
297 | 甚么
298 | 甚至
299 | 省得
300 | 时候
301 | 什么
302 | 什么样
303 | 使得
304 | 是
305 | 是的
306 | 首先
307 | 谁
308 | 谁知
309 | 顺
310 | 顺着
311 | 似的
312 | 虽
313 | 虽然
314 | 虽说
315 | 虽则
316 | 随
317 | 随着
318 | 所
319 | 所以
320 | 他
321 | 他们
322 | 他人
323 | 它
324 | 它们
325 | 她
326 | 她们
327 | 倘
328 | 倘或
329 | 倘然
330 | 倘若
331 | 倘使
332 | 腾
333 | 替
334 | 通过
335 | 同
336 | 同时
337 | 哇
338 | 万一
339 | 往
340 | 望
341 | 为
342 | 为何
343 | 为了
344 | 为什么
345 | 为着
346 | 喂
347 | 嗡嗡
348 | 我
349 | 我们
350 | 呜
351 | 呜呼
352 | 乌乎
353 | 无论
354 | 无宁
355 | 毋宁
356 | 嘻
357 | 吓
358 | 相对而言
359 | 像
360 | 向
361 | 向着
362 | 嘘
363 | 呀
364 | 焉
365 | 沿
366 | 沿着
367 | 要
368 | 要不
369 | 要不然
370 | 要不是
371 | 要么
372 | 要是
373 | 也
374 | 也罢
375 | 也好
376 | 一
377 | 一般
378 | 一旦
379 | 一方面
380 | 一来
381 | 一切
382 | 一样
383 | 一则
384 | 依
385 | 依照
386 | 矣
387 | 以
388 | 以便
389 | 以及
390 | 以免
391 | 以至
392 | 以至于
393 | 以致
394 | 抑或
395 | 因
396 | 因此
397 | 因而
398 | 因为
399 | 哟
400 | 用
401 | 由
402 | 由此可见
403 | 由于
404 | 有
405 | 有的
406 | 有关
407 | 有些
408 | 又
409 | 于
410 | 于是
411 | 于是乎
412 | 与
413 | 与此同时
414 | 与否
415 | 与其
416 | 越是
417 | 云云
418 | 哉
419 | 再说
420 | 再者
421 | 在
422 | 在下
423 | 咱
424 | 咱们
425 | 则
426 | 怎
427 | 怎么
428 | 怎么办
429 | 怎么样
430 | 怎样
431 | 咋
432 | 照
433 | 照着
434 | 者
435 | 这
436 | 这边
437 | 这儿
438 | 这个
439 | 这会儿
440 | 这就是说
441 | 这里
442 | 这么
443 | 这么点儿
444 | 这么些
445 | 这么样
446 | 这时
447 | 这些
448 | 这样
449 | 正如
450 | 吱
451 | 之
452 | 之类
453 | 之所以
454 | 之一
455 | 只是
456 | 只限
457 | 只要
458 | 只有
459 | 至
460 | 至于
461 | 诸位
462 | 着
463 | 着呢
464 | 自
465 | 自从
466 | 自个儿
467 | 自各儿
468 | 自己
469 | 自家
470 | 自身
471 | 综上所述
472 | 总的来看
473 | 总的来说
474 | 总的说来
475 | 总而言之
476 | 总之
477 | 纵
478 | 纵令
479 | 纵然
480 | 纵使
481 | 遵照
482 | 作为
483 | 兮
484 | 呃
485 | 呗
486 | 咚
487 | 咦
488 | 喏
489 | 啐
490 | 喔唷
491 | 嗬
492 | 嗯
493 | 嗳
494 | 啊哈
495 | 啊呀
496 | 啊哟
497 | 挨次
498 | 挨个
499 | 挨家挨户
500 | 挨门挨户
501 | 挨门逐户
502 | 挨着
503 | 按理
504 | 按期
505 | 按时
506 | 按说
507 | 暗地里
508 | 暗中
509 | 暗自
510 | 昂然
511 | 八成
512 | 白白
513 | 半
514 | 梆
515 | 保管
516 | 保险
517 | 饱
518 | 背地里
519 | 背靠背
520 | 倍感
521 | 倍加
522 | 本人
523 | 本身
524 | 甭
525 | 比起
526 | 比如说
527 | 比照
528 | 毕竟
529 | 必
530 | 必定
531 | 必将
532 | 必须
533 | 便
534 | 别人
535 | 并肩
536 | 并排
537 | 勃然
538 | 策略地
539 | 差不多
540 | 差一点
541 | 常
542 | 常常
543 | 常言道
544 | 常言说
545 | 常言说得好
546 | 长此下去
547 | 长话短说
548 | 长期以来
549 | 长线
550 | 敞开儿
551 | 彻夜
552 | 陈年
553 | 趁便
554 | 趁机
555 | 趁热
556 | 趁势
557 | 趁早
558 | 成年
559 | 成年累月
560 | 成心
561 | 乘机
562 | 乘胜
563 | 乘势
564 | 乘隙
565 | 乘虚
566 | 诚然
567 | 迟早
568 | 充分
569 | 充其极
570 | 充其量
571 | 抽冷子
572 | 臭
573 | 初
574 | 出
575 | 出来
576 | 出去
577 | 除此
578 | 除此而外
579 | 除此以外
580 | 除开
581 | 除去
582 | 除却
583 | 除外
584 | 处处
585 | 川流不息
586 | 传
587 | 传说
588 | 传闻
589 | 串行
590 | 纯
591 | 纯粹
592 | 此后
593 | 此中
594 | 次第
595 | 匆匆
596 | 从不
597 | 从此
598 | 从此以后
599 | 从古到今
600 | 从古至今
601 | 从今以后
602 | 从宽
603 | 从来
604 | 从轻
605 | 从速
606 | 从头
607 | 从未
608 | 从无到有
609 | 从小
610 | 从新
611 | 从严
612 | 从优
613 | 从早到晚
614 | 从中
615 | 从重
616 | 凑巧
617 | 粗
618 | 存心
619 | 达旦
620 | 打从
621 | 打开天窗说亮话
622 | 大
623 | 大不了
624 | 大大
625 | 大抵
626 | 大都
627 | 大多
628 | 大凡
629 | 大概
630 | 大家
631 | 大举
632 | 大略
633 | 大面儿上
634 | 大事
635 | 大体
636 | 大体上
637 | 大约
638 | 大张旗鼓
639 | 大致
640 | 呆呆地
641 | 带
642 | 殆
643 | 待到
644 | 单
645 | 单纯
646 | 单单
647 | 但愿
648 | 弹指之间
649 | 当场
650 | 当儿
651 | 当即
652 | 当口儿
653 | 当然
654 | 当庭
655 | 当头
656 | 当下
657 | 当真
658 | 当中
659 | 倒不如
660 | 倒不如说
661 | 倒是
662 | 到处
663 | 到底
664 | 到了儿
665 | 到目前为止
666 | 到头
667 | 到头来
668 | 得起
669 | 得天独厚
670 | 的确
671 | 等到
672 | 叮当
673 | 顶多
674 | 定
675 | 动不动
676 | 动辄
677 | 陡然
678 | 都
679 | 独
680 | 独自
681 | 断然
682 | 顿时
683 | 多次
684 | 多多
685 | 多多少少
686 | 多多益善
687 | 多亏
688 | 多年来
689 | 多年前
690 | 而后
691 | 而论
692 | 而又
693 | 尔等
694 | 二话不说
695 | 二话没说
696 | 反倒
697 | 反倒是
698 | 反而
699 | 反手
700 | 反之亦然
701 | 反之则
702 | 方
703 | 方才
704 | 方能
705 | 放量
706 | 非常
707 | 非得
708 | 分期
709 | 分期分批
710 | 分头
711 | 奋勇
712 | 愤然
713 | 风雨无阻
714 | 逢
715 | 弗
716 | 甫
717 | 嘎嘎
718 | 该当
719 | 概
720 | 赶快
721 | 赶早不赶晚
722 | 敢
723 | 敢情
724 | 敢于
725 | 刚
726 | 刚才
727 | 刚好
728 | 刚巧
729 | 高低
730 | 格外
731 | 隔日
732 | 隔夜
733 | 个人
734 | 各式
735 | 更
736 | 更加
737 | 更进一步
738 | 更为
739 | 公然
740 | 共
741 | 共总
742 | 够瞧的
743 | 姑且
744 | 古来
745 | 故而
746 | 故意
747 | 固
748 | 怪
749 | 怪不得
750 | 惯常
751 | 光
752 | 光是
753 | 归根到底
754 | 归根结底
755 | 过于
756 | 毫不
757 | 毫无
758 | 毫无保留地
759 | 毫无例外
760 | 好在
761 | 何必
762 | 何尝
763 | 何妨
764 | 何苦
765 | 何乐而不为
766 | 何须
767 | 何止
768 | 很
769 | 很多
770 | 很少
771 | 轰然
772 | 后来
773 | 呼啦
774 | 忽地
775 | 忽然
776 | 互
777 | 互相
778 | 哗啦
779 | 话说
780 | 还
781 | 恍然
782 | 会
783 | 豁然
784 | 活
785 | 伙同
786 | 或多或少
787 | 或许
788 | 基本
789 | 基本上
790 | 基于
791 | 极
792 | 极大
793 | 极度
794 | 极端
795 | 极力
796 | 极其
797 | 极为
798 | 急匆匆
799 | 即将
800 | 即刻
801 | 即是说
802 | 几度
803 | 几番
804 | 几乎
805 | 几经
806 | 既...又
807 | 继之
808 | 加上
809 | 加以
810 | 间或
811 | 简而言之
812 | 简言之
813 | 简直
814 | 见
815 | 将才
816 | 将近
817 | 将要
818 | 交口
819 | 较比
820 | 较为
821 | 接连不断
822 | 接下来
823 | 皆可
824 | 截然
825 | 截至
826 | 藉以
827 | 借此
828 | 借以
829 | 届时
830 | 仅
831 | 仅仅
832 | 谨
833 | 进来
834 | 进去
835 | 近
836 | 近几年来
837 | 近来
838 | 近年来
839 | 尽管如此
840 | 尽可能
841 | 尽快
842 | 尽量
843 | 尽然
844 | 尽如人意
845 | 尽心竭力
846 | 尽心尽力
847 | 尽早
848 | 精光
849 | 经常
850 | 竟
851 | 竟然
852 | 究竟
853 | 就此
854 | 就地
855 | 就算
856 | 居然
857 | 局外
858 | 举凡
859 | 据称
860 | 据此
861 | 据实
862 | 据说
863 | 据我所知
864 | 据悉
865 | 具体来说
866 | 决不
867 | 决非
868 | 绝
869 | 绝不
870 | 绝顶
871 | 绝对
872 | 绝非
873 | 均
874 | 喀
875 | 看
876 | 看来
877 | 看起来
878 | 看上去
879 | 看样子
880 | 可好
881 | 可能
882 | 恐怕
883 | 快
884 | 快要
885 | 来不及
886 | 来得及
887 | 来讲
888 | 来看
889 | 拦腰
890 | 牢牢
891 | 老
892 | 老大
893 | 老老实实
894 | 老是
895 | 累次
896 | 累年
897 | 理当
898 | 理该
899 | 理应
900 | 历
901 | 立
902 | 立地
903 | 立刻
904 | 立马
905 | 立时
906 | 联袂
907 | 连连
908 | 连日
909 | 连日来
910 | 连声
911 | 连袂
912 | 临到
913 | 另方面
914 | 另行
915 | 另一个
916 | 路经
917 | 屡
918 | 屡次
919 | 屡次三番
920 | 屡屡
921 | 缕缕
922 | 率尔
923 | 率然
924 | 略
925 | 略加
926 | 略微
927 | 略为
928 | 论说
929 | 马上
930 | 蛮
931 | 满
932 | 每逢
933 | 每每
934 | 每时每刻
935 | 猛然
936 | 猛然间
937 | 莫
938 | 莫不
939 | 莫非
940 | 莫如
941 | 默默地
942 | 默然
943 | 呐
944 | 那末
945 | 奈
946 | 难道
947 | 难得
948 | 难怪
949 | 难说
950 | 内
951 | 年复一年
952 | 凝神
953 | 偶而
954 | 偶尔
955 | 怕
956 | 砰
957 | 碰巧
958 | 譬如
959 | 偏偏
960 | 乒
961 | 平素
962 | 颇
963 | 迫于
964 | 扑通
965 | 其后
966 | 其实
967 | 奇
968 | 齐
969 | 起初
970 | 起来
971 | 起首
972 | 起头
973 | 起先
974 | 岂
975 | 岂非
976 | 岂止
977 | 迄
978 | 恰逢
979 | 恰好
980 | 恰恰
981 | 恰巧
982 | 恰如
983 | 恰似
984 | 千
985 | 万
986 | 千万
987 | 千万千万
988 | 切
989 | 切不可
990 | 切莫
991 | 切切
992 | 切勿
993 | 窃
994 | 亲口
995 | 亲身
996 | 亲手
997 | 亲眼
998 | 亲自
999 | 顷
1000 | 顷刻
1001 | 顷刻间
1002 | 顷刻之间
1003 | 请勿
1004 | 穷年累月
1005 | 取道
1006 | 去
1007 | 权时
1008 | 全都
1009 | 全力
1010 | 全年
1011 | 全然
1012 | 全身心
1013 | 然
1014 | 人人
1015 | 仍
1016 | 仍旧
1017 | 仍然
1018 | 日复一日
1019 | 日见
1020 | 日渐
1021 | 日益
1022 | 日臻
1023 | 如常
1024 | 如此等等
1025 | 如次
1026 | 如今
1027 | 如期
1028 | 如前所述
1029 | 如上
1030 | 如下
1031 | 汝
1032 | 三番两次
1033 | 三番五次
1034 | 三天两头
1035 | 瑟瑟
1036 | 沙沙
1037 | 上
1038 | 上来
1039 | 上去
1040 | 一
1041 | 一一
1042 | 一下
1043 | 一个
1044 | 一些
1045 | 一何
1046 | 一则通过
1047 | 一天
1048 | 一定
1049 | 一时
1050 | 一次
1051 | 一片
1052 | 一番
1053 | 一直
1054 | 一致
1055 | 一起
1056 | 一转眼
1057 | 一边
1058 | 一面
1059 | 上升
1060 | 上述
1061 | 上面
1062 | 下
1063 | 下列
1064 | 下去
1065 | 下来
1066 | 下面
1067 | 不一
1068 | 不久
1069 | 不变
1070 | 不可
1071 | 不够
1072 | 不尽
1073 | 不尽然
1074 | 不敢
1075 | 不断
1076 | 不若
1077 | 不足
1078 | 与其说
1079 | 专门
1080 | 且不说
1081 | 且说
1082 | 严格
1083 | 严重
1084 | 个别
1085 | 中小
1086 | 中间
1087 | 丰富
1088 | 为主
1089 | 为什麽
1090 | 为止
1091 | 为此
1092 | 主张
1093 | 主要
1094 | 举行
1095 | 乃至于
1096 | 之前
1097 | 之后
1098 | 之後
1099 | 也就是说
1100 | 也是
1101 | 了解
1102 | 争取
1103 | 二来
1104 | 云尔
1105 | 些
1106 | 亦
1107 | 产生
1108 | 人
1109 | 人们
1110 | 什麽
1111 | 今
1112 | 今后
1113 | 今天
1114 | 今年
1115 | 今後
1116 | 介于
1117 | 从事
1118 | 他是
1119 | 他的
1120 | 代替
1121 | 以上
1122 | 以下
1123 | 以为
1124 | 以前
1125 | 以后
1126 | 以外
1127 | 以後
1128 | 以故
1129 | 以期
1130 | 以来
1131 | 任务
1132 | 企图
1133 | 伟大
1134 | 似乎
1135 | 但凡
1136 | 何以
1137 | 余外
1138 | 你是
1139 | 你的
1140 | 使
1141 | 使用
1142 | 依据
1143 | 依靠
1144 | 便于
1145 | 促进
1146 | 保持
1147 | 做到
1148 | 傥然
1149 | 儿
1150 | 允许
1151 | 元/吨
1152 | 先不先
1153 | 先后
1154 | 先後
1155 | 先生
1156 | 全体
1157 | 全部
1158 | 全面
1159 | 共同
1160 | 具体
1161 | 具有
1162 | 兼之
1163 | 再
1164 | 再其次
1165 | 再则
1166 | 再有
1167 | 再次
1168 | 再者说
1169 | 决定
1170 | 准备
1171 | 凡
1172 | 凡是
1173 | 出于
1174 | 出现
1175 | 分别
1176 | 则甚
1177 | 别处
1178 | 别是
1179 | 别管
1180 | 前此
1181 | 前进
1182 | 前面
1183 | 加入
1184 | 加强
1185 | 十分
1186 | 即如
1187 | 却
1188 | 却不
1189 | 原来
1190 | 又及
1191 | 及时
1192 | 双方
1193 | 反应
1194 | 反映
1195 | 取得
1196 | 受到
1197 | 变成
1198 | 另悉
1199 | 只
1200 | 只当
1201 | 只怕
1202 | 只消
1203 | 叫做
1204 | 召开
1205 | 各人
1206 | 各地
1207 | 各级
1208 | 合理
1209 | 同一
1210 | 同样
1211 | 后
1212 | 后者
1213 | 后面
1214 | 向使
1215 | 周围
1216 | 呵呵
1217 | 咧
1218 | 唯有
1219 | 啷当
1220 | 喽
1221 | 嗡
1222 | 嘿嘿
1223 | 因了
1224 | 因着
1225 | 在于
1226 | 坚决
1227 | 坚持
1228 | 处在
1229 | 处理
1230 | 复杂
1231 | 多么
1232 | 多数
1233 | 大力
1234 | 大多数
1235 | 大批
1236 | 大量
1237 | 失去
1238 | 她是
1239 | 她的
1240 | 好
1241 | 好的
1242 | 好象
1243 | 如同
1244 | 如是
1245 | 始而
1246 | 存在
1247 | 孰料
1248 | 孰知
1249 | 它们的
1250 | 它是
1251 | 它的
1252 | 安全
1253 | 完全
1254 | 完成
1255 | 实现
1256 | 实际
1257 | 宣布
1258 | 容易
1259 | 密切
1260 | 对应
1261 | 对待
1262 | 对方
1263 | 对比
1264 | 小
1265 | 少数
1266 | 尔
1267 | 尔尔
1268 | 尤其
1269 | 就是了
1270 | 就要
1271 | 属于
1272 | 左右
1273 | 巨大
1274 | 巩固
1275 | 已
1276 | 已矣
1277 | 已经
1278 | 巴
1279 | 巴巴
1280 | 帮助
1281 | 并不
1282 | 并不是
1283 | 广大
1284 | 广泛
1285 | 应当
1286 | 应用
1287 | 应该
1288 | 庶乎
1289 | 庶几
1290 | 开展
1291 | 引起
1292 | 强烈
1293 | 强调
1294 | 归齐
1295 | 当前
1296 | 当地
1297 | 当时
1298 | 形成
1299 | 彻底
1300 | 彼时
1301 | 往往
1302 | 後来
1303 | 後面
1304 | 得了
1305 | 得出
1306 | 得到
1307 | 心里
1308 | 必然
1309 | 必要
1310 | 怎奈
1311 | 怎麽
1312 | 总是
1313 | 总结
1314 | 您们
1315 | 您是
1316 | 惟其
1317 | 意思
1318 | 愿意
1319 | 成为
1320 | 我是
1321 | 我的
1322 | 或则
1323 | 或曰
1324 | 战斗
1325 | 所在
1326 | 所幸
1327 | 所有
1328 | 所谓
1329 | 扩大
1330 | 掌握
1331 | 接著
1332 | 数/
1333 | 整个
1334 | 方便
1335 | 方面
1336 | 既往
1337 | 明显
1338 | 明确
1339 | 是不是
1340 | 是以
1341 | 是否
1342 | 显然
1343 | 显著
1344 | 普通
1345 | 普遍
1346 | 曾
1347 | 曾经
1348 | 替代
1349 | 最
1350 | 最后
1351 | 最大
1352 | 最好
1353 | 最後
1354 | 最近
1355 | 最高
1356 | 有利
1357 | 有力
1358 | 有及
1359 | 有所
1360 | 有效
1361 | 有时
1362 | 有点
1363 | 有的是
1364 | 有着
1365 | 有著
1366 | 末
1367 | 本地
1368 | 来自
1369 | 来说
1370 | 构成
1371 | 某某
1372 | 根本
1373 | 欢迎
1374 | 欤
1375 | 正值
1376 | 正在
1377 | 正巧
1378 | 正常
1379 | 正是
1380 | 此地
1381 | 此处
1382 | 此时
1383 | 此次
1384 | 每个
1385 | 每天
1386 | 每年
1387 | 比及
1388 | 比较
1389 | 没奈何
1390 | 注意
1391 | 深入
1392 | 清楚
1393 | 满足
1394 | 然後
1395 | 特别是
1396 | 特殊
1397 | 特点
1398 | 犹且
1399 | 犹自
1400 | 现代
1401 | 现在
1402 | 甚且
1403 | 甚或
1404 | 甚至于
1405 | 用来
1406 | 由是
1407 | 由此
1408 | 目前
1409 | 直到
1410 | 直接
1411 | 相似
1412 | 相信
1413 | 相反
1414 | 相同
1415 | 相对
1416 | 相应
1417 | 相当
1418 | 相等
1419 | 看出
1420 | 看到
1421 | 看看
1422 | 看见
1423 | 真是
1424 | 真正
1425 | 眨眼
1426 | 矣乎
1427 | 矣哉
1428 | 知道
1429 | 确定
1430 | 种
1431 | 积极
1432 | 移动
1433 | 突出
1434 | 突然
1435 | 立即
1436 | 竟而
1437 | 第二
1438 | 类如
1439 | 练习
1440 | 组成
1441 | 结合
1442 | 继后
1443 | 继续
1444 | 维持
1445 | 考虑
1446 | 联系
1447 | 能否
1448 | 能够
1449 | 自后
1450 | 自打
1451 | 至今
1452 | 至若
1453 | 致
1454 | 般的
1455 | 良好
1456 | 若夫
1457 | 若果
1458 | 范围
1459 | 莫不然
1460 | 获得
1461 | 行为
1462 | 行动
1463 | 表明
1464 | 表示
1465 | 要求
1466 | 规定
1467 | 觉得
1468 | 譬喻
1469 | 认为
1470 | 认真
1471 | 认识
1472 | 许多
1473 | 设或
1474 | 诚如
1475 | 说明
1476 | 说来
1477 | 说说
1478 | 诸
1479 | 诸如
1480 | 谁人
1481 | 谁料
1482 | 贼死
1483 | 赖以
1484 | 距
1485 | 转动
1486 | 转变
1487 | 转贴
1488 | 达到
1489 | 迅速
1490 | 过去
1491 | 过来
1492 | 运用
1493 | 还要
1494 | 这一来
1495 | 这次
1496 | 这点
1497 | 这种
1498 | 这般
1499 | 这麽
1500 | 进入
1501 | 进步
1502 | 进行
1503 | 适应
1504 | 适当
1505 | 适用
1506 | 逐步
1507 | 逐渐
1508 | 通常
1509 | 造成
1510 | 遇到
1511 | 遭到
1512 | 遵循
1513 | 避免
1514 | 那般
1515 | 那麽
1516 | 部分
1517 | 采取
1518 | 里面
1519 | 重大
1520 | 重新
1521 | 重要
1522 | 针对
1523 | 问题
1524 | 防止
1525 | 附近
1526 | 限制
1527 | 随后
1528 | 随时
1529 | 随著
1530 | 难道说
1531 | 集中
1532 | 需要
1533 | 非特
1534 | 非独
1535 | 高兴
1536 | 若果
--------------------------------------------------------------------------------