├── .gitignore ├── LICENSE ├── README.md ├── SentimentAnalysis ├── SentimentAnalysis.py ├── __init__.py ├── creat_data │ ├── __init__.py │ ├── ali.py │ ├── baidu.py │ ├── bat.py │ ├── config.py │ └── tencent.py ├── data │ └── traindata.xlsx ├── data_info.json ├── flask_api.py ├── models │ ├── __init__.py │ ├── classify.h5 │ ├── classify.model │ ├── keras_log_plot.py │ ├── neural_bulit.py │ ├── parameter │ │ ├── __init__.py │ │ └── optimizers.py │ ├── sklearn_config.py │ ├── sklearn_supervised.py │ └── vocab_word2vec.model └── sentence_transform │ ├── __init__.py │ ├── creat_vocab_word2vec.py │ ├── sentence_2_sparse.py │ └── sentence_2_tokenizer.py ├── demo.py └── picture ├── Conv1D.png ├── SVM.png ├── api1.png ├── api2.png ├── api3.png └── label.png /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .idea/misc.xml 3 | .idea/modules.xml 4 | .idea/Sentiment-analysis.iml 5 | *.xml 6 | *.pyc 7 | try.py 8 | creat_data/config.py 9 | creat_label_mysql.py 10 | SentimentAnalysis/creat_data/bat_1.py 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 renjunxiang 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Sentiment-analysis:情感分析 2 | 3 | [![](https://img.shields.io/badge/Python-3.5-blue.svg)](https://www.python.org/)
4 | [![](https://img.shields.io/badge/baidu--aip-2.1.0.0-brightgreen.svg)](https://pypi.python.org/pypi/baidu-aip/2.1.0.0) 5 | [![](https://img.shields.io/badge/pandas-0.21.0-brightgreen.svg)](https://pypi.python.org/pypi/pandas/0.21.0) 6 | [![](https://img.shields.io/badge/numpy-1.13.1-brightgreen.svg)](https://pypi.python.org/pypi/numpy/1.13.1) 7 | [![](https://img.shields.io/badge/matplotlib-2.1.0-brightgreen.svg)](https://pypi.python.org/pypi/matplotlib/2.1.0) 8 | [![](https://img.shields.io/badge/jieba-0.39-brightgreen.svg)](https://pypi.python.org/pypi/jieba/0.39) 9 | [![](https://img.shields.io/badge/gensim-3.2.0-brightgreen.svg)](https://pypi.python.org/pypi/gensim/3.2.0) 10 | [![](https://img.shields.io/badge/scikit--learn-0.19.1-brightgreen.svg)](https://pypi.python.org/pypi/scikit-learn/0.19.1) 11 | [![](https://img.shields.io/badge/requests-2.18.4-brightgreen.svg)](https://pypi.python.org/pypi/requests/2.18.4) 12 | 13 | ## 语言 14 | Python3.5
15 | ## 依赖库 16 | baidu-aip=2.1.0.0
17 | pandas=0.21.0
18 | numpy=1.13.1
19 | jieba=0.39
20 | gensim=3.2.0
21 | scikit-learn=0.19.1
22 | keras=2.1.1
23 | requests=2.18.4
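The pins above can be checked against the active environment with a short, hypothetical snippet (not part of the project); `pkg_resources` ships with setuptools and looks packages up by their PyPI distribution names.
``` python
# Hypothetical helper: compare installed versions with the versions pinned in this README.
import pkg_resources

pinned = {'baidu-aip': '2.1.0.0', 'pandas': '0.21.0', 'numpy': '1.13.1', 'jieba': '0.39',
          'gensim': '3.2.0', 'scikit-learn': '0.19.1', 'keras': '2.1.1', 'requests': '2.18.4'}
for name, wanted in pinned.items():
    try:
        installed = pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        installed = 'not installed'
    print(name, installed, 'OK' if installed == wanted else 'README pins ' + wanted)
```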
24 | 25 | 26 | 27 | ## Project Overview 28 | * Train on posts that already carry sentiment labels, then classify new posts; the SentimentAnalysis folder can be used directly as a Python module.
29 | * Wrappers are provided for the KNN, SVM and Logistic machine-learning classifiers, and for the 1D-convolution (Conv1D) and LSTM neural networks. On a training set of 10,000 records, SVM performs best, with accuracy around 87%.
30 | * ***PS:该项目在上一个项目Text-Classification基础上封装而成~仍有很多不足,欢迎萌新、大佬多多指导!*** 31 | 32 | ## 用法简介 33 | * ### 导入模块,创建模型 34 | ``` python 35 | from SentimentAnalysis.SentimentAnalysis import SentimentAnalysis 36 | model = SentimentAnalysis() 37 | ``` 38 | 39 | * ### 借助第三方平台,打情感标签。 40 | ``` python 41 | # 用于在缺乏标签的时候利用BAT三家的接口创建训练集,5000条文档共耗时约45分钟 42 | texts=['国王喜欢吃苹果', 43 | '国王非常喜欢吃苹果', 44 | '国王讨厌吃苹果', 45 | '国王非常讨厌吃苹果'] 46 | texts_withlabel=model.creat_label(texts) 47 | ``` 48 | 49 | * ### 通过gensim模块创建词向量词包 50 | ``` python 51 | model.creat_vocab(texts=texts, 52 | sg=0, 53 | size=5, 54 | window=5, 55 | min_count=1, 56 | vocab_savepath=os.getcwd() + '/vocab_word2vec.model') 57 | # 也可以导入词向量 58 | model.load_vocab_word2vec(os.getcwd() + '/models/vocab_word2vec.model') 59 | # 词向量模型 60 | model.vocab_word2vec 61 | ``` 62 | 63 | * ### 通过scikit-learn进行机器学习 64 | ``` python 65 | model.train(texts=train_data, 66 | label=train_label, 67 | model_name='SVM', 68 | model_savepath=os.getcwd() + '/classify.model') 69 | # 也可以导入机器学习模型 70 | model.load_model(model_loadpath=os.getcwd() + '/classify.model') 71 | # 训练的模型 72 | model.model 73 | # 训练集标签 74 | model.label 75 | ``` 76 | 77 | * ### 通过keras进行深度学习(模型的后缀不同) 78 | ``` python 79 | model.train(texts=train_data, 80 | label=train_label, 81 | model_name='Conv1D', 82 | batch_size=100, 83 | epochs=2, 84 | verbose=1, 85 | maxlen=None, 86 | model_savepath=os.getcwd() + '/classify.h5') 87 | 88 | # 导入深度学习模型 89 | model.load_model(model_loadpath=os.getcwd() + '/classify.h5') 90 | # 训练的模型 91 | model.model 92 | # 训练的日志 93 | model.train_log 94 | # 可视化训练过程 95 | from SentimentAnalysis.models.keras_log_plot import keras_log_plot 96 | keras_log_plot(model.train_log) 97 | # 训练集标签 98 | model.label 99 | ``` 100 | 101 | * ### 预测 102 | ``` python 103 | # 概率 104 | result_prob = model.predict_prob(texts=test_data) 105 | result_prob = pd.DataFrame(result_prob, columns=model.label) 106 | result_prob['predict'] = result_prob.idxmax(axis=1) 107 | result_prob['data'] = test_data 108 | result_prob = result_prob[['data'] + list(model.label) + ['predict']] 109 | print('prob:\n', result_prob) 110 | 111 | # 分类 112 | result = model.predict(texts=test_data) 113 | print('score:', np.sum(result == np.array(test_label)) / len(result)) 114 | result = pd.DataFrame({'data': test_data, 115 | 'label': test_label, 116 | 'predict': result}, 117 | columns=['data', 'label', 'predict']) 118 | print('test\n', result) 119 | ``` 120 | 121 | * ### 开启API 122 | ``` python 123 | # 需要先训练好模型 124 | model.open_api() 125 | #http://0.0.0.0:5000/SentimentAnalyse/?model_name=模型名称&prob=是否需要返回概率&text=分类文本 126 | ``` 127 | 128 | * ### 其他说明 129 | 在训练集很小的情况下,sklearn的概率输出predict_prob会不准。目前发现,SVM会出现所有标签概率一样,暂时没看源码,猜测是离超平面过近不计算概率,predict不会出现这个情况。 130 | 131 | ## 一个简单的demo 132 | ``` python 133 | from SentimentAnalysis.SentimentAnalysis import SentimentAnalysis 134 | from SentimentAnalysis.models.keras_log_plot import keras_log_plot 135 | import numpy as np 136 | 137 | train_data = ['国王喜欢吃苹果', 138 | '国王非常喜欢吃苹果', 139 | '国王讨厌吃苹果', 140 | '国王非常讨厌吃苹果'] 141 | train_label = ['正面', '正面', '负面', '负面'] 142 | # print('train data\n', 143 | # pd.DataFrame({'data': train_data, 144 | # 'label': train_label}, 145 | # columns=['data', 'label'])) 146 | test_data = ['涛哥喜欢吃苹果', 147 | '涛哥讨厌吃苹果', 148 | '涛哥非常喜欢吃苹果', 149 | '涛哥非常讨厌吃苹果'] 150 | test_label = ['正面', '负面', '正面', '负面'] 151 | 152 | # 创建模型 153 | model = SentimentAnalysis() 154 | 155 | # 查看bat打的标签 156 | print(model.creat_label(test_data)) 157 | 158 | # 建模获取词向量词包 159 | model.creat_vocab(texts=train_data, 160 | sg=0, 161 
| size=5, 162 | window=5, 163 | min_count=1, 164 | vocab_savepath=os.getcwd() + '/vocab_word2vec.model') 165 | 166 | # 导入词向量词包 167 | # model.load_vocab_word2vec(vocab_loadpath=os.getcwd() + '/vocab_word2vec.model') 168 | 169 | ################################################################################### 170 | # 进行机器学习 171 | model.train(texts=train_data, 172 | label=train_label, 173 | model_name='SVM', 174 | model_savepath=os.getcwd() + '/classify.model') 175 | 176 | # 导入机器学习模型 177 | # model.load_model(model_loadpath=os.getcwd() + '/classify.model') 178 | 179 | # 进行预测:概率 180 | result_prob = model.predict_prob(texts=test_data) 181 | result_prob = pd.DataFrame(result_prob, columns=model.label) 182 | result_prob['predict'] = result_prob.idxmax(axis=1) 183 | result_prob['data'] = test_data 184 | result_prob = result_prob[['data'] + list(model.label) + ['predict']] 185 | print('prob:\n', result_prob) 186 | 187 | # 进行预测:分类 188 | result = model.predict(texts=test_data) 189 | print('score:', np.sum(result == np.array(test_label)) / len(result)) 190 | result = pd.DataFrame({'data': test_data, 191 | 'label': test_label, 192 | 'predict': result}, 193 | columns=['data', 'label', 'predict']) 194 | print('test\n', result) 195 | ################################################################################### 196 | # 进行深度学习 197 | model.train(texts=train_data, 198 | label=train_label, 199 | model_name='Conv1D', 200 | batch_size=100, 201 | epochs=2, 202 | verbose=1, 203 | maxlen=None, 204 | model_savepath=os.getcwd() + '/classify.h5') 205 | 206 | # 导入深度学习模型 207 | # model.load_model(model_loadpath=os.getcwd() + '/classify.h5') 208 | 209 | # 进行预测:概率 210 | result_prob = model.predict_prob(texts=test_data) 211 | result_prob = pd.DataFrame(result_prob, columns=model.label) 212 | result_prob['predict'] = result_prob.idxmax(axis=1) 213 | print(result_prob) 214 | 215 | # 进行预测:分类 216 | result = model.predict(texts=test_data) 217 | print(result) 218 | print('score:', np.sum(result == np.array(test_label)) / len(result)) 219 | result = pd.DataFrame({'data': test_data, 220 | 'label': test_label, 221 | 'predict': result}, 222 | columns=['data', 'label', 'predict']) 223 | print('test\n', result) 224 | 225 | keras_log_plot(model.train_log) 226 | 227 | ``` 228 | bat打标签
229 | ![bat](https://github.com/renjunxiang/Sentiment-analysis/blob/master/picture/label.png)
230 | SVM
231 | ![SVM](https://github.com/renjunxiang/Sentiment-analysis/blob/master/picture/SVM.png)
232 | Conv1D
233 | ![Conv1D](https://github.com/renjunxiang/Sentiment-analysis/blob/master/picture/Conv1D.png)
234 | 235 | ## Simple API 236 | A small API is provided: http://192.168.3.59:5000/SentimentAnalyse/?model_name=&lt;model name&gt;&prob=&lt;return probabilities?&gt;&text=&lt;text to classify&gt;
237 | 192.168.3.59: the IP address; it is determined by the server that hosts the API
238 | model_name: currently supported values are SVM and Conv1D
239 | prob: 0 returns the predicted class, 1 returns class probabilities (a scripted request example follows the screenshots below)
240 | ``` python 241 | from SentimentAnalysis.SentimentAnalysis import SentimentAnalysis 242 | model = SentimentAnalysis() 243 | model.open_api() 244 | ``` 245 | 246 | ### Examples 247 | 248 | * __SVM model, return probabilities__
249 | url:http://192.168.3.59:5000/SentimentAnalyse/?model_name=SVM&prob=1&text=东西很不错
250 | ![api1](https://github.com/renjunxiang/Sentiment-analysis/blob/master/picture/api1.png)
251 | 252 | * __Conv1D model, return the predicted class__
253 | url:http://192.168.3.59:5000/SentimentAnalyse/?model_name=Conv1D&prob=0&text=东西很不错
254 | ![api2](https://github.com/renjunxiang/Sentiment-analysis/blob/master/picture/api2.png)
255 | 256 | * __None of the words in the text appear in the word2vec vocabulary__
257 | url:http://192.168.3.59:5000/SentimentAnalyse/?model_name=SVM&prob=0&text=呜呜呜
258 | ![api3](https://github.com/renjunxiang/Sentiment-analysis/blob/master/picture/api3.png)
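The same queries can be scripted instead of typed into a browser. Below is a minimal client sketch (not part of the repository); it assumes the service has been started with `model.open_api()` and is reachable on port 5000, and that `127.0.0.1` is replaced with the server's IP address (for example the `192.168.3.59` shown in the screenshots above).
``` python
# Hypothetical client example: query the running SentimentAnalyse service.
# Assumes the API was started with model.open_api() on this machine (port 5000).
import requests

params = {
    'model_name': 'SVM',   # or 'Conv1D'
    'prob': '1',           # '1' -> return class probabilities, '0' -> return the class only
    'text': '东西很不错',
}
r = requests.get('http://127.0.0.1:5000/SentimentAnalyse/', params=params)
print(r.json())  # the service answers with JSON; non-ASCII characters are not escaped
r.close()
```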
259 | 260 | 261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | 272 | -------------------------------------------------------------------------------- /SentimentAnalysis/SentimentAnalysis.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import os 4 | from sklearn.externals import joblib 5 | from gensim.models import word2vec 6 | from keras.preprocessing.sequence import pad_sequences 7 | from keras.callbacks import History 8 | from keras.models import load_model 9 | import jieba 10 | import json 11 | 12 | from SentimentAnalysis.creat_data import bat 13 | from SentimentAnalysis.sentence_transform.creat_vocab_word2vec import creat_vocab_word2vec 14 | from SentimentAnalysis.models.sklearn_supervised import sklearn_supervised 15 | from SentimentAnalysis.models import sklearn_config 16 | from SentimentAnalysis.models.neural_bulit import neural_bulit 17 | 18 | jieba.setLogLevel('WARN') 19 | DIR = os.path.dirname(__file__) 20 | 21 | class SentimentAnalysis(): 22 | # def __init__(self): 23 | # pass 24 | 25 | # open api 26 | def open_api(self): 27 | os.system("python %s" % (DIR + '/flask_api.py')) 28 | os.close() 29 | 30 | # get labels from baidu,ali,tencent 31 | def creat_label(self, texts): 32 | results_dataframe = bat.creat_label(texts) 33 | return results_dataframe 34 | 35 | def creat_vocab(self, 36 | texts=None, 37 | sg=0, 38 | size=5, 39 | window=5, 40 | min_count=1, 41 | vocab_savepath=DIR + '/models/vocab_word2vec.model'): 42 | ''' 43 | get dictionary by word2vec 44 | :param texts: list of text 45 | :param sg: 0 CBOW,1 skip-gram 46 | :param size: the dimensionality of the feature vectors 47 | :param window: the maximum distance between the current and predicted word within a sentence 48 | :param min_count: ignore all words with total frequency lower than this 49 | :param vocab_savepath: path to save word2vec dictionary 50 | :return: None 51 | ''' 52 | # 构建词向量词库 53 | self.vocab_word2vec = creat_vocab_word2vec(texts=texts, 54 | sg=sg, 55 | vocab_savepath=vocab_savepath, 56 | size=size, 57 | window=window, 58 | min_count=min_count) 59 | 60 | def load_vocab_word2vec(self, 61 | vocab_loadpath=DIR + '/models/vocab_word2vec.model'): 62 | ''' 63 | load dictionary 64 | :param vocab_loadpath: path to load word2vec dictionary 65 | :return: 66 | ''' 67 | self.vocab_word2vec = word2vec.Word2Vec.load(vocab_loadpath) 68 | 69 | def train(self, 70 | texts=None, 71 | label=None, 72 | model_name='SVM', 73 | model_savepath=DIR + '/models/classify.model', 74 | net_shape=None, 75 | batch_size=100, # 神经网络参数 76 | epochs=2, # 神经网络参数 77 | verbose=2, # 神经网络参数 78 | maxlen=None, # 神经网络参数 79 | **sklearn_param): 80 | ''' 81 | use sklearn/keras to train model 82 | :param texts: x 83 | :param label: y 84 | :param model_name: name want to train 85 | :param model_savepath: model save path 86 | :param batch_size: for keras fit 87 | :param epochs: for keras fit 88 | :param verbose: for keras fit 89 | :param maxlen: for keras pad_sequences 90 | :param sklearn_param: param for sklearn 91 | :return: None 92 | ''' 93 | self.model_name = model_name 94 | self.label = np.unique(np.array(label)) 95 | # 文本转词向量 96 | vocab_word2vec = self.vocab_word2vec 97 | texts_cut = [[word for word in jieba.lcut(one_text) if word != ' '] for one_text in texts] # 分词 98 | data = [[vocab_word2vec[word] for word in one_text if word in vocab_word2vec] for one_text in texts_cut] 99 | if maxlen is None: 100 | maxlen = max([len(i) for i in texts_cut]) 101 | 
self.maxlen = maxlen 102 | # sklearn模型,词向量计算均值 103 | if model_name in ['SVM', 'KNN', 'Logistic']: 104 | data = [sum(i) / len(i) for i in data] 105 | # 配置sklearn模型参数 106 | if model_name == 'SVM': 107 | if sklearn_param == {}: 108 | sklearn_param = sklearn_config.SVC 109 | elif model_name == 'KNN': 110 | if sklearn_param == {}: 111 | sklearn_param = sklearn_config.KNN 112 | elif model_name == 'Logistic': 113 | if sklearn_param == {}: 114 | sklearn_param = sklearn_config.Logistic 115 | # 返回训练模型 116 | self.model = sklearn_supervised(data=data, 117 | label=label, 118 | model_savepath=model_savepath, 119 | model_name=model_name, 120 | **sklearn_param) 121 | 122 | # keras神经网络模型, 123 | elif model_name in ['Conv1D_LSTM', 'Conv1D', 'LSTM']: 124 | data = pad_sequences(data, maxlen=maxlen, padding='post', value=0, dtype='float32') 125 | label_transform = np.array(pd.get_dummies(label)) 126 | if net_shape is None: 127 | if model_name == 'Conv1D_LSTM': 128 | net_shape = [ 129 | {'name': 'InputLayer', 'input_shape': data.shape[1:]}, 130 | {'name': 'Conv1D', 'filters': 64, 'kernel_size': 3, 'strides': 1, 'padding': 'same', 131 | 'activation': 'relu'}, 132 | {'name': 'MaxPooling1D', 'pool_size': 5, 'padding': 'same', 'strides': 2}, 133 | {'name': 'LSTM', 'units': 16, 'activation': 'tanh', 'recurrent_activation': 'hard_sigmoid', 134 | 'dropout': 0., 'recurrent_dropout': 0.}, 135 | {'name': 'Flatten'}, 136 | {'name': 'Dense', 'activation': 'relu', 'units': 64}, 137 | {'name': 'Dropout', 'rate': 0.2, }, 138 | {'name': 'softmax', 'activation': 'softmax', 'units': len(np.unique(label))} 139 | ] 140 | 141 | elif model_name == 'LSTM': 142 | net_shape = [ 143 | {'name': 'InputLayer', 'input_shape': data.shape[1:]}, 144 | {'name': 'Masking'}, 145 | {'name': 'LSTM', 'units': 16, 'activation': 'tanh', 'recurrent_activation': 'hard_sigmoid', 146 | 'dropout': 0., 'recurrent_dropout': 0.}, 147 | {'name': 'Dense', 'activation': 'relu', 'units': 64}, 148 | {'name': 'Dropout', 'rate': 0.2, }, 149 | {'name': 'softmax', 'activation': 'softmax', 'units': len(np.unique(label))} 150 | ] 151 | elif model_name == 'Conv1D': 152 | net_shape = [ 153 | {'name': 'InputLayer', 'input_shape': data.shape[1:]}, 154 | {'name': 'Conv1D', 'filters': 64, 'kernel_size': 3, 'strides': 1, 'padding': 'same', 155 | 'activation': 'relu'}, 156 | {'name': 'MaxPooling1D', 'pool_size': 5, 'padding': 'same', 'strides': 2}, 157 | {'name': 'Flatten'}, 158 | {'name': 'Dense', 'activation': 'relu', 'units': 64}, 159 | {'name': 'Dropout', 'rate': 0.2, }, 160 | {'name': 'softmax', 'activation': 'softmax', 'units': len(np.unique(label))} 161 | ] 162 | 163 | model = neural_bulit(net_shape=net_shape, 164 | optimizer_name='Adagrad', 165 | lr=0.001, 166 | loss='categorical_crossentropy') 167 | history = History() 168 | model.fit(data, label_transform, 169 | batch_size=batch_size, epochs=epochs, verbose=verbose, callbacks=[history]) 170 | train_log = pd.DataFrame(history.history) 171 | self.model = model 172 | self.train_log = train_log 173 | if model_savepath != None: 174 | model.save(model_savepath) 175 | with open(DIR + '/data_info.json', mode='w', encoding='utf-8') as f: 176 | json.dump({'maxlen': maxlen, 'label': list(self.label)}, f) 177 | 178 | def load_model(self, 179 | model_loadpath=DIR + '/models/classify.model', 180 | model_name=None, 181 | data_info_path=DIR + '/data_info.json'): 182 | ''' 183 | load sklearn/keras model 184 | :param model_loadpath: path to load sklearn/keras model 185 | :param model_name: load model name 186 | :param data_info_path: 
date information path 187 | :return: None 188 | ''' 189 | 190 | with open(data_info_path, encoding='utf-8') as f: 191 | data_info = json.load(f) 192 | self.maxlen = data_info['maxlen'] 193 | self.label = data_info['label'] 194 | self.model_name = model_name 195 | 196 | if self.model_name in ['SVM', 'KNN', 'Logistic']: 197 | self.model = joblib.load(model_loadpath) 198 | elif self.model_name in ['Conv1D_LSTM', 'Conv1D', 'LSTM']: 199 | self.model = load_model(model_loadpath) 200 | 201 | def predict_prob(self, 202 | texts=None): 203 | ''' 204 | predict probability 205 | :param texts: list of text 206 | :return: list of probability 207 | ''' 208 | # 文本转词向量 209 | vocab_word2vec = self.vocab_word2vec 210 | if self.model_name in ['SVM', 'KNN', 'Logistic']: 211 | texts_cut = [[word for word in jieba.lcut(one_text) if word != ' '] for one_text in texts] # 分词 212 | data = [[vocab_word2vec[word] for word in one_text if word in vocab_word2vec] for one_text in texts_cut] 213 | data = [sum(i) / len(i) for i in data] 214 | self.testdata = data 215 | results = self.model.predict_proba(data) 216 | elif self.model_name in ['Conv1D_LSTM', 'Conv1D', 'LSTM']: 217 | texts_cut = [[word for word in jieba.lcut(one_text) if word != ' '] for one_text in texts] # 分词 218 | data = [[vocab_word2vec[word] for word in one_text if word in vocab_word2vec] for one_text in texts_cut] 219 | data = pad_sequences(data, maxlen=self.maxlen, padding='post', value=0, dtype='float32') 220 | self.testdata = data 221 | results = self.model.predict(data) 222 | return results 223 | 224 | def predict(self, 225 | texts=None): 226 | ''' 227 | predict class 228 | :param texts: list of text 229 | :return: list of classify 230 | ''' 231 | # 文本转词向量 232 | vocab_word2vec = self.vocab_word2vec 233 | if self.model_name in ['SVM', 'KNN', 'Logistic']: 234 | texts_cut = [[word for word in jieba.lcut(one_text) if word != ' '] for one_text in texts] # 分词 235 | data = [[vocab_word2vec[word] for word in one_text if word in vocab_word2vec] for one_text in texts_cut] 236 | data = [sum(i) / len(i) for i in data] 237 | self.testdata = data 238 | results = self.model.predict(data) 239 | elif self.model_name in ['Conv1D_LSTM', 'Conv1D', 'LSTM']: 240 | texts_cut = [[word for word in jieba.lcut(one_text) if word != ' '] for one_text in texts] # 分词 241 | data = [[vocab_word2vec[word] for word in one_text if word in vocab_word2vec] for one_text in texts_cut] 242 | data = pad_sequences(data, maxlen=self.maxlen, padding='post', value=0, dtype='float32') 243 | self.testdata = data 244 | results = self.model.predict(data) 245 | results = pd.DataFrame(results, columns=self.label) 246 | results = results.idxmax(axis=1) 247 | return results 248 | 249 | 250 | if __name__ == '__main__': 251 | train_data = ['国王喜欢吃苹果', 252 | '国王非常喜欢吃苹果', 253 | '国王讨厌吃苹果', 254 | '国王非常讨厌吃苹果'] 255 | train_label = ['正面', '正面', '负面', '负面'] 256 | # print('train data\n', 257 | # pd.DataFrame({'data': train_data, 258 | # 'label': train_label}, 259 | # columns=['data', 'label'])) 260 | test_data = ['涛哥喜欢吃苹果', 261 | '涛哥讨厌吃苹果', 262 | '涛哥非常喜欢吃苹果', 263 | '涛哥非常讨厌吃苹果'] 264 | test_label = ['正面', '负面', '正面', '负面'] 265 | 266 | # 创建模型 267 | model = SentimentAnalysis() 268 | 269 | # 查看bat打的标签 270 | print(model.creat_label(test_data)) 271 | 272 | model.creat_vocab(texts=train_data, 273 | sg=0, 274 | size=5, 275 | window=5, 276 | min_count=1, 277 | vocab_savepath=None) 278 | 279 | # 导入词向量词包 280 | # model.load_vocab_word2vec(vocab_loadpath=DIR + '/vocab_word2vec.model') 281 | 282 | 
################################################################################### 283 | # 进行机器学习 284 | model.train(texts=train_data, 285 | label=train_label, 286 | model_name='SVM', 287 | model_savepath=DIR + '/models/classify.model') 288 | 289 | # 导入机器学习模型 290 | model.load_model(model_loadpath=DIR + '/models/classify.model', 291 | model_name='SVM', 292 | data_info_path=DIR + '/data_info.json') 293 | 294 | # 进行预测:概率 295 | result_prob = model.predict_prob(texts=test_data) 296 | result_prob = pd.DataFrame(result_prob, columns=model.label) 297 | result_prob['predict'] = result_prob.idxmax(axis=1) 298 | result_prob['data'] = test_data 299 | result_prob = result_prob[['data'] + list(model.label) + ['predict']] 300 | print('prob:\n', result_prob) 301 | print('score:', np.sum(result_prob['predict'] == np.array(test_label)) / len(result_prob['predict'])) 302 | -------------------------------------------------------------------------------- /SentimentAnalysis/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['models','sentence_transform','data','test','creat_data'] 2 | -------------------------------------------------------------------------------- /SentimentAnalysis/creat_data/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['baidu','ali','tencent','bat','config'] 2 | -------------------------------------------------------------------------------- /SentimentAnalysis/creat_data/ali.py: -------------------------------------------------------------------------------- 1 | from SentimentAnalysis.creat_data.config import ali 2 | import datetime 3 | import hashlib 4 | import base64 5 | from urllib.parse import urlparse 6 | import hmac 7 | import pandas as pd 8 | import numpy as np 9 | import requests 10 | import json 11 | import time 12 | 13 | org_code = ali['account']['id_1']['org_code'] 14 | akID = ali['account']['id_1']['akID'] 15 | akSecret = ali['account']['id_1']['akSecret'] 16 | 17 | def creat_label(texts, 18 | org_code=org_code, 19 | akID=akID, 20 | akSecret=akSecret): 21 | ''' 22 | :param texts: 需要打标签的文档列表 23 | :param AppID: 腾讯ai账号信息,默认调用配置文件id_1 24 | :param AppKey: 腾讯ai账号信息,默认调用配置文件id_1 25 | :return: 打好标签的列表,包括原始文档、标签、置信水平、是否成功 26 | ''' 27 | url = org_code.join(ali['api']['Sentiment']['url'].split('{org_code}')) 28 | 29 | results = [] 30 | 31 | def to_sha1_base64(stringToSign, akSecret): 32 | hmacsha1 = hmac.new(akSecret.encode('utf-8'), 33 | stringToSign.encode('utf-8'), 34 | hashlib.sha1) 35 | return base64.b64encode(hmacsha1.digest()).decode('utf-8') 36 | 37 | # 逐句调用接口判断 38 | count_i=0 39 | for one_text in texts: 40 | # one_text = '喜欢' 41 | time_now = datetime.datetime.strftime(datetime.datetime.utcnow(), "%a, %d %b %Y %H:%M:%S GMT") 42 | # time_now = time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.localtime()) #这个也可以 43 | options = {'url': url, 44 | 'method': 'POST', 45 | 'headers': {'accept': 'application/json', 46 | 'content-type': 'application/json', 47 | 'date': time_now, 48 | 'authorization': ''}, 49 | 'body': json.dumps({'text': one_text}, separators=(',', ':'))} 50 | 51 | body = '' 52 | if 'body' in options: 53 | body = options['body'] 54 | # print(body) 55 | bodymd5 = '' 56 | if not body == '': 57 | bodymd5 = base64.b64encode( 58 | hashlib.md5(json.dumps({'text': one_text}, separators=(',', ':')).encode('utf-8')).digest()).decode( 59 | 'utf-8') 60 | 61 | # print(bodymd5) 62 | 63 | urlPath = urlparse(url) 64 | if urlPath.query != '': 65 | urlPath = urlPath.path + 
"?" + urlPath.query 66 | else: 67 | urlPath = urlPath.path 68 | stringToSign = 'POST' + '\n' + \ 69 | options['headers']['accept'] + '\n' + \ 70 | bodymd5 + '\n' + \ 71 | options['headers']['content-type'] + '\n' \ 72 | + options['headers']['date'] + '\n' + urlPath 73 | 74 | # print(stringToSign) 75 | signature = to_sha1_base64(stringToSign=stringToSign, 76 | akSecret=akSecret) 77 | # print(signature) 78 | authHeader = 'Dataplus ' + akID + ':' + signature 79 | # print(authHeader) 80 | options['headers']['authorization'] = authHeader 81 | r = requests.post(url=url, 82 | headers={'accept': 'application/json', 83 | 'content-type': 'application/json', 84 | 'date': time_now, 85 | 'authorization': authHeader}, 86 | data=json.dumps({'text': one_text}, separators=(',', ':'))) # 获取分析结果 87 | try: 88 | result = json.loads(r.text) 89 | # print(result) 90 | results.append([one_text, 91 | result['data']['text_polarity'], 92 | 0, 93 | 'ok' 94 | ]) 95 | except: 96 | results.append([one_text, 97 | -100, 98 | -100, 99 | 'error' 100 | ]) 101 | count_i += 1 102 | if count_i % 50 == 0: 103 | print('ali finish:%d' % (count_i)) 104 | r.close() 105 | return results 106 | 107 | 108 | if __name__ == '__main__': 109 | results = creat_label(texts=['价格便宜啦,比原来优惠多了', 110 | '壁挂效果差,果然一分价钱一分货', 111 | '东西一般般,诶呀', 112 | '快递非常快,电视很惊艳,非常喜欢', 113 | '到货很快,师傅很热情专业。', 114 | '讨厌你', 115 | '一般' 116 | ]) 117 | results = pd.DataFrame(results, columns=['evaluation', 118 | 'label', 119 | 'ret', 120 | 'msg']) 121 | results['label'] = np.where(results['label'] == '1', '正面', 122 | np.where(results['label'] == '0', '中性', 123 | np.where(results['label'] == '-1', '负面', '非法'))) 124 | print(results) 125 | 126 | 127 | -------------------------------------------------------------------------------- /SentimentAnalysis/creat_data/baidu.py: -------------------------------------------------------------------------------- 1 | # from aip import AipNlp #现在不能用了 2 | from SentimentAnalysis.creat_data.config import baidu 3 | import pandas as pd 4 | import numpy as np 5 | import json 6 | import requests 7 | 8 | APP_ID = baidu['account']['id_1']['APP_ID'] 9 | API_KEY = baidu['account']['id_1']['API_KEY'] 10 | SECRET_KEY = baidu['account']['id_1']['SECRET_KEY'] 11 | 12 | 13 | # 逐句调用接口判断 14 | def creat_label(texts, 15 | interface='SDK', 16 | APP_ID=APP_ID, 17 | API_KEY=API_KEY, 18 | SECRET_KEY=SECRET_KEY): 19 | ''' 20 | :param texts: 需要打标签的文档列表 21 | :param interface: 接口方式,SDK和API 22 | :param APP_ID: 百度ai账号信息,默认调用配置文件id_1 23 | :param API_KEY: 百度ai账号信息,默认调用配置文件id_1 24 | :param SECRET_KEY: 百度ai账号信息,默认调用配置文件id_1 25 | :return: 打好标签的列表,包括原始文档、标签、置信水平、正负面概率、是否成功 26 | ''' 27 | # 创建连接 28 | 29 | results = [] 30 | if interface == 'SDK': 31 | pass 32 | # client = AipNlp(APP_ID=APP_ID, 33 | # API_KEY=API_KEY, 34 | # SECRET_KEY=SECRET_KEY) 35 | # for one_text in texts: 36 | # result = client.sentimentClassify(one_text) 37 | # if 'error_code' in result: 38 | # results.append([one_text, 39 | # 0, 40 | # 0, 41 | # 0, 42 | # 0, 43 | # result['error_code'], 44 | # result['error_msg'] 45 | # ]) 46 | # else: 47 | # results.append([one_text, 48 | # result['items'][0]['sentiment'], 49 | # result['items'][0]['confidence'], 50 | # result['items'][0]['positive_prob'], 51 | # result['items'][0]['negative_prob'], 52 | # 0, 53 | # 'ok' 54 | # ]) 55 | elif interface == 'API': 56 | # 获取access_token 57 | url = baidu['access_token_url'] 58 | params = {'grant_type': 'client_credentials', 59 | 'client_id': baidu['account']['id_1']['API_KEY'], 60 | 'client_secret': 
baidu['account']['id_1']['SECRET_KEY']} 61 | r = requests.post(url, params=params) 62 | access_token = json.loads(r.text)['access_token'] 63 | r.close() 64 | 65 | url = baidu['api']['sentiment_classify']['url'] 66 | params = {'access_token': access_token} 67 | headers = {'Content-Type': baidu['api']['sentiment_classify']['Content-Type']} 68 | count_i=0 69 | for one_text in texts: 70 | data = json.dumps({'text': one_text}) 71 | r = requests.post(url=url, 72 | params=params, 73 | headers=headers, 74 | data=data) 75 | result = json.loads(r.text) 76 | if 'error_code' in result: 77 | results.append([one_text, 78 | 0, 79 | 0, 80 | 0, 81 | 0, 82 | result['error_code'], 83 | result['error_msg'] 84 | ]) 85 | else: 86 | results.append([one_text, 87 | result['items'][0]['sentiment'], 88 | result['items'][0]['confidence'], 89 | result['items'][0]['positive_prob'], 90 | result['items'][0]['negative_prob'], 91 | 0, 92 | 'ok' 93 | ]) 94 | count_i += 1 95 | if count_i % 50 == 0: 96 | print('baidu finish:%d' % (count_i)) 97 | r.close() 98 | else: 99 | print('ERROR: No interface named %s' % (interface)) 100 | return results 101 | 102 | 103 | if __name__ == '__main__': 104 | results = creat_label(texts=['价格便宜啦,比原来优惠多了', 105 | '壁挂效果差,果然一分价钱一分货', 106 | '东西一般般,诶呀', 107 | '讨厌你', 108 | '一般'], 109 | interface='API') 110 | results = pd.DataFrame(results, columns=['evaluation', 111 | 'label', 112 | 'confidence', 113 | 'positive_prob', 114 | 'negative_prob', 115 | 'ret', 116 | 'msg']) 117 | results['label'] = np.where(results['label'] == 2, 118 | '正面', 119 | np.where(results['label'] == 1, '中性', '负面')) 120 | print(results) 121 | -------------------------------------------------------------------------------- /SentimentAnalysis/creat_data/bat.py: -------------------------------------------------------------------------------- 1 | from SentimentAnalysis.creat_data import baidu, ali, tencent 2 | import pandas as pd 3 | # from collections import OrderedDict 4 | import numpy as np 5 | 6 | 7 | def creat_label(texts): 8 | results = [] 9 | count_i = 0 10 | for one_text in texts: 11 | result_baidu = baidu.creat_label([one_text], interface='API') 12 | result_ali = ali.creat_label([one_text]) 13 | result_tencent = tencent.creat_label([one_text]) 14 | 15 | result_all = [one_text, 16 | result_baidu[0][1], result_baidu[0][6], 17 | result_ali[0][1], result_ali[0][3], 18 | result_tencent[0][1], result_tencent[0][4]] 19 | results.append(result_all) 20 | 21 | # result = OrderedDict() 22 | # result['evaluation'] = result_all[0] 23 | # result['label_baidu'] = result_all[1] 24 | # result['msg_baidu'] = result_all[2] 25 | # result['label_ali'] = result_all[3] 26 | # result['msg_ali'] = result_all[4] 27 | # result['label_tencent'] = result_all[5] 28 | # result['msg_tencent'] = result_all[6] 29 | 30 | count_i += 1 31 | if count_i % 50 == 0: 32 | print('baidu finish:%d' % (count_i)) 33 | 34 | results_dataframe = pd.DataFrame(results, 35 | columns=['evaluation', 36 | 'label_baidu', 'msg_baidu', 37 | 'label_ali', 'msg_ali', 38 | 'label_tencent', 'msg_tencent']) 39 | results_dataframe['label_baidu'] = np.where(results_dataframe['label_baidu'] == 2, 40 | '正面', 41 | np.where(results_dataframe['label_baidu'] == 1, '中性', '负面')) 42 | results_dataframe['label_ali'] = np.where(results_dataframe['label_ali'] == '1', '正面', 43 | np.where(results_dataframe['label_ali'] == '0', '中性', 44 | np.where(results_dataframe['label_ali'] == '-1', '负面', '非法'))) 45 | results_dataframe['label_tencent'] = np.where(results_dataframe['label_tencent'] == 1, '正面', 46 | 
np.where(results_dataframe['label_tencent'] == 0, '中性', '负面')) 47 | return results_dataframe 48 | 49 | 50 | if __name__ == '__main__': 51 | print(creat_label(['价格便宜啦,比原来优惠多了', 52 | '壁挂效果差,果然一分价钱一分货', 53 | '东西一般般,诶呀', 54 | '讨厌你', 55 | '一般' 56 | ])) 57 | -------------------------------------------------------------------------------- /SentimentAnalysis/creat_data/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | baidu = {'access_token_url': 'https://aip.baidubce.com/oauth/2.0/token', 4 | 'api': { 5 | 'sentiment_classify': { 6 | 'Content-Type': 'application/json', 7 | 'url': 'https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify'}}, 8 | 'account': { 9 | 'id_1': {'APP_ID': '000', 10 | 'API_KEY': '000', 11 | 'SECRET_KEY': '000'}, 12 | 'id_2': {'APP_ID': '000', 13 | 'API_KEY': '000', 14 | 'SECRET_KEY': '000'}} 15 | } 16 | 17 | tencent = {'api': { 18 | 'nlp_textpolar': { 19 | 'url': 'https://api.ai.qq.com/fcgi-bin/nlp/nlp_textpolar'}}, 20 | 'account': { 21 | 'id_1': {'APP_ID': '000', 22 | 'AppKey': '000'}, 23 | 'id_2': {'APP_ID': '000', 24 | 'API_KEY': '000'}} 25 | } 26 | 27 | ali = {'api': { 28 | 'Sentiment': { 29 | 'url': 'https://dtplus-cn-shanghai.data.aliyuncs.com/{org_code}/nlp/api/Sentiment/ecommerce'}}, 30 | 'account': { 31 | 'id_1': {'org_code': '000', 32 | 'akID': '000', 33 | 'akSecret': '000' 34 | }, 35 | 'id_2': {'org_code': '000', 36 | 'akID': '000', 37 | 'akSecret': '000' 38 | }} 39 | } 40 | 41 | label_path=os.getcwd() 42 | 43 | -------------------------------------------------------------------------------- /SentimentAnalysis/creat_data/tencent.py: -------------------------------------------------------------------------------- 1 | from SentimentAnalysis.creat_data.config import tencent 2 | import pandas as pd 3 | import numpy as np 4 | import requests 5 | import json 6 | import time 7 | import random 8 | import hashlib 9 | from urllib import parse 10 | from collections import OrderedDict 11 | 12 | AppID = tencent['account']['id_1']['APP_ID'] 13 | AppKey = tencent['account']['id_1']['AppKey'] 14 | 15 | def cal_sign(params_raw,AppKey=AppKey): 16 | # 官方文档例子为php,给出python版本 17 | # params_raw = {'app_id': '10000', 18 | # 'time_stamp': '1493449657', 19 | # 'nonce_str': '20e3408a79', 20 | # 'key1': '腾讯AI开放平台', 21 | # 'key2': '示例仅供参考', 22 | # 'sign': ''} 23 | # AppKey = 'a95eceb1ac8c24ee28b70f7dbba912bf' 24 | # cal_sign(params_raw=params_raw, 25 | # AppKey=AppKey) 26 | # 返回:BE918C28827E0783D1E5F8E6D7C37A61 27 | params = OrderedDict() 28 | for i in sorted(params_raw): 29 | if params_raw[i] != '': 30 | params[i] = params_raw[i] 31 | newurl = parse.urlencode(params) 32 | newurl += ('&app_key=' + AppKey) 33 | sign = hashlib.md5(newurl.encode("latin1")).hexdigest().upper() 34 | return sign 35 | 36 | 37 | def creat_label(texts, 38 | AppID=AppID, 39 | AppKey=AppKey): 40 | ''' 41 | :param texts: 需要打标签的文档列表 42 | :param AppID: 腾讯ai账号信息,默认调用配置文件id_1 43 | :param AppKey: 腾讯ai账号信息,默认调用配置文件id_1 44 | :return: 打好标签的列表,包括原始文档、标签、置信水平、是否成功 45 | ''' 46 | 47 | url = tencent['api']['nlp_textpolar']['url'] 48 | results = [] 49 | # 逐句调用接口判断 50 | count_i=0 51 | for one_text in texts: 52 | params = {'app_id': AppID, 53 | 'time_stamp': int(time.time()), 54 | 'nonce_str': ''.join([random.choice('1234567890abcdefghijklmnopqrstuvwxyz') for i in range(10)]), 55 | 'sign': '', 56 | 'text': one_text} 57 | params['sign'] = cal_sign(params_raw=params, 58 | AppKey=AppKey) # 获取sign 59 | r = requests.post(url=url, 60 | params=params) # 获取分析结果 61 | result = 
json.loads(r.text) 62 | # print(result) 63 | results.append([one_text, 64 | result['data']['polar'], 65 | result['data']['confd'], 66 | result['ret'], 67 | result['msg'] 68 | ]) 69 | r.close() 70 | count_i += 1 71 | if count_i % 50 == 0: 72 | print('tencent finish:%d' % (count_i)) 73 | return results 74 | 75 | 76 | if __name__ == '__main__': 77 | results = creat_label(texts=['价格便宜啦,比原来优惠多了', 78 | '壁挂效果差,果然一分价钱一分货', 79 | '东西一般般,诶呀', 80 | '讨厌你', 81 | '一般']) 82 | results = pd.DataFrame(results, columns=['evaluation', 83 | 'label', 84 | 'confidence', 85 | 'ret', 86 | 'msg']) 87 | results['label'] = np.where(results['label'] == 1, '正面', 88 | np.where(results['label'] == 0, '中性', '负面')) 89 | print(results) 90 | -------------------------------------------------------------------------------- /SentimentAnalysis/data/traindata.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/SentimentAnalysis/data/traindata.xlsx -------------------------------------------------------------------------------- /SentimentAnalysis/data_info.json: -------------------------------------------------------------------------------- 1 | {"maxlen": 383, "label": ["\u4e2d\u6027", "\u6b63\u9762", "\u8d1f\u9762"]} -------------------------------------------------------------------------------- /SentimentAnalysis/flask_api.py: -------------------------------------------------------------------------------- 1 | # -*- coding:utf-8 -*- 2 | 3 | import pandas as pd 4 | import numpy as np 5 | from SentimentAnalysis.SentimentAnalysis import SentimentAnalysis 6 | import os 7 | from flask import Flask, request 8 | from flask_restful import Resource, Api 9 | 10 | app = Flask(__name__) 11 | app.config.update(RESTFUL_JSON=dict(ensure_ascii=False)) 12 | api = Api(app) 13 | 14 | DIR = os.path.dirname(__file__) 15 | class sentiment_analyse(Resource): 16 | def get(self): 17 | model_name = request.args.get('model_name') 18 | text = request.args.get('text') 19 | prob=request.args.get('prob') 20 | ''' 21 | model_name='SVM' 22 | text='刚买就降价了桑心' 23 | ''' 24 | 25 | model = SentimentAnalysis() 26 | # 导入词向量词包 27 | model.load_vocab_word2vec(vocab_loadpath=DIR + '/models/vocab_word2vec.model') 28 | 29 | if model_name in ['SVM', 'KNN', 'Logistic']: 30 | # 导入机器学习模型 31 | model.load_model(model_loadpath=DIR + '/models/classify.model', 32 | model_name=model_name, 33 | data_info_path=DIR + '/data_info.json') 34 | elif model_name in ['Conv1D_LSTM', 'Conv1D', 'LSTM']: 35 | # 导入深度学习模型 36 | model.load_model(model_loadpath=DIR + '/models/classify.h5', 37 | model_name=model_name, 38 | data_info_path=DIR + '/data_info.json') 39 | 40 | try: 41 | if prob == '1': 42 | # 进行预测:概率 43 | result_prob = model.predict_prob(texts=[text]) 44 | result_prob = result_prob.astype(np.float64) 45 | result_prob = pd.DataFrame(result_prob, columns=model.label) 46 | result_prob['predict'] = result_prob.idxmax(axis=1) 47 | result_prob['text'] = [text] 48 | result_prob = result_prob[['text'] + list(model.label) + ['predict']] 49 | result = [{i: result_prob.loc[0, i]} for i in ['text'] + list(model.label) + ['predict']] 50 | else: 51 | # 进行预测:类别 52 | result_classify = model.predict(texts=[text]) 53 | result = [{'text': text},{'predict':result_classify[0]}] 54 | 55 | return result 56 | except Exception as e: 57 | return '该文本的词语均不在词库中,无法识别'+' (error: %s)'%e 58 | 59 | #http://192.168.3.59:5000/SentimentAnalyse/?model_name=Conv1D&prob=1&text=东西为什么这么烂 60 | 
api.add_resource(sentiment_analyse, '/SentimentAnalyse/') 61 | 62 | if __name__ == '__main__': 63 | app.run(debug=True, host='0.0.0.0') 64 | 65 | 66 | 67 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['keras_log_plot', 2 | 'neural_bulit', 3 | 'sklearn_supervised', 4 | 'sklearn_config', 5 | 'parameter'] 6 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/classify.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/SentimentAnalysis/models/classify.h5 -------------------------------------------------------------------------------- /SentimentAnalysis/models/classify.model: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/SentimentAnalysis/models/classify.model -------------------------------------------------------------------------------- /SentimentAnalysis/models/keras_log_plot.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | 3 | 4 | def keras_log_plot(train_log=None): 5 | plt.plot(train_log['acc'], label='acc', color='red') 6 | plt.plot(train_log['loss'], label='loss', color='yellow') 7 | if 'val_acc' in train_log.columns: 8 | plt.plot(train_log['val_acc'], label='val_acc', color='green') 9 | if 'val_loss' in train_log.columns: 10 | plt.plot(train_log['val_loss'], label='val_loss', color='blue') 11 | plt.legend() 12 | plt.show() 13 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/neural_bulit.py: -------------------------------------------------------------------------------- 1 | from keras.models import Sequential 2 | from keras.layers.core import Dense, initializers, Flatten, Dropout, Masking 3 | from keras.layers import Conv1D, InputLayer 4 | from keras.layers.recurrent import LSTM 5 | from keras.layers.pooling import MaxPooling1D 6 | from SentimentAnalysis.models.parameter.optimizers import optimizers 7 | 8 | def neural_bulit(net_shape, 9 | optimizer_name='Adagrad', 10 | lr=0.001, 11 | loss='categorical_crossentropy'): 12 | ''' 13 | :param net_shape: 神经网络格式 14 | net_shape = [ 15 | {'name': 'InputLayer','input_shape': [10, 5]}, 16 | {'name': 'Dropout','rate': 0.2,}, 17 | {'name': 'Masking'}, 18 | {'name': 'LSTM','units': 16,'activation': 'tanh','recurrent_activation': 'hard_sigmoid','dropout': 0.,'recurrent_dropout': 0.}, 19 | {'name': 'Conv1D','filters': 64,'kernel_size': 3,'strides': 1,'padding': 'same','activation': 'relu'}, 20 | {'name': 'MaxPooling1D','pool_size': 5,'padding': 'same','strides': 2}, 21 | {'name': 'Flatten'}, 22 | {'name': 'Dense','activation': 'relu','units': 64}, 23 | {'name': 'softmax','activation': 'softmax','units': 2} 24 | ] 25 | :param optimizer_name: 优化器 26 | :param lr: 学习率 27 | :param loss: 损失函数 28 | :param return: 返回神经网络模型 29 | ''' 30 | model = Sequential() 31 | 32 | def add_InputLayer(input_shape, 33 | **param): 34 | model.add(InputLayer(input_shape=input_shape, 35 | **param)) 36 | 37 | def add_Dropout(rate=0.2, 38 | **param): 39 | model.add(Dropout(rate=rate, 40 | **param)) 41 | 42 | def add_Masking(mask_value=0, 43 | 
**param): 44 | model.add(Masking(mask_value=mask_value, 45 | **param)) 46 | 47 | def add_LSTM(units=16, 48 | activation='tanh', 49 | recurrent_activation='hard_sigmoid', 50 | implementation=1, 51 | dropout=0, 52 | recurrent_dropout=0, 53 | **param): 54 | model.add(LSTM(units=units, 55 | activation=activation, 56 | recurrent_activation=recurrent_activation, 57 | implementation=implementation, 58 | dropout=dropout, 59 | recurrent_dropout=recurrent_dropout, 60 | **param)) 61 | 62 | def add_Conv1D(filters=16, # 卷积核数量 63 | kernel_size=3, # 卷积核尺寸,或者[3] 64 | strides=1, 65 | padding='same', 66 | activation='relu', 67 | kernel_initializer=initializers.normal(stddev=0.1), 68 | bias_initializer=initializers.normal(stddev=0.1), 69 | **param): 70 | model.add(Conv1D(filters=filters, 71 | kernel_size=kernel_size, 72 | strides=strides, 73 | padding=padding, 74 | activation=activation, 75 | kernel_initializer=kernel_initializer, 76 | bias_initializer=bias_initializer, 77 | **param)) 78 | 79 | def add_MaxPooling1D(pool_size=3, # 卷积核尺寸,或者[3] 80 | strides=1, 81 | padding='same', 82 | **param): 83 | model.add(MaxPooling1D(pool_size=pool_size, 84 | strides=strides, 85 | padding=padding, 86 | **param)) 87 | 88 | def add_Flatten(**param): 89 | model.add(Flatten(**param)) 90 | 91 | def add_Dense(units=16, 92 | activation='relu', 93 | kernel_initializer=initializers.normal(stddev=0.1), 94 | **param): 95 | model.add(Dense(units=units, 96 | activation=activation, 97 | kernel_initializer=kernel_initializer, 98 | **param)) 99 | 100 | for n in range(len(net_shape)): 101 | if net_shape[n]['name'] == 'InputLayer': 102 | del net_shape[n]['name'] 103 | add_InputLayer(name='num_' + str(n) + '_InputLayer', 104 | **net_shape[n]) 105 | elif net_shape[n]['name'] == 'Dropout': 106 | del net_shape[n]['name'] 107 | add_Dropout(name='num_' + str(n) + '_Dropout', 108 | **net_shape[n]) 109 | elif net_shape[n]['name'] == 'Masking': 110 | del net_shape[n]['name'] 111 | add_Masking(name='num_' + str(n) + '_Masking', 112 | **net_shape[n]) 113 | elif net_shape[n]['name'] == 'LSTM': 114 | del net_shape[n]['name'] 115 | add_LSTM(name='num_' + str(n) + '_LSTM', 116 | **net_shape[n]) 117 | elif net_shape[n]['name'] == 'Conv1D': 118 | del net_shape[n]['name'] 119 | add_Conv1D(name='num_' + str(n) + '_Conv1D', 120 | **net_shape[n]) 121 | elif net_shape[n]['name'] == 'MaxPooling1D': 122 | del net_shape[n]['name'] 123 | add_MaxPooling1D(name='num_' + str(n) + '_MaxPooling1D', 124 | **net_shape[n]) 125 | elif net_shape[n]['name'] == 'Flatten': 126 | del net_shape[n]['name'] 127 | add_Flatten(name='num_' + str(n) + '_Flatten', 128 | **net_shape[n]) 129 | elif net_shape[n]['name'] == 'Dense': 130 | del net_shape[n]['name'] 131 | add_Dense(name='num_' + str(n) + '_Dense', 132 | **net_shape[n]) 133 | elif net_shape[n]['name'] == 'softmax': 134 | del net_shape[n]['name'] 135 | add_Dense(name='num_' + str(n) + '_softmax', 136 | **net_shape[n]) 137 | 138 | optimizer = optimizers(name=optimizer_name, lr=lr) 139 | model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy']) 140 | 141 | return model 142 | 143 | 144 | if __name__ == '__main__': 145 | net_shape = [{'name': 'InputLayer', 146 | 'input_shape': [10, 5], 147 | }, 148 | {'name': 'Conv1D' 149 | }, 150 | {'name': 'MaxPooling1D' 151 | }, 152 | {'name': 'Flatten' 153 | }, 154 | {'name': 'Dense' 155 | }, 156 | {'name': 'Dropout' 157 | }, 158 | {'name': 'softmax', 159 | 'activation': 'softmax', 160 | 'units': 3 161 | } 162 | ] 163 | model = neural_bulit(net_shape=net_shape, 164 | 
optimizer_name='Adagrad', 165 | lr=0.001, 166 | loss='categorical_crossentropy') 167 | model.summary() 168 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/parameter/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['optimizers'] 2 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/parameter/optimizers.py: -------------------------------------------------------------------------------- 1 | from keras.optimizers import SGD, Adagrad, Adam 2 | 3 | 4 | def optimizers(name='SGD', lr=0.001): 5 | if name == 'SGD': 6 | optimizers_fun = SGD(lr=lr) 7 | elif name == 'Adagrad': 8 | optimizers_fun = Adagrad(lr=lr) 9 | elif name == 'Adam': 10 | optimizers_fun = Adam(lr=lr) 11 | 12 | return optimizers_fun 13 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/sklearn_config.py: -------------------------------------------------------------------------------- 1 | # from sklearn.neighbors import KNeighborsClassifier 2 | # from sklearn.svm import SVC 3 | # from sklearn.linear_model import LogisticRegression 4 | 5 | SVC = {'kernel': 'linear', # 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' 6 | 'C': 1.0, 7 | 'probability': True} 8 | 9 | KNN = {'n_neighbors': 3} # Number of neighbors to use 10 | 11 | Logistic = {'solver': 'liblinear', 12 | 'penalty': 'l2', # 'l1' or 'l2' 13 | 'C': 1.0} # a positive float,Like in support vector machines, smaller values specify stronger 14 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/sklearn_supervised.py: -------------------------------------------------------------------------------- 1 | from sklearn.neighbors import KNeighborsClassifier 2 | from sklearn.svm import SVC 3 | from sklearn.linear_model import LogisticRegression 4 | from sklearn.externals import joblib 5 | import os 6 | 7 | DIR = os.path.dirname(__file__) 8 | def sklearn_supervised(data=None, 9 | label=None, 10 | model_savepath=DIR + '/sentence_transform/classify.model', 11 | model_name='SVM', 12 | **sklearn_param): 13 | ''' 14 | :param data: 训练文本 15 | :param label: 训练文本的标签 16 | :param model_savepath: 模型保存路径 17 | :param model_name: 机器学习分类模型,SVM,KNN,Logistic 18 | :param return: 训练好的模型 19 | ''' 20 | 21 | if model_name == 'KNN': 22 | # 调用KNN,近邻=5 23 | model = KNeighborsClassifier(**sklearn_param) 24 | elif model_name == 'SVM': 25 | # 核函数为linear,惩罚系数为1.0 26 | model = SVC(**sklearn_param) 27 | model.fit(data, label) 28 | elif model_name == 'Logistic': 29 | model = LogisticRegression(**sklearn_param) # 核函数为线性,惩罚系数为1 30 | model.fit(data, label) 31 | 32 | if model_savepath != None: 33 | joblib.dump(model, model_savepath) # 保存模型 34 | 35 | 36 | return model 37 | -------------------------------------------------------------------------------- /SentimentAnalysis/models/vocab_word2vec.model: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/SentimentAnalysis/models/vocab_word2vec.model -------------------------------------------------------------------------------- /SentimentAnalysis/sentence_transform/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['sentence_2_sparse', 2 | 'sentence_word2vec', 3 | 'sentence_2_tokenizer'] 4 | 
-------------------------------------------------------------------------------- /SentimentAnalysis/sentence_transform/creat_vocab_word2vec.py: -------------------------------------------------------------------------------- 1 | from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, HashingVectorizer 2 | import pandas as pd 3 | import jieba 4 | from gensim.models import word2vec, doc2vec 5 | import numpy as np 6 | import os 7 | 8 | jieba.setLogLevel('WARN') 9 | DIR = os.path.dirname(__file__) 10 | 11 | def creat_vocab_word2vec(texts=None, 12 | sg=0, 13 | vocab_savepath=DIR + '/vocab_word2vec.model', 14 | size=5, 15 | window=5, 16 | min_count=1): 17 | ''' 18 | 19 | :param texts: list of text 20 | :param sg: 0 CBOW,1 skip-gram 21 | :param size: the dimensionality of the feature vectors 22 | :param window: the maximum distance between the current and predicted word within a sentence 23 | :param min_count: ignore all words with total frequency lower than this 24 | :param vocab_savepath: path to save word2vec dictionary 25 | :return: None 26 | 27 | ''' 28 | texts_cut = [[word for word in jieba.lcut(one_text) if word != ' '] for one_text in texts] # 分词 29 | # 训练 30 | model = word2vec.Word2Vec(texts_cut, sg=sg, size=size, window=window, min_count=min_count) 31 | if vocab_savepath != None: 32 | model.save(vocab_savepath) 33 | 34 | return model 35 | 36 | 37 | if __name__ == '__main__': 38 | texts = ['全面从严治党', 39 | '国际公约和国际法', 40 | '中国航天科技集团有限公司'] 41 | vocab_word2vec = creat_vocab_word2vec(texts=texts, 42 | sg=0, 43 | vocab_savepath=DIR + '/vocab_word2vec.model', 44 | size=5, 45 | window=5, 46 | min_count=1) 47 | -------------------------------------------------------------------------------- /SentimentAnalysis/sentence_transform/sentence_2_sparse.py: -------------------------------------------------------------------------------- 1 | from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, HashingVectorizer 2 | import pandas as pd 3 | import jieba 4 | 5 | jieba.setLogLevel('WARN') 6 | 7 | 8 | def sentence_2_sparse(train_data, 9 | test_data=None, 10 | language='Chinese', 11 | hash=True, 12 | hashmodel='CountVectorizer'): 13 | ''' 14 | 15 | :param train_data: 训练集 16 | :param test_data: 测试集 17 | :param language: 语种 18 | :param hash: 是否转哈希存储 19 | :param hashmodel: 哈希计数的方式 20 | :param return: 返回编码后稀疏矩阵 21 | ''' 22 | # 分词转one-hot dataframe 23 | if test_data==None: 24 | if hash == False: 25 | train_data = pd.DataFrame([pd.Series([word for word in jieba.lcut(sample) if word != ' ']).value_counts() 26 | for sample in train_data]).fillna(0) 27 | # 中文需要先分词空格分隔,再转稀疏矩阵 28 | else: 29 | if language == 'Chinese': 30 | train_data = [' '.join([word for word in jieba.lcut(sample) if word != ' ']) for sample in train_data] 31 | if hashmodel == 'CountVectorizer': # 只计数 32 | count_train = CountVectorizer() 33 | train_data_hashcount = count_train.fit_transform(train_data) # 训练数据转哈希计数 34 | elif hashmodel == 'TfidfTransformer': # 计数后计算tf-idf 35 | count_train = CountVectorizer() 36 | train_data_hashcount = count_train.fit_transform(train_data) # 训练数据转哈希计数 37 | tfidftransformer = TfidfTransformer() 38 | train_data_hashcount = tfidftransformer.fit(train_data_hashcount).transform(train_data_hashcount) 39 | elif hashmodel == 'HashingVectorizer': # 哈希计算 40 | vectorizer = HashingVectorizer(stop_words=None, n_features=10000) 41 | train_data_hashcount = vectorizer.fit_transform(train_data) # 训练数据转哈希后的特征,避免键值重叠导致过大有一个计算的 42 | return train_data_hashcount 43 | return train_data 44 | else: 
45 | # 中文需要先分词空格分隔,再转稀疏矩阵,如果包含测试集,测试集转hash需要在训练集的词库基础上执行 46 | if language == 'Chinese': 47 | train_data = [' '.join([word for word in jieba.lcut(sample) if word != ' ']) for sample in train_data] 48 | test_data = [' '.join([word for word in jieba.lcut(sample) if word != ' ']) for sample in test_data] 49 | if hashmodel == 'CountVectorizer': # 只计数 50 | count_train = CountVectorizer() 51 | train_data_hashcount = count_train.fit_transform(train_data) # 训练数据转哈希计数 52 | count_test = CountVectorizer(vocabulary=count_train.vocabulary_) # 测试数据调用训练词库 53 | test_data_hashcount = count_test.fit_transform(test_data) # 测试数据转哈希计数 54 | elif hashmodel == 'TfidfTransformer': # 计数后计算tf-idf 55 | count_train = CountVectorizer() 56 | train_data_hashcount = count_train.fit_transform(train_data) # 训练数据转哈希计数 57 | count_test = CountVectorizer(vocabulary=count_train.vocabulary_) # 测试数据调用训练词库 58 | test_data_hashcount = count_test.fit_transform(test_data) # 测试数据转哈希计数 59 | tfidftransformer = TfidfTransformer() 60 | train_data_hashcount = tfidftransformer.fit(train_data_hashcount).transform(train_data_hashcount) 61 | test_data_hashcount = tfidftransformer.fit(test_data_hashcount).transform(test_data_hashcount) 62 | elif hashmodel == 'HashingVectorizer': # 哈希计算 63 | vectorizer = HashingVectorizer(stop_words=None, n_features=10000) 64 | train_data_hashcount = vectorizer.fit_transform(train_data) # 训练数据转哈希后的特征,避免键值重叠导致过大有一个计算的 65 | test_data_hashcount = vectorizer.fit_transform(test_data) # 测试数据转哈希后的特征 66 | return train_data_hashcount, test_data_hashcount 67 | 68 | 69 | if __name__ == '__main__': 70 | train_data = ['全面从严治党', 71 | '国际公约和国际法', 72 | '中国航天科技集团有限公司'] 73 | test_data = ['全面从严测试'] 74 | print('train_data\n',train_data,'\ntest_data\n',test_data) 75 | print('sentence_2_sparse(train_data=train_data,hash=False)\n', 76 | sentence_2_sparse(train_data=train_data, hash=False)) 77 | print('sentence_2_sparse(train_data=train_data,hash=True)\n', 78 | sentence_2_sparse(train_data=train_data, hash=True)) 79 | m,n=sentence_2_sparse(train_data=train_data, test_data=test_data, hash=True) 80 | print('sentence_2_sparse(train_data=train_data,test_data=test_data,hash=True)\n', 81 | 'train_data\n',m,'\ntest_data\n',n) 82 | -------------------------------------------------------------------------------- /SentimentAnalysis/sentence_transform/sentence_2_tokenizer.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import jieba 3 | from keras.preprocessing.text import Tokenizer 4 | 5 | jieba.setLogLevel('WARN') 6 | 7 | 8 | def sentence_2_tokenizer(train_data, 9 | test_data=None, 10 | num_words=None, 11 | word_index=False): 12 | ''' 13 | 14 | :param train_data: 训练集 15 | :param test_data: 测试集 16 | :param num_words: 词库大小,None则依据样本自动判定 17 | :param word_index: 是否需要索引 18 | :param return: 返回编码后数组 19 | ''' 20 | train_data = [' '.join([word for word in jieba.lcut(sample) if word != ' ']) for sample in train_data] 21 | test_data = [' '.join([word for word in jieba.lcut(sample) if word != ' ']) for sample in test_data] 22 | data = train_data + test_data 23 | tokenizer = Tokenizer(num_words=num_words) 24 | tokenizer.fit_on_texts(data) 25 | train_data = tokenizer.texts_to_sequences(train_data) 26 | test_data = tokenizer.texts_to_sequences(test_data) 27 | 28 | if word_index == False: 29 | if test_data == None: 30 | return train_data 31 | 32 | else: 33 | return train_data, test_data 34 | else: 35 | if test_data == None: 36 | return train_data, tokenizer.word_index 37 | 38 | else: 39 | return 
train_data, test_data, tokenizer.word_index 40 | 41 | 42 | if __name__ == '__main__': 43 | train_data = ['全面从严治党', 44 | '国际公约和国际法', 45 | '中国航天科技集团有限公司'] 46 | test_data = ['全面从严测试'] 47 | train_data_vec, test_data_vec, word_index = sentence_2_tokenizer(train_data=train_data, 48 | test_data=test_data, 49 | num_words=None, 50 | word_index=True) 51 | print(train_data, '\n', train_data_vec, '\n', 52 | test_data[0], '\n', test_data_vec, '\n', 53 | 'word_index\n',word_index) 54 | -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from sklearn.model_selection import train_test_split 4 | import os 5 | 6 | from SentimentAnalysis.SentimentAnalysis import SentimentAnalysis 7 | from SentimentAnalysis.models.keras_log_plot import keras_log_plot 8 | 9 | model = SentimentAnalysis() 10 | 11 | dataset = pd.read_excel(os.getcwd() + '/data/traindata.xlsx', sheet_name=0) 12 | data = dataset['evaluation'] 13 | label = dataset['label'] 14 | train_data, test_data, train_label, test_label = train_test_split(data, 15 | label, 16 | test_size=0.1, 17 | random_state=42) 18 | test_data = test_data.reset_index(drop=True) 19 | test_label = test_label.reset_index(drop=True) 20 | # 建模获取词向量词包 21 | model.creat_vocab(texts=train_data, 22 | sg=0, 23 | size=5, 24 | window=5, 25 | min_count=1, 26 | vocab_savepath=None) 27 | 28 | # 导入词向量词包 29 | # model.load_vocab_word2vec(vocab_loadpath=os.getcwd() + '/vocab_word2vec.model') 30 | 31 | ################################################################################### 32 | # 进行机器学习 33 | model.train(texts=train_data, 34 | label=train_label, 35 | model_name='SVM', 36 | model_savepath=os.getcwd() + '/models/classify.model') 37 | 38 | # 导入机器学习模型 39 | model.load_model(model_loadpath=os.getcwd() + '/models/classify.model', 40 | model_name='SVM', 41 | data_info_path=os.getcwd() + '/data_info.json') 42 | 43 | # 进行预测:概率 44 | result_prob = model.predict_prob(texts=test_data) 45 | result_prob = pd.DataFrame(result_prob, columns=model.label) 46 | result_prob['predict'] = result_prob.idxmax(axis=1) 47 | result_prob['data'] = test_data 48 | result_prob = result_prob[['data'] + list(model.label) + ['predict']] 49 | print('prob:\n', result_prob) 50 | print('score:', np.sum(result_prob['predict'] == np.array(test_label)) / len(result_prob['predict'])) 51 | 52 | ################################################################################### 53 | # 进行深度学习 54 | model.train(texts=train_data, 55 | label=train_label, 56 | model_name='Conv1D', 57 | batch_size=200, 58 | epochs=20, 59 | verbose=2, 60 | maxlen=None, 61 | model_savepath=os.getcwd() + '/models/classify.h5') 62 | 63 | # 导入深度学习模型 64 | model.load_model(model_loadpath=os.getcwd() + '/models/classify.h5', 65 | model_name='Conv1D', 66 | data_info_path=os.getcwd() + '/data_info.json') 67 | 68 | # 进行预测:概率 69 | result_prob = model.predict_prob(texts=test_data) 70 | result_prob = pd.DataFrame(result_prob, columns=model.label) 71 | result_prob['predict'] = result_prob.idxmax(axis=1) 72 | result_prob['data'] = test_data 73 | result_prob = result_prob[['data'] + list(model.label) + ['predict']] 74 | print('prob:\n', result_prob) 75 | print('score:', np.sum(result_prob['predict'] == np.array(test_label)) / len(result_prob['predict'])) 76 | 77 | keras_log_plot(model.train_log) 78 | -------------------------------------------------------------------------------- /picture/Conv1D.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/picture/Conv1D.png -------------------------------------------------------------------------------- /picture/SVM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/picture/SVM.png -------------------------------------------------------------------------------- /picture/api1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/picture/api1.png -------------------------------------------------------------------------------- /picture/api2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/picture/api2.png -------------------------------------------------------------------------------- /picture/api3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/picture/api3.png -------------------------------------------------------------------------------- /picture/label.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/renjunxiang/Sentiment-analysis/c6cb5594d2784472f193a4b6633f155ae1919cf8/picture/label.png --------------------------------------------------------------------------------