├── .DS_Store
├── .ipynb_checkpoints
│   └── meng_models-checkpoint.ipynb
├── Behavior_Analysis
│   ├── Data Exploration Plot
│   │   ├── 01.png
│   │   ├── 02.png
│   │   ├── 03.png
│   │   ├── 04.png
│   │   ├── 05.png
│   │   ├── 06.png
│   │   ├── 07.png
│   │   ├── 08.png
│   │   ├── 09.png
│   │   ├── 10.png
│   │   └── 11.png
│   └── EDA+Model.ipynb
├── README.md
└── SentimentAnalysis
    ├── 01-data_prep_EDA.ipynb
    ├── 02-tokenization.ipynb
    ├── 03-preprocessing.ipynb
    ├── 04-models.ipynb
    ├── data
    │   └── train_df.csv
    └── stopwords.txt
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/.DS_Store
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/01.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/02.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/03.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/04.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/04.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/05.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/05.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/06.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/06.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/07.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/07.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/08.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/09.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/09.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/10.png
--------------------------------------------------------------------------------
/Behavior_Analysis/Data Exploration Plot/11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LakiLiu/Covid-19-Analysis/0905894869d861e6f8beab986d3b4fd6be170532/Behavior_Analysis/Data Exploration Plot/11.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Covid-19-Analysis
2 |
3 | ### 1. Covid-19 Human Behaviour Analysis: A Creative Research
4 | ##### (A) Description:
5 | - Herd behavior is commonly used in financial markets to predict uncertainty.
6 | - Common Models: Econometric Regression and Time Series
7 | - Research Value: Herd-behavior models have rarely been applied to Covid-19 mobility data, even though uncertainty in human behavior affects policy effectiveness
8 |
9 | #### (B) Model:
10 | - Baseline Model : Bayesian Ridge Regression
11 |
12 | ---
13 | ### 2. Covid-19 Sentiment Analysis: A Chinese NLP Study
14 | ##### (A) Description:
15 | - The dataset was collected using 230 keywords related to COVID-19: a total of 1 million Weibo posts published between 1 January 2020 and 20 February 2020 were crawled.
16 | - 100,000 of these posts were manually labeled with one of three sentiment classes: 1 (positive), 0 (neutral), and -1 (negative).
17 | - Evaluation Metric: Macro-F1 score
18 |
19 | #### (B) Model:
20 | - Preprocessing: tokenisation, duplicate removal, similar-word removal
21 | - Machine Learning Model: Naive Bayes, Logistic Regression -> resulting F1 score: 0.74
22 | - Deep Learning Model: RNN (RNN+LSTM), CNN
23 |
24 |
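The Macro-F1 metric named above averages the per-class F1 scores with equal weight, so the minority sentiment classes count as much as the majority class. The following is a minimal sketch in plain Python, not the code used in the notebooks; the function name `macro_f1` and the choice to score a class with no predictions and no true examples as 0.0 are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=(-1, 0, 1)):
    """Return the unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); assumed to be 0.0 when undefined.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [1, 1, 0, 0, -1, -1]
    preds = [1, 0, 0, 0, -1, 1]
    print(round(macro_f1(truth, preds), 4))
```

Because every class contributes one third of the final score, a model that ignores the negative class is penalised even if negative posts are rare in the data.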
--------------------------------------------------------------------------------
/SentimentAnalysis/01-data_prep_EDA.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# **Covid-19 Sentiment Analysis - Part I**\n",
8 | "This part contains data cleaning and exploratory data analysis\n",
9 | "## 1. Data Cleaning"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {},
16 | "outputs": [
17 | {
18 | "data": {
102 | "text/plain": [
103 | " ID time user \\\n",
104 | "0 4456072029125500 01月01日 23:50 存曦1988 \n",
105 | "1 4456074167480980 01月01日 23:58 LunaKrys \n",
106 | "2 4456054253264520 01月01日 22:39 小王爷学辩论o_O \n",
107 | "3 4456061509126470 01月01日 23:08 芩鎟 \n",
108 | "4 4455979322528190 01月01日 17:42 changlwj \n",
109 | "\n",
110 | " content \\\n",
111 | "0 写在年末冬初孩子流感的第五天,我们仍然没有忘记热情拥抱这2020年的第一天。带着一丝迷信,早... \n",
112 | "1 开年大模型…累到以为自己发烧了腰疼膝盖疼腿疼胳膊疼脖子疼#Luna的Krystallife#? \n",
113 | "2 邱晨这就是我爹,爹,发烧快好,毕竟美好的假期拿来养病不太好,假期还是要好好享受快乐,爹,新... \n",
114 | "3 新年的第一天感冒又发烧的也太衰了但是我要想着明天一定会好的? \n",
115 | "4 问:我们意念里有坏的想法了,天神就会给记下来,那如果有好的想法也会被记下来吗?答:那当然了。... \n",
116 | "\n",
117 | " pic video sentiment \n",
118 | "0 ['https://ww2.sinaimg.cn/orj360/005VnA1zly1gah... [] 0 \n",
119 | "1 [] [] -1 \n",
120 | "2 ['https://ww2.sinaimg.cn/thumb150/006ymYXKgy1g... [] 1 \n",
121 | "3 ['https://ww2.sinaimg.cn/orj360/005FL9LZgy1gah... [] 1 \n",
122 | "4 [] [] 1 "
123 | ]
124 | },
125 | "execution_count": 1,
126 | "metadata": {},
127 | "output_type": "execute_result"
128 | }
129 | ],
130 | "source": [
131 | "import numpy as np\n",
132 | "import pandas as pd\n",
133 | "import matplotlib.pyplot as plt \n",
134 | "import seaborn as sns\n",
135 | "import warnings \n",
136 | "warnings.filterwarnings('ignore')\n",
137 | "\n",
138 | "path = './nCoV_100k_train.csv'\n",
139 | "with open(path,'r',encoding = 'GB18030', errors = 'ignore') as f:\n",
140 | " raw_data = pd.read_csv(f)\n",
141 | " \n",
142 | "raw_data.columns = ['ID', 'time', 'user', 'content', 'pic', 'video', 'sentiment']\n",
143 | "raw_data.head()"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 2,
149 | "metadata": {},
150 | "outputs": [
151 | {
152 | "data": {
153 | "text/plain": [
154 | "(100000, 7)"
155 | ]
156 | },
157 | "execution_count": 2,
158 | "metadata": {},
159 | "output_type": "execute_result"
160 | }
161 | ],
162 | "source": [
163 | "raw_data.shape"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "- **target variable : sentiment**"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": 3,
176 | "metadata": {},
177 | "outputs": [
178 | {
179 | "data": {
180 | "text/plain": [
181 | "(99913, 7)"
182 | ]
183 | },
184 | "execution_count": 3,
185 | "metadata": {},
186 | "output_type": "execute_result"
187 | }
188 | ],
189 | "source": [
190 | "# keep y with only 1,0,-1\n",
191 | "raw_data = raw_data[(raw_data.sentiment == '0') | (raw_data.sentiment == '1') | (raw_data.sentiment == '-1')]\n",
192 | "raw_data.shape"
193 | ]
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": 4,
198 | "metadata": {},
199 | "outputs": [
200 | {
201 | "data": {
202 | "image/png": "[base64-encoded PNG omitted: bar chart of normalized sentiment value counts]",
203 | "text/plain": [
204 | ""
205 | ]
206 | },
207 | "metadata": {
208 | "needs_background": "light"
209 | },
210 | "output_type": "display_data"
211 | }
212 | ],
213 | "source": [
214 | "raw_data.sentiment.value_counts(normalize=True).plot(kind='bar')\n",
215 | "plt.show()"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {},
221 | "source": [
222 | "- **pic and video variables**"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 5,
228 | "metadata": {},
229 | "outputs": [
230 | {
231 | "data": {
232 | "text/plain": [
233 | "\"['https://ww2.sinaimg.cn/orj360/005VnA1zly1gahhwworn5j30m80fyq4n.jpg']\""
234 | ]
235 | },
236 | "execution_count": 5,
237 | "metadata": {},
238 | "output_type": "execute_result"
239 | }
240 | ],
241 | "source": [
242 | "raw_data.pic[0] # the URL itself carries little information, so we convert pic and video to dummy variables"
243 | ]
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": 6,
248 | "metadata": {},
249 | "outputs": [
250 | {
251 | "data": {
335 | "text/plain": [
336 | " ID time user \\\n",
337 | "0 4456072029125500 01月01日 23:50 存曦1988 \n",
338 | "1 4456074167480980 01月01日 23:58 LunaKrys \n",
339 | "2 4456054253264520 01月01日 22:39 小王爷学辩论o_O \n",
340 | "3 4456061509126470 01月01日 23:08 芩鎟 \n",
341 | "4 4455979322528190 01月01日 17:42 changlwj \n",
342 | "\n",
343 | " content pic video sentiment \n",
344 | "0 写在年末冬初孩子流感的第五天,我们仍然没有忘记热情拥抱这2020年的第一天。带着一丝迷信,早... 1 0 0 \n",
345 | "1 开年大模型…累到以为自己发烧了腰疼膝盖疼腿疼胳膊疼脖子疼#Luna的Krystallife#? 0 0 -1 \n",
346 | "2 邱晨这就是我爹,爹,发烧快好,毕竟美好的假期拿来养病不太好,假期还是要好好享受快乐,爹,新... 1 0 1 \n",
347 | "3 新年的第一天感冒又发烧的也太衰了但是我要想着明天一定会好的? 1 0 1 \n",
348 | "4 问:我们意念里有坏的想法了,天神就会给记下来,那如果有好的想法也会被记下来吗?答:那当然了。... 0 0 1 "
349 | ]
350 | },
351 | "execution_count": 6,
352 | "metadata": {},
353 | "output_type": "execute_result"
354 | }
355 | ],
356 | "source": [
357 | "def dummy(value):\n",
358 | "    # '[]' means no attachment, so encode it as 0; anything else as 1\n",
359 | "    return 0 if value == '[]' else 1\n",
360 | "\n",
361 | "raw_data.pic = raw_data.pic.apply(dummy)\n",
362 | "raw_data.video = raw_data.video.apply(dummy)\n",
363 | "raw_data.head()"
366 | ]
367 | },
368 | {
369 | "cell_type": "markdown",
370 | "metadata": {},
371 | "source": [
372 | "- **drop duplicates**"
373 | ]
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": 7,
378 | "metadata": {},
379 | "outputs": [
380 | {
381 | "data": {
382 | "text/plain": [
383 | "4466220575191380 2\n",
384 | "4463683088077780 2\n",
385 | "4470569351337640 2\n",
386 | "4460684139742960 1\n",
387 | "4465820153259820 1\n",
388 | "Name: ID, dtype: int64"
389 | ]
390 | },
391 | "execution_count": 7,
392 | "metadata": {},
393 | "output_type": "execute_result"
394 | }
395 | ],
396 | "source": [
397 | "raw_data.ID.value_counts(ascending=False).head()"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": 8,
403 | "metadata": {},
404 | "outputs": [
405 | {
406 | "data": {
460 | "text/plain": [
461 | " ID time user \\\n",
462 | "35202 4463683088077780 01月22日 23:54 爱你的Moment \n",
463 | "62850 4463683088077780 01月22日 23:54 爱你的Moment \n",
464 | "\n",
465 | " content pic video sentiment \n",
466 | "35202 愿好人一生平安!//@努力努力再努力x:#抗击新型肺炎第一线#致敬伟大的逆行者们一定要平平安安 1 0 0 \n",
467 | "62850 愿好人一生平安!//@努力努力再努力x:#抗击新型肺炎第一线#致敬伟大的逆行者们一定要平平安安 1 0 1 "
468 | ]
469 | },
470 | "execution_count": 8,
471 | "metadata": {},
472 | "output_type": "execute_result"
473 | }
474 | ],
475 | "source": [
476 | "raw_data[raw_data.ID == 4463683088077780]\n",
477 | "# the labels are contradictory, so we drop all rows belonging to the three duplicated IDs"
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": 9,
483 | "metadata": {},
484 | "outputs": [
485 | {
486 | "data": {
487 | "text/plain": [
488 | "(99907, 7)"
489 | ]
490 | },
491 | "execution_count": 9,
492 | "metadata": {},
493 | "output_type": "execute_result"
494 | }
495 | ],
496 | "source": [
497 | "raw_data = raw_data[(raw_data.ID !=4470569351337640) & (raw_data.ID !=4463683088077780) & (raw_data.ID !=4466220575191380)]\n",
498 | "raw_data.shape"
499 | ]
500 | },
501 | {
502 | "cell_type": "markdown",
503 | "metadata": {},
504 | "source": [
505 | "- **time**"
506 | ]
507 | },
508 | {
509 | "cell_type": "code",
510 | "execution_count": 10,
511 | "metadata": {},
512 | "outputs": [
513 | {
514 | "data": {
598 | "text/plain": [
599 | " ID time user \\\n",
600 | "0 4456072029125500 2020-01-01 23:50:00 存曦1988 \n",
601 | "1 4456074167480980 2020-01-01 23:58:00 LunaKrys \n",
602 | "2 4456054253264520 2020-01-01 22:39:00 小王爷学辩论o_O \n",
603 | "3 4456061509126470 2020-01-01 23:08:00 芩鎟 \n",
604 | "4 4455979322528190 2020-01-01 17:42:00 changlwj \n",
605 | "\n",
606 | " content pic video sentiment \n",
607 | "0 写在年末冬初孩子流感的第五天,我们仍然没有忘记热情拥抱这2020年的第一天。带着一丝迷信,早... 1 0 0 \n",
608 | "1 开年大模型…累到以为自己发烧了腰疼膝盖疼腿疼胳膊疼脖子疼#Luna的Krystallife#? 0 0 -1 \n",
609 | "2 邱晨这就是我爹,爹,发烧快好,毕竟美好的假期拿来养病不太好,假期还是要好好享受快乐,爹,新... 1 0 1 \n",
610 | "3 新年的第一天感冒又发烧的也太衰了但是我要想着明天一定会好的? 1 0 1 \n",
611 | "4 问:我们意念里有坏的想法了,天神就会给记下来,那如果有好的想法也会被记下来吗?答:那当然了。... 0 0 1 "
612 | ]
613 | },
614 | "execution_count": 10,
615 | "metadata": {},
616 | "output_type": "execute_result"
617 | }
618 | ],
619 | "source": [
620 | "raw_data.time = pd.to_datetime('2020年'+raw_data.time,errors='coerce', format='%Y年%m月%d日 %H:%M')\n",
621 | "raw_data.head()"
622 | ]
623 | },
624 | {
625 | "cell_type": "code",
626 | "execution_count": 11,
627 | "metadata": {},
628 | "outputs": [],
629 | "source": [
630 | "# save to file\n",
631 | "raw_data.drop(['user'], axis=1, inplace=True)\n",
632 | "raw_data.to_csv('train_df.csv',index_label=False)"
633 | ]
634 | },
635 | {
636 | "cell_type": "markdown",
637 | "metadata": {},
638 | "source": [
639 | "## 2. EDA "
640 | ]
641 | }
642 | ],
643 | "metadata": {
644 | "kernelspec": {
645 | "display_name": "Python 3",
646 | "language": "python",
647 | "name": "python3"
648 | },
649 | "language_info": {
650 | "codemirror_mode": {
651 | "name": "ipython",
652 | "version": 3
653 | },
654 | "file_extension": ".py",
655 | "mimetype": "text/x-python",
656 | "name": "python",
657 | "nbconvert_exporter": "python",
658 | "pygments_lexer": "ipython3",
659 | "version": "3.7.6"
660 | }
661 | },
662 | "nbformat": 4,
663 | "nbformat_minor": 4
664 | }
665 |
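The cleaning steps in this notebook (keeping only the labels 1, 0 and -1, turning `pic` and `video` into dummy variables, dropping IDs whose labels contradict each other, and parsing the time column) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the notebook's code: the helper names `has_media`, `parse_time` and `clean` are hypothetical, rows are assumed to be dictionaries with the notebook's column names, and conflicting IDs are detected programmatically rather than hard-coded. Unlike the notebook, which removes all three duplicated IDs, this sketch only removes IDs that carry more than one distinct label.

```python
from collections import defaultdict
from datetime import datetime

VALID_LABELS = {"-1", "0", "1"}


def has_media(field):
    """Return 0 for an empty stringified list ('[]'), otherwise 1."""
    return 0 if field.strip() == "[]" else 1


def parse_time(text, year=2020):
    """Parse strings such as '01月01日 23:50'; return None if malformed."""
    try:
        return datetime.strptime(f"{year}年{text}", "%Y年%m月%d日 %H:%M")
    except ValueError:
        return None


def clean(rows):
    """Apply the filtering and encoding steps to a list of row dicts."""
    # Keep only rows whose label is one of the three expected strings.
    rows = [r for r in rows if r["sentiment"] in VALID_LABELS]

    # Collect the distinct labels seen for every post ID.
    labels_by_id = defaultdict(set)
    for r in rows:
        labels_by_id[r["ID"]].add(r["sentiment"])

    cleaned = []
    for r in rows:
        if len(labels_by_id[r["ID"]]) > 1:
            continue  # contradictory labels for the same ID: drop all copies
        cleaned.append({
            "ID": r["ID"],
            "time": parse_time(r["time"]),
            "content": r["content"],
            "pic": has_media(r["pic"]),
            "video": has_media(r["video"]),
            "sentiment": int(r["sentiment"]),
        })
    return cleaned
```

Detecting conflicts from the data avoids listing IDs by hand, which matters if the input file changes; the trade-off is that exact duplicates with identical labels are kept and would need a separate de-duplication step.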
--------------------------------------------------------------------------------
/SentimentAnalysis/02-tokenization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# **Covid-19 Sentiment Analysis - Part II**\n",
8 | "This part contains the word tokenization and cleaning process"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import pandas as pd\n",
18 | "import thulac\n",
19 | "import seaborn as sns\n",
20 | "import matplotlib.pyplot as plt\n",
21 | "\n",
22 | "myfile = './train_df.csv'\n",
23 | "with open(myfile,'r', errors = 'ignore') as f:\n",
24 | " raw_data = pd.read_csv(f,)\n",
25 | " \n",
26 | "# raw_data = raw_data.rename(columns={\"微博id\": \"ID\", \"微博发布时间\": \"time\", '发布人账号':'user','微博中文内容':'content',\n",
27 | "# '微博图片':'pic', '微博视频':'video','情感倾向':'sentiment'})"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "df = raw_data.copy()"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 3,
42 | "metadata": {},
43 | "outputs": [
44 | {
45 | "data": {
46 | "text/plain": [
47 | "(99907, 6)"
48 | ]
49 | },
50 | "execution_count": 3,
51 | "metadata": {},
52 | "output_type": "execute_result"
53 | }
54 | ],
55 | "source": [
56 | "df.shape"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "## 1. Constrain content length \n",
64 | "Based on the density plot, we keep only contents of <= 150 words"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 4,
70 | "metadata": {},
71 | "outputs": [
72 | {
73 | "data": {
74 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEICAYAAABWJCMKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8GearUAAAgAElEQVR4nO3deXxU9bn48c+TSSYbIRthDZCwKIIiS0RQwQUXbKuo1YpdpK3V669aq962V29vrfXW/mr1trbVtj/rbltR6RZbrqhFK25AWJVNAwaSECBk39fn98ec4DROkkkyyZlknvfrlRdnzvmek+/XifPMdxdVxRhjTOSJcjsDxhhj3GEBwBhjIpQFAGOMiVAWAIwxJkJZADDGmAhlAcAYYyKUBQBjjIlQFgCMCUBEfiMi33OOzxGRohA//8si8mYon2lMb0W7nQFjwpGq3uh2HowZaFYDMMaYCGUBwAw7IvIVEXnR7/WHIvKC3+tCEZkjIjNE5BURKReRvSLyOb80T4rIDzs99z9F5JiIFIjIF/zOJ4vI0yJSKiIHROS/RKRX/28FkZeHReTvIlIjIhtEZGpv/7sY05kFADMc/RNYLCJRIjIe8AKLAERkCjAC+BB4BfgDMBpYAfxKRGZ28cyxwChgArASeERETnSu/RJIBqYAZwPXAl8JNrMikhhEXlYAPwBSgXzg3mCfb0xXLACYYUdV9wM1wBxgCbAWOCQiM/B9QK8HPgMUqOoTqtqqqluBPwJXdfPo76lqk6r+E/g78DkR8eD7cL5TVWtUtQD4H+BLvchyMHn5s6puVNVW4PdO2YzpF+sENsPVP4FzgGnOcSW+D/9FzuvJwOkiUul3TzTwTBfPq1DVOr/XB4Dx+GoFMc5r/2sTepHXYPJy2O+4Hl8txph+sQBghqt/ApcA2cCP8AWAL+ALAA8B04F/quoFQT4vVUQS/YLAJOB94BjQgu9DfJffteJe5LWwl3kxJiSsCcgMV/8EzgXiVbUIX7PPMiAd2Ar8DThBRL4kIjHOz2kiclI3z/yBiHhFZDG+ZpsXVLUNeB64V0SSRGQycDvwu17ktS95MabfLACYYUlVPwBq8X3wo6rVwH7gLVVtU9Ua4EJ87feH8DWx3AfEdvHIw0CFk/b3wI2quse59g2gznn+m/g6cx/vRV57mxdjQkJsRzBjjIlMVgMwxpgIZQHAmAHirCdUG+DnN27nzRiwJiBjjIlYQ2oY6KhRozQrK8vtbBhjzJCyefPmY6qa0fn8kAoAWVlZ5OXluZ0NY4wZUkTkQKDz1gdgjDERygKAMcZEKAsAxhgToYZUH4AxZvhoaWmhqKiIxsZGt7MybMTFxZGZmUlMTExQ6S0AGGNcUVRURFJSEllZWYiI29kZ8lSVsrIyioqKyM7ODuoeawIyxriisbGR9PR0+/APEREhPT29VzUqCwDGGNfYh39o9fa/pwUAY4YYm71vQsUCgDFDSFVDC6fd+w/+srU3+80YE5gFAGOGkC0HKjhW28RPXtpDU2ub29mJSF/72tfYtcu3+duIEf3fmfPLX/4yq1ev7vdz+sICgDFDSN6BcgAOVTXy3KZCl3MTmR599FFmzpzpdjZCwoaBGjOEbD5QwSkTkomP8fDQunw+lzORuBiP29nqtx+8uJNdh6pD+syZ40fy/UtmdXn9/vvvJzY2lltuuYXbbruN7du3s27dOtatW8djjz3GypUr+f73v09TUxNTp07liSeeYMSIEZxzzjk88MAD5OTkAHDbbbfx8ssvM3bsWFatWkVGRgbbtm3jxhtvpL6+nqlTp/L444+TmpraY543b97M7bffTm1tLaNGjeLJJ59k3LhxnHPOOZx++um89tprVFZW8thjj7F48eJ+/zeyGoAxQ0RLWzvbC6uYPzmV2y88gaM1Tfzu3YBrfJkgLF68mPXr1wOQl5dHbW0tLS0trF+/ntmzZ/PDH/6QV
199lS1btpCTk8NPf/rTTzyjrq6OnJwcdu7cydlnn80PfvADAK699lruu+8+duzYwSmnnHL8fHdaWlr4xje+werVq9m8eTNf/epX+e53v3v8emtrKxs3buTBBx8M6nnBCKoGICLLgJ8DHuBRVf1xp+uxwNPAfKAMuFpVC0QkHVgNnAY8qao3B3h2LjBFVU/uV0mMGeZ2l1TT0NLG/MmpLJySzmlZqazeXMTXFk9xO2v91t039YEyf/58Nm/eTHV1NbGxscybN4+8vDzWr1/PpZdeyq5duzjzzDMBaG5uZtGiRZ94RlRUFFdffTUAX/ziF7niiiuoqqqisrKSs88+G4CVK1dy1VVX9ZifvXv38v7773PBBRcA0NbWxrhx445fv+KKK47nu6CgoF9l79BjABARD/AwcAFQBGwSkVxV3eWX7DqgQlWnicgKfBtaXw00At8DTnZ+Oj/7CnwbdxtjerD5QAUAOVm+poRZ45NZvbkIVbXx9H0QExNDdnY2Tz75JGeccQazZ8/mtddeIz8/n+zsbC644AKeffbZXj2zP++DqjJr1izeeeedgNdjY2MB8Hg8tLa29vn3+AumCWgBkK+q+1W1GVgFLO+UZjnwlHO8GlgqIqKqdar6Jr5A8C9EZARwO/DDPufemAiy+UAF45PjGJccD0Bmajy1Ta1UN4TmwyASLV68mAceeIAlS5awePFifvOb3zB37lwWLlzIW2+9RX5+PuBr6vnggw8+cX97e/vxETx/+MMfOOuss0hOTiY1NfV489IzzzxzvDbQnRNPPJHS0tLjAaClpYWdO3eGqqgBBRMAJgD+ww2KnHMB06hqK1AFpPfw3P8G/geo7y6RiNwgInkikldaWhpEdo0ZnjYfqGDe5I87Eiek+AJBUWW3/wuZbixevJiSkhIWLVrEmDFjiIuLY/HixWRkZPDkk09yzTXXMHv2bBYtWsSePXs+cX9iYiIbN27k5JNPZt26ddx1110APPXUU3z7299m9uzZbNu27fj57ni9XlavXs1//Md/cOqppzJnzhzefvvtkJfZnyujgERkDjBVVW8Tkazu0qrqI8AjADk5OTYF0kSkQ5UNlFQ1kuMfAFJ9AaC4ooFZ45PdytqQtnTpUlpaWo6/9v+Wf95557Fp06ZP3PP6668fP66tDdyCPWfOHN59992g8vDkk0/+y31vvPFGt79z1KhRIesDCKYGUAxM9Hud6ZwLmEZEooFkfJ3BXVkE5IhIAfAmcIKIvB5clo2JPB3t//Mnpx0/11EDKK5scCVPZugLJgBsAqaLSLaIeIEVQG6nNLnASuf4SmCddrNgiar+WlXHq2oWcBbwgaqe09vMGxMpdpdUEx0lzBiXdPxcWqKXuJgoiissAAwFN910E3PmzPmXnyeeeMLVPPXYBKSqrSJyM7AW3zDQx1V1p4jcA+Spai7wGPCMiOQD5fiCBADOt/yRgFdELgMu7DSCyBjTg8NVjYwZGUeM5+PvbCLChJT4IV0DiKQRTA8//PCA/47eLhQYVB+Aqq4B1nQ6d5ffcSMQcKCr8y2/u2cXEGCIqDHmY4eqGhiXHPeJ8xNSE4ZsAIiLi6OsrMz2BAiRjg1h4uI++XfSFVsKwpghoKSqkdmZKZ84PyElnp3FVS7kqP8yMzMpKirCRveFTseWkMGyAGBMmFNVSqoaWTbrk9/sMlPjKatrpqG5jXjv0FoTqGMilnGPrQVkTJgrr2umubU9cBOQjQQy/WABwJgwV1Llm0g/1pkB7O/4XAALAKYPLAAYE+YOOR/u41O6qQHYUFDTBxYAjAlzHTWAcQFqAGNGxhEdJRTbchCmDywAGBPmSqoaifEI6YneT1zzRAljk+OsBmD6xAKAMWGupKqBsclxREUFHis/1CeDGfdYADAmzJVUNgZs/ukwITXeagCmTywAGBPmSqoDzwLukJkSz+HqRlra2gcxV2Y4sABgTBhrb1cOV/VcA2hX33pBxvSGBQBjwtixuiZa2jTgE
[base64-encoded PNG data truncated: KDE plot titled 'weibo_len']\n",
75 | "text/plain": [
76 | ""
77 | ]
78 | },
79 | "metadata": {
80 | "needs_background": "light"
81 | },
82 | "output_type": "display_data"
83 | }
84 | ],
85 | "source": [
86 | "df['weibo_len'] = df['content'].astype(str).apply(len)\n",
87 | "sns.kdeplot(df['weibo_len'])\n",
88 | "plt.title('weibo_len')\n",
89 | "plt.show()"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 5,
95 | "metadata": {},
96 | "outputs": [
97 | {
98 | "data": {
99 | "text/plain": [
100 | "(93361, 7)"
101 | ]
102 | },
103 | "execution_count": 5,
104 | "metadata": {},
105 | "output_type": "execute_result"
106 | }
107 | ],
108 | "source": [
109 | "df = df[df.weibo_len<=150]\n",
110 | "df.shape"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "## 3. Remove symbols and non-Chinese characters"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": 6,
123 | "metadata": {},
124 | "outputs": [
125 | {
126 | "data": {
127 | "text/plain": [
128 | "'写在年末冬初孩子流感的第五天,我们仍然没有忘记热情拥抱这2020年的第一天。带着一丝迷信,早晨给孩子穿上红色的羽绒服羽绒裤,祈祷新的一年,孩子们身体康健。仍然会有一丝焦虑,焦虑我的孩子为什么会过早的懂事,从两岁多开始关注我的情绪,会深沉地说:妈妈,你终于笑了!这句话像刀子一样扎入我?展开全文c'"
129 | ]
130 | },
131 | "execution_count": 6,
132 | "metadata": {},
133 | "output_type": "execute_result"
134 | }
135 | ],
136 | "source": [
137 | "df.content[0]"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 7,
143 | "metadata": {},
144 | "outputs": [],
145 | "source": [
146 | "import regex as re\n",
147 | "# remove symbols, digits, and Latin letters\n",
148 | "symbols = \"[a-zA-Z0-9\\s+\\.\\!\\/_,$%^*()??;;:【】+\\\"\\'\\[\\]\\\\]+|[+——!,;:。?《》、~@#¥%……&*()“”.=-]+\"\n",
149 | "df['content'] = df['content'].astype(str).apply(lambda x : re.sub(symbols, '', x))\n",
150 | "\n",
151 | "# pre-cleaning: delete boilerplate such as '展开全文' ('show full text')\n",
152 | "df['content'] = df['content'].astype(str).apply(lambda x : x.replace(\"展开全文\", \"\"))\n",
153 | "\n",
154 | "# remove any remaining non-Chinese characters\n",
155 | "pattern = re.compile(r'[^\\u4e00-\\u9fa5]')\n",
156 | "df['content'] = df['content'].astype(str).apply(lambda x : re.sub(pattern,'',x))"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 8,
162 | "metadata": {},
163 | "outputs": [
164 | {
165 | "data": {
166 | "text/html": [
167 | "[HTML rendering of df.head() lost in extraction; the same five rows appear in the text/plain output below]"
248 | ],
249 | "text/plain": [
250 | " ID time \\\n",
251 | "0 4456072029125500 2020-01-01 23:50:00 \n",
252 | "1 4456074167480980 2020-01-01 23:58:00 \n",
253 | "2 4456054253264520 2020-01-01 22:39:00 \n",
254 | "3 4456061509126470 2020-01-01 23:08:00 \n",
255 | "4 4455979322528190 2020-01-01 17:42:00 \n",
256 | "\n",
257 | " content pic video sentiment \\\n",
258 | "0 写在年末冬初孩子流感的第五天我们仍然没有忘记热情拥抱这年的第一天带着一丝迷信早晨给孩子穿上红... 1 0 0 \n",
259 | "1 开年大模型累到以为自己发烧了腰疼膝盖疼腿疼胳膊疼脖子疼的 0 0 -1 \n",
260 | "2 邱晨这就是我爹爹发烧快好毕竟美好的假期拿来养病不太好假期还是要好好享受快乐爹新年快乐发烧好了... 1 0 1 \n",
261 | "3 新年的第一天感冒又发烧的也太衰了但是我要想着明天一定会好的 1 0 1 \n",
262 | "4 问我们意念里有坏的想法了天神就会给记下来那如果有好的想法也会被记下来吗答那当然了有坏的想法天... 0 0 1 \n",
263 | "\n",
264 | " weibo_len \n",
265 | "0 147 \n",
266 | "1 47 \n",
267 | "2 99 \n",
268 | "3 30 \n",
269 | "4 145 "
270 | ]
271 | },
272 | "execution_count": 8,
273 | "metadata": {},
274 | "output_type": "execute_result"
275 | }
276 | ],
277 | "source": [
278 | "df.head()"
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": 9,
284 | "metadata": {},
285 | "outputs": [
286 | {
287 | "data": {
288 | "text/plain": [
289 | "'写在年末冬初孩子流感的第五天我们仍然没有忘记热情拥抱这年的第一天带着一丝迷信早晨给孩子穿上红色的羽绒服羽绒裤祈祷新的一年孩子们身体康健仍然会有一丝焦虑焦虑我的孩子为什么会过早的懂事从两岁多开始关注我的情绪会深沉地说妈妈你终于笑了这句话像刀子一样扎入我'"
290 | ]
291 | },
292 | "execution_count": 9,
293 | "metadata": {},
294 | "output_type": "execute_result"
295 | }
296 | ],
297 | "source": [
298 | "df.content[0]"
299 | ]
300 | },
301 | {
302 | "cell_type": "markdown",
303 | "metadata": {},
304 | "source": [
305 | "## 2. Word Tokenization"
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": 10,
311 | "metadata": {},
312 | "outputs": [
313 | {
314 | "name": "stdout",
315 | "output_type": "stream",
316 | "text": [
317 | "Model loaded succeed\n"
318 | ]
319 | },
320 | {
321 | "data": {
322 | "text/html": [
323 | "[HTML rendering of df.head(10) lost in extraction; the same ten rows appear in the text/plain output below]"
454 | ],
455 | "text/plain": [
456 | " ID time \\\n",
457 | "0 4456072029125500 2020-01-01 23:50:00 \n",
458 | "1 4456074167480980 2020-01-01 23:58:00 \n",
459 | "2 4456054253264520 2020-01-01 22:39:00 \n",
460 | "3 4456061509126470 2020-01-01 23:08:00 \n",
461 | "4 4455979322528190 2020-01-01 17:42:00 \n",
462 | "5 4455960703574270 2020-01-01 16:28:00 \n",
463 | "6 4456044141311370 2020-01-01 21:59:00 \n",
464 | "7 4456072930597380 2020-01-01 23:53:00 \n",
465 | "8 4456059546766320 2020-01-01 23:00:00 \n",
466 | "9 4456064361730200 2020-01-01 23:19:00 \n",
467 | "\n",
468 | " content pic video sentiment \\\n",
469 | "0 写 在 年末 冬初 孩子 流感 的 第五 天 我们 仍然 没有 忘记 热情 拥抱 这 年 的... 1 0 0 \n",
470 | "1 开 年 大模型 累 到 以为 自己 发烧 了 腰 疼 膝盖 疼 腿 疼 胳膊 疼 脖子 疼 的 0 0 -1 \n",
471 | "2 邱晨 这 就 是 我 爹 爹 发烧 快 好 毕竟 美好 的 假期 拿 来 养 病 不 太 好... 1 0 1 \n",
472 | "3 新年 的 第一 天 感冒 又 发烧 的 也 太 衰 了 但是 我 要 想 着 明天 一定 会... 1 0 1 \n",
473 | "4 问 我们 意念 里 有 坏 的 想法 了 天神 就 会 给 记 下 来 那 如果 有 好 的... 0 0 1 \n",
474 | "5 发高烧 反反复复 眼睛 都 快 睁 不 开 了 今天 室友 带 我 去 看 还 在 发烧 中... 1 0 -1 \n",
475 | "6 明天 考试 今天 发烧 跨年 给 我 跨 坏 了 兰州 兰州 交通 大学 0 0 -1 \n",
476 | "7 元旦 快乐 枇杷 手法 小结 每个 娃 都 是 有 故事 的 娃 每个 大人 也 是 有 故... 1 0 0 \n",
477 | "8 我 真 的 服 了 昨天 去 和 她 说 自己 不 舒服 描述 了 症状 她 说 啊 你 这... 0 0 -1 \n",
478 | "9 新年 第一 天 为 自己 鼓掌 发烧 了 也 要 来 看 线 下 演出 因为 热爱 所以 才... 1 0 1 \n",
479 | "\n",
480 | " weibo_len \n",
481 | "0 147 \n",
482 | "1 47 \n",
483 | "2 99 \n",
484 | "3 30 \n",
485 | "4 145 \n",
486 | "5 139 \n",
487 | "6 28 \n",
488 | "7 143 \n",
489 | "8 144 \n",
490 | "9 127 "
491 | ]
492 | },
493 | "execution_count": 10,
494 | "metadata": {},
495 | "output_type": "execute_result"
496 | }
497 | ],
498 | "source": [
499 | "# tokenize with THULAC (http://thulac.thunlp.org/)\n",
500 | "\n",
501 | "thul = thulac.thulac(seg_only=True) # segmentation only, no POS tagging\n",
502 | "\n",
503 | "df['content'] = df['content'].astype(str).apply(lambda x : thul.cut(x,text = True))\n",
504 | "#text = thul.cut(df.content[0], text=True) # segment a single sentence\n",
505 | "#print(text)\n",
506 | "df.head(10)"
507 | ]
508 | },
509 | {
510 | "cell_type": "markdown",
511 | "metadata": {},
512 | "source": [
513 | "## 4. Remove meaningless words\n",
514 | "Import a stop-word list and filter those words out during the TF-IDF vectorization step,\n",
515 | "e.g. stopwords, adverbs..."
516 | ]
517 | },
518 | {
519 | "cell_type": "code",
520 | "execution_count": 11,
521 | "metadata": {},
522 | "outputs": [
523 | {
524 | "data": {
525 | "text/plain": [
526 | "(1893, 1)"
527 | ]
528 | },
529 | "execution_count": 11,
530 | "metadata": {},
531 | "output_type": "execute_result"
532 | }
533 | ],
534 | "source": [
535 | "import csv\n",
536 | "myfile = './mystopwords.csv'\n",
537 | "\n",
538 | "stopwords = pd.read_csv(myfile, delimiter='\\n', header=None, encoding='utf-8', quoting=csv.QUOTE_NONE)\n",
539 | "stopwords.shape"
540 | ]
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": 12,
545 | "metadata": {
546 | "collapsed": true,
547 | "jupyter": {
548 | "outputs_hidden": true
549 | }
550 | },
551 | "outputs": [
552 | {
553 | "name": "stderr",
554 | "output_type": "stream",
555 | "text": [
556 | "/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:300: UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['lex', '①①', '①②', '①③', '①④', '①⑤', '①⑥', '①⑦', '①⑧', '①⑨', '①a', '①b', '①c', '①d', '①e', '①f', '①g', '①h', '①i', '①o', '②①', '②②', '②③', '②④', '②⑤', '②⑥', '②⑦', '②⑧', '②⑩', '②a', '②b', '②d', '②e', '②f', '②g', '②h', '②i', '②j', '③①', '③⑩', '③a', '③b', '③c', '③d', '③e', '③f', '③g', '③h', '④a', '④b', '④c', '④d', '④e', '⑤a', '⑤b', '⑤d', '⑤e', '⑤f', '12', 'li', 'zxfitl'] not in stop_words.\n",
557 | " 'stop_words.' % sorted(inconsistent))\n"
558 | ]
559 | },
560 | {
561 | "data": {
562 | "text/plain": [
563 | "{'酒精': 161,\n",
564 | " '网页链': 150,\n",
565 | " '社区': 141,\n",
566 | " '口罩': 53,\n",
567 | " '医用': 41,\n",
568 | " '工作': 69,\n",
569 | " '疫情': 129,\n",
570 | " '消毒': 122,\n",
571 | " '小时': 68,\n",
572 | " '病例': 133,\n",
573 | " '确诊': 139,\n",
574 | " '第一': 145,\n",
575 | " '患者': 79,\n",
576 | " '政府': 99,\n",
577 | " '重症': 162,\n",
578 | " '肺炎': 152,\n",
579 | " '疾病': 131,\n",
580 | " '武汉': 116,\n",
581 | " '市场': 70,\n",
582 | " '感冒': 81,\n",
583 | " '病毒': 134,\n",
584 | " '研究': 138,\n",
585 | " '上海': 1,\n",
586 | " '预防': 171,\n",
587 | " '上班': 2,\n",
588 | " '老师': 151,\n",
589 | " '视频': 154,\n",
590 | " '微博': 76,\n",
591 | " '隔离': 169,\n",
592 | " '中国': 6,\n",
593 | " '野味': 163,\n",
594 | " '春节': 109,\n",
595 | " '希望': 71,\n",
596 | " '晚上': 110,\n",
597 | " '出院': 30,\n",
598 | " '影响': 75,\n",
599 | " '感染': 82,\n",
600 | " '加油': 34,\n",
601 | " '致敬': 153,\n",
602 | " '前线': 32,\n",
603 | " '医护': 39,\n",
604 | " '人员': 11,\n",
605 | " '感谢': 84,\n",
606 | " '抗击': 89,\n",
607 | " '新型': 103,\n",
608 | " '第一线': 146,\n",
609 | " '湖北': 123,\n",
610 | " '社会': 140,\n",
611 | " '冠状': 28,\n",
612 | " '动物': 35,\n",
613 | " '发生': 52,\n",
614 | " '出门': 29,\n",
615 | " '今日': 12,\n",
616 | " '发现': 51,\n",
617 | " '新增': 104,\n",
618 | " '消息': 121,\n",
619 | " '评论': 156,\n",
620 | " '发热': 50,\n",
621 | " '困难': 57,\n",
622 | " '控制': 95,\n",
623 | " '建议': 73,\n",
624 | " '医院': 44,\n",
625 | " '支持': 97,\n",
626 | " '基层': 60,\n",
627 | " '防疫': 167,\n",
628 | " '防控': 166,\n",
629 | " '公益': 26,\n",
630 | " '发布': 48,\n",
631 | " '增强': 61,\n",
632 | " '抵抗力': 92,\n",
633 | " '转发': 158,\n",
634 | " '特别': 125,\n",
635 | " '领导': 172,\n",
636 | " '期间': 114,\n",
637 | " '咳嗽': 55,\n",
638 | " '生活': 127,\n",
639 | " '全国': 22,\n",
640 | " '新冠肺炎': 102,\n",
641 | " '月日': 112,\n",
642 | " '医务': 38,\n",
643 | " '新闻': 105,\n",
644 | " '喜欢': 56,\n",
645 | " '防护': 165,\n",
646 | " '公司': 25,\n",
647 | " '身体': 157,\n",
648 | " '医生': 40,\n",
649 | " '措施': 96,\n",
650 | " '平安': 72,\n",
651 | " '健康': 21,\n",
652 | " '辛苦': 159,\n",
653 | " '钟南山': 164,\n",
654 | " '原因': 47,\n",
655 | " '地方': 58,\n",
656 | " '朋友': 113,\n",
657 | " '情况': 80,\n",
658 | " '抗疫': 90,\n",
659 | " '一线': 0,\n",
660 | " '奋战': 64,\n",
661 | " '公共': 24,\n",
662 | " '卫生': 46,\n",
663 | " '传染病': 17,\n",
664 | " '传播': 15,\n",
665 | " '传人': 14,\n",
666 | " '手机': 87,\n",
667 | " '院士': 168,\n",
668 | " '捐赠': 93,\n",
669 | " '卫健委': 45,\n",
670 | " '发烧': 49,\n",
671 | " '症状': 135,\n",
672 | " '检测': 115,\n",
673 | " '治疗': 119,\n",
674 | " '企业': 13,\n",
675 | " '复工': 62,\n",
676 | " '护士': 91,\n",
677 | " '治愈': 118,\n",
678 | " '小区': 67,\n",
679 | " '物资': 124,\n",
680 | " '志愿者': 77,\n",
681 | " '祈福': 142,\n",
682 | " '直播': 136,\n",
683 | " '方案': 106,\n",
684 | " '分享': 31,\n",
685 | " '医疗': 42,\n",
686 | " '战疫': 85,\n",
687 | " '关注': 27,\n",
688 | " '城市': 59,\n",
689 | " '感觉': 83,\n",
690 | " '武汉市': 117,\n",
691 | " '紧急': 147,\n",
692 | " '世界': 4,\n",
693 | " '接触': 94,\n",
694 | " '疑似': 128,\n",
695 | " '孩子': 66,\n",
696 | " '快乐': 78,\n",
697 | " '组织': 148,\n",
698 | " '时间': 107,\n",
699 | " '活动': 120,\n",
700 | " '为什': 9,\n",
701 | " '通知': 160,\n",
702 | " '好好': 65,\n",
703 | " '戴口罩': 86,\n",
704 | " '传染': 16,\n",
705 | " '记者': 155,\n",
706 | " '临床': 8,\n",
707 | " '保护': 19,\n",
708 | " '体温': 18,\n",
709 | " '病人': 132,\n",
710 | " '中心': 7,\n",
711 | " '新冠': 101,\n",
712 | " '中医药': 5,\n",
713 | " '医疗队': 43,\n",
714 | " '大学': 63,\n",
715 | " '相关': 137,\n",
716 | " '力量': 33,\n",
717 | " '呼吸': 54,\n",
718 | " '明天': 108,\n",
719 | " '全球': 23,\n",
720 | " '科学': 143,\n",
721 | " '疫苗': 130,\n",
722 | " '空气': 144,\n",
723 | " '开学': 74,\n",
724 | " '页链': 170,\n",
725 | " '努力': 36,\n",
726 | " '结束': 149,\n",
727 | " '下午': 3,\n",
728 | " '北京': 37,\n",
729 | " '信息': 20,\n",
730 | " '最新': 111,\n",
731 | " '打卡': 88,\n",
732 | " '救治': 100,\n",
733 | " '事件': 10,\n",
734 | " '支援': 98,\n",
735 | " '生命': 126}"
736 | ]
737 | },
738 | "execution_count": 12,
739 | "metadata": {},
740 | "output_type": "execute_result"
741 | }
742 | ],
743 | "source": [
744 | "from sklearn.model_selection import train_test_split\n",
745 | "from sklearn.feature_extraction.text import TfidfVectorizer\n",
746 | "\n",
747 | "\n",
748 | "X_train_pre, X_test_pre, y_train, y_test = train_test_split(df['content'].astype(str), df['sentiment'].tolist(), test_size=0.2, random_state=0)\n",
749 | "\n",
750 | "tfidf = TfidfVectorizer(ngram_range = (1,1), stop_words = stopwords[0].tolist(), analyzer = 'word', min_df=0.01)\n",
751 | "tfidffit = tfidf.fit(X_train_pre)\n",
752 | "X_train = tfidffit.transform(X_train_pre)\n",
753 | "tfidf.vocabulary_"
754 | ]
755 | },
756 | {
757 | "cell_type": "code",
758 | "execution_count": 13,
759 | "metadata": {},
760 | "outputs": [
761 | {
762 | "data": {
763 | "text/plain": [
764 | "<74688x173 sparse matrix of type ''\n",
765 | "\twith 321189 stored elements in Compressed Sparse Row format>"
766 | ]
767 | },
768 | "execution_count": 13,
769 | "metadata": {},
770 | "output_type": "execute_result"
771 | }
772 | ],
773 | "source": [
774 | "X_train"
775 | ]
776 | }
777 | ],
778 | "metadata": {
779 | "kernelspec": {
780 | "display_name": "Python 3",
781 | "language": "python",
782 | "name": "python3"
783 | },
784 | "language_info": {
785 | "codemirror_mode": {
786 | "name": "ipython",
787 | "version": 3
788 | },
789 | "file_extension": ".py",
790 | "mimetype": "text/x-python",
791 | "name": "python",
792 | "nbconvert_exporter": "python",
793 | "pygments_lexer": "ipython3",
794 | "version": "3.7.6"
795 | }
796 | },
797 | "nbformat": 4,
798 | "nbformat_minor": 4
799 | }
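Note on the cleaning cell in 02-tokenization.ipynb: the three passes (strip symbols, drop the '展开全文' marker, strip non-Chinese characters) can probably be collapsed into two steps, because the final character class already covers everything the first regex removes. The sketch below is an illustration under that assumption, not the notebook's code: `clean_weibo`, `NON_CHINESE`, and `BOILERPLATE` are hypothetical names, and it uses the standard-library `re` module rather than the third-party `regex` package the notebook imports. It should match the notebook's result on typical posts, though unusual inputs (for example a marker split by an emoji) may differ.

```python
import re

# Hypothetical names; the notebook applies these steps inline with lambdas.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")  # same range as the notebook
BOILERPLATE = "展开全文"  # Weibo's "show full text" marker


def clean_weibo(text: str) -> str:
    """Keep only Chinese characters, then drop the 'show full text' marker."""
    chinese_only = NON_CHINESE.sub("", str(text))
    return chinese_only.replace(BOILERPLATE, "")


sample = "妈妈,你终于笑了!这是2020年的第一天?展开全文c"
print(clean_weibo(sample))  # 妈妈你终于笑了这是年的第一天
```

With pandas this would be applied as `df['content'].astype(str).apply(clean_weibo)`, in place of the three separate `apply` calls.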
800 |
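Note on the vocabulary printed by the TF-IDF cell: it contains only words of two or more characters, indexed in sorted order, and only 173 terms survive. That pattern is consistent with scikit-learn's default `token_pattern`, stop-word removal, and the `min_df=0.01` document-frequency cutoff. The following is a simplified pure-Python sketch of that selection logic, written for illustration only. `select_vocabulary` is a hypothetical helper, not scikit-learn's implementation, and it ignores options such as `max_df` and `max_features`.

```python
import re
from collections import Counter

# scikit-learn's default token_pattern: tokens of two or more word characters.
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def select_vocabulary(docs, stop_words, min_df):
    """Map each kept term to a column index, assigned in sorted order."""
    stop = set(stop_words)
    doc_freq = Counter()
    for doc in docs:
        # A set counts each term once per document (document frequency).
        doc_freq.update({t for t in TOKEN.findall(doc.lower()) if t not in stop})
    threshold = min_df * len(docs)  # min_df interpreted as a proportion
    kept = sorted(t for t, n in doc_freq.items() if n >= threshold)
    return {term: i for i, term in enumerate(kept)}


# Toy, space-separated documents in the style of the THULAC output.
docs = ["今天 发烧 了", "发烧 咳嗽 口罩", "口罩 的 价格", "发烧 口罩 医院"]
print(select_vocabulary(docs, stop_words=["价格"], min_df=0.5))
# {'发烧': 0, '口罩': 1}
```

Single-character tokens such as '了' and '的' never reach the stop-word check here, which would explain why the notebook's vocabulary has no one-character entries even though the segmenter produces many.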
--------------------------------------------------------------------------------
/SentimentAnalysis/03-preprocessing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n",
10 | "import thulac\n",
11 | "import seaborn as sns\n",
12 | "import matplotlib.pyplot as plt\n",
13 | "import regex as re\n",
14 | "import csv"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 2,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "myfile = '/Users/zhangmeng/Desktop/study/ESCP_SEP/NLP/sentiment/train_ dataset/labled.csv'\n",
24 | "with open(myfile,'r',encoding = 'GB18030', errors = 'ignore') as f:\n",
25 | " raw_data = pd.read_csv(f)"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 3,
31 | "metadata": {},
32 | "outputs": [],
33 | "source": [
34 | "raw_data = raw_data.rename(columns={\"微博id\": \"ID\", \"微博发布时间\": \"time\", '发布人账号':'user','微博中文内容':'content',\n",
35 | " '微博图片':'pic', '微博视频':'video','情感倾向':'sentiment'})"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 30,
41 | "metadata": {},
42 | "outputs": [],
43 | "source": [
44 | "df0 = raw_data\n",
45 | "df0 = df0.drop(columns=['time','user','pic', 'video'])"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "# Understand the data: check the target value distribution and the distribution of Weibo length"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 31,
58 | "metadata": {},
59 | "outputs": [
60 | {
61 | "data": {
62 | "text/plain": [
63 | "Text(0.5,1,'sentiment(target) distribution')"
64 | ]
65 | },
66 | "execution_count": 31,
67 | "metadata": {},
68 | "output_type": "execute_result"
69 | },
70 | {
71 | "data": {
72 | "image/png": "[base64-encoded PNG data truncated: plot titled 'sentiment(target) distribution']
lZgLbCsLXtYVV1UVQWcMzCWJGkEhg2HAr6Q5NIkK1rbgqq6vk3fACxo0wuB9QPP3dDattS+YZL2niQrkqxLsm7Tpk1Dli5J2lrzhuz3m1W1MckjgLVJvju4sKoqSc18efdXVWcAZwAsXbp0h69PkuaqoY4cqmpj+3kT8Cm6awY3tlNCtJ83te4bgQMGnr6otW2pfdEk7ZKkEZk2HJI8JMlDx6eBI4FvA6uA8TuOlgPnt+lVwHHtrqXDgNvb6ac1wJFJ9m4Xoo8E1rRldyQ5rN2ldNzAWJKkERjmtNIC4FPt7tJ5wMeq6vNJLgHOS3ICcB3wktZ/NfA8YAz4CXA8QFVtTvJ24JLW721VtblNvw44G9gd+Fx7SJJGZNpwqKprgadM0n4LcMQk7QWcOMVYZwFnTdK+DnjiEPVKknYCPyEtSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKln6HBIskuSbyb5TJs/KMnFScaSfCLJrq19tzY/1pYvHhjjza396iRHDbQva21jSU6auc2TJG2LrTlyeANw1cD8e4BTq+oxwK3ACa39BODW1n5q60eSQ4CXAk8AlgHvb4GzC/A+4GjgEOBlra8kaUSGCocki4DnAx9u8wGeC3yydVkJHNumj2nztOVHtP7HAOdW1V1V9X1gDDi0Pcaq6tqquhs4t/WVJI3IsEcOfw38KXBfm384cFtV3dPmNwAL2/RCYD1AW3576/+L9gnPmaq9J8mKJOuSrNu0adOQpUuStta04ZDk94CbqurSnVDPFlXVGVW1tKqWzp8/f9TlSNKvrXlD9PkN4AVJngc8CHgY8F5gryTz2tHBImBj678ROADYkGQesCdwy0D7uMHnTNUuSRqBaY8cqurNVbWoqhbTXVC+sKpeAXwReFHrthw4v02vavO05RdWVbX2l7a7mQ4ClgBfBy4BlrS7n3Zt61g1I1snSdomwxw5TOVNwLlJ3gF8EziztZ8J/E2SMWAz3c6eqroyyXnAd4B7gBOr6l6AJK8H1gC7AGdV1ZXbUZckaTttVThU1ZeAL7Xpa+nuNJrY52fAi6d4/juBd07SvhpYvTW1SJJ2HD8hLUnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpZ9pwSPKgJF9P8q0kVyZ5a2s/KMnFScaSfCLJrq19tzY/1pYvHhjrza396iRHDbQva21jSU6a+c2UJG2NYY4c7gKeW1VPAZ4KLEtyGPAe4NSqegxwK3BC638CcGtrP7X1I8khwEuBJwDLgPcn2SXJLsD7gKOBQ4CXtb6SpBGZNhyqc2ebfWB7FPBc4JOtfSVwbJs+ps3Tlh+RJK393Kq6q6q+D4wBh7bHWFVdW1V3A+e2vpKkERnqmkN7h38ZcBOwFvgecFtV3dO6bAAWtumFwHqAtvx24OGD7ROeM1X7ZHWsSLIuybpNmzYNU7okaRsMFQ5VdW9VPRVYRPdO/3E7tKqp6zijqpZW1dL58+ePogRJmhO26m6lqroN+CLwbGCvJPPaokXAxja9ETgAoC3fE7hlsH3Cc6ZqlySNyDB3K81Psleb3h34XeAqupB4Ueu2HDi/Ta9q87TlF1ZVtfaXtruZDgKWAF8HLgGWtLufdqW7aL1qJjZOkrRt5k3fhf2Ale2uogcA51XVZ5J8Bzg3yTuAbwJntv5nAn+TZAzYTLezp6quTHIe8B3gHuDEqroXIMnrgTXALsBZVXXljG2hJGmrTRsOVXU58LRJ2q+lu
/4wsf1nwIunGOudwDsnaV8NrB6iXknSTuAnpCVJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUM88V7mklv2XMGx7p95saSpAEeOUiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqSeacMhyQFJvpjkO0muTPKG1r5PkrVJrmk/927tSXJakrEklyd5+sBYy1v/a5IsH2h/RpIr2nNOS5IdsbGSpOEMc+RwD/DHVXUIcBhwYpJDgJOAC6pqCXBBmwc4GljSHiuA06ELE+Bk4FnAocDJ44HS+rx64HnLtn/TJEnbatpwqKrrq+obbfrHwFXAQuAYYGXrthI4tk0fA5xTnYuAvZLsBxwFrK2qzVV1K7AWWNaWPayqLqqqAs4ZGEuSNAJbdc0hyWLgacDFwIKqur4tugFY0KYXAusHnrahtW2pfcMk7ZOtf0WSdUnWbdq0aWtKlyRthaHDIckewN8Db6yqOwaXtXf8NcO19VTVGVW1tKqWzp8/f0evTpLmrKHCIckD6YLho1X1D635xnZKiPbzpta+EThg4OmLWtuW2hdN0i5JGpFh7lYKcCZwVVX91cCiVcD4HUfLgfMH2o9rdy0dBtzeTj+tAY5Msne7EH0ksKYtuyPJYW1dxw2MJUkagXlD9PkN4A+AK5Jc1tr+DDgFOC/JCcB1wEvastXA84Ax4CfA8QBVtTnJ24FLWr+3VdXmNv064Gxgd+Bz7SFJGpFpw6GqvgpM9bmDIybpX8CJU4x1FnDWJO3rgCdOV4skaefwE9KSpB7DQZLUYzhIknoMB0lSj+EgSeoZ5lZW/Zp70sonzdhYVyy/YsbGkjQ6HjlIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqWfacEhyVpKbknx7oG2fJGuTXNN+7t3ak+S0JGNJLk/y9IHnLG/9r0myfKD9GUmuaM85LUlmeiMlSVtnmCOHs4FlE9pOAi6oqiXABW0e4GhgSXusAE6HLkyAk4FnAYcCJ48HSuvz6oHnTVyXJGknmzYcquorwOYJzccAK9v0SuDYgfZzqnMRsFeS/YCjgLVVtbmqbgXWAsvasodV1UVVVcA5A2NJkkZkW685LKiq69v0DcCCNr0QWD/Qb0Nr21L7hknaJ5VkRZJ1SdZt2rRpG0uXJE1nuy9It3f8NQO1DLOuM6pqaVUtnT9//s5YpSTNSdsaDje2U0K0nze19o3AAQP9FrW2LbUvmqRdkjRC2xoOq4DxO46WA+cPtB/X7lo6DLi9nX5aAxyZZO92IfpIYE1bdkeSw9pdSscNjCVJGpF503VI8nHgcGDfJBvo7jo6BTgvyQnAdcBLWvfVwPOAMeAnwPEAVbU5yduBS1q/t1XV+EXu19HdEbU78Ln2kCSN0LThUFUvm2LREZP0LeDEKcY5CzhrkvZ1wBOnq0OStPP4CWlJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2zJhySLEtydZKxJCeNuh5JmstmRTgk2QV4H3A0cAjwsiSHjLYqSZq7ZkU4AIcCY1V1bVXdDZwLHDPimiRpzpo36gKahcD6gfkNwLMmdkqyAljRZu9McvUMrHtf4OYtdch7ZmAtW2famgB4a3Z8Jfc3/Wv1ytlX0whY03BmY00wO+uaqZoeNWzH2RIOQ6mqM4AzZnLMJOuqaulMjrm9ZmNNMDvrsqbhWNPwZmNdo6hpt
pxW2ggcMDC/qLVJkkZgtoTDJcCSJAcl2RV4KbBqxDVJ0pw1K04rVdU9SV4PrAF2Ac6qqit30upn9DTVDJmNNcHsrMuahmNNw5uNde30mlJVO3udkqRZbracVpIkzSKGgySpx3CQpCbJI0ddw2wx5645JHkc3aevF7amjcCqqrpqdFVpGEkeDbyQ7rbne4F/BT5WVXeMtLABSc6pquNGXYe2TZJvVNXTR13HbDCnjhySvInuqzkCfL09Anx8tn7ZX5LjR13DoCR7jGi9/w34APAg4JnAbnQhcVGSw0dU06oJj08DLxyfH0VNk/Hd8FbZ6R/xn63m1JFDkn8FnlBVP5/QvitwZVUtGU1lU0vyw6o6cNR1jBtVPUmuAJ5aVfcmeTCwuqoOT3IgcH5VPW0ENX0D+A7wYaBobzToPqdDVX15Z9c0mSSfrarnj7iGJwEfojti/xzwpqq6tS37elUdOsr6xiV5XVW9f8Q17Am8GTgWeATd39ZNwPnAKVV1286oY1Z8zmEnug/YH7huQvt+bdlIJLl8qkXAgp1ZC0CSP5pqETCSI4dmHt3ppN3G66iqHyZ54IjqWQq8Afhz4E+q6rIkP50toTBu1MHQnA68BbgIeBXw1SQvqKrvAaP6/fWMOhia84ALgcOr6gb4xdHf8rbsyJ1RxFwLhzcCFyS5hl9+0d+BwGOA14+sqi4AjgJundAe4Gs7vxzeBfxv4J5Jlo3qVOSHgUuSXAz8FvAegCTzgc2jKKiq7gNOTfJ37eeNzL1/U8N6aFV9vk3/ZZJLgc8n+QO6d8b6pcVVdb+v+2wh8Z4kf7iziphTf8hV9fkkj6X7ivDBC9KXVNW9o6uMzwB7VNVlExck+dLOL4dvAP9YVZdOXJDkVSOoh6p6b5J/Ah4P/J+q+m5r3wT89ihqGqhtA/DiJM8HZs3F8dkmyZ5VdTtAVX0xyX8C/h7YZ7SVzTrXJflTYGVV3QiQZAHwSu7/7dU71Jy65qDhJDkYuKWqbh5oe2RV3ZBkwfgfrDSsJC8Hrq2qiwbaHgnsCvxFVb16ZMXNMkn2Bk6iu6vyEa35Rrrvmztl/FrNDq/DcNAwvMVPM82/qa2X5Piq+sjOWNecupVV28Vb/DTT/Jvaem/dWSuaU9cctF0+NOoC9GvHv6lJzJa7Fz2tJEmzSLvrbcq7F6tq/51Rh0cOkjS7zIq7Fz1ykCT1eEFaktRjOEiSegwHSVKP4SBJ6jEcJEk9/x8JolSsrihzTAAAAABJRU5ErkJggg==\n",
73 | "text/plain": [
74 | ""
75 | ]
76 | },
77 | "metadata": {
78 | "needs_background": "light"
79 | },
80 | "output_type": "display_data"
81 | }
82 | ],
83 | "source": [
84 | "df0.sentiment.value_counts().plot.bar()\n",
85 | "plt.title('sentiment(target) distribution')"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 34,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
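94 | "df0['sentiment'] = df0['sentiment'].astype(str)  # compare labels as strings\n",
95 | "valid_values = [\"0\", \"1\", \"-1\"]  # the only accepted labels\n",
96 | "is_valid = df0['sentiment'].isin(valid_values)\n",
97 | "df0 = df0[is_valid].copy()  # copy to avoid SettingWithCopyWarning on later assignments"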
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 36,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "data": {
107 | "text/plain": [
108 | "Text(0.5,1,'sentiment(target) distribution')"
109 | ]
110 | },
111 | "execution_count": 36,
112 | "metadata": {},
113 | "output_type": "execute_result"
114 | },
115 | {
116 | "data": {
117 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEICAYAAAC0+DhzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGIBJREFUeJzt3Xu4XXV95/H3ByKIIjeJKSRg6BBR8AopxLGtjEwh0E7h6aMW6wyRoaaO2NFpZwp2pg+IN5yZp1SeURSFElorMk4tVKMxA1prLZfgBURkOKKYRC6BcBEvIPCdP9bv6Oasc3J2Dkl24Lxfz7Ofs9b399tr/dbZOeuz12XvpKqQJGnQDqMegCRp+2M4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3DQVpHks0mWjXoc45K8N8nbRj2OqSR5cZKvbOZzzkzy1216/yQPJtlxC43nQ0n+rE0fmWTdllhuW96vJbl5Sy1PW4fhoCdscCc1rqqOraoVIxjLRUneNaE2FzgJ+HCb36I7u5lIUkkOHJ+vquuB+5L8m5ksr6q+X1W7VtWj06z3DUm+PMTy3lRV75zJWCZZ58Rt/ceqOmhLLFtbj+Gg2eANwMqq+smWWFiSOVtiOZP4GPAHW2nZQ9tSRx96kqsqH7PoAZwGrAd+CNwMHNXqOwCnA98B7gEuBfZqbQuBApYB3wfuBv5ra1sKPAz8DHgQ+EarfxH4/Tb9BuCfgHOA+4BbgX/Z6muBu4BlA2PcGfifbV13Ah8CdmltRwLrgD9uz7sdOLm1LW/jeLiN5e9b/Urg37bpZwI/AR5rfR4E9gUOB/65je924H8BOw2MqYBTgVuA77ba0e13eD/wQeAfxre5tf974CbgXmAV8NxW/1Jb3o/a+n+31ee3se08xWt3QFvHD4HVbYx/PeE1mjPwO7+19f0u8HrgBcBPgUfbeu9rfS8CzgNWtjH961Z714Tf+Z/SvfbfA14/MK6fv9YD6/7yVNs6vryB/i9oy7gPuBH47YG2i4APAJ9p23I18C9G/Xc0Gx4jH4CPbfhiw0F0O+N92/zC8T804K3AVcACup3zh4GPD/Qr4CPALsBLgIeAF7T2M8d3UgPr+vkOo+0sHgFOBnYE3kW34/9AW9fR7Q9/19b/HOByYC/gWcDfA+9tbUe2ZZ0FPA04DvgxsGdr//lObWAsG4BfGZh/3M6p1Q4DlgBz2vbeBLxtoL3odsh7td/B3sADwO+057yVLpjGt/l4YKzt+OYA/w34yoTlHTjJa/QA8OIpXr9/Bv68/c5+vf3OeuFAF4APAAe1tn2AQwZeiy9PWO5FdAH3Cro3CU+nHw6PDKz7lXQ7+/Hl//y1nmwdE7d18PffXsMxuuDZCXhV266DBsZ2D114z6E7urpk1H9Ls+HhaaXZ5VG6P+6Dkzytqr5XVd9pbW+iOxpYV1UP0e3wXz3hFMo7quonVfUN4Bt0ITGs71bVX1Z3TvwTwH7AWVX1UFV9nu7d/oFJQncE8J+qamNV/RB4D3DiwLJ+1p77s6paSfeOdFPnsPeg2+FMqaquq6qrquqRqvoeXTi+ckK397Yx/YQulG6sqr+tqkeAc4E7Bvq+qfW/qbW/B3hpkuduahxtnHtMLCbZH/gV4M/a7+xLdKE5lceAFybZpapur6obp1nvZVX1T1X1WFX9dIo+4+v+B7p38q+dZpnDWALsCpxdVQ9X1ZXAp4HXDfT5VFVd036PHwNeugXWq2kYDrNIVY0Bb6Pb8d+V5JIk+7bm5wKfSnJfkvvo3jk/CswbWMTgzu/HdH/Uw7pzYPonbTwTa7sCc4FnANcNjOVzrT7unrajGHYs99IdgUwpyfOSfDrJHUkeoNuZ7z2h29qB6X0H56uq6E69jHsu8P6BbdgIhO7U0aY8i+70ykT7AvdW1Y8GardNtoDW53fpAur2JJ9J8vxp1rt2mvbJ1r3vVJ03w77A
2qp6bMKyB39PT+TfnWbIcJhlqupvqupX6XZeBbyvNa0Fjq2qPQYeT6+q9cMsdgsO8W66oDhkYBy7V9WwO4TJxnI98Lxp+pwHfBtYVFW70Z3myCaWfTvdKTgA2hHPgoH2tcAfTPh97lJVU96ummQ+3amVyW7zvB3YM8kzB2r7T7WsqlpVVb9Bd0rp23SnBCduw+OeMtWymsnW/YM2/SO6QB/3S9Msa9APgP2SDO6L9qe7LqYRMhxmkSQHJXlVkp3pLkyOX5iF7qLvu8dPeySZm+T4IRd9J7Bwwh/4jLR3kB8BzknynDaW+UmO2Yyx/PKE2koef4roTuDZSXYfqD2L7jz9g+1d9n+YZj2fAV6U5IR26u1UHr9T/BDw9iSHtG3YPclrphnnK4Er22m9x6mq24A1wDuS7JTkV4FJb3tNMi/J8W1n/hDdabfx1/lOYEGSnabZvsmMr/vXgN8C/nerfx34nSTPaLesnjLheZNt67ir6Y4G/iTJ05Ic2bbrkhmMT1uQ4TC77AycTffu/A7gOcDbW9v76S4Cfz7JD+kuTh8x5HLHdxL3JPnqFhjnaXQXKa9qp3j+L5u+pjDoArprKvcl+btWuxg4LskuAFX1beDjwK2t377AfwZ+j+6c/0forotMqaruBl4D/He6C6YH0+28H2rtn6I7KrukbcM3gWMHFnEmsKKtf/zc/evpQmUqv0f3mmwEzmjbNZkdgD+ie1e+kS50xsPuSro7gu5IcvemtnGCO+hOz/2A7rz/m9rvEbobCB6mC4EVrX3QmfS3FYCqepguDI6l+3f5QeCkgWVrRNKdKpWe2pK8B7irqv5iKy1/B7prDq+vqi/M4PkvBj5cVS/f4oOTZsBwkGaoneq6mu703H+hO7X0y7WFPmwnjZKnlaSZezndhwbvpjs1coLBoKcKjxwkST1DHTkk2SPJJ5N8O8lNSV6eZK8kq5Pc0n7u2fomyblJxpJcn+TQgeUsa/1vycA3diY5LMkN7TnnttsCJUkjMtSRQ5IVwD9W1UfbLXDPoLsPfGNVnZ3kdLqvLzgtyXHAH9J9gvQI4P1VdUSSveju5lhMd0/1dcBhVXVvkmuA/0h3/nYlcG5VfXZTY9p7771r4cKFM9tqSZqFrrvuururau70PbvvKtmkdi/4r9N9X8r4rWcPt3vgj2zdVtB9v8ppdN8pc3H7xOhV7ahjn9Z3dVVtbMtdDSxN8kVgt6q6qtUvBk4ANhkOCxcuZM2aNcNsoyQJSDLpp+onM8xppQPovrjsL5N8LclH24dr5lXV7a3PHfziaxbm8/iP4q9rtU3V101S70myPMmaJGs2bNgwxNAlSTMxTDjMAQ4Fzquql9F9VP70wQ7tKGGrX9muqvOranFVLZ47d6gjI0nSDAwTDuvovl736jb/SbqwuLOdLqL9vKu1r6f7xs1xC1ptU/UFk9QlSSMybThU1R3A2iTjX19wFPAtuq9aGL/jaBlwWZu+HDip3bW0BLi/nX5aBRydZM92Z9PRwKrW9kCSJe0upZMGliVJGoFh/7vDPwQ+1u5UupXuP23ZAbg0ySl0X7E7/p0pK+nuVBqj+0KtkwGqamOSdwLXtn5njV+cBt5M95967EJ3IXqTF6MlSVvXk/ZDcIsXLy7vVpKk4SW5rqoWD9PXr8+QJPUYDpKkHsNBktQz7AXpWW3h6Z8Z9RC2qu+d/ZujHoKk7YxHDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpZ6hwSPK9JDck+XqSNa22V5LVSW5pP/ds9SQ5N8lYkuuTHDqwnGWt/y1Jlg3UD2vLH2vPzZbeUEnS8DbnyOFfVdVLq2pxmz8duKKqFgFXtHmAY4FF7bEcOA+6MAHOAI4ADgfOGA+U1ueNA89bOuMtkiQ9
YU/ktNLxwIo2vQI4YaB+cXWuAvZIsg9wDLC6qjZW1b3AamBpa9utqq6qqgIuHliWJGkEhg2HAj6f5Loky1ttXlXd3qbvAOa16fnA2oHnrmu1TdXXTVLvSbI8yZokazZs2DDk0CVJm2vOkP1+tarWJ3kOsDrJtwcbq6qS1JYf3uNV1fnA+QCLFy/e6uuTpNlqqCOHqlrfft4FfIrumsGd7ZQQ7eddrft6YL+Bpy9otU3VF0xSlySNyLThkOSZSZ41Pg0cDXwTuBwYv+NoGXBZm74cOKndtbQEuL+dfloFHJ1kz3Yh+mhgVWt7IMmSdpfSSQPLkiSNwDCnleYBn2p3l84B/qaqPpfkWuDSJKcAtwGvbf1XAscBY8CPgZMBqmpjkncC17Z+Z1XVxjb9ZuAiYBfgs+0hSRqRacOhqm4FXjJJ/R7gqEnqBZw6xbIuBC6cpL4GeOEQ45UkbQN+QlqS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUs/Q4ZBkxyRfS/LpNn9AkquTjCX5RJKdWn3nNj/W2hcOLOPtrX5zkmMG6ktbbSzJ6Vtu8yRJM7E5Rw5vBW4amH8fcE5VHQjcC5zS6qcA97b6Oa0fSQ4GTgQOAZYCH2yBsyPwAeBY4GDgda2vJGlEhgqHJAuA3wQ+2uYDvAr4ZOuyAjihTR/f5mntR7X+xwOXVNVDVfVdYAw4vD3GqurWqnoYuKT1lSSNyLBHDn8B/AnwWJt/NnBfVT3S5tcB89v0fGAtQGu/v/X/eX3Cc6aq9yRZnmRNkjUbNmwYcuiSpM01bTgk+S3grqq6bhuMZ5Oq6vyqWlxVi+fOnTvq4UjSU9acIfq8AvjtJMcBTwd2A94P7JFkTjs6WACsb/3XA/sB65LMAXYH7hmojxt8zlR1SdIITHvkUFVvr6oFVbWQ7oLylVX1euALwKtbt2XAZW368jZPa7+yqqrVT2x3Mx0ALAKuAa4FFrW7n3Zq67h8i2ydJGlGhjlymMppwCVJ3gV8Dbig1S8A/irJGLCRbmdPVd2Y5FLgW8AjwKlV9ShAkrcAq4AdgQur6sYnMC5J0hO0WeFQVV8Evtimb6W702hin58Cr5ni+e8G3j1JfSWwcnPGIknaevyEtCSpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKknmnDIcnTk1yT5BtJbkzyjlY/IMnVScaSfCLJTq2+c5sfa+0LB5b19la/OckxA/WlrTaW5PQtv5mSpM0xzJHDQ8CrquolwEuBpUmWAO8DzqmqA4F7gVNa/1OAe1v9nNaPJAcDJwKHAEuBDybZMcmOwAeAY4GDgde1vpKkEZk2HKrzYJt9WnsU8Crgk62+AjihTR/f5mntRyVJq19SVQ9V1XeBMeDw9hirqlur6mHgktZXkjQiQ11zaO/wvw7cBawGvgPcV1WPtC7rgPltej6wFqC13w88e7A+4TlT1Scbx/Ika5Ks2bBhwzBDlyTNwFDhUFWPVtVLgQV07/Sfv1VHNfU4zq+qxVW1eO7cuaMYgiTNCpt1t1JV3Qd8AXg5sEeSOa1pAbC+Ta8H9gNo7bsD9wzWJzxnqrokaUSGuVtpbpI92vQuwG8AN9GFxKtbt2XAZW368jZPa7+yqqrVT2x3Mx0ALAKuAa4FFrW7n3aiu2h9+ZbYOEnSzMyZvgv7ACvaXUU7AJdW1aeTfAu4JMm7gK8BF7T+FwB/lWQM2Ei3s6eqbkxyKfAt4BHg1Kp6FCDJW4BV
wI7AhVV14xbbQknSZps2HKrqeuBlk9Rvpbv+MLH+U+A1Uyzr3cC7J6mvBFYOMV5J0jbgJ6QlST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1DPPFe9KT25m7j3oEW9eZ9496BHoK8shBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9UwbDkn2S/KFJN9KcmOSt7b6XklWJ7ml/dyz1ZPk3CRjSa5PcujAspa1/rckWTZQPyzJDe055ybJ1thYSdJwhjlyeAT446o6GFgCnJrkYOB04IqqWgRc0eYBjgUWtcdy4DzowgQ4AzgCOBw4YzxQWp83Djxv6RPfNEnSTE0bDlV1e1V9tU3/ELgJmA8cD6xo3VYAJ7Tp44GLq3MVsEeSfYBjgNVVtbGq7gVWA0tb225VdVVVFXDxwLIkSSOwWdcckiwEXgZcDcyrqttb0x3AvDY9H1g78LR1rbap+rpJ6pOtf3mSNUnWbNiwYXOGLknaDEOHQ5Jdgf8DvK2qHhhsa+/4awuPraeqzq+qxVW1eO7cuVt7dZI0aw0VDkmeRhcMH6uqv23lO9spIdrPu1p9PbDfwNMXtNqm6gsmqUuSRmSYu5UCXADcVFV/PtB0OTB+x9Ey4LKB+kntrqUlwP3t9NMq4Ogke7YL0UcDq1rbA0mWtHWdNLAsSdIIzBmizyuAfwfckOTrrfanwNnApUlOAW4DXtvaVgLHAWPAj4GTAapqY5J3Ate2fmdV1cY2/WbgImAX4LPtIUkakWnDoaq+DEz1uYOjJulfwKlTLOtC4MJJ6muAF043FknStuEnpCVJPYaDJKnHcJAk9RgOkqQew0GS1DPMraySNDIvWvGiUQ9hq7lh2Q2jHsKUPHKQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUs+04ZDkwiR3JfnmQG2vJKuT3NJ+7tnqSXJukrEk1yc5dOA5y1r/W5IsG6gfluSG9pxzk2RLb6QkafMMc+RwEbB0Qu104IqqWgRc0eYBjgUWtcdy4DzowgQ4AzgCOBw4YzxQWp83Djxv4rokSdvYtOFQVV8CNk4oHw+saNMrgBMG6hdX5ypgjyT7AMcAq6tqY1XdC6wGlra23arqqqoq4OKBZUmSRmSm1xzmVdXtbfoOYF6bng+sHei3rtU2VV83SX1SSZYnWZNkzYYNG2Y4dEnSdJ7wBen2jr+2wFiGWdf5VbW4qhbPnTt3W6xSkmalmYbDne2UEO3nXa2+HthvoN+CVttUfcEkdUnSCM00HC4Hxu84WgZcNlA/qd21tAS4v51+WgUcnWTPdiH6aGBVa3sgyZJ2l9JJA8uSJI3InOk6JPk4cCSwd5J1dHcdnQ1cmuQU4Dbgta37SuA4YAz4MXAyQFVtTPJO4NrW76yqGr/I/Wa6O6J2AT7bHpKkEZo2HKrqdVM0HTVJ3wJOnWI5FwIXTlJfA7xwunFIkrYdPyEtSeoxHCRJPYaDJKnHcJAk9RgOkqQew0GS1GM4SJJ6DAdJUo/hIEnqMRwkST2GgySpx3CQJPUYDpKkHsNBktRjOEiSegwHSVKP4SBJ6jEcJEk9hoMkqcdwkCT1GA6SpB7DQZLUYzhIknoMB0lSj+EgSeoxHCRJPYaDJKlnuwmHJEuT3JxkLMnpox6PJM1m20U4JNkR+ABwLHAw8LokB492VJI0e20X4QAcDoxV1a1V9TBwCXD8iMckSbPWnFEPoJkPrB2YXwccMbFTkuXA8jb7YJKbt8HYRmFv4O5ttbK8b1utadbYpq8f78g2W9Ussc1ev7xhm792zx224/YSDkOpqvOB80c9jq0tyZqqWjzqcWhm
fP2e3Hz9OtvLaaX1wH4D8wtaTZI0AttLOFwLLEpyQJKdgBOBy0c8JkmatbaL00pV9UiStwCrgB2BC6vqxhEPa5Se8qfOnuJ8/Z7cfP2AVNWoxyBJ2s5sL6eVJEnbEcNBktRjOEiSeraLC9KzXZLn030ifH4rrQcur6qbRjcqSbOZRw4jluQ0uq8LCXBNewT4uF9A+OSW5ORRj0Ezk2TXUY9h1LxbacSS/D/gkKr62YT6TsCNVbVoNCPTE5Xk+1W1/6jHoc3na+dppe3BY8C+wG0T6vu0Nm3Hklw/VRMwb1uORZsnyR9N1QTM+iMHw2H03gZckeQWfvHlg/sDBwJvGdmoNKx5wDHAvRPqAb6y7YejzfAe4H8Aj0zSNutPuRsOI1ZVn0vyPLqvLR+8IH1tVT06upFpSJ8Gdq2qr09sSPLFbT8cbYavAn9XVddNbEjy+yMYz3bFaw6SZqUkBwH3VNXdA7Vfqqo7ksyrqjtHOLyRMxwkqUny1ao6dNTj2B7M+vNqkjTA/zmpMRwk6Rc+MuoBbC88rSRJ6vHIQZLUYzhIknoMB0lSj+EgSer5/6sbai5eJBpxAAAAAElFTkSuQmCC\n",
118 | "text/plain": [
119 | ""
120 | ]
121 | },
122 | "metadata": {
123 | "needs_background": "light"
124 | },
125 | "output_type": "display_data"
126 | }
127 | ],
128 | "source": [
129 | "df0.sentiment.value_counts().plot.bar()\n",
130 | "plt.title('sentiment(target) distribution')"
131 | ]
132 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": 38,
169 | "metadata": {},
170 | "outputs": [
171 | {
172 | "name": "stderr",
173 | "output_type": "stream",
174 | "text": [
175 | "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
176 | "A value is trying to be set on a copy of a slice from a DataFrame.\n",
177 | "Try using .loc[row_indexer,col_indexer] = value instead\n",
178 | "\n",
179 | "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
180 | " \"\"\"Entry point for launching an IPython kernel.\n"
181 | ]
182 | },
183 | {
184 | "data": {
185 | "text/plain": [
186 | "Text(0.5,1,'weibo length')"
187 | ]
188 | },
189 | "execution_count": 38,
190 | "metadata": {},
191 | "output_type": "execute_result"
192 | },
193 | {
194 | "data": {
195 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEICAYAAABfz4NwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8VfWd+P/X+ya52UlCEiAQIGFxAUTAyKLiUqtiq2KtVhgX2to6/VZrq2NndDouXZzWqa3Wcek44loVLbU/0ymtG2hxAwKCshshkLBm3/e8f3/cc/Eas9wkN7lL3s/HIw/OPedzPvfz4cJ957MeUVWMMcYYV7ALYIwxJjRYQDDGGANYQDDGGOOwgGCMMQawgGCMMcZhAcEYYwxgAcEMQyLyexG5wzk+W0RKApTv3SLyh0Dk1Y/3fktEvhOM9zaRIzrYBTBmqKnq94JdhoEQkbuBKap6dbDLYiKLtRCMMcYAFhBMGBGRb4nIX3xefyIif/R5XSwis5zjE0TkdRGpEJFdIvINn3RPicgvOuX97yJSJiJFInKVz/kUEXlGREpFZJ+I/IeI+PX/RkTmi8h7IlIlIltE5Gyfa2+JyM9F5F0RqRWR10Qkw+f6tc77lYvIHU65viwii4B/B64UkToR2eLzlhO7y88Yf1hAMOHkbWChiLhEZCzgBhYAiMgkIAn4SEQSgdeB54FRwBLgERGZ1k2+Y4AMYBywDHhMRI53rv03kAJMAs4CrgW+1VtBRWQc8FfgF8BI4FbgTyKS6ZPsn5y8Rjl1udW5dxrwCHAVkOW8/zgAVf078J/Ai6qapKon95afMf6ygGDChqruAWqBWcCZwKvAQRE5Ac+X9VpV7QAuAopU9UlVbVPVD4E/AVf0kP0dqtqsqm/j+SL/hohE4Qkmt6tqraoWAb8BrvGjuFcDq1R1lap2qOrrQAHwFZ80T6rqblVtBF5y6gVwOfAXVX1HVVuAOwF/Nh3rLj9j/GKDyibcvA2cDUxxjqvwBIMFzmuAicA8EanyuS8aeLabPCtVtd7n9T5gLJ5WQ4zz2vfaOD/KORG4QkQu9jkXA6zxeX3Y57gBTwsH572LvRdUtUFEyv14z+7yM8YvFhBMuHkbuBjIxdN1UoWna2UB8JCTphh4W1XP8zPPNBFJ9AkKE4CtQBnQiufLfbvPtQN+5FkMPKuq3/WzDL4OAd4uK0QkHkj3uW5bFJtBYV1GJty8DZwDxKtqCbAWWITnC/NDJ83/AceJyDUiEuP8nCoiJ/aQ709FxC0iC/F0Of1RVdvxdL3cIyLJIjIRuAXwZ63BH4CLReQCEYkSkThnzUO2H/eudO49TUTcwN2A+Fw/AuT4O7htjL/sH5QJK6q6G6jDEwhQ1RpgD/Cu8wWOqtYC5+Pp/z+IpyvlXiC2m2wPA5VO2ueA76nqTufaD4B65z3ewTNQ/YQf5SwGFuOZEVSKp8XwY/z4P6eq25z3XYGntVAHHAWanSTemVXlIrKpt/yM8ZfYA3KMCW0ikoSna2yqqu4NdnlM5LIWgjEhSEQuFpEEZwrtfcDHQFFwS2UinQUEY0LTYjxdWAeBqcAStea8GWTWZWSMMQawFoIxxhhHWK1DyMjI0JycnGAXwxhjwsrGjRvLVDWzt3RhFRBycnIoKCgIdjGMMSasiMi+3lNZl5ExxhiHBQRjjDGABQRjjDGOsBpDMMZEjtbWVkpKSmhqagp2USJGXFwc2dnZxMTE9Ot+CwjGmKAoKSkhOTmZnJwcRKT3G0yPVJXy8nJKSkrIzc3tVx7WZWSMCYqmpibS09MtGASIiJCenj6gFpcFBGNM0FgwCKyB/n1aQDAmTNm2MybQLCAYE4ZWrN/Pgl+upqm1PdhFMRHEAoIxYaa2qZX/enUXh2ua2FfeEOziDDvf+c532L7d80TVpKSBP7b6m9/8JitXrhxwPoFgs4yMCTOPr91LRX0LAEXl
9Rw/JjnIJRpeHn/88WAXYdBYQDAmjJTVNfP42j2cMSWDdwrL2FdeH+wiBcRP/7KN7QdrAprntLEjuOvi6d1e//Wvf01sbCw33XQTN998M1u2bGH16tWsXr2a5cuXs2zZMu666y6am5uZPHkyTz75JElJSZx99tncd9995OXlAXDzzTfz2muvMWbMGFasWEFmZiabN2/me9/7Hg0NDUyePJknnniCtLS0Xsu8ceNGbrnlFurq6sjIyOCpp54iKyuLs88+m3nz5rFmzRqqqqpYvnw5CxcuDNjflZd1GRkTRh5aXUhTWwc/XTyd1IQY6zIagIULF7J27VoACgoKqKuro7W1lbVr1zJz5kx+8Ytf8MYbb7Bp0yby8vL47W9/+4U86uvrycvLY9u2bZx11ln89Kc/BeDaa6/l3nvv5aOPPuKkk046dr4nra2t/OAHP2DlypVs3LiRb3/72/zkJz85dr2trY3169fzwAMP+JVff1gLwZgwUVzRwHPr9vGNvGwmZyYxMT0xYgJCT7/JD5ZTTjmFjRs3UlNTQ2xsLHPmzKGgoIC1a9dyySWXsH37dk4//XQAWlpaWLBgwRfycLlcXHnllQBcffXVXHbZZVRXV1NVVcVZZ50FwLJly7jiiit6Lc+uXbvYunUr5513HgDt7e1kZWUdu37ZZZcdK3dRUdGA6t4dCwjGhImXCorpUPjhuccBMHFkAh8WVwa5VOErJiaG3NxcnnrqKU477TRmzpzJmjVrKCwsJDc3l/POO48XXnihT3kOZB2AqjJ9+nTef//9Lq/HxsYCEBUVRVtbW7/fpyfWZWRMmCipbCQrJY4xKXEA5KQncKCykZa2jiCXLHwtXLiQ++67jzPPPJOFCxfy+9//ntmzZzN//nzeffddCgsLAU/X0O7du79wf0dHx7EZQs8//zxnnHEGKSkppKWlHeuOevbZZ4+1Fnpy/PHHU1paeiwgtLa2sm3btkBV1S8WEIwJEwerGhmbEn/s9cT0RDoUSiojo9soGBYuXMihQ4dYsGABo0ePJi4ujoULF5KZmclTTz3F0qVLmTlzJgsWLGDnzp1fuD8xMZH169czY8YMVq9ezZ133gnA008/zY9//GNmzpzJ5s2bj53vidvtZuXKlfzbv/0bJ598MrNmzeK9994LeJ17IuG02jEvL0/tiWlmuDrr12s4OTuVB5fOBqCgqILLf/8+T37rVM45flSQS9d3O3bs4MQTTwx2MSJOV3+vIrJRVfN6u9daCMaEAVXlUHUTWalxx85NTE8EYF9ZZEw9NcHnV0AQkUUisktECkXkti6ux4rIi871dSKS45xPF5E1IlInIg91k3e+iGwdSCWMiXQV9S20tHWQNeKzgJCR5CbBHcW+CusyCgc33HADs2bN+tzPk08+GexifU6vs4xEJAp4GDgPKAE2iEi+qm73SXYdUKmqU0RkCXAvcCXQBNwBzHB+Oud9GVA34FoYE+EOVXu2NM5K/WwMQUTCfuqpqg6bHU8ffvjhQX+PgQ4B+NNCmAsUquoeVW0BVgCLO6VZDDztHK8EzhURUdV6VX0HT2D4HBFJAm4BftHv0hszTBwLCClxnzufk55AUZiuVo6Li6O8vNx2bQ0Q7wNy4uLiek/cDX/WIYwDin1elwDzukujqm0iUg2kA2U95Ptz4DdAj7/eiMj1wPUAEyZM8KO4xkSeQ9WNAMemnHpNSE/gzR1Hae9Qolzh9Zt2dnY2JSUllJaWBrsoEcP7CM3+CsrCNBGZBUxW1Zu94w3dUdXHgMfAM8to8EtnTOg5VN1ETJSQkRj7ufM56Ym0tHdwqLqR7LSEIJWuf7wLw0zo8KfL6AAw3ud1tnOuyzQiEg2kAOU95LkAyBORIuAd4DgRecu/Ihsz/ByqamT0iDhcnVoBE0d6gsD+MB5HMKHDn4CwAZgqIrki4gaWAPmd0uQDy5zjy4HV2kPHoKo+qqpjVTUHOAPYrapn97XwxgwXh6qbPrcozWtihmfq
aZEFBBMAvXYZOWMCNwKvAlHAE6q6TUR+BhSoaj6wHHhWRAqBCjxBAwCnFTACcIvIpcD5nWYoGWN6cai6iVnjU79wPmtEHO5oV8Rsg22Cy68xBFVdBazqdO5On+MmoMvt/JxWQE95F9HFlFRjjIeqcri6iayTvjh7xOUSxqfFh/XUUxM6bKWyMSGuvL6FlvbPL0rzlZOeGLZTT01osYBgTIg73MWiNF8T0hPYX9Fg8/nNgFlAMCbEHazyrEHovCjNKyc9kYaWdkrrmoeyWCYCWUAwJsQdrvGuUu66hTAq2bM2oay2ZcjKZCKTBQRjQtzBKs+itPREd5fX05zzlQ0WEMzAWEAwJsQdrm5kTMoXF6V5pSVYQDCBYQHBmBB3sLqJrBFddxcBpCXEAFDZ0DpURTIRygKCMSHucKcH43SW6rQQquqthWAGxgKCMSGso8OzKK3zLqe+3NEuEt1R1kIwA2YBwZgQVtHgWZTW1T5GvlIT3FTZGIIZIAsIxoSwQ1WeKac9tRAA0hJjbFDZDJgFBGNCmPfBOL21ENIS3NZlZAbMAoIxIcz76MzeWgjWZWQCwQKCMSHsUHUT7ihXt4vSvNISYqyFYAbMAoIxIexoTROZybHdLkrzSktwU93YSlt7xxCVzEQiCwjGhLDSumYykmN7TeddnFbdaK0E038WEIwJYWV1LWQm9dxdBL77GVlAMP1nAcGYEFZe10x6Yu8thGOrlW1g2QyAXwFBRBaJyC4RKRSR27q4HisiLzrX14lIjnM+XUTWiEidiDzkkz5BRP4qIjtFZJuI/CpQFTImUnR0KOX1LWQk+9FCsP2MTAD0GhBEJAp4GLgQmAYsFZFpnZJdB1Sq6hTgfuBe53wTcAdwaxdZ36eqJwCzgdNF5ML+VcGYyFTd2Ep7h/rVQrAdT00g+NNCmAsUquoeVW0BVgCLO6VZDDztHK8EzhURUdV6VX0HT2A4RlUbVHWNc9wCbAKyB1APYyJOmfMENH8GlVOdFoJ1GZmB8CcgjAOKfV6XOOe6TKOqbUA1kO5PAUQkFbgYeLOb69eLSIGIFJSWlvqTpTERoazO8+We0csaBICk2GiiXWJdRmZAgjqoLCLRwAvAg6q6p6s0qvqYquapal5mZubQFtCYIOpLC0FEbLWyGTB/AsIBYLzP62znXJdpnC/5FKDcj7wfAz5R1Qf8SGvMsHIsICT1HhDAWa1cby0E03/+BIQNwFQRyRURN7AEyO+UJh9Y5hxfDqxWVe0pUxH5BZ7A8aO+FdmY4aG8roUol5AaH+NX+rRENxXWQjADEN1bAlVtE5EbgVeBKOAJVd0mIj8DClQ1H1gOPCsihUAFnqABgIgUASMAt4hcCpwP1AA/AXYCm0QE4CFVfTyQlTMmnJXVNTMy0d3rthVeaQkx7C2rH+RSmUjWa0AAUNVVwKpO5+70OW4Crujm3pxusvXvX7kxw1RZXUuvm9r5Sktws6mhahBLZCKdrVQ2JkSV1TWT6ceAspd3ULmX3lpjumUBwZgQVVbX3McWQgyt7Up9S/sglspEMgsIxoSo8roWv2cYgc9q5XobWDb9YwHBmBBU39xGY2u7X2sQvD5brWxTT03/WEAwJgSVO6uU+9RllGj7GZmBsYBgTAgq7cMqZa/Pdjy1gGD6xwKCMSHo2CplP3Y69Uo79kwE6zIy/WMBwZgQ5O0y8udZCF4pzormChtUNv1kAcGYEORtIYzswxhCdJSLEXHRtsGd6TcLCMaEoPK6ZkbERRMbHdWn+9IS3bYFtuk3CwjGhKCyupY+DSh7pSa4bVDZ9JsFBGNCUGldc58GlL3SEmJsUNn0mwUEY0JQeV1znwaUvdKshWAGwAKCMSHIs9Npf7qMrIVg+s8CgjEhpqWtg+rG1j7tY+SVluCmrrmNlraOQSiZiXQWEIwJMd51BOlJ/ekycvYzarRuI9N3FhCMCTF9fZayr2P7GdmzlU0/WEAwJsR4A0JmPwaVU+O921dYC8H0nQUE
Y0JM2bGdTvs3qAxQ3WgtBNN3fgUEEVkkIrtEpFBEbuvieqyIvOhcXyciOc75dBFZIyJ1IvJQp3tOEZGPnXseFBF7xrIxeKacQt92OvXy7mdUZQHB9EOvAUFEooCHgQuBacBSEZnWKdl1QKWqTgHuB+51zjcBdwC3dpH1o8B3ganOz6L+VMCYSFNW10xstItEd9+2rQCfFoJNPTX94E8LYS5QqKp7VLUFWAEs7pRmMfC0c7wSOFdERFXrVfUdPIHhGBHJAkao6gfqeSL4M8ClA6mIMZGizHl0Zn8azUmx0US5xGYZmX7xJyCMA4p9Xpc457pMo6ptQDWQ3kueJb3kCYCIXC8iBSJSUFpa6kdxjQlvpbXN/eouAhARUuJtcZrpn5AfVFbVx1Q1T1XzMjMzg10cYwbdkZomxozoX0AASI2PsUFl0y/+BIQDwHif19nOuS7TiEg0kAKU95Jndi95GjMseQJCXL/vT0mwgGD6x5+AsAGYKiK5IuIGlgD5ndLkA8uc48uB1c7YQJdU9RBQIyLzndlF1wKv9Ln0xkSYxpZ2apraGDWAgJBqXUamn6J7S6CqbSJyI/AqEAU8oarbRORnQIGq5gPLgWdFpBCowBM0ABCRImAE4BaRS4HzVXU78H3gKSAe+JvzY8ywdrTWM/9i9EACQoKbwtK6QBXJDCO9BgQAVV0FrOp07k6f4ybgim7uzenmfAEww9+CGjMcHK72BoT+jyHYoLLpr5AfVDZmODlS61mUNpAWQkp8DLVNbbR3dNtra0yXLCAYE0KO1jgthOSBdBl5FqfV2MCy6SMLCMaEkCM1TcRGuxgR71dvbpdSE2z7CtM/FhCMCSFHapoZPSKuX6uUvWzHU9NfFhCMCSFHapoGNKAMMMI2uDP9ZAHBmBBytLZ5QGsQwMYQTP9ZQDAmRKjqgFcpg2dhGmBTT02fWUAwJkTUNbfR0NI+4C6jFAsIpp8sIBgTIo7UDHwNAkB0lIvk2GjbAtv0mQUEY0KEdw3CqAGsQfAaER9jD8kxfWYBwZgQcaR24NtWeKXajqemHywgGBMivF1GA51lBJ6AYNNOTV9ZQDAmRBypaSIpNpqk2P6vUvZKjXfbwjTTZxYQjAkRR2uaGRWA7iKwh+SY/rGAYEyIOFLTNKBN7Xx5t8Du4TlVxnyBBQRjQsThAGxb4ZUaH0Nbh9LQ0h6Q/MzwYAHBmBCgqhx1NrYLBNvx1PSHBQRjQkBVQyst7R0BCwgptuOp6QcLCMaEgCMBeJayL28LwRanmb7wKyCIyCIR2SUihSJyWxfXY0XkRef6OhHJ8bl2u3N+l4hc4HP+ZhHZJiJbReQFEQnM/wRjwtBn21YEaJaRbYFt+qHXgCAiUcDDwIXANGCpiEzrlOw6oFJVpwD3A/c6904DlgDTgUXAIyISJSLjgJuAPFWdAUQ56YwZlo7UDE4LwTa4M33hTwthLlCoqntUtQVYASzulGYx8LRzvBI4VzyPfFoMrFDVZlXdCxQ6+QFEA/EiEg0kAAcHVhVjwpd3H6PM5EDNMvKMIdhaBNMX/gSEcUCxz+sS51yXaVS1DagG0ru7V1UPAPcB+4FDQLWqvtbVm4vI9SJSICIFpaWlfhTXmPBzpKaZ1IQY4mKiApJfXIwLd7TLdjw1fRKUQWURScPTesgFxgKJInJ1V2lV9TFVzVPVvMzMzKEspjFDJpCL0gBEhFTb8dT0kT8B4QAw3ud1tnOuyzROF1AKUN7DvV8G9qpqqaq2Ai8Dp/WnAsZEgpLKRrJSAzuvwrta2Rh/+RMQNgBTRSRXRNx4Bn/zO6XJB5Y5x5cDq9WzZj4fWOLMQsoFpgLr8XQVzReRBGes4Vxgx8CrY0z4UVWKyuvJSU8MaL6eHU+ty8j4r9dtFVW1TURuBF7FMxvoCVXdJiI/AwpUNR9YDjwrIoVABc6MISfdS8B2oA24QVXbgXUishLY5Jz/EHgs8NUzJvSV1jbT0NJObkZgA0JKvJsDVY0BzdNE
Nr/22VXVVcCqTufu9DluAq7o5t57gHu6OH8XcFdfCmtMJNpbVg9AToADQmpCDNsPVgc0TxPZbKWyMUFWVO4JCLmB7jKKt4fkmL6xgGBMkO0tayAmShgb4EHl1IQYGlraaW6zHU+NfywgGBNkRWX1jB+ZQHRUYP87erevsMVpxl8WEIwJsqLyeiYFePwAICXBs1q5xgKC8ZMFBGOCqKND2VsW+Cmn4BlDANvPyPjPAoIxQXS4ponmto6AzzACSHNaCBX1thbB+MevaacmuNraO1i19TAfFVcRHeUiJkoYNSKOC6aPZlQAtzswQ6/ImXIa6DUIAOlJFhBM31hACGEtbR28vKmER9/+lH3lDcRGu1CFlvYOAO56ZSunTc7g0tnjuGhmVsA2RjNDZ2/54KxBABiZ6AkI5RYQjJ8sIISosrpmlj2xnm0Ha5iZncL/XHMK5504GpdLUFU+OVrHX7YcJH/LQW794xZ+9bcdXLsgh6vnTzz2RWBCX1FZPbHRLrIC9BwEX3ExUSTHRlNW1xzwvE1ksoAQgkoqG7hm+XoOVTfy6FVzWDRjDJ4tnzxEhONGJ/Mv5x/PLecdx3uflvP42j389vXdPLymkK/NHsc3T8/hhDEjglgL44+9ZQ1MTE/A5ZLeE/dDepKb8jprIRj/WEAIMYVHa7n68fU0tLTxh+vmkZczssf0IsLpUzI4fUoGnxyp5Yl3i/jzhyWs2FDM/EkjuWreRM6fPprYaOtOCkWDNeXUKz0p1loIxm82yyiEHKhqZOn/rqOtQ3nxnxf0Ggw6mzo6mV9edhIf3H4ut114AsUVjfzghQ9Z8MvV/HLVjmMDmCY0tHco+8sbBmVA2Ss90VoIxn/WQggRNU2tfOvJ9TS1tvOn/3cax41O7ndeqQluvnfWZK5fOIm1hWW8sG4/j7+zl//5xx4WTs3g6vkT+fKJo4kapG4K45+DVY20tA/OlFOv9KRYNu2vHLT8TWSxgBACWts7+P4fNrGntJ6nvz13QMHAl8slnHVcJmcdl8mRmiZe3FDMC+v388/PbmRSRiLfP2cKi2eNJSbAWyYY/3g3tRuMRWleGUluKupbaO9Q+wXA9Mq+CYJMVfmPP2/lncIyfnnZSZw+JWNQ3mf0iDhuOncqa//1HB65ag5xMVHc+sctnHPfW7yy+QCe5xmZobR3ENcgeKUnuulQqGqwbiPTOwsIQfbUe0W8WFDMDedM5oq88b3fMEDRUS6+clIWf73pDJYvyyMlPoYfrtjMlY99wI5DNYP+/uYze8vqiY+JYvSI2EF7j/QkT962FsH4wwJCEL1bWMYv/rqDL584mn857/ghfW8R4dwTR5N/4xn859dOYveRWr764Fp+uWoHTa22XfJQKCqrZ2J6wuemFAead7WyzTQy/rCAECTFFQ3c8PwmJmUkcv+VJw/aPPTeRLmEf5o3gbduPZtv5I3nf/6xh4v++x02F1cFpTzDye4jdUwelTSo75HhbSHYTCPjB78CgogsEpFdIlIoIrd1cT1WRF50rq8TkRyfa7c753eJyAU+51NFZKWI7BSRHSKyIBAVCgeNLe1895kCOjqU/702j+S4mGAXidQEN7/6+kye/vZc6pvbuOyRd/mvv++kpa0j2EWLSAerGjlQ1cgpE9IG9X3SvdtXWAvB+KHXgCAiUcDDwIXANGCpiEzrlOw6oFJVpwD3A/c6904DlgDTgUXAI05+AL8D/q6qJwAnAzsGXp3wcHf+NnYeruV3S2cP6pTD/jjruExevflMvj4nm0fe+pTFD79rYwuDoGCfZyroqX1ca9JXqQluXGJjCMY//rQQ5gKFqrpHVVuAFcDiTmkWA087xyuBc8XTMboYWKGqzaq6FygE5opICnAmsBxAVVtUdVj0Uby8qeTYIPI5x48KdnG6NCIuhl9fcTKPX5tHaW0zlzz0Dg+vKaS13VoLgbJhbwWJ7ihOzArMFOPuRLmEkYluyqzLyPjBn4AwDij2eV3inOsyjaq2
AdVAeg/35gKlwJMi8qGIPC4iofWr8iAoPFrLT/68lbm5I7n5y8cFuzi9+vK00bx285mcP20Mv351F5c+/C7bDlYHu1gRYUNRBXMmpgX8sZldSU+MtS4j45dgDSpHA3OAR1V1NlAPfGFsAkBErheRAhEpKC0tHcoyBlRzWzs3PPchCe4o/nvp7CH5IgiEkYluHr5qDr+/eg5HappZ/NC73PfqLpuJNADVja3sOlJL3sTB7S7ySk9yW5eR8Ys/30oHAN8J8tnOuS7TiEg0kAKU93BvCVCiquuc8yvxBIgvUNXHVDVPVfMyMzP9KG5oemh1IbuO1HLfFSczehC2Oh5si2Zk8cYtZ3LJrLE8tKaQCx74B2s/Cd8AHUyb9leiCqfmDO6Asld6krUQjH/8CQgbgKkikisibjyDxPmd0uQDy5zjy4HV6ln6mg8scWYh5QJTgfWqehgoFhHv5Ptzge0DrEvI2nGohkff+pTLZo/jnBNCc9zAH6kJbn77jVk8/515uES4Zvl6frjiQ0pr7cumLwqKKoh2CbMmpA7J+9kGd8ZfvQYEZ0zgRuBVPDOBXlLVbSLyMxG5xEm2HEgXkULgFpzuH1XdBryE58v+78ANqurta/gB8JyIfATMAv4zcNUKHe0dym1/+oiU+BjuuKjz5KzwdNqUDP72w4XcdO5U/vbxYc79zVu8sH4/HR22/YU/NhRVMn1cCgnuodlKLCPJTW1zm3XzmV759S9SVVcBqzqdu9PnuAm4opt77wHu6eL8ZiCvL4UNR0++u5ctJdU8uHQ2aRH0JLO4mChuOe84Ljl5LD/588fc/vLH/HnTAe674mQmpCcEu3ghq7mtnS3FVVwzf+KQvad3+4qK+hbGpsYP2fua8BMeI5th6nB1E795bTfnnjCKi2dmBbs4g2LKqCRWXD+f/7p8JjsO1XDh7/7BivX7bbO8bmw9UENzW0efn3UxEJ8tTrNuI9MzCwiD6MHVn9DW0cHdl0wf1P1qgk1E+EbeeP5+85mcPD6V217+mO8+s5HqhtZgFy3kbCiqACBviAaU4bMWQlm9jfWYnllAGCRFZfW8tKGYpXMnMH7k8OhCGZcazx+um8cdF03j7d1HueThd2yVcycFRRVMykg8tsfQUMg4nbvrAAAVm0lEQVRIshaC8Y8FhEFy/xu7iY4SbjxnSrCLMqRcLuG6M3JZcf0Cmlrb+doj7/LnD0uCXayQUN/cxgd7Kpg3aei6i8BnC2ybemp6YQFhEOw4VEP+loN887RcRoXhmoNAOGViGn/5wRnMzE7l5he3cHf+tmG/9cUrmw9S19zG5adkD+n7JrqjiI122eI00ysLCIPgN6/tJik2mu+dNSnYRQmqUclxPPedeXz79Fyeeq+Iqx5fN2zXLKgqz7xfxIlZI5gzyDucdiYiZCTF2jMRTK8sIATY9oM1vLHjCNcvnERqQuRMM+2vmCgXd148jQeunMVHJVV89cG1vL79SLCLNeQ27a9k5+Farpk/MSgTDNKTbHGa6Z0FhAB7bt0+YqNdXLsgJ9hFCSmXzh7Hy//vdEYmuvnuMwXc8NwmjtY2BbtYQ+bZ9/eRHBvNpbPHBuX90xPdlNssI9MLCwgBVN/cxiubD/LVmVmkJAT/oTehZtrYEeTfeAa3nn8cr28/wrn3vc3P/rKdT0vrgl20QVVW18yqjw/z9VOyh2x1cmee/YyshWB6Fpx/nREqf4tn0PCqeROCXZSQ5Y52ceOXprJoRhYPvLGbZz8o4ol397JgUjqLZozhSyeMirhpui9uKKalvYOrh3B1cmfeLiNVjeg1MWZgLCAE0PPr9nP86OQhHzQMR1NGJfHQP82htLaZlwqKWbmxhLvyt3FX/jamjkriopljuXT2WCamh/djMuqa2/jDB/s4bXI6Uwb5+ck9yUiMpaW9g9rmNkaEwCNbTWiygBAgH5dU8/GBan4a4auSAy0zOZYbzpnCDedMYW9ZPat3HuX17Yd54M3d3P/GbmZP
SOU7Z0ziwhljcLnC7+/1zle2cqSmid8tmR3UcqT7LE6zgGC6YwEhQJ5fv4+4GBeXzu78MDnjr9yMRK47I5frzsjlUHUj+ZsPeh43+vwmjh+dzA+/PJVF08MnMPxpYwkvbzrAj748lbm5Q7sYrbMMn8VpuSH2HG8TOmxQOQDqnMHki2eOJSXefvsKhKyUeP75rMm8fvNZ/G7JLNo6Ovj+c5v42qPvsdF5QH0o21Naxx2veB6X+oMvTQ12cY61EGwtgumJBYQAeHPHERpa2rny1PG9JzZ9EuUSFs8ax2s3n8V9V5zMoapGvv7oe/xoxYccqGoMdvG6VNXQwo3Pf4g72sXvlswiKgRaNN6n9B2uHj5TfU3fWZdRAKzeeZSRiW5m22DyoIlyCZefks2FM8bw6Fuf8tjaPaz6+DBXz5/I98+ZPKSbxfWkqKyebz+1gZLKRv7nmlPISgmN5w+kJ7qJj4miuDI0g6gJDRYQBqi9Q3l7dylfOn5USPwmGOkSY6O59YLjWTpvAg++8QlPvbeXFRv2c+Wp41k6dwLHjU4OWtkKiir47jMFADz33XmcOoTPPOiNiJCdFk9JZUOwi2JCmAWEAfpwfyVVDa1h/azkcDQuNZ57L5/J9WdN4ndvfMIfPtjHk+8WMWdCKl+dOZZ5uSM5MWvEkATp2qZWHlpTyBPv7CU7LYEnv3kqOSE4cOsJCNZCMN2zgDBAq3ceJcolnHlcZrCLMixNzkziwaWzKa+bxsubDvBSQTE//7/tACTHRTMtawTZaQlkp8UzekQcqQkxpMTHkJbgJiPJTVqim5io/g2lNbW288rmA/z61d2U1zdzxSnZ/PtXTgzZPayy0xLYtL8q2MUwIcyvgCAii4DfAVHA46r6q07XY4FngFOAcuBKVS1yrt0OXAe0Azep6qs+90UBBcABVb1owLUJgtU7j5I3Mc1mFwVZelIs3z1zEt89cxIHqxrZUFTBB3sqKDxay3uflnG4ponunuo5MtHN+LR4skcmMD4tgYnpCUwY6QkiqfFukuOiEYGapjaO1DTx6dE6/rb1MG/uOEJ9SzunTEzjiW/mMTM7dWgr3UfZafFUN7ZS09RqaxFMl3oNCM6X9sPAeUAJsEFE8lV1u0+y64BKVZ0iIkuAe4ErRWQasASYDowF3hCR41S13bnvh8AOYETAajSEDlY1svNwLbdfeEKwi2J8jE2NZ/GscSye9dmakJa2DirqW6hqbKGqoZXK+hbK61sor2vhcE0TJZUNbD9Yw2vbDtPa/vnIIeLZtbWl7bPnOaQlxHDJrLFcNHMsp01OD4vFiNlpni1BSioamTbWAoL5In9aCHOBQlXdAyAiK4DFgG9AWAzc7RyvBB4Sz/+QxcAKVW0G9opIoZPf+yKSDXwVuAe4JQB1GXJrdh0F4Es2fhDy3NEuxqTEMSal5wcWtXcoh2ua2Fdez4HKRmqa2qhpbKWptZ2MpFhGp8QxLjWOmdmp/e5qCpbxIz0znkoqG5g2Nix/BzODzJ+AMA4o9nldAszrLo2qtolINZDunP+g073eX9seAP4V6HFaiIhcD1wPMGFCaG0at2bnUbLT4oO6R40JrCiXMC41nnGpoTFdNJCOtRBsYNl0Iyi/4ojIRcBRVd3YW1pVfUxV81Q1LzMzdAZum1rbebewnC+dMCosuguMSUuIIcEdZQHBdMufgHAA8F2Cm+2c6zKNiEQDKXgGl7u793TgEhEpAlYAXxKRP/Sj/EGzfm8Fja3tNt3UhA3vWoRiW4tguuFPQNgATBWRXBFx4xkkzu+UJh9Y5hxfDqxWVXXOLxGRWBHJBaYC61X1dlXNVtUcJ7/Vqnp1AOozZAqKKnAJzA2hxUfG9GZ8WoK1EEy3eh1DcMYEbgRexTPt9AlV3SYiPwMKVDUfWA486wwaV+D5ksdJ9xKeAeg24AafGUZhbeP+Sk7MGkFirC3lMOEjOy2e9UUVwS6GCVF+fZup6ipgVadzd/ocNwFXdHPvPXhm
EnWX91vAW/6UI1S0tXeweX8VXz8lO9hFMaZPstMSqG1qo7qx1dbOmC8Ir3lzIWLXkdpjC5KMCSfZaZ9NPTWmMwsI/bDJ2Y/fHpVpwo136mlxhY0jmC+ygNAPm/ZXMSo59thvW8aEC9/FacZ0ZgGhHzbuq+SUiWm2/sCEnZT4GJJio22mkemSBYQ+OlrbxP6KBhs/MGHps+ciWEAwX2QBoY827fNsH2xPRzPhyh6UY7pjAaGPNu2vxB3lYsY42xzMhKdsZ3GadrcfuBm2LCD00cZ9lZyUnUJsdFSwi2JMv2SnxVPX7FmLYIwvCwh90NzWzscl1TZ+YMKa7XpqumMBoQ+2Hayhpb3D1h+YsGaL00x3LCD0wbEFaRND+1GJxvRkQrqnhfBpaX2QS2JCjQWEPthSUs3YlDhGJff81C1jQtmIuBgmjExg28HqYBfFhBgLCH2wpbiKk8db68CEv5OyU/ioxAKC+TwLCH6qrG9hf0WDBQQTEWaOS6GkspHK+pZgF8WEEAsIftpS4lmQdnK2BQQT/k4alwLAxweslWA+YwHBT1uKqxHxNLWNCXfTLSCYLlhA8NOWkiqmjkoiyZ6QZiJASnwMOekJfGzjCMaHBQQ/qCpbiquYad1FJoKclJ1qLQTzORYQ/HCgqpHy+hYbUDYRZea4FM+/7brmYBfFhAi/AoKILBKRXSJSKCK3dXE9VkRedK6vE5Ecn2u3O+d3icgFzrnxIrJGRLaLyDYR+WGgKjQYthR7fouaZS0EE0Fm2DiC6aTXgCAiUcDDwIXANGCpiEzrlOw6oFJVpwD3A/c6904DlgDTgUXAI05+bcC/qOo0YD5wQxd5howtJVW4o10cPyY52EUxJmC8O/baOILx8qeFMBcoVNU9qtoCrAAWd0qzGHjaOV4JnCuex4ktBlaoarOq7gUKgbmqekhVNwGoai2wAxg38OoMjs3FVUwfOwJ3tPWwmciRHBfDpMxEPrIWgnH48w03Dij2eV3CF7+8j6VR1TagGkj3516ne2k2sK6rNxeR60WkQEQKSktL/ShuYLV3KFsPVNv6AxORThqXwlYLCMYR1F95RSQJ+BPwI1Wt6SqNqj6mqnmqmpeZmTm0BQQKj9bR0NLOyeNt/YGJPCeNS+FQdRNHa5uCXRQTAvwJCAeA8T6vs51zXaYRkWggBSjv6V4RicETDJ5T1Zf7U/ihsLnYs8OptRBMJPJOpbZWggH/AsIGYKqI5IqIG88gcX6nNPnAMuf4cmC1ep7Plw8scWYh5QJTgfXO+MJyYIeq/jYQFRks6/dWkpYQQ056YrCLYkzATR87Apd89qxwM7z1GhCcMYEbgVfxDP6+pKrbRORnInKJk2w5kC4ihcAtwG3OvduAl4DtwN+BG1S1HTgduAb4kohsdn6+EuC6DZiq8sGecuZPSsflkmAXx5iAS4yNZm7uSP6+7XCwi2JCgF/7MKjqKmBVp3N3+hw3AVd0c+89wD2dzr0DhPw3bHFFIweqGvnnsyYFuyjGDJqvnpTFHa9sY/eRWo4bbVOrhzObR9mDD/aUA7BgUnqQS2LM4LlgxhhE4K8fHQp2UUyQWUDowft7yslIcjNlVFKwi2LMoBmVHMe83JH89WMLCMOdBYRuqCrvf1rOvEnpeMbAjYlcXz0pi8Kjdew+UhvsopggsoDQjX3lDRyuabLuIjMsWLeRAQsI3XrfO34w2QKCiXyjkuOYmzOSVdZtNKxZQOjG+5+Wk5kcy6QMW39ghoeLZmbxiXUbDWsWELqgqry/p5wFNn5ghhFvt9H/92HnjQjMcGEBoQt7yuoprW227iIzrIxKjuPCGWN46r0i29tomLKA0IX3P/WMH8y3AWUzzPz4ghNoaevggTc+CXZRTBBYQOhC/paDTExPICc9IdhFMWZI5WYkcvX8iaxYv59PbCxh2LGA0Enh0VrW761g6dwJNn5ghqWbzp1KojuaX/1tZ7CLYoaYBYROnlu3n5go4YpTsoNdFGOCYmSi
mxu+NIU3dx7lvcKyYBfHDCELCD4aW9r508YSFs3IIj0pNtjFMSZovnlaDtlp8fzLH7dwoKox2MUxQ8QCgo//++ggNU1tXDVvQrCLYkxQxcVE8dg1edQ1t3HN8nWU1zUHu0hmCFhA8PHcuv1MzkxkXu7IYBfFmKCbNnYEy5edyoHKRr711AbqmtuCXSQzyCwgOLYdrGZzcRVXzZtog8nGOObmjuSRq+aw7WANSx/7wFYxRzgLCEB7h/Lb13YTG+3i63NsMNkYX+eeOJpHr5rDgapGvvrgWv77zU9obe8IdrHMILCAANzz1x28ufMo/7roBFISYoJdHGNCzvnTx/D6zWdy/vQx/Ob13Zx//z944p29VDe2BrtoJoBEVYNdBr/l5eVpQUFBQPNc/s5efv5/2/nW6TncdfH0gOZtTCR6ffsRHn2rkE37q4iLcXHhjCzOPj6TM6Zk2Oy8ECUiG1U1r7d0fj1TWUQWAb8DooDHVfVXna7HAs8ApwDlwJWqWuRcux24DmgHblLVV/3Jc7A1trTz4ob9/OKv27lg+mj+46vThvLtjQlb500bzXnTRrP1QDXPrdvH37Ye5s/OhngnjElmxrgUpo8dwYlZI5iYnsDo5DhcLhuXCwe9thBEJArYDZwHlAAbgKWqut0nzfeBmar6PRFZAnxNVa8UkWnAC8BcYCzwBnCcc1uPeXalvy2Ejg6lsqGFIzXNHKlp4s2dR3jlw4PUNrcxL3ckT31rLvHuqD7na4zxjMFtPVDNP3aXsmFfJdsPVlNW13LsujvKxbi0eDKTYslIdpOeGEtKfAzJcdEkx8WQ4I4iLiaKeHcU7igX7mgXsdEuoqOEaJeLmCghyuX8iOByCS7xHCPgEnCJ4Hnp+bMzVVAUVehQRZ1z+Jz3dSwvlydv73v4HosQNhNQAtlCmAsUquoeJ+MVwGLA98t7MXC3c7wSeEg8f1OLgRWq2gzsFZFCJz/8yDNgzrv/bT4trT/2OjbaxVdOymLJqeOZmzsybD5UY0JRlEs4eXwqJ49PBTzbxx+tbWbX4VqKKxvYX9FASWUjpbXN7DxcS3ldObVNrXSET291t8QbHPgsiODzdSKd0n52/vPfOd19Bfme3njHecTFDO4vrv4EhHFAsc/rEmBed2lUtU1EqoF05/wHne4d5xz3licAInI9cL3zsk5EdvlR5l7tBh7o2y0ZQCSv47f6hb9Ir2Ok1w96qGP8zweU70R/Evk1hhBMqvoY8FiwyyEiBf40ucKV1S/8RXodI71+EPw6+jPt9AAw3ud1tnOuyzQiEg2k4Blc7u5ef/I0xhgzhPwJCBuAqSKSKyJuYAmQ3ylNPrDMOb4cWK2e0ep8YImIxIpILjAVWO9nnsYYY4ZQr11GzpjAjcCreKaIPqGq20TkZ0CBquYDy4FnnUHjCjxf8DjpXsIzWNwG3KCq7QBd5Rn46gVU0LutBpnVL/xFeh0jvX4Q5DqG1cI0Y4wxg8e2rjDGGANYQDDGGOOwgNALEVkkIrtEpFBEbgt2eQJBRIpE5GMR2SwiBc65kSLyuoh84vyZFuxy9oWIPCEiR0Vkq8+5LuskHg86n+lHIjIneCX3Tzf1u1tEDjif42YR+YrPtdud+u0SkQuCU+q+EZHxIrJGRLaLyDYR+aFzPiI+xx7qFzqfo6raTzc/eAa8PwUmAW5gCzAt2OUKQL2KgIxO5/4LuM05vg24N9jl7GOdzgTmAFt7qxPwFeBveBaCzgfWBbv8/azf3cCtXaSd5vxbjQVynX/DUcGugx91zALmOMfJeNaPTouUz7GH+oXM52gthJ4d27ZDVVsA7xYbkWgx8LRz/DRwaRDL0meq+g88M9x8dVenxcAz6vEBkCoiWUNT0v7ppn7dObZljKruBXy3jAlZqnpIVTc5x7XADjw7G0TE59hD/boz5J+jBYSedbVtR08fYLhQ4DUR2ehsDQIwWlUPOceHgdHBKVpAdVenSPpcb3S6S57w6eYL+/qJSA4w
G1hHBH6OneoHIfI5WkAYns5Q1TnAhcANInKm70X1tFcjaj5yJNYJeBSYDMwCDgG/CW5xAkNEkoA/AT9S1Rrfa5HwOXZRv5D5HC0g9Cwit9hQ1QPOn0eBP+Nphh7xNredP48Gr4QB012dIuJzVdUjqtquqh3A//JZd0LY1k9EYvB8WT6nqi87pyPmc+yqfqH0OVpA6FnEbbEhIokikuw9Bs4HtvL57UeWAa8Ep4QB1V2d8oFrnVkq84Fqny6JsNGpv/xreD5H6H7LmJAmIoJn14Mdqvpbn0sR8Tl2V7+Q+hyDPfIe6j94ZjLsxjPC/5NglycA9ZmEZ+bCFmCbt054tit/E/gEz4OMRga7rH2s1wt4mtutePpar+uuTnhmpTzsfKYfA3nBLn8/6/esU/6P8Hx5ZPmk/4lTv13AhcEuv591PANPd9BHwGbn5yuR8jn2UL+Q+Rxt6wpjjDGAdRkZY4xxWEAwxhgDWEAwxhjjsIBgjDEGsIBgjDHGYQHBGGMMYAHBGGOM4/8Hf4FXXq/kN5EAAAAASUVORK5CYII=\n",
196 | "text/plain": [
197 | ""
198 | ]
199 | },
200 | "metadata": {
201 | "needs_background": "light"
202 | },
203 | "output_type": "display_data"
204 | }
205 | ],
206 | "source": [
207 | "df0['weibo_len'] = df0['content'].astype(str).apply(len)\n",
208 | "sns.kdeplot(df0['weibo_len'])\n",
209 | "plt.title('weibo length')"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 39,
215 | "metadata": {},
216 | "outputs": [
217 | {
218 | "data": {
219 | "text/plain": [
220 | "Text(0.5,1,'weibo length new')"
221 | ]
222 | },
223 | "execution_count": 39,
224 | "metadata": {},
225 | "output_type": "execute_result"
226 | },
227 | {
228 | "data": {
229 | "image/png": "<base64-encoded PNG omitted: KDE plot titled 'weibo length new'>",
230 | "text/plain": [
231 | ""
232 | ]
233 | },
234 | "metadata": {
235 | "needs_background": "light"
236 | },
237 | "output_type": "display_data"
238 | }
239 | ],
240 | "source": [
241 | "df1 = df0[df0.content.astype(str).apply(len) <= 170].copy()\n",
242 | "sns.kdeplot(df1['weibo_len'])\n",
243 | "plt.title('weibo length new')"
244 | ]
245 | },
246 | {
247 | "cell_type": "markdown",
248 | "metadata": {},
249 | "source": [
250 | "# Pre-cleaning: remove boilerplate strings such as 'show more' (展开全文) and non-Chinese characters"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": 40,
256 | "metadata": {},
257 | "outputs": [
258 | {
259 | "name": "stderr",
260 | "output_type": "stream",
261 | "text": [
262 | "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
263 | "A value is trying to be set on a copy of a slice from a DataFrame.\n",
264 | "Try using .loc[row_indexer,col_indexer] = value instead\n",
265 | "\n",
266 | "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
267 | " \n",
268 | "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: \n",
269 | "A value is trying to be set on a copy of a slice from a DataFrame.\n",
270 | "Try using .loc[row_indexer,col_indexer] = value instead\n",
271 | "\n",
272 | "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
273 | " \"\"\"\n"
274 | ]
275 | }
276 | ],
277 | "source": [
278 | "# remove the \"show more\" marker (展开全文)\n",
279 | "df1['content'] = df1['content'].astype(str).apply(lambda x : x.replace(\"展开全文\", \"\"))\n",
280 | "# remove symbols, digits, whitespace and Latin letters\n",
281 | "symbols = \"[a-zA-Z0-9\\s+\\.\\!\\/_,$%^*()??;;:【】+\\\"\\'\\[\\]\\\\]+|[+——!,;:。?《》、~@#¥%……&*()“”.=-]+\"\n",
282 | "df1['content'] = df1['content'].astype(str).apply(lambda x : re.sub(symbols, '', x))"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": 41,
288 | "metadata": {},
289 | "outputs": [
290 | {
291 | "name": "stderr",
292 | "output_type": "stream",
293 | "text": [
294 | "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
295 | "A value is trying to be set on a copy of a slice from a DataFrame.\n",
296 | "Try using .loc[row_indexer,col_indexer] = value instead\n",
297 | "\n",
298 | "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
299 | " \n"
300 | ]
301 | }
302 | ],
303 | "source": [
304 | "pattern = re.compile(r'[^\\u4e00-\\u9fa5]')\n",
305 | "df1['content'] = df1['content'].astype(str).apply(lambda x: re.sub(pattern, '', x))"
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": 42,
311 | "metadata": {},
312 | "outputs": [
313 | {
314 | "name": "stdout",
315 | "output_type": "stream",
316 | "text": [
317 | "Model loaded succeed\n"
318 | ]
319 | },
320 | {
321 | "name": "stderr",
322 | "output_type": "stream",
323 | "text": [
324 | "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
325 | "A value is trying to be set on a copy of a slice from a DataFrame.\n",
326 | "Try using .loc[row_indexer,col_indexer] = value instead\n",
327 | "\n",
328 | "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
329 | " \n"
330 | ]
331 | }
332 | ],
333 | "source": [
334 | "# tokenization with THULAC (http://thulac.thunlp.org/)\n",
335 | "thu1 = thulac.thulac(seg_only=True)  # segmentation only, no part-of-speech tagging\n",
336 | "df1['content'] = df1['content'].astype(str).apply(lambda x : thu1.cut(x,text = True))"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": 43,
342 | "metadata": {},
343 | "outputs": [
344 | {
345 | "data": {
346 | "text/plain": [
347 | "'写 在 年末 冬初 孩子 流感 的 第五 天 我们 仍然 没有 忘记 热情 拥抱 这 年 的 第一 天 带 着 一 丝 迷信 早晨 给 孩子 穿 上 红色 的 羽绒服 羽绒 裤祈祷 新 的 一 年 孩子 们 身体 康健 仍然 会 有 一 丝 焦虑 焦虑 我 的 孩子 为什 么 会 过早 的 懂事 从 两 岁 多 开始 关注 我 的 情绪 会 深沉 地 说 妈妈 你 终于 笑 了 这 句 话 像 刀子 一样 扎入 我'"
348 | ]
349 | },
350 | "execution_count": 43,
351 | "metadata": {},
352 | "output_type": "execute_result"
353 | }
354 | ],
355 | "source": [
356 | "df1.content[0]"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 44,
362 | "metadata": {},
363 | "outputs": [],
364 | "source": [
365 | "df1.to_csv(\"precleaned.csv\")"
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": null,
371 | "metadata": {},
372 | "outputs": [],
373 | "source": [
374 | "myfile = './mystopwords.csv'\n",
375 | "stopwords = pd.read_csv(myfile, delimiter='\\n', header=None, encoding='utf-8', quoting=csv.QUOTE_NONE)\n",
376 | "stopwords.shape"
377 | ]
378 | },
379 | {
380 | "cell_type": "code",
381 | "execution_count": null,
382 | "metadata": {},
383 | "outputs": [],
384 | "source": [
385 | "# thu1 = thulac.thulac(seg_only=True)  # segmentation only, no part-of-speech tagging\n",
386 | "# df1 = df0\n",
387 | "# df1['content'] = df1['content'].astype(str).apply(lambda x : thu1.cut(x,text = True))\n",
388 | "# #text = thu1.cut(df0.content[0], text=True)  # segment a single sentence\n",
389 | "# #print(text)\n",
390 | "# df1.head(10)"
391 | ]
392 | },
393 | {
394 | "cell_type": "code",
395 | "execution_count": null,
396 | "metadata": {},
397 | "outputs": [],
398 | "source": [
399 | "count_vect = CountVectorizer(analyzer='word', token_pattern=r'\\w{1,}')\n",
400 | "count_vect.fit(df1['content'])\n",
401 | "xtrain_count = count_vect.transform(train_df['text_cut'])"
402 | ]
403 | }
404 | ],
405 | "metadata": {
406 | "kernelspec": {
407 | "display_name": "py_35_env",
408 | "language": "python",
409 | "name": "py_35_env"
410 | },
411 | "language_info": {
412 | "codemirror_mode": {
413 | "name": "ipython",
414 | "version": 3
415 | },
416 | "file_extension": ".py",
417 | "mimetype": "text/x-python",
418 | "name": "python",
419 | "nbconvert_exporter": "python",
420 | "pygments_lexer": "ipython3",
421 | "version": "3.5.4"
422 | }
423 | },
424 | "nbformat": 4,
425 | "nbformat_minor": 4
426 | }
427 |
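The pre-cleaning cells in this notebook can be summarized in a small standalone sketch. It assumes only the standard library; the names `clean_weibo_text` and `filter_and_clean` are hypothetical helpers and do not appear in the notebook. Because the final pattern keeps only characters in U+4E00..U+9FA5, it already removes everything the earlier `symbols` pattern targets, so the sketch applies just the marker removal and the Chinese-only filter.

```python
import re

# Marker that Weibo appends to truncated posts ("show more").
SHOW_MORE = "展开全文"
# Anything outside the range U+4E00..U+9FA5, mirroring the notebook's pattern.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")


def clean_weibo_text(text):
    """Hypothetical helper: drop the 'show more' marker, keep Chinese characters only."""
    text = str(text).replace(SHOW_MORE, "")
    return NON_CHINESE.sub("", text)


def filter_and_clean(posts, max_len=170):
    """Keep posts of at most max_len characters (measured before cleaning), then clean them."""
    return [clean_weibo_text(p) for p in posts if len(str(p)) <= max_len]


if __name__ == "__main__":
    print(filter_and_clean(["今天天气好!Hello 123 展开全文", "x" * 200]))
```

With pandas, the same function could be applied through `df1['content'].apply(clean_weibo_text)` on a frame created with `.copy()`, which is one common way to avoid the `SettingWithCopyWarning` recorded in the cell outputs.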
--------------------------------------------------------------------------------
/SentimentAnalysis/stopwords.txt:
--------------------------------------------------------------------------------
1 | !
2 | "
3 | #
4 | $
5 | %
6 | &
7 | '
8 | (
9 | )
10 | *
11 | +
12 | ,
13 | -
14 | --
15 | .
16 | ..
17 | ...
18 | ......
19 | ...................
20 | ./
21 | .一
22 | .数
23 | .日
24 | /
25 | //
26 | 0
27 | 1
28 | 2
29 | 3
30 | 4
31 | 5
32 | 6
33 | 7
34 | 8
35 | 9
36 | :
37 | ://
38 | ::
39 | ;
40 | <
41 | =
42 | >
43 | >>
44 | ?
45 | @
46 | A
47 | Lex
48 | [
49 | \
50 | ]
51 | ^
52 | _
53 | `
54 | exp
55 | sub
56 | sup
57 | |
58 | }
59 | ~
60 | ~~~~
61 | ·
62 | ×
63 | ×××
64 | Δ
65 | Ψ
66 | γ
67 | μ
68 | φ
69 | φ.
70 | В
71 | —
72 | ——
73 | ———
74 | ‘
75 | ’
76 | ’‘
77 | “
78 | ”
79 | ”,
80 | …
81 | ……
82 | …………………………………………………③
83 | ′∈
84 | ′|
85 | ℃
86 | Ⅲ
87 | ↑
88 | →
89 | ∈[
90 | ∪φ∈
91 | ≈
92 | ①
93 | ②
94 | ②c
95 | ③
96 | ③]
97 | ④
98 | ⑤
99 | ⑥
100 | ⑦
101 | ⑧
102 | ⑨
103 | ⑩
104 | ──
105 | ■
106 | ▲
107 |
108 | 、
109 | 。
110 | 〈
111 | 〉
112 | 《
113 | 》
114 | 》),
115 | 」
116 | 『
117 | 』
118 | 【
119 | 】
120 | 〔
121 | 〕
122 | 〕〔
123 | ㈧
124 | 一
125 | 一.
126 | 一一
127 | 一下
128 | 一个
129 | 一些
130 | 一何
131 | 一切
132 | 一则
133 | 一则通过
134 | 一天
135 | 一定
136 | 一方面
137 | 一旦
138 | 一时
139 | 一来
140 | 一样
141 | 一次
142 | 一片
143 | 一番
144 | 一直
145 | 一致
146 | 一般
147 | 一起
148 | 一转眼
149 | 一边
150 | 一面
151 | 七
152 | 万一
153 | 三
154 | 三天两头
155 | 三番两次
156 | 三番五次
157 | 上
158 | 上下
159 | 上升
160 | 上去
161 | 上来
162 | 上述
163 | 上面
164 | 下
165 | 下列
166 | 下去
167 | 下来
168 | 下面
169 | 不
170 | 不一
171 | 不下
172 | 不久
173 | 不了
174 | 不亦乐乎
175 | 不仅
176 | 不仅...而且
177 | 不仅仅
178 | 不仅仅是
179 | 不会
180 | 不但
181 | 不但...而且
182 | 不光
183 | 不免
184 | 不再
185 | 不力
186 | 不单
187 | 不变
188 | 不只
189 | 不可
190 | 不可开交
191 | 不可抗拒
192 | 不同
193 | 不外
194 | 不外乎
195 | 不够
196 | 不大
197 | 不如
198 | 不妨
199 | 不定
200 | 不对
201 | 不少
202 | 不尽
203 | 不尽然
204 | 不巧
205 | 不已
206 | 不常
207 | 不得
208 | 不得不
209 | 不得了
210 | 不得已
211 | 不必
212 | 不怎么
213 | 不怕
214 | 不惟
215 | 不成
216 | 不拘
217 | 不择手段
218 | 不敢
219 | 不料
220 | 不断
221 | 不日
222 | 不时
223 | 不是
224 | 不曾
225 | 不止
226 | 不止一次
227 | 不比
228 | 不消
229 | 不满
230 | 不然
231 | 不然的话
232 | 不特
233 | 不独
234 | 不由得
235 | 不知不觉
236 | 不管
237 | 不管怎样
238 | 不经意
239 | 不胜
240 | 不能
241 | 不能不
242 | 不至于
243 | 不若
244 | 不要
245 | 不论
246 | 不起
247 | 不足
248 | 不过
249 | 不迭
250 | 不问
251 | 不限
252 | 与
253 | 与其
254 | 与其说
255 | 与否
256 | 与此同时
257 | 专门
258 | 且
259 | 且不说
260 | 且说
261 | 两者
262 | 严格
263 | 严重
264 | 个
265 | 个人
266 | 个别
267 | 中小
268 | 中间
269 | 丰富
270 | 串行
271 | 临
272 | 临到
273 | 为
274 | 为主
275 | 为了
276 | 为什么
277 | 为什麽
278 | 为何
279 | 为止
280 | 为此
281 | 为着
282 | 主张
283 | 主要
284 | 举凡
285 | 举行
286 | 乃
287 | 乃至
288 | 乃至于
289 | 么
290 | 之
291 | 之一
292 | 之前
293 | 之后
294 | 之後
295 | 之所以
296 | 之类
297 | 乌乎
298 | 乎
299 | 乒
300 | 乘
301 | 乘势
302 | 乘机
303 | 乘胜
304 | 乘虚
305 | 乘隙
306 | 九
307 | 也
308 | 也好
309 | 也就是说
310 | 也是
311 | 也罢
312 | 了
313 | 了解
314 | 争取
315 | 二
316 | 二来
317 | 二话不说
318 | 二话没说
319 | 于
320 | 于是
321 | 于是乎
322 | 云云
323 | 云尔
324 | 互
325 | 互相
326 | 五
327 | 些
328 | 交口
329 | 亦
330 | 产生
331 | 亲口
332 | 亲手
333 | 亲眼
334 | 亲自
335 | 亲身
336 | 人
337 | 人人
338 | 人们
339 | 人家
340 | 人民
341 | 什么
342 | 什么样
343 | 什麽
344 | 仅
345 | 仅仅
346 | 今
347 | 今后
348 | 今天
349 | 今年
350 | 今後
351 | 介于
352 | 仍
353 | 仍旧
354 | 仍然
355 | 从
356 | 从不
357 | 从严
358 | 从中
359 | 从事
360 | 从今以后
361 | 从优
362 | 从古到今
363 | 从古至今
364 | 从头
365 | 从宽
366 | 从小
367 | 从新
368 | 从无到有
369 | 从早到晚
370 | 从未
371 | 从来
372 | 从此
373 | 从此以后
374 | 从而
375 | 从轻
376 | 从速
377 | 从重
378 | 他
379 | 他人
380 | 他们
381 | 他是
382 | 他的
383 | 代替
384 | 以
385 | 以上
386 | 以下
387 | 以为
388 | 以便
389 | 以免
390 | 以前
391 | 以及
392 | 以后
393 | 以外
394 | 以後
395 | 以故
396 | 以期
397 | 以来
398 | 以至
399 | 以至于
400 | 以致
401 | 们
402 | 任
403 | 任何
404 | 任凭
405 | 任务
406 | 企图
407 | 伙同
408 | 会
409 | 伟大
410 | 传
411 | 传说
412 | 传闻
413 | 似乎
414 | 似的
415 | 但
416 | 但凡
417 | 但愿
418 | 但是
419 | 何
420 | 何乐而不为
421 | 何以
422 | 何况
423 | 何处
424 | 何妨
425 | 何尝
426 | 何必
427 | 何时
428 | 何止
429 | 何苦
430 | 何须
431 | 余外
432 | 作为
433 | 你
434 | 你们
435 | 你是
436 | 你的
437 | 使
438 | 使得
439 | 使用
440 | 例如
441 | 依
442 | 依据
443 | 依照
444 | 依靠
445 | 便
446 | 便于
447 | 促进
448 | 保持
449 | 保管
450 | 保险
451 | 俺
452 | 俺们
453 | 倍加
454 | 倍感
455 | 倒不如
456 | 倒不如说
457 | 倒是
458 | 倘
459 | 倘使
460 | 倘或
461 | 倘然
462 | 倘若
463 | 借
464 | 借以
465 | 借此
466 | 假使
467 | 假如
468 | 假若
469 | 偏偏
470 | 做到
471 | 偶尔
472 | 偶而
473 | 傥然
474 | 像
475 | 儿
476 | 允许
477 | 元/吨
478 | 充其极
479 | 充其量
480 | 充分
481 | 先不先
482 | 先后
483 | 先後
484 | 先生
485 | 光
486 | 光是
487 | 全体
488 | 全力
489 | 全年
490 | 全然
491 | 全身心
492 | 全部
493 | 全都
494 | 全面
495 | 八
496 | 八成
497 | 公然
498 | 六
499 | 兮
500 | 共
501 | 共同
502 | 共总
503 | 关于
504 | 其
505 | 其一
506 | 其中
507 | 其二
508 | 其他
509 | 其余
510 | 其后
511 | 其它
512 | 其实
513 | 其次
514 | 具体
515 | 具体地说
516 | 具体来说
517 | 具体说来
518 | 具有
519 | 兼之
520 | 内
521 | 再
522 | 再其次
523 | 再则
524 | 再有
525 | 再次
526 | 再者
527 | 再者说
528 | 再说
529 | 冒
530 | 冲
531 | 决不
532 | 决定
533 | 决非
534 | 况且
535 | 准备
536 | 凑巧
537 | 凝神
538 | 几
539 | 几乎
540 | 几度
541 | 几时
542 | 几番
543 | 几经
544 | 凡
545 | 凡是
546 | 凭
547 | 凭借
548 | 出
549 | 出于
550 | 出去
551 | 出来
552 | 出现
553 | 分别
554 | 分头
555 | 分期
556 | 分期分批
557 | 切
558 | 切不可
559 | 切切
560 | 切勿
561 | 切莫
562 | 则
563 | 则甚
564 | 刚
565 | 刚好
566 | 刚巧
567 | 刚才
568 | 初
569 | 别
570 | 别人
571 | 别处
572 | 别是
573 | 别的
574 | 别管
575 | 别说
576 | 到
577 | 到了儿
578 | 到处
579 | 到头
580 | 到头来
581 | 到底
582 | 到目前为止
583 | 前后
584 | 前此
585 | 前者
586 | 前进
587 | 前面
588 | 加上
589 | 加之
590 | 加以
591 | 加入
592 | 加强
593 | 动不动
594 | 动辄
595 | 勃然
596 | 匆匆
597 | 十分
598 | 千
599 | 千万
600 | 千万千万
601 | 半
602 | 单
603 | 单单
604 | 单纯
605 | 即
606 | 即令
607 | 即使
608 | 即便
609 | 即刻
610 | 即如
611 | 即将
612 | 即或
613 | 即是说
614 | 即若
615 | 却
616 | 却不
617 | 历
618 | 原来
619 | 去
620 | 又
621 | 又及
622 | 及
623 | 及其
624 | 及时
625 | 及至
626 | 双方
627 | 反之
628 | 反之亦然
629 | 反之则
630 | 反倒
631 | 反倒是
632 | 反应
633 | 反手
634 | 反映
635 | 反而
636 | 反过来
637 | 反过来说
638 | 取得
639 | 取道
640 | 受到
641 | 变成
642 | 古来
643 | 另
644 | 另一个
645 | 另一方面
646 | 另外
647 | 另悉
648 | 另方面
649 | 另行
650 | 只
651 | 只当
652 | 只怕
653 | 只是
654 | 只有
655 | 只消
656 | 只要
657 | 只限
658 | 叫
659 | 叫做
660 | 召开
661 | 叮咚
662 | 叮当
663 | 可
664 | 可以
665 | 可好
666 | 可是
667 | 可能
668 | 可见
669 | 各
670 | 各个
671 | 各人
672 | 各位
673 | 各地
674 | 各式
675 | 各种
676 | 各级
677 | 各自
678 | 合理
679 | 同
680 | 同一
681 | 同时
682 | 同样
683 | 后
684 | 后来
685 | 后者
686 | 后面
687 | 向
688 | 向使
689 | 向着
690 | 吓
691 | 吗
692 | 否则
693 | 吧
694 | 吧哒
695 | 吱
696 | 呀
697 | 呃
698 | 呆呆地
699 | 呐
700 | 呕
701 | 呗
702 | 呜
703 | 呜呼
704 | 呢
705 | 周围
706 | 呵
707 | 呵呵
708 | 呸
709 | 呼哧
710 | 呼啦
711 | 咋
712 | 和
713 | 咚
714 | 咦
715 | 咧
716 | 咱
717 | 咱们
718 | 咳
719 | 哇
720 | 哈
721 | 哈哈
722 | 哉
723 | 哎
724 | 哎呀
725 | 哎哟
726 | 哗
727 | 哗啦
728 | 哟
729 | 哦
730 | 哩
731 | 哪
732 | 哪个
733 | 哪些
734 | 哪儿
735 | 哪天
736 | 哪年
737 | 哪怕
738 | 哪样
739 | 哪边
740 | 哪里
741 | 哼
742 | 哼唷
743 | 唉
744 | 唯有
745 | 啊
746 | 啊呀
747 | 啊哈
748 | 啊哟
749 | 啐
750 | 啥
751 | 啦
752 | 啪达
753 | 啷当
754 | 喀
755 | 喂
756 | 喏
757 | 喔唷
758 | 喽
759 | 嗡
760 | 嗡嗡
761 | 嗬
762 | 嗯
763 | 嗳
764 | 嘎
765 | 嘎嘎
766 | 嘎登
767 | 嘘
768 | 嘛
769 | 嘻
770 | 嘿
771 | 嘿嘿
772 | 四
773 | 因
774 | 因为
775 | 因了
776 | 因此
777 | 因着
778 | 因而
779 | 固
780 | 固然
781 | 在
782 | 在下
783 | 在于
784 | 地
785 | 均
786 | 坚决
787 | 坚持
788 | 基于
789 | 基本
790 | 基本上
791 | 处在
792 | 处处
793 | 处理
794 | 复杂
795 | 多
796 | 多么
797 | 多亏
798 | 多多
799 | 多多少少
800 | 多多益善
801 | 多少
802 | 多年前
803 | 多年来
804 | 多数
805 | 多次
806 | 够瞧的
807 | 大
808 | 大不了
809 | 大举
810 | 大事
811 | 大体
812 | 大体上
813 | 大凡
814 | 大力
815 | 大多
816 | 大多数
817 | 大大
818 | 大家
819 | 大张旗鼓
820 | 大批
821 | 大抵
822 | 大概
823 | 大略
824 | 大约
825 | 大致
826 | 大都
827 | 大量
828 | 大面儿上
829 | 失去
830 | 奇
831 | 奈
832 | 奋勇
833 | 她
834 | 她们
835 | 她是
836 | 她的
837 | 好
838 | 好在
839 | 好的
840 | 好象
841 | 如
842 | 如上
843 | 如上所述
844 | 如下
845 | 如今
846 | 如何
847 | 如其
848 | 如前所述
849 | 如同
850 | 如常
851 | 如是
852 | 如期
853 | 如果
854 | 如次
855 | 如此
856 | 如此等等
857 | 如若
858 | 始而
859 | 姑且
860 | 存在
861 | 存心
862 | 孰料
863 | 孰知
864 | 宁
865 | 宁可
866 | 宁愿
867 | 宁肯
868 | 它
869 | 它们
870 | 它们的
871 | 它是
872 | 它的
873 | 安全
874 | 完全
875 | 完成
876 | 定
877 | 实现
878 | 实际
879 | 宣布
880 | 容易
881 | 密切
882 | 对
883 | 对于
884 | 对应
885 | 对待
886 | 对方
887 | 对比
888 | 将
889 | 将才
890 | 将要
891 | 将近
892 | 小
893 | 少数
894 | 尔
895 | 尔后
896 | 尔尔
897 | 尔等
898 | 尚且
899 | 尤其
900 | 就
901 | 就地
902 | 就是
903 | 就是了
904 | 就是说
905 | 就此
906 | 就算
907 | 就要
908 | 尽
909 | 尽可能
910 | 尽如人意
911 | 尽心尽力
912 | 尽心竭力
913 | 尽快
914 | 尽早
915 | 尽然
916 | 尽管
917 | 尽管如此
918 | 尽量
919 | 局外
920 | 居然
921 | 届时
922 | 属于
923 | 屡
924 | 屡屡
925 | 屡次
926 | 屡次三番
927 | 岂
928 | 岂但
929 | 岂止
930 | 岂非
931 | 川流不息
932 | 左右
933 | 巨大
934 | 巩固
935 | 差一点
936 | 差不多
937 | 己
938 | 已
939 | 已矣
940 | 已经
941 | 巴
942 | 巴巴
943 | 带
944 | 帮助
945 | 常
946 | 常常
947 | 常言说
948 | 常言说得好
949 | 常言道
950 | 平素
951 | 年复一年
952 | 并
953 | 并不
954 | 并不是
955 | 并且
956 | 并排
957 | 并无
958 | 并没
959 | 并没有
960 | 并肩
961 | 并非
962 | 广大
963 | 广泛
964 | 应当
965 | 应用
966 | 应该
967 | 庶乎
968 | 庶几
969 | 开外
970 | 开始
971 | 开展
972 | 引起
973 | 弗
974 | 弹指之间
975 | 强烈
976 | 强调
977 | 归
978 | 归根到底
979 | 归根结底
980 | 归齐
981 | 当
982 | 当下
983 | 当中
984 | 当儿
985 | 当前
986 | 当即
987 | 当口儿
988 | 当地
989 | 当场
990 | 当头
991 | 当庭
992 | 当时
993 | 当然
994 | 当真
995 | 当着
996 | 形成
997 | 彻夜
998 | 彻底
999 | 彼
1000 | 彼时
1001 | 彼此
1002 | 往
1003 | 往往
1004 | 待
1005 | 待到
1006 | 很
1007 | 很多
1008 | 很少
1009 | 後来
1010 | 後面
1011 | 得
1012 | 得了
1013 | 得出
1014 | 得到
1015 | 得天独厚
1016 | 得起
1017 | 心里
1018 | 必
1019 | 必定
1020 | 必将
1021 | 必然
1022 | 必要
1023 | 必须
1024 | 快
1025 | 快要
1026 | 忽地
1027 | 忽然
1028 | 怎
1029 | 怎么
1030 | 怎么办
1031 | 怎么样
1032 | 怎奈
1033 | 怎样
1034 | 怎麽
1035 | 怕
1036 | 急匆匆
1037 | 怪
1038 | 怪不得
1039 | 总之
1040 | 总是
1041 | 总的来看
1042 | 总的来说
1043 | 总的说来
1044 | 总结
1045 | 总而言之
1046 | 恍然
1047 | 恐怕
1048 | 恰似
1049 | 恰好
1050 | 恰如
1051 | 恰巧
1052 | 恰恰
1053 | 恰恰相反
1054 | 恰逢
1055 | 您
1056 | 您们
1057 | 您是
1058 | 惟其
1059 | 惯常
1060 | 意思
1061 | 愤然
1062 | 愿意
1063 | 慢说
1064 | 成为
1065 | 成年
1066 | 成年累月
1067 | 成心
1068 | 我
1069 | 我们
1070 | 我是
1071 | 我的
1072 | 或
1073 | 或则
1074 | 或多或少
1075 | 或是
1076 | 或曰
1077 | 或者
1078 | 或许
1079 | 战斗
1080 | 截然
1081 | 截至
1082 | 所
1083 | 所以
1084 | 所在
1085 | 所幸
1086 | 所有
1087 | 所谓
1088 | 才
1089 | 才能
1090 | 扑通
1091 | 打
1092 | 打从
1093 | 打开天窗说亮话
1094 | 扩大
1095 | 把
1096 | 抑或
1097 | 抽冷子
1098 | 拦腰
1099 | 拿
1100 | 按
1101 | 按时
1102 | 按期
1103 | 按照
1104 | 按理
1105 | 按说
1106 | 挨个
1107 | 挨家挨户
1108 | 挨次
1109 | 挨着
1110 | 挨门挨户
1111 | 挨门逐户
1112 | 换句话说
1113 | 换言之
1114 | 据
1115 | 据实
1116 | 据悉
1117 | 据我所知
1118 | 据此
1119 | 据称
1120 | 据说
1121 | 掌握
1122 | 接下来
1123 | 接着
1124 | 接著
1125 | 接连不断
1126 | 放量
1127 | 故
1128 | 故意
1129 | 故此
1130 | 故而
1131 | 敞开儿
1132 | 敢
1133 | 敢于
1134 | 敢情
1135 | 数/
1136 | 整个
1137 | 断然
1138 | 方
1139 | 方便
1140 | 方才
1141 | 方能
1142 | 方面
1143 | 旁人
1144 | 无
1145 | 无宁
1146 | 无法
1147 | 无论
1148 | 既
1149 | 既...又
1150 | 既往
1151 | 既是
1152 | 既然
1153 | 日复一日
1154 | 日渐
1155 | 日益
1156 | 日臻
1157 | 日见
1158 | 时候
1159 | 昂然
1160 | 明显
1161 | 明确
1162 | 是
1163 | 是不是
1164 | 是以
1165 | 是否
1166 | 是的
1167 | 显然
1168 | 显著
1169 | 普通
1170 | 普遍
1171 | 暗中
1172 | 暗地里
1173 | 暗自
1174 | 更
1175 | 更为
1176 | 更加
1177 | 更进一步
1178 | 曾
1179 | 曾经
1180 | 替
1181 | 替代
1182 | 最
1183 | 最后
1184 | 最大
1185 | 最好
1186 | 最後
1187 | 最近
1188 | 最高
1189 | 有
1190 | 有些
1191 | 有关
1192 | 有利
1193 | 有力
1194 | 有及
1195 | 有所
1196 | 有效
1197 | 有时
1198 | 有点
1199 | 有的
1200 | 有的是
1201 | 有着
1202 | 有著
1203 | 望
1204 | 朝
1205 | 朝着
1206 | 末##末
1207 | 本
1208 | 本人
1209 | 本地
1210 | 本着
1211 | 本身
1212 | 权时
1213 | 来
1214 | 来不及
1215 | 来得及
1216 | 来看
1217 | 来着
1218 | 来自
1219 | 来讲
1220 | 来说
1221 | 极
1222 | 极为
1223 | 极了
1224 | 极其
1225 | 极力
1226 | 极大
1227 | 极度
1228 | 极端
1229 | 构成
1230 | 果然
1231 | 果真
1232 | 某
1233 | 某个
1234 | 某些
1235 | 某某
1236 | 根据
1237 | 根本
1238 | 格外
1239 | 梆
1240 | 概
1241 | 次第
1242 | 欢迎
1243 | 欤
1244 | 正值
1245 | 正在
1246 | 正如
1247 | 正巧
1248 | 正常
1249 | 正是
1250 | 此
1251 | 此中
1252 | 此后
1253 | 此地
1254 | 此处
1255 | 此外
1256 | 此时
1257 | 此次
1258 | 此间
1259 | 殆
1260 | 毋宁
1261 | 每
1262 | 每个
1263 | 每天
1264 | 每年
1265 | 每当
1266 | 每时每刻
1267 | 每每
1268 | 每逢
1269 | 比
1270 | 比及
1271 | 比如
1272 | 比如说
1273 | 比方
1274 | 比照
1275 | 比起
1276 | 比较
1277 | 毕竟
1278 | 毫不
1279 | 毫无
1280 | 毫无例外
1281 | 毫无保留地
1282 | 汝
1283 | 沙沙
1284 | 没
1285 | 没奈何
1286 | 没有
1287 | 沿
1288 | 沿着
1289 | 注意
1290 | 活
1291 | 深入
1292 | 清楚
1293 | 满
1294 | 满足
1295 | 漫说
1296 | 焉
1297 | 然
1298 | 然则
1299 | 然后
1300 | 然後
1301 | 然而
1302 | 照
1303 | 照着
1304 | 牢牢
1305 | 特别是
1306 | 特殊
1307 | 特点
1308 | 犹且
1309 | 犹自
1310 | 独
1311 | 独自
1312 | 猛然
1313 | 猛然间
1314 | 率尔
1315 | 率然
1316 | 现代
1317 | 现在
1318 | 理应
1319 | 理当
1320 | 理该
1321 | 瑟瑟
1322 | 甚且
1323 | 甚么
1324 | 甚或
1325 | 甚而
1326 | 甚至
1327 | 甚至于
1328 | 用
1329 | 用来
1330 | 甫
1331 | 甭
1332 | 由
1333 | 由于
1334 | 由是
1335 | 由此
1336 | 由此可见
1337 | 略
1338 | 略为
1339 | 略加
1340 | 略微
1341 | 白
1342 | 白白
1343 | 的
1344 | 的确
1345 | 的话
1346 | 皆可
1347 | 目前
1348 | 直到
1349 | 直接
1350 | 相似
1351 | 相信
1352 | 相反
1353 | 相同
1354 | 相对
1355 | 相对而言
1356 | 相应
1357 | 相当
1358 | 相等
1359 | 省得
1360 | 看
1361 | 看上去
1362 | 看出
1363 | 看到
1364 | 看来
1365 | 看样子
1366 | 看看
1367 | 看见
1368 | 看起来
1369 | 真是
1370 | 真正
1371 | 眨眼
1372 | 着
1373 | 着呢
1374 | 矣
1375 | 矣乎
1376 | 矣哉
1377 | 知道
1378 | 砰
1379 | 确定
1380 | 碰巧
1381 | 社会主义
1382 | 离
1383 | 种
1384 | 积极
1385 | 移动
1386 | 究竟
1387 | 穷年累月
1388 | 突出
1389 | 突然
1390 | 窃
1391 | 立
1392 | 立刻
1393 | 立即
1394 | 立地
1395 | 立时
1396 | 立马
1397 | 竟
1398 | 竟然
1399 | 竟而
1400 | 第
1401 | 第二
1402 | 等
1403 | 等到
1404 | 等等
1405 | 策略地
1406 | 简直
1407 | 简而言之
1408 | 简言之
1409 | 管
1410 | 类如
1411 | 粗
1412 | 精光
1413 | 紧接着
1414 | 累年
1415 | 累次
1416 | 纯
1417 | 纯粹
1418 | 纵
1419 | 纵令
1420 | 纵使
1421 | 纵然
1422 | 练习
1423 | 组成
1424 | 经
1425 | 经常
1426 | 经过
1427 | 结合
1428 | 结果
1429 | 给
1430 | 绝
1431 | 绝不
1432 | 绝对
1433 | 绝非
1434 | 绝顶
1435 | 继之
1436 | 继后
1437 | 继续
1438 | 继而
1439 | 维持
1440 | 综上所述
1441 | 缕缕
1442 | 罢了
1443 | 老
1444 | 老大
1445 | 老是
1446 | 老老实实
1447 | 考虑
1448 | 者
1449 | 而
1450 | 而且
1451 | 而况
1452 | 而又
1453 | 而后
1454 | 而外
1455 | 而已
1456 | 而是
1457 | 而言
1458 | 而论
1459 | 联系
1460 | 联袂
1461 | 背地里
1462 | 背靠背
1463 | 能
1464 | 能否
1465 | 能够
1466 | 腾
1467 | 自
1468 | 自个儿
1469 | 自从
1470 | 自各儿
1471 | 自后
1472 | 自家
1473 | 自己
1474 | 自打
1475 | 自身
1476 | 臭
1477 | 至
1478 | 至于
1479 | 至今
1480 | 至若
1481 | 致
1482 | 般的
1483 | 良好
1484 | 若
1485 | 若夫
1486 | 若是
1487 | 若果
1488 | 若非
1489 | 范围
1490 | 莫
1491 | 莫不
1492 | 莫不然
1493 | 莫如
1494 | 莫若
1495 | 莫非
1496 | 获得
1497 | 藉以
1498 | 虽
1499 | 虽则
1500 | 虽然
1501 | 虽说
1502 | 蛮
1503 | 行为
1504 | 行动
1505 | 表明
1506 | 表示
1507 | 被
1508 | 要
1509 | 要不
1510 | 要不是
1511 | 要不然
1512 | 要么
1513 | 要是
1514 | 要求
1515 | 见
1516 | 规定
1517 | 觉得
1518 | 譬喻
1519 | 譬如
1520 | 认为
1521 | 认真
1522 | 认识
1523 | 让
1524 | 许多
1525 | 论
1526 | 论说
1527 | 设使
1528 | 设或
1529 | 设若
1530 | 诚如
1531 | 诚然
1532 | 话说
1533 | 该
1534 | 该当
1535 | 说明
1536 | 说来
1537 | 说说
1538 | 请勿
1539 | 诸
1540 | 诸位
1541 | 诸如
1542 | 谁
1543 | 谁人
1544 | 谁料
1545 | 谁知
1546 | 谨
1547 | 豁然
1548 | 贼死
1549 | 赖以
1550 | 赶
1551 | 赶快
1552 | 赶早不赶晚
1553 | 起
1554 | 起先
1555 | 起初
1556 | 起头
1557 | 起来
1558 | 起见
1559 | 起首
1560 | 趁
1561 | 趁便
1562 | 趁势
1563 | 趁早
1564 | 趁机
1565 | 趁热
1566 | 趁着
1567 | 越是
1568 | 距
1569 | 跟
1570 | 路经
1571 | 转动
1572 | 转变
1573 | 转贴
1574 | 轰然
1575 | 较
1576 | 较为
1577 | 较之
1578 | 较比
1579 | 边
1580 | 达到
1581 | 达旦
1582 | 迄
1583 | 迅速
1584 | 过
1585 | 过于
1586 | 过去
1587 | 过来
1588 | 运用
1589 | 近
1590 | 近几年来
1591 | 近年来
1592 | 近来
1593 | 还
1594 | 还是
1595 | 还有
1596 | 还要
1597 | 这
1598 | 这一来
1599 | 这个
1600 | 这么
1601 | 这么些
1602 | 这么样
1603 | 这么点儿
1604 | 这些
1605 | 这会儿
1606 | 这儿
1607 | 这就是说
1608 | 这时
1609 | 这样
1610 | 这次
1611 | 这点
1612 | 这种
1613 | 这般
1614 | 这边
1615 | 这里
1616 | 这麽
1617 | 进入
1618 | 进去
1619 | 进来
1620 | 进步
1621 | 进而
1622 | 进行
1623 | 连
1624 | 连同
1625 | 连声
1626 | 连日
1627 | 连日来
1628 | 连袂
1629 | 连连
1630 | 迟早
1631 | 迫于
1632 | 适应
1633 | 适当
1634 | 适用
1635 | 逐步
1636 | 逐渐
1637 | 通常
1638 | 通过
1639 | 造成
1640 | 逢
1641 | 遇到
1642 | 遭到
1643 | 遵循
1644 | 遵照
1645 | 避免
1646 | 那
1647 | 那个
1648 | 那么
1649 | 那么些
1650 | 那么样
1651 | 那些
1652 | 那会儿
1653 | 那儿
1654 | 那时
1655 | 那末
1656 | 那样
1657 | 那般
1658 | 那边
1659 | 那里
1660 | 那麽
1661 | 部分
1662 | 都
1663 | 鄙人
1664 | 采取
1665 | 里面
1666 | 重大
1667 | 重新
1668 | 重要
1669 | 鉴于
1670 | 针对
1671 | 长期以来
1672 | 长此下去
1673 | 长线
1674 | 长话短说
1675 | 问题
1676 | 间或
1677 | 防止
1678 | 阿
1679 | 附近
1680 | 陈年
1681 | 限制
1682 | 陡然
1683 | 除
1684 | 除了
1685 | 除却
1686 | 除去
1687 | 除外
1688 | 除开
1689 | 除此
1690 | 除此之外
1691 | 除此以外
1692 | 除此而外
1693 | 除非
1694 | 随
1695 | 随后
1696 | 随时
1697 | 随着
1698 | 随著
1699 | 隔夜
1700 | 隔日
1701 | 难得
1702 | 难怪
1703 | 难说
1704 | 难道
1705 | 难道说
1706 | 集中
1707 | 零
1708 | 需要
1709 | 非但
1710 | 非常
1711 | 非徒
1712 | 非得
1713 | 非特
1714 | 非独
1715 | 靠
1716 | 顶多
1717 | 顷
1718 | 顷刻
1719 | 顷刻之间
1720 | 顷刻间
1721 | 顺
1722 | 顺着
1723 | 顿时
1724 | 颇
1725 | 风雨无阻
1726 | 饱
1727 | 首先
1728 | 马上
1729 | 高低
1730 | 高兴
1731 | 默然
1732 | 默默地
1733 | 齐
1734 | ︿
1735 | !
1736 | #
1737 | $
1738 | %
1739 | &
1740 | '
1741 | (
1742 | )
1743 | )÷(1-
1744 | )、
1745 | *
1746 | +
1747 | +ξ
1748 | ++
1749 | ,
1750 | ,也
1751 | -
1752 | -β
1753 | --
1754 | -[*]-
1755 | .
1756 | /
1757 | 0
1758 | 0:2
1759 | 1
1760 | 1.
1761 | 12%
1762 | 2
1763 | 2.3%
1764 | 3
1765 | 4
1766 | 5
1767 | 5:0
1768 | 6
1769 | 7
1770 | 8
1771 | 9
1772 | :
1773 | ;
1774 | <
1775 | <±
1776 | <Δ
1777 | <λ
1778 | <φ
1779 | <<
1780 | =
1781 | =″
1782 | =☆
1783 | =(
1784 | =-
1785 | =[
1786 | ={
1787 | >
1788 | >λ
1789 | ?
1790 | @
1791 | A
1792 | LI
1793 | R.L.
1794 | ZXFITL
1795 | [
1796 | [①①]
1797 | [①②]
1798 | [①③]
1799 | [①④]
1800 | [①⑤]
1801 | [①⑥]
1802 | [①⑦]
1803 | [①⑧]
1804 | [①⑨]
1805 | [①A]
1806 | [①B]
1807 | [①C]
1808 | [①D]
1809 | [①E]
1810 | [①]
1811 | [①a]
1812 | [①c]
1813 | [①d]
1814 | [①e]
1815 | [①f]
1816 | [①g]
1817 | [①h]
1818 | [①i]
1819 | [①o]
1820 | [②
1821 | [②①]
1822 | [②②]
1823 | [②③]
1824 | [②④
1825 | [②⑤]
1826 | [②⑥]
1827 | [②⑦]
1828 | [②⑧]
1829 | [②⑩]
1830 | [②B]
1831 | [②G]
1832 | [②]
1833 | [②a]
1834 | [②b]
1835 | [②c]
1836 | [②d]
1837 | [②e]
1838 | [②f]
1839 | [②g]
1840 | [②h]
1841 | [②i]
1842 | [②j]
1843 | [③①]
1844 | [③⑩]
1845 | [③F]
1846 | [③]
1847 | [③a]
1848 | [③b]
1849 | [③c]
1850 | [③d]
1851 | [③e]
1852 | [③g]
1853 | [③h]
1854 | [④]
1855 | [④a]
1856 | [④b]
1857 | [④c]
1858 | [④d]
1859 | [④e]
1860 | [⑤]
1861 | [⑤]]
1862 | [⑤a]
1863 | [⑤b]
1864 | [⑤d]
1865 | [⑤e]
1866 | [⑤f]
1867 | [⑥]
1868 | [⑦]
1869 | [⑧]
1870 | [⑨]
1871 | [⑩]
1872 | [*]
1873 | [-
1874 | []
1875 | ]
1876 | ]∧′=[
1877 | ][
1878 | _
1879 | a]
1880 | b]
1881 | c]
1882 | e]
1883 | f]
1884 | ng昉
1885 | {
1886 | {-
1887 | |
1888 | }
1889 | }>
1890 | ~
1891 | ~±
1892 | ~+
1893 | ¥
1894 |
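A list like the one above, with one entry per line, can be loaded and applied to space-separated segmenter output (such as the text THULAC returns with `text=True`) roughly as follows. This is a minimal sketch using only the standard library; `parse_stopwords`, `load_stopwords` and `remove_stopwords` are hypothetical names, and the file path passed to `load_stopwords` is whatever location the list is stored at.

```python
def parse_stopwords(lines):
    """Build a stopword set from an iterable of lines, one entry per line."""
    words = set()
    for line in lines:
        # Strip only the line terminator so entries made of other characters survive.
        word = line.rstrip("\r\n")
        if word:
            words.add(word)
    return words


def load_stopwords(path, encoding="utf-8"):
    """Read a one-entry-per-line stopword file into a set."""
    with open(path, encoding=encoding) as handle:
        return parse_stopwords(handle)


def remove_stopwords(segmented, stopwords):
    """Drop stopwords from a space-separated token string and re-join the rest."""
    tokens = segmented.split(" ")
    return " ".join(t for t in tokens if t and t not in stopwords)
```

A set is used so that each membership check is a constant-time lookup, which matters when the filter runs over every token of every post.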
--------------------------------------------------------------------------------