├── .gitignore
├── README.md
├── data
│   ├── LICENSE
│   ├── data.csv
│   ├── dev.ipynb
│   ├── example_data.csv
│   ├── keys.txt
│   ├── keys
│   │   └── test.ipynb
│   └── stopword.txt
├── exp.ipynb
├── imgs
│   ├── bar_plot.png
│   ├── intertopic_distance_map.png
│   └── topic_over_time.png
├── main.py
├── requirements.txt
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | **/.DS_Store
2 |
3 |
4 | # ignore ckip models
5 | **/model_ner
6 | **/model_pos
7 | **/model_ws
8 |
9 | **/embedding_character
10 | **/embedding_word
11 |
12 | **/data.zip
13 | **/*.xml
14 |
15 | **/__pycache__
16 | **/export
17 |
18 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # HelloBERTopic
2 | This project applies, summarizes, and experiments with [BERTopic](https://github.com/MaartenGr/BERTopic).
3 | - [BERTopic article](https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6)
4 | - [Paper](https://arxiv.org/abs/2203.05794)
5 |
6 | ## Installation
7 | - We use conda to create the environment and install the dependencies (Miniconda installation guide: https://docs.anaconda.com/free/miniconda/miniconda-install/).
8 |
9 | ```
10 | conda create --name bertopic python=3.9
11 | conda activate bertopic
12 | pip3 install -r requirements.txt
13 | ```
14 |
15 | ## Results
16 | When `main.py` runs successfully, it writes the following HTML files to the `export` folder:
17 | ```
18 | bar_fig.html
19 | topic_fig.html
20 | tot_fig.html
21 | ```
22 | ### Topics Bar
23 | 
24 |
25 | ### Topic Over Time
26 | 
27 |
28 | ### Intertopic Distance Map
29 | 
30 |
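31 | ## How It Works
32 | BERTopic embeds each document with a pretrained language model, reduces the dimensionality of the embeddings with UMAP, and then clusters them without supervision using HDBSCAN. Each cluster becomes a topic, described by its most representative words (class-based TF-IDF).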
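After clustering, BERTopic describes each topic with class-based TF-IDF (c-TF-IDF): all documents in a cluster are treated as one "class document", and a word scores highly when it is frequent in that class but rare across classes. The plain-Python sketch below assumes the weighting described in the BERTopic paper, W(t, c) = tf(t, c) × log(1 + A / f(t)), where tf is the L1-normalized frequency of word t in class c, f(t) is the frequency of t over all classes, and A is the average number of words per class. It is an illustration, not the library's `ClassTfidfTransformer`, and `c_tf_idf` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Dict, List


def c_tf_idf(docs_per_topic: Dict[int, List[str]], top_n: int = 3) -> Dict[int, List[str]]:
    """Hypothetical helper: top_n words per topic ranked by c-TF-IDF.

    docs_per_topic maps a topic id to its documents; each document is a string of
    already-segmented words joined by spaces (the format exp.ipynb passes to BERTopic).
    """
    # Treat each topic's documents as one "class document" and count its words.
    counts = {
        topic: Counter(word for doc in docs for word in doc.split())
        for topic, docs in docs_per_topic.items()
    }
    # f(t): how often each word occurs across all classes.
    total: Counter = Counter()
    for class_counts in counts.values():
        total.update(class_counts)
    # A: average number of words per class.
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    top_words = {}
    for topic, class_counts in counts.items():
        n_words = sum(class_counts.values())
        scores = {
            # L1-normalized term frequency times log(1 + A / f(t))
            word: (n / n_words) * math.log(1 + avg_words / total[word])
            for word, n in class_counts.items()
        }
        # Highest score first; ties broken by code point for a stable order.
        top_words[topic] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return top_words


if __name__ == "__main__":
    docs = {
        0: ["食品 安全 食品", "食品 產業"],
        1: ["智慧 服務 產業", "智慧 應用"],
    }
    print(c_tf_idf(docs))  # {0: ['食品', '安全', '產業'], 1: ['智慧', '應用', '服務']}
```

The word 產業 appears in both topics, so its log term is smaller and it drops to the bottom of each ranking; that down-weighting of shared words is what makes the topic keywords distinctive.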
33 |
34 | ### UMAP
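35 | UMAP is a dimensionality-reduction algorithm similar to t-SNE; both work well for data visualization. In BERTopic it compresses the document embeddings before clustering.
36 | - Steps
37 | 1. For each point, compute the distances to a configurable number of nearest neighbors (`n_neighbors`).
38 | 2. When mapping the data into a low-dimensional space, keep the distance relationships between points as similar as possible to those in the original high-dimensional space.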
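Both steps revolve around nearest neighbors. The plain-Python sketch below (hypothetical helpers `knn` and `neighborhood_preservation`, brute-force Euclidean distances) computes the neighbor sets of step 1 and then scores how well a low-dimensional layout keeps them, which is the goal stated in step 2. It does not implement UMAP itself: the real algorithm turns the neighbor graph into weighted fuzzy edges and optimizes a layout, which the `umap-learn` package does for you.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def knn(points: Sequence[Point], k: int) -> List[Set[int]]:
    """Step 1: indices of the k nearest neighbors of every point (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance; ties are broken by index.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors


def neighborhood_preservation(high: Sequence[Point], low: Sequence[Point], k: int) -> float:
    """Step 2's goal as a score: the average fraction of each point's k
    high-dimensional neighbors that are still neighbors in the low-dimensional
    layout (1.0 means every neighborhood is preserved)."""
    high_nn = knn(high, k)
    low_nn = knn(low, k)
    overlaps = [len(h & l) / k for h, l in zip(high_nn, low_nn)]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    # Two small groups in 3-D; the third coordinate carries no information.
    high = [(0, 0, 5), (1, 0, 5), (0, 1, 5), (10, 10, 5), (11, 10, 5), (10, 11, 5)]
    good = [(x, y) for x, y, _ in high]                           # drop the constant axis
    bad = [good[3], good[1], good[2], good[0], good[4], good[5]]  # swap points 0 and 3
    print(neighborhood_preservation(high, good, k=2))  # 1.0
    print(neighborhood_preservation(high, bad, k=2))   # 0.3333333333333333
```

Dropping an axis that carries no information keeps every neighborhood intact, while swapping two points across groups breaks several of them; UMAP searches for layouts that score like the first case.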
39 |
40 | ### HDBSCAN
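41 | HDBSCAN was proposed to address the shortcomings of DBSCAN.
42 | DBSCAN assumes that all clusters have the same density (it uses one global neighborhood radius, `eps`); on datasets whose clusters differ markedly in density, this assumption produces incorrect clusterings.
43 | The two algorithms also treat border points differently, but the key improvement is that HDBSCAN does not depend on a single radius: it builds a hierarchy of clusters across all density levels and keeps the most stable ones, which fixes the DBSCAN failure case described above.
44 |
45 | HDBSCAN also keeps DBSCAN's property of determining the number of clusters automatically, so the user does not have to specify it. Points that fit no cluster are labeled as noise (BERTopic collects those documents in topic `-1`).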
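The single-density assumption is easiest to see on DBSCAN itself. The sketch below is a minimal plain-Python DBSCAN, not HDBSCAN; `dbscan` and `region_query` are hypothetical helpers, and `min_samples` counts the point itself, as in scikit-learn's `DBSCAN`. It shows the property both algorithms share (the number of clusters is not given in advance and isolated points get the noise label `-1`) and how one fixed radius `eps` can miss a sparser cluster.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Return a cluster label per point; NOISE (-1) marks points in no cluster."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels  # every entry is an int at this point


if __name__ == "__main__":
    dense = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5)]   # points 0.5 apart
    sparse = [(10, 10), (10, 13), (13, 10), (13, 13)]  # points 3 apart
    points = dense + sparse + [(50, 50)]               # plus one isolated point
    print(dbscan(points, eps=1.0, min_samples=3))  # [0, 0, 0, 0, -1, -1, -1, -1, -1]
    print(dbscan(points, eps=4.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With `eps=1.0` the sparser group is discarded as noise; with `eps=4.5` it is found. HDBSCAN avoids having to pick this one radius for the whole dataset.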
46 |
47 | ### Other Clustering Method
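48 | If you do not want HDBSCAN to determine the clusters automatically, you can swap in the `KMeans` or `Birch` clustering algorithm as shown below; see the official documentation for details:
49 | - [Clustering (BERTopic documentation)](https://maartengr.github.io/BERTopic/getting_started/clustering/clustering.html#visual-overview)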
50 | ```python
51 | from bertopic import BERTopic
52 | from sklearn.cluster import KMeans
53 |
54 | cluster_model = KMeans(n_clusters=50)
55 | topic_model = BERTopic(hdbscan_model=cluster_model)
56 | ```
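For contrast, the plain-Python sketch below shows the idea behind `KMeans` (Lloyd's algorithm). `kmeans` is a hypothetical helper, not scikit-learn's implementation, and it uses a deterministic farthest-point initialization in place of scikit-learn's default k-means++. The number of clusters is fixed up front and every point is assigned to its nearest centroid, so there is no noise label; in BERTopic this means no document lands in the outlier topic `-1`.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: split points into exactly k clusters.

    Returns (labels, centroids). Every point receives a label in range(k).
    """
    # Farthest-point initialization (a deterministic simplification of k-means++):
    # start from the first point, then repeatedly add the point farthest from
    # its nearest chosen centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (no noise label).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    blobs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(kmeans(blobs, k=2)[0])               # [0, 0, 0, 0, 1, 1, 1, 1]
    # An isolated point still has to join a cluster: here it takes one cluster
    # for itself and the two real groups are merged into the other.
    print(kmeans(blobs + [(50, 50)], k=2)[0])  # [0, 0, 0, 0, 0, 0, 0, 0, 1]
```

The second call shows the trade-off: with a fixed `n_clusters`, an outlier can distort the clustering instead of being set aside as noise.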
57 |
58 |
59 | ## Project Structure
60 | ```
61 | .
62 | ├── .gitignore
63 | ├── README.md
64 | ├── data
65 | │ ├── data.csv
66 | │ ├── keys
67 | │ │ └── test.ipynb
68 | │ ├── keys.txt
69 | │ └── stopword.txt
70 | ├── exp.ipynb
71 | ├── main.py
72 | ├── requirements.txt
73 | └── utils.py
74 | ```
75 | ## Environment Setup
76 | ```
77 | pip install -r requirements.txt
78 | ```
79 |
80 | ## Running the Program
81 | ```
82 | python main.py
83 | ```
84 | ### Arguments
85 | ```
86 | Hello BERTopics
87 |
88 | optional arguments:
89 | -h, --help show this help message and exit
90 | --topic_num TOPIC_NUM
91 | Number of topics to produce
92 | --keyword_file KEYWORD_FILE
93 | Name of the keyword file to load
94 | --model_name MODEL_NAME
95 | Name of the Hugging Face pretrained model
96 | --data_file DATA_FILE
97 | Path of the data file to read
98 | --word_sentence_cache WORD_SENTENCE_CACHE
99 | Whether to load the word-segmentation cache (without a cache, the default flow runs)
100 | ```
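The help text above corresponds to an `argparse` parser along the lines of the sketch below. This is a hypothetical reconstruction from the flag names only: `main.py` is not shown here, so the types (`int` for `--topic_num`, strings elsewhere) and the absence of defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in the help text."""
    parser = argparse.ArgumentParser(description="Hello BERTopics")
    # Assumed to be an integer number of topics.
    parser.add_argument("--topic_num", type=int, help="Number of topics to produce")
    parser.add_argument("--keyword_file", help="Name of the keyword file to load")
    parser.add_argument("--model_name", help="Name of the Hugging Face pretrained model")
    parser.add_argument("--data_file", help="Path of the data file to read")
    # Assumed to take a value, since the help text shows a WORD_SENTENCE_CACHE metavar.
    parser.add_argument(
        "--word_sentence_cache",
        help="Whether to load the word-segmentation cache (without a cache, the default flow runs)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--topic_num", "10", "--data_file", "data/data.csv"])
    print(args.topic_num, args.data_file, args.model_name)  # 10 data/data.csv None
```

Flags that are not passed come back as `None`, so the real script presumably falls back to its own defaults in that case.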
101 |
--------------------------------------------------------------------------------
/data/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2019 CKIP
2 |
3 | Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0
4 | International License; you may not use this file except in compliance
5 | with the License. You may obtain a copy of the License at
6 |
7 | https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
11 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
13 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
14 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
15 | SOFTWARE.
16 |
--------------------------------------------------------------------------------
/data/dev.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 4,
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "df = pd.read_csv(\"data.csv\")\n"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 6,
24 | "metadata": {},
25 | "outputs": [
26 | {
27 | "data": {
28 | "text/plain": [
29 | "Index(['year', 'name', 'label', 'year_start', 'year_end', 'keyword', 'ner',\n",
30 | " 'tf_idf', 'description', 'order'],\n",
31 | " dtype='object')"
32 | ]
33 | },
34 | "execution_count": 6,
35 | "metadata": {},
36 | "output_type": "execute_result"
37 | }
38 | ],
39 | "source": [
40 | "df.columns"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 9,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "df[[\"year\", \"name\", \"description\"]].to_csv(\"example_data.csv\", index=False)"
50 | ]
51 | }
52 | ],
53 | "metadata": {
54 | "interpreter": {
55 | "hash": "1ae26572eda725713d76d5b5539aa0885fd8640fe2af809f71b79ee35a86cf8d"
56 | },
57 | "kernelspec": {
58 | "display_name": "Python 3.7.11 ('bertopic')",
59 | "language": "python",
60 | "name": "python3"
61 | },
62 | "language_info": {
63 | "codemirror_mode": {
64 | "name": "ipython",
65 | "version": 3
66 | },
67 | "file_extension": ".py",
68 | "mimetype": "text/x-python",
69 | "name": "python",
70 | "nbconvert_exporter": "python",
71 | "pygments_lexer": "ipython3",
72 | "version": "3.7.11"
73 | },
74 | "orig_nbformat": 4
75 | },
76 | "nbformat": 4,
77 | "nbformat_minor": 2
78 | }
79 |
--------------------------------------------------------------------------------
/data/keys/test.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 40,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import xml.etree.ElementTree as ET\n",
10 | "from tqdm import tqdm\n",
11 | "keys = []"
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 41,
17 | "metadata": {},
18 | "outputs": [],
19 | "source": [
20 | "clean = [' ', '、', '(', ')', '台灣', ':']\n",
21 | "def clean_txt(text):\n",
22 | "    for c in clean:\n",
23 | "        text = text.replace(c, '')\n",
24 | "    return text"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 42,
30 | "metadata": {},
31 | "outputs": [
32 | {
33 | "name": "stderr",
34 | "output_type": "stream",
35 | "text": [
36 | "14it [00:16, 1.15s/it]\n"
37 | ]
38 | }
39 | ],
40 | "source": [
41 | "for year in tqdm(range(97, 111)):  # files GRB_97.xml to GRB_110.xml\n",
42 | "    tree = ET.parse(f'./GRB_{year}.xml')\n",
43 | "    root = tree.getroot()\n",
44 | "    for grb05 in root.findall('GRB05'):\n",
45 | "        g = grb05.find('KEYWORD_C')\n",
46 | "        # skip records without a KEYWORD_C element or with empty text\n",
47 | "        if g is None or not g.text:\n",
48 | "            continue\n",
49 | "        # keywords are ';'-separated; split() also handles a single keyword\n",
50 | "        temp = set(g.text.split(';'))\n",
51 | "        # keep only short keywords (fewer than 5 characters after cleaning)\n",
52 | "        temp = [clean_txt(w) for w in temp if len(clean_txt(w)) < 5]\n",
53 | "        keys.extend(temp)"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": null,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": []
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 43,
66 | "metadata": {},
67 | "outputs": [
68 | {
69 | "data": {
70 | "text/plain": [
71 | "159076"
72 | ]
73 | },
74 | "execution_count": 43,
75 | "metadata": {},
76 | "output_type": "execute_result"
77 | }
78 | ],
79 | "source": [
80 | "keys = list(filter(None, keys))\n",
81 | "keys = list(set(keys))\n",
82 | "keys = [clean_txt(k) for k in keys]\n",
83 | "len(keys)"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 44,
89 | "metadata": {},
90 | "outputs": [
91 | {
92 | "name": "stdout",
93 | "output_type": "stream",
94 | "text": [
95 | "南韓\n"
96 | ]
97 | }
98 | ],
99 | "source": [
100 | "print(keys[1])"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 45,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "with open('keys.txt', 'w', encoding='utf-8') as f:\n",
110 | "    for item in keys:\n",
111 | "        f.write(\"%s\\n\" % item)"
112 | ]
113 | }
114 | ],
115 | "metadata": {
116 | "interpreter": {
117 | "hash": "aa781dd5d8b0b47d7fc97d3d29d31ddde4cdca0824a34701095d27847d24a55d"
118 | },
119 | "kernelspec": {
120 | "display_name": "Python 3.9.5 ('base')",
121 | "language": "python",
122 | "name": "python3"
123 | },
124 | "language_info": {
125 | "codemirror_mode": {
126 | "name": "ipython",
127 | "version": 3
128 | },
129 | "file_extension": ".py",
130 | "mimetype": "text/x-python",
131 | "name": "python",
132 | "nbconvert_exporter": "python",
133 | "pygments_lexer": "ipython3",
134 | "version": "3.7.11"
135 | },
136 | "orig_nbformat": 4
137 | },
138 | "nbformat": 4,
139 | "nbformat_minor": 2
140 | }
141 |
--------------------------------------------------------------------------------
/data/stopword.txt:
--------------------------------------------------------------------------------
1 | $
2 | 0
3 | 1
4 | 2
5 | 3
6 | 4
7 | 5
8 | 6
9 | 7
10 | 8
11 | 9
12 | ?
13 | _
14 | “
15 | ”
16 | 、
17 | 。
18 | 《
19 | 》
20 | 一
21 | 一些
22 | 一何
23 | 一切
24 | 一則
25 | 一方面
26 | 一旦
27 | 一來
28 | 一樣
29 | 一般
30 | 一轉眼
31 | 萬一
32 | 上
33 | 上下
34 | 下
35 | 不
36 | 不僅
37 | 不但
38 | 不光
39 | 不單
40 | 不只
41 | 不外乎
42 | 不如
43 | 不妨
44 | 不盡
45 | 不盡然
46 | 不得
47 | 不怕
48 | 不惟
49 | 不成
50 | 不拘
51 | 不料
52 | 不是
53 | 不比
54 | 不然
55 | 不特
56 | 不獨
57 | 不管
58 | 不至於
59 | 不若
60 | 不論
61 | 不過
62 | 不問
63 | 與
64 | 與其
65 | 與其說
66 | 與否
67 | 與此同時
68 | 且
69 | 且不說
70 | 且說
71 | 兩者
72 | 個
73 | 個別
74 | 臨
75 | 為
76 | 為了
77 | 為什麼
78 | 為何
79 | 為止
80 | 為此
81 | 為著
82 | 乃
83 | 乃至
84 | 乃至於
85 | 麼
86 | 之
87 | 之一
88 | 之所以
89 | 之類
90 | 烏乎
91 | 乎
92 | 乘
93 | 也
94 | 也好
95 | 也罷
96 | 了
97 | 二來
98 | 於
99 | 於是
100 | 於是乎
101 | 云云
102 | 云爾
103 | 些
104 | 亦
105 | 人
106 | 人們
107 | 人家
108 | 什麼
109 | 什麼樣
110 | 今
111 | 介於
112 | 仍
113 | 仍舊
114 | 從
115 | 從此
116 | 從而
117 | 他
118 | 他人
119 | 他們
120 | 以
121 | 以上
122 | 以為
123 | 以便
124 | 以免
125 | 以及
126 | 以故
127 | 以期
128 | 以來
129 | 以至
130 | 以至於
131 | 以致
132 | 們
133 | 任
134 | 任何
135 | 任憑
136 | 似的
137 | 但
138 | 但凡
139 | 但是
140 | 何
141 | 何以
142 | 何況
143 | 何處
144 | 何時
145 | 餘外
146 | 作為
147 | 你
148 | 你們
149 | 使
150 | 使得
151 | 例如
152 | 依
153 | 依據
154 | 依照
155 | 便於
156 | 俺
157 | 俺們
158 | 倘
159 | 倘使
160 | 倘或
161 | 倘然
162 | 倘若
163 | 借
164 | 假使
165 | 假如
166 | 假若
167 | 儻然
168 | 像
169 | 兒
170 | 先不先
171 | 光是
172 | 全體
173 | 全部
174 | 兮
175 | 關於
176 | 其
177 | 其一
178 | 其中
179 | 其二
180 | 其他
181 | 其餘
182 | 其它
183 | 其次
184 | 具體地說
185 | 具體說來
186 | 兼之
187 | 內
188 | 再
189 | 再其次
190 | 再則
191 | 再有
192 | 再者
193 | 再者說
194 | 再說
195 | 冒
196 | 衝
197 | 況且
198 | 幾
199 | 幾時
200 | 凡
201 | 凡是
202 | 憑
203 | 憑藉
204 | 出於
205 | 出來
206 | 分別
207 | 則
208 | 則甚
209 | 別
210 | 別人
211 | 別處
212 | 別是
213 | 別的
214 | 別管
215 | 別說
216 | 到
217 | 前後
218 | 前此
219 | 前者
220 | 加之
221 | 加以
222 | 即
223 | 即令
224 | 即使
225 | 即便
226 | 即如
227 | 即或
228 | 即若
229 | 卻
230 | 去
231 | 又
232 | 又及
233 | 及
234 | 及其
235 | 及至
236 | 反之
237 | 反而
238 | 反過來
239 | 反過來說
240 | 受到
241 | 另
242 | 另一方面
243 | 另外
244 | 另悉
245 | 只
246 | 只當
247 | 只怕
248 | 只是
249 | 只有
250 | 只消
251 | 只要
252 | 只限
253 | 叫
254 | 叮咚
255 | 可
256 | 可以
257 | 可是
258 | 可見
259 | 各
260 | 各個
261 | 各位
262 | 各種
263 | 各自
264 | 同
265 | 同時
266 | 後
267 | 後者
268 | 向
269 | 向使
270 | 向著
271 | 嚇
272 | 嗎
273 | 否則
274 | 吧
275 | 吧噠
276 | 吱
277 | 呀
278 | 呃
279 | 嘔
280 | 唄
281 | 嗚
282 | 嗚呼
283 | 呢
284 | 呵
285 | 呵呵
286 | 呸
287 | 呼哧
288 | 咋
289 | 和
290 | 咚
291 | 咦
292 | 咧
293 | 咱
294 | 咱們
295 | 咳
296 | 哇
297 | 哈
298 | 哈哈
299 | 哉
300 | 哎
301 | 哎呀
302 | 哎喲
303 | 嘩
304 | 喲
305 | 哦
306 | 哩
307 | 哪
308 | 哪個
309 | 哪些
310 | 哪兒
311 | 哪天
312 | 哪年
313 | 哪怕
314 | 哪樣
315 | 哪邊
316 | 哪裡
317 | 哼
318 | 哼唷
319 | 唉
320 | 唯有
321 | 啊
322 | 啐
323 | 啥
324 | 啦
325 | 啪達
326 | 啷噹
327 | 餵
328 | 喏
329 | 喔唷
330 | 嘍
331 | 嗡
332 | 嗡嗡
333 | 嗬
334 | 嗯
335 | 噯
336 | 嘎
337 | 嘎登
338 | 噓
339 | 嘛
340 | 嘻
341 | 嘿
342 | 嘿嘿
343 | 因
344 | 因為
345 | 因了
346 | 因此
347 | 因著
348 | 因而
349 | 固然
350 | 在
351 | 在下
352 | 在於
353 | 地
354 | 基於
355 | 處在
356 | 多
357 | 多麼
358 | 多少
359 | 大
360 | 大家
361 | 她
362 | 她們
363 | 好
364 | 如
365 | 如上
366 | 如上所述
367 | 如下
368 | 如何
369 | 如其
370 | 如同
371 | 如是
372 | 如果
373 | 如此
374 | 如若
375 | 始而
376 | 孰料
377 | 孰知
378 | 寧
379 | 寧可
380 | 寧願
381 | 寧肯
382 | 它
383 | 它們
384 | 對
385 | 對於
386 | 對待
387 | 對方
388 | 對比
389 | 將
390 | 小
391 | 爾
392 | 爾後
393 | 爾爾
394 | 尚且
395 | 就
396 | 就是
397 | 就是了
398 | 就是說
399 | 就算
400 | 就要
401 | 盡
402 | 儘管
403 | 儘管如此
404 | 豈但
405 | 己
406 | 已
407 | 已矣
408 | 巴
409 | 巴巴
410 | 並
411 | 並且
412 | 並非
413 | 庶乎
414 | 庶幾
415 | 開外
416 | 開始
417 | 歸
418 | 歸齊
419 | 當
420 | 當地
421 | 當然
422 | 當著
423 | 彼
424 | 彼時
425 | 彼此
426 | 往
427 | 待
428 | 很
429 | 得
430 | 得了
431 | 怎
432 | 怎麼
433 | 怎麼辦
434 | 怎麼樣
435 | 怎奈
436 | 怎樣
437 | 總之
438 | 總的來看
439 | 總的來說
440 | 總的說來
441 | 總而言之
442 | 恰恰相反
443 | 您
444 | 惟其
445 | 慢說
446 | 我
447 | 我們
448 | 或
449 | 或則
450 | 或是
451 | 或曰
452 | 或者
453 | 截至
454 | 所
455 | 所以
456 | 所在
457 | 所幸
458 | 所有
459 | 才
460 | 才能
461 | 打
462 | 打從
463 | 把
464 | 抑或
465 | 拿
466 | 按
467 | 按照
468 | 換句話說
469 | 換言之
470 | 據
471 | 據此
472 | 接著
473 | 故
474 | 故此
475 | 故而
476 | 旁人
477 | 無
478 | 無寧
479 | 無論
480 | 既
481 | 既往
482 | 既是
483 | 既然
484 | 時候
485 | 是
486 | 是以
487 | 是的
488 | 曾
489 | 替
490 | 替代
491 | 最
492 | 有
493 | 有些
494 | 有關
495 | 有及
496 | 有時
497 | 有的
498 | 望
499 | 朝
500 | 朝著
501 | 本
502 | 本人
503 | 本地
504 | 本著
505 | 本身
506 | 來
507 | 來著
508 | 來自
509 | 來說
510 | 極了
511 | 果然
512 | 果真
513 | 某
514 | 某個
515 | 某些
516 | 某某
517 | 根據
518 | 歟
519 | 正值
520 | 正如
521 | 正巧
522 | 正是
523 | 此
524 | 此地
525 | 此處
526 | 此外
527 | 此時
528 | 此次
529 | 此間
530 | 毋寧
531 | 每
532 | 每當
533 | 比
534 | 比及
535 | 比如
536 | 比方
537 | 沒奈何
538 | 沿
539 | 沿著
540 | 漫說
541 | 焉
542 | 然則
543 | 然後
544 | 然而
545 | 照
546 | 照著
547 | 猶且
548 | 猶自
549 | 甚且
550 | 甚麼
551 | 甚或
552 | 甚而
553 | 甚至
554 | 甚至於
555 | 用
556 | 用來
557 | 由
558 | 由於
559 | 由是
560 | 由此
561 | 由此可見
562 | 的
563 | 的確
564 | 的話
565 | 直到
566 | 相對而言
567 | 省得
568 | 看
569 | 眨眼
570 | 著
571 | 著呢
572 | 矣
573 | 矣乎
574 | 矣哉
575 | 離
576 | 竟而
577 | 第
578 | 等
579 | 等到
580 | 等等
581 | 簡言之
582 | 管
583 | 類如
584 | 緊接著
585 | 縱
586 | 縱令
587 | 縱使
588 | 縱然
589 | 經
590 | 經過
591 | 結果
592 | 給
593 | 繼之
594 | 繼後
595 | 繼而
596 | 綜上所述
597 | 罷了
598 | 者
599 | 而
600 | 而且
601 | 而況
602 | 而後
603 | 而外
604 | 而已
605 | 而是
606 | 而言
607 | 能
608 | 能否
609 | 騰
610 | 自
611 | 自個兒
612 | 自從
613 | 自各兒
614 | 自後
615 | 自家
616 | 自己
617 | 自打
618 | 自身
619 | 至
620 | 至於
621 | 至今
622 | 至若
623 | 致
624 | 般的
625 | 若
626 | 若夫
627 | 若是
628 | 若果
629 | 若非
630 | 莫不然
631 | 莫如
632 | 莫若
633 | 雖
634 | 雖則
635 | 雖然
636 | 雖說
637 | 被
638 | 要
639 | 要不
640 | 要不是
641 | 要不然
642 | 要么
643 | 要是
644 | 譬喻
645 | 譬如
646 | 讓
647 | 許多
648 | 論
649 | 設使
650 | 設或
651 | 設若
652 | 誠如
653 | 誠然
654 | 該
655 | 說來
656 | 諸
657 | 諸位
658 | 諸如
659 | 誰
660 | 誰人
661 | 誰料
662 | 誰知
663 | 賊死
664 | 賴以
665 | 趕
666 | 起
667 | 起見
668 | 趁
669 | 趁著
670 | 越是
671 | 距
672 | 跟
673 | 較
674 | 較之
675 | 邊
676 | 過
677 | 還
678 | 還是
679 | 還有
680 | 還要
681 | 這
682 | 這一來
683 | 這個
684 | 這麼
685 | 這麼些
686 | 這麼樣
687 | 這麼點兒
688 | 這些
689 | 這會兒
690 | 這兒
691 | 這就是說
692 | 這時
693 | 這樣
694 | 這次
695 | 這般
696 | 這邊
697 | 這裡
698 | 進而
699 | 連
700 | 連同
701 | 逐步
702 | 通過
703 | 遵循
704 | 遵照
705 | 那
706 | 那個
707 | 那麼
708 | 那麼些
709 | 那麼樣
710 | 那些
711 | 那會兒
712 | 那兒
713 | 那時
714 | 那樣
715 | 那般
716 | 那邊
717 | 那裡
718 | 都
719 | 鄙人
720 | 鑑於
721 | 針對
722 | 阿
723 | 除
724 | 除了
725 | 除外
726 | 除開
727 | 除此之外
728 | 除非
729 | 隨
730 | 隨後
731 | 隨時
732 | 隨著
733 | 難道說
734 | 非但
735 | 非徒
736 | 非特
737 | 非獨
738 | 靠
739 | 順
740 | 順著
741 | 首先
742 | !
743 | ,
744 | :
745 | ;
746 | ?
--------------------------------------------------------------------------------
/exp.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 2,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n",
10 | "from bertopic import BERTopic\n",
11 | "from ckiptagger import construct_dictionary, WS, POS, NER\n",
12 | "from transformers import AutoModelForTokenClassification\n",
13 | "import numpy as np\n",
14 | "import random\n",
15 | "import torch"
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 3,
21 | "metadata": {},
22 | "outputs": [],
23 | "source": [
24 | "def set_seed(seed: int) -> None:\n",
25 | "    random.seed(seed)\n",
26 | "    np.random.seed(seed)\n",
27 | "    torch.manual_seed(seed)\n",
28 | "    torch.cuda.manual_seed_all(seed)\n",
29 | "\n",
30 | "    if torch.cuda.is_available():\n",
31 | "        # Disable cuDNN benchmarking so the same algorithms are selected on every run.\n",
32 | "        torch.backends.cudnn.benchmark = False\n",
33 | "        torch.backends.cudnn.deterministic = True\n",
34 | "\n",
35 | "set_seed(4698)"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 4,
41 | "metadata": {},
42 | "outputs": [
43 | {
44 | "name": "stdout",
45 | "output_type": "stream",
46 | "text": [
47 | "南韓\n"
48 | ]
49 | }
50 | ],
51 | "source": [
52 | "keysfile = \"data/keys.txt\"\n",
53 | "with open(keysfile, encoding=\"utf-8\") as file:\n",
54 | "    lines = file.read().splitlines()\n",
55 | "\n",
56 | "print(lines[1])"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 5,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": [
65 | "keydict = { l: 1 for l in lines}\n",
66 | "dictionary = construct_dictionary(keydict)"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 6,
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "name": "stderr",
76 | "output_type": "stream",
77 | "text": [
78 | "2024-04-10 13:31:21.802783: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled\n"
79 | ]
80 | }
81 | ],
82 | "source": [
83 | "ws = WS(\"./data\")\n",
84 | "pos = POS(\"./data\")\n",
85 | "ner = NER(\"./data\")"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 7,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "df = pd.read_csv(\"data/data.csv\")\n",
95 | "df = df[[\"year\", \"name\", \"label\", \"description\"]]"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 8,
101 | "metadata": {},
102 | "outputs": [],
103 | "source": [
104 | "\n",
105 | "stoptext = open('data/stopword.txt', encoding='utf-8').read()\n",
106 | "stopwords = stoptext.split('\\n')\n"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": 9,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": [
115 | "sentence_list = df[\"description\"].tolist()\n",
116 | "word_sentence_list = ws(\n",
117 | " sentence_list,\n",
118 | " sentence_segmentation = True, # To consider delimiters\n",
119 | " segment_delimiter_set = {\",\", \"。\", \":\", \"?\", \"!\", \";\"}, # This is the default set of delimiters\n",
120 | " recommend_dictionary = dictionary # words in this dictionary are encouraged \n",
121 | ")\n"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 10,
127 | "metadata": {},
128 | "outputs": [],
129 | "source": [
130 | "# Convert to the input BERTopic expects: one space-joined string of words per document\n",
131 | "ws = [\" \".join(w) for w in word_sentence_list]"
132 | ]
133 | },
134 | {
135 | "cell_type": "code",
136 | "execution_count": 11,
137 | "metadata": {},
138 | "outputs": [
139 | {
140 | "name": "stderr",
141 | "output_type": "stream",
142 | "text": [
143 | "2024-04-10 13:34:52,128 - BERTopic - Embedding - Transforming documents to embeddings.\n",
144 | "Batches: 100%|██████████| 99/99 [00:37<00:00, 2.67it/s]\n",
145 | "2024-04-10 13:35:32,437 - BERTopic - Embedding - Completed ✓\n",
146 | "2024-04-10 13:35:32,438 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm\n",
147 | "2024-04-10 13:35:41,960 - BERTopic - Dimensionality - Completed ✓\n",
148 | "2024-04-10 13:35:41,960 - BERTopic - Cluster - Start clustering the reduced embeddings\n",
149 | "2024-04-10 13:35:42,022 - BERTopic - Cluster - Completed ✓\n",
150 | "2024-04-10 13:35:42,026 - BERTopic - Representation - Extracting topics from clusters using representation models.\n",
151 | "2024-04-10 13:35:42,353 - BERTopic - Representation - Completed ✓\n"
152 | ]
153 | }
154 | ],
155 | "source": [
156 | "model = AutoModelForTokenClassification.from_pretrained(\"ckiplab/bert-base-chinese-ws\")\n",
157 | "topic_model = BERTopic(\n",
158 | " language=\"chinese\", \n",
159 | " embedding_model=model, \n",
160 | " verbose=True\n",
161 | ")\n",
162 | "topics, probs = topic_model.fit_transform(ws)\n"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 12,
168 | "metadata": {},
169 | "outputs": [],
170 | "source": [
171 | "timestamps = df.year.tolist()  # use the year column of data.csv (ROC calendar years) as our timestamps\n",
172 | "timestamps = [f\"{int(t) + 1911}-01-01\" for t in timestamps]  # ROC year + 1911 = Gregorian year"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": 19,
178 | "metadata": {},
179 | "outputs": [
180 | {
181 | "name": "stderr",
182 | "output_type": "stream",
183 | "text": [
184 | "0it [00:00, ?it/s]"
185 | ]
186 | },
187 | {
188 | "name": "stderr",
189 | "output_type": "stream",
190 | "text": [
191 | "8it [00:01, 4.88it/s]\n"
192 | ]
193 | },
194 | {
195 | "data": {
196 | "application/vnd.plotly.v1+json": {
197 | "config": {
198 | "plotlyServerURL": "https://plot.ly"
199 | },
200 | "data": [
201 | {
202 | "hoverinfo": "text",
203 | "hovertext": [
204 | "Topic 0<br>Words: 臨床試驗, 計畫, 研究, 開發, 進行",
205 | "Topic 0<br>Words: 疫苗, 計畫, 臨床試驗, 開發, 發展",
206 | "Topic 0<br>Words: 生技, 臨床試驗, 計畫, 研發, 發展",
207 | "Topic 0<br>Words: 新藥, 生技, 中心, 計畫, 疾病",
208 | "Topic 0<br>Words: 中心, 計畫, 生技, 進行, 研發",
209 | "Topic 0<br>Words: 計畫, 中心, 臨床試驗, 開發, 生技",
210 | "Topic 0<br>Words: 計畫, 臨床試驗, 中心, 開發, 發展",
211 | "Topic 0<br>Words: 開發, 中心, 計畫, 研發, 發展"
212 | ],
213 | "marker": {
214 | "color": "#E69F00"
215 | },
216 | "mode": "lines",
217 | "name": "0_計畫_臨床試驗_中心_開發",
218 | "type": "scatter",
219 | "x": [
220 | "2014-01-01T00:00:00",
221 | "2015-01-01T00:00:00",
222 | "2016-01-01T00:00:00",
223 | "2017-01-01T00:00:00",
224 | "2018-01-01T00:00:00",
225 | "2019-01-01T00:00:00",
226 | "2020-01-01T00:00:00",
227 | "2021-01-01T00:00:00"
228 | ],
229 | "y": [
230 | 44,
231 | 48,
232 | 35,
233 | 35,
234 | 42,
235 | 40,
236 | 38,
237 | 48
238 | ]
239 | },
240 | {
241 | "hoverinfo": "text",
242 | "hovertext": [
243 | "Topic 1<br>Words: 智慧電子, 智慧, 服務, 發展, 產業",
244 | "Topic 1<br>Words: 智慧, 智慧電子, 應用, 發展, 產業",
245 | "Topic 1<br>Words: 智慧, 服務, 智財, 平台, 發展",
246 | "Topic 1<br>Words: 智財, 智慧, 服務, 產業, 發展",
247 | "Topic 1<br>Words: 智慧, 服務, 發展, 應用, 產業",
248 | "Topic 1<br>Words: 智慧, 產業, 發展, 服務, 推動",
249 | "Topic 1<br>Words: 智慧, 發展, 應用, 服務, 產業",
250 | "Topic 1<br>Words: 智慧, 5g, 應用, 發展, 產業"
251 | ],
252 | "marker": {
253 | "color": "#56B4E9"
254 | },
255 | "mode": "lines",
256 | "name": "1_智慧_服務_發展_產業",
257 | "type": "scatter",
258 | "x": [
259 | "2014-01-01T00:00:00",
260 | "2015-01-01T00:00:00",
261 | "2016-01-01T00:00:00",
262 | "2017-01-01T00:00:00",
263 | "2018-01-01T00:00:00",
264 | "2019-01-01T00:00:00",
265 | "2020-01-01T00:00:00",
266 | "2021-01-01T00:00:00"
267 | ],
268 | "y": [
269 | 20,
270 | 20,
271 | 22,
272 | 34,
273 | 39,
274 | 43,
275 | 39,
276 | 54
277 | ]
278 | },
279 | {
280 | "hoverinfo": "text",
281 | "hovertext": [
282 | "Topic 2<br>Words: 中子, 工商, 行政, 地震, 實驗",
283 | "Topic 2<br>Words: 中子, 行政, 工商, 實驗, 頻道",
284 | "Topic 2<br>Words: 中子, 人事, 研發成果, 實驗, 行政",
285 | "Topic 2<br>Words: 中子, 行政, 電視臺, 協助, 訊號",
286 | "Topic 2<br>Words: 中子, 頻寬, 行政, 電視臺, 訊號",
287 | "Topic 2<br>Words: 頻寬, 電視臺, 行政, 訊號, mbps",
288 | "Topic 2<br>Words: 頻寬, 訊號, 電視臺, 上鏈, 測謊",
289 | "Topic 2<br>Words: 訊號, 電視臺, 上鏈, 接收, 頻道"
290 | ],
291 | "marker": {
292 | "color": "#009E73"
293 | },
294 | "mode": "lines",
295 | "name": "2_中子_行政_頻道_實驗",
296 | "type": "scatter",
297 | "x": [
298 | "2014-01-01T00:00:00",
299 | "2015-01-01T00:00:00",
300 | "2016-01-01T00:00:00",
301 | "2017-01-01T00:00:00",
302 | "2018-01-01T00:00:00",
303 | "2019-01-01T00:00:00",
304 | "2020-01-01T00:00:00",
305 | "2021-01-01T00:00:00"
306 | ],
307 | "y": [
308 | 19,
309 | 20,
310 | 24,
311 | 27,
312 | 20,
313 | 16,
314 | 16,
315 | 13
316 | ]
317 | },
318 | {
319 | "hoverinfo": "text",
320 | "hovertext": [
321 | "Topic 3<br>Words: 食品, pki, 美食, 食媒性, 病原",
322 | "Topic 3<br>Words: 食品, 食媒性, 加工技術, 品牌, 食品安全",
323 | "Topic 3<br>Words: 食品, 食品產業, 食媒性, 食品安全, 開發",
324 | "Topic 3<br>Words: 食品, 食品產業, 食媒性, 食品安全, 食材",
325 | "Topic 3<br>Words: 食品, 食品產業, 食材, 校園午餐, 加工技術",
326 | "Topic 3<br>Words: 食品, 食品產業, 食材, 開發, 國產",
327 | "Topic 3<br>Words: 食品, 食品產業, 肥胖, 食材, 氣候變遷",
328 | "Topic 3<br>Words: 食品, 品質, 透過, 食品產業, 食品安全"
329 | ],
330 | "marker": {
331 | "color": "#F0E442"
332 | },
333 | "mode": "lines",
334 | "name": "3_食品_食品產業_食品安全_食材",
335 | "type": "scatter",
336 | "x": [
337 | "2014-01-01T00:00:00",
338 | "2015-01-01T00:00:00",
339 | "2016-01-01T00:00:00",
340 | "2017-01-01T00:00:00",
341 | "2018-01-01T00:00:00",
342 | "2019-01-01T00:00:00",
343 | "2020-01-01T00:00:00",
344 | "2021-01-01T00:00:00"
345 | ],
346 | "y": [
347 | 10,
348 | 8,
349 | 12,
350 | 17,
351 | 12,
352 | 10,
353 | 11,
354 | 9
355 | ]
356 | },
357 | {
358 | "hoverinfo": "text",
359 | "hovertext": [
360 | "Topic 4<br>Words: 低溫物流, 園區, 工業, 基礎, 智財",
361 | "Topic 4<br>Words: 石化, 專利布局, 智財, 材料, 高值化",
362 | "Topic 4<br>Words: 材料, 石化, 超高畫質, 技術, 基礎",
363 | "Topic 4<br>Words: 超高畫質, 石化, 製作, 商務, 產業",
364 | "Topic 4<br>Words: 超高畫質, 高階, 高速寬頻, 基礎, 高值化",
365 | "Topic 4<br>Words: 超高畫質, 基礎, 高階, 技術, 設備",
366 | "Topic 4<br>Words: 高速寬頻, 網路, 高值化, 自製, 高階",
367 | "Topic 4<br>Words: 農業素材, 產業, 碳材, 高值化, 基地臺"
368 | ],
369 | "marker": {
370 | "color": "#D55E00"
371 | },
372 | "mode": "lines",
373 | "name": "4_超高畫質_基礎_產業_技術",
374 | "type": "scatter",
375 | "x": [
376 | "2014-01-01T00:00:00",
377 | "2015-01-01T00:00:00",
378 | "2016-01-01T00:00:00",
379 | "2017-01-01T00:00:00",
380 | "2018-01-01T00:00:00",
381 | "2019-01-01T00:00:00",
382 | "2020-01-01T00:00:00",
383 | "2021-01-01T00:00:00"
384 | ],
385 | "y": [
386 | 10,
387 | 8,
388 | 11,
389 | 9,
390 | 10,
391 | 13,
392 | 10,
393 | 8
394 | ]
395 | },
396 | {
397 | "hoverinfo": "text",
398 | "hovertext": [
399 | "Topic 5<br>Words: 地下水, 都會區, 山區, 水文觀測, 防洪",
400 | "Topic 5<br>Words: 地下水, 水資源, 都會區, 水文觀測, 補注區",
401 | "Topic 5<br>Words: 地下水, 混凝土, 綠色水泥, 氣候變遷, 水文觀測",
402 | "Topic 5<br>Words: 地下水, 氣候變遷, 水資源, 水文觀測, 混凝土",
403 | "Topic 5<br>Words: 混凝土, 地下水, 綠色水泥, 水庫, 水資源",
404 | "Topic 5<br>Words: 地下水, 混凝土, 飲用水, 用水, 水庫",
405 | "Topic 5<br>Words: 地下水, 水資源, 用水, 給水, 飲用水",
406 | "Topic 5<br>Words: 飲用水, 地層下陷, 水庫, 地下水, 列管"
407 | ],
408 | "marker": {
409 | "color": "#0072B2"
410 | },
411 | "mode": "lines",
412 | "name": "5_地下水_混凝土_水資源_氣候變遷",
413 | "type": "scatter",
414 | "x": [
415 | "2014-01-01T00:00:00",
416 | "2015-01-01T00:00:00",
417 | "2016-01-01T00:00:00",
418 | "2017-01-01T00:00:00",
419 | "2018-01-01T00:00:00",
420 | "2019-01-01T00:00:00",
421 | "2020-01-01T00:00:00",
422 | "2021-01-01T00:00:00"
423 | ],
424 | "y": [
425 | 9,
426 | 8,
427 | 10,
428 | 11,
429 | 8,
430 | 8,
431 | 8,
432 | 7
433 | ]
434 | },
435 | {
436 | "hoverinfo": "text",
437 | "hovertext": [
438 | "Topic 6<br>Words: 儀器, 光學, 光源, 材料, 真空",
439 | "Topic 6<br>Words: 儀器, 光源, 光學, 材料, 光束線",
440 | "Topic 6<br>Words: 光源, 儀器, 光學, 半導體, 設施",
441 | "Topic 6<br>Words: 光束線, 光源, 光子源, 設施, 實驗",
442 | "Topic 6<br>Words: 光子源, 光源, 光束線, 設施, 台灣",
443 | "Topic 6<br>Words: 儀器, 光子源, 設施, 實驗, 光學",
444 | "Topic 6<br>Words: 設施, 實驗, 光子源, 光源, 顯微術",
445 | "Topic 6<br>Words: 光子源, 實驗, 光源, 用戶, 台灣"
446 | ],
447 | "marker": {
448 | "color": "#CC79A7"
449 | },
450 | "mode": "lines",
451 | "name": "6_光源_光子源_設施_光束線",
452 | "type": "scatter",
453 | "x": [
454 | "2014-01-01T00:00:00",
455 | "2015-01-01T00:00:00",
456 | "2016-01-01T00:00:00",
457 | "2017-01-01T00:00:00",
458 | "2018-01-01T00:00:00",
459 | "2019-01-01T00:00:00",
460 | "2020-01-01T00:00:00",
461 | "2021-01-01T00:00:00"
462 | ],
463 | "y": [
464 | 8,
465 | 10,
466 | 10,
467 | 8,
468 | 6,
469 | 7,
470 | 5,
471 | 5
472 | ]
473 | },
474 | {
475 | "hoverinfo": "text",
476 | "hovertext": [
477 | "Topic 7<br>Words: 未來想像, 核心能力, 課程, 人文, 軟體",
478 | "Topic 7<br>Words: 人文, 社科, 學術, 人文社會, 資料庫",
479 | "Topic 7<br>Words: 人文, 社會科學, 服務業, 領域, 社會企業",
480 | "Topic 7<br>Words: 人文, 社會科學, 圖書館, 領域, 服務業",
481 | "Topic 7<br>Words: 人文, 社會科學, 領域, 圖書館, 數位經濟",
482 | "Topic 7<br>Words: 新住民, 領域, 新創, 敘事力, 計畫",
483 | "Topic 7<br>Words: 社會創新, 社會福利, 新住民, 跨域, 分支",
484 | "Topic 7<br>Words: 社會創新, 社會福利, 分支, 志願服務, 前瞻議題"
485 | ],
486 | "marker": {
487 | "color": "#E69F00"
488 | },
489 | "mode": "lines",
490 | "name": "7_人文_社會科學_社會創新_領域",
491 | "type": "scatter",
492 | "x": [
493 | "2014-01-01T00:00:00",
494 | "2015-01-01T00:00:00",
495 | "2016-01-01T00:00:00",
496 | "2017-01-01T00:00:00",
497 | "2018-01-01T00:00:00",
498 | "2019-01-01T00:00:00",
499 | "2020-01-01T00:00:00",
500 | "2021-01-01T00:00:00"
501 | ],
502 | "y": [
503 | 5,
504 | 5,
505 | 6,
506 | 6,
507 | 6,
508 | 7,
509 | 8,
510 | 6
511 | ]
512 | },
513 | {
514 | "hoverinfo": "text",
515 | "hovertext": [
516 | "Topic 8<br>Words: 交通資訊, 車輛, 開發, 大型, 系統",
517 | "Topic 8<br>Words: 車輛, 模組, 大型, 鋰電池, 系統",
518 | "Topic 8<br>Words: 車輛, 大型, 鋰電池, 高能量, 模組",
519 | "Topic 8<br>Words: 車輛, 自行車, 模組, 運輸, 技術",
520 | "Topic 8<br>Words: 駕駛車, 車輛, 自動, 運輸, 自行車",
521 | "Topic 8<br>Words: 雷達, 高精度, 車輛, 地圖, 自駕",
522 | "Topic 8<br>Words: 車輛, 車牌辨識, kr, 車廠, 雷達",
523 | "Topic 8<br>Words: 自駕, 車牌辨識, 車輛, 漁船, 雷達"
524 | ],
525 | "marker": {
526 | "color": "#56B4E9"
527 | },
528 | "mode": "lines",
529 | "name": "8_車輛_模組_大型_系統",
530 | "type": "scatter",
531 | "x": [
532 | "2014-01-01T00:00:00",
533 | "2015-01-01T00:00:00",
534 | "2016-01-01T00:00:00",
535 | "2017-01-01T00:00:00",
536 | "2018-01-01T00:00:00",
537 | "2019-01-01T00:00:00",
538 | "2020-01-01T00:00:00",
539 | "2021-01-01T00:00:00"
540 | ],
541 | "y": [
542 | 10,
543 | 8,
544 | 4,
545 | 4,
546 | 7,
547 | 5,
548 | 5,
549 | 5
550 | ]
551 | },
552 | {
553 | "hoverinfo": "text",
554 | "hovertext": [
555 | "Topic 9<br>Words: 山崩, 山崩潛勢, 防災教育, 耐震評估, 活動斷層",
556 | "Topic 9<br>Words: 山崩潛勢, 研發成果, 山崩, 輻射監測, 斷層",
557 | "Topic 9<br>Words: 活動斷層, 山崩潛勢, 山崩, 斷層, 研發成果",
558 | "Topic 9<br>Words: 崩塌, 大規模, 山崩, 山崩潛勢, 活動斷層",
559 | "Topic 9<br>Words: 開源, 研發成果, 農地, 開發, 耐震評估",
560 | "Topic 9<br>Words: 研發成果, 農地, 災害韌性, 模型, 大量估價",
561 | "Topic 9<br>Words: 農地, 災害韌性, 農業區, 資訊, 地價",
562 | "Topic 9<br>Words: 地熱, 探勘, 災害韌性, 潛能區, 農地"
563 | ],
564 | "marker": {
565 | "color": "#009E73"
566 | },
567 | "mode": "lines",
568 | "name": "9_研發成果_山崩_山崩潛勢_耐震評估",
569 | "type": "scatter",
570 | "x": [
571 | "2014-01-01T00:00:00",
572 | "2015-01-01T00:00:00",
573 | "2016-01-01T00:00:00",
574 | "2017-01-01T00:00:00",
575 | "2018-01-01T00:00:00",
576 | "2019-01-01T00:00:00",
577 | "2020-01-01T00:00:00",
578 | "2021-01-01T00:00:00"
579 | ],
580 | "y": [
581 | 7,
582 | 7,
583 | 4,
584 | 7,
585 | 5,
586 | 5,
587 | 4,
588 | 5
589 | ]
590 | },
591 | {
592 | "hoverinfo": "text",
593 | "hovertext": [
594 | "Topic 10<br>Words: 研究, 有害生物, 植物, 動物用, 安全衛生",
595 | "Topic 10<br>Words: 研究, 環境毒物, 檢疫, 有害生物, 安全衛生",
596 | "Topic 10<br>Words: 子項, 審驗技術, 研究, 處置, 廢棄物",
597 | "Topic 10<br>Words: 子項, 審驗技術, 研究, 廢棄物, 處置",
598 | "Topic 10<br>Words: 研究, 廢棄物, 子項, 審驗技術, 低放射性",
599 | "Topic 10<br>Words: 研究, 廢棄物, 子項, 低放射性, 處置",
600 | "Topic 10<br>Words: 研究, 勞動, 廢棄物, 資材, 處置",
601 | "Topic 10<br>Words: 勞動, 職業安全, 衛生, 研究, 職場安全"
602 | ],
603 | "marker": {
604 | "color": "#F0E442"
605 | },
606 | "mode": "lines",
607 | "name": "10_研究_廢棄物_子項_勞動",
608 | "type": "scatter",
609 | "x": [
610 | "2014-01-01T00:00:00",
611 | "2015-01-01T00:00:00",
612 | "2016-01-01T00:00:00",
613 | "2017-01-01T00:00:00",
614 | "2018-01-01T00:00:00",
615 | "2019-01-01T00:00:00",
616 | "2020-01-01T00:00:00",
617 | "2021-01-01T00:00:00"
618 | ],
619 | "y": [
620 | 4,
621 | 4,
622 | 5,
623 | 7,
624 | 7,
625 | 7,
626 | 8,
627 | 1
628 | ]
629 | },
630 | {
631 | "hoverinfo": "text",
632 | "hovertext": [
633 | "Topic 11<br>Words: 海洋, 海洋科技, 探測, 地震, 海嘯",
634 | "Topic 11<br>Words: 海洋, 海洋科技, 探測, 研究船, 水合物",
635 | "Topic 11<br>Words: 海洋, 探測, 海洋科技, 地震, 海域",
636 | "Topic 11<br>Words: 海洋, 海洋科技, 海域, 地震, 公里",
637 | "Topic 11<br>Words: 海洋, 海洋科技, 南海, 海象, 海域",
638 | "Topic 11<br>Words: 海洋, 海洋科技, 海洋環境, 研究船, 船舶",
639 | "Topic 11<br>Words: 災防, 海象, 海洋, 海域, 氣象",
640 | "Topic 11<br>Words: 海洋, 海象, 海洋科技, 海域, 空運"
641 | ],
642 | "marker": {
643 | "color": "#D55E00"
644 | },
645 | "mode": "lines",
646 | "name": "11_海洋_海洋科技_海域_探測",
647 | "type": "scatter",
648 | "x": [
649 | "2014-01-01T00:00:00",
650 | "2015-01-01T00:00:00",
651 | "2016-01-01T00:00:00",
652 | "2017-01-01T00:00:00",
653 | "2018-01-01T00:00:00",
654 | "2019-01-01T00:00:00",
655 | "2020-01-01T00:00:00",
656 | "2021-01-01T00:00:00"
657 | ],
658 | "y": [
659 | 6,
660 | 7,
661 | 5,
662 | 7,
663 | 5,
664 | 5,
665 | 3,
666 | 4
667 | ]
668 | }
669 | ],
670 | "layout": {
671 | "height": 450,
672 | "hoverlabel": {
673 | "bgcolor": "white",
674 | "font": {
675 | "family": "Rockwell",
676 | "size": 16
677 | }
678 | },
679 | "legend": {
680 | "title": {
681 | "text": "Global Topic Representation"
682 | }
683 | },
684 | "template": {
685 | "data": {
686 | "bar": [
687 | {
688 | "error_x": {
689 | "color": "rgb(36,36,36)"
690 | },
691 | "error_y": {
692 | "color": "rgb(36,36,36)"
693 | },
694 | "marker": {
695 | "line": {
696 | "color": "white",
697 | "width": 0.5
698 | },
699 | "pattern": {
700 | "fillmode": "overlay",
701 | "size": 10,
702 | "solidity": 0.2
703 | }
704 | },
705 | "type": "bar"
706 | }
707 | ],
708 | "barpolar": [
709 | {
710 | "marker": {
711 | "line": {
712 | "color": "white",
713 | "width": 0.5
714 | },
715 | "pattern": {
716 | "fillmode": "overlay",
717 | "size": 10,
718 | "solidity": 0.2
719 | }
720 | },
721 | "type": "barpolar"
722 | }
723 | ],
724 | "carpet": [
725 | {
726 | "aaxis": {
727 | "endlinecolor": "rgb(36,36,36)",
728 | "gridcolor": "white",
729 | "linecolor": "white",
730 | "minorgridcolor": "white",
731 | "startlinecolor": "rgb(36,36,36)"
732 | },
733 | "baxis": {
734 | "endlinecolor": "rgb(36,36,36)",
735 | "gridcolor": "white",
736 | "linecolor": "white",
737 | "minorgridcolor": "white",
738 | "startlinecolor": "rgb(36,36,36)"
739 | },
740 | "type": "carpet"
741 | }
742 | ],
743 | "choropleth": [
744 | {
745 | "colorbar": {
746 | "outlinewidth": 1,
747 | "tickcolor": "rgb(36,36,36)",
748 | "ticks": "outside"
749 | },
750 | "type": "choropleth"
751 | }
752 | ],
753 | "contour": [
754 | {
755 | "colorbar": {
756 | "outlinewidth": 1,
757 | "tickcolor": "rgb(36,36,36)",
758 | "ticks": "outside"
759 | },
760 | "colorscale": [
761 | [
762 | 0,
763 | "#440154"
764 | ],
765 | [
766 | 0.1111111111111111,
767 | "#482878"
768 | ],
769 | [
770 | 0.2222222222222222,
771 | "#3e4989"
772 | ],
773 | [
774 | 0.3333333333333333,
775 | "#31688e"
776 | ],
777 | [
778 | 0.4444444444444444,
779 | "#26828e"
780 | ],
781 | [
782 | 0.5555555555555556,
783 | "#1f9e89"
784 | ],
785 | [
786 | 0.6666666666666666,
787 | "#35b779"
788 | ],
789 | [
790 | 0.7777777777777778,
791 | "#6ece58"
792 | ],
793 | [
794 | 0.8888888888888888,
795 | "#b5de2b"
796 | ],
797 | [
798 | 1,
799 | "#fde725"
800 | ]
801 | ],
802 | "type": "contour"
803 | }
804 | ],
805 | "contourcarpet": [
806 | {
807 | "colorbar": {
808 | "outlinewidth": 1,
809 | "tickcolor": "rgb(36,36,36)",
810 | "ticks": "outside"
811 | },
812 | "type": "contourcarpet"
813 | }
814 | ],
815 | "heatmap": [
816 | {
817 | "colorbar": {
818 | "outlinewidth": 1,
819 | "tickcolor": "rgb(36,36,36)",
820 | "ticks": "outside"
821 | },
822 | "colorscale": [
823 | [
824 | 0,
825 | "#440154"
826 | ],
827 | [
828 | 0.1111111111111111,
829 | "#482878"
830 | ],
831 | [
832 | 0.2222222222222222,
833 | "#3e4989"
834 | ],
835 | [
836 | 0.3333333333333333,
837 | "#31688e"
838 | ],
839 | [
840 | 0.4444444444444444,
841 | "#26828e"
842 | ],
843 | [
844 | 0.5555555555555556,
845 | "#1f9e89"
846 | ],
847 | [
848 | 0.6666666666666666,
849 | "#35b779"
850 | ],
851 | [
852 | 0.7777777777777778,
853 | "#6ece58"
854 | ],
855 | [
856 | 0.8888888888888888,
857 | "#b5de2b"
858 | ],
859 | [
860 | 1,
861 | "#fde725"
862 | ]
863 | ],
864 | "type": "heatmap"
865 | }
866 | ],
867 | "heatmapgl": [
868 | {
869 | "colorbar": {
870 | "outlinewidth": 1,
871 | "tickcolor": "rgb(36,36,36)",
872 | "ticks": "outside"
873 | },
874 | "colorscale": [
875 | [
876 | 0,
877 | "#440154"
878 | ],
879 | [
880 | 0.1111111111111111,
881 | "#482878"
882 | ],
883 | [
884 | 0.2222222222222222,
885 | "#3e4989"
886 | ],
887 | [
888 | 0.3333333333333333,
889 | "#31688e"
890 | ],
891 | [
892 | 0.4444444444444444,
893 | "#26828e"
894 | ],
895 | [
896 | 0.5555555555555556,
897 | "#1f9e89"
898 | ],
899 | [
900 | 0.6666666666666666,
901 | "#35b779"
902 | ],
903 | [
904 | 0.7777777777777778,
905 | "#6ece58"
906 | ],
907 | [
908 | 0.8888888888888888,
909 | "#b5de2b"
910 | ],
911 | [
912 | 1,
913 | "#fde725"
914 | ]
915 | ],
916 | "type": "heatmapgl"
917 | }
918 | ],
919 | "histogram": [
920 | {
921 | "marker": {
922 | "line": {
923 | "color": "white",
924 | "width": 0.6
925 | }
926 | },
927 | "type": "histogram"
928 | }
929 | ],
930 | "histogram2d": [
931 | {
932 | "colorbar": {
933 | "outlinewidth": 1,
934 | "tickcolor": "rgb(36,36,36)",
935 | "ticks": "outside"
936 | },
937 | "colorscale": [
938 | [
939 | 0,
940 | "#440154"
941 | ],
942 | [
943 | 0.1111111111111111,
944 | "#482878"
945 | ],
946 | [
947 | 0.2222222222222222,
948 | "#3e4989"
949 | ],
950 | [
951 | 0.3333333333333333,
952 | "#31688e"
953 | ],
954 | [
955 | 0.4444444444444444,
956 | "#26828e"
957 | ],
958 | [
959 | 0.5555555555555556,
960 | "#1f9e89"
961 | ],
962 | [
963 | 0.6666666666666666,
964 | "#35b779"
965 | ],
966 | [
967 | 0.7777777777777778,
968 | "#6ece58"
969 | ],
970 | [
971 | 0.8888888888888888,
972 | "#b5de2b"
973 | ],
974 | [
975 | 1,
976 | "#fde725"
977 | ]
978 | ],
979 | "type": "histogram2d"
980 | }
981 | ],
982 | "histogram2dcontour": [
983 | {
984 | "colorbar": {
985 | "outlinewidth": 1,
986 | "tickcolor": "rgb(36,36,36)",
987 | "ticks": "outside"
988 | },
989 | "colorscale": [
990 | [
991 | 0,
992 | "#440154"
993 | ],
994 | [
995 | 0.1111111111111111,
996 | "#482878"
997 | ],
998 | [
999 | 0.2222222222222222,
1000 | "#3e4989"
1001 | ],
1002 | [
1003 | 0.3333333333333333,
1004 | "#31688e"
1005 | ],
1006 | [
1007 | 0.4444444444444444,
1008 | "#26828e"
1009 | ],
1010 | [
1011 | 0.5555555555555556,
1012 | "#1f9e89"
1013 | ],
1014 | [
1015 | 0.6666666666666666,
1016 | "#35b779"
1017 | ],
1018 | [
1019 | 0.7777777777777778,
1020 | "#6ece58"
1021 | ],
1022 | [
1023 | 0.8888888888888888,
1024 | "#b5de2b"
1025 | ],
1026 | [
1027 | 1,
1028 | "#fde725"
1029 | ]
1030 | ],
1031 | "type": "histogram2dcontour"
1032 | }
1033 | ],
1034 | "mesh3d": [
1035 | {
1036 | "colorbar": {
1037 | "outlinewidth": 1,
1038 | "tickcolor": "rgb(36,36,36)",
1039 | "ticks": "outside"
1040 | },
1041 | "type": "mesh3d"
1042 | }
1043 | ],
1044 | "parcoords": [
1045 | {
1046 | "line": {
1047 | "colorbar": {
1048 | "outlinewidth": 1,
1049 | "tickcolor": "rgb(36,36,36)",
1050 | "ticks": "outside"
1051 | }
1052 | },
1053 | "type": "parcoords"
1054 | }
1055 | ],
1056 | "pie": [
1057 | {
1058 | "automargin": true,
1059 | "type": "pie"
1060 | }
1061 | ],
1062 | "scatter": [
1063 | {
1064 | "fillpattern": {
1065 | "fillmode": "overlay",
1066 | "size": 10,
1067 | "solidity": 0.2
1068 | },
1069 | "type": "scatter"
1070 | }
1071 | ],
1072 | "scatter3d": [
1073 | {
1074 | "line": {
1075 | "colorbar": {
1076 | "outlinewidth": 1,
1077 | "tickcolor": "rgb(36,36,36)",
1078 | "ticks": "outside"
1079 | }
1080 | },
1081 | "marker": {
1082 | "colorbar": {
1083 | "outlinewidth": 1,
1084 | "tickcolor": "rgb(36,36,36)",
1085 | "ticks": "outside"
1086 | }
1087 | },
1088 | "type": "scatter3d"
1089 | }
1090 | ],
1091 | "scattercarpet": [
1092 | {
1093 | "marker": {
1094 | "colorbar": {
1095 | "outlinewidth": 1,
1096 | "tickcolor": "rgb(36,36,36)",
1097 | "ticks": "outside"
1098 | }
1099 | },
1100 | "type": "scattercarpet"
1101 | }
1102 | ],
1103 | "scattergeo": [
1104 | {
1105 | "marker": {
1106 | "colorbar": {
1107 | "outlinewidth": 1,
1108 | "tickcolor": "rgb(36,36,36)",
1109 | "ticks": "outside"
1110 | }
1111 | },
1112 | "type": "scattergeo"
1113 | }
1114 | ],
1115 | "scattergl": [
1116 | {
1117 | "marker": {
1118 | "colorbar": {
1119 | "outlinewidth": 1,
1120 | "tickcolor": "rgb(36,36,36)",
1121 | "ticks": "outside"
1122 | }
1123 | },
1124 | "type": "scattergl"
1125 | }
1126 | ],
1127 | "scattermapbox": [
1128 | {
1129 | "marker": {
1130 | "colorbar": {
1131 | "outlinewidth": 1,
1132 | "tickcolor": "rgb(36,36,36)",
1133 | "ticks": "outside"
1134 | }
1135 | },
1136 | "type": "scattermapbox"
1137 | }
1138 | ],
1139 | "scatterpolar": [
1140 | {
1141 | "marker": {
1142 | "colorbar": {
1143 | "outlinewidth": 1,
1144 | "tickcolor": "rgb(36,36,36)",
1145 | "ticks": "outside"
1146 | }
1147 | },
1148 | "type": "scatterpolar"
1149 | }
1150 | ],
1151 | "scatterpolargl": [
1152 | {
1153 | "marker": {
1154 | "colorbar": {
1155 | "outlinewidth": 1,
1156 | "tickcolor": "rgb(36,36,36)",
1157 | "ticks": "outside"
1158 | }
1159 | },
1160 | "type": "scatterpolargl"
1161 | }
1162 | ],
1163 | "scatterternary": [
1164 | {
1165 | "marker": {
1166 | "colorbar": {
1167 | "outlinewidth": 1,
1168 | "tickcolor": "rgb(36,36,36)",
1169 | "ticks": "outside"
1170 | }
1171 | },
1172 | "type": "scatterternary"
1173 | }
1174 | ],
1175 | "surface": [
1176 | {
1177 | "colorbar": {
1178 | "outlinewidth": 1,
1179 | "tickcolor": "rgb(36,36,36)",
1180 | "ticks": "outside"
1181 | },
1182 | "colorscale": [
1183 | [
1184 | 0,
1185 | "#440154"
1186 | ],
1187 | [
1188 | 0.1111111111111111,
1189 | "#482878"
1190 | ],
1191 | [
1192 | 0.2222222222222222,
1193 | "#3e4989"
1194 | ],
1195 | [
1196 | 0.3333333333333333,
1197 | "#31688e"
1198 | ],
1199 | [
1200 | 0.4444444444444444,
1201 | "#26828e"
1202 | ],
1203 | [
1204 | 0.5555555555555556,
1205 | "#1f9e89"
1206 | ],
1207 | [
1208 | 0.6666666666666666,
1209 | "#35b779"
1210 | ],
1211 | [
1212 | 0.7777777777777778,
1213 | "#6ece58"
1214 | ],
1215 | [
1216 | 0.8888888888888888,
1217 | "#b5de2b"
1218 | ],
1219 | [
1220 | 1,
1221 | "#fde725"
1222 | ]
1223 | ],
1224 | "type": "surface"
1225 | }
1226 | ],
1227 | "table": [
1228 | {
1229 | "cells": {
1230 | "fill": {
1231 | "color": "rgb(237,237,237)"
1232 | },
1233 | "line": {
1234 | "color": "white"
1235 | }
1236 | },
1237 | "header": {
1238 | "fill": {
1239 | "color": "rgb(217,217,217)"
1240 | },
1241 | "line": {
1242 | "color": "white"
1243 | }
1244 | },
1245 | "type": "table"
1246 | }
1247 | ]
1248 | },
1249 | "layout": {
1250 | "annotationdefaults": {
1251 | "arrowhead": 0,
1252 | "arrowwidth": 1
1253 | },
1254 | "autotypenumbers": "strict",
1255 | "coloraxis": {
1256 | "colorbar": {
1257 | "outlinewidth": 1,
1258 | "tickcolor": "rgb(36,36,36)",
1259 | "ticks": "outside"
1260 | }
1261 | },
1262 | "colorscale": {
1263 | "diverging": [
1264 | [
1265 | 0,
1266 | "rgb(103,0,31)"
1267 | ],
1268 | [
1269 | 0.1,
1270 | "rgb(178,24,43)"
1271 | ],
1272 | [
1273 | 0.2,
1274 | "rgb(214,96,77)"
1275 | ],
1276 | [
1277 | 0.3,
1278 | "rgb(244,165,130)"
1279 | ],
1280 | [
1281 | 0.4,
1282 | "rgb(253,219,199)"
1283 | ],
1284 | [
1285 | 0.5,
1286 | "rgb(247,247,247)"
1287 | ],
1288 | [
1289 | 0.6,
1290 | "rgb(209,229,240)"
1291 | ],
1292 | [
1293 | 0.7,
1294 | "rgb(146,197,222)"
1295 | ],
1296 | [
1297 | 0.8,
1298 | "rgb(67,147,195)"
1299 | ],
1300 | [
1301 | 0.9,
1302 | "rgb(33,102,172)"
1303 | ],
1304 | [
1305 | 1,
1306 | "rgb(5,48,97)"
1307 | ]
1308 | ],
1309 | "sequential": [
1310 | [
1311 | 0,
1312 | "#440154"
1313 | ],
1314 | [
1315 | 0.1111111111111111,
1316 | "#482878"
1317 | ],
1318 | [
1319 | 0.2222222222222222,
1320 | "#3e4989"
1321 | ],
1322 | [
1323 | 0.3333333333333333,
1324 | "#31688e"
1325 | ],
1326 | [
1327 | 0.4444444444444444,
1328 | "#26828e"
1329 | ],
1330 | [
1331 | 0.5555555555555556,
1332 | "#1f9e89"
1333 | ],
1334 | [
1335 | 0.6666666666666666,
1336 | "#35b779"
1337 | ],
1338 | [
1339 | 0.7777777777777778,
1340 | "#6ece58"
1341 | ],
1342 | [
1343 | 0.8888888888888888,
1344 | "#b5de2b"
1345 | ],
1346 | [
1347 | 1,
1348 | "#fde725"
1349 | ]
1350 | ],
1351 | "sequentialminus": [
1352 | [
1353 | 0,
1354 | "#440154"
1355 | ],
1356 | [
1357 | 0.1111111111111111,
1358 | "#482878"
1359 | ],
1360 | [
1361 | 0.2222222222222222,
1362 | "#3e4989"
1363 | ],
1364 | [
1365 | 0.3333333333333333,
1366 | "#31688e"
1367 | ],
1368 | [
1369 | 0.4444444444444444,
1370 | "#26828e"
1371 | ],
1372 | [
1373 | 0.5555555555555556,
1374 | "#1f9e89"
1375 | ],
1376 | [
1377 | 0.6666666666666666,
1378 | "#35b779"
1379 | ],
1380 | [
1381 | 0.7777777777777778,
1382 | "#6ece58"
1383 | ],
1384 | [
1385 | 0.8888888888888888,
1386 | "#b5de2b"
1387 | ],
1388 | [
1389 | 1,
1390 | "#fde725"
1391 | ]
1392 | ]
1393 | },
1394 | "colorway": [
1395 | "#1F77B4",
1396 | "#FF7F0E",
1397 | "#2CA02C",
1398 | "#D62728",
1399 | "#9467BD",
1400 | "#8C564B",
1401 | "#E377C2",
1402 | "#7F7F7F",
1403 | "#BCBD22",
1404 | "#17BECF"
1405 | ],
1406 | "font": {
1407 | "color": "rgb(36,36,36)"
1408 | },
1409 | "geo": {
1410 | "bgcolor": "white",
1411 | "lakecolor": "white",
1412 | "landcolor": "white",
1413 | "showlakes": true,
1414 | "showland": true,
1415 | "subunitcolor": "white"
1416 | },
1417 | "hoverlabel": {
1418 | "align": "left"
1419 | },
1420 | "hovermode": "closest",
1421 | "mapbox": {
1422 | "style": "light"
1423 | },
1424 | "paper_bgcolor": "white",
1425 | "plot_bgcolor": "white",
1426 | "polar": {
1427 | "angularaxis": {
1428 | "gridcolor": "rgb(232,232,232)",
1429 | "linecolor": "rgb(36,36,36)",
1430 | "showgrid": false,
1431 | "showline": true,
1432 | "ticks": "outside"
1433 | },
1434 | "bgcolor": "white",
1435 | "radialaxis": {
1436 | "gridcolor": "rgb(232,232,232)",
1437 | "linecolor": "rgb(36,36,36)",
1438 | "showgrid": false,
1439 | "showline": true,
1440 | "ticks": "outside"
1441 | }
1442 | },
1443 | "scene": {
1444 | "xaxis": {
1445 | "backgroundcolor": "white",
1446 | "gridcolor": "rgb(232,232,232)",
1447 | "gridwidth": 2,
1448 | "linecolor": "rgb(36,36,36)",
1449 | "showbackground": true,
1450 | "showgrid": false,
1451 | "showline": true,
1452 | "ticks": "outside",
1453 | "zeroline": false,
1454 | "zerolinecolor": "rgb(36,36,36)"
1455 | },
1456 | "yaxis": {
1457 | "backgroundcolor": "white",
1458 | "gridcolor": "rgb(232,232,232)",
1459 | "gridwidth": 2,
1460 | "linecolor": "rgb(36,36,36)",
1461 | "showbackground": true,
1462 | "showgrid": false,
1463 | "showline": true,
1464 | "ticks": "outside",
1465 | "zeroline": false,
1466 | "zerolinecolor": "rgb(36,36,36)"
1467 | },
1468 | "zaxis": {
1469 | "backgroundcolor": "white",
1470 | "gridcolor": "rgb(232,232,232)",
1471 | "gridwidth": 2,
1472 | "linecolor": "rgb(36,36,36)",
1473 | "showbackground": true,
1474 | "showgrid": false,
1475 | "showline": true,
1476 | "ticks": "outside",
1477 | "zeroline": false,
1478 | "zerolinecolor": "rgb(36,36,36)"
1479 | }
1480 | },
1481 | "shapedefaults": {
1482 | "fillcolor": "black",
1483 | "line": {
1484 | "width": 0
1485 | },
1486 | "opacity": 0.3
1487 | },
1488 | "ternary": {
1489 | "aaxis": {
1490 | "gridcolor": "rgb(232,232,232)",
1491 | "linecolor": "rgb(36,36,36)",
1492 | "showgrid": false,
1493 | "showline": true,
1494 | "ticks": "outside"
1495 | },
1496 | "baxis": {
1497 | "gridcolor": "rgb(232,232,232)",
1498 | "linecolor": "rgb(36,36,36)",
1499 | "showgrid": false,
1500 | "showline": true,
1501 | "ticks": "outside"
1502 | },
1503 | "bgcolor": "white",
1504 | "caxis": {
1505 | "gridcolor": "rgb(232,232,232)",
1506 | "linecolor": "rgb(36,36,36)",
1507 | "showgrid": false,
1508 | "showline": true,
1509 | "ticks": "outside"
1510 | }
1511 | },
1512 | "title": {
1513 | "x": 0.05
1514 | },
1515 | "xaxis": {
1516 | "automargin": true,
1517 | "gridcolor": "rgb(232,232,232)",
1518 | "linecolor": "rgb(36,36,36)",
1519 | "showgrid": false,
1520 | "showline": true,
1521 | "ticks": "outside",
1522 | "title": {
1523 | "standoff": 15
1524 | },
1525 | "zeroline": false,
1526 | "zerolinecolor": "rgb(36,36,36)"
1527 | },
1528 | "yaxis": {
1529 | "automargin": true,
1530 | "gridcolor": "rgb(232,232,232)",
1531 | "linecolor": "rgb(36,36,36)",
1532 | "showgrid": false,
1533 | "showline": true,
1534 | "ticks": "outside",
1535 | "title": {
1536 | "standoff": 15
1537 | },
1538 | "zeroline": false,
1539 | "zerolinecolor": "rgb(36,36,36)"
1540 | }
1541 | }
1542 | },
1543 | "title": {
1544 | "font": {
1545 | "color": "Black",
1546 | "size": 22
1547 | },
1548 | "text": "Topics over Time",
1549 | "x": 0.4,
1550 | "xanchor": "center",
1551 | "y": 0.95,
1552 | "yanchor": "top"
1553 | },
1554 | "width": 1000,
1555 | "xaxis": {
1556 | "showgrid": true
1557 | },
1558 | "yaxis": {
1559 | "showgrid": true,
1560 | "title": {
1561 | "text": "Frequency"
1562 | }
1563 | }
1564 | }
1565 | }
1566 | },
1567 | "metadata": {},
1568 | "output_type": "display_data"
1569 | }
1570 | ],
1571 | "source": [
1572 | "# Time-series plot of each topic\n",
1573 | "topics_over_time = topic_model.topics_over_time(\n",
1574 | " ws, \n",
1575 | " timestamps, \n",
1576 | ")\n",
1577 | "tot_fig = topic_model.visualize_topics_over_time(\n",
1578 | " topics_over_time, top_n_topics=12, width=1000\n",
1579 | ")\n",
1580 | "tot_fig"
1581 | ]
1582 | },
1583 | {
1584 | "cell_type": "code",
1585 | "execution_count": 17,
1586 | "metadata": {},
1587 | "outputs": [
1588 | {
1589 | "data": {
1590 | "application/vnd.plotly.v1+json": {
1591 | "config": {
1592 | "plotlyServerURL": "https://plot.ly"
1593 | },
1594 | "data": [
1595 | {
1596 | "marker": {
1597 | "color": "#D55E00"
1598 | },
1599 | "orientation": "h",
1600 | "type": "bar",
1601 | "x": [
1602 | 0.010036117168846038,
1603 | 0.010358506547658615,
1604 | 0.0106788410320816,
1605 | 0.010698859633264288,
1606 | 0.011671864319967307
1607 | ],
1608 | "xaxis": "x",
1609 | "y": [
1610 | "生技 ",
1611 | "開發 ",
1612 | "中心 ",
1613 | "臨床試驗 ",
1614 | "計畫 "
1615 | ],
1616 | "yaxis": "y"
1617 | },
1618 | {
1619 | "marker": {
1620 | "color": "#0072B2"
1621 | },
1622 | "orientation": "h",
1623 | "type": "bar",
1624 | "x": [
1625 | 0.012840369315768275,
1626 | 0.013344711270460653,
1627 | 0.013531355294615246,
1628 | 0.014271351261196953,
1629 | 0.0162898165763392
1630 | ],
1631 | "xaxis": "x2",
1632 | "y": [
1633 | "應用 ",
1634 | "產業 ",
1635 | "發展 ",
1636 | "服務 ",
1637 | "智慧 "
1638 | ],
1639 | "yaxis": "y2"
1640 | },
1641 | {
1642 | "marker": {
1643 | "color": "#CC79A7"
1644 | },
1645 | "orientation": "h",
1646 | "type": "bar",
1647 | "x": [
1648 | 0.01578994069289869,
1649 | 0.01583388449071344,
1650 | 0.015931341674847187,
1651 | 0.019298052269421336,
1652 | 0.024931485304576874
1653 | ],
1654 | "xaxis": "x3",
1655 | "y": [
1656 | "訊號 ",
1657 | "實驗 ",
1658 | "頻道 ",
1659 | "行政 ",
1660 | "中子 "
1661 | ],
1662 | "yaxis": "y3"
1663 | },
1664 | {
1665 | "marker": {
1666 | "color": "#E69F00"
1667 | },
1668 | "orientation": "h",
1669 | "type": "bar",
1670 | "x": [
1671 | 0.018245889255145812,
1672 | 0.018270356399692125,
1673 | 0.0183354611928508,
1674 | 0.023199971305239788,
1675 | 0.04340033768876724
1676 | ],
1677 | "xaxis": "x4",
1678 | "y": [
1679 | "食媒性 ",
1680 | "食材 ",
1681 | "食品安全 ",
1682 | "食品產業 ",
1683 | "食品 "
1684 | ],
1685 | "yaxis": "y4"
1686 | },
1687 | {
1688 | "marker": {
1689 | "color": "#56B4E9"
1690 | },
1691 | "orientation": "h",
1692 | "type": "bar",
1693 | "x": [
1694 | 0.016462123966053496,
1695 | 0.016819153912223883,
1696 | 0.016951842528449633,
1697 | 0.018055115167948405,
1698 | 0.020226475421989475
1699 | ],
1700 | "xaxis": "x5",
1701 | "y": [
1702 | "石化 ",
1703 | "技術 ",
1704 | "產業 ",
1705 | "基礎 ",
1706 | "超高畫質 "
1707 | ],
1708 | "yaxis": "y5"
1709 | },
1710 | {
1711 | "marker": {
1712 | "color": "#009E73"
1713 | },
1714 | "orientation": "h",
1715 | "type": "bar",
1716 | "x": [
1717 | 0.019213738809956207,
1718 | 0.019644641014467952,
1719 | 0.022262091699000144,
1720 | 0.022617764817073394,
1721 | 0.044360590702848667
1722 | ],
1723 | "xaxis": "x6",
1724 | "y": [
1725 | "用水 ",
1726 | "氣候變遷 ",
1727 | "水資源 ",
1728 | "混凝土 ",
1729 | "地下水 "
1730 | ],
1731 | "yaxis": "y6"
1732 | },
1733 | {
1734 | "marker": {
1735 | "color": "#F0E442"
1736 | },
1737 | "orientation": "h",
1738 | "type": "bar",
1739 | "x": [
1740 | 0.025989495319065685,
1741 | 0.026155895875823284,
1742 | 0.02724730513272366,
1743 | 0.02849321310375216,
1744 | 0.030703461179395986
1745 | ],
1746 | "xaxis": "x7",
1747 | "y": [
1748 | "儀器 ",
1749 | "光束線 ",
1750 | "設施 ",
1751 | "光子源 ",
1752 | "光源 "
1753 | ],
1754 | "yaxis": "y7"
1755 | },
1756 | {
1757 | "marker": {
1758 | "color": "#D55E00"
1759 | },
1760 | "orientation": "h",
1761 | "type": "bar",
1762 | "x": [
1763 | 0.016178798001916274,
1764 | 0.019996998644316036,
1765 | 0.02117393601397003,
1766 | 0.022820429423951127,
1767 | 0.027014113121812827
1768 | ],
1769 | "xaxis": "x8",
1770 | "y": [
1771 | "學術 ",
1772 | "領域 ",
1773 | "社會創新 ",
1774 | "社會科學 ",
1775 | "人文 "
1776 | ],
1777 | "yaxis": "y8"
1778 | },
1779 | {
1780 | "marker": {
1781 | "color": "#0072B2"
1782 | },
1783 | "orientation": "h",
1784 | "type": "bar",
1785 | "x": [
1786 | 0.016823737053654045,
1787 | 0.01705607984260705,
1788 | 0.01774483437408217,
1789 | 0.01819365272349642,
1790 | 0.032203601246784626
1791 | ],
1792 | "xaxis": "x9",
1793 | "y": [
1794 | "技術 ",
1795 | "系統 ",
1796 | "大型 ",
1797 | "模組 ",
1798 | "車輛 "
1799 | ],
1800 | "yaxis": "y9"
1801 | },
1802 | {
1803 | "marker": {
1804 | "color": "#CC79A7"
1805 | },
1806 | "orientation": "h",
1807 | "type": "bar",
1808 | "x": [
1809 | 0.017640395275580457,
1810 | 0.01975905514443746,
1811 | 0.02036147094608936,
1812 | 0.021217334704581526,
1813 | 0.02204602573042941
1814 | ],
1815 | "xaxis": "x10",
1816 | "y": [
1817 | "活動斷層 ",
1818 | "耐震評估 ",
1819 | "山崩潛勢 ",
1820 | "山崩 ",
1821 | "研發成果 "
1822 | ],
1823 | "yaxis": "y10"
1824 | },
1825 | {
1826 | "marker": {
1827 | "color": "#E69F00"
1828 | },
1829 | "orientation": "h",
1830 | "type": "bar",
1831 | "x": [
1832 | 0.0328218556090127,
1833 | 0.03295156206050777,
1834 | 0.0370506974653674,
1835 | 0.03761736203756081,
1836 | 0.0530372794541444
1837 | ],
1838 | "xaxis": "x11",
1839 | "y": [
1840 | "處置 ",
1841 | "勞動 ",
1842 | "子項 ",
1843 | "廢棄物 ",
1844 | "研究 "
1845 | ],
1846 | "yaxis": "y11"
1847 | },
1848 | {
1849 | "marker": {
1850 | "color": "#56B4E9"
1851 | },
1852 | "orientation": "h",
1853 | "type": "bar",
1854 | "x": [
1855 | 0.024299717862355803,
1856 | 0.02456976894408899,
1857 | 0.025888136519192917,
1858 | 0.04252718243236,
1859 | 0.07990365608402077
1860 | ],
1861 | "xaxis": "x12",
1862 | "y": [
1863 | "海象 ",
1864 | "探測 ",
1865 | "海域 ",
1866 | "海洋科技 ",
1867 | "海洋 "
1868 | ],
1869 | "yaxis": "y12"
1870 | }
1871 | ],
1872 | "layout": {
1873 | "annotations": [
1874 | {
1875 | "font": {
1876 | "size": 16
1877 | },
1878 | "showarrow": false,
1879 | "text": "Topic 0",
1880 | "x": 0.0875,
1881 | "xanchor": "center",
1882 | "xref": "paper",
1883 | "y": 1,
1884 | "yanchor": "bottom",
1885 | "yref": "paper"
1886 | },
1887 | {
1888 | "font": {
1889 | "size": 16
1890 | },
1891 | "showarrow": false,
1892 | "text": "Topic 1",
1893 | "x": 0.36250000000000004,
1894 | "xanchor": "center",
1895 | "xref": "paper",
1896 | "y": 1,
1897 | "yanchor": "bottom",
1898 | "yref": "paper"
1899 | },
1900 | {
1901 | "font": {
1902 | "size": 16
1903 | },
1904 | "showarrow": false,
1905 | "text": "Topic 2",
1906 | "x": 0.6375000000000001,
1907 | "xanchor": "center",
1908 | "xref": "paper",
1909 | "y": 1,
1910 | "yanchor": "bottom",
1911 | "yref": "paper"
1912 | },
1913 | {
1914 | "font": {
1915 | "size": 16
1916 | },
1917 | "showarrow": false,
1918 | "text": "Topic 3",
1919 | "x": 0.9125,
1920 | "xanchor": "center",
1921 | "xref": "paper",
1922 | "y": 1,
1923 | "yanchor": "bottom",
1924 | "yref": "paper"
1925 | },
1926 | {
1927 | "font": {
1928 | "size": 16
1929 | },
1930 | "showarrow": false,
1931 | "text": "Topic 4",
1932 | "x": 0.0875,
1933 | "xanchor": "center",
1934 | "xref": "paper",
1935 | "y": 0.6222222222222222,
1936 | "yanchor": "bottom",
1937 | "yref": "paper"
1938 | },
1939 | {
1940 | "font": {
1941 | "size": 16
1942 | },
1943 | "showarrow": false,
1944 | "text": "Topic 5",
1945 | "x": 0.36250000000000004,
1946 | "xanchor": "center",
1947 | "xref": "paper",
1948 | "y": 0.6222222222222222,
1949 | "yanchor": "bottom",
1950 | "yref": "paper"
1951 | },
1952 | {
1953 | "font": {
1954 | "size": 16
1955 | },
1956 | "showarrow": false,
1957 | "text": "Topic 6",
1958 | "x": 0.6375000000000001,
1959 | "xanchor": "center",
1960 | "xref": "paper",
1961 | "y": 0.6222222222222222,
1962 | "yanchor": "bottom",
1963 | "yref": "paper"
1964 | },
1965 | {
1966 | "font": {
1967 | "size": 16
1968 | },
1969 | "showarrow": false,
1970 | "text": "Topic 7",
1971 | "x": 0.9125,
1972 | "xanchor": "center",
1973 | "xref": "paper",
1974 | "y": 0.6222222222222222,
1975 | "yanchor": "bottom",
1976 | "yref": "paper"
1977 | },
1978 | {
1979 | "font": {
1980 | "size": 16
1981 | },
1982 | "showarrow": false,
1983 | "text": "Topic 8",
1984 | "x": 0.0875,
1985 | "xanchor": "center",
1986 | "xref": "paper",
1987 | "y": 0.24444444444444446,
1988 | "yanchor": "bottom",
1989 | "yref": "paper"
1990 | },
1991 | {
1992 | "font": {
1993 | "size": 16
1994 | },
1995 | "showarrow": false,
1996 | "text": "Topic 9",
1997 | "x": 0.36250000000000004,
1998 | "xanchor": "center",
1999 | "xref": "paper",
2000 | "y": 0.24444444444444446,
2001 | "yanchor": "bottom",
2002 | "yref": "paper"
2003 | },
2004 | {
2005 | "font": {
2006 | "size": 16
2007 | },
2008 | "showarrow": false,
2009 | "text": "Topic 10",
2010 | "x": 0.6375000000000001,
2011 | "xanchor": "center",
2012 | "xref": "paper",
2013 | "y": 0.24444444444444446,
2014 | "yanchor": "bottom",
2015 | "yref": "paper"
2016 | },
2017 | {
2018 | "font": {
2019 | "size": 16
2020 | },
2021 | "showarrow": false,
2022 | "text": "Topic 11",
2023 | "x": 0.9125,
2024 | "xanchor": "center",
2025 | "xref": "paper",
2026 | "y": 0.24444444444444446,
2027 | "yanchor": "bottom",
2028 | "yref": "paper"
2029 | }
2030 | ],
2031 | "height": 750,
2032 | "hoverlabel": {
2033 | "bgcolor": "white",
2034 | "font": {
2035 | "family": "Rockwell",
2036 | "size": 16
2037 | }
2038 | },
2039 | "showlegend": false,
2040 | "template": {
2041 | "data": {
2042 | "bar": [
2043 | {
2044 | "error_x": {
2045 | "color": "#2a3f5f"
2046 | },
2047 | "error_y": {
2048 | "color": "#2a3f5f"
2049 | },
2050 | "marker": {
2051 | "line": {
2052 | "color": "white",
2053 | "width": 0.5
2054 | },
2055 | "pattern": {
2056 | "fillmode": "overlay",
2057 | "size": 10,
2058 | "solidity": 0.2
2059 | }
2060 | },
2061 | "type": "bar"
2062 | }
2063 | ],
2064 | "barpolar": [
2065 | {
2066 | "marker": {
2067 | "line": {
2068 | "color": "white",
2069 | "width": 0.5
2070 | },
2071 | "pattern": {
2072 | "fillmode": "overlay",
2073 | "size": 10,
2074 | "solidity": 0.2
2075 | }
2076 | },
2077 | "type": "barpolar"
2078 | }
2079 | ],
2080 | "carpet": [
2081 | {
2082 | "aaxis": {
2083 | "endlinecolor": "#2a3f5f",
2084 | "gridcolor": "#C8D4E3",
2085 | "linecolor": "#C8D4E3",
2086 | "minorgridcolor": "#C8D4E3",
2087 | "startlinecolor": "#2a3f5f"
2088 | },
2089 | "baxis": {
2090 | "endlinecolor": "#2a3f5f",
2091 | "gridcolor": "#C8D4E3",
2092 | "linecolor": "#C8D4E3",
2093 | "minorgridcolor": "#C8D4E3",
2094 | "startlinecolor": "#2a3f5f"
2095 | },
2096 | "type": "carpet"
2097 | }
2098 | ],
2099 | "choropleth": [
2100 | {
2101 | "colorbar": {
2102 | "outlinewidth": 0,
2103 | "ticks": ""
2104 | },
2105 | "type": "choropleth"
2106 | }
2107 | ],
2108 | "contour": [
2109 | {
2110 | "colorbar": {
2111 | "outlinewidth": 0,
2112 | "ticks": ""
2113 | },
2114 | "colorscale": [
2115 | [
2116 | 0,
2117 | "#0d0887"
2118 | ],
2119 | [
2120 | 0.1111111111111111,
2121 | "#46039f"
2122 | ],
2123 | [
2124 | 0.2222222222222222,
2125 | "#7201a8"
2126 | ],
2127 | [
2128 | 0.3333333333333333,
2129 | "#9c179e"
2130 | ],
2131 | [
2132 | 0.4444444444444444,
2133 | "#bd3786"
2134 | ],
2135 | [
2136 | 0.5555555555555556,
2137 | "#d8576b"
2138 | ],
2139 | [
2140 | 0.6666666666666666,
2141 | "#ed7953"
2142 | ],
2143 | [
2144 | 0.7777777777777778,
2145 | "#fb9f3a"
2146 | ],
2147 | [
2148 | 0.8888888888888888,
2149 | "#fdca26"
2150 | ],
2151 | [
2152 | 1,
2153 | "#f0f921"
2154 | ]
2155 | ],
2156 | "type": "contour"
2157 | }
2158 | ],
2159 | "contourcarpet": [
2160 | {
2161 | "colorbar": {
2162 | "outlinewidth": 0,
2163 | "ticks": ""
2164 | },
2165 | "type": "contourcarpet"
2166 | }
2167 | ],
2168 | "heatmap": [
2169 | {
2170 | "colorbar": {
2171 | "outlinewidth": 0,
2172 | "ticks": ""
2173 | },
2174 | "colorscale": [
2175 | [
2176 | 0,
2177 | "#0d0887"
2178 | ],
2179 | [
2180 | 0.1111111111111111,
2181 | "#46039f"
2182 | ],
2183 | [
2184 | 0.2222222222222222,
2185 | "#7201a8"
2186 | ],
2187 | [
2188 | 0.3333333333333333,
2189 | "#9c179e"
2190 | ],
2191 | [
2192 | 0.4444444444444444,
2193 | "#bd3786"
2194 | ],
2195 | [
2196 | 0.5555555555555556,
2197 | "#d8576b"
2198 | ],
2199 | [
2200 | 0.6666666666666666,
2201 | "#ed7953"
2202 | ],
2203 | [
2204 | 0.7777777777777778,
2205 | "#fb9f3a"
2206 | ],
2207 | [
2208 | 0.8888888888888888,
2209 | "#fdca26"
2210 | ],
2211 | [
2212 | 1,
2213 | "#f0f921"
2214 | ]
2215 | ],
2216 | "type": "heatmap"
2217 | }
2218 | ],
2219 | "heatmapgl": [
2220 | {
2221 | "colorbar": {
2222 | "outlinewidth": 0,
2223 | "ticks": ""
2224 | },
2225 | "colorscale": [
2226 | [
2227 | 0,
2228 | "#0d0887"
2229 | ],
2230 | [
2231 | 0.1111111111111111,
2232 | "#46039f"
2233 | ],
2234 | [
2235 | 0.2222222222222222,
2236 | "#7201a8"
2237 | ],
2238 | [
2239 | 0.3333333333333333,
2240 | "#9c179e"
2241 | ],
2242 | [
2243 | 0.4444444444444444,
2244 | "#bd3786"
2245 | ],
2246 | [
2247 | 0.5555555555555556,
2248 | "#d8576b"
2249 | ],
2250 | [
2251 | 0.6666666666666666,
2252 | "#ed7953"
2253 | ],
2254 | [
2255 | 0.7777777777777778,
2256 | "#fb9f3a"
2257 | ],
2258 | [
2259 | 0.8888888888888888,
2260 | "#fdca26"
2261 | ],
2262 | [
2263 | 1,
2264 | "#f0f921"
2265 | ]
2266 | ],
2267 | "type": "heatmapgl"
2268 | }
2269 | ],
2270 | "histogram": [
2271 | {
2272 | "marker": {
2273 | "pattern": {
2274 | "fillmode": "overlay",
2275 | "size": 10,
2276 | "solidity": 0.2
2277 | }
2278 | },
2279 | "type": "histogram"
2280 | }
2281 | ],
2282 | "histogram2d": [
2283 | {
2284 | "colorbar": {
2285 | "outlinewidth": 0,
2286 | "ticks": ""
2287 | },
2288 | "colorscale": [
2289 | [
2290 | 0,
2291 | "#0d0887"
2292 | ],
2293 | [
2294 | 0.1111111111111111,
2295 | "#46039f"
2296 | ],
2297 | [
2298 | 0.2222222222222222,
2299 | "#7201a8"
2300 | ],
2301 | [
2302 | 0.3333333333333333,
2303 | "#9c179e"
2304 | ],
2305 | [
2306 | 0.4444444444444444,
2307 | "#bd3786"
2308 | ],
2309 | [
2310 | 0.5555555555555556,
2311 | "#d8576b"
2312 | ],
2313 | [
2314 | 0.6666666666666666,
2315 | "#ed7953"
2316 | ],
2317 | [
2318 | 0.7777777777777778,
2319 | "#fb9f3a"
2320 | ],
2321 | [
2322 | 0.8888888888888888,
2323 | "#fdca26"
2324 | ],
2325 | [
2326 | 1,
2327 | "#f0f921"
2328 | ]
2329 | ],
2330 | "type": "histogram2d"
2331 | }
2332 | ],
2333 | "histogram2dcontour": [
2334 | {
2335 | "colorbar": {
2336 | "outlinewidth": 0,
2337 | "ticks": ""
2338 | },
2339 | "colorscale": [
2340 | [
2341 | 0,
2342 | "#0d0887"
2343 | ],
2344 | [
2345 | 0.1111111111111111,
2346 | "#46039f"
2347 | ],
2348 | [
2349 | 0.2222222222222222,
2350 | "#7201a8"
2351 | ],
2352 | [
2353 | 0.3333333333333333,
2354 | "#9c179e"
2355 | ],
2356 | [
2357 | 0.4444444444444444,
2358 | "#bd3786"
2359 | ],
2360 | [
2361 | 0.5555555555555556,
2362 | "#d8576b"
2363 | ],
2364 | [
2365 | 0.6666666666666666,
2366 | "#ed7953"
2367 | ],
2368 | [
2369 | 0.7777777777777778,
2370 | "#fb9f3a"
2371 | ],
2372 | [
2373 | 0.8888888888888888,
2374 | "#fdca26"
2375 | ],
2376 | [
2377 | 1,
2378 | "#f0f921"
2379 | ]
2380 | ],
2381 | "type": "histogram2dcontour"
2382 | }
2383 | ],
2384 | "mesh3d": [
2385 | {
2386 | "colorbar": {
2387 | "outlinewidth": 0,
2388 | "ticks": ""
2389 | },
2390 | "type": "mesh3d"
2391 | }
2392 | ],
2393 | "parcoords": [
2394 | {
2395 | "line": {
2396 | "colorbar": {
2397 | "outlinewidth": 0,
2398 | "ticks": ""
2399 | }
2400 | },
2401 | "type": "parcoords"
2402 | }
2403 | ],
2404 | "pie": [
2405 | {
2406 | "automargin": true,
2407 | "type": "pie"
2408 | }
2409 | ],
2410 | "scatter": [
2411 | {
2412 | "fillpattern": {
2413 | "fillmode": "overlay",
2414 | "size": 10,
2415 | "solidity": 0.2
2416 | },
2417 | "type": "scatter"
2418 | }
2419 | ],
2420 | "scatter3d": [
2421 | {
2422 | "line": {
2423 | "colorbar": {
2424 | "outlinewidth": 0,
2425 | "ticks": ""
2426 | }
2427 | },
2428 | "marker": {
2429 | "colorbar": {
2430 | "outlinewidth": 0,
2431 | "ticks": ""
2432 | }
2433 | },
2434 | "type": "scatter3d"
2435 | }
2436 | ],
2437 | "scattercarpet": [
2438 | {
2439 | "marker": {
2440 | "colorbar": {
2441 | "outlinewidth": 0,
2442 | "ticks": ""
2443 | }
2444 | },
2445 | "type": "scattercarpet"
2446 | }
2447 | ],
2448 | "scattergeo": [
2449 | {
2450 | "marker": {
2451 | "colorbar": {
2452 | "outlinewidth": 0,
2453 | "ticks": ""
2454 | }
2455 | },
2456 | "type": "scattergeo"
2457 | }
2458 | ],
2459 | "scattergl": [
2460 | {
2461 | "marker": {
2462 | "colorbar": {
2463 | "outlinewidth": 0,
2464 | "ticks": ""
2465 | }
2466 | },
2467 | "type": "scattergl"
2468 | }
2469 | ],
2470 | "scattermapbox": [
2471 | {
2472 | "marker": {
2473 | "colorbar": {
2474 | "outlinewidth": 0,
2475 | "ticks": ""
2476 | }
2477 | },
2478 | "type": "scattermapbox"
2479 | }
2480 | ],
2481 | "scatterpolar": [
2482 | {
2483 | "marker": {
2484 | "colorbar": {
2485 | "outlinewidth": 0,
2486 | "ticks": ""
2487 | }
2488 | },
2489 | "type": "scatterpolar"
2490 | }
2491 | ],
2492 | "scatterpolargl": [
2493 | {
2494 | "marker": {
2495 | "colorbar": {
2496 | "outlinewidth": 0,
2497 | "ticks": ""
2498 | }
2499 | },
2500 | "type": "scatterpolargl"
2501 | }
2502 | ],
2503 | "scatterternary": [
2504 | {
2505 | "marker": {
2506 | "colorbar": {
2507 | "outlinewidth": 0,
2508 | "ticks": ""
2509 | }
2510 | },
2511 | "type": "scatterternary"
2512 | }
2513 | ],
2514 | "surface": [
2515 | {
2516 | "colorbar": {
2517 | "outlinewidth": 0,
2518 | "ticks": ""
2519 | },
2520 | "colorscale": [
2521 | [
2522 | 0,
2523 | "#0d0887"
2524 | ],
2525 | [
2526 | 0.1111111111111111,
2527 | "#46039f"
2528 | ],
2529 | [
2530 | 0.2222222222222222,
2531 | "#7201a8"
2532 | ],
2533 | [
2534 | 0.3333333333333333,
2535 | "#9c179e"
2536 | ],
2537 | [
2538 | 0.4444444444444444,
2539 | "#bd3786"
2540 | ],
2541 | [
2542 | 0.5555555555555556,
2543 | "#d8576b"
2544 | ],
2545 | [
2546 | 0.6666666666666666,
2547 | "#ed7953"
2548 | ],
2549 | [
2550 | 0.7777777777777778,
2551 | "#fb9f3a"
2552 | ],
2553 | [
2554 | 0.8888888888888888,
2555 | "#fdca26"
2556 | ],
2557 | [
2558 | 1,
2559 | "#f0f921"
2560 | ]
2561 | ],
2562 | "type": "surface"
2563 | }
2564 | ],
2565 | "table": [
2566 | {
2567 | "cells": {
2568 | "fill": {
2569 | "color": "#EBF0F8"
2570 | },
2571 | "line": {
2572 | "color": "white"
2573 | }
2574 | },
2575 | "header": {
2576 | "fill": {
2577 | "color": "#C8D4E3"
2578 | },
2579 | "line": {
2580 | "color": "white"
2581 | }
2582 | },
2583 | "type": "table"
2584 | }
2585 | ]
2586 | },
2587 | "layout": {
2588 | "annotationdefaults": {
2589 | "arrowcolor": "#2a3f5f",
2590 | "arrowhead": 0,
2591 | "arrowwidth": 1
2592 | },
2593 | "autotypenumbers": "strict",
2594 | "coloraxis": {
2595 | "colorbar": {
2596 | "outlinewidth": 0,
2597 | "ticks": ""
2598 | }
2599 | },
2600 | "colorscale": {
2601 | "diverging": [
2602 | [
2603 | 0,
2604 | "#8e0152"
2605 | ],
2606 | [
2607 | 0.1,
2608 | "#c51b7d"
2609 | ],
2610 | [
2611 | 0.2,
2612 | "#de77ae"
2613 | ],
2614 | [
2615 | 0.3,
2616 | "#f1b6da"
2617 | ],
2618 | [
2619 | 0.4,
2620 | "#fde0ef"
2621 | ],
2622 | [
2623 | 0.5,
2624 | "#f7f7f7"
2625 | ],
2626 | [
2627 | 0.6,
2628 | "#e6f5d0"
2629 | ],
2630 | [
2631 | 0.7,
2632 | "#b8e186"
2633 | ],
2634 | [
2635 | 0.8,
2636 | "#7fbc41"
2637 | ],
2638 | [
2639 | 0.9,
2640 | "#4d9221"
2641 | ],
2642 | [
2643 | 1,
2644 | "#276419"
2645 | ]
2646 | ],
2647 | "sequential": [
2648 | [
2649 | 0,
2650 | "#0d0887"
2651 | ],
2652 | [
2653 | 0.1111111111111111,
2654 | "#46039f"
2655 | ],
2656 | [
2657 | 0.2222222222222222,
2658 | "#7201a8"
2659 | ],
2660 | [
2661 | 0.3333333333333333,
2662 | "#9c179e"
2663 | ],
2664 | [
2665 | 0.4444444444444444,
2666 | "#bd3786"
2667 | ],
2668 | [
2669 | 0.5555555555555556,
2670 | "#d8576b"
2671 | ],
2672 | [
2673 | 0.6666666666666666,
2674 | "#ed7953"
2675 | ],
2676 | [
2677 | 0.7777777777777778,
2678 | "#fb9f3a"
2679 | ],
2680 | [
2681 | 0.8888888888888888,
2682 | "#fdca26"
2683 | ],
2684 | [
2685 | 1,
2686 | "#f0f921"
2687 | ]
2688 | ],
2689 | "sequentialminus": [
2690 | [
2691 | 0,
2692 | "#0d0887"
2693 | ],
2694 | [
2695 | 0.1111111111111111,
2696 | "#46039f"
2697 | ],
2698 | [
2699 | 0.2222222222222222,
2700 | "#7201a8"
2701 | ],
2702 | [
2703 | 0.3333333333333333,
2704 | "#9c179e"
2705 | ],
2706 | [
2707 | 0.4444444444444444,
2708 | "#bd3786"
2709 | ],
2710 | [
2711 | 0.5555555555555556,
2712 | "#d8576b"
2713 | ],
2714 | [
2715 | 0.6666666666666666,
2716 | "#ed7953"
2717 | ],
2718 | [
2719 | 0.7777777777777778,
2720 | "#fb9f3a"
2721 | ],
2722 | [
2723 | 0.8888888888888888,
2724 | "#fdca26"
2725 | ],
2726 | [
2727 | 1,
2728 | "#f0f921"
2729 | ]
2730 | ]
2731 | },
2732 | "colorway": [
2733 | "#636efa",
2734 | "#EF553B",
2735 | "#00cc96",
2736 | "#ab63fa",
2737 | "#FFA15A",
2738 | "#19d3f3",
2739 | "#FF6692",
2740 | "#B6E880",
2741 | "#FF97FF",
2742 | "#FECB52"
2743 | ],
2744 | "font": {
2745 | "color": "#2a3f5f"
2746 | },
2747 | "geo": {
2748 | "bgcolor": "white",
2749 | "lakecolor": "white",
2750 | "landcolor": "white",
2751 | "showlakes": true,
2752 | "showland": true,
2753 | "subunitcolor": "#C8D4E3"
2754 | },
2755 | "hoverlabel": {
2756 | "align": "left"
2757 | },
2758 | "hovermode": "closest",
2759 | "mapbox": {
2760 | "style": "light"
2761 | },
2762 | "paper_bgcolor": "white",
2763 | "plot_bgcolor": "white",
2764 | "polar": {
2765 | "angularaxis": {
2766 | "gridcolor": "#EBF0F8",
2767 | "linecolor": "#EBF0F8",
2768 | "ticks": ""
2769 | },
2770 | "bgcolor": "white",
2771 | "radialaxis": {
2772 | "gridcolor": "#EBF0F8",
2773 | "linecolor": "#EBF0F8",
2774 | "ticks": ""
2775 | }
2776 | },
2777 | "scene": {
2778 | "xaxis": {
2779 | "backgroundcolor": "white",
2780 | "gridcolor": "#DFE8F3",
2781 | "gridwidth": 2,
2782 | "linecolor": "#EBF0F8",
2783 | "showbackground": true,
2784 | "ticks": "",
2785 | "zerolinecolor": "#EBF0F8"
2786 | },
2787 | "yaxis": {
2788 | "backgroundcolor": "white",
2789 | "gridcolor": "#DFE8F3",
2790 | "gridwidth": 2,
2791 | "linecolor": "#EBF0F8",
2792 | "showbackground": true,
2793 | "ticks": "",
2794 | "zerolinecolor": "#EBF0F8"
2795 | },
2796 | "zaxis": {
2797 | "backgroundcolor": "white",
2798 | "gridcolor": "#DFE8F3",
2799 | "gridwidth": 2,
2800 | "linecolor": "#EBF0F8",
2801 | "showbackground": true,
2802 | "ticks": "",
2803 | "zerolinecolor": "#EBF0F8"
2804 | }
2805 | },
2806 | "shapedefaults": {
2807 | "line": {
2808 | "color": "#2a3f5f"
2809 | }
2810 | },
2811 | "ternary": {
2812 | "aaxis": {
2813 | "gridcolor": "#DFE8F3",
2814 | "linecolor": "#A2B1C6",
2815 | "ticks": ""
2816 | },
2817 | "baxis": {
2818 | "gridcolor": "#DFE8F3",
2819 | "linecolor": "#A2B1C6",
2820 | "ticks": ""
2821 | },
2822 | "bgcolor": "white",
2823 | "caxis": {
2824 | "gridcolor": "#DFE8F3",
2825 | "linecolor": "#A2B1C6",
2826 | "ticks": ""
2827 | }
2828 | },
2829 | "title": {
2830 | "x": 0.05
2831 | },
2832 | "xaxis": {
2833 | "automargin": true,
2834 | "gridcolor": "#EBF0F8",
2835 | "linecolor": "#EBF0F8",
2836 | "ticks": "",
2837 | "title": {
2838 | "standoff": 15
2839 | },
2840 | "zerolinecolor": "#EBF0F8",
2841 | "zerolinewidth": 2
2842 | },
2843 | "yaxis": {
2844 | "automargin": true,
2845 | "gridcolor": "#EBF0F8",
2846 | "linecolor": "#EBF0F8",
2847 | "ticks": "",
2848 | "title": {
2849 | "standoff": 15
2850 | },
2851 | "zerolinecolor": "#EBF0F8",
2852 | "zerolinewidth": 2
2853 | }
2854 | }
2855 | },
2856 | "title": {
2857 | "font": {
2858 | "color": "Black",
2859 | "size": 22
2860 | },
2861 | "text": "Topic Word Scores",
2862 | "x": 0.5,
2863 | "xanchor": "center",
2864 | "yanchor": "top"
2865 | },
2866 | "width": 920,
2867 | "xaxis": {
2868 | "anchor": "y",
2869 | "domain": [
2870 | 0,
2871 | 0.175
2872 | ],
2873 | "showgrid": true
2874 | },
2875 | "xaxis10": {
2876 | "anchor": "y10",
2877 | "domain": [
2878 | 0.275,
2879 | 0.45
2880 | ],
2881 | "showgrid": true
2882 | },
2883 | "xaxis11": {
2884 | "anchor": "y11",
2885 | "domain": [
2886 | 0.55,
2887 | 0.7250000000000001
2888 | ],
2889 | "showgrid": true
2890 | },
2891 | "xaxis12": {
2892 | "anchor": "y12",
2893 | "domain": [
2894 | 0.825,
2895 | 1
2896 | ],
2897 | "showgrid": true
2898 | },
2899 | "xaxis2": {
2900 | "anchor": "y2",
2901 | "domain": [
2902 | 0.275,
2903 | 0.45
2904 | ],
2905 | "showgrid": true
2906 | },
2907 | "xaxis3": {
2908 | "anchor": "y3",
2909 | "domain": [
2910 | 0.55,
2911 | 0.7250000000000001
2912 | ],
2913 | "showgrid": true
2914 | },
2915 | "xaxis4": {
2916 | "anchor": "y4",
2917 | "domain": [
2918 | 0.825,
2919 | 1
2920 | ],
2921 | "showgrid": true
2922 | },
2923 | "xaxis5": {
2924 | "anchor": "y5",
2925 | "domain": [
2926 | 0,
2927 | 0.175
2928 | ],
2929 | "showgrid": true
2930 | },
2931 | "xaxis6": {
2932 | "anchor": "y6",
2933 | "domain": [
2934 | 0.275,
2935 | 0.45
2936 | ],
2937 | "showgrid": true
2938 | },
2939 | "xaxis7": {
2940 | "anchor": "y7",
2941 | "domain": [
2942 | 0.55,
2943 | 0.7250000000000001
2944 | ],
2945 | "showgrid": true
2946 | },
2947 | "xaxis8": {
2948 | "anchor": "y8",
2949 | "domain": [
2950 | 0.825,
2951 | 1
2952 | ],
2953 | "showgrid": true
2954 | },
2955 | "xaxis9": {
2956 | "anchor": "y9",
2957 | "domain": [
2958 | 0,
2959 | 0.175
2960 | ],
2961 | "showgrid": true
2962 | },
2963 | "yaxis": {
2964 | "anchor": "x",
2965 | "domain": [
2966 | 0.7555555555555555,
2967 | 1
2968 | ],
2969 | "showgrid": true
2970 | },
2971 | "yaxis10": {
2972 | "anchor": "x10",
2973 | "domain": [
2974 | 0,
2975 | 0.24444444444444446
2976 | ],
2977 | "showgrid": true
2978 | },
2979 | "yaxis11": {
2980 | "anchor": "x11",
2981 | "domain": [
2982 | 0,
2983 | 0.24444444444444446
2984 | ],
2985 | "showgrid": true
2986 | },
2987 | "yaxis12": {
2988 | "anchor": "x12",
2989 | "domain": [
2990 | 0,
2991 | 0.24444444444444446
2992 | ],
2993 | "showgrid": true
2994 | },
2995 | "yaxis2": {
2996 | "anchor": "x2",
2997 | "domain": [
2998 | 0.7555555555555555,
2999 | 1
3000 | ],
3001 | "showgrid": true
3002 | },
3003 | "yaxis3": {
3004 | "anchor": "x3",
3005 | "domain": [
3006 | 0.7555555555555555,
3007 | 1
3008 | ],
3009 | "showgrid": true
3010 | },
3011 | "yaxis4": {
3012 | "anchor": "x4",
3013 | "domain": [
3014 | 0.7555555555555555,
3015 | 1
3016 | ],
3017 | "showgrid": true
3018 | },
3019 | "yaxis5": {
3020 | "anchor": "x5",
3021 | "domain": [
3022 | 0.37777777777777777,
3023 | 0.6222222222222222
3024 | ],
3025 | "showgrid": true
3026 | },
3027 | "yaxis6": {
3028 | "anchor": "x6",
3029 | "domain": [
3030 | 0.37777777777777777,
3031 | 0.6222222222222222
3032 | ],
3033 | "showgrid": true
3034 | },
3035 | "yaxis7": {
3036 | "anchor": "x7",
3037 | "domain": [
3038 | 0.37777777777777777,
3039 | 0.6222222222222222
3040 | ],
3041 | "showgrid": true
3042 | },
3043 | "yaxis8": {
3044 | "anchor": "x8",
3045 | "domain": [
3046 | 0.37777777777777777,
3047 | 0.6222222222222222
3048 | ],
3049 | "showgrid": true
3050 | },
3051 | "yaxis9": {
3052 | "anchor": "x9",
3053 | "domain": [
3054 | 0,
3055 | 0.24444444444444446
3056 | ],
3057 | "showgrid": true
3058 | }
3059 | }
3060 | }
3061 | },
3062 | "metadata": {},
3063 | "output_type": "display_data"
3064 | }
3065 | ],
3066 | "source": [
3067 | "# Bar chart of the top c-TF-IDF keywords for each topic\n",
3068 | "bar_fig = topic_model.visualize_barchart(\n",
3069 | " top_n_topics=12,\n",
3070 | " width=230,\n",
3071 | ")\n",
3072 | "bar_fig"
3073 | ]
3074 | },
3075 | {
3076 | "cell_type": "code",
3077 | "execution_count": 22,
3078 | "metadata": {},
3079 | "outputs": [
3080 | {
3081 | "data": {
3082 | "application/vnd.plotly.v1+json": {
3083 | "config": {
3084 | "plotlyServerURL": "https://plot.ly"
3085 | },
3086 | "data": [
3087 | {
3088 | "customdata": [
3089 | [
3090 | 0,
3091 | "計畫 | 臨床試驗 | 中心 | 開發 | 生技",
3092 | 330
3093 | ],
3094 | [
3095 | 1,
3096 | "智慧 | 服務 | 發展 | 產業 | 應用",
3097 | 271
3098 | ],
3099 | [
3100 | 2,
3101 | "中子 | 行政 | 頻道 | 實驗 | 訊號",
3102 | 155
3103 | ],
3104 | [
3105 | 3,
3106 | "食品 | 食品產業 | 食品安全 | 食材 | 食媒性",
3107 | 89
3108 | ],
3109 | [
3110 | 4,
3111 | "超高畫質 | 基礎 | 產業 | 技術 | 石化",
3112 | 79
3113 | ],
3114 | [
3115 | 5,
3116 | "地下水 | 混凝土 | 水資源 | 氣候變遷 | 用水",
3117 | 69
3118 | ],
3119 | [
3120 | 6,
3121 | "光源 | 光子源 | 設施 | 光束線 | 儀器",
3122 | 59
3123 | ],
3124 | [
3125 | 7,
3126 | "人文 | 社會科學 | 社會創新 | 領域 | 學術",
3127 | 49
3128 | ],
3129 | [
3130 | 8,
3131 | "車輛 | 模組 | 大型 | 系統 | 技術",
3132 | 48
3133 | ],
3134 | [
3135 | 9,
3136 | "研發成果 | 山崩 | 山崩潛勢 | 耐震評估 | 活動斷層",
3137 | 44
3138 | ]
3139 | ],
3140 | "hovertemplate": "Topic %{customdata[0]}<br>%{customdata[1]}<br>Size: %{customdata[2]}",
3141 | "legendgroup": "",
3142 | "marker": {
3143 | "color": "#B0BEC5",
3144 | "line": {
3145 | "color": "DarkSlateGrey",
3146 | "width": 2
3147 | },
3148 | "size": [
3149 | 330,
3150 | 271,
3151 | 155,
3152 | 89,
3153 | 79,
3154 | 69,
3155 | 59,
3156 | 49,
3157 | 48,
3158 | 44
3159 | ],
3160 | "sizemode": "area",
3161 | "sizeref": 0.20625,
3162 | "symbol": "circle"
3163 | },
3164 | "mode": "markers",
3165 | "name": "",
3166 | "orientation": "v",
3167 | "showlegend": false,
3168 | "type": "scatter",
3169 | "x": [
3170 | 17.96601104736328,
3171 | -0.20142881572246552,
3172 | 17.805370330810547,
3173 | 18.284164428710938,
3174 | -0.6609519124031067,
3175 | -1.7333014011383057,
3176 | -0.5467479825019836,
3177 | 18.24008560180664,
3178 | -1.2730786800384521,
3179 | -1.2586079835891724
3180 | ],
3181 | "xaxis": "x",
3182 | "y": [
3183 | 12.663885116577148,
3184 | 14.88134479522705,
3185 | 11.765763282775879,
3186 | 13.034785270690918,
3187 | 14.63888931274414,
3188 | 14.921974182128906,
3189 | 14.147262573242188,
3190 | 12.08366870880127,
3191 | 15.501675605773926,
3192 | 15.041316032409668
3193 | ],
3194 | "yaxis": "y"
3195 | }
3196 | ],
3197 | "layout": {
3198 | "annotations": [
3199 | {
3200 | "showarrow": false,
3201 | "text": "D1",
3202 | "x": -1.9932966113090516,
3203 | "y": 13.913912868499756,
3204 | "yshift": 10
3205 | },
3206 | {
3207 | "showarrow": false,
3208 | "text": "D2",
3209 | "x": 9.516746240854262,
3210 | "xshift": 10,
3211 | "y": 17.826926946640015
3212 | }
3213 | ],
3214 | "height": 650,
3215 | "hoverlabel": {
3216 | "bgcolor": "white",
3217 | "font": {
3218 | "family": "Rockwell",
3219 | "size": 16
3220 | }
3221 | },
3222 | "legend": {
3223 | "itemsizing": "constant",
3224 | "tracegroupgap": 0
3225 | },
3226 | "margin": {
3227 | "t": 60
3228 | },
3229 | "shapes": [
3230 | {
3231 | "line": {
3232 | "color": "#CFD8DC",
3233 | "width": 2
3234 | },
3235 | "type": "line",
3236 | "x0": 9.516746240854262,
3237 | "x1": 9.516746240854262,
3238 | "y0": 10.000898790359496,
3239 | "y1": 17.826926946640015
3240 | },
3241 | {
3242 | "line": {
3243 | "color": "#9E9E9E",
3244 | "width": 2
3245 | },
3246 | "type": "line",
3247 | "x0": -1.9932966113090516,
3248 | "x1": 21.026789093017577,
3249 | "y0": 13.913912868499756,
3250 | "y1": 13.913912868499756
3251 | }
3252 | ],
3253 | "sliders": [
3254 | {
3255 | "active": 0,
3256 | "pad": {
3257 | "t": 50
3258 | },
3259 | "steps": [
3260 | {
3261 | "args": [
3262 | {
3263 | "marker.color": [
3264 | [
3265 | "red",
3266 | "#B0BEC5",
3267 | "#B0BEC5",
3268 | "#B0BEC5",
3269 | "#B0BEC5",
3270 | "#B0BEC5",
3271 | "#B0BEC5",
3272 | "#B0BEC5",
3273 | "#B0BEC5",
3274 | "#B0BEC5"
3275 | ]
3276 | ]
3277 | }
3278 | ],
3279 | "label": "Topic 0",
3280 | "method": "update"
3281 | },
3282 | {
3283 | "args": [
3284 | {
3285 | "marker.color": [
3286 | [
3287 | "#B0BEC5",
3288 | "red",
3289 | "#B0BEC5",
3290 | "#B0BEC5",
3291 | "#B0BEC5",
3292 | "#B0BEC5",
3293 | "#B0BEC5",
3294 | "#B0BEC5",
3295 | "#B0BEC5",
3296 | "#B0BEC5"
3297 | ]
3298 | ]
3299 | }
3300 | ],
3301 | "label": "Topic 1",
3302 | "method": "update"
3303 | },
3304 | {
3305 | "args": [
3306 | {
3307 | "marker.color": [
3308 | [
3309 | "#B0BEC5",
3310 | "#B0BEC5",
3311 | "red",
3312 | "#B0BEC5",
3313 | "#B0BEC5",
3314 | "#B0BEC5",
3315 | "#B0BEC5",
3316 | "#B0BEC5",
3317 | "#B0BEC5",
3318 | "#B0BEC5"
3319 | ]
3320 | ]
3321 | }
3322 | ],
3323 | "label": "Topic 2",
3324 | "method": "update"
3325 | },
3326 | {
3327 | "args": [
3328 | {
3329 | "marker.color": [
3330 | [
3331 | "#B0BEC5",
3332 | "#B0BEC5",
3333 | "#B0BEC5",
3334 | "red",
3335 | "#B0BEC5",
3336 | "#B0BEC5",
3337 | "#B0BEC5",
3338 | "#B0BEC5",
3339 | "#B0BEC5",
3340 | "#B0BEC5"
3341 | ]
3342 | ]
3343 | }
3344 | ],
3345 | "label": "Topic 3",
3346 | "method": "update"
3347 | },
3348 | {
3349 | "args": [
3350 | {
3351 | "marker.color": [
3352 | [
3353 | "#B0BEC5",
3354 | "#B0BEC5",
3355 | "#B0BEC5",
3356 | "#B0BEC5",
3357 | "red",
3358 | "#B0BEC5",
3359 | "#B0BEC5",
3360 | "#B0BEC5",
3361 | "#B0BEC5",
3362 | "#B0BEC5"
3363 | ]
3364 | ]
3365 | }
3366 | ],
3367 | "label": "Topic 4",
3368 | "method": "update"
3369 | },
3370 | {
3371 | "args": [
3372 | {
3373 | "marker.color": [
3374 | [
3375 | "#B0BEC5",
3376 | "#B0BEC5",
3377 | "#B0BEC5",
3378 | "#B0BEC5",
3379 | "#B0BEC5",
3380 | "red",
3381 | "#B0BEC5",
3382 | "#B0BEC5",
3383 | "#B0BEC5",
3384 | "#B0BEC5"
3385 | ]
3386 | ]
3387 | }
3388 | ],
3389 | "label": "Topic 5",
3390 | "method": "update"
3391 | },
3392 | {
3393 | "args": [
3394 | {
3395 | "marker.color": [
3396 | [
3397 | "#B0BEC5",
3398 | "#B0BEC5",
3399 | "#B0BEC5",
3400 | "#B0BEC5",
3401 | "#B0BEC5",
3402 | "#B0BEC5",
3403 | "red",
3404 | "#B0BEC5",
3405 | "#B0BEC5",
3406 | "#B0BEC5"
3407 | ]
3408 | ]
3409 | }
3410 | ],
3411 | "label": "Topic 6",
3412 | "method": "update"
3413 | },
3414 | {
3415 | "args": [
3416 | {
3417 | "marker.color": [
3418 | [
3419 | "#B0BEC5",
3420 | "#B0BEC5",
3421 | "#B0BEC5",
3422 | "#B0BEC5",
3423 | "#B0BEC5",
3424 | "#B0BEC5",
3425 | "#B0BEC5",
3426 | "red",
3427 | "#B0BEC5",
3428 | "#B0BEC5"
3429 | ]
3430 | ]
3431 | }
3432 | ],
3433 | "label": "Topic 7",
3434 | "method": "update"
3435 | },
3436 | {
3437 | "args": [
3438 | {
3439 | "marker.color": [
3440 | [
3441 | "#B0BEC5",
3442 | "#B0BEC5",
3443 | "#B0BEC5",
3444 | "#B0BEC5",
3445 | "#B0BEC5",
3446 | "#B0BEC5",
3447 | "#B0BEC5",
3448 | "#B0BEC5",
3449 | "red",
3450 | "#B0BEC5"
3451 | ]
3452 | ]
3453 | }
3454 | ],
3455 | "label": "Topic 8",
3456 | "method": "update"
3457 | },
3458 | {
3459 | "args": [
3460 | {
3461 | "marker.color": [
3462 | [
3463 | "#B0BEC5",
3464 | "#B0BEC5",
3465 | "#B0BEC5",
3466 | "#B0BEC5",
3467 | "#B0BEC5",
3468 | "#B0BEC5",
3469 | "#B0BEC5",
3470 | "#B0BEC5",
3471 | "#B0BEC5",
3472 | "red"
3473 | ]
3474 | ]
3475 | }
3476 | ],
3477 | "label": "Topic 9",
3478 | "method": "update"
3479 | }
3480 | ]
3481 | }
3482 | ],
3483 | "template": {
3484 | "data": {
3485 | "bar": [
3486 | {
3487 | "error_x": {
3488 | "color": "rgb(36,36,36)"
3489 | },
3490 | "error_y": {
3491 | "color": "rgb(36,36,36)"
3492 | },
3493 | "marker": {
3494 | "line": {
3495 | "color": "white",
3496 | "width": 0.5
3497 | },
3498 | "pattern": {
3499 | "fillmode": "overlay",
3500 | "size": 10,
3501 | "solidity": 0.2
3502 | }
3503 | },
3504 | "type": "bar"
3505 | }
3506 | ],
3507 | "barpolar": [
3508 | {
3509 | "marker": {
3510 | "line": {
3511 | "color": "white",
3512 | "width": 0.5
3513 | },
3514 | "pattern": {
3515 | "fillmode": "overlay",
3516 | "size": 10,
3517 | "solidity": 0.2
3518 | }
3519 | },
3520 | "type": "barpolar"
3521 | }
3522 | ],
3523 | "carpet": [
3524 | {
3525 | "aaxis": {
3526 | "endlinecolor": "rgb(36,36,36)",
3527 | "gridcolor": "white",
3528 | "linecolor": "white",
3529 | "minorgridcolor": "white",
3530 | "startlinecolor": "rgb(36,36,36)"
3531 | },
3532 | "baxis": {
3533 | "endlinecolor": "rgb(36,36,36)",
3534 | "gridcolor": "white",
3535 | "linecolor": "white",
3536 | "minorgridcolor": "white",
3537 | "startlinecolor": "rgb(36,36,36)"
3538 | },
3539 | "type": "carpet"
3540 | }
3541 | ],
3542 | "choropleth": [
3543 | {
3544 | "colorbar": {
3545 | "outlinewidth": 1,
3546 | "tickcolor": "rgb(36,36,36)",
3547 | "ticks": "outside"
3548 | },
3549 | "type": "choropleth"
3550 | }
3551 | ],
3552 | "contour": [
3553 | {
3554 | "colorbar": {
3555 | "outlinewidth": 1,
3556 | "tickcolor": "rgb(36,36,36)",
3557 | "ticks": "outside"
3558 | },
3559 | "colorscale": [
3560 | [
3561 | 0,
3562 | "#440154"
3563 | ],
3564 | [
3565 | 0.1111111111111111,
3566 | "#482878"
3567 | ],
3568 | [
3569 | 0.2222222222222222,
3570 | "#3e4989"
3571 | ],
3572 | [
3573 | 0.3333333333333333,
3574 | "#31688e"
3575 | ],
3576 | [
3577 | 0.4444444444444444,
3578 | "#26828e"
3579 | ],
3580 | [
3581 | 0.5555555555555556,
3582 | "#1f9e89"
3583 | ],
3584 | [
3585 | 0.6666666666666666,
3586 | "#35b779"
3587 | ],
3588 | [
3589 | 0.7777777777777778,
3590 | "#6ece58"
3591 | ],
3592 | [
3593 | 0.8888888888888888,
3594 | "#b5de2b"
3595 | ],
3596 | [
3597 | 1,
3598 | "#fde725"
3599 | ]
3600 | ],
3601 | "type": "contour"
3602 | }
3603 | ],
3604 | "contourcarpet": [
3605 | {
3606 | "colorbar": {
3607 | "outlinewidth": 1,
3608 | "tickcolor": "rgb(36,36,36)",
3609 | "ticks": "outside"
3610 | },
3611 | "type": "contourcarpet"
3612 | }
3613 | ],
3614 | "heatmap": [
3615 | {
3616 | "colorbar": {
3617 | "outlinewidth": 1,
3618 | "tickcolor": "rgb(36,36,36)",
3619 | "ticks": "outside"
3620 | },
3621 | "colorscale": [
3622 | [
3623 | 0,
3624 | "#440154"
3625 | ],
3626 | [
3627 | 0.1111111111111111,
3628 | "#482878"
3629 | ],
3630 | [
3631 | 0.2222222222222222,
3632 | "#3e4989"
3633 | ],
3634 | [
3635 | 0.3333333333333333,
3636 | "#31688e"
3637 | ],
3638 | [
3639 | 0.4444444444444444,
3640 | "#26828e"
3641 | ],
3642 | [
3643 | 0.5555555555555556,
3644 | "#1f9e89"
3645 | ],
3646 | [
3647 | 0.6666666666666666,
3648 | "#35b779"
3649 | ],
3650 | [
3651 | 0.7777777777777778,
3652 | "#6ece58"
3653 | ],
3654 | [
3655 | 0.8888888888888888,
3656 | "#b5de2b"
3657 | ],
3658 | [
3659 | 1,
3660 | "#fde725"
3661 | ]
3662 | ],
3663 | "type": "heatmap"
3664 | }
3665 | ],
3666 | "heatmapgl": [
3667 | {
3668 | "colorbar": {
3669 | "outlinewidth": 1,
3670 | "tickcolor": "rgb(36,36,36)",
3671 | "ticks": "outside"
3672 | },
3673 | "colorscale": [
3674 | [
3675 | 0,
3676 | "#440154"
3677 | ],
3678 | [
3679 | 0.1111111111111111,
3680 | "#482878"
3681 | ],
3682 | [
3683 | 0.2222222222222222,
3684 | "#3e4989"
3685 | ],
3686 | [
3687 | 0.3333333333333333,
3688 | "#31688e"
3689 | ],
3690 | [
3691 | 0.4444444444444444,
3692 | "#26828e"
3693 | ],
3694 | [
3695 | 0.5555555555555556,
3696 | "#1f9e89"
3697 | ],
3698 | [
3699 | 0.6666666666666666,
3700 | "#35b779"
3701 | ],
3702 | [
3703 | 0.7777777777777778,
3704 | "#6ece58"
3705 | ],
3706 | [
3707 | 0.8888888888888888,
3708 | "#b5de2b"
3709 | ],
3710 | [
3711 | 1,
3712 | "#fde725"
3713 | ]
3714 | ],
3715 | "type": "heatmapgl"
3716 | }
3717 | ],
3718 | "histogram": [
3719 | {
3720 | "marker": {
3721 | "line": {
3722 | "color": "white",
3723 | "width": 0.6
3724 | }
3725 | },
3726 | "type": "histogram"
3727 | }
3728 | ],
3729 | "histogram2d": [
3730 | {
3731 | "colorbar": {
3732 | "outlinewidth": 1,
3733 | "tickcolor": "rgb(36,36,36)",
3734 | "ticks": "outside"
3735 | },
3736 | "colorscale": [
3737 | [
3738 | 0,
3739 | "#440154"
3740 | ],
3741 | [
3742 | 0.1111111111111111,
3743 | "#482878"
3744 | ],
3745 | [
3746 | 0.2222222222222222,
3747 | "#3e4989"
3748 | ],
3749 | [
3750 | 0.3333333333333333,
3751 | "#31688e"
3752 | ],
3753 | [
3754 | 0.4444444444444444,
3755 | "#26828e"
3756 | ],
3757 | [
3758 | 0.5555555555555556,
3759 | "#1f9e89"
3760 | ],
3761 | [
3762 | 0.6666666666666666,
3763 | "#35b779"
3764 | ],
3765 | [
3766 | 0.7777777777777778,
3767 | "#6ece58"
3768 | ],
3769 | [
3770 | 0.8888888888888888,
3771 | "#b5de2b"
3772 | ],
3773 | [
3774 | 1,
3775 | "#fde725"
3776 | ]
3777 | ],
3778 | "type": "histogram2d"
3779 | }
3780 | ],
3781 | "histogram2dcontour": [
3782 | {
3783 | "colorbar": {
3784 | "outlinewidth": 1,
3785 | "tickcolor": "rgb(36,36,36)",
3786 | "ticks": "outside"
3787 | },
3788 | "colorscale": [
3789 | [
3790 | 0,
3791 | "#440154"
3792 | ],
3793 | [
3794 | 0.1111111111111111,
3795 | "#482878"
3796 | ],
3797 | [
3798 | 0.2222222222222222,
3799 | "#3e4989"
3800 | ],
3801 | [
3802 | 0.3333333333333333,
3803 | "#31688e"
3804 | ],
3805 | [
3806 | 0.4444444444444444,
3807 | "#26828e"
3808 | ],
3809 | [
3810 | 0.5555555555555556,
3811 | "#1f9e89"
3812 | ],
3813 | [
3814 | 0.6666666666666666,
3815 | "#35b779"
3816 | ],
3817 | [
3818 | 0.7777777777777778,
3819 | "#6ece58"
3820 | ],
3821 | [
3822 | 0.8888888888888888,
3823 | "#b5de2b"
3824 | ],
3825 | [
3826 | 1,
3827 | "#fde725"
3828 | ]
3829 | ],
3830 | "type": "histogram2dcontour"
3831 | }
3832 | ],
3833 | "mesh3d": [
3834 | {
3835 | "colorbar": {
3836 | "outlinewidth": 1,
3837 | "tickcolor": "rgb(36,36,36)",
3838 | "ticks": "outside"
3839 | },
3840 | "type": "mesh3d"
3841 | }
3842 | ],
3843 | "parcoords": [
3844 | {
3845 | "line": {
3846 | "colorbar": {
3847 | "outlinewidth": 1,
3848 | "tickcolor": "rgb(36,36,36)",
3849 | "ticks": "outside"
3850 | }
3851 | },
3852 | "type": "parcoords"
3853 | }
3854 | ],
3855 | "pie": [
3856 | {
3857 | "automargin": true,
3858 | "type": "pie"
3859 | }
3860 | ],
3861 | "scatter": [
3862 | {
3863 | "fillpattern": {
3864 | "fillmode": "overlay",
3865 | "size": 10,
3866 | "solidity": 0.2
3867 | },
3868 | "type": "scatter"
3869 | }
3870 | ],
3871 | "scatter3d": [
3872 | {
3873 | "line": {
3874 | "colorbar": {
3875 | "outlinewidth": 1,
3876 | "tickcolor": "rgb(36,36,36)",
3877 | "ticks": "outside"
3878 | }
3879 | },
3880 | "marker": {
3881 | "colorbar": {
3882 | "outlinewidth": 1,
3883 | "tickcolor": "rgb(36,36,36)",
3884 | "ticks": "outside"
3885 | }
3886 | },
3887 | "type": "scatter3d"
3888 | }
3889 | ],
3890 | "scattercarpet": [
3891 | {
3892 | "marker": {
3893 | "colorbar": {
3894 | "outlinewidth": 1,
3895 | "tickcolor": "rgb(36,36,36)",
3896 | "ticks": "outside"
3897 | }
3898 | },
3899 | "type": "scattercarpet"
3900 | }
3901 | ],
3902 | "scattergeo": [
3903 | {
3904 | "marker": {
3905 | "colorbar": {
3906 | "outlinewidth": 1,
3907 | "tickcolor": "rgb(36,36,36)",
3908 | "ticks": "outside"
3909 | }
3910 | },
3911 | "type": "scattergeo"
3912 | }
3913 | ],
3914 | "scattergl": [
3915 | {
3916 | "marker": {
3917 | "colorbar": {
3918 | "outlinewidth": 1,
3919 | "tickcolor": "rgb(36,36,36)",
3920 | "ticks": "outside"
3921 | }
3922 | },
3923 | "type": "scattergl"
3924 | }
3925 | ],
3926 | "scattermapbox": [
3927 | {
3928 | "marker": {
3929 | "colorbar": {
3930 | "outlinewidth": 1,
3931 | "tickcolor": "rgb(36,36,36)",
3932 | "ticks": "outside"
3933 | }
3934 | },
3935 | "type": "scattermapbox"
3936 | }
3937 | ],
3938 | "scatterpolar": [
3939 | {
3940 | "marker": {
3941 | "colorbar": {
3942 | "outlinewidth": 1,
3943 | "tickcolor": "rgb(36,36,36)",
3944 | "ticks": "outside"
3945 | }
3946 | },
3947 | "type": "scatterpolar"
3948 | }
3949 | ],
3950 | "scatterpolargl": [
3951 | {
3952 | "marker": {
3953 | "colorbar": {
3954 | "outlinewidth": 1,
3955 | "tickcolor": "rgb(36,36,36)",
3956 | "ticks": "outside"
3957 | }
3958 | },
3959 | "type": "scatterpolargl"
3960 | }
3961 | ],
3962 | "scatterternary": [
3963 | {
3964 | "marker": {
3965 | "colorbar": {
3966 | "outlinewidth": 1,
3967 | "tickcolor": "rgb(36,36,36)",
3968 | "ticks": "outside"
3969 | }
3970 | },
3971 | "type": "scatterternary"
3972 | }
3973 | ],
3974 | "surface": [
3975 | {
3976 | "colorbar": {
3977 | "outlinewidth": 1,
3978 | "tickcolor": "rgb(36,36,36)",
3979 | "ticks": "outside"
3980 | },
3981 | "colorscale": [
3982 | [
3983 | 0,
3984 | "#440154"
3985 | ],
3986 | [
3987 | 0.1111111111111111,
3988 | "#482878"
3989 | ],
3990 | [
3991 | 0.2222222222222222,
3992 | "#3e4989"
3993 | ],
3994 | [
3995 | 0.3333333333333333,
3996 | "#31688e"
3997 | ],
3998 | [
3999 | 0.4444444444444444,
4000 | "#26828e"
4001 | ],
4002 | [
4003 | 0.5555555555555556,
4004 | "#1f9e89"
4005 | ],
4006 | [
4007 | 0.6666666666666666,
4008 | "#35b779"
4009 | ],
4010 | [
4011 | 0.7777777777777778,
4012 | "#6ece58"
4013 | ],
4014 | [
4015 | 0.8888888888888888,
4016 | "#b5de2b"
4017 | ],
4018 | [
4019 | 1,
4020 | "#fde725"
4021 | ]
4022 | ],
4023 | "type": "surface"
4024 | }
4025 | ],
4026 | "table": [
4027 | {
4028 | "cells": {
4029 | "fill": {
4030 | "color": "rgb(237,237,237)"
4031 | },
4032 | "line": {
4033 | "color": "white"
4034 | }
4035 | },
4036 | "header": {
4037 | "fill": {
4038 | "color": "rgb(217,217,217)"
4039 | },
4040 | "line": {
4041 | "color": "white"
4042 | }
4043 | },
4044 | "type": "table"
4045 | }
4046 | ]
4047 | },
4048 | "layout": {
4049 | "annotationdefaults": {
4050 | "arrowhead": 0,
4051 | "arrowwidth": 1
4052 | },
4053 | "autotypenumbers": "strict",
4054 | "coloraxis": {
4055 | "colorbar": {
4056 | "outlinewidth": 1,
4057 | "tickcolor": "rgb(36,36,36)",
4058 | "ticks": "outside"
4059 | }
4060 | },
4061 | "colorscale": {
4062 | "diverging": [
4063 | [
4064 | 0,
4065 | "rgb(103,0,31)"
4066 | ],
4067 | [
4068 | 0.1,
4069 | "rgb(178,24,43)"
4070 | ],
4071 | [
4072 | 0.2,
4073 | "rgb(214,96,77)"
4074 | ],
4075 | [
4076 | 0.3,
4077 | "rgb(244,165,130)"
4078 | ],
4079 | [
4080 | 0.4,
4081 | "rgb(253,219,199)"
4082 | ],
4083 | [
4084 | 0.5,
4085 | "rgb(247,247,247)"
4086 | ],
4087 | [
4088 | 0.6,
4089 | "rgb(209,229,240)"
4090 | ],
4091 | [
4092 | 0.7,
4093 | "rgb(146,197,222)"
4094 | ],
4095 | [
4096 | 0.8,
4097 | "rgb(67,147,195)"
4098 | ],
4099 | [
4100 | 0.9,
4101 | "rgb(33,102,172)"
4102 | ],
4103 | [
4104 | 1,
4105 | "rgb(5,48,97)"
4106 | ]
4107 | ],
4108 | "sequential": [
4109 | [
4110 | 0,
4111 | "#440154"
4112 | ],
4113 | [
4114 | 0.1111111111111111,
4115 | "#482878"
4116 | ],
4117 | [
4118 | 0.2222222222222222,
4119 | "#3e4989"
4120 | ],
4121 | [
4122 | 0.3333333333333333,
4123 | "#31688e"
4124 | ],
4125 | [
4126 | 0.4444444444444444,
4127 | "#26828e"
4128 | ],
4129 | [
4130 | 0.5555555555555556,
4131 | "#1f9e89"
4132 | ],
4133 | [
4134 | 0.6666666666666666,
4135 | "#35b779"
4136 | ],
4137 | [
4138 | 0.7777777777777778,
4139 | "#6ece58"
4140 | ],
4141 | [
4142 | 0.8888888888888888,
4143 | "#b5de2b"
4144 | ],
4145 | [
4146 | 1,
4147 | "#fde725"
4148 | ]
4149 | ],
4150 | "sequentialminus": [
4151 | [
4152 | 0,
4153 | "#440154"
4154 | ],
4155 | [
4156 | 0.1111111111111111,
4157 | "#482878"
4158 | ],
4159 | [
4160 | 0.2222222222222222,
4161 | "#3e4989"
4162 | ],
4163 | [
4164 | 0.3333333333333333,
4165 | "#31688e"
4166 | ],
4167 | [
4168 | 0.4444444444444444,
4169 | "#26828e"
4170 | ],
4171 | [
4172 | 0.5555555555555556,
4173 | "#1f9e89"
4174 | ],
4175 | [
4176 | 0.6666666666666666,
4177 | "#35b779"
4178 | ],
4179 | [
4180 | 0.7777777777777778,
4181 | "#6ece58"
4182 | ],
4183 | [
4184 | 0.8888888888888888,
4185 | "#b5de2b"
4186 | ],
4187 | [
4188 | 1,
4189 | "#fde725"
4190 | ]
4191 | ]
4192 | },
4193 | "colorway": [
4194 | "#1F77B4",
4195 | "#FF7F0E",
4196 | "#2CA02C",
4197 | "#D62728",
4198 | "#9467BD",
4199 | "#8C564B",
4200 | "#E377C2",
4201 | "#7F7F7F",
4202 | "#BCBD22",
4203 | "#17BECF"
4204 | ],
4205 | "font": {
4206 | "color": "rgb(36,36,36)"
4207 | },
4208 | "geo": {
4209 | "bgcolor": "white",
4210 | "lakecolor": "white",
4211 | "landcolor": "white",
4212 | "showlakes": true,
4213 | "showland": true,
4214 | "subunitcolor": "white"
4215 | },
4216 | "hoverlabel": {
4217 | "align": "left"
4218 | },
4219 | "hovermode": "closest",
4220 | "mapbox": {
4221 | "style": "light"
4222 | },
4223 | "paper_bgcolor": "white",
4224 | "plot_bgcolor": "white",
4225 | "polar": {
4226 | "angularaxis": {
4227 | "gridcolor": "rgb(232,232,232)",
4228 | "linecolor": "rgb(36,36,36)",
4229 | "showgrid": false,
4230 | "showline": true,
4231 | "ticks": "outside"
4232 | },
4233 | "bgcolor": "white",
4234 | "radialaxis": {
4235 | "gridcolor": "rgb(232,232,232)",
4236 | "linecolor": "rgb(36,36,36)",
4237 | "showgrid": false,
4238 | "showline": true,
4239 | "ticks": "outside"
4240 | }
4241 | },
4242 | "scene": {
4243 | "xaxis": {
4244 | "backgroundcolor": "white",
4245 | "gridcolor": "rgb(232,232,232)",
4246 | "gridwidth": 2,
4247 | "linecolor": "rgb(36,36,36)",
4248 | "showbackground": true,
4249 | "showgrid": false,
4250 | "showline": true,
4251 | "ticks": "outside",
4252 | "zeroline": false,
4253 | "zerolinecolor": "rgb(36,36,36)"
4254 | },
4255 | "yaxis": {
4256 | "backgroundcolor": "white",
4257 | "gridcolor": "rgb(232,232,232)",
4258 | "gridwidth": 2,
4259 | "linecolor": "rgb(36,36,36)",
4260 | "showbackground": true,
4261 | "showgrid": false,
4262 | "showline": true,
4263 | "ticks": "outside",
4264 | "zeroline": false,
4265 | "zerolinecolor": "rgb(36,36,36)"
4266 | },
4267 | "zaxis": {
4268 | "backgroundcolor": "white",
4269 | "gridcolor": "rgb(232,232,232)",
4270 | "gridwidth": 2,
4271 | "linecolor": "rgb(36,36,36)",
4272 | "showbackground": true,
4273 | "showgrid": false,
4274 | "showline": true,
4275 | "ticks": "outside",
4276 | "zeroline": false,
4277 | "zerolinecolor": "rgb(36,36,36)"
4278 | }
4279 | },
4280 | "shapedefaults": {
4281 | "fillcolor": "black",
4282 | "line": {
4283 | "width": 0
4284 | },
4285 | "opacity": 0.3
4286 | },
4287 | "ternary": {
4288 | "aaxis": {
4289 | "gridcolor": "rgb(232,232,232)",
4290 | "linecolor": "rgb(36,36,36)",
4291 | "showgrid": false,
4292 | "showline": true,
4293 | "ticks": "outside"
4294 | },
4295 | "baxis": {
4296 | "gridcolor": "rgb(232,232,232)",
4297 | "linecolor": "rgb(36,36,36)",
4298 | "showgrid": false,
4299 | "showline": true,
4300 | "ticks": "outside"
4301 | },
4302 | "bgcolor": "white",
4303 | "caxis": {
4304 | "gridcolor": "rgb(232,232,232)",
4305 | "linecolor": "rgb(36,36,36)",
4306 | "showgrid": false,
4307 | "showline": true,
4308 | "ticks": "outside"
4309 | }
4310 | },
4311 | "title": {
4312 | "x": 0.05
4313 | },
4314 | "xaxis": {
4315 | "automargin": true,
4316 | "gridcolor": "rgb(232,232,232)",
4317 | "linecolor": "rgb(36,36,36)",
4318 | "showgrid": false,
4319 | "showline": true,
4320 | "ticks": "outside",
4321 | "title": {
4322 | "standoff": 15
4323 | },
4324 | "zeroline": false,
4325 | "zerolinecolor": "rgb(36,36,36)"
4326 | },
4327 | "yaxis": {
4328 | "automargin": true,
4329 | "gridcolor": "rgb(232,232,232)",
4330 | "linecolor": "rgb(36,36,36)",
4331 | "showgrid": false,
4332 | "showline": true,
4333 | "ticks": "outside",
4334 | "title": {
4335 | "standoff": 15
4336 | },
4337 | "zeroline": false,
4338 | "zerolinecolor": "rgb(36,36,36)"
4339 | }
4340 | }
4341 | },
4342 | "title": {
4343 | "font": {
4344 | "color": "Black",
4345 | "size": 22
4346 | },
4347 | "text": "Intertopic Distance Map",
4348 | "x": 0.5,
4349 | "xanchor": "center",
4350 | "y": 0.95,
4351 | "yanchor": "top"
4352 | },
4353 | "width": 1000,
4354 | "xaxis": {
4355 | "anchor": "y",
4356 | "domain": [
4357 | 0,
4358 | 1
4359 | ],
4360 | "range": [
4361 | -1.9932966113090516,
4362 | 21.026789093017577
4363 | ],
4364 | "title": {
4365 | "text": ""
4366 | },
4367 | "visible": false
4368 | },
4369 | "yaxis": {
4370 | "anchor": "x",
4371 | "domain": [
4372 | 0,
4373 | 1
4374 | ],
4375 | "range": [
4376 | 10.000898790359496,
4377 | 17.826926946640015
4378 | ],
4379 | "title": {
4380 | "text": ""
4381 | },
4382 | "visible": false
4383 | }
4384 | }
4385 | }
4386 | },
4387 | "metadata": {},
4388 | "output_type": "display_data"
4389 | }
4390 | ],
4391 | "source": [
4392 | "# Intertopic distance map\n",
4393 | "topic_fig = topic_model.visualize_topics(\n",
4394 | " top_n_topics=10,\n",
4395 | " width=1000,\n",
4396 | ")\n",
4397 | "topic_fig"
4398 | ]
4399 | }
4400 | ],
4401 | "metadata": {
4402 | "interpreter": {
4403 | "hash": "1ae26572eda725713d76d5b5539aa0885fd8640fe2af809f71b79ee35a86cf8d"
4404 | },
4405 | "kernelspec": {
4406 | "display_name": "Python 3.7.11 ('bertopic')",
4407 | "language": "python",
4408 | "name": "python3"
4409 | },
4410 | "language_info": {
4411 | "codemirror_mode": {
4412 | "name": "ipython",
4413 | "version": 3
4414 | },
4415 | "file_extension": ".py",
4416 | "mimetype": "text/x-python",
4417 | "name": "python",
4418 | "nbconvert_exporter": "python",
4419 | "pygments_lexer": "ipython3",
4420 | "version": "3.9.19"
4421 | },
4422 | "orig_nbformat": 4
4423 | },
4424 | "nbformat": 4,
4425 | "nbformat_minor": 2
4426 | }
4427 |
--------------------------------------------------------------------------------
/imgs/bar_plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aidenzich/HelloBERTopic/13ec36b258a74c6eb05d3d69e6966896d60235e6/imgs/bar_plot.png
--------------------------------------------------------------------------------
/imgs/intertopic_distance_map.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aidenzich/HelloBERTopic/13ec36b258a74c6eb05d3d69e6966896d60235e6/imgs/intertopic_distance_map.png
--------------------------------------------------------------------------------
/imgs/topic_over_time.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aidenzich/HelloBERTopic/13ec36b258a74c6eb05d3d69e6966896d60235e6/imgs/topic_over_time.png
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import pickle
4 |
5 | os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"  # silence TensorFlow log messages; must be set before TensorFlow is imported
6 | import pandas as pd
7 | from bertopic import BERTopic
8 | from ckiptagger import construct_dictionary, WS
9 | from transformers import AutoModelForTokenClassification
10 | from utils import EXPORT_PATH, set_up, DATA_PATH
11 | from halo import Halo
12 | from termcolor import colored
13 |
14 |
15 | parser = argparse.ArgumentParser(description="HelloBERTopic")
16 | parser.add_argument(
17 |     "--topic_num", type=int, default=10, help="Number of most frequent topics to visualize"
18 | )
19 | parser.add_argument(
20 |     "--keyword_file", type=str, default="keys.txt", help="Keyword file (in data/) used to guide word segmentation"
21 | )
22 | parser.add_argument(
23 |     "--model_name",
24 |     type=str,
25 |     default="ckiplab/bert-base-chinese-ws",
26 |     help="Name of the Hugging Face pretrained model",
27 | )
28 | parser.add_argument(
29 |     "--data_file", type=str, default="example_data.csv", help="Data file to read (in data/)"
30 | )
31 | parser.add_argument(
32 |     "--word_sentence_cache", action="store_true", help="Load the cached word-segmentation result if it exists"
33 | )
34 |
35 |
36 | if __name__ == "__main__":
37 |     args = parser.parse_args()
38 |
39 |     set_up()
40 |
41 |     # Hugging Face pretrained model name and number of topics to plot
42 |     MODEL_NAME = args.model_name
43 |     top_n_topics = args.topic_num
44 |
45 |     # Load the data
46 |     df = pd.read_csv(DATA_PATH / args.data_file)
47 |     # Use the "description" column of the raw data as the training corpus
48 |     sentence_list = df["description"].tolist()
49 |     # Use the "year" column as each document's timestamp
50 |     timestamps = df["year"].tolist()
51 |
52 |     # Load the keywords used to guide word segmentation
53 |     keysfile = DATA_PATH / args.keyword_file
54 |     with open(keysfile, encoding="utf-8") as file:
55 |         lines = file.read().splitlines()
56 |
57 |     ws_cache_path = EXPORT_PATH / "word_sentence_cache.pkl"
58 |     if args.word_sentence_cache and ws_cache_path.is_file():
59 |         spinner = Halo(text="Loading tokenized cache...", spinner="dots")
60 |         spinner.start()
61 |         with open(ws_cache_path, "rb") as f:
62 |             word_sentence_list = pickle.load(f)
63 |         spinner.stop()
64 |
65 |     else:
66 |         # Load the CKIP word-segmentation model
67 |         ws = WS(str(DATA_PATH))
68 |
69 |         # Build a user dictionary so that the keywords are segmented as whole words
70 |         keydict = {word: 1 for word in lines}
71 |         dictionary = construct_dictionary(keydict)
72 |
73 |         spinner = Halo(text="Tokenizing with CKIP-Tagger", spinner="dots")
74 |         spinner.start()
75 |
76 |         # Run word segmentation
77 |         word_sentence_list = ws(
78 |             sentence_list,
79 |             sentence_segmentation=True,
80 |             segment_delimiter_set={",", "。", ":", "?", "!", ";"},
81 |             recommend_dictionary=dictionary,  # apply the user dictionary
82 |         )
83 |
84 |         spinner.stop()
85 |         with open(ws_cache_path, "wb") as f:
86 |             pickle.dump(word_sentence_list, f)
87 |
88 |     # Join each document's tokens with spaces so BERTopic's vectorizer sees word boundaries
89 |     docs = [" ".join(w) for w in word_sentence_list]
90 |     print(colored(f"BERTopic input example:\n[{docs[0]}]", "blue"))
91 |
92 |     spinner = Halo(text="Loading Hugging Face pretrained model", spinner="dots")
93 |     spinner.start()
94 |
95 |     # Load the Hugging Face pretrained model
96 |     model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)
97 |     spinner.stop()
98 |     # Build the BERTopic model
99 |     topic_model = BERTopic(
100 |         language="chinese",
101 |         embedding_model=model,
102 |         verbose=True,
103 |     )
104 |
105 |     # Fit the model and assign a topic to each document
106 |     topics, probs = topic_model.fit_transform(docs)
107 |     # Compute topic frequencies over time, grouping the timestamps into 20 bins
108 |     topics_over_time = topic_model.topics_over_time(docs, timestamps, nr_bins=20)
109 |
110 |     # Bar chart of each topic's top c-TF-IDF keywords
111 |     bar_fig = topic_model.visualize_barchart(
112 |         top_n_topics=top_n_topics,
113 |         width=230,
114 |     )
115 |
116 |     # Intertopic distance map
117 |     topic_fig = topic_model.visualize_topics(
118 |         top_n_topics=top_n_topics,
119 |         width=1000,
120 |     )
121 |
122 |     # Topic frequencies over time
123 |     tot_fig = topic_model.visualize_topics_over_time(
124 |         topics_over_time, top_n_topics=top_n_topics, width=1000
125 |     )
126 |
127 |     # Save the figures as HTML files for display in a front end
128 |     bar_fig.write_html(EXPORT_PATH / "bar_fig.html")
129 |     topic_fig.write_html(EXPORT_PATH / "topic_fig.html")
130 |     tot_fig.write_html(EXPORT_PATH / "tot_fig.html")
131 |     print("Done")
132 |
--------------------------------------------------------------------------------
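For an on/off option such as `--word_sentence_cache`, `type=bool` is a known argparse pitfall (the same applies to `xmlrpc.client.Boolean`, which is just an alias of the built-in `bool`): argparse calls `bool()` on the raw command-line string, so any non-empty value, including `"False"`, becomes `True`. The minimal sketch below contrasts this with `action="store_true"`; the parser variable names are illustrative only.

```python
import argparse

# type=bool calls bool() on the raw string, so any non-empty value is truthy.
bool_parser = argparse.ArgumentParser()
bool_parser.add_argument("--word_sentence_cache", type=bool, default=False)

# action="store_true" leaves the flag False unless it appears on the command line.
flag_parser = argparse.ArgumentParser()
flag_parser.add_argument("--word_sentence_cache", action="store_true")

print(bool_parser.parse_args(["--word_sentence_cache", "False"]).word_sentence_cache)  # True
print(flag_parser.parse_args([]).word_sentence_cache)  # False
print(flag_parser.parse_args(["--word_sentence_cache"]).word_sentence_cache)  # True
```

With `store_true`, the option takes no value: passing the flag turns the cache on, and omitting it keeps the cache off.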
/requirements.txt:
--------------------------------------------------------------------------------
1 | ckiptagger==0.2.1
2 | tensorflow==2.13
3 | bertopic==0.16.0
4 | nbformat==5.10.4
5 | pandas==2.2.1
6 | jieba==0.42.1
7 | halo==0.0.31
8 | gdown==5.1.0
9 | spacy==3.7.4
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import jieba
2 | import random
3 | import torch
4 | import numpy as np
5 | from pathlib import Path
6 | from ckiptagger import data_utils
7 | from termcolor import colored
8 |
9 | DATA_PATH = Path(__file__).parent / "data"
10 | EXPORT_PATH = Path(__file__).parent / "export"
11 |
12 |
13 | def set_seed(seed: int) -> None:
14 |     random.seed(seed)
15 |     np.random.seed(seed)
16 |     torch.manual_seed(seed)
17 |     torch.cuda.manual_seed_all(seed)
18 |
19 |     if torch.cuda.is_available():
20 |         torch.backends.cudnn.benchmark = False
21 |         torch.backends.cudnn.deterministic = True
22 |
23 |
24 | def set_up():
25 |     DATA_PATH.mkdir(parents=True, exist_ok=True)
26 |     EXPORT_PATH.mkdir(parents=True, exist_ok=True)
27 |     ckip_check()
28 |     set_seed(42)
29 |
30 |
31 | def ckip_check():
32 |     check_list = [
33 |         "embedding_character",
34 |         "embedding_word",
35 |         "model_ner",
36 |         "model_pos",
37 |         "model_ws",
38 |     ]
39 |
40 |     all_present = True
41 |     for name in check_list:
42 |         data_exists = (DATA_PATH / name).exists()
43 |         print(
44 |             (
45 |                 colored(data_exists, "blue")
46 |                 if data_exists
47 |                 else colored(data_exists, "red")
48 |             ),
49 |             name,
50 |         )
51 |         if not data_exists:
52 |             all_present = False
53 |
54 |     if not all_present:
55 |         print("CKIP data missing, starting download...")
56 |         # download_data_gdown(path) extracts the archive into <path>/data, i.e. DATA_PATH
57 |         data_utils.download_data_gdown(str(DATA_PATH.parent))
58 |         print("CKIP data download complete.")
59 |         return
60 |
61 |     print("CKIP data validation complete.")
62 |
63 |
64 | def clean_text(text):
65 |     """Segment text with jieba, drop stopwords, and join the words with spaces."""
66 |     with open(DATA_PATH / "stopword.txt", encoding="utf-8") as f:
67 |         stopwords = set(f.read().splitlines())
68 |     words = jieba.lcut(text)
69 |     words = [w for w in words if w not in stopwords]
70 |     return " ".join(words)
71 |
--------------------------------------------------------------------------------
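Both `main.py` and `clean_text` in `utils.py` join segmented words with spaces before the text reaches BERTopic. The reason lies in BERTopic's topic-word step: unless a `vectorizer_model` is supplied, it builds c-TF-IDF from a default scikit-learn `CountVectorizer`, whose default `token_pattern` treats an unbroken run of Chinese characters as a single term. The sketch below applies that regular expression with the standard `re` module to approximate the default tokenization (it ignores lowercasing and the vectorizer's other options); `default_tokens` is a hypothetical helper and the sample sentence is made up.

```python
import re

# scikit-learn's default CountVectorizer token_pattern: runs of 2+ word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def default_tokens(doc: str) -> list[str]:
    """Approximate how the default vectorizer splits a document into terms."""
    return re.findall(DEFAULT_TOKEN_PATTERN, doc)


raw = "我喜歡自然語言處理"
segmented = " ".join(["我", "喜歡", "自然語言", "處理"])

print(default_tokens(raw))        # ['我喜歡自然語言處理']
print(default_tokens(segmented))  # ['喜歡', '自然語言', '處理']
```

The default pattern also drops one-character terms such as 我. If single-character words matter, one option is to pass `CountVectorizer(token_pattern=r"(?u)\b\w+\b")` as BERTopic's `vectorizer_model`.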