├── .gitignore
├── README.md
├── data
    ├── LICENSE
    ├── data.csv
    ├── dev.ipynb
    ├── example_data.csv
    ├── keys.txt
    ├── keys
    │   └── test.ipynb
    └── stopword.txt
├── exp.ipynb
├── imgs
    ├── bar_plot.png
    ├── intertopic_distance_map.png
    └── topic_over_time.png
├── main.py
├── requirements.txt
└── utils.py

/.gitignore:
--------------------------------------------------------------------------------
1 | **/.DS_Store
2 | 
3 | 
4 | # ignore ckip models
5 | **/model_ner
6 | **/model_pos
7 | **/model_ws
8 | 
9 | **/embedding_character
10 | **/embedding_word
11 | 
12 | **/data.zip
13 | **/*.xml
14 | 
15 | **/__pycache__
16 | **/export
17 | 
18 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # HelloBERTopic
2 | This project collects applications, notes, and experiments built on [BERTopic](https://github.com/MaartenGr/BERTopic).
3 | - [BERTopic article](https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6)
4 | - [Paper](https://arxiv.org/abs/2203.05794)
5 | 
6 | ## Installation
7 | - We use conda to create the environment and then install the dependencies. See the Miniconda install guide: https://docs.anaconda.com/free/miniconda/miniconda-install/
8 | 
9 | ```
10 | conda create --name bertopic python=3.9
11 | conda activate bertopic
12 | pip3 install -r requirements.txt
13 | ```
14 | 
15 | ## Results
16 | After `main.py` runs successfully, the following HTML files are generated in the `export` folder:
17 | ```
18 | bar_fig.html
19 | topic_fig.html
20 | tot_fig.html
21 | ```
22 | ### Topics Bar
23 | ![TopicsBar](./imgs/bar_plot.png)
24 | 
25 | ### Topic Over Time
26 | ![TopicOverTime](./imgs/topic_over_time.png)
27 | 
28 | ### Intertopic Distance Map
29 | ![IntertopicDistanceMap](./imgs/intertopic_distance_map.png)
30 | 
31 | ## How It Works
32 | BERTopic reduces the dimensionality of the document embeddings with UMAP, then clusters the reduced embeddings with HDBSCAN in an unsupervised manner.
33 | 
34 | ### UMAP
35 | A dimensionality-reduction algorithm similar to t-SNE; both work well for data visualization.
36 | - Steps
37 |   1. Compute the distance between each point and a configurable number of its nearest neighbors.
38 |   2. When the data is projected into the low-dimensional space, keep the distance relationships between points similar to those in the high-dimensional space.
39 | 
40 | ### HDBSCAN
41 | HDBSCAN is an algorithm proposed to address the shortcomings of DBSCAN.
42 | DBSCAN assumes that every cluster has the same density, so on datasets whose clusters differ markedly in density it can produce incorrect clusterings.
43 | HDBSCAN addresses this by building a hierarchy of clusters across density levels rather than relying on a single global density threshold; the two algorithms also differ in how they treat border points.
44 | 
45 | HDBSCAN also keeps DBSCAN's property of determining the number of clusters automatically, so the user does not have to set it.
46 | 
47 | ### Other Clustering Method
48 | If you do not want HDBSCAN to determine the clusters automatically, you can swap in `KMeans` or `Birch` as the clustering algorithm as shown here; see the official documentation for details.
49 | - [link](https://maartengr.github.io/BERTopic/getting_started/clustering/clustering.html#visual-overview)
50 | ```python
51 | from bertopic import BERTopic
52 | from sklearn.cluster import KMeans
53 | 
54 | cluster_model = KMeans(n_clusters=50)
55 | topic_model = BERTopic(hdbscan_model=cluster_model)
56 | ```
57 | 
58 | 
59 | ## Project Structure
60 | ```
61 | .
62 | ├── .gitignore
63 | ├── README.md
64 | ├── data
65 | │   ├── data.csv
66 | │   ├── keys
67 | │   │   └── test.ipynb
68 | │   ├── keys.txt
69 | │   └── stopword.txt
70 | ├── exp.ipynb
71 | ├── main.py
72 | ├── requirements.txt
73 | └── utils.py
74 | ```
75 | ## Environment Setup
76 | ```
77 | pip install -r requirements.txt
78 | ```
79 | 
80 | ## Running
81 | ```
82 | python main.py
83 | ```
84 | ### Arguments
85 | ```
86 | Hello BERTopics
87 | 
88 | optional arguments:
89 |   -h, --help            show this help message and exit
90 |   --topic_num TOPIC_NUM
91 |                         Number of topics to split the documents into
92 |   --keyword_file KEYWORD_FILE
93 |                         Name of the keyword file to read
94 |   --model_name MODEL_NAME
95 |                         Name of the Hugging Face pretrained model
96 |   --data_file DATA_FILE
97 |                         Path of the data file to read
98 |   --word_sentence_cache WORD_SENTENCE_CACHE
99 |                         Whether to load the word-segmentation cache (falls back to the default flow if no cache exists)
100 | ```
101 | 
--------------------------------------------------------------------------------
/data/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2019 CKIP
2 | 
3 | Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0
4 | International License; you may not use this file except in compliance
5 | with the License. 
You may obtain a copy of the License at 6 | 7 | https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 11 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 12 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 13 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 14 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 15 | SOFTWARE. 16 | -------------------------------------------------------------------------------- /data/dev.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 4, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "df = pd.read_csv(\"data.csv\")\n" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 6, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "data": { 28 | "text/plain": [ 29 | "Index(['year', 'name', 'label', 'year_start', 'year_end', 'keyword', 'ner',\n", 30 | " 'tf_idf', 'description', 'order'],\n", 31 | " dtype='object')" 32 | ] 33 | }, 34 | "execution_count": 6, 35 | "metadata": {}, 36 | "output_type": "execute_result" 37 | } 38 | ], 39 | "source": [ 40 | "df.columns" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 9, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "df[[\"year\", \"name\", \"description\"]].to_csv(\"example_data.csv\", index=False)" 50 | ] 51 | } 52 | ], 53 | "metadata": { 54 | "interpreter": { 55 | "hash": "1ae26572eda725713d76d5b5539aa0885fd8640fe2af809f71b79ee35a86cf8d" 56 | }, 57 | "kernelspec": { 58 | 
"display_name": "Python 3.7.11 ('bertopic')",
59 |    "language": "python",
60 |    "name": "python3"
61 |   },
62 |   "language_info": {
63 |    "codemirror_mode": {
64 |     "name": "ipython",
65 |     "version": 3
66 |    },
67 |    "file_extension": ".py",
68 |    "mimetype": "text/x-python",
69 |    "name": "python",
70 |    "nbconvert_exporter": "python",
71 |    "pygments_lexer": "ipython3",
72 |    "version": "3.7.11"
73 |   },
74 |   "orig_nbformat": 4
75 |  },
76 |  "nbformat": 4,
77 |  "nbformat_minor": 2
78 | }
79 | 
--------------------------------------------------------------------------------
/data/keys/test.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 40,
6 |    "metadata": {},
7 |    "outputs": [],
8 |    "source": [
9 |     "import xml.etree.ElementTree as ET\n",
10 |     "from tqdm import tqdm\n",
11 |     "keys = []"
12 |    ]
13 |   },
14 |   {
15 |    "cell_type": "code",
16 |    "execution_count": 41,
17 |    "metadata": {},
18 |    "outputs": [],
19 |    "source": [
20 |     "clean = [' ', '、', '(', ')', '台灣', ':']\n",
21 |     "def clean_txt(input):\n",
22 |     "    for c in clean:\n",
23 |     "        input = input.replace(c, '')\n",
24 |     "    return input"
25 |    ]
26 |   },
27 |   {
28 |    "cell_type": "code",
29 |    "execution_count": 42,
30 |    "metadata": {},
31 |    "outputs": [
32 |     {
33 |      "name": "stderr",
34 |      "output_type": "stream",
35 |      "text": [
36 |       "14it [00:16, 1.15s/it]\n"
37 |      ]
38 |     }
39 |    ],
40 |    "source": [
41 |     "for i, idx in tqdm(enumerate(range(14), start=97)):\n",
42 |     "    tree = ET.parse(f'./GRB_{i}.xml')\n",
43 |     "    root = tree.getroot()\n",
44 |     "    for grb05 in root.findall('GRB05'):\n",
45 |     "        g = grb05.find('KEYWORD_C')\n",
46 |     "        # Process only records that have a keyword field containing ';'.\n",
47 |     "        # Records without ';' are skipped, so a stale `temp` from an earlier\n",
48 |     "        # record is never re-added.\n",
49 |     "        if g is not None and g.text and ';' in g.text:\n",
50 |     "            temp = set(g.text.split(';'))\n",
51 |     "            temp = [clean_txt(w) for w in temp if len(clean_txt(w)) < 5]\n",
52 |     "            keys.extend(temp)\n",
53 |     "    "
54 |    ]
55 |   },
56 |   {
57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 43, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "data": { 70 | "text/plain": [ 71 | "159076" 72 | ] 73 | }, 74 | "execution_count": 43, 75 | "metadata": {}, 76 | "output_type": "execute_result" 77 | } 78 | ], 79 | "source": [ 80 | "keys = list(filter(None, keys))\n", 81 | "keys = list(set(keys))\n", 82 | "keys = [clean_txt(k) for k in keys]\n", 83 | "len(keys)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 44, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "南韓\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "print(keys[1])" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 45, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "with open('keys.txt', 'w') as f:\n", 110 | " for item in keys:\n", 111 | " f.write(\"%s\\n\" % item)" 112 | ] 113 | } 114 | ], 115 | "metadata": { 116 | "interpreter": { 117 | "hash": "aa781dd5d8b0b47d7fc97d3d29d31ddde4cdca0824a34701095d27847d24a55d" 118 | }, 119 | "kernelspec": { 120 | "display_name": "Python 3.9.5 ('base')", 121 | "language": "python", 122 | "name": "python3" 123 | }, 124 | "language_info": { 125 | "codemirror_mode": { 126 | "name": "ipython", 127 | "version": 3 128 | }, 129 | "file_extension": ".py", 130 | "mimetype": "text/x-python", 131 | "name": "python", 132 | "nbconvert_exporter": "python", 133 | "pygments_lexer": "ipython3", 134 | "version": "3.7.11" 135 | }, 136 | "orig_nbformat": 4 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | -------------------------------------------------------------------------------- /data/stopword.txt: -------------------------------------------------------------------------------- 1 | $ 2 | 0 3 | 1 4 | 2 5 | 3 6 | 4 7 | 5 8 | 6 9 | 7 10 | 8 11 
| 9 12 | ? 13 | _ 14 | “ 15 | ” 16 | 、 17 | 。 18 | 《 19 | 》 20 | 一 21 | 一些 22 | 一何 23 | 一切 24 | 一則 25 | 一方面 26 | 一旦 27 | 一來 28 | 一樣 29 | 一般 30 | 一轉眼 31 | 萬一 32 | 上 33 | 上下 34 | 下 35 | 不 36 | 不僅 37 | 不但 38 | 不光 39 | 不單 40 | 不只 41 | 不外乎 42 | 不如 43 | 不妨 44 | 不盡 45 | 不盡然 46 | 不得 47 | 不怕 48 | 不惟 49 | 不成 50 | 不拘 51 | 不料 52 | 不是 53 | 不比 54 | 不然 55 | 不特 56 | 不獨 57 | 不管 58 | 不至於 59 | 不若 60 | 不論 61 | 不過 62 | 不問 63 | 與 64 | 與其 65 | 與其說 66 | 與否 67 | 與此同時 68 | 且 69 | 且不說 70 | 且說 71 | 兩者 72 | 個 73 | 個別 74 | 臨 75 | 為 76 | 為了 77 | 為什麼 78 | 為何 79 | 為止 80 | 為此 81 | 為著 82 | 乃 83 | 乃至 84 | 乃至於 85 | 麼 86 | 之 87 | 之一 88 | 之所以 89 | 之類 90 | 烏乎 91 | 乎 92 | 乘 93 | 也 94 | 也好 95 | 也罷 96 | 了 97 | 二來 98 | 於 99 | 於是 100 | 於是乎 101 | 云云 102 | 云爾 103 | 些 104 | 亦 105 | 人 106 | 人們 107 | 人家 108 | 什麼 109 | 什麼樣 110 | 今 111 | 介於 112 | 仍 113 | 仍舊 114 | 從 115 | 從此 116 | 從而 117 | 他 118 | 他人 119 | 他們 120 | 以 121 | 以上 122 | 以為 123 | 以便 124 | 以免 125 | 以及 126 | 以故 127 | 以期 128 | 以來 129 | 以至 130 | 以至於 131 | 以致 132 | 們 133 | 任 134 | 任何 135 | 任憑 136 | 似的 137 | 但 138 | 但凡 139 | 但是 140 | 何 141 | 何以 142 | 何況 143 | 何處 144 | 何時 145 | 餘外 146 | 作為 147 | 你 148 | 你們 149 | 使 150 | 使得 151 | 例如 152 | 依 153 | 依據 154 | 依照 155 | 便於 156 | 俺 157 | 俺們 158 | 倘 159 | 倘使 160 | 倘或 161 | 倘然 162 | 倘若 163 | 借 164 | 假使 165 | 假如 166 | 假若 167 | 儻然 168 | 像 169 | 兒 170 | 先不先 171 | 光是 172 | 全體 173 | 全部 174 | 兮 175 | 關於 176 | 其 177 | 其一 178 | 其中 179 | 其二 180 | 其他 181 | 其餘 182 | 其它 183 | 其次 184 | 具體地說 185 | 具體說來 186 | 兼之 187 | 內 188 | 再 189 | 再其次 190 | 再則 191 | 再有 192 | 再者 193 | 再者說 194 | 再說 195 | 冒 196 | 衝 197 | 況且 198 | 幾 199 | 幾時 200 | 凡 201 | 凡是 202 | 憑 203 | 憑藉 204 | 出於 205 | 出來 206 | 分別 207 | 則 208 | 則甚 209 | 別 210 | 別人 211 | 別處 212 | 別是 213 | 別的 214 | 別管 215 | 別說 216 | 到 217 | 前後 218 | 前此 219 | 前者 220 | 加之 221 | 加以 222 | 即 223 | 即令 224 | 即使 225 | 即便 226 | 即如 227 | 即或 228 | 即若 229 | 卻 230 | 去 231 | 又 232 | 又及 233 | 及 234 | 及其 235 | 及至 236 | 反之 237 | 反而 238 | 反過來 239 | 反過來說 240 | 受到 241 | 另 242 | 另一方面 243 | 另外 244 | 另悉 245 | 只 246 | 只當 247 | 
只怕 248 | 只是 249 | 只有 250 | 只消 251 | 只要 252 | 只限 253 | 叫 254 | 叮咚 255 | 可 256 | 可以 257 | 可是 258 | 可見 259 | 各 260 | 各個 261 | 各位 262 | 各種 263 | 各自 264 | 同 265 | 同時 266 | 後 267 | 後者 268 | 向 269 | 向使 270 | 向著 271 | 嚇 272 | 嗎 273 | 否則 274 | 吧 275 | 吧噠 276 | 吱 277 | 呀 278 | 呃 279 | 嘔 280 | 唄 281 | 嗚 282 | 嗚呼 283 | 呢 284 | 呵 285 | 呵呵 286 | 呸 287 | 呼哧 288 | 咋 289 | 和 290 | 咚 291 | 咦 292 | 咧 293 | 咱 294 | 咱們 295 | 咳 296 | 哇 297 | 哈 298 | 哈哈 299 | 哉 300 | 哎 301 | 哎呀 302 | 哎喲 303 | 嘩 304 | 喲 305 | 哦 306 | 哩 307 | 哪 308 | 哪個 309 | 哪些 310 | 哪兒 311 | 哪天 312 | 哪年 313 | 哪怕 314 | 哪樣 315 | 哪邊 316 | 哪裡 317 | 哼 318 | 哼唷 319 | 唉 320 | 唯有 321 | 啊 322 | 啐 323 | 啥 324 | 啦 325 | 啪達 326 | 啷噹 327 | 餵 328 | 喏 329 | 喔唷 330 | 嘍 331 | 嗡 332 | 嗡嗡 333 | 嗬 334 | 嗯 335 | 噯 336 | 嘎 337 | 嘎登 338 | 噓 339 | 嘛 340 | 嘻 341 | 嘿 342 | 嘿嘿 343 | 因 344 | 因為 345 | 因了 346 | 因此 347 | 因著 348 | 因而 349 | 固然 350 | 在 351 | 在下 352 | 在於 353 | 地 354 | 基於 355 | 處在 356 | 多 357 | 多麼 358 | 多少 359 | 大 360 | 大家 361 | 她 362 | 她們 363 | 好 364 | 如 365 | 如上 366 | 如上所述 367 | 如下 368 | 如何 369 | 如其 370 | 如同 371 | 如是 372 | 如果 373 | 如此 374 | 如若 375 | 始而 376 | 孰料 377 | 孰知 378 | 寧 379 | 寧可 380 | 寧願 381 | 寧肯 382 | 它 383 | 它們 384 | 對 385 | 對於 386 | 對待 387 | 對方 388 | 對比 389 | 將 390 | 小 391 | 爾 392 | 爾後 393 | 爾爾 394 | 尚且 395 | 就 396 | 就是 397 | 就是了 398 | 就是說 399 | 就算 400 | 就要 401 | 盡 402 | 儘管 403 | 儘管如此 404 | 豈但 405 | 己 406 | 已 407 | 已矣 408 | 巴 409 | 巴巴 410 | 並 411 | 並且 412 | 並非 413 | 庶乎 414 | 庶幾 415 | 開外 416 | 開始 417 | 歸 418 | 歸齊 419 | 當 420 | 當地 421 | 當然 422 | 當著 423 | 彼 424 | 彼時 425 | 彼此 426 | 往 427 | 待 428 | 很 429 | 得 430 | 得了 431 | 怎 432 | 怎麼 433 | 怎麼辦 434 | 怎麼樣 435 | 怎奈 436 | 怎樣 437 | 總之 438 | 總的來看 439 | 總的來說 440 | 總的說來 441 | 總而言之 442 | 恰恰相反 443 | 您 444 | 惟其 445 | 慢說 446 | 我 447 | 我們 448 | 或 449 | 或則 450 | 或是 451 | 或曰 452 | 或者 453 | 截至 454 | 所 455 | 所以 456 | 所在 457 | 所幸 458 | 所有 459 | 才 460 | 才能 461 | 打 462 | 打從 463 | 把 464 | 抑或 465 | 拿 466 | 按 467 | 按照 468 | 換句話說 469 | 換言之 470 | 據 471 | 據此 472 | 接著 473 | 故 474 | 故此 475 | 故而 476 | 旁人 477 | 
無 478 | 無寧 479 | 無論 480 | 既 481 | 既往 482 | 既是 483 | 既然 484 | 時候 485 | 是 486 | 是以 487 | 是的 488 | 曾 489 | 替 490 | 替代 491 | 最 492 | 有 493 | 有些 494 | 有關 495 | 有及 496 | 有時 497 | 有的 498 | 望 499 | 朝 500 | 朝著 501 | 本 502 | 本人 503 | 本地 504 | 本著 505 | 本身 506 | 來 507 | 來著 508 | 來自 509 | 來說 510 | 極了 511 | 果然 512 | 果真 513 | 某 514 | 某個 515 | 某些 516 | 某某 517 | 根據 518 | 歟 519 | 正值 520 | 正如 521 | 正巧 522 | 正是 523 | 此 524 | 此地 525 | 此處 526 | 此外 527 | 此時 528 | 此次 529 | 此間 530 | 毋寧 531 | 每 532 | 每當 533 | 比 534 | 比及 535 | 比如 536 | 比方 537 | 沒奈何 538 | 沿 539 | 沿著 540 | 漫說 541 | 焉 542 | 然則 543 | 然後 544 | 然而 545 | 照 546 | 照著 547 | 猶且 548 | 猶自 549 | 甚且 550 | 甚麼 551 | 甚或 552 | 甚而 553 | 甚至 554 | 甚至於 555 | 用 556 | 用來 557 | 由 558 | 由於 559 | 由是 560 | 由此 561 | 由此可見 562 | 的 563 | 的確 564 | 的話 565 | 直到 566 | 相對而言 567 | 省得 568 | 看 569 | 眨眼 570 | 著 571 | 著呢 572 | 矣 573 | 矣乎 574 | 矣哉 575 | 離 576 | 竟而 577 | 第 578 | 等 579 | 等到 580 | 等等 581 | 簡言之 582 | 管 583 | 類如 584 | 緊接著 585 | 縱 586 | 縱令 587 | 縱使 588 | 縱然 589 | 經 590 | 經過 591 | 結果 592 | 給 593 | 繼之 594 | 繼後 595 | 繼而 596 | 綜上所述 597 | 罷了 598 | 者 599 | 而 600 | 而且 601 | 而況 602 | 而後 603 | 而外 604 | 而已 605 | 而是 606 | 而言 607 | 能 608 | 能否 609 | 騰 610 | 自 611 | 自個兒 612 | 自從 613 | 自各兒 614 | 自後 615 | 自家 616 | 自己 617 | 自打 618 | 自身 619 | 至 620 | 至於 621 | 至今 622 | 至若 623 | 致 624 | 般的 625 | 若 626 | 若夫 627 | 若是 628 | 若果 629 | 若非 630 | 莫不然 631 | 莫如 632 | 莫若 633 | 雖 634 | 雖則 635 | 雖然 636 | 雖說 637 | 被 638 | 要 639 | 要不 640 | 要不是 641 | 要不然 642 | 要么 643 | 要是 644 | 譬喻 645 | 譬如 646 | 讓 647 | 許多 648 | 論 649 | 設使 650 | 設或 651 | 設若 652 | 誠如 653 | 誠然 654 | 該 655 | 說來 656 | 諸 657 | 諸位 658 | 諸如 659 | 誰 660 | 誰人 661 | 誰料 662 | 誰知 663 | 賊死 664 | 賴以 665 | 趕 666 | 起 667 | 起見 668 | 趁 669 | 趁著 670 | 越是 671 | 距 672 | 跟 673 | 較 674 | 較之 675 | 邊 676 | 過 677 | 還 678 | 還是 679 | 還有 680 | 還要 681 | 這 682 | 這一來 683 | 這個 684 | 這麼 685 | 這麼些 686 | 這麼樣 687 | 這麼點兒 688 | 這些 689 | 這會兒 690 | 這兒 691 | 這就是說 692 | 這時 693 | 這樣 694 | 這次 695 | 這般 696 | 這邊 697 | 這裡 698 | 進而 699 | 連 700 | 連同 701 | 逐步 702 | 通過 703 | 
遵循 704 | 遵照 705 | 那 706 | 那個 707 | 那麼 708 | 那麼些 709 | 那麼樣 710 | 那些 711 | 那會兒 712 | 那兒 713 | 那時 714 | 那樣 715 | 那般 716 | 那邊 717 | 那裡 718 | 都 719 | 鄙人 720 | 鑑於 721 | 針對 722 | 阿 723 | 除 724 | 除了 725 | 除外 726 | 除開 727 | 除此之外 728 | 除非 729 | 隨 730 | 隨後 731 | 隨時 732 | 隨著 733 | 難道說 734 | 非但 735 | 非徒 736 | 非特 737 | 非獨 738 | 靠 739 | 順 740 | 順著 741 | 首先 742 | ! 743 | , 744 | : 745 | ; 746 | ? -------------------------------------------------------------------------------- /exp.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "from bertopic import BERTopic\n", 11 | "from ckiptagger import construct_dictionary, WS, POS, NER\n", 12 | "from transformers import AutoModelForTokenClassification\n", 13 | "import numpy as np\n", 14 | "import random\n", 15 | "import torch" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 3, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "def set_seed(seed: int) -> None:\n", 25 | " random.seed(seed)\n", 26 | " np.random.seed(seed)\n", 27 | " torch.manual_seed(seed)\n", 28 | " torch.cuda.manual_seed_all(seed)\n", 29 | "\n", 30 | " if torch.cuda.is_available():\n", 31 | " # Disable cuDNN benchmark for deterministic selection on algorithm.\n", 32 | " torch.backends.cudnn.benchmark = False\n", 33 | " torch.backends.cudnn.deterministic = True\n", 34 | " \n", 35 | "set_seed(4698)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 4, 41 | "metadata": {}, 42 | "outputs": [ 43 | { 44 | "name": "stdout", 45 | "output_type": "stream", 46 | "text": [ 47 | "南韓\n" 48 | ] 49 | } 50 | ], 51 | "source": [ 52 | "keysfile = \"data/keys.txt\"\n", 53 | "with open(keysfile) as file:\n", 54 | " lines = file.read().splitlines() \n", 55 | "\n", 56 | "print(lines[1])" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 
|    "execution_count": 5,
62 |    "metadata": {},
63 |    "outputs": [],
64 |    "source": [
65 |     "keydict = { l: 1 for l in lines}\n",
66 |     "dictionary = construct_dictionary(keydict)"
67 |    ]
68 |   },
69 |   {
70 |    "cell_type": "code",
71 |    "execution_count": 6,
72 |    "metadata": {},
73 |    "outputs": [
74 |     {
75 |      "name": "stderr",
76 |      "output_type": "stream",
77 |      "text": [
78 |       "2024-04-10 13:31:21.802783: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled\n"
79 |      ]
80 |     }
81 |    ],
82 |    "source": [
83 |     "ws = WS(\"./data\")\n",
84 |     "pos = POS(\"./data\")\n",
85 |     "ner = NER(\"./data\")"
86 |    ]
87 |   },
88 |   {
89 |    "cell_type": "code",
90 |    "execution_count": 7,
91 |    "metadata": {},
92 |    "outputs": [],
93 |    "source": [
94 |     "df = pd.read_csv(\"data/data.csv\")\n",
95 |     "df = df[[\"year\", \"name\", \"label\", \"description\"]]"
96 |    ]
97 |   },
98 |   {
99 |    "cell_type": "code",
100 |    "execution_count": 8,
101 |    "metadata": {},
102 |    "outputs": [],
103 |    "source": [
104 |     "with open('data/stopword.txt', encoding='utf-8') as f:\n",
105 |     "    stoptext = f.read()\n",
106 |     "stopwords = stoptext.split('\\n')\n"
107 |    ]
108 |   },
109 |   {
110 |    "cell_type": "code",
111 |    "execution_count": 9,
112 |    "metadata": {},
113 |    "outputs": [],
114 |    "source": [
115 |     "sentence_list = df[\"description\"].tolist()\n",
116 |     "word_sentence_list = ws(\n",
117 |     "    sentence_list,\n",
118 |     "    sentence_segmentation = True, # To consider delimiters\n",
119 |     "    segment_delimiter_set = {\",\", \"。\", \":\", \"?\", \"!\", \";\"}, # This is the default set of delimiters\n",
120 |     "    recommend_dictionary = dictionary # words in this dictionary are encouraged \n",
121 |     ")\n"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "code",
126 |    "execution_count": 10,
127 |    "metadata": {},
128 |    "outputs": [],
129 |    "source": [
130 |     "# Convert to the format BERTopic accepts: one space-separated string per document (note: this rebinds `ws`, replacing the WS segmenter)\n",
131 |     "ws = [\" \".join(w) for w in word_sentence_list]"
132 |    ]
133 |   },
134 |   {
135 |    "cell_type": "code",
136 | 
"execution_count": 11,
137 |    "metadata": {},
138 |    "outputs": [
139 |     {
140 |      "name": "stderr",
141 |      "output_type": "stream",
142 |      "text": [
143 |       "2024-04-10 13:34:52,128 - BERTopic - Embedding - Transforming documents to embeddings.\n",
144 |       "Batches: 100%|██████████| 99/99 [00:37<00:00, 2.67it/s]\n",
145 |       "2024-04-10 13:35:32,437 - BERTopic - Embedding - Completed ✓\n",
146 |       "2024-04-10 13:35:32,438 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm\n",
147 |       "2024-04-10 13:35:41,960 - BERTopic - Dimensionality - Completed ✓\n",
148 |       "2024-04-10 13:35:41,960 - BERTopic - Cluster - Start clustering the reduced embeddings\n",
149 |       "2024-04-10 13:35:42,022 - BERTopic - Cluster - Completed ✓\n",
150 |       "2024-04-10 13:35:42,026 - BERTopic - Representation - Extracting topics from clusters using representation models.\n",
151 |       "2024-04-10 13:35:42,353 - BERTopic - Representation - Completed ✓\n"
152 |      ]
153 |     }
154 |    ],
155 |    "source": [
156 |     "model = AutoModelForTokenClassification.from_pretrained(\"ckiplab/bert-base-chinese-ws\")\n",
157 |     "topic_model = BERTopic(\n",
158 |     "    language=\"chinese\", \n",
159 |     "    embedding_model=model, \n",
160 |     "    verbose=True\n",
161 |     ")\n",
162 |     "topics, probs = topic_model.fit_transform(ws)\n"
163 |    ]
164 |   },
165 |   {
166 |    "cell_type": "code",
167 |    "execution_count": 12,
168 |    "metadata": {},
169 |    "outputs": [],
170 |    "source": [
171 |     "timestamps = df.year.tolist() # use the `year` column of data.csv as timestamps; +1911 below converts ROC (Minguo) years to Gregorian years\n",
172 |     "timestamps = [f\"{str(int(t)+1911)}-01-01\" for t in timestamps]"
173 |    ]
174 |   },
175 |   {
176 |    "cell_type": "code",
177 |    "execution_count": 19,
178 |    "metadata": {},
179 |    "outputs": [
180 |     {
181 |      "name": "stderr",
182 |      "output_type": "stream",
183 |      "text": [
184 |       "0it [00:00, ?it/s]"
185 |      ]
186 |     },
187 |     {
188 |      "name": "stderr",
189 |      "output_type": "stream",
190 |      "text": [
191 |       "8it [00:01, 4.88it/s]\n"
192 |      ]
193 |     },
194 |     {
195 |      "data": {
196 | 
"application/vnd.plotly.v1+json": { 197 | "config": { 198 | "plotlyServerURL": "https://plot.ly" 199 | }, 200 | "data": [ 201 | { 202 | "hoverinfo": "text", 203 | "hovertext": [ 204 | "Topic 0
Words: 臨床試驗, 計畫, 研究, 開發, 進行", 205 | "Topic 0
Words: 疫苗, 計畫, 臨床試驗, 開發, 發展", 206 | "Topic 0
Words: 生技, 臨床試驗, 計畫, 研發, 發展", 207 | "Topic 0
Words: 新藥, 生技, 中心, 計畫, 疾病", 208 | "Topic 0
Words: 中心, 計畫, 生技, 進行, 研發", 209 | "Topic 0
Words: 計畫, 中心, 臨床試驗, 開發, 生技", 210 | "Topic 0
Words: 計畫, 臨床試驗, 中心, 開發, 發展", 211 | "Topic 0
Words: 開發, 中心, 計畫, 研發, 發展" 212 | ], 213 | "marker": { 214 | "color": "#E69F00" 215 | }, 216 | "mode": "lines", 217 | "name": "0_計畫_臨床試驗_中心_開發", 218 | "type": "scatter", 219 | "x": [ 220 | "2014-01-01T00:00:00", 221 | "2015-01-01T00:00:00", 222 | "2016-01-01T00:00:00", 223 | "2017-01-01T00:00:00", 224 | "2018-01-01T00:00:00", 225 | "2019-01-01T00:00:00", 226 | "2020-01-01T00:00:00", 227 | "2021-01-01T00:00:00" 228 | ], 229 | "y": [ 230 | 44, 231 | 48, 232 | 35, 233 | 35, 234 | 42, 235 | 40, 236 | 38, 237 | 48 238 | ] 239 | }, 240 | { 241 | "hoverinfo": "text", 242 | "hovertext": [ 243 | "Topic 1
Words: 智慧電子, 智慧, 服務, 發展, 產業", 244 | "Topic 1
Words: 智慧, 智慧電子, 應用, 發展, 產業", 245 | "Topic 1
Words: 智慧, 服務, 智財, 平台, 發展", 246 | "Topic 1
Words: 智財, 智慧, 服務, 產業, 發展", 247 | "Topic 1
Words: 智慧, 服務, 發展, 應用, 產業", 248 | "Topic 1
Words: 智慧, 產業, 發展, 服務, 推動", 249 | "Topic 1
Words: 智慧, 發展, 應用, 服務, 產業", 250 | "Topic 1
Words: 智慧, 5g, 應用, 發展, 產業" 251 | ], 252 | "marker": { 253 | "color": "#56B4E9" 254 | }, 255 | "mode": "lines", 256 | "name": "1_智慧_服務_發展_產業", 257 | "type": "scatter", 258 | "x": [ 259 | "2014-01-01T00:00:00", 260 | "2015-01-01T00:00:00", 261 | "2016-01-01T00:00:00", 262 | "2017-01-01T00:00:00", 263 | "2018-01-01T00:00:00", 264 | "2019-01-01T00:00:00", 265 | "2020-01-01T00:00:00", 266 | "2021-01-01T00:00:00" 267 | ], 268 | "y": [ 269 | 20, 270 | 20, 271 | 22, 272 | 34, 273 | 39, 274 | 43, 275 | 39, 276 | 54 277 | ] 278 | }, 279 | { 280 | "hoverinfo": "text", 281 | "hovertext": [ 282 | "Topic 2
Words: 中子, 工商, 行政, 地震, 實驗", 283 | "Topic 2
Words: 中子, 行政, 工商, 實驗, 頻道", 284 | "Topic 2
Words: 中子, 人事, 研發成果, 實驗, 行政", 285 | "Topic 2
Words: 中子, 行政, 電視臺, 協助, 訊號", 286 | "Topic 2
Words: 中子, 頻寬, 行政, 電視臺, 訊號", 287 | "Topic 2
Words: 頻寬, 電視臺, 行政, 訊號, mbps", 288 | "Topic 2
Words: 頻寬, 訊號, 電視臺, 上鏈, 測謊", 289 | "Topic 2
Words: 訊號, 電視臺, 上鏈, 接收, 頻道" 290 | ], 291 | "marker": { 292 | "color": "#009E73" 293 | }, 294 | "mode": "lines", 295 | "name": "2_中子_行政_頻道_實驗", 296 | "type": "scatter", 297 | "x": [ 298 | "2014-01-01T00:00:00", 299 | "2015-01-01T00:00:00", 300 | "2016-01-01T00:00:00", 301 | "2017-01-01T00:00:00", 302 | "2018-01-01T00:00:00", 303 | "2019-01-01T00:00:00", 304 | "2020-01-01T00:00:00", 305 | "2021-01-01T00:00:00" 306 | ], 307 | "y": [ 308 | 19, 309 | 20, 310 | 24, 311 | 27, 312 | 20, 313 | 16, 314 | 16, 315 | 13 316 | ] 317 | }, 318 | { 319 | "hoverinfo": "text", 320 | "hovertext": [ 321 | "Topic 3
Words: 食品, pki, 美食, 食媒性, 病原", 322 | "Topic 3
Words: 食品, 食媒性, 加工技術, 品牌, 食品安全", 323 | "Topic 3
Words: 食品, 食品產業, 食媒性, 食品安全, 開發", 324 | "Topic 3
Words: 食品, 食品產業, 食媒性, 食品安全, 食材", 325 | "Topic 3
Words: 食品, 食品產業, 食材, 校園午餐, 加工技術", 326 | "Topic 3
Words: 食品, 食品產業, 食材, 開發, 國產", 327 | "Topic 3
Words: 食品, 食品產業, 肥胖, 食材, 氣候變遷", 328 | "Topic 3
Words: 食品, 品質, 透過, 食品產業, 食品安全" 329 | ], 330 | "marker": { 331 | "color": "#F0E442" 332 | }, 333 | "mode": "lines", 334 | "name": "3_食品_食品產業_食品安全_食材", 335 | "type": "scatter", 336 | "x": [ 337 | "2014-01-01T00:00:00", 338 | "2015-01-01T00:00:00", 339 | "2016-01-01T00:00:00", 340 | "2017-01-01T00:00:00", 341 | "2018-01-01T00:00:00", 342 | "2019-01-01T00:00:00", 343 | "2020-01-01T00:00:00", 344 | "2021-01-01T00:00:00" 345 | ], 346 | "y": [ 347 | 10, 348 | 8, 349 | 12, 350 | 17, 351 | 12, 352 | 10, 353 | 11, 354 | 9 355 | ] 356 | }, 357 | { 358 | "hoverinfo": "text", 359 | "hovertext": [ 360 | "Topic 4
Words: 低溫物流, 園區, 工業, 基礎, 智財", 361 | "Topic 4
Words: 石化, 專利布局, 智財, 材料, 高值化", 362 | "Topic 4
Words: 材料, 石化, 超高畫質, 技術, 基礎", 363 | "Topic 4
Words: 超高畫質, 石化, 製作, 商務, 產業", 364 | "Topic 4
Words: 超高畫質, 高階, 高速寬頻, 基礎, 高值化", 365 | "Topic 4
Words: 超高畫質, 基礎, 高階, 技術, 設備", 366 | "Topic 4
Words: 高速寬頻, 網路, 高值化, 自製, 高階", 367 | "Topic 4
Words: 農業素材, 產業, 碳材, 高值化, 基地臺" 368 | ], 369 | "marker": { 370 | "color": "#D55E00" 371 | }, 372 | "mode": "lines", 373 | "name": "4_超高畫質_基礎_產業_技術", 374 | "type": "scatter", 375 | "x": [ 376 | "2014-01-01T00:00:00", 377 | "2015-01-01T00:00:00", 378 | "2016-01-01T00:00:00", 379 | "2017-01-01T00:00:00", 380 | "2018-01-01T00:00:00", 381 | "2019-01-01T00:00:00", 382 | "2020-01-01T00:00:00", 383 | "2021-01-01T00:00:00" 384 | ], 385 | "y": [ 386 | 10, 387 | 8, 388 | 11, 389 | 9, 390 | 10, 391 | 13, 392 | 10, 393 | 8 394 | ] 395 | }, 396 | { 397 | "hoverinfo": "text", 398 | "hovertext": [ 399 | "Topic 5
Words: 地下水, 都會區, 山區, 水文觀測, 防洪", 400 | "Topic 5
Words: 地下水, 水資源, 都會區, 水文觀測, 補注區", 401 | "Topic 5
Words: 地下水, 混凝土, 綠色水泥, 氣候變遷, 水文觀測", 402 | "Topic 5
Words: 地下水, 氣候變遷, 水資源, 水文觀測, 混凝土", 403 | "Topic 5
Words: 混凝土, 地下水, 綠色水泥, 水庫, 水資源", 404 | "Topic 5
Words: 地下水, 混凝土, 飲用水, 用水, 水庫", 405 | "Topic 5
Words: 地下水, 水資源, 用水, 給水, 飲用水", 406 | "Topic 5
Words: 飲用水, 地層下陷, 水庫, 地下水, 列管" 407 | ], 408 | "marker": { 409 | "color": "#0072B2" 410 | }, 411 | "mode": "lines", 412 | "name": "5_地下水_混凝土_水資源_氣候變遷", 413 | "type": "scatter", 414 | "x": [ 415 | "2014-01-01T00:00:00", 416 | "2015-01-01T00:00:00", 417 | "2016-01-01T00:00:00", 418 | "2017-01-01T00:00:00", 419 | "2018-01-01T00:00:00", 420 | "2019-01-01T00:00:00", 421 | "2020-01-01T00:00:00", 422 | "2021-01-01T00:00:00" 423 | ], 424 | "y": [ 425 | 9, 426 | 8, 427 | 10, 428 | 11, 429 | 8, 430 | 8, 431 | 8, 432 | 7 433 | ] 434 | }, 435 | { 436 | "hoverinfo": "text", 437 | "hovertext": [ 438 | "Topic 6
Words: 儀器, 光學, 光源, 材料, 真空", 439 | "Topic 6
Words: 儀器, 光源, 光學, 材料, 光束線", 440 | "Topic 6
Words: 光源, 儀器, 光學, 半導體, 設施", 441 | "Topic 6
Words: 光束線, 光源, 光子源, 設施, 實驗", 442 | "Topic 6
Words: 光子源, 光源, 光束線, 設施, 台灣", 443 | "Topic 6
Words: 儀器, 光子源, 設施, 實驗, 光學", 444 | "Topic 6
Words: 設施, 實驗, 光子源, 光源, 顯微術", 445 | "Topic 6
Words: 光子源, 實驗, 光源, 用戶, 台灣" 446 | ], 447 | "marker": { 448 | "color": "#CC79A7" 449 | }, 450 | "mode": "lines", 451 | "name": "6_光源_光子源_設施_光束線", 452 | "type": "scatter", 453 | "x": [ 454 | "2014-01-01T00:00:00", 455 | "2015-01-01T00:00:00", 456 | "2016-01-01T00:00:00", 457 | "2017-01-01T00:00:00", 458 | "2018-01-01T00:00:00", 459 | "2019-01-01T00:00:00", 460 | "2020-01-01T00:00:00", 461 | "2021-01-01T00:00:00" 462 | ], 463 | "y": [ 464 | 8, 465 | 10, 466 | 10, 467 | 8, 468 | 6, 469 | 7, 470 | 5, 471 | 5 472 | ] 473 | }, 474 | { 475 | "hoverinfo": "text", 476 | "hovertext": [ 477 | "Topic 7
Words: 未來想像, 核心能力, 課程, 人文, 軟體", 478 | "Topic 7
Words: 人文, 社科, 學術, 人文社會, 資料庫", 479 | "Topic 7
Words: 人文, 社會科學, 服務業, 領域, 社會企業", 480 | "Topic 7
Words: 人文, 社會科學, 圖書館, 領域, 服務業", 481 | "Topic 7
Words: 人文, 社會科學, 領域, 圖書館, 數位經濟", 482 | "Topic 7
Words: 新住民, 領域, 新創, 敘事力, 計畫", 483 | "Topic 7
Words: 社會創新, 社會福利, 新住民, 跨域, 分支", 484 | "Topic 7
Words: 社會創新, 社會福利, 分支, 志願服務, 前瞻議題" 485 | ], 486 | "marker": { 487 | "color": "#E69F00" 488 | }, 489 | "mode": "lines", 490 | "name": "7_人文_社會科學_社會創新_領域", 491 | "type": "scatter", 492 | "x": [ 493 | "2014-01-01T00:00:00", 494 | "2015-01-01T00:00:00", 495 | "2016-01-01T00:00:00", 496 | "2017-01-01T00:00:00", 497 | "2018-01-01T00:00:00", 498 | "2019-01-01T00:00:00", 499 | "2020-01-01T00:00:00", 500 | "2021-01-01T00:00:00" 501 | ], 502 | "y": [ 503 | 5, 504 | 5, 505 | 6, 506 | 6, 507 | 6, 508 | 7, 509 | 8, 510 | 6 511 | ] 512 | }, 513 | { 514 | "hoverinfo": "text", 515 | "hovertext": [ 516 | "Topic 8
Words: 交通資訊, 車輛, 開發, 大型, 系統", 517 | "Topic 8
Words: 車輛, 模組, 大型, 鋰電池, 系統", 518 | "Topic 8
Words: 車輛, 大型, 鋰電池, 高能量, 模組", 519 | "Topic 8
Words: 車輛, 自行車, 模組, 運輸, 技術", 520 | "Topic 8
Words: 駕駛車, 車輛, 自動, 運輸, 自行車", 521 | "Topic 8
Words: 雷達, 高精度, 車輛, 地圖, 自駕", 522 | "Topic 8
Words: 車輛, 車牌辨識, kr, 車廠, 雷達", 523 | "Topic 8
Words: 自駕, 車牌辨識, 車輛, 漁船, 雷達" 524 | ], 525 | "marker": { 526 | "color": "#56B4E9" 527 | }, 528 | "mode": "lines", 529 | "name": "8_車輛_模組_大型_系統", 530 | "type": "scatter", 531 | "x": [ 532 | "2014-01-01T00:00:00", 533 | "2015-01-01T00:00:00", 534 | "2016-01-01T00:00:00", 535 | "2017-01-01T00:00:00", 536 | "2018-01-01T00:00:00", 537 | "2019-01-01T00:00:00", 538 | "2020-01-01T00:00:00", 539 | "2021-01-01T00:00:00" 540 | ], 541 | "y": [ 542 | 10, 543 | 8, 544 | 4, 545 | 4, 546 | 7, 547 | 5, 548 | 5, 549 | 5 550 | ] 551 | }, 552 | { 553 | "hoverinfo": "text", 554 | "hovertext": [ 555 | "Topic 9
Words: 山崩, 山崩潛勢, 防災教育, 耐震評估, 活動斷層", 556 | "Topic 9
Words: 山崩潛勢, 研發成果, 山崩, 輻射監測, 斷層", 557 | "Topic 9
Words: 活動斷層, 山崩潛勢, 山崩, 斷層, 研發成果", 558 | "Topic 9
Words: 崩塌, 大規模, 山崩, 山崩潛勢, 活動斷層", 559 | "Topic 9
Words: 開源, 研發成果, 農地, 開發, 耐震評估", 560 | "Topic 9
Words: 研發成果, 農地, 災害韌性, 模型, 大量估價", 561 | "Topic 9
Words: 農地, 災害韌性, 農業區, 資訊, 地價", 562 | "Topic 9
Words: 地熱, 探勘, 災害韌性, 潛能區, 農地" 563 | ], 564 | "marker": { 565 | "color": "#009E73" 566 | }, 567 | "mode": "lines", 568 | "name": "9_研發成果_山崩_山崩潛勢_耐震評估", 569 | "type": "scatter", 570 | "x": [ 571 | "2014-01-01T00:00:00", 572 | "2015-01-01T00:00:00", 573 | "2016-01-01T00:00:00", 574 | "2017-01-01T00:00:00", 575 | "2018-01-01T00:00:00", 576 | "2019-01-01T00:00:00", 577 | "2020-01-01T00:00:00", 578 | "2021-01-01T00:00:00" 579 | ], 580 | "y": [ 581 | 7, 582 | 7, 583 | 4, 584 | 7, 585 | 5, 586 | 5, 587 | 4, 588 | 5 589 | ] 590 | }, 591 | { 592 | "hoverinfo": "text", 593 | "hovertext": [ 594 | "Topic 10
Words: 研究, 有害生物, 植物, 動物用, 安全衛生", 595 | "Topic 10
Words: 研究, 環境毒物, 檢疫, 有害生物, 安全衛生", 596 | "Topic 10
Words: 子項, 審驗技術, 研究, 處置, 廢棄物", 597 | "Topic 10
Words: 子項, 審驗技術, 研究, 廢棄物, 處置", 598 | "Topic 10
Words: 研究, 廢棄物, 子項, 審驗技術, 低放射性", 599 | "Topic 10
Words: 研究, 廢棄物, 子項, 低放射性, 處置", 600 | "Topic 10
Words: 研究, 勞動, 廢棄物, 資材, 處置", 601 | "Topic 10
Words: 勞動, 職業安全, 衛生, 研究, 職場安全" 602 | ], 603 | "marker": { 604 | "color": "#F0E442" 605 | }, 606 | "mode": "lines", 607 | "name": "10_研究_廢棄物_子項_勞動", 608 | "type": "scatter", 609 | "x": [ 610 | "2014-01-01T00:00:00", 611 | "2015-01-01T00:00:00", 612 | "2016-01-01T00:00:00", 613 | "2017-01-01T00:00:00", 614 | "2018-01-01T00:00:00", 615 | "2019-01-01T00:00:00", 616 | "2020-01-01T00:00:00", 617 | "2021-01-01T00:00:00" 618 | ], 619 | "y": [ 620 | 4, 621 | 4, 622 | 5, 623 | 7, 624 | 7, 625 | 7, 626 | 8, 627 | 1 628 | ] 629 | }, 630 | { 631 | "hoverinfo": "text", 632 | "hovertext": [ 633 | "Topic 11
Words: 海洋, 海洋科技, 探測, 地震, 海嘯", 634 | "Topic 11
Words: 海洋, 海洋科技, 探測, 研究船, 水合物", 635 | "Topic 11
Words: 海洋, 探測, 海洋科技, 地震, 海域", 636 | "Topic 11
Words: 海洋, 海洋科技, 海域, 地震, 公里", 637 | "Topic 11
Words: 海洋, 海洋科技, 南海, 海象, 海域", 638 | "Topic 11
Words: 海洋, 海洋科技, 海洋環境, 研究船, 船舶", 639 | "Topic 11
Words: 災防, 海象, 海洋, 海域, 氣象", 640 | "Topic 11
Words: 海洋, 海象, 海洋科技, 海域, 空運" 641 | ], 642 | "marker": { 643 | "color": "#D55E00" 644 | }, 645 | "mode": "lines", 646 | "name": "11_海洋_海洋科技_海域_探測", 647 | "type": "scatter", 648 | "x": [ 649 | "2014-01-01T00:00:00", 650 | "2015-01-01T00:00:00", 651 | "2016-01-01T00:00:00", 652 | "2017-01-01T00:00:00", 653 | "2018-01-01T00:00:00", 654 | "2019-01-01T00:00:00", 655 | "2020-01-01T00:00:00", 656 | "2021-01-01T00:00:00" 657 | ], 658 | "y": [ 659 | 6, 660 | 7, 661 | 5, 662 | 7, 663 | 5, 664 | 5, 665 | 3, 666 | 4 667 | ] 668 | } 669 | ], 670 | "layout": { 671 | "height": 450, 672 | "hoverlabel": { 673 | "bgcolor": "white", 674 | "font": { 675 | "family": "Rockwell", 676 | "size": 16 677 | } 678 | }, 679 | "legend": { 680 | "title": { 681 | "text": "Global Topic Representation" 682 | } 683 | }, 684 | "template": { 685 | "data": { 686 | "bar": [ 687 | { 688 | "error_x": { 689 | "color": "rgb(36,36,36)" 690 | }, 691 | "error_y": { 692 | "color": "rgb(36,36,36)" 693 | }, 694 | "marker": { 695 | "line": { 696 | "color": "white", 697 | "width": 0.5 698 | }, 699 | "pattern": { 700 | "fillmode": "overlay", 701 | "size": 10, 702 | "solidity": 0.2 703 | } 704 | }, 705 | "type": "bar" 706 | } 707 | ], 708 | "barpolar": [ 709 | { 710 | "marker": { 711 | "line": { 712 | "color": "white", 713 | "width": 0.5 714 | }, 715 | "pattern": { 716 | "fillmode": "overlay", 717 | "size": 10, 718 | "solidity": 0.2 719 | } 720 | }, 721 | "type": "barpolar" 722 | } 723 | ], 724 | "carpet": [ 725 | { 726 | "aaxis": { 727 | "endlinecolor": "rgb(36,36,36)", 728 | "gridcolor": "white", 729 | "linecolor": "white", 730 | "minorgridcolor": "white", 731 | "startlinecolor": "rgb(36,36,36)" 732 | }, 733 | "baxis": { 734 | "endlinecolor": "rgb(36,36,36)", 735 | "gridcolor": "white", 736 | "linecolor": "white", 737 | "minorgridcolor": "white", 738 | "startlinecolor": "rgb(36,36,36)" 739 | }, 740 | "type": "carpet" 741 | } 742 | ], 743 | "choropleth": [ 744 | { 745 | "colorbar": { 746 | "outlinewidth": 1, 747 | 
"tickcolor": "rgb(36,36,36)", 748 | "ticks": "outside" 749 | }, 750 | "type": "choropleth" 751 | } 752 | ], 753 | "contour": [ 754 | { 755 | "colorbar": { 756 | "outlinewidth": 1, 757 | "tickcolor": "rgb(36,36,36)", 758 | "ticks": "outside" 759 | }, 760 | "colorscale": [ 761 | [ 762 | 0, 763 | "#440154" 764 | ], 765 | [ 766 | 0.1111111111111111, 767 | "#482878" 768 | ], 769 | [ 770 | 0.2222222222222222, 771 | "#3e4989" 772 | ], 773 | [ 774 | 0.3333333333333333, 775 | "#31688e" 776 | ], 777 | [ 778 | 0.4444444444444444, 779 | "#26828e" 780 | ], 781 | [ 782 | 0.5555555555555556, 783 | "#1f9e89" 784 | ], 785 | [ 786 | 0.6666666666666666, 787 | "#35b779" 788 | ], 789 | [ 790 | 0.7777777777777778, 791 | "#6ece58" 792 | ], 793 | [ 794 | 0.8888888888888888, 795 | "#b5de2b" 796 | ], 797 | [ 798 | 1, 799 | "#fde725" 800 | ] 801 | ], 802 | "type": "contour" 803 | } 804 | ], 805 | "contourcarpet": [ 806 | { 807 | "colorbar": { 808 | "outlinewidth": 1, 809 | "tickcolor": "rgb(36,36,36)", 810 | "ticks": "outside" 811 | }, 812 | "type": "contourcarpet" 813 | } 814 | ], 815 | "heatmap": [ 816 | { 817 | "colorbar": { 818 | "outlinewidth": 1, 819 | "tickcolor": "rgb(36,36,36)", 820 | "ticks": "outside" 821 | }, 822 | "colorscale": [ 823 | [ 824 | 0, 825 | "#440154" 826 | ], 827 | [ 828 | 0.1111111111111111, 829 | "#482878" 830 | ], 831 | [ 832 | 0.2222222222222222, 833 | "#3e4989" 834 | ], 835 | [ 836 | 0.3333333333333333, 837 | "#31688e" 838 | ], 839 | [ 840 | 0.4444444444444444, 841 | "#26828e" 842 | ], 843 | [ 844 | 0.5555555555555556, 845 | "#1f9e89" 846 | ], 847 | [ 848 | 0.6666666666666666, 849 | "#35b779" 850 | ], 851 | [ 852 | 0.7777777777777778, 853 | "#6ece58" 854 | ], 855 | [ 856 | 0.8888888888888888, 857 | "#b5de2b" 858 | ], 859 | [ 860 | 1, 861 | "#fde725" 862 | ] 863 | ], 864 | "type": "heatmap" 865 | } 866 | ], 867 | "heatmapgl": [ 868 | { 869 | "colorbar": { 870 | "outlinewidth": 1, 871 | "tickcolor": "rgb(36,36,36)", 872 | "ticks": "outside" 873 | }, 874 | 
"colorscale": [ 875 | [ 876 | 0, 877 | "#440154" 878 | ], 879 | [ 880 | 0.1111111111111111, 881 | "#482878" 882 | ], 883 | [ 884 | 0.2222222222222222, 885 | "#3e4989" 886 | ], 887 | [ 888 | 0.3333333333333333, 889 | "#31688e" 890 | ], 891 | [ 892 | 0.4444444444444444, 893 | "#26828e" 894 | ], 895 | [ 896 | 0.5555555555555556, 897 | "#1f9e89" 898 | ], 899 | [ 900 | 0.6666666666666666, 901 | "#35b779" 902 | ], 903 | [ 904 | 0.7777777777777778, 905 | "#6ece58" 906 | ], 907 | [ 908 | 0.8888888888888888, 909 | "#b5de2b" 910 | ], 911 | [ 912 | 1, 913 | "#fde725" 914 | ] 915 | ], 916 | "type": "heatmapgl" 917 | } 918 | ], 919 | "histogram": [ 920 | { 921 | "marker": { 922 | "line": { 923 | "color": "white", 924 | "width": 0.6 925 | } 926 | }, 927 | "type": "histogram" 928 | } 929 | ], 930 | "histogram2d": [ 931 | { 932 | "colorbar": { 933 | "outlinewidth": 1, 934 | "tickcolor": "rgb(36,36,36)", 935 | "ticks": "outside" 936 | }, 937 | "colorscale": [ 938 | [ 939 | 0, 940 | "#440154" 941 | ], 942 | [ 943 | 0.1111111111111111, 944 | "#482878" 945 | ], 946 | [ 947 | 0.2222222222222222, 948 | "#3e4989" 949 | ], 950 | [ 951 | 0.3333333333333333, 952 | "#31688e" 953 | ], 954 | [ 955 | 0.4444444444444444, 956 | "#26828e" 957 | ], 958 | [ 959 | 0.5555555555555556, 960 | "#1f9e89" 961 | ], 962 | [ 963 | 0.6666666666666666, 964 | "#35b779" 965 | ], 966 | [ 967 | 0.7777777777777778, 968 | "#6ece58" 969 | ], 970 | [ 971 | 0.8888888888888888, 972 | "#b5de2b" 973 | ], 974 | [ 975 | 1, 976 | "#fde725" 977 | ] 978 | ], 979 | "type": "histogram2d" 980 | } 981 | ], 982 | "histogram2dcontour": [ 983 | { 984 | "colorbar": { 985 | "outlinewidth": 1, 986 | "tickcolor": "rgb(36,36,36)", 987 | "ticks": "outside" 988 | }, 989 | "colorscale": [ 990 | [ 991 | 0, 992 | "#440154" 993 | ], 994 | [ 995 | 0.1111111111111111, 996 | "#482878" 997 | ], 998 | [ 999 | 0.2222222222222222, 1000 | "#3e4989" 1001 | ], 1002 | [ 1003 | 0.3333333333333333, 1004 | "#31688e" 1005 | ], 1006 | [ 1007 | 
0.4444444444444444, 1008 | "#26828e" 1009 | ], 1010 | [ 1011 | 0.5555555555555556, 1012 | "#1f9e89" 1013 | ], 1014 | [ 1015 | 0.6666666666666666, 1016 | "#35b779" 1017 | ], 1018 | [ 1019 | 0.7777777777777778, 1020 | "#6ece58" 1021 | ], 1022 | [ 1023 | 0.8888888888888888, 1024 | "#b5de2b" 1025 | ], 1026 | [ 1027 | 1, 1028 | "#fde725" 1029 | ] 1030 | ], 1031 | "type": "histogram2dcontour" 1032 | } 1033 | ], 1034 | "mesh3d": [ 1035 | { 1036 | "colorbar": { 1037 | "outlinewidth": 1, 1038 | "tickcolor": "rgb(36,36,36)", 1039 | "ticks": "outside" 1040 | }, 1041 | "type": "mesh3d" 1042 | } 1043 | ], 1044 | "parcoords": [ 1045 | { 1046 | "line": { 1047 | "colorbar": { 1048 | "outlinewidth": 1, 1049 | "tickcolor": "rgb(36,36,36)", 1050 | "ticks": "outside" 1051 | } 1052 | }, 1053 | "type": "parcoords" 1054 | } 1055 | ], 1056 | "pie": [ 1057 | { 1058 | "automargin": true, 1059 | "type": "pie" 1060 | } 1061 | ], 1062 | "scatter": [ 1063 | { 1064 | "fillpattern": { 1065 | "fillmode": "overlay", 1066 | "size": 10, 1067 | "solidity": 0.2 1068 | }, 1069 | "type": "scatter" 1070 | } 1071 | ], 1072 | "scatter3d": [ 1073 | { 1074 | "line": { 1075 | "colorbar": { 1076 | "outlinewidth": 1, 1077 | "tickcolor": "rgb(36,36,36)", 1078 | "ticks": "outside" 1079 | } 1080 | }, 1081 | "marker": { 1082 | "colorbar": { 1083 | "outlinewidth": 1, 1084 | "tickcolor": "rgb(36,36,36)", 1085 | "ticks": "outside" 1086 | } 1087 | }, 1088 | "type": "scatter3d" 1089 | } 1090 | ], 1091 | "scattercarpet": [ 1092 | { 1093 | "marker": { 1094 | "colorbar": { 1095 | "outlinewidth": 1, 1096 | "tickcolor": "rgb(36,36,36)", 1097 | "ticks": "outside" 1098 | } 1099 | }, 1100 | "type": "scattercarpet" 1101 | } 1102 | ], 1103 | "scattergeo": [ 1104 | { 1105 | "marker": { 1106 | "colorbar": { 1107 | "outlinewidth": 1, 1108 | "tickcolor": "rgb(36,36,36)", 1109 | "ticks": "outside" 1110 | } 1111 | }, 1112 | "type": "scattergeo" 1113 | } 1114 | ], 1115 | "scattergl": [ 1116 | { 1117 | "marker": { 1118 | "colorbar": { 
1119 | "outlinewidth": 1, 1120 | "tickcolor": "rgb(36,36,36)", 1121 | "ticks": "outside" 1122 | } 1123 | }, 1124 | "type": "scattergl" 1125 | } 1126 | ], 1127 | "scattermapbox": [ 1128 | { 1129 | "marker": { 1130 | "colorbar": { 1131 | "outlinewidth": 1, 1132 | "tickcolor": "rgb(36,36,36)", 1133 | "ticks": "outside" 1134 | } 1135 | }, 1136 | "type": "scattermapbox" 1137 | } 1138 | ], 1139 | "scatterpolar": [ 1140 | { 1141 | "marker": { 1142 | "colorbar": { 1143 | "outlinewidth": 1, 1144 | "tickcolor": "rgb(36,36,36)", 1145 | "ticks": "outside" 1146 | } 1147 | }, 1148 | "type": "scatterpolar" 1149 | } 1150 | ], 1151 | "scatterpolargl": [ 1152 | { 1153 | "marker": { 1154 | "colorbar": { 1155 | "outlinewidth": 1, 1156 | "tickcolor": "rgb(36,36,36)", 1157 | "ticks": "outside" 1158 | } 1159 | }, 1160 | "type": "scatterpolargl" 1161 | } 1162 | ], 1163 | "scatterternary": [ 1164 | { 1165 | "marker": { 1166 | "colorbar": { 1167 | "outlinewidth": 1, 1168 | "tickcolor": "rgb(36,36,36)", 1169 | "ticks": "outside" 1170 | } 1171 | }, 1172 | "type": "scatterternary" 1173 | } 1174 | ], 1175 | "surface": [ 1176 | { 1177 | "colorbar": { 1178 | "outlinewidth": 1, 1179 | "tickcolor": "rgb(36,36,36)", 1180 | "ticks": "outside" 1181 | }, 1182 | "colorscale": [ 1183 | [ 1184 | 0, 1185 | "#440154" 1186 | ], 1187 | [ 1188 | 0.1111111111111111, 1189 | "#482878" 1190 | ], 1191 | [ 1192 | 0.2222222222222222, 1193 | "#3e4989" 1194 | ], 1195 | [ 1196 | 0.3333333333333333, 1197 | "#31688e" 1198 | ], 1199 | [ 1200 | 0.4444444444444444, 1201 | "#26828e" 1202 | ], 1203 | [ 1204 | 0.5555555555555556, 1205 | "#1f9e89" 1206 | ], 1207 | [ 1208 | 0.6666666666666666, 1209 | "#35b779" 1210 | ], 1211 | [ 1212 | 0.7777777777777778, 1213 | "#6ece58" 1214 | ], 1215 | [ 1216 | 0.8888888888888888, 1217 | "#b5de2b" 1218 | ], 1219 | [ 1220 | 1, 1221 | "#fde725" 1222 | ] 1223 | ], 1224 | "type": "surface" 1225 | } 1226 | ], 1227 | "table": [ 1228 | { 1229 | "cells": { 1230 | "fill": { 1231 | "color": 
"rgb(237,237,237)" 1232 | }, 1233 | "line": { 1234 | "color": "white" 1235 | } 1236 | }, 1237 | "header": { 1238 | "fill": { 1239 | "color": "rgb(217,217,217)" 1240 | }, 1241 | "line": { 1242 | "color": "white" 1243 | } 1244 | }, 1245 | "type": "table" 1246 | } 1247 | ] 1248 | }, 1249 | "layout": { 1250 | "annotationdefaults": { 1251 | "arrowhead": 0, 1252 | "arrowwidth": 1 1253 | }, 1254 | "autotypenumbers": "strict", 1255 | "coloraxis": { 1256 | "colorbar": { 1257 | "outlinewidth": 1, 1258 | "tickcolor": "rgb(36,36,36)", 1259 | "ticks": "outside" 1260 | } 1261 | }, 1262 | "colorscale": { 1263 | "diverging": [ 1264 | [ 1265 | 0, 1266 | "rgb(103,0,31)" 1267 | ], 1268 | [ 1269 | 0.1, 1270 | "rgb(178,24,43)" 1271 | ], 1272 | [ 1273 | 0.2, 1274 | "rgb(214,96,77)" 1275 | ], 1276 | [ 1277 | 0.3, 1278 | "rgb(244,165,130)" 1279 | ], 1280 | [ 1281 | 0.4, 1282 | "rgb(253,219,199)" 1283 | ], 1284 | [ 1285 | 0.5, 1286 | "rgb(247,247,247)" 1287 | ], 1288 | [ 1289 | 0.6, 1290 | "rgb(209,229,240)" 1291 | ], 1292 | [ 1293 | 0.7, 1294 | "rgb(146,197,222)" 1295 | ], 1296 | [ 1297 | 0.8, 1298 | "rgb(67,147,195)" 1299 | ], 1300 | [ 1301 | 0.9, 1302 | "rgb(33,102,172)" 1303 | ], 1304 | [ 1305 | 1, 1306 | "rgb(5,48,97)" 1307 | ] 1308 | ], 1309 | "sequential": [ 1310 | [ 1311 | 0, 1312 | "#440154" 1313 | ], 1314 | [ 1315 | 0.1111111111111111, 1316 | "#482878" 1317 | ], 1318 | [ 1319 | 0.2222222222222222, 1320 | "#3e4989" 1321 | ], 1322 | [ 1323 | 0.3333333333333333, 1324 | "#31688e" 1325 | ], 1326 | [ 1327 | 0.4444444444444444, 1328 | "#26828e" 1329 | ], 1330 | [ 1331 | 0.5555555555555556, 1332 | "#1f9e89" 1333 | ], 1334 | [ 1335 | 0.6666666666666666, 1336 | "#35b779" 1337 | ], 1338 | [ 1339 | 0.7777777777777778, 1340 | "#6ece58" 1341 | ], 1342 | [ 1343 | 0.8888888888888888, 1344 | "#b5de2b" 1345 | ], 1346 | [ 1347 | 1, 1348 | "#fde725" 1349 | ] 1350 | ], 1351 | "sequentialminus": [ 1352 | [ 1353 | 0, 1354 | "#440154" 1355 | ], 1356 | [ 1357 | 0.1111111111111111, 1358 | "#482878" 1359 | 
], 1360 | [ 1361 | 0.2222222222222222, 1362 | "#3e4989" 1363 | ], 1364 | [ 1365 | 0.3333333333333333, 1366 | "#31688e" 1367 | ], 1368 | [ 1369 | 0.4444444444444444, 1370 | "#26828e" 1371 | ], 1372 | [ 1373 | 0.5555555555555556, 1374 | "#1f9e89" 1375 | ], 1376 | [ 1377 | 0.6666666666666666, 1378 | "#35b779" 1379 | ], 1380 | [ 1381 | 0.7777777777777778, 1382 | "#6ece58" 1383 | ], 1384 | [ 1385 | 0.8888888888888888, 1386 | "#b5de2b" 1387 | ], 1388 | [ 1389 | 1, 1390 | "#fde725" 1391 | ] 1392 | ] 1393 | }, 1394 | "colorway": [ 1395 | "#1F77B4", 1396 | "#FF7F0E", 1397 | "#2CA02C", 1398 | "#D62728", 1399 | "#9467BD", 1400 | "#8C564B", 1401 | "#E377C2", 1402 | "#7F7F7F", 1403 | "#BCBD22", 1404 | "#17BECF" 1405 | ], 1406 | "font": { 1407 | "color": "rgb(36,36,36)" 1408 | }, 1409 | "geo": { 1410 | "bgcolor": "white", 1411 | "lakecolor": "white", 1412 | "landcolor": "white", 1413 | "showlakes": true, 1414 | "showland": true, 1415 | "subunitcolor": "white" 1416 | }, 1417 | "hoverlabel": { 1418 | "align": "left" 1419 | }, 1420 | "hovermode": "closest", 1421 | "mapbox": { 1422 | "style": "light" 1423 | }, 1424 | "paper_bgcolor": "white", 1425 | "plot_bgcolor": "white", 1426 | "polar": { 1427 | "angularaxis": { 1428 | "gridcolor": "rgb(232,232,232)", 1429 | "linecolor": "rgb(36,36,36)", 1430 | "showgrid": false, 1431 | "showline": true, 1432 | "ticks": "outside" 1433 | }, 1434 | "bgcolor": "white", 1435 | "radialaxis": { 1436 | "gridcolor": "rgb(232,232,232)", 1437 | "linecolor": "rgb(36,36,36)", 1438 | "showgrid": false, 1439 | "showline": true, 1440 | "ticks": "outside" 1441 | } 1442 | }, 1443 | "scene": { 1444 | "xaxis": { 1445 | "backgroundcolor": "white", 1446 | "gridcolor": "rgb(232,232,232)", 1447 | "gridwidth": 2, 1448 | "linecolor": "rgb(36,36,36)", 1449 | "showbackground": true, 1450 | "showgrid": false, 1451 | "showline": true, 1452 | "ticks": "outside", 1453 | "zeroline": false, 1454 | "zerolinecolor": "rgb(36,36,36)" 1455 | }, 1456 | "yaxis": { 1457 | 
"backgroundcolor": "white", 1458 | "gridcolor": "rgb(232,232,232)", 1459 | "gridwidth": 2, 1460 | "linecolor": "rgb(36,36,36)", 1461 | "showbackground": true, 1462 | "showgrid": false, 1463 | "showline": true, 1464 | "ticks": "outside", 1465 | "zeroline": false, 1466 | "zerolinecolor": "rgb(36,36,36)" 1467 | }, 1468 | "zaxis": { 1469 | "backgroundcolor": "white", 1470 | "gridcolor": "rgb(232,232,232)", 1471 | "gridwidth": 2, 1472 | "linecolor": "rgb(36,36,36)", 1473 | "showbackground": true, 1474 | "showgrid": false, 1475 | "showline": true, 1476 | "ticks": "outside", 1477 | "zeroline": false, 1478 | "zerolinecolor": "rgb(36,36,36)" 1479 | } 1480 | }, 1481 | "shapedefaults": { 1482 | "fillcolor": "black", 1483 | "line": { 1484 | "width": 0 1485 | }, 1486 | "opacity": 0.3 1487 | }, 1488 | "ternary": { 1489 | "aaxis": { 1490 | "gridcolor": "rgb(232,232,232)", 1491 | "linecolor": "rgb(36,36,36)", 1492 | "showgrid": false, 1493 | "showline": true, 1494 | "ticks": "outside" 1495 | }, 1496 | "baxis": { 1497 | "gridcolor": "rgb(232,232,232)", 1498 | "linecolor": "rgb(36,36,36)", 1499 | "showgrid": false, 1500 | "showline": true, 1501 | "ticks": "outside" 1502 | }, 1503 | "bgcolor": "white", 1504 | "caxis": { 1505 | "gridcolor": "rgb(232,232,232)", 1506 | "linecolor": "rgb(36,36,36)", 1507 | "showgrid": false, 1508 | "showline": true, 1509 | "ticks": "outside" 1510 | } 1511 | }, 1512 | "title": { 1513 | "x": 0.05 1514 | }, 1515 | "xaxis": { 1516 | "automargin": true, 1517 | "gridcolor": "rgb(232,232,232)", 1518 | "linecolor": "rgb(36,36,36)", 1519 | "showgrid": false, 1520 | "showline": true, 1521 | "ticks": "outside", 1522 | "title": { 1523 | "standoff": 15 1524 | }, 1525 | "zeroline": false, 1526 | "zerolinecolor": "rgb(36,36,36)" 1527 | }, 1528 | "yaxis": { 1529 | "automargin": true, 1530 | "gridcolor": "rgb(232,232,232)", 1531 | "linecolor": "rgb(36,36,36)", 1532 | "showgrid": false, 1533 | "showline": true, 1534 | "ticks": "outside", 1535 | "title": { 1536 | 
"standoff": 15 1537 | }, 1538 | "zeroline": false, 1539 | "zerolinecolor": "rgb(36,36,36)" 1540 | } 1541 | } 1542 | }, 1543 | "title": { 1544 | "font": { 1545 | "color": "Black", 1546 | "size": 22 1547 | }, 1548 | "text": "Topics over Time", 1549 | "x": 0.4, 1550 | "xanchor": "center", 1551 | "y": 0.95, 1552 | "yanchor": "top" 1553 | }, 1554 | "width": 1000, 1555 | "xaxis": { 1556 | "showgrid": true 1557 | }, 1558 | "yaxis": { 1559 | "showgrid": true, 1560 | "title": { 1561 | "text": "Frequency" 1562 | } 1563 | } 1564 | } 1565 | } 1566 | }, 1567 | "metadata": {}, 1568 | "output_type": "display_data" 1569 | } 1570 | ], 1571 | "source": [ 1572 | "# Time series plot for each topic\n", 1573 | "topics_over_time = topic_model.topics_over_time(\n", 1574 | " ws, \n", 1575 | " timestamps, \n", 1576 | ")\n", 1577 | "tot_fig = topic_model.visualize_topics_over_time(\n", 1578 | " topics_over_time, top_n_topics=12, width=1000\n", 1579 | ")\n", 1580 | "tot_fig" 1581 | ] 1582 | }, 1583 | { 1584 | "cell_type": "code", 1585 | "execution_count": 17, 1586 | "metadata": {}, 1587 | "outputs": [ 1588 | { 1589 | "data": { 1590 | "application/vnd.plotly.v1+json": { 1591 | "config": { 1592 | "plotlyServerURL": "https://plot.ly" 1593 | }, 1594 | "data": [ 1595 | { 1596 | "marker": { 1597 | "color": "#D55E00" 1598 | }, 1599 | "orientation": "h", 1600 | "type": "bar", 1601 | "x": [ 1602 | 0.010036117168846038, 1603 | 0.010358506547658615, 1604 | 0.0106788410320816, 1605 | 0.010698859633264288, 1606 | 0.011671864319967307 1607 | ], 1608 | "xaxis": "x", 1609 | "y": [ 1610 | "生技 ", 1611 | "開發 ", 1612 | "中心 ", 1613 | "臨床試驗 ", 1614 | "計畫 " 1615 | ], 1616 | "yaxis": "y" 1617 | }, 1618 | { 1619 | "marker": { 1620 | "color": "#0072B2" 1621 | }, 1622 | "orientation": "h", 1623 | "type": "bar", 1624 | "x": [ 1625 | 0.012840369315768275, 1626 | 0.013344711270460653, 1627 | 0.013531355294615246, 1628 | 0.014271351261196953, 1629 | 0.0162898165763392 1630 | ], 1631 | "xaxis": "x2", 1632 | "y": [ 1633 | "應用 ", 1634 | "產業 ", 
1635 | "發展 ", 1636 | "服務 ", 1637 | "智慧 " 1638 | ], 1639 | "yaxis": "y2" 1640 | }, 1641 | { 1642 | "marker": { 1643 | "color": "#CC79A7" 1644 | }, 1645 | "orientation": "h", 1646 | "type": "bar", 1647 | "x": [ 1648 | 0.01578994069289869, 1649 | 0.01583388449071344, 1650 | 0.015931341674847187, 1651 | 0.019298052269421336, 1652 | 0.024931485304576874 1653 | ], 1654 | "xaxis": "x3", 1655 | "y": [ 1656 | "訊號 ", 1657 | "實驗 ", 1658 | "頻道 ", 1659 | "行政 ", 1660 | "中子 " 1661 | ], 1662 | "yaxis": "y3" 1663 | }, 1664 | { 1665 | "marker": { 1666 | "color": "#E69F00" 1667 | }, 1668 | "orientation": "h", 1669 | "type": "bar", 1670 | "x": [ 1671 | 0.018245889255145812, 1672 | 0.018270356399692125, 1673 | 0.0183354611928508, 1674 | 0.023199971305239788, 1675 | 0.04340033768876724 1676 | ], 1677 | "xaxis": "x4", 1678 | "y": [ 1679 | "食媒性 ", 1680 | "食材 ", 1681 | "食品安全 ", 1682 | "食品產業 ", 1683 | "食品 " 1684 | ], 1685 | "yaxis": "y4" 1686 | }, 1687 | { 1688 | "marker": { 1689 | "color": "#56B4E9" 1690 | }, 1691 | "orientation": "h", 1692 | "type": "bar", 1693 | "x": [ 1694 | 0.016462123966053496, 1695 | 0.016819153912223883, 1696 | 0.016951842528449633, 1697 | 0.018055115167948405, 1698 | 0.020226475421989475 1699 | ], 1700 | "xaxis": "x5", 1701 | "y": [ 1702 | "石化 ", 1703 | "技術 ", 1704 | "產業 ", 1705 | "基礎 ", 1706 | "超高畫質 " 1707 | ], 1708 | "yaxis": "y5" 1709 | }, 1710 | { 1711 | "marker": { 1712 | "color": "#009E73" 1713 | }, 1714 | "orientation": "h", 1715 | "type": "bar", 1716 | "x": [ 1717 | 0.019213738809956207, 1718 | 0.019644641014467952, 1719 | 0.022262091699000144, 1720 | 0.022617764817073394, 1721 | 0.044360590702848667 1722 | ], 1723 | "xaxis": "x6", 1724 | "y": [ 1725 | "用水 ", 1726 | "氣候變遷 ", 1727 | "水資源 ", 1728 | "混凝土 ", 1729 | "地下水 " 1730 | ], 1731 | "yaxis": "y6" 1732 | }, 1733 | { 1734 | "marker": { 1735 | "color": "#F0E442" 1736 | }, 1737 | "orientation": "h", 1738 | "type": "bar", 1739 | "x": [ 1740 | 0.025989495319065685, 1741 | 0.026155895875823284, 1742 | 
0.02724730513272366, 1743 | 0.02849321310375216, 1744 | 0.030703461179395986 1745 | ], 1746 | "xaxis": "x7", 1747 | "y": [ 1748 | "儀器 ", 1749 | "光束線 ", 1750 | "設施 ", 1751 | "光子源 ", 1752 | "光源 " 1753 | ], 1754 | "yaxis": "y7" 1755 | }, 1756 | { 1757 | "marker": { 1758 | "color": "#D55E00" 1759 | }, 1760 | "orientation": "h", 1761 | "type": "bar", 1762 | "x": [ 1763 | 0.016178798001916274, 1764 | 0.019996998644316036, 1765 | 0.02117393601397003, 1766 | 0.022820429423951127, 1767 | 0.027014113121812827 1768 | ], 1769 | "xaxis": "x8", 1770 | "y": [ 1771 | "學術 ", 1772 | "領域 ", 1773 | "社會創新 ", 1774 | "社會科學 ", 1775 | "人文 " 1776 | ], 1777 | "yaxis": "y8" 1778 | }, 1779 | { 1780 | "marker": { 1781 | "color": "#0072B2" 1782 | }, 1783 | "orientation": "h", 1784 | "type": "bar", 1785 | "x": [ 1786 | 0.016823737053654045, 1787 | 0.01705607984260705, 1788 | 0.01774483437408217, 1789 | 0.01819365272349642, 1790 | 0.032203601246784626 1791 | ], 1792 | "xaxis": "x9", 1793 | "y": [ 1794 | "技術 ", 1795 | "系統 ", 1796 | "大型 ", 1797 | "模組 ", 1798 | "車輛 " 1799 | ], 1800 | "yaxis": "y9" 1801 | }, 1802 | { 1803 | "marker": { 1804 | "color": "#CC79A7" 1805 | }, 1806 | "orientation": "h", 1807 | "type": "bar", 1808 | "x": [ 1809 | 0.017640395275580457, 1810 | 0.01975905514443746, 1811 | 0.02036147094608936, 1812 | 0.021217334704581526, 1813 | 0.02204602573042941 1814 | ], 1815 | "xaxis": "x10", 1816 | "y": [ 1817 | "活動斷層 ", 1818 | "耐震評估 ", 1819 | "山崩潛勢 ", 1820 | "山崩 ", 1821 | "研發成果 " 1822 | ], 1823 | "yaxis": "y10" 1824 | }, 1825 | { 1826 | "marker": { 1827 | "color": "#E69F00" 1828 | }, 1829 | "orientation": "h", 1830 | "type": "bar", 1831 | "x": [ 1832 | 0.0328218556090127, 1833 | 0.03295156206050777, 1834 | 0.0370506974653674, 1835 | 0.03761736203756081, 1836 | 0.0530372794541444 1837 | ], 1838 | "xaxis": "x11", 1839 | "y": [ 1840 | "處置 ", 1841 | "勞動 ", 1842 | "子項 ", 1843 | "廢棄物 ", 1844 | "研究 " 1845 | ], 1846 | "yaxis": "y11" 1847 | }, 1848 | { 1849 | "marker": { 1850 | "color": "#56B4E9" 
1851 | }, 1852 | "orientation": "h", 1853 | "type": "bar", 1854 | "x": [ 1855 | 0.024299717862355803, 1856 | 0.02456976894408899, 1857 | 0.025888136519192917, 1858 | 0.04252718243236, 1859 | 0.07990365608402077 1860 | ], 1861 | "xaxis": "x12", 1862 | "y": [ 1863 | "海象 ", 1864 | "探測 ", 1865 | "海域 ", 1866 | "海洋科技 ", 1867 | "海洋 " 1868 | ], 1869 | "yaxis": "y12" 1870 | } 1871 | ], 1872 | "layout": { 1873 | "annotations": [ 1874 | { 1875 | "font": { 1876 | "size": 16 1877 | }, 1878 | "showarrow": false, 1879 | "text": "Topic 0", 1880 | "x": 0.0875, 1881 | "xanchor": "center", 1882 | "xref": "paper", 1883 | "y": 1, 1884 | "yanchor": "bottom", 1885 | "yref": "paper" 1886 | }, 1887 | { 1888 | "font": { 1889 | "size": 16 1890 | }, 1891 | "showarrow": false, 1892 | "text": "Topic 1", 1893 | "x": 0.36250000000000004, 1894 | "xanchor": "center", 1895 | "xref": "paper", 1896 | "y": 1, 1897 | "yanchor": "bottom", 1898 | "yref": "paper" 1899 | }, 1900 | { 1901 | "font": { 1902 | "size": 16 1903 | }, 1904 | "showarrow": false, 1905 | "text": "Topic 2", 1906 | "x": 0.6375000000000001, 1907 | "xanchor": "center", 1908 | "xref": "paper", 1909 | "y": 1, 1910 | "yanchor": "bottom", 1911 | "yref": "paper" 1912 | }, 1913 | { 1914 | "font": { 1915 | "size": 16 1916 | }, 1917 | "showarrow": false, 1918 | "text": "Topic 3", 1919 | "x": 0.9125, 1920 | "xanchor": "center", 1921 | "xref": "paper", 1922 | "y": 1, 1923 | "yanchor": "bottom", 1924 | "yref": "paper" 1925 | }, 1926 | { 1927 | "font": { 1928 | "size": 16 1929 | }, 1930 | "showarrow": false, 1931 | "text": "Topic 4", 1932 | "x": 0.0875, 1933 | "xanchor": "center", 1934 | "xref": "paper", 1935 | "y": 0.6222222222222222, 1936 | "yanchor": "bottom", 1937 | "yref": "paper" 1938 | }, 1939 | { 1940 | "font": { 1941 | "size": 16 1942 | }, 1943 | "showarrow": false, 1944 | "text": "Topic 5", 1945 | "x": 0.36250000000000004, 1946 | "xanchor": "center", 1947 | "xref": "paper", 1948 | "y": 0.6222222222222222, 1949 | "yanchor": "bottom", 1950 | 
"yref": "paper" 1951 | }, 1952 | { 1953 | "font": { 1954 | "size": 16 1955 | }, 1956 | "showarrow": false, 1957 | "text": "Topic 6", 1958 | "x": 0.6375000000000001, 1959 | "xanchor": "center", 1960 | "xref": "paper", 1961 | "y": 0.6222222222222222, 1962 | "yanchor": "bottom", 1963 | "yref": "paper" 1964 | }, 1965 | { 1966 | "font": { 1967 | "size": 16 1968 | }, 1969 | "showarrow": false, 1970 | "text": "Topic 7", 1971 | "x": 0.9125, 1972 | "xanchor": "center", 1973 | "xref": "paper", 1974 | "y": 0.6222222222222222, 1975 | "yanchor": "bottom", 1976 | "yref": "paper" 1977 | }, 1978 | { 1979 | "font": { 1980 | "size": 16 1981 | }, 1982 | "showarrow": false, 1983 | "text": "Topic 8", 1984 | "x": 0.0875, 1985 | "xanchor": "center", 1986 | "xref": "paper", 1987 | "y": 0.24444444444444446, 1988 | "yanchor": "bottom", 1989 | "yref": "paper" 1990 | }, 1991 | { 1992 | "font": { 1993 | "size": 16 1994 | }, 1995 | "showarrow": false, 1996 | "text": "Topic 9", 1997 | "x": 0.36250000000000004, 1998 | "xanchor": "center", 1999 | "xref": "paper", 2000 | "y": 0.24444444444444446, 2001 | "yanchor": "bottom", 2002 | "yref": "paper" 2003 | }, 2004 | { 2005 | "font": { 2006 | "size": 16 2007 | }, 2008 | "showarrow": false, 2009 | "text": "Topic 10", 2010 | "x": 0.6375000000000001, 2011 | "xanchor": "center", 2012 | "xref": "paper", 2013 | "y": 0.24444444444444446, 2014 | "yanchor": "bottom", 2015 | "yref": "paper" 2016 | }, 2017 | { 2018 | "font": { 2019 | "size": 16 2020 | }, 2021 | "showarrow": false, 2022 | "text": "Topic 11", 2023 | "x": 0.9125, 2024 | "xanchor": "center", 2025 | "xref": "paper", 2026 | "y": 0.24444444444444446, 2027 | "yanchor": "bottom", 2028 | "yref": "paper" 2029 | } 2030 | ], 2031 | "height": 750, 2032 | "hoverlabel": { 2033 | "bgcolor": "white", 2034 | "font": { 2035 | "family": "Rockwell", 2036 | "size": 16 2037 | } 2038 | }, 2039 | "showlegend": false, 2040 | "template": { 2041 | "data": { 2042 | "bar": [ 2043 | { 2044 | "error_x": { 2045 | "color": 
"#2a3f5f" 2046 | }, 2047 | "error_y": { 2048 | "color": "#2a3f5f" 2049 | }, 2050 | "marker": { 2051 | "line": { 2052 | "color": "white", 2053 | "width": 0.5 2054 | }, 2055 | "pattern": { 2056 | "fillmode": "overlay", 2057 | "size": 10, 2058 | "solidity": 0.2 2059 | } 2060 | }, 2061 | "type": "bar" 2062 | } 2063 | ], 2064 | "barpolar": [ 2065 | { 2066 | "marker": { 2067 | "line": { 2068 | "color": "white", 2069 | "width": 0.5 2070 | }, 2071 | "pattern": { 2072 | "fillmode": "overlay", 2073 | "size": 10, 2074 | "solidity": 0.2 2075 | } 2076 | }, 2077 | "type": "barpolar" 2078 | } 2079 | ], 2080 | "carpet": [ 2081 | { 2082 | "aaxis": { 2083 | "endlinecolor": "#2a3f5f", 2084 | "gridcolor": "#C8D4E3", 2085 | "linecolor": "#C8D4E3", 2086 | "minorgridcolor": "#C8D4E3", 2087 | "startlinecolor": "#2a3f5f" 2088 | }, 2089 | "baxis": { 2090 | "endlinecolor": "#2a3f5f", 2091 | "gridcolor": "#C8D4E3", 2092 | "linecolor": "#C8D4E3", 2093 | "minorgridcolor": "#C8D4E3", 2094 | "startlinecolor": "#2a3f5f" 2095 | }, 2096 | "type": "carpet" 2097 | } 2098 | ], 2099 | "choropleth": [ 2100 | { 2101 | "colorbar": { 2102 | "outlinewidth": 0, 2103 | "ticks": "" 2104 | }, 2105 | "type": "choropleth" 2106 | } 2107 | ], 2108 | "contour": [ 2109 | { 2110 | "colorbar": { 2111 | "outlinewidth": 0, 2112 | "ticks": "" 2113 | }, 2114 | "colorscale": [ 2115 | [ 2116 | 0, 2117 | "#0d0887" 2118 | ], 2119 | [ 2120 | 0.1111111111111111, 2121 | "#46039f" 2122 | ], 2123 | [ 2124 | 0.2222222222222222, 2125 | "#7201a8" 2126 | ], 2127 | [ 2128 | 0.3333333333333333, 2129 | "#9c179e" 2130 | ], 2131 | [ 2132 | 0.4444444444444444, 2133 | "#bd3786" 2134 | ], 2135 | [ 2136 | 0.5555555555555556, 2137 | "#d8576b" 2138 | ], 2139 | [ 2140 | 0.6666666666666666, 2141 | "#ed7953" 2142 | ], 2143 | [ 2144 | 0.7777777777777778, 2145 | "#fb9f3a" 2146 | ], 2147 | [ 2148 | 0.8888888888888888, 2149 | "#fdca26" 2150 | ], 2151 | [ 2152 | 1, 2153 | "#f0f921" 2154 | ] 2155 | ], 2156 | "type": "contour" 2157 | } 2158 | ], 2159 | 
"contourcarpet": [ 2160 | { 2161 | "colorbar": { 2162 | "outlinewidth": 0, 2163 | "ticks": "" 2164 | }, 2165 | "type": "contourcarpet" 2166 | } 2167 | ], 2168 | "heatmap": [ 2169 | { 2170 | "colorbar": { 2171 | "outlinewidth": 0, 2172 | "ticks": "" 2173 | }, 2174 | "colorscale": [ 2175 | [ 2176 | 0, 2177 | "#0d0887" 2178 | ], 2179 | [ 2180 | 0.1111111111111111, 2181 | "#46039f" 2182 | ], 2183 | [ 2184 | 0.2222222222222222, 2185 | "#7201a8" 2186 | ], 2187 | [ 2188 | 0.3333333333333333, 2189 | "#9c179e" 2190 | ], 2191 | [ 2192 | 0.4444444444444444, 2193 | "#bd3786" 2194 | ], 2195 | [ 2196 | 0.5555555555555556, 2197 | "#d8576b" 2198 | ], 2199 | [ 2200 | 0.6666666666666666, 2201 | "#ed7953" 2202 | ], 2203 | [ 2204 | 0.7777777777777778, 2205 | "#fb9f3a" 2206 | ], 2207 | [ 2208 | 0.8888888888888888, 2209 | "#fdca26" 2210 | ], 2211 | [ 2212 | 1, 2213 | "#f0f921" 2214 | ] 2215 | ], 2216 | "type": "heatmap" 2217 | } 2218 | ], 2219 | "heatmapgl": [ 2220 | { 2221 | "colorbar": { 2222 | "outlinewidth": 0, 2223 | "ticks": "" 2224 | }, 2225 | "colorscale": [ 2226 | [ 2227 | 0, 2228 | "#0d0887" 2229 | ], 2230 | [ 2231 | 0.1111111111111111, 2232 | "#46039f" 2233 | ], 2234 | [ 2235 | 0.2222222222222222, 2236 | "#7201a8" 2237 | ], 2238 | [ 2239 | 0.3333333333333333, 2240 | "#9c179e" 2241 | ], 2242 | [ 2243 | 0.4444444444444444, 2244 | "#bd3786" 2245 | ], 2246 | [ 2247 | 0.5555555555555556, 2248 | "#d8576b" 2249 | ], 2250 | [ 2251 | 0.6666666666666666, 2252 | "#ed7953" 2253 | ], 2254 | [ 2255 | 0.7777777777777778, 2256 | "#fb9f3a" 2257 | ], 2258 | [ 2259 | 0.8888888888888888, 2260 | "#fdca26" 2261 | ], 2262 | [ 2263 | 1, 2264 | "#f0f921" 2265 | ] 2266 | ], 2267 | "type": "heatmapgl" 2268 | } 2269 | ], 2270 | "histogram": [ 2271 | { 2272 | "marker": { 2273 | "pattern": { 2274 | "fillmode": "overlay", 2275 | "size": 10, 2276 | "solidity": 0.2 2277 | } 2278 | }, 2279 | "type": "histogram" 2280 | } 2281 | ], 2282 | "histogram2d": [ 2283 | { 2284 | "colorbar": { 2285 | "outlinewidth": 0, 
2286 | "ticks": "" 2287 | }, 2288 | "colorscale": [ 2289 | [ 2290 | 0, 2291 | "#0d0887" 2292 | ], 2293 | [ 2294 | 0.1111111111111111, 2295 | "#46039f" 2296 | ], 2297 | [ 2298 | 0.2222222222222222, 2299 | "#7201a8" 2300 | ], 2301 | [ 2302 | 0.3333333333333333, 2303 | "#9c179e" 2304 | ], 2305 | [ 2306 | 0.4444444444444444, 2307 | "#bd3786" 2308 | ], 2309 | [ 2310 | 0.5555555555555556, 2311 | "#d8576b" 2312 | ], 2313 | [ 2314 | 0.6666666666666666, 2315 | "#ed7953" 2316 | ], 2317 | [ 2318 | 0.7777777777777778, 2319 | "#fb9f3a" 2320 | ], 2321 | [ 2322 | 0.8888888888888888, 2323 | "#fdca26" 2324 | ], 2325 | [ 2326 | 1, 2327 | "#f0f921" 2328 | ] 2329 | ], 2330 | "type": "histogram2d" 2331 | } 2332 | ], 2333 | "histogram2dcontour": [ 2334 | { 2335 | "colorbar": { 2336 | "outlinewidth": 0, 2337 | "ticks": "" 2338 | }, 2339 | "colorscale": [ 2340 | [ 2341 | 0, 2342 | "#0d0887" 2343 | ], 2344 | [ 2345 | 0.1111111111111111, 2346 | "#46039f" 2347 | ], 2348 | [ 2349 | 0.2222222222222222, 2350 | "#7201a8" 2351 | ], 2352 | [ 2353 | 0.3333333333333333, 2354 | "#9c179e" 2355 | ], 2356 | [ 2357 | 0.4444444444444444, 2358 | "#bd3786" 2359 | ], 2360 | [ 2361 | 0.5555555555555556, 2362 | "#d8576b" 2363 | ], 2364 | [ 2365 | 0.6666666666666666, 2366 | "#ed7953" 2367 | ], 2368 | [ 2369 | 0.7777777777777778, 2370 | "#fb9f3a" 2371 | ], 2372 | [ 2373 | 0.8888888888888888, 2374 | "#fdca26" 2375 | ], 2376 | [ 2377 | 1, 2378 | "#f0f921" 2379 | ] 2380 | ], 2381 | "type": "histogram2dcontour" 2382 | } 2383 | ], 2384 | "mesh3d": [ 2385 | { 2386 | "colorbar": { 2387 | "outlinewidth": 0, 2388 | "ticks": "" 2389 | }, 2390 | "type": "mesh3d" 2391 | } 2392 | ], 2393 | "parcoords": [ 2394 | { 2395 | "line": { 2396 | "colorbar": { 2397 | "outlinewidth": 0, 2398 | "ticks": "" 2399 | } 2400 | }, 2401 | "type": "parcoords" 2402 | } 2403 | ], 2404 | "pie": [ 2405 | { 2406 | "automargin": true, 2407 | "type": "pie" 2408 | } 2409 | ], 2410 | "scatter": [ 2411 | { 2412 | "fillpattern": { 2413 | "fillmode": 
"overlay", 2414 | "size": 10, 2415 | "solidity": 0.2 2416 | }, 2417 | "type": "scatter" 2418 | } 2419 | ], 2420 | "scatter3d": [ 2421 | { 2422 | "line": { 2423 | "colorbar": { 2424 | "outlinewidth": 0, 2425 | "ticks": "" 2426 | } 2427 | }, 2428 | "marker": { 2429 | "colorbar": { 2430 | "outlinewidth": 0, 2431 | "ticks": "" 2432 | } 2433 | }, 2434 | "type": "scatter3d" 2435 | } 2436 | ], 2437 | "scattercarpet": [ 2438 | { 2439 | "marker": { 2440 | "colorbar": { 2441 | "outlinewidth": 0, 2442 | "ticks": "" 2443 | } 2444 | }, 2445 | "type": "scattercarpet" 2446 | } 2447 | ], 2448 | "scattergeo": [ 2449 | { 2450 | "marker": { 2451 | "colorbar": { 2452 | "outlinewidth": 0, 2453 | "ticks": "" 2454 | } 2455 | }, 2456 | "type": "scattergeo" 2457 | } 2458 | ], 2459 | "scattergl": [ 2460 | { 2461 | "marker": { 2462 | "colorbar": { 2463 | "outlinewidth": 0, 2464 | "ticks": "" 2465 | } 2466 | }, 2467 | "type": "scattergl" 2468 | } 2469 | ], 2470 | "scattermapbox": [ 2471 | { 2472 | "marker": { 2473 | "colorbar": { 2474 | "outlinewidth": 0, 2475 | "ticks": "" 2476 | } 2477 | }, 2478 | "type": "scattermapbox" 2479 | } 2480 | ], 2481 | "scatterpolar": [ 2482 | { 2483 | "marker": { 2484 | "colorbar": { 2485 | "outlinewidth": 0, 2486 | "ticks": "" 2487 | } 2488 | }, 2489 | "type": "scatterpolar" 2490 | } 2491 | ], 2492 | "scatterpolargl": [ 2493 | { 2494 | "marker": { 2495 | "colorbar": { 2496 | "outlinewidth": 0, 2497 | "ticks": "" 2498 | } 2499 | }, 2500 | "type": "scatterpolargl" 2501 | } 2502 | ], 2503 | "scatterternary": [ 2504 | { 2505 | "marker": { 2506 | "colorbar": { 2507 | "outlinewidth": 0, 2508 | "ticks": "" 2509 | } 2510 | }, 2511 | "type": "scatterternary" 2512 | } 2513 | ], 2514 | "surface": [ 2515 | { 2516 | "colorbar": { 2517 | "outlinewidth": 0, 2518 | "ticks": "" 2519 | }, 2520 | "colorscale": [ 2521 | [ 2522 | 0, 2523 | "#0d0887" 2524 | ], 2525 | [ 2526 | 0.1111111111111111, 2527 | "#46039f" 2528 | ], 2529 | [ 2530 | 0.2222222222222222, 2531 | "#7201a8" 2532 | 
], 2533 | [ 2534 | 0.3333333333333333, 2535 | "#9c179e" 2536 | ], 2537 | [ 2538 | 0.4444444444444444, 2539 | "#bd3786" 2540 | ], 2541 | [ 2542 | 0.5555555555555556, 2543 | "#d8576b" 2544 | ], 2545 | [ 2546 | 0.6666666666666666, 2547 | "#ed7953" 2548 | ], 2549 | [ 2550 | 0.7777777777777778, 2551 | "#fb9f3a" 2552 | ], 2553 | [ 2554 | 0.8888888888888888, 2555 | "#fdca26" 2556 | ], 2557 | [ 2558 | 1, 2559 | "#f0f921" 2560 | ] 2561 | ], 2562 | "type": "surface" 2563 | } 2564 | ], 2565 | "table": [ 2566 | { 2567 | "cells": { 2568 | "fill": { 2569 | "color": "#EBF0F8" 2570 | }, 2571 | "line": { 2572 | "color": "white" 2573 | } 2574 | }, 2575 | "header": { 2576 | "fill": { 2577 | "color": "#C8D4E3" 2578 | }, 2579 | "line": { 2580 | "color": "white" 2581 | } 2582 | }, 2583 | "type": "table" 2584 | } 2585 | ] 2586 | }, 2587 | "layout": { 2588 | "annotationdefaults": { 2589 | "arrowcolor": "#2a3f5f", 2590 | "arrowhead": 0, 2591 | "arrowwidth": 1 2592 | }, 2593 | "autotypenumbers": "strict", 2594 | "coloraxis": { 2595 | "colorbar": { 2596 | "outlinewidth": 0, 2597 | "ticks": "" 2598 | } 2599 | }, 2600 | "colorscale": { 2601 | "diverging": [ 2602 | [ 2603 | 0, 2604 | "#8e0152" 2605 | ], 2606 | [ 2607 | 0.1, 2608 | "#c51b7d" 2609 | ], 2610 | [ 2611 | 0.2, 2612 | "#de77ae" 2613 | ], 2614 | [ 2615 | 0.3, 2616 | "#f1b6da" 2617 | ], 2618 | [ 2619 | 0.4, 2620 | "#fde0ef" 2621 | ], 2622 | [ 2623 | 0.5, 2624 | "#f7f7f7" 2625 | ], 2626 | [ 2627 | 0.6, 2628 | "#e6f5d0" 2629 | ], 2630 | [ 2631 | 0.7, 2632 | "#b8e186" 2633 | ], 2634 | [ 2635 | 0.8, 2636 | "#7fbc41" 2637 | ], 2638 | [ 2639 | 0.9, 2640 | "#4d9221" 2641 | ], 2642 | [ 2643 | 1, 2644 | "#276419" 2645 | ] 2646 | ], 2647 | "sequential": [ 2648 | [ 2649 | 0, 2650 | "#0d0887" 2651 | ], 2652 | [ 2653 | 0.1111111111111111, 2654 | "#46039f" 2655 | ], 2656 | [ 2657 | 0.2222222222222222, 2658 | "#7201a8" 2659 | ], 2660 | [ 2661 | 0.3333333333333333, 2662 | "#9c179e" 2663 | ], 2664 | [ 2665 | 0.4444444444444444, 2666 | "#bd3786" 2667 | 
], 2668 | [ 2669 | 0.5555555555555556, 2670 | "#d8576b" 2671 | ], 2672 | [ 2673 | 0.6666666666666666, 2674 | "#ed7953" 2675 | ], 2676 | [ 2677 | 0.7777777777777778, 2678 | "#fb9f3a" 2679 | ], 2680 | [ 2681 | 0.8888888888888888, 2682 | "#fdca26" 2683 | ], 2684 | [ 2685 | 1, 2686 | "#f0f921" 2687 | ] 2688 | ], 2689 | "sequentialminus": [ 2690 | [ 2691 | 0, 2692 | "#0d0887" 2693 | ], 2694 | [ 2695 | 0.1111111111111111, 2696 | "#46039f" 2697 | ], 2698 | [ 2699 | 0.2222222222222222, 2700 | "#7201a8" 2701 | ], 2702 | [ 2703 | 0.3333333333333333, 2704 | "#9c179e" 2705 | ], 2706 | [ 2707 | 0.4444444444444444, 2708 | "#bd3786" 2709 | ], 2710 | [ 2711 | 0.5555555555555556, 2712 | "#d8576b" 2713 | ], 2714 | [ 2715 | 0.6666666666666666, 2716 | "#ed7953" 2717 | ], 2718 | [ 2719 | 0.7777777777777778, 2720 | "#fb9f3a" 2721 | ], 2722 | [ 2723 | 0.8888888888888888, 2724 | "#fdca26" 2725 | ], 2726 | [ 2727 | 1, 2728 | "#f0f921" 2729 | ] 2730 | ] 2731 | }, 2732 | "colorway": [ 2733 | "#636efa", 2734 | "#EF553B", 2735 | "#00cc96", 2736 | "#ab63fa", 2737 | "#FFA15A", 2738 | "#19d3f3", 2739 | "#FF6692", 2740 | "#B6E880", 2741 | "#FF97FF", 2742 | "#FECB52" 2743 | ], 2744 | "font": { 2745 | "color": "#2a3f5f" 2746 | }, 2747 | "geo": { 2748 | "bgcolor": "white", 2749 | "lakecolor": "white", 2750 | "landcolor": "white", 2751 | "showlakes": true, 2752 | "showland": true, 2753 | "subunitcolor": "#C8D4E3" 2754 | }, 2755 | "hoverlabel": { 2756 | "align": "left" 2757 | }, 2758 | "hovermode": "closest", 2759 | "mapbox": { 2760 | "style": "light" 2761 | }, 2762 | "paper_bgcolor": "white", 2763 | "plot_bgcolor": "white", 2764 | "polar": { 2765 | "angularaxis": { 2766 | "gridcolor": "#EBF0F8", 2767 | "linecolor": "#EBF0F8", 2768 | "ticks": "" 2769 | }, 2770 | "bgcolor": "white", 2771 | "radialaxis": { 2772 | "gridcolor": "#EBF0F8", 2773 | "linecolor": "#EBF0F8", 2774 | "ticks": "" 2775 | } 2776 | }, 2777 | "scene": { 2778 | "xaxis": { 2779 | "backgroundcolor": "white", 2780 | "gridcolor": "#DFE8F3", 
2781 | "gridwidth": 2, 2782 | "linecolor": "#EBF0F8", 2783 | "showbackground": true, 2784 | "ticks": "", 2785 | "zerolinecolor": "#EBF0F8" 2786 | }, 2787 | "yaxis": { 2788 | "backgroundcolor": "white", 2789 | "gridcolor": "#DFE8F3", 2790 | "gridwidth": 2, 2791 | "linecolor": "#EBF0F8", 2792 | "showbackground": true, 2793 | "ticks": "", 2794 | "zerolinecolor": "#EBF0F8" 2795 | }, 2796 | "zaxis": { 2797 | "backgroundcolor": "white", 2798 | "gridcolor": "#DFE8F3", 2799 | "gridwidth": 2, 2800 | "linecolor": "#EBF0F8", 2801 | "showbackground": true, 2802 | "ticks": "", 2803 | "zerolinecolor": "#EBF0F8" 2804 | } 2805 | }, 2806 | "shapedefaults": { 2807 | "line": { 2808 | "color": "#2a3f5f" 2809 | } 2810 | }, 2811 | "ternary": { 2812 | "aaxis": { 2813 | "gridcolor": "#DFE8F3", 2814 | "linecolor": "#A2B1C6", 2815 | "ticks": "" 2816 | }, 2817 | "baxis": { 2818 | "gridcolor": "#DFE8F3", 2819 | "linecolor": "#A2B1C6", 2820 | "ticks": "" 2821 | }, 2822 | "bgcolor": "white", 2823 | "caxis": { 2824 | "gridcolor": "#DFE8F3", 2825 | "linecolor": "#A2B1C6", 2826 | "ticks": "" 2827 | } 2828 | }, 2829 | "title": { 2830 | "x": 0.05 2831 | }, 2832 | "xaxis": { 2833 | "automargin": true, 2834 | "gridcolor": "#EBF0F8", 2835 | "linecolor": "#EBF0F8", 2836 | "ticks": "", 2837 | "title": { 2838 | "standoff": 15 2839 | }, 2840 | "zerolinecolor": "#EBF0F8", 2841 | "zerolinewidth": 2 2842 | }, 2843 | "yaxis": { 2844 | "automargin": true, 2845 | "gridcolor": "#EBF0F8", 2846 | "linecolor": "#EBF0F8", 2847 | "ticks": "", 2848 | "title": { 2849 | "standoff": 15 2850 | }, 2851 | "zerolinecolor": "#EBF0F8", 2852 | "zerolinewidth": 2 2853 | } 2854 | } 2855 | }, 2856 | "title": { 2857 | "font": { 2858 | "color": "Black", 2859 | "size": 22 2860 | }, 2861 | "text": "Topic Word Scores", 2862 | "x": 0.5, 2863 | "xanchor": "center", 2864 | "yanchor": "top" 2865 | }, 2866 | "width": 920, 2867 | "xaxis": { 2868 | "anchor": "y", 2869 | "domain": [ 2870 | 0, 2871 | 0.175 2872 | ], 2873 | "showgrid": true 2874 
| }, 2875 | "xaxis10": { 2876 | "anchor": "y10", 2877 | "domain": [ 2878 | 0.275, 2879 | 0.45 2880 | ], 2881 | "showgrid": true 2882 | }, 2883 | "xaxis11": { 2884 | "anchor": "y11", 2885 | "domain": [ 2886 | 0.55, 2887 | 0.7250000000000001 2888 | ], 2889 | "showgrid": true 2890 | }, 2891 | "xaxis12": { 2892 | "anchor": "y12", 2893 | "domain": [ 2894 | 0.825, 2895 | 1 2896 | ], 2897 | "showgrid": true 2898 | }, 2899 | "xaxis2": { 2900 | "anchor": "y2", 2901 | "domain": [ 2902 | 0.275, 2903 | 0.45 2904 | ], 2905 | "showgrid": true 2906 | }, 2907 | "xaxis3": { 2908 | "anchor": "y3", 2909 | "domain": [ 2910 | 0.55, 2911 | 0.7250000000000001 2912 | ], 2913 | "showgrid": true 2914 | }, 2915 | "xaxis4": { 2916 | "anchor": "y4", 2917 | "domain": [ 2918 | 0.825, 2919 | 1 2920 | ], 2921 | "showgrid": true 2922 | }, 2923 | "xaxis5": { 2924 | "anchor": "y5", 2925 | "domain": [ 2926 | 0, 2927 | 0.175 2928 | ], 2929 | "showgrid": true 2930 | }, 2931 | "xaxis6": { 2932 | "anchor": "y6", 2933 | "domain": [ 2934 | 0.275, 2935 | 0.45 2936 | ], 2937 | "showgrid": true 2938 | }, 2939 | "xaxis7": { 2940 | "anchor": "y7", 2941 | "domain": [ 2942 | 0.55, 2943 | 0.7250000000000001 2944 | ], 2945 | "showgrid": true 2946 | }, 2947 | "xaxis8": { 2948 | "anchor": "y8", 2949 | "domain": [ 2950 | 0.825, 2951 | 1 2952 | ], 2953 | "showgrid": true 2954 | }, 2955 | "xaxis9": { 2956 | "anchor": "y9", 2957 | "domain": [ 2958 | 0, 2959 | 0.175 2960 | ], 2961 | "showgrid": true 2962 | }, 2963 | "yaxis": { 2964 | "anchor": "x", 2965 | "domain": [ 2966 | 0.7555555555555555, 2967 | 1 2968 | ], 2969 | "showgrid": true 2970 | }, 2971 | "yaxis10": { 2972 | "anchor": "x10", 2973 | "domain": [ 2974 | 0, 2975 | 0.24444444444444446 2976 | ], 2977 | "showgrid": true 2978 | }, 2979 | "yaxis11": { 2980 | "anchor": "x11", 2981 | "domain": [ 2982 | 0, 2983 | 0.24444444444444446 2984 | ], 2985 | "showgrid": true 2986 | }, 2987 | "yaxis12": { 2988 | "anchor": "x12", 2989 | "domain": [ 2990 | 0, 2991 | 
0.24444444444444446 2992 | ], 2993 | "showgrid": true 2994 | }, 2995 | "yaxis2": { 2996 | "anchor": "x2", 2997 | "domain": [ 2998 | 0.7555555555555555, 2999 | 1 3000 | ], 3001 | "showgrid": true 3002 | }, 3003 | "yaxis3": { 3004 | "anchor": "x3", 3005 | "domain": [ 3006 | 0.7555555555555555, 3007 | 1 3008 | ], 3009 | "showgrid": true 3010 | }, 3011 | "yaxis4": { 3012 | "anchor": "x4", 3013 | "domain": [ 3014 | 0.7555555555555555, 3015 | 1 3016 | ], 3017 | "showgrid": true 3018 | }, 3019 | "yaxis5": { 3020 | "anchor": "x5", 3021 | "domain": [ 3022 | 0.37777777777777777, 3023 | 0.6222222222222222 3024 | ], 3025 | "showgrid": true 3026 | }, 3027 | "yaxis6": { 3028 | "anchor": "x6", 3029 | "domain": [ 3030 | 0.37777777777777777, 3031 | 0.6222222222222222 3032 | ], 3033 | "showgrid": true 3034 | }, 3035 | "yaxis7": { 3036 | "anchor": "x7", 3037 | "domain": [ 3038 | 0.37777777777777777, 3039 | 0.6222222222222222 3040 | ], 3041 | "showgrid": true 3042 | }, 3043 | "yaxis8": { 3044 | "anchor": "x8", 3045 | "domain": [ 3046 | 0.37777777777777777, 3047 | 0.6222222222222222 3048 | ], 3049 | "showgrid": true 3050 | }, 3051 | "yaxis9": { 3052 | "anchor": "x9", 3053 | "domain": [ 3054 | 0, 3055 | 0.24444444444444446 3056 | ], 3057 | "showgrid": true 3058 | } 3059 | } 3060 | } 3061 | }, 3062 | "metadata": {}, 3063 | "output_type": "display_data" 3064 | } 3065 | ], 3066 | "source": [ 3067 | "# Bar chart of the top TF-IDF keywords for each topic\n", 3068 | "bar_fig = topic_model.visualize_barchart(\n", 3069 | " top_n_topics=12,\n", 3070 | " width=230,\n", 3071 | ")\n", 3072 | "bar_fig" 3073 | ] 3074 | }, 3075 | { 3076 | "cell_type": "code", 3077 | "execution_count": 22, 3078 | "metadata": {}, 3079 | "outputs": [ 3080 | { 3081 | "data": { 3082 | "application/vnd.plotly.v1+json": { 3083 | "config": { 3084 | "plotlyServerURL": "https://plot.ly" 3085 | }, 3086 | "data": [ 3087 | { 3088 | "customdata": [ 3089 | [ 3090 | 0, 3091 | "計畫 | 臨床試驗 | 中心 | 開發 | 生技", 3092 | 330 3093 | ], 3094 | [ 3095 | 1, 3096 | "智慧 | 服務 | 發展 
| 產業 | 應用", 3097 | 271 3098 | ], 3099 | [ 3100 | 2, 3101 | "中子 | 行政 | 頻道 | 實驗 | 訊號", 3102 | 155 3103 | ], 3104 | [ 3105 | 3, 3106 | "食品 | 食品產業 | 食品安全 | 食材 | 食媒性", 3107 | 89 3108 | ], 3109 | [ 3110 | 4, 3111 | "超高畫質 | 基礎 | 產業 | 技術 | 石化", 3112 | 79 3113 | ], 3114 | [ 3115 | 5, 3116 | "地下水 | 混凝土 | 水資源 | 氣候變遷 | 用水", 3117 | 69 3118 | ], 3119 | [ 3120 | 6, 3121 | "光源 | 光子源 | 設施 | 光束線 | 儀器", 3122 | 59 3123 | ], 3124 | [ 3125 | 7, 3126 | "人文 | 社會科學 | 社會創新 | 領域 | 學術", 3127 | 49 3128 | ], 3129 | [ 3130 | 8, 3131 | "車輛 | 模組 | 大型 | 系統 | 技術", 3132 | 48 3133 | ], 3134 | [ 3135 | 9, 3136 | "研發成果 | 山崩 | 山崩潛勢 | 耐震評估 | 活動斷層", 3137 | 44 3138 | ] 3139 | ], 3140 | "hovertemplate": "Topic %{customdata[0]}<br>%{customdata[1]}<br>Size: %{customdata[2]}", 3141 | "legendgroup": "", 3142 | "marker": { 3143 | "color": "#B0BEC5", 3144 | "line": { 3145 | "color": "DarkSlateGrey", 3146 | "width": 2 3147 | }, 3148 | "size": [ 3149 | 330, 3150 | 271, 3151 | 155, 3152 | 89, 3153 | 79, 3154 | 69, 3155 | 59, 3156 | 49, 3157 | 48, 3158 | 44 3159 | ], 3160 | "sizemode": "area", 3161 | "sizeref": 0.20625, 3162 | "symbol": "circle" 3163 | }, 3164 | "mode": "markers", 3165 | "name": "", 3166 | "orientation": "v", 3167 | "showlegend": false, 3168 | "type": "scatter", 3169 | "x": [ 3170 | 17.96601104736328, 3171 | -0.20142881572246552, 3172 | 17.805370330810547, 3173 | 18.284164428710938, 3174 | -0.6609519124031067, 3175 | -1.7333014011383057, 3176 | -0.5467479825019836, 3177 | 18.24008560180664, 3178 | -1.2730786800384521, 3179 | -1.2586079835891724 3180 | ], 3181 | "xaxis": "x", 3182 | "y": [ 3183 | 12.663885116577148, 3184 | 14.88134479522705, 3185 | 11.765763282775879, 3186 | 13.034785270690918, 3187 | 14.63888931274414, 3188 | 14.921974182128906, 3189 | 14.147262573242188, 3190 | 12.08366870880127, 3191 | 15.501675605773926, 3192 | 15.041316032409668 3193 | ], 3194 | "yaxis": "y" 3195 | } 3196 | ], 3197 | "layout": { 3198 | "annotations": [ 3199 | { 3200 | "showarrow": false, 3201 | "text": "D1", 3202 | "x": -1.9932966113090516, 3203 | "y": 13.913912868499756, 3204 | "yshift": 10 3205 | }, 3206 | { 3207 | "showarrow": false, 3208 | "text": "D2", 3209 | "x": 9.516746240854262, 3210 | "xshift": 10, 3211 | "y": 17.826926946640015 3212 | } 3213 | ], 3214 | "height": 650, 3215 | "hoverlabel": { 3216 | "bgcolor": "white", 3217 | "font": { 3218 | "family": "Rockwell", 3219 | "size": 16 3220 | } 3221 | }, 3222 | "legend": { 3223 | "itemsizing": "constant", 3224 | "tracegroupgap": 0 3225 | }, 3226 | "margin": { 3227 | "t": 60 3228 | }, 3229 | "shapes": [ 3230 | { 3231 | "line": { 3232 | "color": "#CFD8DC", 3233 | "width": 2 3234 | }, 3235 | "type": "line", 3236 | "x0": 9.516746240854262, 3237 | "x1": 
9.516746240854262, 3238 | "y0": 10.000898790359496, 3239 | "y1": 17.826926946640015 3240 | }, 3241 | { 3242 | "line": { 3243 | "color": "#9E9E9E", 3244 | "width": 2 3245 | }, 3246 | "type": "line", 3247 | "x0": -1.9932966113090516, 3248 | "x1": 21.026789093017577, 3249 | "y0": 13.913912868499756, 3250 | "y1": 13.913912868499756 3251 | } 3252 | ], 3253 | "sliders": [ 3254 | { 3255 | "active": 0, 3256 | "pad": { 3257 | "t": 50 3258 | }, 3259 | "steps": [ 3260 | { 3261 | "args": [ 3262 | { 3263 | "marker.color": [ 3264 | [ 3265 | "red", 3266 | "#B0BEC5", 3267 | "#B0BEC5", 3268 | "#B0BEC5", 3269 | "#B0BEC5", 3270 | "#B0BEC5", 3271 | "#B0BEC5", 3272 | "#B0BEC5", 3273 | "#B0BEC5", 3274 | "#B0BEC5" 3275 | ] 3276 | ] 3277 | } 3278 | ], 3279 | "label": "Topic 0", 3280 | "method": "update" 3281 | }, 3282 | { 3283 | "args": [ 3284 | { 3285 | "marker.color": [ 3286 | [ 3287 | "#B0BEC5", 3288 | "red", 3289 | "#B0BEC5", 3290 | "#B0BEC5", 3291 | "#B0BEC5", 3292 | "#B0BEC5", 3293 | "#B0BEC5", 3294 | "#B0BEC5", 3295 | "#B0BEC5", 3296 | "#B0BEC5" 3297 | ] 3298 | ] 3299 | } 3300 | ], 3301 | "label": "Topic 1", 3302 | "method": "update" 3303 | }, 3304 | { 3305 | "args": [ 3306 | { 3307 | "marker.color": [ 3308 | [ 3309 | "#B0BEC5", 3310 | "#B0BEC5", 3311 | "red", 3312 | "#B0BEC5", 3313 | "#B0BEC5", 3314 | "#B0BEC5", 3315 | "#B0BEC5", 3316 | "#B0BEC5", 3317 | "#B0BEC5", 3318 | "#B0BEC5" 3319 | ] 3320 | ] 3321 | } 3322 | ], 3323 | "label": "Topic 2", 3324 | "method": "update" 3325 | }, 3326 | { 3327 | "args": [ 3328 | { 3329 | "marker.color": [ 3330 | [ 3331 | "#B0BEC5", 3332 | "#B0BEC5", 3333 | "#B0BEC5", 3334 | "red", 3335 | "#B0BEC5", 3336 | "#B0BEC5", 3337 | "#B0BEC5", 3338 | "#B0BEC5", 3339 | "#B0BEC5", 3340 | "#B0BEC5" 3341 | ] 3342 | ] 3343 | } 3344 | ], 3345 | "label": "Topic 3", 3346 | "method": "update" 3347 | }, 3348 | { 3349 | "args": [ 3350 | { 3351 | "marker.color": [ 3352 | [ 3353 | "#B0BEC5", 3354 | "#B0BEC5", 3355 | "#B0BEC5", 3356 | "#B0BEC5", 3357 | "red", 3358 | 
"#B0BEC5", 3359 | "#B0BEC5", 3360 | "#B0BEC5", 3361 | "#B0BEC5", 3362 | "#B0BEC5" 3363 | ] 3364 | ] 3365 | } 3366 | ], 3367 | "label": "Topic 4", 3368 | "method": "update" 3369 | }, 3370 | { 3371 | "args": [ 3372 | { 3373 | "marker.color": [ 3374 | [ 3375 | "#B0BEC5", 3376 | "#B0BEC5", 3377 | "#B0BEC5", 3378 | "#B0BEC5", 3379 | "#B0BEC5", 3380 | "red", 3381 | "#B0BEC5", 3382 | "#B0BEC5", 3383 | "#B0BEC5", 3384 | "#B0BEC5" 3385 | ] 3386 | ] 3387 | } 3388 | ], 3389 | "label": "Topic 5", 3390 | "method": "update" 3391 | }, 3392 | { 3393 | "args": [ 3394 | { 3395 | "marker.color": [ 3396 | [ 3397 | "#B0BEC5", 3398 | "#B0BEC5", 3399 | "#B0BEC5", 3400 | "#B0BEC5", 3401 | "#B0BEC5", 3402 | "#B0BEC5", 3403 | "red", 3404 | "#B0BEC5", 3405 | "#B0BEC5", 3406 | "#B0BEC5" 3407 | ] 3408 | ] 3409 | } 3410 | ], 3411 | "label": "Topic 6", 3412 | "method": "update" 3413 | }, 3414 | { 3415 | "args": [ 3416 | { 3417 | "marker.color": [ 3418 | [ 3419 | "#B0BEC5", 3420 | "#B0BEC5", 3421 | "#B0BEC5", 3422 | "#B0BEC5", 3423 | "#B0BEC5", 3424 | "#B0BEC5", 3425 | "#B0BEC5", 3426 | "red", 3427 | "#B0BEC5", 3428 | "#B0BEC5" 3429 | ] 3430 | ] 3431 | } 3432 | ], 3433 | "label": "Topic 7", 3434 | "method": "update" 3435 | }, 3436 | { 3437 | "args": [ 3438 | { 3439 | "marker.color": [ 3440 | [ 3441 | "#B0BEC5", 3442 | "#B0BEC5", 3443 | "#B0BEC5", 3444 | "#B0BEC5", 3445 | "#B0BEC5", 3446 | "#B0BEC5", 3447 | "#B0BEC5", 3448 | "#B0BEC5", 3449 | "red", 3450 | "#B0BEC5" 3451 | ] 3452 | ] 3453 | } 3454 | ], 3455 | "label": "Topic 8", 3456 | "method": "update" 3457 | }, 3458 | { 3459 | "args": [ 3460 | { 3461 | "marker.color": [ 3462 | [ 3463 | "#B0BEC5", 3464 | "#B0BEC5", 3465 | "#B0BEC5", 3466 | "#B0BEC5", 3467 | "#B0BEC5", 3468 | "#B0BEC5", 3469 | "#B0BEC5", 3470 | "#B0BEC5", 3471 | "#B0BEC5", 3472 | "red" 3473 | ] 3474 | ] 3475 | } 3476 | ], 3477 | "label": "Topic 9", 3478 | "method": "update" 3479 | } 3480 | ] 3481 | } 3482 | ], 3483 | "template": { 3484 | "data": { 3485 | "bar": [ 3486 | { 3487 | 
"error_x": { 3488 | "color": "rgb(36,36,36)" 3489 | }, 3490 | "error_y": { 3491 | "color": "rgb(36,36,36)" 3492 | }, 3493 | "marker": { 3494 | "line": { 3495 | "color": "white", 3496 | "width": 0.5 3497 | }, 3498 | "pattern": { 3499 | "fillmode": "overlay", 3500 | "size": 10, 3501 | "solidity": 0.2 3502 | } 3503 | }, 3504 | "type": "bar" 3505 | } 3506 | ], 3507 | "barpolar": [ 3508 | { 3509 | "marker": { 3510 | "line": { 3511 | "color": "white", 3512 | "width": 0.5 3513 | }, 3514 | "pattern": { 3515 | "fillmode": "overlay", 3516 | "size": 10, 3517 | "solidity": 0.2 3518 | } 3519 | }, 3520 | "type": "barpolar" 3521 | } 3522 | ], 3523 | "carpet": [ 3524 | { 3525 | "aaxis": { 3526 | "endlinecolor": "rgb(36,36,36)", 3527 | "gridcolor": "white", 3528 | "linecolor": "white", 3529 | "minorgridcolor": "white", 3530 | "startlinecolor": "rgb(36,36,36)" 3531 | }, 3532 | "baxis": { 3533 | "endlinecolor": "rgb(36,36,36)", 3534 | "gridcolor": "white", 3535 | "linecolor": "white", 3536 | "minorgridcolor": "white", 3537 | "startlinecolor": "rgb(36,36,36)" 3538 | }, 3539 | "type": "carpet" 3540 | } 3541 | ], 3542 | "choropleth": [ 3543 | { 3544 | "colorbar": { 3545 | "outlinewidth": 1, 3546 | "tickcolor": "rgb(36,36,36)", 3547 | "ticks": "outside" 3548 | }, 3549 | "type": "choropleth" 3550 | } 3551 | ], 3552 | "contour": [ 3553 | { 3554 | "colorbar": { 3555 | "outlinewidth": 1, 3556 | "tickcolor": "rgb(36,36,36)", 3557 | "ticks": "outside" 3558 | }, 3559 | "colorscale": [ 3560 | [ 3561 | 0, 3562 | "#440154" 3563 | ], 3564 | [ 3565 | 0.1111111111111111, 3566 | "#482878" 3567 | ], 3568 | [ 3569 | 0.2222222222222222, 3570 | "#3e4989" 3571 | ], 3572 | [ 3573 | 0.3333333333333333, 3574 | "#31688e" 3575 | ], 3576 | [ 3577 | 0.4444444444444444, 3578 | "#26828e" 3579 | ], 3580 | [ 3581 | 0.5555555555555556, 3582 | "#1f9e89" 3583 | ], 3584 | [ 3585 | 0.6666666666666666, 3586 | "#35b779" 3587 | ], 3588 | [ 3589 | 0.7777777777777778, 3590 | "#6ece58" 3591 | ], 3592 | [ 3593 | 
0.8888888888888888, 3594 | "#b5de2b" 3595 | ], 3596 | [ 3597 | 1, 3598 | "#fde725" 3599 | ] 3600 | ], 3601 | "type": "contour" 3602 | } 3603 | ], 3604 | "contourcarpet": [ 3605 | { 3606 | "colorbar": { 3607 | "outlinewidth": 1, 3608 | "tickcolor": "rgb(36,36,36)", 3609 | "ticks": "outside" 3610 | }, 3611 | "type": "contourcarpet" 3612 | } 3613 | ], 3614 | "heatmap": [ 3615 | { 3616 | "colorbar": { 3617 | "outlinewidth": 1, 3618 | "tickcolor": "rgb(36,36,36)", 3619 | "ticks": "outside" 3620 | }, 3621 | "colorscale": [ 3622 | [ 3623 | 0, 3624 | "#440154" 3625 | ], 3626 | [ 3627 | 0.1111111111111111, 3628 | "#482878" 3629 | ], 3630 | [ 3631 | 0.2222222222222222, 3632 | "#3e4989" 3633 | ], 3634 | [ 3635 | 0.3333333333333333, 3636 | "#31688e" 3637 | ], 3638 | [ 3639 | 0.4444444444444444, 3640 | "#26828e" 3641 | ], 3642 | [ 3643 | 0.5555555555555556, 3644 | "#1f9e89" 3645 | ], 3646 | [ 3647 | 0.6666666666666666, 3648 | "#35b779" 3649 | ], 3650 | [ 3651 | 0.7777777777777778, 3652 | "#6ece58" 3653 | ], 3654 | [ 3655 | 0.8888888888888888, 3656 | "#b5de2b" 3657 | ], 3658 | [ 3659 | 1, 3660 | "#fde725" 3661 | ] 3662 | ], 3663 | "type": "heatmap" 3664 | } 3665 | ], 3666 | "heatmapgl": [ 3667 | { 3668 | "colorbar": { 3669 | "outlinewidth": 1, 3670 | "tickcolor": "rgb(36,36,36)", 3671 | "ticks": "outside" 3672 | }, 3673 | "colorscale": [ 3674 | [ 3675 | 0, 3676 | "#440154" 3677 | ], 3678 | [ 3679 | 0.1111111111111111, 3680 | "#482878" 3681 | ], 3682 | [ 3683 | 0.2222222222222222, 3684 | "#3e4989" 3685 | ], 3686 | [ 3687 | 0.3333333333333333, 3688 | "#31688e" 3689 | ], 3690 | [ 3691 | 0.4444444444444444, 3692 | "#26828e" 3693 | ], 3694 | [ 3695 | 0.5555555555555556, 3696 | "#1f9e89" 3697 | ], 3698 | [ 3699 | 0.6666666666666666, 3700 | "#35b779" 3701 | ], 3702 | [ 3703 | 0.7777777777777778, 3704 | "#6ece58" 3705 | ], 3706 | [ 3707 | 0.8888888888888888, 3708 | "#b5de2b" 3709 | ], 3710 | [ 3711 | 1, 3712 | "#fde725" 3713 | ] 3714 | ], 3715 | "type": "heatmapgl" 3716 | } 3717 | ], 
3718 | "histogram": [ 3719 | { 3720 | "marker": { 3721 | "line": { 3722 | "color": "white", 3723 | "width": 0.6 3724 | } 3725 | }, 3726 | "type": "histogram" 3727 | } 3728 | ], 3729 | "histogram2d": [ 3730 | { 3731 | "colorbar": { 3732 | "outlinewidth": 1, 3733 | "tickcolor": "rgb(36,36,36)", 3734 | "ticks": "outside" 3735 | }, 3736 | "colorscale": [ 3737 | [ 3738 | 0, 3739 | "#440154" 3740 | ], 3741 | [ 3742 | 0.1111111111111111, 3743 | "#482878" 3744 | ], 3745 | [ 3746 | 0.2222222222222222, 3747 | "#3e4989" 3748 | ], 3749 | [ 3750 | 0.3333333333333333, 3751 | "#31688e" 3752 | ], 3753 | [ 3754 | 0.4444444444444444, 3755 | "#26828e" 3756 | ], 3757 | [ 3758 | 0.5555555555555556, 3759 | "#1f9e89" 3760 | ], 3761 | [ 3762 | 0.6666666666666666, 3763 | "#35b779" 3764 | ], 3765 | [ 3766 | 0.7777777777777778, 3767 | "#6ece58" 3768 | ], 3769 | [ 3770 | 0.8888888888888888, 3771 | "#b5de2b" 3772 | ], 3773 | [ 3774 | 1, 3775 | "#fde725" 3776 | ] 3777 | ], 3778 | "type": "histogram2d" 3779 | } 3780 | ], 3781 | "histogram2dcontour": [ 3782 | { 3783 | "colorbar": { 3784 | "outlinewidth": 1, 3785 | "tickcolor": "rgb(36,36,36)", 3786 | "ticks": "outside" 3787 | }, 3788 | "colorscale": [ 3789 | [ 3790 | 0, 3791 | "#440154" 3792 | ], 3793 | [ 3794 | 0.1111111111111111, 3795 | "#482878" 3796 | ], 3797 | [ 3798 | 0.2222222222222222, 3799 | "#3e4989" 3800 | ], 3801 | [ 3802 | 0.3333333333333333, 3803 | "#31688e" 3804 | ], 3805 | [ 3806 | 0.4444444444444444, 3807 | "#26828e" 3808 | ], 3809 | [ 3810 | 0.5555555555555556, 3811 | "#1f9e89" 3812 | ], 3813 | [ 3814 | 0.6666666666666666, 3815 | "#35b779" 3816 | ], 3817 | [ 3818 | 0.7777777777777778, 3819 | "#6ece58" 3820 | ], 3821 | [ 3822 | 0.8888888888888888, 3823 | "#b5de2b" 3824 | ], 3825 | [ 3826 | 1, 3827 | "#fde725" 3828 | ] 3829 | ], 3830 | "type": "histogram2dcontour" 3831 | } 3832 | ], 3833 | "mesh3d": [ 3834 | { 3835 | "colorbar": { 3836 | "outlinewidth": 1, 3837 | "tickcolor": "rgb(36,36,36)", 3838 | "ticks": "outside" 3839 | }, 
3840 | "type": "mesh3d" 3841 | } 3842 | ], 3843 | "parcoords": [ 3844 | { 3845 | "line": { 3846 | "colorbar": { 3847 | "outlinewidth": 1, 3848 | "tickcolor": "rgb(36,36,36)", 3849 | "ticks": "outside" 3850 | } 3851 | }, 3852 | "type": "parcoords" 3853 | } 3854 | ], 3855 | "pie": [ 3856 | { 3857 | "automargin": true, 3858 | "type": "pie" 3859 | } 3860 | ], 3861 | "scatter": [ 3862 | { 3863 | "fillpattern": { 3864 | "fillmode": "overlay", 3865 | "size": 10, 3866 | "solidity": 0.2 3867 | }, 3868 | "type": "scatter" 3869 | } 3870 | ], 3871 | "scatter3d": [ 3872 | { 3873 | "line": { 3874 | "colorbar": { 3875 | "outlinewidth": 1, 3876 | "tickcolor": "rgb(36,36,36)", 3877 | "ticks": "outside" 3878 | } 3879 | }, 3880 | "marker": { 3881 | "colorbar": { 3882 | "outlinewidth": 1, 3883 | "tickcolor": "rgb(36,36,36)", 3884 | "ticks": "outside" 3885 | } 3886 | }, 3887 | "type": "scatter3d" 3888 | } 3889 | ], 3890 | "scattercarpet": [ 3891 | { 3892 | "marker": { 3893 | "colorbar": { 3894 | "outlinewidth": 1, 3895 | "tickcolor": "rgb(36,36,36)", 3896 | "ticks": "outside" 3897 | } 3898 | }, 3899 | "type": "scattercarpet" 3900 | } 3901 | ], 3902 | "scattergeo": [ 3903 | { 3904 | "marker": { 3905 | "colorbar": { 3906 | "outlinewidth": 1, 3907 | "tickcolor": "rgb(36,36,36)", 3908 | "ticks": "outside" 3909 | } 3910 | }, 3911 | "type": "scattergeo" 3912 | } 3913 | ], 3914 | "scattergl": [ 3915 | { 3916 | "marker": { 3917 | "colorbar": { 3918 | "outlinewidth": 1, 3919 | "tickcolor": "rgb(36,36,36)", 3920 | "ticks": "outside" 3921 | } 3922 | }, 3923 | "type": "scattergl" 3924 | } 3925 | ], 3926 | "scattermapbox": [ 3927 | { 3928 | "marker": { 3929 | "colorbar": { 3930 | "outlinewidth": 1, 3931 | "tickcolor": "rgb(36,36,36)", 3932 | "ticks": "outside" 3933 | } 3934 | }, 3935 | "type": "scattermapbox" 3936 | } 3937 | ], 3938 | "scatterpolar": [ 3939 | { 3940 | "marker": { 3941 | "colorbar": { 3942 | "outlinewidth": 1, 3943 | "tickcolor": "rgb(36,36,36)", 3944 | "ticks": "outside" 3945 | } 
3946 | }, 3947 | "type": "scatterpolar" 3948 | } 3949 | ], 3950 | "scatterpolargl": [ 3951 | { 3952 | "marker": { 3953 | "colorbar": { 3954 | "outlinewidth": 1, 3955 | "tickcolor": "rgb(36,36,36)", 3956 | "ticks": "outside" 3957 | } 3958 | }, 3959 | "type": "scatterpolargl" 3960 | } 3961 | ], 3962 | "scatterternary": [ 3963 | { 3964 | "marker": { 3965 | "colorbar": { 3966 | "outlinewidth": 1, 3967 | "tickcolor": "rgb(36,36,36)", 3968 | "ticks": "outside" 3969 | } 3970 | }, 3971 | "type": "scatterternary" 3972 | } 3973 | ], 3974 | "surface": [ 3975 | { 3976 | "colorbar": { 3977 | "outlinewidth": 1, 3978 | "tickcolor": "rgb(36,36,36)", 3979 | "ticks": "outside" 3980 | }, 3981 | "colorscale": [ 3982 | [ 3983 | 0, 3984 | "#440154" 3985 | ], 3986 | [ 3987 | 0.1111111111111111, 3988 | "#482878" 3989 | ], 3990 | [ 3991 | 0.2222222222222222, 3992 | "#3e4989" 3993 | ], 3994 | [ 3995 | 0.3333333333333333, 3996 | "#31688e" 3997 | ], 3998 | [ 3999 | 0.4444444444444444, 4000 | "#26828e" 4001 | ], 4002 | [ 4003 | 0.5555555555555556, 4004 | "#1f9e89" 4005 | ], 4006 | [ 4007 | 0.6666666666666666, 4008 | "#35b779" 4009 | ], 4010 | [ 4011 | 0.7777777777777778, 4012 | "#6ece58" 4013 | ], 4014 | [ 4015 | 0.8888888888888888, 4016 | "#b5de2b" 4017 | ], 4018 | [ 4019 | 1, 4020 | "#fde725" 4021 | ] 4022 | ], 4023 | "type": "surface" 4024 | } 4025 | ], 4026 | "table": [ 4027 | { 4028 | "cells": { 4029 | "fill": { 4030 | "color": "rgb(237,237,237)" 4031 | }, 4032 | "line": { 4033 | "color": "white" 4034 | } 4035 | }, 4036 | "header": { 4037 | "fill": { 4038 | "color": "rgb(217,217,217)" 4039 | }, 4040 | "line": { 4041 | "color": "white" 4042 | } 4043 | }, 4044 | "type": "table" 4045 | } 4046 | ] 4047 | }, 4048 | "layout": { 4049 | "annotationdefaults": { 4050 | "arrowhead": 0, 4051 | "arrowwidth": 1 4052 | }, 4053 | "autotypenumbers": "strict", 4054 | "coloraxis": { 4055 | "colorbar": { 4056 | "outlinewidth": 1, 4057 | "tickcolor": "rgb(36,36,36)", 4058 | "ticks": "outside" 4059 | } 4060 | 
}, 4061 | "colorscale": { 4062 | "diverging": [ 4063 | [ 4064 | 0, 4065 | "rgb(103,0,31)" 4066 | ], 4067 | [ 4068 | 0.1, 4069 | "rgb(178,24,43)" 4070 | ], 4071 | [ 4072 | 0.2, 4073 | "rgb(214,96,77)" 4074 | ], 4075 | [ 4076 | 0.3, 4077 | "rgb(244,165,130)" 4078 | ], 4079 | [ 4080 | 0.4, 4081 | "rgb(253,219,199)" 4082 | ], 4083 | [ 4084 | 0.5, 4085 | "rgb(247,247,247)" 4086 | ], 4087 | [ 4088 | 0.6, 4089 | "rgb(209,229,240)" 4090 | ], 4091 | [ 4092 | 0.7, 4093 | "rgb(146,197,222)" 4094 | ], 4095 | [ 4096 | 0.8, 4097 | "rgb(67,147,195)" 4098 | ], 4099 | [ 4100 | 0.9, 4101 | "rgb(33,102,172)" 4102 | ], 4103 | [ 4104 | 1, 4105 | "rgb(5,48,97)" 4106 | ] 4107 | ], 4108 | "sequential": [ 4109 | [ 4110 | 0, 4111 | "#440154" 4112 | ], 4113 | [ 4114 | 0.1111111111111111, 4115 | "#482878" 4116 | ], 4117 | [ 4118 | 0.2222222222222222, 4119 | "#3e4989" 4120 | ], 4121 | [ 4122 | 0.3333333333333333, 4123 | "#31688e" 4124 | ], 4125 | [ 4126 | 0.4444444444444444, 4127 | "#26828e" 4128 | ], 4129 | [ 4130 | 0.5555555555555556, 4131 | "#1f9e89" 4132 | ], 4133 | [ 4134 | 0.6666666666666666, 4135 | "#35b779" 4136 | ], 4137 | [ 4138 | 0.7777777777777778, 4139 | "#6ece58" 4140 | ], 4141 | [ 4142 | 0.8888888888888888, 4143 | "#b5de2b" 4144 | ], 4145 | [ 4146 | 1, 4147 | "#fde725" 4148 | ] 4149 | ], 4150 | "sequentialminus": [ 4151 | [ 4152 | 0, 4153 | "#440154" 4154 | ], 4155 | [ 4156 | 0.1111111111111111, 4157 | "#482878" 4158 | ], 4159 | [ 4160 | 0.2222222222222222, 4161 | "#3e4989" 4162 | ], 4163 | [ 4164 | 0.3333333333333333, 4165 | "#31688e" 4166 | ], 4167 | [ 4168 | 0.4444444444444444, 4169 | "#26828e" 4170 | ], 4171 | [ 4172 | 0.5555555555555556, 4173 | "#1f9e89" 4174 | ], 4175 | [ 4176 | 0.6666666666666666, 4177 | "#35b779" 4178 | ], 4179 | [ 4180 | 0.7777777777777778, 4181 | "#6ece58" 4182 | ], 4183 | [ 4184 | 0.8888888888888888, 4185 | "#b5de2b" 4186 | ], 4187 | [ 4188 | 1, 4189 | "#fde725" 4190 | ] 4191 | ] 4192 | }, 4193 | "colorway": [ 4194 | "#1F77B4", 4195 | "#FF7F0E", 4196 
| "#2CA02C", 4197 | "#D62728", 4198 | "#9467BD", 4199 | "#8C564B", 4200 | "#E377C2", 4201 | "#7F7F7F", 4202 | "#BCBD22", 4203 | "#17BECF" 4204 | ], 4205 | "font": { 4206 | "color": "rgb(36,36,36)" 4207 | }, 4208 | "geo": { 4209 | "bgcolor": "white", 4210 | "lakecolor": "white", 4211 | "landcolor": "white", 4212 | "showlakes": true, 4213 | "showland": true, 4214 | "subunitcolor": "white" 4215 | }, 4216 | "hoverlabel": { 4217 | "align": "left" 4218 | }, 4219 | "hovermode": "closest", 4220 | "mapbox": { 4221 | "style": "light" 4222 | }, 4223 | "paper_bgcolor": "white", 4224 | "plot_bgcolor": "white", 4225 | "polar": { 4226 | "angularaxis": { 4227 | "gridcolor": "rgb(232,232,232)", 4228 | "linecolor": "rgb(36,36,36)", 4229 | "showgrid": false, 4230 | "showline": true, 4231 | "ticks": "outside" 4232 | }, 4233 | "bgcolor": "white", 4234 | "radialaxis": { 4235 | "gridcolor": "rgb(232,232,232)", 4236 | "linecolor": "rgb(36,36,36)", 4237 | "showgrid": false, 4238 | "showline": true, 4239 | "ticks": "outside" 4240 | } 4241 | }, 4242 | "scene": { 4243 | "xaxis": { 4244 | "backgroundcolor": "white", 4245 | "gridcolor": "rgb(232,232,232)", 4246 | "gridwidth": 2, 4247 | "linecolor": "rgb(36,36,36)", 4248 | "showbackground": true, 4249 | "showgrid": false, 4250 | "showline": true, 4251 | "ticks": "outside", 4252 | "zeroline": false, 4253 | "zerolinecolor": "rgb(36,36,36)" 4254 | }, 4255 | "yaxis": { 4256 | "backgroundcolor": "white", 4257 | "gridcolor": "rgb(232,232,232)", 4258 | "gridwidth": 2, 4259 | "linecolor": "rgb(36,36,36)", 4260 | "showbackground": true, 4261 | "showgrid": false, 4262 | "showline": true, 4263 | "ticks": "outside", 4264 | "zeroline": false, 4265 | "zerolinecolor": "rgb(36,36,36)" 4266 | }, 4267 | "zaxis": { 4268 | "backgroundcolor": "white", 4269 | "gridcolor": "rgb(232,232,232)", 4270 | "gridwidth": 2, 4271 | "linecolor": "rgb(36,36,36)", 4272 | "showbackground": true, 4273 | "showgrid": false, 4274 | "showline": true, 4275 | "ticks": "outside", 4276 | 
"zeroline": false, 4277 | "zerolinecolor": "rgb(36,36,36)" 4278 | } 4279 | }, 4280 | "shapedefaults": { 4281 | "fillcolor": "black", 4282 | "line": { 4283 | "width": 0 4284 | }, 4285 | "opacity": 0.3 4286 | }, 4287 | "ternary": { 4288 | "aaxis": { 4289 | "gridcolor": "rgb(232,232,232)", 4290 | "linecolor": "rgb(36,36,36)", 4291 | "showgrid": false, 4292 | "showline": true, 4293 | "ticks": "outside" 4294 | }, 4295 | "baxis": { 4296 | "gridcolor": "rgb(232,232,232)", 4297 | "linecolor": "rgb(36,36,36)", 4298 | "showgrid": false, 4299 | "showline": true, 4300 | "ticks": "outside" 4301 | }, 4302 | "bgcolor": "white", 4303 | "caxis": { 4304 | "gridcolor": "rgb(232,232,232)", 4305 | "linecolor": "rgb(36,36,36)", 4306 | "showgrid": false, 4307 | "showline": true, 4308 | "ticks": "outside" 4309 | } 4310 | }, 4311 | "title": { 4312 | "x": 0.05 4313 | }, 4314 | "xaxis": { 4315 | "automargin": true, 4316 | "gridcolor": "rgb(232,232,232)", 4317 | "linecolor": "rgb(36,36,36)", 4318 | "showgrid": false, 4319 | "showline": true, 4320 | "ticks": "outside", 4321 | "title": { 4322 | "standoff": 15 4323 | }, 4324 | "zeroline": false, 4325 | "zerolinecolor": "rgb(36,36,36)" 4326 | }, 4327 | "yaxis": { 4328 | "automargin": true, 4329 | "gridcolor": "rgb(232,232,232)", 4330 | "linecolor": "rgb(36,36,36)", 4331 | "showgrid": false, 4332 | "showline": true, 4333 | "ticks": "outside", 4334 | "title": { 4335 | "standoff": 15 4336 | }, 4337 | "zeroline": false, 4338 | "zerolinecolor": "rgb(36,36,36)" 4339 | } 4340 | } 4341 | }, 4342 | "title": { 4343 | "font": { 4344 | "color": "Black", 4345 | "size": 22 4346 | }, 4347 | "text": "Intertopic Distance Map", 4348 | "x": 0.5, 4349 | "xanchor": "center", 4350 | "y": 0.95, 4351 | "yanchor": "top" 4352 | }, 4353 | "width": 1000, 4354 | "xaxis": { 4355 | "anchor": "y", 4356 | "domain": [ 4357 | 0, 4358 | 1 4359 | ], 4360 | "range": [ 4361 | -1.9932966113090516, 4362 | 21.026789093017577 4363 | ], 4364 | "title": { 4365 | "text": "" 4366 | }, 4367 | 
"visible": false 4368 | }, 4369 | "yaxis": { 4370 | "anchor": "x", 4371 | "domain": [ 4372 | 0, 4373 | 1 4374 | ], 4375 | "range": [ 4376 | 10.000898790359496, 4377 | 17.826926946640015 4378 | ], 4379 | "title": { 4380 | "text": "" 4381 | }, 4382 | "visible": false 4383 | } 4384 | } 4385 | } 4386 | }, 4387 | "metadata": {}, 4388 | "output_type": "display_data" 4389 | } 4390 | ], 4391 | "source": [ 4392 | "# Intertopic distance map\n", 4393 | "topic_fig = topic_model.visualize_topics(\n", 4394 | " top_n_topics=10,\n", 4395 | " width=1000,\n", 4396 | ")\n", 4397 | "topic_fig" 4398 | ] 4399 | } 4400 | ], 4401 | "metadata": { 4402 | "interpreter": { 4403 | "hash": "1ae26572eda725713d76d5b5539aa0885fd8640fe2af809f71b79ee35a86cf8d" 4404 | }, 4405 | "kernelspec": { 4406 | "display_name": "Python 3.7.11 ('bertopic')", 4407 | "language": "python", 4408 | "name": "python3" 4409 | }, 4410 | "language_info": { 4411 | "codemirror_mode": { 4412 | "name": "ipython", 4413 | "version": 3 4414 | }, 4415 | "file_extension": ".py", 4416 | "mimetype": "text/x-python", 4417 | "name": "python", 4418 | "nbconvert_exporter": "python", 4419 | "pygments_lexer": "ipython3", 4420 | "version": "3.9.19" 4421 | }, 4422 | "orig_nbformat": 4 4423 | }, 4424 | "nbformat": 4, 4425 | "nbformat_minor": 2 4426 | } 4427 | -------------------------------------------------------------------------------- /imgs/bar_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Aidenzich/HelloBERTopic/13ec36b258a74c6eb05d3d69e6966896d60235e6/imgs/bar_plot.png -------------------------------------------------------------------------------- /imgs/intertopic_distance_map.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Aidenzich/HelloBERTopic/13ec36b258a74c6eb05d3d69e6966896d60235e6/imgs/intertopic_distance_map.png
--------------------------------------------------------------------------------
/imgs/topic_over_time.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aidenzich/HelloBERTopic/13ec36b258a74c6eb05d3d69e6966896d60235e6/imgs/topic_over_time.png
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import argparse
import os
import pickle

# Silence TensorFlow log messages; must be set before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import pandas as pd
from bertopic import BERTopic
from ckiptagger import construct_dictionary, WS
from halo import Halo
from termcolor import colored
from transformers import AutoModelForTokenClassification

from utils import EXPORT_PATH, set_up, DATA_PATH


def str2bool(value: str) -> bool:
    """Parse a command-line boolean.

    argparse's `type=bool` treats every non-empty string (including "False")
    as True, so the value is parsed explicitly here.
    """
    lowered = value.lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser(description="Hello BERTopics")
parser.add_argument(
    "--topic_num",
    type=int,
    default=10,
    help="Number of most frequent topics to include in the plots",
)
parser.add_argument(
    "--keyword_file",
    type=str,
    default="keys.txt",
    help="Name of the keyword file to read",
)
parser.add_argument(
    "--model_name",
    type=str,
    default="ckiplab/bert-base-chinese-ws",
    help="Name of the Hugging Face pretrained model",
)
parser.add_argument(
    "--data_file",
    type=str,
    default="example_data.csv",
    help="Name of the data file to read",
)
parser.add_argument(
    "--word_sentence_cache",
    type=str2bool,
    default=False,
    help="Whether to load the cached word segmentation (falls back to the "
    "default flow when no cache exists)",
)


if __name__ == "__main__":
    args = parser.parse_args()

    set_up()

    # Hugging Face pretrained model and number of topics to plot
    MODEL_NAME = args.model_name
    top_n_topics = args.topic_num

    # Load the data
    df = pd.read_csv(DATA_PATH / args.data_file)
    # The "description" column of the raw data is used as the training text
    sentence_list = df["description"].tolist()
    # The "year" column of the data file is used as the timestamp
    timestamps = df.year.tolist()

    # Read the keywords that guide word segmentation
    keysfile = DATA_PATH / args.keyword_file
    with open(keysfile, encoding="utf-8") as file:
        lines = file.read().splitlines()

    ws_cache_path = EXPORT_PATH / "word_sentence_cache.pkl"
    if args.word_sentence_cache and ws_cache_path.is_file():
        spinner = Halo(text="Loading word segmentation cache...", spinner="dots")
        spinner.start()
        with open(ws_cache_path, "rb") as cache_file:
            word_sentence_list = pickle.load(cache_file)
        spinner.stop()

    else:
        # Load the CKIP word segmentation model
        ws = WS(str(DATA_PATH))

        # Build a user dictionary (helps the segmenter keep keywords intact)
        keydict = {keyword: 1 for keyword in lines}
        dictionary = construct_dictionary(keydict)

        spinner = Halo(text="Tokenizing with CKIP-Tagger", spinner="dots")
        spinner.start()

        # Run word segmentation
        word_sentence_list = ws(
            sentence_list,
            sentence_segmentation=True,
            segment_delimiter_set={",", "。", ":", "?", "!", ";"},
            recommend_dictionary=dictionary,  # apply the user dictionary
        )

        spinner.stop()

        # Cache the segmentation result for later runs
        with open(ws_cache_path, "wb") as cache_file:
            pickle.dump(word_sentence_list, cache_file)

    # Convert to the format BERTopic accepts: one space-joined string per document
    docs = [" ".join(words) for words in word_sentence_list]
    print(colored(f"BERTopics Input Showcase: \n [ { docs[0] }]", "blue"))

    spinner = Halo(text="Loading Hugging Face Pretrained Model", spinner="dots")
    spinner.start()

    # Load the Hugging Face pretrained model
    model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)
    spinner.stop()

    # Build BERTopic
    topic_model = BERTopic(
        language="chinese",
        embedding_model=model,
        verbose=True,
    )

    # Fit the model and assign a topic to each document
    topics, probs = topic_model.fit_transform(docs)
    # Compute the topic representations over time
    topics_over_time = topic_model.topics_over_time(docs, timestamps, nr_bins=20)

    # Bar chart of the TF-IDF keywords of each topic
    bar_fig = topic_model.visualize_barchart(
        top_n_topics=top_n_topics,
        width=230,
    )

    # Intertopic distance map
    topic_fig = topic_model.visualize_topics(
        top_n_topics=top_n_topics,
        width=1000,
    )

    # Topics over time
    tot_fig = topic_model.visualize_topics_over_time(
        topics_over_time, top_n_topics=top_n_topics, width=1000
    )

    # Save as HTML files for display in a front end
    bar_fig.write_html(EXPORT_PATH / "bar_fig.html")
    topic_fig.write_html(EXPORT_PATH / "topic_fig.html")
    tot_fig.write_html(EXPORT_PATH / "tot_fig.html")
    print("Done")

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
ckiptagger==0.2.1
tensorflow==2.13
bertopic==0.16.0
nbformat==5.10.4
pandas==2.2.1
jieba==0.42.1
halo==0.0.31
gdown==5.1.0
spacy==3.7.4
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
import random
from pathlib import Path

import jieba
import numpy as np
import torch
from ckiptagger import data_utils
from termcolor import colored

DATA_PATH = Path(__file__).parent / "data"
EXPORT_PATH = Path(__file__).parent / "export"


def set_seed(seed: int) -> None:
    """Seed every random number generator in use so runs are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True


def set_up() -> None:
    """Create the data and export directories, fetch CKIP data, and fix the seed."""
    DATA_PATH.mkdir(parents=True, exist_ok=True)
    EXPORT_PATH.mkdir(parents=True, exist_ok=True)
    ckip_check()
    set_seed(42)


def ckip_check() -> None:
    """Verify that the CKIP model files exist and download them if any are missing."""
    check_list = [
        "embedding_character",
        "embedding_word",
        "model_ner",
        "model_pos",
        "model_ws",
    ]

    all_present = True

    for name in check_list:
        data_exists = (DATA_PATH / name).exists()
        # Blue when the directory is present, red when it is missing
        print(colored(str(data_exists), "blue" if data_exists else "red"), name)
        if not data_exists:
            all_present = False

    if not all_present:
        print("CKIP data is missing; starting download...")
        # The CKIP archive is expected to unpack into a "data" directory, so it
        # is extracted next to DATA_PATH rather than into the working directory.
        data_utils.download_data_gdown(str(DATA_PATH.parent))
        print("CKIP Data download complete.")
        return

    print("CKIP Data validation complete.")


def clean_text(text: str) -> str:
    """Segment `text` with jieba and drop the words listed in stopword.txt."""
    with open(DATA_PATH / "stopword.txt", encoding="utf-8") as file:
        stopwords = set(file.read().split("\n"))
    words = jieba.lcut(text)
    words = [w for w in words if w not in stopwords]
    return " ".join(words)

--------------------------------------------------------------------------------