├── .gitignore
├── README.org
├── dictionary-overlay.el
├── dictionary-overlay.py
├── images
│   ├── 2022-11-15_21-23-58_screenshot.png
│   └── dictionary-overlay-face.png
├── logo.svg
├── pystardict.py
├── requirements.txt
└── resources
    ├── kdic-ec-11w.dict.dz
    ├── kdic-ec-11w.idx
    └── kdic-ec-11w.ifo

/.gitignore:
--------------------------------------------------------------------------------
*.elc
*.pyc
/.log/
node_modules/
dist/
tags
__pycache__
env/
--------------------------------------------------------------------------------

/README.org:
--------------------------------------------------------------------------------
#+title: Dictionary Overlay


#+html: <p><img src="logo.svg" alt="logo"/></p>
#+html: <h1>Dictionary Overlay</h1>

* Goals
Help Emacs users with limited English read English text. The package provides two capabilities:
1. Unknown-word hints: keep a custom "unknown-word list"; while you read an English article, overlays add Chinese translations to the words in that list.
2. Immersion reading (透析阅读法): keep a custom "known-word list"; while you read an English article, overlays translate every word in it that is not marked as known.

#+caption: Example
[[file:images/2022-11-15_21-23-58_screenshot.png]]

* Installation
** [[https://github.com/ginqi7/websocket-bridge][websocket-bridge]]
Used for websocket communication between Emacs and external applications.
** Python packages
The plugin is written in Python, so Python 3 is required.
- [[https://github.com/ginqi7/websocket-bridge-python][websocket-bridge-python]]: the Python client for websocket-bridge
- [[https://github.com/huggingface/tokenizers][tokenizers]]: Python tokenization library
- six: a dependency of pystardict.py
- [[https://github.com/jd-boyd/sexpdata][sexpdata]]: converts Python objects to S-expressions
- [[https://pypi.org/project/snowballstemmer/][snowballstemmer]]: stemming algorithms
- [[https://git.ookami.one/cgit/google-translate/][google-translate]]: web translation; optional, can be replaced by crow-translate
- [[https://pyobjc.readthedocs.io/en/latest/][pyobjc]]: optional; required on macOS if you want to use the system dictionary

You can run ~dictionary-overlay-install~ to install the required Python packages (google-translate and pyobjc are not included).

** Web translation
By default, translations come from the local sdcv dictionary. When a word is not found there, web translation is used. Currently supported:
1. [[https://crow-translate.github.io/][crow-translate]]
2. [[https://git.ookami.one/cgit/google-translate/][google-translate]]

You can install google-translate with ~dictionary-overlay-install-google-translate~.

** Download dictionary-overlay
#+begin_src shell
git clone --depth=1 -b main https://github.com/ginqi7/dictionary-overlay ~/.emacs.d/site-lisp/dictionary-overlay/
#+end_src

** Add the following configuration to ~/.emacs
#+begin_src emacs-lisp
(add-to-list 'load-path "~/.emacs.d/site-lisp/dictionary-overlay/")
(require 'dictionary-overlay)
#+end_src

* Commands
| Command | Description |
|---+---|
| dictionary-overlay-start | Start the dictionary-overlay application |
| dictionary-overlay-stop | Stop the dictionary-overlay application |
| dictionary-overlay-restart | Restart the dictionary-overlay application |
| dictionary-overlay-render-buffer | Render translations in the current buffer |
| dictionary-overlay-toggle | Turn translation rendering in the current buffer on or off |
| dictionary-overlay-lookup | Look up the word at point; uses Emacs's built-in dictionary by default (see Options to customize) |
| dictionary-overlay-jump-next-unknown-word | Jump to the next unknown word |
| dictionary-overlay-jump-prev-unknown-word | Jump to the previous unknown word |
| dictionary-overlay-jump-first-unknown-word | Jump to the first unknown word |
| dictionary-overlay-jump-last-unknown-word | Jump to the last unknown word |
| dictionary-overlay-jump-out-of-overlay | Move point to the end of the word, leaving the overlay and restoring the normal keymap |
| dictionary-overlay-mark-word-known | Mark the word at point as known |
| dictionary-overlay-mark-word-unknown | Mark the word at point as unknown |
| dictionary-overlay-mark-word-smart | In unknown-word mode, mark the word at point as unknown; in immersion mode, mark it as known |
| dictionary-overlay-mark-word-smart-reversely | Same as above, but marks the word as known in unknown-word mode and as unknown in immersion mode |
| dictionary-overlay-mark-buffer | Mark every word in the current buffer that is not marked as unknown as known |
| dictionary-overlay-mark-buffer-unknown | Mark every word in the current buffer that is not marked as known as unknown |
| dictionary-overlay-install | Install the required Python packages for dictionary-overlay |
| dictionary-overlay-install-google-translate | Install google-translate |
| dictionary-overlay-modify-translation | Modify the translation of the word at point; choose one from the dictionary or enter your own |

* Options

| Option | Description |
|---+---|
| dictionary-overlay-just-unknown-words | When t, use unknown-word mode; when nil, use immersion reading mode. Defaults to t |
| dictionary-overlay-user-data-directory | Directory for user data. Defaults to "~/.emacs.d/dictionary-overlay-data" |
| dictionary-overlay-position | Where to show the translation: after the word, or in help-echo. Defaults to after the word |
| dictionary-overlay-lookup-with | Dictionary used for lookups. Defaults to Emacs's built-in dictionary; can be set to a third-party package such as youdao-dictionary or popweb |
| dictionary-overlay-inhibit-keymap | When t, disable the keymap. Defaults to nil |
| dictionary-overlay-auto-jump-after | When to jump to the next unknown word automatically. Possible values: mark-word-known (after marking a word as known), mark-word-unknown (after marking a word as unknown), render-buffer (after refreshing) |
| dictionary-overlay-translation-format | Format used to display the translation. Defaults to "(%s)" |
| dictionary-overlay-translators | The translation engines to use and their order. Defaults to '("local" "sdcv" "darwin" "web"), meaning the local dictionary.json file, the bundled sdcv dictionary, the macOS system dictionary, and web translation. You can choose which engines to use and in what order. |
| dictionary-overlay-sdcv-dictionary-path | Defaults to nil, which uses the kdic-ec-11w dictionary bundled with dictionary-overlay. If you have your own StarDict dictionary, set its path here. |

*Note: before manually editing files under dictionary-overlay-user-data-directory, stop the dictionary-overlay application (run dictionary-overlay-stop); otherwise your changes may be overwritten by the application.*



** Faces

| Option | Description |
|---+---|
| dictionary-overlay-unknownword | Face for unknown words. Defaults to nil; customize it as you like |
| dictionary-overlay-translation | Face for the translations of unknown words. Defaults to nil; customize it as you like |

These faces control how unknown words are displayed. To avoid distracting you while reading, they are empty by default and leave the original faces untouched. If you want unknown words to stand out, you can use something like:

#+begin_src emacs-lisp
(defface dictionary-overlay-translation
  '((((class color) (min-colors 88) (background light))
     :underline "#fb8c96" :background "#fbd8db")
    (((class color) (min-colors 88) (background dark))
     :underline "#C77577" :background "#7A696B")
    (t
     :inherit highlight))
  "Face for dictionary-overlay translations.")
#+end_src

If you do not customize the face ~dictionary-overlay-unknownword~, the word itself is not decorated and only the translation overlay is shown. The advantage is that point still moves through the word letter by letter rather than jumping over the overlay.

Recommended faces:
#+begin_src emacs-lisp
(copy-face 'font-lock-keyword-face 'dictionary-overlay-unknownword)
(copy-face 'font-lock-comment-face 'dictionary-overlay-translation)
#+end_src

#+caption: dictionary-overlay with face
[[file:images/dictionary-overlay-face.png]]

* Keybindings
When ~dictionary-overlay-inhibit-keymap~ is nil (the default), several built-in keybindings are available while point is on an unknown word's overlay:

| Key | Command | Description |
|---+---+---|
| d | dictionary-overlay-lookup | Look up the word at point |
| r | dictionary-overlay-refresh-buffer | Refresh the buffer |
| p | dictionary-overlay-jump-prev-unknown-word | Jump to the previous unknown word |
| n | dictionary-overlay-jump-next-unknown-word | Jump to the next unknown word |
| < | dictionary-overlay-jump-first-unknown-word | Jump to the first unknown word |
| > | dictionary-overlay-jump-last-unknown-word | Jump to the last unknown word |
| m | dictionary-overlay-mark-word-smart | In immersion mode, mark the word as known |
| M | dictionary-overlay-mark-word-smart-reversely | In unknown-word mode, mark the word as known |
| c | dictionary-overlay-modify-translation | Modify the translation |
| | dictionary-overlay-jump-out-of-overlay | Jump out of the overlay so the keybindings stop applying outside overlay words |

The keybindings only take effect on unknown-word overlays, so you need to bind a key for ~dictionary-overlay-mark-word-unknown~ yourself.

* Usage tips

Unknown-word mode is the default. While reading English text, you add unknown words manually (~dictionary-overlay-mark-word-unknown~); it is convenient to bind this next to your word-lookup key. The next time you run into the word, its translation is shown automatically.

When you start reading an article, you can mark every word in the current buffer that is not marked as known as unknown (~dictionary-overlay-mark-buffer-unknown~).

When you finish reading an article, you can mark every word in the current buffer that is not marked as unknown as known (~dictionary-overlay-mark-buffer~).

When an unknown word keeps showing up and you feel you now know it, mark it as known (~dictionary-overlay-mark-word-known~) so its translation is no longer shown.

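The marking commands come down to two word sets and a stemmer. The sketch below is illustrative only, not the package's code: it mirrors the set logic of ~mark_word_known~, ~mark_word_unknown~ and ~new_word_p~ in dictionary-overlay.py, but assumes a hypothetical ~toy_stem~ helper in place of snowballstemmer and plain ~str.split~ in place of the tokenizers pre-tokenizer, so that it runs with the standard library alone.

#+begin_src python
import re


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for snowballstemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        # keep at least three characters so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def in_or_stem_in(word: str, words: set) -> bool:
    """True if the word itself or its stem is in the set."""
    return word in words or toy_stem(word) in words


def mark(word: str, add_to: set, remove_from: set) -> None:
    """Move a word and its stem from one set to the other."""
    stem = toy_stem(word)
    remove_from.discard(word)
    remove_from.discard(stem)
    add_to.update({word, stem})


def words_to_annotate(text: str, known: set, unknown: set,
                      just_unknown: bool) -> list:
    """Return the tokens that would get a translation overlay."""
    result = []
    for token in text.split():
        word = token.lower()
        if just_unknown:
            # unknown-word mode: annotate only words (or stems) marked unknown
            if in_or_stem_in(word, unknown):
                result.append(token)
        elif (len(word) >= 3 and not re.search(r"[^a-z]", word)
              and not in_or_stem_in(word, known)):
            # immersion mode: annotate alphabetic words of 3+ letters not known
            result.append(token)
    return result


known_words, unknown_words = set(), set()
mark("parsing", add_to=unknown_words, remove_from=known_words)
print(words_to_annotate("He parses the parsed files",
                        known_words, unknown_words, just_unknown=True))
# prints ['parses', 'parsed']
#+end_src

Because a word and its stem are stored together, marking "parsing" also covers "parses" and "parsed". With the real Snowball stemmer, the set of words that share a stem will differ from this toy version.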
Once you have read enough articles, you should have built up a fair number of known words. At that point you might try immersion reading (~(setq dictionary-overlay-just-unknown-words nil)~), which automatically shows translations for the words you "probably" don't know.

If you prefer minimal visual distraction, ~(setq dictionary-overlay-position 'help-echo)~ moves the translation into help-echo, so the definition appears only when the mouse hovers over the word. Note that the definitions shown are still quite terse, so this is not recommended for now; also, because there is no face by default, the face setting mentioned above, ~(copy-face 'font-lock-keyword-face 'dictionary-overlay-unknownword)~, is recommended.

* Features
- Uses snowballstemmer for stemming, so words that share a stem but differ in form are marked together.
- Translation editing, which lets users choose the appropriate sense of a word.
--------------------------------------------------------------------------------

/dictionary-overlay.el:
--------------------------------------------------------------------------------
;;; dictionary-overlay.el --- Add overlay for new English word -*- lexical-binding: t; -*-

;; Copyright (C) 2022  Qiqi Jin

;; Author: Qiqi Jin
;; Keywords: lisp

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see .

;;; Commentary:

;;

;;; Commands:
;;
;; Below is the complete command list:
;;
;;  `dictionary-overlay-start'
;;    Start dictionary-overlay.
;;    Keybinding: M-x dictionary-overlay-start
;;  `dictionary-overlay-stop'
;;    Stop dictionary-overlay.
;;    Keybinding: M-x dictionary-overlay-stop
;;  `dictionary-overlay-restart'
;;    Restart dictionary-overlay.
;;    Keybinding: M-x dictionary-overlay-restart
;;  `dictionary-overlay-render-buffer'
;;    Render current buffer.
;;    Keybinding: M-x dictionary-overlay-render-buffer
;;  `dictionary-overlay-toggle'
;;    Toggle current buffer.
;;    Keybinding: M-x dictionary-overlay-toggle
;;  `dictionary-overlay-refresh-buffer'
;;    Refresh current buffer.
;;    Keybinding: r
;;  `dictionary-overlay-jump-first-unknown-word'
;;    Jump to first unknown word.
;;    Keybinding: <
;;  `dictionary-overlay-jump-last-unknown-word'
;;    Jump to last unknown word.
;;    Keybinding: >
;;  `dictionary-overlay-jump-next-unknown-word'
;;    Jump to next unknown word.
;;    Keybinding: n
;;  `dictionary-overlay-jump-prev-unknown-word'
;;    Jump to previous unknown word.
;;    Keybinding: p
;;  `dictionary-overlay-jump-out-of-overlay'
;;    Jump out of the overlay so that its keymap no longer applies.
;;    Keybinding:
;;  `dictionary-overlay-mark-word-known'
;;    Mark current word known.
;;    Keybinding: M-x dictionary-overlay-mark-word-known
;;  `dictionary-overlay-mark-word-unknown'
;;    Mark current word unknown.
;;    Keybinding: M-x dictionary-overlay-mark-word-unknown
;;  `dictionary-overlay-mark-word-smart'
;;    Smartly mark current word as known or unknown.
;;    Keybinding: m
;;  `dictionary-overlay-mark-word-smart-reversely'
;;    Smartly mark current word as known or unknown, but reversely.
;;    Keybinding: M
;;  `dictionary-overlay-mark-buffer'
;;    Mark all words as known, except those in `unknownwords' list.
;;    Keybinding: M-x dictionary-overlay-mark-buffer
;;  `dictionary-overlay-mark-buffer-unknown'
;;    Mark all words as unknown, except those in `knownwords' list.
;;    Keybinding: M-x dictionary-overlay-mark-buffer-unknown
;;  `dictionary-overlay-lookup'
;;    Look up word in a third-party dictionary.
;;    Keybinding: d
;;  `dictionary-overlay-install'
;;    Install all python dependencies.
;;    Keybinding: M-x dictionary-overlay-install
;;  `dictionary-overlay-macos-install-core-services'
;;    Install pyobjc-framework-CoreServices for the macOS system dictionary.
;;    Keybinding: M-x dictionary-overlay-macos-install-core-services
;;  `dictionary-overlay-install-google-translate'
;;    Install all google-translate dependencies.
;;    Keybinding: M-x dictionary-overlay-install-google-translate
;;  `dictionary-overlay-modify-translation'
;;    Modify current word's translation.
;;    Keybinding: c
;;
;;; Customizable Options:
;;
;; Below is the list of customizable options:
;;
;;  `dictionary-overlay-just-unknown-words'
;;    If t, show overlay for words in unknownwords list.
;;    default = t
;;  `dictionary-overlay-position'
;;    Where to show translation.
;;    default = 'after
;;  `dictionary-overlay-user-data-directory'
;;    Directory where user data is stored.
;;    default = (locate-user-emacs-file "dictionary-overlay-data/")
;;  `dictionary-overlay-translation-format'
;;    Translation format.
;;    default = "(%s)"
;;  `dictionary-overlay-crow-engine'
;;    Crow translate engine.
;;    default = "google"
;;  `dictionary-overlay-inhibit-keymap'
;;    When non-nil, don't use `dictionary-overlay-map'.
;;    default = nil
;;  `dictionary-overlay-auto-jump-after'
;;    Auto jump to next unknown word.
;;    default = nil
;;  `dictionary-overlay-recenter-after-mark-and-jump'
;;    Recenter after mark or jump.
;;    default = nil
;;  `dictionary-overlay-lookup-with'
;;    Function used to look up the word at point.
;;    default = 'dictionary-lookup-definition
;;  `dictionary-overlay-translators'
;;    The translators and their order.
;;    default = '("local" "sdcv" "darwin" "web")
;;  `dictionary-overlay-sdcv-dictionary-path'
;;    User defined sdcv dictionary path.
;;    default = nil
;;  `dictionary-overlay-python'
;;    The Python interpreter.
;;    default = "python3"

;;; Code:

(require 'websocket-bridge)

(defgroup dictionary-overlay ()
  "Dictionary overlay for words in buffers."
  :group 'applications)

(defface dictionary-overlay-unknownword nil
  "Face for dictionary-overlay unknown words."
  :group 'dictionary-overlay)

(defface dictionary-overlay-translation nil
  "Face for dictionary-overlay translations."
  :group 'dictionary-overlay)

(defvar dictionary-overlay-py-path
  (concat (file-name-directory load-file-name)
          "dictionary-overlay.py"))

(defvar dictionary-overlay-py-requirements-path
  (concat (file-name-directory load-file-name) "requirements.txt"))

(defvar-local dictionary-overlay-active-p nil
  "Non-nil if dictionary-overlay is active in the current buffer.")

(defvar-local dictionary-overlay-hash-table nil
  "Hash table of overlays for dictionary-overlay.
The key's format is begin:end:word:translation.")

(defvar-local dictionary-overlay-hash-table-keys '()
  "List of all hash-table keys for dictionary-overlay.
The key's format is begin:end:word:translation.")

(defcustom dictionary-overlay-just-unknown-words t
  "If t, show overlay for words in unknownwords list.
If nil, show overlay for words not in knownwords list."
  :group 'dictionary-overlay
  :type '(boolean))

(defcustom dictionary-overlay-position 'after
  "Where to show translation.
If value is \\='after, put translation after word.
If value is \\='help-echo, show it when mouse over word."
  :group 'dictionary-overlay
  :type '(choice
          (const :tag "Show after word" after)
          (const :tag "Show in help-echo" help-echo)))

(defcustom dictionary-overlay-user-data-directory
  (locate-user-emacs-file "dictionary-overlay-data/")
  "Directory where dictionary-overlay stores user data."
  :group 'dictionary-overlay
  :type '(directory))

(defcustom dictionary-overlay-translation-format "(%s)"
  "Translation format."
  :group 'dictionary-overlay
  :type '(string))

(defcustom dictionary-overlay-crow-engine "google"
  "Crow translate engine."
  :group 'dictionary-overlay
  :type '(string))

;; SRC: ideas from `symbol-overlay', hat tip!
(defcustom dictionary-overlay-inhibit-keymap nil
  "When non-nil, don't use `dictionary-overlay-map'.
This is intended for buffers/modes that use the keymap text
property for their own purposes.  Because this package uses
overlays it would always override the text property keymaps
of such packages."
  :group 'dictionary-overlay
  :type '(boolean))

(defcustom dictionary-overlay-auto-jump-after '()
  "Auto jump to next unknown word.
The main purpose of auto jump is to keep the cursor within an overlay
to facilitate the usage of keymap.  For MARK-WORD-(UN)KNOWN,
usually jump to the next unknown word, but depends on direction
set by `dictionary-overlay-jump-direction'.  For RENDER-BUFFER, if
current cursor is within overlay, do nothing; otherwise move to
next overlay."
  :group 'dictionary-overlay
  :type '(repeat
          (choice
           (const :tag "Don't jump" nil)
           (const :tag "After mark word known" mark-word-known)
           (const :tag "After mark word unknown" mark-word-unknown)
           (const :tag "After refresh buffer" render-buffer))))

(defvar-local dictionary-overlay-jump-direction 'next
  "Direction to jump word.")

(defcustom dictionary-overlay-recenter-after-mark-and-jump nil
  "Recenter after mark or jump.
If an integer N, recenter so that point is on line N of the window."
  :group 'dictionary-overlay
  :type '(choice (const :tag "Do nothing" nil)
                 (integer :tag "Recenter to line")))

(defcustom dictionary-overlay-lookup-with 'dictionary-lookup-definition
  "Function used to look up the word at point."
  :group 'dictionary-overlay
  :type '(function))

(defcustom dictionary-overlay-translators '("local" "sdcv" "darwin" "web")
  "The translators and their order."
  :group 'dictionary-overlay
  :type '(repeat string))

(defcustom dictionary-overlay-sdcv-dictionary-path nil
  "User defined sdcv dictionary path."
  :group 'dictionary-overlay
  :type '(choice (const :tag "Use bundled dictionary" nil) string))

(defcustom dictionary-overlay-python (executable-find "python3")
  "The Python interpreter."
  :group 'dictionary-overlay
  :type 'string)

(defvar dictionary-overlay-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd "d") #'dictionary-overlay-lookup)
    (define-key map (kbd "r") #'dictionary-overlay-refresh-buffer)
    (define-key map (kbd "p") #'dictionary-overlay-jump-prev-unknown-word)
    (define-key map (kbd "n") #'dictionary-overlay-jump-next-unknown-word)
    (define-key map (kbd "<") #'dictionary-overlay-jump-first-unknown-word)
    (define-key map (kbd ">") #'dictionary-overlay-jump-last-unknown-word)
    (define-key map (kbd "m") #'dictionary-overlay-mark-word-smart)
    (define-key map (kbd "M") #'dictionary-overlay-mark-word-smart-reversely)
    (define-key map (kbd "c") #'dictionary-overlay-modify-translation)
    (define-key map (kbd "") #'dictionary-overlay-jump-out-of-overlay)
    map)
  "Keymap automatically activated inside overlays.
You can re-bind the commands to any keys you prefer.")

(defun dictionary-overlay-start ()
  "Start dictionary-overlay."
  (interactive)
  (websocket-bridge-server-start)
  (websocket-bridge-app-start
   "dictionary-overlay"
   dictionary-overlay-python
   dictionary-overlay-py-path))

(defun dictionary-overlay-stop ()
  "Stop dictionary-overlay."
  (interactive)
  (websocket-bridge-app-exit "dictionary-overlay"))

(defun dictionary-overlay-restart ()
  "Restart dictionary-overlay."
  (interactive)
  (dictionary-overlay-stop)
  (dictionary-overlay-start)
  ;; REVIEW: really need bring this buffer to front? or we place it at bottom?
  ;; (split-window-below -10)
  ;; (other-window 1)
  ;; (websocket-bridge-app-open-buffer "dictionary-overlay")
  )

(defun websocket-bridge-call-buffer (func-name)
  "Call dictionary-overlay function FUNC-NAME on the current buffer."
  (websocket-bridge-call "dictionary-overlay" func-name
                         (buffer-string)
                         (point)
                         (buffer-name)))

(defun websocket-bridge-call-word (func-name)
  "Call dictionary-overlay function FUNC-NAME on the current word."
  (let ((word (thing-at-point 'word t)))
    (if (not word)
        (message "No word at point.")
      (setq word (downcase word))
      (websocket-bridge-call "dictionary-overlay" func-name word)
      (message "%s" word))))

(defun dictionary-overlay-render-buffer ()
  "Render current buffer."
  (interactive)
  ;; Start the backend first if needed; it is not ready until it connects.
  (unless (member "dictionary-overlay" websocket-bridge-app-list)
    (dictionary-overlay-start))
  (if (not (dictionary-overlay-ready-p))
      (message "Dictionary-Overlay not ready, please wait a second.")
    (setq-local dictionary-overlay-active-p t)
    (dictionary-overlay-refresh-buffer)
    (when (member 'render-buffer dictionary-overlay-auto-jump-after)
      (websocket-bridge-call-buffer "jump_next_unknown_word"))))

(defun dictionary-overlay-toggle ()
  "Toggle current buffer."
  (interactive)
  (if dictionary-overlay-active-p
      (progn
        ;; reset all hash-table-keys and delete all overlays
        (setq-local dictionary-overlay-hash-table-keys '())
        (dictionary-overlay-refresh-overlays)
        (setq-local dictionary-overlay-active-p nil)
        (message "Dictionary overlay removed."))
    (dictionary-overlay-render-buffer)
    (message "Dictionary overlay rendered.")))

(defun dictionary-overlay-refresh-overlays ()
  "Refresh overlays: remove overlays and hash-table items when not needed."
  (maphash
   (lambda (key val)
     (unless (member key dictionary-overlay-hash-table-keys)
       (remhash key dictionary-overlay-hash-table)
       (delete-overlay val)))
   dictionary-overlay-hash-table))

(defun dictionary-overlay-refresh-buffer ()
  "Refresh current buffer."
  (interactive)
  (when dictionary-overlay-active-p
    (unless dictionary-overlay-hash-table
      (setq-local dictionary-overlay-hash-table (make-hash-table :test 'equal)))
    (setq-local dictionary-overlay-hash-table-keys '())
    (websocket-bridge-call-buffer "render")))

(defun dictionary-overlay-first-unknown-word-pos ()
  "Return the beginning position of the first unknown word.
NOTE: Retrieval of word pos relies on `dictionary-overlay-hash-table-keys',
so currently won't work for auto jump after render buffer.  The same
applies to `dictionary-overlay-last-unknown-word-pos'."
  (string-to-number
   (car (split-string
         (car (last dictionary-overlay-hash-table-keys)) ":"))))

(defun dictionary-overlay-cursor-before-first-unknown-word-p ()
  "Whether point is at or before the beginning of the first unknown word."
  (<= (point) (dictionary-overlay-first-unknown-word-pos)))

(defun dictionary-overlay-last-unknown-word-pos ()
  "Return the beginning position of the last unknown word."
  (string-to-number
   (car (split-string
         (car dictionary-overlay-hash-table-keys) ":"))))

(defun dictionary-overlay-cursor-after-last-unknown-word-p ()
  "Whether point is at or after the beginning of the last unknown word."
  (>= (point) (dictionary-overlay-last-unknown-word-pos)))

(defun dictionary-overlay-jump-first-unknown-word ()
  "Jump to first unknown word."
  (interactive)
  (goto-char (dictionary-overlay-first-unknown-word-pos)))

(defun dictionary-overlay-jump-last-unknown-word ()
  "Jump to last unknown word."
  (interactive)
  (goto-char (dictionary-overlay-last-unknown-word-pos)))

(defun dictionary-overlay-jump-next-unknown-word ()
  "Jump to next unknown word."
  (interactive)
  (if (dictionary-overlay-cursor-after-last-unknown-word-p)
      (dictionary-overlay-jump-first-unknown-word)
    (websocket-bridge-call-buffer "jump_next_unknown_word"))
  (setq-local dictionary-overlay-jump-direction 'next)
  (when dictionary-overlay-recenter-after-mark-and-jump
    (recenter dictionary-overlay-recenter-after-mark-and-jump)))

(defun dictionary-overlay-jump-prev-unknown-word ()
  "Jump to previous unknown word."
  (interactive)
  (if (dictionary-overlay-cursor-before-first-unknown-word-p)
      (dictionary-overlay-jump-last-unknown-word)
    (websocket-bridge-call-buffer "jump_prev_unknown_word"))
  (setq-local dictionary-overlay-jump-direction 'prev)
  (when dictionary-overlay-recenter-after-mark-and-jump
    (recenter dictionary-overlay-recenter-after-mark-and-jump)))

(defun dictionary-overlay-jump-out-of-overlay ()
  "Jump out of the overlay so that its keymap no longer applies.
Usually an overlay keymap has a higher priority than other major and
minor mode keymaps.  Jumping out of the overlay facilitates the usage
of the original mode keymap.  Since overlays are everywhere, don't
expect it to work consistently, but usually it does a decent job."
  (interactive)
  (forward-word))

(defun dictionary-overlay-mark-word-known ()
  "Mark current word known."
  (interactive)
  (websocket-bridge-call-word "mark_word_known")
  (when (member 'mark-word-known dictionary-overlay-auto-jump-after)
    (pcase dictionary-overlay-jump-direction
      (`next (dictionary-overlay-jump-next-unknown-word))
      (`prev (dictionary-overlay-jump-prev-unknown-word))))
  (dictionary-overlay-refresh-buffer)
  (when dictionary-overlay-recenter-after-mark-and-jump
    (recenter dictionary-overlay-recenter-after-mark-and-jump)))

(defun dictionary-overlay-mark-word-unknown ()
  "Mark current word unknown."
  (interactive)
  (websocket-bridge-call-word "mark_word_unknown")
  (when (member 'mark-word-unknown dictionary-overlay-auto-jump-after)
    (pcase dictionary-overlay-jump-direction
      (`next (dictionary-overlay-jump-next-unknown-word))
      (`prev (dictionary-overlay-jump-prev-unknown-word))))
  (dictionary-overlay-refresh-buffer)
  (when dictionary-overlay-recenter-after-mark-and-jump
    (recenter dictionary-overlay-recenter-after-mark-and-jump)))

(defun dictionary-overlay-mark-word-smart ()
  "Smartly mark current word as known or unknown.
Based on value of `dictionary-overlay-just-unknown-words'.
Usually when value is t, we want to mark word as unknown, and vice versa.
If you need reverse behavior, use
`dictionary-overlay-mark-word-smart-reversely' instead."
  (interactive)
  (if dictionary-overlay-just-unknown-words
      (dictionary-overlay-mark-word-unknown)
    (dictionary-overlay-mark-word-known)))

(defun dictionary-overlay-mark-word-smart-reversely ()
  "Smartly mark current word as known or unknown, but reversely.
Based on value of `dictionary-overlay-just-unknown-words'."
  (interactive)
  (if dictionary-overlay-just-unknown-words
      (dictionary-overlay-mark-word-known)
    (dictionary-overlay-mark-word-unknown)))

(defun dictionary-overlay-mark-buffer ()
  "Mark all words as known, except those in `unknownwords' list."
  (interactive)
  (when (y-or-n-p
         "Mark all as KNOWN, EXCEPT those in unknownwords list?")
    (websocket-bridge-call-buffer "mark_buffer")
    (dictionary-overlay-refresh-buffer)))

(defun dictionary-overlay-mark-buffer-unknown ()
  "Mark all words as unknown, except those in `knownwords' list."
  (interactive)
  (when (y-or-n-p
         "Mark all as UNKNOWN, EXCEPT those in knownwords list?")
    (websocket-bridge-call-buffer "mark_buffer_unknown")
    (dictionary-overlay-refresh-buffer)))

(defun dictionary-overlay-lookup ()
  "Look up word in a third-party dictionary.
NOTE: third-party dictionaries have their own implementation of
getting words.  Probably the word will be the same as the one
dictionary-overlay gets."
  (interactive)
  (funcall dictionary-overlay-lookup-with))

(defun dictionary-add-overlay-from (begin end source target buffer-name)
  "Add overlay for SOURCE and TARGET from BEGIN to END in BUFFER-NAME."
  (when (get-buffer buffer-name)
    (with-current-buffer buffer-name
      (let ((hash-table-key
             (format "%s:%s:%s:%s" begin end source target)))
        ;; record the overlay's key
        (add-to-list 'dictionary-overlay-hash-table-keys hash-table-key)
        (unless (gethash hash-table-key dictionary-overlay-hash-table)
          ;; create an overlay only when the key does not exist yet,
          ;; so re-rendering does not leave orphaned overlays behind
          (let ((ov (make-overlay begin end)))
            (overlay-put ov 'face 'dictionary-overlay-unknownword)
            (overlay-put ov 'evaporate t)
            (unless dictionary-overlay-inhibit-keymap
              (overlay-put ov 'keymap dictionary-overlay-map))
            (pcase dictionary-overlay-position
              ('after
               (overlay-put
                ov 'after-string
                (propertize
                 (format dictionary-overlay-translation-format target)
                 'face 'dictionary-overlay-translation)))
              ('help-echo
               (overlay-put
                ov 'help-echo
                (format dictionary-overlay-translation-format target))))
            (puthash hash-table-key ov dictionary-overlay-hash-table)))))))

(defun dictionary-overlay-install ()
  "Install all python dependencies."
  (interactive)
  (let ((process-environment
         (cons "NO_COLOR=true" process-environment))
        (process-buffer-name "*dictionary-overlay-install*"))
    (set-process-sentinel
     (start-process "dictionary-overlay-install"
                    process-buffer-name
                    dictionary-overlay-python
                    "-m" "pip" "install" "-r"
                    dictionary-overlay-py-requirements-path)
     (lambda (p _m)
       (when (eq 0 (process-exit-status p))
         (with-current-buffer (process-buffer p)
           (ansi-color-apply-on-region (point-min) (point-max))))))
    (split-window-below)
    (other-window 1)
    (switch-to-buffer process-buffer-name)))

(defun dictionary-overlay-macos-install-core-services ()
  "Install pyobjc-framework-CoreServices for the macOS system dictionary."
  (interactive)
  (let ((process-environment
         (cons "NO_COLOR=true" process-environment))
        (process-buffer-name "*dictionary-overlay-install*"))
    (set-process-sentinel
     (start-process "dictionary-overlay-install"
                    process-buffer-name
                    dictionary-overlay-python
                    "-m" "pip" "install"
                    "pyobjc-framework-CoreServices")
     (lambda (p _m)
       (when (eq 0 (process-exit-status p))
         (with-current-buffer (process-buffer p)
           (ansi-color-apply-on-region (point-min) (point-max))))))
    (split-window-below)
    (other-window 1)
    (switch-to-buffer process-buffer-name)))


(defun dictionary-overlay-install-google-translate ()
  "Install all google-translate dependencies."
  (interactive)
  (let* ((process-environment
          (cons "NO_COLOR=true" process-environment))
         (process-buffer-name "*dictionary-overlay-install*")
         (temp-install-directory
          (shell-quote-argument
           (make-temp-file "install-google-translate" t)))
         ;; Chain with && so a failing step stops the install and
         ;; its exit status reaches the sentinel.
         (process-cmd
          (format
           (concat "git clone https://git.ookami.one/cgit/google-translate/ %s && "
                   "cd %s && "
                   "%s -m pip install build && "
                   "make install")
           temp-install-directory temp-install-directory
           (shell-quote-argument dictionary-overlay-python))))
    (set-process-sentinel
     (start-process-shell-command
      "dictionary-overlay-install-google-translate"
      process-buffer-name
      process-cmd)
     (lambda (p _m)
       (when (eq 0 (process-exit-status p))
         (with-current-buffer (process-buffer p)
           (ansi-color-apply-on-region (point-min) (point-max))))))
    (split-window-below)
    (other-window 1)
    (switch-to-buffer process-buffer-name)))

(defun dictionary-overlay-modify-translation ()
  "Modify current word's translation."
  (interactive)
  (let ((word (downcase (thing-at-point 'word t))))
    (websocket-bridge-call "dictionary-overlay"
                           "modify_translation"
                           word)))

(defun dictionary-overlay-choose-translate (word candidates)
  "Choose WORD's translation from CANDIDATES."
  (let ((translation
         (completing-read "Choose or input translation: " candidates)))
    (websocket-bridge-call "dictionary-overlay"
                           "update_translation"
                           word
                           translation))
  (dictionary-overlay-render-buffer))

(defun dictionary-overlay-ready-p ()
  "Check whether dictionary-overlay is ready."
604 | (and 605 | (member "dictionary-overlay" websocket-bridge-app-list) 606 | (boundp 'websocket-bridge-client-dictionary-overlay))) 607 | 608 | (provide 'dictionary-overlay) 609 | ;;; dictionary-overlay.el ends here 610 | -------------------------------------------------------------------------------- /dictionary-overlay.py: -------------------------------------------------------------------------------- 1 | ''' Add translation overlay for unknown words.''' 2 | import asyncio 3 | import json 4 | import os 5 | import re 6 | import shutil 7 | from sys import platform 8 | from threading import Timer 9 | 10 | import snowballstemmer 11 | import websocket_bridge_python 12 | from sexpdata import dumps 13 | from tokenizers import Tokenizer 14 | from tokenizers.models import BPE 15 | from tokenizers.pre_tokenizers import Whitespace 16 | 17 | from pystardict import Dictionary 18 | 19 | snowball_stemmer = snowballstemmer.stemmer('english') 20 | sdcv_dictionary = None 21 | 22 | 23 | tokenizer = Tokenizer(BPE()) 24 | pre_tokenizer = Whitespace() 25 | dictionary = {} 26 | translators = [] 27 | 28 | 29 | def in_or_stem_in(word:str, words) -> bool: 30 | '''Check a word or word stem in the word list''' 31 | return word in words or snowball_stemmer.stemWord(word) in words 32 | 33 | async def parse(sentence: str): 34 | '''parse the sentence''' 35 | only_unknown_words = await get_emacs_var( 36 | "dictionary-overlay-just-unknown-words" 37 | ) 38 | tokens = pre_tokenizer.pre_tokenize_str(sentence) 39 | if only_unknown_words : 40 | tokens = [token for token in tokens if in_or_stem_in(token[0].lower(), unknown_words)] 41 | else: 42 | tokens = [token for token in tokens if new_word_p(token[0].lower())] 43 | return tokens 44 | 45 | def new_word_p(word: str) -> bool: 46 | if len(word) < 3: 47 | return False 48 | if re.search(r"[^a-z]", word, re.M | re.I): 49 | return False 50 | return not in_or_stem_in(word, known_words) 51 | 52 | def dump_knownwords_to_file(): 53 | with 
open(knownwords_file_path, "w", encoding="utf-8") as f: 54 | for word in known_words: 55 | f.write(f"{word}\n") 56 | 57 | def dump_unknownwords_to_file(): 58 | with open(unknownwords_file_path, "w", encoding="utf-8") as f: 59 | for word in unknown_words: 60 | f.write(f"{word}\n") 61 | 62 | def dump_dictionary_to_file(): 63 | with open(dictionary_file_path, "w", encoding="utf-8") as f: 64 | json.dump(dictionary, f, ensure_ascii=False, indent=4) 65 | 66 | def snapshot(): 67 | try: 68 | dump_dictionary_to_file() 69 | dump_knownwords_to_file() 70 | dump_unknownwords_to_file() 71 | except: 72 | pass 73 | 74 | Timer(30, snapshot).start() 75 | 76 | # dispatch message received from Emacs. 77 | async def on_message(message): 78 | try: 79 | info = json.loads(message) 80 | cmd = info[1][0].strip() 81 | if cmd == "render": 82 | sentence = info[1][1] 83 | buffer_name = info[1][3] 84 | await render(sentence, buffer_name) 85 | await run_and_log("(dictionary-overlay-refresh-overlays)") 86 | elif cmd == "jump_next_unknown_word": 87 | sentence = info[1][1] 88 | point = info[1][2] 89 | await jump_next_unknown_word(sentence, point) 90 | elif cmd == "jump_prev_unknown_word": 91 | sentence = info[1][1] 92 | point = info[1][2] 93 | await jump_prev_unknown_word(sentence, point) 94 | elif cmd == "mark_word_known": 95 | word = info[1][1] 96 | stem_word = snowball_stemmer.stemWord(word) 97 | if word in unknown_words: 98 | unknown_words.remove(word) 99 | if stem_word in unknown_words: 100 | unknown_words.remove(stem_word) 101 | known_words.add(word) 102 | known_words.add(stem_word) 103 | elif cmd == "mark_word_unknown": 104 | word = info[1][1] 105 | stem_word = snowball_stemmer.stemWord(word) 106 | if word in known_words: 107 | known_words.remove(word) 108 | if stem_word in known_words: 109 | known_words.remove(stem_word) 110 | unknown_words.add(word) 111 | unknown_words.add(stem_word) 112 | elif cmd == "mark_buffer": 113 | sentence = info[1][1] 114 | mark_buffer(sentence) 115 | elif cmd == 
"mark_buffer_unknown": 116 | sentence = info[1][1] 117 | mark_buffer_unknown(sentence) 118 | elif cmd == "modify_translation": 119 | # let the user pick or type a new translation for the word; 120 | # the choice comes back as an update_translation message. 121 | word = info[1][1] 122 | await modify_translation(word) 123 | elif cmd == "update_translation": 124 | # update the translation in memory 125 | word = info[1][1] 126 | translation = info[1][2] 127 | dictionary[word] = translation 128 | 129 | else: 130 | print(f"No handler found for {cmd}", flush=True) 131 | except Exception: 132 | import traceback 133 | print(traceback.format_exc()) 134 | 135 | async def modify_translation(word: str): 136 | "Let the user modify the default translation of word." 137 | all_translations = [] 138 | for translator in translators: 139 | translations = await translate_by_translator(word, translator) 140 | all_translations.extend(translations) 141 | # remove duplicate translations while keeping their order: 142 | # dict.fromkeys preserves insertion order, which list(set(items)) does not. 143 | all_translations = list(dict.fromkeys(all_translations)) 144 | sexp = dumps(all_translations) 145 | cmd = f'(dictionary-overlay-choose-translate "{word}" \'{sexp})' 146 | await run_and_log(cmd) 147 | 148 | def mark_buffer(sentence: str): 149 | tokens = pre_tokenizer.pre_tokenize_str(sentence) 150 | words = [ 151 | token[0].lower() for token in tokens if not in_or_stem_in(token[0].lower(), unknown_words) 152 | ] 153 | 154 | for word in words: 155 | known_words.add(word) 156 | known_words.add(snowball_stemmer.stemWord(word)) 157 | 158 | def mark_buffer_unknown(sentence: str): 159 | tokens = pre_tokenizer.pre_tokenize_str(sentence) 160 | words = [ 161 | token[0].lower() for token in tokens if not in_or_stem_in(token[0].lower(), known_words) 162 | ] 163 | for word in words: 164 | unknown_words.add(word) 165 | unknown_words.add(snowball_stemmer.stemWord(word)) 166 | 167 | def
get_command_result(command_string, cwd=None): 168 | import subprocess 169 | 170 | process = subprocess.run( 171 | command_string, 172 | cwd=cwd, 173 | shell=True, 174 | capture_output=True, 175 | encoding="utf-8", 176 | ) 177 | # run() drains stdout and stderr while waiting, so large output cannot fill a pipe and deadlock as Popen.wait() can. 178 | if process.returncode == 0: 179 | return process.stdout.strip() 180 | return process.stderr.strip() 181 | 182 | async def jump_next_unknown_word(sentence: str, point: int): 183 | tokens = await parse(sentence) 184 | # todo: write this with the built-in 'any' function 185 | for token in tokens: 186 | begin = token[1][0] + 1 187 | if point < begin: 188 | cmd = f"(goto-char {begin})" 189 | await run_and_log(cmd) 190 | break 191 | 192 | async def jump_prev_unknown_word(sentence: str, point: int): 193 | tokens = await parse(sentence) 194 | for token in reversed(tokens): 195 | begin = token[1][0] + 1 196 | if point > begin: 197 | cmd = f"(goto-char {begin})" 198 | await run_and_log(cmd) 199 | break 200 | 201 | async def web_translate(word: str) -> list: 202 | '''translate word by web translator, crow or google''' 203 | try: 204 | if shutil.which("crow"): 205 | result = get_command_result(f'crow -t zh-CN --json -e {crow_engine} "{word}"') 206 | return [json.loads(result)["translation"]] 207 | import google_translate 208 | result = google_translate.translate(word, dst_lang='zh') 209 | return result["trans"] 210 | except ImportError: 211 | msg = f"[Dictionary-overlay]the queried word [\"{word}\"] is not in the local dictionary and no network translator is installed; please install crow-translate or google-translate" 212 | print(msg) 213 | await bridge.message_to_emacs(msg) 214 | return [] 215 | except Exception as e: 216 | print(e) 217 | msg = "[Dictionary-overlay]web-translate error: check your network, or run (websocket-bridge-app-open-buffer 'dictionary-overlay) to see the error details."
218 | await bridge.message_to_emacs(msg) 219 | return [] 220 | 221 | async def ipa_translate(word: str) -> list: 222 | '''Return the IPA transcription of word, using the eng_to_ipa package.''' 223 | try: 224 | import eng_to_ipa as ipa 225 | result = ipa.convert(word) 226 | return [result] 227 | except ImportError: 228 | msg = "[Dictionary-overlay]the eng_to_ipa package is not installed, so the ipa translator is unavailable." 229 | print(msg) 230 | await bridge.message_to_emacs(msg) 231 | return [] 232 | except Exception as e: 233 | print(e) 234 | msg = "[Dictionary-overlay]ipa-translate error. Run (websocket-bridge-app-open-buffer 'dictionary-overlay) to see the error details." 235 | await bridge.message_to_emacs(msg) 236 | return [] 237 | 238 | 239 | def extract_translations(msg:str) -> list: 240 | '''Extract runs of Chinese characters (U+4E00 to U+9FA5) from msg.''' 241 | re_chinese_words = re.compile("[\u4e00-\u9fa5]+") 242 | return re.findall(re_chinese_words, msg) 243 | 244 | def sdcv_translate(word:str) -> list: 245 | '''translate word and stem by sdcv''' 246 | stem = snowball_stemmer.stemWord(word) 247 | translations = extract_translations(sdcv_dictionary.get(word)) 248 | translations.extend(extract_translations(sdcv_dictionary.get(stem))) 249 | return translations 250 | 251 | def local_translate(word:str) -> list: 252 | '''translate word by local dictionary''' 253 | translation = dictionary.get(word) 254 | return [translation] if translation else [] 255 | 256 | async def translate_by_translator(word: str, translator: str) -> list: 257 | '''translate word by specified translator''' 258 | if translator == "local": 259 | return local_translate(word) 260 | if translator == "sdcv": 261 | return sdcv_translate(word) 262 | if translator == "darwin": 263 | return macos_dictionary_translate(word) 264 | if translator == "web": 265 | return await web_translate(word) 266 | if translator == "ipa": 267 | return await ipa_translate(word) 268 | return [] 269 | 270 | async def translate(word: str) -> str: 271 | '''Translate word with the first translator that returns a result, and cache that result in dictionary.''' 272 | for translator in
translators: 273 | translations = await translate_by_translator(word, translator) 274 | if translations: 275 | dictionary[word] = translations[0] 276 | return translations[0] 277 | return "" 278 | 279 | async def render(message, buffer_name): 280 | '''Translate the words of message that need it and render them as overlays in Emacs.''' 281 | try: 282 | tokens = await parse(message) 283 | for token in tokens: 284 | word = token[0].lower() 285 | chinese = await translate(word) 286 | if chinese != "": 287 | # a translation was found: ask Emacs to render its overlay. 288 | await render_word(token, chinese, buffer_name) 289 | except Exception as e: 290 | msg = "[Dictionary-overlay]Render buffer error. Run (websocket-bridge-app-open-buffer 'dictionary-overlay) to see the error details." 291 | await bridge.message_to_emacs(msg) 292 | print(e) 293 | # Token offsets are 0-based and Emacs positions 1-based; json.dumps quotes the strings so a double quote or backslash in a translation cannot break the Lisp form. 294 | async def render_word(token, chinese, buffer_name): 295 | word = token[0] 296 | begin = token[1][0] + 1 297 | end = token[1][1] + 1 298 | cmd = f'(dictionary-add-overlay-from {begin} {end} {json.dumps(word)} {json.dumps(chinese, ensure_ascii=False)} {json.dumps(buffer_name, ensure_ascii=False)})' 299 | await run_and_log(cmd) 300 | 301 | # eval in emacs and log the command. 302 | async def run_and_log(cmd): 303 | print(cmd, flush=True) 304 | await bridge.eval_in_emacs(cmd) 305 | 306 | async def main(): 307 | snapshot() 308 | global bridge 309 | bridge = websocket_bridge_python.bridge_app_regist(on_message) 310 | await asyncio.gather(init(), bridge.start()) 311 | 312 | async def get_emacs_var(var_name: str): 313 | "Get Emacs variable and format it." 314 | var_value = await bridge.get_emacs_var(var_name) 315 | if isinstance(var_value, str): 316 | var_value = var_value.strip('"') 317 | print(f'{var_name} : {var_value}') 318 | if var_value == 'null': 319 | return None 320 | return var_value 321 | 322 | async def init(): 323 | "Load user data and settings from Emacs."
324 | global dictionary_file_path, knownwords_file_path, unknownwords_file_path, known_words, unknown_words, crow_engine, dictionary, translators, sdcv_dictionary 325 | sdcv_dictionary_path = await get_emacs_var("dictionary-overlay-sdcv-dictionary-path") 326 | if not sdcv_dictionary_path: 327 | sdcv_dictionary_path = os.path.join( 328 | os.path.dirname(__file__), "resources", "kdic-ec-11w" 329 | ) 330 | sdcv_dictionary = Dictionary(sdcv_dictionary_path, in_memory=True) 331 | print("Sdcv Dictionary load success.") 332 | crow_engine = await get_emacs_var("dictionary-overlay-crow-engine") 333 | translators = json.loads(await get_emacs_var("dictionary-overlay-translators")) 334 | user_data_directory = await get_emacs_var("dictionary-overlay-user-data-directory") 335 | user_data_directory = os.path.expanduser(user_data_directory) 336 | dictionary_file_path = os.path.join(user_data_directory, "dictionary.json") 337 | knownwords_file_path = os.path.join(user_data_directory, "knownwords.txt") 338 | unknownwords_file_path = os.path.join(user_data_directory, "unknownwords.txt") 339 | create_user_data_file_if_not_exist(dictionary_file_path, "{}") 340 | create_user_data_file_if_not_exist(knownwords_file_path) 341 | create_user_data_file_if_not_exist(unknownwords_file_path) 342 | with open(dictionary_file_path, "r", encoding="utf-8") as f: dictionary = json.load(f) 343 | with open(knownwords_file_path, "r", encoding="utf-8") as f: known_words= set(f.read().split()) 344 | with open(unknownwords_file_path, "r", encoding="utf-8") as f: unknown_words= set(f.read().split()) 345 | 346 | def create_user_data_file_if_not_exist(path: str, content=None): 347 | '''create user data file if not exist''' 348 | if not os.path.exists(path): 349 | # Build parent directories when file is not exist. 
350 | basedir = os.path.dirname(path) 351 | if not os.path.exists(basedir): 352 | os.makedirs(basedir) 353 | 354 | with open(path, "w", encoding="utf-8") as f: 355 | if content: 356 | f.write(content) 357 | 358 | print(f"[dictionary-overlay] auto create user data file {path}") 359 | 360 | def macos_dictionary_translate(word: str) -> list: 361 | '''using macos dictionary to translate word''' 362 | if platform == "darwin": 363 | import CoreServices 364 | translation_msg = CoreServices.DCSCopyTextDefinition(None, word, (0, len(word))) 365 | translation_msg = translation_msg if translation_msg else "" 366 | return extract_translations(translation_msg) 367 | return [] 368 | 369 | asyncio.run(main()) 370 | -------------------------------------------------------------------------------- /images/2022-11-15_21-23-58_screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/images/2022-11-15_21-23-58_screenshot.png -------------------------------------------------------------------------------- /images/dictionary-overlay-face.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/images/dictionary-overlay-face.png -------------------------------------------------------------------------------- /logo.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | Dict 48 | 49 | -------------------------------------------------------------------------------- /pystardict.py: -------------------------------------------------------------------------------- 1 | import gzip 2 | import hashlib 3 | import os 4 | import re 5 | import warnings 6 | from struct 
import unpack 7 | 8 | import six 9 | 10 | 11 | class _StarDictIfo(object): 12 | """ 13 | The .ifo file has the following format: 14 | 15 | StarDict's dict ifo file 16 | version=2.4.2 17 | [options] 18 | 19 | Note that the current "version" string must be "2.4.2" or "3.0.0". If it's not, 20 | then StarDict will refuse to read the file. 21 | If version is "3.0.0", StarDict will parse the "idxoffsetbits" option. 22 | 23 | [options] 24 | --------- 25 | In the example above, [options] expands to any of the following lines 26 | specifying information about the dictionary. Each option is a keyword 27 | followed by an equal sign, then the value of that option, then a 28 | newline. The options may appear in any order. 29 | 30 | Note that the dictionary must have at least a bookname, a wordcount and an 31 | idxfilesize, or the load will fail. All other information is optional. All 32 | strings should be encoded in UTF-8. 33 | 34 | Available options: 35 | 36 | bookname= // required 37 | wordcount= // required 38 | synwordcount= // required if ".syn" file exists. 39 | idxfilesize= // required 40 | idxoffsetbits= // New in 3.0.0 41 | author= 42 | email= 43 | website= 44 | description= // You can use <br> for new line. 45 | date= 46 | sametypesequence= // very important. 47 | """ 48 | 49 | def __init__(self, dict_prefix, container): 50 | 51 | ifo_filename = '%s.ifo' % dict_prefix 52 | 53 | try: 54 | _file = open(ifo_filename, encoding="utf-8") 55 | except Exception as e: 56 | raise Exception('ifo file opening error: "{}"'.format(e)) 57 | 58 | # skip the "StarDict's dict ifo file" header line 59 | _file.readline() 60 | # the second line must be "version=..." 61 | _line = _file.readline().split('=', 1) 62 | if _line[0] == 'version': 63 | self.version = _line[1].strip() 64 | else: 65 | raise Exception('ifo has invalid format') 66 | _config = {} 67 | for _line in _file: 68 | _line_splited = _line.split('=', 1) 69 | if len(_line_splited) == 2: # skip blank or malformed lines 70 | _config[_line_splited[0]] = _line_splited[1] 71 | _file.close() 72 | 73 | self.bookname = _config.get('bookname', None) 74 | if self.bookname is None: 75 | raise Exception('ifo has no bookname') 76 | self.bookname = self.bookname.strip() 77 | self.wordcount = _config.get('wordcount', None) 78 | if self.wordcount is None: 79 | raise Exception('ifo has no wordcount') 80 | self.wordcount = int(self.wordcount) 81 | 82 | if self.version == '3.0.0': 83 | # synwordcount is required only when a .syn file exists, so a 84 | # missing value is an error only if that file is present. 85 | self.synwordcount = _config.get('synwordcount', None) 86 | if self.synwordcount is not None: 87 | self.synwordcount = int(self.synwordcount) 88 | elif os.path.exists('%s.syn' % dict_prefix): 89 | raise Exception( 90 | 'ifo has no synwordcount but .syn file exists') 91 | 92 | 93 | self.idxfilesize = _config.get('idxfilesize', None) 94 | if self.idxfilesize is None: 95 | raise Exception('ifo has no idxfilesize') 96 | self.idxfilesize = int(self.idxfilesize) 97 | 98 | self.idxoffsetbits = _config.get('idxoffsetbits', 32) 99 | self.idxoffsetbits = int(self.idxoffsetbits) 100 | 101 | self.author = _config.get('author', '').strip() 102 | 103 | self.email = _config.get('email', '').strip() 104 | 105 | self.website = _config.get('website', '').strip() 106 | 107 | self.description = _config.get('description', '').strip() 108 | 109 | self.date = _config.get('date', '').strip() 110 | 111 |
self.sametypesequence = _config.get('sametypesequence', '').strip() 112 | 113 | 114 | class _StarDictIdx(object): 115 | """ 116 | The .idx file is just a word list. 117 | 118 | The word list is a sorted list of word entries. 119 | 120 | Each entry in the word list contains three fields, one after the other: 121 | word_str; // a utf-8 string terminated by '\0'. 122 | word_data_offset; // word data's offset in .dict file 123 | word_data_size; // word data's total size in .dict file 124 | """ 125 | 126 | def __init__(self, dict_prefix, container): 127 | self._container = container 128 | 129 | idx_filename = '%s.idx' % dict_prefix 130 | idx_filename_gz = '%s.gz' % idx_filename 131 | 132 | try: 133 | file = open_file(idx_filename, idx_filename_gz) 134 | except Exception as e: 135 | raise Exception('idx file opening error: "{}"'.format(e)) 136 | 137 | self._file = file.read() 138 | 139 | """ check file size """ 140 | if file.tell() != container.ifo.idxfilesize: 141 | raise Exception('size of the .idx file is incorrect') 142 | file.close() 143 | 144 | """ prepare main dict and parsing parameters """ 145 | self._idx = {} 146 | idx_offset_bytes_size = int(container.ifo.idxoffsetbits / 8) 147 | idx_offset_format = {4: 'L', 8: 'Q', }[idx_offset_bytes_size] 148 | idx_cords_bytes_size = idx_offset_bytes_size + 4 149 | 150 | """ parse data via regex """ 151 | record_pattern = br'([\d\D]+?\x00[\d\D]{' + str( 152 | idx_cords_bytes_size).encode('utf-8') + br'})' 153 | matched_records = re.findall(record_pattern, self._file) 154 | 155 | """ check records count """ 156 | if len(matched_records) != container.ifo.wordcount: 157 | raise Exception('words count is incorrect') 158 | 159 | """ unpack parsed records """ 160 | for matched_record in matched_records: 161 | c = matched_record.find(b'\x00') 162 | if c == 0: 163 | continue 164 | record_tuple = unpack( 165 | '!%sc%sL' % (c + 1, idx_offset_format), matched_record) 166 | word, cords = record_tuple[:c], record_tuple[c + 1:] 167 | 
self._idx[b''.join(word)] = cords 168 | 169 | def __getitem__(self, word): 170 | """ 171 | returns tuple (word_data_offset, word_data_size,) for word in .dict 172 | 173 | @note: here may be placed flexible search realization 174 | """ 175 | return self._idx[word.encode('utf-8')] 176 | 177 | def __contains__(self, k): 178 | """ 179 | returns True if index has a word k, else False 180 | """ 181 | return k.encode('utf-8') in self._idx 182 | 183 | def __eq__(self, y): 184 | """ 185 | returns True if hashlib.md5(x.idx) is equal to hashlib.md5(y.idx), else False 186 | """ 187 | return hashlib.md5(self._file).hexdigest() == hashlib.md5(y._file).hexdigest() 188 | 189 | def __ne__(self, y): 190 | """ 191 | returns True if hashlib.md5(x.idx) is not equal to hashlib.md5(y.idx), else False 192 | """ 193 | return not self.__eq__(y) 194 | 195 | def iterkeys(self): 196 | """ 197 | returns iterkeys 198 | """ 199 | if not self._container.in_memory: 200 | warnings.warn( 201 | 'Iter dict items with in_memory=False may cause serious performance problem') 202 | for key in six.iterkeys(self._idx): 203 | yield key.decode('utf-8') 204 | 205 | def keys(self): 206 | """ 207 | returns keys 208 | """ 209 | if six.PY3: 210 | return self.iterkeys() 211 | 212 | if not self._container.in_memory: 213 | warnings.warn( 214 | 'Iter dict items with in_memory=False may cause serious performance problem') 215 | return [key.decode('utf-8') for key in self._idx.keys()] 216 | 217 | 218 | class _StarDictDict(object): 219 | """ 220 | The .dict file is a pure data sequence, as the offset and size of each 221 | word is recorded in the corresponding .idx file. 222 | 223 | If the "sametypesequence" option is not used in the .ifo file, then 224 | the .dict file has fields in the following order: 225 | ============== 226 | word_1_data_1_type; // a single char identifying the data type 227 | word_1_data_1_data; // the data 228 | word_1_data_2_type; 229 | word_1_data_2_data; 230 | ...... 
// the number of data entries for each word is determined by 231 | // word_data_size in .idx file 232 | word_2_data_1_type; 233 | word_2_data_1_data; 234 | ...... 235 | ============== 236 | It's important to note that each field in each word indicates its 237 | own length, as described below. The number of possible fields per 238 | word is also not fixed, and is determined by simply reading data until 239 | you've read word_data_size bytes for that word. 240 | 241 | 242 | Suppose the "sametypesequence" option is used in the .ifo file, and 243 | the option is set like this: 244 | sametypesequence=tm 245 | Then the .dict file will look like this: 246 | ============== 247 | word_1_data_1_data 248 | word_1_data_2_data 249 | word_2_data_1_data 250 | word_2_data_2_data 251 | ...... 252 | ============== 253 | The first data entry for each word will have a terminating '\0', but 254 | the second entry will not have a terminating '\0'. The omissions of 255 | the type chars and of the last field's size information are the 256 | optimizations required by the "sametypesequence" option described 257 | above. 258 | 259 | If "idxoffsetbits=64", the .dict file can be bigger than 4G. Because 260 | we often need to mmap this large file, and a process on a 32-bit 261 | computer is limited to 4G of virtual memory, mapping it would fail, 262 | so an "idxoffsetbits=64" dictionary cannot in practice be loaded on 263 | a 32-bit machine; StarDict will simply print a warning in this case 264 | when loading. 64-bit computers should not have this 265 | limit. 266 | 267 | Type identifiers 268 | ---------------- 269 | Here are the single-character type identifiers that may be used with 270 | the "sametypesequence" option in the .ifo file, or may appear in the 271 | dict file itself if the "sametypesequence" option is not used.
272 | 273 | Lower-case characters signify that a field's size is determined by a 274 | terminating '\0', while upper-case characters indicate that the data 275 | begins with a network byte-ordered guint32 that gives the length of 276 | the following data (NOT the whole size, which is 4 bytes bigger). 277 | 278 | 'm' 279 | Word's pure text meaning. 280 | The data should be a utf-8 string ending with '\0'. 281 | 282 | 'l' 283 | Word's pure text meaning. 284 | The data is NOT a utf-8 string, but is instead a string in locale 285 | encoding, ending with '\0'. Sometimes using this type will save disk 286 | space, but its use is discouraged. 287 | 288 | 'g' 289 | A utf-8 string which is marked up with the Pango text markup language. 290 | For more information about this markup language, see the "Pango 291 | Reference Manual." 292 | You might have it installed locally at: 293 | file:///usr/share/gtk-doc/html/pango/PangoMarkupFormat.html 294 | 295 | 't' 296 | English phonetic string. 297 | The data should be a utf-8 string ending with '\0'. 298 | 299 | Here are some utf-8 phonetic characters: 300 | θʃŋʧðʒæıʌʊɒɛəɑɜɔˌˈːˑṃṇḷ 301 | æɑɒʌәєŋvθðʃʒɚːɡˏˊˋ 302 | 303 | 'x' 304 | A utf-8 string which is marked up with the xdxf language. 305 | See http://xdxf.sourceforge.net 306 | StarDict has these extensions: 307 | <rref> can have a "type" attribute; it can be "image", "sound", "video" 308 | or "attach". 309 | <kref> can have a "k" attribute. 310 | 311 | 'y' 312 | Chinese YinBiao or Japanese KANA. 313 | The data should be a utf-8 string ending with '\0'. 314 | 315 | 'k' 316 | KingSoft PowerWord's data. The data is a utf-8 string ending with '\0'. 317 | It is in XML format. 318 | 319 | 'w' 320 | MediaWiki markup language. 321 | See http://meta.wikimedia.org/wiki/Help:Editing#The_wiki_markup 322 | 323 | 'h' 324 | Html codes. 325 | 326 | 'r' 327 | Resource file list.
328 | The content can be: 329 | img:pic/example.jpg // Image file 330 | snd:apple.wav // Sound file 331 | vdo:film.avi // Video file 332 | att:file.bin // Attachment file 333 | More than one line is supported as a list of available files. 334 | StarDict will find the files in the Resource Storage. 335 | The image will be shown, the sound file will have a play button. 336 | You can "save as" the attachment file and so on. 337 | 338 | 'W' 339 | wav file. 340 | The data begins with a network byte-ordered guint32 to identify the wav 341 | file's size, immediately followed by the file's content. 342 | 343 | 'P' 344 | Picture file. 345 | The data begins with a network byte-ordered guint32 to identify the picture 346 | file's size, immediately followed by the file's content. 347 | 348 | 'X' 349 | this type identifier is reserved for experimental extensions. 350 | 351 | """ 352 | 353 | def __init__(self, dict_prefix, container, in_memory=False): 354 | """ 355 | opens regular or dziped .dict file 356 | 357 | 'in_memory': indicate whether read whole dict file into memory 358 | """ 359 | self._container = container 360 | self._in_memory = in_memory 361 | 362 | dict_filename = '%s.dict' % dict_prefix 363 | dict_filename_dz = '%s.dz' % dict_filename 364 | 365 | try: 366 | f = open_file(dict_filename, dict_filename_dz) 367 | except Exception as e: 368 | raise Exception('dict file opening error: "{}"'.format(e)) 369 | 370 | if in_memory: 371 | self._file = f.read() 372 | f.close() 373 | else: 374 | self._file = f 375 | 376 | def __getitem__(self, word): 377 | """ 378 | returns data from .dict for word 379 | """ 380 | 381 | # getting word data coordinates 382 | cords = self._container.idx[word] 383 | 384 | if self._in_memory: 385 | bytes_ = self._file[cords[0]: cords[0] + cords[1]] 386 | else: 387 | # seeking in file for data 388 | self._file.seek(cords[0]) 389 | 390 | # reading data 391 | bytes_ = self._file.read(cords[1]) 392 | 393 | return bytes_.decode('utf-8') 394 | 395 | 396 
| class _StarDictSyn(object): 397 | 398 | def __init__(self, dict_prefix, container): 399 | 400 | syn_filename = '%s.syn' % dict_prefix 401 | 402 | try: 403 | self._file = open(syn_filename, encoding="utf-8") 404 | except IOError: 405 | # the .syn file is optional; record its absence instead of leaving _file unset 406 | self._file = None 407 | 408 | 409 | class Dictionary(dict): 410 | """ 411 | Dictionary-like class for lazily reading StarDict dictionaries 412 | 413 | Items are read-only: __setitem__, pop, update and similar methods raise 414 | NotImplementedError. Translations are cached in memory after lookup; nothing is written back to disk. 415 | 416 | We assume in this documentation that "x" or "y" is an instance of the 417 | Dictionary class and "x.{ifo,idx{,.gz},dict{,.dz},syn}" or 418 | "y.{ifo,idx{,.gz},dict{,.dz},syn}" are the files of the corresponding stardict 419 | dictionaries. 420 | 421 | 422 | The following documentation is from the "dict" class and is subject to rewrite 423 | in further implemented methods: 424 | 425 | """ 426 | 427 | def __init__(self, filename_prefix, in_memory=False): 428 | """ 429 | filename_prefix: path to dictionary files without file extensions 430 | 431 | initializes a new Dictionary instance from stardict dictionary files 432 | provided by filename_prefix 433 | """ 434 | 435 | self.in_memory = in_memory 436 | 437 | # reading somedict.ifo 438 | self.ifo = _StarDictIfo(dict_prefix=filename_prefix, container=self) 439 | 440 | # reading somedict.idx or somedict.idx.gz 441 | self.idx = _StarDictIdx(dict_prefix=filename_prefix, container=self) 442 | 443 | # reading somedict.dict or somedict.dict.dz 444 | self.dict = _StarDictDict( 445 | dict_prefix=filename_prefix, container=self, in_memory=in_memory) 446 | 447 | # reading somedict.syn (optional) 448 | self.syn = _StarDictSyn(dict_prefix=filename_prefix, container=self) 449 | 450 | # initializing cache 451 | self._dict_cache = {} 452 | 453 | def __cmp__(self, y): 454 | """ 455 | raises NotImplemented exception 456 | """ 457 | raise NotImplementedError() 458 | 459 | def
__contains__(self, k): 460 | """ 461 | returns True if x.idx has a word k, else False 462 | """ 463 | return k in self.idx 464 | 465 | def __delitem__(self, k): 466 | """ 467 | frees cache from word k translation 468 | """ 469 | del self._dict_cache[k] 470 | 471 | def __eq__(self, y): 472 | """ 473 | returns True if hashlib.md5(x.idx) is equal to hashlib.md5(y.idx), else False 474 | """ 475 | return self.idx.__eq__(y.idx) 476 | 477 | def __ge__(self, y): 478 | """ 479 | raises NotImplemented exception 480 | """ 481 | raise NotImplementedError() 482 | 483 | def __getitem__(self, k): 484 | """ 485 | returns the translation of word k from the cache, reading it from .dict and caching it on a miss 486 | """ 487 | if k in self._dict_cache: 488 | return self._dict_cache[k] 489 | else: 490 | value = self.dict[k] 491 | self._dict_cache[k] = value 492 | return value 493 | 494 | def __gt__(self, y): 495 | """ 496 | raises NotImplemented exception 497 | """ 498 | raise NotImplementedError() 499 | 500 | def __iter__(self): 501 | """ 502 | raises NotImplemented exception 503 | """ 504 | raise NotImplementedError() 505 | 506 | def __le__(self, y): 507 | """ 508 | raises NotImplemented exception 509 | """ 510 | raise NotImplementedError() 511 | 512 | def __len__(self): 513 | """ 514 | returns number of words provided by wordcount parameter of the x.ifo 515 | """ 516 | return self.ifo.wordcount 517 | 518 | def __lt__(self, y): 519 | """ 520 | raises NotImplemented exception 521 | """ 522 | raise NotImplementedError() 523 | 524 | def __ne__(self, y): 525 | """ 526 | returns True if hashlib.md5(x.idx) is not equal to hashlib.md5(y.idx), else False 527 | """ 528 | return not self.__eq__(y) 529 | 530 | def __repr__(self): 531 | """ 532 | returns classname and bookname parameter of the x.ifo 533 | """ 534 | return u'%s %s' % (self.__class__, self.ifo.bookname) 535 | 536 | def __setitem__(self, k, v): 537 | """ 538 | raises NotImplemented exception 539 | """ 540 | raise NotImplementedError() 541 | 542 | def clear(self): 543 |
| """ 544 | clear dict cache 545 | """ 546 | self._dict_cache = dict() 547 | 548 | def get(self, k, d=''): 549 | """ 550 | returns the translation of word k, or d if k is not in x.idx 551 | 552 | d defaults to empty string 553 | """ 554 | return self[k] if k in self else d 555 | 556 | def has_key(self, k): 557 | """ 558 | returns True if self.idx has a word k, else False 559 | """ 560 | return k in self 561 | 562 | def items(self): 563 | """ 564 | returns items 565 | """ 566 | if not self.in_memory: 567 | warnings.warn( 568 | 'Iter dict items with in_memory=False may cause serious performance problem') 569 | return [(key, self[key]) for key in self.keys()] 570 | 571 | def iteritems(self): 572 | """ 573 | returns iteritems 574 | """ 575 | if not self.in_memory: 576 | warnings.warn( 577 | 'Iter dict items with in_memory=False may cause serious performance problem') 578 | for key in self.iterkeys(): 579 | yield (key, self[key]) 580 | 581 | def iterkeys(self): 582 | """ 583 | returns iterkeys 584 | """ 585 | if not self.in_memory: 586 | warnings.warn( 587 | 'Iter dict keys with in_memory=False may cause serious performance problem') 588 | return self.idx.iterkeys() 589 | 590 | def itervalues(self): 591 | """ 592 | raises NotImplementedError 593 | """ 594 | raise NotImplementedError() 595 | 596 | def keys(self): 597 | """ 598 | returns keys 599 | """ 600 | if not self.in_memory: 601 | warnings.warn( 602 | 'Iter dict keys with in_memory=False may cause serious performance problem') 603 | return self.idx.keys() 604 | 605 | def pop(self, k, d): 606 | """ 607 | raises NotImplementedError 608 | """ 609 | raise NotImplementedError() 610 | 611 | def popitem(self): 612 | """ 613 | raises NotImplementedError 614 | """ 615 | raise NotImplementedError() 616 | 617 | def setdefault(self, k, d): 618 | """ 619 | raises NotImplementedError 620 | """ 621 | raise NotImplementedError() 622 | 623 | def update(self, E, **F): 624 | """ 625 | raises
NotImplementedError 626 | """ 627 | raise NotImplementedError() 628 | 629 | def values(self): 630 | """ 631 | raises NotImplementedError 632 | """ 633 | raise NotImplementedError() 634 | 635 | def fromkeys(self, S, v=None): 636 | """ 637 | raises NotImplementedError 638 | """ 639 | raise NotImplementedError() 640 | 641 | 642 | def open_file(regular, gz): 643 | """ 644 | Open regular file if it exists, gz file otherwise. 645 | If no file exists, raise ValueError. 646 | """ 647 | if os.path.exists(regular): 648 | try: 649 | return open(regular, 'rb') 650 | except Exception as e: 651 | raise Exception('regular file opening error: "{}"'.format(e)) 652 | 653 | if os.path.exists(gz): 654 | try: 655 | return gzip.open(gz, 'rb') 656 | except Exception as e: 657 | raise Exception('gz file opening error: "{}"'.format(e)) 658 | 659 | raise ValueError('Neither regular nor gz file exists') 660 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | sexpdata 2 | six==1.16.0 3 | snowballstemmer==2.2.0 4 | tokenizers==0.13.2 5 | websocket-bridge-python==0.0.2 6 | websockets==10.4 7 | -------------------------------------------------------------------------------- /resources/kdic-ec-11w.dict.dz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/resources/kdic-ec-11w.dict.dz -------------------------------------------------------------------------------- /resources/kdic-ec-11w.idx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/resources/kdic-ec-11w.idx -------------------------------------------------------------------------------- /resources/kdic-ec-11w.ifo:
-------------------------------------------------------------------------------- 1 | StarDict's dict ifo file 2 | version=2.4.2 3 | wordcount=112344 4 | idxfilesize=2105186 5 | bookname=KDic11万英汉词典 6 | description=胡正制作。 7 | date=2006.5.17 8 | sametypesequence=m 9 | --------------------------------------------------------------------------------
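The `.ifo` file above is plain text: a fixed magic first line followed by one `key=value` pair per line, and the dictionary's `__len__` reports its `wordcount`. As a minimal sketch, assuming only the layout visible in this file, the metadata could be read as below. `parse_ifo` and `IFO_MAGIC` are hypothetical names for illustration, not the parser that pystardict.py actually uses.

```python
from typing import Dict

# Hypothetical constant: the magic first line seen in kdic-ec-11w.ifo.
IFO_MAGIC = "StarDict's dict ifo file"


def parse_ifo(text: str) -> Dict[str, str]:
    """Parse StarDict .ifo text into a dict of string values.

    Sketch only: assumes a magic first line, then one key=value pair
    per line. Blank lines are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != IFO_MAGIC:
        raise ValueError("not a StarDict .ifo file")
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed .ifo line: %r" % line)
        meta[key.strip()] = value.strip()
    return meta


# The contents of resources/kdic-ec-11w.ifo, copied from above.
SAMPLE = """StarDict's dict ifo file
version=2.4.2
wordcount=112344
idxfilesize=2105186
bookname=KDic11万英汉词典
description=胡正制作。
date=2006.5.17
sametypesequence=m
"""

meta = parse_ifo(SAMPLE)
# Values are strings; convert numeric fields explicitly.
print(int(meta["wordcount"]), int(meta["idxfilesize"]), meta["sametypesequence"])
# prints: 112344 2105186 m
```

Every value comes back as a string, so numeric fields such as `wordcount` and `idxfilesize` need an explicit `int()`. .ifo files are UTF-8, so read them from disk with `encoding="utf-8"` before passing the text in. In the StarDict format, `sametypesequence=m` marks plain-text definitions.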