├── .gitignore
├── README.org
├── dictionary-overlay.el
├── dictionary-overlay.py
├── images
│   ├── 2022-11-15_21-23-58_screenshot.png
│   └── dictionary-overlay-face.png
├── logo.svg
├── pystardict.py
├── requirements.txt
└── resources
    ├── kdic-ec-11w.dict.dz
    ├── kdic-ec-11w.idx
    └── kdic-ec-11w.ifo
/.gitignore:
--------------------------------------------------------------------------------
1 | *.elc
2 | *.pyc
3 | /.log/
4 | node_modules/
5 | dist/
6 | tags
7 | __pycache__
8 | env/
--------------------------------------------------------------------------------
/README.org:
--------------------------------------------------------------------------------
1 | #+title: Dictionary Overlay
2 |
3 |
4 | #+html:
5 | #+html: Dictionary Overlay
6 |
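7 | * Goals
8 | Help Emacs users whose English is weak read English text. The package offers two features:
9 | 1. Vocabulary-book hints: maintain your own list of unknown words ("生词本"); while you read an English article, overlays add a Chinese translation to those words.
10 | 2. Dialysis reading ("透析阅读法"): maintain your own list of known words ("熟词本"); while you read an English article, overlays translate every word in the article that is not marked as known.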
11 |
12 | #+caption: Example
13 | [[file:images/2022-11-15_21-23-58_screenshot.png]]
14 |
15 | * Installation
16 | ** [[https://github.com/ginqi7/websocket-bridge][websocket-bridge]]
17 | Provides websocket communication between Emacs and external applications.
18 | ** Python packages
19 | The plugin is written in Python, so python3 must be installed.
20 | - [[https://github.com/ginqi7/websocket-bridge-python][websocket-bridge-python]]: the Python client for websocket-bridge
21 | - [[https://github.com/huggingface/tokenizers][tokenizers]]: a Python tokenization library
22 | - six: a dependency of pystardict.py
23 | - [[https://github.com/jd-boyd/sexpdata][sexpdata]]: converts Python objects to S-expressions
24 | - [[https://pypi.org/project/snowballstemmer/][snowballstemmer]]: stemming algorithms
25 | - [[https://git.ookami.one/cgit/google-translate/][google-translate]]: web translation; optional, crow-translate can be used instead
26 | - [[https://pyobjc.readthedocs.io/en/latest/][pyobjc]]: optional; needed only by macOS users who want to use the system dictionary
27 |
28 | You can run ~dictionary-overlay-install~ to install the required Python packages (google-translate and pyobjc are not included).
29 |
30 | ** Web translation
31 | By default, words are translated with the local sdcv dictionary. When a word is not found in the local dictionary, web translation is used. Currently supported:
32 | 1. [[https://crow-translate.github.io/][crow-translate]]
33 | 2. [[https://git.ookami.one/cgit/google-translate/][google-translate]]
34 |
35 | You can run ~dictionary-overlay-install-google-translate~ to install google-translate.
36 |
37 | ** Download dictionary-overlay
38 | #+begin_src shell
39 | git clone --depth=1 -b main https://github.com/ginqi7/dictionary-overlay ~/.emacs.d/site-lisp/dictionary-overlay/
40 | #+end_src
41 |
42 | ** Add the following configuration to ~/.emacs
43 | #+begin_src emacs-lisp
44 | (add-to-list 'load-path "~/.emacs.d/site-lisp/dictionary-overlay/")
45 | (require 'dictionary-overlay)
46 | #+end_src
47 |
48 | * Commands
49 | | Command | Description |
50 | |----------------------------------------------+----------------------------------------------------------------|
51 | | dictionary-overlay-start | Start the dictionary-overlay application |
52 | | dictionary-overlay-stop | Stop the dictionary-overlay application |
53 | | dictionary-overlay-restart | Restart the dictionary-overlay application |
54 | | dictionary-overlay-render-buffer | Render the current buffer with translations |
55 | | dictionary-overlay-toggle | Toggle translation rendering in the current buffer |
56 | | dictionary-overlay-lookup | Look up the current word; uses Emacs's built-in dictionary by default (see Options to customize) |
57 | | dictionary-overlay-jump-next-unknown-word | Jump to the next unknown word |
58 | | dictionary-overlay-jump-prev-unknown-word | Jump to the previous unknown word |
59 | | dictionary-overlay-jump-first-unknown-word | Jump to the first unknown word |
60 | | dictionary-overlay-jump-last-unknown-word | Jump to the last unknown word |
61 | | dictionary-overlay-jump-out-of-overlay | Move point to the end of the word, leaving the overlay and restoring the normal keymap |
62 | | dictionary-overlay-mark-word-known | Mark the current word as known |
63 | | dictionary-overlay-mark-word-unknown | Mark the current word as unknown |
64 | | dictionary-overlay-mark-word-smart | In vocabulary-book mode, mark the current word as unknown; in dialysis-reading mode, mark it as known |
65 | | dictionary-overlay-mark-word-smart-reversely | The reverse of the above: in vocabulary-book mode, mark as known; in dialysis-reading mode, mark as unknown |
66 | | dictionary-overlay-mark-buffer | Mark every word in the current buffer that is not marked unknown as known |
67 | | dictionary-overlay-mark-buffer-unknown | Mark every word in the current buffer that is not marked known as unknown |
68 | | dictionary-overlay-install | Install the required Python packages that dictionary-overlay depends on |
69 | | dictionary-overlay-install-google-translate | Install google-translate |
70 | | dictionary-overlay-modify-translation | Modify the current word's translation; pick one from the dictionary or type your own |
71 |
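72 | * Options
73 |
74 | | Option | Description |
75 | |-----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
76 | | dictionary-overlay-just-unknown-words | t selects vocabulary-book mode, nil selects dialysis-reading mode; default t |
77 | | dictionary-overlay-user-data-directory | Directory where user data is stored; default "~/.emacs.d/dictionary-overlay-data" |
78 | | dictionary-overlay-position | Where the translation is shown: after the word or in help-echo; default is after the word |
79 | | dictionary-overlay-lookup-with | Function used to look up words; defaults to Emacs's built-in dictionary lookup. Third-party packages such as youdao-dictionary or popweb can be used instead |
80 | | dictionary-overlay-inhibit-keymap | t disables the keymap; default nil |
81 | | dictionary-overlay-auto-jump-after | Actions after which to jump automatically: mark-word-known (after marking a word known), mark-word-unknown (after marking a word unknown), render-buffer (after a refresh) |
82 | | dictionary-overlay-translation-format | Format used to display the translation; default "(%s)" |
83 | | dictionary-overlay-translators | Translation engines to use, in order. The default '("local" "sdcv" "darwin" "web") stands for the local dictionary.json file, the bundled sdcv dictionary, the macOS system dictionary, and web translation. You can choose which to use and in what order. |
84 | | dictionary-overlay-sdcv-dictionary-path | Default nil, which uses the kdic-ec-11w dictionary bundled with dictionary-overlay. If you have your own StarDict dictionary, set this to its path. |
85 |
86 | *Note: before manually editing files under dictionary-overlay-user-data-directory, stop the dictionary-overlay application (run dictionary-overlay-stop); otherwise the application may overwrite your changes.*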
87 |
88 |
89 |
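90 | ** Faces
91 |
92 | | Face | Description |
93 | |---------------------------------------------------+---------------------------------------------------------------|
94 | | dictionary-overlay-unknownword | Face for unknown words; default nil, customizable by the user |
95 | | dictionary-overlay-translation | Face for the translations of unknown words; default nil, customizable by the user |
96 |
97 | These faces control how unknown words are displayed. To avoid disturbing reading they are empty by default and leave the original face untouched. If you want unknown words to stand out, you can use the following as a reference: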
98 |
99 | #+begin_src emacs-lisp
100 | (defface dictionary-overlay-translation
101 | '((((class color) (min-colors 88) (background light))
102 | :underline "#fb8c96" :background "#fbd8db")
103 | (((class color) (min-colors 88) (background dark))
104 | :underline "#C77577" :background "#7A696B")
105 | (t
106 | :inherit highlight))
107 | "Face for dictionary-overlay unknown words.")
108 | #+end_src
109 |
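110 | If you do not define the face ~dictionary-overlay-unknownword~ yourself, no overlay is added to the word itself; only the translation overlay is added. The benefit is that moving across the word still proceeds character by character rather than overlay by overlay.
111 |
112 | Recommended faces: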
113 | #+begin_src emacs-lisp
114 | (copy-face 'font-lock-keyword-face 'dictionary-overlay-unknownword)
115 | (copy-face 'font-lock-comment-face 'dictionary-overlay-translation)
116 | #+end_src
117 |
118 | #+caption: dictionary-overlay with face
119 | [[file:images/dictionary-overlay-face.png]]
120 |
121 | * Key bindings
122 | When ~dictionary-overlay-inhibit-keymap~ is nil (the default), several built-in key bindings are available while point is on an unknown word's overlay:
123 |
124 | | d | dictionary-overlay-lookup | Look up the current word |
125 | | r | dictionary-overlay-refresh-buffer | Refresh the buffer |
126 | | p | dictionary-overlay-jump-prev-unknown-word | Jump to the previous unknown word |
127 | | n | dictionary-overlay-jump-next-unknown-word | Jump to the next unknown word |
128 | | < | dictionary-overlay-jump-first-unknown-word | Jump to the first unknown word |
129 | | > | dictionary-overlay-jump-last-unknown-word | Jump to the last unknown word |
130 | | m | dictionary-overlay-mark-word-smart | In dialysis-reading mode, mark the word as known |
131 | | M | dictionary-overlay-mark-word-smart-reversely | In vocabulary-book mode, mark the word as known |
132 | | c | dictionary-overlay-modify-translation | Modify the translation |
133 | | | dictionary-overlay-jump-out-of-overlay | Jump out of the overlay, so the key bindings no longer apply on non-overlay words |
134 |
135 | These key bindings only work on overlays of words marked unknown, so you still need to bind ~dictionary-overlay-mark-word-unknown~ to a key of your own choosing.
136 |
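137 | * Usage notes
138 |
139 | The default is vocabulary-book mode: while reading an English article you add unknown words manually ( ~dictionary-overlay-mark-word-unknown~ ). You can bind this alongside your word-lookup key. The next time you meet such a word, its translation is shown automatically.
140 |
141 | When you start reading an article, you can mark every word in the current buffer that is not marked known as unknown ( ~dictionary-overlay-mark-buffer-unknown~ ).
142 |
143 | When you finish reading an article, you can mark every word in the current buffer that is not marked unknown as known ( ~dictionary-overlay-mark-buffer~ ).
144 |
145 | When an unknown word keeps appearing and you feel you have learned it, mark it known ( ~dictionary-overlay-mark-word-known~ ); its translation will no longer be shown.
146 |
147 | Once you have read enough articles you should have accumulated a fair number of known words. At that point you might try dialysis-reading mode ( ~(setq dictionary-overlay-just-unknown-words nil)~ ), which automatically shows translations for the words you "may" not know.
148 |
149 | If you prefer minimal visual distraction, use ~(setq dictionary-overlay-position 'help-echo)~ to place the translation in help-echo, so the definition appears only when the mouse hovers over the word. Note: the definitions currently supported are still too terse, so this approach is not recommended; and because no face is set by default, the ~(copy-face 'font-lock-keyword-face 'dictionary-overlay-unknownword)~ setting mentioned earlier is recommended alongside it.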
150 |
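151 | * Features
152 | - Uses snowballstemmer for stemming, so words that share a stem but differ in form can be marked together
153 | - Supports editing translations, letting the user pick a suitable sense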
154 |
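155 | * Example configuration
156 |
157 | The options and commands described in this README could be combined into a starting configuration along the following lines. This is a minimal sketch rather than the author's recommended setup: the option and command names come from this README, while the chosen values (the "[%s]" format, the shortened translator list) and the ~C-c u~ key are assumptions made purely for illustration.
158 |
159 | #+begin_src emacs-lisp
160 | (add-to-list 'load-path "~/.emacs.d/site-lisp/dictionary-overlay/")
161 | (require 'dictionary-overlay)
162 |
163 | ;; Vocabulary-book mode (the default): only words marked unknown get overlays.
164 | (setq dictionary-overlay-just-unknown-words t)
165 | ;; Show the translation after the word; 'help-echo is the other documented value.
166 | (setq dictionary-overlay-position 'after)
167 | ;; Assumed preference: square brackets instead of the default "(%s)".
168 | (setq dictionary-overlay-translation-format "[%s]")
169 | ;; Assumed preference: consult only dictionary.json and the bundled sdcv dictionary.
170 | (setq dictionary-overlay-translators '("local" "sdcv"))
171 | ;; Jump to the next unknown word after marking a word known.
172 | (setq dictionary-overlay-auto-jump-after '(mark-word-known))
173 |
174 | ;; The built-in keymap is active only on unknown-word overlays, so the command
175 | ;; that creates them needs its own binding.  The key here is an arbitrary choice.
176 | (global-set-key (kbd "C-c u") #'dictionary-overlay-mark-word-unknown)
177 | #+end_src
178 |
179 | With something like this in place, ~M-x dictionary-overlay-start~ followed by ~M-x dictionary-overlay-render-buffer~ should render the current buffer, and the single-key bindings take over once point is on an overlay.
180 |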
--------------------------------------------------------------------------------
/dictionary-overlay.el:
--------------------------------------------------------------------------------
1 | ;;; dictionary-overlay.el --- Add overlay for new English word -*- lexical-binding: t; -*-
2 |
3 | ;; Copyright (C) 2022 Qiqi Jin
4 |
5 | ;; Author: Qiqi Jin
6 | ;; Keywords: lisp
7 |
8 | ;; This program is free software; you can redistribute it and/or modify
9 | ;; it under the terms of the GNU General Public License as published by
10 | ;; the Free Software Foundation, either version 3 of the License, or
11 | ;; (at your option) any later version.
12 |
13 | ;; This program is distributed in the hope that it will be useful,
14 | ;; but WITHOUT ANY WARRANTY; without even the implied warranty of
15 | ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16 | ;; GNU General Public License for more details.
17 |
18 | ;; You should have received a copy of the GNU General Public License
19 | ;; along with this program. If not, see <https://www.gnu.org/licenses/>.
20 |
21 | ;;; Commentary:
22 |
23 | ;;
24 |
25 | ;;; Commands:
26 | ;;
27 | ;; Below are complete command list:
28 | ;;
29 | ;; `dictionary-overlay-start'
30 | ;; Start dictionary-overlay.
31 | ;; Keybinding: M-x dictionary-overlay-start
32 | ;; `dictionary-overlay-stop'
33 | ;; Stop dictionary-overlay.
34 | ;; Keybinding: M-x dictionary-overlay-stop
35 | ;; `dictionary-overlay-restart'
36 | ;; Restart dictionary-overlay.
37 | ;; Keybinding: M-x dictionary-overlay-restart
38 | ;; `dictionary-overlay-render-buffer'
39 | ;; Render current buffer.
40 | ;; Keybinding: M-x dictionary-overlay-render-buffer
41 | ;; `dictionary-overlay-toggle'
42 | ;; Toggle current buffer.
43 | ;; Keybinding: M-x dictionary-overlay-toggle
44 | ;; `dictionary-overlay-refresh-buffer'
45 | ;; Refresh current buffer.
46 | ;; Keybinding: r
47 | ;; `dictionary-overlay-jump-first-unknown-word'
48 | ;; Jump to first unknown word.
49 | ;; Keybinding: <
50 | ;; `dictionary-overlay-jump-last-unknown-word'
51 | ;; Jump to last unknown word.
52 | ;; Keybinding: >
53 | ;; `dictionary-overlay-jump-next-unknown-word'
54 | ;; Jump to next unknown word.
55 | ;; Keybinding: n
56 | ;; `dictionary-overlay-jump-prev-unknown-word'
57 | ;; Jump to previous unknown word.
58 | ;; Keybinding: p
59 | ;; `dictionary-overlay-jump-out-of-overlay'
60 | ;; Jump out of the overlay so that its keymap no longer applies.
61 | ;; Keybinding:
62 | ;; `dictionary-overlay-mark-word-known'
63 | ;; Mark current word known.
64 | ;; Keybinding: M-x dictionary-overlay-mark-word-known
65 | ;; `dictionary-overlay-mark-word-unknown'
66 | ;; Mark current word unknown.
67 | ;; Keybinding: M-x dictionary-overlay-mark-word-unknown
68 | ;; `dictionary-overlay-mark-word-smart'
69 | ;; Smartly mark current word as known or unknown.
70 | ;; Keybinding: m
71 | ;; `dictionary-overlay-mark-word-smart-reversely'
72 | ;; Mark current word as known or unknown, reversing the smart choice.
73 | ;; Keybinding: M
74 | ;; `dictionary-overlay-mark-buffer'
75 | ;; Mark all words as known, except those in `unknownwords' list.
76 | ;; Keybinding: M-x dictionary-overlay-mark-buffer
77 | ;; `dictionary-overlay-mark-buffer-unknown'
78 | ;; Mark all words as unknown, except those in `knownwords' list.
79 | ;; Keybinding: M-x dictionary-overlay-mark-buffer-unknown
80 | ;; `dictionary-overlay-lookup'
81 | ;; Look up word in a third-party dictionary.
82 | ;; Keybinding: d
83 | ;; `dictionary-overlay-install'
84 | ;; Install all python dependencies.
85 | ;; Keybinding: M-x dictionary-overlay-install
86 | ;; `dictionary-overlay-macos-install-core-services'
87 | ;; Install pyobjc-framework-CoreServices (macOS only).
88 | ;; Keybinding: M-x dictionary-overlay-macos-install-core-services
89 | ;; `dictionary-overlay-install-google-translate'
90 | ;; Install all google-translate dependencies.
91 | ;; Keybinding: M-x dictionary-overlay-install-google-translate
92 | ;; `dictionary-overlay-modify-translation'
93 | ;; Modify current word's translation.
94 | ;; Keybinding: c
95 | ;;
96 | ;;; Customizable Options:
97 | ;;
98 | ;; Below are customizable option list:
99 | ;;
100 | ;; `dictionary-overlay-just-unknown-words'
101 | ;; If t, show overlay for words in unknownwords list.
102 | ;; default = t
103 | ;; `dictionary-overlay-position'
104 | ;; Where to show translation.
105 | ;; default = 'after
106 | ;; `dictionary-overlay-user-data-directory'
107 | ;; Place user data in Emacs directory.
108 | ;; default = (locate-user-emacs-file "dictionary-overlay-data/")
109 | ;; `dictionary-overlay-translation-format'
110 | ;; Translation format.
111 | ;; default = "(%s)"
112 | ;; `dictionary-overlay-crow-engine'
113 | ;; Crow translate engine.
114 | ;; default = "google"
115 | ;; `dictionary-overlay-inhibit-keymap'
116 | ;; When non-nil, don't use `dictionary-overlay-map'.
117 | ;; default = nil
118 | ;; `dictionary-overlay-auto-jump-after'
119 | ;; Auto jump to next unknown word.
120 | ;; default = 'nil
121 | ;; `dictionary-overlay-recenter-after-mark-and-jump'
122 | ;; Recenter after mark or jump.
123 | ;; default = nil
124 | ;; `dictionary-overlay-lookup-with'
125 | ;; Look up word with fn.
126 | ;; default = 'dictionary-lookup-definition
127 | ;; `dictionary-overlay-translators'
128 | ;; The translators and their order.
129 | ;; default = '("local" "sdcv" "darwin" "web")
130 | ;; `dictionary-overlay-sdcv-dictionary-path'
131 | ;; User defined sdcv dictionary path.
132 | ;; default = nil
133 | ;; `dictionary-overlay-python'
134 | ;; The Python interpreter.
135 | ;; default = (executable-find "python3")
136 |
137 | ;;; Code:
138 |
139 | (require 'ansi-color) ; provides `ansi-color-apply-on-region', used by the installers
140 | (require 'websocket-bridge)
141 | (defgroup dictionary-overlay ()
142 | "Dictionary overlay for words in buffers."
143 | :group 'applications)
144 |
145 | (defface dictionary-overlay-unknownword nil
146 | "Face for dictionary-overlay unknown words."
147 | :group 'dictionary-overlay)
148 |
149 | (defface dictionary-overlay-translation nil
150 | "Face for dictionary-overlay translations."
151 | :group 'dictionary-overlay)
152 |
153 | (defvar dictionary-overlay-py-path
154 | (concat (file-name-directory load-file-name)
155 | "dictionary-overlay.py"))
156 |
157 | (defvar dictionary-overlay-py-requirements-path
158 | (concat (file-name-directory load-file-name) "requirements.txt"))
159 |
160 | (defvar-local dictionary-overlay-active-p nil
161 | "Non-nil if dictionary-overlay is active in the current buffer.")
162 |
163 | (defvar-local dictionary-overlay-hash-table nil
164 | "Hash-table contains overlays for dictionary-overlay.
165 | The key's format is begin:end:word:translation.")
166 |
167 | (defvar-local dictionary-overlay-hash-table-keys '()
168 | "Contains all hashtable-keys for dictionary-overlay.
169 | The key's format is begin:end:word:translation.")
170 |
171 | (defcustom dictionary-overlay-just-unknown-words t
172 | "If t, show overlay for words in unknownwords list.
173 | If nil, show overlay for words not in knownwords list."
174 | :group 'dictionary-overlay
175 | :type '(boolean))
176 |
177 | (defcustom dictionary-overlay-position 'after
178 | "Where to show translation.
179 | If value is \\='after, put translation after word
180 | If value is \\='help-echo, show it when mouse over word."
181 | :group 'dictionary-overlay
182 | :type '(choice
183 | (const :tag "Show after word" after)
184 | (const :tag "Show in help-echo" help-echo)))
185 |
186 | (defcustom dictionary-overlay-user-data-directory
187 | (locate-user-emacs-file "dictionary-overlay-data/")
188 | "Place user data in Emacs directory."
189 | :group 'dictionary-overlay
190 | :type '(directory))
191 |
192 | (defcustom dictionary-overlay-translation-format "(%s)"
193 | "Translation format."
194 | :group 'dictionary-overlay
195 | :type '(string))
196 |
197 | (defcustom dictionary-overlay-crow-engine "google"
198 | "Crow translate engine."
199 | :group 'dictionary-overlay
200 | :type '(string))
201 |
202 | ;; SRC: ideas from `symbol-overlay', tip hat!
203 | (defcustom dictionary-overlay-inhibit-keymap nil
204 | "When non-nil, don't use `dictionary-overlay-map'.
205 | This is intended for buffers/modes that use the keymap text
206 | property for their own purposes. Because this package uses
207 | overlays it would always override the text property keymaps
208 | of such packages."
209 | :group 'dictionary-overlay
210 | :type '(boolean))
211 |
212 | (defcustom dictionary-overlay-auto-jump-after '()
213 | "Auto jump to next unknown word.
214 | The main purpose of auto jump is to keep the cursor within an overlay
215 | to facilitate the usage of keymap. For MARK-WORD-(UN)KNOWN,
216 | usually jump to the next unknown word, but depends on direction
217 | set by `dictionary-overlay-jump-direction'. For RENDER-BUFFER, if
218 | current cursor is within overlay, do nothing; otherwise move to
219 | next overlay."
220 | :group 'dictionary-overlay
221 | :type '(repeat
222 | (choice
223 | (const :tag "Don't jump" nil)
224 | (const :tag "After mark word known" mark-word-known)
225 | (const :tag "After mark word unknown" mark-word-unknown)
226 | (const :tag "After refresh buffer" render-buffer))))
227 |
228 | (defvar-local dictionary-overlay-jump-direction 'next
229 | "Direction to jump word.")
230 |
231 | (defcustom dictionary-overlay-recenter-after-mark-and-jump nil
232 | "Recenter after mark or jump."
233 | :group 'dictionary-overlay
234 | :type '(choice (const :tag "Do nothing" nil)
235 | (integer :tag "Recenter to line")))
236 |
237 | (defcustom dictionary-overlay-lookup-with 'dictionary-lookup-definition
238 | "Look up word with fn."
239 | :group 'dictionary-overlay
240 | :type '(function))
241 |
242 | (defcustom dictionary-overlay-translators '("local" "sdcv" "darwin" "web")
243 | "The translators and their order."
244 | :group 'dictionary-overlay
245 | :type '(repeat string))
246 |
247 | (defcustom dictionary-overlay-sdcv-dictionary-path nil
248 | "User defined sdcv dictionary path."
249 | :group 'dictionary-overlay
250 | :type '(string))
251 |
252 | (defcustom dictionary-overlay-python (executable-find "python3")
253 | "The Python interpreter."
254 | :type 'string)
255 |
256 | (defvar dictionary-overlay-map
257 | (let ((map (make-sparse-keymap)))
258 | (define-key map (kbd "d") #'dictionary-overlay-lookup)
259 | (define-key map (kbd "r") #'dictionary-overlay-refresh-buffer)
260 | (define-key map (kbd "p") #'dictionary-overlay-jump-prev-unknown-word)
261 | (define-key map (kbd "n") #'dictionary-overlay-jump-next-unknown-word)
262 | (define-key map (kbd "<") #'dictionary-overlay-jump-first-unknown-word)
263 | (define-key map (kbd ">") #'dictionary-overlay-jump-last-unknown-word)
264 | (define-key map (kbd "m") #'dictionary-overlay-mark-word-smart)
265 | (define-key map (kbd "M") #'dictionary-overlay-mark-word-smart-reversely)
266 | (define-key map (kbd "c") #'dictionary-overlay-modify-translation)
267 | (define-key map (kbd "") #'dictionary-overlay-jump-out-of-overlay)
268 | map)
269 | "Keymap automatically activated inside overlays.
270 | You can re-bind the commands to any keys you prefer.")
271 |
272 | (defun dictionary-overlay-start ()
273 | "Start dictionary-overlay."
274 | (interactive)
275 | (websocket-bridge-server-start)
276 | (websocket-bridge-app-start
277 | "dictionary-overlay"
278 | dictionary-overlay-python
279 | dictionary-overlay-py-path))
280 |
281 | (defun dictionary-overlay-stop ()
282 | "Stop dictionary-overlay."
283 | (interactive)
284 | (websocket-bridge-app-exit "dictionary-overlay"))
285 |
286 | (defun dictionary-overlay-restart ()
287 | "Restart dictionary-overlay."
288 | (interactive)
289 | (dictionary-overlay-stop)
290 | (dictionary-overlay-start)
291 | ;; REVIEW: really need bring this buffer to front? or we place it at bottom?
292 | ;; (split-window-below -10)
293 | ;; (other-window 1)
294 | ;; (websocket-bridge-app-open-buffer "dictionary-overlay")
295 | )
296 |
297 | (defun websocket-bridge-call-buffer (func-name)
298 | "Call dictionary-overlay function FUNC-NAME on the current buffer."
299 | (websocket-bridge-call "dictionary-overlay" func-name
300 | (buffer-string)
301 | (point)
302 | (buffer-name)))
303 |
304 | (defun websocket-bridge-call-word (func-name)
305 |   "Call dictionary-overlay function FUNC-NAME on the word at point."
306 |   (let ((word (thing-at-point 'word t)))
307 |     (when word                        ; nothing to do when point is not on a word
308 |       (websocket-bridge-call "dictionary-overlay" func-name (downcase word))
309 |       (message "%s" (downcase word)))))
310 |
311 | (defun dictionary-overlay-render-buffer ()
312 | "Render current buffer."
313 | (interactive)
314 | (if (not (dictionary-overlay-ready-p))
315 | (message "Dictionary-Overlay not ready, please wait a second.")
316 | (when (not (member "dictionary-overlay" websocket-bridge-app-list))
317 | (dictionary-overlay-start))
318 | (setq-local dictionary-overlay-active-p t)
319 | (dictionary-overlay-refresh-buffer)
320 | (when (member 'render-buffer dictionary-overlay-auto-jump-after)
321 | (websocket-bridge-call-buffer "jump_next_unknown_word"))
322 | ))
323 |
324 | (defun dictionary-overlay-toggle ()
325 | "Toggle current buffer."
326 | (interactive)
327 | (if dictionary-overlay-active-p
328 | (progn
329 | ;; reset all hash-table-keys and delete all overlays
330 | (setq-local dictionary-overlay-hash-table-keys '())
331 | (dictionary-overlay-refresh-overlays)
332 | (setq-local dictionary-overlay-active-p nil)
333 | (message "Dictionary overlay removed."))
334 | (progn
335 | (dictionary-overlay-render-buffer)
336 | (message "Dictionary overlay rendered."))))
337 |
338 | (defun dictionary-overlay-refresh-overlays ()
339 | "Refresh overlays: remove overlays and hash-table items when not needed."
340 | (maphash
341 | (lambda (key val)
342 | (when (not (member key dictionary-overlay-hash-table-keys))
343 | (remhash key dictionary-overlay-hash-table)
344 | (delete-overlay val)))
345 | dictionary-overlay-hash-table))
346 |
347 | (defun dictionary-overlay-refresh-buffer ()
348 | "Refresh current buffer."
349 | (interactive)
350 | (when dictionary-overlay-active-p
351 | (when (not dictionary-overlay-hash-table)
352 | (setq-local dictionary-overlay-hash-table (make-hash-table :test 'equal)))
353 | (setq-local dictionary-overlay-hash-table-keys '())
354 | (websocket-bridge-call-buffer "render")))
355 |
356 | (defun dictionary-overlay-first-unknown-word-pos ()
357 | "Beginning position of the first unknown word.
358 | NOTE: Retrieval of word pos relies on `dictionary-overlay-hash-table-keys',
359 | so currently won't work for auto jump after render buffer. Same as to
360 | `dictionary-overlay-last-unknown-word-pos'."
361 | (string-to-number
362 | (car (split-string
363 | (car (last dictionary-overlay-hash-table-keys)) ":"))))
364 |
365 | (defun dictionary-overlay-cursor-before-first-unknown-word-p ()
366 | "Whether cursor is at or before the beginning of the first unknown word."
367 | (<= (point) (dictionary-overlay-first-unknown-word-pos)))
368 |
369 | (defun dictionary-overlay-last-unknown-word-pos ()
370 | "Beginning position of the last unknown word."
371 | (string-to-number
372 | (car (split-string
373 | (car dictionary-overlay-hash-table-keys) ":"))))
374 |
375 | (defun dictionary-overlay-cursor-after-last-unknown-word-p ()
376 | "Whether cursor is at or after the beginning of the last unknown word."
377 | (>= (point) (dictionary-overlay-last-unknown-word-pos)))
378 |
379 | (defun dictionary-overlay-jump-first-unknown-word ()
380 | "Jump to first unknown word."
381 | (interactive)
382 | (goto-char (dictionary-overlay-first-unknown-word-pos)))
383 |
384 | (defun dictionary-overlay-jump-last-unknown-word ()
385 | "Jump to last unknown word."
386 | (interactive)
387 | (goto-char (dictionary-overlay-last-unknown-word-pos)))
388 |
389 | (defun dictionary-overlay-jump-next-unknown-word ()
390 | "Jump to next unknown word."
391 | (interactive)
392 | (if (dictionary-overlay-cursor-after-last-unknown-word-p)
393 | (dictionary-overlay-jump-first-unknown-word)
394 | (websocket-bridge-call-buffer "jump_next_unknown_word"))
395 | (setq-local dictionary-overlay-jump-direction 'next)
396 | (when dictionary-overlay-recenter-after-mark-and-jump
397 | (recenter dictionary-overlay-recenter-after-mark-and-jump)))
398 |
399 | (defun dictionary-overlay-jump-prev-unknown-word ()
400 | "Jump to previous unknown word."
401 | (interactive)
402 | (if (dictionary-overlay-cursor-before-first-unknown-word-p)
403 | (dictionary-overlay-jump-last-unknown-word)
404 | (websocket-bridge-call-buffer "jump_prev_unknown_word"))
405 | (setq-local dictionary-overlay-jump-direction 'prev)
406 | (when dictionary-overlay-recenter-after-mark-and-jump
407 | (recenter dictionary-overlay-recenter-after-mark-and-jump)))
408 |
409 | (defun dictionary-overlay-jump-out-of-overlay ()
410 | "Jump out of the overlay so that its keymap no longer applies.
411 | Usually overlay keymap has a higher priority than other major and
412 | minor mode keymap. Jumping out of overlay facilitates the usage
413 | of original mode keymap. Since overlay is everywhere, don't expect it
414 | to work consistently, but usually it does a decent job."
415 | (interactive)
416 | (forward-word))
417 |
418 | (defun dictionary-overlay-mark-word-known ()
419 | "Mark current word known."
420 | (interactive)
421 | (websocket-bridge-call-word "mark_word_known")
422 | (when (member 'mark-word-known dictionary-overlay-auto-jump-after)
423 | (pcase dictionary-overlay-jump-direction
424 | (`next (dictionary-overlay-jump-next-unknown-word))
425 | (`prev (dictionary-overlay-jump-prev-unknown-word))))
426 | (dictionary-overlay-refresh-buffer)
427 | (when dictionary-overlay-recenter-after-mark-and-jump
428 | (recenter dictionary-overlay-recenter-after-mark-and-jump)))
429 |
430 | (defun dictionary-overlay-mark-word-unknown ()
431 | "Mark current word unknown."
432 | (interactive)
433 | (websocket-bridge-call-word "mark_word_unknown")
434 | (when (member 'mark-word-unknown dictionary-overlay-auto-jump-after)
435 | (pcase dictionary-overlay-jump-direction
436 | (`next (dictionary-overlay-jump-next-unknown-word))
437 | (`prev (dictionary-overlay-jump-prev-unknown-word))))
438 | (dictionary-overlay-refresh-buffer)
439 | (when dictionary-overlay-recenter-after-mark-and-jump
440 | (recenter dictionary-overlay-recenter-after-mark-and-jump)))
441 |
442 | (defun dictionary-overlay-mark-word-smart ()
443 | "Smartly mark current word as known or unknown.
444 | Based on value of `dictionary-overlay-just-unknown-words'
445 | Usually when value is t, we want to mark word as unknown. Vice versa.
446 | If you need reverse behavior, use:
447 | `dictionary-overlay-mark-word-smart-reversely' instead."
448 | (interactive)
449 | (if dictionary-overlay-just-unknown-words
450 | (dictionary-overlay-mark-word-unknown)
451 | (dictionary-overlay-mark-word-known)))
452 |
453 | (defun dictionary-overlay-mark-word-smart-reversely ()
454 | "Mark current word as known or unknown, reversing the smart choice.
455 | Based on value of `dictionary-overlay-just-unknown-words'"
456 | (interactive)
457 | (if dictionary-overlay-just-unknown-words
458 | (dictionary-overlay-mark-word-known)
459 | (dictionary-overlay-mark-word-unknown)))
460 |
461 | (defun dictionary-overlay-mark-buffer ()
462 | "Mark all words as known, except those in `unknownwords' list."
463 | (interactive)
464 | (when (y-or-n-p
465 | "Mark all as KNOWN, EXCEPT those in unknownwords list?")
466 | (websocket-bridge-call-buffer "mark_buffer")
467 | (dictionary-overlay-refresh-buffer)))
468 |
469 | (defun dictionary-overlay-mark-buffer-unknown ()
470 | "Mark all words as unknown, except those in `knownwords' list."
471 | (interactive)
472 | (when (y-or-n-p
473 | "Mark all as UNKNOWN, EXCEPT those in knownwords list?")
474 | (websocket-bridge-call-buffer "mark_buffer_unknown")
475 | (dictionary-overlay-refresh-buffer)))
476 |
477 | (defun dictionary-overlay-lookup ()
478 | "Look up word in a third-party dictionary.
479 | NOTE: third-party dictionaries have their own implementation of
480 | getting words. Probably the word will be the same as the one
481 | dictionary-overlay gets."
482 | (interactive)
483 | (funcall dictionary-overlay-lookup-with))
484 |
485 | (defun dictionary-add-overlay-from (begin end source target buffer-name)
486 |   "Add overlay for SOURCE and TARGET from BEGIN to END in BUFFER-NAME."
487 |   (when (get-buffer buffer-name)
488 |     (with-current-buffer buffer-name
489 |       (let ((hash-table-key
490 |              (format "%s:%s:%s:%s" begin end source target)))
491 |         ;; Record the overlay's key.
492 |         (add-to-list 'dictionary-overlay-hash-table-keys hash-table-key)
493 |         ;; Create an overlay only when the key does not exist yet, so
494 |         ;; repeated renders do not leave unused overlays in the buffer.
495 |         (unless (gethash hash-table-key dictionary-overlay-hash-table)
496 |           (let ((ov (make-overlay begin end)))
497 |             (overlay-put ov 'face 'dictionary-overlay-unknownword)
498 |             (overlay-put ov 'evaporate t)
499 |             (unless dictionary-overlay-inhibit-keymap
500 |               (overlay-put ov 'keymap dictionary-overlay-map))
501 |             (pcase dictionary-overlay-position
502 |               ('after
503 |                (overlay-put
504 |                 ov 'after-string
505 |                 (propertize
506 |                  (format dictionary-overlay-translation-format target)
507 |                  'face 'dictionary-overlay-translation)))
508 |               ('help-echo
509 |                (overlay-put
510 |                 ov 'help-echo
511 |                 (format dictionary-overlay-translation-format target))))
512 |             (puthash hash-table-key ov dictionary-overlay-hash-table)))))))
513 |
514 | (defun dictionary-overlay-install ()
515 | "Install all python dependencies."
516 | (interactive)
517 | (let ((process-environment
518 | (cons "NO_COLOR=true" process-environment))
519 | (process-buffer-name "*dictionary-overlay-install*"))
520 | (set-process-sentinel
521 | (start-process "dictionary-overlay-install"
522 | process-buffer-name
523 | dictionary-overlay-python
524 | "-m" "pip" "install" "-r"
525 | dictionary-overlay-py-requirements-path)
526 | (lambda (p _m)
527 | (when (eq 0 (process-exit-status p))
528 | (with-current-buffer (process-buffer p)
529 | (ansi-color-apply-on-region (point-min) (point-max))))))
530 | (split-window-below)
531 | (other-window 1)
532 | (switch-to-buffer process-buffer-name)))
533 |
534 | (defun dictionary-overlay-macos-install-core-services ()
535 | "Install pyobjc-framework-CoreServices (macOS only)."
536 | (interactive)
537 | (let ((process-environment
538 | (cons "NO_COLOR=true" process-environment))
539 | (process-buffer-name "*dictionary-overlay-install*"))
540 | (set-process-sentinel
541 | (start-process "dictionary-overlay-install"
542 | process-buffer-name
543 | dictionary-overlay-python
544 | "-m" "pip" "install"
545 | "pyobjc-framework-CoreServices"
546 | )
547 | (lambda (p _m)
548 | (when (eq 0 (process-exit-status p))
549 | (with-current-buffer (process-buffer p)
550 | (ansi-color-apply-on-region (point-min) (point-max))))))
551 | (split-window-below)
552 | (other-window 1)
553 | (switch-to-buffer process-buffer-name)))
554 |
555 |
556 | (defun dictionary-overlay-install-google-translate ()
557 | "Install all google-translate dependencies."
558 | (interactive)
559 | (let* ((process-environment
560 | (cons "NO_COLOR=true" process-environment))
561 | (process-buffer-name "*dictionary-overlay-install*")
562 | (temp-install-directory
563 | (make-temp-file "install-google-translate" t))
564 | (process-cmd
565 | (format
566 | (concat "git clone https://git.ookami.one/cgit/google-translate/ %s; "
567 | "cd %s; "
568 | dictionary-overlay-python
569 | " -m " "pip" " install build; " "make install")
570 | temp-install-directory temp-install-directory)))
571 | (set-process-sentinel
572 | (start-process-shell-command
573 | "dictionary-overlay-install-google-translate"
574 | process-buffer-name
575 | process-cmd)
576 | (lambda (p _m)
577 | (when (eq 0 (process-exit-status p))
578 | (with-current-buffer (process-buffer p)
579 | (ansi-color-apply-on-region (point-min) (point-max))))))
580 | (split-window-below)
581 | (other-window 1)
582 | (switch-to-buffer process-buffer-name)))
583 |
584 | (defun dictionary-overlay-modify-translation ()
585 | "Modify current word's translation."
586 | (interactive)
587 | (let ((word (thing-at-point 'word t)))
588 | (unless word (user-error "No word at point"))
589 | (websocket-bridge-call "dictionary-overlay" "modify_translation"
590 | (downcase word))))
591 |
592 | (defun dictionary-overlay-choose-translate (word candidates)
593 | "Choose WORD's translation CANDIDATES."
594 | (let ((translation
595 | (completing-read "Choose or input translation: " candidates)))
596 | (websocket-bridge-call "dictionary-overlay"
597 | "update_translation"
598 | word
599 | translation))
600 | (dictionary-overlay-render-buffer))
601 |
602 | (defun dictionary-overlay-ready-p ()
603 | "Return non-nil if dictionary-overlay is ready."
604 | (and
605 | (member "dictionary-overlay" websocket-bridge-app-list)
606 | (boundp 'websocket-bridge-client-dictionary-overlay)))
607 |
608 | (provide 'dictionary-overlay)
609 | ;;; dictionary-overlay.el ends here
610 |
--------------------------------------------------------------------------------
/dictionary-overlay.py:
--------------------------------------------------------------------------------
1 | ''' Add translation overlay for unknown words.'''
2 | import asyncio
3 | import json
4 | import os
5 | import re
6 | import shutil
7 | from sys import platform
8 | from threading import Timer
9 |
10 | import snowballstemmer
11 | import websocket_bridge_python
12 | from sexpdata import dumps
13 | from tokenizers import Tokenizer
14 | from tokenizers.models import BPE
15 | from tokenizers.pre_tokenizers import Whitespace
16 |
17 | from pystardict import Dictionary
18 |
19 | snowball_stemmer = snowballstemmer.stemmer('english')
20 | sdcv_dictionary = None
21 |
22 |
23 | tokenizer = Tokenizer(BPE())
24 | pre_tokenizer = Whitespace()
25 | dictionary = {}
26 | translators = []
27 |
28 |
29 | def in_or_stem_in(word:str, words) -> bool:
30 | '''Check a word or word stem in the word list'''
31 | return word in words or snowball_stemmer.stemWord(word) in words
32 |
33 | async def parse(sentence: str):
34 | '''parse the sentence'''
35 | only_unknown_words = await get_emacs_var(
36 | "dictionary-overlay-just-unknown-words"
37 | )
38 | tokens = pre_tokenizer.pre_tokenize_str(sentence)
39 | if only_unknown_words :
40 | tokens = [token for token in tokens if in_or_stem_in(token[0].lower(), unknown_words)]
41 | else:
42 | tokens = [token for token in tokens if new_word_p(token[0].lower())]
43 | return tokens
44 |
45 | def new_word_p(word: str) -> bool:
46 | if len(word) < 3:
47 | return False
48 | if re.search(r"[^a-z]", word, re.M | re.I):
49 | return False
50 | return not in_or_stem_in(word, known_words)
51 |
52 | def dump_knownwords_to_file():
53 | with open(knownwords_file_path, "w", encoding="utf-8") as f:
54 | for word in known_words:
55 | f.write(f"{word}\n")
56 |
57 | def dump_unknownwords_to_file():
58 | with open(unknownwords_file_path, "w", encoding="utf-8") as f:
59 | for word in unknown_words:
60 | f.write(f"{word}\n")
61 |
62 | def dump_dictionary_to_file():
63 | with open(dictionary_file_path, "w", encoding="utf-8") as f:
64 | json.dump(dictionary, f, ensure_ascii=False, indent=4)
65 |
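66 | def snapshot():
67 |     try:
68 |         dump_dictionary_to_file()
69 |         dump_knownwords_to_file()
70 |         dump_unknownwords_to_file()
71 |     except Exception as e:  # e.g. NameError before init() has set the file paths
72 |         print(f"[dictionary-overlay] snapshot skipped: {e}", flush=True)
73 |
74 |     Timer(30, snapshot).start()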
75 |
76 | # dispatch message received from Emacs.
77 | async def on_message(message):
78 | try:
79 | info = json.loads(message)
80 | cmd = info[1][0].strip()
81 | if cmd == "render":
82 | sentence = info[1][1]
83 | buffer_name = info[1][3]
84 | await render(sentence, buffer_name)
85 | await run_and_log("(dictionary-overlay-refresh-overlays)")
86 | elif cmd == "jump_next_unknown_word":
87 | sentence = info[1][1]
88 | point = info[1][2]
89 | await jump_next_unknown_word(sentence, point)
90 | elif cmd == "jump_prev_unknown_word":
91 | sentence = info[1][1]
92 | point = info[1][2]
93 | await jump_prev_unknown_word(sentence, point)
94 | elif cmd == "mark_word_known":
95 | word = info[1][1]
96 | stem_word = snowball_stemmer.stemWord(word)
97 | if word in unknown_words:
98 | unknown_words.remove(word)
99 | if stem_word in unknown_words:
100 | unknown_words.remove(stem_word)
101 | known_words.add(word)
102 | known_words.add(stem_word)
103 | elif cmd == "mark_word_unknown":
104 | word = info[1][1]
105 | stem_word = snowball_stemmer.stemWord(word)
106 | if word in known_words:
107 | known_words.remove(word)
108 | if stem_word in known_words:
109 | known_words.remove(stem_word)
110 | unknown_words.add(word)
111 | unknown_words.add(stem_word)
112 | elif cmd == "mark_buffer":
113 | sentence = info[1][1]
114 | mark_buffer(sentence)
115 | elif cmd == "mark_buffer_unknown":
116 | sentence = info[1][1]
117 | mark_buffer_unknown(sentence)
118 | elif cmd == "modify_translation":
119 | # give user a selection to modify word translation.
120 | # combine with update_translation
121 | word = info[1][1]
122 | await modify_translation(word)
123 | elif cmd == "update_translation":
124 | # update translate in memory
125 | word = info[1][1]
126 | translation = info[1][2]
127 | dictionary[word]=translation
128 |
129 | else:
130 | print(f"no handler found for {cmd}", flush=True)
131 | except:
132 | import traceback
133 | print(traceback.format_exc())
134 |
135 | async def modify_translation(word: str):
136 |     "Let the user modify the default translation."
137 |     all_translations = []
138 |     for translator in translators:
139 |         translations = await translate_by_translator(word, translator)
140 |         all_translations.extend(translations)
141 |     # Remove duplicate translations.
142 |     # dict.fromkeys keeps first-seen order, which list(set(items)) does not.
143 |     all_translations = list(dict.fromkeys(all_translations))
144 |     sexp = dumps(all_translations)
145 |     cmd = f'(dictionary-overlay-choose-translate "{word}" \'{sexp})'
146 |     await run_and_log(cmd)
147 |
148 | def mark_buffer(sentence: str):
149 | tokens = pre_tokenizer.pre_tokenize_str(sentence)
150 | words = [
151 | token[0].lower() for token in tokens if not in_or_stem_in(token[0].lower(), unknown_words)
152 | ]
153 |
154 | for word in words:
155 | known_words.add(word)
156 | known_words.add(snowball_stemmer.stemWord(word))
157 |
158 | def mark_buffer_unknown(sentence: str):
159 | tokens = pre_tokenizer.pre_tokenize_str(sentence)
160 | words = [
161 | token[0].lower() for token in tokens if not in_or_stem_in(token[0].lower(), known_words)
162 | ]
163 | for word in words:
164 | unknown_words.add(word)
165 | unknown_words.add(snowball_stemmer.stemWord(word))
166 |
167 | def get_command_result(command_string, cwd=None):
168 | import subprocess
169 |
170 | process = subprocess.Popen(
171 | command_string,
172 | cwd=cwd,
173 | shell=True,
174 | text=True,
175 | stdout=subprocess.PIPE,
176 | stderr=subprocess.PIPE,
177 | encoding="utf-8",
178 | )
179 | ret = process.wait()
180 | return "".join((process.stdout if ret == 0 else process.stderr).readlines()).strip() # type: ignore
181 |
182 | async def jump_next_unknown_word(sentence: str, point: int):
183 | tokens = await parse(sentence)
184 | # todo: write this with the built-in 'any' function
185 | for token in tokens:
186 | begin = token[1][0] + 1
187 | if point < begin:
188 | cmd = f"(goto-char {begin})"
189 | await run_and_log(cmd)
190 | break
191 |
192 | async def jump_prev_unknown_word(sentence: str, point: int):
193 | tokens = await parse(sentence)
194 | for token in reversed(tokens):
195 | begin = token[1][0] + 1
196 | if point > begin:
197 | cmd = f"(goto-char {begin})"
198 | await run_and_log(cmd)
199 | break
200 |
201 | async def web_translate(word: str) -> list:
202 | '''translate word by web translator, crow or google'''
203 | try:
204 | if shutil.which("crow"):
205 | result = get_command_result(f'crow -t zh-CN --json -e {crow_engine} "{word}"')
206 | return [json.loads(result)["translation"]]
207 | import google_translate
208 | result = google_translate.translate(word, dst_lang='zh')
209 | return result["trans"]
210 | except ImportError:
211 | msg = f"[Dictionary-overlay] No web translator is installed and the queried word [\"{word}\"] is not in the local dictionary. Please install crow-translate or google-translate."
212 | print(msg)
213 | await bridge.message_to_emacs(msg)
214 | return []
215 | except Exception as e:
216 | print (e)
217 | msg = "[Dictionary-overlay] web-translate error: check your network, or run (websocket-bridge-app-open-buffer 'dictionary-overlay) to see the error details."
218 | await bridge.message_to_emacs(msg)
219 | return []
220 |
221 | async def ipa_translate(word: str) -> list:
222 | '''translate word by ipa'''
223 | try:
224 | import eng_to_ipa as ipa
225 | result = ipa.convert(word)
226 | return [result]
227 | except ImportError:
228 | msg = "[Dictionary-overlay] eng_to_ipa is not installed, so IPA lookup is unavailable."
229 | print(msg)
230 | await bridge.message_to_emacs(msg)
231 | return []
232 | except Exception as e:
233 | print (e)
234 | msg = "[Dictionary-overlay] ipa-translate error. Run (websocket-bridge-app-open-buffer 'dictionary-overlay) to see the error details."
235 | await bridge.message_to_emacs(msg)
236 | return []
237 |
238 |
239 | def extract_translations(msg:str) -> list:
240 | '''extract translations by regex'''
241 | re_chinese_words = re.compile("[\u4e00-\u9fa5]+")
242 | return re.findall(re_chinese_words, msg)
243 |
244 | def sdcv_translate(word:str) -> list:
245 | '''translate word and stem by sdcv'''
246 | stem = snowball_stemmer.stemWord(word)
247 | translations = extract_translations(sdcv_dictionary.get(word))
248 | translations.extend(extract_translations(sdcv_dictionary.get(stem)))
249 | return translations
250 |
251 | def local_translate(word:str) -> list:
252 | '''translate word by local dictionary'''
253 | translation = dictionary.get(word)
254 | return [translation] if translation else []
255 |
256 | async def translate_by_translator(word: str, translator: str) -> list:
257 | '''translate word by specified translator'''
258 | if translator == "local":
259 | return local_translate(word)
260 | if translator == "sdcv":
261 | return sdcv_translate(word)
262 | if translator == "darwin":
263 | return macos_dictionary_translate(word)
264 | if translator == "web":
265 | return await web_translate(word)
266 | if translator == "ipa":
267 | return await ipa_translate(word)
268 | return []
269 |
270 | async def translate(word: str) -> str:
271 | '''translate word.'''
272 | for translator in translators:
273 | translations = await translate_by_translator(word, translator)
274 | if translations:
275 | dictionary[word] = translations[0]
276 | return translations[0]
277 | return ""
278 |
279 | async def render(message, buffer_name):
280 | '''call Emacs render message'''
281 | try:
282 | tokens = await parse(message)
283 | for token in tokens:
284 | word = token[0].lower()
285 | chinese = await translate(word)
286 | if chinese != "":
287 | # if find translation, render function in emacs.
288 | await render_word(token, chinese, buffer_name)
289 | except Exception as e:
290 | msg = "[Dictionary-overlay] Render buffer error. Run (websocket-bridge-app-open-buffer 'dictionary-overlay) to see the error details."
291 | await bridge.message_to_emacs(msg)
292 | print(e)
293 |
294 | async def render_word(token, chinese, buffer_name):
295 | word = token[0]
296 | begin = token[1][0] + 1
297 | end = token[1][1] + 1
298 | cmd = f'(dictionary-add-overlay-from {begin} {end} "{word}" "{chinese}" "{buffer_name}")'
299 | await run_and_log(cmd)
300 |
301 | # eval in emacs and log the command.
302 | async def run_and_log(cmd):
303 | print(cmd, flush=True)
304 | await bridge.eval_in_emacs(cmd)
305 |
306 | async def main():
307 | snapshot()
308 | global bridge
309 | bridge = websocket_bridge_python.bridge_app_regist(on_message)
310 | await asyncio.gather(init(), bridge.start())
311 |
312 | async def get_emacs_var(var_name: str):
313 | "Get Emacs variable and format it."
314 | var_value = await bridge.get_emacs_var(var_name)
315 | if isinstance(var_value, str):
316 | var_value = var_value.strip('"')
317 | print(f'{var_name} : {var_value}')
318 | if var_value == 'null':
319 | return None
320 | return var_value
321 |
322 | async def init():
323 | "Init User data."
324 | global dictionary_file_path, knownwords_file_path, unknownwords_file_path, known_words, unknown_words, crow_engine, dictionary, translators, sdcv_dictionary
325 | sdcv_dictionary_path = await get_emacs_var("dictionary-overlay-sdcv-dictionary-path")
326 | if not sdcv_dictionary_path:
327 | sdcv_dictionary_path = os.path.join(
328 | os.path.dirname(__file__), "resources", "kdic-ec-11w"
329 | )
330 | sdcv_dictionary = Dictionary(sdcv_dictionary_path, in_memory=True)
331 | print("Sdcv Dictionary load success.")
332 | crow_engine = await get_emacs_var("dictionary-overlay-crow-engine")
333 | translators = json.loads(await get_emacs_var("dictionary-overlay-translators"))
334 | user_data_directory = await get_emacs_var("dictionary-overlay-user-data-directory")
335 | user_data_directory = os.path.expanduser(user_data_directory)
336 | dictionary_file_path = os.path.join(user_data_directory, "dictionary.json")
337 | knownwords_file_path = os.path.join(user_data_directory, "knownwords.txt")
338 | unknownwords_file_path = os.path.join(user_data_directory, "unknownwords.txt")
339 | create_user_data_file_if_not_exist(dictionary_file_path, "{}")
340 | create_user_data_file_if_not_exist(knownwords_file_path)
341 | create_user_data_file_if_not_exist(unknownwords_file_path)
342 | with open(dictionary_file_path, "r", encoding="utf-8") as f: dictionary = json.load(f)
343 | with open(knownwords_file_path, "r", encoding="utf-8") as f: known_words= set(f.read().split())
344 | with open(unknownwords_file_path, "r", encoding="utf-8") as f: unknown_words= set(f.read().split())
345 |
346 | def create_user_data_file_if_not_exist(path: str, content=None):
347 |     '''Create the user data file if it does not exist.'''
348 |     if not os.path.exists(path):
349 |         # Create parent directories when the file does not exist.
350 |         basedir = os.path.dirname(path)
351 |         if not os.path.exists(basedir):
352 |             os.makedirs(basedir)
353 |
354 |         with open(path, "w", encoding="utf-8") as f:
355 |             if content:
356 |                 f.write(content)
357 |
358 |         print(f"[dictionary-overlay] auto create user data file {path}")
359 |
360 | def macos_dictionary_translate(word: str) -> list:
361 | '''using macos dictionary to translate word'''
362 | if platform == "darwin":
363 | import CoreServices
364 | translation_msg = CoreServices.DCSCopyTextDefinition(None, word, (0, len(word)))
365 | translation_msg = translation_msg if translation_msg else ""
366 | return extract_translations(translation_msg)
367 | return []
368 |
369 | asyncio.run(main())
370 |
--------------------------------------------------------------------------------
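A minimal, standalone sketch of two small techniques used in dictionary-overlay.py above: extracting runs of Chinese characters from a dictionary entry (as `extract_translations` does) and removing duplicates while keeping first-seen order (as `modify_translation` does with `dict.fromkeys`). The helper name `dedupe_keep_order` and the sample entry are illustrative assumptions and are not part of the repository.

```python
import re

# Same character class as extract_translations in dictionary-overlay.py:
# runs of CJK ideographs in the range U+4E00..U+9FA5.
RE_CHINESE_WORDS = re.compile("[\u4e00-\u9fa5]+")


def extract_translations(msg: str) -> list:
    """Return every run of Chinese characters found in msg, in order."""
    return RE_CHINESE_WORDS.findall(msg)


def dedupe_keep_order(items: list) -> list:
    """Hypothetical helper: drop duplicates, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    # Made-up dictionary entry, for illustration only.
    entry = "n. 苹果; 苹果树 (apple) 苹果"
    print(dedupe_keep_order(extract_translations(entry)))
```

Because the candidates are collected translator by translator, keeping first-seen order means the first configured translator's result stays at the top of the completion list.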
/images/2022-11-15_21-23-58_screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/images/2022-11-15_21-23-58_screenshot.png
--------------------------------------------------------------------------------
/images/dictionary-overlay-face.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/images/dictionary-overlay-face.png
--------------------------------------------------------------------------------
/logo.svg:
--------------------------------------------------------------------------------
1 |
49 |
--------------------------------------------------------------------------------
/pystardict.py:
--------------------------------------------------------------------------------
1 | import gzip
2 | import hashlib
3 | import os
4 | import re
5 | import warnings
6 | from struct import unpack
7 |
8 | import six
9 |
10 |
11 | class _StarDictIfo(object):
12 | """
13 | The .ifo file has the following format:
14 |
15 | StarDict's dict ifo file
16 | version=2.4.2
17 | [options]
18 |
19 | Note that the current "version" string must be "2.4.2" or "3.0.0". If it's not,
20 | then StarDict will refuse to read the file.
21 | If version is "3.0.0", StarDict will parse the "idxoffsetbits" option.
22 |
23 | [options]
24 | ---------
25 | In the example above, [options] expands to any of the following lines
26 | specifying information about the dictionary. Each option is a keyword
27 | followed by an equal sign, then the value of that option, then a
28 | newline. The options may be appear in any order.
29 |
30 | Note that the dictionary must have at least a bookname, a wordcount and a
31 | idxfilesize, or the load will fail. All other information is optional. All
32 | strings should be encoded in UTF-8.
33 |
34 | Available options:
35 |
36 | bookname= // required
37 | wordcount= // required
38 | synwordcount= // required if ".syn" file exists.
39 | idxfilesize= // required
40 | idxoffsetbits= // New in 3.0.0
41 | author=
42 | email=
43 | website=
44 | description= // You can use <br> for new line.
45 | date=
46 | sametypesequence= // very important.
47 | """
48 |
49 | def __init__(self, dict_prefix, container):
50 |
51 | ifo_filename = '%s.ifo' % dict_prefix
52 |
53 | try:
54 | _file = open(ifo_filename, encoding="utf-8")
55 | except Exception as e:
56 | raise Exception('ifo file opening error: "{}"'.format(e))
57 |
58 | _file.readline()
59 |
60 | # skipping ifo header
61 | _line = _file.readline().split('=')
62 | if _line[0] == 'version':
63 | self.version = _line[1]
64 | else:
65 | raise Exception('ifo has invalid format')
66 |
67 | _config = {}
68 | for _line in _file:
69 | _line_splited = _line.split('=')
70 | _config[_line_splited[0]] = _line_splited[1]
71 | _file.close()
72 |
73 | self.bookname = _config.get('bookname', '').strip()
74 | if not self.bookname:
75 | raise Exception('ifo has no bookname')
76 |
77 | self.wordcount = _config.get('wordcount', None)
78 | if self.wordcount is None:
79 | raise Exception('ifo has no wordcount')
80 | self.wordcount = int(self.wordcount)
81 |
82 | if self.version == '3.0.0':
83 | try:
84 | #_syn = open('%s.syn' % dict_prefix) # not used
85 | self.synwordcount = _config.get('synwordcount', None)
86 | if self.synwordcount is None:
87 | raise Exception(
88 | 'ifo has no synwordcount but .syn file exists')
89 | self.synwordcount = int(self.synwordcount)
90 | except IOError:
91 | pass
92 |
93 | self.idxfilesize = _config.get('idxfilesize', None)
94 | if self.idxfilesize is None:
95 | raise Exception('ifo has no idxfilesize')
96 | self.idxfilesize = int(self.idxfilesize)
97 |
98 | self.idxoffsetbits = _config.get('idxoffsetbits', 32)
99 | self.idxoffsetbits = int(self.idxoffsetbits)
100 |
101 | self.author = _config.get('author', '').strip()
102 |
103 | self.email = _config.get('email', '').strip()
104 |
105 | self.website = _config.get('website', '').strip()
106 |
107 | self.description = _config.get('description', '').strip()
108 |
109 | self.date = _config.get('date', '').strip()
110 |
111 | self.sametypesequence = _config.get('sametypesequence', '').strip()
112 |
113 |
114 | class _StarDictIdx(object):
115 | """
116 | The .idx file is just a word list.
117 |
118 | The word list is a sorted list of word entries.
119 |
120 | Each entry in the word list contains three fields, one after the other:
121 | word_str; // a utf-8 string terminated by '\0'.
122 | word_data_offset; // word data's offset in .dict file
123 | word_data_size; // word data's total size in .dict file
124 | """
125 |
126 | def __init__(self, dict_prefix, container):
127 | self._container = container
128 |
129 | idx_filename = '%s.idx' % dict_prefix
130 | idx_filename_gz = '%s.gz' % idx_filename
131 |
132 | try:
133 | file = open_file(idx_filename, idx_filename_gz)
134 | except Exception as e:
135 | raise Exception('idx file opening error: "{}"'.format(e))
136 |
137 | self._file = file.read()
138 |
139 | """ check file size """
140 | if file.tell() != container.ifo.idxfilesize:
141 | raise Exception('size of the .idx file is incorrect')
142 | file.close()
143 |
144 | """ prepare main dict and parsing parameters """
145 | self._idx = {}
146 | idx_offset_bytes_size = int(container.ifo.idxoffsetbits / 8)
147 | idx_offset_format = {4: 'L', 8: 'Q', }[idx_offset_bytes_size]
148 | idx_cords_bytes_size = idx_offset_bytes_size + 4
149 |
150 | """ parse data via regex """
151 | record_pattern = br'([\d\D]+?\x00[\d\D]{' + str(
152 | idx_cords_bytes_size).encode('utf-8') + br'})'
153 | matched_records = re.findall(record_pattern, self._file)
154 |
155 | """ check records count """
156 | if len(matched_records) != container.ifo.wordcount:
157 | raise Exception('words count is incorrect')
158 |
159 | """ unpack parsed records """
160 | for matched_record in matched_records:
161 | c = matched_record.find(b'\x00')
162 | if c == 0:
163 | continue
164 | record_tuple = unpack(
165 | '!%sc%sL' % (c + 1, idx_offset_format), matched_record)
166 | word, cords = record_tuple[:c], record_tuple[c + 1:]
167 | self._idx[b''.join(word)] = cords
168 |
169 | def __getitem__(self, word):
170 | """
171 | returns tuple (word_data_offset, word_data_size,) for word in .dict
172 |
173 | @note: here may be placed flexible search realization
174 | """
175 | return self._idx[word.encode('utf-8')]
176 |
177 | def __contains__(self, k):
178 | """
179 | returns True if index has a word k, else False
180 | """
181 | return k.encode('utf-8') in self._idx
182 |
183 | def __eq__(self, y):
184 | """
185 | returns True if hashlib.md5(x.idx) is equal to hashlib.md5(y.idx), else False
186 | """
187 | return hashlib.md5(self._file).hexdigest() == hashlib.md5(y._file).hexdigest()
188 |
189 | def __ne__(self, y):
190 | """
191 | returns True if hashlib.md5(x.idx) is not equal to hashlib.md5(y.idx), else False
192 | """
193 | return not self.__eq__(y)
194 |
195 | def iterkeys(self):
196 | """
197 | returns iterkeys
198 | """
199 | if not self._container.in_memory:
200 | warnings.warn(
201 | 'Iter dict items with in_memory=False may cause serious performance problem')
202 | for key in six.iterkeys(self._idx):
203 | yield key.decode('utf-8')
204 |
205 | def keys(self):
206 | """
207 | returns keys
208 | """
209 | if six.PY3:
210 | return self.iterkeys()
211 |
212 | if not self._container.in_memory:
213 | warnings.warn(
214 | 'Iter dict items with in_memory=False may cause serious performance problem')
215 | return [key.decode('utf-8') for key in self._idx.keys()]
216 |
217 |
218 | class _StarDictDict(object):
219 | """
220 | The .dict file is a pure data sequence, as the offset and size of each
221 | word is recorded in the corresponding .idx file.
222 |
223 | If the "sametypesequence" option is not used in the .ifo file, then
224 | the .dict file has fields in the following order:
225 | ==============
226 | word_1_data_1_type; // a single char identifying the data type
227 | word_1_data_1_data; // the data
228 | word_1_data_2_type;
229 | word_1_data_2_data;
230 | ...... // the number of data entries for each word is determined by
231 | // word_data_size in .idx file
232 | word_2_data_1_type;
233 | word_2_data_1_data;
234 | ......
235 | ==============
236 | It's important to note that each field in each word indicates its
237 | own length, as described below. The number of possible fields per
238 | word is also not fixed, and is determined by simply reading data until
239 | you've read word_data_size bytes for that word.
240 |
241 |
242 | Suppose the "sametypesequence" option is used in the .ifo file, and
243 | the option is set like this:
244 | sametypesequence=tm
245 | Then the .dict file will look like this:
246 | ==============
247 | word_1_data_1_data
248 | word_1_data_2_data
249 | word_2_data_1_data
250 | word_2_data_2_data
251 | ......
252 | ==============
253 | The first data entry for each word will have a terminating '\0', but
254 | the second entry will not have a terminating '\0'. The omissions of
255 | the type chars and of the last field's size information are the
256 | optimizations required by the "sametypesequence" option described
257 | above.
258 |
259 | If "idxoffsetbits=64", the file size of the .dict file will be bigger
260 | than 4G. Because we often need to mmap this large file, and a process
261 | on a 32-bit computer has a 4G maximum virtual memory space limit, the
262 | mapping can fail there. So an "idxoffsetbits=64" dictionary cannot in
263 | fact be loaded on a 32-bit machine; StarDict will simply print a
264 | warning in this case when loading. 64-bit computers should not have
265 | this limit.
266 |
267 | Type identifiers
268 | ----------------
269 | Here are the single-character type identifiers that may be used with
270 | the "sametypesequence" option in the .ifo file, or may appear in the
271 | dict file itself if the "sametypesequence" option is not used.
272 |
273 | Lower-case characters signify that a field's size is determined by a
274 | terminating '\0', while upper-case characters indicate that the data
275 | begins with a network byte-ordered guint32 that gives the length of
276 | the following data's size(NOT the whole size which is 4 bytes bigger).
277 |
278 | 'm'
279 | Word's pure text meaning.
280 | The data should be a utf-8 string ending with '\0'.
281 |
282 | 'l'
283 | Word's pure text meaning.
284 | The data is NOT a utf-8 string, but is instead a string in locale
285 | encoding, ending with '\0'. Sometimes using this type will save disk
286 | space, but its use is discouraged.
287 |
288 | 'g'
289 | A utf-8 string which is marked up with the Pango text markup language.
290 | For more information about this markup language, See the "Pango
291 | Reference Manual."
292 | You might have it installed locally at:
293 | file:///usr/share/gtk-doc/html/pango/PangoMarkupFormat.html
294 |
295 | 't'
296 | English phonetic string.
297 | The data should be a utf-8 string ending with '\0'.
298 |
299 | Here are some utf-8 phonetic characters:
300 | θʃŋʧðʒæıʌʊɒɛəɑɜɔˌˈːˑṃṇḷ
301 | æɑɒʌәєŋvθðʃʒɚːɡˏˊˋ
302 |
303 | 'x'
304 | A utf-8 string which is marked up with the xdxf language.
305 | See http://xdxf.sourceforge.net
306 | StarDict has these extensions:
307 | <rref> can have a "type" attribute, which can be "image", "sound", "video"
308 | and "attach".
309 | <kref> can have a "k" attribute.
310 |
311 | 'y'
312 | Chinese YinBiao or Japanese KANA.
313 | The data should be a utf-8 string ending with '\0'.
314 |
315 | 'k'
316 | KingSoft PowerWord's data. The data is a utf-8 string ending with '\0'.
317 | It is in XML format.
318 |
319 | 'w'
320 | MediaWiki markup language.
321 | See http://meta.wikimedia.org/wiki/Help:Editing#The_wiki_markup
322 |
323 | 'h'
324 | Html codes.
325 |
326 | 'r'
327 | Resource file list.
328 | The content can be:
329 | img:pic/example.jpg // Image file
330 | snd:apple.wav // Sound file
331 | vdo:film.avi // Video file
332 | att:file.bin // Attachment file
333 | More than one line is supported as a list of available files.
334 | StarDict will find the files in the Resource Storage.
335 | The image will be shown, the sound file will have a play button.
336 | You can "save as" the attachment file and so on.
337 |
338 | 'W'
339 | wav file.
340 | The data begins with a network byte-ordered guint32 to identify the wav
341 | file's size, immediately followed by the file's content.
342 |
343 | 'P'
344 | Picture file.
345 | The data begins with a network byte-ordered guint32 to identify the picture
346 | file's size, immediately followed by the file's content.
347 |
348 | 'X'
349 | this type identifier is reserved for experimental extensions.
350 |
351 | """
352 |
353 | def __init__(self, dict_prefix, container, in_memory=False):
354 | """
355 | opens a regular or dzipped .dict file
356 |
357 | 'in_memory': whether to read the whole dict file into memory
358 | """
359 | self._container = container
360 | self._in_memory = in_memory
361 |
362 | dict_filename = '%s.dict' % dict_prefix
363 | dict_filename_dz = '%s.dz' % dict_filename
364 |
365 | try:
366 | f = open_file(dict_filename, dict_filename_dz)
367 | except Exception as e:
368 | raise Exception('dict file opening error: "{}"'.format(e))
369 |
370 | if in_memory:
371 | self._file = f.read()
372 | f.close()
373 | else:
374 | self._file = f
375 |
376 | def __getitem__(self, word):
377 | """
378 | returns data from .dict for word
379 | """
380 |
381 | # getting word data coordinates
382 | cords = self._container.idx[word]
383 |
384 | if self._in_memory:
385 | bytes_ = self._file[cords[0]: cords[0] + cords[1]]
386 | else:
387 | # seeking in file for data
388 | self._file.seek(cords[0])
389 |
390 | # reading data
391 | bytes_ = self._file.read(cords[1])
392 |
393 | return bytes_.decode('utf-8')
394 |
395 |
396 | class _StarDictSyn(object):
397 |
398 | def __init__(self, dict_prefix, container):
399 |
400 | syn_filename = '%s.syn' % dict_prefix
401 |
402 | try:
403 | self._file = open(syn_filename, encoding="utf-8")
404 | except IOError:
405 | # syn file is optional, passing silently
406 | pass
407 |
408 |
409 | class Dictionary(dict):
410 | """
411 | Dictionary-like class for lazily manipulating stardict dictionaries
412 |
413 | All items of this dictionary are writable and dict is expandable itself,
414 | but changes are not stored anywhere and available in runtime only.
415 |
416 | We assume in this documentation that "x" or "y" are instances of the
417 | StarDictDict class and "x.{ifo,idx{,.gz},dict{,.dz},syn}" or
418 | "y.{ifo,idx{,.gz},dict{,.dz},syn}" are files of the corresponding stardict
419 | dictionaries.
420 |
421 |
422 | The following documentation is from the "dict" class and is subject to
423 | rewriting in further implemented methods:
424 |
425 | """
426 |
427 | def __init__(self, filename_prefix, in_memory=False):
428 | """
429 | filename_prefix: path to dictionary files without files extensions
430 |
431 | initializes new StarDictDict instance from stardict dictionary files
432 | provided by filename_prefix
433 | """
434 |
435 | self.in_memory = in_memory
436 |
437 | # reading somedict.ifo
438 | self.ifo = _StarDictIfo(dict_prefix=filename_prefix, container=self)
439 |
440 | # reading somedict.idx or somedict.idx.gz
441 | self.idx = _StarDictIdx(dict_prefix=filename_prefix, container=self)
442 |
443 | # reading somedict.dict or somedict.dict.dz
444 | self.dict = _StarDictDict(
445 | dict_prefix=filename_prefix, container=self, in_memory=in_memory)
446 |
447 | # reading somedict.syn (optional)
448 | self.syn = _StarDictSyn(dict_prefix=filename_prefix, container=self)
449 |
450 | # initializing cache
451 | self._dict_cache = {}
452 |
453 | def __cmp__(self, y):
454 | """
455 | raises NotImplemented exception
456 | """
457 | raise NotImplementedError()
458 |
459 | def __contains__(self, k):
460 | """
461 | returns True if x.idx has a word k, else False
462 | """
463 | return k in self.idx
464 |
465 | def __delitem__(self, k):
466 | """
467 | frees cache from word k translation
468 | """
469 | del self._dict_cache[k]
470 |
471 | def __eq__(self, y):
472 | """
473 | returns True if hashlib.md5(x.idx) is equal to hashlib.md5(y.idx), else False
474 | """
475 | return self.idx.__eq__(y.idx)
476 |
477 | def __ge__(self, y):
478 | """
479 | raises NotImplemented exception
480 | """
481 | raise NotImplementedError()
482 |
483 | def __getitem__(self, k):
484 | """
485 | returns translation for word k, using the cache when possible and caching new lookups
486 | """
487 | if k in self._dict_cache:
488 | return self._dict_cache[k]
489 | else:
490 | value = self.dict[k]
491 | self._dict_cache[k] = value
492 | return value
493 |
494 | def __gt__(self, y):
495 | """
496 | raises NotImplementedError
497 | """
498 | raise NotImplementedError()
499 |
500 | def __iter__(self):
501 | """
502 | raises NotImplementedError
503 | """
504 | raise NotImplementedError()
505 |
506 | def __le__(self, y):
507 | """
508 | raises NotImplementedError
509 | """
510 | raise NotImplementedError()
511 |
512 | def __len__(self):
513 | """
514 | returns number of words provided by wordcount parameter of the x.ifo
515 | """
516 | return self.ifo.wordcount
517 |
518 | def __lt__(self, y):
519 | """
520 | raises NotImplementedError
521 | """
522 | raise NotImplementedError()
523 |
524 | def __ne__(self, y):
525 | """
526 | returns True if hashlib.md5(x.idx) is not equal to hashlib.md5(y.idx), else False
527 | """
528 | return not self.__eq__(y)
529 |
530 | def __repr__(self):
531 | """
532 | returns classname and bookname parameter of the x.ifo
533 | """
534 | return u'%s %s' % (self.__class__, self.ifo.bookname)
535 |
536 | def __setitem__(self, k, v):
537 | """
538 | raises NotImplementedError
539 | """
540 | raise NotImplementedError()
541 |
542 | def clear(self):
543 | """
544 | clears the translation cache
545 | """
546 | self._dict_cache = dict()
547 |
548 | def get(self, k, d=''):
549 | """
550 | returns the translation of the word k from self.dict, or d if k is not in x.idx
551 |
552 | d defaults to an empty string
553 | """
554 | return self[k] if k in self else d
555 |
556 | def has_key(self, k):
557 | """
558 | returns True if self.idx has a word k, else False
559 | """
560 | return k in self
561 |
562 | def items(self):
563 | """
564 | returns a list of (word, translation) pairs
565 | """
566 | if not self.in_memory:
567 | warnings.warn(
568 | 'Iterating over dict items with in_memory=False may cause serious performance problems')
569 | return [(key, self[key]) for key in self.keys()]
570 |
571 | def iteritems(self):
572 | """
573 | yields (word, translation) pairs one at a time
574 | """
575 | if not self.in_memory:
576 | warnings.warn(
577 | 'Iterating over dict items with in_memory=False may cause serious performance problems')
578 | for key in self.iterkeys():
579 | yield (key, self[key])
580 |
581 | def iterkeys(self):
582 | """
583 | returns an iterator over the words in self.idx
584 | """
585 | if not self.in_memory:
586 | warnings.warn(
587 | 'Iterating over dict keys with in_memory=False may cause serious performance problems')
588 | return self.idx.iterkeys()
589 |
590 | def itervalues(self):
591 | """
592 | raises NotImplementedError
593 | """
594 | raise NotImplementedError()
595 |
596 | def keys(self):
597 | """
598 | returns the words in self.idx
599 | """
600 | if not self.in_memory:
601 | warnings.warn(
602 | 'Iterating over dict keys with in_memory=False may cause serious performance problems')
603 | return self.idx.keys()
604 |
605 | def pop(self, k, d):
606 | """
607 | raises NotImplementedError
608 | """
609 | raise NotImplementedError()
610 |
611 | def popitem(self):
612 | """
613 | raises NotImplementedError
614 | """
615 | raise NotImplementedError()
616 |
617 | def setdefault(self, k, d):
618 | """
619 | raises NotImplementedError
620 | """
621 | raise NotImplementedError()
622 |
623 | def update(self, E, **F):
624 | """
625 | raises NotImplementedError
626 | """
627 | raise NotImplementedError()
628 |
629 | def values(self):
630 | """
631 | raises NotImplementedError
632 | """
633 | raise NotImplementedError()
634 |
635 | def fromkeys(self, S, v=None):
636 | """
637 | raises NotImplementedError
638 | """
639 | raise NotImplementedError()
640 |
641 |
642 | def open_file(regular, gz):
643 | """
644 | Open regular file if it exists, gz file otherwise.
645 | If no file exists, raise ValueError.
646 | """
647 | if os.path.exists(regular):
648 | try:
649 | return open(regular, 'rb')
650 | except Exception as e:
651 | raise Exception('regular file opening error: "{}"'.format(e))
652 |
653 | if os.path.exists(gz):
654 | try:
655 | return gzip.open(gz, 'rb')
656 | except Exception as e:
657 | raise Exception('gz file opening error: "{}"'.format(e))
658 |
659 | raise ValueError('Neither regular nor gz file exists')
660 |
--------------------------------------------------------------------------------
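The `open_file` helper at the end of pystardict.py prefers the uncompressed file and falls back to the gzip-compressed one. The sketch below re-creates that fallback in a self-contained form so it can be tried without a real dictionary. The `demo` function and the `demo.idx` file name are hypothetical and exist only for illustration, and this simplified version omits the original's error wrapping.

```python
import gzip
import os
import tempfile


def open_file(regular, gz):
    """Open the regular file if it exists, otherwise the gzip-compressed one."""
    if os.path.exists(regular):
        return open(regular, 'rb')
    if os.path.exists(gz):
        return gzip.open(gz, 'rb')
    raise ValueError('Neither regular nor gz file exists')


def demo():
    """Show which file is chosen before and after the regular file exists."""
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, 'demo.idx')  # hypothetical file name
        gz = regular + '.gz'

        # Only the compressed file exists: the gz branch is taken.
        with gzip.open(gz, 'wb') as f:
            f.write(b'hello')
        with open_file(regular, gz) as f:
            from_gz = f.read()

        # Once the regular file exists, it takes precedence over the gz file.
        with open(regular, 'wb') as f:
            f.write(b'plain')
        with open_file(regular, gz) as f:
            from_regular = f.read()

    return from_gz, from_regular
```

Both branches return a binary file object, so callers can read bytes the same way whether or not the file is compressed.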
/requirements.txt:
--------------------------------------------------------------------------------
1 | sexpdata
2 | six==1.16.0
3 | snowballstemmer==2.2.0
4 | tokenizers==0.13.2
5 | websocket-bridge-python==0.0.2
6 | websockets==10.4
7 |
--------------------------------------------------------------------------------
/resources/kdic-ec-11w.dict.dz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/resources/kdic-ec-11w.dict.dz
--------------------------------------------------------------------------------
/resources/kdic-ec-11w.idx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ginqi7/dictionary-overlay/692fdcec3082e58d0ed57b36ad430aebf7352cd0/resources/kdic-ec-11w.idx
--------------------------------------------------------------------------------
/resources/kdic-ec-11w.ifo:
--------------------------------------------------------------------------------
1 | StarDict's dict ifo file
2 | version=2.4.2
3 | wordcount=112344
4 | idxfilesize=2105186
5 | bookname=KDic11万英汉词典
6 | description=胡正制作。
7 | date=2006.5.17
8 | sametypesequence=m
9 |
--------------------------------------------------------------------------------
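The .ifo file above is plain text: a fixed first line followed by `key=value` pairs. The sketch below parses that layout into a dictionary. It assumes only what the listing shows, and `parse_ifo` is a hypothetical helper, not necessarily how `_StarDictIfo` in pystardict.py reads the file.

```python
IFO_MAGIC = "StarDict's dict ifo file"

# Sample content copied from the fields of kdic-ec-11w.ifo shown above.
SAMPLE = """StarDict's dict ifo file
version=2.4.2
wordcount=112344
idxfilesize=2105186
date=2006.5.17
sametypesequence=m
"""


def parse_ifo(text):
    """Parse the text of a .ifo file into a dict of options."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != IFO_MAGIC:
        raise ValueError('not a StarDict .ifo file')

    options = {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition('=')
        if sep:
            options[key] = value

    # The numeric fields are converted to int for convenience.
    for key in ('wordcount', 'idxfilesize'):
        if key in options:
            options[key] = int(options[key])
    return options


info = parse_ifo(SAMPLE)
```

With the sample above, `info['wordcount']` is the integer that `__len__` in pystardict.py reports through `self.ifo.wordcount`.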