├── .gitignore ├── LICENSE.md ├── LICENSE.txt ├── MANIFEST ├── README.md ├── leo ├── leo-cli.spec ├── pyproject.toml ├── requirements.txt └── setup.cfg /.gitignore: -------------------------------------------------------------------------------- 1 | # Packages 2 | *.egg 3 | *.egg-info 4 | dist 5 | build 6 | eggs 7 | parts 8 | bin 9 | var 10 | sdist 11 | develop-eggs 12 | .installed.cfg 13 | lib 14 | lib64 15 | .idea 16 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Copyright 2021 Johannes Degn 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
8 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Johannes Degn 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /MANIFEST: -------------------------------------------------------------------------------- 1 | # file GENERATED by distutils, do NOT edit 2 | leo 3 | setup.py 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # leo-cli 2 | 3 | leo-cli is a command line tool which can be used to translate words or phrases from several languages to German. It uses the open dictionary [dict.leo.org][]. 
I wrote this because visiting their website, choosing the language, typing the word and clicking the submit button took too many steps. I am a lazy person. 4 | 5 | [dict.leo.org]: http://dict.leo.org 6 | 7 | 8 | 9 | ## Installation 10 | This tool requires Beautiful Soup (with lxml as its XML parser), the wonderful requests library and the tabulate library. 11 | 12 | ### Install leo-cli 13 | pip install leo-cli 14 | 15 | ### Update 16 | There has been a layout change on leo.org, so you might have to upgrade: 17 | pip install leo-cli --upgrade 18 | 19 | ## Usage: 20 | 21 | leo -h 22 | usage: leo [-h] [-l {en,pt,fr,de,es,ru}] [-i] [-p {all,n,v,adj}] [-d] [-v] 23 | words [words ...] 24 | Retrieve word information via the Leo website 25 | positional arguments: 26 | words the words to look up on the LEO website 27 | optional arguments: 28 | -h, --help show this help message and exit 29 | -l {en,pt,fr,de,es,ru}, --lang {en,pt,fr,de,es,ru} 30 | source language, 2 chars (e.g. 'en') 31 | -i, --inflect print inflection tables for all homonyms 32 | -p {all,n,v,adj}, --pos {all,n,v,adj} 33 | Part of speech of words to translate/inflect. 34 | -d, --define print dictionary definitions. True by default if -i is 35 | not specified. 
36 | -v, --verbose Print debug messages 37 | 38 | ### Examples 39 | 40 | leo example 41 | leo another example 42 | leo "hang out" 43 | leo -l fr bonne gout 44 | leo -l ru книга 45 | leo -l pt ação 46 | leo -i reden 47 | leo -i -p n reden 48 | 49 | ## TODO 50 | * print non-German plurals 51 | * allow specifying target and source languages separately 52 | * (maybe) don't print conjugation labels in translation header for conjugations 53 | * alternative conjugations with labels for usage (hängen) 54 | * label haupt/nebensätzlich sections for verbs 55 | -------------------------------------------------------------------------------- /leo: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from __future__ import print_function 4 | 5 | import sys 6 | 7 | user_interrupt_errors = (KeyboardInterrupt,) 8 | if sys.version_info[0] == 3: 9 | user_interrupt_errors += (BrokenPipeError,) 10 | else: 11 | user_interrupt_errors += (IOError,) 12 | 13 | try: 14 | from argparse import ArgumentParser 15 | from collections import defaultdict 16 | # python 2 only: izip is a function in itertools, not a submodule 17 | try: 18 | from itertools import izip as zip 19 | except ImportError: 20 | pass 21 | import requests 22 | import time 23 | 24 | from bs4 import BeautifulSoup 25 | from tabulate import tabulate 26 | except user_interrupt_errors as error: 27 | # interrupted by the user 28 | if __name__ == '__main__': 29 | exit() 30 | else: 31 | raise error 32 | 33 | TARGET_LANG = 'de' 34 | 35 | COUNTRY_CODES = {'de', 'en', 'fr', 'es', 'pt', 'ru', 'it'} 36 | LANGUAGES = {'fr': 'French', 'de': 'German', 'es': 'Spanish', 'en': 'English', 'ru': 'Russian', 'pt': 'Portuguese', 'it': 'Italian'} 37 | 38 | POS_CLI_TO_LEO = {'v': 'verb', 'n': 'noun', 'adj': 'adjective'} 39 | 40 | TRANSLATE_URL = 'https://dict.leo.org/dictQuery/m-vocab/%(lang)s/query.xml?tolerMode=nof&lp=%(lang)s&lang=de&rmWords=off&rmSearch=on&search=%(word)s&searchLoc=0&resultOrder=basic&multiwordShowSingle=on' 41 | 42 | 
FLECTAB_URL = """https://dict.leo.org/dictQuery/m-vocab/ende/stemming.xml""" 43 | 44 | class LeoRequestManager: 45 | def __init__(self): 46 | self.recentlySent = False 47 | def get_xml(self, url, verbose=False): 48 | """Returns soup object (XML parsed with BeautifulSoup's 'xml' feature) 49 | If another request has been sent previously, this pauses for 1 second 50 | before sending this one. This is to be polite to the Leo website (and 51 | avoid possible banning).""" 52 | if self.recentlySent: 53 | if verbose: 54 | print('Pausing between requests to LEO server...', 55 | file=sys.stderr) 56 | time.sleep(1) 57 | self.recentlySent = True 58 | if verbose: 59 | print('Requesting URL: %s' % url, file=sys.stderr) 60 | r = requests.get(url) 61 | soup = BeautifulSoup(r.text, features='xml') 62 | return soup 63 | 64 | requestManager = LeoRequestManager() 65 | 66 | def get_entries(word, source_lang, pos_filter, verbose=False): 67 | xml = _retrieve_translation_doc(word, source_lang, verbose=verbose) 68 | parsed_entries = _parse_entries(xml, pos_filter) 69 | parsed_similar = _parse_similar(xml) 70 | return parsed_entries, parsed_similar 71 | 72 | def _parse_entries(xml, pos_filter): 73 | entries = xml.find_all('entry') 74 | parsed_entries = [] 75 | if entries: 76 | for entry in entries: 77 | category = entry.find('category') 78 | if category is None: 79 | pos = pos_filter 80 | else: 81 | pos = _validate_pos(pos_filter, category['type']) 82 | if not pos: 83 | continue 84 | parsed_sides = [] 85 | for side in entry.find_all('side'): 86 | word = side.find('word') 87 | if word: 88 | word = word.get_text() 89 | else: 90 | continue 91 | lang = side['lang'] 92 | if side.find('small'): 93 | forms = side.find_all('small') 94 | for form in forms: 95 | form = " ".join(form.strings) 96 | # verb conjugations 97 | if "|" in form: 98 | form = form.replace("|", "") 99 | form = form.strip() 100 | word += " (" + form + ")" 101 | # noun plurals (German-only) 102 | elif "Pl.:" in form: 103 | form = 
form.replace("Pl.:", "") 104 | form = form.strip() 105 | word += " (" + form + ")" 106 | flecttab = side.find('flecttab') 107 | if flecttab: 108 | inflect_url = flecttab['url'] 109 | else: 110 | inflect_url = None 111 | parsed_sides.append({'word': word, 'inflect_url': inflect_url, 'lang': lang}) 112 | parsed_entries.append({'sides': parsed_sides, 'pos': pos}) 113 | return parsed_entries 114 | 115 | def _parse_similar(xml): 116 | parsed_similar = {} 117 | similar = xml.find('similar') 118 | if similar: 119 | for side in similar.find_all('side'): 120 | lang = LANGUAGES.get(side['lang'], side['lang']) 121 | words = [] 122 | for word in side.find_all('word'): 123 | words.append(word.get_text()) 124 | if words: 125 | parsed_similar[lang] = words 126 | return parsed_similar 127 | 128 | 129 | def _validate_pos(pos_filter, pos): 130 | # often no clear distinction between adjective and adverb in German 131 | if pos in ['adjective', 'adjv', 'adverb']: 132 | pos = 'adjective' 133 | if pos_filter == 'all' or pos == pos_filter: 134 | return pos 135 | return None 136 | 137 | def _retrieve_translation_doc(word, source_lang, verbose=False): 138 | url = TRANSLATE_URL % { 139 | 'lang': source_lang + TARGET_LANG, 140 | 'word': word 141 | } 142 | return requestManager.get_xml(url, verbose=verbose) 143 | 144 | def inflect(entries, verbose=False): 145 | if not entries: 146 | return 147 | tables = _get_tables(entries, verbose=verbose) 148 | try: 149 | next_table = next(tables) 150 | except StopIteration: 151 | return 152 | while next_table is not None: 153 | translations, table = next_table 154 | if table.find('verbtab'): 155 | yield _extract_verb(table, translations) 156 | elif table.find('nountab'): 157 | yield _extract_noun(table, translations) 158 | elif table.find('adjtab'): 159 | yield _extract_adjective(table, translations) 160 | try: 161 | next_table = next(tables) 162 | except StopIteration: 163 | return 164 | 165 | def _extract_noun(table, translations): 166 | noun = {'pos': 
'noun', 'translations': translations, 'moods': []} 167 | for mood in table.find_all('mood'): 168 | mood_struct = {'name': mood['title'], 'variants': []} 169 | for variant in mood.find_all('variant'): 170 | if variant['title'] == '': 171 | continue 172 | variant_struct = {'name': variant['title'], 'cases': []} 173 | for case in variant.find_all('case'): 174 | if not case.has_attr('cn'): 175 | continue 176 | variant_struct['cases'].append(_format_noun_case(case)) 177 | mood_struct['variants'].append(variant_struct) 178 | noun['moods'].append(mood_struct) 179 | 180 | return noun 181 | 182 | def _format_noun_case(case): 183 | case_name = case['cn'] 184 | final = "" 185 | art = case.find('art') 186 | radical = case.find('radical') 187 | radical = radical.get_text() if radical else '' 188 | ending = case.find('ending') 189 | ending = ending.get_text() if ending else '' 190 | return {'name': case_name, 'value': (art.get_text() + ' ' if art else '') + radical + ending} 191 | 192 | def _extract_adjective(table, translations): 193 | noun = {'pos': 'adjective', 'translations': translations, 'moods': []} 194 | for mood in table.find_all('mood'): 195 | mood_struct = {'name': mood['title'], 'variants': []} 196 | for variant in mood.find_all('variant'): 197 | if variant['title'] == '': 198 | continue 199 | variant_struct = {'name': variant['title'], 'cases': []} 200 | for case in variant.find_all('case'): 201 | if not case.has_attr('cn') or case['cn'] == 'NA': 202 | continue 203 | variant_struct['cases'].append(_format_adjective_case(case)) 204 | if variant_struct['cases']: 205 | mood_struct['variants'].append(variant_struct) 206 | noun['moods'].append(mood_struct) 207 | 208 | return noun 209 | 210 | def _format_adjective_case(case): 211 | case_name = case['cn'] 212 | final = "" 213 | radical = case.find('radical') 214 | radical = radical.get_text() if radical else '' 215 | adj = case.find('adj') 216 | arts = adj.find_all('art') 217 | if len(arts) > 1: 218 | return 
_format_multiple_adjective_cases(case_name, radical, adj) 219 | 220 | art = adj.find('art') 221 | art = art.get_text() if art else None 222 | part = adj.find('part') 223 | part = part.get_text() if part else None 224 | ending = adj.find('ending').get_text() 225 | 226 | return {'name': case_name, 'value': (art + ' ' if art else '') + (part + ' ' if part else '') + radical + ending} 227 | 228 | def _format_multiple_adjective_cases(case_name, radical, adj): 229 | arts = adj.find_all('art') 230 | endings = adj.find_all('ending') 231 | gender_arts = dict() 232 | for art in arts: 233 | gender_arts[art['g']] = art.get_text() 234 | gender_endings = dict() 235 | for ending in endings: 236 | gender_endings[ending['g']] = ending.get_text() 237 | 238 | cases = [] 239 | for gender in ['sm', 'sf', 'sn']: 240 | cases.append("%s %s%s" % (gender_arts[gender], radical, gender_endings[gender])) 241 | 242 | return {'name': case_name, 'value': "/".join(cases)} 243 | 244 | def _extract_verb(table, translations): 245 | verb = {'pos': 'verb', 'translations': translations, 'moods': []} 246 | aux = table.find('auxiliary') 247 | if aux: 248 | verb['aux'] = aux.get_text().strip() 249 | for mood in table.find_all('mood'): 250 | mood_name = mood['title'] 251 | mood_struct = {'name': mood_name, 'tenses': []} 252 | verb['moods'].append(mood_struct) 253 | for tense in mood.find_all('tense'): 254 | if not tense.has_attr('title'): 255 | continue 256 | tense_struct = {'name': tense['title'], 'cases': []} 257 | mood_struct['tenses'].append(tense_struct) 258 | for case in tense.find_all("case"): 259 | tense_struct['cases'].append(_format_verb_case(case)) 260 | return verb 261 | 262 | def _format_verb_case(case): 263 | pronouns = "/".join([ppron.get_text() for ppron in case.find_all("ppron")]) 264 | aux = case.find("aux") 265 | prefixes = case.find_all("pref") 266 | if prefixes: 267 | prefixes = "".join(prefix.get_text() for prefix in prefixes) 268 | else: 269 | prefixes = "" 270 | radical = 
case.find("radical") 271 | ending = case.find("ending") 272 | spref = case.find("spref") 273 | 274 | final = '' 275 | if pronouns: 276 | final += pronouns + " " 277 | if aux: 278 | final += (aux.get_text() + " ") 279 | final += prefixes 280 | final += "".join([part.get_text() for part in [radical, ending] if part]) 281 | if spref: 282 | final += " " + spref.get_text() 283 | return final 284 | 285 | def _get_tables(entries, verbose=False): 286 | url2translations = {} 287 | langs = set() 288 | for entry in entries: 289 | target_side = next((side for side in entry['sides'] if side["lang"] == TARGET_LANG)) 290 | url = target_side['inflect_url'] 291 | if url is None or url in url2translations: 292 | continue 293 | translations = [side['word'] for side in entry['sides']] 294 | url2translations[url] = translations 295 | 296 | for url, translations in url2translations.items(): 297 | table_url = FLECTAB_URL + url 298 | table = requestManager.get_xml(table_url, verbose=verbose) 299 | yield (translations, table) 300 | 301 | def _pairwise(l): 302 | return zip(l[0::2], l[1::2] + [None] * (len(l) % 2)) # pad odd-length lists with None so the last item is not dropped 303 | 304 | def _print_translation(translation): 305 | word_table = [] 306 | for entry in translation: 307 | word_table.append([side['word'] for side in entry['sides']]) 308 | print(tabulate(word_table, headers=[LANGUAGES.get(args['source_lang'], args['source_lang']), 'German'])) 309 | 310 | def _print_similar(similar): 311 | print(tabulate(similar, headers="keys")) 312 | 313 | def _print_inflection_table(table): 314 | print("/".join(table['translations'])) 315 | print(table['pos'].title()) 316 | if table['pos'] == 'verb': 317 | print("Hilfsverb: " + table['aux']) 318 | for mood in table['moods']: 319 | print("===" + mood['name'] + "===") 320 | for x, y in _pairwise(mood['tenses']): 321 | if y: 322 | print(tabulate({x['name']: x['cases'], y['name']: y['cases']}, headers='keys')) 323 | else: 324 | print(x['name']) 325 | print('-' * len(x['name'])) 326 | print("\n".join(x['cases'])) 327 | elif table['pos'] 
== 'noun': 328 | for mood in table['moods']: 329 | print("===" + mood['name'] + "===") 330 | for x, y in _pairwise(mood['variants']): 331 | if y: 332 | x_string = tabulate(x['cases']) 333 | y_string = tabulate(y['cases']) 334 | print(tabulate({x['name']: x_string.split("\n"), y['name']: y_string.split("\n")}, headers='keys')) 335 | else: 336 | print(x['name']) 337 | print(tabulate(x['cases'])) 338 | elif table['pos'] == 'adjective': 339 | for mood in table['moods']: 340 | print("===" + mood['name'] + "===") 341 | for variant in mood['variants']: 342 | print(variant['name']) 343 | print(tabulate(variant['cases'])) 344 | 345 | def main(): 346 | try: 347 | if args['define']: 348 | for word in args['words']: 349 | entries, similar = get_entries(word, args['source_lang'], args['pos'], verbose=args['verbose']) 350 | if entries: 351 | _print_translation(entries) 352 | else: 353 | print('\tNo translations found') 354 | if similar: 355 | print("\nSimilar Words") 356 | _print_similar(similar) 357 | if args['inflect']: 358 | print() 359 | sys.stdout.flush() 360 | if args['inflect']: 361 | for word in args['words']: 362 | entries, similar = get_entries(word, args['source_lang'], args['pos'], verbose=args['verbose']) 363 | tables = inflect(entries, verbose=args['verbose']) 364 | 365 | table_count = 0 366 | for table in tables: 367 | table_count += 1 368 | _print_inflection_table(table) 369 | print() 370 | sys.stdout.flush() 371 | if table_count == 0: 372 | print("\tNo inflection tables found for " + word) 373 | if similar: 374 | print("\nSimilar Words") 375 | _print_similar(similar) 376 | 377 | except user_interrupt_errors as error: 378 | if args['verbose']: 379 | print("Interrupted by user", file=sys.stderr) 380 | exit() 381 | 382 | 383 | if __name__ == '__main__': 384 | lang = 'ende' # language 385 | parser = ArgumentParser(description='Retrieve word information via the Leo website') 386 | parser.add_argument('words', nargs='+', help="the words to look up on the LEO website") 
387 | parser.add_argument("-l", "--lang", dest="source_lang", default='en', 388 | choices=COUNTRY_CODES, 389 | help="source language, 2 chars (e.g. 'en')") 390 | parser.add_argument("-i", "--inflect", 391 | action="store_true", dest="inflect", default=False, 392 | help="print inflection tables for all homonyms") 393 | parser.add_argument("-s", "--similar", 394 | action="store_true", dest="similar", default=False, 395 | help="show similar words") 396 | parser.add_argument("-p", "--pos", dest="pos", choices=['all', 'n', 'v', 'adj'], 397 | default='all', 398 | help="Part of speech of words to translate/inflect.") 399 | parser.add_argument("-d", "--define", 400 | action="store_true", dest="define", default=False, 401 | help="print dictionary definitions. True by default if -i is not specified.") 402 | parser.add_argument("-v", "--verbose", 403 | action="store_true", dest="verbose", default=False, help="Print debug messages") 404 | args = vars(parser.parse_args()) 405 | 406 | if not args['inflect']: 407 | args['define'] = True 408 | 409 | if args['pos']: 410 | args['pos'] = POS_CLI_TO_LEO[args['pos']] if args['pos'] in POS_CLI_TO_LEO else args['pos'] 411 | 412 | main() 413 | -------------------------------------------------------------------------------- /leo-cli.spec: -------------------------------------------------------------------------------- 1 | %define name leo-cli 2 | %define version 0.3.4 3 | %define unmangled_version 0.3.4 4 | %define release 2 5 | 6 | Summary: leo.org command line tool 7 | Name: %{name} 8 | Version: %{version} 9 | Release: %{release} 10 | Source0: https://pypi.python.org/packages/source/l/leo-cli/%{name}-%{unmangled_version}.tar.gz 11 | License: MIT 12 | Group: Applications/Text 13 | BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-buildroot 14 | 
Prefix: %{_prefix} 15 | BuildArch: noarch 16 | Vendor: Johannes Degn 17 | Url: https://github.com/JoiDegn/leo-cli 18 | 19 | BuildRequires: python3 20 | 21 | %description 22 | leo-cli is a command line tool which can be used to translate words or phrases 23 | from several languages to German. It uses the open dictionary dict.leo.org. I 24 | wrote this because visiting their website, choosing the language, typing the 25 | word and clicking the submit button took too many steps. I am a lazy 26 | person. 27 | 28 | %prep 29 | %setup -n %{name}-%{unmangled_version} 30 | 31 | %build 32 | python3 setup.py build 33 | 34 | %install 35 | python3 setup.py install -O1 --root=$RPM_BUILD_ROOT --record=INSTALLED_FILES 36 | 37 | %clean 38 | rm -rf $RPM_BUILD_ROOT 39 | 40 | %files -f INSTALLED_FILES 41 | %defattr(-,root,root) 42 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["setuptools", "wheel"] 3 | build-backend = "setuptools.build_meta" 4 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | beautifulsoup4 2 | requests>=2.0.0 3 | tabulate>=0.7.7 4 | lxml 5 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | name = leo-cli 3 | version = 0.3.6 4 | description = leo.org command line tool 5 | author = Johannes Degn 6 | author_email = joi@degn.de 7 | long_description = file: README.md 8 | long_description_content_type = text/markdown 9 | url = https://github.com/joidegn/leo-cli 10 | project_urls = 11 | Bug Tracker = https://github.com/joidegn/leo-cli/issues 12 | classifiers = 13 | Programming Language :: Python :: 3 14 | License :: OSI 
Approved :: MIT License 15 | Operating System :: OS Independent 16 | 17 | [options] 18 | packages = find: 19 | python_requires = >=3.6 20 | scripts = leo 21 | install_requires = file: requirements.txt 22 | --------------------------------------------------------------------------------
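The `leo` script fills `TRANSLATE_URL` by interpolating the search term as typed, so phrases such as "hang out" or words such as "ação" reach the URL unescaped. The sketch below shows one way the same template could be filled with the term percent-encoded first. `build_translate_url` is a hypothetical helper, not part of leo-cli, and whether explicit quoting changes what LEO returns has not been checked here.

```python
from urllib.parse import quote

# Same template and target language as in the leo script.
TRANSLATE_URL = (
    'https://dict.leo.org/dictQuery/m-vocab/%(lang)s/query.xml'
    '?tolerMode=nof&lp=%(lang)s&lang=de&rmWords=off&rmSearch=on'
    '&search=%(word)s&searchLoc=0&resultOrder=basic&multiwordShowSingle=on'
)
TARGET_LANG = 'de'


def build_translate_url(word, source_lang='en'):
    """Hypothetical helper (assumption, not in leo-cli): fill the template,
    percent-encoding the search term so spaces and non-ASCII letters are safe."""
    return TRANSLATE_URL % {
        'lang': source_lang + TARGET_LANG,  # e.g. 'en' + 'de' -> 'ende'
        'word': quote(word, safe=''),       # 'hang out' -> 'hang%20out'
    }


if __name__ == '__main__':
    print(build_translate_url('hang out'))
    print(build_translate_url('ação', source_lang='pt'))
```

The language pair is the source code followed by `de`, which is why `-l pt` yields `ptde` in both the path and the `lp` parameter.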