├── .gitignore ├── .travis.yml ├── CHANGELOG.md ├── LICENSE ├── README.en.md ├── README.md ├── setup.cfg ├── setup.py ├── test.py └── tossi ├── __about__.py ├── __init__.py ├── coda.py ├── formatter.py ├── hangul.py ├── particles.py ├── tolerance.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[cod] 2 | 3 | # C extensions 4 | *.so 5 | 6 | # Packages 7 | *.egg 8 | *.egg-info 9 | dist 10 | build 11 | eggs 12 | parts 13 | bin 14 | var 15 | sdist 16 | develop-eggs 17 | .installed.cfg 18 | lib 19 | lib64 20 | 21 | # Installer logs 22 | pip-log.txt 23 | 24 | # Unit test / coverage reports 25 | .coverage 26 | .tox 27 | .cache 28 | nosetests.xml 29 | 30 | # Translations 31 | *.mo 32 | 33 | # Mr Developer 34 | .mr.developer.cfg 35 | .project 36 | .pydevproject 37 | 38 | # Vim swap files 39 | .*.sw[ponm] 40 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | sudo: false 3 | python: 4 | - 2.7 5 | - 3.3 6 | - 3.4 7 | - 3.5 8 | - 3.5-dev 9 | - pypy 10 | - pypy3 11 | install: 12 | - pip install -e . 13 | - pip install flake8 flake8-import-order pytest pytest-cov coveralls 14 | script: 15 | - | # flake8 16 | flake8 tossi test.py setup.py -v --show-source 17 | - | # pytest 18 | py.test -v --cov=tossi --cov-report=term-missing 19 | after_success: 20 | - coveralls 21 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## Version 0.1 2 | 3 | Released on Jun 10 2016. 4 | 5 | The first public release. 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016, What! Studio 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without modification, 5 | are permitted provided that the following conditions are met: 6 | 7 | Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | Redistributions in binary form must reproduce the above copyright notice, this 11 | list of conditions and the following disclaimer in the documentation and/or 12 | other materials provided with the distribution. 13 | 14 | Neither the name of the copyright holder nor the names of its 15 | contributors may be used to endorse or promote products derived from 16 | this software without specific prior written permission. 17 | 18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 19 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 20 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR 22 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 24 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 25 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 26 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 27 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 

--------------------------------------------------------------------------------
/README.en.md:
--------------------------------------------------------------------------------
# Tossi

[![Build Status](
https://travis-ci.org/what-studio/tossi.svg?branch=master
)](https://travis-ci.org/what-studio/tossi)
[![Coverage Status](
https://coveralls.io/repos/github/what-studio/tossi/badge.svg?branch=master
)](https://coveralls.io/r/what-studio/tossi)
[![README in Korean](
https://img.shields.io/badge/readme-korean-blue.svg?style=flat
)](README.md)

"Tossi(토씨)" is a pure-Korean name for grammatical particles. Some Korean
particles have allomorphic variants whose form depends on the leading word.
The Tossi library determines the most natural form.

## Installation

```console
$ pip install tossi
```

## Usage

```python
>>> import tossi
>>> tossi.postfix(u'집', u'(으)로')
집으로
>>> tossi.postfix(u'말', u'으로는')
말로는
>>> tossi.postfix(u'대한민국', u'은(는)')
대한민국은
>>> tossi.postfix(u'민주공화국', u'다')
민주공화국이다
```

## Natural Form for Particles

These particles have no allomorphic variants; they always appear in the same
form: `의`, `도`, `만~`, `에~`, `께~`, `뿐~`, `하~`, `보다~`, `밖에~`, `같이~`,
`부터~`, `까지~`, `마저~`, `조차~`, `마냥~`, `처럼~`, and `커녕~`:

> 나오**의**, 모리안**의**, 키홀**의**, 나오**도**, 모리안**도**, 키홀**도**

Meanwhile, these particles take a different form depending on whether the
leading word ends with a final consonant: `은(는)`, `이(가)`, `을(를)`, and
`과(와)~`:

> 나오**는**, 모리안**은**, 키홀**은**

`(으)로~` follows a similar rule, except that a final consonant `ㄹ` is
treated as if there were no final consonant:

> 나오**로**, 모리안**으로**, 키홀**로**

`(이)다`, the predicative particle, has more diverse forms because its ending
can be inflected:

> 나오**지만**, 모리안**이지만**, 키홀**이에요**, 나오**예요**

Tossi tries to determine the most natural form of a particle. When it cannot,
it falls back to a tolerant form which spells out both candidates, such as
`은(는)` or `(으)로`:

```python
>>> tossi.postfix(u'벽돌', u'으로')
벽돌로
>>> tossi.postfix(u'짚', u'으로')
짚으로
>>> tossi.postfix(u'黃金', u'으로')
黃金(으)로
```

If the leading word ends with a number, the natural form can still be
determined:

```python
>>> tossi.postfix(u'레벨 10', u'이')
레벨 10이
>>> tossi.postfix(u'레벨 999', u'이')
레벨 999가
```

Words in parentheses are ignored:

```python
>>> tossi.postfix(u'나뭇가지(만렙)', u'을')
나뭇가지(만렙)를
```

## Tolerance Styles

When Tossi can't determine the natural form, the result includes both forms,
and you can choose their order. For example, if most of your words are
Japanese, they will usually end without a final consonant, so the `는(은)`
style reads better than the default `은(는)`:

```python
>>> tolerance_style = tossi.parse_tolerance_style(u'는(은)')
>>> tossi.postfix(u'さくら', u'이', tolerance_style=tolerance_style)
さくら가(이)
```

Choose one of `은(는)`, `(은)는`, `는(은)`, `(는)은` for your project.

## Licensing

Written by [Heungsub Lee][sublee] and [Chanwoong Kim][kexplo] at
[What! 
Studio][what-studio] in [Nexon][nexon], and distributed under 108 | [the BSD 3-Clause license][bsd-3-clause]. 109 | 110 | [nexon]: http://nexon.com/ 111 | [what-studio]: https://github.com/what-studio 112 | [sublee]: http://subl.ee/ 113 | [kexplo]: http://chanwoong.kim/ 114 | [bsd-3-clause]: http://opensource.org/licenses/BSD-3-Clause 115 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 토씨 2 | 3 | [![Build Status]( 4 | https://travis-ci.org/what-studio/tossi.svg?branch=master 5 | )](https://travis-ci.org/what-studio/tossi) 6 | [![Coverage Status]( 7 | https://coveralls.io/repos/github/what-studio/tossi/badge.svg?branch=master 8 | )](https://coveralls.io/r/what-studio/tossi) 9 | [![README in English]( 10 | https://img.shields.io/badge/readme-english-blue.svg?style=flat 11 | )](README.en.md) 12 | 13 | '토씨'는 '조사'의 순우리말 이름입니다. 토씨 라이브러리는 임의의 단어 뒤에 올 14 | 가장 자연스러운 한국어 조사 형태를 골라줍니다. 15 | 16 | ## 설치 17 | 18 | ```console 19 | $ pip install tossi 20 | ``` 21 | 22 | ## 사용법 23 | 24 | ```python 25 | >>> import tossi 26 | >>> tossi.postfix(u'집', u'(으)로') 27 | 집으로 28 | >>> tossi.postfix(u'말', u'으로는') 29 | 말로는 30 | >>> tossi.postfix(u'대한민국', u'은(는)') 31 | 대한민국은 32 | >>> tossi.postfix(u'민주공화국', u'다') 33 | 민주공화국이다 34 | ``` 35 | 36 | ## 자연스러운 조사 선택 37 | 38 | `의`, `도`, `만~`, `에~`, `께~`, `뿐~`, `하~`, `보다~`, `밖에~`, `같이~`, 39 | `부터~`, `까지~`, `마저~`, `조차~`, `마냥~`, `처럼~`, `커녕~`에는 어떤 단어가 40 | 앞서도 형태가 변하지 않습니다: 41 | 42 | > 나오**의**, 모리안**의**, 키홀**의**, 나오**도**, 모리안**도**, 키홀**도** 43 | 44 | 반면 `은(는)`, `이(가)`, `을(를)`, `과(와)~`는 앞선 단어의 마지막 음절의 받침 45 | 유무에 따라 형태가 달라집니다: 46 | 47 | > 나오**는**, 모리안**은**, 키홀**은** 48 | 49 | `(으)로~`도 비슷한 규칙을 따르지만 앞선 받침이 `ㄹ`일 경우엔 받침이 없는 것과 50 | 같게 취급합니다: 51 | 52 | > 나오**로**, 모리안**으로**, 키홀**로** 53 | 54 | 서술격 조사 `(이)다`는 어미가 활용되어 다양한 형태로 변형될 수 있습니다: 55 | 56 | > 나오**지만**, 모리안**이지만**, 키홀**이에요**, 나오**예요** 57 | 58 | 토씨는 가장 자연스러운 조사 형태를 선택합니다. 만약 어떤 형태가 자연스러운지 59 | 알 수 없을 때에는 `은(는)`, `(으)로`처럼 모든 형태를 병기합니다: 60 | 61 | ```python 62 | >>> tossi.postfix(u'벽돌', u'으로') 63 | 벽돌로 64 | >>> tossi.postfix(u'짚', u'으로') 65 | 짚으로 66 | >>> tossi.postfix(u'黃金', u'으로') 67 | 黃金(으)로 68 | ``` 69 | 70 | 단어가 숫자로 끝나더라도 자연스러운 조사 형태가 선택됩니다: 71 | 72 | ```python 73 | >>> tossi.postfix(u'레벨 10', u'이') 74 | 레벨 10이 75 | >>> tossi.postfix(u'레벨 999', u'이') 76 | 레벨 999가 77 | ``` 78 | 79 | 괄호 속 단어나 구두점은 조사 형태를 선택할 때 참고하지 않습니다: 80 | 81 | ```python 82 | >>> tossi.postfix(u'나뭇가지(만렙)', u'을') 83 | 나뭇가지(만렙)를 84 | ``` 85 | 86 | ## 병기 순서 87 | 88 | 조사의 형태를 모두 병기해야할 때 병기할 순서를 고를 수 있습니다. 가령 대부분의 89 | 인자가 일본어 단어일 경우엔 단어가 모음으로 끝날 확률이 높습니다. 이 경우 90 | 기본형인 `은(는)` 스타일보단 `는(은)` 스타일이 더 자연스러울 수 있습니다: 91 | 92 | ```python 93 | >>> tolerance_style = tossi.parse_tolerance_style(u'는(은)') 94 | >>> tossi.postfix(u'さくら', u'이', tolerance_style=tolerance_style) 95 | さくら가(이) 96 | ``` 97 | 98 | `은(는)`, `(은)는`, `는(은)`, `(는)은` 네 가지 스타일 중 프로젝트에 맞는 것을 99 | 고르세요. 100 | 101 | ## API 102 | 103 | ### `tossi.pick(word, morph) -> str` 104 | 105 | `word`에 자연스럽게 뒤따르는 조사 형태를 구합니다. 106 | 107 | ```python 108 | >>> tossi.pick(u'토씨', '은') 109 | 는 110 | >>> tossi.pick(u'우리말', '은') 111 | 은 112 | ``` 113 | 114 | ### `tossi.postfix(word, morph) -> str` 115 | 116 | 단어와 조사를 자연스럽게 연결합니다. 117 | 118 | ```python 119 | >>> tossi.postfix(u'토씨', '은') 120 | 토씨는 121 | >>> tossi.postfix(u'우리말', '은') 122 | 우리말은 123 | ``` 124 | 125 | ### `tossi.parse(morph) -> Particle` 126 | 127 | 문자열로 된 조사 표기로부터 조사 객체를 얻습니다. 

```python
>>> tossi.parse(u'으로')
<Particle: (으)로>
>>> tossi.parse(u'(은)는')
<Particle: 은(는)>
>>> tossi.parse(u'이면')
<Particle: (이)>
```

### `Particle[word[:morph]] -> str`

`word`에 뒤따르는 표기를 구합니다.

```python
>>> Eun = tossi.parse(u'은')
>>> Eun[u'라면']
은
>>> Eun[u'라볶이']
는
```

`morph`를 지정해서 어미에 변화를 줄 수 있습니다.

```python
>>> Euro = tossi.parse(u'으로')
>>> Euro[u'라면':u'으론']
으론
>>> Euro[u'라볶이':u'으론']
론
```

## 만든이와 사용권

[넥슨][nexon] [왓 스튜디오][what-studio]의 [이흥섭][sublee]과
[김찬웅][kexplo]이 만들었고 [제3조항을 포함하는 BSD 허가서][bsd-3-clause]를
채택했습니다.

[nexon]: http://nexon.com/
[what-studio]: https://github.com/what-studio
[sublee]: http://subl.ee/
[kexplo]: http://chanwoong.kim/
[bsd-3-clause]: http://opensource.org/licenses/BSD-3-Clause
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
[flake8]
ignore = E301, E731
import_order_style = google
application-import-names = tossi

[pytest]
python_files = test.py
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
import os

from setuptools import find_packages, setup


# Include __about__.py.
__dir__ = os.path.dirname(__file__)
about = {}
with open(os.path.join(__dir__, 'tossi', '__about__.py')) as f:
    exec(f.read(), about)


setup(
    name='tossi',
    version=about['__version__'],
    license=about['__license__'],
    author=about['__author__'],
    maintainer=about['__maintainer__'],
    maintainer_email=about['__maintainer_email__'],
    url='https://github.com/what-studio/tossi',
    description='Supports Korean particles',
    platforms='any',
    packages=find_packages(),
    zip_safe=False,
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'Intended Audience :: Science/Research',
        'License :: OSI Approved :: BSD License',
        'Natural Language :: Korean',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: Implementation :: CPython',
        'Programming Language :: Python :: Implementation :: PyPy',
        'Topic :: Software Development :: Libraries :: Python Modules',
        'Topic :: Software Development :: Localization',
        'Topic :: Text Processing :: Linguistic',
    ],
    install_requires=['bidict', 'six'],
)
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
import functools

import pytest
from six import PY2, text_type as str, with_metaclass

import tossi
from tossi import postfix as f, registry
from tossi.coda import pick_coda_from_decimal
from tossi.hangul import join_phonemes, split_phonemes
from tossi.particles import Euro, Ida, Particle, SingletonParticleMeta
from tossi.tolerance import (
    generate_tolerances, get_tolerance,
    get_tolerance_from_iterator,
    MORPH1_AND_OPTIONAL_MORPH2, OPTIONAL_MORPH2_AND_MORPH1,
    parse_tolerance_style)


Eun = tossi.parse(u'은')
Eul = tossi.parse(u'을')
Gwa = tossi.parse(u'과')


def test_about():
    __import__('tossi.__about__')


def test_particle():
    assert str(Eun) == u'은(는)'
    assert str(Eul) == u'을(를)'
    assert str(Ida) == u'(이)'
    if PY2:
        try:
            __import__('unidecode')
        except ImportError:
            assert repr(Ida) == u"<Particle: u'(\\uc774)'>"
        else:
            assert repr(Ida) == u'<Particle: (i)>'
    else:
        assert repr(Ida) == u'<Particle: (이)>'


def test_frontend():
    assert tossi.parse(u'을') is Eul
    assert tossi.parse(u'를') is Eul
    assert tossi.parse(u'을(를)') is Eul
    assert tossi.parse(u'이다') is Ida
    assert tossi.parse(u'이었다') is Ida


def test_split_phonemes():
    assert split_phonemes(u'쏚') == (u'ㅆ', u'ㅗ', u'ㄲ')
    assert split_phonemes(u'섭') == (u'ㅅ', u'ㅓ', u'ㅂ')
    assert split_phonemes(u'투') == (u'ㅌ', u'ㅜ', u'')
    assert split_phonemes(u'투', onset=False) == (None, u'ㅜ', u'')
    with pytest.raises(ValueError):
        split_phonemes(u'X')
    with pytest.raises(ValueError):
        split_phonemes(u'섭섭')


def test_join_phonemes():
    assert join_phonemes(u'ㅅ', u'ㅓ', u'ㅂ') == u'섭'
    assert join_phonemes((u'ㅅ', u'ㅓ', u'ㅂ')) == u'섭'
    assert join_phonemes(u'ㅊ', u'ㅠ') == u'츄'
    assert join_phonemes(u'ㅊ', u'ㅠ', u'') == u'츄'
    assert join_phonemes((u'ㅊ', u'ㅠ')) == u'츄'
    with pytest.raises(TypeError):
        join_phonemes(u'ㄷ', u'ㅏ', u'ㄹ', u'ㄱ')


def test_particle_tolerances():
    t = lambda _1, _2: set(generate_tolerances(_1, _2))
    s = lambda x: set(x.split())
    assert t(u'이', u'가') == s(u'이(가) (이)가 가(이) (가)이')
    assert t(u'이', u'') == s(u'(이)')
    assert t(u'으로', u'로') == s(u'(으)로')
    assert t(u'이여', u'여') == s(u'(이)여')
    assert t(u'이시여', u'시여') == s(u'(이)시여')
    assert t(u'아', u'야') == s(u'아(야) (아)야 야(아) (야)아')
    assert \
        t(u'가나다', u'나나다') == \
        s(u'가(나)나다 (가)나나다 나(가)나다 (나)가나다')
    assert \
        t(u'가나다', u'마바사') == \
        s(u'가나다(마바사) (가나다)마바사 마바사(가나다) (마바사)가나다')


def test_euro():
    assert Euro[u'나오'] == u'로'
    assert Euro[u'키홀'] == u'로'
    assert Euro[u'모리안'] == u'으로'
    assert Euro[u'Nao'] == u'(으)로'
    assert Euro[u'나오':u'로서'] == u'로서'
    assert Euro[u'키홀':u'로서'] == u'로서'
    assert Euro[u'모리안':u'로서'] == u'으로서'
    assert Euro[u'나오':u'로써'] == u'로써'
    assert Euro[u'키홀':u'로써'] == u'로써'
    assert Euro[u'모리안':u'로써'] == u'으로써'
    assert Euro[u'나오':u'로부터'] == u'로부터'
    assert Euro[u'키홀':u'로부터'] == u'로부터'
    assert Euro[u'모리안':u'로부터'] == u'으로부터'
    assert Euro[u'나오':u'(으)로부터의'] == u'로부터의'
    assert Euro[u'밖':u'론'] == u'으론'


def test_combinations():
    assert f(u'이 방법', u'만으로는') == u'이 방법만으로는'
    assert f(u'나', u'조차도') == u'나조차도'
    assert f(u'그 친구', u'과는') == u'그 친구와는'
    assert f(u'그것', u'와는') == u'그것과는'
    assert f(u'사건', u'과(와)는') == u'사건과는'
    assert f(u'그 친구', u'관') == u'그 친구완'


def test_exceptions():
    # Empty.
    assert f(u'', u'를') == u'을(를)'
    # Onsets only.
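    # A lone jamo like u'ㅋ' is outside the Hangul syllable range 가-힣,
    # so no coda can be guessed and the tolerant form is chosen.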
119 | assert f(u'ㅋㅋㅋ', u'를') == u'ㅋㅋㅋ을(를)' 120 | 121 | 122 | def test_insignificant(): 123 | assert f(u'나오(Lv.25)', u'으로') == u'나오(Lv.25)로' 124 | assert f(u'나오 (Lv.25)', u'을') == u'나오 (Lv.25)를' 125 | assert f(u'나(?)오', u'으로') == u'나(?)오로' 126 | assert f(u'헬로월드!', u'으로') == u'헬로월드!로' 127 | assert f(u'?_?', u'으로') == u'?_?(으)로' 128 | assert f(u'임창정,,,', u'가') == u'임창정,,,이' 129 | assert f(u'《듀랑고》', u'을') == u'《듀랑고》를' 130 | assert f(u'불완전괄호)', u'은') == u'불완전괄호)는' 131 | assert f(u'이상한괄호)))', u'는') == u'이상한괄호)))는' 132 | assert f(u'이상한괄호)()', u'은') == u'이상한괄호)()는' 133 | assert f(u'이상한괄호())', u'(는)은') == u'이상한괄호())는' 134 | assert f(u'^_^', u'이었다.') == u'^_^(이)었다.' 135 | assert f(u'웃는얼굴^_^', u'이었다.') == u'웃는얼굴^_^이었다.' 136 | assert f(u'폭탄(가짜)...', u'이었다.') == u'폭탄(가짜)...이었다.' 137 | assert f(u'16(7)?!', u'으로') == u'16(7)?!으로' 138 | assert f(u'7(16)?!', u'으로') == u'7(16)?!로' 139 | assert f(u'검색\ue000', u'를') == u'검색\ue000을' 140 | 141 | 142 | def test_only_parentheses(): 143 | assert f(u'(1, 2)', u'를') == u'(1, 2)를' 144 | assert f(u'(2, 3)', u'를') == u'(2, 3)을' 145 | 146 | 147 | def test_vocative_particles(): 148 | assert f(u'친구', u'야') == u'친구야' 149 | assert f(u'사랑', u'야') == u'사랑아' 150 | assert f(u'사랑', u'아') == u'사랑아' 151 | assert f(u'친구', u'여') == u'친구여' 152 | assert f(u'사랑', u'여') == u'사랑이여' 153 | assert f(u'하늘', u'이시여') == u'하늘이시여' 154 | assert f(u'바다', u'이시여') == u'바다시여' 155 | 156 | 157 | def test_ida(): 158 | """Cases for '이다' which is a copulative and existential verb.""" 159 | # Do or don't inject '이'. 160 | assert f(u'나오', u'이다') == u'나오다' 161 | assert f(u'키홀', u'이다') == u'키홀이다' 162 | # Merge with the following vowel as /j/. 163 | assert f(u'나오', u'이에요') == u'나오예요' 164 | assert f(u'키홀', u'이에요') == u'키홀이에요' 165 | # No allomorphs. 166 | assert f(u'나오', u'입니다') == u'나오입니다' 167 | assert f(u'키홀', u'입니다') == u'키홀입니다' 168 | # Give up to select an allomorph. 169 | assert f(u'God', u'이다') == u'God(이)다' 170 | assert f(u'God', u'이에요') == u'God(이)에요' 171 | assert f(u'God', u'입니다') == u'God입니다' 172 | assert f(u'God', u'였습니다') == u'God(이)었습니다' 173 | # Many examples. 
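    # 나오 (vowel-final) drops or fuses 이; 키홀 (consonant-final) keeps it,
    # and 여/예 lengthen back to 이어/이에 after a final consonant.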
174 | assert f(u'키홀', u'였습니다') == u'키홀이었습니다' 175 | assert f(u'나오', u'였습니다') == u'나오였습니다' 176 | assert f(u'나오', u'이었다') == u'나오였다' 177 | assert f(u'나오', u'이었지만') == u'나오였지만' 178 | assert f(u'나오', u'이지만') == u'나오지만' 179 | assert f(u'키홀', u'이지만') == u'키홀이지만' 180 | assert f(u'나오', u'지만') == u'나오지만' 181 | assert f(u'키홀', u'지만') == u'키홀이지만' 182 | assert f(u'나오', u'다') == u'나오다' 183 | assert f(u'키홀', u'다') == u'키홀이다' 184 | assert f(u'나오', u'이에요') == u'나오예요' 185 | assert f(u'키홀', u'이에요') == u'키홀이에요' 186 | assert f(u'나오', u'고') == u'나오고' 187 | assert f(u'키홀', u'고') == u'키홀이고' 188 | assert f(u'모리안', u'고') == u'모리안이고' 189 | assert f(u'나오', u'여서') == u'나오여서' 190 | assert f(u'키홀', u'여서') == u'키홀이어서' 191 | assert f(u'나오', u'이어서') == u'나오여서' 192 | assert f(u'키홀', u'라고라') == u'키홀이라고라' 193 | assert f(u'키홀', u'든지') == u'키홀이든지' 194 | assert f(u'키홀', u'던가') == u'키홀이던가' 195 | assert f(u'키홀', u'여도') == u'키홀이어도' 196 | assert f(u'키홀', u'야말로') == u'키홀이야말로' 197 | assert f(u'키홀', u'인양') == u'키홀인양' 198 | assert f(u'나오', u'인양') == u'나오인양' 199 | 200 | 201 | def test_invariant_particles(): 202 | assert f(u'나오', u'도') == u'나오도' 203 | assert f(u'모리안', u'도') == u'모리안도' 204 | assert f(u'판교', u'에서') == u'판교에서' 205 | assert f(u'판교', u'에서는') == u'판교에서는' 206 | assert f(u'선생님', u'께서도') == u'선생님께서도' 207 | assert f(u'나오', u'의') == u'나오의' 208 | assert f(u'모리안', u'만') == u'모리안만' 209 | assert f(u'키홀', u'하고') == u'키홀하고' 210 | assert f(u'콩', u'만큼') == u'콩만큼' 211 | assert f(u'콩', u'마냥') == u'콩마냥' 212 | assert f(u'콩', u'처럼') == u'콩처럼' 213 | 214 | 215 | def test_tolerances(): 216 | assert f(u'나오', u'은(는)') == u'나오는' 217 | assert f(u'나오', u'(은)는') == u'나오는' 218 | assert f(u'나오', u'는(은)') == u'나오는' 219 | assert f(u'나오', u'(는)은') == u'나오는' 220 | 221 | 222 | def test_decimal(): 223 | assert f(u'레벨30', u'이') == u'레벨30이' 224 | assert f(u'레벨34', u'이') == u'레벨34가' 225 | assert f(u'레벨7', u'으로') == u'레벨7로' 226 | assert f(u'레벨42', u'으로') == u'레벨42로' 227 | assert f(u'레벨100', u'으로') == u'레벨100으로' 228 | assert pick_coda_from_decimal('1') == u'ㄹ' 229 | assert pick_coda_from_decimal('2') == u'' 230 | assert pick_coda_from_decimal('3') == u'ㅁ' 231 | assert pick_coda_from_decimal('10') == u'ㅂ' 232 | assert pick_coda_from_decimal('16') == u'ㄱ' 233 | assert pick_coda_from_decimal('19') == u'' 234 | assert pick_coda_from_decimal('200') == u'ㄱ' 235 | assert pick_coda_from_decimal('30000') == u'ㄴ' 236 | assert pick_coda_from_decimal('400000') == u'ㄴ' 237 | assert pick_coda_from_decimal('500000000') == u'ㄱ' 238 | assert pick_coda_from_decimal('1' + '0' * 50) == u'ㄱ' 239 | assert pick_coda_from_decimal('1' + '0' * 100) is None 240 | assert pick_coda_from_decimal('0') == u'ㅇ' 241 | assert pick_coda_from_decimal('1.0') == u'ㅇ' 242 | assert pick_coda_from_decimal('1.234567890') == u'ㅇ' 243 | assert pick_coda_from_decimal('3.14') == u'' 244 | 245 | 246 | def test_match(): 247 | # (n)eun 248 | assert Eun.match(u'은') == u'' 249 | assert Eun.match(u'는') == u'' 250 | assert Eun.match(u'은(는)') == u'' 251 | assert Eun.match(u'는(은)') == u'' 252 | assert Eun.match(u'(은)는') == u'' 253 | assert Eun.match(u'(는)은') == u'' 254 | assert Eun.match(u'는는') == u'는' 255 | # (r)eul (final=True) 256 | assert Eul.match(u'를') == u'' 257 | assert Eul.match(u'을을') is None 258 | # (g)wa 259 | assert Gwa.match(u'과') == u'' 260 | assert Gwa.match(u'과는') == u'는' 261 | assert Gwa.match(u'관') == u'ㄴ' 262 | # (eu)ro 263 | assert Euro.match(u'으로도') == u'도' 264 | assert Euro.match(u'론') == u'ㄴ' 265 | 266 | 267 | def test_combine(): 268 | assert Euro[u'집':u'로'] == u'으로' 269 | assert Euro[u'집':u'론'] == 
u'으론'
    assert Euro[u'집':u'로는'] == u'으로는'
    assert Euro[u'집':u'론123'] == u'으론123'


def test_tolerances_for_coda_combination():
    assert Euro[u'Hello':u'론'] == u'(으)론'
    assert Gwa[u'Hello':u'완'] == u'관(완)'
    assert Gwa[u'Hello':u'완':OPTIONAL_MORPH2_AND_MORPH1] == u'(완)관'
    assert Gwa[u'Hello':u'완완완'] == u'관(완)완완'
    assert Particle(u'크', u'')[u'Hello':u'큰큰'] == u'(큰)큰'


def test_igyuho2006():
    """Particles from I Gyu-ho, 2006."""
    def ff(particle_string):
        return f(u'남', particle_string), f(u'나', particle_string)
    # p181-182:
    assert ff(u'의') == (u'남의', u'나의')
    assert ff(u'과') == (u'남과', u'나와')
    assert ff(u'와') == (u'남과', u'나와')
    assert ff(u'하고') == (u'남하고', u'나하고')
    assert ff(u'이랑') == (u'남이랑', u'나랑')
    assert ff(u'이니') == (u'남이니', u'나니')
    assert ff(u'이다') == (u'남이다', u'나다')
    assert ff(u'이라든가') == (u'남이라든가', u'나라든가')
    assert ff(u'이라든지') == (u'남이라든지', u'나라든지')
    assert ff(u'이며') == (u'남이며', u'나며')
    assert ff(u'이야') == (u'남이야', u'나야')
    assert ff(u'이요') == (u'남이요', u'나요')
    assert ff(u'이랴') == (u'남이랴', u'나랴')
    assert ff(u'에') == (u'남에', u'나에')
    assert ff(u'하며') == (u'남하며', u'나하며')
    assert ff(u'커녕') == (u'남커녕', u'나커녕')
    assert ff(u'은커녕') == (u'남은커녕', u'나는커녕')
    assert ff(u'이고') == (u'남이고', u'나고')
    assert ff(u'이나') == (u'남이나', u'나나')
    assert ff(u'에다') == (u'남에다', u'나에다')
    assert ff(u'에다가') == (u'남에다가', u'나에다가')
    assert ff(u'이란') == (u'남이란', u'나란')
    assert ff(u'이면') == (u'남이면', u'나면')
    assert ff(u'이거나') == (u'남이거나', u'나거나')
    assert ff(u'이건') == (u'남이건', u'나건')
    assert ff(u'이든') == (u'남이든', u'나든')
    assert ff(u'이든가') == (u'남이든가', u'나든가')
    assert ff(u'이든지') == (u'남이든지', u'나든지')
    assert ff(u'인가') == (u'남인가', u'나인가')
    assert ff(u'인지') == (u'남인지', u'나인지')
    # p188-189:
    assert ff(u'인') == (u'남인', u'나인')
    assert ff(u'는') == (u'남은', u'나는')
    assert ff(u'이라는') == (u'남이라는', u'나라는')
    assert ff(u'이네') == (u'남이네', u'나네')
    assert ff(u'도') == (u'남도', u'나도')
    assert ff(u'이면서') == (u'남이면서', u'나면서')
    assert ff(u'이자') == (u'남이자', u'나자')
    assert ff(u'하고도') == (u'남하고도', u'나하고도')
    assert ff(u'이냐') == (u'남이냐', u'나냐')


def test_tolerance_style():
    assert Gwa[u'Hello'::OPTIONAL_MORPH2_AND_MORPH1] == u'(와)과'
    assert parse_tolerance_style(0) == MORPH1_AND_OPTIONAL_MORPH2
    assert parse_tolerance_style(u'을(를)') == MORPH1_AND_OPTIONAL_MORPH2
    assert parse_tolerance_style(u'(를)을') == OPTIONAL_MORPH2_AND_MORPH1
    with pytest.raises(ValueError):
        parse_tolerance_style(u'과')
    with pytest.raises(ValueError):
        parse_tolerance_style(u'이다')
    with pytest.raises(ValueError):
        parse_tolerance_style(u'(이)')
    assert get_tolerance([u'예제'], OPTIONAL_MORPH2_AND_MORPH1) == u'예제'
    assert get_tolerance_from_iterator(iter([u'예제']),
                                       OPTIONAL_MORPH2_AND_MORPH1) == u'예제'


def test_static_tolerance_style():
    assert f(u'나오', u'을', tolerance_style=u'을/를') == u'나오를'
    assert f(u'키홀', u'를', tolerance_style=u'을/를') == u'키홀을'
    assert f(u'Tossi', u'을', tolerance_style=u'을/를') == u'Tossi을/를'


def test_pick():
    assert tossi.pick(u'나오', u'을') == u'를'
    assert tossi.pick(u'키홀', u'를') == u'을'
    assert tossi.pick(u'남', u'면서') == u'이면서'
    assert tossi.pick(u'Tossi', u'을') == u'을(를)'
    assert tossi.pick(u'Tossi', u'을',
                      tolerance_style=u'을/를') == u'을/를'


def test_custom_guess_coda():
    def dont_guess_coda(word):
        return None
    assert Euro.allomorph(u'밖', u'으로',
                          guess_coda=dont_guess_coda) == u'(으)로'


def test_unmatch():
    assert Eul[u'예제':u'는'] is None


def test_formatter():
    t = u'{0:으로} {0:을}'
    f1 = functools.partial(tossi.Formatter(registry).format, t)
    f2 = functools.partial(tossi.format, t)
    assert f1(u'나오') == f2(u'나오') == u'나오로 나오를'
    assert f1(u'키홀') == f2(u'키홀') == u'키홀로 키홀을'
    assert f1(u'모리안') == f2(u'모리안') == u'모리안으로 모리안을'


def test_singleton_error():
    with pytest.raises(TypeError):
        class Fail(with_metaclass(SingletonParticleMeta, object)):
            pass


def test_deprecations():
    pytest.deprecated_call(registry.postfix_particle, u'테스트', u'으로부터')
    pytest.deprecated_call(tossi.postfix_particle, u'테스트', u'으로부터')
    pytest.deprecated_call(tossi.get_particle, u'으로부터')
--------------------------------------------------------------------------------
/tossi/__about__.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi.__about__
~~~~~~~~~~~~~~~
"""
__version__ = '0.3.1'
__license__ = 'BSD'
__author__ = 'What! Studio'
__maintainer__ = 'Heungsub Lee'
__maintainer_email__ = 'sub@nexon.co.kr'
--------------------------------------------------------------------------------
/tossi/__init__.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi
~~~~~

Supports Korean particles.

:copyright: (c) 2016-2017 by What! Studio
:license: BSD, see LICENSE for more details.

"""
import re
import warnings

from tossi.coda import guess_coda
from tossi.formatter import Formatter
from tossi.particles import Euro, Ida, Particle
from tossi.tolerance import (
    MORPH1_AND_OPTIONAL_MORPH2, MORPH2_AND_OPTIONAL_MORPH1,
    OPTIONAL_MORPH1_AND_MORPH2, OPTIONAL_MORPH2_AND_MORPH1,
    parse_tolerance_style)


__all__ = ['get_particle', 'guess_coda', 'MORPH1_AND_OPTIONAL_MORPH2',
           'MORPH2_AND_OPTIONAL_MORPH1', 'OPTIONAL_MORPH1_AND_MORPH2',
           'OPTIONAL_MORPH2_AND_MORPH1', 'parse', 'parse_tolerance_style',
           'Particle', 'pick', 'postfix', 'postfix_particle',
           'Formatter', 'format']


def index_particles(particles):
    """Indexes :class:`Particle` objects. It returns a regex pattern that
    matches any particle morph, and a dictionary that maps each regex group
    name to the index of its particle.
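    For example, given two particles the compiled pattern looks like
    ``(?P<_0>...)|(?P<_1>...)`` and the dictionary is ``{'_0': 0, '_1': 1}``.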
35 | """ 36 | patterns, indices = [], {} 37 | for x, p in enumerate(particles): 38 | group = u'_%d' % x 39 | indices[group] = x 40 | patterns.append(u'(?P<%s>%s)' % (group, p.regex_pattern())) 41 | pattern = re.compile(u'|'.join(patterns)) 42 | return pattern, indices 43 | 44 | 45 | class ParticleRegistry(object): 46 | 47 | __slots__ = ('default', 'particles', 'pattern', 'indices') 48 | 49 | def __init__(self, default, particles): 50 | self.default = default 51 | self.particles = particles 52 | self.pattern, self.indices = index_particles(particles) 53 | 54 | def _get_by_match(self, match): 55 | x = self.indices[match.lastgroup] 56 | return self.particles[x] 57 | 58 | def parse(self, morph): 59 | m = self.pattern.match(morph) 60 | if m is None: 61 | return self.default 62 | return self._get_by_match(m) 63 | 64 | def pick(self, word, morph, **kwargs): 65 | particle = self.parse(morph) 66 | return particle.allomorph(word, morph, **kwargs) 67 | 68 | def postfix(self, word, morph, **kwargs): 69 | return word + self.pick(word, morph, **kwargs) 70 | 71 | def get(self, morph): 72 | warnings.warn(DeprecationWarning('Use parse() instead')) 73 | return self.parse(morph) 74 | 75 | def postfix_particle(self, word, morph, **kwargs): 76 | warnings.warn(DeprecationWarning('Use postfix() instead')) 77 | return self.postfix(word, morph, **kwargs) 78 | 79 | 80 | #: The default registry for well-known Korean particles. 81 | registry = ParticleRegistry(Ida, [ 82 | # Simple allomorphic rule: 83 | Particle(u'이', u'가', final=True), 84 | Particle(u'을', u'를', final=True), 85 | Particle(u'은', u'는'), # "은(는)" includes "은(는)커녕". 86 | Particle(u'과', u'와'), 87 | # Vocative particles: 88 | Particle(u'아', u'야', final=True), 89 | Particle(u'이여', u'여', final=True), 90 | Particle(u'이시여', u'시여', final=True), 91 | # Invariant particles: 92 | Particle(u'의', final=True), 93 | Particle(u'도', final=True), 94 | Particle(u'만'), 95 | Particle(u'에'), 96 | Particle(u'께'), 97 | Particle(u'뿐'), 98 | Particle(u'하'), 99 | Particle(u'보다'), 100 | Particle(u'밖에'), 101 | Particle(u'같이'), 102 | Particle(u'부터'), 103 | Particle(u'까지'), 104 | Particle(u'마저'), 105 | Particle(u'조차'), 106 | Particle(u'마냥'), 107 | Particle(u'처럼'), 108 | Particle(u'커녕'), 109 | # Special particles: 110 | Euro, 111 | ]) 112 | formatter = Formatter(registry) 113 | 114 | 115 | def parse(morph): 116 | """Shortcut for :class:`ParticleRegistry.parse` of the default registry.""" 117 | return registry.parse(morph) 118 | 119 | 120 | def pick(word, morph, **kwargs): 121 | """Shortcut for :class:`ParticleRegistry.pick` of the default registry. 122 | """ 123 | return registry.pick(word, morph, **kwargs) 124 | 125 | 126 | def postfix(word, morph, **kwargs): 127 | """Shortcut for :class:`ParticleRegistry.postfix` of the default registry. 128 | """ 129 | return registry.postfix(word, morph, **kwargs) 130 | 131 | 132 | def get_particle(morph): 133 | warnings.warn(DeprecationWarning('Use parse() instead')) 134 | return parse(morph) 135 | 136 | 137 | def postfix_particle(word, morph, **kwargs): 138 | warnings.warn(DeprecationWarning('Use postfix() instead')) 139 | return postfix(word, morph, **kwargs) 140 | 141 | 142 | def format(message, *args, **kwargs): 143 | """Shortcut for :class:`tossi.Formatter.format` of the default registry. 
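
    >>> tossi.format(u'{0:이} {1:을} 만났다', u'키홀', u'나오')
    키홀이 나오를 만났다
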
    """
    return formatter.vformat(message, args, kwargs)
--------------------------------------------------------------------------------
/tossi/coda.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi.coda
~~~~~~~~~~

A coda is the final consonant of a Korean syllable. It is important because
it is required when determining a particle allomorph in Korean.

This module implements :func:`guess_coda` and related functions to guess
the coda of any word as correctly as possible.

:copyright: (c) 2016-2017 by What! Studio
:license: BSD, see LICENSE for more details.

"""
from bisect import bisect_right
from decimal import Decimal
import re
import unicodedata

from tossi.hangul import split_phonemes


__all__ = ['filter_only_significant', 'guess_coda',
           'guess_coda_from_significant_word', 'pick_coda_from_decimal',
           'pick_coda_from_letter']


#: Matches a decimal at the end of a word.
DECIMAL_PATTERN = re.compile(r'[0-9]+(\.[0-9]+)?$')


def guess_coda(word):
    """Guesses the coda of the given word as correctly as possible. If it
    fails to guess the coda, returns ``None``.
    """
    word = filter_only_significant(word)
    return guess_coda_from_significant_word(word)


def guess_coda_from_significant_word(word):
    if not word:
        return None
    decimal_m = DECIMAL_PATTERN.search(word)
    if decimal_m:
        return pick_coda_from_decimal(decimal_m.group(0))
    return pick_coda_from_letter(word[-1])


# Patterns which match significant or insignificant letters at the end of
# words.
INSIGNIFICANT_PARENTHESIS_PATTERN = re.compile(r'\(.*?\)$')
SIGNIFICANT_UNICODE_CATEGORY_PATTERN = re.compile(r'^([LN].|S[cmo])$')


def filter_only_significant(word):
    """Strips insignificant letters from the end of the given word::

        >>> filter_only_significant(u'넥슨(코리아)')
        넥슨
        >>> filter_only_significant(u'메이플스토리...')
        메이플스토리

    """
    if not word:
        return word
    # Unwrap a complete parenthesis.
    if word.startswith(u'(') and word.endswith(u')'):
        return filter_only_significant(word[1:-1])
    x = len(word)
    while x > 0:
        x -= 1
        c = word[x]
        # Skip a complete parenthesis.
        if c == u')':
            m = INSIGNIFICANT_PARENTHESIS_PATTERN.search(word[:x + 1])
            if m is not None:
                x = m.start()
            continue
        # Skip unreadable characters such as punctuation.
        unicode_category = unicodedata.category(c)
        if not SIGNIFICANT_UNICODE_CATEGORY_PATTERN.match(unicode_category):
            continue
        break
    return word[:x + 1]


def pick_coda_from_letter(letter):
    """Picks only a coda from a Hangul letter. It returns ``None`` if the
    given letter is not Hangul.
    """
    try:
        __, __, coda = \
            split_phonemes(letter, onset=False, nucleus=False, coda=True)
    except ValueError:
        return None
    else:
        return coda


# Data for picking coda from a decimal.
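# Decimals are read out in Sino-Korean: the coda of the last syllable read
# aloud decides the particle form, e.g. 3 -> "삼" (coda ㅁ), 10 -> "십" (ㅂ),
# 100 -> "백" (ㄱ), 1000 -> "천" (ㄴ), 10000 -> "만" (ㄴ).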
DIGITS = u'영일이삼사오육칠팔구'
EXPS = {1: u'십', 2: u'백', 3: u'천', 4: u'만',
        8: u'억', 12: u'조', 16: u'경', 20: u'해',
        24: u'자', 28: u'양', 32: u'구', 36: u'간',
        40: u'정', 44: u'재', 48: u'극', 52: u'항하사',
        56: u'아승기', 60: u'나유타', 64: u'불가사의', 68: u'무량대수',
        72: u'겁', 76: u'업'}
DIGIT_CODAS = [pick_coda_from_letter(x[-1]) for x in DIGITS]
EXP_CODAS = {exp: pick_coda_from_letter(x[-1]) for exp, x in EXPS.items()}
EXP_INDICES = list(sorted(EXPS.keys()))


# Mark the first unreadable exponent.
_unreadable_exp = max(EXP_INDICES) + 4
EXP_CODAS[_unreadable_exp] = None
EXP_INDICES.append(_unreadable_exp)
del _unreadable_exp


def pick_coda_from_decimal(decimal):
    """Picks only a coda from a decimal."""
    decimal = Decimal(decimal)
    __, digits, exp = decimal.as_tuple()
    if exp < 0:
        return DIGIT_CODAS[digits[-1]]
    __, digits, exp = decimal.normalize().as_tuple()
    index = bisect_right(EXP_INDICES, exp) - 1
    if index < 0:
        return DIGIT_CODAS[digits[-1]]
    else:
        return EXP_CODAS[EXP_INDICES[index]]
--------------------------------------------------------------------------------
/tossi/formatter.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi.formatter
~~~~~~~~~~~~~~~

String formatter for Tossi.

:copyright: (c) 2016-2017 by What! Studio
:license: BSD, see LICENSE for more details.

"""
import re
from string import Formatter as StringFormatter


class Formatter(StringFormatter):
    """A string formatter that supports the Tossi format spec.

    >>> f = Formatter(tossi.registry)
    >>> t = u'{0:으로} {0:을}'
    >>> assert f.format(t, u'나오') == u'나오로 나오를'
    >>> assert f.format(t, u'키홀') == u'키홀로 키홀을'
    >>> assert f.format(t, u'모리안') == u'모리안으로 모리안을'
    """
    hangul_pattern = re.compile(u'[가-힣]+')

    def __init__(self, registry):
        self.registry = registry

    def format_field(self, value, format_spec):
        if re.match(self.hangul_pattern, format_spec):
            return self.registry.postfix(value, format_spec)
        else:
            return super(Formatter, self).format_field(value, format_spec)
--------------------------------------------------------------------------------
/tossi/hangul.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi.hangul
~~~~~~~~~~~~

Manipulates Hangul letters.

:copyright: (c) 2016-2017 by What! Studio
:license: BSD, see LICENSE for more details.

"""
from six import unichr


__all__ = ['combine_words', 'is_consonant', 'is_hangul', 'join_phonemes',
           'split_phonemes']


# Korean phonemes, also known as 자소, including
# onset(초성), nucleus(중성), and coda(종성).
ONSETS = list(u'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ')
NUCLEUSES = list(u'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ')
CODAS = [u'']
CODAS.extend(u'ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ')

# Lengths of the phonemes.
NUM_ONSETS = len(ONSETS)
NUM_NUCLEUSES = len(NUCLEUSES)
NUM_CODAS = len(CODAS)

#: The Unicode offset of "가", which is the base offset for all Hangul letters.
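#: A Hangul syllable is encoded as FIRST_HANGUL_OFFSET +
#: (onset_index * NUM_NUCLEUSES + nucleus_index) * NUM_CODAS + coda_index,
#: with 21 nucleuses and 28 codas; join_phonemes() and split_phonemes()
#: below are inverses of each other over this formula.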
FIRST_HANGUL_OFFSET = ord(u'가')


def is_hangul(letter):
    return u'가' <= letter <= u'힣'


def is_consonant(letter):
    return u'ㄱ' <= letter <= u'ㅎ'


def join_phonemes(*args):
    """Joins a Hangul letter from Korean phonemes."""
    # Normalize arguments as onset, nucleus, coda.
    if len(args) == 1:
        # tuple of (onset, nucleus[, coda])
        args = args[0]
    if len(args) == 2:
        args += (CODAS[0],)
    try:
        onset, nucleus, coda = args
    except ValueError:
        raise TypeError('join_phonemes() takes at most 3 arguments')
    offset = (
        (ONSETS.index(onset) * NUM_NUCLEUSES + NUCLEUSES.index(nucleus)) *
        NUM_CODAS + CODAS.index(coda)
    )
    return unichr(FIRST_HANGUL_OFFSET + offset)


def split_phonemes(letter, onset=True, nucleus=True, coda=True):
    """Splits Korean phonemes, also known as "자소", from a Hangul letter.

    :returns: (onset, nucleus, coda)
    :raises ValueError: `letter` is not a single Hangul letter.

    """
    if len(letter) != 1 or not is_hangul(letter):
        raise ValueError('Not Hangul letter: %r' % letter)
    offset = ord(letter) - FIRST_HANGUL_OFFSET
    phonemes = [None] * 3
    if onset:
        phonemes[0] = ONSETS[offset // (NUM_NUCLEUSES * NUM_CODAS)]
    if nucleus:
        phonemes[1] = NUCLEUSES[(offset // NUM_CODAS) % NUM_NUCLEUSES]
    if coda:
        phonemes[2] = CODAS[offset % NUM_CODAS]
    return tuple(phonemes)


def combine_words(word1, word2):
    """Combines two words. If the first word ends with a vowel and the second
    word begins with a lone consonant, they are merged into a single letter::

        >>> combine_words(u'다', u'ㄺ')
        닭
        >>> combine_words(u'가오', u'ㄴ누리')
        가온누리

    """
    if word1 and word2 and is_consonant(word2[0]):
        onset, nucleus, coda = split_phonemes(word1[-1])
        if not coda:
            glue = join_phonemes(onset, nucleus, word2[0])
            return word1[:-1] + glue + word2[1:]
    return word1 + word2
--------------------------------------------------------------------------------
/tossi/particles.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi.particles
~~~~~~~~~~~~~~~

Models for Korean allomorphic particles.

:copyright: (c) 2016-2017 by What! Studio
:license: BSD, see LICENSE for more details.

"""
from itertools import chain
import re

from bidict import bidict
from six import PY2, python_2_unicode_compatible, text_type, with_metaclass

from tossi.coda import guess_coda, pick_coda_from_letter
from tossi.hangul import (
    combine_words, is_consonant, join_phonemes, split_phonemes)
from tossi.tolerance import (
    generate_tolerances, get_tolerance, get_tolerance_from_iterator,
    MORPH1_AND_OPTIONAL_MORPH2)
from tossi.utils import cached_property, CacheMeta


__all__ = ['DEFAULT_GUESS_CODA', 'DEFAULT_TOLERANCE_STYLE',
           'Euro', 'Ida', 'Particle']


#: The default tolerance style.
DEFAULT_TOLERANCE_STYLE = MORPH1_AND_OPTIONAL_MORPH2

#: The default function to guess the coda from a word.
DEFAULT_GUESS_CODA = guess_coda


@python_2_unicode_compatible
class Particle(with_metaclass(CacheMeta)):
    """Represents a Korean particle, also known as "조사".

    This also implements the general allomorphic rule for most common
    particles.

    :param morph1: an allomorph after a consonant.
    :param morph2: an allomorph after a vowel. If it is omitted, there's no
                   alternative allomorph, so `morph1` will always be
                   selected.
    :param final: whether the particle disallows combination with other
                  postpositions. (default: ``False``)

    """

    __slots__ = ('morph1', 'morph2', 'final')

    def __init__(self, morph1, morph2=None, final=False):
        self.morph1 = morph1
        self.morph2 = morph1 if morph2 is None else morph2
        self.final = final

    @cached_property
    def tolerances(self):
        """The tuple containing all the possible tolerant morphs."""
        return tuple(generate_tolerances(self.morph1, self.morph2))

    def tolerance(self, style=DEFAULT_TOLERANCE_STYLE):
        """Gets a tolerant morph."""
        return get_tolerance(self.tolerances, style)

    def rule(self, coda):
        """Determines one of the allomorphic morphs based on a coda."""
        if coda:
            return self.morph1
        else:
            return self.morph2

    def allomorph(self, word, morph, tolerance_style=DEFAULT_TOLERANCE_STYLE,
                  guess_coda=DEFAULT_GUESS_CODA):
        """Determines one of the allomorphic morphs based on a word.

        .. seealso:: :meth:`__getitem__`.

        """
        suffix = self.match(morph)
        if suffix is None:
            return None
        coda = guess_coda(word)
        if coda is not None:
            # Coda guessed successfully.
            morph = self.rule(coda)
        elif isinstance(tolerance_style, text_type):
            # The user specified a static tolerant morph themselves.
            morph = tolerance_style
        elif not suffix or not is_consonant(suffix[0]):
            # Choose the tolerant morph.
            morph = self.tolerance(tolerance_style)
        else:
            # The suffix starts with a consonant. Generate a new tolerant
            # morph from the combined morphs.
            morph1 = (combine_words(self.morph1, suffix)
                      if self.morph1 else suffix[1:])
            morph2 = (combine_words(self.morph2, suffix)
                      if self.morph2 else suffix[1:])
            tolerances = generate_tolerances(morph1, morph2)
            return get_tolerance_from_iterator(tolerances, tolerance_style)
        return combine_words(morph, suffix)

    def __getitem__(self, key):
        """Syntax sugar to determine one of the allomorphic morphs based on
        a word::

           eun = Particle(u'은', u'는')
           assert eun[u'나오'] == u'는'
           assert eun[u'모리안'] == u'은'

        """
        if isinstance(key, slice):
            word = key.start
            morph = key.stop or self.morph1
            tolerance_style = key.step or DEFAULT_TOLERANCE_STYLE
        else:
            word, morph = key, self.morph1
            tolerance_style = DEFAULT_TOLERANCE_STYLE
        return self.allomorph(word, morph, tolerance_style)

    @cached_property
    def regex(self):
        return re.compile(self.regex_pattern())

    @cached_property
    def morphs(self):
        """The tuple containing the given morphs and all the possible
        tolerant morphs. Longer morphs come first.
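
        For example, for ``Particle(u'은', u'는')`` this is
        ``(u'은(는)', u'(은)는', u'는(은)', u'(는)은', u'은', u'는')``.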
        """
        seen = set()
        saw = seen.add
        morphs = chain([self.morph1, self.morph2], self.tolerances)
        unique_morphs = (x for x in morphs if x and not (x in seen or saw(x)))
        return tuple(sorted(unique_morphs, key=len, reverse=True))

    def match(self, morph):
        m = self.regex.match(morph)
        if m is None:
            return None
        x = m.end()
        if self.final or m.group() == self.morphs[m.lastindex - 1]:
            return morph[x:]
        coda = pick_coda_from_letter(morph[x - 1])
        return coda + morph[x:]

    def regex_pattern(self):
        if self.final:
            return u'^(?:%s)$' % u'|'.join(re.escape(f) for f in self.morphs)
        patterns = []
        for morph in self.morphs:
            try:
                onset, nucleus, coda = split_phonemes(morph[-1])
            except ValueError:
                coda = None
            if coda == u'':
                start = morph[-1]
                end = join_phonemes(onset, nucleus, u'ㅎ')
                pattern = re.escape(morph[:-1]) + u'[%s-%s]' % (start, end)
            else:
                pattern = re.escape(morph)
            patterns.append(pattern)
        return u'^(?:%s)' % u'|'.join(u'(%s)' % p for p in patterns)

    def __str__(self):
        return self.tolerance()

    if PY2:
        def __repr__(self):
            try:
                from unidecode import unidecode
            except ImportError:
                return '<Particle: %r>' % self.tolerance()
            else:
                return '<Particle: %s>' % unidecode(self.tolerance())
    else:
        def __repr__(self):
            return '<Particle: %s>' % self.tolerance()


class SingletonParticleMeta(type(Particle)):

    def __new__(meta, name, bases, attrs):
        base_meta = super(SingletonParticleMeta, meta)
        cls = base_meta.__new__(meta, name, bases, attrs)
        if not issubclass(cls, Particle):
            raise TypeError('Not particle class')
        # Instantiate directly instead of returning a class.
        return cls()


class SingletonParticle(Particle):

    # Concrete classes should set these strings.
    morph1 = morph2 = final = NotImplemented

    def __init__(self):
        pass


def singleton_particle(*bases):
    """Defines a singleton instance immediately when defining the class. The
    name of the class will refer to the instance instead.
    """
    return with_metaclass(SingletonParticleMeta, SingletonParticle, *bases)


class Euro(singleton_particle(Particle)):
    """Particles starting with "으로" have a special allomorphic rule after
    coda "ㄹ". "으로" can also be extended with suffixes such as "으로서"
    and "으로부터".
    """

    __slots__ = ()

    morph1 = u'으로'
    morph2 = u'로'
    final = False

    def rule(self, coda):
        if coda and coda != u'ㄹ':
            return self.morph1
        else:
            return self.morph2


class Ida(singleton_particle(Particle)):
    """"이다" is a verbal particle. Like other Korean verbs, it is also
    fusional.
    """

    __slots__ = ()

    morph1 = u'이'
    morph2 = u''
    final = False

    #: Matches an initial "이" or "(이)" to normalize fused verbal morphs.
    I_PATTERN = re.compile(u'^이|\(이\)')

    #: The mapping for vowels which should be transformed by /j/ injection.
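    # e.g. u'나오' + u'이에요' -> u'나오예요' (ㅔ -> ㅖ); inversely,
    # u'키홀' + u'여도' -> u'키홀이어도' (ㅕ -> ㅓ).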
246 | J_INJECTIONS = bidict({u'ㅓ': u'ㅕ', u'ㅔ': u'ㅖ'}) 247 | 248 | def allomorph(self, word, morph, tolerance_style=DEFAULT_TOLERANCE_STYLE, 249 | guess_coda=DEFAULT_GUESS_CODA): 250 | suffix = self.I_PATTERN.sub(u'', morph) 251 | coda = guess_coda(word) 252 | next_onset, next_nucleus, next_coda = split_phonemes(suffix[0]) 253 | if next_onset == u'ㅇ': 254 | if next_nucleus == u'ㅣ': 255 | # No allomorphs when a morph starts with "이" and has a coda. 256 | return suffix 257 | mapping = None 258 | if coda == u'' and next_nucleus in self.J_INJECTIONS: 259 | # Squeeze "이어" or "이에" to "여" or "예" 260 | # after a word which ends with a nucleus. 261 | mapping = self.J_INJECTIONS 262 | elif coda != u'' and next_nucleus in self.J_INJECTIONS.inv: 263 | # Lengthen "여" or "예" to "이어" or "이에" 264 | # after a word which ends with a consonant. 265 | mapping = self.J_INJECTIONS.inv 266 | if mapping is not None: 267 | next_nucleus = mapping[next_nucleus] 268 | next_letter = join_phonemes(u'ㅇ', next_nucleus, next_coda) 269 | suffix = next_letter + suffix[1:] 270 | if coda is None: 271 | morph = self.tolerance(tolerance_style) 272 | else: 273 | morph = self.rule(coda) 274 | return morph + suffix 275 | -------------------------------------------------------------------------------- /tossi/tolerance.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | tossi.tolerance 4 | ~~~~~~~~~~~~~~~ 5 | 6 | Utilities for tolerant particle morphs. 7 | 8 | :copyright: (c) 2016-2017 by What! Studio 9 | :license: BSD, see LICENSE for more details. 10 | 11 | """ 12 | from six import integer_types 13 | 14 | 15 | __all__ = ['generate_tolerances', 'get_tolerance', 16 | 'get_tolerance_from_iterator', 'parse_tolerance_style'] 17 | 18 | 19 | # Tolerance styles: 20 | MORPH1_AND_OPTIONAL_MORPH2 = 0 # 은(는) 21 | OPTIONAL_MORPH1_AND_MORPH2 = 1 # (은)는 22 | MORPH2_AND_OPTIONAL_MORPH1 = 2 # 는(은) 23 | OPTIONAL_MORPH2_AND_MORPH1 = 3 # (는)은 24 | 25 | 26 | def generate_tolerances(morph1, morph2): 27 | """Generates all reasonable tolerant particle morphs:: 28 | 29 | >>> set(generate_tolerances(u'이', u'가')) 30 | set([u'이(가)', u'(이)가', u'가(이)', u'(가)이']) 31 | >>> set(generate_tolerances(u'이면', u'면')) 32 | set([u'(이)면']) 33 | 34 | """ 35 | if morph1 == morph2: 36 | # Tolerance not required. 37 | return 38 | if not (morph1 and morph2): 39 | # Null allomorph exists. 40 | yield u'(%s)' % (morph1 or morph2) 41 | return 42 | len1, len2 = len(morph1), len(morph2) 43 | if len1 != len2: 44 | longer, shorter = (morph1, morph2) if len1 > len2 else (morph2, morph1) 45 | if longer.endswith(shorter): 46 | # Longer morph ends with shorter morph. 47 | yield u'(%s)%s' % (longer[:-len(shorter)], shorter) 48 | return 49 | # Find common suffix between two morphs. 50 | for x, (let1, let2) in enumerate(zip(reversed(morph1), reversed(morph2))): 51 | if let1 != let2: 52 | break 53 | if x: 54 | # They share the common suffix. 55 | x1, x2 = len(morph1) - x, len(morph2) - x 56 | common_suffix = morph1[x1:] 57 | morph1, morph2 = morph1[:x1], morph2[:x2] 58 | else: 59 | # No similarity with each other. 
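        # e.g. u'과' and u'와' share no suffix, so all four of
        # 과(와), (과)와, 와(과), (와)과 are generated below.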
        common_suffix = ''
    for morph1, morph2 in [(morph1, morph2), (morph2, morph1)]:
        yield u'%s(%s)%s' % (morph1, morph2, common_suffix)
        yield u'(%s)%s%s' % (morph1, morph2, common_suffix)


def parse_tolerance_style(style, registry=None):
    """Resolves a tolerance style of the given tolerant particle morph::

        >>> parse_tolerance_style(u'은(는)')
        0
        >>> parse_tolerance_style(u'(은)는')
        1
        >>> parse_tolerance_style(OPTIONAL_MORPH2_AND_MORPH1)
        3

    """
    if isinstance(style, integer_types):
        return style
    if registry is None:
        from . import registry
    particle = registry.parse(style)
    if len(particle.tolerances) != 4:
        raise ValueError('Set tolerance style by general allomorphic particle')
    return particle.tolerances.index(style)


def get_tolerance(tolerances, style):
    try:
        return tolerances[style]
    except IndexError:
        return tolerances[0]


def get_tolerance_from_iterator(tolerances, style):
    for x, morph in enumerate(tolerances):
        if style == x:
            return morph
    return morph
--------------------------------------------------------------------------------
/tossi/utils.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
tossi.utils
~~~~~~~~~~~

Utilities for internal use.

:copyright: (c) 2016-2017 by What! Studio
:license: BSD, see LICENSE for more details.

"""
import functools


__all__ = ['cached_property', 'CacheMeta']


def cached_property(f):
    """Similar to `@property` but it calls the function just once and caches
    the result. The object must be able to have a ``__cache__`` attribute.

    If you define `__slots__` for optimization, the metaclass should be a
    :class:`CacheMeta`.

    """
    @property
    @functools.wraps(f)
    def wrapped(self, name=f.__name__):
        try:
            cache = self.__cache__
        except AttributeError:
            self.__cache__ = cache = {}
        try:
            return cache[name]
        except KeyError:
            cache[name] = rv = f(self)
            return rv
    return wrapped


class CacheMeta(type):

    def __new__(meta, name, bases, attrs):
        if '__slots__' in attrs:
            attrs['__slots__'] += ('__cache__',)
        return super(CacheMeta, meta).__new__(meta, name, bases, attrs)
--------------------------------------------------------------------------------
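
For reference, a minimal sketch of how `cached_property` and `CacheMeta` from
`tossi/utils.py` combine with `__slots__`, the same pattern `Particle` uses;
the `Probe` class here is a hypothetical example, not part of the library:

```python
# -*- coding: utf-8 -*-
from six import with_metaclass

from tossi.utils import cached_property, CacheMeta


class Probe(with_metaclass(CacheMeta)):
    # CacheMeta appends '__cache__' to __slots__, so instances stay
    # slot-only yet can still memoize cached_property results.
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value

    @cached_property
    def doubled(self):
        # Evaluated only on the first access, then cached.
        return self.value * 2


probe = Probe(21)
assert probe.doubled == 42  # computed here
assert probe.doubled == 42  # served from probe.__cache__
```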