├── AUTHORS.rst ├── CONTRIBUTING.rst ├── HISTORY.rst ├── LICENSE ├── MANIFEST.in ├── Makefile ├── README.md ├── README.rst ├── docs ├── Makefile ├── authors.rst ├── conf.py ├── contributing.rst ├── history.rst ├── index.rst ├── installation.rst ├── make.bat ├── readme.rst └── usage.rst ├── g2p_zh_en ├── __init__.py ├── cli.py ├── g2p_en │ ├── __init__.py │ ├── checkpoint20.npz │ ├── expand.py │ ├── g2p.py │ └── homographs.en ├── g2p_zh_en.py ├── map_data │ ├── ARPA2IPA.map │ └── pinyin_to_phone.txt └── mapper.py ├── requirements.txt ├── requirements_dev.txt ├── setup.cfg ├── setup.py ├── tests ├── __init__.py ├── test_g2p_zh_en.py └── test_mapper.py └── tox.ini /AUTHORS.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | Credits 3 | ======= 4 | 5 | Development Lead 6 | ---------------- 7 | 8 | * gp-zh-en 9 | 10 | Contributors 11 | ------------ 12 | 13 | None yet. Why not be the first? 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | .. highlight:: shell 2 | 3 | ============ 4 | Contributing 5 | ============ 6 | 7 | Contributions are welcome, and they are greatly appreciated! Every little bit 8 | helps, and credit will always be given. 9 | 10 | You can contribute in many ways: 11 | 12 | Types of Contributions 13 | ---------------------- 14 | 15 | Report Bugs 16 | ~~~~~~~~~~~ 17 | 18 | Report bugs at https://github.com/skysbird/g2p_zh_en/issues. 19 | 20 | If you are reporting a bug, please include: 21 | 22 | * Your operating system name and version. 23 | * Any details about your local setup that might be helpful in troubleshooting. 24 | * Detailed steps to reproduce the bug. 25 | 26 | Fix Bugs 27 | ~~~~~~~~ 28 | 29 | Look through the GitHub issues for bugs. Anything tagged with "bug" and "help 30 | wanted" is open to whoever wants to implement it. 
31 | 32 | Implement Features 33 | ~~~~~~~~~~~~~~~~~~ 34 | 35 | Look through the GitHub issues for features. Anything tagged with "enhancement" 36 | and "help wanted" is open to whoever wants to implement it. 37 | 38 | Write Documentation 39 | ~~~~~~~~~~~~~~~~~~~ 40 | 41 | g2p-zh-en could always use more documentation, whether as part of the 42 | official g2p-zh-en docs, in docstrings, or even on the web in blog posts, 43 | articles, and such. 44 | 45 | Submit Feedback 46 | ~~~~~~~~~~~~~~~ 47 | 48 | The best way to send feedback is to file an issue at https://github.com/skysbird/g2p_zh_en/issues. 49 | 50 | If you are proposing a feature: 51 | 52 | * Explain in detail how it would work. 53 | * Keep the scope as narrow as possible, to make it easier to implement. 54 | * Remember that this is a volunteer-driven project, and that contributions 55 | are welcome :) 56 | 57 | Get Started! 58 | ------------ 59 | 60 | Ready to contribute? Here's how to set up `g2p_zh_en` for local development. 61 | 62 | 1. Fork the `g2p_zh_en` repo on GitHub. 63 | 2. Clone your fork locally:: 64 | 65 | $ git clone git@github.com:your_name_here/g2p_zh_en.git 66 | 67 | 3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: 68 | 69 | $ mkvirtualenv g2p_zh_en 70 | $ cd g2p_zh_en/ 71 | $ python setup.py develop 72 | 73 | 4. Create a branch for local development:: 74 | 75 | $ git checkout -b name-of-your-bugfix-or-feature 76 | 77 | Now you can make your changes locally. 78 | 79 | 5. When you're done making changes, check that your changes pass flake8 and the 80 | tests, including testing other Python versions with tox:: 81 | 82 | $ flake8 g2p_zh_en tests 83 | $ python setup.py test or pytest 84 | $ tox 85 | 86 | To get flake8 and tox, just pip install them into your virtualenv. 87 | 88 | 6. Commit your changes and push your branch to GitHub:: 89 | 90 | $ git add . 
91 | $ git commit -m "Your detailed description of your changes." 92 | $ git push origin name-of-your-bugfix-or-feature 93 | 94 | 7. Submit a pull request through the GitHub website. 95 | 96 | Pull Request Guidelines 97 | ----------------------- 98 | 99 | Before you submit a pull request, check that it meets these guidelines: 100 | 101 | 1. The pull request should include tests. 102 | 2. If the pull request adds functionality, the docs should be updated. Put 103 | your new functionality into a function with a docstring, and add the 104 | feature to the list in README.rst. 105 | 3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check 106 | https://travis-ci.com/skysbird/g2p_zh_en/pull_requests 107 | and make sure that the tests pass for all supported Python versions. 108 | 109 | Tips 110 | ---- 111 | 112 | To run a subset of tests:: 113 | 114 | 115 | $ python -m unittest tests.test_g2p_zh_en 116 | 117 | Deploying 118 | --------- 119 | 120 | A reminder for the maintainers on how to deploy. 121 | Make sure all your changes are committed (including an entry in HISTORY.rst). 122 | Then run:: 123 | 124 | $ bump2version patch # possible: major / minor / patch 125 | $ git push 126 | $ git push --tags 127 | 128 | Travis will then deploy to PyPI if tests pass. 129 | -------------------------------------------------------------------------------- /HISTORY.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | History 3 | ======= 4 | 5 | 0.1.0 (2023-07-14) 6 | ------------------ 7 | 8 | * First release on PyPI. 
9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | g2p-zh-en 5 | Copyright (C) 2023 gp-zh-en 6 | 7 | This program is free software: you can redistribute it and/or modify 8 | it under the terms of the GNU General Public License as published by 9 | the Free Software Foundation, either version 3 of the License, or 10 | (at your option) any later version. 11 | 12 | This program is distributed in the hope that it will be useful, 13 | but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | GNU General Public License for more details. 16 | 17 | You should have received a copy of the GNU General Public License 18 | along with this program. If not, see . 19 | 20 | Also add information on how to contact you by electronic and paper mail. 21 | 22 | You should also get your employer (if you work as a programmer) or school, 23 | if any, to sign a "copyright disclaimer" for the program, if necessary. 24 | For more information on this, and how to apply and follow the GNU GPL, see 25 | . 26 | 27 | The GNU General Public License does not permit incorporating your program 28 | into proprietary programs. If your program is a subroutine library, you 29 | may consider it more useful to permit linking proprietary applications with 30 | the library. If this is what you want to do, use the GNU Lesser General 31 | Public License instead of this License. But first, please read 32 | . 
33 | 34 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include AUTHORS.rst 2 | include CONTRIBUTING.rst 3 | include HISTORY.rst 4 | include LICENSE 5 | include README.rst 6 | 7 | recursive-include tests * 8 | recursive-exclude * __pycache__ 9 | recursive-exclude * *.py[co] 10 | 11 | recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif 12 | recursive-include g2p_zh_en/map_data *.txt *.map 13 | recursive-include g2p_zh_en/g2p_en/ *.en *.npz 14 | 15 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean clean-build clean-pyc clean-test coverage dist docs help install lint lint/flake8 2 | .DEFAULT_GOAL := help 3 | 4 | define BROWSER_PYSCRIPT 5 | import os, webbrowser, sys 6 | 7 | from urllib.request import pathname2url 8 | 9 | webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1]))) 10 | endef 11 | export BROWSER_PYSCRIPT 12 | 13 | define PRINT_HELP_PYSCRIPT 14 | import re, sys 15 | 16 | for line in sys.stdin: 17 | match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line) 18 | if match: 19 | target, help = match.groups() 20 | print("%-20s %s" % (target, help)) 21 | endef 22 | export PRINT_HELP_PYSCRIPT 23 | 24 | BROWSER := python -c "$$BROWSER_PYSCRIPT" 25 | 26 | help: 27 | @python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST) 28 | 29 | clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts 30 | 31 | clean-build: ## remove build artifacts 32 | rm -fr build/ 33 | rm -fr dist/ 34 | rm -fr .eggs/ 35 | find . -name '*.egg-info' -exec rm -fr {} + 36 | find . -name '*.egg' -exec rm -f {} + 37 | 38 | clean-pyc: ## remove Python file artifacts 39 | find . -name '*.pyc' -exec rm -f {} + 40 | find . 
-name '*.pyo' -exec rm -f {} + 41 | find . -name '*~' -exec rm -f {} + 42 | find . -name '__pycache__' -exec rm -fr {} + 43 | 44 | clean-test: ## remove test and coverage artifacts 45 | rm -fr .tox/ 46 | rm -f .coverage 47 | rm -fr htmlcov/ 48 | rm -fr .pytest_cache 49 | 50 | lint/flake8: ## check style with flake8 51 | flake8 g2p_zh_en tests 52 | 53 | lint: lint/flake8 ## check style 54 | 55 | test: ## run tests quickly with the default Python 56 | python setup.py test 57 | 58 | test-all: ## run tests on every Python version with tox 59 | tox 60 | 61 | coverage: ## check code coverage quickly with the default Python 62 | coverage run --source g2p_zh_en setup.py test 63 | coverage report -m 64 | coverage html 65 | $(BROWSER) htmlcov/index.html 66 | 67 | docs: ## generate Sphinx HTML documentation, including API docs 68 | rm -f docs/g2p_zh_en.rst 69 | rm -f docs/modules.rst 70 | sphinx-apidoc -o docs/ g2p_zh_en 71 | $(MAKE) -C docs clean 72 | $(MAKE) -C docs html 73 | $(BROWSER) docs/_build/html/index.html 74 | 75 | servedocs: docs ## compile the docs watching for changes 76 | watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D . 77 | 78 | release: dist ## package and upload a release 79 | twine upload dist/* 80 | 81 | dist: clean ## builds source and wheel package 82 | python setup.py sdist 83 | python setup.py bdist_wheel 84 | ls -l dist 85 | 86 | install: clean ## install the package to the active Python's site-packages 87 | python setup.py install 88 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # g2p_zh_en 2 | 3 | ## Introduction 4 | 5 | g2p_zh_en is an open-source project that provides a mixed Grapheme-to-Phoneme (G2P) conversion between Chinese and English based on Bigcidian's translation table. 
It aims to convert text between Chinese and English phonemes, allowing for pronunciation and speech-related applications. 6 | 7 | ## Features 8 | 9 | - G2P conversion from Chinese to English 10 | - G2P conversion from English to Chinese 11 | - Handling of special characters, such as numbers and currencies 12 | 13 | ## Installation 14 | 15 | 1. Make sure your environment meets the following requirements: 16 | - Python 3.x 17 | - Other dependencies (listed in requirements.txt) 18 | 19 | 2. Install the required dependencies by running the following command: 20 | 21 | pip install -r requirements.txt 22 | 23 | 24 | 3. Install g2p_zh_en using pip: 25 | 26 | pip install g2p_zh_en 27 | 28 | 29 | ## Usage 30 | 31 | Import the G2P class from g2p_zh_en and create an instance: 32 | 33 | ```python 34 | from g2p_zh_en.g2p_zh_en import G2P 35 | 36 | g2p = G2P() 37 | text = "我有100美元,i'm so rich." 38 | output = g2p.g2p(text) 39 | print(output) 40 | # ['w', 'uɔ3', 'y', 'əu3', 'y', 'ii4', 'b', 'ai3', 'm', 'ei3', 'yu', 'an2', ',', ' ', 'ai', 'm', ' ', 's', 'əu', ' ', 'r', 'i', 'ch', ' ', '.'] 41 | text = "i have 100 dollar,我是不是很富有?" 42 | output = g2p.g2p(text, language='en-us') 43 | print(output) 44 | # ['ai', ' ', 'h', 'æ', 'v', ' ', 'w', 'ʌ', 'n', ' ', 'h', 'ʌ', 'n', 'd', 'r', 'ə', 'd', ' ', 'd', 'a', 'l', 'ər', ' ', ',', ' ', 'w', 'uɔ3', 'sh', 'iii4', 'b', 'uu2', 'sh', 'iii4', 'h', 'ən3', 'f', 'uu4', 'y', 'əu3', ' ', '?'] 45 | ``` 46 | 47 | The output is a list of phoneme tokens for the input text, including spaces and punctuation; Mandarin finals carry a trailing tone digit (for example 'uɔ3'). 48 | 49 | ## In Progress 50 | The following features are currently being developed: 51 | 52 | - [x] G2P conversion with Chinese as the primary language. 53 | - [x] G2P conversion with English as the primary language. 54 | - [ ] Handling various special characters, such as numbers and currencies. 55 | ## Contribution 56 | 57 | If you would like to contribute to this project, you can: 58 | 59 | Submit bug reports or feature requests on the project's issue page. 
60 | Fork the project, create your own branch, and submit a pull request. 61 | Improve documentation and code comments. 62 | Thank you for your support and contributions! 63 | 64 | 65 | ## License 66 | This project is licensed under the GNU General Public License, version 3 or later (see LICENSE). -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | 2 | g2p_zh_en 3 | ========= 4 | 5 | Introduction 6 | ------------ 7 | 8 | g2p_zh_en is an open-source project that provides a mixed Grapheme-to-Phoneme (G2P) conversion between Chinese and English based on Bigcidian's translation table. It aims to convert text between Chinese and English phonemes, allowing for pronunciation and speech-related applications. 9 | 10 | Features 11 | -------- 12 | 13 | 14 | * G2P conversion from Chinese to English 15 | * G2P conversion from English to Chinese 16 | * Handling of special characters, such as numbers and currencies 17 | 18 | Installation 19 | ------------ 20 | 21 | 22 | 23 | 24 | 1. Make sure your environment meets the following requirements: 25 | 26 | 27 | * Python 3.x 28 | * Other dependencies (listed in requirements.txt) 29 | 30 | 31 | 32 | 33 | 2. Install the required dependencies by running the following command:: 34 | 35 | pip install -r requirements.txt 36 | 37 | 3. Install g2p_zh_en using pip:: 38 | 39 | 40 | pip install g2p_zh_en 41 | 42 | Usage 43 | ----- 44 | 45 | Import the G2P class from g2p_zh_en and create an instance: 46 | 47 | .. code-block:: python 48 | 49 | from g2p_zh_en.g2p_zh_en import G2P 50 | 51 | g2p = G2P() 52 | text = "我有100美元,i'm so rich." 53 | output = g2p.g2p(text) 54 | print(output) 55 | # ['w', 'uɔ3', 'y', 'əu3', 'y', 'ii4', 'b', 'ai3', 'm', 'ei3', 'yu', 'an2', ',', ' ', 'ai', 'm', ' ', 's', 'əu', ' ', 'r', 'i', 'ch', ' ', '.'] 56 | text = "i have 100 dollar,我是不是很富有?" 
57 | output = g2p.g2p(text, language='en-us') 58 | print(output) 59 | # ['ai', ' ', 'h', 'æ', 'v', ' ', 'w', 'ʌ', 'n', ' ', 'h', 'ʌ', 'n', 'd', 'r', 'ə', 'd', ' ', 'd', 'a', 'l', 'ər', ' ', ',', ' ', 'w', 'uɔ3', 'sh', 'iii4', 'b', 'uu2', 'sh', 'iii4', 'h', 'ən3', 'f', 'uu4', 'y', 'əu3', ' ', '?'] 60 | 61 | The output is a list of phoneme tokens for the input text, including spaces and punctuation; Mandarin finals carry a trailing tone digit (for example 'uɔ3'). 62 | 63 | In Progress 64 | ----------- 65 | 66 | The following features are currently being developed: 67 | 68 | 69 | * [x] G2P conversion with Chinese as the primary language. 70 | * [x] G2P conversion with English as the primary language. 71 | * [ ] Handling various special characters, such as numbers and currencies. 72 | 73 | Contribution 74 | ------------ 75 | 76 | If you would like to contribute to this project, you can: 77 | 78 | * Submit bug reports or feature requests on the project's issue page. 79 | * Fork the project, create your own branch, and submit a pull request. 80 | * Improve documentation and code comments. 81 | 82 | Thank you for your support and contributions! 83 | 84 | License 85 | ------- 86 | 87 | This project is licensed under the GNU General Public License, version 3 or later (see LICENSE). 88 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = python -msphinx 7 | SPHINXPROJ = g2p_zh_en 8 | SOURCEDIR = . 9 | BUILDDIR = _build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /docs/authors.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../AUTHORS.rst 2 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # g2p_zh_en documentation build configuration file, created by 4 | # sphinx-quickstart on Fri Jun 9 13:47:02 2017. 5 | # 6 | # This file is execfile()d with the current directory set to its 7 | # containing dir. 8 | # 9 | # Note that not all possible configuration values are present in this 10 | # autogenerated file. 11 | # 12 | # All configuration values have a default; values that are commented out 13 | # serve to show the default. 14 | 15 | # If extensions (or modules to document with autodoc) are in another 16 | # directory, add these directories to sys.path here. If the directory is 17 | # relative to the documentation root, use os.path.abspath to make it 18 | # absolute, like shown here. 19 | # 20 | import os 21 | import sys 22 | sys.path.insert(0, os.path.abspath('..')) 23 | 24 | import g2p_zh_en 25 | 26 | # -- General configuration --------------------------------------------- 27 | 28 | # If your documentation needs a minimal Sphinx version, state it here. 29 | # 30 | # needs_sphinx = '1.0' 31 | 32 | # Add any Sphinx extension module names here, as strings. They can be 33 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 34 | extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode'] 35 | 36 | # Add any paths that contain templates here, relative to this directory. 37 | templates_path = ['_templates'] 38 | 39 | # The suffix(es) of source filenames. 
40 | # You can specify multiple suffix as a list of string: 41 | # 42 | # source_suffix = ['.rst', '.md'] 43 | source_suffix = '.rst' 44 | 45 | # The master toctree document. 46 | master_doc = 'index' 47 | 48 | # General information about the project. 49 | project = 'g2p-zh-en' 50 | copyright = "2023, gp-zh-en" 51 | author = "gp-zh-en" 52 | 53 | # The version info for the project you're documenting, acts as replacement 54 | # for |version| and |release|, also used in various other places throughout 55 | # the built documents. 56 | # 57 | # The short X.Y version. 58 | version = g2p_zh_en.__version__ 59 | # The full version, including alpha/beta/rc tags. 60 | release = g2p_zh_en.__version__ 61 | 62 | # The language for content autogenerated by Sphinx. Refer to documentation 63 | # for a list of supported languages. 64 | # 65 | # This is also used if you do content translation via gettext catalogs. 66 | # Usually you set "language" from the command line for these cases. 67 | language = None 68 | 69 | # List of patterns, relative to source directory, that match files and 70 | # directories to ignore when looking for source files. 71 | # This patterns also effect to html_static_path and html_extra_path 72 | exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] 73 | 74 | # The name of the Pygments (syntax highlighting) style to use. 75 | pygments_style = 'sphinx' 76 | 77 | # If true, `todo` and `todoList` produce output, else they produce nothing. 78 | todo_include_todos = False 79 | 80 | 81 | # -- Options for HTML output ------------------------------------------- 82 | 83 | # The theme to use for HTML and HTML Help pages. See the documentation for 84 | # a list of builtin themes. 85 | # 86 | html_theme = 'alabaster' 87 | 88 | # Theme options are theme-specific and customize the look and feel of a 89 | # theme further. For a list of options available for each theme, see the 90 | # documentation. 
91 | # 92 | # html_theme_options = {} 93 | 94 | # Add any paths that contain custom static files (such as style sheets) here, 95 | # relative to this directory. They are copied after the builtin static files, 96 | # so a file named "default.css" will overwrite the builtin "default.css". 97 | html_static_path = ['_static'] 98 | 99 | 100 | # -- Options for HTMLHelp output --------------------------------------- 101 | 102 | # Output file base name for HTML help builder. 103 | htmlhelp_basename = 'g2p_zh_endoc' 104 | 105 | 106 | # -- Options for LaTeX output ------------------------------------------ 107 | 108 | latex_elements = { 109 | # The paper size ('letterpaper' or 'a4paper'). 110 | # 111 | # 'papersize': 'letterpaper', 112 | 113 | # The font size ('10pt', '11pt' or '12pt'). 114 | # 115 | # 'pointsize': '10pt', 116 | 117 | # Additional stuff for the LaTeX preamble. 118 | # 119 | # 'preamble': '', 120 | 121 | # Latex figure (float) alignment 122 | # 123 | # 'figure_align': 'htbp', 124 | } 125 | 126 | # Grouping the document tree into LaTeX files. List of tuples 127 | # (source start file, target name, title, author, documentclass 128 | # [howto, manual, or own class]). 129 | latex_documents = [ 130 | (master_doc, 'g2p_zh_en.tex', 131 | 'g2p-zh-en Documentation', 132 | 'gp-zh-en', 'manual'), 133 | ] 134 | 135 | 136 | # -- Options for manual page output ------------------------------------ 137 | 138 | # One entry per manual page. List of tuples 139 | # (source start file, name, description, authors, manual section). 140 | man_pages = [ 141 | (master_doc, 'g2p_zh_en', 142 | 'g2p-zh-en Documentation', 143 | [author], 1) 144 | ] 145 | 146 | 147 | # -- Options for Texinfo output ---------------------------------------- 148 | 149 | # Grouping the document tree into Texinfo files. 
List of tuples 150 | # (source start file, target name, title, author, 151 | # dir menu entry, description, category) 152 | texinfo_documents = [ 153 | (master_doc, 'g2p_zh_en', 154 | 'g2p-zh-en Documentation', 155 | author, 156 | 'g2p_zh_en', 157 | 'One line description of project.', 158 | 'Miscellaneous'), 159 | ] 160 | 161 | 162 | 163 | -------------------------------------------------------------------------------- /docs/contributing.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../CONTRIBUTING.rst 2 | -------------------------------------------------------------------------------- /docs/history.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../HISTORY.rst 2 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | Welcome to g2p-zh-en's documentation! 2 | ====================================== 3 | 4 | .. toctree:: 5 | :maxdepth: 2 6 | :caption: Contents: 7 | 8 | readme 9 | installation 10 | usage 11 | modules 12 | contributing 13 | authors 14 | history 15 | 16 | Indices and tables 17 | ================== 18 | * :ref:`genindex` 19 | * :ref:`modindex` 20 | * :ref:`search` 21 | -------------------------------------------------------------------------------- /docs/installation.rst: -------------------------------------------------------------------------------- 1 | .. highlight:: shell 2 | 3 | ============ 4 | Installation 5 | ============ 6 | 7 | 8 | Stable release 9 | -------------- 10 | 11 | To install g2p-zh-en, run this command in your terminal: 12 | 13 | .. code-block:: console 14 | 15 | $ pip install g2p_zh_en 16 | 17 | This is the preferred method to install g2p-zh-en, as it will always install the most recent stable release. 
18 | 19 | If you don't have `pip`_ installed, this `Python installation guide`_ can guide 20 | you through the process. 21 | 22 | .. _pip: https://pip.pypa.io 23 | .. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/ 24 | 25 | 26 | From sources 27 | ------------ 28 | 29 | The sources for g2p-zh-en can be downloaded from the `Github repo`_. 30 | 31 | You can either clone the public repository: 32 | 33 | .. code-block:: console 34 | 35 | $ git clone git://github.com/skysbird/g2p_zh_en 36 | 37 | Or download the `tarball`_: 38 | 39 | .. code-block:: console 40 | 41 | $ curl -OJL https://github.com/skysbird/g2p_zh_en/tarball/master 42 | 43 | Once you have a copy of the source, you can install it with: 44 | 45 | .. code-block:: console 46 | 47 | $ python setup.py install 48 | 49 | 50 | .. _Github repo: https://github.com/skysbird/g2p_zh_en 51 | .. _tarball: https://github.com/skysbird/g2p_zh_en/tarball/master 52 | -------------------------------------------------------------------------------- /docs/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | pushd %~dp0 4 | 5 | REM Command file for Sphinx documentation 6 | 7 | if "%SPHINXBUILD%" == "" ( 8 | set SPHINXBUILD=python -msphinx 9 | ) 10 | set SOURCEDIR=. 11 | set BUILDDIR=_build 12 | set SPHINXPROJ=g2p_zh_en 13 | 14 | if "%1" == "" goto help 15 | 16 | %SPHINXBUILD% >NUL 2>NUL 17 | if errorlevel 9009 ( 18 | echo. 19 | echo.The Sphinx module was not found. Make sure you have Sphinx installed, 20 | echo.then set the SPHINXBUILD environment variable to point to the full 21 | echo.path of the 'sphinx-build' executable. Alternatively you may add the 22 | echo.Sphinx directory to PATH. 23 | echo. 
24 | echo.If you don't have Sphinx installed, grab it from 25 | echo.http://sphinx-doc.org/ 26 | exit /b 1 27 | ) 28 | 29 | %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% 30 | goto end 31 | 32 | :help 33 | %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% 34 | 35 | :end 36 | popd 37 | -------------------------------------------------------------------------------- /docs/readme.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../README.rst 2 | -------------------------------------------------------------------------------- /docs/usage.rst: -------------------------------------------------------------------------------- 1 | ===== 2 | Usage 3 | ===== 4 | 5 | To use g2p-zh-en in a project:: 6 | 7 | import g2p_zh_en 8 | -------------------------------------------------------------------------------- /g2p_zh_en/__init__.py: -------------------------------------------------------------------------------- 1 | """Top-level package for g2p-zh-en.""" 2 | 3 | __author__ = """gp-zh-en""" 4 | __email__ = 'skysbird@gmail.com' 5 | __version__ = '0.1.0' 6 | 7 | 8 | from .g2p_zh_en import G2P 9 | -------------------------------------------------------------------------------- /g2p_zh_en/cli.py: -------------------------------------------------------------------------------- 1 | """Console script for g2p_zh_en.""" 2 | import argparse 3 | import sys 4 | 5 | 6 | def main(): 7 | """Console script for g2p_zh_en.""" 8 | parser = argparse.ArgumentParser() 9 | parser.add_argument('_', nargs='*') 10 | args = parser.parse_args() 11 | 12 | print("Arguments: " + str(args._)) 13 | print("Replace this message by putting your code into " 14 | "g2p_zh_en.cli.main") 15 | return 0 16 | 17 | 18 | if __name__ == "__main__": 19 | sys.exit(main()) # pragma: no cover 20 | -------------------------------------------------------------------------------- /g2p_zh_en/g2p_en/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .g2p import G2p 2 | -------------------------------------------------------------------------------- /g2p_zh_en/g2p_en/checkpoint20.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skysbird/g2p-zh-en/740fa3a8d851de0ae1b16f43b127f33d8cfff6b7/g2p_zh_en/g2p_en/checkpoint20.npz -------------------------------------------------------------------------------- /g2p_zh_en/g2p_en/expand.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | #/usr/bin/python2 3 | ''' 4 | Borrowed 5 | from https://github.com/keithito/tacotron/blob/master/text/numbers.py 6 | By kyubyong park. kbpark.linguist@gmail.com. 7 | https://www.github.com/kyubyong/g2p 8 | ''' 9 | from __future__ import print_function 10 | import inflect 11 | import re 12 | 13 | 14 | 15 | _inflect = inflect.engine() 16 | _comma_number_re = re.compile(r'([0-9][0-9\,]+[0-9])') 17 | _decimal_number_re = re.compile(r'([0-9]+\.[0-9]+)') 18 | _pounds_re = re.compile(r'£([0-9\,]*[0-9]+)') 19 | _dollars_re = re.compile(r'\$([0-9\.\,]*[0-9]+)') 20 | _ordinal_re = re.compile(r'[0-9]+(st|nd|rd|th)') 21 | _number_re = re.compile(r'[0-9]+') 22 | 23 | 24 | def _remove_commas(m): 25 | return m.group(1).replace(',', '') 26 | 27 | 28 | def _expand_decimal_point(m): 29 | return m.group(1).replace('.', ' point ') 30 | 31 | 32 | def _expand_dollars(m): 33 | match = m.group(1) 34 | parts = match.split('.') 35 | if len(parts) > 2: 36 | return match + ' dollars' # Unexpected format 37 | dollars = int(parts[0]) if parts[0] else 0 38 | cents = int(parts[1]) if len(parts) > 1 and parts[1] else 0 39 | if dollars and cents: 40 | dollar_unit = 'dollar' if dollars == 1 else 'dollars' 41 | cent_unit = 'cent' if cents == 1 else 'cents' 42 | return '%s %s, %s %s' % (dollars, dollar_unit, cents, cent_unit) 43 | elif dollars: 44 | 
dollar_unit = 'dollar' if dollars == 1 else 'dollars' 45 | return '%s %s' % (dollars, dollar_unit) 46 | elif cents: 47 | cent_unit = 'cent' if cents == 1 else 'cents' 48 | return '%s %s' % (cents, cent_unit) 49 | else: 50 | return 'zero dollars' 51 | 52 | 53 | def _expand_ordinal(m): 54 | return _inflect.number_to_words(m.group(0)) 55 | 56 | 57 | def _expand_number(m): 58 | num = int(m.group(0)) 59 | if num > 1000 and num < 3000: 60 | if num == 2000: 61 | return 'two thousand' 62 | elif num > 2000 and num < 2010: 63 | return 'two thousand ' + _inflect.number_to_words(num % 100) 64 | elif num % 100 == 0: 65 | return _inflect.number_to_words(num // 100) + ' hundred' 66 | else: 67 | return _inflect.number_to_words(num, andword='', zero='oh', group=2).replace(', ', ' ') 68 | else: 69 | return _inflect.number_to_words(num, andword='') 70 | 71 | 72 | def normalize_numbers(text): 73 | text = re.sub(_comma_number_re, _remove_commas, text) 74 | text = re.sub(_pounds_re, r'\1 pounds', text) 75 | text = re.sub(_dollars_re, _expand_dollars, text) 76 | text = re.sub(_decimal_number_re, _expand_decimal_point, text) 77 | text = re.sub(_ordinal_re, _expand_ordinal, text) 78 | text = re.sub(_number_re, _expand_number, text) 79 | return text 80 | -------------------------------------------------------------------------------- /g2p_zh_en/g2p_en/g2p.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # /usr/bin/python 3 | ''' 4 | By kyubyong park(kbpark.linguist@gmail.com) and Jongseok Kim(https://github.com/ozmig77) 5 | https://www.github.com/kyubyong/g2p 6 | ''' 7 | from nltk import pos_tag 8 | from nltk.corpus import cmudict 9 | import nltk 10 | # from nltk.tokenize import word_tokenize 11 | from nltk.tokenize import TweetTokenizer 12 | word_tokenize = TweetTokenizer().tokenize 13 | import numpy as np 14 | import codecs 15 | import re 16 | import os 17 | import unicodedata 18 | from builtins import str as unicode 19 | 
from .expand import normalize_numbers 20 | 21 | try: 22 | nltk.data.find('taggers/averaged_perceptron_tagger.zip') 23 | except LookupError: 24 | nltk.download('averaged_perceptron_tagger') 25 | try: 26 | nltk.data.find('corpora/cmudict.zip') 27 | except LookupError: 28 | nltk.download('cmudict') 29 | 30 | dirname = os.path.dirname(__file__) 31 | 32 | def construct_homograph_dictionary(): 33 | f = os.path.join(dirname,'homographs.en') 34 | homograph2features = dict() 35 | for line in codecs.open(f, 'r', 'utf8').read().splitlines(): 36 | if line.startswith("#"): continue # comment 37 | headword, pron1, pron2, pos1 = line.strip().split("|") 38 | homograph2features[headword.lower()] = (pron1.split(), pron2.split(), pos1) 39 | return homograph2features 40 | 41 | # def segment(text): 42 | # ''' 43 | # Splits text into `tokens`. 44 | # :param text: A string. 45 | # :return: A list of tokens (string). 46 | # ''' 47 | # print(text) 48 | # text = re.sub('([.,?!]( |$))', r' \1', text) 49 | # print(text) 50 | # return text.split() 51 | 52 | class G2p(object): 53 | def __init__(self): 54 | super().__init__() 55 | self.graphemes = ["<pad>", "<unk>", "</s>"] + list("abcdefghijklmnopqrstuvwxyz") 56 | self.phonemes = ["<pad>", "<unk>", "<s>", "</s>"] + ['AA0', 'AA1', 'AA2', 'AE0', 'AE1', 'AE2', 'AH0', 'AH1', 'AH2', 'AO0', 57 | 'AO1', 'AO2', 'AW0', 'AW1', 'AW2', 'AY0', 'AY1', 'AY2', 'B', 'CH', 'D', 'DH', 58 | 'EH0', 'EH1', 'EH2', 'ER0', 'ER1', 'ER2', 'EY0', 'EY1', 59 | 'EY2', 'F', 'G', 'HH', 60 | 'IH0', 'IH1', 'IH2', 'IY0', 'IY1', 'IY2', 'JH', 'K', 'L', 61 | 'M', 'N', 'NG', 'OW0', 'OW1', 62 | 'OW2', 'OY0', 'OY1', 'OY2', 'P', 'R', 'S', 'SH', 'T', 'TH', 63 | 'UH0', 'UH1', 'UH2', 'UW', 64 | 'UW0', 'UW1', 'UW2', 'V', 'W', 'Y', 'Z', 'ZH'] 65 | self.g2idx = {g: idx for idx, g in enumerate(self.graphemes)} 66 | self.idx2g = {idx: g for idx, g in enumerate(self.graphemes)} 67 | 68 | self.p2idx = {p: idx for idx, p in enumerate(self.phonemes)} 69 | self.idx2p = {idx: p for idx, p in enumerate(self.phonemes)} 70 | 71 |
self.cmu = cmudict.dict() 72 | self.load_variables() 73 | self.homograph2features = construct_homograph_dictionary() 74 | 75 | def load_variables(self): 76 | self.variables = np.load(os.path.join(dirname,'checkpoint20.npz')) 77 | self.enc_emb = self.variables["enc_emb"] # (29, 64). (len(graphemes), emb) 78 | self.enc_w_ih = self.variables["enc_w_ih"] # (3*128, 64) 79 | self.enc_w_hh = self.variables["enc_w_hh"] # (3*128, 128) 80 | self.enc_b_ih = self.variables["enc_b_ih"] # (3*128,) 81 | self.enc_b_hh = self.variables["enc_b_hh"] # (3*128,) 82 | 83 | self.dec_emb = self.variables["dec_emb"] # (74, 64). (len(phonemes), emb) 84 | self.dec_w_ih = self.variables["dec_w_ih"] # (3*128, 64) 85 | self.dec_w_hh = self.variables["dec_w_hh"] # (3*128, 128) 86 | self.dec_b_ih = self.variables["dec_b_ih"] # (3*128,) 87 | self.dec_b_hh = self.variables["dec_b_hh"] # (3*128,) 88 | self.fc_w = self.variables["fc_w"] # (74, 128) 89 | self.fc_b = self.variables["fc_b"] # (74,) 90 | 91 | def sigmoid(self, x): 92 | return 1 / (1 + np.exp(-x)) 93 | 94 | def grucell(self, x, h, w_ih, w_hh, b_ih, b_hh): 95 | rzn_ih = np.matmul(x, w_ih.T) + b_ih 96 | rzn_hh = np.matmul(h, w_hh.T) + b_hh 97 | 98 | rz_ih, n_ih = rzn_ih[:, :rzn_ih.shape[-1] * 2 // 3], rzn_ih[:, rzn_ih.shape[-1] * 2 // 3:] 99 | rz_hh, n_hh = rzn_hh[:, :rzn_hh.shape[-1] * 2 // 3], rzn_hh[:, rzn_hh.shape[-1] * 2 // 3:] 100 | 101 | rz = self.sigmoid(rz_ih + rz_hh) 102 | r, z = np.split(rz, 2, -1) 103 | 104 | n = np.tanh(n_ih + r * n_hh) 105 | h = (1 - z) * n + z * h 106 | 107 | return h 108 | 109 | def gru(self, x, steps, w_ih, w_hh, b_ih, b_hh, h0=None): 110 | if h0 is None: 111 | h0 = np.zeros((x.shape[0], w_hh.shape[1]), np.float32) 112 | h = h0 # initial hidden state 113 | outputs = np.zeros((x.shape[0], steps, w_hh.shape[1]), np.float32) 114 | for t in range(steps): 115 | h = self.grucell(x[:, t, :], h, w_ih, w_hh, b_ih, b_hh) # (b, h) 116 | outputs[:, t, ::] = h 117 | return outputs 118 | 119 | def encode(self, word): 120
| chars = list(word) + ["</s>"] 121 | x = [self.g2idx.get(char, self.g2idx["<unk>"]) for char in chars] 122 | x = np.take(self.enc_emb, np.expand_dims(x, 0), axis=0) 123 | 124 | return x 125 | 126 | def predict(self, word): 127 | # encoder 128 | enc = self.encode(word) 129 | enc = self.gru(enc, len(word) + 1, self.enc_w_ih, self.enc_w_hh, 130 | self.enc_b_ih, self.enc_b_hh, h0=np.zeros((1, self.enc_w_hh.shape[-1]), np.float32)) 131 | last_hidden = enc[:, -1, :] 132 | 133 | # decoder 134 | dec = np.take(self.dec_emb, [2], axis=0) # 2: <s> 135 | h = last_hidden 136 | 137 | preds = [] 138 | for i in range(20): 139 | h = self.grucell(dec, h, self.dec_w_ih, self.dec_w_hh, self.dec_b_ih, self.dec_b_hh) # (b, h) 140 | logits = np.matmul(h, self.fc_w.T) + self.fc_b 141 | pred = logits.argmax() 142 | if pred == 3: break # 3: </s> 143 | preds.append(pred) 144 | dec = np.take(self.dec_emb, [pred], axis=0) 145 | 146 | preds = [self.idx2p.get(idx, "<unk>") for idx in preds] 147 | return preds 148 | 149 | def __call__(self, text, no_handler=None): 150 | # preprocessing 151 | text = unicode(text) 152 | text = normalize_numbers(text) 153 | text = ''.join(char for char in unicodedata.normalize('NFD', text) 154 | if unicodedata.category(char) != 'Mn') # Strip accents 155 | text = text.lower() 156 | # text = re.sub("[^ a-z'.,?!\-]", "", text) 157 | text = text.replace("i.e.", "that is") 158 | text = text.replace("e.g.", "for example") 159 | 160 | # tokenization 161 | words = word_tokenize(text) 162 | tokens = pos_tag(words) # tuples of (word, tag) 163 | 164 | # steps 165 | prons = [] 166 | for word, pos in tokens: 167 | if re.search("[a-z]", word) is None: 168 | if no_handler is None: 169 | pron = [word] 170 | else: 171 | pron = no_handler(word) 172 | 173 | elif word in self.homograph2features: # Check homograph 174 | pron1, pron2, pos1 = self.homograph2features[word] 175 | if pos.startswith(pos1): 176 | pron = pron1 177 | else: 178 | pron = pron2 179 | elif word in self.cmu: # lookup CMU dict 180 | pron =
self.cmu[word][0] 181 | else: # predict for oov 182 | pron = self.predict(word) 183 | 184 | prons.extend(pron) 185 | prons.extend([" "]) 186 | 187 | return prons[:-1] 188 | 189 | if __name__ == '__main__': 190 | texts = ["I have $250 in my pocket.", # number -> spell-out 191 | "popular pets, e.g. cats and dogs", # e.g. -> for example 192 | "I refuse to collect the refuse around here.", # homograph 193 | "I'm an activationist."] # newly coined word 194 | g2p = G2p() 195 | for text in texts: 196 | out = g2p(text) 197 | print(out) 198 | 199 | -------------------------------------------------------------------------------- /g2p_zh_en/g2p_en/homographs.en: -------------------------------------------------------------------------------- 1 | #This is based on http://www.minpairs.talktalk.net/graph.html 2 | #Each line is formatted as follows: 3 | #HEADWORD|PRONUNCIATION1|PRONUNCIATION2|POS 4 | #HEADWORD should have PRONUNCIATION1 only if its part-of-speech is POS 5 | #Otherwise PRONUNCIATION2 is applied 6 | #May, 2018 7 | #Kyubyong Park 8 | #https://github.com/kyubyong/g2p 9 | ABSENT|AH1 B S AE1 N T|AE1 B S AH0 N T|V 10 | ABSTRACT|AE0 B S T R AE1 K T|AE1 B S T R AE2 K T|V 11 | ABSTRACTS|AE0 B S T R AE1 K T S|AE1 B S T R AE0 K T S|V 12 | ABUSE|AH0 B Y UW1 Z|AH0 B Y UW1 S|V 13 | ABUSES|AH0 B Y UW1 Z IH0 Z|AH0 B Y UW1 S IH0 Z|V 14 | ACCENT|AH0 K S EH1 N T|AE1 K S EH2 N T|V 15 | ACCENTS|AE1 K S EH0 N T S|AE1 K S EH0 N T S|V 16 | ADDICT|AH0 D IH1 K T|AE1 D IH2 K T|V 17 | ADDICTS|AH0 D IH1 K T S|AE1 D IH2 K T S|V 18 | ADVOCATE|AE1 D V AH0 K EY2 T|AE1 D V AH0 K AH0 T|V 19 | ADVOCATES|AE1 D V AH0 K EY2 T S|AE1 D V AH0 K AH0 T S|V 20 | AFFECT|AH0 F EH1 K T|AE1 F EH0 K T|V 21 | AFFECTS|AH0 F EH1 K T S|AE1 F EH0 K T S|V 22 | AFFIX|AH0 F IH1 K S|AE1 F IH0 K S|V 23 | AFFIXES|AH0 F IH1 K S IH0 Z|AE1 F IH0 K S IH0 Z|V 24 | AGGLOMERATE|AH0 G L AA1 M ER0 EY2 T|AH0 G L AA1 M ER0 AH0 T|V 25 | AGGREGATE|AE1 G R AH0 G EY0 T|AE1 G R AH0 G AH0 T|V 26 | AGGREGATES|AE1 G R AH0 G EY2 T S|AE1 G R
AH0 G IH0 T S|V 27 | ALLIES|AH0 L AY1 Z|AE1 L AY0 Z|V 28 | ALLOY|AH0 L OY1|AE1 L OY2|V 29 | ALLOYS|AH0 L OY1 Z|AE1 L OY2 Z|V 30 | ALLY|AH0 L AY1|AE1 L AY0|V 31 | ALTERNATE|AO1 L T ER0 N EY2 T|AO0 L T ER1 N AH0 T|V 32 | ANALYSES|AH0 N AE1 L IH0 S IY2 Z|AE1 N AH0 L AY0 Z IH2 Z|V 33 | ANIMATE|AE1 N AH0 M EY2 T|AE1 N AH0 M AH0 T|V 34 | ANNEX|AH0 N EH1 K S|AE1 N EH2 K S|V 35 | ANNEXES|AH0 N EH1 K S IH0 Z|AE1 N EH2 K S IH0 Z|V 36 | APPROPRIATE|AH0 P R OW1 P R IY0 EY2 T|AH0 P R OW1 P R IY0 AH0 T|V 37 | APPROXIMATE|AH0 P R AA1 K S AH0 M EY2 T|AH0 P R AA1 K S AH0 M AH0 T|V 38 | ARTICULATE|AA0 R T IH1 K Y AH0 L AH0 T|AA0 R T IH1 K Y AH0 L EY2 T|V 39 | ASPIRATE|AE1 S P ER0 EY2 T|AE1 S P ER0 AH0 T|V 40 | ASPIRATES|AE1 S P ER0 EY2 T S|AE1 S P ER0 AH0 T S|V 41 | ASSOCIATE|AH0 S OW1 S IY0 EY2 T|AH0 S OW1 S IY0 AH0 T|V 42 | ASSOCIATES|AH0 S OW1 S IY0 EY2 T S|AH0 S OW1 S IY0 AH0 T S|V 43 | ATTRIBUTE|AH0 T R IH1 B Y UW2 T|AE1 T R IH0 B Y UW0 T|V 44 | ATTRIBUTES|AH0 T R IH1 B Y UW2 T S|AE1 T R IH0 B Y UW0 T S|V 45 | BATHS|B AE1 TH S|B AE1 DH Z|V 46 | BLESSED|B L EH1 S IH0 D|B L EH1 S T|V 47 | CERTIFICATE|S ER0 T IH1 F IH0 K AH0 T|S ER0 T IH1 F IH0 K EY2 T|V 48 | CERTIFICATES|S ER0 T IH1 F IH0 K EY2 T S|S ER0 T IH1 F IH0 K AH0 T S|V 49 | CLOSE|K L OW1 Z|K L OW1 S|V 50 | CLOSER|K L OW1 Z ER0|K L OW1 S ER0|N 51 | CLOSES|K L OW1 Z IH0 Z|K L OW1 S IH0 Z|V 52 | COLLECT|K AH0 L EH1 K T|K AA1 L EH0 K T|V 53 | COLLECTS|K AH0 L EH1 K T S|K AA1 L EH0 K T S|V 54 | COMBAT|K AH0 M B AE1 T|K AA1 M B AE0 T|V 55 | COMBATS|K AH0 M B AE1 T S|K AH1 M B AE0 T S|V 56 | COMBINE|K AH0 M B AY1 N|K AA1 M B AY0 N|V 57 | COMMUNE|K AH0 M Y UW1 N|K AA1 M Y UW0 N|V 58 | COMMUNES|K AH0 M Y UW1 N Z|K AA1 M Y UW0 N Z|V 59 | COMPACT|K AH0 M P AE1 K T|K AA1 M P AE0 K T|V 60 | COMPACTS|K AH0 M P AE1 K T S|K AA1 M P AE0 K T S|V 61 | COMPLEX|K AH0 M P L EH1 K S| K AA1 M P L EH0 K S|ADJ 62 | COMPLIMENT|K AA1 M P L AH0 M EH0 N T|K AA1 M P L AH0 M AH0 N T|V 63 | COMPLIMENTS|K AA1 M P L AH0 M EH0 N T S|K AA1 M P L AH0 M AH0 N 
T S|V 64 | COMPOUND|K AH0 M P AW1 N D|K AA1 M P AW0 N D|V 65 | COMPOUNDS|K AH0 M P AW1 N D Z|K AA1 M P AW0 N D Z|V 66 | COMPRESS|K AH0 M P R EH1 S|K AA1 M P R EH0 S|V 67 | COMPRESSES|K AH0 M P R EH1 S IH0 Z|K AA1 M P R EH0 S AH0 Z|V 68 | CONCERT|K AH0 N S ER1 T|K AA1 N S ER0 T|V 69 | CONCERTS|K AH0 N S ER1 T S|K AA1 N S ER0 T S|V 70 | CONDUCT|K AA0 N D AH1 K T|K AA1 N D AH0 K T|V 71 | CONFEDERATE|K AH0 N F EH1 D ER0 EY2 T|K AH0 N F EH1 D ER0 AH0 T|V 72 | CONFEDERATES|K AH0 N F EH1 D ER0 EY2 T S|K AH0 N F EH1 D ER0 AH0 T S|V 73 | CONFINES|K AH0 N F AY1 N Z|K AA1 N F AY2 N Z|V 74 | CONFLICT|K AH0 N F L IH1 K T|K AA1 N F L IH0 K T|V 75 | CONFLICTS|K AH0 N F L IH1 K T S|K AA1 N F L IH0 K T S|V 76 | CONGLOMERATE|K AH0 N G L AA1 M ER0 EY2 T|K AH0 N G L AA1 M ER0 AH0 T|V 77 | CONGLOMERATES|K AH0 N G L AA1 M ER0 EY2 T S|K AH0 N G L AA1 M ER0 AH0 T S|V 78 | CONSCRIPT|K AH0 N S K R IH1 P T|K AA1 N S K R IH0 P T|V 79 | CONSCRIPTS|K AH0 N S K R IH1 P T S|K AA1 N S K R IH0 P T S|V 80 | CONSOLE|K AH0 N S OW1 L|K AA1 N S OW0 L|V 81 | CONSOLES|K AH0 N S OW1 L Z|K AA1 N S OW0 L Z|V 82 | CONSORT|K AH0 N S AO1 R T|K AA1 N S AO0 R T|V 83 | CONSTRUCT|K AH0 N S T R AH1 K T|K AA1 N S T R AH0 K T|V 84 | CONSTRUCTS|K AH0 N S T R AH1 K T S|K AA1 N S T R AH0 K T S|V 85 | CONSUMMATE|K AA1 N S AH0 M EY2 T|K AA0 N S AH1 M AH0 T|V 86 | CONTENT|K AA1 N T EH0 N T|K AH0 N T EH1 N T|N 87 | CONTENTS|K AH0 N T EH1 N T S|K AA1 N T EH0 N T S|V 88 | CONTEST|K AH0 N T EH1 S T|K AA1 N T EH0 S T|V 89 | CONTESTS|K AH0 N T EH1 S T S|K AA1 N T EH0 S T S|V 90 | CONTRACT|K AH0 N T R AE1 K T|K AA1 N T R AE2 K T|V 91 | CONTRACTS|K AH0 N T R AE1 K T S|K AA1 N T R AE2 K T S|V 92 | CONTRAST|K AH0 N T R AE1 S T|K AA1 N T R AE0 S T|V 93 | CONTRASTS|K AH0 N T R AE1 S T S|K AA1 N T R AE0 S T S|V 94 | CONVERSE|K AH0 N V ER1 S|K AA1 N V ER0 S|V 95 | CONVERT|K AH0 N V ER1 T|K AA1 N V ER0 T|V 96 | CONVERTS|K AH0 N V ER1 T S|K AA1 N V ER0 T S|V 97 | CONVICT|K AH0 N V IH1 K T|K AA1 N V IH0 K T|V 98 | CONVICTS|K AH0 N V IH1 K T 
S|K AA1 N V IH0 K T S|V 99 | COORDINATE|K OW0 AO1 R D AH0 N EY2 T|K OW0 AO1 R D AH0 N AH0 T|V 100 | COORDINATES|K OW0 AO1 R D AH0 N EY2 T S|K OW0 AO1 R D AH0 N AH0 T S|V 101 | COUNTERBALANCE|K AW1 N T ER0 B AE2 L AH0 N S|K AW2 N T ER0 B AE1 L AH0 N S|V 102 | COUNTERBALANCES|K AW2 N T ER0 B AE1 L AH0 N S IH0 Z|K AW1 N T ER0 B AE2 L AH0 N S IH0 Z|V 103 | CRABBED|K R AE1 B D|K R AE1 B IH0 D|V 104 | CROOKED|K R UH1 K T|K R UH1 K AH0 D|V 105 | CURATE|K Y UH0 R AH1 T|K Y UH1 R AH0 T|V 106 | CURSED|K ER1 S T|K ER1 S IH0 D|V 107 | DECOY|D IY0 K OY1|D IY1 K OY0|V 108 | DECOYS|D IY0 K OY1 Z|D IY1 K OY0 Z|V 109 | DECREASE|D IH0 K R IY1 S|D IY1 K R IY2 S|V 110 | DECREASES|D IH0 K R IY1 S IH0 Z|D IY1 K R IY2 S IH0 Z|V 111 | DEFECT|D IH0 F EH1 K T|D IY1 F EH0 K T|V 112 | DEFECTS|D IH0 F EH1 K T S|D IY1 F EH0 K T S|V 113 | DEGENERATE|D IH0 JH EH1 N ER0 EY2 T|D IH0 JH EH1 N ER0 AH0 T|V 114 | DEGENERATES|D IH0 JH EH1 N ER0 EY2 T S|D IH0 JH EH1 N ER0 AH0 T S|V 115 | DELEGATE|D EH1 L AH0 G EY2 T|D EH1 L AH0 G AH0 T|V 116 | DELEGATES|D EH1 L AH0 G EY2 T S|D EH1 L AH0 G AH0 T S|V 117 | DELIBERATE|D IH0 L IH1 B ER0 EY2 T|D IH0 L IH1 B ER0 AH0 T|V 118 | DESERT|D IH0 Z ER1 T|D EH1 Z ER0 T|V 119 | DESERTS|D IH0 Z ER1 T S|D EH1 Z ER0 T S|V 120 | DESOLATE|D EH1 S AH0 L EY2 T|D EH1 S AH0 L AH0 T|V 121 | DIAGNOSES|D AY1 AH0 G N OW2 Z IY0 Z|D AY2 AH0 G N OW1 S IY0 Z|V 122 | DICTATE|D IH0 K T EY1 T|D IH1 K T EY2 T|V 123 | DICTATES|D IH0 K T EY1 T S|D IH1 K T EY2 T S|V 124 | DIFFUSE|D IH0 F Y UW1 Z|D IH0 F Y UW1 S|V 125 | DIGEST|D AY0 JH EH1 S T|D AY1 JH EH0 S T|V 126 | DIGESTS|D AY2 JH EH1 S T S|D AY1 JH EH0 S T S|V 127 | DISCARD|D IH0 S K AA1 R D|D IH1 S K AA0 R D|V 128 | DISCARDS|D IH0 S K AA1 R D Z|D IH1 S K AA0 R D Z|V 129 | DISCHARGE|D IH0 S CH AA1 R JH|D IH1 S CH AA2 R JH|V 130 | DISCHARGES|D IH0 S CH AA1 R JH AH0 Z|D IH1 S CH AA2 R JH AH0 Z|V 131 | DISCOUNT|D IH0 S K AW1 N T|D IH1 S K AW0 N T|V 132 | DISCOUNTS|D IH0 S K AW1 N T S|D IH1 S K AW2 N T S|V 133 | DISCOURSE|D IH0 S K AO1 R S|D 
IH1 S K AO0 R S|V 134 | DISCOURSES|D IH0 S K AO1 R S IH0 Z|D IH1 S K AO0 R S IH0 Z|V 135 | DOCUMENT|D AA1 K Y UW0 M EH0 N T|D AA1 K Y AH0 M AH0 N T|V 136 | DOCUMENTS|D AA1 K Y UW0 M EH0 N T S|D AA1 K Y AH0 M AH0 N T S|V 137 | DOGGED|D AO1 G IH0 D|D AO1 G D|V 138 | DUPLICATE|D UW1 P L AH0 K EY2 T|D UW1 P L AH0 K AH0 T|V 139 | DUPLICATES|D UW1 P L AH0 K EY2 T S|D UW1 P L AH0 K AH0 T S|V 140 | EJACULATE|IH0 JH AE1 K Y UW0 L EY2 T|IH0 JH AE1 K Y UW0 L AH0 T|V 141 | EJACULATES|IH0 JH AE1 K Y UW0 L EY2 T S|IH0 JH AE1 K Y UW0 L AH0 T S|V 142 | ELABORATE|IH0 L AE1 B ER0 EY2 T|IH0 L AE1 B R AH0 T|V 143 | ENTRANCE|IH0 N T R AH1 N S|EH1 N T R AH0 N S|V 144 | ENTRANCES|IH0 N T R AH1 N S AH0 Z|EH1 N T R AH0 N S AH0 Z|V 145 | ENVELOPE|IH0 N V EH1 L AH0 P|EH1 N V AH0 L OW2 P|V 146 | ENVELOPES|IH0 N V EH1 L AH0 P S|EH1 N V AH0 L OW2 P S|V 147 | ESCORT|EH0 S K AO1 R T|EH1 S K AO0 R T|V 148 | ESCORTS|EH0 S K AO1 R T S|EH1 S K AO0 R T S|V 149 | ESSAY|EH0 S EY1|EH1 S EY2|V 150 | ESSAYS|EH0 S EY1 Z|EH1 S EY2 Z|V 151 | ESTIMATE|EH1 S T AH0 M EY2 T|EH1 S T AH0 M AH0 T|V 152 | ESTIMATES|EH1 S T AH0 M EY2 T S|EH1 S T AH0 M AH0 T S|V 153 | EXCESS|IH0 K S EH1 S|EH1 K S EH2 S|V 154 | EXCISE|EH0 K S AY1 S|EH1 K S AY0 Z|V 155 | EXCUSE|IH0 K S K Y UW1 Z|IH0 K S K Y UW1 S|V 156 | EXCUSES|IH0 K S K Y UW1 Z IH0 Z|IH0 K S K Y UW1 S IH0 Z|V 157 | EXPATRIATE|EH0 K S P EY1 T R IY0 EY2 T|EH0 K S P EY1 T R IY0 AH0 T|V 158 | EXPATRIATES|EH0 K S P EY1 T R IY0 EY2 T S|EH0 K S P EY1 T R IY0 AH0 T S|V 159 | EXPLOIT|EH1 K S P L OY2 T|EH2 K S P L OY1 T|V 160 | EXPLOITS|EH1 K S P L OY2 T S|EH2 K S P L OY1 T S|V 161 | EXPORT|IH0 K S P AO1 R T|EH1 K S P AO0 R T|V 162 | EXPORTS|IH0 K S P AO1 R T S|EH1 K S P AO0 R T S|V 163 | EXTRACT|IH0 K S T R AE1 K T|EH1 K S T R AE2 K T|V 164 | EXTRACTS|IH0 K S T R AE1 K T S|EH1 K S T R AE2 K T S|V 165 | FERMENT|F ER0 M EH1 N T|F ER1 M EH0 N T|V 166 | FERMENTS|F ER0 M EH1 N T S|F ER1 M EH0 N T S|V 167 | FRAGMENT|F R AE1 G M AH0 N T|F R AE0 G M EH1 N T|V 168 | FRAGMENTS|F R AE0 G 
M EH1 N T S|F R AE1 G M AH0 N T S|V 169 | FREQUENT|F R IY1 K W EH2 N T|F R IY1 K W AH0 N T|V 170 | GRADUATE|G R AE1 JH AH0 W EY2 T|G R AE1 JH AH0 W AH0 T|V 171 | GRADUATES|G R AE1 JH AH0 W EY2 T S|G R AE1 JH AH0 W AH0 T S|V 172 | HOUSE|HH AW1 Z|HH AW1 S|V 173 | IMPACT|IH2 M P AE1 K T|IH1 M P AE0 K T|V 174 | IMPACTS|IH2 M P AE1 K T S|IH1 M P AE0 K T S|V 175 | IMPLANT|IH2 M P L AE1 N T|IH1 M P L AE2 N T|V 176 | IMPLANTS|IH2 M P L AE1 N T S|IH1 M P L AE2 N T S|V 177 | IMPLEMENT|IH1 M P L AH0 M EH0 N T|IH1 M P L AH0 M AH0 N T|V 178 | IMPLEMENTS|IH1 M P L AH0 M EH0 N T S|IH1 M P L AH0 M AH0 N T S|V 179 | IMPORT|IH2 M P AO1 R T|IH1 M P AO2 R T|V 180 | IMPORTS|IH2 M P AO1 R T S|IH1 M P AO2 R T S|V 181 | IMPRESS|IH0 M P R EH1 S|IH1 M P R EH0 S|V 182 | IMPRINT|IH1 M P R IH0 N T|IH2 M P R IH1 N T|V 183 | IMPRINTS|IH2 M P R IH1 N T S|IH1 M P R IH0 N T S|V 184 | INCENSE|IH2 N S EH1 N S|IH1 N S EH2 N S|V 185 | INCLINE|IH2 N K L AY1 N|IH1 N K L AY0 N|V 186 | INCLINES|IH2 N K L AY1 N Z|IH1 N K L AY0 N Z|V 187 | INCORPORATE|IH2 N K AO1 R P ER0 EY2 T|IH2 N K AO1 R P ER0 AH0 T|V 188 | INCREASE|IH2 N K R IY1 S|IH1 N K R IY2 S|V 189 | INCREASES|IH2 N K R IY1 S IH0 Z|IH1 N K R IY2 S IH0 Z|V 190 | INDENT|IH2 N D EH1 N T|IH1 N D EH0 N T|V 191 | INDENTS|IH2 N D EH1 N T S|IH1 N D EH0 N T S|V 192 | INEBRIATE|IH2 N EH1 B R IY0 EY2 T|IH2 N EH1 B R IY0 AH0 T|V 193 | INEBRIATES|IH2 N EH1 B R IY0 EY2 T S|IH2 N EH1 B R IY0 AH0 T S|V 194 | INITIATE|IH2 N IH1 SH IY0 EY2 T|IH2 N IH1 SH IY0 AH0 T|V 195 | INITIATES|IH2 N IH1 SH IY0 EY2 T S|IH2 N IH1 SH IY0 AH0 T S|V 196 | INLAY|IH2 N L EY1|IH1 N L EY2|V 197 | INLAYS|IH2 N L EY1 Z|IH1 N L EY2 Z|V 198 | INSERT|IH2 N S ER1 T|IH1 N S ER2 T|V 199 | INSERTS|IH2 N S ER1 T S|IH1 N S ER2 T S|V 200 | INSET|IH2 N S EH1 T|IH1 N S EH2 T|V 201 | INSETS|IH2 N S EH1 T S|IH1 N S EH2 T S|V 202 | INSTINCT|IH2 N S T IH1 NG K T|IH1 N S T IH0 NG K T|V 203 | INSULT|IH2 N S AH1 L T|IH1 N S AH2 L T|V 204 | INSULTS|IH2 N S AH1 L T S|IH1 N S AH2 L T S|V 205 | INTERCHANGE|IH2 T 
ER0 CH EY1 N JH|IH1 N T ER0 CH EY2 N JH|V 206 | INTERCHANGES|IH2 T ER0 CH EY1 N JH IH0 Z|IH1 N T ER0 CH EY2 N JH IH0 Z|V 207 | INTERDICT|IH2 N T ER0 D IH1 K T|IH1 N T ER0 D IH2 K T|V 208 | INTERDICTS|IH2 N T ER0 D IH1 K T S|IH1 N T ER0 D IH2 K T S|V 209 | INTERN|IH0 N T ER1 N|IH1 N T ER0 N|V 210 | INTERNS|IH0 N T ER1 N Z|IH1 N T ER0 N Z|V 211 | INTIMATE|IH1 N T IH0 M EY2 T|IH1 N T AH0 M AH0 T|V 212 | INTIMATES|IH1 N T IH0 M EY2 T S|IH1 N T AH0 M AH0 T S|V 213 | INTROVERT|IH2 N T R AO0 V ER1 T|IH1 N T R AO0 V ER2 T|V 214 | INTROVERTS|IH2 N T R AO0 V ER1 T S|IH1 N T R AO0 V ER2 T S|V 215 | INVERSE|IH1 N V ER0 S|IH2 N V ER1 S|V 216 | INVITE|IH2 N V AY1 T|IH1 N V AY0 T|V 217 | INVITES|IH2 N V AY1 T S|IH1 N V AY0 T S|V 218 | JAGGED|JH AE1 G D|JH AE1 G IH0 D|V 219 | LEARNED|L ER1 N IH0 D|L ER1 N D|V 220 | LEGITIMATE|L AH0 JH IH1 T AH0 M EY2 T|L AH0 JH IH1 T AH0 M AH0 T|V 221 | MANDATE|M AE1 N D EY2 T|M AE2 N D EY1 T|V 222 | MISCONDUCT|M IH2 S K AA1 N D AH0 K T|M IH2 S K AA0 N D AH1 K T|V 223 | MISPRINT|M IH2 S P R IH1 N T|M IH1 S P R IH0 N T|V 224 | MISPRINTS|M IH2 S P R IH1 N T S|M IH1 S P R IH0 N T S|V 225 | MISUSE|M IH0 S Y UW1 S|M IH0 S Y UW1 Z|V 226 | MISUSES|M IH0 S Y UW1 Z IH0 Z|M IH0 S Y UW1 S IH0 Z|V 227 | MODERATE|M AA1 D ER0 EY2 T|M AA1 D ER0 AH0 T|V 228 | MODERATES|M AA1 D ER0 EY2 T S|M AA1 D ER0 AH0 T S|V 229 | MOUTH|M AW1 TH|M AW1 DH|V 230 | MOUTHS|M AW1 DH Z|M AW1 TH S|V 231 | OBJECT|AA1 B JH EH0 K T|AH0 B JH EH1 K T|V 232 | OBJECTS|AH0 B JH EH1 K T S|AA1 B JH EH0 K T S|V 233 | ORNAMENT|AO1 R N AH0 M EH0 N T|AO1 R N AH0 M AH0 N T|V 234 | ORNAMENTS|AO1 R N AH0 M EH0 N T S|AO1 R N AH0 M AH0 N T S|V 235 | OVERCHARGE|OW2 V ER0 CH AA1 R JH|OW1 V ER0 CH AA2 R JH|V 236 | OVERCHARGES|OW2 V ER0 CH AA1 R JH IH0 Z|OW1 V ER0 CH AA2 R JH IH0 Z|V 237 | OVERFLOW|OW2 V ER0 F L OW1|OW1 V ER0 F L OW2|V 238 | OVERFLOWS|OW2 V ER0 F L OW1 Z|OW1 V ER0 F L OW2 Z|V 239 | OVERHANG|OW2 V ER0 HH AE1 NG|OW1 V ER0 HH AE2 NG|V 240 | OVERHANGS|OW2 V ER0 HH AE1 NG Z|OW1 V ER0 HH AE2 NG 
Z|V 241 | OVERHAUL|OW2 V ER0 HH AO1 L|OW1 V ER0 HH AO2 L|V 242 | OVERHAULS|OW2 V ER0 HH AO1 L Z|OW1 V ER0 HH AO2 L Z|V 243 | OVERLAP|OW2 V ER0 L AE1 P|OW1 V ER0 L AE2 P|V 244 | OVERLAPS|OW2 V ER0 L AE1 P S|OW1 V ER0 L AE2 P S|V 245 | OVERLAY|OW2 V ER0 L EY1|OW1 V ER0 L EY2|V 246 | OVERLAYS|OW2 V ER0 L EY1 Z|OW1 V ER0 L EY2 Z|V 247 | OVERWORK|OW2 V ER0 W ER1 K|OW1 V ER0 W ER2 K|V 248 | PERFECT|P ER0 F EH1 K T|P ER1 F IH2 K T|V 249 | PERFUME|P ER0 F Y UW1 M|P ER1 F Y UW0 M|V 250 | PERFUMES|P ER0 F Y UW1 M Z|P ER1 F Y UW0 M Z|V 251 | PERMIT|P ER0 M IH1 T|P ER1 M IH2 T|V 252 | PERMITS|P ER0 M IH1 T S|P ER1 M IH2 T S|V 253 | PERVERT|P ER0 V ER1 T|P ER1 V ER0 T|V 254 | PERVERTS|P ER0 V ER1 T S|P ER1 V ER0 T S|V 255 | PONTIFICATE|P AA0 N T IH1 F AH0 K AH0 T|P AA0 N T IH1 F AH0 K EY2 T|V 256 | PONTIFICATES|P AA0 N T IH1 F AH0 K EY2 T S|P AA0 N T IH1 F AH0 K AH0 T S|V 257 | PRECIPITATE|P R IH0 S IH1 P IH0 T AH0 T|P R IH0 S IH1 P IH0 T EY2 T|V 258 | PREDICATE|P R EH1 D IH0 K AH0 T|P R EH1 D AH0 K EY2 T|V 259 | PREDICATES|P R EH1 D AH0 K EY2 T S|P R EH1 D IH0 K AH0 T S|V 260 | PREFIX|P R IY2 F IH1 K S|P R IY1 F IH0 K S|V 261 | PREFIXES|P R IY2 F IH1 K S IH0 JH|P R IY1 F IH0 K S IH0 JH|V 262 | PRESAGE|P R EH2 S IH1 JH|P R EH1 S IH0 JH|V 263 | PRESAGES|P R EH2 S IH1 JH IH0 JH|P R EH1 S IH0 JH IH0 JH|V 264 | PRESENT|P R IY0 Z EH1 N T|P R EH1 Z AH0 N T|V 265 | PRESENTS|P R IY0 Z EH1 N T S|P R EH1 Z AH0 N T S|V 266 | PROCEEDS|P R AH0 S IY1 D Z|P R OW1 S IY0 D Z|V 267 | PROCESS|P R AO2 S EH1 S|P R AA1 S EH2 S|V 268 | PROCESSES|P R AA1 S EH0 S AH0 Z|P R AO2 S EH1 S AH0 Z|V 269 | PROCESSING|P R AA0 S EH1 S IH0 NG|P R AA1 S EH0 S IH0 NG|V 270 | PRODUCE|P R AH0 D UW1 S|P R OW1 D UW0 S|V 271 | PROGRESS|P R AH0 G R EH1 S|P R AA1 G R EH2 S|V 272 | PROGRESSES|P R OW0 G R EH1 S AH0 Z|P R AA1 G R EH2 S AH0 Z|V 273 | PROJECT|P R AA0 JH EH1 K T|P R AA1 JH EH0 K T|V 274 | PROJECTS|P R AA0 JH EH1 K T S|P R AA1 JH EH0 K T S|V 275 | PROSPECT|P R AH2 S P EH1 K T|P R AA1 S P EH0 K T|V 276 | 
PROSPECTS|P R AH2 S P EH1 K T S|P R AA1 S P EH0 K T S|V 277 | PROSTRATE|P R AA0 S T R EY1 T|P R AA1 S T R EY0 T|V 278 | PROTEST|P R AH0 T EH1 S T|P R OW1 T EH2 S T|V 279 | PROTESTS|P R AH0 T EH1 S T S|P R OW1 T EH2 S T S|V 280 | PURPORT|P ER0 P AO1 R T|P ER1 P AO2 R T|V 281 | QUADRUPLE|K W AA1 D R UW0 P AH0 L|K W AA0 D R UW1 P AH0 L|V 282 | QUADRUPLES|K W AA0 D R UW1 P AH0 L Z|K W AA1 D R UW0 P AH0 L Z|V 283 | RAGGED|R AE1 G D|R AE1 G AH0 D|V 284 | RAMPAGE|R AE2 M P EY1 JH|R AE1 M P EY2 JH|V 285 | RAMPAGES|R AE2 M P EY1 JH IH0 Z|R AE1 M P EY2 JH IH0 Z|V 286 | READ|R IY1 D|R EH1 D|VBD 287 | REBEL|R EH1 B AH0 L|R IH0 B EH1 L|V 288 | REBELS|R IH0 B EH1 L Z|R EH1 B AH0 L Z|V 289 | REBOUND|R IY0 B AW1 N D|R IY1 B AW0 N D|V 290 | REBOUNDS|R IY0 B AW1 N D Z|R IY1 B AW0 N D Z|V 291 | RECALL|R IH0 K AO1 L|R IY1 K AO2 L|V 292 | RECALLS|R IH0 K AO1 L Z|R IY1 K AO2 L Z|V 293 | RECAP|R IH0 K AE1 P|R IY1 K AE2 P|V 294 | RECAPPED|R IH0 K AE1 P T|R IY1 K AE2 P T|V 295 | RECAPPING|R IH0 K AE1 P IH0 NG|R IY1 K AE2 P IH0 NG|V 296 | RECAPS|R IH0 K AE1 P S|R IY1 K AE2 P S|V 297 | RECOUNT|R IY2 K AW1 N T| R IH1 K AW0 N T|V 298 | RECOUNTS|R IY2 K AW1 N T S| R IH1 K AW0 N T S|V 299 | RECORD|R IH0 K AO1 R D|R EH1 K ER0 D|V 300 | RECORDS|R IH0 K AO1 R D Z|R EH1 K ER0 D Z|V 301 | REFILL|R IY0 F IH1 L|R IY1 F IH0 L|V 302 | REFILLS|R IY0 F IH1 L Z|R IY1 F IH0 L Z|V 303 | REFIT|R IY0 F IH1 T|R IY1 F IH0 T|V 304 | REFITS|R IY0 F IH1 T S|R IY1 F IH0 T S|V 305 | REFRESH|R IH0 F R EH1 SH|R IH1 F R EH0 SH|V 306 | REFUND|R IH0 F AH1 N D|R IY1 F AH2 N D|V 307 | REFUNDS|R IH0 F AH1 N D Z|R IY1 F AH2 N D Z|V 308 | REFUSE|R IH0 F Y UW1 Z|R EH1 F Y UW2 Z|V 309 | REGENERATE|R IY0 JH EH1 N ER0 EY2 T|R IY0 JH EH1 N ER0 AH0 T|V 310 | REHASH|R IY0 HH AE1 SH|R IY1 HH AE0 SH|V 311 | REHASHES|R IY0 HH AE1 SH IH0 Z|R IY1 HH AE0 SH IH0 Z|V 312 | REINCARNATE|R IY2 IH0 N K AA1 R N EY2 T|R IY2 IH0 N K AA1 R N AH0 T|V 313 | REJECT|R IH0 JH EH1 K T|R IY1 JH EH0 K T|V 314 | REJECTS|R IH0 JH EH1 K T S|R IY1 JH EH0 K T S|V 
315 | RELAY|R IY2 L EY1|R IY1 L EY2|V 316 | RELAYING|R IY2 L EY1 IH0 NG|R IY1 L EY2 IH0 NG|V 317 | RELAYS|R IY2 L EY1 Z|R IY1 L EY2 Z|V 318 | REMAKE|R IY2 M EY1 K|R IY1 M EY0 K|V 319 | REMAKES|R IY2 M EY1 K S|R IY1 M EY0 K S|V 320 | REPLAY|R IY0 P L EY1|R IY1 P L EY0|V 321 | REPLAYS|R IY0 P L EY1 Z|R IY1 P L EY0 Z|V 322 | REPRINT|R IY0 P R IH1 N T|R IY1 P R IH0 N T|V 323 | REPRINTS|R IY0 P R IH1 N T S|R IY1 P R IH0 N T S|V 324 | RERUN|R IY2 R AH1 N|R IY1 R AH0 N|V 325 | RERUNS|R IY2 R AH1 N Z|R IY1 R AH0 N Z|V 326 | RESUME|R IY0 Z UW1 M|R EH1 Z AH0 M EY2|V 327 | RETAKE|R IY0 T EY1 K|R IY1 T EY0 K|V 328 | RETAKES|R IY0 T EY1 K S|R IY1 T EY0 K S|V 329 | RETHINK|R IY2 TH IH1 NG K|R IY1 TH IH0 NG K|V 330 | RETHINKS|R IY2 TH IH1 NG K S|R IY1 TH IH0 NG K S|V 331 | RETREAD|R IY2 T R EH1 D|R IY1 T R EH0 D|V 332 | RETREADS|R IY2 T R EH1 D Z|R IY1 T R EH0 D Z|V 333 | REWRITE|R IY0 R AY1 T|R IY1 R AY2 T|V 334 | REWRITES|R IY0 R AY1 T S|R IY1 R AY2 T S|V 335 | SEGMENT|S EH1 G M AH0 N T|S EH2 G M EH1 N T|V 336 | SEGMENTS|S EH2 G M EH1 N T S|S EH1 G M AH0 N T S|V 337 | SEPARATE|S EH1 P ER0 EY2 T|S EH1 P ER0 IH0 T|V 338 | SEPARATES|S EH1 P ER0 EY2 T S|S EH1 P ER0 IH0 T S|V 339 | SUBCONTRACT|S AH0 B K AA1 N T R AE2 K T|S AH2 B K AA0 N T R AE1 K T|V 340 | SUBCONTRACTS|S AH2 B K AA0 N T R AE1 K T S|S AH0 B K AA1 N T R AE2 K T S|V 341 | SUBJECT|S AH0 B JH EH1 K T|S AH1 B JH IH0 K T|V 342 | SUBJECTS|S AH0 B JH EH1 K T S|S AH1 B JH IH0 K T S|V 343 | SUBORDINATE|S AH0 B AO1 R D AH0 N EY2 T|S AH0 B AO1 R D AH0 N AH0 T|V 344 | SUBORDINATES|S AH0 B AO1 R D AH0 N EY2 T S|S AH0 B AO1 R D AH0 N AH0 T S|V 345 | SUPPLEMENT|S AH1 P L AH0 M EH0 N T|S AH1 P L AH0 M AH0 N T|V 346 | SUPPLEMENTS|S AH1 P L AH0 M EH0 N T S|S AH1 P L AH0 M AH0 N T S|V 347 | SURMISE|S ER0 M AY1 Z|S ER1 M AY0 Z|V 348 | SURMISES|S ER0 M AY1 Z IH0 Z|S ER1 M AY0 Z IH0 Z|V 349 | SURVEY|S ER0 V EY1|S ER1 V EY2|V 350 | SURVEYS|S ER0 V EY1 Z|S ER1 V EY2 Z|V 351 | SUSPECT|S AH0 S P EH1 K T|S AH1 S P EH2 K T|V 352 | SUSPECTS|S AH0 
S P EH1 K T S|S AH1 S P EH2 K T S|V 353 | SYNDICATE|S IH1 N D AH0 K EY2 T|S IH1 N D IH0 K AH0 T|V 354 | SYNDICATES|S IH1 N D IH0 K EY2 T S|S IH1 N D IH0 K AH0 T S|V 355 | TORMENT|T AO1 R M EH2 N T|T AO0 R M EH1 N T|V 356 | TRANSFER|T R AE0 N S F ER1|T R AE1 N S F ER0|V 357 | TRANSFERS|T R AE0 N S F ER1 Z|T R AE1 N S F ER0 Z|V 358 | TRANSPLANT|T R AE0 N S P L AE1 N T|T R AE1 N S P L AE0 N T|V 359 | TRANSPLANTS|T R AE0 N S P L AE1 N T S|T R AE1 N S P L AE0 N T S|V 360 | TRANSPORT|T R AE0 N S P AO1 R T|T R AE1 N S P AO0 R T|V 361 | TRANSPORTS|T R AE0 N S P AO1 R T S|T R AE1 N S P AO0 R T S|V 362 | TRIPLICATE|T R IH1 P L IH0 K EY2 T|T R IH1 P L IH0 K AH0 T|V 363 | TRIPLICATES|T R IH1 P L IH0 K EY2 T S|T R IH1 P L IH0 K AH0 T S|V 364 | UNDERCUT|AH2 N D ER0 K AH1 T|AH1 N D ER0 K AH2 T|V 365 | UNDERESTIMATE|AH1 N D ER0 EH1 S T AH0 M EY2 T|AH1 N D ER0 EH1 S T AH0 M AH0 T|V 366 | UNDERESTIMATES|AH1 N D ER0 EH1 S T AH0 M EY2 T S|AH1 N D ER0 EH1 S T AH0 M AH0 T S|V 367 | UNDERLINE|AH2 N D ER0 L AY1 N|AH1 N D ER0 L AY2 N|V 368 | UNDERLINES|AH2 N D ER0 L AY1 N Z|AH1 N D ER0 L AY2 N Z|V 369 | UNDERTAKING|AH2 N D ER0 T EY1 K IH0 NG|AH1 N D ER0 T EY2 K IH0 NG|V 370 | UNDERTAKINGS|AH2 N D ER0 T EY1 K IH0 NG Z|AH1 N D ER0 T EY2 K IH0 NG Z|V 371 | UNUSED|AH0 N Y UW1 Z D|AH0 N Y UW1 S T|V 372 | UPGRADE|AH0 P G R EY1 D|AH1 P G R EY0 D|V 373 | UPGRADES|AH0 P G R EY1 D Z|AH1 P G R EY0 D Z|V 374 | UPLIFT|AH2 P L IH1 F T|AH1 P L IH0 F T|V 375 | UPSET|AH0 P S EH1 T|AH1 P S EH2 T|V 376 | UPSETS|AH0 P S EH1 T S|AH1 P S EH2 T S|V 377 | USE|Y UW1 Z|Y UW1 S|V 378 | USED|Y UW1 Z D|Y UW1 S T|VBN 379 | USES|Y UW1 Z IH0 Z|Y UW1 S IH0 Z|V -------------------------------------------------------------------------------- /g2p_zh_en/g2p_zh_en.py: -------------------------------------------------------------------------------- 1 | from g2p_zh_en.mapper import EnMapper,PinyinMapper 2 | from pypinyin import pinyin, lazy_pinyin, Style 3 | from typing import List 4 | import cn2an 5 | import re 6 | from 
.g2p_en import G2p 7 | g2p = G2p() 8 | 9 | 10 | """Main module.""" 11 | class G2P: 12 | def __init__(self): 13 | self.pinyin_mapper = PinyinMapper() 14 | self.en_mapper = EnMapper() 15 | 16 | #fallback for text that pypinyin cannot handle: forward it to g2p_en 17 | def __no_pinyin(self,ch): 18 | ch = ch.lower() 19 | ch = re.sub(r"[^ a-z'.,?!\-。,!]", "", ch) 20 | pout = g2p(ch) 21 | out = self.en_mapper.convert(pout) 22 | return out 23 | 24 | #fallback for tokens that g2p_en cannot handle: convert them via pinyin 25 | def __no_eng(self,ch): 26 | text = cn2an.transform(ch, "an2cn") 27 | pout = lazy_pinyin(text, style=Style.TONE3, neutral_tone_with_five=True) 28 | out = self.pinyin_mapper.convert(pout) 29 | return out 30 | 31 | def g2p(self,language='zh-cn',text:str=None) -> List: 32 | if language == 'zh-cn': 33 | #convert Arabic numerals to Chinese numerals first 34 | text = cn2an.transform(text, "an2cn") 35 | pout = lazy_pinyin(text, style=Style.TONE3, neutral_tone_with_five=True, errors=self.__no_pinyin) 36 | out = self.pinyin_mapper.convert(pout) 37 | return out 38 | elif language == 'en-us': 39 | #try to transform Chinese characters to pinyin 40 | # carr = lazy_pinyin(text, style=Style.TONE3, neutral_tone_with_five=True) 41 | # text = " ".join([s.strip() for s in carr]) 42 | pout = g2p(text,no_handler=self.__no_eng) 43 | out = self.en_mapper.convert(pout) 44 | return out 45 | else: 46 | raise Exception(f"language={language} not supported") 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /g2p_zh_en/map_data/ARPA2IPA.map: -------------------------------------------------------------------------------- 1 | AA0 a 2 | AA1 a 3 | AA2 a 4 | AE0 æ 5 | AE1 æ 6 | AE2 æ 7 | AH0 ə 8 | AH1 ʌ 9 | AH2 ʌ 10 | AO0 ɔ 11 | AO1 ɔ 12 | AO2 ɔ 13 | AW0 au 14 | AW1 au 15 | AW2 au 16 | AY0 ai 17 | AY1 ai 18 | AY2 ai 19 | B b 20 | CH ch 21 | D d 22 | DH ð 23 | EH0 e 24 | EH1 e 25 | EH2 e 26 | ER0 ər 27 | ER1 ər 28 | ER2 ər 29 | EY0 ei 30 | EY1 ei 31 | EY2 ei 32 | F f 33 | G g 34 | HH h 35 | IH0 i 36 | IH1 i 37 | IH2 i 38 | IY0 ii 39 |
IY1 ii 40 | IY2 ii 41 | JH zh 42 | K k 43 | L l 44 | M m 45 | N n 46 | NG ŋ 47 | OW0 əu 48 | OW1 əu 49 | OW2 əu 50 | OY0 ɔi 51 | OY1 ɔi 52 | OY2 ɔi 53 | P p 54 | R r 55 | S s 56 | SH sh 57 | T t 58 | TH θ 59 | UH0 u 60 | UH1 u 61 | UH2 u 62 | UW0 uu 63 | UW1 uu 64 | UW2 uu 65 | V v 66 | W w 67 | Y y 68 | Z z 69 | ZH ʒ 70 | -------------------------------------------------------------------------------- /g2p_zh_en/map_data/pinyin_to_phone.txt: -------------------------------------------------------------------------------- 1 | a a 2 | ai ai 3 | an an 4 | ang aŋ 5 | ao au 6 | ba b a 7 | bai b ai 8 | ban b an 9 | bang b aŋ 10 | bao b au 11 | bei b ei 12 | ben b ən 13 | beng b əŋ 14 | bi b ii 15 | bian b i an 16 | biao b i au 17 | bie b ie 18 | bin b in 19 | bing b iŋ 20 | bo b uɔ 21 | bu b uu 22 | ca ts a 23 | cai ts ai 24 | can ts an 25 | cang ts aŋ 26 | cao ts au 27 | ce ts ə 28 | cei ts ei 29 | cen ts ən 30 | ceng ts əŋ 31 | cha ch a 32 | chai ch ai 33 | chan ch an 34 | chang ch aŋ 35 | chao ch au 36 | che ch ə 37 | chen ch ən 38 | cheng ch əŋ 39 | chi ch iii 40 | chong ch uŋ 41 | chou ch əu 42 | chu ch uu 43 | chua ch u a 44 | chuai ch u ai 45 | chuan ch u an 46 | chuang ch u aŋ 47 | chui ch u ei 48 | chun ch u ən 49 | chuo ch uɔ 50 | ci ts iii 51 | cong ts uŋ 52 | cou ts əu 53 | cu ts uu 54 | cuan ts u an 55 | cui ts u ei 56 | cun ts u ən 57 | cuo ts uɔ 58 | da d a 59 | dai d ai 60 | dan d an 61 | dang d aŋ 62 | dao d au 63 | de d ə 64 | dei d ei 65 | den d ən 66 | deng d əŋ 67 | di d ii 68 | dia d i a 69 | dian d i an 70 | diao d i au 71 | die d ie 72 | ding d iŋ 73 | diu d i əu 74 | dong d uŋ 75 | dou d əu 76 | du d uu 77 | duan d u an 78 | dui d u ei 79 | dun d u ən 80 | duo d uɔ 81 | e ə 82 | ei ei 83 | en ən 84 | eng əŋ 85 | er ər 86 | fa f a 87 | fan f an 88 | fang f aŋ 89 | fei f ei 90 | fen f ən 91 | feng f əŋ 92 | fo f uɔ 93 | fou f əu 94 | fu f uu 95 | ga g a 96 | gai g ai 97 | gan g an 98 | gang g aŋ 99 | gao g au 100 | ge g ə 101 | gei g ei 102 | gen 
g ən
geng g əŋ
gong g uŋ
gou g əu
gu g uu
gua g u a
guai g u ai
guan g u an
guang g u aŋ
gui g u ei
gun g u ən
guo g uɔ
ha h a
hai h ai
han h an
hang h aŋ
hao h au
he h ə
hei h ei
hen h ən
heng h əŋ
hong h uŋ
hou h əu
hu h uu
hua h u a
huai h u ai
huan h u an
huang h u aŋ
hui h u ei
hun h u ən
huo h uɔ
ji j ii
jia j i a
jian j i an
jiang j i aŋ
jiao j i au
jie j ie
jin j in
jing j iŋ
jiong j i uŋ
jiu j i əu
ju j yu
juan j yu an
jue j yue
jun j yu n
ka k a
kai k ai
kan k an
kang k aŋ
kao k au
ke k ə
kei k ei
ken k ən
keng k əŋ
kiu k i əu
kong k uŋ
kou k əu
ku k uu
kua k u a
kuai k u ai
kuan k u an
kuang k u aŋ
kui k u ei
kun k u ən
kuo k uɔ
la l a
lai l ai
lan l an
lang l aŋ
lao l au
le l ə
lei l ei
leng l əŋ
li l ii
lia l i a
lian l i an
liang l i aŋ
liao l i au
lie l ie
lin l in
ling l iŋ
liu l i əu
long l uŋ
lou l əu
lu l uu
luan l u an
lun l u ən
luo l uɔ
lv l yu
lve l yue
ma m a
mai m ai
man m an
mang m aŋ
mao m au
me m ə
mei m ei
men m ən
meng m əŋ
mi m ii
mian m i an
miao m i au
mie m ie
min m in
ming m iŋ
miu m i əu
mo m uɔ
mou m əu
mu m uu
na n a
nai n ai
nan n an
nang n aŋ
nao n au
ne n ə
nei n ei
nen n ən
neng n əŋ
ni n ii
nian n i an
niang n i aŋ
niao n i au
nie n ie
nin n in
ning n iŋ
niu n i əu
nong n uŋ
nou n əu
nu n uu
nuan n u an
nuo n uɔ
nv n yu
nve n yue
o ɔ
ou əu
pa p a
pai p ai
pan p an
pang p aŋ
pao p au
pei p ei
pen p ən
peng p əŋ
pi p ii
pian p i an
piao p i au
pie p ie
pin p in
ping p iŋ
po p uɔ
pou p əu
pu p uu
qi q ii
qia q i a
qian q i an
qiang q i aŋ
qiao q i au
qie q ie
qin q in
qing q iŋ
qiong q i uŋ
qiu q i əu
qu q yu
quan q yu an
que q yue
qun q yu n
ran ʒ an
rang ʒ aŋ
rao ʒ au
re ʒ ə
ren ʒ ən
reng ʒ əŋ
ri ʒ iii
rong ʒ uŋ
rou ʒ əu
ru ʒ uu
ruan ʒ u an
rui ʒ u ei
run ʒ u ən
ruo ʒ uɔ
sa s a
sai s ai
san s an
sang s aŋ
sao s au
se s ə
sei s ei
sen s ən
seng s əŋ
sha sh a
shai sh ai
shan sh an
shang sh aŋ
shao sh au
she sh ə
shei sh ei
shen sh ən
sheng sh əŋ
shi sh iii
shou sh əu
shu sh uu
shua sh u a
shuai sh u ai
shuan sh u an
shuang sh u aŋ
shui sh u ei
shun sh u ən
shuo sh uɔ
si s iii
song s uŋ
sou s əu
su s uu
suan s u an
sui s u ei
sun s u ən
suo s uɔ
ta t a
tai t ai
tan t an
tang t aŋ
tao t au
te t ə
teng t əŋ
ti t ii
tian t i an
tiao t i au
tie t ie
ting t iŋ
tong t uŋ
tou t əu
tu t uu
tuan t u an
tui t u ei
tun t u ən
tuo t uɔ
wa w a
wai w ai
wan w an
wang w aŋ
wei w ei
wen w ən
weng w əŋ
wo w uɔ
wu w uu
xi x ii
xia x i a
xian x i an
xiang x i aŋ
xiao x i au
xie x ie
xin x in
xing x iŋ
xiong x i uŋ
xiu x i əu
xu x yu
xuan x yu an
xue x yue
xun x yu n
ya y a
yan y an
yang y aŋ
yao y au
ye y ie
yi y ii
yin y in
ying y iŋ
yo y ɔ
yong y uŋ
you y əu
yu yu
yuan yu an
yue yue
yun yu n
za z a
zai z ai
zan z an
zang z aŋ
zao z au
ze z ə
zei z ei
zen z ən
zeng z əŋ
zha zh a
zhai zh ai
zhan zh an
zhang zh aŋ
zhao zh au
zhe zh ə
zhei zh ei
zhen zh ən
zheng zh əŋ
zhi zh iii
zhong zh uŋ
zhou zh əu
zhu zh uu
zhua zh u a
zhuai zh u ai
zhuan zh u an
zhuang zh u aŋ
zhui zh u ei
zhun zh u ən
zhuo zh uɔ
zi z iii
zong z uŋ
zou z əu
zu z uu
zuan z u an
zui z u ei
zun z u ən
zuo z uɔ

--------------------------------------------------------------------------------
/g2p_zh_en/mapper.py:
--------------------------------------------------------------------------------

# Load the phone-mapping dictionaries.
import logging
import os
from abc import ABCMeta, abstractmethod
from typing import List

dirname = os.path.dirname(__file__)


mapf = f'{dirname}/map_data/ARPA2IPA.map'
syllable_to_phone_file = f'{dirname}/map_data/pinyin_to_phone.txt'


# ABCMeta is required for @abstractmethod to be enforced.
class Mapper(metaclass=ABCMeta):
    @abstractmethod
    def convert(self, phn_arr: List) -> List:
        pass


class PinyinMapper(Mapper):
    # Class-level cache so the map file is read only once.
    mapper = None

    def __init__(self):
        if PinyinMapper.mapper is None:
            PinyinMapper.mapper = {}
            logging.info(f"loading mapper file {syllable_to_phone_file}")

            # The file contains IPA symbols, so read it as UTF-8 explicitly.
            with open(syllable_to_phone_file, encoding='utf8') as f:
                for line in f:
                    cols = line.strip().split('\t')
                    assert len(cols) == 2
                    syllable = cols[0]
                    phones = cols[1].split()
                    PinyinMapper.mapper[syllable] = phones

    def convert(self, phn_arr: List) -> List:
        arr = []
        for phn in phn_arr:
            # Drop the trailing tone digit before the lookup.
            tr_phn = phn[:-1]
            if tr_phn not in PinyinMapper.mapper:
                # Unknown tokens pass through unchanged.
                arr.append(phn)
            else:
                phones = PinyinMapper.mapper[tr_phn].copy()
                # Re-attach the tone digit to the last phone.
                phones[-1] = phones[-1] + phn[-1:]
                arr = arr + phones
        return arr


class EnMapper(Mapper):
    # Class-level cache so the map file is read only once.
    mapper = None

    def __init__(self):
        if EnMapper.mapper is None:
            logging.info(f"loading mapper file {mapf}")
            EnMapper.mapper = {}
            with open(mapf, encoding='utf8') as f:
                for line in f:
                    cols = line.strip().split()
                    EnMapper.mapper[cols[0]] = ' '.join(cols[1:])

    def convert(self, phn_arr: List) -> List:
        arr = []
        for phn in phn_arr:
            if phn not in EnMapper.mapper:
                # Unknown tokens pass through unchanged.
                arr.append(phn)
            else:
                arr.append(EnMapper.mapper[phn])
        return arr

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
cn2an==0.5.20
inflect==5.3.0
nltk==3.8.1
numpy==1.20.3
pypinyin==0.44.0
setuptools==58.0.4

--------------------------------------------------------------------------------
/requirements_dev.txt:
--------------------------------------------------------------------------------
pip==19.2.3
bump2version==0.5.11
wheel==0.33.6
watchdog==0.9.0
flake8==3.7.8
tox==3.14.0
coverage==4.5.4
Sphinx==1.8.5
twine==1.14.0

--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
[bumpversion]
current_version = 0.1.0
commit = True
tag = True

[bumpversion:file:setup.py]
search = version='{current_version}'
replace = version='{new_version}'

[bumpversion:file:g2p_zh_en/__init__.py]
search = __version__ = '{current_version}'
replace = __version__ = '{new_version}'

[bdist_wheel]
universal = 1

[flake8]
exclude = docs

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

"""The setup script."""

from setuptools import setup, find_packages

with open('README.rst') as readme_file:
    readme = readme_file.read()

with open('HISTORY.rst') as history_file:
    history = history_file.read()

requirements = []

test_requirements = []

setup(
    author="gp-zh-en",
    author_email='skysbird@gmail.com',
    python_requires='>=3.6',
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
        'Natural Language :: English',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
    ],
    description="g2p-zh-en",
    entry_points={
        'console_scripts': [
            'g2p_zh_en=g2p_zh_en.cli:main',
        ],
    },
    install_requires=requirements,
    license="GNU General Public License v3",
    long_description=readme + '\n\n' + history,
    include_package_data=True,
    keywords='g2p_zh_en',
    name='g2p_zh_en',
    packages=find_packages(include=['g2p_zh_en', 'g2p_zh_en.*', '*']),
    test_suite='tests',
    tests_require=test_requirements,
    url='https://github.com/skysbird/g2p-zh-en',
    version='0.1.1',
    zip_safe=False,
)

--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
"""Unit test package for
g2p_zh_en."""

--------------------------------------------------------------------------------
/tests/test_g2p_zh_en.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

"""Tests for `g2p_zh_en` package."""


import unittest
import logging
from g2p_zh_en.g2p_zh_en import G2P

logging.basicConfig(level=logging.DEBUG)


class TestG2p_zh_en(unittest.TestCase):
    """Tests for `g2p_zh_en` package."""

    def setUp(self):
        """Set up test fixtures, if any."""
        self.g2p = G2P()

    def tearDown(self):
        """Tear down test fixtures, if any."""

    def atest_000_something(self):
        """Test something."""
        out = self.g2p.g2p(text="测试,我喜欢在家看NBA")
        print(out)

    def test_001_something(self):
        """Test something."""
        # out = self.g2p.g2p(text="i have 100 dollar")
        # print(out)
        # out = self.g2p.g2p(text="i have 100 dollar", language='en-us')
        # print(out)

        out = self.g2p.g2p(text="我有100美元,i'm so rich.")
        print(out)
        out = self.g2p.g2p(text="i have 100 dollar,我是不是很富有?", language='en-us')
        print(out)

    def atest_002_something(self):
        """Test something."""
        out = self.g2p.g2p(text="这个页面使用")
        print(out)

    def atest_003_something(self):
        """Test something."""
        out = self.g2p.g2p(text="北京奥运会是在2008年8月8日举办的")
        print(out)

    def atest_004_something(self):
        """Test something."""
        out = self.g2p.g2p(text="伦敦奥运会是在2008年8月8日举办的,花了将近150060元")
        print(out)

    def atest_005_something(self):
        """Test something."""
        out = self.g2p.g2p(text="测试一个美元符号的情况$100元")
        print(out)

    def atest_100_something(self):
        """Test something."""
        out = self.g2p.g2p(text="hello,this is an english test! 混合一点中文$☺️ test", language='en-us')
        print(out)

    def atest_200_something(self):
        """Test something."""
        out = self.g2p.g2p(text="混合一点中文")
        print(out)

    def test_300_something(self):
        """Test something."""
        out = self.g2p.g2p(text="hello,this is a test Javascript!!", language='en-us')
        print(out)


if __name__ == '__main__':
    unittest.main()

--------------------------------------------------------------------------------
/tests/test_mapper.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

"""Tests for `g2p_zh_en` package."""


import unittest
import logging

from g2p_zh_en.mapper import EnMapper, PinyinMapper
from pypinyin import pinyin, lazy_pinyin, Style
from g2p_en import G2p
g2p = G2p()

logging.basicConfig(level=logging.DEBUG)


class TestG2p_zh_en(unittest.TestCase):
    """Tests for `g2p_zh_en` package."""

    def setUp(self):
        """Set up test fixtures, if any."""
        self.pinyin_mapper = PinyinMapper()
        self.en_mapper = EnMapper()

    def tearDown(self):
        """Tear down test fixtures, if any."""

    def test_000_pinyin(self):
        """Test something."""
        pout = lazy_pinyin('这个页面使用javascript和PHP语言开发而成', style=Style.TONE3, neutral_tone_with_five=True)
        out = self.pinyin_mapper.convert(pout)
        print(out)

    def test_001_eng(self):
        """Test something."""
        pout = g2p('this page is developed by Javascript and PHP')
        out = self.en_mapper.convert(pout)
        print(out)


if __name__ == '__main__':
    unittest.main()

--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
[tox]
envlist = py36, py37, py38, flake8

[travis]
python =
    3.8: py38
    3.7: py37
    3.6: py36

[testenv:flake8]
basepython = python
deps = flake8
commands = flake8 g2p_zh_en tests

[testenv]
setenv =
    PYTHONPATH = {toxinidir}

commands = python setup.py test

--------------------------------------------------------------------------------
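Illustrative note (not a file in the repository): the tone handling in `PinyinMapper.convert` from /g2p_zh_en/mapper.py can be sketched in isolation as below. This is a minimal sketch, assuming a tiny in-memory table instead of the full `pinyin_to_phone.txt`; the function name `convert_pinyin` and the `TABLE` constant are hypothetical and exist only for this example. The two table rows (`ni`, `hao`) are copied from the mapping file above.

```python
from typing import Dict, List

# Hypothetical two-row excerpt of pinyin_to_phone.txt (rows "ni n ii", "hao h au").
TABLE: Dict[str, List[str]] = {
    "ni": ["n", "ii"],
    "hao": ["h", "au"],
}


def convert_pinyin(syllables: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Map TONE3-style syllables (e.g. 'hao3') to phones, keeping the tone digit."""
    out: List[str] = []
    for syl in syllables:
        # Split off the last character, assumed to be the tone digit.
        base, tone = syl[:-1], syl[-1:]
        if base not in table:
            # Unknown tokens (e.g. embedded English) pass through unchanged.
            out.append(syl)
            continue
        phones = list(table[base])  # copy so the table is not mutated
        phones[-1] += tone          # the tone digit rides on the final phone
        out.extend(phones)
    return out


result = convert_pinyin(["ni3", "hao3", "PHP"], TABLE)
print(result)
```

Note that a token such as `PHP` passes through only because `PH` is not a key in the table; a non-pinyin token whose all-but-last characters happen to form a valid syllable would be converted, which mirrors the behaviour of the original method.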