├── .gitignore ├── .pre-commit-config.yaml ├── .secrets.baseline ├── CONTRIBUTING.md ├── Experiments.md ├── From_WER_and_RIL_to_MER_and_WIL_improved_evaluatio.pdf ├── LICENSE ├── README.md ├── analyze.py ├── config.ini.sample ├── config.py ├── experiment.py ├── models.py ├── optional_analyze_with_sclite.py ├── requirements.txt ├── sample-files ├── ibuprofen.wav ├── lipitor.wav ├── reference_transcriptions.csv ├── stt_transcriptions.csv ├── tylenol.wav ├── vicodin.wav ├── wer_details.csv └── wer_summary.json ├── test_config.py └── transcribe.py /.gitignore: -------------------------------------------------------------------------------- 1 | config.ini 2 | config_* 3 | reference_transcriptions.csv 4 | reference_transcriptions_* 5 | stt_transcriptions.csv 6 | wer_details.csv 7 | wer_summary.json 8 | wer_word_accuracy.csv 9 | audio_input* 10 | *.wav 11 | *.mp3 12 | __pycache__ 13 | .DS_Store 14 | experiments_* 15 | output* 16 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | # This is an example configuration to enable detect-secrets in the pre-commit hook. 2 | # Add this file to the root folder of your repository. 3 | # 4 | # Read pre-commit hook framework https://pre-commit.com/ for more details about the structure of config yaml file and how git pre-commit would invoke each hook. 5 | # 6 | # This line indicates we will use the hook from ibm/detect-secrets to run scan during committing phase. 7 | # Whitewater/whitewater-detect-secrets would sync code to ibm/detect-secrets upon merge. 8 | repos: 9 | - repo: https://github.com/ibm/detect-secrets 10 | # If you desire to use a specific version of detect-secrets, you can replace `master` with other git revisions such as branch, tag or commit sha. 
11 | # You are encouraged to use static refs such as tags, instead of branch names 12 | # 13 | # Running "pre-commit autoupdate" will automatically update rev to the latest tag 14 | rev: 0.13.1+ibm.46.dss 15 | hooks: 16 | - id: detect-secrets # pragma: whitelist secret 17 | # Add options for the detect-secrets-hook binary. You can run `detect-secrets-hook --help` to list all possible options. 18 | # You may also run `pre-commit run detect-secrets` to preview the scan result. 19 | # With "--baseline" but without "--use-all-plugins", pre-commit scans with just the plugins in the baseline file 20 | # With "--baseline" and "--use-all-plugins", pre-commit scans with all available plugins 21 | # Add "--fail-on-non-audited" to fail pre-commit for unaudited potential secrets 22 | args: [--baseline, .secrets.baseline, --use-all-plugins ] 23 | -------------------------------------------------------------------------------- /.secrets.baseline: -------------------------------------------------------------------------------- 1 | { 2 | "exclude": { 3 | "files": "^.secrets.baseline$", 4 | "lines": null 5 | }, 6 | "generated_at": "2021-11-29T21:06:43Z", 7 | "plugins_used": [ 8 | { 9 | "name": "AWSKeyDetector" 10 | }, 11 | { 12 | "name": "ArtifactoryDetector" 13 | }, 14 | { 15 | "name": "AzureStorageKeyDetector" 16 | }, 17 | { 18 | "base64_limit": 4.5, 19 | "name": "Base64HighEntropyString" 20 | }, 21 | { 22 | "name": "BasicAuthDetector" 23 | }, 24 | { 25 | "name": "BoxDetector" 26 | }, 27 | { 28 | "name": "CloudantDetector" 29 | }, 30 | { 31 | "ghe_instance": "github.ibm.com", 32 | "name": "GheDetector" 33 | }, 34 | { 35 | "name": "GitHubTokenDetector" 36 | }, 37 | { 38 | "hex_limit": 3, 39 | "name": "HexHighEntropyString" 40 | }, 41 | { 42 | "name": "IbmCloudIamDetector" 43 | }, 44 | { 45 | "name": "IbmCosHmacDetector" 46 | }, 47 | { 48 | "name": "JwtTokenDetector" 49 | }, 50 | { 51 | "keyword_exclude": null, 52 | "name": "KeywordDetector" 53 | }, 54 | { 55 | "name": "MailchimpDetector" 56 | },
57 | { 58 | "name": "NpmDetector" 59 | }, 60 | { 61 | "name": "PrivateKeyDetector" 62 | }, 63 | { 64 | "name": "SlackDetector" 65 | }, 66 | { 67 | "name": "SoftlayerDetector" 68 | }, 69 | { 70 | "name": "SquareOAuthDetector" 71 | }, 72 | { 73 | "name": "StripeDetector" 74 | }, 75 | { 76 | "name": "TwilioKeyDetector" 77 | } 78 | ], 79 | "results": {}, 80 | "version": "0.13.1+ibm.46.dss", 81 | "word_list": { 82 | "file": null, 83 | "hash": null 84 | } 85 | } 86 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | Thanks for your interest in this tool! We are glad you have found it and hope that you can use and improve it. Contributions are welcome! 3 | 4 | ## How to contribute an issue 5 | We welcome issues and enhancement requests on our [Issues Tracker](https://github.com/IBM/watson-stt-wer-python/issues). 6 | 7 | When reporting an issue, please be as specific as possible including: 8 | * Python version & OS 9 | * Error message(s) encountered (if any) 10 | * Configuration values 11 | 12 | If you have a problem using our tool, please open an issue! 13 | 14 | ## How to contribute code 15 | We use a Pull Request process with a required review from a maintainer. 16 | 17 | Before submitting your pull request, consider the following questions: 18 | * Has the documentation been updated to support this change? 19 | * Is the change backward-compatible to older configuration files? 20 | * Are new dependencies introduced, and are they included in the `requirements.txt`? 21 | * Does the change introduce complexity for novice users? Are optional features clearly marked optional? 22 | * Are debugging `print` statements removed? 23 | * Does the change use descriptive variable names? 
24 | 25 | Please also review the commits: 26 | * Include an issue number in the commit message, to link the commit, issue, and pull request 27 | * Include a descriptive message 28 | * Sign the commits to certify your Developer Certificate of Origin (DCO) 29 | 30 | For example, use: `git commit -sm "#issue_number - "` 31 | 32 | # Code of Conduct 33 | ## Our Pledge 34 | 35 | We as members, contributors, and leaders pledge to make participation in our 36 | community a harassment-free experience for everyone, regardless of age, body 37 | size, visible or invisible disability, ethnicity, sex characteristics, gender 38 | identity and expression, level of experience, education, socio-economic status, 39 | nationality, personal appearance, race, caste, color, religion, or sexual 40 | identity and orientation. 41 | 42 | We pledge to act and interact in ways that contribute to an open, welcoming, 43 | diverse, inclusive, and healthy community. 44 | 45 | ## Our Standards 46 | 47 | Examples of behavior that contributes to a positive environment for our 48 | community include: 49 | 50 | * Demonstrating empathy and kindness toward other people 51 | * Being respectful of differing opinions, viewpoints, and experiences 52 | * Giving and gracefully accepting constructive feedback 53 | * Accepting responsibility and apologizing to those affected by our mistakes, 54 | and learning from the experience 55 | * Focusing on what is best not just for us as individuals, but for the overall 56 | community 57 | 58 | Examples of unacceptable behavior include: 59 | 60 | * The use of sexualized language or imagery, and sexual attention or advances of 61 | any kind 62 | * Trolling, insulting or derogatory comments, and personal or political attacks 63 | * Public or private harassment 64 | * Publishing others' private information, such as a physical or email address, 65 | without their explicit permission 66 | * Other conduct which could reasonably be considered inappropriate in a 67 | professional
setting 68 | 69 | ## Attribution 70 | 71 | This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), 72 | version 2.1, available at 73 | https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. -------------------------------------------------------------------------------- /Experiments.md: -------------------------------------------------------------------------------- 1 | # Fine Tuning Your Configuration 2 | 3 | In its default state, Watson STT provides very accurate transcriptions across a wide range of words, accents, and audio interference. However, should you want to get under the hood and fine-tune things, Watson STT has you covered. With settings for speech sensitivity, background audio suppression, and customization model bias, to name a few, Watson STT gives you the ability to optimize your solution to fit your business and customer needs. 4 | 5 | ## Sounds great, how do I optimize the fine tuning, though? 6 | Getting that perfect balance can be tricky. Manually cycling through parameter values, or taking guesses at optimal settings, takes time and will often not produce the best results for your solution. To know for sure which settings provide the best transcriptions, you'll want to perform a grid search on the settings you care about. When complete, you'll be able to make a confident decision on what to set each parameter to, knowing that those decisions will verifiably lead to better transcriptions for your use case. 7 | 8 | ## What is grid search? 9 | Grid search is a fancy term for cycling through all combinations of various parameter values to find the optimal ones. In the case of an STT grid search, the optimal values are the ones that produce the most accurate transcriptions for a given test set. During grid search, a given iteration will set the parameters to specific values, run a test set of audio files through STT, and analyze the results for accuracy.
It then increments one of the parameters and starts a new iteration. After all possible combinations of values have been attempted and analyzed, results are provided showing the accuracy of each iteration and the values used during that iteration. 10 | 11 | ## This can be automated, right? 12 | Of course! In fact, it already has been! Within the https://github.com/IBM/watson-stt-wer-python repo there is a script called `experiment.py` to facilitate grid search against Watson STT. Here's how it works: 13 | 14 | Clone the repo to your local machine 15 | 16 | ```git clone git@github.com:IBM/watson-stt-wer-python.git``` 17 | 18 | Follow the installation instructions in the repository's README - https://github.com/IBM/watson-stt-wer-python#installation 19 | 20 | Select audio files to use as a test set. Note that the more files you include, the more representative the results will be, but also the longer the grid search will take. Store this test set in a unique directory. 21 | 22 | Create a reference file containing audio filenames and their associated correct transcriptions. For example: 23 | ``` 24 | Audio File Name,Reference 25 | ./sample-files/sample_audio/vicodin.wav,I will prescribe you some Vicodin 26 | ./sample-files/sample_audio/lipitor.wav,"To deal with your bad cholesterol, Lipitor would be a good option" 27 | ./sample-files/sample_audio/tylenol.wav,Take two Tylenol for your fever 28 | ./sample-files/sample_audio/ibuprofen.wav,Ibuprofen is good for your muscle aches 29 | ``` 30 | 31 | Update the `config.ini` file with: 32 | 1. The `apikey`, `service_url`, and `base_model_name` of the service to optimize. 33 | 1. If using a custom model, include the `language_model_id` and/or `acoustic_model_id`. 34 | 1. The `reference_transcriptions_file` you created. 35 | 1. The `audio_file_folder` containing your test set of audio files. 36 | 1. The filename (`stt_transcriptions_file`) where transcripts should be stored. 37 | 1. The parameters you wish to optimize.
If you would like to omit parameters from the grid search, set their `_min` and `_max` values to the same value, which will be used in each iteration. 38 | 1. `sds_*` controls the `speech_detector_sensitivity` parameter 39 | 1. `bias_*` controls the `character_insertion_bias` parameter 40 | 1. `cust_weight_*` controls the `customization_weight` parameter 41 | 1. `bas_*` controls the `background_audio_suppression` parameter 42 | 1. `end_of_phrase_silence_time_*` controls the `end_of_phrase_silence_time` parameter 43 | For example, the following will iterate through `customization_weight` from `0` to `0.3` at `0.1` increments, while keeping the other parameters static: 44 | ``` 45 | [Experiments] 46 | sds_min=0.7 47 | sds_max=0.7 48 | sds_step=0.1 49 | bias_min=0 50 | bias_max=0.0 51 | bias_step=0.1 52 | cust_weight_min=0 53 | cust_weight_max=0.3 54 | cust_weight_step=0.1 55 | bas_min=0 56 | bas_max=0.0 57 | bas_step=0.1 58 | end_of_phrase_silence_time_min=0 59 | end_of_phrase_silence_time_max=0.0 60 | end_of_phrase_silence_time_step=0.1 61 | ``` 62 | 63 | Run the grid search 64 | ``` 65 | python experiment.py --config_file config.ini --log_level INFO 66 | 67 | 2023-05-04 11:13:34,607 - INFO - Running Experiment -- Character Insertion Bias: 0.0, Customization Weight: 0.0, Speech Detector Sensitivity: 0.7, Background Audio Suppression: 0.0 68 | 2023-05-04 11:13:44,665 - INFO - Completed transcribing 4 files out of 4 69 | 2023-05-04 11:13:44,666 - INFO - Wrote transcriptions for 4 audio files to ./sample-files/bias_0.0_weight_0.0_sds_0.7_bas_0.0/stt_transcriptions.csv 70 | 2023-05-04 11:13:44,676 - INFO - Updated ./sample-files/bias_0.0_weight_0.0_sds_0.7_bas_0.0/stt_transcriptions.csv with reference transcriptions 71 | 2023-05-04 11:13:44,680 - INFO - Created ctm file - ./sample-files/bias_0.0_weight_0.0_sds_0.7_bas_0.0/stt_transcriptions.ctm 72 | 2023-05-04 11:13:44,687 - INFO - Created stm file - 
./sample-files/bias_0.0_weight_0.0_sds_0.7_bas_0.0/stt_transcriptions.stm 73 | 2023-05-04 11:13:44,709 - INFO - Created summary file - /Users/gecock/Desktop/watson-stt-wer-python-fork/watson-stt-wer-python/sample-files/bias_0.0_weight_0.0_sds_0.7_bas_0.0/sclite_wer_summary.json 74 | 2023-05-04 11:13:44,716 - INFO - Experiment Complete 75 | 76 | 2023-05-04 11:13:44,716 - INFO - Running Experiment -- Character Insertion Bias: 0.0, Customization Weight: 0.1, Speech Detector Sensitivity: 0.7, Background Audio Suppression: 0.0 77 | 2023-05-04 11:13:56,141 - INFO - Completed transcribing 4 files out of 4 78 | ... 79 | 2023-05-04 11:14:13,783 - INFO - 80 | | | task | Substitutions | Deletions | Insertions | Word Error Rate | Sentence Error Rate | Total Words | Total Sentences | 81 | |---:|:------------------------------------|----------------:|------------:|-------------:|------------------:|----------------------:|--------------:|------------------:| 82 | | 0 | bias_0.0_weight_0.0_sds_0.7_bas_0.0 | 16.1 | 0 | 16.1 | 32.3 | 100 | 31 | 4 | 83 | | 1 | bias_0.0_weight_0.1_sds_0.7_bas_0.0 | 16.1 | 0 | 16.1 | 32.3 | 100 | 31 | 4 | 84 | | 2 | bias_0.0_weight_0.3_sds_0.7_bas_0.0 | 16.1 | 3.2 | 12.9 | 32.3 | 100 | 31 | 4 | 85 | | 3 | bias_0.0_weight_0.2_sds_0.7_bas_0.0 | 16.1 | 0 | 16.1 | 32.3 | 100 | 31 | 4 | 86 | ``` 87 | 88 | View the results in the file `all_summaries.csv` to see a concise view of all iterations, or, view the details of each iteration by looking at the set of directories created in the format `bias__weight__sds__bas_` 89 | 90 | -------------------------------------------------------------------------------- /From_WER_and_RIL_to_MER_and_WIL_improved_evaluatio.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/watson-stt-wer-python/661c329230c588f71046228ffbca4737f5d19d1f/From_WER_and_RIL_to_MER_and_WIL_improved_evaluatio.pdf 
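The parameter sweep described in Experiments.md can be sketched in a few lines of Python. This is a hypothetical illustration of the grid-search loop (not the actual implementation inside `experiment.py`); the `frange` helper and the `grid` bounds mirror the `_min`/`_max`/`_step` values from the sample config above.

```python
import itertools

def frange(lo, hi, step):
    """Inclusive float range, mirroring the _min/_max/_step config values."""
    vals, v = [], lo
    while v <= hi + 1e-9:          # small epsilon guards against float drift
        vals.append(round(v, 10))
        v += step
    return vals

# Bounds taken from the sample [Experiments] config: only
# customization_weight varies; the other parameters are held static.
grid = {
    "speech_detector_sensitivity": frange(0.7, 0.7, 0.1),
    "character_insertion_bias": frange(0.0, 0.0, 0.1),
    "customization_weight": frange(0.0, 0.3, 0.1),
    "background_audio_suppression": frange(0.0, 0.0, 0.1),
}

for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    # Each iteration would transcribe the test set with these params,
    # then score the transcripts for WER (omitted in this sketch).
    print(params)
```

With the sample bounds this yields four iterations, matching the four experiment directories shown in the log output above.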
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 
39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # STT-WER-Python 2 | Utilities for 3 | * [Transcribing](#transcription) a set of audio files with Speech to Text (STT) 4 | * [Analyzing](#analysis) the error rate of the STT transcription against a known-good transcription 5 | * [Experimenting](#experimenting) with various parameters to find optimal values 6 | 7 | ## More documentation 8 | This readme describes the tools in depth. 
For more information on use cases and methodology, please see the following articles: 9 | * [New Python Scripts to Measure Word Error Rate on Watson Speech to Text](https://medium.com/@marconoel/new-python-scripts-to-measure-word-error-rate-on-watson-speech-to-text-77ecaa513f60): How to use these tools, including a YouTube video demonstration 10 | * [New Speech Testing Utilities for Conversational AI Projects](https://medium.com/ibm-watson-speech-services/new-speech-testing-utilities-for-conversational-ai-projects-bf73debe19be): Describes recipe for using Text to Speech to "bootstrap" testing data 11 | * [Data Collection and Training for Speech Projects](https://medium.com/ibm-data-ai/data-collection-and-training-for-speech-projects-22004c3e84fb): How to collect test data from human voices. 12 | * [How to Train your Speech to Text Dragon](https://medium.com/ibm-watson/watson-speech-to-text-how-to-train-your-own-speech-dragon-part-1-data-collection-and-fdd8cea4f4b8) 13 | * [A mental model for Speech to Text training](https://medium.com/ibm-watson-speech-services/a-mental-model-for-speech-to-text-training-8c56a4105e25) 14 | 15 | You may also find useful: 16 | * [TTS-Python](https://github.com/IBM/watson-tts-python) - companion tooling for IBM Text to Speech 17 | 18 | ## Installation 19 | Requires Python 3.x installation. 20 | 21 | All of the watson-stt-wer-python dependencies are installed at once with `pip`: 22 | 23 | ``` 24 | pip install -r requirements.txt 25 | ``` 26 | 27 | **Note:** If receiving an SSL Certificate error (CERTIFICATE_VERIFY_FAILED) when running the python scripts, try the following commands to tell python to use the system certificate store. 
28 | 29 | **_Windows_** 30 | ``` 31 | pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org python-certifi-win32 32 | ``` 33 | 34 | **_MacOS_** 35 | 36 | Open a terminal and change to the location of your Python installation to execute `Install Certificates.command`, for example: 37 | ``` 38 | cd "/Applications/Python 3.6" 39 | "./Install Certificates.command" 40 | ``` 41 | 42 | ## Setup 43 | Create a copy of `config.ini.sample`. You'll modify this file in subsequent steps. 44 | ``` 45 | cp config.ini.sample config.ini 46 | ``` 47 | 48 | Each sub-section below describes the configuration parameters it needs. 49 | 50 | # Generic Command-Line parameters 51 | 52 | `--config_file` or `-c` is the configuration file to be used. The default is `config.ini` 53 | 54 | `--log_level` or `-ll` is the log level to be used when running the script. Supported levels are as follows: 55 | - `ERROR` -- Print out only when things fail. 56 | - `WARN` -- Print out cautions and when things fail. 57 | - `INFO` -- (Default) Print out useful status, cautions, and when things fail. 58 | - `DEBUG` -- Print out every possible message. 59 | 60 | # Transcription 61 | Uses the IBM Watson Speech to Text service to transcribe a folder full of audio files. Creates a CSV with transcriptions. 62 | 63 | 64 | 65 | ## Setup 66 | Update the parameters in your `config.ini` file. 67 | 68 | Required configuration parameters: 69 | * apikey - API key for your Speech to Text instance 70 | * service_url - Reference URL for your Speech to Text instance 71 | * base_model_name - Base model for Speech to Text transcription 72 | 73 | Optional configuration parameters: 74 | * max_threads - Maximum number of threads to use with `transcribe.py` to improve performance.
75 | * language_model_id - Language model customization ID (comment out to use base model) 76 | * acoustic_model_id - Acoustic model customization ID (comment out to use base model) 77 | * grammar_name - Grammar name (comment out to use base model) 78 | * stt_transcriptions_file - Output file for Speech to Text transcriptions 79 | * audio_file_folder - Input directory containing your audio files 80 | * reference_transcriptions_file - Reference file for manually transcribed audio files ("labeled data" or "ground truth"). If present, will be merged into `stt_transcriptions_file` as a "Reference" column 81 | * stemming - If True, pre-processing stems words with the Porter stemmer. Stemming will treat the singular and plural forms of a word as equivalent, rather than as a word error. 82 | 83 | 84 | 85 | ## Execution 86 | Assuming your configuration is in `config.ini`, transcribe all the audio files in the `audio_file_folder` parameter via the following command: 87 | 88 | ``` 89 | python transcribe.py --config_file config.ini --log_level DEBUG 90 | ``` 91 | 92 | See [Generic Command Line Parameters](#generic-command-line-parameters) for more details. 93 | 94 | ## Output 95 | Transcriptions will be stored in a CSV file named by the `stt_transcriptions_file` parameter, with a format like below: 96 | 97 | Audio File|Transcription 98 | -----|----- 99 | file1.wav|The quick brown fox 100 | file2.wav|jumped over the lazy dog 101 | 102 | A third column, "Reference", will be included with the reference transcription if a `reference_transcriptions_file` is provided as a source. 103 | 104 | # Analysis 105 | A simple Python package to approximate the Word Error Rate (WER), Match Error Rate (MER), Word Information Lost (WIL) and Word Information Preserved (WIP) of one or more transcripts. 106 | 107 | ## Setup 108 | Your config file must have references for the `reference_transcriptions_file` and `stt_transcriptions_file` properties.
109 |
110 | * **Reference file** (`reference_transcriptions_file`) is a CSV file with at least the columns `Audio File Name` and `Reference`. The `Reference` is the actual transcription of the audio file (also known as the "ground truth" or "labeled data"). NOTE: Make sure the audio file name includes the full path (e.g. `./audio1.wav`)
111 | * **Hypothesis file** (`stt_transcriptions_file`) is a CSV file with at least the columns `Audio File Name` and `Hypothesis`. The `Hypothesis` is the transcription of the audio file by the Speech to Text engine. The `transcribe.py` script can create this file.
112 |
113 | ## Results
114 | * **Details** (`details_file`) is a CSV file with a row for each audio sample, including the reference and hypothesis transcriptions and the specific transcription errors
115 | * **Summary** (`summary_file`) is a JSON file with metrics for total transcriptions and overall word and sentence error rates.
116 | * **Accuracy** (`word_accuracy_file`) is a CSV file with per-word accuracy statistics
117 |
118 | ## Metrics (Definitions)
119 | - WER (word error rate), commonly used in ASR assessment, measures the cost of restoring the output word sequence to the original input sequence.
120 | - MER (match error rate) is the proportion of I/O word matches which are errors.
121 | - WIL (word information lost) is a simple approximation to the proportion of word information lost. It overcomes the problems associated with the RIL (relative information lost) measure that was proposed half a century ago.
122 |
123 | ## Background on supporting library
124 | Repo of the Python module JIWER: https://pypi.org/project/jiwer/
125 |
126 | It computes the minimum-edit distance between the ground-truth sentence and the hypothesis sentence of a speech-to-text API.
127 | The minimum-edit distance is calculated using the Python C module python-Levenshtein.
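To make the metric concrete, here is a minimal, dependency-free sketch of the word-level minimum-edit-distance computation that underlies WER (illustrative only; `analyze.py` relies on the jiwer library itself):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words -> WER = 0.25
print(wer("the quick brown fox", "the quick brown box"))
```

jiwer additionally applies text normalization (case, punctuation, whitespace) before alignment, which this sketch omits.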
128 |
129 | ## Execution
130 |
131 | ```
132 | python analyze.py --config_file config.ini --log_level DEBUG
133 | ```
134 |
135 | See [Generic Command Line Parameters](#generic-command-line-parameters) for more details.
136 |
137 | # Analysis with sclite
138 | This repo provides a wrapper script, `optional_analyze_with_sclite.py`, to run `sclite`, an open source tool designed to evaluate STT transcription results. `sclite` goes beyond regular WER and SER reporting to provide reports such as Confusion Pairs, which shows exactly which words were substituted with what, and Text Alignment, which shows the inline differences between the reference and transcribed texts. For more information about the output of `optional_analyze_with_sclite.py`, see the Results sub-section below. For more information about `sclite`, see https://people.csail.mit.edu/joe/sctk-1.2/doc/sclite.htm#sclite_name_0.
139 |
140 | ## Setup
141 | 1. `reference_transcriptions_file` and `stt_transcriptions_file` must be populated in `config.ini` and exist on the filesystem.
142 | 1. `sclite_directory` must be uncommented and populated with the directory that holds the `sclite` executable.
143 | 1. To install `sclite`, follow the instructions at https://github.com/usnistgov/SCTK#sctk-basic-installation
144 |
145 | ## Execution
146 |
147 | ```
148 | python optional_analyze_with_sclite.py --config_file config.ini --log_level INFO
149 | ```
150 | See [Generic Command Line Parameters](#generic-command-line-parameters) for more details.
151 |
152 | ## Results
153 | 1. `sclite_wer_summary.json` -- A concise summary of metrics
154 | 1. `*.sys` -- A summary file showing the number of words, sentences, deletions, insertions, substitutions, word error rate, and sentence error rate.
155 | 1. `*.prf` -- A text alignment file that shows, for each audio file, the reference text and transcribed text, and for each word whether it was inserted, deleted, substituted, or correct.
156 | 1.
`*.dtl` -- A detail file showing confusion pairs and which specific words were inserted, deleted, or substituted.
157 |
158 | There will also be the following two files that were created for use by `sclite` but are not direct outputs of `sclite`:
159 | 1. `*.ctm` -- A file containing a line for each transcribed word of each audio file
160 | 1. `*.stm` -- A file containing a reformatted version of the `reference_transcriptions_file` that `sclite` uses for evaluation
161 |
162 | # Experimenting
163 | Use the `experiment.py` script to execute a series of transcription/analysis experiments to optimize Speech to Text parameters.
164 |
165 | ## Setup
166 |
167 | Follow the setup for [Transcribing](#transcription).
168 |
169 | Follow the setup for [Analyzing](#analysis).
170 |
171 | The following parameters in `[Experiments]` all have a `*_min` and `*_max` variant to specify the lower and upper limit, respectively, for the corresponding `[SpeechToText]` parameter, and a `*_step` variant to specify the amount by which to increase that parameter in each experiment:
172 | 1. `sds_*` controls the `speech_detector_sensitivity` parameter
173 | 1. `bias_*` controls the `character_insertion_bias` parameter
174 | 1. `cust_weight_*` controls the `customization_weight` parameter
175 | 1. `bas_*` controls the `background_audio_suppression` parameter
176 | 1. `end_of_phrase_silence_time_*` controls the `end_of_phrase_silence_time` parameter
177 |
178 | Note: If you want to use `sclite` for analysis of each experiment, be sure to configure `sclite_directory` under the `[ErrorRateOutput]` section.
179 |
180 | ## Execution
181 |
182 | ```
183 | python experiment.py --config_file config.ini --log_level INFO
184 | ```
185 |
186 | See [Generic Command Line Parameters](#generic-command-line-parameters) for more details.
187 |
188 | ## Results
189 | Each experiment creates a unique directory based on the parameters of that experiment, in the format `bias__weight__sds__bas_`.
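The `*_min`/`*_max`/`*_step` sweep described under Setup amounts to a grid of parameter combinations, one output directory per combination. A minimal sketch with hypothetical ranges (the real values come from the `[Experiments]` section of `config.ini`, and the real directory-name format may differ):

```python
import itertools

def frange(lo, hi, step):
    """Inclusive float range, mirroring the *_min/*_max/*_step settings."""
    vals = []
    v = lo
    while v <= hi + 1e-9:          # tolerance guards against float drift
        vals.append(round(v, 3))
        v += step
    return vals

# Hypothetical ranges for two of the sweepable parameters.
sds_values = frange(0.3, 0.7, 0.2)     # speech_detector_sensitivity
bias_values = frange(-0.2, 0.2, 0.2)   # character_insertion_bias

# One experiment (and one output directory) per combination.
for sds, bias in itertools.product(sds_values, bias_values):
    out_dir = f"bias_{bias}_sds_{sds}"
    print(out_dir)
```

Adding `cust_weight_*`, `bas_*`, and `end_of_phrase_silence_time_*` ranges simply extends the `itertools.product` call, so the number of experiments grows multiplicatively with each swept parameter.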
190 |
191 | For each experiment, the output files from [Transcribing](#transcription) and [Analyzing](#analysis) will be created in its unique output directory.
192 |
193 | A final file called `all_summaries.csv` will be created, containing the summary of all experiments in a single CSV.
194 |
195 | # Model training
196 | The `models.py` script has wrappers for many model-related tasks, including creating models, updating training contents, getting model details, and training models.
197 |
198 | ## Setup
199 | Update the parameters in your `config.ini` file.
200 |
201 | Required configuration parameters:
202 | * apikey - API key for your Speech to Text instance
203 | * service_url - Reference URL for your Speech to Text instance
204 | * base_model_name - Base model for Speech to Text transcription
205 |
206 | ## Execution
207 | For general help, execute:
208 | ```
209 | python models.py
210 | ```
211 |
212 | The script requires a type (one of `base_model`, `custom_model`, `corpus`, `word`, `grammar`) and an operation (one of `list`, `get`, `create`, `update`, `delete`).
213 | The script optionally takes a config file as an argument with `-c config_file_name_goes_here`; otherwise it uses the default file `config.ini`, which contains the connection details for your Speech to Text instance.
214 | Depending on the specified operation, the script also accepts a name, description, and file for an associated resource. For instance, a new custom model should have a name and description, and a corpus should have a name and an associated file.
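The connection details the scripts read from `config.ini` can be loaded with Python's standard `configparser` module. A minimal sketch; the inline sample is illustrative (key names follow the Setup list above, and the `[SpeechToText]` section name follows the conventions this README describes):

```python
import configparser

# Illustrative config contents; the real scripts call config.read("config.ini").
sample = """
[SpeechToText]
apikey = YOUR_API_KEY
service_url = https://api.us-south.speech-to-text.watson.cloud.ibm.com
base_model_name = en-US_NarrowbandModel
"""

config = configparser.ConfigParser()
config.read_string(sample)

# The three required parameters from the Setup section.
apikey = config.get("SpeechToText", "apikey")
service_url = config.get("SpeechToText", "service_url")
base_model = config.get("SpeechToText", "base_model_name")
print(base_model)
```

Keeping one config file per custom model (e.g. `config.ini.model1`, passed with `-c`) is what lets the examples below target a specific `customization_id`.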
215 |
216 | ## Examples
217 |
218 | List all base models:
219 | ```
220 | python models.py -o list -t base_model
221 | ```
222 |
223 | List all custom models:
224 | ```
225 | python models.py -o list -t custom_model
226 | ```
227 |
228 | Create a custom model:
229 | ```
230 | python models.py -o create -t custom_model -n "model1" -d "my first model"
231 | ```
232 |
233 | Add a corpus file for a custom model (the custom model's customization_id is stored in `config.ini.model1`, and `corpus1.txt` contains the corpus contents):
234 | ```
235 | python models.py -c config.ini.model1 -o create -n "corpus1" -f "corpus1.txt" -t corpus
236 | ```
237 |
238 | Create corpora for all corpus files in a directory (the filename will be used for the corpus name):
239 | ```
240 | python models.py -c config.ini.model1 -o create -t corpus -dir corpus-dir
241 | ```
242 |
243 | List all corpora for a custom model (the custom model's customization_id is stored in `config.ini.model1`):
244 | ```
245 | python models.py -c config.ini.model1 -o list -t corpus
246 | ```
247 |
248 | Train a custom model (the custom model's customization_id is stored in `config.ini.model1`):
249 | ```
250 | python models.py -c config.ini.model1 -o update -t custom_model
251 | ```
252 |
253 | Note that some parameter combinations are not possible. All supported operations wrap the SDK methods documented at https://cloud.ibm.com/apidocs/speech-to-text.
254 |
255 | # Sample setup for organizing multiple experiments
256 | Instructions for creating a directory structure to organize input and output files for experiments across multiple models. Creating a new directory structure is recommended for each new model being experimented with or tested. A sample `MemberID` model is shown.
257 | 1. Start from the root of the WER tool directory, `cd WATSON-STT-WER-PYTHON`
258 | 1. Create a project directory, `mkdir -p `
259 | 1. e.g. `mkdir -p ClientName-data`
260 | 1. Create an audio directory, `mkdir -p /audios/