├── pics ├── promo.png ├── Icon128.png ├── api_key.png ├── compare.png ├── apikeybp.png ├── encryption.png ├── googlestt.png ├── googletts.png ├── read_back.png ├── setup_all.png ├── sound_mix.png ├── thumbnail.png ├── useenvvars.png ├── wipe_cache.png ├── silencenode.png ├── sound_class.png ├── buffertosound.png ├── voicesettings.png ├── googlesttvariants.png ├── new_language_pin.png ├── ovrframesequence.png ├── tts_cache_folder.png ├── audio_capture_plugin.png ├── googlespeechkeyenv.png ├── mic_access_android.png ├── enumerate_microphones.png ├── audio_capture_sound_class.png ├── disk_read_access_android.png ├── disk_write_access_android.png ├── microphone_access_xcode.png └── start_stop_recording_set_submix.png └── README.md /pics/promo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/promo.png -------------------------------------------------------------------------------- /pics/Icon128.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/Icon128.png -------------------------------------------------------------------------------- /pics/api_key.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/api_key.png -------------------------------------------------------------------------------- /pics/compare.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/compare.png -------------------------------------------------------------------------------- /pics/apikeybp.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/apikeybp.png -------------------------------------------------------------------------------- /pics/encryption.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/encryption.png -------------------------------------------------------------------------------- /pics/googlestt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googlestt.png -------------------------------------------------------------------------------- /pics/googletts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googletts.png -------------------------------------------------------------------------------- /pics/read_back.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/read_back.png -------------------------------------------------------------------------------- /pics/setup_all.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/setup_all.png -------------------------------------------------------------------------------- /pics/sound_mix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/sound_mix.png -------------------------------------------------------------------------------- /pics/thumbnail.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/thumbnail.png -------------------------------------------------------------------------------- /pics/useenvvars.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/useenvvars.png -------------------------------------------------------------------------------- /pics/wipe_cache.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/wipe_cache.png -------------------------------------------------------------------------------- /pics/silencenode.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/silencenode.png -------------------------------------------------------------------------------- /pics/sound_class.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/sound_class.png -------------------------------------------------------------------------------- /pics/buffertosound.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/buffertosound.png -------------------------------------------------------------------------------- /pics/voicesettings.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/voicesettings.png -------------------------------------------------------------------------------- /pics/googlesttvariants.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googlesttvariants.png -------------------------------------------------------------------------------- /pics/new_language_pin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/new_language_pin.png -------------------------------------------------------------------------------- /pics/ovrframesequence.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/ovrframesequence.png -------------------------------------------------------------------------------- /pics/tts_cache_folder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/tts_cache_folder.png -------------------------------------------------------------------------------- /pics/audio_capture_plugin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/audio_capture_plugin.png -------------------------------------------------------------------------------- /pics/googlespeechkeyenv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googlespeechkeyenv.png -------------------------------------------------------------------------------- /pics/mic_access_android.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/mic_access_android.png -------------------------------------------------------------------------------- /pics/enumerate_microphones.png: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/enumerate_microphones.png
--------------------------------------------------------------------------------
/pics/audio_capture_sound_class.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/audio_capture_sound_class.png
--------------------------------------------------------------------------------
/pics/disk_read_access_android.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/disk_read_access_android.png
--------------------------------------------------------------------------------
/pics/disk_write_access_android.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/disk_write_access_android.png
--------------------------------------------------------------------------------
/pics/microphone_access_xcode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/microphone_access_xcode.png
--------------------------------------------------------------------------------
/pics/start_stop_recording_set_submix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/start_stop_recording_set_submix.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# **UE4 Google Speech Kit**

![](pics/Icon128.png)

This is a UE4 wrapper for
Google's [Cloud Text-to-Speech](https://cloud.google.com/text-to-speech/) and synchronous [Cloud Speech-to-Text](https://cloud.google.com/speech-to-text/) speech recognition.

The plugin was battle-tested in several commercial simulator projects. It is small, lean and simple to use.

# Table of contents
- [**UE4 Google Speech Kit**](#ue4-google-speech-kit)
- [Table of contents](#table-of-contents)
- [Engine preparation](#engine-preparation)
- [Cloud preparation](#cloud-preparation)
- [Speech synthesis](#speech-synthesis)
- [Speech recognition](#speech-recognition)
  - [Grant permissions](#grant-permissions)
    - [Windows](#windows)
    - [Mac](#mac)
    - [Android](#android)
  - [Voice capture and speech recognition](#voice-capture-and-speech-recognition)
- [Utilities](#utilities)
  - [Percentage based string comparison (Fuzzy matching)](#percentage-based-string-comparison-fuzzy-matching)
  - [Listing available capture devices](#listing-available-capture-devices)
- [Supported platforms](#supported-platforms)
- [Migration guide](#migration-guide)
  - [Version 3.0](#version-30)
- [Links](#links)

# Engine preparation

To make the microphone work, add the following lines to the project's `DefaultEngine.ini`.
```
[Voice]
bEnabled=true
```

To avoid losing the pauses between words, check the silence detection threshold `voice.SilenceDetectionThreshold`; a value of `0.01` is a good starting point.
This also goes into `DefaultEngine.ini`.

```
[SystemSettings]
voice.SilenceDetectionThreshold=0.01
```
Starting from engine version 4.25, also add
```
voice.MicNoiseGateThreshold=0.01
```

Other voice-related variables worth experimenting with:
```bash
voice.MicNoiseGateThreshold
voice.MicInputGain
voice.MicStereoBias
voice.MicNoiseAttackTime
voice.MicNoiseReleaseTime
voice.SilenceDetectionAttackTime
voice.SilenceDetectionReleaseTime
```

To find the available settings, type `voice.` in the editor console, and the autocompletion widget will pop up.

![](pics/voicesettings.png)

Console variables can be modified at runtime like this:

![](pics/silencenode.png)

To debug your microphone input, you can convert the output sound buffer to an Unreal sound wave and play it.

![](pics/buffertosound.png)

The values above may differ depending on the actual microphone characteristics.

# Cloud preparation
1) Go to [Google Cloud](https://console.cloud.google.com) and set up a billing account.
2) Enable [Cloud Speech-to-Text API](https://console.cloud.google.com/apis/library/speech.googleapis.com) and [Cloud Text-to-Speech API](https://console.cloud.google.com/apis/library/texttospeech.googleapis.com).
3) Create credentials to access the enabled APIs. See instructions [here](https://cloud.google.com/docs/authentication).

![](pics/api_key.png)

4) There are two ways to use your credentials in the project.

    * 4.1 By using environment variables. Create an environment variable `GOOGLE_API_KEY` with the created key as its value.

    * 4.2 By assigning the key directly in blueprints. This can be called anywhere.

![](pics/apikeybp.png)

By default you need to set the API key from nodes. To use the environment variable instead, set `Use Env Variable` to `true`.
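
The choice between the two credential sources can be sketched in plain C++ as follows. This is a minimal illustration, not the plugin's implementation: `ResolveApiKey` and its parameters are hypothetical names, and returning nothing when the environment variable is missing (rather than falling back to the node value) is an assumption. Only the `GOOGLE_API_KEY` variable name and the `Use Env Variable` switch come from the text above.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper, not part of the plugin's API.
// bUseEnvVariable mirrors the `Use Env Variable` switch;
// KeyFromNode stands for a key assigned directly from a blueprint node.
std::optional<std::string> ResolveApiKey(bool bUseEnvVariable, const std::string& KeyFromNode)
{
    if (bUseEnvVariable)
    {
        // std::getenv returns nullptr when the variable is not set.
        const char* Value = std::getenv("GOOGLE_API_KEY");
        if (Value != nullptr && *Value != '\0')
        {
            return std::string(Value);
        }
        return std::nullopt; // Assumption: no silent fallback to the node key.
    }

    if (KeyFromNode.empty())
    {
        return std::nullopt;
    }
    return KeyFromNode;
}
```

Returning an empty optional makes a missing key an explicit condition the caller has to handle before any request is sent.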

> **ADVICE**: Pay attention to security and encrypt your assets before packaging.

![](pics/encryption.png)

# Speech synthesis

Supply the text to the async node, along with the voice variant, speech speed, pitch value and, optionally, audio effects. As output you get an audio buffer which you can import using an audio importer.

![](pics/googletts.png)

# Speech recognition

Speech recognition consists of two parts: capturing the voice and sending the request. There are two ways to capture your voice, depending on your needs.

## Grant permissions

### Windows
No action needed.
### Mac
1. In Xcode, select your project
1. Go to the `Info` tab
1. Expand the `Custom macOS Application Target Properties` section
1. Hit `+` and add the `Privacy - Microphone Usage Description` string key; set any value you want, for example "GoogleSpeechKitMicAccess"
![](pics/microphone_access_xcode.png)
### Android
Call this somewhere on Begin Play:
1. Give [microphone access](https://blueprintue.com/blueprint/v-3i68vw/) (**android.permission.RECORD_AUDIO**)
   ![](pics/mic_access_android.png)
1. Give [disk read access](https://blueprintue.com/blueprint/myo1kxkf/) (**android.permission.READ_EXTERNAL_STORAGE**)
   ![](pics/disk_read_access_android.png)
1. Give [disk write access](https://blueprintue.com/blueprint/32f-40w8/) (**android.permission.WRITE_EXTERNAL_STORAGE**)
   ![](pics/disk_write_access_android.png)

## Voice capture and speech recognition

### Windows only method (deprecated)

Use the provided **MicrophoneCapture** actor component as shown below. Next, construct the recognition parameters and pass them to the **Google STT** async node.

![](pics/googlestt.png)


---

### Cross platform method (use this instead)

1. Create a sound submix.
   1. Right click in the content browser - `Sounds > Mix > Sound Submix`
   2. Open it, and set the output value to -96.0
   ![](pics/sound_mix.png)

2. Create a sound class.
   1. Right click in the content browser - `Sounds > Classes > Sound Class`
   2. Open it, and set the submix created in the previous step as the sound class default submix

3. Make sure the Audio Capture plugin is enabled
   ![](pics/audio_capture_plugin.png)
4. Go to your actor, and add an AudioCapture component in the components tab
5. Disable the "Auto Activate" option on AudioCapture
6. Set our sound class on AudioCapture
   ![](pics/audio_capture_sound_class.png)

7. Now we can drop some nodes. To start and stop recording, use the `Activate` and `Deactivate` nodes with the previously added AudioCapture component as the target. Once audio capture is activated, we can start recording output to our submix
8. When audio capture is deactivated, we finish recording output to a `Wav File`! **This is important**! Give your wav file a name (e.g. "stt_sample"); `Path` can be absolute or relative (to the /Saved/BouncedWavFiles folder)
   ![](pics/start_stop_recording_set_submix.png)
9. Then, after a small delay, we can read the saved file back as byte samples, ready to be fed to the `Google STT` node. The delay is needed because the "Finish Recording Output" node writes the sound to disk and the file write takes some time; if we proceed immediately, the ReadWaveFile node will fail
   ![](pics/read_back.png)

Here is the whole setup:

![](pics/setup_all.png)

---

There is another STT node: **Google STT Variants**. Instead of returning the result with the highest confidence, it returns an array of variants.
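
To illustrate how the two nodes relate, here is a small standalone C++ sketch. It assumes each variant carries a confidence score; picking the highest one then corresponds to what the plain **Google STT** node is described as returning. `FSpeechVariant` and `PickBestTranscript` are hypothetical names, not types or functions exposed by the plugin.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one recognition alternative; not the plugin's type.
struct FSpeechVariant
{
    std::string Transcript;
    float Confidence = 0.0f;
};

// Returns the transcript with the highest confidence,
// or an empty optional when the list of variants is empty.
std::optional<std::string> PickBestTranscript(const std::vector<FSpeechVariant>& Variants)
{
    const auto Best = std::max_element(
        Variants.begin(), Variants.end(),
        [](const FSpeechVariant& A, const FSpeechVariant& B)
        {
            return A.Confidence < B.Confidence;
        });

    if (Best == Variants.end())
    {
        return std::nullopt;
    }
    return Best->Transcript;
}
```

Working with the full array is useful when the top result is not the one you expect, for example when matching against a fixed list of commands.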

![](pics/googlesttvariants.png)

# Utilities
## Percentage based string comparison (Fuzzy matching)

You will probably need to process the recognised speech in your app. To increase the chance of a match, use the `CompareStrings` node. The call below returns 0.666, so we can treat those strings as equal since they are 66% similar. It uses the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) algorithm.

![](pics/compare.png)

## Listing available capture devices

You can pass a microphone name to the microphone capture component. To get the list of available microphones, use the following setup:

![](pics/enumerate_microphones.png)

# Supported platforms

**Windows**, **Mac** and **Android**.

# Migration guide

## Version 3.0

`EGoogleTTSLanguage` was removed. You need to pass the [voice name](https://cloud.google.com/text-to-speech/docs/voices) as a string (**Voice name** column).

![new_language_pin](pics/new_language_pin.png)

> **WARNING**: Since the synthesis parameters have changed, the TTS cache is no longer valid! Make sure you remove the TTS cache if it exists. **Editor/Game can freeze** if an old cache is loaded. So make sure to remove the `PROJECT_ROOT/Saved/GoogleTTSCache` folder, or invoke the `WipeTTSCache` node before the GoogleTTS node is executed!

![](pics/wipe_cache.png)

![](pics/tts_cache_folder.png)

The reason `EGoogleTTSLanguage` was removed is that the number of languages has exceeded 256, and that many values do not fit into an 8-bit enum (this is Unreal's limitation).

# Links
* [Supported TTS voices](https://cloud.google.com/text-to-speech/docs/voices) ([WaveNet](https://en.wikipedia.org/wiki/WaveNet) voices are the best)
* [Speech synthesis config](https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig)
* [Supported STT languages](https://cloud.google.com/speech-to-text/docs/languages)
* [Speech recognition config](https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig)

--------------------------------------------------------------------------------