├── pics
    ├── promo.png
    ├── Icon128.png
    ├── api_key.png
    ├── compare.png
    ├── apikeybp.png
    ├── encryption.png
    ├── googlestt.png
    ├── googletts.png
    ├── read_back.png
    ├── setup_all.png
    ├── sound_mix.png
    ├── thumbnail.png
    ├── useenvvars.png
    ├── wipe_cache.png
    ├── silencenode.png
    ├── sound_class.png
    ├── buffertosound.png
    ├── voicesettings.png
    ├── googlesttvariants.png
    ├── new_language_pin.png
    ├── ovrframesequence.png
    ├── tts_cache_folder.png
    ├── audio_capture_plugin.png
    ├── googlespeechkeyenv.png
    ├── mic_access_android.png
    ├── enumerate_microphones.png
    ├── audio_capture_sound_class.png
    ├── disk_read_access_android.png
    ├── disk_write_access_android.png
    ├── microphone_access_xcode.png
    └── start_stop_recording_set_submix.png
└── README.md


/pics/promo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/promo.png


--------------------------------------------------------------------------------
/pics/Icon128.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/Icon128.png


--------------------------------------------------------------------------------
/pics/api_key.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/api_key.png


--------------------------------------------------------------------------------
/pics/compare.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/compare.png


--------------------------------------------------------------------------------
/pics/apikeybp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/apikeybp.png


--------------------------------------------------------------------------------
/pics/encryption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/encryption.png


--------------------------------------------------------------------------------
/pics/googlestt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googlestt.png


--------------------------------------------------------------------------------
/pics/googletts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googletts.png


--------------------------------------------------------------------------------
/pics/read_back.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/read_back.png


--------------------------------------------------------------------------------
/pics/setup_all.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/setup_all.png


--------------------------------------------------------------------------------
/pics/sound_mix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/sound_mix.png


--------------------------------------------------------------------------------
/pics/thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/thumbnail.png


--------------------------------------------------------------------------------
/pics/useenvvars.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/useenvvars.png


--------------------------------------------------------------------------------
/pics/wipe_cache.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/wipe_cache.png


--------------------------------------------------------------------------------
/pics/silencenode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/silencenode.png


--------------------------------------------------------------------------------
/pics/sound_class.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/sound_class.png


--------------------------------------------------------------------------------
/pics/buffertosound.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/buffertosound.png


--------------------------------------------------------------------------------
/pics/voicesettings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/voicesettings.png


--------------------------------------------------------------------------------
/pics/googlesttvariants.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googlesttvariants.png


--------------------------------------------------------------------------------
/pics/new_language_pin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/new_language_pin.png


--------------------------------------------------------------------------------
/pics/ovrframesequence.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/ovrframesequence.png


--------------------------------------------------------------------------------
/pics/tts_cache_folder.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/tts_cache_folder.png


--------------------------------------------------------------------------------
/pics/audio_capture_plugin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/audio_capture_plugin.png


--------------------------------------------------------------------------------
/pics/googlespeechkeyenv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/googlespeechkeyenv.png


--------------------------------------------------------------------------------
/pics/mic_access_android.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/mic_access_android.png


--------------------------------------------------------------------------------
/pics/enumerate_microphones.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/enumerate_microphones.png


--------------------------------------------------------------------------------
/pics/audio_capture_sound_class.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/audio_capture_sound_class.png


--------------------------------------------------------------------------------
/pics/disk_read_access_android.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/disk_read_access_android.png


--------------------------------------------------------------------------------
/pics/disk_write_access_android.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/disk_write_access_android.png


--------------------------------------------------------------------------------
/pics/microphone_access_xcode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/microphone_access_xcode.png


--------------------------------------------------------------------------------
/pics/start_stop_recording_set_submix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/UE4GoogleSpeechKit-docs/HEAD/pics/start_stop_recording_set_submix.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # **UE4 Google Speech Kit**
  2 | 
  3 | ![](pics/Icon128.png)
  4 | 
  5 | This is a UE4 wrapper for Google's [Cloud Text-to-Speech](https://cloud.google.com/text-to-speech/) and synchronous [Cloud Speech-to-Text](https://cloud.google.com/speech-to-text/) speech recognition.
  6 | 
  7 | The plugin has been battle-tested in several commercial simulator projects. It is small, lean, and simple to use.
  8 | 
  9 | # Table of contents
 10 | - [**UE4 Google Speech Kit**](#ue4-google-speech-kit)
 11 | - [Table of contents](#table-of-contents)
 12 | - [Engine preparation](#engine-preparation)
 13 | - [Cloud preparation](#cloud-preparation)
 14 | - [Speech synthesis](#speech-synthesis)
 15 | - [Speech recognition](#speech-recognition)
 16 |   - [Grant permissions](#grant-permissions)
 17 |     - [Windows](#windows)
 18 |     - [Mac](#mac)
 19 |     - [Android](#android)
 20 |   - [Voice capture and speech recognition](#voice-capture-and-speech-recognition)
 21 | - [Utilities](#utilities)
 22 |   - [Percentage based string comparison (Fuzzy matching)](#percentage-based-string-comparison-fuzzy-matching)
 23 |   - [Listing available capture devices](#listing-available-capture-devices)
 24 | - [Supported platforms](#supported-platforms)
 25 | - [Migration guide](#migration-guide)
 26 |   - [Version 3.0](#version-30)
 27 | - [Links](#links)
 28 | 
 29 | # Engine preparation
 30 | 
 31 | To make the microphone work, add the following lines to the project's `DefaultEngine.ini`.
 32 | ```
 33 | [Voice]
 34 | bEnabled=true
 35 | ```
 36 | 
 37 | To avoid losing the pauses between words, check the silence detection threshold `voice.SilenceDetectionThreshold`; a value of `0.01` works well.
 38 | This setting also goes in `DefaultEngine.ini`.
 39 | 
 40 | ```
 41 | [SystemSettings]
 42 | voice.SilenceDetectionThreshold=0.01
 43 | ```
 44 | Starting with engine version 4.25, also add
 45 | ```
 46 | voice.MicNoiseGateThreshold=0.01
 47 | ```
 48 | 
 49 | Other voice-related variables worth experimenting with:
 50 | ```bash
 51 | voice.MicNoiseGateThreshold
 52 | voice.MicInputGain
 53 | voice.MicStereoBias
 54 | voice.MicNoiseAttackTime
 55 | voice.MicNoiseReleaseTime
 57 | voice.SilenceDetectionAttackTime
 58 | voice.SilenceDetectionReleaseTime
 59 | ```
 60 | 
 61 | To find the available settings, type `voice.` in the editor console and an autocompletion widget will pop up.
 62 | 
 63 | ![](pics/voicesettings.png)
 64 | 
 65 | Console variables can be modified at runtime like this:
 66 | 
 67 | ![](pics/silencenode.png)
 68 | 
 69 | To debug your microphone input, you can convert the output sound buffer to
 70 | an Unreal sound wave and play it.
 71 | 
 72 | ![](pics/buffertosound.png)
 73 | 
 74 | The values above may need adjusting depending on the characteristics of your microphone.
 75 | 
 76 | # Cloud preparation
 77 | 1) Go to [Google Cloud](https://console.cloud.google.com) and create a payment account.
 78 | 2) Enable [Cloud Speech-to-Text API](https://console.cloud.google.com/apis/library/speech.googleapis.com) and [Cloud Text-to-Speech API](https://console.cloud.google.com/apis/library/texttospeech.googleapis.com).
 79 | 3) Create credentials to access your enabled APIs. See instructions [here](https://cloud.google.com/docs/authentication).
 80 | 
 81 | ![](pics/api_key.png)
 82 | 
 83 | 4) There are two ways to use your credentials in the project.
 84 | 
 85 |     * 4.1 Using environment variables. Create an environment variable `GOOGLE_API_KEY` with the created key as its value.
 86 | 
 87 |     * 4.2 Assigning the key directly in blueprints. This node can be called anywhere.
 88 | 
 89 |     ![](pics/apikeybp.png)
 90 | 
 91 |     By default, the API key must be set from nodes. To use the environment variable instead, set `Use Env Variable` to `true`.
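
The choice between the two credential sources can be sketched in plain C++ as follows. This is only an illustration, not the plugin's code: `ResolveApiKey` and its parameters are hypothetical names, and returning "no key" when the environment variable is missing (rather than falling back to the node value) is an assumption.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not part of the plugin): picks the API key source.
// bUseEnvVariable mirrors the `Use Env Variable` setting; KeyFromNodes stands
// for the key assigned directly in blueprints.
std::optional<std::string> ResolveApiKey(bool bUseEnvVariable, const std::string& KeyFromNodes)
{
    if (bUseEnvVariable)
    {
        // std::getenv returns nullptr when the variable is not defined.
        const char* EnvValue = std::getenv("GOOGLE_API_KEY");
        if (EnvValue != nullptr && EnvValue[0] != '\0')
        {
            return std::string(EnvValue);
        }
        // Assumption: no fallback to the node value when the variable is missing.
        return std::nullopt;
    }

    if (KeyFromNodes.empty())
    {
        return std::nullopt;
    }
    return KeyFromNodes;
}
```

Returning an empty optional makes a missing key explicit, so a caller can report the misconfiguration before a request is sent.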
 92 | 
 93 | > **ADVICE**: Pay attention to security and encrypt your assets before packaging.
 94 | 
 95 | ![](pics/encryption.png)
 96 | 
 97 | # Speech synthesis
 98 | 
 99 | Supply the text to the async node, along with the voice variant, speech speed, pitch value, and optionally audio effects. As output you will get
100 | an audio buffer, which you can import using an audio importer.
101 | 
102 | ![](pics/googletts.png)
103 | 
104 | <!-- ## Bonus!
105 | 
106 | Output raw samples can be used with Oculus OVR LipSync at runtime.
107 | 
108 | ![](pics/ovrframesequence.png)
109 | 
110 | Get node [here](https://github.com/IlgarLunin/UE4OVRLipSyncCookFrameSequence).
111 | 
112 | Demo:
113 | 
114 | [![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/B78aQly2wrI/0.jpg)](https://www.youtube.com/watch?v=B78aQly2wrI) -->
115 | 
116 | # Speech recognition
117 | 
118 | Speech recognition consists of two parts: voice capture and sending the request. There are two ways to capture your voice, depending on your needs.
119 | 
120 | ## Grant permissions
121 | 
122 | ### Windows
123 | No action needed.
124 | ### Mac
125 | 1. In Xcode, select your project
126 | 1. Go to the `Info` tab
127 | 1. Expand the `Custom macOS Application Target Properties` section
128 | 1. Hit `+` and add the `Privacy - Microphone Usage Description` string key; set any value you want, for example "GoogleSpeechKitMicAccess"
129 | ![](pics/microphone_access_xcode.png)
130 | ### Android
131 | Call the following somewhere on Begin Play:
132 | 1. Give [microphone access](https://blueprintue.com/blueprint/v-3i68vw/) (**android.permission.RECORD_AUDIO**)
133 |  ![](pics/mic_access_android.png)
134 | 1. Give [disk read access](https://blueprintue.com/blueprint/myo1kxkf/) (**android.permission.READ_EXTERNAL_STORAGE**)
135 |  ![](pics/disk_read_access_android.png)
136 | 1. Give [disk write access](https://blueprintue.com/blueprint/32f-40w8/) (**android.permission.WRITE_EXTERNAL_STORAGE**)
137 |  ![](pics/disk_write_access_android.png)
138 | 
139 | ## Voice capture and speech recognition
140 | 
141 | <!-- WINDOWS -->
142 | <details>
143 |   <summary>Windows only method (deprecated)</summary>
144 |   
145 | 
146 | Use the provided **MicrophoneCapture** actor component as shown below. Next, construct the recognition parameters and pass them to the **Google STT** async node.
147 | 
148 | ![](pics/googlestt.png)
149 | 
150 | </details>
151 | 
152 | ---
153 | 
154 | <!-- MAC -->
155 | ### Cross platform method (use this instead)
156 | 
157 | 1. Create a sound submix.
158 |     1. Right click in the content browser - `Sounds > Mix > Sound Submix`
159 |     2. Open it and set the output value to -96.0
160 |     ![](pics/sound_mix.png)
161 | 
162 | 2. Create a sound class.
163 |     1. Right click in the content browser - `Sounds > Classes > Sound Class`
164 |     2. Open it and set the submix created in the previous step as the sound class's default submix
165 | 
166 | 3. Make sure the Audio Capture plugin is enabled
167 |     ![](pics/audio_capture_plugin.png)
168 | 4. Go to your actor and add an AudioCapture component in the Components tab
169 | 5. Disable the "Auto Activate" option on the AudioCapture component
170 | 6. Assign our sound class to the AudioCapture component
171 |     ![](pics/audio_capture_sound_class.png)
172 | 
173 | 7. Now we can drop some nodes. To start and stop recording, we use the `Activate` and `Deactivate` nodes with the previously added AudioCapture component as the target. Once audio capture is activated, we can start recording output to our submix.
174 | 8. When audio capture is deactivated, we finish recording output to `Wav File`. **This is important!** Give your WAV file a name (e.g. "stt_sample"); `Path` can be absolute or relative (to the /Saved/BouncedWavFiles folder).
175 | ![](pics/start_stop_recording_set_submix.png)
176 | 9. Then, after a small delay, we can read the saved file back as byte samples, ready to be fed to the `Google STT` node. The delay is needed because the "Finish Recording Output" node writes the sound to disk and the file write takes some time; if we proceed immediately, the ReadWaveFile node will fail.
177 | ![](pics/read_back.png)
178 | 
179 | Here is the whole setup:
180 | 
181 | ![](pics/setup_all.png)
182 | 
183 | ---
184 | 
185 | 
186 | There is another STT node, **Google STT Variants**, which returns an array of variants instead of only the result with the highest confidence.
187 | 
188 | ![](pics/googlesttvariants.png)
189 | 
190 | # Utilities
191 | ## Percentage based string comparison (Fuzzy matching)
192 | 
193 | You will probably need to process the recognized speech in your app; to increase the chance of a match, use the `CompareStrings` node. The call below returns 0.666,
194 | so we can treat those strings as equal since they are about 66% similar. It uses the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) algorithm.
195 | 
196 | ![](pics/compare.png)
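
For readers curious how a percentage can be derived from an edit distance, here is a minimal standalone C++ sketch. It is not the plugin's implementation: the function names are hypothetical, and the normalization (one minus the distance divided by the longer string's length) is an assumption about how such a score is typically computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance via dynamic programming, keeping only two rows.
// Strings are compared byte by byte, so multi-byte UTF-8 characters count
// as several symbols.
std::size_t LevenshteinDistance(const std::string& A, const std::string& B)
{
    std::vector<std::size_t> Prev(B.size() + 1), Curr(B.size() + 1);
    for (std::size_t j = 0; j <= B.size(); ++j)
    {
        Prev[j] = j; // Turning an empty prefix into B[0..j) takes j insertions.
    }
    for (std::size_t i = 1; i <= A.size(); ++i)
    {
        Curr[0] = i; // Turning A[0..i) into an empty string takes i deletions.
        for (std::size_t j = 1; j <= B.size(); ++j)
        {
            const std::size_t Cost = (A[i - 1] == B[j - 1]) ? 0 : 1;
            Curr[j] = std::min({Prev[j] + 1,          // deletion
                                Curr[j - 1] + 1,      // insertion
                                Prev[j - 1] + Cost}); // substitution or match
        }
        std::swap(Prev, Curr);
    }
    return Prev[B.size()];
}

// Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
double Similarity(const std::string& A, const std::string& B)
{
    const std::size_t MaxLen = std::max(A.size(), B.size());
    if (MaxLen == 0)
    {
        return 1.0; // Two empty strings are treated as identical.
    }
    return 1.0 - static_cast<double>(LevenshteinDistance(A, B)) / static_cast<double>(MaxLen);
}
```

With this normalization, `Similarity("abc", "abd")` is about 0.667, because one substitution is needed across three characters. An application would then pick its own acceptance threshold for treating a recognized phrase as a match.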
197 | 
198 | ## Listing available capture devices
199 | 
200 | You can pass a microphone name to the microphone capture component. To get the list of available microphones, use the following setup:
201 | 
202 | ![](pics/enumerate_microphones.png)
203 | 
204 | # Supported platforms
205 | 
206 | **Windows**, **Mac** and **Android**.
207 | 
208 | # Migration guide
209 | 
210 | ## Version 3.0
211 | 
212 | `EGoogleTTSLanguage` was removed. You need to pass the [voice name](https://cloud.google.com/text-to-speech/docs/voices) as a string (see the **Voice name** column).
213 | 
214 | ![new_language_pin](pics/new_language_pin.png)
215 | 
216 | > **WARNING**: Since the synthesis parameters have changed, the TTS cache is no longer valid! **The editor/game can freeze** if an old cache is loaded, so make sure to remove the `PROJECT_ROOT/Saved/GoogleTTSCache` folder if it exists, or invoke the `WipeTTSCache` node before the GoogleTTS node is executed!
217 | 
218 | ![](pics/wipe_cache.png)
219 | 
220 | ![](pics/tts_cache_folder.png)
221 | 
222 | The reason for removing the enum is that the number of languages has exceeded 256, which does not fit into an 8-bit enum (an Unreal limitation).
223 | 
224 | # Links
225 | * [Supported TTS voices](https://cloud.google.com/text-to-speech/docs/voices) ([WaveNet](https://en.wikipedia.org/wiki/WaveNet) voices are the best)
226 | * [Speech synthesis config](https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig)
227 | * [Supported STT languages](https://cloud.google.com/speech-to-text/docs/languages)
228 | * [Speech recognition config](https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig)
229 | 


--------------------------------------------------------------------------------