├── resources
│   ├── cli.png
│   ├── ram.png
│   ├── vlsrun.png
│   ├── initnode.png
│   ├── thumbnail.png
│   ├── finalresult.png
│   ├── initialize.png
│   ├── modelspage.png
│   ├── silencenode.png
│   ├── addcomponent.png
│   ├── buffertosound.png
│   ├── minimalsetup.png
│   ├── partialresult.png
│   ├── process_path.png
│   ├── server_process.png
│   ├── voicesettings.png
│   ├── default_use_case.png
│   ├── pass_sound_wave.png
│   ├── vlsdownloadmodels.png
│   ├── add_speech_recognizer.png
│   ├── initialize_recognizer.png
│   ├── recognizer_automatic.png
│   ├── push_to_talk_send_once.png
│   ├── recognizer_push_to_talk.png
│   └── send_data_when_recording.png
└── README.md

/resources/cli.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/cli.png
--------------------------------------------------------------------------------

/resources/ram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/ram.png
--------------------------------------------------------------------------------

/resources/vlsrun.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/vlsrun.png
--------------------------------------------------------------------------------

/resources/initnode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/initnode.png
--------------------------------------------------------------------------------

/resources/thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/thumbnail.png
--------------------------------------------------------------------------------

/resources/finalresult.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/finalresult.png
--------------------------------------------------------------------------------

/resources/initialize.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/initialize.png
--------------------------------------------------------------------------------

/resources/modelspage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/modelspage.png
--------------------------------------------------------------------------------

/resources/silencenode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/silencenode.png
--------------------------------------------------------------------------------

/resources/addcomponent.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/addcomponent.png
--------------------------------------------------------------------------------

/resources/buffertosound.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/buffertosound.png
--------------------------------------------------------------------------------

/resources/minimalsetup.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/minimalsetup.png
--------------------------------------------------------------------------------

/resources/partialresult.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/partialresult.png
--------------------------------------------------------------------------------

/resources/process_path.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/process_path.png
--------------------------------------------------------------------------------

/resources/server_process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/server_process.png
--------------------------------------------------------------------------------

/resources/voicesettings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/voicesettings.png
--------------------------------------------------------------------------------

/resources/default_use_case.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/default_use_case.png
--------------------------------------------------------------------------------

/resources/pass_sound_wave.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/pass_sound_wave.png
--------------------------------------------------------------------------------

/resources/vlsdownloadmodels.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/vlsdownloadmodels.png
--------------------------------------------------------------------------------
/resources/add_speech_recognizer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/add_speech_recognizer.png
--------------------------------------------------------------------------------

/resources/initialize_recognizer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/initialize_recognizer.png
--------------------------------------------------------------------------------

/resources/recognizer_automatic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/recognizer_automatic.png
--------------------------------------------------------------------------------

/resources/push_to_talk_send_once.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/push_to_talk_send_once.png
--------------------------------------------------------------------------------

/resources/recognizer_push_to_talk.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/recognizer_push_to_talk.png
--------------------------------------------------------------------------------

/resources/send_data_when_recording.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IlgarLunin/VoskPlugin-docs/HEAD/resources/send_data_when_recording.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------

# **Offline Speech Recognition**

![](resources/thumbnail.png)

This is an Unreal Engine plugin for accurate speech recognition, and it doesn't require an internet connection.

# Table of contents
- [**Offline Speech Recognition**](#offline-speech-recognition)
- [Table of contents](#table-of-contents)
- [High level overview](#high-level-overview)
- [Project settings](#project-settings)
- [Test your microphone](#test-your-microphone)
- [Where to download languages and how to test them](#where-to-download-languages-and-how-to-test-them)
- [Using built-in language server (USE THIS)](#using-built-in-language-server)
- [Automatic speech recognition based on silence detection](#automatic-speech-recognition-based-on-silence-detection)
- [Push to talk (Speak first, then recognize)](#push-to-talk-speak-first-then-recognize)
- [Running language server as external process](#running-language-server-as-external-process)
- [Running server process and game process at the same time](#running-server-process-and-game-process-at-the-same-time)
- [Passing SoundWave as input, instead of microphone](#passing-soundwave-as-input-instead-of-microphone)
- [How Send Data to Language Server node works](#how-send-data-to-language-server-node-works)
- [Platforms supported](#platforms-supported)
- [Links](#links)


# High level overview
Since this is a speech-to-text (STT) plugin, the first thing you need is a way to record your voice (any recording device). The recorded voice is then passed to a speech recognizer, which gives your speech back in textual form. A speech recognizer works with one language at a time. Each language is a downloadable folder of model files.

In order to ship your game or app to end users, you will need to package each language model with your game, as well as the language server itself (the server is optional, since your game itself can act as the server).
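Under the hood, the recognizer consumes raw audio bytes; Vosk-based recognizers generally expect 16-bit signed mono PCM. As a rough illustration of that data-preparation step, here is a sketch in plain Python that downmixes an interleaved stereo float buffer (the kind a capture device typically produces) into that layout. The helper name is hypothetical, not part of the plugin's API:

```python
import struct

def stereo_float_to_mono_pcm16(samples):
    """Downmix interleaved stereo float samples in [-1.0, 1.0] to
    16-bit signed little-endian mono PCM bytes, the layout Vosk-style
    recognizers typically consume."""
    out = bytearray()
    for i in range(0, len(samples), 2):
        mono = (samples[i] + samples[i + 1]) / 2.0   # average both channels
        mono = max(-1.0, min(1.0, mono))             # clamp to valid range
        out += struct.pack("<h", int(mono * 32767))  # one int16 per frame
    return bytes(out)

# Two stereo frames: full-scale left channel only, then silence
pcm = stereo_float_to_mono_pcm16([1.0, 0.0, 0.0, 0.0])
print(len(pcm))  # → 4 (two int16 samples)
```

The sample rate matters too: most downloadable Vosk models are trained for 16 kHz (some for 8 kHz) audio, so resample the capture buffer if your device records at a different rate.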


# Project settings
To make the microphone work, you need to add the following lines to the project's `DefaultEngine.ini`.
```
[Voice]
bEnabled=true
```

To avoid losing pauses between words, you probably want to tune the silence detection threshold `voice.SilenceDetectionThreshold`; a value of `0.01` works well.
This also goes into `DefaultEngine.ini`.

```
[SystemSettings]
voice.SilenceDetectionThreshold=0.01
```
Starting from engine version 4.25, also add
```
voice.MicNoiseGateThreshold=0.01
```

Other voice-related variables worth experimenting with:
```bash
voice.MicNoiseGateThreshold
voice.MicInputGain
voice.MicStereoBias
voice.MicNoiseAttackTime
voice.MicNoiseReleaseTime
voice.SilenceDetectionAttackTime
voice.SilenceDetectionReleaseTime
```

To find the available settings, type `voice.` in the editor console and an autocompletion widget will pop up.

![](resources/voicesettings.png)

Console variables can also be modified at runtime like this

![](resources/silencenode.png)

The values above may differ depending on your microphone's actual characteristics.

# Test your microphone
To debug your microphone input, you can convert the output sound buffer to an Unreal sound wave and play it.

![](resources/buffertosound.png)

Another thing to keep in mind: if the component is connected to a server, it will by default try to send voice data during microphone capture.
If you don't want this behavior, you can disable it like this

![](resources/send_data_when_recording.png)

Use this for push-to-talk style recognition (*when you record the whole phrase first, and then send it to the server*)

![](resources/push_to_talk_send_once.png)

# Where to download languages and how to test them
All available language models are listed [here](https://alphacephei.com/vosk/models)

To test how a specific language behaves, you can use the [external language server app](https://github.com/IlgarLunin/vosk-language-server)

# Using built-in language server
*This method is preferable for simple scenarios where you don't need to separate your game and the language server; it spares you the hassle of managing an external process and communicating with the server via WebSockets.*

For both automatic and push-to-talk style recognition, you start by adding the **SpeechRecognizer** component to your actor

![add_speech_recognizer](resources/add_speech_recognizer.png)

and then loading a language into it. (This is a non-blocking function; you know exactly when the model is fully loaded into memory by connecting to the **Finished** output pin.)

![](resources/initialize_recognizer.png)

## Automatic speech recognition based on silence detection
![](resources/recognizer_automatic.png)

## Push to talk (Speak first, then recognize)
![](resources/recognizer_push_to_talk.png)

The Feed Voice Data node can handle any amount of pre-recorded speech; see [this section](#how-send-data-to-language-server-node-works)

# Running language server as external process
*In more complex cases this method is preferable over the built-in one. You can have a single language server running in the cloud or on a local machine, and it can process multiple clients at the same time, since it's multithreaded.*

1. 
Download the latest version [here](https://github.com/IlgarLunin/vosk-language-server/releases)
2. Run **vls.exe**, which is a user interface for **asr_server.exe**
   > **NOTE**: *asr_server.exe* is the actual server; you can run it without the GUI
   ![](resources/cli.png)
3. Go to the main menu -> File -> Download models

   ![](resources/vlsdownloadmodels.png)

4. You will be redirected to a web page where you will find all available models (**languages**)

   ![](resources/modelspage.png)

5. In order to start using a language, first download one of the models
6. Enter the path to the downloaded model in the server UI and press the **start** button

   ![](resources/vlsrun.png)

   > **!NOTE!**: Depending on the model size, you need to wait until the model is loaded into memory before you start feeding the server with voice data. E.g. if the model size is ~2GB, it can take ~10-30 seconds. But this is a one-time event; you can load your language into memory once at OS startup.
   ![](resources/ram.png)

7. Open Unreal
8. Create an actor blueprint
9. Add the Vosk component in the components panel

   ![](resources/addcomponent.png)

10. On begin play
    1. Bind to the "Partial Result Received" event
       ![](resources/partialresult.png)

    1. **[Optional]** Bind to the "Final Result Received" event
       ![](resources/finalresult.png)

    1. **[!MANDATORY!]** Connect to the language server process and begin voice capture
       ![](resources/initialize.png)
       NOTE: `Addr` and `Port` correspond to the language server UI (*0.0.0.0 means the server listens on all interfaces; when it runs on the same machine, connect via 127.0.0.1, i.e. localhost*)
       ![](resources/initnode.png)


11. Start talking
12. 
Check that the *Partial Result Received* event gets executed

# Running server process and game process at the same time
The plugin offers the following nodes

![](resources/server_process.png)

**Build Server Parameters** - a helper method that simplifies passing arguments to the Create Process node

**Create Process** - runs an external program; this node is generic, so you can use it to run any external program

*NOTE*: *When you ship your game, you need to include the language server as well. Put the language server files in your game's bin folder (`GAME/Binaries/Win64/**`) and use the "GetProcessExecutablePath" node to build the path to `asr_server.exe`*

![](resources/process_path.png)

**Kill Process** - the equivalent of `Alt+F4`; it shuts down an external process based on its process ID (the process handle returned by `Create Process`). Save the output of the `Create Process` node to a variable and use it later to terminate the process.

Default use case:

* Create an `Actor` responsible for voice recognition
* Start the language server on the `Begin Play` event
* Add the `Vosk` actor component and initialize it on begin play
* Begin capturing voice data
* Bind to the message receive events
* Uninitialize the Vosk component and terminate the server process on end play

> **NOTE**: *`Uninitialize` will stop voice capture if it is active*

![](resources/default_use_case.png)

# Passing SoundWave as input, instead of microphone

To do so, the plugin offers a node called `"Decompress Sound"` that converts a sound into an array of bytes. You can then use the output of the Decompress Sound node as input to the `"Send Voice Data to Language Server"` node and expect the partial and final result events to be invoked later, when the server finishes recognition.
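As the section on the Send Data to Language Server node explains, the voice bytes are split into packets of a given size and sent one after another, and nothing is sent if the packet size exceeds the data size. That chunking behavior can be sketched in plain Python (the helper is illustrative, not plugin API):

```python
def split_into_packets(voice_data: bytes, packet_size: int = 4096):
    """Split raw voice bytes into packets of packet_size, mirroring how
    the Send Voice Data node emulates microphone capture. Per the plugin
    docs, if packet_size exceeds the data size, nothing is sent."""
    if packet_size > len(voice_data):
        return []  # data smaller than one packet: nothing is sent
    return [voice_data[i:i + packet_size]
            for i in range(0, len(voice_data), packet_size)]

packets = split_into_packets(b"\x00" * 10000, 4096)
print([len(p) for p in packets])  # → [4096, 4096, 1808]
```

A smaller packet size means more round trips (and more server iterations) for the same clip, which is why the default of 4096 is a reasonable starting point for short phrases.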


> **NOTE**: *Do not call `BeginCapture` and `FinishCapture` in this case, since we don't want to use audio from the microphone*


![](resources/pass_sound_wave.png)

## How Send Data to Language Server node works
It takes the sound bytes as its first argument and a packet size as its second argument. It splits all the bytes into packets of the given size and sends them one after another to the language server, emulating microphone capture behavior. If the packet size is greater than the size of the voice data, no data will be sent. A packet size of 4096 works relatively fast and is suitable for short phrases. Note that with a small packet size it will take more time to deliver the entire voice clip to the server, and the server will perform more iterations accordingly. You should experiment with the packet size for your specific case.


# Platforms supported

Tested on **Windows**



# Links

Find out more in the documentation

* [Vosk](https://alphacephei.com/vosk/)
--------------------------------------------------------------------------------