├── .github └── FUNDING.yml ├── API.md ├── API_VANILLA.md ├── AUDIO_FILE.md ├── CHANGELOG.md ├── EXAMPLE_CODE.md ├── HOW_TO_USE.md ├── LICENSE ├── LIMITATION.md ├── LLM_ENGINE.md ├── MAKE_BACKEND.md ├── PROBLEMS.md ├── README.md ├── SUPPORTED_LANGUAGES.md ├── TEST.md ├── assets ├── multi_lang │ ├── ar.mp3 │ ├── cn.mp3 │ ├── de.mp3 │ ├── el.mp3 │ ├── en.mp3 │ ├── fi.mp3 │ ├── fr.mp3 │ ├── hindi.mp3 │ ├── id.mp3 │ ├── it.mp3 │ ├── jp.mp3 │ ├── ko.mp3 │ ├── ro.mp3 │ ├── ru.mp3 │ └── tr.mp3 └── test │ └── cat.mp3 ├── backend └── nodejs │ ├── .env.template │ ├── .gitignore │ ├── app.js │ ├── package.json │ └── readme.md ├── img ├── RNSH.png ├── adaptable.png ├── amazon.png ├── auto_transcribe.png ├── backend.png ├── banner.png ├── cache.png ├── chat_gpt_api.png ├── elevenlabs.png ├── elevenlabs_pricing.png ├── enterprise_web_version.png ├── features.png ├── google_tts.png ├── hanacaraka.png ├── intellisense.gif ├── interaction.png ├── mobile_version.png ├── open_tts.png ├── overview.png ├── pdf_reader_plugin.png ├── position.png ├── prepareHL.loadingProgress.png ├── prepareHL.png ├── pronounciation.png ├── react_speech_logo.png ├── relation.png ├── sosmed.png ├── vanilla.png ├── viseme.png └── web_version.png └── package.json /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: albirrkarim 4 | #patreon: # Replace with a single Patreon username 5 | #open_collective: # Replace with a single Open Collective username 6 | ko_fi: albirrkarim 7 | #tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel 8 | #community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry 9 | #liberapay: # Replace with a single Liberapay username 10 | #issuehunt: # Replace with a single IssueHunt username 11 | #otechie: # Replace with a single Otechie username 12 | #lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry 13 | # custom: ['https://paypal.me/AlbirrKarim'] -------------------------------------------------------------------------------- /API.md: -------------------------------------------------------------------------------- 1 | # API 2 | 3 | The api is a function that you can use to integrate this package into your apps. When read this api docs you can toggle `Outline` (see top right) menu in github so you can navigate easily. 4 | 5 | This package is written with typescript, You don't have to read all the docs in here, because this package now support [VS Code IntelliSense](https://code.visualstudio.com/docs/editor/intellisense) what is that? simply its when you hover your mouse into some variable or function [VS Code](https://code.visualstudio.com) will show some popup (simple tutorial) what is the function about, examples, params, etc... 6 | 7 |
8 | Show Video 9 |
10 | 11 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/05d325f9-469c-47e9-97d3-10053628e18c 12 | 13 |
14 | 15 |
16 | 17 | see [API_VANILLA.md](API_VANILLA.md) for vanilla js version. 18 | 19 |
20 | 21 | Actually, **Theres a lot** of function, [llm engine](LLM_ENGINE.md) and constant that you can import from this package. Here's just few of them. When you have buy the package you can just go to the `index.ts` file and see all the function and constant. the package have a lot of features, ofcourse it have a lot of APIs. 22 | 23 |
24 | Show How to import something from the package 25 |
26 | 27 | ```jsx 28 | // v5.4.8 API 29 | import { 30 | // Main 31 | markTheWords, 32 | useTextToSpeech, 33 | 34 | // Utilities function for precision and add more capabilities 35 | pronunciationCorrection, 36 | getLangForThisText, 37 | getTheVoices, 38 | noAbbreviation, 39 | speak, 40 | convertTextIntoClearTranscriptText, 41 | 42 | // Package Data and Cache Integration 43 | // Your app can read the data used by this package, like: 44 | PKG, 45 | PREFERRED_VOICE, // Set global config for the preffered voice 46 | PKG_STATUS_OPT, // Package status option 47 | PKG_DEFAULT_LANG, // Package default lang 48 | LANG_CACHE_KEY, // Package lang sessionStorage key 49 | OPENAI_CHAT_COMPLETION_API_ENDPOINT, 50 | getVoiceBasedOnVoiceURI, 51 | getCachedVoiceInfo, 52 | getCachedVoiceURI, 53 | setCachedVoiceInfo, 54 | getCachedVoiceName, 55 | } from "react-speech-highlight"; 56 | 57 | // Type data for typescript 58 | import type { 59 | ControlHLType, 60 | StatusHLType, 61 | PrepareHLType, 62 | SpokenHLType, 63 | UseTextToSpeechReturnType, 64 | ActivateGestureProps, 65 | GetVoicesProps, 66 | VoiceInfo, 67 | markTheWordsFuncType, 68 | ConfigTTS, 69 | getAudioType, 70 | getAudioReturnType, 71 | VisemeMap, 72 | SentenceInfo, 73 | } from "react-speech-highlight"; 74 | ``` 75 | 76 |
77 | 78 |
79 | 80 | # Main 81 | 82 | ## 1. TTS Marker `markTheWords()` 83 | 84 | The `markTheWords()` function is to process the string text and give some marker to every word and sentences that system will read. 85 | 86 |
87 | Show Code 88 |
89 | 90 | Important, This example using react `useMemo()` to avoid unecessary react rerender. i mean it will only execute when the `text` is changing. it's similiar with `useEffect()`. 91 | 92 |
93 | 94 | ```jsx 95 | function abbreviationFunction(str) { 96 | // You can write your custom abbreviation function here 97 | // example: 98 | // Input(string) : LMK 99 | // Ouput(string) : Let me know 100 | 101 | return str; 102 | } 103 | 104 | const textHL = useMemo(() => markTheWords(text, abbreviationFunction), [text]); 105 | ``` 106 | 107 |
108 | 109 |
110 | 111 | ## 2. TTS React Hook `useTextToSpeech()` 112 | 113 | ### 2.A. CONFIG 114 | 115 | There are two config placement, initialConfig and actionConfig. 116 | 117 |
118 | Show Code 119 |
120 | 121 | ```jsx 122 | const initialConfig = { 123 | autoHL: true, 124 | disableSentenceHL: false, 125 | disableWordHL: false, 126 | classSentences: "highlight-sentence", 127 | classWord: "highlight-spoken", 128 | 129 | lang: "id-ID", 130 | pitch: 1, 131 | rate: 0.9, 132 | volume: 1, 133 | autoScroll: false, 134 | clear: true, 135 | 136 | // For viseme mapping, 137 | visemeMap: {}, 138 | 139 | // Prefer or fallback to audio file 140 | preferAudio: null, 141 | fallbackAudio: null, 142 | 143 | batchSize: 200, 144 | 145 | timestampDetectionMode: "auto", 146 | }; 147 | 148 | const { controlHL, statusHL, prepareHL, spokenHL } = 149 | useTextToSpeech(initialConfig); 150 | ``` 151 | 152 | ```jsx 153 | const actionConfig = { 154 | autoHL: true, 155 | disableSentenceHL: false, 156 | disableWordHL: false, 157 | classSentences: "highlight-sentence", 158 | classWord: "highlight-spoken", 159 | 160 | lang: "id-ID", 161 | pitch: 1, 162 | rate: 0.9, 163 | volume: 1, 164 | autoScroll: false, 165 | clear: true, 166 | 167 | // For viseme mapping, 168 | visemeMap: {}, 169 | 170 | // Prefer or fallback to audio file 171 | preferAudio: "example.com/some_file.mp3", 172 | fallbackAudio: "example.com/some_file.mp3", 173 | 174 | batchSize: null, // or 200 175 | 176 | timestampDetectionMode: "auto", // or rule, ml 177 | }; 178 | 179 | void controlHL.play({ 180 | textEl: textEl.current, 181 | onEnded: () => { 182 | console.log("Callback when tts done"); 183 | }, 184 | actionConfig, 185 | }); 186 | ``` 187 | 188 |
189 | 190 |
191 | Show details config 192 | 193 | - `autoHL` 194 | 195 | If the voice is not support the onboundary event, then this package prefer to disable word highlight. instead of trying to mimic onboundary event 196 | 197 | - `disableSentenceHL` 198 | 199 | Disable sentence highlight 200 | 201 | - `disableWordHL` 202 | 203 | Disable word highlight 204 | 205 | - `classSentences` 206 | 207 | You can styling the highlighted sentence with css to some class name 208 | 209 | - `classWord` 210 | 211 | You can styling the highlighted word with css to some class name 212 | 213 | - `lang` 214 | 215 | The one used for `SpeechSynthesisUtterance.lang`. [see](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/lang) 216 | 217 | - `pitch` 218 | 219 | The one used for `SpeechSynthesisUtterance.pitch` 220 | 221 | - `volume` 222 | 223 | The one used for `SpeechSynthesisUtterance.volume` 224 | 225 | - `autoScroll` 226 | 227 | Beautifull auto scroll, so the user can always see the highlighted sentences 228 | 229 | - `clear` 230 | 231 | if `true` overide previous played TTS with some new TTS that user want, if `false` user want to execute play new TTS but there's still exist played TTS. so it will just entering queue behind it 232 | 233 | - `visemeMap` 234 | 235 | The data for this parameter i provide in the [demo website source code](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight). 236 | 237 | - `preferAudio` 238 | 239 | Some API to pass `string` or `async function` that return audio url like this `example.com/some_file.mp3` as preferred audio. 240 | 241 | So the package will use this audio instead of the built in web speech synthesis. 242 | 243 | - `fallbackAudio` 244 | 245 | Some API to pass `string` or `async function` that return audio url like this`example.com/some_file.mp3` as fallback audio. 246 | 247 | When the built in web speech synthesis error or user doesn't have any voice. the fallback audio file will be used. 248 | 249 | ```jsx 250 | async function getAudioForThisText(text){ 251 | var res = await getAudioFromTTSAPI("https://yourbackend.com/api/elevenlabs....",text); 252 | // convert to audio file, convert again to audio url 253 | 254 | return res; 255 | } 256 | 257 | const config = { 258 | preferAudio: getAudioForThisText // will only call if needed (if user want to play) so you can save cost 259 | fallbackAudio: getAudioForThisText // will only call if needed (if web speech synthesis fail) so you can save cost 260 | } 261 | 262 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech(config) 263 | ``` 264 | 265 | - `batchSize` 266 | 267 | The batch size for the audio file. 268 | 269 | When you set the batch is null so they send all the text. then you set for 200 package will chunk the text into 200 character. 270 | 271 | Example: 200 272 | so package will batched send 200 characters per request to TTS API 273 | 274 | [Readmore about batch system in this package](PROBLEMS.md#1-the-delay-of-audio-played-and-user-gesture-to-trigger-play-must-be-close) 275 | 276 | - `timestampDetectionMode` 277 | 278 | Detection mode for timestamp engine. [see private docs](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight/tree/main/docs) 279 | 280 |
281 | 282 | ### 2.B. INTERFACE 283 | 284 | #### controlHL 285 | 286 | ```js 287 | controlHL.play(); 288 | controlHL.pause(); 289 | controlHL.resume(); 290 | controlHL.stop(); 291 | controlHL.seekSentenceBackward(); 292 | controlHL.seekSentenceForward(); 293 | controlHL.seekParagraphBackward(); 294 | controlHL.seekParagraphForward(); 295 | controlHL.changeConfig(); 296 | controlHL.activateGesture(); 297 | ``` 298 | 299 | #### statusHL 300 | 301 | Some react state that give the status of the program. The value it can be `idle|play|calibration|pause|loading`. You can fixed the value with accessing from `PKG_STATUS_OPT` constant. 302 | 303 | | Name | Description | 304 | | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 305 | | `idle` | it's initial state | 306 | | `calibration` | system still process the text, so when TTS is playing it will performs accurate and better | 307 | | `play` | The system still playing TTS | 308 | | `pause` | Resume TTS | 309 | | `loading` | it mean the the system still processing to get best voices available. status will change to this value if we call `prepareHL.getVoices()` [see](PROBLEMS.md#4-bad-performance-or-voice-too-fast) | 310 | 311 | #### prepareHL 312 | 313 | Contain state and function to preparing the TTS. From all available voices that we can get from the SpeechSynthesis.getVoices() this package will test the voice and give 5 only best voice with language specified before. 314 | 315 | | Name | Description | 316 | | ------------------------- | ------------------------------------------------------------------------------------------------ | 317 | | prepareHL.getVoices() | Function to tell this package to get the best voice. [see](PROBLEMS.md#4-bad-performance-or-voice-too-fast) | 318 | | prepareHL.voices | React state store the result from `prepareHL.getVoices()` | 319 | | prepareHL.loadingProgress | React state for knowing voice testing progress | 320 | 321 | #### spokenHL 322 | 323 | Contain react state for reporting while TTS playing. 324 | 325 | | Name | Description | 326 | | --------------------------- | ------------------------------------------------ | 327 | | spokenHL.sentence | Some react state, Get the sentence that read | 328 | | spokenHL.word | Some react state, Get the word that read | 329 | | spokenHL.viseme | Some react state, Get the current viseme | 330 | | spokenHL.precentageWord | Read precentage between 0-100 based on words | 331 | | spokenHL.precentageSentence | Read precentage between 0-100 based on sentences | 332 | 333 | # Utilities 334 | 335 | Utilities function for precision and add more capabilities 336 | 337 | ## 1. pronunciationCorrection() 338 | 339 | The common problem is the text display to user is different with their spoken form. like math symbol, equations, terms, etc.. [readmore about pronounciation problem](PROBLEMS.md) 340 | 341 | [How to build this package with open ai api integration](MAKE_BACKEND.md) 342 | 343 |
344 | Show Code 345 |
346 | 347 | ```jsx 348 | const inputText = ` 349 | 357 | `; 358 | 359 | const textEl = useRef(); 360 | 361 | const pronounciation = async (): Promise => { 362 | if (textEl.current) { 363 | await pronunciationCorrection(textEl.current, (progress) => { 364 | console.log(progress); 365 | }); 366 | } 367 | }; 368 | 369 | useEffect(() => { 370 | if (textEl.current) { 371 | console.log("pronounciation"); 372 | void pronounciation(); 373 | } 374 | // eslint-disable-next-line 375 | }, []); 376 | 377 | const textHL = useMemo(() => markTheWords(inputText), [inputText]); 378 | 379 | return ( 380 |
381 |

386 |
387 | ); 388 | ``` 389 | 390 |
391 | 392 | ## 2. getLangForThisText() 393 | 394 | For example you want to implement this package into blog website with multi language, it's hard to know the exact language for each post / article. 395 | 396 | Then i use chat gpt api to detect what language from some text. see [How to build this package with open ai api integration](MAKE_BACKEND.md) 397 | 398 |
399 | Show Code 400 |
401 | 402 | ```jsx 403 | var timeout = null; 404 | 405 | const inputText = ` 406 | Hallo, das ist ein deutscher Beispieltext 407 | `; 408 | 409 | async function getLang() { 410 | var predictedLang = await getLangForThisText(textEl.current); 411 | 412 | // will return `de` 413 | if (predictedLang) { 414 | setLang(predictedLang); 415 | } 416 | } 417 | 418 | useEffect(() => { 419 | if (textEl.current) { 420 | if (inputText != "") { 421 | // The timeout is for use case: text change frequently. 422 | // if the text doesn't change just call getLang(); 423 | if (timeout) { 424 | clearTimeout(timeout); 425 | } 426 | 427 | timeout = setTimeout(() => { 428 | getLang(); 429 | }, 2000); 430 | } 431 | } 432 | }, [inputText]); 433 | ``` 434 | 435 |
436 | 437 | ## 3. convertTextIntoClearTranscriptText() 438 | 439 | Function to convert your input string (just text or html string) into [Speech Synthesis Markup Language (SSML)](https://cloud.google.com/text-to-speech/docs/ssml) clear format that this package **can understand** when making transcript timestamp. 440 | 441 | **You must use this function when making the audio file** 442 | 443 | ```jsx 444 | var convertInto = "ssml"; // or "plain_text" 445 | var clear_transcript = convertTextIntoClearTranscriptText( 446 | "your string here", 447 | convertInto 448 | ); 449 | // with the clear_transcript you can make audio file with help of other speech synthesis platforms like elevenlabs etc. 450 | ``` 451 | 452 | # Package Data and Cache Integration 453 | 454 | The data or cache (storage) that this package use can be accessed outside. The one that used by [React GPT Web Guide](https://github.com/albirrkarim/react-gpt-web-guide-docs). 455 | 456 |
457 | Show 458 | 459 | ```js 460 | import { 461 | // ...other API 462 | 463 | // Your app can read the data / cache used by this package, like: 464 | PREFERRED_VOICE, // Set global config for the preffered voice 465 | PKG_STATUS_OPT, // Package status option 466 | PKG_DEFAULT_LANG, // Package default lang 467 | LANG_CACHE_KEY, // Package lang sessionStorage key 468 | OPENAI_CHAT_COMPLETION_API_ENDPOINT, // Key to set open ai chat completion api 469 | getVoiceBasedOnVoiceURI, 470 | getCachedVoiceInfo, 471 | getCachedVoiceURI, 472 | setCachedVoiceInfo, 473 | getCachedVoiceName, 474 | } from "react-speech-highlight"; 475 | ``` 476 | 477 |
478 | 479 | Usage example: 480 | 481 | ## Set custom constant value for this package 482 | 483 | ```jsx 484 | import { setupKey, storage } from "@/app/react-speech-highlight"; 485 | 486 | // set global preferred voice 487 | useEffect(() => { 488 | const your_defined_preferred_voice = { 489 | // important! Define language code (en-us) with lowercase letter 490 | "de-de": ["Helena", "Anna"], 491 | }; 492 | 493 | storage.setItem( 494 | "global", 495 | setupKey.PREFERRED_VOICE, 496 | yourDefinedPreferredVoice 497 | ); 498 | 499 | // Set open ai chat completion api 500 | // example in demo website (next js using environment variable) src/Components/ClientProvider.tsx 501 | if (process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT) { 502 | storage.setItem( 503 | "global", 504 | setupKey.OPENAI_CHAT_COMPLETION_API_ENDPOINT, 505 | process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT 506 | ); 507 | } 508 | 509 | // or 510 | storage.setItem( 511 | "global", 512 | OPENAI_CHAT_COMPLETION_API_ENDPOINT, 513 | "http://localhost:8000/api/v1/public/chat" 514 | ); 515 | 516 | // You can set the headers for the fetch API request with this key in sessionStorage 517 | const headers = { 518 | Authorization: `Bearer xxx_YOUR_PLATFORM_AUTH_TOKEN_HERE_xxx`, 519 | }; 520 | 521 | // Tips: Hover your mouse over the REQUEST_HEADERS variable to see the example and docs 522 | storage.setItem("global", setupKey.REQUEST_HEADERS, headers); 523 | 524 | // Speech to Text API endpoint 525 | if (process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT) { 526 | storage.setItem( 527 | "global", 528 | setupKey.OPENAI_SPEECH_TO_TEXT_API_ENDPOINT, 529 | process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT 530 | ); 531 | } 532 | }, []); 533 | ``` 534 | -------------------------------------------------------------------------------- /API_VANILLA.md: -------------------------------------------------------------------------------- 1 | # API Vanilla 2 | 3 | The only different between [React version](https://react-speech-highlight.vercel.app) and [Vanilla Js Version](https://vanilla-speech-highlight.vercel.app) is just the React state (`useState`) 4 | 5 | In the vanilla version we try to mimic the react state with using function callback 6 | 7 | ```js 8 | // This is just to pointer for the current played tts 9 | var statusHL = document.getElementById("statusHL"); 10 | var spokenHL_viseme = document.getElementById("spokenHL_viseme"); 11 | var spokenHL_word = document.getElementById("spokenHL_word"); 12 | var spokenHL_sentence = document.getElementById("spokenHL_sentence"); 13 | var spokenHL_precentageWord = document.getElementById( 14 | "spokenHL_precentageWord" 15 | ); 16 | var spokenHL_precentageSentence = document.getElementById( 17 | "spokenHL_precentageSentence" 18 | ); 19 | 20 | const setStatusHLState = (status) => { 21 | console.log("Default setStatusHLState function the statusHL is ", status); 22 | 23 | if (statusHL) { 24 | statusHL.innerHTML = status; 25 | } 26 | }; 27 | 28 | const setVisemeSpoken = (viseme) => { 29 | console.log("Default setVisemeSpoken function. the viseme ", viseme); 30 | 31 | if (spokenHL_viseme) { 32 | spokenHL_viseme.innerHTML = viseme; 33 | } 34 | }; 35 | 36 | const setWordSpoken = (word) => { 37 | console.log("Default setWordSpoken function. 
the word ", word); 38 | 39 | if (spokenHL_word) { 40 | spokenHL_word.innerHTML = word; 41 | } 42 | }; 43 | 44 | const setSentenceSpoken = (sentence) => { 45 | console.log("Default setSentenceSpoken function ", sentence); 46 | 47 | if (spokenHL_sentence) { 48 | spokenHL_sentence.innerHTML = sentence; 49 | } 50 | }; 51 | 52 | const setPrecentageSentence = (precentageSentence) => { 53 | console.log( 54 | "Default setPrecentageSentence function, precentageSentence = ", 55 | precentageSentence 56 | ); 57 | 58 | if (spokenHL_precentageWord) { 59 | spokenHL_precentageWord.innerHTML = precentageSentence + "%"; 60 | } 61 | }; 62 | 63 | const setPrecentageWord = (precentageWord) => { 64 | console.log( 65 | "Default setPrecentageWord function, precentageWord = ", 66 | precentageWord 67 | ); 68 | 69 | if (spokenHL_precentageSentence) { 70 | spokenHL_precentageSentence.innerHTML = precentageWord + "%"; 71 | } 72 | }; 73 | 74 | var defaultParams = { 75 | setVoices: (voices) => { 76 | console.log("Default setVoices function ", voices); 77 | }, 78 | setLoadingProgress: (progress) => { 79 | console.log( 80 | "Default setLoadingProgress function the progress is ", 81 | progress 82 | ); 83 | }, 84 | setStatusHLState, 85 | setVisemeSpoken, 86 | setWordSpoken, 87 | setSentenceSpoken, 88 | setPrecentageSentence, 89 | setPrecentageWord, 90 | }; 91 | 92 | // Global control HL 93 | const { controlHL } = useTextToSpeech(defaultParams); 94 | 95 | // play the tts 96 | controlHL.play(); 97 | ``` 98 | 99 | This is the API of `useTextToSpeech()` is like this. minus the react state (i comment it) 100 | 101 | ```jsx 102 | /** 103 | * Type for control highlight 104 | */ 105 | export interface ControlHLType { 106 | play: PlayFunction; 107 | resume: () => void; 108 | pause: () => void; 109 | stop: () => void; 110 | seekSentenceBackward: (config: Partial) => void; 111 | seekSentenceForward: (config: Partial) => void; 112 | seekParagraphBackward: (config: Partial) => void; 113 | seekParagraphForward: (config: Partial) => void; 114 | activateGesture: ActivateGestureFunction; 115 | changeConfig: (actionConfig: Partial) => void; 116 | } 117 | 118 | export interface PrepareHLType { 119 | // loadingProgress: number 120 | // voices: VoiceInfo[] 121 | getVoices: GetVoicesFunction; 122 | retestVoices: RetestVoicesFunction; 123 | quicklyGetSomeBestVoice: QuicklyGetSomeBestVoiceFunction; 124 | } 125 | 126 | /** 127 | * Type for useTextToSpeech 128 | */ 129 | export interface UseTextToSpeechReturnType { 130 | controlHL: ControlHLType; 131 | // statusHL: StatusHLType 132 | // spokenHL: SpokenHLType 133 | prepareHL: PrepareHLType; 134 | } 135 | ``` 136 | -------------------------------------------------------------------------------- /AUDIO_FILE.md: -------------------------------------------------------------------------------- 1 | # How to get the Audio File Automatically 2 | 3 | When we talk about generating audio file we need do research considering the price, quality, and api support so you can generate audio programmatically. 4 | 5 | Table of Contents: 6 | 7 | - [A. Efficient Cost Strategy](#a-efficient-cost-strategy) 8 | - [B. Paid TTS API](#b-paid-tts-api) 9 | - [C. Local AI TTS](#c-local-ai-tts) 10 | 11 |
12 |
13 | 14 | ## A. Efficient Cost Strategy 15 | 16 | - Considering based on your needs 17 | - When your needs is multi language you can make controller that using mixed of TTS API provider. 18 | - Using cache for the audio file 19 | 20 | ![Cache Strategy of Audio File](./img/cache.png) 21 | 22 | Usually i use laravel as a backend. its a good php framework, and its easy to use. But of course you can use any backend you want and do the same strategy. 23 | 24 | When you implement flow like that can only generate audio file once, and when the audio file is exist, it will not generate again. 25 | 26 | In english: 27 | 28 | Let say 1 blog post = 1,200 words 29 | 30 | Words Only: 1,200 words × 5 characters per word = 6,000 characters. 31 | 32 | Including Spaces and Punctuation: Spaces (approximately 20% of 6,000) = 6,000 × 0.2 = 1,200 characters. 33 | 34 | Total estimated characters = 6,000 (words) + 1,200 (spaces) = 7,200 characters. 35 | 36 | When you use open ai tts. 1 million / $15 (tts-1) 37 | 38 | So you can generate audio of 133 post with cost $15. 39 | 40 | But its `when all` your post is accessed. When not all post is accessed, you can save more money. even when the user is not fully reading your article. 41 | 42 | My lib also use [batch request system](PROBLEMS.md#the-solution-is-using-batch-request-system). they only ask the backend to get/make audio for only section that user is currently reading/listening. 43 | 44 |
45 | 46 | 47 | ## B. Paid TTS API 48 | 49 | ### - ElevenLabs 50 | 51 | [Eleven Labs](https://try.elevenlabs.io/29se7bx2zgw1) is a text-to-speech API that allows you to convert text into high-quality audio files. It supports multiple languages and voices, and provides a range of customization options to help you create the perfect audio for your needs. 52 | 53 | ![ElevanLabs TTS Pricing](./img/elevenlabs_pricing.png) 54 | 55 |
56 | Example Client Side Code (Frontend) 57 | 58 | ```js 59 | function convertBase64ToBlobURL(base64Audio) { 60 | // Remove the prefix from the data URL if present 61 | const base64Data = base64Audio.replace(/^data:audio\/mpeg;base64,/, ""); 62 | // Convert base64 to raw binary data held in a string 63 | const byteString = atob(base64Data); 64 | // Create an ArrayBuffer with the binary length of the base64 string 65 | const arrayBuffer = new ArrayBuffer(byteString.length); 66 | // Create a uint8 view on the ArrayBuffer 67 | const uint8Array = new Uint8Array(arrayBuffer); 68 | for (let i = 0; i < byteString.length; i++) { 69 | uint8Array[i] = byteString.charCodeAt(i); 70 | } 71 | // Create a blob from the uint8Array 72 | const blob = new Blob([uint8Array], { type: "audio/mpeg" }); 73 | // Generate a URL for the blob 74 | const blobURL = URL.createObjectURL(blob); 75 | 76 | return blobURL; 77 | } 78 | 79 | export const ttsUsingElevenLabs = async (inputText) => { 80 | // see https://elevenlabs.io/docs/api-reference/text-to-speech 81 | // https://github.com/albirrkarim/react-speech-highlight-demo/blob/main/AUDIO_FILE.md#eleven-labs 82 | 83 | // Set the ID of the voice to be used. 84 | const VOICE_ID = "21m00Tcm4TlvDq8ikWAM"; 85 | 86 | const blobUrl = await fetch( 87 | process.env.NEXT_PUBLIC_ELEVEN_LABS_API_ENDPOINT, 88 | { 89 | method: "POST", 90 | headers: { 91 | "Content-Type": "application/json", 92 | }, 93 | body: JSON.stringify({ 94 | text: inputText, 95 | voice_id: VOICE_ID, 96 | model_id: "eleven_multilingual_v2", 97 | voice_settings: { 98 | stability: 0.75, // The stability for the converted speech 99 | similarity_boost: 0.5, // The similarity boost for the converted speech 100 | style: 1, // The style exaggeration for the converted speech 101 | speaker_boost: true, // The speaker boost for the converted speech 102 | }, 103 | }), 104 | } 105 | ) 106 | .then((response) => { 107 | if (!response.ok) { 108 | alert("Network fail"); 109 | throw new Error(`HTTP error! Status: ${response.status}`); 110 | } 111 | return response.json(); 112 | }) 113 | .then((data) => { 114 | // Assuming the API response contains a property 'audio' with the base64-encoded audio 115 | const base64Audio = data.audio; 116 | 117 | // Create a Blob URL 118 | const blobUrl = convertBase64ToBlobURL(base64Audio); 119 | 120 | return blobUrl; 121 | }); 122 | 123 | return blobUrl; 124 | }; 125 | 126 | import { convertTextIntoClearTranscriptText } from "react-speech-highlight"; 127 | 128 | var clear_transcript = convertTextIntoClearTranscriptText( 129 | "This is example text you can set" 130 | ); 131 | 132 | const audioURL = await ttsUsingElevenLabs(clear_transcript); 133 | 134 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech({ 135 | lang: "en", 136 | preferAudio: audioURL, 137 | //or 138 | // fallbackAudio: audioURL, 139 | }); 140 | ``` 141 | 142 |
143 | 144 |
145 | Example Integration Node js Backend with ElevenLabs TTS API 146 | 147 | Go to the [backend folder in this repo](https://github.com/albirrkarim/react-speech-highlight-demo/tree/main/backend/nodejs), you can see the example 148 | 149 |
150 | 151 |
152 | Example Integration Laravel Backend with ElevenLabs TTS API 153 | 154 | Router 155 | 156 | ```php 157 | Route::post('text-to-speech-elevenlabs', 'textToSpeechElevenLabs')->name('text_to_speech_elevenlabs'); 158 | ``` 159 | 160 | File `TTSController.php` this will return audio as base64 161 | 162 | ```php 163 | public function textToSpeech(Request $request) 164 | { 165 | $api_key = config('elevenlabs.api_key'); 166 | $voice_id = isset($request['voice_id']) ? $request['voice_id'] : '21m00Tcm4TlvDq8ikWAM'; // Set the ID of the voice to be used 167 | 168 | $client = new Client([ 169 | 'headers' => [ 170 | 'Accept' => 'audio/mpeg', 171 | 'Content-Type' => 'application/json', 172 | 'xi-api-key' => $api_key, 173 | ], 174 | ]); 175 | 176 | try { 177 | $response = $client->post("https://api.elevenlabs.io/v1/text-to-speech/$voice_id", [ 178 | 'json' => $request->all(), 179 | ]); 180 | 181 | // Check if the request was successful 182 | if ($response->getStatusCode() === 200) { 183 | // Get the audio content as a base64-encoded string 184 | $base64Audio = base64_encode($response->getBody()); 185 | 186 | // Return the base64-encoded audio 187 | return response()->json([ 188 | 'status' => true, 189 | 'audio' => $base64Audio, 190 | ]); 191 | } else { 192 | // Handle unsuccessful response 193 | return response()->json([ 194 | 'status' => false, 195 | 'message' => 'Text-to-speech API request failed.', 196 | ], $response->getStatusCode()); 197 | } 198 | } catch (\Exception $e) { 199 | // Handle Guzzle or other exceptions 200 | return response()->json([ 201 | 'status' => false, 202 | 'message' => 'Error during text-to-speech API request.', 203 | 'error' => $e->getMessage(), 204 | ], 500); 205 | } 206 | } 207 | 208 | ``` 209 | 210 |
211 | 212 | ### - Open AI TTS 213 | 214 | [OpenAI](https://platform.openai.com/docs/guides/text-to-speech) is also providing tts service, for now it come with minimal feature, but its fast latency. 215 | 216 | 217 | Open AI TTS Pricing 218 | 219 | 220 |
221 | Example OpenAI TTS Backend with Laravel 222 | 223 | Router 224 | 225 | ```php 226 | Route::post('text-to-speech-elevenlabs', 'textToSpeechElevenLabs')->name('text_to_speech_elevenlabs'); 227 | ``` 228 | 229 | File `TTSController.php` this will return audio as base64 230 | 231 | ```php 232 | $api_key = config('openai.api_key'); 233 | 234 | $client = new Client([ 235 | 'headers' => [ 236 | 'Authorization' => 'Bearer ' . $api_key, 237 | 'Content-Type' => 'application/json' 238 | ] 239 | ]); 240 | 241 | try { 242 | $response = $client->post("https://api.openai.com/v1/audio/speech", [ 243 | 'json' => [ 244 | 'model' => isset($request["model"]) ? $request["model"] : 'tts-1', 245 | 'input' => $request["input"], 246 | 'voice' => isset($request["voice"]) ? $request["voice"] : 'nova', 247 | ] 248 | ]); 249 | 250 | // Check if the request was successful 251 | if ($response->getStatusCode() === 200) { 252 | // Get the audio content as a base64-encoded string 253 | $base64Audio = base64_encode($response->getBody()); 254 | 255 | // Return the base64-encoded audio 256 | return response()->json([ 257 | 'status' => true, 258 | 'audio' => $base64Audio, 259 | ]); 260 | } else { 261 | // Handle unsuccessful response 262 | return response()->json([ 263 | 'status' => false, 264 | 'message' => 'Text-to-speech API request failed.', 265 | ], $response->getStatusCode()); 266 | } 267 | } catch (\Exception $e) { 268 | // Handle Guzzle or other exceptions 269 | return response()->json([ 270 | 'status' => false, 271 | 'message' => 'Error during text-to-speech API request.', 272 | 'error' => $e->getMessage(), 273 | ], 500); 274 | } 275 | ``` 276 | 277 | Your Client Side Code 278 | 279 | ```jsx 280 | const ttsUsingOpenAI = async (inputText) => { 281 | // Set the ID of the voice to be used. 282 | 283 | const blobUrl = await fetch(process.env.NEXT_PUBLIC_OPENAI_TTS_API_ENDPOINT, { 284 | method: "POST", 285 | headers: { 286 | "Content-Type": "application/json", 287 | }, 288 | body: JSON.stringify({ 289 | input: inputText, 290 | model: "tts-1", //or tts-1-hd 291 | voice: "alloy", 292 | }), 293 | }) 294 | .then((response) => { 295 | if (!response.ok) { 296 | throw new Error(`HTTP error! Status: ${response.status}`); 297 | } 298 | return response.json(); 299 | }) 300 | .then((data) => { 301 | // Assuming the API response contains a property 'audio' with the base64-encoded audio 302 | const base64Audio = data.audio; 303 | 304 | // Convert the base64 audio to a Blob 305 | const byteCharacters = atob(base64Audio); 306 | const byteNumbers = new Array(byteCharacters.length); 307 | for (let i = 0; i < byteCharacters.length; i++) { 308 | byteNumbers[i] = byteCharacters.charCodeAt(i); 309 | } 310 | const byteArray = new Uint8Array(byteNumbers); 311 | const blob = new Blob([byteArray], { type: "audio/mpeg" }); 312 | 313 | // Create a Blob URL 314 | const blobUrl = URL.createObjectURL(blob); 315 | 316 | return blobUrl; 317 | }); 318 | 319 | return blobUrl; 320 | }; 321 | 322 | import { convertTextIntoClearTranscriptText } from "react-speech-highlight"; 323 | 324 | var clear_transcript = convertTextIntoClearTranscriptText( 325 | "This is example text you can set" 326 | ); 327 | 328 | const audioURL = await ttsUsingOpenAI(clear_transcript); 329 | 330 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech({ 331 | lang: "en", 332 | preferAudio: audioURL, 333 | //or 334 | // fallbackAudio: audioURL, 335 | }); 336 | ``` 337 | 338 |
339 | 340 |
341 | 342 | ### - Google TTS API 343 | 344 | [Google tts](https://cloud.google.com/text-to-speech) support SSML see the [pricing](https://cloud.google.com/text-to-speech/pricing) 345 | 346 | ![Google TTS](./img/google_tts.png) 347 | 348 |
349 | 350 | ### - Amazon Polly 351 | 352 | See Amazon Polly [pricing](https://aws.amazon.com/polly/pricing/) 353 | 354 | 355 | Amazon Polly Pricing 356 | 357 | 358 | For 1 million character: 359 | Standard: $4.00 360 | Neural: $16.00 361 | 362 | What different between standard and neural ? [see](https://docs.aws.amazon.com/polly/latest/dg/neural-voices.html) 363 | 364 | Simplified, Neural voices are more natural-sounding (better) than standard voices. 365 | 366 |
367 |
368 | 369 | ## C. Local AI TTS 370 | 371 | You can also use the local AI system, to do speech synthesis. You can use local PC or Google Colab. 372 | 373 | Then synchronize (text <-> audio) it with your server. 374 | 375 | Of course its not easy it will face many problems like: 376 | 377 | - Is you have knowledge about python and AI ? 378 | - How you get the models for your language ? 379 | When its english you can easily got the models. 380 | 381 | When i have time i will make tutorial about how to doing local speech synthesis. 382 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # CHANGELOG 2 | 3 | # v5.4.8 4 | 5 | - High standard Linting Eslint & Type checking typescript 6 | 7 | # 5.4.7 8 | 9 | - Increase stability, fixing marking for farsi 10 | - Update all deps for demo website, eslint 9, next js 15, material ui 7 etc... 11 | 12 | # 5.4.6 13 | 14 | - Increase capability swallowing Speech To Text result. with 722 Sample. For all the TTS languages that open ai support. 15 | 16 |
17 | Report 18 |
19 | 20 | ``` 21 | avgErrWordsMiddle: 0.15255255255255257 22 | Unit: sNodesSTTAlign() is done | 23 | Avg Accuracy Sentence Time 99.97 % of 722 sample | 24 | Avg Accuracy Word Time 98.15 % of 666 sample | 25 | Avg Accuracy Word Middle 99.85 % of 666 sample | 26 | Avg Exec Time: 0.37 ms 27 | ``` 28 | 29 |
30 | 31 | # 5.4.5 32 | 33 | - Increase capability swallowing Speech To Text result. with 106 Sample. For all the TTS languages that open ai support. 34 | 35 |
36 | Report 37 |
38 | 39 | ``` 40 | avgErrWordsMiddle: 1.2860759493670888 41 | Unit: ruleTimestampEngine() is done | 42 | Avg Accuracy Sentence Time 99.81 % of 106 sample | 43 | Avg Accuracy Word Time 84.44 % of 79 sample | 44 | Avg Accuracy Word Middle 98.71 % of 79 sample | 45 | Avg Exec Time: 1.59 ms 46 | ``` 47 | 48 |
49 | 50 | # 5.4.3 - 5.4.4 51 | 52 | - Fix double click gesture 53 | - Fix seeking paragraph 54 | 55 | # 5.4.2 56 | 57 | - Enhance marking test to 75 test case 58 | 59 | # 5.3.9 - 5.4.1 60 | 61 | - Add api to overide STT (Speech to text) function 62 | 63 | ```jsx 64 | import { 65 | openaiSpeechToTextSimple, 66 | useTextToSpeech, 67 | type ConfigTTS, 68 | } from "@lib/react-speech-highlight"; 69 | 70 | const config: Partial = { 71 | preferAudio: getPublicAccessibleAudioURL, 72 | batchSize: 200, 73 | timestampEngineProps: { 74 | mode: "ml", 75 | sttFunction: async (input) => { 76 | console.log("Optionally Using custom STT function"); 77 | // Maybe you want do overide the api request. 78 | // since you know the INPUT and the OUTPUT here, so you can create the PROCESS 79 | // INPUT -> PROCESS -> OUTPUT 80 | console.log("STT: input", input); 81 | const output = await openaiSpeechToTextSimple(input); 82 | console.log("STT: output", output); 83 | return output; 84 | }, 85 | onProgress(progress, timeLeftEstimation) { 86 | console.log("Timestamp Engine Progress", progress); 87 | setProgress(progress); 88 | // setMessage("On progress Timestamp Engine (speech to text) ... -> " + moment.duration(timeLeftEstimation, "seconds").humanize()) 89 | }, 90 | }, 91 | }; 92 | ``` 93 | 94 | # 5.3.8 95 | 96 | - Better marking sps (sentence) & spw (word) tag with over 69 test case from various wierd data. 97 | 98 | # 5.3.7 99 | 100 | - Realtime Text to Speech With Highlight - This package can intergrate with [open ai realtime api](https://platform.openai.com/docs/guides/realtime), Imagine you have a phone call with AI the web are displaying the transcript with highlight the current spoken. 101 | 102 | # 5.3.6 103 | 104 | - Fixing marking `sps` and `spw` 105 | - Tag `spkn` now deprecated 106 | 107 | # 5.3.1 - 5.3.5 108 | 109 | - Fixing bug 110 | 111 | # 5.3.0 112 | 113 | - Better timestamp engine 114 | - Better API design ( Breaking changes! ) 115 | 116 | # 5.2.9 117 | 118 | - Enhancing timestamp engine capability v4 119 | 120 | # 5.2.7 - 5.2.8 121 | 122 | - Fix bug 123 | 124 | # 5.2.4 - 5.2.6 125 | 126 | - Add non latin support like chinese, japanese, korean, greek, etc... 127 | 128 | # 5.2.3 129 | 130 | - Update `toJSON` method in virtual node 131 | - Better LLM ENGINE 132 | 133 | # 5.2.1 - 5.2.2 134 | 135 | - Adding more jsdocs 136 | - Some little refactor 137 | 138 | # 5.2.0 139 | 140 | - Add example of generating SRT with `onEnded` event [see](https://react-speech-highlight.vercel.app/example) 141 | 142 | # 5.1.9 143 | 144 | - Begin the plugin architecture 145 | - Backend-nify the LLM engine using node js server (optional) 146 | 147 | # 5.1.8 148 | 149 | - Fix bug 150 | 151 | # 5.1.7 152 | 153 | - Rename storage API 154 | 155 | # 5.1.3 - 5.1.6 156 | 157 | - Fix bug 158 | - Renaming API 159 | - Virtual Storage (to mimic sessionStorage) 160 | - Local state tts [see demo example page](https://react-speech-highlight.vercel.app/example) 161 | - Add better error event 162 | 163 | # 5.1.0 - 5.1.2 164 | 165 | - Development of hybrid transcript timestamp engine 166 | - Fix bug 167 | 168 | # 5.0.9 169 | 170 | Relation Finder v4 see the evaluation on [LLM_ENGINE](LLM_ENGINE.md) 171 | 172 | # 5.0.7 - 5.0.8 173 | 174 | - Fix bug 175 | 176 | # 5.0.2 - 5.0.6 177 | 178 | - Fix bug 179 | - Improve Translate To some language engine, with chunking system it can handle more a lot of text 180 | 181 | # 5.0.1 182 | 183 | - Introduction virtual nodes, Sentence Node and Word Node. for flexible text to speech. 
Used in PDF TTS Highlight and Relation Highlight Features. 184 | - Relation Highlight Feature - Used in Youtube Transcript Highlight. Highlight the words in youtube transcript, and their relations to other word like their translation form. 185 | - Rename config.classSentences into config.classSentence 186 | - Add `ControlHL.followTime()` for following the time of played youtube video in iframe. 187 | 188 | # 5.0.0 189 | 190 | Stable version release, before i add API for plugin TTS on PDF 191 | -------------------------------------------------------------------------------- /EXAMPLE_CODE.md: -------------------------------------------------------------------------------- 1 | # Example Code 2 | 3 | This is the simple example code. Want more? see the [HOW_TO_USE.md](HOW_TO_USE.md) 4 | 5 | ### Styling the highlighter 6 | 7 | File `App.css` 8 | 9 | ```css 10 | .highlight-spoken { 11 | color: black !important; 12 | background-color: #ff6f00 !important; 13 | border-radius: 5px; 14 | } 15 | 16 | .highlight-sentence { 17 | color: #000000 !important; 18 | background-color: #ffe082 !important; 19 | border-radius: 5px; 20 | } 21 | ``` 22 | 23 | ### The code example 24 | 25 | File `App.js` 26 | 27 | ```jsx 28 | import "./App.css"; 29 | import { useEffect, useMemo, useRef, useState } from "react"; 30 | import { markTheWords, useTextToSpeech } from "react-speech-highlight"; 31 | 32 | export default function App() { 33 | const text = "Some Input String"; 34 | const textEl = useRef(); 35 | const lang = "en-US"; 36 | 37 | const config = { 38 | disableSentenceHL: false, 39 | disableWordHL: false, 40 | autoScroll: false, 41 | lang: lang, 42 | } 43 | 44 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech(config); 45 | 46 | const textHL = useMemo(() => markTheWords(text), [text]); 47 | 48 | return ( 49 | <> 50 |
51 |
56 |
57 | 58 | { 61 | if (statusHL == "pause") { 62 | controlHL.resume(config); 63 | } else { 64 | controlHL.play( 65 | textEl.current 66 | ); 67 | } 68 | }} 69 | pause={controlHL.pause} 70 | stop={controlHL.stop} 71 | /> 72 | 73 | ); 74 | } 75 | ``` 76 | 77 | ### Sample TTS Control 78 | 79 | File `PanelControlTTS.js` 80 | 81 | ```jsx 82 | export default function PanelControlTTS({ isPlay, play, pause, stop }) { 83 | return ( 84 | <> 85 | 96 | 97 | {isPlay && } 98 | 99 | ); 100 | } 101 | ``` 102 | -------------------------------------------------------------------------------- /HOW_TO_USE.md: -------------------------------------------------------------------------------- 1 | # How to use this package 2 | 3 | The detail of how to use this package is in the `README.md` of each private repo 4 | 5 | [Demo Website https://react-speech-highlight.vercel.app](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight) 6 | 7 | [Package Repo of React Speech Highlight](https://github.com/Web-XR-AI-lab/react-speech-highlight) 8 | 9 | 10 | [Demo Website & Pacakage of Vanilla Speech Highlight](https://github.com/Web-XR-AI-lab/vanilla-speech-highlight) 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Buy once, Use it on all your projects, as long as the public can't see your project code. 2 | Appreciate my research by not share the code to other person. -------------------------------------------------------------------------------- /LIMITATION.md: -------------------------------------------------------------------------------- 1 | # Limitation 2 | 3 | This package is designed to be no limit, but for now it has some limitations: 4 | 5 | 1. Too much marking tag, the `sps` for marking the sentence and `spw` is for marking the word. It will cause a lot of html tag in a page. 6 | 7 | It's all the limitation that i know. -------------------------------------------------------------------------------- /LLM_ENGINE.md: -------------------------------------------------------------------------------- 1 | # LLM Engine 2 | 3 | What's that ? Simply it use LLM to make advanced AI functionality on your app. It's a simple cost effective way to get AI functionality on your app. Even your server just have < 1GB RAM. 4 | 5 | Further information, evaluation, etc. is in [private repo](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight/blob/main/docs/LLM_ENGINE.md). 6 | 7 | **Just buy it and use it. It's simple.** 8 | 9 | You got ready make AI functionality on your app, knowledges, and more. -------------------------------------------------------------------------------- /MAKE_BACKEND.md: -------------------------------------------------------------------------------- 1 | # How To Set Up Backend For This Package 2 | 3 | ## A. LLM (Large Language Model) API 4 | 5 | Optionally we need LLM API to solve many [issue](PROBLEMS.md). the LLM api i use is open ai chat completion api. So we must have backend server that provide proxy api call to the open ai. 6 | 7 | ![Open AI API](/img/chat_gpt_api.png) 8 | 9 | ### 1. 
Make a Backend for the OpenAI Chat Completion API

**API URL Endpoint**

```js
OPENAI_CHAT_COMPLETION_API_ENDPOINT = "https://example.com/api/v1/public/chat";
```

With that URL configured, the `package` will send a **request body** like this:

```json
{
  "temperature": 0,
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "convert this semicolon separated number \"1000;4090;1000000;1,2;9001;30,1\" into word form number with language \"en-US\" return the result as array. don't explain"
    }
  ]
}
```

and your backend must **respond** like this.

#### Example response that this package expects

```json
{
  "id": "chatcmpl-7s8i7oHA1BkcLD5U0FFkoYEn2b2QF",
  "object": "chat.completion",
  "created": 1693137551,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "[\"one thousand\", \"four thousand ninety\", \"one million\", \"one point two\", \"nine thousand one\", \"thirty point one\"]"
      },
      "finishReason": "stop"
    }
  ],
  "usage": { "promptTokens": 54, "completionTokens": 29, "totalTokens": 83 }
}
```

### 2. Set the OpenAI chat completion API endpoint for the package

Go to the [API docs about this](API.md#package-data-and-cache-integration)
61 | 62 | ### Example Implementation 63 | 64 | If you are using different backend, please look by yourself how to implement it. the important is the same respond (like [this](#example-response-that-this-package-want)) so the `react-speech-highlight` package can understand Actually you can customize the logic, like add [authentication header](API.md#set-custom-constant-value-for-this-package). 65 | 66 |
67 | Show example using Laravel as Backend 68 | 69 |
70 | 71 | ### Router 72 | 73 | Open `routes/api.php` 74 | 75 | Remember you must set the throttle 180 request / 1 minute. our engine need to send a lot request. no worry it small request so its cost effective. 76 | 77 | ```php 78 | /* OpenAI */ 79 | Route::name("openai.")->middleware('throttle:180,1')->controller(OpenAIController::class)->group(function () { 80 | // chat gpt 81 | Route::post('chat', 'chatPost')->name('chat_completions'); 82 | }); 83 | ``` 84 | 85 | Controller 86 | 87 | Open `OpenAIController.php` 88 | 89 | ```php 90 | class OpenAIController extends Controller 91 | { 92 | public function chatPost(Request $request){ 93 | $origin = $request->header('Origin'); 94 | 95 | $allowed_domain = [ 96 | // Production url 97 | "https://example.com" => "sk-xxx_your_secret_key", 98 | 99 | // Development url 100 | "http://localhost:3000" => "sk-xxx_your_secret_key", 101 | ]; 102 | 103 | if (!isset($allowed_domain[$origin])) { 104 | return response()->json([ 105 | "status" => false, 106 | "message" => "Invalid request, please contact support!" 107 | ], 400); 108 | } else { 109 | if (strpos($origin, 'localhost') !== false) { 110 | if (app()->environment() != "local") { 111 | return response()->json([ 112 | "status" => false, 113 | "message" => "Invalid request, please contact support!" 114 | ], 400); 115 | } 116 | } 117 | } 118 | 119 | $api_key = $allowed_domain[$origin]; 120 | $data = $request->all(); 121 | 122 | if (!isset($data['messages'])) { 123 | return response()->json([ 124 | "status" => false, 125 | "message" => "please post 'messages' as body request" 126 | ], 400); 127 | } 128 | 129 | // the [https://github.com/openai-php/laravel] package is have problem don't use it 130 | // https://github.com/openai-php/laravel/issues/51#issuecomment-1651224516 131 | 132 | $body = [ 133 | 'model' => isset($data["model"]) ? $data["model"] : 'gpt-3.5-turbo', 134 | 'messages' => $data["messages"], 135 | 'temperature' => isset($data["temperature"]) ? $data["temperature"] : 0.6, 136 | 137 | // 'functions' => [ 138 | // [ 139 | // 'name' => $function, 'parameters' => config('schema.'.$function), 140 | // ], 141 | // ], 142 | // 'function_call' => [ 143 | // 'name' => $function, 144 | // ], 145 | // 'temperature' => 0.6, 146 | // 'top_p' => 1, 147 | ]; 148 | 149 | // Use approach like this instead 150 | $result = Http::withToken($api_key) 151 | ->retry(5, 500) 152 | ->post('https://api.openai.com/v1/chat/completions', $body) 153 | ->throw() 154 | ->json(); 155 | 156 | return $result; 157 | } 158 | } 159 | ``` 160 | 161 |
162 | 163 |
164 | 165 | ## B. Text To Speech API 166 | 167 | When you decide to use audio source is from TTS API. you can see the [AUDIO_FILE.md](AUDIO_FILE.md) for more detail. 168 | -------------------------------------------------------------------------------- /PROBLEMS.md: -------------------------------------------------------------------------------- 1 | # Problems 2 | 3 | ## A. Common problem in Text to Speech (Both audio file and Web Speech Synthesis) 4 | 5 | ### 1. Pronounciation Problem 6 | 7 | We want text user see is different with what system should speak. 8 | 9 | What we do ? we make some engine that can do accurate and cost effective pronounciation correction Using LLM Open AI Chat Completions for any terms or equations from academic paper, math, physics, computer science, machine learning, and more... 10 | 11 |
12 | Show details 13 |
14 | 15 | **Auto Pronounciation Correction** 16 | 17 | This package needs chat gpt api to do that. [see how to use integrate this package with open ai api](MAKE_BACKEND.md) 18 | 19 |
20 | 21 | **Manual Pronounciation Correction** 22 | 23 | in english abbreviation like `FOMO`, `ETA`, etc. 24 | 25 | This package also have built-in abbreviation function, or you can write your own rules. 26 | 27 | ``` 28 | input:string -> abbreviation function -> output:string. 29 | ``` 30 |
31 | 32 |
33 | 34 | ## B. When Using Audio File 35 | 36 | ### 1. The delay of audio played and user gesture to trigger play must be close. 37 | 38 | When user click play button the system will preparing the audio file with send request to TTS API like eleven labs, what if the api request is so long (because the text you send is long)? what if your TTS API has limitation 39 | 40 | It will causing bad experience to the user. 41 | 42 |
43 | Read more 44 | 45 | It will causing bad experience to the user. even in device like ipad and iphon they have rules that the delay between user interaction and the audio played must not exceed 4seconds or it will be fail. 46 | 47 | They will give error like this 48 | 49 | ``` 50 | Unhandled Promise Rejection: NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission. 51 | ``` 52 | 53 | So what the solution for this? 54 | 55 | I set this package to make batch request for API call. 56 |
57 | 58 | ### 2. Long text request to TTS API (Capabilty of TTS API handling long text) 59 | 60 | All tts api has limitation of character that can be sent to them. 61 | 62 | ### The solution is using Batch Request System 63 | 64 | Batch strategy will solve that problems above. You can define the batch size in the [config](API.md#2a-config) 65 | 66 |
67 | 68 |
69 | Read more 70 | 71 |
72 | 73 | **How it work?** 74 | 75 | Let says you have 10000 character long of text, and let says your tts api service will be done making the audio file in 60 seconds. (so your user will waiting to play 60 second after they want ? it so bad) 76 | 77 | So, My package will chunk it into close to the 200 character each. 78 | 79 | 10000/200 = 50 request. 80 | 81 | 60/10000\*200 = 1.2 seconds 82 | 83 | my package will send the first chunk, and the tts api will give the audio file in just 1,2 then the audio is played. 84 | 85 | So the delay between user click button play and the tts start to play will be just 1,2 seconds. what about other chunks. i manage to send other chunk in the background while tts is played. and enchance efficiency of character used in tts api. you pay the tts api service based on the character right?. 86 | 87 | lets say we have 88 | 89 | ``` 90 | chunk0 <- user still playing this 91 | chunk1 92 | chunk2 <- my package will try to prepare until this 93 | chunk3 94 | ... 95 | chunk49 96 | ``` 97 | 98 | This method will, solve other problem like maximal character that your tts api can handle. for example on elvenlabs they only can do [5000](https://help.elevenlabs.io/hc/en-us/articles/13298164480913-What-s-the-maximum-amount-of-characters-and-text-I-can-generate) character for audio generation. 99 | 100 |
101 | 102 |
103 |
104 | 105 | ## C. When Using Web Speech Synthesis 106 | 107 | The [SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis). Comes with problems: 108 | 109 | ### 1. Unlimited String Length Capability 110 | 111 | Some available voice doesn't support long text / string. 112 | 113 | How about this package? it can read unlimited string (can't die when playing). 114 | 115 | 116 | 117 |
118 | 119 | ### 2. Cross Platform Problem 120 | 121 | I'm sure that you have the same experience, developing web for cross platform, android, Iphone or Ipad always resulting problem. 122 | 123 | - Speech synthesis in IOS or Ipad OS sometimes die unexpected. 124 | - Sometimes `speechsynthesis` doesn't fire the `onpause`, `onresume` event on android, ipad, 125 | 126 |
127 | 128 | ### 3. Unpredictable Onboundary 129 | 130 | - First, Not all voices have `onboundary` event 131 | - On ipad the `onboundary` event only work with about 30% of the full sentence. 132 | - Also the on boundary event doesn't fire function accurately. for example the text is `2022` the `onboundary` will fire twice `20` and `22`. 133 | 134 |
135 | 136 | ### 4. Bad performance or voice too fast 137 | 138 | in API `prepareHL.getVoices()` i implement this flow: 139 | 140 |
141 | Show flow 142 |
143 | 144 | ![React Speech Highlight](./img/prepareHL.png) 145 | 146 |
147 | 148 |
149 | 150 | ### 5. The voice is not good enough 151 | 152 | With `window.speechSynthesis` the voice is like a robot, doesn't have parameter like: 153 | 154 | - Emotions 155 | - Characteristic 156 | 157 | It can be achieved by using deep learning (with python) or other paid TTS API. 158 | 159 | In this package i just want to make cheap solution for TTS so i just the `window.speechSynthesis`. 160 | 161 | Now this package has Prefer / Fallback to Audio file. 162 | 163 | Options to play: 164 | 165 | preferAudio(if defined) > Web Speech Synthesis > fallbackAudio(if defined) 166 | 167 | see [AUDIO_FILE.md](AUDIO_FILE.md) and the config [API.md](API.md#2a-config) 168 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # React / Vanilla Speech Highlight - Text-to-Speech with Word Highlighting 2 | 3 | 4 | 5 | [![React / Vanilla Speech Highlight](./img/banner.png)](https://react-speech-highlight.vercel.app) 6 | 7 | **React / Vanilla Speech Highlight** is a powerful library for integrating **text-to-speech** and **real-time word/sentence highlighting** into your web applications. It supports **audio files**, the **Text-to-Speech API**, and the **Web Speech Synthesis API**, making it ideal for creating interactive, accessible, and dynamic user experiences. 8 | 9 | [🌟 Try the Demo: React Speech Highlight](https://react-speech-highlight.vercel.app) 10 | 11 | https://github.com/user-attachments/assets/79c6d4f6-f3e2-4c38-9bec-dbb6834d87f8 12 | 13 | 14 | 15 | ## Other Version 16 | 17 | ### Vanilla JS (Native Javascript) 18 | 19 | 20 | Vanilla Speech Highlight 21 | 22 | 23 | We support implementation using vanilla js. this package has bundle size of 45 KB. You can easily combine this library with your website, maybe your website using [jquery](https://jquery.com) 24 | 25 | Read the [API_VANILLA.md](API_VANILLA.md) to see the different. 26 | 27 | [Try the demo Vanilla Speech Highlight](https://vanilla-speech-highlight.vercel.app) 28 | 29 | Watch [Youtube Video](https://youtu.be/vDc7L5W7HhU) about implementation vanilla speech highlight for javascript text to speech task. 30 | 31 | ### React Native Speech Highlight 32 | 33 | 34 | React Native Speech Highlight 35 | 36 | 37 |
38 | 39 |
40 | Show video demo 41 |
42 | 43 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/abb9cb6c-4c55-448b-a9a5-d1856896b455 44 | 45 |
46 | 
47 | Built with the React Native CLI. [Try the demo Android app](https://bit.ly/RNSHL-4-9-9)
48 | 
49 | Do you want another implementation? Just ask me via Discord: albirrkarim
50 | 
51 | This is the documentation for the [web version](#--the-web-version-react-and-vanilla-js).
52 | 
53 | 
54 | 
55 | # Docs for v5.4.8
56 | 
57 | **Table Of Contents**
58 | 
59 | - [A. Introduction](#a-introduction)
60 | - [B. Todo](#b-todo)
61 | - [C. API & Example Code](#c-api--example-code)
62 | - [D. Changelog](#d-changelog)
63 | - [E. Disclaimer & Warranty](#e-disclaimer--warranty)
64 | - [F. FAQ](#f-faq)
65 | - [G. Payment](#g-payment)
66 | 
67 | ## A. Introduction
68 | 
69 | ### What did I want?
70 | 
71 | Recently, I wanted to implement text-to-speech on my website, with highlighting of the word and sentence currently being spoken.
72 | 
73 | So I searched the internet, but I couldn't find an npm package that solves all the TTS [problems](PROBLEMS.md).
74 | 
75 | I just wanted a powerful package that is flexible and has good voice quality.
76 | 
77 | ### Here's what I found when searching the internet:
78 | 
79 | Overall, the text-to-speech task comes with problems (see the details in [PROBLEMS.md](PROBLEMS.md)), whether you use web speech synthesis or audio files.
80 | 
81 | **Using [Web SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis)**
82 | 
83 | It has problems: robot-like sound, limited device support, etc.
84 | 
85 | **Using a paid subscription text-to-speech synthesis API**
86 | 
87 | When we talk about good, human-like voices, AI model inference gets involved, and running that on the client side doesn't make sense.
88 | 
89 | That's where speech synthesis API providers like [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1), [Murf AI](https://get.murf.ai/0big1kdars4f), [Open AI](https://platform.openai.com/docs/guides/text-to-speech), [Amazon Polly](https://aws.amazon.com/id/polly/), and [Google Cloud](https://cloud.google.com/text-to-speech) play their role.
90 | 
91 | But they don't provide an npm package to do the highlighting.
92 | 
93 | Then I found [Speechify](https://speechify.com), but I couldn't find any docs about an npm package that integrates with their service. It is also a paid subscription service.
94 | 
95 | Searching again, I found [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1): it's free for up to 10,000 characters per month, and the quota resets the next month. **Cool, right?** So I decided to use it as the speech synthesis API in my project. This platform also doesn't provide a React npm package to highlight its audio, but it does provide [streaming output audio](https://elevenlabs.io/docs/api-reference/websockets#streaming-output-audio) that can be used to derive when each word is spoken in the audio (transcript timestamps), as [someone has written about](https://medium.com/@brandon.demeria/synchronized-text-highlighting-with-elevenlabs-speech-in-laravel-php-e387c2797396).
96 | 
97 | **In production you must calculate costs** when choosing a TTS service API provider. Services that can stream audio make word highlighting feasible, but they also come at a high price. **Cheap TTS service APIs usually don't have many features.**
98 | 
99 | [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1) produces good-quality voices and has many features, but for production it is more expensive compared with Open AI TTS, and in production cost is an important matter.
100 | 
101 | ### Solutions
102 | 
103 | ![Overview How React Speech Highlight Works](./img/overview.png)
104 | 
105 | So I decided to make this npm package, which combines the various methods above to achieve all the good things and throw away the bad things. All logic is done on the client side (look at the overview above), so there's no need for advanced backend hosting.
106 | 
107 | My package combines the [built-in Web SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis) with audio files (optional).
108 | 
109 | When you use prefer/fallback to an audio file, you can achieve high-quality sound and remove all the compatibility problems of the [built-in Web SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis).
110 | 
111 | How can you automatically get the audio file for some text? You can use [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1), [Murf AI](https://get.murf.ai/0big1kdars4f), [Open AI](https://platform.openai.com/docs/guides/text-to-speech), [Amazon Polly](https://aws.amazon.com/id/polly/), [Google Cloud](https://cloud.google.com/text-to-speech), or any other TTS API, as long as it can produce an audio file (mp3, mp4, wav, etc.); for the details see [AUDIO_FILE.md](AUDIO_FILE.md). On the [demo website](https://react-speech-highlight.vercel.app/) I provide an example using ElevenLabs, and you can even try your own audio file on that demo web.
112 | 
113 | This package just takes input text and an audio file, so you are flexible to use any TTS API that can produce an audio file, an expensive one or a cheap one once you consider the cost.
114 | 
115 | How does this package know the timing of the spoken word or sentence in the played audio? It detects the spoken word and sentence on the client side.
116 | 
117 | This package is a one-time payment. No subscription. Who likes subscriptions? Neither do I. See how to [purchase below](#g-payment).
118 | 
119 | ![Positioning](./img/position.png)
120 | 
121 | ![Feature Overview](./img/features.png)
122 | 
123 | ### Use Cases
124 | 
125 | If you are an entrepreneur, I'm sure you have some crazy use cases for this package.
126 | 
127 | #### Interactive Blog
128 | 
129 | Imagine you have a long article with a TTS button: the text-to-speech plays and users can see how far the article has been read. Your article stays SEO-ready because this package has Server Side Rendering (SSR) capability.
130 | 
131 | #### Web AI Avatar / NPC
132 | 
133 | ![viseme](/img/viseme.png)
134 | 
135 | In the [demo](https://react-speech-highlight.vercel.app/) I provide, you can see that the 3D avatar from [readyplayer.me](https://readyplayer.me/) comes alive playing the `idle` animation, and its mouth synchronizes with the highlighted text-to-speech. That's because this package has React state that represents the [current spoken viseme](https://github.com/albirrkarim/react-speech-highlight-demo/blob/main/API.md#spokenhl). The viseme list I use in the demo is [Oculus OVR LipSync](https://docs.readyplayer.me/ready-player-me/api-reference/avatars/morph-targets/oculus-ovr-libsync).
136 | 
137 | #### Language Learning App With Real Human Voice
138 | 
139 | ![Use case Language Learning App](./img/hanacaraka.png)
140 | 
141 | Look at example 6 in the [demo](https://react-speech-highlight.vercel.app). It's an example of using a real human voice for text-to-speech. Maybe your local language is not supported by any TTS API; with this package you can use a recording of a real human voice instead, which is more natural than TTS API output.
142 | 
143 | #### Academic Text Reader
144 | 
145 | ![Pronounciation](/img/pronounciation.png)
146 | 
147 | The problem with doing TTS on academic text is that it contains math equations, formulas, and symbols whose written form differs from their pronunciation ([see](PROBLEMS.md#1-pronounciation-problem)),
so we made a pronunciation correction engine that uses the Open AI API to work out how each term should be pronounced.
148 | 
149 | #### Relation Highlight and Word Level Highlighting of Youtube Transcript
150 | 
151 | https://github.com/user-attachments/assets/799bae21-a43e-44c4-a4c7-ede7ac2d5b51
152 | 
153 | There's a YouTube iframe with the YouTube transcript on the right; when you play the video, the transcript gets highlighted. The highlighting is based on the current time of the playing video: this package **follows** the time.
154 | 
155 | Relation Highlight feature: when you hover over some word, the related words are highlighted too. For example, when you hover over a Chinese word, the pinyin and English words are highlighted as well, and vice versa. How can it do that? [See](LLM_ENGINE.md).
156 | 
157 | #### Video Player With Auto Generate Subtitle
158 | 
159 | https://github.com/user-attachments/assets/f0d8d157-1c1e-43e1-8eba-ebe7dfe3865e
160 | 
161 | Case: you just have an audio or video file without a text transcript. Our package can generate the transcript from the audio file, or even translate the transcript into another language. The subtitle can be highlighted while the video plays; maybe you want to show subtitles in two different languages at once, and also highlight both based on the meaning of the words.
162 | 
163 | In the preview video above, the original language of the video is Italian, and I also show the English translation; the system highlights both based on meaning.
164 | 
165 | The Italian word `bella` means `beautiful` in English.
166 | 
167 | Go to [this video demo page](https://react-speech-highlight.vercel.app/video).
168 | 
169 | #### Realtime Communication With Highlighted Text
170 | 
171 | Here the audio is fed to the client side in real time, as on a phone call; there's no audio file behind it.
172 | 
173 | Recently Open AI made a [realtime api](https://platform.openai.com/docs/guides/realtime); it uses [Web RTC (Web Real-Time Communication)](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API), so you can have something like a phone call with an AI.
174 | 
175 | Go to the [realtime communication demo](https://react-speech-highlight.vercel.app/#example-realtime).
176 | 
177 | #### Your use case here
178 | 
179 | ![Covers Everyone’s Needs](./img/adaptable.png)
180 | 
181 | Just ask me what you want to make; the package architecture is scalable enough to build all sorts of features.
182 | 
184 |
184 | 
185 | 
186 | ## B. TODO
187 | 
188 | - [ ] Add a Discord chat bot that uses an LLM to explain the API: just describe what you want to make, and it will give you the code.
189 | - [ ] Automated brute-force end-to-end testing: test all APIs, runtimes, sequential actions, etc.
190 | - [ ] Add viseme support for Chinese characters
191 | - [ ] Let me know what you want from this package; the architecture is scalable enough to build all sorts of features. Please write it on the issues tab, or contact me via Discord (username: albirrkarim)
192 | 
193 | 
194 | - [x] Realtime Text-to-Speech With Highlight - This package can integrate with the [Open AI realtime api](https://platform.openai.com/docs/guides/realtime). Imagine having a phone call with an AI while the web page displays the transcript and highlights what is currently being spoken.
195 | - [x] Add an example of [streaming TTS with highlight](https://react-speech-highlight.vercel.app/example). It plays TTS with highlighting even while the text is still being streamed in.
196 | - [x] Re-[architecture](#solutions) the package into a plugin system, and optionally move the LLM Engine to a backend, so it is faster, more secure, and more reliable.
197 | - [x] Hybrid engine timestamp detection
198 | - [x] Relation Highlight feature - used in the YouTube transcript highlight. Highlights the words in a YouTube transcript and their relations to other words, such as their translated form.
199 | - [x] Add Virtual Node for flexible highlighting
200 | - [x] React Native Speech Highlight - we added support for a mobile app version using [React Native](https://reactnative.dev/), [try the demo app](#react-native-speech-highlight)
201 | - [x] Accurate and cost-effective [pronunciation correction](PROBLEMS.md#a-common-problem-in-text-to-speech-both-audio-file-and-web-speech-synthesis) using LLM Open AI Chat Completions for any terms or equations from academic papers: math, physics, computer science, machine learning, and more...
202 | - [x] Server Side Rendering capability; see our demo, which uses [next js](https://nextjs.org/)
203 | - [x] Batch API requests for making the audio files for long article content. This improves efficiency and user experience, and it solves [the requirement that the delay between the user gesture and the audio starting to play must be short.](PROBLEMS.md#1-the-delay-of-audio-played-and-user-gesture-to-trigger-play-must-be-close)
204 | - [x] Add an example of text-to-speech with viseme lipsync on a 3D avatar generated from [readyplayer.me](https://readyplayer.me). [see](https://vanilla-speech-highlight.vercel.app)
205 | - [x] Add a viseme API for the currently spoken TTS, [see](https://vanilla-speech-highlight.vercel.app)
206 | - [x] Add vanilla js support, for those who don't use React. [See the vanilla demo](https://vanilla-speech-highlight.vercel.app)
207 | - [x] Add prefer/fallback to an audio file (.mp3/etc.) for when users don't have built-in speech synthesis on their device, or when you prefer an audio file because the sound is better than the robot-like voice. [see](AUDIO_FILE.md)
208 | - [x] Docs for integrating text-to-speech with the [Eleven Labs](https://try.elevenlabs.io/29se7bx2zgw1) API. [See the demo web](https://react-speech-highlight.vercel.app)
209 | - [x] Integration with the [React GPT Web Guide](https://github.com/albirrkarim/react-gpt-web-guide-docs) package.
210 | - [x] Multi-character support for non-Latin alphabets ( Chinese (你好),
211 | Russian (Привет), Japanese (こんにちは), Korean (안녕하세요), etc. ). [see](https://react-speech-highlight.vercel.app/#non-latin)
212 | - [x] Add [language detection using an LLM api](API.md#2-getlangforthistext)
213 | - [x] Add [seeking by sentence or paragraph](API.md#2b-interface), [reading progress by word or sentence](API.md#spokenhl), [adjusting the config while TTS is playing](API.md#controlhl), and a [custom abbreviation function](API.md#1-tts-marker-markthewords)
214 | - [x] Reliability: TTS that can't die, tested on many platforms, code linting using eslint, written in [Typescript](https://www.typescriptlang.org/), [tested (Prompt Test, Unit Test, Engine Test)](TEST.md)
215 | - [x] Add a [demo website](https://react-speech-highlight.vercel.app)
216 | 
217 | 
219 |
219 | 
220 | 
221 | ## C. API & Example Code
222 | 
223 | See [API.md](API.md) and [EXAMPLE_CODE.md](EXAMPLE_CODE.md), which contain simple example code.
224 | 
225 | The full example code and implementation live in the source code of the [demo website](https://react-speech-highlight.vercel.app); that source code is included when you buy this package.
226 | 
227 | This package is written in Typescript, so you don't have to read all the docs here: the package supports [js doc](https://jsdoc.app) and [VS Code IntelliSense](https://code.visualstudio.com/docs/editor/intellisense). What is that? Simply put, when you hover your mouse over some variable or function, [VS Code](https://code.visualstudio.com) shows a popup (a mini tutorial) explaining what the function is about, with examples, params, etc.
228 | 
229 | Just use the source code from the demo website and you can literally understand the whole package.
230 | 
231 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/05d325f9-469c-47e9-97d3-10053628e18c
232 | 
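To give a feel for the shape of a minimal integration, here is a rough sketch. The prop and return names below are simplified assumptions for illustration; treat IntelliSense and the demo website's source as the authority:

```jsx
// Rough sketch only: the return shape and call signatures here are
// simplified assumptions, not the package's exact API.
import { markTheWords, useTextToSpeech } from "react-speech-highlight";

export default function ArticleReader({ text }) {
  // Assumed shape: the hook exposes control functions and a status string.
  const { controlHL, statusHL } = useTextToSpeech();

  // markTheWords() wraps every word and sentence in markup so the
  // engine can highlight what is currently being spoken.
  const html = markTheWords(text);

  return (
    <div>
      <button onClick={() => controlHL.play()}>
        {statusHL === "play" ? "Playing..." : "Play"}
      </button>
      <article dangerouslySetInnerHTML={{ __html: html }} />
    </div>
  );
}
```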
234 |
234 | 
235 | 
236 | ## D. Changelog
237 | 
238 | The changelog contains information about new features, accuracy improvements, and bug fixes, plus what you should do when the version updates.
239 | 
240 | See [CHANGELOG.md](CHANGELOG.md)
241 | 
243 |
243 | 
244 | 
245 | ## E. Disclaimer & Warranty
246 | 
247 | There's no refund.
248 | 
249 | I love feedback from my customers. You can write on the issues tab, and when I have time I'll try to solve it and deliver the fix in the next update.
250 | 
251 | Still worried? See the [reviews on producthunt](https://www.producthunt.com/products/react-vanilla-speech-highlight/reviews)
252 | 
253 | React / Vanilla Speech Highlight - Highlight Anything | Product Hunt
254 | 
256 |
257 | 258 | ## F. FAQ 259 | 260 |
260 | 
261 | Why is it expensive? Why isn't it an open-source package?
262 | 
263 | 
264 | 
265 | Well, I need money to fund the research. As you know, making a complex package costs a lot of time, of course money, and high engineering skill.
266 | 
267 | Marking the sentences and words for all languages, with their different writing systems, is really hard. I had to research each language and then build a lot of test cases to make the marking solid and reliable for all of them.
268 | 
269 | Building the [LLM engines](LLM_ENGINE.md) means combining prompt engineering with efficient algorithms to save Open AI API costs. They need to be tested, and the tests are repeated, which costs API calls.
270 | 
271 | I also provide support via live private chat with me through Discord (username: albirrkarim).
272 | 
273 | **If you tried to build this package by yourself, you would need to spend a lot of time and money to make something like it.**
274 | 
275 | This package is a `base` package that can be used for various [use cases](#use-cases). I have made a lot of money with this package. The limit is your entrepreneurship skill.
276 | 
278 | 279 |
280 | 281 |
282 | How about support? 283 | 284 |
284 | 
285 | 
286 | Tell me your problems or difficulties, and I will show you the way to solve them.
287 | 
288 | I provide realtime support via Discord.
289 | 
290 | Just buy it, remove the headache, and focus on your project.
291 | 
293 | 294 |
295 | 296 |
297 | Can you give me some discount? 298 | 299 |
300 | 301 | Yes, if you are student or teacher, you can get discount. Just show me your student card or teacher card. 302 | 303 | Yes, if you help me vote this package on [product hunt](https://www.producthunt.com/products/react-vanilla-speech-highlight) 304 | 305 |
306 | 307 |
308 | 309 |
310 | Is it well documented and well crafted? 311 | 312 |
310 | 
311 | 
312 | You can see the docs in this repo. The package is written in Typescript and tested using jest to ensure quality.
313 | 
314 | You don't have to read all the docs here, because this package supports [VS Code IntelliSense](https://code.visualstudio.com/docs/editor/intellisense). What is that? Simply put, when you hover your mouse over some variable or function, [VS Code](https://code.visualstudio.com/) shows a popup (a mini tutorial) explaining what the function is about, with examples, params, etc.
315 | 
316 | Just use the source code from the demo website and you can literally understand the whole package.
317 | 
318 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/05d325f9-469c-47e9-97d3-10053628e18c
319 | 
323 | 324 |
325 | 326 |
324 | 
325 | This package is written in Typescript? Can it be mixed together with a jsx or native js project?
326 | 
327 | 
328 | 
329 | Yes it can. Just ask [chat gpt](https://chatgpt.com) and explain your situation.
330 | 
331 | Example:
332 | 
333 | "My project uses webpack and my code is JSX. I want to use TSX code alongside the JSX. How can I?"
334 | 
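If your project is plain JS/JSX, the usual fix is to let the TypeScript compiler (or your bundler's TS loader) accept both file types. A typical `tsconfig.json` fragment looks like this (adjust to your setup):

```json
{
  "compilerOptions": {
    "allowJs": true,
    "jsx": "react-jsx",
    "esModuleInterop": true
  },
  "include": ["src"]
}
```

With `allowJs` enabled, `.js`/`.jsx` files compile alongside `.ts`/`.tsx`; use `"jsx": "react"` instead if you are on an older React setup.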
338 | 339 |
340 | 341 |
339 | 
340 | How accurate is the viseme generation?
341 | 
342 | 
343 | Go to [Vanilla Speech Highlight](https://vanilla-speech-highlight.vercel.app).
344 | 
345 | I made a demo that outputs the visemes to console.log. Just open the browser console and play the prefer-audio example (English), and you will see the word and viseme at the current timing of the played TTS.
346 | 
350 | 351 |
352 | 353 |
351 | 
352 | How accurate is the highlight capability?
356 | 357 | Just see the [demo](https://react-speech-highlight.vercel.app) 358 | 359 |
360 | 361 |
362 | 363 |
361 | 
362 | Why are there no voices available on the device?
363 | 
364 | 
365 | 
366 | Try using Prefer or Fallback to an Audio File, see [AUDIO_FILE.md](AUDIO_FILE.md)
367 | 
368 | or
369 | 
370 | Try adjusting the speech synthesis or language settings on your device.
371 | 
372 | If you use a smartphone (Android):
373 | 
374 | 1. Make sure you install [Speech Recognition & Synthesis](https://play.google.com/store/apps/details?id=com.google.android.tts)
375 | 
376 | 2. If step 1 doesn't work, try downloading the Google keyboard, then set the dictation language. Wait a few minutes (your device will automatically download the voice), then restart your smartphone.
377 | 
381 | 382 |
383 | 384 |
382 | 
383 | Why doesn't speech work for the first played voice? (web speech synthesis)
384 | 
385 | 
386 | 
387 | Your device downloads that voice first; after that, your device has the voice locally.
388 | 
389 | Or try using Prefer or Fallback to an Audio File, see [AUDIO_FILE.md](AUDIO_FILE.md)
390 | 
394 | 395 |
396 | 397 |
395 | 
396 | Can I use this text-to-speech without showing the highlight?
397 | 
401 | 402 | Yes, [see](API.md) 403 | 404 |
405 | 406 |
407 | 408 |
406 | 
407 | Can I use it without the Open AI API?
408 | 
409 | 
410 | 
411 | This package optionally requires the Open AI API to do the text-to-speech task better (it solves many of the problems I wrote about in [PROBLEMS.md](PROBLEMS.md)).
412 | 
413 | But if you don't want to use the Open AI API, the package still works. See the FAQ about **_What dependencies does this package use?_**
414 | 
418 |
419 | 420 |
418 | 
419 | What dependencies does this package use?
420 | 
421 | 
422 | 
423 | **NPM dependencies:**
424 | 
425 | - For React Speech Highlight: see the [package.json](package.json) in this repo, specifically the `peerDependencies`. Once you build this package, the only npm packages you need are the ones in `peerDependencies`. Only react.
426 | 
427 | - For [Vanilla Speech Highlight](https://vanilla-speech-highlight.vercel.app): no dependencies, just use the vanilla js file.
428 | 
429 | **AI dependencies:**
430 | 
431 | - This package optionally requires the Open AI API to do the text-to-speech task better (it solves many of the problems I wrote about in [PROBLEMS.md](PROBLEMS.md)).
432 | 
433 | - Optionally, use any TTS API that can produce an audio file for better sound quality, like [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1), [Murf AI](https://get.murf.ai/0big1kdars4f), [Open AI](https://platform.openai.com/docs/guides/text-to-speech), [Amazon Polly](https://aws.amazon.com/id/polly/), or [Google Cloud](https://cloud.google.com/text-to-speech), or any other TTS API, as long as it can produce an audio file (mp3, mp4, wav, etc.); for the details see [AUDIO_FILE.md](AUDIO_FILE.md).
434 | 
438 | 439 |
440 | 441 |
442 | Support for various browsers and devices? 443 | 444 |
442 | 
443 | 
444 | Yes, see the details in [TEST.md](TEST.md)
445 | 
446 | Or you can try using Prefer or Fallback to an Audio File, see [AUDIO_FILE.md](AUDIO_FILE.md)
447 | 
451 | 452 |
453 | 454 |
452 | 
453 | How does it work? Is the package architecture scalable?
454 | 
455 | 
456 | It just works. A simple explanation is in the introduction [above](#a-introduction).
457 | 
458 | The architecture is scalable; just ask me what feature you want.
459 | 
463 | 464 |
465 | 466 |
464 | 
465 | What about the API cost of this package's use of the Open AI API?
469 | 470 | See [LLM_ENGINE.md](LLM_ENGINE.md) 471 | 472 |
473 | 474 |
475 | 476 |
474 | 
475 | Our company has already made a lot of audio files; can we just use them for highlighting with your package?
476 | 
477 | 
478 | No, because my package handles the whole [batching system](PROBLEMS.md#2-long-text-request-to-tts-api-capabilty-of-tts-api-handling-long-text), [pronunciation system](PROBLEMS.md#1-pronounciation-problem), and [text preparation](API.md#3-converttextintocleartranscripttext), so that the TTS API produces an audio file that can actually be used for highlighting.
479 | 
480 | What you can do is use a [caching strategy](AUDIO_FILE.md#a-efficient-cost-strategy) to cache the request responses, for both the Open AI API and the TTS API that produces the audio file.
481 | 
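A minimal client-side version of that caching strategy could look like the sketch below. Where you cache (localStorage, IndexedDB, or your own backend) is up to you; the `/api/tts` endpoint here is hypothetical, in the spirit of the Node.js example backend in this repo:

```js
// Sketch of a response-caching strategy for TTS audio, keyed by the
// exact text. "/api/tts" is a hypothetical endpoint returning a
// base64 audio data URI, like the Node.js example backend in this repo.
const AUDIO_CACHE_PREFIX = "tts-audio:";

async function getAudioCached(text) {
  const key = AUDIO_CACHE_PREFIX + text;
  const cached = localStorage.getItem(key);
  if (cached) return cached; // cache hit: no API cost

  const res = await fetch("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  const { audio } = await res.json(); // data:audio/mpeg;base64,...

  try {
    localStorage.setItem(key, audio);
  } catch {
    // localStorage is small (~5 MB); real apps should prefer IndexedDB.
  }
  return audio;
}
```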
485 | 486 |
485 | 
486 | ## G. Payment
487 | 
488 | ### - The Web Version (React and Vanilla js)
489 | 
490 | ![The Web Version (React and Vanilla js)](./img/web_version.png)
491 | 
492 | For individual developers, freelancers, or small businesses.
493 | 
494 | Due to the high demand for this library, access is granted through a bidding process.
495 | 
496 | Submit your bid within the designated timeframe (every month we choose the winners).
497 | 
498 | The highest bidders get priority access.
499 | 
500 | [Fill in the bid form](https://forms.gle/T7ob1k7w1oywCYHP9)
501 | 
502 | After payment, you’ll be invited to my private repository, where you’ll have access for one year, including all updates during that time.
503 | 
504 | For continued access in subsequent years, you can pay USD $50 annually to remain in the private repository.
505 | 
506 | **What you get**
507 | 
508 | - [The demo website (Next js based)](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight)
509 | 
510 | - [The package repo (React Speech Highlight)](https://github.com/Web-XR-AI-lab/react-speech-highlight)
511 | 
512 | - [The package repo (Vanilla Speech Highlight)](https://github.com/Web-XR-AI-lab/vanilla-speech-highlight)
513 | 
517 | 518 | 563 | 564 | ### Backend Server for Advanced Features 565 | 566 | ![Backend Server](./img/backend.png) 567 | 568 | - [Python server ($20)](https://github.com/Web-XR-AI-lab/rshl_python_helper) 569 | 570 | Contains: YouTube relation transcript highlight, Video auto-generate transcript 571 | 572 | - [Node js server ($20)](https://github.com/Web-XR-AI-lab/rshl_node) 573 | 574 | Contains: Backenify LLM engines 575 | 576 |
577 | 578 | 595 | 596 |
597 | 598 | 610 | 611 |
611 | 
612 | 
613 | ### Payment method
614 | 
615 | **Github Sponsors**
616 | 
617 | Choose the One Time tab, select an option, and follow the next instructions from Github.
618 | 
619 | 
620 | 
621 | 
622 | 
623 | 
624 |
625 | 
626 | ## Keywords
627 | 
628 | So this package is the answer if you are looking for:
629 | 
630 | - Best Text to Speech Library
631 | - Text to speech with viseme lipsync javascript
632 | - Javascript text to speech highlight words
633 | - How to text to speech with highlight the sentence and words like speechify
634 | - How to text to speech with highlight the sentence and words using elevenlabs
635 | - How to text to speech with highlight the sentence and words using open ai
636 | - How to text to speech with highlight the sentence and words using google speech synthesis
637 | - Text to speech javascript
638 | - Typescript text to speech
639 | - Highlighted Text to Speech
640 | - Speech Highlighting in TTS
641 | - TTS with Sentence Highlight
642 | - Word Highlight in Text-to-Speech.
643 | - Elevenlabs TTS
644 | - Highlighted TTS Elevenlabs
645 | - OpenAI Text to Speech
646 | - Highlighted Text OpenAI TTS
647 | - React Text to Speech Highlight
648 | - React TTS with Highlight
649 | - React Speech Synthesis
650 | - Highlighted TTS in React
651 | - Google Speech Synthesis in React
652 | - Text to Speech React JS
653 | - React JS TTS
654 | - React Text-to-Speech
655 | - TTS in React JS
656 | - React JS Speech Synthesis
657 | - JavaScript TTS
658 | - Text-to-Speech in JS
659 | - JS Speech Synthesis
660 | - Highlighted TTS JavaScript
661 | - Youtube Transcript Highlight
662 | - Word Highlight in Youtube Transcript
663 | - How to Highlight Words in Youtube Transcript
664 | - Youtube Transcript Word Timing
665 | - Realtime tts with highlight
666 | - Realtime tts streamed audio & text
--------------------------------------------------------------------------------
/SUPPORTED_LANGUAGES.md:
--------------------------------------------------------------------------------
1 | # Supported Languages
2 | 
3 | All languages are supported, except Kannada and Thai.
4 | 
5 | Here are the languages we have already tested:
6 | 
7 | Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
--------------------------------------------------------------------------------
/TEST.md:
--------------------------------------------------------------------------------
1 | # Testing Report
2 | 
3 | ## A. Unit Test
4 | 
5 | Unit testing means making test cases for each function, to make sure that individual functions, and the engines that use an LLM, keep working as expected over time, across many development cycles.
6 | 
7 | ### Function Test
8 | 
9 | see [changelog](CHANGELOG.md)
10 | 
11 | ### Prompt Test
12 | 
13 | Testing each prompt so it can be cost-effective.
14 | 
15 | see [changelog](CHANGELOG.md)
16 | 
17 | ### Engine Test
18 | 
19 | The [pronunciation correction engine](API.md#1-pronunciationcorrection) combines the LLM (the Open AI chat API) with a good algorithm to be accurate and cost-effective. Of course it is tested with test cases.
20 | 
21 | see [changelog](CHANGELOG.md)
22 | 
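To illustrate the style of these unit tests (the real suites live in the private repo; the function and expectations below are hypothetical stand-ins, written with jest):

```js
// Hypothetical sketch of the unit-test style described above, using jest.
// correctPronunciation() here is a stand-in, NOT the package's real engine.
function correctPronunciation(term) {
  const table = { "H2O": "H two O", "E=mc^2": "E equals m c squared" };
  return table[term] ?? term;
}

describe("pronunciation correction (illustrative)", () => {
  test("replaces known academic terms", () => {
    expect(correctPronunciation("H2O")).toBe("H two O");
  });

  test("passes unknown terms through", () => {
    expect(correctPronunciation("hello")).toBe("hello");
  });
});
```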
24 | 
25 | ## B. Data Type Safe & Code Quality
26 | 
27 | We have now rewritten the package using [Typescript](https://www.typescriptlang.org/), lint it using [eslint](https://eslint.org), and support [VS Code intellisense](https://code.visualstudio.com/docs/editor/intellisense).
28 | 
30 | 31 | ## C. Compatibility 32 | 33 | Now **this package support all devices and browser**, because this package can use both [Audio File](AUDIO_FILE.md) and [Web Speech Synthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis) API. 34 | 35 | ### Audio file mode: 36 | 37 | Using the prefer of fallback api. you can set this package to do TTS using purely audio file. see [AUDIO_FILE.md](AUDIO_FILE.md). 38 | 39 | ### Web Speech Synthesis API itself: 40 | 41 | see the [Web Speech Synthesis API Browser compatibility](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis#browser_compatibility) 42 | 43 | ### Device that i have test: 44 | 45 | - Macbook air m1 46 | - Ipad Pro m1 47 | - Samsung A53 -------------------------------------------------------------------------------- /assets/multi_lang/ar.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ar.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/cn.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/cn.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/de.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/de.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/el.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/el.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/en.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/en.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/fi.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/fi.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/fr.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/fr.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/hindi.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/hindi.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/id.mp3: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/id.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/it.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/it.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/jp.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/jp.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/ko.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ko.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/ro.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ro.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/ru.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ru.mp3 -------------------------------------------------------------------------------- /assets/multi_lang/tr.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/tr.mp3 -------------------------------------------------------------------------------- /assets/test/cat.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/test/cat.mp3 -------------------------------------------------------------------------------- /backend/nodejs/.env.template: -------------------------------------------------------------------------------- 1 | ELEVENLABS_API_KEY=your_api_key_here 2 | PORT=3001 3 | -------------------------------------------------------------------------------- /backend/nodejs/.gitignore: -------------------------------------------------------------------------------- 1 | # Logs 2 | logs 3 | *.log 4 | npm-debug.log* 5 | yarn-debug.log* 6 | yarn-error.log* 7 | lerna-debug.log* 8 | 9 | # Diagnostic reports (https://nodejs.org/api/report.html) 10 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json 11 | 12 | # Runtime data 13 | pids 14 | *.pid 15 | *.seed 16 | *.pid.lock 17 | 18 | # Directory for instrumented libs generated by jscoverage/JSCover 19 | lib-cov 20 | 21 | # Coverage directory used by tools like istanbul 22 | coverage 23 | *.lcov 24 | 25 | # nyc test coverage 26 | .nyc_output 27 | 28 | # Grunt intermediate storage 
(https://gruntjs.com/creating-plugins#storing-task-files)
29 | .grunt
30 | 
31 | # Bower dependency directory (https://bower.io/)
32 | bower_components
33 | 
34 | # node-waf configuration
35 | .lock-wscript
36 | 
37 | # Compiled binary addons (https://nodejs.org/api/addons.html)
38 | build/Release
39 | 
40 | # Dependency directories
41 | node_modules/
42 | jspm_packages/
43 | 
44 | # TypeScript v1 declaration files
45 | typings/
46 | 
47 | # TypeScript cache
48 | *.tsbuildinfo
49 | 
50 | # Optional npm cache directory
51 | .npm
52 | 
53 | # Optional eslint cache
54 | .eslintcache
55 | 
56 | # Microbundle cache
57 | .rpt2_cache/
58 | .rts2_cache_cjs/
59 | .rts2_cache_es/
60 | .rts2_cache_umd/
61 | 
62 | # Optional REPL history
63 | .node_repl_history
64 | 
65 | # Output of 'npm pack'
66 | *.tgz
67 | 
68 | # Yarn Integrity file
69 | .yarn-integrity
70 | 
71 | # dotenv environment variables file
72 | .env
73 | .env.test
74 | 
75 | # parcel-bundler cache (https://parceljs.org/)
76 | .cache
77 | 
78 | # Next.js build output
79 | .next
80 | 
81 | # Nuxt.js build / generate output
82 | .nuxt
83 | dist
84 | 
85 | # Gatsby files
86 | .cache/
87 | # Comment in the public line in if your project uses Gatsby and *not* Next.js
88 | # https://nextjs.org/blog/next-9-1#public-directory-support
89 | # public
90 | 
91 | # vuepress build output
92 | .vuepress/dist
93 | 
94 | # Serverless directories
95 | .serverless/
96 | 
97 | # FuseBox cache
98 | .fusebox/
99 | 
100 | # DynamoDB Local files
101 | .dynamodb/
102 | 
103 | # TernJS port file
104 | .tern-port
105 | 
--------------------------------------------------------------------------------
/backend/nodejs/app.js:
--------------------------------------------------------------------------------
1 | require("dotenv").config();
2 | 
3 | const cors = require("cors");
4 | 
5 | const express = require("express");
6 | const axios = require("axios");
7 | const app = express();
8 | const port = process.env.PORT || 3001;
9 | 
10 | app.use(express.urlencoded({ extended: true }));
11 | app.use(express.json());
12 | app.use(cors());
13 | 
14 | // Catch malformed JSON bodies before they reach the route handlers.
15 | app.use((error, req, res, next) => {
16 |   if (error instanceof SyntaxError && error.status === 400 && "body" in error) {
17 |     console.error(error);
18 |     return res.status(400).send({ message: "Malformed JSON in payload" });
19 |   }
20 |   next();
21 | });
22 | 
23 | app.post("/api/v1/public/text-to-speech-elevenlabs", async (req, res) => {
24 |   let text = req.body.text || null;
25 | 
26 |   if (!text) {
27 |     res.status(400).send({ error: "Text is required." });
28 |     return;
29 |   }
30 | 
31 |   // Use the requested voice_id, falling back to ElevenLabs' default voice
32 |   // when the client sends nothing (or 0).
33 |   const voice_id = req.body.voice_id || "21m00Tcm4TlvDq8ikWAM";
34 | 
35 |   const model = req.body.model || "eleven_multilingual_v2";
36 | 
37 |   // Use the caller's voice settings when provided, otherwise sensible defaults.
38 |   const voice_settings = req.body.voice_settings || {
39 |     stability: 0.75,
40 |     similarity_boost: 0.75,
41 |   };
42 | 
43 |   try {
44 |     const response = await axios.post(
45 |       `https://api.elevenlabs.io/v1/text-to-speech/${voice_id}`,
46 |       {
47 |         text: text,
48 |         voice_settings: voice_settings,
49 |         model_id: model,
50 |       },
51 |       {
52 |         headers: {
53 |           "Content-Type": "application/json",
54 |           accept: "audio/mpeg",
55 |           "xi-api-key": `${process.env.ELEVENLABS_API_KEY}`,
56 |         },
57 |         responseType: "arraybuffer",
58 |       }
59 |     );
60 | 
61 |     // Encode the raw MP3 bytes as a base64 data URI so the client can
62 |     // play it directly in an audio element.
63 |     const audioBuffer = Buffer.from(response.data, "binary");
64 |     const base64Audio = audioBuffer.toString("base64");
65 |     const audioDataURI = `data:audio/mpeg;base64,${base64Audio}`;
66 | 
67 |     res.send({ audio: audioDataURI });
68 |   } catch (error) {
69 |     console.error(error);
70 |     res.status(500).send("Error occurred while processing the request.");
71 |   }
72 | });
73 | 
74 | app.listen(port, () => {
75 |   console.log(`Server is running at http://localhost:${port}`);
76 | });
77 | 
--------------------------------------------------------------------------------
/backend/nodejs/package.json:
--------------------------------------------------------------------------------
1 | {
2 |   "name": "vf-elevenlabs",
3 |   "version": "1.0.0",
4 |   "description": "",
5 |   "main": "index.js",
6 |   "scripts": {
7 |     "test": "echo \"Error: no test specified\" && exit 1",
8 |     "start": "node app.js"
9 |   },
10 |   "repository": {
11 |     "type": "git",
12 |     "url": "git+https://github.com/voiceflow-gallagan/VF-ElevenLabs.git"
13 |   },
14 |   "author": "NiKo | Voiceflow",
15 |   "license": "ISC",
16 |   "bugs": {
17 |     "url": "https://github.com/voiceflow-gallagan/VF-ElevenLabs/issues"
18 |   },
19 |   "homepage": "https://github.com/voiceflow-gallagan/VF-ElevenLabs#readme",
20 |   "dependencies": {
21 |     "axios": "^1.6.7",
22 |     "cors": "^2.8.5",
23 |     "dotenv": "^16.4.4",
24 |     "express": "^4.18.2"
25 |   }
26 | }
27 | 
--------------------------------------------------------------------------------
/backend/nodejs/readme.md:
--------------------------------------------------------------------------------
1 | # Example Integration of a Node js Backend with the ElevenLabs TTS API
2 | 
3 | After you download or clone this folder:
4 | 
5 | ```
6 | cp .env.template .env
7 | ```
8 | 
9 | Fill it with your ElevenLabs API key.
10 | 
11 | Open the `.env` and you will see something like this:
12 | 
13 | ```
14 | ELEVENLABS_API_KEY=your_api_key_here
15 | PORT=3001
16 | ```
17 | 
18 | Install the dependencies:
19 | ```
20 | npm install
21 | ```
22 | 
23 | Run the server:
24 | ```
25 | npm start
26 | ```
27 | Now it runs on localhost:3001
--------------------------------------------------------------------------------
/img/RNSH.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/RNSH.png
--------------------------------------------------------------------------------
/img/adaptable.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/adaptable.png
--------------------------------------------------------------------------------
/img/amazon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/amazon.png -------------------------------------------------------------------------------- /img/auto_transcribe.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/auto_transcribe.png -------------------------------------------------------------------------------- /img/backend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/backend.png -------------------------------------------------------------------------------- /img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/banner.png -------------------------------------------------------------------------------- /img/cache.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/cache.png -------------------------------------------------------------------------------- /img/chat_gpt_api.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/chat_gpt_api.png -------------------------------------------------------------------------------- /img/elevenlabs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/elevenlabs.png -------------------------------------------------------------------------------- /img/elevenlabs_pricing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/elevenlabs_pricing.png -------------------------------------------------------------------------------- /img/enterprise_web_version.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/enterprise_web_version.png -------------------------------------------------------------------------------- /img/features.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/features.png -------------------------------------------------------------------------------- /img/google_tts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/google_tts.png -------------------------------------------------------------------------------- /img/hanacaraka.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/hanacaraka.png -------------------------------------------------------------------------------- /img/intellisense.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/intellisense.gif -------------------------------------------------------------------------------- /img/interaction.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/interaction.png -------------------------------------------------------------------------------- /img/mobile_version.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/mobile_version.png -------------------------------------------------------------------------------- /img/open_tts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/open_tts.png -------------------------------------------------------------------------------- /img/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/overview.png -------------------------------------------------------------------------------- /img/pdf_reader_plugin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/pdf_reader_plugin.png -------------------------------------------------------------------------------- /img/position.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/position.png -------------------------------------------------------------------------------- /img/prepareHL.loadingProgress.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/prepareHL.loadingProgress.png -------------------------------------------------------------------------------- /img/prepareHL.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/prepareHL.png -------------------------------------------------------------------------------- /img/pronounciation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/pronounciation.png -------------------------------------------------------------------------------- /img/react_speech_logo.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/react_speech_logo.png -------------------------------------------------------------------------------- /img/relation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/relation.png -------------------------------------------------------------------------------- /img/sosmed.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/sosmed.png -------------------------------------------------------------------------------- /img/vanilla.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/vanilla.png -------------------------------------------------------------------------------- /img/viseme.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/viseme.png -------------------------------------------------------------------------------- /img/web_version.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/web_version.png -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "react-speech-highlight", 3 | "version": "5.2.9", 4 | "description": "React components that use web speech synthesis API to text-to-speech tasks and also highlight the word and sentences that are being spoken", 5 | "main": "./build/index.js", 6 | "scripts": { 7 | "build": "webpack" 8 | }, 9 | "keywords": [ 10 | "text-to-speech", 11 | "SpeechSynthesisUtterance" 12 | ], 13 | "author": "albirrkarim", 14 | "license": "MIT", 15 | "repository": { 16 | "type": "git", 17 | "url": "https://github.com/albirrkarim/react-speech-highlight-demo" 18 | }, 19 | "peerDependencies": { 20 | "react": ">=17.0.0" 21 | }, 22 | "devDependencies": { 23 | "@babel/core": "^7.17.8", 24 | "@babel/preset-env": "^7.16.11", 25 | "@babel/preset-react": "^7.16.7", 26 | "babel-loader": "^8.2.4", 27 | "css-loader": "^6.7.1", 28 | "style-loader": "^3.3.1", 29 | "webpack": "^5.71.0", 30 | "webpack-cli": "^4.10.0", 31 | "dotenv": "^16.3.1" 32 | } 33 | } 34 | --------------------------------------------------------------------------------