├── .github
│   └── FUNDING.yml
├── API.md
├── API_VANILLA.md
├── AUDIO_FILE.md
├── CHANGELOG.md
├── EXAMPLE_CODE.md
├── HOW_TO_USE.md
├── LICENSE
├── LIMITATION.md
├── LLM_ENGINE.md
├── MAKE_BACKEND.md
├── PROBLEMS.md
├── README.md
├── SUPPORTED_LANGUAGES.md
├── TEST.md
├── assets
│   ├── multi_lang
│   │   ├── ar.mp3
│   │   ├── cn.mp3
│   │   ├── de.mp3
│   │   ├── el.mp3
│   │   ├── en.mp3
│   │   ├── fi.mp3
│   │   ├── fr.mp3
│   │   ├── hindi.mp3
│   │   ├── id.mp3
│   │   ├── it.mp3
│   │   ├── jp.mp3
│   │   ├── ko.mp3
│   │   ├── ro.mp3
│   │   ├── ru.mp3
│   │   └── tr.mp3
│   └── test
│       └── cat.mp3
├── backend
│   └── nodejs
│       ├── .env.template
│       ├── .gitignore
│       ├── app.js
│       ├── package.json
│       └── readme.md
├── img
│   ├── RNSH.png
│   ├── adaptable.png
│   ├── amazon.png
│   ├── auto_transcribe.png
│   ├── backend.png
│   ├── banner.png
│   ├── cache.png
│   ├── chat_gpt_api.png
│   ├── elevenlabs.png
│   ├── elevenlabs_pricing.png
│   ├── enterprise_web_version.png
│   ├── features.png
│   ├── google_tts.png
│   ├── hanacaraka.png
│   ├── intellisense.gif
│   ├── interaction.png
│   ├── mobile_version.png
│   ├── open_tts.png
│   ├── overview.png
│   ├── pdf_reader_plugin.png
│   ├── position.png
│   ├── prepareHL.loadingProgress.png
│   ├── prepareHL.png
│   ├── pronounciation.png
│   ├── react_speech_logo.png
│   ├── relation.png
│   ├── sosmed.png
│   ├── vanilla.png
│   ├── viseme.png
│   └── web_version.png
└── package.json
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: albirrkarim
4 | #patreon: # Replace with a single Patreon username
5 | #open_collective: # Replace with a single Open Collective username
6 | ko_fi: albirrkarim
7 | #tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
8 | #community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
9 | #liberapay: # Replace with a single Liberapay username
10 | #issuehunt: # Replace with a single IssueHunt username
11 | #otechie: # Replace with a single Otechie username
12 | #lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
13 | # custom: ['https://paypal.me/AlbirrKarim']
--------------------------------------------------------------------------------
/API.md:
--------------------------------------------------------------------------------
1 | # API
2 |
3 | The API is a set of functions you can use to integrate this package into your apps. When reading these API docs, you can toggle the `Outline` menu (see top right) on GitHub so you can navigate easily.
4 |
5 | This package is written in TypeScript, so you don't have to read all the docs here, because this package now supports [VS Code IntelliSense](https://code.visualstudio.com/docs/editor/intellisense). What is that? Simply put, when you hover your mouse over a variable or function, [VS Code](https://code.visualstudio.com) will show a popup (a mini tutorial) explaining what the function is about, with examples, params, etc...
6 |
7 |
8 | Show Video
9 |
10 |
11 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/05d325f9-469c-47e9-97d3-10053628e18c
12 |
13 |
14 |
15 |
16 |
17 | see [API_VANILLA.md](API_VANILLA.md) for vanilla js version.
18 |
19 |
20 |
21 | Actually, **there's a lot** of functions, [llm engine](LLM_ENGINE.md) helpers, and constants that you can import from this package. Here are just a few of them. When you have bought the package, you can just go to the `index.ts` file and see all the functions and constants. The package has a lot of features, so of course it has a lot of APIs.
22 |
23 |
24 | Show How to import something from the package
25 |
26 |
27 | ```jsx
28 | // v5.4.8 API
29 | import {
30 | // Main
31 | markTheWords,
32 | useTextToSpeech,
33 |
34 | // Utilities function for precision and add more capabilities
35 | pronunciationCorrection,
36 | getLangForThisText,
37 | getTheVoices,
38 | noAbbreviation,
39 | speak,
40 | convertTextIntoClearTranscriptText,
41 |
42 | // Package Data and Cache Integration
43 | // Your app can read the data used by this package, like:
44 | PKG,
45 |   PREFERRED_VOICE, // Set global config for the preferred voice
46 | PKG_STATUS_OPT, // Package status option
47 | PKG_DEFAULT_LANG, // Package default lang
48 | LANG_CACHE_KEY, // Package lang sessionStorage key
49 | OPENAI_CHAT_COMPLETION_API_ENDPOINT,
50 | getVoiceBasedOnVoiceURI,
51 | getCachedVoiceInfo,
52 | getCachedVoiceURI,
53 | setCachedVoiceInfo,
54 | getCachedVoiceName,
55 | } from "react-speech-highlight";
56 |
57 | // Type data for typescript
58 | import type {
59 | ControlHLType,
60 | StatusHLType,
61 | PrepareHLType,
62 | SpokenHLType,
63 | UseTextToSpeechReturnType,
64 | ActivateGestureProps,
65 | GetVoicesProps,
66 | VoiceInfo,
67 | markTheWordsFuncType,
68 | ConfigTTS,
69 | getAudioType,
70 | getAudioReturnType,
71 | VisemeMap,
72 | SentenceInfo,
73 | } from "react-speech-highlight";
74 | ```
75 |
76 |
77 |
78 |
79 |
80 | # Main
81 |
82 | ## 1. TTS Marker `markTheWords()`
83 |
84 | The `markTheWords()` function processes the input string and adds a marker to every word and sentence that the system will read.
85 |
86 |
87 | Show Code
88 |
89 |
90 | Important: this example uses React `useMemo()` to avoid unnecessary React re-renders. That is, it will only re-execute when `text` changes; it's similar to `useEffect()`.
91 |
92 |
93 |
94 | ```jsx
95 | function abbreviationFunction(str) {
96 | // You can write your custom abbreviation function here
97 | // example:
98 | // Input(string) : LMK
99 |   // Output(string) : Let me know
100 |
101 | return str;
102 | }
103 |
104 | const textHL = useMemo(() => markTheWords(text, abbreviationFunction), [text]);
105 | ```
106 |
107 |
108 |
109 |
110 |
111 | ## 2. TTS React Hook `useTextToSpeech()`
112 |
113 | ### 2.A. CONFIG
114 |
115 | There are two config placements: `initialConfig` and `actionConfig`.
116 |
117 |
118 | Show Code
119 |
120 |
121 | ```jsx
122 | const initialConfig = {
123 | autoHL: true,
124 | disableSentenceHL: false,
125 | disableWordHL: false,
126 | classSentences: "highlight-sentence",
127 | classWord: "highlight-spoken",
128 |
129 | lang: "id-ID",
130 | pitch: 1,
131 | rate: 0.9,
132 | volume: 1,
133 | autoScroll: false,
134 | clear: true,
135 |
136 | // For viseme mapping,
137 | visemeMap: {},
138 |
139 | // Prefer or fallback to audio file
140 | preferAudio: null,
141 | fallbackAudio: null,
142 |
143 | batchSize: 200,
144 |
145 | timestampDetectionMode: "auto",
146 | };
147 |
148 | const { controlHL, statusHL, prepareHL, spokenHL } =
149 | useTextToSpeech(initialConfig);
150 | ```
151 |
152 | ```jsx
153 | const actionConfig = {
154 | autoHL: true,
155 | disableSentenceHL: false,
156 | disableWordHL: false,
157 | classSentences: "highlight-sentence",
158 | classWord: "highlight-spoken",
159 |
160 | lang: "id-ID",
161 | pitch: 1,
162 | rate: 0.9,
163 | volume: 1,
164 | autoScroll: false,
165 | clear: true,
166 |
167 | // For viseme mapping,
168 | visemeMap: {},
169 |
170 | // Prefer or fallback to audio file
171 | preferAudio: "example.com/some_file.mp3",
172 | fallbackAudio: "example.com/some_file.mp3",
173 |
174 | batchSize: null, // or 200
175 |
176 | timestampDetectionMode: "auto", // or rule, ml
177 | };
178 |
179 | void controlHL.play({
180 | textEl: textEl.current,
181 | onEnded: () => {
182 | console.log("Callback when tts done");
183 | },
184 | actionConfig,
185 | });
186 | ```
187 |
188 |
189 |
190 |
191 | Show details config
192 |
193 | - `autoHL`
194 |
195 |   If the voice does not support the `onboundary` event, this package prefers to disable word highlighting instead of trying to mimic the `onboundary` event.
196 |
197 | - `disableSentenceHL`
198 |
199 | Disable sentence highlight
200 |
201 | - `disableWordHL`
202 |
203 | Disable word highlight
204 |
205 | - `classSentences`
206 |
207 |   You can style the highlighted sentence with CSS via this class name
208 |
209 | - `classWord`
210 |
211 |   You can style the highlighted word with CSS via this class name
212 |
213 | - `lang`
214 |
215 | The one used for `SpeechSynthesisUtterance.lang`. [see](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/lang)
216 |
217 | - `pitch`
218 |
219 | The one used for `SpeechSynthesisUtterance.pitch`
220 |
221 | - `volume`
222 |
223 | The one used for `SpeechSynthesisUtterance.volume`
224 |
225 | - `autoScroll`
226 |
227 |   Beautiful auto scroll, so the user can always see the highlighted sentences
228 |
229 | - `clear`
230 |
231 |   If `true`, a newly played TTS overrides the previously playing TTS. If `false`, when the user plays a new TTS while another is still playing, the new one just enters the queue behind it.
232 |
233 | - `visemeMap`
234 |
235 | The data for this parameter i provide in the [demo website source code](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight).
236 |
237 | - `preferAudio`
238 |
239 |   An API to pass a `string`, or an `async function` that returns an audio URL like `example.com/some_file.mp3`, as the preferred audio.
240 |
241 |   The package will then use this audio instead of the built-in web speech synthesis.
242 |
243 | - `fallbackAudio`
244 |
245 |   An API to pass a `string`, or an `async function` that returns an audio URL like `example.com/some_file.mp3`, as the fallback audio.
246 |
247 |   When the built-in web speech synthesis errors or the user doesn't have any voice, the fallback audio file will be used.
248 |
249 | ```jsx
250 | async function getAudioForThisText(text){
251 | var res = await getAudioFromTTSAPI("https://yourbackend.com/api/elevenlabs....",text);
252 | // convert to audio file, convert again to audio url
253 |
254 | return res;
255 | }
256 |
257 | const config = {
258 |   preferAudio: getAudioForThisText, // will only be called if needed (if the user wants to play), so you can save cost
259 |   fallbackAudio: getAudioForThisText, // will only be called if needed (if web speech synthesis fails), so you can save cost
260 | }
261 |
262 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech(config)
263 | ```
264 |
265 | - `batchSize`
266 |
267 | The batch size for the audio file.
268 |
269 | When you set `batchSize` to `null`, the package sends all the text in one request. When you set it to 200, the package chunks the text into batches of roughly 200 characters.
270 |
271 | Example: with `batchSize: 200`,
272 | the package will send about 200 characters per request to the TTS API (see the chunking sketch after this list).
273 |
274 | [Readmore about batch system in this package](PROBLEMS.md#1-the-delay-of-audio-played-and-user-gesture-to-trigger-play-must-be-close)
275 |
276 | - `timestampDetectionMode`
277 |
278 | Detection mode for timestamp engine. [see private docs](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight/tree/main/docs)
279 |
280 |
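
To make the batching concrete, here is a minimal sketch of the chunking idea. This is an illustration only (the `chunkText` helper is hypothetical), not the package's internal implementation.

```js
// Hypothetical helper, for illustration only: split text into
// batches of roughly `batchSize` characters, cutting on word
// boundaries so no word is split in half.
function chunkText(text, batchSize = 200) {
  const words = text.split(/\s+/);
  const chunks = [];
  let current = "";

  for (const word of words) {
    if (current && (current + " " + word).length > batchSize) {
      chunks.push(current);
      current = word;
    } else {
      current = current ? current + " " + word : word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// A 10,000-character text with batchSize 200 yields ~50 chunks;
// only the first chunk must be ready before playback starts.
console.log(chunkText("some very long text ...", 200));
```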
281 |
282 | ### 2.B. INTERFACE
283 |
284 | #### controlHL
285 |
286 | ```js
287 | controlHL.play();
288 | controlHL.pause();
289 | controlHL.resume();
290 | controlHL.stop();
291 | controlHL.seekSentenceBackward();
292 | controlHL.seekSentenceForward();
293 | controlHL.seekParagraphBackward();
294 | controlHL.seekParagraphForward();
295 | controlHL.changeConfig();
296 | controlHL.activateGesture();
297 | ```
298 |
299 | #### statusHL
300 |
301 | A React state that reports the status of the program. The value can be `idle|play|calibration|pause|loading`. You can reference the fixed values via the `PKG_STATUS_OPT` constant.
302 |
303 | | Name          | Description                                                                                                                                                                              |
304 | | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
305 | | `idle`        | The initial state                                                                                                                                                                        |
306 | | `calibration` | The system is still processing the text so that, when the TTS plays, it performs accurately and better                                                                                   |
307 | | `play`        | The system is playing the TTS                                                                                                                                                            |
308 | | `pause`       | The TTS is paused                                                                                                                                                                        |
309 | | `loading`     | The system is still working out the best voices available. The status changes to this value when we call `prepareHL.getVoices()` [see](PROBLEMS.md#4-bad-performance-or-voice-too-fast)  |
310 |
311 | #### prepareHL
312 |
313 | Contains state and functions for preparing the TTS. From all the available voices returned by `SpeechSynthesis.getVoices()`, this package tests each voice and returns only the 5 best voices for the language specified before.
314 |
315 | | Name | Description |
316 | | ------------------------- | ------------------------------------------------------------------------------------------------ |
317 | | prepareHL.getVoices() | Function to tell this package to get the best voice. [see](PROBLEMS.md#4-bad-performance-or-voice-too-fast) |
318 | | prepareHL.voices | React state store the result from `prepareHL.getVoices()` |
319 | | prepareHL.loadingProgress | React state for knowing voice testing progress |
320 |
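For example, a minimal usage sketch based on the table above (note: the `VoiceInfo` field accessed here, `name`, is an assumption; check the actual type in `index.ts`):

```jsx
import { useTextToSpeech } from "react-speech-highlight";

function VoicePicker() {
  const { prepareHL } = useTextToSpeech({ lang: "en-US" });

  return (
    <>
      {/* Kick off voice testing; progress is reported via loadingProgress */}
      <button onClick={() => prepareHL.getVoices()}>Load voices</button>
      <p>Testing progress: {prepareHL.loadingProgress}</p>
      <ul>
        {prepareHL.voices.map((voice, i) => (
          <li key={i}>{voice.name /* assumed VoiceInfo field */}</li>
        ))}
      </ul>
    </>
  );
}
```
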
321 | #### spokenHL
322 |
323 | Contains React state for reporting while the TTS is playing.
324 |
325 | | Name                        | Description                                           |
326 | | --------------------------- | ----------------------------------------------------- |
327 | | spokenHL.sentence           | React state; the sentence currently being read        |
328 | | spokenHL.word               | React state; the word currently being read            |
329 | | spokenHL.viseme             | React state; the current viseme                       |
330 | | spokenHL.precentageWord     | Reading percentage between 0-100, based on words      |
331 | | spokenHL.precentageSentence | Reading percentage between 0-100, based on sentences  |
332 |
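For example, a minimal progress indicator driven by these states (a sketch using only the fields documented above):

```jsx
// Minimal sketch: show reading progress while the TTS plays.
function ReadingProgress({ spokenHL }) {
  return (
    <div>
      <progress max={100} value={spokenHL.precentageSentence} />
      <span>
        Now reading: "{spokenHL.word}" in "{spokenHL.sentence}"
      </span>
    </div>
  );
}
```
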
333 | # Utilities
334 |
335 | Utility functions for precision and for adding more capabilities
336 |
337 | ## 1. pronunciationCorrection()
338 |
339 | A common problem is that the text displayed to the user differs from its spoken form: math symbols, equations, terms, etc. [Read more about the pronunciation problem](PROBLEMS.md)
340 |
341 | [How to build this package with open ai api integration](MAKE_BACKEND.md)
342 |
343 |
344 | Show Code
345 |
346 |
347 | ```jsx
348 | const inputText = `
349 |
387 | );
388 | ```
389 |
390 |
391 |
392 | ## 2. getLangForThisText()
393 |
394 | For example, if you implement this package on a multi-language blog website, it's hard to know the exact language of each post / article.
395 |
396 | So I use the ChatGPT API to detect the language of a given text. See [How to build this package with open ai api integration](MAKE_BACKEND.md)
397 |
398 |
399 | Show Code
400 |
401 |
402 | ```jsx
403 | var timeout = null;
404 |
405 | const inputText = `
406 | Hallo, das ist ein deutscher Beispieltext
407 | `;
408 |
409 | async function getLang() {
410 | var predictedLang = await getLangForThisText(textEl.current);
411 |
412 | // will return `de`
413 | if (predictedLang) {
414 | setLang(predictedLang);
415 | }
416 | }
417 |
418 | useEffect(() => {
419 | if (textEl.current) {
420 | if (inputText != "") {
421 | // The timeout is for use case: text change frequently.
422 | // if the text doesn't change just call getLang();
423 | if (timeout) {
424 | clearTimeout(timeout);
425 | }
426 |
427 | timeout = setTimeout(() => {
428 | getLang();
429 | }, 2000);
430 | }
431 | }
432 | }, [inputText]);
433 | ```
434 |
435 |
436 |
437 | ## 3. convertTextIntoClearTranscriptText()
438 |
439 | Function to convert your input string (plain text or an HTML string) into a clear [Speech Synthesis Markup Language (SSML)](https://cloud.google.com/text-to-speech/docs/ssml) format that this package **can understand** when making transcript timestamps.
440 |
441 | **You must use this function when making the audio file**
442 |
443 | ```jsx
444 | var convertInto = "ssml"; // or "plain_text"
445 | var clear_transcript = convertTextIntoClearTranscriptText(
446 | "your string here",
447 | convertInto
448 | );
449 | // with the clear_transcript you can make an audio file with the help of other speech synthesis platforms like elevenlabs etc.
450 | ```
451 |
452 | # Package Data and Cache Integration
453 |
454 | The data or cache (storage) that this package uses can be accessed from outside. This is the mechanism used by [React GPT Web Guide](https://github.com/albirrkarim/react-gpt-web-guide-docs).
455 |
456 |
457 | Show
458 |
459 | ```js
460 | import {
461 | // ...other API
462 |
463 | // Your app can read the data / cache used by this package, like:
464 |   PREFERRED_VOICE, // Set global config for the preferred voice
465 | PKG_STATUS_OPT, // Package status option
466 | PKG_DEFAULT_LANG, // Package default lang
467 | LANG_CACHE_KEY, // Package lang sessionStorage key
468 | OPENAI_CHAT_COMPLETION_API_ENDPOINT, // Key to set open ai chat completion api
469 | getVoiceBasedOnVoiceURI,
470 | getCachedVoiceInfo,
471 | getCachedVoiceURI,
472 | setCachedVoiceInfo,
473 | getCachedVoiceName,
474 | } from "react-speech-highlight";
475 | ```
476 |
477 |
478 |
479 | Usage example:
480 |
481 | ## Set custom constant value for this package
482 |
483 | ```jsx
484 | import { setupKey, storage } from "@/app/react-speech-highlight";
485 |
486 | // set global preferred voice
487 | useEffect(() => {
488 | const your_defined_preferred_voice = {
489 | // important! Define language code (en-us) with lowercase letter
490 | "de-de": ["Helena", "Anna"],
491 | };
492 |
493 | storage.setItem(
494 | "global",
495 | setupKey.PREFERRED_VOICE,
496 |     your_defined_preferred_voice
497 | );
498 |
499 | // Set open ai chat completion api
500 | // example in demo website (next js using environment variable) src/Components/ClientProvider.tsx
501 | if (process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT) {
502 | storage.setItem(
503 | "global",
504 | setupKey.OPENAI_CHAT_COMPLETION_API_ENDPOINT,
505 | process.env.NEXT_PUBLIC_OPENAI_CHAT_COMPLETION_API_ENDPOINT
506 | );
507 | }
508 |
509 | // or
510 | storage.setItem(
511 | "global",
512 | OPENAI_CHAT_COMPLETION_API_ENDPOINT,
513 | "http://localhost:8000/api/v1/public/chat"
514 | );
515 |
516 | // You can set the headers for the fetch API request with this key in sessionStorage
517 | const headers = {
518 | Authorization: `Bearer xxx_YOUR_PLATFORM_AUTH_TOKEN_HERE_xxx`,
519 | };
520 |
521 | // Tips: Hover your mouse over the REQUEST_HEADERS variable to see the example and docs
522 | storage.setItem("global", setupKey.REQUEST_HEADERS, headers);
523 |
524 | // Speech to Text API endpoint
525 | if (process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT) {
526 | storage.setItem(
527 | "global",
528 | setupKey.OPENAI_SPEECH_TO_TEXT_API_ENDPOINT,
529 | process.env.NEXT_PUBLIC_OPENAI_STT_API_ENDPOINT
530 | );
531 | }
532 | }, []);
533 | ```
534 |
--------------------------------------------------------------------------------
/API_VANILLA.md:
--------------------------------------------------------------------------------
1 | # API Vanilla
2 |
3 | The only difference between the [React version](https://react-speech-highlight.vercel.app) and the [Vanilla JS version](https://vanilla-speech-highlight.vercel.app) is the React state (`useState`).
4 |
5 | In the vanilla version we mimic the React state using function callbacks.
6 |
7 | ```js
8 | // These are just pointers to the elements that display the currently played TTS
9 | var statusHL = document.getElementById("statusHL");
10 | var spokenHL_viseme = document.getElementById("spokenHL_viseme");
11 | var spokenHL_word = document.getElementById("spokenHL_word");
12 | var spokenHL_sentence = document.getElementById("spokenHL_sentence");
13 | var spokenHL_precentageWord = document.getElementById(
14 | "spokenHL_precentageWord"
15 | );
16 | var spokenHL_precentageSentence = document.getElementById(
17 | "spokenHL_precentageSentence"
18 | );
19 |
20 | const setStatusHLState = (status) => {
21 | console.log("Default setStatusHLState function the statusHL is ", status);
22 |
23 | if (statusHL) {
24 | statusHL.innerHTML = status;
25 | }
26 | };
27 |
28 | const setVisemeSpoken = (viseme) => {
29 | console.log("Default setVisemeSpoken function. the viseme ", viseme);
30 |
31 | if (spokenHL_viseme) {
32 | spokenHL_viseme.innerHTML = viseme;
33 | }
34 | };
35 |
36 | const setWordSpoken = (word) => {
37 | console.log("Default setWordSpoken function. the word ", word);
38 |
39 | if (spokenHL_word) {
40 | spokenHL_word.innerHTML = word;
41 | }
42 | };
43 |
44 | const setSentenceSpoken = (sentence) => {
45 | console.log("Default setSentenceSpoken function ", sentence);
46 |
47 | if (spokenHL_sentence) {
48 | spokenHL_sentence.innerHTML = sentence;
49 | }
50 | };
51 |
52 | const setPrecentageSentence = (precentageSentence) => {
53 | console.log(
54 | "Default setPrecentageSentence function, precentageSentence = ",
55 | precentageSentence
56 | );
57 |
58 |   if (spokenHL_precentageSentence) {
59 |     spokenHL_precentageSentence.innerHTML = precentageSentence + "%";
60 | }
61 | };
62 |
63 | const setPrecentageWord = (precentageWord) => {
64 | console.log(
65 | "Default setPrecentageWord function, precentageWord = ",
66 | precentageWord
67 | );
68 |
69 |   if (spokenHL_precentageWord) {
70 |     spokenHL_precentageWord.innerHTML = precentageWord + "%";
71 | }
72 | };
73 |
74 | var defaultParams = {
75 | setVoices: (voices) => {
76 | console.log("Default setVoices function ", voices);
77 | },
78 | setLoadingProgress: (progress) => {
79 | console.log(
80 | "Default setLoadingProgress function the progress is ",
81 | progress
82 | );
83 | },
84 | setStatusHLState,
85 | setVisemeSpoken,
86 | setWordSpoken,
87 | setSentenceSpoken,
88 | setPrecentageSentence,
89 | setPrecentageWord,
90 | };
91 |
92 | // Global control HL
93 | const { controlHL } = useTextToSpeech(defaultParams);
94 |
95 | // play the tts
96 | controlHL.play();
97 | ```
98 |
99 | The API of `useTextToSpeech()` looks like this, minus the React state (which I've commented out):
100 |
101 | ```jsx
102 | /**
103 | * Type for control highlight
104 | */
105 | export interface ControlHLType {
106 | play: PlayFunction;
107 | resume: () => void;
108 | pause: () => void;
109 | stop: () => void;
110 |   seekSentenceBackward: (config: Partial<ConfigTTS>) => void;
111 |   seekSentenceForward: (config: Partial<ConfigTTS>) => void;
112 |   seekParagraphBackward: (config: Partial<ConfigTTS>) => void;
113 |   seekParagraphForward: (config: Partial<ConfigTTS>) => void;
114 |   activateGesture: ActivateGestureFunction;
115 |   changeConfig: (actionConfig: Partial<ConfigTTS>) => void;
116 | }
117 |
118 | export interface PrepareHLType {
119 | // loadingProgress: number
120 | // voices: VoiceInfo[]
121 | getVoices: GetVoicesFunction;
122 | retestVoices: RetestVoicesFunction;
123 | quicklyGetSomeBestVoice: QuicklyGetSomeBestVoiceFunction;
124 | }
125 |
126 | /**
127 | * Type for useTextToSpeech
128 | */
129 | export interface UseTextToSpeechReturnType {
130 | controlHL: ControlHLType;
131 | // statusHL: StatusHLType
132 | // spokenHL: SpokenHLType
133 | prepareHL: PrepareHLType;
134 | }
135 | ```
136 |
--------------------------------------------------------------------------------
/AUDIO_FILE.md:
--------------------------------------------------------------------------------
1 | # How to get the Audio File Automatically
2 |
3 | When we talk about generating audio files, we need to do research weighing price, quality, and API support so you can generate audio programmatically.
4 |
5 | Table of Contents:
6 |
7 | - [A. Efficient Cost Strategy](#a-efficient-cost-strategy)
8 | - [B. Paid TTS API](#b-paid-tts-api)
9 | - [C. Local AI TTS](#c-local-ai-tts)
10 |
11 |
12 |
13 |
14 | ## A. Efficient Cost Strategy
15 |
16 | - Consider your needs
17 | - When you need multiple languages, you can make a controller that mixes TTS API providers.
18 | - Use a cache for the audio files
19 |
20 | 
21 |
22 | Usually I use Laravel as a backend. It's a good PHP framework, and it's easy to use. But of course you can use any backend you want and apply the same strategy.
23 |
24 | When you implement a flow like that, an audio file is generated only once; when the audio file already exists, it will not be generated again.
25 |
26 | In English:
27 |
28 | Let's say 1 blog post = 1,200 words.
29 |
30 | Words only: 1,200 words × 5 characters per word = 6,000 characters.
31 |
32 | Including spaces and punctuation: spaces (approximately 20% of 6,000) = 6,000 × 0.2 = 1,200 characters.
33 |
34 | Total estimated characters = 6,000 (words) + 1,200 (spaces) = 7,200 characters.
35 |
36 | With OpenAI TTS (tts-1), the price is $15 per 1 million characters.
37 |
38 | So you can generate audio for roughly 138 posts (1,000,000 / 7,200 ≈ 138) at a cost of $15.
39 |
40 | But that's only `when all` your posts are accessed. When not every post is accessed, you save more money, and you save even more when users don't read your articles in full.
41 |
42 | My lib also uses a [batch request system](PROBLEMS.md#the-solution-is-using-batch-request-system): it only asks the backend to get/make audio for the section the user is currently reading/listening to.
43 |
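To make the caching strategy concrete, here is a minimal sketch of the server-side flow. It assumes Node.js; `generateAudioViaTTSAPI` is a hypothetical placeholder for your real TTS provider call, and you can apply the same idea in Laravel or any other backend.

```js
// Minimal sketch of the "generate once, then cache" flow.
import crypto from "crypto";
import fs from "fs";

const hashText = (text) =>
  crypto.createHash("sha256").update(text).digest("hex");

// Placeholder for the real TTS API call (ElevenLabs, OpenAI TTS, ...).
async function generateAudioViaTTSAPI(text) {
  // ... call your TTS provider and return the audio as a Buffer
  return Buffer.from([]);
}

async function getOrCreateAudio(text) {
  fs.mkdirSync("./audio-cache", { recursive: true });
  const file = `./audio-cache/${hashText(text)}.mp3`;

  // Cache hit: the audio was generated before, so no TTS API cost.
  if (fs.existsSync(file)) return file;

  // Cache miss: call the paid TTS API once, then store the result.
  const audioBuffer = await generateAudioViaTTSAPI(text);
  fs.writeFileSync(file, audioBuffer);
  return file;
}
```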
44 |
45 |
46 |
47 | ## B. Paid TTS API
48 |
49 | ### - ElevenLabs
50 |
51 | [Eleven Labs](https://try.elevenlabs.io/29se7bx2zgw1) is a text-to-speech API that allows you to convert text into high-quality audio files. It supports multiple languages and voices, and provides a range of customization options to help you create the perfect audio for your needs.
52 |
53 | 
54 |
55 |
56 | Example Client Side Code (Frontend)
57 |
58 | ```js
59 | function convertBase64ToBlobURL(base64Audio) {
60 | // Remove the prefix from the data URL if present
61 | const base64Data = base64Audio.replace(/^data:audio\/mpeg;base64,/, "");
62 | // Convert base64 to raw binary data held in a string
63 | const byteString = atob(base64Data);
64 | // Create an ArrayBuffer with the binary length of the base64 string
65 | const arrayBuffer = new ArrayBuffer(byteString.length);
66 | // Create a uint8 view on the ArrayBuffer
67 | const uint8Array = new Uint8Array(arrayBuffer);
68 | for (let i = 0; i < byteString.length; i++) {
69 | uint8Array[i] = byteString.charCodeAt(i);
70 | }
71 | // Create a blob from the uint8Array
72 | const blob = new Blob([uint8Array], { type: "audio/mpeg" });
73 | // Generate a URL for the blob
74 | const blobURL = URL.createObjectURL(blob);
75 |
76 | return blobURL;
77 | }
78 |
79 | export const ttsUsingElevenLabs = async (inputText) => {
80 | // see https://elevenlabs.io/docs/api-reference/text-to-speech
81 | // https://github.com/albirrkarim/react-speech-highlight-demo/blob/main/AUDIO_FILE.md#eleven-labs
82 |
83 | // Set the ID of the voice to be used.
84 | const VOICE_ID = "21m00Tcm4TlvDq8ikWAM";
85 |
86 | const blobUrl = await fetch(
87 | process.env.NEXT_PUBLIC_ELEVEN_LABS_API_ENDPOINT,
88 | {
89 | method: "POST",
90 | headers: {
91 | "Content-Type": "application/json",
92 | },
93 | body: JSON.stringify({
94 | text: inputText,
95 | voice_id: VOICE_ID,
96 | model_id: "eleven_multilingual_v2",
97 | voice_settings: {
98 | stability: 0.75, // The stability for the converted speech
99 | similarity_boost: 0.5, // The similarity boost for the converted speech
100 | style: 1, // The style exaggeration for the converted speech
101 | speaker_boost: true, // The speaker boost for the converted speech
102 | },
103 | }),
104 | }
105 | )
106 | .then((response) => {
107 | if (!response.ok) {
108 | alert("Network fail");
109 | throw new Error(`HTTP error! Status: ${response.status}`);
110 | }
111 | return response.json();
112 | })
113 | .then((data) => {
114 | // Assuming the API response contains a property 'audio' with the base64-encoded audio
115 | const base64Audio = data.audio;
116 |
117 | // Create a Blob URL
118 | const blobUrl = convertBase64ToBlobURL(base64Audio);
119 |
120 | return blobUrl;
121 | });
122 |
123 | return blobUrl;
124 | };
125 |
126 | import { convertTextIntoClearTranscriptText } from "react-speech-highlight";
127 |
128 | var clear_transcript = convertTextIntoClearTranscriptText(
129 | "This is example text you can set"
130 | );
131 |
132 | const audioURL = await ttsUsingElevenLabs(clear_transcript);
133 |
134 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech({
135 | lang: "en",
136 | preferAudio: audioURL,
137 | //or
138 | // fallbackAudio: audioURL,
139 | });
140 | ```
141 |
142 |
143 |
144 |
145 | Example Integration Node js Backend with ElevenLabs TTS API
146 |
147 | Go to the [backend folder in this repo](https://github.com/albirrkarim/react-speech-highlight-demo/tree/main/backend/nodejs), you can see the example
148 |
149 |
150 |
151 |
152 | Example Integration Laravel Backend with ElevenLabs TTS API
153 |
154 | Router
155 |
156 | ```php
157 | Route::post('text-to-speech-elevenlabs', 'textToSpeechElevenLabs')->name('text_to_speech_elevenlabs');
158 | ```
159 |
160 | File `TTSController.php`; this will return the audio as base64:
161 |
162 | ```php
163 | public function textToSpeechElevenLabs(Request $request)
164 | {
165 | $api_key = config('elevenlabs.api_key');
166 | $voice_id = isset($request['voice_id']) ? $request['voice_id'] : '21m00Tcm4TlvDq8ikWAM'; // Set the ID of the voice to be used
167 |
168 | $client = new Client([
169 | 'headers' => [
170 | 'Accept' => 'audio/mpeg',
171 | 'Content-Type' => 'application/json',
172 | 'xi-api-key' => $api_key,
173 | ],
174 | ]);
175 |
176 | try {
177 | $response = $client->post("https://api.elevenlabs.io/v1/text-to-speech/$voice_id", [
178 | 'json' => $request->all(),
179 | ]);
180 |
181 | // Check if the request was successful
182 | if ($response->getStatusCode() === 200) {
183 | // Get the audio content as a base64-encoded string
184 | $base64Audio = base64_encode($response->getBody());
185 |
186 | // Return the base64-encoded audio
187 | return response()->json([
188 | 'status' => true,
189 | 'audio' => $base64Audio,
190 | ]);
191 | } else {
192 | // Handle unsuccessful response
193 | return response()->json([
194 | 'status' => false,
195 | 'message' => 'Text-to-speech API request failed.',
196 | ], $response->getStatusCode());
197 | }
198 | } catch (\Exception $e) {
199 | // Handle Guzzle or other exceptions
200 | return response()->json([
201 | 'status' => false,
202 | 'message' => 'Error during text-to-speech API request.',
203 | 'error' => $e->getMessage(),
204 | ], 500);
205 | }
206 | }
207 |
208 | ```
209 |
210 |
211 |
212 | ### - Open AI TTS
213 |
214 | [OpenAI](https://platform.openai.com/docs/guides/text-to-speech) also provides a TTS service. For now it comes with minimal features, but it has low latency.
215 |
216 |
217 |
218 |
219 |
220 |
221 | Example OpenAI TTS Backend with Laravel
222 |
223 | Router
224 |
225 | ```php
226 | Route::post('text-to-speech-openai', 'textToSpeechOpenAI')->name('text_to_speech_openai');
227 | ```
228 |
229 | File `TTSController.php`; this will return the audio as base64:
230 |
231 | ```php
232 | $api_key = config('openai.api_key');
233 |
234 | $client = new Client([
235 | 'headers' => [
236 | 'Authorization' => 'Bearer ' . $api_key,
237 | 'Content-Type' => 'application/json'
238 | ]
239 | ]);
240 |
241 | try {
242 | $response = $client->post("https://api.openai.com/v1/audio/speech", [
243 | 'json' => [
244 | 'model' => isset($request["model"]) ? $request["model"] : 'tts-1',
245 | 'input' => $request["input"],
246 | 'voice' => isset($request["voice"]) ? $request["voice"] : 'nova',
247 | ]
248 | ]);
249 |
250 | // Check if the request was successful
251 | if ($response->getStatusCode() === 200) {
252 | // Get the audio content as a base64-encoded string
253 | $base64Audio = base64_encode($response->getBody());
254 |
255 | // Return the base64-encoded audio
256 | return response()->json([
257 | 'status' => true,
258 | 'audio' => $base64Audio,
259 | ]);
260 | } else {
261 | // Handle unsuccessful response
262 | return response()->json([
263 | 'status' => false,
264 | 'message' => 'Text-to-speech API request failed.',
265 | ], $response->getStatusCode());
266 | }
267 | } catch (\Exception $e) {
268 | // Handle Guzzle or other exceptions
269 | return response()->json([
270 | 'status' => false,
271 | 'message' => 'Error during text-to-speech API request.',
272 | 'error' => $e->getMessage(),
273 | ], 500);
274 | }
275 | ```
276 |
277 | Your Client Side Code
278 |
279 | ```jsx
280 | const ttsUsingOpenAI = async (inputText) => {
281 | // Set the ID of the voice to be used.
282 |
283 | const blobUrl = await fetch(process.env.NEXT_PUBLIC_OPENAI_TTS_API_ENDPOINT, {
284 | method: "POST",
285 | headers: {
286 | "Content-Type": "application/json",
287 | },
288 | body: JSON.stringify({
289 | input: inputText,
290 | model: "tts-1", //or tts-1-hd
291 | voice: "alloy",
292 | }),
293 | })
294 | .then((response) => {
295 | if (!response.ok) {
296 | throw new Error(`HTTP error! Status: ${response.status}`);
297 | }
298 | return response.json();
299 | })
300 | .then((data) => {
301 | // Assuming the API response contains a property 'audio' with the base64-encoded audio
302 | const base64Audio = data.audio;
303 |
304 | // Convert the base64 audio to a Blob
305 | const byteCharacters = atob(base64Audio);
306 | const byteNumbers = new Array(byteCharacters.length);
307 | for (let i = 0; i < byteCharacters.length; i++) {
308 | byteNumbers[i] = byteCharacters.charCodeAt(i);
309 | }
310 | const byteArray = new Uint8Array(byteNumbers);
311 | const blob = new Blob([byteArray], { type: "audio/mpeg" });
312 |
313 | // Create a Blob URL
314 | const blobUrl = URL.createObjectURL(blob);
315 |
316 | return blobUrl;
317 | });
318 |
319 | return blobUrl;
320 | };
321 |
322 | import { convertTextIntoClearTranscriptText } from "react-speech-highlight";
323 |
324 | var clear_transcript = convertTextIntoClearTranscriptText(
325 | "This is example text you can set"
326 | );
327 |
328 | const audioURL = await ttsUsingOpenAI(clear_transcript);
329 |
330 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech({
331 | lang: "en",
332 | preferAudio: audioURL,
333 | //or
334 | // fallbackAudio: audioURL,
335 | });
336 | ```
337 |
338 |
339 |
340 |
341 |
342 | ### - Google TTS API
343 |
344 | [Google TTS](https://cloud.google.com/text-to-speech) supports SSML; see the [pricing](https://cloud.google.com/text-to-speech/pricing)
345 |
346 | 
347 |
348 |
349 |
350 | ### - Amazon Polly
351 |
352 | See Amazon Polly [pricing](https://aws.amazon.com/polly/pricing/)
353 |
354 |
355 |
356 |
357 |
358 | For 1 million characters:
359 | Standard: $4.00
360 | Neural: $16.00
361 |
362 | What's the difference between standard and neural? [see](https://docs.aws.amazon.com/polly/latest/dg/neural-voices.html)
363 |
364 | Simplified: neural voices are more natural-sounding (better) than standard voices.
365 |
366 |
367 |
368 |
369 | ## C. Local AI TTS
370 |
371 | You can also use a local AI system to do speech synthesis, on a local PC or on Google Colab.
372 |
373 | Then synchronize it (text <-> audio) with your server.
374 |
375 | Of course it's not easy; you will face problems like:
376 |
377 | - Do you have knowledge about Python and AI?
378 | - How do you get the models for your language?
379 |   For English you can easily get the models.
380 |
381 | When I have time I will make a tutorial about how to do local speech synthesis.
382 |
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # CHANGELOG
2 |
3 | # v5.4.8
4 |
5 | - High-standard linting (ESLint) & type checking (TypeScript)
6 |
7 | # 5.4.7
8 |
9 | - Increase stability, fixing marking for Farsi
10 | - Update all deps for the demo website: ESLint 9, Next.js 15, Material UI 7, etc...
11 |
12 | # 5.4.6
13 |
14 | - Increase capability of ingesting Speech To Text results, with 722 samples, for all the TTS languages that OpenAI supports.
15 |
16 |
17 | Report
18 |
19 |
20 | ```
21 | avgErrWordsMiddle: 0.15255255255255257
22 | Unit: sNodesSTTAlign() is done |
23 | Avg Accuracy Sentence Time 99.97 % of 722 sample |
24 | Avg Accuracy Word Time 98.15 % of 666 sample |
25 | Avg Accuracy Word Middle 99.85 % of 666 sample |
26 | Avg Exec Time: 0.37 ms
27 | ```
28 |
29 |
30 |
31 | # 5.4.5
32 |
33 | - Increase capability of ingesting Speech To Text results, with 106 samples, for all the TTS languages that OpenAI supports.
34 |
35 |
36 | Report
37 |
38 |
39 | ```
40 | avgErrWordsMiddle: 1.2860759493670888
41 | Unit: ruleTimestampEngine() is done |
42 | Avg Accuracy Sentence Time 99.81 % of 106 sample |
43 | Avg Accuracy Word Time 84.44 % of 79 sample |
44 | Avg Accuracy Word Middle 98.71 % of 79 sample |
45 | Avg Exec Time: 1.59 ms
46 | ```
47 |
48 |
49 |
50 | # 5.4.3 - 5.4.4
51 |
52 | - Fix double click gesture
53 | - Fix seeking paragraph
54 |
55 | # 5.4.2
56 |
57 | - Enhance marking tests to 75 test cases
58 |
59 | # 5.3.9 - 5.4.1
60 |
61 | - Add API to override the STT (Speech to Text) function
62 |
63 | ```jsx
64 | import {
65 | openaiSpeechToTextSimple,
66 | useTextToSpeech,
67 | type ConfigTTS,
68 | } from "@lib/react-speech-highlight";
69 |
70 | const config: Partial<ConfigTTS> = {
71 | preferAudio: getPublicAccessibleAudioURL,
72 | batchSize: 200,
73 | timestampEngineProps: {
74 | mode: "ml",
75 | sttFunction: async (input) => {
76 | console.log("Optionally Using custom STT function");
77 | // Maybe you want do overide the api request.
78 | // since you know the INPUT and the OUTPUT here, so you can create the PROCESS
79 | // INPUT -> PROCESS -> OUTPUT
80 | console.log("STT: input", input);
81 | const output = await openaiSpeechToTextSimple(input);
82 | console.log("STT: output", output);
83 | return output;
84 | },
85 | onProgress(progress, timeLeftEstimation) {
86 | console.log("Timestamp Engine Progress", progress);
87 | setProgress(progress);
88 | // setMessage("On progress Timestamp Engine (speech to text) ... -> " + moment.duration(timeLeftEstimation, "seconds").humanize())
89 | },
90 | },
91 | };
92 | ```
93 |
94 | # 5.3.8
95 |
96 | - Better marking of sps (sentence) & spw (word) tags, with over 69 test cases from various weird data.
97 |
98 | # 5.3.7
99 |
100 | - Realtime Text to Speech With Highlight - This package can integrate with the [open ai realtime api](https://platform.openai.com/docs/guides/realtime). Imagine having a phone call with an AI while the web page displays the transcript, highlighting the currently spoken words.
101 |
102 | # 5.3.6
103 |
104 | - Fixing marking `sps` and `spw`
105 | - Tag `spkn` now deprecated
106 |
107 | # 5.3.1 - 5.3.5
108 |
109 | - Fixing bug
110 |
111 | # 5.3.0
112 |
113 | - Better timestamp engine
114 | - Better API design ( Breaking changes! )
115 |
116 | # 5.2.9
117 |
118 | - Enhancing timestamp engine capability v4
119 |
120 | # 5.2.7 - 5.2.8
121 |
122 | - Fix bug
123 |
124 | # 5.2.4 - 5.2.6
125 |
126 | - Add non latin support like chinese, japanese, korean, greek, etc...
127 |
128 | # 5.2.3
129 |
130 | - Update `toJSON` method in virtual node
131 | - Better LLM ENGINE
132 |
133 | # 5.2.1 - 5.2.2
134 |
135 | - Adding more jsdocs
136 | - Some little refactor
137 |
138 | # 5.2.0
139 |
140 | - Add example of generating SRT with `onEnded` event [see](https://react-speech-highlight.vercel.app/example)
141 |
142 | # 5.1.9
143 |
144 | - Begin the plugin architecture
145 | - Backend-nify the LLM engine using node js server (optional)
146 |
147 | # 5.1.8
148 |
149 | - Fix bug
150 |
151 | # 5.1.7
152 |
153 | - Rename storage API
154 |
155 | # 5.1.3 - 5.1.6
156 |
157 | - Fix bug
158 | - Renaming API
159 | - Virtual Storage (to mimic sessionStorage)
160 | - Local state tts [see demo example page](https://react-speech-highlight.vercel.app/example)
161 | - Add better error event
162 |
163 | # 5.1.0 - 5.1.2
164 |
165 | - Development of hybrid transcript timestamp engine
166 | - Fix bug
167 |
168 | # 5.0.9
169 |
170 | Relation Finder v4 see the evaluation on [LLM_ENGINE](LLM_ENGINE.md)
171 |
172 | # 5.0.7 - 5.0.8
173 |
174 | - Fix bug
175 |
176 | # 5.0.2 - 5.0.6
177 |
178 | - Fix bug
179 | - Improve the translate-to-some-language engine; with a chunking system it can handle a lot more text
180 |
181 | # 5.0.1
182 |
183 | - Introduce virtual nodes, Sentence Node and Word Node, for flexible text to speech. Used in the PDF TTS Highlight and Relation Highlight features.
184 | - Relation Highlight feature - used in Youtube Transcript Highlight. Highlights the words in a youtube transcript and their relations to other words, like their translated form.
185 | - Rename config.classSentences into config.classSentence
186 | - Add `ControlHL.followTime()` for following the time of a played youtube video in an iframe.
187 |
188 | # 5.0.0
189 |
190 | Stable version release, before I add the API for the TTS-on-PDF plugin
191 |
--------------------------------------------------------------------------------
/EXAMPLE_CODE.md:
--------------------------------------------------------------------------------
1 | # Example Code
2 |
3 | This is a simple code example. Want more? See [HOW_TO_USE.md](HOW_TO_USE.md)
4 |
5 | ### Styling the highlighter
6 |
7 | File `App.css`
8 |
9 | ```css
10 | .highlight-spoken {
11 | color: black !important;
12 | background-color: #ff6f00 !important;
13 | border-radius: 5px;
14 | }
15 |
16 | .highlight-sentence {
17 | color: #000000 !important;
18 | background-color: #ffe082 !important;
19 | border-radius: 5px;
20 | }
21 | ```
22 |
23 | ### The code example
24 |
25 | File `App.js`
26 |
27 | ```jsx
28 | import "./App.css";
29 | import { useEffect, useMemo, useRef, useState } from "react";
30 | import { markTheWords, useTextToSpeech } from "react-speech-highlight";
31 | import PanelControlTTS from "./PanelControlTTS"; // see PanelControlTTS.js below
31 |
32 | export default function App() {
33 | const text = "Some Input String";
34 | const textEl = useRef();
35 | const lang = "en-US";
36 |
37 | const config = {
38 | disableSentenceHL: false,
39 | disableWordHL: false,
40 | autoScroll: false,
41 | lang: lang,
42 | }
43 |
44 | const { controlHL, statusHL, prepareHL, spokenHL } = useTextToSpeech(config);
45 |
46 | const textHL = useMemo(() => markTheWords(text), [text]);
47 |
48 |   return (
49 |     <>
50 |       {/* Text container (reconstructed): textHL holds the HTML string */}
51 |       {/* produced by markTheWords() */}
52 |       <div
53 |         ref={textEl}
54 |         dangerouslySetInnerHTML={{ __html: textHL }}
55 |       />
56 |
57 |       <PanelControlTTS
58 |         isPlay={statusHL == "play"}
59 |         play={() => {
60 |           if (statusHL == "pause") {
61 |             controlHL.resume(config);
62 |           } else {
63 |             controlHL.play(
64 |               textEl.current
65 |             );
66 |           }
67 |         }}
68 |         pause={controlHL.pause}
69 |         stop={controlHL.stop}
70 |       />
71 |     </>
72 |   );
73 | }
75 | ```
76 |
77 | ### Sample TTS Control
78 |
79 | File `PanelControlTTS.js`
80 |
81 | ```jsx
82 | export default function PanelControlTTS({ isPlay, play, pause, stop }) {
83 |   return (
84 |     <>
85 |       {/* Play / pause toggle (reconstructed) */}
86 |       <button
87 |         onClick={() => {
88 |           if (isPlay) {
89 |             pause();
90 |           } else {
91 |             play();
92 |           }
93 |         }}
94 |       >
95 |         {isPlay ? "Pause" : "Play"}
96 |       </button>
97 |       {isPlay && <button onClick={stop}>Stop</button>}
98 |     </>
99 |   );
100 | }
101 | ```
102 |
--------------------------------------------------------------------------------
/HOW_TO_USE.md:
--------------------------------------------------------------------------------
1 | # How to use this package
2 |
3 | The details of how to use this package are in the `README.md` of each private repo
4 |
5 | [Demo Website https://react-speech-highlight.vercel.app](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight)
6 |
7 | [Package Repo of React Speech Highlight](https://github.com/Web-XR-AI-lab/react-speech-highlight)
8 |
9 |
10 | [Demo Website & Package of Vanilla Speech Highlight](https://github.com/Web-XR-AI-lab/vanilla-speech-highlight)
11 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Buy once, use it on all your projects, as long as the public can't see your project code.
2 | Appreciate my research by not sharing the code with other people.
--------------------------------------------------------------------------------
/LIMITATION.md:
--------------------------------------------------------------------------------
1 | # Limitation
2 |
3 | This package is designed to have no limits, but for now it has some limitations:
4 |
5 | 1. Many marking tags: `sps` marks each sentence and `spw` marks each word. This results in a lot of HTML tags on a page.
6 |
7 | That's the only limitation I know of.
--------------------------------------------------------------------------------
/LLM_ENGINE.md:
--------------------------------------------------------------------------------
1 | # LLM Engine
2 |
3 | What's that? Simply put, it uses an LLM to provide advanced AI functionality in your app. It's a simple, cost-effective way to get AI functionality in your app, even if your server has < 1GB RAM.
4 |
5 | Further information, evaluation, etc. is in [private repo](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight/blob/main/docs/LLM_ENGINE.md).
6 |
7 | **Just buy it and use it. It's simple.**
8 |
9 | You get ready-made AI functionality for your app, knowledge, and more.
--------------------------------------------------------------------------------
/MAKE_BACKEND.md:
--------------------------------------------------------------------------------
1 | # How To Set Up Backend For This Package
2 |
3 | ## A. LLM (Large Language Model) API
4 |
5 | Optionally, we need an LLM API to solve many [issues](PROBLEMS.md). The LLM API I use is the OpenAI chat completion API, so we must have a backend server that proxies the API calls to OpenAI.
6 |
7 | 
8 |
9 | ### 1. Make Backend for open ai chat completion API
10 |
11 | **API URL Endpoint**
12 |
13 | ```js
14 | OPENAI_CHAT_COMPLETION_API_ENDPOINT = "https://example.com/api/v1/public/chat";
15 | ```
16 |
17 | With that URL, the `package` will send a **request body** like this
18 |
19 | ```json
20 | {
21 | "temperature": 0,
22 | "model": "gpt-3.5-turbo",
23 | "messages": [
24 | {
25 | "role": "user",
26 | "content": "convert this semicolon separated number \"1000;4090;1000000;1,2;9001;30,1\" into word form number with language \"en-US\" return the result as array. don't explain"
27 | }
28 | ]
29 | }
30 | ```
31 |
32 | and your backend must **respond** like this.
33 |
34 | #### Example response that this package want
35 |
36 | ```json
37 | {
38 | "id": "chatcmpl-7s8i7oHA1BkcLD5U0FFkoYEn2b2QF",
39 | "object": "chat.completion",
40 | "created": 1693137551,
41 | "model": "gpt-3.5-turbo-0613",
42 | "choices": [
43 | {
44 | "index": 0,
45 | "message": {
46 | "role": "assistant",
47 | "content": "[\"one thousand\", \"four thousand ninety\", \"one million\", \"one point two\", \"nine thousand one\", \"thirty point one\"]"
48 | },
49 | "finishReason": "stop"
50 | }
51 | ],
52 | "usage": { "promptTokens": 54, "completionTokens": 29, "totalTokens": 83 }
53 | }
54 | ```
55 |
56 | ### 2. Set open ai chat completion api endpoint for the package
57 |
58 | Goto [API Docs about this](API.md#package-data-and-cache-integration)
59 |
60 |
61 |
62 | ### Example Implementation
63 |
64 | If you are using a different backend, please work out the implementation yourself. The important thing is to return the same response shape (like [this](#example-response-that-this-package-want)) so the `react-speech-highlight` package can understand it. You can also customize the logic, like adding an [authentication header](API.md#set-custom-constant-value-for-this-package).
65 |
66 |
67 | Show example using Laravel as Backend
68 |
69 |
70 |
71 | ### Router
72 |
73 | Open `routes/api.php`
74 |
75 | Remember, you must set the throttle to 180 requests / 1 minute; our engine needs to send a lot of requests. No worries, they are small requests, so it's cost effective.
76 |
77 | ```php
78 | /* OpenAI */
79 | Route::name("openai.")->middleware('throttle:180,1')->controller(OpenAIController::class)->group(function () {
80 | // chat gpt
81 | Route::post('chat', 'chatPost')->name('chat_completions');
82 | });
83 | ```
84 |
85 | Controller
86 |
87 | Open `OpenAIController.php`
88 |
89 | ```php
90 | class OpenAIController extends Controller
91 | {
92 | public function chatPost(Request $request){
93 | $origin = $request->header('Origin');
94 |
95 | $allowed_domain = [
96 | // Production url
97 | "https://example.com" => "sk-xxx_your_secret_key",
98 |
99 | // Development url
100 | "http://localhost:3000" => "sk-xxx_your_secret_key",
101 | ];
102 |
103 | if (!isset($allowed_domain[$origin])) {
104 | return response()->json([
105 | "status" => false,
106 | "message" => "Invalid request, please contact support!"
107 | ], 400);
108 | } else {
109 | if (strpos($origin, 'localhost') !== false) {
110 | if (app()->environment() != "local") {
111 | return response()->json([
112 | "status" => false,
113 | "message" => "Invalid request, please contact support!"
114 | ], 400);
115 | }
116 | }
117 | }
118 |
119 | $api_key = $allowed_domain[$origin];
120 | $data = $request->all();
121 |
122 | if (!isset($data['messages'])) {
123 | return response()->json([
124 | "status" => false,
125 | "message" => "please post 'messages' as body request"
126 | ], 400);
127 | }
128 |
129 |         // the [https://github.com/openai-php/laravel] package has problems, don't use it
130 | // https://github.com/openai-php/laravel/issues/51#issuecomment-1651224516
131 |
132 | $body = [
133 | 'model' => isset($data["model"]) ? $data["model"] : 'gpt-3.5-turbo',
134 | 'messages' => $data["messages"],
135 | 'temperature' => isset($data["temperature"]) ? $data["temperature"] : 0.6,
136 |
137 | // 'functions' => [
138 | // [
139 | // 'name' => $function, 'parameters' => config('schema.'.$function),
140 | // ],
141 | // ],
142 | // 'function_call' => [
143 | // 'name' => $function,
144 | // ],
145 | // 'temperature' => 0.6,
146 | // 'top_p' => 1,
147 | ];
148 |
149 | // Use approach like this instead
150 | $result = Http::withToken($api_key)
151 | ->retry(5, 500)
152 | ->post('https://api.openai.com/v1/chat/completions', $body)
153 | ->throw()
154 | ->json();
155 |
156 | return $result;
157 | }
158 | }
159 | ```
160 |
161 |
162 |
163 |
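If you prefer Node.js over Laravel, a minimal Express sketch of the same proxy might look like this. Assumed setup: `npm install express`, Node 18+ (built-in `fetch`), and an `OPENAI_API_KEY` environment variable; CORS and origin checks are omitted for brevity. The repo's `backend/nodejs` folder has a fuller example.

```js
// Minimal sketch of an OpenAI chat-completion proxy with Express.
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/v1/public/chat", async (req, res) => {
  const { messages, model = "gpt-3.5-turbo", temperature = 0 } = req.body;

  if (!messages) {
    return res.status(400).json({
      status: false,
      message: "please post 'messages' as body request",
    });
  }

  // Forward the request to OpenAI and relay the JSON response unchanged,
  // so the package receives the response shape it expects.
  const openaiRes = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, temperature }),
  });

  res.status(openaiRes.status).json(await openaiRes.json());
});

app.listen(8000);
```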
164 |
165 | ## B. Text To Speech API
166 |
167 | When you decide to use an audio source from a TTS API, see [AUDIO_FILE.md](AUDIO_FILE.md) for more detail.
168 |
--------------------------------------------------------------------------------
/PROBLEMS.md:
--------------------------------------------------------------------------------
1 | # Problems
2 |
3 | ## A. Common problem in Text to Speech (Both audio file and Web Speech Synthesis)
4 |
5 | ### 1. Pronunciation Problem
6 |
7 | Often the text the user sees differs from what the system should speak.
8 |
9 | What do we do? We made an engine that does accurate and cost-effective pronunciation correction using the OpenAI Chat Completions LLM, for any terms or equations from academic papers, math, physics, computer science, machine learning, and more...
10 |
11 |
12 | Show details
13 |
14 |
15 | **Auto Pronunciation Correction**
16 |
17 | This package needs the ChatGPT API to do that. [See how to integrate this package with the OpenAI API](MAKE_BACKEND.md)
18 |
19 |
20 |
21 | **Manual Pronunciation Correction**
22 |
23 | In English, abbreviations like `FOMO`, `ETA`, etc.
24 |
25 | This package also has a built-in abbreviation function, or you can write your own rules.
26 |
27 | ```
28 | input:string -> abbreviation function -> output:string.
29 | ```
30 |
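For instance, a minimal sketch of such a rule-based function (the mapping table here is just an illustration; define your own rules):

```js
// Minimal sketch: a rule-based abbreviation expander.
const ABBREVIATIONS = {
  FOMO: "fear of missing out",
  ETA: "estimated time of arrival",
  LMK: "let me know",
};

function abbreviationFunction(str) {
  const pattern = new RegExp(
    `\\b(${Object.keys(ABBREVIATIONS).join("|")})\\b`,
    "g"
  );
  return str.replace(pattern, (match) => ABBREVIATIONS[match]);
}

// Usage: pass it to markTheWords(), as shown in API.md
// const textHL = markTheWords(text, abbreviationFunction);
```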
31 |
32 |
33 |
34 | ## B. When Using Audio File
35 |
36 | ### 1. The delay of audio played and user gesture to trigger play must be close.
37 |
38 | When the user clicks the play button, the system prepares the audio file by sending a request to a TTS API like ElevenLabs. What if the API request takes very long (because the text you send is long)? What if your TTS API has limitations?
39 |
40 | It will cause a bad experience for the user.
41 |
42 |
43 | Read more
44 |
45 | It will cause a bad experience for the user. Devices like iPad and iPhone even have a rule that the delay between user interaction and the audio playing must not exceed 4 seconds, or playback will fail.
46 |
47 | They will give an error like this:
48 |
49 | ```
50 | Unhandled Promise Rejection: NotAllowedError: The request is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.
51 | ```
52 |
53 | So what's the solution for this?
54 |
55 | I set this package to make batched requests for the API calls.
56 |
57 |
58 | ### 2. Long text requests to the TTS API (capability of the TTS API to handle long text)
59 |
60 | All TTS APIs limit the number of characters that can be sent to them.
61 |
62 | ### The solution is using Batch Request System
63 |
64 | The batch strategy solves the problems above. You can define the batch size in the [config](API.md#2a-config)
65 |
66 |
67 |
68 |
69 | Read more
70 |
71 |
72 |
73 | **How it work?**
74 |
75 | Let's say you have a 10,000-character text, and your TTS API service takes 60 seconds to produce the audio file. (So your user has to wait 60 seconds after pressing play? That's bad.)
76 |
77 | So, my package will chunk it into pieces of close to 200 characters each.
78 |
79 | 10000/200 = 50 requests.
80 |
81 | 60/10000\*200 = 1.2 seconds
82 |
83 | My package sends the first chunk, the TTS API returns its audio in just 1.2 seconds, and the audio is played.
84 |
85 | So the delay between the user clicking play and the TTS starting is just 1.2 seconds. What about the other chunks? I send them in the background while the TTS plays, which also improves the efficiency of characters used on the TTS API; you pay the TTS API service based on characters, right?
86 |
87 | Let's say we have:
88 |
89 | ```
90 | chunk0 <- user still playing this
91 | chunk1
92 | chunk2 <- my package will try to prepare until this
93 | chunk3
94 | ...
95 | chunk49
96 | ```
97 |
98 | This method also solves other problems, like the maximum characters your TTS API can handle; for example, ElevenLabs can only generate audio for [5000](https://help.elevenlabs.io/hc/en-us/articles/13298164480913-What-s-the-maximum-amount-of-characters-and-text-I-can-generate) characters.
99 |
100 |
101 |
102 |
103 |
104 |
105 | ## C. When Using Web Speech Synthesis
106 |
107 | The [SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis). Comes with problems:
108 |
109 | ### 1. Unlimited String Length Capability
110 |
111 | Some available voices don't support long text / strings.
112 |
113 | How about this package? It can read strings of unlimited length (it won't die while playing).
114 |
115 |
116 |
117 |
118 |
119 | ### 2. Cross Platform Problem
120 |
121 | I'm sure you have had the same experience: developing the web cross-platform for Android, iPhone, or iPad always results in problems.
122 |
123 | - Speech synthesis on iOS or iPadOS sometimes dies unexpectedly.
124 | - Sometimes `speechsynthesis` doesn't fire the `onpause` and `onresume` events on Android or iPad.
125 |
126 |
127 |
128 | ### 3. Unpredictable Onboundary
129 |
130 | - First, not all voices have the `onboundary` event
131 | - On iPad the `onboundary` event only works for about 30% of the full sentence.
132 | - Also, the `onboundary` event doesn't fire accurately. For example, for the text `2022`, `onboundary` will fire twice: for `20` and `22`.
133 |
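To see this for yourself, here is a minimal sketch using the plain Web Speech API (no package involved) that logs each boundary event in the browser:

```js
// Minimal sketch: observe raw onboundary behavior in the browser.
const utterance = new SpeechSynthesisUtterance("The year is 2022.");

utterance.onboundary = (event) => {
  // On some platforms this logs "2022" split as "20" and "22",
  // and on some voices it never fires at all.
  // Note: charLength isn't supported everywhere.
  const len = event.charLength || 0;
  console.log(
    "boundary:",
    event.name,
    "charIndex:",
    event.charIndex,
    len ? utterance.text.slice(event.charIndex, event.charIndex + len) : ""
  );
};

window.speechSynthesis.speak(utterance);
```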
134 |
135 |
136 | ### 4. Bad performance or voice too fast
137 |
138 | In the `prepareHL.getVoices()` API I implement this flow:
139 |
140 |
141 | Show flow
142 |
143 |
144 | 
145 |
146 |
147 |
148 |
149 |
150 | ### 5. The voice is not good enough
151 |
152 | With `window.speechSynthesis` the voice sounds like a robot; it doesn't have parameters like:
153 |
154 | - Emotions
155 | - Characteristic
156 |
157 | Those can be achieved by using deep learning (with Python) or a paid TTS API.
158 |
159 | In this package I just want to make a cheap solution for TTS, so I just use `window.speechSynthesis`.
160 |
161 | Now this package can prefer, or fall back to, an audio file.
162 |
163 | Options to play:
164 |
165 | preferAudio(if defined) > Web Speech Synthesis > fallbackAudio(if defined)
166 |
167 | see [AUDIO_FILE.md](AUDIO_FILE.md) and the config [API.md](API.md#2a-config)
168 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # React / Vanilla Speech Highlight - Text-to-Speech with Word Highlighting
2 |
3 |
4 |
5 | [](https://react-speech-highlight.vercel.app)
6 |
7 | **React / Vanilla Speech Highlight** is a powerful library for integrating **text-to-speech** and **real-time word/sentence highlighting** into your web applications. It supports **audio files**, the **Text-to-Speech API**, and the **Web Speech Synthesis API**, making it ideal for creating interactive, accessible, and dynamic user experiences.
8 |
9 | [🌟 Try the Demo: React Speech Highlight](https://react-speech-highlight.vercel.app)
10 |
11 | https://github.com/user-attachments/assets/79c6d4f6-f3e2-4c38-9bec-dbb6834d87f8
12 |
13 |
14 |
15 | ## Other Version
16 |
17 | ### Vanilla JS (Native Javascript)
18 |
19 |
20 |
21 |
22 |
23 | We support implementation using vanilla JS. This package has a bundle size of 45 KB. You can easily combine this library with your existing website, even if your website uses [jquery](https://jquery.com)
24 |
25 | Read [API_VANILLA.md](API_VANILLA.md) to see the differences.
26 |
27 | [Try the demo Vanilla Speech Highlight](https://vanilla-speech-highlight.vercel.app)
28 |
29 | Watch the [Youtube Video](https://youtu.be/vDc7L5W7HhU) about implementing vanilla speech highlight for a javascript text to speech task.
30 |
31 | ### React Native Speech Highlight
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 | Show video demo
41 |
42 |
43 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/abb9cb6c-4c55-448b-a9a5-d1856896b455
44 |
45 |
46 |
47 | Built with react native cli. [Try the demo android app](https://bit.ly/RNSHL-4-9-9)
48 |
49 | Do you want another implementation? Just ask me via discord: albirrkarim
50 |
51 | This is the documentation for the [web version](#--the-web-version-react-and-vanilla-js)
52 |
53 |
54 |
55 | # Docs for v5.4.8
56 |
57 | **Table Of Contents**
58 |
59 | - [A. Introduction](#a-introduction)
60 | - [B. Todo](#b-todo)
61 | - [C. API & Example Code](#c-api--example-code)
62 | - [D. Changelog](#d-changelog)
63 | - [E. Disclaimer & Warranty](#e-disclaimer--warranty)
64 | - [F. FAQ](#f-faq)
65 | - [G. Payment](#g-payment)
66 |
67 | ## A. Introduction
68 |
69 | ### What I wanted
70 |
71 | Recently, I wanted to implement text-to-speech on my website, with highlighting of the word and sentence being spoken.
72 |
73 | Then I searched the internet, but I couldn't find an npm package that solves all the TTS [problems](PROBLEMS.md).
74 |
75 | I just wanted a powerful package that is flexible and has good voice quality.
76 |
77 | ### Here's what I found when I searched the internet:
78 |
79 | Overall, the text-to-speech task comes with problems (see the details in [PROBLEMS.md](PROBLEMS.md)) whether you use web speech synthesis or an audio file.
80 |
81 | **Using [Web SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis)**
82 |
83 | It has problems: robot-like sound, limited device and voice availability, etc.
84 |
85 | **Using a paid subscription text-to-speech synthesis API**
86 |
87 | When we talk about good, human-like voices, AI model inference has to get involved, and it doesn't make sense to run that on the client side.
88 |
89 | That is where speech synthesis API providers like [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1), [Murf AI](https://get.murf.ai/0big1kdars4f), [Open AI](https://platform.openai.com/docs/guides/text-to-speech), [Amazon Polly](https://aws.amazon.com/id/polly/), and [Google Cloud](https://cloud.google.com/text-to-speech) play their roles.
90 |
91 | But they don't provide an npm package to do the highlighting.
92 |
93 | Then I found [Speechify](https://speechify.com), but I couldn't find any docs about an npm package that integrates with their service. It is also a paid subscription service.
94 |
95 | Searching again, I found [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1): it's free for 10000 characters / month, resetting the next month. **Cool right?** So I decided to use it as the speech synthesis API in my project. This platform also doesn't provide a react npm package to highlight its audio, but it does provide [streaming output audio](https://elevenlabs.io/docs/api-reference/websockets#streaming-output-audio) that can be used to work out when each word is spoken in the audio (transcript timestamps), as [someone has written up](https://medium.com/@brandon.demeria/synchronized-text-highlighting-with-elevenlabs-speech-in-laravel-php-e387c2797396).
96 |
97 | **In production you must do a cost calculation** to decide which TTS service API provider to choose. The services capable of streaming audio make word highlighting feasible, but they also come at a high price. **The cheap TTS service APIs usually don't have many features.**
98 |
99 | [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1) produces good quality voices with many features, but for production it is more expensive compared with Open AI TTS, and in production cost is an important matter.
100 |
101 | ### Solutions
102 |
103 | 
104 |
105 | So, I decided to make this npm package, combining the various methods above to keep all the good things and throw away the bad. All logic is done on the client side (look at the overview above), so there is no need for advanced backend hosting.
106 |
107 | My package combines the [Built in Web SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis) and an Audio File (optional) to run.
108 |
109 | When using prefer/fallback to an audio file you can achieve high-quality sound and remove all the compatibility problems of the [Built in Web SpeechSynthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis).
110 |
111 | How can you automatically get the audio file for some text? You can use [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1), [Murf AI](https://get.murf.ai/0big1kdars4f), [Open AI](https://platform.openai.com/docs/guides/text-to-speech), [Amazon Polly](https://aws.amazon.com/id/polly/), [Google Cloud](https://cloud.google.com/text-to-speech), or any other TTS API, as long as it can produce an audio file (mp3, mp4, wav, etc.); for the details see [AUDIO_FILE.md](AUDIO_FILE.md). In the [demo website](https://react-speech-highlight.vercel.app/) I provide an example using ElevenLabs, and you can even try your own audio file on that demo web.
112 |
113 | This package just takes input text and an audio file, so you are flexible to use any TTS API that can produce an audio file, an expensive one or a cheap one depending on your cost considerations.
114 |
115 | How does this package know the timing of the spoken word or sentence in the played audio? It detects the spoken word and sentence on the client side.
116 |
117 | This package is a one-time payment. No subscription. Who likes subscriptions? Neither do I. See how to [purchase below](#g-payment).
118 |
119 | 
120 |
121 | 
122 |
123 | ### Use Cases
124 |
125 | If you are an entrepreneur, I'm sure you have some crazy use cases for this package.
126 |
127 | #### Interactive Blog
128 |
129 | Imagine you have a long article with a TTS button: the text-to-speech plays and users can see how far the article has been read. Your article stays SEO ready because this package has Server Side Rendering (SSR) capability.
130 |
131 | #### Web AI Avatar / NPC
132 |
133 | 
134 |
135 | In the [demo](https://react-speech-highlight.vercel.app/) I provide, you can see the 3D avatar from [readyplayer.me](https://readyplayer.me/) come alive, playing the `idle` animation while its mouth synchronizes with the highlighted text-to-speech. That works because this package has a react state that represents the [current spoken viseme](https://github.com/albirrkarim/react-speech-highlight-demo/blob/main/API.md#spokenhl). The viseme list I use in the demo is [Oculus OVR LipSync](https://docs.readyplayer.me/ready-player-me/api-reference/avatars/morph-targets/oculus-ovr-libsync). A hypothetical sketch follows.
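136 |
137 | A hypothetical sketch of driving an avatar's mouth from that viseme state (the real state shape is documented in the spokenHL API; the prop names and the three.js morph-target plumbing below are illustrative only):
138 |
139 | ```jsx
140 | // Illustrative only: map the currently spoken viseme to a morph target.
141 | function AvatarMouth({ currentViseme, avatarMesh }) {
142 |   React.useEffect(() => {
143 |     if (!currentViseme || !avatarMesh) return;
144 |     // e.g. currentViseme = "viseme_aa" (Oculus OVR LipSync naming)
145 |     const index = avatarMesh.morphTargetDictionary[currentViseme];
146 |     if (index === undefined) return;
147 |     avatarMesh.morphTargetInfluences.fill(0);
148 |     avatarMesh.morphTargetInfluences[index] = 1;
149 |   }, [currentViseme, avatarMesh]);
150 |   return null;
151 | }
152 | ```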
136 |
137 | #### Language Learning App With Real Human Voice
138 |
139 | 
140 |
141 | Look at example 6 on the [demo](https://react-speech-highlight.vercel.app). It's an example of using a real human voice for text-to-speech. Maybe your local language is not supported by the TTS API; with this package you can use a recording of a real human voice instead, which sounds more natural than a TTS API.
142 |
143 | #### Academic Text Reader
144 |
145 | 
146 |
147 | The problem when we do TTS on academic text is that it contains math equations, formulas, and symbols whose written form differs from their pronunciation ([see](PROBLEMS.md#1-pronounciation-problem)). So we made a pronunciation correction engine utilizing the Open AI API to decide how each term should be pronounced.
148 |
149 | #### Relation Highlight and Word Level Highlighting of Youtube Transcript
150 |
151 | https://github.com/user-attachments/assets/799bae21-a43e-44c4-a4c7-ede7ac2d5b51
152 |
153 | It has a youtube iframe with the youtube transcript on the right; when you play the youtube video, the transcript is highlighted. The highlighting is based on the current time of the played video: this package **follows** the time. A sketch of the idea is shown below.
154 |
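155 | A hypothetical sketch of that "follow the time" idea (`words` with `start`/`end` timestamps and the `highlight` helper are assumed shapes, not the package's real API):
156 |
157 | ```js
158 | // Illustrative only: on each time update, highlight the word whose
159 | // timestamp range contains the video's current time.
160 | video.addEventListener("timeupdate", () => {
161 |   const t = video.currentTime;
162 |   const active = words.find((w) => t >= w.start && t < w.end);
163 |   if (active) highlight(active.element);
164 | });
165 | ```
166 |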
155 | Relation Highlight feature - when you hover over some word, the related words are highlighted too. For example, when you hover over a chinese word, the pinyin and english words are highlighted as well, and vice versa. How can it do that? [See](LLM_ENGINE.md).
156 |
157 | #### Video Player With Auto Generate Subtitle
158 |
159 | https://github.com/user-attachments/assets/f0d8d157-1c1e-43e1-8eba-ebe7dfe3865e
160 |
161 | Case: you just have an audio or video file without a text transcript. Our package can generate the transcript from the audio file, or even translate the transcript into another language. The subtitle can be highlighted while the video plays; you may want to show two different language subtitles at once and highlight both based on the meaning of the words.
162 |
163 | In the preview video above the original language is italian, and I also show the english translation; the system highlights both based on meaning.
164 |
165 | For example, the Italian word `bella` means `beautiful` in english.
166 |
167 | Go to [this video demo page](https://react-speech-highlight.vercel.app/video).
168 |
169 | #### Realtime Communication With Highlighted Text
170 |
171 | This is the case where audio is fed to the client side in real time, like being on a phone call; there is no audio file behind it.
172 |
173 | Recently open ai made the [realtime api](https://platform.openai.com/docs/guides/realtime); it uses [Web RTC (Web Real-Time Communication)](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API), so you can have what feels like a phone call with AI.
174 |
175 | Go to the [realtime communication demo](https://react-speech-highlight.vercel.app/#example-realtime).
176 |
177 | #### Your use case here
178 |
179 | 
180 |
181 | Just ask me what you want to make; the package architecture is scalable enough to build various features.
182 |
183 |
184 |
185 |
186 | ## B. TODO
187 |
188 | - [ ] Add a discord chat bot using an LLM to explain the API: just describe what you want to make, and it will give you the code.
189 | - [ ] Automated brute-force end-to-end testing: test all APIs, runtime, sequential actions, etc...
190 | - [ ] Add viseme support for chinese characters
191 | - [ ] Let me know what you want from this package; the architecture is scalable enough to build various features. Please write it on the issues tab, or contact me via discord (username: albirrkarim)
192 |
193 |
194 |
195 | - [x] Realtime Text to Speech With Highlight - This package can integrate with the [open ai realtime api](https://platform.openai.com/docs/guides/realtime). Imagine having a phone call with AI while the web page displays the transcript and highlights the currently spoken words.
196 | - [x] Add example of [streaming TTS with highlight](https://react-speech-highlight.vercel.app/example). It plays TTS with highlight even while the text is still being streamed.
197 | - [x] Re-[architect](#solutions) the package into a plugin system, with the option to move the LLM Engine to a backend, so it is faster, more secure, and more reliable.
198 | - [x] Hybrid engine timestamp detection
199 | - [x] Relation Highlight Feature - used in Youtube Transcript Highlight. Highlights the words in a youtube transcript and their relations to other words, such as their translated form.
200 | - [x] Add Virtual Node for flexible highlighting
201 | - [x] React Native Speech Highlight - we now support a mobile app version using [React Native](https://reactnative.dev/), [try the demo app](#react-native-speech-highlight)
202 | - [x] Accurate and cost-effective [pronunciation correction](PROBLEMS.md#a-common-problem-in-text-to-speech-both-audio-file-and-web-speech-synthesis) using LLM Open AI Chat Completions for any terms or equations from academic papers, math, physics, computer science, machine learning, and more...
203 | - [x] Server Side Rendering capability, see our demo which uses [next js](https://nextjs.org/)
204 | - [x] Batch API requests for making the audio file for long article content, improving efficiency and user experience. [It solves: the delay between the audio playing and the user gesture that triggers play must be short.](PROBLEMS.md#1-the-delay-of-audio-played-and-user-gesture-to-trigger-play-must-be-close)
205 | - [x] Add example text to speech with viseme lipsync on 3D avatar generated from [readyplayer.me](https://readyplayer.me). [see](https://vanilla-speech-highlight.vercel.app)
206 | - [x] Add viseme API for current spoken TTS, [see](https://vanilla-speech-highlight.vercel.app)
207 | - [x] Add vanilla js support, for those who don't use react, [see vanilla demo](https://vanilla-speech-highlight.vercel.app)
208 | - [x] Add fallback/prefer to audio file (.mp3/etc) for when the user doesn't have built-in speech synthesis on their device, or when you prefer an audio file because the sound is better than the robot-like voice. [see](AUDIO_FILE.md)
209 | - [x] Docs for integrating text-to-speech with the [Eleven Labs](https://try.elevenlabs.io/29se7bx2zgw1) API, [see the demo web](https://react-speech-highlight.vercel.app)
210 | - [x] Integration with [React GPT Web Guide](https://github.com/albirrkarim/react-gpt-web-guide-docs) Package.
211 | - [x] Multi character support for non latin alphabet ( chinese (你好),
212 | russian (Привет), japanese (こんにちは), korean (안녕하세요), etc ). [see](https://react-speech-highlight.vercel.app/#non-latin)
213 | - [x] Add [language detection using LLM api](API.md#2-getlangforthistext)
214 | - [x] Add [seeking by sentence or paragraph](API.md#2b-interface), [reading progress by word or sentence](API.md#spokenhl), [Adjust config while TTS playing.](API.md#controlhl), [Custom Abbreviation Function](API.md#1-tts-marker-markthewords)
215 | - [x] Reliability: TTS that can't die, tested on many platforms, code linting using eslint, using [Typescript](https://www.typescriptlang.org/), [tested (Prompt Test, Unit Test, Engine Test)](TEST.md)
216 | - [x] Add [demo website](https://react-speech-highlight.vercel.app)
217 |
218 |
219 |
220 |
221 | ## C. API & Example Code
222 |
223 | See [API.md](API.md) and [EXAMPLE_CODE.md](EXAMPLE_CODE.md), which contain simple example code.
224 |
225 | The full example code and implementation come from the source code of the [demo website](https://react-speech-highlight.vercel.app); that source code is included when you buy this package.
226 |
227 | This package is written in typescript. You don't have to read all the docs here, because this package now supports [js doc](https://jsdoc.app) and [VS Code IntelliSense](https://code.visualstudio.com/docs/editor/intellisense). What is that? Simply put, when you hover your mouse over some variable or function, [VS Code](https://code.visualstudio.com) shows a popup (a mini tutorial) explaining what the function is about, with examples, params, etc.
228 |
229 | Just use the source code from the demo website and you can quickly understand the package.
230 |
231 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/05d325f9-469c-47e9-97d3-10053628e18c
232 |
233 |
234 |
235 |
236 | ## D. Changelog
237 |
238 | The changelog contains information about new features, accuracy improvements, bug fixes, and what you should do when the version is updated.
239 |
240 | See [CHANGELOG.md](CHANGELOG.md)
241 |
242 |
243 |
244 |
245 | ## E. Disclaimer & Warranty
246 |
247 | There's no refund.
248 |
249 | I love feedback from my customers. You can write on the issues tab, and when I have time I will try to solve it and deliver a fix in the next update.
250 |
251 | Still worried? See the [reviews on producthunt](https://www.producthunt.com/products/react-vanilla-speech-highlight/reviews)
252 |
253 |
254 |
255 |
256 |
257 |
258 | ## F. FAQ
259 |
260 |
261 | Why is it expensive? Why isn't it an opensource package?
262 |
263 |
264 |
265 | Well, I need money to fund the research; you know that making a complex package costs a lot of time, money, and high engineering skill.
266 |
267 | Marking the sentences and words for all languages with different writing systems is really hard. I had to research each language, then build a lot of test cases to make the marking solid and reliable for all languages.
268 |
269 | The [LLM engines](LLM_ENGINE.md) combine prompt engineering and efficient algorithms to save Open AI API cost. They need to be tested, and the tests run repeatedly, which costs API calls.
270 |
271 | Also, I provide support via live private chat through discord (username: albirrkarim).
272 |
273 | **If you tried to make this package by yourself, you would need to spend a lot of time and money to build something like it.**
274 |
275 | This package is a `base` package that can be used for various [use cases](#use-cases). I have made a lot of money with this package; the limit is your entrepreneurship skill.
276 |
277 |
278 |
279 |
280 |
281 |
282 | How about support?
283 |
284 |
285 |
286 | Tell me your problems or difficulties, and I will show you the way to solve them.
287 |
288 | I provide realtime support via discord.
289 |
290 | Just buy it, remove the headache, and focus on your project.
291 |
292 |
293 |
294 |
295 |
296 |
297 | Can you give me a discount?
298 |
299 |
300 |
301 | Yes, if you are a student or teacher you can get a discount. Just show me your student or teacher card.
302 |
303 | Yes, also if you help me by voting for this package on [product hunt](https://www.producthunt.com/products/react-vanilla-speech-highlight)
304 |
305 |
306 |
307 |
308 |
309 |
310 | Is it well documented and well crafted?
311 |
312 |
313 |
314 | You can see the docs in this repo; the package is written in typescript and tested using jest to ensure quality.
315 |
316 | You don't have to read all the docs here, because this package now supports [VS Code IntelliSense](https://code.visualstudio.com/docs/editor/intellisense): when you hover your mouse over some variable or function, [VS Code](https://code.visualstudio.com/) shows a popup (a mini tutorial) explaining what the function is about, with examples, params, etc.
317 |
318 | Just use the source code from the demo website and you can quickly understand the package.
319 |
320 | https://github.com/albirrkarim/react-speech-highlight-demo/assets/29292018/05d325f9-469c-47e9-97d3-10053628e18c
321 |
322 |
323 |
324 |
325 |
326 |
327 | This package is written in Typescript? Can it be mixed with a jsx or native js project?
328 |
329 |
330 |
331 | Yes it can; just ask [chat gpt](https://chatgpt.com) and explain your problem.
332 |
333 | Example:
334 |
335 | "My project uses webpack, the code is jsx, and I want to use tsx code alongside the jsx. How can I?"
336 |
337 |
338 |
339 |
340 |
341 |
342 | How accurate is the viseme generation?
343 |
344 |
345 | Go to the [Vanilla Speech Highlight](https://vanilla-speech-highlight.vercel.app) demo.
346 |
347 | I made a demo that outputs the viseme to console.log. Just open the browser console and play the prefer-audio example (english), and you will see the word and viseme at the current timing of the played TTS.
348 |
349 |
350 |
351 |
352 |
353 |
354 | How accurate is the highlight capability?
355 |
356 |
357 | Just see the [demo](https://react-speech-highlight.vercel.app)
358 |
359 |
360 |
361 |
362 |
363 |
364 | Why are there no voices available on the device?
365 |
366 |
367 |
368 | Try using Prefer or Fallback to Audio File, see [AUDIO_FILE.md](AUDIO_FILE.md)
369 |
370 | or
371 |
372 | Try configuring the speech synthesis or language settings on your device.
373 |
374 | If you use a smartphone (Android):
375 |
376 | 1. Make sure you install [Speech Recognition & Synthesis](https://play.google.com/store/apps/details?id=com.google.android.tts)
377 |
378 | 2. If step 1 doesn't work, try downloading the google keyboard, then set the dictation language. Wait a few minutes (your device will automatically download the voice), then restart your smartphone.
379 |
380 |
381 |
382 |
383 |
384 |
385 | Why doesn't speech work the first time a voice is played? (web speech synthesis)
386 |
387 |
388 |
389 | Your device downloads that voice first; afterwards it has the voice locally.
390 |
391 | Or try using Prefer or Fallback to Audio File, see [AUDIO_FILE.md](AUDIO_FILE.md)
392 |
393 |
394 |
395 |
396 |
397 |
398 | Can I use this text-to-speech without showing the highlight?
399 |
400 |
401 |
402 | Yes, [see](API.md)
403 |
404 |
405 |
406 |
407 |
408 |
409 | Can I use it without the openai API?
410 |
411 |
412 |
413 | This package optionally requires the open ai API to do the text-to-speech task better (it solves many problems that I wrote about in [PROBLEMS.md](PROBLEMS.md)).
414 |
415 | But if you don't want to use the open ai API, it still works. See the FAQ **_What dependencies does this package use?_**
416 |
417 |
418 |
419 |
420 |
421 | What dependencies does this package use?
422 |
423 |
424 |
425 | **NPM dependencies:**
426 |
427 | - For React Speech Highlight: see the [package.json](package.json) in this repo. Once you build this package, the only npm packages you need are the ones in `peerDependencies`. Only react.
428 |
429 | - For [Vanilla Speech Highlight](https://vanilla-speech-highlight.vercel.app): No dependency, just use the vanilla js file.
430 |
431 | **AI dependencies:**
432 |
433 | - This package optionally requires the open ai API to do the text-to-speech task better (it solves many problems that I wrote about in [PROBLEMS.md](PROBLEMS.md)).
434 |
435 | - Optionally, any TTS API that can produce an audio file for better sound quality, like [ElevenLabs](https://try.elevenlabs.io/29se7bx2zgw1), [Murf AI](https://get.murf.ai/0big1kdars4f), [Open AI](https://platform.openai.com/docs/guides/text-to-speech), [Amazon Polly](https://aws.amazon.com/id/polly/), or [Google Cloud](https://cloud.google.com/text-to-speech). Any TTS API works as long as it can produce an audio file (mp3, mp4, wav, etc.); for the details see [AUDIO_FILE.md](AUDIO_FILE.md).
436 |
437 |
438 |
439 |
440 |
441 |
442 | Support for various browsers and devices?
443 |
444 |
445 |
446 | Yes, see the details in [TEST.md](TEST.md)
447 |
448 | Or you can try using Prefer or Fallback to Audio File, see [AUDIO_FILE.md](AUDIO_FILE.md)
449 |
450 |
451 |
452 |
453 |
454 |
455 | How does it work? Is the package architecture scalable?
456 |
457 |
458 | It just works. A simple explanation is in the introduction [above](#a-introduction).
459 |
460 | The architecture is scalable; just ask me what feature you want.
461 |
462 |
463 |
464 |
465 |
466 |
467 | What is the API cost of the open AI API that this package uses?
468 |
469 |
470 | See [LLM_ENGINE.md](LLM_ENGINE.md)
471 |
472 |
473 |
474 |
475 |
476 |
477 | Our company has already made a lot of audio files; can we just use them for highlighting with your package?
478 |
479 |
480 | No, because my package handles the whole [batching system](PROBLEMS.md#2-long-text-request-to-tts-api-capabilty-of-tts-api-handling-long-text), [pronunciation system](PROBLEMS.md#1-pronounciation-problem), and [text preparation](API.md#3-converttextintocleartranscripttext) so that the TTS API produces an audio file that can be used for highlighting.
481 |
482 | But you can use a [caching strategy](AUDIO_FILE.md#a-efficient-cost-strategy) to cache the request responses, for both the open ai API and the TTS audio-file API, as sketched below.
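483 |
484 | A minimal caching sketch (illustrative only; `synthesize` stands in for whatever TTS or open ai call you make, and in production you would swap the in-memory map for Redis, S3, or a database):
485 |
486 | ```js
487 | const crypto = require("crypto");
488 |
489 | const cache = new Map(); // in-memory; swap for Redis/S3/DB in production
490 |
491 | async function ttsWithCache(text, synthesize) {
492 |   // Key the request by a hash of the text so identical text
493 |   // never hits the paid API twice.
494 |   const key = crypto.createHash("sha256").update(text).digest("hex");
495 |   if (!cache.has(key)) {
496 |     cache.set(key, await synthesize(text));
497 |   }
498 |   return cache.get(key);
499 | }
500 | ```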
483 |
484 |
485 |
486 |
487 |
488 | ## G. Payment
489 |
490 | ### - The Web Version (React and Vanilla js)
491 |
492 | 
493 |
494 | For individual developers, freelancers, or small businesses.
495 |
496 | Due to the high demand for this library, access is granted through a bidding process.
497 |
498 | Submit your bid within the designated timeframe (every month we choose the winner).
499 |
500 | Highest bidders get priority access.
501 |
502 | [Fill bid form](https://forms.gle/T7ob1k7w1oywCYHP9)
503 |
504 | After payment, you’ll be invited to my private repository, where you’ll have access for one year, including all updates during that time.
505 |
506 | For continued access in subsequent years, you can pay USD $50 annually to remain in the private repository.
507 |
508 | **What you get**
509 |
510 | - [The demo website (Next js based)](https://github.com/Web-XR-AI-lab/demo-website-react-speech-highlight)
511 |
512 | - [The package repo (React Speech Highlight)](https://github.com/Web-XR-AI-lab/react-speech-highlight)
513 |
514 | - [The package repo (Vanilla Speech Highlight)](https://github.com/Web-XR-AI-lab/vanilla-speech-highlight)
515 |
516 |
517 |
518 |
563 |
564 | ### Backend Server for Advanced Features
565 |
566 | 
567 |
568 | - [Python server ($20)](https://github.com/Web-XR-AI-lab/rshl_python_helper)
569 |
570 | Contains: YouTube relation transcript highlight, Video auto-generate transcript
571 |
572 | - [Node js server ($20)](https://github.com/Web-XR-AI-lab/rshl_node)
573 |
574 | Contains: Backendified LLM engines
575 |
576 |
577 |
578 |
595 |
596 |
597 |
598 |
610 |
611 |
612 |
613 | ### Payment method
614 |
615 | **Github Sponsors**
616 |
617 | Choose the One Time tab, select an option, and follow the next instructions from github.
618 |
619 |
620 |
621 |
622 |
623 |
624 |
625 |
626 | ## Keywords
627 |
628 | So this package is the answer if you are looking for:
629 |
630 | - Best Text to Speech Library
631 | - Text to speech with viseme lipsync javascript
632 | - Javascript text to speech highlight words
633 | - How to text to speech with highlight the sentence and words like speechify
634 | - How to text to speech with highlight the sentence and words using elevenlabs
635 | - How to text to speech with highlight the sentence and words using open ai
636 | - How to text to speech with highlight the sentence and words using google speech synthesis
637 | - Text to speech javascript
638 | - Typescript text to speech
639 | - Highlighted Text to Speech
640 | - Speech Highlighting in TTS
641 | - TTS with Sentence Highlight
642 | - Word Highlight in Text-to-Speech.
643 | - Elevenlabs TTS
644 | - Highlighted TTS Elevenlabs
645 | - OpenAI Text to Speech
646 | - Highlighted Text OpenAI TTS
647 | - React Text to Speech Highlight
648 | - React TTS with Highlight
649 | - React Speech Synthesis
650 | - Highlighted TTS in React
651 | - Google Speech Synthesis in React
652 | - Text to Speech React JS
653 | - React JS TTS
654 | - React Text-to-Speech
655 | - TTS in React JS
656 | - React JS Speech Synthesis
657 | - JavaScript TTS
658 | - Text-to-Speech in JS
659 | - JS Speech Synthesis
660 | - Highlighted TTS JavaScript
661 | - Youtube Transcript Highlight
662 | - Word Highlight in Youtube Transcript
663 | - How to Highlight Words in Youtube Transcript
664 | - Youtube Transcript Word Timing
665 | - Realtime tts with highlight
666 | - Realtime tts streamed audio & text
--------------------------------------------------------------------------------
/SUPPORTED_LANGUAGES.md:
--------------------------------------------------------------------------------
1 | # Supported Languages
2 |
3 | All languages are supported, except Kannada and Thai.
4 |
5 | Here are the languages we have already tested:
6 |
7 | Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
--------------------------------------------------------------------------------
/TEST.md:
--------------------------------------------------------------------------------
1 | # Testing Report
2 |
3 | ## A. Unit Test
4 |
5 | Unit testing means making test cases for each function, to make sure each individual function or LLM-based engine keeps working as expected over many development cycles.
6 |
7 | ### Function Test
8 |
9 | see [changelog](CHANGELOG.md)
10 |
11 | ### Prompt Test
12 |
13 | Testing each prompt so it stays cost effective.
14 |
15 | see [changelog](CHANGELOG.md)
16 |
17 | ### Engine Test
18 |
19 | The [pronunciation correction engine](API.md#1-pronunciationcorrection) combines the LLM (open ai chat API) with a good algorithm to be accurate and cost effective. Of course it is tested with test cases.
20 |
21 | see [changelog](CHANGELOG.md)
22 |
23 |
24 |
25 | ## B. Data Type Safe & Code Quality
26 |
27 | We have now rewritten the package in [Typescript](https://www.typescriptlang.org/), lint it using [eslint](https://eslint.org), and support [VS Code intellisense](https://code.visualstudio.com/docs/editor/intellisense).
28 |
29 |
30 |
31 | ## C. Compatibility
32 |
33 | Now **this package supports all devices and browsers**, because it can use both the [Audio File](AUDIO_FILE.md) and [Web Speech Synthesis](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis) APIs.
34 |
35 | ### Audio file mode:
36 |
37 | Using the prefer or fallback API, you can set this package to do TTS using purely an audio file. See [AUDIO_FILE.md](AUDIO_FILE.md).
38 |
39 | ### Web Speech Synthesis API itself:
40 |
41 | see the [Web Speech Synthesis API Browser compatibility](https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis#browser_compatibility)
42 |
43 | ### Devices that I have tested:
44 |
45 | - MacBook Air M1
46 | - iPad Pro M1
47 | - Samsung A53
--------------------------------------------------------------------------------
/assets/multi_lang/ar.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ar.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/cn.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/cn.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/de.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/de.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/el.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/el.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/en.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/en.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/fi.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/fi.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/fr.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/fr.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/hindi.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/hindi.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/id.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/id.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/it.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/it.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/jp.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/jp.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/ko.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ko.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/ro.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ro.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/ru.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/ru.mp3
--------------------------------------------------------------------------------
/assets/multi_lang/tr.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/multi_lang/tr.mp3
--------------------------------------------------------------------------------
/assets/test/cat.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/assets/test/cat.mp3
--------------------------------------------------------------------------------
/backend/nodejs/.env.template:
--------------------------------------------------------------------------------
1 | ELEVENLABS_API_KEY=your_api_key_here
2 | PORT=3001
3 |
--------------------------------------------------------------------------------
/backend/nodejs/.gitignore:
--------------------------------------------------------------------------------
1 | # Logs
2 | logs
3 | *.log
4 | npm-debug.log*
5 | yarn-debug.log*
6 | yarn-error.log*
7 | lerna-debug.log*
8 |
9 | # Diagnostic reports (https://nodejs.org/api/report.html)
10 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
11 |
12 | # Runtime data
13 | pids
14 | *.pid
15 | *.seed
16 | *.pid.lock
17 |
18 | # Directory for instrumented libs generated by jscoverage/JSCover
19 | lib-cov
20 |
21 | # Coverage directory used by tools like istanbul
22 | coverage
23 | *.lcov
24 |
25 | # nyc test coverage
26 | .nyc_output
27 |
28 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
29 | .grunt
30 |
31 | # Bower dependency directory (https://bower.io/)
32 | bower_components
33 |
34 | # node-waf configuration
35 | .lock-wscript
36 |
37 | # Compiled binary addons (https://nodejs.org/api/addons.html)
38 | build/Release
39 |
40 | # Dependency directories
41 | node_modules/
42 | jspm_packages/
43 |
44 | # TypeScript v1 declaration files
45 | typings/
46 |
47 | # TypeScript cache
48 | *.tsbuildinfo
49 |
50 | # Optional npm cache directory
51 | .npm
52 |
53 | # Optional eslint cache
54 | .eslintcache
55 |
56 | # Microbundle cache
57 | .rpt2_cache/
58 | .rts2_cache_cjs/
59 | .rts2_cache_es/
60 | .rts2_cache_umd/
61 |
62 | # Optional REPL history
63 | .node_repl_history
64 |
65 | # Output of 'npm pack'
66 | *.tgz
67 |
68 | # Yarn Integrity file
69 | .yarn-integrity
70 |
71 | # dotenv environment variables file
72 | .env
73 | .env.test
74 |
75 | # parcel-bundler cache (https://parceljs.org/)
76 | .cache
77 |
78 | # Next.js build output
79 | .next
80 |
81 | # Nuxt.js build / generate output
82 | .nuxt
83 | dist
84 |
85 | # Gatsby files
86 | .cache/
87 | # Comment in the public line in if your project uses Gatsby and *not* Next.js
88 | # https://nextjs.org/blog/next-9-1#public-directory-support
89 | # public
90 |
91 | # vuepress build output
92 | .vuepress/dist
93 |
94 | # Serverless directories
95 | .serverless/
96 |
97 | # FuseBox cache
98 | .fusebox/
99 |
100 | # DynamoDB Local files
101 | .dynamodb/
102 |
103 | # TernJS port file
104 | .tern-port
105 |
--------------------------------------------------------------------------------
/backend/nodejs/app.js:
--------------------------------------------------------------------------------
1 | require("dotenv").config();
2 |
3 | const cors = require('cors');
4 |
5 | const express = require("express");
6 | const axios = require("axios");
7 | const app = express();
8 | const port = process.env.PORT || 3001;
9 |
10 | app.use(express.urlencoded({ extended: true }));
11 | app.use(express.json());
12 | app.use(cors());
13 |
14 | app.use((error, req, res, next) => {
15 | if (error instanceof SyntaxError && error.status === 400 && "body" in error) {
16 | console.error(error);
17 | return res.status(400).send({ message: "Malformed JSON in payload" });
18 | }
19 | next();
20 | });
21 |
22 | app.post("/api/v1/public/text-to-speech-elevenlabs", async (req, res) => {
23 | let text = req.body.text || null;
24 |
25 | if (!text) {
26 | res.status(400).send({ error: "Text is required." });
27 | return;
28 | }
29 |
30 |   // Resolve the voice, falling back to the default ElevenLabs voice id.
31 |   const voice_id =
32 |     req.body.voice_id == 0
33 |       ? "21m00Tcm4TlvDq8ikWAM"
34 |       : req.body.voice_id || "21m00Tcm4TlvDq8ikWAM";
34 |
35 | const model = req.body.model || "eleven_multilingual_v2";
36 |
37 | const voice_settings =
38 | req.body.voice_settings == 0
39 | ? {
40 | stability: 0.75,
41 | similarity_boost: 0.75,
42 | }
43 | : req.body.voice_settings || {
44 | stability: 0.75,
45 | similarity_boost: 0.75,
46 | };
47 |
48 | try {
49 | const response = await axios.post(
50 | `https://api.elevenlabs.io/v1/text-to-speech/${voice_id}`,
51 | {
52 |         text: text,
53 | voice_settings: voice_settings,
54 | model_id: model,
55 | },
56 | {
57 | headers: {
58 | "Content-Type": "application/json",
59 | accept: "audio/mpeg",
60 | "xi-api-key": `${process.env.ELEVENLABS_API_KEY}`,
61 | },
62 | responseType: "arraybuffer",
63 | }
64 | );
65 |
66 | const audioBuffer = Buffer.from(response.data, "binary");
67 | const base64Audio = audioBuffer.toString("base64");
68 | const audioDataURI = `data:audio/mpeg;base64,${base64Audio}`;
69 |
70 | res.send({ audio: audioDataURI });
71 | } catch (error) {
72 | console.error(error);
73 | res.status(500).send("Error occurred while processing the request.");
74 | }
75 | });
76 |
77 | app.listen(port, () => {
78 | console.log(`Server is running at http://localhost:${port}`);
79 | });
80 |
--------------------------------------------------------------------------------
/backend/nodejs/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "name": "vf-elevenlabs",
3 | "version": "1.0.0",
4 | "description": "",
5 | "main": "index.js",
6 | "scripts": {
7 | "test": "echo \"Error: no test specified\" && exit 1",
8 | "start": "node app.js"
9 | },
10 | "repository": {
11 | "type": "git",
12 | "url": "git+https://github.com/voiceflow-gallagan/VF-ElevenLabs.git"
13 | },
14 | "author": "NiKo | Voiceflow",
15 | "license": "ISC",
16 | "bugs": {
17 | "url": "https://github.com/voiceflow-gallagan/VF-ElevenLabs/issues"
18 | },
19 | "homepage": "https://github.com/voiceflow-gallagan/VF-ElevenLabs#readme",
20 | "dependencies": {
21 | "axios": "^1.6.7",
22 | "cors": "^2.8.5",
23 | "dotenv": "^16.4.4",
24 | "express": "^4.18.2"
25 | }
26 | }
27 |
--------------------------------------------------------------------------------
/backend/nodejs/readme.md:
--------------------------------------------------------------------------------
1 | # Example Node js Backend Integration with the ElevenLabs TTS API
2 |
3 | After you download or clone this folder:
4 |
5 | ```
6 | cp .env.template .env
7 | ```
8 |
9 | Fill in your elevenlabs api key.
10 |
11 | Open the `.env` and you will see something like this:
12 |
13 | ```
14 | ELEVENLABS_API_KEY=your_api_key_here
15 | PORT=3001
16 | ```
17 |
18 | install deps
19 | ```
20 | npm install
21 | ```
22 |
23 | Run the server
24 | ```
25 | npm start
26 | ```
27 | Now it's running on http://localhost:3001
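28 |
29 | To test the endpoint (the path and payload come from `app.js`), run this in a browser console:
30 |
31 | ```js
32 | // Request speech for some text and play the returned base64 audio.
33 | const res = await fetch(
34 |   "http://localhost:3001/api/v1/public/text-to-speech-elevenlabs",
35 |   {
36 |     method: "POST",
37 |     headers: { "Content-Type": "application/json" },
38 |     body: JSON.stringify({ text: "Hello world" }),
39 |   }
40 | );
41 | const { audio } = await res.json(); // "data:audio/mpeg;base64,..."
42 | new Audio(audio).play();
43 | ```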
--------------------------------------------------------------------------------
/img/RNSH.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/RNSH.png
--------------------------------------------------------------------------------
/img/adaptable.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/adaptable.png
--------------------------------------------------------------------------------
/img/amazon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/amazon.png
--------------------------------------------------------------------------------
/img/auto_transcribe.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/auto_transcribe.png
--------------------------------------------------------------------------------
/img/backend.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/backend.png
--------------------------------------------------------------------------------
/img/banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/banner.png
--------------------------------------------------------------------------------
/img/cache.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/cache.png
--------------------------------------------------------------------------------
/img/chat_gpt_api.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/chat_gpt_api.png
--------------------------------------------------------------------------------
/img/elevenlabs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/elevenlabs.png
--------------------------------------------------------------------------------
/img/elevenlabs_pricing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/elevenlabs_pricing.png
--------------------------------------------------------------------------------
/img/enterprise_web_version.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/enterprise_web_version.png
--------------------------------------------------------------------------------
/img/features.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/features.png
--------------------------------------------------------------------------------
/img/google_tts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/google_tts.png
--------------------------------------------------------------------------------
/img/hanacaraka.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/hanacaraka.png
--------------------------------------------------------------------------------
/img/intellisense.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/intellisense.gif
--------------------------------------------------------------------------------
/img/interaction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/interaction.png
--------------------------------------------------------------------------------
/img/mobile_version.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/mobile_version.png
--------------------------------------------------------------------------------
/img/open_tts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/open_tts.png
--------------------------------------------------------------------------------
/img/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/overview.png
--------------------------------------------------------------------------------
/img/pdf_reader_plugin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/pdf_reader_plugin.png
--------------------------------------------------------------------------------
/img/position.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/position.png
--------------------------------------------------------------------------------
/img/prepareHL.loadingProgress.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/prepareHL.loadingProgress.png
--------------------------------------------------------------------------------
/img/prepareHL.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/prepareHL.png
--------------------------------------------------------------------------------
/img/pronounciation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/pronounciation.png
--------------------------------------------------------------------------------
/img/react_speech_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/react_speech_logo.png
--------------------------------------------------------------------------------
/img/relation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/relation.png
--------------------------------------------------------------------------------
/img/sosmed.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/sosmed.png
--------------------------------------------------------------------------------
/img/vanilla.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/vanilla.png
--------------------------------------------------------------------------------
/img/viseme.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/viseme.png
--------------------------------------------------------------------------------
/img/web_version.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/albirrkarim/react-speech-highlight-demo/d8f233bd5c5fe2e4088da17d072f8c946e0292fe/img/web_version.png
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "name": "react-speech-highlight",
3 | "version": "5.2.9",
4 | "description": "React components that use web speech synthesis API to text-to-speech tasks and also highlight the word and sentences that are being spoken",
5 | "main": "./build/index.js",
6 | "scripts": {
7 | "build": "webpack"
8 | },
9 | "keywords": [
10 | "text-to-speech",
11 | "SpeechSynthesisUtterance"
12 | ],
13 | "author": "albirrkarim",
14 | "license": "MIT",
15 | "repository": {
16 | "type": "git",
17 | "url": "https://github.com/albirrkarim/react-speech-highlight-demo"
18 | },
19 | "peerDependencies": {
20 | "react": ">=17.0.0"
21 | },
22 | "devDependencies": {
23 | "@babel/core": "^7.17.8",
24 | "@babel/preset-env": "^7.16.11",
25 | "@babel/preset-react": "^7.16.7",
26 | "babel-loader": "^8.2.4",
27 | "css-loader": "^6.7.1",
28 | "style-loader": "^3.3.1",
29 | "webpack": "^5.71.0",
30 | "webpack-cli": "^4.10.0",
31 | "dotenv": "^16.3.1"
32 | }
33 | }
34 |
--------------------------------------------------------------------------------