├── yt_pic.png
├── README.md
└── speech2speech_code.ipynb

/yt_pic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ALucek/speech2speech-translation/main/yt_pic.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Supporting Code from my Speech-2-Speech YouTube Video!

[![s2s](yt_pic.png)](https://youtu.be/A_kLk-bEKSA)
Click play to watch :)
--------------------------------------------------------------------------------

/speech2speech_code.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "416cb6e6-d4e3-4bf5-913c-1b68391483e1",
   "metadata": {},
   "source": [
    "# Real Time Speech to Translated Speech\n",
    "\n",
    "Translating my own voice into a different language, as I speak it!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "708354e3-fa52-4691-9d10-c6127347555b",
   "metadata": {},
   "source": [
    "---\n",
    "## OpenAI For Translation\n",
    "\n",
    "Using GPT-4-Turbo to quickly generate nuanced translations. The language and sentence inputs to the chain can be changed dynamically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "bc837b48-1189-490e-a6b0-43c255c47117",
   "metadata": {},
   "outputs": [],
   "source": [
    "from operator import itemgetter\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "translation_template = \"\"\"\n",
    "Translate the following sentence into {language}, return ONLY the translation, nothing else.\n",
    "\n",
    "Sentence: {sentence}\n",
    "\"\"\"\n",
    "\n",
    "output_parser = StrOutputParser()\n",
    "llm = ChatOpenAI(temperature=0.0, model=\"gpt-4-turbo\")\n",
    "translation_prompt = ChatPromptTemplate.from_template(translation_template)\n",
    "\n",
    "translation_chain = (\n",
    "    # Pull each field out of the input dict so the prompt variables receive\n",
    "    # plain strings (RunnablePassthrough() here would hand the entire dict\n",
    "    # to both keys).\n",
    "    {\"language\": itemgetter(\"language\"), \"sentence\": itemgetter(\"sentence\")}\n",
    "    | translation_prompt\n",
    "    | llm\n",
    "    | output_parser\n",
    ")\n",
    "\n",
    "def translate(sentence, language=\"French\"):\n",
    "    data_input = {\"language\": language, \"sentence\": sentence}\n",
    "    translation = translation_chain.invoke(data_input)\n",
    "    return translation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0ad434a-25c9-475e-bc56-6ecee593e45b",
   "metadata": {},
   "source": [
    "---\n",
    "## ElevenLabs For Voice Cloning & Voice Synthesis\n",
    "\n",
    "Premade voice model on the [ElevenLabs Service](https://elevenlabs.io/app/voice-lab), using the Multilingual V2 model for synthesis.\n",
    "\n",
    "**Available Languages:** *Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil, English, Polish, German, Spanish, French, Italian, Hindi and Portuguese*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "73310fc1-ae0e-4943-bb42-f7a4e7e9ec30",
   "metadata": {},
   "outputs": [],
   "source": [
    "from elevenlabs.client import ElevenLabs\n",
    "from elevenlabs import play, stream\n",
    "\n",
    "client = ElevenLabs()\n",
    "\n",
    "def gen_dub(text):\n",
    "    print(\"Generating audio...\")\n",
    "    audio = client.generate(\n",
    "        text=text,\n",
    "        voice=\"\",  # Insert voice model here!\n",
    "        model=\"eleven_multilingual_v2\"\n",
    "    )\n",
    "    play(audio)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "327f96ae-44fa-4c32-a2d2-8afc6cd1f4ad",
   "metadata": {},
   "source": [
    "---\n",
    "## AssemblyAI for Speech to Text Streaming\n",
    "\n",
    "AssemblyAI handles streaming STT within their own platform. The translation and voice generation functions above are called from the transcript callback in this workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2df7059d-7ecc-4ab7-a205-deb9ed4dd466",
   "metadata": {},
   "outputs": [],
   "source": [
    "import assemblyai as aai\n",
    "\n",
    "def on_open(session_opened: aai.RealtimeSessionOpened):\n",
    "    \"This function is called when the connection has been established.\"\n",
    "    print(\"Session ID:\", session_opened.session_id)\n",
    "\n",
    "def on_data(transcript: aai.RealtimeTranscript):\n",
    "    \"This function is called when a new transcript has been received.\"\n",
    "    if not transcript.text:\n",
    "        return\n",
    "\n",
    "    if isinstance(transcript, aai.RealtimeFinalTranscript):\n",
    "        print(transcript.text, end=\"\\r\\n\")\n",
    "        print(\"Translating...\")\n",
    "        translation = translate(str(transcript.text))\n",
    "        print(f\"Translation: {translation}\")\n",
    "        gen_dub(translation)\n",
    "    else:\n",
    "        print(transcript.text, end=\"\\r\")\n",
    "\n",
    "def on_error(error: aai.RealtimeError):\n",
    "    \"This function is called when an error occurs.\"\n",
    "    print(\"An error occurred:\", error)\n",
    "\n",
    "def on_close():\n",
    "    \"This function is called when the connection has been closed.\"\n",
    "    print(\"Closing Session\")\n",
    "\n",
    "transcriber = aai.RealtimeTranscriber(\n",
    "    on_data=on_data,\n",
    "    on_error=on_error,\n",
    "    sample_rate=44_100,\n",
    "    on_open=on_open,  # optional\n",
    "    on_close=on_close,  # optional\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a68af17-7eca-40a9-9e27-341232ffb5fe",
   "metadata": {},
   "source": [
    "---\n",
    "## Main Script\n",
    "\n",
    "(Remember to set the audio input/output devices in your system settings, e.g. for AirPods.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a79283c3-0f5f-48a1-8839-c4cb0bacf11b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Session ID: a106c406-94c5-4615-b1ce-710bbc070207\n",
      "So here's an example of putting it together and just having the translation into French happen.\n",
      "Translating...\n",
      "Translation: Voici donc un exemple de mise en œuvre et de réalisation de la traduction en français.\n",
      "Perfect. So, as you can see, my voice is being transcribed, and then you'll see it. Sort of pop a little bit into a more formatted thing, which is the final object, and. Then that is what's passed to GPT four turbo and translated.\n",
      "Translating...\n",
      "Translation: Parfait. Donc, comme vous pouvez le voir, ma voix est transcrise, et ensuite vous la verrez. Cela ressemble un peu à un passage vers quelque chose de plus formaté, qui est l'objet final, et. Ensuite, c'est ce qui est transmis à GPT quatre turbo et traduit.\n"
     ]
    }
   ],
   "source": [
    "# Start the connection; you'll likely have to restart the kernel between runs\n",
    "# (runs better as a full script in something like VSCode)\n",
    "transcriber.connect()\n",
    "microphone_stream = aai.extras.MicrophoneStream()\n",
    "transcriber.stream(microphone_stream)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "00514bb7-c8d0-4147-9cf6-eb479f3ad436",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Closing Session\n"
     ]
    }
   ],
   "source": [
    "transcriber.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "731f6c9a-bf60-4f2c-8879-791494eae744",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
--------------------------------------------------------------------------------
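
As a quick offline illustration of what the notebook's translation chain assembles before calling the model, here is a minimal sketch using plain `str.format` in place of `ChatPromptTemplate` (no API key needed; `build_prompt` is a hypothetical helper for this sketch, not part of the notebook):

```python
# Offline sketch: str.format stands in for ChatPromptTemplate.
# The template text matches the notebook's translation_template.
translation_template = """
Translate the following sentence into {language}, return ONLY the translation, nothing else.

Sentence: {sentence}
"""

def build_prompt(sentence: str, language: str = "French") -> str:
    # Mirrors the {"language": ..., "sentence": ...} dict passed to the chain
    return translation_template.format(language=language, sentence=sentence)

print(build_prompt("Hello, world!"))
```

The real chain additionally wraps this text in a chat message and routes it through `gpt-4-turbo` and `StrOutputParser`, but the variable substitution is the same.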
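
The partial-versus-final dispatch inside `on_data` can also be exercised without a microphone or API key. This sketch uses stand-in transcript classes (the `Fake*` classes are illustrative assumptions, not the AssemblyAI SDK) to show that only final transcripts reach the translation step:

```python
# Offline simulation of the on_data dispatch: partial transcripts are only
# redrawn in place with "\r"; final transcripts trigger translation/dubbing.
class FakeTranscript:
    """Stand-in for aai.RealtimeTranscript (partial result)."""
    def __init__(self, text):
        self.text = text

class FakeFinalTranscript(FakeTranscript):
    """Stand-in for aai.RealtimeFinalTranscript (finalized result)."""

translated = []  # collects what would be sent to translate()/gen_dub()

def on_data(transcript):
    if not transcript.text:
        return
    if isinstance(transcript, FakeFinalTranscript):
        translated.append(transcript.text)  # stands in for translate() + gen_dub()
    # else: partial transcript, would be reprinted with end="\r"

for t in [FakeTranscript("Hel"), FakeTranscript("Hello wor"),
          FakeFinalTranscript("Hello world.")]:
    on_data(t)

print(translated)
```

This mirrors the notebook's flow: intermediate hypotheses are cheap to discard, and the expensive GPT-4-Turbo and ElevenLabs calls fire once per finalized utterance.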