├── yt_pic.png
├── README.md
└── speech2speech_code.ipynb

/yt_pic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ALucek/speech2speech-translation/main/yt_pic.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Supporting Code from my Speech-2-Speech YouTube Video!

[![s2s](yt_pic.png)](https://youtu.be/A_kLk-bEKSA)
Click play to watch :)
--------------------------------------------------------------------------------

/speech2speech_code.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "416cb6e6-d4e3-4bf5-913c-1b68391483e1",
   "metadata": {},
   "source": [
    "# Real Time Speech to Translated Speech\n",
    "\n",
    "Translating my own voice into a different language, as I speak it!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "708354e3-fa52-4691-9d10-c6127347555b",
   "metadata": {},
   "source": [
    "---\n",
    "## OpenAI For Translation\n",
    "\n",
    "Using GPT-4-Turbo to quickly generate nuanced translations. The language and sentence inputs to the chain can be changed dynamically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "bc837b48-1189-490e-a6b0-43c255c47117",
   "metadata": {},
   "outputs": [],
   "source": [
    "from operator import itemgetter\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "translation_template = \"\"\"\n",
    "Translate the following sentence into {language}, return ONLY the translation, nothing else.\n",
    "\n",
    "Sentence: {sentence}\n",
    "\"\"\"\n",
    "\n",
    "output_parser = StrOutputParser()\n",
    "llm = ChatOpenAI(temperature=0.0, model=\"gpt-4-turbo\")\n",
    "translation_prompt = ChatPromptTemplate.from_template(translation_template)\n",
    "\n",
    "translation_chain = (\n",
    "    # Pull each field out of the input dict so the prompt variables receive\n",
    "    # plain strings (RunnablePassthrough() here would hand the entire dict\n",
    "    # to both keys).\n",
    "    {\"language\": itemgetter(\"language\"), \"sentence\": itemgetter(\"sentence\")}\n",
    "    | translation_prompt\n",
    "    | llm\n",
    "    | output_parser\n",
    ")\n",
    "\n",
    "def translate(sentence, language=\"French\"):\n",
    "    data_input = {\"language\": language, \"sentence\": sentence}\n",
    "    translation = translation_chain.invoke(data_input)\n",
    "    return translation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0ad434a-25c9-475e-bc56-6ecee593e45b",
   "metadata": {},
   "source": [
    "---\n",
    "## ElevenLabs For Voice Cloning & Voice Synthesis\n",
    "\n",
    "Premade voice model on the [ElevenLabs Service](https://elevenlabs.io/app/voice-lab), using the Multilingual V2 model for synthesis.\n",
    "\n",
    "**Available Languages:** *Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil, English, Polish, German, Spanish, French, Italian, Hindi and Portuguese*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "73310fc1-ae0e-4943-bb42-f7a4e7e9ec30",
   "metadata": {},
   "outputs": [],
   "source": [
    "from elevenlabs.client import ElevenLabs\n",
    "from elevenlabs import play, stream\n",
    "\n",
    "client = ElevenLabs()\n",
    "\n",
    "def gen_dub(text):\n",
    "    print(\"Generating audio...\")\n",
    "    audio = client.generate(\n",
    "        text=text,\n",
    "        voice=\"\",  # Insert voice model here!\n",
    "        model=\"eleven_multilingual_v2\"\n",
    "    )\n",
    "    play(audio)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "327f96ae-44fa-4c32-a2d2-8afc6cd1f4ad",
   "metadata": {},
   "source": [
    "---\n",
    "## AssemblyAI for Speech to Text Streaming\n",
    "\n",
    "AssemblyAI handles streaming STT within their own platform. The translation and voice generation functions above are called from the transcript callback in this workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2df7059d-7ecc-4ab7-a205-deb9ed4dd466",
   "metadata": {},
   "outputs": [],
   "source": [
    "import assemblyai as aai\n",
    "\n",
    "def on_open(session_opened: aai.RealtimeSessionOpened):\n",
    "    \"This function is called when the connection has been established.\"\n",
    "    print(\"Session ID:\", session_opened.session_id)\n",
    "\n",
    "def on_data(transcript: aai.RealtimeTranscript):\n",
    "    \"This function is called when a new transcript has been received.\"\n",
    "    if not transcript.text:\n",
    "        return\n",
    "\n",
    "    if isinstance(transcript, aai.RealtimeFinalTranscript):\n",
    "        print(transcript.text, end=\"\\r\\n\")\n",
    "        print(\"Translating...\")\n",
    "        translation = translate(str(transcript.text))\n",
    "        print(f\"Translation: {translation}\")\n",
    "        gen_dub(translation)\n",
    "    else:\n",
    "        print(transcript.text, end=\"\\r\")\n",
    "\n",
    "def on_error(error: aai.RealtimeError):\n",
    "    \"This function is called when an error occurs.\"\n",
    "    print(\"An error occurred:\", error)\n",
    "\n",
    "def on_close():\n",
    "    \"This function is called when the connection has been closed.\"\n",
    "    print(\"Closing Session\")\n",
    "\n",
    "transcriber = aai.RealtimeTranscriber(\n",
    "    on_data=on_data,\n",
    "    on_error=on_error,\n",
    "    sample_rate=44_100,\n",
    "    on_open=on_open,  # optional\n",
    "    on_close=on_close,  # optional\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a68af17-7eca-40a9-9e27-341232ffb5fe",
   "metadata": {},
   "source": [
    "---\n",
    "## Main Script\n",
    "\n",
    "(Remember to set the audio input/output devices in your system settings, e.g. for AirPods.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a79283c3-0f5f-48a1-8839-c4cb0bacf11b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Session ID: a106c406-94c5-4615-b1ce-710bbc070207\n",
      "So here's an example of putting it together and just having the translation into French happen.\n",
      "Translating...\n",
      "Translation: Voici donc un exemple de mise en œuvre et de réalisation de la traduction en français.\n",
      "Perfect. So, as you can see, my voice is being transcribed, and then you'll see it. Sort of pop a little bit into a more formatted thing, which is the final object, and. Then that is what's passed to GPT four turbo and translated.\n",
      "Translating...\n",
      "Translation: Parfait. Donc, comme vous pouvez le voir, ma voix est transcrise, et ensuite vous la verrez. Cela ressemble un peu à un passage vers quelque chose de plus formaté, qui est l'objet final, et. Ensuite, c'est ce qui est transmis à GPT quatre turbo et traduit.\n"
     ]
    }
   ],
   "source": [
    "# Start the connection; you'll likely have to restart the kernel between runs\n",
    "# (runs better as a full script in something like VSCode)\n",
    "transcriber.connect()\n",
    "microphone_stream = aai.extras.MicrophoneStream()\n",
    "transcriber.stream(microphone_stream)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "00514bb7-c8d0-4147-9cf6-eb479f3ad436",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Closing Session\n"
     ]
    }
   ],
   "source": [
    "transcriber.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "731f6c9a-bf60-4f2c-8879-791494eae744",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
--------------------------------------------------------------------------------
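
As a quick offline illustration of what the notebook's translation chain assembles before calling the model, here is a minimal sketch using plain `str.format` in place of `ChatPromptTemplate` (no API key needed; `build_prompt` is a hypothetical helper for this sketch, not part of the notebook):

```python
# Offline sketch: str.format stands in for ChatPromptTemplate.
# The template text matches the notebook's translation_template.
translation_template = """
Translate the following sentence into {language}, return ONLY the translation, nothing else.

Sentence: {sentence}
"""

def build_prompt(sentence: str, language: str = "French") -> str:
    # Mirrors the {"language": ..., "sentence": ...} dict passed to the chain
    return translation_template.format(language=language, sentence=sentence)

print(build_prompt("Hello, world!"))
```

The real chain additionally wraps this text in a chat message and routes it through `gpt-4-turbo` and `StrOutputParser`, but the variable substitution is the same.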
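
The partial-versus-final dispatch inside `on_data` can also be exercised without a microphone or API key. This sketch uses stand-in transcript classes (the `Fake*` classes are illustrative assumptions, not the AssemblyAI SDK) to show that only final transcripts reach the translation step:

```python
# Offline simulation of the on_data dispatch: partial transcripts are only
# redrawn in place with "\r"; final transcripts trigger translation/dubbing.
class FakeTranscript:
    """Stand-in for aai.RealtimeTranscript (partial result)."""
    def __init__(self, text):
        self.text = text

class FakeFinalTranscript(FakeTranscript):
    """Stand-in for aai.RealtimeFinalTranscript (finalized result)."""

translated = []  # collects what would be sent to translate()/gen_dub()

def on_data(transcript):
    if not transcript.text:
        return
    if isinstance(transcript, FakeFinalTranscript):
        translated.append(transcript.text)  # stands in for translate() + gen_dub()
    # else: partial transcript, would be reprinted with end="\r"

for t in [FakeTranscript("Hel"), FakeTranscript("Hello wor"),
          FakeFinalTranscript("Hello world.")]:
    on_data(t)

print(translated)
```

This mirrors the notebook's flow: intermediate hypotheses are cheap to discard, and the expensive GPT-4-Turbo and ElevenLabs calls fire once per finalized utterance.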