├── Female.wav
├── Interviewer.mp3
├── README.md
├── requirements.txt
├── summarize_local.py
└── summarize_local_gpt4all.py


/Female.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DrewThomasson/doc2interview/e0591d42658c79b355aeae4f82bed398acfa99c1/Female.wav


--------------------------------------------------------------------------------
/Interviewer.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DrewThomasson/doc2interview/e0591d42658c79b355aeae4f82bed398acfa99c1/Interviewer.mp3


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # 🎙️ doc2interview
  2 | 
  3 | Welcome to the Interview Audio Generator! 
  4 | This  innovative tool automatically transforms PDF documents or online articles into engaging interview-style audio files. 
  5 | It's perfect for auditory learners or anyone who enjoys consuming content on the go!
  6 | And best of all it runs entirly locally on your computer! No paid api services or anything.
  7 | 
  8 | ## 📋 Prerequisites
  9 | 
 10 | Ensure you have the following installed:
 11 | - Python 3.8+
 12 | - Ollama with the `phi3.5` model pulled
 13 | 
 14 | - For faster results have a cuda capable machine so xtts can generate faster with a minimum of 4gb Vram
 15 | 
 16 | 
 17 | ## 🛠️ Installation
 18 | 
 19 | 1. **Clone the repository**:
 20 |    ```bash
 21 |    git clone https://github.com/DrewThomasson/doc2interview.git
 22 |    cd doc2interview
 23 |    ```
 24 | 
 25 | 2. **Install required Python packages**:
 26 |    - Install all necessary Python packages using the following command:
 27 |      ```bash
 28 |      pip install -r requirements.txt
 29 |      ```
 30 | 
 31 | 3. **Setup Ollama**:
 32 |    - Install Ollama following the official documentation. [link here](https://ollama.com)
 33 |    - Pull the `phi3.5` model necessary for running the script:
 34 |      ```bash
 35 |      ollama pull phi3.5
 36 |      ```
 37 | 
 38 | ## 🚀 Quick Start
 39 | 
 40 | 1. **Start the script**:
 41 |    ```bash
 42 |    python summarize_local.py
 43 |    ```
 44 | 2. **Open the Gradio interface**:
 45 |    - The interface will be available in your web browser.
 46 |    - Upload a PDF or enter an article URL.
 47 |    - Choose the language and let the magic happen!
 48 | 
 49 | ## 📁 Output
 50 | 
 51 | The generated audio files will be stored in:
 52 | - **Chapter-wise audio**: `./output_audio/`
 53 | - **Final combined audio**: `./final_output_audio_dir/final_output_audio.wav`
 54 | 
 55 | Feel free to explore the audio files and use them as needed!
 56 | 
 57 | ## 🎧 Demo
 58 | 
 59 | Check out this sample audio from a generated interview:
 60 | 
 61 | https://github.com/user-attachments/assets/77e6046d-18e0-41dd-b034-7cdd709b9daf
 62 | 
 63 | [Generated from this article](https://www.chosun.com/english/industry-en/2024/08/21/GGIYIGY43VHHVA2J74VAVWLEDQ/)
 64 | 
 65 | 
 66 | ## To-Do List
 67 | 
 68 | - [ ] **Attempt to find a way to remove any ramlbings at the end.**  
 69 |   For more details, see the issue request [here](https://github.com/DrewThomasson/doc2interview/issues/2#issue-2501722522).
 70 | 
 71 | - [ ] **Allow users to easily swap the ref audio for either voice actors in gui.**
 72 | 
 73 | - [ ] **Find a way for the program to be able to determine which speakers in generated dialog are male or female.**  
 74 |   Possably by just asking the llm.
 75 | 
 76 | - [ ] **Allow user to see streaming llm outout**  
 77 |   Right now you cant see the llm working live in the terminal, and only see the llm output once its finished writing the dialog script.
 78 | 
 79 | - [ ] **Have program automatically run the ollama pull command if the specified model isnt found?**  
 80 |   Not that huge of an issue though.
 81 | 
 82 | - [ ] **Have a bulk process mulitple files or documents feature**
 83 | 
 84 | 
 85 | 
 86 | ## 🤝 Contributing
 87 | 
 88 | Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
 89 | 
 90 | ## 📖 License
 91 | 
 92 | Distributed under the MIT License. See `LICENSE` for more information.
 93 | 
 94 | ## ❓ Support
 95 | 
 96 | Got questions? Feel free to open an issue or contact me directly at your-email@example.com.
 97 | 
 98 | ## 🌟 Show your support
 99 | 
100 | Give a ⭐️ if this project helped you!
101 | 
102 | ## Inspired by 
103 | AiPeterWorld with his non-offline version which used gemini flash and openai voice for tts
104 | 
105 | https://huggingface.co/spaces/AIPeterWorld/Doc-To-Dialogue
106 | 
107 | 
108 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | pydub
 2 | nltk
 3 | torch
 4 | tqdm
 5 | gradio
 6 | PyMuPDF  # This is the PyMuPDF package, necessary for handling PDF files
 7 | newspaper3k
 8 | ollama  # This might need to be also installed from a different source or setup as per official documentation
 9 | tts
10 | 


--------------------------------------------------------------------------------
/summarize_local.py:
--------------------------------------------------------------------------------
  1 | print("starting...")
  2 | 
  3 | import os
  4 | import shutil
  5 | import subprocess
  6 | import re
  7 | from pydub import AudioSegment
  8 | import tempfile
  9 | from pydub import AudioSegment
 10 | import os
 11 | import nltk
 12 | from nltk.tokenize import sent_tokenize
 13 | import sys
 14 | import torch
 15 | from TTS.api import TTS
 16 | from TTS.tts.configs.xtts_config import XttsConfig
 17 | from TTS.tts.models.xtts import Xtts
 18 | from tqdm import tqdm
 19 | import gradio as gr
 20 | from gradio import Progress
 21 | import urllib.request
 22 | import zipfile
 23 | 
 24 | 
 25 | 
 26 | import logging
 27 | import asyncio
 28 | from pathlib import Path
 29 | from pydub import AudioSegment
 30 | import gradio as gr
 31 | import torch
 32 | from TTS.api import TTS
 33 | from tqdm import tqdm
 34 | import fitz  # PyMuPDF
 35 | from newspaper import Article
 36 | from ollama import Client
 37 | 
 38 | import os
 39 | 
 40 | 
 41 | 
 42 | 
 43 | 
 44 | sample_ollama_response = """
 45 | Interviewer: Good day, Andrew Cunningham! We understand that you've delved into Microsoft’s Control Panel history and its potential future direction in line with the introduction of new operating systems such as Windows 10 and beyond. Can we explore this topic further? Thank you for joining us today.
 46 | 
 47 | Andrew: Absolutely, I believe it is quite an intriguing journey to trace back through Microsoft's user interface history! The Control Panel has been a fundamental part of the operating system experience since Windows NT 4 in 1996 – that’s more than three decades old.
 48 | 
 49 | Interviewer: That's right, Andrew. And with these years comes plenty of evolution and change for such an essential tool. The Control Panel provided a centralized location to view system settings which have been adjusted over the course of Windows history through various applets that users could interact with – from setting up hardware configurations down to network connections...
 50 | 
 51 | Andrew: Yes, indeed! Many elements in our day-to-day life are tied to these panels. For example, anyone who's used a computer for even an hour has probably adjusted the system date and time or changed their display settings at least once using Control Panel applets…
 52 | 
 53 | Interviewer: With that said, Microsoft’s recent note on support sites indicates it might be phasing out these traditional panels in favor of something more modern. What are your thoughts about this move? The company seems to promote a shift towards the Settings App which offers streamlined functionality and is tailored for touchscreen devices...
 54 | 
 55 | Andrew: It's an interesting development, I must admit! Microsoft has been moving forward with its Windows operating systems by introducing more modern interfaces. Although we still see older Control Panel applets in updates like 24H2, the fact remains that these traditional panels are being deprecated for newer alternatives – a sign of progress indeed…
 56 | 
 57 | Interviewer: But some users may have an attachment to these old designs and icons which date back as far as Windows Vista with its rounded glassy appearance. How significant is this change in design language, considering Microsoft’s attempt at keeping the interface cohesive? The company's modernization efforts are certainly visible throughout newer apps like Paint or Notepad…
 58 | 
 59 | Andrew: Yes, that was indeed a remarkable shift! It does pose questions about preserving legacy aspects of Windows. However, as operating systems mature and evolve with time to align better with the current technology landscape – touchscreens being more common now than ever before - these changes seem almost logical...
 60 | 
 61 | Interviewer: In summary Andrew, we can see that Microsoft has a significant history attached to its Control Panel interface. The company is progressively transitioning towards newer and streamlined interfaces while not completely neglecting the older styles for some user preferences… Any final thoughts on what you believe this might mean going forward?
 62 | 
 63 | Andrew: Well it’s always challenging when such a profound component of our daily interactions with technology changes. I suspect Microsoft will aim to balance new advancements and legacy features, as they've done previously – offering more contemporary interfaces without completely disregarding the older ones that many users might still prefer...
 64 | 
 65 | Interviewer: Thank you for your insights today, Andrew Cunningham! It’s been illuminating discussing Microsoft Windows evolution with a seasoned observer like yourself. Good day and hope to chat again soon!
 66 | 
 67 | Andrew: The pleasure was all mine – I always enjoy our talks on such fascinating topics too; catch you later!
 68 | """
 69 | 
 70 | 
 71 | 
 72 | 
 73 | 
 74 | logging.basicConfig(
 75 |     level=logging.INFO,
 76 |     format='%(asctime)s - %(levelname)s - %(message)s'
 77 | )
 78 | 
 79 | def wipe_folder(folder_path):
 80 |     # Check if the folder exists
 81 |     if not os.path.exists(folder_path):
 82 |         print(f"The folder {folder_path} does not exist.")
 83 |         return
 84 | 
 85 |     # Iterate over all the items in the given folder
 86 |     for item in os.listdir(folder_path):
 87 |         item_path = os.path.join(folder_path, item)
 88 |         # If it's a file, remove it and print a message
 89 |         if os.path.isfile(item_path):
 90 |             os.remove(item_path)
 91 |             print(f"Removed file: {item_path}")
 92 |         # If it's a directory, remove it recursively and print a message
 93 |         elif os.path.isdir(item_path):
 94 |             shutil.rmtree(item_path)
 95 |             print(f"Removed directory and its contents: {item_path}")
 96 |     
 97 |     print(f"All contents wiped from {folder_path}.")
 98 | 
 99 | def fetch_text_from_url(url):
100 |     """Fetch main text from the provided URL using newspaper3k."""
101 |     try:
102 |         article = Article(url)
103 |         article.download()
104 |         article.parse()
105 |         return article.text
106 |     except Exception as e:
107 |         logging.error(f"Failed to fetch text from URL: {e}")
108 |         return None
109 | 
110 | def convert_pdf_to_text(pdf_path):
111 |     """Convert PDF file to text using PyMuPDF."""
112 |     text = ""
113 |     with fitz.open(pdf_path) as pdf:
114 |         for page in pdf:
115 |             text += page.get_text()
116 |     return text
117 | 
118 | 
119 | #Standard non-streaming version
120 | def run_ollama(prompt, model="phi3.5"):
121 |     """Run Ollama locally with the given model and prompt using the Python API."""
122 |     client = Client()
123 |     logging.info(f"Running Ollama with model: {model} and prompt: {prompt}")
124 |     try:
125 |         response = client.generate(model=model, prompt=prompt)
126 |         output = response['response']
127 |         logging.info(f"Ollama response: {output}")
128 |         return output
129 |     except Exception as e:
130 |         logging.error(f"Ollama error: {str(e)}")
131 |         return None
132 | 
133 | 
134 | # Streaming version with buffered output
135 | '''
136 | def run_ollama(prompt, model="phi3.5"):
137 |     """Run Ollama locally with the given model and prompt using the Python API."""
138 |     client = Client()
139 |     logging.info(f"Running Ollama with model: {model} and prompt: {prompt}")
140 |     output = []
141 |     buffer = ""
142 |     
143 |     try:
144 |         # Stream the response and buffer chunks
145 |         stream = client.generate(model=model, prompt=prompt, stream=True)
146 |         for chunk in stream:
147 |             if 'response' in chunk:
148 |                 buffer += chunk['response']
149 |                 
150 |                 # Print the buffer if a complete sentence or newline is detected
151 |                 if buffer.endswith('.') or buffer.endswith('\n'):
152 |                     print(buffer, end='', flush=True)
153 |                     output.append(buffer)
154 |                     buffer = ""  # Clear buffer after printing
155 |             else:
156 |                 logging.error(f"Unexpected chunk format: {chunk}")
157 |         
158 |         # Print any remaining content in the buffer
159 |         if buffer:
160 |             print(buffer, end='', flush=True)
161 |             output.append(buffer)
162 |         
163 |         # Join all chunks to form the complete output
164 |         full_output = ''.join(output)
165 |         logging.info(f"Ollama response: {full_output}")
166 |         print(f"Full output is: {full_output}")
167 |         return full_output
168 |     except Exception as e:
169 |         logging.error(f"Ollama error: {str(e)}")
170 |         return None
171 | '''
172 | 
173 | 
174 | 
175 | def generate_prompt(language, stage):
176 |     """Generate the appropriate prompt based on the language and stage."""
177 |     if stage == 1:
178 |         if language.lower() == "english":
179 |             return (
180 |                 "English Version:\n\n"
181 |                 "Generate an in-depth and coherent interview in dialogue format that reflects the key aspects of the provided document. "
182 |                 "Include a brief introduction by the interviewer, followed by a series of questions and responses, concluding with a summary."
183 |                 " Output should be plain text, with each dialogue line separated by two new lines."
184 |             )
185 |         else:
186 |             return (
187 |                 "Versión en Español:\n\n"
188 |                 "Genera una entrevista coherente en formato de diálogo que refleje los aspectos clave del documento proporcionado. "
189 |                 "Incluye una breve introducción por el entrevistador, seguida de una serie de preguntas y respuestas, concluyendo con un resumen."
190 |                 " El resultado debe ser texto plano, con cada línea de diálogo separada por dos nuevas líneas."
191 |             )
192 |     if stage == 2:
193 |         if language.lower() == "english":
194 |             return (
195 |                 "Please return the given dialog script with any unrelated ramblings removed from it. "
196 |                 "Your response should contain nothing except for the modified script:"
197 |             )
198 |         else:
199 |             return (
200 |                 "Por favor, devuelva el guion de diálogo dado con cualquier divagación no relacionada eliminada. "
201 |                 "Su respuesta debe contener solo el guion modificado:"
202 |             )
203 | 
204 | def get_chat_response(text, language):
205 |     """Generate interview based on text and handle response."""
206 |     print("Crafting Interview...")
207 |     prompt_stage = generate_prompt(language, 1)
208 |     interview = run_ollama(prompt_stage + "\n\n" + text + "\n\n" + prompt_stage)
209 |     print("Cleaning up Interview...")
210 |     prompt_stage = generate_prompt(language, 2)
211 |     cleaned_interview = run_ollama(prompt_stage + "\n\n" + interview + "\n\n" + prompt_stage)
212 |     #interview = sample_ollama_response
213 |     return interview.split('\n\n')  # Splitting by two new lines as per the new format
214 | 
215 | # Setup TTS using 🐸TTS
216 | device = "cuda" if torch.cuda.is_available() else "cpu"
217 | #tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
218 | 
219 | def remove_prefix(text):
220 |     """Remove any prefix before and including the first colon, if present."""
221 |     index = text.find(':')
222 |     if index != -1:
223 |         return text[index + 1:].lstrip()
224 |     return text
225 | 
226 | 
227 | def remove_prefix_from_all_txt_files_in_folder(folder_path):
228 |     """Remove any prefix before and including the first colon in every .txt file in the specified folder."""
229 |     
230 |     def remove_prefix(text):
231 |         """Remove any prefix before and including the first colon, if present."""
232 |         index = text.find(':')
233 |         if index != -1:
234 |             return text[index + 1:].lstrip()
235 |         return text
236 | 
237 |     for filename in os.listdir(folder_path):
238 |         if filename.endswith('.txt'):
239 |             file_path = os.path.join(folder_path, filename)
240 |             with open(file_path, 'r') as file:
241 |                 content = file.readlines()
242 |             
243 |             # Apply remove_prefix to each line
244 |             new_content = [remove_prefix(line) for line in content]
245 |             
246 |             # Write the modified content back to the file
247 |             with open(file_path, 'w') as file:
248 |                 file.writelines(new_content)
249 | 
250 |     print("Prefix removed from all text files in the folder.")
251 | 
252 | # Usage example
253 | #folder_path = '/path/to/your/folder'  # Replace with your folder path
254 | #remove_prefix_from_folder(folder_path)
255 | 
256 | 
257 | def create_chapter_files(chapters, output_folder):
258 |     # Ensure the output directory exists, create if it doesn't
259 |     os.makedirs(output_folder, exist_ok=True)
260 |     
261 |     for i, chapter in enumerate(chapters, start=1):
262 |         file_path = os.path.join(output_folder, f"chapter_{i}.txt")
263 |         with open(file_path, "w") as file:
264 |             file.write(chapter)
265 | 
266 | # Combine WAV files into a single file
267 | def combine_wav_files(input_directory, output_directory, file_name):
268 |     # Ensure that the output directory exists, create it if necessary
269 |     os.makedirs(output_directory, exist_ok=True)
270 | 
271 |     # Specify the output file path
272 |     output_file_path = os.path.join(output_directory, file_name)
273 | 
274 |     # Initialize an empty audio segment
275 |     combined_audio = AudioSegment.empty()
276 | 
277 |     # Get a list of all .wav files in the specified input directory and sort them
278 |     input_file_paths = sorted(
279 |         [os.path.join(input_directory, f) for f in os.listdir(input_directory) if f.endswith(".wav")],
280 |         key=lambda f: int(''.join(filter(str.isdigit, f)))
281 |     )
282 | 
283 |     # Sequentially append each file to the combined_audio
284 |     for input_file_path in input_file_paths:
285 |         audio_segment = AudioSegment.from_wav(input_file_path)
286 |         combined_audio += audio_segment
287 | 
288 |     # Export the combined audio to the output file path
289 |     combined_audio.export(output_file_path, format='wav')
290 | 
291 |     print(f"Combined audio saved to {output_file_path}")
292 | 
293 | # Function to split long strings into parts
294 | def split_long_sentence(sentence, max_length=230, max_pauses=8):
295 |     """
296 |     Splits a sentence into parts based on length or number of pauses without recursion.
297 |     
298 |     :param sentence: The sentence to split.
299 |     :param max_length: Maximum allowed length of a sentence.
300 |     :param max_pauses: Maximum allowed number of pauses in a sentence.
301 |     :return: A list of sentence parts that meet the criteria.
302 |     """
303 |     parts = []
304 |     while len(sentence) > max_length or sentence.count(',') + sentence.count(';') + sentence.count('.') > max_pauses:
305 |         possible_splits = [i for i, char in enumerate(sentence) if char in ',;.' and i < max_length]
306 |         if possible_splits:
307 |             # Find the best place to split the sentence, preferring the last possible split to keep parts longer
308 |             split_at = possible_splits[-1] + 1
309 |         else:
310 |             # If no punctuation to split on within max_length, split at max_length
311 |             split_at = max_length
312 |         
313 |         # Split the sentence and add the first part to the list
314 |         parts.append(sentence[:split_at].strip())
315 |         sentence = sentence[split_at:].strip()
316 |     
317 |     # Add the remaining part of the sentence
318 |     parts.append(sentence)
319 |     return parts
320 | 
321 | 
322 | #This function goes through the chapter dir and genrates a chapter for each chapter_1.txt and so on files
323 | def convert_chapters_to_audio_standard_model(chapters_dir, output_audio_dir, target_voice_path=None, language=None):
324 |     selected_tts_model = "tts_models/multilingual/multi-dataset/xtts_v2"
325 |     tts = TTS(selected_tts_model, progress_bar=False).to(device)
326 | 
327 |     if not os.path.exists(output_audio_dir):
328 |         os.makedirs(output_audio_dir)
329 |     Narrerator_status = True
330 | 
331 |     for chapter_file in sorted(os.listdir(chapters_dir), key=lambda x: int(re.search(r"chapter_(\d+).txt", x).group(1)) if re.search(r"chapter_(\d+).txt", x) else float('inf')):
332 |         if chapter_file.endswith('.txt'):
333 |             match = re.search(r"chapter_(\d+).txt", chapter_file)
334 |             if match:
335 |                 chapter_num = int(match.group(1))
336 |             else:
337 |                 print(f"Skipping file {chapter_file} as it does not match the expected format.")
338 |                 continue
339 | 
340 |             chapter_path = os.path.join(chapters_dir, chapter_file)
341 |             output_file_name = f"audio_chapter_{chapter_num}.wav"
342 |             output_file_path = os.path.join(output_audio_dir, output_file_name)
343 |             temp_audio_directory = os.path.join(".", "Working_files", "temp")
344 |             os.makedirs(temp_audio_directory, exist_ok=True)
345 |             temp_count = 0
346 | 
347 |             with open(chapter_path, 'r', encoding='utf-8') as file:
348 |                 chapter_text = file.read()
349 |                 sentences = sent_tokenize(chapter_text, language='italian' if language == 'it' else 'english')
350 |                 for sentence in tqdm(sentences, desc=f"Chapter {chapter_num}"):
351 |                     fragments = split_long_sentence(sentence, max_length=249 if language == "en" else 213, max_pauses=10)
352 |                     for fragment in fragments:
353 |                         if fragment != "":
354 |                             print(f"Generating fragment: {fragment}...")
355 |                             fragment_file_path = os.path.join(temp_audio_directory, f"{temp_count}.wav")
356 |                             #speaker_wav_path = target_voice_path if target_voice_path else default_target_voice_path
357 |                             language_code = language if language else default_language_code
358 |                             if Narrerator_status == True:
359 |                                 tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Interviewer.mp3", language=language_code)
360 |                             if Narrerator_status == False:
361 |                                 tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Female.wav", language=language_code)
362 |                             temp_count += 1
363 | 
364 |             combine_wav_files(temp_audio_directory, output_audio_dir, output_file_name)
365 |             wipe_folder(temp_audio_directory)
366 |             print(f"Converted chapter {chapter_num} to audio.")
367 |             #This will swap the status of the Narrerator status boolean value
368 |             Narrerator_status = not Narrerator_status
369 | 
370 | async def generate_and_combine_audio_files(dialogues, output_dir, base_name):
371 |     """Generate audio files for dialogues and combine them."""
372 |     file_number = 1  # Start numbering from 0000001
373 |     is_interviewer = True  # Start with interviewer as the first speaker
374 |     for dialogue in tqdm(dialogues, desc="Generating audio"):
375 |         if dialogue.strip():  # Check if there is actual dialogue content
376 |             generate_audio()
377 |             print(f"Generating audio...: Interviewer is : {is_interviewer} dialogue is {dialogue}")
378 |             is_interviewer = not is_interviewer  # Toggle speaker after each dialogue block
379 |     combined_audio_path = output_dir / f"{base_name}.wav"
380 |     print(f"combining audio files...")
381 |     combine_audio()
382 |     return combined_audio_path
383 | 
384 | async def main_async(input_data, language):
385 |     """Main function to process input and generate audio."""
386 |     text = ""
387 |     if isinstance(input_data, Path):
388 |         text = convert_pdf_to_text(input_data)
389 |     else:
390 |         text = fetch_text_from_url(input_data)
391 |     dialogues = get_chat_response(text, language)
392 |     #create chapter files from dialog
393 |     chaptertxt_folder = "chapters_txt"
394 |     create_chapter_files(dialogues, chaptertxt_folder)
395 | 
396 |     #This will remove all the prefix from all the txt files in the chaptertxt_folder folder
397 |     remove_prefix_from_all_txt_files_in_folder(chaptertxt_folder)
398 | 
399 |     #generate audio for all chapter files
400 |     output_audio_dir = "output_audio"
401 |     convert_chapters_to_audio_standard_model(chaptertxt_folder, output_audio_dir, target_voice_path=None, language='en')
402 | 
403 |     #combine all the audio files into a single final output audio file
404 |     final_output_audio_dir = "final_output_audio_dir"
405 |     combine_wav_files(output_audio_dir, final_output_audio_dir, "final_output_audio.wav")
406 | 
407 |     #wipe all the temp folders
408 |     wipe_folder("Working_files")
409 |     wipe_folder("Working_files/temp")
410 |     wipe_folder("output_audio")
411 |     wipe_folder("chapters_txt")
412 | 
413 | 
414 |     return "Complete!"
415 | 
416 | def gradio_interface(input_file, url, language):
417 |     # Check if both PDF and URL are provided
418 |     if input_file is not None and url.strip() != "":
419 |         return "Error: Please provide either a PDF file or a URL, not both."
420 |     
421 |     # Check if a PDF file is provided
422 |     if input_file is not None:
423 |         input_data = Path(input_file.name)  # Correctly handle the PDF file
424 |     elif url.strip() != "":
425 |         input_data = url  # Use the URL
426 |     else:
427 |         return "Error: Please provide a PDF file or a URL."
428 |     
429 |     try:
430 |         #audio_file_path = asyncio.run(main_async(input_data, language, input_file is not None))
431 |         audio_file_path = asyncio.run(main_async(input_data, language))
432 |         return audio_file_path
433 |     except Exception as e:
434 |         logging.error(f"{e}")
435 |         return str(e)
436 | 
437 | 
438 | # Setup Gradio interface
439 | demo = gr.Interface(
440 |     fn=gradio_interface,
441 |     inputs=[
442 |         gr.File(label="Upload PDF / Subir PDF", type="filepath"),
443 |         gr.Textbox(label="Or Enter Article URL", placeholder="Enter URL here"),
444 |         gr.Dropdown(label="Select Language / Seleccionar idioma", choices=["English", "Spanish"], value="English")
445 |     ],
446 |     outputs=gr.Audio(label="Generated Interview / Entrevista generada"),
447 |     allow_flagging="never"
448 | )
449 | 
450 | # Launch Gradio interface
451 | demo.launch(share=False)  # Set share=True to create a public link
452 | 


--------------------------------------------------------------------------------
/summarize_local_gpt4all.py:
--------------------------------------------------------------------------------
  1 | #You need to add these imports to a docker file
  2 | #the dockerfile should be using python3.10
  3 | 
  4 | #pip install pydub 
  5 | #pip install nltk
  6 | #pip install torch
  7 | #pip install torchvision
  8 | #pip install torchaudio
  9 | #pip install TTS 
 10 | #pip install tqdm
 11 | #pip install gradio
 12 | #pip install PyMuPDF
 13 | #pip install newspaper3k 
 14 | #pip install gpt4all
 15 | #pip install tqdm
 16 | 
 17 | print("starting...")
 18 | 
 19 | import os
 20 | import shutil
 21 | import subprocess
 22 | import re
 23 | from pydub import AudioSegment
 24 | import tempfile
 25 | from pydub import AudioSegment
 26 | import os
 27 | import nltk
 28 | from nltk.tokenize import sent_tokenize
 29 | import sys
 30 | import torch
 31 | from TTS.api import TTS
 32 | from TTS.tts.configs.xtts_config import XttsConfig
 33 | from TTS.tts.models.xtts import Xtts
 34 | from tqdm import tqdm
 35 | import gradio as gr
 36 | from gradio import Progress
 37 | import urllib.request
 38 | import zipfile
 39 | 
 40 | import logging
 41 | import asyncio
 42 | from pathlib import Path
 43 | from pydub import AudioSegment
 44 | import gradio as gr
 45 | import torch
 46 | from TTS.api import TTS
 47 | from tqdm import tqdm
 48 | import fitz  # PyMuPDF
 49 | from newspaper import Article
 50 | from gpt4all import GPT4All
 51 | 
 52 | import os
 53 | from gpt4all import GPT4All
 54 | import logging
 55 | 
 56 | logging.basicConfig(
 57 |     level=logging.INFO,
 58 |     format='%(asctime)s - %(levelname)s - %(message)s'
 59 | )
 60 | 
 61 | def wipe_folder(folder_path):
 62 |     # Check if the folder exists
 63 |     if not os.path.exists(folder_path):
 64 |         print(f"The folder {folder_path} does not exist.")
 65 |         return
 66 | 
 67 |     # Iterate over all the items in the given folder
 68 |     for item in os.listdir(folder_path):
 69 |         item_path = os.path.join(folder_path, item)
 70 |         # If it's a file, remove it and print a message
 71 |         if os.path.isfile(item_path):
 72 |             os.remove(item_path)
 73 |             print(f"Removed file: {item_path}")
 74 |         # If it's a directory, remove it recursively and print a message
 75 |         elif os.path.isdir(item_path):
 76 |             shutil.rmtree(item_path)
 77 |             print(f"Removed directory and its contents: {item_path}")
 78 |     
 79 |     print(f"All contents wiped from {folder_path}.")
 80 | 
 81 | def fetch_text_from_url(url):
 82 |     """Fetch main text from the provided URL using newspaper3k."""
 83 |     try:
 84 |         article = Article(url)
 85 |         article.download()
 86 |         article.parse()
 87 |         return article.text
 88 |     except Exception as e:
 89 |         logging.error(f"Failed to fetch text from URL: {e}")
 90 |         return None
 91 | 
 92 | def convert_pdf_to_text(pdf_path):
 93 |     """Convert PDF file to text using PyMuPDF."""
 94 |     text = ""
 95 |     with fitz.open(pdf_path) as pdf:
 96 |         for page in pdf:
 97 |             text += page.get_text()
 98 |     return text
 99 | 
100 | def run_gpt4all(prompt, model="Phi-3.5-mini-instruct.Q4_0.gguf"):
101 |     """Run GPT4All locally with the given model and prompt using the Python API."""
102 |     gpt4all = GPT4All(model)
103 |     logging.info(f"Running GPT4All with model: {model} and prompt: {prompt}")
104 |     try:
105 |         with gpt4all.chat_session():  # Use chat_session for managing context
106 |             response = gpt4all.generate(prompt, max_tokens=1024)  # Generate response
107 |         logging.info(f"GPT4All response: {response}")
108 |         return response
109 |     except Exception as e:
110 |         logging.error(f"GPT4All error: {str(e)}")
111 |         return None
112 | 
113 | 
114 | def generate_prompt(language, stage):
115 |     """Generate the appropriate prompt based on the language and stage."""
116 |     if language.lower() == "english":
117 |         return (
118 |             "English Version:\n\n"
119 |             "Generate an in-depth and coherent interview in dialogue format that reflects the key aspects of the provided document. "
120 |             "Include a brief introduction by the interviewer, followed by a series of questions and responses, concluding with a summary."
121 |             " Output should be plain text, with each dialogue line separated by two new lines."
122 |         )
123 |     else:
124 |         return (
125 |             "Versión en Español:\n\n"
126 |             "Genera una entrevista coherente en formato de diálogo que refleje los aspectos clave del documento proporcionado. "
127 |             "Incluye una breve introducción por el entrevistador, seguida de una serie de preguntas y respuestas, concluyendo con un resumen."
128 |             " El resultado debe ser texto plano, con cada línea de diálogo separada por dos nuevas líneas."
129 |         )
130 | 
131 | def get_chat_response(text, language):
132 |     """Generate interview based on text and handle response."""
133 |     prompt_stage = generate_prompt(language, 1)
134 |     interview = run_gpt4all(prompt_stage + "\n\n" + text)
135 |     return interview.split('\n\n')  # Splitting by two new lines as per the new format
136 | 
137 | # Setup TTS using 🐸TTS
138 | device = "cuda" if torch.cuda.is_available() else "cpu"
139 | 
140 | def remove_prefix(text):
141 |     """Remove any prefix before and including the first colon, if present."""
142 |     index = text.find(':')
143 |     if index != -1:
144 |         return text[index + 1:].lstrip()
145 |     return text
146 | 
147 | def remove_prefix_from_all_txt_files_in_folder(folder_path):
148 |     """Remove any prefix before and including the first colon in every .txt file in the specified folder."""
149 |     for filename in os.listdir(folder_path):
150 |         if filename.endswith('.txt'):
151 |             file_path = os.path.join(folder_path, filename)
152 |             with open(file_path, 'r') as file:
153 |                 content = file.readlines()
154 |             
155 |             # Apply remove_prefix to each line
156 |             new_content = [remove_prefix(line) for line in content]
157 |             
158 |             # Write the modified content back to the file
159 |             with open(file_path, 'w') as file:
160 |                 file.writelines(new_content)
161 | 
162 |     print("Prefix removed from all text files in the folder.")
163 | 
164 | def create_chapter_files(chapters, output_folder):
165 |     # Ensure the output directory exists, create if it doesn't
166 |     os.makedirs(output_folder, exist_ok=True)
167 |     
168 |     for i, chapter in enumerate(chapters, start=1):
169 |         file_path = os.path.join(output_folder, f"chapter_{i}.txt")
170 |         with open(file_path, "w") as file:
171 |             file.write(chapter)
172 | 
173 | # Combine WAV files into a single file
174 | def combine_wav_files(input_directory, output_directory, file_name):
175 |     # Ensure that the output directory exists, create it if necessary
176 |     os.makedirs(output_directory, exist_ok=True)
177 | 
178 |     # Specify the output file path
179 |     output_file_path = os.path.join(output_directory, file_name)
180 | 
181 |     # Initialize an empty audio segment
182 |     combined_audio = AudioSegment.empty()
183 | 
184 |     # Get a list of all .wav files in the specified input directory and sort them
185 |     input_file_paths = sorted(
186 |         [os.path.join(input_directory, f) for f in os.listdir(input_directory) if f.endswith(".wav")],
187 |         key=lambda f: int(''.join(filter(str.isdigit, f)))
188 |     )
189 | 
190 |     # Sequentially append each file to the combined_audio
191 |     for input_file_path in input_file_paths:
192 |         audio_segment = AudioSegment.from_wav(input_file_path)
193 |         combined_audio += audio_segment
194 | 
195 |     # Export the combined audio to the output file path
196 |     combined_audio.export(output_file_path, format='wav')
197 | 
198 |     print(f"Combined audio saved to {output_file_path}")
199 | 
200 | # Function to split long strings into parts
201 | def split_long_sentence(sentence, max_length=230, max_pauses=8):
202 |     """
203 |     Splits a sentence into parts based on length or number of pauses without recursion.
204 |     
205 |     :param sentence: The sentence to split.
206 |     :param max_length: Maximum allowed length of a sentence.
207 |     :param max_pauses: Maximum allowed number of pauses in a sentence.
208 |     :return: A list of sentence parts that meet the criteria.
209 |     """
210 |     parts = []
211 |     while len(sentence) > max_length or sentence.count(',') + sentence.count(';') + sentence.count('.') > max_pauses:
212 |         possible_splits = [i for i, char in enumerate(sentence) if char in ',;.' and i < max_length]
213 |         if possible_splits:
214 |             # Find the best place to split the sentence, preferring the last possible split to keep parts longer
215 |             split_at = possible_splits[-1] + 1
216 |         else:
217 |             # If no punctuation to split on within max_length, split at max_length
218 |             split_at = max_length
219 |         
220 |         # Split the sentence and add the first part to the list
221 |         parts.append(sentence[:split_at].strip())
222 |         sentence = sentence[split_at:].strip()
223 |     
224 |     # Add the remaining part of the sentence
225 |     parts.append(sentence)
226 |     return parts
227 | 
228 | #This function goes through the chapter dir and generates a chapter for each chapter_1.txt and so on files
229 | def convert_chapters_to_audio_standard_model(chapters_dir, output_audio_dir, target_voice_path=None, language=None):
230 |     selected_tts_model = "tts_models/multilingual/multi-dataset/xtts_v2"
231 |     tts = TTS(selected_tts_model, progress_bar=False).to(device)
232 | 
233 |     if not os.path.exists(output_audio_dir):
234 |         os.makedirs(output_audio_dir)
235 |     Narrerator_status = True
236 | 
237 |     for chapter_file in sorted(os.listdir(chapters_dir), key=lambda x: int(re.search(r"chapter_(\d+).txt", x).group(1)) if re.search(r"chapter_(\d+).txt", x) else float('inf')):
238 |         if chapter_file.endswith('.txt'):
239 |             match = re.search(r"chapter_(\d+).txt", chapter_file)
240 |             if match:
241 |                 chapter_num = int(match.group(1))
242 |             else:
243 |                 print(f"Skipping file {chapter_file} as it does not match the expected format.")
244 |                 continue
245 | 
246 |             chapter_path = os.path.join(chapters_dir, chapter_file)
247 |             output_file_name = f"audio_chapter_{chapter_num}.wav"
248 |             output_file_path = os.path.join(output_audio_dir, output_file_name)
249 |             temp_audio_directory = os.path.join(".", "Working_files", "temp")
250 |             os.makedirs(temp_audio_directory, exist_ok=True)
251 |             temp_count = 0
252 | 
253 |             with open(chapter_path, 'r', encoding='utf-8') as file:
254 |                 chapter_text = file.read()
255 |                 sentences = sent_tokenize(chapter_text, language='italian' if language == 'it' else 'english')
256 |                 for sentence in tqdm(sentences, desc=f"Chapter {chapter_num}"):
257 |                     fragments = split_long_sentence(sentence, max_length=249 if language == "en" else 213, max_pauses=10)
258 |                     for fragment in fragments:
259 |                         if fragment != "":
260 |                             print(f"Generating fragment: {fragment}...")
261 |                             fragment_file_path = os.path.join(temp_audio_directory, f"{temp_count}.wav")
262 |                             #speaker_wav_path = target_voice_path if target_voice_path else default_target_voice_path
263 |                             language_code = language if language else default_language_code
264 |                             if Narrerator_status == True:
265 |                                 tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Interviewer.mp3", language=language_code)
266 |                             if Narrerator_status == False:
267 |                                 tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Female.wav", language=language_code)
268 |                             temp_count += 1
269 | 
270 |             combine_wav_files(temp_audio_directory, output_audio_dir, output_file_name)
271 |             wipe_folder(temp_audio_directory)
272 |             print(f"Converted chapter {chapter_num} to audio.")
273 |             #This will swap the status of the Narrerator status boolean value
274 |             Narrerator_status = not Narrerator_status
275 | 
276 | async def generate_and_combine_audio_files(dialogues, output_dir, base_name):
277 |     """Generate audio files for dialogues and combine them."""
278 |     file_number = 1  # Start numbering from 0000001
279 |     is_interviewer = True  # Start with interviewer as the first speaker
280 |     for dialogue in tqdm(dialogues, desc="Generating audio"):
281 |         if dialogue.strip():  # Check if there is actual dialogue content
282 |             generate_audio()
283 |             print(f"Generating audio...: Interviewer is : {is_interviewer} dialogue is {dialogue}")
284 |             is_interviewer = not is_interviewer  # Toggle speaker after each dialogue block
285 |     combined_audio_path = output_dir / f"{base_name}.wav"
286 |     print(f"combining audio files...")
287 |     combine_audio()
288 |     return combined_audio_path
289 | 
290 | async def main_async(input_data, language):
291 |     """Main function to process input and generate audio."""
292 |     text = ""
293 |     if isinstance(input_data, Path):
294 |         text = convert_pdf_to_text(input_data)
295 |     else:
296 |         text = fetch_text_from_url(input_data)
297 |     dialogues = get_chat_response(text, language)
298 |     #create chapter files from dialog
299 |     chaptertxt_folder = "chapters_txt"
300 |     create_chapter_files(dialogues, chaptertxt_folder)
301 | 
302 |     #This will remove all the prefix from all the txt files in the chaptertxt_folder folder
303 |     remove_prefix_from_all_txt_files_in_folder(chaptertxt_folder)
304 | 
305 |     #generate audio for all chapter files
306 |     output_audio_dir = "output_audio"
307 |     convert_chapters_to_audio_standard_model(chaptertxt_folder, output_audio_dir, target_voice_path=None, language='en')
308 | 
309 |     #combine all the audio files into a single final output audio file
310 |     final_output_audio_dir = "final_output_audio_dir"
311 |     combine_wav_files(output_audio_dir, final_output_audio_dir, "final_output_audio.wav")
312 | 
313 |     #wipe all the temp folders
314 |     wipe_folder("Working_files")
315 |     wipe_folder("Working_files/temp")
316 |     wipe_folder("output_audio")
317 |     wipe_folder("chapters_txt")
318 | 
319 |     return "Complete!"
320 | 
321 | def gradio_interface(input_file, url, language):
322 |     """Gradio interface to process input and generate audio."""
323 |     input_data = input_file if input_file else url
324 |     try:
325 |         audio_file_path = asyncio.run(main_async(input_data, language))
326 |         return audio_file_path
327 |     except Exception as e:
328 |         logging.error(f"{e}")
329 |         return str(e)
330 | 
331 | # Setup Gradio interface
332 | demo = gr.Interface(
333 |     fn=gradio_interface,
334 |     inputs=[
335 |         gr.File(label="Upload PDF / Subir PDF", type="filepath"),
336 |         gr.Textbox(label="Or Enter Article URL", placeholder="Enter URL here"),
337 |         gr.Dropdown(label="Select Language / Seleccionar idioma", choices=["English", "Spanish"], value="English")
338 |     ],
339 |     outputs=gr.Audio(label="Generated Interview / Entrevista generada"),
340 |     allow_flagging="never"
341 | )
342 | 
343 | # Launch Gradio interface
344 | demo.launch(share=True)  # Set share=True to create a public link
345 | 


--------------------------------------------------------------------------------