├── Female.wav ├── Interviewer.mp3 ├── README.md ├── requirements.txt ├── summarize_local.py └── summarize_local_gpt4all.py /Female.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DrewThomasson/doc2interview/e0591d42658c79b355aeae4f82bed398acfa99c1/Female.wav -------------------------------------------------------------------------------- /Interviewer.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DrewThomasson/doc2interview/e0591d42658c79b355aeae4f82bed398acfa99c1/Interviewer.mp3 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 🎙️ doc2interview 2 | 3 | Welcome to the Interview Audio Generator! 4 | This innovative tool automatically transforms PDF documents or online articles into engaging interview-style audio files. 5 | It's perfect for auditory learners or anyone who enjoys consuming content on the go! 6 | And best of all, it runs entirely locally on your computer. No paid API services required! 7 | 8 | ## 📋 Prerequisites 9 | 10 | Ensure you have the following installed: 11 | - Python 3.8+ 12 | - Ollama with the `phi3.5` model pulled 13 | 14 | - For faster results, use a CUDA-capable machine with at least 4 GB of VRAM so XTTS can generate audio faster 15 | 16 | 17 | ## 🛠️ Installation 18 | 19 | 1. **Clone the repository**: 20 | ```bash 21 | git clone https://github.com/DrewThomasson/doc2interview.git 22 | cd doc2interview 23 | ``` 24 | 25 | 2. **Install required Python packages**: 26 | - Install all necessary Python packages using the following command: 27 | ```bash 28 | pip install -r requirements.txt 29 | ``` 30 | 31 | 3. **Set up Ollama**: 32 | - Install Ollama following the official documentation.
[link here](https://ollama.com) 33 | - Pull the `phi3.5` model necessary for running the script: 34 | ```bash 35 | ollama pull phi3.5 36 | ``` 37 | 38 | ## 🚀 Quick Start 39 | 40 | 1. **Start the script**: 41 | ```bash 42 | python summarize_local.py 43 | ``` 44 | 2. **Open the Gradio interface**: 45 | - The interface will be available in your web browser. 46 | - Upload a PDF or enter an article URL. 47 | - Choose the language and let the magic happen! 48 | 49 | ## 📁 Output 50 | 51 | The generated audio files will be stored in: 52 | - **Chapter-wise audio**: `./output_audio/` 53 | - **Final combined audio**: `./final_output_audio_dir/final_output_audio.wav` 54 | 55 | Feel free to explore the audio files and use them as needed! 56 | 57 | ## 🎧 Demo 58 | 59 | Check out this sample audio from a generated interview: 60 | 61 | https://github.com/user-attachments/assets/77e6046d-18e0-41dd-b034-7cdd709b9daf 62 | 63 | [Generated from this article](https://www.chosun.com/english/industry-en/2024/08/21/GGIYIGY43VHHVA2J74VAVWLEDQ/) 64 | 65 | 66 | ## To-Do List 67 | 68 | - [ ] **Attempt to find a way to remove any ramblings at the end.** For more details, see the issue [here](https://github.com/DrewThomasson/doc2interview/issues/2#issue-2501722522). 70 | 71 | - [ ] **Allow users to easily swap the reference audio for either voice actor in the GUI.** 72 | 73 | - [ ] **Find a way for the program to determine which speakers in the generated dialogue are male or female.** Possibly by just asking the LLM. 75 | 76 | - [ ] **Allow the user to see streaming LLM output.** Right now you can't see the LLM working live in the terminal; you only see its output once it has finished writing the dialogue script. 78 | 79 | - [ ] **Have the program automatically run the `ollama pull` command if the specified model isn't found?** Not that huge of an issue, though.
81 | 82 | - [ ] **Have a bulk-processing feature for multiple files or documents.** 83 | 84 | 85 | 86 | ## 🤝 Contributing 87 | 88 | Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**. 89 | 90 | ## 📖 License 91 | 92 | Distributed under the MIT License. See `LICENSE` for more information. 93 | 94 | ## ❓ Support 95 | 96 | Got questions? Feel free to open an issue or contact me directly at your-email@example.com. 97 | 98 | ## 🌟 Show your support 99 | 100 | Give a ⭐️ if this project helped you! 101 | 102 | ## Inspired by 103 | AIPeterWorld's non-offline version, which used Gemini Flash for the dialogue and OpenAI voices for TTS: 104 | 105 | https://huggingface.co/spaces/AIPeterWorld/Doc-To-Dialogue 106 | 107 | 108 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pydub 2 | nltk 3 | torch 4 | tqdm 5 | gradio 6 | PyMuPDF  # Imported as `fitz`; needed for handling PDF files 7 | newspaper3k 8 | ollama  # Python client only; the Ollama server itself must be installed separately (see README) 9 | tts 10 | -------------------------------------------------------------------------------- /summarize_local.py: -------------------------------------------------------------------------------- 1 | print("starting...") 2 | 3 | import os 4 | import shutil 5 | import subprocess 6 | import re 7 | from pydub import AudioSegment 8 | import tempfile 9 | from pydub import AudioSegment 10 | import os 11 | import nltk 12 | from nltk.tokenize import sent_tokenize 13 | import sys 14 | import torch 15 | from TTS.api import TTS 16 | from TTS.tts.configs.xtts_config import XttsConfig 17 | from TTS.tts.models.xtts import Xtts 18 | from tqdm import tqdm 19 | import gradio as gr 20 | from gradio import Progress 21 | import urllib.request 22 | import zipfile
23 | 24 | 25 | 26 | import logging 27 | import asyncio 28 | from pathlib import Path 29 | from pydub import AudioSegment 30 | import gradio as gr 31 | import torch 32 | from TTS.api import TTS 33 | from tqdm import tqdm 34 | import fitz # PyMuPDF 35 | from newspaper import Article 36 | from ollama import Client 37 | 38 | import os 39 | 40 | 41 | 42 | 43 | 44 | sample_ollama_response = """ 45 | Interviewer: Good day, Andrew Cunningham! We understand that you've delved into Microsoft’s Control Panel history and its potential future direction in line with the introduction of new operating systems such as Windows 10 and beyond. Can we explore this topic further? Thank you for joining us today. 46 | 47 | Andrew: Absolutely, I believe it is quite an intriguing journey to trace back through Microsoft's user interface history! The Control Panel has been a fundamental part of the operating system experience since Windows NT 4 in 1996 – that’s more than three decades old. 48 | 49 | Interviewer: That's right, Andrew. And with these years comes plenty of evolution and change for such an essential tool. The Control Panel provided a centralized location to view system settings which have been adjusted over the course of Windows history through various applets that users could interact with – from setting up hardware configurations down to network connections... 50 | 51 | Andrew: Yes, indeed! Many elements in our day-to-day life are tied to these panels. For example, anyone who's used a computer for even an hour has probably adjusted the system date and time or changed their display settings at least once using Control Panel applets… 52 | 53 | Interviewer: With that said, Microsoft’s recent note on support sites indicates it might be phasing out these traditional panels in favor of something more modern. What are your thoughts about this move? 
The company seems to promote a shift towards the Settings App which offers streamlined functionality and is tailored for touchscreen devices... 54 | 55 | Andrew: It's an interesting development, I must admit! Microsoft has been moving forward with its Windows operating systems by introducing more modern interfaces. Although we still see older Control Panel applets in updates like 24H2, the fact remains that these traditional panels are being deprecated for newer alternatives – a sign of progress indeed… 56 | 57 | Interviewer: But some users may have an attachment to these old designs and icons which date back as far as Windows Vista with its rounded glassy appearance. How significant is this change in design language, considering Microsoft’s attempt at keeping the interface cohesive? The company's modernization efforts are certainly visible throughout newer apps like Paint or Notepad… 58 | 59 | Andrew: Yes, that was indeed a remarkable shift! It does pose questions about preserving legacy aspects of Windows. However, as operating systems mature and evolve with time to align better with the current technology landscape – touchscreens being more common now than ever before - these changes seem almost logical... 60 | 61 | Interviewer: In summary Andrew, we can see that Microsoft has a significant history attached to its Control Panel interface. The company is progressively transitioning towards newer and streamlined interfaces while not completely neglecting the older styles for some user preferences… Any final thoughts on what you believe this might mean going forward? 62 | 63 | Andrew: Well it’s always challenging when such a profound component of our daily interactions with technology changes. I suspect Microsoft will aim to balance new advancements and legacy features, as they've done previously – offering more contemporary interfaces without completely disregarding the older ones that many users might still prefer... 
64 | 65 | Interviewer: Thank you for your insights today, Andrew Cunningham! It’s been illuminating discussing Microsoft Windows evolution with a seasoned observer like yourself. Good day and hope to chat again soon! 66 | 67 | Andrew: The pleasure was all mine – I always enjoy our talks on such fascinating topics too; catch you later! 68 | """ 69 | 70 | 71 | 72 | 73 | 74 | logging.basicConfig( 75 | level=logging.INFO, 76 | format='%(asctime)s - %(levelname)s - %(message)s' 77 | ) 78 | 79 | def wipe_folder(folder_path): 80 | # Check if the folder exists 81 | if not os.path.exists(folder_path): 82 | print(f"The folder {folder_path} does not exist.") 83 | return 84 | 85 | # Iterate over all the items in the given folder 86 | for item in os.listdir(folder_path): 87 | item_path = os.path.join(folder_path, item) 88 | # If it's a file, remove it and print a message 89 | if os.path.isfile(item_path): 90 | os.remove(item_path) 91 | print(f"Removed file: {item_path}") 92 | # If it's a directory, remove it recursively and print a message 93 | elif os.path.isdir(item_path): 94 | shutil.rmtree(item_path) 95 | print(f"Removed directory and its contents: {item_path}") 96 | 97 | print(f"All contents wiped from {folder_path}.") 98 | 99 | def fetch_text_from_url(url): 100 | """Fetch main text from the provided URL using newspaper3k.""" 101 | try: 102 | article = Article(url) 103 | article.download() 104 | article.parse() 105 | return article.text 106 | except Exception as e: 107 | logging.error(f"Failed to fetch text from URL: {e}") 108 | return None 109 | 110 | def convert_pdf_to_text(pdf_path): 111 | """Convert PDF file to text using PyMuPDF.""" 112 | text = "" 113 | with fitz.open(pdf_path) as pdf: 114 | for page in pdf: 115 | text += page.get_text() 116 | return text 117 | 118 | 119 | #Standard non-streaming version 120 | def run_ollama(prompt, model="phi3.5"): 121 | """Run Ollama locally with the given model and prompt using the Python API.""" 122 | client = Client() 123 | 
logging.info(f"Running Ollama with model: {model} and prompt: {prompt}") 124 | try: 125 | response = client.generate(model=model, prompt=prompt) 126 | output = response['response'] 127 | logging.info(f"Ollama response: {output}") 128 | return output 129 | except Exception as e: 130 | logging.error(f"Ollama error: {str(e)}") 131 | return None 132 | 133 | 134 | # Streaming version with buffered output 135 | ''' 136 | def run_ollama(prompt, model="phi3.5"): 137 | """Run Ollama locally with the given model and prompt using the Python API.""" 138 | client = Client() 139 | logging.info(f"Running Ollama with model: {model} and prompt: {prompt}") 140 | output = [] 141 | buffer = "" 142 | 143 | try: 144 | # Stream the response and buffer chunks 145 | stream = client.generate(model=model, prompt=prompt, stream=True) 146 | for chunk in stream: 147 | if 'response' in chunk: 148 | buffer += chunk['response'] 149 | 150 | # Print the buffer if a complete sentence or newline is detected 151 | if buffer.endswith('.') or buffer.endswith('\n'): 152 | print(buffer, end='', flush=True) 153 | output.append(buffer) 154 | buffer = "" # Clear buffer after printing 155 | else: 156 | logging.error(f"Unexpected chunk format: {chunk}") 157 | 158 | # Print any remaining content in the buffer 159 | if buffer: 160 | print(buffer, end='', flush=True) 161 | output.append(buffer) 162 | 163 | # Join all chunks to form the complete output 164 | full_output = ''.join(output) 165 | logging.info(f"Ollama response: {full_output}") 166 | print(f"Full output is: {full_output}") 167 | return full_output 168 | except Exception as e: 169 | logging.error(f"Ollama error: {str(e)}") 170 | return None 171 | ''' 172 | 173 | 174 | 175 | def generate_prompt(language, stage): 176 | """Generate the appropriate prompt based on the language and stage.""" 177 | if stage == 1: 178 | if language.lower() == "english": 179 | return ( 180 | "English Version:\n\n" 181 | "Generate an in-depth and coherent interview in dialogue 
format that reflects the key aspects of the provided document. " 182 | "Include a brief introduction by the interviewer, followed by a series of questions and responses, concluding with a summary." 183 | " Output should be plain text, with each dialogue line separated by two new lines." 184 | ) 185 | else: 186 | return ( 187 | "Versión en Español:\n\n" 188 | "Genera una entrevista coherente en formato de diálogo que refleje los aspectos clave del documento proporcionado. " 189 | "Incluye una breve introducción por el entrevistador, seguida de una serie de preguntas y respuestas, concluyendo con un resumen." 190 | " El resultado debe ser texto plano, con cada línea de diálogo separada por dos nuevas líneas." 191 | ) 192 | if stage == 2: 193 | if language.lower() == "english": 194 | return ( 195 | "Please return the given dialog script with any unrelated ramblings removed from it. " 196 | "Your response should contain nothing except for the modified script:" 197 | ) 198 | else: 199 | return ( 200 | "Por favor, devuelva el guion de diálogo dado con cualquier divagación no relacionada eliminada. 
" 201 | "Su respuesta debe contener solo el guion modificado:" 202 | ) 203 | 204 | def get_chat_response(text, language): 205 | """Generate interview based on text and handle response.""" 206 | print("Crafting Interview...") 207 | prompt_stage = generate_prompt(language, 1) 208 | interview = run_ollama(prompt_stage + "\n\n" + text + "\n\n" + prompt_stage) 209 | print("Cleaning up Interview...") 210 | prompt_stage = generate_prompt(language, 2) 211 | cleaned_interview = run_ollama(prompt_stage + "\n\n" + interview + "\n\n" + prompt_stage) 212 | #interview = sample_ollama_response 213 | return cleaned_interview.split('\n\n') # Return the cleaned-up script, splitting by two new lines as per the new format 214 | 215 | # Setup TTS using 🐸TTS 216 | device = "cuda" if torch.cuda.is_available() else "cpu" 217 | #tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device) 218 | 219 | def remove_prefix(text): 220 | """Remove any prefix before and including the first colon, if present.""" 221 | index = text.find(':') 222 | if index != -1: 223 | return text[index + 1:].lstrip() 224 | return text 225 | 226 | 227 | def remove_prefix_from_all_txt_files_in_folder(folder_path): 228 | """Remove any prefix before and including the first colon in every .txt file in the specified folder.""" 229 | 230 | def remove_prefix(text): 231 | """Remove any prefix before and including the first colon, if present.""" 232 | index = text.find(':') 233 | if index != -1: 234 | return text[index + 1:].lstrip() 235 | return text 236 | 237 | for filename in os.listdir(folder_path): 238 | if filename.endswith('.txt'): 239 | file_path = os.path.join(folder_path, filename) 240 | with open(file_path, 'r') as file: 241 | content = file.readlines() 242 | 243 | # Apply remove_prefix to each line 244 | new_content = [remove_prefix(line) for line in content] 245 | 246 | # Write the modified content back to the file 247 | with open(file_path, 'w') as file: 248 | file.writelines(new_content) 249 | 250 | print("Prefix removed from all text 
files in the folder.") 251 | 252 | # Usage example 253 | #folder_path = '/path/to/your/folder' # Replace with your folder path 254 | #remove_prefix_from_folder(folder_path) 255 | 256 | 257 | def create_chapter_files(chapters, output_folder): 258 | # Ensure the output directory exists, create if it doesn't 259 | os.makedirs(output_folder, exist_ok=True) 260 | 261 | for i, chapter in enumerate(chapters, start=1): 262 | file_path = os.path.join(output_folder, f"chapter_{i}.txt") 263 | with open(file_path, "w") as file: 264 | file.write(chapter) 265 | 266 | # Combine WAV files into a single file 267 | def combine_wav_files(input_directory, output_directory, file_name): 268 | # Ensure that the output directory exists, create it if necessary 269 | os.makedirs(output_directory, exist_ok=True) 270 | 271 | # Specify the output file path 272 | output_file_path = os.path.join(output_directory, file_name) 273 | 274 | # Initialize an empty audio segment 275 | combined_audio = AudioSegment.empty() 276 | 277 | # Get a list of all .wav files in the specified input directory and sort them 278 | input_file_paths = sorted( 279 | [os.path.join(input_directory, f) for f in os.listdir(input_directory) if f.endswith(".wav")], 280 | key=lambda f: int(''.join(filter(str.isdigit, f))) 281 | ) 282 | 283 | # Sequentially append each file to the combined_audio 284 | for input_file_path in input_file_paths: 285 | audio_segment = AudioSegment.from_wav(input_file_path) 286 | combined_audio += audio_segment 287 | 288 | # Export the combined audio to the output file path 289 | combined_audio.export(output_file_path, format='wav') 290 | 291 | print(f"Combined audio saved to {output_file_path}") 292 | 293 | # Function to split long strings into parts 294 | def split_long_sentence(sentence, max_length=230, max_pauses=8): 295 | """ 296 | Splits a sentence into parts based on length or number of pauses without recursion. 297 | 298 | :param sentence: The sentence to split. 
299 | :param max_length: Maximum allowed length of a sentence. 300 | :param max_pauses: Maximum allowed number of pauses in a sentence. 301 | :return: A list of sentence parts that meet the criteria. 302 | """ 303 | parts = [] 304 | while len(sentence) > max_length or sentence.count(',') + sentence.count(';') + sentence.count('.') > max_pauses: 305 | possible_splits = [i for i, char in enumerate(sentence) if char in ',;.' and i < max_length] 306 | if possible_splits: 307 | # Find the best place to split the sentence, preferring the last possible split to keep parts longer 308 | split_at = possible_splits[-1] + 1 309 | else: 310 | # If no punctuation to split on within max_length, split at max_length 311 | split_at = max_length 312 | 313 | # Split the sentence and add the first part to the list 314 | parts.append(sentence[:split_at].strip()) 315 | sentence = sentence[split_at:].strip() 316 | 317 | # Add the remaining part of the sentence 318 | parts.append(sentence) 319 | return parts 320 | 321 | 322 | #This function goes through the chapter dir and genrates a chapter for each chapter_1.txt and so on files 323 | def convert_chapters_to_audio_standard_model(chapters_dir, output_audio_dir, target_voice_path=None, language=None): 324 | selected_tts_model = "tts_models/multilingual/multi-dataset/xtts_v2" 325 | tts = TTS(selected_tts_model, progress_bar=False).to(device) 326 | 327 | if not os.path.exists(output_audio_dir): 328 | os.makedirs(output_audio_dir) 329 | Narrerator_status = True 330 | 331 | for chapter_file in sorted(os.listdir(chapters_dir), key=lambda x: int(re.search(r"chapter_(\d+).txt", x).group(1)) if re.search(r"chapter_(\d+).txt", x) else float('inf')): 332 | if chapter_file.endswith('.txt'): 333 | match = re.search(r"chapter_(\d+).txt", chapter_file) 334 | if match: 335 | chapter_num = int(match.group(1)) 336 | else: 337 | print(f"Skipping file {chapter_file} as it does not match the expected format.") 338 | continue 339 | 340 | chapter_path = 
os.path.join(chapters_dir, chapter_file) 341 | output_file_name = f"audio_chapter_{chapter_num}.wav" 342 | output_file_path = os.path.join(output_audio_dir, output_file_name) 343 | temp_audio_directory = os.path.join(".", "Working_files", "temp") 344 | os.makedirs(temp_audio_directory, exist_ok=True) 345 | temp_count = 0 346 | 347 | with open(chapter_path, 'r', encoding='utf-8') as file: 348 | chapter_text = file.read() 349 | sentences = sent_tokenize(chapter_text, language='italian' if language == 'it' else 'english') 350 | for sentence in tqdm(sentences, desc=f"Chapter {chapter_num}"): 351 | fragments = split_long_sentence(sentence, max_length=249 if language == "en" else 213, max_pauses=10) 352 | for fragment in fragments: 353 | if fragment != "": 354 | print(f"Generating fragment: {fragment}...") 355 | fragment_file_path = os.path.join(temp_audio_directory, f"{temp_count}.wav") 356 | #speaker_wav_path = target_voice_path if target_voice_path else default_target_voice_path 357 | language_code = language if language else "en"  # Default to English; a `default_language_code` variable was never defined 358 | if Narrerator_status == True: 359 | tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Interviewer.mp3", language=language_code) 360 | if Narrerator_status == False: 361 | tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Female.wav", language=language_code) 362 | temp_count += 1 363 | 364 | combine_wav_files(temp_audio_directory, output_audio_dir, output_file_name) 365 | wipe_folder(temp_audio_directory) 366 | print(f"Converted chapter {chapter_num} to audio.") 367 | #This will swap the status of the Narrerator status boolean value 368 | Narrerator_status = not Narrerator_status 369 | 370 | async def generate_and_combine_audio_files(dialogues, output_dir, base_name): 371 | """Generate audio files for dialogues and combine them.""" 372 | file_number = 1 # Start numbering from 0000001 373 | is_interviewer = True # Start with interviewer as the first speaker 374 | for 
dialogue in tqdm(dialogues, desc="Generating audio"): 375 | if dialogue.strip(): # Check if there is actual dialogue content 376 | #generate_audio()  # NOTE: generate_audio() is not defined in this file; this legacy helper is unused (main_async uses convert_chapters_to_audio_standard_model instead) 377 | print(f"Generating audio...: Interviewer is : {is_interviewer} dialogue is {dialogue}") 378 | is_interviewer = not is_interviewer # Toggle speaker after each dialogue block 379 | combined_audio_path = output_dir / f"{base_name}.wav" 380 | print("combining audio files...") 381 | #combine_audio()  # NOTE: combine_audio() is not defined either; combine_wav_files() handles combining in the main pipeline 382 | return combined_audio_path 383 | 384 | async def main_async(input_data, language): 385 | """Main function to process input and generate audio.""" 386 | text = "" 387 | if isinstance(input_data, Path): 388 | text = convert_pdf_to_text(input_data) 389 | else: 390 | text = fetch_text_from_url(input_data) 391 | dialogues = get_chat_response(text, language) 392 | #Create chapter files from the dialogue 393 | chaptertxt_folder = "chapters_txt" 394 | create_chapter_files(dialogues, chaptertxt_folder) 395 | 396 | #This will remove all the prefixes from all the txt files in the chaptertxt_folder folder 397 | remove_prefix_from_all_txt_files_in_folder(chaptertxt_folder) 398 | 399 | #Generate audio for all chapter files (NOTE: TTS language is currently hardcoded to English) 400 | output_audio_dir = "output_audio" 401 | convert_chapters_to_audio_standard_model(chaptertxt_folder, output_audio_dir, target_voice_path=None, language='en') 402 | 403 | #Combine all the audio files into a single final output audio file 404 | final_output_audio_dir = "final_output_audio_dir" 405 | combine_wav_files(output_audio_dir, final_output_audio_dir, "final_output_audio.wav") 406 | 407 | #Wipe all the temp folders (wipe the child folder before its parent) 408 | wipe_folder("Working_files/temp") 409 | wipe_folder("Working_files") 410 | wipe_folder("output_audio") 411 | wipe_folder("chapters_txt") 412 | 413 | 414 | return os.path.join(final_output_audio_dir, "final_output_audio.wav")  # Return the audio path so the Gradio Audio output can play it 
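The `split_long_sentence` helper used by the pipeline above is pure Python with no dependencies, so its chunking behavior is easy to sanity-check in isolation. The sketch below reproduces the splitting logic and shows that a short sentence passes through unchanged, while an unpunctuated string longer than `max_length` is hard-split into fragments that each fit under the limit:

```python
def split_long_sentence(sentence, max_length=230, max_pauses=8):
    """Split a sentence on ',', ';', or '.' (or hard-split at max_length)
    until every part is short enough for the TTS model."""
    parts = []
    while len(sentence) > max_length or sentence.count(',') + sentence.count(';') + sentence.count('.') > max_pauses:
        possible_splits = [i for i, char in enumerate(sentence) if char in ',;.' and i < max_length]
        # Prefer the last punctuation mark before max_length; otherwise hard-split.
        split_at = possible_splits[-1] + 1 if possible_splits else max_length
        parts.append(sentence[:split_at].strip())
        sentence = sentence[split_at:].strip()
    parts.append(sentence)
    return parts

# A short sentence is returned as a single part.
print(split_long_sentence("Short one."))  # → ['Short one.']

# An unpunctuated 299-character string is hard-split into <= 50-char chunks.
chunks = split_long_sentence(("word " * 60).strip(), max_length=50)
print(all(len(c) <= 50 for c in chunks))  # → True
```

Note that the hard-split path can cut mid-word when no punctuation is available before `max_length`, which is one reason the main loop filters out empty fragments before sending text to XTTS.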
415 | 416 | def gradio_interface(input_file, url, language): 417 | # Check if both PDF and URL are provided 418 | if input_file is not None and url.strip() != "": 419 | return "Error: Please provide either a PDF file or a URL, not both." 420 | 421 | # Check if a PDF file is provided 422 | if input_file is not None: 423 | input_data = Path(input_file.name if hasattr(input_file, "name") else input_file)  # gr.File(type="filepath") may pass a plain path string rather than a file object 424 | elif url.strip() != "": 425 | input_data = url # Use the URL 426 | else: 427 | return "Error: Please provide a PDF file or a URL." 428 | 429 | try: 430 | #audio_file_path = asyncio.run(main_async(input_data, language, input_file is not None)) 431 | audio_file_path = asyncio.run(main_async(input_data, language)) 432 | return audio_file_path 433 | except Exception as e: 434 | logging.error(f"{e}") 435 | return str(e) 436 | 437 | 438 | # Setup Gradio interface 439 | demo = gr.Interface( 440 | fn=gradio_interface, 441 | inputs=[ 442 | gr.File(label="Upload PDF / Subir PDF", type="filepath"), 443 | gr.Textbox(label="Or Enter Article URL", placeholder="Enter URL here"), 444 | gr.Dropdown(label="Select Language / Seleccionar idioma", choices=["English", "Spanish"], value="English") 445 | ], 446 | outputs=gr.Audio(label="Generated Interview / Entrevista generada"), 447 | allow_flagging="never" 448 | ) 449 | 450 | # Launch Gradio interface 451 | demo.launch(share=False) # Set share=True to create a public link 452 | -------------------------------------------------------------------------------- /summarize_local_gpt4all.py: -------------------------------------------------------------------------------- 1 | #You need to add these installs to a Dockerfile 2 | #The Dockerfile should use python3.10 3 | 4 | #pip install pydub 5 | #pip install nltk 6 | #pip install torch 7 | #pip install torchvision 8 | #pip install torchaudio 9 | #pip install TTS 10 | #pip install tqdm 11 | #pip install gradio 12 | #pip install PyMuPDF 13 | #pip install newspaper3k 14 | #pip install gpt4all 15 | #pip 
install tqdm 16 | 17 | print("starting...") 18 | 19 | import os 20 | import shutil 21 | import subprocess 22 | import re 23 | from pydub import AudioSegment 24 | import tempfile 25 | from pydub import AudioSegment 26 | import os 27 | import nltk 28 | from nltk.tokenize import sent_tokenize 29 | import sys 30 | import torch 31 | from TTS.api import TTS 32 | from TTS.tts.configs.xtts_config import XttsConfig 33 | from TTS.tts.models.xtts import Xtts 34 | from tqdm import tqdm 35 | import gradio as gr 36 | from gradio import Progress 37 | import urllib.request 38 | import zipfile 39 | 40 | import logging 41 | import asyncio 42 | from pathlib import Path 43 | from pydub import AudioSegment 44 | import gradio as gr 45 | import torch 46 | from TTS.api import TTS 47 | from tqdm import tqdm 48 | import fitz # PyMuPDF 49 | from newspaper import Article 50 | from gpt4all import GPT4All 51 | 52 | import os 53 | from gpt4all import GPT4All 54 | import logging 55 | 56 | logging.basicConfig( 57 | level=logging.INFO, 58 | format='%(asctime)s - %(levelname)s - %(message)s' 59 | ) 60 | 61 | def wipe_folder(folder_path): 62 | # Check if the folder exists 63 | if not os.path.exists(folder_path): 64 | print(f"The folder {folder_path} does not exist.") 65 | return 66 | 67 | # Iterate over all the items in the given folder 68 | for item in os.listdir(folder_path): 69 | item_path = os.path.join(folder_path, item) 70 | # If it's a file, remove it and print a message 71 | if os.path.isfile(item_path): 72 | os.remove(item_path) 73 | print(f"Removed file: {item_path}") 74 | # If it's a directory, remove it recursively and print a message 75 | elif os.path.isdir(item_path): 76 | shutil.rmtree(item_path) 77 | print(f"Removed directory and its contents: {item_path}") 78 | 79 | print(f"All contents wiped from {folder_path}.") 80 | 81 | def fetch_text_from_url(url): 82 | """Fetch main text from the provided URL using newspaper3k.""" 83 | try: 84 | article = Article(url) 85 | article.download() 86 
| article.parse() 87 | return article.text 88 | except Exception as e: 89 | logging.error(f"Failed to fetch text from URL: {e}") 90 | return None 91 | 92 | def convert_pdf_to_text(pdf_path): 93 | """Convert PDF file to text using PyMuPDF.""" 94 | text = "" 95 | with fitz.open(pdf_path) as pdf: 96 | for page in pdf: 97 | text += page.get_text() 98 | return text 99 | 100 | def run_gpt4all(prompt, model="Phi-3.5-mini-instruct.Q4_0.gguf"): 101 | """Run GPT4All locally with the given model and prompt using the Python API.""" 102 | gpt4all = GPT4All(model) 103 | logging.info(f"Running GPT4All with model: {model} and prompt: {prompt}") 104 | try: 105 | with gpt4all.chat_session(): # Use chat_session for managing context 106 | response = gpt4all.generate(prompt, max_tokens=1024) # Generate response 107 | logging.info(f"GPT4All response: {response}") 108 | return response 109 | except Exception as e: 110 | logging.error(f"GPT4All error: {str(e)}") 111 | return None 112 | 113 | 114 | def generate_prompt(language, stage): 115 | """Generate the appropriate prompt based on the language and stage.""" 116 | if language.lower() == "english": 117 | return ( 118 | "English Version:\n\n" 119 | "Generate an in-depth and coherent interview in dialogue format that reflects the key aspects of the provided document. " 120 | "Include a brief introduction by the interviewer, followed by a series of questions and responses, concluding with a summary." 121 | " Output should be plain text, with each dialogue line separated by two new lines." 122 | ) 123 | else: 124 | return ( 125 | "Versión en Español:\n\n" 126 | "Genera una entrevista coherente en formato de diálogo que refleje los aspectos clave del documento proporcionado. " 127 | "Incluye una breve introducción por el entrevistador, seguida de una serie de preguntas y respuestas, concluyendo con un resumen." 128 | " El resultado debe ser texto plano, con cada línea de diálogo separada por dos nuevas líneas." 
        )

def get_chat_response(text, language):
    """Generate an interview from the text and split it into dialogue turns."""
    prompt_stage = generate_prompt(language, 1)
    interview = run_gpt4all(prompt_stage + "\n\n" + text)
    if interview is None:  # GPT4All failed; avoid calling .split() on None
        return []
    return interview.split('\n\n')  # Turns are separated by two new lines

# Setup TTS using 🐸TTS
device = "cuda" if torch.cuda.is_available() else "cpu"

def remove_prefix(text):
    """Remove any prefix before and including the first colon, if present."""
    index = text.find(':')
    if index != -1:
        return text[index + 1:].lstrip()
    return text

def remove_prefix_from_all_txt_files_in_folder(folder_path):
    """Strip the speaker prefix (up to and including the first colon) from every line of every .txt file in the folder."""
    for filename in os.listdir(folder_path):
        if filename.endswith('.txt'):
            file_path = os.path.join(folder_path, filename)
            with open(file_path, 'r') as file:
                content = file.readlines()

            # Apply remove_prefix to each line
            new_content = [remove_prefix(line) for line in content]

            # Write the modified content back to the file
            with open(file_path, 'w') as file:
                file.writelines(new_content)

    print("Prefix removed from all text files in the folder.")

def create_chapter_files(chapters, output_folder):
    """Write each dialogue turn to its own chapter_<n>.txt file."""
    # Ensure the output directory exists, create it if it doesn't
    os.makedirs(output_folder, exist_ok=True)

    for i, chapter in enumerate(chapters, start=1):
        file_path = os.path.join(output_folder, f"chapter_{i}.txt")
        with open(file_path, "w") as file:
            file.write(chapter)

# Combine WAV files into a single file
def combine_wav_files(input_directory, output_directory, file_name):
    # Ensure that the output directory exists, create it if necessary
    os.makedirs(output_directory, exist_ok=True)

    # Specify the output file path
    output_file_path = os.path.join(output_directory, file_name)

    # Initialize an empty audio segment
    combined_audio = AudioSegment.empty()

    # Get all .wav files in the input directory, sorted by the digits in their
    # filenames (use the basename so digits in directory names can't affect ordering)
    input_file_paths = sorted(
        [os.path.join(input_directory, f) for f in os.listdir(input_directory) if f.endswith(".wav")],
        key=lambda f: int(''.join(filter(str.isdigit, os.path.basename(f))))
    )

    # Sequentially append each file to the combined audio
    for input_file_path in input_file_paths:
        audio_segment = AudioSegment.from_wav(input_file_path)
        combined_audio += audio_segment

    # Export the combined audio to the output file path
    combined_audio.export(output_file_path, format='wav')

    print(f"Combined audio saved to {output_file_path}")

# Function to split long strings into parts
def split_long_sentence(sentence, max_length=230, max_pauses=8):
    """
    Splits a sentence into parts based on length or number of pauses without recursion.

    :param sentence: The sentence to split.
    :param max_length: Maximum allowed length of a sentence.
    :param max_pauses: Maximum allowed number of pauses in a sentence.
    :return: A list of sentence parts that meet the criteria.
    """
    parts = []
    while len(sentence) > max_length or sentence.count(',') + sentence.count(';') + sentence.count('.') > max_pauses:
        possible_splits = [i for i, char in enumerate(sentence) if char in ',;.' and i < max_length]
        if possible_splits:
            # Prefer the last possible split point to keep parts as long as possible
            split_at = possible_splits[-1] + 1
        else:
            # If there is no punctuation to split on within max_length, split at max_length
            split_at = max_length

        # Split the sentence and add the first part to the list
        parts.append(sentence[:split_at].strip())
        sentence = sentence[split_at:].strip()

    # Add the remaining part of the sentence
    parts.append(sentence)
    return parts

# Go through the chapters directory and generate audio for chapter_1.txt, chapter_2.txt, and so on
def convert_chapters_to_audio_standard_model(chapters_dir, output_audio_dir, target_voice_path=None, language=None):
    selected_tts_model = "tts_models/multilingual/multi-dataset/xtts_v2"
    tts = TTS(selected_tts_model, progress_bar=False).to(device)

    if not os.path.exists(output_audio_dir):
        os.makedirs(output_audio_dir)
    Narrerator_status = True  # True: interviewer voice; False: interviewee voice

    for chapter_file in sorted(os.listdir(chapters_dir), key=lambda x: int(re.search(r"chapter_(\d+).txt", x).group(1)) if re.search(r"chapter_(\d+).txt", x) else float('inf')):
        if chapter_file.endswith('.txt'):
            match = re.search(r"chapter_(\d+).txt", chapter_file)
            if match:
                chapter_num = int(match.group(1))
            else:
                print(f"Skipping file {chapter_file} as it does not match the expected format.")
                continue

            chapter_path = os.path.join(chapters_dir, chapter_file)
            output_file_name = f"audio_chapter_{chapter_num}.wav"
            output_file_path = os.path.join(output_audio_dir, output_file_name)
            temp_audio_directory = os.path.join(".", "Working_files", "temp")
            os.makedirs(temp_audio_directory, exist_ok=True)
            temp_count = 0

            with open(chapter_path, 'r', encoding='utf-8') as file:
                chapter_text = file.read()
                sentences = sent_tokenize(chapter_text, language='italian' if language == 'it' else 'english')
                for sentence in tqdm(sentences, desc=f"Chapter {chapter_num}"):
                    fragments = split_long_sentence(sentence, max_length=249 if language == "en" else 213, max_pauses=10)
                    for fragment in fragments:
                        if fragment != "":
                            print(f"Generating fragment: {fragment}...")
                            fragment_file_path = os.path.join(temp_audio_directory, f"{temp_count}.wav")
                            # speaker_wav_path = target_voice_path if target_voice_path else default_target_voice_path
                            language_code = language if language else "en"  # Fall back to English if no language is given
                            if Narrerator_status:
                                tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Interviewer.mp3", language=language_code)
                            else:
                                tts.tts_to_file(text=fragment, file_path=fragment_file_path, speaker_wav="Female.wav", language=language_code)
                            temp_count += 1

            combine_wav_files(temp_audio_directory, output_audio_dir, output_file_name)
            wipe_folder(temp_audio_directory)
            print(f"Converted chapter {chapter_num} to audio.")
            # Swap the speaker for the next chapter (alternating interviewer / interviewee)
            Narrerator_status = not Narrerator_status

async def generate_and_combine_audio_files(dialogues, output_dir, base_name):
    """Generate audio files for dialogues and combine them.

    NOTE: generate_audio() and combine_audio() are undefined placeholders;
    this function is not called by main_async and will fail if invoked as-is.
    """
    file_number = 1  # Start numbering from 0000001
    is_interviewer = True  # Start with the interviewer as the first speaker
    for dialogue in tqdm(dialogues, desc="Generating audio"):
        if dialogue.strip():  # Check if there is actual dialogue content
            generate_audio()  # Placeholder: not defined in this script
            print(f"Generating audio... Interviewer is: {is_interviewer}, dialogue is: {dialogue}")
            is_interviewer = not is_interviewer  # Toggle speaker after each dialogue block
    combined_audio_path = output_dir / f"{base_name}.wav"
    print("Combining audio files...")
    combine_audio()  # Placeholder: not defined in this script
    return combined_audio_path

async def main_async(input_data, language):
    """Main function to process input and generate audio."""
    text = ""
    if isinstance(input_data, Path):
        text = convert_pdf_to_text(input_data)
    else:
        text = fetch_text_from_url(input_data)
    dialogues = get_chat_response(text, language)

    # Create chapter files from the dialogue turns
    chaptertxt_folder = "chapters_txt"
    create_chapter_files(dialogues, chaptertxt_folder)

    # Remove the speaker prefixes from all the .txt files in the chapters folder
    remove_prefix_from_all_txt_files_in_folder(chaptertxt_folder)

    # Generate audio for all chapter files
    output_audio_dir = "output_audio"
    convert_chapters_to_audio_standard_model(chaptertxt_folder, output_audio_dir, target_voice_path=None, language='en')

    # Combine all the audio files into a single final output file
    final_output_audio_dir = "final_output_audio_dir"
    combine_wav_files(output_audio_dir, final_output_audio_dir, "final_output_audio.wav")

    # Wipe all the temp folders
    wipe_folder("Working_files")
    wipe_folder("Working_files/temp")
    wipe_folder("output_audio")
    wipe_folder("chapters_txt")

    # Return the final audio path so the gr.Audio output can play it
    return os.path.join(final_output_audio_dir, "final_output_audio.wav")
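`combine_wav_files` above orders the audio fragments by the digits embedded in each filename rather than lexicographically, because plain string sorting would place `10.wav` before `2.wav`. A standalone sketch of that sort key:

```python
# Fragment files are named 0.wav, 1.wav, 2.wav, ... by the TTS loop.
files = ["10.wav", "2.wav", "1.wav", "0.wav"]

# Lexicographic order would be wrong: ['0.wav', '1.wav', '10.wav', '2.wav']
ordered = sorted(files, key=lambda f: int(''.join(filter(str.isdigit, f))))
print(ordered)
# → ['0.wav', '1.wav', '2.wav', '10.wav']
```

Note the key raises `ValueError` on a filename containing no digits at all, so it relies on every `.wav` in the directory following the numbered naming scheme.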

def gradio_interface(input_file, url, language):
    """Gradio interface to process input and generate audio."""
    # Wrap the uploaded filepath in a Path so main_async treats it as a PDF
    # (gr.File with type="filepath" yields a plain string)
    input_data = Path(input_file) if input_file else url
    try:
        audio_file_path = asyncio.run(main_async(input_data, language))
        return audio_file_path
    except Exception as e:
        logging.error(f"{e}")
        return str(e)

# Setup Gradio interface
demo = gr.Interface(
    fn=gradio_interface,
    inputs=[
        gr.File(label="Upload PDF / Subir PDF", type="filepath"),
        gr.Textbox(label="Or Enter Article URL", placeholder="Enter URL here"),
        gr.Dropdown(label="Select Language / Seleccionar idioma", choices=["English", "Spanish"], value="English")
    ],
    outputs=gr.Audio(label="Generated Interview / Entrevista generada"),
    allow_flagging="never"
)

# Launch Gradio interface
demo.launch(share=True)  # share=True creates a public link
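For reference, the speaker-label stripping applied by `remove_prefix_from_all_txt_files_in_folder` can be exercised in isolation (the function body below is copied from the script):

```python
def remove_prefix(text):
    """Remove any prefix before and including the first colon, if present."""
    index = text.find(':')
    if index != -1:
        return text[index + 1:].lstrip()
    return text

print(remove_prefix("Interviewer: So, what inspired the project?"))
# → So, what inspired the project?
print(remove_prefix("A line with no speaker label"))
# → A line with no speaker label
```

Because only the first colon is consumed, any colons inside the dialogue itself are preserved.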