├── .gitignore
├── CONTRIBUTING.md
├── README.md
├── ai_helper
│   ├── ai_helper.py
│   ├── generate_outline.py
│   ├── generate_speech.py
│   └── script_generator.py
├── document_processor.py
├── examples
│   ├── example_1.mp3
│   └── example_2.mp3
├── helpers.py
├── logger.py
├── main.py
├── requirements.txt
└── utils
    └── combine_audio.py
/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | logs 3 | 4 | output 5 | 6 | .env 7 | .env.* 8 | 9 | my_docs/* -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to MyPodify 2 | 3 | Thank you for your interest in contributing to MyPodify! This document provides guidelines and instructions for contributing to the project. 4 | 5 | ## Code of Conduct 6 | 7 | By participating in this project, you agree to maintain a respectful and inclusive environment. We expect all contributors to: 8 | - Use welcoming and inclusive language 9 | - Be respectful of differing viewpoints and experiences 10 | - Gracefully accept constructive criticism 11 | - Focus on what is best for the community 12 | - Show empathy towards other community members 13 | 14 | ## Getting Started 15 | 16 | 1. Fork the repository 17 | 2. Clone your fork: 18 | ```bash 19 | git clone https://github.com/your-username/mypodify.git 20 | cd mypodify 21 | ``` 22 | 3. Set up your development environment: 23 | ```bash 24 | # Create and activate virtual environment 25 | python -m venv venv 26 | source venv/bin/activate # On Windows: venv\Scripts\activate 27 | 28 | # Install dependencies 29 | pip install -r requirements.txt 30 | ``` 31 | 32 | 4. 
Create a new branch for your feature/fix: 33 | ```bash 34 | git checkout -b feature/your-feature-name 35 | ``` 36 | 37 | ## Development Guidelines 38 | 39 | ### Code Style 40 | 41 | - Follow PEP 8 style guidelines 42 | - Use type hints for function parameters and return values 43 | - Use meaningful variable and function names 44 | - Include docstrings for classes and functions 45 | - Keep functions focused and modular 46 | 47 | ### Directory Structure 48 | - Place new AI helper functions in `ai_helper/` 49 | - Add utility functions to `utils/` 50 | - Update documentation when adding new features 51 | 52 | ### Documentation 53 | 54 | - Update README.md if adding new features or changing functionality 55 | - Include docstrings for new functions and classes 56 | - Comment complex logic or non-obvious implementations 57 | - Update requirements.txt if adding new dependencies 58 | 59 | ## Making Changes 60 | 61 | 1. Make your changes in your feature branch 62 | 2. Write or update tests as needed 63 | 3. Run the test suite to ensure everything passes 64 | 4. Update documentation as necessary 65 | 5. Commit your changes with clear, descriptive commit messages: 66 | ```bash 67 | git commit -m "feat: add support for MP3 file processing" 68 | ``` 69 | 70 | ### Commit Message Guidelines 71 | 72 | Follow the conventional commits specification: 73 | - `feat:` New feature 74 | - `fix:` Bug fix 75 | - `docs:` Documentation changes 76 | - `style:` Code style changes (formatting, etc.) 77 | - `refactor:` Code refactoring 78 | - `test:` Adding or updating tests 79 | - `chore:` Maintenance tasks 80 | 81 | ## Submitting Changes 82 | 83 | 1. Push your changes to your fork: 84 | ```bash 85 | git push origin feature/your-feature-name 86 | ``` 87 | 88 | 2. 
Create a Pull Request: 89 | - Go to the original repository 90 | - Click "New Pull Request" 91 | - Choose your fork and feature branch 92 | - Fill out the PR template 93 | 94 | ### Pull Request Guidelines 95 | 96 | - Provide a clear description of the changes 97 | - Link any related issues 98 | - Include screenshots for UI changes 99 | - List any breaking changes 100 | - Update documentation as needed 101 | - Ensure CI checks pass 102 | - Request review from maintainers 103 | 104 | ## Additional Resources 105 | 106 | ### Setting Up Local Development 107 | 108 | 1. Copy the example environment file: 109 | ```bash 110 | cp .env.example .env 111 | ``` 112 | 113 | 2. Configure your environment variables: 114 | ```env 115 | AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=your_endpoint 116 | AZURE_DOCUMENT_INTELLIGENCE_KEY=your_key 117 | AZURE_SPEECH_KEY=your_speech_key 118 | AZURE_SPEECH_REGION=your_region 119 | OPENAI_API_KEY=your_openai_key 120 | ``` 121 | 122 | ### Testing Files 123 | 124 | Place test documents in the `my_docs/` directory for testing your changes. 125 | 126 | ## Questions or Need Help? 127 | 128 | - Create an issue for bugs or feature requests 129 | - Join our community discussions 130 | - Contact the maintainers 131 | 132 | ## License 133 | 134 | By contributing to MyPodify, you agree that your contributions will be licensed under the same license as the project. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MyPodify 2 | 3 | MyPodify is an open-source local solution for automatically generating podcasts from documents. Think of it as a self-hosted alternative to Google's NotebookLM, focused on podcast creation. The tool processes documents from a specified folder and transforms them into engaging podcast content complete with outlines, scripts, and audio. 
4 | 5 | ## Example 6 | 7 | You can find example podcast outputs in the `examples/` directory to help you understand what MyPodify generates. 8 | 9 | ## Features 10 | 11 | - **Document Processing**: Supports multiple file formats including PDF (via Azure Document Intelligence), DOCX, and TXT files 12 | - **Automated Content Generation**: Creates podcast outlines and scripts using the OpenAI API (an Ollama-based helper is included in `ai_helper/ai_helper.py` but currently commented out) 13 | - **Text-to-Speech**: Converts scripts into audio using Azure's Speech Service 14 | - **Multiple Host Support**: Generate content for 1-3 hosts (default: 2 hosts - Alex and Jane) 15 | - **Project Organization**: Automatically creates an organized directory structure for outputs 16 | - **Detailed Logging**: Comprehensive logging system for troubleshooting 17 | 18 | ## Prerequisites 19 | 20 | - Python 3.7+ 21 | - Azure Account (for PDF processing and Speech Services) 22 | - OpenAI API key ([Ollama](https://ollama.com) is only required for the commented-out local path) 23 | 24 | ## Installation 25 | 26 | 1. Clone the repository: 27 | ```bash 28 | git clone https://github.com/shagunmistry/NotebookLM_Alternative.git 29 | cd mypodify 30 | ``` 31 | 32 | 2. Install required packages: 33 | ```bash 34 | pip install -r requirements.txt 35 | ``` 36 | 37 | 3. Set up environment variables: 38 | Create a `.env` file in the project root with the following: 39 | ```env 40 | AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=your_endpoint 41 | AZURE_DOCUMENT_INTELLIGENCE_KEY=your_key 42 | OPENAI_API_KEY=your_openai_key 43 | AZURE_TTS_ENDPOINT=your_endpoint 44 | AZURE_SPEECH_KEY=your_key 45 | AZURE_SPEECH_REGION=your_region 46 | ``` 47 | 48 | ## Usage 49 | 50 | 1. Place your source documents in the `my_docs` directory 51 | 52 | 2. Run the podcast generator: 53 | ```bash 54 | python main.py 55 | ``` 56 | 57 | 3. 
Follow the prompts to: 58 | - Specify output directory (default: 'output') 59 | - Enter project name 60 | - Set number of hosts (default: 2) 61 | - Provide project description (optional) 62 | 63 | ## Project Structure 64 | 65 | ``` 66 | mypodify/ 67 | ├── __pycache__/ # Python cache files 68 | ├── ai_helper/ # AI content generation modules 69 | │ ├── __pycache__/ 70 | │ ├── ai_helper.py # Core AI helper functions 71 | │ ├── generate_outline.py # Podcast outline generation 72 | │ ├── generate_speech.py # Speech synthesis module 73 | │ └── script_generator.py # Podcast script generation 74 | ├── logs/ # Log files directory 75 | ├── my_docs/ # Input documents directory 76 | ├── output/ # Generated content directory 77 | ├── utils/ # Utility functions 78 | │ ├── __pycache__/ 79 | │ └── combine_audio.py # Audio processing utilities 80 | ├── .env # Environment variables 81 | ├── .gitignore # Git ignore rules 82 | ├── document_processor.py # Document processing module 83 | ├── helpers.py # Helper utilities 84 | ├── logger.py # Logging configuration 85 | ├── main.py # Main application entry 86 | ├── README.md # Project documentation 87 | ├── requirements.txt # Project dependencies 88 | ``` 89 | 90 | ## Output Structure 91 | 92 | Each project generates: 93 | - Processed document text files 94 | - Markdown outline files 95 | - Podcast scripts in Markdown format 96 | - Individual audio segments 97 | - Combined final podcast audio file 98 | - Project metadata JSON 99 | 100 | ## Supported File Types 101 | 102 | - PDF (requires Azure Document Intelligence) 103 | - DOCX 104 | - TXT 105 | - DOC (currently unsupported, must be converted to DOCX) 106 | 107 | ## Configuration 108 | 109 | The system can be configured through various parameters: 110 | 111 | - **Host Count**: 1-3 hosts (affects conversation style and dynamics) 112 | - **Output Directory**: Customizable output location 113 | - **Project Name**: Used for organizing outputs 114 | - **Description**: Optional context for 
content generation 115 | 116 | ## Logging 117 | 118 | The system includes comprehensive logging: 119 | - General logs: `podcast_generator.log` 120 | - Document processing logs: `document_processor.log` 121 | - Speech generation logs: `speech_generator.log` 122 | - Outline generation logs: `podcast_outline.log` 123 | 124 | ## Contributing 125 | 126 | Contributions are welcome! Please feel free to submit a Pull Request. 127 | 128 | ## License 129 | 130 | This project is licensed under the MIT License. 131 | 132 | ## Acknowledgments 133 | 134 | - Azure Document Intelligence for PDF processing 135 | - Azure Speech Services for text-to-speech 136 | - Ollama for AI content generation 137 | 138 | ## Note 139 | 140 | This is an early version of the project and is under active development. Features and functionality may change in future releases. 141 | 142 | ## Contact 143 | 144 | For questions, feedback, or issues, please create an issue. 145 | -------------------------------------------------------------------------------- /ai_helper/ai_helper.py: -------------------------------------------------------------------------------- 1 | # from ollama import AsyncClient 2 | # from typing import List 3 | # from logger import CustomLogger 4 | # from dotenv import load_dotenv 5 | 6 | # load_dotenv() 7 | 8 | # log = CustomLogger("AI_Helper", log_file="ai_helper.log") 9 | 10 | # # Ollama configuration 11 | # MODEL_NAME = "llama3.2" 12 | # client = AsyncClient() 13 | 14 | # async def generate_content_from_ollama(content: str, system_instructions: str, purpose: str, 15 | # previous_content: str = None, chunk_index: int = None, 16 | # total_chunks: int = None) -> str: 17 | # """Generate content using Ollama's Python library.""" 18 | # log.log_info(f"Generating {purpose} using Ollama...") 19 | 20 | # try: 21 | # messages = [] 22 | 23 | # # Build the system message with context 24 | # if previous_content and chunk_index is not None: 25 | # system_msg = ( 26 | # 
f"{system_instructions}\n\n" 27 | # f"This is part {chunk_index + 1} of {total_chunks}. " 28 | # f"Previous content summary:\n{previous_content[:1000]}...\n\n" 29 | # f"Continue the {purpose} based on the following additional content, " 30 | # f"maintaining consistency with the previous parts:\n\n{content}" 31 | # ) 32 | # else: 33 | # system_msg = f"{system_instructions}\n\nContent to analyze:\n\n{content}" 34 | 35 | # messages = [ 36 | # {"role": "system", "content": system_msg}, 37 | # {"role": "user", "content": f"Generate the {purpose} for this content, making it flow naturally with any previous parts."} 38 | # ] 39 | 40 | # log.log_info(f"Processing content (first 100 chars): {content[:100]}...") 41 | 42 | # response = await client.chat( 43 | # model=MODEL_NAME, 44 | # messages=messages, 45 | # options={ 46 | # "temperature": 0.7, 47 | # "top_p": 0.9 48 | # } 49 | # ) 50 | 51 | # generated_content = response.message.content 52 | # log.log_info(f"Generated content (first 100 chars): {generated_content[:100]}...") 53 | # return generated_content 54 | 55 | # except Exception as e: 56 | # log.log_error(f"Error generating completion with Ollama: {e}") 57 | # raise 58 | 59 | # async def generate_content_with_chunking(content: str, system_instructions: str, purpose: str) -> str: 60 | # """Generate content using Ollama with chunking for large inputs.""" 61 | # log.log_info(f"Generating {purpose} with chunking...") 62 | 63 | # def split_content(text: str, chunk_size: int = 8000) -> List[str]: 64 | # """Split content into chunks based on character count and sentence boundaries.""" 65 | # chunks = [] 66 | # current_chunk = "" 67 | 68 | # # Split by paragraphs first to maintain better context 69 | # paragraphs = text.split('\n\n') 70 | 71 | # for paragraph in paragraphs: 72 | # if len(current_chunk) + len(paragraph) < chunk_size: 73 | # current_chunk += (paragraph + '\n\n') 74 | # else: 75 | # if current_chunk: 76 | # chunks.append(current_chunk.strip()) 77 | # 
current_chunk = paragraph + '\n\n' 78 | 79 | # if current_chunk: 80 | # chunks.append(current_chunk.strip()) 81 | 82 | # return chunks 83 | 84 | # try: 85 | # # If content is small enough, process it directly 86 | # if len(content) < 8000: 87 | # return await generate_content_from_ollama(content, system_instructions, purpose) 88 | 89 | # # Split into chunks if content is large 90 | # chunks = split_content(content) 91 | # total_chunks = len(chunks) 92 | # log.log_info(f"Split content into {total_chunks} chunks") 93 | 94 | # final_content = "" 95 | # for i, chunk in enumerate(chunks): 96 | # log.log_info(f"Processing chunk {i+1}/{total_chunks}") 97 | 98 | # # Get summary of previous content for context 99 | # previous_content = final_content if final_content else None 100 | 101 | # chunk_content = await generate_content_from_ollama( 102 | # chunk, 103 | # system_instructions, 104 | # purpose, 105 | # previous_content=previous_content, 106 | # chunk_index=i, 107 | # total_chunks=total_chunks 108 | # ) 109 | 110 | # if i == 0: 111 | # final_content = chunk_content 112 | # else: 113 | # # Ensure smooth transition between chunks 114 | # final_content += "\n\n" + chunk_content 115 | 116 | # log.log_info(f"Successfully processed chunk {i+1}") 117 | 118 | # return final_content 119 | 120 | # except Exception as e: 121 | # log.log_error(f"Error in content generation with chunking: {e}") 122 | # raise 123 | import tiktoken 124 | from openai import OpenAI 125 | import os 126 | from typing import List 127 | from logger import CustomLogger 128 | from dotenv import load_dotenv 129 | 130 | load_dotenv() 131 | 132 | log = CustomLogger("AI_Helper", log_file="ai_helper.log") 133 | 134 | MAX_TOKENS = 128000 # Maximum tokens allowed by the model 135 | BUFFER = 1000 # Buffer for system and user messages 136 | 137 | client = OpenAI( 138 | api_key=os.getenv("OPENAI_API_KEY"), 139 | ) 140 | 141 | MODEL_TO_USE = "gpt-4o-mini-2024-07-18" 142 | 143 | 144 | def num_tokens_from_string(string: 
str, model: str = MODEL_TO_USE) -> int: 145 | encoding = tiktoken.encoding_for_model(model) 146 | return len(encoding.encode(string)) 147 | 148 | 149 | def generate_content_from_openai(content: str, system_instructions: str, purpose: str) -> str: 150 | log.log_debug(f"Generating {purpose} in chunks...") 151 | 152 | def create_completion(messages: List[dict]) -> str: 153 | completion = client.chat.completions.create( 154 | model=MODEL_TO_USE, 155 | messages=messages, 156 | ) 157 | return completion.choices[0].message.content 158 | 159 | def split_content(content: str, max_tokens: int) -> List[str]: 160 | chunks = [] 161 | current_chunk = "" 162 | for line in content.split('\n'): 163 | line_tokens = num_tokens_from_string(line) 164 | if num_tokens_from_string(current_chunk) + line_tokens > max_tokens: 165 | chunks.append(current_chunk) 166 | current_chunk = line 167 | else: 168 | current_chunk += ('\n' if current_chunk else '') + line 169 | if current_chunk: 170 | chunks.append(current_chunk) 171 | return chunks 172 | 173 | log.log_debug("Splitting content into chunks...") 174 | log.log_debug(f"Content length: {len(content)}") 175 | 176 | system_tokens = num_tokens_from_string(system_instructions) 177 | user_tokens = num_tokens_from_string( 178 | f"Based on the content provided, generate {purpose}") 179 | max_chunk_tokens = MAX_TOKENS - system_tokens - user_tokens - BUFFER 180 | 181 | chunks = split_content(content, max_chunk_tokens) 182 | log.log_debug(f"Split content into {len(chunks)} chunks") 183 | 184 | messages = [ 185 | {"role": "system", "content": system_instructions}, 186 | {"role": "user", "content": f"Based on the content provided, generate {purpose}"}, 187 | ] 188 | 189 | final_content = "" 190 | for i, chunk in enumerate(chunks): 191 | chunk_messages = messages.copy() 192 | chunk_messages.append({"role": "user", "content": chunk}) 193 | if i > 0: 194 | chunk_messages.append({"role": "user", "content": f"Continue generating the " 195 | f"{purpose} based on 
this additional content."}) 196 | 197 | chunk_content = create_completion(chunk_messages) 198 | final_content += chunk_content 199 | 200 | log.log_debug(f"Generated {purpose} successfully, processed " 201 | f"{len(chunks)} chunks") 202 | return final_content 203 | -------------------------------------------------------------------------------- /ai_helper/generate_outline.py: -------------------------------------------------------------------------------- 1 | 2 | from logger import CustomLogger 3 | from dotenv import load_dotenv 4 | from ai_helper.ai_helper import generate_content_from_openai 5 | 6 | load_dotenv() 7 | 8 | # Set up logging 9 | log = CustomLogger("GenerateOutline", log_file="podcast_outline.log") 10 | 11 | PODCAST_OUTLINE_SYSTEM_INSTRUCTIONS = """ 12 | Act as a Podcast Producer tasked with creating a detailed outline for a deep dive podcast episode based on the provided content. 13 | The output should be a well-formatted Markdown outline that can be used to create a full podcast script. 14 | 15 | Number of Hosts: {host_count} 16 | 17 | 1. An attention-grabbing introduction that briefly introduces the host(s) and the topic 18 | 2. 4-6 main talking points, each with 3-4 sub-points or examples. 19 | - Include factual information, statistics, and historical context 20 | - Opportunities for hosts to share personal anecdotes or opinions 21 | 3. A mid-point break indication 22 | 4. A summary of key takeaways for the listeners 23 | 5. A thank-you note and closing remarks 24 | - Thank listeners for listening to the "MyPodify" podcast. 25 | 6. A suggestion for a next episode on a topic related to the current one. 26 | 27 | If 1 host: 28 | - Name of the host: Alex. 29 | - Include personal anecdotes or opinions from the host 30 | - Encourage the host to ask rhetorical questions to engage the audience 31 | - Include moments of humor or light-hearted banter 32 | 33 | If 2 hosts: 34 | - Names of the hosts: Alex and Jane. 
35 | - Indicate where the hosts might disagree or offer different perspectives 36 | - Include opportunities for the hosts to engage in a friendly debate or discussion 37 | - Encourage the hosts to ask each other questions or respond to each other's points 38 | 39 | If 3 or more hosts: 40 | - Provide clear transitions between speakers 41 | - Include opportunities for each host to contribute unique insights or perspectives 42 | - Encourage the hosts to engage in a roundtable discussion format 43 | 44 | Include the following elements in your outline: 45 | 46 | - Factual information, statistics, and historical context related to the topic 47 | - Pop culture references or current events that relate to the subject matter 48 | - Opportunities for hosts to share personal anecdotes or opinions 49 | - Moments of humor or light-hearted banter between hosts 50 | - Analogies or comparisons to explain complex concepts 51 | - "Fun facts" or surprising information to maintain interest 52 | - Rhetorical questions to engage the audience 53 | - Indications where hosts might disagree or offer different perspectives 54 | - Suggestions for transitions between subtopics 55 | 56 | THIS IS NOT SUPPOSED TO BE A SCRIPT. It should be a detailed outline that will be used to create a full podcast script. Each main point should have enough detail to guide a 5-10 minute discussion. 
57 | """ 58 | 59 | 60 | async def generate_podcast_outline(analysis: str, host_count: int) -> str: 61 | try: 62 | log.log_debug("Generating podcast outline...") 63 | 64 | si_instructions = PODCAST_OUTLINE_SYSTEM_INSTRUCTIONS.format( 65 | host_count=host_count) 66 | outline = generate_content_from_openai(analysis, si_instructions, purpose="Podcast Outline") 67 | 68 | return outline 69 | except Exception as e: 70 | log.log_error(f"Error generating podcast outline: {e}") 71 | raise -------------------------------------------------------------------------------- /ai_helper/generate_speech.py: -------------------------------------------------------------------------------- 1 | import os 2 | import azure.cognitiveservices.speech as speechsdk 3 | import time 4 | from pathlib import Path 5 | from dotenv import load_dotenv 6 | from pydub import AudioSegment 7 | 8 | from logger import CustomLogger 9 | 10 | log = CustomLogger("SpeechGenerator", log_file="speech_generator.log") 11 | 12 | load_dotenv() 13 | 14 | # Azure Speech Service configuration 15 | speech_key = os.getenv("AZURE_SPEECH_KEY") 16 | service_region = os.getenv("AZURE_SPEECH_REGION") 17 | 18 | 19 | speech_config = speechsdk.SpeechConfig( 20 | subscription=speech_key, region=service_region) 21 | 22 | 23 | def create_speech(text, voice, output_file): 24 | if not text.strip(): 25 | log.log_warning(f"Warning: Empty text for {output_file}. 
Skipping this segment.") 26 | return 27 | 28 | try: 29 | speech_config.speech_synthesis_voice_name = voice 30 | speech_synthesizer = speechsdk.SpeechSynthesizer( 31 | speech_config=speech_config, audio_config=None) 32 | 33 | 34 | result = speech_synthesizer.speak_text_async(text).get() 35 | 36 | if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted: 37 | audio_data = result.audio_data 38 | with open(output_file, "wb") as audio_file: 39 | audio_file.write(audio_data) 40 | log.log_debug(f"Audio saved to {output_file}") 41 | else: 42 | log.log_error(f"Error synthesizing speech for {output_file}: {result.reason}") 43 | 44 | except Exception as e: 45 | log.log_error(f"Error creating speech for {output_file}: {str(e)}") 46 | log.log_error(f"Problematic text: '{text}'") 47 | 48 | 49 | def create_speech_from_ssml(ssml, output_file): 50 | try: 51 | speech_synthesizer = speechsdk.SpeechSynthesizer( 52 | speech_config=speech_config, audio_config=None) 53 | 54 | log.log_debug(ssml) 55 | result = speech_synthesizer.speak_ssml_async(ssml).get() 56 | log.log_debug(result) 57 | 58 | if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted: 59 | audio_data = result.audio_data 60 | with open(output_file, "wb") as audio_file: 61 | audio_file.write(audio_data) 62 | log.log_debug(f"Audio saved to {output_file}") 63 | return output_file 64 | elif result.reason == speechsdk.ResultReason.Canceled: 65 | cancellation_details = result.cancellation_details 66 | log.log_debug(f"Speech synthesis canceled: {cancellation_details.reason}") 67 | if cancellation_details.reason == speechsdk.CancellationReason.Error: 68 | log.log_debug(f"Error details: {cancellation_details.error_details}") 69 | else: 70 | log.log_error(f"Error synthesizing speech for {output_file}: {result.reason}") 71 | return None 72 | 73 | except Exception as e: 74 | log.log_error(f"Error creating speech for {output_file}: {str(e)}") 75 | log.log_error(f"Problematic SSML: '{ssml[:100]}...'") # 
log.log_debug first 100 characters of SSML for debugging 76 | return None 77 | 78 | def convert_wav_to_mp3(wav_file: Path, mp3_file: Path): 79 | try: 80 | audio = AudioSegment.from_wav(str(wav_file)) 81 | audio.export(str(mp3_file), format="mp3") 82 | log.log_debug(f"Converted {wav_file} to {mp3_file}") 83 | return mp3_file 84 | except Exception as e: 85 | log.log_error(f"Error converting WAV to MP3: {str(e)}") 86 | return None 87 | 88 | # async def text_to_speech(script: str, output_path: Path) -> tuple: 89 | # Convert the entire script to SSML 90 | # full_ssml = markdown_to_ssml(script) 91 | 92 | # # remove asterisks 93 | # full_ssml = full_ssml.replace('*', '') 94 | 95 | # wav_file = output_path / "full_script.wav" 96 | # mp3_file = output_path / "full_script.mp3" 97 | 98 | # wav_result = create_speech_from_ssml(full_ssml, wav_file) 99 | 100 | # if wav_result: 101 | # mp3_result = convert_wav_to_mp3(wav_file, mp3_file) 102 | # if mp3_result: 103 | # log.log_debug(f"Processed full script and saved to {mp3_file}") 104 | # os.remove(wav_file) # Remove the temporary WAV file 105 | # return full_ssml, mp3_file 106 | # else: 107 | # return full_ssml, wav_file # Return WAV file if MP3 conversion fails 108 | # else: 109 | # return full_ssml, None 110 | 111 | async def text_to_speech(script: str, output_path: str): 112 | # Convert output_path to Path object 113 | output_path = Path(output_path) 114 | 115 | # Make sure the directory exists 116 | output_path.parent.mkdir(parents=True, exist_ok=True) 117 | 118 | # Split the script into lines 119 | lines = script.split('\n') 120 | 121 | # Initialize variables 122 | current_speaker = "" 123 | current_text = "" 124 | audio_segments = [] 125 | line_number = 0 126 | 127 | for line in lines: 128 | line_number += 1 129 | line = line.strip() 130 | 131 | # Check for speaker lines in both formats 132 | if ':' in line and (line.startswith('**') or any(name in line.split(':')[0] for name in ['Alex', 'Jane'])): 133 | # New speaker 134 | 
if current_speaker and current_text.strip(): 135 | audio_segments.append( 136 | (current_speaker, current_text.strip(), line_number - 1)) 137 | current_speaker = line.split(':')[0].strip('* ') 138 | current_text = line.split(':', 1)[1].strip() + " " 139 | elif line and not line.startswith('[') and not line.startswith('#'): 140 | current_text += line + " " 141 | 142 | # Add the last segment 143 | if current_speaker and current_text.strip(): 144 | audio_segments.append( 145 | (current_speaker, current_text.strip(), line_number)) 146 | 147 | # Create audio files for each segment 148 | temp_segments = [] 149 | for i, (speaker, text, line_number) in enumerate(audio_segments): 150 | if 'Alex' in speaker: 151 | voice = "en-US-AndrewMultilingualNeural" 152 | elif 'Jane' in speaker: 153 | voice = "en-US-AvaMultilingualNeural" 154 | else: 155 | voice = "en-US-BrandonMultilingualNeural" # Default voice 156 | 157 | clean_text = text.strip() 158 | 159 | if not clean_text: 160 | log.log_debug(f"Warning: Empty cleaned text for segment {i} (starting at line {line_number}). 
Original text: '{text}'") 161 | continue 162 | 163 | # Create segment filename using parent directory of output_path 164 | segment_file = output_path.parent / f"segment_{i:03d}.wav" 165 | create_speech(clean_text, voice, str(segment_file)) 166 | temp_segments.append(segment_file) 167 | log.log_debug(f"Created {segment_file} for {speaker}: {clean_text[:50]}...") 168 | 169 | # Add a short pause between segments 170 | time.sleep(0.5) 171 | 172 | log.log_debug(f"Processed {len(audio_segments)} segments.") 173 | return output_path 174 | 175 | # Example usage: 176 | # output_path = Path("path/to/output/directory") 177 | # await text_to_speech(your_script, output_path) -------------------------------------------------------------------------------- /ai_helper/script_generator.py: -------------------------------------------------------------------------------- 1 | from ai_helper.ai_helper import generate_content_from_openai 2 | from logger import CustomLogger 3 | 4 | log = CustomLogger("ScriptGenerator", log_file="script_generator.log") 5 | 6 | ONE_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = """ 7 | You are an expert Podcaster tasked with creating a full podcast script based on the provided outline. 8 | The podcast is a deep dive educational show featuring a single host named Alex. 9 | 10 | The Podcast name is "MyPodify". 11 | 12 | Alex is a charismatic and knowledgeable host with a background in journalism and a passion for research. They have a conversational style that combines the storytelling flair of Steve Jobs, the entrepreneurial insight of Richard Branson, the scientific curiosity of Neil deGrasse Tyson, and the observational wisdom of Jane Goodall. 13 | 14 | - Begin with a brief introduction of the host and the topic. 15 | - Structure the content as an engaging monologue that feels like a conversation with the listener. 16 | - Present factual information, statistics, and historical context related to the chosen topic. 
17 | - Expand the outline into a natural, engaging narrative. 18 | - Ensure the script covers all points in the outline thoroughly. 19 | - Add relevant examples, anecdotes, or case studies to illustrate key points. 20 | - Incorporate personal anecdotes, opinions, and experiences to make the content relatable. 21 | - Use a conversational, informal tone throughout the podcast. 22 | - Include some humor and light-hearted moments to keep the listener engaged. 23 | - Incorporate smooth transitions between main points. 24 | - Add opening and closing remarks, including a teaser for the next episode. 25 | - Use Punctuation and Capitalization as this will be converted to speech. 26 | 27 | --- 28 | 29 | Pacing and Flow: 30 | 31 | - Start with a hook or interesting fact to grab the audience's attention. 32 | - Gradually build up the information, starting with basic concepts and progressing to more complex ideas. 33 | - Include natural transitions between subtopics. 34 | - Periodically summarize key points to reinforce important information. 35 | - Use rhetorical questions or hypothetical scenarios to engage the listener. 36 | 37 | --- 38 | 39 | Engaging the Audience: 40 | 41 | - Address the listeners directly, making them feel part of the conversation. 42 | - Pose questions for the audience to ponder. 43 | - Encourage listeners to share their thoughts or experiences on social media or the podcast's website. 44 | - Incorporate listener feedback or questions from previous episodes when relevant. 45 | 46 | --- 47 | 48 | Intro Example: 49 | **Alex**: "Hello and welcome to MyPodify, I'm your host Alex. Today, we're diving into [Topic]. But before we get started, let me share a quick story that happened to me this morning..." 
50 | 51 | --- 52 | 53 | Generate a full podcast script based on the outline provided (15-20 minutes) 54 | 55 | Expected Markdown Output Format: 56 | **Alex:** [Content] 57 | """ 58 | 59 | TWO_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = """ 60 | You are an expert Podcaster tasked with creating a full podcast script based on the provided outline. 61 | The podcast is a deep dive educational show featuring two hosts: Alex and Jane. 62 | 63 | The Podcast name is "MyPodify". 64 | 65 | Alex is a journalist with a knack for storytelling, with a conversation style of Steve Jobs and Richard Branson. 66 | Jane is a researcher with a passion for facts and figures, and she loves to share interesting anecdotes. She has a conversation style of Neil deGrasse Tyson and Jane Goodall. 67 | 68 | - Begin with a brief introduction of the hosts and the topic. 69 | - Structure the content as a casual conversation between the two hosts. 70 | - Include natural back-and-forth dialogue, with hosts building on each other's points. 71 | - Present factual information, statistics, and historical context related to the chosen topic. 72 | - Expand the outline into a natural, engaging conversation between Alex and Jane. 73 | - Ensure the script covers all points in the outline thoroughly. 74 | - Add relevant examples, anecdotes, or case studies to illustrate key points. 75 | - Incorporate personal anecdotes, opinions, and experiences from the hosts to make the content relatable. 76 | - Use a conversational, informal tone throughout the podcast. 77 | - Include some humor and light-hearted moments, such as jokes or playful banter between hosts. 78 | - Incorporate smooth transitions between main points. 79 | - Add opening and closing remarks, including the teaser for the next episode. 80 | - Use Punctuation and Capitalization as this will be converted to speech. 81 | --- 82 | 83 | Pacing and Flow: 84 | 85 | - Start with a hook or interesting fact to grab the audience's attention. 
86 | - Gradually build up the information, starting with basic concepts and progressing to more complex ideas. 87 | - Include natural transitions between subtopics. 88 | - Periodically summarize key points to reinforce important information. 89 | 90 | --- 91 | 92 | Interaction between Hosts: 93 | 94 | - Create distinct personalities for each host, with one potentially being more knowledgeable about the topic. 95 | - Include instances where hosts ask each other questions or seek clarification. 96 | - Allow for occasional disagreements or different perspectives between hosts. 97 | - Incorporate moments where hosts compliment each other's insights or build on each other's ideas. 98 | 99 | --- 100 | 101 | Intro Example: 102 | **Alex**: "Hello and welcome to MyPodify, I am your host Alex, and I am joined by my co-host Jane." 103 | **Jane**: "Hi everyone, it's good to be back!" 104 | **Alex**: "Let's talk about [Topic] today! Before we get started, Jane, how has your day been so far?" 105 | 106 | --- 107 | 108 | Generate a full podcast script based on the outline provided (15-20 minutes) 109 | 110 | Expected Markdown Output Format: 111 | **Alex:** [Content] 112 | **Jane:** [Content] 113 | **Guest:** [Content] 114 | """ 115 | 116 | 117 | async def generate_podcast_script(outline: str, analysis: str, host_count: int) -> str: 118 | try: 119 | log.log_info("Generating podcast script") 120 | 121 | if host_count == 1: 122 | PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = ONE_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS 123 | else:  # any host count other than 1 falls back to the two-host script, so the variable is always bound 124 | PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = TWO_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS 125 | 126 | final_content = f"{outline}\n\nContent Details: {analysis}" 127 | 128 | script = generate_content_from_openai(content=final_content, system_instructions=PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS, purpose="Podcast Script") 129 | 130 | return script 131 | except Exception as e: 132 | log.log_error(f"Error generating podcast script: {e}") 133 | raise
-------------------------------------------------------------------------------- /document_processor.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from azure.core.credentials import AzureKeyCredential 4 | from azure.ai.formrecognizer import DocumentAnalysisClient 5 | import docx 6 | from typing import Optional 7 | from dotenv import load_dotenv 8 | 9 | from logger import CustomLogger 10 | 11 | log = CustomLogger("DocumentProcessor", log_file="document_processor.log") 12 | 13 | load_dotenv() 14 | 15 | class UnsupportedFileTypeError(Exception): 16 | pass 17 | 18 | def get_file_type(file_path: str) -> str: 19 | """Determine file type from extension.""" 20 | return Path(file_path).suffix.lower() 21 | 22 | def read_txt_file(file_path: str) -> str: 23 | """Read content from a text file.""" 24 | try: 25 | with open(file_path, 'r', encoding='utf-8') as file: 26 | return file.read() 27 | except UnicodeDecodeError: 28 | # Try different encodings if UTF-8 fails 29 | encodings = ['latin-1', 'cp1252', 'iso-8859-1'] 30 | for encoding in encodings: 31 | try: 32 | with open(file_path, 'r', encoding=encoding) as file: 33 | return file.read() 34 | except UnicodeDecodeError: 35 | continue 36 | raise UnicodeError("Failed to decode text file with multiple encodings")  # UnicodeDecodeError requires five constructor arguments, so raise its parent class instead 37 | 38 | def read_docx_file(file_path: str) -> str: 39 | """Extract text from a DOCX file.""" 40 | doc = docx.Document(file_path) 41 | full_text = [] 42 | 43 | # Extract text from paragraphs 44 | for paragraph in doc.paragraphs: 45 | if paragraph.text.strip(): 46 | full_text.append(paragraph.text) 47 | 48 | # Extract text from tables 49 | for table in doc.tables: 50 | for row in table.rows: 51 | for cell in row.cells: 52 | if cell.text.strip(): 53 | full_text.append(cell.text) 54 | 55 | return '\n'.join(full_text) 56 | 57 | def read_pdf_with_azure(file_path: str, endpoint: Optional[str] = None, key: Optional[str] = None) -> str: 58
| """Extract text from PDF using Azure Document Intelligence.""" 59 | try: 60 | # Use provided credentials or fall back to environment variables 61 | endpoint = endpoint or os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT") 62 | key = key or os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY") 63 | 64 | if not endpoint or not key: 65 | raise ValueError("Azure Document Intelligence credentials not configured") 66 | 67 | document_analysis_client = DocumentAnalysisClient( 68 | endpoint=endpoint, credential=AzureKeyCredential(key) 69 | ) 70 | 71 | with open(file_path, "rb") as file: 72 | document_bytes = file.read() 73 | 74 | poller = document_analysis_client.begin_analyze_document( 75 | "prebuilt-read", document_bytes 76 | ) 77 | result = poller.result() 78 | return result.content 79 | except Exception as e: 80 | log.log_error(f"Azure PDF processing failed: {str(e)}") 81 | raise 82 | 83 | async def analyze_document(file_path: str) -> str: 84 | """ 85 | Analyze document from various file formats. 86 | Supports PDF, DOC, DOCX, and TXT files. 
87 | 88 | Args: 89 | file_path: Path to the local file 90 | Returns: 91 | Extracted text content from the document 92 | """ 93 | try: 94 | # Verify file exists 95 | if not os.path.exists(file_path): 96 | raise FileNotFoundError(f"File not found: {file_path}") 97 | 98 | file_path = str(Path(file_path)) # Normalize path 99 | file_type = get_file_type(file_path) 100 | 101 | if file_type not in ['.pdf', '.doc', '.docx', '.txt']: 102 | raise UnsupportedFileTypeError(f"Unsupported file type: {file_type}") 103 | 104 | log.log_info(f"Processing {file_type} file: {file_path}") 105 | 106 | # Process based on file type 107 | if file_type == '.txt': 108 | log.log_info("Processing TXT file") 109 | full_text = read_txt_file(file_path) 110 | 111 | elif file_type in ['.doc', '.docx']: 112 | log.log_info("Processing DOC/DOCX file") 113 | if file_type == '.doc': 114 | # For now, we'll raise an error for .doc files 115 | # You might want to add doc to docx conversion here 116 | raise UnsupportedFileTypeError("DOC format is not supported, please convert to DOCX") 117 | full_text = read_docx_file(file_path) 118 | 119 | elif file_type == '.pdf': 120 | log.log_info("Processing PDF file using Azure Document Intelligence") 121 | full_text = read_pdf_with_azure(file_path) 122 | 123 | # Post-processing 124 | if not full_text: 125 | raise ValueError("No text content extracted from document") 126 | 127 | # Remove excessive whitespace and normalize line endings 128 | full_text = '\n'.join(line.strip() for line in full_text.splitlines() if line.strip()) 129 | log.log_info(f"Successfully extracted {len(full_text)} characters from {file_type} file") 130 | 131 | return full_text 132 | 133 | except UnsupportedFileTypeError as e: 134 | log.log_error(f"Unsupported file type error: {str(e)}") 135 | raise 136 | 137 | except Exception as e: 138 | log.log_error(f"Error analyzing document: {str(e)}") 139 | raise 140 | 141 | def validate_file_type(file_path: str) -> bool: 142 | """ 143 | Validate if the 
file type is supported. 144 | Returns True if supported, False otherwise. 145 | """ 146 | allowed_extensions = {'.pdf', '.doc', '.docx', '.txt'} 147 | return get_file_type(file_path) in allowed_extensions 148 | 149 | async def get_document_metadata(file_path: str) -> dict: 150 | """ 151 | Get metadata about the local document. 152 | Returns a dictionary with file type, size, and other relevant information. 153 | """ 154 | try: 155 | file_stat = os.stat(file_path) 156 | metadata = { 157 | "file_type": get_file_type(file_path), 158 | "size": file_stat.st_size, 159 | "created_at": file_stat.st_ctime, 160 | "updated_at": file_stat.st_mtime, 161 | "path": str(Path(file_path).absolute()) 162 | } 163 | return metadata 164 | except Exception as e: 165 | log.log_error(f"Error getting document metadata: {str(e)}") 166 | raise -------------------------------------------------------------------------------- /examples/example_1.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shagunmistry/NotebookLM_Alternative/ce8e1b1d51e84bc3ae647685fcac6d28c40e711a/examples/example_1.mp3 -------------------------------------------------------------------------------- /examples/example_2.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shagunmistry/NotebookLM_Alternative/ce8e1b1d51e84bc3ae647685fcac6d28c40e711a/examples/example_2.mp3 -------------------------------------------------------------------------------- /helpers.py: -------------------------------------------------------------------------------- 1 | 2 | from typing import Annotated 3 | 4 | 5 | from logger import CustomLogger 6 | import aiohttp 7 | from bs4 import BeautifulSoup 8 | import asyncio 9 | from cachetools import TTLCache 10 | import aiodns 11 | import time 12 | 13 | logger = CustomLogger(name="MyPodify_Helpers", log_file="mypodify_helpers.log") 14 | 15 | # Cache for storing fetched 
content (1000 items, 1 hour TTL) 16 | content_cache = TTLCache(maxsize=1000, ttl=3600) 17 | 18 | MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB 19 | ALLOWED_EXTENSIONS = {'pdf'} 20 | 21 | # Rate limiting parameters 22 | rate_limit = 10 # requests per second 23 | last_request_time = time.time() 24 | request_count = 0 25 | 26 | async def get_website_content(website_link: str, timeout: int = 10) -> str | None: 27 | global last_request_time, request_count 28 | 29 | # Check cache first 30 | if website_link in content_cache: 31 | return content_cache[website_link] 32 | 33 | # Implement rate limiting 34 | current_time = time.time() 35 | if current_time - last_request_time >= 1: 36 | last_request_time = current_time 37 | request_count = 0 38 | if request_count >= rate_limit: 39 | await asyncio.sleep(1) 40 | return await get_website_content(website_link, timeout) 41 | request_count += 1 42 | 43 | try: 44 | resolver = aiodns.DNSResolver()  # note: created but not used below 45 | async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:  # ssl=False disables certificate verification 46 | async with session.get(website_link, timeout=timeout, allow_redirects=True) as response: 47 | if response.status == 200: 48 | html_content = await response.text() 49 | soup = BeautifulSoup(html_content, 'html.parser') 50 | 51 | # Remove script and style elements 52 | for script in soup(["script", "style"]): 53 | script.decompose() 54 | 55 | # Get text content 56 | text = soup.get_text() 57 | 58 | # Clean up text 59 | lines = (line.strip() for line in text.splitlines()) 60 | chunks = (phrase.strip() for line in lines for phrase in line.split("  ")) 61 | text = '\n'.join(chunk for chunk in chunks if chunk) 62 | 63 | # Cache the result 64 | content_cache[website_link] = text 65 | 66 | return text 67 | else: 68 | logger.log_error(f"Failed to fetch content. 
Status code: {response.status}")  # log and fall through to an implicit None, consistent with the other failure paths 69 | except asyncio.TimeoutError: 70 | logger.log_error(f"Timeout fetching website content: {website_link}") 71 | return None 72 | except aiohttp.ClientError as e: 73 | logger.log_error(f"Error fetching website content: {str(e)}") 74 | return None 75 | except Exception as e: 76 | logger.log_error(f"Unknown error fetching website content: {str(e)}") 77 | return None -------------------------------------------------------------------------------- /logger.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | from colorama import Fore, Style, init 4 | 5 | # Initialize colorama for cross-platform colored output 6 | init(autoreset=True) 7 | 8 | class CustomLogger: 9 | def __init__(self, name, log_file=None): 10 | self.logger = logging.getLogger(name) 11 | self.logger.setLevel(logging.DEBUG) 12 | 13 | # Create formatter 14 | formatter = logging.Formatter( 15 | '%(asctime)s - %(name)s - %(levelname)s - %(message)s' 16 | ) 17 | 18 | # Create console handler with coloring 19 | console_handler = ColoredConsoleHandler() 20 | console_handler.setLevel(logging.INFO) 21 | console_handler.setFormatter(formatter) 22 | self.logger.addHandler(console_handler) 23 | 24 | # Create file handler if log_file is provided 25 | if log_file: 26 | log_file = os.path.join('logs', log_file) 27 | self._setup_file_handler(log_file, formatter) 28 | 29 | def _setup_file_handler(self, log_file, formatter): 30 | log_dir = os.path.dirname(log_file) 31 | if not os.path.exists(log_dir): 32 | os.makedirs(log_dir) 33 | file_handler = logging.FileHandler(log_file) 34 | file_handler.setLevel(logging.DEBUG) 35 | file_handler.setFormatter(formatter) 36 | self.logger.addHandler(file_handler) 37 | 38 | def log_debug(self, message): 39 | self.logger.debug(message) 40 | 41 | def log_info(self, message): 42 | self.logger.info(message) 43 | 44 | def log_warning(self, message): 45 | self.logger.warning(message) 46 | 47 | 
def log_error(self, message): 48 | self.logger.error(message) 49 | 50 | def log_critical(self, message): 51 | self.logger.critical(message) 52 | 53 | def log_exception(self, message): 54 | self.logger.exception(message) 55 | 56 | def log_api_request(self, method, path, status_code, response_time): 57 | self.logger.info( 58 | f"API Request - Method: {method}, Path: {path}, " 59 | f"Status: {status_code}, Response Time: {response_time:.2f}s" 60 | ) 61 | 62 | def log_db_query(self, query, execution_time): 63 | self.logger.debug( 64 | f"Database Query - Query: {query}, " 65 | f"Execution Time: {execution_time:.2f}s" 66 | ) 67 | 68 | def log_user_action(self, user_id, action): 69 | self.logger.info(f"User Action - User ID: {user_id}, Action: {action}") 70 | 71 | 72 | class ColoredConsoleHandler(logging.StreamHandler): 73 | COLORS = { 74 | logging.DEBUG: Fore.CYAN, 75 | logging.INFO: Fore.GREEN, 76 | logging.WARNING: Fore.YELLOW, 77 | logging.ERROR: Fore.RED, 78 | logging.CRITICAL: Fore.RED + Style.BRIGHT, 79 | } 80 | 81 | def emit(self, record): 82 | color = self.COLORS.get(record.levelno, Fore.WHITE) 83 | message = self.format(record) 84 | print(f"{color}{message}{Style.RESET_ALL}") 85 | 86 | 87 | # Usage example 88 | # logger = CustomLogger("MyApp", log_file="logs/app.log") 89 | # logger.log_info("This is an info message") 90 | # logger.log_warning("This is a warning message") 91 | # logger.log_error("This is an error message") 92 | # logger.log_critical("This is a critical message") 93 | # logger.log_api_request("GET", "/api/users", 200, 0.05) 94 | # logger.log_db_query("SELECT * FROM users", 0.02) 95 | # logger.log_user_action("user123", "Logged in") -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import json 4 | from pathlib import Path 5 | import asyncio 6 | from datetime import datetime 7 | from typing 
import List, Dict 8 | 9 | from document_processor import analyze_document, UnsupportedFileTypeError 10 | from logger import CustomLogger 11 | from ai_helper.generate_outline import generate_podcast_outline 12 | from ai_helper.script_generator import generate_podcast_script 13 | from ai_helper.generate_speech import text_to_speech 14 | from utils.combine_audio import combine_audio_files 15 | 16 | # Set up logging 17 | log = CustomLogger("PodcastGenerator", log_file="podcast_generator.log") 18 | 19 | class PodcastGenerator: 20 | def __init__(self, input_dir: str, output_dir: str, project_name: str, 21 | host_count: int = 2, description: str = ""): 22 | self.input_dir = Path(input_dir) 23 | self.output_dir = Path(output_dir) 24 | self.project_name = project_name 25 | self.host_count = host_count 26 | self.description = description 27 | self.project_dir = self.output_dir / self.sanitize_filename(project_name) 28 | 29 | # Create output directories 30 | self.project_dir.mkdir(parents=True, exist_ok=True) 31 | (self.project_dir / "documents").mkdir(exist_ok=True) 32 | (self.project_dir / "outlines").mkdir(exist_ok=True) 33 | (self.project_dir / "scripts").mkdir(exist_ok=True) 34 | (self.project_dir / "audio").mkdir(exist_ok=True) 35 | 36 | @staticmethod 37 | def sanitize_filename(filename: str) -> str: 38 | """Convert string to valid filename.""" 39 | return "".join(c for c in filename if c.isalnum() or c in (' ', '-', '_')).strip() 40 | 41 | async def process_documents(self) -> List[str]: 42 | """Process all documents in the input directory.""" 43 | document_contents = [] 44 | 45 | if not self.input_dir.exists(): 46 | raise FileNotFoundError(f"Input directory not found: {self.input_dir}") 47 | 48 | for file_path in self.input_dir.rglob('*'): 49 | if file_path.is_file(): 50 | try: 51 | log.log_info(f"Processing document: {file_path}") 52 | content = await analyze_document(str(file_path)) 53 | document_contents.append(content) 54 | 55 | # Save processed content 56 | 
output_path = self.project_dir / "documents" / f"{file_path.stem}_processed.txt" 57 | with open(output_path, 'w', encoding='utf-8') as f: 58 | f.write(content) 59 | 60 | except UnsupportedFileTypeError as e: 61 | log.log_error(f"Skipping unsupported file {file_path}: {str(e)}") 62 | except Exception as e: 63 | log.log_error(f"Error processing file {file_path}: {str(e)}") 64 | 65 | return document_contents 66 | 67 | async def generate_outline(self, document_contents: List[str]) -> str: 68 | """Generate podcast outline from document contents.""" 69 | log.log_info("Generating podcast outline") 70 | 71 | log.log_debug(document_contents[0][:100]) 72 | 73 | # Combine all document contents with description 74 | combined_content = "\n\n".join([self.description] + document_contents) 75 | 76 | # Generate outline 77 | outline = await generate_podcast_outline(combined_content, self.host_count) 78 | 79 | # Save outline 80 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 81 | outline_path = self.project_dir / "outlines" / f"outline_{timestamp}.md" 82 | with open(outline_path, 'w', encoding='utf-8') as f: 83 | f.write(outline) 84 | 85 | return outline 86 | 87 | async def generate_script(self, outline: str, document_contents: List[str]) -> str: 88 | """Generate podcast script from outline and document contents.""" 89 | log.log_info("Generating podcast script") 90 | 91 | # Combine all document contents 92 | combined_content = "\n\n".join(document_contents) 93 | 94 | # Generate script 95 | script = await generate_podcast_script(outline, combined_content, self.host_count) 96 | 97 | # Save script 98 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 99 | script_path = self.project_dir / "scripts" / f"script_{timestamp}.md" 100 | with open(script_path, 'w', encoding='utf-8') as f: 101 | f.write(script) 102 | 103 | return script 104 | 105 | async def generate_audio(self, script: str) -> str: 106 | """Generate audio from script.""" 107 | log.log_info("Generating audio file") 
108 | 109 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 110 | 111 | # Convert project_dir to Path if it isn't already 112 | project_dir = Path(self.project_dir) 113 | 114 | # Create the audio directory path 115 | audio_dir = project_dir / "audio" 116 | 117 | # Create the audio directory if it doesn't exist 118 | audio_dir.mkdir(parents=True, exist_ok=True) 119 | 120 | # Create the full audio file path 121 | audio_path = audio_dir / f"podcast_{timestamp}.mp3" 122 | 123 | await text_to_speech(script, str(audio_path)) 124 | 125 | return str(audio_path) 126 | 127 | async def generate_podcast(self) -> Dict: 128 | """Generate complete podcast from documents.""" 129 | try: 130 | # Process all documents 131 | document_contents = await self.process_documents() 132 | if not document_contents: 133 | raise ValueError("No valid documents found to process") 134 | 135 | log.log_info(f"Processed {len(document_contents)} documents") 136 | 137 | # Generate outline 138 | outline = await self.generate_outline(document_contents) 139 | 140 | log.log_debug(outline[:100]) 141 | 142 | # Generate script 143 | script = await self.generate_script(outline, document_contents) 144 | 145 | # Generate audio 146 | audio_path = await self.generate_audio(script) 147 | 148 | # Get all segment files 149 | audio_dir = Path(self.project_dir) / "audio" 150 | segment_files = sorted(list(audio_dir.glob("segment_*.wav"))) 151 | 152 | if not segment_files: 153 | raise ValueError("No audio segments found to combine") 154 | 155 | # Combine all the audio files into a single file 156 | audio_combined_path = audio_dir / "podcast_combined.mp3" 157 | combine_audio_files(segment_files, audio_combined_path) 158 | 159 | # Save project metadata 160 | metadata = { 161 | "project_name": self.project_name, 162 | "host_count": self.host_count, 163 | "description": self.description, 164 | "timestamp": datetime.now().isoformat(), 165 | "input_directory": str(self.input_dir), 166 | "output_directory": 
str(self.project_dir), 167 | "audio_segments": [str(f) for f in segment_files], 168 | "audio_combined_file": str(audio_combined_path), 169 | } 170 | 171 | metadata_path = Path(self.project_dir) / "metadata.json" 172 | with open(metadata_path, 'w') as f: 173 | json.dump(metadata, f, indent=2) 174 | 175 | return metadata 176 | 177 | except Exception as e: 178 | log.log_error(f"Error generating podcast: {str(e)}") 179 | raise 180 | 181 | def main(): 182 | input_dir = 'my_docs' 183 | output_dir = input("Enter the output directory for generated files (default: output): ") or 'output' 184 | project_name = input("Enter the name of the project: ") 185 | host_count = input("Enter the number of podcast hosts (default: 2): ") or 2 186 | description = input("Enter the project description (optional): ") 187 | 188 | try: 189 | generator = PodcastGenerator( 190 | input_dir, 191 | output_dir, 192 | project_name, 193 | int(host_count), 194 | description 195 | ) 196 | 197 | metadata = asyncio.run(generator.generate_podcast()) 198 | log.log_info("Podcast generation completed successfully!") 199 | log.log_info(f"Output files are in: {metadata['output_directory']}") 200 | 201 | except Exception as e: 202 | log.log_error(f"Error: {str(e)}") 203 | exit(1) 204 | 205 | if __name__ == "__main__": 206 | main() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | annotated-types==0.7.0 2 | anyio==4.7.0 3 | azure-ai-formrecognizer==3.3.3 4 | azure-cognitiveservices-speech==1.41.1 5 | azure-common==1.1.28 6 | azure-core==1.32.0 7 | beautifulsoup4==4.12.3 8 | certifi==2024.12.14 9 | charset-normalizer==3.4.0 10 | colorama==0.4.6 11 | distro==1.9.0 12 | groq==0.13.1 13 | h11==0.14.0 14 | httpcore==1.0.7 15 | httpx==0.27.2 16 | idna==3.10 17 | isodate==0.7.2 18 | jiter==0.8.2 19 | joblib==1.4.2 20 | lxml==5.3.0 21 | msrest==0.7.1 22 | numpy==2.2.0 23 | oauthlib==3.2.2 
24 | ollama==0.4.4 25 | openai==1.58.1 26 | pydantic==2.10.4 27 | pydantic_core==2.27.2 28 | pydub==0.25.1 29 | PyPDF2==3.0.1 30 | python-docx==1.1.2 31 | python-dotenv==1.0.1 32 | regex==2024.11.6 33 | requests==2.32.3 34 | requests-oauthlib==2.0.0 35 | scikit-learn==1.6.0 36 | scipy==1.14.1 37 | setuptools==75.1.0 38 | six==1.17.0 39 | sniffio==1.3.1 40 | soupsieve==2.6 41 | threadpoolctl==3.5.0 42 | tiktoken==0.8.0 43 | tqdm==4.67.1 44 | typing_extensions==4.12.2 45 | urllib3==2.2.3 46 | wheel==0.44.0 47 | -------------------------------------------------------------------------------- /utils/combine_audio.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from pydub import AudioSegment 4 | from typing import List, Union 5 | from logger import CustomLogger 6 | 7 | log = CustomLogger("CombineAudio", log_file="combine_audio.log") 8 | 9 | def combine_audio_files(input_files: List[Union[str, Path]], output_file: Union[str, Path]) -> str: 10 | """ 11 | Combine multiple audio files into a single MP3 file. 
12 | 13 | Args: 14 | input_files: List of paths to input audio files 15 | output_file: Path where the combined audio should be saved 16 | """ 17 | log.log_debug("Starting audio combination process...") 18 | 19 | # Convert all paths to Path objects 20 | input_files = [Path(f) for f in input_files] 21 | output_file = Path(output_file) 22 | 23 | # Ensure output directory exists 24 | output_file.parent.mkdir(parents=True, exist_ok=True) 25 | 26 | # Initialize an empty AudioSegment 27 | log.log_debug("Initializing an empty AudioSegment...") 28 | combined = AudioSegment.empty() 29 | 30 | # Iterate through the input files 31 | for file_path in input_files: 32 | try: 33 | log.log_debug(f"Processing file: {file_path}") 34 | 35 | # Load the audio file (handle both wav and mp3) 36 | if file_path.suffix.lower() == '.wav': 37 | audio = AudioSegment.from_wav(str(file_path)) 38 | elif file_path.suffix.lower() == '.mp3': 39 | audio = AudioSegment.from_mp3(str(file_path)) 40 | else: 41 | log.log_warning(f"Unsupported file format: {file_path}") 42 | continue 43 | 44 | # Add it to the combined AudioSegment 45 | combined += audio 46 | log.log_debug(f"Added: {file_path}") 47 | 48 | except Exception as e: 49 | log.log_error(f"Error processing {file_path}: {str(e)}") 50 | continue 51 | 52 | if len(combined) == 0: 53 | raise ValueError("No audio files were successfully combined") 54 | 55 | # Export the combined audio to a file 56 | log.log_debug(f"Exporting combined audio to {output_file}...") 57 | combined.export(str(output_file), format="mp3") 58 | log.log_debug(f"Combined audio saved as: {output_file}") 59 | 60 | return str(output_file) --------------------------------------------------------------------------------
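For quick reference, the `sanitize_filename` helper in `main.py` decides which characters survive in the project directory name; below is a minimal standalone sketch of that same whitelist logic (the function body is copied from the source, the example input is hypothetical):

```python
def sanitize_filename(filename: str) -> str:
    # Keep only alphanumerics plus space, hyphen, and underscore,
    # then strip leading/trailing whitespace (mirrors main.py).
    return "".join(c for c in filename if c.isalnum() or c in (' ', '-', '_')).strip()

print(sanitize_filename("My Podcast: Episode #1!"))  # -> My Podcast Episode 1
```

Note that an input consisting solely of disallowed characters collapses to an empty string, so callers may want to guard against an empty project name.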