├── .gitignore
├── CONTRIBUTING.md
├── README.md
├── ai_helper
│   ├── ai_helper.py
│   ├── generate_outline.py
│   ├── generate_speech.py
│   └── script_generator.py
├── document_processor.py
├── examples
│   ├── example_1.mp3
│   └── example_2.mp3
├── helpers.py
├── logger.py
├── main.py
├── requirements.txt
└── utils
    └── combine_audio.py
/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | logs 3 | 4 | output 5 | 6 | .env 7 | .env.* 8 | 9 | my_docs/* -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to MyPodify 2 | 3 | Thank you for your interest in contributing to MyPodify! This document provides guidelines and instructions for contributing to the project. 4 | 5 | ## Code of Conduct 6 | 7 | By participating in this project, you agree to maintain a respectful and inclusive environment. We expect all contributors to: 8 | - Use welcoming and inclusive language 9 | - Be respectful of differing viewpoints and experiences 10 | - Gracefully accept constructive criticism 11 | - Focus on what is best for the community 12 | - Show empathy towards other community members 13 | 14 | ## Getting Started 15 | 16 | 1. Fork the repository 17 | 2. Clone your fork: 18 | ```bash 19 | git clone https://github.com/your-username/mypodify.git 20 | cd mypodify 21 | ``` 22 | 3. Set up your development environment: 23 | ```bash 24 | # Create and activate virtual environment 25 | python -m venv venv 26 | source venv/bin/activate # On Windows: venv\Scripts\activate 27 | 28 | # Install dependencies 29 | pip install -r requirements.txt 30 | ``` 31 | 32 | 4. 
Create a new branch for your feature/fix: 33 | ```bash 34 | git checkout -b feature/your-feature-name 35 | ``` 36 | 37 | ## Development Guidelines 38 | 39 | ### Code Style 40 | 41 | - Follow PEP 8 style guidelines 42 | - Use type hints for function parameters and return values 43 | - Use meaningful variable and function names 44 | - Include docstrings for classes and functions 45 | - Keep functions focused and modular 46 | 47 | ### Directory Structure 48 | - Place new AI helper functions in `ai_helper/` 49 | - Add utility functions to `utils/` 50 | - Update documentation when adding new features 51 | 52 | ### Documentation 53 | 54 | - Update README.md if adding new features or changing functionality 55 | - Include docstrings for new functions and classes 56 | - Comment complex logic or non-obvious implementations 57 | - Update requirements.txt if adding new dependencies 58 | 59 | ## Making Changes 60 | 61 | 1. Make your changes in your feature branch 62 | 2. Write or update tests as needed 63 | 3. Run the test suite to ensure everything passes 64 | 4. Update documentation as necessary 65 | 5. Commit your changes with clear, descriptive commit messages: 66 | ```bash 67 | git commit -m "feat: add support for MP3 file processing" 68 | ``` 69 | 70 | ### Commit Message Guidelines 71 | 72 | Follow the conventional commits specification: 73 | - `feat:` New feature 74 | - `fix:` Bug fix 75 | - `docs:` Documentation changes 76 | - `style:` Code style changes (formatting, etc.) 77 | - `refactor:` Code refactoring 78 | - `test:` Adding or updating tests 79 | - `chore:` Maintenance tasks 80 | 81 | ## Submitting Changes 82 | 83 | 1. Push your changes to your fork: 84 | ```bash 85 | git push origin feature/your-feature-name 86 | ``` 87 | 88 | 2. 
Create a Pull Request: 89 | - Go to the original repository 90 | - Click "New Pull Request" 91 | - Choose your fork and feature branch 92 | - Fill out the PR template 93 | 94 | ### Pull Request Guidelines 95 | 96 | - Provide a clear description of the changes 97 | - Link any related issues 98 | - Include screenshots for UI changes 99 | - List any breaking changes 100 | - Update documentation as needed 101 | - Ensure CI checks pass 102 | - Request review from maintainers 103 | 104 | ## Additional Resources 105 | 106 | ### Setting Up Local Development 107 | 108 | 1. Copy the example environment file: 109 | ```bash 110 | cp .env.example .env 111 | ``` 112 | 113 | 2. Configure your environment variables: 114 | ```env 115 | AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=your_endpoint 116 | AZURE_DOCUMENT_INTELLIGENCE_KEY=your_key 117 | AZURE_SPEECH_KEY=your_speech_key 118 | AZURE_SPEECH_REGION=your_region 119 | OPENAI_API_KEY=your_openai_key 120 | ``` 121 | 122 | ### Testing Files 123 | 124 | Place test documents in the `my_docs/` directory for testing your changes. 125 | 126 | ## Questions or Need Help? 127 | 128 | - Create an issue for bugs or feature requests 129 | - Join our community discussions 130 | - Contact the maintainers 131 | 132 | ## License 133 | 134 | By contributing to MyPodify, you agree that your contributions will be licensed under the same license as the project. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MyPodify 2 | 3 | MyPodify is an open-source local solution for automatically generating podcasts from documents. Think of it as a self-hosted alternative to Google's NotebookLM, focused on podcast creation. The tool processes documents from a specified folder and transforms them into engaging podcast content complete with outlines, scripts, and audio. 
4 | 5 | ## Example 6 | 7 | You can find example podcast outputs in the `examples/` directory to help you understand what MyPodify generates. 8 | 9 | ## Features 10 | 11 | - **Document Processing**: Supports multiple file formats including PDF (via Azure Document Intelligence), DOCX, and TXT files 12 | - **Automated Content Generation**: Creates podcast outlines and scripts using the OpenAI API (an Ollama-based helper is included in `ai_helper/ai_helper.py` but currently commented out) 13 | - **Text-to-Speech**: Converts scripts into audio using Azure's Speech Service 14 | - **Multiple Host Support**: Generate content for 1-3 hosts (default: 2 hosts - Alex and Jane) 15 | - **Project Organization**: Automatically creates an organized directory structure for outputs 16 | - **Detailed Logging**: Comprehensive logging system for troubleshooting 17 | 18 | ## Prerequisites 19 | 20 | - Python 3.7+ 21 | - Azure Account (for PDF processing and Speech Services) 22 | - OpenAI API key ([Ollama](https://ollama.com) is only required for the commented-out local path) 23 | 24 | ## Installation 25 | 26 | 1. Clone the repository: 27 | ```bash 28 | git clone https://github.com/shagunmistry/NotebookLM_Alternative.git 29 | cd mypodify 30 | ``` 31 | 32 | 2. Install required packages: 33 | ```bash 34 | pip install -r requirements.txt 35 | ``` 36 | 37 | 3. Set up environment variables: 38 | Create a `.env` file in the project root with the following: 39 | ```env 40 | AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=your_endpoint 41 | AZURE_DOCUMENT_INTELLIGENCE_KEY=your_key 42 | OPENAI_API_KEY=your_openai_key 43 | AZURE_TTS_ENDPOINT=your_endpoint 44 | AZURE_SPEECH_KEY=your_key 45 | AZURE_SPEECH_REGION=your_region 46 | ``` 47 | 48 | ## Usage 49 | 50 | 1. Place your source documents in the `my_docs` directory 51 | 52 | 2. Run the podcast generator: 53 | ```bash 54 | python main.py 55 | ``` 56 | 57 | 3. 
Follow the prompts to: 58 | - Specify output directory (default: 'output') 59 | - Enter project name 60 | - Set number of hosts (default: 2) 61 | - Provide project description (optional) 62 | 63 | ## Project Structure 64 | 65 | ``` 66 | mypodify/ 67 | ├── __pycache__/ # Python cache files 68 | ├── ai_helper/ # AI content generation modules 69 | │ ├── __pycache__/ 70 | │ ├── ai_helper.py # Core AI helper functions 71 | │ ├── generate_outline.py # Podcast outline generation 72 | │ ├── generate_speech.py # Speech synthesis module 73 | │ └── script_generator.py # Podcast script generation 74 | ├── logs/ # Log files directory 75 | ├── my_docs/ # Input documents directory 76 | ├── output/ # Generated content directory 77 | ├── utils/ # Utility functions 78 | │ ├── __pycache__/ 79 | │ └── combine_audio.py # Audio processing utilities 80 | ├── .env # Environment variables 81 | ├── .gitignore # Git ignore rules 82 | ├── document_processor.py # Document processing module 83 | ├── helpers.py # Helper utilities 84 | ├── logger.py # Logging configuration 85 | ├── main.py # Main application entry 86 | ├── README.md # Project documentation 87 | ├── requirements.txt # Project dependencies 88 | ``` 89 | 90 | ## Output Structure 91 | 92 | Each project generates: 93 | - Processed document text files 94 | - Markdown outline files 95 | - Podcast scripts in Markdown format 96 | - Individual audio segments 97 | - Combined final podcast audio file 98 | - Project metadata JSON 99 | 100 | ## Supported File Types 101 | 102 | - PDF (requires Azure Document Intelligence) 103 | - DOCX 104 | - TXT 105 | - DOC (currently unsupported, must be converted to DOCX) 106 | 107 | ## Configuration 108 | 109 | The system can be configured through various parameters: 110 | 111 | - **Host Count**: 1-3 hosts (affects conversation style and dynamics) 112 | - **Output Directory**: Customizable output location 113 | - **Project Name**: Used for organizing outputs 114 | - **Description**: Optional context for 
content generation 115 | 116 | ## Logging 117 | 118 | The system includes comprehensive logging: 119 | - General logs: `podcast_generator.log` 120 | - Document processing logs: `document_processor.log` 121 | - Speech generation logs: `speech_generator.log` 122 | - Outline generation logs: `podcast_outline.log` 123 | 124 | ## Contributing 125 | 126 | Contributions are welcome! Please feel free to submit a Pull Request. 127 | 128 | ## License 129 | 130 | This project is licensed under the MIT License. 131 | 132 | ## Acknowledgments 133 | 134 | - Azure Document Intelligence for PDF processing 135 | - Azure Speech Services for text-to-speech 136 | - Ollama for AI content generation 137 | 138 | ## Note 139 | 140 | This is an early version of the project and is under active development. Features and functionality may change in future releases. 141 | 142 | ## Contact 143 | 144 | For questions, feedback, or issues, please create an issue. 145 | -------------------------------------------------------------------------------- /ai_helper/ai_helper.py: -------------------------------------------------------------------------------- 1 | # from ollama import AsyncClient 2 | # from typing import List 3 | # from logger import CustomLogger 4 | # from dotenv import load_dotenv 5 | 6 | # load_dotenv() 7 | 8 | # log = CustomLogger("AI_Helper", log_file="ai_helper.log") 9 | 10 | # # Ollama configuration 11 | # MODEL_NAME = "llama3.2" 12 | # client = AsyncClient() 13 | 14 | # async def generate_content_from_ollama(content: str, system_instructions: str, purpose: str, 15 | # previous_content: str = None, chunk_index: int = None, 16 | # total_chunks: int = None) -> str: 17 | # """Generate content using Ollama's Python library.""" 18 | # log.log_info(f"Generating {purpose} using Ollama...") 19 | 20 | # try: 21 | # messages = [] 22 | 23 | # # Build the system message with context 24 | # if previous_content and chunk_index is not None: 25 | # system_msg = ( 26 | # 
f"{system_instructions}\n\n" 27 | # f"This is part {chunk_index + 1} of {total_chunks}. " 28 | # f"Previous content summary:\n{previous_content[:1000]}...\n\n" 29 | # f"Continue the {purpose} based on the following additional content, " 30 | # f"maintaining consistency with the previous parts:\n\n{content}" 31 | # ) 32 | # else: 33 | # system_msg = f"{system_instructions}\n\nContent to analyze:\n\n{content}" 34 | 35 | # messages = [ 36 | # {"role": "system", "content": system_msg}, 37 | # {"role": "user", "content": f"Generate the {purpose} for this content, making it flow naturally with any previous parts."} 38 | # ] 39 | 40 | # log.log_info(f"Processing content (first 100 chars): {content[:100]}...") 41 | 42 | # response = await client.chat( 43 | # model=MODEL_NAME, 44 | # messages=messages, 45 | # options={ 46 | # "temperature": 0.7, 47 | # "top_p": 0.9 48 | # } 49 | # ) 50 | 51 | # generated_content = response.message.content 52 | # log.log_info(f"Generated content (first 100 chars): {generated_content[:100]}...") 53 | # return generated_content 54 | 55 | # except Exception as e: 56 | # log.log_error(f"Error generating completion with Ollama: {e}") 57 | # raise 58 | 59 | # async def generate_content_with_chunking(content: str, system_instructions: str, purpose: str) -> str: 60 | # """Generate content using Ollama with chunking for large inputs.""" 61 | # log.log_info(f"Generating {purpose} with chunking...") 62 | 63 | # def split_content(text: str, chunk_size: int = 8000) -> List[str]: 64 | # """Split content into chunks based on character count and sentence boundaries.""" 65 | # chunks = [] 66 | # current_chunk = "" 67 | 68 | # # Split by paragraphs first to maintain better context 69 | # paragraphs = text.split('\n\n') 70 | 71 | # for paragraph in paragraphs: 72 | # if len(current_chunk) + len(paragraph) < chunk_size: 73 | # current_chunk += (paragraph + '\n\n') 74 | # else: 75 | # if current_chunk: 76 | # chunks.append(current_chunk.strip()) 77 | # 
current_chunk = paragraph + '\n\n' 78 | 79 | # if current_chunk: 80 | # chunks.append(current_chunk.strip()) 81 | 82 | # return chunks 83 | 84 | # try: 85 | # # If content is small enough, process it directly 86 | # if len(content) < 8000: 87 | # return await generate_content_from_ollama(content, system_instructions, purpose) 88 | 89 | # # Split into chunks if content is large 90 | # chunks = split_content(content) 91 | # total_chunks = len(chunks) 92 | # log.log_info(f"Split content into {total_chunks} chunks") 93 | 94 | # final_content = "" 95 | # for i, chunk in enumerate(chunks): 96 | # log.log_info(f"Processing chunk {i+1}/{total_chunks}") 97 | 98 | # # Get summary of previous content for context 99 | # previous_content = final_content if final_content else None 100 | 101 | # chunk_content = await generate_content_from_ollama( 102 | # chunk, 103 | # system_instructions, 104 | # purpose, 105 | # previous_content=previous_content, 106 | # chunk_index=i, 107 | # total_chunks=total_chunks 108 | # ) 109 | 110 | # if i == 0: 111 | # final_content = chunk_content 112 | # else: 113 | # # Ensure smooth transition between chunks 114 | # final_content += "\n\n" + chunk_content 115 | 116 | # log.log_info(f"Successfully processed chunk {i+1}") 117 | 118 | # return final_content 119 | 120 | # except Exception as e: 121 | # log.log_error(f"Error in content generation with chunking: {e}") 122 | # raise 123 | import tiktoken 124 | from openai import OpenAI 125 | import os 126 | from typing import List 127 | from logger import CustomLogger 128 | from dotenv import load_dotenv 129 | 130 | load_dotenv() 131 | 132 | log = CustomLogger("AI_Helper", log_file="ai_helper.log") 133 | 134 | MAX_TOKENS = 128000 # Maximum tokens allowed by the model 135 | BUFFER = 1000 # Buffer for system and user messages 136 | 137 | client = OpenAI( 138 | api_key=os.getenv("OPENAI_API_KEY"), 139 | ) 140 | 141 | MODEL_TO_USE = "gpt-4o-mini-2024-07-18" 142 | 143 | 144 | def num_tokens_from_string(string: 
str, model: str = MODEL_TO_USE) -> int: 145 | encoding = tiktoken.encoding_for_model(model) 146 | return len(encoding.encode(string)) 147 | 148 | 149 | def generate_content_from_openai(content: str, system_instructions: str, purpose: str) -> str: 150 | log.log_debug(f"Generating {purpose} in chunks...") 151 | 152 | def create_completion(messages: List[dict]) -> str: 153 | completion = client.chat.completions.create( 154 | model=MODEL_TO_USE, 155 | messages=messages, 156 | ) 157 | return completion.choices[0].message.content 158 | 159 | def split_content(content: str, max_tokens: int) -> List[str]: 160 | chunks = [] 161 | current_chunk = "" 162 | for line in content.split('\n'): 163 | line_tokens = num_tokens_from_string(line) 164 | if num_tokens_from_string(current_chunk) + line_tokens > max_tokens: 165 | chunks.append(current_chunk) 166 | current_chunk = line 167 | else: 168 | current_chunk += ('\n' if current_chunk else '') + line 169 | if current_chunk: 170 | chunks.append(current_chunk) 171 | return chunks 172 | 173 | log.log_debug("Splitting content into chunks...") 174 | log.log_debug(f"Content length: {len(content)}") 175 | 176 | system_tokens = num_tokens_from_string(system_instructions) 177 | user_tokens = num_tokens_from_string( 178 | f"Based on the content provided, generate {purpose}") 179 | max_chunk_tokens = MAX_TOKENS - system_tokens - user_tokens - BUFFER 180 | 181 | chunks = split_content(content, max_chunk_tokens) 182 | log.log_debug(f"Split content into {len(chunks)} chunks") 183 | 184 | messages = [ 185 | {"role": "system", "content": system_instructions}, 186 | {"role": "user", "content": f"Based on the content provided, generate {purpose}"}, 187 | ] 188 | 189 | final_content = "" 190 | for i, chunk in enumerate(chunks): 191 | chunk_messages = messages.copy() 192 | chunk_messages.append({"role": "user", "content": chunk}) 193 | if i > 0: 194 | chunk_messages.append({"role": "user", "content": f"Continue generating the " 195 | f"{purpose} based on 
this additional content."}) 196 | 197 | chunk_content = create_completion(chunk_messages) 198 | final_content += chunk_content 199 | 200 | log.log_debug(f"Generated {purpose} successfully, processed " 201 | f"{len(chunks)} chunks") 202 | return final_content 203 | -------------------------------------------------------------------------------- /ai_helper/generate_outline.py: -------------------------------------------------------------------------------- 1 | 2 | from logger import CustomLogger 3 | from dotenv import load_dotenv 4 | from ai_helper.ai_helper import generate_content_from_openai 5 | 6 | load_dotenv() 7 | 8 | # Set up logging 9 | log = CustomLogger("GenerateOutline", log_file="podcast_outline.log") 10 | 11 | PODCAST_OUTLINE_SYSTEM_INSTRUCTIONS = """ 12 | Act as a Podcast Producer tasked with creating a detailed outline for a deep dive podcast episode based on the provided content. 13 | The output should be a well-formatted Markdown outline that can be used to create a full podcast script. 14 | 15 | Number of Hosts: {host_count} 16 | 17 | 1. An attention-grabbing introduction that briefly introduces the host(s) and the topic 18 | 2. 4-6 main talking points, each with 3-4 sub-points or examples. 19 | - Include factual information, statistics, and historical context 20 | - Opportunities for hosts to share personal anecdotes or opinions 21 | 3. A mid-point break indication 22 | 4. A summary of key takeaways for the listeners 23 | 5. A thank-you note and closing remarks 24 | - Thank listeners for listening to the "MyPodify" podcast. 25 | 6. A suggestion for a next episode on a topic related to the current one. 26 | 27 | If 1 host: 28 | - Name of the host: Alex. 29 | - Include personal anecdotes or opinions from the host 30 | - Encourage the host to ask rhetorical questions to engage the audience 31 | - Include moments of humor or light-hearted banter 32 | 33 | If 2 hosts: 34 | - Names of the hosts: Alex and Jane. 
35 | - Indicate where the hosts might disagree or offer different perspectives 36 | - Include opportunities for the hosts to engage in a friendly debate or discussion 37 | - Encourage the hosts to ask each other questions or respond to each other's points 38 | 39 | If 3 or more hosts: 40 | - Provide clear transitions between speakers 41 | - Include opportunities for each host to contribute unique insights or perspectives 42 | - Encourage the hosts to engage in a roundtable discussion format 43 | 44 | Include the following elements in your outline: 45 | 46 | - Factual information, statistics, and historical context related to the topic 47 | - Pop culture references or current events that relate to the subject matter 48 | - Opportunities for hosts to share personal anecdotes or opinions 49 | - Moments of humor or light-hearted banter between hosts 50 | - Analogies or comparisons to explain complex concepts 51 | - "Fun facts" or surprising information to maintain interest 52 | - Rhetorical questions to engage the audience 53 | - Indications where hosts might disagree or offer different perspectives 54 | - Suggestions for transitions between subtopics 55 | 56 | THIS IS NOT SUPPOSED TO BE A SCRIPT. It should be a detailed outline that will be used to create a full podcast script. Each main point should have enough detail to guide a 5-10 minute discussion. 
57 | """ 58 | 59 | 60 | async def generate_podcast_outline(analysis: str, host_count: int) -> str: 61 | try: 62 | log.log_debug("Generating podcast outline...") 63 | 64 | si_instructions = PODCAST_OUTLINE_SYSTEM_INSTRUCTIONS.format( 65 | host_count=host_count) 66 | outline = generate_content_from_openai(analysis, si_instructions, purpose="Podcast Outline") 67 | 68 | return outline 69 | except Exception as e: 70 | log.log_error(f"Error generating podcast outline: {e}") 71 | raise -------------------------------------------------------------------------------- /ai_helper/generate_speech.py: -------------------------------------------------------------------------------- 1 | import os 2 | import azure.cognitiveservices.speech as speechsdk 3 | import time 4 | from pathlib import Path 5 | from dotenv import load_dotenv 6 | from pydub import AudioSegment 7 | 8 | from logger import CustomLogger 9 | 10 | log = CustomLogger("SpeechGenerator", log_file="speech_generator.log") 11 | 12 | load_dotenv() 13 | 14 | # Azure Speech Service configuration 15 | speech_key = os.getenv("AZURE_SPEECH_KEY") 16 | service_region = os.getenv("AZURE_SPEECH_REGION") 17 | 18 | 19 | speech_config = speechsdk.SpeechConfig( 20 | subscription=speech_key, region=service_region) 21 | 22 | 23 | def create_speech(text, voice, output_file): 24 | if not text.strip(): 25 | log.log_warning(f"Warning: Empty text for {output_file}. 
Skipping this segment.") 26 | return 27 | 28 | try: 29 | speech_config.speech_synthesis_voice_name = voice 30 | speech_synthesizer = speechsdk.SpeechSynthesizer( 31 | speech_config=speech_config, audio_config=None) 32 | 33 | 34 | result = speech_synthesizer.speak_text_async(text).get() 35 | 36 | if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted: 37 | audio_data = result.audio_data 38 | with open(output_file, "wb") as audio_file: 39 | audio_file.write(audio_data) 40 | log.log_debug(f"Audio saved to {output_file}") 41 | else: 42 | log.log_error(f"Error synthesizing speech for {output_file}: {result.reason}") 43 | 44 | except Exception as e: 45 | log.log_error(f"Error creating speech for {output_file}: {str(e)}") 46 | log.log_error(f"Problematic text: '{text}'") 47 | 48 | 49 | def create_speech_from_ssml(ssml, output_file): 50 | try: 51 | speech_synthesizer = speechsdk.SpeechSynthesizer( 52 | speech_config=speech_config, audio_config=None) 53 | 54 | log.log_debug(ssml) 55 | result = speech_synthesizer.speak_ssml_async(ssml).get() 56 | log.log_debug(result) 57 | 58 | if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted: 59 | audio_data = result.audio_data 60 | with open(output_file, "wb") as audio_file: 61 | audio_file.write(audio_data) 62 | log.log_debug(f"Audio saved to {output_file}") 63 | return output_file 64 | elif result.reason == speechsdk.ResultReason.Canceled: 65 | cancellation_details = result.cancellation_details 66 | log.log_debug(f"Speech synthesis canceled: {cancellation_details.reason}") 67 | if cancellation_details.reason == speechsdk.CancellationReason.Error: 68 | log.log_debug(f"Error details: {cancellation_details.error_details}") 69 | else: 70 | log.log_error(f"Error synthesizing speech for {output_file}: {result.reason}") 71 | return None 72 | 73 | except Exception as e: 74 | log.log_error(f"Error creating speech for {output_file}: {str(e)}") 75 | log.log_error(f"Problematic SSML: '{ssml[:100]}...'") # 
log.log_debug first 100 characters of SSML for debugging 76 | return None 77 | 78 | def convert_wav_to_mp3(wav_file: Path, mp3_file: Path): 79 | try: 80 | audio = AudioSegment.from_wav(str(wav_file)) 81 | audio.export(str(mp3_file), format="mp3") 82 | log.log_debug(f"Converted {wav_file} to {mp3_file}") 83 | return mp3_file 84 | except Exception as e: 85 | log.log_error(f"Error converting WAV to MP3: {str(e)}") 86 | return None 87 | 88 | # async def text_to_speech(script: str, output_path: Path) -> tuple: 89 | # Convert the entire script to SSML 90 | # full_ssml = markdown_to_ssml(script) 91 | 92 | # # remove asterisks 93 | # full_ssml = full_ssml.replace('*', '') 94 | 95 | # wav_file = output_path / "full_script.wav" 96 | # mp3_file = output_path / "full_script.mp3" 97 | 98 | # wav_result = create_speech_from_ssml(full_ssml, wav_file) 99 | 100 | # if wav_result: 101 | # mp3_result = convert_wav_to_mp3(wav_file, mp3_file) 102 | # if mp3_result: 103 | # log.log_debug(f"Processed full script and saved to {mp3_file}") 104 | # os.remove(wav_file) # Remove the temporary WAV file 105 | # return full_ssml, mp3_file 106 | # else: 107 | # return full_ssml, wav_file # Return WAV file if MP3 conversion fails 108 | # else: 109 | # return full_ssml, None 110 | 111 | async def text_to_speech(script: str, output_path: str): 112 | # Convert output_path to Path object 113 | output_path = Path(output_path) 114 | 115 | # Make sure the directory exists 116 | output_path.parent.mkdir(parents=True, exist_ok=True) 117 | 118 | # Split the script into lines 119 | lines = script.split('\n') 120 | 121 | # Initialize variables 122 | current_speaker = "" 123 | current_text = "" 124 | audio_segments = [] 125 | line_number = 0 126 | 127 | for line in lines: 128 | line_number += 1 129 | line = line.strip() 130 | 131 | # Check for speaker lines in both formats 132 | if ':' in line and (line.startswith('**') or any(name in line.split(':')[0] for name in ['Alex', 'Jane'])): 133 | # New speaker 134 | 
if current_speaker and current_text.strip(): 135 | audio_segments.append( 136 | (current_speaker, current_text.strip(), line_number - 1)) 137 | current_speaker = line.split(':')[0].strip('* ') 138 | current_text = line.split(':', 1)[1].strip() + " " 139 | elif line and not line.startswith('[') and not line.startswith('#'): 140 | current_text += line + " " 141 | 142 | # Add the last segment 143 | if current_speaker and current_text.strip(): 144 | audio_segments.append( 145 | (current_speaker, current_text.strip(), line_number)) 146 | 147 | # Create audio files for each segment 148 | temp_segments = [] 149 | for i, (speaker, text, line_number) in enumerate(audio_segments): 150 | if 'Alex' in speaker: 151 | voice = "en-US-AndrewMultilingualNeural" 152 | elif 'Jane' in speaker: 153 | voice = "en-US-AvaMultilingualNeural" 154 | else: 155 | voice = "en-US-BrandonMultilingualNeural" # Default voice 156 | 157 | clean_text = text.strip() 158 | 159 | if not clean_text: 160 | log.log_debug(f"Warning: Empty cleaned text for segment {i} (starting at line {line_number}). 
Original text: '{text}'") 161 | continue 162 | 163 | # Create segment filename using parent directory of output_path 164 | segment_file = output_path.parent / f"segment_{i:03d}.wav" 165 | create_speech(clean_text, voice, str(segment_file)) 166 | temp_segments.append(segment_file) 167 | log.log_debug(f"Created {segment_file} for {speaker}: {clean_text[:50]}...") 168 | 169 | # Add a short pause between segments 170 | time.sleep(0.5) 171 | 172 | log.log_debug(f"Processed {len(audio_segments)} segments.") 173 | return output_path 174 | 175 | # Example usage: 176 | # output_path = Path("path/to/output/directory") 177 | # await text_to_speech(your_script, output_path) -------------------------------------------------------------------------------- /ai_helper/script_generator.py: -------------------------------------------------------------------------------- 1 | from ai_helper.ai_helper import generate_content_from_openai 2 | from logger import CustomLogger 3 | 4 | log = CustomLogger("ScriptGenerator", log_file="script_generator.log") 5 | 6 | ONE_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = """ 7 | You are an expert Podcaster tasked with creating a full podcast script based on the provided outline. 8 | The podcast is a deep dive educational show featuring a single host named Alex. 9 | 10 | The Podcast name is "MyPodify". 11 | 12 | Alex is a charismatic and knowledgeable host with a background in journalism and a passion for research. They have a conversational style that combines the storytelling flair of Steve Jobs, the entrepreneurial insight of Richard Branson, the scientific curiosity of Neil deGrasse Tyson, and the observational wisdom of Jane Goodall. 13 | 14 | - Begin with a brief introduction of the host and the topic. 15 | - Structure the content as an engaging monologue that feels like a conversation with the listener. 16 | - Present factual information, statistics, and historical context related to the chosen topic. 
17 | - Expand the outline into a natural, engaging narrative. 18 | - Ensure the script covers all points in the outline thoroughly. 19 | - Add relevant examples, anecdotes, or case studies to illustrate key points. 20 | - Incorporate personal anecdotes, opinions, and experiences to make the content relatable. 21 | - Use a conversational, informal tone throughout the podcast. 22 | - Include some humor and light-hearted moments to keep the listener engaged. 23 | - Incorporate smooth transitions between main points. 24 | - Add opening and closing remarks, including a teaser for the next episode. 25 | - Use Punctuation and Capitalization as this will be converted to speech. 26 | 27 | --- 28 | 29 | Pacing and Flow: 30 | 31 | - Start with a hook or interesting fact to grab the audience's attention. 32 | - Gradually build up the information, starting with basic concepts and progressing to more complex ideas. 33 | - Include natural transitions between subtopics. 34 | - Periodically summarize key points to reinforce important information. 35 | - Use rhetorical questions or hypothetical scenarios to engage the listener. 36 | 37 | --- 38 | 39 | Engaging the Audience: 40 | 41 | - Address the listeners directly, making them feel part of the conversation. 42 | - Pose questions for the audience to ponder. 43 | - Encourage listeners to share their thoughts or experiences on social media or the podcast's website. 44 | - Incorporate listener feedback or questions from previous episodes when relevant. 45 | 46 | --- 47 | 48 | Intro Example: 49 | **Alex**: "Hello and welcome to MyPodify, I'm your host Alex. Today, we're diving into [Topic]. But before we get started, let me share a quick story that happened to me this morning..." 
50 | 51 | --- 52 | 53 | Generate a full podcast script based on the outline provided (15-20 minutes) 54 | 55 | Expected Markdown Output Format: 56 | **Alex:** [Content] 57 | """ 58 | 59 | TWO_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = """ 60 | You are an expert Podcaster tasked with creating a full podcast script based on the provided outline. 61 | The podcast is a deep dive educational show featuring two hosts: Alex and Jane. 62 | 63 | The Podcast name is "MyPodify". 64 | 65 | Alex is a journalist with a knack for storytelling, with a conversation style of Steve Jobs and Richard Branson. 66 | Jane is a researcher with a passion for facts and figures, and she loves to share interesting anecdotes. She has a conversation style of Neil deGrasse Tyson and Jane Goodall. 67 | 68 | - Begin with a brief introduction of the hosts and the topic. 69 | - Structure the content as a casual conversation between the two hosts. 70 | - Include natural back-and-forth dialogue, with hosts building on each other's points. 71 | - Present factual information, statistics, and historical context related to the chosen topic. 72 | - Expand the outline into a natural, engaging conversation between Alex and Jane. 73 | - Ensure the script covers all points in the outline thoroughly. 74 | - Add relevant examples, anecdotes, or case studies to illustrate key points. 75 | - Incorporate personal anecdotes, opinions, and experiences from the hosts to make the content relatable. 76 | - Use a conversational, informal tone throughout the podcast. 77 | - Include some humor and light-hearted moments, such as jokes or playful banter between hosts. 78 | - Incorporate smooth transitions between main points. 79 | - Add opening and closing remarks, including the teaser for the next episode. 80 | - Use Punctuation and Capitalization as this will be converted to speech. 81 | --- 82 | 83 | Pacing and Flow: 84 | 85 | - Start with a hook or interesting fact to grab the audience's attention. 
86 | - Gradually build up the information, starting with basic concepts and progressing to more complex ideas. 87 | - Include natural transitions between subtopics. 88 | - Periodically summarize key points to reinforce important information. 89 | 90 | --- 91 | 92 | Interaction between Hosts: 93 | 94 | - Create distinct personalities for each host, with one potentially being more knowledgeable about the topic. 95 | - Include instances where hosts ask each other questions or seek clarification. 96 | - Allow for occasional disagreements or different perspectives between hosts. 97 | - Incorporate moments where hosts compliment each other's insights or build on each other's ideas. 98 | 99 | --- 100 | 101 | Intro Example: 102 | **Alex**: "Hello and welcome to MyPodify, I am your host Alex, and I am joined by my co-host Jane." 103 | **Jane**: "Hi everyone, it's good to be back!" 104 | **Alex**: "Let's talk about [Topic] today! Before we get started, Jane, how has your day been so far?" 105 | 106 | --- 107 | 108 | Generate a full podcast script based on the outline provided (15-20 minutes) 109 | 110 | Expected Markdown Output Format: 111 | **Alex:** [Content] 112 | **Jane:** [Content] 113 | **Guest:** [Content] 114 | """ 115 | 116 | 117 | async def generate_podcast_script(outline: str, analysis: str, host_count: int) -> str: 118 | try: 119 | log.log_info("Generating podcast script") 120 | 121 | if host_count == 1: 122 | PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = ONE_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS 123 | else:  # any host count other than 1 falls back to the two-host script, so the variable is always bound 124 | PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS = TWO_HOST_PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS 125 | 126 | final_content = f"{outline}\n\nContent Details: {analysis}" 127 | 128 | script = generate_content_from_openai(content=final_content, system_instructions=PODCAST_SCRIPT_SYSTEM_INSTRUCTIONS, purpose="Podcast Script") 129 | 130 | return script 131 | except Exception as e: 132 | log.log_error(f"Error generating podcast script: {e}") 133 | raise
-------------------------------------------------------------------------------- /document_processor.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from azure.core.credentials import AzureKeyCredential 4 | from azure.ai.formrecognizer import DocumentAnalysisClient 5 | import docx 6 | from typing import Optional 7 | from dotenv import load_dotenv 8 | 9 | from logger import CustomLogger 10 | 11 | log = CustomLogger("DocumentProcessor", log_file="document_processor.log") 12 | 13 | load_dotenv() 14 | 15 | class UnsupportedFileTypeError(Exception): 16 | pass 17 | 18 | def get_file_type(file_path: str) -> str: 19 | """Determine file type from extension.""" 20 | return Path(file_path).suffix.lower() 21 | 22 | def read_txt_file(file_path: str) -> str: 23 | """Read content from a text file.""" 24 | try: 25 | with open(file_path, 'r', encoding='utf-8') as file: 26 | return file.read() 27 | except UnicodeDecodeError: 28 | # Try different encodings if UTF-8 fails 29 | encodings = ['latin-1', 'cp1252', 'iso-8859-1'] 30 | for encoding in encodings: 31 | try: 32 | with open(file_path, 'r', encoding=encoding) as file: 33 | return file.read() 34 | except UnicodeDecodeError: 35 | continue 36 | raise UnicodeError("Failed to decode text file with multiple encodings")  # UnicodeDecodeError requires five constructor arguments, so raise its parent class instead 37 | 38 | def read_docx_file(file_path: str) -> str: 39 | """Extract text from a DOCX file.""" 40 | doc = docx.Document(file_path) 41 | full_text = [] 42 | 43 | # Extract text from paragraphs 44 | for paragraph in doc.paragraphs: 45 | if paragraph.text.strip(): 46 | full_text.append(paragraph.text) 47 | 48 | # Extract text from tables 49 | for table in doc.tables: 50 | for row in table.rows: 51 | for cell in row.cells: 52 | if cell.text.strip(): 53 | full_text.append(cell.text) 54 | 55 | return '\n'.join(full_text) 56 | 57 | def read_pdf_with_azure(file_path: str, endpoint: Optional[str] = None, key: Optional[str] = None) -> str: 58
| """Extract text from PDF using Azure Document Intelligence.""" 59 | try: 60 | # Use provided credentials or fall back to environment variables 61 | endpoint = endpoint or os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT") 62 | key = key or os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY") 63 | 64 | if not endpoint or not key: 65 | raise ValueError("Azure Document Intelligence credentials not configured") 66 | 67 | document_analysis_client = DocumentAnalysisClient( 68 | endpoint=endpoint, credential=AzureKeyCredential(key) 69 | ) 70 | 71 | with open(file_path, "rb") as file: 72 | document_bytes = file.read() 73 | 74 | poller = document_analysis_client.begin_analyze_document( 75 | "prebuilt-read", document_bytes 76 | ) 77 | result = poller.result() 78 | return result.content 79 | except Exception as e: 80 | log.log_error(f"Azure PDF processing failed: {str(e)}") 81 | raise 82 | 83 | async def analyze_document(file_path: str) -> str: 84 | """ 85 | Analyze document from various file formats. 86 | Supports PDF, DOC, DOCX, and TXT files. 
87 | 88 | Args: 89 | file_path: Path to the local file 90 | Returns: 91 | Extracted text content from the document 92 | """ 93 | try: 94 | # Verify file exists 95 | if not os.path.exists(file_path): 96 | raise FileNotFoundError(f"File not found: {file_path}") 97 | 98 | file_path = str(Path(file_path)) # Normalize path 99 | file_type = get_file_type(file_path) 100 | 101 | if file_type not in ['.pdf', '.doc', '.docx', '.txt']: 102 | raise UnsupportedFileTypeError(f"Unsupported file type: {file_type}") 103 | 104 | log.log_info(f"Processing {file_type} file: {file_path}") 105 | 106 | # Process based on file type 107 | if file_type == '.txt': 108 | log.log_info("Processing TXT file") 109 | full_text = read_txt_file(file_path) 110 | 111 | elif file_type in ['.doc', '.docx']: 112 | log.log_info("Processing DOC/DOCX file") 113 | if file_type == '.doc': 114 | # For now, we'll raise an error for .doc files 115 | # You might want to add doc to docx conversion here 116 | raise UnsupportedFileTypeError("DOC format is not supported, please convert to DOCX") 117 | full_text = read_docx_file(file_path) 118 | 119 | elif file_type == '.pdf': 120 | log.log_info("Processing PDF file using Azure Document Intelligence") 121 | full_text = read_pdf_with_azure(file_path) 122 | 123 | # Post-processing 124 | if not full_text: 125 | raise ValueError("No text content extracted from document") 126 | 127 | # Remove excessive whitespace and normalize line endings 128 | full_text = '\n'.join(line.strip() for line in full_text.splitlines() if line.strip()) 129 | log.log_info(f"Successfully extracted {len(full_text)} characters from {file_type} file") 130 | 131 | return full_text 132 | 133 | except UnsupportedFileTypeError as e: 134 | log.log_error(f"Unsupported file type error: {str(e)}") 135 | raise 136 | 137 | except Exception as e: 138 | log.log_error(f"Error analyzing document: {str(e)}") 139 | raise 140 | 141 | def validate_file_type(file_path: str) -> bool: 142 | """ 143 | Validate if the 
file type is supported. 144 | Returns True if supported, False otherwise. 145 | """ 146 | allowed_extensions = {'.pdf', '.doc', '.docx', '.txt'} 147 | return get_file_type(file_path) in allowed_extensions 148 | 149 | async def get_document_metadata(file_path: str) -> dict: 150 | """ 151 | Get metadata about the local document. 152 | Returns a dictionary with file type, size, and other relevant information. 153 | """ 154 | try: 155 | file_stat = os.stat(file_path) 156 | metadata = { 157 | "file_type": get_file_type(file_path), 158 | "size": file_stat.st_size, 159 | "created_at": file_stat.st_ctime, 160 | "updated_at": file_stat.st_mtime, 161 | "path": str(Path(file_path).absolute()) 162 | } 163 | return metadata 164 | except Exception as e: 165 | log.log_error(f"Error getting document metadata: {str(e)}") 166 | raise -------------------------------------------------------------------------------- /examples/example_1.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shagunmistry/NotebookLM_Alternative/ce8e1b1d51e84bc3ae647685fcac6d28c40e711a/examples/example_1.mp3 -------------------------------------------------------------------------------- /examples/example_2.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shagunmistry/NotebookLM_Alternative/ce8e1b1d51e84bc3ae647685fcac6d28c40e711a/examples/example_2.mp3 -------------------------------------------------------------------------------- /helpers.py: -------------------------------------------------------------------------------- 1 | 2 | from typing import Annotated 3 | 4 | 5 | from logger import CustomLogger 6 | import aiohttp 7 | from bs4 import BeautifulSoup 8 | import asyncio 9 | from cachetools import TTLCache 10 | import aiodns 11 | import time 12 | 13 | logger = CustomLogger(name="MyPodify_Helpers", log_file="mypodify_helpers.log") 14 | 15 | # Cache for storing fetched 
content (1000 items, 1 hour TTL) 16 | content_cache = TTLCache(maxsize=1000, ttl=3600) 17 | 18 | MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB 19 | ALLOWED_EXTENSIONS = {'pdf'} 20 | 21 | # Rate limiting parameters 22 | rate_limit = 10 # requests per second 23 | last_request_time = time.time() 24 | request_count = 0 25 | 26 | async def get_website_content(website_link: str, timeout: int = 10) -> str | None: 27 | global last_request_time, request_count 28 | 29 | # Check cache first 30 | if website_link in content_cache: 31 | return content_cache[website_link] 32 | 33 | # Implement rate limiting 34 | current_time = time.time() 35 | if current_time - last_request_time >= 1: 36 | last_request_time = current_time 37 | request_count = 0 38 | if request_count >= rate_limit: 39 | await asyncio.sleep(1) 40 | return await get_website_content(website_link, timeout) 41 | request_count += 1 42 | 43 | try: 44 | resolver = aiodns.DNSResolver()  # note: created but not used below 45 | async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:  # ssl=False disables certificate verification 46 | async with session.get(website_link, timeout=timeout, allow_redirects=True) as response: 47 | if response.status == 200: 48 | html_content = await response.text() 49 | soup = BeautifulSoup(html_content, 'html.parser') 50 | 51 | # Remove script and style elements 52 | for script in soup(["script", "style"]): 53 | script.decompose() 54 | 55 | # Get text content 56 | text = soup.get_text() 57 | 58 | # Clean up text 59 | lines = (line.strip() for line in text.splitlines()) 60 | chunks = (phrase.strip() for line in lines for phrase in line.split("  ")) 61 | text = '\n'.join(chunk for chunk in chunks if chunk) 62 | 63 | # Cache the result 64 | content_cache[website_link] = text 65 | 66 | return text 67 | else: 68 | logger.log_error(f"Failed to fetch content. 
Status code: {response.status}")  # log and fall through to an implicit None, consistent with the other failure paths 69 | except asyncio.TimeoutError: 70 | logger.log_error(f"Timeout fetching website content: {website_link}") 71 | return None 72 | except aiohttp.ClientError as e: 73 | logger.log_error(f"Error fetching website content: {str(e)}") 74 | return None 75 | except Exception as e: 76 | logger.log_error(f"Unknown error fetching website content: {str(e)}") 77 | return None -------------------------------------------------------------------------------- /logger.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | from colorama import Fore, Style, init 4 | 5 | # Initialize colorama for cross-platform colored output 6 | init(autoreset=True) 7 | 8 | class CustomLogger: 9 | def __init__(self, name, log_file=None): 10 | self.logger = logging.getLogger(name) 11 | self.logger.setLevel(logging.DEBUG) 12 | 13 | # Create formatter 14 | formatter = logging.Formatter( 15 | '%(asctime)s - %(name)s - %(levelname)s - %(message)s' 16 | ) 17 | 18 | # Create console handler with coloring 19 | console_handler = ColoredConsoleHandler() 20 | console_handler.setLevel(logging.INFO) 21 | console_handler.setFormatter(formatter) 22 | self.logger.addHandler(console_handler) 23 | 24 | # Create file handler if log_file is provided 25 | if log_file: 26 | log_file = os.path.join('logs', log_file) 27 | self._setup_file_handler(log_file, formatter) 28 | 29 | def _setup_file_handler(self, log_file, formatter): 30 | log_dir = os.path.dirname(log_file) 31 | if not os.path.exists(log_dir): 32 | os.makedirs(log_dir) 33 | file_handler = logging.FileHandler(log_file) 34 | file_handler.setLevel(logging.DEBUG) 35 | file_handler.setFormatter(formatter) 36 | self.logger.addHandler(file_handler) 37 | 38 | def log_debug(self, message): 39 | self.logger.debug(message) 40 | 41 | def log_info(self, message): 42 | self.logger.info(message) 43 | 44 | def log_warning(self, message): 45 | self.logger.warning(message) 46 | 47 | 
def log_error(self, message): 48 | self.logger.error(message) 49 | 50 | def log_critical(self, message): 51 | self.logger.critical(message) 52 | 53 | def log_exception(self, message): 54 | self.logger.exception(message) 55 | 56 | def log_api_request(self, method, path, status_code, response_time): 57 | self.logger.info( 58 | f"API Request - Method: {method}, Path: {path}, " 59 | f"Status: {status_code}, Response Time: {response_time:.2f}s" 60 | ) 61 | 62 | def log_db_query(self, query, execution_time): 63 | self.logger.debug( 64 | f"Database Query - Query: {query}, " 65 | f"Execution Time: {execution_time:.2f}s" 66 | ) 67 | 68 | def log_user_action(self, user_id, action): 69 | self.logger.info(f"User Action - User ID: {user_id}, Action: {action}") 70 | 71 | 72 | class ColoredConsoleHandler(logging.StreamHandler): 73 | COLORS = { 74 | logging.DEBUG: Fore.CYAN, 75 | logging.INFO: Fore.GREEN, 76 | logging.WARNING: Fore.YELLOW, 77 | logging.ERROR: Fore.RED, 78 | logging.CRITICAL: Fore.RED + Style.BRIGHT, 79 | } 80 | 81 | def emit(self, record): 82 | color = self.COLORS.get(record.levelno, Fore.WHITE) 83 | message = self.format(record) 84 | print(f"{color}{message}{Style.RESET_ALL}") 85 | 86 | 87 | # Usage example 88 | # logger = CustomLogger("MyApp", log_file="logs/app.log") 89 | # logger.log_info("This is an info message") 90 | # logger.log_warning("This is a warning message") 91 | # logger.log_error("This is an error message") 92 | # logger.log_critical("This is a critical message") 93 | # logger.log_api_request("GET", "/api/users", 200, 0.05) 94 | # logger.log_db_query("SELECT * FROM users", 0.02) 95 | # logger.log_user_action("user123", "Logged in") -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import json 4 | from pathlib import Path 5 | import asyncio 6 | from datetime import datetime 7 | from typing 
import List, Dict 8 | 9 | from document_processor import analyze_document, UnsupportedFileTypeError 10 | from logger import CustomLogger 11 | from ai_helper.generate_outline import generate_podcast_outline 12 | from ai_helper.script_generator import generate_podcast_script 13 | from ai_helper.generate_speech import text_to_speech 14 | from utils.combine_audio import combine_audio_files 15 | 16 | # Set up logging 17 | log = CustomLogger("PodcastGenerator", log_file="podcast_generator.log") 18 | 19 | class PodcastGenerator: 20 | def __init__(self, input_dir: str, output_dir: str, project_name: str, 21 | host_count: int = 2, description: str = ""): 22 | self.input_dir = Path(input_dir) 23 | self.output_dir = Path(output_dir) 24 | self.project_name = project_name 25 | self.host_count = host_count 26 | self.description = description 27 | self.project_dir = self.output_dir / self.sanitize_filename(project_name) 28 | 29 | # Create output directories 30 | self.project_dir.mkdir(parents=True, exist_ok=True) 31 | (self.project_dir / "documents").mkdir(exist_ok=True) 32 | (self.project_dir / "outlines").mkdir(exist_ok=True) 33 | (self.project_dir / "scripts").mkdir(exist_ok=True) 34 | (self.project_dir / "audio").mkdir(exist_ok=True) 35 | 36 | @staticmethod 37 | def sanitize_filename(filename: str) -> str: 38 | """Convert string to valid filename.""" 39 | return "".join(c for c in filename if c.isalnum() or c in (' ', '-', '_')).strip() 40 | 41 | async def process_documents(self) -> List[str]: 42 | """Process all documents in the input directory.""" 43 | document_contents = [] 44 | 45 | if not self.input_dir.exists(): 46 | raise FileNotFoundError(f"Input directory not found: {self.input_dir}") 47 | 48 | for file_path in self.input_dir.rglob('*'): 49 | if file_path.is_file(): 50 | try: 51 | log.log_info(f"Processing document: {file_path}") 52 | content = await analyze_document(str(file_path)) 53 | document_contents.append(content) 54 | 55 | # Save processed content 56 | 
output_path = self.project_dir / "documents" / f"{file_path.stem}_processed.txt" 57 | with open(output_path, 'w', encoding='utf-8') as f: 58 | f.write(content) 59 | 60 | except UnsupportedFileTypeError as e: 61 | log.log_error(f"Skipping unsupported file {file_path}: {str(e)}") 62 | except Exception as e: 63 | log.log_error(f"Error processing file {file_path}: {str(e)}") 64 | 65 | return document_contents 66 | 67 | async def generate_outline(self, document_contents: List[str]) -> str: 68 | """Generate podcast outline from document contents.""" 69 | log.log_info("Generating podcast outline") 70 | 71 | log.log_debug(document_contents[0][:100]) 72 | 73 | # Combine all document contents with description 74 | combined_content = "\n\n".join([self.description] + document_contents) 75 | 76 | # Generate outline 77 | outline = await generate_podcast_outline(combined_content, self.host_count) 78 | 79 | # Save outline 80 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 81 | outline_path = self.project_dir / "outlines" / f"outline_{timestamp}.md" 82 | with open(outline_path, 'w', encoding='utf-8') as f: 83 | f.write(outline) 84 | 85 | return outline 86 | 87 | async def generate_script(self, outline: str, document_contents: List[str]) -> str: 88 | """Generate podcast script from outline and document contents.""" 89 | log.log_info("Generating podcast script") 90 | 91 | # Combine all document contents 92 | combined_content = "\n\n".join(document_contents) 93 | 94 | # Generate script 95 | script = await generate_podcast_script(outline, combined_content, self.host_count) 96 | 97 | # Save script 98 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 99 | script_path = self.project_dir / "scripts" / f"script_{timestamp}.md" 100 | with open(script_path, 'w', encoding='utf-8') as f: 101 | f.write(script) 102 | 103 | return script 104 | 105 | async def generate_audio(self, script: str) -> str: 106 | """Generate audio from script.""" 107 | log.log_info("Generating audio file") 
108 | 109 | timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") 110 | 111 | # Convert project_dir to Path if it isn't already 112 | project_dir = Path(self.project_dir) 113 | 114 | # Create the audio directory path 115 | audio_dir = project_dir / "audio" 116 | 117 | # Create the audio directory if it doesn't exist 118 | audio_dir.mkdir(parents=True, exist_ok=True) 119 | 120 | # Create the full audio file path 121 | audio_path = audio_dir / f"podcast_{timestamp}.mp3" 122 | 123 | await text_to_speech(script, str(audio_path)) 124 | 125 | return str(audio_path) 126 | 127 | async def generate_podcast(self) -> Dict: 128 | """Generate complete podcast from documents.""" 129 | try: 130 | # Process all documents 131 | document_contents = await self.process_documents() 132 | if not document_contents: 133 | raise ValueError("No valid documents found to process") 134 | 135 | log.log_info(f"Processed {len(document_contents)} documents") 136 | 137 | # Generate outline 138 | outline = await self.generate_outline(document_contents) 139 | 140 | log.log_debug(outline[:100]) 141 | 142 | # Generate script 143 | script = await self.generate_script(outline, document_contents) 144 | 145 | # Generate audio 146 | audio_path = await self.generate_audio(script) 147 | 148 | # Get all segment files 149 | audio_dir = Path(self.project_dir) / "audio" 150 | segment_files = sorted(list(audio_dir.glob("segment_*.wav"))) 151 | 152 | if not segment_files: 153 | raise ValueError("No audio segments found to combine") 154 | 155 | # Combine all the audio files into a single file 156 | audio_combined_path = audio_dir / "podcast_combined.mp3" 157 | combine_audio_files(segment_files, audio_combined_path) 158 | 159 | # Save project metadata 160 | metadata = { 161 | "project_name": self.project_name, 162 | "host_count": self.host_count, 163 | "description": self.description, 164 | "timestamp": datetime.now().isoformat(), 165 | "input_directory": str(self.input_dir), 166 | "output_directory": 
str(self.project_dir), 167 | "audio_segments": [str(f) for f in segment_files], 168 | "audio_combined_file": str(audio_combined_path), 169 | } 170 | 171 | metadata_path = Path(self.project_dir) / "metadata.json" 172 | with open(metadata_path, 'w') as f: 173 | json.dump(metadata, f, indent=2) 174 | 175 | return metadata 176 | 177 | except Exception as e: 178 | log.log_error(f"Error generating podcast: {str(e)}") 179 | raise 180 | 181 | def main(): 182 | input_dir = 'my_docs' 183 | output_dir = input("Enter the output directory for generated files (default: output): ") or 'output' 184 | project_name = input("Enter the name of the project: ") 185 | host_count = input("Enter the number of podcast hosts (default: 2): ") or 2 186 | description = input("Enter the project description (optional): ") 187 | 188 | try: 189 | generator = PodcastGenerator( 190 | input_dir, 191 | output_dir, 192 | project_name, 193 | int(host_count), 194 | description 195 | ) 196 | 197 | metadata = asyncio.run(generator.generate_podcast()) 198 | log.log_info("Podcast generation completed successfully!") 199 | log.log_info(f"Output files are in: {metadata['output_directory']}") 200 | 201 | except Exception as e: 202 | log.log_error(f"Error: {str(e)}") 203 | exit(1) 204 | 205 | if __name__ == "__main__": 206 | main() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | annotated-types==0.7.0 2 | anyio==4.7.0 3 | azure-ai-formrecognizer==3.3.3 4 | azure-cognitiveservices-speech==1.41.1 5 | azure-common==1.1.28 6 | azure-core==1.32.0 7 | beautifulsoup4==4.12.3 8 | certifi==2024.12.14 9 | charset-normalizer==3.4.0 10 | colorama==0.4.6 11 | distro==1.9.0 12 | groq==0.13.1 13 | h11==0.14.0 14 | httpcore==1.0.7 15 | httpx==0.27.2 16 | idna==3.10 17 | isodate==0.7.2 18 | jiter==0.8.2 19 | joblib==1.4.2 20 | lxml==5.3.0 21 | msrest==0.7.1 22 | numpy==2.2.0 23 | oauthlib==3.2.2 
24 | ollama==0.4.4 25 | openai==1.58.1 26 | pydantic==2.10.4 27 | pydantic_core==2.27.2 28 | pydub==0.25.1 29 | PyPDF2==3.0.1 30 | python-docx==1.1.2 31 | python-dotenv==1.0.1 32 | regex==2024.11.6 33 | requests==2.32.3 34 | requests-oauthlib==2.0.0 35 | scikit-learn==1.6.0 36 | scipy==1.14.1 37 | setuptools==75.1.0 38 | six==1.17.0 39 | sniffio==1.3.1 40 | soupsieve==2.6 41 | threadpoolctl==3.5.0 42 | tiktoken==0.8.0 43 | tqdm==4.67.1 44 | typing_extensions==4.12.2 45 | urllib3==2.2.3 46 | wheel==0.44.0 47 | -------------------------------------------------------------------------------- /utils/combine_audio.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from pydub import AudioSegment 4 | from typing import List, Union 5 | from logger import CustomLogger 6 | 7 | log = CustomLogger("CombineAudio", log_file="combine_audio.log") 8 | 9 | def combine_audio_files(input_files: List[Union[str, Path]], output_file: Union[str, Path]) -> str: 10 | """ 11 | Combine multiple audio files into a single MP3 file. 
12 | 13 | Args: 14 | input_files: List of paths to input audio files 15 | output_file: Path where the combined audio should be saved 16 | """ 17 | log.log_debug("Starting audio combination process...") 18 | 19 | # Convert all paths to Path objects 20 | input_files = [Path(f) for f in input_files] 21 | output_file = Path(output_file) 22 | 23 | # Ensure output directory exists 24 | output_file.parent.mkdir(parents=True, exist_ok=True) 25 | 26 | # Initialize an empty AudioSegment 27 | log.log_debug("Initializing an empty AudioSegment...") 28 | combined = AudioSegment.empty() 29 | 30 | # Iterate through the input files 31 | for file_path in input_files: 32 | try: 33 | log.log_debug(f"Processing file: {file_path}") 34 | 35 | # Load the audio file (handle both wav and mp3) 36 | if file_path.suffix.lower() == '.wav': 37 | audio = AudioSegment.from_wav(str(file_path)) 38 | elif file_path.suffix.lower() == '.mp3': 39 | audio = AudioSegment.from_mp3(str(file_path)) 40 | else: 41 | log.log_warning(f"Unsupported file format: {file_path}") 42 | continue 43 | 44 | # Add it to the combined AudioSegment 45 | combined += audio 46 | log.log_debug(f"Added: {file_path}") 47 | 48 | except Exception as e: 49 | log.log_error(f"Error processing {file_path}: {str(e)}") 50 | continue 51 | 52 | if len(combined) == 0: 53 | raise ValueError("No audio files were successfully combined") 54 | 55 | # Export the combined audio to a file 56 | log.log_debug(f"Exporting combined audio to {output_file}...") 57 | combined.export(str(output_file), format="mp3") 58 | log.log_debug(f"Combined audio saved as: {output_file}") 59 | 60 | return str(output_file) --------------------------------------------------------------------------------
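For quick reference, the `sanitize_filename` helper in `main.py` decides which characters survive in the project directory name; below is a minimal standalone sketch of that same whitelist logic (the function body is copied from the source, the example input is hypothetical):

```python
def sanitize_filename(filename: str) -> str:
    # Keep only alphanumerics plus space, hyphen, and underscore,
    # then strip leading/trailing whitespace (mirrors main.py).
    return "".join(c for c in filename if c.isalnum() or c in (' ', '-', '_')).strip()

print(sanitize_filename("My Podcast: Episode #1!"))  # -> My Podcast Episode 1
```

Note that an input consisting solely of disallowed characters collapses to an empty string, so callers may want to guard against an empty project name.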