├── .gitignore ├── LICENSE ├── README.md ├── docker-compose.yml ├── ppt2desc_icon.png ├── requirements.txt └── src ├── converters ├── __init__.py ├── docker_converter.py ├── exceptions.py ├── pdf_converter.py └── ppt_converter.py ├── libreoffice_docker ├── Dockerfile ├── app.py └── requirements.txt ├── llm ├── __init__.py ├── anthropic.py ├── aws.py ├── azure.py ├── base.py ├── deprecated │ ├── gemini.py │ └── vertex.py ├── google_unified.py └── openai.py ├── main.py ├── processor.py ├── prompt.txt └── schemas ├── __init__.py └── deck.py /.gitignore: -------------------------------------------------------------------------------- 1 | ppt2desc_venv 2 | test_files 3 | .env 4 | __pycache__/ 5 | .DS_Store 6 | ppts/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 Adam Łucek 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # ppt2desc 4 | 5 | Convert PowerPoint presentations into semantically rich text using Vision Language Models. 6 | 7 | ## Overview 8 | 9 | ppt2desc is a command-line tool that converts PowerPoint presentations into detailed textual descriptions. PowerPoint presentations are an inherently visual medium that often convey complex ideas through a combination of text, graphics, charts, and other visual layouts. This tool uses vision language models to not only transcribe the text content but also interpret and describe the visual elements and their relationships, capturing the full semantic meaning of each slide in a machine-readable format. 10 | 11 | ## Features 12 | 13 | - Convert PPT/PPTX files to semantic descriptions 14 | - Process individual files or entire directories 15 | - Support for interpreting visual elements (charts, graphs, figures) 16 | - Rate limiting for API calls 17 | - Customizable prompts and instructions 18 | - JSON output format for easy integration 19 | 20 | **Current Model Provider Support** 21 | - Gemini models via Google Gemini API 22 | - GPT models via OpenAI API 23 | - Claude models via Anthropic API 24 | - Gemini models via Google Cloud Platform Vertex AI 25 | - GPT models via Microsoft Azure AI Foundry Deployments 26 | - Nova & Claude models via Amazon Web Services' Amazon Bedrock 27 | 28 | ## Prerequisites 29 | 30 | - Python 3.9 or higher 31 | - LibreOffice (for PPT/PPTX to PDF conversion) 32 | - Option 1: Install LibreOffice locally.
33 | - Option 2: Use the provided Docker container for LibreOffice. 34 | - API credentials for a supported vision LLM provider 35 | 36 | ## Installation 37 | 38 | 1. Clone the repository: 39 | ```bash 40 | git clone https://github.com/ALucek/ppt2desc.git 41 | cd ppt2desc 42 | ``` 43 | 44 | 2. Install LibreOffice 45 | 46 | LibreOffice is a critical dependency for this tool, as it handles the headless conversion of PowerPoint files to PDF format. 47 | 48 | **Option 1: Local Installation** 49 | 50 | **Linux:** 51 | ```bash 52 | sudo apt install libreoffice 53 | ``` 54 | 55 | **macOS:** 56 | ```bash 57 | brew install libreoffice 58 | ``` 59 | 60 | **Windows:** 61 | Download and run the installer from [LibreOffice's Official Website](https://www.libreoffice.org/download/download/) 62 | 63 | **Option 2: Docker-based Installation** 64 | 65 | a. Ensure you have [Docker](https://www.docker.com/) installed on your system 66 | b. Run the following command: 67 | ```bash 68 | docker compose up -d 69 | ``` 70 | 71 | This command will build the Docker image based on the provided [Dockerfile](./src/libreoffice_docker/) and start the container in detached mode. The LibreOffice conversion service will be accessible at `http://localhost:2002`. Stop it with `docker compose down`. 72 | 73 | 3. Create and activate a virtual environment: 74 | ```bash 75 | python -m venv ppt2desc_venv 76 | source ppt2desc_venv/bin/activate # On Windows: ppt2desc_venv\Scripts\activate 77 | ``` 78 | 79 | 4.
Install dependencies: 80 | ```bash 81 | pip install -r requirements.txt 82 | ``` 83 | 84 | ## Usage 85 | 86 | Basic usage with Gemini API: 87 | ```bash 88 | python src/main.py \ 89 | --input_dir /path/to/presentations \ 90 | --output_dir /path/to/output \ 91 | --libreoffice_path /path/to/soffice \ 92 | --client gemini \ 93 | --api_key YOUR_GEMINI_API_KEY 94 | ``` 95 | 96 | ### Command Line Arguments 97 | 98 | General Arguments: 99 | - `--input_dir`: Path to input directory or PPT file (required) 100 | - `--output_dir`: Output directory path (required) 101 | - `--client`: LLM client to use: 'gemini', 'vertexai', 'anthropic', 'azure', 'aws', or 'openai' (required) 102 | - `--model`: Model to use (default: "gemini-1.5-flash") 103 | - `--instructions`: Additional instructions for the model 104 | - `--libreoffice_path`: Path to LibreOffice installation 105 | - `--libreoffice_url`: URL of the Docker-based LibreOffice service (the provided container listens at http://localhost:2002) 106 | - `--rate_limit`: API calls per minute (default: 60) 107 | - `--prompt_path`: Custom prompt file path 108 | - `--api_key`: Model provider API key (if not set via environment variable) 109 | - `--save_pdf`: Include to save the converted PDF in your output folder 110 | - `--save_images`: Include to save the individual slide images in your output folder 111 | 112 | Vertex AI Specific Arguments: 113 | - `--gcp_project_id`: GCP project ID for Vertex AI service account 114 | - `--gcp_region`: GCP region for Vertex AI service (e.g., us-central1) 115 | - `--gcp_application_credentials`: Path to GCP service account JSON credentials file 116 | 117 | Azure AI Foundry Specific Arguments: 118 | - `--azure_openai_api_key`: Azure AI Foundry Resource Key 1 or Key 2 119 | - `--azure_openai_endpoint`: Azure AI Foundry deployment service endpoint link 120 | - `--azure_deployment_name`: The name of your model deployment 121 | - `--azure_api_version`: Azure API version (default: "2023-12-01-preview") 122 | 123 | AWS Amazon
Bedrock Specific Arguments: 124 | - `--aws_access_key_id`: Bedrock Account Access Key 125 | - `--aws_secret_access_key`: Bedrock Account Secret Access Key 126 | - `--aws_region`: AWS Bedrock Region 127 | 128 | ### Example Commands 129 | 130 | Using Gemini API: 131 | ```bash 132 | python src/main.py \ 133 | --input_dir ./presentations \ 134 | --output_dir ./output \ 135 | --libreoffice_path ./soffice \ 136 | --client gemini \ 137 | --model gemini-1.5-flash \ 138 | --rate_limit 30 \ 139 | --instructions "Focus on extracting numerical data from charts and graphs" 140 | ``` 141 | 142 | Using Vertex AI: 143 | ```bash 144 | python src/main.py \ 145 | --input_dir ./presentations \ 146 | --output_dir ./output \ 147 | --client vertexai \ 148 | --libreoffice_path ./soffice \ 149 | --gcp_project_id my-project-123 \ 150 | --gcp_region us-central1 \ 151 | --gcp_application_credentials ./service-account.json \ 152 | --model gemini-1.5-pro \ 153 | --instructions "Extract detailed information from technical diagrams" 154 | ``` 155 | Using Azure AI Foundry: 156 | ```bash 157 | python src/main.py \ 158 | --input_dir ./presentations \ 159 | --output_dir ./output \ 160 | --libreoffice_path ./soffice \ 161 | --client azure \ 162 | --azure_openai_api_key 123456790ABCDEFG \ 163 | --azure_openai_endpoint 'https://example-endpoint-001.openai.azure.com/' \ 164 | --azure_deployment_name gpt-4o \ 165 | --azure_api_version 2023-12-01-preview \ 166 | --rate_limit 60 167 | ``` 168 | 169 | Using AWS Amazon Bedrock: 170 | ```bash 171 | python src/main.py \ 172 | --input_dir ./presentations \ 173 | --output_dir ./output \ 174 | --libreoffice_path ./soffice \ 175 | --client aws \ 176 | --model us.amazon.nova-lite-v1:0 \ 177 | --aws_access_key_id 123456790ABCDEFG \ 178 | --aws_secret_access_key 123456790ABCDEFG \ 179 | --aws_region us-east-1 \ 180 | --rate_limit 60 181 | ``` 182 | 183 | ## Output Format 184 | 185 | The tool generates JSON files with the following structure: 186 | 187 |
```json 188 | { 189 | "deck": "presentation.pptx", 190 | "model": "model-name", 191 | "slides": [ 192 | { 193 | "number": 1, 194 | "content": "Detailed description of slide content..." 195 | }, 196 | // ... more slides 197 | ] 198 | } 199 | ``` 200 | 201 | ## Advanced Usage 202 | 203 | ### Using Docker-based LibreOffice Conversion 204 | 205 | When using the Docker container for LibreOffice, you can use the `--libreoffice_url` argument to direct the conversion process to the container's API endpoint, rather than a local installation. 206 | 207 | ```bash 208 | python src/main.py \ 209 | --input_dir ./presentations \ 210 | --output_dir ./output \ 211 | --libreoffice_url http://localhost:2002 \ 212 | --client vertexai \ 213 | --model gemini-1.5-pro \ 214 | --gcp_project_id my-project-123 \ 215 | --gcp_region us-central1 \ 216 | --gcp_application_credentials ./service-account.json \ 217 | --rate_limit 30 \ 218 | --instructions "Extract detailed information from technical diagrams" \ 219 | --save_pdf \ 220 | --save_images 221 | ``` 222 | 223 | You should use either `--libreoffice_url` or `--libreoffice_path` but not both. 224 | 225 | ### Custom Prompts 226 | 227 | You can modify the base prompt by editing `src/prompt.txt` or providing additional instructions via the command line: 228 | 229 | ```bash 230 | python src/main.py \ 231 | --input_dir ./presentations \ 232 | --output_dir ./output \ 233 | --libreoffice_path ./soffice \ 234 | --instructions "Include mathematical equations and formulas in LaTeX format" 235 | ``` 236 | 237 | ### Authentication 238 | 239 | For Consumer APIs: 240 | - Set your API key via the `--api_key` argument or through your respective provider's environment variables 241 | 242 | For Vertex AI: 243 | 1. Create a service account in your GCP project IAM 244 | 2. Grant necessary permissions (typically, "Vertex AI User" role) 245 | 3. Download the service account JSON key file 246 | 4. 
Provide the credentials file path via `--gcp_application_credentials` 247 | 248 | For Azure AI Foundry: 249 | 1. Create an Azure OpenAI Resource 250 | 2. Navigate to Azure AI Foundry and choose the subscription and Azure OpenAI Resource to work with 251 | 3. Under Management, select Deployments 252 | 4. Select Create new deployment and configure it with your vision LLM 253 | 5. Provide the deployment name, API key, endpoint, and API version via `--azure_deployment_name`, `--azure_openai_api_key`, `--azure_openai_endpoint`, and `--azure_api_version` 254 | 255 | For AWS Bedrock: 256 | 1. Request access to serverless model deployments in Amazon Bedrock's model catalog 257 | 2. Create a user in your AWS IAM 258 | 3. Enable Amazon Bedrock access policies for your user 259 | 4. Save the user's access key and secret access key credentials 260 | 5. Provide the user's credentials via `--aws_access_key_id` and `--aws_secret_access_key` 261 | 262 | ## Contributing 263 | 264 | Contributions are welcome! Please feel free to submit a Pull Request. 265 | 266 | **Todo** 267 | - Handling Google's new genai SDK for a unified Gemini/Vertex experience 268 | - Better Docker Setup 269 | - AWS Llama Vision Support Confirmation 270 | - Combination of JSON files across multiple PPTs 271 | - Dynamic font handling (i.e., conversion struggles when a font used by the PPT is not installed on the machine) 272 | 273 | ## License 274 | 275 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
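
## Example: Working with the Output

The JSON written to the output directory (see the Output Format section) is straightforward to post-process. Below is a minimal, illustrative sketch — the helper names and the `*.json` glob are assumptions, not part of the tool — showing how one or more deck files could be loaded and flattened into plain text, which is also a starting point for the "Combination of JSON files across multiple ppts" item on the Todo list:

```python
import json
from pathlib import Path


def load_deck(path: Path) -> dict:
    """Parse one ppt2desc output file into a dict."""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def slides_as_text(deck: dict) -> str:
    """Flatten a deck's slide descriptions into a single text document."""
    header = f"# {deck['deck']} (described by {deck['model']})"
    body = "\n\n".join(
        f"## Slide {slide['number']}\n{slide['content']}"
        for slide in deck["slides"]
    )
    return f"{header}\n\n{body}"


def combine_decks(output_dir: Path) -> list[dict]:
    """Collect every deck JSON found in an output directory."""
    return [load_deck(p) for p in sorted(output_dir.glob("*.json"))]
```

The key names mirror the structure shown in the Output Format section; adapt them if the schema changes.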
276 | 277 | ## Acknowledgments 278 | 279 | - [LibreOffice](https://www.libreoffice.org/) for PPT/PPTX conversion 280 | - [PyMuPDF](https://pymupdf.readthedocs.io/en/latest/) for PDF processing -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | libreoffice-converter: 3 | build: 4 | context: ./src/libreoffice_docker 5 | dockerfile: Dockerfile 6 | ports: 7 | - "2002:2002" 8 | restart: unless-stopped 9 | # Healthcheck 10 | healthcheck: 11 | test: ["CMD", "curl", "-f", "http://localhost:2002/health"] 12 | interval: 30s 13 | timeout: 10s 14 | retries: 3 -------------------------------------------------------------------------------- /ppt2desc_icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ALucek/ppt2desc/487b8578d09acff1c4a6121b573050df7aef3568/ppt2desc_icon.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | annotated-types==0.7.0 2 | anthropic==0.42.0 3 | anyio==4.7.0 4 | boto3==1.35.91 5 | botocore==1.35.91 6 | cachetools==5.5.0 7 | certifi==2024.12.14 8 | charset-normalizer==3.4.1 9 | distro==1.9.0 10 | docstring_parser==0.16 11 | google-ai-generativelanguage==0.6.10 12 | google-api-core==2.24.0 13 | google-api-python-client==2.156.0 14 | google-auth==2.37.0 15 | google-auth-httplib2==0.2.0 16 | google-cloud-aiplatform==1.75.0 17 | google-cloud-bigquery==3.27.0 18 | google-cloud-core==2.4.1 19 | google-cloud-resource-manager==1.14.0 20 | google-cloud-storage==2.19.0 21 | google-crc32c==1.6.0 22 | google-generativeai==0.8.3 23 | google-resumable-media==2.7.2 24 | googleapis-common-protos==1.66.0 25 | grpc-google-iam-v1==0.13.1 26 | grpcio==1.68.1 27 | grpcio-status==1.68.1 28 | h11==0.14.0 29 | httpcore==1.0.7 
30 | httplib2==0.22.0 31 | httpx==0.28.1 32 | idna==3.10 33 | jiter==0.8.2 34 | jmespath==1.0.1 35 | numpy==2.2.1 36 | openai==1.58.1 37 | packaging==24.2 38 | pillow==11.0.0 39 | proto-plus==1.25.0 40 | protobuf==5.29.2 41 | pyasn1==0.6.1 42 | pyasn1_modules==0.4.1 43 | pydantic==2.10.4 44 | pydantic_core==2.27.2 45 | PyMuPDF==1.25.1 46 | pyparsing==3.2.1 47 | python-dateutil==2.9.0.post0 48 | requests==2.32.3 49 | rsa==4.9 50 | s3transfer==0.10.4 51 | shapely==2.0.6 52 | six==1.17.0 53 | sniffio==1.3.1 54 | tqdm==4.67.1 55 | typing_extensions==4.12.2 56 | uritemplate==4.1.1 57 | urllib3==2.3.0 58 | google-genai==1.3.0 -------------------------------------------------------------------------------- /src/converters/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ALucek/ppt2desc/487b8578d09acff1c4a6121b573050df7aef3568/src/converters/__init__.py -------------------------------------------------------------------------------- /src/converters/docker_converter.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from pathlib import Path 3 | import logging 4 | 5 | from .exceptions import ConversionError 6 | 7 | logger = logging.getLogger(__name__) 8 | 9 | def convert_pptx_via_docker( 10 | ppt_file: Path, 11 | container_url: str, 12 | temp_dir: Path 13 | ) -> Path: 14 | """ 15 | Convert a PPT/PPTX file to PDF by sending it to the Docker container at container_url. 
16 | e.g., container_url="http://localhost:2002" 17 | 18 | :param ppt_file: Path to the local PPT/PPTX file 19 | :param container_url: Base URL of the container (without trailing slash) 20 | :param temp_dir: Directory to store the resulting PDF 21 | :return: Path to the newly-created PDF file 22 | :raises ConversionError: if the container fails or file can't be saved 23 | """ 24 | endpoint = f"{container_url.rstrip('/')}/convert/ppt-to-pdf" 25 | logger.info(f"Calling Docker LibreOffice at {endpoint} for {ppt_file}") 26 | 27 | # 1) Prepare the file for upload 28 | files = { 29 | "file": (ppt_file.name, ppt_file.open("rb"), "application/vnd.ms-powerpoint") 30 | } 31 | 32 | try: 33 | # 2) Make a POST request 34 | resp = requests.post(endpoint, files=files, timeout=300) 35 | resp.raise_for_status() 36 | 37 | # 3) Save the returned PDF to temp_dir 38 | pdf_filename = ppt_file.stem + ".pdf" 39 | pdf_path = temp_dir / pdf_filename 40 | with open(pdf_path, "wb") as f: 41 | for chunk in resp.iter_content(chunk_size=8192): 42 | f.write(chunk) 43 | 44 | if not pdf_path.exists(): 45 | raise ConversionError("PDF file not created after Docker-based conversion.") 46 | logger.info(f"Created PDF {pdf_path} via Docker container.") 47 | return pdf_path 48 | 49 | except Exception as e: 50 | logger.error(f"Error converting {ppt_file} via Docker: {e}") 51 | raise ConversionError(f"Error converting {ppt_file}: {str(e)}") 52 | -------------------------------------------------------------------------------- /src/converters/exceptions.py: -------------------------------------------------------------------------------- 1 | class LibreOfficeNotFoundError(Exception): 2 | """Raised when LibreOffice is not found at the given path.""" 3 | pass 4 | 5 | class ConversionError(Exception): 6 | """General error for file conversion issues.""" 7 | pass -------------------------------------------------------------------------------- /src/converters/pdf_converter.py: 
-------------------------------------------------------------------------------- 1 | import logging 2 | import fitz 3 | from PIL import Image 4 | from pathlib import Path 5 | from typing import List 6 | from .exceptions import ConversionError 7 | 8 | logger = logging.getLogger(__name__) 9 | 10 | def convert_pdf_to_images(pdf_path: Path, temp_dir: Path) -> List[Path]: 11 | """ 12 | Convert a PDF file to a series of PNG images. 13 | 14 | :param pdf_path: Path to the input PDF file 15 | :param temp_dir: Path to temporary directory for storing images 16 | :return: List of paths to generated image files 17 | :raises ConversionError: if the conversion to images fails 18 | """ 19 | target_size = (1920, 1080) 20 | image_paths = [] 21 | 22 | try: 23 | images_dir = temp_dir / 'images' 24 | images_dir.mkdir(exist_ok=True) 25 | 26 | doc = fitz.open(pdf_path) 27 | 28 | for page_num in range(len(doc)): 29 | page = doc.load_page(page_num) 30 | page_rect = page.rect 31 | 32 | zoom_x = target_size[0] / page_rect.width 33 | zoom_y = target_size[1] / page_rect.height 34 | zoom = min(zoom_x, zoom_y) 35 | 36 | try: 37 | pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False) 38 | img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples) 39 | 40 | # Create background (white) and paste the rendered image 41 | new_img = Image.new("RGB", target_size, (255, 255, 255)) 42 | paste_x = (target_size[0] - img.width) // 2 43 | paste_y = (target_size[1] - img.height) // 2 44 | new_img.paste(img, (paste_x, paste_y)) 45 | 46 | # Save image 47 | image_path = images_dir / f"slide_{page_num + 1}.png" 48 | new_img.save(image_path) 49 | image_paths.append(image_path) 50 | 51 | except Exception as inner_exc: 52 | logger.error(f"Error processing page {page_num + 1}: {str(inner_exc)}") 53 | continue 54 | 55 | doc.close() 56 | return image_paths 57 | 58 | except Exception as e: 59 | logger.error(f"Error converting PDF to images: {str(e)}") 60 | raise ConversionError(f"Error converting PDF 
to images: {str(e)}") 61 | -------------------------------------------------------------------------------- /src/converters/ppt_converter.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import subprocess 3 | from pathlib import Path 4 | from .exceptions import LibreOfficeNotFoundError, ConversionError 5 | 6 | logger = logging.getLogger(__name__) 7 | 8 | def convert_pptx_to_pdf(input_file: Path, libreoffice_path: Path, temp_dir: Path) -> Path: 9 | """ 10 | Convert a PowerPoint file to PDF using LibreOffice. 11 | 12 | :param input_file: Path to the input PowerPoint file 13 | :param libreoffice_path: Path to LibreOffice executable 14 | :param temp_dir: Temporary directory to store the PDF 15 | :return: Path to the output PDF file if successful 16 | :raises LibreOfficeNotFoundError: if LibreOffice is not found 17 | :raises ConversionError: if the conversion fails 18 | """ 19 | if not libreoffice_path.exists(): 20 | logger.error(f"LibreOffice not found at {libreoffice_path}") 21 | raise LibreOfficeNotFoundError(f"LibreOffice not found at {libreoffice_path}") 22 | 23 | try: 24 | cmd = [ 25 | str(libreoffice_path), 26 | '--headless', 27 | '--convert-to', 'pdf', 28 | '--outdir', str(temp_dir), 29 | str(input_file) 30 | ] 31 | 32 | result = subprocess.run(cmd, check=True, capture_output=True, text=True) 33 | logger.debug(f"LibreOffice conversion output: {result.stdout}") 34 | 35 | # The PDF file name should match the PPTX name, but with ".pdf" 36 | pdf_name = f"{input_file.stem}.pdf" 37 | pdf_path = temp_dir / pdf_name 38 | 39 | if pdf_path.exists(): 40 | return pdf_path 41 | else: 42 | logger.error(f"Expected PDF not created at {pdf_path}") 43 | logger.error(f"LibreOffice error: {result.stderr}") 44 | raise ConversionError(f"Failed to create PDF at {pdf_path}") 45 | 46 | except subprocess.CalledProcessError as e: 47 | logger.error(f"Error converting {input_file}: {e.stderr}") 48 | raise 
ConversionError(f"Subprocess conversion error: {e.stderr}") 49 | except Exception as e: 50 | logger.error(f"Unexpected error converting {input_file}: {str(e)}") 51 | raise ConversionError(f"Unexpected error: {str(e)}") 52 | -------------------------------------------------------------------------------- /src/libreoffice_docker/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.11-slim 2 | 3 | ENV PYTHONDONTWRITEBYTECODE=1 \ 4 | PYTHONUNBUFFERED=1 5 | 6 | RUN apt-get update && apt-get install -y --no-install-recommends \ 7 | libreoffice \ 8 | fonts-dejavu \ 9 | fonts-liberation \ 10 | fonts-noto \ 11 | fonts-noto-color-emoji \ 12 | curl \ 13 | fontconfig \ 14 | && apt-get clean \ 15 | && rm -rf /var/lib/apt/lists/* 16 | 17 | RUN fc-cache -f -v 18 | 19 | RUN useradd --create-home libreoffice 20 | 21 | WORKDIR /app 22 | 23 | COPY requirements.txt . 24 | RUN pip install --no-cache-dir -r requirements.txt 25 | 26 | COPY app.py . 27 | 28 | RUN chown -R libreoffice:libreoffice /app 29 | 30 | USER libreoffice 31 | 32 | EXPOSE 2002 33 | 34 | CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "2002"] 35 | -------------------------------------------------------------------------------- /src/libreoffice_docker/app.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, UploadFile, HTTPException 2 | from fastapi.responses import FileResponse 3 | import subprocess 4 | from pathlib import Path 5 | import tempfile 6 | import shutil 7 | import logging 8 | 9 | app = FastAPI(title="Document Conversion Service") 10 | 11 | # Configure logging 12 | logging.basicConfig( 13 | level=logging.INFO, 14 | format="%(asctime)s [%(levelname)s] %(message)s", 15 | handlers=[ 16 | logging.StreamHandler() 17 | ] 18 | ) 19 | logger = logging.getLogger(__name__) 20 | 21 | LIBREOFFICE_PATH = Path("/usr/bin/libreoffice") 22 | 23 | @app.get("/health") 24 | async def 
health_check(): 25 | """Simple health check endpoint""" 26 | return {"status": "healthy"} 27 | 28 | @app.post("/convert/ppt-to-pdf") 29 | async def convert_pptx_to_pdf(file: UploadFile): 30 | """Convert uploaded PPTX file to PDF""" 31 | logger.info(f"Received file: {file.filename}") 32 | 33 | # Validate file extension 34 | if not file.filename.lower().endswith(('.pptx', '.ppt')): 35 | logger.error("Invalid file extension") 36 | raise HTTPException(status_code=400, detail="File must be a .pptx or .ppt") 37 | 38 | # Create temp dir but don't use context manager 39 | temp_dir = tempfile.mkdtemp() 40 | temp_dir_path = Path(temp_dir) 41 | input_path = temp_dir_path / file.filename 42 | 43 | try: 44 | # Save uploaded file 45 | with input_path.open("wb") as f: 46 | shutil.copyfileobj(file.file, f) 47 | logger.info(f"Saved uploaded file to: {input_path}") 48 | 49 | # Run LibreOffice conversion 50 | cmd = [ 51 | str(LIBREOFFICE_PATH), 52 | '--headless', 53 | '--convert-to', 'pdf', 54 | '--outdir', str(temp_dir_path), 55 | str(input_path) 56 | ] 57 | logger.info(f"Running command: {' '.join(cmd)}") 58 | 59 | result = subprocess.run( 60 | cmd, 61 | check=True, 62 | capture_output=True, 63 | text=True 64 | ) 65 | 66 | logger.info(f"LibreOffice stdout: {result.stdout}") 67 | if result.stderr: 68 | logger.warning(f"LibreOffice stderr: {result.stderr}") 69 | 70 | # Check for output file 71 | pdf_path = temp_dir_path / f"{input_path.stem}.pdf" 72 | if not pdf_path.exists(): 73 | logger.error(f"PDF not created. 
LibreOffice output: {result.stderr}") 74 | raise HTTPException(status_code=500, detail="PDF conversion failed") 75 | 76 | logger.info(f"Conversion successful: {pdf_path}") 77 | 78 | async def cleanup_background(): 79 | """Async cleanup function""" 80 | shutil.rmtree(temp_dir, ignore_errors=True) 81 | 82 | response = FileResponse( 83 | path=pdf_path, 84 | media_type='application/pdf', 85 | filename=pdf_path.name 86 | ) 87 | response.background = cleanup_background 88 | 89 | return response 90 | 91 | except Exception as e: 92 | # Clean up temp dir in case of error 93 | shutil.rmtree(temp_dir, ignore_errors=True) 94 | logger.exception("Error during conversion") 95 | raise HTTPException(status_code=500, detail=str(e)) from e 96 | -------------------------------------------------------------------------------- /src/libreoffice_docker/requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi==0.104.1 2 | python-multipart==0.0.6 3 | uvicorn==0.24.0 4 | -------------------------------------------------------------------------------- /src/llm/__init__.py: -------------------------------------------------------------------------------- 1 | from .base import LLMClient 2 | from .anthropic import AnthropicClient 3 | from .google_unified import GoogleUnifiedClient 4 | from .openai import OpenAIClient 5 | from .azure import AzureClient 6 | from .aws import AWSClient 7 | 8 | __all__ = [ 9 | "LLMClient", 10 | "AnthropicClient", 11 | "GoogleUnifiedClient", 12 | "OpenAIClient", 13 | "AzureClient", 14 | "AWSClient" 15 | ] -------------------------------------------------------------------------------- /src/llm/anthropic.py: -------------------------------------------------------------------------------- 1 | import os 2 | import base64 3 | from pathlib import Path 4 | from typing import Optional, Union 5 | 6 | import anthropic 7 | 8 | class AnthropicClient: 9 | """ 10 | A client wrapper around Anthropic's API for image + prompt 
generation. 11 | 12 | Usage: 13 | client = AnthropicClient(api_key="YOUR_KEY", model="claude-3-5-sonnet-latest") 14 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 15 | """ 16 | 17 | def __init__(self, api_key: Optional[str] = None, model: Optional[str] = None) -> None: 18 | """ 19 | Initialize the Anthropic client with API key and model name. 20 | 21 | :param api_key: Optional API key string. If not provided, 22 | checks the ANTHROPIC_API_KEY environment variable. 23 | :param model: The name of the generative model to use (e.g. "claude-3-sonnet-20240229"). 24 | :raises ValueError: If no API key is found or model is None. 25 | """ 26 | self.api_key = api_key or os.environ.get("ANTHROPIC_API_KEY") 27 | if not self.api_key: 28 | raise ValueError( 29 | "API key must be provided or set via ANTHROPIC_API_KEY environment variable." 30 | ) 31 | 32 | if model is None: 33 | raise ValueError("The 'model' argument is required and cannot be None.") 34 | 35 | self.client = anthropic.Anthropic(api_key=self.api_key) 36 | self.model_name = model 37 | 38 | def _encode_image(self, image_path: Union[str, Path]) -> str: 39 | """ 40 | Encode an image file to base64 string. 41 | 42 | :param image_path: Path to the image file 43 | :return: Base64 encoded string of the image 44 | """ 45 | with open(image_path, "rb") as image_file: 46 | return base64.b64encode(image_file.read()).decode("utf-8") 47 | 48 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 49 | """ 50 | Generate content using the Anthropic model with text + image as input. 51 | 52 | :param prompt: A textual prompt to provide to the model. 53 | :param image_path: File path (string or Path) to an image to be included in the request. 54 | :return: The generated response text from the model. 55 | :raises FileNotFoundError: If the specified image_path does not exist. 56 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
57 | """ 58 | # Ensure the image path exists 59 | image_path_obj = Path(image_path) 60 | if not image_path_obj.is_file(): 61 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 62 | 63 | try: 64 | # Encode the image to base64 65 | base64_image = self._encode_image(image_path_obj) 66 | 67 | # Create the messages request 68 | response = self.client.messages.create( 69 | model=self.model_name, 70 | max_tokens=8192, 71 | messages=[ 72 | { 73 | "role": "user", 74 | "content": [ 75 | { 76 | "type": "image", 77 | "source": { 78 | "type": "base64", 79 | "media_type": "image/png", 80 | "data": base64_image, 81 | }, 82 | }, 83 | { 84 | "type": "text", 85 | "text": prompt 86 | } 87 | ], 88 | } 89 | ], 90 | ) 91 | 92 | return response.content[0].text 93 | 94 | except Exception as e: 95 | raise Exception(f"Failed to generate content with Anthropic model: {e}") -------------------------------------------------------------------------------- /src/llm/aws.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import boto3 6 | class AWSClient: 7 | """ 8 | A client wrapper around AWS Bedrock Runtime API. 9 | 10 | Usage: 11 | client = AWSClient( 12 | access_key_id="YOUR_ACCESS_KEY", 13 | secret_access_key="YOUR_SECRET_KEY", 14 | region="us-east-1", 15 | model="amazon.nova-pro-v1:0" # or any Claude model 16 | ) 17 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 18 | """ 19 | 20 | def __init__( 21 | self, 22 | access_key_id: Optional[str] = None, 23 | secret_access_key: Optional[str] = None, 24 | region: Optional[str] = None, 25 | model: Optional[str] = None, 26 | ) -> None: 27 | """ 28 | Initialize the AWS Bedrock client. 29 | 30 | :param access_key_id: Optional AWS access key ID. If not provided, 31 | checks AWS_ACCESS_KEY_ID environment variable. 
32 | :param secret_access_key: Optional AWS secret access key. If not provided, 33 | checks AWS_SECRET_ACCESS_KEY environment variable. 34 | :param region: AWS region name. If not provided, checks AWS_REGION environment variable. 35 | :param model: The model ID (e.g., "amazon.nova-pro-v1:0" or any Claude model). 36 | :raises ValueError: If required parameters are missing. 37 | """ 38 | self.access_key_id = access_key_id or os.environ.get("AWS_ACCESS_KEY_ID") 39 | if not self.access_key_id: 40 | raise ValueError( 41 | "AWS access key ID must be provided or set via AWS_ACCESS_KEY_ID environment variable." 42 | ) 43 | 44 | self.secret_access_key = secret_access_key or os.environ.get("AWS_SECRET_ACCESS_KEY") 45 | if not self.secret_access_key: 46 | raise ValueError( 47 | "AWS secret access key must be provided or set via AWS_SECRET_ACCESS_KEY environment variable." 48 | ) 49 | 50 | self.region = region or os.environ.get("AWS_REGION") 51 | if not self.region: 52 | raise ValueError( 53 | "AWS region must be provided or set via AWS_REGION environment variable." 54 | ) 55 | 56 | if model is None: 57 | raise ValueError("The 'model' argument is required and cannot be None.") 58 | 59 | self.client = boto3.client( 60 | "bedrock-runtime", 61 | region_name=self.region, 62 | aws_access_key_id=self.access_key_id, 63 | aws_secret_access_key=self.secret_access_key 64 | ) 65 | self.model_id = model 66 | 67 | # For JSON metadata 68 | self.model_name = model 69 | 70 | def _encode_image(self, image_path: Union[str, Path]) -> bytes: 71 | """ 72 | Read an image file and return its raw bytes. 73 | 74 | :param image_path: Path to the image file 75 | :return: Raw bytes of the image (the Bedrock converse API expects raw bytes, not base64) 76 | """ 77 | with open(image_path, "rb") as image_file: 78 | return image_file.read() 79 | 80 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 81 | """ 82 | Generate content using the AWS model with text + image as input.
83 | 84 | :param prompt: A textual prompt to provide to the model. 85 | :param image_path: File path (string or Path) to an image to be included in the request. 86 | :return: The generated response text from the model. 87 | :raises FileNotFoundError: If the specified image_path does not exist. 88 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 89 | """ 90 | # Ensure the image path exists 91 | image_path_obj = Path(image_path) 92 | if not image_path_obj.is_file(): 93 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 94 | 95 | try: 96 | # Read the image as raw bytes 97 | image_bytes = self._encode_image(image_path_obj) 98 | 99 | # Create the messages list 100 | messages = [ 101 | { 102 | "role": "user", 103 | "content": [ 104 | { 105 | "text": prompt 106 | }, 107 | { 108 | "image": { 109 | "format": "png", 110 | "source": { 111 | "bytes": image_bytes 112 | } 113 | } 114 | } 115 | ] 116 | } 117 | ] 118 | 119 | # Invoke the model using converse 120 | response = self.client.converse( 121 | modelId=self.model_id, 122 | messages=messages 123 | ) 124 | 125 | return response["output"]["message"]["content"][0]["text"] 126 | 127 | except Exception as e: 128 | raise Exception(f"Failed to generate content with AWS model: {e}") -------------------------------------------------------------------------------- /src/llm/azure.py: -------------------------------------------------------------------------------- 1 | import os 2 | import base64 3 | from pathlib import Path 4 | from typing import Optional, Union 5 | 6 | from openai import AzureOpenAI 7 | 8 | 9 | class AzureClient: 10 | """ 11 | A client wrapper around Azure OpenAI's API for image + prompt generation. 
12 | 13 | Usage: 14 | client = AzureClient( 15 | api_key="YOUR_KEY", 16 | endpoint="YOUR_ENDPOINT", 17 | deployment="deployment_name", 18 | api_version="2023-12-01-preview" 19 | ) 20 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 21 | """ 22 | 23 | def __init__( 24 | self, 25 | api_key: Optional[str] = None, 26 | endpoint: Optional[str] = None, 27 | deployment: Optional[str] = None, 28 | api_version: Optional[str] = None, 29 | ) -> None: 30 | """ 31 | Initialize the Azure OpenAI client. 32 | 33 | :param api_key: Optional API key string. If not provided, 34 | checks the AZURE_OPENAI_API_KEY environment variable. 35 | :param endpoint: Azure OpenAI endpoint. If not provided, 36 | checks the AZURE_OPENAI_ENDPOINT environment variable. 37 | :param deployment: The deployment name for the model. 38 | :param api_version: Azure OpenAI API version (e.g., "2023-12-01-preview") 39 | :raises ValueError: If required parameters are missing. 40 | """ 41 | self.api_key = api_key or os.environ.get("AZURE_OPENAI_API_KEY") 42 | if not self.api_key: 43 | raise ValueError( 44 | "API key must be provided or set via AZURE_OPENAI_API_KEY environment variable." 45 | ) 46 | 47 | self.endpoint = endpoint or os.environ.get("AZURE_OPENAI_ENDPOINT") 48 | if not self.endpoint: 49 | raise ValueError( 50 | "Endpoint must be provided or set via AZURE_OPENAI_ENDPOINT environment variable." 
51 | ) 52 | 53 | if deployment is None: 54 | raise ValueError("The 'deployment' argument is required and cannot be None.") 55 | 56 | if api_version is None: 57 | raise ValueError("The 'api_version' argument is required and cannot be None.") 58 | 59 | self.client = AzureOpenAI( 60 | api_key=self.api_key, 61 | api_version=api_version, 62 | base_url=f"{self.endpoint}/openai/deployments/{deployment}" 63 | ) 64 | self.deployment = deployment 65 | 66 | # For JSON metadata 67 | self.model_name = deployment 68 | 69 | def _encode_image(self, image_path: Union[str, Path]) -> str: 70 | """ 71 | Encode an image file to base64 string. 72 | 73 | :param image_path: Path to the image file 74 | :return: Base64 encoded string of the image 75 | """ 76 | with open(image_path, "rb") as image_file: 77 | return base64.b64encode(image_file.read()).decode("utf-8") 78 | 79 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 80 | """ 81 | Generate content using the Azure OpenAI model with text + image as input. 82 | 83 | :param prompt: A textual prompt to provide to the model. 84 | :param image_path: File path (string or Path) to an image to be included in the request. 85 | :return: The generated response text from the model. 86 | :raises FileNotFoundError: If the specified image_path does not exist. 87 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
88 | """ 89 | # Ensure the image path exists 90 | image_path_obj = Path(image_path) 91 | if not image_path_obj.is_file(): 92 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 93 | 94 | try: 95 | # Encode the image to base64 96 | base64_image = self._encode_image(image_path_obj) 97 | 98 | # Create the API request 99 | response = self.client.chat.completions.create( 100 | model=self.deployment, 101 | messages=[ 102 | { 103 | "role": "user", 104 | "content": [ 105 | { 106 | "type": "text", 107 | "text": prompt, 108 | }, 109 | { 110 | "type": "image_url", 111 | "image_url": { 112 | "url": f"data:image/jpeg;base64,{base64_image}" 113 | }, 114 | }, 115 | ], 116 | } 117 | ], 118 | ) 119 | 120 | return response.choices[0].message.content 121 | 122 | except Exception as e: 123 | raise Exception(f"Failed to generate content with Azure OpenAI model: {e}") -------------------------------------------------------------------------------- /src/llm/base.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | from typing import Protocol, Union, runtime_checkable 3 | 4 | 5 | @runtime_checkable 6 | class LLMClient(Protocol): 7 | """ 8 | Protocol defining the interface for LLM clients. 9 | 10 | This protocol ensures all LLM clients implement a (semi) consistent interface for image-to-text generation. 11 | """ 12 | 13 | model_name: str 14 | 15 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 16 | """ 17 | Generate content using the LLM model with text + image as input. 18 | 19 | :param prompt: A textual prompt to provide to the model. 20 | :param image_path: File path (string or Path) to an image to be included in the request. 21 | :return: The generated response text from the model. 22 | :raises FileNotFoundError: If the specified image_path does not exist. 23 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
24 | """ 25 | pass -------------------------------------------------------------------------------- /src/llm/deprecated/gemini.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import PIL.Image 6 | import google.generativeai as genai 7 | 8 | 9 | class GeminiClient: 10 | """ 11 | A client wrapper around Google's Generative AI (Gemini) model for image + prompt generation. 12 | 13 | Usage: 14 | client = GeminiClient(api_key="YOUR_KEY", model="gemini-1.5-flash") 15 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 16 | """ 17 | 18 | def __init__(self, api_key: Optional[str] = None, model: Optional[str] = None) -> None: 19 | """ 20 | Initialize the Gemini client with API key and model name. 21 | 22 | :param api_key: Optional API key string. If not provided, 23 | checks the GEMINI_API_KEY environment variable. 24 | :param model: The name of the generative model to use (e.g. "gemini-1.5-flash"). 25 | :raises ValueError: If no API key is found or model is None. 26 | """ 27 | self.api_key = api_key or os.environ.get("GEMINI_API_KEY") 28 | if not self.api_key: 29 | raise ValueError("API key must be provided or set via GEMINI_API_KEY environment variable.") 30 | 31 | if model is None: 32 | raise ValueError("The 'model' argument is required and cannot be None.") 33 | 34 | # Configure generative AI 35 | genai.configure(api_key=self.api_key) 36 | self.model = genai.GenerativeModel(model) 37 | self.model_name = model 38 | 39 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 40 | """ 41 | Generate content using the Gemini model with text + image as input. 42 | 43 | :param prompt: A textual prompt to provide to the model. 44 | :param image_path: File path (string or Path) to an image to be included in the request. 45 | :return: The generated response text from the model. 
46 | :raises FileNotFoundError: If the specified image_path does not exist. 47 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 48 | """ 49 | # Ensure the image path exists 50 | image_path_obj = Path(image_path) 51 | if not image_path_obj.is_file(): 52 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 53 | 54 | try: 55 | image = PIL.Image.open(image_path_obj) 56 | # If using the google.generativeai library's generate_content method: 57 | # pass [prompt, image] in the format required by the library 58 | response = self.model.generate_content([prompt, image]) 59 | return response.text 60 | 61 | except Exception as e: 62 | raise Exception(f"Failed to generate content with Gemini model: {e}") 63 | -------------------------------------------------------------------------------- /src/llm/deprecated/vertex.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import vertexai 6 | from vertexai.preview.generative_models import GenerativeModel, Image 7 | 8 | 9 | class VertexAIClient: 10 | """ 11 | A client wrapper around Google's Vertex AI service for image + prompt generation using Gemini models. 12 | 13 | Usage: 14 | client = VertexAIClient( 15 | credentials_path="path/to/credentials.json", 16 | project_id="your-project-id", 17 | region="us-central1", 18 | model="gemini-1.5-pro-002" 19 | ) 20 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 21 | """ 22 | 23 | def __init__( 24 | self, 25 | credentials_path: Optional[str] = None, 26 | project_id: Optional[str] = None, 27 | region: Optional[str] = None, 28 | model: Optional[str] = None, 29 | ) -> None: 30 | """ 31 | Initialize the Vertex AI client with necessary credentials and configuration. 32 | 33 | :param credentials_path: Path to the service account credentials JSON file. 
34 | If not provided, checks GOOGLE_APPLICATION_CREDENTIALS env var. 35 | :param project_id: GCP project ID. If not provided, checks PROJECT_ID env var. 36 | :param region: GCP region for Vertex AI. If not provided, checks REGION env var. 37 | :param model: The name of the generative model to use (e.g. "gemini-1.5-pro-002"). 38 | :raises ValueError: If required credentials or configuration are missing. 39 | """ 40 | # Check credentials 41 | self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") 42 | if not self.credentials_path: 43 | raise ValueError( 44 | "Credentials path must be provided or set via " 45 | "GOOGLE_APPLICATION_CREDENTIALS environment variable." 46 | ) 47 | if not Path(self.credentials_path).is_file(): 48 | raise FileNotFoundError( 49 | f"Credentials file not found at {self.credentials_path}" 50 | ) 51 | 52 | # Check project ID and region 53 | self.project_id = project_id or os.environ.get("PROJECT_ID") 54 | if not self.project_id: 55 | raise ValueError( 56 | "Project ID must be provided or set via PROJECT_ID environment variable." 57 | ) 58 | 59 | self.region = region or os.environ.get("REGION") 60 | if not self.region: 61 | raise ValueError( 62 | "Region must be provided or set via REGION environment variable." 63 | ) 64 | 65 | if model is None: 66 | raise ValueError("The 'model' argument is required and cannot be None.") 67 | 68 | # Set credentials environment variable 69 | os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.credentials_path 70 | 71 | # Initialize Vertex AI 72 | vertexai.init(project=self.project_id, location=self.region) 73 | 74 | # Initialize the model 75 | self.model = GenerativeModel(model) 76 | self.model_name = model 77 | 78 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 79 | """ 80 | Generate content using the Vertex AI model with text + image as input. 81 | 82 | :param prompt: A textual prompt to provide to the model. 
83 | :param image_path: File path (string or Path) to an image to be included in the request. 84 | :return: The generated response text from the model. 85 | :raises FileNotFoundError: If the specified image_path does not exist. 86 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 87 | """ 88 | # Ensure the image path exists 89 | image_path_obj = Path(image_path) 90 | if not image_path_obj.is_file(): 91 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 92 | 93 | try: 94 | # Load the image using Vertex AI's Image class 95 | image = Image.load_from_file(str(image_path_obj)) 96 | 97 | # Generate content using the model 98 | response = self.model.generate_content([prompt, image]) 99 | return response.text 100 | 101 | except Exception as e: 102 | raise Exception(f"Failed to generate content with Vertex AI model: {e}") -------------------------------------------------------------------------------- /src/llm/google_unified.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import PIL.Image 6 | from google import genai 7 | from google.oauth2 import service_account 8 | import logging 9 | 10 | logging.getLogger("google_genai.models").setLevel(logging.WARNING) 11 | 12 | class GoogleUnifiedClient: 13 | """ 14 | A unified client wrapper for Google's GenAI SDK that supports both Gemini API and Vertex AI. 
15 | 16 | Usage for Gemini API: 17 | client = GoogleUnifiedClient(api_key="YOUR_KEY", model="gemini-1.5-flash") 18 | 19 | Usage for Vertex AI: 20 | client = GoogleUnifiedClient( 21 | credentials_path="path/to/credentials.json", 22 | project_id="your-project-id", 23 | region="us-central1", 24 | model="gemini-1.5-pro-002", 25 | use_vertex=True 26 | ) 27 | """ 28 | 29 | def __init__( 30 | self, 31 | api_key: Optional[str] = None, 32 | credentials_path: Optional[str] = None, 33 | project_id: Optional[str] = None, 34 | region: Optional[str] = None, 35 | model: Optional[str] = None, 36 | use_vertex: bool = False, 37 | ) -> None: 38 | """ 39 | Initialize the Google GenAI client for either Gemini API or Vertex AI. 40 | 41 | :param api_key: API key for Gemini API (used if use_vertex=False) 42 | :param credentials_path: Path to service account credentials JSON file (used if use_vertex=True) 43 | :param project_id: GCP project ID (used if use_vertex=True) 44 | :param region: GCP region (used if use_vertex=True) 45 | :param model: The name of the generative model to use 46 | :param use_vertex: Whether to use Vertex AI (True) or Gemini API (False) 47 | :raises ValueError: If required parameters are missing 48 | """ 49 | if model is None: 50 | raise ValueError("The 'model' argument is required and cannot be None.") 51 | 52 | self.model_name = model 53 | self.use_vertex = use_vertex 54 | 55 | if use_vertex: 56 | # Initialize for Vertex AI 57 | self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") 58 | if not self.credentials_path: 59 | raise ValueError( 60 | "Credentials path must be provided or set via " 61 | "GOOGLE_APPLICATION_CREDENTIALS environment variable." 
62 | ) 63 | if not Path(self.credentials_path).is_file(): 64 | raise FileNotFoundError( 65 | f"Credentials file not found at {self.credentials_path}" 66 | ) 67 | 68 | self.project_id = project_id or os.environ.get("PROJECT_ID") 69 | if not self.project_id: 70 | raise ValueError( 71 | "Project ID must be provided or set via PROJECT_ID environment variable." 72 | ) 73 | 74 | self.region = region or os.environ.get("REGION") 75 | if not self.region: 76 | raise ValueError( 77 | "Region must be provided or set via REGION environment variable." 78 | ) 79 | 80 | # Load credentials 81 | credentials = service_account.Credentials.from_service_account_file( 82 | self.credentials_path 83 | ).with_scopes(["https://www.googleapis.com/auth/cloud-platform"]) 84 | 85 | # Initialize the client for Vertex AI 86 | self.client = genai.Client( 87 | vertexai=True, 88 | project=self.project_id, 89 | location=self.region, 90 | credentials=credentials 91 | ) 92 | 93 | else: 94 | # Initialize for Gemini API 95 | self.api_key = api_key or os.environ.get("GEMINI_API_KEY") 96 | if not self.api_key: 97 | raise ValueError( 98 | "API key must be provided or set via GEMINI_API_KEY environment variable." 99 | ) 100 | 101 | # Initialize the client for Gemini API 102 | self.client = genai.Client(api_key=self.api_key) 103 | 104 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 105 | """ 106 | Generate content using the Google GenAI model with text + image as input. 107 | 108 | :param prompt: A textual prompt to provide to the model. 109 | :param image_path: File path (string or Path) to an image to be included in the request. 110 | :return: The generated response text from the model. 111 | :raises FileNotFoundError: If the specified image_path does not exist. 112 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
113 | """ 114 | # Ensure the image path exists 115 | image_path_obj = Path(image_path) 116 | if not image_path_obj.is_file(): 117 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 118 | 119 | try: 120 | # Load the image 121 | image = PIL.Image.open(image_path_obj) 122 | 123 | # Generate content using the client.models approach 124 | response = self.client.models.generate_content( 125 | model=self.model_name, 126 | contents=[prompt, image] 127 | ) 128 | 129 | return response.text 130 | 131 | except Exception as e: 132 | api_type = "Vertex AI" if self.use_vertex else "Gemini API" 133 | raise Exception(f"Failed to generate content with {api_type} model: {e}") -------------------------------------------------------------------------------- /src/llm/openai.py: -------------------------------------------------------------------------------- 1 | import os 2 | import base64 3 | from pathlib import Path 4 | from typing import Optional, Union 5 | import logging 6 | 7 | from openai import OpenAI 8 | 9 | # Remove OpenAI's standard logging messages 10 | logging.getLogger("openai").setLevel(logging.ERROR) 11 | logging.getLogger("httpx").setLevel(logging.ERROR) 12 | 13 | class OpenAIClient: 14 | """ 15 | A client wrapper around OpenAI's API for image + prompt generation. 16 | 17 | Usage: 18 | client = OpenAIClient(api_key="YOUR_KEY", model="gpt-4o") 19 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 20 | """ 21 | 22 | def __init__(self, api_key: Optional[str] = None, model: Optional[str] = None) -> None: 23 | """ 24 | Initialize the OpenAI client with API key and model name. 25 | 26 | :param api_key: Optional API key string. If not provided, 27 | checks the OPENAI_API_KEY environment variable. 28 | :param model: The name of the generative model to use (e.g. "gpt-4-vision-preview"). 29 | :raises ValueError: If no API key is found or model is None. 
30 | """ 31 | self.api_key = api_key or os.environ.get("OPENAI_API_KEY") 32 | if not self.api_key: 33 | raise ValueError( 34 | "API key must be provided or set via OPENAI_API_KEY environment variable." 35 | ) 36 | 37 | if model is None: 38 | raise ValueError("The 'model' argument is required and cannot be None.") 39 | 40 | self.client = OpenAI(api_key=self.api_key) 41 | self.model_name = model 42 | 43 | def _encode_image(self, image_path: Union[str, Path]) -> str: 44 | """ 45 | Encode an image file to base64 string. 46 | 47 | :param image_path: Path to the image file 48 | :return: Base64 encoded string of the image 49 | """ 50 | with open(image_path, "rb") as image_file: 51 | return base64.b64encode(image_file.read()).decode("utf-8") 52 | 53 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 54 | """ 55 | Generate content using the OpenAI model with text + image as input. 56 | 57 | :param prompt: A textual prompt to provide to the model. 58 | :param image_path: File path (string or Path) to an image to be included in the request. 59 | :return: The generated response text from the model. 60 | :raises FileNotFoundError: If the specified image_path does not exist. 61 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
62 | """ 63 | # Ensure the image path exists 64 | image_path_obj = Path(image_path) 65 | if not image_path_obj.is_file(): 66 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 67 | 68 | try: 69 | # Encode the image to base64 70 | base64_image = self._encode_image(image_path_obj) 71 | 72 | # Create the API request 73 | response = self.client.chat.completions.create( 74 | model=self.model_name, 75 | messages=[ 76 | { 77 | "role": "user", 78 | "content": [ 79 | { 80 | "type": "text", 81 | "text": prompt, 82 | }, 83 | { 84 | "type": "image_url", 85 | "image_url": { 86 | "url": f"data:image/jpeg;base64,{base64_image}" 87 | }, 88 | }, 89 | ], 90 | } 91 | ], 92 | ) 93 | 94 | return response.choices[0].message.content 95 | 96 | except Exception as e: 97 | raise Exception(f"Failed to generate content with OpenAI model: {e}") -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import argparse 3 | import sys 4 | from pathlib import Path 5 | 6 | from llm.google_unified import GoogleUnifiedClient 7 | from llm.openai import OpenAIClient 8 | from llm.anthropic import AnthropicClient 9 | from llm.azure import AzureClient 10 | from llm.aws import AWSClient 11 | 12 | from processor import process_input_path 13 | 14 | def parse_args(input_args=None): 15 | parser = argparse.ArgumentParser(description="Process PPT/PPTX files via vLLM.") 16 | 17 | parser.add_argument( 18 | "--output_dir", 19 | type=str, 20 | default=None, 21 | required=True, 22 | help="Output directory path" 23 | ) 24 | parser.add_argument( 25 | "--input_dir", 26 | type=str, 27 | default=None, 28 | required=True, 29 | help="Path to input directory or PPT file" 30 | ) 31 | parser.add_argument( 32 | "--client", 33 | type=str, 34 | required=True, 35 | choices=["gemini", "vertexai", "openai", "anthropic", "azure", "aws"], 36 | help="LLM client to use: 
'gemini', 'vertexai', 'openai', 'azure', 'aws', or 'anthropic'" 37 | ) 38 | parser.add_argument( 39 | "--model", 40 | type=str, 41 | default="gemini-1.5-flash", 42 | help="Suggested models: gemini-1.5-flash, gemini-1.5-pro, gpt-4o, claude-3-5-sonnet-latest" 43 | ) 44 | parser.add_argument( 45 | "--instructions", 46 | type=str, 47 | default="None Provided", 48 | help="Additional instructions appended to the base prompt" 49 | ) 50 | parser.add_argument( 51 | "--libreoffice_path", 52 | type=str, 53 | default=None, 54 | help="Path to the local installation of LibreOffice." 55 | ) 56 | parser.add_argument( 57 | "--rate_limit", 58 | type=int, 59 | default=60, 60 | help="Number of API calls allowed per minute (default: 60)" 61 | ) 62 | parser.add_argument( 63 | "--prompt_path", 64 | type=str, 65 | default="src/prompt.txt", 66 | help="Path to the base prompt file (default: src/prompt.txt)" 67 | ) 68 | parser.add_argument( 69 | "--api_key", 70 | type=str, 71 | default=None, 72 | help="API key for the LLM. If not provided, the environment variable may be used." 73 | ) 74 | parser.add_argument( 75 | "--gcp_region", 76 | type=str, 77 | default=None, 78 | help="GCP Region for connecting to vertex AI service account." 79 | ) 80 | parser.add_argument( 81 | "--gcp_project_id", 82 | type=str, 83 | default=None, 84 | help="GCP project id for connecting to vertex AI service account." 
85 | ) 86 | parser.add_argument( 87 | "--gcp_application_credentials", 88 | type=str, 89 | default=None, 90 | help="Path to JSON credentials for GCP service account" 91 | ) 92 | parser.add_argument( 93 | "--azure_openai_api_key", 94 | type=str, 95 | default=None, 96 | help="Value for AZURE_OPENAI_API_KEY if not set in env" 97 | ) 98 | parser.add_argument( 99 | "--azure_openai_endpoint", 100 | type=str, 101 | default=None, 102 | help="Value for AZURE_OPENAI_ENDPOINT if not set in env" 103 | ) 104 | parser.add_argument( 105 | "--azure_deployment_name", 106 | type=str, 107 | default=None, 108 | help="Name of your Azure deployment" 109 | ) 110 | parser.add_argument( 111 | "--azure_api_version", 112 | type=str, 113 | default="2023-12-01-preview", 114 | help="Azure API version" 115 | ) 116 | parser.add_argument( 117 | "--aws_access_key_id", 118 | type=str, 119 | help="AWS User Access Key" 120 | ) 121 | parser.add_argument( 122 | "--aws_secret_access_key", 123 | type=str, 124 | help="AWS User Secret Access Key" 125 | ) 126 | parser.add_argument( 127 | "--aws_region", 128 | type=str, 129 | default="us-east-1", 130 | help="Region for AWS Bedrock Instance" 131 | ) 132 | parser.add_argument( 133 | "--save_pdf", 134 | action='store_true', 135 | default=False, 136 | help="Save converted PDF files in the output directory" 137 | ) 138 | parser.add_argument( 139 | "--save_images", 140 | action='store_true', 141 | default=False, 142 | help="Save extracted images in a subfolder within the output directory named after the presentation" 143 | ) 144 | parser.add_argument( 145 | "--libreoffice_url", 146 | type=str, 147 | default=None, 148 | help="If provided, uses the Docker container's endpoint (e.g., http://localhost:2002) for PPT->PDF conversion." 
149 | ) 150 | 151 | # parse_args(None) falls back to sys.argv, and this also handles an empty arg list correctly 152 | args = parser.parse_args(input_args) 153 | return args 154 | 155 | def main(): 156 | # ---- 1) Parse arguments ---- 157 | args = parse_args() 158 | 159 | # ---- 2) Configure logging ---- 160 | logging.basicConfig( 161 | level=logging.INFO, 162 | format="%(asctime)s [%(levelname)s] %(name)s - %(message)s", 163 | handlers=[logging.StreamHandler(sys.stdout)] 164 | ) 165 | logger = logging.getLogger(__name__) 166 | 167 | # ---- 3) Read prompt once ---- 168 | base_prompt_file = Path(args.prompt_path) 169 | if not base_prompt_file.is_file(): 170 | logger.error(f"Prompt file not found at {base_prompt_file}") 171 | sys.exit(1) 172 | 173 | base_prompt = base_prompt_file.read_text(encoding="utf-8").strip() 174 | if args.instructions and args.instructions.lower() != "none provided": 175 | prompt = f"{base_prompt}\n\nAdditional instructions:\n{args.instructions}" 176 | else: 177 | prompt = base_prompt 178 | 179 | # ---- 4) Initialize model instance ---- 180 | try: 181 | if args.client == "gemini": 182 | # Using the new unified client for Gemini 183 | model_instance = GoogleUnifiedClient( 184 | api_key=args.api_key, 185 | model=args.model, 186 | use_vertex=False 187 | ) 188 | logger.info(f"Initialized Google GenAI Client (Gemini API) with model: {args.model}") 189 | elif args.client == "vertexai": 190 | # Using the new unified client for Vertex AI 191 | model_instance = GoogleUnifiedClient( 192 | credentials_path=args.gcp_application_credentials, 193 | project_id=args.gcp_project_id, 194 | region=args.gcp_region, 195 | model=args.model, 196 | use_vertex=True 197 | ) 198 | logger.info(f"Initialized Google GenAI Client (Vertex AI) for project: {args.gcp_project_id}") 199 | elif args.client == "openai": 200 | model_instance = OpenAIClient(api_key=args.api_key, model=args.model) 201 | logger.info(f"Initialized OpenAIClient with model: {args.model}") 202 | elif args.client == "anthropic": 203 | model_instance = 
AnthropicClient(api_key=args.api_key, model=args.model) 203 | logger.info(f"Initialized AnthropicClient with model: {args.model}") 204 | elif args.client == "azure": 205 | model_instance = AzureClient( 206 | api_key=args.azure_openai_api_key, 207 | endpoint=args.azure_openai_endpoint, 208 | deployment=args.azure_deployment_name, 209 | api_version=args.azure_api_version 210 | ) 211 | logger.info(f"Initialized AzureClient for deployment: {args.azure_deployment_name}") 212 | elif args.client == "aws": 213 | model_instance = AWSClient( 214 | access_key_id=args.aws_access_key_id, 215 | secret_access_key=args.aws_secret_access_key, 216 | region=args.aws_region, 217 | model=args.model 218 | ) 219 | logger.info(f"Initialized AWSClient in region: {args.aws_region} with model {args.model}") 220 | else: 221 | logger.error(f"Unsupported client specified: {args.client}") 222 | sys.exit(1) 223 | except Exception as e: 224 | logger.error(f"Failed to initialize model: {str(e)}") 225 | sys.exit(1) 226 | 237 | # ---- 5) Identify local vs. 
container-based conversion ---- 238 | if args.libreoffice_url: 239 | logger.info(f"Using Docker-based LibreOffice at: {args.libreoffice_url}") 240 | libreoffice_endpoint = args.libreoffice_url 241 | # We'll pass this URL into the processor so it knows to do remote conversion 242 | libreoffice_path = None 243 | else: 244 | # If no URL is provided, assume local path 245 | if args.libreoffice_path: 246 | libreoffice_path = Path(args.libreoffice_path) 247 | else: 248 | libreoffice_path = Path("libreoffice") 249 | libreoffice_endpoint = None 250 | 251 | input_path = Path(args.input_dir) 252 | output_dir = Path(args.output_dir) 253 | output_dir.mkdir(parents=True, exist_ok=True) 254 | 255 | # ---- 6) Process input path ---- 256 | results = process_input_path( 257 | input_path=input_path, 258 | output_dir=output_dir, 259 | libreoffice_path=libreoffice_path, 260 | libreoffice_endpoint=libreoffice_endpoint, 261 | model_instance=model_instance, 262 | rate_limit=args.rate_limit, 263 | prompt=prompt, 264 | save_pdf=args.save_pdf, 265 | save_images=args.save_images 266 | ) 267 | 268 | # ---- 7) Log Summary ---- 269 | successes = [res for res in results if len(res[1]) > 0] 270 | failures = [res for res in results if len(res[1]) == 0] 271 | 272 | if successes: 273 | logger.info(f"Successfully processed {len(successes)} PPT file(s).") 274 | if failures: 275 | logger.warning(f"Failed to process {len(failures)} PPT file(s).") 276 | 277 | 278 | if __name__ == "__main__": 279 | main() -------------------------------------------------------------------------------- /src/processor.py: -------------------------------------------------------------------------------- 1 | import time 2 | import logging 3 | import tempfile 4 | from pathlib import Path 5 | from typing import List, Tuple, Union 6 | from tqdm import tqdm 7 | import shutil 8 | 9 | from llm import LLMClient 10 | from converters.ppt_converter import convert_pptx_to_pdf 11 | from converters.pdf_converter import 
convert_pdf_to_images 12 | from converters.docker_converter import convert_pptx_via_docker 13 | from schemas.deck import DeckData, SlideData 14 | 15 | # Module-level logger 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | def process_single_file( 20 | ppt_file: Path, 21 | output_dir: Path, 22 | libreoffice_path: Path, 23 | model_instance: LLMClient, 24 | rate_limit: int, 25 | prompt: str, 26 | save_pdf: bool = False, 27 | save_images: bool = False 28 | ) -> Tuple[Path, List[Path]]: 29 | """ 30 | Process a single PowerPoint file: 31 | 1) Convert to PDF 32 | 2) Convert PDF to images 33 | 3) Send images to LLM 34 | 4) Save the JSON output 35 | 5) Optionally save PDF and images to output directory 36 | """ 37 | with tempfile.TemporaryDirectory() as temp_dir_str: 38 | temp_dir = Path(temp_dir_str) 39 | 40 | try: 41 | # 1) PPT -> PDF 42 | pdf_path = convert_pptx_to_pdf(ppt_file, libreoffice_path, temp_dir) 43 | logger.info(f"Successfully converted {ppt_file.name} to {pdf_path.name}") 44 | 45 | # 2) PDF -> Images 46 | image_paths = convert_pdf_to_images(pdf_path, temp_dir) 47 | if not image_paths: 48 | logger.error(f"No images were generated from {pdf_path.name}") 49 | return (ppt_file, []) 50 | 51 | # 3) Generate LLM content 52 | min_interval = 60.0 / rate_limit if rate_limit > 0 else 0 53 | last_call_time = 0.0 54 | 55 | slides_data = [] 56 | # Sort images by slide number (we know "slide_{page_num + 1}.png" format) 57 | image_paths.sort(key=lambda p: int(p.stem.split('_')[1])) 58 | 59 | # Initialize tqdm progress bar 60 | for idx, image_path in enumerate(tqdm(image_paths, desc=f"Processing slides for {ppt_file.name}", unit="slide"), start=1): 61 | # Rate-limit logic 62 | if min_interval > 0: 63 | current_time = time.time() 64 | time_since_last = current_time - last_call_time 65 | if time_since_last < min_interval: 66 | time.sleep(min_interval - time_since_last) 67 | last_call_time = time.time() 68 | 69 | try: 70 | response = 
model_instance.generate(prompt, image_path) 71 | slides_data.append(SlideData( 72 | number=idx, 73 | content=response 74 | )) 75 | except Exception as e: 76 | logger.error(f"Error generating content for slide {idx}: {str(e)}") 77 | slides_data.append(SlideData( 78 | number=idx, 79 | content="ERROR: Failed to process slide" 80 | )) 81 | 82 | logger.info(f"Successfully converted {ppt_file.name} to {len(slides_data)} slides.") 83 | 84 | # 4) Build pydantic model and save JSON 85 | deck_data = DeckData( 86 | deck=ppt_file.name, 87 | model=model_instance.model_name, 88 | slides=slides_data 89 | ) 90 | output_file = output_dir / f"{ppt_file.stem}.json" 91 | output_file.write_text(deck_data.model_dump_json(indent=2), encoding='utf-8') 92 | logger.info(f"Output written to {output_file}") 93 | 94 | # 5) Optionally save PDF 95 | if save_pdf: 96 | destination_pdf = output_dir / pdf_path.name 97 | shutil.copy2(pdf_path, destination_pdf) 98 | logger.info(f"Saved PDF to {destination_pdf}") 99 | 100 | # 6) Optionally save images 101 | if save_images: 102 | # Create a subfolder named after the PPT file 103 | images_subdir = output_dir / ppt_file.stem 104 | images_subdir.mkdir(parents=True, exist_ok=True) 105 | for img_path in image_paths: 106 | destination_img = images_subdir / img_path.name 107 | shutil.copy2(img_path, destination_img) 108 | logger.info(f"Saved images to {images_subdir}") 109 | 110 | return (ppt_file, image_paths) 111 | 112 | except Exception as ex: 113 | logger.error(f"Unexpected error while processing {ppt_file.name}: {str(ex)}") 114 | return (ppt_file, []) 115 | 116 | def process_input_path( 117 | input_path: Path, 118 | output_dir: Path, 119 | libreoffice_path: Union[Path, None], 120 | libreoffice_endpoint: Union[str, None], 121 | model_instance: LLMClient, 122 | rate_limit: int, 123 | prompt: str, 124 | save_pdf: bool = False, 125 | save_images: bool = False 126 | ) -> List[Tuple[Path, List[Path]]]: 127 | """ 128 | Process one or more PPT files from the 
specified path. 129 | Optionally save PDFs and images to the output directory. 130 | """ 131 | results = [] 132 | 133 | # Single file mode 134 | if input_path.is_file(): 135 | if input_path.suffix.lower() in ('.ppt', '.pptx'): 136 | res = process_single_file( 137 | ppt_file=input_path, 138 | output_dir=output_dir, 139 | libreoffice_path=libreoffice_path, 140 | libreoffice_endpoint=libreoffice_endpoint, 141 | model_instance=model_instance, 142 | rate_limit=rate_limit, 143 | prompt=prompt, 144 | save_pdf=save_pdf, 145 | save_images=save_images 146 | ) 147 | results.append(res) 148 | 149 | # Directory mode 150 | else: 151 | for ppt_file in input_path.glob('*.ppt*'): 152 | res = process_single_file( 153 | ppt_file=ppt_file, 154 | output_dir=output_dir, 155 | libreoffice_path=libreoffice_path, 156 | libreoffice_endpoint=libreoffice_endpoint, 157 | model_instance=model_instance, 158 | rate_limit=rate_limit, 159 | prompt=prompt, 160 | save_pdf=save_pdf, 161 | save_images=save_images 162 | ) 163 | results.append(res) 164 | 165 | return results 166 | 167 | 168 | def process_single_file( 169 | ppt_file: Path, 170 | output_dir: Path, 171 | libreoffice_path: Union[Path, None], 172 | libreoffice_endpoint: Union[str, None], 173 | model_instance: LLMClient, 174 | rate_limit: int, 175 | prompt: str, 176 | save_pdf: bool = False, 177 | save_images: bool = False 178 | ) -> Tuple[Path, List[Path]]: 179 | """ 180 | Process a single PowerPoint file: 181 | 1) Convert to PDF (either via local LibreOffice or Docker container) 182 | 2) Convert PDF to images 183 | 3) Send images to LLM 184 | 4) Save JSON output 185 | 5) Optionally save PDF and images 186 | """ 187 | with tempfile.TemporaryDirectory() as temp_dir_str: 188 | temp_dir = Path(temp_dir_str) 189 | 190 | try: 191 | # 1) PPT -> PDF 192 | if libreoffice_endpoint: 193 | # Docker-based conversion 194 | pdf_path = convert_pptx_via_docker( 195 | ppt_file, 196 | libreoffice_endpoint, 197 | temp_dir 198 | ) 199 | else: 200 | # Local-based 
conversion 201 | pdf_path = convert_pptx_to_pdf( 202 | input_file=ppt_file, 203 | libreoffice_path=libreoffice_path, 204 | temp_dir=temp_dir 205 | ) 206 | 207 | logger.info(f"Successfully converted {ppt_file.name} to {pdf_path.name}") 208 | 209 | # 2) PDF -> Images (local PyMuPDF) 210 | image_paths = convert_pdf_to_images(pdf_path, temp_dir) 211 | if not image_paths: 212 | logger.error(f"No images were generated from {pdf_path.name}") 213 | return (ppt_file, []) 214 | 215 | # 3) Generate LLM content 216 | slides_data = [] 217 | min_interval = 60.0 / rate_limit if rate_limit > 0 else 0 218 | last_call_time = 0.0 219 | 220 | # Sort images by slide number (assuming "slide_1.png", "slide_2.png", etc.) 221 | image_paths.sort(key=lambda p: int(p.stem.split('_')[1])) 222 | 223 | for idx, image_path in enumerate( 224 | tqdm(image_paths, desc=f"Processing slides for {ppt_file.name}", unit="slide"), start=1 225 | ): 226 | if min_interval > 0: 227 | current_time = time.time() 228 | time_since_last = current_time - last_call_time 229 | if time_since_last < min_interval: 230 | time.sleep(min_interval - time_since_last) 231 | last_call_time = time.time() 232 | 233 | try: 234 | response = model_instance.generate(prompt, image_path) 235 | slides_data.append(SlideData(number=idx, content=response)) 236 | except Exception as e: 237 | logger.error(f"Error generating content for slide {idx}: {str(e)}") 238 | slides_data.append(SlideData(number=idx, content="ERROR: Failed to process slide")) 239 | 240 | logger.info(f"Successfully converted {ppt_file.name} to {len(slides_data)} slides.") 241 | 242 | # 4) Build pydantic model and save JSON 243 | deck_data = DeckData( 244 | deck=ppt_file.name, 245 | model=model_instance.model_name, 246 | slides=slides_data 247 | ) 248 | output_file = output_dir / f"{ppt_file.stem}.json" 249 | output_file.write_text(deck_data.model_dump_json(indent=2), encoding='utf-8') 250 | logger.info(f"Output written to {output_file}") 251 | 252 | # 5) Optionally save 
PDF 253 | if save_pdf: 254 | destination_pdf = output_dir / pdf_path.name 255 | shutil.copy2(pdf_path, destination_pdf) 256 | logger.info(f"Saved PDF to {destination_pdf}") 257 | 258 | # 6) Optionally save images 259 | if save_images: 260 | images_subdir = output_dir / ppt_file.stem 261 | images_subdir.mkdir(parents=True, exist_ok=True) 262 | for img_path in image_paths: 263 | shutil.copy2(img_path, images_subdir / img_path.name) 264 | logger.info(f"Saved images to {images_subdir}") 265 | 266 | return (ppt_file, image_paths) 267 | 268 | except Exception as ex: 269 | logger.error(f"Unexpected error while processing {ppt_file.name}: {str(ex)}") 270 | return (ppt_file, []) -------------------------------------------------------------------------------- /src/prompt.txt: -------------------------------------------------------------------------------- 1 | You are an expert AI assistant tasked with converting PowerPoint slides into semantically rich text for downstream use. 2 | Carefully observe the content of each slide and accurately transcribe all text present. 3 | Provide detailed descriptions of any graphs, charts, figures, or other visual elements. 4 | It is essential to ensure accuracy and completeness in your text-based representation of the slide. 5 | Where possible, include interpretations of graphics, icons, and other non-text descriptors. 6 | 7 | Return only the text content of the slide, without any preamble, explanation, or unrelated information. 
-------------------------------------------------------------------------------- /src/schemas/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ALucek/ppt2desc/487b8578d09acff1c4a6121b573050df7aef3568/src/schemas/__init__.py -------------------------------------------------------------------------------- /src/schemas/deck.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import List 3 | 4 | class SlideData(BaseModel): 5 | number: int 6 | content: str 7 | 8 | class DeckData(BaseModel): 9 | deck: str 10 | model: str 11 | slides: List[SlideData] 12 | --------------------------------------------------------------------------------
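A note on the rate-limit pacing in `src/processor.py`: `process_single_file` derives `min_interval = 60.0 / rate_limit` and sleeps before each LLM call so that at most `rate_limit` calls happen per minute. The same idea can be factored into a standalone helper; the sketch below is illustrative only — the `paced_calls` name and the injectable `clock`/`sleep` parameters are assumptions for testability, not part of this repository:

```python
import time
from typing import Callable, Iterable, List


def paced_calls(
    items: Iterable,
    rate_limit: int,
    call: Callable,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Apply `call` to each item, spacing invocations so at most
    `rate_limit` calls happen per minute (0 disables pacing).

    `clock` and `sleep` are injectable (hypothetical parameters, not in
    the repo's version) so the pacing can be tested without real waits.
    """
    min_interval = 60.0 / rate_limit if rate_limit > 0 else 0.0
    last_call_time = 0.0  # epoch seconds of the previous call
    results = []
    for item in items:
        if min_interval > 0:
            # Sleep off whatever remains of the minimum gap
            elapsed = clock() - last_call_time
            if elapsed < min_interval:
                sleep(min_interval - elapsed)
            last_call_time = clock()
        results.append(call(item))
    return results
```

With `rate_limit=30`, `min_interval` works out to 2.0 seconds between slides, matching the per-slide throttling loop in `process_single_file`.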