├── .gitignore ├── LICENSE ├── README.md ├── docker-compose.yml ├── ppt2desc_icon.png ├── requirements.txt └── src ├── converters ├── __init__.py ├── docker_converter.py ├── exceptions.py ├── pdf_converter.py └── ppt_converter.py ├── libreoffice_docker ├── Dockerfile ├── app.py └── requirements.txt ├── llm ├── __init__.py ├── anthropic.py ├── aws.py ├── azure.py ├── base.py ├── deprecated │ ├── gemini.py │ └── vertex.py ├── google_unified.py └── openai.py ├── main.py ├── processor.py ├── prompt.txt └── schemas ├── __init__.py └── deck.py /.gitignore: -------------------------------------------------------------------------------- 1 | ppt2desc_venv 2 | test_files 3 | .env 4 | __pycache__/ 5 | .DS_Store 6 | ppts/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 Adam Łucek 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # ppt2desc 4 | 5 | Convert PowerPoint presentations into semantically rich text using Vision Language Models. 6 | 7 | ## Overview 8 | 9 | ppt2desc is a command-line tool that converts PowerPoint presentations into detailed textual descriptions. PowerPoint presentations are an inherently visual medium that often convey complex ideas through a combination of text, graphics, charts, and other visual layouts. This tool uses vision language models to not only transcribe the text content but also interpret and describe the visual elements and their relationships, capturing the full semantic meaning of each slide in a machine-readable format. 10 | 11 | ## Features 12 | 13 | - Convert PPT/PPTX files to semantic descriptions 14 | - Process individual files or entire directories 15 | - Support for interpreting visual elements (charts, graphs, figures) 16 | - Rate limiting for API calls 17 | - Customizable prompts and instructions 18 | - JSON output format for easy integration 19 | 20 | **Current Model Provider Support** 21 | - Gemini models via Google Gemini API 22 | - GPT models via OpenAI API 23 | - Claude models via Anthropic API 24 | - Gemini models via Google Cloud Platform Vertex AI 25 | - GPT models via Microsoft Azure AI Foundry Deployments 26 | - Nova & Claude models via Amazon Web Services' Amazon Bedrock 27 | 28 | ## Prerequisites 29 | 30 | - Python 3.9 or higher 31 | - LibreOffice (for PPT/PPTX to PDF conversion) 32 | - Option 1: Install LibreOffice locally.
33 | - Option 2: Use the provided Docker container for LibreOffice. 34 | - API credentials for a supported vision LLM provider 35 | 36 | ## Installation 37 | 38 | 1. Clone the repository: 39 | ```bash 40 | git clone https://github.com/ALucek/ppt2desc.git 41 | cd ppt2desc 42 | ``` 43 | 44 | 2. Install LibreOffice 45 | 46 | LibreOffice is a critical dependency for this tool, as it handles the headless conversion of PowerPoint files to PDF format. 47 | 48 | **Option 1: Local Installation** 49 | 50 | **Linux:** 51 | ```bash 52 | sudo apt install libreoffice 53 | ``` 54 | 55 | **macOS:** 56 | ```bash 57 | brew install libreoffice 58 | ``` 59 | 60 | **Windows:** 61 | Download and run the installer from [LibreOffice's Official Website](https://www.libreoffice.org/download/download/) 62 | 63 | **Option 2: Docker-based Installation** 64 | 65 | a. Ensure you have [Docker](https://www.docker.com/) installed on your system 66 | b. Run the following command: 67 | ```bash 68 | docker compose up -d 69 | ``` 70 | 71 | This command will build the Docker image based on the provided [Dockerfile](./src/libreoffice_docker/) and start the container in detached mode. The LibreOffice conversion service will be accessible at `http://localhost:2002`. Stop it with `docker compose down`. 72 | 73 | 3. Create and activate a virtual environment: 74 | ```bash 75 | python -m venv ppt2desc_venv 76 | source ppt2desc_venv/bin/activate # On Windows: ppt2desc_venv\Scripts\activate 77 | ``` 78 | 79 | 4.
Install dependencies: 80 | ```bash 81 | pip install -r requirements.txt 82 | ``` 83 | 84 | ## Usage 85 | 86 | Basic usage with Gemini API: 87 | ```bash 88 | python src/main.py \ 89 | --input_dir /path/to/presentations \ 90 | --output_dir /path/to/output \ 91 | --libreoffice_path /path/to/soffice \ 92 | --client gemini \ 93 | --api_key YOUR_GEMINI_API_KEY 94 | ``` 95 | 96 | ### Command Line Arguments 97 | 98 | General Arguments: 99 | - `--input_dir`: Path to input directory or PPT file (required) 100 | - `--output_dir`: Output directory path (required) 101 | - `--client`: LLM client to use: 'gemini', 'vertexai', 'anthropic', 'azure', 'aws', or 'openai' (required) 102 | - `--model`: Model to use (default: "gemini-1.5-flash") 103 | - `--instructions`: Additional instructions for the model 104 | - `--libreoffice_path`: Path to LibreOffice installation 105 | - `--libreoffice_url`: URL of the Docker-based LibreOffice service (the provided container listens at http://localhost:2002) 106 | - `--rate_limit`: API calls per minute (default: 60) 107 | - `--prompt_path`: Custom prompt file path 108 | - `--api_key`: Model provider API key (if not set via environment variable) 109 | - `--save_pdf`: Include to save the converted PDF in your output folder 110 | - `--save_images`: Include to save the individual slide images in your output folder 111 | 112 | Vertex AI Specific Arguments: 113 | - `--gcp_project_id`: GCP project ID for Vertex AI service account 114 | - `--gcp_region`: GCP region for Vertex AI service (e.g., us-central1) 115 | - `--gcp_application_credentials`: Path to GCP service account JSON credentials file 116 | 117 | Azure AI Foundry Specific Arguments: 118 | - `--azure_openai_api_key`: Azure AI Foundry Resource Key 1 or Key 2 119 | - `--azure_openai_endpoint`: Azure AI Foundry deployment service endpoint link 120 | - `--azure_deployment_name`: The name of your model deployment 121 | - `--azure_api_version`: Azure API version (default: "2023-12-01-preview") 122 | 123 | AWS Amazon
Bedrock Specific Arguments: 124 | - `--aws_access_key_id`: Bedrock Account Access Key 125 | - `--aws_secret_access_key`: Bedrock Account Secret Access Key 126 | - `--aws_region`: AWS Bedrock Region 127 | 128 | ### Example Commands 129 | 130 | Using Gemini API: 131 | ```bash 132 | python src/main.py \ 133 | --input_dir ./presentations \ 134 | --output_dir ./output \ 135 | --libreoffice_path ./soffice \ 136 | --client gemini \ 137 | --model gemini-1.5-flash \ 138 | --rate_limit 30 \ 139 | --instructions "Focus on extracting numerical data from charts and graphs" 140 | ``` 141 | 142 | Using Vertex AI: 143 | ```bash 144 | python src/main.py \ 145 | --input_dir ./presentations \ 146 | --output_dir ./output \ 147 | --client vertexai \ 148 | --libreoffice_path ./soffice \ 149 | --gcp_project_id my-project-123 \ 150 | --gcp_region us-central1 \ 151 | --gcp_application_credentials ./service-account.json \ 152 | --model gemini-1.5-pro \ 153 | --instructions "Extract detailed information from technical diagrams" 154 | ``` 155 | Using Azure AI Foundry: 156 | ```bash 157 | python src/main.py \ 158 | --input_dir ./presentations \ 159 | --output_dir ./output \ 160 | --libreoffice_path ./soffice \ 161 | --client azure \ 162 | --azure_openai_api_key 123456790ABCDEFG \ 163 | --azure_openai_endpoint 'https://example-endpoint-001.openai.azure.com/' \ 164 | --azure_deployment_name gpt-4o \ 165 | --azure_api_version 2023-12-01-preview \ 166 | --rate_limit 60 167 | ``` 168 | 169 | Using AWS Amazon Bedrock: 170 | ```bash 171 | python src/main.py \ 172 | --input_dir ./presentations \ 173 | --output_dir ./output \ 174 | --libreoffice_path ./soffice \ 175 | --client aws \ 176 | --model us.amazon.nova-lite-v1:0 \ 177 | --aws_access_key_id 123456790ABCDEFG \ 178 | --aws_secret_access_key 123456790ABCDEFG \ 179 | --aws_region us-east-1 \ 180 | --rate_limit 60 181 | ``` 182 | 183 | ## Output Format 184 | 185 | The tool generates JSON files with the following structure: 186 | 187 |
```json 188 | { 189 | "deck": "presentation.pptx", 190 | "model": "model-name", 191 | "slides": [ 192 | { 193 | "number": 1, 194 | "content": "Detailed description of slide content..." 195 | }, 196 | // ... more slides 197 | ] 198 | } 199 | ``` 200 | 201 | ## Advanced Usage 202 | 203 | ### Using Docker-based LibreOffice Conversion 204 | 205 | When using the Docker container for LibreOffice, you can use the `--libreoffice_url` argument to direct the conversion process to the container's API endpoint, rather than a local installation. 206 | 207 | ```bash 208 | python src/main.py \ 209 | --input_dir ./presentations \ 210 | --output_dir ./output \ 211 | --libreoffice_url http://localhost:2002 \ 212 | --client vertexai \ 213 | --model gemini-1.5-pro \ 214 | --gcp_project_id my-project-123 \ 215 | --gcp_region us-central1 \ 216 | --gcp_application_credentials ./service-account.json \ 217 | --rate_limit 30 \ 218 | --instructions "Extract detailed information from technical diagrams" \ 219 | --save_pdf \ 220 | --save_images 221 | ``` 222 | 223 | You should use either `--libreoffice_url` or `--libreoffice_path` but not both. 224 | 225 | ### Custom Prompts 226 | 227 | You can modify the base prompt by editing `src/prompt.txt` or providing additional instructions via the command line: 228 | 229 | ```bash 230 | python src/main.py \ 231 | --input_dir ./presentations \ 232 | --output_dir ./output \ 233 | --libreoffice_path ./soffice \ 234 | --instructions "Include mathematical equations and formulas in LaTeX format" 235 | ``` 236 | 237 | ### Authentication 238 | 239 | For Consumer APIs: 240 | - Set your API key via the `--api_key` argument or through your respective provider's environment variables 241 | 242 | For Vertex AI: 243 | 1. Create a service account in your GCP project IAM 244 | 2. Grant necessary permissions (typically, "Vertex AI User" role) 245 | 3. Download the service account JSON key file 246 | 4. 
Provide the credentials file path via `--gcp_application_credentials` 247 | 248 | For Azure AI Foundry: 249 | 1. Create an Azure OpenAI Resource 250 | 2. Navigate to Azure AI Foundry and choose the subscription and Azure OpenAI Resource to work with 251 | 3. Under Management, select Deployments 252 | 4. Select Create new deployment and configure it with your vision LLM 253 | 5. Provide the deployment name, API key, endpoint, and API version via `--azure_deployment_name`, `--azure_openai_api_key`, `--azure_openai_endpoint`, and `--azure_api_version` 254 | 255 | For AWS Bedrock: 256 | 1. Request access to serverless model deployments in Amazon Bedrock's model catalog 257 | 2. Create a user in your AWS IAM 258 | 3. Enable Amazon Bedrock access policies for your user 259 | 4. Save the user's access key and secret access key credentials 260 | 5. Provide the user's credentials via `--aws_access_key_id` and `--aws_secret_access_key` 261 | 262 | ## Contributing 263 | 264 | Contributions are welcome! Please feel free to submit a Pull Request. 265 | 266 | **Todo** 267 | - Handling Google's new genai SDK for a unified Gemini/Vertex experience 268 | - Better Docker Setup 269 | - AWS Llama Vision Support Confirmation 270 | - Combination of JSON files across multiple PPTs 271 | - Dynamic font handling (i.e., conversion struggles when a font used by the PPT is not installed on the machine) 272 | 273 | ## License 274 | 275 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
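
## Example: Working with the Output

The JSON written to the output directory (see the Output Format section) is straightforward to post-process. Below is a minimal, illustrative sketch — the helper names and the `*.json` glob are assumptions, not part of the tool — showing how one or more deck files could be loaded and flattened into plain text, which is also a starting point for the "Combination of JSON files across multiple ppts" item on the Todo list:

```python
import json
from pathlib import Path


def load_deck(path: Path) -> dict:
    """Parse one ppt2desc output file into a dict."""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def slides_as_text(deck: dict) -> str:
    """Flatten a deck's slide descriptions into a single text document."""
    header = f"# {deck['deck']} (described by {deck['model']})"
    body = "\n\n".join(
        f"## Slide {slide['number']}\n{slide['content']}"
        for slide in deck["slides"]
    )
    return f"{header}\n\n{body}"


def combine_decks(output_dir: Path) -> list[dict]:
    """Collect every deck JSON found in an output directory."""
    return [load_deck(p) for p in sorted(output_dir.glob("*.json"))]
```

The key names mirror the structure shown in the Output Format section; adapt them if the schema changes.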
276 | 277 | ## Acknowledgments 278 | 279 | - [LibreOffice](https://www.libreoffice.org/) for PPT/PPTX conversion 280 | - [PyMuPDF](https://pymupdf.readthedocs.io/en/latest/) for PDF processing -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | libreoffice-converter: 3 | build: 4 | context: ./src/libreoffice_docker 5 | dockerfile: Dockerfile 6 | ports: 7 | - "2002:2002" 8 | restart: unless-stopped 9 | # Healthcheck 10 | healthcheck: 11 | test: ["CMD", "curl", "-f", "http://localhost:2002/health"] 12 | interval: 30s 13 | timeout: 10s 14 | retries: 3 -------------------------------------------------------------------------------- /ppt2desc_icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ALucek/ppt2desc/487b8578d09acff1c4a6121b573050df7aef3568/ppt2desc_icon.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | annotated-types==0.7.0 2 | anthropic==0.42.0 3 | anyio==4.7.0 4 | boto3==1.35.91 5 | botocore==1.35.91 6 | cachetools==5.5.0 7 | certifi==2024.12.14 8 | charset-normalizer==3.4.1 9 | distro==1.9.0 10 | docstring_parser==0.16 11 | google-ai-generativelanguage==0.6.10 12 | google-api-core==2.24.0 13 | google-api-python-client==2.156.0 14 | google-auth==2.37.0 15 | google-auth-httplib2==0.2.0 16 | google-cloud-aiplatform==1.75.0 17 | google-cloud-bigquery==3.27.0 18 | google-cloud-core==2.4.1 19 | google-cloud-resource-manager==1.14.0 20 | google-cloud-storage==2.19.0 21 | google-crc32c==1.6.0 22 | google-generativeai==0.8.3 23 | google-resumable-media==2.7.2 24 | googleapis-common-protos==1.66.0 25 | grpc-google-iam-v1==0.13.1 26 | grpcio==1.68.1 27 | grpcio-status==1.68.1 28 | h11==0.14.0 29 | httpcore==1.0.7 
30 | httplib2==0.22.0 31 | httpx==0.28.1 32 | idna==3.10 33 | jiter==0.8.2 34 | jmespath==1.0.1 35 | numpy==2.2.1 36 | openai==1.58.1 37 | packaging==24.2 38 | pillow==11.0.0 39 | proto-plus==1.25.0 40 | protobuf==5.29.2 41 | pyasn1==0.6.1 42 | pyasn1_modules==0.4.1 43 | pydantic==2.10.4 44 | pydantic_core==2.27.2 45 | PyMuPDF==1.25.1 46 | pyparsing==3.2.1 47 | python-dateutil==2.9.0.post0 48 | requests==2.32.3 49 | rsa==4.9 50 | s3transfer==0.10.4 51 | shapely==2.0.6 52 | six==1.17.0 53 | sniffio==1.3.1 54 | tqdm==4.67.1 55 | typing_extensions==4.12.2 56 | uritemplate==4.1.1 57 | urllib3==2.3.0 58 | google-genai==1.3.0 -------------------------------------------------------------------------------- /src/converters/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ALucek/ppt2desc/487b8578d09acff1c4a6121b573050df7aef3568/src/converters/__init__.py -------------------------------------------------------------------------------- /src/converters/docker_converter.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from pathlib import Path 3 | import logging 4 | 5 | from .exceptions import ConversionError 6 | 7 | logger = logging.getLogger(__name__) 8 | 9 | def convert_pptx_via_docker( 10 | ppt_file: Path, 11 | container_url: str, 12 | temp_dir: Path 13 | ) -> Path: 14 | """ 15 | Convert a PPT/PPTX file to PDF by sending it to the Docker container at container_url. 
16 | e.g., container_url="http://localhost:2002" 17 | 18 | :param ppt_file: Path to the local PPT/PPTX file 19 | :param container_url: Base URL of the container (without trailing slash) 20 | :param temp_dir: Directory to store the resulting PDF 21 | :return: Path to the newly-created PDF file 22 | :raises ConversionError: if the container fails or file can't be saved 23 | """ 24 | endpoint = f"{container_url.rstrip('/')}/convert/ppt-to-pdf" 25 | logger.info(f"Calling Docker LibreOffice at {endpoint} for {ppt_file}") 26 | 27 | # 1) Prepare the file for upload 28 | files = { 29 | "file": (ppt_file.name, ppt_file.open("rb"), "application/vnd.ms-powerpoint") 30 | } 31 | 32 | try: 33 | # 2) Make a POST request 34 | resp = requests.post(endpoint, files=files, timeout=300) 35 | resp.raise_for_status() 36 | 37 | # 3) Save the returned PDF to temp_dir 38 | pdf_filename = ppt_file.stem + ".pdf" 39 | pdf_path = temp_dir / pdf_filename 40 | with open(pdf_path, "wb") as f: 41 | for chunk in resp.iter_content(chunk_size=8192): 42 | f.write(chunk) 43 | 44 | if not pdf_path.exists(): 45 | raise ConversionError("PDF file not created after Docker-based conversion.") 46 | logger.info(f"Created PDF {pdf_path} via Docker container.") 47 | return pdf_path 48 | 49 | except Exception as e: 50 | logger.error(f"Error converting {ppt_file} via Docker: {e}") 51 | raise ConversionError(f"Error converting {ppt_file}: {str(e)}") 52 | -------------------------------------------------------------------------------- /src/converters/exceptions.py: -------------------------------------------------------------------------------- 1 | class LibreOfficeNotFoundError(Exception): 2 | """Raised when LibreOffice is not found at the given path.""" 3 | pass 4 | 5 | class ConversionError(Exception): 6 | """General error for file conversion issues.""" 7 | pass -------------------------------------------------------------------------------- /src/converters/pdf_converter.py: 
-------------------------------------------------------------------------------- 1 | import logging 2 | import fitz 3 | from PIL import Image 4 | from pathlib import Path 5 | from typing import List 6 | from .exceptions import ConversionError 7 | 8 | logger = logging.getLogger(__name__) 9 | 10 | def convert_pdf_to_images(pdf_path: Path, temp_dir: Path) -> List[Path]: 11 | """ 12 | Convert a PDF file to a series of PNG images. 13 | 14 | :param pdf_path: Path to the input PDF file 15 | :param temp_dir: Path to temporary directory for storing images 16 | :return: List of paths to generated image files 17 | :raises ConversionError: if the conversion to images fails 18 | """ 19 | target_size = (1920, 1080) 20 | image_paths = [] 21 | 22 | try: 23 | images_dir = temp_dir / 'images' 24 | images_dir.mkdir(exist_ok=True) 25 | 26 | doc = fitz.open(pdf_path) 27 | 28 | for page_num in range(len(doc)): 29 | page = doc.load_page(page_num) 30 | page_rect = page.rect 31 | 32 | zoom_x = target_size[0] / page_rect.width 33 | zoom_y = target_size[1] / page_rect.height 34 | zoom = min(zoom_x, zoom_y) 35 | 36 | try: 37 | pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False) 38 | img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples) 39 | 40 | # Create background (white) and paste the rendered image 41 | new_img = Image.new("RGB", target_size, (255, 255, 255)) 42 | paste_x = (target_size[0] - img.width) // 2 43 | paste_y = (target_size[1] - img.height) // 2 44 | new_img.paste(img, (paste_x, paste_y)) 45 | 46 | # Save image 47 | image_path = images_dir / f"slide_{page_num + 1}.png" 48 | new_img.save(image_path) 49 | image_paths.append(image_path) 50 | 51 | except Exception as inner_exc: 52 | logger.error(f"Error processing page {page_num + 1}: {str(inner_exc)}") 53 | continue 54 | 55 | doc.close() 56 | return image_paths 57 | 58 | except Exception as e: 59 | logger.error(f"Error converting PDF to images: {str(e)}") 60 | raise ConversionError(f"Error converting PDF 
to images: {str(e)}") 61 | -------------------------------------------------------------------------------- /src/converters/ppt_converter.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import subprocess 3 | from pathlib import Path 4 | from .exceptions import LibreOfficeNotFoundError, ConversionError 5 | 6 | logger = logging.getLogger(__name__) 7 | 8 | def convert_pptx_to_pdf(input_file: Path, libreoffice_path: Path, temp_dir: Path) -> Path: 9 | """ 10 | Convert a PowerPoint file to PDF using LibreOffice. 11 | 12 | :param input_file: Path to the input PowerPoint file 13 | :param libreoffice_path: Path to LibreOffice executable 14 | :param temp_dir: Temporary directory to store the PDF 15 | :return: Path to the output PDF file if successful 16 | :raises LibreOfficeNotFoundError: if LibreOffice is not found 17 | :raises ConversionError: if the conversion fails 18 | """ 19 | if not libreoffice_path.exists(): 20 | logger.error(f"LibreOffice not found at {libreoffice_path}") 21 | raise LibreOfficeNotFoundError(f"LibreOffice not found at {libreoffice_path}") 22 | 23 | try: 24 | cmd = [ 25 | str(libreoffice_path), 26 | '--headless', 27 | '--convert-to', 'pdf', 28 | '--outdir', str(temp_dir), 29 | str(input_file) 30 | ] 31 | 32 | result = subprocess.run(cmd, check=True, capture_output=True, text=True) 33 | logger.debug(f"LibreOffice conversion output: {result.stdout}") 34 | 35 | # The PDF file name should match the PPTX name, but with ".pdf" 36 | pdf_name = f"{input_file.stem}.pdf" 37 | pdf_path = temp_dir / pdf_name 38 | 39 | if pdf_path.exists(): 40 | return pdf_path 41 | else: 42 | logger.error(f"Expected PDF not created at {pdf_path}") 43 | logger.error(f"LibreOffice error: {result.stderr}") 44 | raise ConversionError(f"Failed to create PDF at {pdf_path}") 45 | 46 | except subprocess.CalledProcessError as e: 47 | logger.error(f"Error converting {input_file}: {e.stderr}") 48 | raise 
ConversionError(f"Subprocess conversion error: {e.stderr}") 49 | except Exception as e: 50 | logger.error(f"Unexpected error converting {input_file}: {str(e)}") 51 | raise ConversionError(f"Unexpected error: {str(e)}") 52 | -------------------------------------------------------------------------------- /src/libreoffice_docker/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.11-slim 2 | 3 | ENV PYTHONDONTWRITEBYTECODE=1 \ 4 | PYTHONUNBUFFERED=1 5 | 6 | RUN apt-get update && apt-get install -y --no-install-recommends \ 7 | libreoffice \ 8 | fonts-dejavu \ 9 | fonts-liberation \ 10 | fonts-noto \ 11 | fonts-noto-color-emoji \ 12 | curl \ 13 | fontconfig \ 14 | && apt-get clean \ 15 | && rm -rf /var/lib/apt/lists/* 16 | 17 | RUN fc-cache -f -v 18 | 19 | RUN useradd --create-home libreoffice 20 | 21 | WORKDIR /app 22 | 23 | COPY requirements.txt . 24 | RUN pip install --no-cache-dir -r requirements.txt 25 | 26 | COPY app.py . 27 | 28 | RUN chown -R libreoffice:libreoffice /app 29 | 30 | USER libreoffice 31 | 32 | EXPOSE 2002 33 | 34 | CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "2002"] 35 | -------------------------------------------------------------------------------- /src/libreoffice_docker/app.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, UploadFile, HTTPException 2 | from fastapi.responses import FileResponse 3 | import subprocess 4 | from pathlib import Path 5 | import tempfile 6 | import shutil 7 | import logging 8 | 9 | app = FastAPI(title="Document Conversion Service") 10 | 11 | # Configure logging 12 | logging.basicConfig( 13 | level=logging.INFO, 14 | format="%(asctime)s [%(levelname)s] %(message)s", 15 | handlers=[ 16 | logging.StreamHandler() 17 | ] 18 | ) 19 | logger = logging.getLogger(__name__) 20 | 21 | LIBREOFFICE_PATH = Path("/usr/bin/libreoffice") 22 | 23 | @app.get("/health") 24 | async def 
health_check(): 25 | """Simple health check endpoint""" 26 | return {"status": "healthy"} 27 | 28 | @app.post("/convert/ppt-to-pdf") 29 | async def convert_pptx_to_pdf(file: UploadFile): 30 | """Convert uploaded PPTX file to PDF""" 31 | logger.info(f"Received file: {file.filename}") 32 | 33 | # Validate file extension 34 | if not file.filename.lower().endswith(('.pptx', '.ppt')): 35 | logger.error("Invalid file extension") 36 | raise HTTPException(status_code=400, detail="File must be a .pptx or .ppt") 37 | 38 | # Create temp dir but don't use context manager 39 | temp_dir = tempfile.mkdtemp() 40 | temp_dir_path = Path(temp_dir) 41 | input_path = temp_dir_path / file.filename 42 | 43 | try: 44 | # Save uploaded file 45 | with input_path.open("wb") as f: 46 | shutil.copyfileobj(file.file, f) 47 | logger.info(f"Saved uploaded file to: {input_path}") 48 | 49 | # Run LibreOffice conversion 50 | cmd = [ 51 | str(LIBREOFFICE_PATH), 52 | '--headless', 53 | '--convert-to', 'pdf', 54 | '--outdir', str(temp_dir_path), 55 | str(input_path) 56 | ] 57 | logger.info(f"Running command: {' '.join(cmd)}") 58 | 59 | result = subprocess.run( 60 | cmd, 61 | check=True, 62 | capture_output=True, 63 | text=True 64 | ) 65 | 66 | logger.info(f"LibreOffice stdout: {result.stdout}") 67 | if result.stderr: 68 | logger.warning(f"LibreOffice stderr: {result.stderr}") 69 | 70 | # Check for output file 71 | pdf_path = temp_dir_path / f"{input_path.stem}.pdf" 72 | if not pdf_path.exists(): 73 | logger.error(f"PDF not created. 
LibreOffice output: {result.stderr}") 74 | raise HTTPException(status_code=500, detail="PDF conversion failed") 75 | 76 | logger.info(f"Conversion successful: {pdf_path}") 77 | 78 | async def cleanup_background(): 79 | """Async cleanup function""" 80 | shutil.rmtree(temp_dir, ignore_errors=True) 81 | 82 | response = FileResponse( 83 | path=pdf_path, 84 | media_type='application/pdf', 85 | filename=pdf_path.name 86 | ) 87 | response.background = cleanup_background 88 | 89 | return response 90 | 91 | except Exception as e: 92 | # Clean up temp dir in case of error 93 | shutil.rmtree(temp_dir, ignore_errors=True) 94 | logger.exception("Error during conversion") 95 | raise HTTPException(status_code=500, detail=str(e)) from e 96 | -------------------------------------------------------------------------------- /src/libreoffice_docker/requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi==0.104.1 2 | python-multipart==0.0.6 3 | uvicorn==0.24.0 4 | -------------------------------------------------------------------------------- /src/llm/__init__.py: -------------------------------------------------------------------------------- 1 | from .base import LLMClient 2 | from .anthropic import AnthropicClient 3 | from .google_unified import GoogleUnifiedClient 4 | from .openai import OpenAIClient 5 | from .azure import AzureClient 6 | from .aws import AWSClient 7 | 8 | __all__ = [ 9 | "LLMClient", 10 | "AnthropicClient", 11 | "GoogleUnifiedClient", 12 | "OpenAIClient", 13 | "AzureClient", 14 | "AWSClient" 15 | ] -------------------------------------------------------------------------------- /src/llm/anthropic.py: -------------------------------------------------------------------------------- 1 | import os 2 | import base64 3 | from pathlib import Path 4 | from typing import Optional, Union 5 | 6 | import anthropic 7 | 8 | class AnthropicClient: 9 | """ 10 | A client wrapper around Anthropic's API for image + prompt 
generation. 11 | 12 | Usage: 13 | client = AnthropicClient(api_key="YOUR_KEY", model="claude-3-5-sonnet-latest") 14 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 15 | """ 16 | 17 | def __init__(self, api_key: Optional[str] = None, model: Optional[str] = None) -> None: 18 | """ 19 | Initialize the Anthropic client with API key and model name. 20 | 21 | :param api_key: Optional API key string. If not provided, 22 | checks the ANTHROPIC_API_KEY environment variable. 23 | :param model: The name of the generative model to use (e.g. "claude-3-sonnet-20240229"). 24 | :raises ValueError: If no API key is found or model is None. 25 | """ 26 | self.api_key = api_key or os.environ.get("ANTHROPIC_API_KEY") 27 | if not self.api_key: 28 | raise ValueError( 29 | "API key must be provided or set via ANTHROPIC_API_KEY environment variable." 30 | ) 31 | 32 | if model is None: 33 | raise ValueError("The 'model' argument is required and cannot be None.") 34 | 35 | self.client = anthropic.Anthropic(api_key=self.api_key) 36 | self.model_name = model 37 | 38 | def _encode_image(self, image_path: Union[str, Path]) -> str: 39 | """ 40 | Encode an image file to base64 string. 41 | 42 | :param image_path: Path to the image file 43 | :return: Base64 encoded string of the image 44 | """ 45 | with open(image_path, "rb") as image_file: 46 | return base64.b64encode(image_file.read()).decode("utf-8") 47 | 48 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 49 | """ 50 | Generate content using the Anthropic model with text + image as input. 51 | 52 | :param prompt: A textual prompt to provide to the model. 53 | :param image_path: File path (string or Path) to an image to be included in the request. 54 | :return: The generated response text from the model. 55 | :raises FileNotFoundError: If the specified image_path does not exist. 56 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
57 | """ 58 | # Ensure the image path exists 59 | image_path_obj = Path(image_path) 60 | if not image_path_obj.is_file(): 61 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 62 | 63 | try: 64 | # Encode the image to base64 65 | base64_image = self._encode_image(image_path_obj) 66 | 67 | # Create the messages request 68 | response = self.client.messages.create( 69 | model=self.model_name, 70 | max_tokens=8192, 71 | messages=[ 72 | { 73 | "role": "user", 74 | "content": [ 75 | { 76 | "type": "image", 77 | "source": { 78 | "type": "base64", 79 | "media_type": "image/png", 80 | "data": base64_image, 81 | }, 82 | }, 83 | { 84 | "type": "text", 85 | "text": prompt 86 | } 87 | ], 88 | } 89 | ], 90 | ) 91 | 92 | return response.content[0].text 93 | 94 | except Exception as e: 95 | raise Exception(f"Failed to generate content with Anthropic model: {e}") -------------------------------------------------------------------------------- /src/llm/aws.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import boto3 6 | class AWSClient: 7 | """ 8 | A client wrapper around AWS Bedrock Runtime API. 9 | 10 | Usage: 11 | client = AWSClient( 12 | access_key_id="YOUR_ACCESS_KEY", 13 | secret_access_key="YOUR_SECRET_KEY", 14 | region="us-east-1", 15 | model="amazon.nova-pro-v1:0" # or any Claude model 16 | ) 17 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 18 | """ 19 | 20 | def __init__( 21 | self, 22 | access_key_id: Optional[str] = None, 23 | secret_access_key: Optional[str] = None, 24 | region: Optional[str] = None, 25 | model: Optional[str] = None, 26 | ) -> None: 27 | """ 28 | Initialize the AWS Bedrock client. 29 | 30 | :param access_key_id: Optional AWS access key ID. If not provided, 31 | checks AWS_ACCESS_KEY_ID environment variable. 
32 | :param secret_access_key: Optional AWS secret access key. If not provided, 33 | checks AWS_SECRET_ACCESS_KEY environment variable. 34 | :param region: AWS region name. If not provided, checks AWS_REGION environment variable. 35 | :param model: The model ID (e.g., "amazon.nova-pro-v1:0" or any Claude model). 36 | :raises ValueError: If required parameters are missing. 37 | """ 38 | self.access_key_id = access_key_id or os.environ.get("AWS_ACCESS_KEY_ID") 39 | if not self.access_key_id: 40 | raise ValueError( 41 | "AWS access key ID must be provided or set via AWS_ACCESS_KEY_ID environment variable." 42 | ) 43 | 44 | self.secret_access_key = secret_access_key or os.environ.get("AWS_SECRET_ACCESS_KEY") 45 | if not self.secret_access_key: 46 | raise ValueError( 47 | "AWS secret access key must be provided or set via AWS_SECRET_ACCESS_KEY environment variable." 48 | ) 49 | 50 | self.region = region or os.environ.get("AWS_REGION") 51 | if not self.region: 52 | raise ValueError( 53 | "AWS region must be provided or set via AWS_REGION environment variable." 54 | ) 55 | 56 | if model is None: 57 | raise ValueError("The 'model' argument is required and cannot be None.") 58 | 59 | self.client = boto3.client( 60 | "bedrock-runtime", 61 | region_name=self.region, 62 | aws_access_key_id=self.access_key_id, 63 | aws_secret_access_key=self.secret_access_key 64 | ) 65 | self.model_id = model 66 | 67 | # For JSON metadata 68 | self.model_name = model 69 | 70 | def _encode_image(self, image_path: Union[str, Path]) -> bytes: 71 | """ 72 | Read an image file and return its raw bytes. 73 | 74 | :param image_path: Path to the image file 75 | :return: Raw bytes of the image (the Bedrock converse API expects raw bytes, not base64) 76 | """ 77 | with open(image_path, "rb") as image_file: 78 | return image_file.read() 79 | 80 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 81 | """ 82 | Generate content using the AWS model with text + image as input.
83 | 84 | :param prompt: A textual prompt to provide to the model. 85 | :param image_path: File path (string or Path) to an image to be included in the request. 86 | :return: The generated response text from the model. 87 | :raises FileNotFoundError: If the specified image_path does not exist. 88 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 89 | """ 90 | # Ensure the image path exists 91 | image_path_obj = Path(image_path) 92 | if not image_path_obj.is_file(): 93 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 94 | 95 | try: 96 | # Read the image as raw bytes 97 | image_bytes = self._encode_image(image_path_obj) 98 | 99 | # Create the messages list 100 | messages = [ 101 | { 102 | "role": "user", 103 | "content": [ 104 | { 105 | "text": prompt 106 | }, 107 | { 108 | "image": { 109 | "format": "png", 110 | "source": { 111 | "bytes": image_bytes 112 | } 113 | } 114 | } 115 | ] 116 | } 117 | ] 118 | 119 | # Invoke the model using converse 120 | response = self.client.converse( 121 | modelId=self.model_id, 122 | messages=messages 123 | ) 124 | 125 | return response["output"]["message"]["content"][0]["text"] 126 | 127 | except Exception as e: 128 | raise Exception(f"Failed to generate content with AWS model: {e}") -------------------------------------------------------------------------------- /src/llm/azure.py: -------------------------------------------------------------------------------- 1 | import os 2 | import base64 3 | from pathlib import Path 4 | from typing import Optional, Union 5 | 6 | from openai import AzureOpenAI 7 | 8 | 9 | class AzureClient: 10 | """ 11 | A client wrapper around Azure OpenAI's API for image + prompt generation. 
12 | 13 | Usage: 14 | client = AzureClient( 15 | api_key="YOUR_KEY", 16 | endpoint="YOUR_ENDPOINT", 17 | deployment="deployment_name", 18 | api_version="2023-12-01-preview" 19 | ) 20 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 21 | """ 22 | 23 | def __init__( 24 | self, 25 | api_key: Optional[str] = None, 26 | endpoint: Optional[str] = None, 27 | deployment: Optional[str] = None, 28 | api_version: Optional[str] = None, 29 | ) -> None: 30 | """ 31 | Initialize the Azure OpenAI client. 32 | 33 | :param api_key: Optional API key string. If not provided, 34 | checks the AZURE_OPENAI_API_KEY environment variable. 35 | :param endpoint: Azure OpenAI endpoint. If not provided, 36 | checks the AZURE_OPENAI_ENDPOINT environment variable. 37 | :param deployment: The deployment name for the model. 38 | :param api_version: Azure OpenAI API version (e.g., "2023-12-01-preview") 39 | :raises ValueError: If required parameters are missing. 40 | """ 41 | self.api_key = api_key or os.environ.get("AZURE_OPENAI_API_KEY") 42 | if not self.api_key: 43 | raise ValueError( 44 | "API key must be provided or set via AZURE_OPENAI_API_KEY environment variable." 45 | ) 46 | 47 | self.endpoint = endpoint or os.environ.get("AZURE_OPENAI_ENDPOINT") 48 | if not self.endpoint: 49 | raise ValueError( 50 | "Endpoint must be provided or set via AZURE_OPENAI_ENDPOINT environment variable." 
51 | ) 52 | 53 | if deployment is None: 54 | raise ValueError("The 'deployment' argument is required and cannot be None.") 55 | 56 | if api_version is None: 57 | raise ValueError("The 'api_version' argument is required and cannot be None.") 58 | 59 | self.client = AzureOpenAI( 60 | api_key=self.api_key, 61 | api_version=api_version, 62 | base_url=f"{self.endpoint}/openai/deployments/{deployment}" 63 | ) 64 | self.deployment = deployment 65 | 66 | # For JSON metadata 67 | self.model_name = deployment 68 | 69 | def _encode_image(self, image_path: Union[str, Path]) -> str: 70 | """ 71 | Encode an image file to base64 string. 72 | 73 | :param image_path: Path to the image file 74 | :return: Base64 encoded string of the image 75 | """ 76 | with open(image_path, "rb") as image_file: 77 | return base64.b64encode(image_file.read()).decode("utf-8") 78 | 79 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 80 | """ 81 | Generate content using the Azure OpenAI model with text + image as input. 82 | 83 | :param prompt: A textual prompt to provide to the model. 84 | :param image_path: File path (string or Path) to an image to be included in the request. 85 | :return: The generated response text from the model. 86 | :raises FileNotFoundError: If the specified image_path does not exist. 87 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
88 | """ 89 | # Ensure the image path exists 90 | image_path_obj = Path(image_path) 91 | if not image_path_obj.is_file(): 92 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 93 | 94 | try: 95 | # Encode the image to base64 96 | base64_image = self._encode_image(image_path_obj) 97 | 98 | # Create the API request 99 | response = self.client.chat.completions.create( 100 | model=self.deployment, 101 | messages=[ 102 | { 103 | "role": "user", 104 | "content": [ 105 | { 106 | "type": "text", 107 | "text": prompt, 108 | }, 109 | { 110 | "type": "image_url", 111 | "image_url": { 112 | "url": f"data:image/jpeg;base64,{base64_image}" 113 | }, 114 | }, 115 | ], 116 | } 117 | ], 118 | ) 119 | 120 | return response.choices[0].message.content 121 | 122 | except Exception as e: 123 | raise Exception(f"Failed to generate content with Azure OpenAI model: {e}") -------------------------------------------------------------------------------- /src/llm/base.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | from typing import Protocol, Union, runtime_checkable 3 | 4 | 5 | @runtime_checkable 6 | class LLMClient(Protocol): 7 | """ 8 | Protocol defining the interface for LLM clients. 9 | 10 | This protocol ensures all LLM clients implement a (semi) consistent interface for image-to-text generation. 11 | """ 12 | 13 | model_name: str 14 | 15 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 16 | """ 17 | Generate content using the LLM model with text + image as input. 18 | 19 | :param prompt: A textual prompt to provide to the model. 20 | :param image_path: File path (string or Path) to an image to be included in the request. 21 | :return: The generated response text from the model. 22 | :raises FileNotFoundError: If the specified image_path does not exist. 23 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
24 | """ 25 | pass -------------------------------------------------------------------------------- /src/llm/deprecated/gemini.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import PIL.Image 6 | import google.generativeai as genai 7 | 8 | 9 | class GeminiClient: 10 | """ 11 | A client wrapper around Google's Generative AI (Gemini) model for image + prompt generation. 12 | 13 | Usage: 14 | client = GeminiClient(api_key="YOUR_KEY", model="gemini-1.5-flash") 15 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 16 | """ 17 | 18 | def __init__(self, api_key: Optional[str] = None, model: Optional[str] = None) -> None: 19 | """ 20 | Initialize the Gemini client with API key and model name. 21 | 22 | :param api_key: Optional API key string. If not provided, 23 | checks the GEMINI_API_KEY environment variable. 24 | :param model: The name of the generative model to use (e.g. "gemini-1.5-flash"). 25 | :raises ValueError: If no API key is found or model is None. 26 | """ 27 | self.api_key = api_key or os.environ.get("GEMINI_API_KEY") 28 | if not self.api_key: 29 | raise ValueError("API key must be provided or set via GEMINI_API_KEY environment variable.") 30 | 31 | if model is None: 32 | raise ValueError("The 'model' argument is required and cannot be None.") 33 | 34 | # Configure generative AI 35 | genai.configure(api_key=self.api_key) 36 | self.model = genai.GenerativeModel(model) 37 | self.model_name = model 38 | 39 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 40 | """ 41 | Generate content using the Gemini model with text + image as input. 42 | 43 | :param prompt: A textual prompt to provide to the model. 44 | :param image_path: File path (string or Path) to an image to be included in the request. 45 | :return: The generated response text from the model. 
46 | :raises FileNotFoundError: If the specified image_path does not exist. 47 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 48 | """ 49 | # Ensure the image path exists 50 | image_path_obj = Path(image_path) 51 | if not image_path_obj.is_file(): 52 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 53 | 54 | try: 55 | image = PIL.Image.open(image_path_obj) 56 | # If using the google.generativeai library's generate_content method: 57 | # pass [prompt, image] in the format required by the library 58 | response = self.model.generate_content([prompt, image]) 59 | return response.text 60 | 61 | except Exception as e: 62 | raise Exception(f"Failed to generate content with Gemini model: {e}") 63 | -------------------------------------------------------------------------------- /src/llm/deprecated/vertex.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import vertexai 6 | from vertexai.preview.generative_models import GenerativeModel, Image 7 | 8 | 9 | class VertexAIClient: 10 | """ 11 | A client wrapper around Google's Vertex AI service for image + prompt generation using Gemini models. 12 | 13 | Usage: 14 | client = VertexAIClient( 15 | credentials_path="path/to/credentials.json", 16 | project_id="your-project-id", 17 | region="us-central1", 18 | model="gemini-1.5-pro-002" 19 | ) 20 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 21 | """ 22 | 23 | def __init__( 24 | self, 25 | credentials_path: Optional[str] = None, 26 | project_id: Optional[str] = None, 27 | region: Optional[str] = None, 28 | model: Optional[str] = None, 29 | ) -> None: 30 | """ 31 | Initialize the Vertex AI client with necessary credentials and configuration. 32 | 33 | :param credentials_path: Path to the service account credentials JSON file. 
34 | If not provided, checks GOOGLE_APPLICATION_CREDENTIALS env var. 35 | :param project_id: GCP project ID. If not provided, checks PROJECT_ID env var. 36 | :param region: GCP region for Vertex AI. If not provided, checks REGION env var. 37 | :param model: The name of the generative model to use (e.g. "gemini-1.5-pro-002"). 38 | :raises ValueError: If required credentials or configuration are missing. 39 | """ 40 | # Check credentials 41 | self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") 42 | if not self.credentials_path: 43 | raise ValueError( 44 | "Credentials path must be provided or set via " 45 | "GOOGLE_APPLICATION_CREDENTIALS environment variable." 46 | ) 47 | if not Path(self.credentials_path).is_file(): 48 | raise FileNotFoundError( 49 | f"Credentials file not found at {self.credentials_path}" 50 | ) 51 | 52 | # Check project ID and region 53 | self.project_id = project_id or os.environ.get("PROJECT_ID") 54 | if not self.project_id: 55 | raise ValueError( 56 | "Project ID must be provided or set via PROJECT_ID environment variable." 57 | ) 58 | 59 | self.region = region or os.environ.get("REGION") 60 | if not self.region: 61 | raise ValueError( 62 | "Region must be provided or set via REGION environment variable." 63 | ) 64 | 65 | if model is None: 66 | raise ValueError("The 'model' argument is required and cannot be None.") 67 | 68 | # Set credentials environment variable 69 | os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.credentials_path 70 | 71 | # Initialize Vertex AI 72 | vertexai.init(project=self.project_id, location=self.region) 73 | 74 | # Initialize the model 75 | self.model = GenerativeModel(model) 76 | self.model_name = model 77 | 78 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 79 | """ 80 | Generate content using the Vertex AI model with text + image as input. 81 | 82 | :param prompt: A textual prompt to provide to the model. 
83 | :param image_path: File path (string or Path) to an image to be included in the request. 84 | :return: The generated response text from the model. 85 | :raises FileNotFoundError: If the specified image_path does not exist. 86 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 87 | """ 88 | # Ensure the image path exists 89 | image_path_obj = Path(image_path) 90 | if not image_path_obj.is_file(): 91 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 92 | 93 | try: 94 | # Load the image using Vertex AI's Image class 95 | image = Image.load_from_file(str(image_path_obj)) 96 | 97 | # Generate content using the model 98 | response = self.model.generate_content([prompt, image]) 99 | return response.text 100 | 101 | except Exception as e: 102 | raise Exception(f"Failed to generate content with Vertex AI model: {e}") -------------------------------------------------------------------------------- /src/llm/google_unified.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from typing import Optional, Union 4 | 5 | import PIL.Image 6 | from google import genai 7 | from google.oauth2 import service_account 8 | import logging 9 | 10 | logging.getLogger("google_genai.models").setLevel(logging.WARNING) 11 | 12 | class GoogleUnifiedClient: 13 | """ 14 | A unified client wrapper for Google's GenAI SDK that supports both Gemini API and Vertex AI. 
15 | 16 | Usage for Gemini API: 17 | client = GoogleUnifiedClient(api_key="YOUR_KEY", model="gemini-1.5-flash") 18 | 19 | Usage for Vertex AI: 20 | client = GoogleUnifiedClient( 21 | credentials_path="path/to/credentials.json", 22 | project_id="your-project-id", 23 | region="us-central1", 24 | model="gemini-1.5-pro-002", 25 | use_vertex=True 26 | ) 27 | """ 28 | 29 | def __init__( 30 | self, 31 | api_key: Optional[str] = None, 32 | credentials_path: Optional[str] = None, 33 | project_id: Optional[str] = None, 34 | region: Optional[str] = None, 35 | model: Optional[str] = None, 36 | use_vertex: bool = False, 37 | ) -> None: 38 | """ 39 | Initialize the Google GenAI client for either Gemini API or Vertex AI. 40 | 41 | :param api_key: API key for Gemini API (used if use_vertex=False) 42 | :param credentials_path: Path to service account credentials JSON file (used if use_vertex=True) 43 | :param project_id: GCP project ID (used if use_vertex=True) 44 | :param region: GCP region (used if use_vertex=True) 45 | :param model: The name of the generative model to use 46 | :param use_vertex: Whether to use Vertex AI (True) or Gemini API (False) 47 | :raises ValueError: If required parameters are missing 48 | """ 49 | if model is None: 50 | raise ValueError("The 'model' argument is required and cannot be None.") 51 | 52 | self.model_name = model 53 | self.use_vertex = use_vertex 54 | 55 | if use_vertex: 56 | # Initialize for Vertex AI 57 | self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") 58 | if not self.credentials_path: 59 | raise ValueError( 60 | "Credentials path must be provided or set via " 61 | "GOOGLE_APPLICATION_CREDENTIALS environment variable." 
62 | ) 63 | if not Path(self.credentials_path).is_file(): 64 | raise FileNotFoundError( 65 | f"Credentials file not found at {self.credentials_path}" 66 | ) 67 | 68 | self.project_id = project_id or os.environ.get("PROJECT_ID") 69 | if not self.project_id: 70 | raise ValueError( 71 | "Project ID must be provided or set via PROJECT_ID environment variable." 72 | ) 73 | 74 | self.region = region or os.environ.get("REGION") 75 | if not self.region: 76 | raise ValueError( 77 | "Region must be provided or set via REGION environment variable." 78 | ) 79 | 80 | # Load credentials 81 | credentials = service_account.Credentials.from_service_account_file( 82 | self.credentials_path 83 | ).with_scopes(["https://www.googleapis.com/auth/cloud-platform"]) 84 | 85 | # Initialize the client for Vertex AI 86 | self.client = genai.Client( 87 | vertexai=True, 88 | project=self.project_id, 89 | location=self.region, 90 | credentials=credentials 91 | ) 92 | 93 | else: 94 | # Initialize for Gemini API 95 | self.api_key = api_key or os.environ.get("GEMINI_API_KEY") 96 | if not self.api_key: 97 | raise ValueError( 98 | "API key must be provided or set via GEMINI_API_KEY environment variable." 99 | ) 100 | 101 | # Initialize the client for Gemini API 102 | self.client = genai.Client(api_key=self.api_key) 103 | 104 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 105 | """ 106 | Generate content using the Google GenAI model with text + image as input. 107 | 108 | :param prompt: A textual prompt to provide to the model. 109 | :param image_path: File path (string or Path) to an image to be included in the request. 110 | :return: The generated response text from the model. 111 | :raises FileNotFoundError: If the specified image_path does not exist. 112 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
113 | """ 114 | # Ensure the image path exists 115 | image_path_obj = Path(image_path) 116 | if not image_path_obj.is_file(): 117 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 118 | 119 | try: 120 | # Load the image 121 | image = PIL.Image.open(image_path_obj) 122 | 123 | # Generate content using the client.models approach 124 | response = self.client.models.generate_content( 125 | model=self.model_name, 126 | contents=[prompt, image] 127 | ) 128 | 129 | return response.text 130 | 131 | except Exception as e: 132 | api_type = "Vertex AI" if self.use_vertex else "Gemini API" 133 | raise Exception(f"Failed to generate content with {api_type} model: {e}") -------------------------------------------------------------------------------- /src/llm/openai.py: -------------------------------------------------------------------------------- 1 | import os 2 | import base64 3 | from pathlib import Path 4 | from typing import Optional, Union 5 | import logging 6 | 7 | from openai import OpenAI 8 | 9 | # Remove OpenAI's standard logging messages 10 | logging.getLogger("openai").setLevel(logging.ERROR) 11 | logging.getLogger("httpx").setLevel(logging.ERROR) 12 | 13 | class OpenAIClient: 14 | """ 15 | A client wrapper around OpenAI's API for image + prompt generation. 16 | 17 | Usage: 18 | client = OpenAIClient(api_key="YOUR_KEY", model="gpt-4o") 19 | text_response = client.generate(prompt="Hello World", image_path="path/to/image.png") 20 | """ 21 | 22 | def __init__(self, api_key: Optional[str] = None, model: Optional[str] = None) -> None: 23 | """ 24 | Initialize the OpenAI client with API key and model name. 25 | 26 | :param api_key: Optional API key string. If not provided, 27 | checks the OPENAI_API_KEY environment variable. 28 | :param model: The name of the generative model to use (e.g. "gpt-4-vision-preview"). 29 | :raises ValueError: If no API key is found or model is None. 
30 | """ 31 | self.api_key = api_key or os.environ.get("OPENAI_API_KEY") 32 | if not self.api_key: 33 | raise ValueError( 34 | "API key must be provided or set via OPENAI_API_KEY environment variable." 35 | ) 36 | 37 | if model is None: 38 | raise ValueError("The 'model' argument is required and cannot be None.") 39 | 40 | self.client = OpenAI(api_key=self.api_key) 41 | self.model_name = model 42 | 43 | def _encode_image(self, image_path: Union[str, Path]) -> str: 44 | """ 45 | Encode an image file to base64 string. 46 | 47 | :param image_path: Path to the image file 48 | :return: Base64 encoded string of the image 49 | """ 50 | with open(image_path, "rb") as image_file: 51 | return base64.b64encode(image_file.read()).decode("utf-8") 52 | 53 | def generate(self, prompt: str, image_path: Union[str, Path]) -> str: 54 | """ 55 | Generate content using the OpenAI model with text + image as input. 56 | 57 | :param prompt: A textual prompt to provide to the model. 58 | :param image_path: File path (string or Path) to an image to be included in the request. 59 | :return: The generated response text from the model. 60 | :raises FileNotFoundError: If the specified image_path does not exist. 61 | :raises Exception: If the underlying model call fails or an unexpected error occurs. 
62 | """ 63 | # Ensure the image path exists 64 | image_path_obj = Path(image_path) 65 | if not image_path_obj.is_file(): 66 | raise FileNotFoundError(f"Image file not found at {image_path_obj}") 67 | 68 | try: 69 | # Encode the image to base64 70 | base64_image = self._encode_image(image_path_obj) 71 | 72 | # Create the API request 73 | response = self.client.chat.completions.create( 74 | model=self.model_name, 75 | messages=[ 76 | { 77 | "role": "user", 78 | "content": [ 79 | { 80 | "type": "text", 81 | "text": prompt, 82 | }, 83 | { 84 | "type": "image_url", 85 | "image_url": { 86 | "url": f"data:image/jpeg;base64,{base64_image}" 87 | }, 88 | }, 89 | ], 90 | } 91 | ], 92 | ) 93 | 94 | return response.choices[0].message.content 95 | 96 | except Exception as e: 97 | raise Exception(f"Failed to generate content with OpenAI model: {e}") -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import argparse 3 | import sys 4 | from pathlib import Path 5 | 6 | from llm.google_unified import GoogleUnifiedClient 7 | from llm.openai import OpenAIClient 8 | from llm.anthropic import AnthropicClient 9 | from llm.azure import AzureClient 10 | from llm.aws import AWSClient 11 | 12 | from processor import process_input_path 13 | 14 | def parse_args(input_args=None): 15 | parser = argparse.ArgumentParser(description="Process PPT/PPTX files via vLLM.") 16 | 17 | parser.add_argument( 18 | "--output_dir", 19 | type=str, 20 | default=None, 21 | required=True, 22 | help="Output directory path" 23 | ) 24 | parser.add_argument( 25 | "--input_dir", 26 | type=str, 27 | default=None, 28 | required=True, 29 | help="Path to input directory or PPT file" 30 | ) 31 | parser.add_argument( 32 | "--client", 33 | type=str, 34 | required=True, 35 | choices=["gemini", "vertexai", "openai", "anthropic", "azure", "aws"], 36 | help="LLM client to use: 
'gemini', 'vertexai', 'openai', 'azure', 'aws', or 'anthropic'" 37 | ) 38 | parser.add_argument( 39 | "--model", 40 | type=str, 41 | default="gemini-1.5-flash", 42 | help="Suggested models: gemini-1.5-flash, gemini-1.5-pro, gpt-4o, claude-3-5-sonnet-latest" 43 | ) 44 | parser.add_argument( 45 | "--instructions", 46 | type=str, 47 | default="None Provided", 48 | help="Additional instructions appended to the base prompt" 49 | ) 50 | parser.add_argument( 51 | "--libreoffice_path", 52 | type=str, 53 | default=None, 54 | help="Path to the local installation of LibreOffice." 55 | ) 56 | parser.add_argument( 57 | "--rate_limit", 58 | type=int, 59 | default=60, 60 | help="Number of API calls allowed per minute (default: 60)" 61 | ) 62 | parser.add_argument( 63 | "--prompt_path", 64 | type=str, 65 | default="src/prompt.txt", 66 | help="Path to the base prompt file (default: src/prompt.txt)" 67 | ) 68 | parser.add_argument( 69 | "--api_key", 70 | type=str, 71 | default=None, 72 | help="API key for the LLM. If not provided, the environment variable may be used." 73 | ) 74 | parser.add_argument( 75 | "--gcp_region", 76 | type=str, 77 | default=None, 78 | help="GCP Region for connecting to vertex AI service account." 79 | ) 80 | parser.add_argument( 81 | "--gcp_project_id", 82 | type=str, 83 | default=None, 84 | help="GCP project id for connecting to vertex AI service account." 
85 | ) 86 | parser.add_argument( 87 | "--gcp_application_credentials", 88 | type=str, 89 | default=None, 90 | help="Path to JSON credentials for GCP service account" 91 | ) 92 | parser.add_argument( 93 | "--azure_openai_api_key", 94 | type=str, 95 | default=None, 96 | help="Value for AZURE_OPENAI_API_KEY if not set in env" 97 | ) 98 | parser.add_argument( 99 | "--azure_openai_endpoint", 100 | type=str, 101 | default=None, 102 | help="Value for AZURE_OPENAI_ENDPOINT if not set in env" 103 | ) 104 | parser.add_argument( 105 | "--azure_deployment_name", 106 | type=str, 107 | default=None, 108 | help="Name of your Azure deployment" 109 | ) 110 | parser.add_argument( 111 | "--azure_api_version", 112 | type=str, 113 | default="2023-12-01-preview", 114 | help="Azure API version" 115 | ) 116 | parser.add_argument( 117 | "--aws_access_key_id", 118 | type=str, 119 | help="AWS User Access Key" 120 | ) 121 | parser.add_argument( 122 | "--aws_secret_access_key", 123 | type=str, 124 | help="AWS User Secret Access Key" 125 | ) 126 | parser.add_argument( 127 | "--aws_region", 128 | type=str, 129 | default="us-east-1", 130 | help="Region for AWS Bedrock Instance" 131 | ) 132 | parser.add_argument( 133 | "--save_pdf", 134 | action='store_true', 135 | default=False, 136 | help="Save converted PDF files in the output directory" 137 | ) 138 | parser.add_argument( 139 | "--save_images", 140 | action='store_true', 141 | default=False, 142 | help="Save extracted images in a subfolder within the output directory named after the presentation" 143 | ) 144 | parser.add_argument( 145 | "--libreoffice_url", 146 | type=str, 147 | default=None, 148 | help="If provided, uses the Docker container's endpoint (e.g., http://localhost:2002) for PPT->PDF conversion." 
149 | ) 150 | 151 | # parse_args(None) falls back to sys.argv, and this also handles an empty arg list correctly 152 | args = parser.parse_args(input_args) 153 | return args 154 | 155 | def main(): 156 | # ---- 1) Parse arguments ---- 157 | args = parse_args() 158 | 159 | # ---- 2) Configure logging ---- 160 | logging.basicConfig( 161 | level=logging.INFO, 162 | format="%(asctime)s [%(levelname)s] %(name)s - %(message)s", 163 | handlers=[logging.StreamHandler(sys.stdout)] 164 | ) 165 | logger = logging.getLogger(__name__) 166 | 167 | # ---- 3) Read prompt once ---- 168 | base_prompt_file = Path(args.prompt_path) 169 | if not base_prompt_file.is_file(): 170 | logger.error(f"Prompt file not found at {base_prompt_file}") 171 | sys.exit(1) 172 | 173 | base_prompt = base_prompt_file.read_text(encoding="utf-8").strip() 174 | if args.instructions and args.instructions.lower() != "none provided": 175 | prompt = f"{base_prompt}\n\nAdditional instructions:\n{args.instructions}" 176 | else: 177 | prompt = base_prompt 178 | 179 | # ---- 4) Initialize model instance ---- 180 | try: 181 | if args.client == "gemini": 182 | # Using the new unified client for Gemini 183 | model_instance = GoogleUnifiedClient( 184 | api_key=args.api_key, 185 | model=args.model, 186 | use_vertex=False 187 | ) 188 | logger.info(f"Initialized Google GenAI Client (Gemini API) with model: {args.model}") 189 | elif args.client == "vertexai": 190 | # Using the new unified client for Vertex AI 191 | model_instance = GoogleUnifiedClient( 192 | credentials_path=args.gcp_application_credentials, 193 | project_id=args.gcp_project_id, 194 | region=args.gcp_region, 195 | model=args.model, 196 | use_vertex=True 197 | ) 198 | logger.info(f"Initialized Google GenAI Client (Vertex AI) for project: {args.gcp_project_id}") 199 | elif args.client == "openai": 200 | model_instance = OpenAIClient(api_key=args.api_key, model=args.model) 201 | logger.info(f"Initialized OpenAIClient with model: {args.model}") 202 | elif args.client == "anthropic": 203 | model_instance = 
AnthropicClient(api_key=args.api_key, model=args.model) 203 | logger.info(f"Initialized AnthropicClient with model: {args.model}") 204 | elif args.client == "azure": 205 | model_instance = AzureClient( 206 | api_key=args.azure_openai_api_key, 207 | endpoint=args.azure_openai_endpoint, 208 | deployment=args.azure_deployment_name, 209 | api_version=args.azure_api_version 210 | ) 211 | logger.info(f"Initialized AzureClient for deployment: {args.azure_deployment_name}") 212 | elif args.client == "aws": 213 | model_instance = AWSClient( 214 | access_key_id=args.aws_access_key_id, 215 | secret_access_key=args.aws_secret_access_key, 216 | region=args.aws_region, 217 | model=args.model 218 | ) 219 | logger.info(f"Initialized AWSClient in region: {args.aws_region} with model {args.model}") 220 | else: 221 | logger.error(f"Unsupported client specified: {args.client}") 222 | sys.exit(1) 223 | except Exception as e: 224 | logger.error(f"Failed to initialize model: {str(e)}") 225 | sys.exit(1) 226 | 237 | # ---- 5) Identify local vs. 
container-based conversion ---- 238 | if args.libreoffice_url: 239 | logger.info(f"Using Docker-based LibreOffice at: {args.libreoffice_url}") 240 | libreoffice_endpoint = args.libreoffice_url 241 | # We'll pass this URL into the processor so it knows to do remote conversion 242 | libreoffice_path = None 243 | else: 244 | # If no URL is provided, assume local path 245 | if args.libreoffice_path: 246 | libreoffice_path = Path(args.libreoffice_path) 247 | else: 248 | libreoffice_path = Path("libreoffice") 249 | libreoffice_endpoint = None 250 | 251 | input_path = Path(args.input_dir) 252 | output_dir = Path(args.output_dir) 253 | output_dir.mkdir(parents=True, exist_ok=True) 254 | 255 | # ---- 6) Process input path ---- 256 | results = process_input_path( 257 | input_path=input_path, 258 | output_dir=output_dir, 259 | libreoffice_path=libreoffice_path, 260 | libreoffice_endpoint=libreoffice_endpoint, 261 | model_instance=model_instance, 262 | rate_limit=args.rate_limit, 263 | prompt=prompt, 264 | save_pdf=args.save_pdf, 265 | save_images=args.save_images 266 | ) 267 | 268 | # ---- 7) Log Summary ---- 269 | successes = [res for res in results if len(res[1]) > 0] 270 | failures = [res for res in results if len(res[1]) == 0] 271 | 272 | if successes: 273 | logger.info(f"Successfully processed {len(successes)} PPT file(s).") 274 | if failures: 275 | logger.warning(f"Failed to process {len(failures)} PPT file(s).") 276 | 277 | 278 | if __name__ == "__main__": 279 | main() -------------------------------------------------------------------------------- /src/processor.py: -------------------------------------------------------------------------------- 1 | import time 2 | import logging 3 | import tempfile 4 | from pathlib import Path 5 | from typing import List, Tuple, Union 6 | from tqdm import tqdm 7 | import shutil 8 | 9 | from llm import LLMClient 10 | from converters.ppt_converter import convert_pptx_to_pdf 11 | from converters.pdf_converter import 
convert_pdf_to_images 12 | from converters.docker_converter import convert_pptx_via_docker 13 | from schemas.deck import DeckData, SlideData 14 | 15 | # Module-level logger 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | def process_single_file( 20 | ppt_file: Path, 21 | output_dir: Path, 22 | libreoffice_path: Path, 23 | model_instance: LLMClient, 24 | rate_limit: int, 25 | prompt: str, 26 | save_pdf: bool = False, 27 | save_images: bool = False 28 | ) -> Tuple[Path, List[Path]]: 29 | """ 30 | Process a single PowerPoint file: 31 | 1) Convert to PDF 32 | 2) Convert PDF to images 33 | 3) Send images to LLM 34 | 4) Save the JSON output 35 | 5) Optionally save PDF and images to output directory 36 | """ 37 | with tempfile.TemporaryDirectory() as temp_dir_str: 38 | temp_dir = Path(temp_dir_str) 39 | 40 | try: 41 | # 1) PPT -> PDF 42 | pdf_path = convert_pptx_to_pdf(ppt_file, libreoffice_path, temp_dir) 43 | logger.info(f"Successfully converted {ppt_file.name} to {pdf_path.name}") 44 | 45 | # 2) PDF -> Images 46 | image_paths = convert_pdf_to_images(pdf_path, temp_dir) 47 | if not image_paths: 48 | logger.error(f"No images were generated from {pdf_path.name}") 49 | return (ppt_file, []) 50 | 51 | # 3) Generate LLM content 52 | min_interval = 60.0 / rate_limit if rate_limit > 0 else 0 53 | last_call_time = 0.0 54 | 55 | slides_data = [] 56 | # Sort images by slide number (we know "slide_{page_num + 1}.png" format) 57 | image_paths.sort(key=lambda p: int(p.stem.split('_')[1])) 58 | 59 | # Initialize tqdm progress bar 60 | for idx, image_path in enumerate(tqdm(image_paths, desc=f"Processing slides for {ppt_file.name}", unit="slide"), start=1): 61 | # Rate-limit logic 62 | if min_interval > 0: 63 | current_time = time.time() 64 | time_since_last = current_time - last_call_time 65 | if time_since_last < min_interval: 66 | time.sleep(min_interval - time_since_last) 67 | last_call_time = time.time() 68 | 69 | try: 70 | response = 
model_instance.generate(prompt, image_path) 71 | slides_data.append(SlideData( 72 | number=idx, 73 | content=response 74 | )) 75 | except Exception as e: 76 | logger.error(f"Error generating content for slide {idx}: {str(e)}") 77 | slides_data.append(SlideData( 78 | number=idx, 79 | content="ERROR: Failed to process slide" 80 | )) 81 | 82 | logger.info(f"Successfully converted {ppt_file.name} to {len(slides_data)} slides.") 83 | 84 | # 4) Build pydantic model and save JSON 85 | deck_data = DeckData( 86 | deck=ppt_file.name, 87 | model=model_instance.model_name, 88 | slides=slides_data 89 | ) 90 | output_file = output_dir / f"{ppt_file.stem}.json" 91 | output_file.write_text(deck_data.model_dump_json(indent=2), encoding='utf-8') 92 | logger.info(f"Output written to {output_file}") 93 | 94 | # 5) Optionally save PDF 95 | if save_pdf: 96 | destination_pdf = output_dir / pdf_path.name 97 | shutil.copy2(pdf_path, destination_pdf) 98 | logger.info(f"Saved PDF to {destination_pdf}") 99 | 100 | # 6) Optionally save images 101 | if save_images: 102 | # Create a subfolder named after the PPT file 103 | images_subdir = output_dir / ppt_file.stem 104 | images_subdir.mkdir(parents=True, exist_ok=True) 105 | for img_path in image_paths: 106 | destination_img = images_subdir / img_path.name 107 | shutil.copy2(img_path, destination_img) 108 | logger.info(f"Saved images to {images_subdir}") 109 | 110 | return (ppt_file, image_paths) 111 | 112 | except Exception as ex: 113 | logger.error(f"Unexpected error while processing {ppt_file.name}: {str(ex)}") 114 | return (ppt_file, []) 115 | 116 | def process_input_path( 117 | input_path: Path, 118 | output_dir: Path, 119 | libreoffice_path: Union[Path, None], 120 | libreoffice_endpoint: Union[str, None], 121 | model_instance: LLMClient, 122 | rate_limit: int, 123 | prompt: str, 124 | save_pdf: bool = False, 125 | save_images: bool = False 126 | ) -> List[Tuple[Path, List[Path]]]: 127 | """ 128 | Process one or more PPT files from the 
specified path. 129 | Optionally save PDFs and images to the output directory. 130 | """ 131 | results = [] 132 | 133 | # Single file mode 134 | if input_path.is_file(): 135 | if input_path.suffix.lower() in ('.ppt', '.pptx'): 136 | res = process_single_file( 137 | ppt_file=input_path, 138 | output_dir=output_dir, 139 | libreoffice_path=libreoffice_path, 140 | libreoffice_endpoint=libreoffice_endpoint, 141 | model_instance=model_instance, 142 | rate_limit=rate_limit, 143 | prompt=prompt, 144 | save_pdf=save_pdf, 145 | save_images=save_images 146 | ) 147 | results.append(res) 148 | 149 | # Directory mode 150 | else: 151 | for ppt_file in input_path.glob('*.ppt*'): 152 | res = process_single_file( 153 | ppt_file=ppt_file, 154 | output_dir=output_dir, 155 | libreoffice_path=libreoffice_path, 156 | libreoffice_endpoint=libreoffice_endpoint, 157 | model_instance=model_instance, 158 | rate_limit=rate_limit, 159 | prompt=prompt, 160 | save_pdf=save_pdf, 161 | save_images=save_images 162 | ) 163 | results.append(res) 164 | 165 | return results 166 | 167 | 168 | def process_single_file( 169 | ppt_file: Path, 170 | output_dir: Path, 171 | libreoffice_path: Union[Path, None], 172 | libreoffice_endpoint: Union[str, None], 173 | model_instance: LLMClient, 174 | rate_limit: int, 175 | prompt: str, 176 | save_pdf: bool = False, 177 | save_images: bool = False 178 | ) -> Tuple[Path, List[Path]]: 179 | """ 180 | Process a single PowerPoint file: 181 | 1) Convert to PDF (either via local LibreOffice or Docker container) 182 | 2) Convert PDF to images 183 | 3) Send images to LLM 184 | 4) Save JSON output 185 | 5) Optionally save PDF and images 186 | """ 187 | with tempfile.TemporaryDirectory() as temp_dir_str: 188 | temp_dir = Path(temp_dir_str) 189 | 190 | try: 191 | # 1) PPT -> PDF 192 | if libreoffice_endpoint: 193 | # Docker-based conversion 194 | pdf_path = convert_pptx_via_docker( 195 | ppt_file, 196 | libreoffice_endpoint, 197 | temp_dir 198 | ) 199 | else: 200 | # Local-based 
conversion 201 | pdf_path = convert_pptx_to_pdf( 202 | input_file=ppt_file, 203 | libreoffice_path=libreoffice_path, 204 | temp_dir=temp_dir 205 | ) 206 | 207 | logger.info(f"Successfully converted {ppt_file.name} to {pdf_path.name}") 208 | 209 | # 2) PDF -> Images (local PyMuPDF) 210 | image_paths = convert_pdf_to_images(pdf_path, temp_dir) 211 | if not image_paths: 212 | logger.error(f"No images were generated from {pdf_path.name}") 213 | return (ppt_file, []) 214 | 215 | # 3) Generate LLM content 216 | slides_data = [] 217 | min_interval = 60.0 / rate_limit if rate_limit > 0 else 0 218 | last_call_time = 0.0 219 | 220 | # Sort images by slide number (assuming "slide_1.png", "slide_2.png", etc.) 221 | image_paths.sort(key=lambda p: int(p.stem.split('_')[1])) 222 | 223 | for idx, image_path in enumerate( 224 | tqdm(image_paths, desc=f"Processing slides for {ppt_file.name}", unit="slide"), start=1 225 | ): 226 | if min_interval > 0: 227 | current_time = time.time() 228 | time_since_last = current_time - last_call_time 229 | if time_since_last < min_interval: 230 | time.sleep(min_interval - time_since_last) 231 | last_call_time = time.time() 232 | 233 | try: 234 | response = model_instance.generate(prompt, image_path) 235 | slides_data.append(SlideData(number=idx, content=response)) 236 | except Exception as e: 237 | logger.error(f"Error generating content for slide {idx}: {str(e)}") 238 | slides_data.append(SlideData(number=idx, content="ERROR: Failed to process slide")) 239 | 240 | logger.info(f"Successfully converted {ppt_file.name} to {len(slides_data)} slides.") 241 | 242 | # 4) Build pydantic model and save JSON 243 | deck_data = DeckData( 244 | deck=ppt_file.name, 245 | model=model_instance.model_name, 246 | slides=slides_data 247 | ) 248 | output_file = output_dir / f"{ppt_file.stem}.json" 249 | output_file.write_text(deck_data.model_dump_json(indent=2), encoding='utf-8') 250 | logger.info(f"Output written to {output_file}") 251 | 252 | # 5) Optionally save 
PDF 253 | if save_pdf: 254 | destination_pdf = output_dir / pdf_path.name 255 | shutil.copy2(pdf_path, destination_pdf) 256 | logger.info(f"Saved PDF to {destination_pdf}") 257 | 258 | # 6) Optionally save images 259 | if save_images: 260 | images_subdir = output_dir / ppt_file.stem 261 | images_subdir.mkdir(parents=True, exist_ok=True) 262 | for img_path in image_paths: 263 | shutil.copy2(img_path, images_subdir / img_path.name) 264 | logger.info(f"Saved images to {images_subdir}") 265 | 266 | return (ppt_file, image_paths) 267 | 268 | except Exception as ex: 269 | logger.error(f"Unexpected error while processing {ppt_file.name}: {str(ex)}") 270 | return (ppt_file, []) -------------------------------------------------------------------------------- /src/prompt.txt: -------------------------------------------------------------------------------- 1 | You are an expert AI assistant tasked with converting PowerPoint slides into semantically rich text for downstream use. 2 | Carefully observe the content of each slide and accurately transcribe all text present. 3 | Provide detailed descriptions of any graphs, charts, figures, or other visual elements. 4 | It is essential to ensure accuracy and completeness in your text-based representation of the slide. 5 | Where possible, include interpretations of graphics, icons, and other non-text descriptors. 6 | 7 | Return only the text content of the slide, without any preamble, explanation, or unrelated information. 
-------------------------------------------------------------------------------- /src/schemas/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ALucek/ppt2desc/487b8578d09acff1c4a6121b573050df7aef3568/src/schemas/__init__.py -------------------------------------------------------------------------------- /src/schemas/deck.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import List 3 | 4 | class SlideData(BaseModel): 5 | number: int 6 | content: str 7 | 8 | class DeckData(BaseModel): 9 | deck: str 10 | model: str 11 | slides: List[SlideData] 12 | --------------------------------------------------------------------------------
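A note on the rate-limit pacing in `src/processor.py`: `process_single_file` derives `min_interval = 60.0 / rate_limit` and sleeps before each LLM call so that at most `rate_limit` calls happen per minute. The same idea can be factored into a standalone helper; the sketch below is illustrative only — the `paced_calls` name and the injectable `clock`/`sleep` parameters are assumptions for testability, not part of this repository:

```python
import time
from typing import Callable, Iterable, List


def paced_calls(
    items: Iterable,
    rate_limit: int,
    call: Callable,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Apply `call` to each item, spacing invocations so at most
    `rate_limit` calls happen per minute (0 disables pacing).

    `clock` and `sleep` are injectable (hypothetical parameters, not in
    the repo's version) so the pacing can be tested without real waits.
    """
    min_interval = 60.0 / rate_limit if rate_limit > 0 else 0.0
    last_call_time = 0.0  # epoch seconds of the previous call
    results = []
    for item in items:
        if min_interval > 0:
            # Sleep off whatever remains of the minimum gap
            elapsed = clock() - last_call_time
            if elapsed < min_interval:
                sleep(min_interval - elapsed)
            last_call_time = clock()
        results.append(call(item))
    return results
```

With `rate_limit=30`, `min_interval` works out to 2.0 seconds between slides, matching the per-slide throttling loop in `process_single_file`.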