├── .env.example ├── CLAUDE.md ├── LICENSE ├── README.md ├── image.png ├── requirements.txt └── server.py /.env.example: -------------------------------------------------------------------------------- 1 | # Required: Your Google AI Studio API key 2 | GEMINI_API_KEY="your-google-ai-studio-key" 3 | 4 | # Optional: Model mappings for Claude Code aliases 5 | BIG_MODEL="gemini-1.5-pro-latest" # For 'sonnet' or 'opus' requests 6 | SMALL_MODEL="gemini-1.5-flash-latest" # For 'haiku' requests 7 | 8 | # Optional: Server settings 9 | HOST="0.0.0.0" 10 | PORT="8082" 11 | LOG_LEVEL="WARNING" # DEBUG, INFO, WARNING, ERROR, CRITICAL 12 | 13 | # Optional: Performance and reliability settings 14 | MAX_TOKENS_LIMIT="8192" # Max tokens for Gemini responses 15 | REQUEST_TIMEOUT="90" # Request timeout in seconds 16 | MAX_RETRIES="2" # LiteLLM retries to Gemini 17 | MAX_STREAMING_RETRIES="12" # Streaming-specific retry attempts 18 | 19 | # Optional: Streaming control (use if experiencing issues) 20 | FORCE_DISABLE_STREAMING="false" # Disable streaming globally 21 | EMERGENCY_DISABLE_STREAMING="false" # Emergency streaming disable 22 | -------------------------------------------------------------------------------- /CLAUDE.md: -------------------------------------------------------------------------------- 1 | # Claude Code: Best Practices for Effective Collaboration 2 | 3 | This document outlines best practices for working with Claude Code to ensure efficient and successful software development tasks. 4 | 5 | ## Task Management 6 | 7 | For complex or multi-step tasks, Claude Code will use: 8 | * **TodoWrite**: To create a structured task list, breaking down the work into manageable steps. This provides clarity on the plan and allows for tracking progress. 9 | * **TodoRead**: To review the current list of tasks and their status, ensuring alignment and that all objectives are being addressed. 10 | 11 | ## File Handling and Reading 12 | 13 | Understanding file content is crucial before making modifications. 14 | 15 | 1. **Targeted Information Retrieval**: 16 | * When searching for specific content, patterns, or definitions within a codebase, prefer using search tools like `Grep` or `Task` (with a focused search prompt). This is more efficient than reading entire files. 17 | 18 | 2. **Reading File Content**: 19 | * **Small to Medium Files**: For files where full context is needed or that are not excessively large, the `Read` tool can be used to retrieve the entire content. 20 | * **Large File Strategy**: 21 | 1. **Assess Size**: Before reading a potentially large file, its size should be determined (e.g., using `ls -l` via the `Bash` tool or by an initial `Read` with a small `limit` to observe if content is truncated). 22 | 2. **Chunked Reading**: If a file is large (e.g., over a few thousand lines), it should be read in manageable chunks (e.g., 1000-2000 lines at a time) using the `offset` and `limit` parameters of the `Read` tool. This ensures all content can be processed without issues. 23 | * Always ensure that the file path provided to `Read` is absolute. 24 | 25 | ## File Editing 26 | 27 | Precision is key for successful file edits. The following strategies lead to reliable modifications: 28 | 29 | 1. **Pre-Edit Read**: **Always** use the `Read` tool to fetch the content of the file *immediately before* attempting any `Edit` or `MultiEdit` operation. This ensures modifications are based on the absolute latest version of the file. 30 | 31 | 2. 
**Constructing `old_string` (The text to be replaced)**: 32 | * **Exact Match**: The `old_string` must be an *exact* character-for-character match of the segment in the file you intend to replace. This includes all whitespace (spaces, tabs, newlines) and special characters. 33 | * **No Read Artifacts**: Crucially, do *not* include any formatting artifacts from the `Read` tool's output (e.g., `cat -n` style line numbers or display-only leading tabs) in the `old_string`. It must only contain the literal characters as they exist in the raw file. 34 | * **Sufficient Context & Uniqueness**: Provide enough context (surrounding lines) in `old_string` to make it uniquely identifiable at the intended edit location. The "Anchor on a Known Good Line" strategy is preferred: `old_string` is a larger, unique block of text surrounding the change or insertion point. This is highly reliable. 35 | 36 | 3. **Constructing `new_string` (The replacement text)**: 37 | * **Exact Representation**: The `new_string` must accurately represent the desired state of the code, including correct indentation, whitespace, and newlines. 38 | * **No Read Artifacts**: As with `old_string`, ensure `new_string` does *not* contain any `Read` tool output artifacts. 39 | 40 | 4. **Choosing the Right Editing Tool**: 41 | * **`Edit` Tool**: Suitable for a single, well-defined replacement in a file. 42 | * **`MultiEdit` Tool**: Preferred when multiple changes are needed within the same file. Edits are applied sequentially, with each subsequent edit operating on the result of the previous one. This tool is highly effective for complex modifications. 43 | 44 | 5. **Verification**: 45 | * The success confirmation from the `Edit` or `MultiEdit` tool (especially if `expected_replacements` is used and matches) is the primary indicator that the change was made. 46 | * If further visual confirmation is needed, use the `Read` tool with `offset` and `limit` parameters to view only the specific section of the file that was changed, rather than re-reading the entire file. 47 | 48 | ### Reliable Code Insertion with MultiEdit 49 | 50 | When inserting larger blocks of new code (e.g., multiple functions or methods) where a simple `old_string` might be fragile due to surrounding code, the following `MultiEdit` strategy can be more robust: 51 | 52 | 1. **First Edit - Targeted Insertion Point**: For the primary code block you want to insert (e.g., new methods within a class), identify a short, unique, and stable line of code immediately *after* your desired insertion point. Use this stable line as the `old_string`. 53 | * The `new_string` will consist of your new block of code, followed by a newline, and then the original `old_string` (the stable line you matched on). 54 | * Example: If inserting methods into a class, the `old_string` might be the closing brace `}` of the class, or a comment that directly follows the class. 55 | 56 | 2. **Second Edit (Optional) - Ancillary Code**: If there's another, smaller piece of related code to insert (e.g., a function call within an existing method, or an import statement), perform this as a separate, more straightforward edit within the `MultiEdit` call. This edit usually has a more clearly defined and less ambiguous `old_string`. 57 | 58 | **Rationale**: 59 | * By anchoring the main insertion on a very stable, unique line *after* the insertion point and prepending the new code to it, you reduce the risk of `old_string` mismatches caused by subtle variations in the code *before* the insertion point. 
60 | * Keeping ancillary edits separate allows them to succeed even if the main insertion point is complex, as they often target simpler, more reliable `old_string` patterns. 61 | * This approach leverages `MultiEdit`'s sequential application of changes effectively. 62 | 63 | **Example Scenario**: Adding new methods to a class and a call to one of these new methods elsewhere. 64 | * **Edit 1**: Insert the new methods. `old_string` is the class's closing brace `}`. `new_string` is ` 65 | [new methods code] 66 | }`. 67 | * **Edit 2**: Insert the call to a new method. `old_string` is `// existing line before call`. `new_string` is `// existing line before call 68 | this.newMethodCall();`. 69 | 70 | This method provides a balance between precise editing and handling larger code insertions reliably when direct `old_string` matches for the entire new block are problematic. 71 | 72 | ## Handling Large Files for Incremental Refactoring 73 | 74 | When refactoring large files incrementally rather than rewriting them completely: 75 | 76 | 1. **Initial Exploration and Planning**: 77 | * Begin with targeted searches using `Grep` to locate specific patterns or sections within the file. 78 | * Use `Bash` commands like `grep -n "pattern" file` to find line numbers for specific areas of interest. 79 | * Create a clear mental model of the file structure before proceeding with edits. 80 | 81 | 2. **Chunked Reading for Large Files**: 82 | * For files too large to read at once, use multiple `Read` operations with different `offset` and `limit` parameters. 83 | * Read sequential chunks to build a complete understanding of the file. 84 | * Use `Grep` to pinpoint key sections, then read just those sections with targeted `offset` parameters. 85 | 86 | 3. **Finding Key Implementation Sections**: 87 | * Use `Bash` commands with `grep -A N` (to show N lines after a match) or `grep -B N` (to show N lines before) to locate function or method implementations. 88 | * Example: `grep -n "function findTagBoundaries" -A 20 filename.js` to see the first 20 lines of a function. 89 | 90 | 4. **Pattern-Based Replacement Strategy**: 91 | * Identify common patterns that need to be replaced across the file. 92 | * Use the `Bash` tool with `sed` for quick previews of potential replacements. 93 | * Example: `sed -n "s/oldPattern/newPattern/gp" filename.js` to preview changes without making them. 94 | 95 | 5. **Sequential Selective Edits**: 96 | * Target specific sections or patterns one at a time rather than attempting a complete rewrite. 97 | * Focus on clearest/simplest cases first to establish a pattern of successful edits. 98 | * Use `Edit` for well-defined single changes within the file. 99 | 100 | 6. **Batch Similar Changes Together**: 101 | * Group similar types of changes (e.g., all references to a particular function or variable). 102 | * Use `Bash` with `sed` to preview the scope of batch changes: `grep -n "pattern" filename.js | wc -l` 103 | * For systematic changes across a file, consider using `sed` through the `Bash` tool: `sed -i "s/oldPattern/newPattern/g" filename.js` 104 | 105 | 7. **Incremental Verification**: 106 | * After each set of changes, verify the specific sections that were modified. 107 | * For critical components, read the surrounding context to ensure the changes integrate correctly. 108 | * Validate that each change maintains the file's structure and logic before proceeding to the next. 109 | 110 | 8. 
**Progress Tracking for Large Refactors**: 111 | * Use the `TodoWrite` tool to track which sections or patterns have been updated. 112 | * Create a checklist of all required changes and mark them off as they're completed. 113 | * Record any sections that require special attention or that couldn't be automatically refactored. 114 | 115 | ## Commit Messages 116 | 117 | When Claude Code generates commit messages on your behalf: 118 | * The `Co-Authored-By: Claude ` line will **not** be included. 119 | * The `🤖 Generated with [Claude Code](https://claude.ai/code)` line will **not** be included. 120 | 121 | ## General Interaction 122 | 123 | Claude Code will directly apply proposed changes and modifications using the available tools, rather than describing them and asking you to implement them manually. This ensures a more efficient and direct workflow. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE 2 | Version 2, December 2004 3 | 4 | Copyright (C) 2004 Sam Hocevar 5 | 6 | Everyone is permitted to copy and distribute verbatim or modified 7 | copies of this license document, and changing it is allowed as long 8 | as the name is changed. 9 | 10 | DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE 11 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 12 | 13 | 0. You just DO WHAT THE FUCK YOU WANT TO. 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Gemini for Claude Code: An Anthropic-Compatible Proxy 2 | 3 | This server acts as a bridge, enabling you to use **Claude Code** with Google's powerful **Gemini models**. It translates API requests and responses between the Anthropic format (used by Claude Code) and the Gemini format (via LiteLLM), allowing seamless integration. 4 | 5 | ![Claude Code with Gemini Proxy](image.png) 6 | 7 | ## Features 8 | 9 | - **Claude Code Compatibility with Gemini**: Directly use the Claude Code CLI with Google Gemini models. 10 | - **Seamless Model Mapping**: Intelligently maps Claude Code model requests (e.g., `haiku`, `sonnet`, `opus` aliases) to your chosen Gemini models. 11 | - **LiteLLM Integration**: Leverages LiteLLM for robust and flexible interaction with the Gemini API. 12 | - **Enhanced Streaming Support**: Handles streaming responses from Gemini with robust error recovery for malformed chunks and API errors. 13 | - **Complete Tool Use for Claude Code**: Translates Claude Code's tool usage (function calling) to and from Gemini's format, with robust handling of tool results. 14 | - **Advanced Error Handling**: Provides specific and actionable error messages for common Gemini API issues with automatic fallback strategies. 15 | - **Resilient Architecture**: Gracefully handles Gemini API instability with smart retry logic and fallback to non-streaming modes. 16 | - **Diagnostic Endpoints**: Includes `/health` and `/test-connection` for easier troubleshooting of your setup. 17 | - **Token Counting**: Offers a `/v1/messages/count_tokens` endpoint compatible with Claude Code. 
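
For a quick smoke test of the diagnostic and token-counting endpoints listed above, a minimal sketch using `httpx` (already pinned in `requirements.txt`) might look like this; it assumes the proxy is running locally on the default port 8082 from `.env.example`:

```python
# Minimal smoke test for the proxy's diagnostic and token-counting endpoints.
# Assumes the server is already running on the default HOST/PORT from .env.example.
import httpx

BASE_URL = "http://localhost:8082"  # adjust if you changed HOST or PORT

# Health check: reports API key configuration and streaming settings
health = httpx.get(f"{BASE_URL}/health", timeout=10)
print("health:", health.status_code, health.json())

# Connectivity test: makes a small real call to Gemini with your GEMINI_API_KEY
conn = httpx.get(f"{BASE_URL}/test-connection", timeout=30)
print("test-connection:", conn.status_code)

# Token counting, using the same Anthropic-style payload Claude Code sends
payload = {
    "model": "claude-3-haiku-20240307",  # mapped to SMALL_MODEL by the proxy
    "messages": [{"role": "user", "content": "Hello, Gemini!"}],
}
tokens = httpx.post(f"{BASE_URL}/v1/messages/count_tokens", json=payload, timeout=30)
print("count_tokens:", tokens.json())  # e.g. {"input_tokens": ...}
```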
18 | 19 | ## Recent Improvements (v2.5.0) 20 | 21 | ### 🛡️ Enhanced Error Resilience 22 | - **Malformed Chunk Recovery**: Automatically detects and handles malformed JSON chunks from Gemini streaming 23 | - **Smart Retry Logic**: Exponential backoff with configurable retry limits for streaming errors 24 | - **Graceful Fallback**: Seamlessly switches to non-streaming mode when streaming fails 25 | - **Buffer Management**: Intelligent chunk buffering and reconstruction for incomplete JSON 26 | - **Connection Stability**: Handles Gemini 500 Internal Server Errors with automatic retry 27 | 28 | ### 📊 Improved Monitoring 29 | - **Detailed Error Classification**: Specific guidance for different types of Gemini API errors 30 | - **Enhanced Logging**: Comprehensive error tracking with malformed chunk statistics 31 | - **Real-time Status**: Better health checks and connection testing 32 | 33 | ## Prerequisites 34 | 35 | - A Google Gemini API key. 36 | - Python 3.8+. 37 | - Claude Code CLI installed (e.g., `npm install -g @anthropic-ai/claude-code`). 38 | 39 | ## Setup 40 | 41 | 1. **Clone the repository**: 42 | ```bash 43 | git clone https://github.com/coffeegrind123/gemini-code.git # Or your fork 44 | cd gemini-code 45 | ``` 46 | 47 | 2. **Create and activate a virtual environment** (recommended): 48 | ```bash 49 | python3 -m venv .venv 50 | source .venv/bin/activate 51 | ``` 52 | 53 | 3. **Install dependencies**: 54 | ```bash 55 | pip install -r requirements.txt 56 | ``` 57 | 58 | 4. **Configure Environment Variables**: 59 | Copy the example environment file: 60 | ```bash 61 | cp .env.example .env 62 | ``` 63 | Edit `.env` and add your Gemini API key. You can also customize model mappings and server settings: 64 | ```dotenv 65 | # Required: Your Google AI Studio API key 66 | GEMINI_API_KEY="your-google-ai-studio-key" 67 | 68 | # Optional: Model mappings for Claude Code aliases 69 | BIG_MODEL="gemini-1.5-pro-latest" # For 'sonnet' or 'opus' requests 70 | SMALL_MODEL="gemini-1.5-flash-latest" # For 'haiku' requests 71 | 72 | # Optional: Server settings 73 | HOST="0.0.0.0" 74 | PORT="8082" 75 | LOG_LEVEL="WARNING" # DEBUG, INFO, WARNING, ERROR, CRITICAL 76 | 77 | # Optional: Performance and reliability settings 78 | MAX_TOKENS_LIMIT="8192" # Max tokens for Gemini responses 79 | REQUEST_TIMEOUT="90" # Request timeout in seconds 80 | MAX_RETRIES="2" # LiteLLM retries to Gemini 81 | MAX_STREAMING_RETRIES="12" # Streaming-specific retry attempts 82 | 83 | # Optional: Streaming control (use if experiencing issues) 84 | FORCE_DISABLE_STREAMING="false" # Disable streaming globally 85 | EMERGENCY_DISABLE_STREAMING="false" # Emergency streaming disable 86 | ``` 87 | 88 | 5. **Run the server**: 89 | The `server.py` script includes a `main()` function that starts the Uvicorn server: 90 | ```bash 91 | python server.py 92 | ``` 93 | For development with auto-reload (restarts when you save changes to `server.py`): 94 | ```bash 95 | uvicorn server:app --host 0.0.0.0 --port 8082 --reload 96 | ``` 97 | You can view all startup options, including configurable environment variables, by running: 98 | ```bash 99 | python server.py --help 100 | ``` 101 | 102 | ## Usage with Claude Code 103 | 104 | 1. **Start the Proxy Server**: Ensure the Gemini proxy server (this application) is running (see step 5 above). 105 | 106 | 2. 
**Configure Claude Code to Use the Proxy**: 107 | Set the `ANTHROPIC_BASE_URL` environment variable when running Claude Code: 108 | ```bash 109 | ANTHROPIC_BASE_URL=http://localhost:8082 claude 110 | ``` 111 | Replace `localhost:8082` if your proxy is running on a different host or port. 112 | 113 | 3. **Utilize `CLAUDE.md` for Optimal Gemini Performance (Crucial)**: 114 | - This repository includes a `CLAUDE.md` file. This file contains specific instructions and best practices tailored to help **Gemini** effectively understand and respond to **Claude Code's** unique command structure, tool usage patterns, and desired output formats. 115 | - **Copy `CLAUDE.md` into your project directory**: 116 | ```bash 117 | cp /path/to/gemini-code/CLAUDE.md /your/project/directory/ 118 | ``` 119 | - When Claude Code starts in a directory containing `CLAUDE.md`, it automatically reads this file and incorporates its content into the system prompt. This is essential for guiding Gemini to work optimally within the Claude Code environment. 120 | 121 | ## How It Works: Powering Claude Code with Gemini 122 | 123 | 1. **Claude Code Request**: You issue a command or prompt in the Claude Code CLI. 124 | 2. **Anthropic Format**: Claude Code sends an API request (in Anthropic's Messages API format) to the proxy server's address (`http://localhost:8082`). 125 | 3. **Proxy Translation (Anthropic to Gemini)**: The proxy server: 126 | * Receives the Anthropic-formatted request. 127 | * Validates it and maps any Claude model aliases (like `claude-3-sonnet...`) to the corresponding Gemini model specified in your `.env` (e.g., `gemini-1.5-pro-latest`). 128 | * Translates the message structure, content blocks, and tool definitions into a format LiteLLM can use with the Gemini API. 129 | 4. **LiteLLM to Gemini**: LiteLLM sends the prepared request to the target Gemini model using your `GEMINI_API_KEY`. 130 | 5. **Gemini Response**: Gemini processes the request and sends its response back through LiteLLM. 131 | 6. **Proxy Translation (Gemini to Anthropic)**: The proxy server: 132 | * Receives the Gemini response from LiteLLM (this can be a stream of events or a complete JSON object). 133 | * Handles streaming errors and malformed chunks with intelligent recovery. 134 | * Converts Gemini's output (text, tool calls, stop reasons) back into the Anthropic Messages API format that Claude Code expects. 135 | 7. **Response to Claude Code**: The proxy sends the Anthropic-formatted response back to your Claude Code client, which then displays the result or performs the requested action. 136 | 137 | ## Model Mapping for Claude Code 138 | 139 | To ensure Claude Code's model requests are handled correctly by Gemini: 140 | 141 | - Requests from Claude Code for model names containing **"haiku"** (e.g., `claude-3-haiku-20240307`) are mapped to the Gemini model specified by your `SMALL_MODEL` environment variable (default: `gemini-1.5-flash-latest`). 142 | - Requests from Claude Code for model names containing **"sonnet"** or **"opus"** (e.g., `claude-3-sonnet-20240229`, `claude-3-opus-20240229`) are mapped to the Gemini model specified by your `BIG_MODEL` environment variable (default: `gemini-1.5-pro-latest`). 143 | - If Claude Code requests a full Gemini model name (e.g., `gemini/gemini-1.5-pro-latest`), the proxy will use that directly. 144 | 145 | The server maintains a list of known Gemini models. If a recognized Gemini model is requested by the client without the `gemini/` prefix, the proxy will add it. 
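
In practice the alias rule reduces to something like the following sketch (the full logic lives in `ModelManager` inside `server.py`); `BIG_MODEL` and `SMALL_MODEL` stand in for whatever you configured in `.env`:

```python
# Simplified sketch of the alias-mapping rule from ModelManager in server.py.
# BIG_MODEL / SMALL_MODEL stand in for the values configured in your .env file.
BIG_MODEL = "gemini-1.5-pro-latest"
SMALL_MODEL = "gemini-1.5-flash-latest"

def map_model_alias(requested: str) -> str:
    """Map Claude Code model aliases to Gemini models and add the gemini/ prefix."""
    clean = requested[len("gemini/"):] if requested.startswith("gemini/") else requested
    lowered = clean.lower()
    if "haiku" in lowered:
        clean = SMALL_MODEL
    elif "sonnet" in lowered or "opus" in lowered:
        clean = BIG_MODEL
    return f"gemini/{clean}"  # LiteLLM routes the request to Gemini via this prefix

assert map_model_alias("claude-3-haiku-20240307") == "gemini/gemini-1.5-flash-latest"
assert map_model_alias("claude-3-opus-20240229") == "gemini/gemini-1.5-pro-latest"
assert map_model_alias("gemini/gemini-1.5-pro-latest") == "gemini/gemini-1.5-pro-latest"
```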
146 | 147 | ## Endpoints 148 | 149 | - `POST /v1/messages`: The primary endpoint for Claude Code to send messages to Gemini. It's fully compatible with the Anthropic Messages API specification that Claude Code uses. 150 | - `POST /v1/messages/count_tokens`: Allows Claude Code to estimate the token count for a set of messages, using Gemini's tokenization. 151 | - `GET /health`: Returns the health status of the proxy, including API key configuration, streaming settings, and basic API key validation. 152 | - `GET /test-connection`: Performs a quick API call to Gemini to verify connectivity and that your `GEMINI_API_KEY` is working. 153 | - `GET /`: Root endpoint providing a welcome message, current configuration summary (models, limits), and available endpoints. 154 | 155 | ## Error Handling & Troubleshooting 156 | 157 | ### Common Issues and Solutions 158 | 159 | **Streaming Errors (malformed chunks):** 160 | - The proxy automatically handles malformed JSON chunks from Gemini 161 | - If streaming becomes unstable, set `FORCE_DISABLE_STREAMING=true` as a temporary fix 162 | - Increase `MAX_STREAMING_RETRIES` for more resilient streaming 163 | 164 | **Gemini 500 Internal Server Errors:** 165 | - The proxy automatically retries with exponential backoff 166 | - These are temporary Gemini API issues that resolve automatically 167 | - Check `/health` endpoint to monitor API status 168 | 169 | **Connection Timeouts:** 170 | - Increase `REQUEST_TIMEOUT` if experiencing frequent timeouts 171 | - Check your internet connection and firewall settings 172 | - Use `/test-connection` endpoint to verify API connectivity 173 | 174 | **Rate Limiting:** 175 | - Monitor your Google AI Studio quota in the Google Cloud Console 176 | - The proxy will provide specific rate limit guidance in error messages 177 | 178 | ### Emergency Mode 179 | 180 | If you experience persistent issues: 181 | ```bash 182 | # Disable streaming temporarily 183 | export EMERGENCY_DISABLE_STREAMING=true 184 | 185 | # Or force disable all streaming 186 | export FORCE_DISABLE_STREAMING=true 187 | ``` 188 | 189 | ## Logging 190 | 191 | The server provides detailed logs, which are especially useful for understanding how Claude Code requests are translated for Gemini and for monitoring error recovery. Logs are colorized in TTY environments for easier reading. Adjust verbosity with the `LOG_LEVEL` environment variable: 192 | 193 | - `DEBUG`: Detailed request/response logging and error recovery steps 194 | - `INFO`: General operation logging 195 | - `WARNING`: Error recovery and fallback notifications (recommended) 196 | - `ERROR`: Only errors and failures 197 | - `CRITICAL`: Only critical failures 198 | 199 | ## The `CLAUDE.MD` File: Guiding Gemini for Claude Code 200 | 201 | The `CLAUDE.MD` file included in this repository is critical for achieving the best experience when using this proxy with Claude Code and Gemini. 202 | 203 | **Purpose:** 204 | 205 | - **Tailors Gemini to Claude Code's Needs**: Claude Code has specific ways it expects an LLM to behave, especially regarding tool use, file operations, and output formatting. `CLAUDE.MD` provides Gemini with explicit instructions on these expectations. 206 | - **Improves Tool Reliability**: By outlining how tools should be called and results interpreted, it helps Gemini make more effective use of Claude Code's capabilities. 
207 | - **Enhances Code Generation & Understanding**: Gives Gemini context about the development environment and coding standards, leading to better code suggestions within Claude Code. 208 | - **Reduces Misinterpretations**: Helps bridge any gaps between how Anthropic models might interpret Claude Code directives versus how Gemini might. 209 | 210 | **How Claude Code Uses It:** 211 | 212 | When you run `claude` in a project directory, the Claude Code CLI automatically looks for a `CLAUDE.MD` file in that directory. If found, its contents are prepended to the system prompt for every request sent to the LLM (in this case, your Gemini proxy). 213 | 214 | **Recommendation:** Always copy the `CLAUDE.MD` from this proxy's repository into the root of any project where you intend to use Claude Code with this Gemini proxy. This ensures Gemini receives these vital instructions for every session. 215 | 216 | ## Performance Tips 217 | 218 | - **Model Selection**: Use `gemini-1.5-flash-latest` for faster responses, `gemini-1.5-pro-latest` for more complex tasks 219 | - **Streaming**: Keep streaming enabled for better interactivity; the proxy handles errors automatically 220 | - **Timeouts**: Increase `REQUEST_TIMEOUT` for complex requests that need more processing time 221 | - **Retries**: Adjust `MAX_STREAMING_RETRIES` based on your network stability 222 | 223 | ## Contributing 224 | 225 | Contributions, issues, and feature requests are welcome! Please submit them on the GitHub repository. 226 | 227 | Areas where contributions are especially valuable: 228 | - Additional Gemini model support 229 | - Performance optimizations 230 | - Enhanced error recovery strategies 231 | - Documentation improvements 232 | 233 | ## Thanks 234 | 235 | This project was heavily inspired by and builds upon the foundational work of the [claude-code-proxy by @1rgs](https://github.com/1rgs/claude-code-proxy). Their original proxy was instrumental in demonstrating the viability of such a bridge. 236 | 237 | Special thanks to the community for testing and feedback on error handling improvements. 
238 | -------------------------------------------------------------------------------- /image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coffeegrind123/gemini-code/69bb0c18f7a3b8f8c448576a8480189a06a53ea4/image.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi[standard]>=0.115.11 2 | uvicorn>=0.34.0 3 | httpx>=0.25.0 4 | pydantic>=2.0.0 5 | litellm>=1.40.14 6 | python-dotenv>=1.0.0 -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, Request, HTTPException 2 | import uvicorn 3 | import logging 4 | import json 5 | import re 6 | import asyncio 7 | from pydantic import BaseModel, Field, field_validator 8 | from typing import List, Dict, Any, Optional, Union, Literal, Set 9 | import os 10 | from fastapi.responses import JSONResponse, StreamingResponse 11 | import litellm 12 | import uuid 13 | import time 14 | from dotenv import load_dotenv 15 | from datetime import datetime 16 | import sys 17 | 18 | # Load environment variables early 19 | load_dotenv() 20 | 21 | # Basic LiteLLM Configuration - conservative settings to avoid hanging 22 | litellm.drop_params = True 23 | litellm.set_verbose = False 24 | litellm.request_timeout = 90 25 | 26 | # Constants for better maintainability 27 | class Constants: 28 | ROLE_USER = "user" 29 | ROLE_ASSISTANT = "assistant" 30 | ROLE_SYSTEM = "system" 31 | ROLE_TOOL = "tool" 32 | 33 | CONTENT_TEXT = "text" 34 | CONTENT_IMAGE = "image" 35 | CONTENT_TOOL_USE = "tool_use" 36 | CONTENT_TOOL_RESULT = "tool_result" 37 | 38 | TOOL_FUNCTION = "function" 39 | 40 | STOP_END_TURN = "end_turn" 41 | STOP_MAX_TOKENS = "max_tokens" 42 | STOP_TOOL_USE = "tool_use" 43 | STOP_ERROR = "error" 44 | 45 | EVENT_MESSAGE_START = "message_start" 46 | EVENT_MESSAGE_STOP = "message_stop" 47 | EVENT_MESSAGE_DELTA = "message_delta" 48 | EVENT_CONTENT_BLOCK_START = "content_block_start" 49 | EVENT_CONTENT_BLOCK_STOP = "content_block_stop" 50 | EVENT_CONTENT_BLOCK_DELTA = "content_block_delta" 51 | EVENT_PING = "ping" 52 | 53 | DELTA_TEXT = "text_delta" 54 | DELTA_INPUT_JSON = "input_json_delta" 55 | 56 | # Simple Configuration 57 | class Config: 58 | def __init__(self): 59 | self.gemini_api_key = os.environ.get("GEMINI_API_KEY") 60 | if not self.gemini_api_key: 61 | raise ValueError("GEMINI_API_KEY not found in environment variables") 62 | 63 | self.big_model = os.environ.get("BIG_MODEL", "gemini-1.5-pro-latest") 64 | self.small_model = os.environ.get("SMALL_MODEL", "gemini-1.5-flash-latest") 65 | self.host = os.environ.get("HOST", "0.0.0.0") 66 | self.port = int(os.environ.get("PORT", "8082")) 67 | self.log_level = os.environ.get("LOG_LEVEL", "WARNING") 68 | self.max_tokens_limit = int(os.environ.get("MAX_TOKENS_LIMIT", "8192")) 69 | 70 | # Connection settings - conservative defaults 71 | self.request_timeout = int(os.environ.get("REQUEST_TIMEOUT", "90")) 72 | self.max_retries = int(os.environ.get("MAX_RETRIES", "2")) 73 | 74 | # Streaming settings 75 | self.max_streaming_retries = int(os.environ.get("MAX_STREAMING_RETRIES", "12")) 76 | self.force_disable_streaming = os.environ.get("FORCE_DISABLE_STREAMING", "false").lower() == "true" 77 | self.emergency_disable_streaming = os.environ.get("EMERGENCY_DISABLE_STREAMING", 
"false").lower() == "true" 78 | 79 | def validate_api_key(self): 80 | """Basic API key validation""" 81 | if not self.gemini_api_key: 82 | return False 83 | # Basic format check for Google API keys 84 | if not (self.gemini_api_key.startswith('AIza') and len(self.gemini_api_key) == 39): 85 | return False 86 | return True 87 | 88 | try: 89 | config = Config() 90 | print(f"✅ Configuration loaded: API_KEY={'*' * 20}..., BIG_MODEL='{config.big_model}', SMALL_MODEL='{config.small_model}'") 91 | except Exception as e: 92 | print(f"🔴 Configuration Error: {e}") 93 | sys.exit(1) 94 | 95 | # Apply connection settings to LiteLLM 96 | litellm.request_timeout = config.request_timeout 97 | litellm.num_retries = config.max_retries 98 | 99 | # Model Management 100 | class ModelManager: 101 | def __init__(self, config): 102 | self.config = config 103 | self.base_gemini_models = [ 104 | "gemini-1.5-pro-latest", 105 | "gemini-1.5-pro-preview-0514", 106 | "gemini-1.5-flash-latest", 107 | "gemini-1.5-flash-preview-0514", 108 | "gemini-pro", 109 | "gemini-2.5-pro-preview-05-06", 110 | "gemini-2.5-flash-preview-04-17", 111 | "gemini-2.0-flash-exp", 112 | "gemini-exp-1206" 113 | ] 114 | self._gemini_models = set(self.base_gemini_models) 115 | self._add_env_models() 116 | 117 | def _add_env_models(self): 118 | for model in [self.config.big_model, self.config.small_model]: 119 | if model.startswith("gemini") and model not in self._gemini_models: 120 | self._gemini_models.add(model) 121 | 122 | @property 123 | def gemini_models(self) -> List[str]: 124 | return sorted(list(self._gemini_models)) 125 | 126 | def validate_and_map_model(self, original_model: str) -> tuple[str, bool]: 127 | clean_model = self._clean_model_name(original_model) 128 | mapped_model = self._map_model_alias(clean_model) 129 | 130 | if mapped_model != clean_model: 131 | return f"gemini/{mapped_model}", True 132 | elif clean_model in self._gemini_models: 133 | return f"gemini/{clean_model}", True 134 | elif not original_model.startswith('gemini/'): 135 | return f"gemini/{original_model}", False 136 | else: 137 | return original_model, False 138 | 139 | def _clean_model_name(self, model: str) -> str: 140 | if model.startswith('gemini/'): 141 | return model[7:] 142 | elif model.startswith('anthropic/'): 143 | return model[10:] 144 | elif model.startswith('openai/'): 145 | return model[7:] 146 | return model 147 | 148 | def _map_model_alias(self, clean_model: str) -> str: 149 | model_lower = clean_model.lower() 150 | 151 | if 'haiku' in model_lower: 152 | return self.config.small_model 153 | elif 'sonnet' in model_lower or 'opus' in model_lower: 154 | return self.config.big_model 155 | 156 | return clean_model 157 | 158 | model_manager = ModelManager(config) 159 | 160 | # Logging Configuration 161 | logging.basicConfig( 162 | level=getattr(logging, config.log_level.upper()), 163 | format='%(asctime)s - %(levelname)s - %(message)s', 164 | ) 165 | logger = logging.getLogger(__name__) 166 | 167 | # Simple message filter 168 | class SimpleMessageFilter(logging.Filter): 169 | def filter(self, record): 170 | blocked_phrases = [ 171 | "LiteLLM completion()", 172 | "HTTP Request:", 173 | "cost_calculator" 174 | ] 175 | if hasattr(record, 'msg') and isinstance(record.msg, str): 176 | return not any(phrase in record.msg for phrase in blocked_phrases) 177 | return True 178 | 179 | root_logger = logging.getLogger() 180 | root_logger.addFilter(SimpleMessageFilter()) 181 | 182 | # Configure uvicorn to be quieter 183 | for uvicorn_logger in ["uvicorn", 
"uvicorn.access", "uvicorn.error"]: 184 | logging.getLogger(uvicorn_logger).setLevel(logging.WARNING) 185 | 186 | app = FastAPI(title="Gemini-to-Claude API Proxy", version="2.5.0") 187 | 188 | # Enhanced error classification 189 | def classify_gemini_error(error_msg: str) -> str: 190 | """Provide specific error guidance for common Gemini issues.""" 191 | error_lower = error_msg.lower() 192 | 193 | # Streaming/parsing errors 194 | if "error parsing chunk" in error_lower and "expecting property name" in error_lower: 195 | return "Gemini streaming parsing error (malformed JSON chunk). This is a known intermittent Gemini API issue. Please try again or disable streaming by setting stream=false." 196 | 197 | # Tool schema validation errors 198 | if "function_declarations" in error_lower and "format" in error_lower: 199 | if "only 'enum' and 'date-time' are supported" in error_lower: 200 | return "Tool schema error: Gemini only supports 'enum' and 'date-time' formats for string parameters. Remove other format types like 'url', 'email', 'uri', etc." 201 | else: 202 | return "Tool schema validation error. Check your tool parameter definitions for unsupported format types or properties." 203 | 204 | # Rate limiting 205 | elif "rate limit" in error_lower or "quota" in error_lower: 206 | return "Rate limit or quota exceeded. Please wait a moment and try again. Check your Google Cloud Console for quota limits." 207 | 208 | # Authentication issues 209 | elif "api key" in error_lower or "authentication" in error_lower or "unauthorized" in error_lower: 210 | return "API key error. Please check that your GEMINI_API_KEY is valid and has the necessary permissions." 211 | 212 | # Parsing/streaming issues 213 | elif "parsing" in error_lower or "json" in error_lower or "malformed" in error_lower: 214 | return "Response parsing error. This is often a temporary Gemini API issue - please retry your request." 215 | 216 | # Connection issues 217 | elif "connection" in error_lower or "timeout" in error_lower: 218 | return "Connection or timeout error. Please check your internet connection and try again." 219 | 220 | # Safety/content filtering 221 | elif "safety" in error_lower or "content" in error_lower and "filter" in error_lower: 222 | return "Content filtered by Gemini's safety systems. Please modify your request to comply with content policies." 223 | 224 | # Token/length issues 225 | elif "token" in error_lower and ("limit" in error_lower or "exceed" in error_lower): 226 | return "Token limit exceeded. Please reduce the length of your request or increase the max_tokens parameter." 
227 | 228 | # Default: return original message 229 | return error_msg 230 | 231 | # Enhanced schema cleaner 232 | def clean_gemini_schema(schema: Any) -> Any: 233 | """Recursively removes unsupported fields from a JSON schema for Gemini compatibility.""" 234 | if isinstance(schema, dict): 235 | # Remove fields unsupported by Gemini 236 | schema.pop("additionalProperties", None) 237 | schema.pop("default", None) 238 | 239 | # Handle string format restrictions 240 | if schema.get("type") == "string" and "format" in schema: 241 | allowed_formats = {"enum", "date-time"} 242 | if schema["format"] not in allowed_formats: 243 | logger.debug(f"Removing unsupported format '{schema['format']}' for string type in Gemini schema") 244 | schema.pop("format") 245 | 246 | # Recursively clean nested schemas 247 | for key, value in list(schema.items()): 248 | schema[key] = clean_gemini_schema(value) 249 | 250 | elif isinstance(schema, list): 251 | return [clean_gemini_schema(item) for item in schema] 252 | 253 | return schema 254 | 255 | # Pydantic Models 256 | class ContentBlockText(BaseModel): 257 | type: Literal["text"] 258 | text: str 259 | 260 | class ContentBlockImage(BaseModel): 261 | type: Literal["image"] 262 | source: Dict[str, Any] 263 | 264 | class ContentBlockToolUse(BaseModel): 265 | type: Literal["tool_use"] 266 | id: str 267 | name: str 268 | input: Dict[str, Any] 269 | 270 | class ContentBlockToolResult(BaseModel): 271 | type: Literal["tool_result"] 272 | tool_use_id: str 273 | content: Union[str, List[Dict[str, Any]], Dict[str, Any]] 274 | 275 | class SystemContent(BaseModel): 276 | type: Literal["text"] 277 | text: str 278 | 279 | class Message(BaseModel): 280 | role: Literal["user", "assistant"] 281 | content: Union[str, List[Union[ContentBlockText, ContentBlockImage, ContentBlockToolUse, ContentBlockToolResult]]] 282 | 283 | class Tool(BaseModel): 284 | name: str 285 | description: Optional[str] = None 286 | input_schema: Dict[str, Any] 287 | 288 | class ThinkingConfig(BaseModel): 289 | enabled: bool = True 290 | 291 | class MessagesRequest(BaseModel): 292 | model: str 293 | max_tokens: int 294 | messages: List[Message] 295 | system: Optional[Union[str, List[SystemContent]]] = None 296 | stop_sequences: Optional[List[str]] = None 297 | stream: Optional[bool] = False 298 | temperature: Optional[float] = 1.0 299 | top_p: Optional[float] = None 300 | top_k: Optional[int] = None 301 | metadata: Optional[Dict[str, Any]] = None 302 | tools: Optional[List[Tool]] = None 303 | tool_choice: Optional[Dict[str, Any]] = None 304 | thinking: Optional[ThinkingConfig] = None 305 | original_model: Optional[str] = None 306 | 307 | @field_validator('model') 308 | @classmethod 309 | def validate_model_field(cls, v, info): 310 | original_model = v 311 | mapped_model, was_mapped = model_manager.validate_and_map_model(v) 312 | 313 | logger.debug(f"📋 MODEL VALIDATION: Original='{original_model}', Big='{config.big_model}', Small='{config.small_model}'") 314 | 315 | if was_mapped: 316 | logger.debug(f"📌 MODEL MAPPING: '{original_model}' ➡️ '{mapped_model}'") 317 | 318 | if info and hasattr(info, 'data') and isinstance(info.data, dict): 319 | info.data['original_model'] = original_model 320 | 321 | return mapped_model 322 | 323 | class TokenCountRequest(BaseModel): 324 | model: str 325 | messages: List[Message] 326 | system: Optional[Union[str, List[SystemContent]]] = None 327 | tools: Optional[List[Tool]] = None 328 | thinking: Optional[ThinkingConfig] = None 329 | tool_choice: Optional[Dict[str, Any]] = None 
330 | original_model: Optional[str] = None 331 | 332 | @field_validator('model') 333 | @classmethod 334 | def validate_model_token_count(cls, v, info): 335 | mapped_model, _ = model_manager.validate_and_map_model(v) 336 | if info and hasattr(info, 'data') and isinstance(info.data, dict): 337 | info.data['original_model'] = v 338 | return mapped_model 339 | 340 | class TokenCountResponse(BaseModel): 341 | input_tokens: int 342 | 343 | class Usage(BaseModel): 344 | input_tokens: int 345 | output_tokens: int 346 | cache_creation_input_tokens: int = 0 347 | cache_read_input_tokens: int = 0 348 | 349 | class MessagesResponse(BaseModel): 350 | id: str 351 | model: str 352 | role: Literal["assistant"] = Constants.ROLE_ASSISTANT 353 | content: List[Union[ContentBlockText, ContentBlockToolUse]] 354 | type: Literal["message"] = "message" 355 | stop_reason: Optional[Literal["end_turn", "max_tokens", "stop_sequence", "tool_use", "error"]] = None 356 | stop_sequence: Optional[str] = None 357 | usage: Usage 358 | 359 | # Tool result parsing 360 | def parse_tool_result_content(content): 361 | """Parse and normalize tool result content into a string format.""" 362 | if content is None: 363 | return "No content provided" 364 | 365 | if isinstance(content, str): 366 | return content 367 | 368 | if isinstance(content, list): 369 | result_parts = [] 370 | for item in content: 371 | if isinstance(item, dict) and item.get("type") == Constants.CONTENT_TEXT: 372 | result_parts.append(item.get("text", "")) 373 | elif isinstance(item, str): 374 | result_parts.append(item) 375 | elif isinstance(item, dict): 376 | if "text" in item: 377 | result_parts.append(item.get("text", "")) 378 | else: 379 | try: 380 | result_parts.append(json.dumps(item)) 381 | except: 382 | result_parts.append(str(item)) 383 | return "\n".join(result_parts).strip() 384 | 385 | if isinstance(content, dict): 386 | if content.get("type") == Constants.CONTENT_TEXT: 387 | return content.get("text", "") 388 | try: 389 | return json.dumps(content) 390 | except: 391 | return str(content) 392 | 393 | try: 394 | return str(content) 395 | except: 396 | return "Unparseable content" 397 | 398 | # Enhanced message conversion 399 | def convert_anthropic_to_litellm(anthropic_request: MessagesRequest) -> Dict[str, Any]: 400 | """Convert Anthropic API request format to LiteLLM format for Gemini.""" 401 | litellm_messages = [] 402 | 403 | # System message handling 404 | if anthropic_request.system: 405 | system_text = "" 406 | if isinstance(anthropic_request.system, str): 407 | system_text = anthropic_request.system 408 | elif isinstance(anthropic_request.system, list): 409 | text_parts = [] 410 | for block in anthropic_request.system: 411 | if hasattr(block, 'type') and block.type == Constants.CONTENT_TEXT: 412 | text_parts.append(block.text) 413 | elif isinstance(block, dict) and block.get("type") == Constants.CONTENT_TEXT: 414 | text_parts.append(block.get("text", "")) 415 | system_text = "\n\n".join(text_parts) 416 | 417 | if system_text.strip(): 418 | litellm_messages.append({"role": Constants.ROLE_SYSTEM, "content": system_text.strip()}) 419 | 420 | # Process messages 421 | for msg in anthropic_request.messages: 422 | if isinstance(msg.content, str): 423 | litellm_messages.append({"role": msg.role, "content": msg.content}) 424 | continue 425 | 426 | # Process content blocks - accumulate different types 427 | text_parts = [] 428 | image_parts = [] 429 | tool_calls = [] 430 | pending_tool_messages = [] 431 | 432 | for block in msg.content: 433 | if 
block.type == Constants.CONTENT_TEXT: 434 | text_parts.append(block.text) 435 | elif block.type == Constants.CONTENT_IMAGE: 436 | if (isinstance(block.source, dict) and 437 | block.source.get("type") == "base64" and 438 | "media_type" in block.source and "data" in block.source): 439 | image_parts.append({ 440 | "type": "image_url", 441 | "image_url": { 442 | "url": f"data:{block.source['media_type']};base64,{block.source['data']}" 443 | } 444 | }) 445 | elif block.type == Constants.CONTENT_TOOL_USE and msg.role == Constants.ROLE_ASSISTANT: 446 | tool_calls.append({ 447 | "id": block.id, 448 | "type": Constants.TOOL_FUNCTION, 449 | Constants.TOOL_FUNCTION: { 450 | "name": block.name, 451 | "arguments": json.dumps(block.input) 452 | } 453 | }) 454 | elif block.type == Constants.CONTENT_TOOL_RESULT and msg.role == Constants.ROLE_USER: 455 | # CRITICAL: Split user message when tool_result is encountered 456 | if text_parts or image_parts: 457 | content_parts = [] 458 | text_content = "".join(text_parts).strip() 459 | if text_content: 460 | content_parts.append({"type": Constants.CONTENT_TEXT, "text": text_content}) 461 | content_parts.extend(image_parts) 462 | 463 | litellm_messages.append({ 464 | "role": Constants.ROLE_USER, 465 | "content": content_parts[0]["text"] if len(content_parts) == 1 and content_parts[0]["type"] == Constants.CONTENT_TEXT else content_parts 466 | }) 467 | text_parts.clear() 468 | image_parts.clear() 469 | 470 | # Add tool result as separate "tool" role message 471 | parsed_content = parse_tool_result_content(block.content) 472 | pending_tool_messages.append({ 473 | "role": Constants.ROLE_TOOL, 474 | "tool_call_id": block.tool_use_id, 475 | "content": parsed_content 476 | }) 477 | 478 | # Finalize message based on role 479 | if msg.role == Constants.ROLE_USER: 480 | # Add any remaining text/image content 481 | if text_parts or image_parts: 482 | content_parts = [] 483 | text_content = "".join(text_parts).strip() 484 | if text_content: 485 | content_parts.append({"type": Constants.CONTENT_TEXT, "text": text_content}) 486 | content_parts.extend(image_parts) 487 | 488 | litellm_messages.append({ 489 | "role": Constants.ROLE_USER, 490 | "content": content_parts[0]["text"] if len(content_parts) == 1 and content_parts[0]["type"] == Constants.CONTENT_TEXT else content_parts 491 | }) 492 | # Add any pending tool messages 493 | litellm_messages.extend(pending_tool_messages) 494 | 495 | elif msg.role == Constants.ROLE_ASSISTANT: 496 | assistant_msg = {"role": Constants.ROLE_ASSISTANT} 497 | 498 | # Handle content for assistant messages 499 | content_parts = [] 500 | text_content = "".join(text_parts).strip() 501 | if text_content: 502 | content_parts.append({"type": Constants.CONTENT_TEXT, "text": text_content}) 503 | content_parts.extend(image_parts) 504 | 505 | # FIXED: Don't set content to None - let LiteLLM handle missing content 506 | if content_parts: 507 | assistant_msg["content"] = content_parts[0]["text"] if len(content_parts) == 1 and content_parts[0]["type"] == Constants.CONTENT_TEXT else content_parts 508 | else: 509 | assistant_msg["content"] = None 510 | 511 | if tool_calls: 512 | assistant_msg["tool_calls"] = tool_calls 513 | 514 | # Only add message if it has actual content or tool calls 515 | if assistant_msg.get("content") or assistant_msg.get("tool_calls"): 516 | litellm_messages.append(assistant_msg) 517 | 518 | # Build final LiteLLM request 519 | litellm_request = { 520 | "model": anthropic_request.model, 521 | "messages": litellm_messages, 522 | 
"max_tokens": min(anthropic_request.max_tokens, config.max_tokens_limit), 523 | "temperature": anthropic_request.temperature, 524 | "stream": anthropic_request.stream, 525 | } 526 | 527 | # Add optional parameters 528 | if anthropic_request.stop_sequences: 529 | litellm_request["stop"] = anthropic_request.stop_sequences 530 | if anthropic_request.top_p is not None: 531 | litellm_request["top_p"] = anthropic_request.top_p 532 | if anthropic_request.top_k is not None: 533 | litellm_request["topK"] = anthropic_request.top_k 534 | 535 | # Add tools with schema cleaning 536 | if anthropic_request.tools: 537 | valid_tools = [] 538 | for tool in anthropic_request.tools: 539 | if tool.name and tool.name.strip(): 540 | cleaned_schema = clean_gemini_schema(tool.input_schema) 541 | valid_tools.append({ 542 | "type": Constants.TOOL_FUNCTION, 543 | Constants.TOOL_FUNCTION: { 544 | "name": tool.name, 545 | "description": tool.description or "", 546 | "parameters": cleaned_schema 547 | } 548 | }) 549 | if valid_tools: 550 | litellm_request["tools"] = valid_tools 551 | 552 | # Add tool choice configuration 553 | if anthropic_request.tool_choice: 554 | choice_type = anthropic_request.tool_choice.get("type") 555 | if choice_type == "auto": 556 | litellm_request["tool_choice"] = "auto" 557 | elif choice_type == "any": 558 | litellm_request["tool_choice"] = "auto" 559 | elif choice_type == "tool" and "name" in anthropic_request.tool_choice: 560 | litellm_request["tool_choice"] = { 561 | "type": Constants.TOOL_FUNCTION, 562 | Constants.TOOL_FUNCTION: {"name": anthropic_request.tool_choice["name"]} 563 | } 564 | else: 565 | litellm_request["tool_choice"] = "auto" 566 | 567 | # Add thinking configuration (Gemini specific) 568 | if anthropic_request.thinking is not None: 569 | if anthropic_request.thinking.enabled: 570 | litellm_request["thinkingConfig"] = {"thinkingBudget": 24576} 571 | else: 572 | litellm_request["thinkingConfig"] = {"thinkingBudget": 0} 573 | 574 | # Add user metadata if provided 575 | if (anthropic_request.metadata and 576 | "user_id" in anthropic_request.metadata and 577 | isinstance(anthropic_request.metadata["user_id"], str)): 578 | litellm_request["user"] = anthropic_request.metadata["user_id"] 579 | 580 | return litellm_request 581 | 582 | # Response conversion 583 | def convert_litellm_to_anthropic(litellm_response, original_request: MessagesRequest) -> MessagesResponse: 584 | """Convert LiteLLM (Gemini) response back to Anthropic API format.""" 585 | try: 586 | # Extract response data safely 587 | response_id = f"msg_{uuid.uuid4()}" 588 | content_text = "" 589 | tool_calls = None 590 | finish_reason = "stop" 591 | prompt_tokens = 0 592 | completion_tokens = 0 593 | 594 | # Handle LiteLLM ModelResponse object format 595 | if hasattr(litellm_response, 'choices') and hasattr(litellm_response, 'usage'): 596 | choices = litellm_response.choices 597 | message = choices[0].message if choices else None 598 | content_text = getattr(message, 'content', "") or "" 599 | tool_calls = getattr(message, 'tool_calls', None) 600 | finish_reason = choices[0].finish_reason if choices else "stop" 601 | response_id = getattr(litellm_response, 'id', response_id) 602 | 603 | if hasattr(litellm_response, 'usage'): 604 | usage = litellm_response.usage 605 | prompt_tokens = getattr(usage, "prompt_tokens", 0) 606 | completion_tokens = getattr(usage, "completion_tokens", 0) 607 | 608 | # Handle dictionary response format 609 | elif isinstance(litellm_response, dict): 610 | choices = litellm_response.get("choices", 
[]) 611 | message = choices[0].get("message", {}) if choices else {} 612 | content_text = message.get("content", "") or "" 613 | tool_calls = message.get("tool_calls") 614 | finish_reason = choices[0].get("finish_reason", "stop") if choices else "stop" 615 | usage = litellm_response.get("usage", {}) 616 | prompt_tokens = usage.get("prompt_tokens", 0) 617 | completion_tokens = usage.get("completion_tokens", 0) 618 | response_id = litellm_response.get("id", response_id) 619 | 620 | # Build content blocks 621 | content_blocks = [] 622 | 623 | # Add text content if present 624 | if content_text: 625 | content_blocks.append(ContentBlockText(type=Constants.CONTENT_TEXT, text=content_text)) 626 | 627 | # Process tool calls 628 | if tool_calls: 629 | if not isinstance(tool_calls, list): 630 | tool_calls = [tool_calls] 631 | 632 | for tool_call in tool_calls: 633 | try: 634 | # Extract tool call data from different formats 635 | if isinstance(tool_call, dict): 636 | tool_id = tool_call.get("id", f"tool_{uuid.uuid4()}") 637 | function_data = tool_call.get(Constants.TOOL_FUNCTION, {}) 638 | name = function_data.get("name", "") 639 | arguments_str = function_data.get("arguments", "{}") 640 | elif hasattr(tool_call, "id") and hasattr(tool_call, Constants.TOOL_FUNCTION): 641 | tool_id = tool_call.id 642 | name = tool_call.function.name 643 | arguments_str = tool_call.function.arguments 644 | else: 645 | continue 646 | 647 | if not name: 648 | continue 649 | 650 | # Parse tool arguments safely 651 | try: 652 | arguments_dict = json.loads(arguments_str) 653 | except json.JSONDecodeError: 654 | arguments_dict = {"raw_arguments": arguments_str} 655 | 656 | content_blocks.append(ContentBlockToolUse( 657 | type=Constants.CONTENT_TOOL_USE, 658 | id=tool_id, 659 | name=name, 660 | input=arguments_dict 661 | )) 662 | except Exception as e: 663 | logger.warning(f"Error processing tool call: {e}") 664 | continue 665 | 666 | # Ensure at least one content block 667 | if not content_blocks: 668 | content_blocks.append(ContentBlockText(type=Constants.CONTENT_TEXT, text="")) 669 | 670 | # Map finish reason to Anthropic format 671 | if finish_reason == "length": 672 | stop_reason = Constants.STOP_MAX_TOKENS 673 | elif finish_reason == "tool_calls": 674 | stop_reason = Constants.STOP_TOOL_USE 675 | elif finish_reason is None and tool_calls: 676 | stop_reason = Constants.STOP_TOOL_USE 677 | else: 678 | stop_reason = Constants.STOP_END_TURN 679 | 680 | return MessagesResponse( 681 | id=response_id, 682 | model=original_request.original_model or original_request.model, 683 | role=Constants.ROLE_ASSISTANT, 684 | content=content_blocks, 685 | stop_reason=stop_reason, 686 | stop_sequence=None, 687 | usage=Usage( 688 | input_tokens=prompt_tokens, 689 | output_tokens=completion_tokens 690 | ) 691 | ) 692 | 693 | except Exception as e: 694 | logger.error(f"Error converting response: {e}") 695 | return MessagesResponse( 696 | id=f"msg_error_{uuid.uuid4()}", 697 | model=original_request.original_model or original_request.model, 698 | role=Constants.ROLE_ASSISTANT, 699 | content=[ContentBlockText(type=Constants.CONTENT_TEXT, text="Response conversion error")], 700 | stop_reason=Constants.STOP_ERROR, 701 | usage=Usage(input_tokens=0, output_tokens=0) 702 | ) 703 | 704 | # Enhanced streaming handler with more robust error recovery 705 | async def handle_streaming_with_recovery(response_generator, original_request: MessagesRequest): 706 | """Enhanced streaming handler with robust error recovery for malformed chunks.""" 707 | message_id 
= f"msg_{uuid.uuid4().hex[:24]}" 708 | 709 | # Send initial SSE events 710 | yield f"event: {Constants.EVENT_MESSAGE_START}\ndata: {json.dumps({'type': Constants.EVENT_MESSAGE_START, 'message': {'id': message_id, 'type': 'message', 'role': Constants.ROLE_ASSISTANT, 'model': original_request.original_model or original_request.model, 'content': [], 'stop_reason': None, 'stop_sequence': None, 'usage': {'input_tokens': 0, 'output_tokens': 0}}})}\n\n" 711 | 712 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_START}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_START, 'index': 0, 'content_block': {'type': Constants.CONTENT_TEXT, 'text': ''}})}\n\n" 713 | 714 | yield f"event: {Constants.EVENT_PING}\ndata: {json.dumps({'type': Constants.EVENT_PING})}\n\n" 715 | 716 | # Streaming state management 717 | accumulated_text = "" 718 | text_block_index = 0 719 | tool_block_counter = 0 720 | current_tool_calls = {} 721 | input_tokens = 0 722 | output_tokens = 0 723 | final_stop_reason = Constants.STOP_END_TURN 724 | 725 | # Enhanced error recovery tracking 726 | consecutive_errors = 0 727 | max_consecutive_errors = 10 # Increased from 5 728 | stream_terminated_early = False 729 | malformed_chunks_count = 0 730 | max_malformed_chunks = 20 # Allow more malformed chunks before giving up 731 | 732 | # Buffer for incomplete chunks 733 | chunk_buffer = "" 734 | 735 | def is_malformed_chunk(chunk_str: str) -> bool: 736 | """Enhanced malformed chunk detection.""" 737 | if not chunk_str or not isinstance(chunk_str, str): 738 | return True 739 | 740 | chunk_stripped = chunk_str.strip() 741 | 742 | # Empty or whitespace 743 | if not chunk_stripped: 744 | return True 745 | 746 | # Single characters that indicate malformed JSON 747 | malformed_singles = ["{", "}", "[", "]", ",", ":", '"', "'"] 748 | if chunk_stripped in malformed_singles: 749 | return True 750 | 751 | # Common malformed patterns 752 | malformed_patterns = [ 753 | '{"', '"}', "[{", "}]", "{}", "[]", 754 | "null", '""', "''", " ", "", 755 | "{,", ",}", "[,", ",]" 756 | ] 757 | if chunk_stripped in malformed_patterns: 758 | return True 759 | 760 | # Incomplete JSON structures 761 | if chunk_stripped.startswith('{') and not chunk_stripped.endswith('}'): 762 | if len(chunk_stripped) < 15: # Very short incomplete JSON 763 | return True 764 | 765 | if chunk_stripped.startswith('[') and not chunk_stripped.endswith(']'): 766 | if len(chunk_stripped) < 10: 767 | return True 768 | 769 | # Check for obviously broken JSON patterns 770 | if chunk_stripped.count('{') != chunk_stripped.count('}'): 771 | if len(chunk_stripped) < 20: # Only for short chunks 772 | return True 773 | 774 | if chunk_stripped.count('[') != chunk_stripped.count(']'): 775 | if len(chunk_stripped) < 20: 776 | return True 777 | 778 | return False 779 | 780 | def try_parse_buffered_chunk(buffer: str) -> tuple[dict, str]: 781 | """Try to parse buffered chunks, return parsed chunk and remaining buffer.""" 782 | if not buffer.strip(): 783 | return None, "" 784 | 785 | # Try to find complete JSON objects in the buffer 786 | brace_count = 0 787 | start_pos = -1 788 | 789 | for i, char in enumerate(buffer): 790 | if char == '{': 791 | if start_pos == -1: 792 | start_pos = i 793 | brace_count += 1 794 | elif char == '}': 795 | brace_count -= 1 796 | if brace_count == 0 and start_pos != -1: 797 | # Found complete JSON object 798 | json_str = buffer[start_pos:i+1] 799 | try: 800 | parsed = json.loads(json_str) 801 | remaining_buffer = buffer[i+1:] 802 | return parsed, remaining_buffer 803 | 
except json.JSONDecodeError:
804 |                         continue
805 | 
806 |         # No complete JSON found
807 |         return None, buffer
808 | 
809 |     try:
810 |         # Wrap the entire streaming process in comprehensive error handling
811 |         stream_iterator = aiter(response_generator)
812 | 
813 |         while True:
814 |             try:
815 |                 # Get next chunk with timeout
816 |                 try:
817 |                     chunk = await asyncio.wait_for(anext(stream_iterator), timeout=90.0)
818 |                 except StopAsyncIteration:
819 |                     break
820 |                 except asyncio.TimeoutError:
821 |                     logger.warning("Streaming timeout, terminating")
822 |                     stream_terminated_early = True
823 |                     break
824 | 
825 |                 # Reset consecutive error counter on successful chunk retrieval
826 |                 consecutive_errors = 0
827 | 
828 |                 # Handle string chunks with enhanced validation
829 |                 if isinstance(chunk, str):
830 |                     if chunk.strip() == "[DONE]":
831 |                         break
832 | 
833 |                     # Check for malformed chunks
834 |                     if is_malformed_chunk(chunk):
835 |                         malformed_chunks_count += 1
836 |                         logger.debug(f"Skipping malformed chunk #{malformed_chunks_count}: '{chunk[:50]}{'...' if len(chunk) > 50 else ''}'")
837 | 
838 |                         if malformed_chunks_count > max_malformed_chunks:
839 |                             logger.error(f"Too many malformed chunks ({malformed_chunks_count}), terminating stream")
840 |                             stream_terminated_early = True
841 |                             break
842 |                         continue
843 | 
844 |                     # Add to buffer and try to parse
845 |                     chunk_buffer += chunk
846 |                     parsed_chunk, chunk_buffer = try_parse_buffered_chunk(chunk_buffer)
847 | 
848 |                     if parsed_chunk is None:
849 |                         # Keep buffering if we don't have a complete chunk yet
850 |                         if len(chunk_buffer) > 10000: # Prevent buffer from growing too large
851 |                             logger.warning("Chunk buffer too large, clearing")
852 |                             chunk_buffer = ""
853 |                         continue
854 | 
855 |                     chunk = parsed_chunk
856 | 
857 |                 # Dict and ModelResponse-style chunks are handled by the extraction logic below
858 |                 if isinstance(chunk, dict):
859 |                     # Dict chunk: fields are extracted below
860 |                     pass
861 |                 elif hasattr(chunk, 'choices'):
862 |                     # ModelResponse object: fields are extracted below
863 |                     pass
864 |                 else:
865 |                     # Try one more JSON parse attempt
866 |                     try:
867 |                         if isinstance(chunk, str):
868 |                             chunk = json.loads(chunk)
869 |                         else:
870 |                             logger.debug(f"Skipping unprocessable chunk type: {type(chunk)}")
871 |                             continue
872 |                     except json.JSONDecodeError as parse_error:
873 |                         logger.debug(f"Failed to parse chunk as JSON: {parse_error}")
874 |                         continue
875 | 
876 |                 # Extract delta data from the chunk
877 |                 delta_content_text = None
878 |                 delta_tool_calls = None
879 |                 chunk_finish_reason = None
880 | 
881 |                 if hasattr(chunk, 'choices') and chunk.choices:
882 |                     choice = chunk.choices[0]
883 |                     if hasattr(choice, 'delta') and choice.delta:
884 |                         delta = choice.delta
885 |                         delta_content_text = getattr(delta, 'content', None)
886 |                         if hasattr(delta, 'tool_calls'):
887 |                             delta_tool_calls = delta.tool_calls
888 |                     chunk_finish_reason = getattr(choice, 'finish_reason', None)
889 |                 elif isinstance(chunk, dict):
890 |                     choices = chunk.get("choices", [])
891 |                     if choices:
892 |                         choice = choices[0]
893 |                         delta = choice.get("delta", {})
894 |                         delta_content_text = delta.get("content")
895 |                         delta_tool_calls = delta.get("tool_calls")
896 |                         chunk_finish_reason = choice.get("finish_reason")
897 | 
898 |                 if hasattr(chunk, 'usage') and chunk.usage:
899 |                     input_tokens = getattr(chunk.usage, 'prompt_tokens', 0)
900 |                     output_tokens = getattr(chunk.usage, 'completion_tokens', 0)
901 |                 elif isinstance(chunk, dict) and "usage" in chunk:
902 |                     usage = chunk["usage"]
903 |                     input_tokens = usage.get("prompt_tokens", 0)
904 |                     output_tokens = usage.get("completion_tokens", 0)
905 | 
906 |                 # Handle text delta
907 |                 if delta_content_text:
908 |                     accumulated_text += delta_content_text
909 |                     yield f"event: {Constants.EVENT_CONTENT_BLOCK_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_DELTA, 'index': text_block_index, 'delta': {'type': Constants.DELTA_TEXT, 'text': delta_content_text}})}\n\n"
910 | 
911 |                 # Handle tool call deltas
912 |                 if delta_tool_calls:
913 |                     for tc_chunk in delta_tool_calls:
914 |                         if not (hasattr(tc_chunk, 'function') and tc_chunk.function and
915 |                                 hasattr(tc_chunk.function, 'name') and tc_chunk.function.name):
916 |                             continue
917 | 
918 |                         tool_call_id = tc_chunk.id
919 | 
920 |                         if tool_call_id not in current_tool_calls:
921 |                             tool_block_counter += 1
922 |                             tool_index = text_block_index + tool_block_counter
923 | 
924 |                             current_tool_calls[tool_call_id] = {
925 |                                 "index": tool_index,
926 |                                 "name": tc_chunk.function.name or "",
927 |                                 "args_buffer": ""  # first arguments are appended below, avoiding duplication
928 |                             }
929 | 
930 |                             yield f"event: {Constants.EVENT_CONTENT_BLOCK_START}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_START, 'index': tool_index, 'content_block': {'type': Constants.CONTENT_TOOL_USE, 'id': tool_call_id, 'name': current_tool_calls[tool_call_id]['name'], 'input': {}}})}\n\n"
931 | 
932 |                         if tc_chunk.function.arguments:
933 |                             current_tool_calls[tool_call_id]["args_buffer"] += tc_chunk.function.arguments
934 |                             yield f"event: {Constants.EVENT_CONTENT_BLOCK_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_DELTA, 'index': current_tool_calls[tool_call_id]['index'], 'delta': {'type': Constants.DELTA_INPUT_JSON, 'partial_json': tc_chunk.function.arguments}})}\n\n"
935 | 
936 |                 # Handle finish reason
937 |                 if chunk_finish_reason:
938 |                     if chunk_finish_reason == "length":
939 |                         final_stop_reason = Constants.STOP_MAX_TOKENS
940 |                     elif chunk_finish_reason == "tool_calls":
941 |                         final_stop_reason = Constants.STOP_TOOL_USE
942 |                     elif chunk_finish_reason == "stop":
943 |                         final_stop_reason = Constants.STOP_END_TURN
944 |                     else:
945 |                         final_stop_reason = Constants.STOP_END_TURN
946 |                     break
947 | 
948 |             except (json.JSONDecodeError, ValueError) as parse_error:
949 |                 consecutive_errors += 1
950 |                 logger.debug(f"JSON parsing error (attempt {consecutive_errors}/{max_consecutive_errors}): {parse_error}")
951 | 
952 |                 if consecutive_errors >= max_consecutive_errors:
953 |                     logger.error(f"Too many consecutive parsing errors ({consecutive_errors}), terminating stream")
954 |                     stream_terminated_early = True
955 |                     break
956 |                 continue
957 | 
958 |             except (litellm.exceptions.APIConnectionError, RuntimeError) as api_error:
959 |                 consecutive_errors += 1
960 |                 error_msg = str(api_error)
961 | 
962 |                 # Check for the specific malformed chunk error
963 |                 if ("Error parsing chunk" in error_msg and
964 |                     "Expecting property name enclosed in double quotes" in error_msg):
965 | 
966 |                     logger.warning(f"Gemini malformed chunk error (attempt {consecutive_errors}/{max_consecutive_errors})")
967 | 
968 |                     if consecutive_errors >= max_consecutive_errors:
969 |                         logger.error(f"Too many consecutive API errors ({consecutive_errors}), terminating stream")
970 |                         stream_terminated_early = True
971 | 
972 |                         # Send error info to client
973 |                         error_text = f"\n⚠️ Gemini streaming encountered repeated malformed chunks. 
This is a known API issue.\n" 974 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_DELTA, 'index': text_block_index, 'delta': {'type': Constants.DELTA_TEXT, 'text': error_text}})}\n\n" 975 | break 976 | 977 | # Brief delay before continuing 978 | await asyncio.sleep(0.1) 979 | continue 980 | else: 981 | # Other API errors - terminate immediately 982 | logger.error(f"API error: {api_error}") 983 | stream_terminated_early = True 984 | break 985 | 986 | except Exception as general_error: 987 | consecutive_errors += 1 988 | logger.error(f"Unexpected streaming error (attempt {consecutive_errors}/{max_consecutive_errors}): {general_error}") 989 | 990 | if consecutive_errors >= max_consecutive_errors: 991 | logger.error(f"Too many consecutive errors ({consecutive_errors}), terminating stream") 992 | stream_terminated_early = True 993 | break 994 | 995 | # Brief delay before continuing 996 | await asyncio.sleep(0.1) 997 | continue 998 | 999 | except Exception as outer_error: 1000 | logger.error(f"Fatal streaming error: {outer_error}") 1001 | stream_terminated_early = True 1002 | 1003 | # Always send final SSE events 1004 | try: 1005 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_STOP}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_STOP, 'index': text_block_index})}\n\n" 1006 | 1007 | for tool_data in current_tool_calls.values(): 1008 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_STOP}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_STOP, 'index': tool_data['index']})}\n\n" 1009 | 1010 | if stream_terminated_early and final_stop_reason == Constants.STOP_END_TURN: 1011 | final_stop_reason = Constants.STOP_ERROR 1012 | 1013 | usage_data = {"input_tokens": input_tokens, "output_tokens": output_tokens} 1014 | yield f"event: {Constants.EVENT_MESSAGE_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_MESSAGE_DELTA, 'delta': {'stop_reason': final_stop_reason, 'stop_sequence': None}, 'usage': usage_data})}\n\n" 1015 | yield f"event: {Constants.EVENT_MESSAGE_STOP}\ndata: {json.dumps({'type': Constants.EVENT_MESSAGE_STOP})}\n\n" 1016 | 1017 | # Log final statistics 1018 | if malformed_chunks_count > 0: 1019 | logger.info(f"Stream completed with {malformed_chunks_count} malformed chunks handled") 1020 | 1021 | except Exception as final_error: 1022 | logger.error(f"Error sending final SSE events: {final_error}") 1023 | 1024 | # Request Middleware 1025 | @app.middleware("http") 1026 | async def log_requests(request: Request, call_next): 1027 | method = request.method 1028 | path = request.url.path 1029 | logger.debug(f"Request: {method} {path}") 1030 | response = await call_next(request) 1031 | return response 1032 | 1033 | # Enhanced streaming retry logic for the main endpoint 1034 | @app.post("/v1/messages") 1035 | async def create_message(request: MessagesRequest, raw_request: Request): 1036 | try: 1037 | logger.debug(f"📊 Processing request: Original={request.original_model}, Effective={request.model}, Stream={request.stream}") 1038 | 1039 | # Check streaming configuration 1040 | if request.stream and config.emergency_disable_streaming: 1041 | logger.warning("Streaming disabled via EMERGENCY_DISABLE_STREAMING") 1042 | request.stream = False 1043 | 1044 | if request.stream and config.force_disable_streaming: 1045 | logger.info("Streaming disabled via FORCE_DISABLE_STREAMING") 1046 | request.stream = False 1047 | 1048 | # Convert request 1049 | litellm_request = convert_anthropic_to_litellm(request) 1050 | 
litellm_request["api_key"] = config.gemini_api_key 1051 | 1052 | # Log request details 1053 | num_tools = len(request.tools) if request.tools else 0 1054 | log_request_beautifully( 1055 | "POST", raw_request.url.path, 1056 | request.original_model or request.model, 1057 | litellm_request.get('model'), 1058 | len(litellm_request['messages']), 1059 | num_tools, 200 1060 | ) 1061 | 1062 | # Enhanced streaming with better retry logic 1063 | if request.stream: 1064 | streaming_retry_count = 0 1065 | max_retries = config.max_streaming_retries 1066 | 1067 | while streaming_retry_count <= max_retries: 1068 | try: 1069 | logger.debug(f"Attempting streaming (attempt {streaming_retry_count + 1}/{max_retries + 1})") 1070 | 1071 | # Add slight delay between retries 1072 | if streaming_retry_count > 0: 1073 | delay = min(0.5 * (2 ** streaming_retry_count), 2.0) # Exponential backoff, max 2s 1074 | logger.debug(f"Waiting {delay}s before retry...") 1075 | await asyncio.sleep(delay) 1076 | 1077 | response_generator = await litellm.acompletion(**litellm_request) 1078 | 1079 | return StreamingResponse( 1080 | handle_streaming_with_recovery(response_generator, request), 1081 | media_type="text/event-stream", 1082 | headers={ 1083 | "Cache-Control": "no-cache", 1084 | "Connection": "keep-alive", 1085 | "X-Accel-Buffering": "no", 1086 | "Access-Control-Allow-Origin": "*", 1087 | "Access-Control-Allow-Headers": "*" 1088 | } 1089 | ) 1090 | 1091 | except (litellm.exceptions.APIConnectionError, RuntimeError) as streaming_error: 1092 | streaming_retry_count += 1 1093 | error_msg = str(streaming_error) 1094 | 1095 | # Check for the specific malformed chunk error 1096 | if ("Error parsing chunk" in error_msg and 1097 | "Expecting property name enclosed in double quotes" in error_msg): 1098 | 1099 | if streaming_retry_count <= max_retries: 1100 | logger.warning(f"Gemini streaming chunk parsing error (attempt {streaming_retry_count}/{max_retries + 1}), retrying...") 1101 | continue 1102 | else: 1103 | logger.error(f"Gemini streaming failed after {max_retries + 1} attempts due to malformed chunks, falling back to non-streaming") 1104 | break 1105 | else: 1106 | # Other streaming errors - could be connection issues 1107 | if streaming_retry_count <= max_retries: 1108 | logger.warning(f"Streaming error (attempt {streaming_retry_count}/{max_retries + 1}): {error_msg}") 1109 | continue 1110 | else: 1111 | logger.error(f"Streaming failed after {max_retries + 1} attempts, falling back to non-streaming") 1112 | break 1113 | 1114 | except Exception as unexpected_error: 1115 | streaming_retry_count += 1 1116 | logger.error(f"Unexpected streaming error (attempt {streaming_retry_count}/{max_retries + 1}): {unexpected_error}") 1117 | 1118 | if streaming_retry_count <= max_retries: 1119 | continue 1120 | else: 1121 | logger.error(f"Streaming failed after {max_retries + 1} attempts due to unexpected errors, falling back to non-streaming") 1122 | break 1123 | 1124 | # If we get here, streaming failed - fall back to non-streaming 1125 | logger.info("Falling back to non-streaming mode") 1126 | litellm_request["stream"] = False 1127 | 1128 | # Non-streaming path (or fallback) 1129 | if not request.stream or litellm_request.get("stream") == False: 1130 | start_time = time.time() 1131 | litellm_response = await litellm.acompletion(**litellm_request) 1132 | logger.debug(f"✅ Response received: Model={litellm_request.get('model')}, Time={time.time() - start_time:.2f}s") 1133 | 1134 | anthropic_response = 
convert_litellm_to_anthropic(litellm_response, request) 1135 | return anthropic_response 1136 | 1137 | except litellm.exceptions.APIError as e: 1138 | logger.error(f"LiteLLM API Error: {e}") 1139 | error_msg = classify_gemini_error(str(e)) 1140 | raise HTTPException(status_code=getattr(e, 'status_code', 500), detail=error_msg) 1141 | except ConnectionError as e: 1142 | logger.error(f"Connection Error: {e}") 1143 | raise HTTPException(status_code=503, detail="Connection error. Please check your internet connection.") 1144 | except TimeoutError as e: 1145 | logger.error(f"Timeout Error: {e}") 1146 | raise HTTPException(status_code=504, detail="Request timeout. Please try again.") 1147 | except Exception as e: 1148 | logger.error(f"Error processing request: {e}") 1149 | error_msg = classify_gemini_error(str(e)) 1150 | raise HTTPException(status_code=500, detail=error_msg) 1151 | 1152 | @app.post("/v1/messages/count_tokens") 1153 | async def count_tokens(request: TokenCountRequest, raw_request: Request): 1154 | try: 1155 | # Create temporary request for conversion 1156 | temp_request = MessagesRequest( 1157 | model=request.model, 1158 | max_tokens=1, 1159 | messages=request.messages, 1160 | system=request.system, 1161 | tools=request.tools, 1162 | ) 1163 | 1164 | litellm_data = convert_anthropic_to_litellm(temp_request) 1165 | 1166 | # Log request 1167 | num_tools = len(request.tools) if request.tools else 0 1168 | log_request_beautifully( 1169 | "POST", raw_request.url.path, 1170 | request.original_model or request.model, 1171 | litellm_data.get('model'), 1172 | len(litellm_data['messages']), num_tools, 200 1173 | ) 1174 | 1175 | # Count tokens 1176 | token_count = litellm.token_counter( 1177 | model=litellm_data["model"], 1178 | messages=litellm_data["messages"], 1179 | ) 1180 | 1181 | return TokenCountResponse(input_tokens=token_count) 1182 | 1183 | except Exception as e: 1184 | logger.error(f"Error counting tokens: {str(e)}") 1185 | error_msg = classify_gemini_error(str(e)) 1186 | raise HTTPException(status_code=500, detail=f"Error counting tokens: {error_msg}") 1187 | 1188 | @app.get("/health") 1189 | async def health_check(): 1190 | try: 1191 | health_status = { 1192 | "status": "healthy", 1193 | "timestamp": datetime.now().isoformat(), 1194 | "version": "2.5.0", 1195 | "gemini_api_configured": bool(config.gemini_api_key), 1196 | "api_key_valid": config.validate_api_key(), 1197 | "streaming_config": { 1198 | "force_disabled": config.force_disable_streaming, 1199 | "emergency_disabled": config.emergency_disable_streaming, 1200 | "max_retries": config.max_streaming_retries 1201 | } 1202 | } 1203 | 1204 | return health_status 1205 | 1206 | except Exception as e: 1207 | logger.error(f"Health check error: {e}") 1208 | return JSONResponse( 1209 | status_code=503, 1210 | content={ 1211 | "status": "unhealthy", 1212 | "timestamp": datetime.now().isoformat(), 1213 | "error": "Health check failed" 1214 | } 1215 | ) 1216 | 1217 | @app.get("/test-connection") 1218 | async def test_connection(): 1219 | """Test API connectivity to Gemini""" 1220 | try: 1221 | # Simple test request to verify API connectivity 1222 | test_response = await litellm.acompletion( 1223 | model="gemini/gemini-1.5-flash-latest", 1224 | messages=[{"role": "user", "content": "Hello"}], 1225 | max_tokens=5, 1226 | api_key=config.gemini_api_key 1227 | ) 1228 | 1229 | return { 1230 | "status": "success", 1231 | "message": "Successfully connected to Gemini API", 1232 | "model_used": "gemini-1.5-flash-latest", 1233 | "timestamp": 
datetime.now().isoformat(), 1234 | "response_id": getattr(test_response, 'id', 'unknown') 1235 | } 1236 | 1237 | except litellm.exceptions.APIError as e: 1238 | logger.error(f"API connectivity test failed: {e}") 1239 | return JSONResponse( 1240 | status_code=503, 1241 | content={ 1242 | "status": "failed", 1243 | "error_type": "API Error", 1244 | "message": classify_gemini_error(str(e)), 1245 | "timestamp": datetime.now().isoformat(), 1246 | "suggestions": [ 1247 | "Check your GEMINI_API_KEY is valid", 1248 | "Verify your API key has the necessary permissions", 1249 | "Check if you have reached rate limits" 1250 | ] 1251 | } 1252 | ) 1253 | except Exception as e: 1254 | logger.error(f"Connection test failed: {e}") 1255 | return JSONResponse( 1256 | status_code=503, 1257 | content={ 1258 | "status": "failed", 1259 | "error_type": "Connection Error", 1260 | "message": classify_gemini_error(str(e)), 1261 | "timestamp": datetime.now().isoformat(), 1262 | "suggestions": [ 1263 | "Check your internet connection", 1264 | "Verify firewall settings allow HTTPS traffic", 1265 | "Try again in a few moments" 1266 | ] 1267 | } 1268 | ) 1269 | 1270 | @app.get("/") 1271 | async def root(): 1272 | return { 1273 | "message": f"Enhanced Gemini-to-Claude API Proxy v2.5.0", 1274 | "status": "running", 1275 | "config": { 1276 | "big_model": config.big_model, 1277 | "small_model": config.small_model, 1278 | "available_models": model_manager.gemini_models[:5], 1279 | "max_tokens_limit": config.max_tokens_limit, 1280 | "api_key_configured": bool(config.gemini_api_key), 1281 | "streaming": { 1282 | "force_disabled": config.force_disable_streaming, 1283 | "emergency_disabled": config.emergency_disable_streaming, 1284 | "max_retries": config.max_streaming_retries 1285 | } 1286 | }, 1287 | "endpoints": { 1288 | "messages": "/v1/messages", 1289 | "count_tokens": "/v1/messages/count_tokens", 1290 | "health": "/health", 1291 | "test_connection": "/test-connection" 1292 | } 1293 | } 1294 | 1295 | # Simple logging utilities 1296 | class Colors: 1297 | CYAN = "\033[96m" 1298 | BLUE = "\033[94m" 1299 | GREEN = "\033[92m" 1300 | YELLOW = "\033[93m" 1301 | RED = "\033[91m" 1302 | MAGENTA = "\033[95m" 1303 | RESET = "\033[0m" 1304 | BOLD = "\033[1m" 1305 | 1306 | def log_request_beautifully(method: str, path: str, requested_model: str, 1307 | gemini_model_used: str, num_messages: int, 1308 | num_tools: int, status_code: int): 1309 | if not sys.stdout.isatty(): 1310 | print(f"{method} {path} - {requested_model} -> {gemini_model_used} ({num_messages} messages, {num_tools} tools)") 1311 | return 1312 | 1313 | # Colorized logging for TTY 1314 | req_display = f"{Colors.CYAN}{requested_model}{Colors.RESET}" 1315 | gemini_display = f"{Colors.GREEN}{gemini_model_used.replace('gemini/', '')}{Colors.RESET}" 1316 | 1317 | endpoint = path.split("?")[0] if "?" 
in path else path 1318 | tools_str = f"{Colors.MAGENTA}{num_tools} tools{Colors.RESET}" 1319 | messages_str = f"{Colors.BLUE}{num_messages} messages{Colors.RESET}" 1320 | 1321 | if status_code == 200: 1322 | status_str = f"{Colors.GREEN}✓ {status_code} OK{Colors.RESET}" 1323 | else: 1324 | status_str = f"{Colors.RED}✗ {status_code}{Colors.RESET}" 1325 | 1326 | log_line = f"{Colors.BOLD}{method} {endpoint}{Colors.RESET} {status_str}" 1327 | model_line = f"Request: {req_display} → Gemini: {gemini_display} ({tools_str}, {messages_str})" 1328 | 1329 | print(log_line) 1330 | print(model_line) 1331 | sys.stdout.flush() 1332 | 1333 | def validate_startup(): 1334 | """Validate configuration and connectivity on startup""" 1335 | print("🔍 Validating startup configuration...") 1336 | 1337 | # Check API key 1338 | if not config.gemini_api_key: 1339 | print("🔴 FATAL: GEMINI_API_KEY is not set") 1340 | return False 1341 | 1342 | if not config.validate_api_key(): 1343 | print("⚠️ WARNING: API key format validation failed") 1344 | 1345 | # Check network connectivity (basic) 1346 | try: 1347 | import socket 1348 | socket.create_connection(("8.8.8.8", 53), timeout=10) 1349 | print("✅ Network connectivity: OK") 1350 | except OSError: 1351 | print("⚠️ WARNING: Network connectivity check failed") 1352 | 1353 | return True 1354 | 1355 | def main(): 1356 | if len(sys.argv) > 1 and sys.argv[1] == "--help": 1357 | print("Enhanced Gemini-to-Claude API Proxy v2.5.0") 1358 | print("") 1359 | print("Usage: uvicorn server:app --reload --host 0.0.0.0 --port 8082") 1360 | print("") 1361 | print("Required environment variables:") 1362 | print(" GEMINI_API_KEY - Your Google Gemini API key") 1363 | print("") 1364 | print("Optional environment variables:") 1365 | print(f" BIG_MODEL - Big model name (default: gemini-1.5-pro-latest)") 1366 | print(f" SMALL_MODEL - Small model name (default: gemini-1.5-flash-latest)") 1367 | print(f" HOST - Server host (default: 0.0.0.0)") 1368 | print(f" PORT - Server port (default: 8082)") 1369 | print(f" LOG_LEVEL - Logging level (default: WARNING)") 1370 | print(f" MAX_TOKENS_LIMIT - Token limit (default: 8192)") 1371 | print(f" REQUEST_TIMEOUT - Request timeout in seconds (default: 60)") 1372 | print(f" MAX_RETRIES - Maximum retries (default: 2)") 1373 | print(f" MAX_STREAMING_RETRIES - Maximum streaming retries (default: 2)") 1374 | print(f" FORCE_DISABLE_STREAMING - Force disable streaming (default: false)") 1375 | print(f" EMERGENCY_DISABLE_STREAMING - Emergency disable streaming (default: false)") 1376 | print("") 1377 | print("Available Gemini models:") 1378 | for model in model_manager.gemini_models: 1379 | print(f" - {model}") 1380 | sys.exit(0) 1381 | 1382 | # Validate startup configuration 1383 | if not validate_startup(): 1384 | print("🔴 Startup validation failed. 
Please check your configuration.") 1385 | sys.exit(1) 1386 | 1387 | # Configuration summary 1388 | print("🚀 Enhanced Gemini-to-Claude API Proxy v2.5.0") 1389 | print(f"✅ Configuration loaded successfully") 1390 | print(f" Big Model: {config.big_model}") 1391 | print(f" Small Model: {config.small_model}") 1392 | print(f" Available Models: {len(model_manager.gemini_models)}") 1393 | print(f" Max Tokens Limit: {config.max_tokens_limit}") 1394 | print(f" Request Timeout: {config.request_timeout}s") 1395 | print(f" Max Retries: {config.max_retries}") 1396 | print(f" Max Streaming Retries: {config.max_streaming_retries}") 1397 | print(f" Force Disable Streaming: {config.force_disable_streaming}") 1398 | print(f" Emergency Disable Streaming: {config.emergency_disable_streaming}") 1399 | print(f" Log Level: {config.log_level}") 1400 | print(f" Server: {config.host}:{config.port}") 1401 | print("") 1402 | 1403 | # Start server 1404 | uvicorn.run( 1405 | app, 1406 | host=config.host, 1407 | port=config.port, 1408 | log_level=config.log_level.lower() 1409 | ) 1410 | 1411 | if __name__ == "__main__": 1412 | main() 1413 | --------------------------------------------------------------------------------