├── .env.example ├── CLAUDE.md ├── LICENSE ├── README.md ├── image.png ├── requirements.txt └── server.py /.env.example: -------------------------------------------------------------------------------- 1 | # Required: Your Google AI Studio API key 2 | GEMINI_API_KEY="your-google-ai-studio-key" 3 | 4 | # Optional: Model mappings for Claude Code aliases 5 | BIG_MODEL="gemini-1.5-pro-latest" # For 'sonnet' or 'opus' requests 6 | SMALL_MODEL="gemini-1.5-flash-latest" # For 'haiku' requests 7 | 8 | # Optional: Server settings 9 | HOST="0.0.0.0" 10 | PORT="8082" 11 | LOG_LEVEL="WARNING" # DEBUG, INFO, WARNING, ERROR, CRITICAL 12 | 13 | # Optional: Performance and reliability settings 14 | MAX_TOKENS_LIMIT="8192" # Max tokens for Gemini responses 15 | REQUEST_TIMEOUT="90" # Request timeout in seconds 16 | MAX_RETRIES="2" # LiteLLM retries to Gemini 17 | MAX_STREAMING_RETRIES="12" # Streaming-specific retry attempts 18 | 19 | # Optional: Streaming control (use if experiencing issues) 20 | FORCE_DISABLE_STREAMING="false" # Disable streaming globally 21 | EMERGENCY_DISABLE_STREAMING="false" # Emergency streaming disable 22 | -------------------------------------------------------------------------------- /CLAUDE.md: -------------------------------------------------------------------------------- 1 | # Claude Code: Best Practices for Effective Collaboration 2 | 3 | This document outlines best practices for working with Claude Code to ensure efficient and successful software development tasks. 4 | 5 | ## Task Management 6 | 7 | For complex or multi-step tasks, Claude Code will use: 8 | * **TodoWrite**: To create a structured task list, breaking down the work into manageable steps. This provides clarity on the plan and allows for tracking progress. 9 | * **TodoRead**: To review the current list of tasks and their status, ensuring alignment and that all objectives are being addressed. 10 | 11 | ## File Handling and Reading 12 | 13 | Understanding file content is crucial before making modifications. 14 | 15 | 1. **Targeted Information Retrieval**: 16 | * When searching for specific content, patterns, or definitions within a codebase, prefer using search tools like `Grep` or `Task` (with a focused search prompt). This is more efficient than reading entire files. 17 | 18 | 2. **Reading File Content**: 19 | * **Small to Medium Files**: For files where full context is needed or that are not excessively large, the `Read` tool can be used to retrieve the entire content. 20 | * **Large File Strategy**: 21 | 1. **Assess Size**: Before reading a potentially large file, its size should be determined (e.g., using `ls -l` via the `Bash` tool or by an initial `Read` with a small `limit` to observe if content is truncated). 22 | 2. **Chunked Reading**: If a file is large (e.g., over a few thousand lines), it should be read in manageable chunks (e.g., 1000-2000 lines at a time) using the `offset` and `limit` parameters of the `Read` tool. This ensures all content can be processed without issues. 23 | * Always ensure that the file path provided to `Read` is absolute. 24 | 25 | ## File Editing 26 | 27 | Precision is key for successful file edits. The following strategies lead to reliable modifications: 28 | 29 | 1. **Pre-Edit Read**: **Always** use the `Read` tool to fetch the content of the file *immediately before* attempting any `Edit` or `MultiEdit` operation. This ensures modifications are based on the absolute latest version of the file. 30 | 31 | 2. 
**Constructing `old_string` (The text to be replaced)**: 32 | * **Exact Match**: The `old_string` must be an *exact* character-for-character match of the segment in the file you intend to replace. This includes all whitespace (spaces, tabs, newlines) and special characters. 33 | * **No Read Artifacts**: Crucially, do *not* include any formatting artifacts from the `Read` tool's output (e.g., `cat -n` style line numbers or display-only leading tabs) in the `old_string`. It must only contain the literal characters as they exist in the raw file. 34 | * **Sufficient Context & Uniqueness**: Provide enough context (surrounding lines) in `old_string` to make it uniquely identifiable at the intended edit location. The "Anchor on a Known Good Line" strategy is preferred: `old_string` is a larger, unique block of text surrounding the change or insertion point. This is highly reliable. 35 | 36 | 3. **Constructing `new_string` (The replacement text)**: 37 | * **Exact Representation**: The `new_string` must accurately represent the desired state of the code, including correct indentation, whitespace, and newlines. 38 | * **No Read Artifacts**: As with `old_string`, ensure `new_string` does *not* contain any `Read` tool output artifacts. 39 | 40 | 4. **Choosing the Right Editing Tool**: 41 | * **`Edit` Tool**: Suitable for a single, well-defined replacement in a file. 42 | * **`MultiEdit` Tool**: Preferred when multiple changes are needed within the same file. Edits are applied sequentially, with each subsequent edit operating on the result of the previous one. This tool is highly effective for complex modifications. 43 | 44 | 5. **Verification**: 45 | * The success confirmation from the `Edit` or `MultiEdit` tool (especially if `expected_replacements` is used and matches) is the primary indicator that the change was made. 46 | * If further visual confirmation is needed, use the `Read` tool with `offset` and `limit` parameters to view only the specific section of the file that was changed, rather than re-reading the entire file. 47 | 48 | ### Reliable Code Insertion with MultiEdit 49 | 50 | When inserting larger blocks of new code (e.g., multiple functions or methods) where a simple `old_string` might be fragile due to surrounding code, the following `MultiEdit` strategy can be more robust: 51 | 52 | 1. **First Edit - Targeted Insertion Point**: For the primary code block you want to insert (e.g., new methods within a class), identify a short, unique, and stable line of code immediately *after* your desired insertion point. Use this stable line as the `old_string`. 53 | * The `new_string` will consist of your new block of code, followed by a newline, and then the original `old_string` (the stable line you matched on). 54 | * Example: If inserting methods into a class, the `old_string` might be the closing brace `}` of the class, or a comment that directly follows the class. 55 | 56 | 2. **Second Edit (Optional) - Ancillary Code**: If there's another, smaller piece of related code to insert (e.g., a function call within an existing method, or an import statement), perform this as a separate, more straightforward edit within the `MultiEdit` call. This edit usually has a more clearly defined and less ambiguous `old_string`. 57 | 58 | **Rationale**: 59 | * By anchoring the main insertion on a very stable, unique line *after* the insertion point and prepending the new code to it, you reduce the risk of `old_string` mismatches caused by subtle variations in the code *before* the insertion point. 
60 | * Keeping ancillary edits separate allows them to succeed even if the main insertion point is complex, as they often target simpler, more reliable `old_string` patterns. 61 | * This approach leverages `MultiEdit`'s sequential application of changes effectively. 62 | 63 | **Example Scenario**: Adding new methods to a class and a call to one of these new methods elsewhere. 64 | * **Edit 1**: Insert the new methods. `old_string` is the class's closing brace `}`. `new_string` is ` 65 | [new methods code] 66 | }`. 67 | * **Edit 2**: Insert the call to a new method. `old_string` is `// existing line before call`. `new_string` is `// existing line before call 68 | this.newMethodCall();`. 69 | 70 | This method provides a balance between precise editing and handling larger code insertions reliably when direct `old_string` matches for the entire new block are problematic. 71 | 72 | ## Handling Large Files for Incremental Refactoring 73 | 74 | When refactoring large files incrementally rather than rewriting them completely: 75 | 76 | 1. **Initial Exploration and Planning**: 77 | * Begin with targeted searches using `Grep` to locate specific patterns or sections within the file. 78 | * Use `Bash` commands like `grep -n "pattern" file` to find line numbers for specific areas of interest. 79 | * Create a clear mental model of the file structure before proceeding with edits. 80 | 81 | 2. **Chunked Reading for Large Files**: 82 | * For files too large to read at once, use multiple `Read` operations with different `offset` and `limit` parameters. 83 | * Read sequential chunks to build a complete understanding of the file. 84 | * Use `Grep` to pinpoint key sections, then read just those sections with targeted `offset` parameters. 85 | 86 | 3. **Finding Key Implementation Sections**: 87 | * Use `Bash` commands with `grep -A N` (to show N lines after a match) or `grep -B N` (to show N lines before) to locate function or method implementations. 88 | * Example: `grep -n "function findTagBoundaries" -A 20 filename.js` to see the first 20 lines of a function. 89 | 90 | 4. **Pattern-Based Replacement Strategy**: 91 | * Identify common patterns that need to be replaced across the file. 92 | * Use the `Bash` tool with `sed` for quick previews of potential replacements. 93 | * Example: `sed -n "s/oldPattern/newPattern/gp" filename.js` to preview changes without making them. 94 | 95 | 5. **Sequential Selective Edits**: 96 | * Target specific sections or patterns one at a time rather than attempting a complete rewrite. 97 | * Focus on clearest/simplest cases first to establish a pattern of successful edits. 98 | * Use `Edit` for well-defined single changes within the file. 99 | 100 | 6. **Batch Similar Changes Together**: 101 | * Group similar types of changes (e.g., all references to a particular function or variable). 102 | * Use `Bash` with `sed` to preview the scope of batch changes: `grep -n "pattern" filename.js | wc -l` 103 | * For systematic changes across a file, consider using `sed` through the `Bash` tool: `sed -i "s/oldPattern/newPattern/g" filename.js` 104 | 105 | 7. **Incremental Verification**: 106 | * After each set of changes, verify the specific sections that were modified. 107 | * For critical components, read the surrounding context to ensure the changes integrate correctly. 108 | * Validate that each change maintains the file's structure and logic before proceeding to the next. 109 | 110 | 8. 
**Progress Tracking for Large Refactors**: 111 | * Use the `TodoWrite` tool to track which sections or patterns have been updated. 112 | * Create a checklist of all required changes and mark them off as they're completed. 113 | * Record any sections that require special attention or that couldn't be automatically refactored. 114 | 115 | ## Commit Messages 116 | 117 | When Claude Code generates commit messages on your behalf: 118 | * The `Co-Authored-By: Claude ` line will **not** be included. 119 | * The `🤖 Generated with [Claude Code](https://claude.ai/code)` line will **not** be included. 120 | 121 | ## General Interaction 122 | 123 | Claude Code will directly apply proposed changes and modifications using the available tools, rather than describing them and asking you to implement them manually. This ensures a more efficient and direct workflow. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE 2 | Version 2, December 2004 3 | 4 | Copyright (C) 2004 Sam Hocevar 5 | 6 | Everyone is permitted to copy and distribute verbatim or modified 7 | copies of this license document, and changing it is allowed as long 8 | as the name is changed. 9 | 10 | DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE 11 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 12 | 13 | 0. You just DO WHAT THE FUCK YOU WANT TO. 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Gemini for Claude Code: An Anthropic-Compatible Proxy 2 | 3 | This server acts as a bridge, enabling you to use **Claude Code** with Google's powerful **Gemini models**. It translates API requests and responses between the Anthropic format (used by Claude Code) and the Gemini format (via LiteLLM), allowing seamless integration. 4 | 5 | ![Claude Code with Gemini Proxy](image.png) 6 | 7 | ## Features 8 | 9 | - **Claude Code Compatibility with Gemini**: Directly use the Claude Code CLI with Google Gemini models. 10 | - **Seamless Model Mapping**: Intelligently maps Claude Code model requests (e.g., `haiku`, `sonnet`, `opus` aliases) to your chosen Gemini models. 11 | - **LiteLLM Integration**: Leverages LiteLLM for robust and flexible interaction with the Gemini API. 12 | - **Enhanced Streaming Support**: Handles streaming responses from Gemini with robust error recovery for malformed chunks and API errors. 13 | - **Complete Tool Use for Claude Code**: Translates Claude Code's tool usage (function calling) to and from Gemini's format, with robust handling of tool results. 14 | - **Advanced Error Handling**: Provides specific and actionable error messages for common Gemini API issues with automatic fallback strategies. 15 | - **Resilient Architecture**: Gracefully handles Gemini API instability with smart retry logic and fallback to non-streaming modes. 16 | - **Diagnostic Endpoints**: Includes `/health` and `/test-connection` for easier troubleshooting of your setup. 17 | - **Token Counting**: Offers a `/v1/messages/count_tokens` endpoint compatible with Claude Code. 
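
For a quick smoke test of the diagnostic and token-counting endpoints listed above, a minimal sketch using `httpx` (already pinned in `requirements.txt`) might look like this; it assumes the proxy is running locally on the default port 8082 from `.env.example`:

```python
# Minimal smoke test for the proxy's diagnostic and token-counting endpoints.
# Assumes the server is already running on the default HOST/PORT from .env.example.
import httpx

BASE_URL = "http://localhost:8082"  # adjust if you changed HOST or PORT

# Health check: reports API key configuration and streaming settings
health = httpx.get(f"{BASE_URL}/health", timeout=10)
print("health:", health.status_code, health.json())

# Connectivity test: makes a small real call to Gemini with your GEMINI_API_KEY
conn = httpx.get(f"{BASE_URL}/test-connection", timeout=30)
print("test-connection:", conn.status_code)

# Token counting, using the same Anthropic-style payload Claude Code sends
payload = {
    "model": "claude-3-haiku-20240307",  # mapped to SMALL_MODEL by the proxy
    "messages": [{"role": "user", "content": "Hello, Gemini!"}],
}
tokens = httpx.post(f"{BASE_URL}/v1/messages/count_tokens", json=payload, timeout=30)
print("count_tokens:", tokens.json())  # e.g. {"input_tokens": ...}
```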
18 | 19 | ## Recent Improvements (v2.5.0) 20 | 21 | ### 🛡️ Enhanced Error Resilience 22 | - **Malformed Chunk Recovery**: Automatically detects and handles malformed JSON chunks from Gemini streaming 23 | - **Smart Retry Logic**: Exponential backoff with configurable retry limits for streaming errors 24 | - **Graceful Fallback**: Seamlessly switches to non-streaming mode when streaming fails 25 | - **Buffer Management**: Intelligent chunk buffering and reconstruction for incomplete JSON 26 | - **Connection Stability**: Handles Gemini 500 Internal Server Errors with automatic retry 27 | 28 | ### 📊 Improved Monitoring 29 | - **Detailed Error Classification**: Specific guidance for different types of Gemini API errors 30 | - **Enhanced Logging**: Comprehensive error tracking with malformed chunk statistics 31 | - **Real-time Status**: Better health checks and connection testing 32 | 33 | ## Prerequisites 34 | 35 | - A Google Gemini API key. 36 | - Python 3.8+. 37 | - Claude Code CLI installed (e.g., `npm install -g @anthropic-ai/claude-code`). 38 | 39 | ## Setup 40 | 41 | 1. **Clone the repository**: 42 | ```bash 43 | git clone https://github.com/coffeegrind123/gemini-code.git # Or your fork 44 | cd gemini-code 45 | ``` 46 | 47 | 2. **Create and activate a virtual environment** (recommended): 48 | ```bash 49 | python3 -m venv .venv 50 | source .venv/bin/activate 51 | ``` 52 | 53 | 3. **Install dependencies**: 54 | ```bash 55 | pip install -r requirements.txt 56 | ``` 57 | 58 | 4. **Configure Environment Variables**: 59 | Copy the example environment file: 60 | ```bash 61 | cp .env.example .env 62 | ``` 63 | Edit `.env` and add your Gemini API key. You can also customize model mappings and server settings: 64 | ```dotenv 65 | # Required: Your Google AI Studio API key 66 | GEMINI_API_KEY="your-google-ai-studio-key" 67 | 68 | # Optional: Model mappings for Claude Code aliases 69 | BIG_MODEL="gemini-1.5-pro-latest" # For 'sonnet' or 'opus' requests 70 | SMALL_MODEL="gemini-1.5-flash-latest" # For 'haiku' requests 71 | 72 | # Optional: Server settings 73 | HOST="0.0.0.0" 74 | PORT="8082" 75 | LOG_LEVEL="WARNING" # DEBUG, INFO, WARNING, ERROR, CRITICAL 76 | 77 | # Optional: Performance and reliability settings 78 | MAX_TOKENS_LIMIT="8192" # Max tokens for Gemini responses 79 | REQUEST_TIMEOUT="90" # Request timeout in seconds 80 | MAX_RETRIES="2" # LiteLLM retries to Gemini 81 | MAX_STREAMING_RETRIES="12" # Streaming-specific retry attempts 82 | 83 | # Optional: Streaming control (use if experiencing issues) 84 | FORCE_DISABLE_STREAMING="false" # Disable streaming globally 85 | EMERGENCY_DISABLE_STREAMING="false" # Emergency streaming disable 86 | ``` 87 | 88 | 5. **Run the server**: 89 | The `server.py` script includes a `main()` function that starts the Uvicorn server: 90 | ```bash 91 | python server.py 92 | ``` 93 | For development with auto-reload (restarts when you save changes to `server.py`): 94 | ```bash 95 | uvicorn server:app --host 0.0.0.0 --port 8082 --reload 96 | ``` 97 | You can view all startup options, including configurable environment variables, by running: 98 | ```bash 99 | python server.py --help 100 | ``` 101 | 102 | ## Usage with Claude Code 103 | 104 | 1. **Start the Proxy Server**: Ensure the Gemini proxy server (this application) is running (see step 5 above). 105 | 106 | 2. 
**Configure Claude Code to Use the Proxy**: 107 | Set the `ANTHROPIC_BASE_URL` environment variable when running Claude Code: 108 | ```bash 109 | ANTHROPIC_BASE_URL=http://localhost:8082 claude 110 | ``` 111 | Replace `localhost:8082` if your proxy is running on a different host or port. 112 | 113 | 3. **Utilize `CLAUDE.md` for Optimal Gemini Performance (Crucial)**: 114 | - This repository includes a `CLAUDE.md` file. This file contains specific instructions and best practices tailored to help **Gemini** effectively understand and respond to **Claude Code's** unique command structure, tool usage patterns, and desired output formats. 115 | - **Copy `CLAUDE.md` into your project directory**: 116 | ```bash 117 | cp /path/to/gemini-code/CLAUDE.md /your/project/directory/ 118 | ``` 119 | - When Claude Code starts in a directory containing `CLAUDE.md`, it automatically reads this file and incorporates its content into the system prompt. This is essential for guiding Gemini to work optimally within the Claude Code environment. 120 | 121 | ## How It Works: Powering Claude Code with Gemini 122 | 123 | 1. **Claude Code Request**: You issue a command or prompt in the Claude Code CLI. 124 | 2. **Anthropic Format**: Claude Code sends an API request (in Anthropic's Messages API format) to the proxy server's address (`http://localhost:8082`). 125 | 3. **Proxy Translation (Anthropic to Gemini)**: The proxy server: 126 | * Receives the Anthropic-formatted request. 127 | * Validates it and maps any Claude model aliases (like `claude-3-sonnet...`) to the corresponding Gemini model specified in your `.env` (e.g., `gemini-1.5-pro-latest`). 128 | * Translates the message structure, content blocks, and tool definitions into a format LiteLLM can use with the Gemini API. 129 | 4. **LiteLLM to Gemini**: LiteLLM sends the prepared request to the target Gemini model using your `GEMINI_API_KEY`. 130 | 5. **Gemini Response**: Gemini processes the request and sends its response back through LiteLLM. 131 | 6. **Proxy Translation (Gemini to Anthropic)**: The proxy server: 132 | * Receives the Gemini response from LiteLLM (this can be a stream of events or a complete JSON object). 133 | * Handles streaming errors and malformed chunks with intelligent recovery. 134 | * Converts Gemini's output (text, tool calls, stop reasons) back into the Anthropic Messages API format that Claude Code expects. 135 | 7. **Response to Claude Code**: The proxy sends the Anthropic-formatted response back to your Claude Code client, which then displays the result or performs the requested action. 136 | 137 | ## Model Mapping for Claude Code 138 | 139 | To ensure Claude Code's model requests are handled correctly by Gemini: 140 | 141 | - Requests from Claude Code for model names containing **"haiku"** (e.g., `claude-3-haiku-20240307`) are mapped to the Gemini model specified by your `SMALL_MODEL` environment variable (default: `gemini-1.5-flash-latest`). 142 | - Requests from Claude Code for model names containing **"sonnet"** or **"opus"** (e.g., `claude-3-sonnet-20240229`, `claude-3-opus-20240229`) are mapped to the Gemini model specified by your `BIG_MODEL` environment variable (default: `gemini-1.5-pro-latest`). 143 | - If Claude Code requests a full Gemini model name (e.g., `gemini/gemini-1.5-pro-latest`), the proxy will use that directly. 144 | 145 | The server maintains a list of known Gemini models. If a recognized Gemini model is requested by the client without the `gemini/` prefix, the proxy will add it. 
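
In practice the alias rule reduces to something like the following sketch (the full logic lives in `ModelManager` inside `server.py`); `BIG_MODEL` and `SMALL_MODEL` stand in for whatever you configured in `.env`:

```python
# Simplified sketch of the alias-mapping rule from ModelManager in server.py.
# BIG_MODEL / SMALL_MODEL stand in for the values configured in your .env file.
BIG_MODEL = "gemini-1.5-pro-latest"
SMALL_MODEL = "gemini-1.5-flash-latest"

def map_model_alias(requested: str) -> str:
    """Map Claude Code model aliases to Gemini models and add the gemini/ prefix."""
    clean = requested[len("gemini/"):] if requested.startswith("gemini/") else requested
    lowered = clean.lower()
    if "haiku" in lowered:
        clean = SMALL_MODEL
    elif "sonnet" in lowered or "opus" in lowered:
        clean = BIG_MODEL
    return f"gemini/{clean}"  # LiteLLM routes the request to Gemini via this prefix

assert map_model_alias("claude-3-haiku-20240307") == "gemini/gemini-1.5-flash-latest"
assert map_model_alias("claude-3-opus-20240229") == "gemini/gemini-1.5-pro-latest"
assert map_model_alias("gemini/gemini-1.5-pro-latest") == "gemini/gemini-1.5-pro-latest"
```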
146 | 147 | ## Endpoints 148 | 149 | - `POST /v1/messages`: The primary endpoint for Claude Code to send messages to Gemini. It's fully compatible with the Anthropic Messages API specification that Claude Code uses. 150 | - `POST /v1/messages/count_tokens`: Allows Claude Code to estimate the token count for a set of messages, using Gemini's tokenization. 151 | - `GET /health`: Returns the health status of the proxy, including API key configuration, streaming settings, and basic API key validation. 152 | - `GET /test-connection`: Performs a quick API call to Gemini to verify connectivity and that your `GEMINI_API_KEY` is working. 153 | - `GET /`: Root endpoint providing a welcome message, current configuration summary (models, limits), and available endpoints. 154 | 155 | ## Error Handling & Troubleshooting 156 | 157 | ### Common Issues and Solutions 158 | 159 | **Streaming Errors (malformed chunks):** 160 | - The proxy automatically handles malformed JSON chunks from Gemini 161 | - If streaming becomes unstable, set `FORCE_DISABLE_STREAMING=true` as a temporary fix 162 | - Increase `MAX_STREAMING_RETRIES` for more resilient streaming 163 | 164 | **Gemini 500 Internal Server Errors:** 165 | - The proxy automatically retries with exponential backoff 166 | - These are temporary Gemini API issues that resolve automatically 167 | - Check `/health` endpoint to monitor API status 168 | 169 | **Connection Timeouts:** 170 | - Increase `REQUEST_TIMEOUT` if experiencing frequent timeouts 171 | - Check your internet connection and firewall settings 172 | - Use `/test-connection` endpoint to verify API connectivity 173 | 174 | **Rate Limiting:** 175 | - Monitor your Google AI Studio quota in the Google Cloud Console 176 | - The proxy will provide specific rate limit guidance in error messages 177 | 178 | ### Emergency Mode 179 | 180 | If you experience persistent issues: 181 | ```bash 182 | # Disable streaming temporarily 183 | export EMERGENCY_DISABLE_STREAMING=true 184 | 185 | # Or force disable all streaming 186 | export FORCE_DISABLE_STREAMING=true 187 | ``` 188 | 189 | ## Logging 190 | 191 | The server provides detailed logs, which are especially useful for understanding how Claude Code requests are translated for Gemini and for monitoring error recovery. Logs are colorized in TTY environments for easier reading. Adjust verbosity with the `LOG_LEVEL` environment variable: 192 | 193 | - `DEBUG`: Detailed request/response logging and error recovery steps 194 | - `INFO`: General operation logging 195 | - `WARNING`: Error recovery and fallback notifications (recommended) 196 | - `ERROR`: Only errors and failures 197 | - `CRITICAL`: Only critical failures 198 | 199 | ## The `CLAUDE.MD` File: Guiding Gemini for Claude Code 200 | 201 | The `CLAUDE.MD` file included in this repository is critical for achieving the best experience when using this proxy with Claude Code and Gemini. 202 | 203 | **Purpose:** 204 | 205 | - **Tailors Gemini to Claude Code's Needs**: Claude Code has specific ways it expects an LLM to behave, especially regarding tool use, file operations, and output formatting. `CLAUDE.MD` provides Gemini with explicit instructions on these expectations. 206 | - **Improves Tool Reliability**: By outlining how tools should be called and results interpreted, it helps Gemini make more effective use of Claude Code's capabilities. 
207 | - **Enhances Code Generation & Understanding**: Gives Gemini context about the development environment and coding standards, leading to better code suggestions within Claude Code. 208 | - **Reduces Misinterpretations**: Helps bridge any gaps between how Anthropic models might interpret Claude Code directives versus how Gemini might. 209 | 210 | **How Claude Code Uses It:** 211 | 212 | When you run `claude` in a project directory, the Claude Code CLI automatically looks for a `CLAUDE.MD` file in that directory. If found, its contents are prepended to the system prompt for every request sent to the LLM (in this case, your Gemini proxy). 213 | 214 | **Recommendation:** Always copy the `CLAUDE.MD` from this proxy's repository into the root of any project where you intend to use Claude Code with this Gemini proxy. This ensures Gemini receives these vital instructions for every session. 215 | 216 | ## Performance Tips 217 | 218 | - **Model Selection**: Use `gemini-1.5-flash-latest` for faster responses, `gemini-1.5-pro-latest` for more complex tasks 219 | - **Streaming**: Keep streaming enabled for better interactivity; the proxy handles errors automatically 220 | - **Timeouts**: Increase `REQUEST_TIMEOUT` for complex requests that need more processing time 221 | - **Retries**: Adjust `MAX_STREAMING_RETRIES` based on your network stability 222 | 223 | ## Contributing 224 | 225 | Contributions, issues, and feature requests are welcome! Please submit them on the GitHub repository. 226 | 227 | Areas where contributions are especially valuable: 228 | - Additional Gemini model support 229 | - Performance optimizations 230 | - Enhanced error recovery strategies 231 | - Documentation improvements 232 | 233 | ## Thanks 234 | 235 | This project was heavily inspired by and builds upon the foundational work of the [claude-code-proxy by @1rgs](https://github.com/1rgs/claude-code-proxy). Their original proxy was instrumental in demonstrating the viability of such a bridge. 236 | 237 | Special thanks to the community for testing and feedback on error handling improvements. 
238 | -------------------------------------------------------------------------------- /image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coffeegrind123/gemini-code/69bb0c18f7a3b8f8c448576a8480189a06a53ea4/image.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi[standard]>=0.115.11 2 | uvicorn>=0.34.0 3 | httpx>=0.25.0 4 | pydantic>=2.0.0 5 | litellm>=1.40.14 6 | python-dotenv>=1.0.0 -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, Request, HTTPException 2 | import uvicorn 3 | import logging 4 | import json 5 | import re 6 | import asyncio 7 | from pydantic import BaseModel, Field, field_validator 8 | from typing import List, Dict, Any, Optional, Union, Literal, Set 9 | import os 10 | from fastapi.responses import JSONResponse, StreamingResponse 11 | import litellm 12 | import uuid 13 | import time 14 | from dotenv import load_dotenv 15 | from datetime import datetime 16 | import sys 17 | 18 | # Load environment variables early 19 | load_dotenv() 20 | 21 | # Basic LiteLLM Configuration - conservative settings to avoid hanging 22 | litellm.drop_params = True 23 | litellm.set_verbose = False 24 | litellm.request_timeout = 90 25 | 26 | # Constants for better maintainability 27 | class Constants: 28 | ROLE_USER = "user" 29 | ROLE_ASSISTANT = "assistant" 30 | ROLE_SYSTEM = "system" 31 | ROLE_TOOL = "tool" 32 | 33 | CONTENT_TEXT = "text" 34 | CONTENT_IMAGE = "image" 35 | CONTENT_TOOL_USE = "tool_use" 36 | CONTENT_TOOL_RESULT = "tool_result" 37 | 38 | TOOL_FUNCTION = "function" 39 | 40 | STOP_END_TURN = "end_turn" 41 | STOP_MAX_TOKENS = "max_tokens" 42 | STOP_TOOL_USE = "tool_use" 43 | STOP_ERROR = "error" 44 | 45 | EVENT_MESSAGE_START = "message_start" 46 | EVENT_MESSAGE_STOP = "message_stop" 47 | EVENT_MESSAGE_DELTA = "message_delta" 48 | EVENT_CONTENT_BLOCK_START = "content_block_start" 49 | EVENT_CONTENT_BLOCK_STOP = "content_block_stop" 50 | EVENT_CONTENT_BLOCK_DELTA = "content_block_delta" 51 | EVENT_PING = "ping" 52 | 53 | DELTA_TEXT = "text_delta" 54 | DELTA_INPUT_JSON = "input_json_delta" 55 | 56 | # Simple Configuration 57 | class Config: 58 | def __init__(self): 59 | self.gemini_api_key = os.environ.get("GEMINI_API_KEY") 60 | if not self.gemini_api_key: 61 | raise ValueError("GEMINI_API_KEY not found in environment variables") 62 | 63 | self.big_model = os.environ.get("BIG_MODEL", "gemini-1.5-pro-latest") 64 | self.small_model = os.environ.get("SMALL_MODEL", "gemini-1.5-flash-latest") 65 | self.host = os.environ.get("HOST", "0.0.0.0") 66 | self.port = int(os.environ.get("PORT", "8082")) 67 | self.log_level = os.environ.get("LOG_LEVEL", "WARNING") 68 | self.max_tokens_limit = int(os.environ.get("MAX_TOKENS_LIMIT", "8192")) 69 | 70 | # Connection settings - conservative defaults 71 | self.request_timeout = int(os.environ.get("REQUEST_TIMEOUT", "90")) 72 | self.max_retries = int(os.environ.get("MAX_RETRIES", "2")) 73 | 74 | # Streaming settings 75 | self.max_streaming_retries = int(os.environ.get("MAX_STREAMING_RETRIES", "12")) 76 | self.force_disable_streaming = os.environ.get("FORCE_DISABLE_STREAMING", "false").lower() == "true" 77 | self.emergency_disable_streaming = os.environ.get("EMERGENCY_DISABLE_STREAMING", 
"false").lower() == "true" 78 | 79 | def validate_api_key(self): 80 | """Basic API key validation""" 81 | if not self.gemini_api_key: 82 | return False 83 | # Basic format check for Google API keys 84 | if not (self.gemini_api_key.startswith('AIza') and len(self.gemini_api_key) == 39): 85 | return False 86 | return True 87 | 88 | try: 89 | config = Config() 90 | print(f"✅ Configuration loaded: API_KEY={'*' * 20}..., BIG_MODEL='{config.big_model}', SMALL_MODEL='{config.small_model}'") 91 | except Exception as e: 92 | print(f"🔴 Configuration Error: {e}") 93 | sys.exit(1) 94 | 95 | # Apply connection settings to LiteLLM 96 | litellm.request_timeout = config.request_timeout 97 | litellm.num_retries = config.max_retries 98 | 99 | # Model Management 100 | class ModelManager: 101 | def __init__(self, config): 102 | self.config = config 103 | self.base_gemini_models = [ 104 | "gemini-1.5-pro-latest", 105 | "gemini-1.5-pro-preview-0514", 106 | "gemini-1.5-flash-latest", 107 | "gemini-1.5-flash-preview-0514", 108 | "gemini-pro", 109 | "gemini-2.5-pro-preview-05-06", 110 | "gemini-2.5-flash-preview-04-17", 111 | "gemini-2.0-flash-exp", 112 | "gemini-exp-1206" 113 | ] 114 | self._gemini_models = set(self.base_gemini_models) 115 | self._add_env_models() 116 | 117 | def _add_env_models(self): 118 | for model in [self.config.big_model, self.config.small_model]: 119 | if model.startswith("gemini") and model not in self._gemini_models: 120 | self._gemini_models.add(model) 121 | 122 | @property 123 | def gemini_models(self) -> List[str]: 124 | return sorted(list(self._gemini_models)) 125 | 126 | def validate_and_map_model(self, original_model: str) -> tuple[str, bool]: 127 | clean_model = self._clean_model_name(original_model) 128 | mapped_model = self._map_model_alias(clean_model) 129 | 130 | if mapped_model != clean_model: 131 | return f"gemini/{mapped_model}", True 132 | elif clean_model in self._gemini_models: 133 | return f"gemini/{clean_model}", True 134 | elif not original_model.startswith('gemini/'): 135 | return f"gemini/{original_model}", False 136 | else: 137 | return original_model, False 138 | 139 | def _clean_model_name(self, model: str) -> str: 140 | if model.startswith('gemini/'): 141 | return model[7:] 142 | elif model.startswith('anthropic/'): 143 | return model[10:] 144 | elif model.startswith('openai/'): 145 | return model[7:] 146 | return model 147 | 148 | def _map_model_alias(self, clean_model: str) -> str: 149 | model_lower = clean_model.lower() 150 | 151 | if 'haiku' in model_lower: 152 | return self.config.small_model 153 | elif 'sonnet' in model_lower or 'opus' in model_lower: 154 | return self.config.big_model 155 | 156 | return clean_model 157 | 158 | model_manager = ModelManager(config) 159 | 160 | # Logging Configuration 161 | logging.basicConfig( 162 | level=getattr(logging, config.log_level.upper()), 163 | format='%(asctime)s - %(levelname)s - %(message)s', 164 | ) 165 | logger = logging.getLogger(__name__) 166 | 167 | # Simple message filter 168 | class SimpleMessageFilter(logging.Filter): 169 | def filter(self, record): 170 | blocked_phrases = [ 171 | "LiteLLM completion()", 172 | "HTTP Request:", 173 | "cost_calculator" 174 | ] 175 | if hasattr(record, 'msg') and isinstance(record.msg, str): 176 | return not any(phrase in record.msg for phrase in blocked_phrases) 177 | return True 178 | 179 | root_logger = logging.getLogger() 180 | root_logger.addFilter(SimpleMessageFilter()) 181 | 182 | # Configure uvicorn to be quieter 183 | for uvicorn_logger in ["uvicorn", 
"uvicorn.access", "uvicorn.error"]: 184 | logging.getLogger(uvicorn_logger).setLevel(logging.WARNING) 185 | 186 | app = FastAPI(title="Gemini-to-Claude API Proxy", version="2.5.0") 187 | 188 | # Enhanced error classification 189 | def classify_gemini_error(error_msg: str) -> str: 190 | """Provide specific error guidance for common Gemini issues.""" 191 | error_lower = error_msg.lower() 192 | 193 | # Streaming/parsing errors 194 | if "error parsing chunk" in error_lower and "expecting property name" in error_lower: 195 | return "Gemini streaming parsing error (malformed JSON chunk). This is a known intermittent Gemini API issue. Please try again or disable streaming by setting stream=false." 196 | 197 | # Tool schema validation errors 198 | if "function_declarations" in error_lower and "format" in error_lower: 199 | if "only 'enum' and 'date-time' are supported" in error_lower: 200 | return "Tool schema error: Gemini only supports 'enum' and 'date-time' formats for string parameters. Remove other format types like 'url', 'email', 'uri', etc." 201 | else: 202 | return "Tool schema validation error. Check your tool parameter definitions for unsupported format types or properties." 203 | 204 | # Rate limiting 205 | elif "rate limit" in error_lower or "quota" in error_lower: 206 | return "Rate limit or quota exceeded. Please wait a moment and try again. Check your Google Cloud Console for quota limits." 207 | 208 | # Authentication issues 209 | elif "api key" in error_lower or "authentication" in error_lower or "unauthorized" in error_lower: 210 | return "API key error. Please check that your GEMINI_API_KEY is valid and has the necessary permissions." 211 | 212 | # Parsing/streaming issues 213 | elif "parsing" in error_lower or "json" in error_lower or "malformed" in error_lower: 214 | return "Response parsing error. This is often a temporary Gemini API issue - please retry your request." 215 | 216 | # Connection issues 217 | elif "connection" in error_lower or "timeout" in error_lower: 218 | return "Connection or timeout error. Please check your internet connection and try again." 219 | 220 | # Safety/content filtering 221 | elif "safety" in error_lower or "content" in error_lower and "filter" in error_lower: 222 | return "Content filtered by Gemini's safety systems. Please modify your request to comply with content policies." 223 | 224 | # Token/length issues 225 | elif "token" in error_lower and ("limit" in error_lower or "exceed" in error_lower): 226 | return "Token limit exceeded. Please reduce the length of your request or increase the max_tokens parameter." 
227 | 228 | # Default: return original message 229 | return error_msg 230 | 231 | # Enhanced schema cleaner 232 | def clean_gemini_schema(schema: Any) -> Any: 233 | """Recursively removes unsupported fields from a JSON schema for Gemini compatibility.""" 234 | if isinstance(schema, dict): 235 | # Remove fields unsupported by Gemini 236 | schema.pop("additionalProperties", None) 237 | schema.pop("default", None) 238 | 239 | # Handle string format restrictions 240 | if schema.get("type") == "string" and "format" in schema: 241 | allowed_formats = {"enum", "date-time"} 242 | if schema["format"] not in allowed_formats: 243 | logger.debug(f"Removing unsupported format '{schema['format']}' for string type in Gemini schema") 244 | schema.pop("format") 245 | 246 | # Recursively clean nested schemas 247 | for key, value in list(schema.items()): 248 | schema[key] = clean_gemini_schema(value) 249 | 250 | elif isinstance(schema, list): 251 | return [clean_gemini_schema(item) for item in schema] 252 | 253 | return schema 254 | 255 | # Pydantic Models 256 | class ContentBlockText(BaseModel): 257 | type: Literal["text"] 258 | text: str 259 | 260 | class ContentBlockImage(BaseModel): 261 | type: Literal["image"] 262 | source: Dict[str, Any] 263 | 264 | class ContentBlockToolUse(BaseModel): 265 | type: Literal["tool_use"] 266 | id: str 267 | name: str 268 | input: Dict[str, Any] 269 | 270 | class ContentBlockToolResult(BaseModel): 271 | type: Literal["tool_result"] 272 | tool_use_id: str 273 | content: Union[str, List[Dict[str, Any]], Dict[str, Any]] 274 | 275 | class SystemContent(BaseModel): 276 | type: Literal["text"] 277 | text: str 278 | 279 | class Message(BaseModel): 280 | role: Literal["user", "assistant"] 281 | content: Union[str, List[Union[ContentBlockText, ContentBlockImage, ContentBlockToolUse, ContentBlockToolResult]]] 282 | 283 | class Tool(BaseModel): 284 | name: str 285 | description: Optional[str] = None 286 | input_schema: Dict[str, Any] 287 | 288 | class ThinkingConfig(BaseModel): 289 | enabled: bool = True 290 | 291 | class MessagesRequest(BaseModel): 292 | model: str 293 | max_tokens: int 294 | messages: List[Message] 295 | system: Optional[Union[str, List[SystemContent]]] = None 296 | stop_sequences: Optional[List[str]] = None 297 | stream: Optional[bool] = False 298 | temperature: Optional[float] = 1.0 299 | top_p: Optional[float] = None 300 | top_k: Optional[int] = None 301 | metadata: Optional[Dict[str, Any]] = None 302 | tools: Optional[List[Tool]] = None 303 | tool_choice: Optional[Dict[str, Any]] = None 304 | thinking: Optional[ThinkingConfig] = None 305 | original_model: Optional[str] = None 306 | 307 | @field_validator('model') 308 | @classmethod 309 | def validate_model_field(cls, v, info): 310 | original_model = v 311 | mapped_model, was_mapped = model_manager.validate_and_map_model(v) 312 | 313 | logger.debug(f"📋 MODEL VALIDATION: Original='{original_model}', Big='{config.big_model}', Small='{config.small_model}'") 314 | 315 | if was_mapped: 316 | logger.debug(f"📌 MODEL MAPPING: '{original_model}' ➡️ '{mapped_model}'") 317 | 318 | if info and hasattr(info, 'data') and isinstance(info.data, dict): 319 | info.data['original_model'] = original_model 320 | 321 | return mapped_model 322 | 323 | class TokenCountRequest(BaseModel): 324 | model: str 325 | messages: List[Message] 326 | system: Optional[Union[str, List[SystemContent]]] = None 327 | tools: Optional[List[Tool]] = None 328 | thinking: Optional[ThinkingConfig] = None 329 | tool_choice: Optional[Dict[str, Any]] = None 
330 | original_model: Optional[str] = None 331 | 332 | @field_validator('model') 333 | @classmethod 334 | def validate_model_token_count(cls, v, info): 335 | mapped_model, _ = model_manager.validate_and_map_model(v) 336 | if info and hasattr(info, 'data') and isinstance(info.data, dict): 337 | info.data['original_model'] = v 338 | return mapped_model 339 | 340 | class TokenCountResponse(BaseModel): 341 | input_tokens: int 342 | 343 | class Usage(BaseModel): 344 | input_tokens: int 345 | output_tokens: int 346 | cache_creation_input_tokens: int = 0 347 | cache_read_input_tokens: int = 0 348 | 349 | class MessagesResponse(BaseModel): 350 | id: str 351 | model: str 352 | role: Literal["assistant"] = Constants.ROLE_ASSISTANT 353 | content: List[Union[ContentBlockText, ContentBlockToolUse]] 354 | type: Literal["message"] = "message" 355 | stop_reason: Optional[Literal["end_turn", "max_tokens", "stop_sequence", "tool_use", "error"]] = None 356 | stop_sequence: Optional[str] = None 357 | usage: Usage 358 | 359 | # Tool result parsing 360 | def parse_tool_result_content(content): 361 | """Parse and normalize tool result content into a string format.""" 362 | if content is None: 363 | return "No content provided" 364 | 365 | if isinstance(content, str): 366 | return content 367 | 368 | if isinstance(content, list): 369 | result_parts = [] 370 | for item in content: 371 | if isinstance(item, dict) and item.get("type") == Constants.CONTENT_TEXT: 372 | result_parts.append(item.get("text", "")) 373 | elif isinstance(item, str): 374 | result_parts.append(item) 375 | elif isinstance(item, dict): 376 | if "text" in item: 377 | result_parts.append(item.get("text", "")) 378 | else: 379 | try: 380 | result_parts.append(json.dumps(item)) 381 | except: 382 | result_parts.append(str(item)) 383 | return "\n".join(result_parts).strip() 384 | 385 | if isinstance(content, dict): 386 | if content.get("type") == Constants.CONTENT_TEXT: 387 | return content.get("text", "") 388 | try: 389 | return json.dumps(content) 390 | except: 391 | return str(content) 392 | 393 | try: 394 | return str(content) 395 | except: 396 | return "Unparseable content" 397 | 398 | # Enhanced message conversion 399 | def convert_anthropic_to_litellm(anthropic_request: MessagesRequest) -> Dict[str, Any]: 400 | """Convert Anthropic API request format to LiteLLM format for Gemini.""" 401 | litellm_messages = [] 402 | 403 | # System message handling 404 | if anthropic_request.system: 405 | system_text = "" 406 | if isinstance(anthropic_request.system, str): 407 | system_text = anthropic_request.system 408 | elif isinstance(anthropic_request.system, list): 409 | text_parts = [] 410 | for block in anthropic_request.system: 411 | if hasattr(block, 'type') and block.type == Constants.CONTENT_TEXT: 412 | text_parts.append(block.text) 413 | elif isinstance(block, dict) and block.get("type") == Constants.CONTENT_TEXT: 414 | text_parts.append(block.get("text", "")) 415 | system_text = "\n\n".join(text_parts) 416 | 417 | if system_text.strip(): 418 | litellm_messages.append({"role": Constants.ROLE_SYSTEM, "content": system_text.strip()}) 419 | 420 | # Process messages 421 | for msg in anthropic_request.messages: 422 | if isinstance(msg.content, str): 423 | litellm_messages.append({"role": msg.role, "content": msg.content}) 424 | continue 425 | 426 | # Process content blocks - accumulate different types 427 | text_parts = [] 428 | image_parts = [] 429 | tool_calls = [] 430 | pending_tool_messages = [] 431 | 432 | for block in msg.content: 433 | if 
block.type == Constants.CONTENT_TEXT: 434 | text_parts.append(block.text) 435 | elif block.type == Constants.CONTENT_IMAGE: 436 | if (isinstance(block.source, dict) and 437 | block.source.get("type") == "base64" and 438 | "media_type" in block.source and "data" in block.source): 439 | image_parts.append({ 440 | "type": "image_url", 441 | "image_url": { 442 | "url": f"data:{block.source['media_type']};base64,{block.source['data']}" 443 | } 444 | }) 445 | elif block.type == Constants.CONTENT_TOOL_USE and msg.role == Constants.ROLE_ASSISTANT: 446 | tool_calls.append({ 447 | "id": block.id, 448 | "type": Constants.TOOL_FUNCTION, 449 | Constants.TOOL_FUNCTION: { 450 | "name": block.name, 451 | "arguments": json.dumps(block.input) 452 | } 453 | }) 454 | elif block.type == Constants.CONTENT_TOOL_RESULT and msg.role == Constants.ROLE_USER: 455 | # CRITICAL: Split user message when tool_result is encountered 456 | if text_parts or image_parts: 457 | content_parts = [] 458 | text_content = "".join(text_parts).strip() 459 | if text_content: 460 | content_parts.append({"type": Constants.CONTENT_TEXT, "text": text_content}) 461 | content_parts.extend(image_parts) 462 | 463 | litellm_messages.append({ 464 | "role": Constants.ROLE_USER, 465 | "content": content_parts[0]["text"] if len(content_parts) == 1 and content_parts[0]["type"] == Constants.CONTENT_TEXT else content_parts 466 | }) 467 | text_parts.clear() 468 | image_parts.clear() 469 | 470 | # Add tool result as separate "tool" role message 471 | parsed_content = parse_tool_result_content(block.content) 472 | pending_tool_messages.append({ 473 | "role": Constants.ROLE_TOOL, 474 | "tool_call_id": block.tool_use_id, 475 | "content": parsed_content 476 | }) 477 | 478 | # Finalize message based on role 479 | if msg.role == Constants.ROLE_USER: 480 | # Add any remaining text/image content 481 | if text_parts or image_parts: 482 | content_parts = [] 483 | text_content = "".join(text_parts).strip() 484 | if text_content: 485 | content_parts.append({"type": Constants.CONTENT_TEXT, "text": text_content}) 486 | content_parts.extend(image_parts) 487 | 488 | litellm_messages.append({ 489 | "role": Constants.ROLE_USER, 490 | "content": content_parts[0]["text"] if len(content_parts) == 1 and content_parts[0]["type"] == Constants.CONTENT_TEXT else content_parts 491 | }) 492 | # Add any pending tool messages 493 | litellm_messages.extend(pending_tool_messages) 494 | 495 | elif msg.role == Constants.ROLE_ASSISTANT: 496 | assistant_msg = {"role": Constants.ROLE_ASSISTANT} 497 | 498 | # Handle content for assistant messages 499 | content_parts = [] 500 | text_content = "".join(text_parts).strip() 501 | if text_content: 502 | content_parts.append({"type": Constants.CONTENT_TEXT, "text": text_content}) 503 | content_parts.extend(image_parts) 504 | 505 | # FIXED: Don't set content to None - let LiteLLM handle missing content 506 | if content_parts: 507 | assistant_msg["content"] = content_parts[0]["text"] if len(content_parts) == 1 and content_parts[0]["type"] == Constants.CONTENT_TEXT else content_parts 508 | else: 509 | assistant_msg["content"] = None 510 | 511 | if tool_calls: 512 | assistant_msg["tool_calls"] = tool_calls 513 | 514 | # Only add message if it has actual content or tool calls 515 | if assistant_msg.get("content") or assistant_msg.get("tool_calls"): 516 | litellm_messages.append(assistant_msg) 517 | 518 | # Build final LiteLLM request 519 | litellm_request = { 520 | "model": anthropic_request.model, 521 | "messages": litellm_messages, 522 | 
"max_tokens": min(anthropic_request.max_tokens, config.max_tokens_limit), 523 | "temperature": anthropic_request.temperature, 524 | "stream": anthropic_request.stream, 525 | } 526 | 527 | # Add optional parameters 528 | if anthropic_request.stop_sequences: 529 | litellm_request["stop"] = anthropic_request.stop_sequences 530 | if anthropic_request.top_p is not None: 531 | litellm_request["top_p"] = anthropic_request.top_p 532 | if anthropic_request.top_k is not None: 533 | litellm_request["topK"] = anthropic_request.top_k 534 | 535 | # Add tools with schema cleaning 536 | if anthropic_request.tools: 537 | valid_tools = [] 538 | for tool in anthropic_request.tools: 539 | if tool.name and tool.name.strip(): 540 | cleaned_schema = clean_gemini_schema(tool.input_schema) 541 | valid_tools.append({ 542 | "type": Constants.TOOL_FUNCTION, 543 | Constants.TOOL_FUNCTION: { 544 | "name": tool.name, 545 | "description": tool.description or "", 546 | "parameters": cleaned_schema 547 | } 548 | }) 549 | if valid_tools: 550 | litellm_request["tools"] = valid_tools 551 | 552 | # Add tool choice configuration 553 | if anthropic_request.tool_choice: 554 | choice_type = anthropic_request.tool_choice.get("type") 555 | if choice_type == "auto": 556 | litellm_request["tool_choice"] = "auto" 557 | elif choice_type == "any": 558 | litellm_request["tool_choice"] = "auto" 559 | elif choice_type == "tool" and "name" in anthropic_request.tool_choice: 560 | litellm_request["tool_choice"] = { 561 | "type": Constants.TOOL_FUNCTION, 562 | Constants.TOOL_FUNCTION: {"name": anthropic_request.tool_choice["name"]} 563 | } 564 | else: 565 | litellm_request["tool_choice"] = "auto" 566 | 567 | # Add thinking configuration (Gemini specific) 568 | if anthropic_request.thinking is not None: 569 | if anthropic_request.thinking.enabled: 570 | litellm_request["thinkingConfig"] = {"thinkingBudget": 24576} 571 | else: 572 | litellm_request["thinkingConfig"] = {"thinkingBudget": 0} 573 | 574 | # Add user metadata if provided 575 | if (anthropic_request.metadata and 576 | "user_id" in anthropic_request.metadata and 577 | isinstance(anthropic_request.metadata["user_id"], str)): 578 | litellm_request["user"] = anthropic_request.metadata["user_id"] 579 | 580 | return litellm_request 581 | 582 | # Response conversion 583 | def convert_litellm_to_anthropic(litellm_response, original_request: MessagesRequest) -> MessagesResponse: 584 | """Convert LiteLLM (Gemini) response back to Anthropic API format.""" 585 | try: 586 | # Extract response data safely 587 | response_id = f"msg_{uuid.uuid4()}" 588 | content_text = "" 589 | tool_calls = None 590 | finish_reason = "stop" 591 | prompt_tokens = 0 592 | completion_tokens = 0 593 | 594 | # Handle LiteLLM ModelResponse object format 595 | if hasattr(litellm_response, 'choices') and hasattr(litellm_response, 'usage'): 596 | choices = litellm_response.choices 597 | message = choices[0].message if choices else None 598 | content_text = getattr(message, 'content', "") or "" 599 | tool_calls = getattr(message, 'tool_calls', None) 600 | finish_reason = choices[0].finish_reason if choices else "stop" 601 | response_id = getattr(litellm_response, 'id', response_id) 602 | 603 | if hasattr(litellm_response, 'usage'): 604 | usage = litellm_response.usage 605 | prompt_tokens = getattr(usage, "prompt_tokens", 0) 606 | completion_tokens = getattr(usage, "completion_tokens", 0) 607 | 608 | # Handle dictionary response format 609 | elif isinstance(litellm_response, dict): 610 | choices = litellm_response.get("choices", 
[]) 611 | message = choices[0].get("message", {}) if choices else {} 612 | content_text = message.get("content", "") or "" 613 | tool_calls = message.get("tool_calls") 614 | finish_reason = choices[0].get("finish_reason", "stop") if choices else "stop" 615 | usage = litellm_response.get("usage", {}) 616 | prompt_tokens = usage.get("prompt_tokens", 0) 617 | completion_tokens = usage.get("completion_tokens", 0) 618 | response_id = litellm_response.get("id", response_id) 619 | 620 | # Build content blocks 621 | content_blocks = [] 622 | 623 | # Add text content if present 624 | if content_text: 625 | content_blocks.append(ContentBlockText(type=Constants.CONTENT_TEXT, text=content_text)) 626 | 627 | # Process tool calls 628 | if tool_calls: 629 | if not isinstance(tool_calls, list): 630 | tool_calls = [tool_calls] 631 | 632 | for tool_call in tool_calls: 633 | try: 634 | # Extract tool call data from different formats 635 | if isinstance(tool_call, dict): 636 | tool_id = tool_call.get("id", f"tool_{uuid.uuid4()}") 637 | function_data = tool_call.get(Constants.TOOL_FUNCTION, {}) 638 | name = function_data.get("name", "") 639 | arguments_str = function_data.get("arguments", "{}") 640 | elif hasattr(tool_call, "id") and hasattr(tool_call, Constants.TOOL_FUNCTION): 641 | tool_id = tool_call.id 642 | name = tool_call.function.name 643 | arguments_str = tool_call.function.arguments 644 | else: 645 | continue 646 | 647 | if not name: 648 | continue 649 | 650 | # Parse tool arguments safely 651 | try: 652 | arguments_dict = json.loads(arguments_str) 653 | except json.JSONDecodeError: 654 | arguments_dict = {"raw_arguments": arguments_str} 655 | 656 | content_blocks.append(ContentBlockToolUse( 657 | type=Constants.CONTENT_TOOL_USE, 658 | id=tool_id, 659 | name=name, 660 | input=arguments_dict 661 | )) 662 | except Exception as e: 663 | logger.warning(f"Error processing tool call: {e}") 664 | continue 665 | 666 | # Ensure at least one content block 667 | if not content_blocks: 668 | content_blocks.append(ContentBlockText(type=Constants.CONTENT_TEXT, text="")) 669 | 670 | # Map finish reason to Anthropic format 671 | if finish_reason == "length": 672 | stop_reason = Constants.STOP_MAX_TOKENS 673 | elif finish_reason == "tool_calls": 674 | stop_reason = Constants.STOP_TOOL_USE 675 | elif finish_reason is None and tool_calls: 676 | stop_reason = Constants.STOP_TOOL_USE 677 | else: 678 | stop_reason = Constants.STOP_END_TURN 679 | 680 | return MessagesResponse( 681 | id=response_id, 682 | model=original_request.original_model or original_request.model, 683 | role=Constants.ROLE_ASSISTANT, 684 | content=content_blocks, 685 | stop_reason=stop_reason, 686 | stop_sequence=None, 687 | usage=Usage( 688 | input_tokens=prompt_tokens, 689 | output_tokens=completion_tokens 690 | ) 691 | ) 692 | 693 | except Exception as e: 694 | logger.error(f"Error converting response: {e}") 695 | return MessagesResponse( 696 | id=f"msg_error_{uuid.uuid4()}", 697 | model=original_request.original_model or original_request.model, 698 | role=Constants.ROLE_ASSISTANT, 699 | content=[ContentBlockText(type=Constants.CONTENT_TEXT, text="Response conversion error")], 700 | stop_reason=Constants.STOP_ERROR, 701 | usage=Usage(input_tokens=0, output_tokens=0) 702 | ) 703 | 704 | # Enhanced streaming handler with more robust error recovery 705 | async def handle_streaming_with_recovery(response_generator, original_request: MessagesRequest): 706 | """Enhanced streaming handler with robust error recovery for malformed chunks.""" 707 | message_id 
= f"msg_{uuid.uuid4().hex[:24]}" 708 | 709 | # Send initial SSE events 710 | yield f"event: {Constants.EVENT_MESSAGE_START}\ndata: {json.dumps({'type': Constants.EVENT_MESSAGE_START, 'message': {'id': message_id, 'type': 'message', 'role': Constants.ROLE_ASSISTANT, 'model': original_request.original_model or original_request.model, 'content': [], 'stop_reason': None, 'stop_sequence': None, 'usage': {'input_tokens': 0, 'output_tokens': 0}}})}\n\n" 711 | 712 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_START}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_START, 'index': 0, 'content_block': {'type': Constants.CONTENT_TEXT, 'text': ''}})}\n\n" 713 | 714 | yield f"event: {Constants.EVENT_PING}\ndata: {json.dumps({'type': Constants.EVENT_PING})}\n\n" 715 | 716 | # Streaming state management 717 | accumulated_text = "" 718 | text_block_index = 0 719 | tool_block_counter = 0 720 | current_tool_calls = {} 721 | input_tokens = 0 722 | output_tokens = 0 723 | final_stop_reason = Constants.STOP_END_TURN 724 | 725 | # Enhanced error recovery tracking 726 | consecutive_errors = 0 727 | max_consecutive_errors = 10 # Increased from 5 728 | stream_terminated_early = False 729 | malformed_chunks_count = 0 730 | max_malformed_chunks = 20 # Allow more malformed chunks before giving up 731 | 732 | # Buffer for incomplete chunks 733 | chunk_buffer = "" 734 | 735 | def is_malformed_chunk(chunk_str: str) -> bool: 736 | """Enhanced malformed chunk detection.""" 737 | if not chunk_str or not isinstance(chunk_str, str): 738 | return True 739 | 740 | chunk_stripped = chunk_str.strip() 741 | 742 | # Empty or whitespace 743 | if not chunk_stripped: 744 | return True 745 | 746 | # Single characters that indicate malformed JSON 747 | malformed_singles = ["{", "}", "[", "]", ",", ":", '"', "'"] 748 | if chunk_stripped in malformed_singles: 749 | return True 750 | 751 | # Common malformed patterns 752 | malformed_patterns = [ 753 | '{"', '"}', "[{", "}]", "{}", "[]", 754 | "null", '""', "''", " ", "", 755 | "{,", ",}", "[,", ",]" 756 | ] 757 | if chunk_stripped in malformed_patterns: 758 | return True 759 | 760 | # Incomplete JSON structures 761 | if chunk_stripped.startswith('{') and not chunk_stripped.endswith('}'): 762 | if len(chunk_stripped) < 15: # Very short incomplete JSON 763 | return True 764 | 765 | if chunk_stripped.startswith('[') and not chunk_stripped.endswith(']'): 766 | if len(chunk_stripped) < 10: 767 | return True 768 | 769 | # Check for obviously broken JSON patterns 770 | if chunk_stripped.count('{') != chunk_stripped.count('}'): 771 | if len(chunk_stripped) < 20: # Only for short chunks 772 | return True 773 | 774 | if chunk_stripped.count('[') != chunk_stripped.count(']'): 775 | if len(chunk_stripped) < 20: 776 | return True 777 | 778 | return False 779 | 780 | def try_parse_buffered_chunk(buffer: str) -> tuple[dict, str]: 781 | """Try to parse buffered chunks, return parsed chunk and remaining buffer.""" 782 | if not buffer.strip(): 783 | return None, "" 784 | 785 | # Try to find complete JSON objects in the buffer 786 | brace_count = 0 787 | start_pos = -1 788 | 789 | for i, char in enumerate(buffer): 790 | if char == '{': 791 | if start_pos == -1: 792 | start_pos = i 793 | brace_count += 1 794 | elif char == '}': 795 | brace_count -= 1 796 | if brace_count == 0 and start_pos != -1: 797 | # Found complete JSON object 798 | json_str = buffer[start_pos:i+1] 799 | try: 800 | parsed = json.loads(json_str) 801 | remaining_buffer = buffer[i+1:] 802 | return parsed, remaining_buffer 803 | 
except json.JSONDecodeError:
804 |                         continue
805 | 
806 |         # No complete JSON found
807 |         return None, buffer
808 | 
809 |     try:
810 |         # Wrap the entire streaming process in comprehensive error handling
811 |         stream_iterator = aiter(response_generator)
812 | 
813 |         while True:
814 |             try:
815 |                 # Get next chunk with timeout
816 |                 try:
817 |                     chunk = await asyncio.wait_for(anext(stream_iterator), timeout=90.0)
818 |                 except StopAsyncIteration:
819 |                     break
820 |                 except asyncio.TimeoutError:
821 |                     logger.warning("Streaming timeout, terminating")
822 |                     stream_terminated_early = True
823 |                     break
824 | 
825 |                 # Reset consecutive error counter on successful chunk retrieval
826 |                 consecutive_errors = 0
827 | 
828 |                 # Handle string chunks with enhanced validation
829 |                 if isinstance(chunk, str):
830 |                     if chunk.strip() == "[DONE]":
831 |                         break
832 | 
833 |                     # Check for malformed chunks
834 |                     if is_malformed_chunk(chunk):
835 |                         malformed_chunks_count += 1
836 |                         logger.debug(f"Skipping malformed chunk #{malformed_chunks_count}: '{chunk[:50]}{'...' if len(chunk) > 50 else ''}'")
837 | 
838 |                         if malformed_chunks_count > max_malformed_chunks:
839 |                             logger.error(f"Too many malformed chunks ({malformed_chunks_count}), terminating stream")
840 |                             stream_terminated_early = True
841 |                             break
842 |                         continue
843 | 
844 |                     # Add to buffer and try to parse
845 |                     chunk_buffer += chunk
846 |                     parsed_chunk, chunk_buffer = try_parse_buffered_chunk(chunk_buffer)
847 | 
848 |                     if parsed_chunk is None:
849 |                         # Keep buffering if we don't have a complete chunk yet
850 |                         if len(chunk_buffer) > 10000: # Prevent buffer from growing too large
851 |                             logger.warning("Chunk buffer too large, clearing")
852 |                             chunk_buffer = ""
853 |                         continue
854 | 
855 |                     chunk = parsed_chunk
856 | 
857 |                 # Dict and ModelResponse-style chunks are handled by the extraction logic below
858 |                 if isinstance(chunk, dict):
859 |                     # Dict chunk: fields are extracted below
860 |                     pass
861 |                 elif hasattr(chunk, 'choices'):
862 |                     # ModelResponse object: fields are extracted below
863 |                     pass
864 |                 else:
865 |                     # Try one more JSON parse attempt
866 |                     try:
867 |                         if isinstance(chunk, str):
868 |                             chunk = json.loads(chunk)
869 |                         else:
870 |                             logger.debug(f"Skipping unprocessable chunk type: {type(chunk)}")
871 |                             continue
872 |                     except json.JSONDecodeError as parse_error:
873 |                         logger.debug(f"Failed to parse chunk as JSON: {parse_error}")
874 |                         continue
875 | 
876 |                 # Extract delta data from the chunk
877 |                 delta_content_text = None
878 |                 delta_tool_calls = None
879 |                 chunk_finish_reason = None
880 | 
881 |                 if hasattr(chunk, 'choices') and chunk.choices:
882 |                     choice = chunk.choices[0]
883 |                     if hasattr(choice, 'delta') and choice.delta:
884 |                         delta = choice.delta
885 |                         delta_content_text = getattr(delta, 'content', None)
886 |                         if hasattr(delta, 'tool_calls'):
887 |                             delta_tool_calls = delta.tool_calls
888 |                     chunk_finish_reason = getattr(choice, 'finish_reason', None)
889 |                 elif isinstance(chunk, dict):
890 |                     choices = chunk.get("choices", [])
891 |                     if choices:
892 |                         choice = choices[0]
893 |                         delta = choice.get("delta", {})
894 |                         delta_content_text = delta.get("content")
895 |                         delta_tool_calls = delta.get("tool_calls")
896 |                         chunk_finish_reason = choice.get("finish_reason")
897 | 
898 |                 if hasattr(chunk, 'usage') and chunk.usage:
899 |                     input_tokens = getattr(chunk.usage, 'prompt_tokens', 0)
900 |                     output_tokens = getattr(chunk.usage, 'completion_tokens', 0)
901 |                 elif isinstance(chunk, dict) and "usage" in chunk:
902 |                     usage = chunk["usage"]
903 |                     input_tokens = usage.get("prompt_tokens", 0)
904 |                     output_tokens = usage.get("completion_tokens", 0)
905 | 
906 |                 # Handle text delta
907 |                 if delta_content_text:
908 |                     accumulated_text += delta_content_text
909 |                     yield f"event: {Constants.EVENT_CONTENT_BLOCK_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_DELTA, 'index': text_block_index, 'delta': {'type': Constants.DELTA_TEXT, 'text': delta_content_text}})}\n\n"
910 | 
911 |                 # Handle tool call deltas
912 |                 if delta_tool_calls:
913 |                     for tc_chunk in delta_tool_calls:
914 |                         if not (hasattr(tc_chunk, 'function') and tc_chunk.function and
915 |                                 hasattr(tc_chunk.function, 'name') and tc_chunk.function.name):
916 |                             continue
917 | 
918 |                         tool_call_id = tc_chunk.id
919 | 
920 |                         if tool_call_id not in current_tool_calls:
921 |                             tool_block_counter += 1
922 |                             tool_index = text_block_index + tool_block_counter
923 | 
924 |                             current_tool_calls[tool_call_id] = {
925 |                                 "index": tool_index,
926 |                                 "name": tc_chunk.function.name or "",
927 |                                 "args_buffer": ""  # first arguments are appended below, avoiding duplication
928 |                             }
929 | 
930 |                             yield f"event: {Constants.EVENT_CONTENT_BLOCK_START}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_START, 'index': tool_index, 'content_block': {'type': Constants.CONTENT_TOOL_USE, 'id': tool_call_id, 'name': current_tool_calls[tool_call_id]['name'], 'input': {}}})}\n\n"
931 | 
932 |                         if tc_chunk.function.arguments:
933 |                             current_tool_calls[tool_call_id]["args_buffer"] += tc_chunk.function.arguments
934 |                             yield f"event: {Constants.EVENT_CONTENT_BLOCK_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_DELTA, 'index': current_tool_calls[tool_call_id]['index'], 'delta': {'type': Constants.DELTA_INPUT_JSON, 'partial_json': tc_chunk.function.arguments}})}\n\n"
935 | 
936 |                 # Handle finish reason
937 |                 if chunk_finish_reason:
938 |                     if chunk_finish_reason == "length":
939 |                         final_stop_reason = Constants.STOP_MAX_TOKENS
940 |                     elif chunk_finish_reason == "tool_calls":
941 |                         final_stop_reason = Constants.STOP_TOOL_USE
942 |                     elif chunk_finish_reason == "stop":
943 |                         final_stop_reason = Constants.STOP_END_TURN
944 |                     else:
945 |                         final_stop_reason = Constants.STOP_END_TURN
946 |                     break
947 | 
948 |             except (json.JSONDecodeError, ValueError) as parse_error:
949 |                 consecutive_errors += 1
950 |                 logger.debug(f"JSON parsing error (attempt {consecutive_errors}/{max_consecutive_errors}): {parse_error}")
951 | 
952 |                 if consecutive_errors >= max_consecutive_errors:
953 |                     logger.error(f"Too many consecutive parsing errors ({consecutive_errors}), terminating stream")
954 |                     stream_terminated_early = True
955 |                     break
956 |                 continue
957 | 
958 |             except (litellm.exceptions.APIConnectionError, RuntimeError) as api_error:
959 |                 consecutive_errors += 1
960 |                 error_msg = str(api_error)
961 | 
962 |                 # Check for the specific malformed chunk error
963 |                 if ("Error parsing chunk" in error_msg and
964 |                     "Expecting property name enclosed in double quotes" in error_msg):
965 | 
966 |                     logger.warning(f"Gemini malformed chunk error (attempt {consecutive_errors}/{max_consecutive_errors})")
967 | 
968 |                     if consecutive_errors >= max_consecutive_errors:
969 |                         logger.error(f"Too many consecutive API errors ({consecutive_errors}), terminating stream")
970 |                         stream_terminated_early = True
971 | 
972 |                         # Send error info to client
973 |                         error_text = f"\n⚠️ Gemini streaming encountered repeated malformed chunks. 
This is a known API issue.\n" 974 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_DELTA, 'index': text_block_index, 'delta': {'type': Constants.DELTA_TEXT, 'text': error_text}})}\n\n" 975 | break 976 | 977 | # Brief delay before continuing 978 | await asyncio.sleep(0.1) 979 | continue 980 | else: 981 | # Other API errors - terminate immediately 982 | logger.error(f"API error: {api_error}") 983 | stream_terminated_early = True 984 | break 985 | 986 | except Exception as general_error: 987 | consecutive_errors += 1 988 | logger.error(f"Unexpected streaming error (attempt {consecutive_errors}/{max_consecutive_errors}): {general_error}") 989 | 990 | if consecutive_errors >= max_consecutive_errors: 991 | logger.error(f"Too many consecutive errors ({consecutive_errors}), terminating stream") 992 | stream_terminated_early = True 993 | break 994 | 995 | # Brief delay before continuing 996 | await asyncio.sleep(0.1) 997 | continue 998 | 999 | except Exception as outer_error: 1000 | logger.error(f"Fatal streaming error: {outer_error}") 1001 | stream_terminated_early = True 1002 | 1003 | # Always send final SSE events 1004 | try: 1005 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_STOP}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_STOP, 'index': text_block_index})}\n\n" 1006 | 1007 | for tool_data in current_tool_calls.values(): 1008 | yield f"event: {Constants.EVENT_CONTENT_BLOCK_STOP}\ndata: {json.dumps({'type': Constants.EVENT_CONTENT_BLOCK_STOP, 'index': tool_data['index']})}\n\n" 1009 | 1010 | if stream_terminated_early and final_stop_reason == Constants.STOP_END_TURN: 1011 | final_stop_reason = Constants.STOP_ERROR 1012 | 1013 | usage_data = {"input_tokens": input_tokens, "output_tokens": output_tokens} 1014 | yield f"event: {Constants.EVENT_MESSAGE_DELTA}\ndata: {json.dumps({'type': Constants.EVENT_MESSAGE_DELTA, 'delta': {'stop_reason': final_stop_reason, 'stop_sequence': None}, 'usage': usage_data})}\n\n" 1015 | yield f"event: {Constants.EVENT_MESSAGE_STOP}\ndata: {json.dumps({'type': Constants.EVENT_MESSAGE_STOP})}\n\n" 1016 | 1017 | # Log final statistics 1018 | if malformed_chunks_count > 0: 1019 | logger.info(f"Stream completed with {malformed_chunks_count} malformed chunks handled") 1020 | 1021 | except Exception as final_error: 1022 | logger.error(f"Error sending final SSE events: {final_error}") 1023 | 1024 | # Request Middleware 1025 | @app.middleware("http") 1026 | async def log_requests(request: Request, call_next): 1027 | method = request.method 1028 | path = request.url.path 1029 | logger.debug(f"Request: {method} {path}") 1030 | response = await call_next(request) 1031 | return response 1032 | 1033 | # Enhanced streaming retry logic for the main endpoint 1034 | @app.post("/v1/messages") 1035 | async def create_message(request: MessagesRequest, raw_request: Request): 1036 | try: 1037 | logger.debug(f"📊 Processing request: Original={request.original_model}, Effective={request.model}, Stream={request.stream}") 1038 | 1039 | # Check streaming configuration 1040 | if request.stream and config.emergency_disable_streaming: 1041 | logger.warning("Streaming disabled via EMERGENCY_DISABLE_STREAMING") 1042 | request.stream = False 1043 | 1044 | if request.stream and config.force_disable_streaming: 1045 | logger.info("Streaming disabled via FORCE_DISABLE_STREAMING") 1046 | request.stream = False 1047 | 1048 | # Convert request 1049 | litellm_request = convert_anthropic_to_litellm(request) 1050 | 
litellm_request["api_key"] = config.gemini_api_key 1051 | 1052 | # Log request details 1053 | num_tools = len(request.tools) if request.tools else 0 1054 | log_request_beautifully( 1055 | "POST", raw_request.url.path, 1056 | request.original_model or request.model, 1057 | litellm_request.get('model'), 1058 | len(litellm_request['messages']), 1059 | num_tools, 200 1060 | ) 1061 | 1062 | # Enhanced streaming with better retry logic 1063 | if request.stream: 1064 | streaming_retry_count = 0 1065 | max_retries = config.max_streaming_retries 1066 | 1067 | while streaming_retry_count <= max_retries: 1068 | try: 1069 | logger.debug(f"Attempting streaming (attempt {streaming_retry_count + 1}/{max_retries + 1})") 1070 | 1071 | # Add slight delay between retries 1072 | if streaming_retry_count > 0: 1073 | delay = min(0.5 * (2 ** streaming_retry_count), 2.0) # Exponential backoff, max 2s 1074 | logger.debug(f"Waiting {delay}s before retry...") 1075 | await asyncio.sleep(delay) 1076 | 1077 | response_generator = await litellm.acompletion(**litellm_request) 1078 | 1079 | return StreamingResponse( 1080 | handle_streaming_with_recovery(response_generator, request), 1081 | media_type="text/event-stream", 1082 | headers={ 1083 | "Cache-Control": "no-cache", 1084 | "Connection": "keep-alive", 1085 | "X-Accel-Buffering": "no", 1086 | "Access-Control-Allow-Origin": "*", 1087 | "Access-Control-Allow-Headers": "*" 1088 | } 1089 | ) 1090 | 1091 | except (litellm.exceptions.APIConnectionError, RuntimeError) as streaming_error: 1092 | streaming_retry_count += 1 1093 | error_msg = str(streaming_error) 1094 | 1095 | # Check for the specific malformed chunk error 1096 | if ("Error parsing chunk" in error_msg and 1097 | "Expecting property name enclosed in double quotes" in error_msg): 1098 | 1099 | if streaming_retry_count <= max_retries: 1100 | logger.warning(f"Gemini streaming chunk parsing error (attempt {streaming_retry_count}/{max_retries + 1}), retrying...") 1101 | continue 1102 | else: 1103 | logger.error(f"Gemini streaming failed after {max_retries + 1} attempts due to malformed chunks, falling back to non-streaming") 1104 | break 1105 | else: 1106 | # Other streaming errors - could be connection issues 1107 | if streaming_retry_count <= max_retries: 1108 | logger.warning(f"Streaming error (attempt {streaming_retry_count}/{max_retries + 1}): {error_msg}") 1109 | continue 1110 | else: 1111 | logger.error(f"Streaming failed after {max_retries + 1} attempts, falling back to non-streaming") 1112 | break 1113 | 1114 | except Exception as unexpected_error: 1115 | streaming_retry_count += 1 1116 | logger.error(f"Unexpected streaming error (attempt {streaming_retry_count}/{max_retries + 1}): {unexpected_error}") 1117 | 1118 | if streaming_retry_count <= max_retries: 1119 | continue 1120 | else: 1121 | logger.error(f"Streaming failed after {max_retries + 1} attempts due to unexpected errors, falling back to non-streaming") 1122 | break 1123 | 1124 | # If we get here, streaming failed - fall back to non-streaming 1125 | logger.info("Falling back to non-streaming mode") 1126 | litellm_request["stream"] = False 1127 | 1128 | # Non-streaming path (or fallback) 1129 | if not request.stream or litellm_request.get("stream") == False: 1130 | start_time = time.time() 1131 | litellm_response = await litellm.acompletion(**litellm_request) 1132 | logger.debug(f"✅ Response received: Model={litellm_request.get('model')}, Time={time.time() - start_time:.2f}s") 1133 | 1134 | anthropic_response = 
convert_litellm_to_anthropic(litellm_response, request) 1135 | return anthropic_response 1136 | 1137 | except litellm.exceptions.APIError as e: 1138 | logger.error(f"LiteLLM API Error: {e}") 1139 | error_msg = classify_gemini_error(str(e)) 1140 | raise HTTPException(status_code=getattr(e, 'status_code', 500), detail=error_msg) 1141 | except ConnectionError as e: 1142 | logger.error(f"Connection Error: {e}") 1143 | raise HTTPException(status_code=503, detail="Connection error. Please check your internet connection.") 1144 | except TimeoutError as e: 1145 | logger.error(f"Timeout Error: {e}") 1146 | raise HTTPException(status_code=504, detail="Request timeout. Please try again.") 1147 | except Exception as e: 1148 | logger.error(f"Error processing request: {e}") 1149 | error_msg = classify_gemini_error(str(e)) 1150 | raise HTTPException(status_code=500, detail=error_msg) 1151 | 1152 | @app.post("/v1/messages/count_tokens") 1153 | async def count_tokens(request: TokenCountRequest, raw_request: Request): 1154 | try: 1155 | # Create temporary request for conversion 1156 | temp_request = MessagesRequest( 1157 | model=request.model, 1158 | max_tokens=1, 1159 | messages=request.messages, 1160 | system=request.system, 1161 | tools=request.tools, 1162 | ) 1163 | 1164 | litellm_data = convert_anthropic_to_litellm(temp_request) 1165 | 1166 | # Log request 1167 | num_tools = len(request.tools) if request.tools else 0 1168 | log_request_beautifully( 1169 | "POST", raw_request.url.path, 1170 | request.original_model or request.model, 1171 | litellm_data.get('model'), 1172 | len(litellm_data['messages']), num_tools, 200 1173 | ) 1174 | 1175 | # Count tokens 1176 | token_count = litellm.token_counter( 1177 | model=litellm_data["model"], 1178 | messages=litellm_data["messages"], 1179 | ) 1180 | 1181 | return TokenCountResponse(input_tokens=token_count) 1182 | 1183 | except Exception as e: 1184 | logger.error(f"Error counting tokens: {str(e)}") 1185 | error_msg = classify_gemini_error(str(e)) 1186 | raise HTTPException(status_code=500, detail=f"Error counting tokens: {error_msg}") 1187 | 1188 | @app.get("/health") 1189 | async def health_check(): 1190 | try: 1191 | health_status = { 1192 | "status": "healthy", 1193 | "timestamp": datetime.now().isoformat(), 1194 | "version": "2.5.0", 1195 | "gemini_api_configured": bool(config.gemini_api_key), 1196 | "api_key_valid": config.validate_api_key(), 1197 | "streaming_config": { 1198 | "force_disabled": config.force_disable_streaming, 1199 | "emergency_disabled": config.emergency_disable_streaming, 1200 | "max_retries": config.max_streaming_retries 1201 | } 1202 | } 1203 | 1204 | return health_status 1205 | 1206 | except Exception as e: 1207 | logger.error(f"Health check error: {e}") 1208 | return JSONResponse( 1209 | status_code=503, 1210 | content={ 1211 | "status": "unhealthy", 1212 | "timestamp": datetime.now().isoformat(), 1213 | "error": "Health check failed" 1214 | } 1215 | ) 1216 | 1217 | @app.get("/test-connection") 1218 | async def test_connection(): 1219 | """Test API connectivity to Gemini""" 1220 | try: 1221 | # Simple test request to verify API connectivity 1222 | test_response = await litellm.acompletion( 1223 | model="gemini/gemini-1.5-flash-latest", 1224 | messages=[{"role": "user", "content": "Hello"}], 1225 | max_tokens=5, 1226 | api_key=config.gemini_api_key 1227 | ) 1228 | 1229 | return { 1230 | "status": "success", 1231 | "message": "Successfully connected to Gemini API", 1232 | "model_used": "gemini-1.5-flash-latest", 1233 | "timestamp": 
datetime.now().isoformat(), 1234 | "response_id": getattr(test_response, 'id', 'unknown') 1235 | } 1236 | 1237 | except litellm.exceptions.APIError as e: 1238 | logger.error(f"API connectivity test failed: {e}") 1239 | return JSONResponse( 1240 | status_code=503, 1241 | content={ 1242 | "status": "failed", 1243 | "error_type": "API Error", 1244 | "message": classify_gemini_error(str(e)), 1245 | "timestamp": datetime.now().isoformat(), 1246 | "suggestions": [ 1247 | "Check your GEMINI_API_KEY is valid", 1248 | "Verify your API key has the necessary permissions", 1249 | "Check if you have reached rate limits" 1250 | ] 1251 | } 1252 | ) 1253 | except Exception as e: 1254 | logger.error(f"Connection test failed: {e}") 1255 | return JSONResponse( 1256 | status_code=503, 1257 | content={ 1258 | "status": "failed", 1259 | "error_type": "Connection Error", 1260 | "message": classify_gemini_error(str(e)), 1261 | "timestamp": datetime.now().isoformat(), 1262 | "suggestions": [ 1263 | "Check your internet connection", 1264 | "Verify firewall settings allow HTTPS traffic", 1265 | "Try again in a few moments" 1266 | ] 1267 | } 1268 | ) 1269 | 1270 | @app.get("/") 1271 | async def root(): 1272 | return { 1273 | "message": f"Enhanced Gemini-to-Claude API Proxy v2.5.0", 1274 | "status": "running", 1275 | "config": { 1276 | "big_model": config.big_model, 1277 | "small_model": config.small_model, 1278 | "available_models": model_manager.gemini_models[:5], 1279 | "max_tokens_limit": config.max_tokens_limit, 1280 | "api_key_configured": bool(config.gemini_api_key), 1281 | "streaming": { 1282 | "force_disabled": config.force_disable_streaming, 1283 | "emergency_disabled": config.emergency_disable_streaming, 1284 | "max_retries": config.max_streaming_retries 1285 | } 1286 | }, 1287 | "endpoints": { 1288 | "messages": "/v1/messages", 1289 | "count_tokens": "/v1/messages/count_tokens", 1290 | "health": "/health", 1291 | "test_connection": "/test-connection" 1292 | } 1293 | } 1294 | 1295 | # Simple logging utilities 1296 | class Colors: 1297 | CYAN = "\033[96m" 1298 | BLUE = "\033[94m" 1299 | GREEN = "\033[92m" 1300 | YELLOW = "\033[93m" 1301 | RED = "\033[91m" 1302 | MAGENTA = "\033[95m" 1303 | RESET = "\033[0m" 1304 | BOLD = "\033[1m" 1305 | 1306 | def log_request_beautifully(method: str, path: str, requested_model: str, 1307 | gemini_model_used: str, num_messages: int, 1308 | num_tools: int, status_code: int): 1309 | if not sys.stdout.isatty(): 1310 | print(f"{method} {path} - {requested_model} -> {gemini_model_used} ({num_messages} messages, {num_tools} tools)") 1311 | return 1312 | 1313 | # Colorized logging for TTY 1314 | req_display = f"{Colors.CYAN}{requested_model}{Colors.RESET}" 1315 | gemini_display = f"{Colors.GREEN}{gemini_model_used.replace('gemini/', '')}{Colors.RESET}" 1316 | 1317 | endpoint = path.split("?")[0] if "?" 
in path else path 1318 | tools_str = f"{Colors.MAGENTA}{num_tools} tools{Colors.RESET}" 1319 | messages_str = f"{Colors.BLUE}{num_messages} messages{Colors.RESET}" 1320 | 1321 | if status_code == 200: 1322 | status_str = f"{Colors.GREEN}✓ {status_code} OK{Colors.RESET}" 1323 | else: 1324 | status_str = f"{Colors.RED}✗ {status_code}{Colors.RESET}" 1325 | 1326 | log_line = f"{Colors.BOLD}{method} {endpoint}{Colors.RESET} {status_str}" 1327 | model_line = f"Request: {req_display} → Gemini: {gemini_display} ({tools_str}, {messages_str})" 1328 | 1329 | print(log_line) 1330 | print(model_line) 1331 | sys.stdout.flush() 1332 | 1333 | def validate_startup(): 1334 | """Validate configuration and connectivity on startup""" 1335 | print("🔍 Validating startup configuration...") 1336 | 1337 | # Check API key 1338 | if not config.gemini_api_key: 1339 | print("🔴 FATAL: GEMINI_API_KEY is not set") 1340 | return False 1341 | 1342 | if not config.validate_api_key(): 1343 | print("⚠️ WARNING: API key format validation failed") 1344 | 1345 | # Check network connectivity (basic) 1346 | try: 1347 | import socket 1348 | socket.create_connection(("8.8.8.8", 53), timeout=10) 1349 | print("✅ Network connectivity: OK") 1350 | except OSError: 1351 | print("⚠️ WARNING: Network connectivity check failed") 1352 | 1353 | return True 1354 | 1355 | def main(): 1356 | if len(sys.argv) > 1 and sys.argv[1] == "--help": 1357 | print("Enhanced Gemini-to-Claude API Proxy v2.5.0") 1358 | print("") 1359 | print("Usage: uvicorn server:app --reload --host 0.0.0.0 --port 8082") 1360 | print("") 1361 | print("Required environment variables:") 1362 | print(" GEMINI_API_KEY - Your Google Gemini API key") 1363 | print("") 1364 | print("Optional environment variables:") 1365 | print(f" BIG_MODEL - Big model name (default: gemini-1.5-pro-latest)") 1366 | print(f" SMALL_MODEL - Small model name (default: gemini-1.5-flash-latest)") 1367 | print(f" HOST - Server host (default: 0.0.0.0)") 1368 | print(f" PORT - Server port (default: 8082)") 1369 | print(f" LOG_LEVEL - Logging level (default: WARNING)") 1370 | print(f" MAX_TOKENS_LIMIT - Token limit (default: 8192)") 1371 | print(f" REQUEST_TIMEOUT - Request timeout in seconds (default: 60)") 1372 | print(f" MAX_RETRIES - Maximum retries (default: 2)") 1373 | print(f" MAX_STREAMING_RETRIES - Maximum streaming retries (default: 2)") 1374 | print(f" FORCE_DISABLE_STREAMING - Force disable streaming (default: false)") 1375 | print(f" EMERGENCY_DISABLE_STREAMING - Emergency disable streaming (default: false)") 1376 | print("") 1377 | print("Available Gemini models:") 1378 | for model in model_manager.gemini_models: 1379 | print(f" - {model}") 1380 | sys.exit(0) 1381 | 1382 | # Validate startup configuration 1383 | if not validate_startup(): 1384 | print("🔴 Startup validation failed. 
Please check your configuration.") 1385 | sys.exit(1) 1386 | 1387 | # Configuration summary 1388 | print("🚀 Enhanced Gemini-to-Claude API Proxy v2.5.0") 1389 | print(f"✅ Configuration loaded successfully") 1390 | print(f" Big Model: {config.big_model}") 1391 | print(f" Small Model: {config.small_model}") 1392 | print(f" Available Models: {len(model_manager.gemini_models)}") 1393 | print(f" Max Tokens Limit: {config.max_tokens_limit}") 1394 | print(f" Request Timeout: {config.request_timeout}s") 1395 | print(f" Max Retries: {config.max_retries}") 1396 | print(f" Max Streaming Retries: {config.max_streaming_retries}") 1397 | print(f" Force Disable Streaming: {config.force_disable_streaming}") 1398 | print(f" Emergency Disable Streaming: {config.emergency_disable_streaming}") 1399 | print(f" Log Level: {config.log_level}") 1400 | print(f" Server: {config.host}:{config.port}") 1401 | print("") 1402 | 1403 | # Start server 1404 | uvicorn.run( 1405 | app, 1406 | host=config.host, 1407 | port=config.port, 1408 | log_level=config.log_level.lower() 1409 | ) 1410 | 1411 | if __name__ == "__main__": 1412 | main() 1413 | --------------------------------------------------------------------------------