├── LICENSE
├── README.md
├── api
│   ├── __init__.py
│   ├── hello.py
│   ├── servers
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── gemini.py
│   │   └── generic.py
│   └── v1
│       └── __init__.py
├── main.py
├── package.json
├── public
│   ├── __init__.py
│   ├── favicon.ico
│   ├── flow.png
│   ├── usage.py
│   └── vercel.png
├── requirements.txt
├── tests
│   ├── config.py
│   ├── test_async_api.py
│   ├── test_gemini_flow.py
│   └── test_sync_api.py
└── vercel.json

/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) [2024] [ultrasev]
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | 6 | 7 | 

 8 | [![Deploy on Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fultrasev%2Fllmproxy-vercel)
 9 | # LLM API Reverse Proxy
 20 | 
 21 | This project provides a reverse-proxy service that solves the problem of the Google, Groq, Cerebras (Amazon CloudFront) and other platform APIs being unreachable from certain countries and regions.
 22 | 
 23 | # Features
 24 | 
 25 | Reverse-proxies API requests to OpenAI, Groq, Google, Cerebras and other platforms through the Vercel edge network.
 26 | 
 27 | - Supported providers: Groq, Google, OpenAI, Cerebras, NVIDIA, Mistral, Sambanova
 28 | - Streaming output supported
 29 | - Compatible with the OpenAI API specification
 30 | 
 31 | Note: the vercel.app domain cannot be reached directly from mainland China. If you need direct access, see the author's earlier project [llmproxy](https://github.com/ultrasev/llmproxy), which deploys an LLM API reverse proxy on a Cloudflare Worker.
 32 | 
 33 | # Usage
 34 | After deploying to Vercel, your own API address is https://your-project-name.vercel.app/.
 35 | Test API addresses:
 36 | - https://llmproxy-vercel.vercel.app/ : the address served by Vercel; not directly reachable from mainland China, so a VPN or proxy is required.
 37 | - https://llm.cufo.cc/ : a reverse proxy the author runs on Cloudflare plus a US VPS; directly reachable from mainland China and similar regions. Request path: local -> CF(HK) -> VPS(US) -> OpenAI.
 38 | 
 39 | Neither address takes the `/v1` suffix, i.e. use `base_url="https://llm.cufo.cc/openai"` or `chat_url="https://llm.cufo.cc/openai/chat/completions"`.
 40 | 
 41 | ## Example 1: OpenAI
 42 | 
 43 | ```python
 44 | from openai import OpenAI
 45 | 
 46 | client = OpenAI(
 47 |     api_key="sk-proj-...",
 48 |     base_url="https://llmproxy-vercel.vercel.app/openai",  # no /v1 suffix
 49 | )
 50 | 
 51 | response = client.chat.completions.create(
 52 |     model="gpt-3.5-turbo",
 53 |     messages=[{"role": "user", "content": "Hello world!"}],
 54 | )
 55 | 
 56 | print(response.choices[0].message.content)
 57 | ```
 58 | 
 59 | ## Example 2: Google Gemini
 60 | 
 61 | ```python
 62 | from openai import OpenAI
 63 | 
 64 | client = OpenAI(
 65 |     api_key="...",
 66 |     base_url="https://llmproxy-vercel.vercel.app/gemini",
 67 | )
 68 | 
 69 | response = client.chat.completions.create(
 70 |     model="gemini-1.5-flash",
 71 |     messages=[{"role": "user", "content": "Hello world!"}],
 72 | )
 73 | 
 74 | print(response.choices[0].message.content)
 75 | ```
 76 | 
 77 | ## Example 3: Cerebras
 78 | 
 79 | ```bash
 80 | curl --location 'https://llmproxy-vercel.vercel.app/cerebras/chat/completions' \
 81 | --header 'Content-Type: application/json' \
 82 | --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
 83 | --data '{
 84 |   "model": "llama3.1-8b",
 85 |   "stream": false,
 86 |   "messages": [{"content": "why is fast inference important?", "role": "user"}],
 87 |   "temperature": 0,
 88 |   "max_tokens": 1024,
 89 |   "seed": 0,
 90 |   "top_p": 1
 91 | }'
 92 | ```
 93 | 
 94 | # One-click deployment on Vercel
 95 | 
 96 | [![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fultrasev%2Fllmproxy-vercel)
 97 | 
 98 | # Local development and testing
 99 | 
100 | ```bash
101 | pip3 install -r requirements.txt
102 | pip3 install uvicorn
103 | uvicorn main:app --host 0.0.0.0 --port 3000 --reload
104 | ```
105 | 
106 | # License
107 | 
108 | Copyright © 2024 [ultrasev](https://github.com/ultrasev).
109 | This project is [MIT](LICENSE) licensed. 110 | 111 | # Support me 112 | 113 | [!["Buy Me A Coffee"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/ultrasev) 114 | 115 | ## Contacts 116 | 117 | - [![Twitter Follow](https://img.shields.io/twitter/follow/ultrasev?style=social)](https://twitter.com/slippertopia) 118 | - [![YouTube Channel Subscribers](https://img.shields.io/youtube/channel/subscribers/UCt0Op8mQvqwjp18B8vNPjzg?style=social)](https://www.youtube.com/channel/UCt0Op8mQvqwjp18B8vNPjzg) 119 | -------------------------------------------------------------------------------- /api/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/api/__init__.py -------------------------------------------------------------------------------- /api/hello.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from fastapi.routing import APIRouter 3 | router = APIRouter() 4 | 5 | 6 | @router.get("/") 7 | def read_root(): 8 | return {"Hello": "World"} 9 | -------------------------------------------------------------------------------- /api/servers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/api/servers/__init__.py -------------------------------------------------------------------------------- /api/servers/base.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel, Field 2 | import httpx 3 | import asyncio 4 | from typing import List, Dict, Optional 5 | 6 | 7 | class Message(BaseModel): 8 | role: str 9 | content: str 10 | 11 | 12 | class OpenAIProxyArgs(BaseModel): 13 | model: str 14 | messages: List[Message] 15 | stream: bool = False 16 | temperature: float = Field(default=0.7, ge=0, le=2) 17 | top_p: float = Field(default=1, ge=0, le=1) 18 | n: int = Field(default=1, ge=1) 19 | max_tokens: Optional[int] = None 20 | presence_penalty: float = Field(default=0, ge=-2, le=2) 21 | frequency_penalty: float = Field(default=0, ge=-2, le=2) 22 | 23 | 24 | async def stream_openai_response(endpoint: str, payload: Dict, headers: Dict): 25 | async with httpx.AsyncClient() as client: 26 | async with client.stream("POST", endpoint, json=payload, headers=headers) as response: 27 | async for line in response.aiter_lines(): 28 | if line.startswith("data: "): 29 | yield line + "\n\n" 30 | elif line.strip() == "data: [DONE]": 31 | break 32 | -------------------------------------------------------------------------------- /api/servers/gemini.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | ''' Convert Gemini API to OpenAI API format 3 | 4 | Gemini API docs: 5 | - https://ai.google.dev/gemini-api/docs/text-generation?lang=rest 6 | ''' 7 | from loguru import logger 8 | from pydantic import BaseModel 9 | from fastapi import APIRouter, HTTPException, Header, Query 10 | from fastapi.responses import JSONResponse, StreamingResponse 11 | import httpx 12 | import typing 13 | from typing import List, Dict, Optional 14 | from .base import Message 15 | import time 16 | import json 17 | import re 18 | 19 | router = APIRouter() 20 | 21 | 22 | GEMINI_ENDPOINT = 
"https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent" 23 | GEMINI_STREAM_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{}:streamGenerateContent" 24 | 25 | 26 | class OpenAIProxyArgs(BaseModel): 27 | model: str 28 | messages: List[Dict[str, str]] 29 | stream: bool = False 30 | temperature: float = 0.7 31 | top_p: float = 1 32 | n: int = 1 33 | max_tokens: Optional[int] = None 34 | presence_penalty: float = 0 35 | frequency_penalty: float = 0 36 | 37 | 38 | class MessageConverter: 39 | def __init__(self, messages: List[Dict[str, str]]): 40 | self.messages = messages 41 | 42 | def convert(self) -> List[Dict[str, str]]: 43 | converted_messages = [] 44 | for message in self.messages: 45 | role = "user" if message["role"] == "user" else "model" 46 | converted_messages.append({ 47 | "role": role, 48 | "parts": [{"text": message["content"]}] 49 | }) 50 | return converted_messages 51 | 52 | 53 | def convert_gemini_to_openai_response(gemini_response: dict, model: str) -> dict: 54 | """Convert Gemini API response to OpenAI-compatible format.""" 55 | return { 56 | "id": gemini_response.get("candidates", [{}])[0].get("content", {}).get("role", ""), 57 | "object": "chat.completion", 58 | "created": int(time.time()), 59 | "model": model, 60 | "usage": { 61 | "prompt_tokens": 0, # Gemini doesn't provide token counts 62 | "completion_tokens": 0, 63 | "total_tokens": 0 64 | }, 65 | "choices": [{ 66 | "message": { 67 | "role": "assistant", 68 | "content": gemini_response.get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "") 69 | }, 70 | "finish_reason": "stop", 71 | "index": 0 72 | }] 73 | } 74 | 75 | 76 | async def stream_gemini_response(model: str, payload: dict, api_key: str): 77 | text_pattern = re.compile(r'"text": "(.*?)"') 78 | 79 | async with httpx.AsyncClient() as client: 80 | async with client.stream( 81 | "POST", 82 | GEMINI_STREAM_ENDPOINT.format(model), 83 | json=payload, 84 | headers={ 85 | "Content-Type": "application/json", 86 | "x-goog-api-key": api_key 87 | } 88 | ) as response: 89 | async for line in response.aiter_lines(): 90 | line = line.strip() 91 | match = text_pattern.search(line) 92 | if match: 93 | text_content = match.group(1) 94 | text_content = json.loads(f'"{text_content}"') 95 | 96 | openai_format = { 97 | "id": f"chatcmpl-{int(time.time())}", 98 | "object": "chat.completion.chunk", 99 | "created": int(time.time()), 100 | "model": model, 101 | "choices": [{ 102 | "index": 0, 103 | "delta": { 104 | "content": text_content 105 | }, 106 | "finish_reason": None 107 | }] 108 | } 109 | 110 | yield f"data: {json.dumps(openai_format, ensure_ascii=False)}\n\n" 111 | 112 | final_chunk = { 113 | "id": f"chatcmpl-{int(time.time())}", 114 | "object": "chat.completion.chunk", 115 | "created": int(time.time()), 116 | "model": model, 117 | "choices": [{ 118 | "index": 0, 119 | "delta": {}, 120 | "finish_reason": "stop" 121 | }] 122 | } 123 | yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n" 124 | yield "data: [DONE]\n\n" 125 | 126 | 127 | @router.post("/chat/completions") 128 | async def proxy_chat_completions( 129 | args: OpenAIProxyArgs, 130 | authorization: str = Header(...), 131 | ): 132 | api_key = authorization.split(" ")[1] 133 | model = args.model 134 | 135 | if not api_key: 136 | raise HTTPException(status_code=400, detail="API key not provided") 137 | 138 | # Transform args into Gemini API format 139 | gemini_payload = { 140 | "contents": MessageConverter(args.messages).convert(), 141 | 
"safetySettings": [ 142 | { 143 | "category": "HARM_CATEGORY_DANGEROUS_CONTENT", 144 | "threshold": "BLOCK_ONLY_HIGH" 145 | } 146 | ], 147 | "generationConfig": { 148 | "temperature": args.temperature, 149 | "maxOutputTokens": args.max_tokens, 150 | "topP": args.top_p, 151 | "topK": 10 152 | } 153 | } 154 | 155 | if args.stream: 156 | return StreamingResponse(stream_gemini_response(model, gemini_payload, api_key), media_type="text/event-stream") 157 | else: 158 | async with httpx.AsyncClient() as client: 159 | response = await client.post( 160 | GEMINI_ENDPOINT.format(model), 161 | json=gemini_payload, 162 | headers={ 163 | "Content-Type": "application/json", 164 | "x-goog-api-key": api_key 165 | } 166 | ) 167 | 168 | if response.status_code != 200: 169 | return JSONResponse(content=response.json(), status_code=response.status_code) 170 | 171 | response_json = response.json() 172 | 173 | # Use the new conversion function 174 | openai_compatible_response = convert_gemini_to_openai_response( 175 | response_json, args.model) 176 | 177 | return JSONResponse(openai_compatible_response) 178 | -------------------------------------------------------------------------------- /api/servers/generic.py: -------------------------------------------------------------------------------- 1 | from fastapi import APIRouter, Header, HTTPException 2 | from fastapi.responses import JSONResponse, StreamingResponse 3 | from pydantic import BaseModel 4 | import httpx 5 | from typing import Dict 6 | from .base import stream_openai_response, OpenAIProxyArgs 7 | 8 | router = APIRouter() 9 | 10 | PLATFORM_API_URLS: Dict[str, str] = { 11 | "openai": "https://api.openai.com/v1/chat/completions", 12 | "mistral": "https://api.mistral.ai/v1/chat/completions", 13 | "groq": "https://api.groq.com/openai/v1/chat/completions", 14 | "cerebras": "https://api.cerebras.ai/v1/chat/completions", 15 | "nvidia": "https://integrate.api.nvidia.com/v1/chat/completions", 16 | "sambanova": "https://api.sambanova.ai/v1/chat/completions", 17 | } 18 | 19 | 20 | @router.post("/{platform}/chat/completions") 21 | async def proxy_chat_completions(platform: str, args: OpenAIProxyArgs, authorization: str = Header(...)): 22 | if platform not in PLATFORM_API_URLS: 23 | raise HTTPException( 24 | status_code=404, detail=f"Platform '{platform}' not supported") 25 | 26 | api_url = PLATFORM_API_URLS[platform] 27 | api_key = authorization.split(" ")[1] 28 | headers = { 29 | "Authorization": f"Bearer {api_key}", 30 | "Content-Type": "application/json" 31 | } 32 | payload = args.dict(exclude_none=True) 33 | 34 | if args.stream: 35 | return StreamingResponse( 36 | stream_openai_response(api_url, payload, headers), 37 | media_type="text/event-stream", 38 | headers={"X-Content-Type-Options": "nosniff", 39 | "X-Experimental-Stream-Data": "true"} 40 | ) 41 | else: 42 | async with httpx.AsyncClient() as client: 43 | try: 44 | response = await client.post(api_url, json=payload, headers=headers) 45 | response.raise_for_status() 46 | return JSONResponse(response.json()) 47 | except httpx.HTTPStatusError as e: 48 | raise HTTPException( 49 | status_code=e.response.status_code, detail=str(e.response.text)) 50 | except Exception as e: 51 | raise HTTPException(status_code=500, detail=str(e)) 52 | -------------------------------------------------------------------------------- /api/v1/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/api/v1/__init__.py -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | from public.usage import USAGE as html 3 | from api.hello import router as hello_router 4 | from fastapi import FastAPI 5 | from fastapi.responses import Response 6 | from api.servers.generic import router as generic_router 7 | from api.servers.gemini import router as gemini_router 8 | from fastapi.middleware.cors import CORSMiddleware 9 | app = FastAPI() 10 | 11 | app.include_router(hello_router, prefix="/hello") 12 | app.include_router(gemini_router, prefix="/gemini") 13 | app.include_router(generic_router, prefix="") # put generic last 14 | 15 | app.add_middleware( 16 | CORSMiddleware, 17 | allow_credentials=True, 18 | allow_methods=["*"], 19 | allow_headers=["*"], 20 | expose_headers=[ "X-Experimental-Stream-Data"], # this is needed for streaming data header to be read by the client 21 | ) 22 | 23 | @app.get("/") 24 | def _root(): 25 | return Response(content=html, media_type="text/html") 26 | -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "engines": { 3 | "node": "18.x" 4 | } 5 | } -------------------------------------------------------------------------------- /public/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/__init__.py -------------------------------------------------------------------------------- /public/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/favicon.ico -------------------------------------------------------------------------------- /public/flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/flow.png -------------------------------------------------------------------------------- /public/usage.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | USAGE = """ 4 | 5 | 6 | 7 | 8 | Usage 9 | 40 | 41 | 42 |
43 | success
44 | Usage
45 | Visit the GitHub doc for more information.
46 | 
47 | 48 | 49 | 50 | """ 51 | -------------------------------------------------------------------------------- /public/vercel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/vercel.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi==0.88.0 2 | pydantic~=1.10.4 3 | python-multipart==0.0.5 4 | expiringdict==1.2.2 5 | rich==13.4.2 6 | openai==1.6.1 7 | httpx==0.27.0 8 | loguru==0.7.2 -------------------------------------------------------------------------------- /tests/config.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | PRODUCTION_API_ENDPOINT = random.choice(["https://llmproxy-vercel.vercel.app", "https://llm.cufo.cc"]) 4 | DEVELOPMENT_API_ENDPOINT = "http://192.168.31.46:3000" 5 | -------------------------------------------------------------------------------- /tests/test_async_api.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import pytest 3 | import os 4 | from dotenv import load_dotenv 5 | from openai import AsyncOpenAI 6 | import random 7 | from loguru import logger 8 | from config import PRODUCTION_API_ENDPOINT, DEVELOPMENT_API_ENDPOINT 9 | load_dotenv() 10 | 11 | 12 | def api_endpoint(): 13 | env = os.environ.get('ENV', 'development') 14 | if env == 'production': 15 | return PRODUCTION_API_ENDPOINT 16 | elif env == 'development': 17 | return DEVELOPMENT_API_ENDPOINT 18 | else: 19 | raise ValueError(f"Invalid environment: {env}") 20 | 21 | 22 | BASE_URL = api_endpoint() 23 | logger.info(f"BASE_URL: {BASE_URL}") 24 | 25 | 26 | async def make_request(supplier: str, api_key: str, model: str): 27 | BASE_URL = api_endpoint() + f"/{supplier}" 28 | query = "Count from 1 to 5" 29 | 30 | client = AsyncOpenAI(base_url=BASE_URL, api_key=api_key) 31 | 32 | try: 33 | stream = await client.chat.completions.create( 34 | model=model, 35 | messages=[{"role": "user", "content": query}], 36 | stream=True, 37 | ) 38 | 39 | content = "" 40 | async for chunk in stream: 41 | delta_content = chunk.choices[0].delta.content 42 | if delta_content: 43 | content += delta_content 44 | print(f"Received chunk: {delta_content}") # Debug print 45 | 46 | print(f"Full content: {content}") # Debug print 47 | 48 | if not content: 49 | raise ValueError("Received empty content from API") 50 | 51 | for i in range(1, 6): 52 | assert str( 53 | i) in content, f"Expected {i} in content, but it's missing. 
Content: {content}" 54 | 55 | except Exception as e: 56 | print(f"Error occurred: {str(e)}") 57 | raise 58 | 59 | 60 | @pytest.mark.asyncio 61 | async def test_openai_streaming(): 62 | await make_request( 63 | supplier="openai", 64 | api_key=os.environ["OPENAI_API_KEY"], 65 | model="gpt-3.5-turbo" 66 | ) 67 | 68 | 69 | @pytest.mark.asyncio 70 | async def test_groq_streaming(): 71 | await make_request( 72 | supplier="groq", 73 | api_key=os.environ["GROQ_API_KEY"], 74 | model="llama3-70b-8192" 75 | ) 76 | 77 | 78 | @pytest.mark.asyncio 79 | async def test_gemini_streaming(): 80 | await make_request( 81 | supplier="gemini", 82 | api_key=os.environ["GEMINI_API_KEY"], 83 | model="gemini-1.5-flash" 84 | ) 85 | 86 | 87 | @pytest.mark.asyncio 88 | async def test_cerebras_streaming(): 89 | await make_request( 90 | supplier="cerebras", 91 | api_key=os.environ["CEREBRAS_API_KEY"], 92 | model="llama3.1-8b" 93 | ) 94 | 95 | 96 | @pytest.mark.asyncio 97 | async def test_nvidia_streaming(): 98 | await make_request( 99 | supplier="nvidia", 100 | api_key=os.environ["NVIDIA_API_KEY"], 101 | model="meta/llama-3.2-3b-instruct" 102 | ) 103 | 104 | 105 | @pytest.mark.asyncio 106 | async def test_mistral(): 107 | await make_request( 108 | supplier="mistral", 109 | api_key=os.environ["MISTRAL_API_KEY"], 110 | model="mistral-large-latest", 111 | ) 112 | 113 | 114 | @pytest.mark.asyncio 115 | async def test_sambanova(): 116 | await make_request( 117 | supplier="sambanova", 118 | api_key=os.environ["SAMBANOVA_API_KEY"], 119 | model="Meta-Llama-3.1-405B-Instruct", 120 | ) 121 | -------------------------------------------------------------------------------- /tests/test_gemini_flow.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import pytest 3 | import os 4 | from dotenv import load_dotenv 5 | from openai import AsyncOpenAI 6 | import random 7 | from loguru import logger 8 | from config import PRODUCTION_API_ENDPOINT, DEVELOPMENT_API_ENDPOINT 9 | load_dotenv() 10 | 11 | 12 | def api_endpoint(): 13 | env = os.environ.get('ENV', 'development') 14 | if env == 'production': 15 | return PRODUCTION_API_ENDPOINT 16 | elif env == 'development': 17 | return DEVELOPMENT_API_ENDPOINT 18 | else: 19 | raise ValueError(f"Invalid environment: {env}") 20 | 21 | 22 | BASE_URL = api_endpoint() 23 | logger.info(f"BASE_URL: {BASE_URL}") 24 | 25 | 26 | async def make_request(supplier: str, api_key: str, model: str): 27 | BASE_URL = api_endpoint() + f"/{supplier}" 28 | query = "用汉字从一数到十,如一,二,三,四,五,..." 29 | 30 | client = AsyncOpenAI(base_url=BASE_URL, api_key=api_key) 31 | 32 | try: 33 | stream = await client.chat.completions.create( 34 | model=model, 35 | messages=[{"role": "user", "content": query}], 36 | stream=True, 37 | ) 38 | 39 | content = "" 40 | async for chunk in stream: 41 | delta_content = chunk.choices[0].delta.content 42 | if delta_content: 43 | content += delta_content 44 | print(f"Received chunk: {delta_content}") # Debug print 45 | 46 | print(f"Full content: {content}") # Debug print 47 | 48 | if not content: 49 | raise ValueError("Received empty content from API") 50 | 51 | for word in ["一", "二", "三", "四", "五", "六", "七", "八", "九", "十"]: 52 | assert word in content, f"Expected '{word}' in content, but it's missing. 
Content: {content}" 53 | 54 | except Exception as e: 55 | print(f"Error occurred: {str(e)}") 56 | raise 57 | 58 | 59 | @pytest.mark.asyncio 60 | async def test_gemini_streaming(): 61 | await make_request( 62 | supplier="gemini", 63 | api_key=os.environ["GEMINI_API_KEY"], 64 | model="gemini-1.5-flash" 65 | ) 66 | -------------------------------------------------------------------------------- /tests/test_sync_api.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import pytest 3 | import os 4 | from dotenv import load_dotenv 5 | from openai import AsyncOpenAI 6 | import random 7 | from loguru import logger 8 | from config import PRODUCTION_API_ENDPOINT, DEVELOPMENT_API_ENDPOINT 9 | load_dotenv() 10 | 11 | 12 | def api_endpoint(): 13 | env = os.environ.get('ENV', 'development') 14 | if env == 'production': 15 | return PRODUCTION_API_ENDPOINT 16 | elif env == 'development': 17 | return DEVELOPMENT_API_ENDPOINT 18 | else: 19 | raise ValueError(f"Invalid environment: {env}") 20 | 21 | 22 | BASE_URL = api_endpoint() 23 | logger.info(f"BASE_URL: {BASE_URL}") 24 | 25 | 26 | async def make_request(api_key: str, 27 | model: str, 28 | supplier: str, 29 | query: str = "The first president of the United States, give me his full name and only his full name"): 30 | client = AsyncOpenAI(base_url=BASE_URL + f"/{supplier}", api_key=api_key) 31 | response = await client.chat.completions.create( 32 | model=model, 33 | messages=[ 34 | {"role": "system", "content": "You are a helpful assistant。"}, 35 | {"role": "user", "content": query} 36 | ], 37 | temperature=0.7, 38 | top_p=1, 39 | max_tokens=20 40 | ) 41 | print(type(response), response) 42 | content = response.choices[0].message.content 43 | assert "George Washington" in content, f"Expected 'George Washington' in content, but got {content}" 44 | return content 45 | 46 | 47 | @pytest.mark.asyncio 48 | async def test_groq(): 49 | await make_request( 50 | supplier="groq", 51 | api_key=os.environ["GROQ_API_KEY"], 52 | model="llama3-70b-8192" 53 | ) 54 | 55 | 56 | @pytest.mark.asyncio 57 | async def test_openai(): 58 | await make_request( 59 | supplier="openai", 60 | api_key=os.environ["OPENAI_API_KEY"], 61 | model="gpt-4o-mini" 62 | ) 63 | 64 | 65 | @pytest.mark.asyncio 66 | async def test_gemini(): 67 | await make_request( 68 | supplier="gemini", 69 | api_key=os.environ["GEMINI_API_KEY"], 70 | model="gemini-1.5-flash" 71 | ) 72 | 73 | 74 | @pytest.mark.asyncio 75 | async def test_cerebras(): 76 | await make_request( 77 | supplier="cerebras", 78 | api_key=os.environ["CEREBRAS_API_KEY"], 79 | model="llama3.1-8b" 80 | ) 81 | 82 | 83 | @pytest.mark.asyncio 84 | async def test_nvidia(): 85 | await make_request( 86 | supplier="nvidia", 87 | api_key=os.environ["NVIDIA_API_KEY"], 88 | model="meta/llama-3.2-3b-instruct" 89 | ) 90 | 91 | 92 | @pytest.mark.asyncio 93 | async def test_mistral(): 94 | await make_request( 95 | supplier="mistral", 96 | api_key=os.environ["MISTRAL_API_KEY"], 97 | model="mistral-large-latest", 98 | ) 99 | 100 | 101 | @pytest.mark.asyncio 102 | async def test_sambanova(): 103 | await make_request( 104 | supplier="sambanova", 105 | api_key=os.environ["SAMBANOVA_API_KEY"], 106 | model="Meta-Llama-3.1-405B-Instruct", 107 | ) 108 | -------------------------------------------------------------------------------- /vercel.json: -------------------------------------------------------------------------------- 1 | { 2 | "builds": [ 3 | { 4 | "src": "main.py", 5 | "use": "@vercel/python" 6 | } 7 | 
], 8 | "routes": [ 9 | { 10 | "src": "/(.*)", 11 | "dest": "main.py" 12 | } 13 | ] 14 | } --------------------------------------------------------------------------------
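
With the catch-all route above, Vercel hands every request path to the FastAPI app in `main.py`, so the usage page, the `/hello` router, the Gemini adapter and the generic provider routes all live under one deployment domain. As a rough post-deploy smoke test — a sketch only, where `your-project-name.vercel.app` is a placeholder for your own deployment and `gpt-4o-mini` is just an example model — requests along these lines should work:

```bash
# Root path: main.py serves the HTML usage page defined in public/usage.py
curl https://your-project-name.vercel.app/

# Hello router, mounted at /hello in main.py; expected response: {"Hello":"World"}
curl https://your-project-name.vercel.app/hello/

# Generic proxy route: POST /{platform}/chat/completions (here platform = openai)
curl https://your-project-name.vercel.app/openai/chat/completions \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${OPENAI_API_KEY}" \
  --data '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}'
```

Setting `"stream": true` in the same request body switches the generic route to the server-sent-events path implemented in `api/servers/base.py`.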