├── LICENSE
├── README.md
├── api
│   ├── __init__.py
│   ├── hello.py
│   ├── servers
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── gemini.py
│   │   └── generic.py
│   └── v1
│       └── __init__.py
├── main.py
├── package.json
├── public
│   ├── __init__.py
│   ├── favicon.ico
│   ├── flow.png
│   ├── usage.py
│   └── vercel.png
├── requirements.txt
├── tests
│   ├── config.py
│   ├── test_async_api.py
│   ├── test_gemini_flow.py
│   └── test_sync_api.py
└── vercel.json
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) [2024] [ultrasev]
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LLM API Reverse Proxy
2 |
21 | This project provides a reverse proxy service that makes the APIs of platforms such as Google, Groq, and Cerebras (fronted by Amazon CloudFront) reachable from countries and regions where they cannot be accessed directly.
22 |
23 | # Features
24 |
25 | Reverse-proxies API requests to OpenAI, Groq, Google, Cerebras, and other platforms through the Vercel edge network.
26 |
27 | - Supported providers: Groq, Google, OpenAI, Cerebras, NVIDIA, Mistral, Sambanova
28 | - Streaming output (see Example 4 below)
29 | - Compatible with the OpenAI API specification
30 |
31 | Note: the vercel.app domain cannot be reached directly from mainland China. For direct access, see the author's earlier project [llmproxy](https://github.com/ultrasev/llmproxy), which deploys an LLM API reverse proxy on a Cloudflare Worker.
32 |
33 | # Usage
34 | After deploying to Vercel, your own API address is https://your-project-name.vercel.app/.
35 | Test API addresses:
36 | - https://llmproxy-vercel.vercel.app/ : the address served by Vercel; not reachable directly from mainland China without a VPN or proxy.
37 | - https://llm.cufo.cc/ : a reverse proxy the author runs on Cloudflare plus a US VPS, reachable directly from mainland China and similar regions. Request path: local -> CF(HK) -> VPS(US) -> OpenAI.
38 |
39 | Neither address takes the `/v1` suffix, i.e. use `base_url="https://llm.cufo.cc/openai"` or `chat_url="https://llm.cufo.cc/openai/chat/completions"`.
40 |
41 | ## Example 1: OpenAI
42 |
43 | ```python
44 | from openai import OpenAI
45 |
46 | client = OpenAI(
47 | api_key="sk-proj-...",
48 |     base_url="https://llmproxy-vercel.vercel.app/openai",  # no /v1 suffix
49 | )
50 |
51 | response = client.chat.completions.create(
52 | model="gpt-3.5-turbo",
53 | messages=[{"role": "user", "content": "Hello world!"}],
54 | )
55 |
56 | print(response.choices[0].message.content)
57 | ```
58 |
59 | ## Example 2: Google Gemini
60 |
61 | ```python
62 | from openai import OpenAI
63 |
64 | client = OpenAI(
65 | api_key="...",
66 | base_url="https://llmproxy-vercel.vercel.app/gemini",
67 | )
68 |
69 | response = client.chat.completions.create(
70 | model="gemini-1.5-flash",
71 | messages=[{"role": "user", "content": "Hello world!"}],
72 | )
73 |
74 | print(response.choices[0].message.content)
75 | ```
76 |
77 | ## Example 3: Cerebras
78 |
79 | ```bash
80 | curl --location 'https://llmproxy-vercel.vercel.app/cerebras/chat/completions' \
81 | --header 'Content-Type: application/json' \
82 | --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
83 | --data '{
84 | "model": "llama3.1-8b",
85 | "stream": false,
86 | "messages": [{"content": "why is fast inference important?", "role": "user"}],
87 | "temperature": 0,
88 | "max_tokens": 1024,
89 | "seed": 0,
90 | "top_p": 1
91 | }'
92 | ```
93 |
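## Example 4: Streaming

The proxy relays the OpenAI-style SSE stream, so `stream=True` works the same way as against the upstream API. A minimal sketch, using the same test deployment and model as Example 1:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-proj-...",
    base_url="https://llmproxy-vercel.vercel.app/openai",  # no /v1 suffix
)

# stream=True yields chat.completion.chunk objects as they arrive
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Count from 1 to 5"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
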
94 | # One-Click Deploy to Vercel
95 |
96 | [Deploy with Vercel](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fultrasev%2Fllmproxy-vercel)
97 |
98 | # Local Development and Testing
99 |
100 | ```bash
101 | pip3 install -r requirements.txt
102 | pip3 install uvicorn
103 | uvicorn main:app --host 0.0.0.0 --port 3000 --reload
104 | ```
105 |
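Once the server is up, you can point the OpenAI client at the local instance for a quick smoke test. A sketch, assuming an OpenAI key in the `OPENAI_API_KEY` environment variable:

```python
import os
from openai import OpenAI

# point the client at the local proxy started by uvicorn on port 3000
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:3000/openai",  # no /v1 suffix
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```
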
106 | # License
107 |
108 | Copyright © 2024 [ultrasev](https://github.com/ultrasev).
109 | This project is [MIT](LICENSE) licensed.
110 |
111 | # Support me
112 |
113 | [Buy Me a Coffee](https://www.buymeacoffee.com/ultrasev)
114 |
115 | ## Contacts
116 |
117 | - [Twitter/X: @slippertopia](https://twitter.com/slippertopia)
118 | - [YouTube channel](https://www.youtube.com/channel/UCt0Op8mQvqwjp18B8vNPjzg)
119 |
--------------------------------------------------------------------------------
/api/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/api/__init__.py
--------------------------------------------------------------------------------
/api/hello.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | from fastapi.routing import APIRouter
3 | router = APIRouter()
4 |
5 |
6 | @router.get("/")
7 | def read_root():
8 | return {"Hello": "World"}
9 |
--------------------------------------------------------------------------------
/api/servers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/api/servers/__init__.py
--------------------------------------------------------------------------------
/api/servers/base.py:
--------------------------------------------------------------------------------
1 | from pydantic import BaseModel, Field
2 | import httpx
3 | import asyncio
4 | from typing import List, Dict, Optional
5 |
6 |
7 | class Message(BaseModel):
8 | role: str
9 | content: str
10 |
11 |
12 | class OpenAIProxyArgs(BaseModel):
13 | model: str
14 | messages: List[Message]
15 | stream: bool = False
16 | temperature: float = Field(default=0.7, ge=0, le=2)
17 | top_p: float = Field(default=1, ge=0, le=1)
18 | n: int = Field(default=1, ge=1)
19 | max_tokens: Optional[int] = None
20 | presence_penalty: float = Field(default=0, ge=-2, le=2)
21 | frequency_penalty: float = Field(default=0, ge=-2, le=2)
22 |
23 |
24 | async def stream_openai_response(endpoint: str, payload: Dict, headers: Dict):
25 | async with httpx.AsyncClient() as client:
26 | async with client.stream("POST", endpoint, json=payload, headers=headers) as response:
27 | async for line in response.aiter_lines():
28 |                 # "data: [DONE]" also starts with "data: ", so check the
29 |                 # terminating sentinel first, forward it, then stop reading.
30 |                 if line.strip() == "data: [DONE]":
31 |                     yield "data: [DONE]\n\n"
32 |                     break
33 |                 elif line.startswith("data: "):
34 |                     yield line + "\n\n"
35 |
--------------------------------------------------------------------------------
/api/servers/gemini.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | ''' Convert Gemini API to OpenAI API format
3 |
4 | Gemini API docs:
5 | - https://ai.google.dev/gemini-api/docs/text-generation?lang=rest
6 | '''
7 | from loguru import logger
8 | from pydantic import BaseModel
9 | from fastapi import APIRouter, HTTPException, Header, Query
10 | from fastapi.responses import JSONResponse, StreamingResponse
11 | import httpx
12 | import typing
13 | from typing import List, Dict, Optional
14 | from .base import Message
15 | import time
16 | import json
17 | import re
18 |
19 | router = APIRouter()
20 |
21 |
22 | GEMINI_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent"
23 | GEMINI_STREAM_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{}:streamGenerateContent"
24 |
25 |
26 | class OpenAIProxyArgs(BaseModel):
27 | model: str
28 | messages: List[Dict[str, str]]
29 | stream: bool = False
30 | temperature: float = 0.7
31 | top_p: float = 1
32 | n: int = 1
33 | max_tokens: Optional[int] = None
34 | presence_penalty: float = 0
35 | frequency_penalty: float = 0
36 |
37 |
38 | class MessageConverter:
39 | def __init__(self, messages: List[Dict[str, str]]):
40 | self.messages = messages
41 |
42 | def convert(self) -> List[Dict[str, str]]:
43 | converted_messages = []
44 | for message in self.messages:
45 | role = "user" if message["role"] == "user" else "model"
46 | converted_messages.append({
47 | "role": role,
48 | "parts": [{"text": message["content"]}]
49 | })
50 | return converted_messages
51 |
52 |
53 | def convert_gemini_to_openai_response(gemini_response: dict, model: str) -> dict:
54 | """Convert Gemini API response to OpenAI-compatible format."""
55 | return {
56 |         "id": f"chatcmpl-{int(time.time())}",  # Gemini has no completion id; synthesize one like the streaming path does
57 | "object": "chat.completion",
58 | "created": int(time.time()),
59 | "model": model,
60 | "usage": {
61 |             "prompt_tokens": 0,  # token counts are not mapped from Gemini's response
62 | "completion_tokens": 0,
63 | "total_tokens": 0
64 | },
65 | "choices": [{
66 | "message": {
67 | "role": "assistant",
68 | "content": gemini_response.get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "")
69 | },
70 | "finish_reason": "stop",
71 | "index": 0
72 | }]
73 | }
74 |
75 |
76 | async def stream_gemini_response(model: str, payload: dict, api_key: str):
77 |     text_pattern = re.compile(r'"text": "(.*?)"')  # pull "text" fields out of Gemini's streamed JSON chunks
78 |
79 | async with httpx.AsyncClient() as client:
80 | async with client.stream(
81 | "POST",
82 | GEMINI_STREAM_ENDPOINT.format(model),
83 | json=payload,
84 | headers={
85 | "Content-Type": "application/json",
86 | "x-goog-api-key": api_key
87 | }
88 | ) as response:
89 | async for line in response.aiter_lines():
90 | line = line.strip()
91 | match = text_pattern.search(line)
92 | if match:
93 | text_content = match.group(1)
94 | text_content = json.loads(f'"{text_content}"')
95 |
96 | openai_format = {
97 | "id": f"chatcmpl-{int(time.time())}",
98 | "object": "chat.completion.chunk",
99 | "created": int(time.time()),
100 | "model": model,
101 | "choices": [{
102 | "index": 0,
103 | "delta": {
104 | "content": text_content
105 | },
106 | "finish_reason": None
107 | }]
108 | }
109 |
110 | yield f"data: {json.dumps(openai_format, ensure_ascii=False)}\n\n"
111 |
112 | final_chunk = {
113 | "id": f"chatcmpl-{int(time.time())}",
114 | "object": "chat.completion.chunk",
115 | "created": int(time.time()),
116 | "model": model,
117 | "choices": [{
118 | "index": 0,
119 | "delta": {},
120 | "finish_reason": "stop"
121 | }]
122 | }
123 | yield f"data: {json.dumps(final_chunk, ensure_ascii=False)}\n\n"
124 | yield "data: [DONE]\n\n"
125 |
126 |
127 | @router.post("/chat/completions")
128 | async def proxy_chat_completions(
129 | args: OpenAIProxyArgs,
130 | authorization: str = Header(...),
131 | ):
132 |     api_key = authorization.partition(" ")[2]  # "Bearer <key>"; empty string if the header is malformed
133 | model = args.model
134 |
135 | if not api_key:
136 | raise HTTPException(status_code=400, detail="API key not provided")
137 |
138 | # Transform args into Gemini API format
139 | gemini_payload = {
140 | "contents": MessageConverter(args.messages).convert(),
141 | "safetySettings": [
142 | {
143 | "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
144 | "threshold": "BLOCK_ONLY_HIGH"
145 | }
146 | ],
147 | "generationConfig": {
148 | "temperature": args.temperature,
149 | "maxOutputTokens": args.max_tokens,
150 | "topP": args.top_p,
151 | "topK": 10
152 | }
153 | }
154 |
155 | if args.stream:
156 | return StreamingResponse(stream_gemini_response(model, gemini_payload, api_key), media_type="text/event-stream")
157 | else:
158 | async with httpx.AsyncClient() as client:
159 | response = await client.post(
160 | GEMINI_ENDPOINT.format(model),
161 | json=gemini_payload,
162 | headers={
163 | "Content-Type": "application/json",
164 | "x-goog-api-key": api_key
165 | }
166 | )
167 |
168 | if response.status_code != 200:
169 | return JSONResponse(content=response.json(), status_code=response.status_code)
170 |
171 | response_json = response.json()
172 |
173 | # Use the new conversion function
174 | openai_compatible_response = convert_gemini_to_openai_response(
175 | response_json, args.model)
176 |
177 | return JSONResponse(openai_compatible_response)
178 |
--------------------------------------------------------------------------------
/api/servers/generic.py:
--------------------------------------------------------------------------------
1 | from fastapi import APIRouter, Header, HTTPException
2 | from fastapi.responses import JSONResponse, StreamingResponse
3 | from pydantic import BaseModel
4 | import httpx
5 | from typing import Dict
6 | from .base import stream_openai_response, OpenAIProxyArgs
7 |
8 | router = APIRouter()
9 |
10 | PLATFORM_API_URLS: Dict[str, str] = {
11 | "openai": "https://api.openai.com/v1/chat/completions",
12 | "mistral": "https://api.mistral.ai/v1/chat/completions",
13 | "groq": "https://api.groq.com/openai/v1/chat/completions",
14 | "cerebras": "https://api.cerebras.ai/v1/chat/completions",
15 | "nvidia": "https://integrate.api.nvidia.com/v1/chat/completions",
16 | "sambanova": "https://api.sambanova.ai/v1/chat/completions",
17 | }
18 |
19 |
20 | @router.post("/{platform}/chat/completions")
21 | async def proxy_chat_completions(platform: str, args: OpenAIProxyArgs, authorization: str = Header(...)):
22 | if platform not in PLATFORM_API_URLS:
23 | raise HTTPException(
24 | status_code=404, detail=f"Platform '{platform}' not supported")
25 |
26 | api_url = PLATFORM_API_URLS[platform]
27 |     api_key = authorization.partition(" ")[2]  # "Bearer <key>"
28 | headers = {
29 | "Authorization": f"Bearer {api_key}",
30 | "Content-Type": "application/json"
31 | }
32 | payload = args.dict(exclude_none=True)
33 |
34 | if args.stream:
35 | return StreamingResponse(
36 | stream_openai_response(api_url, payload, headers),
37 | media_type="text/event-stream",
38 | headers={"X-Content-Type-Options": "nosniff",
39 | "X-Experimental-Stream-Data": "true"}
40 | )
41 | else:
42 | async with httpx.AsyncClient() as client:
43 | try:
44 | response = await client.post(api_url, json=payload, headers=headers)
45 | response.raise_for_status()
46 | return JSONResponse(response.json())
47 | except httpx.HTTPStatusError as e:
48 | raise HTTPException(
49 | status_code=e.response.status_code, detail=str(e.response.text))
50 | except Exception as e:
51 | raise HTTPException(status_code=500, detail=str(e))
52 |
--------------------------------------------------------------------------------
/api/v1/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/api/v1/__init__.py
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | from public.usage import USAGE as html
3 | from api.hello import router as hello_router
4 | from fastapi import FastAPI
5 | from fastapi.responses import Response
6 | from api.servers.generic import router as generic_router
7 | from api.servers.gemini import router as gemini_router
8 | from fastapi.middleware.cors import CORSMiddleware
9 | app = FastAPI()
10 |
11 | app.include_router(hello_router, prefix="/hello")
12 | app.include_router(gemini_router, prefix="/gemini")
13 | app.include_router(generic_router, prefix="")  # keep generic last: its /{platform} catch-all would otherwise shadow /gemini
14 |
15 | app.add_middleware(
16 | CORSMiddleware,
17 | allow_credentials=True,
18 | allow_methods=["*"],
19 | allow_headers=["*"],
20 |     expose_headers=["X-Experimental-Stream-Data"],  # let clients read the streaming marker header
21 | )
22 |
23 | @app.get("/")
24 | def _root():
25 | return Response(content=html, media_type="text/html")
26 |
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "engines": {
3 | "node": "18.x"
4 | }
5 | }
--------------------------------------------------------------------------------
/public/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/__init__.py
--------------------------------------------------------------------------------
/public/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/favicon.ico
--------------------------------------------------------------------------------
/public/flow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/flow.png
--------------------------------------------------------------------------------
/public/usage.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | USAGE = """
8 | Usage
43 | success
44 | Usage
45 | Visit Github doc for more information.
50 | """
51 |
--------------------------------------------------------------------------------
/public/vercel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ultrasev/llmproxy-vercel/33725d98111811ed739046c2f8edbba2c18b39f8/public/vercel.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | fastapi==0.88.0
2 | pydantic~=1.10.4
3 | python-multipart==0.0.5
4 | expiringdict==1.2.2
5 | rich==13.4.2
6 | openai==1.6.1
7 | httpx==0.27.0
8 | loguru==0.7.2
--------------------------------------------------------------------------------
/tests/config.py:
--------------------------------------------------------------------------------
1 | import random
2 |
3 | PRODUCTION_API_ENDPOINT = random.choice(["https://llmproxy-vercel.vercel.app", "https://llm.cufo.cc"])
4 | DEVELOPMENT_API_ENDPOINT = "http://192.168.31.46:3000"
5 |
--------------------------------------------------------------------------------
/tests/test_async_api.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import pytest
3 | import os
4 | from dotenv import load_dotenv
5 | from openai import AsyncOpenAI
6 | import random
7 | from loguru import logger
8 | from config import PRODUCTION_API_ENDPOINT, DEVELOPMENT_API_ENDPOINT
9 | load_dotenv()
10 |
11 |
12 | def api_endpoint():
13 | env = os.environ.get('ENV', 'development')
14 | if env == 'production':
15 | return PRODUCTION_API_ENDPOINT
16 | elif env == 'development':
17 | return DEVELOPMENT_API_ENDPOINT
18 | else:
19 | raise ValueError(f"Invalid environment: {env}")
20 |
21 |
22 | BASE_URL = api_endpoint()
23 | logger.info(f"BASE_URL: {BASE_URL}")
24 |
25 |
26 | async def make_request(supplier: str, api_key: str, model: str):
27 | BASE_URL = api_endpoint() + f"/{supplier}"
28 | query = "Count from 1 to 5"
29 |
30 | client = AsyncOpenAI(base_url=BASE_URL, api_key=api_key)
31 |
32 | try:
33 | stream = await client.chat.completions.create(
34 | model=model,
35 | messages=[{"role": "user", "content": query}],
36 | stream=True,
37 | )
38 |
39 | content = ""
40 | async for chunk in stream:
41 | delta_content = chunk.choices[0].delta.content
42 | if delta_content:
43 | content += delta_content
44 | print(f"Received chunk: {delta_content}") # Debug print
45 |
46 | print(f"Full content: {content}") # Debug print
47 |
48 | if not content:
49 | raise ValueError("Received empty content from API")
50 |
51 | for i in range(1, 6):
52 | assert str(
53 | i) in content, f"Expected {i} in content, but it's missing. Content: {content}"
54 |
55 | except Exception as e:
56 | print(f"Error occurred: {str(e)}")
57 | raise
58 |
59 |
60 | @pytest.mark.asyncio
61 | async def test_openai_streaming():
62 | await make_request(
63 | supplier="openai",
64 | api_key=os.environ["OPENAI_API_KEY"],
65 | model="gpt-3.5-turbo"
66 | )
67 |
68 |
69 | @pytest.mark.asyncio
70 | async def test_groq_streaming():
71 | await make_request(
72 | supplier="groq",
73 | api_key=os.environ["GROQ_API_KEY"],
74 | model="llama3-70b-8192"
75 | )
76 |
77 |
78 | @pytest.mark.asyncio
79 | async def test_gemini_streaming():
80 | await make_request(
81 | supplier="gemini",
82 | api_key=os.environ["GEMINI_API_KEY"],
83 | model="gemini-1.5-flash"
84 | )
85 |
86 |
87 | @pytest.mark.asyncio
88 | async def test_cerebras_streaming():
89 | await make_request(
90 | supplier="cerebras",
91 | api_key=os.environ["CEREBRAS_API_KEY"],
92 | model="llama3.1-8b"
93 | )
94 |
95 |
96 | @pytest.mark.asyncio
97 | async def test_nvidia_streaming():
98 | await make_request(
99 | supplier="nvidia",
100 | api_key=os.environ["NVIDIA_API_KEY"],
101 | model="meta/llama-3.2-3b-instruct"
102 | )
103 |
104 |
105 | @pytest.mark.asyncio
106 | async def test_mistral():
107 | await make_request(
108 | supplier="mistral",
109 | api_key=os.environ["MISTRAL_API_KEY"],
110 | model="mistral-large-latest",
111 | )
112 |
113 |
114 | @pytest.mark.asyncio
115 | async def test_sambanova():
116 | await make_request(
117 | supplier="sambanova",
118 | api_key=os.environ["SAMBANOVA_API_KEY"],
119 | model="Meta-Llama-3.1-405B-Instruct",
120 | )
121 |
--------------------------------------------------------------------------------
/tests/test_gemini_flow.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import pytest
3 | import os
4 | from dotenv import load_dotenv
5 | from openai import AsyncOpenAI
6 | import random
7 | from loguru import logger
8 | from config import PRODUCTION_API_ENDPOINT, DEVELOPMENT_API_ENDPOINT
9 | load_dotenv()
10 |
11 |
12 | def api_endpoint():
13 | env = os.environ.get('ENV', 'development')
14 | if env == 'production':
15 | return PRODUCTION_API_ENDPOINT
16 | elif env == 'development':
17 | return DEVELOPMENT_API_ENDPOINT
18 | else:
19 | raise ValueError(f"Invalid environment: {env}")
20 |
21 |
22 | BASE_URL = api_endpoint()
23 | logger.info(f"BASE_URL: {BASE_URL}")
24 |
25 |
26 | async def make_request(supplier: str, api_key: str, model: str):
27 | BASE_URL = api_endpoint() + f"/{supplier}"
28 | query = "用汉字从一数到十,如一,二,三,四,五,..."
29 |
30 | client = AsyncOpenAI(base_url=BASE_URL, api_key=api_key)
31 |
32 | try:
33 | stream = await client.chat.completions.create(
34 | model=model,
35 | messages=[{"role": "user", "content": query}],
36 | stream=True,
37 | )
38 |
39 | content = ""
40 | async for chunk in stream:
41 | delta_content = chunk.choices[0].delta.content
42 | if delta_content:
43 | content += delta_content
44 | print(f"Received chunk: {delta_content}") # Debug print
45 |
46 | print(f"Full content: {content}") # Debug print
47 |
48 | if not content:
49 | raise ValueError("Received empty content from API")
50 |
51 | for word in ["一", "二", "三", "四", "五", "六", "七", "八", "九", "十"]:
52 | assert word in content, f"Expected '{word}' in content, but it's missing. Content: {content}"
53 |
54 | except Exception as e:
55 | print(f"Error occurred: {str(e)}")
56 | raise
57 |
58 |
59 | @pytest.mark.asyncio
60 | async def test_gemini_streaming():
61 | await make_request(
62 | supplier="gemini",
63 | api_key=os.environ["GEMINI_API_KEY"],
64 | model="gemini-1.5-flash"
65 | )
66 |
--------------------------------------------------------------------------------
/tests/test_sync_api.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import pytest
3 | import os
4 | from dotenv import load_dotenv
5 | from openai import AsyncOpenAI
6 | import random
7 | from loguru import logger
8 | from config import PRODUCTION_API_ENDPOINT, DEVELOPMENT_API_ENDPOINT
9 | load_dotenv()
10 |
11 |
12 | def api_endpoint():
13 | env = os.environ.get('ENV', 'development')
14 | if env == 'production':
15 | return PRODUCTION_API_ENDPOINT
16 | elif env == 'development':
17 | return DEVELOPMENT_API_ENDPOINT
18 | else:
19 | raise ValueError(f"Invalid environment: {env}")
20 |
21 |
22 | BASE_URL = api_endpoint()
23 | logger.info(f"BASE_URL: {BASE_URL}")
24 |
25 |
26 | async def make_request(api_key: str,
27 | model: str,
28 | supplier: str,
29 | query: str = "The first president of the United States, give me his full name and only his full name"):
30 | client = AsyncOpenAI(base_url=BASE_URL + f"/{supplier}", api_key=api_key)
31 | response = await client.chat.completions.create(
32 | model=model,
33 | messages=[
34 |             {"role": "system", "content": "You are a helpful assistant."},
35 | {"role": "user", "content": query}
36 | ],
37 | temperature=0.7,
38 | top_p=1,
39 | max_tokens=20
40 | )
41 | print(type(response), response)
42 | content = response.choices[0].message.content
43 | assert "George Washington" in content, f"Expected 'George Washington' in content, but got {content}"
44 | return content
45 |
46 |
47 | @pytest.mark.asyncio
48 | async def test_groq():
49 | await make_request(
50 | supplier="groq",
51 | api_key=os.environ["GROQ_API_KEY"],
52 | model="llama3-70b-8192"
53 | )
54 |
55 |
56 | @pytest.mark.asyncio
57 | async def test_openai():
58 | await make_request(
59 | supplier="openai",
60 | api_key=os.environ["OPENAI_API_KEY"],
61 | model="gpt-4o-mini"
62 | )
63 |
64 |
65 | @pytest.mark.asyncio
66 | async def test_gemini():
67 | await make_request(
68 | supplier="gemini",
69 | api_key=os.environ["GEMINI_API_KEY"],
70 | model="gemini-1.5-flash"
71 | )
72 |
73 |
74 | @pytest.mark.asyncio
75 | async def test_cerebras():
76 | await make_request(
77 | supplier="cerebras",
78 | api_key=os.environ["CEREBRAS_API_KEY"],
79 | model="llama3.1-8b"
80 | )
81 |
82 |
83 | @pytest.mark.asyncio
84 | async def test_nvidia():
85 | await make_request(
86 | supplier="nvidia",
87 | api_key=os.environ["NVIDIA_API_KEY"],
88 | model="meta/llama-3.2-3b-instruct"
89 | )
90 |
91 |
92 | @pytest.mark.asyncio
93 | async def test_mistral():
94 | await make_request(
95 | supplier="mistral",
96 | api_key=os.environ["MISTRAL_API_KEY"],
97 | model="mistral-large-latest",
98 | )
99 |
100 |
101 | @pytest.mark.asyncio
102 | async def test_sambanova():
103 | await make_request(
104 | supplier="sambanova",
105 | api_key=os.environ["SAMBANOVA_API_KEY"],
106 | model="Meta-Llama-3.1-405B-Instruct",
107 | )
108 |
--------------------------------------------------------------------------------
/vercel.json:
--------------------------------------------------------------------------------
1 | {
2 | "builds": [
3 | {
4 | "src": "main.py",
5 | "use": "@vercel/python"
6 | }
7 | ],
8 | "routes": [
9 | {
10 | "src": "/(.*)",
11 | "dest": "main.py"
12 | }
13 | ]
14 | }
--------------------------------------------------------------------------------