├── .env.example
├── .gitignore
├── Dockerfile
├── README-en.md
├── README.md
├── app.py
├── config.yaml.example
├── docker-compose.yml
└── requirements.txt

/.env.example:
--------------------------------------------------------------------------------
1 | # Backend URL
2 | BACKEND_URL=http://host.docker.internal:9880
3 | # API_KEY=
4 | # generation parameters
5 | # TEXT_LANGUAGE=zh
6 | # TOP_K=15
7 | # TOP_P=1
8 | # TEMPERATURE=0.45
9 | # SPEED=0.95
10 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | config.yaml
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | # Use the official Python image
2 | FROM python:3.10-slim
3 |
4 | # Install ffmpeg and any other dependencies
5 | RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg && \
6 |     rm -rf /var/lib/apt/lists/*
7 |
8 | # Set the working directory
9 | WORKDIR /app
10 |
11 | # Copy requirements.txt
12 | COPY requirements.txt /app/
13 |
14 | # Install Python dependencies
15 | RUN pip install --no-cache-dir -r requirements.txt
16 |
17 | # Copy the application code
18 | COPY app.py /app/
19 |
20 | # Expose the port that the app runs on
21 | EXPOSE 5000
22 |
23 | # Set the command to run the application
24 | CMD ["python", "app.py"]
--------------------------------------------------------------------------------
/README-en.md:
--------------------------------------------------------------------------------
1 | # Convert GPT-sovits to OpenAI TTS Format
2 |
3 | ## Introduction
4 |
5 | Many open-source projects only support the common TTS APIs of major vendors. This project bridges that gap by converting OpenAI TTS API requests into GPT-sovits API calls, so any client that supports OpenAI TTS can use GPT-sovits. 
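To make the conversion concrete, here is a minimal sketch of the OpenAI-style request body this adapter consumes. The field names (`input`, `voice`; `model` is accepted but unused) match what `app.py` later in this repo reads; the helper name `build_openai_tts_request` is purely illustrative.

```python
import json

def build_openai_tts_request(text, voice="alloy"):
    """Build the JSON body an OpenAI-TTS-compatible client would POST."""
    return {
        "model": "tts-1",  # sent by OpenAI clients, but ignored by this adapter
        "input": text,     # the text to synthesize
        "voice": voice,    # must match a voice key defined in config.yaml
    }

payload = build_openai_tts_request("Hello, world")
print(json.dumps(payload))
```

Any client that emits this shape can point its base URL at the adapter without code changes.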
6 |
7 | ## Preparation
8 |
9 | Start the GPT-sovits API service according to the documentation in api.py from the original project. If you are using an integrated package, make sure to start it from the package's root directory, in the following form:
10 |
11 | ```shell
12 | runtime\python.exe api.py <remaining arguments omitted>
13 | ```
14 |
15 | ## Usage
16 |
17 | Clone this repository and modify the configuration:
18 |
19 | ```shell
20 | git clone https://github.com/RedwindA/GPT-sovits-2-OpenAI
21 | cd GPT-sovits-2-OpenAI
22 | cp .env.example .env
23 | cp config.yaml.example config.yaml
24 | ```
25 |
26 | On Windows, the first two lines stay the same; the last two become:
27 |
28 | ```cmd
29 | copy .env.example .env
30 | copy config.yaml.example config.yaml
31 | ```
32 |
33 | Start with Docker:
34 |
35 | ```shell
36 | docker compose up -d
37 | ```
38 |
39 | After starting, the service runs as an OpenAI TTS API service. The base_url is http://your_ip:5000/v1 and the complete url is http://your_ip:5000/v1/audio/speech; enter whichever one your client application expects.
40 |
41 | ## Important Notes
42 |
43 | 1. **Since Docker is used, please ensure the container can correctly reach the GPT-sovits API.**
44 |    If both run on the same host and the GPT-sovits API runs directly (not in Docker), the environment variable should be `BACKEND_URL=http://host.docker.internal:9880` (the current default configuration). You can also combine both services in the same Docker network using docker compose.
45 |
46 | 2. If API_KEY is not configured, the service will be accessible to everyone
47 |
48 | ## Future Development Plans
49 |
50 | 1. Support streaming
51 | 2. 
~~Port or merge v2 to support different text segmentation strategies~~ Implemented in the apiv2 branch
52 |
53 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Convert GPT-sovits to the OpenAI TTS Format
2 |
3 | ## Introduction
4 |
5 | Many open-source projects only support the common TTS APIs of major vendors. This project adapts clients that support OpenAI TTS by converting their API requests.
6 |
7 | ## Preparation
8 |
9 | Start the GPT-sovits API service according to the documentation bundled with api.py in the original project. If you are using an integrated package, be sure to start it from the package's root directory, in the following form:
10 |
11 | ```shell
12 | runtime\python.exe api.py <remaining arguments omitted>
13 | ```
14 |
15 | ## Usage
16 |
17 | Clone this repository and modify the configuration:
18 |
19 | ```shell
20 | git clone https://github.com/RedwindA/GPT-sovits-2-OpenAI
21 | cd GPT-sovits-2-OpenAI
22 | cp .env.example .env
23 | cp config.yaml.example config.yaml
24 | ```
25 |
26 | On Windows, the first two lines stay the same; the last two become:
27 |
28 | ```cmd
29 | copy .env.example .env
30 | copy config.yaml.example config.yaml
31 | ```
32 |
33 | Start with Docker:
34 |
35 | ```shell
36 | docker compose up -d
37 | ```
38 |
39 | After starting, this service runs as an OpenAI TTS API service. The base_url is http://your_ip:5000/v1 and the complete url is http://your_ip:5000/v1/audio/speech; enter whichever one your client application expects.
40 |
41 | ## Important Notes
42 |
43 | 1. **Since Docker is used, please ensure the container can correctly reach the GPT-sovits API.**
44 |    If both run on the same host and the GPT-sovits API runs directly (not in Docker), the environment variable should be `BACKEND_URL=http://host.docker.internal:9880` (the current default configuration). You can also combine both services in the same Docker network using docker compose.
45 |
46 | 2. If API_KEY is not configured, the service will be accessible to everyone
47 |
48 | ## Future Development Plans
49 |
50 | 1. Support streaming
51 | 2. ~~Port or merge v2 to support different text segmentation strategies~~ Implemented in the apiv2 branch
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, request, send_file, jsonify
2 | import requests
3 | from pydub import AudioSegment
4 | import io
5 | import os  # Import os to access environment variables
6 | import json
7 | import yaml
8 |
9 | app = Flask(__name__)
10 |
11 | # Get API_KEY from environment variable
12 | API_KEY = os.environ.get('API_KEY')
13 |
14 | # Get BACKEND_URL from environment variable or use default
15 | BACKEND_URL = os.environ.get('BACKEND_URL', 'http://127.0.0.1:9880')
16 |
17 | # Load YAML configuration file
18 | def load_voice_config():
19 |     try:
20 |         with open('config.yaml', 'r', encoding='utf-8') as f:
21 |             config = yaml.safe_load(f)
22 |         voices = config.get('voices', {})
23 |
24 |         voice_mapping = {
25 |             voice: voice_data['models']
26 |             for voice, voice_data in voices.items()
27 |         }
28 |         refer_mapping = {
29 |             voice: voice_data['refer']
30 |             for voice, voice_data in voices.items()
31 |         }
32 |         return voice_mapping, refer_mapping
33 |     except Exception as e:
34 |         print(f"Error loading config.yaml: {e}")
35 |         return {}, {}
36 |
37 | # Replace original environment variable configuration
38 | VOICE_MAPPING, REFER_MAPPING = load_voice_config()
39 |
40 | # Get other parameters from environment variables or use default values
41 | TEXT_LANGUAGE = os.environ.get('TEXT_LANGUAGE', 'zh')
42 | TOP_K = int(os.environ.get('TOP_K', 15))
43 | TOP_P = float(os.environ.get('TOP_P', 1))
44 | TEMPERATURE = float(os.environ.get('TEMPERATURE', 0.45))
45 | SPEED = float(os.environ.get('SPEED', 0.95))
46 |
47 | # Print all parameters in one line for debugging
48 | print(f"BACKEND_URL: {BACKEND_URL}, TEXT_LANGUAGE: {TEXT_LANGUAGE}, TOP_K: {TOP_K}, TOP_P: {TOP_P}, TEMPERATURE: {TEMPERATURE}, SPEED: {SPEED}, VOICE_MAPPING: {VOICE_MAPPING}")
49 |
50 | 
@app.route('/v1/audio/speech', methods=['POST'])
51 | def convert_tts():
52 |     # Check API key if it's set in environment
53 |     if API_KEY:
54 |         auth_header = request.headers.get('Authorization')
55 |         if not auth_header:
56 |             return "Missing Authorization header", 401
57 |
58 |         # Check if header starts with "Bearer "
59 |         if not auth_header.startswith('Bearer '):
60 |             return "Invalid Authorization header format", 401
61 |
62 |         # Extract and verify API key
63 |         provided_key = auth_header.split(' ')[1]
64 |         if provided_key != API_KEY:
65 |             return "Invalid API key", 401
66 |
67 |     # Extract 'input' and 'voice' fields from the OpenAI-style request
68 |     openai_data = request.get_json(silent=True) or {}  # tolerate a missing or invalid JSON body
69 |     text = openai_data.get('input')
70 |     voice = openai_data.get('voice')
71 |     if not text or not voice: return "Missing 'input' or 'voice' in request body", 400
72 |     # Get model paths and reference audio from the mappings for the requested voice
73 |     voice_config = VOICE_MAPPING.get(voice)
74 |     refer_config = REFER_MAPPING.get(voice)
75 |
76 |     if not voice_config:
77 |         return f"Voice '{voice}' is not supported", 400
78 |
79 |     gpt_model_path = voice_config.get('gpt_model_path')
80 |     sovits_model_path = voice_config.get('sovits_model_path')
81 |     refer_wav_path = refer_config.get('refer_wav_path')
82 |     prompt_text = refer_config.get('prompt_text')
83 |
84 |     if not gpt_model_path or not sovits_model_path:
85 |         return f"Model paths for voice '{voice}' are missing", 500
86 |
87 |     if not refer_wav_path or not prompt_text:
88 |         return f"Reference audio or prompt text for voice '{voice}' is missing", 500
89 |
90 |     # Step 1: Set the models in the backend
91 |     set_model_response = requests.post(f"{BACKEND_URL}/set_model", json={
92 |         "gpt_model_path": gpt_model_path,
93 |         "sovits_model_path": sovits_model_path
94 |     })
95 |
96 |
97 |     # Check if the backend was able to set the models and reference audio
98 |     if set_model_response.status_code != 200:
99 |         return f"Backend failed to set models: {set_model_response.text}", set_model_response.status_code
100 |
101 |
102 |     # Step 2: Send text-to-speech request to the backend
103 |     backend_payload 
= {
104 |         "text": text,
105 |         "text_language": TEXT_LANGUAGE,
106 |         "refer_wav_path": refer_wav_path,
107 |         "prompt_text": prompt_text,
108 |         "prompt_language": TEXT_LANGUAGE,  # Language of the prompt text, required to pass backend validation
109 |         "top_k": TOP_K,
110 |         "top_p": TOP_P,
111 |         "temperature": TEMPERATURE,
112 |         "speed": SPEED
113 |     }
114 |
115 |     backend_response = requests.post(BACKEND_URL, json=backend_payload)
116 |
117 |     # Check if the backend response is successful
118 |     if backend_response.status_code != 200:
119 |         return f"Backend service error: {backend_response.text}", backend_response.status_code
120 |
121 |     # Step 3: Convert returned WAV file to MP3
122 |     wav_audio = io.BytesIO(backend_response.content)
123 |     audio = AudioSegment.from_wav(wav_audio)
124 |     mp3_audio = io.BytesIO()
125 |     audio.export(mp3_audio, format="mp3")
126 |     mp3_audio.seek(0)
127 |
128 |     # Return MP3 file
129 |     return send_file(mp3_audio, mimetype='audio/mpeg', as_attachment=True, download_name='speech.mp3')
130 |
131 | if __name__ == '__main__':
132 |     app.run(host='0.0.0.0', port=5000)
--------------------------------------------------------------------------------
/config.yaml.example:
--------------------------------------------------------------------------------
1 | # Recommended to use forward slashes in paths
2 | voices:
3 |   alloy:
4 |     models:
5 |       gpt_model_path: "F:/model1"
6 |       sovits_model_path: "F:/model2"
7 |     refer:
8 |       refer_wav_path: "F:/1.wav"
9 |       prompt_text: "text_here"
10 |   echo:
11 |     models:
12 |       gpt_model_path: "F:/model3"
13 |       sovits_model_path: "F:/model4"
14 |     refer:
15 |       refer_wav_path: "F:/2.wav"
16 |       prompt_text: "text_here"
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | services:
2 |   tts-service:
3 |     image: austinleo/tts-adapter:latest
4 |     ports:
5 |       - "5000:5000"
6 |     env_file:
7 |       - .env
8 |     restart: always 
9 |     extra_hosts:
10 |       - "host.docker.internal:host-gateway"
11 |     volumes:
12 |       - ./config.yaml:/app/config.yaml
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | Flask==3.0.3
2 | requests==2.32.3
3 | pydub==0.25.1
4 | PyYAML==6.0.1
--------------------------------------------------------------------------------
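Once the container is up, the adapter can be exercised from Python's standard library alone. This is a sketch under the README's default assumptions (adapter reachable on port 5000, Bearer key only needed when `API_KEY` is set in `.env`); the `speech_request` helper name and `BASE_URL` value are illustrative, and only the request construction runs here — the commented `urlopen` call is what would actually fetch the MP3 bytes.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumption: adapter running on the compose default port

def speech_request(text, voice, api_key=None):
    """Build the POST that /v1/audio/speech expects; the Authorization
    header is only required when API_KEY is configured in .env."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/audio/speech", data=body, headers=headers, method="POST"
    )

req = speech_request("你好，世界", "alloy", api_key="my-secret-key")
print(req.full_url)
# mp3_bytes = urllib.request.urlopen(req).read()  # would return the speech.mp3 content
```

The voice name passed here must exist under `voices:` in `config.yaml`, otherwise the adapter responds with a 400 "not supported" error.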