├── .env.example
├── .gitignore
├── Dockerfile
├── README-en.md
├── README.md
├── app.py
├── config.yaml.example
├── docker-compose.yml
└── requirements.txt

/.env.example:
--------------------------------------------------------------------------------
1 | # Backend URL
2 | BACKEND_URL=http://host.docker.internal:9880
3 | # API_KEY=
4 | # generation parameters
5 | # TEXT_LANGUAGE=zh
6 | # TOP_K=15
7 | # TOP_P=1
8 | # TEMPERATURE=0.45
9 | # SPEED=0.95
10 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .env
2 | config.yaml
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | # Use the official Python image
2 | FROM python:3.10-slim
3 |
4 | # Install ffmpeg and any other dependencies
5 | RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg && \
6 |     rm -rf /var/lib/apt/lists/*
7 |
8 | # Set the working directory
9 | WORKDIR /app
10 |
11 | # Copy requirements.txt
12 | COPY requirements.txt /app/
13 |
14 | # Install Python dependencies
15 | RUN pip install --no-cache-dir -r requirements.txt
16 |
17 | # Copy the application code
18 | COPY app.py /app/
19 |
20 | # Expose the port that the app runs on
21 | EXPOSE 5000
22 |
23 | # Set the command to run the application
24 | CMD ["python", "app.py"]
--------------------------------------------------------------------------------
/README-en.md:
--------------------------------------------------------------------------------
1 | # Convert GPT-sovits to OpenAI TTS Format
2 |
3 | ## Introduction
4 |
5 | Many open-source projects only support the common TTS APIs of major vendors. This project bridges that gap by converting OpenAI TTS API requests into GPT-sovits API calls, so any client that supports OpenAI TTS can use GPT-sovits. 
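To make the conversion concrete, here is a minimal sketch of the OpenAI-style request body this adapter consumes. The field names (`input`, `voice`; `model` is accepted but unused) match what `app.py` later in this repo reads; the helper name `build_openai_tts_request` is purely illustrative.

```python
import json

def build_openai_tts_request(text, voice="alloy"):
    """Build the JSON body an OpenAI-TTS-compatible client would POST."""
    return {
        "model": "tts-1",  # sent by OpenAI clients, but ignored by this adapter
        "input": text,     # the text to synthesize
        "voice": voice,    # must match a voice key defined in config.yaml
    }

payload = build_openai_tts_request("Hello, world")
print(json.dumps(payload))
```

Any client that emits this shape can point its base URL at the adapter without code changes.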
6 |
7 | ## Preparation
8 |
9 | Start the GPT-sovits API service according to the documentation in api.py from the original project. If you are using an integrated package, make sure to start it from the package's root directory, in the following form:
10 |
11 | ```shell
12 | runtime\python.exe api.py <remaining arguments omitted>
13 | ```
14 |
15 | ## Usage
16 |
17 | Clone this repository and modify the configuration:
18 |
19 | ```shell
20 | git clone https://github.com/RedwindA/GPT-sovits-2-OpenAI
21 | cd GPT-sovits-2-OpenAI
22 | cp .env.example .env
23 | cp config.yaml.example config.yaml
24 | ```
25 |
26 | On Windows, the first two lines stay the same; the last two become:
27 |
28 | ```cmd
29 | copy .env.example .env
30 | copy config.yaml.example config.yaml
31 | ```
32 |
33 | Start with Docker:
34 |
35 | ```shell
36 | docker compose up -d
37 | ```
38 |
39 | After starting, the service runs as an OpenAI TTS API service. The base_url is http://your_ip:5000/v1 and the complete url is http://your_ip:5000/v1/audio/speech; enter whichever one your client application expects.
40 |
41 | ## Important Notes
42 |
43 | 1. **Since Docker is used, please ensure the container can correctly reach the GPT-sovits API.**
44 |    If both run on the same host and the GPT-sovits API runs directly (not in Docker), the environment variable should be `BACKEND_URL=http://host.docker.internal:9880` (the current default configuration). You can also combine both services in the same Docker network using docker compose.
45 |
46 | 2. If API_KEY is not configured, the service will be accessible to everyone
47 |
48 | ## Future Development Plans
49 |
50 | 1. Support streaming
51 | 2. 
~~Port or merge v2 to support different text segmentation strategies~~ Implemented in the apiv2 branch
52 |
53 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Convert GPT-sovits to the OpenAI TTS Format
2 |
3 | ## Introduction
4 |
5 | Many open-source projects only support the common TTS APIs of major vendors. This project adapts clients that support OpenAI TTS by converting their API requests.
6 |
7 | ## Preparation
8 |
9 | Start the GPT-sovits API service according to the documentation bundled with api.py in the original project. If you are using an integrated package, be sure to start it from the package's root directory, in the following form:
10 |
11 | ```shell
12 | runtime\python.exe api.py <remaining arguments omitted>
13 | ```
14 |
15 | ## Usage
16 |
17 | Clone this repository and modify the configuration:
18 |
19 | ```shell
20 | git clone https://github.com/RedwindA/GPT-sovits-2-OpenAI
21 | cd GPT-sovits-2-OpenAI
22 | cp .env.example .env
23 | cp config.yaml.example config.yaml
24 | ```
25 |
26 | On Windows, the first two lines stay the same; the last two become:
27 |
28 | ```cmd
29 | copy .env.example .env
30 | copy config.yaml.example config.yaml
31 | ```
32 |
33 | Start with Docker:
34 |
35 | ```shell
36 | docker compose up -d
37 | ```
38 |
39 | After starting, this service runs as an OpenAI TTS API service. The base_url is http://your_ip:5000/v1 and the complete url is http://your_ip:5000/v1/audio/speech; enter whichever one your client application expects.
40 |
41 | ## Important Notes
42 |
43 | 1. **Since Docker is used, please ensure the container can correctly reach the GPT-sovits API.**
44 |    If both run on the same host and the GPT-sovits API runs directly (not in Docker), the environment variable should be `BACKEND_URL=http://host.docker.internal:9880` (the current default configuration). You can also combine both services in the same Docker network using docker compose.
45 |
46 | 2. If API_KEY is not configured, the service will be accessible to everyone
47 |
48 | ## Future Development Plans
49 |
50 | 1. Support streaming
51 | 2. ~~Port or merge v2 to support different text segmentation strategies~~ Implemented in the apiv2 branch
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, request, send_file, jsonify
2 | import requests
3 | from pydub import AudioSegment
4 | import io
5 | import os  # Import os to access environment variables
6 | import json
7 | import yaml
8 |
9 | app = Flask(__name__)
10 |
11 | # Get API_KEY from environment variable
12 | API_KEY = os.environ.get('API_KEY')
13 |
14 | # Get BACKEND_URL from environment variable or use default
15 | BACKEND_URL = os.environ.get('BACKEND_URL', 'http://127.0.0.1:9880')
16 |
17 | # Load YAML configuration file
18 | def load_voice_config():
19 |     try:
20 |         with open('config.yaml', 'r', encoding='utf-8') as f:
21 |             config = yaml.safe_load(f)
22 |         voices = config.get('voices', {})
23 |
24 |         voice_mapping = {
25 |             voice: voice_data['models']
26 |             for voice, voice_data in voices.items()
27 |         }
28 |         refer_mapping = {
29 |             voice: voice_data['refer']
30 |             for voice, voice_data in voices.items()
31 |         }
32 |         return voice_mapping, refer_mapping
33 |     except Exception as e:
34 |         print(f"Error loading config.yaml: {e}")
35 |         return {}, {}
36 |
37 | # Replace original environment variable configuration
38 | VOICE_MAPPING, REFER_MAPPING = load_voice_config()
39 |
40 | # Get other parameters from environment variables or use default values
41 | TEXT_LANGUAGE = os.environ.get('TEXT_LANGUAGE', 'zh')
42 | TOP_K = int(os.environ.get('TOP_K', 15))
43 | TOP_P = float(os.environ.get('TOP_P', 1))
44 | TEMPERATURE = float(os.environ.get('TEMPERATURE', 0.45))
45 | SPEED = float(os.environ.get('SPEED', 0.95))
46 |
47 | # Print all parameters in one line for debugging
48 | print(f"BACKEND_URL: {BACKEND_URL}, TEXT_LANGUAGE: {TEXT_LANGUAGE}, TOP_K: {TOP_K}, TOP_P: {TOP_P}, TEMPERATURE: {TEMPERATURE}, SPEED: {SPEED}, VOICE_MAPPING: {VOICE_MAPPING}")
49 |
50 | 
@app.route('/v1/audio/speech', methods=['POST'])
51 | def convert_tts():
52 |     # Check API key if it's set in environment
53 |     if API_KEY:
54 |         auth_header = request.headers.get('Authorization')
55 |         if not auth_header:
56 |             return "Missing Authorization header", 401
57 |
58 |         # Check if header starts with "Bearer "
59 |         if not auth_header.startswith('Bearer '):
60 |             return "Invalid Authorization header format", 401
61 |
62 |         # Extract and verify API key
63 |         provided_key = auth_header.split(' ')[1]
64 |         if provided_key != API_KEY:
65 |             return "Invalid API key", 401
66 |
67 |     # Extract 'input' and 'voice' fields from the OpenAI-style request
68 |     openai_data = request.get_json(silent=True) or {}  # tolerate a missing or invalid JSON body
69 |     text = openai_data.get('input')
70 |     voice = openai_data.get('voice')
71 |     if not text or not voice: return "Missing 'input' or 'voice' in request body", 400
72 |     # Get model paths and reference audio from the mappings for the requested voice
73 |     voice_config = VOICE_MAPPING.get(voice)
74 |     refer_config = REFER_MAPPING.get(voice)
75 |
76 |     if not voice_config:
77 |         return f"Voice '{voice}' is not supported", 400
78 |
79 |     gpt_model_path = voice_config.get('gpt_model_path')
80 |     sovits_model_path = voice_config.get('sovits_model_path')
81 |     refer_wav_path = refer_config.get('refer_wav_path')
82 |     prompt_text = refer_config.get('prompt_text')
83 |
84 |     if not gpt_model_path or not sovits_model_path:
85 |         return f"Model paths for voice '{voice}' are missing", 500
86 |
87 |     if not refer_wav_path or not prompt_text:
88 |         return f"Reference audio or prompt text for voice '{voice}' is missing", 500
89 |
90 |     # Step 1: Set the models in the backend
91 |     set_model_response = requests.post(f"{BACKEND_URL}/set_model", json={
92 |         "gpt_model_path": gpt_model_path,
93 |         "sovits_model_path": sovits_model_path
94 |     })
95 |
96 |
97 |     # Check if the backend was able to set the models and reference audio
98 |     if set_model_response.status_code != 200:
99 |         return f"Backend failed to set models: {set_model_response.text}", set_model_response.status_code
100 |
101 |
102 |     # Step 2: Send text-to-speech request to the backend
103 |     backend_payload 
= {
104 |         "text": text,
105 |         "text_language": TEXT_LANGUAGE,
106 |         "refer_wav_path": refer_wav_path,
107 |         "prompt_text": prompt_text,
108 |         "prompt_language": TEXT_LANGUAGE,  # Language of the prompt text, required to pass backend validation
109 |         "top_k": TOP_K,
110 |         "top_p": TOP_P,
111 |         "temperature": TEMPERATURE,
112 |         "speed": SPEED
113 |     }
114 |
115 |     backend_response = requests.post(BACKEND_URL, json=backend_payload)
116 |
117 |     # Check if the backend response is successful
118 |     if backend_response.status_code != 200:
119 |         return f"Backend service error: {backend_response.text}", backend_response.status_code
120 |
121 |     # Step 3: Convert returned WAV file to MP3
122 |     wav_audio = io.BytesIO(backend_response.content)
123 |     audio = AudioSegment.from_wav(wav_audio)
124 |     mp3_audio = io.BytesIO()
125 |     audio.export(mp3_audio, format="mp3")
126 |     mp3_audio.seek(0)
127 |
128 |     # Return MP3 file
129 |     return send_file(mp3_audio, mimetype='audio/mpeg', as_attachment=True, download_name='speech.mp3')
130 |
131 | if __name__ == '__main__':
132 |     app.run(host='0.0.0.0', port=5000)
--------------------------------------------------------------------------------
/config.yaml.example:
--------------------------------------------------------------------------------
1 | # Recommended to use forward slashes in paths
2 | voices:
3 |   alloy:
4 |     models:
5 |       gpt_model_path: "F:/model1"
6 |       sovits_model_path: "F:/model2"
7 |     refer:
8 |       refer_wav_path: "F:/1.wav"
9 |       prompt_text: "text_here"
10 |   echo:
11 |     models:
12 |       gpt_model_path: "F:/model3"
13 |       sovits_model_path: "F:/model4"
14 |     refer:
15 |       refer_wav_path: "F:/2.wav"
16 |       prompt_text: "text_here"
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | services:
2 |   tts-service:
3 |     image: austinleo/tts-adapter:latest
4 |     ports:
5 |       - "5000:5000"
6 |     env_file:
7 |       - .env
8 |     restart: always 
9 |     extra_hosts:
10 |       - "host.docker.internal:host-gateway"
11 |     volumes:
12 |       - ./config.yaml:/app/config.yaml
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | Flask==3.0.3
2 | requests==2.32.3
3 | pydub==0.25.1
4 | PyYAML==6.0.1
--------------------------------------------------------------------------------
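Once the container is up, the adapter can be exercised from Python's standard library alone. This is a sketch under the README's default assumptions (adapter reachable on port 5000, Bearer key only needed when `API_KEY` is set in `.env`); the `speech_request` helper name and `BASE_URL` value are illustrative, and only the request construction runs here — the commented `urlopen` call is what would actually fetch the MP3 bytes.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumption: adapter running on the compose default port

def speech_request(text, voice, api_key=None):
    """Build the POST that /v1/audio/speech expects; the Authorization
    header is only required when API_KEY is configured in .env."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/audio/speech", data=body, headers=headers, method="POST"
    )

req = speech_request("你好，世界", "alloy", api_key="my-secret-key")
print(req.full_url)
# mp3_bytes = urllib.request.urlopen(req).read()  # would return the speech.mp3 content
```

The voice name passed here must exist under `voices:` in `config.yaml`, otherwise the adapter responds with a 400 "not supported" error.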