├── .env.example
├── .gitignore
├── README.en.md
├── README.md
├── app.py
├── evaluation_results.csv
└── requirements.txt


/.env.example:
--------------------------------------------------------------------------------
1 | # 'local' or 'url'
2 | STORAGE_MODE=local
3 | # only for local
4 | STORAGE_PATH=samples
5 | # only for url
6 | STORAGE_URL=https://


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | share/python-wheels/
 24 | *.egg-info/
 25 | .installed.cfg
 26 | *.egg
 27 | MANIFEST
 28 | 
 29 | # PyInstaller
 30 | #  Usually these files are written by a python script from a template
 31 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 32 | *.manifest
 33 | *.spec
 34 | 
 35 | # Installer logs
 36 | pip-log.txt
 37 | pip-delete-this-directory.txt
 38 | 
 39 | # Unit test / coverage reports
 40 | htmlcov/
 41 | .tox/
 42 | .nox/
 43 | .coverage
 44 | .coverage.*
 45 | .cache
 46 | nosetests.xml
 47 | coverage.xml
 48 | *.cover
 49 | *.py,cover
 50 | .hypothesis/
 51 | .pytest_cache/
 52 | cover/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | .pybuilder/
 76 | target/
 77 | 
 78 | # Jupyter Notebook
 79 | .ipynb_checkpoints
 80 | 
 81 | # IPython
 82 | profile_default/
 83 | ipython_config.py
 84 | 
 85 | # pyenv
 86 | #   For a library or package, you might want to ignore these files since the code is
 87 | #   intended to run in multiple environments; otherwise, check them in:
 88 | # .python-version
 89 | 
 90 | # pipenv
 91 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 92 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 93 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 94 | #   install all needed dependencies.
 95 | #Pipfile.lock
 96 | 
 97 | # poetry
 98 | #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 99 | #   This is especially recommended for binary packages to ensure reproducibility, and is more
100 | #   commonly ignored for libraries.
101 | #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102 | #poetry.lock
103 | 
104 | # pdm
105 | #   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106 | #pdm.lock
107 | #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108 | #   in version control.
109 | #   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
110 | .pdm.toml
111 | .pdm-python
112 | .pdm-build/
113 | 
114 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
115 | __pypackages__/
116 | 
117 | # Celery stuff
118 | celerybeat-schedule
119 | celerybeat.pid
120 | 
121 | # SageMath parsed files
122 | *.sage.py
123 | 
124 | # Environments
125 | .env
126 | .venv
127 | env/
128 | venv/
129 | ENV/
130 | env.bak/
131 | venv.bak/
132 | 
133 | # Spyder project settings
134 | .spyderproject
135 | .spyproject
136 | 
137 | # Rope project settings
138 | .ropeproject
139 | 
140 | # mkdocs documentation
141 | /site
142 | 
143 | # mypy
144 | .mypy_cache/
145 | .dmypy.json
146 | dmypy.json
147 | 
148 | # Pyre type checker
149 | .pyre/
150 | 
151 | # pytype static type analyzer
152 | .pytype/
153 | 
154 | # Cython debug symbols
155 | cython_debug/
156 | 
157 | # PyCharm
158 | #  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
159 | #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
160 | #  and can be added to the global gitignore or merged into this file.  For a more nuclear
161 | #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
162 | .idea/
163 | tmp/


--------------------------------------------------------------------------------
/README.en.md:
--------------------------------------------------------------------------------
 1 | [English README](README.en.md) | [中文简体](README.md)
 2 | 
 3 | # 🥇 ChatTTS Speaker Leaderboard
 4 | 
 5 | ## 🎤 ChatTTS Stable Speaker Evaluation and Labeling (Experimental)
 6 | 
 7 | This project is based on [ChatTTS](https://github.com/2noise/ChatTTS).
 8 | 
 9 | The evaluation is based on Tongyi Laboratory's [eres2netv2_sv_zh-cn](https://modelscope.cn/models/iic/speech_eres2netv2_sv_zh-cn_16k-common/summary).
10 | 
11 | Feel free to download and listen to the voices! This project is open source: [ChatTTS_Speaker](https://github.com/6drf21e/ChatTTS_Speaker). Contributions are welcome.
12 | 
13 | ## Try It Now
14 | 
15 | | Description       | Link                                                    |
16 | |-------------------|---------------------------------------------------------| 
17 | | **ModelScope (China)** | https://modelscope.cn/studios/ttwwwaa/ChatTTS_Speaker |
18 | | **HuggingFace**   | https://huggingface.co/spaces/taa/ChatTTS_Speaker       |
19 | 
20 | ## Parameter Descriptions
21 | 
22 | - **rank_long**: Stability score for long sentence text.
23 | - **rank_multi**: Stability score for multi-sentence text.
24 | - **rank_single**: Stability score for single sentence text.
25 | 
 26 | These three scores measure how consistent the voice stays across different generated samples. Higher values mean a more stable voice.
27 | 
 28 | - **score**: Confidence of the gender, age, and feature labels. Higher values mean the labels are more likely to be accurate.
 29 | - **gender age feature**: The gender, age, and features of the voice. (Feature accuracy is low; for reference only.)
30 | 
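In `evaluation_results.csv`, as parsed by `app.py`, the gender, age, and feature columns hold `label:percent` pairs separated by `;`. The sketch below shows one way to pick the most likely label from such a string. The function name `top_label` and the example labels are illustrative assumptions, not part of the project.

```python
def top_label(percentage_str, default="unknown"):
    """Return the label with the highest percentage from a string like 'a:60;b:40'."""
    if not percentage_str or not isinstance(percentage_str, str):
        return default
    pairs = []
    for item in percentage_str.split(";"):
        label, _, value = item.partition(":")
        try:
            pairs.append((label, float(value)))
        except ValueError:
            # Skip malformed items instead of failing on the whole string.
            continue
    if not pairs:
        return default
    return max(pairs, key=lambda pair: pair[1])[0]


# Hypothetical label strings, used only to illustrate the format.
best = top_label("male:80;female:20")
```

Unlike the parser in `app.py`, this version tolerates malformed items, which may or may not be what you want when validating the CSV.
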
31 | ## How to Download and Listen to Voices
32 | 
33 | 1. Click on a voice seed_id cell.
34 | 2. Click the **Download .pt File** button at the bottom to download the corresponding .pt file.
35 | 
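Behind the download button, `app.py` rebuilds the .pt file by base64-decoding the `emb_data` column of `evaluation_results.csv`. The sketch below mirrors that step outside the app using only the standard library. The helper name `restore_pt_bytes` and the sample payload are illustrative assumptions; a real `emb_data` cell contains the bytes of a serialized speaker embedding.

```python
import base64
import os
import tempfile


def restore_pt_bytes(emb_b64: str, out_path: str) -> str:
    """Decode a base64 string and write the raw bytes to out_path."""
    with open(out_path, "wb") as fh:
        fh.write(base64.b64decode(emb_b64))
    return out_path


# Hypothetical payload standing in for one `emb_data` cell.
fake_emb_data = base64.b64encode(b"not a real tensor").decode("utf-8")
out_file = restore_pt_bytes(
    fake_emb_data,
    os.path.join(tempfile.gettempdir(), "demo_restored_emb.pt"),
)
```

With a real `emb_data` value, the resulting file should be loadable in the same way as a file downloaded from the leaderboard.
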
36 | ## Evaluation Code
37 | 
38 | The stability evaluation code can be found at: https://github.com/2noise/ChatTTS/pull/317
39 | 
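The Details tab of the app summarizes the evaluation as: score every pair of audio clips for speaker similarity, then combine the mean and standard deviation of those scores into a stability index. The following is only a sketch of that idea. The `similarity` callable stands in for the eres2netv2 scorer, and the formula `mean - std` is an assumption chosen for illustration; the formula actually used is in the linked pull request.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_scores(clips, similarity):
    """Score every unordered pair of clips with the given similarity callable."""
    return [similarity(a, b) for a, b in combinations(clips, 2)]


def stability_index(scores):
    """Combine mean and spread into one number (assumed form: mean minus std)."""
    if not scores:
        raise ValueError("at least one similarity score is required")
    return mean(scores) - pstdev(scores)


# Toy stand-in: "clips" are numbers and similarity falls with their distance.
toy_clips = [1.0, 2.0, 3.0]
toy_scores = pairwise_scores(toy_clips, lambda a, b: 1.0 - abs(a - b) / 10.0)
toy_index = stability_index(toy_scores)
```

Under this assumed formula, a voice whose pairwise scores are high and tightly clustered ranks above one with the same mean but more variation.
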
40 | ## FAQ
41 | 
 42 | - **Q**: How do I use the .pt file?
 43 | - **A**: Some projects, such as [ChatTTS_colab](https://github.com/6drf21e/ChatTTS_colab), can load it directly. You can also load it yourself with code like this:
44 | 
45 | ```python
 46 | spk = torch.load(PT_FILE_PATH)  # requires `import torch`; PT_FILE_PATH is the path of the downloaded .pt file
47 | params_infer_code = {
48 |     'spk_emb': spk,
49 | }
50 | ```
51 | 
52 | - **Q**: Why do some voices have high scores but sound bad?
53 | - **A**: The score only measures the stability of the voice, not its quality. You can choose the appropriate voice according to your needs. For example, if a hoarse and stuttering voice is very stable, its score will be high but it will sound bad.
54 | 
 55 | - **Q**: I generated audio from the seed_id, but the voice is not stable. Why?
 56 | - **A**: The seed_id is only a reference ID, and the same seed may not produce the same voice in different environments. Loading the voice from the .pt file is still the recommended approach.
57 | 
58 | - **Q**: Is the voice labeling accurate?
 59 | - **A**: The first batch of tested voices contains 2000 samples. They were labeled simply by voiceprint similarity, so accuracy is mediocre (especially for the feature item) and the labels are for reference only. If you have a better labeling method, please submit a PR.
60 | 
61 | ## Related Projects
62 | - [ChatTTS](https://github.com/2noise/ChatTTS)
63 | - [eres2netv2_sv_zh-cn](https://modelscope.cn/models/iic/speech_eres2netv2_sv_zh-cn_16k-common/summary)
64 | - [3D-Speaker](https://github.com/modelscope/3D-Speaker)
65 | 
66 | ## Contribution
67 | 
68 | Contributions of code and voices are welcome! If you have any questions or suggestions, please submit an issue or pull request.
69 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | <div align="center">
 2 | 
 3 | 
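  4 | # 🎤 ChatTTS Stable Speaker Evaluation and Labeling (Experimental)
  5 | 
  6 | This project is based on [ChatTTS](https://github.com/2noise/ChatTTS) | Evaluation is based on Tongyi Laboratory's [ERes2NetV2 speaker recognition model](https://modelscope.cn/models/iic/speech_eres2netv2_sv_zh-cn_16k-common/summary).
  7 | 
  8 | Voices evaluated so far: 2600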
 9 | 
 10 | [![Open In ModelScope](https://img.shields.io/badge/Open%20In-modelscope-blue?style=for-the-badge)](https://modelscope.cn/studios/ttwwwaa/ChatTTS_Speaker)
11 | [![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/spaces/taa/ChatTTS_Speaker)
12 | 
13 | 
14 | [**English**](README.en.md) | [**简体中文**](README.md)
15 | 
16 | </div>
17 | 
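 18 | ## Try It Now
 19 | 
 20 | | Description            | Link                                                  |
 21 | |------------------------|-------------------------------------------------------|
 22 | | **ModelScope (China)** | https://modelscope.cn/studios/ttwwwaa/ChatTTS_Speaker |
 23 | | **HuggingFace**        | https://huggingface.co/spaces/taa/ChatTTS_Speaker     |
 24 | 
 25 | ## Parameter Descriptions
 26 | 
 27 | - **rank_long**: Stability score for long sentence text.
 28 | - **rank_multi**: Stability score for multi-sentence text.
 29 | - **rank_single**: Stability score for single sentence text.
 30 | 
 31 | These three scores measure how consistent the voice stays across different samples. Higher values mean a more stable voice.
 32 | 
 33 | - **score**: Confidence of the gender, age, and feature labels. Higher values mean the labels are more likely to be accurate.
 34 | - **gender age feature**: The gender, age, and features of the voice. (Feature accuracy is low; for reference only.)
 35 | 
 36 | ## How to Download and Listen to Voices
 37 | 
 38 | 1. Click on a voice seed_id cell.
 39 | 2. Click the **Download .pt File** button at the bottom to download the corresponding .pt file.
 40 | 
 41 | ## Evaluation Code
 42 | 
 43 | The stability evaluation code can be found at: https://github.com/2noise/ChatTTS/pull/317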
44 | 
45 | ## FAQ
46 | 
 47 | - **Q**: How do I use the .pt file?
 48 | - **A**: Some projects, such as [ChatTTS_colab](https://github.com/6drf21e/ChatTTS_colab), can load it directly. You can also load it yourself with code like this:
49 | 
50 | ```python
 51 | spk = torch.load(PT_FILE_PATH)  # requires `import torch`; PT_FILE_PATH is the path of the downloaded .pt file
52 | params_infer_code = {
53 |     'spk_emb': spk,
54 | }
55 | ```
56 | 
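 57 | - **Q**: Why do some voices have high scores but sound bad?
 58 | - **A**: The score only measures the stability of the voice, not its quality. You can choose the appropriate voice according to your needs. For example, if a hoarse and stuttering voice is consistently stable, its score will be high but it will sound bad.
 59 | 
 60 | 
 61 | - **Q**: I generated audio from the seed_id, but the voice is not stable. Why?
 62 | - **A**: The seed_id is only a reference ID, and the same seed may not produce the same voice in different environments. Loading the voice from the .pt file is still the recommended approach.
 63 | 
 64 | 
 65 | - **Q**: Is the voice labeling accurate?
 66 | - **A**: The first batch of tested voices contains 2000 samples, labeled simply by voiceprint similarity; accuracy is mediocre (especially for the feature item), so the labels are for reference only. If you have a better labeling method, please submit a
 67 |   PR.
 68 | 
 69 | ## Related Projects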
70 | - [ChatTTS](https://github.com/2noise/ChatTTS)
71 | - [eres2netv2_sv_zh-cn](https://modelscope.cn/models/iic/speech_eres2netv2_sv_zh-cn_16k-common/summary)
72 | - [3D-Speaker](https://github.com/modelscope/3D-Speaker)
73 | 
 74 | ## Contribution
 75 | 
 76 | Contributions of code and voices are welcome! If you have any questions or suggestions, please submit an issue or pull request.
77 | 
78 | 


--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
  1 | import base64
  2 | import os
  3 | import os.path
  4 | 
  5 | import gradio as gr
  6 | import pandas as pd
  7 | import requests
  8 | from dotenv import load_dotenv
  9 | from gradio_leaderboard import Leaderboard
 10 | from pandas import DataFrame
 11 | import torch
 12 | import pybase16384 as b14
 13 | import numpy as np
 14 | import lzma
 15 | 
 16 | load_dotenv()
 17 | 
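 18 | # Read the storage settings from environment variables
 19 | storage_mode = os.getenv("STORAGE_MODE")
 20 | storage_path = os.getenv("STORAGE_PATH")
 21 | storage_url = os.getenv("STORAGE_URL")
 22 | 
 23 | # Temporary directory for restored .pt files and downloaded WAV files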
 24 | tmp_dir = os.path.join(os.getcwd(), "tmp")
 25 | os.makedirs(tmp_dir, exist_ok=True)
 26 | 
 27 | 
 28 | def _encode_spk_emb(spk_emb: torch.Tensor) -> str:
 29 |     with torch.no_grad():
 30 |         arr: np.ndarray = spk_emb.to(dtype=torch.float16, device="cpu").numpy()
 31 |         s = b14.encode_to_string(
 32 |             lzma.compress(
 33 |                 arr.tobytes(),
 34 |                 format=lzma.FORMAT_RAW,
 35 |                 filters=[
 36 |                     {"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME}
 37 |                 ],
 38 |             ),
 39 |         )
 40 |         del arr
 41 |     return s
 42 | 
 43 | def pt2str(pt_path):
 44 |     spk_emb = torch.load(pt_path, map_location="cpu")
 45 |     return _encode_spk_emb(spk_emb)
 46 | 
 47 | def file_to_base64(file_path):
 48 |     with open(file_path, "rb") as file:
 49 |         return base64.b64encode(file.read()).decode("utf-8")
 50 | 
 51 | 
 52 | def base64_to_file(base64_str, output_path):
 53 |     with open(output_path, "wb") as file:
 54 |         file.write(base64.b64decode(base64_str))
 55 | 
 56 | 
 57 | def convert_to_markdown(percentage_str):
 58 |     """
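 59 |     Convert a percentage string such as "a:60;b:40" to Markdown.
 60 |     :param percentage_str: "label:percent" pairs separated by ";"
 61 |     :return: Markdown with bold labels, "" for empty input, or the input unchanged if it is not a string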
 62 |     """
 63 |     if not percentage_str:
 64 |         return ""
 65 |     if not isinstance(percentage_str, str):
 66 |         return percentage_str
 67 |     items = percentage_str.split(";")
 68 |     markdown_str = "  ".join([f"**{item.split(':')[0]}** {item.split(':')[1]}%" for item in items])
 69 |     return markdown_str
 70 | 
 71 | def convert_to_str(percentage_str):
 72 |     """
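 73 |     Return the label with the highest percentage from a percentage string.
 74 |     :param percentage_str: "label:percent" pairs separated by ";"
 75 |     :return: the top label, or "未知" (unknown) if the input is empty or not a string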
 76 |     """
 77 |     if not percentage_str or not isinstance(percentage_str, str):
 78 |         return "未知"
 79 |     items = percentage_str.split(";")
 80 |     # sort by value
 81 |     items.sort(key=lambda x: float(x.split(':')[1]), reverse=True)
 82 |     keys = [item.split(':')[0] for item in items]
 83 |     if keys and keys[0]:
 84 |         return keys[0]
 85 |     else:
 86 |         return "未知"
 87 | 
 88 | # Load
 89 | df = pd.read_csv("evaluation_results.csv", encoding="utf-8")
 90 | 
 91 | df["rank_long"] = df["rank_long"].apply(lambda x: round(x, 2))
 92 | df["rank_multi"] = df["rank_multi"].apply(lambda x: round(x, 2))
 93 | df["rank_single"] = df["rank_single"].apply(lambda x: round(x, 2))
 94 | df["gender_filter"] = df["gender"].apply(convert_to_str)
 95 | df["gender"] = df["gender"].apply(convert_to_markdown)
 96 | df["age_filter"] = df["age"].apply(convert_to_str)
 97 | df["age"] = df["age"].apply(convert_to_markdown)
 98 | df["feature"] = df["feature"].apply(convert_to_markdown)
 99 | df["score"] = df["score"].apply(lambda x: round(x, 2))
100 | 
101 | 
102 | def download_wav_file(seed_id, download_url, local_dir):
103 |     os.makedirs(local_dir, exist_ok=True)
104 |     local_file_path = os.path.join(local_dir, f"{seed_id}.wav")
105 |     file_url = f"{download_url}/{seed_id}_test.wav"
106 |     if not os.path.exists(local_file_path):
107 |         response = requests.get(file_url, timeout=30)  # bounded wait so a stalled server cannot hang the UI
108 |         if response.status_code == 200:
109 |             with open(local_file_path, "wb") as f:
110 |                 f.write(response.content)
111 |             print(f"Downloaded {file_url} to {local_file_path}")
112 |         else:
113 |             print(f"Failed to download {file_url}: Status code {response.status_code}")
114 |     return local_file_path
115 | 
116 | 
117 | def restore_wav_file(seed_id):
118 |     """
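119 |     Restore the WAV file for the given seed_id. If storage_mode is 'local',
120 |     the file is read from the local storage path. If storage_mode is 'url',
121 |     the file is downloaded from the remote URL into the temporary directory.
122 |     :param seed_id: ID of the voice
123 |     :return: path to the WAV file, or None if it is unavailable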
124 |     """
125 |     if not seed_id:
126 |         return None
127 | 
128 |     if storage_mode == "local":
129 |         local_file_path = os.path.join(storage_path, f"{seed_id}_test.wav")
130 |         if os.path.exists(local_file_path):
131 |             return local_file_path
132 |         else:
133 |             print(f"Local file {local_file_path} does not exist.")
134 |             return None
135 | 
136 |     elif storage_mode == "url":
137 |         try:
138 |             return download_wav_file(seed_id, storage_url, tmp_dir)
139 |         except Exception as e:
140 |             print(f"Failed to download WAV file: {e}")
141 |             return None
142 | 
143 |     else:
144 |         print(f"Invalid storage mode: {storage_mode}")
145 |         return None
146 | 
147 | 
148 | def restore_pt_file(seed_id):
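149 |     """
150 |     Restore the PT file for the given seed_id.
151 |     :param seed_id: ID of the voice
152 |     :return: path to the restored .pt file, or None if the seed_id is unknown
153 |     """
154 |     # Look up the row for this seed_id; the frame is empty when the ID is unknown.
155 |     matches = df[df["seed_id"] == seed_id]
156 |     if matches.empty:
157 |         return None
158 |     row = matches.iloc[0]
159 | 
160 |     # emb_data holds the base64-encoded contents of the .pt file;
161 |     # decode it into the temporary directory so it can be served for download.
162 |     output_path = os.path.join(tmp_dir, f"{row['seed_id']}_restored_emb.pt")
163 |     base64_to_file(row["emb_data"], output_path)
164 |     return output_path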
165 | 
166 | 
167 | def seed_change(evt: gr.SelectData, value=None):
168 |     """
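169 |     Handle a seed ID selection: return the .pt download button and the preview audio for the selected seed ID.
170 |     :param evt: selection event carrying the clicked cell's value and index
171 |     :param value: current leaderboard DataFrame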
172 |     """
173 |     print(f"You selected {evt.value} at {evt.index} from {evt.target}")
174 | 
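175 |     # Only a click in column 0 (the seed_id column) selects a voice.
176 |     if not isinstance(evt.index, list) or evt.index[1] != 0:
177 |         # Hide the download button and player; return one value per wired output
178 |         # (stats, download_button, test_audio, spk_emb_str).
179 |         hidden_button = gr.DownloadButton(value=None, label="Download .pt File", visible=False)
180 |         return [None, hidden_button, gr.Audio(None, visible=False), ""]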
181 | 
182 |     assert isinstance(value, DataFrame), "Expected value to be a DataFrame"
183 | 
184 |     # seed_id
185 |     seed_id = evt.value
186 |     print(f"Selected seed_id: {seed_id}")
187 | 
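188 |     # Restore the .pt file for this seed_id
189 |     down_file = restore_pt_file(seed_id)
190 | 
191 |     # Encode the embedding as a string; leave it empty if no .pt file could be restored
192 |     spk_emb_str = pt2str(down_file) if down_file else ""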
193 | 
194 |     # Fetch the preview audio file
195 |     wav_file = restore_wav_file(seed_id)
196 |     if wav_file and not os.path.exists(wav_file):
197 |         print(f"WAV file {wav_file} does not exist.")
198 |         wav_file = None
199 | 
200 |     return [
201 |         evt.index,
202 |         gr.DownloadButton(value=down_file, label=f"Download .pt File [{seed_id}]", visible=True),
203 |         gr.Audio(wav_file, visible=wav_file is not None),
204 |         spk_emb_str,
205 |     ]
206 | 
207 | 
208 | with gr.Blocks() as demo:
209 |     gr.Markdown("# 🥇 ChatTTS Speaker Leaderboard ")
210 |     gr.Markdown("""
211 |     ### 🎤 [ChatTTS](https://github.com/2noise/ChatTTS): Stable speaker search and labeling (experimental). Feel free to download and listen to the voices!
212 | 
213 |     This project is open source: [ChatTTS_Speaker](https://github.com/6drf21e/ChatTTS_Speaker). PRs and stars are welcome!
214 | 
215 |     The evaluation is based on Tongyi Laboratory's [eres2netv2_sv_zh-cn](https://modelscope.cn/models/iic/speech_eres2netv2_sv_zh-cn_16k-common/summary)
216 |     """)
217 | 
218 |     with gr.Tab(label="🏆Leaderboard"):
219 |         with gr.Row():
220 |             with gr.Column(scale=1):
221 |                 gr.Markdown("""
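222 | ### Parameter descriptions
223 | 
224 | - **rank_long**: Stability score for long sentence text.
225 | - **rank_multi**: Stability score for multi-sentence text.
226 | - **rank_single**: Stability score for single sentence text.
227 | 
228 | These three scores measure how consistent a voice stays when generating different types of text. Higher values mean a more stable voice.
229 | 
230 | - **score**: Confidence of the gender, age, and feature labels. Higher values mean more accurate labels.
231 | - **gender age feature**: The gender, age, and features of the voice. (Feature accuracy is low; for reference only.)
232 | 
233 | ### How to download a voice
234 | 
235 | - Select a voice, then click the **Download .pt File** button at the bottom to download the corresponding .pt file.
236 | 
237 | ### FAQ
238 | 
239 | - **Q**: How do I use the .pt file?
240 | - **A**: Some projects, such as [ChatTTS_colab](https://github.com/6drf21e/ChatTTS_colab), can load it directly.
241 | You can also load it with code like this:
242 | ```python
243 | spk = torch.load(PT_FILE_PATH)
244 | params_infer_code = {
245 |     'spk_emb': spk,
246 | }
247 | # ... (rest omitted)
248 | ```
249 | - **Q**: Why do some voices have high scores but sound bad?
250 | - **A**: The score only measures the stability of the voice, not its quality. You can choose the appropriate voice according to your needs. As a simple example, if a hoarse and stuttering voice is consistently stable, its score will be high.
251 | - **Q**: I generated audio from the seed_id, but the voice is not stable. Why?
252 | - **A**: The seed_id is only a reference ID, and the same seed may not produce the same voice in different environments. Loading the voice from the .pt file is still the recommended approach.
253 | - **Q**: Are the gender labels accurate?
254 | - **A**: The first batch of tested voices contains 2000 samples, labeled simply by voiceprint similarity; accuracy is low (especially for the feature item), so the labels are for reference only. If you have a better labeling method, PRs are welcome.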
255 | 
256 |                     """)
257 |             with gr.Column(scale=3, min_width=800):
258 |                 leaderboard = Leaderboard(
259 |                     value=df,
260 |                     datatype=["markdown"] * 12,
261 |                     select_columns=["seed_id", "rank_long", "rank_multi", "rank_single", "score", "gender", "age",
262 |                                     "feature"],
263 |                     search_columns=["gender", "age"],
264 |                     filter_columns=["rank_long", "rank_multi", "rank_single", "gender_filter", "age_filter"],
265 |                     hide_columns=["emb_data", "gender_filter", "age_filter"],
266 |                 )
267 |                 stats = gr.State(value=[1])
268 |                 download_button = gr.DownloadButton("Download .pt File", visible=True)
269 |                 spk_emb_str = gr.Textbox("", label="Speaker embedding code", lines=10)
270 |                 test_audio = gr.Audio(visible=True)
271 |                 gr.Markdown("Select a seed_id cell to download the .pt file and play the preview audio.")
272 |                 # download_button.click(download, inputs=[stats], outputs=[])
273 |                 leaderboard.select(seed_change, inputs=[leaderboard], outputs=[stats, download_button, test_audio, spk_emb_str])
274 | 
275 |     with gr.Tab(label="📊Details"):
276 |         gr.Markdown("""
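277 |     # Preliminary voice stability evaluation
278 |     
279 |     ## How it works
280 |     
281 |     The evaluation uses [eres2netv2_sv_zh-cn](https://modelscope.cn/models/iic/speech_eres2netv2_sv_zh-cn_16k-common/summary), the **ERes2NetV2 speaker recognition model** open-sourced by Tongyi Laboratory, to score a single voice for consistency across different speech samples. The steps are:
282 |     
283 |     1. **Samples**: Three types of test text are used: single-sentence, multi-sentence, and long-sentence.
284 |     2. **Voice consistency scoring**:
285 |         - Each pair of audio files is scored for whether both come from the same speaker.
286 |         - The eres2netv2 model scores each pair and yields a similarity score.
287 |     3. **Stability evaluation**:
288 |         - The mean and standard deviation of the similarity scores are computed for each group of audio files.
289 |         - The mean and standard deviation are combined into a stability index that measures the consistency of the voice.
290 |     
291 |     
292 |     ## Test samples
293 |     
294 |     ### Single-sentence text
295 |     - 这是一段测试文本[uv_break] 用来测试多批次生产音频的稳定性。 (generated 6 times)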
296 |     
297 |     ### Multi-sentence text
298 |     - 今天早晨，市中心的主要道路因突发事故造成了严重堵塞[uv_break]。请驾驶员朋友们注意绕行，并听从现场交警的指挥。
299 |     - 亲爱的朋友们，无论你现在处于什么样的境地，都不要放弃希望[uv_break]。每一个伟大的成功，都是从不懈的努力和坚定的信念中诞生的。
300 |     - 很久很久以前，在一个宁静的小村庄里，住着一只名叫小花的可爱小猫咪[uv_break]。小花每天都喜欢在花园里玩耍，有一天，它遇到了一只迷路的小鸟。
301 |     - 您好，欢迎致电本公司客服中心。为了更好地服务您，请在听到提示音后选择所需服务[uv_break]。如果您需要咨询产品信息，[uv_break]请按一。
302 |     - 夜色如墨[uv_break]，山间小道蜿蜒曲折。李逍遥轻踏树梢，身形如同幽灵一般，迅捷无声[uv_break]。他手中的宝剑在月光下闪烁着寒芒，心中却是一片平静。
303 |     - 亲爱的，你今天工作怎么样？[uv_break]有没有遇到什么开心的事。[uv_break]对了，晚上我们一起去那个新开的餐厅试试吧。
304 |     
305 |     ### Long-sentence text
306 |     - 今天早晨，市中心的主要道路因突发事故造成了严重堵塞[uv_break]。请驾驶员朋友们注意绕行，并听从现场交警的指挥[uv_break]。天气预报显示，未来几天将有大范围降雨[uv_break]，请大家出门记得携带雨具，注意安全。另据报道，本次事故已造成数人受伤[uv_break]，目前相关部门正在积极处理事故现场[uv_break]，确保道路尽快恢复通畅。
307 |     - 亲爱的朋友们，无论你现在处于什么样的境地，都不要放弃希望[uv_break]。每一个伟大的成功，都是从不懈的努力和坚定的信念中诞生的[uv_break]。人生的道路上充满了挑战和困难[uv_break]，但正是这些考验成就了我们的成长[uv_break]。记住，每一个今天的努力，都会成为明天成功的基石[uv_break]，坚持下去，你将看到光明的未来。
308 |     - 很久很久以前，在一个宁静的小村庄里，住着一只名叫小花的可爱小猫咪[uv_break]。小花每天都喜欢在花园里玩耍，有一天，它遇到了一只迷路的小鸟[uv_break]。小花决定帮助小鸟找到回家的路[uv_break]，于是它们一起穿过森林，翻过小山丘，经历了许多冒险[uv_break]。最终，在小花的帮助下，小鸟终于回到了自己的家[uv_break]，它们成为了最好的朋友，从此过上了快乐的生活。
309 |     - 您好，欢迎致电本公司客服中心。为了更好地服务您，请在听到提示音后选择所需服务[uv_break]。如果您需要咨询产品信息，[uv_break]请按一[uv_break]；如果您需要售后服务，请按二[uv_break]；如果您需要与人工客服交流，请按零[uv_break]。感谢您的来电，我们将竭诚为您服务，祝您生活愉快[uv_break]。如有任何疑问，请随时联系我们。
310 |     - 夜色如墨[uv_break]，山间小道蜿蜒曲折。李逍遥轻踏树梢，身形如同幽灵一般，迅捷无声[uv_break]。他手中的宝剑在月光下闪烁着寒芒，心中却是一片平静[uv_break]。突然，一声清脆的剑鸣打破了夜的静谧[uv_break]，一个黑衣人出现在前方，冷笑道：‘李逍遥，你终于来了。’李逍遥目光如电，淡淡道：‘既然来了，就不打算走了[uv_break]。今天，我们就一决高下。’
311 |     - 亲爱的，你今天工作怎么样？[uv_break]有没有遇到什么开心的事。[uv_break]对了，晚上我们一起去那个新开的餐厅试试吧[uv_break]。我听说那里的牛排特别好吃，而且还有你最喜欢的巧克力蛋糕[uv_break]。啊，今天真的好累，但想到等会儿可以见到你，心情就好多了[uv_break]。你还记得上次我们去的那个公园吗？[uv_break]那里的樱花真的好美，我还拍了好多照片呢。
312 |     
313 |             """)
314 | 
315 | if __name__ == "__main__":
316 |     demo.launch()
317 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | gradio==4.20.0
2 | gradio_leaderboard==0.0.9
3 | python-dotenv
4 | pybase16384
5 | torch==2.1.0
6 | numpy<2.0.0
7 | pandas
8 | requests


--------------------------------------------------------------------------------