├── .gitignore
├── doc
    ├── ComfyUI_temp_dgtgr_00001_.png
    ├── ComfyUI_temp_rhsxy_00001_.png
    └── base_workflow.json
├── f5_model
    ├── __init__.py
    ├── backbones
    │   ├── README.md
    │   ├── mmdit.py
    │   ├── dit.py
    │   └── unett.py
    ├── dataset.py
    ├── cfm.py
    ├── ecapa_tdnn.py
    ├── trainer.py
    ├── modules.py
    └── utils.py
├── requirements.txt
├── zh_normalization
    ├── README.md
    ├── __init__.py
    ├── quantifier.py
    ├── phonecode.py
    ├── constants.py
    ├── chronology.py
    ├── text_normlization.py
    ├── num.py
    └── char_convert.py
├── LICENSE
├── README.md
├── __init__.py
└── data
    └── Emilia_ZH_EN_pinyin
        └── vocab.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
--------------------------------------------------------------------------------
/doc/ComfyUI_temp_dgtgr_00001_.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AIFSH/F5-TTS-ComfyUI/HEAD/doc/ComfyUI_temp_dgtgr_00001_.png
--------------------------------------------------------------------------------
/doc/ComfyUI_temp_rhsxy_00001_.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AIFSH/F5-TTS-ComfyUI/HEAD/doc/ComfyUI_temp_rhsxy_00001_.png
--------------------------------------------------------------------------------
/f5_model/__init__.py:
--------------------------------------------------------------------------------
1 | from .cfm import CFM
2 | 
3 | from .backbones.unett import UNetT
4 | from .backbones.dit import DiT
5 | from .backbones.mmdit import MMDiT
6 | 
7 | from .trainer import Trainer
8 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | accelerate>=0.33.0
2 | datasets
3 | einops>=0.8.0
4 | einx>=0.3.0
5 | ema_pytorch>=0.5.2
6 | faster_whisper
7 | funasr
8 | jieba
9 | jiwer
10 | librosa
11 | matplotlib
12 | pypinyin
13 | safetensors
14 | # torch>=2.0
15 | # torchaudio>=2.3.0
16 | torchdiffeq
17 | tqdm>=4.65.0
18 | transformers
19 | vocos
20 | wandb
21 | x_transformers>=1.31.14
22 | zhconv
23 | zhon
24 | cached_path
25 | pydub
26 | soundfile
27 | LangSegment
28 | numpy==1.26.4
29 | 
--------------------------------------------------------------------------------
/f5_model/backbones/README.md:
--------------------------------------------------------------------------------
1 | ## Backbones quick introduction
2 | 
3 | 
4 | ### unett.py
5 | - flat unet transformer
6 | - same structure as in the e2-tts & voicebox papers, except using rotary pos emb
7 | - update: allow possible abs pos emb & convnextv2 blocks for embedded text before concat
8 | 
9 | ### dit.py
10 | - adaln-zero dit
11 | - embedded timestep as condition
12 | - concatenated noised_input + masked_cond + embedded_text, linear proj in
13 | - possible abs pos emb & convnextv2 blocks for embedded text before concat
14 | - possible long skip connection (first layer to last layer)
15 | 
16 | ### mmdit.py
17 | - sd3 structure
18 | - timestep as condition
19 | - left stream: text is embedded and given an abs pos emb
20 | - right stream: masked_cond & noised_input concatenated, with the same conv pos emb as unett
21 | 
--------------------------------------------------------------------------------
/zh_normalization/README.md:
--------------------------------------------------------------------------------
1 | ## Supported NSW (Non-Standard-Word) Normalization
2 | 
3 | |NSW type|raw|normalized|
4 | |:--|:-|:-|
5 | |serial number|电影中梁朝伟扮演的陈永仁的编号27149|电影中梁朝伟扮演的陈永仁的编号二七一四九|
6 | |cardinal|这块黄金重达324.75克<br>我们班的最高总分为583分|这块黄金重达三百二十四点七五克<br>我们班的最高总分为五百八十三分|
7 | |numeric range|12\~23<br>-1.5\~2|十二到二十三<br>负一点五到二|
8 | |date|她出生于86年8月18日,她弟弟出生于1995年3月1日|她出生于八六年八月十八日, 她弟弟出生于一九九五年三月一日|
9 | |time|等会请在12:05请通知我|等会请在十二点零五分请通知我|
10 | |temperature|今天的最低气温达到-10°C|今天的最低气温达到零下十度|
11 | |fraction|现场有7/12的观众投出了赞成票|现场有十二分之七的观众投出了赞成票|
12 | |percentage|明天有62%的概率降雨|明天有百分之六十二的概率降雨|
13 | |money|随便来几个价格12块5,34.5元,20.1万|随便来几个价格十二块五,三十四点五元,二十点一万|
14 | |telephone|这是固话0421-33441122<br>这是手机+86 18544139121|这是固话零四二一三三四四一一二二<br>这是手机八六一八五四四一三九一二一|
15 | ## References
16 | [Pull request #658 of DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech/pull/658/files)
17 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2024 AIFSH
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # F5-TTS-ComfyUI
2 | A custom node for [F5-TTS](https://github.com/SWivid/F5-TTS). You can find the [workflow here](./doc/base_workflow.json).
3 | 
4 | ## Weights
5 | Weights are downloaded from Hugging Face automatically. Users in mainland China can instead download and unzip the archive manually, then place the `F5-TTS` folder under `ComfyUI/models/AIFSH` ([download link](https://pan.quark.cn/s/e3a3e4281ada)).
6 | 
7 | ## Tutorials
8 | - [Demo video](https://www.bilibili.com/video/BV1Tjm5YLEsX)
9 | - [One-click package containing four nodes (F5-TTS, FireRedTTS, JoyHallo, hallo2), continuously updated; one subscription includes 31 days of free updates](https://b23.tv/Zm3kPNP)
10 | ## Disclaimer
11 | We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
12 | ## Example
13 | 
14 | | gen_text | ref_audio | out_audio | audio_img |
15 | | -- | -- | -- | -- |
16 | |`你好,我是太乙真人!欢迎来四川找我玩`|