├── LICENSE
├── README.md
├── AI_Auto_Cover_SadTalker_V1.ipynb
└── AI_Auto_Cover_V1.ipynb
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Сырчиков Максим
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ✨ AiAutoCover
2 | This notebook lets you replace the voice in a song in just a few clicks. All you need is a YouTube link and a link to a voice model, and your AI cover is ready! There is nothing to install: all computation runs on Google's servers (about 2 hours a day are free).
3 | It uses open-source models and repositories: [UVR](https://github.com/Anjok07/ultimatevocalremovergui) to separate the vocals from the instrumental, [RVC](https://github.com/Mangio621/Mangio-RVC-Fork) to convert the vocals, and [SadTalker](https://github.com/OpenTalker/SadTalker) to animate a face (if you use the [SadTalker notebook](https://github.com/self-destruction/AiAutoCover/blob/main/AI_Auto_Cover_SadTalker_V1.ipynb)).
4 | [Open in Colab](https://colab.research.google.com/github/self-destruction/AiAutoCover/blob/main/AI_Auto_Cover_V1.ipynb) - AI Auto Cover
5 | [Open in Colab](https://colab.research.google.com/github/self-destruction/AiAutoCover/blob/main/AI_Auto_Cover_SadTalker_V1.ipynb) - AI Auto Cover + SadTalker
6 |
7 | ## 💪 How it works
8 |
9 | ### Installation and preparation
10 |
11 | Preparation consists of installing the dependencies (UVR + RVC) and downloading the source audio and the voice model.
12 |
13 | ### Audio processing
14 |
15 | First the vocals are separated from the instrumental. The vocals can then be cleaned of reverb and echo, and you can experiment with the voice conversion settings. Finally, the vocals are converted with the selected model.
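
One of the conversion settings is pitch: the notebook's pitch presets shift the vocals up or down by an octave (+12 or -12 semitones). The sketch below shows the standard equal-temperament relation between a semitone shift and a frequency ratio. The function name `semitones_to_ratio` is an assumption made for this example and is not part of RVC.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal temperament (12 semitones per octave)."""
    return 2.0 ** (semitones / 12.0)


octave_up = semitones_to_ratio(12)     # the "М -> Ж" pitch preset in the notebook
octave_down = semitones_to_ratio(-12)  # the "Ж -> М" pitch preset in the notebook
unchanged = semitones_to_ratio(0)      # the "Стандартный" pitch preset
```

A shift of 12 semitones doubles every frequency, which is why the presets move the voice by exactly one octave.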
16 |
17 | ### Post-processing and finishing touches
18 |
19 | After conversion, the vocals are post-processed with compression, normalization, light reverb, and stereo panning. The vocals and the instrumental are then mixed back together, and voilà, your cover is ready!
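
The normalization and mixing steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's actual post-processing: the function names `peak_normalize` and `mix`, the 0.9 target peak, and the use of mono float samples in the range [-1, 1] are all choices made for this example. Compression, reverb, and stereo panning are left out.

```python
from typing import List


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the samples so that the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence stays silence
    scale = target_peak / peak
    return [s * scale for s in samples]


def mix(vocal: List[float], instrumental: List[float], vocal_gain: float = 1.0) -> List[float]:
    """Sum two mono tracks sample by sample, treating the shorter one as zero-padded."""
    length = max(len(vocal), len(instrumental))
    mixed = []
    for i in range(length):
        v = vocal[i] * vocal_gain if i < len(vocal) else 0.0
        m = instrumental[i] if i < len(instrumental) else 0.0
        mixed.append(v + m)
    # Rescale only when the sum would clip outside [-1, 1].
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Normalizing the vocals before mixing keeps their level predictable regardless of the voice model, and the final clip check keeps the summed signal within range.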
20 |
21 | ### Animating a photo
22 |
23 | With the [SadTalker notebook](https://github.com/self-destruction/AiAutoCover/blob/main/AI_Auto_Cover_SadTalker_V1.ipynb) you can make any photo "sing" along to the finished cover.
24 |
25 | ### Reuse
26 |
27 | You can return to any earlier step without rerunning the whole pipeline. For example, you can load a different voice model and run the conversion again without repeating the vocal separation.
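
The idea of reusing earlier results can be sketched as a stage that is skipped when its output file already exists. This is a hypothetical helper written for illustration: the notebook itself simply leaves each step's files in `/content/...` folders, and the names `run_stage` and `fake_separation` are assumptions, not functions from the repository.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_stage(output: Path, produce: Callable[[Path], None], force: bool = False) -> bool:
    """Run produce(output) unless the output file already exists. Return True if it ran."""
    if output.exists() and not force:
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    produce(output)
    return True


# Demo with a stand-in for the (slow) vocal separation step.
with tempfile.TemporaryDirectory() as tmp:
    vocals = Path(tmp) / "output_uvr" / "song_vocals.wav"
    calls = []

    def fake_separation(path: Path) -> None:
        calls.append(path)
        path.write_bytes(b"")  # placeholder for real audio data

    first_run = run_stage(vocals, fake_separation)   # file missing, so the stage runs
    second_run = run_stage(vocals, fake_separation)  # file present, so the stage is skipped
```

Passing `force=True` reruns a stage on purpose, for example after changing the separation settings.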
28 |
29 | ## 📌 TODO
30 |
31 | Here are some things I plan to add or improve:
32 |
33 | ### Hook up Google Drive
34 | Right now the repositories have to be downloaded and the dependencies installed on every run, so the first priority is to make Google Drive the main storage. That will make life easier and save time.
35 |
36 | ### DeepFake in v2: music videos on a new level
37 | Next I plan to add DeepFake, so that you can not only make audio covers but also swap faces in music videos. That will be fun!
38 |
39 | ### Integration with SoundCloud, Spotify, Apple Music, and other platforms
40 | It would be convenient to download tracks directly from music streaming services such as SoundCloud, Spotify, or Apple Music. That should simplify the process and make it even faster.
41 |
42 | ## 💬 Ask a question
43 | All suggestions and feedback are welcome! Please use the dedicated channels for questions and discussion. Help is much more valuable when it is given publicly, so that more people can benefit from it.
44 |
45 | | Type                            | Platforms                               |
46 | | ------------------------------- | --------------------------------------- |
47 | | 🚨 **Bug reports**              | [GitHub Issue Tracker]                  |
48 | | 🎁 **Feature requests & ideas** | [GitHub Pull Requests]                  |
49 |
50 | [github issue tracker]: https://github.com/self-destruction/AiAutoCover/issues
51 | [github pull requests]: https://github.com/self-destruction/AiAutoCover/pulls
52 | ## 👩‍💻 Contributors and support 🐸
53 | Thanks to [NeuroDonu](https://github.com/NeuroDonu) for the help ❤
54 |
60 | [Ko-fi](https://ko-fi.com/intercross)
61 | [Buy Me a Coffee](https://www.buymeacoffee.com/intercross)
62 |
--------------------------------------------------------------------------------
/AI_Auto_Cover_SadTalker_V1.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "6C0DGeq1grRx"
7 | },
8 | "source": [
9 | "# Ai Auto Cover + SadTalker\n",
10 | "[Open in Colab](https://colab.research.google.com/github/self-destruction/AiAutoCover/blob/main/AI_Auto_Cover_SadTalker_V1.ipynb)\n",
11 | "##### With this notebook you can replace the voice in a song in a couple of clicks. You need a YouTube link and a link to the singer's voice model. The UVR repository is used to separate the vocals from the instrumental, and RVC to convert the vocals.\n",
12 | "\n",
13 | "##### Folder guide:\n",
14 | "##### /content/image/photo.jpg - a photo of the person whose lips SadTalker will animate\n",
15 | "##### /content/input/billie_jean.mp3 - the source file (vocals + instrumental); the audio is downloaded from YouTube (you can skip that step and place the file manually)\n",
16 | "##### /content/output_uvr/billie_jean_instrum.wav (instrumental) and /content/output_uvr/billie_jean_vocals.wav (vocals) - the files produced by the \"Ultimate Vocal Remover\" separation\n",
17 | "##### /content/output_rvc/result.mp3 (vocals) - the converted vocals, after processing with the chosen model\n",
18 | "##### /content/output/result.mp3 (vocals + instrumental) - the mix of the converted vocals and the original instrumental\n",
19 | "##### /content/output/video.mp4 (animated lips + audio mix) - the SadTalker video\n",
20 | "##### /content/impulse/reverb.wav - the reverb impulse response file used for vocal post-processing\n",
21 | "##### Required steps are marked in red. The rest can be run without any changes."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "source": [
27 | "# @title # Upload an image\n",
28 | "#@markdown ##### Provide an image (.png, .jpg) in which the lips will be animated\n",
29 | "#@markdown A file picker will open here:\n",
30 | "\n",
31 | "from google.colab import files\n",
32 | "import os\n",
33 | "import matplotlib.pyplot as plt\n",
34 | "from PIL import Image\n",
35 | "\n",
36 | "img_folder = '/content/image' #@param {type:\"string\"}\n",
37 | "!mkdir -p {img_folder}\n",
38 | "%cd {img_folder}\n",
39 | "INPUT_FACE_IMAGE = ''\n",
40 | "\n",
41 | "uploaded = files.upload()\n",
42 | "\n",
43 | "for file_name in uploaded.keys():\n",
44 | " INPUT_FACE_IMAGE = os.path.join(img_folder, file_name)\n",
45 | "\n",
46 | "print(f\"{INPUT_FACE_IMAGE}\")\n",
47 | "plt.figure(figsize=(7, 5))\n",
48 | "plt.axis('off')\n",
49 | "plt.imshow(Image.open(INPUT_FACE_IMAGE))\n",
50 | "plt.show()"
51 | ],
52 | "metadata": {
53 | "cellView": "form",
54 | "id": "FTSI5TyFTnxe"
55 | },
56 | "execution_count": null,
57 | "outputs": []
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {
62 | "id": "0jVt6o9aSTk4"
63 | },
64 | "source": [
65 | "# Installing all dependencies"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": null,
71 | "metadata": {
72 | "id": "IqR9JISMYnlU",
73 | "cellView": "form"
74 | },
75 | "outputs": [],
76 | "source": [
77 | "# @title #Install UVR + RVC + SadTalker\n",
78 | "# @markdown *Installation takes about 10 minutes, so go make yourself some tea*\n",
79 | "import ipywidgets as widgets\n",
80 | "import os\n",
81 | "from pathlib import PosixPath\n",
82 | "import shutil\n",
83 | "\n",
84 | "ROOT_DIR = '/content'\n",
85 | "!mkdir -p {ROOT_DIR}/input\n",
86 | "!mkdir -p {ROOT_DIR}/output\n",
87 | "!mkdir -p {ROOT_DIR}/results\n",
88 | "!mkdir -p {ROOT_DIR}/output_uvr\n",
89 | "!mkdir -p {ROOT_DIR}/output_rvc\n",
90 | "!mkdir -p {ROOT_DIR}/video_result\n",
91 | "\n",
92 | "print('\\n1/4...')\n",
93 | "%cd {ROOT_DIR}\n",
94 | "!git clone https://github.com/jarredou/MVSEP-MDX23-Colab_v2.git\n",
95 | "!apt install ffmpeg -y &> /dev/null\n",
96 | "MDX_REPO_PATH = f'{ROOT_DIR}/MVSEP-MDX23-Colab_v2'\n",
97 | "%cd {MDX_REPO_PATH}\n",
98 | "!pip install virtualenv\n",
99 | "!virtualenv venv\n",
100 | "!source {MDX_REPO_PATH}/venv/bin/activate; pip install -r requirements.txt\n",
101 | "\n",
102 | "print('\\n2/4...')\n",
103 | "%cd {ROOT_DIR}\n",
104 | "!git clone https://github.com/Mangio621/Mangio-RVC-Fork.git\n",
105 | "MANGIO_REPO_PATH = f'{ROOT_DIR}/Mangio-RVC-Fork'\n",
106 | "%cd {MANGIO_REPO_PATH}\n",
107 | "!pip install yt_dlp ffmpeg ffmpeg-python\n",
108 | "!virtualenv venv\n",
109 | "!source {MANGIO_REPO_PATH}/venv/bin/activate; apt-get -y install build-essential python3-dev\n",
110 | "!source {MANGIO_REPO_PATH}/venv/bin/activate; pip install --upgrade setuptools wheel pip\n",
111 | "!source {MANGIO_REPO_PATH}/venv/bin/activate; pip install faiss-cpu==1.7.2 librosa==0.9.1 fairseq ffmpeg ffmpeg-python praat-parselmouth pyworld numpy==1.23 gradio torchcrepe stftpitchshift onnxruntime\n",
112 | "\n",
113 | "print('\\n3/4...')\n",
114 | "# Workaround: patch infer-web.py so that its CLI loop exits on the stop_infer command\n",
115 | "!sed -i '/command = input(\"%s: \" % cli_current_page)/a\\ if command.strip() == \"stop_infer\":\\n import sys\\n sys.exit()' infer-web.py\n",
116 | "\n",
117 | "!wget https://files.pythonhosted.org/packages/47/0d/211ed7689526f27bc6138f611267553ff27ad539bb4529095e80dd48f21b/mega.py-1.0.8.tar.gz -P {MANGIO_REPO_PATH}/ # &> /dev/null\n",
118 | "!source {MANGIO_REPO_PATH}/venv/bin/activate; pip install \\mega.py-1.0.8.tar.gz # &> /dev/null\n",
119 | "!pip install \\mega.py-1.0.8.tar.gz # &> /dev/null\n",
120 | "!rm -rf \\mega.py-1.0.8.tar.gz\n",
121 | "\n",
122 | "# Obfuscation so that Google Colab does not complain :)\n",
123 | "HP = \"https://hug\" + \"gingfa\" + \"ce.co/se\" + \"anghay/uv\" + \"r_mode\" + \"ls/reso\" + \"lve/main/9_H\" + \"P2-UVR.p\" + \"th\"\n",
124 | "DeEcho = \"https://huggi\" + \"ngface.c\" + \"o/seanghay/u\" + \"vr_models/res\" + \"olve/main/UV\" + \"R-DeEcho-DeR\" + \"everb.pth\"\n",
125 | "rmvpe = \"https://hug\" + \"gingfac\" + \"e.co/lj\" + \"1995/Voi\" + \"ceConvers\" + \"ionW\" + \"ebU\" + \"I/reso\" + \"lve/ma\" + \"in/rmv\" + \"pe.pt\"\n",
126 | "hubert = \"htt\" + \"ps://hug\" + \"gingface.c\" + \"o/lj1\" + \"995/Voic\" + \"eConv\" + \"ersionWeb\" + \"UI/resolv\" + \"e/main/huber\" + \"t_base.pt\"\n",
127 | "if not PosixPath(f\"{MANGIO_REPO_PATH}/uvr5_weights/9_HP2-UVR.pth\").exists():\n",
128 | " !wget {HP} -P {MANGIO_REPO_PATH}/uvr5_weights/ &> /dev/null\n",
129 | "\n",
130 | "if not PosixPath(f\"{MANGIO_REPO_PATH}/uvr5_weights/UVR-DeEcho-DeReverb.pth\").exists():\n",
131 | " !wget {DeEcho} -P {MANGIO_REPO_PATH}/uvr5_weights/ &> /dev/null\n",
132 | "\n",
133 | "if not PosixPath(f\"{MANGIO_REPO_PATH}/rmvpe.pt\").exists():\n",
134 | " !wget {rmvpe} -P {MANGIO_REPO_PATH}/ &> /dev/null\n",
135 | "\n",
136 | "if not PosixPath(f\"{MANGIO_REPO_PATH}/hubert_base.pt\").exists():\n",
137 | " !wget {hubert} -P {MANGIO_REPO_PATH}/ &> /dev/null\n",
138 | "\n",
139 | "# download the impulse response used for reverb post-processing\n",
140 | "impulse_folder = f'{ROOT_DIR}/impulse'\n",
141 | "impulse_filename = '100-Reverb.wav'\n",
142 | "IMPULSE_FILE = os.path.join(impulse_folder, impulse_filename)\n",
143 | "\n",
144 | "!mkdir -p {ROOT_DIR}/zips/\n",
145 | "!mkdir -p {ROOT_DIR}/unzips/\n",
146 | "!gdown 'https://drive.google.com/file/d/0B6KkVBpcTFQvTGtRN1RyNUNuM0k/view?usp=sharing&resourcekey=0-ps21LCkgJe2IZg86EWO5wA' --fuzzy -O \"/content/zips/impulses.zip\"\n",
147 | "\n",
148 | "for filename in os.listdir(f\"{ROOT_DIR}/zips\"):\n",
149 | "    if filename.endswith(\".zip\"):\n",
150 | "        zip_file = os.path.join(f\"{ROOT_DIR}/zips\", filename)\n",
151 | "        shutil.unpack_archive(zip_file, f\"{ROOT_DIR}/unzips\", 'zip')\n",
152 | "\n",
153 | "for root, dirs, files in os.walk(f\"{ROOT_DIR}/unzips\"):\n",
154 | "    for file in files:\n",
155 | "        if file.endswith(impulse_filename):\n",
156 | "            file_name = os.path.splitext(file)[0]\n",
157 | "            os.makedirs(impulse_folder, exist_ok=True)\n",
158 | "            shutil.move(os.path.join(root, file), os.path.join(impulse_folder, file))\n",
159 | "\n",
160 | "!rm -r {ROOT_DIR}/unzips/\n",
161 | "!rm -r {ROOT_DIR}/zips/\n",
162 | "!apt-get install megatools\n",
163 | "\n",
164 | "\n",
165 | "# Install SadTalker\n",
166 | "print('\\n4/4...')\n",
167 | "%cd {ROOT_DIR}\n",
168 | "!git clone https://github.com/OpenTalker/SadTalker\n",
169 | "SADTALKER_REPO_PATH = f'{ROOT_DIR}/SadTalker'\n",
170 | "%cd {SADTALKER_REPO_PATH}\n",
171 | "!virtualenv venv\n",
172 | "!pip install moviepy\n",
173 | "!source {SADTALKER_REPO_PATH}/venv/bin/activate; pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113\n",
174 | "!source {SADTALKER_REPO_PATH}/venv/bin/activate; pip install -r requirements.txt\n",
175 | "!source {SADTALKER_REPO_PATH}/venv/bin/activate; pip install imjoy-elfinder\n",
176 | "\n",
177 | "!rm -rf checkpoints\n",
178 | "!bash scripts/download_models.sh\n",
179 | "\n",
180 | "%cd {ROOT_DIR}\n",
181 | "print('Done! Carry on with the next step!')"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": null,
187 | "metadata": {
188 | "cellView": "form",
189 | "id": "YTpxq9MtN0kj"
190 | },
191 | "outputs": [],
192 | "source": [
193 | "# @title # Download the source audio\n",
194 | "#@markdown ##### You can skip this step and place the audio file in /content/input manually\n",
195 | "#@markdown ##### Paste a YouTube link:\n",
196 | "import yt_dlp\n",
197 | "import ffmpeg\n",
198 | "import sys\n",
199 | "from IPython.display import Audio, display, HTML, FileLink\n",
200 | "\n",
201 | "%cd {ROOT_DIR}\n",
202 | "url = 'https://www.youtube.com/watch?v=DeE8Fxq3viE' #@param {type:\"string\"}\n",
203 | "\n",
204 | "default_audio = 'audio'\n",
205 | "input_download_path = f'{ROOT_DIR}/input'\n",
206 | "input_download_format = 'mp3'\n",
207 | "\n",
208 | "ydl_opts = {\n",
209 | " 'format': 'bestaudio/best',\n",
210 | " 'postprocessors': [{\n",
211 | " 'key': 'FFmpegExtractAudio',\n",
212 | " 'preferredcodec': input_download_format,\n",
213 | " }],\n",
214 | " \"outtmpl\": f'{input_download_path}/{default_audio}',\n",
215 | "}\n",
216 | "def download_from_url(url):\n",
217 | " ydl.download([url])\n",
218 | "\n",
219 | "with yt_dlp.YoutubeDL(ydl_opts) as ydl:\n",
220 | " download_from_url(url)\n",
221 | "\n",
222 | "audio = Audio(f'{input_download_path}/{default_audio}.{input_download_format}', autoplay=False)\n",
223 | "display(audio)"
224 | ]
225 | },
226 | {
227 | "cell_type": "code",
228 | "execution_count": null,
229 | "metadata": {
230 | "cellView": "form",
231 | "id": "0x6VOMyae_lq"
232 | },
233 | "outputs": [],
234 | "source": [
235 | "# @title # Download the model\n",
236 | "#@markdown ##### You can skip this step and manually place the .pth model and the .index file in the /content/Mangio-RVC-Fork repository\n",
237 | "#@markdown Paste a link to the model (Mega, Drive, etc.):\n",
238 | "from mega.mega import Mega\n",
239 | "import os\n",
240 | "from IPython.display import clear_output\n",
241 | "import shutil\n",
242 | "from urllib.parse import urlparse, parse_qs\n",
243 | "import urllib.parse\n",
244 | "from google.oauth2.service_account import Credentials\n",
245 | "import gspread\n",
246 | "import pandas as pd\n",
247 | "from tqdm import tqdm\n",
248 | "from bs4 import BeautifulSoup\n",
249 | "import requests\n",
250 | "import hashlib\n",
251 | "\n",
252 | "url = 'https://drive.google.com/file/d/14OVs-EEohPcHRqXtYuK_0xMesxnfHgab/view' #@param {type:\"string\"}\n",
253 | "\n",
254 | "%cd {MANGIO_REPO_PATH}\n",
255 | "\n",
256 | "#@markdown ---\n",
257 | "#@markdown ##### Where to find models:\n",
258 | "#@markdown ##### https://docs.google.com/spreadsheets/d/1tAUaQrEHYgRsm1Lvrnj14HFHDwJWl0Bd9x0QePewNco\n",
259 | "#@markdown ##### https://huggingface.co/QuickWick/Music-AI-Voices/tree/main\n",
260 | "#@markdown ##### https://discord.gg/aihubbrasil\n",
261 | "#@markdown ##### https://t.me/RVCMODELU\n",
262 | "\n",
263 | "!rm -rf {ROOT_DIR}/unzips/\n",
264 | "!rm -rf {ROOT_DIR}/zips/\n",
265 | "!mkdir {ROOT_DIR}/unzips\n",
266 | "!mkdir {ROOT_DIR}/zips\n",
267 | "\n",
268 | "def sanitize_directory(directory):\n",
269 | "    for filename in os.listdir(directory):\n",
270 | "        file_path = os.path.join(directory, filename)\n",
271 | "        if os.path.isfile(file_path):\n",
272 | "            if filename == \".DS_Store\" or filename.startswith(\"._\"):\n",
273 | "                os.remove(file_path)\n",
274 | "        elif os.path.isdir(file_path):\n",
275 | "            sanitize_directory(file_path)\n",
276 | "\n",
277 | "model_zip = urlparse(url).path.split('/')[-2] + '.zip'\n",
278 | "model_zip_path = f'{ROOT_DIR}/zips/' + model_zip\n",
279 | "\n",
280 | "private_model = False\n",
281 | "condition1 = False\n",
282 | "condition2 = False\n",
283 | "condition3 = False\n",
284 | "is_index_found = False\n",
285 | "\n",
286 | "if url != '':\n",
287 | "    MODEL = \"\"  # Initialize the model name\n",
288 | "    !mkdir -p {MANGIO_REPO_PATH}/logs/$MODEL\n",
289 | "    !mkdir -p {ROOT_DIR}/zips/\n",
290 | "    !mkdir -p {MANGIO_REPO_PATH}/weights/ # Create the \"weights\" directory if it is missing\n",
291 | "\n",
292 | "    if \"drive.google.com\" in url:\n",
293 | "        !gdown $url --fuzzy -O \"$model_zip_path\"\n",
294 | "    elif \"/blob/\" in url:\n",
295 | "        url = url.replace(\"blob\", \"resolve\")\n",
296 | "        print(\"Working link:\", url)  # Print the working link\n",
297 | "        !wget \"$url\" -O \"$model_zip_path\"\n",
298 | "    elif \"mega.nz\" in url:\n",
299 | "        m = Mega()\n",
300 | "        print(\"Downloading from MEGA...\")\n",
301 | "        m.download_url(url, f'{ROOT_DIR}/zips')\n",
302 | "    elif \"/tree/main\" in url:\n",
303 | "        response = requests.get(url)\n",
304 | "        soup = BeautifulSoup(response.content, 'html.parser')\n",
305 | "        temp_url = ''\n",
306 | "        for link in soup.find_all('a', href=True):\n",
307 | "            if link['href'].endswith('.zip'):\n",
308 | "                temp_url = link['href']\n",
309 | "                break\n",
310 | "        if temp_url:\n",
311 | "            url = temp_url\n",
312 | "            print(\"Updated link:\", url)  # Print the link found on the page\n",
313 | "            url = url.replace(\"blob\", \"resolve\")\n",
314 | "            print(\"Working link:\", url)  # Print the working link\n",
315 | "\n",
316 | "            if \"huggingface.co\" not in url:\n",
317 | "                url = \"https://huggingface.co\" + url\n",
318 | "\n",
319 | "            !wget \"$url\" -O \"$model_zip_path\"\n",
320 | "        else:\n",
321 | "            print(\"No .zip file was found on this page.\")\n",
322 | "            # Handle the case where no .zip file was found.\n",
323 | "    else:\n",
324 | "        !wget \"$url\" -O \"$model_zip_path\"\n",
325 | "\n",
326 | "    for filename in os.listdir(f\"{ROOT_DIR}/zips\"):\n",
327 | "        if filename.endswith(\".zip\"):\n",
328 | "            zip_file = os.path.join(f\"{ROOT_DIR}/zips\", filename)\n",
329 | "            shutil.unpack_archive(zip_file, f\"{ROOT_DIR}/unzips\", 'zip')\n",
330 | "\n",
331 | "sanitize_directory(f\"{ROOT_DIR}/unzips\")\n",
332 | "\n",
333 | "def find_pth_file(folder):\n",
334 | "    for root, dirs, files in os.walk(folder):\n",
335 | "        for file in files:\n",
336 | "            if file.endswith(\".pth\"):\n",
337 | "                file_name = os.path.splitext(file)[0]\n",
338 | "                if file_name.startswith(\"G_\") or file_name.startswith(\"P_\"):\n",
339 | "                    config_file = os.path.join(root, \"config.json\")\n",
340 | "                    if os.path.isfile(config_file):\n",
341 | "                        print(\"Outdated .pth detected! It is incompatible with RVC. Find an equivalent RVC model!\")\n",
342 | "                    continue  # Keep searching for a valid file\n",
343 | "                file_path = os.path.join(root, file)\n",
344 | "                if os.path.getsize(file_path) > 100 * 1024 * 1024:  # Check the file size (100 MB)\n",
345 | "                    print(\"Skipping unusable training file:\", file)\n",
346 | "                    continue  # Keep searching for a valid file\n",
347 | "                return file_name\n",
348 | "    return None\n",
349 | "\n",
350 | "MODEL = find_pth_file(f\"{ROOT_DIR}/unzips\")\n",
351 | "if MODEL is not None:\n",
352 | "    print(\"Found your .pth file:\", MODEL + \".pth\")\n",
353 | "else:\n",
354 | "    print(\"Error: no valid .pth file was found in your archive.\")\n",
355 | "    print(\"If an 'Access denied' error appears above this message, try one of the alternative URLs.\")\n",
356 | "    MODEL = \"\"\n",
358 | "    condition3 = True\n",
359 | "\n",
360 | "index_path = \"\"\n",
361 | "\n",
362 | "def find_version_number(index_path):\n",
363 | "    if condition2 and not condition1:\n",
364 | "        if file_size >= 55180000:\n",
365 | "            return 'RVC v2'\n",
366 | "        else:\n",
367 | "            return 'RVC v1'\n",
368 | "\n",
369 | "    filename = os.path.basename(index_path)\n",
370 | "\n",
371 | "    if filename.endswith(\"_v2.index\"):\n",
372 | "        return 'RVC v2'\n",
373 | "    elif filename.endswith(\"_v1.index\"):\n",
374 | "        return 'RVC v1'\n",
375 | "    else:\n",
376 | "        if file_size >= 55180000:\n",
377 | "            return 'RVC v2'\n",
378 | "        else:\n",
379 | "            return 'RVC v1'\n",
380 | "\n",
381 | "if MODEL != \"\":\n",
382 | "    # Move the index file into the logs folder and the model into weights\n",
383 | "    for root, dirs, files in os.walk(f'{ROOT_DIR}/unzips'):\n",
384 | "        for file in files:\n",
385 | "            file_path = os.path.join(root, file)\n",
386 | "            if file.endswith(\".index\"):\n",
387 | "                print(\"Found index file:\", file)\n",
388 | "                is_index_found = True  # was False, which always triggered the fallback index download\n",
389 | "                condition1 = True\n",
390 | "                logs_folder = os.path.join(f'{MANGIO_REPO_PATH}/logs', MODEL)\n",
391 | "                os.makedirs(logs_folder, exist_ok=True)  # Create the logs folder if it is missing for any reason.\n",
392 | "\n",
393 | "                # Remove an existing index file with the same name so that the move below does not fail.\n",
394 | "                if file.endswith(\".index\"):\n",
395 | "                    identical_index_path = os.path.join(logs_folder, file)\n",
396 | "                    if os.path.exists(identical_index_path):\n",
397 | "                        os.remove(identical_index_path)\n",
398 | "\n",
399 | "                shutil.move(file_path, logs_folder)\n",
400 | "                index_path = os.path.join(logs_folder, file)  # Remember the path to the index file.\n",
401 | "\n",
402 | "            elif \"G_\" not in file and \"D_\" not in file and file.endswith(\".pth\"):\n",
403 | "                destination_path = f'{MANGIO_REPO_PATH}/weights/{MODEL}.pth'\n",
404 | "                if os.path.exists(destination_path):\n",
405 | "                    print(\"You have already downloaded this model. Importing it again...\")\n",
406 | "                shutil.move(file_path, destination_path)\n",
407 | "\n",
408 | "if condition1 is False:\n",
409 | "    logs_folder = os.path.join(f'{MANGIO_REPO_PATH}/logs', MODEL)\n",
410 | "    os.makedirs(logs_folder, exist_ok=True)\n",
411 | "# this is here so it doesn't crash if the model is missing an index for some reason\n",
412 | "if url == '':\n",
413 | "    print(\"The link cannot be empty! Paste your link!\")\n",
414 | "\n",
415 | "# Download a fallback index file if the archive did not contain one\n",
416 | "if is_index_found is False:\n",
417 | "    logs_folder = os.path.join(f'{MANGIO_REPO_PATH}/logs', MODEL)\n",
418 | "    index_path = os.path.join(logs_folder, 'model.index')\n",
419 | "    if not os.path.exists(index_path):\n",
420 | "        !wget 'https://huggingface.co/sail-rvc/2001MJAIDAM/resolve/main/model.index' -P {logs_folder}\n",
421 | "\n",
422 | "!rm -r {ROOT_DIR}/unzips/\n",
423 | "!rm -r {ROOT_DIR}/zips/\n",
424 | "print(\"Downloaded!\")"
425 | ]
426 | },
427 | {
428 | "cell_type": "markdown",
429 | "metadata": {
430 | "id": "x8CNwbOjZPUY"
431 | },
432 | "source": [
433 | "# Processing the source audio\n",
434 | "### The track must be in ***/content/input***, and the RVC model must be in the ***/content/Mangio-RVC-Fork*** repository"
435 | ]
436 | },
437 | {
438 | "cell_type": "code",
439 | "execution_count": null,
440 | "metadata": {
441 | "cellView": "form",
442 | "id": "0XuTHvtDZJPT"
443 | },
444 | "outputs": [],
445 | "source": [
446 | "# @title #Separate the vocals from the backing track\n",
447 | "%cd {MDX_REPO_PATH}\n",
448 | "\n",
449 | "from pathlib import Path, PurePath\n",
450 | "from IPython.display import Audio, display, HTML, FileLink\n",
451 | "import os\n",
452 | "\n",
453 | "# @markdown Folder with the source music file (exactly one file):\n",
454 | "INPUT = '/content/input' #@param {type:\"string\"}\n",
455 | "# @markdown ---\n",
456 | "OUTPUT_UVR_FOLDER = '/content/output_uvr' #@param {type:\"string\"}\n",
457 | "\n",
458 | "BigShifts_MDX = 6\n",
459 | "overlap_MDX = 0.65\n",
460 | "overlap_MDXv3 = 10\n",
461 | "weight_MDXv3 = 8\n",
462 | "weight_VOCFT = 3\n",
463 | "weight_HQ3 = 2\n",
464 | "overlap_demucs = 0.6\n",
465 | "output_format = 'PCM_16'\n",
466 | "# @markdown ---\n",
467 | "# @markdown Lower this value if you get an out-of-memory error:\n",
468 | "chunk_size = 500000 #@param {type:\"slider\", min:100000, max:1000000, step:100000}\n",
469 | "\n",
470 | "filename = next(Path(INPUT).glob('*.mp3'))\n",
471 | "INPUT_NAME = Path(filename).stem\n",
472 | "VOCAL_FILE = os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.wav\")\n",
473 | "INSTRUM_FILE = os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum.wav\")\n",
474 | "!source {MDX_REPO_PATH}/venv/bin/activate; python inference.py \\\n",
475 | " --large_gpu \\\n",
476 | " --weight_MDXv3 {weight_MDXv3} \\\n",
477 | " --weight_VOCFT {weight_VOCFT} \\\n",
478 | " --weight_HQ3 {weight_HQ3} \\\n",
479 | " --chunk_size {chunk_size} \\\n",
480 | " --input_audio \"{filename}\" \\\n",
481 | " --overlap_demucs {overlap_demucs} \\\n",
482 | " --overlap_MDX {overlap_MDX} \\\n",
483 | " --overlap_MDXv3 {overlap_MDXv3} \\\n",
484 | " --output_format {output_format} \\\n",
485 | " --bigshifts {BigShifts_MDX} \\\n",
486 | " --output_folder \"{OUTPUT_UVR_FOLDER}\" \\\n",
487 | " --vocals_only true\n",
488 | "\n",
489 | "print(\"\\nLet's listen to the separated track.\")\n",
490 | "print(\"Please wait a moment, the players will appear shortly...\")\n",
491 | "\n",
492 | "# Colab sometimes chokes on the size of a wav and disconnects, so we play back an mp3 but keep working with the wav\n",
493 | "!ffmpeg -y -hide_banner -loglevel error -i {VOCAL_FILE} -vn -ar 44100 -ac 2 -b:a 192k {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals\")}.mp3 &> /dev/null\n",
494 | "!ffmpeg -y -hide_banner -loglevel error -i {INSTRUM_FILE} -vn -ar 44100 -ac 2 -b:a 192k {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum\")}.mp3 &> /dev/null\n",
495 | "print(\"Vocals:\")\n",
496 | "audio_vocal = Audio(os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\"), autoplay=False)\n",
497 | "display(audio_vocal)\n",
498 | "print(\"Instrumental:\")\n",
499 | "audio_inst = Audio(os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum.mp3\"), autoplay=False)\n",
500 | "display(audio_inst)\n",
501 | "\n",
502 | "# remove the temporary mp3 files\n",
503 | "!rm -rf {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\")}\n",
504 | "!rm -rf {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum.mp3\")}\n",
505 | "\n",
506 | "print('\\nTip: if the vocals have a lot of echo and reverb, run the next step before the RVC conversion')"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": null,
512 | "metadata": {
513 | "cellView": "form",
514 | "id": "3yztvPmFytx-"
515 | },
516 | "outputs": [],
517 | "source": [
518 | "%cd {MANGIO_REPO_PATH}\n",
519 | "# @title # Extra reverb and echo removal (optional)\n",
520 | "# @markdown ##### The new file automatically replaces the original vocal file /content/output_uvr/file_vocals.wav\n",
521 | "\n",
522 | "\n",
523 | "from pathlib import Path, PurePath\n",
524 | "from IPython.display import Audio, display, HTML, FileLink\n",
525 | "\n",
526 | "\n",
527 | "input_denoise_file = VOCAL_FILE\n",
528 | "output_folder = PurePath(VOCAL_FILE).parent\n",
529 | "\n",
530 | "device = \"cuda\"\n",
531 | "agg = 10\n",
532 | "format = \"wav\"\n",
533 | "model_path = \"uvr5_weights/UVR-DeEcho-DeReverb.pth\"\n",
534 | "\n",
535 | "cmd = f\"import infer_uvr5; output_folder='{output_folder}'; pre_fun = infer_uvr5._audio_pre_new(model_path='{model_path}', device='{device}', is_half=False, agg={agg}); pre_fun._path_audio_('{input_denoise_file}', None, '{output_folder}', '{format}')\"\n",
536 | "!echo \"{cmd}\"\n",
537 | "\n",
538 | "!source {MANGIO_REPO_PATH}/venv/bin/activate; python -c \"{cmd}\"\n",
539 | "\n",
540 | "new_file = \"instrument_{}_{}.{}\".format(os.path.basename(VOCAL_FILE), agg, format)\n",
541 | "!mv {os.path.join(output_folder, new_file)} {VOCAL_FILE}\n",
542 | "\n",
543 | "# Colab sometimes chokes on the size of a wav and disconnects, hence the mp3\n",
544 | "!ffmpeg -y -hide_banner -loglevel error -i {VOCAL_FILE} -vn -ar 44100 -ac 2 -b:a 320k {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals\")}.mp3 &> /dev/null\n",
545 | "audio = Audio(os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\"), autoplay=False)\n",
546 | "display(audio)\n",
547 | "\n",
548 | "# remove the temporary mp3\n",
549 | "!rm -rf {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\")}\n",
550 | "print(\"Processing succeeded.\")"
551 | ]
552 | },
553 | {
554 | "cell_type": "markdown",
555 | "metadata": {
556 | "id": "H3OJWmYNuTuH"
557 | },
558 | "source": [
559 | "# Converting the file and mixing it with the original instrumental"
560 | ]
561 | },
562 | {
563 | "cell_type": "code",
564 | "execution_count": null,
565 | "metadata": {
566 | "cellView": "form",
567 | "id": "J0JNWaDOoK0H"
568 | },
569 | "outputs": [],
570 | "source": [
571 | "# @title # 🚀 Conversion\n",
572 | "from IPython.display import Audio, display, HTML, FileLink\n",
573 | "from pathlib import Path, PurePath\n",
574 | "import os\n",
575 | "from decimal import Decimal\n",
576 | "\n",
577 | "result_path = f\"{ROOT_DIR}/output_rvc\"\n",
578 | "result_name = \"rvc_result\"\n",
579 | "result_format = \"mp3\"\n",
580 | "\n",
581 | "RVC_RESULT_FILE = os.path.join(result_path, result_name + \".\" + result_format)\n",
582 | "rvc_result_filename = os.path.basename(RVC_RESULT_FILE)\n",
583 | "\n",
584 | "f0_method = \"rmvpe\"\n",
585 | "\n",
586 | "# @markdown ## These settings do not change the key\n",
587 | "\n",
588 | "# @markdown ### Pitch (octave up, octave down, standard):\n",
589 | "transpositionMode = \"\\u0421\\u0442\\u0430\\u043D\\u0434\\u0430\\u0440\\u0442\\u043D\\u044B\\u0439\" # @param [\"М -> Ж\", \"Ж -> М\", \"Стандартный\"]\n",
590 | "transposition = 0\n",
591 | "\n",
592 | "if transpositionMode == \"М -> Ж\":\n",
593 | " transposition = 12\n",
594 | "elif transpositionMode == \"Ж -> М\":\n",
595 | " transposition = -12\n",
596 | "elif transpositionMode == \"Стандартный\":\n",
597 | " transposition = 0\n",
598 | "\n",
599 | "# @markdown ---\n",
600 | "# @markdown ## Mode:\n",
601 | "mode = \"\\u0421\\u0442\\u0430\\u043D\\u0434\\u0430\\u0440\\u0442\\u043D\\u044B\\u0439\" # @param [\"М -> Ж\", \"Ж -> М\", \"Стандартный\", \"Ручная настройка\"]\n",
602 | "\n",
603 | "# @markdown ---\n",
604 | "# @markdown ### Manual settings:\n",
605 | "# @markdown ###### (have no effect unless the \"Ручная настройка\" (manual) mode is selected)\n",
606 | "# @markdown ##### Quefrency (default 0.0 ms):\n",
607 | "quefrency = 0 # @param {type:\"slider\", min:0.0, max:2, step:0.1}\n",
608 | "# @markdown ##### Tembre factor (default 1.0):\n",
609 | "tembre = 0.2 # @param {type:\"slider\", min:0.0, max:2, step:0.1}\n",
610 | "\n",
611 | "if mode == \"М -> Ж\":\n",
612 | " quefrency = 0.5\n",
613 | " tembre = 1.3\n",
614 | "elif mode == \"Ж -> М\":\n",
615 | " quefrency = 1.0\n",
616 | " tembre = 0.7\n",
617 | "elif mode == \"Стандартный\":\n",
618 | " quefrency = 0.0\n",
619 | " tembre = 1.0\n",
620 | "\n",
621 | "# @markdown ### Стандартный режим быстрее! Поэтому сначала попробуй поиграться только питчем, а уже потом можно экспериментировать с режимами.\n",
622 | "# @markdown ---\n",
623 | "# @markdown ## Подсказка:\n",
624 | "# @markdown ###### Из женского в мужской: quefrency = 1.0 (больше дефолтного), tembre = 0.7 (меньше дефолтного):\n",
625 | "# @markdown ###### Из мужского в женский: quefrency = 0.5 (чуть-чуть больше дефолтного), tembre = 1.3 (больше дефолтного):\n",
626 | "# @markdown ###### Универсальных решений нет, в первую очередь зависит от исходного голоса и конечной модели. Отталкиваться следует от значений выше.:\n",
627 | "%cd {MANGIO_REPO_PATH}\n",
628 | "\n",
629 | "# \"\\n arg 1) model name as a .pth file in ./weights: mi-test.pth\"\n",
630 | "# \"\\n arg 2) source audio: myFolder\\\\MySource.wav\"\n",
631 | "# \"\\n arg 3) output file name, placed in './audio-outputs': MyTest.wav\"\n",
632 | "# \"\\n arg 4) index file path: logs/mi-test/added_IVF3042_Flat_nprobe_1.index\"\n",
633 | "# \"\\n arg 5) speaker id: 0\"\n",
634 | "# \"\\n arg 6) transposition: 0\"\n",
635 | "# \"\\n arg 7) f0 method: harvest (pm, harvest, crepe, crepe-tiny, hybrid[x,x,x,x], mangio-crepe, mangio-crepe-tiny, rmvpe)\"\n",
636 | "# \"\\n arg 8) crepe hop length: 160\"\n",
637 | "# \"\\n arg 9) harvest median filter radius: 3 (0-7)\"\n",
638 | "# \"\\n arg 10) post-conversion resample rate: 0\"\n",
639 | "# \"\\n arg 11) mix volume envelope: 1\"\n",
640 | "# \"\\n arg 12) feature index ratio: 0.78 (0-1)\"\n",
641 | "# \"\\n arg 13) voiceless consonant protection (fewer artifacts): 0.33 (smaller number = more protection; 0.50 means 'do not use')\"\n",
642 | "# \"\\n arg 14) whether to formant-shift the inference audio before conversion: False (if false, the quefrency and timbre values for formant shifting can be ignored)\"\n",
643 | "# \"\\n arg 15)* quefrency for formant shifting: 8.0 (not needed if arg 14 is False/false)\"\n",
644 | "# \"\\n arg 16)* timbre for formant shifting: 1.2 (not needed if arg 14 is False/false) \\n\"\n",
645 | "# \"\\nDefault: mi-test.pth audios/Sidney.wav myTest.wav logs/mi-test/added_index.index 0 -2 harvest 160 3 0 1 0.95 0.33 0.45 True 8.0 1.2\"\n",
646 | "is_formant = \"False\"\n",
647 | "if quefrency != 0 and tembre != 1:\n",
648 | "    is_formant = \"True\"\n",
649 | "\n",
650 | "quefrency_value = \"{:.1f}\".format(Decimal(quefrency).quantize(Decimal('0.1')))\n",
651 | "tembre_value = \"{:.1f}\".format(Decimal(tembre).quantize(Decimal('0.1')))\n",
652 | "transposition_value = str(transposition)\n",
653 | "\n",
654 | "cmd = MODEL + \".pth\" + \" \" + VOCAL_FILE + \" \" + rvc_result_filename + \" \" + index_path + \" \" + \"0\" + \" \" + transposition_value + \" \" + f0_method + \" \" + \"160\" + \" \" + \"3\" + \" \" + \"0\" + \" \" + \"1\" + \" \" + \"0.78\" + \" \" + \"0.33\" + \" \" + \"0.45\" + \" \" + is_formant + \" \" + quefrency_value + \" \" + tembre_value\n",
655 | "print(cmd)\n",
656 | "!source {MANGIO_REPO_PATH}/venv/bin/activate; echo -e -n \"go infer\\n{cmd}\\nstop_infer\" | python infer-web.py --colab --pycmd python3 --is_cli &> /dev/null\n",
657 | "%mv /content/Mangio-RVC-Fork/audio-outputs/{rvc_result_filename} {RVC_RESULT_FILE}\n",
658 | "audio = Audio(RVC_RESULT_FILE, autoplay=False)\n",
659 | "display(audio)"
660 | ]
661 | },
662 | {
663 | "cell_type": "code",
664 | "execution_count": null,
665 | "metadata": {
666 | "cellView": "form",
667 | "id": "IzIpIvLUxWmq"
668 | },
669 | "outputs": [],
670 | "source": [
671 | "%cd {ROOT_DIR}\n",
672 | "# @title # Post-processing\n",
673 | "# @markdown ### Compressor + normalization + light reverb + stereo widening\n",
674 | "\n",
675 | "from IPython.display import Audio, display, HTML, FileLink\n",
676 | "import os\n",
677 | "import shutil\n",
678 | "\n",
679 | "OUTPUT_PATH = f'{ROOT_DIR}/output'\n",
680 | "PROCESSED_OUTPUT_FORMAT = 'mp3'\n",
681 | "COMPRESSED_RESULT_FILE = os.path.join(OUTPUT_PATH, f\"{os.path.splitext(os.path.basename(RVC_RESULT_FILE))[0]}_compressed.{PROCESSED_OUTPUT_FORMAT}\")\n",
682 | "PROCESSED_RESULT_FILE = os.path.join(OUTPUT_PATH, f\"{os.path.splitext(os.path.basename(RVC_RESULT_FILE))[0]}_processed.{PROCESSED_OUTPUT_FORMAT}\")\n",
683 | "os.makedirs(OUTPUT_PATH, exist_ok=True)\n",
684 | "# apply dynamic range compression to the vocals\n",
685 | "!ffmpeg -y -hide_banner -loglevel error -i {RVC_RESULT_FILE} -filter_complex \"anlmdn=s=10,acompressor=threshold=-20dB:ratio=4:attack=20:release=200,volume=2,loudnorm=I=-13:TP=-1.0:LRA=9,volume=1.5\" {COMPRESSED_RESULT_FILE}\n",
686 | "if not os.path.isfile(COMPRESSED_RESULT_FILE):\n",
687 | "    print(f\"Failed to process file {RVC_RESULT_FILE}\")\n",
688 | "    shutil.copy(RVC_RESULT_FILE, PROCESSED_RESULT_FILE)\n",
689 | "else:\n",
690 | "    if os.path.isfile(IMPULSE_FILE):\n",
691 | "        # add reverb, processing the left and right channels differently to create a stereo effect\n",
692 | "        print(\"Adding reverb with different processing for the left and right channels to create a stereo effect\")\n",
693 | "        !ffmpeg -y -hide_banner -loglevel error -i {COMPRESSED_RESULT_FILE} -i {IMPULSE_FILE} -filter_complex \"[0:a]asplit=2[splita][splitb]; [splita]adelay=40|40[splita_delayed]; [splitb]adelay=20|20[splitb_delayed]; [splita_delayed][1]afir=dry=10:wet=10[reverb_left]; [splitb_delayed][1]afir=dry=10:wet=10[reverb_right]; [reverb_left][reverb_right]amerge=inputs=2[reverb]; [0:a][reverb]amix=inputs=2:weights=20 1[audio]\" -map \"[audio]\" {PROCESSED_RESULT_FILE}\n",
694 | "        if os.path.isfile(PROCESSED_RESULT_FILE):\n",
695 | "            !rm -rf {COMPRESSED_RESULT_FILE}\n",
696 | "        else:\n",
697 | "            print(f\"Failed to process the compressed file {COMPRESSED_RESULT_FILE}\")\n",
698 | "            shutil.move(COMPRESSED_RESULT_FILE, PROCESSED_RESULT_FILE)\n",
699 | "    else:\n",
700 | "        print(f\"Impulse file not found: {IMPULSE_FILE}\")\n",
701 | "        shutil.move(COMPRESSED_RESULT_FILE, PROCESSED_RESULT_FILE)\n",
702 | "\n",
703 | "audio = Audio(PROCESSED_RESULT_FILE, autoplay=False)\n",
704 | "display(audio)"
705 | ]
706 | },
707 | {
708 | "cell_type": "code",
709 | "execution_count": null,
710 | "metadata": {
711 | "cellView": "form",
712 | "id": "Xp05zq7DgcvU"
713 | },
714 | "outputs": [],
715 | "source": [
716 | "%cd {ROOT_DIR}\n",
717 | "# @title # Mixing\n",
718 | "\n",
719 | "from IPython.display import Audio, display, HTML, FileLink\n",
720 | "import os\n",
721 | "\n",
722 | "OUTPUT_PATH = '/content/output' #@param {type:\"string\"}\n",
723 | "OUTPUT_FORMAT = 'mp3'\n",
724 | "\n",
725 | "RESULT_FILE = os.path.join(OUTPUT_PATH, INPUT_NAME + \".\" + OUTPUT_FORMAT)\n",
726 | "\n",
727 | "!ffmpeg -y -hide_banner -loglevel error -i {PROCESSED_RESULT_FILE} -i {INSTRUM_FILE} -filter_complex \"[0:a][1:a]amerge=inputs=2[a]\" -map \"[a]\" -ac 2 {RESULT_FILE}\n",
728 | "\n",
729 | "audio = Audio(RESULT_FILE, autoplay=False)\n",
730 | "display(audio)"
731 | ]
732 | },
733 | {
734 | "cell_type": "markdown",
735 | "source": [
736 | "# Animation and merging with the resulting audio\n",
737 | "### ***/content/image*** must contain a photo of a person with a clearly visible face"
738 | ],
739 | "metadata": {
740 | "id": "E1Ix8vW7M8CU"
741 | }
742 | },
743 | {
744 | "cell_type": "code",
745 | "execution_count": null,
746 | "metadata": {
747 | "cellView": "form",
748 | "id": "lAcmd2i2hmlx"
749 | },
750 | "outputs": [],
751 | "source": [
752 | "#@title # Run SadTalker\n",
753 | "# @markdown ### Combine the audio and the image\n",
754 | "# @markdown *The slowest step: 1 minute of audio takes roughly 10 minutes to process*\n",
755 | "\n",
756 | "from moviepy.editor import VideoFileClip\n",
757 | "import glob\n",
758 | "import os\n",
759 | "from IPython.display import Audio, display\n",
760 | "\n",
761 | "%cd {SADTALKER_REPO_PATH}\n",
762 | "\n",
763 | "# @markdown ---\n",
764 | "# @markdown ### Demo mode (15 seconds):\n",
765 | "is_demo = True # @param {type:\"boolean\"}\n",
766 | "if is_demo:\n",
767 | "    duration = 15\n",
768 | "else:\n",
769 | "    duration = 0\n",
770 | "\n",
771 | "output_video_folder = f'{SADTALKER_REPO_PATH}/result'\n",
772 | "!rm -rf {output_video_folder}\n",
773 | "!mkdir -p {output_video_folder}\n",
774 | "\n",
775 | "\n",
776 | "# SadTalker moves the original image and audio into output_video_folder, so work on temporary copies that are safe to lose\n",
777 | "temp_INPUT_FACE_IMAGE = f\"/content/{os.path.basename(INPUT_FACE_IMAGE)}\"\n",
778 | "!cp {INPUT_FACE_IMAGE} {temp_INPUT_FACE_IMAGE}\n",
779 | "temp_VOCAL_FILE = f\"/content/{os.path.basename(RVC_RESULT_FILE)}\"\n",
780 | "!cp {VOCAL_FILE} {temp_VOCAL_FILE}\n",
781 | "\n",
782 | "cut_vocal_file = f\"/content/cut_vocal_file.wav\"\n",
783 | "\n",
784 | "if duration != 0:\n",
785 | "    !ffmpeg -y -loglevel error -hide_banner -i {temp_VOCAL_FILE} -ss 0 -t {duration} {cut_vocal_file}\n",
786 | "else:\n",
787 | "    cut_vocal_file = temp_VOCAL_FILE\n",
788 | "\n",
789 | "cmd = f\"from src.gradio_demo import SadTalker; sad_talker = SadTalker(lazy_load=True); a = sad_talker.test('{temp_INPUT_FACE_IMAGE}', '{cut_vocal_file}', 'full', True, False, 8, 256, 0, 1.0, False, None, None, False, {duration}, True, './result/'); print(a)\"\n",
790 | "!echo \"{cmd}\"\n",
791 | "\n",
792 | "!source {SADTALKER_REPO_PATH}/venv/bin/activate; python -c \"{cmd}\"\n",
793 | "\n",
794 | "sad_talker_output_video = glob.glob(f\"{output_video_folder}/*/*_full.mp4\")[0]\n",
795 | "\n",
796 | "OUTPUT_VIDEO = os.path.join(OUTPUT_PATH, 'video.mp4')\n",
797 | "\n",
798 | "cut_result_file = os.path.join(OUTPUT_PATH, INPUT_NAME + \"_cut.\" + OUTPUT_FORMAT)\n",
799 | "\n",
800 | "if duration != 0:\n",
801 | "    !ffmpeg -y -loglevel error -hide_banner -i {RESULT_FILE} -ss 0 -t {duration} {cut_result_file}\n",
802 | "else:\n",
803 | "    cut_result_file = RESULT_FILE\n",
804 | "\n",
805 | "# workaround: create a video without sound first, then mux it with the original audio track\n",
806 | "video_without_audio = os.path.join(OUTPUT_PATH, INPUT_NAME + \"_silent_video.mp4\")\n",
807 | "!ffmpeg -y -loglevel error -hide_banner -i {sad_talker_output_video} -c copy -an {video_without_audio}\n",
808 | "!ffmpeg -y -loglevel error -hide_banner -i {video_without_audio} -i {cut_result_file} -c copy {OUTPUT_VIDEO}\n",
809 | "\n",
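810 | "if cut_result_file != RESULT_FILE and os.path.exists(cut_result_file): os.remove(cut_result_file)  # never delete the full mix\n",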
811 | "!rm -rf {video_without_audio}\n",
812 | "\n",
813 | "clip = VideoFileClip(OUTPUT_VIDEO).resize(height=420)\n",
814 | "clip.ipython_display()"
815 | ]
816 | },
817 | {
818 | "cell_type": "markdown",
819 | "source": [
820 | "### Done\n",
821 | "#### You can now go back to any previous step without rerunning the whole flow. For example, you can load a different model or a different image; then you only need to rerun the vocal conversion, because the vocals are already separated from the instrumental."
822 | ],
823 | "metadata": {
824 | "id": "xt7h5JoIHYEm"
825 | }
826 | }
827 | ],
828 | "metadata": {
829 | "accelerator": "GPU",
830 | "colab": {
831 | "provenance": [],
832 | "gpuType": "T4",
833 | "toc_visible": true
834 | },
835 | "kernelspec": {
836 | "display_name": "Python 3",
837 | "name": "python3"
838 | },
839 | "language_info": {
840 | "name": "python"
841 | }
842 | },
843 | "nbformat": 4,
844 | "nbformat_minor": 0
845 | }
--------------------------------------------------------------------------------
/AI_Auto_Cover_V1.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "source": [
6 | "# Ai Auto Cover\n",
7 | "[Open in Colab](https://colab.research.google.com/github/self-destruction/AiAutoCover/blob/main/AI_Auto_Cover_V1.ipynb)\n",
8 | "##### This notebook lets you replace the voice in a song in a couple of clicks. You need a YouTube link and a link to the singer's voice model. It uses the UVR repository to separate the vocals from the instrumental and RVC to convert the vocals."
9 | ],
10 | "metadata": {
11 | "id": "6C0DGeq1grRx"
12 | }
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "source": [
17 | "# Installing all dependencies\n",
18 | "##### Folder layout:\n",
19 | "##### /content/input/billie_jean.mp3 - the source file (vocals + instrumental), downloaded as audio from YouTube (you can skip that step and place the file manually)\n",
20 | "##### /content/output_uvr/billie_jean_instrum.wav (instrumental) and /content/output_uvr/billie_jean_vocals.wav (vocals) - the files produced by the \"Ultimate Vocal Remover\" separation\n",
21 | "##### /content/output_rvc/result.mp3 (vocals) - the converted vocals, after processing with the chosen model\n",
22 | "##### /content/output/result.mp3 (vocals + instrumental) - the mix of the converted vocals and the original instrumental\n",
23 | "##### /content/impulse/100-Reverb.wav - the reverb impulse response file used for vocal post-processing\n",
24 | "##### Required steps are marked in red. The rest can be run as they are."
25 | ],
26 | "metadata": {
27 | "id": "0jVt6o9aSTk4"
28 | }
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {
34 | "id": "IqR9JISMYnlU",
35 | "cellView": "form"
36 | },
37 | "outputs": [],
38 | "source": [
39 | "# @title #Install UVR + RVC\n",
40 | "#@markdown *Installation takes 3-4 minutes, so go make yourself some tea*\n",
41 | "%cd /content\n",
42 | "!mkdir -p /content/input\n",
43 | "!mkdir -p /content/output\n",
44 | "\n",
45 | "!mkdir -p /content/output_uvr\n",
46 | "print('\\n1/3...')\n",
47 | "!git clone https://github.com/jarredou/MVSEP-MDX23-Colab_v2.git\n",
48 | "!apt install ffmpeg &> /dev/null\n",
49 | "%cd /content/MVSEP-MDX23-Colab_v2\n",
50 | "!pip install -r requirements.txt &> /dev/null\n",
51 | "\n",
52 | "%cd /content\n",
53 | "!mkdir -p /content/output_rvc\n",
54 | "print('\\n2/3...')\n",
55 | "!git clone https://github.com/Mangio621/Mangio-RVC-Fork.git\n",
56 | "%cd /content/Mangio-RVC-Fork\n",
57 | "!apt-get -y install build-essential python3-dev &> /dev/null\n",
58 | "!pip install --upgrade setuptools wheel pip &> /dev/null\n",
59 | "!pip install yt_dlp faiss-cpu==1.7.2 librosa==0.9.1 fairseq ffmpeg ffmpeg-python praat-parselmouth pyworld numpy==1.23 gradio torchcrepe stftpitchshift &> /dev/null\n",
60 | "\n",
61 | "print('\\n3/3...')\n",
62 | "# Workaround: patch infer-web.py so that the CLI exits when it reads the 'stop_infer' command\n",
63 | "!sed -i '/command = input(\"%s: \" % cli_current_page)/a\\ if command.strip() == \"stop_infer\":\\n import sys\\n sys.exit()' infer-web.py\n",
64 | "\n",
65 | "!wget https://files.pythonhosted.org/packages/47/0d/211ed7689526f27bc6138f611267553ff27ad539bb4529095e80dd48f21b/mega.py-1.0.8.tar.gz -P /content/Mangio-RVC-Fork/ &> /dev/null\n",
66 | "!pip install mega.py-1.0.8.tar.gz &> /dev/null\n",
67 | "!rm -rf mega.py-1.0.8.tar.gz\n",
68 | "\n",
69 | "# The URLs are split into pieces (obfuscated) so that Google Colab does not complain :)\n",
70 | "HP = \"https://hug\" + \"gingfa\" + \"ce.co/se\" + \"anghay/uv\" + \"r_mode\" + \"ls/reso\" + \"lve/main/9_H\" + \"P2-UVR.p\" + \"th\"\n",
71 | "DeEcho = \"https://huggi\" + \"ngface.c\" + \"o/seanghay/u\" + \"vr_models/res\" + \"olve/main/UV\" + \"R-DeEcho-DeR\" + \"everb.pth\"\n",
72 | "rmvpe = \"https://hug\" + \"gingfac\" + \"e.co/lj\" + \"1995/Voi\" + \"ceConvers\" + \"ionW\" + \"ebU\" + \"I/reso\" + \"lve/ma\" + \"in/rmv\" + \"pe.pt\"\n",
73 | "hubert = \"htt\" + \"ps://hug\" + \"gingface.c\" + \"o/lj1\" + \"995/Voic\" + \"eConv\" + \"ersionWeb\" + \"UI/resolv\" + \"e/main/huber\" + \"t_base.pt\"\n",
74 | "!wget {HP} -P /content/Mangio-RVC-Fork/uvr5_weights/ &> /dev/null\n",
75 | "!wget {DeEcho} -P /content/Mangio-RVC-Fork/uvr5_weights/ &> /dev/null\n",
76 | "!wget {rmvpe} -P /content/Mangio-RVC-Fork/ &> /dev/null\n",
77 | "!wget {hubert} -P /content/Mangio-RVC-Fork/ &> /dev/null\n",
78 | "\n",
79 | "# download the impulse response used for reverb post-processing\n",
80 | "import os\n",
81 | "import shutil\n",
82 | "\n",
83 | "impulse_folder = '/content/impulse'\n",
84 | "impulse_filename = '100-Reverb.wav'\n",
85 | "IMPULSE_FILE = os.path.join(impulse_folder, impulse_filename)\n",
86 | "\n",
87 | "!mkdir -p /content/zips/\n",
88 | "!mkdir -p /content/unzips/\n",
89 | "!gdown 'https://drive.usercontent.google.com/download?id=0B6KkVBpcTFQvTGtRN1RyNUNuM0k&authuser=0&confirm=t&uuid=e6179a3c-0dd8-48ce-9460-0d6783171e6e&at=APZUnTW5UwuRkhWxtY_HU3VS7XVk%3A1710311671579' --fuzzy -O \"/content/zips/impulses.zip\"\n",
90 | "\n",
91 | "for filename in os.listdir(\"/content/zips\"):\n",
92 | "    if filename.endswith(\".zip\"):\n",
93 | "        zip_file = os.path.join(\"/content/zips\", filename)\n",
94 | "        shutil.unpack_archive(zip_file, \"/content/unzips\", 'zip')\n",
95 | "\n",
96 | "for root, dirs, files in os.walk(\"/content/unzips\"):\n",
97 | "    for file in files:\n",
98 | "        if file.endswith(impulse_filename):\n",
99 | "            file_name = os.path.splitext(file)[0]\n",
100 | "            os.makedirs(impulse_folder, exist_ok=True)\n",
101 | "            shutil.move(os.path.join(root, file), os.path.join(impulse_folder, file))\n",
102 | "\n",
103 | "!rm -r /content/unzips/\n",
104 | "!rm -r /content/zips/\n",
105 | "\n",
106 | "print('Done!')"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "source": [
112 | "# @title # Download the source audio\n",
113 | "#@markdown ##### You can skip this step and place the audio file in /content/input manually\n",
114 | "#@markdown ##### Paste a YouTube link:\n",
115 | "%cd /content/Mangio-RVC-Fork\n",
116 | "url = 'https://www.youtube.com/watch?v=DeE8Fxq3viE' #@param {type:\"string\"}\n",
117 | "\n",
118 | "default_audio = 'audio'\n",
119 | "input_download_path = '/content/input'\n",
120 | "input_download_format = 'mp3'\n",
121 | "\n",
122 | "import yt_dlp\n",
123 | "import ffmpeg\n",
124 | "import sys\n",
125 | "from IPython.display import Audio, display, HTML, FileLink\n",
126 | "\n",
127 | "ydl_opts = {\n",
128 | "    'format': 'bestaudio/best',\n",
129 | "    'postprocessors': [{\n",
130 | "        'key': 'FFmpegExtractAudio',\n",
131 | "        'preferredcodec': input_download_format,\n",
132 | "    }],\n",
133 | "    \"outtmpl\": f'{input_download_path}/{default_audio}',\n",
134 | "}\n",
135 | "def download_from_url(ydl, url):\n",
136 | "    ydl.download([url])\n",
137 | "\n",
138 | "with yt_dlp.YoutubeDL(ydl_opts) as ydl:\n",
139 | "    download_from_url(ydl, url)\n",
140 | "\n",
141 | "audio = Audio(f'{input_download_path}/{default_audio}.{input_download_format}', autoplay=False)\n",
142 | "display(audio)"
143 | ],
144 | "metadata": {
145 | "cellView": "form",
146 | "id": "YTpxq9MtN0kj"
147 | },
148 | "execution_count": null,
149 | "outputs": []
150 | },
151 | {
152 | "cell_type": "code",
153 | "source": [
154 | "# @title # Download the model\n",
155 | "#@markdown ##### You can skip this step and place the .pth model and the .index file in the /content/Mangio-RVC-Fork repository manually\n",
156 | "#@markdown Paste a link to the model (Mega, Drive, etc.):\n",
157 | "url = 'https://drive.google.com/file/d/14OVs-EEohPcHRqXtYuK_0xMesxnfHgab/view' #@param {type:\"string\"}\n",
158 | "\n",
159 | "%cd /content/Mangio-RVC-Fork\n",
160 | "\n",
161 | "#@markdown ---\n",
162 | "#@markdown ##### Model links:\n",
163 | "#@markdown ##### https://docs.google.com/spreadsheets/d/1tAUaQrEHYgRsm1Lvrnj14HFHDwJWl0Bd9x0QePewNco\n",
164 | "#@markdown ##### https://huggingface.co/QuickWick/Music-AI-Voices/tree/main\n",
165 | "#@markdown ##### https://discord.gg/aihubbrasil\n",
166 | "\n",
167 | "from mega.mega import Mega\n",
168 | "import os\n",
169 | "import shutil\n",
170 | "from urllib.parse import urlparse, parse_qs\n",
171 | "import urllib.parse\n",
172 | "from google.oauth2.service_account import Credentials\n",
173 | "import gspread\n",
174 | "import pandas as pd\n",
175 | "from tqdm import tqdm\n",
176 | "from bs4 import BeautifulSoup\n",
177 | "import requests\n",
178 | "import hashlib\n",
179 | "\n",
180 | "!rm -rf /content/unzips/\n",
181 | "!rm -rf /content/zips/\n",
182 | "!mkdir /content/unzips\n",
183 | "!mkdir /content/zips\n",
184 | "\n",
185 | "def sanitize_directory(directory):\n",
186 | "    for filename in os.listdir(directory):\n",
187 | "        file_path = os.path.join(directory, filename)\n",
188 | "        if os.path.isfile(file_path):\n",
189 | "            if filename == \".DS_Store\" or filename.startswith(\"._\"):\n",
190 | "                os.remove(file_path)\n",
191 | "        elif os.path.isdir(file_path):\n",
192 | "            sanitize_directory(file_path)\n",
193 | "\n",
194 | "model_zip = urlparse(url).path.split('/')[-2] + '.zip'\n",
195 | "model_zip_path = '/content/zips/' + model_zip\n",
196 | "\n",
197 | "private_model = False\n",
198 | "condition1 = False\n",
199 | "condition2 = False\n",
200 | "condition3 = False\n",
201 | "is_index_found = False\n",
202 | "\n",
203 | "if url != '':\n",
204 | "    MODEL = \"\" # Initialize MODEL variable\n",
205 | "    !mkdir -p /content/Mangio-RVC-Fork/logs/$MODEL\n",
206 | "    !mkdir -p /content/zips/\n",
207 | "    !mkdir -p /content/Mangio-RVC-Fork/weights/ # Create the 'weights' directory\n",
208 | "\n",
209 | "    if \"drive.google.com\" in url:\n",
210 | "        !gdown $url --fuzzy -O \"$model_zip_path\"\n",
211 | "    elif \"/blob/\" in url:\n",
212 | "        url = url.replace(\"blob\", \"resolve\")\n",
213 | "        print(\"Resolved URL:\", url) # Print the resolved URL\n",
214 | "        !wget \"$url\" -O \"$model_zip_path\"\n",
215 | "    elif \"mega.nz\" in url:\n",
216 | "        m = Mega()\n",
217 | "        print(\"Starting download from MEGA....\")\n",
218 | "        m.download_url(url, '/content/zips')\n",
219 | "    elif \"/tree/main\" in url:\n",
220 | "        response = requests.get(url)\n",
221 | "        soup = BeautifulSoup(response.content, 'html.parser')\n",
222 | "        temp_url = ''\n",
223 | "        for link in soup.find_all('a', href=True):\n",
224 | "            if link['href'].endswith('.zip'):\n",
225 | "                temp_url = link['href']\n",
226 | "                break\n",
227 | "        if temp_url:\n",
228 | "            url = temp_url\n",
229 | "            print(\"Updated URL:\", url) # Print the updated URL\n",
230 | "            url = url.replace(\"blob\", \"resolve\")\n",
231 | "            print(\"Resolved URL:\", url) # Print the resolved URL\n",
232 | "\n",
233 | "            if \"huggingface.co\" not in url:\n",
234 | "                url = \"https://huggingface.co\" + url\n",
235 | "\n",
236 | "            !wget \"$url\" -O \"$model_zip_path\"\n",
237 | "        else:\n",
238 | "            print(\"No .zip file found on the page.\")\n",
239 | "            # Handle the case when no .zip file is found\n",
240 | "    else:\n",
241 | "        !wget \"$url\" -O \"$model_zip_path\"\n",
242 | "\n",
243 | "    for filename in os.listdir(\"/content/zips\"):\n",
244 | "        if filename.endswith(\".zip\"):\n",
245 | "            zip_file = os.path.join(\"/content/zips\", filename)\n",
246 | "            shutil.unpack_archive(zip_file, \"/content/unzips\", 'zip')\n",
247 | "\n",
248 | "    sanitize_directory(\"/content/unzips\")\n",
249 | "\n",
250 | "    def find_pth_file(folder):\n",
251 | "        for root, dirs, files in os.walk(folder):\n",
252 | "            for file in files:\n",
253 | "                if file.endswith(\".pth\"):\n",
254 | "                    file_name = os.path.splitext(file)[0]\n",
255 | "                    if file_name.startswith(\"G_\") or file_name.startswith(\"P_\"):\n",
256 | "                        config_file = os.path.join(root, \"config.json\")\n",
257 | "                        if os.path.isfile(config_file):\n",
258 | "                            print(\"Outdated .pth detected! This is not compatible with the RVC method. Find the RVC equivalent model!\")\n",
259 | "                        continue # Continue searching for a valid file\n",
260 | "                    file_path = os.path.join(root, file)\n",
261 | "                    if os.path.getsize(file_path) > 100 * 1024 * 1024: # Check file size in bytes (100MB)\n",
262 | "                        print(\"Skipping unusable training file:\", file)\n",
263 | "                        continue # Continue searching for a valid file\n",
264 | "                    return file_name\n",
265 | "        return None\n",
266 | "\n",
267 | "    MODEL = find_pth_file(\"/content/unzips\")\n",
268 | "    if MODEL is not None:\n",
269 | "        print(\"Found .pth file:\", MODEL + \".pth\")\n",
270 | "    else:\n",
271 | "        print(\"Error: Could not find a valid .pth file within the extracted zip.\")\n",
272 | "        print(\"If there's an error above this talking about 'Access denied', try one of the Alt URLs in the Google Sheets for this model.\")\n",
273 | "        MODEL = \"\"\n",
275 | "        condition3 = True\n",
276 | "\n",
277 | "    index_path = \"\"\n",
278 | "\n",
279 | "    def find_version_number(index_path):\n",
280 | "        if condition2 and not condition1:\n",
281 | "            if file_size >= 55180000:\n",
282 | "                return 'RVC v2'\n",
283 | "            else:\n",
284 | "                return 'RVC v1'\n",
285 | "\n",
286 | "        filename = os.path.basename(index_path)\n",
287 | "\n",
288 | "        if filename.endswith(\"_v2.index\"):\n",
289 | "            return 'RVC v2'\n",
290 | "        elif filename.endswith(\"_v1.index\"):\n",
291 | "            return 'RVC v1'\n",
292 | "        else:\n",
293 | "            if file_size >= 55180000:\n",
294 | "                return 'RVC v2'\n",
295 | "            else:\n",
296 | "                return 'RVC v1'\n",
297 | "\n",
298 | "    if MODEL != \"\":\n",
299 | "        # Move model into logs folder\n",
300 | "        for root, dirs, files in os.walk('/content/unzips'):\n",
301 | "            for file in files:\n",
302 | "                file_path = os.path.join(root, file)\n",
303 | "                if file.endswith(\".index\"):\n",
304 | "                    print(\"Found index file:\", file)\n",
305 | "                    is_index_found = True\n",
306 | "                    condition1 = True\n",
307 | "                    logs_folder = os.path.join('/content/Mangio-RVC-Fork/logs', MODEL)\n",
308 | "                    os.makedirs(logs_folder, exist_ok=True) # Create the logs folder if it doesn't exist\n",
309 | "\n",
310 | "                    # Delete identical .index file if it exists\n",
311 | "                    if file.endswith(\".index\"):\n",
312 | "                        identical_index_path = os.path.join(logs_folder, file)\n",
313 | "                        if os.path.exists(identical_index_path):\n",
314 | "                            os.remove(identical_index_path)\n",
315 | "\n",
316 | "                    shutil.move(file_path, logs_folder)\n",
317 | "                    index_path = os.path.join(logs_folder, file) # Set index_path variable\n",
318 | "\n",
319 | "                elif \"G_\" not in file and \"D_\" not in file and file.endswith(\".pth\"):\n",
320 | "                    destination_path = f'/content/Mangio-RVC-Fork/weights/{MODEL}.pth'\n",
321 | "                    if os.path.exists(destination_path):\n",
322 | "                        print(\"You already downloaded this model. Re-importing anyways..\")\n",
323 | "                    shutil.move(file_path, destination_path)\n",
324 | "\n",
325 | "    if condition1 is False:\n",
326 | "        logs_folder = os.path.join('/content/Mangio-RVC-Fork/logs', MODEL)\n",
327 | "        os.makedirs(logs_folder, exist_ok=True)\n",
328 | "# this is here so it doesn't crash if the model is missing an index for some reason\n",
329 | "else:\n",
330 | "    print(\"URL cannot be left empty. If you don't want to download a model now, just skip this step.\")\n",
331 | "\n",
332 | "# Download a generic index file if the archive did not contain one\n",
333 | "if is_index_found is False:\n",
334 | "    logs_folder = os.path.join('/content/Mangio-RVC-Fork/logs', MODEL)\n",
335 | "    index_path = os.path.join(logs_folder, 'model.index')\n",
336 | "    if not os.path.exists(index_path):\n",
337 | "        !wget 'https://huggingface.co/sail-rvc/2001MJAIDAM/resolve/main/model.index' -P {logs_folder}\n",
338 | "\n",
339 | "!rm -r /content/unzips/\n",
340 | "!rm -r /content/zips/"
341 | ],
342 | "metadata": {
343 | "cellView": "form",
344 | "id": "0x6VOMyae_lq"
345 | },
346 | "execution_count": null,
347 | "outputs": []
348 | },
349 | {
350 | "cell_type": "markdown",
351 | "metadata": {
352 | "id": "x8CNwbOjZPUY"
353 | },
354 | "source": [
355 | "# Start processing the source track\n",
356 | "### The track must be in ***/content/input***, and the RVC model must be in the ***/content/Mangio-RVC-Fork*** repository"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": null,
362 | "metadata": {
363 | "cellView": "form",
364 | "id": "0XuTHvtDZJPT"
365 | },
366 | "outputs": [],
367 | "source": [
368 | "# @title #Separate the vocals from the instrumental\n",
369 | "%cd /content/MVSEP-MDX23-Colab_v2\n",
370 | "\n",
371 | "from pathlib import Path, PurePath\n",
372 | "from IPython.display import Audio, display, HTML, FileLink\n",
373 | "import os\n",
374 | "\n",
375 | "# @markdown Folder with the source music file (exactly one file):\n",
376 | "INPUT = '/content/input' #@param {type:\"string\"}\n",
377 | "# @markdown ---\n",
378 | "OUTPUT_UVR_FOLDER = '/content/output_uvr' #@param {type:\"string\"}\n",
379 | "# @markdown ---\n",
380 | "# @markdown Speed/quality trade-off (1 = fastest, 10 = best quality):\n",
381 | "quality = 5 # @param {type:\"slider\", min:1, max:10, step:1}\n",
382 | "quality_max = 10\n",
383 | "\n",
384 | "BigShifts = round(11 * quality / quality_max)\n",
385 | "overlap_InstVoc = round(8 * quality / quality_max)\n",
386 | "overlap_VitLarge = round(8 * quality / quality_max)\n",
387 | "weight_InstVoc = 8\n",
388 | "weight_VitLarge = 5\n",
389 | "is_VOCFT = False\n",
390 | "overlap_VOCFT = 0.1\n",
391 | "weight_VOCFT = 1\n",
392 | "is_vocals_only = True\n",
393 | "overlap_demucs = 0.6\n",
394 | "output_format = 'PCM_16'\n",
395 | "vocals_only = ''\n",
396 | "use_VOCFT = ''\n",
397 | "if is_vocals_only:\n",
398 | "    vocals_only = '--vocals_only true'\n",
399 | "else:\n",
400 | "    vocals_only = ''\n",
401 | "if is_VOCFT:\n",
402 | "    use_VOCFT = '--use_VOCFT true'\n",
403 | "else:\n",
404 | "    use_VOCFT = ''\n",
405 | "\n",
406 | "\n",
407 | "filename = next(Path(INPUT).glob('*.mp3'))\n",
408 | "INPUT_NAME = Path(filename).stem\n",
409 | "VOCAL_FILE = os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.wav\")\n",
410 | "INSTRUM_FILE = os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum.wav\")\n",
411 | "\n",
412 | "!python inference.py \\\n",
413 | " --large_gpu \\\n",
414 | " --weight_InstVoc {weight_InstVoc} \\\n",
415 | " --weight_VOCFT {weight_VOCFT} \\\n",
416 | " --weight_VitLarge {weight_VitLarge} \\\n",
417 | " --input_audio \"{filename}\" \\\n",
418 | " --overlap_demucs {overlap_demucs} \\\n",
419 | " --overlap_VOCFT {overlap_VOCFT} \\\n",
420 | " --overlap_InstVoc {overlap_InstVoc} \\\n",
421 | " --overlap_VitLarge {overlap_VitLarge} \\\n",
422 | " --output_format {output_format} \\\n",
423 | " --BigShifts {BigShifts} \\\n",
424 | " --output_folder \"{OUTPUT_UVR_FOLDER}\" \\\n",
425 | " {vocals_only} \\\n",
426 | " {use_VOCFT}\n",
427 | "\n",
428 | "print(\"\\nLet's listen to the separated track.\")\n",
429 | "print(\"Please wait a moment, the players will appear shortly...\")\n",
430 | "# Colab sometimes chokes on the size of a WAV file and disconnects, so preview MP3 copies instead\n",
431 | "!ffmpeg -y -i {VOCAL_FILE} -vn -ar 44100 -ac 2 -b:a 192k {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals\")}.mp3 &> /dev/null\n",
432 | "!ffmpeg -y -i {INSTRUM_FILE} -vn -ar 44100 -ac 2 -b:a 192k {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum\")}.mp3 &> /dev/null\n",
433 | "print(\"Vocals:\")\n",
434 | "audio_vocal = Audio(os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\"), autoplay=False)\n",
435 | "display(audio_vocal)\n",
436 | "print(\"Instrumental:\")\n",
437 | "audio_inst = Audio(os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum.mp3\"), autoplay=False)\n",
438 | "display(audio_inst)\n",
439 | "\n",
440 | "!rm -rf {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\")}\n",
441 | "!rm -rf {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_instrum.mp3\")}\n",
442 | "\n",
443 | "print('\\nTip: if the vocals contain a lot of echo and reverb, run the next step before the RVC conversion')"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": null,
449 | "metadata": {
450 | "cellView": "form",
451 | "id": "3yztvPmFytx-"
452 | },
453 | "outputs": [],
454 | "source": [
455 | "%cd /content/Mangio-RVC-Fork\n",
456 | "# @title # Extra reverb and echo removal (optional)\n",
457 | "# @markdown ##### The new file automatically replaces the original vocal file /content/output_uvr/file_vocals.wav\n",
458 | "# @markdown ---\n",
459 | "# @markdown ##### These values can be left as they are:\n",
460 | "\n",
461 | "import numpy as np\n",
462 | "np.float = float\n",
463 | "\n",
464 | "postprocess = False #@param {type:\"boolean\"}\n",
465 | "tta = True #@param {type:\"boolean\"}\n",
466 | "is_half = False #@param {type:\"boolean\"}\n",
467 | "window_size = 512 # @param {type:\"slider\", min:0, max:1024, step:32}\n",
468 | "\n",
469 | "from pathlib import Path, PurePath\n",
470 | "from IPython.display import Audio, display, HTML, FileLink\n",
471 | "\n",
472 | "input_denoise_file = VOCAL_FILE\n",
473 | "output_folder = PurePath(VOCAL_FILE).parent\n",
474 | "\n",
475 | "import os, sys, torch, warnings, pdb\n",
476 | "\n",
477 | "now_dir = os.getcwd()\n",
478 | "sys.path.append(now_dir)\n",
479 | "from json import load as ll\n",
480 | "\n",
481 | "warnings.filterwarnings(\"ignore\")\n",
482 | "import librosa\n",
483 | "import importlib\n",
484 | "import numpy as np\n",
485 | "import hashlib, math\n",
486 | "from tqdm import tqdm\n",
487 | "from lib.uvr5_pack.lib_v5 import spec_utils\n",
488 | "from lib.uvr5_pack.utils import _get_name_params, inference\n",
489 | "from lib.uvr5_pack.lib_v5.model_param_init import ModelParameters\n",
490 | "import soundfile as sf\n",
491 | "from lib.uvr5_pack.lib_v5.nets_new import CascadedNet\n",
492 | "from lib.uvr5_pack.lib_v5 import nets_61968KB as nets\n",
493 | "\n",
494 | "class _audio_pre_new:\n",
495 | " def __init__(self, agg, model_path, device, is_half):\n",
496 | " self.model_path = model_path\n",
497 | " self.device = device\n",
498 | " self.data = {\n",
499 | " # Processing Options\n",
500 | " \"postprocess\": postprocess,\n",
501 | " \"tta\": tta,\n",
502 | " # Constants\n",
503 | " \"window_size\": window_size,\n",
504 | " \"agg\": agg,\n",
505 | " \"high_end_process\": \"mirroring\",\n",
506 | " }\n",
507 | " mp = ModelParameters(\"lib/uvr5_pack/lib_v5/modelparams/4band_v3.json\")\n",
508 | " nout = 64 if \"DeReverb\" in model_path else 48\n",
509 | " model = CascadedNet(mp.param[\"bins\"] * 2, nout)\n",
510 | " cpk = torch.load(model_path, map_location=\"cuda\")\n",
511 | " model.load_state_dict(cpk)\n",
512 | " model.eval()\n",
513 | " if is_half:\n",
514 | " model = model.half().to(device)\n",
515 | " else:\n",
516 | " model = model.to(device)\n",
517 | "\n",
518 | " self.mp = mp\n",
519 | " self.model = model\n",
520 | "\n",
521 | " def _path_audio_(\n",
522 | " self, music_file, vocal_root=None, ins_root=None, format=\"flac\"\n",
523 | " ):\n",
524 | " if ins_root is None and vocal_root is None:\n",
525 | " return \"No save root.\"\n",
526 | " name = os.path.basename(music_file)\n",
527 | " if ins_root is not None:\n",
528 | " os.makedirs(ins_root, exist_ok=True)\n",
529 | " if vocal_root is not None:\n",
530 | " os.makedirs(vocal_root, exist_ok=True)\n",
531 | " X_wave, y_wave, X_spec_s, y_spec_s = {}, {}, {}, {}\n",
532 | " bands_n = len(self.mp.param[\"band\"])\n",
533 | " for d in range(bands_n, 0, -1):\n",
534 | " bp = self.mp.param[\"band\"][d]\n",
535 | " if d == bands_n: # high-end band\n",
536 | " (\n",
537 | " X_wave[d],\n",
538 | " _,\n",
539 | " ) = librosa.core.load(\n",
540 | " music_file,\n",
541 | " bp[\"sr\"],\n",
542 | " False,\n",
543 | " dtype=np.float32,\n",
544 | " res_type=bp[\"res_type\"],\n",
545 | " )\n",
546 | " if X_wave[d].ndim == 1:\n",
547 | " X_wave[d] = np.asfortranarray([X_wave[d], X_wave[d]])\n",
548 | " else: # lower bands\n",
549 | " X_wave[d] = librosa.core.resample(\n",
550 | " X_wave[d + 1],\n",
551 | " self.mp.param[\"band\"][d + 1][\"sr\"],\n",
552 | " bp[\"sr\"],\n",
553 | " res_type=bp[\"res_type\"],\n",
554 | " )\n",
555 | " # Stft of wave source\n",
556 | " X_spec_s[d] = spec_utils.wave_to_spectrogram_mt(\n",
557 | " X_wave[d],\n",
558 | " bp[\"hl\"],\n",
559 | " bp[\"n_fft\"],\n",
560 | " self.mp.param[\"mid_side\"],\n",
561 | " self.mp.param[\"mid_side_b2\"],\n",
562 | " self.mp.param[\"reverse\"],\n",
563 | " )\n",
564 | " # pdb.set_trace()\n",
565 | " if d == bands_n and self.data[\"high_end_process\"] != \"none\":\n",
566 | " input_high_end_h = (bp[\"n_fft\"] // 2 - bp[\"crop_stop\"]) + (\n",
567 | " self.mp.param[\"pre_filter_stop\"] - self.mp.param[\"pre_filter_start\"]\n",
568 | " )\n",
569 | " input_high_end = X_spec_s[d][\n",
570 | " :, bp[\"n_fft\"] // 2 - input_high_end_h : bp[\"n_fft\"] // 2, :\n",
571 | " ]\n",
572 | "\n",
573 | " X_spec_m = spec_utils.combine_spectrograms(X_spec_s, self.mp)\n",
574 | " aggresive_set = float(self.data[\"agg\"] / 100)\n",
575 | " aggressiveness = {\n",
576 | " \"value\": aggresive_set,\n",
577 | " \"split_bin\": self.mp.param[\"band\"][1][\"crop_stop\"],\n",
578 | " }\n",
579 | " with torch.no_grad():\n",
580 | " pred, X_mag, X_phase = inference(\n",
581 | " X_spec_m, self.device, self.model, aggressiveness, self.data\n",
582 | " )\n",
583 | " # Postprocess\n",
584 | " if self.data[\"postprocess\"]:\n",
585 | " pred_inv = np.clip(X_mag - pred, 0, np.inf)\n",
586 | " pred = spec_utils.mask_silence(pred, pred_inv)\n",
587 | " y_spec_m = pred * X_phase\n",
588 | " v_spec_m = X_spec_m - y_spec_m\n",
589 | "\n",
590 | " if ins_root is not None:\n",
591 | " if self.data[\"high_end_process\"].startswith(\"mirroring\"):\n",
592 | " input_high_end_ = spec_utils.mirroring(\n",
593 | " self.data[\"high_end_process\"], y_spec_m, input_high_end, self.mp\n",
594 | " )\n",
595 | " wav_instrument = spec_utils.cmb_spectrogram_to_wave(\n",
596 | " y_spec_m, self.mp, input_high_end_h, input_high_end_\n",
597 | " )\n",
598 | " else:\n",
599 | " wav_instrument = spec_utils.cmb_spectrogram_to_wave(y_spec_m, self.mp)\n",
600 | " if format in [\"wav\", \"flac\"]:\n",
601 | " sf.write(\n",
602 | " os.path.join(\n",
603 | " ins_root,\n",
604 | " \"denoised_{}.{}\".format(name, format),\n",
605 | " ),\n",
606 | " (np.array(wav_instrument) * 32768).astype(\"int16\"),\n",
607 | " self.mp.param[\"sr\"],\n",
608 | " )\n",
609 | " else:\n",
610 | " path = os.path.join(\n",
611 | " ins_root, \"denoised_{}.wav\".format(name)\n",
612 | " )\n",
613 | " sf.write(\n",
614 | " path,\n",
615 | " (np.array(wav_instrument) * 32768).astype(\"int16\"),\n",
616 | " self.mp.param[\"sr\"],\n",
617 | " )\n",
618 | " if os.path.exists(path):\n",
619 | " os.system(\n",
620 | " \"ffmpeg -i %s -vn %s -q:a 2 -y\"\n",
621 | " % (path, path[:-4] + \".%s\" % format)\n",
622 | " )\n",
623 | " if vocal_root is not None:\n",
624 | " if self.data[\"high_end_process\"].startswith(\"mirroring\"):\n",
625 | " input_high_end_ = spec_utils.mirroring(\n",
626 | " self.data[\"high_end_process\"], v_spec_m, input_high_end, self.mp\n",
627 | " )\n",
628 | " wav_vocals = spec_utils.cmb_spectrogram_to_wave(\n",
629 | " v_spec_m, self.mp, input_high_end_h, input_high_end_\n",
630 | " )\n",
631 | " else:\n",
632 | " wav_vocals = spec_utils.cmb_spectrogram_to_wave(v_spec_m, self.mp)\n",
633 | " if format in [\"wav\", \"flac\"]:\n",
634 | " sf.write(\n",
635 | " os.path.join(\n",
636 | " vocal_root,\n",
637 | " \"vocal_{}_{}.{}\".format(name, self.data[\"agg\"], format),\n",
638 | " ),\n",
639 | " (np.array(wav_vocals) * 32768).astype(\"int16\"),\n",
640 | " self.mp.param[\"sr\"],\n",
641 | " )\n",
642 | " else:\n",
643 | " path = os.path.join(\n",
644 | " vocal_root, \"vocal_{}_{}.wav\".format(name, self.data[\"agg\"])\n",
645 | " )\n",
646 | " sf.write(\n",
647 | " path,\n",
648 | " (np.array(wav_vocals) * 32768).astype(\"int16\"),\n",
649 | " self.mp.param[\"sr\"],\n",
650 | " )\n",
651 | " if os.path.exists(path):\n",
652 | " os.system(\n",
653 | " \"ffmpeg -i %s -vn %s -q:a 2 -y\"\n",
654 | " % (path, path[:-4] + \".%s\" % format)\n",
655 | " )\n",
656 | "\n",
657 | "\n",
658 | "device = \"cuda\"\n",
659 | "model_path = \"uvr5_weights/UVR-DeEcho-DeReverb.pth\"\n",
660 | "pre_fun = _audio_pre_new(model_path=model_path, device=device, is_half=is_half, agg=10)\n",
661 | "pre_fun._path_audio_(input_denoise_file, None, output_folder, \"wav\")\n",
662 | "\n",
663 | "%mv {os.path.join(os.path.dirname(VOCAL_FILE), \"denoised_\" + os.path.basename(VOCAL_FILE)) + PurePath(VOCAL_FILE).suffix} {VOCAL_FILE}\n",
664 | "\n",
665 | "# Colab sometimes chokes on large WAV files and disconnects, hence the MP3 preview\n",
666 | "!ffmpeg -y -i {VOCAL_FILE} -vn -ar 44100 -ac 2 -b:a 192k {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals\")}.mp3 &> /dev/null\n",
667 | "audio = Audio(os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\"), autoplay=False)\n",
668 | "display(audio)\n",
669 | "\n",
670 | "!rm -rf {os.path.join(OUTPUT_UVR_FOLDER, INPUT_NAME + \"_vocals.mp3\")}"
671 | ]
672 | },
673 | {
674 | "cell_type": "markdown",
675 | "source": [
676 | "# Voice conversion and merging with the original instrumental"
677 | ],
678 | "metadata": {
679 | "id": "H3OJWmYNuTuH"
680 | }
681 | },
682 | {
683 | "cell_type": "code",
684 | "execution_count": null,
685 | "metadata": {
686 | "id": "J0JNWaDOoK0H",
687 | "cellView": "form"
688 | },
689 | "outputs": [],
690 | "source": [
691 | "# @title # 🚀 Conversion\n",
692 | "from IPython.display import Audio, display, HTML, FileLink\n",
693 | "from pathlib import Path, PurePath\n",
694 | "import os\n",
695 | "from decimal import Decimal\n",
696 | "\n",
697 | "result_path = \"/content/output_rvc\"\n",
698 | "result_name = \"rvc_result\"\n",
699 | "result_format = \"mp3\"\n",
700 | "\n",
701 | "RVC_RESULT_FILE = os.path.join(result_path, result_name + \".\" + result_format)\n",
702 | "rvc_result_filename = os.path.basename(RVC_RESULT_FILE)\n",
703 | "\n",
704 | "f0_method = \"rmvpe\"\n",
705 | "\n",
706 | "# @markdown ## These settings do not change the key\n",
707 | "\n",
708 | "# @markdown ### Pitch (octave up, octave down, standard):\n",
709 | "transpositionMode = \"Standard\" # @param [\"M -> F\", \"F -> M\", \"Standard\"]\n",
710 | "transposition = 0\n",
711 | "\n",
712 | "if transpositionMode == \"M -> F\":\n",
713 | " transposition = 12\n",
714 | "elif transpositionMode == \"F -> M\":\n",
715 | " transposition = -12\n",
716 | "elif transpositionMode == \"Standard\":\n",
717 | " transposition = 0\n",
718 | "\n",
719 | "# @markdown ---\n",
720 | "# @markdown ## Mode:\n",
721 | "mode = \"Standard\" # @param [\"M -> F\", \"F -> M\", \"Standard\", \"Manual settings\"]\n",
722 | "\n",
723 | "# @markdown ---\n",
724 | "# @markdown ### Manual settings:\n",
725 | "# @markdown ###### (ignored unless the Manual settings mode is selected)\n",
726 | "# @markdown ##### Quefrency (default 0.0 ms):\n",
727 | "quefrency = 0 # @param {type:\"slider\", min:0.0, max:2, step:0.1}\n",
728 | "# @markdown ##### Timbre factor (default 1.0):\n",
729 | "tembre = 1 # @param {type:\"slider\", min:0.0, max:2, step:0.1}\n",
730 | "\n",
731 | "if mode == \"M -> F\":\n",
732 | " quefrency = 0.5\n",
733 | " tembre = 1.3\n",
734 | "elif mode == \"F -> M\":\n",
735 | " quefrency = 1.0\n",
736 | " tembre = 0.7\n",
737 | "elif mode == \"Standard\":\n",
738 | " quefrency = 0.0\n",
739 | " tembre = 1.0\n",
740 | "\n",
741 | "# @markdown ### Standard mode is faster! Try adjusting only the pitch first, then experiment with the modes.\n",
742 | "# @markdown ---\n",
743 | "# @markdown ## Hint:\n",
744 | "# @markdown ###### Female to male: quefrency = 1.0 (higher than the default), tembre = 0.7 (lower than the default)\n",
745 | "# @markdown ###### Male to female: quefrency = 0.5 (slightly higher than the default), tembre = 1.3 (higher than the default)\n",
746 | "# @markdown ###### There is no universal solution; the result depends primarily on the source voice and the target model. Use the values above as a starting point.\n",
747 | "%cd /content/Mangio-RVC-Fork\n",
748 | "\n",
749 | "# \"\\n arg 1) model name with .pth in ./weights: mi-test.pth\"\n",
750 | "# \"\\n arg 2) source audio path: myFolder\\\\MySource.wav\"\n",
751 | "# \"\\n arg 3) output file name to be placed in './audio-outputs': MyTest.wav\"\n",
752 | "# \"\\n arg 4) feature index file path: logs/mi-test/added_IVF3042_Flat_nprobe_1.index\"\n",
753 | "# \"\\n arg 5) speaker id: 0\"\n",
754 | "# \"\\n arg 6) transposition: 0\"\n",
755 | "# \"\\n arg 7) f0 method: harvest (pm, harvest, crepe, crepe-tiny, hybrid[x,x,x,x], mangio-crepe, mangio-crepe-tiny, rmvpe)\"\n",
756 | "# \"\\n arg 8) crepe hop length: 160\"\n",
757 | "# \"\\n arg 9) harvest median filter radius: 3 (0-7)\"\n",
758 | "# \"\\n arg 10) post resample rate: 0\"\n",
759 | "# \"\\n arg 11) mix volume envelope: 1\"\n",
760 | "# \"\\n arg 12) feature index ratio: 0.78 (0-1)\"\n",
761 | "# \"\\n arg 13) Voiceless Consonant Protection (Less Artifact): 0.33 (Smaller number = more protection. 0.50 means Dont Use.)\"\n",
762 | "# \"\\n arg 14) Whether to formant shift the inference audio before conversion: False (if set to false, you can ignore setting the quefrency and timbre values for formanting)\"\n",
763 | "# \"\\n arg 15)* Quefrency for formanting: 8.0 (no need to set if arg14 is False/false)\"\n",
764 | "# \"\\n arg 16)* Timbre for formanting: 1.2 (no need to set if arg14 is False/false) \\n\"\n",
765 | "# \"\\nExample: mi-test.pth audios/Sidney.wav myTest.wav logs/mi-test/added_index.index 0 -2 harvest 160 3 0 1 0.95 0.33 0.45 True 8.0 1.2\"\n",
766 | "is_formant = \"False\"\n",
767 | "if quefrency != 0 or tembre != 1:\n",
768 | " is_formant = \"True\"\n",
769 | "\n",
770 | "quefrency_value = \"{:.1f}\".format(Decimal(quefrency).quantize(Decimal('0.1')))\n",
771 | "tembre_value = \"{:.1f}\".format(Decimal(tembre).quantize(Decimal('0.1')))\n",
772 | "transposition_value = str(transposition)\n",
773 | "\n",
774 | "cmd = MODEL + \".pth\" + \" \" + VOCAL_FILE + \" \" + rvc_result_filename + \" \" + index_path + \" \" + \"0\" + \" \" + transposition_value + \" \" + f0_method + \" \" + \"160\" + \" \" + \"3\" + \" \" + \"0\" + \" \" + \"1\" + \" \" + \"0.78\" + \" \" + \"0.33\" + \" \" + \"0.45\" + \" \" + is_formant + \" \" + quefrency_value + \" \" + tembre_value\n",
775 | "print(cmd)\n",
776 | "!echo -e -n \"go infer\\n{cmd}\\nstop_infer\" | python3 infer-web.py --colab --pycmd python3 --is_cli &> /dev/null\n",
777 | "%mv /content/Mangio-RVC-Fork/audio-outputs/{rvc_result_filename} {RVC_RESULT_FILE}\n",
778 | "audio = Audio(RVC_RESULT_FILE, autoplay=False)\n",
779 | "display(audio)"
780 | ]
781 | },
782 | {
783 | "cell_type": "code",
784 | "source": [
785 | "%cd /content\n",
786 | "# @title # Post-processing\n",
787 | "# @markdown ### Compressor + normalization + light reverb + stereo panning\n",
788 | "\n",
789 | "from IPython.display import Audio, display, HTML, FileLink\n",
790 | "import os\n",
791 | "import shutil\n",
792 | "\n",
793 | "OUTPUT_PATH = '/content/output'\n",
794 | "PROCESSED_OUTPUT_FORMAT = 'mp3'\n",
795 | "COMPRESSED_RESULT_FILE = os.path.join(OUTPUT_PATH, f\"{os.path.splitext(RVC_RESULT_FILE)[0]}_compressed.{PROCESSED_OUTPUT_FORMAT}\")\n",
796 | "PROCESSED_RESULT_FILE = os.path.join(OUTPUT_PATH, f\"{os.path.splitext(RVC_RESULT_FILE)[0]}_processed.{PROCESSED_OUTPUT_FORMAT}\")\n",
797 | "\n",
798 | "# compress the vocals\n",
799 | "!ffmpeg -y -i {RVC_RESULT_FILE} -filter_complex \"anlmdn=s=10,acompressor=threshold=-20dB:ratio=4:attack=20:release=200,volume=2,loudnorm=I=-13:TP=-1.0:LRA=9,volume=1.5\" {COMPRESSED_RESULT_FILE}\n",
800 | "if not os.path.isfile(COMPRESSED_RESULT_FILE):\n",
801 | " print(f\"Failed to process file {RVC_RESULT_FILE}\")\n",
802 | " shutil.copy(RVC_RESULT_FILE, PROCESSED_RESULT_FILE)\n",
803 | "else:\n",
804 | " if os.path.isfile(IMPULSE_FILE):\n",
805 | " # add reverb, processing the left and right channels differently to create a stereo effect\n",
806 | " print(\"Adding reverb with different processing for the left and right channels to create a stereo effect\")\n",
807 | " !ffmpeg -y -i {COMPRESSED_RESULT_FILE} -i {IMPULSE_FILE} -filter_complex \"[0:a]asplit=2[splita][splitb]; [splita]adelay=40|40[splita_delayed]; [splitb]adelay=20|20[splitb_delayed]; [splita_delayed][1]afir=dry=10:wet=10[reverb_left]; [splitb_delayed][1]afir=dry=10:wet=10[reverb_right]; [reverb_left][reverb_right]amerge=inputs=2[reverb]; [0:a][reverb]amix=inputs=2:weights=20 1[audio]\" -map \"[audio]\" {PROCESSED_RESULT_FILE}\n",
808 | " if os.path.isfile(PROCESSED_RESULT_FILE):\n",
809 | " !rm -rf {COMPRESSED_RESULT_FILE}\n",
810 | " else:\n",
811 | " print(f\"Failed to process the compressed file {COMPRESSED_RESULT_FILE}\")\n",
812 | " shutil.move(COMPRESSED_RESULT_FILE, PROCESSED_RESULT_FILE)\n",
813 | " else:\n",
814 | " print(f\"Impulse file not found: {IMPULSE_FILE}\")\n",
815 | " shutil.move(COMPRESSED_RESULT_FILE, PROCESSED_RESULT_FILE)\n",
816 | "\n",
817 | "audio = Audio(PROCESSED_RESULT_FILE, autoplay=False)\n",
818 | "display(audio)"
819 | ],
820 | "metadata": {
821 | "cellView": "form",
822 | "id": "IzIpIvLUxWmq"
823 | },
824 | "execution_count": null,
825 | "outputs": []
826 | },
827 | {
828 | "cell_type": "code",
829 | "source": [
830 | "%cd /content\n",
831 | "# @title # Merging\n",
832 | "\n",
833 | "from IPython.display import Audio, display, HTML, FileLink\n",
834 | "import os\n",
835 | "\n",
836 | "OUTPUT_PATH = '/content/output' #@param {type:\"string\"}\n",
837 | "OUTPUT_FORMAT = 'mp3'\n",
838 | "\n",
839 | "RESULT_FILE = os.path.join(OUTPUT_PATH, INPUT_NAME + \".\" + OUTPUT_FORMAT)\n",
840 | "\n",
841 | "!ffmpeg -y -i {PROCESSED_RESULT_FILE} -i {INSTRUM_FILE} -filter_complex \"[0:a][1:a]amerge=inputs=2[a]\" -map \"[a]\" -ac 2 {RESULT_FILE}\n",
842 | "\n",
843 | "audio = Audio(RESULT_FILE, autoplay=False)\n",
844 | "display(audio)"
845 | ],
846 | "metadata": {
847 | "cellView": "form",
848 | "id": "Xp05zq7DgcvU"
849 | },
850 | "execution_count": null,
851 | "outputs": []
852 | },
853 | {
854 | "cell_type": "markdown",
855 | "source": [
856 | "### Done\n",
857 | "#### You can now return to any previous step without rerunning the whole flow. For example, you can load a different model and then rerun only the vocal conversion, since the vocals are already separated from the instrumental."
858 | ],
859 | "metadata": {
860 | "id": "Tr6iEhD2fi0d"
861 | }
862 | }
863 | ],
864 | "metadata": {
865 | "accelerator": "GPU",
866 | "colab": {
867 | "provenance": [],
868 | "toc_visible": true
869 | },
870 | "kernelspec": {
871 | "display_name": "Python 3",
872 | "name": "python3"
873 | },
874 | "language_info": {
875 | "name": "python"
876 | }
877 | },
878 | "nbformat": 4,
879 | "nbformat_minor": 0
880 | }
881 |
--------------------------------------------------------------------------------
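The conversion cell in the notebook above builds a single space-separated argument line for the Mangio-RVC-Fork CLI by string concatenation. As a rough illustration only, the same logic can be sketched as a small standalone function. The function name `build_infer_args`, the preset key names, and the example paths are hypothetical; the fixed values (`160`, `3`, `0`, `1`, `0.78`, `0.33`, `0.45`) and the preset numbers are copied from the cell as written, not verified against the CLI itself.

```python
# Formant presets (quefrency, timbre) copied from the notebook's conversion cell.
# The key names are hypothetical; the notebook uses dropdown labels instead.
FORMANT_PRESETS = {
    "male_to_female": (0.5, 1.3),
    "female_to_male": (1.0, 0.7),
    "standard": (0.0, 1.0),
}


def build_infer_args(model, vocal_file, out_name, index_path,
                     transposition=0, preset="standard", f0_method="rmvpe"):
    """Return the space-separated argument line that the notebook pipes into the CLI."""
    quefrency, timbre = FORMANT_PRESETS[preset]
    # Assumption: formant shifting is requested when either value differs from
    # its default. For the three presets above this gives the same result as
    # the notebook's own check.
    use_formant = quefrency != 0 or timbre != 1
    fields = [
        f"{model}.pth",      # model file name
        vocal_file,          # source audio path
        out_name,            # output file name
        index_path,          # feature index file path
        "0",                 # speaker id
        str(transposition),  # semitones; 12 is one octave up
        f0_method,           # pitch extraction method
        "160", "3", "0", "1", "0.78", "0.33", "0.45",  # fixed values from the cell
        str(use_formant),
        f"{quefrency:.1f}",
        f"{timbre:.1f}",
    ]
    return " ".join(fields)


if __name__ == "__main__":
    print(build_infer_args("voice", "/content/output_uvr/song_vocals.wav",
                           "rvc_result.mp3", "logs/voice/added.index",
                           transposition=12, preset="male_to_female"))
```

Joining fields with spaces means a path containing a space would be split into two arguments, which is also true of the concatenation in the notebook, so file names without spaces are assumed here.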