├── .DS_Store
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── utils.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dubverse-ai/rvc-data-prep/ab8940bc668f178f75d3bdba4084d603dec5db19/.DS_Store
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
/cache
/env
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Dubverse

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# RVC Data Prep: An Open-Source RVC Data Preparation Tool
### a Dubverse Black initiative

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1NA2GuJ2y-zRfoG3NearNiMCQa8NbidSh?usp=sharing)
[![Discord Shield](https://discordapp.com/api/guilds/1162007551987171410/widget.png?style=shield)](https://discord.gg/4VGnrgpBN)

------

## Description
RVC Data Prep is an advanced tool for transforming audio/video content into isolated vocals. If a video contains multiple speakers, it generates a separate file for each one. The core functionality leverages Facebook's Demucs to isolate vocals and Pyannote embeddings to identify and differentiate speakers.

## Features
1. Isolate vocals from YouTube videos
2. Distinguish multiple speakers and provide separate files
3. Trim silences greater than 300ms from the audio
4. (Beta) Separate multi-singer Acapellas

## Prerequisites
Before you start using this tool, make sure you have completed the following:
- Install Python version 3.10 or newer
- Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
- Accept [`pyannote/speaker-diarization-3.0`](https://hf.co/pyannote/speaker-diarization-3.0) user conditions
- Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)

## How to use

Clone the repository

``` bash
git clone https://github.com/dubverse-ai/rvc-data-prep.git
```

Change the working directory and install the dependencies

``` bash
cd rvc-data-prep
pip install -r requirements.txt
```

The `clean` function in `utils.py` automatically processes a given file (wav, mp3, ogg and flac). Specify the parameters below depending on your needs.
Parameters:

- `local` _(bool)_: Set this to `True` if you intend to provide a local file; `False` if you intend to create a dataset from a YouTube link.

- `file_path` _(str)_: Either a local file path or a YouTube URL, depending on what you set `local` to.

- `project_name` _(str)_: The name of the project under which the processed files will be saved.

- `acapella` _(bool)_: (BETA) If this is `True`, the function inserts blank audio segments while separating and segregating speakers, so that the output files add up in the time domain to reconstruct the original file.

- `single_speaker` _(bool)_: If `True`, this flags the file as having a single speaker.

- `token` _(str)_: The access token of your Hugging Face account. You only need this when working with files involving multiple speakers; otherwise you can leave it blank.

Here is an example of how to use the `clean` function:

```python
from utils import clean

clean(local=False,
      file_path="https://www.youtube.com/watch?v=someVideoId",
      project_name="myProject",
      acapella=True,
      token="yourToken",
      single_speaker=False)
```
In this example, we provide a YouTube video URL as `file_path`, set the `project_name` to "myProject", and request an acapella output by setting `acapella` to `True`. We indicate that there may be more than one speaker by setting `single_speaker` to `False`, and pass our account token as `token`.
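
For a local recording with a single speaker, a call along these lines should also work (the file path and project name below are placeholders; no Hugging Face token is needed because diarization is skipped for single-speaker files):

```python
from utils import clean

# Process a local mp3 containing one speaker: Demucs isolates the
# vocals, then silences are trimmed from the result.
clean(local=True,
      file_path="/path/to/interview.mp3",   # placeholder local path
      project_name="interviewDataset",      # placeholder project name
      acapella=False,
      token="",                             # unused when single_speaker=True
      single_speaker=True)
```
The processed files are written under `interviewDataset/output/htdemucs/file/` in the current working directory.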

## YouTube Tutorial
[![YOUTUBE TUTORIAL](https://img.youtube.com/vi/QLQ8eSGZDi8/0.jpg)](https://www.youtube.com/watch?v=QLQ8eSGZDi8)


## Examples
| **Input Video** | **Separated Files** |
|-----------------|---------------------|
| [Shahrukh Khan's Speech](https://www.youtube.com/shorts/tsgWNmVU_B0) | [Vocals](https://dl.sndup.net/qhp6/srk-cleaned.mp3) |
| [Yeh Ladka Haaye Allah - Bollywood Song](https://www.youtube.com/watch?v=BE8_rNJOQ-0) | [Udit Narayan's Vocals](https://dl.sndup.net/rqmp/SPEAKER_00.mp3), [Alka Yagnik's Vocals](https://dl.sndup.net/rg4g/SPEAKER_02.mp3), [Chorus](https://dl.sndup.net/d8s9/SPEAKER_01.mp3), [Other ambiguous sounds](https://dl.sndup.net/wd2y/SPEAKER_03.mp3) |
| [Perfect - Ed Sheeran Duet](https://www.youtube.com/watch?v=817P8W8-mGE) | [Ed Sheeran's Vocals](https://dl.sndup.net/gmjf/perfect.mp3), [Beyonce's Vocals](https://dl.sndup.net/h4qs/perfectf.mp3) |

## Known Issues
* Struggles when multiple people are speaking at the same time
* When using `acapella = True`, some audio segments are occasionally skipped, which makes the output hard to sync manually.

## Contributing
We welcome contributions from anyone and everyone. Details about how to contribute, what we are looking for and how to get started can be found in our contributing guidelines.

## Support
For any issues, queries, and suggestions, join our [Discord server](https://discord.gg/4VGnrgpBN). We'll be glad to help!

## Future Scope
- Add multi-speaker Acapella support
- Integrate this into the RVC workflow: base data preparation and creating AI covers
- Improve the accuracy of speaker identification using other models such as Titanet

## About Us
We at **Dubverse.ai** are a dedicated and passionate group of developers who have been working for over three years on generative AI, with a specific emphasis on audio. We deeply believe in the potential of AI to revolutionize the fields of video, voiceover, podcasts and other media-related applications.

Our passion and dedication don't stop at development. We believe in sharing knowledge and nurturing a community of like-minded enthusiasts. That's why we maintain a deep tech blog where we talk about our latest research, development, trends in the field, and insights about generative AI and audio technologies.

Check out some of our RVC blog posts:

1. [Evals are all we need](https://black.dubverse.ai/p/evals-are-all-we-need)
2. [Running RVC Models on the Easy GUI](https://black.dubverse.ai/p/running-rvc-models-on-the-easy-gui)

We are always open to hearing from others who share our passion. Whether you're an expert in the field, a hobbyist, or just someone intrigued by AI and audio, feel free to reach out and connect with us.


## License
RVC Data Prep is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

*Disclaimer: This repo is not affiliated with YouTube, Facebook AI Research, or Pyannote. All trademarks referred to are the property of their respective owners.*

## Acknowledgements
1. Facebook Demucs, Pyannote Audio, Librosa, FFmpeg, and other audio-related libraries.
2. The Dubverse Black Discord and the AI Hub Discord for quick and actionable feedback.

-----------------------------------------------------------------------------

We value your feedback and encourage you to provide us with any suggestions or issues that you may encounter. Let's make this tool better together!

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiohttp==3.9.1
aiosignal==1.3.1
alembic==1.12.1
antlr4-python3-runtime==4.9.3
asteroid-filterbanks==0.4.0
async-timeout==4.0.3
attrs==23.1.0
audioread==3.0.1
Brotli==1.1.0
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.7.0
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
demucs==4.0.1
docopt==0.6.2
dora-search==0.1.12
einops==0.7.0
filelock==3.13.1
flatbuffers==23.5.26
fonttools==4.45.1
frozenlist==1.4.0
fsspec==2023.10.0
greenlet==3.0.1
huggingface-hub==0.19.4
humanfriendly==10.0
HyperPyYAML==1.2.2
idna==3.6
Jinja2==3.1.2
joblib==1.3.2
julius==0.2.7
kiwisolver==1.4.5
lameenc==1.6.3
lazy_loader==0.3
librosa==0.10.1
lightning==2.1.2
lightning-utilities==0.10.0
llvmlite==0.41.1
Mako==1.3.0
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.2
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
mutagen==1.47.0
networkx==3.2.1
noisereduce==3.0.0
numba==0.58.1
numpy==1.26.2
omegaconf==2.3.0
onnxruntime==1.16.3
openunmix==1.2.1
optuna==3.4.0
packaging==23.2
pandas==2.1.3
Pillow==10.1.0
platformdirs==4.0.0
pooch==1.8.0
primePy==1.3
protobuf==4.25.1
pyannote.audio==3.1.0
pyannote.core==5.0.0
pyannote.database==5.0.1
pyannote.metrics==3.2.1
pyannote.pipeline==3.0.1
pycparser==2.21
pycryptodomex==3.19.0
pydub==0.25.1
Pygments==2.17.2
pyparsing==3.1.1
python-dateutil==2.8.2
pytorch-lightning==2.1.2
pytorch-metric-learning==2.3.0
pytz==2023.3.post1
PyYAML==6.0.1
requests==2.31.0
retrying==1.3.4
rich==13.7.0
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
scikit-learn==1.3.2
scipy==1.11.4
semver==3.0.2
sentencepiece==0.1.99
shellingham==1.5.4
six==1.16.0
sortedcontainers==2.4.0
soundfile==0.12.1
soxr==0.3.7
speechbrain==0.5.16
SQLAlchemy==2.0.23
submitit==1.5.1
sympy==1.12
tabulate==0.9.0
tensorboardX==2.6.2.2
threadpoolctl==3.2.0
torch==2.1.1
torch-audiomentations==0.11.0
torch-pitch-shift==1.2.4
torchaudio==2.1.1
torchmetrics==1.2.0
torchvision==0.16.1
tqdm==4.66.1
treetable==0.2.5
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.1.0
websockets==12.0
yarl==1.9.3
yt-dlp==2023.11.16

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
import os
import subprocess
import io
from pathlib import Path
import select
from shutil import rmtree
import subprocess as sp
import sys
from typing import Dict, Tuple, Optional, IO
from pyannote.audio import Pipeline
import torch
import shutil

from pydub import AudioSegment
from pydub.silence import split_on_silence


def clean(local, file_path, project_name, acapella, token, single_speaker):
    """Download/copy the input, isolate vocals with Demucs and post-process them."""
    project_dir = os.getcwd() + "/" + project_name + "/"
    # Demucs is invoked with --mp3 (see separate()), so the separated vocals are
    # always written as mp3, regardless of the input format.
    ext = "mp3"

    print(f"Project Folder: {project_dir}; ext: {ext}")

    setup_project(local, file_path, project_name)
    separate(inp=project_dir + "input", outp=project_dir + "output")

    if single_speaker:
        if acapella == True:
            print(f"Isolated vocals and saved the file at {project_dir}output/htdemucs/file/")
        else:
            remove_silences(project_dir, ext)
            print(f"Separated vocals, removed silences and saved the file at {project_dir}output/htdemucs/file/")
    else:
        if acapella == False:
            diarize_dataset(token, project_dir, ext, acapella=False, silences=True)
            print(f"Separated speakers and vocals, removed silences and saved the files at {project_dir}output/htdemucs/file/")
        else:
            diarize_dataset(token, project_dir, ext, acapella=True, silences=False)
            print(f"Separated speakers and vocals and saved the files at {project_dir}output/htdemucs/file/")


def diarize_dataset(token, project_dir, ext, acapella, silences):
    """Split the separated vocals into one file per speaker using pyannote diarization."""
    if silences == True:
        remove_silences(project_dir, ext)
        file_path = f"{project_dir}output/htdemucs/file/silences_removed.{ext}"
    else:
        file_path = f"{project_dir}output/htdemucs/file/vocals.{ext}"

    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0", use_auth_token=token)

    # Run on GPU when one is available.
    if torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))
    #diarization = pipeline(file_path, embedding_exclude_overlap=True)
    diarization = pipeline(file_path)
    audio = AudioSegment.from_file(file_path)

    speakers = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        speakers.append(speaker)

    speakers = list(set(speakers))

    buffer = {}
    for s in speakers:
        buffer[s] = AudioSegment.empty()

    for turn, _, speaker in diarization.itertracks(yield_label=True):

        start = int(turn.start * 1000)
        end = int(turn.end * 1000)
        diff = end - start

        if acapella == False:
            # Concatenate each speaker's segments back to back.
            speaker_audio = audio[start:end]
            buffer[speaker] += speaker_audio
        else:
            # Acapella mode: pad the other speakers with silence of the same length
            # so that all output files stay aligned on the original timeline.
            blank = AudioSegment.silent(diff)
            for other in speakers:
                if other == speaker:
                    speaker_audio = audio[start:end]
                    buffer[speaker] += speaker_audio
                else:
                    buffer[other] += blank

    for s in speakers:
        buffer[s].export(project_dir + "output/htdemucs/file/" + s + "." + ext)

    return speakers


def remove_silences(project_dir, ext):
    """Trim silences from the separated vocals and write silences_removed.<ext>."""
    file_path = project_dir + "output/htdemucs/file/vocals." + ext
    file_name = f"silences_removed.{ext}"
    audio_format = ext
    sound = AudioSegment.from_file(file_path, format=audio_format)
    audio_chunks = split_on_silence(sound,
                                    min_silence_len=100,
                                    silence_thresh=-45,
                                    keep_silence=50)

    combined = AudioSegment.empty()
    for chunk in audio_chunks:
        combined += chunk
    combined.export(f'{project_dir}output/htdemucs/file/{file_name}', format=audio_format)


def setup_project(local, file_path, project_name):
    """Create the project folders and place the source audio in <project>/input/."""
    pwd = os.getcwd()
    subprocess.run(f"mkdir {pwd}/{project_name}", shell=True)
    subprocess.run(f"mkdir {pwd}/{project_name}/input", shell=True)
    subprocess.run(f"mkdir {pwd}/{project_name}/output", shell=True)
    print(f"Created the project directory at {pwd}/{project_name}")

    if local:
        ext = file_path.split(".")[-1]
        shutil.copy(file_path, f"{pwd}/{project_name}/input/file.{ext}")
        print("Copied the file from the given path to the project directory")
    else:
        out_dir = f"{pwd}/{project_name}/input/file.mp3"
        subprocess.run(f"yt-dlp -x --audio-format mp3 -o {out_dir} {file_path}", shell=True)
        print("Downloaded the YouTube video and saved it in the project directory")


def find_files(in_path):
    """Return all supported audio files directly inside in_path."""
    out = []
    for file in Path(in_path).iterdir():
        if file.suffix.lower().lstrip(".") in ["mp3", "wav", "ogg", "flac"]:
            out.append(file)
    return out


def copy_process_streams(process: sp.Popen):
    """Stream a child process's stdout/stderr to this process's stdout/stderr."""
    def raw(stream: Optional[IO[bytes]]) -> IO[bytes]:
        assert stream is not None
        if isinstance(stream, io.BufferedIOBase):
            stream = stream.raw
        return stream

    p_stdout, p_stderr = raw(process.stdout), raw(process.stderr)
    stream_by_fd: Dict[int, Tuple[IO[bytes], IO[str]]] = {
        p_stdout.fileno(): (p_stdout, sys.stdout),
        p_stderr.fileno(): (p_stderr, sys.stderr),
    }
    fds = list(stream_by_fd.keys())

    while fds:
        ready, _, _ = select.select(fds, [], [])
        for fd in ready:
            p_stream, std = stream_by_fd[fd]
            raw_buf = p_stream.read(2 ** 16)
            if not raw_buf:
                fds.remove(fd)
                continue
            buf = raw_buf.decode()
            std.write(buf)
            std.flush()


def separate(inp, outp):
    """Run Demucs (htdemucs, two stems) on every audio file in inp, writing mp3s to outp."""
    cmd = ["python3", "-m", "demucs.separate", "-o", str(outp), "-n", "htdemucs"]
    cmd += ["--mp3", "--mp3-bitrate=320"]
    cmd += ["--two-stems=vocals"]

    files = [str(f) for f in find_files(inp)]
    if not files:
        print(f"No valid audio files in {inp}")
        return
    p = sp.Popen(cmd + files, stdout=sp.PIPE, stderr=sp.PIPE)
    copy_process_streams(p)
    p.wait()
    if p.returncode != 0:
        print("Command failed, something went wrong.")
--------------------------------------------------------------------------------
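
If the vocals for a project have already been separated, the helpers in `utils.py` can also be driven directly instead of going through `clean`. A minimal sketch (the project directory and token below are placeholders, and the folder layout must match what `separate` produces):

```python
from utils import diarize_dataset

# Assumes <project_dir>/output/htdemucs/file/vocals.mp3 already exists.
project_dir = "./myProject/"   # placeholder; keep the trailing slash, as clean() does
speakers = diarize_dataset(
    token="hf_xxx",            # placeholder Hugging Face access token
    project_dir=project_dir,
    ext="mp3",
    acapella=False,
    silences=True,             # trim silences before diarization
)
print(speakers)                # e.g. ['SPEAKER_00', 'SPEAKER_01']
```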