├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | /cache
2 | /env
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Dubverse
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # RVC Data Prep: An Open-Source RVC Data Preparation Tool
3 |
4 | *a Dubverse Black initiative*
5 |
6 | [Open in Colab](https://colab.research.google.com/drive/1NA2GuJ2y-zRfoG3NearNiMCQa8NbidSh?usp=sharing)
7 | [Join our Discord](https://discord.gg/4VGnrgpBN)
8 |
9 |
10 | ------
11 |
12 | ## Description
13 | RVC Data Prep is a tool for turning audio/video content into isolated vocals. If a video contains multiple speakers, it generates a separate file for each one. Under the hood it uses Facebook's Demucs to isolate vocals and Pyannote speaker embeddings to identify and separate the individual speakers.
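Conceptually, the processing boils down to the two steps sketched below. This is a simplified illustration, not the exact code in `utils.py`; the paths and `YOUR_HF_TOKEN` are placeholders:

```python
import subprocess
from pyannote.audio import Pipeline

# 1) Demucs isolates the vocal stem from the input audio (written as mp3).
subprocess.run(
    ["python3", "-m", "demucs.separate", "-n", "htdemucs",
     "--two-stems=vocals", "--mp3", "-o", "output", "input/file.mp3"],
    check=True,
)

# 2) Pyannote diarization labels who speaks when in the isolated vocals.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0",
                                    use_auth_token="YOUR_HF_TOKEN")
diarization = pipeline("output/htdemucs/file/vocals.mp3")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```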
14 |
15 | ## Features
16 | 1. Isolate vocals from YouTube videos
17 | 2. Distinguish multiple speakers and provide separate files
18 | 3. Trim silences greater than 300 ms from the audio
19 | 4. (Beta) Separate multi-singer Acapellas
20 |
21 | ## Prerequisites
22 | Before you start using this tool, ensure that you have the following installed:
23 | - Python version 3.10 or newer
24 | - Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
25 | - Accept [`pyannote/speaker-diarization-3.0`](https://hf.co/pyannote/speaker-diarization-3.0) user conditions
26 | - Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens) (see the quick check below)
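If you want to sanity-check the token before running anything, here is a minimal sketch using `huggingface_hub` (installed via the requirements); `YOUR_HF_TOKEN` is a placeholder:

```python
from huggingface_hub import HfApi

# Prints your account details if the token is valid; raises an error otherwise.
print(HfApi().whoami(token="YOUR_HF_TOKEN"))
```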
27 |
28 | ## How to use
29 |
30 | Clone the repository
31 |
32 | ``` bash
33 | git clone https://github.com/dubverse-ai/rvc-data-prep.git
34 | ```
35 |
36 | Change into the repository and install the dependencies (you will import `utils.py` in the next step):
37 |
38 | ``` bash
39 | cd rvc-data-prep
40 | pip install -r requirements.txt
41 | ```
42 |
43 | The `clean` function in `utils.py` automatically processes a given file (mp3, wav, ogg and flac are supported). You need to set its parameters according to your use case.
44 | Parameters:
45 |
46 | - `local` _(bool)_: Set this to `True` if you are providing a local file; `False` if you want to create a dataset from a YouTube link.
47 |
48 | - `file_path` _(str)_: Either a local file path or a YouTube URL, depending on the value of `local`.
49 |
50 | - `project_name` _(str)_: The name of the project directory under which the processed files will be saved.
51 |
52 | - `acapella` _(bool)_: (BETA) If `True`, the function inserts blank audio segments while separating and segregating speakers, so the output files add up in the time domain to recreate the original file.
53 |
54 | - `single_speaker` _(bool)_: If `True`, the file is treated as having a single speaker and speaker diarization is skipped.
55 |
56 | - `token` _(str)_: Your Hugging Face access token. It is only needed when working with files that contain multiple speakers; otherwise you can leave it blank.
57 |
58 | Here is an example of how to use the `clean` function:
59 |
60 | ```python
61 | from utils import clean
62 |
63 | clean(local=False,
64 |       file_path="https://www.youtube.com/watch?v=someVideoId",
65 |       project_name="myProject",
66 |       acapella=True,
67 |       token="yourToken",
68 |       single_speaker=False)
69 | ```
70 | In this example, we provide a YouTube video URL as `file_path`, set the `project_name` to "myProject", and request an acapella output by setting `acapella` to `True`. We indicate that there may be more than one speaker by setting `single_speaker` to `False`, and pass our Hugging Face token as `token`.
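For comparison, a call for a local, single-speaker recording might look like the sketch below (the file path is a placeholder; no token is needed because diarization is skipped):

```python
from utils import clean

clean(local=True,
      file_path="/path/to/recording.mp3",  # placeholder local file
      project_name="localProject",
      acapella=False,
      token="",                            # not needed for a single speaker
      single_speaker=True)
```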
71 |
72 | ## YouTube Tutorial
73 | [Watch the walkthrough on YouTube](https://www.youtube.com/watch?v=QLQ8eSGZDi8)
74 |
75 |
76 | ## Examples
77 | | **Input Video** | **Separated Files** |
78 | |---------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
79 | | [Shahrukh Khan's Speech](https://www.youtube.com/shorts/tsgWNmVU_B0) | [Vocals](https://dl.sndup.net/qhp6/srk-cleaned.mp3) |
80 | | [Yeh Ladka Haaye Allah - Bollywood Song](https://www.youtube.com/watch?v=BE8_rNJOQ-0) | [Udit Narayan's Vocals](https://dl.sndup.net/rqmp/SPEAKER_00.mp3), [Alka Yagnik's Vocals](https://dl.sndup.net/rg4g/SPEAKER_02.mp3), [Chorus](https://dl.sndup.net/d8s9/SPEAKER_01.mp3), [Other ambiguous sounds](https://dl.sndup.net/wd2y/SPEAKER_03.mp3) |
81 | | [Perfect - Ed Sheeran Duet](https://www.youtube.com/watch?v=817P8W8-mGE) | [Ed Sheeran's Vocals](https://dl.sndup.net/gmjf/perfect.mp3), [Beyonce's Vocals](https://dl.sndup.net/h4qs/perfectf.mp3) |
82 |
83 | ## Known Issues
84 | * Struggles when multiple people speak at the same time (overlapping speech)
85 | * When using `acapella=True`, some audio segments are occasionally skipped, which makes manual syncing difficult.
86 |
87 | ## Contributing
88 | We welcome contributions from anyone and everyone. Details about how to contribute, what we are looking for and how to get started can be found in our contributing guidelines.
89 |
90 | ## Support
91 | For any issues, queries, or suggestions, join our [Discord server](https://discord.gg/4VGnrgpBN). We will be glad to help!
92 |
93 | ## Future Scope
94 | - Add multispeaker Acapella support
95 | - Integrate this in the RVC workflow - base data preparation and creating AI covers
96 | - Improve the efficiency of speaker identification using other models such as TitaNet
97 |
98 | ## About Us
99 | We, at **Dubverse.ai**, are a dedicated and passionate group of developers who have been working for over three years on generative AI with a specific emphasis on audio. We deeply believe in the potential of AI to revolutionize the fields of video, voiceover, podcasts and other media-related applications.
100 |
101 | Our passion and dedication don't stop at development. We believe in sharing knowledge and nurturing a community of like-minded enthusiasts. That's why we maintain a deep tech blog where we talk about our latest research, development, trends in the field, and insights about generative AI and audio technologies.
102 |
103 | Check out some of our RVC blog posts:
104 |
105 | 1. [Evals are all we need](https://black.dubverse.ai/p/evals-are-all-we-need)
106 | 2. [Running RVC Models on the Easy GUI](https://black.dubverse.ai/p/running-rvc-models-on-the-easy-gui)
107 |
108 | We are always open to hearing from others who share our passion. Whether you're an expert in the field, a hobbyist, or just someone intrigued by AI and audio, feel free to reach out and connect with us.
109 |
110 |
111 | ## License
112 | RVC Data Prep is licensed under the MIT License - see the [LICENSE](LICENSE) file for details
113 |
114 | *Disclaimer: This repo is not affiliated with YouTube, Facebook AI Research, or Pyannote. All trademarks referred to are the property of their respective owners.*
115 |
116 | ## Acknowledgements
117 | 1. Facebook's Demucs, Pyannote Audio, Librosa, FFmpeg, and other audio-related libraries.
118 | 2. The Dubverse Black Discord and the AI Hub Discord for quick and actionable feedback.
119 |
120 | -----------------------------------------------------------------------------
121 |
122 | We value your feedback and encourage you to provide us with any suggestions or issues that you may encounter. Let's make this tool better together!
123 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.9.1
2 | aiosignal==1.3.1
3 | alembic==1.12.1
4 | antlr4-python3-runtime==4.9.3
5 | asteroid-filterbanks==0.4.0
6 | async-timeout==4.0.3
7 | attrs==23.1.0
8 | audioread==3.0.1
9 | Brotli==1.1.0
10 | certifi==2023.11.17
11 | cffi==1.16.0
12 | charset-normalizer==3.3.2
13 | click==8.1.7
14 | cloudpickle==3.0.0
15 | colorama==0.4.6
16 | coloredlogs==15.0.1
17 | colorlog==6.7.0
18 | contourpy==1.2.0
19 | cycler==0.12.1
20 | decorator==5.1.1
21 | demucs==4.0.1
22 | docopt==0.6.2
23 | dora-search==0.1.12
24 | einops==0.7.0
25 | filelock==3.13.1
26 | flatbuffers==23.5.26
27 | fonttools==4.45.1
28 | frozenlist==1.4.0
29 | fsspec==2023.10.0
30 | greenlet==3.0.1
31 | huggingface-hub==0.19.4
32 | humanfriendly==10.0
33 | HyperPyYAML==1.2.2
34 | idna==3.6
35 | Jinja2==3.1.2
36 | joblib==1.3.2
37 | julius==0.2.7
38 | kiwisolver==1.4.5
39 | lameenc==1.6.3
40 | lazy_loader==0.3
41 | librosa==0.10.1
42 | lightning==2.1.2
43 | lightning-utilities==0.10.0
44 | llvmlite==0.41.1
45 | Mako==1.3.0
46 | markdown-it-py==3.0.0
47 | MarkupSafe==2.1.3
48 | matplotlib==3.8.2
49 | mdurl==0.1.2
50 | mpmath==1.3.0
51 | msgpack==1.0.7
52 | multidict==6.0.4
53 | mutagen==1.47.0
54 | networkx==3.2.1
55 | noisereduce==3.0.0
56 | numba==0.58.1
57 | numpy==1.26.2
58 | omegaconf==2.3.0
59 | onnxruntime==1.16.3
60 | openunmix==1.2.1
61 | optuna==3.4.0
62 | packaging==23.2
63 | pandas==2.1.3
64 | Pillow==10.1.0
65 | platformdirs==4.0.0
66 | pooch==1.8.0
67 | primePy==1.3
68 | protobuf==4.25.1
69 | pyannote.audio==3.1.0
70 | pyannote.core==5.0.0
71 | pyannote.database==5.0.1
72 | pyannote.metrics==3.2.1
73 | pyannote.pipeline==3.0.1
74 | pycparser==2.21
75 | pycryptodomex==3.19.0
76 | pydub==0.25.1
77 | Pygments==2.17.2
78 | pyparsing==3.1.1
79 | python-dateutil==2.8.2
80 | pytorch-lightning==2.1.2
81 | pytorch-metric-learning==2.3.0
82 | pytz==2023.3.post1
83 | PyYAML==6.0.1
84 | requests==2.31.0
85 | retrying==1.3.4
86 | rich==13.7.0
87 | ruamel.yaml==0.18.5
88 | ruamel.yaml.clib==0.2.8
89 | scikit-learn==1.3.2
90 | scipy==1.11.4
91 | semver==3.0.2
92 | sentencepiece==0.1.99
93 | shellingham==1.5.4
94 | six==1.16.0
95 | sortedcontainers==2.4.0
96 | soundfile==0.12.1
97 | soxr==0.3.7
98 | speechbrain==0.5.16
99 | SQLAlchemy==2.0.23
100 | submitit==1.5.1
101 | sympy==1.12
102 | tabulate==0.9.0
103 | tensorboardX==2.6.2.2
104 | threadpoolctl==3.2.0
105 | torch==2.1.1
106 | torch-audiomentations==0.11.0
107 | torch-pitch-shift==1.2.4
108 | torchaudio==2.1.1
109 | torchmetrics==1.2.0
110 | torchvision==0.16.1
111 | tqdm==4.66.1
112 | treetable==0.2.5
113 | typer==0.9.0
114 | typing_extensions==4.8.0
115 | tzdata==2023.3
116 | urllib3==2.1.0
117 | websockets==12.0
118 | yarl==1.9.3
119 | yt-dlp==2023.11.16
120 |
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import subprocess
3 | import io
4 | from pathlib import Path
5 | import select
6 | from shutil import rmtree
7 | import subprocess as sp
8 | import sys
9 | from typing import Dict, Tuple, Optional, IO
10 | from pyannote.audio import Pipeline
11 | import torch
12 | import shutil
13 |
14 | from pydub import AudioSegment
15 | from pydub.silence import split_on_silence
16 |
17 |
18 | def clean(local, file_path, project_name, acapella, token, single_speaker):
19 | project_dir = os.getcwd() + "/" + project_name + "/"
20 |     # Demucs is invoked with --mp3 (see separate()), so the isolated vocals
21 |     # are always written as mp3, regardless of the input format.
22 |     ext = "mp3"
24 |
25 | print(f"Project Folder: {project_dir}; ext: {ext}")
26 |
27 | setup_project(local, file_path, project_name)
28 | separate(inp = project_dir + "input", outp = project_dir + "output")
29 |
30 |     if single_speaker:
31 |         if acapella:
32 |             print(f"Isolated vocals; file saved at {project_dir}output/htdemucs/file/")
33 |         else:
34 |             remove_silences(project_dir, ext)
35 |             print(f"Separated vocals, removed silences; file saved at {project_dir}output/htdemucs/file/")
36 |     else:
37 |         if not acapella:
38 |             diarize_dataset(token, project_dir, ext, acapella=False, silences=True)
39 |             print(f"Separated speakers and vocals, removed silences; files saved at {project_dir}output/htdemucs/file/")
40 |         else:
41 |             diarize_dataset(token, project_dir, ext, acapella=True, silences=False)
42 |             print(f"Separated speakers and vocals; files saved at {project_dir}output/htdemucs/file/")
43 |
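# Split the isolated vocals into one file per detected speaker using pyannote
# speaker diarization. With acapella=True, silence is inserted during the other
# speakers' turns so every output file keeps the original timeline.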
44 | def diarize_dataset(token, project_dir, ext, acapella, silences):
45 |     if silences:
46 | remove_silences(project_dir, ext)
47 | file_path = f"{project_dir}output/htdemucs/file/silences_removed.{ext}"
48 | else:
49 | file_path = f"{project_dir}output/htdemucs/file/vocals.{ext}"
50 |
51 | pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0", use_auth_token=token)
52 |
53 |     # Use a GPU when one is available; otherwise fall back to the CPU.
54 |     pipeline.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
55 |     diarization = pipeline(file_path)
56 | audio = AudioSegment.from_mp3(file_path)
57 |
58 | speakers = []
59 | for turn, _, speaker in diarization.itertracks(yield_label=True):
60 | speakers.append(speaker)
61 |
62 | speakers = list(set(speakers))
63 |
64 | buffer = {}
65 | for s in speakers:
66 | buffer[s] = AudioSegment.empty()
67 |
70 | for turn, _, speaker in diarization.itertracks(yield_label=True):
71 |
72 | start = int(turn.start * 1000)
73 | end = int(turn.end * 1000)
74 | diff = end - start
75 |
76 |         if not acapella:
77 | speaker_audio = audio[start:end]
78 | buffer[speaker] += speaker_audio
79 | else:
80 | blank = AudioSegment.silent(diff)
81 |             for spk in speakers:
82 |                 if spk == speaker:
83 |                     speaker_audio = audio[start:end]
84 |                     buffer[speaker] += speaker_audio
85 |                 else:
86 |                     buffer[spk] += blank
87 |
88 | for s in speakers:
89 |         buffer[s].export(project_dir + "output/htdemucs/file/" + s + "." + ext, format=ext)
90 |
91 | return speakers
92 |
93 |
94 | def remove_silences(project_dir, ext):
95 |
96 |     file_path = project_dir + "output/htdemucs/file/vocals." + ext
97 | file_name = "silences_removed.mp3"
98 | audio_format = "mp3"
99 | sound = AudioSegment.from_file(file_path, format = audio_format)
100 |     # min_silence_len and keep_silence are in milliseconds; silence_thresh is in dBFS.
101 |     audio_chunks = split_on_silence(sound,
102 |                                     min_silence_len=100,
103 |                                     silence_thresh=-45,
104 |                                     keep_silence=50)
105 |
106 | combined = AudioSegment.empty()
107 | for chunk in audio_chunks:
108 | combined += chunk
109 | combined.export(f'{project_dir}output/htdemucs/file/{file_name}', format = audio_format)
110 |
111 | def setup_project(local, file_path, project_name):
112 |
113 | pwd = os.getcwd()
114 |     os.makedirs(f"{pwd}/{project_name}/input", exist_ok=True)
115 |     os.makedirs(f"{pwd}/{project_name}/output", exist_ok=True)
117 | print(f"Created the project directory at {pwd}/{project_name}")
118 |
119 | if local:
120 | ext = file_path.split(".")[-1]
121 | shutil.copy(file_path, f"{pwd}/{project_name}/input/file.{ext}")
122 | print("Copied files from the given path to the working directory")
123 | else:
124 | out_dir = f"{pwd}/{project_name}/input/file.mp3"
125 |         subprocess.run(["yt-dlp", "-x", "--audio-format", "mp3", "-o", out_dir, file_path])
126 | print("Downloaded the YouTube video and saved it in the project directory")
127 |
128 |
129 | def find_files(in_path):
130 | out = []
131 | for file in Path(in_path).iterdir():
132 | if file.suffix.lower().lstrip(".") in ["mp3", "wav", "ogg", "flac"]:
133 | out.append(file)
134 | return out
135 |
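# Mirror a child process's stdout/stderr to this process's streams in real time,
# so demucs progress output stays visible while it runs.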
136 | def copy_process_streams(process: sp.Popen):
137 | def raw(stream: Optional[IO[bytes]]) -> IO[bytes]:
138 | assert stream is not None
139 | if isinstance(stream, io.BufferedIOBase):
140 | stream = stream.raw
141 | return stream
142 |
143 | p_stdout, p_stderr = raw(process.stdout), raw(process.stderr)
144 |     stream_by_fd: Dict[int, Tuple[IO[bytes], IO[str]]] = {
145 | p_stdout.fileno(): (p_stdout, sys.stdout),
146 | p_stderr.fileno(): (p_stderr, sys.stderr),
147 | }
148 | fds = list(stream_by_fd.keys())
149 |
150 | while fds:
151 | ready, _, _ = select.select(fds, [], [])
152 | for fd in ready:
153 | p_stream, std = stream_by_fd[fd]
154 | raw_buf = p_stream.read(2 ** 16)
155 | if not raw_buf:
156 | fds.remove(fd)
157 | continue
158 | buf = raw_buf.decode()
159 | std.write(buf)
160 | std.flush()
161 |
162 | def separate(inp, outp):
163 |     # Run demucs (htdemucs model), keeping only the vocals stem as 320 kbps mp3.
164 |     cmd = ["python3", "-m", "demucs.separate", "-o", str(outp), "-n", "htdemucs"]
165 |     cmd += ["--mp3", "--mp3-bitrate=320"]
166 |     cmd += ["--two-stems=vocals"]
168 |
169 | files = [str(f) for f in find_files(inp)]
170 | if not files:
171 |         print(f"No valid audio files in {inp}")
172 | return
173 | p = sp.Popen(cmd + files, stdout=sp.PIPE, stderr=sp.PIPE)
174 | copy_process_streams(p)
175 | p.wait()
176 | if p.returncode != 0:
177 | print("Command failed, something went wrong.")
178 |
--------------------------------------------------------------------------------