├── README.md
└── Parallel-data-free emotional voice conversion with CycleGAN and CWT
    ├── utils.py
    ├── preprocess.py
    ├── convert_separate.py
    ├── train_f0.py
    ├── module.py
    ├── module_f0.py
    ├── module_mceps.py
    ├── model.py
    ├── model_f0.py
    ├── model_mceps.py
    └── train.py
/README.md:
--------------------------------------------------------------------------------
 1 | # Emotional Voice Conversion and/or Speaker Identity Conversion with Non-Parallel Training Data
 2 | 
 3 | **This is an implementation of our CycleGAN-based emotional voice conversion framework (to appear in Speaker Odyssey 2020) that converts both spectrum and prosody features. Please kindly cite our paper if you use our code:**
 4 | 
 5 | *Kun Zhou, Berrak Sisman, and Haizhou Li, “Transforming spectrum and prosody for emotional voice conversion with non-parallel training data,” arXiv preprint arXiv:2002.00198, 2020*
 6 | 
 7 | **Bibtex:**
 8 | ```
 9 | @article{zhou2020transforming,
10 |   title={Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data},
11 |   author={Zhou, Kun and Sisman, Berrak and Li, Haizhou},
12 |   journal={arXiv preprint arXiv:2002.00198},
13 |   year={2020}
14 | }
15 | ```
16 | 
17 | Dependencies
18 | -------
19 | 
20 | `Python 3.5`
21 | 
22 | `Numpy 1.14`
23 | 
24 | `Tensorflow 1.8`
25 | 
26 | `ProgressBar2 3.37.1`
27 | 
28 | `LibROSA 0.6`
29 | 
30 | `FFmpeg 4.0`
31 | 
32 | `PyWorld`
33 | 
34 | `sklearn`
35 | 
36 | `pycwt`
37 | 
38 | `sprocket-vc`
39 | 
40 | `scipy`
41 | 
42 | `glob`
43 | 
44 | Usage
45 | ---------
46 | 
47 | 1. `train.py`
48 | 
49 | This script trains the CycleGAN with spectrum (MCEP) features.
50 | 
51 | 2. `train_f0.py`
52 | 
53 | This script performs CWT decomposition of F0, then trains the CycleGAN with CWT-F0 features (a minimal Python sketch of this CWT step is given at the end of this README).
54 | 
55 | 3. `convert_separate.py`
56 | 
57 | This script converts source speech with the trained CycleGAN models, converting the spectrum and CWT-F0 features separately.
58 | 
59 | 
60 | # Instructions
61 | 
62 | 1. **To train CycleGAN with spectrum features, please run:**
63 | ```Bash
64 | $ python train.py --train_A_dir './data/training/NEUTRAL(PATH TO SOURCE TRAINING DATA)' --train_B_dir './data/training/SURPRISE(PATH TO TARGET TRAINING DATA)' --model_dir './model/neutral_to_surprise_mceps' --model_name 'neutral_to_surprise_mceps.ckpt' --random_seed 0 --validation_A_dir './data/evaluation_all/NEUTRAL' --validation_B_dir './data/evaluation_all/SURPRISE' --output_dir './validation_output' --tensorboard_log_dir './log'
65 | ```
66 | 
67 | 2. **To train CycleGAN with CWT-F0 features, please run:**
68 | ```Bash
69 | $ python train_f0.py --train_A_dir './data/training/NEUTRAL(PATH TO SOURCE TRAINING DATA)' --train_B_dir './data/training/SURPRISE(PATH TO TARGET TRAINING DATA)' --model_dir './model/neutral_to_surprise_f0' --model_name 'neutral_to_surprise_f0.ckpt' --random_seed 0 --validation_A_dir './data/evaluation_all/NEUTRAL' --validation_B_dir './data/evaluation_all/SURPRISE' --output_dir './validation_output' --tensorboard_log_dir './log'
70 | ```
71 | 
72 | 3. **To convert the emotion of the source speech to the target emotion, please run:**
73 | ```Bash
74 | $ python convert_separate.py --model_f0_dir './model/neutral_to_surprise_f0' --model_f0_name 'neutral_to_surprise_f0.ckpt' --model_mceps_dir './model/neutral_to_surprise_mceps' --model_mceps_name 'neutral_to_surprise_mceps.ckpt' --data_dir './data/evaluation_all/NEUTRAL(PATH TO EVALUATION DATA)' --conversion_direction 'A2B' --output_dir './converted_voices_neutral_to_surprise_separate'
75 | ```
76 | 
77 | 
78 | 
79 | 
80 | **Note1:**
81 | The code is based on CycleGAN Voice Conversion: https://github.com/leimao/Voice_Converter_CycleGAN
82 | 
83 | **Note2:**
84 | The code can also be used for conventional parallel-data-free voice conversion (speaker identity conversion): simply change the training data to VCC2016 or VCC2018 (both publicly available) and then run the scripts. Both spectrum and CWT-based F0 conversion are supported. 
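For reference, the sketch below (not part of the original repository) illustrates the CWT-F0 analysis/synthesis round trip that `train_f0.py` and `convert_separate.py` build on, using the helper functions from `utils.py`. The wav path is a placeholder, and log-F0 is normalized with per-utterance statistics here for brevity, whereas the scripts use corpus-level statistics saved in `logf0s_normalization.npz`.
```Python
# Minimal sketch of the CWT-F0 pipeline (assumes utils.py is importable;
# the wav path below is a placeholder).
import numpy as np
import librosa
import pyworld
from utils import get_cont_lf0, get_lf0_cwt, norm_scale, denormalize, inverse_cwt

wav, _ = librosa.load('./data/training/NEUTRAL/example.wav', sr=16000, mono=True)
wav = wav.astype(np.float64)

# WORLD F0 extraction (same settings as preprocess.world_decompose)
f0, timeaxis = pyworld.harvest(wav, 16000, frame_period=5.0, f0_floor=71.0, f0_ceil=800.0)

# Interpolate unvoiced frames, take the log, and z-normalize
# (per utterance here; the scripts use corpus-level mean/std)
uv, cont_lf0 = get_cont_lf0(f0)
lf0_norm = (cont_lf0 - cont_lf0.mean()) / cont_lf0.std()

# 10-scale continuous wavelet transform of log-F0: shape (T, 10)
wavelet_lf0, scales = get_lf0_cwt(lf0_norm)
wavelet_lf0_norm, mean, std = norm_scale(wavelet_lf0)  # per-scale normalization; the CycleGAN-F0 input

# Approximate inverse CWT back to a log-F0 contour, then restore F0 and unvoiced frames
lf0_rec = inverse_cwt(denormalize(wavelet_lf0_norm, mean, std), scales)
f0_rec = np.squeeze(uv) * np.exp(lf0_rec * cont_lf0.std() + cont_lf0.mean())
```
At conversion time, `convert_separate.py` applies the same steps, except that the normalized 10-scale CWT matrix is passed through the trained F0 CycleGAN (and the MCEPs through the trained spectrum CycleGAN) before the inverse transform and WORLD synthesis.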
85 | 86 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/utils.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import os 3 | import random 4 | import numpy as np 5 | from scipy.interpolate import interp1d 6 | import pycwt as wavelet 7 | from scipy.signal import firwin 8 | from scipy.signal import lfilter 9 | import matplotlib.pyplot as plt 10 | from sklearn import preprocessing 11 | import pywt 12 | import librosa 13 | import pyworld 14 | 15 | def l1_loss(y, y_hat): 16 | 17 | return tf.reduce_mean(tf.abs(y - y_hat)) 18 | 19 | def l2_loss(y, y_hat): 20 | 21 | return tf.reduce_mean(tf.square(y - y_hat)) 22 | 23 | def cross_entropy_loss(logits, labels): 24 | return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = logits, labels = labels)) 25 | 26 | 27 | def convert_continuos_f0(f0): 28 | """CONVERT F0 TO CONTINUOUS F0 29 | 30 | Args: 31 | f0 (ndarray): original f0 sequence with the shape (T) 32 | 33 | Return: 34 | (ndarray): continuous f0 with the shape (T) 35 | """ 36 | # get uv information as binary 37 | uv = np.float32(f0 != 0) 38 | 39 | # get start and end of f0 40 | if (f0 == 0).all(): 41 | logging.warn("all of the f0 values are 0.") 42 | return uv, f0 43 | start_f0 = f0[f0 != 0][0] 44 | end_f0 = f0[f0 != 0][-1] 45 | 46 | # padding start and end of f0 sequence 47 | start_idx = np.where(f0 == start_f0)[0][0] 48 | end_idx = np.where(f0 == end_f0)[0][-1] 49 | f0[:start_idx] = start_f0 50 | f0[end_idx:] = end_f0 51 | 52 | # get non-zero frame index 53 | nz_frames = np.where(f0 != 0)[0] 54 | 55 | # perform linear interpolation 56 | f = interp1d(nz_frames, f0[nz_frames]) 57 | cont_f0 = f(np.arange(0, f0.shape[0])) 58 | 59 | return uv, cont_f0 60 | 61 | 62 | def get_cont_lf0(f0, frame_period=5.0): 63 | uv, cont_f0_lpf = convert_continuos_f0(f0) 64 | #cont_f0_lpf = low_pass_filter(cont_f0_lpf, int(1.0 / (frame_period * 0.001)), cutoff=20) 65 | cont_lf0_lpf = np.log(cont_f0_lpf) 66 | return uv, cont_lf0_lpf 67 | 68 | #def get_log_energy(sp): 69 | # sp: (T, D) 70 | # return: (T) 71 | # energy = np.linalg.norm(sp, ord=2, axis=-1) 72 | # return np.log(energy) 73 | 74 | def get_lf0_cwt(lf0): 75 | mother = wavelet.MexicanHat() 76 | #dt = 0.005 77 | dt = 0.005 78 | dj = 1 79 | s0 = dt*2 80 | J =9 81 | #C_delta = 3.541 82 | #Wavelet_lf0, scales, _, _, _, _ = wavelet.cwt(np.squeeze(lf0), dt, dj, s0, J, mother) 83 | Wavelet_lf0, scales, freqs, coi, fft, fftfreqs = wavelet.cwt(np.squeeze(lf0), dt, dj, s0, J, mother) 84 | #Wavelet_le, scales, _, _, _, _ = wavelet.cwt(np.squeeze(le), dt, dj, s0, J, mother) 85 | Wavelet_lf0 = np.real(Wavelet_lf0).T 86 | #Wavelet_le = np.real(Wavelet_le).T # (T, D=10) 87 | #0lf0_le_cwt = np.concatenate((Wavelet_lf0, Wavelet_le), -1) 88 | # iwave = wavelet.icwt(np.squeeze(lf0), scales, dt, dj, mother) * std 89 | return Wavelet_lf0, scales 90 | 91 | 92 | def inverse_cwt(Wavelet_lf0,scales): 93 | lf0_rec = np.zeros([Wavelet_lf0.shape[0],len(scales)]) 94 | for i in range(0,len(scales)): 95 | lf0_rec[:,i] = Wavelet_lf0[:,i]*((i+1+2.5)**(-2.5)) 96 | lf0_rec_sum = np.sum(lf0_rec,axis = 1) 97 | lf0_rec_sum = preprocessing.scale(lf0_rec_sum) 98 | return lf0_rec_sum 99 | 100 | 101 | def low_pass_filter(x, fs, cutoff=70, padding=True): 102 | """FUNCTION TO APPLY LOW PASS FILTER 103 | 104 | Args: 105 | x (ndarray): Waveform sequence 106 | fs (int): Sampling frequency 107 | cutoff (float): Cutoff frequency 
of low pass filter 108 | 109 | Return: 110 | (ndarray): Low pass filtered waveform sequence 111 | """ 112 | nyquist = fs // 2 113 | norm_cutoff = cutoff / nyquist 114 | 115 | # low cut filter 116 | numtaps = 255 117 | fil = firwin(numtaps, norm_cutoff) 118 | x_pad = np.pad(x, (numtaps, numtaps), 'edge') 119 | lpf_x = lfilter(fil, 1, x_pad) 120 | lpf_x = lpf_x[numtaps + numtaps // 2: -numtaps // 2] 121 | return lpf_x 122 | 123 | def norm_scale(Wavelet_lf0): 124 | Wavelet_lf0_norm = np.zeros((Wavelet_lf0.shape[0], Wavelet_lf0.shape[1])) 125 | mean = np.zeros((1,Wavelet_lf0.shape[1]))#[1,10] 126 | std = np.zeros((1, Wavelet_lf0.shape[1])) 127 | for scale in range(Wavelet_lf0.shape[1]): 128 | mean[:,scale] = Wavelet_lf0[:,scale].mean() 129 | std[:,scale] = Wavelet_lf0[:,scale].std() 130 | Wavelet_lf0_norm[:,scale] = (Wavelet_lf0[:,scale]-mean[:,scale])/std[:,scale] 131 | return Wavelet_lf0_norm, mean, std 132 | 133 | def get_lf0_cwt_norm(f0s, mean, std): 134 | 135 | uvs = list() 136 | cont_lf0_lpfs = list() 137 | cont_lf0_lpf_norms = list() 138 | Wavelet_lf0s = list() 139 | Wavelet_lf0s_norm = list() 140 | scaless = list() 141 | 142 | means = list() 143 | stds = list() 144 | for f0 in f0s: 145 | 146 | uv, cont_lf0_lpf = get_cont_lf0(f0) 147 | cont_lf0_lpf_norm = (cont_lf0_lpf - mean) / std 148 | 149 | Wavelet_lf0, scales = get_lf0_cwt(cont_lf0_lpf_norm) #[560,10] 150 | Wavelet_lf0_norm, mean_scale, std_scale = norm_scale(Wavelet_lf0) #[560,10],[1,10],[1,10] 151 | 152 | Wavelet_lf0s_norm.append(Wavelet_lf0_norm) 153 | uvs.append(uv) 154 | cont_lf0_lpfs.append(cont_lf0_lpf) 155 | cont_lf0_lpf_norms.append(cont_lf0_lpf_norm) 156 | Wavelet_lf0s.append(Wavelet_lf0) 157 | scaless.append(scales) 158 | means.append(mean_scale) 159 | stds.append(std_scale) 160 | 161 | return Wavelet_lf0s_norm,scaless, means, stds 162 | 163 | 164 | def denormalize(Wavelet_lf0_norm, mean, std): 165 | Wavelet_lf0_denorm = np.zeros((Wavelet_lf0_norm.shape[0], Wavelet_lf0_norm.shape[1])) 166 | for scale in range(Wavelet_lf0_norm.shape[1]): 167 | Wavelet_lf0_denorm[:,scale] = Wavelet_lf0_norm[:,scale]*std[:,scale]+mean[:,scale] 168 | return Wavelet_lf0_denorm 169 | 170 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/preprocess.py: -------------------------------------------------------------------------------- 1 | import librosa 2 | import numpy as np 3 | import os 4 | import pyworld 5 | 6 | def load_wavs(wav_dir, sr): 7 | 8 | wavs = list() 9 | for file in os.listdir(wav_dir): 10 | file_path = os.path.join(wav_dir, file) 11 | wav, _ = librosa.load(file_path, sr = sr, mono = True) 12 | #wav = wav.astype(np.float64) 13 | wavs.append(wav) 14 | 15 | return wavs 16 | 17 | def world_decompose(wav, fs, frame_period = 5.0): 18 | 19 | # Decompose speech signal into f0, spectral envelope and aperiodicity using WORLD 20 | wav = wav.astype(np.float64) 21 | f0, timeaxis = pyworld.harvest(wav, fs, frame_period = frame_period, f0_floor = 71.0, f0_ceil = 800.0) 22 | sp = pyworld.cheaptrick(wav, f0, timeaxis, fs) 23 | ap = pyworld.d4c(wav, f0, timeaxis, fs) 24 | 25 | return f0, timeaxis, sp, ap 26 | 27 | def world_encode_spectral_envelop(sp, fs, dim = 24): 28 | 29 | # Get Mel-cepstral coefficients (MCEPs) 30 | 31 | #sp = sp.astype(np.float64) 32 | coded_sp = pyworld.code_spectral_envelope(sp, fs, dim) 33 | 34 | return coded_sp 35 | 36 | def world_decode_spectral_envelop(coded_sp, fs): 37 | 38 | fftlen = pyworld.get_cheaptrick_fft_size(fs) 
39 | #coded_sp = coded_sp.astype(np.float32) 40 | #coded_sp = np.ascontiguousarray(coded_sp) 41 | decoded_sp = pyworld.decode_spectral_envelope(coded_sp, fs, fftlen) 42 | 43 | return decoded_sp 44 | 45 | 46 | def world_encode_data(wavs, fs, frame_period = 5.0, coded_dim = 24): 47 | 48 | f0s = list() 49 | timeaxes = list() 50 | sps = list() 51 | aps = list() 52 | coded_sps = list() 53 | 54 | for wav in wavs: 55 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = fs, frame_period = frame_period) 56 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = fs, dim = coded_dim) 57 | f0s.append(f0) 58 | timeaxes.append(timeaxis) 59 | sps.append(sp) 60 | aps.append(ap) 61 | coded_sps.append(coded_sp) 62 | 63 | return f0s, timeaxes, sps, aps, coded_sps 64 | 65 | 66 | def transpose_in_list(lst): 67 | 68 | transposed_lst = list() 69 | for array in lst: 70 | transposed_lst.append(array.T) 71 | return transposed_lst 72 | 73 | 74 | def world_decode_data(coded_sps, fs): 75 | 76 | decoded_sps = list() 77 | 78 | for coded_sp in coded_sps: 79 | decoded_sp = world_decode_spectral_envelop(coded_sp, fs) 80 | decoded_sps.append(decoded_sp) 81 | 82 | return decoded_sps 83 | 84 | 85 | def world_speech_synthesis(f0, decoded_sp, ap, fs, frame_period): 86 | 87 | #decoded_sp = decoded_sp.astype(np.float64) 88 | wav = pyworld.synthesize(f0, decoded_sp, ap, fs, frame_period) 89 | # Librosa could not save wav if not doing so 90 | wav = wav.astype(np.float32) 91 | 92 | return wav 93 | 94 | 95 | def world_synthesis_data(f0s, decoded_sps, aps, fs, frame_period): 96 | 97 | wavs = list() 98 | 99 | for f0, decoded_sp, ap in zip(f0s, decoded_sps, aps): 100 | wav = world_speech_synthesis(f0, decoded_sp, ap, fs, frame_period) 101 | wavs.append(wav) 102 | 103 | return wavs 104 | 105 | 106 | def coded_sps_normalization_fit_transoform(coded_sps): 107 | 108 | coded_sps_concatenated = np.concatenate(coded_sps, axis = 1) 109 | coded_sps_mean = np.mean(coded_sps_concatenated, axis = 1, keepdims = True) 110 | coded_sps_std = np.std(coded_sps_concatenated, axis = 1, keepdims = True) 111 | 112 | coded_sps_normalized = list() 113 | for coded_sp in coded_sps: 114 | coded_sps_normalized.append((coded_sp - coded_sps_mean) / coded_sps_std) 115 | 116 | return coded_sps_normalized, coded_sps_mean, coded_sps_std 117 | 118 | def coded_sps_normalization_transoform(coded_sps, coded_sps_mean, coded_sps_std): 119 | 120 | coded_sps_normalized = list() 121 | for coded_sp in coded_sps: 122 | coded_sps_normalized.append((coded_sp - coded_sps_mean) / coded_sps_std) 123 | 124 | return coded_sps_normalized 125 | 126 | def coded_sps_normalization_inverse_transoform(normalized_coded_sps, coded_sps_mean, coded_sps_std): 127 | 128 | coded_sps = list() 129 | for normalized_coded_sp in normalized_coded_sps: 130 | coded_sps.append(normalized_coded_sp * coded_sps_std + coded_sps_mean) 131 | 132 | return coded_sps 133 | 134 | def coded_sp_padding(coded_sp, multiple = 4): 135 | 136 | num_features = coded_sp.shape[0] 137 | num_frames = coded_sp.shape[1] 138 | num_frames_padded = int(np.ceil(num_frames / multiple)) * multiple 139 | num_frames_diff = num_frames_padded - num_frames 140 | num_pad_left = num_frames_diff // 2 141 | num_pad_right = num_frames_diff - num_pad_left 142 | coded_sp_padded = np.pad(coded_sp, ((0, 0), (num_pad_left, num_pad_right)), 'constant', constant_values = 0) 143 | 144 | return coded_sp_padded 145 | 146 | def wav_padding(wav, sr, frame_period, multiple = 4): 147 | 148 | assert wav.ndim == 1 149 | num_frames = len(wav) 150 | 
num_frames_padded = int((np.ceil((np.floor(num_frames / (sr * frame_period / 1000)) + 1) / multiple + 1) * multiple - 1) * (sr * frame_period / 1000)) 151 | num_frames_diff = num_frames_padded - num_frames 152 | num_pad_left = num_frames_diff // 2 153 | num_pad_right = num_frames_diff - num_pad_left 154 | wav_padded = np.pad(wav, (num_pad_left, num_pad_right), 'constant', constant_values = 0) 155 | 156 | return wav_padded 157 | 158 | 159 | def logf0_statistics(f0s): 160 | 161 | log_f0s_concatenated = np.ma.log(np.concatenate(f0s)) 162 | log_f0s_mean = log_f0s_concatenated.mean() 163 | log_f0s_std = log_f0s_concatenated.std() 164 | 165 | return log_f0s_mean, log_f0s_std 166 | 167 | def pitch_conversion(f0, mean_log_src, std_log_src, mean_log_target, std_log_target): 168 | 169 | # Logarithm Gaussian normalization for Pitch Conversions 170 | f0_converted = np.exp((np.log(f0) - mean_log_src) / std_log_src * std_log_target + mean_log_target) 171 | 172 | return f0_converted 173 | 174 | def wavs_to_specs(wavs, n_fft = 1024, hop_length = None): 175 | 176 | stfts = list() 177 | for wav in wavs: 178 | stft = librosa.stft(wav, n_fft = n_fft, hop_length = hop_length) 179 | stfts.append(stft) 180 | 181 | return stfts 182 | 183 | 184 | def wavs_to_mfccs(wavs, sr, n_fft = 1024, hop_length = None, n_mels = 128, n_mfcc = 24): 185 | 186 | mfccs = list() 187 | for wav in wavs: 188 | mfcc = librosa.feature.mfcc(y = wav, sr = sr, n_fft = n_fft, hop_length = hop_length, n_mels = n_mels, n_mfcc = n_mfcc) 189 | mfccs.append(mfcc) 190 | 191 | return mfccs 192 | 193 | 194 | def mfccs_normalization(mfccs): 195 | 196 | mfccs_concatenated = np.concatenate(mfccs, axis = 1) 197 | mfccs_mean = np.mean(mfccs_concatenated, axis = 1, keepdims = True) 198 | mfccs_std = np.std(mfccs_concatenated, axis = 1, keepdims = True) 199 | 200 | mfccs_normalized = list() 201 | for mfcc in mfccs: 202 | mfccs_normalized.append((mfcc - mfccs_mean) / mfccs_std) 203 | 204 | return mfccs_normalized, mfccs_mean, mfccs_std 205 | 206 | 207 | def sample_train_data(dataset_A, dataset_B, n_frames = 128): 208 | 209 | num_samples = min(len(dataset_A), len(dataset_B)) 210 | train_data_A_idx = np.arange(len(dataset_A)) 211 | train_data_B_idx = np.arange(len(dataset_B)) 212 | np.random.shuffle(train_data_A_idx) 213 | np.random.shuffle(train_data_B_idx) 214 | train_data_A_idx_subset = train_data_A_idx[:num_samples] 215 | train_data_B_idx_subset = train_data_B_idx[:num_samples] 216 | 217 | train_data_A = list() 218 | train_data_B = list() 219 | 220 | for idx_A, idx_B in zip(train_data_A_idx_subset, train_data_B_idx_subset): 221 | data_A = dataset_A[idx_A] 222 | frames_A_total = data_A.shape[1] 223 | assert frames_A_total >= n_frames 224 | start_A = np.random.randint(frames_A_total - n_frames + 1) 225 | end_A = start_A + n_frames 226 | train_data_A.append(data_A[:,start_A:end_A]) 227 | 228 | data_B = dataset_B[idx_B] 229 | frames_B_total = data_B.shape[1] 230 | assert frames_B_total >= n_frames 231 | start_B = np.random.randint(frames_B_total - n_frames + 1) 232 | end_B = start_B + n_frames 233 | train_data_B.append(data_B[:,start_B:end_B]) 234 | 235 | train_data_A = np.array(train_data_A) 236 | train_data_B = np.array(train_data_B) 237 | 238 | return train_data_A, train_data_B -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/convert_separate.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import 
os 3 | import numpy as np 4 | 5 | from model_f0 import CycleGAN as CycleGAN_f0 6 | from model_mceps import CycleGAN as CycleGAN_mceps 7 | 8 | from preprocess import * 9 | from utils import get_lf0_cwt_norm,norm_scale,denormalize 10 | from utils import get_cont_lf0, get_lf0_cwt,inverse_cwt 11 | from sklearn import preprocessing 12 | 13 | def conversion(model_f0_dir, model_f0_name, model_mceps_dir, model_mceps_name, data_dir, conversion_direction, output_dir): 14 | 15 | num_mceps = 24 16 | num_features = 34 17 | sampling_rate = 16000 18 | frame_period = 5.0 19 | 20 | # model = CycleGAN(num_features = num_features, mode = 'test') 21 | # model.load(filepath = os.path.join(model_dir, model_name)) 22 | 23 | # import F0 model: 24 | model_f0 = CycleGAN_f0(num_features = 10, mode = 'test') 25 | model_f0.load(filepath=os.path.join(model_f0_dir,model_f0_name)) 26 | # import mceps model: 27 | model_mceps = CycleGAN_mceps(num_features = 24, mode = 'test') 28 | model_mceps.load(filepath=os.path.join(model_mceps_dir,model_mceps_name)) 29 | 30 | mcep_normalization_params = np.load(os.path.join(model_mceps_dir, 'mcep_normalization.npz')) 31 | mcep_mean_A = mcep_normalization_params['mean_A'] 32 | mcep_std_A = mcep_normalization_params['std_A'] 33 | mcep_mean_B = mcep_normalization_params['mean_B'] 34 | mcep_std_B = mcep_normalization_params['std_B'] 35 | 36 | logf0s_normalization_params = np.load(os.path.join(model_f0_dir, 'logf0s_normalization.npz')) 37 | logf0s_mean_A = logf0s_normalization_params['mean_A'] 38 | logf0s_std_A = logf0s_normalization_params['std_A'] 39 | logf0s_mean_B = logf0s_normalization_params['mean_B'] 40 | logf0s_std_B = logf0s_normalization_params['std_B'] 41 | 42 | if not os.path.exists(output_dir): 43 | os.makedirs(output_dir) 44 | 45 | for file in os.listdir(data_dir): 46 | 47 | filepath = os.path.join(data_dir, file) 48 | wav, _ = librosa.load(filepath, sr = sampling_rate, mono = True) 49 | wav = wav_padding(wav = wav, sr = sampling_rate, frame_period = frame_period, multiple = 4) 50 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = sampling_rate, frame_period = frame_period) 51 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = sampling_rate, dim = num_mceps) 52 | coded_sp_transposed = coded_sp.T 53 | # np.save('./f0',f0) 54 | uv, cont_lf0_lpf = get_cont_lf0(f0) 55 | 56 | if conversion_direction == 'A2B': 57 | #f0_converted = pitch_conversion(f0 = f0, mean_log_src = logf0s_mean_A, std_log_src = logf0s_std_A, mean_log_target = logf0s_mean_B, std_log_target = logf0s_std_B) 58 | #f0_converted = f0 59 | 60 | cont_lf0_lpf_norm = (cont_lf0_lpf - logf0s_mean_A) / logf0s_std_A 61 | Wavelet_lf0, scales = get_lf0_cwt(cont_lf0_lpf_norm) #[470,10] 62 | #np.save('./Wavelet_lf0',Wavelet_lf0) 63 | Wavelet_lf0_norm, mean, std = norm_scale(Wavelet_lf0) #[470,10],[1,10],[1,10] 64 | lf0_cwt_norm = Wavelet_lf0_norm.T #[10,470] 65 | 66 | coded_sp_norm = (coded_sp_transposed - mcep_mean_A) / mcep_std_A #[24,470] 67 | 68 | # feats_norm = np.vstack((coded_sp_norm,lf0_cwt_norm))#[34,470] 69 | # feats_converted_norm = model.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 70 | 71 | # test mceps 72 | coded_sp_converted_norm = model_mceps.test(inputs = np.array([coded_sp_norm]),direction = conversion_direction)[0] 73 | # test f0: 74 | lf0 = model_f0.test(inputs = np.array([lf0_cwt_norm]),direction=conversion_direction)[0] 75 | #coded_sp_converted_norm = model.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 76 | 77 | #coded_sp_converted_norm 
= feats_converted_norm[:24] 78 | coded_sp_converted = coded_sp_converted_norm * mcep_std_B + mcep_mean_B #mceps 79 | 80 | #lf0 = feats_converted_norm[24:].T #[470,10] 81 | 82 | lf0_cwt_denormalize = denormalize(lf0.T, mean, std)#[470,10] 83 | #np.save('./lf0_denorm',lf0_cwt_denormalize) 84 | lf0_rec = inverse_cwt(lf0_cwt_denormalize,scales)#[470,1] 85 | #lf0_rec_norm = preprocessing.scale(lf0_rec) 86 | lf0_converted = lf0_rec * logf0s_std_B + logf0s_mean_B 87 | f0_converted = np.squeeze(uv) * np.exp(lf0_converted) 88 | f0_converted = np.ascontiguousarray(f0_converted) 89 | #np.save('./f0_converted',f0_converted) 90 | 91 | else: 92 | #f0_converted = pitch_conversion(f0 = f0, mean_log_src = logf0s_mean_B, std_log_src = logf0s_std_B, mean_log_target = logf0s_mean_A, std_log_target = logf0s_std_A) 93 | #f0_converted = f0 94 | cont_lf0_lpf_norm = (cont_lf0_lpf - logf0s_mean_B) / logf0s_std_B 95 | Wavelet_lf0, scales = get_lf0_cwt(cont_lf0_lpf_norm) 96 | lf0_cwt_norm = Wavelet_lf0.T #[10,470] 97 | coded_sp_norm = (coded_sp_transposed - mcep_mean_B) / mcep_std_B 98 | feats_norm = np.vstack((coded_sp_norm,lf0_cwt_norm))#[34,470] 99 | feats_converted_norm = model_f0.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 100 | 101 | #coded_sp_converted_norm = model.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 102 | coded_sp_converted_norm = feats_converted_norm[:24] 103 | coded_sp_converted = coded_sp_converted_norm * mcep_std_A + mcep_mean_A 104 | lf0_rec = inverse_cwt(feats_norm[24:].T,scales)#[470,10] 105 | lf0_rec_norm = preprocessing.scale(lf0_rec) 106 | lf0_converted = lf0_rec_norm * logf0s_std_A + logf0s_mean_A 107 | f0_converted = np.squeeze(uv) * np.exp(lf0_converted) 108 | f0_converted = np.ascontiguousarray(f0_converted) 109 | 110 | #coded_sp_norm = (coded_sp_transposed - mcep_mean_B) / mcep_std_B 111 | #coded_sp_converted_norm = model.test(inputs = np.array([coded_sp_norm]), direction = conversion_direction)[0] 112 | #coded_sp_converted = coded_sp_converted_norm * mcep_std_A + mcep_mean_A 113 | 114 | coded_sp_converted = coded_sp_converted.T#[470,24] 115 | coded_sp_converted = np.ascontiguousarray(coded_sp_converted) 116 | decoded_sp_converted = world_decode_spectral_envelop(coded_sp = coded_sp_converted, fs = sampling_rate) 117 | wav_transformed = world_speech_synthesis(f0 = f0_converted, decoded_sp = decoded_sp_converted, ap = ap, fs = sampling_rate, frame_period = frame_period) 118 | librosa.output.write_wav(os.path.join(output_dir, os.path.basename(file)), wav_transformed, sampling_rate) 119 | 120 | 121 | if __name__ == '__main__': 122 | #os.environ['CUDA_VISIBLE_DEVICES']='2' 123 | parser = argparse.ArgumentParser(description = 'Convert voices using pre-trained CycleGAN model.') 124 | 125 | model_f0_dir_default = './model/neutral_to_surprise_f0' 126 | model_f0_name_default = 'neutral_to_surprise_f0.ckpt' 127 | model_mceps_dir_default = './model/neutral_to_surprise_mceps' 128 | model_mceps_name_default = 'neutral_to_surprise_mceps.ckpt' 129 | data_dir_default = './data/evaluation_all/NEUTRAL' 130 | conversion_direction_default = 'A2B' 131 | output_dir_default = './converted_voices_neutral_to_surprise_separate' 132 | 133 | parser.add_argument('--model_f0_dir', type = str, help = 'Directory for the pre-trained f0 model.', default = model_f0_dir_default) 134 | parser.add_argument('--model_f0_name', type = str, help = 'Filename for the pre-trained f0 model.', default = model_f0_name_default) 135 | parser.add_argument('--model_mceps_dir', 
type = str, help = 'Directory for the pre-trained mceps model.', default = model_mceps_dir_default) 136 | parser.add_argument('--model_mceps_name', type = str, help = 'Filename for the pre-trained mceps model.', default = model_mceps_name_default) 137 | parser.add_argument('--data_dir', type = str, help = 'Directory for the voices for conversion.', default = data_dir_default) 138 | parser.add_argument('--conversion_direction', type = str, help = 'Conversion direction for CycleGAN. A2B or B2A. The first object in the model file name is A, and the second object in the model file name is B.', default = conversion_direction_default) 139 | parser.add_argument('--output_dir', type = str, help = 'Directory for the converted voices.', default = output_dir_default) 140 | 141 | argv = parser.parse_args() 142 | 143 | model_f0_dir = argv.model_f0_dir 144 | model_f0_name = argv.model_f0_name 145 | model_mceps_dir = argv.model_mceps_dir 146 | model_mceps_name = argv.model_mceps_name 147 | data_dir = argv.data_dir 148 | conversion_direction = argv.conversion_direction 149 | output_dir = argv.output_dir 150 | 151 | conversion(model_f0_dir = model_f0_dir, model_f0_name = model_f0_name, model_mceps_dir = model_mceps_dir, model_mceps_name = model_mceps_name, data_dir = data_dir, conversion_direction = conversion_direction, output_dir = output_dir) 152 | 153 | 154 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/train_f0.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import argparse 4 | import time 5 | import librosa 6 | from sklearn import preprocessing 7 | from preprocess import * 8 | from model_f0 import CycleGAN 9 | from utils import get_lf0_cwt_norm 10 | from utils import get_cont_lf0, get_lf0_cwt,inverse_cwt 11 | 12 | def train(train_A_dir, train_B_dir, model_dir, model_name, random_seed, validation_A_dir, validation_B_dir, output_dir, tensorboard_log_dir): 13 | # 10-scale F0 + 24 MCEPs: 14 | np.random.seed(random_seed) 15 | num_epochs = 5000 16 | mini_batch_size = 1 # mini_batch_size = 1 is better 17 | generator_learning_rate = 0.0002 18 | generator_learning_rate_decay = generator_learning_rate / 200000 19 | discriminator_learning_rate = 0.0001 20 | discriminator_learning_rate_decay = discriminator_learning_rate / 200000 21 | sampling_rate = 16000 22 | num_mcep = 24 23 | num_scale = 10 24 | frame_period = 5.0 25 | n_frames = 128 26 | lambda_cycle = 10 27 | lambda_identity = 5 28 | 29 | print('Preprocessing Data...') 30 | 31 | start_time = time.time() 32 | 33 | wavs_A = load_wavs(wav_dir = train_A_dir, sr = sampling_rate) 34 | wavs_B = load_wavs(wav_dir = train_B_dir, sr = sampling_rate) 35 | 36 | f0s_A, timeaxes_A, sps_A, aps_A, coded_sps_A = world_encode_data(wavs = wavs_A, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 37 | f0s_B, timeaxes_B, sps_B, aps_B, coded_sps_B = world_encode_data(wavs = wavs_B, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 38 | 39 | log_f0s_mean_A, log_f0s_std_A = logf0_statistics(f0s_A) 40 | log_f0s_mean_B, log_f0s_std_B = logf0_statistics(f0s_B) 41 | ############################# 42 | #get lf0 cwt: 43 | lf0_cwt_norm_A, scales_A, means_A, stds_A = get_lf0_cwt_norm(f0s_A, mean = log_f0s_mean_A, std = log_f0s_std_A) 44 | lf0_cwt_norm_B, scales_B, means_B, stds_B = get_lf0_cwt_norm(f0s_B, mean = log_f0s_mean_B, std = log_f0s_std_B) 45 | 46 | 47 | 
print('Log Pitch A') 48 | print('Mean: %f, Std: %f' %(log_f0s_mean_A, log_f0s_std_A)) 49 | print('Log Pitch B') 50 | print('Mean: %f, Std: %f' %(log_f0s_mean_B, log_f0s_std_B)) 51 | 52 | lf0_cwt_norm_A_transposed = transpose_in_list(lst = lf0_cwt_norm_A) 53 | lf0_cwt_norm_B_transposed = transpose_in_list(lst = lf0_cwt_norm_B) 54 | 55 | coded_sps_A_transposed = transpose_in_list(lst = coded_sps_A) 56 | coded_sps_B_transposed = transpose_in_list(lst = coded_sps_B) 57 | 58 | coded_sps_A_norm, coded_sps_A_mean, coded_sps_A_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_A_transposed) 59 | coded_sps_B_norm, coded_sps_B_mean, coded_sps_B_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_B_transposed) 60 | 61 | if not os.path.exists(model_dir): 62 | os.makedirs(model_dir) 63 | np.savez(os.path.join(model_dir, 'logf0s_normalization.npz'), mean_A = log_f0s_mean_A, std_A = log_f0s_std_A, mean_B = log_f0s_mean_B, std_B = log_f0s_std_B) 64 | np.savez(os.path.join(model_dir, 'mcep_normalization.npz'), mean_A = coded_sps_A_mean, std_A = coded_sps_A_std, mean_B = coded_sps_B_mean, std_B = coded_sps_B_std) 65 | 66 | if validation_A_dir is not None: 67 | validation_A_output_dir = os.path.join(output_dir, 'converted_A') 68 | if not os.path.exists(validation_A_output_dir): 69 | os.makedirs(validation_A_output_dir) 70 | 71 | if validation_B_dir is not None: 72 | validation_B_output_dir = os.path.join(output_dir, 'converted_B') 73 | if not os.path.exists(validation_B_output_dir): 74 | os.makedirs(validation_B_output_dir) 75 | 76 | end_time = time.time() 77 | time_elapsed = end_time - start_time 78 | 79 | print('Preprocessing Done.') 80 | 81 | print('Time Elapsed for Data Preprocessing: %02d:%02d:%02d' % (time_elapsed // 3600, (time_elapsed % 3600 // 60), (time_elapsed % 60 // 1))) 82 | 83 | num_feats = 10 #34 84 | model = CycleGAN(num_features = num_feats) 85 | #model.load('./model/neutral2anger_f0/neutral2anger_f0.ckpt') 86 | #print('model restored') 87 | 88 | for epoch in range(num_epochs): 89 | print('Epoch: %d' % epoch) 90 | ''' 91 | if epoch > 60: 92 | lambda_identity = 0 93 | if epoch > 1250: 94 | generator_learning_rate = max(0, generator_learning_rate - 0.0000002) 95 | discriminator_learning_rate = max(0, discriminator_learning_rate - 0.0000001) 96 | ''' 97 | 98 | start_time_epoch = time.time() 99 | ###zk 100 | data_As = list() 101 | data_Bs = list() 102 | for lf0_a in lf0_cwt_norm_A_transposed: 103 | data_A = lf0_a 104 | data_As.append(data_A) 105 | 106 | for lf0_b in lf0_cwt_norm_B_transposed: 107 | data_B = lf0_b 108 | data_Bs.append(data_B) 109 | # for sp_a, lf0_a in zip(coded_sps_A_norm,lf0_cwt_norm_A_transposed): 110 | # data_A = np.vstack((sp_a,lf0_a)) 111 | # data_As.append(data_A) 112 | # for sp_b, lf0_b in zip(coded_sps_B_norm,lf0_cwt_norm_B_transposed): 113 | # data_B = np.vstack((sp_b,lf0_b)) 114 | # data_Bs.append(data_B) 115 | 116 | #dataset_A, dataset_B = sample_train_data(dataset_A = coded_sps_A_norm, dataset_B = coded_sps_B_norm, n_frames = n_frames) 117 | dataset_A, dataset_B = sample_train_data(dataset_A = data_As, dataset_B = data_Bs, n_frames = n_frames) 118 | 119 | n_samples = dataset_A.shape[0] 120 | 121 | for i in range(n_samples // mini_batch_size): 122 | 123 | num_iterations = n_samples // mini_batch_size * epoch + i 124 | 125 | if num_iterations > 10000: 126 | lambda_identity = 0 127 | if num_iterations > 200000: 128 | generator_learning_rate = max(0, generator_learning_rate - generator_learning_rate_decay) 129 | discriminator_learning_rate 
= max(0, discriminator_learning_rate - discriminator_learning_rate_decay) 130 | 131 | start = i * mini_batch_size 132 | end = (i + 1) * mini_batch_size 133 | 134 | generator_loss, discriminator_loss = model.train(input_A = dataset_A[start:end], input_B = dataset_B[start:end], lambda_cycle = lambda_cycle, lambda_identity = lambda_identity, generator_learning_rate = generator_learning_rate, discriminator_learning_rate = discriminator_learning_rate) 135 | 136 | if i % 50 == 0: 137 | #print('Iteration: %d, Generator Loss : %f, Discriminator Loss : %f' % (num_iterations, generator_loss, discriminator_loss)) 138 | print('Iteration: {:07d}, Generator Learning Rate: {:.7f}, Discriminator Learning Rate: {:.7f}, Generator Loss : {:.3f}, Discriminator Loss : {:.3f}'.format(num_iterations, generator_learning_rate, discriminator_learning_rate, generator_loss, discriminator_loss)) 139 | 140 | model.save(directory = model_dir, filename = model_name) 141 | 142 | end_time_epoch = time.time() 143 | time_elapsed_epoch = end_time_epoch - start_time_epoch 144 | 145 | print('Time Elapsed for This Epoch: %02d:%02d:%02d' % (time_elapsed_epoch // 3600, (time_elapsed_epoch % 3600 // 60), (time_elapsed_epoch % 60 // 1))) 146 | 147 | 148 | if __name__ == '__main__': 149 | 150 | parser = argparse.ArgumentParser(description = 'Train CycleGAN model for datasets.') 151 | 152 | train_A_dir_default = './data/training/NEUTRAL' 153 | train_B_dir_default = './data/training/SURPRISE' 154 | model_dir_default = './model/neutral_surprise_f0' 155 | model_name_default = 'neutral_to_surprise_f0.ckpt' 156 | random_seed_default = 0 157 | validation_A_dir_default = './data/evaluation_all/NEUTRAL' 158 | validation_B_dir_default = './data/evaluation_all/SURPRISE' 159 | output_dir_default = './validation_output' 160 | tensorboard_log_dir_default = './log' 161 | 162 | parser.add_argument('--train_A_dir', type = str, help = 'Directory for A.', default = train_A_dir_default) 163 | parser.add_argument('--train_B_dir', type = str, help = 'Directory for B.', default = train_B_dir_default) 164 | parser.add_argument('--model_dir', type = str, help = 'Directory for saving models.', default = model_dir_default) 165 | parser.add_argument('--model_name', type = str, help = 'File name for saving model.', default = model_name_default) 166 | parser.add_argument('--random_seed', type = int, help = 'Random seed for model training.', default = random_seed_default) 167 | parser.add_argument('--validation_A_dir', type = str, help = 'Convert validation A after each training epoch. If set none, no conversion would be done during the training.', default = validation_A_dir_default) 168 | parser.add_argument('--validation_B_dir', type = str, help = 'Convert validation B after each training epoch. 
If set none, no conversion would be done during the training.', default = validation_B_dir_default) 169 | parser.add_argument('--output_dir', type = str, help = 'Output directory for converted validation voices.', default = output_dir_default) 170 | parser.add_argument('--tensorboard_log_dir', type = str, help = 'TensorBoard log directory.', default = tensorboard_log_dir_default) 171 | 172 | argv = parser.parse_args() 173 | 174 | train_A_dir = argv.train_A_dir 175 | train_B_dir = argv.train_B_dir 176 | model_dir = argv.model_dir 177 | model_name = argv.model_name 178 | random_seed = argv.random_seed 179 | validation_A_dir = None if argv.validation_A_dir == 'None' or argv.validation_A_dir == 'none' else argv.validation_A_dir 180 | validation_B_dir = None if argv.validation_B_dir == 'None' or argv.validation_B_dir == 'none' else argv.validation_B_dir 181 | output_dir = argv.output_dir 182 | tensorboard_log_dir = argv.tensorboard_log_dir 183 | 184 | train(train_A_dir = train_A_dir, train_B_dir = train_B_dir, model_dir = model_dir, model_name = model_name, random_seed = random_seed, validation_A_dir = validation_A_dir, validation_B_dir = validation_B_dir, output_dir = output_dir, tensorboard_log_dir = tensorboard_log_dir) 185 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/module.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def gated_linear_layer(inputs, gates, name = None): 4 | 5 | activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name) 6 | 7 | return activation 8 | 9 | def instance_norm_layer( 10 | inputs, 11 | epsilon = 1e-06, 12 | activation_fn = None, 13 | name = None): 14 | 15 | instance_norm_layer = tf.contrib.layers.instance_norm( 16 | inputs = inputs, 17 | epsilon = epsilon, 18 | activation_fn = activation_fn) 19 | 20 | return instance_norm_layer 21 | 22 | def conv1d_layer( 23 | inputs, 24 | filters, 25 | kernel_size, 26 | strides = 1, 27 | padding = 'same', 28 | activation = None, 29 | kernel_initializer = None, 30 | name = None): 31 | 32 | conv_layer = tf.layers.conv1d( 33 | inputs = inputs, 34 | filters = filters, 35 | kernel_size = kernel_size, 36 | strides = strides, 37 | padding = padding, 38 | activation = activation, 39 | kernel_initializer = kernel_initializer, 40 | name = name) 41 | 42 | return conv_layer 43 | 44 | def conv2d_layer( 45 | inputs, 46 | filters, 47 | kernel_size, 48 | strides, 49 | padding = 'same', 50 | activation = None, 51 | kernel_initializer = None, 52 | name = None): 53 | 54 | conv_layer = tf.layers.conv2d( 55 | inputs = inputs, 56 | filters = filters, 57 | kernel_size = kernel_size, 58 | strides = strides, 59 | padding = padding, 60 | activation = activation, 61 | kernel_initializer = kernel_initializer, 62 | name = name) 63 | 64 | return conv_layer 65 | 66 | def residual1d_block( 67 | inputs, 68 | filters = 1024, 69 | kernel_size = 3, 70 | strides = 1, 71 | name_prefix = 'residule_block_'): 72 | 73 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 74 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 75 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 76 | h1_norm_gates = instance_norm_layer(inputs = 
h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 77 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 78 | h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv') 79 | h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm') 80 | 81 | h3 = inputs + h2_norm 82 | 83 | return h3 84 | 85 | def downsample1d_block( 86 | inputs, 87 | filters, 88 | kernel_size, 89 | strides, 90 | name_prefix = 'downsample1d_block_'): 91 | 92 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 93 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 94 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 95 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 96 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 97 | 98 | return h1_glu 99 | 100 | def downsample2d_block( 101 | inputs, 102 | filters, 103 | kernel_size, 104 | strides, 105 | name_prefix = 'downsample2d_block_'): 106 | 107 | h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 108 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 109 | h1_gates = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 110 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 111 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 112 | 113 | return h1_glu 114 | 115 | def upsample1d_block( 116 | inputs, 117 | filters, 118 | kernel_size, 119 | strides, 120 | shuffle_size = 2, 121 | name_prefix = 'upsample1d_block_'): 122 | 123 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 124 | h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle') 125 | h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm') 126 | 127 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 128 | h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates') 129 | h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 130 | 131 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 132 | 133 | return h1_glu 134 | 135 | def pixel_shuffler(inputs, shuffle_size = 2, name = None): 136 | 137 | n = tf.shape(inputs)[0] 138 | w = tf.shape(inputs)[1] 139 | c = inputs.get_shape().as_list()[2] 140 | 141 | oc = c // shuffle_size 142 | ow = w * shuffle_size 143 | 144 | outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], 
name = name) 145 | 146 | return outputs 147 | 148 | def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'): 149 | 150 | # inputs has shape [batch_size, num_features, time] 151 | # we need to convert it to [batch_size, time, num_features] for 1D convolution 152 | inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose') 153 | 154 | with tf.variable_scope(scope_name) as scope: 155 | # Discriminator would be reused in CycleGAN 156 | if reuse: 157 | scope.reuse_variables() 158 | else: 159 | assert scope.reuse is False 160 | 161 | h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv') 162 | h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates') 163 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 164 | 165 | # Downsample 166 | d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_') 167 | d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_') 168 | 169 | # Residual blocks 170 | r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_') 171 | r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_') 172 | r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_') 173 | r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_') 174 | r5 = residual1d_block(inputs = r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_') 175 | r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_') 176 | 177 | # Upsample 178 | u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_') 179 | u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_') 180 | 181 | # Output 182 | o1 = conv1d_layer(inputs = u2, filters = 24, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv') 183 | o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose') 184 | 185 | return o2 186 | 187 | 188 | def discriminator(inputs, reuse = False, scope_name = 'discriminator'): 189 | 190 | # inputs has shape [batch_size, num_features, time] 191 | # we need to add channel for 2D convolution [batch_size, num_features, time, 1] 192 | inputs = tf.expand_dims(inputs, -1) 193 | 194 | with tf.variable_scope(scope_name) as scope: 195 | # Discriminator would be reused in CycleGAN 196 | if reuse: 197 | scope.reuse_variables() 198 | else: 199 | assert scope.reuse is False 200 | 201 | h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv') 202 | h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates') 203 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 204 | 205 | # Downsample 206 | d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_') 207 | d2 = downsample2d_block(inputs = d1, 
filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_') 208 | d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_') 209 | 210 | # Output 211 | o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid) 212 | 213 | return o1 214 | 215 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/module_f0.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def gated_linear_layer(inputs, gates, name = None): 4 | 5 | activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name) 6 | 7 | return activation 8 | 9 | def instance_norm_layer( 10 | inputs, 11 | epsilon = 1e-06, 12 | activation_fn = None, 13 | name = None): 14 | 15 | instance_norm_layer = tf.contrib.layers.instance_norm( 16 | inputs = inputs, 17 | epsilon = epsilon, 18 | activation_fn = activation_fn) 19 | 20 | return instance_norm_layer 21 | 22 | def conv1d_layer( 23 | inputs, 24 | filters, 25 | kernel_size, 26 | strides = 1, 27 | padding = 'same', 28 | activation = None, 29 | kernel_initializer = None, 30 | name = None): 31 | 32 | conv_layer = tf.layers.conv1d( 33 | inputs = inputs, 34 | filters = filters, 35 | kernel_size = kernel_size, 36 | strides = strides, 37 | padding = padding, 38 | activation = activation, 39 | kernel_initializer = kernel_initializer, 40 | name = name) 41 | 42 | return conv_layer 43 | 44 | def conv2d_layer( 45 | inputs, 46 | filters, 47 | kernel_size, 48 | strides, 49 | padding = 'same', 50 | activation = None, 51 | kernel_initializer = None, 52 | name = None): 53 | 54 | conv_layer = tf.layers.conv2d( 55 | inputs = inputs, 56 | filters = filters, 57 | kernel_size = kernel_size, 58 | strides = strides, 59 | padding = padding, 60 | activation = activation, 61 | kernel_initializer = kernel_initializer, 62 | name = name) 63 | 64 | return conv_layer 65 | 66 | def residual1d_block( 67 | inputs, 68 | filters = 1024, 69 | kernel_size = 3, 70 | strides = 1, 71 | name_prefix = 'residule_block_'): 72 | 73 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 74 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 75 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 76 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 77 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 78 | h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv') 79 | h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm') 80 | 81 | h3 = inputs + h2_norm 82 | 83 | return h3 84 | 85 | def downsample1d_block( 86 | inputs, 87 | filters, 88 | kernel_size, 89 | strides, 90 | name_prefix = 'downsample1d_block_'): 91 | 92 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 93 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 
'h1_norm') 94 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 95 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 96 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 97 | 98 | return h1_glu 99 | 100 | def downsample2d_block( 101 | inputs, 102 | filters, 103 | kernel_size, 104 | strides, 105 | name_prefix = 'downsample2d_block_'): 106 | 107 | h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 108 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 109 | h1_gates = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 110 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 111 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 112 | 113 | return h1_glu 114 | 115 | def upsample1d_block( 116 | inputs, 117 | filters, 118 | kernel_size, 119 | strides, 120 | shuffle_size = 2, 121 | name_prefix = 'upsample1d_block_'): 122 | 123 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 124 | h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle') 125 | h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm') 126 | 127 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 128 | h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates') 129 | h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 130 | 131 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 132 | 133 | return h1_glu 134 | 135 | def pixel_shuffler(inputs, shuffle_size = 2, name = None): 136 | 137 | n = tf.shape(inputs)[0] 138 | w = tf.shape(inputs)[1] 139 | c = inputs.get_shape().as_list()[2] 140 | 141 | oc = c // shuffle_size 142 | ow = w * shuffle_size 143 | 144 | outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], name = name) 145 | 146 | return outputs 147 | 148 | def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'): 149 | 150 | # inputs has shape [batch_size, num_features, time] 151 | # we need to convert it to [batch_size, time, num_features] for 1D convolution 152 | inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose') 153 | 154 | with tf.variable_scope(scope_name) as scope: 155 | # Discriminator would be reused in CycleGAN 156 | if reuse: 157 | scope.reuse_variables() 158 | else: 159 | assert scope.reuse is False 160 | 161 | h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv') 162 | h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates') 163 | h1_glu = gated_linear_layer(inputs = h1, 
gates = h1_gates, name = 'h1_glu') 164 | 165 | # Downsample 166 | d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_') 167 | d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_') 168 | 169 | # Residual blocks 170 | r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_') 171 | r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_') 172 | r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_') 173 | r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_') 174 | r5 = residual1d_block(inputs = r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_') 175 | r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_') 176 | 177 | # Upsample 178 | u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_') 179 | u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_') 180 | 181 | # Output 182 | o1 = conv1d_layer(inputs = u2, filters = 10, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv') 183 | o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose') 184 | 185 | return o2 186 | 187 | 188 | def discriminator(inputs, reuse = False, scope_name = 'discriminator'): 189 | 190 | # inputs has shape [batch_size, num_features, time] 191 | # we need to add channel for 2D convolution [batch_size, num_features, time, 1] 192 | inputs = tf.expand_dims(inputs, -1) 193 | 194 | with tf.variable_scope(scope_name) as scope: 195 | # Discriminator would be reused in CycleGAN 196 | if reuse: 197 | scope.reuse_variables() 198 | else: 199 | assert scope.reuse is False 200 | 201 | h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv') 202 | h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates') 203 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 204 | 205 | # Downsample 206 | d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_') 207 | d2 = downsample2d_block(inputs = d1, filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_') 208 | d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_') 209 | 210 | # Output 211 | o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid) 212 | 213 | return o1 214 | 215 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/module_mceps.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def gated_linear_layer(inputs, gates, name = None): 4 | 5 | activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name) 6 | 7 | return activation 8 | 9 | def instance_norm_layer( 10 | inputs, 11 | epsilon = 1e-06, 
12 | activation_fn = None, 13 | name = None): 14 | 15 | instance_norm_layer = tf.contrib.layers.instance_norm( 16 | inputs = inputs, 17 | epsilon = epsilon, 18 | activation_fn = activation_fn) 19 | 20 | return instance_norm_layer 21 | 22 | def conv1d_layer( 23 | inputs, 24 | filters, 25 | kernel_size, 26 | strides = 1, 27 | padding = 'same', 28 | activation = None, 29 | kernel_initializer = None, 30 | name = None): 31 | 32 | conv_layer = tf.layers.conv1d( 33 | inputs = inputs, 34 | filters = filters, 35 | kernel_size = kernel_size, 36 | strides = strides, 37 | padding = padding, 38 | activation = activation, 39 | kernel_initializer = kernel_initializer, 40 | name = name) 41 | 42 | return conv_layer 43 | 44 | def conv2d_layer( 45 | inputs, 46 | filters, 47 | kernel_size, 48 | strides, 49 | padding = 'same', 50 | activation = None, 51 | kernel_initializer = None, 52 | name = None): 53 | 54 | conv_layer = tf.layers.conv2d( 55 | inputs = inputs, 56 | filters = filters, 57 | kernel_size = kernel_size, 58 | strides = strides, 59 | padding = padding, 60 | activation = activation, 61 | kernel_initializer = kernel_initializer, 62 | name = name) 63 | 64 | return conv_layer 65 | 66 | def residual1d_block( 67 | inputs, 68 | filters = 1024, 69 | kernel_size = 3, 70 | strides = 1, 71 | name_prefix = 'residule_block_'): 72 | 73 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 74 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 75 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 76 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 77 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 78 | h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv') 79 | h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm') 80 | 81 | h3 = inputs + h2_norm 82 | 83 | return h3 84 | 85 | def downsample1d_block( 86 | inputs, 87 | filters, 88 | kernel_size, 89 | strides, 90 | name_prefix = 'downsample1d_block_'): 91 | 92 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 93 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 94 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 95 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 96 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 97 | 98 | return h1_glu 99 | 100 | def downsample2d_block( 101 | inputs, 102 | filters, 103 | kernel_size, 104 | strides, 105 | name_prefix = 'downsample2d_block_'): 106 | 107 | h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 108 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 109 | h1_gates = conv2d_layer(inputs = 
inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 110 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 111 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 112 | 113 | return h1_glu 114 | 115 | def upsample1d_block( 116 | inputs, 117 | filters, 118 | kernel_size, 119 | strides, 120 | shuffle_size = 2, 121 | name_prefix = 'upsample1d_block_'): 122 | 123 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 124 | h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle') 125 | h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm') 126 | 127 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 128 | h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates') 129 | h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 130 | 131 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 132 | 133 | return h1_glu 134 | 135 | def pixel_shuffler(inputs, shuffle_size = 2, name = None): 136 | 137 | n = tf.shape(inputs)[0] 138 | w = tf.shape(inputs)[1] 139 | c = inputs.get_shape().as_list()[2] 140 | 141 | oc = c // shuffle_size 142 | ow = w * shuffle_size 143 | 144 | outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], name = name) 145 | 146 | return outputs 147 | 148 | def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'): 149 | 150 | # inputs has shape [batch_size, num_features, time] 151 | # we need to convert it to [batch_size, time, num_features] for 1D convolution 152 | inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose') 153 | 154 | with tf.variable_scope(scope_name) as scope: 155 | # Discriminator would be reused in CycleGAN 156 | if reuse: 157 | scope.reuse_variables() 158 | else: 159 | assert scope.reuse is False 160 | 161 | h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv') 162 | h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates') 163 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 164 | 165 | # Downsample 166 | d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_') 167 | d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_') 168 | 169 | # Residual blocks 170 | r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_') 171 | r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_') 172 | r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_') 173 | r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_') 174 | r5 = residual1d_block(inputs = 
r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_') 175 | r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_') 176 | 177 | # Upsample 178 | u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_') 179 | u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_') 180 | 181 | # Output 182 | o1 = conv1d_layer(inputs = u2, filters = 24, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv') 183 | o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose') 184 | 185 | return o2 186 | 187 | 188 | def discriminator(inputs, reuse = False, scope_name = 'discriminator'): 189 | 190 | # inputs has shape [batch_size, num_features, time] 191 | # we need to add channel for 2D convolution [batch_size, num_features, time, 1] 192 | inputs = tf.expand_dims(inputs, -1) 193 | 194 | with tf.variable_scope(scope_name) as scope: 195 | # Discriminator would be reused in CycleGAN 196 | if reuse: 197 | scope.reuse_variables() 198 | else: 199 | assert scope.reuse is False 200 | 201 | h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv') 202 | h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates') 203 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 204 | 205 | # Downsample 206 | d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_') 207 | d2 = downsample2d_block(inputs = d1, filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_') 208 | d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_') 209 | 210 | # Output 211 | o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid) 212 | 213 | return o1 214 | 215 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/model.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorflow as tf 3 | from module import discriminator, generator_gatedcnn 4 | from utils import l1_loss, l2_loss, cross_entropy_loss 5 | from datetime import datetime 6 | 7 | class CycleGAN(object): 8 | 9 | def __init__(self, num_features, discriminator = discriminator, generator = generator_gatedcnn, mode = 'train', log_dir = './log'): 10 | 11 | self.num_features = num_features 12 | self.input_shape = [None, num_features, None] # [batch_size, num_features, num_frames] 13 | 14 | self.discriminator = discriminator 15 | self.generator = generator 16 | self.mode = mode 17 | 18 | self.build_model() 19 | self.optimizer_initializer() 20 | 21 | self.saver = tf.train.Saver() 22 | self.sess = tf.Session() 23 | self.sess.run(tf.global_variables_initializer()) 24 | 25 | if self.mode == 'train': 26 | self.train_step = 0 27 | now = datetime.now() 28 | self.log_dir = os.path.join(log_dir, now.strftime('%Y%m%d-%H%M%S')) 29 | self.writer = tf.summary.FileWriter(self.log_dir, tf.get_default_graph()) 30 | self.generator_summaries, self.discriminator_summaries = self.summary() 31 | 32 | def build_model(self): 33 | 34 | # 
Placeholders for real training samples 35 | self.input_A_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_real') 36 | self.input_B_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_real') 37 | # Placeholders for fake generated samples 38 | self.input_A_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_fake') 39 | self.input_B_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_fake') 40 | # Placeholder for test samples 41 | self.input_A_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_test') 42 | self.input_B_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_test') 43 | 44 | self.generation_B = self.generator(inputs = self.input_A_real, reuse = False, scope_name = 'generator_A2B') 45 | self.cycle_A = self.generator(inputs = self.generation_B, reuse = False, scope_name = 'generator_B2A') 46 | 47 | self.generation_A = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_B2A') 48 | self.cycle_B = self.generator(inputs = self.generation_A, reuse = True, scope_name = 'generator_A2B') 49 | 50 | self.generation_A_identity = self.generator(inputs = self.input_A_real, reuse = True, scope_name = 'generator_B2A') 51 | self.generation_B_identity = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_A2B') 52 | 53 | self.discrimination_A_fake = self.discriminator(inputs = self.generation_A, reuse = False, scope_name = 'discriminator_A') 54 | self.discrimination_B_fake = self.discriminator(inputs = self.generation_B, reuse = False, scope_name = 'discriminator_B') 55 | 56 | # Cycle loss 57 | self.cycle_loss = l1_loss(y = self.input_A_real, y_hat = self.cycle_A) + l1_loss(y = self.input_B_real, y_hat = self.cycle_B) 58 | 59 | # Identity loss 60 | self.identity_loss = l1_loss(y = self.input_A_real, y_hat = self.generation_A_identity) + l1_loss(y = self.input_B_real, y_hat = self.generation_B_identity) 61 | 62 | # Place holder for lambda_cycle and lambda_identity 63 | self.lambda_cycle = tf.placeholder(tf.float32, None, name = 'lambda_cycle') 64 | self.lambda_identity = tf.placeholder(tf.float32, None, name = 'lambda_identity') 65 | 66 | # Generator loss 67 | # Generator wants to fool discriminator 68 | self.generator_loss_A2B = l2_loss(y = tf.ones_like(self.discrimination_B_fake), y_hat = self.discrimination_B_fake) 69 | self.generator_loss_B2A = l2_loss(y = tf.ones_like(self.discrimination_A_fake), y_hat = self.discrimination_A_fake) 70 | 71 | # Merge the two generators and the cycle loss 72 | self.generator_loss = self.generator_loss_A2B + self.generator_loss_B2A + self.lambda_cycle * self.cycle_loss + self.lambda_identity * self.identity_loss 73 | 74 | # Discriminator loss 75 | self.discrimination_input_A_real = self.discriminator(inputs = self.input_A_real, reuse = True, scope_name = 'discriminator_A') 76 | self.discrimination_input_B_real = self.discriminator(inputs = self.input_B_real, reuse = True, scope_name = 'discriminator_B') 77 | self.discrimination_input_A_fake = self.discriminator(inputs = self.input_A_fake, reuse = True, scope_name = 'discriminator_A') 78 | self.discrimination_input_B_fake = self.discriminator(inputs = self.input_B_fake, reuse = True, scope_name = 'discriminator_B') 79 | 80 | # Discriminator wants to classify real and fake correctly 81 | self.discriminator_loss_input_A_real = l2_loss(y = tf.ones_like(self.discrimination_input_A_real), y_hat = 
self.discrimination_input_A_real) 82 | self.discriminator_loss_input_A_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_A_fake), y_hat = self.discrimination_input_A_fake) 83 | self.discriminator_loss_A = (self.discriminator_loss_input_A_real + self.discriminator_loss_input_A_fake) / 2 84 | 85 | self.discriminator_loss_input_B_real = l2_loss(y = tf.ones_like(self.discrimination_input_B_real), y_hat = self.discrimination_input_B_real) 86 | self.discriminator_loss_input_B_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_B_fake), y_hat = self.discrimination_input_B_fake) 87 | self.discriminator_loss_B = (self.discriminator_loss_input_B_real + self.discriminator_loss_input_B_fake) / 2 88 | 89 | # Merge the two discriminators into one 90 | self.discriminator_loss = self.discriminator_loss_A + self.discriminator_loss_B 91 | 92 | # Categorize variables because we have to optimize the two sets of the variables separately 93 | trainable_variables = tf.trainable_variables() 94 | self.discriminator_vars = [var for var in trainable_variables if 'discriminator' in var.name] 95 | self.generator_vars = [var for var in trainable_variables if 'generator' in var.name] 96 | #for var in t_vars: print(var.name) 97 | 98 | # Reserved for test 99 | self.generation_B_test = self.generator(inputs = self.input_A_test, reuse = True, scope_name = 'generator_A2B') 100 | self.generation_A_test = self.generator(inputs = self.input_B_test, reuse = True, scope_name = 'generator_B2A') 101 | 102 | 103 | def optimizer_initializer(self): 104 | 105 | self.generator_learning_rate = tf.placeholder(tf.float32, None, name = 'generator_learning_rate') 106 | self.discriminator_learning_rate = tf.placeholder(tf.float32, None, name = 'discriminator_learning_rate') 107 | self.discriminator_optimizer = tf.train.AdamOptimizer(learning_rate = self.discriminator_learning_rate, beta1 = 0.5).minimize(self.discriminator_loss, var_list = self.discriminator_vars) 108 | self.generator_optimizer = tf.train.AdamOptimizer(learning_rate = self.generator_learning_rate, beta1 = 0.5).minimize(self.generator_loss, var_list = self.generator_vars) 109 | 110 | def train(self, input_A, input_B, lambda_cycle, lambda_identity, generator_learning_rate, discriminator_learning_rate): 111 | 112 | generation_A, generation_B, generator_loss, _, generator_summaries = self.sess.run( 113 | [self.generation_A, self.generation_B, self.generator_loss, self.generator_optimizer, self.generator_summaries], \ 114 | feed_dict = {self.lambda_cycle: lambda_cycle, self.lambda_identity: lambda_identity, self.input_A_real: input_A, self.input_B_real: input_B, self.generator_learning_rate: generator_learning_rate}) 115 | 116 | self.writer.add_summary(generator_summaries, self.train_step) 117 | 118 | discriminator_loss, _, discriminator_summaries = self.sess.run([self.discriminator_loss, self.discriminator_optimizer, self.discriminator_summaries], \ 119 | feed_dict = {self.input_A_real: input_A, self.input_B_real: input_B, self.discriminator_learning_rate: discriminator_learning_rate, self.input_A_fake: generation_A, self.input_B_fake: generation_B}) 120 | 121 | self.writer.add_summary(discriminator_summaries, self.train_step) 122 | 123 | self.train_step += 1 124 | 125 | return generator_loss, discriminator_loss 126 | 127 | 128 | def test(self, inputs, direction): 129 | 130 | if direction == 'A2B': 131 | generation = self.sess.run(self.generation_B_test, feed_dict = {self.input_A_test: inputs}) 132 | elif direction == 'B2A': 133 | generation = 
self.sess.run(self.generation_A_test, feed_dict = {self.input_B_test: inputs}) 134 | else: 135 | raise Exception('Conversion direction must be specified.') 136 | 137 | return generation 138 | 139 | 140 | def save(self, directory, filename): 141 | 142 | if not os.path.exists(directory): 143 | os.makedirs(directory) 144 | self.saver.save(self.sess, os.path.join(directory, filename)) 145 | 146 | return os.path.join(directory, filename) 147 | 148 | def load(self, filepath): 149 | 150 | self.saver.restore(self.sess, filepath) 151 | 152 | 153 | def summary(self): 154 | 155 | with tf.name_scope('generator_summaries'): 156 | cycle_loss_summary = tf.summary.scalar('cycle_loss', self.cycle_loss) 157 | identity_loss_summary = tf.summary.scalar('identity_loss', self.identity_loss) 158 | generator_loss_A2B_summary = tf.summary.scalar('generator_loss_A2B', self.generator_loss_A2B) 159 | generator_loss_B2A_summary = tf.summary.scalar('generator_loss_B2A', self.generator_loss_B2A) 160 | generator_loss_summary = tf.summary.scalar('generator_loss', self.generator_loss) 161 | generator_summaries = tf.summary.merge([cycle_loss_summary, identity_loss_summary, generator_loss_A2B_summary, generator_loss_B2A_summary, generator_loss_summary]) 162 | 163 | with tf.name_scope('discriminator_summaries'): 164 | discriminator_loss_A_summary = tf.summary.scalar('discriminator_loss_A', self.discriminator_loss_A) 165 | discriminator_loss_B_summary = tf.summary.scalar('discriminator_loss_B', self.discriminator_loss_B) 166 | discriminator_loss_summary = tf.summary.scalar('discriminator_loss', self.discriminator_loss) 167 | discriminator_summaries = tf.summary.merge([discriminator_loss_A_summary, discriminator_loss_B_summary, discriminator_loss_summary]) 168 | 169 | return generator_summaries, discriminator_summaries 170 | 171 | 172 | if __name__ == '__main__': 173 | 174 | model = CycleGAN(num_features = 24) 175 | print('Graph Compile Successeded.') 176 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/model_f0.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorflow as tf 3 | from module_f0 import discriminator, generator_gatedcnn 4 | from utils import l1_loss, l2_loss, cross_entropy_loss 5 | from datetime import datetime 6 | 7 | class CycleGAN(object): 8 | 9 | def __init__(self, num_features, discriminator = discriminator, generator = generator_gatedcnn, mode = 'train', log_dir = './log'): 10 | 11 | self.num_features = num_features 12 | self.input_shape = [None, num_features, None] # [batch_size, num_features, num_frames] 13 | 14 | self.discriminator = discriminator 15 | self.generator = generator 16 | self.mode = mode 17 | 18 | self.build_model() 19 | self.optimizer_initializer() 20 | 21 | self.saver = tf.train.Saver() 22 | self.sess = tf.Session() 23 | self.sess.run(tf.global_variables_initializer()) 24 | 25 | if self.mode == 'train': 26 | self.train_step = 0 27 | now = datetime.now() 28 | self.log_dir = os.path.join(log_dir, now.strftime('%Y%m%d-%H%M%S')) 29 | self.writer = tf.summary.FileWriter(self.log_dir, tf.get_default_graph()) 30 | self.generator_summaries, self.discriminator_summaries = self.summary() 31 | 32 | def build_model(self): 33 | 34 | # Placeholders for real training samples 35 | self.input_A_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_real') 36 | self.input_B_real = tf.placeholder(tf.float32,
shape = self.input_shape, name = 'input_B_real') 37 | # Placeholders for fake generated samples 38 | self.input_A_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_fake') 39 | self.input_B_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_fake') 40 | # Placeholder for test samples 41 | self.input_A_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_test') 42 | self.input_B_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_test') 43 | 44 | self.generation_B = self.generator(inputs = self.input_A_real, reuse = False, scope_name = 'generator_A2B') 45 | self.cycle_A = self.generator(inputs = self.generation_B, reuse = False, scope_name = 'generator_B2A') 46 | 47 | self.generation_A = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_B2A') 48 | self.cycle_B = self.generator(inputs = self.generation_A, reuse = True, scope_name = 'generator_A2B') 49 | 50 | self.generation_A_identity = self.generator(inputs = self.input_A_real, reuse = True, scope_name = 'generator_B2A') 51 | self.generation_B_identity = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_A2B') 52 | 53 | self.discrimination_A_fake = self.discriminator(inputs = self.generation_A, reuse = False, scope_name = 'discriminator_A') 54 | self.discrimination_B_fake = self.discriminator(inputs = self.generation_B, reuse = False, scope_name = 'discriminator_B') 55 | 56 | # Cycle loss 57 | self.cycle_loss = l1_loss(y = self.input_A_real, y_hat = self.cycle_A) + l1_loss(y = self.input_B_real, y_hat = self.cycle_B) 58 | 59 | # Identity loss 60 | self.identity_loss = l1_loss(y = self.input_A_real, y_hat = self.generation_A_identity) + l1_loss(y = self.input_B_real, y_hat = self.generation_B_identity) 61 | 62 | # Place holder for lambda_cycle and lambda_identity 63 | self.lambda_cycle = tf.placeholder(tf.float32, None, name = 'lambda_cycle') 64 | self.lambda_identity = tf.placeholder(tf.float32, None, name = 'lambda_identity') 65 | 66 | # Generator loss 67 | # Generator wants to fool discriminator 68 | self.generator_loss_A2B = l2_loss(y = tf.ones_like(self.discrimination_B_fake), y_hat = self.discrimination_B_fake) 69 | self.generator_loss_B2A = l2_loss(y = tf.ones_like(self.discrimination_A_fake), y_hat = self.discrimination_A_fake) 70 | 71 | # Merge the two generators and the cycle loss 72 | self.generator_loss = self.generator_loss_A2B + self.generator_loss_B2A + self.lambda_cycle * self.cycle_loss + self.lambda_identity * self.identity_loss 73 | 74 | # Discriminator loss 75 | self.discrimination_input_A_real = self.discriminator(inputs = self.input_A_real, reuse = True, scope_name = 'discriminator_A') 76 | self.discrimination_input_B_real = self.discriminator(inputs = self.input_B_real, reuse = True, scope_name = 'discriminator_B') 77 | self.discrimination_input_A_fake = self.discriminator(inputs = self.input_A_fake, reuse = True, scope_name = 'discriminator_A') 78 | self.discrimination_input_B_fake = self.discriminator(inputs = self.input_B_fake, reuse = True, scope_name = 'discriminator_B') 79 | 80 | # Discriminator wants to classify real and fake correctly 81 | self.discriminator_loss_input_A_real = l2_loss(y = tf.ones_like(self.discrimination_input_A_real), y_hat = self.discrimination_input_A_real) 82 | self.discriminator_loss_input_A_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_A_fake), y_hat = self.discrimination_input_A_fake) 83 | self.discriminator_loss_A = 
(self.discriminator_loss_input_A_real + self.discriminator_loss_input_A_fake) / 2 84 | 85 | self.discriminator_loss_input_B_real = l2_loss(y = tf.ones_like(self.discrimination_input_B_real), y_hat = self.discrimination_input_B_real) 86 | self.discriminator_loss_input_B_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_B_fake), y_hat = self.discrimination_input_B_fake) 87 | self.discriminator_loss_B = (self.discriminator_loss_input_B_real + self.discriminator_loss_input_B_fake) / 2 88 | 89 | # Merge the two discriminators into one 90 | self.discriminator_loss = self.discriminator_loss_A + self.discriminator_loss_B 91 | 92 | # Categorize variables because we have to optimize the two sets of the variables separately 93 | trainable_variables = tf.trainable_variables() 94 | self.discriminator_vars = [var for var in trainable_variables if 'discriminator' in var.name] 95 | self.generator_vars = [var for var in trainable_variables if 'generator' in var.name] 96 | #for var in t_vars: print(var.name) 97 | 98 | # Reserved for test 99 | self.generation_B_test = self.generator(inputs = self.input_A_test, reuse = True, scope_name = 'generator_A2B') 100 | self.generation_A_test = self.generator(inputs = self.input_B_test, reuse = True, scope_name = 'generator_B2A') 101 | 102 | 103 | def optimizer_initializer(self): 104 | 105 | self.generator_learning_rate = tf.placeholder(tf.float32, None, name = 'generator_learning_rate') 106 | self.discriminator_learning_rate = tf.placeholder(tf.float32, None, name = 'discriminator_learning_rate') 107 | self.discriminator_optimizer = tf.train.AdamOptimizer(learning_rate = self.discriminator_learning_rate, beta1 = 0.5).minimize(self.discriminator_loss, var_list = self.discriminator_vars) 108 | self.generator_optimizer = tf.train.AdamOptimizer(learning_rate = self.generator_learning_rate, beta1 = 0.5).minimize(self.generator_loss, var_list = self.generator_vars) 109 | 110 | def train(self, input_A, input_B, lambda_cycle, lambda_identity, generator_learning_rate, discriminator_learning_rate): 111 | 112 | generation_A, generation_B, generator_loss, _, generator_summaries = self.sess.run( 113 | [self.generation_A, self.generation_B, self.generator_loss, self.generator_optimizer, self.generator_summaries], \ 114 | feed_dict = {self.lambda_cycle: lambda_cycle, self.lambda_identity: lambda_identity, self.input_A_real: input_A, self.input_B_real: input_B, self.generator_learning_rate: generator_learning_rate}) 115 | 116 | self.writer.add_summary(generator_summaries, self.train_step) 117 | 118 | discriminator_loss, _, discriminator_summaries = self.sess.run([self.discriminator_loss, self.discriminator_optimizer, self.discriminator_summaries], \ 119 | feed_dict = {self.input_A_real: input_A, self.input_B_real: input_B, self.discriminator_learning_rate: discriminator_learning_rate, self.input_A_fake: generation_A, self.input_B_fake: generation_B}) 120 | 121 | self.writer.add_summary(discriminator_summaries, self.train_step) 122 | 123 | self.train_step += 1 124 | 125 | return generator_loss, discriminator_loss 126 | 127 | 128 | def test(self, inputs, direction): 129 | 130 | if direction == 'A2B': 131 | generation = self.sess.run(self.generation_B_test, feed_dict = {self.input_A_test: inputs}) 132 | elif direction == 'B2A': 133 | generation = self.sess.run(self.generation_A_test, feed_dict = {self.input_B_test: inputs}) 134 | else: 135 | raise Exception('Conversion direction must be specified.') 136 | 137 | return generation 138 | 139 | 140 | def save(self, directory, 
filename): 141 | 142 | if not os.path.exists(directory): 143 | os.makedirs(directory) 144 | self.saver.save(self.sess, os.path.join(directory, filename)) 145 | 146 | return os.path.join(directory, filename) 147 | 148 | def load(self, filepath): 149 | 150 | self.saver.restore(self.sess, filepath) 151 | 152 | 153 | def summary(self): 154 | 155 | with tf.name_scope('generator_summaries'): 156 | cycle_loss_summary = tf.summary.scalar('cycle_loss', self.cycle_loss) 157 | identity_loss_summary = tf.summary.scalar('identity_loss', self.identity_loss) 158 | generator_loss_A2B_summary = tf.summary.scalar('generator_loss_A2B', self.generator_loss_A2B) 159 | generator_loss_B2A_summary = tf.summary.scalar('generator_loss_B2A', self.generator_loss_B2A) 160 | generator_loss_summary = tf.summary.scalar('generator_loss', self.generator_loss) 161 | generator_summaries = tf.summary.merge([cycle_loss_summary, identity_loss_summary, generator_loss_A2B_summary, generator_loss_B2A_summary, generator_loss_summary]) 162 | 163 | with tf.name_scope('discriminator_summaries'): 164 | discriminator_loss_A_summary = tf.summary.scalar('discriminator_loss_A', self.discriminator_loss_A) 165 | discriminator_loss_B_summary = tf.summary.scalar('discriminator_loss_B', self.discriminator_loss_B) 166 | discriminator_loss_summary = tf.summary.scalar('discriminator_loss', self.discriminator_loss) 167 | discriminator_summaries = tf.summary.merge([discriminator_loss_A_summary, discriminator_loss_B_summary, discriminator_loss_summary]) 168 | 169 | return generator_summaries, discriminator_summaries 170 | 171 | 172 | if __name__ == '__main__': 173 | tf.reset_default_graph() 174 | model = CycleGAN(num_features = 10) 175 | print('Graph Compile Successeded.') 176 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/model_mceps.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorflow as tf 3 | from module_mceps import discriminator, generator_gatedcnn 4 | from utils import l1_loss, l2_loss, cross_entropy_loss 5 | from datetime import datetime 6 | 7 | class CycleGAN(object): 8 | 9 | def __init__(self, num_features, discriminator = discriminator, generator = generator_gatedcnn, mode = 'train', log_dir = './log'): 10 | 11 | self.num_features = num_features 12 | self.input_shape = [None, num_features, None] # [batch_size, num_features, num_frames] 13 | 14 | self.discriminator = discriminator 15 | self.generator = generator 16 | self.mode = mode 17 | 18 | self.build_model() 19 | self.optimizer_initializer() 20 | 21 | self.saver = tf.train.Saver() 22 | self.sess = tf.Session() 23 | self.sess.run(tf.global_variables_initializer()) 24 | 25 | if self.mode == 'train': 26 | self.train_step = 0 27 | now = datetime.now() 28 | self.log_dir = os.path.join(log_dir, now.strftime('%Y%m%d-%H%M%S')) 29 | self.writer = tf.summary.FileWriter(self.log_dir, tf.get_default_graph()) 30 | self.generator_summaries, self.discriminator_summaries = self.summary() 31 | 32 | def build_model(self): 33 | tf.reset_default_graph() 34 | # Placeholders for real training samples 35 | self.input_A_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_real') 36 | self.input_B_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_real') 37 | # Placeholders for fake generated samples 38 | self.input_A_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 
'input_A_fake') 39 | self.input_B_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_fake') 40 | # Placeholder for test samples 41 | self.input_A_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_test') 42 | self.input_B_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_test') 43 | 44 | self.generation_B = self.generator(inputs = self.input_A_real, reuse = False, scope_name = 'generator_A2B') 45 | self.cycle_A = self.generator(inputs = self.generation_B, reuse = False, scope_name = 'generator_B2A') 46 | 47 | self.generation_A = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_B2A') 48 | self.cycle_B = self.generator(inputs = self.generation_A, reuse = True, scope_name = 'generator_A2B') 49 | 50 | self.generation_A_identity = self.generator(inputs = self.input_A_real, reuse = True, scope_name = 'generator_B2A') 51 | self.generation_B_identity = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_A2B') 52 | 53 | self.discrimination_A_fake = self.discriminator(inputs = self.generation_A, reuse = False, scope_name = 'discriminator_A') 54 | self.discrimination_B_fake = self.discriminator(inputs = self.generation_B, reuse = False, scope_name = 'discriminator_B') 55 | 56 | # Cycle loss 57 | self.cycle_loss = l1_loss(y = self.input_A_real, y_hat = self.cycle_A) + l1_loss(y = self.input_B_real, y_hat = self.cycle_B) 58 | 59 | # Identity loss 60 | self.identity_loss = l1_loss(y = self.input_A_real, y_hat = self.generation_A_identity) + l1_loss(y = self.input_B_real, y_hat = self.generation_B_identity) 61 | 62 | # Place holder for lambda_cycle and lambda_identity 63 | self.lambda_cycle = tf.placeholder(tf.float32, None, name = 'lambda_cycle') 64 | self.lambda_identity = tf.placeholder(tf.float32, None, name = 'lambda_identity') 65 | 66 | # Generator loss 67 | # Generator wants to fool discriminator 68 | self.generator_loss_A2B = l2_loss(y = tf.ones_like(self.discrimination_B_fake), y_hat = self.discrimination_B_fake) 69 | self.generator_loss_B2A = l2_loss(y = tf.ones_like(self.discrimination_A_fake), y_hat = self.discrimination_A_fake) 70 | 71 | # Merge the two generators and the cycle loss 72 | self.generator_loss = self.generator_loss_A2B + self.generator_loss_B2A + self.lambda_cycle * self.cycle_loss + self.lambda_identity * self.identity_loss 73 | 74 | # Discriminator loss 75 | self.discrimination_input_A_real = self.discriminator(inputs = self.input_A_real, reuse = True, scope_name = 'discriminator_A') 76 | self.discrimination_input_B_real = self.discriminator(inputs = self.input_B_real, reuse = True, scope_name = 'discriminator_B') 77 | self.discrimination_input_A_fake = self.discriminator(inputs = self.input_A_fake, reuse = True, scope_name = 'discriminator_A') 78 | self.discrimination_input_B_fake = self.discriminator(inputs = self.input_B_fake, reuse = True, scope_name = 'discriminator_B') 79 | 80 | # Discriminator wants to classify real and fake correctly 81 | self.discriminator_loss_input_A_real = l2_loss(y = tf.ones_like(self.discrimination_input_A_real), y_hat = self.discrimination_input_A_real) 82 | self.discriminator_loss_input_A_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_A_fake), y_hat = self.discrimination_input_A_fake) 83 | self.discriminator_loss_A = (self.discriminator_loss_input_A_real + self.discriminator_loss_input_A_fake) / 2 84 | 85 | self.discriminator_loss_input_B_real = l2_loss(y = 
tf.ones_like(self.discrimination_input_B_real), y_hat = self.discrimination_input_B_real) 86 | self.discriminator_loss_input_B_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_B_fake), y_hat = self.discrimination_input_B_fake) 87 | self.discriminator_loss_B = (self.discriminator_loss_input_B_real + self.discriminator_loss_input_B_fake) / 2 88 | 89 | # Merge the two discriminators into one 90 | self.discriminator_loss = self.discriminator_loss_A + self.discriminator_loss_B 91 | 92 | # Categorize variables because we have to optimize the two sets of the variables separately 93 | trainable_variables = tf.trainable_variables() 94 | self.discriminator_vars = [var for var in trainable_variables if 'discriminator' in var.name] 95 | self.generator_vars = [var for var in trainable_variables if 'generator' in var.name] 96 | #for var in t_vars: print(var.name) 97 | 98 | # Reserved for test 99 | self.generation_B_test = self.generator(inputs = self.input_A_test, reuse = True, scope_name = 'generator_A2B') 100 | self.generation_A_test = self.generator(inputs = self.input_B_test, reuse = True, scope_name = 'generator_B2A') 101 | 102 | 103 | def optimizer_initializer(self): 104 | 105 | self.generator_learning_rate = tf.placeholder(tf.float32, None, name = 'generator_learning_rate') 106 | self.discriminator_learning_rate = tf.placeholder(tf.float32, None, name = 'discriminator_learning_rate') 107 | self.discriminator_optimizer = tf.train.AdamOptimizer(learning_rate = self.discriminator_learning_rate, beta1 = 0.5).minimize(self.discriminator_loss, var_list = self.discriminator_vars) 108 | self.generator_optimizer = tf.train.AdamOptimizer(learning_rate = self.generator_learning_rate, beta1 = 0.5).minimize(self.generator_loss, var_list = self.generator_vars) 109 | 110 | def train(self, input_A, input_B, lambda_cycle, lambda_identity, generator_learning_rate, discriminator_learning_rate): 111 | 112 | generation_A, generation_B, generator_loss, _, generator_summaries = self.sess.run( 113 | [self.generation_A, self.generation_B, self.generator_loss, self.generator_optimizer, self.generator_summaries], \ 114 | feed_dict = {self.lambda_cycle: lambda_cycle, self.lambda_identity: lambda_identity, self.input_A_real: input_A, self.input_B_real: input_B, self.generator_learning_rate: generator_learning_rate}) 115 | 116 | self.writer.add_summary(generator_summaries, self.train_step) 117 | 118 | discriminator_loss, _, discriminator_summaries = self.sess.run([self.discriminator_loss, self.discriminator_optimizer, self.discriminator_summaries], \ 119 | feed_dict = {self.input_A_real: input_A, self.input_B_real: input_B, self.discriminator_learning_rate: discriminator_learning_rate, self.input_A_fake: generation_A, self.input_B_fake: generation_B}) 120 | 121 | self.writer.add_summary(discriminator_summaries, self.train_step) 122 | 123 | self.train_step += 1 124 | 125 | return generator_loss, discriminator_loss 126 | 127 | 128 | def test(self, inputs, direction): 129 | 130 | if direction == 'A2B': 131 | generation = self.sess.run(self.generation_B_test, feed_dict = {self.input_A_test: inputs}) 132 | elif direction == 'B2A': 133 | generation = self.sess.run(self.generation_A_test, feed_dict = {self.input_B_test: inputs}) 134 | else: 135 | raise Exception('Conversion direction must be specified.') 136 | 137 | return generation 138 | 139 | 140 | def save(self, directory, filename): 141 | 142 | if not os.path.exists(directory): 143 | os.makedirs(directory) 144 | self.saver.save(self.sess, os.path.join(directory, 
filename)) 145 | 146 | return os.path.join(directory, filename) 147 | 148 | def load(self, filepath): 149 | 150 | self.saver.restore(self.sess, filepath) 151 | 152 | 153 | def summary(self): 154 | 155 | with tf.name_scope('generator_summaries'): 156 | cycle_loss_summary = tf.summary.scalar('cycle_loss', self.cycle_loss) 157 | identity_loss_summary = tf.summary.scalar('identity_loss', self.identity_loss) 158 | generator_loss_A2B_summary = tf.summary.scalar('generator_loss_A2B', self.generator_loss_A2B) 159 | generator_loss_B2A_summary = tf.summary.scalar('generator_loss_B2A', self.generator_loss_B2A) 160 | generator_loss_summary = tf.summary.scalar('generator_loss', self.generator_loss) 161 | generator_summaries = tf.summary.merge([cycle_loss_summary, identity_loss_summary, generator_loss_A2B_summary, generator_loss_B2A_summary, generator_loss_summary]) 162 | 163 | with tf.name_scope('discriminator_summaries'): 164 | discriminator_loss_A_summary = tf.summary.scalar('discriminator_loss_A', self.discriminator_loss_A) 165 | discriminator_loss_B_summary = tf.summary.scalar('discriminator_loss_B', self.discriminator_loss_B) 166 | discriminator_loss_summary = tf.summary.scalar('discriminator_loss', self.discriminator_loss) 167 | discriminator_summaries = tf.summary.merge([discriminator_loss_A_summary, discriminator_loss_B_summary, discriminator_loss_summary]) 168 | 169 | return generator_summaries, discriminator_summaries 170 | 171 | 172 | if __name__ == '__main__': 173 | tf.reset_default_graph() 174 | model = CycleGAN(num_features = 24) 175 | print('Graph Compile Successeded.') 176 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import argparse 4 | import time 5 | import librosa 6 | 7 | from preprocess import * 8 | from model import CycleGAN 9 | 10 | 11 | def train(train_A_dir, train_B_dir, model_dir, model_name, random_seed, validation_A_dir, validation_B_dir, output_dir, tensorboard_log_dir): 12 | 13 | np.random.seed(random_seed) 14 | 15 | num_epochs = 5000 16 | mini_batch_size = 1 # mini_batch_size = 1 is better 17 | generator_learning_rate = 0.0002 18 | generator_learning_rate_decay = generator_learning_rate / 200000 19 | discriminator_learning_rate = 0.0001 20 | discriminator_learning_rate_decay = discriminator_learning_rate / 200000 21 | sampling_rate = 16000 22 | num_mcep = 24 23 | frame_period = 5.0 24 | n_frames = 128 25 | lambda_cycle = 10 26 | lambda_identity = 5 27 | 28 | print('Preprocessing Data...') 29 | 30 | start_time = time.time() 31 | 32 | wavs_A = load_wavs(wav_dir = train_A_dir, sr = sampling_rate) 33 | wavs_B = load_wavs(wav_dir = train_B_dir, sr = sampling_rate) 34 | 35 | f0s_A, timeaxes_A, sps_A, aps_A, coded_sps_A = world_encode_data(wavs = wavs_A, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 36 | f0s_B, timeaxes_B, sps_B, aps_B, coded_sps_B = world_encode_data(wavs = wavs_B, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 37 | 38 | log_f0s_mean_A, log_f0s_std_A = logf0_statistics(f0s_A) 39 | log_f0s_mean_B, log_f0s_std_B = logf0_statistics(f0s_B) 40 | 41 | print('Log Pitch A') 42 | print('Mean: %f, Std: %f' %(log_f0s_mean_A, log_f0s_std_A)) 43 | print('Log Pitch B') 44 | print('Mean: %f, Std: %f' %(log_f0s_mean_B, log_f0s_std_B)) 45 | 46 | 47 | coded_sps_A_transposed = transpose_in_list(lst 
= coded_sps_A) 48 | coded_sps_B_transposed = transpose_in_list(lst = coded_sps_B) 49 | 50 | coded_sps_A_norm, coded_sps_A_mean, coded_sps_A_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_A_transposed) 51 | print("Input data fixed.") 52 | coded_sps_B_norm, coded_sps_B_mean, coded_sps_B_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_B_transposed) 53 | 54 | if not os.path.exists(model_dir): 55 | os.makedirs(model_dir) 56 | np.savez(os.path.join(model_dir, 'logf0s_normalization.npz'), mean_A = log_f0s_mean_A, std_A = log_f0s_std_A, mean_B = log_f0s_mean_B, std_B = log_f0s_std_B) 57 | np.savez(os.path.join(model_dir, 'mcep_normalization.npz'), mean_A = coded_sps_A_mean, std_A = coded_sps_A_std, mean_B = coded_sps_B_mean, std_B = coded_sps_B_std) 58 | 59 | if validation_A_dir is not None: 60 | validation_A_output_dir = os.path.join(output_dir, 'converted_A') 61 | if not os.path.exists(validation_A_output_dir): 62 | os.makedirs(validation_A_output_dir) 63 | 64 | if validation_B_dir is not None: 65 | validation_B_output_dir = os.path.join(output_dir, 'converted_B') 66 | if not os.path.exists(validation_B_output_dir): 67 | os.makedirs(validation_B_output_dir) 68 | 69 | end_time = time.time() 70 | time_elapsed = end_time - start_time 71 | 72 | print('Preprocessing Done.') 73 | 74 | print('Time Elapsed for Data Preprocessing: %02d:%02d:%02d' % (time_elapsed // 3600, (time_elapsed % 3600 // 60), (time_elapsed % 60 // 1))) 75 | 76 | model = CycleGAN(num_features = num_mcep) 77 | 78 | for epoch in range(num_epochs): 79 | print('Epoch: %d' % epoch) 80 | ''' 81 | if epoch > 60: 82 | lambda_identity = 0 83 | if epoch > 1250: 84 | generator_learning_rate = max(0, generator_learning_rate - 0.0000002) 85 | discriminator_learning_rate = max(0, discriminator_learning_rate - 0.0000001) 86 | ''' 87 | 88 | start_time_epoch = time.time() 89 | 90 | dataset_A, dataset_B = sample_train_data(dataset_A = coded_sps_A_norm, dataset_B = coded_sps_B_norm, n_frames = n_frames) 91 | 92 | n_samples = dataset_A.shape[0] 93 | 94 | for i in range(n_samples // mini_batch_size): 95 | 96 | num_iterations = n_samples // mini_batch_size * epoch + i 97 | 98 | if num_iterations > 10000: 99 | lambda_identity = 0 100 | if num_iterations > 200000: 101 | generator_learning_rate = max(0, generator_learning_rate - generator_learning_rate_decay) 102 | discriminator_learning_rate = max(0, discriminator_learning_rate - discriminator_learning_rate_decay) 103 | 104 | start = i * mini_batch_size 105 | end = (i + 1) * mini_batch_size 106 | 107 | generator_loss, discriminator_loss = model.train(input_A = dataset_A[start:end], input_B = dataset_B[start:end], lambda_cycle = lambda_cycle, lambda_identity = lambda_identity, generator_learning_rate = generator_learning_rate, discriminator_learning_rate = discriminator_learning_rate) 108 | 109 | if i % 50 == 0: 110 | #print('Iteration: %d, Generator Loss : %f, Discriminator Loss : %f' % (num_iterations, generator_loss, discriminator_loss)) 111 | print('Iteration: {:07d}, Generator Learning Rate: {:.7f}, Discriminator Learning Rate: {:.7f}, Generator Loss : {:.3f}, Discriminator Loss : {:.3f}'.format(num_iterations, generator_learning_rate, discriminator_learning_rate, generator_loss, discriminator_loss)) 112 | 113 | model.save(directory = model_dir, filename = model_name) 114 | 115 | end_time_epoch = time.time() 116 | time_elapsed_epoch = end_time_epoch - start_time_epoch 117 | 118 | print('Time Elapsed for This Epoch: %02d:%02d:%02d' % (time_elapsed_epoch // 3600, 
(time_elapsed_epoch % 3600 // 60), (time_elapsed_epoch % 60 // 1))) 119 | 120 | if validation_A_dir is not None: 121 | if epoch % 50 == 0: 122 | print('Generating Validation Data B from A...') 123 | for file in os.listdir(validation_A_dir): 124 | filepath = os.path.join(validation_A_dir, file) 125 | wav, _ = librosa.load(filepath, sr = sampling_rate, mono = True) 126 | wav = wav_padding(wav = wav, sr = sampling_rate, frame_period = frame_period, multiple = 4) 127 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = sampling_rate, frame_period = frame_period) 128 | f0_converted = pitch_conversion(f0 = f0, mean_log_src = log_f0s_mean_A, std_log_src = log_f0s_std_A, mean_log_target = log_f0s_mean_B, std_log_target = log_f0s_std_B) 129 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = sampling_rate, dim = num_mcep) 130 | coded_sp_transposed = coded_sp.T 131 | coded_sp_norm = (coded_sp_transposed - coded_sps_A_mean) / coded_sps_A_std 132 | coded_sp_converted_norm = model.test(inputs = np.array([coded_sp_norm]), direction = 'A2B')[0] 133 | coded_sp_converted = coded_sp_converted_norm * coded_sps_B_std + coded_sps_B_mean 134 | coded_sp_converted = coded_sp_converted.T 135 | coded_sp_converted = np.ascontiguousarray(coded_sp_converted) 136 | decoded_sp_converted = world_decode_spectral_envelop(coded_sp = coded_sp_converted, fs = sampling_rate) 137 | wav_transformed = world_speech_synthesis(f0 = f0_converted, decoded_sp = decoded_sp_converted, ap = ap, fs = sampling_rate, frame_period = frame_period) 138 | librosa.output.write_wav(os.path.join(validation_A_output_dir, os.path.basename(file)), wav_transformed, sampling_rate) 139 | 140 | if validation_B_dir is not None: 141 | if epoch % 50 == 0: 142 | print('Generating Validation Data A from B...') 143 | for file in os.listdir(validation_B_dir): 144 | filepath = os.path.join(validation_B_dir, file) 145 | wav, _ = librosa.load(filepath, sr = sampling_rate, mono = True) 146 | wav = wav_padding(wav = wav, sr = sampling_rate, frame_period = frame_period, multiple = 4) 147 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = sampling_rate, frame_period = frame_period) 148 | f0_converted = pitch_conversion(f0 = f0, mean_log_src = log_f0s_mean_B, std_log_src = log_f0s_std_B, mean_log_target = log_f0s_mean_A, std_log_target = log_f0s_std_A) 149 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = sampling_rate, dim = num_mcep) 150 | coded_sp_transposed = coded_sp.T 151 | coded_sp_norm = (coded_sp_transposed - coded_sps_B_mean) / coded_sps_B_std 152 | coded_sp_converted_norm = model.test(inputs = np.array([coded_sp_norm]), direction = 'B2A')[0] 153 | coded_sp_converted = coded_sp_converted_norm * coded_sps_A_std + coded_sps_A_mean 154 | coded_sp_converted = coded_sp_converted.T 155 | coded_sp_converted = np.ascontiguousarray(coded_sp_converted) 156 | decoded_sp_converted = world_decode_spectral_envelop(coded_sp = coded_sp_converted, fs = sampling_rate) 157 | wav_transformed = world_speech_synthesis(f0 = f0_converted, decoded_sp = decoded_sp_converted, ap = ap, fs = sampling_rate, frame_period = frame_period) 158 | librosa.output.write_wav(os.path.join(validation_B_output_dir, os.path.basename(file)), wav_transformed, sampling_rate) 159 | 160 | if __name__ == '__main__': 161 | 162 | parser = argparse.ArgumentParser(description = 'Train CycleGAN model for datasets.') 163 | 164 | train_A_dir_default = './data/training/NEUTRAL' 165 | train_B_dir_default = './data/training/SURPRISE' 166 | model_dir_default = './model/neutral_to_suprise_mceps' 
167 | model_name_default = 'neutral_to_suprise_mceps.ckpt' 168 | random_seed_default = 0 169 | validation_A_dir_default = './data/evaluation_all/NEUTRAL' 170 | validation_B_dir_default = './data/evaluation_all/SURPRISE' 171 | output_dir_default = './validation_output' 172 | tensorboard_log_dir_default = './log' 173 | 174 | parser.add_argument('--train_A_dir', type = str, help = 'Directory for A.', default = train_A_dir_default) 175 | parser.add_argument('--train_B_dir', type = str, help = 'Directory for B.', default = train_B_dir_default) 176 | parser.add_argument('--model_dir', type = str, help = 'Directory for saving models.', default = model_dir_default) 177 | parser.add_argument('--model_name', type = str, help = 'File name for saving model.', default = model_name_default) 178 | parser.add_argument('--random_seed', type = int, help = 'Random seed for model training.', default = random_seed_default) 179 | parser.add_argument('--validation_A_dir', type = str, help = 'Convert validation A after each training epoch. If set none, no conversion would be done during the training.', default = validation_A_dir_default) 180 | parser.add_argument('--validation_B_dir', type = str, help = 'Convert validation B after each training epoch. If set none, no conversion would be done during the training.', default = validation_B_dir_default) 181 | parser.add_argument('--output_dir', type = str, help = 'Output directory for converted validation voices.', default = output_dir_default) 182 | parser.add_argument('--tensorboard_log_dir', type = str, help = 'TensorBoard log directory.', default = tensorboard_log_dir_default) 183 | 184 | argv = parser.parse_args() 185 | 186 | train_A_dir = argv.train_A_dir 187 | train_B_dir = argv.train_B_dir 188 | model_dir = argv.model_dir 189 | model_name = argv.model_name 190 | random_seed = argv.random_seed 191 | validation_A_dir = None if argv.validation_A_dir == 'None' or argv.validation_A_dir == 'none' else argv.validation_A_dir 192 | validation_B_dir = None if argv.validation_B_dir == 'None' or argv.validation_B_dir == 'none' else argv.validation_B_dir 193 | output_dir = argv.output_dir 194 | tensorboard_log_dir = argv.tensorboard_log_dir 195 | 196 | train(train_A_dir = train_A_dir, train_B_dir = train_B_dir, model_dir = model_dir, model_name = model_name, random_seed = random_seed, validation_A_dir = validation_A_dir, validation_B_dir = validation_B_dir, output_dir = output_dir, tensorboard_log_dir = tensorboard_log_dir) 197 | --------------------------------------------------------------------------------
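
**Quick sanity check of the model interface (a sketch, not part of the original scripts):** the three `CycleGAN` classes in `model.py`, `model_f0.py` and `model_mceps.py` expose the same API — construct the graph for a given feature dimensionality, call `train()` on batches shaped `[batch_size, num_features, num_frames]`, and call `test()` with a conversion direction. Assuming the TensorFlow 1.x dependencies listed above and using random arrays in place of normalized MCEP features, a minimal smoke test of the 24-dimensional spectrum model could look like this:

```Python
import numpy as np
from model import CycleGAN

# Hyper-parameter values taken from train.py.
num_mcep = 24     # feature dimension of the coded spectral envelope
n_frames = 128    # multiple of 4: the generator downsamples time by 2 twice
batch_size = 1

model = CycleGAN(num_features = num_mcep)

# Random stand-ins for normalized coded spectral envelopes of domains A and B.
input_A = np.random.randn(batch_size, num_mcep, n_frames).astype(np.float32)
input_B = np.random.randn(batch_size, num_mcep, n_frames).astype(np.float32)

generator_loss, discriminator_loss = model.train(
    input_A = input_A,
    input_B = input_B,
    lambda_cycle = 10,
    lambda_identity = 5,
    generator_learning_rate = 0.0002,
    discriminator_learning_rate = 0.0001)
print('Generator Loss: %.3f, Discriminator Loss: %.3f' % (generator_loss, discriminator_loss))

# Convert the (fake) source batch from A to B and check the output shape.
conversion = model.test(inputs = input_A, direction = 'A2B')
print(conversion.shape)  # expected: (1, 24, 128)
```

In `train.py` the same calls are made with real data: the batches come from `sample_train_data` over WORLD-encoded, normalized spectral envelopes, `lambda_identity` is set to 0 after 10,000 iterations, and both learning rates are decayed linearly after 200,000 iterations.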