├── README.md
└── Parallel-data-free emotional voice conversion with CycleGAN and CWT
    ├── utils.py
    ├── preprocess.py
    ├── convert_separate.py
    ├── train_f0.py
    ├── module.py
    ├── module_f0.py
    ├── module_mceps.py
    ├── model.py
    ├── model_f0.py
    ├── model_mceps.py
    └── train.py
/README.md:
--------------------------------------------------------------------------------
 1 | # Emotional Voice Conversion and/or Speaker Identity Conversion with Non-Parallel Training Data
 2 | 
 3 | **This is an implementation of our CycleGAN-based emotional voice conversion framework (to appear in Speaker Odyssey 2020) that converts both spectrum and prosody features. Please kindly cite our paper if you use our code:**
 4 | 
 5 | *Kun Zhou, Berrak Sisman, and Haizhou Li, “Transforming spectrum and prosody for emotional voice conversion with non-parallel training data,” arXiv preprint arXiv:2002.00198, 2020*
 6 | 
 7 | **Bibtex:**
 8 | ```
 9 | @article{zhou2020transforming,
10 |   title={Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data},
11 |   author={Zhou, Kun and Sisman, Berrak and Li, Haizhou},
12 |   journal={arXiv preprint arXiv:2002.00198},
13 |   year={2020}
14 | }
15 | ```
16 | 
17 | Dependencies
18 | -------
19 | 
20 | `Python 3.5`
21 | 
22 | `Numpy 1.14`
23 | 
24 | `Tensorflow 1.8`
25 | 
26 | `ProgressBar2 3.37.1`
27 | 
28 | `LibROSA 0.6`
29 | 
30 | `FFmpeg 4.0`
31 | 
32 | `PyWorld`
33 | 
34 | `sklearn`
35 | 
36 | `pycwt`
37 | 
38 | `sprocket-vc`
39 | 
40 | `scipy`
41 | 
42 | `glob`
43 | 
44 | Usage
45 | ---------
46 | 
47 | 1. `train.py`
48 | 
49 | This script trains the CycleGAN with spectrum (MCEP) features.
50 | 
51 | 2. `train_f0.py`
52 | 
53 | This script performs CWT decomposition of F0, then trains the CycleGAN with CWT-F0 features (a minimal Python sketch of this CWT step is given at the end of this README).
54 | 
55 | 3. `convert_separate.py`
56 | 
57 | This script converts source speech with the trained CycleGAN models, converting the spectrum and CWT-F0 features separately.
58 | 
59 | 
60 | # Instructions
61 | 
62 | 1. **To train CycleGAN with spectrum features, please run:**
63 | ```Bash
64 | $ python train.py --train_A_dir './data/training/NEUTRAL(PATH TO SOURCE TRAINING DATA)' --train_B_dir './data/training/SURPRISE(PATH TO TARGET TRAINING DATA)' --model_dir './model/neutral_to_surprise_mceps' --model_name 'neutral_to_surprise_mceps.ckpt' --random_seed 0 --validation_A_dir './data/evaluation_all/NEUTRAL' --validation_B_dir './data/evaluation_all/SURPRISE' --output_dir './validation_output' --tensorboard_log_dir './log'
65 | ```
66 | 
67 | 2. **To train CycleGAN with CWT-F0 features, please run:**
68 | ```Bash
69 | $ python train_f0.py --train_A_dir './data/training/NEUTRAL(PATH TO SOURCE TRAINING DATA)' --train_B_dir './data/training/SURPRISE(PATH TO TARGET TRAINING DATA)' --model_dir './model/neutral_to_surprise_f0' --model_name 'neutral_to_surprise_f0.ckpt' --random_seed 0 --validation_A_dir './data/evaluation_all/NEUTRAL' --validation_B_dir './data/evaluation_all/SURPRISE' --output_dir './validation_output' --tensorboard_log_dir './log'
70 | ```
71 | 
72 | 3. **To convert the emotion of the source speech to the target emotion, please run:**
73 | ```Bash
74 | $ python convert_separate.py --model_f0_dir './model/neutral_to_surprise_f0' --model_f0_name 'neutral_to_surprise_f0.ckpt' --model_mceps_dir './model/neutral_to_surprise_mceps' --model_mceps_name 'neutral_to_surprise_mceps.ckpt' --data_dir './data/evaluation_all/NEUTRAL(PATH TO EVALUATION DATA)' --conversion_direction 'A2B' --output_dir './converted_voices_neutral_to_surprise_separate'
75 | ```
76 | 
77 | 
78 | 
79 | 
80 | **Note1:**
81 | The code is based on CycleGAN Voice Conversion: https://github.com/leimao/Voice_Converter_CycleGAN
82 | 
83 | **Note2:**
84 | The code can also be used for conventional parallel-data-free voice conversion (speaker identity conversion): simply change the training data to VCC2016 or VCC2018 (both publicly available) and then run the scripts. Both spectrum and CWT-based F0 conversion are supported. 
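For reference, the sketch below (not part of the original repository) illustrates the CWT-F0 analysis/synthesis round trip that `train_f0.py` and `convert_separate.py` build on, using the helper functions from `utils.py`. The wav path is a placeholder, and log-F0 is normalized with per-utterance statistics here for brevity, whereas the scripts use corpus-level statistics saved in `logf0s_normalization.npz`.
```Python
# Minimal sketch of the CWT-F0 pipeline (assumes utils.py is importable;
# the wav path below is a placeholder).
import numpy as np
import librosa
import pyworld
from utils import get_cont_lf0, get_lf0_cwt, norm_scale, denormalize, inverse_cwt

wav, _ = librosa.load('./data/training/NEUTRAL/example.wav', sr=16000, mono=True)
wav = wav.astype(np.float64)

# WORLD F0 extraction (same settings as preprocess.world_decompose)
f0, timeaxis = pyworld.harvest(wav, 16000, frame_period=5.0, f0_floor=71.0, f0_ceil=800.0)

# Interpolate unvoiced frames, take the log, and z-normalize
# (per utterance here; the scripts use corpus-level mean/std)
uv, cont_lf0 = get_cont_lf0(f0)
lf0_norm = (cont_lf0 - cont_lf0.mean()) / cont_lf0.std()

# 10-scale continuous wavelet transform of log-F0: shape (T, 10)
wavelet_lf0, scales = get_lf0_cwt(lf0_norm)
wavelet_lf0_norm, mean, std = norm_scale(wavelet_lf0)  # per-scale normalization; the CycleGAN-F0 input

# Approximate inverse CWT back to a log-F0 contour, then restore F0 and unvoiced frames
lf0_rec = inverse_cwt(denormalize(wavelet_lf0_norm, mean, std), scales)
f0_rec = np.squeeze(uv) * np.exp(lf0_rec * cont_lf0.std() + cont_lf0.mean())
```
At conversion time, `convert_separate.py` applies the same steps, except that the normalized 10-scale CWT matrix is passed through the trained F0 CycleGAN (and the MCEPs through the trained spectrum CycleGAN) before the inverse transform and WORLD synthesis.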
85 | 86 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/utils.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import os 3 | import random 4 | import numpy as np 5 | from scipy.interpolate import interp1d 6 | import pycwt as wavelet 7 | from scipy.signal import firwin 8 | from scipy.signal import lfilter 9 | import matplotlib.pyplot as plt 10 | from sklearn import preprocessing 11 | import pywt 12 | import librosa 13 | import pyworld 14 | 15 | def l1_loss(y, y_hat): 16 | 17 | return tf.reduce_mean(tf.abs(y - y_hat)) 18 | 19 | def l2_loss(y, y_hat): 20 | 21 | return tf.reduce_mean(tf.square(y - y_hat)) 22 | 23 | def cross_entropy_loss(logits, labels): 24 | return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = logits, labels = labels)) 25 | 26 | 27 | def convert_continuos_f0(f0): 28 | """CONVERT F0 TO CONTINUOUS F0 29 | 30 | Args: 31 | f0 (ndarray): original f0 sequence with the shape (T) 32 | 33 | Return: 34 | (ndarray): continuous f0 with the shape (T) 35 | """ 36 | # get uv information as binary 37 | uv = np.float32(f0 != 0) 38 | 39 | # get start and end of f0 40 | if (f0 == 0).all(): 41 | logging.warn("all of the f0 values are 0.") 42 | return uv, f0 43 | start_f0 = f0[f0 != 0][0] 44 | end_f0 = f0[f0 != 0][-1] 45 | 46 | # padding start and end of f0 sequence 47 | start_idx = np.where(f0 == start_f0)[0][0] 48 | end_idx = np.where(f0 == end_f0)[0][-1] 49 | f0[:start_idx] = start_f0 50 | f0[end_idx:] = end_f0 51 | 52 | # get non-zero frame index 53 | nz_frames = np.where(f0 != 0)[0] 54 | 55 | # perform linear interpolation 56 | f = interp1d(nz_frames, f0[nz_frames]) 57 | cont_f0 = f(np.arange(0, f0.shape[0])) 58 | 59 | return uv, cont_f0 60 | 61 | 62 | def get_cont_lf0(f0, frame_period=5.0): 63 | uv, cont_f0_lpf = convert_continuos_f0(f0) 64 | #cont_f0_lpf = low_pass_filter(cont_f0_lpf, int(1.0 / (frame_period * 0.001)), cutoff=20) 65 | cont_lf0_lpf = np.log(cont_f0_lpf) 66 | return uv, cont_lf0_lpf 67 | 68 | #def get_log_energy(sp): 69 | # sp: (T, D) 70 | # return: (T) 71 | # energy = np.linalg.norm(sp, ord=2, axis=-1) 72 | # return np.log(energy) 73 | 74 | def get_lf0_cwt(lf0): 75 | mother = wavelet.MexicanHat() 76 | #dt = 0.005 77 | dt = 0.005 78 | dj = 1 79 | s0 = dt*2 80 | J =9 81 | #C_delta = 3.541 82 | #Wavelet_lf0, scales, _, _, _, _ = wavelet.cwt(np.squeeze(lf0), dt, dj, s0, J, mother) 83 | Wavelet_lf0, scales, freqs, coi, fft, fftfreqs = wavelet.cwt(np.squeeze(lf0), dt, dj, s0, J, mother) 84 | #Wavelet_le, scales, _, _, _, _ = wavelet.cwt(np.squeeze(le), dt, dj, s0, J, mother) 85 | Wavelet_lf0 = np.real(Wavelet_lf0).T 86 | #Wavelet_le = np.real(Wavelet_le).T # (T, D=10) 87 | #0lf0_le_cwt = np.concatenate((Wavelet_lf0, Wavelet_le), -1) 88 | # iwave = wavelet.icwt(np.squeeze(lf0), scales, dt, dj, mother) * std 89 | return Wavelet_lf0, scales 90 | 91 | 92 | def inverse_cwt(Wavelet_lf0,scales): 93 | lf0_rec = np.zeros([Wavelet_lf0.shape[0],len(scales)]) 94 | for i in range(0,len(scales)): 95 | lf0_rec[:,i] = Wavelet_lf0[:,i]*((i+1+2.5)**(-2.5)) 96 | lf0_rec_sum = np.sum(lf0_rec,axis = 1) 97 | lf0_rec_sum = preprocessing.scale(lf0_rec_sum) 98 | return lf0_rec_sum 99 | 100 | 101 | def low_pass_filter(x, fs, cutoff=70, padding=True): 102 | """FUNCTION TO APPLY LOW PASS FILTER 103 | 104 | Args: 105 | x (ndarray): Waveform sequence 106 | fs (int): Sampling frequency 107 | cutoff (float): Cutoff frequency 
of low pass filter 108 | 109 | Return: 110 | (ndarray): Low pass filtered waveform sequence 111 | """ 112 | nyquist = fs // 2 113 | norm_cutoff = cutoff / nyquist 114 | 115 | # low cut filter 116 | numtaps = 255 117 | fil = firwin(numtaps, norm_cutoff) 118 | x_pad = np.pad(x, (numtaps, numtaps), 'edge') 119 | lpf_x = lfilter(fil, 1, x_pad) 120 | lpf_x = lpf_x[numtaps + numtaps // 2: -numtaps // 2] 121 | return lpf_x 122 | 123 | def norm_scale(Wavelet_lf0): 124 | Wavelet_lf0_norm = np.zeros((Wavelet_lf0.shape[0], Wavelet_lf0.shape[1])) 125 | mean = np.zeros((1,Wavelet_lf0.shape[1]))#[1,10] 126 | std = np.zeros((1, Wavelet_lf0.shape[1])) 127 | for scale in range(Wavelet_lf0.shape[1]): 128 | mean[:,scale] = Wavelet_lf0[:,scale].mean() 129 | std[:,scale] = Wavelet_lf0[:,scale].std() 130 | Wavelet_lf0_norm[:,scale] = (Wavelet_lf0[:,scale]-mean[:,scale])/std[:,scale] 131 | return Wavelet_lf0_norm, mean, std 132 | 133 | def get_lf0_cwt_norm(f0s, mean, std): 134 | 135 | uvs = list() 136 | cont_lf0_lpfs = list() 137 | cont_lf0_lpf_norms = list() 138 | Wavelet_lf0s = list() 139 | Wavelet_lf0s_norm = list() 140 | scaless = list() 141 | 142 | means = list() 143 | stds = list() 144 | for f0 in f0s: 145 | 146 | uv, cont_lf0_lpf = get_cont_lf0(f0) 147 | cont_lf0_lpf_norm = (cont_lf0_lpf - mean) / std 148 | 149 | Wavelet_lf0, scales = get_lf0_cwt(cont_lf0_lpf_norm) #[560,10] 150 | Wavelet_lf0_norm, mean_scale, std_scale = norm_scale(Wavelet_lf0) #[560,10],[1,10],[1,10] 151 | 152 | Wavelet_lf0s_norm.append(Wavelet_lf0_norm) 153 | uvs.append(uv) 154 | cont_lf0_lpfs.append(cont_lf0_lpf) 155 | cont_lf0_lpf_norms.append(cont_lf0_lpf_norm) 156 | Wavelet_lf0s.append(Wavelet_lf0) 157 | scaless.append(scales) 158 | means.append(mean_scale) 159 | stds.append(std_scale) 160 | 161 | return Wavelet_lf0s_norm,scaless, means, stds 162 | 163 | 164 | def denormalize(Wavelet_lf0_norm, mean, std): 165 | Wavelet_lf0_denorm = np.zeros((Wavelet_lf0_norm.shape[0], Wavelet_lf0_norm.shape[1])) 166 | for scale in range(Wavelet_lf0_norm.shape[1]): 167 | Wavelet_lf0_denorm[:,scale] = Wavelet_lf0_norm[:,scale]*std[:,scale]+mean[:,scale] 168 | return Wavelet_lf0_denorm 169 | 170 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/preprocess.py: -------------------------------------------------------------------------------- 1 | import librosa 2 | import numpy as np 3 | import os 4 | import pyworld 5 | 6 | def load_wavs(wav_dir, sr): 7 | 8 | wavs = list() 9 | for file in os.listdir(wav_dir): 10 | file_path = os.path.join(wav_dir, file) 11 | wav, _ = librosa.load(file_path, sr = sr, mono = True) 12 | #wav = wav.astype(np.float64) 13 | wavs.append(wav) 14 | 15 | return wavs 16 | 17 | def world_decompose(wav, fs, frame_period = 5.0): 18 | 19 | # Decompose speech signal into f0, spectral envelope and aperiodicity using WORLD 20 | wav = wav.astype(np.float64) 21 | f0, timeaxis = pyworld.harvest(wav, fs, frame_period = frame_period, f0_floor = 71.0, f0_ceil = 800.0) 22 | sp = pyworld.cheaptrick(wav, f0, timeaxis, fs) 23 | ap = pyworld.d4c(wav, f0, timeaxis, fs) 24 | 25 | return f0, timeaxis, sp, ap 26 | 27 | def world_encode_spectral_envelop(sp, fs, dim = 24): 28 | 29 | # Get Mel-cepstral coefficients (MCEPs) 30 | 31 | #sp = sp.astype(np.float64) 32 | coded_sp = pyworld.code_spectral_envelope(sp, fs, dim) 33 | 34 | return coded_sp 35 | 36 | def world_decode_spectral_envelop(coded_sp, fs): 37 | 38 | fftlen = pyworld.get_cheaptrick_fft_size(fs) 
39 | #coded_sp = coded_sp.astype(np.float32) 40 | #coded_sp = np.ascontiguousarray(coded_sp) 41 | decoded_sp = pyworld.decode_spectral_envelope(coded_sp, fs, fftlen) 42 | 43 | return decoded_sp 44 | 45 | 46 | def world_encode_data(wavs, fs, frame_period = 5.0, coded_dim = 24): 47 | 48 | f0s = list() 49 | timeaxes = list() 50 | sps = list() 51 | aps = list() 52 | coded_sps = list() 53 | 54 | for wav in wavs: 55 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = fs, frame_period = frame_period) 56 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = fs, dim = coded_dim) 57 | f0s.append(f0) 58 | timeaxes.append(timeaxis) 59 | sps.append(sp) 60 | aps.append(ap) 61 | coded_sps.append(coded_sp) 62 | 63 | return f0s, timeaxes, sps, aps, coded_sps 64 | 65 | 66 | def transpose_in_list(lst): 67 | 68 | transposed_lst = list() 69 | for array in lst: 70 | transposed_lst.append(array.T) 71 | return transposed_lst 72 | 73 | 74 | def world_decode_data(coded_sps, fs): 75 | 76 | decoded_sps = list() 77 | 78 | for coded_sp in coded_sps: 79 | decoded_sp = world_decode_spectral_envelop(coded_sp, fs) 80 | decoded_sps.append(decoded_sp) 81 | 82 | return decoded_sps 83 | 84 | 85 | def world_speech_synthesis(f0, decoded_sp, ap, fs, frame_period): 86 | 87 | #decoded_sp = decoded_sp.astype(np.float64) 88 | wav = pyworld.synthesize(f0, decoded_sp, ap, fs, frame_period) 89 | # Librosa could not save wav if not doing so 90 | wav = wav.astype(np.float32) 91 | 92 | return wav 93 | 94 | 95 | def world_synthesis_data(f0s, decoded_sps, aps, fs, frame_period): 96 | 97 | wavs = list() 98 | 99 | for f0, decoded_sp, ap in zip(f0s, decoded_sps, aps): 100 | wav = world_speech_synthesis(f0, decoded_sp, ap, fs, frame_period) 101 | wavs.append(wav) 102 | 103 | return wavs 104 | 105 | 106 | def coded_sps_normalization_fit_transoform(coded_sps): 107 | 108 | coded_sps_concatenated = np.concatenate(coded_sps, axis = 1) 109 | coded_sps_mean = np.mean(coded_sps_concatenated, axis = 1, keepdims = True) 110 | coded_sps_std = np.std(coded_sps_concatenated, axis = 1, keepdims = True) 111 | 112 | coded_sps_normalized = list() 113 | for coded_sp in coded_sps: 114 | coded_sps_normalized.append((coded_sp - coded_sps_mean) / coded_sps_std) 115 | 116 | return coded_sps_normalized, coded_sps_mean, coded_sps_std 117 | 118 | def coded_sps_normalization_transoform(coded_sps, coded_sps_mean, coded_sps_std): 119 | 120 | coded_sps_normalized = list() 121 | for coded_sp in coded_sps: 122 | coded_sps_normalized.append((coded_sp - coded_sps_mean) / coded_sps_std) 123 | 124 | return coded_sps_normalized 125 | 126 | def coded_sps_normalization_inverse_transoform(normalized_coded_sps, coded_sps_mean, coded_sps_std): 127 | 128 | coded_sps = list() 129 | for normalized_coded_sp in normalized_coded_sps: 130 | coded_sps.append(normalized_coded_sp * coded_sps_std + coded_sps_mean) 131 | 132 | return coded_sps 133 | 134 | def coded_sp_padding(coded_sp, multiple = 4): 135 | 136 | num_features = coded_sp.shape[0] 137 | num_frames = coded_sp.shape[1] 138 | num_frames_padded = int(np.ceil(num_frames / multiple)) * multiple 139 | num_frames_diff = num_frames_padded - num_frames 140 | num_pad_left = num_frames_diff // 2 141 | num_pad_right = num_frames_diff - num_pad_left 142 | coded_sp_padded = np.pad(coded_sp, ((0, 0), (num_pad_left, num_pad_right)), 'constant', constant_values = 0) 143 | 144 | return coded_sp_padded 145 | 146 | def wav_padding(wav, sr, frame_period, multiple = 4): 147 | 148 | assert wav.ndim == 1 149 | num_frames = len(wav) 150 | 
num_frames_padded = int((np.ceil((np.floor(num_frames / (sr * frame_period / 1000)) + 1) / multiple + 1) * multiple - 1) * (sr * frame_period / 1000)) 151 | num_frames_diff = num_frames_padded - num_frames 152 | num_pad_left = num_frames_diff // 2 153 | num_pad_right = num_frames_diff - num_pad_left 154 | wav_padded = np.pad(wav, (num_pad_left, num_pad_right), 'constant', constant_values = 0) 155 | 156 | return wav_padded 157 | 158 | 159 | def logf0_statistics(f0s): 160 | 161 | log_f0s_concatenated = np.ma.log(np.concatenate(f0s)) 162 | log_f0s_mean = log_f0s_concatenated.mean() 163 | log_f0s_std = log_f0s_concatenated.std() 164 | 165 | return log_f0s_mean, log_f0s_std 166 | 167 | def pitch_conversion(f0, mean_log_src, std_log_src, mean_log_target, std_log_target): 168 | 169 | # Logarithm Gaussian normalization for Pitch Conversions 170 | f0_converted = np.exp((np.log(f0) - mean_log_src) / std_log_src * std_log_target + mean_log_target) 171 | 172 | return f0_converted 173 | 174 | def wavs_to_specs(wavs, n_fft = 1024, hop_length = None): 175 | 176 | stfts = list() 177 | for wav in wavs: 178 | stft = librosa.stft(wav, n_fft = n_fft, hop_length = hop_length) 179 | stfts.append(stft) 180 | 181 | return stfts 182 | 183 | 184 | def wavs_to_mfccs(wavs, sr, n_fft = 1024, hop_length = None, n_mels = 128, n_mfcc = 24): 185 | 186 | mfccs = list() 187 | for wav in wavs: 188 | mfcc = librosa.feature.mfcc(y = wav, sr = sr, n_fft = n_fft, hop_length = hop_length, n_mels = n_mels, n_mfcc = n_mfcc) 189 | mfccs.append(mfcc) 190 | 191 | return mfccs 192 | 193 | 194 | def mfccs_normalization(mfccs): 195 | 196 | mfccs_concatenated = np.concatenate(mfccs, axis = 1) 197 | mfccs_mean = np.mean(mfccs_concatenated, axis = 1, keepdims = True) 198 | mfccs_std = np.std(mfccs_concatenated, axis = 1, keepdims = True) 199 | 200 | mfccs_normalized = list() 201 | for mfcc in mfccs: 202 | mfccs_normalized.append((mfcc - mfccs_mean) / mfccs_std) 203 | 204 | return mfccs_normalized, mfccs_mean, mfccs_std 205 | 206 | 207 | def sample_train_data(dataset_A, dataset_B, n_frames = 128): 208 | 209 | num_samples = min(len(dataset_A), len(dataset_B)) 210 | train_data_A_idx = np.arange(len(dataset_A)) 211 | train_data_B_idx = np.arange(len(dataset_B)) 212 | np.random.shuffle(train_data_A_idx) 213 | np.random.shuffle(train_data_B_idx) 214 | train_data_A_idx_subset = train_data_A_idx[:num_samples] 215 | train_data_B_idx_subset = train_data_B_idx[:num_samples] 216 | 217 | train_data_A = list() 218 | train_data_B = list() 219 | 220 | for idx_A, idx_B in zip(train_data_A_idx_subset, train_data_B_idx_subset): 221 | data_A = dataset_A[idx_A] 222 | frames_A_total = data_A.shape[1] 223 | assert frames_A_total >= n_frames 224 | start_A = np.random.randint(frames_A_total - n_frames + 1) 225 | end_A = start_A + n_frames 226 | train_data_A.append(data_A[:,start_A:end_A]) 227 | 228 | data_B = dataset_B[idx_B] 229 | frames_B_total = data_B.shape[1] 230 | assert frames_B_total >= n_frames 231 | start_B = np.random.randint(frames_B_total - n_frames + 1) 232 | end_B = start_B + n_frames 233 | train_data_B.append(data_B[:,start_B:end_B]) 234 | 235 | train_data_A = np.array(train_data_A) 236 | train_data_B = np.array(train_data_B) 237 | 238 | return train_data_A, train_data_B -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/convert_separate.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import 
os 3 | import numpy as np 4 | 5 | from model_f0 import CycleGAN as CycleGAN_f0 6 | from model_mceps import CycleGAN as CycleGAN_mceps 7 | 8 | from preprocess import * 9 | from utils import get_lf0_cwt_norm,norm_scale,denormalize 10 | from utils import get_cont_lf0, get_lf0_cwt,inverse_cwt 11 | from sklearn import preprocessing 12 | 13 | def conversion(model_f0_dir, model_f0_name, model_mceps_dir, model_mceps_name, data_dir, conversion_direction, output_dir): 14 | 15 | num_mceps = 24 16 | num_features = 34 17 | sampling_rate = 16000 18 | frame_period = 5.0 19 | 20 | # model = CycleGAN(num_features = num_features, mode = 'test') 21 | # model.load(filepath = os.path.join(model_dir, model_name)) 22 | 23 | # import F0 model: 24 | model_f0 = CycleGAN_f0(num_features = 10, mode = 'test') 25 | model_f0.load(filepath=os.path.join(model_f0_dir,model_f0_name)) 26 | # import mceps model: 27 | model_mceps = CycleGAN_mceps(num_features = 24, mode = 'test') 28 | model_mceps.load(filepath=os.path.join(model_mceps_dir,model_mceps_name)) 29 | 30 | mcep_normalization_params = np.load(os.path.join(model_mceps_dir, 'mcep_normalization.npz')) 31 | mcep_mean_A = mcep_normalization_params['mean_A'] 32 | mcep_std_A = mcep_normalization_params['std_A'] 33 | mcep_mean_B = mcep_normalization_params['mean_B'] 34 | mcep_std_B = mcep_normalization_params['std_B'] 35 | 36 | logf0s_normalization_params = np.load(os.path.join(model_f0_dir, 'logf0s_normalization.npz')) 37 | logf0s_mean_A = logf0s_normalization_params['mean_A'] 38 | logf0s_std_A = logf0s_normalization_params['std_A'] 39 | logf0s_mean_B = logf0s_normalization_params['mean_B'] 40 | logf0s_std_B = logf0s_normalization_params['std_B'] 41 | 42 | if not os.path.exists(output_dir): 43 | os.makedirs(output_dir) 44 | 45 | for file in os.listdir(data_dir): 46 | 47 | filepath = os.path.join(data_dir, file) 48 | wav, _ = librosa.load(filepath, sr = sampling_rate, mono = True) 49 | wav = wav_padding(wav = wav, sr = sampling_rate, frame_period = frame_period, multiple = 4) 50 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = sampling_rate, frame_period = frame_period) 51 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = sampling_rate, dim = num_mceps) 52 | coded_sp_transposed = coded_sp.T 53 | # np.save('./f0',f0) 54 | uv, cont_lf0_lpf = get_cont_lf0(f0) 55 | 56 | if conversion_direction == 'A2B': 57 | #f0_converted = pitch_conversion(f0 = f0, mean_log_src = logf0s_mean_A, std_log_src = logf0s_std_A, mean_log_target = logf0s_mean_B, std_log_target = logf0s_std_B) 58 | #f0_converted = f0 59 | 60 | cont_lf0_lpf_norm = (cont_lf0_lpf - logf0s_mean_A) / logf0s_std_A 61 | Wavelet_lf0, scales = get_lf0_cwt(cont_lf0_lpf_norm) #[470,10] 62 | #np.save('./Wavelet_lf0',Wavelet_lf0) 63 | Wavelet_lf0_norm, mean, std = norm_scale(Wavelet_lf0) #[470,10],[1,10],[1,10] 64 | lf0_cwt_norm = Wavelet_lf0_norm.T #[10,470] 65 | 66 | coded_sp_norm = (coded_sp_transposed - mcep_mean_A) / mcep_std_A #[24,470] 67 | 68 | # feats_norm = np.vstack((coded_sp_norm,lf0_cwt_norm))#[34,470] 69 | # feats_converted_norm = model.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 70 | 71 | # test mceps 72 | coded_sp_converted_norm = model_mceps.test(inputs = np.array([coded_sp_norm]),direction = conversion_direction)[0] 73 | # test f0: 74 | lf0 = model_f0.test(inputs = np.array([lf0_cwt_norm]),direction=conversion_direction)[0] 75 | #coded_sp_converted_norm = model.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 76 | 77 | #coded_sp_converted_norm 
= feats_converted_norm[:24] 78 | coded_sp_converted = coded_sp_converted_norm * mcep_std_B + mcep_mean_B #mceps 79 | 80 | #lf0 = feats_converted_norm[24:].T #[470,10] 81 | 82 | lf0_cwt_denormalize = denormalize(lf0.T, mean, std)#[470,10] 83 | #np.save('./lf0_denorm',lf0_cwt_denormalize) 84 | lf0_rec = inverse_cwt(lf0_cwt_denormalize,scales)#[470,1] 85 | #lf0_rec_norm = preprocessing.scale(lf0_rec) 86 | lf0_converted = lf0_rec * logf0s_std_B + logf0s_mean_B 87 | f0_converted = np.squeeze(uv) * np.exp(lf0_converted) 88 | f0_converted = np.ascontiguousarray(f0_converted) 89 | #np.save('./f0_converted',f0_converted) 90 | 91 | else: 92 | #f0_converted = pitch_conversion(f0 = f0, mean_log_src = logf0s_mean_B, std_log_src = logf0s_std_B, mean_log_target = logf0s_mean_A, std_log_target = logf0s_std_A) 93 | #f0_converted = f0 94 | cont_lf0_lpf_norm = (cont_lf0_lpf - logf0s_mean_B) / logf0s_std_B 95 | Wavelet_lf0, scales = get_lf0_cwt(cont_lf0_lpf_norm) 96 | lf0_cwt_norm = Wavelet_lf0.T #[10,470] 97 | coded_sp_norm = (coded_sp_transposed - mcep_mean_B) / mcep_std_B 98 | feats_norm = np.vstack((coded_sp_norm,lf0_cwt_norm))#[34,470] 99 | feats_converted_norm = model_f0.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 100 | 101 | #coded_sp_converted_norm = model.test(inputs = np.array([feats_norm]), direction = conversion_direction)[0] 102 | coded_sp_converted_norm = feats_converted_norm[:24] 103 | coded_sp_converted = coded_sp_converted_norm * mcep_std_A + mcep_mean_A 104 | lf0_rec = inverse_cwt(feats_norm[24:].T,scales)#[470,10] 105 | lf0_rec_norm = preprocessing.scale(lf0_rec) 106 | lf0_converted = lf0_rec_norm * logf0s_std_A + logf0s_mean_A 107 | f0_converted = np.squeeze(uv) * np.exp(lf0_converted) 108 | f0_converted = np.ascontiguousarray(f0_converted) 109 | 110 | #coded_sp_norm = (coded_sp_transposed - mcep_mean_B) / mcep_std_B 111 | #coded_sp_converted_norm = model.test(inputs = np.array([coded_sp_norm]), direction = conversion_direction)[0] 112 | #coded_sp_converted = coded_sp_converted_norm * mcep_std_A + mcep_mean_A 113 | 114 | coded_sp_converted = coded_sp_converted.T#[470,24] 115 | coded_sp_converted = np.ascontiguousarray(coded_sp_converted) 116 | decoded_sp_converted = world_decode_spectral_envelop(coded_sp = coded_sp_converted, fs = sampling_rate) 117 | wav_transformed = world_speech_synthesis(f0 = f0_converted, decoded_sp = decoded_sp_converted, ap = ap, fs = sampling_rate, frame_period = frame_period) 118 | librosa.output.write_wav(os.path.join(output_dir, os.path.basename(file)), wav_transformed, sampling_rate) 119 | 120 | 121 | if __name__ == '__main__': 122 | #os.environ['CUDA_VISIBLE_DEVICES']='2' 123 | parser = argparse.ArgumentParser(description = 'Convert voices using pre-trained CycleGAN model.') 124 | 125 | model_f0_dir_default = './model/neutral_to_surprise_f0' 126 | model_f0_name_default = 'neutral_to_surprise_f0.ckpt' 127 | model_mceps_dir_default = './model/neutral_to_surprise_mceps' 128 | model_mceps_name_default = 'neutral_to_surprise_mceps.ckpt' 129 | data_dir_default = './data/evaluation_all/NEUTRAL' 130 | conversion_direction_default = 'A2B' 131 | output_dir_default = './converted_voices_neutral_to_surprise_separate' 132 | 133 | parser.add_argument('--model_f0_dir', type = str, help = 'Directory for the pre-trained f0 model.', default = model_f0_dir_default) 134 | parser.add_argument('--model_f0_name', type = str, help = 'Filename for the pre-trained f0 model.', default = model_f0_name_default) 135 | parser.add_argument('--model_mceps_dir', 
type = str, help = 'Directory for the pre-trained mceps model.', default = model_mceps_dir_default) 136 | parser.add_argument('--model_mceps_name', type = str, help = 'Filename for the pre-trained mceps model.', default = model_mceps_name_default) 137 | parser.add_argument('--data_dir', type = str, help = 'Directory for the voices for conversion.', default = data_dir_default) 138 | parser.add_argument('--conversion_direction', type = str, help = 'Conversion direction for CycleGAN. A2B or B2A. The first object in the model file name is A, and the second object in the model file name is B.', default = conversion_direction_default) 139 | parser.add_argument('--output_dir', type = str, help = 'Directory for the converted voices.', default = output_dir_default) 140 | 141 | argv = parser.parse_args() 142 | 143 | model_f0_dir = argv.model_f0_dir 144 | model_f0_name = argv.model_f0_name 145 | model_mceps_dir = argv.model_mceps_dir 146 | model_mceps_name = argv.model_mceps_name 147 | data_dir = argv.data_dir 148 | conversion_direction = argv.conversion_direction 149 | output_dir = argv.output_dir 150 | 151 | conversion(model_f0_dir = model_f0_dir, model_f0_name = model_f0_name, model_mceps_dir = model_mceps_dir, model_mceps_name = model_mceps_name, data_dir = data_dir, conversion_direction = conversion_direction, output_dir = output_dir) 152 | 153 | 154 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/train_f0.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import argparse 4 | import time 5 | import librosa 6 | from sklearn import preprocessing 7 | from preprocess import * 8 | from model_f0 import CycleGAN 9 | from utils import get_lf0_cwt_norm 10 | from utils import get_cont_lf0, get_lf0_cwt,inverse_cwt 11 | 12 | def train(train_A_dir, train_B_dir, model_dir, model_name, random_seed, validation_A_dir, validation_B_dir, output_dir, tensorboard_log_dir): 13 | # 10-scale F0 + 24 MCEPs: 14 | np.random.seed(random_seed) 15 | num_epochs = 5000 16 | mini_batch_size = 1 # mini_batch_size = 1 is better 17 | generator_learning_rate = 0.0002 18 | generator_learning_rate_decay = generator_learning_rate / 200000 19 | discriminator_learning_rate = 0.0001 20 | discriminator_learning_rate_decay = discriminator_learning_rate / 200000 21 | sampling_rate = 16000 22 | num_mcep = 24 23 | num_scale = 10 24 | frame_period = 5.0 25 | n_frames = 128 26 | lambda_cycle = 10 27 | lambda_identity = 5 28 | 29 | print('Preprocessing Data...') 30 | 31 | start_time = time.time() 32 | 33 | wavs_A = load_wavs(wav_dir = train_A_dir, sr = sampling_rate) 34 | wavs_B = load_wavs(wav_dir = train_B_dir, sr = sampling_rate) 35 | 36 | f0s_A, timeaxes_A, sps_A, aps_A, coded_sps_A = world_encode_data(wavs = wavs_A, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 37 | f0s_B, timeaxes_B, sps_B, aps_B, coded_sps_B = world_encode_data(wavs = wavs_B, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 38 | 39 | log_f0s_mean_A, log_f0s_std_A = logf0_statistics(f0s_A) 40 | log_f0s_mean_B, log_f0s_std_B = logf0_statistics(f0s_B) 41 | ############################# 42 | #get lf0 cwt: 43 | lf0_cwt_norm_A, scales_A, means_A, stds_A = get_lf0_cwt_norm(f0s_A, mean = log_f0s_mean_A, std = log_f0s_std_A) 44 | lf0_cwt_norm_B, scales_B, means_B, stds_B = get_lf0_cwt_norm(f0s_B, mean = log_f0s_mean_B, std = log_f0s_std_B) 45 | 46 | 47 | 
print('Log Pitch A') 48 | print('Mean: %f, Std: %f' %(log_f0s_mean_A, log_f0s_std_A)) 49 | print('Log Pitch B') 50 | print('Mean: %f, Std: %f' %(log_f0s_mean_B, log_f0s_std_B)) 51 | 52 | lf0_cwt_norm_A_transposed = transpose_in_list(lst = lf0_cwt_norm_A) 53 | lf0_cwt_norm_B_transposed = transpose_in_list(lst = lf0_cwt_norm_B) 54 | 55 | coded_sps_A_transposed = transpose_in_list(lst = coded_sps_A) 56 | coded_sps_B_transposed = transpose_in_list(lst = coded_sps_B) 57 | 58 | coded_sps_A_norm, coded_sps_A_mean, coded_sps_A_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_A_transposed) 59 | coded_sps_B_norm, coded_sps_B_mean, coded_sps_B_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_B_transposed) 60 | 61 | if not os.path.exists(model_dir): 62 | os.makedirs(model_dir) 63 | np.savez(os.path.join(model_dir, 'logf0s_normalization.npz'), mean_A = log_f0s_mean_A, std_A = log_f0s_std_A, mean_B = log_f0s_mean_B, std_B = log_f0s_std_B) 64 | np.savez(os.path.join(model_dir, 'mcep_normalization.npz'), mean_A = coded_sps_A_mean, std_A = coded_sps_A_std, mean_B = coded_sps_B_mean, std_B = coded_sps_B_std) 65 | 66 | if validation_A_dir is not None: 67 | validation_A_output_dir = os.path.join(output_dir, 'converted_A') 68 | if not os.path.exists(validation_A_output_dir): 69 | os.makedirs(validation_A_output_dir) 70 | 71 | if validation_B_dir is not None: 72 | validation_B_output_dir = os.path.join(output_dir, 'converted_B') 73 | if not os.path.exists(validation_B_output_dir): 74 | os.makedirs(validation_B_output_dir) 75 | 76 | end_time = time.time() 77 | time_elapsed = end_time - start_time 78 | 79 | print('Preprocessing Done.') 80 | 81 | print('Time Elapsed for Data Preprocessing: %02d:%02d:%02d' % (time_elapsed // 3600, (time_elapsed % 3600 // 60), (time_elapsed % 60 // 1))) 82 | 83 | num_feats = 10 #34 84 | model = CycleGAN(num_features = num_feats) 85 | #model.load('./model/neutral2anger_f0/neutral2anger_f0.ckpt') 86 | #print('model restored') 87 | 88 | for epoch in range(num_epochs): 89 | print('Epoch: %d' % epoch) 90 | ''' 91 | if epoch > 60: 92 | lambda_identity = 0 93 | if epoch > 1250: 94 | generator_learning_rate = max(0, generator_learning_rate - 0.0000002) 95 | discriminator_learning_rate = max(0, discriminator_learning_rate - 0.0000001) 96 | ''' 97 | 98 | start_time_epoch = time.time() 99 | ###zk 100 | data_As = list() 101 | data_Bs = list() 102 | for lf0_a in lf0_cwt_norm_A_transposed: 103 | data_A = lf0_a 104 | data_As.append(data_A) 105 | 106 | for lf0_b in lf0_cwt_norm_B_transposed: 107 | data_B = lf0_b 108 | data_Bs.append(data_B) 109 | # for sp_a, lf0_a in zip(coded_sps_A_norm,lf0_cwt_norm_A_transposed): 110 | # data_A = np.vstack((sp_a,lf0_a)) 111 | # data_As.append(data_A) 112 | # for sp_b, lf0_b in zip(coded_sps_B_norm,lf0_cwt_norm_B_transposed): 113 | # data_B = np.vstack((sp_b,lf0_b)) 114 | # data_Bs.append(data_B) 115 | 116 | #dataset_A, dataset_B = sample_train_data(dataset_A = coded_sps_A_norm, dataset_B = coded_sps_B_norm, n_frames = n_frames) 117 | dataset_A, dataset_B = sample_train_data(dataset_A = data_As, dataset_B = data_Bs, n_frames = n_frames) 118 | 119 | n_samples = dataset_A.shape[0] 120 | 121 | for i in range(n_samples // mini_batch_size): 122 | 123 | num_iterations = n_samples // mini_batch_size * epoch + i 124 | 125 | if num_iterations > 10000: 126 | lambda_identity = 0 127 | if num_iterations > 200000: 128 | generator_learning_rate = max(0, generator_learning_rate - generator_learning_rate_decay) 129 | discriminator_learning_rate 
= max(0, discriminator_learning_rate - discriminator_learning_rate_decay) 130 | 131 | start = i * mini_batch_size 132 | end = (i + 1) * mini_batch_size 133 | 134 | generator_loss, discriminator_loss = model.train(input_A = dataset_A[start:end], input_B = dataset_B[start:end], lambda_cycle = lambda_cycle, lambda_identity = lambda_identity, generator_learning_rate = generator_learning_rate, discriminator_learning_rate = discriminator_learning_rate) 135 | 136 | if i % 50 == 0: 137 | #print('Iteration: %d, Generator Loss : %f, Discriminator Loss : %f' % (num_iterations, generator_loss, discriminator_loss)) 138 | print('Iteration: {:07d}, Generator Learning Rate: {:.7f}, Discriminator Learning Rate: {:.7f}, Generator Loss : {:.3f}, Discriminator Loss : {:.3f}'.format(num_iterations, generator_learning_rate, discriminator_learning_rate, generator_loss, discriminator_loss)) 139 | 140 | model.save(directory = model_dir, filename = model_name) 141 | 142 | end_time_epoch = time.time() 143 | time_elapsed_epoch = end_time_epoch - start_time_epoch 144 | 145 | print('Time Elapsed for This Epoch: %02d:%02d:%02d' % (time_elapsed_epoch // 3600, (time_elapsed_epoch % 3600 // 60), (time_elapsed_epoch % 60 // 1))) 146 | 147 | 148 | if __name__ == '__main__': 149 | 150 | parser = argparse.ArgumentParser(description = 'Train CycleGAN model for datasets.') 151 | 152 | train_A_dir_default = './data/training/NEUTRAL' 153 | train_B_dir_default = './data/training/SURPRISE' 154 | model_dir_default = './model/neutral_surprise_f0' 155 | model_name_default = 'neutral_to_surprise_f0.ckpt' 156 | random_seed_default = 0 157 | validation_A_dir_default = './data/evaluation_all/NEUTRAL' 158 | validation_B_dir_default = './data/evaluation_all/SURPRISE' 159 | output_dir_default = './validation_output' 160 | tensorboard_log_dir_default = './log' 161 | 162 | parser.add_argument('--train_A_dir', type = str, help = 'Directory for A.', default = train_A_dir_default) 163 | parser.add_argument('--train_B_dir', type = str, help = 'Directory for B.', default = train_B_dir_default) 164 | parser.add_argument('--model_dir', type = str, help = 'Directory for saving models.', default = model_dir_default) 165 | parser.add_argument('--model_name', type = str, help = 'File name for saving model.', default = model_name_default) 166 | parser.add_argument('--random_seed', type = int, help = 'Random seed for model training.', default = random_seed_default) 167 | parser.add_argument('--validation_A_dir', type = str, help = 'Convert validation A after each training epoch. If set none, no conversion would be done during the training.', default = validation_A_dir_default) 168 | parser.add_argument('--validation_B_dir', type = str, help = 'Convert validation B after each training epoch. 
If set none, no conversion would be done during the training.', default = validation_B_dir_default) 169 | parser.add_argument('--output_dir', type = str, help = 'Output directory for converted validation voices.', default = output_dir_default) 170 | parser.add_argument('--tensorboard_log_dir', type = str, help = 'TensorBoard log directory.', default = tensorboard_log_dir_default) 171 | 172 | argv = parser.parse_args() 173 | 174 | train_A_dir = argv.train_A_dir 175 | train_B_dir = argv.train_B_dir 176 | model_dir = argv.model_dir 177 | model_name = argv.model_name 178 | random_seed = argv.random_seed 179 | validation_A_dir = None if argv.validation_A_dir == 'None' or argv.validation_A_dir == 'none' else argv.validation_A_dir 180 | validation_B_dir = None if argv.validation_B_dir == 'None' or argv.validation_B_dir == 'none' else argv.validation_B_dir 181 | output_dir = argv.output_dir 182 | tensorboard_log_dir = argv.tensorboard_log_dir 183 | 184 | train(train_A_dir = train_A_dir, train_B_dir = train_B_dir, model_dir = model_dir, model_name = model_name, random_seed = random_seed, validation_A_dir = validation_A_dir, validation_B_dir = validation_B_dir, output_dir = output_dir, tensorboard_log_dir = tensorboard_log_dir) 185 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/module.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def gated_linear_layer(inputs, gates, name = None): 4 | 5 | activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name) 6 | 7 | return activation 8 | 9 | def instance_norm_layer( 10 | inputs, 11 | epsilon = 1e-06, 12 | activation_fn = None, 13 | name = None): 14 | 15 | instance_norm_layer = tf.contrib.layers.instance_norm( 16 | inputs = inputs, 17 | epsilon = epsilon, 18 | activation_fn = activation_fn) 19 | 20 | return instance_norm_layer 21 | 22 | def conv1d_layer( 23 | inputs, 24 | filters, 25 | kernel_size, 26 | strides = 1, 27 | padding = 'same', 28 | activation = None, 29 | kernel_initializer = None, 30 | name = None): 31 | 32 | conv_layer = tf.layers.conv1d( 33 | inputs = inputs, 34 | filters = filters, 35 | kernel_size = kernel_size, 36 | strides = strides, 37 | padding = padding, 38 | activation = activation, 39 | kernel_initializer = kernel_initializer, 40 | name = name) 41 | 42 | return conv_layer 43 | 44 | def conv2d_layer( 45 | inputs, 46 | filters, 47 | kernel_size, 48 | strides, 49 | padding = 'same', 50 | activation = None, 51 | kernel_initializer = None, 52 | name = None): 53 | 54 | conv_layer = tf.layers.conv2d( 55 | inputs = inputs, 56 | filters = filters, 57 | kernel_size = kernel_size, 58 | strides = strides, 59 | padding = padding, 60 | activation = activation, 61 | kernel_initializer = kernel_initializer, 62 | name = name) 63 | 64 | return conv_layer 65 | 66 | def residual1d_block( 67 | inputs, 68 | filters = 1024, 69 | kernel_size = 3, 70 | strides = 1, 71 | name_prefix = 'residule_block_'): 72 | 73 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 74 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 75 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 76 | h1_norm_gates = instance_norm_layer(inputs = 
h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 77 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 78 | h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv') 79 | h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm') 80 | 81 | h3 = inputs + h2_norm 82 | 83 | return h3 84 | 85 | def downsample1d_block( 86 | inputs, 87 | filters, 88 | kernel_size, 89 | strides, 90 | name_prefix = 'downsample1d_block_'): 91 | 92 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 93 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 94 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 95 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 96 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 97 | 98 | return h1_glu 99 | 100 | def downsample2d_block( 101 | inputs, 102 | filters, 103 | kernel_size, 104 | strides, 105 | name_prefix = 'downsample2d_block_'): 106 | 107 | h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 108 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 109 | h1_gates = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 110 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 111 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 112 | 113 | return h1_glu 114 | 115 | def upsample1d_block( 116 | inputs, 117 | filters, 118 | kernel_size, 119 | strides, 120 | shuffle_size = 2, 121 | name_prefix = 'upsample1d_block_'): 122 | 123 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 124 | h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle') 125 | h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm') 126 | 127 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 128 | h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates') 129 | h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 130 | 131 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 132 | 133 | return h1_glu 134 | 135 | def pixel_shuffler(inputs, shuffle_size = 2, name = None): 136 | 137 | n = tf.shape(inputs)[0] 138 | w = tf.shape(inputs)[1] 139 | c = inputs.get_shape().as_list()[2] 140 | 141 | oc = c // shuffle_size 142 | ow = w * shuffle_size 143 | 144 | outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], 
name = name) 145 | 146 | return outputs 147 | 148 | def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'): 149 | 150 | # inputs has shape [batch_size, num_features, time] 151 | # we need to convert it to [batch_size, time, num_features] for 1D convolution 152 | inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose') 153 | 154 | with tf.variable_scope(scope_name) as scope: 155 | # Discriminator would be reused in CycleGAN 156 | if reuse: 157 | scope.reuse_variables() 158 | else: 159 | assert scope.reuse is False 160 | 161 | h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv') 162 | h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates') 163 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 164 | 165 | # Downsample 166 | d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_') 167 | d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_') 168 | 169 | # Residual blocks 170 | r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_') 171 | r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_') 172 | r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_') 173 | r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_') 174 | r5 = residual1d_block(inputs = r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_') 175 | r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_') 176 | 177 | # Upsample 178 | u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_') 179 | u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_') 180 | 181 | # Output 182 | o1 = conv1d_layer(inputs = u2, filters = 24, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv') 183 | o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose') 184 | 185 | return o2 186 | 187 | 188 | def discriminator(inputs, reuse = False, scope_name = 'discriminator'): 189 | 190 | # inputs has shape [batch_size, num_features, time] 191 | # we need to add channel for 2D convolution [batch_size, num_features, time, 1] 192 | inputs = tf.expand_dims(inputs, -1) 193 | 194 | with tf.variable_scope(scope_name) as scope: 195 | # Discriminator would be reused in CycleGAN 196 | if reuse: 197 | scope.reuse_variables() 198 | else: 199 | assert scope.reuse is False 200 | 201 | h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv') 202 | h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates') 203 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 204 | 205 | # Downsample 206 | d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_') 207 | d2 = downsample2d_block(inputs = d1, 
filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_') 208 | d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_') 209 | 210 | # Output 211 | o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid) 212 | 213 | return o1 214 | 215 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/module_f0.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def gated_linear_layer(inputs, gates, name = None): 4 | 5 | activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name) 6 | 7 | return activation 8 | 9 | def instance_norm_layer( 10 | inputs, 11 | epsilon = 1e-06, 12 | activation_fn = None, 13 | name = None): 14 | 15 | instance_norm_layer = tf.contrib.layers.instance_norm( 16 | inputs = inputs, 17 | epsilon = epsilon, 18 | activation_fn = activation_fn) 19 | 20 | return instance_norm_layer 21 | 22 | def conv1d_layer( 23 | inputs, 24 | filters, 25 | kernel_size, 26 | strides = 1, 27 | padding = 'same', 28 | activation = None, 29 | kernel_initializer = None, 30 | name = None): 31 | 32 | conv_layer = tf.layers.conv1d( 33 | inputs = inputs, 34 | filters = filters, 35 | kernel_size = kernel_size, 36 | strides = strides, 37 | padding = padding, 38 | activation = activation, 39 | kernel_initializer = kernel_initializer, 40 | name = name) 41 | 42 | return conv_layer 43 | 44 | def conv2d_layer( 45 | inputs, 46 | filters, 47 | kernel_size, 48 | strides, 49 | padding = 'same', 50 | activation = None, 51 | kernel_initializer = None, 52 | name = None): 53 | 54 | conv_layer = tf.layers.conv2d( 55 | inputs = inputs, 56 | filters = filters, 57 | kernel_size = kernel_size, 58 | strides = strides, 59 | padding = padding, 60 | activation = activation, 61 | kernel_initializer = kernel_initializer, 62 | name = name) 63 | 64 | return conv_layer 65 | 66 | def residual1d_block( 67 | inputs, 68 | filters = 1024, 69 | kernel_size = 3, 70 | strides = 1, 71 | name_prefix = 'residule_block_'): 72 | 73 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 74 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 75 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 76 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 77 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 78 | h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv') 79 | h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm') 80 | 81 | h3 = inputs + h2_norm 82 | 83 | return h3 84 | 85 | def downsample1d_block( 86 | inputs, 87 | filters, 88 | kernel_size, 89 | strides, 90 | name_prefix = 'downsample1d_block_'): 91 | 92 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 93 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 
'h1_norm') 94 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 95 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 96 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 97 | 98 | return h1_glu 99 | 100 | def downsample2d_block( 101 | inputs, 102 | filters, 103 | kernel_size, 104 | strides, 105 | name_prefix = 'downsample2d_block_'): 106 | 107 | h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 108 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 109 | h1_gates = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 110 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 111 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 112 | 113 | return h1_glu 114 | 115 | def upsample1d_block( 116 | inputs, 117 | filters, 118 | kernel_size, 119 | strides, 120 | shuffle_size = 2, 121 | name_prefix = 'upsample1d_block_'): 122 | 123 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 124 | h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle') 125 | h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm') 126 | 127 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 128 | h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates') 129 | h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 130 | 131 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 132 | 133 | return h1_glu 134 | 135 | def pixel_shuffler(inputs, shuffle_size = 2, name = None): 136 | 137 | n = tf.shape(inputs)[0] 138 | w = tf.shape(inputs)[1] 139 | c = inputs.get_shape().as_list()[2] 140 | 141 | oc = c // shuffle_size 142 | ow = w * shuffle_size 143 | 144 | outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], name = name) 145 | 146 | return outputs 147 | 148 | def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'): 149 | 150 | # inputs has shape [batch_size, num_features, time] 151 | # we need to convert it to [batch_size, time, num_features] for 1D convolution 152 | inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose') 153 | 154 | with tf.variable_scope(scope_name) as scope: 155 | # Discriminator would be reused in CycleGAN 156 | if reuse: 157 | scope.reuse_variables() 158 | else: 159 | assert scope.reuse is False 160 | 161 | h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv') 162 | h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates') 163 | h1_glu = gated_linear_layer(inputs = h1, 
gates = h1_gates, name = 'h1_glu') 164 | 165 | # Downsample 166 | d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_') 167 | d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_') 168 | 169 | # Residual blocks 170 | r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_') 171 | r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_') 172 | r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_') 173 | r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_') 174 | r5 = residual1d_block(inputs = r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_') 175 | r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_') 176 | 177 | # Upsample 178 | u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_') 179 | u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_') 180 | 181 | # Output 182 | o1 = conv1d_layer(inputs = u2, filters = 10, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv') 183 | o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose') 184 | 185 | return o2 186 | 187 | 188 | def discriminator(inputs, reuse = False, scope_name = 'discriminator'): 189 | 190 | # inputs has shape [batch_size, num_features, time] 191 | # we need to add channel for 2D convolution [batch_size, num_features, time, 1] 192 | inputs = tf.expand_dims(inputs, -1) 193 | 194 | with tf.variable_scope(scope_name) as scope: 195 | # Discriminator would be reused in CycleGAN 196 | if reuse: 197 | scope.reuse_variables() 198 | else: 199 | assert scope.reuse is False 200 | 201 | h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv') 202 | h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates') 203 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 204 | 205 | # Downsample 206 | d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_') 207 | d2 = downsample2d_block(inputs = d1, filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_') 208 | d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_') 209 | 210 | # Output 211 | o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid) 212 | 213 | return o1 214 | 215 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/module_mceps.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def gated_linear_layer(inputs, gates, name = None): 4 | 5 | activation = tf.multiply(x = inputs, y = tf.sigmoid(gates), name = name) 6 | 7 | return activation 8 | 9 | def instance_norm_layer( 10 | inputs, 11 | epsilon = 1e-06, 
12 | activation_fn = None, 13 | name = None): 14 | 15 | instance_norm_layer = tf.contrib.layers.instance_norm( 16 | inputs = inputs, 17 | epsilon = epsilon, 18 | activation_fn = activation_fn) 19 | 20 | return instance_norm_layer 21 | 22 | def conv1d_layer( 23 | inputs, 24 | filters, 25 | kernel_size, 26 | strides = 1, 27 | padding = 'same', 28 | activation = None, 29 | kernel_initializer = None, 30 | name = None): 31 | 32 | conv_layer = tf.layers.conv1d( 33 | inputs = inputs, 34 | filters = filters, 35 | kernel_size = kernel_size, 36 | strides = strides, 37 | padding = padding, 38 | activation = activation, 39 | kernel_initializer = kernel_initializer, 40 | name = name) 41 | 42 | return conv_layer 43 | 44 | def conv2d_layer( 45 | inputs, 46 | filters, 47 | kernel_size, 48 | strides, 49 | padding = 'same', 50 | activation = None, 51 | kernel_initializer = None, 52 | name = None): 53 | 54 | conv_layer = tf.layers.conv2d( 55 | inputs = inputs, 56 | filters = filters, 57 | kernel_size = kernel_size, 58 | strides = strides, 59 | padding = padding, 60 | activation = activation, 61 | kernel_initializer = kernel_initializer, 62 | name = name) 63 | 64 | return conv_layer 65 | 66 | def residual1d_block( 67 | inputs, 68 | filters = 1024, 69 | kernel_size = 3, 70 | strides = 1, 71 | name_prefix = 'residule_block_'): 72 | 73 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 74 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 75 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 76 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 77 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 78 | h2 = conv1d_layer(inputs = h1_glu, filters = filters // 2, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h2_conv') 79 | h2_norm = instance_norm_layer(inputs = h2, activation_fn = None, name = name_prefix + 'h2_norm') 80 | 81 | h3 = inputs + h2_norm 82 | 83 | return h3 84 | 85 | def downsample1d_block( 86 | inputs, 87 | filters, 88 | kernel_size, 89 | strides, 90 | name_prefix = 'downsample1d_block_'): 91 | 92 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 93 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 94 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 95 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 96 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 97 | 98 | return h1_glu 99 | 100 | def downsample2d_block( 101 | inputs, 102 | filters, 103 | kernel_size, 104 | strides, 105 | name_prefix = 'downsample2d_block_'): 106 | 107 | h1 = conv2d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 108 | h1_norm = instance_norm_layer(inputs = h1, activation_fn = None, name = name_prefix + 'h1_norm') 109 | h1_gates = conv2d_layer(inputs = 
inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 110 | h1_norm_gates = instance_norm_layer(inputs = h1_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 111 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 112 | 113 | return h1_glu 114 | 115 | def upsample1d_block( 116 | inputs, 117 | filters, 118 | kernel_size, 119 | strides, 120 | shuffle_size = 2, 121 | name_prefix = 'upsample1d_block_'): 122 | 123 | h1 = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_conv') 124 | h1_shuffle = pixel_shuffler(inputs = h1, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle') 125 | h1_norm = instance_norm_layer(inputs = h1_shuffle, activation_fn = None, name = name_prefix + 'h1_norm') 126 | 127 | h1_gates = conv1d_layer(inputs = inputs, filters = filters, kernel_size = kernel_size, strides = strides, activation = None, name = name_prefix + 'h1_gates') 128 | h1_shuffle_gates = pixel_shuffler(inputs = h1_gates, shuffle_size = shuffle_size, name = name_prefix + 'h1_shuffle_gates') 129 | h1_norm_gates = instance_norm_layer(inputs = h1_shuffle_gates, activation_fn = None, name = name_prefix + 'h1_norm_gates') 130 | 131 | h1_glu = gated_linear_layer(inputs = h1_norm, gates = h1_norm_gates, name = name_prefix + 'h1_glu') 132 | 133 | return h1_glu 134 | 135 | def pixel_shuffler(inputs, shuffle_size = 2, name = None): 136 | 137 | n = tf.shape(inputs)[0] 138 | w = tf.shape(inputs)[1] 139 | c = inputs.get_shape().as_list()[2] 140 | 141 | oc = c // shuffle_size 142 | ow = w * shuffle_size 143 | 144 | outputs = tf.reshape(tensor = inputs, shape = [n, ow, oc], name = name) 145 | 146 | return outputs 147 | 148 | def generator_gatedcnn(inputs, reuse = False, scope_name = 'generator_gatedcnn'): 149 | 150 | # inputs has shape [batch_size, num_features, time] 151 | # we need to convert it to [batch_size, time, num_features] for 1D convolution 152 | inputs = tf.transpose(inputs, perm = [0, 2, 1], name = 'input_transpose') 153 | 154 | with tf.variable_scope(scope_name) as scope: 155 | # Discriminator would be reused in CycleGAN 156 | if reuse: 157 | scope.reuse_variables() 158 | else: 159 | assert scope.reuse is False 160 | 161 | h1 = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv') 162 | h1_gates = conv1d_layer(inputs = inputs, filters = 128, kernel_size = 15, strides = 1, activation = None, name = 'h1_conv_gates') 163 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 164 | 165 | # Downsample 166 | d1 = downsample1d_block(inputs = h1_glu, filters = 256, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block1_') 167 | d2 = downsample1d_block(inputs = d1, filters = 512, kernel_size = 5, strides = 2, name_prefix = 'downsample1d_block2_') 168 | 169 | # Residual blocks 170 | r1 = residual1d_block(inputs = d2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block1_') 171 | r2 = residual1d_block(inputs = r1, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block2_') 172 | r3 = residual1d_block(inputs = r2, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block3_') 173 | r4 = residual1d_block(inputs = r3, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block4_') 174 | r5 = residual1d_block(inputs = 
r4, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block5_') 175 | r6 = residual1d_block(inputs = r5, filters = 1024, kernel_size = 3, strides = 1, name_prefix = 'residual1d_block6_') 176 | 177 | # Upsample 178 | u1 = upsample1d_block(inputs = r6, filters = 1024, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block1_') 179 | u2 = upsample1d_block(inputs = u1, filters = 512, kernel_size = 5, strides = 1, shuffle_size = 2, name_prefix = 'upsample1d_block2_') 180 | 181 | # Output 182 | o1 = conv1d_layer(inputs = u2, filters = 24, kernel_size = 15, strides = 1, activation = None, name = 'o1_conv') 183 | o2 = tf.transpose(o1, perm = [0, 2, 1], name = 'output_transpose') 184 | 185 | return o2 186 | 187 | 188 | def discriminator(inputs, reuse = False, scope_name = 'discriminator'): 189 | 190 | # inputs has shape [batch_size, num_features, time] 191 | # we need to add channel for 2D convolution [batch_size, num_features, time, 1] 192 | inputs = tf.expand_dims(inputs, -1) 193 | 194 | with tf.variable_scope(scope_name) as scope: 195 | # Discriminator would be reused in CycleGAN 196 | if reuse: 197 | scope.reuse_variables() 198 | else: 199 | assert scope.reuse is False 200 | 201 | h1 = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv') 202 | h1_gates = conv2d_layer(inputs = inputs, filters = 128, kernel_size = [3, 3], strides = [1, 2], activation = None, name = 'h1_conv_gates') 203 | h1_glu = gated_linear_layer(inputs = h1, gates = h1_gates, name = 'h1_glu') 204 | 205 | # Downsample 206 | d1 = downsample2d_block(inputs = h1_glu, filters = 256, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block1_') 207 | d2 = downsample2d_block(inputs = d1, filters = 512, kernel_size = [3, 3], strides = [2, 2], name_prefix = 'downsample2d_block2_') 208 | d3 = downsample2d_block(inputs = d2, filters = 1024, kernel_size = [6, 3], strides = [1, 2], name_prefix = 'downsample2d_block3_') 209 | 210 | # Output 211 | o1 = tf.layers.dense(inputs = d3, units = 1, activation = tf.nn.sigmoid) 212 | 213 | return o1 214 | 215 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/model.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorflow as tf 3 | from module import discriminator, generator_gatedcnn 4 | from utils import l1_loss, l2_loss, cross_entropy_loss 5 | from datetime import datetime 6 | 7 | class CycleGAN(object): 8 | 9 | def __init__(self, num_features, discriminator = discriminator, generator = generator_gatedcnn, mode = 'train', log_dir = './log'): 10 | 11 | self.num_features = num_features 12 | self.input_shape = [None, num_features, None] # [batch_size, num_features, num_frames] 13 | 14 | self.discriminator = discriminator 15 | self.generator = generator 16 | self.mode = mode 17 | 18 | self.build_model() 19 | self.optimizer_initializer() 20 | 21 | self.saver = tf.train.Saver() 22 | self.sess = tf.Session() 23 | self.sess.run(tf.global_variables_initializer()) 24 | 25 | if self.mode == 'train': 26 | self.train_step = 0 27 | now = datetime.now() 28 | self.log_dir = os.path.join(log_dir, now.strftime('%Y%m%d-%H%M%S')) 29 | self.writer = tf.summary.FileWriter(self.log_dir, tf.get_default_graph()) 30 | self.generator_summaries, self.discriminator_summaries = self.summary() 31 | 32 | def build_model(self): 33 | 34 | # 
Placeholders for real training samples 35 | self.input_A_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_real') 36 | self.input_B_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_real') 37 | # Placeholders for fake generated samples 38 | self.input_A_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_fake') 39 | self.input_B_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_fake') 40 | # Placeholder for test samples 41 | self.input_A_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_test') 42 | self.input_B_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_test') 43 | 44 | self.generation_B = self.generator(inputs = self.input_A_real, reuse = False, scope_name = 'generator_A2B') 45 | self.cycle_A = self.generator(inputs = self.generation_B, reuse = False, scope_name = 'generator_B2A') 46 | 47 | self.generation_A = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_B2A') 48 | self.cycle_B = self.generator(inputs = self.generation_A, reuse = True, scope_name = 'generator_A2B') 49 | 50 | self.generation_A_identity = self.generator(inputs = self.input_A_real, reuse = True, scope_name = 'generator_B2A') 51 | self.generation_B_identity = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_A2B') 52 | 53 | self.discrimination_A_fake = self.discriminator(inputs = self.generation_A, reuse = False, scope_name = 'discriminator_A') 54 | self.discrimination_B_fake = self.discriminator(inputs = self.generation_B, reuse = False, scope_name = 'discriminator_B') 55 | 56 | # Cycle loss 57 | self.cycle_loss = l1_loss(y = self.input_A_real, y_hat = self.cycle_A) + l1_loss(y = self.input_B_real, y_hat = self.cycle_B) 58 | 59 | # Identity loss 60 | self.identity_loss = l1_loss(y = self.input_A_real, y_hat = self.generation_A_identity) + l1_loss(y = self.input_B_real, y_hat = self.generation_B_identity) 61 | 62 | # Place holder for lambda_cycle and lambda_identity 63 | self.lambda_cycle = tf.placeholder(tf.float32, None, name = 'lambda_cycle') 64 | self.lambda_identity = tf.placeholder(tf.float32, None, name = 'lambda_identity') 65 | 66 | # Generator loss 67 | # Generator wants to fool discriminator 68 | self.generator_loss_A2B = l2_loss(y = tf.ones_like(self.discrimination_B_fake), y_hat = self.discrimination_B_fake) 69 | self.generator_loss_B2A = l2_loss(y = tf.ones_like(self.discrimination_A_fake), y_hat = self.discrimination_A_fake) 70 | 71 | # Merge the two generators and the cycle loss 72 | self.generator_loss = self.generator_loss_A2B + self.generator_loss_B2A + self.lambda_cycle * self.cycle_loss + self.lambda_identity * self.identity_loss 73 | 74 | # Discriminator loss 75 | self.discrimination_input_A_real = self.discriminator(inputs = self.input_A_real, reuse = True, scope_name = 'discriminator_A') 76 | self.discrimination_input_B_real = self.discriminator(inputs = self.input_B_real, reuse = True, scope_name = 'discriminator_B') 77 | self.discrimination_input_A_fake = self.discriminator(inputs = self.input_A_fake, reuse = True, scope_name = 'discriminator_A') 78 | self.discrimination_input_B_fake = self.discriminator(inputs = self.input_B_fake, reuse = True, scope_name = 'discriminator_B') 79 | 80 | # Discriminator wants to classify real and fake correctly 81 | self.discriminator_loss_input_A_real = l2_loss(y = tf.ones_like(self.discrimination_input_A_real), y_hat = 
self.discrimination_input_A_real) 82 | self.discriminator_loss_input_A_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_A_fake), y_hat = self.discrimination_input_A_fake) 83 | self.discriminator_loss_A = (self.discriminator_loss_input_A_real + self.discriminator_loss_input_A_fake) / 2 84 | 85 | self.discriminator_loss_input_B_real = l2_loss(y = tf.ones_like(self.discrimination_input_B_real), y_hat = self.discrimination_input_B_real) 86 | self.discriminator_loss_input_B_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_B_fake), y_hat = self.discrimination_input_B_fake) 87 | self.discriminator_loss_B = (self.discriminator_loss_input_B_real + self.discriminator_loss_input_B_fake) / 2 88 | 89 | # Merge the two discriminators into one 90 | self.discriminator_loss = self.discriminator_loss_A + self.discriminator_loss_B 91 | 92 | # Categorize variables because we have to optimize the two sets of the variables separately 93 | trainable_variables = tf.trainable_variables() 94 | self.discriminator_vars = [var for var in trainable_variables if 'discriminator' in var.name] 95 | self.generator_vars = [var for var in trainable_variables if 'generator' in var.name] 96 | #for var in t_vars: print(var.name) 97 | 98 | # Reserved for test 99 | self.generation_B_test = self.generator(inputs = self.input_A_test, reuse = True, scope_name = 'generator_A2B') 100 | self.generation_A_test = self.generator(inputs = self.input_B_test, reuse = True, scope_name = 'generator_B2A') 101 | 102 | 103 | def optimizer_initializer(self): 104 | 105 | self.generator_learning_rate = tf.placeholder(tf.float32, None, name = 'generator_learning_rate') 106 | self.discriminator_learning_rate = tf.placeholder(tf.float32, None, name = 'discriminator_learning_rate') 107 | self.discriminator_optimizer = tf.train.AdamOptimizer(learning_rate = self.discriminator_learning_rate, beta1 = 0.5).minimize(self.discriminator_loss, var_list = self.discriminator_vars) 108 | self.generator_optimizer = tf.train.AdamOptimizer(learning_rate = self.generator_learning_rate, beta1 = 0.5).minimize(self.generator_loss, var_list = self.generator_vars) 109 | 110 | def train(self, input_A, input_B, lambda_cycle, lambda_identity, generator_learning_rate, discriminator_learning_rate): 111 | 112 | generation_A, generation_B, generator_loss, _, generator_summaries = self.sess.run( 113 | [self.generation_A, self.generation_B, self.generator_loss, self.generator_optimizer, self.generator_summaries], \ 114 | feed_dict = {self.lambda_cycle: lambda_cycle, self.lambda_identity: lambda_identity, self.input_A_real: input_A, self.input_B_real: input_B, self.generator_learning_rate: generator_learning_rate}) 115 | 116 | self.writer.add_summary(generator_summaries, self.train_step) 117 | 118 | discriminator_loss, _, discriminator_summaries = self.sess.run([self.discriminator_loss, self.discriminator_optimizer, self.discriminator_summaries], \ 119 | feed_dict = {self.input_A_real: input_A, self.input_B_real: input_B, self.discriminator_learning_rate: discriminator_learning_rate, self.input_A_fake: generation_A, self.input_B_fake: generation_B}) 120 | 121 | self.writer.add_summary(discriminator_summaries, self.train_step) 122 | 123 | self.train_step += 1 124 | 125 | return generator_loss, discriminator_loss 126 | 127 | 128 | def test(self, inputs, direction): 129 | 130 | if direction == 'A2B': 131 | generation = self.sess.run(self.generation_B_test, feed_dict = {self.input_A_test: inputs}) 132 | elif direction == 'B2A': 133 | generation = 
self.sess.run(self.generation_A_test, feed_dict = {self.input_B_test: inputs}) 134 | else: 135 | raise Exception('Conversion direction must be specified.') 136 | 137 | return generation 138 | 139 | 140 | def save(self, directory, filename): 141 | 142 | if not os.path.exists(directory): 143 | os.makedirs(directory) 144 | self.saver.save(self.sess, os.path.join(directory, filename)) 145 | 146 | return os.path.join(directory, filename) 147 | 148 | def load(self, filepath): 149 | 150 | self.saver.restore(self.sess, filepath) 151 | 152 | 153 | def summary(self): 154 | 155 | with tf.name_scope('generator_summaries'): 156 | cycle_loss_summary = tf.summary.scalar('cycle_loss', self.cycle_loss) 157 | identity_loss_summary = tf.summary.scalar('identity_loss', self.identity_loss) 158 | generator_loss_A2B_summary = tf.summary.scalar('generator_loss_A2B', self.generator_loss_A2B) 159 | generator_loss_B2A_summary = tf.summary.scalar('generator_loss_B2A', self.generator_loss_B2A) 160 | generator_loss_summary = tf.summary.scalar('generator_loss', self.generator_loss) 161 | generator_summaries = tf.summary.merge([cycle_loss_summary, identity_loss_summary, generator_loss_A2B_summary, generator_loss_B2A_summary, generator_loss_summary]) 162 | 163 | with tf.name_scope('discriminator_summaries'): 164 | discriminator_loss_A_summary = tf.summary.scalar('discriminator_loss_A', self.discriminator_loss_A) 165 | discriminator_loss_B_summary = tf.summary.scalar('discriminator_loss_B', self.discriminator_loss_B) 166 | discriminator_loss_summary = tf.summary.scalar('discriminator_loss', self.discriminator_loss) 167 | discriminator_summaries = tf.summary.merge([discriminator_loss_A_summary, discriminator_loss_B_summary, discriminator_loss_summary]) 168 | 169 | return generator_summaries, discriminator_summaries 170 | 171 | 172 | if __name__ == '__main__': 173 | 174 | model = CycleGAN(num_features = 24) 175 | print('Graph Compile Successeded.') 176 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/model_f0.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorflow as tf 3 | from module_f0 import discriminator, generator_gatedcnn 4 | from utils import l1_loss, l2_loss, cross_entropy_loss 5 | from datetime import datetime 6 | 7 | class CycleGAN(object): 8 | 9 | def __init__(self, num_features, discriminator = discriminator, generator = generator_gatedcnn, mode = 'train', log_dir = './log'): 10 | 11 | self.num_features = num_features 12 | self.input_shape = [None, num_features, None] # [batch_size, num_features, num_frames] 13 | 14 | self.discriminator = discriminator 15 | self.generator = generator 16 | self.mode = mode 17 | 18 | self.build_model() 19 | self.optimizer_initializer() 20 | 21 | self.saver = tf.train.Saver() 22 | self.sess = tf.Session() 23 | self.sess.run(tf.global_variables_initializer()) 24 | 25 | if self.mode == 'train': 26 | self.train_step = 0 27 | now = datetime.now() 28 | self.log_dir = os.path.join(log_dir, now.strftime('%Y%m%d-%H%M%S')) 29 | self.writer = tf.summary.FileWriter(self.log_dir, tf.get_default_graph()) 30 | self.generator_summaries, self.discriminator_summaries = self.summary() 31 | 32 | def build_model(self): 33 | 34 | # Placeholders for real training samples 35 | self.input_A_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_real') 36 | self.input_B_real = tf.placeholder(tf.float32,
shape = self.input_shape, name = 'input_B_real') 37 | # Placeholders for fake generated samples 38 | self.input_A_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_fake') 39 | self.input_B_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_fake') 40 | # Placeholder for test samples 41 | self.input_A_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_test') 42 | self.input_B_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_test') 43 | 44 | self.generation_B = self.generator(inputs = self.input_A_real, reuse = False, scope_name = 'generator_A2B') 45 | self.cycle_A = self.generator(inputs = self.generation_B, reuse = False, scope_name = 'generator_B2A') 46 | 47 | self.generation_A = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_B2A') 48 | self.cycle_B = self.generator(inputs = self.generation_A, reuse = True, scope_name = 'generator_A2B') 49 | 50 | self.generation_A_identity = self.generator(inputs = self.input_A_real, reuse = True, scope_name = 'generator_B2A') 51 | self.generation_B_identity = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_A2B') 52 | 53 | self.discrimination_A_fake = self.discriminator(inputs = self.generation_A, reuse = False, scope_name = 'discriminator_A') 54 | self.discrimination_B_fake = self.discriminator(inputs = self.generation_B, reuse = False, scope_name = 'discriminator_B') 55 | 56 | # Cycle loss 57 | self.cycle_loss = l1_loss(y = self.input_A_real, y_hat = self.cycle_A) + l1_loss(y = self.input_B_real, y_hat = self.cycle_B) 58 | 59 | # Identity loss 60 | self.identity_loss = l1_loss(y = self.input_A_real, y_hat = self.generation_A_identity) + l1_loss(y = self.input_B_real, y_hat = self.generation_B_identity) 61 | 62 | # Place holder for lambda_cycle and lambda_identity 63 | self.lambda_cycle = tf.placeholder(tf.float32, None, name = 'lambda_cycle') 64 | self.lambda_identity = tf.placeholder(tf.float32, None, name = 'lambda_identity') 65 | 66 | # Generator loss 67 | # Generator wants to fool discriminator 68 | self.generator_loss_A2B = l2_loss(y = tf.ones_like(self.discrimination_B_fake), y_hat = self.discrimination_B_fake) 69 | self.generator_loss_B2A = l2_loss(y = tf.ones_like(self.discrimination_A_fake), y_hat = self.discrimination_A_fake) 70 | 71 | # Merge the two generators and the cycle loss 72 | self.generator_loss = self.generator_loss_A2B + self.generator_loss_B2A + self.lambda_cycle * self.cycle_loss + self.lambda_identity * self.identity_loss 73 | 74 | # Discriminator loss 75 | self.discrimination_input_A_real = self.discriminator(inputs = self.input_A_real, reuse = True, scope_name = 'discriminator_A') 76 | self.discrimination_input_B_real = self.discriminator(inputs = self.input_B_real, reuse = True, scope_name = 'discriminator_B') 77 | self.discrimination_input_A_fake = self.discriminator(inputs = self.input_A_fake, reuse = True, scope_name = 'discriminator_A') 78 | self.discrimination_input_B_fake = self.discriminator(inputs = self.input_B_fake, reuse = True, scope_name = 'discriminator_B') 79 | 80 | # Discriminator wants to classify real and fake correctly 81 | self.discriminator_loss_input_A_real = l2_loss(y = tf.ones_like(self.discrimination_input_A_real), y_hat = self.discrimination_input_A_real) 82 | self.discriminator_loss_input_A_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_A_fake), y_hat = self.discrimination_input_A_fake) 83 | self.discriminator_loss_A = 
(self.discriminator_loss_input_A_real + self.discriminator_loss_input_A_fake) / 2 84 | 85 | self.discriminator_loss_input_B_real = l2_loss(y = tf.ones_like(self.discrimination_input_B_real), y_hat = self.discrimination_input_B_real) 86 | self.discriminator_loss_input_B_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_B_fake), y_hat = self.discrimination_input_B_fake) 87 | self.discriminator_loss_B = (self.discriminator_loss_input_B_real + self.discriminator_loss_input_B_fake) / 2 88 | 89 | # Merge the two discriminators into one 90 | self.discriminator_loss = self.discriminator_loss_A + self.discriminator_loss_B 91 | 92 | # Categorize variables because we have to optimize the two sets of the variables separately 93 | trainable_variables = tf.trainable_variables() 94 | self.discriminator_vars = [var for var in trainable_variables if 'discriminator' in var.name] 95 | self.generator_vars = [var for var in trainable_variables if 'generator' in var.name] 96 | #for var in t_vars: print(var.name) 97 | 98 | # Reserved for test 99 | self.generation_B_test = self.generator(inputs = self.input_A_test, reuse = True, scope_name = 'generator_A2B') 100 | self.generation_A_test = self.generator(inputs = self.input_B_test, reuse = True, scope_name = 'generator_B2A') 101 | 102 | 103 | def optimizer_initializer(self): 104 | 105 | self.generator_learning_rate = tf.placeholder(tf.float32, None, name = 'generator_learning_rate') 106 | self.discriminator_learning_rate = tf.placeholder(tf.float32, None, name = 'discriminator_learning_rate') 107 | self.discriminator_optimizer = tf.train.AdamOptimizer(learning_rate = self.discriminator_learning_rate, beta1 = 0.5).minimize(self.discriminator_loss, var_list = self.discriminator_vars) 108 | self.generator_optimizer = tf.train.AdamOptimizer(learning_rate = self.generator_learning_rate, beta1 = 0.5).minimize(self.generator_loss, var_list = self.generator_vars) 109 | 110 | def train(self, input_A, input_B, lambda_cycle, lambda_identity, generator_learning_rate, discriminator_learning_rate): 111 | 112 | generation_A, generation_B, generator_loss, _, generator_summaries = self.sess.run( 113 | [self.generation_A, self.generation_B, self.generator_loss, self.generator_optimizer, self.generator_summaries], \ 114 | feed_dict = {self.lambda_cycle: lambda_cycle, self.lambda_identity: lambda_identity, self.input_A_real: input_A, self.input_B_real: input_B, self.generator_learning_rate: generator_learning_rate}) 115 | 116 | self.writer.add_summary(generator_summaries, self.train_step) 117 | 118 | discriminator_loss, _, discriminator_summaries = self.sess.run([self.discriminator_loss, self.discriminator_optimizer, self.discriminator_summaries], \ 119 | feed_dict = {self.input_A_real: input_A, self.input_B_real: input_B, self.discriminator_learning_rate: discriminator_learning_rate, self.input_A_fake: generation_A, self.input_B_fake: generation_B}) 120 | 121 | self.writer.add_summary(discriminator_summaries, self.train_step) 122 | 123 | self.train_step += 1 124 | 125 | return generator_loss, discriminator_loss 126 | 127 | 128 | def test(self, inputs, direction): 129 | 130 | if direction == 'A2B': 131 | generation = self.sess.run(self.generation_B_test, feed_dict = {self.input_A_test: inputs}) 132 | elif direction == 'B2A': 133 | generation = self.sess.run(self.generation_A_test, feed_dict = {self.input_B_test: inputs}) 134 | else: 135 | raise Exception('Conversion direction must be specified.') 136 | 137 | return generation 138 | 139 | 140 | def save(self, directory, 
filename): 141 | 142 | if not os.path.exists(directory): 143 | os.makedirs(directory) 144 | self.saver.save(self.sess, os.path.join(directory, filename)) 145 | 146 | return os.path.join(directory, filename) 147 | 148 | def load(self, filepath): 149 | 150 | self.saver.restore(self.sess, filepath) 151 | 152 | 153 | def summary(self): 154 | 155 | with tf.name_scope('generator_summaries'): 156 | cycle_loss_summary = tf.summary.scalar('cycle_loss', self.cycle_loss) 157 | identity_loss_summary = tf.summary.scalar('identity_loss', self.identity_loss) 158 | generator_loss_A2B_summary = tf.summary.scalar('generator_loss_A2B', self.generator_loss_A2B) 159 | generator_loss_B2A_summary = tf.summary.scalar('generator_loss_B2A', self.generator_loss_B2A) 160 | generator_loss_summary = tf.summary.scalar('generator_loss', self.generator_loss) 161 | generator_summaries = tf.summary.merge([cycle_loss_summary, identity_loss_summary, generator_loss_A2B_summary, generator_loss_B2A_summary, generator_loss_summary]) 162 | 163 | with tf.name_scope('discriminator_summaries'): 164 | discriminator_loss_A_summary = tf.summary.scalar('discriminator_loss_A', self.discriminator_loss_A) 165 | discriminator_loss_B_summary = tf.summary.scalar('discriminator_loss_B', self.discriminator_loss_B) 166 | discriminator_loss_summary = tf.summary.scalar('discriminator_loss', self.discriminator_loss) 167 | discriminator_summaries = tf.summary.merge([discriminator_loss_A_summary, discriminator_loss_B_summary, discriminator_loss_summary]) 168 | 169 | return generator_summaries, discriminator_summaries 170 | 171 | 172 | if __name__ == '__main__': 173 | tf.reset_default_graph() 174 | model = CycleGAN(num_features = 10) 175 | print('Graph Compile Successeded.') 176 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/model_mceps.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorflow as tf 3 | from module_mceps import discriminator, generator_gatedcnn 4 | from utils import l1_loss, l2_loss, cross_entropy_loss 5 | from datetime import datetime 6 | 7 | class CycleGAN(object): 8 | 9 | def __init__(self, num_features, discriminator = discriminator, generator = generator_gatedcnn, mode = 'train', log_dir = './log'): 10 | 11 | self.num_features = num_features 12 | self.input_shape = [None, num_features, None] # [batch_size, num_features, num_frames] 13 | 14 | self.discriminator = discriminator 15 | self.generator = generator 16 | self.mode = mode 17 | 18 | self.build_model() 19 | self.optimizer_initializer() 20 | 21 | self.saver = tf.train.Saver() 22 | self.sess = tf.Session() 23 | self.sess.run(tf.global_variables_initializer()) 24 | 25 | if self.mode == 'train': 26 | self.train_step = 0 27 | now = datetime.now() 28 | self.log_dir = os.path.join(log_dir, now.strftime('%Y%m%d-%H%M%S')) 29 | self.writer = tf.summary.FileWriter(self.log_dir, tf.get_default_graph()) 30 | self.generator_summaries, self.discriminator_summaries = self.summary() 31 | 32 | def build_model(self): 33 | tf.reset_default_graph() 34 | # Placeholders for real training samples 35 | self.input_A_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_real') 36 | self.input_B_real = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_real') 37 | # Placeholders for fake generated samples 38 | self.input_A_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 
'input_A_fake') 39 | self.input_B_fake = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_fake') 40 | # Placeholder for test samples 41 | self.input_A_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_A_test') 42 | self.input_B_test = tf.placeholder(tf.float32, shape = self.input_shape, name = 'input_B_test') 43 | 44 | self.generation_B = self.generator(inputs = self.input_A_real, reuse = False, scope_name = 'generator_A2B') 45 | self.cycle_A = self.generator(inputs = self.generation_B, reuse = False, scope_name = 'generator_B2A') 46 | 47 | self.generation_A = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_B2A') 48 | self.cycle_B = self.generator(inputs = self.generation_A, reuse = True, scope_name = 'generator_A2B') 49 | 50 | self.generation_A_identity = self.generator(inputs = self.input_A_real, reuse = True, scope_name = 'generator_B2A') 51 | self.generation_B_identity = self.generator(inputs = self.input_B_real, reuse = True, scope_name = 'generator_A2B') 52 | 53 | self.discrimination_A_fake = self.discriminator(inputs = self.generation_A, reuse = False, scope_name = 'discriminator_A') 54 | self.discrimination_B_fake = self.discriminator(inputs = self.generation_B, reuse = False, scope_name = 'discriminator_B') 55 | 56 | # Cycle loss 57 | self.cycle_loss = l1_loss(y = self.input_A_real, y_hat = self.cycle_A) + l1_loss(y = self.input_B_real, y_hat = self.cycle_B) 58 | 59 | # Identity loss 60 | self.identity_loss = l1_loss(y = self.input_A_real, y_hat = self.generation_A_identity) + l1_loss(y = self.input_B_real, y_hat = self.generation_B_identity) 61 | 62 | # Place holder for lambda_cycle and lambda_identity 63 | self.lambda_cycle = tf.placeholder(tf.float32, None, name = 'lambda_cycle') 64 | self.lambda_identity = tf.placeholder(tf.float32, None, name = 'lambda_identity') 65 | 66 | # Generator loss 67 | # Generator wants to fool discriminator 68 | self.generator_loss_A2B = l2_loss(y = tf.ones_like(self.discrimination_B_fake), y_hat = self.discrimination_B_fake) 69 | self.generator_loss_B2A = l2_loss(y = tf.ones_like(self.discrimination_A_fake), y_hat = self.discrimination_A_fake) 70 | 71 | # Merge the two generators and the cycle loss 72 | self.generator_loss = self.generator_loss_A2B + self.generator_loss_B2A + self.lambda_cycle * self.cycle_loss + self.lambda_identity * self.identity_loss 73 | 74 | # Discriminator loss 75 | self.discrimination_input_A_real = self.discriminator(inputs = self.input_A_real, reuse = True, scope_name = 'discriminator_A') 76 | self.discrimination_input_B_real = self.discriminator(inputs = self.input_B_real, reuse = True, scope_name = 'discriminator_B') 77 | self.discrimination_input_A_fake = self.discriminator(inputs = self.input_A_fake, reuse = True, scope_name = 'discriminator_A') 78 | self.discrimination_input_B_fake = self.discriminator(inputs = self.input_B_fake, reuse = True, scope_name = 'discriminator_B') 79 | 80 | # Discriminator wants to classify real and fake correctly 81 | self.discriminator_loss_input_A_real = l2_loss(y = tf.ones_like(self.discrimination_input_A_real), y_hat = self.discrimination_input_A_real) 82 | self.discriminator_loss_input_A_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_A_fake), y_hat = self.discrimination_input_A_fake) 83 | self.discriminator_loss_A = (self.discriminator_loss_input_A_real + self.discriminator_loss_input_A_fake) / 2 84 | 85 | self.discriminator_loss_input_B_real = l2_loss(y = 
tf.ones_like(self.discrimination_input_B_real), y_hat = self.discrimination_input_B_real) 86 | self.discriminator_loss_input_B_fake = l2_loss(y = tf.zeros_like(self.discrimination_input_B_fake), y_hat = self.discrimination_input_B_fake) 87 | self.discriminator_loss_B = (self.discriminator_loss_input_B_real + self.discriminator_loss_input_B_fake) / 2 88 | 89 | # Merge the two discriminators into one 90 | self.discriminator_loss = self.discriminator_loss_A + self.discriminator_loss_B 91 | 92 | # Categorize variables because we have to optimize the two sets of the variables separately 93 | trainable_variables = tf.trainable_variables() 94 | self.discriminator_vars = [var for var in trainable_variables if 'discriminator' in var.name] 95 | self.generator_vars = [var for var in trainable_variables if 'generator' in var.name] 96 | #for var in t_vars: print(var.name) 97 | 98 | # Reserved for test 99 | self.generation_B_test = self.generator(inputs = self.input_A_test, reuse = True, scope_name = 'generator_A2B') 100 | self.generation_A_test = self.generator(inputs = self.input_B_test, reuse = True, scope_name = 'generator_B2A') 101 | 102 | 103 | def optimizer_initializer(self): 104 | 105 | self.generator_learning_rate = tf.placeholder(tf.float32, None, name = 'generator_learning_rate') 106 | self.discriminator_learning_rate = tf.placeholder(tf.float32, None, name = 'discriminator_learning_rate') 107 | self.discriminator_optimizer = tf.train.AdamOptimizer(learning_rate = self.discriminator_learning_rate, beta1 = 0.5).minimize(self.discriminator_loss, var_list = self.discriminator_vars) 108 | self.generator_optimizer = tf.train.AdamOptimizer(learning_rate = self.generator_learning_rate, beta1 = 0.5).minimize(self.generator_loss, var_list = self.generator_vars) 109 | 110 | def train(self, input_A, input_B, lambda_cycle, lambda_identity, generator_learning_rate, discriminator_learning_rate): 111 | 112 | generation_A, generation_B, generator_loss, _, generator_summaries = self.sess.run( 113 | [self.generation_A, self.generation_B, self.generator_loss, self.generator_optimizer, self.generator_summaries], \ 114 | feed_dict = {self.lambda_cycle: lambda_cycle, self.lambda_identity: lambda_identity, self.input_A_real: input_A, self.input_B_real: input_B, self.generator_learning_rate: generator_learning_rate}) 115 | 116 | self.writer.add_summary(generator_summaries, self.train_step) 117 | 118 | discriminator_loss, _, discriminator_summaries = self.sess.run([self.discriminator_loss, self.discriminator_optimizer, self.discriminator_summaries], \ 119 | feed_dict = {self.input_A_real: input_A, self.input_B_real: input_B, self.discriminator_learning_rate: discriminator_learning_rate, self.input_A_fake: generation_A, self.input_B_fake: generation_B}) 120 | 121 | self.writer.add_summary(discriminator_summaries, self.train_step) 122 | 123 | self.train_step += 1 124 | 125 | return generator_loss, discriminator_loss 126 | 127 | 128 | def test(self, inputs, direction): 129 | 130 | if direction == 'A2B': 131 | generation = self.sess.run(self.generation_B_test, feed_dict = {self.input_A_test: inputs}) 132 | elif direction == 'B2A': 133 | generation = self.sess.run(self.generation_A_test, feed_dict = {self.input_B_test: inputs}) 134 | else: 135 | raise Exception('Conversion direction must be specified.') 136 | 137 | return generation 138 | 139 | 140 | def save(self, directory, filename): 141 | 142 | if not os.path.exists(directory): 143 | os.makedirs(directory) 144 | self.saver.save(self.sess, os.path.join(directory, 
filename)) 145 | 146 | return os.path.join(directory, filename) 147 | 148 | def load(self, filepath): 149 | 150 | self.saver.restore(self.sess, filepath) 151 | 152 | 153 | def summary(self): 154 | 155 | with tf.name_scope('generator_summaries'): 156 | cycle_loss_summary = tf.summary.scalar('cycle_loss', self.cycle_loss) 157 | identity_loss_summary = tf.summary.scalar('identity_loss', self.identity_loss) 158 | generator_loss_A2B_summary = tf.summary.scalar('generator_loss_A2B', self.generator_loss_A2B) 159 | generator_loss_B2A_summary = tf.summary.scalar('generator_loss_B2A', self.generator_loss_B2A) 160 | generator_loss_summary = tf.summary.scalar('generator_loss', self.generator_loss) 161 | generator_summaries = tf.summary.merge([cycle_loss_summary, identity_loss_summary, generator_loss_A2B_summary, generator_loss_B2A_summary, generator_loss_summary]) 162 | 163 | with tf.name_scope('discriminator_summaries'): 164 | discriminator_loss_A_summary = tf.summary.scalar('discriminator_loss_A', self.discriminator_loss_A) 165 | discriminator_loss_B_summary = tf.summary.scalar('discriminator_loss_B', self.discriminator_loss_B) 166 | discriminator_loss_summary = tf.summary.scalar('discriminator_loss', self.discriminator_loss) 167 | discriminator_summaries = tf.summary.merge([discriminator_loss_A_summary, discriminator_loss_B_summary, discriminator_loss_summary]) 168 | 169 | return generator_summaries, discriminator_summaries 170 | 171 | 172 | if __name__ == '__main__': 173 | tf.reset_default_graph() 174 | model = CycleGAN(num_features = 24) 175 | print('Graph Compile Successeded.') 176 | -------------------------------------------------------------------------------- /Parallel-data-free emotional voice conversion with CycleGAN and CWT/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import argparse 4 | import time 5 | import librosa 6 | 7 | from preprocess import * 8 | from model import CycleGAN 9 | 10 | 11 | def train(train_A_dir, train_B_dir, model_dir, model_name, random_seed, validation_A_dir, validation_B_dir, output_dir, tensorboard_log_dir): 12 | 13 | np.random.seed(random_seed) 14 | 15 | num_epochs = 5000 16 | mini_batch_size = 1 # mini_batch_size = 1 is better 17 | generator_learning_rate = 0.0002 18 | generator_learning_rate_decay = generator_learning_rate / 200000 19 | discriminator_learning_rate = 0.0001 20 | discriminator_learning_rate_decay = discriminator_learning_rate / 200000 21 | sampling_rate = 16000 22 | num_mcep = 24 23 | frame_period = 5.0 24 | n_frames = 128 25 | lambda_cycle = 10 26 | lambda_identity = 5 27 | 28 | print('Preprocessing Data...') 29 | 30 | start_time = time.time() 31 | 32 | wavs_A = load_wavs(wav_dir = train_A_dir, sr = sampling_rate) 33 | wavs_B = load_wavs(wav_dir = train_B_dir, sr = sampling_rate) 34 | 35 | f0s_A, timeaxes_A, sps_A, aps_A, coded_sps_A = world_encode_data(wavs = wavs_A, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 36 | f0s_B, timeaxes_B, sps_B, aps_B, coded_sps_B = world_encode_data(wavs = wavs_B, fs = sampling_rate, frame_period = frame_period, coded_dim = num_mcep) 37 | 38 | log_f0s_mean_A, log_f0s_std_A = logf0_statistics(f0s_A) 39 | log_f0s_mean_B, log_f0s_std_B = logf0_statistics(f0s_B) 40 | 41 | print('Log Pitch A') 42 | print('Mean: %f, Std: %f' %(log_f0s_mean_A, log_f0s_std_A)) 43 | print('Log Pitch B') 44 | print('Mean: %f, Std: %f' %(log_f0s_mean_B, log_f0s_std_B)) 45 | 46 | 47 | coded_sps_A_transposed = transpose_in_list(lst 
= coded_sps_A) 48 | coded_sps_B_transposed = transpose_in_list(lst = coded_sps_B) 49 | 50 | coded_sps_A_norm, coded_sps_A_mean, coded_sps_A_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_A_transposed) 51 | print("Input data fixed.") 52 | coded_sps_B_norm, coded_sps_B_mean, coded_sps_B_std = coded_sps_normalization_fit_transoform(coded_sps = coded_sps_B_transposed) 53 | 54 | if not os.path.exists(model_dir): 55 | os.makedirs(model_dir) 56 | np.savez(os.path.join(model_dir, 'logf0s_normalization.npz'), mean_A = log_f0s_mean_A, std_A = log_f0s_std_A, mean_B = log_f0s_mean_B, std_B = log_f0s_std_B) 57 | np.savez(os.path.join(model_dir, 'mcep_normalization.npz'), mean_A = coded_sps_A_mean, std_A = coded_sps_A_std, mean_B = coded_sps_B_mean, std_B = coded_sps_B_std) 58 | 59 | if validation_A_dir is not None: 60 | validation_A_output_dir = os.path.join(output_dir, 'converted_A') 61 | if not os.path.exists(validation_A_output_dir): 62 | os.makedirs(validation_A_output_dir) 63 | 64 | if validation_B_dir is not None: 65 | validation_B_output_dir = os.path.join(output_dir, 'converted_B') 66 | if not os.path.exists(validation_B_output_dir): 67 | os.makedirs(validation_B_output_dir) 68 | 69 | end_time = time.time() 70 | time_elapsed = end_time - start_time 71 | 72 | print('Preprocessing Done.') 73 | 74 | print('Time Elapsed for Data Preprocessing: %02d:%02d:%02d' % (time_elapsed // 3600, (time_elapsed % 3600 // 60), (time_elapsed % 60 // 1))) 75 | 76 | model = CycleGAN(num_features = num_mcep) 77 | 78 | for epoch in range(num_epochs): 79 | print('Epoch: %d' % epoch) 80 | ''' 81 | if epoch > 60: 82 | lambda_identity = 0 83 | if epoch > 1250: 84 | generator_learning_rate = max(0, generator_learning_rate - 0.0000002) 85 | discriminator_learning_rate = max(0, discriminator_learning_rate - 0.0000001) 86 | ''' 87 | 88 | start_time_epoch = time.time() 89 | 90 | dataset_A, dataset_B = sample_train_data(dataset_A = coded_sps_A_norm, dataset_B = coded_sps_B_norm, n_frames = n_frames) 91 | 92 | n_samples = dataset_A.shape[0] 93 | 94 | for i in range(n_samples // mini_batch_size): 95 | 96 | num_iterations = n_samples // mini_batch_size * epoch + i 97 | 98 | if num_iterations > 10000: 99 | lambda_identity = 0 100 | if num_iterations > 200000: 101 | generator_learning_rate = max(0, generator_learning_rate - generator_learning_rate_decay) 102 | discriminator_learning_rate = max(0, discriminator_learning_rate - discriminator_learning_rate_decay) 103 | 104 | start = i * mini_batch_size 105 | end = (i + 1) * mini_batch_size 106 | 107 | generator_loss, discriminator_loss = model.train(input_A = dataset_A[start:end], input_B = dataset_B[start:end], lambda_cycle = lambda_cycle, lambda_identity = lambda_identity, generator_learning_rate = generator_learning_rate, discriminator_learning_rate = discriminator_learning_rate) 108 | 109 | if i % 50 == 0: 110 | #print('Iteration: %d, Generator Loss : %f, Discriminator Loss : %f' % (num_iterations, generator_loss, discriminator_loss)) 111 | print('Iteration: {:07d}, Generator Learning Rate: {:.7f}, Discriminator Learning Rate: {:.7f}, Generator Loss : {:.3f}, Discriminator Loss : {:.3f}'.format(num_iterations, generator_learning_rate, discriminator_learning_rate, generator_loss, discriminator_loss)) 112 | 113 | model.save(directory = model_dir, filename = model_name) 114 | 115 | end_time_epoch = time.time() 116 | time_elapsed_epoch = end_time_epoch - start_time_epoch 117 | 118 | print('Time Elapsed for This Epoch: %02d:%02d:%02d' % (time_elapsed_epoch // 3600, 
(time_elapsed_epoch % 3600 // 60), (time_elapsed_epoch % 60 // 1))) 119 | 120 | if validation_A_dir is not None: 121 | if epoch % 50 == 0: 122 | print('Generating Validation Data B from A...') 123 | for file in os.listdir(validation_A_dir): 124 | filepath = os.path.join(validation_A_dir, file) 125 | wav, _ = librosa.load(filepath, sr = sampling_rate, mono = True) 126 | wav = wav_padding(wav = wav, sr = sampling_rate, frame_period = frame_period, multiple = 4) 127 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = sampling_rate, frame_period = frame_period) 128 | f0_converted = pitch_conversion(f0 = f0, mean_log_src = log_f0s_mean_A, std_log_src = log_f0s_std_A, mean_log_target = log_f0s_mean_B, std_log_target = log_f0s_std_B) 129 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = sampling_rate, dim = num_mcep) 130 | coded_sp_transposed = coded_sp.T 131 | coded_sp_norm = (coded_sp_transposed - coded_sps_A_mean) / coded_sps_A_std 132 | coded_sp_converted_norm = model.test(inputs = np.array([coded_sp_norm]), direction = 'A2B')[0] 133 | coded_sp_converted = coded_sp_converted_norm * coded_sps_B_std + coded_sps_B_mean 134 | coded_sp_converted = coded_sp_converted.T 135 | coded_sp_converted = np.ascontiguousarray(coded_sp_converted) 136 | decoded_sp_converted = world_decode_spectral_envelop(coded_sp = coded_sp_converted, fs = sampling_rate) 137 | wav_transformed = world_speech_synthesis(f0 = f0_converted, decoded_sp = decoded_sp_converted, ap = ap, fs = sampling_rate, frame_period = frame_period) 138 | librosa.output.write_wav(os.path.join(validation_A_output_dir, os.path.basename(file)), wav_transformed, sampling_rate) 139 | 140 | if validation_B_dir is not None: 141 | if epoch % 50 == 0: 142 | print('Generating Validation Data A from B...') 143 | for file in os.listdir(validation_B_dir): 144 | filepath = os.path.join(validation_B_dir, file) 145 | wav, _ = librosa.load(filepath, sr = sampling_rate, mono = True) 146 | wav = wav_padding(wav = wav, sr = sampling_rate, frame_period = frame_period, multiple = 4) 147 | f0, timeaxis, sp, ap = world_decompose(wav = wav, fs = sampling_rate, frame_period = frame_period) 148 | f0_converted = pitch_conversion(f0 = f0, mean_log_src = log_f0s_mean_B, std_log_src = log_f0s_std_B, mean_log_target = log_f0s_mean_A, std_log_target = log_f0s_std_A) 149 | coded_sp = world_encode_spectral_envelop(sp = sp, fs = sampling_rate, dim = num_mcep) 150 | coded_sp_transposed = coded_sp.T 151 | coded_sp_norm = (coded_sp_transposed - coded_sps_B_mean) / coded_sps_B_std 152 | coded_sp_converted_norm = model.test(inputs = np.array([coded_sp_norm]), direction = 'B2A')[0] 153 | coded_sp_converted = coded_sp_converted_norm * coded_sps_A_std + coded_sps_A_mean 154 | coded_sp_converted = coded_sp_converted.T 155 | coded_sp_converted = np.ascontiguousarray(coded_sp_converted) 156 | decoded_sp_converted = world_decode_spectral_envelop(coded_sp = coded_sp_converted, fs = sampling_rate) 157 | wav_transformed = world_speech_synthesis(f0 = f0_converted, decoded_sp = decoded_sp_converted, ap = ap, fs = sampling_rate, frame_period = frame_period) 158 | librosa.output.write_wav(os.path.join(validation_B_output_dir, os.path.basename(file)), wav_transformed, sampling_rate) 159 | 160 | if __name__ == '__main__': 161 | 162 | parser = argparse.ArgumentParser(description = 'Train CycleGAN model for datasets.') 163 | 164 | train_A_dir_default = './data/training/NEUTRAL' 165 | train_B_dir_default = './data/training/SURPRISE' 166 | model_dir_default = './model/neutral_to_suprise_mceps' 
167 | model_name_default = 'neutral_to_suprise_mceps.ckpt' 168 | random_seed_default = 0 169 | validation_A_dir_default = './data/evaluation_all/NEUTRAL' 170 | validation_B_dir_default = './data/evaluation_all/SURPRISE' 171 | output_dir_default = './validation_output' 172 | tensorboard_log_dir_default = './log' 173 | 174 | parser.add_argument('--train_A_dir', type = str, help = 'Directory for A.', default = train_A_dir_default) 175 | parser.add_argument('--train_B_dir', type = str, help = 'Directory for B.', default = train_B_dir_default) 176 | parser.add_argument('--model_dir', type = str, help = 'Directory for saving models.', default = model_dir_default) 177 | parser.add_argument('--model_name', type = str, help = 'File name for saving model.', default = model_name_default) 178 | parser.add_argument('--random_seed', type = int, help = 'Random seed for model training.', default = random_seed_default) 179 | parser.add_argument('--validation_A_dir', type = str, help = 'Convert validation A after each training epoch. If set none, no conversion would be done during the training.', default = validation_A_dir_default) 180 | parser.add_argument('--validation_B_dir', type = str, help = 'Convert validation B after each training epoch. If set none, no conversion would be done during the training.', default = validation_B_dir_default) 181 | parser.add_argument('--output_dir', type = str, help = 'Output directory for converted validation voices.', default = output_dir_default) 182 | parser.add_argument('--tensorboard_log_dir', type = str, help = 'TensorBoard log directory.', default = tensorboard_log_dir_default) 183 | 184 | argv = parser.parse_args() 185 | 186 | train_A_dir = argv.train_A_dir 187 | train_B_dir = argv.train_B_dir 188 | model_dir = argv.model_dir 189 | model_name = argv.model_name 190 | random_seed = argv.random_seed 191 | validation_A_dir = None if argv.validation_A_dir == 'None' or argv.validation_A_dir == 'none' else argv.validation_A_dir 192 | validation_B_dir = None if argv.validation_B_dir == 'None' or argv.validation_B_dir == 'none' else argv.validation_B_dir 193 | output_dir = argv.output_dir 194 | tensorboard_log_dir = argv.tensorboard_log_dir 195 | 196 | train(train_A_dir = train_A_dir, train_B_dir = train_B_dir, model_dir = model_dir, model_name = model_name, random_seed = random_seed, validation_A_dir = validation_A_dir, validation_B_dir = validation_B_dir, output_dir = output_dir, tensorboard_log_dir = tensorboard_log_dir) 197 | --------------------------------------------------------------------------------
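
**Quick sanity check of the model interface (a sketch, not part of the original scripts):** the three `CycleGAN` classes in `model.py`, `model_f0.py` and `model_mceps.py` expose the same API — construct the graph for a given feature dimensionality, call `train()` on batches shaped `[batch_size, num_features, num_frames]`, and call `test()` with a conversion direction. Assuming the TensorFlow 1.x dependencies listed above and using random arrays in place of normalized MCEP features, a minimal smoke test of the 24-dimensional spectrum model could look like this:

```Python
import numpy as np
from model import CycleGAN

# Hyper-parameter values taken from train.py.
num_mcep = 24     # feature dimension of the coded spectral envelope
n_frames = 128    # multiple of 4: the generator downsamples time by 2 twice
batch_size = 1

model = CycleGAN(num_features = num_mcep)

# Random stand-ins for normalized coded spectral envelopes of domains A and B.
input_A = np.random.randn(batch_size, num_mcep, n_frames).astype(np.float32)
input_B = np.random.randn(batch_size, num_mcep, n_frames).astype(np.float32)

generator_loss, discriminator_loss = model.train(
    input_A = input_A,
    input_B = input_B,
    lambda_cycle = 10,
    lambda_identity = 5,
    generator_learning_rate = 0.0002,
    discriminator_learning_rate = 0.0001)
print('Generator Loss: %.3f, Discriminator Loss: %.3f' % (generator_loss, discriminator_loss))

# Convert the (fake) source batch from A to B and check the output shape.
conversion = model.test(inputs = input_A, direction = 'A2B')
print(conversion.shape)  # expected: (1, 24, 128)
```

In `train.py` the same calls are made with real data: the batches come from `sample_train_data` over WORLD-encoded, normalized spectral envelopes, `lambda_identity` is set to 0 after 10,000 iterations, and both learning rates are decayed linearly after 200,000 iterations.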