├── LICENSE
├── README.md
├── augmentation.py
├── generate.py
├── model_fundf.py
├── noiselist.scp
├── saved_models
│   ├── gtaug_best.ckpt.data-00000-of-00001
│   ├── gtaug_best.ckpt.index
│   ├── gtaug_best.ckpt.meta
│   └── gtaug_loss.dat
├── sp_module.py
├── test_wavs.scp
├── train.py
├── train_f0s.scp
├── train_wavs.scp
└── wavs
    └── arctic_a0001.wav
/LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PiENet 2 | Pitch estimation network (PiENet) for noise-robust neural F0 estimation of speech signals. The best-performing model from [1] ('GTE-AUG') is supplied as a pre-trained model. The model was trained using additive and convolutional noise augmentation, as well as vocoder-based ground truth enhancement (see the publication for further details). 3 | 4 | Article available at: 5 | [IEEE Xplore](https://ieeexplore.ieee.org/document/8683041), 6 | [ResearchGate](https://www.researchgate.net/publication/331012502_Data_Augmentation_Strategies_for_Neural_Network_F0_Estimation) 7 | 8 | ## License 9 | Distributed under the Apache 2.0 license. See LICENSE for further details. 10 | 11 | ## Dependencies 12 | * TensorFlow (tested with version 1.10) 13 | * NumPy 14 | * SciPy 15 | 16 | ## How to use the pre-trained model: 17 | The pre-trained model is trained for speech sampled at 16 kHz, with a 10 ms frame shift and an F0 range of 50 to 500 Hz. 18 | 19 | ### Method 1: 20 | * `python generate.py` : Performs pitch estimation on the `.wav` files specified in the text file `test_wavs.scp` and writes the `.f0` files to the default output folder `./f0/`. 21 | 22 | ### Method 2: 23 | * `python generate.py input_file.wav` : Performs pitch estimation on `input_file.wav`, output to `./f0/`. 24 | 25 | * `python generate.py input_list.scp` : Performs pitch estimation on all files specified in the text file `input_list.scp`, output to `./f0/`. 26 | 27 | * `python generate.py input_dir` : Performs pitch estimation on all `.wav` files found in the folder `input_dir`, output to `./f0/`. 28 | 29 | ### Method 3: 30 | * `python generate.py [input_file/input_list/input_dir] target_dir` : Same options as Method 2, with output written to `target_dir`. 31 | 
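The `.f0` output files are plain ASCII by default (one value per 10 ms frame, with unvoiced frames marked as 0). A minimal sketch for loading and inspecting an output file, assuming the default output folder and the bundled example utterance:

```python
import numpy as np

# Load an estimated F0 track written by generate.py (ASCII, one value per frame)
f0 = np.loadtxt('f0/arctic_a0001.f0', dtype=np.float32)

voiced = f0 > 0  # unvoiced frames are written as 0
print('frames: %d, voiced: %d, median voiced F0: %.1f Hz'
      % (len(f0), voiced.sum(), np.median(f0[voiced])))
```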
32 | ## How to train your own model: 33 | * Supply the list of training data files in `train_wavs.scp` for input data, and the corresponding target F0s in `train_f0s.scp` (the F0 files must be raw float32). Note that the number of frames in the F0 files must match the present method's framing convention (Nframes = ceil(wav_length/hop); the signal is zero-padded with winlen/2 samples at the start and end); see the sketch after this list. 34 | 35 | * If you want to use additive noise augmentation with sampled noise from a database, supply the file paths in `noiselist.scp` and edit the code in `train.py` line 54 to `noise_samples = augmentation.load_noise_samples(noise_wav_scp='noiselist.scp')`. 36 | 37 | * For vocoder-based augmentation techniques, you need to download the vocoder of your choice. The GlottDNN vocoder used in [1] can be found at https://github.com/ljuvela/GlottDNN. For ground truth enhancement, process the training wavs offline with the vocoder. The code for diversity augmentation is not provided in the current release of the method. 38 | 39 | 
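To verify that a target F0 file matches the framing convention above, the expected frame count can be computed directly from the waveform length. A minimal sketch, using the placeholder paths from the `.scp` files and the window/hop values hard-coded in `train.py`:

```python
import numpy as np
import scipy.io.wavfile as wavfile

winlen, hop = 512, 160  # 32 ms window, 10 ms hop at 16 kHz, as in train.py

fs, y = wavfile.read('traindata/wav/filename1.wav')              # entry from train_wavs.scp
f0 = np.fromfile('traindata/f0/filename1.f0', dtype=np.float32)  # entry from train_f0s.scp

n_frames = int(np.ceil(len(y) / float(hop)))  # same convention as sp_module.get_frames
assert len(f0) == n_frames, 'F0 file has %d frames, expected %d' % (len(f0), n_frames)
```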
40 | ## Reference: 41 | [1] M. Airaksinen, L. Juvela, P. Alku and O. Räsänen: "Data augmentation strategies for neural F0 estimation", Proc. ICASSP 2019 42 | 43 | Available: [IEEE Xplore](https://ieeexplore.ieee.org/document/8683041), 44 | [ResearchGate](https://www.researchgate.net/publication/331012502_Data_Augmentation_Strategies_for_Neural_Network_F0_Estimation) 45 | 46 | 47 | -------------------------------------------------------------------------------- /augmentation.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | __author__ = "Manu Airaksinen, manu.airaksinen@aalto.fi" 4 | 5 | import os 6 | import sys 7 | import numpy as np 8 | import scipy.io.wavfile as wavfile 9 | 10 | 11 | 12 | def load_noise_samples(noise_wav_scp='noiselist.scp'): 13 | if noise_wav_scp is None: 14 | return None 15 | 16 | with open(noise_wav_scp) as wavlist: 17 | wavs = wavlist.read().splitlines() 18 | 19 | noise_samples = [] 20 | for wav in wavs: 21 | fs, y = wavfile.read(wav) 22 | y = np.float32(y/(2**15)) 23 | if fs != 16000: 24 | raise Exception('fs needs to be 16 kHz!') 25 | noise_samples.append(y) 26 | return noise_samples 27 | 28 | 29 | def sampled_noise(N,noise_samples): 30 | # Random noise sample 31 | i_sample = np.random.randint(0,len(noise_samples)) 32 | noise = noise_samples[i_sample] 33 | # Random starting location 34 | i_start = np.random.randint(0,len(noise)-N-1) 35 | noise = noise[i_start:i_start+N] 36 | return noise 37 | 38 | 39 | # Controlled augmentation 40 | def add_noise_file_controlled(X,snr,noise_type='white',noise_sample=None,run_codec=False): 41 | # Add additive noise 42 | e = np.linalg.norm(X) 43 | if noise_type == 'white': 44 | noise = np.random.randn(X.shape[0]) 45 | elif noise_type == 'babble': 46 | # Random starting location 47 | N = X.shape[0] 48 | i_start = np.random.randint(0,len(noise_sample)-N-1) 49 | noise = noise_sample[i_start:i_start+N] 50 | 51 | en = np.linalg.norm(noise) 52 | gain = 10.0**(-1.0*snr/20.0) 53 | noise = gain * noise * e / en 54 | X += noise 55 | 56 | return X 57 | 58 | # Random augmentation 59 | def add_noise_file(X,noise_samples=None): 60 | 61 | # Add channel noise (random impulse response) 62 | if np.random.rand() > 0.5: 63 | imp = np.random.randn(17,) 64 | gain_imp = np.random.rand() 65 | imp *= gain_imp 66 | imp[8] = 1.0 67 | X = np.convolve(X,imp,'same') 68 | 69 | # Add additive noise 70 | if np.random.rand() > 0.5: 71 | e = np.linalg.norm(X) 72 | if noise_samples is None: 73 | noise = np.random.randn(X.shape[0]) 74 | else: 75 | noise = sampled_noise(X.shape[0],noise_samples) 76 | en = np.linalg.norm(noise) 77 | # Random snr for batch 78 | snr = np.float32(np.random.randint(-10, 20)) 79 | gain = 10.0**(-1.0*snr/20.0) 80 | noise = gain * noise * e / en 81 | X += noise 82 | X *= 10.0**(-1.0*(np.random.rand()-0.5)) 83 | 84 | return X 85 | 86 | 87 | 
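For reference, a minimal usage sketch of the controlled augmentation above, degrading the bundled example waveform with white noise at a fixed SNR (run from the repository root):

```python
import numpy as np
import scipy.io.wavfile as wavfile
import augmentation

fs, y = wavfile.read('wavs/arctic_a0001.wav')
y = np.float32(y / (2**15))  # scale int16 samples to [-1, 1), as elsewhere in this repo

# Mix in white noise at 10 dB SNR (note: the function also modifies y in place)
y_noisy = augmentation.add_noise_file_controlled(y, snr=10.0, noise_type='white')
```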
-------------------------------------------------------------------------------- /generate.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | __author__ = "Manu Airaksinen, manu.airaksinen@aalto.fi" 4 | 5 | 6 | import os 7 | import sys 8 | import glob 9 | #os.environ['CUDA_VISIBLE_DEVICES'] = '' # Uncomment to force CPU 10 | import numpy as np 11 | import tensorflow as tf 12 | import scipy.io.wavfile as wavfile 13 | import scipy.signal 14 | 15 | import model_fundf as model 16 | import sp_module as sp 17 | 18 | 19 | 20 | _FLOATX = tf.float32 21 | 22 | def generate(wav_list, target_dir, model_name): 23 | 24 | # Set input & output dimensions (DO NOT EDIT FOR TRAINED MODELS) 25 | winlen = 512 26 | hop = 160 27 | input_dim = winlen 28 | output_dim = 351 29 | 30 | # network config 31 | dilations=[1, 2, 4, 8, 1, 2, 4, 8] 32 | filter_width = 5 33 | residual_channels = 128 34 | postnet_channels = 256 35 | 36 | f0_model = model.CNET(name='f0model', input_channels=input_dim, 37 | output_channels=output_dim, dilations=dilations, filter_width=filter_width, 38 | residual_channels=residual_channels, postnet_channels=postnet_channels) 39 | 40 | 41 | input_var = tf.placeholder(shape=(None, None, winlen), dtype=_FLOATX) 42 | 43 | # model input is framed raw signal 44 | f0_activations = f0_model.forward_pass(input_var) 45 | f0_activations = tf.nn.softmax(f0_activations) 46 | 47 | saver = tf.train.Saver() 48 | 49 | with tf.Session() as sess: 50 | 51 | model_file = './saved_models/' + model_name + '_best.ckpt' 52 | print("Loading model: " + model_file) 53 | 54 | saver.restore(sess, model_file) 55 | print("Model restored.") 56 | 57 | for wfile in wav_list: 58 | fs, y = wavfile.read(wfile) 59 | y = np.float32(y/(2**15)) 60 | 61 | if fs != 16000: 62 | y = scipy.signal.resample(y,int(len(y)/(fs/16000))) 63 | 64 | input_frames = np.reshape(sp.get_frames(y,winlen,hop),[1,-1,winlen]) 65 | 66 | f0_act_np = sess.run([f0_activations], feed_dict={input_var: input_frames}) 67 | f0_act_np = np.reshape(np.asarray(f0_act_np),[-1,output_dim]) 68 | 69 | f0_gen_np = sp.getF0fromActivations(f0_act_np,minf0=50,maxf0=500,Nbins=351) 70 | 71 | basename = os.path.basename(os.path.splitext(wfile)[0]) 72 | target_file = target_dir + '/' + basename + '.f0' 73 | #f0_gen_np.tofile(target_file) # Use this to write float32 binary files 74 | np.savetxt(target_file,f0_gen_np,fmt='%.2f') # Use this to write ASCII output files 75 | 76 | 77 | if __name__=="__main__": 78 | 79 | # Parse input arguments 80 | 81 | 82 | if(len(sys.argv) == 1): # No extra arguments, use default input and output files 83 | file_list = 'test_wavs.scp' 84 | target_dir = 'f0/' 85 | 86 | elif(len(sys.argv) == 2): # Custom list of files for input 87 | target_dir = 'f0/' 88 | if(isinstance(sys.argv[1], str)): 89 | file_list = sys.argv[1] 90 | else: 91 | raise Exception('First input argument must be a string (file path)') 92 | elif(len(sys.argv) == 3): # Custom input + custom target directory 93 | if(isinstance(sys.argv[1], str) and isinstance(sys.argv[2], str)): 94 | file_list = sys.argv[1] 95 | target_dir = sys.argv[2] 96 | else: 97 | raise Exception('Input arguments must be strings (file paths)') 98 | 99 | elif(len(sys.argv) > 3): # Too many input arguments 100 | raise Exception('Too many input arguments') 101 | 102 | ########### GET DATA ############# 103 | wasdir = 0 104 | # If input is a directory, process all .wav files inside it 105 | if(os.path.isdir(file_list)): 106 | fileList = sorted(glob.glob(os.path.join(file_list, '*.wav'))) 107 | wasdir = 1 108 | if(len(fileList) == 0): 109 | raise Exception('Provided directory contains no .wav files') 110 | elif(file_list.endswith('.wav')): 111 | fileList = list() 112 | fileList.append(file_list) 113 | else: 114 | if(os.path.isfile(file_list)): 115 | #fileList = list(filter(bool,[line.rstrip('\n') for line in open(file_list)])) 116 | with open(file_list) as wavlist: 117 | fileList = wavlist.read().splitlines() 118 | else: 119 | raise Exception('Provided input file list does not exist.') 120 | 121 | 122 | # Model name (located in 'saved_models/' directory) 123 | model_name = 'gtaug' 124 | 125 | # Target directory for estimated F0 files 126 | 127 | 128 | os.makedirs(target_dir, exist_ok=True) 129 | 130 | generate(fileList, target_dir, model_name) 131 | 
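Note that generate.py writes ASCII output by default, while train.py reads its F0 targets as raw float32 (np.fromfile). A minimal sketch for converting an ASCII `.f0` file into the binary format expected by `train_f0s.scp` (the output path is hypothetical):

```python
import numpy as np

# ASCII .f0 file as written by np.savetxt in generate.py
f0 = np.loadtxt('f0/arctic_a0001.f0', dtype=np.float32)

# Raw float32 binary, the format train.py expects for its targets
f0.tofile('traindata/f0/arctic_a0001.f0')
```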
-------------------------------------------------------------------------------- /model_fundf.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | __author__ = "Lauri Juvela, lauri.juvela@aalto.fi", "Manu Airaksinen, manu.airaksinen@aalto.fi" 4 | 5 | import math 6 | import numpy as np 7 | import tensorflow as tf 8 | 9 | _FLOATX = tf.float32 10 | 11 | def get_weight_variable(name, shape=None, initializer=tf.contrib.layers.xavier_initializer_conv2d()): 12 | if shape is None: 13 | return tf.get_variable(name) 14 | else: 15 | return tf.get_variable(name, shape=shape, dtype=_FLOATX, initializer=initializer) 16 | 17 | def get_bias_variable(name, shape=None, initializer=tf.constant_initializer(value=0.0, dtype=_FLOATX)): 18 | if shape is None: 19 | return tf.get_variable(name) 20 | else: 21 | return tf.get_variable(name, shape=shape, dtype=_FLOATX, initializer=initializer) 22 | 23 | 24 | class CNET(): 25 | 26 | def __init__(self, 27 | name, 28 | residual_channels=128, 29 | filter_width=5, 30 | dilations=[1, 2, 4, 8, 1, 2, 4, 8], 31 | input_channels=512, 32 | output_channels=301, 33 | postnet_channels=256): 34 | 35 | self.input_channels = input_channels 36 | self.output_channels = output_channels 37 | self.filter_width = filter_width 38 | self.dilations = dilations 39 | self.residual_channels = residual_channels 40 | self.postnet_channels = postnet_channels 41 | 42 | self._name = name 43 | self._create_variables() 44 | 45 | def _create_variables(self): 46 | 47 | fw = self.filter_width 48 | r = self.residual_channels 49 | s = self.postnet_channels 50 | 51 | with tf.variable_scope(self._name): 52 | 53 | with tf.variable_scope('input_layer'): 54 | get_weight_variable('W', (1, self.input_channels, r)) # Input channels = waveform frame -> fully connected matrix to learn a linear transformation 55 | get_bias_variable('b', (r)) 56 | 57 | for i, dilation in enumerate(self.dilations): 58 | with tf.variable_scope('conv_modules'): 59 | with tf.variable_scope('module{}'.format(i)): 60 | # (filter_width x input_channels x output_channels) 61 | get_weight_variable('filter_gate_W', (fw, r, 2*r)) 62 | get_bias_variable('filter_gate_b', (2*r)) 63 | 64 | get_weight_variable('skip_weight_W', (1, r, r)) 65 | get_weight_variable('skip_weight_b', (r)) # bias created as a weight variable; kept as-is so existing checkpoints still load 66 | 67 | get_weight_variable('output_weight_W', (1, r, r)) 68 | get_weight_variable('output_weight_b', (r)) 69 | 70 | 71 | with tf.variable_scope('postproc_module'): 72 | # (filter_width x input_channels x output_channels) 73 | 74 | get_weight_variable('W1', (fw, r, s)) 75 | get_bias_variable('b1', s) 76 | 77 | get_weight_variable('W2', (fw, s, self.output_channels)) 78 | get_bias_variable('b2', self.output_channels) 79 | 80 | def get_variable_list(self): 81 | return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self._name) 82 | 83 | 84 | def _input_layer(self, main_input, training=False): 85 | with tf.variable_scope('input_layer'): 86 | 87 | W = get_weight_variable('W') 88 | b = get_bias_variable('b') 89 | 90 | X = main_input 91 | X = tf.layers.dropout(inputs=X, rate=0.4, training=training) 92 | Y = tf.nn.convolution(X, W, padding='SAME') 93 | Y += b 94 | Y = tf.tanh(Y) 95 | 96 | return Y 97 | 98 | def _conv_module(self, main_input, residual_input, module_idx, dilation): 99 | with tf.variable_scope('conv_modules'): 100 | with tf.variable_scope('module{}'.format(module_idx)): 101 | 102 | W = get_weight_variable('filter_gate_W') 103 | b = get_bias_variable('filter_gate_b') 104 | r = self.residual_channels 105 | 106 | W_skip = get_weight_variable('skip_weight_W') 107 | b_skip = get_weight_variable('skip_weight_b') 108 | 109 | W_out = get_weight_variable('output_weight_W') 110 | b_out = get_weight_variable('output_weight_b') 111 | 112 | X = main_input 113 | 114 | # dilated convolution 115 | Y = tf.nn.convolution(X, W, padding='SAME', dilation_rate=[dilation]) 116 | Y += b 117 | 118 | # filter and gate 119 | Y = tf.tanh(Y[:, :, :r])*tf.sigmoid(Y[:, :, r:]) 120 | 121 | # skip connection output (1x1 convolution) 122 | skip_out = tf.nn.convolution(Y, W_skip, padding='SAME') 123 | skip_out += b_skip 124 | 125 | Y = tf.nn.convolution(Y, W_out, padding='SAME') 126 | Y += b_out 127 | Y += X # residual connection 128 | 129 | return Y, skip_out 130 | 131 | def _postproc_module(self, residual_module_outputs): 132 | with tf.variable_scope('postproc_module'): 133 | 134 | W1 = get_weight_variable('W1') 135 | b1 = get_bias_variable('b1') 136 | W2 = get_weight_variable('W2') 137 | b2 = get_bias_variable('b2') 138 | 139 | # sum of residual module outputs 140 | X = tf.zeros_like(residual_module_outputs[0]) 141 | for R in residual_module_outputs: 142 | X += R 143 | 144 | Y = tf.nn.convolution(X, W1, padding='SAME') 145 | Y += b1 146 | Y = tf.nn.relu(Y) 147 | 148 | Y = tf.nn.convolution(Y, W2, padding='SAME') 149 | Y += b2 150 | 151 | return Y 152 | 153 | 154 | def forward_pass(self, X_input, training=False): 155 | skip_outputs = [] 156 | with tf.variable_scope(self._name, reuse=True): 157 | R = self._input_layer(X_input, training=training) 158 | X = R 159 | for i, dilation in enumerate(self.dilations): 160 | X, skip = self._conv_module(X, R, i, dilation) 161 | skip_outputs.append(skip) 162 | 163 | Y = self._postproc_module(skip_outputs) 164 | Y = tf.reshape(Y,[-1,self.output_channels]) 165 | 166 | return Y 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 
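As a rough sanity check of the architecture above, the receptive field of the network can be computed from filter_width and the dilation schedule (a sketch; all convolutions are SAME-padded and non-causal, as in _conv_module and _postproc_module):

```python
filter_width = 5
dilations = [1, 2, 4, 8, 1, 2, 4, 8]

# Each dilated conv with kernel width fw and dilation d widens the
# receptive field by (fw - 1) * d frames; the 1x1 input layer adds nothing.
rf = 1 + sum((filter_width - 1) * d for d in dilations)
rf += 2 * (filter_width - 1)   # two width-5 postnet convolutions
print(rf)  # 129 frames, i.e. about 1.3 s of context at a 10 ms hop
```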
-------------------------------------------------------------------------------- /noiselist.scp: -------------------------------------------------------------------------------- 1 | noise_wav_dir/noise_sample_1.wav 2 | noise_wav_dir/noise_sample_2.wav -------------------------------------------------------------------------------- /saved_models/gtaug_best.ckpt.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mairaksi/PiENet/01ab5122d693f55fe9b49025a4e41f3e0eeaa4b2/saved_models/gtaug_best.ckpt.data-00000-of-00001 -------------------------------------------------------------------------------- /saved_models/gtaug_best.ckpt.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mairaksi/PiENet/01ab5122d693f55fe9b49025a4e41f3e0eeaa4b2/saved_models/gtaug_best.ckpt.index -------------------------------------------------------------------------------- /saved_models/gtaug_best.ckpt.meta: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mairaksi/PiENet/01ab5122d693f55fe9b49025a4e41f3e0eeaa4b2/saved_models/gtaug_best.ckpt.meta -------------------------------------------------------------------------------- /saved_models/gtaug_loss.dat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mairaksi/PiENet/01ab5122d693f55fe9b49025a4e41f3e0eeaa4b2/saved_models/gtaug_loss.dat -------------------------------------------------------------------------------- /sp_module.py: 
-------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | __author__ = "Manu Airaksinen, manu.airaksinen@aalto.fi" 4 | 5 | import os 6 | import numpy as np 7 | import tensorflow as tf 8 | 9 | def get_frames(x,wl,hop): 10 | Nframes = int(np.ceil(x.shape[0]/hop)) 11 | X = np.zeros((Nframes,int(wl)),dtype=np.float32) 12 | pad = np.zeros(int(wl/2),dtype=np.float32) 13 | x = np.concatenate((pad, x, pad)) 14 | for i in range(Nframes): 15 | X[i,:] = x[i*hop:i*hop+wl] 16 | 17 | return X 18 | 19 | 20 | def onehot(f0_vec,minf0=50,maxf0=500,Nbins=351): 21 | f0_onehot = np.zeros((f0_vec.shape[0],Nbins),dtype=np.float32) 22 | f0vec = np.exp(np.linspace(np.log(minf0),np.log(maxf0),Nbins-1)) 23 | for i in range(f0_vec.shape[0]): 24 | if f0_vec[i] > 0.0: 25 | IND = np.argmin(np.abs(f0vec-f0_vec[i])) 26 | f0_onehot[i,IND] = 1.0 27 | else: 28 | f0_onehot[i,-1] = 1.0 29 | 30 | f0_onehot = np.reshape(f0_onehot,[1,-1,Nbins]) 31 | return f0_onehot 32 | 33 | def getF0fromActivations(f0_act,minf0=50,maxf0=500,Nbins=351): 34 | f0_gen = np.zeros((f0_act.shape[0]),dtype=np.float32) 35 | f0vec = np.exp(np.linspace(np.log(minf0),np.log(maxf0),Nbins-1)) 36 | for i in range(f0_act.shape[0]): 37 | ind = np.argmax(f0_act[i,:]) 38 | if ind < Nbins-1: 39 | f0_gen[i] = f0vec[ind] 40 | else: 41 | f0_gen[i] = 0 42 | return f0_gen 
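The 351 output bins used in onehot() and getF0fromActivations() decode as 350 log-spaced voiced F0 candidates plus one unvoiced class (the last bin). A quick sketch of the resulting frequency grid and its resolution:

```python
import numpy as np

minf0, maxf0, Nbins = 50, 500, 351
f0vec = np.exp(np.linspace(np.log(minf0), np.log(maxf0), Nbins - 1))  # voiced bins, as in onehot()

step = f0vec[1] / f0vec[0]   # constant ratio between neighbouring bins
cents = 1200 * np.log2(step)
print('%.4f ratio, %.1f cents' % (step, cents))  # about 1.0066, i.e. roughly 11 cents per bin
```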
-------------------------------------------------------------------------------- /test_wavs.scp: -------------------------------------------------------------------------------- 1 | wavs/arctic_a0001.wav 2 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | __author__ = "Manu Airaksinen, manu.airaksinen@aalto.fi" 2 | 3 | import os 4 | #os.environ['CUDA_VISIBLE_DEVICES'] = '' # Uncomment to force CPU 5 | 6 | import numpy as np 7 | import tensorflow as tf 8 | import scipy.io.wavfile as wavfile 9 | 10 | import model_fundf as model 11 | import augmentation 12 | import sp_module as sp 13 | 14 | _FLOATX = tf.float32 15 | 16 | def train_model(train_wav_list,train_f0_list,test_wav_list,test_f0_list,model_name='test'): 17 | tf.reset_default_graph() # debugging, clear all tf variables 18 | 19 | # Data dimensions 20 | winlen = 512 # samples 21 | hop = 160 # samples 22 | input_dim = winlen 23 | output_dim = 351 24 | f0_max = 500 25 | f0_min = 50 26 | 27 | downsample_f0 = False # If target f0 is computed with 5ms hop size (and network uses 10ms), set True 28 | 29 | # network config 30 | dilations=[1, 2, 4, 8, 1, 2, 4, 8] 31 | filter_width = 5 32 | residual_channels = 128 33 | postnet_channels = 256 34 | 35 | f0_model = model.CNET(name='f0model', input_channels=input_dim, 36 | output_channels=output_dim, dilations=dilations, filter_width=filter_width, 37 | residual_channels=residual_channels, postnet_channels=postnet_channels) 38 | 39 | # data placeholders of shape (batch_size, timesteps, feature_dim) 40 | input_var = tf.placeholder(shape=(None, None, winlen), dtype=_FLOATX) 41 | output_var = tf.placeholder(shape=(None, None, output_dim), dtype=_FLOATX) 42 | training_var = tf.placeholder(dtype=tf.bool) 43 | 44 | # model input is the framed (noise-augmented) signal 45 | f0_activations = f0_model.forward_pass(input_var, training=training_var) # pass the training flag so input dropout is active during training 46 | 47 | loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=output_var,logits=f0_activations)) 48 | 49 | theta = f0_model.get_variable_list() 50 | optim = tf.train.AdamOptimizer(learning_rate=1e-4,beta1=0.9,beta2=0.999).minimize(loss, var_list=theta) 51 | 52 | # Supply a text file containing paths to augmentation noise samples (see noiselist.scp) 53 | # If noise_wav_scp is None, white noise is used for additive noise augmentation 54 | noise_samples = augmentation.load_noise_samples(noise_wav_scp=None) 55 | 56 | with tf.Session() as sess: 57 | 58 | num_epochs = 100 59 | 60 | init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer()) 61 | sess.run(init_op) 62 | 63 | saver = tf.train.Saver(max_to_keep=0) 64 | epoch_test_error_prev = 1e10 65 | 66 | epoch_loss = np.zeros((num_epochs,2),dtype=np.float32) 67 | 68 | for epoch in range(num_epochs): 69 | 70 | print("Training epoch {}".format(epoch)) 71 | epoch_error = 0.0 72 | epoch_test_error = 0.0 73 | file_ind = 0 74 | wav_inds = np.random.permutation(len(train_wav_list)) 75 | 76 | for i in wav_inds: 77 | wfile = train_wav_list[i] 78 | f0file = train_f0_list[i] 79 | 80 | _, y = wavfile.read(wfile) 81 | y = np.float32(y/(2**15)) 82 | 83 | y_noise = augmentation.add_noise_file(y,noise_samples) 84 | 85 | f0 = np.fromfile(f0file,dtype=np.float32) 86 | if downsample_f0: 87 | f0 = f0[0::2] # Downsample F0 vector if computed with 5 ms hop size 88 | 89 | f0_onehot = sp.onehot(f0,minf0=f0_min,maxf0=f0_max,Nbins=output_dim) # Convert F0 vector to log-spaced onehot representation 90 | input_frames = np.reshape(sp.get_frames(y_noise,winlen,hop),[1,-1,winlen]) 91 | _, loss_np = sess.run([optim, loss], feed_dict={input_var: input_frames, 92 | output_var: f0_onehot, training_var: True}) 93 | epoch_error += loss_np / np.float32(len(train_wav_list)) 94 | 95 | file_ind += 1 96 | 97 | print("Error for epoch %d: %f" % (epoch, epoch_error)) 98 | saver.save(sess,"./saved_models/" + model_name + ".ckpt") 99 | 100 | # Validation set: 101 | 102 | for i in range(len(test_wav_list)): 103 | wfile = test_wav_list[i] 104 | f0file = test_f0_list[i] 105 | 106 | _, y = wavfile.read(wfile) 107 | y = np.float32(y/(2**15)) 108 | 109 | f0 = np.fromfile(f0file,dtype=np.float32) 110 | if downsample_f0: 111 | f0 = f0[0::2] 112 | 113 | f0_onehot = sp.onehot(f0,minf0=f0_min,maxf0=f0_max,Nbins=output_dim) 114 | input_frames = np.reshape(sp.get_frames(y,winlen,hop),[1,-1,winlen]) 115 | 116 | loss_np = sess.run([loss], feed_dict={input_var: input_frames, 117 | output_var: f0_onehot, training_var: False}) 118 | epoch_test_error += loss_np[0] / np.float32(len(test_wav_list)) 119 | 120 | print("Test Error for epoch %d: %f" % (epoch, epoch_test_error)) 121 | 122 | epoch_loss[epoch, 0] = epoch_error 123 | epoch_loss[epoch, 1] = epoch_test_error 124 | epoch_loss.tofile('./saved_models/' + model_name + '_loss.dat') 125 | 126 | if epoch_test_error < epoch_test_error_prev: 127 | saver.save(sess,"./saved_models/" + model_name + "_best.ckpt") 128 | epoch_test_error_prev = epoch_test_error 129 | 130 | 131 | 132 | if __name__ == "__main__": 133 | wav_scp = 'train_wavs.scp' 134 | with open(wav_scp) as wavlist: 135 | wavs = wavlist.read().splitlines() 136 | 137 | f0_scp = 'train_f0s.scp' 138 | with open(f0_scp) as f0list: 139 | f0s = f0list.read().splitlines() 140 | 141 | # Split into train and test files 142 | Ntest = np.int32(np.round(0.1*len(wavs))) 143 | wav_inds = np.random.RandomState(seed=42).permutation(len(wavs)) 144 | wavs = np.asarray(wavs)[wav_inds] 145 | f0s = np.asarray(f0s)[wav_inds] 146 | train_wavs = wavs[Ntest:] 147 | train_f0s = f0s[Ntest:] 148 | test_wavs = wavs[:Ntest] 149 | test_f0s = f0s[:Ntest] 150 | 151 | train_model(train_wavs, train_f0s, test_wavs, test_f0s, model_name='fundf-model') 152 | 153 | 154 | 
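The per-epoch losses saved by the loop above can be inspected after training completes; the `_loss.dat` file holds one float32 (train, validation) pair per epoch. A minimal sketch, assuming the model name used in this script:

```python
import numpy as np

# Loss curve written by train.py: one float32 (train, validation) pair per epoch
loss = np.fromfile('saved_models/fundf-model_loss.dat', dtype=np.float32).reshape(-1, 2)

best_epoch = int(np.argmin(loss[:, 1]))  # epoch saved as the "_best" checkpoint
print('best validation loss %.4f at epoch %d' % (loss[best_epoch, 1], best_epoch))
```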
-------------------------------------------------------------------------------- /train_f0s.scp: -------------------------------------------------------------------------------- 1 | traindata/f0/filename1.f0 2 | traindata/f0/filename2.f0 -------------------------------------------------------------------------------- /train_wavs.scp: -------------------------------------------------------------------------------- 1 | traindata/wav/filename1.wav 2 | traindata/wav/filename2.wav -------------------------------------------------------------------------------- /wavs/arctic_a0001.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mairaksi/PiENet/01ab5122d693f55fe9b49025a4e41f3e0eeaa4b2/wavs/arctic_a0001.wav --------------------------------------------------------------------------------