├── .gitignore ├── README.md ├── binary ├── config.py ├── cough.txt ├── detector.py ├── logs │ └── fit │ │ └── 20200505-215154 │ │ ├── train │ │ ├── events.out.tfevents.1588708314.master.213418.566.v2 │ │ ├── events.out.tfevents.1588708315.master.profile-empty │ │ └── plugins │ │ │ └── profile │ │ │ └── 2020_05_05_21_51_55 │ │ │ ├── master.input_pipeline.pb │ │ │ ├── master.kernel_stats.pb │ │ │ ├── master.overview_page.pb │ │ │ ├── master.tensorflow_stats.pb │ │ │ └── master.trace.json.gz │ │ └── validation │ │ └── events.out.tfevents.1588708316.master.213418.2305.v2 ├── model.py ├── not.txt ├── processing.py ├── resize.py └── sweep.yaml ├── covid ├── model.py └── processing.ipynb ├── short-spectro ├── detector.py ├── model.py └── processing.py └── spectro ├── detector.py ├── model.py ├── processing.ipynb ├── processing.py └── sweep.yaml /.gitignore: -------------------------------------------------------------------------------- 1 | dry_wet/dataset.json 2 | dry_wet/edited_wavs 3 | binary/data/ 4 | binary/dataset.json 5 | binary/dataset_8.json 6 | binary/wav 7 | binary/.idea 8 | binary/.idea/ 9 | binary/model.h5 10 | binary/model.json 11 | binary/test.wav 12 | binary/__pycache__/ 13 | 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CoughCNN 2 | 3 | ## Covid Anomaly detection model 4 | 5 | Because there are not enough covid cough samples, and the precise features that mark a cough as covid are not known, 6 | we cannot use a simple supervised learning approach. Instead, the current approach is unsupervised (it will 7 | become semi-supervised later, using the available data to refine the model) and uses an autoencoder. 
The autoencoder takes all 8 | the non-covid cough samples detected by the model in `spectro` and uses them as its training dataset. It 9 | learns the representation of a "normal" cough and how to recreate it (it compresses the sample with convolutions and then rebuilds 10 | it with transposed convolutions). When a cough is not similar to the ones in the dataset 11 | (which, with enough data, should single out the covid coughs), the error the model makes when recreating the sample 12 | (calculated as the MSE between the original image and the one created by the model) will be high, and the sample will be 13 | labeled as an anomaly. 14 | 15 | ### Current 16 | 17 | Under the folder `covid` there are two files: 18 | **model.py** contains the code for the autoencoder and the MSE applied to the melspectrogram. Unfortunately this still does not work: 19 | I'm tweaking the input shape of the samples by increasing the sample length and the number of mels per sample, but I'm still getting 20 | problems. 21 | **processing.ipynb** contains the data processing to create the dataset used. 22 | 23 | **short-spectro** is a clone of the `spectro` folder, but it contains the scripts that make the data suitable for the anomaly detection model. 24 | Once a stable solution for the anomaly detection is finished, all the samples will be in the same format and usable in any model. 25 | For now I'll be keeping both separate. 
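As a rough sketch of the reconstruction-error check described in this section: score each sample by the MSE between the original and its reconstruction, then flag anything above a threshold learned from normal coughs. The function names, the "mean plus `k` standard deviations" rule, and the default `k` are assumptions made for illustration; they are not taken from `covid/model.py`. Plain Python is used so the snippet runs without dependencies; on NumPy arrays the same MSE is `np.mean((a - b) ** 2)`.

```python
import statistics


def mse(original, reconstructed):
    """Mean squared error between two equally sized flat sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: mean + k * sample standard deviation of the errors on normal coughs."""
    return statistics.mean(normal_errors) + k * statistics.stdev(normal_errors)


def is_anomaly(error, threshold):
    """A sample is flagged when its reconstruction error exceeds the threshold."""
    return error > threshold
```

With this rule, a cough whose error is above `fit_threshold(errors_on_normal_coughs)` would be labeled as an anomaly; a suitable `k` would have to be tuned on real data.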
26 | 27 | Here is all the data processed by `short-spectro`: https://drive.google.com/file/d/1aAMGHTFJjiv7K6DrN_1DptcCeA3Efmbk/view?usp=sharing 28 | 29 | ## Log Melspectrogram 30 | 31 | Under the folder `spectro` you can find the whole approach, which uses melspectrograms to train a Convolutional Neural Network. 32 | 33 | ### Current 34 | 35 | Right now the training data comes from a cough dataset (https://www.karger.com/Article/FullText/504666) 36 | and from manually labeled data from our webapp. The testing dataset is also manually labeled data 37 | from our webapp, but data that the CNN has not seen before. 38 | 39 | The dataset was augmented by mixing the cough samples with background noise to make them 40 | closer to real-world recordings. A mixing ratio of 0.25 was used (0.25 of the noise signal added), with noise from the MUSAN dataset: 41 | https://www.openslr.org/17/ 42 | 43 | The samples are now grayscale melspectrograms of 0.5s. 44 | 45 | All the data of the project can be found at: https://drive.google.com/drive/folders/1deqYCDye5l95RGJCeKXlcqH9Ras7lRQr?usp=sharing 46 | 47 | Right now we have normalized the data and we use a new model structure with GlobalAveragePooling2D, which gave a large performance 48 | boost over the previous version. We went from 68-75% to 80-84% accuracy on our test set created from the webapp. 
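The noise augmentation mentioned earlier in this section amounts to adding a scaled copy of the noise to the cough. A minimal sketch, assuming both signals are sequences of float samples at the same sample rate; the function name and the choice to loop short noise clips are illustrative assumptions, not the project's actual processing code:

```python
from itertools import cycle, islice


def mix_with_noise(cough, noise, ratio=0.25):
    """Return cough + ratio * noise, looping the noise if it is shorter than the cough."""
    if not noise:
        raise ValueError("noise must not be empty")
    # Repeat the noise clip as needed and cut it to the length of the cough.
    looped = islice(cycle(noise), len(cough))
    return [c + ratio * n for c, n in zip(cough, looped)]
```

On NumPy arrays of equal length the same arithmetic is simply `cough + ratio * noise`; clipping or renormalizing the result is left out of this sketch.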
49 | Right now we are running hyperparameter optimization, and you can check out everything at: 50 | https://app.wandb.ai/mastersplinter/CoughDetect/sweeps/phdtst8z 51 | 52 | ### Next 53 | 54 | - Data processing hyperparameters 55 | - Testing different sample lengths and optimizing them 56 | 57 | ### Resources that inspired this: 58 | 59 | - https://arxiv.org/pdf/2004.01275.pdf (for initial model architecture) 60 | - https://www.mi.t.u-tokyo.ac.jp/assets/publication/LEARNING_ENVIRONMENTAL_SOUNDS_WITH_END-TO-END_CONVOLUTIONAL_NEURAL_NETWORK.pdf 61 | - https://www.cs.tut.fi/~tuomasv/papers/ijcnn_paper_valenti_extended.pdf 62 | - https://adventuresinmachinelearning.com/global-average-pooling-convolutional-neural-networks/ 63 | - https://arxiv.org/pdf/1809.04437.pdf 64 | - https://arxiv.org/pdf/1711.10282.pdf 65 | -------------------------------------------------------------------------------- /binary/config.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | DATA_FOLDER = "edited_wavs/" 4 | OUTPUT = "dataset.json" 5 | LABELS = ["cough", "not"] 6 | SR = 16000 7 | N_MFCC = 15 8 | N_FURIER = 2048 9 | HOP_LENGTH = 512 10 | SEGMENT_LENGTH = SR//2 11 | EXPECTED_MFCC = math.ceil(SEGMENT_LENGTH/HOP_LENGTH) -------------------------------------------------------------------------------- /binary/cough.txt: -------------------------------------------------------------------------------- 1 | https://www.youtube.com/watch?v=jxYNLCYTwZQ 2 | https://www.youtube.com/watch?v=qfpJg179YNk 3 | https://www.youtube.com/watch?v=d2wkdrScerU 4 | https://www.youtube.com/watch?v=ct3tHDfNKiQ 5 | https://www.youtube.com/watch?v=Dc_aoUCqw2E 6 | https://www.youtube.com/watch?v=tvHytTlGs0M 7 | https://www.youtube.com/watch?v=0QQxKN-KC1U 8 | https://www.youtube.com/watch?v=De4HdyocTHY 9 | https://www.youtube.com/watch?v=DYfjPnty2Ho 10 | https://www.youtube.com/watch?v=q6WsoL3J8U8 11 | https://www.youtube.com/watch?v=CTSLdNxN1cc 12 | 
https://www.youtube.com/watch?v=T2MtUVpdAxg 13 | https://www.youtube.com/watch?v=tfc5cXiXMDc 14 | https://www.youtube.com/watch?v=rkF_uMizqoc 15 | https://www.youtube.com/watch?v=5905FxXz9dI 16 | https://www.youtube.com/watch?v=IzPMbIll3LE 17 | https://www.youtube.com/watch?v=h2FLCKMcEX0 18 | https://www.youtube.com/watch?v=2Mw-s5jnqXU 19 | https://www.youtube.com/watch?v=diuuEXKzNB8 20 | https://www.youtube.com/watch?v=dg-I9j76-t8 21 | https://www.youtube.com/watch?v=TK4CveeCWfY 22 | https://www.youtube.com/watch?v=4k0ziD0j5BI 23 | https://www.youtube.com/watch?v=CsDXlt7Ei1c 24 | https://www.youtube.com/watch?v=7Ez5Wc_esBg 25 | https://www.youtube.com/watch?v=NfKZNt25L-Q 26 | https://www.youtube.com/watch?v=NaOVmYoIjbs 27 | https://www.youtube.com/watch?v=XrpB4DTNQZw 28 | https://www.youtube.com/watch?v=h-GtQfDCoaE 29 | https://www.youtube.com/watch?v=u2KMBD5-oCg 30 | https://www.youtube.com/watch?v=A5s2ZgwQ1VM 31 | https://www.youtube.com/watch?v=ekqLlw-Xe68 32 | https://www.youtube.com/watch?v=6LK6yHtIung 33 | https://www.youtube.com/watch?v=tZtJaS2ZtME 34 | https://www.youtube.com/watch?v=AQOeIVbhFm4 35 | https://www.youtube.com/watch?v=elAtjXsj8Jg 36 | https://www.soundsnap.com/node/90850 37 | https://www.soundsnap.com/node/27240 38 | https://www.soundsnap.com/node/28608 39 | https://www.soundsnap.com/node/27471 40 | https://www.soundsnap.com/node/26976 41 | https://www.youtube.com/watch?v=9RjZr8V8PNY 42 | https://www.youtube.com/watch?v=6mcpuDVN6lQ 43 | https://www.youtube.com/watch?v=zjd4HrJbc8o 44 | https://www.youtube.com/watch?v=iYxUHA-Pwsk 45 | https://www.youtube.com/watch?v=1UDFq2InljM 46 | https://www.youtube.com/watch?v=LkxvBb2VXbs 47 | -------------------------------------------------------------------------------- /binary/detector.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import librosa 4 | import argparse 5 | import numpy as np 6 | from config import * 7 | import 
tensorflow.keras as keras 8 | 9 | parser = argparse.ArgumentParser(description="Classify where the cough is in a file and split it up") 10 | parser.add_argument("file", metavar="f", type=str, help="Path to the audio file, must be wav") 11 | args = parser.parse_args() 12 | 13 | 14 | json_file = open('model.json', 'r') 15 | model = json_file.read() 16 | json_file.close() 17 | model = keras.models.model_from_json(model) 18 | model.load_weights("model.h5") 19 | 20 | os.system("clear") 21 | 22 | print("Loaded model from disk") 23 | 24 | try: 25 | os.system("mkdir detected/") 26 | except: 27 | pass 28 | 29 | mfccs = [] 30 | signal, sr = librosa.load(args.file, sr=SR) 31 | # Decide the segments based on length 32 | segments = len(signal) // SEGMENT_LENGTH 33 | pieces = [] 34 | curr = 0 # For segment indexing 35 | for segment in range(segments): 36 | p = signal[curr:curr + SEGMENT_LENGTH] 37 | # Extract mfcc data 38 | mfcc = librosa.feature.mfcc(p, sr=SR, n_mfcc=N_MFCC, n_fft=N_FURIER, 39 | hop_length=HOP_LENGTH).T 40 | if len(mfcc) == EXPECTED_MFCC: 41 | mfccs.append(mfcc.tolist()) 42 | pieces.append(p) 43 | 44 | curr += SEGMENT_LENGTH 45 | 46 | predictions = [] 47 | for x in mfccs: 48 | x = np.array(x) 49 | x = x[np.newaxis, ...] 
50 | x = np.expand_dims(x, axis=3) 51 | pred = np.argmax(model.predict(x), axis=1) 52 | predictions.append(pred) 53 | 54 | for i in range(len(predictions)): 55 | if predictions[i][0] == 0: 56 | librosa.output.write_wav("detected/"+str(i)+".wav", pieces[i], sr=SR) 57 | 58 | end = '\033[0m' 59 | green = '\033[92m' 60 | length = 50 61 | each = "|" * (length // len(predictions)) 62 | output = "" 63 | for p in predictions: 64 | if p[0] == 0: 65 | output += green + each + end 66 | else: 67 | output += each 68 | 69 | print(output) -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/train/events.out.tfevents.1588708314.master.213418.566.v2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/events.out.tfevents.1588708314.master.213418.566.v2 -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/train/events.out.tfevents.1588708315.master.profile-empty: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/events.out.tfevents.1588708315.master.profile-empty -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.input_pipeline.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.input_pipeline.pb -------------------------------------------------------------------------------- 
/binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.kernel_stats.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.kernel_stats.pb -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.overview_page.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.overview_page.pb -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.tensorflow_stats.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.tensorflow_stats.pb -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.trace.json.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/train/plugins/profile/2020_05_05_21_51_55/master.trace.json.gz -------------------------------------------------------------------------------- /binary/logs/fit/20200505-215154/validation/events.out.tfevents.1588708316.master.213418.2305.v2: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Splinter0/CoughCNN/71005bf7056f657ea30cb676d50fff999baa1f11/binary/logs/fit/20200505-215154/validation/events.out.tfevents.1588708316.master.213418.2305.v2 -------------------------------------------------------------------------------- /binary/model.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import os 4 | import json 5 | import datetime 6 | import numpy as np 7 | from config import * 8 | import tensorflow as tf 9 | import tensorflow.keras as keras 10 | from pydub import AudioSegment 11 | import librosa, librosa.display 12 | from pydub.silence import split_on_silence 13 | from sklearn.model_selection import train_test_split 14 | 15 | os.system("clear") 16 | 17 | DATA = "dataset.json" 18 | 19 | def load_data(path): 20 | with open(path, "r") as j: 21 | data = json.load(j) 22 | 23 | return np.array(data["mfcc"]), np.array(data["labels"]) 24 | 25 | hyper = dict( 26 | channel_one = 60, 27 | kernel_one = (3, 3), 28 | activation = 'relu', 29 | pool_one = (3, 3), 30 | strides_one = (2, 2), 31 | padding = 'same', 32 | 33 | channel_two = 16, 34 | kernel_two = (2, 2), 35 | pool_two = (2, 2), 36 | strides_two = (2, 2), 37 | 38 | channel_three = 28, 39 | kernel_three = (2, 2), 40 | pool_three = (2, 2), 41 | strides_three = (2, 2), 42 | 43 | deep_one = 497, 44 | drop_one = 0.5998, 45 | deep_two = 80, 46 | drop_two = 0.3656 47 | ) 48 | 49 | import wandb 50 | from wandb.keras import WandbCallback 51 | wandb.init(config=hyper, project="CoughDetection", name="MainModel") 52 | 53 | def model(input_shape): 54 | m = keras.Sequential() 55 | 56 | # 1st conv layer 57 | m.add(keras.layers.Conv2D(hyper["channel_one"], hyper["kernel_one"], activation=hyper["activation"], input_shape=input_shape)) 58 | m.add(keras.layers.MaxPooling2D(hyper["pool_one"], strides=hyper["strides_one"], padding='same')) 59 | m.add(keras.layers.BatchNormalization()) 60 | 61 | 
m.add(keras.layers.Conv2D(hyper["channel_two"], hyper["kernel_two"], activation=hyper["activation"])) 62 | m.add(keras.layers.MaxPooling2D(hyper["pool_two"], strides=hyper["strides_two"], padding='same')) 63 | m.add(keras.layers.BatchNormalization()) 64 | 65 | m.add(keras.layers.Conv2D(hyper["channel_three"], hyper["kernel_three"], activation=hyper["activation"])) 66 | m.add(keras.layers.MaxPooling2D(hyper["pool_three"], strides=hyper["strides_three"], padding='same')) 67 | m.add(keras.layers.BatchNormalization()) 68 | 69 | m.add(keras.layers.Flatten()) 70 | m.add(keras.layers.Dense(hyper["deep_one"], activation='relu')) 71 | m.add(keras.layers.Dropout(hyper["drop_one"])) 72 | m.add(keras.layers.Dense(hyper["deep_two"], activation='relu')) 73 | m.add(keras.layers.Dropout(hyper["drop_two"])) 74 | 75 | m.add(keras.layers.Dense(2, activation='softmax')) 76 | 77 | return m 78 | 79 | def predict(mod, path): 80 | mfccs = [] 81 | signal, sr = librosa.load(path, sr=SR) 82 | # Decide the segments based on length 83 | segments = len(signal) // SEGMENT_LENGTH 84 | curr = 0 # For segment indexing 85 | for segment in range(segments): 86 | # Extract mfcc data 87 | mfcc = librosa.feature.mfcc(signal[curr:curr + SEGMENT_LENGTH], sr=SR, n_mfcc=N_MFCC, n_fft=N_FURIER, 88 | hop_length=HOP_LENGTH).T 89 | if len(mfcc) == EXPECTED_MFCC: 90 | mfccs.append(mfcc.tolist()) 91 | 92 | curr += SEGMENT_LENGTH 93 | 94 | predictions = [] 95 | for x in mfccs: 96 | x = np.array(x) 97 | x = x[np.newaxis, ...] 98 | x = np.expand_dims(x, axis=3) 99 | pred = np.argmax(mod.predict(x), axis=1) 100 | predictions.append(pred) 101 | 102 | return predictions 103 | 104 | # Pretty Print Prediction... 
yes 105 | def ppp(prediction): 106 | end = '\033[0m' 107 | green = '\033[92m' 108 | length = 50 109 | each = "|"*(length//len(prediction)) 110 | output = "" 111 | for p in prediction: 112 | if p[0] == 0: 113 | output += green+each+end 114 | else: 115 | output += each 116 | 117 | print(output) 118 | 119 | if __name__ == "__main__": 120 | TRAIN = True 121 | 122 | if TRAIN: 123 | x, y = load_data(DATA) 124 | x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25) 125 | x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.3) 126 | 127 | x_train = x_train[..., np.newaxis] 128 | x_test = x_test[..., np.newaxis] 129 | x_val = x_val[..., np.newaxis] 130 | 131 | m = model((x_train.shape[1], x_train.shape[2], 1)) 132 | opt = keras.optimizers.Adam(learning_rate=0.000001) 133 | 134 | m.compile(optimizer=opt, 135 | loss="sparse_categorical_crossentropy", 136 | metrics=['accuracy']) 137 | 138 | log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") 139 | callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1) 140 | 141 | m.summary() 142 | m.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=32, epochs=120, callbacks=[WandbCallback()]) 143 | 144 | test_err, test_acc = m.evaluate(x_test, y_test, verbose=1) 145 | print("Accuracy on testing data", test_acc) 146 | 147 | model_json = m.to_json() 148 | with open("model.json", "w") as json_file: 149 | json_file.write(model_json) 150 | print("Saved model to disk") 151 | m.save_weights("model.h5") 152 | 153 | else: 154 | json_file = open('model.json', 'r') 155 | loaded_model_json = json_file.read() 156 | json_file.close() 157 | loaded_model = keras.models.model_from_json(loaded_model_json) 158 | # load weights into new model 159 | loaded_model.load_weights("model.h5") 160 | print("Loaded model from disk") 161 | 162 | t = predict(loaded_model, "test2.wav") 163 | ppp(t) 164 | t = predict(loaded_model, "test.wav") 165 | ppp(t) 166 | 167 | 
-------------------------------------------------------------------------------- /binary/not.txt: -------------------------------------------------------------------------------- 1 | https://www.youtube.com/watch?v=D7ZZp8XuUTE 2 | https://www.youtube.com/watch?v=DQqETh7E0LM 3 | https://www.youtube.com/watch?v=jAg6tyC9Xxc 4 | https://www.youtube.com/watch?v=LyvpfemDyYw 5 | -------------------------------------------------------------------------------- /binary/processing.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | import os 4 | import math 5 | import json 6 | import copy 7 | import youtube_dl 8 | import numpy as np 9 | from config import * 10 | from tqdm import tqdm 11 | from pydub import AudioSegment 12 | import librosa, librosa.display 13 | from pydub.silence import split_on_silence 14 | 15 | """ 16 | This script downloads all the YouTube links in the files named after the labels, then: 17 | - Converts them into WAV 18 | - Splits them into non-silent chunks 19 | - Splits the chunks into segments (the actual size of the training data, so that all samples have the same length) 20 | - Extracts the MFCCs from the segments 21 | - Saves the data in a JSON file 22 | 23 | (I split the data fetching from the conversion to MFCCs because doing it all together used too much memory) 24 | 25 | Dataset from: https://www.karger.com/Article/FullText/504666 26 | """ 27 | 28 | os.system("clear") 29 | 30 | pbar = None 31 | queue = None 32 | 33 | youtube_par = { 34 | 'format': 'bestaudio/best', 35 | 'postprocessors': [{ 36 | 'key': 'FFmpegExtractAudio', 37 | 'preferredcodec': 'mp3', 38 | 'preferredquality': '192', 39 | }], 40 | 'quiet':True 41 | } 42 | 43 | data = { 44 | "mfcc":[], 45 | "labels":[] 46 | } 47 | 48 | 49 | def single(link, out, silence): 50 | conf = copy.deepcopy(youtube_par) 51 | conf["outtmpl"] = out+"%(title)s.mp3" 52 | try: 53 | f = "" 54 | # Download the video 55 | with 
youtube_dl.YoutubeDL(conf) as you: 56 | info = you.extract_info(link, download=True) 57 | f = out+""+info.get("title", None) 58 | pbar.update(0.25) 59 | # Convert it into WAV, needed because youtube_dl corrupts wav files on download for some reason 60 | os.system("ffmpeg -i '" + f + ".mp3' '" + f + ".wav' >/dev/null 2>&1") 61 | os.system("rm '" + f + ".mp3'") 62 | # This doesn't run for non-cough data because we want it to be noisy 63 | if silence: 64 | # Load whole wav file 65 | signal = AudioSegment.from_wav(f +".wav") 66 | # Split the wav file into non-silent chunks 67 | chunks = split_on_silence(signal, min_silence_len=500, silence_thresh=-35) 68 | pbar.update(0.25) 69 | # Export all the chunks 70 | for i, c in enumerate(chunks): 71 | c.export(f + str(i) + ".wav", format="wav") 72 | os.system("rm '"+f+".wav'") 73 | else: 74 | pbar.update(0.25) 75 | 76 | except Exception as e: 77 | pass 78 | 79 | pbar.update(0.5) 80 | 81 | 82 | def process(folder, label): 83 | items = os.listdir(folder) 84 | if label == 1: 85 | # To make sure we get the same amount of data for both classes 86 | each = len(data["labels"])//len(items) 87 | print(each) 88 | for sample in tqdm(items): 89 | if os.path.splitext(sample)[-1] != ".wav": 90 | continue 91 | # Load the chunk into librosa 92 | signal, sr = librosa.load(folder+sample, sr=SR) 93 | # Decide the segments based on length 94 | segments = len(signal) // SEGMENT_LENGTH 95 | curr = 0 # For segment indexing 96 | # print(sample, segments) 97 | count = 0 98 | for segment in range(segments): 99 | pieces = [signal[curr:curr + SEGMENT_LENGTH]] 100 | if label == 1 and count >= each: 101 | break 102 | elif label == 0: 103 | #pieces += augment(signal[curr:curr + SEGMENT_LENGTH]) 104 | pass 105 | 106 | # Extract mfcc data 107 | for p in pieces: 108 | mfcc = librosa.feature.mfcc(p, sr=SR, n_mfcc=N_MFCC, n_fft=N_FURIER, 109 | hop_length=HOP_LENGTH).T 110 | if len(mfcc) == EXPECTED_MFCC: 111 | data["mfcc"].append(mfcc.tolist()) 112 | 
data["labels"].append(label) 113 | count += 1 114 | 115 | curr += SEGMENT_LENGTH 116 | 117 | def augment(signal): 118 | different = [] 119 | # Add white noise 120 | noise = np.random.randn(len(signal)) 121 | different.append(signal + 0.0025*noise) 122 | # Shift sound 123 | different.append(np.roll(signal, SR)) 124 | 125 | return different 126 | 127 | 128 | if __name__ == "__main__": 129 | FETCH = False 130 | 131 | try: 132 | os.system("mkdir data >/dev/null 2>&1") 133 | except: 134 | pass 135 | 136 | for i, label in enumerate(LABELS): 137 | try: 138 | os.system("mkdir data/"+label+" >/dev/null 2>&1") 139 | except: 140 | pass 141 | 142 | if FETCH: 143 | print("Processing label: "+label) 144 | with open(label+".txt", "r") as f: 145 | links = f.read().splitlines() 146 | 147 | pbar = tqdm(total=len(links)) 148 | for l in links: 149 | if "youtube" in l: 150 | single(l, "data/"+label+"/", i == 0) 151 | else: 152 | process("data/"+label+"/", i) 153 | 154 | if not FETCH: 155 | # Dump the data into a JSON file 156 | print(len(data["labels"])) 157 | with open(OUTPUT, "w") as j: 158 | json.dump(data, j, indent=4) -------------------------------------------------------------------------------- /binary/resize.py: -------------------------------------------------------------------------------- 1 | import os 2 | from tqdm import tqdm 3 | from pydub import AudioSegment 4 | 5 | """ 6 | This is a helper script to chop up large audio files 7 | """ 8 | 9 | MAX=2*60*1000 # two minutes in milliseconds (pydub slices by ms) 10 | FOLDER = "data/not/" 11 | SKIP = 6 12 | N_PER = 5 13 | 14 | for sample in tqdm(os.listdir(FOLDER)): 15 | n = os.path.splitext(sample) 16 | if n[-1] != ".wav": 17 | continue 18 | 19 | signal = AudioSegment.from_wav(FOLDER+sample) 20 | curr = 0 21 | for i in range(N_PER): 22 | if (i+1) % SKIP == 0: # parentheses needed: % binds tighter than + 23 | continue 24 | signal[curr:curr+MAX].export(FOLDER+n[0]+str(i)+".wav", format="wav") 25 | curr += MAX 26 | 27 | os.system("rm '"+FOLDER+sample+"'") 
-------------------------------------------------------------------------------- /binary/sweep.yaml: -------------------------------------------------------------------------------- 1 | program: model.py 2 | method: random 3 | command: 4 | - ${env} 5 | - python3 6 | - ${program} 7 | - ${args} 8 | metric: 9 | name: loss 10 | goal: minimize 11 | parameters: 12 | padding: 13 | distribution: categorical 14 | values: 15 | - same 16 | deep_one: 17 | distribution: int_uniform 18 | min: 128 19 | max: 512 20 | deep_two: 21 | distribution: int_uniform 22 | min: 32 23 | max: 128 24 | drop_one: 25 | distribution: uniform 26 | min: 0.15 27 | max: 0.6 28 | drop_two: 29 | distribution: uniform 30 | min: 0.15 31 | max: 0.6 32 | activation: 33 | distribution: categorical 34 | values: 35 | - relu 36 | channel_one: 37 | distribution: int_uniform 38 | min: 16 39 | max: 64 40 | channel_two: 41 | distribution: int_uniform 42 | min: 16 43 | max: 64 44 | channel_three: 45 | distribution: int_uniform 46 | min: 16 47 | max: 64 48 | -------------------------------------------------------------------------------- /covid/model.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import wandb 4 | import datetime 5 | import numpy as np 6 | import matplotlib.pyplot as plt 7 | from tensorflow.keras.models import Model 8 | from tensorflow.keras import backend as K 9 | from tensorflow.keras.optimizers import Adam 10 | from tensorflow.keras.regularizers import l2 11 | from tensorflow.keras.layers import BatchNormalization, Conv2D, Conv2DTranspose, LeakyReLU, Activation, Flatten, Dense, Reshape, Input 12 | import pandas as pd # needed by AutoEncoder.getNormal 13 | DATASET = "dataset.npy" 14 | 15 | def load_data(f): 16 | data = np.load(f, allow_pickle=True) 17 | data = np.array(data) 18 | train = [] 19 | validation = [] 20 | 21 | for sample in data: 22 | if sample[0] == 0: 23 | validation.append(sample[1]) 24 | else: 25 | train.append(sample[1]) 26 | 27 | start = 
len(train)-int(len(train)*0.3) 28 | validation += train[start:] 29 | train = train[:start] 30 | train = np.array(train) 31 | validation = np.array(validation) 32 | np.random.shuffle(validation) 33 | 34 | shape = (train.shape[1], train.shape[2], 1) 35 | 36 | train = train.reshape(train.shape[0], shape[0], shape[1], shape[2]) 37 | validation = validation.reshape(validation.shape[0], shape[0], shape[1], shape[2]) 38 | return train, validation, shape 39 | 40 | def MSE(x, y): 41 | err = np.sum((x.astype("float") - y.astype("float")) ** 2) 42 | err /= float(x.size) # divide by the number of elements, not by the arrays themselves 43 | return err 44 | 45 | def visualize_predictions(decoded, gt, samples=10): 46 | outputs = None 47 | for i in range(0, samples): 48 | original = (gt[i] * 255).astype("uint8") 49 | recon = (decoded[i] * 255).astype("uint8") 50 | output = np.hstack([original, recon]) 51 | if outputs is None: 52 | outputs = output 53 | else: 54 | outputs = np.vstack([outputs, output]) 55 | return outputs 56 | 57 | class AutoEncoder(object): 58 | def __init__(self, name, config, hyper=False, hyper_project="", extra=None): 59 | self.name = name 60 | self.config = config 61 | 62 | if hyper: 63 | wandb.init(config=config, project=hyper_project) 64 | wandb.run.save() 65 | try: 66 | os.system("mkdir sweep/"+wandb.run.name) 67 | except: 68 | pass 69 | 70 | def build(self, input_shape): 71 | input_layer = Input(shape=input_shape) 72 | x = input_layer 73 | x = Conv2D(self.config["conv1"], self.config["kernel1"], strides=2, padding="same")(x) 74 | x = LeakyReLU(alpha=self.config["alpha"])(x) 75 | x = BatchNormalization()(x) 76 | 77 | x = Conv2D(self.config["conv2"], self.config["kernel2"], strides=2, padding="same")(x) 78 | x = LeakyReLU(alpha=self.config["alpha"])(x) 79 | x = BatchNormalization()(x) 80 | 81 | x = Conv2D(self.config["conv3"], self.config["kernel3"], strides=2, padding="same")(x) 82 | x = LeakyReLU(alpha=self.config["alpha"])(x) 83 | x = BatchNormalization()(x) 84 | 85 | magnitude = K.int_shape(x) 86 | x = Flatten()(x) 87 
| 88 | hand = Dense(self.config["dense"])(x) 89 | 90 | self.encoder = Model(input_layer, hand, name="encoder") 91 | 92 | # decoder 93 | decoder_input = Input(shape=(self.config["dense"],)) 94 | x = Dense(np.prod(magnitude[1:]))(decoder_input) 95 | x = Reshape((magnitude[1], magnitude[2], magnitude[3]))(x) 96 | 97 | x = Conv2DTranspose(self.config["conv3"], self.config["kernel3"], strides=2, padding="same")(x) 98 | x = LeakyReLU(alpha=self.config["alpha"])(x) 99 | x = BatchNormalization()(x) 100 | 101 | x = Conv2DTranspose(self.config["conv2"], self.config["kernel2"], strides=2, padding="same")(x) 102 | x = LeakyReLU(alpha=self.config["alpha"])(x) 103 | x = BatchNormalization()(x) 104 | 105 | 106 | x = Conv2DTranspose(self.config["conv1"], self.config["kernel1"], strides=2, padding="same")(x) 107 | x = LeakyReLU(alpha=self.config["alpha"])(x) 108 | x = BatchNormalization()(x) 109 | 110 | x = Conv2DTranspose(self.config["trans"], self.config["kernel1"], padding="same")(x) 111 | outputs = Activation("sigmoid")(x) 112 | 113 | self.decoder = Model(decoder_input, outputs, name="decoder") 114 | 115 | self.model = Model(input_layer, self.decoder(self.encoder(input_layer)), name="autoencoder") 116 | 117 | def train(self, x_train, x_test): 118 | self.optimizer = Adam(lr=self.config["lr"], decay=self.config["lr"]/self.config["epochs"]) 119 | self.model.compile(loss="mse", optimizer=self.optimizer, metrics=['accuracy']) 120 | self.model.summary() 121 | self.model.fit( 122 | x_train, x_train, 123 | validation_data=(x_test, x_test), 124 | epochs=self.config["epochs"], 125 | batch_size=self.config["batch_size"] 126 | ) 127 | 128 | def test(self, x_test): 129 | decoded = self.model.predict(x_test) 130 | vis = visualize_predictions(decoded, x_test) 131 | cv2.imwrite("wow.png", vis) 132 | 133 | def getNormal(self, validation): 134 | errors = [] 135 | for sample in validation: 136 | pred = self.model.predict(sample[np.newaxis, ...])[0] # predict expects a leading batch dimension 137 | err = MSE(pred, sample) 138 | errors.append(err) 139 | 140 | 
error_df = pd.DataFrame({'reconstruction_error':errors}) 141 | error_df.describe() 142 | 143 | fig = plt.figure() 144 | ax = fig.add_subplot(111) 145 | _ = ax.hist(error_df.reconstruction_error.values, bins=5) 146 | fig.show() 147 | 148 | 149 | if __name__ == '__main__': 150 | train, validation, shape = load_data(DATASET) 151 | print(shape) 152 | #print(x[0], y[0], shape) 153 | config = dict( 154 | conv1 = 16, 155 | conv2 = 32, 156 | conv3 = shape[0], 157 | kernel1 = (3,3), 158 | kernel2 = (3,3), 159 | kernel3 = (3,3), 160 | 161 | dense = 16, 162 | trans = shape[-1], 163 | 164 | batch_size = 32, 165 | epochs = 30, 166 | alpha = 0.1, 167 | lr = 1e-4 168 | ) 169 | m = AutoEncoder("Covid1", config, hyper=False, hyper_project="CovidDetection") 170 | m.build(shape) 171 | m.train(train, validation) 172 | m.test(validation) 173 | m.getNormal(validation) 174 | -------------------------------------------------------------------------------- /covid/processing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "/home/splinter/.local/lib/python3.8/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.\n", 13 | "Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.\n", 14 | " from numba.decorators import jit as optional_jit\n", 15 | "/home/splinter/.local/lib/python3.8/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.\n", 16 | "Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. 
This alias will not be present in Numba version 0.50.0.\n", 17 | " from numba.decorators import jit as optional_jit\n" 18 | ] 19 | } 20 | ], 21 | "source": [ 22 | "import os\n", 23 | "import sys\n", 24 | "import librosa\n", 25 | "import numpy as np\n", 26 | "from tqdm import tqdm\n", 27 | "\n", 28 | "sys.path.insert(0, \"/home/splinter/Desktop/VoiceMed/CoughCNN/short-spectro/\")\n", 29 | "\n", 30 | "import detector\n", 31 | "\n", 32 | "cough = detector.Detector()\n" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "DATASET = \"dataset/\"\n", 42 | "WEBAPP = \"/home/splinter/Desktop/VoiceMed/qualityCheck/data/web_app/\"\n", 43 | "OUT = \"data/positive/\"" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 4, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "name": "stdout", 53 | "output_type": "stream", 54 | "text": [ 55 | "Processing WebApp data\n" 56 | ] 57 | } 58 | ], 59 | "source": [ 60 | "print(\"Processing WebApp data\")\n", 61 | "for folder in os.listdir(WEBAPP):\n", 62 | " if os.path.isdir(WEBAPP+folder):\n", 63 | " for sample in os.listdir(WEBAPP+folder):\n", 64 | " #since labels are wrong no point filtering for cough in names\n", 65 | " if os.path.splitext(sample)[-1] != \".wav\" or \"covid\" not in sample:\n", 66 | " continue\n", 67 | "\n", 68 | " cough.detect(WEBAPP+folder+\"/\"+sample, OUT, False)" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 3, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "name": "stderr", 78 | "output_type": "stream", 79 | "text": [ 80 | "\r", 81 | " 0%| | 0/33 [00:00 len(signal): 43 | sample = signal[len(signal)-SAMPLE_SIZE:] 44 | end = True 45 | else: 46 | sample = signal[current:current+SAMPLE_SIZE] 47 | current += SAMPLE_SIZE 48 | 49 | mel = melspectrogram(sample) 50 | x = np.array(mel) 51 | x = x[np.newaxis, ...] 
52 | x = np.expand_dims(x, axis=3) 53 | pred = np.argmax(self.model.predict(x), axis=1) 54 | predictions.append(pred) 55 | pieces.append(sample) 56 | 57 | for i in range(len(predictions)): 58 | if predictions[i][0] == 0: 59 | librosa.output.write_wav(self.out + str(i) + os.path.split(audio)[1], pieces[i], sr=SR) 60 | 61 | if visual: 62 | end = '\033[0m' 63 | green = '\033[92m' 64 | length = 50 65 | each = "|" * max(1, length // len(predictions)) # keep at least one bar per piece on long recordings 66 | output = "" 67 | for p in predictions: 68 | if p[0] == 0: 69 | output += green + each + end 70 | else: 71 | output += each 72 | 73 | print(output) 74 | else: 75 | return pieces, predictions 76 | 77 | 78 | if __name__ == '__main__': 79 | 80 | parser = argparse.ArgumentParser(description="Classify where the cough is in a file and split it up") 81 | parser.add_argument("file", metavar="f", type=str, help="Path to the audio file, must be wav") 82 | args = parser.parse_args() 83 | 84 | # os.system() does not raise on failure, so create the folder directly (no-op if it already exists) 85 | os.makedirs("detected/", exist_ok=True) 86 | 87 | 88 | 89 | d = Detector("model.json", "model.h5") 90 | d.detect(args.file, "detected/", visual=True) 91 | -------------------------------------------------------------------------------- /short-spectro/model.py: -------------------------------------------------------------------------------- 1 | import os 2 | import wandb 3 | import datetime 4 | import numpy as np 5 | import tensorflow as tf 6 | import tensorflow.keras as keras 7 | from wandb.keras import WandbCallback 8 | from tensorflow.keras.regularizers import l2 9 | from tensorflow.keras.utils import to_categorical 10 | from sklearn.model_selection import train_test_split 11 | from sklearn.metrics import precision_score, f1_score, recall_score 12 | from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout, LeakyReLU, SpatialDropout2D, GlobalAveragePooling2D 13 | 14 | 15 | DATASET = "dataset.npy" 16 | global model 17 | 18 | def load_data(f, s=False): 19 | data = np.load(f, allow_pickle=True)
20 | x = [] 21 | y = [] 22 | 23 | for sample in data: 24 | x.append(sample[0]) 25 | y.append(to_categorical(sample[1], num_classes=2)) 26 | 27 | x = np.array(x) 28 | y = np.array(y) 29 | 30 | shape = (x.shape[1], x.shape[2], 1) 31 | 32 | x = x.reshape(x.shape[0], shape[0], shape[1], shape[2]) 33 | 34 | if s: 35 | return x, y, shape 36 | 37 | #x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25) 38 | x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.25) 39 | 40 | return x_train, y_train, x_val, y_val, shape 41 | 42 | class ExtraCallBack(tf.keras.callbacks.Callback): 43 | def on_epoch_end(self, epoch, logs=None): 44 | acc = model.test(model.x_extra, model.y_extra, extra=True) 45 | try: 46 | wandb.log({'real_acc': acc}) 47 | except: 48 | pass 49 | #print("Evaluating over our own dataset, accuracy: "+str(acc)) 50 | 51 | class Model(object): 52 | def __init__(self, name, config, hyper=False, hyper_project="", extra=None): 53 | self.name = name 54 | self.config = config 55 | self.x_extra = extra[0] 56 | self.y_extra = extra[1] 57 | 58 | if hyper: 59 | wandb.init(config=config, project=hyper_project) 60 | wandb.run.save() 61 | self.callback = WandbCallback(data_type="image", validation_data=extra) 62 | try: 63 | os.system("mkdir sweep/"+wandb.run.name) 64 | except: 65 | pass 66 | else: 67 | log_dir = "logs/fit/"+datetime.datetime.now().strftime("%Y%m%d-%H%M%S") 68 | self.callback = keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1) 69 | 70 | self.extra_callback = ExtraCallBack() 71 | 72 | def build(self, input_shape): 73 | self.model = keras.Sequential([ 74 | Conv2D(self.config["conv1"], self.config["kernel1"], kernel_regularizer=l2(self.config["l2_rate"]), input_shape=input_shape), 75 | LeakyReLU(alpha=self.config["alpha"]), 76 | BatchNormalization(), 77 | 78 | SpatialDropout2D(self.config["drop1"]), 79 | Conv2D(self.config["conv2"], self.config["kernel2"], kernel_regularizer=l2(self.config["l2_rate"])), 80 | 
LeakyReLU(alpha=self.config["alpha"]), 81 | BatchNormalization(), 82 | 83 | MaxPooling2D(self.config["pool1"], padding='same'), 84 | 85 | SpatialDropout2D(self.config["drop1"]), 86 | Conv2D(self.config["conv3"], self.config["kernel3"], kernel_regularizer=l2(self.config["l2_rate"])), 87 | LeakyReLU(alpha=self.config["alpha"]), 88 | BatchNormalization(), 89 | 90 | SpatialDropout2D(self.config["drop2"]), 91 | Conv2D(self.config["conv4"], self.config["kernel4"], kernel_regularizer=l2(self.config["l2_rate"])), 92 | LeakyReLU(alpha=self.config["alpha"]), 93 | BatchNormalization(), 94 | 95 | GlobalAveragePooling2D(), 96 | 97 | Dense(2, activation='softmax') 98 | ]) 99 | 100 | def train(self, x_train, y_train, validation): 101 | self.optimizer = keras.optimizers.Adam( 102 | learning_rate=self.config["lr"], 103 | beta_1=self.config["beta_1"], 104 | beta_2=self.config["beta_2"] 105 | ) 106 | self.model.compile( 107 | optimizer=self.optimizer, 108 | loss="categorical_crossentropy", 109 | metrics=['accuracy'] 110 | ) 111 | 112 | self.model.summary() 113 | 114 | self.model.fit( 115 | x_train, 116 | y_train, 117 | validation_data=validation, 118 | batch_size=self.config["batch_size"], 119 | epochs=self.config["epochs"], 120 | callbacks=[self.callback, self.extra_callback] 121 | ) 122 | 123 | def test(self, x_test, y_test, extra=True): 124 | test_err, test_acc = self.model.evaluate(x_test, y_test, verbose=0) 125 | 126 | """# predict probabilities for test set 127 | yhat_probs = self.model.predict(x_test, verbose=0) 128 | # predict crisp classes for test set 129 | yhat_classes = self.model.predict_classes(x_test, verbose=0) 130 | precision = precision_score(y_test, yhat_classes) 131 | print('Precision: %f' % precision) 132 | # recall: tp / (tp + fn) 133 | recall = recall_score(y_test, yhat_classes) 134 | print('Recall: %f' % recall) 135 | # f1: 2 tp / (2 tp + fp + fn) 136 | f1 = f1_score(y_test, yhat_classes) 137 | print('F1 score: %f' % f1)""" 138 | 139 | if extra: 140 | return 
test_acc 141 | else: 142 | print("Accuracy on testing data: "+str(test_acc)) 143 | 144 | def save(self): 145 | folder = "sweep/"+wandb.run.name+"/" 146 | with open(folder+"model.json", "w") as json_file: 147 | json_file.write(self.model.to_json()) 148 | 149 | self.model.save_weights(folder+"model.h5") 150 | print("Saved model '"+self.name+"-"+wandb.run.name+"' to disk") 151 | 152 | 153 | if __name__ == '__main__': 154 | should_train = True 155 | 156 | if should_train: 157 | x_train, y_train, x_val, y_val, shape = load_data(DATASET) 158 | x_extra, y_extra, _ = load_data("test.npy", s=True) 159 | 160 | config = dict( 161 | conv1 = 32, 162 | kernel1 = (3,3), 163 | drop1 = 0.07, 164 | 165 | conv2 = 32, 166 | kernel2 = (3,3), 167 | 168 | pool1 = (2,2), 169 | 170 | conv3 = 64, 171 | kernel3 = (3,3), 172 | 173 | drop2 = 0.14, 174 | 175 | conv4 = 64, 176 | kernel4 = (3,3), 177 | 178 | batch_size = 128, 179 | epochs = 40, 180 | 181 | lr = 1e-4, 182 | beta_1 = 0.99, 183 | beta_2 = 0.999, 184 | l2_rate = 0.001, 185 | 186 | alpha = 0.1 187 | ) 188 | 189 | model = Model("Spectro3", config, hyper=True, hyper_project="WOW", extra=(x_extra, y_extra)) 190 | model.build(shape) 191 | model.train(x_train, y_train, (x_val, y_val)) 192 | model.save() 193 | 194 | else: 195 | pass 196 | -------------------------------------------------------------------------------- /short-spectro/processing.py: -------------------------------------------------------------------------------- 1 | import os 2 | import librosa 3 | import numpy as np 4 | import pandas as pd 5 | from tqdm import tqdm 6 | import librosa.display 7 | import matplotlib.pyplot as plt 8 | from speechpy.processing import cmvn 9 | 10 | SR = 44000 11 | N_FFT = 2048 12 | HOP_LENGTH = 512 13 | N_MELS = 64 14 | SILENCE = 0.0018 15 | SAMPLE_LENGTH = 32/86 #s 16 | SAMPLE_SIZE = int(np.ceil(SR*SAMPLE_LENGTH)) 17 | NOISE_RATIO = 0.25 18 | 19 | LABELS = ["cough", "not"] 20 | 21 | AUGMENT = "noise/" 22 | noises = [] 23 | 24 | def 
envelope(signal, rate, thresh): 25 | mask = [] 26 | y = pd.Series(signal).apply(np.abs) 27 | # Create aggregated mean 28 | y_mean = y.rolling(window=int(rate/10), min_periods=1, center=True).mean() 29 | for m in y_mean: 30 | mask.append(m > thresh) 31 | 32 | return mask 33 | 34 | def load_audio(path): 35 | signal, rate = librosa.load(path, sr=SR) 36 | mask = envelope(signal, rate, SILENCE) 37 | signal = signal[mask] 38 | 39 | return signal 40 | 41 | def melspectrogram(signal): 42 | signal = librosa.util.normalize(signal) 43 | spectro = librosa.feature.melspectrogram( 44 | signal, 45 | sr=SR, 46 | n_mels=N_MELS, 47 | n_fft=N_FFT 48 | ) 49 | spectro = librosa.power_to_db(spectro) 50 | spectro = spectro.astype(np.float32) 51 | return spectro 52 | 53 | def load_noises(n=2): 54 | ns = [] 55 | ids = [] 56 | for _ in range(n): 57 | while True: 58 | i = np.random.choice(len(noises)) 59 | if i in ids: 60 | continue 61 | ids.append(i) 62 | noise, _ = librosa.load(noises[i], sr=SR) 63 | if len(noise) < SAMPLE_SIZE: 64 | continue 65 | ns.append(noise) 66 | break 67 | 68 | return ns 69 | 70 | def augment(sample, ns): 71 | augmented = [] 72 | for noise in ns: 73 | gap = len(noise)-len(sample) 74 | point = 0 75 | if gap > 0: 76 | point = np.random.randint(low=0, high=len(noise)-len(sample)) 77 | noise = noise[point:point+len(sample)] 78 | final = [] 79 | for f in range(len(sample)): 80 | n = noise[f]*NOISE_RATIO 81 | final.append(sample[f]+n) 82 | 83 | augmented.append(final) 84 | 85 | return augmented 86 | 87 | def process(audio, aug=False): 88 | signal = load_audio(audio) 89 | 90 | if len(signal) < SAMPLE_SIZE: 91 | return [] 92 | 93 | current = 0 94 | end = False 95 | features = [] 96 | 97 | if aug: 98 | ns = load_noises() 99 | 100 | while not end: 101 | if current+SAMPLE_SIZE > len(signal): 102 | sample = signal[len(signal)-SAMPLE_SIZE:] 103 | end = True 104 | else: 105 | sample = signal[current:current+SAMPLE_SIZE] 106 | current += SAMPLE_SIZE 107 | 108 | 
features.append(melspectrogram(sample)) 109 | 110 | if aug: 111 | signals = augment(sample, ns) 112 | for s in signals: 113 | features.append(melspectrogram(s)) 114 | 115 | return features 116 | 117 | def generate_dataset(folder, aug=False): 118 | data = [] #contains [mel, label] 119 | for i, label in enumerate(LABELS): 120 | print("Processing: "+label) 121 | for audio in tqdm(os.listdir(folder+label)): 122 | if os.path.splitext(audio)[-1] != ".wav": 123 | continue 124 | 125 | features = process(folder+label+"/"+audio, aug=aug and i == 0) 126 | for feat in features: 127 | data.append([feat, i]) 128 | 129 | return data 130 | 131 | if __name__ == '__main__': 132 | for audio in os.listdir(AUGMENT): 133 | if os.path.splitext(audio)[-1] != ".wav": 134 | continue 135 | 136 | noises.append(AUGMENT+audio) 137 | 138 | DATA_FOLDER = "dataset/" 139 | TEST_FOLDER = "test/" 140 | 141 | data = generate_dataset(DATA_FOLDER, True) 142 | 143 | #extra = generate_dataset(EXTRA_FOLDER) 144 | #data += extra 145 | np.random.shuffle(data) 146 | np.save("dataset.npy", data) 147 | 148 | test = generate_dataset(TEST_FOLDER) 149 | np.save("test.npy", test) 150 | -------------------------------------------------------------------------------- /spectro/detector.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import librosa 4 | import argparse 5 | import numpy as np 6 | import tensorflow.keras as keras 7 | from processing import load_audio, SAMPLE_SIZE, melspectrogram, SR 8 | 9 | class Detector(object): 10 | def __init__(self, m="model.json", w="model.h5"): 11 | self.m = m 12 | self.w = w 13 | 14 | json_file = open(self.m, 'r') 15 | self.model = json_file.read() 16 | json_file.close() 17 | self.model = keras.models.model_from_json(self.model) 18 | self.model.load_weights(self.w) 19 | # os.system("clear") 20 | 21 | def detect(self, audio, out, visual): 22 | self.out = out 23 | try: 24 | signal = load_audio(audio) 25 | 26 | 
except ValueError: 27 | if visual: 28 | print("Recording too short!") 29 | return 30 | 31 | if len(signal) < SAMPLE_SIZE: 32 | if visual: 33 | print("Recording too short!") 34 | return 35 | 36 | current = 0 37 | end = False 38 | predictions = [] 39 | pieces = [] 40 | 41 | while not end: 42 | if current+SAMPLE_SIZE > len(signal): 43 | sample = signal[len(signal)-SAMPLE_SIZE:] 44 | end = True 45 | else: 46 | sample = signal[current:current+SAMPLE_SIZE] 47 | current += SAMPLE_SIZE 48 | 49 | mel = melspectrogram(sample) 50 | x = np.array(mel) 51 | x = x[np.newaxis, ...] 52 | x = np.expand_dims(x, axis=3) 53 | pred = np.argmax(self.model.predict(x), axis=1) 54 | predictions.append(pred) 55 | pieces.append(sample) 56 | 57 | for i in range(len(predictions)): 58 | if predictions[i][0] == 0: 59 | librosa.output.write_wav(self.out + str(i) + os.path.split(audio)[1], pieces[i], sr=SR) 60 | 61 | if visual: 62 | end = '\033[0m' 63 | green = '\033[92m' 64 | length = 50 65 | each = "|" * max(1, length // len(predictions)) # keep at least one bar per piece on long recordings 66 | output = "" 67 | for p in predictions: 68 | if p[0] == 0: 69 | output += green + each + end 70 | else: 71 | output += each 72 | 73 | print(output) 74 | else: 75 | return pieces, predictions 76 | 77 | 78 | if __name__ == '__main__': 79 | 80 | parser = argparse.ArgumentParser(description="Classify where the cough is in a file and split it up") 81 | parser.add_argument("file", metavar="f", type=str, help="Path to the audio file, must be wav") 82 | args = parser.parse_args() 83 | 84 | # os.system() does not raise on failure, so create the folder directly (no-op if it already exists) 85 | os.makedirs("detected/", exist_ok=True) 86 | 87 | 88 | 89 | d = Detector("model.json", "model.h5") 90 | d.detect(args.file, "detected/", visual=True) 91 | -------------------------------------------------------------------------------- /spectro/model.py: -------------------------------------------------------------------------------- 1 | import os 2 | import wandb 3 | import datetime 4 | import numpy as np 5 | import tensorflow as tf 6 | import tensorflow.keras as keras 7 |
from wandb.keras import WandbCallback 8 | from tensorflow.keras.regularizers import l2 9 | from tensorflow.keras.utils import to_categorical 10 | from sklearn.model_selection import train_test_split 11 | from sklearn.metrics import precision_score, f1_score, recall_score 12 | from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout, LeakyReLU, SpatialDropout2D, GlobalAveragePooling2D 13 | 14 | 15 | DATASET = "dataset.npy" 16 | global model 17 | 18 | def load_data(f, s=False): 19 | data = np.load(f, allow_pickle=True) 20 | x = [] 21 | y = [] 22 | 23 | for sample in data: 24 | x.append(sample[0]) 25 | y.append(to_categorical(sample[1], num_classes=2)) 26 | 27 | x = np.array(x) 28 | y = np.array(y) 29 | 30 | shape = (x.shape[1], x.shape[2], 1) 31 | 32 | x = x.reshape(x.shape[0], shape[0], shape[1], shape[2]) 33 | 34 | if s: 35 | return x, y, shape 36 | 37 | #x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25) 38 | x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.25) 39 | 40 | return x_train, y_train, x_val, y_val, shape 41 | 42 | class ExtraCallBack(tf.keras.callbacks.Callback): 43 | def on_epoch_end(self, epoch, logs=None): 44 | acc = model.test(model.x_extra, model.y_extra, extra=True) 45 | try: 46 | wandb.log({'real_acc': acc}) 47 | except: 48 | pass 49 | #print("Evaluating over our own dataset, accuracy: "+str(acc)) 50 | 51 | class Model(object): 52 | def __init__(self, name, config, hyper=False, hyper_project="", extra=None): 53 | self.name = name 54 | self.config = config 55 | self.x_extra = extra[0] 56 | self.y_extra = extra[1] 57 | 58 | if hyper: 59 | wandb.init(config=config, project=hyper_project) 60 | wandb.run.save() 61 | self.callback = WandbCallback(data_type="image", validation_data=extra) 62 | try: 63 | os.system("mkdir sweep/"+wandb.run.name) 64 | except: 65 | pass 66 | else: 67 | log_dir = "logs/fit/"+datetime.datetime.now().strftime("%Y%m%d-%H%M%S") 68 | 
self.callback = keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1) 69 | 70 | self.extra_callback = ExtraCallBack() 71 | 72 | def build(self, input_shape): 73 | self.model = keras.Sequential([ 74 | Conv2D(self.config["conv1"], self.config["kernel1"], kernel_regularizer=l2(self.config["l2_rate"]), input_shape=input_shape), 75 | LeakyReLU(alpha=self.config["alpha"]), 76 | BatchNormalization(), 77 | 78 | SpatialDropout2D(self.config["drop1"]), 79 | Conv2D(self.config["conv2"], self.config["kernel2"], kernel_regularizer=l2(self.config["l2_rate"])), 80 | LeakyReLU(alpha=self.config["alpha"]), 81 | BatchNormalization(), 82 | 83 | MaxPooling2D(self.config["pool1"], padding='same'), 84 | 85 | SpatialDropout2D(self.config["drop1"]), 86 | Conv2D(self.config["conv3"], self.config["kernel3"], kernel_regularizer=l2(self.config["l2_rate"])), 87 | LeakyReLU(alpha=self.config["alpha"]), 88 | BatchNormalization(), 89 | 90 | SpatialDropout2D(self.config["drop2"]), 91 | Conv2D(self.config["conv4"], self.config["kernel4"], kernel_regularizer=l2(self.config["l2_rate"])), 92 | LeakyReLU(alpha=self.config["alpha"]), 93 | BatchNormalization(), 94 | 95 | GlobalAveragePooling2D(), 96 | 97 | Dense(2, activation='softmax') 98 | ]) 99 | 100 | def train(self, x_train, y_train, validation): 101 | self.optimizer = keras.optimizers.Adam( 102 | learning_rate=self.config["lr"], 103 | beta_1=self.config["beta_1"], 104 | beta_2=self.config["beta_2"] 105 | ) 106 | self.model.compile( 107 | optimizer=self.optimizer, 108 | loss="categorical_crossentropy", 109 | metrics=['accuracy'] 110 | ) 111 | 112 | self.model.summary() 113 | 114 | self.model.fit( 115 | x_train, 116 | y_train, 117 | validation_data=validation, 118 | batch_size=self.config["batch_size"], 119 | epochs=self.config["epochs"], 120 | callbacks=[self.callback, self.extra_callback] 121 | ) 122 | 123 | def test(self, x_test, y_test, extra=True): 124 | test_err, test_acc = self.model.evaluate(x_test, y_test, verbose=0) 125 | 126 | 
"""# predict probabilities for test set 127 | yhat_probs = self.model.predict(x_test, verbose=0) 128 | # predict crisp classes for test set 129 | yhat_classes = self.model.predict_classes(x_test, verbose=0) 130 | precision = precision_score(y_test, yhat_classes) 131 | print('Precision: %f' % precision) 132 | # recall: tp / (tp + fn) 133 | recall = recall_score(y_test, yhat_classes) 134 | print('Recall: %f' % recall) 135 | # f1: 2 tp / (2 tp + fp + fn) 136 | f1 = f1_score(y_test, yhat_classes) 137 | print('F1 score: %f' % f1)""" 138 | 139 | if extra: 140 | return test_acc 141 | else: 142 | print("Accuracy on testing data: "+str(test_acc)) 143 | 144 | def save(self): 145 | folder = "sweep/"+wandb.run.name+"/" 146 | with open(folder+"model.json", "w") as json_file: 147 | json_file.write(self.model.to_json()) 148 | 149 | self.model.save_weights(folder+"model.h5") 150 | print("Saved model '"+self.name+"-"+wandb.run.name+"' to disk") 151 | 152 | 153 | if __name__ == '__main__': 154 | should_train = True 155 | 156 | if should_train: 157 | x_train, y_train, x_val, y_val, shape = load_data(DATASET) 158 | x_extra, y_extra, _ = load_data("test.npy", s=True) 159 | 160 | config = dict( 161 | conv1 = 32, 162 | kernel1 = (3,3), 163 | drop1 = 0.07, 164 | 165 | conv2 = 32, 166 | kernel2 = (3,3), 167 | 168 | pool1 = (2,2), 169 | 170 | conv3 = 64, 171 | kernel3 = (3,3), 172 | 173 | drop2 = 0.14, 174 | 175 | conv4 = 64, 176 | kernel4 = (3,3), 177 | 178 | batch_size = 128, 179 | epochs = 70, 180 | 181 | lr = 1e-4, 182 | beta_1 = 0.99, 183 | beta_2 = 0.999, 184 | l2_rate = 0.001, 185 | 186 | alpha = 0.1 187 | ) 188 | 189 | model = Model("Spectro2", config, hyper=True, hyper_project="CoughDetectTests", extra=(x_extra, y_extra)) 190 | model.build(shape) 191 | model.train(x_train, y_train, (x_val, y_val)) 192 | model.save() 193 | 194 | else: 195 | pass 196 | -------------------------------------------------------------------------------- /spectro/processing.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 48, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os\n", 10 | "import librosa\n", 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "from tqdm import tqdm\n", 14 | "import librosa.display\n", 15 | "import matplotlib.pyplot as plt\n", 16 | "from speechpy.processing import cmvn " 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 72, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "SR = 44000\n", 26 | "N_FFT = 2048\n", 27 | "HOP_LENGTH = 512\n", 28 | "N_MELS = 60\n", 29 | "SILENCE = 0.0018\n", 30 | "SAMPLE_LENGTH = 0.5 #s\n", 31 | "SAMPLE_SIZE = int(np.ceil(SR*SAMPLE_LENGTH))\n", 32 | "NOISE_RATIO = 0.25\n", 33 | "\n", 34 | "LABELS = [\"cough\", \"not\"]" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 59, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "data": { 44 | "image/png": "\n", 45 | "text/plain": [ 46 | "
" 47 | ] 48 | }, 49 | "metadata": { 50 | "needs_background": "light" 51 | }, 52 | "output_type": "display_data" 53 | }, 54 | { 55 | "data": { 56 | "image/png": "\n", 57 | "text/plain": [ 58 | "
" 59 | ] 60 | }, 61 | "metadata": { 62 | "needs_background": "light" 63 | }, 64 | "output_type": "display_data" 65 | } 66 | ], 67 | "source": [ 68 | "def envelope(signal, rate, thresh):\n", 69 | " mask = []\n", 70 | " y = pd.Series(signal).apply(np.abs)\n", 71 | " # Create aggregated mean\n", 72 | " y_mean = y.rolling(window=int(rate/10), min_periods=1, center=True).mean()\n", 73 | " for m in y_mean:\n", 74 | " mask.append(m > thresh)\n", 75 | "\n", 76 | " return mask\n", 77 | "\n", 78 | "def load_audio(path):\n", 79 | " signal, rate = librosa.load(path, sr=SR)\n", 80 | " mask = envelope(signal, rate, SILENCE)\n", 81 | " signal = signal[mask]\n", 82 | " \n", 83 | " return signal\n", 84 | "\n", 85 | "def melspectrogram(signal):\n", 86 | " signal = librosa.util.normalize(signal)\n", 87 | " spectro = librosa.feature.melspectrogram(\n", 88 | " signal,\n", 89 | " sr=SR,\n", 90 | " n_mels=N_MELS,\n", 91 | " n_fft=N_FFT\n", 92 | " )\n", 93 | " spectro = librosa.power_to_db(spectro)\n", 94 | " spectro = spectro.astype(np.float32)\n", 95 | " return spectro\n", 96 | "\n", 97 | "audios = [\n", 98 | " load_audio(\"test/cough/1586532810683_cough_suspect_f_49_57022fb6-10cc-4dce-91fd-88bc0559076e.wav\"),\n", 99 | " load_audio(\"test/cough/1586538036621_cough_healthy_m_51_11e04705-602e-49be-8857-9e7d52dea3f5.wav\")\n", 100 | "]\n", 101 | "\n", 102 | "mels = []\n", 103 | "\n", 104 | "for a in audios:\n", 105 | " mel = melspectrogram(a)\n", 106 | " mels.append(mel)\n", 107 | " plt.figure(figsize=(12, 4))\n", 108 | " librosa.display.specshow(mel, x_axis='time', y_axis='mel')\n", 109 | " plt.colorbar(format='%+2.0f dB');\n", 110 | " plt.title('MEL-Scaled Spectrogram')\n", 111 | " plt.tight_layout()\n", 112 | " plt.show()" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 64, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "AUGMENT = \"noise/\"\n", 122 | "noises = []\n", 123 | "\n", 124 | "for audio in os.listdir(AUGMENT):\n", 125 | " if 
os.path.splitext(audio)[-1] != \".wav\":\n", 126 | " continue\n", 127 | " \n", 128 | " noises.append(AUGMENT+audio) " 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 77, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "def load_noises(n=2):\n", 138 | " ns = []\n", 139 | " ids = []\n", 140 | " for _ in range(n):\n", 141 | " while True:\n", 142 | " i = np.random.choice(len(noises))\n", 143 | " if i in ids:\n", 144 | " continue\n", 145 | " ids.append(i)\n", 146 | " noise, _ = librosa.load(noises[i], sr=SR)\n", 147 | " if len(noise) < SAMPLE_SIZE:\n", 148 | " continue\n", 149 | " ns.append(noise)\n", 150 | " break\n", 151 | " \n", 152 | " return ns\n", 153 | "\n", 154 | "def augment(sample, ns):\n", 155 | " augmented = []\n", 156 | " for noise in ns:\n", 157 | " gap = len(noise)-len(sample)\n", 158 | " point = 0\n", 159 | " if gap > 0:\n", 160 | " point = np.random.randint(low=0, high=len(noise)-len(sample))\n", 161 | " noise = noise[point:point+len(sample)]\n", 162 | " final = []\n", 163 | " for f in range(len(sample)):\n", 164 | " n = noise[f]*NOISE_RATIO\n", 165 | " final.append(sample[f]+n)\n", 166 | " \n", 167 | " augmented.append(final)\n", 168 | " \n", 169 | " return augmented\n", 170 | "\n", 171 | "def process(audio, aug=False):\n", 172 | " signal = load_audio(audio)\n", 173 | " \n", 174 | " if len(signal) < SAMPLE_SIZE:\n", 175 | " return []\n", 176 | " \n", 177 | " current = 0\n", 178 | " end = False\n", 179 | " features = []\n", 180 | " \n", 181 | " if aug:\n", 182 | " ns = load_noises()\n", 183 | " \n", 184 | " while not end:\n", 185 | " if current+SAMPLE_SIZE > len(signal):\n", 186 | " sample = signal[len(signal)-SAMPLE_SIZE:]\n", 187 | " end = True\n", 188 | " else:\n", 189 | " sample = signal[current:current+SAMPLE_SIZE]\n", 190 | " current += SAMPLE_SIZE\n", 191 | " \n", 192 | " features.append(melspectrogram(sample))\n", 193 | " \n", 194 | " if aug:\n", 195 | " signals = augment(sample, ns)\n", 196 | " 
for s in signals:\n", 197 | " features.append(melspectrogram(s))\n", 198 | " \n", 199 | " return features\n", 200 | " \n", 201 | "def generate_dataset(folder, aug=False):\n", 202 | " data = [] #contains [mel, label]\n", 203 | " for i, label in enumerate(LABELS):\n", 204 | " print(\"Processing: \"+label)\n", 205 | " for audio in tqdm(os.listdir(folder+label)):\n", 206 | " if os.path.splitext(audio)[-1] != \".wav\":\n", 207 | " continue\n", 208 | " \n", 209 | " features = process(folder+label+\"/\"+audio, aug=aug and i == 0)\n", 210 | " for feat in features:\n", 211 | " data.append([feat, i])\n", 212 | " \n", 213 | " return data" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 78, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stderr", 223 | "output_type": "stream", 224 | "text": [ 225 | "\r", 226 | " 0%| | 0/36 [00:00 thresh) 31 | 32 | return mask 33 | 34 | def load_audio(path): 35 | signal, rate = librosa.load(path, sr=SR) 36 | mask = envelope(signal, rate, SILENCE) 37 | signal = signal[mask] 38 | 39 | return signal 40 | 41 | def melspectrogram(signal): 42 | signal = librosa.util.normalize(signal) 43 | spectro = librosa.feature.melspectrogram( 44 | signal, 45 | sr=SR, 46 | n_mels=N_MELS, 47 | n_fft=N_FFT 48 | ) 49 | spectro = librosa.power_to_db(spectro) 50 | spectro = spectro.astype(np.float32) 51 | return spectro 52 | 53 | def load_noises(n=2): 54 | ns = [] 55 | ids = [] 56 | for _ in range(n): 57 | while True: 58 | i = np.random.choice(len(noises)) 59 | if i in ids: 60 | continue 61 | ids.append(i) 62 | noise, _ = librosa.load(noises[i], sr=SR) 63 | if len(noise) < SAMPLE_SIZE: 64 | continue 65 | ns.append(noise) 66 | break 67 | 68 | return ns 69 | 70 | def augment(sample, ns): 71 | augmented = [] 72 | for noise in ns: 73 | gap = len(noise)-len(sample) 74 | point = 0 75 | if gap > 0: 76 | point = np.random.randint(low=0, high=len(noise)-len(sample)) 77 | noise = noise[point:point+len(sample)] 78 | final = [] 79 | for 
f in range(len(sample)): 80 | n = noise[f]*NOISE_RATIO 81 | final.append(sample[f]+n) 82 | 83 | augmented.append(final) 84 | 85 | return augmented 86 | 87 | def process(audio, aug=False): 88 | signal = load_audio(audio) 89 | 90 | if len(signal) < SAMPLE_SIZE: 91 | return [] 92 | 93 | current = 0 94 | end = False 95 | features = [] 96 | 97 | if aug: 98 | ns = load_noises() 99 | 100 | while not end: 101 | if current+SAMPLE_SIZE > len(signal): 102 | sample = signal[len(signal)-SAMPLE_SIZE:] 103 | end = True 104 | else: 105 | sample = signal[current:current+SAMPLE_SIZE] 106 | current += SAMPLE_SIZE 107 | 108 | features.append(melspectrogram(sample)) 109 | 110 | if aug: 111 | signals = augment(sample, ns) 112 | for s in signals: 113 | features.append(melspectrogram(s)) 114 | 115 | return features 116 | 117 | def generate_dataset(folder, aug=False): 118 | data = [] #contains [mel, label] 119 | for i, label in enumerate(LABELS): 120 | print("Processing: "+label) 121 | for audio in tqdm(os.listdir(folder+label)): 122 | if os.path.splitext(audio)[-1] != ".wav": 123 | continue 124 | 125 | features = process(folder+label+"/"+audio, aug=aug and i == 0) 126 | for feat in features: 127 | data.append([feat, i]) 128 | 129 | return data 130 | 131 | if __name__ == '__main__': 132 | for audio in os.listdir(AUGMENT): 133 | if os.path.splitext(audio)[-1] != ".wav": 134 | continue 135 | 136 | noises.append(AUGMENT+audio) 137 | 138 | DATA_FOLDER = "dataset/" 139 | TEST_FOLDER = "test/" 140 | 141 | data = generate_dataset(DATA_FOLDER, True) 142 | 143 | #extra = generate_dataset(EXTRA_FOLDER) 144 | #data += extra 145 | np.random.shuffle(data) 146 | np.save("dataset.npy", data) 147 | 148 | test = generate_dataset(TEST_FOLDER) 149 | np.save("test.npy", test) 150 | -------------------------------------------------------------------------------- /spectro/sweep.yaml: -------------------------------------------------------------------------------- 1 | program: model.py 2 | method: random 3 | 
command: 4 | - ${env} 5 | - python3 6 | - ${program} 7 | - ${args} 8 | metric: 9 | name: loss 10 | goal: minimize 11 | parameters: 12 | lr: 13 | distribution: uniform 14 | min: 0.000001 15 | max: 0.1 16 | --------------------------------------------------------------------------------
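The sweep configuration above draws `lr` with `distribution: uniform` between `0.000001` and `0.1`. As a rough illustration of what that implies, the sketch below compares uniform sampling over that range with a log-uniform alternative, using only the Python standard library. The bounds come from `sweep.yaml`; the helper names (`sample_uniform`, `sample_log_uniform`, `fraction_below`) and the `1e-3` threshold are hypothetical and are not part of the repository or of W&B.

```python
import math
import random

LR_MIN = 1e-6  # `min` in sweep.yaml
LR_MAX = 0.1   # `max` in sweep.yaml


def sample_uniform(rng):
    """Draw lr evenly on a linear scale, as `distribution: uniform` describes."""
    return rng.uniform(LR_MIN, LR_MAX)


def sample_log_uniform(rng):
    """Hypothetical alternative: draw the exponent evenly, then exponentiate."""
    log_lr = rng.uniform(math.log(LR_MIN), math.log(LR_MAX))
    return math.exp(log_lr)


def fraction_below(samples, threshold):
    """Share of samples strictly below `threshold`."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so repeated runs draw the same values
uniform_draws = [sample_uniform(rng) for _ in range(10_000)]
log_draws = [sample_log_uniform(rng) for _ in range(10_000)]

print("uniform, share below 1e-3:    ", fraction_below(uniform_draws, 1e-3))
print("log-uniform, share below 1e-3:", fraction_below(log_draws, 1e-3))
```

On a linear scale the interval below `1e-3` is only about 1% of the whole range, so a uniform sweep will rarely try learning rates near the `1e-4` that is hard-coded in `model.py`. The range spans five decades and three of them lie below `1e-3`, so log-uniform sampling spends roughly 60% of its draws there. W&B's sweep documentation describes log-scale distributions; the exact key to use is not shown here and should be checked against that documentation. It may also be worth confirming that the swept value reaches the optimizer, since the `Model` class shown earlier reads `lr` from its own `config` dict.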