├── README.md ├── basic_room_acoustics_03.ipynb ├── basics_room_acoustics.ipynb ├── basics_room_acoustics_02.ipynb ├── colab_rir_directivities_demo.ipynb ├── images └── basicRoomAcoustics_header.jpg ├── ism_torchaudio.ipynb ├── simple_audio.ipynb ├── timbre_transfer.ipynb └── webMushra.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Topics in Audio Processing & Music Technology 2 | 3 | ## Content 4 | - 01 DDSP: Differentiable Digital Signal Processing [![Youtube](https://badgen.net/badge/Launch/on%20YouTube/red?icon=terminal)](https://youtu.be/EXz1TJQ-hSo) 5 | - [00:00:00 Introduction](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=0s) 6 | - [00:00:30 Magenta Research Project](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=30s) 7 | - [00:01:14 DDSP Library Overview](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=74s) 8 | - [00:01:36 DDSP Modules](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=96s) 9 | - [00:02:29 DDSP Processors Module](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=149s) 10 | - [00:04:16 DDSP Core Module](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=256s) 11 | - [00:06:41 DDSP Spectral Ops Module](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=401s) 12 | - [00:07:46 DDSP Original Paper](https://www.youtube.com/watch?v=EXz1TJQ-hSo&list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID&index=1&t=466s) 13 | 14 | - 02 Differentiable Digital Signal Processing: Timbre Transfer - Deep Learning[![Youtube](https://badgen.net/badge/Launch/on%20YouTube/red?icon=terminal)](https://youtu.be/ZwZMIKagPlU)Open In Colab 15 | - 00:00:00 Timbre Transfer Demo 16 | - 00:02:55 Installing and Importing the DDSP Library 17 | - 00:03:10 Recording or Uploading Audio 18 | - 00:05:08 Loading a Model: Violin 19 | - 00:06:56 Resynthesizing Audio 20 | - 00:08:01 Modifying Conditioning 21 | - 00:09:06 Pre-trained Trumpet Model 22 | - 00:10:22 Pre-trained Flute2 Model 23 | - 00:11:18 Pre-trained Tenor Saxophone Model 24 | - 00:12:56 Pitch Shifting 25 | - 00:13:50 Demo Application Conclusions 26 | - 00:15:19 DDSP ICLR Paper: The International Conference on Learning Representations 27 | - 00:17:18 Install and Imports 28 | - 00:18:16 Gin Config Framework 29 | - 00:22:35 Recording Audio 30 | - 00:26:23 base64: Decode the Base64 encoded bytes-like object or ASCII string 31 | - 00:26:43 Audio bytes to Numpy 32 | - 00:27:44 Uploading an Audio File 33 | - 00:29:29 Colab Files Upload 34 | - 00:33:43 Log Magnitude Spectrogram 35 | - 00:36:40 DDSP Spectral Operations: Compute Magnitude 36 | - 00:37:11 DDSP Spectral Operations: Short-time Fourier Transform (STFT) 37 | - 00:38:15 Tensorflow Signal: Short-time Fourier Transform (STFT) 38 | - 00:39:37 DDSP Safe Log 39 | - 00:43:14 Numpy Rotate 90 40 | - 00:44:56 Pyplot Matshow 41 | - 00:46:33 DDSP Colab Utils: Play Audio 42 | - 00:49:52 Scipy IO Wavfile Write 43 | - 00:51:37 IPython Display Audio 44 | - 00:55:29 Computing Audio Features 45 | - 00:56:28 CREPE: A Convolutional Representation for Pitch Estimation 46 | - 00:57:39 CREPE ICASSP 2018 Paper 47 | - 01:06:16 DDSP Timbre Transfer Audio Features 48 | - 01:09:28 Computing F0 49 | - 01:10:23 Viterbi Algorithm 50 | - 01:13:38 Computing Loudness 51 | - 
01:17:50 Dynamics, Intensity and Loudness 52 | - 01:21:12 A-weighting Function 53 | - 01:24:59 Plotting the Audio Features 54 | - 01:25:28 Loading a Pre-Trained Model 55 | - 01:27:05 Gsutil Tool 56 | - 01:28:22 tf.io.gfile 57 | - 01:29:19 Parsing Gin Config 58 | - 01:31:03 Gin Query Parameters 59 | - 01:35:57 Understanding the Pre-Trained Model 60 | - 01:36:16 DDSP Autoencoder Model 61 | - 01:37:13 DDSP Model Base Class 62 | - 01:39:01 DDSP ICLR Paper: Autoencoder Architecture 63 | - 01:39:43 Gin Config - Model Parameters 64 | - 01:42:17 DDSP Encoders 65 | - 01:44:12 DDSP Decoders 66 | - 01:45:47 Recurrent Neural Network (RNN) 67 | - 01:46:18 Gated Recurrent Units (GRUs) 68 | - 01:47:28 Harmonic Oscillator / Additive Synthesizer 69 | - 01:48:54 Filtered Noise / Subtractive Synthesizer 70 | - 01:50:08 Multi-Scale Spectral Loss 71 | - 01:51:52 DDSP Reverb 72 | - 01:52:21 Conclusion 73 | 74 | - 03 Tensorflow + Tensorboard + Scikit-learn: Simple audio recognition: Recognizing keywords[![Youtube](https://badgen.net/badge/Launch/on%20YouTube/red?icon=terminal)](https://youtu.be/MFGYEUj8S3I)Open In Colab 75 | - 00:00:00 Introduction 76 | - 00:00:34 Simple audio recognition: Recognizing keywords Example 77 | - 00:03:53 Tensorflow Keras 78 | - 00:05:38 Tensorflow 79 | - 00:07:44 Seed for Experiment Reproducibility 80 | - 00:09:56 Speech Commands Dataset 81 | - 00:11:33 Downloading files using 'tf.keras.utils.get_file' 82 | - 00:15:07 Returning a list of files that match the given pattern(s) using 'tf.io.gfile.glob' 83 | - 00:16:19 Splitting the dataset into training, validation and testing subsets 84 | - 00:19:29 Scikit-learn 85 | - 00:20:14 Stratified ShuffleSplit 86 | - 00:24:38 Writing descriptive and efficient input pipelines using 'tf.data.Dataset' 87 | - 00:26:20 Reading the contents of a file using 'tf.io.read_file' 88 | - 00:27:14 Applying transformations across the elements of a dataset using '.map(...)' 89 | - 00:31:18 Spectrograms using 'tf.signal.stft' 90 | - 00:31:32 Zero-padding in Tensorflow 91 | - 00:34:00 Listening to audio files 92 | - 00:36:10 Plotting Spectrograms 93 | - 00:40:29 Batching the training and validation subsets using '.batch(...)' 94 | - 00:41:48 Caching elements of a dataset using '.cache(...)' 95 | - 00:42:20 Prefetching elements of a dataset unsing 'prefetch(...)' 96 | - 00:43:13 Preprocessing Layers: Resizing and Normalization 97 | - 00:43:48 Building a Sequential Model in Tensorflow Keras 98 | - 00:46:10 Printing a string summary of the network using '.summary()' 99 | - 00:46:46 Configuring the model for training using '.compile(...)' 100 | - 00:48:12 Configuring Tensorboard 101 | - 00:50:18 Training the Model using '.fit(...)' 102 | - 00:50:33 Visualization in Tensorboard 103 | - 00:52:20 Evaluating the Test set Performance 104 | - 00:53:39 Displaying a Confusion Matrix using '.tf.math.confusion_matrix(...)' 105 | - 00:54:40 Running Inference on an Audio File 106 | 107 | ## YouTube Playlist 108 | [![Youtube](https://badgen.net/badge/Launch/on%20YouTube/red?icon=terminal)](https://youtube.com/playlist?list=PL6QnpHKwdPYjYtJf8afYK0XQwFglxyjID) 109 | -------------------------------------------------------------------------------- /images/basicRoomAcoustics_header.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/GuitarsAI/TopicsInAudioAndMusicTech/cc984b81d4341fb793324bc26ed0e71fae8feaac/images/basicRoomAcoustics_header.jpg 
-------------------------------------------------------------------------------- /simple_audio.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "accelerator": "GPU", 6 | "colab": { 7 | "name": "simple_audio.ipynb", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "toc_visible": true 11 | }, 12 | "kernelspec": { 13 | "display_name": "Python 3", 14 | "name": "python3" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "fluF3_oOgkWF" 22 | }, 23 | "source": [ 24 | "##### Copyright 2020 The TensorFlow Authors." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "metadata": { 30 | "cellView": "form", 31 | "id": "AJs7HHFmg1M9" 32 | }, 33 | "source": [ 34 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 35 | "# you may not use this file except in compliance with the License.\n", 36 | "# You may obtain a copy of the License at\n", 37 | "#\n", 38 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 39 | "#\n", 40 | "# Unless required by applicable law or agreed to in writing, software\n", 41 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 42 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 43 | "# See the License for the specific language governing permissions and\n", 44 | "# limitations under the License." 45 | ], 46 | "execution_count": null, 47 | "outputs": [] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": { 52 | "id": "R-h7XJRV0Bkm" 53 | }, 54 | "source": [ 55 | "**Modified by Renato Profeta to embed Video Tutorials. June, 2021**" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "id": "jYysdyb-CaWM" 62 | }, 63 | "source": [ 64 | "# Simple audio recognition: Recognizing keywords" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": { 70 | "id": "CNbqmZy0gbyE" 71 | }, 72 | "source": [ 73 | "\n", 74 | " \n", 79 | " \n", 84 | " \n", 89 | " \n", 92 | "
\n", 75 | " \n", 76 | " \n", 77 | " View on TensorFlow.org\n", 78 | " \n", 80 | " \n", 81 | " \n", 82 | " Run in Google Colab\n", 83 | " \n", 85 | " \n", 86 | " \n", 87 | " View source on GitHub\n", 88 | " \n", 90 | " Download notebook\n", 91 | "
" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "id": "SPfDNFlb66XF" 99 | }, 100 | "source": [ 101 | "This tutorial will show you how to build a basic speech recognition network that recognizes ten different words. It's important to know that real speech and audio recognition systems are much more complex, but like MNIST for images, it should give you a basic understanding of the techniques involved. Once you've completed this tutorial, you'll have a model that tries to classify a one second audio clip as \"down\", \"go\", \"left\", \"no\", \"right\", \"stop\", \"up\" and \"yes\"." 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "metadata": { 107 | "cellView": "form", 108 | "id": "SYHl5PQf0Ogo", 109 | "outputId": "9b9321ff-a540-4828-9c2e-e16fb0886d73", 110 | "colab": { 111 | "base_uri": "https://localhost:8080/", 112 | "height": 336 113 | } 114 | }, 115 | "source": [ 116 | "#@title\n", 117 | "%%html\n", 118 | "" 119 | ], 120 | "execution_count": 1, 121 | "outputs": [ 122 | { 123 | "output_type": "display_data", 124 | "data": { 125 | "text/html": [ 126 | "" 127 | ], 128 | "text/plain": [ 129 | "" 130 | ] 131 | }, 132 | "metadata": { 133 | "tags": [] 134 | } 135 | } 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": { 141 | "id": "Go9C3uLL8Izc" 142 | }, 143 | "source": [ 144 | "## Setup\n", 145 | "\n", 146 | "Import necessary modules and dependencies." 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "metadata": { 152 | "id": "dzLKpmZICaWN" 153 | }, 154 | "source": [ 155 | "import os\n", 156 | "import pathlib\n", 157 | "\n", 158 | "import matplotlib.pyplot as plt\n", 159 | "import numpy as np\n", 160 | "import seaborn as sns\n", 161 | "import tensorflow as tf\n", 162 | "\n", 163 | "from tensorflow.keras.layers.experimental import preprocessing\n", 164 | "from tensorflow.keras import layers\n", 165 | "from tensorflow.keras import models\n", 166 | "from IPython import display\n", 167 | "\n", 168 | "\n", 169 | "# Set seed for experiment reproducibility\n", 170 | "seed = 42\n", 171 | "tf.random.set_seed(seed)\n", 172 | "np.random.seed(seed)" 173 | ], 174 | "execution_count": null, 175 | "outputs": [] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": { 180 | "id": "yR0EdgrLCaWR" 181 | }, 182 | "source": [ 183 | "## Import the Speech Commands dataset\n", 184 | "\n", 185 | "You'll write a script to download a portion of the [Speech Commands dataset](https://www.tensorflow.org/datasets/catalog/speech_commands). The original dataset consists of over 105,000 WAV audio files of people saying thirty different words. This data was collected by Google and released under a CC BY license.\n", 186 | "\n", 187 | "You'll be using a portion of the dataset to save time with data loading. Extract the `mini_speech_commands.zip` and load it in using the `tf.data` API." 
188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "metadata": { 193 | "id": "2-rayb7-3Y0I" 194 | }, 195 | "source": [ 196 | "data_dir = pathlib.Path('data/mini_speech_commands')\n", 197 | "if not data_dir.exists():\n", 198 | " tf.keras.utils.get_file(\n", 199 | " 'mini_speech_commands.zip',\n", 200 | " origin=\"http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip\",\n", 201 | " extract=True,\n", 202 | " cache_dir='.', cache_subdir='data')" 203 | ], 204 | "execution_count": null, 205 | "outputs": [] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": { 210 | "id": "BgvFq3uYiS5G" 211 | }, 212 | "source": [ 213 | "Check basic statistics about the dataset." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "metadata": { 219 | "id": "70IBxSKxA1N9" 220 | }, 221 | "source": [ 222 | "commands = np.array(tf.io.gfile.listdir(str(data_dir)))\n", 223 | "commands = commands[commands != 'README.md']\n", 224 | "print('Commands:', commands)" 225 | ], 226 | "execution_count": null, 227 | "outputs": [] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": { 232 | "id": "aMvdU9SY8WXN" 233 | }, 234 | "source": [ 235 | "Extract the audio files into a list and shuffle it." 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "metadata": { 241 | "id": "hlX685l1wD9k" 242 | }, 243 | "source": [ 244 | "filenames = tf.io.gfile.glob(str(data_dir) + '/*/*')\n", 245 | "filenames = tf.random.shuffle(filenames)\n", 246 | "num_samples = len(filenames)\n", 247 | "print('Number of total examples:', num_samples)\n", 248 | "print('Number of examples per label:',\n", 249 | " len(tf.io.gfile.listdir(str(data_dir/commands[0]))))\n", 250 | "print('Example file tensor:', filenames[0])" 251 | ], 252 | "execution_count": null, 253 | "outputs": [] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": { 258 | "id": "9vK3ymy23MCP" 259 | }, 260 | "source": [ 261 | "Split the files into training, validation and test sets using a 80:10:10 ratio, respectively." 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "metadata": { 267 | "id": "Cv_wts-l3KgD" 268 | }, 269 | "source": [ 270 | "train_files = filenames[:6400]\n", 271 | "val_files = filenames[6400: 6400 + 800]\n", 272 | "test_files = filenames[-800:]\n", 273 | "\n", 274 | "print('Training set size', len(train_files))\n", 275 | "print('Validation set size', len(val_files))\n", 276 | "print('Test set size', len(test_files))" 277 | ], 278 | "execution_count": null, 279 | "outputs": [] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": { 284 | "id": "g2Cj9FyvfweD" 285 | }, 286 | "source": [ 287 | "## Reading audio files and their labels" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": { 293 | "id": "j1zjcWteOcBy" 294 | }, 295 | "source": [ 296 | "The audio file will initially be read as a binary file, which you'll want to convert into a numerical tensor.\n", 297 | "\n", 298 | "To load an audio file, you will use [`tf.audio.decode_wav`](https://www.tensorflow.org/api_docs/python/tf/audio/decode_wav), which returns the WAV-encoded audio as a Tensor and the sample rate.\n", 299 | "\n", 300 | "A WAV file contains time series data with a set number of samples per second. \n", 301 | "Each sample represents the amplitude of the audio signal at that specific time. In a 16-bit system, like the files in `mini_speech_commands`, the values range from -32768 to 32767. 
\n", 302 | "The sample rate for this dataset is 16kHz.\n", 303 | "Note that `tf.audio.decode_wav` will normalize the values to the range [-1.0, 1.0]." 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "metadata": { 309 | "id": "9PjJ2iXYwftD" 310 | }, 311 | "source": [ 312 | "def decode_audio(audio_binary):\n", 313 | " audio, _ = tf.audio.decode_wav(audio_binary)\n", 314 | " return tf.squeeze(audio, axis=-1)" 315 | ], 316 | "execution_count": null, 317 | "outputs": [] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": { 322 | "id": "GPQseZElOjVN" 323 | }, 324 | "source": [ 325 | "The label for each WAV file is its parent directory." 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "metadata": { 331 | "id": "8VTtX1nr3YT-" 332 | }, 333 | "source": [ 334 | "def get_label(file_path):\n", 335 | " parts = tf.strings.split(file_path, os.path.sep)\n", 336 | "\n", 337 | " # Note: You'll use indexing here instead of tuple unpacking to enable this \n", 338 | " # to work in a TensorFlow graph.\n", 339 | " return parts[-2] " 340 | ], 341 | "execution_count": null, 342 | "outputs": [] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": { 347 | "id": "E8Y9w_5MOsr-" 348 | }, 349 | "source": [ 350 | "Let's define a method that will take in the filename of the WAV file and output a tuple containing the audio and labels for supervised training." 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "metadata": { 356 | "id": "WdgUD5T93NyT" 357 | }, 358 | "source": [ 359 | "def get_waveform_and_label(file_path):\n", 360 | " label = get_label(file_path)\n", 361 | " audio_binary = tf.io.read_file(file_path)\n", 362 | " waveform = decode_audio(audio_binary)\n", 363 | " return waveform, label" 364 | ], 365 | "execution_count": null, 366 | "outputs": [] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": { 371 | "id": "nvN8W_dDjYjc" 372 | }, 373 | "source": [ 374 | "You will now apply `process_path` to build your training set to extract the audio-label pairs and check the results. You'll build the validation and test sets using a similar procedure later on." 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "metadata": { 380 | "id": "0SQl8yXl3kNP" 381 | }, 382 | "source": [ 383 | "AUTOTUNE = tf.data.AUTOTUNE\n", 384 | "files_ds = tf.data.Dataset.from_tensor_slices(train_files)\n", 385 | "waveform_ds = files_ds.map(get_waveform_and_label, num_parallel_calls=AUTOTUNE)" 386 | ], 387 | "execution_count": null, 388 | "outputs": [] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "metadata": { 393 | "id": "voxGEwvuh2L7" 394 | }, 395 | "source": [ 396 | "Let's examine a few audio waveforms with their corresponding labels." 
397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "metadata": { 402 | "id": "8yuX6Nqzf6wT" 403 | }, 404 | "source": [ 405 | "rows = 3\n", 406 | "cols = 3\n", 407 | "n = rows*cols\n", 408 | "fig, axes = plt.subplots(rows, cols, figsize=(10, 12))\n", 409 | "for i, (audio, label) in enumerate(waveform_ds.take(n)):\n", 410 | " r = i // cols\n", 411 | " c = i % cols\n", 412 | " ax = axes[r][c]\n", 413 | " ax.plot(audio.numpy())\n", 414 | " ax.set_yticks(np.arange(-1.2, 1.2, 0.2))\n", 415 | " label = label.numpy().decode('utf-8')\n", 416 | " ax.set_title(label)\n", 417 | "\n", 418 | "plt.show()" 419 | ], 420 | "execution_count": null, 421 | "outputs": [] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": { 426 | "id": "EWXPphxm0B4m" 427 | }, 428 | "source": [ 429 | "## Spectrogram\n", 430 | "\n", 431 | "You'll convert the waveform into a spectrogram, which shows frequency changes over time and can be represented as a 2D image. This can be done by applying the short-time Fourier transform (STFT) to convert the audio into the time-frequency domain.\n", 432 | "\n", 433 | "A Fourier transform ([`tf.signal.fft`](https://www.tensorflow.org/api_docs/python/tf/signal/fft)) converts a signal to its component frequencies, but loses all time information. The STFT ([`tf.signal.stft`](https://www.tensorflow.org/api_docs/python/tf/signal/stft)) splits the signal into windows of time and runs a Fourier transform on each window, preserving some time information, and returning a 2D tensor that you can run standard convolutions on.\n", 434 | "\n", 435 | "STFT produces an array of complex numbers representing magnitude and phase. However, you'll only need the magnitude for this tutorial, which can be derived by applying `tf.abs` on the output of `tf.signal.stft`. \n", 436 | "\n", 437 | "Choose `frame_length` and `frame_step` parameters such that the generated spectrogram \"image\" is almost square. For more information on STFT parameters choice, you can refer to [this video](https://www.coursera.org/lecture/audio-signal-processing/stft-2-tjEQe) on audio signal processing. \n", 438 | "\n", 439 | "You also want the waveforms to have the same length, so that when you convert it to a spectrogram image, the results will have similar dimensions. This can be done by simply zero padding the audio clips that are shorter than one second.\n" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "metadata": { 445 | "id": "_4CK75DHz_OR" 446 | }, 447 | "source": [ 448 | "def get_spectrogram(waveform):\n", 449 | " # Padding for files with less than 16000 samples\n", 450 | " zero_padding = tf.zeros([16000] - tf.shape(waveform), dtype=tf.float32)\n", 451 | "\n", 452 | " # Concatenate audio with padding so that all audio clips will be of the \n", 453 | " # same length\n", 454 | " waveform = tf.cast(waveform, tf.float32)\n", 455 | " equal_length = tf.concat([waveform, zero_padding], 0)\n", 456 | " spectrogram = tf.signal.stft(\n", 457 | " equal_length, frame_length=255, frame_step=128)\n", 458 | " \n", 459 | " spectrogram = tf.abs(spectrogram)\n", 460 | "\n", 461 | " return spectrogram" 462 | ], 463 | "execution_count": null, 464 | "outputs": [] 465 | }, 466 | { 467 | "cell_type": "markdown", 468 | "metadata": { 469 | "id": "5rdPiPYJphs2" 470 | }, 471 | "source": [ 472 | "Next, you will explore the data. Compare the waveform, the spectrogram and the actual audio of one example from the dataset." 
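Before doing that, the claim that the spectrogram "image" is almost square can be sanity-checked with a short calculation. This is an illustrative sketch, not part of the original tutorial; it assumes the default behaviour of `tf.signal.stft`, where `fft_length` is the smallest power of two that is at least `frame_length`.

```python
# Rough shape calculation for the spectrogram produced by get_spectrogram()
# (assumes tf.signal.stft defaults: fft_length = next power of two >= frame_length).
samples = 16000        # one second of audio at 16 kHz, after zero-padding
frame_length = 255
frame_step = 128
fft_length = 256       # smallest power of two >= frame_length

num_frames = 1 + (samples - frame_length) // frame_step  # 124 time frames
num_bins = fft_length // 2 + 1                           # 129 frequency bins
print(num_frames, num_bins)                              # 124 x 129 -- close to square
```

The `Spectrogram shape` printed by the next cell should agree with these numbers.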
473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "metadata": { 478 | "id": "4Mu6Y7Yz3C-V" 479 | }, 480 | "source": [ 481 | "for waveform, label in waveform_ds.take(1):\n", 482 | " label = label.numpy().decode('utf-8')\n", 483 | " spectrogram = get_spectrogram(waveform)\n", 484 | "\n", 485 | "print('Label:', label)\n", 486 | "print('Waveform shape:', waveform.shape)\n", 487 | "print('Spectrogram shape:', spectrogram.shape)\n", 488 | "print('Audio playback')\n", 489 | "display.display(display.Audio(waveform, rate=16000))" 490 | ], 491 | "execution_count": null, 492 | "outputs": [] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "metadata": { 497 | "id": "e62jzb36-Jog" 498 | }, 499 | "source": [ 500 | "def plot_spectrogram(spectrogram, ax):\n", 501 | " # Convert to frequencies to log scale and transpose so that the time is\n", 502 | " # represented in the x-axis (columns).\n", 503 | " log_spec = np.log(spectrogram.T)\n", 504 | " height = log_spec.shape[0]\n", 505 | " width = log_spec.shape[1]\n", 506 | " X = np.linspace(0, np.size(spectrogram), num=width, dtype=int)\n", 507 | " Y = range(height)\n", 508 | " ax.pcolormesh(X, Y, log_spec)\n", 509 | "\n", 510 | "\n", 511 | "fig, axes = plt.subplots(2, figsize=(12, 8))\n", 512 | "timescale = np.arange(waveform.shape[0])\n", 513 | "axes[0].plot(timescale, waveform.numpy())\n", 514 | "axes[0].set_title('Waveform')\n", 515 | "axes[0].set_xlim([0, 16000])\n", 516 | "plot_spectrogram(spectrogram.numpy(), axes[1])\n", 517 | "axes[1].set_title('Spectrogram')\n", 518 | "plt.show()" 519 | ], 520 | "execution_count": null, 521 | "outputs": [] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": { 526 | "id": "GyYXjW07jCHA" 527 | }, 528 | "source": [ 529 | "Now transform the waveform dataset to have spectrogram images and their corresponding labels as integer IDs." 530 | ] 531 | }, 532 | { 533 | "cell_type": "code", 534 | "metadata": { 535 | "id": "43IS2IouEV40" 536 | }, 537 | "source": [ 538 | "def get_spectrogram_and_label_id(audio, label):\n", 539 | " spectrogram = get_spectrogram(audio)\n", 540 | " spectrogram = tf.expand_dims(spectrogram, -1)\n", 541 | " label_id = tf.argmax(label == commands)\n", 542 | " return spectrogram, label_id" 543 | ], 544 | "execution_count": null, 545 | "outputs": [] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "metadata": { 550 | "id": "yEVb_oK0oBLQ" 551 | }, 552 | "source": [ 553 | "spectrogram_ds = waveform_ds.map(\n", 554 | " get_spectrogram_and_label_id, num_parallel_calls=AUTOTUNE)" 555 | ], 556 | "execution_count": null, 557 | "outputs": [] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": { 562 | "id": "6gQpAAgMnyDi" 563 | }, 564 | "source": [ 565 | "Examine the spectrogram \"images\" for different samples of the dataset." 
566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "metadata": { 571 | "id": "QUbHfTuon4iF" 572 | }, 573 | "source": [ 574 | "rows = 3\n", 575 | "cols = 3\n", 576 | "n = rows*cols\n", 577 | "fig, axes = plt.subplots(rows, cols, figsize=(10, 10))\n", 578 | "for i, (spectrogram, label_id) in enumerate(spectrogram_ds.take(n)):\n", 579 | " r = i // cols\n", 580 | " c = i % cols\n", 581 | " ax = axes[r][c]\n", 582 | " plot_spectrogram(np.squeeze(spectrogram.numpy()), ax)\n", 583 | " ax.set_title(commands[label_id.numpy()])\n", 584 | " ax.axis('off')\n", 585 | " \n", 586 | "plt.show()" 587 | ], 588 | "execution_count": null, 589 | "outputs": [] 590 | }, 591 | { 592 | "cell_type": "markdown", 593 | "metadata": { 594 | "id": "z5KdY8IF8rkt" 595 | }, 596 | "source": [ 597 | "## Build and train the model\n", 598 | "\n", 599 | "Now you can build and train your model. But before you do that, you'll need to repeat the training set preprocessing on the validation and test sets." 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "metadata": { 605 | "id": "10UI32QH_45b" 606 | }, 607 | "source": [ 608 | "def preprocess_dataset(files):\n", 609 | " files_ds = tf.data.Dataset.from_tensor_slices(files)\n", 610 | " output_ds = files_ds.map(get_waveform_and_label, num_parallel_calls=AUTOTUNE)\n", 611 | " output_ds = output_ds.map(\n", 612 | " get_spectrogram_and_label_id, num_parallel_calls=AUTOTUNE)\n", 613 | " return output_ds" 614 | ], 615 | "execution_count": null, 616 | "outputs": [] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "metadata": { 621 | "id": "HNv4xwYkB2P6" 622 | }, 623 | "source": [ 624 | "train_ds = spectrogram_ds\n", 625 | "val_ds = preprocess_dataset(val_files)\n", 626 | "test_ds = preprocess_dataset(test_files)" 627 | ], 628 | "execution_count": null, 629 | "outputs": [] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": { 634 | "id": "assnWo6SB3lR" 635 | }, 636 | "source": [ 637 | "Batch the training and validation sets for model training." 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "metadata": { 643 | "id": "UgY9WYzn61EX" 644 | }, 645 | "source": [ 646 | "batch_size = 64\n", 647 | "train_ds = train_ds.batch(batch_size)\n", 648 | "val_ds = val_ds.batch(batch_size)" 649 | ], 650 | "execution_count": null, 651 | "outputs": [] 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "metadata": { 656 | "id": "GS1uIh6F_TN9" 657 | }, 658 | "source": [ 659 | "Add dataset [`cache()`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#cache) and [`prefetch()`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#prefetch) operations to reduce read latency while training the model." 
660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "metadata": { 665 | "id": "fdZ6M-F5_QzY" 666 | }, 667 | "source": [ 668 | "train_ds = train_ds.cache().prefetch(AUTOTUNE)\n", 669 | "val_ds = val_ds.cache().prefetch(AUTOTUNE)" 670 | ], 671 | "execution_count": null, 672 | "outputs": [] 673 | }, 674 | { 675 | "cell_type": "markdown", 676 | "metadata": { 677 | "id": "rwHkKCQQb5oW" 678 | }, 679 | "source": [ 680 | "For the model, you'll use a simple convolutional neural network (CNN), since you have transformed the audio files into spectrogram images.\n", 681 | "The model also has the following additional preprocessing layers:\n", 682 | "- A [`Resizing`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Resizing) layer to downsample the input to enable the model to train faster.\n", 683 | "- A [`Normalization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Normalization) layer to normalize each pixel in the image based on its mean and standard deviation.\n", 684 | "\n", 685 | "For the `Normalization` layer, its `adapt` method would first need to be called on the training data in order to compute aggregate statistics (i.e. mean and standard deviation)." 686 | ] 687 | }, 688 | { 689 | "cell_type": "code", 690 | "metadata": { 691 | "id": "ALYz7PFCHblP" 692 | }, 693 | "source": [ 694 | "for spectrogram, _ in spectrogram_ds.take(1):\n", 695 | " input_shape = spectrogram.shape\n", 696 | "print('Input shape:', input_shape)\n", 697 | "num_labels = len(commands)\n", 698 | "\n", 699 | "norm_layer = preprocessing.Normalization()\n", 700 | "norm_layer.adapt(spectrogram_ds.map(lambda x, _: x))\n", 701 | "\n", 702 | "model = models.Sequential([\n", 703 | " layers.Input(shape=input_shape),\n", 704 | " preprocessing.Resizing(32, 32), \n", 705 | " norm_layer,\n", 706 | " layers.Conv2D(32, 3, activation='relu'),\n", 707 | " layers.Conv2D(64, 3, activation='relu'),\n", 708 | " layers.MaxPooling2D(),\n", 709 | " layers.Dropout(0.25),\n", 710 | " layers.Flatten(),\n", 711 | " layers.Dense(128, activation='relu'),\n", 712 | " layers.Dropout(0.5),\n", 713 | " layers.Dense(num_labels),\n", 714 | "])\n", 715 | "\n", 716 | "model.summary()" 717 | ], 718 | "execution_count": null, 719 | "outputs": [] 720 | }, 721 | { 722 | "cell_type": "code", 723 | "metadata": { 724 | "id": "wFjj7-EmsTD-" 725 | }, 726 | "source": [ 727 | "model.compile(\n", 728 | " optimizer=tf.keras.optimizers.Adam(),\n", 729 | " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", 730 | " metrics=['accuracy'],\n", 731 | ")" 732 | ], 733 | "execution_count": null, 734 | "outputs": [] 735 | }, 736 | { 737 | "cell_type": "code", 738 | "metadata": { 739 | "id": "ttioPJVMcGtq" 740 | }, 741 | "source": [ 742 | "EPOCHS = 10\n", 743 | "history = model.fit(\n", 744 | " train_ds, \n", 745 | " validation_data=val_ds, \n", 746 | " epochs=EPOCHS,\n", 747 | " callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=2),\n", 748 | ")" 749 | ], 750 | "execution_count": null, 751 | "outputs": [] 752 | }, 753 | { 754 | "cell_type": "markdown", 755 | "metadata": { 756 | "id": "gjpCDeQ4mUfS" 757 | }, 758 | "source": [ 759 | "Let's check the training and validation loss curves to see how your model has improved during training." 
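The next cell plots the loss curves. Because the model was compiled with `metrics=['accuracy']`, the same `history` object also holds accuracy curves; here is a small optional sketch, not part of the original tutorial (note that older TensorFlow releases use the key `'acc'` instead of `'accuracy'`).

```python
# Optional: plot training/validation accuracy from the same History object.
metrics = history.history
plt.plot(history.epoch, metrics['accuracy'], label='accuracy')
plt.plot(history.epoch, metrics['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```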
760 | ] 761 | }, 762 | { 763 | "cell_type": "code", 764 | "metadata": { 765 | "id": "nzhipg3Gu2AY" 766 | }, 767 | "source": [ 768 | "metrics = history.history\n", 769 | "plt.plot(history.epoch, metrics['loss'], metrics['val_loss'])\n", 770 | "plt.legend(['loss', 'val_loss'])\n", 771 | "plt.show()" 772 | ], 773 | "execution_count": null, 774 | "outputs": [] 775 | }, 776 | { 777 | "cell_type": "markdown", 778 | "metadata": { 779 | "id": "5ZTt3kO3mfm4" 780 | }, 781 | "source": [ 782 | "## Evaluate test set performance\n", 783 | "\n", 784 | "Let's run the model on the test set and check performance." 785 | ] 786 | }, 787 | { 788 | "cell_type": "code", 789 | "metadata": { 790 | "id": "biU2MwzyAo8o" 791 | }, 792 | "source": [ 793 | "test_audio = []\n", 794 | "test_labels = []\n", 795 | "\n", 796 | "for audio, label in test_ds:\n", 797 | " test_audio.append(audio.numpy())\n", 798 | " test_labels.append(label.numpy())\n", 799 | "\n", 800 | "test_audio = np.array(test_audio)\n", 801 | "test_labels = np.array(test_labels)" 802 | ], 803 | "execution_count": null, 804 | "outputs": [] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "metadata": { 809 | "id": "ktUanr9mRZky" 810 | }, 811 | "source": [ 812 | "y_pred = np.argmax(model.predict(test_audio), axis=1)\n", 813 | "y_true = test_labels\n", 814 | "\n", 815 | "test_acc = sum(y_pred == y_true) / len(y_true)\n", 816 | "print(f'Test set accuracy: {test_acc:.0%}')" 817 | ], 818 | "execution_count": null, 819 | "outputs": [] 820 | }, 821 | { 822 | "cell_type": "markdown", 823 | "metadata": { 824 | "id": "en9Znt1NOabH" 825 | }, 826 | "source": [ 827 | "### Display a confusion matrix\n", 828 | "\n", 829 | "A confusion matrix is helpful to see how well the model did on each of the commands in the test set." 830 | ] 831 | }, 832 | { 833 | "cell_type": "code", 834 | "metadata": { 835 | "id": "LvoSAOiXU3lL" 836 | }, 837 | "source": [ 838 | "confusion_mtx = tf.math.confusion_matrix(y_true, y_pred) \n", 839 | "plt.figure(figsize=(10, 8))\n", 840 | "sns.heatmap(confusion_mtx, xticklabels=commands, yticklabels=commands, \n", 841 | " annot=True, fmt='g')\n", 842 | "plt.xlabel('Prediction')\n", 843 | "plt.ylabel('Label')\n", 844 | "plt.show()" 845 | ], 846 | "execution_count": null, 847 | "outputs": [] 848 | }, 849 | { 850 | "cell_type": "markdown", 851 | "metadata": { 852 | "id": "mQGi_mzPcLvl" 853 | }, 854 | "source": [ 855 | "## Run inference on an audio file\n", 856 | "\n", 857 | "Finally, verify the model's prediction output using an input audio file of someone saying \"no.\" How well does your model perform?" 
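Before looking at a single clip, the per-command behaviour shown in the confusion matrix above can also be summarized as precision and recall. This is an optional sketch, not part of the original tutorial; it assumes scikit-learn is available in the runtime (it is preinstalled on Colab).

```python
# Optional per-class summary complementing the confusion matrix above.
from sklearn.metrics import classification_report

print(classification_report(y_true, y_pred,
                            labels=np.arange(len(commands)),
                            target_names=commands))
```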
858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "metadata": { 863 | "id": "zRxauKMdhofU" 864 | }, 865 | "source": [ 866 | "sample_file = data_dir/'no/01bb6a2a_nohash_0.wav'\n", 867 | "\n", 868 | "sample_ds = preprocess_dataset([str(sample_file)])\n", 869 | "\n", 870 | "for spectrogram, label in sample_ds.batch(1):\n", 871 | " prediction = model(spectrogram)\n", 872 | " plt.bar(commands, tf.nn.softmax(prediction[0]))\n", 873 | " plt.title(f'Predictions for \"{commands[label[0]]}\"')\n", 874 | " plt.show()" 875 | ], 876 | "execution_count": null, 877 | "outputs": [] 878 | }, 879 | { 880 | "cell_type": "markdown", 881 | "metadata": { 882 | "id": "VgWICqdqQNaQ" 883 | }, 884 | "source": [ 885 | "You can see that your model very clearly recognized the audio command as \"no.\"" 886 | ] 887 | }, 888 | { 889 | "cell_type": "markdown", 890 | "metadata": { 891 | "id": "J3jF933m9z1J" 892 | }, 893 | "source": [ 894 | "## Next steps\n", 895 | "\n", 896 | "This tutorial showed how you could do simple audio classification using a convolutional neural network with TensorFlow and Python.\n", 897 | "\n", 898 | "* To learn how to use transfer learning for audio classification, check out the [Sound classification with YAMNet](https://www.tensorflow.org/hub/tutorials/yamnet) tutorial.\n", 899 | "\n", 900 | "* To build your own interactive web app for audio classification, consider taking the [TensorFlow.js - Audio recognition using transfer learning codelab](https://codelabs.developers.google.com/codelabs/tensorflowjs-audio-codelab/index.html#0).\n", 901 | "\n", 902 | "* TensorFlow also has additional support for [audio data preparation and augmentation](https://www.tensorflow.org/io/tutorials/audio) to help with your own audio-based projects.\n" 903 | ] 904 | } 905 | ] 906 | } -------------------------------------------------------------------------------- /timbre_transfer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "accelerator": "GPU", 6 | "colab": { 7 | "name": "timbre_transfer.ipynb", 8 | "provenance": [], 9 | "collapsed_sections": [ 10 | "3YLyiTwPfVCT" 11 | ] 12 | }, 13 | "kernelspec": { 14 | "display_name": "Python 3", 15 | "name": "python3" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "3YLyiTwPfVCT" 23 | }, 24 | "source": [ 25 | "\"Open\n", 26 | "\n", 27 | "##### Copyright 2021 Google LLC.\n", 28 | "\n", 29 | "Licensed under the Apache License, Version 2.0 (the \"License\");\n", 30 | "\n", 31 | "\n", 32 | "\n" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "metadata": { 38 | "id": "Bvp6GWqtfVCW" 39 | }, 40 | "source": [ 41 | "# Copyright 2021 Google LLC. 
All Rights Reserved.\n", 42 | "\n", 43 | "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", 44 | "# you may not use this file except in compliance with the License.\n", 45 | "# You may obtain a copy of the License at\n", 46 | "\n", 47 | "# http://www.apache.org/licenses/LICENSE-2.0\n", 48 | "\n", 49 | "# Unless required by applicable law or agreed to in writing, software\n", 50 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 51 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 52 | "# See the License for the specific language governing permissions and\n", 53 | "# limitations under the License.\n", 54 | "# ==============================================================================" 55 | ], 56 | "execution_count": 1, 57 | "outputs": [] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": { 62 | "id": "Tlqv2jiMPQBs" 63 | }, 64 | "source": [ 65 | "**Modified by Renato Profeta to embed Video Tutorials. June, 2021**" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": { 71 | "id": "JndnmDMp66FL" 72 | }, 73 | "source": [ 74 | "# DDSP Timbre Transfer Demo\n", 75 | "\n", 76 | "This notebook is a demo of timbre transfer using DDSP (Differentiable Digital Signal Processing). \n", 77 | "The model here is trained to generate audio conditioned on a time series of fundamental frequency and loudness. \n", 78 | "\n", 79 | "* [DDSP ICLR paper](https://openreview.net/forum?id=B1x1ma4tDr)\n", 80 | "* [Audio Examples](http://goo.gl/magenta/ddsp-examples) \n", 81 | "\n", 82 | "This notebook extracts these features from input audio (either uploaded files, or recorded from the microphone) and resynthesizes with the model. \n", 83 | "\n", 84 | "\"DDSP\n", 85 | "\n", 86 | "\n", 87 | "\n", 88 | "By default, the notebook will download pre-trained models. You can train a model on your own sounds by using the [Train Autoencoder Colab](https://github.com/magenta/ddsp/blob/master/ddsp/colab/demos/train_autoencoder.ipynb).\n", 89 | "\n", 90 | "Have fun! And please feel free to hack this notebook to make your own creative interactions.\n", 91 | "\n", 92 | "\n", 93 | "### Instructions for running:\n", 94 | "\n", 95 | "* Make sure to use a GPU runtime, click: __Runtime >> Change Runtime Type >> GPU__\n", 96 | "* Press ▶️ on the left of each of the cells\n", 97 | "* View the code: Double-click any of the cells\n", 98 | "* Hide the code: Double click the right side of the cell\n", 99 | "\n", 100 | "\n", 101 | "\n" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "metadata": { 107 | "cellView": "form", 108 | "id": "g4yFMLgkP3XK", 109 | "outputId": "54150a23-b5e8-41ce-d581-168b29a38085", 110 | "colab": { 111 | "base_uri": "https://localhost:8080/", 112 | "height": 336 113 | } 114 | }, 115 | "source": [ 116 | "#@title\n", 117 | "%%html\n", 118 | "" 119 | ], 120 | "execution_count": 3, 121 | "outputs": [ 122 | { 123 | "output_type": "display_data", 124 | "data": { 125 | "text/html": [ 126 | "" 127 | ], 128 | "text/plain": [ 129 | "" 130 | ] 131 | }, 132 | "metadata": { 133 | "tags": [] 134 | } 135 | } 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "metadata": { 141 | "cellView": "form", 142 | "id": "6wZde6CBya9k" 143 | }, 144 | "source": [ 145 | "#@title #Install and Import\n", 146 | "\n", 147 | "#@markdown Install ddsp, define some helper functions, and download the model. 
This transfers a lot of data and _should take a minute or two_.\n", 148 | "print('Installing from pip package...')\n", 149 | "!pip install -qU ddsp==1.4.0\n", 150 | "\n", 151 | "# Ignore a bunch of deprecation warnings\n", 152 | "import warnings\n", 153 | "warnings.filterwarnings(\"ignore\")\n", 154 | "\n", 155 | "import copy\n", 156 | "import os\n", 157 | "import time\n", 158 | "\n", 159 | "import crepe\n", 160 | "import ddsp\n", 161 | "import ddsp.training\n", 162 | "from ddsp.colab import colab_utils\n", 163 | "from ddsp.colab.colab_utils import (\n", 164 | " auto_tune, detect_notes, fit_quantile_transform, \n", 165 | " get_tuning_factor, download, play, record, \n", 166 | " specplot, upload, DEFAULT_SAMPLE_RATE)\n", 167 | "import gin\n", 168 | "from google.colab import files\n", 169 | "import librosa\n", 170 | "import matplotlib.pyplot as plt\n", 171 | "import numpy as np\n", 172 | "import pickle\n", 173 | "import tensorflow.compat.v2 as tf\n", 174 | "import tensorflow_datasets as tfds\n", 175 | "\n", 176 | "# Helper Functions\n", 177 | "sample_rate = DEFAULT_SAMPLE_RATE # 16000\n", 178 | "\n", 179 | "\n", 180 | "print('Done!')" 181 | ], 182 | "execution_count": null, 183 | "outputs": [] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "metadata": { 188 | "cellView": "form", 189 | "id": "vj8yoh57QQ5n", 190 | "outputId": "59ccec6f-8d8e-494a-9704-1388704c11a0", 191 | "colab": { 192 | "base_uri": "https://localhost:8080/", 193 | "height": 336 194 | } 195 | }, 196 | "source": [ 197 | "#@title\n", 198 | "%%html\n", 199 | "" 200 | ], 201 | "execution_count": 4, 202 | "outputs": [ 203 | { 204 | "output_type": "display_data", 205 | "data": { 206 | "text/html": [ 207 | "" 208 | ], 209 | "text/plain": [ 210 | "" 211 | ] 212 | }, 213 | "metadata": { 214 | "tags": [] 215 | } 216 | } 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "metadata": { 222 | "cellView": "form", 223 | "id": "Go36QW9AS_CD" 224 | }, 225 | "source": [ 226 | "#@title Record or Upload Audio\n", 227 | "#@markdown * Either record audio from microphone or upload audio from file (.mp3 or .wav) \n", 228 | "#@markdown * Audio should be monophonic (single instrument / voice)\n", 229 | "#@markdown * Extracts fundmanetal frequency (f0) and loudness features. 
\n", 230 | "\n", 231 | "record_or_upload = \"Record\" #@param [\"Record\", \"Upload (.mp3 or .wav)\"]\n", 232 | "\n", 233 | "record_seconds = 5#@param {type:\"number\", min:1, max:10, step:1}\n", 234 | "\n", 235 | "if record_or_upload == \"Record\":\n", 236 | " audio = record(seconds=record_seconds)\n", 237 | "else:\n", 238 | " # Load audio sample here (.mp3 or .wav3 file)\n", 239 | " # Just use the first file.\n", 240 | " filenames, audios = upload()\n", 241 | " audio = audios[0]\n", 242 | "audio = audio[np.newaxis, :]\n", 243 | "print('\\nExtracting audio features...')\n", 244 | "\n", 245 | "# Plot.\n", 246 | "specplot(audio)\n", 247 | "play(audio)\n", 248 | "\n", 249 | "# Setup the session.\n", 250 | "ddsp.spectral_ops.reset_crepe()\n", 251 | "\n", 252 | "# Compute features.\n", 253 | "start_time = time.time()\n", 254 | "audio_features = ddsp.training.metrics.compute_audio_features(audio)\n", 255 | "audio_features['loudness_db'] = audio_features['loudness_db'].astype(np.float32)\n", 256 | "audio_features_mod = None\n", 257 | "print('Audio features took %.1f seconds' % (time.time() - start_time))\n", 258 | "\n", 259 | "\n", 260 | "TRIM = -15\n", 261 | "# Plot Features.\n", 262 | "fig, ax = plt.subplots(nrows=3, \n", 263 | " ncols=1, \n", 264 | " sharex=True,\n", 265 | " figsize=(6, 8))\n", 266 | "ax[0].plot(audio_features['loudness_db'][:TRIM])\n", 267 | "ax[0].set_ylabel('loudness_db')\n", 268 | "\n", 269 | "ax[1].plot(librosa.hz_to_midi(audio_features['f0_hz'][:TRIM]))\n", 270 | "ax[1].set_ylabel('f0 [midi]')\n", 271 | "\n", 272 | "ax[2].plot(audio_features['f0_confidence'][:TRIM])\n", 273 | "ax[2].set_ylabel('f0 confidence')\n", 274 | "_ = ax[2].set_xlabel('Time step [frame]')\n", 275 | "\n" 276 | ], 277 | "execution_count": null, 278 | "outputs": [] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "metadata": { 283 | "cellView": "form", 284 | "id": "o-LmqFdKQbP3", 285 | "outputId": "0832925a-e72c-4268-fb95-65851015e451", 286 | "colab": { 287 | "base_uri": "https://localhost:8080/", 288 | "height": 336 289 | } 290 | }, 291 | "source": [ 292 | "#@title\n", 293 | "%%html\n", 294 | "" 295 | ], 296 | "execution_count": 7, 297 | "outputs": [ 298 | { 299 | "output_type": "display_data", 300 | "data": { 301 | "text/html": [ 302 | "" 303 | ], 304 | "text/plain": [ 305 | "" 306 | ] 307 | }, 308 | "metadata": { 309 | "tags": [] 310 | } 311 | } 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "metadata": { 317 | "cellView": "form", 318 | "id": "lnBatHR6Qpfx", 319 | "outputId": "91776e1b-d806-4514-cb32-7f67d0325874", 320 | "colab": { 321 | "base_uri": "https://localhost:8080/", 322 | "height": 336 323 | } 324 | }, 325 | "source": [ 326 | "#@title\n", 327 | "%%html\n", 328 | "" 329 | ], 330 | "execution_count": 8, 331 | "outputs": [ 332 | { 333 | "output_type": "display_data", 334 | "data": { 335 | "text/html": [ 336 | "" 337 | ], 338 | "text/plain": [ 339 | "" 340 | ] 341 | }, 342 | "metadata": { 343 | "tags": [] 344 | } 345 | } 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "metadata": { 351 | "cellView": "form", 352 | "id": "DFVASFSkQ0HO", 353 | "outputId": "5f82da4c-c923-4dea-c860-b6caad36e731", 354 | "colab": { 355 | "base_uri": "https://localhost:8080/", 356 | "height": 336 357 | } 358 | }, 359 | "source": [ 360 | "#@title\n", 361 | "%%html\n", 362 | "" 363 | ], 364 | "execution_count": 9, 365 | "outputs": [ 366 | { 367 | "output_type": "display_data", 368 | "data": { 369 | "text/html": [ 370 | "" 371 | ], 372 | "text/plain": [ 373 | "" 374 | ] 375 | }, 376 | "metadata": 
{ 377 | "tags": [] 378 | } 379 | } 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "metadata": { 385 | "cellView": "form", 386 | "id": "wmSGDWM5yyjm" 387 | }, 388 | "source": [ 389 | "#@title Load a model\n", 390 | "#@markdown Run for ever new audio input\n", 391 | "model = 'Violin' #@param ['Violin', 'Flute', 'Flute2', 'Trumpet', 'Tenor_Saxophone', 'Upload your own (checkpoint folder as .zip)']\n", 392 | "MODEL = model\n", 393 | "\n", 394 | "\n", 395 | "def find_model_dir(dir_name):\n", 396 | " # Iterate through directories until model directory is found\n", 397 | " for root, dirs, filenames in os.walk(dir_name):\n", 398 | " for filename in filenames:\n", 399 | " if filename.endswith(\".gin\") and not filename.startswith(\".\"):\n", 400 | " model_dir = root\n", 401 | " break\n", 402 | " return model_dir \n", 403 | "\n", 404 | "if model in ('Violin', 'Flute', 'Flute2', 'Trumpet', 'Tenor_Saxophone'):\n", 405 | " # Pretrained models.\n", 406 | " PRETRAINED_DIR = '/content/pretrained'\n", 407 | " # Copy over from gs:// for faster loading.\n", 408 | " !rm -r $PRETRAINED_DIR &> /dev/null\n", 409 | " !mkdir $PRETRAINED_DIR &> /dev/null\n", 410 | " GCS_CKPT_DIR = 'gs://ddsp/models/timbre_transfer_colab/2021-01-06'\n", 411 | " model_dir = os.path.join(GCS_CKPT_DIR, 'solo_%s_ckpt' % model.lower())\n", 412 | " \n", 413 | " !gsutil cp $model_dir/* $PRETRAINED_DIR &> /dev/null\n", 414 | " model_dir = PRETRAINED_DIR\n", 415 | " gin_file = os.path.join(model_dir, 'operative_config-0.gin')\n", 416 | "\n", 417 | "else:\n", 418 | " # User models.\n", 419 | " UPLOAD_DIR = '/content/uploaded'\n", 420 | " !mkdir $UPLOAD_DIR\n", 421 | " uploaded_files = files.upload()\n", 422 | "\n", 423 | " for fnames in uploaded_files.keys():\n", 424 | " print(\"Unzipping... 
{}\".format(fnames))\n", 425 | " !unzip -o \"/content/$fnames\" -d $UPLOAD_DIR &> /dev/null\n", 426 | " model_dir = find_model_dir(UPLOAD_DIR)\n", 427 | " gin_file = os.path.join(model_dir, 'operative_config-0.gin')\n", 428 | "\n", 429 | "\n", 430 | "# Load the dataset statistics.\n", 431 | "DATASET_STATS = None\n", 432 | "dataset_stats_file = os.path.join(model_dir, 'dataset_statistics.pkl')\n", 433 | "print(f'Loading dataset statistics from {dataset_stats_file}')\n", 434 | "try:\n", 435 | " if tf.io.gfile.exists(dataset_stats_file):\n", 436 | " with tf.io.gfile.GFile(dataset_stats_file, 'rb') as f:\n", 437 | " DATASET_STATS = pickle.load(f)\n", 438 | "except Exception as err:\n", 439 | " print('Loading dataset statistics from pickle failed: {}.'.format(err))\n", 440 | "\n", 441 | "\n", 442 | "# Parse gin config,\n", 443 | "with gin.unlock_config():\n", 444 | " gin.parse_config_file(gin_file, skip_unknown=True)\n", 445 | "\n", 446 | "# Assumes only one checkpoint in the folder, 'ckpt-[iter]`.\n", 447 | "ckpt_files = [f for f in tf.io.gfile.listdir(model_dir) if 'ckpt' in f]\n", 448 | "ckpt_name = ckpt_files[0].split('.')[0]\n", 449 | "ckpt = os.path.join(model_dir, ckpt_name)\n", 450 | "\n", 451 | "# Ensure dimensions and sampling rates are equal\n", 452 | "time_steps_train = gin.query_parameter('F0LoudnessPreprocessor.time_steps')\n", 453 | "n_samples_train = gin.query_parameter('Harmonic.n_samples')\n", 454 | "hop_size = int(n_samples_train / time_steps_train)\n", 455 | "\n", 456 | "time_steps = int(audio.shape[1] / hop_size)\n", 457 | "n_samples = time_steps * hop_size\n", 458 | "\n", 459 | "# print(\"===Trained model===\")\n", 460 | "# print(\"Time Steps\", time_steps_train)\n", 461 | "# print(\"Samples\", n_samples_train)\n", 462 | "# print(\"Hop Size\", hop_size)\n", 463 | "# print(\"\\n===Resynthesis===\")\n", 464 | "# print(\"Time Steps\", time_steps)\n", 465 | "# print(\"Samples\", n_samples)\n", 466 | "# print('')\n", 467 | "\n", 468 | "gin_params = [\n", 469 | " 'Harmonic.n_samples = {}'.format(n_samples),\n", 470 | " 'FilteredNoise.n_samples = {}'.format(n_samples),\n", 471 | " 'F0LoudnessPreprocessor.time_steps = {}'.format(time_steps),\n", 472 | " 'oscillator_bank.use_angular_cumsum = True', # Avoids cumsum accumulation errors.\n", 473 | "]\n", 474 | "\n", 475 | "with gin.unlock_config():\n", 476 | " gin.parse_config(gin_params)\n", 477 | "\n", 478 | "\n", 479 | "# Trim all input vectors to correct lengths \n", 480 | "for key in ['f0_hz', 'f0_confidence', 'loudness_db']:\n", 481 | " audio_features[key] = audio_features[key][:time_steps]\n", 482 | "audio_features['audio'] = audio_features['audio'][:, :n_samples]\n", 483 | "\n", 484 | "\n", 485 | "# Set up the model just to predict audio given new conditioning\n", 486 | "model = ddsp.training.models.Autoencoder()\n", 487 | "model.restore(ckpt)\n", 488 | "\n", 489 | "# Build model by running a batch through it.\n", 490 | "start_time = time.time()\n", 491 | "_ = model(audio_features, training=False)\n", 492 | "print('Restoring model took %.1f seconds' % (time.time() - start_time))" 493 | ], 494 | "execution_count": null, 495 | "outputs": [] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "metadata": { 500 | "cellView": "form", 501 | "id": "8d2LvJ5JQkq9", 502 | "outputId": "99521efa-e56e-439f-d85c-24ee9aa470e5", 503 | "colab": { 504 | "base_uri": "https://localhost:8080/", 505 | "height": 336 506 | } 507 | }, 508 | "source": [ 509 | "#@title\n", 510 | "%%html\n", 511 | "" 512 | ], 513 | "execution_count": 10, 514 | "outputs": [ 
515 | { 516 | "output_type": "display_data", 517 | "data": { 518 | "text/html": [ 519 | "" 520 | ], 521 | "text/plain": [ 522 | "" 523 | ] 524 | }, 525 | "metadata": { 526 | "tags": [] 527 | } 528 | } 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "metadata": { 534 | "cellView": "form", 535 | "id": "V8Aqu-ypRNsU", 536 | "outputId": "2a9d7ce1-e169-442d-bf97-b2543bf22eee", 537 | "colab": { 538 | "base_uri": "https://localhost:8080/", 539 | "height": 336 540 | } 541 | }, 542 | "source": [ 543 | "#@title\n", 544 | "%%html\n", 545 | "" 546 | ], 547 | "execution_count": 11, 548 | "outputs": [ 549 | { 550 | "output_type": "display_data", 551 | "data": { 552 | "text/html": [ 553 | "" 554 | ], 555 | "text/plain": [ 556 | "" 557 | ] 558 | }, 559 | "metadata": { 560 | "tags": [] 561 | } 562 | } 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "metadata": { 568 | "cellView": "form", 569 | "id": "uQFUlIJ_5r36" 570 | }, 571 | "source": [ 572 | "#@title Modify conditioning\n", 573 | "\n", 574 | "#@markdown These models were not explicitly trained to perform timbre transfer, so they may sound unnatural if the incoming loudness and frequencies are very different then the training data (which will always be somewhat true). \n", 575 | "\n", 576 | "\n", 577 | "#@markdown ## Note Detection\n", 578 | "\n", 579 | "#@markdown You can leave this at 1.0 for most cases\n", 580 | "threshold = 1 #@param {type:\"slider\", min: 0.0, max:2.0, step:0.01}\n", 581 | "\n", 582 | "\n", 583 | "#@markdown ## Automatic\n", 584 | "\n", 585 | "ADJUST = True #@param{type:\"boolean\"}\n", 586 | "\n", 587 | "#@markdown Quiet parts without notes detected (dB)\n", 588 | "quiet = 20 #@param {type:\"slider\", min: 0, max:60, step:1}\n", 589 | "\n", 590 | "#@markdown Force pitch to nearest note (amount)\n", 591 | "autotune = 0 #@param {type:\"slider\", min: 0.0, max:1.0, step:0.1}\n", 592 | "\n", 593 | "#@markdown ## Manual\n", 594 | "\n", 595 | "\n", 596 | "#@markdown Shift the pitch (octaves)\n", 597 | "pitch_shift = 0 #@param {type:\"slider\", min:-2, max:2, step:1}\n", 598 | "\n", 599 | "#@markdown Adjsut the overall loudness (dB)\n", 600 | "loudness_shift = 0 #@param {type:\"slider\", min:-20, max:20, step:1}\n", 601 | "\n", 602 | "\n", 603 | "audio_features_mod = {k: v.copy() for k, v in audio_features.items()}\n", 604 | "\n", 605 | "\n", 606 | "## Helper functions.\n", 607 | "def shift_ld(audio_features, ld_shift=0.0):\n", 608 | " \"\"\"Shift loudness by a number of ocatves.\"\"\"\n", 609 | " audio_features['loudness_db'] += ld_shift\n", 610 | " return audio_features\n", 611 | "\n", 612 | "\n", 613 | "def shift_f0(audio_features, pitch_shift=0.0):\n", 614 | " \"\"\"Shift f0 by a number of ocatves.\"\"\"\n", 615 | " audio_features['f0_hz'] *= 2.0 ** (pitch_shift)\n", 616 | " audio_features['f0_hz'] = np.clip(audio_features['f0_hz'], \n", 617 | " 0.0, \n", 618 | " librosa.midi_to_hz(110.0))\n", 619 | " return audio_features\n", 620 | "\n", 621 | "\n", 622 | "mask_on = None\n", 623 | "\n", 624 | "if ADJUST and DATASET_STATS is not None:\n", 625 | " # Detect sections that are \"on\".\n", 626 | " mask_on, note_on_value = detect_notes(audio_features['loudness_db'],\n", 627 | " audio_features['f0_confidence'],\n", 628 | " threshold)\n", 629 | "\n", 630 | " if np.any(mask_on):\n", 631 | " # Shift the pitch register.\n", 632 | " target_mean_pitch = DATASET_STATS['mean_pitch']\n", 633 | " pitch = ddsp.core.hz_to_midi(audio_features['f0_hz'])\n", 634 | " mean_pitch = np.mean(pitch[mask_on])\n", 635 | " p_diff = 
target_mean_pitch - mean_pitch\n", 636 | " p_diff_octave = p_diff / 12.0\n", 637 | " round_fn = np.floor if p_diff_octave > 1.5 else np.ceil\n", 638 | " p_diff_octave = round_fn(p_diff_octave)\n", 639 | " audio_features_mod = shift_f0(audio_features_mod, p_diff_octave)\n", 640 | "\n", 641 | "\n", 642 | " # Quantile shift the note_on parts.\n", 643 | " _, loudness_norm = colab_utils.fit_quantile_transform(\n", 644 | " audio_features['loudness_db'],\n", 645 | " mask_on,\n", 646 | " inv_quantile=DATASET_STATS['quantile_transform'])\n", 647 | "\n", 648 | " # Turn down the note_off parts.\n", 649 | " mask_off = np.logical_not(mask_on)\n", 650 | " loudness_norm[mask_off] -= quiet * (1.0 - note_on_value[mask_off][:, np.newaxis])\n", 651 | " loudness_norm = np.reshape(loudness_norm, audio_features['loudness_db'].shape)\n", 652 | " \n", 653 | " audio_features_mod['loudness_db'] = loudness_norm \n", 654 | "\n", 655 | " # Auto-tune.\n", 656 | " if autotune:\n", 657 | " f0_midi = np.array(ddsp.core.hz_to_midi(audio_features_mod['f0_hz']))\n", 658 | " tuning_factor = get_tuning_factor(f0_midi, audio_features_mod['f0_confidence'], mask_on)\n", 659 | " f0_midi_at = auto_tune(f0_midi, tuning_factor, mask_on, amount=autotune)\n", 660 | " audio_features_mod['f0_hz'] = ddsp.core.midi_to_hz(f0_midi_at)\n", 661 | "\n", 662 | " else:\n", 663 | " print('\\nSkipping auto-adjust (no notes detected or ADJUST box empty).')\n", 664 | "\n", 665 | "else:\n", 666 | " print('\\nSkipping auto-adujst (box not checked or no dataset statistics found).')\n", 667 | "\n", 668 | "# Manual Shifts.\n", 669 | "audio_features_mod = shift_ld(audio_features_mod, loudness_shift)\n", 670 | "audio_features_mod = shift_f0(audio_features_mod, pitch_shift)\n", 671 | "\n", 672 | "\n", 673 | "\n", 674 | "# Plot Features.\n", 675 | "has_mask = int(mask_on is not None)\n", 676 | "n_plots = 3 if has_mask else 2 \n", 677 | "fig, axes = plt.subplots(nrows=n_plots, \n", 678 | " ncols=1, \n", 679 | " sharex=True,\n", 680 | " figsize=(2*n_plots, 8))\n", 681 | "\n", 682 | "if has_mask:\n", 683 | " ax = axes[0]\n", 684 | " ax.plot(np.ones_like(mask_on[:TRIM]) * threshold, 'k:')\n", 685 | " ax.plot(note_on_value[:TRIM])\n", 686 | " ax.plot(mask_on[:TRIM])\n", 687 | " ax.set_ylabel('Note-on Mask')\n", 688 | " ax.set_xlabel('Time step [frame]')\n", 689 | " ax.legend(['Threshold', 'Likelihood','Mask'])\n", 690 | "\n", 691 | "ax = axes[0 + has_mask]\n", 692 | "ax.plot(audio_features['loudness_db'][:TRIM])\n", 693 | "ax.plot(audio_features_mod['loudness_db'][:TRIM])\n", 694 | "ax.set_ylabel('loudness_db')\n", 695 | "ax.legend(['Original','Adjusted'])\n", 696 | "\n", 697 | "ax = axes[1 + has_mask]\n", 698 | "ax.plot(librosa.hz_to_midi(audio_features['f0_hz'][:TRIM]))\n", 699 | "ax.plot(librosa.hz_to_midi(audio_features_mod['f0_hz'][:TRIM]))\n", 700 | "ax.set_ylabel('f0 [midi]')\n", 701 | "_ = ax.legend(['Original','Adjusted'])\n" 702 | ], 703 | "execution_count": null, 704 | "outputs": [] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "metadata": { 709 | "cellView": "form", 710 | "id": "SLwg1WkHCXQO" 711 | }, 712 | "source": [ 713 | "#@title #Resynthesize Audio\n", 714 | "\n", 715 | "af = audio_features if audio_features_mod is None else audio_features_mod\n", 716 | "\n", 717 | "# Run a batch of predictions.\n", 718 | "start_time = time.time()\n", 719 | "outputs = model(af, training=False)\n", 720 | "audio_gen = model.get_audio_from_outputs(outputs)\n", 721 | "print('Prediction took %.1f seconds' % (time.time() - start_time))\n", 722 | "\n", 723 | "# 
Plot\n", 724 | "print('Original')\n", 725 | "play(audio)\n", 726 | "\n", 727 | "print('Resynthesis')\n", 728 | "play(audio_gen)\n", 729 | "\n", 730 | "specplot(audio)\n", 731 | "plt.title(\"Original\")\n", 732 | "\n", 733 | "specplot(audio_gen)\n", 734 | "_ = plt.title(\"Resynthesis\")" 735 | ], 736 | "execution_count": null, 737 | "outputs": [] 738 | } 739 | ] 740 | } -------------------------------------------------------------------------------- /webMushra.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [] 7 | }, 8 | "kernelspec": { 9 | "name": "python3", 10 | "display_name": "Python 3" 11 | }, 12 | "language_info": { 13 | "name": "python" 14 | } 15 | }, 16 | "cells": [ 17 | { 18 | "cell_type": "markdown", 19 | "source": [ 20 | "# Designing MUSRHA Tests using webMUSHRA and PyMUSHRA" 21 | ], 22 | "metadata": { 23 | "id": "nw-TcZNMdsRK" 24 | } 25 | }, 26 | { 27 | "cell_type": "code", 28 | "source": [ 29 | "# @title\n", 30 | "%%html\n", 31 | "" 32 | ], 33 | "metadata": { 34 | "colab": { 35 | "base_uri": "https://localhost:8080/", 36 | "height": 651 37 | }, 38 | "cellView": "form", 39 | "id": "8LamfjjzZDDn", 40 | "outputId": "2dc7dddf-a69a-4d16-f0fb-84d467c1dc5e" 41 | }, 42 | "execution_count": null, 43 | "outputs": [ 44 | { 45 | "output_type": "display_data", 46 | "data": { 47 | "text/plain": [ 48 | "" 49 | ], 50 | "text/html": [ 51 | "\n" 52 | ] 53 | }, 54 | "metadata": {} 55 | } 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "source": [ 61 | "## Installing webMUSHRA anb PyMUSHRA" 62 | ], 63 | "metadata": { 64 | "id": "9Os0T2Z-BMlY" 65 | } 66 | }, 67 | { 68 | "cell_type": "code", 69 | "source": [ 70 | "!mkdir db\n", 71 | "!git clone https://github.com/audiolabs/webMUSHRA.git webmushra\n", 72 | "!git clone https://github.com/nils-werner/pymushra.git pymushra\n", 73 | "\n", 74 | "!pip install -e pymushra" 75 | ], 76 | "metadata": { 77 | "colab": { 78 | "base_uri": "https://localhost:8080/" 79 | }, 80 | "id": "Y8RPdqbTUFF8", 81 | "outputId": "f0a9c734-329d-405d-af7c-5a98a4207428" 82 | }, 83 | "execution_count": null, 84 | "outputs": [ 85 | { 86 | "output_type": "stream", 87 | "name": "stdout", 88 | "text": [ 89 | "Cloning into 'webmushra'...\n", 90 | "remote: Enumerating objects: 963, done.\u001b[K\n", 91 | "remote: Counting objects: 100% (9/9), done.\u001b[K\n", 92 | "remote: Compressing objects: 100% (8/8), done.\u001b[K\n", 93 | "remote: Total 963 (delta 1), reused 5 (delta 0), pack-reused 954\u001b[K\n", 94 | "Receiving objects: 100% (963/963), 7.11 MiB | 10.53 MiB/s, done.\n", 95 | "Resolving deltas: 100% (461/461), done.\n", 96 | "Cloning into 'pymushra'...\n", 97 | "remote: Enumerating objects: 103, done.\u001b[K\n", 98 | "remote: Counting objects: 100% (28/28), done.\u001b[K\n", 99 | "remote: Compressing objects: 100% (21/21), done.\u001b[K\n", 100 | "remote: Total 103 (delta 13), reused 16 (delta 7), pack-reused 75\u001b[K\n", 101 | "Receiving objects: 100% (103/103), 26.94 KiB | 3.37 MiB/s, done.\n", 102 | "Resolving deltas: 100% (47/47), done.\n", 103 | "Obtaining file:///content/pymushra\n", 104 | " Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", 105 | "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (1.25.2)\n", 106 | "Requirement already satisfied: matplotlib>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (3.7.1)\n", 107 | "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (1.11.4)\n", 108 | "Requirement already satisfied: ipython in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (7.34.0)\n", 109 | "Requirement already satisfied: flask>=2.2.5 in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (2.2.5)\n", 110 | "Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (2.0.3)\n", 111 | "Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (0.13.1)\n", 112 | "Requirement already satisfied: statsmodels in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (0.14.2)\n", 113 | "Requirement already satisfied: patsy in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (0.5.6)\n", 114 | "Collecting tinydb>=3.0.0 (from pymushra==0.3)\n", 115 | " Downloading tinydb-4.8.0-py3-none-any.whl.metadata (6.2 kB)\n", 116 | "Collecting tinyrecord (from pymushra==0.3)\n", 117 | " Downloading tinyrecord-0.2.0.tar.gz (5.6 kB)\n", 118 | " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", 119 | "Requirement already satisfied: pytest in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (7.4.4)\n", 120 | "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from pymushra==0.3) (8.1.7)\n", 121 | "Requirement already satisfied: Werkzeug>=2.2.2 in /usr/local/lib/python3.10/dist-packages (from flask>=2.2.5->pymushra==0.3) (3.0.3)\n", 122 | "Requirement already satisfied: Jinja2>=3.0 in /usr/local/lib/python3.10/dist-packages (from flask>=2.2.5->pymushra==0.3) (3.1.4)\n", 123 | "Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from flask>=2.2.5->pymushra==0.3) (2.2.0)\n", 124 | "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (1.2.1)\n", 125 | "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (0.12.1)\n", 126 | "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (4.53.1)\n", 127 | "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (1.4.5)\n", 128 | "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (24.1)\n", 129 | "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (9.4.0)\n", 130 | "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (3.1.2)\n", 131 | "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pymushra==0.3) (2.8.2)\n", 132 | "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.24.0->pymushra==0.3) (2023.4)\n", 133 | "Requirement already satisfied: tzdata>=2022.1 in 
/usr/local/lib/python3.10/dist-packages (from pandas>=0.24.0->pymushra==0.3) (2024.1)\n", 134 | "Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (71.0.4)\n", 135 | "Collecting jedi>=0.16 (from ipython->pymushra==0.3)\n", 136 | " Downloading jedi-0.19.1-py2.py3-none-any.whl.metadata (22 kB)\n", 137 | "Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (4.4.2)\n", 138 | "Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (0.7.5)\n", 139 | "Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (5.7.1)\n", 140 | "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (3.0.47)\n", 141 | "Requirement already satisfied: pygments in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (2.16.1)\n", 142 | "Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (0.2.0)\n", 143 | "Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (0.1.7)\n", 144 | "Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython->pymushra==0.3) (4.9.0)\n", 145 | "Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from patsy->pymushra==0.3) (1.16.0)\n", 146 | "Requirement already satisfied: iniconfig in /usr/local/lib/python3.10/dist-packages (from pytest->pymushra==0.3) (2.0.0)\n", 147 | "Requirement already satisfied: pluggy<2.0,>=0.12 in /usr/local/lib/python3.10/dist-packages (from pytest->pymushra==0.3) (1.5.0)\n", 148 | "Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /usr/local/lib/python3.10/dist-packages (from pytest->pymushra==0.3) (1.2.2)\n", 149 | "Requirement already satisfied: tomli>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from pytest->pymushra==0.3) (2.0.1)\n", 150 | "Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython->pymushra==0.3) (0.8.4)\n", 151 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from Jinja2>=3.0->flask>=2.2.5->pymushra==0.3) (2.1.5)\n", 152 | "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython->pymushra==0.3) (0.7.0)\n", 153 | "Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->pymushra==0.3) (0.2.13)\n", 154 | "Downloading tinydb-4.8.0-py3-none-any.whl (24 kB)\n", 155 | "Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)\n", 156 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m25.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 157 | "\u001b[?25hBuilding wheels for collected packages: tinyrecord\n", 158 | " Building wheel for tinyrecord (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", 159 | " Created wheel for tinyrecord: filename=tinyrecord-0.2.0-py3-none-any.whl size=5903 sha256=b4b67fd68ffbc2513e4460e99aae5975d5d1b86f8a6f78e75b87d0042290163c\n", 160 | " Stored in directory: /root/.cache/pip/wheels/c9/49/88/a6d14674c5f54b1fa57631254411deb0cfc3804130c673a8b6\n", 161 | "Successfully built tinyrecord\n", 162 | "Installing collected packages: tinyrecord, tinydb, jedi, pymushra\n", 163 | " Running setup.py develop for pymushra\n", 164 | "Successfully installed jedi-0.19.1 pymushra-0.3 tinydb-4.8.0 tinyrecord-0.2.0\n" 165 | ] 166 | } 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "source": [ 172 | "Schoeffler, M. et al., (2018). webMUSHRA — A Comprehensive Framework for Web-based Listening Tests. Journal of Open Research Software. 6(1), p.8." 173 | ], 174 | "metadata": { 175 | "id": "eGJ1mAjnQm-R" 176 | } 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "source": [ 181 | "## Installing and Configuring HTTP Tunneling using Ngrok" 182 | ], 183 | "metadata": { 184 | "id": "gPCnzyHtBe9B" 185 | } 186 | }, 187 | { 188 | "cell_type": "code", 189 | "source": [ 190 | "# @title\n", 191 | "%%html\n", 192 | "" 193 | ], 194 | "metadata": { 195 | "colab": { 196 | "base_uri": "https://localhost:8080/", 197 | "height": 336 198 | }, 199 | "cellView": "form", 200 | "id": "uC9qTM6PFRa0", 201 | "outputId": "1170915b-e255-4811-85c5-d975e77f5e36" 202 | }, 203 | "execution_count": null, 204 | "outputs": [ 205 | { 206 | "output_type": "display_data", 207 | "data": { 208 | "text/plain": [ 209 | "" 210 | ], 211 | "text/html": [ 212 | "\n" 213 | ] 214 | }, 215 | "metadata": {} 216 | } 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "source": [ 222 | "# Install PyNgrok\n", 223 | "!pip install pyngrok" 224 | ], 225 | "metadata": { 226 | "colab": { 227 | "base_uri": "https://localhost:8080/" 228 | }, 229 | "id": "PO0bezQNSymN", 230 | "outputId": "e70f7c2e-6434-4b15-fde4-06631e5418c5" 231 | }, 232 | "execution_count": null, 233 | "outputs": [ 234 | { 235 | "output_type": "stream", 236 | "name": "stdout", 237 | "text": [ 238 | "Requirement already satisfied: pyngrok in /usr/local/lib/python3.10/dist-packages (7.2.0)\n", 239 | "Requirement already satisfied: PyYAML>=5.1 in /usr/local/lib/python3.10/dist-packages (from pyngrok) (6.0.1)\n" 240 | ] 241 | } 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "source": [ 247 | "# Configure Ngrok and Setup Tunneling\n", 248 | "from pyngrok import ngrok\n", 249 | "ngrok.set_auth_token(\"Put your Auth Token here\")\n", 250 | "public_url = ngrok.connect(5000)\n", 251 | "public_url.public_url += \"/admin\"" 252 | ], 253 | "metadata": { 254 | "id": "epJPAFpmS9e5" 255 | }, 256 | "execution_count": null, 257 | "outputs": [] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "source": [ 262 | "# Print Public URL and start webMUSHRA application\n", 263 | "print(public_url)\n", 264 | "!pymushra server" 265 | ], 266 | "metadata": { 267 | "colab": { 268 | "base_uri": "https://localhost:8080/" 269 | }, 270 | "id": "lBGNgjX3UKFw", 271 | "outputId": "af8e540d-a225-43c2-c280-fb7e8daecd3d" 272 | }, 273 | "execution_count": null, 274 | "outputs": [ 275 | { 276 | "output_type": "stream", 277 | "name": "stdout", 278 | "text": [ 279 | "NgrokTunnel: \"https://5fe0-34-125-99-102.ngrok-free.app/admin\" -> \"http://localhost:5000\"\n", 280 | " * Serving Flask app 'pymushra.service'\n", 281 | " * Debug mode: on\n", 282 | "\u001b[31m\u001b[1mWARNING: This is a development server. Do not use it in a production deployment. 
Use a production WSGI server instead.\u001b[0m\n", 283 | " * Running on all addresses (0.0.0.0)\n", 284 | " * Running on http://127.0.0.1:5000\n", 285 | " * Running on http://172.28.0.12:5000\n", 286 | "\u001b[33mPress CTRL+C to quit\u001b[0m\n", 287 | " * Restarting with stat\n", 288 | " * Debugger is active!\n", 289 | " * Debugger PIN: 955-957-688\n", 290 | "127.0.0.1 - - [25/Jul/2024 10:23:09] \"\u001b[32mGET /admin HTTP/1.1\u001b[0m\" 308 -\n", 291 | "[ responses ... wm \n", 292 | " stimulus comment ... config date\n", 293 | "0 reference None ... configs/mushra_nowav.yaml 2024-07-25 08:52:16.999505\n", 294 | "1 C2 None ... configs/mushra_nowav.yaml 2024-07-25 08:52:16.999505\n", 295 | "2 C3 None ... configs/mushra_nowav.yaml 2024-07-25 08:52:16.999505\n", 296 | "3 C1 None ... configs/mushra_nowav.yaml 2024-07-25 08:52:16.999505\n", 297 | "\n", 298 | "[4 rows x 10 columns], responses ... wm \n", 299 | " stimulus comment ... config date\n", 300 | "0 C4 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 301 | "1 C10 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 302 | "2 C9 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 303 | "3 C7 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 304 | "4 C2 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 305 | "5 C5 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 306 | "6 C1 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 307 | "7 C3 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 308 | "8 reference None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 309 | "9 C8 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 310 | "10 C6 None ... configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 311 | "11 C11 None ... 
configs/mushra_noloop.yaml 2024-07-25 10:02:45.239587\n", 312 | "\n", 313 | "[12 rows x 10 columns]]\n", 314 | "127.0.0.1 - - [25/Jul/2024 10:23:09] \"GET /admin/ HTTP/1.1\" 200 -\n", 315 | "127.0.0.1 - - [25/Jul/2024 10:23:10] \"GET /favicon.ico HTTP/1.1\" 200 -\n", 316 | "127.0.0.1 - - [25/Jul/2024 10:23:21] \"GET /?config=myNewTest.yaml HTTP/1.1\" 200 -\n", 317 | "127.0.0.1 - - [25/Jul/2024 10:23:21] \"GET /lib/external/jquery.mobile/plugins/jQuery-Mobile-Progress-Bar-with-Percentage/src/css/tolito-1.0.1.css HTTP/1.1\" 200 -\n", 318 | "127.0.0.1 - - [25/Jul/2024 10:23:21] \"GET /lib/external/jquery.mobile/jquery.mobile-1.4.4.css HTTP/1.1\" 200 -\n", 319 | "127.0.0.1 - - [25/Jul/2024 10:23:21] \"GET /design/jquery.mobile-theme/themes/default/alabs_0_3.css HTTP/1.1\" 200 -\n", 320 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /design/jquery.mobile-theme/themes/default/jquery.mobile.icons.min.css HTTP/1.1\" 200 -\n", 321 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/wmSlider/slider.min.css HTTP/1.1\" 200 -\n", 322 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/wmSlider/slider.pips.css HTTP/1.1\" 200 -\n", 323 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/mousetrap/mousetrap.min.js HTTP/1.1\" 200 -\n", 324 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/Participant.js HTTP/1.1\" 200 -\n", 325 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/jquery.mobile/patches/vertical-slider.js HTTP/1.1\" 200 -\n", 326 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/spatial/objects/LocalizationObject.js HTTP/1.1\" 200 -\n", 327 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/jquery.mobile/plugins/jQuery-Mobile-Icon-Pack/original/jqm-icon-pack-2.0-original.css HTTP/1.1\" 200 -\n", 328 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/three.js/three.js HTTP/1.1\" 200 -\n", 329 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/three.js/three-tube/threejs-tube.js HTTP/1.1\" 200 -\n", 330 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/three.js/loaders/ColladaLoader.js HTTP/1.1\" 200 -\n", 331 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/jquery.mobile/jquery.mobile-1.4.4.js HTTP/1.1\" 200 -\n", 332 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/wmSlider/slider.min.js HTTP/1.1\" 200 -\n", 333 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/jquery.mobile/plugins/jQuery-Mobile-Progress-Bar-with-Percentage/src/js/tolito-1.0.1.js HTTP/1.1\" 200 -\n", 334 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/jquery/jquery-2.1.1.min.js HTTP/1.1\" 200 -\n", 335 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/business/PageManager.js HTTP/1.1\" 200 -\n", 336 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/audio/MushraAudioControl.js HTTP/1.1\" 200 -\n", 337 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/misc/WaveformVisualizer.js HTTP/1.1\" 200 -\n", 338 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/LikertMultiStimulusRating.js HTTP/1.1\" 200 -\n", 339 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/misc/MushraValidator.js HTTP/1.1\" 200 -\n", 340 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/LikertSingleStimulusPage.js HTTP/1.1\" 200 -\n", 341 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/misc/LikertScale.js HTTP/1.1\" 200 -\n", 342 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/business/ErrorHandler.js HTTP/1.1\" 200 
-\n", 343 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/uuidv4/uuidv4.min.js HTTP/1.1\" 200 -\n", 344 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/nls/nls.js HTTP/1.1\" 200 -\n", 345 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/Stimulus.js HTTP/1.1\" 200 -\n", 346 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/audio/GenericAudioControl.js HTTP/1.1\" 200 -\n", 347 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /design/images/alabs_new.png HTTP/1.1\" 200 -\n", 348 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/GenericPage.js HTTP/1.1\" 200 -\n", 349 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/misc/Shuffle.js HTTP/1.1\" 200 -\n", 350 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/spatial/objects/LEVObject.js HTTP/1.1\" 200 -\n", 351 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/LikertSingleStimulusPageManager.js HTTP/1.1\" 200 -\n", 352 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/FinishPage.js HTTP/1.1\" 200 -\n", 353 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/SpatialPage.js HTTP/1.1\" 200 -\n", 354 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/PairedComparisonPageManager.js HTTP/1.1\" 200 -\n", 355 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/LikertSingleStimulusRating.js HTTP/1.1\" 200 -\n", 356 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/external/yaml/yaml.min.js HTTP/1.1\" 200 -\n", 357 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /design/images/iis.svg HTTP/1.1\" 200 -\n", 358 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/control/InputController.js HTTP/1.1\" 200 -\n", 359 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/spatial/objects/ASWObject.js HTTP/1.1\" 200 -\n", 360 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/control/BS1116AudioControlInputController.js HTTP/1.1\" 200 -\n", 361 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/control/MushraAudioControlInputController.js HTTP/1.1\" 200 -\n", 362 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/PairedComparisonPage.js HTTP/1.1\" 200 -\n", 363 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/misc/Localizer.js HTTP/1.1\" 200 -\n", 364 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/misc/FilePlayer.js HTTP/1.1\" 200 -\n", 365 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/BS1116Page.js HTTP/1.1\" 200 -\n", 366 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/business/DataSender.js HTTP/1.1\" 200 -\n", 367 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/Session.js HTTP/1.1\" 200 -\n", 368 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/SpatialLocalizationRating.js HTTP/1.1\" 200 -\n", 369 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/MushraPage.js HTTP/1.1\" 200 -\n", 370 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/spatial/Scene.js HTTP/1.1\" 200 -\n", 371 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /design/style.css HTTP/1.1\" 200 -\n", 372 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/SpatialLEVRating.js HTTP/1.1\" 200 -\n", 373 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/audio/AudioFileLoader.js HTTP/1.1\" 200 -\n", 374 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/business/PageTemplateRenderer.js HTTP/1.1\" 200 -\n", 375 | 
"127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/Trial.js HTTP/1.1\" 200 -\n", 376 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/BS1116Rating.js HTTP/1.1\" 200 -\n", 377 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/control/FilePlayerController.js HTTP/1.1\" 200 -\n", 378 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/ConsentPage.js HTTP/1.1\" 200 -\n", 379 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/spatial/Camera.js HTTP/1.1\" 200 -\n", 380 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/MUSHRARating.js HTTP/1.1\" 200 -\n", 381 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/VolumePage.js HTTP/1.1\" 200 -\n", 382 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/BS1116PageManager.js HTTP/1.1\" 200 -\n", 383 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/SpatialHWDRating.js HTTP/1.1\" 200 -\n", 384 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/SpatialASWRating.js HTTP/1.1\" 200 -\n", 385 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/datamodel/PairedComparisonChoice.js HTTP/1.1\" 200 -\n", 386 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /lib/webmushra/pages/LikertMultiStimulusPage.js HTTP/1.1\" 200 -\n", 387 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /design/images/techfak.svg HTTP/1.1\" 200 -\n", 388 | "127.0.0.1 - - [25/Jul/2024 10:23:22] \"GET /startup.js HTTP/1.1\" 200 -\n", 389 | "127.0.0.1 - - [25/Jul/2024 10:23:23] \"GET /configs/myNewTest.yaml HTTP/1.1\" 200 -\n", 390 | "127.0.0.1 - - [25/Jul/2024 10:23:23] \"GET /lib/external/jquery.mobile/plugins/jQuery-Mobile-Icon-Pack/original/images/ajax-loader.gif HTTP/1.1\" 200 -\n", 391 | "127.0.0.1 - - [25/Jul/2024 10:23:23] \"GET /configs/resources/audio/1_0_0_0_0_0_0_1.wav HTTP/1.1\" 200 -\n", 392 | "127.0.0.1 - - [25/Jul/2024 10:23:23] \"GET /configs/resources/audio/1_0_1_1_0_1_1_1.wav HTTP/1.1\" 200 -\n", 393 | "127.0.0.1 - - [25/Jul/2024 10:23:23] \"GET /configs/resources/audio/0_0_1_0_0_1_1_0.wav HTTP/1.1\" 200 -\n", 394 | "127.0.0.1 - - [25/Jul/2024 10:23:23] \"GET /configs/resources/audio/0_0_1_1_1_0_0_0.wav HTTP/1.1\" 200 -\n", 395 | "[1]\n", 396 | "127.0.0.1 - - [25/Jul/2024 10:24:08] \"POST /service/write.php HTTP/1.1\" 200 -\n" 397 | ] 398 | } 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "source": [ 404 | "# Kill ngrok tunnel when it's done.\n", 405 | "ngrok.kill()" 406 | ], 407 | "metadata": { 408 | "id": "rXB4m3xhTv4I" 409 | }, 410 | "execution_count": null, 411 | "outputs": [] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "source": [ 416 | "## Configuring a webMUSHRA Listening Test" 417 | ], 418 | "metadata": { 419 | "id": "IJluFKyACQt7" 420 | } 421 | }, 422 | { 423 | "cell_type": "code", 424 | "source": [ 425 | "# @title\n", 426 | "%%html\n", 427 | "" 428 | ], 429 | "metadata": { 430 | "cellView": "form", 431 | "id": "JfEh3y2HttTm", 432 | "outputId": "9b65ddfe-c5c8-49c8-8e7e-a544a814c071", 433 | "colab": { 434 | "base_uri": "https://localhost:8080/", 435 | "height": 336 436 | } 437 | }, 438 | "execution_count": null, 439 | "outputs": [ 440 | { 441 | "output_type": "display_data", 442 | "data": { 443 | "text/plain": [ 444 | "" 445 | ], 446 | "text/html": [ 447 | "\n" 448 | ] 449 | }, 450 | "metadata": {} 451 | } 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "source": [ 457 | "### Get your audio files into 'webmushra/configs/resources/audio/'" 458 | ], 459 | 
"metadata": { 460 | "id": "Kn8qGLobCWDH" 461 | } 462 | }, 463 | { 464 | "cell_type": "code", 465 | "source": [ 466 | "# Getting some wavefiles\n", 467 | "import torchaudio\n", 468 | "yesno_data = torchaudio.datasets.YESNO('.', download=True)\n" 469 | ], 470 | "metadata": { 471 | "id": "c7TjhjcoVxNe" 472 | }, 473 | "execution_count": null, 474 | "outputs": [] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "source": [ 479 | "# Move some audio files to the appropriate folder\n", 480 | "import pathlib\n", 481 | "import shutil\n", 482 | "files_list = list(pathlib.Path('/content/waves_yesno/').glob('*.wav'))\n", 483 | "for file in files_list[:4]:\n", 484 | " shutil.copy(file, pathlib.Path('/content/webmushra/configs/resources/audio/') / file.name)\n" 485 | ], 486 | "metadata": { 487 | "id": "SWHopblhDRLP" 488 | }, 489 | "execution_count": null, 490 | "outputs": [] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "source": [ 495 | "## Create your Test Configuration File" 496 | ], 497 | "metadata": { 498 | "id": "ffpelagHMvxc" 499 | } 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "source": [ 504 | "## Read results as Pandas DataFrame" 505 | ], 506 | "metadata": { 507 | "id": "8M6lBnKIMPpu" 508 | } 509 | }, 510 | { 511 | "cell_type": "code", 512 | "source": [ 513 | "# @title\n", 514 | "%%html\n", 515 | "\n" 516 | ], 517 | "metadata": { 518 | "cellView": "form", 519 | "id": "FxXkTKmOF34L", 520 | "outputId": "e863e192-a62a-44c4-c7da-eb873abf7ac4", 521 | "colab": { 522 | "base_uri": "https://localhost:8080/", 523 | "height": 336 524 | } 525 | }, 526 | "execution_count": 1, 527 | "outputs": [ 528 | { 529 | "output_type": "display_data", 530 | "data": { 531 | "text/plain": [ 532 | "" 533 | ], 534 | "text/html": [ 535 | "\n" 536 | ] 537 | }, 538 | "metadata": {} 539 | } 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "source": [ 545 | "from pymushra.pymushra import collection_to_df" 546 | ], 547 | "metadata": { 548 | "id": "eFdNki1HI7Hn" 549 | }, 550 | "execution_count": null, 551 | "outputs": [] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "source": [ 556 | "from tinydb import TinyDB\n", 557 | "db = TinyDB(\"/content/db/webmushra.json\")\n", 558 | "print(db.tables())\n", 559 | "df = collection_to_df(db.table(\"mushra_withloop_5\"))\n", 560 | "df.head()" 561 | ], 562 | "metadata": { 563 | "colab": { 564 | "base_uri": "https://localhost:8080/", 565 | "height": 535 566 | }, 567 | "id": "-KMXVTNrJOgu", 568 | "outputId": "55141c62-dcbe-407b-95fb-d7f1fb3b9332" 569 | }, 570 | "execution_count": null, 571 | "outputs": [ 572 | { 573 | "output_type": "stream", 574 | "name": "stdout", 575 | "text": [ 576 | "{'mushra_withloop_5', 'mushra_noloop', 'mushra_nowav'}\n" 577 | ] 578 | }, 579 | { 580 | "output_type": "execute_result", 581 | "data": { 582 | "text/plain": [ 583 | " wm \\\n", 584 | " id date \n", 585 | "0 trial_myMushra 2024-07-25 10:24:08.925649 \n", 586 | "1 trial_myMushra 2024-07-25 10:24:08.925649 \n", 587 | "2 trial_myMushra 2024-07-25 10:24:08.925649 \n", 588 | "3 trial_myMushra 2024-07-25 10:24:08.925649 \n", 589 | "4 trial_myMushra 2024-07-25 10:24:08.925649 \n", 590 | "\n", 591 | " questionaire responses \\\n", 592 | " uuid comment score \n", 593 | "0 47d39339-2aea-4c5b-96b5-bb2ff4bb796b None 69 \n", 594 | "1 47d39339-2aea-4c5b-96b5-bb2ff4bb796b None 100 \n", 595 | "2 47d39339-2aea-4c5b-96b5-bb2ff4bb796b None 100 \n", 596 | "3 47d39339-2aea-4c5b-96b5-bb2ff4bb796b None 59 \n", 597 | "4 47d39339-2aea-4c5b-96b5-bb2ff4bb796b None 100 \n", 598 | "\n", 599 | " wm responses 
wm responses \n", 600 | " config testId time type stimulus \n", 601 | "0 configs/myNewTest.yaml mushra_withloop_5 41015 mushra C2 \n", 602 | "1 configs/myNewTest.yaml mushra_withloop_5 41015 mushra reference \n", 603 | "2 configs/myNewTest.yaml mushra_withloop_5 41015 mushra anchor70 \n", 604 | "3 configs/myNewTest.yaml mushra_withloop_5 41015 mushra C1 \n", 605 | "4 configs/myNewTest.yaml mushra_withloop_5 41015 mushra C3 " 606 | ], 607 | "text/html": [ 608 | "\n", 609 | "
\n", 610 | "
\n", 611 | "\n", 624 | "\n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | "
wmquestionaireresponseswmresponseswmresponses
iddateuuidcommentscoreconfigtestIdtimetypestimulus
0trial_myMushra2024-07-25 10:24:08.92564947d39339-2aea-4c5b-96b5-bb2ff4bb796bNone69configs/myNewTest.yamlmushra_withloop_541015mushraC2
1trial_myMushra2024-07-25 10:24:08.92564947d39339-2aea-4c5b-96b5-bb2ff4bb796bNone100configs/myNewTest.yamlmushra_withloop_541015mushrareference
2trial_myMushra2024-07-25 10:24:08.92564947d39339-2aea-4c5b-96b5-bb2ff4bb796bNone100configs/myNewTest.yamlmushra_withloop_541015mushraanchor70
3trial_myMushra2024-07-25 10:24:08.92564947d39339-2aea-4c5b-96b5-bb2ff4bb796bNone59configs/myNewTest.yamlmushra_withloop_541015mushraC1
4trial_myMushra2024-07-25 10:24:08.92564947d39339-2aea-4c5b-96b5-bb2ff4bb796bNone100configs/myNewTest.yamlmushra_withloop_541015mushraC3
\n", 718 | "
\n", 719 | "
\n", 720 | "\n", 721 | "
\n", 722 | " \n", 730 | "\n", 731 | " \n", 771 | "\n", 772 | " \n", 796 | "
\n", 797 | "\n", 798 | "\n", 799 | "
\n", 800 | " \n", 811 | "\n", 812 | "\n", 901 | "\n", 902 | " \n", 924 | "
\n", 925 | "\n", 926 | "
\n", 927 | "
\n" 928 | ], 929 | "application/vnd.google.colaboratory.intrinsic+json": { 930 | "type": "dataframe", 931 | "variable_name": "df", 932 | "repr_error": "Out of range float values are not JSON compliant: nan" 933 | } 934 | }, 935 | "metadata": {}, 936 | "execution_count": 71 937 | } 938 | ] 939 | }, 940 | { 941 | "cell_type": "code", 942 | "source": [ 943 | "# @title\n", 944 | "%%html\n", 945 | "" 946 | ], 947 | "metadata": { 948 | "colab": { 949 | "base_uri": "https://localhost:8080/", 950 | "height": 445 951 | }, 952 | "cellView": "form", 953 | "id": "cw3__05kF8Ex", 954 | "outputId": "383378b4-a9a7-4c0c-e6b4-55e2a7d04af6" 955 | }, 956 | "execution_count": null, 957 | "outputs": [ 958 | { 959 | "output_type": "display_data", 960 | "data": { 961 | "text/plain": [ 962 | "" 963 | ], 964 | "text/html": [ 965 | "\n" 966 | ] 967 | }, 968 | "metadata": {} 969 | } 970 | ] 971 | } 972 | ] 973 | } --------------------------------------------------------------------------------