└── README.md

/README.md:
--------------------------------------------------------------------------------
# Getting Started in 'ML-Audio'
Suggestions for students.
## About
Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment.

This page began after @drscotthawley felt sufficiently embarrassed about not having a coherent answer. Until someone creates an "ML for Audio" online course -- **update 1/7/20:** See Valerio Velardo's ["Deep Learning for Audio"](https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf)! -- this page may prove helpful.


Notes:
- ***This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests.*** *(In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)*
- This page is bound to be **biased** toward musical audio and deep neural network methods. Inclusion of other domains and methods -- e.g., Gaussian Processes, NMF, RBMs,... -- will depend on your submissions!
- The field advances rapidly. Some topics listed here will be timeless; others will become obsolete. Such is the nature of the field. Let's try to keep this relevant, at least for "getting started."


## Introductory Remarks
"Read all the tutorials and papers you can, watch videos of all the talks you can, try out and modify whatever code you can get your hands on, take whatever courses you can find, go to whatever conferences you can. Try to build your own system, and spend all your nights and weekends improving it."

This was the best advice some of us could give, because it was the path we took. Some such stories are shared below. This page is an attempt to offer something more "direct" for newcomers.

Nevertheless, a few reflective narratives may provide helpful perspectives...


## Essays / Reflections / Autobiographical Sketches
Many practitioners took very different *interdisciplinary* paths, learning from a hodgepodge of information, in order to complement their existing strengths and fill in gaps in their knowledge. Here are some stories.

*(For submissions: Either link to elsewhere on the web, or add a file to the repo via PR. Try to make submissions conclude with a section on what you would say to new students.)*

* How `__[someone]__` got started
* `__[a young person]'s__` story
* The *N*-step Process that `__[this person]__` thinks all students should follow
* `__[a long-time veteran]'s__` view of the field
* Where `__[a hot-shot postdoc]__` thinks the field is heading
* Things `__[so-and-so]__` wishes someone had told him/her
* ...your name(s) here!...**Chris Donahue, Christian Steinmetz, Jordi Pons, Keunwoo Choi, Faro, Justin Salamon,...?**


## Quick Quotes
* [Justin Salamon](https://twitter.com/justin_salamon/status/1202016519720300545): "Anyone working in ML, *anyone*, should be *obliged* to curate a dataset before they're allowed to train a single model.
The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"
* `__[so-and-so]'s__` suggestion
* A nugget of wisdom from `__[noted practitioner]__`

## Online Courses

* [Valerio Velardo's "Deep Learning for Audio"](https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf)
* [Andrew Ng's ML Course](https://www.coursera.org/learn/machine-learning) on Coursera (Good all-around ML course)
* [Fast.ai](https://www.fast.ai) (Can get you up and running fast)
* Rebecca Fiebrink's [Machine Learning for Musicians and Artists](https://www.kadenze.com/courses/machine-learning-for-musicians-and-artists/info) on Kadenze (No math!)
* [Neural Network Programming - Deep Learning with PyTorch](https://deeplizard.com/learn/video/v5cngxo4mIg)
* [Advanced Digital Signal Processing](https://github.com/GuitarsAI/ADSP_Tutorials) series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with [videos](https://www.youtube.com/playlist?list=PL6QnpHKwdPYjbDezYkAE-sAQ5MOpYeqM6) and accompanying Jupyter notebooks by [Renato Profeta](https://twitter.com/guitars_ai)

## Tutorials

(I'm often underwhelmed with audio-specific tutorials, actually. No offense! Feel free to suggest some.
Here are a couple on related topics that I've found inspiring.)
* Andrew Trask's ["Anyone Can Learn To Code an LSTM-RNN in Python"](https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/)
* [Neural Network Programming, Deep Learning with PyTorch](https://deeplizard.com/learn/video/v5cngxo4mIg) (Learn how to program neural networks using PyTorch)
* [Machine Learning & Deep Learning Fundamentals](https://deeplizard.com/learn/video/gZmobeGL0Yg) (Good high-level intro to ML concepts and how neural networks operate)

Signal Processing Topics:
* Yuge Shi's ["Gaussian Processes, Not Quite for Dummies"](https://thegradient.pub/gaussian-process-not-quite-for-dummies/)

## Talks (at conferences)
that we found helpful/inspiring (and are hopefully still relevant)
* Paris Smaragdis at SANE 2015: ["NMF? Neural Nets? It’s all the same..."](https://www.youtube.com/watch?v=wfmpViJIjWw)
* Ron Weiss at SANE 2015: ["Training neural network acoustic models on waveforms"](https://www.youtube.com/watch?v=sI_8EA0_ha8)
* Jordi Pons at DLBCN 2018: ["Training neural audio classifiers with few data"](https://www.youtube.com/watch?v=AJ-XM07wSjg)

## Key Papers / Codes
(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )
* Keunwoo Choi et al., ["Automatic tagging using deep convolutional neural networks"](https://arxiv.org/abs/1606.00298) (ISMIR 2016 Best Paper)
* [SampleRNN](https://arxiv.org/abs/1612.07837)
* [WaveNet](https://arxiv.org/pdf/1609.03499.pdf)
* [WaveRNN, i.e. "Efficient Neural Audio Synthesis"](https://arxiv.org/abs/1802.08435)
* [GANSynth](https://magenta.tensorflow.org/gansynth)
* ...more...

## Demos
(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)
* Chris Donahue's [WaveGAN Demo](https://chrisdonahue.com/wavegan/)
* Scott Hawley's [SignalTrain Demo](http://www.signaltrain.ml/)


## Packages & Libraries
* [Librosa](https://librosa.github.io/librosa/)
* [Audiomentations, data augmentation for audio](https://github.com/iver56/audiomentations)
* [tf.signal: signal processing for TensorFlow](https://www.tensorflow.org/api_docs/python/tf/signal)


## Tools / GUIs / Gists
* Jesse Engel's [gist to plot "rainbowgrams"](https://gist.github.com/jesseengel/e223622e255bd5b8c9130407397a0494)

## Books

* [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) online book. This is how @drscotthawley first started reading.


## Computer-Related Topics
Python: [learnpython.org](https://www.learnpython.org/)

## Signal Processing Topics
* [Advanced Digital Signal Processing](https://github.com/GuitarsAI/ADSP_Tutorials) series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with [videos](https://www.youtube.com/playlist?list=PL6QnpHKwdPYjbDezYkAE-sAQ5MOpYeqM6) and accompanying Jupyter notebooks by [Renato Profeta](https://twitter.com/guitars_ai)

## Statistics / Math Topics
* Gradient Descent
    * https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
    * https://en.wikipedia.org/wiki/Gradient_descent
    * https://www.kdnuggets.com/2017/04/simple-understand-gradient-descent-algorithm.html
    * ["Following Gravity"](https://drscotthawley.github.io/Following-Gravity/) by @drscotthawley
* Principal Component Analysis: ["PCA From Scratch"](https://drscotthawley.github.io/PCA-From-Scratch/) by @drscotthawley

## News / Social Media to Follow
Twitter: (this could get really long!)


## Datasets (raw audio)
One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:
* [NSynth](https://magenta.tensorflow.org/datasets/nsynth) Musical Instruments
* [GTZAN Genre Collection](http://marsyas.info/downloads/datasets.html) (Note [critique by Bob Sturm](https://arxiv.org/abs/1306.1461))
* [Fraunhofer IDMT Guitar/Bass Effects](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/audio_effects.html)
* [Urban Sound Dataset](https://serv.cusp.nyu.edu/projects/urbansounddataset)
* [FreeSound Annotator](https://annotator.freesound.org/) (formerly FreeSound Datasets)
* [Birdvox-Full-Night](https://wp.nyu.edu/birdvox/birdvox-full-night/)
* [SignalTrain LA2A](https://zenodo.org/record/3348083)
* [Kaggle Heartbeat Sounds](https://www.kaggle.com/kinguistics/heartbeat-sounds)
* Search for other [audio datasets at Kaggle](https://www.kaggle.com/datasets?tags=16072-audio+data) (list)
* See this [long list at audiocontentanalysis.org](https://www.audiocontentanalysis.org/data-sets/), though only some entries are raw audio
* Another [list of "audio datasets" by Christopher Dossman](https://towardsdatascience.com/a-data-lakes-worth-of-audio-datasets-b45b88cd4ad)
* ...your dataset here...


## "Major" ML-Audio Research/Development Groups
#### Universities:
(or, "Where should I apply for grad school?")
* QMUL (London)
* UPF (Barcelona)
* CCRMA (Stanford, San Francisco Bay Area)
* IRCAM (Paris)
* NYU (New York)

#### Industry:
("Where can I get an internship/job?")
* Google Magenta
* [Google Perception](https://research.google/teams/perception/) ([speech publications](https://research.google/pubs/?team=perception&area=speech-processing))
* Adobe
* Spotify
* Increasingly, everywhere.
;-)

## Conferences
("Which conference(s) should I go to?" -- asked by a student on the day this doc began)
#### Audio-Specific
* AES
* ASA
* DAFx
* ICASSP
* ISMIR
* SANE

#### General ML
* ICLR
* ICML
* NeurIPS

## Journals
("Where can I get published?")



## Competitions / Benchmarks
Some are yearly; some may be defunct but are still interesting.
* [MIREX](https://www.music-ir.org/mirex/wiki/MIREX_HOME)
* `__some source-separation challenge__`
* [Kaggle Heartbeat Sounds](https://www.kaggle.com/kinguistics/heartbeat-sounds)


## Contributors
Ryan Miller

*If you want your name listed here, you may. ;-)*
--------------------------------------------------------------------------------