├── .gitignore
├── Poster.pdf
├── README.md
├── Report_Paper.pdf
├── audio_classification_conv1d.ipynb
├── audio_classification_conv2d_mfcc.ipynb
├── images
    ├── audio_features.png
    └── confusion_matrix.png
└── sound_data_exploration.ipynb

/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

--------------------------------------------------------------------------------
/Poster.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/Poster.pdf

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Audio Event Detection
A comparative analysis of a few popular machine learning and deep learning algorithms for multi-class audio classification.

### Dataset
We conduct experiments on the General-Purpose Tagging of Freesound Audio with AudioSet Labels dataset ([link](https://zenodo.org/record/2552860#.XfwJ8JNKjOT)) to automatically recognize audio events from a wide range of real-world environments. The dataset consists of heterogeneous, uncompressed PCM 16-bit, 44.1 kHz, mono audio files covering 41 categories drawn from the AudioSet Ontology (musical instruments, human sounds, animals, etc.).
For this project, we chose 10 audio classes to run experiments on.

### Feature Design
Feature extraction is a fundamental step for any machine learning algorithm. We extract features from the audio data by computing Mel Frequency Cepstral Coefficient (MFCC) spectrograms, which gives us 2D image-like patches. MFCC features are derived from Fourier transform and filter bank analysis, and they perform much better on downstream tasks than raw features such as amplitude.

![MFCC Features](https://github.com/harmanpreet93/audio_classification/blob/master/images/audio_features.png)


### Comparative Analysis
CNNs produced excellent results compared to the other machine learning algorithms. We trained on a GPU with the Adam optimizer and experimented with different batch sizes. Batch Normalization was applied after every convolutional layer. Classical machine learning algorithms such as Naive Bayes and Logistic Regression underfit because they lack the capacity to model the complex characteristics of our audio data. An extensive comparison can be found in the report.
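
To make the "Batch Normalization after every convolutional layer" pattern concrete, here is a minimal NumPy sketch of one convolution, batch norm, ReLU block on a batch of 1D signals. It is an illustration only and is not taken from the notebooks: the helper names (`conv1d`, `batch_norm`, `conv_bn_relu`), the tensor shapes, and the random inputs are all assumptions, and the real models were presumably built with a deep learning framework rather than by hand.

```python
import numpy as np


def conv1d(x, w, b):
    """'Valid' 1D cross-correlation, the operation DL frameworks call convolution.

    x: (batch, in_channels, length)
    w: (out_channels, in_channels, kernel)
    b: (out_channels,)
    returns: (batch, out_channels, length - kernel + 1)
    """
    kernel = w.shape[-1]
    # windows has shape (batch, in_channels, out_length, kernel)
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel, axis=-1)
    return np.einsum("bclk,ock->bol", windows, w) + b[None, :, None]


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm: normalize each channel over batch and time."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are the learnable per-channel scale and shift
    return gamma[None, :, None] * x_hat + beta[None, :, None]


def conv_bn_relu(x, w, b, gamma, beta):
    """One block: convolution, then batch norm, then ReLU."""
    return np.maximum(batch_norm(conv1d(x, w, b), gamma, beta), 0.0)


# Hypothetical shapes: 8 clips, 1 input channel, 64 samples, 4 filters of width 5.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 1, 64))
w = rng.normal(size=(4, 1, 5))
b = np.zeros(4)
gamma, beta = np.ones(4), np.zeros(4)

normed = batch_norm(conv1d(x, w, b), gamma, beta)
out = conv_bn_relu(x, w, b, gamma, beta)
print(out.shape)
```

After normalization each channel has roughly zero mean and unit variance across the batch, which keeps the input scale of the next layer stable during training. At inference time, frameworks replace the per-batch statistics with running averages; that part is omitted here.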

![Confusion Matrix](https://github.com/harmanpreet93/audio_classification/blob/master/images/confusion_matrix.png)

--------------------------------------------------------------------------------
/Report_Paper.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/Report_Paper.pdf

--------------------------------------------------------------------------------
/images/audio_features.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/images/audio_features.png

--------------------------------------------------------------------------------
/images/confusion_matrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/images/confusion_matrix.png

--------------------------------------------------------------------------------