├── .gitignore
├── Poster.pdf
├── README.md
├── Report_Paper.pdf
├── audio_classification_conv1d.ipynb
├── audio_classification_conv2d_mfcc.ipynb
├── images
    ├── audio_features.png
    └── confusion_matrix.png
└── sound_data_exploration.ipynb


/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | pip-wheel-metadata/
 24 | share/python-wheels/
 25 | *.egg-info/
 26 | .installed.cfg
 27 | *.egg
 28 | MANIFEST
 29 | 
 30 | # PyInstaller
 31 | #  Usually these files are written by a python script from a template
 32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 33 | *.manifest
 34 | *.spec
 35 | 
 36 | # Installer logs
 37 | pip-log.txt
 38 | pip-delete-this-directory.txt
 39 | 
 40 | # Unit test / coverage reports
 41 | htmlcov/
 42 | .tox/
 43 | .nox/
 44 | .coverage
 45 | .coverage.*
 46 | .cache
 47 | nosetests.xml
 48 | coverage.xml
 49 | *.cover
 50 | *.py,cover
 51 | .hypothesis/
 52 | .pytest_cache/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | target/
 76 | 
 77 | # Jupyter Notebook
 78 | .ipynb_checkpoints
 79 | 
 80 | # IPython
 81 | profile_default/
 82 | ipython_config.py
 83 | 
 84 | # pyenv
 85 | .python-version
 86 | 
 87 | # pipenv
 88 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 89 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 90 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 91 | #   install all needed dependencies.
 92 | #Pipfile.lock
 93 | 
 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 95 | __pypackages__/
 96 | 
 97 | # Celery stuff
 98 | celerybeat-schedule
 99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | 


--------------------------------------------------------------------------------
/Poster.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/Poster.pdf


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Audio Event Detection
  2 | A comparative analysis of a few popular machine learning and deep learning algorithms for multi-class audio classification.  
 3 | 
 4 | ### Dataset
  5 | We conduct experiments on the General-Purpose Tagging of Freesound Audio with AudioSet Labels dataset ([link](https://zenodo.org/record/2552860#.XfwJ8JNKjOT)) to automatically recognize audio events from a wide range of real-world environments. The dataset consists of heterogeneous, uncompressed PCM 16-bit, 44.1 kHz mono audio files spanning 41 categories drawn from the AudioSet Ontology (musical instruments, human sounds, animals, etc.). For this project, we chose 10 audio classes to run experiments on.  
 6 | 
 7 | ### Feature Design 
  8 | Feature extraction is a fundamental step for any machine learning algorithm. We extract features from the audio data by computing Mel-Frequency Cepstral Coefficient (MFCC) spectrograms, which gives 2D, image-like patches. MFCC features are derived from the Fourier transform and filter bank analysis, and they perform much better on downstream tasks than raw features such as amplitude.  
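
The Fourier transform and filter bank steps mentioned above can be sketched in plain Python. This is a minimal illustration, not the code used in the notebooks: the function names, the frame length, hop size, filter count, and the synthetic 440 Hz test tone are all assumptions chosen to keep the example small, and a real pipeline would normally rely on an audio library with a fast FFT.

```python
import math

def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, expressed as fractional FFT bin positions.
    edges = [mel_to_hz(max_mel * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if left < b <= center:
                row.append((b - left) / (center - left))
            elif center < b < right:
                row.append((right - b) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (a time x coefficient patch)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) for empty or silent bands.
        log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                        for row in bank]
        features.append(dct2(log_energies, n_coeffs))
    return features

# Synthetic 440 Hz tone sampled at 8 kHz (illustrative input only).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
features = mfcc(signal, sample_rate)
```

Stacking the per-frame coefficient vectors gives the 2D, image-like patch that a convolutional network can consume.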
 9 | 
10 | ![MFCC Features](https://github.com/harmanpreet93/audio_classification/blob/master/images/audio_features.png)
11 | 
12 | 
13 | ### Comparative Analysis  
14 | CNNs produce excellent results compared to the other machine learning algorithms. Experiments were performed with different batch sizes, training on a GPU with the Adam optimizer. Batch normalization was applied after every convolutional layer. Classical machine learning algorithms such as Naive Bayes and Logistic Regression underfit because they lack the capacity to model the complex characteristics of our audio data. An extensive comparison can be found in the report.
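
To make the batch normalization step concrete, here is a minimal sketch of the training-time computation for a single channel. The function name `batch_norm`, the default `gamma`, `beta`, and `eps` values, and the sample activations are assumptions for illustration; the notebooks use a deep learning framework's built-in layer instead. For a convolutional layer, the statistics of each channel are pooled over the batch and the spatial positions, and the flat list below stands in for those pooled values.

```python
import math

def batch_norm(activations, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel to zero mean and unit variance, then scale and shift."""
    n = len(activations)
    mean = sum(activations) / n
    var = sum((x - mean) ** 2 for x in activations) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in activations]

# Hypothetical activations of one channel across a small batch.
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

In a trained network, `gamma` and `beta` are learned per channel, and running averages of the mean and variance replace the batch statistics at inference time.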
15 | 
16 | ![Confusion Matrix](https://github.com/harmanpreet93/audio_classification/blob/master/images/confusion_matrix.png)
17 | 


--------------------------------------------------------------------------------
/Report_Paper.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/Report_Paper.pdf


--------------------------------------------------------------------------------
/images/audio_features.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/images/audio_features.png


--------------------------------------------------------------------------------
/images/confusion_matrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harmanpreet93/audio_classification/f25806c2ff2cd33e28ef42b40e4ddcf1f4a577ce/images/confusion_matrix.png


--------------------------------------------------------------------------------