# **Handy ASR noise dataset**

A handy dataset for noise augmentation in ASR / TTS:
- ~20k noise files;
- ~200 distinct categories;

![](https://pics.spark-in.me/upload/c65cd3ef082bc035da000300e5b83eab.png)

Contact [us](mailto:open_stt@googlegroups.com)!
Open issues, collaborate, submit a PR, contribute, share your datasets!

## **Contribution ideas**

Add more data from the BBC Sound Effects dataset.

# **Download links**

- Metadata [file](https://asr-noise.fra1.digitaloceanspaces.com/noises_df.feather) / 2.0M / `73cb528656a484b20e02d6c5fd05f14c`
- Noise archive [file](https://asr-noise.fra1.digitaloceanspaces.com/asr_noises.tar.gz) / 4.7G / `5e069c867a0da891f57616905129b6c3`

**Open feather file:**
```
import pandas as pd

# pandas reads feather files through pyarrow, so pyarrow must be installed
file_path = "noises_df.feather"  # path to the downloaded metadata file
df = pd.read_feather(file_path)
```

# **Data preparation**

The dataset is compiled from open-domain sources.
All labels resembling loud human speech were removed (background noise, e.g. street chatter, was kept).
All items are 0 to 60 seconds long.

![](https://pics.spark-in.me/upload/f935c54efed15bd40f1262d29b5dbbad.png)

All files are normalized as follows:
- Converted to mono, if necessary;
- Converted to 16 kHz sampling rate, if necessary;
- Stored as 16-bit integers;

# **Contacts**

Please contact us [here](mailto:open_stt@googlegroups.com) or just create a GitHub issue!
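
After downloading, the files can be sanity-checked against the hashes listed under **Download links** and against the normalization described under **Data preparation**. The sketch below uses only the Python standard library and rests on a few assumptions that are not stated above: the 32-character hashes are taken to be MD5 sums, the downloads are assumed to keep their original file names in the working directory, and the extracted noise files are assumed to be WAV (the container format is not specified). The helpers `md5sum` and `matches_normalization` are hypothetical names for illustration, not part of the dataset.

```python
import hashlib
import os
import wave

# Hashes from "Download links", assumed to be MD5 sums
EXPECTED_MD5 = {
    "noises_df.feather": "73cb528656a484b20e02d6c5fd05f14c",
    "asr_noises.tar.gz": "5e069c867a0da891f57616905129b6c3",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so the 4.7G archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_normalization(path):
    """Return True if a WAV file is mono, 16 kHz and 16-bit.

    Assumes the noise files are PCM WAV; the README does not name the format.
    """
    with wave.open(path, "rb") as w:
        return (
            w.getnchannels() == 1           # mono
            and w.getframerate() == 16000   # 16 kHz sampling rate
            and w.getsampwidth() == 2       # 2 bytes per sample = 16-bit integers
        )


if __name__ == "__main__":
    # Check whichever of the downloads are present in the working directory
    for name, expected in EXPECTED_MD5.items():
        if os.path.exists(name):
            print(name, "OK" if md5sum(name) == expected else "MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size; `matches_normalization` can be mapped over the extracted files to spot any item that deviates from the mono / 16 kHz / 16-bit layout.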

# **License**
cc-by

# **References / citations / licenses**

**Links / license**
- [rnnoise](https://people.xiph.org/~jm/demo/rnnoise/) / [CC0](https://creativecommons.org/publicdomain/zero/1.0/);
- [acoustic events](https://data.vision.ee.ethz.ch/cvl/ae_dataset/) / `if you end up using the dataset, we ask you to cite the following paper`;
- [urban sounds](https://urbansounddataset.weebly.com/urbansound8k.html) / [cc-by-nc](http://creativecommons.org/licenses/by-nc/3.0/);
- [esc-50](https://github.com/karoldvl/ESC-50) / [license](https://github.com/karoldvl/ESC-50/blob/master/LICENSE) (cc-by-nc);
- [freiburg-106](http://www.csc.kth.se/~jastork/pages/datasets.html) / ?;
- [sound-events](https://www.sciencedirect.com/science/article/abs/pii/S0167865515002925) / ?;
- [BBC Sound Effects](http://bbcsfx.acropolis.org.uk/) (a small part) / [license](https://github.com/bbcarchdev/Remarc/blob/master/doc/2016.09.27_RemArc_Content%20licence_Terms%20of%20Use_final.pdf);
- [nar dataset](https://team.inria.fr/perception/nard/) / `the data are freely accessible for scientific research purposes and for non-commercial applications`;

**Paper citations:**
- Naoya Takahashi, Michael Gygli, Beat Pfister and Luc Van Gool, "Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition", Proc. Interspeech 2016, San Francisco;
- J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research", 22nd ACM International Conference on Multimedia, Orlando USA, Nov. 2014;

# **Donations**

[Donate](https://buymeacoff.ee/8oneCIN) (each coffee pays for several full downloads) or use our DigitalOcean referral [link](https://sohabr.net/habr/post/357748/) to help.