├── .gitignore
├── Data-Preprocessing.ipynb
├── README.md
├── Run-All-on-Colab.ipynb
├── Training-Classifier.ipynb
├── advanced
│   ├── Multi-Label-FSDKaggle2019-on-Colab.ipynb
│   ├── Note-webdataset-smalldata.ipynb
│   ├── Perceiver_MelSpecAudio_Example_Colab.ipynb
│   ├── __init__.py
│   ├── create_wds_fsd50k.py
│   ├── create_wds_fsd50k_resample.py
│   ├── fat2018.py
│   ├── metric_fat2018.py
│   └── preprocess_fat2018.py
├── config-fat2018.yaml
├── config.yaml
├── for_evar
│   ├── README.md
│   ├── cnn14_decoupled.py
│   └── sampler.py
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── augmentations.py
│   ├── libs.py
│   ├── lwlrap.py
│   ├── models.py
│   └── multi_label_libs.py
└── work
    └── .placeholder
/.gitignore:
--------------------------------------------------------------------------------
1 | work/*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Sound Classifier Tutorial for PyTorch
2 | 
3 | This is a sound classifier tutorial using PyTorch, PyTorch Lightning and torchaudio.
4 | 
5 | ## 0. Motivation
6 | 
7 | I previously made a repository for a sound classifier solution: [Machine Learning Sound Classifier for Live Audio](https://github.com/daisukelab/ml-sound-classifier),
8 | which is based on my Keras solution for the Kaggle competition "[Freesound General-Purpose Audio Tagging Challenge](https://www.kaggle.com/c/freesound-audio-tagging)".
9 | 
10 | Keras was popular when that repository was created, but many people today are using PyTorch, and so am I.
11 | 
12 | This repository is an updated example solution using PyTorch that shows how I would approach a new machine learning sound competition with current software assets.
13 | 
14 | ## 1. Quickstart
15 | 
16 | - `pip install -r requirements.txt` to install the required modules.
17 | - Run the notebooks.
18 | 
19 | ## 2. What you can find
20 | 
21 | ### 2-1. What's included
22 | 
23 | - An audio preprocessing example: [Data-Preprocessing.ipynb](Data-Preprocessing.ipynb)
24 | - A training example: [Training-Classifier.ipynb](Training-Classifier.ipynb)
25 | - An [FSDKaggle2018](https://zenodo.org/record/2552860#.X9TH6mT7RzU) handling example; this is a multi-class sound classification task.
26 | - New) ResNetish/VGGish [1] models.
27 | - Models are equipped with AdaptiveXXXPool2d to be flexible with input size; they now accept inputs of any shape.
28 | - New) Colab all-in-one notebook [Run-All-on-Colab.ipynb](Run-All-on-Colab.ipynb). You can run the entire training/evaluation pipeline online.
29 | 
30 | ### 2-2. What's not
31 | 
32 | - No usual practices/techniques such as normalization, augmentation, and regularization --> these are followed up in the advanced notebooks.
33 | - No cutting-edge networks like those you can find here: [PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition](https://github.com/qiuqiangkong/audioset_tagging_cnn).
34 | 
35 | You can simply run the notebooks to reproduce the results, or try advanced techniques on top of the tutorials.
36 | 
37 | ## 3. Notes on design choices
38 | 
39 | ### 3-1. Input data format: raw audio or spectrogram?
40 | 
41 | If we need to augment input data in the time domain, we feed raw audio to the dataset class.
42 | 
43 | But in this example, as the main design choice, all the data are converted to log-mel spectrograms in advance.
44 | 
45 | - Good: This makes data handling easy, especially in the training pipeline.
46 | - Bad: Applicable data augmentations are limited. The available transformations in torchaudio are [FrequencyMasking](https://pytorch.org/audio/stable/transforms.html#frequencymasking) and [TimeMasking](https://pytorch.org/audio/stable/transforms.html#timemasking).
47 | 
48 | ### 3-2. Input data size
49 | 
50 | The number of frequency bins (n_mels) is set to 64 as a typical choice.
51 | The duration is set to ~~1 second, just as an example~~ 5 seconds in the current configuration, because 1 second was too short for the FSDKaggle2018 dataset.
52 | 
53 | You can find and change these settings in [config.yaml](config.yaml).
54 | 
55 |     clip_length: 5.0 # [sec] -- it was 1.0 s at the initial release.
56 |     n_mels: 64
57 | 
58 | ### 3-3. FFT parameters
59 | 
60 | Typical parameters are configured in [config.yaml](config.yaml).
61 | 
62 |     sample_rate: 44100
63 |     hop_length: 441
64 |     n_fft: 1024
65 |     n_mels: 64
66 |     f_min: 0
67 |     f_max: 22050
68 | 
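For illustration, here is a minimal sketch (not part of the repository's notebooks) that puts 3-1 and 3-3 together: it converts a waveform to a log-mel spectrogram with the parameters above and then applies the two masking transforms. The file name `sample.wav`, the epsilon added before `log()`, and the mask sizes are placeholder assumptions.

    import torch
    import torchaudio
    import torchaudio.transforms as AT

    # Parameters from config.yaml (see 3-3).
    cfg = dict(sample_rate=44100, n_fft=1024, hop_length=441, n_mels=64, f_min=0, f_max=22050)

    # Load a clip; assumed to be recorded at cfg['sample_rate'] already (resample first otherwise).
    waveform, sr = torchaudio.load('sample.wav')  # shape: (channels, samples)

    # Waveform -> log-mel spectrogram, computed once in advance (see 3-1).
    to_mel = AT.MelSpectrogram(sample_rate=cfg['sample_rate'], n_fft=cfg['n_fft'],
                               hop_length=cfg['hop_length'], n_mels=cfg['n_mels'],
                               f_min=cfg['f_min'], f_max=cfg['f_max'])
    log_mel = (to_mel(waveform) + torch.finfo(torch.float).eps).log()  # (channels, n_mels, frames)

    # Spectrogram-domain augmentations that remain applicable after precomputation.
    augment = torch.nn.Sequential(AT.FrequencyMasking(freq_mask_param=8),
                                  AT.TimeMasking(time_mask_param=20))
    augmented = augment(log_mel)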
69 | ## 4. Performance
70 | 
71 | How well do the models trained in the tutorials perform?
72 | 
73 | - The best Kaggle result was reported as MAP@3 = 0.942 (see the [Kaggle 4th place solution](https://www.kaggle.com/c/freesound-audio-tagging/discussion/62634)). Note that this result is an ensemble of 5 models of the same SE-ResNeXt network trained on 5 folds.
74 | - The best result in this repo is MAP@3 = 0.87 (with ResNetish). This is a single-model result, without any data augmentation.
75 | 
76 | The ResNetish result already comes close to the top solution, and there is still room for improvement from data augmentation and regularization techniques.
77 | 
78 | ## References
79 | 
80 | - [1] S. Hershey et al., "CNN Architectures for Large-Scale Audio Classification," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017. Available: https://arxiv.org/abs/1609.09430, https://ai.google/research/pubs/pub45611
81 | 
--------------------------------------------------------------------------------
/advanced/Note-webdataset-smalldata.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "id": "361027b5",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "from dlcliche.notebook import *\n",
11 | "from dlcliche.torch_utils import *"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "id": "e8d4d37d",
17 | "metadata": {},
18 | "source": [
19 | "## Goal\n",
20 | "\n",
21 | "Check if webdataset is useful for downstream datasets, which are typically small.\n",
22 | "\n",
23 | "### Preparing webdataset shards\n",
24 | "\n",
25 | "Used `create_wds_fsd50k.py` to make tar shards encapsulating local 16kHz FSD50K files.\n",
26 | "This resulted in four tar files: `fsd50k-eval-16k-{000000..000003}.tar`.\n",
27 | "\n",
28 | "### Test result\n",
29 | "\n",
30 | "The results show that webdataset is not effective in the small-data regime."
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 24,
36 | "id": "0eab9f25",
37 | "metadata": {},
38 | "outputs": [
39 | {
40 | "name": "stdout",
41 | "output_type": "stream",
42 | "text": [
43 | "9.86 s ± 534 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" 44 | ] 45 | } 46 | ], 47 | "source": [ 48 | "%%timeit\n", 49 | "\n", 50 | "import webdataset as wds\n", 51 | "import io\n", 52 | "import librosa\n", 53 | "\n", 54 | "url = '/data/A/fsd50k/fsd50k-eval-16k-{000000..000003}.tar'\n", 55 | "ds = (\n", 56 | " wds.WebDataset(url)\n", 57 | " .shuffle(1000)\n", 58 | " .to_tuple('wav', 'labels')\n", 59 | ")\n", 60 | "for i, (wav, labels) in enumerate(ds):\n", 61 | " wav = librosa.load(io.BytesIO(wav))\n", 62 | " labels = labels.decode()\n", 63 | " if i > 100:\n", 64 | " break" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 25, 70 | "id": "b61de49f", 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "name": "stdout", 75 | "output_type": "stream", 76 | "text": [ 77 | "9.06 s ± 8.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" 78 | ] 79 | } 80 | ], 81 | "source": [ 82 | "%%timeit\n", 83 | "\n", 84 | "import io\n", 85 | "import librosa\n", 86 | "\n", 87 | "def IterativeDataset(root, files, label_set):\n", 88 | " root = Path(root)\n", 89 | " for fname, labels in zip(files, label_set):\n", 90 | " data = librosa.load(root/fname)\n", 91 | " labels = labels\n", 92 | " yield data, labels\n", 93 | "\n", 94 | "df = pd.read_csv('/lab/AR2021/evar/metadata/fsd50k.csv')\n", 95 | "df = df[df.split == 'test']\n", 96 | "\n", 97 | "for i, (binary, labels) in enumerate(IterativeDataset('work/16k/fsd50k', df.file_name.values, df.label.values)):\n", 98 | " wav = binary\n", 99 | " labels = labels\n", 100 | " if i > 100:\n", 101 | " break\n", 102 | "#print(wav, labels)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "id": "0185a362", 108 | "metadata": {}, 109 | "source": [ 110 | "## Note: create tar shard files by codes" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 33, 116 | "id": "22a05c53", 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/html": [ 122 | "
\n", 123 | "\n", 136 | "\n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | "
fnamelabelsmidssplitkey
0FSD50K.dev_audio/64760.wavElectric_guitar,Guitar,Plucked_string_instrume.../m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlftraintrain_64760
1FSD50K.dev_audio/16399.wavElectric_guitar,Guitar,Plucked_string_instrume.../m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlftraintrain_16399
2FSD50K.dev_audio/16401.wavElectric_guitar,Guitar,Plucked_string_instrume.../m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlftraintrain_16401
\n", 174 | "
" 175 | ], 176 | "text/plain": [ 177 | " fname \\\n", 178 | "0 FSD50K.dev_audio/64760.wav \n", 179 | "1 FSD50K.dev_audio/16399.wav \n", 180 | "2 FSD50K.dev_audio/16401.wav \n", 181 | "\n", 182 | " labels \\\n", 183 | "0 Electric_guitar,Guitar,Plucked_string_instrume... \n", 184 | "1 Electric_guitar,Guitar,Plucked_string_instrume... \n", 185 | "2 Electric_guitar,Guitar,Plucked_string_instrume... \n", 186 | "\n", 187 | " mids split key \n", 188 | "0 /m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf train train_64760 \n", 189 | "1 /m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf train train_16399 \n", 190 | "2 /m/02sgy,/m/0342h,/m/0fx80y,/m/04szw,/m/04rlf train train_16401 " 191 | ] 192 | }, 193 | "execution_count": 33, 194 | "metadata": {}, 195 | "output_type": "execute_result" 196 | } 197 | ], 198 | "source": [ 199 | "def fsd50k_metadata(FSD50K_root):\n", 200 | " FSD = Path(FSD50K_root)\n", 201 | " df = pd.read_csv(FSD/f'FSD50K.ground_truth/dev.csv')\n", 202 | " df['key'] = df.split + '_' + df.fname.apply(lambda s: str(s))\n", 203 | " df['fname'] = df.fname.apply(lambda s: f'FSD50K.dev_audio/{s}.wav')\n", 204 | " dftest = pd.read_csv(FSD/f'FSD50K.ground_truth/eval.csv')\n", 205 | " dftest['key'] = 'eval_' + dftest.fname.apply(lambda s: str(s))\n", 206 | " dftest['split'] = 'eval'\n", 207 | " dftest['fname'] = dftest.fname.apply(lambda s: f'FSD50K.eval_audio/{s}.wav')\n", 208 | " df = pd.concat([df, dftest], ignore_index=True)\n", 209 | " return df\n", 210 | "\n", 211 | "\n", 212 | "df = fsd50k_metadata(FSD50K_root='/data/A/fsd50k/')\n", 213 | "df[:3]" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 56, 219 | "id": "8970bd0b", 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "name": "stdout", 224 | "output_type": "stream", 225 | "text": [ 226 | "Processing 36796 train samples.\n", 227 | "/data/A/fsd50k/FSD50K.dev_audio/64760.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64760\n" 228 | ] 229 | }, 230 | { 231 | "data": { 232 | "text/plain": [ 233 | "{'__key__': 'train_64760',\n", 234 | " 'npy': array([-0.00026427, -0.00128246, 0.00068087, ..., -0.00253225,\n", 235 | " -0.00244647, 0. 
], dtype=float32),\n", 236 | " 'labels': 'Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music'}" 237 | ] 238 | }, 239 | "execution_count": 56, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "import librosa\n", 246 | "\n", 247 | "\n", 248 | "def load_resampled_mono_wav(fpath, sr):\n", 249 | " y, org_sr = librosa.load('/data/A/fsd50k/FSD50K.dev_audio/382455.wav', sr=None, mono=True)\n", 250 | " if org_sr != sr:\n", 251 | " y = librosa.resample(y, orig_sr=org_sr, target_sr=sr)\n", 252 | " return y\n", 253 | "\n", 254 | "\n", 255 | "def fsd50k_generator(root, split, sr):\n", 256 | " root = Path(root)\n", 257 | " df = fsd50k_metadata(FSD50K_root=root)\n", 258 | " df = df[df.split == split]\n", 259 | " print(f'Processing {len(df)} {split} samples.')\n", 260 | " for file_name, labels, key in df[['fname', 'labels', 'key']].values:\n", 261 | " fpath = root/file_name\n", 262 | " print(fpath, labels, key)\n", 263 | "\n", 264 | " sample = {\n", 265 | " '__key__': key,\n", 266 | " 'npy': load_resampled_mono_wav(fpath, sr),\n", 267 | " 'labels': labels,\n", 268 | " }\n", 269 | " yield sample\n", 270 | "\n", 271 | "gen = fsd50k_generator('/data/A/fsd50k/', 'train', 16000)\n", 272 | "next(iter(gen))" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 57, 278 | "id": "a7fb2d1c", 279 | "metadata": {}, 280 | "outputs": [ 281 | { 282 | "name": "stdout", 283 | "output_type": "stream", 284 | "text": [ 285 | "# writing /data/A/fsd50k/train-000000.tar 0 0.0 GB 0\n", 286 | "Processing 36796 train samples.\n", 287 | "/data/A/fsd50k/FSD50K.dev_audio/64760.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64760\n", 288 | "/data/A/fsd50k/FSD50K.dev_audio/16399.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_16399\n", 289 | "/data/A/fsd50k/FSD50K.dev_audio/16401.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_16401\n", 290 | "/data/A/fsd50k/FSD50K.dev_audio/16402.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_16402\n", 291 | "/data/A/fsd50k/FSD50K.dev_audio/16404.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_16404\n", 292 | "/data/A/fsd50k/FSD50K.dev_audio/64761.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64761\n", 293 | "/data/A/fsd50k/FSD50K.dev_audio/268259.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_268259\n", 294 | "/data/A/fsd50k/FSD50K.dev_audio/64762.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64762\n", 295 | "/data/A/fsd50k/FSD50K.dev_audio/40515.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_40515\n", 296 | "/data/A/fsd50k/FSD50K.dev_audio/40516.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_40516\n", 297 | "/data/A/fsd50k/FSD50K.dev_audio/40517.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_40517\n", 298 | "/data/A/fsd50k/FSD50K.dev_audio/64741.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64741\n", 299 | "/data/A/fsd50k/FSD50K.dev_audio/40523.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_40523\n", 300 | "/data/A/fsd50k/FSD50K.dev_audio/64743.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music 
train_64743\n", 301 | "/data/A/fsd50k/FSD50K.dev_audio/64744.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64744\n", 302 | "/data/A/fsd50k/FSD50K.dev_audio/40525.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_40525\n", 303 | "/data/A/fsd50k/FSD50K.dev_audio/64746.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64746\n", 304 | "/data/A/fsd50k/FSD50K.dev_audio/5318.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_5318\n", 305 | "/data/A/fsd50k/FSD50K.dev_audio/4258.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4258\n", 306 | "/data/A/fsd50k/FSD50K.dev_audio/4259.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4259\n", 307 | "/data/A/fsd50k/FSD50K.dev_audio/4260.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4260\n", 308 | "/data/A/fsd50k/FSD50K.dev_audio/4261.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4261\n", 309 | "/data/A/fsd50k/FSD50K.dev_audio/4262.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4262\n", 310 | "/data/A/fsd50k/FSD50K.dev_audio/4263.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4263\n", 311 | "/data/A/fsd50k/FSD50K.dev_audio/4264.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4264\n", 312 | "/data/A/fsd50k/FSD50K.dev_audio/4265.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4265\n", 313 | "/data/A/fsd50k/FSD50K.dev_audio/4266.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4266\n", 314 | "/data/A/fsd50k/FSD50K.dev_audio/4267.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4267\n", 315 | "/data/A/fsd50k/FSD50K.dev_audio/4268.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4268\n", 316 | "/data/A/fsd50k/FSD50K.dev_audio/4269.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4269\n", 317 | "/data/A/fsd50k/FSD50K.dev_audio/4270.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4270\n", 318 | "/data/A/fsd50k/FSD50K.dev_audio/4272.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4272\n", 319 | "/data/A/fsd50k/FSD50K.dev_audio/64757.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64757\n", 320 | "/data/A/fsd50k/FSD50K.dev_audio/4276.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4276\n", 321 | "/data/A/fsd50k/FSD50K.dev_audio/4277.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4277\n", 322 | "/data/A/fsd50k/FSD50K.dev_audio/4278.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4278\n", 323 | "/data/A/fsd50k/FSD50K.dev_audio/4279.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4279\n", 324 | "/data/A/fsd50k/FSD50K.dev_audio/4280.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4280\n", 325 | "/data/A/fsd50k/FSD50K.dev_audio/4281.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4281\n", 326 | "/data/A/fsd50k/FSD50K.dev_audio/4283.wav 
Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4283\n", 327 | "/data/A/fsd50k/FSD50K.dev_audio/4284.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4284\n", 328 | "/data/A/fsd50k/FSD50K.dev_audio/4285.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4285\n", 329 | "/data/A/fsd50k/FSD50K.dev_audio/4286.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4286\n", 330 | "/data/A/fsd50k/FSD50K.dev_audio/4287.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4287\n", 331 | "/data/A/fsd50k/FSD50K.dev_audio/4288.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4288\n", 332 | "/data/A/fsd50k/FSD50K.dev_audio/4289.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4289\n", 333 | "/data/A/fsd50k/FSD50K.dev_audio/5314.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_5314\n", 334 | "/data/A/fsd50k/FSD50K.dev_audio/4290.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4290\n", 335 | "/data/A/fsd50k/FSD50K.dev_audio/4291.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_4291\n", 336 | "/data/A/fsd50k/FSD50K.dev_audio/5310.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_5310\n", 337 | "/data/A/fsd50k/FSD50K.dev_audio/64703.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64703\n", 338 | "/data/A/fsd50k/FSD50K.dev_audio/5312.wav Electric_guitar,Bass_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_5312\n", 339 | "/data/A/fsd50k/FSD50K.dev_audio/64704.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64704\n", 340 | "/data/A/fsd50k/FSD50K.dev_audio/64706.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64706\n", 341 | "/data/A/fsd50k/FSD50K.dev_audio/64707.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64707\n", 342 | "/data/A/fsd50k/FSD50K.dev_audio/64708.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64708\n", 343 | "/data/A/fsd50k/FSD50K.dev_audio/5315.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_5315\n", 344 | "/data/A/fsd50k/FSD50K.dev_audio/5317.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_5317\n", 345 | "/data/A/fsd50k/FSD50K.dev_audio/64711.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64711\n", 346 | "/data/A/fsd50k/FSD50K.dev_audio/64712.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64712\n", 347 | "/data/A/fsd50k/FSD50K.dev_audio/64714.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64714\n", 348 | "/data/A/fsd50k/FSD50K.dev_audio/64715.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64715\n", 349 | "/data/A/fsd50k/FSD50K.dev_audio/64717.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64717\n", 350 | "/data/A/fsd50k/FSD50K.dev_audio/64718.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64718\n", 351 | "/data/A/fsd50k/FSD50K.dev_audio/64720.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64720\n" 352 | 
] 353 | }, 354 | { 355 | "name": "stdout", 356 | "output_type": "stream", 357 | "text": [ 358 | "/data/A/fsd50k/FSD50K.dev_audio/64721.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64721\n", 359 | "/data/A/fsd50k/FSD50K.dev_audio/64722.wav Electric_guitar,Guitar,Plucked_string_instrument,Musical_instrument,Music train_64722\n" 360 | ] 361 | }, 362 | { 363 | "ename": "KeyboardInterrupt", 364 | "evalue": "", 365 | "output_type": "error", 366 | "traceback": [ 367 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 368 | "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", 369 | "\u001b[0;32m/tmp/ipykernel_2207172/328821770.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mwds\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mShardWriter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmax_count\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0msink\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0msample\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mislice\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfsd50k_generator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msource_dir\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msplit\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m100\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 13\u001b[0m \u001b[0msink\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwrite\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msample\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 370 | "\u001b[0;32m/tmp/ipykernel_2207172/1443749655.py\u001b[0m in \u001b[0;36mfsd50k_generator\u001b[0;34m(root, split, sr)\u001b[0m\n\u001b[1;32m 20\u001b[0m sample = {\n\u001b[1;32m 21\u001b[0m \u001b[0;34m'__key__'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 22\u001b[0;31m \u001b[0;34m'npy'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mload_resampled_mono_wav\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfpath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 23\u001b[0m \u001b[0;34m'labels'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mlabels\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 24\u001b[0m }\n", 371 | "\u001b[0;32m/tmp/ipykernel_2207172/1443749655.py\u001b[0m in \u001b[0;36mload_resampled_mono_wav\u001b[0;34m(fpath, sr)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morg_sr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlibrosa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'/data/A/fsd50k/FSD50K.dev_audio/382455.wav'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmono\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0morg_sr\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0msr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlibrosa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morig_sr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0morg_sr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget_sr\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 372 | "\u001b[0;32m~/anaconda3/lib/python3.9/site-packages/librosa/core/audio.py\u001b[0m in \u001b[0;36mresample\u001b[0;34m(y, orig_sr, target_sr, res_type, fix, scale, **kwargs)\u001b[0m\n\u001b[1;32m 602\u001b[0m \u001b[0my_hat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msoxr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morig_sr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget_sr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mquality\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mres_type\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 603\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 604\u001b[0;31m \u001b[0my_hat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mresampy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morig_sr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget_sr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfilter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mres_type\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 605\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 606\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mfix\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 373 | "\u001b[0;32m~/anaconda3/lib/python3.9/site-packages/resampy/core.py\u001b[0m in \u001b[0;36mresample\u001b[0;34m(x, sr_orig, sr_new, axis, filter, **kwargs)\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0mx_2d\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mswapaxes\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 119\u001b[0m \u001b[0my_2d\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mswapaxes\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 120\u001b[0;31m \u001b[0mresample_f\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_2d\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_2d\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msample_ratio\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minterp_win\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minterp_delta\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprecision\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 121\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 122\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 374 | "\u001b[0;31mKeyboardInterrupt\u001b[0m: " 375 | ] 376 | } 377 | ], 378 | "source": [ 379 | "import webdataset as wds\n", 380 | "from itertools import islice\n", 381 | "\n", 382 | "\n", 383 | "source_dir = '/data/A/fsd50k/'\n", 384 | "split = 'train'\n", 385 | "sr = 16000\n", 386 | "output_name = f'/data/A/fsd50k/{split}-%06d.tar'\n", 387 | "max_count = 10000\n", 388 | "\n", 389 | "with wds.ShardWriter(output_name, max_count) as sink:\n", 390 | " for sample in islice(fsd50k_generator(source_dir, split, sr), 0, 100):\n", 391 | " sink.write(sample)" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "id": "bfc6032c", 397 | "metadata": {}, 398 | "source": [ 399 | "## Note: creating dataset tar archives with command-line\n", 400 | "\n", 401 | "\n", 402 | "### Install go and tarp commands\n", 403 | "\n", 404 | "https://github.com/webdataset/tarp\n", 405 | "\n", 406 | "- `sudo apt install golang-go`\n", 407 | "- `go get -v github.com/tmbdev/tarp/tarp`\n", 408 | "\n", 409 | "### Create tar archive\n", 410 | "\n", 411 | "- `tar --sort=name -cf your_archive.tar your_folders`\n", 412 | "- `find your_folder - type f -print| sort | tar -cf your_archive.tar - T -'\n", 413 | "\n", 414 | "### Shuffle and split\n", 415 | "\n", 416 | "- `tar --sorted -cf - your_folders | tarp" 417 | ] 418 | } 419 | ], 420 | "metadata": { 421 | "kernelspec": { 422 | "display_name": "base", 423 | "language": "python", 424 | "name": "base" 425 | }, 426 | "language_info": { 427 | "codemirror_mode": { 428 | "name": "ipython", 429 | "version": 3 430 | }, 431 | "file_extension": ".py", 432 | "mimetype": "text/x-python", 433 | "name": "python", 434 | "nbconvert_exporter": "python", 435 | "pygments_lexer": "ipython3", 436 | "version": "3.9.7" 437 | } 438 | }, 439 | "nbformat": 4, 440 | "nbformat_minor": 5 441 | } 442 | -------------------------------------------------------------------------------- /advanced/__init__.py: -------------------------------------------------------------------------------- 1 | # adaanced 2 | 3 | -------------------------------------------------------------------------------- /advanced/create_wds_fsd50k.py: -------------------------------------------------------------------------------- 1 | """webdataset 2 | python create_wds_fsd50k.py work/16k/fsd50k /data/A/fsd50k eval 16k 3 | """ 4 | 5 | import sys 6 | from multiprocessing import Pool 7 | from pathlib import Path 8 | import pandas as pd 9 | import webdataset as wds 10 | from itertools import islice 
11 | import librosa 12 | import fire 13 | 14 | 15 | def fsd50k_metadata(FSD50K_root): 16 | FSD = Path(FSD50K_root) 17 | df = pd.read_csv(FSD/f'FSD50K.ground_truth/dev.csv') 18 | df['key'] = df.split + '_' + df.fname.apply(lambda s: str(s)) 19 | df['fname'] = df.fname.apply(lambda s: f'FSD50K.dev_audio/{s}.wav') 20 | dftest = pd.read_csv(FSD/f'FSD50K.ground_truth/eval.csv') 21 | dftest['key'] = 'eval_' + dftest.fname.apply(lambda s: str(s)) 22 | dftest['split'] = 'eval' 23 | dftest['fname'] = dftest.fname.apply(lambda s: f'FSD50K.eval_audio/{s}.wav') 24 | df = pd.concat([df, dftest], ignore_index=True) 25 | return df 26 | 27 | 28 | def load_resampled_mono_wav(fpath, sr): 29 | with open(fpath, 'rb') as f: 30 | y = f.read() 31 | # y, org_sr = librosa.load(fpath, sr=None, mono=True) 32 | # if org_sr != sr: 33 | # y = librosa.resample(y, orig_sr=org_sr, target_sr=sr) 34 | return y 35 | 36 | 37 | def _converter_worker(args): 38 | fpath, sr = args 39 | return load_resampled_mono_wav(fpath, sr) 40 | 41 | 42 | def fsd50k_generator(root, split, sr): 43 | root = Path(root) 44 | df = fsd50k_metadata(FSD50K_root=root) 45 | df = df[df.split == split] 46 | print(f'Processing {len(df)} {split} samples.') 47 | for file_name, labels, key in df[['fname', 'labels', 'key']].values: 48 | fpath = root/file_name 49 | 50 | sample = { 51 | '__key__': key, 52 | 'wav': fpath, # load_resampled_mono_wav(fpath, sr), 53 | 'labels': labels, 54 | } 55 | yield sample 56 | 57 | 58 | def create_wds(source, output, split, sr, name='fsd50k-[SPLIT]-[SR]-%06d.tar', maxsize=10**9): 59 | source = source 60 | name = name.replace('[SPLIT]', split).replace('[SR]', str(sr)) 61 | output_name = str(Path(output)/name) 62 | 63 | gen = fsd50k_generator(source, split, sr) 64 | with wds.ShardWriter(output_name, maxsize=maxsize) as sink: 65 | while True: 66 | samples = list(islice(gen, 100)) 67 | if len(samples) == 0: 68 | break 69 | # load and resample wav files 70 | with Pool() as p: 71 | args = [[s['wav'], sr] for s in samples] 72 | wavs = list(p.imap(_converter_worker, args)) 73 | for s, wav in zip(samples, wavs): 74 | s['wav'] = wav 75 | sink.write(s) 76 | print('.', end='') 77 | sys.stdout.flush() 78 | print('Finished') 79 | 80 | 81 | if __name__ == '__main__': 82 | fire.Fire(create_wds) 83 | -------------------------------------------------------------------------------- /advanced/create_wds_fsd50k_resample.py: -------------------------------------------------------------------------------- 1 | """webdataset 2 | """ 3 | 4 | import sys 5 | from multiprocessing import Pool 6 | from pathlib import Path 7 | import pandas as pd 8 | import webdataset as wds 9 | from itertools import islice 10 | import librosa 11 | import fire 12 | 13 | 14 | def fsd50k_metadata(FSD50K_root): 15 | FSD = Path(FSD50K_root) 16 | df = pd.read_csv(FSD/f'FSD50K.ground_truth/dev.csv') 17 | df['key'] = df.split + '_' + df.fname.apply(lambda s: str(s)) 18 | df['fname'] = df.fname.apply(lambda s: f'FSD50K.dev_audio/{s}.wav') 19 | dftest = pd.read_csv(FSD/f'FSD50K.ground_truth/eval.csv') 20 | dftest['key'] = 'eval_' + dftest.fname.apply(lambda s: str(s)) 21 | dftest['split'] = 'eval' 22 | dftest['fname'] = dftest.fname.apply(lambda s: f'FSD50K.eval_audio/{s}.wav') 23 | df = pd.concat([df, dftest], ignore_index=True) 24 | return df 25 | 26 | 27 | def load_resampled_mono_wav(fpath, sr): 28 | y, org_sr = librosa.load(fpath, sr=None, mono=True) 29 | if org_sr != sr: 30 | y = librosa.resample(y, orig_sr=org_sr, target_sr=sr) 31 | return y 32 | 33 | 34 | def 
_converter_worker(args): 35 | fpath, sr = args 36 | return load_resampled_mono_wav(fpath, sr) 37 | 38 | 39 | def fsd50k_generator(root, split, sr): 40 | root = Path(root) 41 | df = fsd50k_metadata(FSD50K_root=root) 42 | df = df[df.split == split] 43 | print(f'Processing {len(df)} {split} samples.') 44 | for file_name, labels, key in df[['fname', 'labels', 'key']].values: 45 | fpath = root/file_name 46 | 47 | sample = { 48 | '__key__': key, 49 | 'npy': fpath, # load_resampled_mono_wav(fpath, sr), 50 | 'labels': labels, 51 | } 52 | yield sample 53 | 54 | 55 | def create_wds(source, output, split, sr, name='fsd50k-[SPLIT]-[SR]-%06d.tar', maxsize=10**9): 56 | source = source 57 | name = name.replace('[SPLIT]', split).replace('[SR]', str(sr)) 58 | output_name = str(Path(output)/name) 59 | 60 | gen = fsd50k_generator(source, split, sr) 61 | with wds.ShardWriter(output_name, maxsize=maxsize) as sink: 62 | while True: 63 | samples = list(islice(gen, 100)) 64 | if len(samples) == 0: 65 | break 66 | # load and resample wav files 67 | with Pool() as p: 68 | args = [[s['npy'], sr] for s in samples] 69 | npys = list(p.imap(_converter_worker, args)) 70 | for s, npy in zip(samples, npys): 71 | s['npy'] = npy 72 | sink.write(s) 73 | print('.', end='') 74 | sys.stdout.flush() 75 | print('Finished') 76 | 77 | 78 | if __name__ == '__main__': 79 | fire.Fire(create_wds) 80 | -------------------------------------------------------------------------------- /advanced/fat2018.py: -------------------------------------------------------------------------------- 1 | """Multi-fold Freesound Audio Tagging solution. 2 | """ 3 | 4 | from src.libs import * 5 | import datetime 6 | from advanced.metric_fat2018 import eval_fat2018_all_splits, eval_fat2018_by_probas 7 | from src.models import resnetish18, VGGish, AlexNet 8 | 9 | 10 | def report_result(message): 11 | print(message) 12 | # you might want to report to slack or anything here 13 | 14 | 15 | def get_transforms(cfg): 16 | NF = cfg.n_mels 17 | NT = cfg.unit_length 18 | augs = [] 19 | for a in cfg.aug.split('x'): 20 | if a == 'RC': 21 | augs.append(GenericRandomResizedCrop((NF, NT), scale=(0.8, 1.0), ratio=(NF/(NT*1.2), NF/(NT*0.8)))) 22 | elif a == 'SA': 23 | augs.append(AT.FrequencyMasking(NF//10)) 24 | augs.append(AT.TimeMasking(NT//10)) 25 | else: 26 | if a: 27 | raise Exception(f'unknown: {a}') 28 | tfms = VT.Compose(augs) 29 | print(tfms) 30 | return tfms 31 | 32 | 33 | def get_model(cfg, num_classes): 34 | if cfg.model == 'AN': 35 | return AlexNet(num_classes) 36 | if cfg.model == 'R18': 37 | return resnetish18(num_classes) 38 | if cfg.model == 'VGG': 39 | return VGGish(num_classes) 40 | raise Exception(f'unknown: {cfg.model}') 41 | 42 | 43 | def read_metadata(cfg): 44 | # Make lists of filenames and labels from meta files 45 | filenames, labels = {}, {} 46 | for split, npy_folder, meta_filename in [['train', f'work/{cfg.type}/FSDKaggle2018.audio_train', 'train_post_competition.csv'], 47 | ['test', f'work/{cfg.type}/FSDKaggle2018.audio_test', 'test_post_competition_scoring_clips.csv']]: 48 | df = pd.read_csv(cfg.data_root/'FSDKaggle2018.meta'/meta_filename) 49 | filenames[split] = np.array([(npy_folder + '/' + fname.replace('.wav', '.npy')) for fname in df.fname.values]) 50 | labels[split] = list(df.label.values) 51 | 52 | # Make a list of classes, converting labels into numbers 53 | classes = sorted(set(labels['train'] + labels['test'])) 54 | for split in labels: 55 | labels[split] = np.array([classes.index(label) for label in labels[split]]) 56 | 57 | return 
filenames, labels, classes 58 | 59 | 60 | def calc_stat(cfg, filenames, labels, classes, calc_stat=False, n_calc_stat=10000): 61 | print(labels) 62 | class_weight = compute_class_weight('balanced', range(len(classes)), labels['train']) 63 | class_weight = torch.tensor(class_weight).to(torch.float) 64 | 65 | if calc_stat: 66 | all_train_lms = np.hstack([np.load(f)[0] for f in filenames['train'][:n_calc_stat]]) 67 | train_mean_std = all_train_lms.mean(), all_train_lms.std() 68 | print(train_mean_std) 69 | else: 70 | train_mean_std = None 71 | 72 | return class_weight, train_mean_std 73 | 74 | 75 | def run(config_file='config-fat2018.yaml', epochs=None, finetune_epochs=None, mixup=None, aug=None, norm=False): 76 | print(config_file, epochs, mixup, aug) 77 | cfg = load_config(config_file) 78 | cfg.epochs = epochs or cfg.epochs 79 | cfg.finetune_epochs = finetune_epochs or cfg.finetune_epochs 80 | cfg.mixup = cfg.mixup if mixup is None else mixup 81 | cfg.aug = aug or cfg.aug or '' 82 | filenames, labels, classes = read_metadata(cfg) 83 | class_weight, train_mean_std = calc_stat(cfg, filenames, labels, classes, calc_stat=norm) 84 | 85 | name = datetime.datetime.now().strftime('%y%m%d%H%M') 86 | name = f'model{cfg.type}-{cfg.model}-{cfg.aug}-m{str(cfg.mixup)[2:]}{"-N" if norm else ""}-{name}' 87 | 88 | weight_folder = Path('work/' + name) 89 | weight_folder.mkdir(parents=True, exist_ok=True) 90 | results, all_file_probas = [], [] 91 | print(f'Training {weight_folder}') 92 | 93 | skf = StratifiedKFold(n_splits=cfg.n_folds) 94 | for fold, (train_index, test_index) in enumerate(skf.split(filenames['train'], labels['train'])): 95 | print("TRAIN:", len(train_index), "TEST:", len(test_index)) 96 | train_files, val_files = filenames['train'][train_index], filenames['train'][test_index] 97 | train_ys, val_ys = labels['train'][train_index], labels['train'][test_index] 98 | 99 | train_dataset = LMSClfDataset(cfg, train_files, train_ys, norm_mean_std=train_mean_std, 100 | transforms=get_transforms(cfg)) 101 | valid_dataset = LMSClfDataset(cfg, val_files, val_ys, norm_mean_std=train_mean_std) 102 | train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=cfg.bs, shuffle=True, pin_memory=True, 103 | num_workers=multiprocessing.cpu_count()) 104 | valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=cfg.bs, pin_memory=True, 105 | num_workers=multiprocessing.cpu_count()) 106 | 107 | # main training 108 | model = get_model(cfg, len(classes)) 109 | dataloaders = [train_loader, valid_loader, None] 110 | learner = LMSClfLearner(model, dataloaders, mixup_alpha=cfg.mixup, weight=class_weight) 111 | checkpoint = pl.callbacks.ModelCheckpoint(monitor='val_acc') 112 | trainer = pl.Trainer(gpus=1, max_epochs=cfg.epochs, callbacks=[checkpoint]) 113 | trainer.fit(learner) 114 | # result for now 115 | learner.load_state_dict(torch.load(checkpoint.best_model_path)['state_dict']) 116 | (acc, MAP3), file_probas = eval_fat2018_all_splits(cfg, model, device, filenames['test'], labels['test'], 117 | norm_mean_std=train_mean_std, debug_name='test') 118 | 119 | # fine tuning 120 | learner = LMSClfLearner(model, dataloaders, mixup_alpha=0.0, learning_rate=1e-4, weight=class_weight) 121 | checkpoint = pl.callbacks.ModelCheckpoint(monitor='val_acc') 122 | trainer = pl.Trainer(gpus=1, max_epochs=cfg.finetune_epochs, callbacks=[checkpoint]) 123 | trainer.fit(learner) 124 | # result for fine tuned model 125 | learner.load_state_dict(torch.load(checkpoint.best_model_path)['state_dict']) 126 | (acc, MAP3), 
file_probas = eval_fat2018_all_splits(cfg, model, device, filenames['test'], labels['test'], 127 | norm_mean_std=train_mean_std, debug_name='test') 128 | all_file_probas.append(file_probas) 129 | results.append(MAP3) 130 | 131 | fold_weight = weight_folder/f'{fold}-{Path(checkpoint.best_model_path).name}' 132 | copy_file(checkpoint.best_model_path, fold_weight) 133 | print(f'Saved fold#{fold} weight as {fold_weight}') 134 | 135 | mean_file_probas = np.array(all_file_probas).mean(axis=0) 136 | acc, MAP3 = eval_fat2018_by_probas(mean_file_probas, labels['test'], debug_name='test') 137 | np.save(weight_folder/'ens_probas.npy', mean_file_probas) 138 | report_text = f'{name},{epochs},{aug},{mixup},{norm},{MAP3},{np.mean(results)}' 139 | report_result(report_text) 140 | 141 | 142 | if __name__ == '__main__': 143 | fire.Fire(run) 144 | 145 | 146 | -------------------------------------------------------------------------------- /advanced/metric_fat2018.py: -------------------------------------------------------------------------------- 1 | # Based on https://github.com/DCASE-REPO/dcase2018_baseline/blob/master/task2/evaluation.py 2 | 3 | from src.libs import * 4 | import datetime 5 | import numpy as np 6 | import torch 7 | import multiprocessing 8 | 9 | 10 | def one_ap(gt, topk): 11 | for i, p in enumerate(topk): 12 | if gt == p: 13 | return 1.0 / (i + 1.0) 14 | return 0.0 15 | 16 | 17 | def avg_precision(gts=None, topks=None): 18 | return np.array([one_ap(gt, topk) for gt, topk in zip(gts, topks)]) 19 | 20 | 21 | def eval_fat2018_by_probas(probas, labels, debug_name=None, TOP_K=3): 22 | correct = ap = 0.0 23 | for proba, label in zip(probas, labels): 24 | topk = proba.argsort()[-TOP_K:][::-1] 25 | correct += int(topk[0] == label) 26 | ap += one_ap(label, topk) 27 | acc = correct / len(labels) 28 | mAP = ap / len(labels) 29 | if debug_name: 30 | print(f'{debug_name} acc = {acc:.4f}, MAP@{TOP_K} = {mAP}') 31 | return acc, mAP 32 | 33 | 34 | def eval_fat2018(model, device, dataloader, debug_name=None, TTA=1): 35 | model = model.to(device).eval() 36 | all_probas, labels = [], [] 37 | with torch.no_grad(): 38 | for _ in range(TTA): 39 | for X, gts in dataloader: 40 | preds = model(X.to(device)) 41 | probas = preds.softmax(1) 42 | all_probas.extend(probas.cpu().numpy()) 43 | labels.extend(gts.cpu().numpy()) 44 | all_probas = np.array(all_probas) 45 | return eval_fat2018_by_probas(all_probas, labels, debug_name=debug_name), all_probas 46 | 47 | 48 | def eval_fat2018_all_splits(cfg, model, device, filenames, labels, norm_mean_std=None, debug_name=None, head_n=999, agg='mean'): 49 | model = model.to(device).eval() 50 | file_probas = [[] for _ in range(len(labels))] 51 | test_dataset = SplitAllDataset(cfg, filenames, norm_mean_std=norm_mean_std, head_n=head_n) 52 | test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=cfg.bs, 53 | num_workers=multiprocessing.cpu_count(), pin_memory=True) 54 | print(f'Predicting all {len(test_dataset)} splits for {len(labels)} files...') 55 | for X, fileidxs in test_loader: 56 | with torch.no_grad(): 57 | preds = model(X.to(device)) 58 | probas = F.softmax(preds, dim=1) 59 | for idx, prob in zip(fileidxs.cpu().numpy(), probas.cpu().numpy()): 60 | file_probas[idx].append(prob) 61 | 62 | if agg == 'max': 63 | file_probas = np.array([np.max(probas, axis=0) for probas in file_probas]) 64 | elif agg == 'mean': 65 | file_probas = np.array([np.mean(probas, axis=0) for probas in file_probas]) 66 | else: 67 | raise Exception() 68 | 69 | return 
eval_fat2018_by_probas(file_probas, labels, debug_name=debug_name), file_probas 70 | -------------------------------------------------------------------------------- /advanced/preprocess_fat2018.py: -------------------------------------------------------------------------------- 1 | """Preprocess Freesound Audio Tagging 2018 competition data. 2 | """ 3 | 4 | import warnings 5 | warnings.simplefilter('ignore') 6 | 7 | from src.libs import * 8 | from tqdm import tqdm 9 | import fire 10 | 11 | def convert(config='config.yaml'): 12 | cfg = load_config(config) 13 | print(cfg) 14 | DATA_ROOT = Path(cfg.data_root) 15 | DEST = Path('work')/cfg.type 16 | 17 | folders = ['FSDKaggle2018.audio_test', 'FSDKaggle2018.audio_train'] 18 | 19 | to_mel_spectrogram = torchaudio.transforms.MelSpectrogram( 20 | sample_rate=cfg.sample_rate, n_fft=cfg.n_fft, n_mels=cfg.n_mels, 21 | hop_length=cfg.hop_length, f_min=cfg.f_min, f_max=cfg.f_max) 22 | 23 | for folder in folders: 24 | cur_folder = DATA_ROOT/folder 25 | filenames = sorted(cur_folder.glob('*.wav')) 26 | resampler = None 27 | for filename in tqdm(filenames): 28 | # Load waveform 29 | waveform, sr = torchaudio.load(filename) 30 | #assert sr == cfg.sample_rate 31 | if sr != cfg.sample_rate: 32 | if resampler is None: 33 | resampler = torchaudio.transforms.Resample(sr, cfg.sample_rate) 34 | print(f'CAUTION: RESAMPLING from {sr} Hz to {cfg.sample_rate} Hz.') 35 | waveform = resampler(waveform) 36 | # To log-mel spectrogram 37 | log_mel_spec = to_mel_spectrogram(waveform).log() 38 | # Write to work 39 | (DEST/folder).mkdir(parents=True, exist_ok=True) 40 | np.save(DEST/folder/filename.name.replace('.wav', '.npy'), log_mel_spec) 41 | 42 | 43 | fire.Fire(convert) 44 | -------------------------------------------------------------------------------- /config-fat2018.yaml: -------------------------------------------------------------------------------- 1 | # type name of this configuration 2 | type: B 3 | 4 | # basic setting parameters 5 | clip_length: 5.0 # [sec] 6 | 7 | # preprocessing parameters 8 | sample_rate: 44100 9 | hop_length: 441 10 | n_fft: 2048 11 | n_mels: 64 12 | f_min: 0 13 | f_max: 22050 14 | 15 | # test parameters 16 | bs: 64 #128 17 | mixup: 0.4 18 | n_folds: 5 19 | epochs: 300 20 | finetune_epochs: 20 21 | aug: RCxSA # RC: random resized crop, SA: spec augment 22 | model: R18 # R18: ResNetish18, VGG: VGGish 23 | 24 | # dataset configurations 25 | data_root: /data/A/2018fsd -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | # basic setting parameters 2 | clip_length: 5.0 # [sec] 3 | 4 | # preprocessing parameters 5 | sample_rate: 44100 6 | hop_length: 441 7 | n_fft: 1024 8 | n_mels: 64 9 | f_min: 0 10 | f_max: 22050 11 | -------------------------------------------------------------------------------- /for_evar/README.md: -------------------------------------------------------------------------------- 1 | # For EVAR 2 | 3 | [EVAR](https://github.com/nttcslab/eval-audio-repr) is a evaluation package for audio representations. 4 | 5 | This subfolder holds files belonging to opensource for EVAR. 6 | 7 | ## Acknoledgement 8 | 9 | We use/borrow [PANNs](https://github.com/qiuqiangkong/audioset_tagging_cnn) implementation. 
10 | 11 | - https://github.com/qiuqiangkong/audioset_tagging_cnn 12 | -------------------------------------------------------------------------------- /for_evar/cnn14_decoupled.py: -------------------------------------------------------------------------------- 1 | """CNN14 network, decoupled from Spectrogram, LogmelFilterBank, SpecAugmentation, and classifier head. 2 | 3 | ## Reference 4 | - [1] https://arxiv.org/abs/1912.10211 "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition" 5 | - [2] https://github.com/qiuqiangkong/audioset_tagging_cnn 6 | """ 7 | 8 | import torch 9 | from torch import nn 10 | import torch.nn.functional as F 11 | from torchlibrosa.stft import Spectrogram, LogmelFilterBank 12 | 13 | 14 | class AudioFeatureExtractor(nn.Module): 15 | def __init__(self, sample_rate=16000, n_fft=512, n_mels=64, hop_length=160, win_length=512, f_min=50, f_max=8000): 16 | super().__init__() 17 | 18 | # Spectrogram extractor 19 | self.spectrogram_extractor = Spectrogram(n_fft=n_fft, hop_length=hop_length, 20 | win_length=win_length, window='hann', center=True, pad_mode='reflect', 21 | freeze_parameters=True) 22 | 23 | # Logmel feature extractor 24 | self.logmel_extractor = LogmelFilterBank(sr=sample_rate, n_fft=win_length, 25 | n_mels=n_mels, fmin=f_min, fmax=f_max, ref=1.0, amin=1e-10, top_db=None, 26 | freeze_parameters=True) 27 | 28 | def forward(self, batch_audio): 29 | x = self.spectrogram_extractor(batch_audio) # (B, 1, T, F(freq_bins)) 30 | x = self.logmel_extractor(x) # (B, 1, T, F(mel_bins)) 31 | return x 32 | 33 | 34 | def initialize_layers(layer): 35 | # initialize all childrens first. 36 | for l in layer.children(): 37 | initialize_layers(l) 38 | 39 | # initialize only linaer 40 | if type(layer) != nn.Linear: 41 | return 42 | 43 | # Thanks to https://github.com/qiuqiangkong/audioset_tagging_cnn/blob/d2f4b8c18eab44737fcc0de1248ae21eb43f6aa4/pytorch/models.py#L10 44 | nn.init.xavier_uniform_(layer.weight) 45 | if hasattr(layer, 'bias'): 46 | if layer.bias is not None: 47 | layer.bias.data.fill_(0.) 48 | 49 | 50 | def init_bn(bn): 51 | """Initialize a Batchnorm layer. """ 52 | bn.bias.data.fill_(0.) 53 | bn.weight.data.fill_(1.) 
54 | 55 | 56 | class ConvBlock(nn.Module): 57 | def __init__(self, in_channels, out_channels): 58 | 59 | super(ConvBlock, self).__init__() 60 | 61 | self.conv1 = nn.Conv2d(in_channels=in_channels, 62 | out_channels=out_channels, 63 | kernel_size=(3, 3), stride=(1, 1), 64 | padding=(1, 1), bias=False) 65 | 66 | self.conv2 = nn.Conv2d(in_channels=out_channels, 67 | out_channels=out_channels, 68 | kernel_size=(3, 3), stride=(1, 1), 69 | padding=(1, 1), bias=False) 70 | 71 | self.bn1 = nn.BatchNorm2d(out_channels) 72 | self.bn2 = nn.BatchNorm2d(out_channels) 73 | 74 | self.init_weight() 75 | 76 | def init_weight(self): 77 | initialize_layers(self.conv1) 78 | initialize_layers(self.conv2) 79 | init_bn(self.bn1) 80 | init_bn(self.bn2) 81 | 82 | 83 | def forward(self, input, pool_size=(2, 2), pool_type='avg'): 84 | 85 | x = input 86 | x = F.relu_(self.bn1(self.conv1(x))) 87 | x = F.relu_(self.bn2(self.conv2(x))) 88 | if pool_type == 'max': 89 | x = F.max_pool2d(x, kernel_size=pool_size) 90 | elif pool_type == 'avg': 91 | x = F.avg_pool2d(x, kernel_size=pool_size) 92 | elif pool_type == 'avg+max': 93 | x1 = F.avg_pool2d(x, kernel_size=pool_size) 94 | x2 = F.max_pool2d(x, kernel_size=pool_size) 95 | x = x1 + x2 96 | else: 97 | raise Exception('Incorrect argument!') 98 | 99 | return x 100 | 101 | 102 | class Cnn14_Decoupled(nn.Module): 103 | """CNN14 network, decoupled from Spectrogram, LogmelFilterBank, SpecAugmentation, and classifier head. 104 | Original implementation: https://github.com/qiuqiangkong/audioset_tagging_cnn/blob/master/pytorch/models.py 105 | """ 106 | 107 | def __init__(self, n_mels=64, d=2048): 108 | assert d == 2048, 'This implementation accepts d=2048 only, for compatible with the original Cnn14.' 109 | super().__init__() 110 | 111 | self.bn0 = nn.BatchNorm2d(n_mels) 112 | 113 | self.conv_block1 = ConvBlock(in_channels=1, out_channels=64) 114 | self.conv_block2 = ConvBlock(in_channels=64, out_channels=128) 115 | self.conv_block3 = ConvBlock(in_channels=128, out_channels=256) 116 | self.conv_block4 = ConvBlock(in_channels=256, out_channels=512) 117 | self.conv_block5 = ConvBlock(in_channels=512, out_channels=1024) 118 | self.conv_block6 = ConvBlock(in_channels=1024, out_channels=2048) 119 | 120 | self.fc1 = nn.Linear(2048, d, bias=True) 121 | #self.fc_audioset = nn.Linear(d, classes_num, bias=True) 122 | 123 | self.init_weight() 124 | 125 | def init_weight(self): 126 | init_bn(self.bn0) 127 | initialize_layers(self.fc1) 128 | #init_layer(self.fc_audioset) 129 | 130 | def encode(self, x, squash_freq=True): 131 | x = x.transpose(1, 3) 132 | x = self.bn0(x) 133 | x = x.transpose(1, 3) 134 | 135 | x = self.conv_block1(x, pool_size=(2, 2), pool_type='avg') 136 | x = F.dropout(x, p=0.2, training=self.training) 137 | x = self.conv_block2(x, pool_size=(2, 2), pool_type='avg') 138 | x = F.dropout(x, p=0.2, training=self.training) 139 | x = self.conv_block3(x, pool_size=(2, 2), pool_type='avg') 140 | x = F.dropout(x, p=0.2, training=self.training) 141 | x3 = x 142 | x = self.conv_block4(x, pool_size=(2, 2), pool_type='avg') 143 | x = F.dropout(x, p=0.2, training=self.training) 144 | x = self.conv_block5(x, pool_size=(2, 2), pool_type='avg') 145 | x = F.dropout(x, p=0.2, training=self.training) 146 | x = self.conv_block6(x, pool_size=(1, 1), pool_type='avg') 147 | x = F.dropout(x, p=0.2, training=self.training) 148 | if squash_freq: 149 | x = torch.mean(x, dim=3) 150 | return x 151 | 152 | def temporal_pooling(self, x): 153 | (x1, _) = torch.max(x, dim=2) 154 | x2 = torch.mean(x, dim=2) 
155 | x = x1 + x2 156 | x = F.dropout(x, p=0.5, training=self.training) 157 | x = F.relu_(self.fc1(x)) 158 | embedding = F.dropout(x, p=0.5, training=self.training) 159 | return embedding 160 | 161 | def forward(self, x): 162 | x = self.encode(x) 163 | embedding = self.temporal_pooling(x) 164 | 165 | return embedding 166 | -------------------------------------------------------------------------------- /for_evar/sampler.py: -------------------------------------------------------------------------------- 1 | """Samplers. 2 | 3 | Mostly borrowed from: 4 | https://github.com/qiuqiangkong/audioset_tagging_cnn 5 | """ 6 | 7 | import numpy as np 8 | import logging 9 | 10 | 11 | class BalancedRandomSampler(): 12 | """ 13 | This is a simple version of: 14 | https://github.com/qiuqiangkong/audioset_tagging_cnn/blob/d2f4b8c18eab44737fcc0de1248ae21eb43f6aa4/utils/data_generator.py#L175 15 | """ 16 | def __init__(self, dataset, batch_size, random_seed=42): 17 | 18 | self.dataset = dataset 19 | self.batch_size = batch_size 20 | self.random_state = np.random.RandomState(random_seed) 21 | 22 | self.samples_per_class = np.sum(self.dataset.labels.numpy(), axis=0) 23 | logging.info(f'samples per class: {self.samples_per_class.astype(np.int32)}') 24 | 25 | # Training indexes of all sound classes. E.g.: 26 | # [[0, 11, 12, ...], [3, 4, 15, 16, ...], [7, 8, ...], ...] 27 | self.indexes_per_class = [] 28 | self.classes_num = len(self.dataset.classes) 29 | 30 | for k in range(self.classes_num): 31 | self.indexes_per_class.append( 32 | np.where(dataset.labels[:, k] != 0)[0]) 33 | 34 | # Shuffle indexes 35 | for k in range(self.classes_num): 36 | self.random_state.shuffle(self.indexes_per_class[k]) 37 | 38 | self.queue = [] 39 | self.pointers_of_classes = [0] * self.classes_num 40 | 41 | def expand_queue(self, queue): 42 | classes_set = np.arange(self.classes_num).tolist() 43 | self.random_state.shuffle(classes_set) 44 | queue += classes_set 45 | return queue 46 | 47 | def __iter__(self): 48 | while True: 49 | batch_idxs = [] 50 | for _ in range(self.batch_size): 51 | if len(self.queue) == 0: 52 | self.queue = self.expand_queue(self.queue) 53 | 54 | class_id = self.queue.pop(0) 55 | pointer = self.pointers_of_classes[class_id] 56 | self.pointers_of_classes[class_id] += 1 57 | batch_idxs.append(self.indexes_per_class[class_id][pointer]) 58 | 59 | # When finish one epoch of a sound class, then shuffle its indexes and reset pointer 60 | if self.pointers_of_classes[class_id] >= self.samples_per_class[class_id]: 61 | self.pointers_of_classes[class_id] = 0 62 | self.random_state.shuffle(self.indexes_per_class[class_id]) 63 | 64 | yield batch_idxs 65 | 66 | def __len__(self): 67 | return (len(self.dataset) + self.batch_size - 1) // self.batch_size 68 | 69 | 70 | class InfiniteSampler(object): 71 | def __init__(self, dataset, batch_size, random_seed=42, shuffle=False): 72 | self.df = dataset.df 73 | self.batch_size = batch_size 74 | self.random_state = np.random.RandomState(random_seed) 75 | self.indexes = self.df.index.values.copy() 76 | self.shuffle = shuffle 77 | if self.shuffle: 78 | self.random_state.shuffle(self.indexes) 79 | 80 | def __iter__(self): 81 | pointer = 0 82 | while True: 83 | batch_idxs = [] 84 | for _ in range(self.batch_size): 85 | batch_idxs.append(self.indexes[pointer]) 86 | pointer += 1 87 | if pointer >= len(self.indexes): 88 | pointer = 0 89 | if self.shuffle: 90 | self.random_state.shuffle(self.indexes) 91 | yield batch_idxs 92 | 93 | def __len__(self): 94 | return (len(self.df) + 
self.batch_size - 1) // self.batch_size 95 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch>=1.7.0 2 | torchaudio>=0.7.0 3 | pytorch-lightning 4 | pyyaml 5 | easydict 6 | matplotlib 7 | numpy 8 | jupyter 9 | pandas 10 | scikit-learn 11 | fire 12 | dl-cliche 13 | -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- 1 | # multi-label 2 | -------------------------------------------------------------------------------- /src/augmentations.py: -------------------------------------------------------------------------------- 1 | # Borrowed from https://github.com/pytorch/vision/blob/master/torchvision/transforms/functional.py 2 | import torch 3 | import torch.nn.functional as F 4 | import math 5 | 6 | 7 | class GenericRandomResizedCrop(): 8 | def __init__(self, size, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.)): 9 | self.size = size 10 | self.scale = scale 11 | self.ratio = ratio 12 | 13 | @staticmethod 14 | def get_params(x, scale, ratio): 15 | width, height = x.shape[1:] 16 | area = height * width 17 | 18 | for _ in range(100): 19 | target_area = area * torch.empty(1).uniform_(scale[0], scale[1]).item() 20 | log_ratio = torch.log(torch.tensor(ratio)) 21 | aspect_ratio = torch.exp( 22 | torch.empty(1).uniform_(log_ratio[0], log_ratio[1]) 23 | ).item() 24 | 25 | w = int(round(math.sqrt(target_area * aspect_ratio))) 26 | h = int(round(math.sqrt(target_area / aspect_ratio))) 27 | 28 | if 0 < w <= width and 0 < h <= height: 29 | i = torch.randint(0, height - h + 1, size=(1,)).item() 30 | j = torch.randint(0, width - w + 1, size=(1,)).item() 31 | return i, j, h, w 32 | 33 | # Fallback to central crop 34 | in_ratio = float(width) / float(height) 35 | if in_ratio < min(ratio): 36 | w = width 37 | h = int(round(w / min(ratio))) 38 | elif in_ratio > max(ratio): 39 | h = height 40 | w = int(round(h * max(ratio))) 41 | else: # whole image 42 | w = width 43 | h = height 44 | i = (height - h) // 2 45 | j = (width - w) // 2 46 | return i, j, h, w 47 | 48 | def __call__(self, x): 49 | i, j, h, w = self.get_params(x, self.scale, self.ratio) 50 | x = x[:, j:j+w, i:i+h] 51 | return F.interpolate(x.unsqueeze(0), size=self.size, mode='bicubic', align_corners=True).squeeze(0) 52 | 53 | def __repr__(self): 54 | return f'{self.__class__.__name__}(size={self.size}, scale={self.scale}, ratio={self.ratio})' 55 | -------------------------------------------------------------------------------- /src/libs.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | warnings.simplefilter('ignore') 3 | 4 | # Essential PyTorch 5 | import torch 6 | import torchaudio 7 | 8 | # Other modules used in this notebook 9 | from pathlib import Path 10 | import matplotlib.pyplot as plt 11 | import pandas as pd 12 | import numpy as np 13 | from IPython.display import Audio 14 | import fire 15 | import yaml 16 | import multiprocessing 17 | from easydict import EasyDict 18 | from sklearn.model_selection import train_test_split, StratifiedKFold 19 | from sklearn.utils.class_weight import compute_class_weight 20 | import torch 21 | import torch.nn as nn 22 | import torch.nn.functional as F 23 | import pytorch_lightning as pl 24 | from pytorch_lightning.metrics.functional import accuracy 25 | 26 | import torchvision.transforms as VT 
27 | import torchaudio.transforms as AT 28 | 29 | from dlcliche.torch_utils import IntraBatchMixup 30 | from dlcliche.utils import copy_file 31 | 32 | from src.augmentations import GenericRandomResizedCrop 33 | 34 | 35 | device = torch.device('cuda') 36 | 37 | 38 | def load_config(filename, debug=False): 39 | with open(filename) as conf: 40 | cfg = EasyDict(yaml.safe_load(conf)) 41 | cfg.unit_length = int((cfg.clip_length * cfg.sample_rate + cfg.hop_length - 1) // cfg.hop_length) 42 | cfg.data_root = Path(cfg.data_root) 43 | if debug: 44 | print(cfg) 45 | return cfg 46 | 47 | 48 | def sample_length(log_mel_spec): 49 | return log_mel_spec.shape[-1] 50 | 51 | 52 | class LMSClfDataset(torch.utils.data.Dataset): 53 | def __init__(self, cfg, filenames, labels, transforms=None, norm_mean_std=None): 54 | assert len(filenames) == len(labels), f'Inconsistent length of filenames and labels.' 55 | 56 | self.filenames = filenames 57 | self.labels = labels 58 | self.transforms = transforms 59 | self.norm_mean_std = norm_mean_std 60 | 61 | # Calculate length of clip this dataset will make 62 | self.unit_length = cfg.unit_length 63 | 64 | # Test with first file 65 | assert self[0][0].shape[-1] == self.unit_length, f'Check your files, failed to load {filenames[0]}' 66 | 67 | # Show basic info. 68 | print(f'Dataset will yield log-mel spectrogram {len(self)} data samples in shape [1, {cfg.n_mels}, {self.unit_length}]') 69 | 70 | def __len__(self): 71 | return len(self.filenames) 72 | 73 | def __getitem__(self, index): 74 | assert 0 <= index and index < len(self) 75 | 76 | log_mel_spec = np.load(self.filenames[index]) 77 | 78 | # Normalize 79 | if self.norm_mean_std is not None: 80 | log_mel_spec = (log_mel_spec - self.norm_mean_std[0]) / self.norm_mean_std[1] 81 | 82 | # Padding if sample is shorter than expected - both head & tail are filled with 0s 83 | pad_size = self.unit_length - sample_length(log_mel_spec) 84 | if pad_size > 0: 85 | offset = pad_size // 2 86 | log_mel_spec = np.pad(log_mel_spec, ((0, 0), (0, 0), (offset, pad_size - offset)), 'constant') 87 | 88 | # Random crop 89 | crop_size = sample_length(log_mel_spec) - self.unit_length 90 | if crop_size > 0: 91 | start = np.random.randint(0, crop_size) 92 | log_mel_spec = log_mel_spec[..., start:start + self.unit_length] 93 | 94 | # Apply augmentations 95 | log_mel_spec = torch.Tensor(log_mel_spec) 96 | if self.transforms is not None: 97 | log_mel_spec = self.transforms(log_mel_spec) 98 | 99 | return log_mel_spec, self.labels[index] 100 | 101 | 102 | class SplitAllDataset(torch.utils.data.Dataset): 103 | def __init__(self, cfg, filenames, norm_mean_std=None, head_n=99999): 104 | self.filenames = filenames 105 | self.norm_mean_std = norm_mean_std 106 | 107 | # Calculate length of clip this dataset will make 108 | self.L = cfg.unit_length 109 | 110 | # Get # of splits for all files 111 | self.n_splits = np.array([(np.load(f).shape[-1] + self.L - 1) // self.L for f in filenames]) 112 | self.n_splits = np.clip(1, head_n, self.n_splits) # limit number of splits. 
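# Cumulative split counts; file_index() below uses them to map a flat sample index back to its source file.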
113 | self.sum_splits = np.cumsum(self.n_splits) 114 | 115 | def __len__(self): 116 | return self.sum_splits[-1] 117 | 118 | def file_index(self, index): 119 | return sum((index < self.sum_splits) == False) 120 | 121 | def filename(self, index): 122 | return self.filenames[self.file_index(index)] 123 | 124 | def split_index(self, index): 125 | fidx = self.file_index(index) 126 | prev_sum = self.sum_splits[fidx - 1] if fidx > 0 else 0 127 | return index - prev_sum 128 | 129 | def __getitem__(self, index): 130 | assert 0 <= index and index < len(self) 131 | 132 | log_mel_spec = np.load(self.filename(index)) 133 | start = self.split_index(index) * self.L 134 | log_mel_spec = log_mel_spec[..., start:start + self.L] 135 | 136 | # Normalize 137 | if self.norm_mean_std is not None: 138 | log_mel_spec = (log_mel_spec - self.norm_mean_std[0]) / self.norm_mean_std[1] 139 | 140 | # Padding if sample is shorter than expected - both head & tail are filled with 0s 141 | pad_size = self.L - sample_length(log_mel_spec) 142 | if pad_size > 0: 143 | offset = pad_size // 2 144 | log_mel_spec = np.pad(log_mel_spec, ((0, 0), (0, 0), (offset, pad_size - offset)), 'constant') 145 | 146 | return log_mel_spec, self.file_index(index) 147 | 148 | 149 | class LMSClfLearner(pl.LightningModule): 150 | 151 | def __init__(self, model, dataloaders, learning_rate=3e-4, mixup_alpha=0.0, weight=None): 152 | super().__init__() 153 | self.learning_rate = learning_rate 154 | self.model = model 155 | self.trn_dl, self.val_dl, self.test_dl = dataloaders 156 | self.criterion = nn.CrossEntropyLoss(weight=weight) 157 | self.batch_mixer = IntraBatchMixup(self.criterion, alpha=mixup_alpha) if mixup_alpha > 0.0 else None 158 | 159 | def forward(self, x): 160 | x = self.model(x) 161 | return x 162 | 163 | def step(self, x, y, train): 164 | if self.batch_mixer is None: 165 | preds = self(x) 166 | loss = self.criterion(preds, y) 167 | else: 168 | x, stacked_y = self.batch_mixer.transform(x, y, train=train) 169 | preds = self(x) 170 | loss = self.batch_mixer.criterion(preds, stacked_y) 171 | return preds, loss 172 | 173 | def training_step(self, batch, batch_idx): 174 | x, y = batch 175 | preds, loss = self.step(x, y, train=True) 176 | return loss 177 | 178 | def validation_step(self, batch, batch_idx, split='val'): 179 | x, y = batch 180 | preds, loss = self.step(x, y, train=False) 181 | yhat = torch.argmax(preds, dim=1) 182 | acc = accuracy(yhat, y) 183 | 184 | self.log(f'{split}_loss', loss, prog_bar=True) 185 | self.log(f'{split}_acc', acc, prog_bar=True) 186 | return loss 187 | 188 | def test_step(self, batch, batch_idx): 189 | return self.validation_step(batch, batch_idx, split='test') 190 | 191 | def configure_optimizers(self): 192 | optimizer = torch.optim.AdamW(self.parameters(), lr=self.learning_rate) 193 | return optimizer 194 | 195 | def train_dataloader(self): 196 | return self.trn_dl 197 | 198 | def val_dataloader(self): 199 | return self.val_dl 200 | 201 | def test_dataloader(self): 202 | return self.test_dl -------------------------------------------------------------------------------- /src/lwlrap.py: -------------------------------------------------------------------------------- 1 | # Borrowed from https://github.com/DCASE-REPO/dcase2019_task2_baseline/blob/master/evaluation.py 2 | import numpy as np 3 | 4 | 5 | class Lwlrap(object): 6 | """Computes label-weighted label-ranked average precision (lwlrap).""" 7 | 8 | def __init__(self, class_map): 9 | self.num_classes = 0 10 | self.total_num_samples = 0 11 | 
self._class_map = class_map 12 | 13 | def accumulate(self, batch_truth, batch_scores): 14 | """Accumulate a new batch of samples into the metric. 15 | Args: 16 | batch_truth: np.array of (num_samples, num_classes) giving boolean 17 | ground-truth of presence of that class in that sample for this batch. 18 | batch_scores: np.array of (num_samples, num_classes) giving the 19 | classifier-under-test's real-valued score for each class for each 20 | sample. 21 | """ 22 | assert batch_scores.shape == batch_truth.shape 23 | num_samples, num_classes = batch_truth.shape 24 | if not self.num_classes: 25 | self.num_classes = num_classes 26 | self._per_class_cumulative_precision = np.zeros(self.num_classes) 27 | self._per_class_cumulative_count = np.zeros(self.num_classes, 28 | dtype=int) 29 | assert num_classes == self.num_classes 30 | for truth, scores in zip(batch_truth, batch_scores): 31 | pos_class_indices, precision_at_hits = ( 32 | self._one_sample_positive_class_precisions(scores, truth)) 33 | self._per_class_cumulative_precision[pos_class_indices] += ( 34 | precision_at_hits) 35 | self._per_class_cumulative_count[pos_class_indices] += 1 36 | self.total_num_samples += num_samples 37 | 38 | def _one_sample_positive_class_precisions(self, scores, truth): 39 | """Calculate precisions for each true class for a single sample. 40 | Args: 41 | scores: np.array of (num_classes,) giving the individual classifier scores. 42 | truth: np.array of (num_classes,) bools indicating which classes are true. 43 | Returns: 44 | pos_class_indices: np.array of indices of the true classes for this sample. 45 | pos_class_precisions: np.array of precisions corresponding to each of those 46 | classes. 47 | """ 48 | num_classes = scores.shape[0] 49 | pos_class_indices = np.flatnonzero(truth > 0) 50 | # Only calculate precisions if there are some true classes. 51 | if not len(pos_class_indices): 52 | return pos_class_indices, np.zeros(0) 53 | # Retrieval list of classes for this sample. 54 | retrieved_classes = np.argsort(scores)[::-1] 55 | # class_rankings[top_scoring_class_index] == 0 etc. 56 | class_rankings = np.zeros(num_classes, dtype=int) 57 | class_rankings[retrieved_classes] = range(num_classes) 58 | # Which of these is a true label? 59 | retrieved_class_true = np.zeros(num_classes, dtype=bool) 60 | retrieved_class_true[class_rankings[pos_class_indices]] = True 61 | # Num hits for every truncated retrieval list. 62 | retrieved_cumulative_hits = np.cumsum(retrieved_class_true) 63 | # Precision of retrieval list truncated at each hit, in order of pos_labels. 64 | precision_at_hits = ( 65 | retrieved_cumulative_hits[class_rankings[pos_class_indices]] / 66 | (1 + class_rankings[pos_class_indices].astype(float))) 67 | return pos_class_indices, precision_at_hits 68 | 69 | def per_class_lwlrap(self): 70 | """Return a vector of the per-class lwlraps for the accumulated samples.""" 71 | return (self._per_class_cumulative_precision / 72 | np.maximum(1, self._per_class_cumulative_count)) 73 | 74 | def per_class_weight(self): 75 | """Return a normalized weight vector for the contributions of each class.""" 76 | return (self._per_class_cumulative_count / 77 | float(np.sum(self._per_class_cumulative_count))) 78 | 79 | def overall_lwlrap(self): 80 | """Return the scalar overall lwlrap for accumulated samples.""" 81 | return np.sum(self.per_class_lwlrap() * self.per_class_weight()) 82 | 83 | def __str__(self): 84 | per_class_lwlrap = self.per_class_lwlrap() 85 | # List classes in descending order of lwlrap.
86 | s = (['Lwlrap(%s) = %.6f' % (name, lwlrap) for (lwlrap, name) in 87 | sorted([(per_class_lwlrap[i], self._class_map[i]) for i in range(self.num_classes)], 88 | reverse=True)]) 89 | s.append('Overall lwlrap = %.6f' % (self.overall_lwlrap())) 90 | return '\n'.join(s) 91 | -------------------------------------------------------------------------------- /src/models.py: -------------------------------------------------------------------------------- 1 | """Audio models based on VGGish [1] paper. 2 | 3 | ## About 4 | 5 | Based on following implementations: 6 | 7 | - https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py -- borrowed most of code from this torchvision implementation. 8 | - https://github.com/harritaylor/torchvggish 9 | 10 | ## Disclaimer 11 | 12 | Tried to follow the original paper description, but there could be difference from the real ResNetish/VGGish. 13 | 14 | ## References 15 | 16 | [1] S. Hershey et al., ‘CNN Architectures for Large-Scale Audio Classification’,\ in International Conference on Acoustics, Speech and Signal Processing (ICASSP),2017\ Available: https://arxiv.org/abs/1609.09430, https://ai.google/research/pubs/pub45611 17 | """ 18 | 19 | import torch 20 | from torch import Tensor 21 | import torch.nn as nn 22 | from typing import Type, Any, Callable, Union, List, Optional 23 | 24 | 25 | def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1, dilation: int = 1) -> nn.Conv2d: 26 | """3x3 convolution with padding""" 27 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 28 | padding=dilation, groups=groups, bias=False, dilation=dilation) 29 | 30 | 31 | def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2d: 32 | """1x1 convolution""" 33 | return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False) 34 | 35 | 36 | class BasicBlock(nn.Module): 37 | expansion: int = 1 38 | 39 | def __init__( 40 | self, 41 | inplanes: int, 42 | planes: int, 43 | stride: int = 1, 44 | downsample: Optional[nn.Module] = None, 45 | groups: int = 1, 46 | base_width: int = 64, 47 | dilation: int = 1, 48 | norm_layer: Optional[Callable[..., nn.Module]] = None 49 | ) -> None: 50 | super(BasicBlock, self).__init__() 51 | if norm_layer is None: 52 | norm_layer = nn.BatchNorm2d 53 | if groups != 1 or base_width != 64: 54 | raise ValueError('BasicBlock only supports groups=1 and base_width=64') 55 | if dilation > 1: 56 | raise NotImplementedError("Dilation > 1 not supported in BasicBlock") 57 | # Both self.conv1 and self.downsample layers downsample the input when stride != 1 58 | self.conv1 = conv3x3(inplanes, planes, stride) 59 | self.bn1 = norm_layer(planes) 60 | self.relu = nn.ReLU(inplace=True) 61 | self.conv2 = conv3x3(planes, planes) 62 | self.bn2 = norm_layer(planes) 63 | self.downsample = downsample 64 | self.stride = stride 65 | 66 | def forward(self, x: Tensor) -> Tensor: 67 | identity = x 68 | 69 | out = self.conv1(x) 70 | out = self.bn1(out) 71 | out = self.relu(out) 72 | 73 | out = self.conv2(out) 74 | out = self.bn2(out) 75 | 76 | if self.downsample is not None: 77 | identity = self.downsample(x) 78 | 79 | out += identity 80 | out = self.relu(out) 81 | 82 | return out 83 | 84 | 85 | class Bottleneck(nn.Module): 86 | # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2) 87 | # while original implementation places the stride at the first 1x1 convolution(self.conv1) 88 | # according to "Deep residual learning for image 
recognition"https://arxiv.org/abs/1512.03385. 89 | # This variant is also known as ResNet V1.5 and improves accuracy according to 90 | # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch. 91 | 92 | expansion: int = 4 93 | 94 | def __init__( 95 | self, 96 | inplanes: int, 97 | planes: int, 98 | stride: int = 1, 99 | downsample: Optional[nn.Module] = None, 100 | groups: int = 1, 101 | base_width: int = 64, 102 | dilation: int = 1, 103 | norm_layer: Optional[Callable[..., nn.Module]] = None 104 | ) -> None: 105 | super(Bottleneck, self).__init__() 106 | if norm_layer is None: 107 | norm_layer = nn.BatchNorm2d 108 | width = int(planes * (base_width / 64.)) * groups 109 | # Both self.conv2 and self.downsample layers downsample the input when stride != 1 110 | self.conv1 = conv1x1(inplanes, width) 111 | self.bn1 = norm_layer(width) 112 | self.conv2 = conv3x3(width, width, stride, groups, dilation) 113 | self.bn2 = norm_layer(width) 114 | self.conv3 = conv1x1(width, planes * self.expansion) 115 | self.bn3 = norm_layer(planes * self.expansion) 116 | self.relu = nn.ReLU(inplace=True) 117 | self.downsample = downsample 118 | self.stride = stride 119 | 120 | def forward(self, x: Tensor) -> Tensor: 121 | identity = x 122 | 123 | out = self.conv1(x) 124 | out = self.bn1(out) 125 | out = self.relu(out) 126 | 127 | out = self.conv2(out) 128 | out = self.bn2(out) 129 | out = self.relu(out) 130 | 131 | out = self.conv3(out) 132 | out = self.bn3(out) 133 | 134 | if self.downsample is not None: 135 | identity = self.downsample(x) 136 | 137 | out += identity 138 | out = self.relu(out) 139 | 140 | return out 141 | 142 | 143 | class ResNetish(nn.Module): 144 | 145 | def __init__( 146 | self, 147 | block: Type[Union[BasicBlock, Bottleneck]], 148 | layers: List[int], 149 | num_classes: int = 1000, 150 | zero_init_residual: bool = False, 151 | groups: int = 1, 152 | width_per_group: int = 64, 153 | replace_stride_with_dilation: Optional[List[bool]] = None, 154 | norm_layer: Optional[Callable[..., nn.Module]] = None 155 | ) -> None: 156 | super(ResNetish, self).__init__() 157 | if norm_layer is None: 158 | norm_layer = nn.BatchNorm2d 159 | self._norm_layer = norm_layer 160 | 161 | self.inplanes = 64 162 | self.dilation = 1 163 | if replace_stride_with_dilation is None: 164 | # each element in the tuple indicates if we should replace 165 | # the 2x2 stride with a dilated convolution instead 166 | replace_stride_with_dilation = [False, False, False] 167 | if len(replace_stride_with_dilation) != 3: 168 | raise ValueError("replace_stride_with_dilation should be None " 169 | "or a 3-element tuple, got {}".format(replace_stride_with_dilation)) 170 | self.groups = groups 171 | self.base_width = width_per_group 172 | self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=1, padding=3, # Audio input 3 -> 1, stride 2 -> 1 173 | bias=False) 174 | self.bn1 = norm_layer(self.inplanes) 175 | self.relu = nn.ReLU(inplace=True) 176 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 177 | self.layer1 = self._make_layer(block, 64, layers[0]) 178 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2, 179 | dilate=replace_stride_with_dilation[0]) 180 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2, 181 | dilate=replace_stride_with_dilation[1]) 182 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, 183 | dilate=replace_stride_with_dilation[2]) 184 | self.avgpool = nn.AdaptiveAvgPool2d((4, 6)) 185 | self.fc = nn.Linear(512 * 24 * 
block.expansion, num_classes) 186 | 187 | for m in self.modules(): 188 | if isinstance(m, nn.Conv2d): 189 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 190 | elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)): 191 | nn.init.constant_(m.weight, 1) 192 | nn.init.constant_(m.bias, 0) 193 | 194 | # Zero-initialize the last BN in each residual branch, 195 | # so that the residual branch starts with zeros, and each residual block behaves like an identity. 196 | # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 197 | if zero_init_residual: 198 | for m in self.modules(): 199 | if isinstance(m, Bottleneck): 200 | nn.init.constant_(m.bn3.weight, 0) # type: ignore[arg-type] 201 | elif isinstance(m, BasicBlock): 202 | nn.init.constant_(m.bn2.weight, 0) # type: ignore[arg-type] 203 | 204 | def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]], planes: int, blocks: int, 205 | stride: int = 1, dilate: bool = False) -> nn.Sequential: 206 | norm_layer = self._norm_layer 207 | downsample = None 208 | previous_dilation = self.dilation 209 | if dilate: 210 | self.dilation *= stride 211 | stride = 1 212 | if stride != 1 or self.inplanes != planes * block.expansion: 213 | downsample = nn.Sequential( 214 | conv1x1(self.inplanes, planes * block.expansion, stride), 215 | norm_layer(planes * block.expansion), 216 | ) 217 | 218 | layers = [] 219 | layers.append(block(self.inplanes, planes, stride, downsample, self.groups, 220 | self.base_width, previous_dilation, norm_layer)) 221 | self.inplanes = planes * block.expansion 222 | for _ in range(1, blocks): 223 | layers.append(block(self.inplanes, planes, groups=self.groups, 224 | base_width=self.base_width, dilation=self.dilation, 225 | norm_layer=norm_layer)) 226 | 227 | return nn.Sequential(*layers) 228 | 229 | def _forward_impl(self, x: Tensor) -> Tensor: 230 | # See note [TorchScript super()] 231 | x = self.conv1(x) 232 | x = self.bn1(x) 233 | x = self.relu(x) 234 | x = self.maxpool(x) 235 | 236 | x = self.layer1(x) 237 | x = self.layer2(x) 238 | x = self.layer3(x) 239 | x = self.layer4(x) 240 | 241 | x = self.avgpool(x) 242 | x = torch.flatten(x, 1) 243 | x = self.fc(x) 244 | 245 | return x 246 | 247 | def forward(self, x: Tensor) -> Tensor: 248 | return self._forward_impl(x) 249 | 250 | 251 | def _resnet( 252 | arch: str, 253 | block: Type[Union[BasicBlock, Bottleneck]], 254 | layers: List[int], 255 | **kwargs: Any 256 | ) -> ResNetish: 257 | model = ResNetish(block, layers, **kwargs) 258 | return model 259 | 260 | 261 | def resnetish18(num_classes: int, **kwargs: Any) -> ResNetish: 262 | r"""ResNet-18 model from 263 | `"Deep Residual Learning for Image Recognition" `_. 264 | Args: 265 | pretrained (bool): If True, returns a model pre-trained on ImageNet 266 | progress (bool): If True, displays a progress bar of the download to stderr 267 | """ 268 | return _resnet('resnetish18', BasicBlock, [2, 2, 2, 2], num_classes=num_classes, 269 | **kwargs) 270 | 271 | 272 | def resnetish34(num_classes: int, **kwargs: Any) -> ResNetish: 273 | r"""ResNet-34 model from 274 | `"Deep Residual Learning for Image Recognition" `_. 
275 | Args: 276 | pretrained (bool): If True, returns a model pre-trained on ImageNet 277 | progress (bool): If True, displays a progress bar of the download to stderr 278 | """ 279 | return _resnet('resnetish34', BasicBlock, [3, 4, 6, 3], num_classes=num_classes, 280 | **kwargs) 281 | 282 | 283 | def resnetish50(num_classes: int, **kwargs: Any) -> ResNetish: 284 | r"""ResNet-50 model from 285 | `"Deep Residual Learning for Image Recognition" `_. 286 | Args: 287 | pretrained (bool): If True, returns a model pre-trained on ImageNet 288 | progress (bool): If True, displays a progress bar of the download to stderr 289 | """ 290 | return _resnet('resnetish50', Bottleneck, [3, 4, 6, 3], num_classes=num_classes, 291 | **kwargs) 292 | 293 | 294 | class VGGish(nn.Module): 295 | """Based on: 296 | https://github.com/harritaylor/torchvggish/blob/master/docs/_example_download_weights.ipynb 297 | """ 298 | 299 | def __init__(self, num_classes: int): # Added num_classes 300 | super(VGGish, self).__init__() 301 | self.features = nn.Sequential( 302 | nn.Conv2d(1, 64, 3, 1, 1), 303 | nn.ReLU(inplace=True), 304 | nn.MaxPool2d(2, 2), 305 | nn.Conv2d(64, 128, 3, 1, 1), 306 | nn.ReLU(inplace=True), 307 | nn.MaxPool2d(2, 2), 308 | nn.Conv2d(128, 256, 3, 1, 1), 309 | nn.ReLU(inplace=True), 310 | nn.Conv2d(256, 256, 3, 1, 1), 311 | nn.ReLU(inplace=True), 312 | nn.MaxPool2d(2, 2), 313 | nn.Conv2d(256, 512, 3, 1, 1), 314 | nn.ReLU(inplace=True), 315 | nn.Conv2d(512, 512, 3, 1, 1), 316 | nn.ReLU(inplace=True), 317 | nn.AdaptiveMaxPool2d((4, 6))) # Replaced: MaxPool2d(2,2) 318 | self.embeddings = nn.Sequential( 319 | nn.Linear(512*24, 4096), 320 | nn.ReLU(inplace=True), 321 | nn.Linear(4096, 4096), 322 | nn.ReLU(inplace=True), 323 | nn.Linear(4096, 128), 324 | nn.ReLU(inplace=True)) 325 | self.head = nn.Linear(128, num_classes) # Added 326 | 327 | def forward(self, x): 328 | x = self.features(x) 329 | x = x.view(x.size(0),-1) 330 | x = self.embeddings(x) 331 | x = self.head(x) # Added 332 | return x 333 | 334 | 335 | class AlexNet(nn.Module): 336 | """Based on https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py 337 | """ 338 | 339 | def __init__(self, num_classes: int = 1000) -> None: 340 | super(AlexNet, self).__init__() 341 | self.features = nn.Sequential( 342 | nn.Conv2d(1, 64, kernel_size=11, stride=(1,2), padding=2), # Replaced 3-channel with 1, strid=4 with (1,2) 343 | nn.BatchNorm2d(64), # Added according to the paper. 344 | nn.ReLU(inplace=True), 345 | nn.MaxPool2d(kernel_size=3, stride=2), 346 | nn.Conv2d(64, 192, kernel_size=5, padding=2), 347 | nn.BatchNorm2d(192), # Added according to the paper. 348 | nn.ReLU(inplace=True), 349 | nn.MaxPool2d(kernel_size=3, stride=2), 350 | nn.Conv2d(192, 384, kernel_size=3, padding=1), 351 | nn.BatchNorm2d(384), # Added according to the paper. 352 | nn.ReLU(inplace=True), 353 | nn.Conv2d(384, 256, kernel_size=3, padding=1), 354 | nn.BatchNorm2d(256), # Added according to the paper. 355 | nn.ReLU(inplace=True), 356 | nn.Conv2d(256, 256, kernel_size=3, padding=1), 357 | nn.BatchNorm2d(256), # Added according to the paper. 
358 | nn.ReLU(inplace=True), 359 | nn.MaxPool2d(kernel_size=3, stride=2), 360 | ) 361 | self.avgpool = nn.AdaptiveAvgPool2d((4, 6)) # Replaced: n.AdaptiveAvgPool2d((6, 6)) 362 | self.classifier = nn.Sequential( 363 | nn.Dropout(), 364 | nn.Linear(256 * 4 * 6, 4096), # Replaced: 256 * 6 * 6 365 | nn.ReLU(inplace=True), 366 | nn.Dropout(), 367 | nn.Linear(4096, 4096), 368 | nn.ReLU(inplace=True), 369 | nn.Linear(4096, num_classes), 370 | ) 371 | 372 | def forward(self, x: torch.Tensor) -> torch.Tensor: 373 | x = self.features(x) 374 | x = self.avgpool(x) 375 | x = torch.flatten(x, 1) 376 | x = self.classifier(x) 377 | return x 378 | -------------------------------------------------------------------------------- /src/multi_label_libs.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | import datetime 3 | import logging 4 | import numpy as np 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | import multiprocessing 9 | from dlcliche.torch_utils import IntraBatchMixupBCE 10 | from dlcliche.utils import copy_file 11 | from .lwlrap import Lwlrap 12 | from skmultilearn.model_selection import IterativeStratification 13 | 14 | 15 | class SplitAllDataset(torch.utils.data.Dataset): 16 | def __init__(self, cfg, df, normalize=False): 17 | self.df = df 18 | self.normalize = normalize 19 | 20 | # Calculate length of clip this dataset will make 21 | self.L = cfg.unit_length 22 | 23 | # Get # of splits for all files 24 | self.n_splits = np.array([(np.load(f).shape[-1] + self.L - 1) // self.L for f in df.index.values]) 25 | self.sum_splits = np.cumsum(self.n_splits) 26 | 27 | def __len__(self): 28 | return self.sum_splits[-1] 29 | 30 | def file_index(self, index): 31 | return sum((index < self.sum_splits) == False) 32 | 33 | def filename(self, index): 34 | return self.df.index.values[self.file_index(index)] 35 | 36 | def split_index(self, index): 37 | fidx = self.file_index(index) 38 | prev_sum = self.sum_splits[fidx - 1] if fidx > 0 else 0 39 | return index - prev_sum 40 | 41 | def __getitem__(self, index): 42 | assert 0 <= index and index < len(self) 43 | 44 | log_mel_spec = np.load(self.filename(index)) 45 | start = self.split_index(index) * self.L 46 | log_mel_spec = log_mel_spec[..., start:start + self.L] 47 | 48 | # normalize - instance based 49 | if self.normalize: 50 | _m, _s = log_mel_spec.mean(), log_mel_spec.std() + np.finfo(np.float).eps 51 | log_mel_spec = (log_mel_spec - _m) / _s 52 | 53 | # Padding if sample is shorter than expected - both head & tail are filled with 0s 54 | pad_size = self.L - sample_length(log_mel_spec) 55 | if pad_size > 0: 56 | offset = pad_size // 2 57 | log_mel_spec = np.pad(log_mel_spec, ((0, 0), (0, 0), (offset, pad_size - offset)), 'constant') 58 | 59 | return log_mel_spec, self.file_index(index) 60 | 61 | 62 | def eval_all_splits(cfg, model, device, classes, df, normalize=False, debug_name=None, n=1, bs=64): 63 | model = model.to(device).eval() 64 | file_probas = [[] for _ in range(len(df))] 65 | test_dataset = SplitAllDataset(cfg, df, normalize=normalize) 66 | test_loader = torch.utils.data.DataLoader(test_dataset, num_workers=multiprocessing.cpu_count(), 67 | batch_size=bs, pin_memory=True) 68 | print(f'Predicting all {len(test_dataset)} splits for {len(df)} files...') 69 | for _ in range(n): 70 | with torch.no_grad(): 71 | for X, fileidxs in test_loader: 72 | preds = model(X.to(device)) 73 | probas = F.sigmoid(preds) 74 | for idx, proba in 
zip(fileidxs.cpu().numpy(), probas.cpu().numpy()): 75 | file_probas[idx].append(proba) 76 | file_probas = np.array([np.mean(probas, axis=0) for probas in file_probas]) 77 | lwlrap = Lwlrap(classes) 78 | lwlrap.accumulate(df.values, file_probas) 79 | return file_probas, lwlrap.overall_lwlrap(), lwlrap.per_class_lwlrap() 80 | 81 | 82 | def sample_length(log_mel_spec): 83 | return log_mel_spec.shape[-1] 84 | 85 | 86 | class MLClfDataset(torch.utils.data.Dataset): 87 | def __init__(self, cfg, df, transforms=None, normalize=False): 88 | self.df = df 89 | self.transforms = transforms 90 | self.normalize = normalize 91 | 92 | # Calculate length of clip this dataset will make 93 | self.cfg = cfg 94 | self.unit_length = cfg.unit_length 95 | self.hop = cfg.hop_length / cfg.sample_rate 96 | 97 | # Show basic info. 98 | print(f'Dataset will yield log-mel spectrogram {len(self)} data samples in shape [1, {cfg.n_mels}, {self.unit_length}]') 99 | 100 | def __len__(self): 101 | return len(self.df) 102 | 103 | def __getitem__(self, index): 104 | assert 0 <= index and index < len(self) 105 | row = self.df.iloc[index] 106 | filename = f'{row.name}' 107 | 108 | log_mel_spec = np.load(filename) 109 | 110 | # normalize - instance based 111 | if self.normalize: 112 | _m, _s = log_mel_spec.mean(), log_mel_spec.std() + np.finfo(np.float).eps 113 | log_mel_spec = (log_mel_spec - _m) / _s 114 | 115 | # Padding if sample is shorter than expected - both head & tail are filled with 0s 116 | pad_size = self.unit_length - sample_length(log_mel_spec) 117 | offset = 0 118 | if pad_size > 0: 119 | offset = np.random.randint(1, pad_size) if pad_size > 1 else 0 # (pad_size // 2) -- for making it center 120 | log_mel_spec = np.pad(log_mel_spec, ((0, 0), (0, 0), (offset, pad_size - offset)), 'constant') 121 | 122 | # Random crop 123 | crop_size = sample_length(log_mel_spec) - self.unit_length 124 | start = 0 125 | if crop_size > 0: 126 | start = np.random.randint(0, crop_size) 127 | log_mel_spec = log_mel_spec[..., start:start + self.unit_length] 128 | 129 | # Apply augmentations 130 | log_mel_spec = torch.Tensor(log_mel_spec) 131 | if self.transforms is not None: 132 | log_mel_spec = self.transforms(log_mel_spec) 133 | 134 | return log_mel_spec, row.values 135 | 136 | 137 | class MLClfLearner(pl.LightningModule): 138 | 139 | def __init__(self, model, dataloaders, classes, learning_rate=3e-4, mixup_alpha=0.2, weight=None): 140 | super().__init__() 141 | self.learning_rate = learning_rate 142 | self.model = model 143 | self.classes = classes 144 | self.train_loader, self.valid_loader, self.test_loader = dataloaders 145 | 146 | self.criterion = nn.BCEWithLogitsLoss(weight=weight) 147 | self.batch_mixer = IntraBatchMixupBCE(alpha=mixup_alpha) 148 | self.lwlrap = Lwlrap(classes) 149 | 150 | def forward(self, x): 151 | x = self.model(x) 152 | return x 153 | 154 | def step(self, x, y, train): 155 | mixed_inputs, mixed_labels = self.batch_mixer.transform(x, y, train=train) 156 | preds = self(mixed_inputs) 157 | #print(preds, mixed_labels.to(torch.float)) 158 | loss = self.criterion(preds, mixed_labels.to(torch.float)) 159 | return preds, loss 160 | 161 | def training_step(self, batch, batch_idx): 162 | x, y = batch 163 | preds, loss = self.step(x, y, train=True) 164 | return loss 165 | 166 | def on_validation_start(self, **kwargs): 167 | self.lwlrap = Lwlrap(self.classes) 168 | 169 | def validation_step(self, batch, batch_idx, split='val'): 170 | x, gt = batch 171 | preds, loss = self.step(x, gt, train=False) 172 | 
self.lwlrap.accumulate(gt.cpu().numpy(), F.sigmoid(preds).cpu().numpy()) 173 | 174 | self.log(f'{split}_loss', loss, prog_bar=True) 175 | #batch_lwlrap = lwlrap(gt.cpu().numpy(), preds.cpu().numpy()) 176 | #self.log(f'{split}_lwlrap', batch_lwlrap, prog_bar=True) 177 | if batch_idx >= len(self.valid_loader) - 1: 178 | self.log(f'val_lwlrap', self.lwlrap.overall_lwlrap(), prog_bar=False) 179 | logging.info(self.lwlrap) 180 | return loss 181 | 182 | def test_step(self, batch, batch_idx): 183 | return self.validation_step(batch, batch_idx, split='test') 184 | 185 | def configure_optimizers(self): 186 | optimizer = torch.optim.AdamW(self.parameters(), lr=self.learning_rate) 187 | return optimizer 188 | 189 | def train_dataloader(self): 190 | return self.train_loader 191 | 192 | def val_dataloader(self): 193 | return self.valid_loader 194 | 195 | def test_dataloader(self): 196 | return self.test_loader 197 | 198 | 199 | def ml_fold_spliter(train_df, random_state=42): 200 | fnames = train_df.index.values 201 | 202 | # multi label stratified train-test splitter 203 | splitter = IterativeStratification(n_splits=5, random_state=random_state) 204 | 205 | for train, test in splitter.split(train_df.index, train_df): 206 | yield train_df.iloc[train], train_df.iloc[test] 207 | -------------------------------------------------------------------------------- /work/.placeholder: -------------------------------------------------------------------------------- 1 | Working files will be in this folder. 2 | --------------------------------------------------------------------------------