├── .gitignore ├── LICENSE.md ├── bat_eval ├── cnn_helpers.py ├── cpu_detection.py ├── evaluate_cnn_fast.py ├── models │ ├── detector.npy │ ├── detector_192K.npy │ ├── detector_192K_params.json │ ├── detector_params.json │ └── readme.md ├── myskimage.py ├── mywavfile.py ├── nms.pyx ├── nms_slow.py ├── readme.md ├── run_detector.py ├── setup.py ├── spectrogram.py ├── wavs │ └── test_file.wav └── write_op.py ├── bat_train ├── classifier.py ├── cls_audio_forest.py ├── cls_cnn.py ├── cls_segment.py ├── create_results.py ├── data │ └── readme.md ├── data_set_params.py ├── evaluate.py ├── export_detector_weights.py ├── grad_features.py ├── nms.pyx ├── nms_slow.py ├── random_forest.py ├── readme.md ├── run_comparison.py ├── run_detector.py ├── spectrogram.py └── write_op.py └── readme.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.DS_Store 3 | *.c 4 | *.so 5 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. 
Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. 
Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. 
Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. 
for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 
337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. 
396 | 
--------------------------------------------------------------------------------
/bat_eval/cnn_helpers.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from numpy.lib.stride_tricks import as_strided
3 | 
4 | 
5 | def aligned_malloc(shape, dtype, alignment=16):
6 |     """allocates numpy.array of specified shape, dtype
7 |        and memory alignment such that array.ctypes.data
8 |        is an aligned memory pointer
9 |        shape is numpy.array shape
10 |        dtype is numpy.array element type
11 |        alignment is required memory alignment in bytes
12 |     """
13 |     itemsize = np.dtype(dtype).itemsize
14 |     extra = alignment // itemsize
15 |     size = np.prod(shape)
16 |     buf = np.empty(size + extra, dtype=dtype)
17 |     ofs = (-buf.ctypes.data % alignment) // itemsize
18 |     aa = buf[ofs:ofs+size].reshape(shape)
19 |     assert (aa.ctypes.data % alignment) == 0
20 |     assert (aa.flags['C_CONTIGUOUS']) == True
21 |     return aa
22 | 
23 | 
24 | def view_as_windows(arr_in, window_shape, step=1):
25 |     """ Taken from skimage.util.shape.py
26 |     Rolling window view of the input n-dimensional array.
27 | 
28 |     Windows are overlapping views of the input array, with adjacent windows
29 |     shifted by a single row or column (or an index of a higher dimension).
30 | 
31 |     Parameters
32 |     ----------
33 |     arr_in : ndarray
34 |         N-d input array.
35 |     window_shape : tuple
36 |         Defines the shape of the elementary n-dimensional orthotope
37 |         (better known as hyperrectangle [1]_) of the rolling window view.
38 |     step : int, optional
39 |         Number of elements to skip when moving the window forward (by
40 |         default, move forward by one). The value must be equal to or larger
41 |         than one.
42 | 
43 |     Returns
44 |     -------
45 |     arr_out : ndarray
46 |         (rolling) window view of the input array. If `arr_in` is
47 |         non-contiguous, a copy is made.
48 |     """
49 | 
50 |     arr_shape = np.array(arr_in.shape)
51 |     window_shape = np.array(window_shape, dtype=arr_shape.dtype)
52 | 
53 |     # -- build rolling window view
54 |     arr_in = np.ascontiguousarray(arr_in)
55 | 
56 |     new_shape = tuple((arr_shape - window_shape) // step + 1) + \
57 |                 tuple(window_shape)
58 | 
59 |     arr_strides = np.array(arr_in.strides)
60 |     new_strides = np.concatenate((arr_strides * step, arr_strides))
61 | 
62 |     arr_out = as_strided(arr_in, shape=new_shape, strides=new_strides)
63 | 
64 |     return arr_out
65 | 
66 | 
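# --------------------------------------------------------------------------
# Example (editor's sketch, not part of the original module): for a (1, 4, 4)
# float32 input and a (1, 2, 2) window, view_as_windows returns a zero-copy
# view of shape (1, 3, 3, 1, 2, 2), where the three leading axes index window
# positions. corr2d below flattens these views so the whole correlation
# becomes a single np.dot (GEMM) call:
#
#   >>> x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
#   >>> view_as_windows(x, (1, 2, 2)).shape
#   (1, 3, 3, 1, 2, 2)
#   >>> view_as_windows(x, (1, 2, 2))[0, 0, 0, 0]
#   array([[ 0.,  1.],
#          [ 4.,  5.]], dtype=float32)
# --------------------------------------------------------------------------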
67 | def corr2d(ip, filters, bias):
68 |     """performs 2D correlation on 3D input matrix with depth D, with N filters
69 |        uses the matrix multiplication method - will use a lot of memory for large
70 |        inputs. see here for more details:
71 |        https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
72 |        ip is DxHxW
73 |        filters is NxDxFhxFw, where Fh==Fw
74 |        op is NxHxW
75 |     """
76 | 
77 |     # reshape filters, can do this outside as it only needs to be done once
78 |     filters_re = filters.reshape((filters.shape[0], np.prod(filters.shape[1:])))
79 | 
80 |     # produce views of the input
81 |     op = view_as_windows(ip, filters.shape[1:])
82 |     op_height, op_width = op.shape[1:3]
83 | 
84 |     # reshape to 2D matrix and correlate with filters
85 |     op = op.reshape((np.prod(op.shape[:3]), np.prod(op.shape[3:])))
86 |     op = np.dot(filters_re, op.T)
87 | 
88 |     # reshape back to the correct op size
89 |     op = op.reshape((filters.shape[0], op_height, op_width))
90 | 
91 |     # add bias term
92 |     op += bias[..., np.newaxis, np.newaxis]
93 | 
94 |     # non linearity - ReLu
95 |     op.clip(min=0, out=op)
96 | 
97 |     return op
98 | 
99 | 
100 | def max_pool(ip):
101 |     """does a 2x2 max pool, crops off ends if not divisible by 2
102 |        ip is DxHxW
103 |        op is DxH/2xW/2
104 |     """
105 | 
106 |     height = ip.shape[1] - ip.shape[1]%2
107 |     width = ip.shape[2] - ip.shape[2]%2
108 |     h_max = np.maximum(ip[:,:height:2,:], ip[:,1:height:2,:])
109 |     op = np.maximum(h_max[:,:,:width:2], h_max[:,:,1:width:2])
110 |     return op
111 | 
112 | 
113 | def fully_connected_as_corr(ip, filters, bias):
114 |     """turns a conv output to fully connected layer into a correlation by sliding
115 |        it across the horizontal direction. this only needs to happen in 1D as the
116 |        neurons see the same size as the input
117 |        ip is DxHxW
118 |        filters is 2D - (DxHxW)x(num_neurons)
119 |        op is Wxnum_neurons
120 |     """
121 | 
122 |     # create DxHxsliding_width views of input - similar to corr2d
123 |     sliding_width = filters.shape[0] // np.prod(ip.shape[:2])
124 |     op = view_as_windows(ip, (ip.shape[0],ip.shape[1],sliding_width))
125 |     op = op.reshape((np.prod(op.shape[:3]), np.prod(op.shape[3:])))
126 | 
127 |     # perform correlation view matrix multiplication
128 |     op = np.dot(op, filters)
129 | 
130 |     # add bias term
131 |     op += bias[np.newaxis, :]
132 | 
133 |     # non linearity - ReLu
134 |     op.clip(min=0, out=op)
135 | 
136 |     # pad with zeros at the end so that it is the same width as the input
137 |     op = np.vstack((op, np.zeros((ip.shape[2]-op.shape[0], op.shape[1]), dtype=np.float32)))
138 | 
139 |     return op
140 | 
141 | 
--------------------------------------------------------------------------------
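The helpers in `cnn_helpers.py` compose into the small forward pass used by `cpu_detection.py` below. A minimal sketch of that composition, with random weights and illustrative shapes rather than the trained detector's:

```python
import numpy as np
import cnn_helpers as ch

spec = np.random.rand(1, 64, 200).astype(np.float32)  # DxHxW input spectrogram

# conv + pool, as in eval_network_1_dense
w1 = np.random.rand(16, 1, 3, 3).astype(np.float32)   # 16 filters of size 1x3x3
b1 = np.zeros(16, dtype=np.float32)
pool1 = ch.max_pool(ch.corr2d(spec, w1, b1))          # -> (16, 31, 99)

# dense layer applied as a sliding correlation along the time axis
w_fc = np.random.rand(16 * 31 * 10, 32).astype(np.float32)  # sliding width 10
b_fc = np.zeros(32, dtype=np.float32)
fc1 = ch.fully_connected_as_corr(pool1, w_fc, b_fc)   # -> (99, 32), zero padded
```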
/bat_eval/cpu_detection.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | from scipy.ndimage import zoom
4 | from scipy.ndimage.filters import gaussian_filter1d
5 | import json
6 | import time
7 | 
8 | from spectrogram import Spectrogram
9 | import cnn_helpers as ch
10 | 
11 | import warnings
12 | warnings.simplefilter("ignore", UserWarning)
13 | 
14 | try:
15 |     import nms as nms
16 | except ImportError as e:
17 |     print("Import Error: {0}".format(e))
18 |     print('please compile fast nms by running:')
19 |     print('python setup.py build_ext --inplace')
20 |     print('using slow nms in the meantime.')
21 |     import nms_slow as nms
22 | 
23 | 
24 | class CPUDetector:
25 | 
26 |     def __init__(self, weight_file, params_file):
27 |         """Performs detection on an audio file.
28 |         The structure of the network is hard coded to a network with 2
29 |         convolution layers with pooling, 1 or 2 fully connected layers, and a
30 |         final softmax layer.
31 | 
32 |         weight_file is the path to the numpy weights of the network
33 |         params_file is the path to the network parameters
34 |         """
35 | 
36 |         self.weights = np.load(weight_file, encoding='latin1')
37 |         if not all([weight.dtype==np.float32 for weight in self.weights]):
38 |             for i in range(self.weights.shape[0]):
39 |                 self.weights[i] = self.weights[i].astype(np.float32)
40 | 
41 |         with open(params_file) as fp:
42 |             params = json.load(fp)
43 | 
44 |         self.chunk_size = 4.0  # seconds
45 |         self.win_size = params['win_size']
46 |         self.max_freq = params['max_freq']
47 |         self.min_freq = params['min_freq']
48 |         self.slice_scale = params['slice_scale']
49 |         self.overlap = params['overlap']
50 |         self.crop_spec = params['crop_spec']
51 |         self.denoise = params['denoise']
52 |         self.smooth_spec = params['smooth_spec']
53 |         self.nms_win_size = int(params['nms_win_size'])
54 |         self.smooth_op_prediction_sigma = params['smooth_op_prediction_sigma']
55 |         self.sp = Spectrogram()
56 | 
57 | 
58 |     def run_detection(self, spec, chunk_duration, detection_thresh, low_res=True):
59 |         """Runs the detector on a single spectrogram chunk.
60 |         spec is the processed spectrogram, chunk_duration is its duration in
61 |         seconds, and detections scoring below detection_thresh are discarded
62 |         """
63 | 
64 |         # run the cnn - low_res will be faster but less accurate
65 |         if low_res:
66 |             prob = self.eval_network(spec)
67 |             scale_fact = 8.0
68 |         else:
69 |             prob_1 = self.eval_network(spec)
70 |             prob_2 = self.eval_network(spec[:, 2:])
71 | 
72 |             prob = np.zeros(prob_1.shape[0]*2, dtype=np.float32)
73 |             prob[0::2] = prob_1
74 |             prob[1::2] = prob_2
75 |             scale_fact = 4.0
76 | 
77 |         f_size = self.smooth_op_prediction_sigma / scale_fact
78 |         nms_win = np.round(self.nms_win_size / scale_fact)
79 | 
80 |         # smooth the outputs - this might not be necessary
81 |         prob = gaussian_filter1d(prob, f_size)
82 | 
83 |         # perform non maximum suppression
84 |         call_time, call_prob = nms.nms_1d(prob, nms_win, chunk_duration)
85 | 
86 |         # remove detections below threshold
87 |         if call_prob.shape[0] > 0:
88 |             inds = (call_prob >= detection_thresh)
89 |             call_prob = call_prob[inds]
90 |             call_time = call_time[inds]
91 | 
92 |         return call_time, call_prob
93 | 
94 | 
95 |     def create_spec(self, audio, sampling_rate):
96 |         """Creates spectrogram (returned numpy array has correct memory alignment)
97 |         """
98 |         hspec = self.sp.gen_spectrogram(audio, sampling_rate, self.slice_scale,
99 |                                         self.overlap, crop_spec=self.crop_spec,
100 |                                         max_freq=self.max_freq, min_freq=self.min_freq)
101 |         hspec = self.sp.process_spectrogram(hspec, denoise_spec=self.denoise,
102 |                                             smooth_spec=self.smooth_spec)
103 |         nsize = (np.ceil(hspec.shape[0]/2.0).astype(int), np.ceil(hspec.shape[1]/2.0).astype(int))
104 |         spec = ch.aligned_malloc(nsize, np.float32)
105 | 
106 |         zoom(hspec, 0.5, output=spec, order=1)
107 |         return spec
108 | 
109 | 
110 |     def eval_network(self, ip):
111 |         """runs the cnn - either the 1 or 2 fully connected versions
112 |         """
113 | 
114 |         if self.weights.shape[0] == 8:
115 |             prob = self.eval_network_1_dense(ip)
116 |         elif self.weights.shape[0] == 10:
117 |             prob = self.eval_network_2_dense(ip)
118 |         return prob
119 | 
120 | 
121 |     def eval_network_1_dense(self, ip):
122 |         """cnn with 1 dense layer at end
123 |         """
124 | 
125 |         # Conv Layer 1
126 |         conv1 = ch.corr2d(ip[np.newaxis,:,:], self.weights[0], self.weights[1])
127 |         pool1 = ch.max_pool(conv1)
128 | 
129 |         # Conv Layer 2
130 |         conv2 = ch.corr2d(pool1, self.weights[2], self.weights[3])
131 |         pool2 = ch.max_pool(conv2)
132 | 
133 |         # Fully Connected 1
134 |         fc1 =
ch.fully_connected_as_corr(pool2, self.weights[4], self.weights[5]) 135 | 136 | # Output layer 137 | prob = np.dot(fc1, self.weights[6]) 138 | prob += self.weights[7][np.newaxis, :] 139 | prob = prob - np.amax(prob, axis=1, keepdims=True) 140 | prob = np.exp(prob) 141 | prob = prob[:, 1] / prob.sum(1) 142 | prob = np.hstack((prob, np.zeros((ip.shape[1]//4)-prob.shape[0], dtype=np.float32))) 143 | 144 | return prob 145 | 146 | 147 | def eval_network_2_dense(self, ip): 148 | """cnn with 2 dense layers at end 149 | """ 150 | 151 | # Conv Layer 1 152 | conv1 = ch.corr2d(ip[np.newaxis,:], self.weights[0], self.weights[1]) 153 | pool1 = ch.max_pool(conv1) 154 | 155 | # Conv Layer 2 156 | conv2 = ch.corr2d(pool1, self.weights[2], self.weights[3]) 157 | pool2 = ch.max_pool(conv2) 158 | 159 | # Fully Connected 1 160 | fc1 = ch.fully_connected_as_corr(pool2, self.weights[4], self.weights[5]) 161 | 162 | # Fully Connected 2 163 | fc2 = np.dot(fc1, self.weights[6]) # fc times fc 164 | fc2 += self.weights[7][np.newaxis, :] # add bias term 165 | fc2.clip(min=0, out=fc2) # non linearity - ReLu 166 | 167 | # Output layer 168 | prob = np.dot(fc2, self.weights[8]) 169 | prob += self.weights[9][np.newaxis, :] 170 | prob = prob - np.amax(prob, axis=1, keepdims=True) 171 | prob = np.exp(prob) 172 | prob = prob[:, 1] / prob.sum(1) 173 | prob = np.hstack((prob, np.zeros((ip.shape[1]//4)-prob.shape[0], dtype=np.float32))) 174 | 175 | return prob 176 | 177 | -------------------------------------------------------------------------------- /bat_eval/evaluate_cnn_fast.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script evaluates the performance of the CPU version of CNN_FAST on the 3 | different test sets. 4 | 5 | First you need to run 'run_detector.py' with these settings to produce 6 | 'results/op_file.csv': 7 | 8 | detection_thresh = 0.0 9 | do_time_expansion = False 10 | root_dir = '../bat_train/data/' # set this to where the annotations and wav files are 11 | data_set = root_dir + 'train_test_split/test_set_bulgaria.npz' 12 | data_dir = root_dir + 'wav/' 13 | loaded_data_tr = np.load(data_set) 14 | audio_files = loaded_data_tr['test_files'] 15 | audio_files = [data_dir + tt + '.wav' for tt in audio_files] 16 | """ 17 | 18 | import numpy as np 19 | import pandas as pd 20 | import sys 21 | sys.path.append('../bat_train/') 22 | import evaluate as evl 23 | import create_results as res 24 | 25 | # point to data 26 | root_dir = '../bat_train/data/' 27 | 28 | # load the test data 29 | data_set = root_dir + 'train_test_split/test_set_uk.npz' 30 | loaded_data_tr = np.load(data_set) 31 | test_pos = loaded_data_tr['test_pos'] 32 | test_files = loaded_data_tr['test_files'] 33 | test_durations = loaded_data_tr['test_durations'] 34 | 35 | 36 | # load results and put them in the correct format 37 | da = pd.read_csv('results/op_file.csv') 38 | nms_pos = [] 39 | nms_prob = [] 40 | for tt in test_files: 41 | dal = da[da['file_name'] == tt+'.wav'] 42 | nms_pos.append(dal['detection_time'].values) 43 | nms_prob.append(dal['detection_prob'].values[..., np.newaxis]) 44 | 45 | 46 | # compute precision and recall 47 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, 0.1, 0.230) 48 | res.plot_prec_recall('CNN_FAST', recall, precision, nms_prob) 49 | 50 | -------------------------------------------------------------------------------- /bat_eval/models/detector.npy: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/macaodha/batdetect/7d4f7e43b4456d391eb832a612e7b134e341814d/bat_eval/models/detector.npy
--------------------------------------------------------------------------------
/bat_eval/models/detector_192K.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/macaodha/batdetect/7d4f7e43b4456d391eb832a612e7b134e341814d/bat_eval/models/detector_192K.npy
--------------------------------------------------------------------------------
/bat_eval/models/detector_192K_params.json:
--------------------------------------------------------------------------------
1 | {"min_freq": 10, "crop_spec": true, "slice_scale": 0.02667, "nms_win_size": 21, "mean_log_mag": 0.5, "denoise": true, "smooth_spec": true, "overlap": 0.75, "max_freq": 240, "smooth_op_prediction_sigma": 1.0335917312661498, "chunk_size": 0, "win_size": 0.23}
--------------------------------------------------------------------------------
/bat_eval/models/detector_params.json:
--------------------------------------------------------------------------------
1 | {"min_freq": 10, "crop_spec": true, "slice_scale": 0.02322, "nms_win_size": 21, "mean_log_mag": 0.5, "denoise": true, "smooth_spec": true, "overlap": 0.75, "max_freq": 270, "smooth_op_prediction_sigma": 1.0335917312661498, "chunk_size": 0, "win_size": 0.23}
--------------------------------------------------------------------------------
/bat_eval/models/readme.md:
--------------------------------------------------------------------------------
1 | Trained models should be kept here:
2 | 
3 | `detector.npy` is 15_09_16_16_26_28_cnn_hnm_2_lr_0.01_mo_0.9_net_small_norfolk
4 | norfolk test data - trained on background day data
5 | average precision (area) = 0.866
6 | recall at 95% precision = 0.736
7 | 
8 | 
9 | `detector_192K.npy` is 15_09_16_15_39_34_cnn_hnm_2_lr_0.01_mo_0.9_net_small_norfolk
10 | norfolk test data - trained on background day data
11 | average precision (area) = 0.856
12 | recall at 95% precision = 0.724
--------------------------------------------------------------------------------
/bat_eval/myskimage.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code copied from the skimage package.
3 | Specifically, this file is a standalone implementation of
4 | skimage.filters' "gaussian" function.
5 | The "img_as_float" and "guess_spatial_dimensions" functions
6 | were also copied, as dependencies of the "gaussian" function.
7 | """ 8 | from __future__ import division 9 | import numbers 10 | import collections as coll 11 | import numpy as np 12 | from scipy import ndimage as ndi 13 | 14 | __all__ = ['gaussian'] 15 | 16 | dtype_range = {np.bool_: (False, True), 17 | np.bool8: (False, True), 18 | np.uint8: (0, 255), 19 | np.uint16: (0, 65535), 20 | np.int8: (-128, 127), 21 | np.int16: (-32768, 32767), 22 | np.int64: (-2**63, 2**63 - 1), 23 | np.uint64: (0, 2**64 - 1), 24 | np.int32: (-2**31, 2**31 - 1), 25 | np.uint32: (0, 2**32 - 1), 26 | np.float32: (-1, 1), 27 | np.float64: (-1, 1)} 28 | 29 | integer_types = (np.uint8, np.uint16, np.int8, np.int16) 30 | 31 | _supported_types = (np.bool_, np.bool8, 32 | np.uint8, np.uint16, np.uint32, np.uint64, 33 | np.int8, np.int16, np.int32, np.int64, 34 | np.float32, np.float64) 35 | 36 | dtype_range[np.float16] = (-1, 1) 37 | _supported_types += (np.float16, ) 38 | 39 | def warn(msg): 40 | print(msg) 41 | 42 | 43 | def img_as_float(image): 44 | dtype=np.float32 45 | force_copy = False 46 | 47 | """ 48 | Convert an image to the requested data-type. 49 | Warnings are issued in case of precision loss, or when negative values 50 | are clipped during conversion to unsigned integer types (sign loss). 51 | Floating point values are expected to be normalized and will be clipped 52 | to the range [0.0, 1.0] or [-1.0, 1.0] when converting to unsigned or 53 | signed integers respectively. 54 | Numbers are not shifted to the negative side when converting from 55 | unsigned to signed integer types. Negative values will be clipped when 56 | converting to unsigned integers. 57 | Parameters 58 | ---------- 59 | image : ndarray 60 | Input image. 61 | dtype : dtype 62 | Target data-type. 63 | force_copy : bool, optional 64 | Force a copy of the data, irrespective of its current dtype. 65 | uniform : bool, optional 66 | Uniformly quantize the floating point range to the integer range. 67 | By default (uniform=False) floating point values are scaled and 68 | rounded to the nearest integers, which minimizes back and forth 69 | conversion errors. 70 | References 71 | ---------- 72 | .. [1] DirectX data conversion rules. 73 | http://msdn.microsoft.com/en-us/library/windows/desktop/dd607323%28v=vs.85%29.aspx 74 | .. [2] Data Conversions. In "OpenGL ES 2.0 Specification v2.0.25", 75 | pp 7-8. Khronos Group, 2010. 76 | .. [3] Proper treatment of pixels as integers. A.W. Paeth. 77 | In "Graphics Gems I", pp 249-256. Morgan Kaufmann, 1990. 78 | .. [4] Dirty Pixels. J. Blinn. In "Jim Blinn's corner: Dirty Pixels", 79 | pp 47-57. Morgan Kaufmann, 1998. 80 | """ 81 | image = np.asarray(image) 82 | dtypeobj = np.dtype(dtype) 83 | dtypeobj_in = image.dtype 84 | dtype = dtypeobj.type 85 | dtype_in = dtypeobj_in.type 86 | 87 | if dtype_in == dtype: 88 | if force_copy: 89 | image = image.copy() 90 | return image 91 | 92 | if not (dtype_in in _supported_types and dtype in _supported_types): 93 | raise ValueError("can not convert %s to %s." % (dtypeobj_in, dtypeobj)) 94 | 95 | def sign_loss(): 96 | warn("Possible sign loss when converting negative image of type " 97 | "%s to positive image of type %s." 
% (dtypeobj_in, dtypeobj)) 98 | 99 | def prec_loss(): 100 | warn("Possible precision loss when converting from " 101 | "%s to %s" % (dtypeobj_in, dtypeobj)) 102 | 103 | def _dtype(itemsize, *dtypes): 104 | # Return first of `dtypes` with itemsize greater than `itemsize` 105 | return next(dt for dt in dtypes if itemsize < np.dtype(dt).itemsize) 106 | 107 | def _dtype2(kind, bits, itemsize=1): 108 | # Return dtype of `kind` that can store a `bits` wide unsigned int 109 | def compare(x, y, kind='u'): 110 | if kind == 'u': 111 | return x <= y 112 | else: 113 | return x < y 114 | 115 | s = next(i for i in (itemsize, ) + (2, 4, 8) if compare(bits, i * 8, 116 | kind=kind)) 117 | return np.dtype(kind + str(s)) 118 | 119 | 120 | def _scale(a, n, m, copy=True): 121 | # Scale unsigned/positive integers from n to m bits 122 | # Numbers can be represented exactly only if m is a multiple of n 123 | # Output array is of same kind as input. 124 | kind = a.dtype.kind 125 | if n > m and a.max() < 2 ** m: 126 | mnew = int(np.ceil(m / 2) * 2) 127 | if mnew > m: 128 | dtype = "int%s" % mnew 129 | else: 130 | dtype = "uint%s" % mnew 131 | n = int(np.ceil(n / 2) * 2) 132 | msg = ("Downcasting %s to %s without scaling because max " 133 | "value %s fits in %s" % (a.dtype, dtype, a.max(), dtype)) 134 | warn(msg) 135 | return a.astype(_dtype2(kind, m)) 136 | elif n == m: 137 | return a.copy() if copy else a 138 | elif n > m: 139 | # downscale with precision loss 140 | prec_loss() 141 | if copy: 142 | b = np.empty(a.shape, _dtype2(kind, m)) 143 | np.floor_divide(a, 2**(n - m), out=b, dtype=a.dtype, 144 | casting='unsafe') 145 | return b 146 | else: 147 | a //= 2**(n - m) 148 | return a 149 | elif m % n == 0: 150 | # exact upscale to a multiple of n bits 151 | if copy: 152 | b = np.empty(a.shape, _dtype2(kind, m)) 153 | np.multiply(a, (2**m - 1) // (2**n - 1), out=b, dtype=b.dtype) 154 | return b 155 | else: 156 | a = np.array(a, _dtype2(kind, m, a.dtype.itemsize), copy=False) 157 | a *= (2**m - 1) // (2**n - 1) 158 | return a 159 | else: 160 | # upscale to a multiple of n bits, 161 | # then downscale with precision loss 162 | prec_loss() 163 | o = (m // n + 1) * n 164 | if copy: 165 | b = np.empty(a.shape, _dtype2(kind, o)) 166 | np.multiply(a, (2**o - 1) // (2**n - 1), out=b, dtype=b.dtype) 167 | b //= 2**(o - m) 168 | return b 169 | else: 170 | a = np.array(a, _dtype2(kind, o, a.dtype.itemsize), copy=False) 171 | a *= (2**o - 1) // (2**n - 1) 172 | a //= 2**(o - m) 173 | return a 174 | 175 | kind = dtypeobj.kind 176 | kind_in = dtypeobj_in.kind 177 | itemsize = dtypeobj.itemsize 178 | itemsize_in = dtypeobj_in.itemsize 179 | 180 | if kind == 'b': 181 | # to binary image 182 | if kind_in in "fi": 183 | sign_loss() 184 | prec_loss() 185 | return image > dtype_in(dtype_range[dtype_in][1] / 2) 186 | 187 | if kind_in == 'b': 188 | # from binary image, to float and to integer 189 | result = image.astype(dtype) 190 | if kind != 'f': 191 | result *= dtype(dtype_range[dtype][1]) 192 | return result 193 | 194 | if kind in 'ui': 195 | imin = np.iinfo(dtype).min 196 | imax = np.iinfo(dtype).max 197 | if kind_in in 'ui': 198 | imin_in = np.iinfo(dtype_in).min 199 | imax_in = np.iinfo(dtype_in).max 200 | 201 | if kind_in == 'f': 202 | if np.min(image) < -1.0 or np.max(image) > 1.0: 203 | raise ValueError("Images of type float must be between -1 and 1.") 204 | if kind == 'f': 205 | # floating point -> floating point 206 | if itemsize_in > itemsize: 207 | prec_loss() 208 | return image.astype(dtype) 209 | 210 | # floating point -> 
integer 211 | prec_loss() 212 | # use float type that can represent output integer type 213 | image = np.array(image, _dtype(itemsize, dtype_in, 214 | np.float32, np.float64)) 215 | if not uniform: 216 | if kind == 'u': 217 | image *= imax 218 | else: 219 | image *= imax - imin 220 | image -= 1.0 221 | image /= 2.0 222 | np.rint(image, out=image) 223 | np.clip(image, imin, imax, out=image) 224 | elif kind == 'u': 225 | image *= imax + 1 226 | np.clip(image, 0, imax, out=image) 227 | else: 228 | image *= (imax - imin + 1.0) / 2.0 229 | np.floor(image, out=image) 230 | np.clip(image, imin, imax, out=image) 231 | return image.astype(dtype) 232 | 233 | if kind == 'f': 234 | # integer -> floating point 235 | if itemsize_in >= itemsize: 236 | prec_loss() 237 | # use float type that can exactly represent input integers 238 | image = np.array(image, _dtype(itemsize_in, dtype, 239 | np.float32, np.float64)) 240 | if kind_in == 'u': 241 | image /= imax_in 242 | # DirectX uses this conversion also for signed ints 243 | #if imin_in: 244 | # np.maximum(image, -1.0, out=image) 245 | else: 246 | image *= 2.0 247 | image += 1.0 248 | image /= imax_in - imin_in 249 | return image.astype(dtype) 250 | 251 | if kind_in == 'u': 252 | if kind == 'i': 253 | # unsigned integer -> signed integer 254 | image = _scale(image, 8 * itemsize_in, 8 * itemsize - 1) 255 | return image.view(dtype) 256 | else: 257 | # unsigned integer -> unsigned integer 258 | return _scale(image, 8 * itemsize_in, 8 * itemsize) 259 | 260 | if kind == 'u': 261 | # signed integer -> unsigned integer 262 | sign_loss() 263 | image = _scale(image, 8 * itemsize_in - 1, 8 * itemsize) 264 | result = np.empty(image.shape, dtype) 265 | np.maximum(image, 0, out=result, dtype=image.dtype, casting='unsafe') 266 | return result 267 | 268 | # signed integer -> signed integer 269 | if itemsize_in > itemsize: 270 | return _scale(image, 8 * itemsize_in - 1, 8 * itemsize - 1) 271 | image = image.astype(_dtype2('i', itemsize * 8)) 272 | image -= imin_in 273 | image = _scale(image, 8 * itemsize_in, 8 * itemsize, copy=False) 274 | image += imin 275 | 276 | return image.astype(dtype) 277 | 278 | 279 | 280 | def guess_spatial_dimensions(image): 281 | """Make an educated guess about whether an image has a channels dimension. 282 | Parameters 283 | ---------- 284 | image : ndarray 285 | The input image. 286 | Returns 287 | ------- 288 | spatial_dims : int or None 289 | The number of spatial dimensions of `image`. If ambiguous, the value 290 | is ``None``. 291 | Raises 292 | ------ 293 | ValueError 294 | If the image array has less than two or more than four dimensions. 295 | """ 296 | if image.ndim == 2: 297 | return 2 298 | if image.ndim == 3 and image.shape[-1] != 3: 299 | return 3 300 | if image.ndim == 3 and image.shape[-1] == 3: 301 | return None 302 | if image.ndim == 4 and image.shape[-1] == 3: 303 | return 3 304 | else: 305 | raise ValueError("Expected 2D, 3D, or 4D array, got %iD." % image.ndim) 306 | 307 | 308 | def gaussian(image, sigma=1, output=None, mode='nearest', cval=0, 309 | multichannel=None): 310 | """Multi-dimensional Gaussian filter 311 | 312 | Parameters 313 | ---------- 314 | image : array-like 315 | Input image (grayscale or color) to filter. 316 | sigma : scalar or sequence of scalars, optional 317 | Standard deviation for Gaussian kernel. The standard 318 | deviations of the Gaussian filter are given for each axis as a 319 | sequence, or as a single number, in which case it is equal for 320 | all axes. 
321 | output : array, optional 322 | The ``output`` parameter passes an array in which to store the 323 | filter output. 324 | mode : {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional 325 | The `mode` parameter determines how the array borders are 326 | handled, where `cval` is the value when mode is equal to 327 | 'constant'. Default is 'nearest'. 328 | cval : scalar, optional 329 | Value to fill past edges of input if `mode` is 'constant'. Default 330 | is 0.0 331 | multichannel : bool, optional (default: None) 332 | Whether the last axis of the image is to be interpreted as multiple 333 | channels. If True, each channel is filtered separately (channels are 334 | not mixed together). Only 3 channels are supported. If `None`, 335 | the function will attempt to guess this, and raise a warning if 336 | ambiguous, when the array has shape (M, N, 3). 337 | 338 | Returns 339 | ------- 340 | filtered_image : ndarray 341 | the filtered array 342 | 343 | Notes 344 | ----- 345 | This function is a wrapper around :func:`scipy.ndi.gaussian_filter`. 346 | 347 | Integer arrays are converted to float. 348 | 349 | The multi-dimensional filter is implemented as a sequence of 350 | one-dimensional convolution filters. The intermediate arrays are 351 | stored in the same data type as the output. Therefore, for output 352 | types with a limited precision, the results may be imprecise 353 | because intermediate results may be stored with insufficient 354 | precision. 355 | 356 | Examples 357 | -------- 358 | 359 | >>> a = np.zeros((3, 3)) 360 | >>> a[1, 1] = 1 361 | >>> a 362 | array([[ 0., 0., 0.], 363 | [ 0., 1., 0.], 364 | [ 0., 0., 0.]]) 365 | >>> gaussian(a, sigma=0.4) # mild smoothing 366 | array([[ 0.00163116, 0.03712502, 0.00163116], 367 | [ 0.03712502, 0.84496158, 0.03712502], 368 | [ 0.00163116, 0.03712502, 0.00163116]]) 369 | >>> gaussian(a, sigma=1) # more smooting 370 | array([[ 0.05855018, 0.09653293, 0.05855018], 371 | [ 0.09653293, 0.15915589, 0.09653293], 372 | [ 0.05855018, 0.09653293, 0.05855018]]) 373 | >>> # Several modes are possible for handling boundaries 374 | >>> gaussian(a, sigma=1, mode='reflect') 375 | array([[ 0.08767308, 0.12075024, 0.08767308], 376 | [ 0.12075024, 0.16630671, 0.12075024], 377 | [ 0.08767308, 0.12075024, 0.08767308]]) 378 | >>> # For RGB images, each is filtered separately 379 | >>> from skimage.data import astronaut 380 | >>> image = astronaut() 381 | >>> filtered_img = gaussian(image, sigma=1, multichannel=True) 382 | 383 | """ 384 | 385 | spatial_dims = guess_spatial_dimensions(image) 386 | if spatial_dims is None and multichannel is None: 387 | msg = ("Images with dimensions (M, N, 3) are interpreted as 2D+RGB " 388 | "by default. 
Use `multichannel=False` to interpret as " 389 | "3D image with last dimension of length 3.") 390 | warn(RuntimeWarning(msg)) 391 | multichannel = True 392 | if np.any(np.asarray(sigma) < 0.0): 393 | raise ValueError("Sigma values less than zero are not valid") 394 | if multichannel: 395 | # do not filter across channels 396 | if not isinstance(sigma, coll.Iterable): 397 | sigma = [sigma] * (image.ndim - 1) 398 | if len(sigma) != image.ndim: 399 | sigma = np.concatenate((np.asarray(sigma), [0])) 400 | #image = img_as_float(image) 401 | return ndi.gaussian_filter(image, sigma, mode=mode, cval=cval) 402 | -------------------------------------------------------------------------------- /bat_eval/mywavfile.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code taken from scipy.io.wavfile.py 3 | 4 | Module to read wav files using numpy arrays 5 | 6 | Functions 7 | --------- 8 | `read`: Return the sample rate (in samples/sec) and data from a WAV file. 9 | """ 10 | 11 | from __future__ import division, print_function, absolute_import 12 | import sys 13 | import numpy 14 | import struct 15 | import warnings 16 | 17 | 18 | __all__ = [ 19 | 'WavFileWarning', 20 | 'read' 21 | ] 22 | 23 | 24 | class WavFileWarning(UserWarning): 25 | pass 26 | 27 | 28 | WAVE_FORMAT_PCM = 0x0001 29 | WAVE_FORMAT_IEEE_FLOAT = 0x0003 30 | WAVE_FORMAT_EXTENSIBLE = 0xfffe 31 | KNOWN_WAVE_FORMATS = (WAVE_FORMAT_PCM, WAVE_FORMAT_IEEE_FLOAT) 32 | 33 | # assumes file pointer is immediately 34 | # after the 'fmt ' id 35 | 36 | 37 | def _read_fmt_chunk(fid, is_big_endian): 38 | """ 39 | Returns 40 | ------- 41 | size : int 42 | size of format subchunk in bytes (minus 8 for "fmt " and itself) 43 | format_tag : int 44 | PCM, float, or compressed format 45 | channels : int 46 | number of channels 47 | fs : int 48 | sampling frequency in samples per second 49 | bytes_per_second : int 50 | overall byte rate for the file 51 | block_align : int 52 | bytes per sample, including all channels 53 | bit_depth : int 54 | bits per sample 55 | """ 56 | if is_big_endian: 57 | fmt = '>' 58 | else: 59 | fmt = '<' 60 | 61 | size = res = struct.unpack(fmt+'I', fid.read(4))[0] 62 | bytes_read = 0 63 | 64 | if size < 16: 65 | raise ValueError("Binary structure of wave file is not compliant") 66 | 67 | res = struct.unpack(fmt+'HHIIHH', fid.read(16)) 68 | bytes_read += 16 69 | 70 | format_tag, channels, fs, bytes_per_second, block_align, bit_depth = res 71 | 72 | if format_tag == WAVE_FORMAT_EXTENSIBLE and size >= (16+2): 73 | ext_chunk_size = struct.unpack(fmt+'H', fid.read(2))[0] 74 | bytes_read += 2 75 | if ext_chunk_size >= 22: 76 | extensible_chunk_data = fid.read(22) 77 | bytes_read += 22 78 | raw_guid = extensible_chunk_data[2+4:2+4+16] 79 | # GUID template {XXXXXXXX-0000-0010-8000-00AA00389B71} (RFC-2361) 80 | # MS GUID byte order: first three groups are native byte order, 81 | # rest is Big Endian 82 | if is_big_endian: 83 | tail = b'\x00\x00\x00\x10\x80\x00\x00\xAA\x00\x38\x9B\x71' 84 | else: 85 | tail = b'\x00\x00\x10\x00\x80\x00\x00\xAA\x00\x38\x9B\x71' 86 | if raw_guid.endswith(tail): 87 | format_tag = struct.unpack(fmt+'I', raw_guid[:4])[0] 88 | else: 89 | raise ValueError("Binary structure of wave file is not compliant") 90 | 91 | if format_tag not in KNOWN_WAVE_FORMATS: 92 | raise ValueError("Unknown wave file format") 93 | 94 | # move file pointer to next chunk 95 | if size > (bytes_read): 96 | fid.read(size - bytes_read) 97 | 98 | return (size, format_tag, channels, fs, bytes_per_second, 
block_align,
99 |             bit_depth)
100 | 
101 | 
102 | # assumes file pointer is immediately after the 'data' id
103 | def _read_data_chunk(fid, format_tag, channels, bit_depth, is_big_endian,
104 |                      mmap=False):
105 |     if is_big_endian:
106 |         fmt = '>I'
107 |     else:
108 |         fmt = '<I'
109 |     # size of the data subchunk in bytes
110 |     size = struct.unpack(fmt, fid.read(4))[0]
111 | 
112 |     # number of bytes per sample
113 |     bytes_per_sample = bit_depth//8
114 |     if bit_depth == 8:
115 |         dtype = 'u1'
116 |     else:
117 |         if is_big_endian:
118 |             dtype = '>'
119 |         else:
120 |             dtype = '<'
121 |         if format_tag == WAVE_FORMAT_PCM:
122 |             dtype += 'i%d' % bytes_per_sample
123 |         else:
124 |             dtype += 'f%d' % bytes_per_sample
125 | 
126 |     if not mmap:
127 |         data = numpy.fromstring(fid.read(size), dtype=dtype)
128 |     else:
129 |         start = fid.tell()
130 |         data = numpy.memmap(fid, dtype=dtype, mode='c', offset=start,
131 |                             shape=(size//bytes_per_sample,))
132 |         fid.seek(start + size)
133 | 
134 |     if channels > 1:
135 |         data = data.reshape(-1, channels)
136 |     return data
137 | 
138 | 
139 | def _skip_unknown_chunk(fid, is_big_endian):
140 |     if is_big_endian:
141 |         fmt = '>I'
142 |     else:
143 |         fmt = '<I'
144 | 
145 |     data = fid.read(4)
146 |     # call unpack() and seek() only if we have really read data from file
147 |     # otherwise empty read() and irrelevant unpack() would trigger exception
148 |     if data:
149 |         size = struct.unpack(fmt, data)[0]
150 |         fid.seek(size, 1)
151 | 
152 | 
153 | def _read_riff_chunk(fid):
154 |     str1 = fid.read(4)  # file signature
155 |     if str1 == b'RIFF':
156 |         is_big_endian = False
157 |         fmt = '<I'
158 |     elif str1 == b'RIFX':
159 |         is_big_endian = True
160 |         fmt = '>I'
161 |     else:
162 |         raise ValueError("File format {}... not "
163 |                          "understood.".format(repr(str1)))
164 | 
165 |     # size of entire file
166 |     file_size = struct.unpack(fmt, fid.read(4))[0] + 8
167 | 
168 |     str2 = fid.read(4)
169 |     if str2 != b'WAVE':
170 |         raise ValueError("Not a WAV file.")
171 | 
172 |     return file_size, is_big_endian
173 | 
174 | 
175 | def read(filename, mmap=False):
176 |     """
177 |     Open a WAV file
178 | 
179 |     Return the sample rate (in samples/sec) and data from a WAV file.
180 | 
181 |     Parameters
182 |     ----------
183 |     filename : string or open file handle
184 |         Input wav file.
185 |     mmap : bool, optional
186 |         Whether to read data as memory-mapped.
187 |         Only to be used on real files (Default: False).
188 | 
189 |     Returns
190 |     -------
191 |     rate : int
192 |         Sample rate of wav file.
193 |     data : numpy array
194 |         Data read from wav file. Data-type is determined from the file;
195 |         see Notes.
196 | 
197 |     Notes
198 |     -----
199 |     This function cannot read wav files with 24-bit data.
200 | 
201 |     Common data types: [1]_
202 | 
203 |     =====================  ===========  ===========  =============
204 |          WAV format            Min          Max       NumPy dtype
205 |     =====================  ===========  ===========  =============
206 |     32-bit floating point  -1.0         +1.0         float32
207 |     32-bit PCM             -2147483648  +2147483647  int32
208 |     16-bit PCM             -32768       +32767       int16
209 |     8-bit PCM              0            255          uint8
210 |     =====================  ===========  ===========  =============
211 | 
212 |     Note that 8-bit PCM is unsigned.
213 | 
214 |     References
215 |     ----------
216 |     .. [1] IBM Corporation and Microsoft Corporation, "Multimedia Programming
217 |        Interface and Data Specifications 1.0", section "Data Format of the
218 |        Samples", August 1991
219 |        http://www.tactilemedia.com/info/MCI_Control_Info.html
220 | 
221 |     """
222 |     if hasattr(filename, 'read'):
223 |         fid = filename
224 |         mmap = False
225 |     else:
226 |         fid = open(filename, 'rb')
227 | 
228 |     try:
229 |         file_size, is_big_endian = _read_riff_chunk(fid)
230 |         fmt_chunk_received = False
231 |         channels = 1
232 |         bit_depth = 8
233 |         format_tag = WAVE_FORMAT_PCM
234 |         while fid.tell() < file_size:
235 |             # read the next chunk
236 |             chunk_id = fid.read(4)
237 | 
238 |             if not chunk_id:
239 |                 raise ValueError("Unexpected end of file.")
240 |             elif len(chunk_id) < 4:
241 |                 raise ValueError("Incomplete wav chunk.")
242 | 
243 |             if chunk_id == b'fmt ':
244 |                 fmt_chunk_received = True
245 |                 fmt_chunk = _read_fmt_chunk(fid, is_big_endian)
246 |                 format_tag, channels, fs = fmt_chunk[1:4]
247 |                 bit_depth = fmt_chunk[6]
248 |                 if bit_depth not in (8, 16, 32, 64, 96, 128):
249 |                     raise ValueError("Unsupported bit depth: the wav file "
250 |                                      "has {}-bit data.".format(bit_depth))
251 |             elif chunk_id == b'fact':
252 |                 _skip_unknown_chunk(fid, is_big_endian)
253 |             elif chunk_id == b'data':
254 |                 if not fmt_chunk_received:
255 |                     raise ValueError("No fmt chunk before data")
256 |                 data = _read_data_chunk(fid, format_tag, channels, bit_depth,
257 |                                         is_big_endian, mmap)
258 |             elif chunk_id == b'LIST':
259 |                 # someday this could be handled properly but for now skip it
260 |                 _skip_unknown_chunk(fid, is_big_endian)
261 |             elif chunk_id in (b'JUNK', b'Fake'):
262 |                 # skip alignment chunks without warning
263 |                 _skip_unknown_chunk(fid, is_big_endian)
264 |             else:
265 |                 warnings.warn("Chunk (non-data) not understood, skipping it.",
266 |                               WavFileWarning)
267 |                 _skip_unknown_chunk(fid, is_big_endian)
268 |     finally:
269 |         if not hasattr(filename, 'read'):
270 |             fid.close()
271 |         else:
272 |             fid.seek(0)
273 | 
274 |     return fs, data
275 | 
276 | 
277 | 
278 | 
279 | 
280 | 
281 | 
282 | 
283 | 
284 | 
285 | if sys.version_info[0] >= 3:
286 |     def _array_tofile(fid, data):
287 |         # ravel gives a c-contiguous buffer
288 |         fid.write(data.ravel().view('b').data)
289 | else:
290 |     def _array_tofile(fid, data):
291 |         fid.write(data.tostring())
292 | 
293 | 
--------------------------------------------------------------------------------
/bat_eval/nms.pyx:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | cimport numpy as np
3 | cimport cython
4 | 
5 | DTYPE = np.float #Do not remove this line. See http://stackoverflow.com/questions/8024805/cython-compiled-c-extension-importerror-dynamic-module-does-not-define-init-fu
6 | ctypedef np.float32_t DTYPE_t
7 | 
8 | @cython.boundscheck(False)
9 | def nms_1d(np.ndarray[DTYPE_t, ndim=1] src, int win_size, float file_duration):
10 |     """1D Non maximum suppression
11 |        src: vector of length N
12 |     """
13 | 
14 |     cdef int max_ind = 0
15 |     cdef int ii = 0
16 |     cdef int ee = 0
17 |     cdef int width = src.shape[0]-1
18 |     cdef np.ndarray pos = np.empty(width, dtype=np.int)
19 |     cdef int pos_cnt = 0
20 |     while ii <= width:
21 | 
22 |         if max_ind < (ii - win_size):
23 |             max_ind = ii - win_size
24 | 
25 |         ee = ii + win_size
26 |         if ii + win_size >= width:
27 |             ee = width
28 | 
29 |         while max_ind <= ee:
30 |             if src[max_ind] > src[ii]:
31 |                 break
32 |             max_ind += 1
33 | 
34 |         if max_ind > ee:
35 |             pos[pos_cnt] = ii
36 |             pos_cnt += 1
37 |             max_ind = ii+1
38 |             ii += win_size
39 | 
40 |         ii += 1
41 | 
42 |     pos = pos[:pos_cnt]
43 |     val = src[pos]
44 | 
45 |     # remove peaks near the end
46 |     inds = (pos + win_size) < src.shape[0]
47 |     pos = pos[inds]
48 |     val = val[inds]
49 | 
50 |     # set output to between 0 and 1, then put it in the correct time range
51 |     pos = pos.astype(np.float32) / src.shape[0]
52 |     pos = pos*file_duration
53 | 
54 |     return pos, val
55 | 
--------------------------------------------------------------------------------
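The readme's compile step (`python setup.py build_ext --inplace`) builds the extension above from `nms.pyx`. The repository's own `setup.py` is not reproduced in this listing; a minimal sketch of such a file, which may differ from the actual one, is:

```python
# minimal sketch of a setup.py for the Cython nms extension (illustrative;
# the repository ships its own setup.py, which may differ)
from distutils.core import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize('nms.pyx'),
    include_dirs=[numpy.get_include()],
)
```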
/bat_eval/nms_slow.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | 
4 | def nms_1d(src, win_size, file_duration):
5 |     """1D Non maximum suppression
6 |        src: vector of length N
7 |     """
8 | 
9 |     pos = []
10 |     src_cnt = 0
11 |     max_ind = 0
12 |     ii = 0
13 |     ee = 0
14 |     width = src.shape[0]-1
15 |     while ii <= width:
16 | 
17 |         if max_ind < (ii - win_size):
18 |             max_ind = ii - win_size
19 | 
20 |         ee = np.minimum(ii + win_size, width)
21 | 
22 |         while max_ind <= ee:
23 |             src_cnt += 1
24 |             if src[int(max_ind)] > src[int(ii)]:
25 |                 break
26 |             max_ind += 1
27 | 
28 |         if max_ind > ee:
29 |             pos.append(ii)
30 |             max_ind = ii+1
31 |             ii += win_size
32 | 
33 |         ii += 1
34 | 
35 |     pos = np.asarray(pos).astype(np.int)
36 |     val = src[pos]
37 | 
38 |     # remove peaks near the end
39 |     inds = (pos + win_size) < src.shape[0]
40 |     pos = pos[inds]
41 |     val = val[inds]
42 | 
43 |     # set output to between 0 and 1, then put it in the correct time range
44 |     pos = pos / float(src.shape[0])
45 |     pos = pos*file_duration
46 | 
47 |     return pos, val
48 | 
49 | 
50 | def test_nms():
51 |     import matplotlib.pyplot as plt
52 |     import numpy as np
53 |     #import pyximport; pyximport.install(reload_support=True)
54 |     import nms as nms_fast
55 | 
56 |     y = np.sin(np.arange(1000)/100.0*np.pi)
57 |     y = y + np.random.random(y.shape)*0.5
58 |     win_size = int(0.1*y.shape[0]/2.0)
59 | 
60 |     pos, prob = nms_1d(y, win_size, y.shape[0])
61 |     pos_f, prob_f = nms_fast.nms_1d(y, win_size, y.shape[0])
62 | 
63 |     print('diff between implementations =', 1-np.isclose(prob_f, prob).mean())
64 |     print('diff between implementations =', 1-np.isclose(pos_f, pos).mean())
65 | 
66 |     plt.close('all')
67 |     plt.plot(y)
68 |     plt.plot((pos).astype('int'), prob, 'ro', ms=10)
69 |     plt.plot((pos_f).astype('int'), prob, 'bo')  # shift so we can see them
70 |     plt.show()
71 | 
--------------------------------------------------------------------------------
/bat_eval/readme.md:
--------------------------------------------------------------------------------
1 | # CPU Bat Detector Code
2 | 
3 | This contains python code for bat echolocation call detection in full spectrum audio recordings. This is a stripped down CPU based version of the detector with minimal dependencies that can be used for deployment.
4 | 
5 | 
6 | #### Installation Instructions
7 | * Install the Anaconda Python 2.7 distribution from [here](https://www.continuum.io/downloads).
8 | * Download this detection code from the repository and unzip it.
9 | * Compile fast non maximum suppression by running: `python setup.py build_ext --inplace`. This might not work on all systems e.g. Windows.
10 | 
11 | 
12 | #### Running on Your Own Data
13 | * Change the `data_dir = 'wavs/'` variable so that it points to the location of the audio files you want to run the detector on.
14 | * Specify where you want the results to be saved by setting `op_ann_dir = 'results/'`.
15 | * To run, open up the command line and type:
16 | `python run_detector.py`
17 | * If you want the detector to be less conservative in its detections, lower the value of `detection_thresh`.
18 | * By setting `save_individual_results = False` the code will not save individual results files.
19 | 
20 | ## Misc
21 | 
22 | #### Requirements
23 | The code has been tested using Python 2.7 (it mostly works under Python 3.6, but we have noticed some issues) with the following package versions:
24 | `Python 2.7.12`
25 | `scipy 0.19.0`
26 | `numpy 1.12.1`
27 | `pandas 0.19.2`
28 | `cython 0.24.1` - not required
29 | 
30 | 
31 | #### Different Detection Models
32 | * `detector_192K.npy` is trained to be more efficient for files that have been recorded at 192K. Note that different detectors will give different results. You can swap in your own models that have been trained using the code in `../bat_train`, and exported with `../bat_train/export_detector_weights.py`.
33 | * To use it change the detector model as follows:
34 | `det_model_file = 'models/detector_192K.npy'`
35 | * Running `evaluate_cnn_fast.py` will compute the performance of the CPU version of this CNN_FAST model on the different test sets.
36 | 
37 | 
38 | #### Viewing Outputs
39 | * The code outputs annotations as one big csv file. The location where the file is saved is specified with the variable:
40 | `op_file_name_total = 'res/op_file.csv'`
41 | It contains three fields `file_name`, `detection_time`, and `detection_prob` which indicate the time in the file and the detector confidence (higher is more confident) for each detected call.
42 | * It also saves the outputs in a format compatible with [AudioTagger](https://github.com/groakat/AudioTagger). The output directory for these annotations is specified as:
43 | `op_ann_dir = 'res/'`
44 | The individual `*-sceneRect.csv` files contain the same information that is specified in the main results file `op_file_name_total`, where `LabelStartTime_Seconds` corresponds to `detection_time` and `DetectorConfidence` corresponds to `detection_prob`. The additional fields (e.g. `Spec_x1`) are specific to AudioTagger and do not contain any extra information.
45 | 
46 | 
47 | #### Performance
48 | * You can get higher resolution results by setting the `low_res` flag in `cpu_detection.run_detection()` to `False`.
49 | * The detector code breaks the files down into chunks of audio (this is controlled by the parameter `chunk_size` in `cpu_detection`, measured in seconds/10). It's best to keep this value reasonably small to keep memory usage low. However, experimenting with different values could speed things up.
50 | * You can get a faster Fourier transform by installing the FFTW3 library (http://www.fftw.org/) and the python wrapper pyFFTW (https://pypi.python.org/pypi/pyFFTW). On Ubuntu Linux: `sudo apt-get install libfftw3 libfftw3-dev` and `pip install pyfftw`, respectively.
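#### Calling the Detector from Python
If you would rather call the detector from your own code than edit `run_detector.py`, the core steps reduce to the sketch below (file paths are placeholders, and the chunking and time expansion logic from `run_detector.py` is omitted):

```python
import numpy as np
import mywavfile
import cpu_detection as detector

det = detector.CPUDetector('models/detector.npy', 'models/detector_params.json')
samp_rate, audio = mywavfile.read('wavs/test_file.wav')
samp_rate = int(samp_rate/10.0)  # undo time expansion, as in run_detector.py

chunk = audio[:int(det.chunk_size*samp_rate)]
spec = det.create_spec(chunk, samp_rate)
call_time, call_prob = det.run_detection(spec, chunk.shape[0]/float(samp_rate),
                                         detection_thresh=0.95, low_res=True)
```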
19 | 
20 | ## Misc
21 | 
22 | #### Requirements
23 | The code has been tested using Python 2.7 (it mostly works under Python 3.6, but we have noticed some issues) with the following package versions:
24 | `Python 2.7.12`
25 | `scipy 0.19.0`
26 | `numpy 1.12.1`
27 | `pandas 0.19.2`
28 | `cython 0.24.1` - optional, only needed to compile the fast non maximum suppression
29 | 
30 | 
31 | #### Different Detection Models
32 | * `detector_192K.npy` is trained to be more efficient for files that have been recorded at 192kHz. Note that different detectors will give different results. You can swap in your own models that have been trained using the code in `../bat_train`, and exported with `../bat_train/export_detector_weights.py`.
33 | * To use it, change the detector model as follows:
34 | `det_model_file = 'models/detector_192K.npy'`
35 | * Running `evaluate_cnn_fast.py` will compute the performance of the CPU version of this CNN_FAST model on the different test sets.
36 | 
37 | 
38 | #### Viewing Outputs
39 | * The code outputs annotations as one big csv file. The location where the file is saved is specified with the variable:
40 | `op_file_name_total = 'res/op_file.csv'`
41 | It contains three fields `file_name`, `detection_time`, and `detection_prob` which indicate the time in the file and the detector confidence (higher is more confident) for each detected call.
42 | * It also saves the outputs in a format compatible with [AudioTagger](https://github.com/groakat/AudioTagger). The output directory for these annotations is specified as:
43 | `op_ann_dir = 'res/'`
44 | The individual `*-sceneRect.csv` files contain the same information that is specified in the main results file `op_file_name_total`, where `LabelStartTime_Seconds` corresponds to `detection_time` and `DetectorConfidence` corresponds to `detection_prob`. The additional fields (e.g. `Spec_x1`) are specific to AudioTagger and do not contain any extra information.
45 | 
46 | 
47 | #### Performance
48 | * You can get higher resolution results by setting the `low_res` flag in `cpu_detection.run_detection()` to `False`.
49 | * The detector code breaks the files down into chunks of audio (this is controlled by the parameter `chunk_size` in `cpu_detection`, measured in seconds/10). It's best to keep this value reasonably small to keep memory usage low. However, experimenting with different values could speed things up.
50 | * You can get a faster Fourier transform by installing the FFTW3 library (http://www.fftw.org/) and the Python wrapper pyFFTW (https://pypi.python.org/pypi/pyFFTW). On Ubuntu Linux: `sudo apt-get install libfftw3 libfftw3-dev` and `pip install pyfftw`, respectively.
51 | 
52 | 
53 | 
54 | ### Acknowledgements
55 | Thanks to Daniyar Turmukhambetov for coding help for another version of this repo. We are enormously grateful for the efforts and enthusiasm of the amazing iBats and Bat Detective volunteers. We would also like to thank Ian Agranat and Joe Szewczak for useful discussions and access to their systems. Finally, we would like to thank [Zooniverse](https://www.zooniverse.org/) for setting up and hosting the Bat Detective project.
56 | 
57 | ### License
58 | Code, audio data, and annotations are available for research purposes only, i.e. non-commercial use. For any other use of the software or data please contact the authors.
59 | 
--------------------------------------------------------------------------------
/bat_eval/run_detector.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | import os
4 | import fnmatch
5 | import time
6 | import sys
7 | 
8 | import write_op as wo
9 | import cpu_detection as detector
10 | import mywavfile
11 | 
12 | 
13 | def get_audio_files(ip_dir):
14 |     matches = []
15 |     for root, dirnames, filenames in os.walk(ip_dir):
16 |         for filename in filenames:
17 |             if filename.lower().endswith('.wav'):
18 |                 matches.append(os.path.join(root, filename))
19 |     return matches
20 | 
21 | 
22 | def read_audio(file_name, do_time_expansion, chunk_size, win_size):
23 |     # try to read in audio file
24 |     try:
25 |         samp_rate_orig, audio = mywavfile.read(file_name)
26 |     except:
27 |         print(' Error reading file')
28 |         return True, None, None, None, None
29 | 
30 |     # convert to mono if stereo
31 |     if len(audio.shape) == 2:
32 |         print(' Warning: stereo file.
Just taking left channel.') 33 | audio = audio[:, 0] 34 | file_dur = audio.shape[0] / float(samp_rate_orig) 35 | print(' dur', round(file_dur,3), '(secs) , fs', samp_rate_orig) 36 | 37 | # original model is trained on time expanded data 38 | samp_rate = samp_rate_orig 39 | if do_time_expansion: 40 | samp_rate = int(samp_rate_orig/10.0) 41 | file_dur *= 10 42 | 43 | # pad with zeros so we can go right to the end 44 | multiplier = np.ceil(file_dur/float(chunk_size-win_size)) 45 | diff = multiplier*(chunk_size-win_size) - file_dur + win_size 46 | audio_pad = np.hstack((audio, np.zeros(int(diff*samp_rate)))) 47 | 48 | return False, audio_pad, file_dur, samp_rate, samp_rate_orig 49 | 50 | 51 | def run_model(det, audio, file_dur, samp_rate, detection_thresh, max_num_calls=0): 52 | """This runs the bat call detector. 53 | """ 54 | # results will be stored here 55 | det_time_file = np.zeros(0) 56 | det_prob_file = np.zeros(0) 57 | 58 | # files can be long so we split each up into separate (overlapping) chunks 59 | st_positions = np.arange(0, file_dur, det.chunk_size-det.win_size) 60 | for chunk_id, st_position in enumerate(st_positions): 61 | 62 | # take a chunk of the audio 63 | # should already be zero padded at the end so its the correct size 64 | st_pos = int(st_position*samp_rate) 65 | en_pos = int(st_pos + det.chunk_size*samp_rate) 66 | audio_chunk = audio[st_pos:en_pos] 67 | chunk_duration = audio_chunk.shape[0] / float(samp_rate) 68 | 69 | # create spectrogram 70 | spec = det.create_spec(audio_chunk, samp_rate) 71 | 72 | # run detector 73 | det_loc, prob_det = det.run_detection(spec, chunk_duration, detection_thresh, 74 | low_res=True) 75 | 76 | det_time_file = np.hstack((det_time_file, det_loc + st_position)) 77 | det_prob_file = np.hstack((det_prob_file, prob_det)) 78 | 79 | # undo the effects of time expansion for detector 80 | if do_time_expansion: 81 | det_time_file /= 10.0 82 | 83 | return det_time_file, det_prob_file 84 | 85 | 86 | if __name__ == "__main__": 87 | 88 | # params 89 | detection_thresh = 0.95 # make this smaller if you want more calls 90 | do_time_expansion = True # if audio is already time expanded set this to False 91 | save_individual_results = True # if True will create an output for each file 92 | save_summary_result = True # if True will create a single csv file with all results 93 | 94 | # load data 95 | data_dir = 'wavs' # this is the path to your audio files 96 | op_ann_dir = 'results' # this where your results will be saved 97 | op_ann_dir_ind = os.path.join(op_ann_dir, 'individual_results') # this where individual results will be saved 98 | op_file_name_total = os.path.join(op_ann_dir, 'results.csv') 99 | if not os.path.isdir(op_ann_dir): 100 | os.makedirs(op_ann_dir) 101 | if save_individual_results and not os.path.isdir(op_ann_dir_ind): 102 | os.makedirs(op_ann_dir_ind) 103 | 104 | # read audio files 105 | audio_files = get_audio_files(data_dir) 106 | 107 | print('Processing ', len(audio_files), 'files') 108 | print('Input directory ', data_dir) 109 | print('Results directory ', op_ann_dir, '\n') 110 | 111 | 112 | # load and create the detector 113 | det_model_file = 'models/detector.npy' 114 | det_params_file = det_model_file[:-4] + '_params.json' 115 | det = detector.CPUDetector(det_model_file, det_params_file) 116 | 117 | # loop through audio files 118 | results = [] 119 | for file_cnt, file_name in enumerate(audio_files): 120 | 121 | file_name_basename = file_name[len(data_dir):] 122 | print('\n', file_cnt+1, 'of', len(audio_files), '\t', 
file_name_basename) 123 | 124 | # read audio file - skip file if can't read it 125 | read_fail, audio, file_dur, samp_rate, samp_rate_orig = read_audio(file_name, 126 | do_time_expansion, det.chunk_size, det.win_size) 127 | if read_fail: 128 | continue 129 | 130 | # run detector 131 | tic = time.time() 132 | det_time, det_prob = run_model(det, audio, file_dur, samp_rate, 133 | detection_thresh) 134 | toc = time.time() 135 | 136 | print(' detection time', round(toc-tic, 3), '(secs)') 137 | num_calls = len(det_time) 138 | print(' ' + str(num_calls) + ' calls found') 139 | 140 | # save results 141 | if save_individual_results: 142 | # save to AudioTagger format 143 | f_name_fmt = file_name_basename.replace('/', '_').replace('\\', '_')[:-4] 144 | op_file_name = os.path.join(op_ann_dir_ind, f_name_fmt) + '-sceneRect.csv' 145 | wo.create_audio_tagger_op(file_name_basename, op_file_name, det_time, 146 | det_prob, samp_rate_orig, class_name='bat') 147 | 148 | # save as dictionary 149 | if num_calls > 0: 150 | res = {'filename':file_name_basename, 'time':det_time, 'prob':det_prob} 151 | results.append(res) 152 | 153 | # save results for all files to large csv 154 | if save_summary_result and (len(results) > 0): 155 | print('\nsaving results to', op_file_name_total) 156 | wo.save_to_txt(op_file_name_total, results) 157 | else: 158 | print('no detections to save') 159 | -------------------------------------------------------------------------------- /bat_eval/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from distutils.extension import Extension 3 | from Cython.Build import cythonize 4 | import numpy 5 | from sys import platform 6 | 7 | extra_compile_args = [] 8 | extra_link_args = [] 9 | 10 | try: 11 | from Cython.Distutils.build_ext import build_ext 12 | except ImportError: 13 | print('Error: Cython not installed. please install by running "conda install cython". 
exiting') 14 | exit() 15 | 16 | if platform == "linux" or platform == "linux2": 17 | # linux 18 | extra_compile_args.append('-fopenmp') 19 | extra_compile_args.append('-ffast-math') 20 | extra_compile_args.append('-msse') 21 | extra_compile_args.append('-msse2') 22 | extra_compile_args.append('-msse3') 23 | extra_compile_args.append('-msse4') 24 | extra_compile_args.append('-s') 25 | extra_compile_args.append('-std=c99') 26 | extra_link_args.append('-fopenmp') 27 | 28 | elif platform == "darwin": 29 | # OS X 30 | extra_compile_args.append('-fopenmp') 31 | extra_compile_args.append('-ffast-math') 32 | extra_compile_args.append('-msse') 33 | extra_compile_args.append('-msse2') 34 | extra_compile_args.append('-msse3') 35 | extra_compile_args.append('-msse4') 36 | extra_compile_args.append('-s') 37 | extra_compile_args.append('-std=c99') 38 | extra_link_args.append('-fopenmp') 39 | 40 | import os 41 | os.environ["CC"] = "gcc-6" 42 | os.environ["CXX"] = "gcc-6" 43 | elif platform == "win32": 44 | # Windows 45 | pass 46 | 47 | extensions = [ 48 | Extension("nms", ["nms.pyx"], 49 | extra_compile_args=extra_compile_args, 50 | extra_link_args=extra_link_args) 51 | ] 52 | 53 | setup( 54 | ext_modules = cythonize(extensions), 55 | include_dirs=[numpy.get_include()] 56 | ) 57 | -------------------------------------------------------------------------------- /bat_eval/spectrogram.py: -------------------------------------------------------------------------------- 1 | from myskimage import gaussian 2 | import numpy as np 3 | import imp 4 | try: 5 | imp.find_module('pyfftw') 6 | pyfftw_installed = True 7 | import pyfftw 8 | except ImportError: 9 | pyfftw_installed = False 10 | 11 | 12 | class Spectrogram: 13 | fftw_inps = {} 14 | fftw_rfft = {} 15 | han_wins = {} 16 | 17 | def __init__(self, use_pyfftw=True): 18 | if not pyfftw_installed: 19 | use_pyfftw = False 20 | self.use_pyfftw = use_pyfftw 21 | 22 | @staticmethod 23 | def _denoise(spec): 24 | """ 25 | Perform denoising. 
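        The mean value of each frequency band is subtracted from that band,
        and any resulting negative values are clipped to zero.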
26 | """ 27 | me = np.mean(spec, 1) 28 | spec = spec - me[:, np.newaxis] 29 | 30 | # remove anything below 0 31 | spec.clip(min=0, out=spec) 32 | 33 | return spec 34 | 35 | @staticmethod 36 | def do_fft(inp, use_pyfftw=False, K=None): 37 | if not use_pyfftw: 38 | out = np.fft.rfft(inp, n=K, axis=0) 39 | out = out.astype('complex64') # numpy may be using double precision internally 40 | elif use_pyfftw: 41 | if not inp.shape in Spectrogram.fftw_inps: 42 | Spectrogram.fftw_inps[inp.shape] = pyfftw.empty_aligned(inp.shape, dtype='float32') 43 | Spectrogram.fftw_rfft[inp.shape] = pyfftw.builders.rfft(Spectrogram.fftw_inps[inp.shape], axis=0) 44 | Spectrogram.fftw_inps[inp.shape][:] = inp[:] 45 | out = (Spectrogram.fftw_rfft[inp.shape])() 46 | return out 47 | 48 | def gen_mag_spectrogram(self, x, fs, ms, overlap_perc, crop_spec=True, max_freq=256, min_freq=0): 49 | """ 50 | Computes magnitude spectrogram by specifying time 51 | """ 52 | 53 | x = x.astype(np.float32) 54 | 55 | nfft = int(ms*fs) 56 | noverlap = int(overlap_perc*nfft) 57 | 58 | # window data 59 | step = nfft - noverlap 60 | shape = (nfft, (x.shape[-1]-noverlap)//step) 61 | strides = (x.strides[0], step*x.strides[0]) 62 | x_wins = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides) 63 | 64 | # apply window 65 | if x_wins.shape not in Spectrogram.han_wins: 66 | Spectrogram.han_wins[x_wins.shape[0]] = np.hanning(x_wins.shape[0]).astype('float32') 67 | 68 | han_win = Spectrogram.han_wins[x_wins.shape[0]] 69 | x_wins_han = han_win[..., np.newaxis] * x_wins 70 | 71 | # do fft 72 | # note this will be much slower if x_wins_han.shape[0] is not a power of 2 73 | complex_spec = Spectrogram.do_fft(x_wins_han, self.use_pyfftw) 74 | 75 | # calculate magnitude 76 | mag_spec = complex_spec.real**2 + complex_spec.imag**2 77 | # calculate magnitude 78 | #mag_spec = (np.conjugate(complex_spec) * complex_spec).real 79 | # calculate magnitude 80 | #mag_spec = np.square(np.absolute(complex_spec)) 81 | 82 | # orient correctly and remove dc component 83 | spec = mag_spec[1:, :] 84 | spec = np.flipud(spec) 85 | 86 | # only keep the relevant bands 87 | # not really in frequency, better thought of as indices 88 | if crop_spec: 89 | spec = spec[-max_freq:-min_freq, :] 90 | 91 | # add some zeros if too small 92 | req_height = max_freq-min_freq 93 | if spec.shape[0] < req_height: 94 | zero_pad = np.zeros((req_height-spec.shape[0], spec.shape[1]), dtype=np.float32) 95 | spec = np.vstack((zero_pad, spec)) 96 | return spec 97 | 98 | 99 | def gen_spectrogram(self, audio_samples, sampling_rate, fft_win_length, fft_overlap, crop_spec=True, max_freq=256, min_freq=0): 100 | """ 101 | Compute spectrogram, crop and compute log. 102 | """ 103 | 104 | # compute spectrogram 105 | spec = self.gen_mag_spectrogram(audio_samples, sampling_rate, fft_win_length, fft_overlap, crop_spec, max_freq, min_freq) 106 | 107 | # perform log scaling - here the same as matplotlib 108 | log_scaling = 2.0 * (1.0 / sampling_rate) * (1.0/(np.abs(np.hanning(int(fft_win_length*sampling_rate)))**2).sum()) 109 | spec = np.log(1 + log_scaling*spec) 110 | 111 | return spec 112 | 113 | 114 | def process_spectrogram(self, spec, denoise_spec=True, smooth_spec=True, smooth_sigma=1.0): 115 | """ 116 | Denoises, and smooths spectrogram. 
117 | """ 118 | 119 | # denoise 120 | if denoise_spec: 121 | spec = Spectrogram._denoise(spec) 122 | 123 | # smooth the spectrogram 124 | if smooth_spec: 125 | spec = gaussian(spec, smooth_sigma) 126 | 127 | return spec 128 | -------------------------------------------------------------------------------- /bat_eval/wavs/test_file.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/macaodha/batdetect/7d4f7e43b4456d391eb832a612e7b134e341814d/bat_eval/wavs/test_file.wav -------------------------------------------------------------------------------- /bat_eval/write_op.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import datetime as dt 4 | import glob 5 | import os 6 | 7 | 8 | def save_to_txt(op_file, results): 9 | 10 | # takes a dictionary of results and saves to file 11 | with open(op_file, 'w') as file: 12 | head_str = 'file_name,detection_time,detection_prob' 13 | file.write(head_str + '\n') 14 | 15 | for ii in range(len(results)): 16 | for jj in range(len(results[ii]['prob'])): 17 | 18 | row_str = results[ii]['filename'] + ',' 19 | tm = round(results[ii]['time'][jj],3) 20 | pr = round(results[ii]['prob'][jj],3) 21 | row_str += str(tm) + ',' + str(pr) 22 | file.write(row_str + '\n') 23 | 24 | 25 | def create_audio_tagger_op(ip_file_name, op_file_name, st_times, det_confidence, 26 | samp_rate, class_name): 27 | # saves the detections in an audiotagger friendly format 28 | 29 | col_names = ['Filename', 'Label', 'LabelTimeStamp', 'Spec_NStep', 30 | 'Spec_NWin', 'Spec_x1', 'Spec_y1', 'Spec_x2', 'Spec_y2', 31 | 'LabelStartTime_Seconds', 'LabelEndTime_Seconds', 32 | 'LabelArea_DataPoints', 'DetectorConfidence'] 33 | 34 | nstep = 0.001 35 | nwin = 0.003 36 | call_width = 0.001 # code does not output call width so just make one up 37 | y_max = (samp_rate*nwin)/2.0 38 | num_calls = len(st_times) 39 | 40 | if num_calls == 0: 41 | da_at = pd.DataFrame(index=np.arange(0), columns=col_names) 42 | else: 43 | da_at = pd.DataFrame(index=np.arange(0, num_calls), columns=col_names) 44 | da_at['Spec_NStep'] = nstep 45 | da_at['Spec_NWin'] = nwin 46 | da_at['Label'] = 'bat' 47 | da_at['LabelTimeStamp'] = dt.datetime.now().isoformat() 48 | da_at['Spec_y1'] = 0 49 | da_at['Spec_y2'] = y_max 50 | da_at['Filename'] = ip_file_name 51 | 52 | for ii in np.arange(0, num_calls): 53 | 54 | st_time = st_times[ii] 55 | da_at.loc[ii, 'LabelStartTime_Seconds'] = np.round(st_time, 3) 56 | da_at.loc[ii, 'LabelEndTime_Seconds'] = np.round(st_time + call_width, 3) 57 | da_at.loc[ii, 'Label'] = class_name 58 | 59 | da_at.loc[ii, 'Spec_x1'] = np.round(st_time/nstep, 3) 60 | da_at.loc[ii, 'Spec_x2'] = np.round((st_time + call_width)/nstep, 3) 61 | 62 | da_at.loc[ii, 'DetectorConfidence'] = np.round(det_confidence[ii], 3) 63 | 64 | # save to disk 65 | da_at.to_csv(op_file_name, index=False) 66 | 67 | return da_at 68 | -------------------------------------------------------------------------------- /bat_train/classifier.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import evaluate as evl 3 | import cls_audio_forest as cls_rf 4 | import cls_cnn as cls_cnn 5 | import cls_segment as seg 6 | import create_results as res 7 | import time 8 | 9 | 10 | class Classifier: 11 | 12 | def __init__(self, params_): 13 | self.params = params_ 14 | if self.params.classification_model == 'rf_vanilla': 15 | self.model = 
cls_rf.AudioForest(self.params)
16 |         elif self.params.classification_model == 'cnn':
17 |             self.model = cls_cnn.NeuralNet(self.params)
18 |         elif self.params.classification_model == 'segment':
19 |             self.model = seg.SegmentAudio(self.params)
20 |         else:
21 |             print 'Invalid model specified'
22 | 
23 |     def save_features(self, files):
24 |         self.model.save_features(files)
25 | 
26 |     def train(self, files, gt_pos, durations):
27 |         """
28 |         Takes the file names and GT call positions and trains the model.
29 |         """
30 | 
31 |         positions, class_labels = generate_training_positions(files, gt_pos, durations, self.params)
32 | 
33 |         self.model.train(positions, class_labels, files, durations)
34 | 
35 |         # hard negative mining
36 |         if self.params.num_hard_negative_mining > 0 and self.params.classification_model != 'segment':
37 |             print '\nhard negative mining'
38 |             for hn in range(self.params.num_hard_negative_mining):
39 |                 print '\thnm round', hn
40 |                 positions, class_labels = self.do_hnm(files, gt_pos, durations, positions, class_labels)
41 |                 self.model.train(positions, class_labels, files, durations)
42 | 
43 |     def test_single(self, audio_samples, sampling_rate):
44 |         """
45 |         Pass the raw audio samples and it will make a prediction.
46 |         """
47 |         duration = audio_samples.shape[0]/float(sampling_rate)
48 |         nms_pos, nms_prob, y_prediction = self.model.test(file_duration=duration, audio_samples=audio_samples, sampling_rate=sampling_rate)
49 |         return nms_pos, nms_prob, y_prediction
50 | 
51 |     def test_batch(self, files, gt_pos, durations, save_results=False, op_im_dir=''):
52 |         """
53 |         Takes a list of files as input and runs the detector on them.
54 |         """
55 |         nms_pos = [None]*len(files)
56 |         nms_prob = [None]*len(files)
57 |         for ii, file_name in enumerate(files):
58 |             nms_pos[ii], nms_prob[ii], y_prediction = self.model.test(file_name=file_name, file_duration=durations[ii])
59 | 
60 |             # plot results
61 |             if save_results:
62 |                 aud_file = self.params.audio_dir + file_name + '.wav'
63 |                 res.plot_spec(op_im_dir + file_name, aud_file, gt_pos[ii], nms_pos[ii], nms_prob[ii], y_prediction, self.params, True)
64 | 
65 |         return nms_pos, nms_prob
66 | 
67 |     def do_hnm(self, files, gt_pos, durations, positions, class_labels):
68 |         """
69 |         Hard negative mining, adds high confidence false positives to the training set.
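        The current model is run over the training files, and any detection
        scoring above params.detection_prob that is not within a third of a
        window of a ground truth call is added back in as a negative example.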
70 | """ 71 | 72 | nms_pos, nms_prob = self.test_batch(files, gt_pos, durations, False, '') 73 | 74 | positions_new = [None]*len(nms_pos) 75 | class_labels_new = [None]*len(nms_pos) 76 | for ii in range(len(files)): 77 | 78 | # add the false positives that are above the detection threshold 79 | # and not too close to the GT 80 | poss_negs = nms_pos[ii][nms_prob[ii][:,0] > self.params.detection_prob] 81 | if gt_pos[ii].shape[0] > 0: 82 | # have the extra newaxis in case gt_pos[ii] shape changes in the future 83 | pw_distance = np.abs(poss_negs[np.newaxis, ...]-gt_pos[ii][:,0][..., np.newaxis]) 84 | dis_check = (pw_distance >= (self.params.window_size / 3)).mean(0) 85 | new_negs = poss_negs[dis_check==1] 86 | else: 87 | new_negs = poss_negs 88 | new_negs = new_negs[new_negs < (durations[ii]-self.params.window_size)] 89 | 90 | # add them to the training set 91 | positions_new[ii] = np.hstack((positions[ii], new_negs)) 92 | class_labels_new[ii] = np.vstack((class_labels[ii], np.zeros((new_negs.shape[0], 1)))) 93 | 94 | # sort 95 | sorted_inds = np.argsort(positions_new[ii]) 96 | positions_new[ii] = positions_new[ii][sorted_inds] 97 | class_labels_new[ii] = class_labels_new[ii][sorted_inds] 98 | 99 | return positions_new, class_labels_new 100 | 101 | 102 | def generate_training_positions(files, gt_pos, durations, params): 103 | positions = [None]*len(files) 104 | class_labels = [None]*len(files) 105 | for ii, ff in enumerate(files): 106 | positions[ii], class_labels[ii] = extract_train_position_from_file(gt_pos[ii], durations[ii], params) 107 | return positions, class_labels 108 | 109 | 110 | def extract_train_position_from_file(gt_pos, duration, params): 111 | """ 112 | Samples random negative locations for negs, making sure not to overlap with GT. 113 | 114 | gt_pos is the time in seconds that the call occurs. 115 | positions contains time in seconds of some negative and positive examples. 
116 | """ 117 | 118 | if gt_pos.shape[0] == 0: 119 | # dont extract any values if the file does not contain anything 120 | # we will use these ones for HNM later 121 | positions = np.zeros(0) 122 | class_labels = np.zeros((0,1)) 123 | else: 124 | shift = 0 # if there is augmentation this is how much we will add 125 | num_neg_calls = gt_pos.shape[0] 126 | pos_window = params.window_size / 2 # window around GT that is not sampled from 127 | pos = gt_pos[:, 0] 128 | 129 | # augmentation 130 | if params.add_extra_calls: 131 | shift = params.aug_shift 132 | num_neg_calls *= 3 133 | pos = np.hstack((gt_pos[:, 0] - shift, gt_pos[:, 0], gt_pos[:, 0] + shift)) 134 | 135 | # sample a set of negative locations - need to be sufficiently far away from GT 136 | pos_pad = np.hstack((0-params.window_size, gt_pos[:, 0], duration-params.window_size)) 137 | neg = [] 138 | cnt = 0 139 | while cnt < num_neg_calls: 140 | rand_pos = np.random.random()*pos_pad.max() 141 | if (np.abs(pos_pad - rand_pos) > (pos_window+shift)).mean() == 1: 142 | neg.append(rand_pos) 143 | cnt += 1 144 | neg = np.asarray(neg) 145 | 146 | # sort them 147 | positions = np.hstack((pos, neg)) 148 | sorted_inds = np.argsort(positions) 149 | positions = positions[sorted_inds] 150 | 151 | # create labels 152 | class_labels = np.vstack((np.ones((pos.shape[0], 1)), np.zeros((neg.shape[0], 1)))) 153 | class_labels = class_labels[sorted_inds] 154 | 155 | return positions, class_labels 156 | -------------------------------------------------------------------------------- /bat_train/cls_audio_forest.py: -------------------------------------------------------------------------------- 1 | import grad_features as gf 2 | import random_forest as rf 3 | import numpy as np 4 | from skimage.util.shape import view_as_windows 5 | from scipy.ndimage import zoom 6 | import pyximport; pyximport.install() 7 | import nms as nms 8 | from scipy.ndimage.filters import gaussian_filter1d 9 | import spectrogram as sp 10 | from skimage import filters 11 | from scipy.io import wavfile 12 | from skimage.util import view_as_blocks 13 | 14 | 15 | class AudioForest: 16 | 17 | def __init__(self, params_): 18 | self.params = params_ 19 | forest_params = rf.ForestParams(num_classes=2, trees=self.params.trees, 20 | depth=self.params.depth, min_cnt=self.params.min_cnt, tests=self.params.tests) 21 | self.forest = rf.Forest(forest_params) 22 | 23 | def train(self, positions, class_labels, files, durations): 24 | feats = [] 25 | labs = [] 26 | for ii, file_name in enumerate(files): 27 | 28 | local_feats = self.create_or_load_features(file_name) 29 | 30 | # convert time in file to integer 31 | positions_ratio = positions[ii] / durations[ii] 32 | train_inds = (positions_ratio*float(local_feats.shape[0])).astype('int') 33 | 34 | feats.append(local_feats[train_inds, :]) 35 | labs.append(class_labels[ii]) 36 | 37 | # flatten list of lists and set to correct output 38 | features = np.vstack(feats) 39 | labels = np.vstack(labs) 40 | print 'train size', features.shape 41 | self.forest.train(features, labels, False) 42 | 43 | def test(self, file_name=None, file_duration=None, audio_samples=None, sampling_rate=None): 44 | 45 | # compute features 46 | features = self.create_or_load_features(file_name, audio_samples, sampling_rate) 47 | 48 | # make prediction 49 | y_prediction = self.forest.test(features)[:, 1][:, np.newaxis] 50 | 51 | # smooth the output 52 | if self.params.smooth_op_prediction: 53 | y_prediction = gaussian_filter1d(y_prediction, self.params.smooth_op_prediction_sigma, 
axis=0) 54 | pos, prob = nms.nms_1d(y_prediction[:,0], self.params.nms_win_size, file_duration) 55 | 56 | return pos, prob, y_prediction 57 | 58 | def create_or_load_features(self, file_name=None, audio_samples=None, sampling_rate=None): 59 | """ 60 | Does 1 of 3 possible things 61 | 1) computes feature from audio samples directly 62 | 2) loads feature from disk OR 63 | 3) computes features from file name 64 | """ 65 | 66 | if file_name is None: 67 | features = compute_features(audio_samples, sampling_rate, self.params) 68 | else: 69 | if self.params.load_features_from_file: 70 | features = np.load(self.params.feature_dir + file_name + '.npy') 71 | else: 72 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 73 | features = compute_features(audio_samples, sampling_rate, self.params) 74 | return features 75 | 76 | def save_features(self, files): 77 | for file_name in files: 78 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 79 | features = compute_features(audio_samples, sampling_rate, self.params) 80 | np.save(self.params.feature_dir + file_name, features) 81 | 82 | 83 | def spatial_pool(ip, block_size): 84 | """ 85 | Does sum pooling to reduce dimensionality 86 | """ 87 | # make sure its evenly divisible by padding with last rows 88 | vert_diff = ip.shape[0]%int(block_size) 89 | horz_diff = ip.shape[1]%int(block_size) 90 | 91 | if vert_diff > 0: 92 | ip = np.vstack((ip, np.tile(ip[-1, :], ((block_size-vert_diff, 1))))) 93 | if horz_diff > 0: 94 | ip = np.hstack((ip, np.tile(ip[:, -1], ((block_size-horz_diff, 1))).T)) 95 | 96 | # get block_size*block_size non-overlapping blocks 97 | blocks = view_as_blocks(ip, (block_size, block_size)) 98 | 99 | # sum, could max etc. 100 | op = blocks.reshape(blocks.shape[0], blocks.shape[1], blocks.shape[2]*blocks.shape[3]).sum(2) 101 | 102 | return op 103 | 104 | 105 | def compute_features(audio_samples, sampling_rate, params): 106 | """ 107 | Computes feature vector given audio file name. 108 | Assumes all the spectrograms are the same size - this should be checked externally 109 | """ 110 | 111 | # load audio and create spectrogram 112 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, params.fft_win_length, params.fft_overlap, 113 | crop_spec=params.crop_spec, max_freq=params.max_freq, min_freq=params.min_freq) 114 | spectrogram = sp.process_spectrogram(spectrogram, denoise_spec=params.denoise, mean_log_mag=params.mean_log_mag, smooth_spec=params.smooth_spec) 115 | 116 | # pad with dummy features at the end to take into account the size of the sliding window 117 | if params.feature_type == 'raw': 118 | spec_win = view_as_windows(spectrogram, (spectrogram.shape[0], params.window_width))[0] 119 | spec_win = zoom(spec_win, (1, 0.5, 0.5), order=1) 120 | total_win_size = spectrogram.shape[1] 121 | 122 | elif params.feature_type == 'grad': 123 | grad = np.gradient(spectrogram) 124 | grad_mag = np.sqrt((grad[0]**2 + grad[1]**2)) 125 | total_win_size = spectrogram.shape[1] 126 | 127 | spec_win = view_as_windows(grad_mag, (grad_mag.shape[0], params.window_width))[0] 128 | spec_win = zoom(spec_win, (1, 0.5, 0.5), order=1) 129 | 130 | elif params.feature_type == 'max_freq': 131 | 132 | num_max_freqs = 3 # e.g. 1 means keep top 1, 2 means top 2, ... 
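        # for each time slice keep the amplitudes of the top num_max_freqs
        # frequency bins and the indices of those bins, stacked on top of
        # each other to give 2*num_max_freqs rows per slice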
133 | total_win_size = spectrogram.shape[1] 134 | max_freq = np.argsort(spectrogram, 0) 135 | max_amp = np.sort(spectrogram, 0) 136 | stacked = np.vstack((max_amp[-num_max_freqs:, :], max_freq[-num_max_freqs:, :])) 137 | 138 | spec_win = view_as_windows(stacked, (stacked.shape[0], params.window_width))[0] 139 | 140 | elif params.feature_type == 'hog': 141 | block_size = 4 142 | hog = gf.compute_hog(spectrogram, block_size) 143 | total_win_size = hog.shape[1] 144 | window_width_down = np.rint(params.window_width / float(block_size)) 145 | 146 | spec_win = view_as_windows(hog, (hog.shape[0], window_width_down, hog.shape[2]))[0] 147 | 148 | elif params.feature_type == 'grad_pool': 149 | grad = np.gradient(spectrogram) 150 | grad_mag = np.sqrt((grad[0]**2 + grad[1]**2)) 151 | 152 | down_sample_size = 4 153 | window_width_down = np.rint(params.window_width / float(down_sample_size)) 154 | grad_mag_pool = spatial_pool(grad_mag, down_sample_size) 155 | total_win_size = grad_mag_pool.shape[1] 156 | 157 | spec_win = view_as_windows(grad_mag_pool, (grad_mag_pool.shape[0], window_width_down))[0] 158 | 159 | elif params.feature_type == 'raw_pool': 160 | down_sample_size = 4 161 | window_width_down = np.rint(params.window_width / float(down_sample_size)) 162 | spec_pool = spatial_pool(spectrogram, down_sample_size) 163 | total_win_size = spec_pool.shape[1] 164 | 165 | spec_win = view_as_windows(spec_pool, (spec_pool.shape[0], window_width_down))[0] 166 | 167 | # pad on extra features at the end as the sliding window will mean its a different size 168 | features = spec_win.reshape((spec_win.shape[0], np.prod(spec_win.shape[1:]))) 169 | features = np.vstack((features, np.tile(features[-1, :], (total_win_size - features.shape[0], 1)))) 170 | features = features.astype(np.float32) 171 | 172 | return features 173 | 174 | -------------------------------------------------------------------------------- /bat_train/cls_cnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skimage.util.shape import view_as_windows 3 | from scipy.ndimage import zoom 4 | from scipy.ndimage.filters import gaussian_filter1d 5 | import spectrogram as sp 6 | from scipy.io import wavfile 7 | import pyximport; pyximport.install() 8 | import nms as nms 9 | 10 | import theano 11 | import lasagne 12 | from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer 13 | from lasagne.layers import Pool2DLayer as PoolLayer 14 | from lasagne.layers import DenseLayer 15 | 16 | 17 | class NeuralNet: 18 | 19 | def __init__(self, params_): 20 | self.params = params_ 21 | self.network = None 22 | 23 | def train(self, positions, class_labels, files, durations): 24 | feats = [] 25 | labs = [] 26 | for ii, file_name in enumerate(files): 27 | 28 | if positions[ii].shape[0] > 0: 29 | local_feats = self.create_or_load_features(file_name) 30 | 31 | # convert time in file to integer 32 | positions_ratio = positions[ii] / durations[ii] 33 | train_inds = (positions_ratio*float(local_feats.shape[0])).astype('int') 34 | 35 | feats.append(local_feats[train_inds, :, :, :]) 36 | labs.append(class_labels[ii]) 37 | 38 | # flatten list of lists and set to correct output size 39 | features = np.vstack(feats) 40 | labels = np.vstack(labs).astype(np.uint8)[:,0] 41 | print 'train size', features.shape 42 | 43 | # train network 44 | input_var = theano.tensor.tensor4('inputs') 45 | target_var = theano.tensor.ivector('targets') 46 | self.network = build_cnn(features.shape[2:], input_var, self.params.net_type) 47 
| 48 | prediction = lasagne.layers.get_output(self.network['prob']) 49 | loss = lasagne.objectives.categorical_crossentropy(prediction, target_var) 50 | loss = loss.mean() 51 | params = lasagne.layers.get_all_params(self.network['prob'], trainable=True) 52 | updates = lasagne.updates.nesterov_momentum( 53 | loss, params, learning_rate=self.params.learn_rate, momentum=self.params.moment) 54 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 55 | 56 | for epoch in range(self.params.num_epochs): 57 | # in each epoch, we do a full pass over the training data 58 | for batch in iterate_minibatches(features, labels, self.params.batchsize, shuffle=True): 59 | inputs, targets = batch 60 | train_fn(inputs, targets) 61 | 62 | # test function 63 | pred = lasagne.layers.get_output(self.network['prob'], deterministic=True)[:, 1] 64 | self.test_fn = theano.function([input_var], pred) 65 | 66 | def test(self, file_name=None, file_duration=None, audio_samples=None, sampling_rate=None): 67 | 68 | # compute features and perform classification 69 | features = self.create_or_load_features(file_name, audio_samples, sampling_rate) 70 | y_prediction = self.test_fn(features)[:, np.newaxis] 71 | 72 | # smooth the output prediction 73 | if self.params.smooth_op_prediction: 74 | y_prediction = gaussian_filter1d(y_prediction, self.params.smooth_op_prediction_sigma, axis=0) 75 | 76 | # perform non max suppression 77 | pos, prob = nms.nms_1d(y_prediction[:,0].astype(np.float), self.params.nms_win_size, file_duration) 78 | 79 | return pos, prob, y_prediction 80 | 81 | def create_or_load_features(self, file_name=None, audio_samples=None, sampling_rate=None): 82 | """ 83 | Does 1 of 3 possible things 84 | 1) computes feature from audio samples directly 85 | 2) loads feature from disk OR 86 | 3) computes features from file name 87 | """ 88 | 89 | if file_name is None: 90 | features = compute_features(audio_samples, sampling_rate, self.params) 91 | else: 92 | if self.params.load_features_from_file: 93 | features = np.load(self.params.feature_dir + file_name + '.npy') 94 | else: 95 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 96 | features = compute_features(audio_samples, sampling_rate, self.params) 97 | 98 | return features 99 | 100 | def save_features(self, files): 101 | for file_name in files: 102 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 103 | features = compute_features(audio_samples, sampling_rate, self.params) 104 | np.save(self.params.feature_dir + file_name, features) 105 | 106 | 107 | def iterate_minibatches(inputs, targets, batchsize, shuffle=False): 108 | # Note: this should not be used for testing as it creats even sized 109 | # minibatches so will skip some data 110 | indices = np.arange(len(inputs)) 111 | if shuffle: 112 | np.random.shuffle(indices) 113 | for start_idx in range(0, len(inputs) - batchsize + 1, batchsize): 114 | excerpt = indices[start_idx:start_idx + batchsize] 115 | yield inputs[excerpt], targets[excerpt] 116 | 117 | def build_cnn(ip_size, input_var, net_type): 118 | if net_type == 'big': 119 | net = network_big(ip_size, input_var) 120 | elif net_type == 'small': 121 | net = network_sm(ip_size, input_var) 122 | else: 123 | print 'Error: network not defined' 124 | return net 125 | 126 | def network_big(ip_size, input_var): 127 | net = {} 128 | net['input'] = lasagne.layers.InputLayer(shape=(None, 1, ip_size[0], ip_size[1]), input_var=input_var) 129 | net['conv1'] = 
ConvLayer(net['input'], 32, 3, pad=1) 130 | net['pool1'] = PoolLayer(net['conv1'], 2) 131 | net['conv2'] = ConvLayer(net['pool1'], 32, 3, pad=1) 132 | net['pool2'] = PoolLayer(net['conv2'], 2) 133 | net['conv3'] = ConvLayer(net['pool2'], 32, 3, pad=1) 134 | net['pool3'] = PoolLayer(net['conv3'], 2) 135 | net['fc1'] = DenseLayer(lasagne.layers.dropout(net['pool3'], p=0.5), num_units=256, nonlinearity=lasagne.nonlinearities.rectify) 136 | net['prob'] = DenseLayer(lasagne.layers.dropout(net['fc1'], p=0.5), num_units=2, nonlinearity=lasagne.nonlinearities.softmax) 137 | return net 138 | 139 | def network_sm(ip_size, input_var): 140 | net = {} 141 | net['input'] = lasagne.layers.InputLayer(shape=(None, 1, ip_size[0], ip_size[1]), input_var=input_var) 142 | net['conv1'] = ConvLayer(net['input'], 16, 3, pad=0) 143 | net['pool1'] = PoolLayer(net['conv1'], 2) 144 | net['conv2'] = ConvLayer(net['pool1'], 16, 3, pad=0) 145 | net['pool2'] = PoolLayer(net['conv2'], 2) 146 | net['fc1'] = DenseLayer(lasagne.layers.dropout(net['pool2'], p=0.5), num_units=64, nonlinearity=lasagne.nonlinearities.rectify) 147 | net['prob'] = DenseLayer(lasagne.layers.dropout(net['fc1'], p=0.5), num_units=2, nonlinearity=lasagne.nonlinearities.softmax) 148 | return net 149 | 150 | def compute_features(audio_samples, sampling_rate, params): 151 | """ 152 | Computes overlapping windows of spectrogram as input for CNN. 153 | """ 154 | 155 | # load audio and create spectrogram 156 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, params.fft_win_length, params.fft_overlap, 157 | crop_spec=params.crop_spec, max_freq=params.max_freq, min_freq=params.min_freq) 158 | spectrogram = sp.process_spectrogram(spectrogram, denoise_spec=params.denoise, mean_log_mag=params.mean_log_mag, smooth_spec=params.smooth_spec) 159 | 160 | # extract windows 161 | spec_win = view_as_windows(spectrogram, (spectrogram.shape[0], params.window_width))[0] 162 | spec_win = zoom(spec_win, (1, 0.5, 0.5), order=1) 163 | spec_width = spectrogram.shape[1] 164 | 165 | # make the correct size for CNN 166 | features = np.zeros((spec_width, 1, spec_win.shape[1], spec_win.shape[2]), dtype=np.float32) 167 | features[:spec_win.shape[0], 0, :, :] = spec_win 168 | 169 | return features 170 | -------------------------------------------------------------------------------- /bat_train/cls_segment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy.ndimage.morphology as morph 3 | from scipy.ndimage.filters import median_filter 4 | import scipy.ndimage 5 | from skimage.measure import regionprops 6 | import spectrogram as sp 7 | from scipy.io import wavfile 8 | 9 | class SegmentAudio: 10 | 11 | def __init__(self, params_): 12 | self.params = params_ 13 | 14 | def train(self, positions, class_labels, files, durations): 15 | # does not need to do anything 16 | pass 17 | 18 | def save_features(self, files): 19 | # does not need to do anything 20 | pass 21 | 22 | def test(self, file_name=None, file_duration=None, audio_samples=None, sampling_rate=None): 23 | 24 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 25 | 26 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, self.params.fft_win_length, 27 | self.params.fft_overlap, crop_spec=self.params.crop_spec, max_freq=self.params.max_freq, 28 | min_freq=self.params.min_freq) 29 | spectrogram = sp.process_spectrogram(spectrogram, denoise_spec=self.params.denoise, 30 | 
mean_log_mag=self.params.mean_log_mag, smooth_spec=self.params.smooth_spec) 31 | 32 | # compute possible call locations 33 | pos = compute_position_from_segment(spectrogram, file_duration, self.params) 34 | prob = np.ones((pos.shape[0], 1)) # no probability information 35 | y_prediction = np.zeros((spectrogram.shape[1], 1)) # dummy 36 | 37 | return pos, prob, y_prediction 38 | 39 | 40 | def compute_position_from_segment(spec, file_duration, params): 41 | """ 42 | Based on Large-scale identification of birds in audio recordings 43 | http://ceur-ws.org/Vol-1180/CLEF2014wn-Life-Lasseck2014.pdf 44 | """ 45 | 46 | # median filter 47 | med_time = np.median(spec, 0)[np.newaxis, :] 48 | med_freq = np.median(spec, 1)[:, np.newaxis] 49 | med_freq_m = np.tile(med_freq, (1, spec.shape[1])) 50 | med_time_m = np.tile(med_time, (spec.shape[0], 1)) 51 | 52 | # binarize 53 | spec_t = np.logical_and((spec > params.median_mult*med_freq_m), (spec > params.median_mult*med_time_m)) 54 | 55 | # morphological operations 56 | spec_t_morph = morph.binary_closing(spec_t) 57 | spec_t_morph = morph.binary_dilation(spec_t_morph) 58 | spec_t_morph = median_filter(spec_t_morph, (2, 2)) 59 | 60 | # connected component and filter by size 61 | label_im, num_labels = scipy.ndimage.label(spec_t_morph) 62 | sizes = scipy.ndimage.sum(spec_t_morph, label_im, range(num_labels + 1)) 63 | mean_vals = scipy.ndimage.sum(spec, label_im, range(1, num_labels + 1)) 64 | mask_size = sizes < params.min_region_size 65 | remove_pixel = mask_size[label_im] 66 | label_im[remove_pixel] = 0 67 | labels = np.unique(label_im) 68 | label_im = np.searchsorted(labels, label_im) 69 | 70 | # get vertical positions 71 | num_calls = np.unique(label_im).shape[0]-1 # no zero 72 | props = regionprops(label_im) 73 | call_pos = np.zeros(num_calls) 74 | for ii, pp in enumerate(props): 75 | call_pos[ii] = pp['bbox'][1] / float(spec.shape[1]) 76 | 77 | # sort and convert to time as opposed to a ratio 78 | inds = call_pos.argsort() 79 | call_pos = call_pos[inds] * file_duration 80 | 81 | # remove overlapping calls - happens because of harmonics 82 | dis = np.triu(np.abs(call_pos[:, np.newaxis]-call_pos[np.newaxis, :])) 83 | dis = dis > params.min_overlap 84 | mask = np.triu(dis) + np.tril(np.ones([num_calls, num_calls])) 85 | valid_inds = mask.sum(0) == num_calls 86 | pos = call_pos[valid_inds] 87 | 88 | return pos 89 | -------------------------------------------------------------------------------- /bat_train/create_results.py: -------------------------------------------------------------------------------- 1 | import evaluate as evl 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | import os 5 | import spectrogram as sp 6 | from scipy.io import wavfile 7 | import seaborn as sns 8 | sns.set_style('whitegrid') 9 | 10 | 11 | def plot_prec_recall(alg_name, recall, precision, nms_prob=None): 12 | # average precision 13 | ave_prec = evl.calc_average_precision(recall, precision) 14 | print 'average precision (area) = %.3f ' % ave_prec 15 | 16 | # recall at 95% precision 17 | desired_precision = 0.95 18 | if np.where(precision >= desired_precision)[0].shape[0] > 0: 19 | recall_at_precision = recall[np.where(precision >= desired_precision)[0][-1]] 20 | else: 21 | recall_at_precision = 0 22 | 23 | print 'recall at', int(desired_precision*100), '% precision = ', "%.3f" % recall_at_precision 24 | plt.plot([0, 1.02], [desired_precision, desired_precision], 'b:', linewidth=1) 25 | plt.plot([recall_at_precision, recall_at_precision], [0, desired_precision], 'b:', 
linewidth=1) 26 | 27 | # create plot 28 | label_str = alg_name.ljust(8) + "%.3f" % ave_prec + ' ' + str(desired_precision) + ' rec %.3f' % recall_at_precision 29 | if recall.shape[0] == 1: 30 | plt.plot(recall, precision, 'o', label=label_str) 31 | else: 32 | plt.plot(recall, precision, '', label=label_str) 33 | 34 | # find different probability locations on curve 35 | if nms_prob is not None: 36 | conf = np.concatenate(nms_prob)[:, 0] 37 | for p_val in [0.9, 0.7, 0.5]: 38 | p_loc = np.where(np.sort(conf)[::-1] < p_val)[0] 39 | if p_loc.shape[0] > 0: 40 | plt.plot(recall[p_loc[0]], precision[p_loc[0]], 'o', color='#4C72B0') 41 | plt.text(recall[p_loc[0]]-0.05, precision[p_loc[0]]-0.05, str(p_val)) 42 | 43 | plt.ylabel('precision') 44 | plt.xlabel('recall') 45 | plt.axis((0, 1.02, 0, 1.02)) 46 | plt.legend(loc='lower left') 47 | plt.grid(1) 48 | plt.show() 49 | 50 | 51 | def plot_spec(op_file_name, ip_file, gt_pos, nms_pos, nms_prob, y_prediction, params, save_ims): 52 | 53 | # create spec 54 | sampling_rate, audio_samples = wavfile.read(ip_file) 55 | file_duration = audio_samples.shape[0] / float(sampling_rate) 56 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, params.fft_win_length, params.fft_overlap, 57 | crop_spec=params.crop_spec, max_freq=params.max_freq, min_freq=params.min_freq) 58 | 59 | if y_prediction is None: 60 | y_prediction = np.zeros((spectrogram.shape[1])) 61 | 62 | gt_pos_norm = (gt_pos/file_duration)*y_prediction.shape[0] 63 | nms_pos_norm = (nms_pos/file_duration)*y_prediction.shape[0] 64 | 65 | fig = plt.figure(1, figsize=(10, 6)) 66 | ax1 = plt.axes([0.05, 0.7, 0.9, 0.25]) 67 | ax0 = plt.axes([0.05, 0.05, 0.9, 0.60]) 68 | 69 | ax1.plot([0, y_prediction.shape[0]], [0.5, 0.5], 'k--', linewidth=0.5, label='pred') 70 | 71 | # plot gt 72 | for pt in gt_pos_norm: 73 | ax1.plot([pt, pt], [0, 1], 'g', linewidth=4, label='gt') 74 | 75 | # plot nms 76 | for p in range(len(nms_pos_norm)): 77 | ax1.plot([nms_pos_norm[p], nms_pos_norm[p]], [0, nms_prob[p]], 'r', linewidth=2, label='pred') 78 | 79 | ax1.plot(y_prediction) 80 | ax1.set_xlim(0, y_prediction.shape[0]) 81 | ax1.set_ylim(0, 1) 82 | ax1.xaxis.set_ticklabels([]) 83 | 84 | # plot image 85 | ax0.imshow(spectrogram, aspect='auto', cmap='plasma') 86 | ax0.xaxis.set_ticklabels([]) 87 | ax0.yaxis.set_ticklabels([]) 88 | plt.grid() 89 | 90 | if save_ims: 91 | fig.savefig(op_file_name + '.jpg') 92 | 93 | plt.close(1) 94 | -------------------------------------------------------------------------------- /bat_train/data/readme.md: -------------------------------------------------------------------------------- 1 | This directory should contain the following directories: 2 | baselines 3 | models 4 | train_test_split 5 | wav 6 | -------------------------------------------------------------------------------- /bat_train/data_set_params.py: -------------------------------------------------------------------------------- 1 | import time 2 | import numpy as np 3 | 4 | 5 | class DataSetParams: 6 | 7 | def __init__(self): 8 | 9 | # spectrogram generation 10 | self.spectrogram_params() 11 | 12 | # detection 13 | self.detection() 14 | 15 | # data 16 | self.spec_dir = '' 17 | self.audio_dir = '' 18 | 19 | self.save_features_to_disk = False 20 | self.load_features_from_file = False 21 | 22 | # hard negative mining 23 | self.num_hard_negative_mining = 2 # if 0 there won't be any 24 | 25 | # non max suppression - smoothing and window 26 | self.smooth_op_prediction = True # smooth the op parameters before nms 27 | 
self.smooth_op_prediction_sigma = 0.006 / self.time_per_slice 28 | self.nms_win_size = int(np.round(0.12 / self.time_per_slice)) #ie 21 samples at 0.02322 fft win size, 0.75 overlap 29 | 30 | # model 31 | self.classification_model = 'cnn' # rf_vanilla, segment, cnn 32 | 33 | # rf_vanilla params 34 | self.feature_type = 'grad_pool' # raw, grad, grad_pool, raw_pool, hog, max_freq 35 | self.trees = 50 36 | self.depth = 20 37 | self.min_cnt = 2 38 | self.tests = 5000 39 | 40 | # CNN params 41 | self.learn_rate = 0.01 42 | self.moment = 0.9 43 | self.num_epochs = 50 44 | self.batchsize = 256 45 | self.net_type = 'big' # big, small 46 | 47 | # segment params - these were cross validated on validation set 48 | self.median_mult = 5.0 # how much to treshold spectrograms - higher will mean less calls 49 | self.min_region_size = np.round(0.4/self.time_per_slice) # used to determine the thresholding - 65 for fft win 0.02322 50 | self.min_overlap = 0.1 # in secs, anything that overlaps by this much will be counted as 1 call 51 | 52 | # param name string 53 | self.model_identifier = time.strftime("%d_%m_%y_%H_%M_%S_") + self.classification_model + '_hnm_' + str(self.num_hard_negative_mining) 54 | if self.classification_model == 'rf_vanilla': 55 | self.model_identifier += '_feat_' + self.feature_type 56 | elif self.classification_model == 'cnn': 57 | self.model_identifier += '_lr_'+ str(self.learn_rate) + '_mo_'+ str(self.moment) + '_net_'+ self.net_type 58 | elif self.classification_model == 'segment': 59 | self.model_identifier += '_minSize_' + str(self.min_region_size) + '_minOverlap_' + str(self.min_overlap ) 60 | 61 | # misc 62 | self.run_parallel = True 63 | self.num_processes = 10 64 | self.add_extra_calls = True # sample some other positive calls near the GT 65 | self.aug_shift = 0.015 # unit seconds, add extra call either side of GT if augmenting 66 | 67 | def spectrogram_params(self): 68 | 69 | self.valid_file_length = 169345 # some files are longer than they should be 70 | 71 | # spectrogram generation 72 | self.fft_win_length = 0.02322 # ie 1024/44100.0 about 23 msecs. 73 | self.fft_overlap = 0.75 # this is a percent - previously was 768/1024 74 | self.time_per_slice = ((1-self.fft_overlap)*self.fft_win_length) 75 | 76 | self.denoise = True 77 | self.mean_log_mag = 0.5 # sensitive to the spectrogram scaling used 78 | self.smooth_spec = True # gaussian filter 79 | 80 | # throw away unnecessary frequencies, keep from bottom 81 | # TODO this only makes sense as a frequency when you know the sampling rate 82 | # better to think of these as indices 83 | self.crop_spec = True 84 | self.max_freq = 270 85 | self.min_freq = 10 86 | 87 | # if doing 192K files for training 88 | #self.fft_win_length = 0.02667 # i.e. 
512/19200 89 | #self.max_freq = 240 90 | #self.min_freq = 10 91 | 92 | def detection(self): 93 | self.window_size = 0.230 # 230 milliseconds (in time expanded, so 23 ms for not) 94 | # represent window size in terms of the number of time bins 95 | self.window_width = np.rint(self.window_size / ((1-self.fft_overlap)*self.fft_win_length)) 96 | self.detection_overlap = 0.1 # needs to be within x seconds of GT to be considered correct 97 | self.detection_prob = 0.5 # everything under this is considered background - used in HNM 98 | -------------------------------------------------------------------------------- /bat_train/evaluate.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.metrics import roc_curve, auc 3 | 4 | 5 | def compute_error_auc(op_str, gt, pred, prob): 6 | 7 | # classification error 8 | pred_int = (pred > prob).astype(np.int) 9 | class_acc = (pred_int == gt).mean() * 100.0 10 | 11 | # ROC - area under curve 12 | fpr, tpr, thresholds = roc_curve(gt, pred) 13 | roc_auc = auc(fpr, tpr) 14 | 15 | print op_str, ', class acc = %.3f, ROC AUC = %.3f' % (class_acc, roc_auc) 16 | #return class_acc, roc_auc 17 | 18 | 19 | def calc_average_precision(recall, precision): 20 | 21 | precision[np.isnan(precision)] = 0 22 | recall[np.isnan(recall)] = 0 23 | 24 | # pascal'12 way 25 | mprec = np.hstack((0, precision, 0)) 26 | mrec = np.hstack((0, recall, 1)) 27 | for ii in range(mprec.shape[0]-2, -1,-1): 28 | mprec[ii] = np.maximum(mprec[ii], mprec[ii+1]) 29 | inds = np.where(np.not_equal(mrec[1:], mrec[:-1]))[0]+1 30 | ave_prec = ((mrec[inds] - mrec[inds-1])*mprec[inds]).sum() 31 | 32 | return ave_prec 33 | 34 | 35 | def remove_end_preds(nms_pos_o, nms_prob_o, gt_pos_o, durations, win_size): 36 | # this filters out predictions and gt that are close to the end 37 | # this is a bit messy because of the shapes of gt_pos_o 38 | nms_pos = [] 39 | nms_prob = [] 40 | gt_pos = [] 41 | for ii in range(len(nms_pos_o)): 42 | valid_time = durations[ii] - win_size 43 | gt_cur = gt_pos_o[ii] 44 | if gt_cur.shape[0] > 0: 45 | gt_pos.append(gt_cur[:, 0][gt_cur[:, 0] < valid_time][..., np.newaxis]) 46 | else: 47 | gt_pos.append(gt_cur) 48 | 49 | valid_preds = nms_pos_o[ii] < valid_time 50 | nms_pos.append(nms_pos_o[ii][valid_preds]) 51 | nms_prob.append(nms_prob_o[ii][valid_preds, 0][..., np.newaxis]) 52 | return nms_pos, nms_prob, gt_pos 53 | 54 | 55 | def prec_recall_1d(nms_pos_o, nms_prob_o, gt_pos_o, durations, detection_overlap, win_size, remove_eof=True): 56 | """ 57 | nms_pos, nms_prob, and gt_pos are lists of numpy arrays specifying detection 58 | position, detection probability and GT position. 59 | Each list entry is a different file. 60 | Each entry in nms_pos is an array of length num_entries. For nms_prob and 61 | gt_pos its an array of size (num_entries, 1). 62 | 63 | durations is a array of the length of the number of files with each entry 64 | containing that file length in seconds. 65 | detection_overlap determines if a prediction is counted as correct or not. 66 | win_size is used to ignore predictions and ground truth at the end of an 67 | audio file. 68 | 69 | returns 70 | precision: fraction of retrieved instances that are relevant. 71 | recall: fraction of relevant instances that are retrieved. 
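    Both arrays are swept over detection confidence in descending order
    (PASCAL VOC style), so entry i corresponds to keeping only the i+1 most
    confident detections.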
72 | """ 73 | 74 | if remove_eof: 75 | # filter out the detections in both ground truth and predictions that are too 76 | # close to the end of the file - dont count them during eval 77 | nms_pos, nms_prob, gt_pos = remove_end_preds(nms_pos_o, nms_prob_o, gt_pos_o, durations, win_size) 78 | else: 79 | nms_pos = nms_pos_o 80 | nms_prob = nms_prob_o 81 | gt_pos = gt_pos_o 82 | 83 | # loop through each file 84 | true_pos = [] # correctly predicts the ground truth 85 | false_pos = [] # says there is a detection but isn't 86 | for ii in range(len(nms_pos)): 87 | num_preds = nms_pos[ii].shape[0] 88 | 89 | if num_preds > 0: # check to make sure it contains something 90 | num_gt = gt_pos[ii].shape[0] 91 | 92 | # for each set of predictions label them as true positive or false positive (i.e. 1-tp) 93 | tp = np.zeros(num_preds) 94 | distance_to_gt = np.abs(gt_pos[ii].ravel()-nms_pos[ii].ravel()[:, np.newaxis]) 95 | within_overlap = (distance_to_gt <= detection_overlap) 96 | 97 | # remove duplicate detections - assign to valid detection with highest prob 98 | for jj in range(num_gt): 99 | inds = np.where(within_overlap[:, jj])[0] # get the indices of all valid predictions 100 | if inds.shape[0] > 0: 101 | max_prob = np.argmax(nms_prob[ii][inds]) 102 | selected_pred = inds[max_prob] 103 | within_overlap[selected_pred, :] = False 104 | tp[selected_pred] = 1 # set as true positives 105 | true_pos.append(tp) 106 | false_pos.append(1 - tp) 107 | 108 | # calc precision and recall - sort confidence in descending order 109 | # PASCAL style 110 | conf = np.concatenate(nms_prob)[:, 0] 111 | num_gt = np.concatenate(gt_pos).shape[0] 112 | inds = np.argsort(conf)[::-1] 113 | true_pos_cat = np.concatenate(true_pos)[inds].astype(float) 114 | false_pos_cat = np.concatenate(false_pos)[inds].astype(float) # i.e. 1-true_pos_cat 115 | 116 | if (conf == conf[0]).sum() == conf.shape[0]: 117 | # all the probability values are the same therefore we will not sweep 118 | # the curve and instead will return a single value 119 | true_pos_sum = true_pos_cat.sum() 120 | false_pos_sum = false_pos_cat.sum() 121 | 122 | recall = np.asarray([true_pos_sum / float(num_gt)]) 123 | precision = np.asarray([(true_pos_sum / (false_pos_sum + true_pos_sum))]) 124 | 125 | elif inds.shape[0] > 0: 126 | # otherwise produce a list of values 127 | true_pos_cum = np.cumsum(true_pos_cat) 128 | false_pos_cum = np.cumsum(false_pos_cat) 129 | 130 | recall = true_pos_cum / float(num_gt) 131 | precision = (true_pos_cum / (false_pos_cum + true_pos_cum)) 132 | 133 | return precision, recall 134 | -------------------------------------------------------------------------------- /bat_train/export_detector_weights.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script outputs the weights of a trained model so the standalone detector copde 3 | can use it. 
4 | """ 5 | 6 | import cPickle as pickle 7 | from lasagne.layers.helper import get_all_param_values, get_output_shape, set_all_param_values 8 | import numpy as np 9 | import lasagne 10 | import theano 11 | import sys 12 | import cPickle as pickle 13 | import json 14 | 15 | save_detector = False 16 | 17 | print 'saving detector' 18 | model_dir = 'results/models/' 19 | model_file = model_dir + 'test_set_norfolk.mod' 20 | print model_file 21 | 22 | mod = pickle.load(open(model_file)) 23 | 24 | weights = get_all_param_values(mod.model.network['prob']) 25 | np.save(model_file[:-4], weights) 26 | print 'weights shape', len(weights) 27 | 28 | # save detection params 29 | mod_params = {'win_size':0, 'chunk_size':0, 'max_freq':0, 'min_freq':0, 30 | 'mean_log_mag':0, 'slice_scale':0, 'overlap':0, 31 | 'crop_spec':False, 'denoise':False, 'smooth_spec':False, 32 | 'nms_win_size':0, 'smooth_op_prediction_sigma':0} 33 | 34 | mod_params['win_size'] = mod.model.params.window_size 35 | mod_params['max_freq'] = mod.model.params.max_freq 36 | mod_params['min_freq'] = mod.model.params.min_freq 37 | mod_params['mean_log_mag'] = mod.model.params.mean_log_mag 38 | mod_params['slice_scale'] = mod.model.params.fft_win_length 39 | mod_params['overlap'] = mod.model.params.fft_overlap 40 | 41 | mod_params['crop_spec'] = mod.model.params.crop_spec 42 | mod_params['denoise'] = mod.model.params.denoise 43 | mod_params['smooth_spec'] = mod.model.params.smooth_spec 44 | 45 | mod_params['nms_win_size'] = int(mod.model.params.nms_win_size) 46 | mod_params['smooth_op_prediction_sigma'] = mod.model.params.smooth_op_prediction_sigma 47 | 48 | params_file = model_file[:-4] + '_params.p' 49 | with open(params_file, 'w') as fp: 50 | json.dump(mod_params, fp) 51 | -------------------------------------------------------------------------------- /bat_train/grad_features.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skimage.util import view_as_blocks, view_as_windows 3 | 4 | 5 | def compute_hog(arr, block_size=2, block_sum=True, num_orientations=6, block_normalize=False): 6 | """ 7 | Computes histogram of gradient feature for Random Forest. 
8 | """ 9 | 10 | # make sure the input is evenly divisible by the block size 11 | if block_sum: 12 | vert_diff = arr.shape[0]%int(block_size) 13 | horz_diff = arr.shape[1]%int(block_size) 14 | 15 | if vert_diff > 0: 16 | arr = np.vstack((arr, np.tile(arr[-1, :], ((vert_diff, 1))))) 17 | if horz_diff > 0: 18 | arr = np.hstack((arr, np.tile(arr[:, -1], ((horz_diff, 1))).T)) 19 | 20 | # compute gradient magnitude and orientation 21 | mag, orien = gradient_mag(arr) 22 | 23 | # quantize orientations 24 | bins = np.arange(0, np.pi+np.pi/num_orientations, np.pi/num_orientations) 25 | orien_quantized = np.argmin(np.abs(orien[:, :, np.newaxis] - bins[np.newaxis, :]), axis=2) 26 | orien_quantized[orien_quantized == num_orientations] = 0 27 | 28 | # create histogram 29 | hist_of_grads = np.zeros((mag.shape[0], mag.shape[1], num_orientations)) 30 | j, k = np.indices(mag.shape[:2]) 31 | hist_of_grads[j, k, orien_quantized] = mag 32 | 33 | # add mag as extra channel 34 | hist_of_grads = np.dstack((hist_of_grads, mag)) 35 | 36 | # sum over a block - note this is non-overlapping 37 | # note that we are assuming that hist_of_grads is evenly divisible by block_size 38 | if block_sum: 39 | blocks = view_as_blocks(hist_of_grads, (block_size, block_size, hist_of_grads.shape[2])) 40 | hist_of_grads = blocks.reshape(blocks.shape[0], blocks.shape[1], blocks.shape[2]*blocks.shape[3]*blocks.shape[4], blocks.shape[5]).sum(2) 41 | 42 | # L1 normalization 43 | if block_normalize: 44 | hist_of_grads = hist_of_grads / (hist_of_grads.sum(2) + 10e-6)[:, :, np.newaxis] 45 | 46 | return hist_of_grads 47 | 48 | 49 | def gradient_mag(arr): 50 | """ 51 | Computes gradient magnitude and orientation. 52 | """ 53 | gx = np.empty(arr.shape, dtype=np.double) 54 | gx[:, 0] = 0 55 | gx[:, -1] = 0 56 | gx[:, 1:-1] = arr[:, 2:] - arr[:, :-2] 57 | gy = np.empty(arr.shape, dtype=np.double) 58 | gy[0, :] = 0 59 | gy[-1, :] = 0 60 | gy[1:-1, :] = arr[2:, :] - arr[:-2, :] 61 | 62 | mag = np.sqrt((gx**2 + gy**2)) 63 | orien = np.arctan2(gx, gy) 64 | orien[orien < 0] += np.pi 65 | 66 | return mag, orien 67 | -------------------------------------------------------------------------------- /bat_train/nms.pyx: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | cimport numpy as np 3 | cimport cython 4 | 5 | cdef inline int int_min(int a, int b): return a if a < b else b 6 | 7 | @cython.boundscheck(False) 8 | def nms_1d(np.ndarray src, int win_size, float file_duration): 9 | """1D Non maximum suppression 10 | src: vector of length N 11 | """ 12 | 13 | cdef int src_cnt = 0 14 | cdef int max_ind = 0 15 | cdef int ii = 0 16 | cdef int ee = 0 17 | cdef int width = src.shape[0]-1 18 | cdef np.ndarray pos = np.empty(width, dtype=np.int) 19 | cdef int pos_cnt = 0 20 | while ii <= width: 21 | 22 | if max_ind < (ii - win_size): 23 | max_ind = ii - win_size 24 | 25 | ee = int_min(ii + win_size, width) 26 | 27 | while max_ind <= ee: 28 | src_cnt += 1 29 | if src[max_ind] > src[ii]: 30 | break 31 | max_ind += 1 32 | 33 | if max_ind > ee: 34 | pos[pos_cnt] = ii 35 | pos_cnt += 1 36 | max_ind = ii+1 37 | ii += win_size 38 | 39 | ii += 1 40 | 41 | pos = pos[:pos_cnt] 42 | val = src[pos] 43 | 44 | # remove peaks near the end 45 | inds = (pos + win_size) < src.shape[0] 46 | pos = pos[inds] 47 | val = val[inds] 48 | 49 | # set output to between 0 and 1, then put it in the correct time range 50 | pos = pos.astype(np.float) / src.shape[0] 51 | pos = pos*file_duration 52 | 53 | return pos, val[..., np.newaxis] 
54 | -------------------------------------------------------------------------------- /bat_train/nms_slow.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def nms_1d(src, win_size, file_duration): 4 | """1D Non maximum suppression 5 | src: vector of length N 6 | """ 7 | 8 | pos = [] 9 | src_cnt = 0 10 | max_ind = 0 11 | ii = 0 12 | ee = 0 13 | width = src.shape[0]-1 14 | while ii <= width: 15 | 16 | if max_ind < (ii - win_size): 17 | max_ind = ii - win_size 18 | 19 | ee = np.minimum(ii + win_size, width) 20 | 21 | while max_ind <= ee: 22 | src_cnt += 1 23 | if src[int(max_ind)] > src[int(ii)]: 24 | break 25 | max_ind += 1 26 | 27 | if max_ind > ee: 28 | pos.append(ii) 29 | max_ind = ii+1 30 | ii += win_size 31 | 32 | ii += 1 33 | 34 | pos = np.asarray(pos).astype(np.int) 35 | val = src[pos] 36 | 37 | # remove peaks near the end 38 | inds = (pos + win_size) < src.shape[0] 39 | pos = pos[inds] 40 | val = val[inds] 41 | 42 | # set output to between 0 and 1, then put it in the correct time range 43 | pos = pos / float(src.shape[0]) 44 | pos = pos*file_duration 45 | 46 | return pos, val[..., np.newaxis] 47 | 48 | 49 | def test_nms(): 50 | import matplotlib.pyplot as plt 51 | import numpy as np 52 | import pyximport; pyximport.install(reload_support=True) 53 | import nms as nms_fast 54 | 55 | y = np.sin(np.arange(1000)/100.0*np.pi) 56 | y = y + np.random.random(y.shape)*0.5 57 | win_size = int(0.1*y.shape[0]/2.0) 58 | 59 | pos, prob = nms_1d(y, win_size, y.shape[0]) 60 | pos_f, prob_f = nms_fast.nms_1d(y, win_size, y.shape[0]) 61 | 62 | print 'diff between implementations (prob) =', 1-np.isclose(prob_f, prob).mean() 63 | print 'diff between implementations (pos) =', 1-np.isclose(pos_f, pos).mean() 64 | 65 | plt.close('all') 66 | plt.plot(y) 67 | plt.plot((pos).astype('int'), prob, 'ro', ms=10) 68 | plt.plot((pos_f).astype('int'), prob_f, 'bo') # fast results overlaid on the slow ones 69 | plt.show() 70 | -------------------------------------------------------------------------------- /bat_train/random_forest.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from joblib import Parallel, delayed 3 | import weave # not in scipy any more - needs to be installed separately 4 | 5 | class ForestParams: 6 | def __init__(self, num_classes, trees=50, depth=20, min_cnt=2, tests=5000): 7 | self.num_tests = tests 8 | self.min_sample_cnt = min_cnt 9 | self.max_depth = depth 10 | self.num_trees = trees 11 | self.bag_size = 0.8 12 | self.train_parallel = True 13 | self.num_classes = num_classes # assumes that the classes are ordered from 0 to C 14 | 15 | 16 | class Node: 17 | 18 | def __init__(self, node_id, node_cnt, exs_at_node, impurity, probability): 19 | self.node_id = node_id # absolute node id 20 | self.node_cnt = node_cnt # id not including nodes that didn't get made 21 | self.exs_at_node = exs_at_node 22 | self.impurity = impurity 23 | self.num_exs = float(exs_at_node.shape[0]) 24 | self.is_leaf = True 25 | self.info_gain = 0.0 26 | 27 | # output 28 | self.probability = probability.copy() 29 | self.class_id = probability.argmax() 30 | 31 | # node test 32 | self.test_ind1 = 0 33 | self.test_thresh = 0.0 34 | 35 | def update_node(self, test_ind1, test_thresh, info_gain): 36 | self.test_ind1 = test_ind1 37 | self.test_thresh = test_thresh 38 | self.info_gain = info_gain 39 | self.is_leaf = False 40 | 41 | def create_child(self, test_res, impurity, prob, child_type, node_cnt): 42 | # save absolute
location in dataset 43 | inds_local = np.where(test_res)[0] 44 | inds = self.exs_at_node[inds_local] 45 | 46 | if child_type == 'left': 47 | self.left_node = Node(2*self.node_id+1, node_cnt, inds, impurity, prob) 48 | elif child_type == 'right': 49 | self.right_node = Node(2*self.node_id+2, node_cnt, inds, impurity, prob) 50 | 51 | def test(self, X): 52 | return X[self.test_ind1] < self.test_thresh 53 | 54 | def get_compact_node(self): 55 | # used for fast forest 56 | if not self.is_leaf: 57 | node_array = np.zeros(4) 58 | # dims 0 and 1 are reserved for indexing children 59 | node_array[2] = self.test_ind1 60 | node_array[3] = self.test_thresh 61 | else: 62 | node_array = np.zeros(2+self.probability.shape[0]) 63 | node_array[0] = -1 # indicates that its a leaf 64 | node_array[1] = self.node_cnt # the id of the node 65 | node_array[2:] = self.probability.copy() 66 | return node_array 67 | 68 | 69 | class Tree: 70 | 71 | def __init__(self, tree_id, tree_params): 72 | self.tree_id = tree_id 73 | self.tree_params = tree_params 74 | self.num_nodes = 0 75 | self.compact_tree = None # used for fast testing forest and small memory footprint 76 | 77 | def build_tree(self, X, Y, node): 78 | if (node.node_id < ((2.0**self.tree_params.max_depth)-1)) and (node.impurity > 0.0) \ 79 | and (self.optimize_node(np.take(X, node.exs_at_node, 0), np.take(Y, node.exs_at_node), node)): 80 | self.num_nodes += 2 81 | self.build_tree(X, Y, node.left_node) 82 | self.build_tree(X, Y, node.right_node) 83 | 84 | def train(self, X, Y): 85 | 86 | # bagging 87 | exs_at_node = np.random.choice(Y.shape[0], int(Y.shape[0]*self.tree_params.bag_size), replace=False) 88 | exs_at_node.sort() 89 | 90 | # compute impurity 91 | prob, impurity = self.calc_impurity(np.take(Y, exs_at_node), np.ones((exs_at_node.shape[0], 1), dtype='bool')) 92 | 93 | # create root 94 | self.root = Node(0, 0, exs_at_node, impurity, prob[:, 0]) 95 | self.num_nodes = 1 96 | 97 | # build tree 98 | self.build_tree(X, Y, self.root) 99 | 100 | # make compact version for fast testing 101 | self.compact_tree, _ = self.traverse_tree(self.root, np.zeros(0)) 102 | 103 | def traverse_tree(self, node, compact_tree_in): 104 | node_loc = compact_tree_in.shape[0] 105 | compact_tree = np.hstack((compact_tree_in, node.get_compact_node())) 106 | 107 | # this assumes that the index for the left and right child nodes are the first two 108 | if not node.is_leaf: 109 | compact_tree, compact_tree[node_loc] = self.traverse_tree(node.left_node, compact_tree) 110 | compact_tree, compact_tree[node_loc+1] = self.traverse_tree(node.right_node, compact_tree) 111 | 112 | return compact_tree, node_loc 113 | 114 | def test(self, X): 115 | op = np.zeros((X.shape[0], self.tree_params.num_classes)) 116 | 117 | # single dim test 118 | for ex_id in range(X.shape[0]): 119 | node = self.root 120 | while not node.is_leaf: 121 | if X[ex_id, node.test_ind1] < node.test_thresh: 122 | node = node.right_node 123 | else: 124 | node = node.left_node 125 | op[ex_id, :] = node.probability 126 | return op 127 | 128 | def test_fast(self, X): 129 | op = np.zeros((X.shape[0], self.tree_params.num_classes)) 130 | tree = self.compact_tree # work around 131 | 132 | #in memory: for non leaf node - 0 is lchild index, 1 is rchild, 2 is dim to test, 3 is threshold 133 | #in memory: for leaf node - 0 is leaf indicator -1, 1 is the node id, the rest is the probability for each class 134 | code = """ 135 | int ex_id, node_loc, c_it; 136 | for (ex_id=0; ex_id= self.tree_params.min_sample_cnt) & (num_exs_r >= 
self.tree_params.min_sample_cnt) 228 | 229 | successful_split = False 230 | if valid_inds.sum() > 0: 231 | # child node impurity 232 | prob_l, impurity_l = self.calc_impurity(y_local, ~test_res) 233 | prob_r, impurity_r = self.calc_impurity(y_local, test_res) 234 | 235 | # information gain - want the minimum 236 | num_exs_l_norm = num_exs_l/node.num_exs 237 | num_exs_r_norm = num_exs_r/node.num_exs 238 | #info_gain = - node.impurity + (num_exs_r_norm*impurity_r) + (num_exs_l_norm*impurity_l) 239 | info_gain = (num_exs_r_norm*impurity_r) + (num_exs_l_norm*impurity_l) 240 | 241 | # make sure we can only select from valid splits 242 | info_gain[~valid_inds] = info_gain.max() + 10e-10 # plus small constant 243 | best_split = info_gain.argmin() 244 | 245 | # create new child nodes and update current node 246 | node.update_node(test_inds1[best_split], test_thresh[best_split], info_gain[best_split]) 247 | node.create_child(~test_res[:, best_split], impurity_l[best_split], prob_l[:, best_split], 'left', self.num_nodes+1) 248 | node.create_child(test_res[:, best_split], impurity_r[best_split], prob_r[:, best_split], 'right', self.num_nodes+2) 249 | 250 | successful_split = True 251 | 252 | return successful_split 253 | 254 | 255 | ## Helper used to train the trees in parallel 256 | def train_forest_helper(t_id, X, Y, params, seed): 257 | #print 'tree', t_id 258 | np.random.seed(seed) 259 | tree = Tree(t_id, params) 260 | tree.train(X, Y) 261 | return tree 262 | 263 | 264 | class Forest: 265 | 266 | def __init__(self, params): 267 | self.params = params 268 | self.trees = [] 269 | 270 | def train(self, X, Y, delete_old_trees): 271 | if delete_old_trees: 272 | self.trees = [] 273 | 274 | if self.params.train_parallel: 275 | # need to seed the random number generator for each process 276 | seeds = np.random.random_integers(0, 10e8, self.params.num_trees) 277 | self.trees.extend(Parallel(n_jobs=-1)(delayed(train_forest_helper)(t_id, X, Y, self.params, seeds[t_id]) 278 | for t_id in range(self.params.num_trees))) 279 | else: 280 | #print 'Standard training' 281 | for t_id in range(self.params.num_trees): 282 | print 'tree', t_id 283 | tree = Tree(t_id, self.params) 284 | tree.train(X, Y) 285 | self.trees.append(tree) 286 | 287 | def test(self, X): 288 | op = np.zeros((X.shape[0], self.params.num_classes)) 289 | for tt, tree in enumerate(self.trees): 290 | op_local = tree.test_fast(X) 291 | op += op_local 292 | op /= float(len(self.trees)) 293 | return op 294 | 295 | def get_leaf_ids(self, X): 296 | op = np.zeros((X.shape[0], len(self.trees)), dtype=np.int64) 297 | for tt, tree in enumerate(self.trees): 298 | op[:, tt] = tree.get_leaf_ids(X) 299 | return op 300 | 301 | def delete_trees(self): 302 | del self.trees[:] 303 | -------------------------------------------------------------------------------- /bat_train/readme.md: -------------------------------------------------------------------------------- 1 | # Training Code 2 | 3 | 4 | ### Training 5 | 6 | ##### 1 Download Data 7 | Download the data from [here](http://visual.cs.ucl.ac.uk/pubs/batDetective). It contains: 8 | *baselines*: Results for the three different commercial packages we compared against. 9 | *models*: Pre-trained CNN models. 10 | *train_test_split*: the lists of training and test files and the times of the bat calls in each file. The training data comes from Bat Detective and the test sets have been manually verified (see the snippet below for how to inspect these files). 11 | *wav*: 4,246 time expanded .wav files from the iBats project.
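Each *train_test_split* file is a compressed numpy archive. A minimal sketch for inspecting one (file name and keys as loaded in *run_comparison.py*):

```
import numpy as np

d = np.load('data/train_test_split/test_set_bulgaria.npz')
print d.keys()  # train_pos, train_files, train_durations, test_pos, test_files, test_durations
print d['test_files'].shape[0], 'test files'
```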
12 | 13 | ##### 2 Run Training and Evaluation Code 14 | Running *run_comparison.py* recreates the results in the paper (up to random initialization). It trains CNN, Random Forest, and simple segmentation-based models and compares their performance to three commercial systems. 15 | 16 | 17 | ### Run Detector on Your Own Data 18 | Running *run_detector.py* loads a pre-trained CNN and performs detection on a directory of audio files. Make sure *data_dir* points to the directory containing your audio files and that you have a trained model on your computer - you can get one by training your own or by downloading a pre-trained one (details in the previous steps). If your data is already time expanded, set *do_time_expansion=False*. 19 | Note that `../bat_eval` also contains separate CPU based evaluation code for CNN_FAST. 20 | 21 | 22 | ### Requirements 23 | The full comparison takes about 1.5 hours to run on a desktop with an i7-6850K CPU, 32GB RAM, and a GTX 1080 on Ubuntu 16.04. You might get some warnings the first time the code is run. The code has been tested with the following package versions from Conda: 24 | `Python 2.7.12` 25 | `cython 0.24.1` 26 | `joblib 0.9.4` 27 | `lasagne 0.2.dev1` 28 | `libgcc 7.2.0` 29 | `matplotlib 2.0.2` 30 | `numpy 1.12.1` 31 | `pandas 0.19.2` 32 | `scipy 0.19.0` 33 | `scikit-image 0.13.0` 34 | `scikit-learn 0.19.0` 35 | `seaborn 0.8` 36 | `weave 0.16.0` 37 | 38 | 39 | ### Acknowledgements 40 | We are enormously grateful for the efforts and enthusiasm of the amazing iBats and Bat Detective volunteers. We would also like to thank Ian Agranat and Joe Szewczak for useful discussions and access to their systems. Finally, we would like to thank [Zooniverse](https://www.zooniverse.org/) for setting up and hosting the Bat Detective project. 41 | 42 | ### License 43 | Code, audio data, and annotations are available for research purposes only, i.e. non-commercial use. For any other use of the software or data please contact the authors. 44 | -------------------------------------------------------------------------------- /bat_train/run_comparison.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | import os 4 | import evaluate as evl 5 | import create_results as res 6 | from data_set_params import DataSetParams 7 | import classifier as clss 8 | import pandas as pd 9 | import cPickle as pickle 10 | 11 | 12 | def read_baseline_res(baseline_file_name, test_files): 13 | da = pd.read_csv(baseline_file_name) 14 | pos = [] 15 | prob = [] 16 | for ff in test_files: 17 | rr = da[da['Filename'] == ff] 18 | inds = np.argsort(rr.TimeInFile.values) 19 | pos.append(rr.TimeInFile.values[inds]) 20 | prob.append(rr.Quality.values[inds][..., np.newaxis]) 21 | return pos, prob 22 | 23 | 24 | if __name__ == '__main__': 25 | """ 26 | This script compares several different algorithms for bat echolocation detection. 27 | 28 | The results can vary by a few percent from run to run. If you don't want to 29 | run a specific model or baseline, comment it out.
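Precision/recall curves for all models are drawn into a single figure, which is saved at the end of the run as results/<test_set>_results.png and .pdf.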
30 | """ 31 | 32 | test_set = 'bulgaria' # can be one of: bulgaria, uk, norfolk 33 | data_set = 'data/train_test_split/test_set_' + test_set + '.npz' 34 | raw_audio_dir = 'data/wav/' 35 | base_line_dir = 'data/baselines/' 36 | result_dir = 'results/' 37 | model_dir = 'data/models/' 38 | if not os.path.isdir(result_dir): 39 | os.mkdir(result_dir) 40 | if not os.path.isdir(model_dir): 41 | os.mkdir(model_dir) 42 | print 'test set:', test_set 43 | plt.close('all') 44 | 45 | # train and test_pos are in units of seconds 46 | loaded_data_tr = np.load(data_set) 47 | train_pos = loaded_data_tr['train_pos'] 48 | train_files = loaded_data_tr['train_files'] 49 | train_durations = loaded_data_tr['train_durations'] 50 | test_pos = loaded_data_tr['test_pos'] 51 | test_files = loaded_data_tr['test_files'] 52 | test_durations = loaded_data_tr['test_durations'] 53 | 54 | # load parameters 55 | params = DataSetParams() 56 | params.audio_dir = raw_audio_dir 57 | 58 | # 59 | # CNN 60 | print '\ncnn' 61 | params.classification_model = 'cnn' 62 | model = clss.Classifier(params) 63 | # train and test 64 | model.train(train_files, train_pos, train_durations) 65 | nms_pos, nms_prob = model.test_batch(test_files, test_pos, test_durations, False, '') 66 | # compute precision recall 67 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, model.params.detection_overlap, model.params.window_size) 68 | res.plot_prec_recall('cnn', recall, precision, nms_prob) 69 | # save CNN model to file 70 | pickle.dump(model, open(model_dir + 'test_set_' + test_set + '.mod', 'wb')) 71 | 72 | # 73 | # random forest 74 | print '\nrandom forest' 75 | params.classification_model = 'rf_vanilla' 76 | model = clss.Classifier(params) 77 | # train and test 78 | model.train(train_files, train_pos, train_durations) 79 | nms_pos, nms_prob = model.test_batch(test_files, test_pos, test_durations, False, '') 80 | # compute precision recall 81 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, model.params.detection_overlap, model.params.window_size) 82 | res.plot_prec_recall('rf', recall, precision, nms_prob) 83 | 84 | # 85 | # segment 86 | print '\nsegment' 87 | params.classification_model = 'segment' 88 | model = clss.Classifier(params) 89 | # train and test 90 | model.train(train_files, train_pos, train_durations) 91 | nms_pos, nms_prob = model.test_batch(test_files, test_pos, test_durations, False, '') 92 | # compute precision recall 93 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, model.params.detection_overlap, model.params.window_size) 94 | res.plot_prec_recall('segment', recall, precision, nms_prob) 95 | 96 | # 97 | # scanr 98 | scanr_bat_results = base_line_dir + 'scanr/test_set_'+ test_set +'_scanr.csv' 99 | if os.path.isfile(scanr_bat_results): 100 | print '\nscanr' 101 | scanr_pos, scanr_prob = read_baseline_res(scanr_bat_results, test_files) 102 | precision_scanr, recall_scanr = evl.prec_recall_1d(scanr_pos, scanr_prob, test_pos, test_durations, params.detection_overlap, params.window_size) 103 | res.plot_prec_recall('scanr', recall_scanr, precision_scanr) 104 | 105 | # 106 | # sonobat 107 | sono_bat_results = base_line_dir + 'sonobat/test_set_'+ test_set +'_sono.csv' 108 | if os.path.isfile(sono_bat_results): 109 | print '\nsonobat' 110 | sono_pos, sono_prob = read_baseline_res(sono_bat_results, test_files) 111 | precision_sono, recall_sono = evl.prec_recall_1d(sono_pos, sono_prob, test_pos, test_durations, 
params.detection_overlap, params.window_size) 112 | res.plot_prec_recall('sonobat', recall_sono, precision_sono) 113 | 114 | # 115 | # kaleidoscope 116 | kal_bat_results = base_line_dir + 'kaleidoscope/test_set_'+ test_set +'_kaleidoscope.csv' 117 | if os.path.isfile(kal_bat_results): 118 | print '\nkaleidoscope' 119 | kal_pos, kal_prob = read_baseline_res(kal_bat_results, test_files) 120 | precision_kal, recall_kal = evl.prec_recall_1d(kal_pos, kal_prob, test_pos, test_durations, params.detection_overlap, params.window_size) 121 | res.plot_prec_recall('kaleidoscope', recall_kal, precision_kal) 122 | 123 | # save results 124 | plt.savefig(result_dir + test_set + '_results.png') 125 | plt.savefig(result_dir + test_set + '_results.pdf') 126 | -------------------------------------------------------------------------------- /bat_train/run_detector.py: -------------------------------------------------------------------------------- 1 | from scipy.io import wavfile 2 | import numpy as np 3 | import cPickle as pickle 4 | import os 5 | import glob 6 | import time 7 | import write_op as wo 8 | import sys 9 | 10 | 11 | def read_audio(file_name, do_time_expansion, chunk_size, win_size): 12 | 13 | # try to read in audio file 14 | try: 15 | samp_rate_orig, audio = wavfile.read(file_name) 16 | except: 17 | print ' Error reading file' 18 | return True, None, None, None, None 19 | 20 | # convert to mono if stereo 21 | if len(audio.shape) == 2: 22 | print ' Warning: stereo file. Just taking right channel.' 23 | audio = audio[:, 1] 24 | file_dur = audio.shape[0] / float(samp_rate_orig) 25 | print ' dur', round(file_dur,3), '(secs) , fs', samp_rate_orig 26 | 27 | # original model is trained on time expanded data 28 | samp_rate = samp_rate_orig 29 | if do_time_expansion: 30 | samp_rate = int(samp_rate_orig/10.0) 31 | file_dur *= 10 32 | 33 | # pad with zeros so we can go right to the end 34 | multiplier = np.ceil(file_dur/float(chunk_size-win_size)) 35 | diff = multiplier*(chunk_size-win_size) - file_dur + win_size 36 | audio_pad = np.hstack((audio, np.zeros(int(diff*samp_rate)))) 37 | 38 | return False, audio_pad, file_dur, samp_rate, samp_rate_orig 39 | 40 | 41 | def run_detector(det, audio, file_dur, samp_rate, detection_thresh): 42 | 43 | det_time = [] 44 | det_prob = [] 45 | 46 | # files can be long so we split each up into separate (overlapping) chunks 47 | st_positions = np.arange(0, file_dur, det.chunk_size-det.params.window_size) 48 | for chunk_id, st_position in enumerate(st_positions): 49 | 50 | # take a chunk of the audio 51 | # should already be zero padded at the end so it's the correct size 52 | st_pos = int(st_position*samp_rate) 53 | en_pos = int(st_pos + det.chunk_size*samp_rate) 54 | audio_chunk = audio[st_pos:en_pos] 55 | 56 | # make predictions 57 | pos, prob, y_prediction = det.test_single(audio_chunk, samp_rate) 58 | prob = prob[:, 0] 59 | 60 | # remove predictions near the end (if not last chunk) and ones that are 61 | # below the detection threshold 62 | if chunk_id == (len(st_positions)-1): 63 | inds = (prob >= detection_thresh) 64 | else: 65 | inds = (prob >= detection_thresh) & (pos < (det.chunk_size-(det.params.window_size/2.0))) 66 | 67 | # convert detection time back into global time and save valid detections 68 | if pos.shape[0] > 0: 69 | det_time.append(pos[inds] + st_position) 70 | det_prob.append(prob[inds]) 71 | 72 | if len(det_time) > 0: 73 | det_time = np.hstack(det_time) 74 | det_prob = np.hstack(det_prob) 75 | 76 | # undo the effects of time expansion (note: do_time_expansion is a global set in __main__) 77 | if
do_time_expansion: 78 | det_time /= 10.0 79 | 80 | return det_time, det_prob 81 | 82 | 83 | if __name__ == "__main__": 84 | """ 85 | This code takes a directory of audio files and runs a CNN based bat call 86 | detector. It returns the time in file of the detection and the probability 87 | that the detection is a bat call. 88 | """ 89 | 90 | # params 91 | detection_thresh = 0.80 # make this smaller if you want more calls detected 92 | do_time_expansion = True # set to True if audio is not already time expanded 93 | save_res = True 94 | 95 | # load data - 96 | data_dir = 'path_to_data/' # path of the data that we run the model on 97 | op_ann_dir = 'results/' # where we will store the outputs 98 | op_file_name_total = op_ann_dir + 'op_file.csv' 99 | if not os.path.isdir(op_ann_dir): 100 | os.makedirs(op_ann_dir) 101 | 102 | # load gpu lasagne model 103 | model_dir = 'data/models/' 104 | model_file = model_dir + 'test_set_bulgaria.mod' 105 | det = pickle.load(open(model_file)) 106 | det.chunk_size = 4.0 107 | 108 | # read audio files 109 | audio_files = glob.glob(data_dir + '*.wav') 110 | 111 | # loop through audio files 112 | results = [] 113 | for file_cnt, file_name in enumerate(audio_files): 114 | 115 | file_name_root = file_name[len(data_dir):] 116 | print '\n', file_cnt+1, 'of', len(audio_files), '\t', file_name_root 117 | 118 | # read audio file - skip file if cannot read 119 | read_fail, audio, file_dur, samp_rate, samp_rate_orig = read_audio(file_name, 120 | do_time_expansion, det.chunk_size, det.params.window_size) 121 | if read_fail: 122 | continue 123 | 124 | # run detector 125 | tic = time.time() 126 | det_time, det_prob = run_detector(det, audio, file_dur, samp_rate, 127 | detection_thresh) 128 | toc = time.time() 129 | 130 | print ' detection time', round(toc-tic, 3), '(secs)' 131 | num_calls = len(det_time) 132 | print ' ' + str(num_calls) + ' calls found' 133 | 134 | # save results 135 | if save_res: 136 | # return detector results 137 | pred_classes = np.zeros((len(det_time), 1), dtype=np.int) 138 | pred_prob = np.asarray(det_prob)[..., np.newaxis] 139 | 140 | # save to AudioTagger format 141 | op_file_name = op_ann_dir + file_name_root[:-4] + '-sceneRect.csv' 142 | wo.create_audio_tagger_op(file_name_root, op_file_name, det_time, 143 | det_prob, pred_classes[:,0], pred_prob[:,0], 144 | samp_rate_orig, np.asarray(['bat'])) 145 | 146 | # save as dictionary 147 | if num_calls > 0: 148 | res = {'filename':file_name_root, 'time':det_time, 149 | 'prob':det_prob, 'pred_classes':pred_classes, 150 | 'pred_prob':pred_prob} 151 | results.append(res) 152 | 153 | # save to large csv 154 | if save_res and (len(results) > 0): 155 | print '\nsaving results to', op_file_name_total 156 | wo.save_to_txt(op_file_name_total, results, np.asarray(['bat'])) 157 | else: 158 | print 'no detections to save' 159 | -------------------------------------------------------------------------------- /bat_train/spectrogram.py: -------------------------------------------------------------------------------- 1 | from skimage import filters 2 | import numpy as np 3 | 4 | 5 | def denoise(spec_noisy, mask=None): 6 | """ 7 | Perform denoising, subtract mean from each frequency band. 8 | Mask chooses the relevant time steps to use. 
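When a mask is given, the masked and unmasked time steps are mean normalised separately, and the result is clipped at zero.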
9 | """ 10 | 11 | if mask is None: 12 | # no mask 13 | me = np.mean(spec_noisy, 1) 14 | spec_denoise = spec_noisy - me[:, np.newaxis] 15 | 16 | else: 17 | # user defined mask 18 | mask_inv = np.invert(mask) 19 | spec_denoise = spec_noisy.copy() 20 | 21 | if np.sum(mask) > 0: 22 | me = np.mean(spec_denoise[:, mask], 1) 23 | spec_denoise[:, mask] = spec_denoise[:, mask] - me[:, np.newaxis] 24 | 25 | if np.sum(mask_inv) > 0: 26 | me_inv = np.mean(spec_denoise[:, mask_inv], 1) 27 | spec_denoise[:, mask_inv] = spec_denoise[:, mask_inv] - me_inv[:, np.newaxis] 28 | 29 | # remove anything below 0 30 | spec_denoise.clip(min=0, out=spec_denoise) 31 | 32 | return spec_denoise 33 | 34 | 35 | def gen_mag_spectrogram_fft(x, nfft, noverlap): 36 | """ 37 | Compute magnitude spectrogram by specifying num bins. 38 | """ 39 | 40 | # window data 41 | step = nfft - noverlap 42 | shape = (nfft, (x.shape[-1]-noverlap)//step) 43 | strides = (x.strides[0], step*x.strides[0]) 44 | x_wins = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides) 45 | 46 | # apply window 47 | x_wins_han = np.hanning(x_wins.shape[0])[..., np.newaxis] * x_wins 48 | 49 | # do fft 50 | complex_spec = np.fft.rfft(x_wins_han, n=nfft, axis=0) 51 | 52 | # calculate magnitude 53 | mag_spec = np.conjugate(complex_spec) * complex_spec 54 | mag_spec = mag_spec.real 55 | # same as: 56 | #mag_spec = np.square(np.absolute(complex_spec)) 57 | 58 | # orient correctly and remove dc component 59 | mag_spec = mag_spec[1:, :] 60 | mag_spec = np.flipud(mag_spec) 61 | 62 | return mag_spec 63 | 64 | 65 | def gen_mag_spectrogram(x, fs, ms, overlap_perc): 66 | """ 67 | Computes magnitude spectrogram by specifying time. 68 | """ 69 | 70 | nfft = int(ms*fs) 71 | noverlap = int(overlap_perc*nfft) 72 | 73 | # window data 74 | step = nfft - noverlap 75 | shape = (nfft, (x.shape[-1]-noverlap)//step) 76 | strides = (x.strides[0], step*x.strides[0]) 77 | x_wins = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides) 78 | 79 | # apply window 80 | x_wins_han = np.hanning(x_wins.shape[0])[..., np.newaxis] * x_wins 81 | 82 | # do fft 83 | # note this will be much slower if x_wins_han.shape[0] is not a power of 2 84 | complex_spec = np.fft.rfft(x_wins_han, axis=0) 85 | 86 | # calculate magnitude 87 | mag_spec = (np.conjugate(complex_spec) * complex_spec).real 88 | # same as: 89 | #mag_spec = np.square(np.absolute(complex_spec)) 90 | 91 | # orient correctly and remove dc component 92 | spec = mag_spec[1:, :] 93 | spec = np.flipud(spec) 94 | 95 | return spec 96 | 97 | 98 | def gen_spectrogram(audio_samples, sampling_rate, fft_win_length, fft_overlap, crop_spec=True, max_freq=256, min_freq=0): 99 | """ 100 | Compute spectrogram, crop and compute log. 
101 | """ 102 | 103 | # compute spectrogram 104 | spec = gen_mag_spectrogram(audio_samples, sampling_rate, fft_win_length, fft_overlap) 105 | 106 | # only keep the relevant bands - could do this outside 107 | if crop_spec: 108 | spec = spec[-max_freq:-min_freq, :] 109 | 110 | # add some zeros if too small 111 | req_height = max_freq-min_freq 112 | if spec.shape[0] < req_height: 113 | zero_pad = np.zeros((req_height-spec.shape[0], spec.shape[1])) 114 | spec = np.vstack((zero_pad, spec)) 115 | 116 | # perform log scaling - here the same as matplotlib 117 | log_scaling = 2.0 * (1.0 / sampling_rate) * (1.0/(np.abs(np.hanning(int(fft_win_length*sampling_rate)))**2).sum()) 118 | spec = np.log(1.0 + log_scaling*spec) 119 | 120 | return spec 121 | 122 | 123 | def process_spectrogram(spec, denoise_spec=True, mean_log_mag=0.5, smooth_spec=True): 124 | """ 125 | Denoises, and smooths spectrogram. 126 | """ 127 | 128 | # denoise 129 | if denoise_spec: 130 | # use a mask as there is silence at the start and end of recs 131 | mask = spec.mean(0) > mean_log_mag 132 | spec = denoise(spec, mask) 133 | 134 | # smooth the spectrogram 135 | if smooth_spec: 136 | spec = filters.gaussian(spec, 1.0) 137 | 138 | return spec 139 | -------------------------------------------------------------------------------- /bat_train/write_op.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import datetime as dt 4 | import glob 5 | import os 6 | 7 | 8 | def save_to_txt(op_file, results, class_names): 9 | num_top_classes = results[0]['pred_prob'].shape[1] 10 | 11 | # takes a dictionary of results and saves to file 12 | with open(op_file, 'w') as file: 13 | head_str = 'file_name,detection_time,detection_prob' 14 | for cc in range(num_top_classes): 15 | head_str += ',class_' + str(cc) + ',prob_' + str(cc) 16 | file.write(head_str + '\n') 17 | 18 | for ii in range(len(results)): 19 | for jj in range(len(results[ii]['prob'])): 20 | 21 | row_str = results[ii]['filename'] + ',' 22 | tm = round(results[ii]['time'][jj],3) 23 | pr = round(results[ii]['prob'][jj],3) 24 | row_str += str(tm) + ',' + str(pr) 25 | 26 | for cc in range(num_top_classes): 27 | cl = class_names[results[ii]['pred_classes'][jj, cc]] 28 | pr = round(results[ii]['pred_prob'][jj, cc],3) 29 | row_str += ',' + cl + ',' + str(pr) 30 | 31 | file.write(row_str + '\n') 32 | 33 | 34 | def create_audio_tagger_op(ip_file_name, op_file_name, st_times, det_confidence, 35 | class_pred, class_prob, samp_rate, class_names): 36 | # saves the detections in an audiotagger friendly format 37 | 38 | col_names = ['Filename', 'Label', 'LabelTimeStamp', 'Spec_NStep', 39 | 'Spec_NWin', 'Spec_x1', 'Spec_y1', 'Spec_x2', 'Spec_y2', 40 | 'LabelStartTime_Seconds', 'LabelEndTime_Seconds', 41 | 'LabelArea_DataPoints', 'DetectorConfidence', 42 | 'ClassifierConfidence'] 43 | 44 | nstep = 0.001 45 | nwin = 0.003 46 | call_width = 0.001 # code does not output call width so just put in dummy value 47 | y_max = (samp_rate*nwin)/2.0 48 | num_calls = len(st_times) 49 | 50 | if num_calls == 0: 51 | da_at = pd.DataFrame(index=np.arange(0), columns=col_names) 52 | else: 53 | da_at = pd.DataFrame(index=np.arange(0, num_calls), columns=col_names) 54 | da_at['Spec_NStep'] = nstep 55 | da_at['Spec_NWin'] = nwin 56 | da_at['Label'] = 'bat' 57 | da_at['LabelTimeStamp'] = dt.datetime.now().isoformat() 58 | da_at['Spec_y1'] = 0 59 | da_at['Spec_y2'] = y_max 60 | da_at['Filename'] = ip_file_name 61 | 62 | for ii in np.arange(0, 
num_calls): 63 | 64 | st_time = st_times[ii] 65 | da_at.loc[ii, 'LabelStartTime_Seconds'] = st_time 66 | da_at.loc[ii, 'LabelEndTime_Seconds'] = st_time + call_width 67 | da_at.loc[ii, 'Label'] = class_names[class_pred[ii]] 68 | 69 | da_at.loc[ii, 'Spec_x1'] = st_time/nstep 70 | da_at.loc[ii, 'Spec_x2'] = (st_time + call_width)/nstep 71 | 72 | da_at.loc[ii, 'DetectorConfidence'] = round(det_confidence[ii], 3) 73 | da_at.loc[ii, 'ClassifierConfidence'] = round(class_prob[ii], 3) 74 | 75 | # save to disk 76 | da_at.to_csv(op_file_name, index=False) 77 | 78 | return da_at 79 | 80 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Bat Echolocation Call Detection in Audio Recordings 2 | Python code for the detection of bat echolocation calls in full spectrum audio recordings. This code recreates the results from the paper [Bat Detective - Deep Learning Tools for Bat Acoustic Signal Detection](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005995). You will also find some additional information and data on our [project page](http://visual.cs.ucl.ac.uk/pubs/batDetective). 3 | 4 | 5 | **Update Dec 2022:** We now have a new and improved codebase that you can access [here](https://github.com/macaodha/batdetect2). 6 | 7 | 8 | ## Training 9 | `bat_train` contains the code to train the models and recreate the plots in the paper. 10 | 11 | ## Running the Detector 12 | `bat_eval` contains lightweight python scripts that load a pretrained model and run the detector on a directory of audio files. No GPU is required for this step. 13 | 14 | 15 | ## Misc 16 | 17 | #### Video 18 | Here is a short video that describes how our system works. 19 | [![Screenshot](https://img.youtube.com/vi/u35jWHdhl-8/0.jpg)](https://www.youtube.com/watch?v=u35jWHdhl-8) 20 | 21 | 22 | #### Links 23 | [Nature Smart Cities](https://naturesmartcities.com) Deployment of smart audio detectors that use our code base to detect bats in East London. 24 | [Bat Detective](http://www.batdetective.org) Zooniverse citizen science project that was created to collect our training data. 25 | [iBats](http://www.bats.org.uk/pages/ibatsprogram.html) Global bat monitoring program. 26 | 27 | 28 | #### Reference 29 | If you find our work useful in your research please consider citing our paper: 30 | ``` 31 | @article{batdetect18, 32 | title = {Bat Detective - Deep Learning Tools for Bat Acoustic Signal Detection}, 33 | author = {Mac Aodha, Oisin and Gibb, Rory and Barlow, Kate and Browning, Ella and 34 | Firman, Michael and Freeman, Robin and Harder, Briana and Kinsey, Libby and 35 | Mead, Gary and Newson, Stuart and Pandourski, Ivan and Parsons, Stuart and 36 | Russ, Jon and Szodoray-Paradi, Abigel and Szodoray-Paradi, Farkas and 37 | Tilova, Elena and Girolami, Mark and Brostow, Gabriel and E. Jones, Kate.}, 38 | journal={PLOS Computational Biology}, 39 | year={2018} 40 | } 41 | ``` 42 | 43 | #### Acknowledgements 44 | We are enormously grateful for the efforts and enthusiasm of the amazing iBats and Bat Detective volunteers. We would also like to thank Ian Agranat and Joe Szewczak for useful discussions and access to their systems. Finally, we would like to thank [Zooniverse](https://www.zooniverse.org/) for setting up and hosting the Bat Detective project. 45 | --------------------------------------------------------------------------------