├── .gitignore ├── LICENSE.md ├── bat_eval ├── cnn_helpers.py ├── cpu_detection.py ├── evaluate_cnn_fast.py ├── models │ ├── detector.npy │ ├── detector_192K.npy │ ├── detector_192K_params.json │ ├── detector_params.json │ └── readme.md ├── myskimage.py ├── mywavfile.py ├── nms.pyx ├── nms_slow.py ├── readme.md ├── run_detector.py ├── setup.py ├── spectrogram.py ├── wavs │ └── test_file.wav └── write_op.py ├── bat_train ├── classifier.py ├── cls_audio_forest.py ├── cls_cnn.py ├── cls_segment.py ├── create_results.py ├── data │ └── readme.md ├── data_set_params.py ├── evaluate.py ├── export_detector_weights.py ├── grad_features.py ├── nms.pyx ├── nms_slow.py ├── random_forest.py ├── readme.md ├── run_comparison.py ├── run_detector.py ├── spectrogram.py └── write_op.py └── readme.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.DS_Store 3 | *.c 4 | *.so 5 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. 
Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. 
Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. 
Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. 
for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 
337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. 
396 | 
--------------------------------------------------------------------------------
/bat_eval/cnn_helpers.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from numpy.lib.stride_tricks import as_strided
3 | 
4 | 
5 | def aligned_malloc(shape, dtype, alignment=16):
6 |     """allocates numpy.array of specified shape, dtype
7 |        and memory alignment such that array.ctypes.data
8 |        is an aligned memory pointer
9 |        shape is numpy.array shape
10 |        dtype is numpy.array element type
11 |        alignment is required memory alignment in bytes
12 |     """
13 |     itemsize = np.dtype(dtype).itemsize
14 |     extra = alignment // itemsize
15 |     size = np.prod(shape)
16 |     buf = np.empty(size + extra, dtype=dtype)
17 |     ofs = (-buf.ctypes.data % alignment) // itemsize
18 |     aa = buf[ofs:ofs+size].reshape(shape)
19 |     assert (aa.ctypes.data % alignment) == 0
20 |     assert (aa.flags['C_CONTIGUOUS']) == True
21 |     return aa
22 | 
23 | 
24 | def view_as_windows(arr_in, window_shape, step=1):
25 |     """ Taken from skimage.util.shape.py
26 |     Rolling window view of the input n-dimensional array.
27 | 
28 |     Windows are overlapping views of the input array, with adjacent windows
29 |     shifted by a single row or column (or an index of a higher dimension).
30 | 
31 |     Parameters
32 |     ----------
33 |     arr_in : ndarray
34 |         N-d input array.
35 |     window_shape : tuple
36 |         Defines the shape of the elementary n-dimensional orthotope
37 |         (better known as hyperrectangle [1]_) of the rolling window view.
38 |     step : int, optional
39 |         Number of elements to skip when moving the window forward (by
40 |         default, move forward by one). The value must be equal to or larger
41 |         than one.
42 | 
43 |     Returns
44 |     -------
45 |     arr_out : ndarray
46 |         (rolling) window view of the input array. If `arr_in` is
47 |         non-contiguous, a copy is made.
48 |     """
49 | 
50 |     arr_shape = np.array(arr_in.shape)
51 |     window_shape = np.array(window_shape, dtype=arr_shape.dtype)
52 | 
53 |     # -- build rolling window view
54 |     arr_in = np.ascontiguousarray(arr_in)
55 | 
56 |     new_shape = tuple((arr_shape - window_shape) // step + 1) + \
57 |                 tuple(window_shape)
58 | 
59 |     arr_strides = np.array(arr_in.strides)
60 |     new_strides = np.concatenate((arr_strides * step, arr_strides))
61 | 
62 |     arr_out = as_strided(arr_in, shape=new_shape, strides=new_strides)
63 | 
64 |     return arr_out
65 | 
66 | 
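# --------------------------------------------------------------------------
# Example (editor's sketch, not part of the original module): for a (1, 4, 4)
# float32 input and a (1, 2, 2) window, view_as_windows returns a zero-copy
# view of shape (1, 3, 3, 1, 2, 2), where the three leading axes index window
# positions. corr2d below flattens these views so the whole correlation
# becomes a single np.dot (GEMM) call:
#
#   >>> x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
#   >>> view_as_windows(x, (1, 2, 2)).shape
#   (1, 3, 3, 1, 2, 2)
#   >>> view_as_windows(x, (1, 2, 2))[0, 0, 0, 0]
#   array([[ 0.,  1.],
#          [ 4.,  5.]], dtype=float32)
# --------------------------------------------------------------------------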
67 | def corr2d(ip, filters, bias):
68 |     """performs 2D correlation on 3D input matrix with depth D, with N filters
69 |        uses the matrix multiplication method - will use a lot of memory for large
70 |        inputs. see here for more details:
71 |        https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
72 |        ip is DxHxW
73 |        filters is NxDxFhxFw, where Fh==Fw
74 |        op is NxHxW
75 |     """
76 | 
77 |     # reshape filters, can do this outside as it only needs to be done once
78 |     filters_re = filters.reshape((filters.shape[0], np.prod(filters.shape[1:])))
79 | 
80 |     # produce views of the input
81 |     op = view_as_windows(ip, filters.shape[1:])
82 |     op_height, op_width = op.shape[1:3]
83 | 
84 |     # reshape to 2D matrix and correlate with filters
85 |     op = op.reshape((np.prod(op.shape[:3]), np.prod(op.shape[3:])))
86 |     op = np.dot(filters_re, op.T)
87 | 
88 |     # reshape back to the correct op size
89 |     op = op.reshape((filters.shape[0], op_height, op_width))
90 | 
91 |     # add bias term
92 |     op += bias[..., np.newaxis, np.newaxis]
93 | 
94 |     # non linearity - ReLu
95 |     op.clip(min=0, out=op)
96 | 
97 |     return op
98 | 
99 | 
100 | def max_pool(ip):
101 |     """does a 2x2 max pool, crops off ends if not divisible by 2
102 |        ip is DxHxW
103 |        op is DxH/2xW/2
104 |     """
105 | 
106 |     height = ip.shape[1] - ip.shape[1]%2
107 |     width = ip.shape[2] - ip.shape[2]%2
108 |     h_max = np.maximum(ip[:,:height:2,:], ip[:,1:height:2,:])
109 |     op = np.maximum(h_max[:,:,:width:2], h_max[:,:,1:width:2])
110 |     return op
111 | 
112 | 
113 | def fully_connected_as_corr(ip, filters, bias):
114 |     """turns a conv output to fully connected layer into a correlation by sliding
115 |        it across the horizontal direction. this only needs to happen in 1D as the
116 |        neurons see the same size as the input
117 |        ip is DxHxW
118 |        filters is 2D - (DxHxW)x(num_neurons)
119 |        op is Wxnum_neurons
120 |     """
121 | 
122 |     # create DxHxsliding_width views of input - similar to corr2d
123 |     sliding_width = filters.shape[0] // np.prod(ip.shape[:2])
124 |     op = view_as_windows(ip, (ip.shape[0],ip.shape[1],sliding_width))
125 |     op = op.reshape((np.prod(op.shape[:3]), np.prod(op.shape[3:])))
126 | 
127 |     # perform correlation view matrix multiplication
128 |     op = np.dot(op, filters)
129 | 
130 |     # add bias term
131 |     op += bias[np.newaxis, :]
132 | 
133 |     # non linearity - ReLu
134 |     op.clip(min=0, out=op)
135 | 
136 |     # pad with zeros at the end so that it is the same width as the input
137 |     op = np.vstack((op, np.zeros((ip.shape[2]-op.shape[0], op.shape[1]), dtype=np.float32)))
138 | 
139 |     return op
140 | 
141 | 
--------------------------------------------------------------------------------
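The helpers in `cnn_helpers.py` compose into the small forward pass used by `cpu_detection.py` below. A minimal sketch of that composition, with random weights and illustrative shapes rather than the trained detector's:

```python
import numpy as np
import cnn_helpers as ch

spec = np.random.rand(1, 64, 200).astype(np.float32)  # DxHxW input spectrogram

# conv + pool, as in eval_network_1_dense
w1 = np.random.rand(16, 1, 3, 3).astype(np.float32)   # 16 filters of size 1x3x3
b1 = np.zeros(16, dtype=np.float32)
pool1 = ch.max_pool(ch.corr2d(spec, w1, b1))          # -> (16, 31, 99)

# dense layer applied as a sliding correlation along the time axis
w_fc = np.random.rand(16 * 31 * 10, 32).astype(np.float32)  # sliding width 10
b_fc = np.zeros(32, dtype=np.float32)
fc1 = ch.fully_connected_as_corr(pool1, w_fc, b_fc)   # -> (99, 32), zero padded
```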
/bat_eval/cpu_detection.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | from scipy.ndimage import zoom
4 | from scipy.ndimage.filters import gaussian_filter1d
5 | import json
6 | import time
7 | 
8 | from spectrogram import Spectrogram
9 | import cnn_helpers as ch
10 | 
11 | import warnings
12 | warnings.simplefilter("ignore", UserWarning)
13 | 
14 | try:
15 |     import nms as nms
16 | except ImportError as e:
17 |     print("Import Error: {0}".format(e))
18 |     print('please compile fast nms by running:')
19 |     print('python setup.py build_ext --inplace')
20 |     print('using slow nms in the meantime.')
21 |     import nms_slow as nms
22 | 
23 | 
24 | class CPUDetector:
25 | 
26 |     def __init__(self, weight_file, params_file):
27 |         """Performs detection on an audio file.
28 |         The structure of the network is hard coded to a network with 2
29 |         convolution layers with pooling, 1 or 2 fully connected layers, and a
30 |         final softmax layer.
31 | 
32 |         weight_file is the path to the numpy weights of the network
33 |         params_file is the path to the network parameters
34 |         """
35 | 
36 |         self.weights = np.load(weight_file, encoding='latin1')
37 |         if not all([weight.dtype==np.float32 for weight in self.weights]):
38 |             for i in range(self.weights.shape[0]):
39 |                 self.weights[i] = self.weights[i].astype(np.float32)
40 | 
41 |         with open(params_file) as fp:
42 |             params = json.load(fp)
43 | 
44 |         self.chunk_size = 4.0  # seconds
45 |         self.win_size = params['win_size']
46 |         self.max_freq = params['max_freq']
47 |         self.min_freq = params['min_freq']
48 |         self.slice_scale = params['slice_scale']
49 |         self.overlap = params['overlap']
50 |         self.crop_spec = params['crop_spec']
51 |         self.denoise = params['denoise']
52 |         self.smooth_spec = params['smooth_spec']
53 |         self.nms_win_size = int(params['nms_win_size'])
54 |         self.smooth_op_prediction_sigma = params['smooth_op_prediction_sigma']
55 |         self.sp = Spectrogram()
56 | 
57 | 
58 |     def run_detection(self, spec, chunk_duration, detection_thresh, low_res=True):
59 |         """Runs the detector on a single spectrogram chunk.
60 |         spec is the processed spectrogram, chunk_duration is its duration in
61 |         seconds, and detections scoring below detection_thresh are discarded
62 |         """
63 | 
64 |         # run the cnn - low_res will be faster but less accurate
65 |         if low_res:
66 |             prob = self.eval_network(spec)
67 |             scale_fact = 8.0
68 |         else:
69 |             prob_1 = self.eval_network(spec)
70 |             prob_2 = self.eval_network(spec[:, 2:])
71 | 
72 |             prob = np.zeros(prob_1.shape[0]*2, dtype=np.float32)
73 |             prob[0::2] = prob_1
74 |             prob[1::2] = prob_2
75 |             scale_fact = 4.0
76 | 
77 |         f_size = self.smooth_op_prediction_sigma / scale_fact
78 |         nms_win = np.round(self.nms_win_size / scale_fact)
79 | 
80 |         # smooth the outputs - this might not be necessary
81 |         prob = gaussian_filter1d(prob, f_size)
82 | 
83 |         # perform non maximum suppression
84 |         call_time, call_prob = nms.nms_1d(prob, nms_win, chunk_duration)
85 | 
86 |         # remove detections below threshold
87 |         if call_prob.shape[0] > 0:
88 |             inds = (call_prob >= detection_thresh)
89 |             call_prob = call_prob[inds]
90 |             call_time = call_time[inds]
91 | 
92 |         return call_time, call_prob
93 | 
94 | 
95 |     def create_spec(self, audio, sampling_rate):
96 |         """Creates spectrogram (returned numpy array has correct memory alignment)
97 |         """
98 |         hspec = self.sp.gen_spectrogram(audio, sampling_rate, self.slice_scale,
99 |                                         self.overlap, crop_spec=self.crop_spec,
100 |                                         max_freq=self.max_freq, min_freq=self.min_freq)
101 |         hspec = self.sp.process_spectrogram(hspec, denoise_spec=self.denoise,
102 |                                             smooth_spec=self.smooth_spec)
103 |         nsize = (np.ceil(hspec.shape[0]/2.0).astype(int), np.ceil(hspec.shape[1]/2.0).astype(int))
104 |         spec = ch.aligned_malloc(nsize, np.float32)
105 | 
106 |         zoom(hspec, 0.5, output=spec, order=1)
107 |         return spec
108 | 
109 | 
110 |     def eval_network(self, ip):
111 |         """runs the cnn - either the 1 or 2 fully connected versions
112 |         """
113 | 
114 |         if self.weights.shape[0] == 8:
115 |             prob = self.eval_network_1_dense(ip)
116 |         elif self.weights.shape[0] == 10:
117 |             prob = self.eval_network_2_dense(ip)
118 |         return prob
119 | 
120 | 
121 |     def eval_network_1_dense(self, ip):
122 |         """cnn with 1 dense layer at end
123 |         """
124 | 
125 |         # Conv Layer 1
126 |         conv1 = ch.corr2d(ip[np.newaxis,:,:], self.weights[0], self.weights[1])
127 |         pool1 = ch.max_pool(conv1)
128 | 
129 |         # Conv Layer 2
130 |         conv2 = ch.corr2d(pool1, self.weights[2], self.weights[3])
131 |         pool2 = ch.max_pool(conv2)
132 | 
133 |         # Fully Connected 1
134 |         fc1 =
ch.fully_connected_as_corr(pool2, self.weights[4], self.weights[5]) 135 | 136 | # Output layer 137 | prob = np.dot(fc1, self.weights[6]) 138 | prob += self.weights[7][np.newaxis, :] 139 | prob = prob - np.amax(prob, axis=1, keepdims=True) 140 | prob = np.exp(prob) 141 | prob = prob[:, 1] / prob.sum(1) 142 | prob = np.hstack((prob, np.zeros((ip.shape[1]//4)-prob.shape[0], dtype=np.float32))) 143 | 144 | return prob 145 | 146 | 147 | def eval_network_2_dense(self, ip): 148 | """cnn with 2 dense layers at end 149 | """ 150 | 151 | # Conv Layer 1 152 | conv1 = ch.corr2d(ip[np.newaxis,:], self.weights[0], self.weights[1]) 153 | pool1 = ch.max_pool(conv1) 154 | 155 | # Conv Layer 2 156 | conv2 = ch.corr2d(pool1, self.weights[2], self.weights[3]) 157 | pool2 = ch.max_pool(conv2) 158 | 159 | # Fully Connected 1 160 | fc1 = ch.fully_connected_as_corr(pool2, self.weights[4], self.weights[5]) 161 | 162 | # Fully Connected 2 163 | fc2 = np.dot(fc1, self.weights[6]) # fc times fc 164 | fc2 += self.weights[7][np.newaxis, :] # add bias term 165 | fc2.clip(min=0, out=fc2) # non linearity - ReLu 166 | 167 | # Output layer 168 | prob = np.dot(fc2, self.weights[8]) 169 | prob += self.weights[9][np.newaxis, :] 170 | prob = prob - np.amax(prob, axis=1, keepdims=True) 171 | prob = np.exp(prob) 172 | prob = prob[:, 1] / prob.sum(1) 173 | prob = np.hstack((prob, np.zeros((ip.shape[1]//4)-prob.shape[0], dtype=np.float32))) 174 | 175 | return prob 176 | 177 | -------------------------------------------------------------------------------- /bat_eval/evaluate_cnn_fast.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script evaluates the performance of the CPU version of CNN_FAST on the 3 | different test sets. 4 | 5 | First you need to run 'run_detector.py' with these settings to produce 6 | 'results/op_file.csv': 7 | 8 | detection_thresh = 0.0 9 | do_time_expansion = False 10 | root_dir = '../bat_train/data/' # set this to where the annotations and wav files are 11 | data_set = root_dir + 'train_test_split/test_set_bulgaria.npz' 12 | data_dir = root_dir + 'wav/' 13 | loaded_data_tr = np.load(data_set) 14 | audio_files = loaded_data_tr['test_files'] 15 | audio_files = [data_dir + tt + '.wav' for tt in audio_files] 16 | """ 17 | 18 | import numpy as np 19 | import pandas as pd 20 | import sys 21 | sys.path.append('../bat_train/') 22 | import evaluate as evl 23 | import create_results as res 24 | 25 | # point to data 26 | root_dir = '../bat_train/data/' 27 | 28 | # load the test data 29 | data_set = root_dir + 'train_test_split/test_set_uk.npz' 30 | loaded_data_tr = np.load(data_set) 31 | test_pos = loaded_data_tr['test_pos'] 32 | test_files = loaded_data_tr['test_files'] 33 | test_durations = loaded_data_tr['test_durations'] 34 | 35 | 36 | # load results and put them in the correct format 37 | da = pd.read_csv('results/op_file.csv') 38 | nms_pos = [] 39 | nms_prob = [] 40 | for tt in test_files: 41 | dal = da[da['file_name'] == tt+'.wav'] 42 | nms_pos.append(dal['detection_time'].values) 43 | nms_prob.append(dal['detection_prob'].values[..., np.newaxis]) 44 | 45 | 46 | # compute precision and recall 47 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, 0.1, 0.230) 48 | res.plot_prec_recall('CNN_FAST', recall, precision, nms_prob) 49 | 50 | -------------------------------------------------------------------------------- /bat_eval/models/detector.npy: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/macaodha/batdetect/7d4f7e43b4456d391eb832a612e7b134e341814d/bat_eval/models/detector.npy
--------------------------------------------------------------------------------
/bat_eval/models/detector_192K.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/macaodha/batdetect/7d4f7e43b4456d391eb832a612e7b134e341814d/bat_eval/models/detector_192K.npy
--------------------------------------------------------------------------------
/bat_eval/models/detector_192K_params.json:
--------------------------------------------------------------------------------
1 | {"min_freq": 10, "crop_spec": true, "slice_scale": 0.02667, "nms_win_size": 21, "mean_log_mag": 0.5, "denoise": true, "smooth_spec": true, "overlap": 0.75, "max_freq": 240, "smooth_op_prediction_sigma": 1.0335917312661498, "chunk_size": 0, "win_size": 0.23}
--------------------------------------------------------------------------------
/bat_eval/models/detector_params.json:
--------------------------------------------------------------------------------
1 | {"min_freq": 10, "crop_spec": true, "slice_scale": 0.02322, "nms_win_size": 21, "mean_log_mag": 0.5, "denoise": true, "smooth_spec": true, "overlap": 0.75, "max_freq": 270, "smooth_op_prediction_sigma": 1.0335917312661498, "chunk_size": 0, "win_size": 0.23}
--------------------------------------------------------------------------------
/bat_eval/models/readme.md:
--------------------------------------------------------------------------------
1 | Trained models should be kept here:
2 | 
3 | `detector.npy` is 15_09_16_16_26_28_cnn_hnm_2_lr_0.01_mo_0.9_net_small_norfolk
4 | norfolk test data - trained on background day data
5 | average precision (area) = 0.866
6 | recall at 95% precision = 0.736
7 | 
8 | 
9 | `detector_192K.npy` is 15_09_16_15_39_34_cnn_hnm_2_lr_0.01_mo_0.9_net_small_norfolk
10 | norfolk test data - trained on background day data
11 | average precision (area) = 0.856
12 | recall at 95% precision = 0.724
--------------------------------------------------------------------------------
/bat_eval/myskimage.py:
--------------------------------------------------------------------------------
1 | """
2 | This file contains code copied from the skimage package.
3 | Specifically, this file is a standalone implementation of
4 | skimage.filters' "gaussian" function.
5 | The "img_as_float" and "guess_spatial_dimensions" functions
6 | were also copied, as dependencies of the "gaussian" function.
7 | """ 8 | from __future__ import division 9 | import numbers 10 | import collections as coll 11 | import numpy as np 12 | from scipy import ndimage as ndi 13 | 14 | __all__ = ['gaussian'] 15 | 16 | dtype_range = {np.bool_: (False, True), 17 | np.bool8: (False, True), 18 | np.uint8: (0, 255), 19 | np.uint16: (0, 65535), 20 | np.int8: (-128, 127), 21 | np.int16: (-32768, 32767), 22 | np.int64: (-2**63, 2**63 - 1), 23 | np.uint64: (0, 2**64 - 1), 24 | np.int32: (-2**31, 2**31 - 1), 25 | np.uint32: (0, 2**32 - 1), 26 | np.float32: (-1, 1), 27 | np.float64: (-1, 1)} 28 | 29 | integer_types = (np.uint8, np.uint16, np.int8, np.int16) 30 | 31 | _supported_types = (np.bool_, np.bool8, 32 | np.uint8, np.uint16, np.uint32, np.uint64, 33 | np.int8, np.int16, np.int32, np.int64, 34 | np.float32, np.float64) 35 | 36 | dtype_range[np.float16] = (-1, 1) 37 | _supported_types += (np.float16, ) 38 | 39 | def warn(msg): 40 | print(msg) 41 | 42 | 43 | def img_as_float(image): 44 | dtype=np.float32 45 | force_copy = False 46 | 47 | """ 48 | Convert an image to the requested data-type. 49 | Warnings are issued in case of precision loss, or when negative values 50 | are clipped during conversion to unsigned integer types (sign loss). 51 | Floating point values are expected to be normalized and will be clipped 52 | to the range [0.0, 1.0] or [-1.0, 1.0] when converting to unsigned or 53 | signed integers respectively. 54 | Numbers are not shifted to the negative side when converting from 55 | unsigned to signed integer types. Negative values will be clipped when 56 | converting to unsigned integers. 57 | Parameters 58 | ---------- 59 | image : ndarray 60 | Input image. 61 | dtype : dtype 62 | Target data-type. 63 | force_copy : bool, optional 64 | Force a copy of the data, irrespective of its current dtype. 65 | uniform : bool, optional 66 | Uniformly quantize the floating point range to the integer range. 67 | By default (uniform=False) floating point values are scaled and 68 | rounded to the nearest integers, which minimizes back and forth 69 | conversion errors. 70 | References 71 | ---------- 72 | .. [1] DirectX data conversion rules. 73 | http://msdn.microsoft.com/en-us/library/windows/desktop/dd607323%28v=vs.85%29.aspx 74 | .. [2] Data Conversions. In "OpenGL ES 2.0 Specification v2.0.25", 75 | pp 7-8. Khronos Group, 2010. 76 | .. [3] Proper treatment of pixels as integers. A.W. Paeth. 77 | In "Graphics Gems I", pp 249-256. Morgan Kaufmann, 1990. 78 | .. [4] Dirty Pixels. J. Blinn. In "Jim Blinn's corner: Dirty Pixels", 79 | pp 47-57. Morgan Kaufmann, 1998. 80 | """ 81 | image = np.asarray(image) 82 | dtypeobj = np.dtype(dtype) 83 | dtypeobj_in = image.dtype 84 | dtype = dtypeobj.type 85 | dtype_in = dtypeobj_in.type 86 | 87 | if dtype_in == dtype: 88 | if force_copy: 89 | image = image.copy() 90 | return image 91 | 92 | if not (dtype_in in _supported_types and dtype in _supported_types): 93 | raise ValueError("can not convert %s to %s." % (dtypeobj_in, dtypeobj)) 94 | 95 | def sign_loss(): 96 | warn("Possible sign loss when converting negative image of type " 97 | "%s to positive image of type %s." 
% (dtypeobj_in, dtypeobj)) 98 | 99 | def prec_loss(): 100 | warn("Possible precision loss when converting from " 101 | "%s to %s" % (dtypeobj_in, dtypeobj)) 102 | 103 | def _dtype(itemsize, *dtypes): 104 | # Return first of `dtypes` with itemsize greater than `itemsize` 105 | return next(dt for dt in dtypes if itemsize < np.dtype(dt).itemsize) 106 | 107 | def _dtype2(kind, bits, itemsize=1): 108 | # Return dtype of `kind` that can store a `bits` wide unsigned int 109 | def compare(x, y, kind='u'): 110 | if kind == 'u': 111 | return x <= y 112 | else: 113 | return x < y 114 | 115 | s = next(i for i in (itemsize, ) + (2, 4, 8) if compare(bits, i * 8, 116 | kind=kind)) 117 | return np.dtype(kind + str(s)) 118 | 119 | 120 | def _scale(a, n, m, copy=True): 121 | # Scale unsigned/positive integers from n to m bits 122 | # Numbers can be represented exactly only if m is a multiple of n 123 | # Output array is of same kind as input. 124 | kind = a.dtype.kind 125 | if n > m and a.max() < 2 ** m: 126 | mnew = int(np.ceil(m / 2) * 2) 127 | if mnew > m: 128 | dtype = "int%s" % mnew 129 | else: 130 | dtype = "uint%s" % mnew 131 | n = int(np.ceil(n / 2) * 2) 132 | msg = ("Downcasting %s to %s without scaling because max " 133 | "value %s fits in %s" % (a.dtype, dtype, a.max(), dtype)) 134 | warn(msg) 135 | return a.astype(_dtype2(kind, m)) 136 | elif n == m: 137 | return a.copy() if copy else a 138 | elif n > m: 139 | # downscale with precision loss 140 | prec_loss() 141 | if copy: 142 | b = np.empty(a.shape, _dtype2(kind, m)) 143 | np.floor_divide(a, 2**(n - m), out=b, dtype=a.dtype, 144 | casting='unsafe') 145 | return b 146 | else: 147 | a //= 2**(n - m) 148 | return a 149 | elif m % n == 0: 150 | # exact upscale to a multiple of n bits 151 | if copy: 152 | b = np.empty(a.shape, _dtype2(kind, m)) 153 | np.multiply(a, (2**m - 1) // (2**n - 1), out=b, dtype=b.dtype) 154 | return b 155 | else: 156 | a = np.array(a, _dtype2(kind, m, a.dtype.itemsize), copy=False) 157 | a *= (2**m - 1) // (2**n - 1) 158 | return a 159 | else: 160 | # upscale to a multiple of n bits, 161 | # then downscale with precision loss 162 | prec_loss() 163 | o = (m // n + 1) * n 164 | if copy: 165 | b = np.empty(a.shape, _dtype2(kind, o)) 166 | np.multiply(a, (2**o - 1) // (2**n - 1), out=b, dtype=b.dtype) 167 | b //= 2**(o - m) 168 | return b 169 | else: 170 | a = np.array(a, _dtype2(kind, o, a.dtype.itemsize), copy=False) 171 | a *= (2**o - 1) // (2**n - 1) 172 | a //= 2**(o - m) 173 | return a 174 | 175 | kind = dtypeobj.kind 176 | kind_in = dtypeobj_in.kind 177 | itemsize = dtypeobj.itemsize 178 | itemsize_in = dtypeobj_in.itemsize 179 | 180 | if kind == 'b': 181 | # to binary image 182 | if kind_in in "fi": 183 | sign_loss() 184 | prec_loss() 185 | return image > dtype_in(dtype_range[dtype_in][1] / 2) 186 | 187 | if kind_in == 'b': 188 | # from binary image, to float and to integer 189 | result = image.astype(dtype) 190 | if kind != 'f': 191 | result *= dtype(dtype_range[dtype][1]) 192 | return result 193 | 194 | if kind in 'ui': 195 | imin = np.iinfo(dtype).min 196 | imax = np.iinfo(dtype).max 197 | if kind_in in 'ui': 198 | imin_in = np.iinfo(dtype_in).min 199 | imax_in = np.iinfo(dtype_in).max 200 | 201 | if kind_in == 'f': 202 | if np.min(image) < -1.0 or np.max(image) > 1.0: 203 | raise ValueError("Images of type float must be between -1 and 1.") 204 | if kind == 'f': 205 | # floating point -> floating point 206 | if itemsize_in > itemsize: 207 | prec_loss() 208 | return image.astype(dtype) 209 | 210 | # floating point -> 
integer 211 | prec_loss() 212 | # use float type that can represent output integer type 213 | image = np.array(image, _dtype(itemsize, dtype_in, 214 | np.float32, np.float64)) 215 | if not uniform: 216 | if kind == 'u': 217 | image *= imax 218 | else: 219 | image *= imax - imin 220 | image -= 1.0 221 | image /= 2.0 222 | np.rint(image, out=image) 223 | np.clip(image, imin, imax, out=image) 224 | elif kind == 'u': 225 | image *= imax + 1 226 | np.clip(image, 0, imax, out=image) 227 | else: 228 | image *= (imax - imin + 1.0) / 2.0 229 | np.floor(image, out=image) 230 | np.clip(image, imin, imax, out=image) 231 | return image.astype(dtype) 232 | 233 | if kind == 'f': 234 | # integer -> floating point 235 | if itemsize_in >= itemsize: 236 | prec_loss() 237 | # use float type that can exactly represent input integers 238 | image = np.array(image, _dtype(itemsize_in, dtype, 239 | np.float32, np.float64)) 240 | if kind_in == 'u': 241 | image /= imax_in 242 | # DirectX uses this conversion also for signed ints 243 | #if imin_in: 244 | # np.maximum(image, -1.0, out=image) 245 | else: 246 | image *= 2.0 247 | image += 1.0 248 | image /= imax_in - imin_in 249 | return image.astype(dtype) 250 | 251 | if kind_in == 'u': 252 | if kind == 'i': 253 | # unsigned integer -> signed integer 254 | image = _scale(image, 8 * itemsize_in, 8 * itemsize - 1) 255 | return image.view(dtype) 256 | else: 257 | # unsigned integer -> unsigned integer 258 | return _scale(image, 8 * itemsize_in, 8 * itemsize) 259 | 260 | if kind == 'u': 261 | # signed integer -> unsigned integer 262 | sign_loss() 263 | image = _scale(image, 8 * itemsize_in - 1, 8 * itemsize) 264 | result = np.empty(image.shape, dtype) 265 | np.maximum(image, 0, out=result, dtype=image.dtype, casting='unsafe') 266 | return result 267 | 268 | # signed integer -> signed integer 269 | if itemsize_in > itemsize: 270 | return _scale(image, 8 * itemsize_in - 1, 8 * itemsize - 1) 271 | image = image.astype(_dtype2('i', itemsize * 8)) 272 | image -= imin_in 273 | image = _scale(image, 8 * itemsize_in, 8 * itemsize, copy=False) 274 | image += imin 275 | 276 | return image.astype(dtype) 277 | 278 | 279 | 280 | def guess_spatial_dimensions(image): 281 | """Make an educated guess about whether an image has a channels dimension. 282 | Parameters 283 | ---------- 284 | image : ndarray 285 | The input image. 286 | Returns 287 | ------- 288 | spatial_dims : int or None 289 | The number of spatial dimensions of `image`. If ambiguous, the value 290 | is ``None``. 291 | Raises 292 | ------ 293 | ValueError 294 | If the image array has less than two or more than four dimensions. 295 | """ 296 | if image.ndim == 2: 297 | return 2 298 | if image.ndim == 3 and image.shape[-1] != 3: 299 | return 3 300 | if image.ndim == 3 and image.shape[-1] == 3: 301 | return None 302 | if image.ndim == 4 and image.shape[-1] == 3: 303 | return 3 304 | else: 305 | raise ValueError("Expected 2D, 3D, or 4D array, got %iD." % image.ndim) 306 | 307 | 308 | def gaussian(image, sigma=1, output=None, mode='nearest', cval=0, 309 | multichannel=None): 310 | """Multi-dimensional Gaussian filter 311 | 312 | Parameters 313 | ---------- 314 | image : array-like 315 | Input image (grayscale or color) to filter. 316 | sigma : scalar or sequence of scalars, optional 317 | Standard deviation for Gaussian kernel. The standard 318 | deviations of the Gaussian filter are given for each axis as a 319 | sequence, or as a single number, in which case it is equal for 320 | all axes. 
321 | output : array, optional 322 | The ``output`` parameter passes an array in which to store the 323 | filter output. 324 | mode : {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional 325 | The `mode` parameter determines how the array borders are 326 | handled, where `cval` is the value when mode is equal to 327 | 'constant'. Default is 'nearest'. 328 | cval : scalar, optional 329 | Value to fill past edges of input if `mode` is 'constant'. Default 330 | is 0.0 331 | multichannel : bool, optional (default: None) 332 | Whether the last axis of the image is to be interpreted as multiple 333 | channels. If True, each channel is filtered separately (channels are 334 | not mixed together). Only 3 channels are supported. If `None`, 335 | the function will attempt to guess this, and raise a warning if 336 | ambiguous, when the array has shape (M, N, 3). 337 | 338 | Returns 339 | ------- 340 | filtered_image : ndarray 341 | the filtered array 342 | 343 | Notes 344 | ----- 345 | This function is a wrapper around :func:`scipy.ndi.gaussian_filter`. 346 | 347 | Integer arrays are converted to float. 348 | 349 | The multi-dimensional filter is implemented as a sequence of 350 | one-dimensional convolution filters. The intermediate arrays are 351 | stored in the same data type as the output. Therefore, for output 352 | types with a limited precision, the results may be imprecise 353 | because intermediate results may be stored with insufficient 354 | precision. 355 | 356 | Examples 357 | -------- 358 | 359 | >>> a = np.zeros((3, 3)) 360 | >>> a[1, 1] = 1 361 | >>> a 362 | array([[ 0., 0., 0.], 363 | [ 0., 1., 0.], 364 | [ 0., 0., 0.]]) 365 | >>> gaussian(a, sigma=0.4) # mild smoothing 366 | array([[ 0.00163116, 0.03712502, 0.00163116], 367 | [ 0.03712502, 0.84496158, 0.03712502], 368 | [ 0.00163116, 0.03712502, 0.00163116]]) 369 | >>> gaussian(a, sigma=1) # more smooting 370 | array([[ 0.05855018, 0.09653293, 0.05855018], 371 | [ 0.09653293, 0.15915589, 0.09653293], 372 | [ 0.05855018, 0.09653293, 0.05855018]]) 373 | >>> # Several modes are possible for handling boundaries 374 | >>> gaussian(a, sigma=1, mode='reflect') 375 | array([[ 0.08767308, 0.12075024, 0.08767308], 376 | [ 0.12075024, 0.16630671, 0.12075024], 377 | [ 0.08767308, 0.12075024, 0.08767308]]) 378 | >>> # For RGB images, each is filtered separately 379 | >>> from skimage.data import astronaut 380 | >>> image = astronaut() 381 | >>> filtered_img = gaussian(image, sigma=1, multichannel=True) 382 | 383 | """ 384 | 385 | spatial_dims = guess_spatial_dimensions(image) 386 | if spatial_dims is None and multichannel is None: 387 | msg = ("Images with dimensions (M, N, 3) are interpreted as 2D+RGB " 388 | "by default. 
Use `multichannel=False` to interpret as " 389 | "3D image with last dimension of length 3.") 390 | warn(RuntimeWarning(msg)) 391 | multichannel = True 392 | if np.any(np.asarray(sigma) < 0.0): 393 | raise ValueError("Sigma values less than zero are not valid") 394 | if multichannel: 395 | # do not filter across channels 396 | if not isinstance(sigma, coll.Iterable): 397 | sigma = [sigma] * (image.ndim - 1) 398 | if len(sigma) != image.ndim: 399 | sigma = np.concatenate((np.asarray(sigma), [0])) 400 | #image = img_as_float(image) 401 | return ndi.gaussian_filter(image, sigma, mode=mode, cval=cval) 402 | -------------------------------------------------------------------------------- /bat_eval/mywavfile.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code taken from scipy.io.wavfile.py 3 | 4 | Module to read wav files using numpy arrays 5 | 6 | Functions 7 | --------- 8 | `read`: Return the sample rate (in samples/sec) and data from a WAV file. 9 | """ 10 | 11 | from __future__ import division, print_function, absolute_import 12 | import sys 13 | import numpy 14 | import struct 15 | import warnings 16 | 17 | 18 | __all__ = [ 19 | 'WavFileWarning', 20 | 'read' 21 | ] 22 | 23 | 24 | class WavFileWarning(UserWarning): 25 | pass 26 | 27 | 28 | WAVE_FORMAT_PCM = 0x0001 29 | WAVE_FORMAT_IEEE_FLOAT = 0x0003 30 | WAVE_FORMAT_EXTENSIBLE = 0xfffe 31 | KNOWN_WAVE_FORMATS = (WAVE_FORMAT_PCM, WAVE_FORMAT_IEEE_FLOAT) 32 | 33 | # assumes file pointer is immediately 34 | # after the 'fmt ' id 35 | 36 | 37 | def _read_fmt_chunk(fid, is_big_endian): 38 | """ 39 | Returns 40 | ------- 41 | size : int 42 | size of format subchunk in bytes (minus 8 for "fmt " and itself) 43 | format_tag : int 44 | PCM, float, or compressed format 45 | channels : int 46 | number of channels 47 | fs : int 48 | sampling frequency in samples per second 49 | bytes_per_second : int 50 | overall byte rate for the file 51 | block_align : int 52 | bytes per sample, including all channels 53 | bit_depth : int 54 | bits per sample 55 | """ 56 | if is_big_endian: 57 | fmt = '>' 58 | else: 59 | fmt = '<' 60 | 61 | size = res = struct.unpack(fmt+'I', fid.read(4))[0] 62 | bytes_read = 0 63 | 64 | if size < 16: 65 | raise ValueError("Binary structure of wave file is not compliant") 66 | 67 | res = struct.unpack(fmt+'HHIIHH', fid.read(16)) 68 | bytes_read += 16 69 | 70 | format_tag, channels, fs, bytes_per_second, block_align, bit_depth = res 71 | 72 | if format_tag == WAVE_FORMAT_EXTENSIBLE and size >= (16+2): 73 | ext_chunk_size = struct.unpack(fmt+'H', fid.read(2))[0] 74 | bytes_read += 2 75 | if ext_chunk_size >= 22: 76 | extensible_chunk_data = fid.read(22) 77 | bytes_read += 22 78 | raw_guid = extensible_chunk_data[2+4:2+4+16] 79 | # GUID template {XXXXXXXX-0000-0010-8000-00AA00389B71} (RFC-2361) 80 | # MS GUID byte order: first three groups are native byte order, 81 | # rest is Big Endian 82 | if is_big_endian: 83 | tail = b'\x00\x00\x00\x10\x80\x00\x00\xAA\x00\x38\x9B\x71' 84 | else: 85 | tail = b'\x00\x00\x10\x00\x80\x00\x00\xAA\x00\x38\x9B\x71' 86 | if raw_guid.endswith(tail): 87 | format_tag = struct.unpack(fmt+'I', raw_guid[:4])[0] 88 | else: 89 | raise ValueError("Binary structure of wave file is not compliant") 90 | 91 | if format_tag not in KNOWN_WAVE_FORMATS: 92 | raise ValueError("Unknown wave file format") 93 | 94 | # move file pointer to next chunk 95 | if size > (bytes_read): 96 | fid.read(size - bytes_read) 97 | 98 | return (size, format_tag, channels, fs, bytes_per_second, 
block_align,
99 |             bit_depth)
100 | 
101 | 
102 | # assumes file pointer is immediately after the 'data' id
103 | def _read_data_chunk(fid, format_tag, channels, bit_depth, is_big_endian,
104 |                      mmap=False):
105 |     if is_big_endian:
106 |         fmt = '>I'
107 |     else:
108 |         fmt = '<I'
109 |     # size of the data subchunk in bytes
110 |     size = struct.unpack(fmt, fid.read(4))[0]
111 | 
112 |     # number of bytes per sample
113 |     bytes_per_sample = bit_depth//8
114 |     if bit_depth == 8:
115 |         dtype = 'u1'
116 |     else:
117 |         if is_big_endian:
118 |             dtype = '>'
119 |         else:
120 |             dtype = '<'
121 |         if format_tag == WAVE_FORMAT_PCM:
122 |             dtype += 'i%d' % bytes_per_sample
123 |         else:
124 |             dtype += 'f%d' % bytes_per_sample
125 | 
126 |     if not mmap:
127 |         data = numpy.fromstring(fid.read(size), dtype=dtype)
128 |     else:
129 |         start = fid.tell()
130 |         data = numpy.memmap(fid, dtype=dtype, mode='c', offset=start,
131 |                             shape=(size//bytes_per_sample,))
132 |         fid.seek(start + size)
133 | 
134 |     if channels > 1:
135 |         data = data.reshape(-1, channels)
136 |     return data
137 | 
138 | 
139 | def _skip_unknown_chunk(fid, is_big_endian):
140 |     if is_big_endian:
141 |         fmt = '>I'
142 |     else:
143 |         fmt = '<I'
144 | 
145 |     data = fid.read(4)
146 |     # call unpack() and seek() only if we have really read data from file
147 |     # otherwise empty read() and irrelevant unpack() would trigger exception
148 |     if data:
149 |         size = struct.unpack(fmt, data)[0]
150 |         fid.seek(size, 1)
151 | 
152 | 
153 | def _read_riff_chunk(fid):
154 |     str1 = fid.read(4)  # file signature
155 |     if str1 == b'RIFF':
156 |         is_big_endian = False
157 |         fmt = '<I'
158 |     elif str1 == b'RIFX':
159 |         is_big_endian = True
160 |         fmt = '>I'
161 |     else:
162 |         raise ValueError("File format {}... not "
163 |                          "understood.".format(repr(str1)))
164 | 
165 |     # size of entire file
166 |     file_size = struct.unpack(fmt, fid.read(4))[0] + 8
167 | 
168 |     str2 = fid.read(4)
169 |     if str2 != b'WAVE':
170 |         raise ValueError("Not a WAV file.")
171 | 
172 |     return file_size, is_big_endian
173 | 
174 | 
175 | def read(filename, mmap=False):
176 |     """
177 |     Open a WAV file
178 | 
179 |     Return the sample rate (in samples/sec) and data from a WAV file.
180 | 
181 |     Parameters
182 |     ----------
183 |     filename : string or open file handle
184 |         Input wav file.
185 |     mmap : bool, optional
186 |         Whether to read data as memory-mapped.
187 |         Only to be used on real files (Default: False).
188 | 
189 |     Returns
190 |     -------
191 |     rate : int
192 |         Sample rate of wav file.
193 |     data : numpy array
194 |         Data read from wav file. Data-type is determined from the file;
195 |         see Notes.
196 | 
197 |     Notes
198 |     -----
199 |     This function cannot read wav files with 24-bit data.
200 | 
201 |     Common data types: [1]_
202 | 
203 |     =====================  ===========  ===========  =============
204 |          WAV format            Min          Max       NumPy dtype
205 |     =====================  ===========  ===========  =============
206 |     32-bit floating point  -1.0         +1.0         float32
207 |     32-bit PCM             -2147483648  +2147483647  int32
208 |     16-bit PCM             -32768       +32767       int16
209 |     8-bit PCM              0            255          uint8
210 |     =====================  ===========  ===========  =============
211 | 
212 |     Note that 8-bit PCM is unsigned.
213 | 
214 |     References
215 |     ----------
216 |     .. [1] IBM Corporation and Microsoft Corporation, "Multimedia Programming
217 |        Interface and Data Specifications 1.0", section "Data Format of the
218 |        Samples", August 1991
219 |        http://www.tactilemedia.com/info/MCI_Control_Info.html
220 | 
221 |     """
222 |     if hasattr(filename, 'read'):
223 |         fid = filename
224 |         mmap = False
225 |     else:
226 |         fid = open(filename, 'rb')
227 | 
228 |     try:
229 |         file_size, is_big_endian = _read_riff_chunk(fid)
230 |         fmt_chunk_received = False
231 |         channels = 1
232 |         bit_depth = 8
233 |         format_tag = WAVE_FORMAT_PCM
234 |         while fid.tell() < file_size:
235 |             # read the next chunk
236 |             chunk_id = fid.read(4)
237 | 
238 |             if not chunk_id:
239 |                 raise ValueError("Unexpected end of file.")
240 |             elif len(chunk_id) < 4:
241 |                 raise ValueError("Incomplete wav chunk.")
242 | 
243 |             if chunk_id == b'fmt ':
244 |                 fmt_chunk_received = True
245 |                 fmt_chunk = _read_fmt_chunk(fid, is_big_endian)
246 |                 format_tag, channels, fs = fmt_chunk[1:4]
247 |                 bit_depth = fmt_chunk[6]
248 |                 if bit_depth not in (8, 16, 32, 64, 96, 128):
249 |                     raise ValueError("Unsupported bit depth: the wav file "
250 |                                      "has {}-bit data.".format(bit_depth))
251 |             elif chunk_id == b'fact':
252 |                 _skip_unknown_chunk(fid, is_big_endian)
253 |             elif chunk_id == b'data':
254 |                 if not fmt_chunk_received:
255 |                     raise ValueError("No fmt chunk before data")
256 |                 data = _read_data_chunk(fid, format_tag, channels, bit_depth,
257 |                                         is_big_endian, mmap)
258 |             elif chunk_id == b'LIST':
259 |                 # someday this could be handled properly but for now skip it
260 |                 _skip_unknown_chunk(fid, is_big_endian)
261 |             elif chunk_id in (b'JUNK', b'Fake'):
262 |                 # skip alignment chunks without warning
263 |                 _skip_unknown_chunk(fid, is_big_endian)
264 |             else:
265 |                 warnings.warn("Chunk (non-data) not understood, skipping it.",
266 |                               WavFileWarning)
267 |                 _skip_unknown_chunk(fid, is_big_endian)
268 |     finally:
269 |         if not hasattr(filename, 'read'):
270 |             fid.close()
271 |         else:
272 |             fid.seek(0)
273 | 
274 |     return fs, data
275 | 
276 | 
277 | 
278 | 
279 | 
280 | 
281 | 
282 | 
283 | 
284 | 
285 | if sys.version_info[0] >= 3:
286 |     def _array_tofile(fid, data):
287 |         # ravel gives a c-contiguous buffer
288 |         fid.write(data.ravel().view('b').data)
289 | else:
290 |     def _array_tofile(fid, data):
291 |         fid.write(data.tostring())
292 | 
293 | 
--------------------------------------------------------------------------------
/bat_eval/nms.pyx:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | cimport numpy as np
3 | cimport cython
4 | 
5 | DTYPE = np.float #Do not remove this line. See http://stackoverflow.com/questions/8024805/cython-compiled-c-extension-importerror-dynamic-module-does-not-define-init-fu
6 | ctypedef np.float32_t DTYPE_t
7 | 
8 | @cython.boundscheck(False)
9 | def nms_1d(np.ndarray[DTYPE_t, ndim=1] src, int win_size, float file_duration):
10 |     """1D Non maximum suppression
11 |        src: vector of length N
12 |     """
13 | 
14 |     cdef int max_ind = 0
15 |     cdef int ii = 0
16 |     cdef int ee = 0
17 |     cdef int width = src.shape[0]-1
18 |     cdef np.ndarray pos = np.empty(width, dtype=np.int)
19 |     cdef int pos_cnt = 0
20 |     while ii <= width:
21 | 
22 |         if max_ind < (ii - win_size):
23 |             max_ind = ii - win_size
24 | 
25 |         ee = ii + win_size
26 |         if ii + win_size >= width:
27 |             ee = width
28 | 
29 |         while max_ind <= ee:
30 |             if src[max_ind] > src[ii]:
31 |                 break
32 |             max_ind += 1
33 | 
34 |         if max_ind > ee:
35 |             pos[pos_cnt] = ii
36 |             pos_cnt += 1
37 |             max_ind = ii+1
38 |             ii += win_size
39 | 
40 |         ii += 1
41 | 
42 |     pos = pos[:pos_cnt]
43 |     val = src[pos]
44 | 
45 |     # remove peaks near the end
46 |     inds = (pos + win_size) < src.shape[0]
47 |     pos = pos[inds]
48 |     val = val[inds]
49 | 
50 |     # set output to between 0 and 1, then put it in the correct time range
51 |     pos = pos.astype(np.float32) / src.shape[0]
52 |     pos = pos*file_duration
53 | 
54 |     return pos, val
55 | 
--------------------------------------------------------------------------------
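The readme's compile step (`python setup.py build_ext --inplace`) builds the extension above from `nms.pyx`. The repository's own `setup.py` is not reproduced in this listing; a minimal sketch of such a file, which may differ from the actual one, is:

```python
# minimal sketch of a setup.py for the Cython nms extension (illustrative;
# the repository ships its own setup.py, which may differ)
from distutils.core import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize('nms.pyx'),
    include_dirs=[numpy.get_include()],
)
```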
/bat_eval/nms_slow.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | 
4 | def nms_1d(src, win_size, file_duration):
5 |     """1D Non maximum suppression
6 |        src: vector of length N
7 |     """
8 | 
9 |     pos = []
10 |     src_cnt = 0
11 |     max_ind = 0
12 |     ii = 0
13 |     ee = 0
14 |     width = src.shape[0]-1
15 |     while ii <= width:
16 | 
17 |         if max_ind < (ii - win_size):
18 |             max_ind = ii - win_size
19 | 
20 |         ee = np.minimum(ii + win_size, width)
21 | 
22 |         while max_ind <= ee:
23 |             src_cnt += 1
24 |             if src[int(max_ind)] > src[int(ii)]:
25 |                 break
26 |             max_ind += 1
27 | 
28 |         if max_ind > ee:
29 |             pos.append(ii)
30 |             max_ind = ii+1
31 |             ii += win_size
32 | 
33 |         ii += 1
34 | 
35 |     pos = np.asarray(pos).astype(np.int)
36 |     val = src[pos]
37 | 
38 |     # remove peaks near the end
39 |     inds = (pos + win_size) < src.shape[0]
40 |     pos = pos[inds]
41 |     val = val[inds]
42 | 
43 |     # set output to between 0 and 1, then put it in the correct time range
44 |     pos = pos / float(src.shape[0])
45 |     pos = pos*file_duration
46 | 
47 |     return pos, val
48 | 
49 | 
50 | def test_nms():
51 |     import matplotlib.pyplot as plt
52 |     import numpy as np
53 |     #import pyximport; pyximport.install(reload_support=True)
54 |     import nms as nms_fast
55 | 
56 |     y = np.sin(np.arange(1000)/100.0*np.pi)
57 |     y = y + np.random.random(y.shape)*0.5
58 |     win_size = int(0.1*y.shape[0]/2.0)
59 | 
60 |     pos, prob = nms_1d(y, win_size, y.shape[0])
61 |     pos_f, prob_f = nms_fast.nms_1d(y, win_size, y.shape[0])
62 | 
63 |     print('diff between implementations =', 1-np.isclose(prob_f, prob).mean())
64 |     print('diff between implementations =', 1-np.isclose(pos_f, pos).mean())
65 | 
66 |     plt.close('all')
67 |     plt.plot(y)
68 |     plt.plot((pos).astype('int'), prob, 'ro', ms=10)
69 |     plt.plot((pos_f).astype('int'), prob, 'bo')  # shift so we can see them
70 |     plt.show()
71 | 
--------------------------------------------------------------------------------
/bat_eval/readme.md:
--------------------------------------------------------------------------------
1 | # CPU Bat Detector Code
2 | 
3 | This contains python code for bat echolocation call detection in full spectrum audio recordings. This is a stripped down CPU based version of the detector with minimal dependencies that can be used for deployment.
4 | 
5 | 
6 | #### Installation Instructions
7 | * Install the Anaconda Python 2.7 distribution from [here](https://www.continuum.io/downloads).
8 | * Download this detection code from the repository and unzip it.
9 | * Compile fast non maximum suppression by running: `python setup.py build_ext --inplace`. This might not work on all systems e.g. Windows.
10 | 
11 | 
12 | #### Running on Your Own Data
13 | * Change the `data_dir = 'wavs/'` variable so that it points to the location of the audio files you want to run the detector on.
14 | * Specify where you want the results to be saved by setting `op_ann_dir = 'results/'`.
15 | * To run, open up the command line and type:
16 | `python run_detector.py`
17 | * If you want the detector to be less conservative in its detections, lower the value of `detection_thresh`.
18 | * By setting `save_individual_results = False` the code will not save individual results files.
19 | 
20 | ## Misc
21 | 
22 | #### Requirements
23 | The code has been tested using Python 2.7 (it mostly works under Python 3.6, but we have noticed some issues) with the following package versions:
24 | `Python 2.7.12`
25 | `scipy 0.19.0`
26 | `numpy 1.12.1`
27 | `pandas 0.19.2`
28 | `cython 0.24.1` - not required
29 | 
30 | 
31 | #### Different Detection Models
32 | * `detector_192K.npy` is trained to be more efficient for files that have been recorded at 192K. Note that different detectors will give different results. You can swap in your own models that have been trained using the code in `../bat_train`, and exported with `../bat_train/export_detector_weights.py`.
33 | * To use it change the detector model as follows:
34 | `det_model_file = 'models/detector_192K.npy'`
35 | * Running `evaluate_cnn_fast.py` will compute the performance of the CPU version of this CNN_FAST model on the different test sets.
36 | 
37 | 
38 | #### Viewing Outputs
39 | * The code outputs annotations as one big csv file. The location where the file is saved is specified with the variable:
40 | `op_file_name_total = 'res/op_file.csv'`
41 | It contains three fields `file_name`, `detection_time`, and `detection_prob` which indicate the time in the file and the detector confidence (higher is more confident) for each detected call.
42 | * It also saves the outputs in a format compatible with [AudioTagger](https://github.com/groakat/AudioTagger). The output directory for these annotations is specified as:
43 | `op_ann_dir = 'res/'`
44 | The individual `*-sceneRect.csv` files contain the same information that is specified in the main results file `op_file_name_total`, where `LabelStartTime_Seconds` corresponds to `detection_time` and `DetectorConfidence` corresponds to `detection_prob`. The additional fields (e.g. `Spec_x1`) are specific to AudioTagger and do not contain any extra information.
45 | 
46 | 
47 | #### Performance
48 | * You can get higher resolution results by setting the `low_res` flag in `cpu_detection.run_detection()` to `False`.
49 | * The detector code breaks the files down into chunks of audio (this is controlled by the parameter `chunk_size` in `cpu_detection`, measured in seconds/10). It's best to keep this value reasonably small to keep memory usage low. However, experimenting with different values could speed things up.
50 | * You can get a faster Fourier transform by installing the FFTW3 library (http://www.fftw.org/) and the python wrapper pyFFTW (https://pypi.python.org/pypi/pyFFTW). On Ubuntu Linux: `sudo apt-get install libfftw3 libfftw3-dev` and `pip install pyfftw`, respectively.
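#### Calling the Detector from Python
If you would rather call the detector from your own code than edit `run_detector.py`, the core steps reduce to the sketch below (file paths are placeholders, and the chunking and time expansion logic from `run_detector.py` is omitted):

```python
import numpy as np
import mywavfile
import cpu_detection as detector

det = detector.CPUDetector('models/detector.npy', 'models/detector_params.json')
samp_rate, audio = mywavfile.read('wavs/test_file.wav')
samp_rate = int(samp_rate/10.0)  # undo time expansion, as in run_detector.py

chunk = audio[:int(det.chunk_size*samp_rate)]
spec = det.create_spec(chunk, samp_rate)
call_time, call_prob = det.run_detection(spec, chunk.shape[0]/float(samp_rate),
                                         detection_thresh=0.95, low_res=True)
```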
19 | 
20 | ## Misc
21 | 
22 | #### Requirements
23 | The code has been tested using Python 2.7 (it mostly works under Python 3.6, but we have noticed some issues) with the following package versions:
24 | `Python 2.7.12`
25 | `scipy 0.19.0`
26 | `numpy 1.12.1`
27 | `pandas 0.19.2`
28 | `cython 0.24.1` - optional, only needed to compile the fast non maximum suppression
29 | 
30 | 
31 | #### Different Detection Models
32 | * `detector_192K.npy` is trained to be more efficient for files that have been recorded at 192kHz. Note that different detectors will give different results. You can swap in your own models that have been trained using the code in `../bat_train`, and exported with `../bat_train/export_detector_weights.py`.
33 | * To use it, change the detector model as follows:
34 | `det_model_file = 'models/detector_192K.npy'`
35 | * Running `evaluate_cnn_fast.py` will compute the performance of the CPU version of this CNN_FAST model on the different test sets.
36 | 
37 | 
38 | #### Viewing Outputs
39 | * The code outputs annotations as one big csv file. The location where the file is saved is specified with the variable:
40 | `op_file_name_total = 'res/op_file.csv'`
41 | It contains three fields `file_name`, `detection_time`, and `detection_prob` which indicate the time in the file and the detector confidence (higher is more confident) for each detected call.
42 | * It also saves the outputs in a format compatible with [AudioTagger](https://github.com/groakat/AudioTagger). The output directory for these annotations is specified as:
43 | `op_ann_dir = 'res/'`
44 | The individual `*-sceneRect.csv` files contain the same information that is specified in the main results file `op_file_name_total`, where `LabelStartTime_Seconds` corresponds to `detection_time` and `DetectorConfidence` corresponds to `detection_prob`. The additional fields (e.g. `Spec_x1`) are specific to AudioTagger and do not contain any extra information.
45 | 
46 | 
47 | #### Performance
48 | * You can get higher resolution results by setting the `low_res` flag in `cpu_detection.run_detection()` to `False`.
49 | * The detector code breaks the files down into chunks of audio (this is controlled by the parameter `chunk_size` in `cpu_detection`, measured in seconds/10). It's best to keep this value reasonably small to keep memory usage low. However, experimenting with different values could speed things up.
50 | * You can get a faster Fourier transform by installing the FFTW3 library (http://www.fftw.org/) and the Python wrapper pyFFTW (https://pypi.python.org/pypi/pyFFTW). On Ubuntu Linux: `sudo apt-get install libfftw3 libfftw3-dev` and `pip install pyfftw`, respectively.
51 | 
52 | 
53 | 
54 | ### Acknowledgements
55 | Thanks to Daniyar Turmukhambetov for coding help for another version of this repo. We are enormously grateful for the efforts and enthusiasm of the amazing iBats and Bat Detective volunteers. We would also like to thank Ian Agranat and Joe Szewczak for useful discussions and access to their systems. Finally, we would like to thank [Zooniverse](https://www.zooniverse.org/) for setting up and hosting the Bat Detective project.
56 | 
57 | ### License
58 | Code, audio data, and annotations are available for research purposes only, i.e. non-commercial use. For any other use of the software or data please contact the authors.
59 | 
--------------------------------------------------------------------------------
/bat_eval/run_detector.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import numpy as np
3 | import os
4 | import fnmatch
5 | import time
6 | import sys
7 | 
8 | import write_op as wo
9 | import cpu_detection as detector
10 | import mywavfile
11 | 
12 | 
13 | def get_audio_files(ip_dir):
14 |     matches = []
15 |     for root, dirnames, filenames in os.walk(ip_dir):
16 |         for filename in filenames:
17 |             if filename.lower().endswith('.wav'):
18 |                 matches.append(os.path.join(root, filename))
19 |     return matches
20 | 
21 | 
22 | def read_audio(file_name, do_time_expansion, chunk_size, win_size):
23 |     # try to read in audio file
24 |     try:
25 |         samp_rate_orig, audio = mywavfile.read(file_name)
26 |     except:
27 |         print(' Error reading file')
28 |         return True, None, None, None, None
29 | 
30 |     # convert to mono if stereo
31 |     if len(audio.shape) == 2:
32 |         print(' Warning: stereo file.
Just taking left channel.') 33 | audio = audio[:, 0] 34 | file_dur = audio.shape[0] / float(samp_rate_orig) 35 | print(' dur', round(file_dur,3), '(secs) , fs', samp_rate_orig) 36 | 37 | # original model is trained on time expanded data 38 | samp_rate = samp_rate_orig 39 | if do_time_expansion: 40 | samp_rate = int(samp_rate_orig/10.0) 41 | file_dur *= 10 42 | 43 | # pad with zeros so we can go right to the end 44 | multiplier = np.ceil(file_dur/float(chunk_size-win_size)) 45 | diff = multiplier*(chunk_size-win_size) - file_dur + win_size 46 | audio_pad = np.hstack((audio, np.zeros(int(diff*samp_rate)))) 47 | 48 | return False, audio_pad, file_dur, samp_rate, samp_rate_orig 49 | 50 | 51 | def run_model(det, audio, file_dur, samp_rate, detection_thresh, max_num_calls=0): 52 | """This runs the bat call detector. 53 | """ 54 | # results will be stored here 55 | det_time_file = np.zeros(0) 56 | det_prob_file = np.zeros(0) 57 | 58 | # files can be long so we split each up into separate (overlapping) chunks 59 | st_positions = np.arange(0, file_dur, det.chunk_size-det.win_size) 60 | for chunk_id, st_position in enumerate(st_positions): 61 | 62 | # take a chunk of the audio 63 | # should already be zero padded at the end so its the correct size 64 | st_pos = int(st_position*samp_rate) 65 | en_pos = int(st_pos + det.chunk_size*samp_rate) 66 | audio_chunk = audio[st_pos:en_pos] 67 | chunk_duration = audio_chunk.shape[0] / float(samp_rate) 68 | 69 | # create spectrogram 70 | spec = det.create_spec(audio_chunk, samp_rate) 71 | 72 | # run detector 73 | det_loc, prob_det = det.run_detection(spec, chunk_duration, detection_thresh, 74 | low_res=True) 75 | 76 | det_time_file = np.hstack((det_time_file, det_loc + st_position)) 77 | det_prob_file = np.hstack((det_prob_file, prob_det)) 78 | 79 | # undo the effects of time expansion for detector 80 | if do_time_expansion: 81 | det_time_file /= 10.0 82 | 83 | return det_time_file, det_prob_file 84 | 85 | 86 | if __name__ == "__main__": 87 | 88 | # params 89 | detection_thresh = 0.95 # make this smaller if you want more calls 90 | do_time_expansion = True # if audio is already time expanded set this to False 91 | save_individual_results = True # if True will create an output for each file 92 | save_summary_result = True # if True will create a single csv file with all results 93 | 94 | # load data 95 | data_dir = 'wavs' # this is the path to your audio files 96 | op_ann_dir = 'results' # this where your results will be saved 97 | op_ann_dir_ind = os.path.join(op_ann_dir, 'individual_results') # this where individual results will be saved 98 | op_file_name_total = os.path.join(op_ann_dir, 'results.csv') 99 | if not os.path.isdir(op_ann_dir): 100 | os.makedirs(op_ann_dir) 101 | if save_individual_results and not os.path.isdir(op_ann_dir_ind): 102 | os.makedirs(op_ann_dir_ind) 103 | 104 | # read audio files 105 | audio_files = get_audio_files(data_dir) 106 | 107 | print('Processing ', len(audio_files), 'files') 108 | print('Input directory ', data_dir) 109 | print('Results directory ', op_ann_dir, '\n') 110 | 111 | 112 | # load and create the detector 113 | det_model_file = 'models/detector.npy' 114 | det_params_file = det_model_file[:-4] + '_params.json' 115 | det = detector.CPUDetector(det_model_file, det_params_file) 116 | 117 | # loop through audio files 118 | results = [] 119 | for file_cnt, file_name in enumerate(audio_files): 120 | 121 | file_name_basename = file_name[len(data_dir):] 122 | print('\n', file_cnt+1, 'of', len(audio_files), '\t', 
file_name_basename) 123 | 124 | # read audio file - skip file if can't read it 125 | read_fail, audio, file_dur, samp_rate, samp_rate_orig = read_audio(file_name, 126 | do_time_expansion, det.chunk_size, det.win_size) 127 | if read_fail: 128 | continue 129 | 130 | # run detector 131 | tic = time.time() 132 | det_time, det_prob = run_model(det, audio, file_dur, samp_rate, 133 | detection_thresh) 134 | toc = time.time() 135 | 136 | print(' detection time', round(toc-tic, 3), '(secs)') 137 | num_calls = len(det_time) 138 | print(' ' + str(num_calls) + ' calls found') 139 | 140 | # save results 141 | if save_individual_results: 142 | # save to AudioTagger format 143 | f_name_fmt = file_name_basename.replace('/', '_').replace('\\', '_')[:-4] 144 | op_file_name = os.path.join(op_ann_dir_ind, f_name_fmt) + '-sceneRect.csv' 145 | wo.create_audio_tagger_op(file_name_basename, op_file_name, det_time, 146 | det_prob, samp_rate_orig, class_name='bat') 147 | 148 | # save as dictionary 149 | if num_calls > 0: 150 | res = {'filename':file_name_basename, 'time':det_time, 'prob':det_prob} 151 | results.append(res) 152 | 153 | # save results for all files to large csv 154 | if save_summary_result and (len(results) > 0): 155 | print('\nsaving results to', op_file_name_total) 156 | wo.save_to_txt(op_file_name_total, results) 157 | else: 158 | print('no detections to save') 159 | -------------------------------------------------------------------------------- /bat_eval/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from distutils.extension import Extension 3 | from Cython.Build import cythonize 4 | import numpy 5 | from sys import platform 6 | 7 | extra_compile_args = [] 8 | extra_link_args = [] 9 | 10 | try: 11 | from Cython.Distutils.build_ext import build_ext 12 | except ImportError: 13 | print('Error: Cython not installed. please install by running "conda install cython". 
exiting') 14 | exit() 15 | 16 | if platform == "linux" or platform == "linux2": 17 | # linux 18 | extra_compile_args.append('-fopenmp') 19 | extra_compile_args.append('-ffast-math') 20 | extra_compile_args.append('-msse') 21 | extra_compile_args.append('-msse2') 22 | extra_compile_args.append('-msse3') 23 | extra_compile_args.append('-msse4') 24 | extra_compile_args.append('-s') 25 | extra_compile_args.append('-std=c99') 26 | extra_link_args.append('-fopenmp') 27 | 28 | elif platform == "darwin": 29 | # OS X 30 | extra_compile_args.append('-fopenmp') 31 | extra_compile_args.append('-ffast-math') 32 | extra_compile_args.append('-msse') 33 | extra_compile_args.append('-msse2') 34 | extra_compile_args.append('-msse3') 35 | extra_compile_args.append('-msse4') 36 | extra_compile_args.append('-s') 37 | extra_compile_args.append('-std=c99') 38 | extra_link_args.append('-fopenmp') 39 | 40 | import os 41 | os.environ["CC"] = "gcc-6" 42 | os.environ["CXX"] = "gcc-6" 43 | elif platform == "win32": 44 | # Windows 45 | pass 46 | 47 | extensions = [ 48 | Extension("nms", ["nms.pyx"], 49 | extra_compile_args=extra_compile_args, 50 | extra_link_args=extra_link_args) 51 | ] 52 | 53 | setup( 54 | ext_modules = cythonize(extensions), 55 | include_dirs=[numpy.get_include()] 56 | ) 57 | -------------------------------------------------------------------------------- /bat_eval/spectrogram.py: -------------------------------------------------------------------------------- 1 | from myskimage import gaussian 2 | import numpy as np 3 | import imp 4 | try: 5 | imp.find_module('pyfftw') 6 | pyfftw_installed = True 7 | import pyfftw 8 | except ImportError: 9 | pyfftw_installed = False 10 | 11 | 12 | class Spectrogram: 13 | fftw_inps = {} 14 | fftw_rfft = {} 15 | han_wins = {} 16 | 17 | def __init__(self, use_pyfftw=True): 18 | if not pyfftw_installed: 19 | use_pyfftw = False 20 | self.use_pyfftw = use_pyfftw 21 | 22 | @staticmethod 23 | def _denoise(spec): 24 | """ 25 | Perform denoising. 
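        The mean value of each frequency band is subtracted from that band,
        and any resulting negative values are clipped to zero.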
26 | """ 27 | me = np.mean(spec, 1) 28 | spec = spec - me[:, np.newaxis] 29 | 30 | # remove anything below 0 31 | spec.clip(min=0, out=spec) 32 | 33 | return spec 34 | 35 | @staticmethod 36 | def do_fft(inp, use_pyfftw=False, K=None): 37 | if not use_pyfftw: 38 | out = np.fft.rfft(inp, n=K, axis=0) 39 | out = out.astype('complex64') # numpy may be using double precision internally 40 | elif use_pyfftw: 41 | if not inp.shape in Spectrogram.fftw_inps: 42 | Spectrogram.fftw_inps[inp.shape] = pyfftw.empty_aligned(inp.shape, dtype='float32') 43 | Spectrogram.fftw_rfft[inp.shape] = pyfftw.builders.rfft(Spectrogram.fftw_inps[inp.shape], axis=0) 44 | Spectrogram.fftw_inps[inp.shape][:] = inp[:] 45 | out = (Spectrogram.fftw_rfft[inp.shape])() 46 | return out 47 | 48 | def gen_mag_spectrogram(self, x, fs, ms, overlap_perc, crop_spec=True, max_freq=256, min_freq=0): 49 | """ 50 | Computes magnitude spectrogram by specifying time 51 | """ 52 | 53 | x = x.astype(np.float32) 54 | 55 | nfft = int(ms*fs) 56 | noverlap = int(overlap_perc*nfft) 57 | 58 | # window data 59 | step = nfft - noverlap 60 | shape = (nfft, (x.shape[-1]-noverlap)//step) 61 | strides = (x.strides[0], step*x.strides[0]) 62 | x_wins = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides) 63 | 64 | # apply window 65 | if x_wins.shape not in Spectrogram.han_wins: 66 | Spectrogram.han_wins[x_wins.shape[0]] = np.hanning(x_wins.shape[0]).astype('float32') 67 | 68 | han_win = Spectrogram.han_wins[x_wins.shape[0]] 69 | x_wins_han = han_win[..., np.newaxis] * x_wins 70 | 71 | # do fft 72 | # note this will be much slower if x_wins_han.shape[0] is not a power of 2 73 | complex_spec = Spectrogram.do_fft(x_wins_han, self.use_pyfftw) 74 | 75 | # calculate magnitude 76 | mag_spec = complex_spec.real**2 + complex_spec.imag**2 77 | # calculate magnitude 78 | #mag_spec = (np.conjugate(complex_spec) * complex_spec).real 79 | # calculate magnitude 80 | #mag_spec = np.square(np.absolute(complex_spec)) 81 | 82 | # orient correctly and remove dc component 83 | spec = mag_spec[1:, :] 84 | spec = np.flipud(spec) 85 | 86 | # only keep the relevant bands 87 | # not really in frequency, better thought of as indices 88 | if crop_spec: 89 | spec = spec[-max_freq:-min_freq, :] 90 | 91 | # add some zeros if too small 92 | req_height = max_freq-min_freq 93 | if spec.shape[0] < req_height: 94 | zero_pad = np.zeros((req_height-spec.shape[0], spec.shape[1]), dtype=np.float32) 95 | spec = np.vstack((zero_pad, spec)) 96 | return spec 97 | 98 | 99 | def gen_spectrogram(self, audio_samples, sampling_rate, fft_win_length, fft_overlap, crop_spec=True, max_freq=256, min_freq=0): 100 | """ 101 | Compute spectrogram, crop and compute log. 102 | """ 103 | 104 | # compute spectrogram 105 | spec = self.gen_mag_spectrogram(audio_samples, sampling_rate, fft_win_length, fft_overlap, crop_spec, max_freq, min_freq) 106 | 107 | # perform log scaling - here the same as matplotlib 108 | log_scaling = 2.0 * (1.0 / sampling_rate) * (1.0/(np.abs(np.hanning(int(fft_win_length*sampling_rate)))**2).sum()) 109 | spec = np.log(1 + log_scaling*spec) 110 | 111 | return spec 112 | 113 | 114 | def process_spectrogram(self, spec, denoise_spec=True, smooth_spec=True, smooth_sigma=1.0): 115 | """ 116 | Denoises, and smooths spectrogram. 
117 | """ 118 | 119 | # denoise 120 | if denoise_spec: 121 | spec = Spectrogram._denoise(spec) 122 | 123 | # smooth the spectrogram 124 | if smooth_spec: 125 | spec = gaussian(spec, smooth_sigma) 126 | 127 | return spec 128 | -------------------------------------------------------------------------------- /bat_eval/wavs/test_file.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/macaodha/batdetect/7d4f7e43b4456d391eb832a612e7b134e341814d/bat_eval/wavs/test_file.wav -------------------------------------------------------------------------------- /bat_eval/write_op.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import datetime as dt 4 | import glob 5 | import os 6 | 7 | 8 | def save_to_txt(op_file, results): 9 | 10 | # takes a dictionary of results and saves to file 11 | with open(op_file, 'w') as file: 12 | head_str = 'file_name,detection_time,detection_prob' 13 | file.write(head_str + '\n') 14 | 15 | for ii in range(len(results)): 16 | for jj in range(len(results[ii]['prob'])): 17 | 18 | row_str = results[ii]['filename'] + ',' 19 | tm = round(results[ii]['time'][jj],3) 20 | pr = round(results[ii]['prob'][jj],3) 21 | row_str += str(tm) + ',' + str(pr) 22 | file.write(row_str + '\n') 23 | 24 | 25 | def create_audio_tagger_op(ip_file_name, op_file_name, st_times, det_confidence, 26 | samp_rate, class_name): 27 | # saves the detections in an audiotagger friendly format 28 | 29 | col_names = ['Filename', 'Label', 'LabelTimeStamp', 'Spec_NStep', 30 | 'Spec_NWin', 'Spec_x1', 'Spec_y1', 'Spec_x2', 'Spec_y2', 31 | 'LabelStartTime_Seconds', 'LabelEndTime_Seconds', 32 | 'LabelArea_DataPoints', 'DetectorConfidence'] 33 | 34 | nstep = 0.001 35 | nwin = 0.003 36 | call_width = 0.001 # code does not output call width so just make one up 37 | y_max = (samp_rate*nwin)/2.0 38 | num_calls = len(st_times) 39 | 40 | if num_calls == 0: 41 | da_at = pd.DataFrame(index=np.arange(0), columns=col_names) 42 | else: 43 | da_at = pd.DataFrame(index=np.arange(0, num_calls), columns=col_names) 44 | da_at['Spec_NStep'] = nstep 45 | da_at['Spec_NWin'] = nwin 46 | da_at['Label'] = 'bat' 47 | da_at['LabelTimeStamp'] = dt.datetime.now().isoformat() 48 | da_at['Spec_y1'] = 0 49 | da_at['Spec_y2'] = y_max 50 | da_at['Filename'] = ip_file_name 51 | 52 | for ii in np.arange(0, num_calls): 53 | 54 | st_time = st_times[ii] 55 | da_at.loc[ii, 'LabelStartTime_Seconds'] = np.round(st_time, 3) 56 | da_at.loc[ii, 'LabelEndTime_Seconds'] = np.round(st_time + call_width, 3) 57 | da_at.loc[ii, 'Label'] = class_name 58 | 59 | da_at.loc[ii, 'Spec_x1'] = np.round(st_time/nstep, 3) 60 | da_at.loc[ii, 'Spec_x2'] = np.round((st_time + call_width)/nstep, 3) 61 | 62 | da_at.loc[ii, 'DetectorConfidence'] = np.round(det_confidence[ii], 3) 63 | 64 | # save to disk 65 | da_at.to_csv(op_file_name, index=False) 66 | 67 | return da_at 68 | -------------------------------------------------------------------------------- /bat_train/classifier.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import evaluate as evl 3 | import cls_audio_forest as cls_rf 4 | import cls_cnn as cls_cnn 5 | import cls_segment as seg 6 | import create_results as res 7 | import time 8 | 9 | 10 | class Classifier: 11 | 12 | def __init__(self, params_): 13 | self.params = params_ 14 | if self.params.classification_model == 'rf_vanilla': 15 | self.model = 
cls_rf.AudioForest(self.params)
16 |         elif self.params.classification_model == 'cnn':
17 |             self.model = cls_cnn.NeuralNet(self.params)
18 |         elif self.params.classification_model == 'segment':
19 |             self.model = seg.SegmentAudio(self.params)
20 |         else:
21 |             print 'Invalid model specified'
22 | 
23 |     def save_features(self, files):
24 |         self.model.save_features(files)
25 | 
26 |     def train(self, files, gt_pos, durations):
27 |         """
28 |         Takes the file names and GT call positions and trains the model.
29 |         """
30 | 
31 |         positions, class_labels = generate_training_positions(files, gt_pos, durations, self.params)
32 | 
33 |         self.model.train(positions, class_labels, files, durations)
34 | 
35 |         # hard negative mining
36 |         if self.params.num_hard_negative_mining > 0 and self.params.classification_model != 'segment':
37 |             print '\nhard negative mining'
38 |             for hn in range(self.params.num_hard_negative_mining):
39 |                 print '\thnm round', hn
40 |                 positions, class_labels = self.do_hnm(files, gt_pos, durations, positions, class_labels)
41 |                 self.model.train(positions, class_labels, files, durations)
42 | 
43 |     def test_single(self, audio_samples, sampling_rate):
44 |         """
45 |         Pass the raw audio samples and it will make a prediction.
46 |         """
47 |         duration = audio_samples.shape[0]/float(sampling_rate)
48 |         nms_pos, nms_prob, y_prediction = self.model.test(file_duration=duration, audio_samples=audio_samples, sampling_rate=sampling_rate)
49 |         return nms_pos, nms_prob, y_prediction
50 | 
51 |     def test_batch(self, files, gt_pos, durations, save_results=False, op_im_dir=''):
52 |         """
53 |         Takes a list of files as input and runs the detector on them.
54 |         """
55 |         nms_pos = [None]*len(files)
56 |         nms_prob = [None]*len(files)
57 |         for ii, file_name in enumerate(files):
58 |             nms_pos[ii], nms_prob[ii], y_prediction = self.model.test(file_name=file_name, file_duration=durations[ii])
59 | 
60 |             # plot results
61 |             if save_results:
62 |                 aud_file = self.params.audio_dir + file_name + '.wav'
63 |                 res.plot_spec(op_im_dir + file_name, aud_file, gt_pos[ii], nms_pos[ii], nms_prob[ii], y_prediction, self.params, True)
64 | 
65 |         return nms_pos, nms_prob
66 | 
67 |     def do_hnm(self, files, gt_pos, durations, positions, class_labels):
68 |         """
69 |         Hard negative mining, adds high confidence false positives to the training set.
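        The current model is run over the training files, and any detection
        scoring above params.detection_prob that is not within a third of a
        window of a ground truth call is added back in as a negative example.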
70 | """ 71 | 72 | nms_pos, nms_prob = self.test_batch(files, gt_pos, durations, False, '') 73 | 74 | positions_new = [None]*len(nms_pos) 75 | class_labels_new = [None]*len(nms_pos) 76 | for ii in range(len(files)): 77 | 78 | # add the false positives that are above the detection threshold 79 | # and not too close to the GT 80 | poss_negs = nms_pos[ii][nms_prob[ii][:,0] > self.params.detection_prob] 81 | if gt_pos[ii].shape[0] > 0: 82 | # have the extra newaxis in case gt_pos[ii] shape changes in the future 83 | pw_distance = np.abs(poss_negs[np.newaxis, ...]-gt_pos[ii][:,0][..., np.newaxis]) 84 | dis_check = (pw_distance >= (self.params.window_size / 3)).mean(0) 85 | new_negs = poss_negs[dis_check==1] 86 | else: 87 | new_negs = poss_negs 88 | new_negs = new_negs[new_negs < (durations[ii]-self.params.window_size)] 89 | 90 | # add them to the training set 91 | positions_new[ii] = np.hstack((positions[ii], new_negs)) 92 | class_labels_new[ii] = np.vstack((class_labels[ii], np.zeros((new_negs.shape[0], 1)))) 93 | 94 | # sort 95 | sorted_inds = np.argsort(positions_new[ii]) 96 | positions_new[ii] = positions_new[ii][sorted_inds] 97 | class_labels_new[ii] = class_labels_new[ii][sorted_inds] 98 | 99 | return positions_new, class_labels_new 100 | 101 | 102 | def generate_training_positions(files, gt_pos, durations, params): 103 | positions = [None]*len(files) 104 | class_labels = [None]*len(files) 105 | for ii, ff in enumerate(files): 106 | positions[ii], class_labels[ii] = extract_train_position_from_file(gt_pos[ii], durations[ii], params) 107 | return positions, class_labels 108 | 109 | 110 | def extract_train_position_from_file(gt_pos, duration, params): 111 | """ 112 | Samples random negative locations for negs, making sure not to overlap with GT. 113 | 114 | gt_pos is the time in seconds that the call occurs. 115 | positions contains time in seconds of some negative and positive examples. 
116 | """ 117 | 118 | if gt_pos.shape[0] == 0: 119 | # dont extract any values if the file does not contain anything 120 | # we will use these ones for HNM later 121 | positions = np.zeros(0) 122 | class_labels = np.zeros((0,1)) 123 | else: 124 | shift = 0 # if there is augmentation this is how much we will add 125 | num_neg_calls = gt_pos.shape[0] 126 | pos_window = params.window_size / 2 # window around GT that is not sampled from 127 | pos = gt_pos[:, 0] 128 | 129 | # augmentation 130 | if params.add_extra_calls: 131 | shift = params.aug_shift 132 | num_neg_calls *= 3 133 | pos = np.hstack((gt_pos[:, 0] - shift, gt_pos[:, 0], gt_pos[:, 0] + shift)) 134 | 135 | # sample a set of negative locations - need to be sufficiently far away from GT 136 | pos_pad = np.hstack((0-params.window_size, gt_pos[:, 0], duration-params.window_size)) 137 | neg = [] 138 | cnt = 0 139 | while cnt < num_neg_calls: 140 | rand_pos = np.random.random()*pos_pad.max() 141 | if (np.abs(pos_pad - rand_pos) > (pos_window+shift)).mean() == 1: 142 | neg.append(rand_pos) 143 | cnt += 1 144 | neg = np.asarray(neg) 145 | 146 | # sort them 147 | positions = np.hstack((pos, neg)) 148 | sorted_inds = np.argsort(positions) 149 | positions = positions[sorted_inds] 150 | 151 | # create labels 152 | class_labels = np.vstack((np.ones((pos.shape[0], 1)), np.zeros((neg.shape[0], 1)))) 153 | class_labels = class_labels[sorted_inds] 154 | 155 | return positions, class_labels 156 | -------------------------------------------------------------------------------- /bat_train/cls_audio_forest.py: -------------------------------------------------------------------------------- 1 | import grad_features as gf 2 | import random_forest as rf 3 | import numpy as np 4 | from skimage.util.shape import view_as_windows 5 | from scipy.ndimage import zoom 6 | import pyximport; pyximport.install() 7 | import nms as nms 8 | from scipy.ndimage.filters import gaussian_filter1d 9 | import spectrogram as sp 10 | from skimage import filters 11 | from scipy.io import wavfile 12 | from skimage.util import view_as_blocks 13 | 14 | 15 | class AudioForest: 16 | 17 | def __init__(self, params_): 18 | self.params = params_ 19 | forest_params = rf.ForestParams(num_classes=2, trees=self.params.trees, 20 | depth=self.params.depth, min_cnt=self.params.min_cnt, tests=self.params.tests) 21 | self.forest = rf.Forest(forest_params) 22 | 23 | def train(self, positions, class_labels, files, durations): 24 | feats = [] 25 | labs = [] 26 | for ii, file_name in enumerate(files): 27 | 28 | local_feats = self.create_or_load_features(file_name) 29 | 30 | # convert time in file to integer 31 | positions_ratio = positions[ii] / durations[ii] 32 | train_inds = (positions_ratio*float(local_feats.shape[0])).astype('int') 33 | 34 | feats.append(local_feats[train_inds, :]) 35 | labs.append(class_labels[ii]) 36 | 37 | # flatten list of lists and set to correct output 38 | features = np.vstack(feats) 39 | labels = np.vstack(labs) 40 | print 'train size', features.shape 41 | self.forest.train(features, labels, False) 42 | 43 | def test(self, file_name=None, file_duration=None, audio_samples=None, sampling_rate=None): 44 | 45 | # compute features 46 | features = self.create_or_load_features(file_name, audio_samples, sampling_rate) 47 | 48 | # make prediction 49 | y_prediction = self.forest.test(features)[:, 1][:, np.newaxis] 50 | 51 | # smooth the output 52 | if self.params.smooth_op_prediction: 53 | y_prediction = gaussian_filter1d(y_prediction, self.params.smooth_op_prediction_sigma, 
axis=0) 54 | pos, prob = nms.nms_1d(y_prediction[:,0], self.params.nms_win_size, file_duration) 55 | 56 | return pos, prob, y_prediction 57 | 58 | def create_or_load_features(self, file_name=None, audio_samples=None, sampling_rate=None): 59 | """ 60 | Does 1 of 3 possible things 61 | 1) computes feature from audio samples directly 62 | 2) loads feature from disk OR 63 | 3) computes features from file name 64 | """ 65 | 66 | if file_name is None: 67 | features = compute_features(audio_samples, sampling_rate, self.params) 68 | else: 69 | if self.params.load_features_from_file: 70 | features = np.load(self.params.feature_dir + file_name + '.npy') 71 | else: 72 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 73 | features = compute_features(audio_samples, sampling_rate, self.params) 74 | return features 75 | 76 | def save_features(self, files): 77 | for file_name in files: 78 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 79 | features = compute_features(audio_samples, sampling_rate, self.params) 80 | np.save(self.params.feature_dir + file_name, features) 81 | 82 | 83 | def spatial_pool(ip, block_size): 84 | """ 85 | Does sum pooling to reduce dimensionality 86 | """ 87 | # make sure its evenly divisible by padding with last rows 88 | vert_diff = ip.shape[0]%int(block_size) 89 | horz_diff = ip.shape[1]%int(block_size) 90 | 91 | if vert_diff > 0: 92 | ip = np.vstack((ip, np.tile(ip[-1, :], ((block_size-vert_diff, 1))))) 93 | if horz_diff > 0: 94 | ip = np.hstack((ip, np.tile(ip[:, -1], ((block_size-horz_diff, 1))).T)) 95 | 96 | # get block_size*block_size non-overlapping blocks 97 | blocks = view_as_blocks(ip, (block_size, block_size)) 98 | 99 | # sum, could max etc. 100 | op = blocks.reshape(blocks.shape[0], blocks.shape[1], blocks.shape[2]*blocks.shape[3]).sum(2) 101 | 102 | return op 103 | 104 | 105 | def compute_features(audio_samples, sampling_rate, params): 106 | """ 107 | Computes feature vector given audio file name. 108 | Assumes all the spectrograms are the same size - this should be checked externally 109 | """ 110 | 111 | # load audio and create spectrogram 112 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, params.fft_win_length, params.fft_overlap, 113 | crop_spec=params.crop_spec, max_freq=params.max_freq, min_freq=params.min_freq) 114 | spectrogram = sp.process_spectrogram(spectrogram, denoise_spec=params.denoise, mean_log_mag=params.mean_log_mag, smooth_spec=params.smooth_spec) 115 | 116 | # pad with dummy features at the end to take into account the size of the sliding window 117 | if params.feature_type == 'raw': 118 | spec_win = view_as_windows(spectrogram, (spectrogram.shape[0], params.window_width))[0] 119 | spec_win = zoom(spec_win, (1, 0.5, 0.5), order=1) 120 | total_win_size = spectrogram.shape[1] 121 | 122 | elif params.feature_type == 'grad': 123 | grad = np.gradient(spectrogram) 124 | grad_mag = np.sqrt((grad[0]**2 + grad[1]**2)) 125 | total_win_size = spectrogram.shape[1] 126 | 127 | spec_win = view_as_windows(grad_mag, (grad_mag.shape[0], params.window_width))[0] 128 | spec_win = zoom(spec_win, (1, 0.5, 0.5), order=1) 129 | 130 | elif params.feature_type == 'max_freq': 131 | 132 | num_max_freqs = 3 # e.g. 1 means keep top 1, 2 means top 2, ... 
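        # for each time slice keep the amplitudes of the top num_max_freqs
        # frequency bins and the indices of those bins, stacked on top of
        # each other to give 2*num_max_freqs rows per slice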
133 | total_win_size = spectrogram.shape[1] 134 | max_freq = np.argsort(spectrogram, 0) 135 | max_amp = np.sort(spectrogram, 0) 136 | stacked = np.vstack((max_amp[-num_max_freqs:, :], max_freq[-num_max_freqs:, :])) 137 | 138 | spec_win = view_as_windows(stacked, (stacked.shape[0], params.window_width))[0] 139 | 140 | elif params.feature_type == 'hog': 141 | block_size = 4 142 | hog = gf.compute_hog(spectrogram, block_size) 143 | total_win_size = hog.shape[1] 144 | window_width_down = np.rint(params.window_width / float(block_size)) 145 | 146 | spec_win = view_as_windows(hog, (hog.shape[0], window_width_down, hog.shape[2]))[0] 147 | 148 | elif params.feature_type == 'grad_pool': 149 | grad = np.gradient(spectrogram) 150 | grad_mag = np.sqrt((grad[0]**2 + grad[1]**2)) 151 | 152 | down_sample_size = 4 153 | window_width_down = np.rint(params.window_width / float(down_sample_size)) 154 | grad_mag_pool = spatial_pool(grad_mag, down_sample_size) 155 | total_win_size = grad_mag_pool.shape[1] 156 | 157 | spec_win = view_as_windows(grad_mag_pool, (grad_mag_pool.shape[0], window_width_down))[0] 158 | 159 | elif params.feature_type == 'raw_pool': 160 | down_sample_size = 4 161 | window_width_down = np.rint(params.window_width / float(down_sample_size)) 162 | spec_pool = spatial_pool(spectrogram, down_sample_size) 163 | total_win_size = spec_pool.shape[1] 164 | 165 | spec_win = view_as_windows(spec_pool, (spec_pool.shape[0], window_width_down))[0] 166 | 167 | # pad on extra features at the end as the sliding window will mean its a different size 168 | features = spec_win.reshape((spec_win.shape[0], np.prod(spec_win.shape[1:]))) 169 | features = np.vstack((features, np.tile(features[-1, :], (total_win_size - features.shape[0], 1)))) 170 | features = features.astype(np.float32) 171 | 172 | return features 173 | 174 | -------------------------------------------------------------------------------- /bat_train/cls_cnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skimage.util.shape import view_as_windows 3 | from scipy.ndimage import zoom 4 | from scipy.ndimage.filters import gaussian_filter1d 5 | import spectrogram as sp 6 | from scipy.io import wavfile 7 | import pyximport; pyximport.install() 8 | import nms as nms 9 | 10 | import theano 11 | import lasagne 12 | from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer 13 | from lasagne.layers import Pool2DLayer as PoolLayer 14 | from lasagne.layers import DenseLayer 15 | 16 | 17 | class NeuralNet: 18 | 19 | def __init__(self, params_): 20 | self.params = params_ 21 | self.network = None 22 | 23 | def train(self, positions, class_labels, files, durations): 24 | feats = [] 25 | labs = [] 26 | for ii, file_name in enumerate(files): 27 | 28 | if positions[ii].shape[0] > 0: 29 | local_feats = self.create_or_load_features(file_name) 30 | 31 | # convert time in file to integer 32 | positions_ratio = positions[ii] / durations[ii] 33 | train_inds = (positions_ratio*float(local_feats.shape[0])).astype('int') 34 | 35 | feats.append(local_feats[train_inds, :, :, :]) 36 | labs.append(class_labels[ii]) 37 | 38 | # flatten list of lists and set to correct output size 39 | features = np.vstack(feats) 40 | labels = np.vstack(labs).astype(np.uint8)[:,0] 41 | print 'train size', features.shape 42 | 43 | # train network 44 | input_var = theano.tensor.tensor4('inputs') 45 | target_var = theano.tensor.ivector('targets') 46 | self.network = build_cnn(features.shape[2:], input_var, self.params.net_type) 47 
| 48 | prediction = lasagne.layers.get_output(self.network['prob']) 49 | loss = lasagne.objectives.categorical_crossentropy(prediction, target_var) 50 | loss = loss.mean() 51 | params = lasagne.layers.get_all_params(self.network['prob'], trainable=True) 52 | updates = lasagne.updates.nesterov_momentum( 53 | loss, params, learning_rate=self.params.learn_rate, momentum=self.params.moment) 54 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 55 | 56 | for epoch in range(self.params.num_epochs): 57 | # in each epoch, we do a full pass over the training data 58 | for batch in iterate_minibatches(features, labels, self.params.batchsize, shuffle=True): 59 | inputs, targets = batch 60 | train_fn(inputs, targets) 61 | 62 | # test function 63 | pred = lasagne.layers.get_output(self.network['prob'], deterministic=True)[:, 1] 64 | self.test_fn = theano.function([input_var], pred) 65 | 66 | def test(self, file_name=None, file_duration=None, audio_samples=None, sampling_rate=None): 67 | 68 | # compute features and perform classification 69 | features = self.create_or_load_features(file_name, audio_samples, sampling_rate) 70 | y_prediction = self.test_fn(features)[:, np.newaxis] 71 | 72 | # smooth the output prediction 73 | if self.params.smooth_op_prediction: 74 | y_prediction = gaussian_filter1d(y_prediction, self.params.smooth_op_prediction_sigma, axis=0) 75 | 76 | # perform non max suppression 77 | pos, prob = nms.nms_1d(y_prediction[:,0].astype(np.float), self.params.nms_win_size, file_duration) 78 | 79 | return pos, prob, y_prediction 80 | 81 | def create_or_load_features(self, file_name=None, audio_samples=None, sampling_rate=None): 82 | """ 83 | Does 1 of 3 possible things 84 | 1) computes feature from audio samples directly 85 | 2) loads feature from disk OR 86 | 3) computes features from file name 87 | """ 88 | 89 | if file_name is None: 90 | features = compute_features(audio_samples, sampling_rate, self.params) 91 | else: 92 | if self.params.load_features_from_file: 93 | features = np.load(self.params.feature_dir + file_name + '.npy') 94 | else: 95 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 96 | features = compute_features(audio_samples, sampling_rate, self.params) 97 | 98 | return features 99 | 100 | def save_features(self, files): 101 | for file_name in files: 102 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 103 | features = compute_features(audio_samples, sampling_rate, self.params) 104 | np.save(self.params.feature_dir + file_name, features) 105 | 106 | 107 | def iterate_minibatches(inputs, targets, batchsize, shuffle=False): 108 | # Note: this should not be used for testing as it creats even sized 109 | # minibatches so will skip some data 110 | indices = np.arange(len(inputs)) 111 | if shuffle: 112 | np.random.shuffle(indices) 113 | for start_idx in range(0, len(inputs) - batchsize + 1, batchsize): 114 | excerpt = indices[start_idx:start_idx + batchsize] 115 | yield inputs[excerpt], targets[excerpt] 116 | 117 | def build_cnn(ip_size, input_var, net_type): 118 | if net_type == 'big': 119 | net = network_big(ip_size, input_var) 120 | elif net_type == 'small': 121 | net = network_sm(ip_size, input_var) 122 | else: 123 | print 'Error: network not defined' 124 | return net 125 | 126 | def network_big(ip_size, input_var): 127 | net = {} 128 | net['input'] = lasagne.layers.InputLayer(shape=(None, 1, ip_size[0], ip_size[1]), input_var=input_var) 129 | net['conv1'] = 
ConvLayer(net['input'], 32, 3, pad=1) 130 | net['pool1'] = PoolLayer(net['conv1'], 2) 131 | net['conv2'] = ConvLayer(net['pool1'], 32, 3, pad=1) 132 | net['pool2'] = PoolLayer(net['conv2'], 2) 133 | net['conv3'] = ConvLayer(net['pool2'], 32, 3, pad=1) 134 | net['pool3'] = PoolLayer(net['conv3'], 2) 135 | net['fc1'] = DenseLayer(lasagne.layers.dropout(net['pool3'], p=0.5), num_units=256, nonlinearity=lasagne.nonlinearities.rectify) 136 | net['prob'] = DenseLayer(lasagne.layers.dropout(net['fc1'], p=0.5), num_units=2, nonlinearity=lasagne.nonlinearities.softmax) 137 | return net 138 | 139 | def network_sm(ip_size, input_var): 140 | net = {} 141 | net['input'] = lasagne.layers.InputLayer(shape=(None, 1, ip_size[0], ip_size[1]), input_var=input_var) 142 | net['conv1'] = ConvLayer(net['input'], 16, 3, pad=0) 143 | net['pool1'] = PoolLayer(net['conv1'], 2) 144 | net['conv2'] = ConvLayer(net['pool1'], 16, 3, pad=0) 145 | net['pool2'] = PoolLayer(net['conv2'], 2) 146 | net['fc1'] = DenseLayer(lasagne.layers.dropout(net['pool2'], p=0.5), num_units=64, nonlinearity=lasagne.nonlinearities.rectify) 147 | net['prob'] = DenseLayer(lasagne.layers.dropout(net['fc1'], p=0.5), num_units=2, nonlinearity=lasagne.nonlinearities.softmax) 148 | return net 149 | 150 | def compute_features(audio_samples, sampling_rate, params): 151 | """ 152 | Computes overlapping windows of spectrogram as input for CNN. 153 | """ 154 | 155 | # load audio and create spectrogram 156 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, params.fft_win_length, params.fft_overlap, 157 | crop_spec=params.crop_spec, max_freq=params.max_freq, min_freq=params.min_freq) 158 | spectrogram = sp.process_spectrogram(spectrogram, denoise_spec=params.denoise, mean_log_mag=params.mean_log_mag, smooth_spec=params.smooth_spec) 159 | 160 | # extract windows 161 | spec_win = view_as_windows(spectrogram, (spectrogram.shape[0], params.window_width))[0] 162 | spec_win = zoom(spec_win, (1, 0.5, 0.5), order=1) 163 | spec_width = spectrogram.shape[1] 164 | 165 | # make the correct size for CNN 166 | features = np.zeros((spec_width, 1, spec_win.shape[1], spec_win.shape[2]), dtype=np.float32) 167 | features[:spec_win.shape[0], 0, :, :] = spec_win 168 | 169 | return features 170 | -------------------------------------------------------------------------------- /bat_train/cls_segment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy.ndimage.morphology as morph 3 | from scipy.ndimage.filters import median_filter 4 | import scipy.ndimage 5 | from skimage.measure import regionprops 6 | import spectrogram as sp 7 | from scipy.io import wavfile 8 | 9 | class SegmentAudio: 10 | 11 | def __init__(self, params_): 12 | self.params = params_ 13 | 14 | def train(self, positions, class_labels, files, durations): 15 | # does not need to do anything 16 | pass 17 | 18 | def save_features(self, files): 19 | # does not need to do anything 20 | pass 21 | 22 | def test(self, file_name=None, file_duration=None, audio_samples=None, sampling_rate=None): 23 | 24 | sampling_rate, audio_samples = wavfile.read(self.params.audio_dir + file_name + '.wav') 25 | 26 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, self.params.fft_win_length, 27 | self.params.fft_overlap, crop_spec=self.params.crop_spec, max_freq=self.params.max_freq, 28 | min_freq=self.params.min_freq) 29 | spectrogram = sp.process_spectrogram(spectrogram, denoise_spec=self.params.denoise, 30 | 
mean_log_mag=self.params.mean_log_mag, smooth_spec=self.params.smooth_spec) 31 | 32 | # compute possible call locations 33 | pos = compute_position_from_segment(spectrogram, file_duration, self.params) 34 | prob = np.ones((pos.shape[0], 1)) # no probability information 35 | y_prediction = np.zeros((spectrogram.shape[1], 1)) # dummy 36 | 37 | return pos, prob, y_prediction 38 | 39 | 40 | def compute_position_from_segment(spec, file_duration, params): 41 | """ 42 | Based on Large-scale identification of birds in audio recordings 43 | http://ceur-ws.org/Vol-1180/CLEF2014wn-Life-Lasseck2014.pdf 44 | """ 45 | 46 | # median filter 47 | med_time = np.median(spec, 0)[np.newaxis, :] 48 | med_freq = np.median(spec, 1)[:, np.newaxis] 49 | med_freq_m = np.tile(med_freq, (1, spec.shape[1])) 50 | med_time_m = np.tile(med_time, (spec.shape[0], 1)) 51 | 52 | # binarize 53 | spec_t = np.logical_and((spec > params.median_mult*med_freq_m), (spec > params.median_mult*med_time_m)) 54 | 55 | # morphological operations 56 | spec_t_morph = morph.binary_closing(spec_t) 57 | spec_t_morph = morph.binary_dilation(spec_t_morph) 58 | spec_t_morph = median_filter(spec_t_morph, (2, 2)) 59 | 60 | # connected component and filter by size 61 | label_im, num_labels = scipy.ndimage.label(spec_t_morph) 62 | sizes = scipy.ndimage.sum(spec_t_morph, label_im, range(num_labels + 1)) 63 | mean_vals = scipy.ndimage.sum(spec, label_im, range(1, num_labels + 1)) 64 | mask_size = sizes < params.min_region_size 65 | remove_pixel = mask_size[label_im] 66 | label_im[remove_pixel] = 0 67 | labels = np.unique(label_im) 68 | label_im = np.searchsorted(labels, label_im) 69 | 70 | # get vertical positions 71 | num_calls = np.unique(label_im).shape[0]-1 # no zero 72 | props = regionprops(label_im) 73 | call_pos = np.zeros(num_calls) 74 | for ii, pp in enumerate(props): 75 | call_pos[ii] = pp['bbox'][1] / float(spec.shape[1]) 76 | 77 | # sort and convert to time as opposed to a ratio 78 | inds = call_pos.argsort() 79 | call_pos = call_pos[inds] * file_duration 80 | 81 | # remove overlapping calls - happens because of harmonics 82 | dis = np.triu(np.abs(call_pos[:, np.newaxis]-call_pos[np.newaxis, :])) 83 | dis = dis > params.min_overlap 84 | mask = np.triu(dis) + np.tril(np.ones([num_calls, num_calls])) 85 | valid_inds = mask.sum(0) == num_calls 86 | pos = call_pos[valid_inds] 87 | 88 | return pos 89 | -------------------------------------------------------------------------------- /bat_train/create_results.py: -------------------------------------------------------------------------------- 1 | import evaluate as evl 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | import os 5 | import spectrogram as sp 6 | from scipy.io import wavfile 7 | import seaborn as sns 8 | sns.set_style('whitegrid') 9 | 10 | 11 | def plot_prec_recall(alg_name, recall, precision, nms_prob=None): 12 | # average precision 13 | ave_prec = evl.calc_average_precision(recall, precision) 14 | print 'average precision (area) = %.3f ' % ave_prec 15 | 16 | # recall at 95% precision 17 | desired_precision = 0.95 18 | if np.where(precision >= desired_precision)[0].shape[0] > 0: 19 | recall_at_precision = recall[np.where(precision >= desired_precision)[0][-1]] 20 | else: 21 | recall_at_precision = 0 22 | 23 | print 'recall at', int(desired_precision*100), '% precision = ', "%.3f" % recall_at_precision 24 | plt.plot([0, 1.02], [desired_precision, desired_precision], 'b:', linewidth=1) 25 | plt.plot([recall_at_precision, recall_at_precision], [0, desired_precision], 'b:', 
linewidth=1) 26 | 27 | # create plot 28 | label_str = alg_name.ljust(8) + "%.3f" % ave_prec + ' ' + str(desired_precision) + ' rec %.3f' % recall_at_precision 29 | if recall.shape[0] == 1: 30 | plt.plot(recall, precision, 'o', label=label_str) 31 | else: 32 | plt.plot(recall, precision, '', label=label_str) 33 | 34 | # find different probability locations on curve 35 | if nms_prob is not None: 36 | conf = np.concatenate(nms_prob)[:, 0] 37 | for p_val in [0.9, 0.7, 0.5]: 38 | p_loc = np.where(np.sort(conf)[::-1] < p_val)[0] 39 | if p_loc.shape[0] > 0: 40 | plt.plot(recall[p_loc[0]], precision[p_loc[0]], 'o', color='#4C72B0') 41 | plt.text(recall[p_loc[0]]-0.05, precision[p_loc[0]]-0.05, str(p_val)) 42 | 43 | plt.ylabel('precision') 44 | plt.xlabel('recall') 45 | plt.axis((0, 1.02, 0, 1.02)) 46 | plt.legend(loc='lower left') 47 | plt.grid(1) 48 | plt.show() 49 | 50 | 51 | def plot_spec(op_file_name, ip_file, gt_pos, nms_pos, nms_prob, y_prediction, params, save_ims): 52 | 53 | # create spec 54 | sampling_rate, audio_samples = wavfile.read(ip_file) 55 | file_duration = audio_samples.shape[0] / float(sampling_rate) 56 | spectrogram = sp.gen_spectrogram(audio_samples, sampling_rate, params.fft_win_length, params.fft_overlap, 57 | crop_spec=params.crop_spec, max_freq=params.max_freq, min_freq=params.min_freq) 58 | 59 | if y_prediction is None: 60 | y_prediction = np.zeros((spectrogram.shape[1])) 61 | 62 | gt_pos_norm = (gt_pos/file_duration)*y_prediction.shape[0] 63 | nms_pos_norm = (nms_pos/file_duration)*y_prediction.shape[0] 64 | 65 | fig = plt.figure(1, figsize=(10, 6)) 66 | ax1 = plt.axes([0.05, 0.7, 0.9, 0.25]) 67 | ax0 = plt.axes([0.05, 0.05, 0.9, 0.60]) 68 | 69 | ax1.plot([0, y_prediction.shape[0]], [0.5, 0.5], 'k--', linewidth=0.5, label='pred') 70 | 71 | # plot gt 72 | for pt in gt_pos_norm: 73 | ax1.plot([pt, pt], [0, 1], 'g', linewidth=4, label='gt') 74 | 75 | # plot nms 76 | for p in range(len(nms_pos_norm)): 77 | ax1.plot([nms_pos_norm[p], nms_pos_norm[p]], [0, nms_prob[p]], 'r', linewidth=2, label='pred') 78 | 79 | ax1.plot(y_prediction) 80 | ax1.set_xlim(0, y_prediction.shape[0]) 81 | ax1.set_ylim(0, 1) 82 | ax1.xaxis.set_ticklabels([]) 83 | 84 | # plot image 85 | ax0.imshow(spectrogram, aspect='auto', cmap='plasma') 86 | ax0.xaxis.set_ticklabels([]) 87 | ax0.yaxis.set_ticklabels([]) 88 | plt.grid() 89 | 90 | if save_ims: 91 | fig.savefig(op_file_name + '.jpg') 92 | 93 | plt.close(1) 94 | -------------------------------------------------------------------------------- /bat_train/data/readme.md: -------------------------------------------------------------------------------- 1 | This directory should contain the following directories: 2 | baselines 3 | models 4 | train_test_split 5 | wav 6 | -------------------------------------------------------------------------------- /bat_train/data_set_params.py: -------------------------------------------------------------------------------- 1 | import time 2 | import numpy as np 3 | 4 | 5 | class DataSetParams: 6 | 7 | def __init__(self): 8 | 9 | # spectrogram generation 10 | self.spectrogram_params() 11 | 12 | # detection 13 | self.detection() 14 | 15 | # data 16 | self.spec_dir = '' 17 | self.audio_dir = '' 18 | 19 | self.save_features_to_disk = False 20 | self.load_features_from_file = False 21 | 22 | # hard negative mining 23 | self.num_hard_negative_mining = 2 # if 0 there won't be any 24 | 25 | # non max suppression - smoothing and window 26 | self.smooth_op_prediction = True # smooth the op parameters before nms 27 | 
self.smooth_op_prediction_sigma = 0.006 / self.time_per_slice 28 | self.nms_win_size = int(np.round(0.12 / self.time_per_slice)) #ie 21 samples at 0.02322 fft win size, 0.75 overlap 29 | 30 | # model 31 | self.classification_model = 'cnn' # rf_vanilla, segment, cnn 32 | 33 | # rf_vanilla params 34 | self.feature_type = 'grad_pool' # raw, grad, grad_pool, raw_pool, hog, max_freq 35 | self.trees = 50 36 | self.depth = 20 37 | self.min_cnt = 2 38 | self.tests = 5000 39 | 40 | # CNN params 41 | self.learn_rate = 0.01 42 | self.moment = 0.9 43 | self.num_epochs = 50 44 | self.batchsize = 256 45 | self.net_type = 'big' # big, small 46 | 47 | # segment params - these were cross validated on validation set 48 | self.median_mult = 5.0 # how much to treshold spectrograms - higher will mean less calls 49 | self.min_region_size = np.round(0.4/self.time_per_slice) # used to determine the thresholding - 65 for fft win 0.02322 50 | self.min_overlap = 0.1 # in secs, anything that overlaps by this much will be counted as 1 call 51 | 52 | # param name string 53 | self.model_identifier = time.strftime("%d_%m_%y_%H_%M_%S_") + self.classification_model + '_hnm_' + str(self.num_hard_negative_mining) 54 | if self.classification_model == 'rf_vanilla': 55 | self.model_identifier += '_feat_' + self.feature_type 56 | elif self.classification_model == 'cnn': 57 | self.model_identifier += '_lr_'+ str(self.learn_rate) + '_mo_'+ str(self.moment) + '_net_'+ self.net_type 58 | elif self.classification_model == 'segment': 59 | self.model_identifier += '_minSize_' + str(self.min_region_size) + '_minOverlap_' + str(self.min_overlap ) 60 | 61 | # misc 62 | self.run_parallel = True 63 | self.num_processes = 10 64 | self.add_extra_calls = True # sample some other positive calls near the GT 65 | self.aug_shift = 0.015 # unit seconds, add extra call either side of GT if augmenting 66 | 67 | def spectrogram_params(self): 68 | 69 | self.valid_file_length = 169345 # some files are longer than they should be 70 | 71 | # spectrogram generation 72 | self.fft_win_length = 0.02322 # ie 1024/44100.0 about 23 msecs. 73 | self.fft_overlap = 0.75 # this is a percent - previously was 768/1024 74 | self.time_per_slice = ((1-self.fft_overlap)*self.fft_win_length) 75 | 76 | self.denoise = True 77 | self.mean_log_mag = 0.5 # sensitive to the spectrogram scaling used 78 | self.smooth_spec = True # gaussian filter 79 | 80 | # throw away unnecessary frequencies, keep from bottom 81 | # TODO this only makes sense as a frequency when you know the sampling rate 82 | # better to think of these as indices 83 | self.crop_spec = True 84 | self.max_freq = 270 85 | self.min_freq = 10 86 | 87 | # if doing 192K files for training 88 | #self.fft_win_length = 0.02667 # i.e. 
512/19200 89 | #self.max_freq = 240 90 | #self.min_freq = 10 91 | 92 | def detection(self): 93 | self.window_size = 0.230 # 230 milliseconds (in time expanded, so 23 ms for not) 94 | # represent window size in terms of the number of time bins 95 | self.window_width = np.rint(self.window_size / ((1-self.fft_overlap)*self.fft_win_length)) 96 | self.detection_overlap = 0.1 # needs to be within x seconds of GT to be considered correct 97 | self.detection_prob = 0.5 # everything under this is considered background - used in HNM 98 | -------------------------------------------------------------------------------- /bat_train/evaluate.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.metrics import roc_curve, auc 3 | 4 | 5 | def compute_error_auc(op_str, gt, pred, prob): 6 | 7 | # classification error 8 | pred_int = (pred > prob).astype(np.int) 9 | class_acc = (pred_int == gt).mean() * 100.0 10 | 11 | # ROC - area under curve 12 | fpr, tpr, thresholds = roc_curve(gt, pred) 13 | roc_auc = auc(fpr, tpr) 14 | 15 | print op_str, ', class acc = %.3f, ROC AUC = %.3f' % (class_acc, roc_auc) 16 | #return class_acc, roc_auc 17 | 18 | 19 | def calc_average_precision(recall, precision): 20 | 21 | precision[np.isnan(precision)] = 0 22 | recall[np.isnan(recall)] = 0 23 | 24 | # pascal'12 way 25 | mprec = np.hstack((0, precision, 0)) 26 | mrec = np.hstack((0, recall, 1)) 27 | for ii in range(mprec.shape[0]-2, -1,-1): 28 | mprec[ii] = np.maximum(mprec[ii], mprec[ii+1]) 29 | inds = np.where(np.not_equal(mrec[1:], mrec[:-1]))[0]+1 30 | ave_prec = ((mrec[inds] - mrec[inds-1])*mprec[inds]).sum() 31 | 32 | return ave_prec 33 | 34 | 35 | def remove_end_preds(nms_pos_o, nms_prob_o, gt_pos_o, durations, win_size): 36 | # this filters out predictions and gt that are close to the end 37 | # this is a bit messy because of the shapes of gt_pos_o 38 | nms_pos = [] 39 | nms_prob = [] 40 | gt_pos = [] 41 | for ii in range(len(nms_pos_o)): 42 | valid_time = durations[ii] - win_size 43 | gt_cur = gt_pos_o[ii] 44 | if gt_cur.shape[0] > 0: 45 | gt_pos.append(gt_cur[:, 0][gt_cur[:, 0] < valid_time][..., np.newaxis]) 46 | else: 47 | gt_pos.append(gt_cur) 48 | 49 | valid_preds = nms_pos_o[ii] < valid_time 50 | nms_pos.append(nms_pos_o[ii][valid_preds]) 51 | nms_prob.append(nms_prob_o[ii][valid_preds, 0][..., np.newaxis]) 52 | return nms_pos, nms_prob, gt_pos 53 | 54 | 55 | def prec_recall_1d(nms_pos_o, nms_prob_o, gt_pos_o, durations, detection_overlap, win_size, remove_eof=True): 56 | """ 57 | nms_pos, nms_prob, and gt_pos are lists of numpy arrays specifying detection 58 | position, detection probability and GT position. 59 | Each list entry is a different file. 60 | Each entry in nms_pos is an array of length num_entries. For nms_prob and 61 | gt_pos its an array of size (num_entries, 1). 62 | 63 | durations is a array of the length of the number of files with each entry 64 | containing that file length in seconds. 65 | detection_overlap determines if a prediction is counted as correct or not. 66 | win_size is used to ignore predictions and ground truth at the end of an 67 | audio file. 68 | 69 | returns 70 | precision: fraction of retrieved instances that are relevant. 71 | recall: fraction of relevant instances that are retrieved. 
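    Both arrays are swept over detection confidence in descending order
    (PASCAL VOC style), so entry i corresponds to keeping only the i+1 most
    confident detections.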
72 | """ 73 | 74 | if remove_eof: 75 | # filter out the detections in both ground truth and predictions that are too 76 | # close to the end of the file - dont count them during eval 77 | nms_pos, nms_prob, gt_pos = remove_end_preds(nms_pos_o, nms_prob_o, gt_pos_o, durations, win_size) 78 | else: 79 | nms_pos = nms_pos_o 80 | nms_prob = nms_prob_o 81 | gt_pos = gt_pos_o 82 | 83 | # loop through each file 84 | true_pos = [] # correctly predicts the ground truth 85 | false_pos = [] # says there is a detection but isn't 86 | for ii in range(len(nms_pos)): 87 | num_preds = nms_pos[ii].shape[0] 88 | 89 | if num_preds > 0: # check to make sure it contains something 90 | num_gt = gt_pos[ii].shape[0] 91 | 92 | # for each set of predictions label them as true positive or false positive (i.e. 1-tp) 93 | tp = np.zeros(num_preds) 94 | distance_to_gt = np.abs(gt_pos[ii].ravel()-nms_pos[ii].ravel()[:, np.newaxis]) 95 | within_overlap = (distance_to_gt <= detection_overlap) 96 | 97 | # remove duplicate detections - assign to valid detection with highest prob 98 | for jj in range(num_gt): 99 | inds = np.where(within_overlap[:, jj])[0] # get the indices of all valid predictions 100 | if inds.shape[0] > 0: 101 | max_prob = np.argmax(nms_prob[ii][inds]) 102 | selected_pred = inds[max_prob] 103 | within_overlap[selected_pred, :] = False 104 | tp[selected_pred] = 1 # set as true positives 105 | true_pos.append(tp) 106 | false_pos.append(1 - tp) 107 | 108 | # calc precision and recall - sort confidence in descending order 109 | # PASCAL style 110 | conf = np.concatenate(nms_prob)[:, 0] 111 | num_gt = np.concatenate(gt_pos).shape[0] 112 | inds = np.argsort(conf)[::-1] 113 | true_pos_cat = np.concatenate(true_pos)[inds].astype(float) 114 | false_pos_cat = np.concatenate(false_pos)[inds].astype(float) # i.e. 1-true_pos_cat 115 | 116 | if (conf == conf[0]).sum() == conf.shape[0]: 117 | # all the probability values are the same therefore we will not sweep 118 | # the curve and instead will return a single value 119 | true_pos_sum = true_pos_cat.sum() 120 | false_pos_sum = false_pos_cat.sum() 121 | 122 | recall = np.asarray([true_pos_sum / float(num_gt)]) 123 | precision = np.asarray([(true_pos_sum / (false_pos_sum + true_pos_sum))]) 124 | 125 | elif inds.shape[0] > 0: 126 | # otherwise produce a list of values 127 | true_pos_cum = np.cumsum(true_pos_cat) 128 | false_pos_cum = np.cumsum(false_pos_cat) 129 | 130 | recall = true_pos_cum / float(num_gt) 131 | precision = (true_pos_cum / (false_pos_cum + true_pos_cum)) 132 | 133 | return precision, recall 134 | -------------------------------------------------------------------------------- /bat_train/export_detector_weights.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script outputs the weights of a trained model so the standalone detector copde 3 | can use it. 
4 | """ 5 | 6 | import cPickle as pickle 7 | from lasagne.layers.helper import get_all_param_values, get_output_shape, set_all_param_values 8 | import numpy as np 9 | import lasagne 10 | import theano 11 | import sys 12 | import cPickle as pickle 13 | import json 14 | 15 | save_detector = False 16 | 17 | print 'saving detector' 18 | model_dir = 'results/models/' 19 | model_file = model_dir + 'test_set_norfolk.mod' 20 | print model_file 21 | 22 | mod = pickle.load(open(model_file)) 23 | 24 | weights = get_all_param_values(mod.model.network['prob']) 25 | np.save(model_file[:-4], weights) 26 | print 'weights shape', len(weights) 27 | 28 | # save detection params 29 | mod_params = {'win_size':0, 'chunk_size':0, 'max_freq':0, 'min_freq':0, 30 | 'mean_log_mag':0, 'slice_scale':0, 'overlap':0, 31 | 'crop_spec':False, 'denoise':False, 'smooth_spec':False, 32 | 'nms_win_size':0, 'smooth_op_prediction_sigma':0} 33 | 34 | mod_params['win_size'] = mod.model.params.window_size 35 | mod_params['max_freq'] = mod.model.params.max_freq 36 | mod_params['min_freq'] = mod.model.params.min_freq 37 | mod_params['mean_log_mag'] = mod.model.params.mean_log_mag 38 | mod_params['slice_scale'] = mod.model.params.fft_win_length 39 | mod_params['overlap'] = mod.model.params.fft_overlap 40 | 41 | mod_params['crop_spec'] = mod.model.params.crop_spec 42 | mod_params['denoise'] = mod.model.params.denoise 43 | mod_params['smooth_spec'] = mod.model.params.smooth_spec 44 | 45 | mod_params['nms_win_size'] = int(mod.model.params.nms_win_size) 46 | mod_params['smooth_op_prediction_sigma'] = mod.model.params.smooth_op_prediction_sigma 47 | 48 | params_file = model_file[:-4] + '_params.p' 49 | with open(params_file, 'w') as fp: 50 | json.dump(mod_params, fp) 51 | -------------------------------------------------------------------------------- /bat_train/grad_features.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skimage.util import view_as_blocks, view_as_windows 3 | 4 | 5 | def compute_hog(arr, block_size=2, block_sum=True, num_orientations=6, block_normalize=False): 6 | """ 7 | Computes histogram of gradient feature for Random Forest. 
8 | """ 9 | 10 | # make sure the input is evenly divisible by the block size 11 | if block_sum: 12 | vert_diff = arr.shape[0]%int(block_size) 13 | horz_diff = arr.shape[1]%int(block_size) 14 | 15 | if vert_diff > 0: 16 | arr = np.vstack((arr, np.tile(arr[-1, :], ((vert_diff, 1))))) 17 | if horz_diff > 0: 18 | arr = np.hstack((arr, np.tile(arr[:, -1], ((horz_diff, 1))).T)) 19 | 20 | # compute gradient magnitude and orientation 21 | mag, orien = gradient_mag(arr) 22 | 23 | # quantize orientations 24 | bins = np.arange(0, np.pi+np.pi/num_orientations, np.pi/num_orientations) 25 | orien_quantized = np.argmin(np.abs(orien[:, :, np.newaxis] - bins[np.newaxis, :]), axis=2) 26 | orien_quantized[orien_quantized == num_orientations] = 0 27 | 28 | # create histogram 29 | hist_of_grads = np.zeros((mag.shape[0], mag.shape[1], num_orientations)) 30 | j, k = np.indices(mag.shape[:2]) 31 | hist_of_grads[j, k, orien_quantized] = mag 32 | 33 | # add mag as extra channel 34 | hist_of_grads = np.dstack((hist_of_grads, mag)) 35 | 36 | # sum over a block - note this is non-overlapping 37 | # note that we are assuming that hist_of_grads is evenly divisible by block_size 38 | if block_sum: 39 | blocks = view_as_blocks(hist_of_grads, (block_size, block_size, hist_of_grads.shape[2])) 40 | hist_of_grads = blocks.reshape(blocks.shape[0], blocks.shape[1], blocks.shape[2]*blocks.shape[3]*blocks.shape[4], blocks.shape[5]).sum(2) 41 | 42 | # L1 normalization 43 | if block_normalize: 44 | hist_of_grads = hist_of_grads / (hist_of_grads.sum(2) + 10e-6)[:, :, np.newaxis] 45 | 46 | return hist_of_grads 47 | 48 | 49 | def gradient_mag(arr): 50 | """ 51 | Computes gradient magnitude and orientation. 52 | """ 53 | gx = np.empty(arr.shape, dtype=np.double) 54 | gx[:, 0] = 0 55 | gx[:, -1] = 0 56 | gx[:, 1:-1] = arr[:, 2:] - arr[:, :-2] 57 | gy = np.empty(arr.shape, dtype=np.double) 58 | gy[0, :] = 0 59 | gy[-1, :] = 0 60 | gy[1:-1, :] = arr[2:, :] - arr[:-2, :] 61 | 62 | mag = np.sqrt((gx**2 + gy**2)) 63 | orien = np.arctan2(gx, gy) 64 | orien[orien < 0] += np.pi 65 | 66 | return mag, orien 67 | -------------------------------------------------------------------------------- /bat_train/nms.pyx: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | cimport numpy as np 3 | cimport cython 4 | 5 | cdef inline int int_min(int a, int b): return a if a < b else b 6 | 7 | @cython.boundscheck(False) 8 | def nms_1d(np.ndarray src, int win_size, float file_duration): 9 | """1D Non maximum suppression 10 | src: vector of length N 11 | """ 12 | 13 | cdef int src_cnt = 0 14 | cdef int max_ind = 0 15 | cdef int ii = 0 16 | cdef int ee = 0 17 | cdef int width = src.shape[0]-1 18 | cdef np.ndarray pos = np.empty(width, dtype=np.int) 19 | cdef int pos_cnt = 0 20 | while ii <= width: 21 | 22 | if max_ind < (ii - win_size): 23 | max_ind = ii - win_size 24 | 25 | ee = int_min(ii + win_size, width) 26 | 27 | while max_ind <= ee: 28 | src_cnt += 1 29 | if src[max_ind] > src[ii]: 30 | break 31 | max_ind += 1 32 | 33 | if max_ind > ee: 34 | pos[pos_cnt] = ii 35 | pos_cnt += 1 36 | max_ind = ii+1 37 | ii += win_size 38 | 39 | ii += 1 40 | 41 | pos = pos[:pos_cnt] 42 | val = src[pos] 43 | 44 | # remove peaks near the end 45 | inds = (pos + win_size) < src.shape[0] 46 | pos = pos[inds] 47 | val = val[inds] 48 | 49 | # set output to between 0 and 1, then put it in the correct time range 50 | pos = pos.astype(np.float) / src.shape[0] 51 | pos = pos*file_duration 52 | 53 | return pos, val[..., np.newaxis] 
54 | -------------------------------------------------------------------------------- /bat_train/nms_slow.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def nms_1d(src, win_size, file_duration): 4 | """1D Non maximum suppression 5 | src: vector of length N 6 | """ 7 | 8 | pos = [] 9 | src_cnt = 0 10 | max_ind = 0 11 | ii = 0 12 | ee = 0 13 | width = src.shape[0]-1 14 | while ii <= width: 15 | 16 | if max_ind < (ii - win_size): 17 | max_ind = ii - win_size 18 | 19 | ee = np.minimum(ii + win_size, width) 20 | 21 | while max_ind <= ee: 22 | src_cnt += 1 23 | if src[int(max_ind)] > src[int(ii)]: 24 | break 25 | max_ind += 1 26 | 27 | if max_ind > ee: 28 | pos.append(ii) 29 | max_ind = ii+1 30 | ii += win_size 31 | 32 | ii += 1 33 | 34 | pos = np.asarray(pos).astype(np.int) 35 | val = src[pos] 36 | 37 | # remove peaks near the end 38 | inds = (pos + win_size) < src.shape[0] 39 | pos = pos[inds] 40 | val = val[inds] 41 | 42 | # set output to between 0 and 1, then put it in the correct time range 43 | pos = pos / float(src.shape[0]) 44 | pos = pos*file_duration 45 | 46 | return pos, val[..., np.newaxis] 47 | 48 | 49 | def test_nms(): 50 | import matplotlib.pyplot as plt 51 | import numpy as np 52 | import pyximport; pyximport.install(reload_support=True) 53 | import nms as nms_fast 54 | 55 | y = np.sin(np.arange(1000)/100.0*np.pi) 56 | y = y + np.random.random(y.shape)*0.5 57 | win_size = int(0.1*y.shape[0]/2.0) 58 | 59 | pos, prob = nms_1d(y, win_size, y.shape[0]) 60 | pos_f, prob_f = nms_fast.nms_1d(y, win_size, y.shape[0]) 61 | 62 | print 'diff between implementations (prob) =', 1-np.isclose(prob_f, prob).mean() 63 | print 'diff between implementations (pos) =', 1-np.isclose(pos_f, pos).mean() 64 | 65 | plt.close('all') 66 | plt.plot(y) 67 | plt.plot((pos).astype('int'), prob, 'ro', ms=10) 68 | plt.plot((pos_f).astype('int'), prob_f, 'bo') # fast results overlaid on the slow ones 69 | plt.show() 70 | -------------------------------------------------------------------------------- /bat_train/random_forest.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from joblib import Parallel, delayed 3 | import weave # not in scipy any more - needs to be installed separately 4 | 5 | class ForestParams: 6 | def __init__(self, num_classes, trees=50, depth=20, min_cnt=2, tests=5000): 7 | self.num_tests = tests 8 | self.min_sample_cnt = min_cnt 9 | self.max_depth = depth 10 | self.num_trees = trees 11 | self.bag_size = 0.8 12 | self.train_parallel = True 13 | self.num_classes = num_classes # assumes that the classes are ordered from 0 to C 14 | 15 | 16 | class Node: 17 | 18 | def __init__(self, node_id, node_cnt, exs_at_node, impurity, probability): 19 | self.node_id = node_id # absolute node id 20 | self.node_cnt = node_cnt # id not including nodes that didn't get made 21 | self.exs_at_node = exs_at_node 22 | self.impurity = impurity 23 | self.num_exs = float(exs_at_node.shape[0]) 24 | self.is_leaf = True 25 | self.info_gain = 0.0 26 | 27 | # output 28 | self.probability = probability.copy() 29 | self.class_id = probability.argmax() 30 | 31 | # node test 32 | self.test_ind1 = 0 33 | self.test_thresh = 0.0 34 | 35 | def update_node(self, test_ind1, test_thresh, info_gain): 36 | self.test_ind1 = test_ind1 37 | self.test_thresh = test_thresh 38 | self.info_gain = info_gain 39 | self.is_leaf = False 40 | 41 | def create_child(self, test_res, impurity, prob, child_type, node_cnt): 42 | # save absolute
location in dataset 43 | inds_local = np.where(test_res)[0] 44 | inds = self.exs_at_node[inds_local] 45 | 46 | if child_type == 'left': 47 | self.left_node = Node(2*self.node_id+1, node_cnt, inds, impurity, prob) 48 | elif child_type == 'right': 49 | self.right_node = Node(2*self.node_id+2, node_cnt, inds, impurity, prob) 50 | 51 | def test(self, X): 52 | return X[self.test_ind1] < self.test_thresh 53 | 54 | def get_compact_node(self): 55 | # used for fast forest 56 | if not self.is_leaf: 57 | node_array = np.zeros(4) 58 | # dims 0 and 1 are reserved for indexing children 59 | node_array[2] = self.test_ind1 60 | node_array[3] = self.test_thresh 61 | else: 62 | node_array = np.zeros(2+self.probability.shape[0]) 63 | node_array[0] = -1 # indicates that its a leaf 64 | node_array[1] = self.node_cnt # the id of the node 65 | node_array[2:] = self.probability.copy() 66 | return node_array 67 | 68 | 69 | class Tree: 70 | 71 | def __init__(self, tree_id, tree_params): 72 | self.tree_id = tree_id 73 | self.tree_params = tree_params 74 | self.num_nodes = 0 75 | self.compact_tree = None # used for fast testing forest and small memory footprint 76 | 77 | def build_tree(self, X, Y, node): 78 | if (node.node_id < ((2.0**self.tree_params.max_depth)-1)) and (node.impurity > 0.0) \ 79 | and (self.optimize_node(np.take(X, node.exs_at_node, 0), np.take(Y, node.exs_at_node), node)): 80 | self.num_nodes += 2 81 | self.build_tree(X, Y, node.left_node) 82 | self.build_tree(X, Y, node.right_node) 83 | 84 | def train(self, X, Y): 85 | 86 | # bagging 87 | exs_at_node = np.random.choice(Y.shape[0], int(Y.shape[0]*self.tree_params.bag_size), replace=False) 88 | exs_at_node.sort() 89 | 90 | # compute impurity 91 | prob, impurity = self.calc_impurity(np.take(Y, exs_at_node), np.ones((exs_at_node.shape[0], 1), dtype='bool')) 92 | 93 | # create root 94 | self.root = Node(0, 0, exs_at_node, impurity, prob[:, 0]) 95 | self.num_nodes = 1 96 | 97 | # build tree 98 | self.build_tree(X, Y, self.root) 99 | 100 | # make compact version for fast testing 101 | self.compact_tree, _ = self.traverse_tree(self.root, np.zeros(0)) 102 | 103 | def traverse_tree(self, node, compact_tree_in): 104 | node_loc = compact_tree_in.shape[0] 105 | compact_tree = np.hstack((compact_tree_in, node.get_compact_node())) 106 | 107 | # this assumes that the index for the left and right child nodes are the first two 108 | if not node.is_leaf: 109 | compact_tree, compact_tree[node_loc] = self.traverse_tree(node.left_node, compact_tree) 110 | compact_tree, compact_tree[node_loc+1] = self.traverse_tree(node.right_node, compact_tree) 111 | 112 | return compact_tree, node_loc 113 | 114 | def test(self, X): 115 | op = np.zeros((X.shape[0], self.tree_params.num_classes)) 116 | 117 | # single dim test 118 | for ex_id in range(X.shape[0]): 119 | node = self.root 120 | while not node.is_leaf: 121 | if X[ex_id, node.test_ind1] < node.test_thresh: 122 | node = node.right_node 123 | else: 124 | node = node.left_node 125 | op[ex_id, :] = node.probability 126 | return op 127 | 128 | def test_fast(self, X): 129 | op = np.zeros((X.shape[0], self.tree_params.num_classes)) 130 | tree = self.compact_tree # work around 131 | 132 | #in memory: for non leaf node - 0 is lchild index, 1 is rchild, 2 is dim to test, 3 is threshold 133 | #in memory: for leaf node - 0 is leaf indicator -1, 1 is the node id, the rest is the probability for each class 134 | code = """ 135 | int ex_id, node_loc, c_it; 136 | for (ex_id=0; ex_id= self.tree_params.min_sample_cnt) & (num_exs_r >= 
self.tree_params.min_sample_cnt) 228 | 229 | successful_split = False 230 | if valid_inds.sum() > 0: 231 | # child node impurity 232 | prob_l, impurity_l = self.calc_impurity(y_local, ~test_res) 233 | prob_r, impurity_r = self.calc_impurity(y_local, test_res) 234 | 235 | # information gain - want the minimum 236 | num_exs_l_norm = num_exs_l/node.num_exs 237 | num_exs_r_norm = num_exs_r/node.num_exs 238 | #info_gain = - node.impurity + (num_exs_r_norm*impurity_r) + (num_exs_l_norm*impurity_l) 239 | info_gain = (num_exs_r_norm*impurity_r) + (num_exs_l_norm*impurity_l) 240 | 241 | # make sure we can only select from valid splits 242 | info_gain[~valid_inds] = info_gain.max() + 10e-10 # plus small constant 243 | best_split = info_gain.argmin() 244 | 245 | # create new child nodes and update current node 246 | node.update_node(test_inds1[best_split], test_thresh[best_split], info_gain[best_split]) 247 | node.create_child(~test_res[:, best_split], impurity_l[best_split], prob_l[:, best_split], 'left', self.num_nodes+1) 248 | node.create_child(test_res[:, best_split], impurity_r[best_split], prob_r[:, best_split], 'right', self.num_nodes+2) 249 | 250 | successful_split = True 251 | 252 | return successful_split 253 | 254 | 255 | ## Helper used to train the trees in parallel 256 | def train_forest_helper(t_id, X, Y, params, seed): 257 | #print 'tree', t_id 258 | np.random.seed(seed) 259 | tree = Tree(t_id, params) 260 | tree.train(X, Y) 261 | return tree 262 | 263 | 264 | class Forest: 265 | 266 | def __init__(self, params): 267 | self.params = params 268 | self.trees = [] 269 | 270 | def train(self, X, Y, delete_old_trees): 271 | if delete_old_trees: 272 | self.trees = [] 273 | 274 | if self.params.train_parallel: 275 | # need to seed the random number generator for each process 276 | seeds = np.random.random_integers(0, 10e8, self.params.num_trees) 277 | self.trees.extend(Parallel(n_jobs=-1)(delayed(train_forest_helper)(t_id, X, Y, self.params, seeds[t_id]) 278 | for t_id in range(self.params.num_trees))) 279 | else: 280 | #print 'Standard training' 281 | for t_id in range(self.params.num_trees): 282 | print 'tree', t_id 283 | tree = Tree(t_id, self.params) 284 | tree.train(X, Y) 285 | self.trees.append(tree) 286 | 287 | def test(self, X): 288 | op = np.zeros((X.shape[0], self.params.num_classes)) 289 | for tt, tree in enumerate(self.trees): 290 | op_local = tree.test_fast(X) 291 | op += op_local 292 | op /= float(len(self.trees)) 293 | return op 294 | 295 | def get_leaf_ids(self, X): 296 | op = np.zeros((X.shape[0], len(self.trees)), dtype=np.int64) 297 | for tt, tree in enumerate(self.trees): 298 | op[:, tt] = tree.get_leaf_ids(X) 299 | return op 300 | 301 | def delete_trees(self): 302 | del self.trees[:] 303 | -------------------------------------------------------------------------------- /bat_train/readme.md: -------------------------------------------------------------------------------- 1 | # Training Code 2 | 3 | 4 | ### Training 5 | 6 | ##### 1 Download Data 7 | Download the data from [here](http://visual.cs.ucl.ac.uk/pubs/batDetective). It contains: 8 | *baselines*: Results for the three different commercial packages we compared against. 9 | *models*: Pre-trained CNN models. 10 | *train_test_split*: the lists of training and test files and the times of the bat calls in each file. The training data comes from Bat Detective and the test sets have been manually verified (see the snippet below for how to inspect these files). 11 | *wav*: 4,246 time expanded .wav files from the iBats project.
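Each *train_test_split* file is a compressed numpy archive. A minimal sketch for inspecting one (file name and keys as loaded in *run_comparison.py*):

```
import numpy as np

d = np.load('data/train_test_split/test_set_bulgaria.npz')
print d.keys()  # train_pos, train_files, train_durations, test_pos, test_files, test_durations
print d['test_files'].shape[0], 'test files'
```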
12 | 13 | ##### 2 Run Training and Evaluation Code 14 | Running *run_comparison.py* recreates the results in the paper (up to random initialization). It trains CNN, Random Forest, and simple segmentation-based models and compares their performance to three commercial systems. 15 | 16 | 17 | ### Run Detector on Your Own Data 18 | Running *run_detector.py* loads a pre-trained CNN and performs detection on a directory of audio files. Make sure *data_dir* points to the directory containing your audio files and that you have a trained model on your computer - you can get one by training your own or by downloading a pre-trained one (details in the previous steps). If your data is already time expanded, set *do_time_expansion=False*. 19 | Note that `../bat_eval` also contains separate CPU based evaluation code for CNN_FAST. 20 | 21 | 22 | ### Requirements 23 | The full comparison takes about 1.5 hours to run on a desktop with an i7-6850K CPU, 32GB RAM, and a GTX 1080 on Ubuntu 16.04. You might get some warnings the first time the code is run. The code has been tested with the following package versions from Conda: 24 | `Python 2.7.12` 25 | `cython 0.24.1` 26 | `joblib 0.9.4` 27 | `lasagne 0.2.dev1` 28 | `libgcc 7.2.0` 29 | `matplotlib 2.0.2` 30 | `numpy 1.12.1` 31 | `pandas 0.19.2` 32 | `scipy 0.19.0` 33 | `scikit-image 0.13.0` 34 | `scikit-learn 0.19.0` 35 | `seaborn 0.8` 36 | `weave 0.16.0` 37 | 38 | 39 | ### Acknowledgements 40 | We are enormously grateful for the efforts and enthusiasm of the amazing iBats and Bat Detective volunteers. We would also like to thank Ian Agranat and Joe Szewczak for useful discussions and access to their systems. Finally, we would like to thank [Zooniverse](https://www.zooniverse.org/) for setting up and hosting the Bat Detective project. 41 | 42 | ### License 43 | Code, audio data, and annotations are available for research purposes only, i.e. non-commercial use. For any other use of the software or data please contact the authors. 44 | -------------------------------------------------------------------------------- /bat_train/run_comparison.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | import os 4 | import evaluate as evl 5 | import create_results as res 6 | from data_set_params import DataSetParams 7 | import classifier as clss 8 | import pandas as pd 9 | import cPickle as pickle 10 | 11 | 12 | def read_baseline_res(baseline_file_name, test_files): 13 | da = pd.read_csv(baseline_file_name) 14 | pos = [] 15 | prob = [] 16 | for ff in test_files: 17 | rr = da[da['Filename'] == ff] 18 | inds = np.argsort(rr.TimeInFile.values) 19 | pos.append(rr.TimeInFile.values[inds]) 20 | prob.append(rr.Quality.values[inds][..., np.newaxis]) 21 | return pos, prob 22 | 23 | 24 | if __name__ == '__main__': 25 | """ 26 | This script compares several different algorithms for bat echolocation detection. 27 | 28 | The results can vary by a few percent from run to run. If you don't want to 29 | run a specific model or baseline, comment it out.
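Precision/recall curves for all models are drawn into a single figure, which is saved at the end of the run as results/<test_set>_results.png and .pdf.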
30 | """ 31 | 32 | test_set = 'bulgaria' # can be one of: bulgaria, uk, norfolk 33 | data_set = 'data/train_test_split/test_set_' + test_set + '.npz' 34 | raw_audio_dir = 'data/wav/' 35 | base_line_dir = 'data/baselines/' 36 | result_dir = 'results/' 37 | model_dir = 'data/models/' 38 | if not os.path.isdir(result_dir): 39 | os.mkdir(result_dir) 40 | if not os.path.isdir(model_dir): 41 | os.mkdir(model_dir) 42 | print 'test set:', test_set 43 | plt.close('all') 44 | 45 | # train and test_pos are in units of seconds 46 | loaded_data_tr = np.load(data_set) 47 | train_pos = loaded_data_tr['train_pos'] 48 | train_files = loaded_data_tr['train_files'] 49 | train_durations = loaded_data_tr['train_durations'] 50 | test_pos = loaded_data_tr['test_pos'] 51 | test_files = loaded_data_tr['test_files'] 52 | test_durations = loaded_data_tr['test_durations'] 53 | 54 | # load parameters 55 | params = DataSetParams() 56 | params.audio_dir = raw_audio_dir 57 | 58 | # 59 | # CNN 60 | print '\ncnn' 61 | params.classification_model = 'cnn' 62 | model = clss.Classifier(params) 63 | # train and test 64 | model.train(train_files, train_pos, train_durations) 65 | nms_pos, nms_prob = model.test_batch(test_files, test_pos, test_durations, False, '') 66 | # compute precision recall 67 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, model.params.detection_overlap, model.params.window_size) 68 | res.plot_prec_recall('cnn', recall, precision, nms_prob) 69 | # save CNN model to file 70 | pickle.dump(model, open(model_dir + 'test_set_' + test_set + '.mod', 'wb')) 71 | 72 | # 73 | # random forest 74 | print '\nrandom forest' 75 | params.classification_model = 'rf_vanilla' 76 | model = clss.Classifier(params) 77 | # train and test 78 | model.train(train_files, train_pos, train_durations) 79 | nms_pos, nms_prob = model.test_batch(test_files, test_pos, test_durations, False, '') 80 | # compute precision recall 81 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, model.params.detection_overlap, model.params.window_size) 82 | res.plot_prec_recall('rf', recall, precision, nms_prob) 83 | 84 | # 85 | # segment 86 | print '\nsegment' 87 | params.classification_model = 'segment' 88 | model = clss.Classifier(params) 89 | # train and test 90 | model.train(train_files, train_pos, train_durations) 91 | nms_pos, nms_prob = model.test_batch(test_files, test_pos, test_durations, False, '') 92 | # compute precision recall 93 | precision, recall = evl.prec_recall_1d(nms_pos, nms_prob, test_pos, test_durations, model.params.detection_overlap, model.params.window_size) 94 | res.plot_prec_recall('segment', recall, precision, nms_prob) 95 | 96 | # 97 | # scanr 98 | scanr_bat_results = base_line_dir + 'scanr/test_set_'+ test_set +'_scanr.csv' 99 | if os.path.isfile(scanr_bat_results): 100 | print '\nscanr' 101 | scanr_pos, scanr_prob = read_baseline_res(scanr_bat_results, test_files) 102 | precision_scanr, recall_scanr = evl.prec_recall_1d(scanr_pos, scanr_prob, test_pos, test_durations, params.detection_overlap, params.window_size) 103 | res.plot_prec_recall('scanr', recall_scanr, precision_scanr) 104 | 105 | # 106 | # sonobat 107 | sono_bat_results = base_line_dir + 'sonobat/test_set_'+ test_set +'_sono.csv' 108 | if os.path.isfile(sono_bat_results): 109 | print '\nsonobat' 110 | sono_pos, sono_prob = read_baseline_res(sono_bat_results, test_files) 111 | precision_sono, recall_sono = evl.prec_recall_1d(sono_pos, sono_prob, test_pos, test_durations, 
params.detection_overlap, params.window_size) 112 | res.plot_prec_recall('sonobat', recall_sono, precision_sono) 113 | 114 | # 115 | # kaleidoscope 116 | kal_bat_results = base_line_dir + 'kaleidoscope/test_set_'+ test_set +'_kaleidoscope.csv' 117 | if os.path.isfile(kal_bat_results): 118 | print '\nkaleidoscope' 119 | kal_pos, kal_prob = read_baseline_res(kal_bat_results, test_files) 120 | precision_kal, recall_kal = evl.prec_recall_1d(kal_pos, kal_prob, test_pos, test_durations, params.detection_overlap, params.window_size) 121 | res.plot_prec_recall('kaleidoscope', recall_kal, precision_kal) 122 | 123 | # save results 124 | plt.savefig(result_dir + test_set + '_results.png') 125 | plt.savefig(result_dir + test_set + '_results.pdf') 126 | -------------------------------------------------------------------------------- /bat_train/run_detector.py: -------------------------------------------------------------------------------- 1 | from scipy.io import wavfile 2 | import numpy as np 3 | import cPickle as pickle 4 | import os 5 | import glob 6 | import time 7 | import write_op as wo 8 | import sys 9 | 10 | 11 | def read_audio(file_name, do_time_expansion, chunk_size, win_size): 12 | 13 | # try to read in audio file 14 | try: 15 | samp_rate_orig, audio = wavfile.read(file_name) 16 | except: 17 | print ' Error reading file' 18 | return True, None, None, None, None 19 | 20 | # convert to mono if stereo 21 | if len(audio.shape) == 2: 22 | print ' Warning: stereo file. Just taking right channel.' 23 | audio = audio[:, 1] 24 | file_dur = audio.shape[0] / float(samp_rate_orig) 25 | print ' dur', round(file_dur,3), '(secs) , fs', samp_rate_orig 26 | 27 | # original model is trained on time expanded data 28 | samp_rate = samp_rate_orig 29 | if do_time_expansion: 30 | samp_rate = int(samp_rate_orig/10.0) 31 | file_dur *= 10 32 | 33 | # pad with zeros so we can go right to the end 34 | multiplier = np.ceil(file_dur/float(chunk_size-win_size)) 35 | diff = multiplier*(chunk_size-win_size) - file_dur + win_size 36 | audio_pad = np.hstack((audio, np.zeros(int(diff*samp_rate)))) 37 | 38 | return False, audio_pad, file_dur, samp_rate, samp_rate_orig 39 | 40 | 41 | def run_detector(det, audio, file_dur, samp_rate, detection_thresh): 42 | 43 | det_time = [] 44 | det_prob = [] 45 | 46 | # files can be long so we split each up into separate (overlapping) chunks 47 | st_positions = np.arange(0, file_dur, det.chunk_size-det.params.window_size) 48 | for chunk_id, st_position in enumerate(st_positions): 49 | 50 | # take a chunk of the audio 51 | # should already be zero padded at the end so it's the correct size 52 | st_pos = int(st_position*samp_rate) 53 | en_pos = int(st_pos + det.chunk_size*samp_rate) 54 | audio_chunk = audio[st_pos:en_pos] 55 | 56 | # make predictions 57 | pos, prob, y_prediction = det.test_single(audio_chunk, samp_rate) 58 | prob = prob[:, 0] 59 | 60 | # remove predictions near the end (if not last chunk) and ones that are 61 | # below the detection threshold 62 | if chunk_id == (len(st_positions)-1): 63 | inds = (prob >= detection_thresh) 64 | else: 65 | inds = (prob >= detection_thresh) & (pos < (det.chunk_size-(det.params.window_size/2.0))) 66 | 67 | # convert detection time back into global time and save valid detections 68 | if pos.shape[0] > 0: 69 | det_time.append(pos[inds] + st_position) 70 | det_prob.append(prob[inds]) 71 | 72 | if len(det_time) > 0: 73 | det_time = np.hstack(det_time) 74 | det_prob = np.hstack(det_prob) 75 | 76 | # undo the effects of time expansion (note: do_time_expansion is a global set in __main__) 77 | if
do_time_expansion: 78 | det_time /= 10.0 79 | 80 | return det_time, det_prob 81 | 82 | 83 | if __name__ == "__main__": 84 | """ 85 | This code takes a directory of audio files and runs a CNN based bat call 86 | detector. It returns the time in file of the detection and the probability 87 | that the detection is a bat call. 88 | """ 89 | 90 | # params 91 | detection_thresh = 0.80 # make this smaller if you want more calls detected 92 | do_time_expansion = True # set to True if audio is not already time expanded 93 | save_res = True 94 | 95 | # load data - 96 | data_dir = 'path_to_data/' # path of the data that we run the model on 97 | op_ann_dir = 'results/' # where we will store the outputs 98 | op_file_name_total = op_ann_dir + 'op_file.csv' 99 | if not os.path.isdir(op_ann_dir): 100 | os.makedirs(op_ann_dir) 101 | 102 | # load gpu lasagne model 103 | model_dir = 'data/models/' 104 | model_file = model_dir + 'test_set_bulgaria.mod' 105 | det = pickle.load(open(model_file)) 106 | det.chunk_size = 4.0 107 | 108 | # read audio files 109 | audio_files = glob.glob(data_dir + '*.wav') 110 | 111 | # loop through audio files 112 | results = [] 113 | for file_cnt, file_name in enumerate(audio_files): 114 | 115 | file_name_root = file_name[len(data_dir):] 116 | print '\n', file_cnt+1, 'of', len(audio_files), '\t', file_name_root 117 | 118 | # read audio file - skip file if cannot read 119 | read_fail, audio, file_dur, samp_rate, samp_rate_orig = read_audio(file_name, 120 | do_time_expansion, det.chunk_size, det.params.window_size) 121 | if read_fail: 122 | continue 123 | 124 | # run detector 125 | tic = time.time() 126 | det_time, det_prob = run_detector(det, audio, file_dur, samp_rate, 127 | detection_thresh) 128 | toc = time.time() 129 | 130 | print ' detection time', round(toc-tic, 3), '(secs)' 131 | num_calls = len(det_time) 132 | print ' ' + str(num_calls) + ' calls found' 133 | 134 | # save results 135 | if save_res: 136 | # return detector results 137 | pred_classes = np.zeros((len(det_time), 1), dtype=np.int) 138 | pred_prob = np.asarray(det_prob)[..., np.newaxis] 139 | 140 | # save to AudioTagger format 141 | op_file_name = op_ann_dir + file_name_root[:-4] + '-sceneRect.csv' 142 | wo.create_audio_tagger_op(file_name_root, op_file_name, det_time, 143 | det_prob, pred_classes[:,0], pred_prob[:,0], 144 | samp_rate_orig, np.asarray(['bat'])) 145 | 146 | # save as dictionary 147 | if num_calls > 0: 148 | res = {'filename':file_name_root, 'time':det_time, 149 | 'prob':det_prob, 'pred_classes':pred_classes, 150 | 'pred_prob':pred_prob} 151 | results.append(res) 152 | 153 | # save to large csv 154 | if save_res and (len(results) > 0): 155 | print '\nsaving results to', op_file_name_total 156 | wo.save_to_txt(op_file_name_total, results, np.asarray(['bat'])) 157 | else: 158 | print 'no detections to save' 159 | -------------------------------------------------------------------------------- /bat_train/spectrogram.py: -------------------------------------------------------------------------------- 1 | from skimage import filters 2 | import numpy as np 3 | 4 | 5 | def denoise(spec_noisy, mask=None): 6 | """ 7 | Perform denoising, subtract mean from each frequency band. 8 | Mask chooses the relevant time steps to use. 
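When a mask is given, the masked and unmasked time steps are mean normalised separately, and the result is clipped at zero.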
9 | """ 10 | 11 | if mask is None: 12 | # no mask 13 | me = np.mean(spec_noisy, 1) 14 | spec_denoise = spec_noisy - me[:, np.newaxis] 15 | 16 | else: 17 | # user defined mask 18 | mask_inv = np.invert(mask) 19 | spec_denoise = spec_noisy.copy() 20 | 21 | if np.sum(mask) > 0: 22 | me = np.mean(spec_denoise[:, mask], 1) 23 | spec_denoise[:, mask] = spec_denoise[:, mask] - me[:, np.newaxis] 24 | 25 | if np.sum(mask_inv) > 0: 26 | me_inv = np.mean(spec_denoise[:, mask_inv], 1) 27 | spec_denoise[:, mask_inv] = spec_denoise[:, mask_inv] - me_inv[:, np.newaxis] 28 | 29 | # remove anything below 0 30 | spec_denoise.clip(min=0, out=spec_denoise) 31 | 32 | return spec_denoise 33 | 34 | 35 | def gen_mag_spectrogram_fft(x, nfft, noverlap): 36 | """ 37 | Compute magnitude spectrogram by specifying num bins. 38 | """ 39 | 40 | # window data 41 | step = nfft - noverlap 42 | shape = (nfft, (x.shape[-1]-noverlap)//step) 43 | strides = (x.strides[0], step*x.strides[0]) 44 | x_wins = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides) 45 | 46 | # apply window 47 | x_wins_han = np.hanning(x_wins.shape[0])[..., np.newaxis] * x_wins 48 | 49 | # do fft 50 | complex_spec = np.fft.rfft(x_wins_han, n=nfft, axis=0) 51 | 52 | # calculate magnitude 53 | mag_spec = np.conjugate(complex_spec) * complex_spec 54 | mag_spec = mag_spec.real 55 | # same as: 56 | #mag_spec = np.square(np.absolute(complex_spec)) 57 | 58 | # orient correctly and remove dc component 59 | mag_spec = mag_spec[1:, :] 60 | mag_spec = np.flipud(mag_spec) 61 | 62 | return mag_spec 63 | 64 | 65 | def gen_mag_spectrogram(x, fs, ms, overlap_perc): 66 | """ 67 | Computes magnitude spectrogram by specifying time. 68 | """ 69 | 70 | nfft = int(ms*fs) 71 | noverlap = int(overlap_perc*nfft) 72 | 73 | # window data 74 | step = nfft - noverlap 75 | shape = (nfft, (x.shape[-1]-noverlap)//step) 76 | strides = (x.strides[0], step*x.strides[0]) 77 | x_wins = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides) 78 | 79 | # apply window 80 | x_wins_han = np.hanning(x_wins.shape[0])[..., np.newaxis] * x_wins 81 | 82 | # do fft 83 | # note this will be much slower if x_wins_han.shape[0] is not a power of 2 84 | complex_spec = np.fft.rfft(x_wins_han, axis=0) 85 | 86 | # calculate magnitude 87 | mag_spec = (np.conjugate(complex_spec) * complex_spec).real 88 | # same as: 89 | #mag_spec = np.square(np.absolute(complex_spec)) 90 | 91 | # orient correctly and remove dc component 92 | spec = mag_spec[1:, :] 93 | spec = np.flipud(spec) 94 | 95 | return spec 96 | 97 | 98 | def gen_spectrogram(audio_samples, sampling_rate, fft_win_length, fft_overlap, crop_spec=True, max_freq=256, min_freq=0): 99 | """ 100 | Compute spectrogram, crop and compute log. 
101 | """ 102 | 103 | # compute spectrogram 104 | spec = gen_mag_spectrogram(audio_samples, sampling_rate, fft_win_length, fft_overlap) 105 | 106 | # only keep the relevant bands - could do this outside 107 | if crop_spec: 108 | spec = spec[-max_freq:-min_freq, :] 109 | 110 | # add some zeros if too small 111 | req_height = max_freq-min_freq 112 | if spec.shape[0] < req_height: 113 | zero_pad = np.zeros((req_height-spec.shape[0], spec.shape[1])) 114 | spec = np.vstack((zero_pad, spec)) 115 | 116 | # perform log scaling - here the same as matplotlib 117 | log_scaling = 2.0 * (1.0 / sampling_rate) * (1.0/(np.abs(np.hanning(int(fft_win_length*sampling_rate)))**2).sum()) 118 | spec = np.log(1.0 + log_scaling*spec) 119 | 120 | return spec 121 | 122 | 123 | def process_spectrogram(spec, denoise_spec=True, mean_log_mag=0.5, smooth_spec=True): 124 | """ 125 | Denoises, and smooths spectrogram. 126 | """ 127 | 128 | # denoise 129 | if denoise_spec: 130 | # use a mask as there is silence at the start and end of recs 131 | mask = spec.mean(0) > mean_log_mag 132 | spec = denoise(spec, mask) 133 | 134 | # smooth the spectrogram 135 | if smooth_spec: 136 | spec = filters.gaussian(spec, 1.0) 137 | 138 | return spec 139 | -------------------------------------------------------------------------------- /bat_train/write_op.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import datetime as dt 4 | import glob 5 | import os 6 | 7 | 8 | def save_to_txt(op_file, results, class_names): 9 | num_top_classes = results[0]['pred_prob'].shape[1] 10 | 11 | # takes a dictionary of results and saves to file 12 | with open(op_file, 'w') as file: 13 | head_str = 'file_name,detection_time,detection_prob' 14 | for cc in range(num_top_classes): 15 | head_str += ',class_' + str(cc) + ',prob_' + str(cc) 16 | file.write(head_str + '\n') 17 | 18 | for ii in range(len(results)): 19 | for jj in range(len(results[ii]['prob'])): 20 | 21 | row_str = results[ii]['filename'] + ',' 22 | tm = round(results[ii]['time'][jj],3) 23 | pr = round(results[ii]['prob'][jj],3) 24 | row_str += str(tm) + ',' + str(pr) 25 | 26 | for cc in range(num_top_classes): 27 | cl = class_names[results[ii]['pred_classes'][jj, cc]] 28 | pr = round(results[ii]['pred_prob'][jj, cc],3) 29 | row_str += ',' + cl + ',' + str(pr) 30 | 31 | file.write(row_str + '\n') 32 | 33 | 34 | def create_audio_tagger_op(ip_file_name, op_file_name, st_times, det_confidence, 35 | class_pred, class_prob, samp_rate, class_names): 36 | # saves the detections in an audiotagger friendly format 37 | 38 | col_names = ['Filename', 'Label', 'LabelTimeStamp', 'Spec_NStep', 39 | 'Spec_NWin', 'Spec_x1', 'Spec_y1', 'Spec_x2', 'Spec_y2', 40 | 'LabelStartTime_Seconds', 'LabelEndTime_Seconds', 41 | 'LabelArea_DataPoints', 'DetectorConfidence', 42 | 'ClassifierConfidence'] 43 | 44 | nstep = 0.001 45 | nwin = 0.003 46 | call_width = 0.001 # code does not output call width so just put in dummy value 47 | y_max = (samp_rate*nwin)/2.0 48 | num_calls = len(st_times) 49 | 50 | if num_calls == 0: 51 | da_at = pd.DataFrame(index=np.arange(0), columns=col_names) 52 | else: 53 | da_at = pd.DataFrame(index=np.arange(0, num_calls), columns=col_names) 54 | da_at['Spec_NStep'] = nstep 55 | da_at['Spec_NWin'] = nwin 56 | da_at['Label'] = 'bat' 57 | da_at['LabelTimeStamp'] = dt.datetime.now().isoformat() 58 | da_at['Spec_y1'] = 0 59 | da_at['Spec_y2'] = y_max 60 | da_at['Filename'] = ip_file_name 61 | 62 | for ii in np.arange(0, 
num_calls): 63 | 64 | st_time = st_times[ii] 65 | da_at.loc[ii, 'LabelStartTime_Seconds'] = st_time 66 | da_at.loc[ii, 'LabelEndTime_Seconds'] = st_time + call_width 67 | da_at.loc[ii, 'Label'] = class_names[class_pred[ii]] 68 | 69 | da_at.loc[ii, 'Spec_x1'] = st_time/nstep 70 | da_at.loc[ii, 'Spec_x2'] = (st_time + call_width)/nstep 71 | 72 | da_at.loc[ii, 'DetectorConfidence'] = round(det_confidence[ii], 3) 73 | da_at.loc[ii, 'ClassifierConfidence'] = round(class_prob[ii], 3) 74 | 75 | # save to disk 76 | da_at.to_csv(op_file_name, index=False) 77 | 78 | return da_at 79 | 80 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Bat Echolocation Call Detection in Audio Recordings 2 | Python code for the detection of bat echolocation calls in full spectrum audio recordings. This code recreates the results from the paper [Bat Detective - Deep Learning Tools for Bat Acoustic Signal Detection](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005995). You will also find some additional information and data on our [project page](http://visual.cs.ucl.ac.uk/pubs/batDetective). 3 | 4 | 5 | **Update Dec 2022:** We now have a new and improved codebase that you can access [here](https://github.com/macaodha/batdetect2). 6 | 7 | 8 | ## Training 9 | `bat_train` contains the code to train the models and recreate the plots in the paper. 10 | 11 | ## Running the Detector 12 | `bat_eval` contains lightweight python scripts that load a pretrained model and run the detector on a directory of audio files. No GPU is required for this step. 13 | 14 | 15 | ## Misc 16 | 17 | #### Video 18 | Here is a short video that describes how our system works. 19 | [![Screenshot](https://img.youtube.com/vi/u35jWHdhl-8/0.jpg)](https://www.youtube.com/watch?v=u35jWHdhl-8) 20 | 21 | 22 | #### Links 23 | [Nature Smart Cities](https://naturesmartcities.com) Deployment of smart audio detectors that use our code base to detect bats in East London. 24 | [Bat Detective](http://www.batdetective.org) Zooniverse citizen science project that was created to collect our training data. 25 | [iBats](http://www.bats.org.uk/pages/ibatsprogram.html) Global bat monitoring program. 26 | 27 | 28 | #### Reference 29 | If you find our work useful in your research please consider citing our paper: 30 | ``` 31 | @article{batdetect18, 32 | title = {Bat Detective - Deep Learning Tools for Bat Acoustic Signal Detection}, 33 | author = {Mac Aodha, Oisin and Gibb, Rory and Barlow, Kate and Browning, Ella and 34 | Firman, Michael and Freeman, Robin and Harder, Briana and Kinsey, Libby and 35 | Mead, Gary and Newson, Stuart and Pandourski, Ivan and Parsons, Stuart and 36 | Russ, Jon and Szodoray-Paradi, Abigel and Szodoray-Paradi, Farkas and 37 | Tilova, Elena and Girolami, Mark and Brostow, Gabriel and E. Jones, Kate.}, 38 | journal={PLOS Computational Biology}, 39 | year={2018} 40 | } 41 | ``` 42 | 43 | #### Acknowledgements 44 | We are enormously grateful for the efforts and enthusiasm of the amazing iBats and Bat Detective volunteers. We would also like to thank Ian Agranat and Joe Szewczak for useful discussions and access to their systems. Finally, we would like to thank [Zooniverse](https://www.zooniverse.org/) for setting up and hosting the Bat Detective project. 45 | --------------------------------------------------------------------------------