├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── conv13.sh ├── distributed_training.png ├── download_models.sh ├── eval_linear.py ├── eval_pretrain.py ├── eval_voc_classif.py ├── linear_classif_layers.sh ├── main.py ├── main.sh └── src ├── __init__.py ├── clustering.py ├── data ├── VOC2007.py ├── YFCC100M.py ├── __init__.py └── loader.py ├── distributed_kmeans.py ├── logger.py ├── model ├── __init__.py ├── model_factory.py ├── pretrain.py └── vgg16.py ├── slurm.py ├── trainer.py └── utils.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to. 4 | Please read the [full text](https://code.fb.com/codeofconduct/) 5 | so that you can understand what actions will and will not be tolerated. 6 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | In the context of this project, we do not expect pull requests. 4 | If you find a bug, or would like to suggest an improvement, please open an issue. 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-NonCommercial 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. 
More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution-NonCommercial 4.0 International Public 58 | License 59 | 60 | By exercising the Licensed Rights (defined below), You accept and agree 61 | to be bound by the terms and conditions of this Creative Commons 62 | Attribution-NonCommercial 4.0 International Public License ("Public 63 | License"). To the extent this Public License may be interpreted as a 64 | contract, You are granted the Licensed Rights in consideration of Your 65 | acceptance of these terms and conditions, and the Licensor grants You 66 | such rights in consideration of benefits the Licensor receives from 67 | making the Licensed Material available under these terms and 68 | conditions. 69 | 70 | Section 1 -- Definitions. 71 | 72 | a. Adapted Material means material subject to Copyright and Similar 73 | Rights that is derived from or based upon the Licensed Material 74 | and in which the Licensed Material is translated, altered, 75 | arranged, transformed, or otherwise modified in a manner requiring 76 | permission under the Copyright and Similar Rights held by the 77 | Licensor. For purposes of this Public License, where the Licensed 78 | Material is a musical work, performance, or sound recording, 79 | Adapted Material is always produced where the Licensed Material is 80 | synched in timed relation with a moving image. 81 | 82 | b. Adapter's License means the license You apply to Your Copyright 83 | and Similar Rights in Your contributions to Adapted Material in 84 | accordance with the terms and conditions of this Public License. 85 | 86 | c. Copyright and Similar Rights means copyright and/or similar rights 87 | closely related to copyright including, without limitation, 88 | performance, broadcast, sound recording, and Sui Generis Database 89 | Rights, without regard to how the rights are labeled or 90 | categorized. For purposes of this Public License, the rights 91 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 92 | Rights. 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. 
Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. NonCommercial means not primarily intended for or directed towards 116 | commercial advantage or monetary compensation. For purposes of 117 | this Public License, the exchange of the Licensed Material for 118 | other material subject to Copyright and Similar Rights by digital 119 | file-sharing or similar means is NonCommercial provided there is 120 | no payment of monetary compensation in connection with the 121 | exchange. 122 | 123 | j. Share means to provide material to the public by any means or 124 | process that requires permission under the Licensed Rights, such 125 | as reproduction, public display, public performance, distribution, 126 | dissemination, communication, or importation, and to make material 127 | available to the public including in ways that members of the 128 | public may access the material from a place and at a time 129 | individually chosen by them. 130 | 131 | k. Sui Generis Database Rights means rights other than copyright 132 | resulting from Directive 96/9/EC of the European Parliament and of 133 | the Council of 11 March 1996 on the legal protection of databases, 134 | as amended and/or succeeded, as well as other essentially 135 | equivalent rights anywhere in the world. 136 | 137 | l. You means the individual or entity exercising the Licensed Rights 138 | under this Public License. Your has a corresponding meaning. 139 | 140 | Section 2 -- Scope. 141 | 142 | a. License grant. 143 | 144 | 1. Subject to the terms and conditions of this Public License, 145 | the Licensor hereby grants You a worldwide, royalty-free, 146 | non-sublicensable, non-exclusive, irrevocable license to 147 | exercise the Licensed Rights in the Licensed Material to: 148 | 149 | a. reproduce and Share the Licensed Material, in whole or 150 | in part, for NonCommercial purposes only; and 151 | 152 | b. produce, reproduce, and Share Adapted Material for 153 | NonCommercial purposes only. 154 | 155 | 2. Exceptions and Limitations. For the avoidance of doubt, where 156 | Exceptions and Limitations apply to Your use, this Public 157 | License does not apply, and You do not need to comply with 158 | its terms and conditions. 159 | 160 | 3. Term. The term of this Public License is specified in Section 161 | 6(a). 162 | 163 | 4. Media and formats; technical modifications allowed. The 164 | Licensor authorizes You to exercise the Licensed Rights in 165 | all media and formats whether now known or hereafter created, 166 | and to make technical modifications necessary to do so. 
The 167 | Licensor waives and/or agrees not to assert any right or 168 | authority to forbid You from making technical modifications 169 | necessary to exercise the Licensed Rights, including 170 | technical modifications necessary to circumvent Effective 171 | Technological Measures. For purposes of this Public License, 172 | simply making modifications authorized by this Section 2(a) 173 | (4) never produces Adapted Material. 174 | 175 | 5. Downstream recipients. 176 | 177 | a. Offer from the Licensor -- Licensed Material. Every 178 | recipient of the Licensed Material automatically 179 | receives an offer from the Licensor to exercise the 180 | Licensed Rights under the terms and conditions of this 181 | Public License. 182 | 183 | b. No downstream restrictions. You may not offer or impose 184 | any additional or different terms or conditions on, or 185 | apply any Effective Technological Measures to, the 186 | Licensed Material if doing so restricts exercise of the 187 | Licensed Rights by any recipient of the Licensed 188 | Material. 189 | 190 | 6. No endorsement. Nothing in this Public License constitutes or 191 | may be construed as permission to assert or imply that You 192 | are, or that Your use of the Licensed Material is, connected 193 | with, or sponsored, endorsed, or granted official status by, 194 | the Licensor or others designated to receive attribution as 195 | provided in Section 3(a)(1)(A)(i). 196 | 197 | b. Other rights. 198 | 199 | 1. Moral rights, such as the right of integrity, are not 200 | licensed under this Public License, nor are publicity, 201 | privacy, and/or other similar personality rights; however, to 202 | the extent possible, the Licensor waives and/or agrees not to 203 | assert any such rights held by the Licensor to the limited 204 | extent necessary to allow You to exercise the Licensed 205 | Rights, but not otherwise. 206 | 207 | 2. Patent and trademark rights are not licensed under this 208 | Public License. 209 | 210 | 3. To the extent possible, the Licensor waives any right to 211 | collect royalties from You for the exercise of the Licensed 212 | Rights, whether directly or through a collecting society 213 | under any voluntary or waivable statutory or compulsory 214 | licensing scheme. In all other cases the Licensor expressly 215 | reserves any right to collect such royalties, including when 216 | the Licensed Material is used other than for NonCommercial 217 | purposes. 218 | 219 | Section 3 -- License Conditions. 220 | 221 | Your exercise of the Licensed Rights is expressly made subject to the 222 | following conditions. 223 | 224 | a. Attribution. 225 | 226 | 1. If You Share the Licensed Material (including in modified 227 | form), You must: 228 | 229 | a. retain the following if it is supplied by the Licensor 230 | with the Licensed Material: 231 | 232 | i. identification of the creator(s) of the Licensed 233 | Material and any others designated to receive 234 | attribution, in any reasonable manner requested by 235 | the Licensor (including by pseudonym if 236 | designated); 237 | 238 | ii. a copyright notice; 239 | 240 | iii. a notice that refers to this Public License; 241 | 242 | iv. a notice that refers to the disclaimer of 243 | warranties; 244 | 245 | v. a URI or hyperlink to the Licensed Material to the 246 | extent reasonably practicable; 247 | 248 | b. indicate if You modified the Licensed Material and 249 | retain an indication of any previous modifications; and 250 | 251 | c. 
indicate the Licensed Material is licensed under this 252 | Public License, and include the text of, or the URI or 253 | hyperlink to, this Public License. 254 | 255 | 2. You may satisfy the conditions in Section 3(a)(1) in any 256 | reasonable manner based on the medium, means, and context in 257 | which You Share the Licensed Material. For example, it may be 258 | reasonable to satisfy the conditions by providing a URI or 259 | hyperlink to a resource that includes the required 260 | information. 261 | 262 | 3. If requested by the Licensor, You must remove any of the 263 | information required by Section 3(a)(1)(A) to the extent 264 | reasonably practicable. 265 | 266 | 4. If You Share Adapted Material You produce, the Adapter's 267 | License You apply must not prevent recipients of the Adapted 268 | Material from complying with this Public License. 269 | 270 | Section 4 -- Sui Generis Database Rights. 271 | 272 | Where the Licensed Rights include Sui Generis Database Rights that 273 | apply to Your use of the Licensed Material: 274 | 275 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 276 | to extract, reuse, reproduce, and Share all or a substantial 277 | portion of the contents of the database for NonCommercial purposes 278 | only; 279 | 280 | b. if You include all or a substantial portion of the database 281 | contents in a database in which You have Sui Generis Database 282 | Rights, then the database in which You have Sui Generis Database 283 | Rights (but not its individual contents) is Adapted Material; and 284 | 285 | c. You must comply with the conditions in Section 3(a) if You Share 286 | all or a substantial portion of the contents of the database. 287 | 288 | For the avoidance of doubt, this Section 4 supplements and does not 289 | replace Your obligations under this Public License where the Licensed 290 | Rights include other Copyright and Similar Rights. 291 | 292 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 293 | 294 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 295 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 296 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 297 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 298 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 299 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 300 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 301 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 302 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 303 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 304 | 305 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 306 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 307 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 308 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 309 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 310 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 311 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 312 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 313 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 314 | 315 | c. 
The disclaimer of warranties and limitation of liability provided 316 | above shall be interpreted in a manner that, to the extent 317 | possible, most closely approximates an absolute disclaimer and 318 | waiver of all liability. 319 | 320 | Section 6 -- Term and Termination. 321 | 322 | a. This Public License applies for the term of the Copyright and 323 | Similar Rights licensed here. However, if You fail to comply with 324 | this Public License, then Your rights under this Public License 325 | terminate automatically. 326 | 327 | b. Where Your right to use the Licensed Material has terminated under 328 | Section 6(a), it reinstates: 329 | 330 | 1. automatically as of the date the violation is cured, provided 331 | it is cured within 30 days of Your discovery of the 332 | violation; or 333 | 334 | 2. upon express reinstatement by the Licensor. 335 | 336 | For the avoidance of doubt, this Section 6(b) does not affect any 337 | right the Licensor may have to seek remedies for Your violations 338 | of this Public License. 339 | 340 | c. For the avoidance of doubt, the Licensor may also offer the 341 | Licensed Material under separate terms or conditions or stop 342 | distributing the Licensed Material at any time; however, doing so 343 | will not terminate this Public License. 344 | 345 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 346 | License. 347 | 348 | Section 7 -- Other Terms and Conditions. 349 | 350 | a. The Licensor shall not be bound by any additional or different 351 | terms or conditions communicated by You unless expressly agreed. 352 | 353 | b. Any arrangements, understandings, or agreements regarding the 354 | Licensed Material not stated herein are separate from and 355 | independent of the terms and conditions of this Public License. 356 | 357 | Section 8 -- Interpretation. 358 | 359 | a. For the avoidance of doubt, this Public License does not, and 360 | shall not be interpreted to, reduce, limit, restrict, or impose 361 | conditions on any use of the Licensed Material that could lawfully 362 | be made without permission under this Public License. 363 | 364 | b. To the extent possible, if any provision of this Public License is 365 | deemed unenforceable, it shall be automatically reformed to the 366 | minimum extent necessary to make it enforceable. If the provision 367 | cannot be reformed, it shall be severed from this Public License 368 | without affecting the enforceability of the remaining terms and 369 | conditions. 370 | 371 | c. No term or condition of this Public License will be waived and no 372 | failure to comply consented to unless expressly agreed to by the 373 | Licensor. 374 | 375 | d. Nothing in this Public License constitutes or may be interpreted 376 | as a limitation upon, or waiver of, any privileges and immunities 377 | that apply to the Licensor or You, including from the legal 378 | processes of any jurisdiction or authority. 379 | 380 | ======================================================================= 381 | 382 | Creative Commons is not a party to its public 383 | licenses. Notwithstanding, Creative Commons may elect to apply one of 384 | its public licenses to material it publishes and in those instances 385 | will be considered the “Licensor.” The text of the Creative Commons 386 | public licenses is dedicated to the public domain under the CC0 Public 387 | Domain Dedication. 
Except for the limited purpose of indicating that 388 | material is shared under a Creative Commons public license or as 389 | otherwise permitted by the Creative Commons policies published at 390 | creativecommons.org/policies, Creative Commons does not authorize the 391 | use of the trademark "Creative Commons" or any other trademark or logo 392 | of Creative Commons without its prior written consent including, 393 | without limitation, in connection with any unauthorized modifications 394 | to any of its public licenses or any other arrangements, 395 | understandings, or agreements concerning use of licensed material. For 396 | the avoidance of doubt, this paragraph does not form part of the 397 | public licenses. 398 | 399 | Creative Commons may be contacted at creativecommons.org. 400 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DeeperCluster: Unsupervised Pre-training of Image Features on Non-Curated Data 2 | 3 | This code implements the unsupervised pre-training of convolutional neural networks, or convnets, as described in [Unsupervised Pre-training of Image Features on Non-Curated Data](https://arxiv.org/abs/1905.01278). 4 | 5 | ## Models 6 | We provide for download the following models: 7 | * DeeperCluster model trained on the full YFCC100M dataset; 8 | * DeepCluster [2] model trained on 1.3M images subset of the YFCC100M dataset; 9 | * RotNet [3] model trained on the full YFCC100M dataset; 10 | * RotNet [3] model trained on ImageNet dataset without labels. 11 | 12 | All these models follow a standard VGG-16 architecture with batch-normalization layers. 13 | Note that in Deep/DeeperCluster models, sobel filters are computed within the models as two convolutional layers (greyscale + sobel filters). 14 | The models expect RGB inputs that range in [0, 1]. You should preprocess your data before passing them to the released models by normalizing them: ```mean_rgb = [0.485, 0.456, 0.406]```; ```std_rgb = [0.229, 0.224, 0.225] ```. 15 | 16 | | Method / Dataset | YFCC100M | ImageNet | 17 | |--------------------|--------------------|-----------| 18 | | DeeperCluster | [ours](https://dl.fbaipublicfiles.com/deepcluster/ours/ours.pth) | - | 19 | | DeepCluster | [deepcluster_yfcc100M](https://dl.fbaipublicfiles.com/deepcluster/deepcluster/deepcluster_flickr.pth) trained on 1.3M images | [deepcluster_imagenet](https://dl.fbaipublicfiles.com/deepcluster/vgg16/checkpoint.pth.tar) (found [here](https://github.com/facebookresearch/deepcluster)) | 20 | | RotNet | [rotnet_yfcc100M](https://dl.fbaipublicfiles.com/deepcluster/rotnet/rotnet_flickr.pth) | [rotnet_imagenet](https://dl.fbaipublicfiles.com/deepcluster/rotnet/rotnet_imagenet.pth) | 21 | 22 | To automatically download all models you can run: 23 | ``` 24 | $ ./download_models.sh 25 | ``` 26 | 27 | ## Requirements 28 | - Python 3.6 29 | - [PyTorch](http://pytorch.org) install 1.0.0 30 | - [Apex](https://github.com/NVIDIA/apex) with CUDA extension 31 | - [Faiss](https://github.com/facebookresearch/faiss) GPU install 32 | - Download [YFCC100M dataset](https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67&guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAI-kwr4-KyuBJKrOUt3nzqR8H9hxu4cel43rHsFuk_4mKhjPoepAekZ7thVhdnOX-oLYek43-YMLIGQ5xmyPzU0Rc--RJsuRMSvqzpxxpug7Mg7XEv15bBS030Ood5TfcXwna_hjdbCtiPeoCOl5Knhog71KhdWnrFwuX2TloFFJ). 
The ids of the 95,920,149 images we managed to download can be found [here](https://dl.fbaipublicfiles.com/deepcluster/flickr_unique_ids.npy). `wget -c -P ./src/data/ "https://dl.fbaipublicfiles.com/deepcluster/flickr_unique_ids.npy"`
33 | 
34 | ## Unsupervised Learning of Visual Features
35 | 
36 | The script ```main.sh``` will run our method. Here is an overview of its main arguments:
37 | ```
38 | python main.py
39 | 
40 | ## handling experiment parameters
41 | --dump_path ./exp/       # Where to store the experiment
42 | 
43 | ## network params
44 | --pretrained PRETRAINED  # Use this instead of random weights
45 | 
46 | ## data params
47 | --data_path DATA_PATH    # Where to find YFCC100M dataset
48 | --size_dataset 100000000 # How many images to use for training
49 | --workers 10             # Number of data loading workers
50 | --sobel true             # Apply Sobel filter
51 | 
52 | ## optim params
53 | --lr 0.1                 # Learning rate
54 | --wd 0.00001             # Weight decay
55 | --nepochs 100            # Number of epochs to run
56 | --batch_size 48          # Batch size per process
57 | 
58 | ## model params
59 | --reassignment 3         # Reassign clusters every this many epochs
60 | --dim_pca 4096           # Dimension of the PCA applied to the descriptors
61 | --super_classes 16       # Total number of super-classes
62 | --rotnet true            # Network needs to classify large rotations
63 | 
64 | ## k-means params
65 | --k 320000               # Total number of clusters
66 | --warm_restart false     # Use previous centroids as init
67 | --use_faiss true         # Use faiss for E step in k-means
68 | --niter 10               # Number of k-means iterations
69 | 
70 | ## distributed training params
71 | --world-size 64          # Number of distributed processes
72 | --dist-url DIST_URL      # URL used to set up distributed training
73 | ```
74 | 
75 | You can look up the full training documentation with ```python main.py --help```.
76 | 
77 | ### Distributed training
78 | This implementation only supports distributed training.
79 | It has been specifically designed for multi-GPU and multi-node training and tested up to 128 GPUs distributed across 16 nodes of 8 GPUs each.
80 | You can run the code in two different scenarios:
81 | 
82 | * 1- Submit your job to a compute cluster. This code is adapted to the SLURM job scheduler, but you can modify it for your own scheduler.
83 | 
84 | * 2- Prepend `export NGPU=xx; python -m torch.distributed.launch --nproc_per_node=$NGPU` to the python file you want to execute (with xx the number of GPUs you want).
85 | For example, to run an experiment with a single GPU on a single machine, simply replace `python main.py` with:
86 | ```
87 | export NGPU=1; python -m torch.distributed.launch --nproc_per_node=$NGPU main.py
88 | ```
89 | 
90 | 
91 | The parameter `rank` is set automatically in both scenarios in [utils.py](./src/utils.py#L42).
92 | 
93 | The parameter `local_rank` is more or less useless.
94 | 
95 | The parameter `world-size` needs to be set manually in scenario 1 and is set automatically in scenario 2.
96 | 
97 | The parameter `dist-url` needs to be set manually in both scenarios. Refer to the PyTorch distributed [documentation](https://pytorch.org/docs/stable/distributed.html) to correctly set the initialization method.
98 | 
99 | 
100 | The total number of GPUs used for an experiment (```world-size```) must be divisible by the total number of super-classes (```super_classes```).
101 | Hence, exactly ```super_classes``` training communication groups are created, each containing ```world_size / super_classes``` GPUs.
102 | The parameters of the sub-class classifier specific to a super-class are shared within the corresponding training group.
103 | Each training group deals only with the subset of images and the rotation angle associated with its corresponding super-class.
104 | For this reason, computing batch statistics in the batch normalization layers for *the entire batch* (distributed across the different training groups) is crucial.
105 | We do so thanks to [apex](https://github.com/NVIDIA/apex/tree/master/apex/parallel#synchronized-batch-normalization).
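
As an illustration, here is a minimal sketch of how such training groups can be built with `torch.distributed`. This is not the repository's exact code (the real group creation lives in [utils.py](./src/utils.py#L42)), and the contiguous-rank layout and function name are assumptions for illustration only:
```
import torch.distributed as dist

def make_training_groups(world_size, super_classes):
    # one communication group per super-class, over contiguous ranks;
    # every process must call new_group with the same arguments
    assert world_size % super_classes == 0
    group_size = world_size // super_classes
    groups = []
    for c in range(super_classes):
        ranks = list(range(c * group_size, (c + 1) * group_size))
        groups.append(dist.new_group(ranks=ranks))
    return groups
```
With `world_size=16` and `super_classes=8`, this yields 8 groups of 2 GPUs each, matching the example illustrated further below.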
106 | 
107 | For the first stage of hierarchical clustering into ```nmb_super_clusters``` clusters, the entire pool of GPUs is used.
108 | Then for the second stage, we create ```nmb_super_clusters``` clustering communication groups of ```world_size / nmb_super_clusters``` GPUs each.
109 | Each of these clustering groups independently performs the second stage of hierarchical clustering on its corresponding subset of data (the data belonging to the associated super-cluster).
110 | 
111 | For example, as illustrated below, let's assume we want to run a training with 8 super-classes and we have access to a pool of 16 GPUs.
112 | As many distributed training communication groups as the number of super-classes are created.
113 | This corresponds to creating 8 training groups (in red) of 2 GPUs.
114 | Moreover, since each super-cluster is combined with the 4 rotation classes, the first level of the hierarchical k-means corresponds to the clustering of the data into 8/4=2 super-clusters.
115 | Hence, 2 clustering groups (in blue) are created.
116 | ![distributed](./distributed_training.png)
117 | 
118 | You can have a look [here](./src/utils.py#L42) for more details about how we define the different communication groups.
119 | Multi-node communication is handled automatically by SLURM.
120 | 
121 | 
122 | ### Running DeepCluster or RotNet
123 | Our implementation is generic enough to encompass both DeepCluster and RotNet trainings.
124 | * DeepCluster: set ```super_classes``` to ```1``` and ```rotnet``` to ```false```.
125 | * RotNet: set ```super_classes``` to ```4```, ```k``` to ```1``` and ```rotnet``` to ```true```.
126 | 
127 | ## Evaluation protocols
128 | 
129 | ### Pascal VOC
130 | 
131 | To reproduce our results on the PASCAL VOC 2007 classification task run:
132 | * FC6-8
133 | ```
134 | python eval_voc_classif.py --data_path $PASCAL_DATASET --fc6_8 true --pretrained downloaded_models/deepercluster/ours.pth --sobel true --lr 0.003 --wd 0.00001 --nit 150000 --stepsize 20000 --split trainval
135 | ```
136 | 
137 | * ALL
138 | ```
139 | python eval_voc_classif.py --data_path $PASCAL_DATASET --fc6_8 false --pretrained downloaded_models/deepercluster/ours.pth --sobel true --lr 0.003 --wd 0.0001 --nit 150000 --stepsize 10000 --split trainval
140 | ```
141 | 
142 | **Running the experiment with 5 seeds.**
143 | There are different sources of randomness in the code: classifier initialization, random crops for the evaluation, and training with CUDA.
144 | For more reliable results, we recommend running the experiment several times with different seeds (`--seed 36` for example).
145 | 
146 | **Hyper-parameters selection.**
147 | We select the values of the different hyper-parameters (weight decay `wd`, learning rate `lr`, and step size `stepsize`) by training on the train split and validating on the validation set.
148 | To do so, simply use `--split train`.
149 | 
150 | ### Linear classifiers
151 | 
152 | We train linear classifiers with a logistic loss on top of frozen convolutional layers at different depths.
153 | To reduce the influence of feature dimension in the comparison, we average-pool the features until their dimension is below 10k.
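
For instance, here is a sketch of the conv13 case, using the pooling sizes of the `RegLog` class in [eval_linear.py](./eval_linear.py) and assuming a 224x224 input: the conv13 features of VGG-16 have shape 512x14x14 (about 100k dimensions) and are average-pooled down to 512x4x4 = 8192 dimensions before the linear classifier.
```
import torch
import torch.nn as nn

feats = torch.randn(1, 512, 14, 14)        # frozen conv13 features
pooled = nn.AvgPool2d(3, stride=3)(feats)  # -> (1, 512, 4, 4)
flat = pooled.view(pooled.size(0), -1)     # 512 * 4 * 4 = 8192 < 10k
logits = nn.Linear(8192, 1000)(flat)       # linear classifier (ImageNet)
```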
154 | 
155 | To reproduce our results from Table-3 run: `./conv13.sh`.
156 | 
157 | To reproduce our results from Figure-2 run: `./linear_classif_layers.sh`.
158 | 
159 | **Learning rates.**
160 | We use the learning rate decay recommended for linear models with L2 regularization by Léon Bottou in [Stochastic Gradient Descent Tricks](https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/tricks-2012.pdf).
161 | 
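Concretely, `learning_rate_decay` in [eval_linear.py](./eval_linear.py) decays the learning rate at every iteration `t` as `lr_t = lr_0 / sqrt(1 + lr_0 * wd * t)`. A minimal sketch of this schedule (the default values below are the ones `conv13.sh` uses for ImageNet):
```
import numpy as np

def lr_at(t, lr_0=0.02, wd=0.00001):
    # same decay as learning_rate_decay in eval_linear.py
    return lr_0 / np.sqrt(1 + lr_0 * wd * t)
```
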
162 | **Hyper-parameters selection.**
163 | For experiments on Pascal, we select the value of the initial learning rate by training on the train split and validating on the validation set.
164 | To do so, simply use `--split train`.
165 | For experiments on ImageNet and Places, this code implements k-fold cross-validation.
166 | Simply set `--kfold 3` for 3-fold cross-validation.
167 | For example, set `--cross_valid 0` to train on splits 1 and 2 and validate on split 0.
168 | 
169 | **Checkpointing and distributed training.**
170 | This code implements automatic checkpointing and is adapted to distributed training on multiple GPUs and/or multiple nodes.
171 | 
172 | ### Pre-training for ImageNet
173 | 
174 | To reproduce our results on the pre-training for ImageNet experiment (Table-2) run:
175 | 
176 | ```
177 | mkdir -p ./exp/pretraining_imagenet/
178 | export NGPU=1; python -m torch.distributed.launch --nproc_per_node=$NGPU eval_pretrain.py --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --sobel2RGB true --nepochs 100 --batch_size 256 --lr 0.1 --wd 0.0001 --dump_path ./exp/pretraining_imagenet/ --data_path $DATAPATH_IMAGENET
179 | ```
180 | 
181 | **Checkpointing and distributed training.**
182 | This code implements automatic checkpointing and is specifically intended for distributed training on multiple GPUs and/or multiple nodes.
183 | The results in the paper for this experiment are obtained with training on 4 GPUs (the batch size per GPU is 64 in this case).
184 | 
185 | ## References
186 | 
187 | ### Unsupervised Pre-training of Image Features on Non-Curated Data
188 | 
189 | [1] M. Caron, P. Bojanowski, J. Mairal, A. Joulin [Unsupervised Pre-training of Image Features on Non-Curated Data](https://arxiv.org/abs/1905.01278)
190 | ```
191 | @inproceedings{caron2019unsupervised,
192 |   title={Unsupervised Pre-Training of Image Features on Non-Curated Data},
193 |   author={Caron, Mathilde and Bojanowski, Piotr and Mairal, Julien and Joulin, Armand},
194 |   booktitle={Proceedings of the International Conference on Computer Vision (ICCV)},
195 |   year={2019}
196 | }
197 | ```
198 | 
199 | 
200 | ### Deep clustering for unsupervised pre-training of visual features
201 | 
202 | [code](https://github.com/facebookresearch/deepcluster)
203 | 
204 | [2] M. Caron, P. Bojanowski, A. Joulin, M. Douze [*Deep clustering for unsupervised learning of visual features*](http://openaccess.thecvf.com/content_ECCV_2018/html/Mathilde_Caron_Deep_Clustering_for_ECCV_2018_paper.html)
205 | ```
206 | @inproceedings{caron2018deep,
207 |   title={Deep clustering for unsupervised learning of visual features},
208 |   author={Caron, Mathilde and Bojanowski, Piotr and Joulin, Armand and Douze, Matthijs},
209 |   booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
210 |   year={2018}
211 | }
212 | ```
213 | 
214 | ### Unsupervised representation learning by predicting image rotations
215 | [code](https://github.com/gidariss/FeatureLearningRotNet)
216 | 
217 | [3] S. Gidaris, P. Singh, N. Komodakis [*Unsupervised representation learning by predicting image rotations*](https://openreview.net/forum?id=S1v4N2l0-)
218 | 
219 | ```
220 | @inproceedings{
221 | gidaris2018unsupervised,
222 | title={Unsupervised Representation Learning by Predicting Image Rotations},
223 | author={Spyros Gidaris and Praveer Singh and Nikos Komodakis},
224 | booktitle={International Conference on Learning Representations},
225 | year={2018},
226 | url={https://openreview.net/forum?id=S1v4N2l0-},
227 | }
228 | ```
229 | 
230 | ## License
231 | 
232 | See the [LICENSE](LICENSE) file for more details.
233 | --------------------------------------------------------------------------------
/conv13.sh:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | #
 4 | # This source code is licensed under the license found in the
 5 | # LICENSE file in the root directory of this source tree.
 6 | #
 7 | 
 8 | DATAPATH_IMAGENET='path/to/imagenet/dataset'
 9 | DATAPATH_PLACES='path/to/places205/dataset'
10 | DATAPATH_PASCAL='path/to/pascal2007/dataset'
11 | 
12 | ##########################
13 | # DeeperCluster YFCC100M #
14 | ##########################
15 | 
16 | # ImageNet dataset
17 | EXP='./exp/eval_linear_imagenet/'
18 | mkdir -p $EXP
19 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100
20 | 
21 | # Places205 dataset
22 | EXP='./exp/eval_linear_places205/'
23 | mkdir -p $EXP
24 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100
25 | 
26 | # Pascal dataset
27 | EXP='./exp/eval_linear_pascal/'
28 | mkdir -p $EXP
29 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PASCAL --batch_size 128 --lr 0.02 --wd 0.00001 --nepochs 60
30 | --------------------------------------------------------------------------------
/distributed_training.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/DeeperCluster/d38ada109f8334f6ae4c84a218d79848a936ed6f/distributed_training.png
--------------------------------------------------------------------------------
/download_models.sh:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | #
 4 | # This source code is licensed under the license found in the
 5 | # LICENSE file in the root directory of this source tree.
6 | # 7 | #!/bin/bash 8 | 9 | MODELROOT="./downloaded_models" 10 | 11 | mkdir -p ${MODELROOT} 12 | 13 | for METHOD in deepercluster deepcluster rotnet 14 | do 15 | mkdir -p "${MODELROOT}/${METHOD}" 16 | 17 | # download our model 18 | if [ "$METHOD" = deepercluster ]; 19 | then 20 | wget -c "https://dl.fbaipublicfiles.com/deepcluster/ours/ours.pth" \ 21 | -P "${MODELROOT}/${METHOD}" 22 | fi 23 | 24 | # download deepcluster model trained on a 1.3M subset of YFCC100M 25 | if [ "$METHOD" = deepcluster ]; 26 | then 27 | wget -c "https://dl.fbaipublicfiles.com/deepcluster/${METHOD}/${METHOD}_flickr.pth" \ 28 | -P "${MODELROOT}/${METHOD}" 29 | fi 30 | 31 | # download rotnet models 32 | if [ "$METHOD" = rotnet ]; 33 | then 34 | for DATASET in flickr imagenet 35 | do 36 | wget -c "https://dl.fbaipublicfiles.com/deepcluster/${METHOD}/${METHOD}_${DATASET}.pth" \ 37 | -P "${MODELROOT}/${METHOD}" 38 | done 39 | fi 40 | done 41 | -------------------------------------------------------------------------------- /eval_linear.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | from logging import getLogger 10 | import os 11 | import time 12 | 13 | import numpy as np 14 | from sklearn import metrics 15 | import torch 16 | import torch.nn as nn 17 | import torch.utils.data 18 | 19 | from src.data.loader import load_data, get_data_transformations, KFold, per_target 20 | from src.model.model_factory import model_factory, to_cuda, sgd_optimizer 21 | from src.model.pretrain import load_pretrained 22 | from src.slurm import init_signal_handler, trigger_job_requeue 23 | from src.trainer import validate_network, accuracy 24 | from src.data.VOC2007 import VOC2007_dataset 25 | from src.utils import (bool_flag, init_distributed_mode, initialize_exp, AverageMeter, 26 | restart_from_checkpoint, fix_random_seeds,) 27 | 28 | logger = getLogger() 29 | 30 | 31 | def get_parser(): 32 | """ 33 | Generate a parameters parser. 
34 | """ 35 | # parse parameters 36 | parser = argparse.ArgumentParser(description="Train a linear classifier on conv layer") 37 | 38 | # main parameters 39 | parser.add_argument("--dump_path", type=str, default=".", 40 | help="Experiment dump path") 41 | parser.add_argument('--epoch', type=int, default=0, 42 | help='Current epoch to run') 43 | parser.add_argument('--start_iter', type=int, default=0, 44 | help='First iter to run in the current epoch') 45 | 46 | # model params 47 | parser.add_argument('--pretrained', type=str, default='', 48 | help='Use this instead of random weights.') 49 | parser.add_argument('--conv', type=int, default=1, choices=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 50 | help='On top of which layer train classifier.') 51 | 52 | # datasets params 53 | parser.add_argument('--data_path', type=str, default='', 54 | help='Where to find supervised dataset') 55 | parser.add_argument('--workers', type=int, default=8, 56 | help='Number of data loading workers') 57 | parser.add_argument('--sobel', type=bool_flag, default=False) 58 | 59 | # optim params 60 | parser.add_argument('--lr', type=float, default=0.05, help='Learning rate') 61 | parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay') 62 | parser.add_argument('--nepochs', type=int, default=100, 63 | help='Max number of epochs to run') 64 | parser.add_argument('--batch_size', default=64, type=int) 65 | 66 | # model selection 67 | parser.add_argument('--split', type=str, required=False, default='train', choices=['train', 'trainval'], 68 | help='for PASCAL dataset, train on train or train+val') 69 | parser.add_argument('--kfold', type=int, default=None, 70 | help="""dataset randomly partitioned into kfold equal sized subsamples. 71 | Default None: no cross validation: train on full train set""") 72 | parser.add_argument('--cross_valid', type=int, default=None, 73 | help='between 0 and kfold - 1: index of the round of cross validation') 74 | 75 | # distributed training params 76 | parser.add_argument('--rank', default=0, type=int, 77 | help='rank') 78 | parser.add_argument("--local_rank", type=int, default=-1, 79 | help="Multi-GPU - Local rank") 80 | parser.add_argument('--world-size', default=1, type=int, 81 | help='number of distributed processes') 82 | parser.add_argument('--dist-url', default='', type=str, 83 | help='url used to set up distributed training') 84 | 85 | # debug 86 | parser.add_argument("--debug_slurm", type=bool_flag, default=False, 87 | help="Debug within a SLURM job") 88 | 89 | return parser.parse_args() 90 | 91 | 92 | def main(args): 93 | 94 | # initialize the multi-GPU / multi-node training 95 | init_distributed_mode(args, make_communication_groups=False) 96 | 97 | # initialize the experiment 98 | logger, training_stats = initialize_exp(args, 'epoch', 'iter', 'prec', 99 | 'loss', 'prec_val', 'loss_val') 100 | 101 | # initialize SLURM signal handler for time limit / pre-emption 102 | init_signal_handler() 103 | 104 | if not 'pascal' in args.data_path: 105 | main_data_path = args.data_path 106 | args.data_path = os.path.join(main_data_path, 'train') 107 | train_dataset = load_data(args) 108 | else: 109 | train_dataset = VOC2007_dataset(args.data_path, split=args.split) 110 | 111 | args.test = 'val' if args.split == 'train' else 'test' 112 | if not 'pascal' in args.data_path: 113 | if args.cross_valid is None: 114 | args.data_path = os.path.join(main_data_path, 'val') 115 | val_dataset = load_data(args) 116 | else: 117 | val_dataset = VOC2007_dataset(args.data_path, 
split=args.test) 118 | 119 | if args.cross_valid is not None: 120 | kfold = KFold(per_target(train_dataset.imgs), args.cross_valid, args.kfold) 121 | train_loader = torch.utils.data.DataLoader( 122 | train_dataset, batch_size=args.batch_size, sampler=kfold.train, 123 | num_workers=args.workers, pin_memory=True) 124 | val_loader = torch.utils.data.DataLoader( 125 | val_dataset, batch_size=args.batch_size, sampler=kfold.val, 126 | num_workers=args.workers) 127 | 128 | else: 129 | train_loader = torch.utils.data.DataLoader( 130 | train_dataset, batch_size=args.batch_size, shuffle=True, 131 | num_workers=args.workers, pin_memory=True) 132 | val_loader = torch.utils.data.DataLoader( 133 | val_dataset, 134 | batch_size=args.batch_size, shuffle=False, 135 | num_workers=args.workers) 136 | 137 | # prepare the different data transformations 138 | tr_val, tr_train = get_data_transformations() 139 | train_dataset.transform = tr_train 140 | val_dataset.transform = tr_val 141 | 142 | # build model skeleton 143 | fix_random_seeds() 144 | model = model_factory(args) 145 | 146 | load_pretrained(model, args) 147 | 148 | # keep only conv layers 149 | model.body.classifier = None 150 | model.conv = args.conv 151 | 152 | if 'places' in args.data_path: 153 | nmb_classes = 205 154 | elif 'pascal' in args.data_path: 155 | nmb_classes = 20 156 | else: 157 | nmb_classes = 1000 158 | 159 | reglog = RegLog(nmb_classes, args.conv) 160 | 161 | # distributed training wrapper 162 | model = to_cuda(model, [args.gpu_to_work_on], apex=True) 163 | reglog = to_cuda(reglog, [args.gpu_to_work_on], apex=True) 164 | logger.info('model to cuda') 165 | 166 | # set optimizer 167 | optimizer = sgd_optimizer(reglog, args.lr, args.wd) 168 | 169 | ## variables to reload to fetch in checkpoint 170 | to_restore = {'epoch': 0, 'start_iter': 0} 171 | 172 | # re start from checkpoint 173 | restart_from_checkpoint( 174 | args, 175 | run_variables=to_restore, 176 | state_dict=reglog, 177 | optimizer=optimizer, 178 | ) 179 | args.epoch = to_restore['epoch'] 180 | args.start_iter = to_restore['start_iter'] 181 | 182 | model.eval() 183 | reglog.train() 184 | 185 | # Linear training 186 | for _ in range(args.epoch, args.nepochs): 187 | 188 | logger.info("============ Starting epoch %i ... 
============" % args.epoch) 189 | 190 | # train the network for one epoch 191 | scores = train_network(args, model, reglog, optimizer, train_loader) 192 | 193 | if not 'pascal' in args.data_path: 194 | scores_val = validate_network(val_loader, [model, reglog], args) 195 | else: 196 | scores_val = evaluate_pascal(val_dataset, [model, reglog]) 197 | 198 | scores = scores + scores_val 199 | 200 | # save training statistics 201 | logger.info(scores) 202 | training_stats.update(scores) 203 | 204 | 205 | def evaluate_pascal(val_dataset, models): 206 | 207 | val_loader = torch.utils.data.DataLoader( 208 | val_dataset, 209 | sampler=torch.utils.data.distributed.DistributedSampler(val_dataset), 210 | batch_size=1, 211 | num_workers=args.workers, 212 | pin_memory=True, 213 | ) 214 | 215 | for model in models: 216 | model.eval() 217 | gts = [] 218 | scr = [] 219 | for i, (input, target) in enumerate(val_loader): 220 | # move input to gpu and optionally reshape it 221 | input = input.cuda(non_blocking=True) 222 | 223 | # forward pass without grad computation 224 | with torch.no_grad(): 225 | output = models[0](input) 226 | output = models[1](output) 227 | scr.append(torch.sum(output, 0, keepdim=True).cpu().numpy()) 228 | gts.append(target) 229 | scr[i] += output.cpu().numpy() 230 | gts = np.concatenate(gts, axis=0).T 231 | scr = np.concatenate(scr, axis=0).T 232 | aps = [] 233 | for i in range(20): 234 | # Subtract eps from score to make AP work for tied scores 235 | ap = metrics.average_precision_score(gts[i][gts[i]<=1], scr[i][gts[i]<=1]-1e-5*gts[i][gts[i]<=1]) 236 | aps.append(ap) 237 | print(np.mean(aps), ' ', ' '.join(['%0.2f'%a for a in aps])) 238 | return np.mean(aps), 0 239 | 240 | 241 | class RegLog(nn.Module): 242 | """Creates logistic regression on top of frozen features""" 243 | def __init__(self, num_labels, conv): 244 | super(RegLog, self).__init__() 245 | if conv < 3: 246 | av = 18 247 | s = 9216 248 | elif conv < 5: 249 | av = 14 250 | s = 8192 251 | elif conv < 8: 252 | av = 9 253 | s = 9216 254 | elif conv < 11: 255 | av = 6 256 | s = 8192 257 | elif conv < 14: 258 | av = 3 259 | s = 8192 260 | self.av_pool = nn.AvgPool2d(av, stride=av, padding=0) 261 | self.linear = nn.Linear(s, num_labels) 262 | 263 | def forward(self, x): 264 | x = self.av_pool(x) 265 | x = x.view(x.size(0), -1) 266 | return self.linear(x) 267 | 268 | 269 | def train_network(args, model, reglog, optimizer, loader): 270 | """ 271 | Train the models on the dataset. 
272 | """ 273 | # running statistics 274 | batch_time = AverageMeter() 275 | data_time = AverageMeter() 276 | 277 | # training statistics 278 | log_top1 = AverageMeter() 279 | log_loss = AverageMeter() 280 | end = time.perf_counter() 281 | 282 | if 'pascal' in args.data_path: 283 | criterion = nn.BCEWithLogitsLoss(reduction='none') 284 | else: 285 | criterion = nn.CrossEntropyLoss().cuda() 286 | 287 | for iter_epoch, (inp, target) in enumerate(loader): 288 | # measure data loading time 289 | data_time.update(time.perf_counter() - end) 290 | 291 | learning_rate_decay(optimizer, len(loader) * args.epoch + iter_epoch, args.lr) 292 | 293 | # start at iter start_iter 294 | if iter_epoch < args.start_iter: 295 | continue 296 | 297 | # move to gpu 298 | inp = inp.cuda(non_blocking=True) 299 | target = target.cuda(non_blocking=True) 300 | if 'pascal' in args.data_path: 301 | target = target.float() 302 | 303 | # forward 304 | with torch.no_grad(): 305 | output = model(inp) 306 | output = reglog(output) 307 | 308 | # compute cross entropy loss 309 | loss = criterion(output, target) 310 | 311 | if 'pascal' in args.data_path: 312 | mask = (target == 255) 313 | loss = torch.sum(loss.masked_fill_(mask, 0)) / target.size(0) 314 | 315 | optimizer.zero_grad() 316 | 317 | # compute the gradients 318 | loss.backward() 319 | 320 | # step 321 | optimizer.step() 322 | 323 | # log 324 | 325 | # signal received, relaunch experiment 326 | if os.environ['SIGNAL_RECEIVED'] == 'True': 327 | if not args.rank: 328 | torch.save({ 329 | 'epoch': args.epoch, 330 | 'start_iter': iter_epoch + 1, 331 | 'state_dict': reglog.state_dict(), 332 | 'optimizer': optimizer.state_dict(), 333 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 334 | trigger_job_requeue(os.path.join(args.dump_path, 'checkpoint.pth.tar')) 335 | 336 | # update stats 337 | log_loss.update(loss.item(), output.size(0)) 338 | if not 'pascal' in args.data_path: 339 | prec1 = accuracy(args, output, target) 340 | log_top1.update(prec1.item(), output.size(0)) 341 | 342 | batch_time.update(time.perf_counter() - end) 343 | end = time.perf_counter() 344 | 345 | # verbose 346 | if iter_epoch % 100 == 0: 347 | logger.info('Epoch[{0}] - Iter: [{1}/{2}]\t' 348 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 349 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 350 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 351 | 'Prec {log_top1.val:.3f} ({log_top1.avg:.3f})\t' 352 | .format(args.epoch, iter_epoch, len(loader), batch_time=batch_time, 353 | data_time=data_time, loss=log_loss, log_top1=log_top1)) 354 | 355 | # end of epoch 356 | args.start_iter = 0 357 | args.epoch += 1 358 | 359 | # dump checkpoint 360 | if not args.rank: 361 | torch.save({ 362 | 'epoch': args.epoch, 363 | 'start_iter': 0, 364 | 'state_dict': reglog.state_dict(), 365 | 'optimizer': optimizer.state_dict(), 366 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 367 | 368 | return (args.epoch - 1, args.epoch * len(loader), log_top1.avg, log_loss.avg) 369 | 370 | 371 | def learning_rate_decay(optimizer, t, lr_0): 372 | for param_group in optimizer.param_groups: 373 | lr = lr_0 / np.sqrt(1 + lr_0 * param_group['weight_decay'] * t) 374 | param_group['lr'] = lr 375 | 376 | 377 | if __name__ == '__main__': 378 | 379 | # generate parser / parse parameters 380 | args = get_parser() 381 | 382 | # run experiment 383 | main(args) 384 | -------------------------------------------------------------------------------- /eval_pretrain.py: 
-------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | from logging import getLogger 10 | import math 11 | import os 12 | import shutil 13 | import time 14 | 15 | import torch 16 | import torch.nn as nn 17 | 18 | from src.data.loader import load_data, get_data_transformations 19 | from src.model.model_factory import model_factory, to_cuda, sgd_optimizer, sobel2RGB 20 | from src.slurm import init_signal_handler, trigger_job_requeue 21 | from src.trainer import validate_network, accuracy 22 | from src.utils import (bool_flag, init_distributed_mode, initialize_exp, AverageMeter, 23 | restart_from_checkpoint, fix_random_seeds,) 24 | from src.model.pretrain import load_pretrained 25 | 26 | 27 | logger = getLogger() 28 | 29 | 30 | def get_parser(): 31 | """ 32 | Generate a parameters parser. 33 | """ 34 | # parse parameters 35 | parser = argparse.ArgumentParser(description="Train classification") 36 | 37 | # main parameters 38 | parser.add_argument("--dump_path", type=str, default=".", 39 | help="Experiment dump path") 40 | parser.add_argument('--epoch', type=int, default=0, 41 | help='Current epoch to run') 42 | parser.add_argument('--start_iter', type=int, default=0, 43 | help='First iter to run in the current epoch') 44 | parser.add_argument("--checkpoint_freq", type=int, default=20, 45 | help="Save the model periodically ") 46 | parser.add_argument("--evaluate", type=bool_flag, default=False, 47 | help="Evaluate the model only") 48 | parser.add_argument('--seed', type=int, default=35, help='random seed') 49 | 50 | # model params 51 | parser.add_argument('--sobel', type=bool_flag, default=0) 52 | parser.add_argument('--sobel2RGB', type=bool_flag, default=False, 53 | help='Incorporate sobel filter in first conv') 54 | parser.add_argument('--pretrained', type=str, default='', 55 | help='Use this instead of random weights.') 56 | 57 | # datasets params 58 | parser.add_argument('--data_path', type=str, default='', 59 | help='Where to find ImageNet dataset') 60 | parser.add_argument('--workers', type=int, default=8, 61 | help='Number of data loading workers') 62 | 63 | # optim params 64 | parser.add_argument('--lr', type=float, default=0.05, help='Learning rate') 65 | parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay') 66 | parser.add_argument('--nepochs', type=int, default=100, 67 | help='Max number of epochs to run') 68 | parser.add_argument('--batch_size', default=128, type=int) 69 | 70 | # distributed training params 71 | parser.add_argument('--rank', default=0, type=int, 72 | help='rank') 73 | parser.add_argument("--local_rank", type=int, default=-1, 74 | help="Multi-GPU - Local rank") 75 | parser.add_argument('--world-size', default=1, type=int, 76 | help='number of distributed processes') 77 | parser.add_argument('--dist-url', default='', type=str, 78 | help='url used to set up distributed training') 79 | 80 | # debug 81 | parser.add_argument("--debug", type=bool_flag, default=False, 82 | help="Load val set of ImageNet") 83 | parser.add_argument("--debug_slurm", type=bool_flag, default=False, 84 | help="Debug within a SLURM job") 85 | 86 | return parser.parse_args() 87 | 88 | 89 | def main(args): 90 | 91 | # initialize the multi-GPU / multi-node training 92 | init_distributed_mode(args, 
make_communication_groups=False) 93 | 94 | # initialize the experiment 95 | logger, training_stats = initialize_exp(args, 'epoch', 'iter', 'prec', 96 | 'loss', 'prec_val', 'loss_val') 97 | 98 | # initialize SLURM signal handler for time limit / pre-emption 99 | init_signal_handler() 100 | 101 | main_data_path = args.data_path 102 | if args.debug: 103 | args.data_path = os.path.join(main_data_path, 'val') 104 | else: 105 | args.data_path = os.path.join(main_data_path, 'train') 106 | train_dataset = load_data(args) 107 | 108 | args.data_path = os.path.join(main_data_path, 'val') 109 | val_dataset = load_data(args) 110 | 111 | # prepare the different data transformations 112 | tr_val, tr_train = get_data_transformations() 113 | train_dataset.transform = tr_train 114 | val_dataset.transform = tr_val 115 | val_loader = torch.utils.data.DataLoader( 116 | val_dataset, 117 | batch_size=args.batch_size, 118 | num_workers=args.workers, 119 | pin_memory=True, 120 | ) 121 | 122 | # build model skeleton 123 | fix_random_seeds(args.seed) 124 | nmb_classes = 205 if 'places' in args.data_path else 1000 125 | model = model_factory(args, relu=True, num_classes=nmb_classes) 126 | 127 | # load pretrained weights 128 | load_pretrained(model, args) 129 | 130 | # merge sobel layers with first convolution layer 131 | if args.sobel2RGB: 132 | sobel2RGB(model) 133 | 134 | # re initialize classifier 135 | if hasattr(model.body, 'classifier'): 136 | for m in model.body.classifier.modules(): 137 | if isinstance(m, nn.Linear): 138 | m.weight.data.normal_(0, 0.01) 139 | m.bias.data.fill_(0.1) 140 | 141 | # distributed training wrapper 142 | model = to_cuda(model, [args.gpu_to_work_on], apex=True) 143 | logger.info('model to cuda') 144 | 145 | # set optimizer 146 | optimizer = sgd_optimizer(model, args.lr, args.wd) 147 | 148 | ## variables to reload to fetch in checkpoint 149 | to_restore = {'epoch': 0, 'start_iter': 0} 150 | 151 | # re start from checkpoint 152 | restart_from_checkpoint( 153 | args, 154 | run_variables=to_restore, 155 | state_dict=model, 156 | optimizer=optimizer, 157 | ) 158 | args.epoch = to_restore['epoch'] 159 | args.start_iter = to_restore['start_iter'] 160 | 161 | if args.evaluate: 162 | validate_network(val_loader, [model], args) 163 | return 164 | 165 | # Supervised training 166 | for _ in range(args.epoch, args.nepochs): 167 | 168 | logger.info("============ Starting epoch %i ... ============" % args.epoch) 169 | 170 | fix_random_seeds(args.seed + args.epoch) 171 | 172 | # train the network for one epoch 173 | adjust_learning_rate(optimizer, args) 174 | scores = train_network(args, model, optimizer, train_dataset) 175 | 176 | scores_val = validate_network(val_loader, [model], args) 177 | 178 | # save training statistics 179 | logger.info(scores + scores_val) 180 | training_stats.update(scores + scores_val) 181 | 182 | 183 | def adjust_learning_rate(optimizer, args): 184 | lr = args.lr * (0.1 ** (args.epoch // 30)) 185 | for param_group in optimizer.param_groups: 186 | param_group['lr'] = lr 187 | 188 | 189 | def train_network(args, model, optimizer, dataset): 190 | """ 191 | Train the models on the dataset. 
183 | def adjust_learning_rate(optimizer, args): 184 | lr = args.lr * (0.1 ** (args.epoch // 30)) 185 | for param_group in optimizer.param_groups: 186 | param_group['lr'] = lr 187 | 188 | 189 | def train_network(args, model, optimizer, dataset): 190 | """ 191 | Train the model on the dataset. 192 | """ 193 | # switch to train mode 194 | model.train() 195 | 196 | sampler = torch.utils.data.distributed.DistributedSampler(dataset) 197 | 198 | loader = torch.utils.data.DataLoader( 199 | dataset, 200 | sampler=sampler, 201 | batch_size=args.batch_size, 202 | num_workers=args.workers, 203 | pin_memory=True, 204 | ) 205 | 206 | # running statistics 207 | batch_time = AverageMeter() 208 | data_time = AverageMeter() 209 | 210 | # training statistics 211 | log_top1 = AverageMeter() 212 | log_loss = AverageMeter() 213 | end = time.perf_counter() 214 | 215 | cel = nn.CrossEntropyLoss().cuda() 216 | 217 | for iter_epoch, (inp, target) in enumerate(loader): 218 | # measure data loading time 219 | data_time.update(time.perf_counter() - end) 220 | 221 | # start at iter start_iter 222 | if iter_epoch < args.start_iter: 223 | continue 224 | 225 | # move to gpu 226 | inp = inp.cuda(non_blocking=True) 227 | target = target.cuda(non_blocking=True) 228 | 229 | # forward 230 | output = model(inp) 231 | 232 | # compute cross entropy loss 233 | loss = cel(output, target) 234 | 235 | optimizer.zero_grad() 236 | 237 | # compute the gradients 238 | loss.backward() 239 | 240 | # step 241 | optimizer.step() 242 | 243 | 244 | 245 | # signal received, checkpoint and relaunch the experiment 246 | if os.environ['SIGNAL_RECEIVED'] == 'True': 247 | if not args.rank: 248 | torch.save({ 249 | 'epoch': args.epoch, 250 | 'start_iter': iter_epoch + 1, 251 | 'state_dict': model.state_dict(), 252 | 'optimizer': optimizer.state_dict(), 253 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 254 | trigger_job_requeue(os.path.join(args.dump_path, 'checkpoint.pth.tar')) 255 | 256 | # update stats 257 | log_loss.update(loss.item(), output.size(0)) 258 | prec1 = accuracy(args, output, target) 259 | log_top1.update(prec1.item(), output.size(0)) 260 | 261 | batch_time.update(time.perf_counter() - end) 262 | end = time.perf_counter() 263 | 264 | # verbose 265 | if iter_epoch % 100 == 0: 266 | logger.info('Epoch[{0}] - Iter: [{1}/{2}]\t' 267 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 268 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 269 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 270 | 'Prec {log_top1.val:.3f} ({log_top1.avg:.3f})\t' 271 | .format(args.epoch, iter_epoch, len(loader), batch_time=batch_time, 272 | data_time=data_time, loss=log_loss, log_top1=log_top1)) 273 | 274 | # end of epoch 275 | args.start_iter = 0 276 | args.epoch += 1 277 | 278 | # dump checkpoint 279 | if not args.rank: 280 | torch.save({ 281 | 'epoch': args.epoch, 282 | 'start_iter': 0, 283 | 'state_dict': model.state_dict(), 284 | 'optimizer': optimizer.state_dict(), 285 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 286 | if not (args.epoch - 1) % args.checkpoint_freq: 287 | shutil.copyfile( 288 | os.path.join(args.dump_path, 'checkpoint.pth.tar'), 289 | os.path.join(args.dump_checkpoints, 290 | 'checkpoint' + str(args.epoch - 1) + '.pth.tar'), 291 | ) 292 | 293 | return (args.epoch - 1, args.epoch * len(loader), log_top1.avg, log_loss.avg) 294 | 295 | if __name__ == '__main__': 296 | 297 | # generate parser / parse parameters 298 | args = get_parser() 299 | 300 | # run experiment 301 | main(args) 302 | -------------------------------------------------------------------------------- /eval_voc_classif.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 
3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | import time 10 | 11 | import numpy as np 12 | import torch 13 | import torch.nn as nn 14 | import torch.optim 15 | import torch.utils.data 16 | import torchvision.transforms as transforms 17 | from sklearn import metrics 18 | 19 | from src.utils import AverageMeter, bool_flag, fix_random_seeds 20 | from src.trainer import accuracy 21 | from src.data.VOC2007 import VOC2007_dataset 22 | from src.model.model_factory import model_factory, sgd_optimizer 23 | from src.model.pretrain import load_pretrained 24 | 25 | parser = argparse.ArgumentParser() 26 | 27 | # model params 28 | parser.add_argument('--pretrained', type=str, required=False, default='', 29 | help='evaluate this model') 30 | 31 | # data params 32 | parser.add_argument('--data_path', type=str, default='', 33 | help='Where to find pascal 2007 dataset') 34 | parser.add_argument('--split', type=str, required=False, default='train', 35 | choices=['train', 'trainval'], help='training split') 36 | parser.add_argument('--sobel', type=bool_flag, default=False, help='If true, sobel applies') 37 | 38 | # transfer params 39 | parser.add_argument('--fc6_8', type=bool_flag, default=True, help='If true, train only the final classifier') 40 | parser.add_argument('--eval_random_crops', type=bool_flag, default=True, help='If true, eval on 10 random crops, otherwise eval on 10 fixed crops') 41 | 42 | # optim params 43 | parser.add_argument('--nit', type=int, default=150000, help='Number of training iterations') 44 | parser.add_argument('--stepsize', type=int, default=10000, help='Decay step') 45 | parser.add_argument('--lr', type=float, required=False, default=0.003, help='learning rate') 46 | parser.add_argument('--wd', type=float, required=False, default=1e-6, help='weight decay') 47 | 48 | parser.add_argument('--seed', type=int, default=1993, help='random seed') 49 | 50 | def main(): 51 | args = parser.parse_args() 52 | args.world_size = 1 53 | print(args) 54 | 55 | fix_random_seeds(args.seed) 56 | 57 | # create model 58 | model = model_factory(args, relu=True, num_classes=20) 59 | 60 | # load pretrained weights 61 | load_pretrained(model, args) 62 | 63 | model = model.cuda() 64 | print('model to cuda') 65 | 66 | # on which split to train 67 | if args.split == 'train': 68 | args.test = 'val' 69 | elif args.split == 'trainval': 70 | args.test = 'test' 71 | 72 | # data loader 73 | normalize = [transforms.Normalize(mean=[0.485, 0.456, 0.406], 74 | std=[0.229, 0.224, 0.225])] 75 | dataset = VOC2007_dataset(args.data_path, split=args.split, transform=transforms.Compose([ 76 | transforms.RandomHorizontalFlip(), 77 | transforms.RandomResizedCrop(224), 78 | transforms.ToTensor(),] + normalize 79 | )) 80 | 81 | loader = torch.utils.data.DataLoader(dataset, 82 | batch_size=16, shuffle=False, 83 | num_workers=4, pin_memory=True) 84 | print('PASCAL VOC 2007 ' + args.split + ' dataset loaded') 85 | 86 | # re initialize classifier 87 | if hasattr(model.body, 'classifier'): 88 | for m in model.body.classifier.modules(): 89 | if isinstance(m, nn.Linear): 90 | m.weight.data.normal_(0, 0.01) 91 | m.bias.data.fill_(0.1) 92 | for m in model.pred_layer.modules(): 93 | if isinstance(m, nn.Linear): 94 | m.weight.data.normal_(0, 0.01) 95 | m.bias.data.fill_(0.1) 96 | 97 | # freeze conv layers 98 | if args.fc6_8: 99 | if hasattr(model.body, 'features'): 100 | for param in model.body.features.parameters(): 
101 | param.requires_grad = False 102 | 103 | # set optimizer 104 | optimizer = torch.optim.SGD( 105 | filter(lambda x: x.requires_grad, model.parameters()), 106 | lr=args.lr, 107 | momentum=0.9, 108 | weight_decay=args.wd, 109 | ) 110 | 111 | criterion = nn.BCEWithLogitsLoss(reduction='none') 112 | 113 | print('Start training') 114 | it = 0 115 | losses = AverageMeter() 116 | while it < args.nit: 117 | it = train( 118 | loader, 119 | model, 120 | optimizer, 121 | criterion, 122 | args.fc6_8, 123 | losses, 124 | current_iteration=it, 125 | total_iterations=args.nit, 126 | stepsize=args.stepsize, 127 | ) 128 | 129 | print('Model Evaluation') 130 | if args.eval_random_crops: 131 | transform_eval = [ 132 | transforms.RandomHorizontalFlip(), 133 | transforms.RandomResizedCrop(224), 134 | transforms.ToTensor(),] + normalize 135 | else: 136 | transform_eval = [ 137 | transforms.Resize(256), 138 | transforms.TenCrop(224), 139 | transforms.Lambda(lambda crops: torch.stack([transforms.Compose(normalize)(transforms.ToTensor()(crop)) for crop in crops])) 140 | ] 141 | 142 | print('Train set') 143 | train_dataset = VOC2007_dataset( 144 | args.data_path, 145 | split=args.split, 146 | transform=transforms.Compose(transform_eval), 147 | ) 148 | train_loader = torch.utils.data.DataLoader( 149 | train_dataset, 150 | batch_size=1, 151 | shuffle=False, 152 | num_workers=4, 153 | pin_memory=True, 154 | ) 155 | evaluate(train_loader, model, args.eval_random_crops) 156 | 157 | print('Test set') 158 | test_dataset = VOC2007_dataset(args.data_path, split=args.test, transform=transforms.Compose(transform_eval)) 159 | test_loader = torch.utils.data.DataLoader( 160 | test_dataset, 161 | batch_size=1, 162 | shuffle=False, 163 | num_workers=4, 164 | pin_memory=True, 165 | ) 166 | evaluate(test_loader, model, args.eval_random_crops) 167 | 168 | 169 | def evaluate(loader, model, eval_random_crops): 170 | model.eval() 171 | gts = [] 172 | scr = [] 173 | for crop in range(9 * eval_random_crops + 1): 174 | for i, (input, target) in enumerate(loader): 175 | # move input to gpu and optionally reshape it 176 | if len(input.size()) == 5: 177 | bs, ncrops, c, h, w = input.size() 178 | input = input.view(-1, c, h, w) 179 | input = input.cuda(non_blocking=True) 180 | 181 | # forward pass without grad computation 182 | with torch.no_grad(): 183 | output = model(input) 184 | if crop < 1 : 185 | scr.append(torch.sum(output, 0, keepdim=True).cpu().numpy()) 186 | gts.append(target) 187 | else: 188 | scr[i] += output.cpu().numpy() 189 | gts = np.concatenate(gts, axis=0).T 190 | scr = np.concatenate(scr, axis=0).T 191 | aps = [] 192 | for i in range(20): 193 | # Subtract eps from score to make AP work for tied scores 194 | ap = metrics.average_precision_score(gts[i][gts[i]<=1], scr[i][gts[i]<=1]-1e-5*gts[i][gts[i]<=1]) 195 | aps.append( ap ) 196 | print(np.mean(aps), ' ', ' '.join(['%0.2f'%a for a in aps])) 197 | 198 | 199 | def train(loader, model, optimizer, criterion, fc6_8, losses, current_iteration=0, total_iterations=None, stepsize=None, verbose=True): 200 | # to log 201 | batch_time = AverageMeter() 202 | data_time = AverageMeter() 203 | top1 = AverageMeter() 204 | end = time.time() 205 | 206 | # use dropout for the MLP 207 | if hasattr(model.body, 'classifier'): 208 | model.train() 209 | # in the batch norms always use global statistics 210 | model.body.features.eval() 211 | else: 212 | model.eval() 213 | 214 | for i, (input, target) in enumerate(loader): 215 | # measure data loading time 216 | data_time.update(time.time() - 
end) 217 | 218 | # adjust learning rate 219 | if current_iteration != 0 and current_iteration % stepsize == 0: 220 | for param_group in optimizer.param_groups: 221 | param_group['lr'] = param_group['lr'] * 0.5 222 | print('iter {0} learning rate is {1}'.format(current_iteration, param_group['lr'])) 223 | 224 | # move input to gpu 225 | input = input.cuda(non_blocking=True) 226 | 227 | # forward pass with or without grad computation 228 | output = model(input) 229 | 230 | target = target.float().cuda() 231 | mask = (target == 255) 232 | loss = torch.sum(criterion(output, target).masked_fill_(mask, 0)) / target.size(0) 233 | 234 | # backward 235 | optimizer.zero_grad() 236 | loss.backward() 237 | # clip gradients 238 | torch.nn.utils.clip_grad_norm_(model.parameters(), 10) 239 | # and weights update 240 | optimizer.step() 241 | 242 | # measure accuracy and record loss 243 | losses.update(loss.item(), input.size(0)) 244 | 245 | # measure elapsed time 246 | batch_time.update(time.time() - end) 247 | end = time.time() 248 | if verbose is True and current_iteration % 25 == 0: 249 | print('Iteration[{0}]\t' 250 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 251 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 252 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'.format( 253 | current_iteration, batch_time=batch_time, 254 | data_time=data_time, loss=losses)) 255 | current_iteration = current_iteration + 1 256 | if total_iterations is not None and current_iteration == total_iterations: 257 | break 258 | return current_iteration 259 | 260 | 261 | if __name__ == '__main__': 262 | main() 263 | -------------------------------------------------------------------------------- /linear_classif_layers.sh: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | DATAPATH_IMAGENET='path/to/imagenet/dataset' 9 | DATAPATH_PLACES='path/to/places205/dataset' 10 | 11 | ######################## 12 | ### ImageNet dataset ### 13 | ######################## 14 | 15 | # CONV 1 16 | EXP='./exp/eval_linear_imagenet_conv1/' 17 | mkdir -p $EXP 18 | python eval_linear.py --conv 1 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.005 --wd 0.00001 --nepochs 100 19 | 20 | # CONV 2 21 | EXP='./exp/eval_linear_imagenet_conv2/' 22 | mkdir -p $EXP 23 | python eval_linear.py --conv 2 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 24 | 25 | # CONV 3 26 | EXP='./exp/eval_linear_imagenet_conv3/' 27 | mkdir -p $EXP 28 | python eval_linear.py --conv 3 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 29 | 30 | # CONV 4 31 | EXP='./exp/eval_linear_imagenet_conv4/' 32 | mkdir -p $EXP 33 | python eval_linear.py --conv 4 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 34 | 35 | # CONV 5 36 | EXP='./exp/eval_linear_imagenet_conv5/' 37 | mkdir -p $EXP 38 | python eval_linear.py --conv 5 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 39 | 40 | # CONV 6 41 | EXP='./exp/eval_linear_imagenet_conv6/' 42 | mkdir -p $EXP 43 | python eval_linear.py --conv 6 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 44 | 45 | # CONV 7 46 | EXP='./exp/eval_linear_imagenet_conv7/' 47 | mkdir -p $EXP 48 | python eval_linear.py --conv 7 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 49 | 50 | # CONV 8 51 | EXP='./exp/eval_linear_imagenet_conv8/' 52 | mkdir -p $EXP 53 | python eval_linear.py --conv 8 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 54 | 55 | # CONV 9 56 | EXP='./exp/eval_linear_imagenet_conv9/' 57 | mkdir -p $EXP 58 | python eval_linear.py --conv 9 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 59 | 60 | # CONV 10 61 | EXP='./exp/eval_linear_imagenet_conv10/' 62 | mkdir -p $EXP 63 | python eval_linear.py --conv 10 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 64 | 65 | # CONV 11 66 | EXP='./exp/eval_linear_imagenet_conv11/' 67 | mkdir -p $EXP 68 | python eval_linear.py --conv 11 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 69 | 70 | # CONV 12 71 | EXP='./exp/eval_linear_imagenet_conv12/' 72 | mkdir -p $EXP 73 | python eval_linear.py --conv 12 --pretrained 
./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 74 | 75 | # CONV 13 76 | EXP='./exp/eval_linear_imagenet_conv13/' 77 | mkdir -p $EXP 78 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 79 | 80 | ######################## 81 | ### Places205 dataset ## 82 | ######################## 83 | 84 | # CONV 1 85 | EXP='./exp/eval_linear_places205_conv1/' 86 | mkdir -p $EXP 87 | python eval_linear.py --conv 1 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.003 --wd 0.00001 --nepochs 100 88 | 89 | # CONV 2 90 | EXP='./exp/eval_linear_places205_conv2/' 91 | mkdir -p $EXP 92 | python eval_linear.py --conv 2 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.005 --wd 0.00001 --nepochs 100 93 | 94 | # CONV 3 95 | EXP='./exp/eval_linear_places205_conv3/' 96 | mkdir -p $EXP 97 | python eval_linear.py --conv 3 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 98 | 99 | # CONV 4 100 | EXP='./exp/eval_linear_places205_conv4/' 101 | mkdir -p $EXP 102 | python eval_linear.py --conv 4 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 103 | 104 | # CONV 5 105 | EXP='./exp/eval_linear_places205_conv5/' 106 | mkdir -p $EXP 107 | python eval_linear.py --conv 5 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 108 | 109 | # CONV 6 110 | EXP='./exp/eval_linear_places205_conv6/' 111 | mkdir -p $EXP 112 | python eval_linear.py --conv 6 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 113 | 114 | # CONV 7 115 | EXP='./exp/eval_linear_places205_conv7/' 116 | mkdir -p $EXP 117 | python eval_linear.py --conv 7 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 118 | 119 | # CONV 8 120 | EXP='./exp/eval_linear_places205_conv8/' 121 | mkdir -p $EXP 122 | python eval_linear.py --conv 8 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 123 | 124 | # CONV 9 125 | EXP='./exp/eval_linear_places205_conv9/' 126 | mkdir -p $EXP 127 | python eval_linear.py --conv 9 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 128 | 129 | # CONV 10 130 | EXP='./exp/eval_linear_places205_conv10/' 131 | mkdir -p $EXP 132 | python eval_linear.py --conv 10 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 133 | 134 | # CONV 11 135 | EXP='./exp/eval_linear_places205_conv11/' 136 | 
mkdir -p $EXP 137 | python eval_linear.py --conv 11 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 138 | 139 | # CONV 12 140 | EXP='./exp/eval_linear_places205_conv12/' 141 | mkdir -p $EXP 142 | python eval_linear.py --conv 12 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 143 | 144 | # CONV 13 145 | EXP='./exp/eval_linear_places205_conv13/' 146 | mkdir -p $EXP 147 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 148 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | import os 10 | 11 | import apex 12 | import numpy as np 13 | import torch 14 | import torch.distributed as dist 15 | import torch.nn as nn 16 | 17 | from src.clustering import get_cluster_assignments, load_cluster_assignments 18 | from src.data.loader import get_data_transformations 19 | from src.data.YFCC100M import YFCC100M_dataset 20 | from src.model.model_factory import (build_prediction_layer, model_factory, 21 | sgd_optimizer, to_cuda) 22 | from src.model.pretrain import load_pretrained 23 | from src.slurm import init_signal_handler 24 | from src.trainer import train_network 25 | from src.utils import (bool_flag, check_parameters, end_of_epoch, fix_random_seeds, 26 | init_distributed_mode, initialize_exp, restart_from_checkpoint) 27 | 28 | 29 | def get_parser(): 30 | """ 31 | Generate a parameters parser. 
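The distributed-training fields that stay at their defaults here (rank, communication groups, which GPU to work on) are completed at runtime (see init_distributed_mode).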
32 | """ 33 | # parse parameters 34 | parser = argparse.ArgumentParser(description="Unsupervised feature learning.") 35 | 36 | # handling experiment parameters 37 | parser.add_argument("--checkpoint_freq", type=int, default=1, 38 | help="Save a checkpoint every this many epochs.") 39 | parser.add_argument("--dump_path", type=str, default="./exp", 40 | help="Experiment dump path.") 41 | parser.add_argument('--epoch', type=int, default=0, 42 | help='Current epoch to run.') 43 | parser.add_argument('--start_iter', type=int, default=0, 44 | help='First iter to run in the current epoch.') 45 | 46 | # network params 47 | parser.add_argument('--pretrained', type=str, default='', 48 | help='Start from this instead of random weights.') 49 | 50 | # datasets params 51 | parser.add_argument('--data_path', type=str, default='', 52 | help='Where to find training dataset.') 53 | parser.add_argument('--size_dataset', type=int, default=10000000, 54 | help='How many images to use.') 55 | parser.add_argument('--workers', type=int, default=8, 56 | help='Number of data loading workers.') 57 | parser.add_argument('--sobel', type=bool_flag, default=0, 58 | help='Apply Sobel filter.') 59 | 60 | # optim params 61 | parser.add_argument('--lr', type=float, default=0.1, help='Learning rate.') 62 | parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay.') 63 | parser.add_argument('--nepochs', type=int, default=100, 64 | help='Max number of epochs to run.') 65 | parser.add_argument('--batch_size', default=48, type=int, 66 | help='Batch-size per process.') 67 | 68 | # Model params 69 | parser.add_argument('--reassignment', type=int, default=3, 70 | help='Reassign clusters every this many epochs.') 71 | parser.add_argument('--dim_pca', type=int, default=4096, 72 | help='Dimension of the pca applied to the descriptors.') 73 | parser.add_argument('--k', type=int, default=10000, 74 | help='Total number of clusters.') 75 | parser.add_argument('--super_classes', type=int, default=4, 76 | help='Total number of super-classes.') 77 | parser.add_argument('--rotnet', type=bool_flag, default=True, 78 | help='Network needs to classify large rotations.') 79 | 80 | # k-means params 81 | parser.add_argument('--warm_restart', type=bool_flag, default=False, 82 | help='Use previous centroids as init.') 83 | parser.add_argument('--use_faiss', type=bool_flag, default=True, 84 | help='Use faiss for E steps in k-means.') 85 | parser.add_argument('--niter', type=int, default=10, 86 | help='Number of k-means iterations.') 87 | 88 | # distributed training params 89 | parser.add_argument('--rank', default=0, type=int, 90 | help='Global process rank.') 91 | parser.add_argument("--local_rank", type=int, default=-1, 92 | help="Multi-GPU - Local rank") 93 | parser.add_argument('--world-size', default=1, type=int, 94 | help='Number of distributed processes.') 95 | parser.add_argument('--dist-url', default='', type=str, 96 | help='Url used to set up distributed training.') 97 | 98 | # debug 99 | parser.add_argument("--debug_slurm", type=bool_flag, default=False, 100 | help="Debug within a SLURM job.") 101 | 102 | return parser.parse_args() 103 | 104 | 105 | def main(args): 106 | """ 107 | This code implements the paper: https://arxiv.org/abs/1905.01278 108 | The method alternates between a hierarchical clustering of the 109 | features and learning the parameters of a convnet by predicting both the 110 | angle of the rotation applied to the input data and the cluster assignments 111 | in a single hierarchical loss. 
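Concretely, every `reassignment` epochs the dataset features are re-clustered (into super-clusters, then into sub-classes within each super-cluster), and in between the convnet and its two prediction heads are trained on the resulting pseudo-labels.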
112 | """ 113 | 114 | # initialize communication groups 115 | training_groups, clustering_groups = init_distributed_mode(args) 116 | 117 | # check parameters 118 | check_parameters(args) 119 | 120 | # initialize the experiment 121 | logger, training_stats = initialize_exp(args, 'epoch', 'iter', 'prec', 'loss', 122 | 'prec_super_class', 'loss_super_class', 123 | 'prec_sub_class', 'loss_sub_class') 124 | 125 | # initialize SLURM signal handler for time limit / pre-emption 126 | init_signal_handler() 127 | 128 | # load data 129 | dataset = YFCC100M_dataset(args.data_path, size=args.size_dataset) 130 | 131 | # prepare the different data transformations 132 | tr_cluster, tr_train = get_data_transformations(args.rotation * 90) 133 | 134 | # build model skeleton 135 | fix_random_seeds() 136 | model = model_factory(args.sobel) 137 | logger.info('model created') 138 | 139 | # load pretrained weights 140 | load_pretrained(model, args) 141 | 142 | # convert batch-norm layers to nvidia wrapper to enable batch stats reduction 143 | model = apex.parallel.convert_syncbn_model(model) 144 | 145 | # distributed training wrapper 146 | model = to_cuda(model, args.gpu_to_work_on, apex=True) 147 | logger.info('model to cuda') 148 | 149 | # set optimizer 150 | optimizer = sgd_optimizer(model, args.lr, args.wd) 151 | 152 | # load cluster assignments 153 | cluster_assignments = load_cluster_assignments(args, dataset) 154 | 155 | # build prediction layer on the super_class 156 | pred_layer, optimizer_pred_layer = build_prediction_layer( 157 | model.module.body.dim_output_space, 158 | args, 159 | ) 160 | 161 | nmb_sub_classes = args.k // args.nmb_super_clusters 162 | sub_class_pred_layer, optimizer_sub_class_pred_layer = build_prediction_layer( 163 | model.module.body.dim_output_space, 164 | args, 165 | num_classes=nmb_sub_classes, 166 | group=training_groups[args.training_local_world_id], 167 | ) 168 | 169 | # variables to fetch in checkpoint 170 | to_restore = {'epoch': 0, 'start_iter': 0} 171 | 172 | # re start from checkpoint 173 | restart_from_checkpoint( 174 | args, 175 | run_variables=to_restore, 176 | state_dict=model, 177 | optimizer=optimizer, 178 | pred_layer_state_dict=pred_layer, 179 | optimizer_pred_layer=optimizer_pred_layer, 180 | ) 181 | pred_layer_name = str(args.training_local_world_id) + '-pred_layer.pth.tar' 182 | restart_from_checkpoint( 183 | args, 184 | ckp_path=os.path.join(args.dump_path, pred_layer_name), 185 | state_dict=sub_class_pred_layer, 186 | optimizer=optimizer_sub_class_pred_layer, 187 | ) 188 | args.epoch = to_restore['epoch'] 189 | args.start_iter = to_restore['start_iter'] 190 | 191 | for _ in range(args.epoch, args.nepochs): 192 | 193 | logger.info("============ Starting epoch %i ... 
============" % args.epoch) 194 | fix_random_seeds(args.epoch) 195 | 196 | # step 1: get the final activations for the whole dataset and cluster them 197 | 198 | if cluster_assignments is None and not args.epoch % args.reassignment: 199 | 200 | logger.info("=> Start clustering step") 201 | dataset.transform = tr_cluster 202 | 203 | cluster_assignments = get_cluster_assignments(args, model, dataset, clustering_groups) 204 | 205 | # reset prediction layers 206 | if args.nmb_super_clusters > 1: 207 | pred_layer, optimizer_pred_layer = build_prediction_layer( 208 | model.module.body.dim_output_space, 209 | args, 210 | ) 211 | sub_class_pred_layer, optimizer_sub_class_pred_layer = build_prediction_layer( 212 | model.module.body.dim_output_space, 213 | args, 214 | num_classes=nmb_sub_classes, 215 | group=training_groups[args.training_local_world_id], 216 | ) 217 | 218 | 219 | # step 2: train the network with the cluster assignments as labels 220 | 221 | # prepare dataset 222 | dataset.transform = tr_train 223 | dataset.sub_classes = cluster_assignments 224 | 225 | # concatenate models and their corresponding optimizers 226 | models = [model, pred_layer, sub_class_pred_layer] 227 | optimizers = [optimizer, optimizer_pred_layer, optimizer_sub_class_pred_layer] 228 | 229 | # train the network for one epoch 230 | scores = train_network(args, models, optimizers, dataset) 231 | 232 | # save training statistics 233 | logger.info(scores) 234 | training_stats.update(scores) 235 | 236 | # reassign clusters at the next epoch 237 | if not args.epoch % args.reassignment: 238 | cluster_assignments = None 239 | dataset.subset_indexes = None 240 | end_of_epoch(args) 241 | 242 | dist.barrier() 243 | 244 | 245 | if __name__ == '__main__': 246 | 247 | # generate parser / parse parameters 248 | args = get_parser() 249 | 250 | # run experiment 251 | main(args) 252 | -------------------------------------------------------------------------------- /main.sh: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #!/bin/bash 8 | 9 | # load ids of the 95,920,149 images we managed to download 10 | wget -c -P ./src/data/ "https://dl.fbaipublicfiles.com/deepcluster/flickr_unique_ids.npy" 11 | 12 | # create experiment dump repo 13 | mkdir -p ./exp/deepercluster/ 14 | 15 | # run unsupervised feature learning 16 | python main.py \ 17 | --dump_path ./exp/deepercluster/ \ 18 | --pretrained PRETRAINED \ 19 | --data_path DATA_PATH \ 20 | --size_dataset 100000000 \ 21 | --workers 10 \ 22 | --sobel true \ 23 | --lr 0.1 \ 24 | --wd 0.00001 \ 25 | --nepochs 100 \ 26 | --batch_size 48 \ 27 | --reassignment 3 \ 28 | --dim_pca 4096 \ 29 | --super_classes 16 \ 30 | --rotnet true \ 31 | --k 320000 \ 32 | --warm_restart false \ 33 | --use_faiss true \ 34 | --niter 10 \ 35 | --world-size 64 \ 36 | --dist-url DIST_URL 37 | 38 |
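For intuition, a rough sanity check of the configuration above, as a minimal sketch. The derivation of nmb_super_clusters from super_classes is an assumption (4 rotation angles multiplying the super-cluster targets when rotnet is true), not code from this repo:

# illustrative arithmetic only -- nothing here is part of the repo
world_size = 64        # --world-size in main.sh
super_classes = 16     # --super_classes: rotation (4) x super-clusters (assumed 4)
k = 320000             # --k: total number of clusters

nmb_super_clusters = super_classes // 4          # assumed: 4 super-clusters
nmb_sub_classes = k // nmb_super_clusters        # 80000 sub-classes per head
group_size = world_size // nmb_super_clusters    # 16 processes per clustering group

print(nmb_super_clusters, nmb_sub_classes, group_size)  # 4 80000 16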
-------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | -------------------------------------------------------------------------------- /src/clustering.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | import os 10 | import pickle 11 | 12 | import faiss 13 | import torch 14 | import torch.distributed as dist 15 | from torch.utils.data.sampler import Sampler 16 | import numpy as np 17 | 18 | from .utils import PCA, AverageMeter, normalize, get_indices_sparse 19 | from .distributed_kmeans import distributed_kmeans, initialize_cache 20 | 21 | 22 | logger = getLogger() 23 | 24 | 25 | def get_cluster_assignments(args, model, dataset, groups): 26 | """Cluster the dataset features hierarchically: super-clusters first, then sub-classes within each super-cluster.""" 27 | 28 | # discard previous pseudo-labels before extracting features 29 | dataset.sub_classes = None 30 | 31 | # switch to eval mode 32 | model.eval() 33 | 34 | # this process deals only with a subset of the dataset 35 | local_nmb_data = len(dataset) // args.world_size 36 | indices = torch.arange(args.rank * local_nmb_data, (args.rank + 1) * local_nmb_data).int() 37 | 38 | if os.path.isfile(os.path.join(args.dump_path, 'super_class_assignments.pkl')): 39 | 40 | # super-class assignments have already been computed in a previous run 41 | 42 | super_class_assignements = pickle.load(open(os.path.join(args.dump_path, 'super_class_assignments.pkl'), 'rb')) 43 | logger.info('loaded super-class assignments') 44 | 45 | # recover the local counts per super-cluster 46 | where_helper = get_indices_sparse(super_class_assignements[indices]) 47 | nmb_data_per_super_cluster = torch.zeros(args.nmb_super_clusters).cuda() 48 | for super_class in range(len(where_helper)): 49 | nmb_data_per_super_cluster[super_class] = len(where_helper[super_class][0]) 50 | 51 | else: 52 | sampler = Subset_Sampler(indices) 53 | 54 | # we need a data loader 55 | loader = torch.utils.data.DataLoader( 56 | dataset, 57 | batch_size=args.batch_size, 58 | sampler=sampler, 59 | num_workers=args.workers, 60 | pin_memory=True, 61 | ) 62 | 63 | # initialize cache, pca and centroids 64 | cache, centroids = initialize_cache(args, loader, model) 65 | 66 | # empty cuda cache (useful because we're about to use faiss on gpu) 67 | torch.cuda.empty_cache() 68 | 69 | # perform clustering into super-clusters 70 | super_class_assignements, centroids_sc = distributed_kmeans( 71 | args, 72 | args.size_dataset, 73 | args.nmb_super_clusters, 74 | cache, 75 | args.rank, 76 | args.world_size, 77 | centroids, 78 | ) 79 | 80 | # dump the cached activations, bucketed by super-cluster 81 | where_helper = get_indices_sparse(super_class_assignements[indices]) 82 | nmb_data_per_super_cluster = torch.zeros(args.nmb_super_clusters).cuda() 83 | for super_class in range(len(where_helper)): 84 | ind_sc = where_helper[super_class][0] 85 | np.save(open(os.path.join( 86 | args.dump_path, 87 | 'cache/', 88 | 'super_class' + str(super_class) + '-' + str(args.rank), 89 | ), 'wb'), cache[ind_sc]) 90 | 91 | nmb_data_per_super_cluster[super_class] = len(ind_sc) 92 | 93 | dist.barrier() 94 | 95 | # dump super-class assignments and centroids 96 | if not args.rank: 97 | pickle.dump( 98 | super_class_assignements, 99 | open(os.path.join(args.dump_path, 'super_class_assignments.pkl'), 'wb'), 100 | ) 101 | pickle.dump( 102 | centroids_sc, 103 | open(os.path.join(args.dump_path, 'super_class_centroids.pkl'), 'wb'), 104 | ) 105 | 106 | # size of the 
different super clusters 107 | all_counts = [torch.zeros(args.nmb_super_clusters).cuda() for _ in range(args.world_size)] 108 | dist.all_gather(all_counts, nmb_data_per_super_cluster) 109 | all_counts = torch.cat(all_counts).cpu().long() 110 | all_counts = all_counts.reshape(args.world_size, args.nmb_super_clusters) 111 | logger.info(all_counts.sum(dim=0)) 112 | 113 | # what are the data belonging to this super class 114 | dataset.subset_indexes = np.where(super_class_assignements == args.clustering_local_world_id)[0] 115 | div = args.batch_size * args.clustering_local_world_size 116 | dataset.subset_indexes = dataset.subset_indexes[:len(dataset) // div * div] 117 | 118 | dist.barrier() 119 | 120 | # which files this process is going to read 121 | local_nmb_data = int(len(dataset) / args.clustering_local_world_size) 122 | low = np.long(args.clustering_local_rank * local_nmb_data) 123 | high = np.long(low + local_nmb_data) 124 | curr_ind = 0 125 | cache = torch.zeros(local_nmb_data, args.dim_pca, dtype=torch.float32) 126 | 127 | cumsum = torch.cumsum(all_counts[:, args.clustering_local_world_id].long(), 0).long() 128 | for r in range(args.world_size): 129 | # data in this bucket r: [cumsum[r - 1] : cumsum[r] - 1] 130 | low_bucket = np.long(cumsum[r - 1]) if r else 0 131 | 132 | # this bucket is empty 133 | if low_bucket > cumsum[r] - 1: 134 | continue 135 | 136 | if cumsum[r] - 1 < low: 137 | continue 138 | if low_bucket >= high: 139 | break 140 | 141 | # which are the data we are interested in inside this bucket ? 142 | ind_low = np.long(max(low, low_bucket)) 143 | ind_high = np.long(min(high, cumsum[r])) 144 | 145 | cache_r = np.load(open(os.path.join(args.dump_path, 'cache/', 'super_class' + str(args.clustering_local_world_id) + '-' + str(r)), 'rb')) 146 | cache[curr_ind: curr_ind + ind_high - ind_low] = torch.FloatTensor(cache_r[ind_low - low_bucket: ind_high - low_bucket]) 147 | 148 | curr_ind += (ind_high - ind_low) 149 | 150 | # randomly pick some centroids and dump them 151 | centroids_path = os.path.join(args.dump_path, 'centroids' + str(args.clustering_local_world_id) + '.pkl') 152 | if not args.clustering_local_rank: 153 | centroids = cache[np.random.choice( 154 | np.arange(cache.shape[0]), 155 | replace=cache.shape[0] < args.k // args.nmb_super_clusters, 156 | size=args.k // args.nmb_super_clusters, 157 | )] 158 | pickle.dump(centroids, open(centroids_path, 'wb'), -1) 159 | 160 | dist.barrier() 161 | 162 | # read centroids 163 | centroids = pickle.load(open(centroids_path, 'rb')).cuda() 164 | 165 | # distributed kmeans into sub-classes 166 | cluster_assignments, centroids = distributed_kmeans( 167 | args, 168 | len(dataset), 169 | args.k // args.nmb_super_clusters, 170 | cache, 171 | args.clustering_local_rank, 172 | args.clustering_local_world_size, 173 | centroids, 174 | world_id=args.clustering_local_world_id, 175 | group=groups[args.clustering_local_world_id], 176 | ) 177 | 178 | # free RAM 179 | del cache 180 | 181 | # write cluster assignments and centroids 182 | if not args.clustering_local_rank: 183 | pickle.dump( 184 | cluster_assignments, 185 | open(os.path.join(args.dump_path, 'cluster_assignments' + str(args.clustering_local_world_id) + '.pkl'), 'wb'), 186 | ) 187 | pickle.dump( 188 | centroids, 189 | open(centroids_path, 'wb'), 190 | ) 191 | 192 | dist.barrier() 193 | 194 | return cluster_assignments 195 | 196 | 197 | 198 | class Subset_Sampler(Sampler): 199 | """ 200 | Sample indices. 
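Yields exactly the indices it was constructed with, so that each process reads only its own shard of the dataset.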
201 | """ 202 | def __init__(self, indices): 203 | self.indices = indices 204 | 205 | def __iter__(self): 206 | return iter(self.indices) 207 | 208 | def __len__(self): 209 | return len(self.indices) 210 | 211 | 212 | def load_cluster_assignments(args, dataset): 213 | """ 214 | Load cluster assignments if they are present in experiment repository. 215 | """ 216 | super_file = os.path.join(args.dump_path, 'super_class_assignments.pkl') 217 | sub_file = os.path.join( 218 | args.dump_path, 219 | 'sub_class_assignments' + str(args.clustering_local_world_id) + '.pkl', 220 | ) 221 | 222 | if os.path.isfile(super_file) and os.path.isfile(sub_file): 223 | super_class_assignments = pickle.load(open(super_file, 'rb')) 224 | dataset.subset_indexes = np.where(super_class_assignments == args.clustering_local_world_id)[0] 225 | 226 | div = args.batch_size * args.clustering_local_world_size 227 | clustering_size_dataset = len(dataset) // div * div 228 | dataset.subset_indexes = dataset.subset_indexes[:clustering_size_dataset] 229 | 230 | logger.info('Found cluster assignments in experiment repository') 231 | return pickle.load(open(sub_file, "rb")) 232 | 233 | return None 234 | -------------------------------------------------------------------------------- /src/data/VOC2007.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import glob 9 | import os 10 | from collections import defaultdict 11 | 12 | from PIL import Image 13 | from PIL import ImageFile 14 | ImageFile.LOAD_TRUNCATED_IMAGES = True 15 | import numpy as np 16 | import torch.utils.data as data 17 | 18 | 19 | class VOC2007_dataset(data.Dataset): 20 | def __init__(self, voc_dir, split='train', transform=None): 21 | # Find the image sets 22 | image_set_dir = os.path.join(voc_dir, 'ImageSets', 'Main') 23 | image_sets = glob.glob(os.path.join(image_set_dir, '*_' + split + '.txt')) 24 | assert len(image_sets) == 20 25 | # Read the labels 26 | self.n_labels = len(image_sets) 27 | images = defaultdict(lambda:-np.ones(self.n_labels, dtype=np.uint8)) 28 | for k, s in enumerate(sorted(image_sets)): 29 | for l in open(s, 'r'): 30 | name, lbl = l.strip().split() 31 | lbl = int(lbl) 32 | # Switch the ignore label and 0 label (in VOC -1: not present, 0: ignore) 33 | if lbl < 0: 34 | lbl = 0 35 | elif lbl == 0: 36 | lbl = 255 37 | images[os.path.join(voc_dir, 'JPEGImages', name + '.jpg')][k] = lbl 38 | self.images = [(k, images[k]) for k in images.keys()] 39 | np.random.shuffle(self.images) 40 | self.transform = transform 41 | 42 | def __len__(self): 43 | return len(self.images) 44 | 45 | def __getitem__(self, i): 46 | img = Image.open(self.images[i][0]) 47 | img = img.convert('RGB') 48 | if self.transform is not None: 49 | img = self.transform(img) 50 | return img, self.images[i][1] 51 | 52 | -------------------------------------------------------------------------------- /src/data/YFCC100M.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | import os 9 | import zipfile 10 | 11 | import numpy as np 12 | from PIL import Image 13 | from PIL import ImageFile 14 | ImageFile.LOAD_TRUNCATED_IMAGES = True 15 | import torch.utils.data as data 16 | 17 | 18 | def loader(path_zip, file_img): 19 | """ 20 | Load imagefile from zip. 21 | """ 22 | with zipfile.ZipFile(path_zip, 'r') as myzip: 23 | img = Image.open(myzip.open(file_img)) 24 | return img.convert('RGB') 25 | 26 | 27 | class YFCC100M_dataset(data.Dataset): 28 | """ 29 | YFCC100M dataset. 30 | """ 31 | def __init__(self, root, size, flickr_unique_ids=True, transform=None): 32 | self.root = root 33 | self.transform = transform 34 | self.sub_classes = None 35 | 36 | # remove data with uniform color and data we didn't manage to download 37 | if flickr_unique_ids: 38 | self.indexes = np.load(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'flickr_unique_ids.npy')) 39 | self.indexes = self.indexes[:min(size, len(self.indexes))] 40 | else: 41 | self.indexes = np.arange(size) 42 | 43 | # for subsets 44 | self.subset_indexes = None 45 | 46 | def __getitem__(self, ind): 47 | index = ind 48 | if self.subset_indexes is not None: 49 | index = self.subset_indexes[ind] 50 | index = self.indexes[index] 51 | 52 | index = format(index, "0>8d") 53 | repo = index[:2] 54 | z = index[2: 5] 55 | file_img = index[5:] + '.jpg' 56 | 57 | path_zip = os.path.join(self.root, repo, z) + '.zip' 58 | 59 | # load the image 60 | img = loader(path_zip, file_img) 61 | 62 | # apply transformation 63 | if self.transform is not None: 64 | img = self.transform(img) 65 | 66 | # id of cluster 67 | sub_class = -100 68 | if self.sub_classes is not None: 69 | sub_class = self.sub_classes[ind] 70 | 71 | return img, sub_class 72 | 73 | def __len__(self): 74 | if self.subset_indexes is not None: 75 | return len(self.subset_indexes) 76 | return len(self.indexes) 77 | -------------------------------------------------------------------------------- /src/data/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | -------------------------------------------------------------------------------- /src/data/loader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | from random import randrange 10 | import os 11 | 12 | import numpy as np 13 | from sklearn.feature_extraction import image 14 | import torch 15 | import torch.nn as nn 16 | import torchvision.datasets as datasets 17 | import torchvision.transforms as transforms 18 | from torch.utils.data.sampler import Sampler 19 | 20 | from .YFCC100M import YFCC100M_dataset 21 | 22 | logger = getLogger() 23 | 24 | 25 | def load_data(args): 26 | """ 27 | Load dataset. 
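Returns the YFCC100M dataset when the data path contains 'yfcc100m', and a torchvision ImageFolder otherwise; transforms are attached later by the caller.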
28 | """ 29 | if 'yfcc100m' in args.data_path: 30 | return YFCC100M_dataset(args.data_path, size=args.size_dataset) 31 | return datasets.ImageFolder(args.data_path) 32 | 33 | 34 | def get_data_transformations(rotation=0): 35 | """ 36 | Return data transformations for clustering and for training. 37 | """ 38 | tr_normalize = transforms.Normalize( 39 | mean=[0.485, 0.456, 0.406], 40 | std=[0.229, 0.224, 0.225], 41 | ) 42 | final_process = [transforms.ToTensor(), tr_normalize] 43 | 44 | # for clustering stage 45 | tr_central_crop = transforms.Compose([ 46 | transforms.Resize(256), 47 | transforms.CenterCrop(224), 48 | lambda x: np.asarray(x), 49 | Rotate(0) 50 | ] + final_process) 51 | 52 | # for training stage 53 | tr_dataug = transforms.Compose([ 54 | transforms.RandomResizedCrop(224), 55 | transforms.RandomHorizontalFlip(), 56 | lambda x: np.asarray(x), 57 | Rotate(rotation) 58 | ] + final_process) 59 | 60 | return tr_central_crop, tr_dataug 61 | 62 | 63 | class Rotate(object): 64 | def __init__(self, rot): 65 | self.rot = rot 66 | def __call__(self, img): 67 | return rotate_img(img, self.rot) 68 | 69 | 70 | def rotate_img(img, rot): 71 | if rot == 0: # 0 degrees rotation 72 | return img 73 | elif rot == 90: # 90 degrees rotation 74 | return np.flipud(np.transpose(img, (1, 0, 2))).copy() 75 | elif rot == 180: # 180 degrees rotation 76 | return np.fliplr(np.flipud(img)).copy() 77 | elif rot == 270: # 270 degrees rotation / or -90 78 | return np.transpose(np.flipud(img), (1, 0, 2)).copy() 79 | else: 80 | raise ValueError('rotation should be 0, 90, 180 or 270 degrees') 81 | 82 | 83 | class KFoldSampler(Sampler): 84 | def __init__(self, im_per_target, shuffle): 85 | self.im_per_target = im_per_target 86 | N = 0 87 | for tar in im_per_target: 88 | N = N + len(im_per_target[tar]) 89 | self.N = N 90 | self.shuffle = shuffle 91 | 92 | def __iter__(self): 93 | indices = np.zeros(self.N).astype(int) 94 | c = 0 95 | for tar in self.im_per_target: 96 | indices[c: c + len(self.im_per_target[tar])] = self.im_per_target[tar] 97 | c = c + len(self.im_per_target[tar]) 98 | if self.shuffle: 99 | np.random.shuffle(indices) 100 | return iter(indices) 101 | 102 | def __len__(self): 103 | return self.N 104 | 105 | 106 | class KFold(): 107 | """Class to perform k-fold cross-validation. 
108 | Args: 109 | im_per_target (Dict): key (target), value (list of data with this target) 110 | i (int): index of the round of cross validation to perform 111 | K (int): dataset randomly partitioned into K equal sized subsamples 112 | Attributes: 113 | val (KFoldSampler): validation sampler 114 | train (KFoldSampler): training sampler 115 | """ 116 | def __init__(self, im_per_target, i, K): 117 | assert(i < K) -------------------------------------------------------------------------------- /src/distributed_kmeans.py: -------------------------------------------------------------------------------- 40 | if nmb_batches_for_pca > len(loader): 41 | nmb_batches_for_pca = len(loader) 42 | logger.warning("Compute the PCA on {} images (entire dataset)".format(args.size_dataset)) 43 | 44 | # statistics 45 | batch_time = AverageMeter() 46 | data_time = AverageMeter() 47 | end = time.time() 48 | 49 | with torch.no_grad(): 50 | for i, (input_tensor, _) in enumerate(loader): 51 | 52 | # time spent to load data 53 | data_time.update(time.time() - end) 54 | 55 | # move to gpu 56 | input_tensor = input_tensor.type(torch.FloatTensor).cuda() 57 | 58 | # forward 59 | feat = model(input_tensor) 60 | 61 | # before the pca has been computed 62 | if i < nmb_batches_for_pca: 63 | 64 | # gather the features computed by all processes 65 | all_feat = [torch.cuda.FloatTensor(feat.size()) for src in range(args.world_size)] 66 | dist.all_gather(all_feat, feat) 67 | 68 | # only the main process computes the PCA 69 | if not args.rank: 70 | all_feat = torch.cat(all_feat).cpu().numpy() 71 | 72 | # initialize storage arrays 73 | if i == 0: 74 | if not args.rank: 75 | for_pca = np.zeros( 76 | (nmb_batches_for_pca * batch_size, all_feat.shape[1]), 77 | dtype=np.float32, 78 | ) 79 | for_cache = torch.zeros( 80 | nmb_batches_for_pca * args.batch_size, 81 | feat.size(1), 82 | dtype=torch.float32, 83 | ) 84 | 85 | # fill in arrays 86 | if not args.rank: 87 | for_pca[i * batch_size: (i + 1) * batch_size] = all_feat 88 | 89 | for_cache[i * args.batch_size: (i + 1) * args.batch_size] = feat.cpu() 90 | 91 | # train the pca 92 | if i == nmb_batches_for_pca - 1: 93 | pca_path = os.path.join(args.dump_path, 'pca.pkl') 94 | centroids_path = os.path.join(args.dump_path, 'centroids.pkl') 95 | 96 | # compute the PCA 97 | if not args.rank: 98 | # init PCA object 99 | pca = PCA(dim=args.dim_pca, whit=0.5) 100 | 101 | # center data 102 | mean = np.mean(for_pca, axis=0).astype('float32') 103 | for_pca -= mean 104 | 105 | # compute covariance 106 | cov = np.dot(for_pca.T, for_pca) / for_pca.shape[0] 107 | 108 | # calculate the pca 109 | pca.train_pca(cov) 110 | 111 | # randomly pick some centroids 112 | centroids = pca.apply(for_pca[np.random.choice( 113 | np.arange(for_pca.shape[0]), 114 | replace=False, 115 | size=args.nmb_super_clusters, 116 | )]) 117 | centroids = normalize(centroids) 118 | 119 | pca.mean = mean 120 | 121 | # free memory 122 | del for_pca 123 | 124 | # write PCA to disk 125 | pickle.dump(pca, open(pca_path, 'wb')) 126 | pickle.dump(centroids, open(centroids_path, 'wb')) 127 | 128 | # processes wait for the main process to compute and write the PCA and centroids 129 | dist.barrier() 130 | 131 | # processes read PCA and centroids from disk 132 | pca = pickle.load(open(pca_path, "rb")) 133 | centroids = pickle.load(open(centroids_path, "rb")) 134 | 135 | # apply the pca to the cached features 136 | for_cache = pca.apply(for_cache) 137 | for_cache = normalize(for_cache) 138 | 139 | # extend the cache 140 | current_cache_size = for_cache.size(0) 141 | for_cache = torch.cat((for_cache, torch.zeros( 142 | local_cache_size - current_cache_size, 143 | args.dim_pca, 144 | ))) 145 | logger.info('{0} imgs cached => cache is {1:.2f} % full' 146 |
.format(current_cache_size, 100 * current_cache_size / local_cache_size)) 147 | 148 | # keep accumulating data 149 | if i > nmb_batches_for_pca - 1: 150 | feat = pca.apply(feat) 151 | feat = normalize(feat) 152 | for_cache[i * args.batch_size: (i + 1) * args.batch_size] = feat.cpu() 153 | 154 | 155 | # verbose 156 | batch_time.update(time.time() - end) 157 | end = time.time() 158 | if i % 200 == 0: 159 | logger.info('{0} / {1}\t' 160 | 'Time: {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 161 | 'Data Time: {data_time.val:.3f} ({data_time.avg:.3f})\t' 162 | .format(i, len(loader), batch_time=batch_time, data_time=data_time)) 163 | 164 | # move centroids to GPU 165 | centroids = torch.cuda.FloatTensor(centroids) 166 | 167 | return for_cache, centroids 168 | 169 | 170 | def distributed_kmeans(args, n_all, nk, cache, rank, world_size, centroids, world_id=0, group=None): 171 | """ 172 | Distributed mini-batch k-means. 173 | """ 174 | # local assignments 175 | assignments = -1 * np.ones(n_all // world_size) 176 | 177 | # prepare faiss index 178 | if args.use_faiss: 179 | res = faiss.StandardGpuResources() 180 | cfg = faiss.GpuIndexFlatConfig() 181 | cfg.device = args.gpu_to_work_on 182 | index = faiss.GpuIndexFlatL2(res, args.dim_pca, cfg) 183 | 184 | end = time.time() 185 | for p in range(args.niter + 1): 186 | start_pass = time.time() 187 | 188 | # running statistics 189 | batch_time = AverageMeter() 190 | log_loss = AverageMeter() 191 | 192 | # initialize arrays for update 193 | local_counts = torch.zeros(nk).cuda() 194 | local_feats = torch.zeros(nk, args.dim_pca).cuda() 195 | 196 | # prepare E step 197 | torch.cuda.empty_cache() 198 | if args.use_faiss: 199 | index.reset() 200 | index.add(centroids.cpu().numpy().astype('float32')) 201 | else: 202 | centroids_L2_norm = centroids.norm(dim=1)**2 203 | 204 | nmb_batches = n_all // world_size // args.batch_size 205 | for it in range(nmb_batches): 206 | 207 | # fetch mini-batch 208 | feat = cache[it * args.batch_size: (it + 1) * args.batch_size] 209 | 210 | # E-step 211 | if args.use_faiss: 212 | D, I = index.search(feat.numpy().astype('float32'), 1) 213 | I = I.squeeze(1) 214 | else: 215 | # find current cluster assignments 216 | l2dist = 1 - 2 * torch.mm(feat.cuda(non_blocking=True), centroids.transpose(0, 1)) + centroids_L2_norm 217 | D, I = l2dist.min(dim=1) 218 | I = I.cpu().numpy() 219 | D = D.cpu().numpy() 220 | 221 | # update assignment array 222 | assignments[it * args.batch_size: (it + 1) * args.batch_size] = I 223 | 224 | # log 225 | log_loss.update(D.mean()) 226 | 227 | for k in np.unique(I): 228 | idx_k = np.where(I == k)[0] 229 | # number of elmt in cluster k for this batch 230 | local_counts[k] += len(idx_k) 231 | 232 | # sum of elmt belonging to this cluster 233 | local_feats[k, :] += feat.cuda(non_blocking=True)[idx_k].sum(dim=0) 234 | 235 | batch_time.update(time.time() - end) 236 | end = time.time() 237 | 238 | if it and it % 1000 == 0: 239 | logger.info('Pass[{0}] - Iter: [{1}/{2}]\t' 240 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 241 | .format(p, it, nmb_batches, batch_time=batch_time)) 242 | 243 | # all reduce operation 244 | # processes share what it is needed for M-step 245 | if group is not None: 246 | dist.all_reduce(local_counts, group=group) 247 | dist.all_reduce(local_feats, group=group) 248 | else: 249 | dist.all_reduce(local_counts) 250 | dist.all_reduce(local_feats) 251 | 252 | # M-step 253 | 254 | # update centroids (for the last pass we only want the assignments) 255 | mask = local_counts.nonzero() 
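        # M-step: each surviving centroid becomes the mean of its assigned
        # features, c_k = local_feats[k] / local_counts[k]; the all-reduce
        # above has already summed counts and feature sums across processes,
        # so the division below yields the global mean. Empty clusters are
        # then repaired with a split heuristic: copy a random non-empty
        # centroid, nudge the two copies apart coordinate-wise by +/-1e-7,
        # and hand half of the donor's points over to the new cluster.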
256 | if p < args.niter: 257 | centroids[mask] = 1. / local_counts[mask].unsqueeze(1) * local_feats[mask] 258 | 259 | # deal with empty clusters 260 | for k in (local_counts == 0).nonzero(): 261 | 262 | # choose a random cluster from the set of non-empty clusters 263 | np.random.seed(world_id) 264 | m = mask[np.random.randint(len(mask))] 265 | 266 | # replace the empty centroid with a perturbed copy of a non-empty one 267 | centroids[k] = centroids[m] 268 | for j in range(args.dim_pca): 269 | sign = (j % 2) * 2 - 1 270 | centroids[k, j] += sign * 1e-7 271 | centroids[m, j] -= sign * 1e-7 272 | 273 | # update the counts 274 | local_counts[k] = local_counts[m] // 2 275 | local_counts[m] -= local_counts[k] 276 | 277 | # update the assignments 278 | assignments[np.where(assignments == m.item())[0][: int(local_counts[m])]] = k.cpu() 279 | logger.info('cluster {} empty => split cluster {}'.format(k, m)) 280 | 281 | logger.info(' # Pass[{0}]\tTime {1:.3f}\tLoss {2:.4f}' 282 | .format(p, time.time() - start_pass, log_loss.avg)) 283 | 284 | # now each process needs to share its own set of pseudo-labels 285 | 286 | # where to write / read the pseudo-labels 287 | dump_labels = os.path.join( 288 | args.dump_path, 289 | 'pseudo_labels' + str(world_id) + '-' + str(rank) + '.pkl', 290 | ) 291 | 292 | # write the local cluster assignments to disk 293 | pickle.dump( 294 | assignments, 295 | open(dump_labels, 'wb'), 296 | -1, 297 | ) 298 | 299 | # each process waits for all processes to finish writing 300 | if group is not None: 301 | dist.barrier(group=group) 302 | else: 303 | dist.barrier() 304 | 305 | pseudo_labels = np.zeros(n_all) 306 | 307 | # each process reads and reconstitutes the full set of pseudo-labels 308 | local_nmb_data = n_all // world_size 309 | for r in range(world_size): 310 | pseudo_labels[torch.arange(r * local_nmb_data, (r + 1) * local_nmb_data).int()] = \ 311 | pickle.load(open(os.path.join(args.dump_path, 'pseudo_labels' + str(world_id) + '-' + str(r) + '.pkl'), "rb")) 312 | 313 | # clean up 314 | del assignments 315 | dist.barrier() 316 | os.remove(dump_labels) 317 | 318 | return pseudo_labels, centroids.cpu() 319 | -------------------------------------------------------------------------------- /src/logger.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import os 9 | import logging 10 | import time 11 | from datetime import timedelta 12 | import pandas as pd 13 | 14 | class LogFormatter(): 15 | 16 | def __init__(self): 17 | self.start_time = time.time() 18 | 19 | def format(self, record): 20 | elapsed_seconds = round(record.created - self.start_time) 21 | 22 | prefix = "%s - %s - %s" % ( 23 | record.levelname, 24 | time.strftime('%x %X'), 25 | timedelta(seconds=elapsed_seconds) 26 | ) 27 | message = record.getMessage() 28 | message = message.replace('\n', '\n' + ' ' * (len(prefix) + 3)) 29 | return "%s - %s" % (prefix, message) if message else '' 30 | 31 | 32 | def create_logger(filepath, rank): 33 | """ 34 | Create a logger. 35 | Use a different log file for each process. 
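The main process writes to filepath itself, while a process of rank r > 0 writes to filepath-r.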
36 | """ 37 | # create log formatter 38 | log_formatter = LogFormatter() 39 | 40 | # create file handler and set level to debug 41 | if filepath is not None: 42 | if rank > 0: 43 | filepath = '%s-%i' % (filepath, rank) 44 | file_handler = logging.FileHandler(filepath, "a") 45 | file_handler.setLevel(logging.DEBUG) 46 | file_handler.setFormatter(log_formatter) 47 | 48 | # create console handler and set level to info 49 | console_handler = logging.StreamHandler() 50 | console_handler.setLevel(logging.INFO) 51 | console_handler.setFormatter(log_formatter) 52 | 53 | # create logger and set level to debug 54 | logger = logging.getLogger() 55 | logger.handlers = [] 56 | logger.setLevel(logging.DEBUG) 57 | logger.propagate = False 58 | if filepath is not None: 59 | logger.addHandler(file_handler) 60 | logger.addHandler(console_handler) 61 | 62 | # reset logger elapsed time 63 | def reset_time(): 64 | log_formatter.start_time = time.time() 65 | logger.reset_time = reset_time 66 | 67 | return logger 68 | 69 | 70 | class PD_Stats(object): 71 | """ 72 | Log stuff with pandas library 73 | """ 74 | def __init__(self, path, columns): 75 | self.path = path 76 | 77 | # reload path stats 78 | if os.path.isfile(self.path): 79 | self.stats = pd.read_pickle(self.path) 80 | 81 | # check that columns are the same 82 | assert list(self.stats.columns) == list(columns) 83 | 84 | else: 85 | self.stats = pd.DataFrame(columns=columns) 86 | 87 | def update(self, row, save=True): 88 | self.stats.loc[len(self.stats.index)] = row 89 | 90 | # save the statistics 91 | if save: 92 | self.stats.to_pickle(self.path) 93 | -------------------------------------------------------------------------------- /src/model/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | -------------------------------------------------------------------------------- /src/model/model_factory.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
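# Editor's sketch (not part of the original file): how the PD_Stats helper
# defined in src/logger.py above is typically driven. The path and column
# names here are hypothetical; the repo root is assumed to be on PYTHONPATH.
def _example_pd_stats(path='/tmp/exp/stats0.pkl'):
    from src.logger import PD_Stats
    stats = PD_Stats(path, ['epoch', 'loss', 'top1'])  # reloads the pickle if it already exists
    stats.update([0, 2.31, 12.5])                      # appends a row and saves to disk
    stats.update([1, 1.87, 18.0], save=False)          # accumulates in memory only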
6 | # 7 | 8 | from logging import getLogger 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.optim 13 | 14 | from .vgg16 import VGG16 15 | 16 | 17 | logger = getLogger() 18 | 19 | 20 | def create_sobel_layer(): 21 | grayscale = nn.Conv2d(3, 1, kernel_size=1, stride=1, padding=0) 22 | grayscale.weight.data.fill_(1.0 / 3.0) 23 | grayscale.bias.data.zero_() 24 | sobel_filter = nn.Conv2d(1, 2, kernel_size=3, stride=1, padding=0) 25 | sobel_filter.weight.data[0, 0].copy_( 26 | torch.FloatTensor([[1, 0, -1], [2, 0, -2], [1, 0, -1]]) 27 | ) 28 | sobel_filter.weight.data[1, 0].copy_( 29 | torch.FloatTensor([[1, 2, 1], [0, 0, 0], [-1, -2, -1]]) 30 | ) 31 | sobel_filter.bias.data.zero_() 32 | sobel = nn.Sequential(grayscale, sobel_filter) 33 | for p in sobel.parameters(): 34 | p.requires_grad = False 35 | return sobel 36 | 37 | 38 | class Net(nn.Module): 39 | def __init__(self, padding, sobel, body, pred_layer): 40 | super(Net, self).__init__() 41 | 42 | # padding 43 | self.padding = padding 44 | 45 | # sobel filter 46 | self.sobel = create_sobel_layer() if sobel else None 47 | 48 | # main architecture 49 | self.body = body 50 | 51 | # prediction layer 52 | self.pred_layer = pred_layer 53 | 54 | self.conv = None 55 | 56 | def forward(self, x): 57 | if self.padding is not None: 58 | x = self.padding(x) 59 | if self.sobel is not None: 60 | x = self.sobel(x) 61 | 62 | if self.conv is not None: 63 | count = 1 64 | for m in self.body.features.modules(): 65 | if not isinstance(m, nn.Sequential): 66 | x = m(x) 67 | if isinstance(m, nn.ReLU): 68 | if count == self.conv: 69 | return x 70 | count = count + 1 71 | 72 | x = self.body(x) 73 | if self.pred_layer is not None: 74 | x = self.pred_layer(x) 75 | return x 76 | 77 | 78 | def model_factory(sobel, relu=False, num_classes=0, batch_norm=True): 79 | """ 80 | Create a network. 81 | """ 82 | dim_in = 2 if sobel else 3 83 | 84 | padding = nn.ConstantPad2d(1, 0.0) 85 | if sobel: 86 | padding = nn.ConstantPad2d(2, 0.0) 87 | body = VGG16(dim_in, relu=relu, batch_norm=batch_norm) 88 | 89 | pred_layer = nn.Linear(body.dim_output_space, num_classes) if num_classes else None 90 | 91 | return Net(padding, sobel, body, pred_layer) 92 | 93 | 94 | def build_prediction_layer(dim_in, args, group=None, num_classes=0): 95 | """ 96 | Create prediction layer on gpu and its associated optimizer. 
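    Shape example for the factory above (editor's sketch, hypothetical sizes):
    with sobel=True the input is padded by 2 pixels, so the 1x1 grayscale and
    3x3 Sobel convolutions turn a 224x224 crop into a 226x226 map, exactly like
    the 1-pixel padding used in the plain RGB case; the five 2x2 max-pools of
    VGG16 then reduce it to the 7x7 grid that the 512 * 7 * 7 classifier expects:

        net = model_factory(sobel=True, relu=True, num_classes=100)
        out = net(torch.randn(2, 3, 224, 224))
        assert out.shape == (2, 100)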
97 | """ 98 | 99 | if not num_classes: 100 | num_classes = args.super_classes 101 | 102 | # last fully connected layer 103 | pred_layer = nn.Linear(dim_in, num_classes) 104 | 105 | # move prediction layer to gpu 106 | pred_layer = to_cuda(pred_layer, args.gpu_to_work_on, group=group) 107 | 108 | # set optimizer for the prediction layer 109 | optimizer_pred_layer = sgd_optimizer(pred_layer, args.lr, args.wd) 110 | 111 | return pred_layer, optimizer_pred_layer 112 | 113 | 114 | def to_cuda(net, gpu_id, apex=False, group=None): 115 | net = net.cuda() 116 | if apex: 117 | from apex.parallel import DistributedDataParallel as DDP 118 | net = DDP(net, delay_allreduce=True) 119 | else: 120 | net = nn.parallel.DistributedDataParallel( 121 | net, 122 | device_ids=[gpu_id], 123 | process_group=group, 124 | ) 125 | return net 126 | 127 | 128 | def sgd_optimizer(module, lr, wd): 129 | return torch.optim.SGD( 130 | filter(lambda x: x.requires_grad, module.parameters()), 131 | lr=lr, 132 | momentum=0.9, 133 | weight_decay=wd, 134 | ) 135 | 136 | 137 | def sobel2RGB(net): 138 | if net.sobel is None: 139 | return 140 | 141 | def computeweight(conv, alist, blist): 142 | sob = net.sobel._modules['1'].weight 143 | res = 0 144 | for atup in alist: 145 | for btup in blist: 146 | x = conv[:, 0, atup[0], btup[0]]*sob[0, :, atup[1], btup[1]] 147 | y = conv[:, 1, atup[0], btup[0]]*sob[1, :, atup[1], btup[1]] 148 | res = res + x + y 149 | return res 150 | 151 | def aux(a): 152 | if a == 0: 153 | return [(0, 0)] 154 | elif a == 1: 155 | return [(1, 0), (0, 1)] 156 | elif a == 2: 157 | return [(2, 0), (1, 1), (0, 2)] 158 | elif a == 3: 159 | return [(2, 1), (1, 2)] 160 | elif a == 4: 161 | return [(2, 2)] 162 | 163 | features = list(net.body.features.children()) 164 | conv_old = features[0] 165 | conv_final = nn.Conv2d(3, 64, kernel_size=5, padding=1, bias=True) 166 | for i in range(conv_old.kernel_size[0]): 167 | for j in range(conv_old.kernel_size[0]): 168 | neweight = 1/3* computeweight(conv_old.weight, aux(i), aux(j)).expand(3, 64).transpose(1, 0) 169 | conv_final.weight.data[:, :, i, j].copy_(neweight) 170 | conv_final.bias.data.copy_(conv_old.bias.data) 171 | features[0] = conv_final 172 | net.body.features = nn.Sequential(*features) 173 | net.sobel = None 174 | return 175 | -------------------------------------------------------------------------------- /src/model/pretrain.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
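# Editor's sketch (not part of the original file): sobel2RGB in
# src/model/model_factory.py above folds the frozen grayscale + Sobel stack
# into a single 5x5 RGB convolution, so the exported network consumes raw RGB.
# A quick numerical check of that absorption, with hypothetical sizes:
def _check_sobel_absorption():
    import torch
    from src.model.model_factory import model_factory, sobel2RGB
    net = model_factory(sobel=True, relu=True, num_classes=10).eval()
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        before = net(x)
        sobel2RGB(net)   # rewrites features[0] as a 5x5 RGB conv, drops net.sobel
        after = net(x)
    # expected to agree up to floating-point error
    print((before - after).abs().max().item())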
6 | # 7 | 8 | import os 9 | 10 | from logging import getLogger 11 | import pickle 12 | import numpy as np 13 | import torch 14 | import torch.nn as nn 15 | 16 | from src.model.model_factory import create_sobel_layer 17 | from src.model.vgg16 import VGG16 18 | 19 | logger = getLogger() 20 | 21 | 22 | def load_pretrained(model, args): 23 | """ 24 | Load weights 25 | """ 26 | if not os.path.isfile(args.pretrained): 27 | logger.info('pretrained weights not found') 28 | return 29 | 30 | # open checkpoint file 31 | map_location = None 32 | if args.world_size > 1: 33 | map_location = "cuda:" + str(args.gpu_to_work_on) 34 | checkpoint = torch.load(args.pretrained, map_location=map_location) 35 | 36 | # clean keys from 'module' 37 | checkpoint['state_dict'] = {rename_key(key): val 38 | for key, val 39 | in checkpoint['state_dict'].items()} 40 | 41 | # remove sobel keys 42 | if 'sobel.0.weight' in checkpoint['state_dict']: 43 | del checkpoint['state_dict']['sobel.0.weight'] 44 | del checkpoint['state_dict']['sobel.0.bias'] 45 | del checkpoint['state_dict']['sobel.1.weight'] 46 | del checkpoint['state_dict']['sobel.1.bias'] 47 | 48 | # remove pred_layer keys 49 | if 'pred_layer.weight' in checkpoint['state_dict']: 50 | del checkpoint['state_dict']['pred_layer.weight'] 51 | del checkpoint['state_dict']['pred_layer.bias'] 52 | 53 | # load weights 54 | model.body.load_state_dict(checkpoint['state_dict']) 55 | logger.info("=> loaded pretrained weights from '{}'".format(args.pretrained)) 56 | 57 | 58 | def rename_key(key): 59 | "Remove module from key" 60 | if not 'module' in key: 61 | return key 62 | if key.startswith('module.body.'): 63 | return key[12:] 64 | if key.startswith('module.'): 65 | return key[7:] 66 | return ''.join(key.split('.module')) 67 | -------------------------------------------------------------------------------- /src/model/vgg16.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import math 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.init as init 13 | 14 | cfg = { 15 | 'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], 16 | } 17 | 18 | class VGG16(nn.Module): 19 | ''' 20 | VGG16 model 21 | ''' 22 | def __init__(self, dim_in, relu=True, dropout=0.5, batch_norm=True): 23 | super(VGG16, self).__init__() 24 | self.features = make_layers(cfg['D'], dim_in, batch_norm=batch_norm) 25 | self.dim_output_space = 4096 26 | classifier = [ 27 | nn.Linear(512 * 7 * 7, 4096), 28 | nn.ReLU(True), 29 | nn.Dropout(dropout), 30 | nn.Linear(4096, 4096), 31 | ] 32 | if relu: 33 | classifier.append(nn.ReLU(True)) 34 | self.classifier = nn.Sequential(*classifier) 35 | 36 | # Initialize weights 37 | for m in self.modules(): 38 | if isinstance(m, nn.Conv2d): 39 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 40 | m.weight.data.normal_(0, math.sqrt(2. 
/ n)) 41 | m.bias.data.zero_() 42 | 43 | def forward(self, x): 44 | x = self.features(x) 45 | if self.classifier is not None: 46 | x = x.view(x.size(0), -1) 47 | x = self.classifier(x) 48 | return x 49 | 50 | 51 | def make_layers(cfg, in_channels, batch_norm=True): 52 | layers = [] 53 | for v in cfg: 54 | if v == 'M': 55 | layers += [nn.MaxPool2d(kernel_size=2, stride=2)] 56 | else: 57 | conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1) 58 | if batch_norm: 59 | layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)] 60 | else: 61 | layers += [conv2d, nn.ReLU(inplace=True)] 62 | in_channels = v 63 | return nn.Sequential(*layers) 64 | -------------------------------------------------------------------------------- /src/slurm.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | import os 10 | import signal 11 | import time 12 | 13 | 14 | logger = getLogger() 15 | 16 | 17 | def trigger_job_requeue(checkpoint_filename): 18 | ''' Submit a new job to resume from checkpoint. 19 | Be careful to use only for main process. 20 | ''' 21 | if int(os.environ['SLURM_PROCID']) == 0 and \ 22 | str(os.getpid()) == os.environ['MAIN_PID'] and os.path.isfile(checkpoint_filename): 23 | print('time is up, back to slurm queue', flush=True) 24 | command = 'scontrol requeue ' + os.environ['SLURM_JOB_ID'] 25 | print(command) 26 | if os.system(command): 27 | raise RuntimeError('requeue failed') 28 | print('New job submitted to the queue', flush=True) 29 | exit(0) 30 | 31 | 32 | def SIGTERMHandler(a, b): 33 | print('received sigterm') 34 | pass 35 | 36 | 37 | def signalHandler(a, b): 38 | print('Signal received', a, time.time(), flush=True) 39 | os.environ['SIGNAL_RECEIVED'] = 'True' 40 | return 41 | 42 | 43 | def init_signal_handler(): 44 | """ 45 | Handle signals sent by SLURM for time limit / pre-emption. 46 | """ 47 | os.environ['SIGNAL_RECEIVED'] = 'False' 48 | os.environ['MAIN_PID'] = str(os.getpid()) 49 | 50 | signal.signal(signal.SIGUSR1, signalHandler) 51 | signal.signal(signal.SIGTERM, SIGTERMHandler) 52 | print("Signal handler installed.", flush=True) 53 | -------------------------------------------------------------------------------- /src/trainer.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | import os 10 | import shutil 11 | import time 12 | 13 | import numpy as np 14 | import torch 15 | import torch.distributed as dist 16 | import torch.nn as nn 17 | from torch.utils.data.sampler import Sampler 18 | 19 | from .utils import AverageMeter, get_indices_sparse 20 | from src.slurm import trigger_job_requeue 21 | 22 | 23 | logger = getLogger() 24 | 25 | 26 | class DistUnifTargSampler(Sampler): 27 | """ 28 | Distributively samples elements based on a uniform distribution over the labels. 
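    Example (editor's sketch, hypothetical numbers): even with heavily skewed
    pseudo-labels, every label contributes the same number of indices per
    epoch, re-drawn with replacement only when a cluster is too small:

        labels = [0] * 900 + [1] * 50 + [2] * 50
        s = DistUnifTargSampler(total_size=120, pseudo_labels=labels,
                                num_replicas=4, rank=0)
        # per_label = 120 // 3 + 1 = 41 indices drawn per label, shuffled and
        # truncated to 120; rank 0 iterates over every 4th one, so len(s) == 30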
29 | """ 30 | def __init__(self, total_size, pseudo_labels, num_replicas, rank, seed=31): 31 | 32 | np.random.seed(seed) 33 | 34 | # world size 35 | self.num_replicas = num_replicas 36 | 37 | # rank of this process 38 | self.rank = rank 39 | 40 | # total number of samples to load across all processes 41 | self.total_size = total_size 42 | 43 | # set of labels to consider 44 | set_of_pseudo_labels = np.unique(pseudo_labels) 45 | nmb_pseudo_lab = int(len(set_of_pseudo_labels)) 46 | 47 | # number of images per label 48 | per_label = int(self.total_size // nmb_pseudo_lab + 1) 49 | 50 | # initialize indexes 51 | epoch_indexes = np.zeros(int(per_label * nmb_pseudo_lab)) 52 | 53 | # select per_label samples for each label 54 | indexes = get_indices_sparse(np.asarray(pseudo_labels)) 55 | for i, k in enumerate(set_of_pseudo_labels): 56 | k = int(k) 57 | label_indexes = indexes[k][0] 58 | epoch_indexes[i * per_label: (i + 1) * per_label] = np.random.choice( 59 | label_indexes, 60 | per_label, 61 | replace=(len(label_indexes) <= per_label) 62 | ) 63 | 64 | # make sure indexes are integers 65 | epoch_indexes = epoch_indexes.astype(int) 66 | 67 | # shuffle the indexes 68 | np.random.shuffle(epoch_indexes) 69 | 70 | self.epoch_indexes = epoch_indexes[:self.total_size] 71 | 72 | # this process only deals with this subset 73 | self.process_ind = self.epoch_indexes[self.rank:self.total_size:self.num_replicas] 74 | 75 | def __iter__(self): 76 | return iter(self.process_ind) 77 | 78 | def __len__(self): 79 | return len(self.process_ind) 80 | 81 | 82 | def train_network(args, models, optimizers, dataset): 83 | """ 84 | Train the models with cluster assignments as targets 85 | """ 86 | # switch to train mode 87 | for model in models: 88 | model.train() 89 | 90 | # uniform sampling over pseudo labels 91 | sampler = DistUnifTargSampler( 92 | args.epoch_size, 93 | dataset.sub_classes, 94 | args.training_local_world_size, 95 | args.training_local_rank, 96 | seed=args.epoch + args.training_local_world_id, 97 | ) 98 | 99 | loader = torch.utils.data.DataLoader( 100 | dataset, 101 | sampler=sampler, 102 | batch_size=args.batch_size, 103 | num_workers=args.workers, 104 | pin_memory=True, 105 | ) 106 | 107 | # running statistics 108 | batch_time = AverageMeter() 109 | data_time = AverageMeter() 110 | 111 | # training statistics 112 | log_top1_subclass = AverageMeter() 113 | log_loss_subclass = AverageMeter() 114 | log_top1_superclass = AverageMeter() 115 | log_loss_superclass = AverageMeter() 116 | 117 | log_top1 = AverageMeter() 118 | log_loss = AverageMeter() 119 | end = time.perf_counter() 120 | 121 | cel = nn.CrossEntropyLoss().cuda() 122 | relu = torch.nn.ReLU().cuda() 123 | 124 | for iter_epoch, (inp, target) in enumerate(loader): 125 | # resume at iteration start_iter 126 | if iter_epoch < args.start_iter: 127 | continue 128 | 129 | # measure data loading time 130 | data_time.update(time.perf_counter() - end) 131 | 132 | # move input to gpu 133 | inp = inp.cuda(non_blocking=True) 134 | target = target.cuda(non_blocking=True).long() 135 | 136 | # forward on the model 137 | inp = relu(models[0](inp)) 138 | 139 | # forward on sub-class prediction layer 140 | output = models[-1](inp) 141 | loss_subclass = cel(output, target) 142 | 143 | # forward on super-class prediction layer 144 | super_class_output = models[1](inp) 145 | sc_target = args.training_local_world_id + \ 146 | 0 * torch.cuda.LongTensor(args.batch_size) 147 | loss_superclass = cel(super_class_output, sc_target) 148 | 149 | loss = loss_subclass +
loss_superclass 150 | 151 | # initialize the optimizers 152 | for optimizer in optimizers: 153 | optimizer.zero_grad() 154 | 155 | # compute the gradients 156 | loss.backward() 157 | 158 | # step 159 | for optimizer in optimizers: 160 | optimizer.step() 161 | 162 | # log 163 | 164 | # signal received, relaunch experiment 165 | if os.environ['SIGNAL_RECEIVED'] == 'True': 166 | save_checkpoint(args, iter_epoch + 1, models, optimizers) 167 | if not args.rank: 168 | trigger_job_requeue(os.path.join(args.dump_path, 'checkpoint.pth.tar')) 169 | 170 | # regular checkpoints 171 | if iter_epoch and iter_epoch % 1000 == 0: 172 | save_checkpoint(args, iter_epoch + 1, models, optimizers) 173 | 174 | # update stats 175 | log_loss.update(loss.item(), output.size(0)) 176 | prec1 = accuracy(args, output, target, sc_output=super_class_output) 177 | log_top1.update(prec1.item(), output.size(0)) 178 | 179 | log_loss_superclass.update(loss_superclass.item(), output.size(0)) 180 | prec1 = accuracy(args, super_class_output, sc_target) 181 | log_top1_superclass.update(prec1.item(), output.size(0)) 182 | 183 | log_loss_subclass.update(loss_subclass.item(), output.size(0)) 184 | prec1 = accuracy(args, output, target) 185 | log_top1_subclass.update(prec1.item(), output.size(0)) 186 | 187 | batch_time.update(time.perf_counter() - end) 188 | end = time.perf_counter() 189 | 190 | # verbose 191 | if iter_epoch % 100 == 0: 192 | logger.info('Epoch[{0}] - Iter: [{1}/{2}]\t' 193 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 194 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 195 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 196 | 'Prec {log_top1.val:.3f} ({log_top1.avg:.3f})\t' 197 | 'Super-class loss: {sc_loss.val:.3f} ({sc_loss.avg:.3f})\t' 198 | 'Super-class prec: {sc_prec.val:.3f} ({sc_prec.avg:.3f})\t' 199 | 'Intra super-class loss: {los.val:.3f} ({los.avg:.3f})\t' 200 | 'Intra super-class prec: {prec.val:.3f} ({prec.avg:.3f})\t' 201 | .format(args.epoch, iter_epoch, len(loader), batch_time=batch_time, 202 | data_time=data_time, loss=log_loss, log_top1=log_top1, 203 | sc_loss=log_loss_superclass, sc_prec=log_top1_superclass, 204 | los=log_loss_subclass, prec=log_top1_subclass)) 205 | 206 | # end of epoch 207 | args.start_iter = 0 208 | args.epoch += 1 209 | 210 | # dump checkpoint 211 | save_checkpoint(args, 0, models, optimizers) 212 | if not args.rank: 213 | if not (args.epoch - 1) % args.checkpoint_freq: 214 | shutil.copyfile( 215 | os.path.join(args.dump_path, 'checkpoint.pth.tar'), 216 | os.path.join(args.dump_checkpoints, 217 | 'checkpoint' + str(args.epoch - 1) + '.pth.tar'), 218 | ) 219 | 220 | return (args.epoch - 1, 221 | args.epoch * len(loader), 222 | log_top1.avg, log_loss.avg, 223 | log_top1_superclass.avg, log_loss_superclass.avg, 224 | log_top1_subclass.avg, log_loss_subclass.avg, 225 | ) 226 | 227 | 228 | def save_checkpoint(args, iter_epoch, models, optimizers, path=''): 229 | if not os.path.isfile(path): 230 | path = os.path.join(args.dump_path, 'checkpoint.pth.tar') 231 | 232 | # main process saves the training state 233 | if not args.rank: 234 | torch.save({ 235 | 'epoch': args.epoch, 236 | 'start_iter': iter_epoch, 237 | 'state_dict': models[0].state_dict(), 238 | 'optimizer': optimizers[0].state_dict(), 239 | 'pred_layer_state_dict': models[1].state_dict(), 240 | 'optimizer_pred_layer': optimizers[1].state_dict(), 241 | }, path) 242 | 243 | # main local training process saves the last layer 244 | if not args.training_local_rank: 245 | torch.save({ 246 | 'epoch': args.epoch, 247 | 
'start_iter': iter_epoch, 248 | 'state_dict': models[-1].state_dict(), 249 | 'optimizer': optimizers[-1].state_dict(), 250 | }, os.path.join(args.dump_path, str(args.training_local_world_id) + '-pred_layer.pth.tar')) 251 | 252 | 253 | def accuracy(args, output, target, sc_output=None): 254 | """Computes the accuracy over the k top predictions for the specified values of k""" 255 | with torch.no_grad(): 256 | 257 | batch_size = target.size(0) 258 | 259 | _, pred = output.topk(1, 1, True, True) 260 | pred = pred.t() 261 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 262 | 263 | if sc_output is not None: 264 | _, pred = sc_output.topk(1, 1, True, True) 265 | pred = pred.t() 266 | target = args.training_local_world_id + 0 * torch.cuda.LongTensor(batch_size) 267 | correct_sc = pred.eq(target.view(1, -1).expand_as(pred)) 268 | correct *= correct_sc 269 | 270 | correct_1 = correct[:1].view(-1).float().sum(0, keepdim=True) 271 | return correct_1.mul_(100.0 / batch_size) 272 | 273 | 274 | def validate_network(val_loader, models, args): 275 | batch_time = AverageMeter() 276 | losses = AverageMeter() 277 | top1 = AverageMeter() 278 | 279 | # switch to evaluate mode 280 | for model in models: 281 | model.eval() 282 | 283 | criterion = nn.CrossEntropyLoss().cuda() 284 | 285 | with torch.no_grad(): 286 | end = time.perf_counter() 287 | for i, (inp, target) in enumerate(val_loader): 288 | 289 | # move to gpu 290 | inp = inp.cuda(non_blocking=True) 291 | target = target.cuda(non_blocking=True) 292 | 293 | # compute output 294 | output = inp 295 | for model in models: 296 | output = model(output) 297 | loss = criterion(output, target) 298 | 299 | # measure accuracy and record loss 300 | acc1 = accuracy(args, output, target) 301 | losses.update(loss.item(), inp.size(0)) 302 | top1.update(acc1[0], inp.size(0)) 303 | 304 | # measure elapsed time 305 | batch_time.update(time.perf_counter() - end) 306 | end = time.perf_counter() 307 | 308 | if i % 100 == 0: 309 | logger.info('Test: [{0}/{1}]\t' 310 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 311 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 312 | 'Acc@1 {top1.val:.3f} ({top1.avg:.3f})\t' 313 | .format(i, len(val_loader), batch_time=batch_time, 314 | loss=losses, top1=top1)) 315 | 316 | return (top1.avg.item(), losses.avg) 317 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | from logging import getLogger 10 | import os 11 | import pickle 12 | import shutil 13 | import time 14 | 15 | import numpy as np 16 | from scipy.sparse import csr_matrix 17 | import torch 18 | import torch.distributed as dist 19 | 20 | from .logger import create_logger, PD_Stats 21 | 22 | 23 | FALSY_STRINGS = {'off', 'false', '0'} 24 | TRUTHY_STRINGS = {'on', 'true', '1'} 25 | 26 | 27 | logger = getLogger() 28 | 29 | 30 | def bool_flag(s): 31 | """ 32 | Parse boolean arguments from the command line. 
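    Example (editor's sketch): wired into argparse so that 'on/off'-style
    command-line values parse robustly:

        parser = argparse.ArgumentParser()
        parser.add_argument('--sobel', type=bool_flag, default=True)
        parser.parse_args(['--sobel', 'off']).sobel   # False
        parser.parse_args(['--sobel', '1']).sobel     # True
        parser.parse_args(['--sobel', 'yes'])         # rejected: "invalid value for a boolean flag"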
33 | """ 34 | if s.lower() in FALSY_STRINGS: 35 | return False 36 | elif s.lower() in TRUTHY_STRINGS: 37 | return True 38 | else: 39 | raise argparse.ArgumentTypeError("invalid value for a boolean flag") 40 | 41 | 42 | def init_distributed_mode(args, make_communication_groups=True): 43 | """ 44 | Handle single and multi-GPU / multi-node / SLURM jobs. 45 | Initialize the following variables: 46 | - global rank 47 | 48 | - clustering_local_rank 49 | - clustering_local_world_size 50 | - clustering_local_world_id 51 | 52 | - training_local_rank 53 | - training_local_world_size 54 | - training_local_world_id 55 | 56 | - rotation 57 | """ 58 | 59 | args.is_slurm_job = 'SLURM_JOB_ID' in os.environ and not args.debug_slurm 60 | 61 | if args.is_slurm_job: 62 | args.rank = int(os.environ['SLURM_PROCID']) 63 | else: 64 | # jobs started with torch.distributed.launch 65 | # read environment variables 66 | args.rank = int(os.environ['RANK']) 67 | args.world_size = int(os.environ['WORLD_SIZE']) 68 | 69 | # prepare distributed 70 | dist.init_process_group(backend='nccl', init_method=args.dist_url, 71 | world_size=args.world_size, rank=args.rank) 72 | 73 | # set cuda device 74 | args.gpu_to_work_on = args.rank % torch.cuda.device_count() 75 | torch.cuda.set_device(args.gpu_to_work_on) 76 | 77 | if not make_communication_groups: 78 | return None, None 79 | 80 | # each super_class has the same number of processes 81 | assert args.world_size % args.super_classes == 0 82 | 83 | # each super-class forms a training communication group 84 | args.training_local_world_size = args.world_size // args.super_classes 85 | args.training_local_rank = args.rank % args.training_local_world_size 86 | args.training_local_world_id = args.rank // args.training_local_world_size 87 | 88 | # prepare training groups 89 | training_groups = [] 90 | for group_id in range(args.super_classes): 91 | ranks = [args.training_local_world_size * group_id + i \ 92 | for i in range(args.training_local_world_size)] 93 | training_groups.append(dist.new_group(ranks=ranks)) 94 | 95 | # compute number of super-clusters 96 | if args.rotnet: 97 | assert args.super_classes % 4 == 0 98 | args.nmb_super_clusters = args.super_classes // 4 99 | else: 100 | args.nmb_super_clusters = args.super_classes 101 | 102 | # prepare clustering communication groups 103 | args.clustering_local_world_size = args.training_local_world_size * \ 104 | (args.super_classes // args.nmb_super_clusters) 105 | args.clustering_local_rank = args.rank % args.clustering_local_world_size 106 | args.clustering_local_world_id = args.rank // args.clustering_local_world_size 107 | 108 | clustering_groups = [] 109 | for group_id in range(args.nmb_super_clusters): 110 | ranks = [args.clustering_local_world_size * group_id + i \ 111 | for i in range(args.clustering_local_world_size)] 112 | clustering_groups.append(dist.new_group(ranks=ranks)) 113 | 114 | # this process deals only with a certain rotation 115 | if args.rotnet: 116 | args.rotation = args.clustering_local_rank // args.training_local_world_size 117 | else: 118 | args.rotation = 0 119 | 120 | return training_groups, clustering_groups 121 | 122 | 123 | def check_parameters(args): 124 | """ 125 | Check that the configuration of arguments is consistent.
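    Worked example (editor's addition, hypothetical sizes) for the
    communication groups built in init_distributed_mode above: with
    world_size=64, super_classes=16 and rotnet=True,

        training_local_world_size   = 64 // 16 = 4         # 16 training groups of 4 ranks
        nmb_super_clusters          = 16 // 4 = 4          # 4 rotation classes per super-cluster
        clustering_local_world_size = 4 * (16 // 4) = 16   # 4 clustering groups of 16 ranks
        # e.g. rank 37: training group 37 // 4 = 9, clustering group 37 // 16 = 2,
        # rotation (37 % 16) // 4 = 1 (the second of the four rotation classes)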
126 | """ 127 | args.size_dataset = min(args.size_dataset, 95920149) 128 | 129 | # make dataset size divisible by both the batch-size and the world-size 130 | div = args.batch_size * args.world_size 131 | args.size_dataset = args.size_dataset // div * div 132 | 133 | args.epoch_size = args.size_dataset // args.nmb_super_clusters // 4 134 | args.epoch_size = args.epoch_size // div * div 135 | 136 | assert args.super_classes 137 | 138 | # number of super classes must be divisible by the number of rotation categories 139 | if args.rotnet: 140 | assert args.super_classes % 4 == 0 141 | 142 | # feature dimension 143 | assert args.dim_pca <= 4096 144 | 145 | 146 | def initialize_exp(params, *args): 147 | """ 148 | Initialize the experiment: 149 | - dump parameters 150 | - create checkpoint and cache repos 151 | - create a logger 152 | - create a pandas object to log the training statistics 153 | """ 154 | # dump parameters 155 | pickle.dump(params, open(os.path.join(params.dump_path, 'params.pkl'), 'wb')) 156 | 157 | # create repo to store checkpoints 158 | params.dump_checkpoints = os.path.join(params.dump_path, 'checkpoints') 159 | if not params.rank and not os.path.isdir(params.dump_checkpoints): 160 | os.mkdir(params.dump_checkpoints) 161 | 162 | # create repo to cache activations between the two stages of the hierarchical k-means 163 | if not params.rank and not os.path.isdir(os.path.join(params.dump_path, 'cache')): 164 | os.mkdir(os.path.join(params.dump_path, 'cache')) 165 | 166 | # create a pandas object to log loss and accuracy 167 | training_stats = PD_Stats( 168 | os.path.join(params.dump_path, 'stats' + str(params.rank) + '.pkl'), 169 | args, 170 | ) 171 | 172 | # create a logger 173 | logger = create_logger(os.path.join(params.dump_path, 'train.log'), rank=params.rank) 174 | logger.info("============ Initialized logger ============") 175 | logger.info("\n".join("%s: %s" % (k, str(v)) 176 | for k, v in sorted(dict(vars(params)).items()))) 177 | logger.info("The experiment will be stored in %s\n" % params.dump_path) 178 | logger.info("") 179 | 180 | return logger, training_stats 181 | 182 | 183 | def end_of_epoch(args): 184 | """ 185 | Remove cluster assignments from the experiment repository 186 | """ 187 | 188 | def src_dst(what, cl=False): 189 | src = os.path.join( 190 | args.dump_path, 191 | what + cl * str(args.clustering_local_world_id) + '.pkl', 192 | ) 193 | dst = os.path.join( 194 | args.dump_checkpoints, 195 | what + '{}-epoch{}.pkl'.format(cl * args.clustering_local_world_id, args.epoch - 1), 196 | ) 197 | return src, dst 198 | 199 | # only the main processes work here 200 | if not args.clustering_local_rank: 201 | for what in ['cluster_assignments', 'centroids']: 202 | src, dst = src_dst(what, cl=True) 203 | if not (args.epoch - 1) % args.checkpoint_freq: 204 | shutil.copy(src, dst) 205 | if 'centroids' not in src: 206 | os.remove(src) 207 | 208 | if not args.rank: 209 | for what in ['super_class_assignments', 'super_class_centroids']: 210 | src, dst = src_dst(what) 211 | if not (args.epoch - 1) % args.checkpoint_freq: 212 | shutil.copy(src, dst) 213 | os.remove(src) 214 | 215 | 216 | def restart_from_checkpoint(args, ckp_path=None, run_variables=None, **kwargs): 217 | """ 218 | Restart from a checkpoint present in the experiment repository 219 | """ 220 | if ckp_path is None: 221 | ckp_path = os.path.join(args.dump_path, 'checkpoint.pth.tar') 222 | 223 | # look for a checkpoint in exp repository 224 | if not os.path.isfile(ckp_path): 225 | return 226 | 227 | logger.info('Found checkpoint in
experiment repository') 228 | 229 | # open checkpoint file 230 | map_location = None 231 | if args.world_size > 1: 232 | map_location = "cuda:" + str(args.gpu_to_work_on) 233 | checkpoint = torch.load(ckp_path, map_location=map_location) 234 | 235 | # key is what to look for in the checkpoint file 236 | # value is the object to load 237 | # example: {'state_dict': model} 238 | for key, value in kwargs.items(): 239 | if key in checkpoint and value is not None: 240 | value.load_state_dict(checkpoint[key]) 241 | logger.info("=> loaded {} from checkpoint '{}'" 242 | .format(key, ckp_path)) 243 | else: 244 | logger.warning("=> failed to load {} from checkpoint '{}'" 245 | .format(key, ckp_path)) 246 | 247 | # reload the variables that are important for the run 248 | if run_variables is not None: 249 | for var_name in run_variables: 250 | if var_name in checkpoint: 251 | run_variables[var_name] = checkpoint[var_name] 252 | 253 | 254 | def fix_random_seeds(seed=1993): 255 | """ 256 | Fix random seeds. 257 | """ 258 | torch.manual_seed(seed) 259 | torch.cuda.manual_seed_all(seed) 260 | np.random.seed(seed) 261 | 262 | 263 | class PCA(): 264 | """ 265 | Class to compute and apply PCA. 266 | """ 267 | def __init__(self, dim=256, whit=0.5): 268 | self.dim = dim 269 | self.whit = whit 270 | self.mean = None 271 | 272 | def train_pca(self, cov): 273 | """ 274 | Takes a covariance matrix (np.ndarray) as input. 275 | """ 276 | d, v = np.linalg.eigh(cov) 277 | eps = d.max() * 1e-5 278 | n_0 = (d < eps).sum() 279 | if n_0 > 0: 280 | d[d < eps] = eps 281 | 282 | # total energy 283 | totenergy = d.sum() 284 | 285 | # sort eigenvectors with eigenvalues order 286 | idx = np.argsort(d)[::-1][:self.dim] 287 | d = d[idx] 288 | v = v[:, idx] 289 | 290 | logger.warning("keeping %.2f %% of the energy" % (d.sum() / totenergy * 100.0)) 291 | 292 | # for the whitening 293 | d = np.diag(1. / d**self.whit) 294 | 295 | # principal components 296 | self.dvt = np.dot(d, v.T)
297 | 298 | def apply(self, x): 299 | # input is from numpy 300 | if isinstance(x, np.ndarray): 301 | if self.mean is not None: 302 | x -= self.mean 303 | return np.dot(self.dvt, x.T).T 304 | 305 | # input is from torch and is on GPU 306 | if x.is_cuda: 307 | if self.mean is not None: 308 | x -= torch.cuda.FloatTensor(self.mean) 309 | return torch.mm(torch.cuda.FloatTensor(self.dvt), x.transpose(0, 1)).transpose(0, 1) 310 | 311 | # input is from torch, on CPU 312 | if self.mean is not None: 313 | x -= torch.FloatTensor(self.mean) 314 | return torch.mm(torch.FloatTensor(self.dvt), x.transpose(0, 1)).transpose(0, 1) 315 | 316 | 317 | class AverageMeter(object): 318 | """Computes and stores the average and current value""" 319 | def __init__(self): 320 | self.reset() 321 | 322 | def reset(self): 323 | self.val = 0 324 | self.avg = 0 325 | self.sum = 0 326 | self.count = 0 327 | 328 | def update(self, val, n=1): 329 | self.val = val 330 | self.sum += val * n 331 | self.count += n 332 | self.avg = self.sum / self.count 333 | 334 | 335 | def normalize(data): 336 | # data is a numpy array 337 | if isinstance(data, np.ndarray): 338 | row_sums = np.linalg.norm(data, axis=1) 339 | data = data / row_sums[:, np.newaxis] 340 | return data 341 | 342 | # data is a tensor 343 | row_sums = data.norm(dim=1, keepdim=True) 344 | data = data / row_sums 345 | return data 346 | 347 | 348 | def compute_M(data): 349 | cols = np.arange(data.size) 350 | return csr_matrix((cols, (data.ravel(), cols)), 351 | shape=(data.max() + 1, data.size)) 352 | 353 | def get_indices_sparse(data): 354 | M = compute_M(data) 355 | return [np.unravel_index(row.data, data.shape) for row in M] 356 | --------------------------------------------------------------------------------
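Editor's note: a small sketch (not in the original sources) of what get_indices_sparse above computes. It is a vectorized equivalent of calling np.where once per label, built from a single CSR pass, which is what keeps scanning the ~96M YFCC100M pseudo-labels tractable:

    import numpy as np
    data = np.array([2, 0, 2, 1, 0])
    get_indices_sparse(data)
    # [(array([1, 4]),), (array([3]),), (array([0, 2]),)]
    # i.e. the same result as [np.where(data == k) for k in range(data.max() + 1)]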