├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── conv13.sh ├── distributed_training.png ├── download_models.sh ├── eval_linear.py ├── eval_pretrain.py ├── eval_voc_classif.py ├── linear_classif_layers.sh ├── main.py ├── main.sh └── src ├── __init__.py ├── clustering.py ├── data ├── VOC2007.py ├── YFCC100M.py ├── __init__.py └── loader.py ├── distributed_kmeans.py ├── logger.py ├── model ├── __init__.py ├── model_factory.py ├── pretrain.py └── vgg16.py ├── slurm.py ├── trainer.py └── utils.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to. 4 | Please read the [full text](https://code.fb.com/codeofconduct/) 5 | so that you can understand what actions will and will not be tolerated. 6 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | In the context of this project, we do not expect pull requests. 4 | If you find a bug, or would like to suggest an improvement, please open an issue. 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-NonCommercial 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. 
More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution-NonCommercial 4.0 International Public 58 | License 59 | 60 | By exercising the Licensed Rights (defined below), You accept and agree 61 | to be bound by the terms and conditions of this Creative Commons 62 | Attribution-NonCommercial 4.0 International Public License ("Public 63 | License"). To the extent this Public License may be interpreted as a 64 | contract, You are granted the Licensed Rights in consideration of Your 65 | acceptance of these terms and conditions, and the Licensor grants You 66 | such rights in consideration of benefits the Licensor receives from 67 | making the Licensed Material available under these terms and 68 | conditions. 69 | 70 | Section 1 -- Definitions. 71 | 72 | a. Adapted Material means material subject to Copyright and Similar 73 | Rights that is derived from or based upon the Licensed Material 74 | and in which the Licensed Material is translated, altered, 75 | arranged, transformed, or otherwise modified in a manner requiring 76 | permission under the Copyright and Similar Rights held by the 77 | Licensor. For purposes of this Public License, where the Licensed 78 | Material is a musical work, performance, or sound recording, 79 | Adapted Material is always produced where the Licensed Material is 80 | synched in timed relation with a moving image. 81 | 82 | b. Adapter's License means the license You apply to Your Copyright 83 | and Similar Rights in Your contributions to Adapted Material in 84 | accordance with the terms and conditions of this Public License. 85 | 86 | c. Copyright and Similar Rights means copyright and/or similar rights 87 | closely related to copyright including, without limitation, 88 | performance, broadcast, sound recording, and Sui Generis Database 89 | Rights, without regard to how the rights are labeled or 90 | categorized. For purposes of this Public License, the rights 91 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 92 | Rights. 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. 
Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. NonCommercial means not primarily intended for or directed towards 116 | commercial advantage or monetary compensation. For purposes of 117 | this Public License, the exchange of the Licensed Material for 118 | other material subject to Copyright and Similar Rights by digital 119 | file-sharing or similar means is NonCommercial provided there is 120 | no payment of monetary compensation in connection with the 121 | exchange. 122 | 123 | j. Share means to provide material to the public by any means or 124 | process that requires permission under the Licensed Rights, such 125 | as reproduction, public display, public performance, distribution, 126 | dissemination, communication, or importation, and to make material 127 | available to the public including in ways that members of the 128 | public may access the material from a place and at a time 129 | individually chosen by them. 130 | 131 | k. Sui Generis Database Rights means rights other than copyright 132 | resulting from Directive 96/9/EC of the European Parliament and of 133 | the Council of 11 March 1996 on the legal protection of databases, 134 | as amended and/or succeeded, as well as other essentially 135 | equivalent rights anywhere in the world. 136 | 137 | l. You means the individual or entity exercising the Licensed Rights 138 | under this Public License. Your has a corresponding meaning. 139 | 140 | Section 2 -- Scope. 141 | 142 | a. License grant. 143 | 144 | 1. Subject to the terms and conditions of this Public License, 145 | the Licensor hereby grants You a worldwide, royalty-free, 146 | non-sublicensable, non-exclusive, irrevocable license to 147 | exercise the Licensed Rights in the Licensed Material to: 148 | 149 | a. reproduce and Share the Licensed Material, in whole or 150 | in part, for NonCommercial purposes only; and 151 | 152 | b. produce, reproduce, and Share Adapted Material for 153 | NonCommercial purposes only. 154 | 155 | 2. Exceptions and Limitations. For the avoidance of doubt, where 156 | Exceptions and Limitations apply to Your use, this Public 157 | License does not apply, and You do not need to comply with 158 | its terms and conditions. 159 | 160 | 3. Term. The term of this Public License is specified in Section 161 | 6(a). 162 | 163 | 4. Media and formats; technical modifications allowed. The 164 | Licensor authorizes You to exercise the Licensed Rights in 165 | all media and formats whether now known or hereafter created, 166 | and to make technical modifications necessary to do so. 
The 167 | Licensor waives and/or agrees not to assert any right or 168 | authority to forbid You from making technical modifications 169 | necessary to exercise the Licensed Rights, including 170 | technical modifications necessary to circumvent Effective 171 | Technological Measures. For purposes of this Public License, 172 | simply making modifications authorized by this Section 2(a) 173 | (4) never produces Adapted Material. 174 | 175 | 5. Downstream recipients. 176 | 177 | a. Offer from the Licensor -- Licensed Material. Every 178 | recipient of the Licensed Material automatically 179 | receives an offer from the Licensor to exercise the 180 | Licensed Rights under the terms and conditions of this 181 | Public License. 182 | 183 | b. No downstream restrictions. You may not offer or impose 184 | any additional or different terms or conditions on, or 185 | apply any Effective Technological Measures to, the 186 | Licensed Material if doing so restricts exercise of the 187 | Licensed Rights by any recipient of the Licensed 188 | Material. 189 | 190 | 6. No endorsement. Nothing in this Public License constitutes or 191 | may be construed as permission to assert or imply that You 192 | are, or that Your use of the Licensed Material is, connected 193 | with, or sponsored, endorsed, or granted official status by, 194 | the Licensor or others designated to receive attribution as 195 | provided in Section 3(a)(1)(A)(i). 196 | 197 | b. Other rights. 198 | 199 | 1. Moral rights, such as the right of integrity, are not 200 | licensed under this Public License, nor are publicity, 201 | privacy, and/or other similar personality rights; however, to 202 | the extent possible, the Licensor waives and/or agrees not to 203 | assert any such rights held by the Licensor to the limited 204 | extent necessary to allow You to exercise the Licensed 205 | Rights, but not otherwise. 206 | 207 | 2. Patent and trademark rights are not licensed under this 208 | Public License. 209 | 210 | 3. To the extent possible, the Licensor waives any right to 211 | collect royalties from You for the exercise of the Licensed 212 | Rights, whether directly or through a collecting society 213 | under any voluntary or waivable statutory or compulsory 214 | licensing scheme. In all other cases the Licensor expressly 215 | reserves any right to collect such royalties, including when 216 | the Licensed Material is used other than for NonCommercial 217 | purposes. 218 | 219 | Section 3 -- License Conditions. 220 | 221 | Your exercise of the Licensed Rights is expressly made subject to the 222 | following conditions. 223 | 224 | a. Attribution. 225 | 226 | 1. If You Share the Licensed Material (including in modified 227 | form), You must: 228 | 229 | a. retain the following if it is supplied by the Licensor 230 | with the Licensed Material: 231 | 232 | i. identification of the creator(s) of the Licensed 233 | Material and any others designated to receive 234 | attribution, in any reasonable manner requested by 235 | the Licensor (including by pseudonym if 236 | designated); 237 | 238 | ii. a copyright notice; 239 | 240 | iii. a notice that refers to this Public License; 241 | 242 | iv. a notice that refers to the disclaimer of 243 | warranties; 244 | 245 | v. a URI or hyperlink to the Licensed Material to the 246 | extent reasonably practicable; 247 | 248 | b. indicate if You modified the Licensed Material and 249 | retain an indication of any previous modifications; and 250 | 251 | c. 
indicate the Licensed Material is licensed under this 252 | Public License, and include the text of, or the URI or 253 | hyperlink to, this Public License. 254 | 255 | 2. You may satisfy the conditions in Section 3(a)(1) in any 256 | reasonable manner based on the medium, means, and context in 257 | which You Share the Licensed Material. For example, it may be 258 | reasonable to satisfy the conditions by providing a URI or 259 | hyperlink to a resource that includes the required 260 | information. 261 | 262 | 3. If requested by the Licensor, You must remove any of the 263 | information required by Section 3(a)(1)(A) to the extent 264 | reasonably practicable. 265 | 266 | 4. If You Share Adapted Material You produce, the Adapter's 267 | License You apply must not prevent recipients of the Adapted 268 | Material from complying with this Public License. 269 | 270 | Section 4 -- Sui Generis Database Rights. 271 | 272 | Where the Licensed Rights include Sui Generis Database Rights that 273 | apply to Your use of the Licensed Material: 274 | 275 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 276 | to extract, reuse, reproduce, and Share all or a substantial 277 | portion of the contents of the database for NonCommercial purposes 278 | only; 279 | 280 | b. if You include all or a substantial portion of the database 281 | contents in a database in which You have Sui Generis Database 282 | Rights, then the database in which You have Sui Generis Database 283 | Rights (but not its individual contents) is Adapted Material; and 284 | 285 | c. You must comply with the conditions in Section 3(a) if You Share 286 | all or a substantial portion of the contents of the database. 287 | 288 | For the avoidance of doubt, this Section 4 supplements and does not 289 | replace Your obligations under this Public License where the Licensed 290 | Rights include other Copyright and Similar Rights. 291 | 292 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 293 | 294 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 295 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 296 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 297 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 298 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 299 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 300 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 301 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 302 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 303 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 304 | 305 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 306 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 307 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 308 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 309 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 310 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 311 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 312 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 313 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 314 | 315 | c. 
The disclaimer of warranties and limitation of liability provided 316 | above shall be interpreted in a manner that, to the extent 317 | possible, most closely approximates an absolute disclaimer and 318 | waiver of all liability. 319 | 320 | Section 6 -- Term and Termination. 321 | 322 | a. This Public License applies for the term of the Copyright and 323 | Similar Rights licensed here. However, if You fail to comply with 324 | this Public License, then Your rights under this Public License 325 | terminate automatically. 326 | 327 | b. Where Your right to use the Licensed Material has terminated under 328 | Section 6(a), it reinstates: 329 | 330 | 1. automatically as of the date the violation is cured, provided 331 | it is cured within 30 days of Your discovery of the 332 | violation; or 333 | 334 | 2. upon express reinstatement by the Licensor. 335 | 336 | For the avoidance of doubt, this Section 6(b) does not affect any 337 | right the Licensor may have to seek remedies for Your violations 338 | of this Public License. 339 | 340 | c. For the avoidance of doubt, the Licensor may also offer the 341 | Licensed Material under separate terms or conditions or stop 342 | distributing the Licensed Material at any time; however, doing so 343 | will not terminate this Public License. 344 | 345 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 346 | License. 347 | 348 | Section 7 -- Other Terms and Conditions. 349 | 350 | a. The Licensor shall not be bound by any additional or different 351 | terms or conditions communicated by You unless expressly agreed. 352 | 353 | b. Any arrangements, understandings, or agreements regarding the 354 | Licensed Material not stated herein are separate from and 355 | independent of the terms and conditions of this Public License. 356 | 357 | Section 8 -- Interpretation. 358 | 359 | a. For the avoidance of doubt, this Public License does not, and 360 | shall not be interpreted to, reduce, limit, restrict, or impose 361 | conditions on any use of the Licensed Material that could lawfully 362 | be made without permission under this Public License. 363 | 364 | b. To the extent possible, if any provision of this Public License is 365 | deemed unenforceable, it shall be automatically reformed to the 366 | minimum extent necessary to make it enforceable. If the provision 367 | cannot be reformed, it shall be severed from this Public License 368 | without affecting the enforceability of the remaining terms and 369 | conditions. 370 | 371 | c. No term or condition of this Public License will be waived and no 372 | failure to comply consented to unless expressly agreed to by the 373 | Licensor. 374 | 375 | d. Nothing in this Public License constitutes or may be interpreted 376 | as a limitation upon, or waiver of, any privileges and immunities 377 | that apply to the Licensor or You, including from the legal 378 | processes of any jurisdiction or authority. 379 | 380 | ======================================================================= 381 | 382 | Creative Commons is not a party to its public 383 | licenses. Notwithstanding, Creative Commons may elect to apply one of 384 | its public licenses to material it publishes and in those instances 385 | will be considered the “Licensor.” The text of the Creative Commons 386 | public licenses is dedicated to the public domain under the CC0 Public 387 | Domain Dedication. 
Except for the limited purpose of indicating that 388 | material is shared under a Creative Commons public license or as 389 | otherwise permitted by the Creative Commons policies published at 390 | creativecommons.org/policies, Creative Commons does not authorize the 391 | use of the trademark "Creative Commons" or any other trademark or logo 392 | of Creative Commons without its prior written consent including, 393 | without limitation, in connection with any unauthorized modifications 394 | to any of its public licenses or any other arrangements, 395 | understandings, or agreements concerning use of licensed material. For 396 | the avoidance of doubt, this paragraph does not form part of the 397 | public licenses. 398 | 399 | Creative Commons may be contacted at creativecommons.org. 400 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DeeperCluster: Unsupervised Pre-training of Image Features on Non-Curated Data 2 | 3 | This code implements the unsupervised pre-training of convolutional neural networks, or convnets, as described in [Unsupervised Pre-training of Image Features on Non-Curated Data](https://arxiv.org/abs/1905.01278). 4 | 5 | ## Models 6 | We provide for download the following models: 7 | * DeeperCluster model trained on the full YFCC100M dataset; 8 | * DeepCluster [2] model trained on 1.3M images subset of the YFCC100M dataset; 9 | * RotNet [3] model trained on the full YFCC100M dataset; 10 | * RotNet [3] model trained on ImageNet dataset without labels. 11 | 12 | All these models follow a standard VGG-16 architecture with batch-normalization layers. 13 | Note that in Deep/DeeperCluster models, sobel filters are computed within the models as two convolutional layers (greyscale + sobel filters). 14 | The models expect RGB inputs that range in [0, 1]. You should preprocess your data before passing them to the released models by normalizing them: ```mean_rgb = [0.485, 0.456, 0.406]```; ```std_rgb = [0.229, 0.224, 0.225] ```. 15 | 16 | | Method / Dataset | YFCC100M | ImageNet | 17 | |--------------------|--------------------|-----------| 18 | | DeeperCluster | [ours](https://dl.fbaipublicfiles.com/deepcluster/ours/ours.pth) | - | 19 | | DeepCluster | [deepcluster_yfcc100M](https://dl.fbaipublicfiles.com/deepcluster/deepcluster/deepcluster_flickr.pth) trained on 1.3M images | [deepcluster_imagenet](https://dl.fbaipublicfiles.com/deepcluster/vgg16/checkpoint.pth.tar) (found [here](https://github.com/facebookresearch/deepcluster)) | 20 | | RotNet | [rotnet_yfcc100M](https://dl.fbaipublicfiles.com/deepcluster/rotnet/rotnet_flickr.pth) | [rotnet_imagenet](https://dl.fbaipublicfiles.com/deepcluster/rotnet/rotnet_imagenet.pth) | 21 | 22 | To automatically download all models you can run: 23 | ``` 24 | $ ./download_models.sh 25 | ``` 26 | 27 | ## Requirements 28 | - Python 3.6 29 | - [PyTorch](http://pytorch.org) install 1.0.0 30 | - [Apex](https://github.com/NVIDIA/apex) with CUDA extension 31 | - [Faiss](https://github.com/facebookresearch/faiss) GPU install 32 | - Download [YFCC100M dataset](https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67&guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAI-kwr4-KyuBJKrOUt3nzqR8H9hxu4cel43rHsFuk_4mKhjPoepAekZ7thVhdnOX-oLYek43-YMLIGQ5xmyPzU0Rc--RJsuRMSvqzpxxpug7Mg7XEv15bBS030Ood5TfcXwna_hjdbCtiPeoCOl5Knhog71KhdWnrFwuX2TloFFJ). 
The ids of the 95,920,149 images we managed to download can be found [here](https://dl.fbaipublicfiles.com/deepcluster/flickr_unique_ids.npy). `wget -c -P ./src/data/ "https://dl.fbaipublicfiles.com/deepcluster/flickr_unique_ids.npy"`
33 | 
34 | ## Unsupervised Learning of Visual Features
35 | 
36 | The script ```main.sh``` will run our method. Here is an overview of its main arguments:
37 | ```
38 | python main.py
39 | 
40 | ## handling experiment parameters
41 | --dump_path ./exp/       # Where to store the experiment
42 | 
43 | ## network params
44 | --pretrained PRETRAINED  # Use this instead of random weights
45 | 
46 | ## data params
47 | --data_path DATA_PATH    # Where to find YFCC100M dataset
48 | --size_dataset 100000000 # How many images to use for training
49 | --workers 10             # Number of data loading workers
50 | --sobel true             # Apply Sobel filter
51 | 
52 | ## optim params
53 | --lr 0.1                 # Learning rate
54 | --wd 0.00001             # Weight decay
55 | --nepochs 100            # Number of epochs to run
56 | --batch_size 48          # Batch size per process
57 | 
58 | ## model params
59 | --reassignment 3         # Reassign clusters every this many epochs
60 | --dim_pca 4096           # Dimension of the PCA applied to the descriptors
61 | --super_classes 16       # Total number of super-classes
62 | --rotnet true            # Network needs to classify large rotations
63 | 
64 | ## k-means params
65 | --k 320000               # Total number of clusters
66 | --warm_restart false     # Use previous centroids as init
67 | --use_faiss true         # Use faiss for E step in k-means
68 | --niter 10               # Number of k-means iterations
69 | 
70 | ## distributed training params
71 | --world-size 64          # Number of distributed processes
72 | --dist-url DIST_URL      # URL used to set up distributed training
73 | ```
74 | 
75 | You can look up the full training documentation with ```python main.py --help```.
76 | 
77 | ### Distributed training
78 | This implementation only supports distributed training.
79 | It has been specifically designed for multi-GPU and multi-node training and tested up to 128 GPUs distributed across 16 nodes of 8 GPUs each.
80 | You can run the code in two different scenarios:
81 | 
82 | * 1- Submit your job to a compute cluster. This code is adapted to the SLURM job scheduler, but you can modify it for your own scheduler.
83 | 
84 | * 2- Prepend `export NGPU=xx; python -m torch.distributed.launch --nproc_per_node=$NGPU` to the python file you want to execute (with xx the number of GPUs you want).
85 | For example, to run an experiment with a single GPU on a single machine, simply replace `python main.py` with:
86 | ```
87 | export NGPU=1; python -m torch.distributed.launch --nproc_per_node=$NGPU main.py
88 | ```
89 | 
90 | 
91 | The parameter `rank` is set automatically in both scenarios in [utils.py](./src/utils.py#L42).
92 | 
93 | The parameter `local_rank` is more or less useless.
94 | 
95 | The parameter `world-size` needs to be set manually in scenario 1 and is set automatically in scenario 2.
96 | 
97 | The parameter `dist-url` needs to be set manually in both scenarios. Refer to the PyTorch distributed [documentation](https://pytorch.org/docs/stable/distributed.html) to correctly set the initialization method.
98 | 
99 | 
100 | The total number of GPUs used for an experiment (```world-size```) must be divisible by the total number of super-classes (```super_classes```).
101 | Hence, exactly ```super_classes``` training communication groups are created, each containing ```world_size / super_classes``` GPUs.
102 | The parameters of the sub-class classifier specific to a super-class are shared within the corresponding training group.
103 | Each training group deals only with the subset of images and the rotation angle associated with its corresponding super-class.
104 | For this reason, computing batch statistics in the batch normalization layers for *the entire batch* (distributed across the different training groups) is crucial.
105 | We do so thanks to [apex](https://github.com/NVIDIA/apex/tree/master/apex/parallel#synchronized-batch-normalization).
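
As an illustration, here is a minimal sketch of how such training groups can be built with `torch.distributed`. This is not the repository's exact code (the real group creation lives in [utils.py](./src/utils.py#L42)), and the contiguous-rank layout and function name are assumptions for illustration only:
```
import torch.distributed as dist

def make_training_groups(world_size, super_classes):
    # one communication group per super-class, over contiguous ranks;
    # every process must call new_group with the same arguments
    assert world_size % super_classes == 0
    group_size = world_size // super_classes
    groups = []
    for c in range(super_classes):
        ranks = list(range(c * group_size, (c + 1) * group_size))
        groups.append(dist.new_group(ranks=ranks))
    return groups
```
With `world_size=16` and `super_classes=8`, this yields 8 groups of 2 GPUs each, matching the example illustrated further below.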
106 | 
107 | For the first stage of hierarchical clustering into ```nmb_super_clusters``` clusters, the entire pool of GPUs is used.
108 | Then for the second stage, we create ```nmb_super_clusters``` clustering communication groups of ```world_size / nmb_super_clusters``` GPUs each.
109 | Each of these clustering groups independently performs the second stage of hierarchical clustering on its corresponding subset of data (the data belonging to the associated super-cluster).
110 | 
111 | For example, as illustrated below, let's assume we want to run a training with 8 super-classes and we have access to a pool of 16 GPUs.
112 | As many distributed training communication groups as the number of super-classes are created.
113 | This corresponds to creating 8 training groups (in red) of 2 GPUs.
114 | Moreover, since each super-cluster is combined with the 4 rotation classes, the first level of the hierarchical k-means corresponds to the clustering of the data into 8/4=2 super-clusters.
115 | Hence, 2 clustering groups (in blue) are created.
116 | ![distributed](./distributed_training.png)
117 | 
118 | You can have a look [here](./src/utils.py#L42) for more details about how we define the different communication groups.
119 | Multi-node communication is handled automatically by SLURM.
120 | 
121 | 
122 | ### Running DeepCluster or RotNet
123 | Our implementation is generic enough to encompass both DeepCluster and RotNet trainings.
124 | * DeepCluster: set ```super_classes``` to ```1``` and ```rotnet``` to ```false```.
125 | * RotNet: set ```super_classes``` to ```4```, ```k``` to ```1``` and ```rotnet``` to ```true```.
126 | 
127 | ## Evaluation protocols
128 | 
129 | ### Pascal VOC
130 | 
131 | To reproduce our results on the PASCAL VOC 2007 classification task run:
132 | * FC6-8
133 | ```
134 | python eval_voc_classif.py --data_path $PASCAL_DATASET --fc6_8 true --pretrained downloaded_models/deepercluster/ours.pth --sobel true --lr 0.003 --wd 0.00001 --nit 150000 --stepsize 20000 --split trainval
135 | ```
136 | 
137 | * ALL
138 | ```
139 | python eval_voc_classif.py --data_path $PASCAL_DATASET --fc6_8 false --pretrained downloaded_models/deepercluster/ours.pth --sobel true --lr 0.003 --wd 0.0001 --nit 150000 --stepsize 10000 --split trainval
140 | ```
141 | 
142 | **Running the experiment with 5 seeds.**
143 | There are different sources of randomness in the code: classifier initialization, random crops for the evaluation, and training with CUDA.
144 | For more reliable results, we recommend running the experiment several times with different seeds (`--seed 36` for example).
145 | 
146 | **Hyper-parameters selection.**
147 | We select the values of the different hyper-parameters (weight decay `wd`, learning rate `lr`, and step size `stepsize`) by training on the train split and validating on the validation set.
148 | To do so, simply use `--split train`.
149 | 
150 | ### Linear classifiers
151 | 
152 | We train linear classifiers with a logistic loss on top of frozen convolutional layers at different depths.
153 | To reduce the influence of feature dimension in the comparison, we average-pool the features until their dimension is below 10k.
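
For instance, here is a sketch of the conv13 case, using the pooling sizes of the `RegLog` class in [eval_linear.py](./eval_linear.py) and assuming a 224x224 input: the conv13 features of VGG-16 have shape 512x14x14 (about 100k dimensions) and are average-pooled down to 512x4x4 = 8192 dimensions before the linear classifier.
```
import torch
import torch.nn as nn

feats = torch.randn(1, 512, 14, 14)        # frozen conv13 features
pooled = nn.AvgPool2d(3, stride=3)(feats)  # -> (1, 512, 4, 4)
flat = pooled.view(pooled.size(0), -1)     # 512 * 4 * 4 = 8192 < 10k
logits = nn.Linear(8192, 1000)(flat)       # linear classifier (ImageNet)
```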
154 | 
155 | To reproduce our results from Table-3 run: `./conv13.sh`.
156 | 
157 | To reproduce our results from Figure-2 run: `./linear_classif_layers.sh`.
158 | 
159 | **Learning rates.**
160 | We use the learning rate decay recommended for linear models with L2 regularization by Léon Bottou in [Stochastic Gradient Descent Tricks](https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/tricks-2012.pdf).
161 | 
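Concretely, `learning_rate_decay` in [eval_linear.py](./eval_linear.py) decays the learning rate at every iteration `t` as `lr_t = lr_0 / sqrt(1 + lr_0 * wd * t)`. A minimal sketch of this schedule (the default values below are the ones `conv13.sh` uses for ImageNet):
```
import numpy as np

def lr_at(t, lr_0=0.02, wd=0.00001):
    # same decay as learning_rate_decay in eval_linear.py
    return lr_0 / np.sqrt(1 + lr_0 * wd * t)
```
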
162 | **Hyper-parameters selection.**
163 | For experiments on Pascal, we select the value of the initial learning rate by training on the train split and validating on the validation set.
164 | To do so, simply use `--split train`.
165 | For experiments on ImageNet and Places, this code implements k-fold cross-validation.
166 | Simply set `--kfold 3` for 3-fold cross-validation.
167 | For example, set `--cross_valid 0` to train on splits 1 and 2 and validate on split 0.
168 | 
169 | **Checkpointing and distributed training.**
170 | This code implements automatic checkpointing and is adapted to distributed training on multiple GPUs and/or multiple nodes.
171 | 
172 | ### Pre-training for ImageNet
173 | 
174 | To reproduce our results on the pre-training for ImageNet experiment (Table-2) run:
175 | 
176 | ```
177 | mkdir -p ./exp/pretraining_imagenet/
178 | export NGPU=1; python -m torch.distributed.launch --nproc_per_node=$NGPU eval_pretrain.py --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --sobel2RGB true --nepochs 100 --batch_size 256 --lr 0.1 --wd 0.0001 --dump_path ./exp/pretraining_imagenet/ --data_path $DATAPATH_IMAGENET
179 | ```
180 | 
181 | **Checkpointing and distributed training.**
182 | This code implements automatic checkpointing and is specifically intended for distributed training on multiple GPUs and/or multiple nodes.
183 | The results in the paper for this experiment are obtained with training on 4 GPUs (the batch size per GPU is 64 in this case).
184 | 
185 | ## References
186 | 
187 | ### Unsupervised Pre-training of Image Features on Non-Curated Data
188 | 
189 | [1] M. Caron, P. Bojanowski, J. Mairal, A. Joulin [Unsupervised Pre-training of Image Features on Non-Curated Data](https://arxiv.org/abs/1905.01278)
190 | ```
191 | @inproceedings{caron2019unsupervised,
192 |   title={Unsupervised Pre-Training of Image Features on Non-Curated Data},
193 |   author={Caron, Mathilde and Bojanowski, Piotr and Mairal, Julien and Joulin, Armand},
194 |   booktitle={Proceedings of the International Conference on Computer Vision (ICCV)},
195 |   year={2019}
196 | }
197 | ```
198 | 
199 | 
200 | ### Deep clustering for unsupervised pre-training of visual features
201 | 
202 | [code](https://github.com/facebookresearch/deepcluster)
203 | 
204 | [2] M. Caron, P. Bojanowski, A. Joulin, M. Douze [*Deep clustering for unsupervised learning of visual features*](http://openaccess.thecvf.com/content_ECCV_2018/html/Mathilde_Caron_Deep_Clustering_for_ECCV_2018_paper.html)
205 | ```
206 | @inproceedings{caron2018deep,
207 |   title={Deep clustering for unsupervised learning of visual features},
208 |   author={Caron, Mathilde and Bojanowski, Piotr and Joulin, Armand and Douze, Matthijs},
209 |   booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
210 |   year={2018}
211 | }
212 | ```
213 | 
214 | ### Unsupervised representation learning by predicting image rotations
215 | [code](https://github.com/gidariss/FeatureLearningRotNet)
216 | 
217 | [3] S. Gidaris, P. Singh, N. Komodakis [*Unsupervised representation learning by predicting image rotations*](https://openreview.net/forum?id=S1v4N2l0-)
218 | 
219 | ```
220 | @inproceedings{
221 | gidaris2018unsupervised,
222 | title={Unsupervised Representation Learning by Predicting Image Rotations},
223 | author={Spyros Gidaris and Praveer Singh and Nikos Komodakis},
224 | booktitle={International Conference on Learning Representations},
225 | year={2018},
226 | url={https://openreview.net/forum?id=S1v4N2l0-},
227 | }
228 | ```
229 | 
230 | ## License
231 | 
232 | See the [LICENSE](LICENSE) file for more details.
233 | --------------------------------------------------------------------------------
/conv13.sh:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | #
 4 | # This source code is licensed under the license found in the
 5 | # LICENSE file in the root directory of this source tree.
 6 | #
 7 | 
 8 | DATAPATH_IMAGENET='path/to/imagenet/dataset'
 9 | DATAPATH_PLACES='path/to/places205/dataset'
10 | DATAPATH_PASCAL='path/to/pascal2007/dataset'
11 | 
12 | ##########################
13 | # DeeperCluster YFCC100M #
14 | ##########################
15 | 
16 | # ImageNet dataset
17 | EXP='./exp/eval_linear_imagenet/'
18 | mkdir -p $EXP
19 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100
20 | 
21 | # Places205 dataset
22 | EXP='./exp/eval_linear_places205/'
23 | mkdir -p $EXP
24 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100
25 | 
26 | # Pascal dataset
27 | EXP='./exp/eval_linear_pascal/'
28 | mkdir -p $EXP
29 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PASCAL --batch_size 128 --lr 0.02 --wd 0.00001 --nepochs 60
30 | --------------------------------------------------------------------------------
/distributed_training.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/DeeperCluster/d38ada109f8334f6ae4c84a218d79848a936ed6f/distributed_training.png
--------------------------------------------------------------------------------
/download_models.sh:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | #
 4 | # This source code is licensed under the license found in the
 5 | # LICENSE file in the root directory of this source tree.
6 | # 7 | #!/bin/bash 8 | 9 | MODELROOT="./downloaded_models" 10 | 11 | mkdir -p ${MODELROOT} 12 | 13 | for METHOD in deepercluster deepcluster rotnet 14 | do 15 | mkdir -p "${MODELROOT}/${METHOD}" 16 | 17 | # download our model 18 | if [ "$METHOD" = deepercluster ]; 19 | then 20 | wget -c "https://dl.fbaipublicfiles.com/deepcluster/ours/ours.pth" \ 21 | -P "${MODELROOT}/${METHOD}" 22 | fi 23 | 24 | # download deepcluster model trained on a 1.3M subset of YFCC100M 25 | if [ "$METHOD" = deepcluster ]; 26 | then 27 | wget -c "https://dl.fbaipublicfiles.com/deepcluster/${METHOD}/${METHOD}_flickr.pth" \ 28 | -P "${MODELROOT}/${METHOD}" 29 | fi 30 | 31 | # download rotnet models 32 | if [ "$METHOD" = rotnet ]; 33 | then 34 | for DATASET in flickr imagenet 35 | do 36 | wget -c "https://dl.fbaipublicfiles.com/deepcluster/${METHOD}/${METHOD}_${DATASET}.pth" \ 37 | -P "${MODELROOT}/${METHOD}" 38 | done 39 | fi 40 | done 41 | -------------------------------------------------------------------------------- /eval_linear.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | from logging import getLogger 10 | import os 11 | import time 12 | 13 | import numpy as np 14 | from sklearn import metrics 15 | import torch 16 | import torch.nn as nn 17 | import torch.utils.data 18 | 19 | from src.data.loader import load_data, get_data_transformations, KFold, per_target 20 | from src.model.model_factory import model_factory, to_cuda, sgd_optimizer 21 | from src.model.pretrain import load_pretrained 22 | from src.slurm import init_signal_handler, trigger_job_requeue 23 | from src.trainer import validate_network, accuracy 24 | from src.data.VOC2007 import VOC2007_dataset 25 | from src.utils import (bool_flag, init_distributed_mode, initialize_exp, AverageMeter, 26 | restart_from_checkpoint, fix_random_seeds,) 27 | 28 | logger = getLogger() 29 | 30 | 31 | def get_parser(): 32 | """ 33 | Generate a parameters parser. 
34 | """ 35 | # parse parameters 36 | parser = argparse.ArgumentParser(description="Train a linear classifier on conv layer") 37 | 38 | # main parameters 39 | parser.add_argument("--dump_path", type=str, default=".", 40 | help="Experiment dump path") 41 | parser.add_argument('--epoch', type=int, default=0, 42 | help='Current epoch to run') 43 | parser.add_argument('--start_iter', type=int, default=0, 44 | help='First iter to run in the current epoch') 45 | 46 | # model params 47 | parser.add_argument('--pretrained', type=str, default='', 48 | help='Use this instead of random weights.') 49 | parser.add_argument('--conv', type=int, default=1, choices=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 50 | help='On top of which layer train classifier.') 51 | 52 | # datasets params 53 | parser.add_argument('--data_path', type=str, default='', 54 | help='Where to find supervised dataset') 55 | parser.add_argument('--workers', type=int, default=8, 56 | help='Number of data loading workers') 57 | parser.add_argument('--sobel', type=bool_flag, default=False) 58 | 59 | # optim params 60 | parser.add_argument('--lr', type=float, default=0.05, help='Learning rate') 61 | parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay') 62 | parser.add_argument('--nepochs', type=int, default=100, 63 | help='Max number of epochs to run') 64 | parser.add_argument('--batch_size', default=64, type=int) 65 | 66 | # model selection 67 | parser.add_argument('--split', type=str, required=False, default='train', choices=['train', 'trainval'], 68 | help='for PASCAL dataset, train on train or train+val') 69 | parser.add_argument('--kfold', type=int, default=None, 70 | help="""dataset randomly partitioned into kfold equal sized subsamples. 71 | Default None: no cross validation: train on full train set""") 72 | parser.add_argument('--cross_valid', type=int, default=None, 73 | help='between 0 and kfold - 1: index of the round of cross validation') 74 | 75 | # distributed training params 76 | parser.add_argument('--rank', default=0, type=int, 77 | help='rank') 78 | parser.add_argument("--local_rank", type=int, default=-1, 79 | help="Multi-GPU - Local rank") 80 | parser.add_argument('--world-size', default=1, type=int, 81 | help='number of distributed processes') 82 | parser.add_argument('--dist-url', default='', type=str, 83 | help='url used to set up distributed training') 84 | 85 | # debug 86 | parser.add_argument("--debug_slurm", type=bool_flag, default=False, 87 | help="Debug within a SLURM job") 88 | 89 | return parser.parse_args() 90 | 91 | 92 | def main(args): 93 | 94 | # initialize the multi-GPU / multi-node training 95 | init_distributed_mode(args, make_communication_groups=False) 96 | 97 | # initialize the experiment 98 | logger, training_stats = initialize_exp(args, 'epoch', 'iter', 'prec', 99 | 'loss', 'prec_val', 'loss_val') 100 | 101 | # initialize SLURM signal handler for time limit / pre-emption 102 | init_signal_handler() 103 | 104 | if not 'pascal' in args.data_path: 105 | main_data_path = args.data_path 106 | args.data_path = os.path.join(main_data_path, 'train') 107 | train_dataset = load_data(args) 108 | else: 109 | train_dataset = VOC2007_dataset(args.data_path, split=args.split) 110 | 111 | args.test = 'val' if args.split == 'train' else 'test' 112 | if not 'pascal' in args.data_path: 113 | if args.cross_valid is None: 114 | args.data_path = os.path.join(main_data_path, 'val') 115 | val_dataset = load_data(args) 116 | else: 117 | val_dataset = VOC2007_dataset(args.data_path, 
split=args.test) 118 | 119 | if args.cross_valid is not None: 120 | kfold = KFold(per_target(train_dataset.imgs), args.cross_valid, args.kfold) 121 | train_loader = torch.utils.data.DataLoader( 122 | train_dataset, batch_size=args.batch_size, sampler=kfold.train, 123 | num_workers=args.workers, pin_memory=True) 124 | val_loader = torch.utils.data.DataLoader( 125 | val_dataset, batch_size=args.batch_size, sampler=kfold.val, 126 | num_workers=args.workers) 127 | 128 | else: 129 | train_loader = torch.utils.data.DataLoader( 130 | train_dataset, batch_size=args.batch_size, shuffle=True, 131 | num_workers=args.workers, pin_memory=True) 132 | val_loader = torch.utils.data.DataLoader( 133 | val_dataset, 134 | batch_size=args.batch_size, shuffle=False, 135 | num_workers=args.workers) 136 | 137 | # prepare the different data transformations 138 | tr_val, tr_train = get_data_transformations() 139 | train_dataset.transform = tr_train 140 | val_dataset.transform = tr_val 141 | 142 | # build model skeleton 143 | fix_random_seeds() 144 | model = model_factory(args) 145 | 146 | load_pretrained(model, args) 147 | 148 | # keep only conv layers 149 | model.body.classifier = None 150 | model.conv = args.conv 151 | 152 | if 'places' in args.data_path: 153 | nmb_classes = 205 154 | elif 'pascal' in args.data_path: 155 | nmb_classes = 20 156 | else: 157 | nmb_classes = 1000 158 | 159 | reglog = RegLog(nmb_classes, args.conv) 160 | 161 | # distributed training wrapper 162 | model = to_cuda(model, [args.gpu_to_work_on], apex=True) 163 | reglog = to_cuda(reglog, [args.gpu_to_work_on], apex=True) 164 | logger.info('model to cuda') 165 | 166 | # set optimizer 167 | optimizer = sgd_optimizer(reglog, args.lr, args.wd) 168 | 169 | ## variables to reload to fetch in checkpoint 170 | to_restore = {'epoch': 0, 'start_iter': 0} 171 | 172 | # re start from checkpoint 173 | restart_from_checkpoint( 174 | args, 175 | run_variables=to_restore, 176 | state_dict=reglog, 177 | optimizer=optimizer, 178 | ) 179 | args.epoch = to_restore['epoch'] 180 | args.start_iter = to_restore['start_iter'] 181 | 182 | model.eval() 183 | reglog.train() 184 | 185 | # Linear training 186 | for _ in range(args.epoch, args.nepochs): 187 | 188 | logger.info("============ Starting epoch %i ... 
============" % args.epoch) 189 | 190 | # train the network for one epoch 191 | scores = train_network(args, model, reglog, optimizer, train_loader) 192 | 193 | if not 'pascal' in args.data_path: 194 | scores_val = validate_network(val_loader, [model, reglog], args) 195 | else: 196 | scores_val = evaluate_pascal(val_dataset, [model, reglog]) 197 | 198 | scores = scores + scores_val 199 | 200 | # save training statistics 201 | logger.info(scores) 202 | training_stats.update(scores) 203 | 204 | 205 | def evaluate_pascal(val_dataset, models): 206 | 207 | val_loader = torch.utils.data.DataLoader( 208 | val_dataset, 209 | sampler=torch.utils.data.distributed.DistributedSampler(val_dataset), 210 | batch_size=1, 211 | num_workers=args.workers, 212 | pin_memory=True, 213 | ) 214 | 215 | for model in models: 216 | model.eval() 217 | gts = [] 218 | scr = [] 219 | for i, (input, target) in enumerate(val_loader): 220 | # move input to gpu and optionally reshape it 221 | input = input.cuda(non_blocking=True) 222 | 223 | # forward pass without grad computation 224 | with torch.no_grad(): 225 | output = models[0](input) 226 | output = models[1](output) 227 | scr.append(torch.sum(output, 0, keepdim=True).cpu().numpy()) 228 | gts.append(target) 229 | scr[i] += output.cpu().numpy() 230 | gts = np.concatenate(gts, axis=0).T 231 | scr = np.concatenate(scr, axis=0).T 232 | aps = [] 233 | for i in range(20): 234 | # Subtract eps from score to make AP work for tied scores 235 | ap = metrics.average_precision_score(gts[i][gts[i]<=1], scr[i][gts[i]<=1]-1e-5*gts[i][gts[i]<=1]) 236 | aps.append(ap) 237 | print(np.mean(aps), ' ', ' '.join(['%0.2f'%a for a in aps])) 238 | return np.mean(aps), 0 239 | 240 | 241 | class RegLog(nn.Module): 242 | """Creates logistic regression on top of frozen features""" 243 | def __init__(self, num_labels, conv): 244 | super(RegLog, self).__init__() 245 | if conv < 3: 246 | av = 18 247 | s = 9216 248 | elif conv < 5: 249 | av = 14 250 | s = 8192 251 | elif conv < 8: 252 | av = 9 253 | s = 9216 254 | elif conv < 11: 255 | av = 6 256 | s = 8192 257 | elif conv < 14: 258 | av = 3 259 | s = 8192 260 | self.av_pool = nn.AvgPool2d(av, stride=av, padding=0) 261 | self.linear = nn.Linear(s, num_labels) 262 | 263 | def forward(self, x): 264 | x = self.av_pool(x) 265 | x = x.view(x.size(0), -1) 266 | return self.linear(x) 267 | 268 | 269 | def train_network(args, model, reglog, optimizer, loader): 270 | """ 271 | Train the models on the dataset. 
272 | """ 273 | # running statistics 274 | batch_time = AverageMeter() 275 | data_time = AverageMeter() 276 | 277 | # training statistics 278 | log_top1 = AverageMeter() 279 | log_loss = AverageMeter() 280 | end = time.perf_counter() 281 | 282 | if 'pascal' in args.data_path: 283 | criterion = nn.BCEWithLogitsLoss(reduction='none') 284 | else: 285 | criterion = nn.CrossEntropyLoss().cuda() 286 | 287 | for iter_epoch, (inp, target) in enumerate(loader): 288 | # measure data loading time 289 | data_time.update(time.perf_counter() - end) 290 | 291 | learning_rate_decay(optimizer, len(loader) * args.epoch + iter_epoch, args.lr) 292 | 293 | # start at iter start_iter 294 | if iter_epoch < args.start_iter: 295 | continue 296 | 297 | # move to gpu 298 | inp = inp.cuda(non_blocking=True) 299 | target = target.cuda(non_blocking=True) 300 | if 'pascal' in args.data_path: 301 | target = target.float() 302 | 303 | # forward 304 | with torch.no_grad(): 305 | output = model(inp) 306 | output = reglog(output) 307 | 308 | # compute cross entropy loss 309 | loss = criterion(output, target) 310 | 311 | if 'pascal' in args.data_path: 312 | mask = (target == 255) 313 | loss = torch.sum(loss.masked_fill_(mask, 0)) / target.size(0) 314 | 315 | optimizer.zero_grad() 316 | 317 | # compute the gradients 318 | loss.backward() 319 | 320 | # step 321 | optimizer.step() 322 | 323 | # log 324 | 325 | # signal received, relaunch experiment 326 | if os.environ['SIGNAL_RECEIVED'] == 'True': 327 | if not args.rank: 328 | torch.save({ 329 | 'epoch': args.epoch, 330 | 'start_iter': iter_epoch + 1, 331 | 'state_dict': reglog.state_dict(), 332 | 'optimizer': optimizer.state_dict(), 333 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 334 | trigger_job_requeue(os.path.join(args.dump_path, 'checkpoint.pth.tar')) 335 | 336 | # update stats 337 | log_loss.update(loss.item(), output.size(0)) 338 | if not 'pascal' in args.data_path: 339 | prec1 = accuracy(args, output, target) 340 | log_top1.update(prec1.item(), output.size(0)) 341 | 342 | batch_time.update(time.perf_counter() - end) 343 | end = time.perf_counter() 344 | 345 | # verbose 346 | if iter_epoch % 100 == 0: 347 | logger.info('Epoch[{0}] - Iter: [{1}/{2}]\t' 348 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 349 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 350 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 351 | 'Prec {log_top1.val:.3f} ({log_top1.avg:.3f})\t' 352 | .format(args.epoch, iter_epoch, len(loader), batch_time=batch_time, 353 | data_time=data_time, loss=log_loss, log_top1=log_top1)) 354 | 355 | # end of epoch 356 | args.start_iter = 0 357 | args.epoch += 1 358 | 359 | # dump checkpoint 360 | if not args.rank: 361 | torch.save({ 362 | 'epoch': args.epoch, 363 | 'start_iter': 0, 364 | 'state_dict': reglog.state_dict(), 365 | 'optimizer': optimizer.state_dict(), 366 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 367 | 368 | return (args.epoch - 1, args.epoch * len(loader), log_top1.avg, log_loss.avg) 369 | 370 | 371 | def learning_rate_decay(optimizer, t, lr_0): 372 | for param_group in optimizer.param_groups: 373 | lr = lr_0 / np.sqrt(1 + lr_0 * param_group['weight_decay'] * t) 374 | param_group['lr'] = lr 375 | 376 | 377 | if __name__ == '__main__': 378 | 379 | # generate parser / parse parameters 380 | args = get_parser() 381 | 382 | # run experiment 383 | main(args) 384 | -------------------------------------------------------------------------------- /eval_pretrain.py: 
-------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | from logging import getLogger 10 | import math 11 | import os 12 | import shutil 13 | import time 14 | 15 | import torch 16 | import torch.nn as nn 17 | 18 | from src.data.loader import load_data, get_data_transformations 19 | from src.model.model_factory import model_factory, to_cuda, sgd_optimizer, sobel2RGB 20 | from src.slurm import init_signal_handler, trigger_job_requeue 21 | from src.trainer import validate_network, accuracy 22 | from src.utils import (bool_flag, init_distributed_mode, initialize_exp, AverageMeter, 23 | restart_from_checkpoint, fix_random_seeds,) 24 | from src.model.pretrain import load_pretrained 25 | 26 | 27 | logger = getLogger() 28 | 29 | 30 | def get_parser(): 31 | """ 32 | Generate a parameters parser. 33 | """ 34 | # parse parameters 35 | parser = argparse.ArgumentParser(description="Train classification") 36 | 37 | # main parameters 38 | parser.add_argument("--dump_path", type=str, default=".", 39 | help="Experiment dump path") 40 | parser.add_argument('--epoch', type=int, default=0, 41 | help='Current epoch to run') 42 | parser.add_argument('--start_iter', type=int, default=0, 43 | help='First iter to run in the current epoch') 44 | parser.add_argument("--checkpoint_freq", type=int, default=20, 45 | help="Save the model periodically ") 46 | parser.add_argument("--evaluate", type=bool_flag, default=False, 47 | help="Evaluate the model only") 48 | parser.add_argument('--seed', type=int, default=35, help='random seed') 49 | 50 | # model params 51 | parser.add_argument('--sobel', type=bool_flag, default=0) 52 | parser.add_argument('--sobel2RGB', type=bool_flag, default=False, 53 | help='Incorporate sobel filter in first conv') 54 | parser.add_argument('--pretrained', type=str, default='', 55 | help='Use this instead of random weights.') 56 | 57 | # datasets params 58 | parser.add_argument('--data_path', type=str, default='', 59 | help='Where to find ImageNet dataset') 60 | parser.add_argument('--workers', type=int, default=8, 61 | help='Number of data loading workers') 62 | 63 | # optim params 64 | parser.add_argument('--lr', type=float, default=0.05, help='Learning rate') 65 | parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay') 66 | parser.add_argument('--nepochs', type=int, default=100, 67 | help='Max number of epochs to run') 68 | parser.add_argument('--batch_size', default=128, type=int) 69 | 70 | # distributed training params 71 | parser.add_argument('--rank', default=0, type=int, 72 | help='rank') 73 | parser.add_argument("--local_rank", type=int, default=-1, 74 | help="Multi-GPU - Local rank") 75 | parser.add_argument('--world-size', default=1, type=int, 76 | help='number of distributed processes') 77 | parser.add_argument('--dist-url', default='', type=str, 78 | help='url used to set up distributed training') 79 | 80 | # debug 81 | parser.add_argument("--debug", type=bool_flag, default=False, 82 | help="Load val set of ImageNet") 83 | parser.add_argument("--debug_slurm", type=bool_flag, default=False, 84 | help="Debug within a SLURM job") 85 | 86 | return parser.parse_args() 87 | 88 | 89 | def main(args): 90 | 91 | # initialize the multi-GPU / multi-node training 92 | init_distributed_mode(args, 
make_communication_groups=False) 93 | 94 | # initialize the experiment 95 | logger, training_stats = initialize_exp(args, 'epoch', 'iter', 'prec', 96 | 'loss', 'prec_val', 'loss_val') 97 | 98 | # initialize SLURM signal handler for time limit / pre-emption 99 | init_signal_handler() 100 | 101 | main_data_path = args.data_path 102 | if args.debug: 103 | args.data_path = os.path.join(main_data_path, 'val') 104 | else: 105 | args.data_path = os.path.join(main_data_path, 'train') 106 | train_dataset = load_data(args) 107 | 108 | args.data_path = os.path.join(main_data_path, 'val') 109 | val_dataset = load_data(args) 110 | 111 | # prepare the different data transformations 112 | tr_val, tr_train = get_data_transformations() 113 | train_dataset.transform = tr_train 114 | val_dataset.transform = tr_val 115 | val_loader = torch.utils.data.DataLoader( 116 | val_dataset, 117 | batch_size=args.batch_size, 118 | num_workers=args.workers, 119 | pin_memory=True, 120 | ) 121 | 122 | # build model skeleton 123 | fix_random_seeds(args.seed) 124 | nmb_classes = 205 if 'places' in args.data_path else 1000 125 | model = model_factory(args, relu=True, num_classes=nmb_classes) 126 | 127 | # load pretrained weights 128 | load_pretrained(model, args) 129 | 130 | # merge sobel layers with first convolution layer 131 | if args.sobel2RGB: 132 | sobel2RGB(model) 133 | 134 | # re initialize classifier 135 | if hasattr(model.body, 'classifier'): 136 | for m in model.body.classifier.modules(): 137 | if isinstance(m, nn.Linear): 138 | m.weight.data.normal_(0, 0.01) 139 | m.bias.data.fill_(0.1) 140 | 141 | # distributed training wrapper 142 | model = to_cuda(model, [args.gpu_to_work_on], apex=True) 143 | logger.info('model to cuda') 144 | 145 | # set optimizer 146 | optimizer = sgd_optimizer(model, args.lr, args.wd) 147 | 148 | ## variables to reload to fetch in checkpoint 149 | to_restore = {'epoch': 0, 'start_iter': 0} 150 | 151 | # re start from checkpoint 152 | restart_from_checkpoint( 153 | args, 154 | run_variables=to_restore, 155 | state_dict=model, 156 | optimizer=optimizer, 157 | ) 158 | args.epoch = to_restore['epoch'] 159 | args.start_iter = to_restore['start_iter'] 160 | 161 | if args.evaluate: 162 | validate_network(val_loader, [model], args) 163 | return 164 | 165 | # Supervised training 166 | for _ in range(args.epoch, args.nepochs): 167 | 168 | logger.info("============ Starting epoch %i ... ============" % args.epoch) 169 | 170 | fix_random_seeds(args.seed + args.epoch) 171 | 172 | # train the network for one epoch 173 | adjust_learning_rate(optimizer, args) 174 | scores = train_network(args, model, optimizer, train_dataset) 175 | 176 | scores_val = validate_network(val_loader, [model], args) 177 | 178 | # save training statistics 179 | logger.info(scores + scores_val) 180 | training_stats.update(scores + scores_val) 181 | 182 | 183 | def adjust_learning_rate(optimizer, args): 184 | lr = args.lr * (0.1 ** (args.epoch // 30)) 185 | for param_group in optimizer.param_groups: 186 | param_group['lr'] = lr 187 | 188 | 189 | def train_network(args, model, optimizer, dataset): 190 | """ 191 | Train the models on the dataset. 
183 | def adjust_learning_rate(optimizer, args): 184 | lr = args.lr * (0.1 ** (args.epoch // 30)) 185 | for param_group in optimizer.param_groups: 186 | param_group['lr'] = lr 187 | 188 | 189 | def train_network(args, model, optimizer, dataset): 190 | """ 191 | Train the model on the dataset. 192 | """ 193 | # switch to train mode 194 | model.train() 195 | 196 | sampler = torch.utils.data.distributed.DistributedSampler(dataset) 197 | 198 | loader = torch.utils.data.DataLoader( 199 | dataset, 200 | sampler=sampler, 201 | batch_size=args.batch_size, 202 | num_workers=args.workers, 203 | pin_memory=True, 204 | ) 205 | 206 | # running statistics 207 | batch_time = AverageMeter() 208 | data_time = AverageMeter() 209 | 210 | # training statistics 211 | log_top1 = AverageMeter() 212 | log_loss = AverageMeter() 213 | end = time.perf_counter() 214 | 215 | cel = nn.CrossEntropyLoss().cuda() 216 | 217 | for iter_epoch, (inp, target) in enumerate(loader): 218 | # measure data loading time 219 | data_time.update(time.perf_counter() - end) 220 | 221 | # start at iter start_iter 222 | if iter_epoch < args.start_iter: 223 | continue 224 | 225 | # move to gpu 226 | inp = inp.cuda(non_blocking=True) 227 | target = target.cuda(non_blocking=True) 228 | 229 | # forward 230 | output = model(inp) 231 | 232 | # compute cross entropy loss 233 | loss = cel(output, target) 234 | 235 | optimizer.zero_grad() 236 | 237 | # compute the gradients 238 | loss.backward() 239 | 240 | # step 241 | optimizer.step() 242 | 243 | 244 | 245 | # signal received, checkpoint and relaunch the experiment 246 | if os.environ['SIGNAL_RECEIVED'] == 'True': 247 | if not args.rank: 248 | torch.save({ 249 | 'epoch': args.epoch, 250 | 'start_iter': iter_epoch + 1, 251 | 'state_dict': model.state_dict(), 252 | 'optimizer': optimizer.state_dict(), 253 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 254 | trigger_job_requeue(os.path.join(args.dump_path, 'checkpoint.pth.tar')) 255 | 256 | # update stats 257 | log_loss.update(loss.item(), output.size(0)) 258 | prec1 = accuracy(args, output, target) 259 | log_top1.update(prec1.item(), output.size(0)) 260 | 261 | batch_time.update(time.perf_counter() - end) 262 | end = time.perf_counter() 263 | 264 | # verbose 265 | if iter_epoch % 100 == 0: 266 | logger.info('Epoch[{0}] - Iter: [{1}/{2}]\t' 267 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 268 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 269 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 270 | 'Prec {log_top1.val:.3f} ({log_top1.avg:.3f})\t' 271 | .format(args.epoch, iter_epoch, len(loader), batch_time=batch_time, 272 | data_time=data_time, loss=log_loss, log_top1=log_top1)) 273 | 274 | # end of epoch 275 | args.start_iter = 0 276 | args.epoch += 1 277 | 278 | # dump checkpoint 279 | if not args.rank: 280 | torch.save({ 281 | 'epoch': args.epoch, 282 | 'start_iter': 0, 283 | 'state_dict': model.state_dict(), 284 | 'optimizer': optimizer.state_dict(), 285 | }, os.path.join(args.dump_path, 'checkpoint.pth.tar')) 286 | if not (args.epoch - 1) % args.checkpoint_freq: 287 | shutil.copyfile( 288 | os.path.join(args.dump_path, 'checkpoint.pth.tar'), 289 | os.path.join(args.dump_checkpoints, 290 | 'checkpoint' + str(args.epoch - 1) + '.pth.tar'), 291 | ) 292 | 293 | return (args.epoch - 1, args.epoch * len(loader), log_top1.avg, log_loss.avg) 294 | 295 | if __name__ == '__main__': 296 | 297 | # generate parser / parse parameters 298 | args = get_parser() 299 | 300 | # run experiment 301 | main(args) 302 | -------------------------------------------------------------------------------- /eval_voc_classif.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 
3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | import time 10 | 11 | import numpy as np 12 | import torch 13 | import torch.nn as nn 14 | import torch.optim 15 | import torch.utils.data 16 | import torchvision.transforms as transforms 17 | from sklearn import metrics 18 | 19 | from src.utils import AverageMeter, bool_flag, fix_random_seeds 20 | from src.trainer import accuracy 21 | from src.data.VOC2007 import VOC2007_dataset 22 | from src.model.model_factory import model_factory, sgd_optimizer 23 | from src.model.pretrain import load_pretrained 24 | 25 | parser = argparse.ArgumentParser() 26 | 27 | # model params 28 | parser.add_argument('--pretrained', type=str, required=False, default='', 29 | help='evaluate this model') 30 | 31 | # data params 32 | parser.add_argument('--data_path', type=str, default='', 33 | help='Where to find pascal 2007 dataset') 34 | parser.add_argument('--split', type=str, required=False, default='train', 35 | choices=['train', 'trainval'], help='training split') 36 | parser.add_argument('--sobel', type=bool_flag, default=False, help='If true, sobel applies') 37 | 38 | # transfer params 39 | parser.add_argument('--fc6_8', type=bool_flag, default=True, help='If true, train only the final classifier') 40 | parser.add_argument('--eval_random_crops', type=bool_flag, default=True, help='If true, eval on 10 random crops, otherwise eval on 10 fixed crops') 41 | 42 | # optim params 43 | parser.add_argument('--nit', type=int, default=150000, help='Number of training iterations') 44 | parser.add_argument('--stepsize', type=int, default=10000, help='Decay step') 45 | parser.add_argument('--lr', type=float, required=False, default=0.003, help='learning rate') 46 | parser.add_argument('--wd', type=float, required=False, default=1e-6, help='weight decay') 47 | 48 | parser.add_argument('--seed', type=int, default=1993, help='random seed') 49 | 50 | def main(): 51 | args = parser.parse_args() 52 | args.world_size = 1 53 | print(args) 54 | 55 | fix_random_seeds(args.seed) 56 | 57 | # create model 58 | model = model_factory(args, relu=True, num_classes=20) 59 | 60 | # load pretrained weights 61 | load_pretrained(model, args) 62 | 63 | model = model.cuda() 64 | print('model to cuda') 65 | 66 | # on which split to train 67 | if args.split == 'train': 68 | args.test = 'val' 69 | elif args.split == 'trainval': 70 | args.test = 'test' 71 | 72 | # data loader 73 | normalize = [transforms.Normalize(mean=[0.485, 0.456, 0.406], 74 | std=[0.229, 0.224, 0.225])] 75 | dataset = VOC2007_dataset(args.data_path, split=args.split, transform=transforms.Compose([ 76 | transforms.RandomHorizontalFlip(), 77 | transforms.RandomResizedCrop(224), 78 | transforms.ToTensor(),] + normalize 79 | )) 80 | 81 | loader = torch.utils.data.DataLoader(dataset, 82 | batch_size=16, shuffle=False, 83 | num_workers=4, pin_memory=True) 84 | print('PASCAL VOC 2007 ' + args.split + ' dataset loaded') 85 | 86 | # re initialize classifier 87 | if hasattr(model.body, 'classifier'): 88 | for m in model.body.classifier.modules(): 89 | if isinstance(m, nn.Linear): 90 | m.weight.data.normal_(0, 0.01) 91 | m.bias.data.fill_(0.1) 92 | for m in model.pred_layer.modules(): 93 | if isinstance(m, nn.Linear): 94 | m.weight.data.normal_(0, 0.01) 95 | m.bias.data.fill_(0.1) 96 | 97 | # freeze conv layers 98 | if args.fc6_8: 99 | if hasattr(model.body, 'features'): 100 | for param in model.body.features.parameters(): 
101 | param.requires_grad = False 102 | 103 | # set optimizer 104 | optimizer = torch.optim.SGD( 105 | filter(lambda x: x.requires_grad, model.parameters()), 106 | lr=args.lr, 107 | momentum=0.9, 108 | weight_decay=args.wd, 109 | ) 110 | 111 | criterion = nn.BCEWithLogitsLoss(reduction='none') 112 | 113 | print('Start training') 114 | it = 0 115 | losses = AverageMeter() 116 | while it < args.nit: 117 | it = train( 118 | loader, 119 | model, 120 | optimizer, 121 | criterion, 122 | args.fc6_8, 123 | losses, 124 | current_iteration=it, 125 | total_iterations=args.nit, 126 | stepsize=args.stepsize, 127 | ) 128 | 129 | print('Model Evaluation') 130 | if args.eval_random_crops: 131 | transform_eval = [ 132 | transforms.RandomHorizontalFlip(), 133 | transforms.RandomResizedCrop(224), 134 | transforms.ToTensor(),] + normalize 135 | else: 136 | transform_eval = [ 137 | transforms.Resize(256), 138 | transforms.TenCrop(224), 139 | transforms.Lambda(lambda crops: torch.stack([transforms.Compose(normalize)(transforms.ToTensor()(crop)) for crop in crops])) 140 | ] 141 | 142 | print('Train set') 143 | train_dataset = VOC2007_dataset( 144 | args.data_path, 145 | split=args.split, 146 | transform=transforms.Compose(transform_eval), 147 | ) 148 | train_loader = torch.utils.data.DataLoader( 149 | train_dataset, 150 | batch_size=1, 151 | shuffle=False, 152 | num_workers=4, 153 | pin_memory=True, 154 | ) 155 | evaluate(train_loader, model, args.eval_random_crops) 156 | 157 | print('Test set') 158 | test_dataset = VOC2007_dataset(args.data_path, split=args.test, transform=transforms.Compose(transform_eval)) 159 | test_loader = torch.utils.data.DataLoader( 160 | test_dataset, 161 | batch_size=1, 162 | shuffle=False, 163 | num_workers=4, 164 | pin_memory=True, 165 | ) 166 | evaluate(test_loader, model, args.eval_random_crops) 167 | 168 | 169 | def evaluate(loader, model, eval_random_crops): 170 | model.eval() 171 | gts = [] 172 | scr = [] 173 | for crop in range(9 * eval_random_crops + 1): 174 | for i, (input, target) in enumerate(loader): 175 | # move input to gpu and optionally reshape it 176 | if len(input.size()) == 5: 177 | bs, ncrops, c, h, w = input.size() 178 | input = input.view(-1, c, h, w) 179 | input = input.cuda(non_blocking=True) 180 | 181 | # forward pass without grad computation 182 | with torch.no_grad(): 183 | output = model(input) 184 | if crop < 1 : 185 | scr.append(torch.sum(output, 0, keepdim=True).cpu().numpy()) 186 | gts.append(target) 187 | else: 188 | scr[i] += output.cpu().numpy() 189 | gts = np.concatenate(gts, axis=0).T 190 | scr = np.concatenate(scr, axis=0).T 191 | aps = [] 192 | for i in range(20): 193 | # Subtract eps from score to make AP work for tied scores 194 | ap = metrics.average_precision_score(gts[i][gts[i]<=1], scr[i][gts[i]<=1]-1e-5*gts[i][gts[i]<=1]) 195 | aps.append( ap ) 196 | print(np.mean(aps), ' ', ' '.join(['%0.2f'%a for a in aps])) 197 | 198 | 199 | def train(loader, model, optimizer, criterion, fc6_8, losses, current_iteration=0, total_iterations=None, stepsize=None, verbose=True): 200 | # to log 201 | batch_time = AverageMeter() 202 | data_time = AverageMeter() 203 | top1 = AverageMeter() 204 | end = time.time() 205 | 206 | # use dropout for the MLP 207 | if hasattr(model.body, 'classifier'): 208 | model.train() 209 | # in the batch norms always use global statistics 210 | model.body.features.eval() 211 | else: 212 | model.eval() 213 | 214 | for i, (input, target) in enumerate(loader): 215 | # measure data loading time 216 | data_time.update(time.time() - 
end) 217 | 218 | # adjust learning rate 219 | if current_iteration != 0 and current_iteration % stepsize == 0: 220 | for param_group in optimizer.param_groups: 221 | param_group['lr'] = param_group['lr'] * 0.5 222 | print('iter {0} learning rate is {1}'.format(current_iteration, param_group['lr'])) 223 | 224 | # move input to gpu 225 | input = input.cuda(non_blocking=True) 226 | 227 | # forward pass with or without grad computation 228 | output = model(input) 229 | 230 | target = target.float().cuda() 231 | mask = (target == 255) 232 | loss = torch.sum(criterion(output, target).masked_fill_(mask, 0)) / target.size(0) 233 | 234 | # backward 235 | optimizer.zero_grad() 236 | loss.backward() 237 | # clip gradients 238 | torch.nn.utils.clip_grad_norm_(model.parameters(), 10) 239 | # and weights update 240 | optimizer.step() 241 | 242 | # measure accuracy and record loss 243 | losses.update(loss.item(), input.size(0)) 244 | 245 | # measure elapsed time 246 | batch_time.update(time.time() - end) 247 | end = time.time() 248 | if verbose is True and current_iteration % 25 == 0: 249 | print('Iteration[{0}]\t' 250 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 251 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 252 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'.format( 253 | current_iteration, batch_time=batch_time, 254 | data_time=data_time, loss=losses)) 255 | current_iteration = current_iteration + 1 256 | if total_iterations is not None and current_iteration == total_iterations: 257 | break 258 | return current_iteration 259 | 260 | 261 | if __name__ == '__main__': 262 | main() 263 | -------------------------------------------------------------------------------- /linear_classif_layers.sh: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | DATAPATH_IMAGENET='path/to/imagenet/dataset' 9 | DATAPATH_PLACES='path/to/places205/dataset' 10 | 11 | ######################## 12 | ### ImageNet dataset ### 13 | ######################## 14 | 15 | # CONV 1 16 | EXP='./exp/eval_linear_imagenet_conv1/' 17 | mkdir -p $EXP 18 | python eval_linear.py --conv 1 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.005 --wd 0.00001 --nepochs 100 19 | 20 | # CONV 2 21 | EXP='./exp/eval_linear_imagenet_conv2/' 22 | mkdir -p $EXP 23 | python eval_linear.py --conv 2 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 24 | 25 | # CONV 3 26 | EXP='./exp/eval_linear_imagenet_conv3/' 27 | mkdir -p $EXP 28 | python eval_linear.py --conv 3 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 29 | 30 | # CONV 4 31 | EXP='./exp/eval_linear_imagenet_conv4/' 32 | mkdir -p $EXP 33 | python eval_linear.py --conv 4 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 34 | 35 | # CONV 5 36 | EXP='./exp/eval_linear_imagenet_conv5/' 37 | mkdir -p $EXP 38 | python eval_linear.py --conv 5 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 39 | 40 | # CONV 6 41 | EXP='./exp/eval_linear_imagenet_conv6/' 42 | mkdir -p $EXP 43 | python eval_linear.py --conv 6 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 44 | 45 | # CONV 7 46 | EXP='./exp/eval_linear_imagenet_conv7/' 47 | mkdir -p $EXP 48 | python eval_linear.py --conv 7 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 49 | 50 | # CONV 8 51 | EXP='./exp/eval_linear_imagenet_conv8/' 52 | mkdir -p $EXP 53 | python eval_linear.py --conv 8 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 54 | 55 | # CONV 9 56 | EXP='./exp/eval_linear_imagenet_conv9/' 57 | mkdir -p $EXP 58 | python eval_linear.py --conv 9 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 59 | 60 | # CONV 10 61 | EXP='./exp/eval_linear_imagenet_conv10/' 62 | mkdir -p $EXP 63 | python eval_linear.py --conv 10 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 64 | 65 | # CONV 11 66 | EXP='./exp/eval_linear_imagenet_conv11/' 67 | mkdir -p $EXP 68 | python eval_linear.py --conv 11 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 69 | 70 | # CONV 12 71 | EXP='./exp/eval_linear_imagenet_conv12/' 72 | mkdir -p $EXP 73 | python eval_linear.py --conv 12 --pretrained 
./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 74 | 75 | # CONV 13 76 | EXP='./exp/eval_linear_imagenet_conv13/' 77 | mkdir -p $EXP 78 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_IMAGENET --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 79 | 80 | ######################## 81 | ### Places205 dataset ## 82 | ######################## 83 | 84 | # CONV 1 85 | EXP='./exp/eval_linear_places205_conv1/' 86 | mkdir -p $EXP 87 | python eval_linear.py --conv 1 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.003 --wd 0.00001 --nepochs 100 88 | 89 | # CONV 2 90 | EXP='./exp/eval_linear_places205_conv2/' 91 | mkdir -p $EXP 92 | python eval_linear.py --conv 2 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.005 --wd 0.00001 --nepochs 100 93 | 94 | # CONV 3 95 | EXP='./exp/eval_linear_places205_conv3/' 96 | mkdir -p $EXP 97 | python eval_linear.py --conv 3 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 98 | 99 | # CONV 4 100 | EXP='./exp/eval_linear_places205_conv4/' 101 | mkdir -p $EXP 102 | python eval_linear.py --conv 4 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 103 | 104 | # CONV 5 105 | EXP='./exp/eval_linear_places205_conv5/' 106 | mkdir -p $EXP 107 | python eval_linear.py --conv 5 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 108 | 109 | # CONV 6 110 | EXP='./exp/eval_linear_places205_conv6/' 111 | mkdir -p $EXP 112 | python eval_linear.py --conv 6 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 113 | 114 | # CONV 7 115 | EXP='./exp/eval_linear_places205_conv7/' 116 | mkdir -p $EXP 117 | python eval_linear.py --conv 7 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 118 | 119 | # CONV 8 120 | EXP='./exp/eval_linear_places205_conv8/' 121 | mkdir -p $EXP 122 | python eval_linear.py --conv 8 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 123 | 124 | # CONV 9 125 | EXP='./exp/eval_linear_places205_conv9/' 126 | mkdir -p $EXP 127 | python eval_linear.py --conv 9 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 128 | 129 | # CONV 10 130 | EXP='./exp/eval_linear_places205_conv10/' 131 | mkdir -p $EXP 132 | python eval_linear.py --conv 10 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.03 --wd 0.00001 --nepochs 100 133 | 134 | # CONV 11 135 | EXP='./exp/eval_linear_places205_conv11/' 136 | 
mkdir -p $EXP 137 | python eval_linear.py --conv 11 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 138 | 139 | # CONV 12 140 | EXP='./exp/eval_linear_places205_conv12/' 141 | mkdir -p $EXP 142 | python eval_linear.py --conv 12 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.01 --wd 0.00001 --nepochs 100 143 | 144 | # CONV 13 145 | EXP='./exp/eval_linear_places205_conv13/' 146 | mkdir -p $EXP 147 | python eval_linear.py --conv 13 --pretrained ./downloaded_models/deepercluster/ours.pth --sobel true --dump_path $EXP --data_path $DATAPATH_PLACES --batch_size 256 --lr 0.02 --wd 0.00001 --nepochs 100 148 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | import os 10 | 11 | import apex 12 | import numpy as np 13 | import torch 14 | import torch.distributed as dist 15 | import torch.nn as nn 16 | 17 | from src.clustering import get_cluster_assignments, load_cluster_assignments 18 | from src.data.loader import get_data_transformations 19 | from src.data.YFCC100M import YFCC100M_dataset 20 | from src.model.model_factory import (build_prediction_layer, model_factory, 21 | sgd_optimizer, to_cuda) 22 | from src.model.pretrain import load_pretrained 23 | from src.slurm import init_signal_handler 24 | from src.trainer import train_network 25 | from src.utils import (bool_flag, check_parameters, end_of_epoch, fix_random_seeds, 26 | init_distributed_mode, initialize_exp, restart_from_checkpoint) 27 | 28 | 29 | def get_parser(): 30 | """ 31 | Generate a parameters parser. 
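The distributed-training fields that stay at their defaults here (rank, communication groups, which GPU to work on) are completed at runtime (see init_distributed_mode).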
32 | """ 33 | # parse parameters 34 | parser = argparse.ArgumentParser(description="Unsupervised feature learning.") 35 | 36 | # handling experiment parameters 37 | parser.add_argument("--checkpoint_freq", type=int, default=1, 38 | help="Save a checkpoint every this many epochs.") 39 | parser.add_argument("--dump_path", type=str, default="./exp", 40 | help="Experiment dump path.") 41 | parser.add_argument('--epoch', type=int, default=0, 42 | help='Current epoch to run.') 43 | parser.add_argument('--start_iter', type=int, default=0, 44 | help='First iter to run in the current epoch.') 45 | 46 | # network params 47 | parser.add_argument('--pretrained', type=str, default='', 48 | help='Start from this instead of random weights.') 49 | 50 | # datasets params 51 | parser.add_argument('--data_path', type=str, default='', 52 | help='Where to find training dataset.') 53 | parser.add_argument('--size_dataset', type=int, default=10000000, 54 | help='How many images to use.') 55 | parser.add_argument('--workers', type=int, default=8, 56 | help='Number of data loading workers.') 57 | parser.add_argument('--sobel', type=bool_flag, default=0, 58 | help='Apply Sobel filter.') 59 | 60 | # optim params 61 | parser.add_argument('--lr', type=float, default=0.1, help='Learning rate.') 62 | parser.add_argument('--wd', type=float, default=1e-5, help='Weight decay.') 63 | parser.add_argument('--nepochs', type=int, default=100, 64 | help='Max number of epochs to run.') 65 | parser.add_argument('--batch_size', default=48, type=int, 66 | help='Batch-size per process.') 67 | 68 | # Model params 69 | parser.add_argument('--reassignment', type=int, default=3, 70 | help='Reassign clusters every this many epochs.') 71 | parser.add_argument('--dim_pca', type=int, default=4096, 72 | help='Dimension of the pca applied to the descriptors.') 73 | parser.add_argument('--k', type=int, default=10000, 74 | help='Total number of clusters.') 75 | parser.add_argument('--super_classes', type=int, default=4, 76 | help='Total number of super-classes.') 77 | parser.add_argument('--rotnet', type=bool_flag, default=True, 78 | help='Network needs to classify large rotations.') 79 | 80 | # k-means params 81 | parser.add_argument('--warm_restart', type=bool_flag, default=False, 82 | help='Use previous centroids as init.') 83 | parser.add_argument('--use_faiss', type=bool_flag, default=True, 84 | help='Use faiss for E steps in k-means.') 85 | parser.add_argument('--niter', type=int, default=10, 86 | help='Number of k-means iterations.') 87 | 88 | # distributed training params 89 | parser.add_argument('--rank', default=0, type=int, 90 | help='Global process rank.') 91 | parser.add_argument("--local_rank", type=int, default=-1, 92 | help="Multi-GPU - Local rank") 93 | parser.add_argument('--world-size', default=1, type=int, 94 | help='Number of distributed processes.') 95 | parser.add_argument('--dist-url', default='', type=str, 96 | help='Url used to set up distributed training.') 97 | 98 | # debug 99 | parser.add_argument("--debug_slurm", type=bool_flag, default=False, 100 | help="Debug within a SLURM job.") 101 | 102 | return parser.parse_args() 103 | 104 | 105 | def main(args): 106 | """ 107 | This code implements the paper: https://arxiv.org/abs/1905.01278 108 | The method alternates between a hierarchical clustering of the 109 | features and learning the parameters of a convnet by predicting both the 110 | angle of the rotation applied to the input data and the cluster assignments 111 | in a single hierarchical loss. 
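Concretely, every `reassignment` epochs the dataset features are re-clustered (into super-clusters, then into sub-classes within each super-cluster), and in between the convnet and its two prediction heads are trained on the resulting pseudo-labels.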
112 | """ 113 | 114 | # initialize communication groups 115 | training_groups, clustering_groups = init_distributed_mode(args) 116 | 117 | # check parameters 118 | check_parameters(args) 119 | 120 | # initialize the experiment 121 | logger, training_stats = initialize_exp(args, 'epoch', 'iter', 'prec', 'loss', 122 | 'prec_super_class', 'loss_super_class', 123 | 'prec_sub_class', 'loss_sub_class') 124 | 125 | # initialize SLURM signal handler for time limit / pre-emption 126 | init_signal_handler() 127 | 128 | # load data 129 | dataset = YFCC100M_dataset(args.data_path, size=args.size_dataset) 130 | 131 | # prepare the different data transformations 132 | tr_cluster, tr_train = get_data_transformations(args.rotation * 90) 133 | 134 | # build model skeleton 135 | fix_random_seeds() 136 | model = model_factory(args.sobel) 137 | logger.info('model created') 138 | 139 | # load pretrained weights 140 | load_pretrained(model, args) 141 | 142 | # convert batch-norm layers to nvidia wrapper to enable batch stats reduction 143 | model = apex.parallel.convert_syncbn_model(model) 144 | 145 | # distributed training wrapper 146 | model = to_cuda(model, args.gpu_to_work_on, apex=True) 147 | logger.info('model to cuda') 148 | 149 | # set optimizer 150 | optimizer = sgd_optimizer(model, args.lr, args.wd) 151 | 152 | # load cluster assignments 153 | cluster_assignments = load_cluster_assignments(args, dataset) 154 | 155 | # build prediction layer on the super_class 156 | pred_layer, optimizer_pred_layer = build_prediction_layer( 157 | model.module.body.dim_output_space, 158 | args, 159 | ) 160 | 161 | nmb_sub_classes = args.k // args.nmb_super_clusters 162 | sub_class_pred_layer, optimizer_sub_class_pred_layer = build_prediction_layer( 163 | model.module.body.dim_output_space, 164 | args, 165 | num_classes=nmb_sub_classes, 166 | group=training_groups[args.training_local_world_id], 167 | ) 168 | 169 | # variables to fetch in checkpoint 170 | to_restore = {'epoch': 0, 'start_iter': 0} 171 | 172 | # re start from checkpoint 173 | restart_from_checkpoint( 174 | args, 175 | run_variables=to_restore, 176 | state_dict=model, 177 | optimizer=optimizer, 178 | pred_layer_state_dict=pred_layer, 179 | optimizer_pred_layer=optimizer_pred_layer, 180 | ) 181 | pred_layer_name = str(args.training_local_world_id) + '-pred_layer.pth.tar' 182 | restart_from_checkpoint( 183 | args, 184 | ckp_path=os.path.join(args.dump_path, pred_layer_name), 185 | state_dict=sub_class_pred_layer, 186 | optimizer=optimizer_sub_class_pred_layer, 187 | ) 188 | args.epoch = to_restore['epoch'] 189 | args.start_iter = to_restore['start_iter'] 190 | 191 | for _ in range(args.epoch, args.nepochs): 192 | 193 | logger.info("============ Starting epoch %i ... 
============" % args.epoch) 194 | fix_random_seeds(args.epoch) 195 | 196 | # step 1: get the final activations for the whole dataset and cluster them 197 | 198 | if cluster_assignments is None and not args.epoch % args.reassignment: 199 | 200 | logger.info("=> Start clustering step") 201 | dataset.transform = tr_cluster 202 | 203 | cluster_assignments = get_cluster_assignments(args, model, dataset, clustering_groups) 204 | 205 | # reset prediction layers 206 | if args.nmb_super_clusters > 1: 207 | pred_layer, optimizer_pred_layer = build_prediction_layer( 208 | model.module.body.dim_output_space, 209 | args, 210 | ) 211 | sub_class_pred_layer, optimizer_sub_class_pred_layer = build_prediction_layer( 212 | model.module.body.dim_output_space, 213 | args, 214 | num_classes=nmb_sub_classes, 215 | group=training_groups[args.training_local_world_id], 216 | ) 217 | 218 | 219 | # step 2: train the network with the cluster assignments as labels 220 | 221 | # prepare dataset 222 | dataset.transform = tr_train 223 | dataset.sub_classes = cluster_assignments 224 | 225 | # concatenate models and their corresponding optimizers 226 | models = [model, pred_layer, sub_class_pred_layer] 227 | optimizers = [optimizer, optimizer_pred_layer, optimizer_sub_class_pred_layer] 228 | 229 | # train the network for one epoch 230 | scores = train_network(args, models, optimizers, dataset) 231 | 232 | # save training statistics 233 | logger.info(scores) 234 | training_stats.update(scores) 235 | 236 | # reassign clusters at the next epoch 237 | if not args.epoch % args.reassignment: 238 | cluster_assignments = None 239 | dataset.subset_indexes = None 240 | end_of_epoch(args) 241 | 242 | dist.barrier() 243 | 244 | 245 | if __name__ == '__main__': 246 | 247 | # generate parser / parse parameters 248 | args = get_parser() 249 | 250 | # run experiment 251 | main(args) 252 | -------------------------------------------------------------------------------- /main.sh: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #!/bin/bash 8 | 9 | # load ids of the 95,920,149 images we managed to download 10 | wget -c -P ./src/data/ "https://dl.fbaipublicfiles.com/deepcluster/flickr_unique_ids.npy" 11 | 12 | # create experiment dump repo 13 | mkdir -p ./exp/deepercluster/ 14 | 15 | # run unsupervised feature learning 16 | python main.py \ 17 | --dump_path ./exp/deepercluster/ \ 18 | --pretrained PRETRAINED \ 19 | --data_path DATA_PATH \ 20 | --size_dataset 100000000 \ 21 | --workers 10 \ 22 | --sobel true \ 23 | --lr 0.1 \ 24 | --wd 0.00001 \ 25 | --nepochs 100 \ 26 | --batch_size 48 \ 27 | --reassignment 3 \ 28 | --dim_pca 4096 \ 29 | --super_classes 16 \ 30 | --rotnet true \ 31 | --k 320000 \ 32 | --warm_restart false \ 33 | --use_faiss true \ 34 | --niter 10 \ 35 | --world-size 64 \ 36 | --dist-url DIST_URL 37 | 38 |
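For intuition, a rough sanity check of the configuration above, as a minimal sketch. The derivation of nmb_super_clusters from super_classes is an assumption (4 rotation angles multiplying the super-cluster targets when rotnet is true), not code from this repo:

# illustrative arithmetic only -- nothing here is part of the repo
world_size = 64        # --world-size in main.sh
super_classes = 16     # --super_classes: rotation (4) x super-clusters (assumed 4)
k = 320000             # --k: total number of clusters

nmb_super_clusters = super_classes // 4          # assumed: 4 super-clusters
nmb_sub_classes = k // nmb_super_clusters        # 80000 sub-classes per head
group_size = world_size // nmb_super_clusters    # 16 processes per clustering group

print(nmb_super_clusters, nmb_sub_classes, group_size)  # 4 80000 16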
-------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | -------------------------------------------------------------------------------- /src/clustering.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | import os 10 | import pickle 11 | 12 | import faiss 13 | import torch 14 | import torch.distributed as dist 15 | from torch.utils.data.sampler import Sampler 16 | import numpy as np 17 | 18 | from .utils import PCA, AverageMeter, normalize, get_indices_sparse 19 | from .distributed_kmeans import distributed_kmeans, initialize_cache 20 | 21 | 22 | logger = getLogger() 23 | 24 | 25 | def get_cluster_assignments(args, model, dataset, groups): 26 | """Cluster the dataset features hierarchically: super-clusters first, then sub-classes within each super-cluster.""" 27 | 28 | # discard previous pseudo-labels before extracting features 29 | dataset.sub_classes = None 30 | 31 | # switch to eval mode 32 | model.eval() 33 | 34 | # this process deals only with a subset of the dataset 35 | local_nmb_data = len(dataset) // args.world_size 36 | indices = torch.arange(args.rank * local_nmb_data, (args.rank + 1) * local_nmb_data).int() 37 | 38 | if os.path.isfile(os.path.join(args.dump_path, 'super_class_assignments.pkl')): 39 | 40 | # super-class assignments have already been computed in a previous run 41 | 42 | super_class_assignements = pickle.load(open(os.path.join(args.dump_path, 'super_class_assignments.pkl'), 'rb')) 43 | logger.info('loaded super-class assignments') 44 | 45 | # recover the local counts per super-cluster 46 | where_helper = get_indices_sparse(super_class_assignements[indices]) 47 | nmb_data_per_super_cluster = torch.zeros(args.nmb_super_clusters).cuda() 48 | for super_class in range(len(where_helper)): 49 | nmb_data_per_super_cluster[super_class] = len(where_helper[super_class][0]) 50 | 51 | else: 52 | sampler = Subset_Sampler(indices) 53 | 54 | # we need a data loader 55 | loader = torch.utils.data.DataLoader( 56 | dataset, 57 | batch_size=args.batch_size, 58 | sampler=sampler, 59 | num_workers=args.workers, 60 | pin_memory=True, 61 | ) 62 | 63 | # initialize cache, pca and centroids 64 | cache, centroids = initialize_cache(args, loader, model) 65 | 66 | # empty cuda cache (useful because we're about to use faiss on gpu) 67 | torch.cuda.empty_cache() 68 | 69 | # perform clustering into super-clusters 70 | super_class_assignements, centroids_sc = distributed_kmeans( 71 | args, 72 | args.size_dataset, 73 | args.nmb_super_clusters, 74 | cache, 75 | args.rank, 76 | args.world_size, 77 | centroids, 78 | ) 79 | 80 | # dump the cached activations, bucketed by super-cluster 81 | where_helper = get_indices_sparse(super_class_assignements[indices]) 82 | nmb_data_per_super_cluster = torch.zeros(args.nmb_super_clusters).cuda() 83 | for super_class in range(len(where_helper)): 84 | ind_sc = where_helper[super_class][0] 85 | np.save(open(os.path.join( 86 | args.dump_path, 87 | 'cache/', 88 | 'super_class' + str(super_class) + '-' + str(args.rank), 89 | ), 'wb'), cache[ind_sc]) 90 | 91 | nmb_data_per_super_cluster[super_class] = len(ind_sc) 92 | 93 | dist.barrier() 94 | 95 | # dump super-class assignments and centroids 96 | if not args.rank: 97 | pickle.dump( 98 | super_class_assignements, 99 | open(os.path.join(args.dump_path, 'super_class_assignments.pkl'), 'wb'), 100 | ) 101 | pickle.dump( 102 | centroids_sc, 103 | open(os.path.join(args.dump_path, 'super_class_centroids.pkl'), 'wb'), 104 | ) 105 | 106 | # size of the 
different super clusters 107 | all_counts = [torch.zeros(args.nmb_super_clusters).cuda() for _ in range(args.world_size)] 108 | dist.all_gather(all_counts, nmb_data_per_super_cluster) 109 | all_counts = torch.cat(all_counts).cpu().long() 110 | all_counts = all_counts.reshape(args.world_size, args.nmb_super_clusters) 111 | logger.info(all_counts.sum(dim=0)) 112 | 113 | # what are the data belonging to this super class 114 | dataset.subset_indexes = np.where(super_class_assignements == args.clustering_local_world_id)[0] 115 | div = args.batch_size * args.clustering_local_world_size 116 | dataset.subset_indexes = dataset.subset_indexes[:len(dataset) // div * div] 117 | 118 | dist.barrier() 119 | 120 | # which files this process is going to read 121 | local_nmb_data = int(len(dataset) / args.clustering_local_world_size) 122 | low = np.long(args.clustering_local_rank * local_nmb_data) 123 | high = np.long(low + local_nmb_data) 124 | curr_ind = 0 125 | cache = torch.zeros(local_nmb_data, args.dim_pca, dtype=torch.float32) 126 | 127 | cumsum = torch.cumsum(all_counts[:, args.clustering_local_world_id].long(), 0).long() 128 | for r in range(args.world_size): 129 | # data in this bucket r: [cumsum[r - 1] : cumsum[r] - 1] 130 | low_bucket = np.long(cumsum[r - 1]) if r else 0 131 | 132 | # this bucket is empty 133 | if low_bucket > cumsum[r] - 1: 134 | continue 135 | 136 | if cumsum[r] - 1 < low: 137 | continue 138 | if low_bucket >= high: 139 | break 140 | 141 | # which are the data we are interested in inside this bucket ? 142 | ind_low = np.long(max(low, low_bucket)) 143 | ind_high = np.long(min(high, cumsum[r])) 144 | 145 | cache_r = np.load(open(os.path.join(args.dump_path, 'cache/', 'super_class' + str(args.clustering_local_world_id) + '-' + str(r)), 'rb')) 146 | cache[curr_ind: curr_ind + ind_high - ind_low] = torch.FloatTensor(cache_r[ind_low - low_bucket: ind_high - low_bucket]) 147 | 148 | curr_ind += (ind_high - ind_low) 149 | 150 | # randomly pick some centroids and dump them 151 | centroids_path = os.path.join(args.dump_path, 'centroids' + str(args.clustering_local_world_id) + '.pkl') 152 | if not args.clustering_local_rank: 153 | centroids = cache[np.random.choice( 154 | np.arange(cache.shape[0]), 155 | replace=cache.shape[0] < args.k // args.nmb_super_clusters, 156 | size=args.k // args.nmb_super_clusters, 157 | )] 158 | pickle.dump(centroids, open(centroids_path, 'wb'), -1) 159 | 160 | dist.barrier() 161 | 162 | # read centroids 163 | centroids = pickle.load(open(centroids_path, 'rb')).cuda() 164 | 165 | # distributed kmeans into sub-classes 166 | cluster_assignments, centroids = distributed_kmeans( 167 | args, 168 | len(dataset), 169 | args.k // args.nmb_super_clusters, 170 | cache, 171 | args.clustering_local_rank, 172 | args.clustering_local_world_size, 173 | centroids, 174 | world_id=args.clustering_local_world_id, 175 | group=groups[args.clustering_local_world_id], 176 | ) 177 | 178 | # free RAM 179 | del cache 180 | 181 | # write cluster assignments and centroids 182 | if not args.clustering_local_rank: 183 | pickle.dump( 184 | cluster_assignments, 185 | open(os.path.join(args.dump_path, 'cluster_assignments' + str(args.clustering_local_world_id) + '.pkl'), 'wb'), 186 | ) 187 | pickle.dump( 188 | centroids, 189 | open(centroids_path, 'wb'), 190 | ) 191 | 192 | dist.barrier() 193 | 194 | return cluster_assignments 195 | 196 | 197 | 198 | class Subset_Sampler(Sampler): 199 | """ 200 | Sample indices. 
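Yields exactly the indices it was constructed with, so that each process reads only its own shard of the dataset.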
201 | """ 202 | def __init__(self, indices): 203 | self.indices = indices 204 | 205 | def __iter__(self): 206 | return iter(self.indices) 207 | 208 | def __len__(self): 209 | return len(self.indices) 210 | 211 | 212 | def load_cluster_assignments(args, dataset): 213 | """ 214 | Load cluster assignments if they are present in experiment repository. 215 | """ 216 | super_file = os.path.join(args.dump_path, 'super_class_assignments.pkl') 217 | sub_file = os.path.join( 218 | args.dump_path, 219 | 'sub_class_assignments' + str(args.clustering_local_world_id) + '.pkl', 220 | ) 221 | 222 | if os.path.isfile(super_file) and os.path.isfile(sub_file): 223 | super_class_assignments = pickle.load(open(super_file, 'rb')) 224 | dataset.subset_indexes = np.where(super_class_assignments == args.clustering_local_world_id)[0] 225 | 226 | div = args.batch_size * args.clustering_local_world_size 227 | clustering_size_dataset = len(dataset) // div * div 228 | dataset.subset_indexes = dataset.subset_indexes[:clustering_size_dataset] 229 | 230 | logger.info('Found cluster assignments in experiment repository') 231 | return pickle.load(open(sub_file, "rb")) 232 | 233 | return None 234 | -------------------------------------------------------------------------------- /src/data/VOC2007.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import glob 9 | import os 10 | from collections import defaultdict 11 | 12 | from PIL import Image 13 | from PIL import ImageFile 14 | ImageFile.LOAD_TRUNCATED_IMAGES = True 15 | import numpy as np 16 | import torch.utils.data as data 17 | 18 | 19 | class VOC2007_dataset(data.Dataset): 20 | def __init__(self, voc_dir, split='train', transform=None): 21 | # Find the image sets 22 | image_set_dir = os.path.join(voc_dir, 'ImageSets', 'Main') 23 | image_sets = glob.glob(os.path.join(image_set_dir, '*_' + split + '.txt')) 24 | assert len(image_sets) == 20 25 | # Read the labels 26 | self.n_labels = len(image_sets) 27 | images = defaultdict(lambda:-np.ones(self.n_labels, dtype=np.uint8)) 28 | for k, s in enumerate(sorted(image_sets)): 29 | for l in open(s, 'r'): 30 | name, lbl = l.strip().split() 31 | lbl = int(lbl) 32 | # Switch the ignore label and 0 label (in VOC -1: not present, 0: ignore) 33 | if lbl < 0: 34 | lbl = 0 35 | elif lbl == 0: 36 | lbl = 255 37 | images[os.path.join(voc_dir, 'JPEGImages', name + '.jpg')][k] = lbl 38 | self.images = [(k, images[k]) for k in images.keys()] 39 | np.random.shuffle(self.images) 40 | self.transform = transform 41 | 42 | def __len__(self): 43 | return len(self.images) 44 | 45 | def __getitem__(self, i): 46 | img = Image.open(self.images[i][0]) 47 | img = img.convert('RGB') 48 | if self.transform is not None: 49 | img = self.transform(img) 50 | return img, self.images[i][1] 51 | 52 | -------------------------------------------------------------------------------- /src/data/YFCC100M.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | import os 9 | import zipfile 10 | 11 | import numpy as np 12 | from PIL import Image 13 | from PIL import ImageFile 14 | ImageFile.LOAD_TRUNCATED_IMAGES = True 15 | import torch.utils.data as data 16 | 17 | 18 | def loader(path_zip, file_img): 19 | """ 20 | Load imagefile from zip. 21 | """ 22 | with zipfile.ZipFile(path_zip, 'r') as myzip: 23 | img = Image.open(myzip.open(file_img)) 24 | return img.convert('RGB') 25 | 26 | 27 | class YFCC100M_dataset(data.Dataset): 28 | """ 29 | YFCC100M dataset. 30 | """ 31 | def __init__(self, root, size, flickr_unique_ids=True, transform=None): 32 | self.root = root 33 | self.transform = transform 34 | self.sub_classes = None 35 | 36 | # remove data with uniform color and data we didn't manage to download 37 | if flickr_unique_ids: 38 | self.indexes = np.load(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'flickr_unique_ids.npy')) 39 | self.indexes = self.indexes[:min(size, len(self.indexes))] 40 | else: 41 | self.indexes = np.arange(size) 42 | 43 | # for subsets 44 | self.subset_indexes = None 45 | 46 | def __getitem__(self, ind): 47 | index = ind 48 | if self.subset_indexes is not None: 49 | index = self.subset_indexes[ind] 50 | index = self.indexes[index] 51 | 52 | index = format(index, "0>8d") 53 | repo = index[:2] 54 | z = index[2: 5] 55 | file_img = index[5:] + '.jpg' 56 | 57 | path_zip = os.path.join(self.root, repo, z) + '.zip' 58 | 59 | # load the image 60 | img = loader(path_zip, file_img) 61 | 62 | # apply transformation 63 | if self.transform is not None: 64 | img = self.transform(img) 65 | 66 | # id of cluster 67 | sub_class = -100 68 | if self.sub_classes is not None: 69 | sub_class = self.sub_classes[ind] 70 | 71 | return img, sub_class 72 | 73 | def __len__(self): 74 | if self.subset_indexes is not None: 75 | return len(self.subset_indexes) 76 | return len(self.indexes) 77 | -------------------------------------------------------------------------------- /src/data/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | -------------------------------------------------------------------------------- /src/data/loader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | from random import randrange 10 | import os 11 | 12 | import numpy as np 13 | from sklearn.feature_extraction import image 14 | import torch 15 | import torch.nn as nn 16 | import torchvision.datasets as datasets 17 | import torchvision.transforms as transforms 18 | from torch.utils.data.sampler import Sampler 19 | 20 | from .YFCC100M import YFCC100M_dataset 21 | 22 | logger = getLogger() 23 | 24 | 25 | def load_data(args): 26 | """ 27 | Load dataset. 
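Returns the YFCC100M dataset when the data path contains 'yfcc100m', and a torchvision ImageFolder otherwise; transforms are attached later by the caller.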
28 | """ 29 | if 'yfcc100m' in args.data_path: 30 | return YFCC100M_dataset(args.data_path, size=args.size_dataset) 31 | return datasets.ImageFolder(args.data_path) 32 | 33 | 34 | def get_data_transformations(rotation=0): 35 | """ 36 | Return data transformations for clustering and for training. 37 | """ 38 | tr_normalize = transforms.Normalize( 39 | mean=[0.485, 0.456, 0.406], 40 | std=[0.229, 0.224, 0.225], 41 | ) 42 | final_process = [transforms.ToTensor(), tr_normalize] 43 | 44 | # for clustering stage 45 | tr_central_crop = transforms.Compose([ 46 | transforms.Resize(256), 47 | transforms.CenterCrop(224), 48 | lambda x: np.asarray(x), 49 | Rotate(0) 50 | ] + final_process) 51 | 52 | # for training stage 53 | tr_dataug = transforms.Compose([ 54 | transforms.RandomResizedCrop(224), 55 | transforms.RandomHorizontalFlip(), 56 | lambda x: np.asarray(x), 57 | Rotate(rotation) 58 | ] + final_process) 59 | 60 | return tr_central_crop, tr_dataug 61 | 62 | 63 | class Rotate(object): 64 | def __init__(self, rot): 65 | self.rot = rot 66 | def __call__(self, img): 67 | return rotate_img(img, self.rot) 68 | 69 | 70 | def rotate_img(img, rot): 71 | if rot == 0: # 0 degrees rotation 72 | return img 73 | elif rot == 90: # 90 degrees rotation 74 | return np.flipud(np.transpose(img, (1, 0, 2))).copy() 75 | elif rot == 180: # 180 degrees rotation 76 | return np.fliplr(np.flipud(img)).copy() 77 | elif rot == 270: # 270 degrees rotation / or -90 78 | return np.transpose(np.flipud(img), (1, 0, 2)).copy() 79 | else: 80 | raise ValueError('rotation should be 0, 90, 180 or 270 degrees') 81 | 82 | 83 | class KFoldSampler(Sampler): 84 | def __init__(self, im_per_target, shuffle): 85 | self.im_per_target = im_per_target 86 | N = 0 87 | for tar in im_per_target: 88 | N = N + len(im_per_target[tar]) 89 | self.N = N 90 | self.shuffle = shuffle 91 | 92 | def __iter__(self): 93 | indices = np.zeros(self.N).astype(int) 94 | c = 0 95 | for tar in self.im_per_target: 96 | indices[c: c + len(self.im_per_target[tar])] = self.im_per_target[tar] 97 | c = c + len(self.im_per_target[tar]) 98 | if self.shuffle: 99 | np.random.shuffle(indices) 100 | return iter(indices) 101 | 102 | def __len__(self): 103 | return self.N 104 | 105 | 106 | class KFold(): 107 | """Class to perform k-fold cross-validation. 
108 | Args: 109 | im_per_target (Dict): key (target), value (list of data with this target) 110 | i (int): index of the round of cross validation to perform 111 | K (int): dataset randomly partitioned into K equal sized subsamples 112 | Attributes: 113 | val (KFoldSampler): validation sampler 114 | train (KFoldSampler): training sampler 115 | """ 116 | def __init__(self, im_per_target, i, K): 117 | assert(i < K) -------------------------------------------------------------------------------- /src/distributed_kmeans.py: -------------------------------------------------------------------------------- 40 | if nmb_batches_for_pca > len(loader): 41 | nmb_batches_for_pca = len(loader) 42 | logger.warning("Compute the PCA on {} images (entire dataset)".format(args.size_dataset)) 43 | 44 | # statistics 45 | batch_time = AverageMeter() 46 | data_time = AverageMeter() 47 | end = time.time() 48 | 49 | with torch.no_grad(): 50 | for i, (input_tensor, _) in enumerate(loader): 51 | 52 | # time spent to load data 53 | data_time.update(time.time() - end) 54 | 55 | # move to gpu 56 | input_tensor = input_tensor.type(torch.FloatTensor).cuda() 57 | 58 | # forward 59 | feat = model(input_tensor) 60 | 61 | # before the pca has been computed 62 | if i < nmb_batches_for_pca: 63 | 64 | # gather the features computed by all processes 65 | all_feat = [torch.cuda.FloatTensor(feat.size()) for src in range(args.world_size)] 66 | dist.all_gather(all_feat, feat) 67 | 68 | # only the main process computes the PCA 69 | if not args.rank: 70 | all_feat = torch.cat(all_feat).cpu().numpy() 71 | 72 | # initialize storage arrays 73 | if i == 0: 74 | if not args.rank: 75 | for_pca = np.zeros( 76 | (nmb_batches_for_pca * batch_size, all_feat.shape[1]), 77 | dtype=np.float32, 78 | ) 79 | for_cache = torch.zeros( 80 | nmb_batches_for_pca * args.batch_size, 81 | feat.size(1), 82 | dtype=torch.float32, 83 | ) 84 | 85 | # fill in arrays 86 | if not args.rank: 87 | for_pca[i * batch_size: (i + 1) * batch_size] = all_feat 88 | 89 | for_cache[i * args.batch_size: (i + 1) * args.batch_size] = feat.cpu() 90 | 91 | # train the pca 92 | if i == nmb_batches_for_pca - 1: 93 | pca_path = os.path.join(args.dump_path, 'pca.pkl') 94 | centroids_path = os.path.join(args.dump_path, 'centroids.pkl') 95 | 96 | # compute the PCA 97 | if not args.rank: 98 | # init PCA object 99 | pca = PCA(dim=args.dim_pca, whit=0.5) 100 | 101 | # center data 102 | mean = np.mean(for_pca, axis=0).astype('float32') 103 | for_pca -= mean 104 | 105 | # compute covariance 106 | cov = np.dot(for_pca.T, for_pca) / for_pca.shape[0] 107 | 108 | # calculate the pca 109 | pca.train_pca(cov) 110 | 111 | # randomly pick some centroids 112 | centroids = pca.apply(for_pca[np.random.choice( 113 | np.arange(for_pca.shape[0]), 114 | replace=False, 115 | size=args.nmb_super_clusters, 116 | )]) 117 | centroids = normalize(centroids) 118 | 119 | pca.mean = mean 120 | 121 | # free memory 122 | del for_pca 123 | 124 | # write PCA to disk 125 | pickle.dump(pca, open(pca_path, 'wb')) 126 | pickle.dump(centroids, open(centroids_path, 'wb')) 127 | 128 | # processes wait for the main process to compute and write the PCA and centroids 129 | dist.barrier() 130 | 131 | # processes read PCA and centroids from disk 132 | pca = pickle.load(open(pca_path, "rb")) 133 | centroids = pickle.load(open(centroids_path, "rb")) 134 | 135 | # apply the pca to the cached features 136 | for_cache = pca.apply(for_cache) 137 | for_cache = normalize(for_cache) 138 | 139 | # extend the cache 140 | current_cache_size = for_cache.size(0) 141 | for_cache = torch.cat((for_cache, torch.zeros( 142 | local_cache_size - current_cache_size, 143 | args.dim_pca, 144 | ))) 145 | logger.info('{0} imgs cached => cache is {1:.2f} % full' 146 |
.format(current_cache_size, 100 * current_cache_size / local_cache_size)) 147 | 148 | # keep accumulating data 149 | if i > nmb_batches_for_pca - 1: 150 | feat = pca.apply(feat) 151 | feat = normalize(feat) 152 | for_cache[i * args.batch_size: (i + 1) * args.batch_size] = feat.cpu() 153 | 154 | 155 | # verbose 156 | batch_time.update(time.time() - end) 157 | end = time.time() 158 | if i % 200 == 0: 159 | logger.info('{0} / {1}\t' 160 | 'Time: {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 161 | 'Data Time: {data_time.val:.3f} ({data_time.avg:.3f})\t' 162 | .format(i, len(loader), batch_time=batch_time, data_time=data_time)) 163 | 164 | # move centroids to GPU 165 | centroids = torch.cuda.FloatTensor(centroids) 166 | 167 | return for_cache, centroids 168 | 169 | 170 | def distributed_kmeans(args, n_all, nk, cache, rank, world_size, centroids, world_id=0, group=None): 171 | """ 172 | Distributed mini-batch k-means. 173 | """ 174 | # local assignments 175 | assignments = -1 * np.ones(n_all // world_size) 176 | 177 | # prepare faiss index 178 | if args.use_faiss: 179 | res = faiss.StandardGpuResources() 180 | cfg = faiss.GpuIndexFlatConfig() 181 | cfg.device = args.gpu_to_work_on 182 | index = faiss.GpuIndexFlatL2(res, args.dim_pca, cfg) 183 | 184 | end = time.time() 185 | for p in range(args.niter + 1): 186 | start_pass = time.time() 187 | 188 | # running statistics 189 | batch_time = AverageMeter() 190 | log_loss = AverageMeter() 191 | 192 | # initialize arrays for update 193 | local_counts = torch.zeros(nk).cuda() 194 | local_feats = torch.zeros(nk, args.dim_pca).cuda() 195 | 196 | # prepare E step 197 | torch.cuda.empty_cache() 198 | if args.use_faiss: 199 | index.reset() 200 | index.add(centroids.cpu().numpy().astype('float32')) 201 | else: 202 | centroids_L2_norm = centroids.norm(dim=1)**2 203 | 204 | nmb_batches = n_all // world_size // args.batch_size 205 | for it in range(nmb_batches): 206 | 207 | # fetch mini-batch 208 | feat = cache[it * args.batch_size: (it + 1) * args.batch_size] 209 | 210 | # E-step 211 | if args.use_faiss: 212 | D, I = index.search(feat.numpy().astype('float32'), 1) 213 | I = I.squeeze(1) 214 | else: 215 | # find current cluster assignments 216 | l2dist = 1 - 2 * torch.mm(feat.cuda(non_blocking=True), centroids.transpose(0, 1)) + centroids_L2_norm 217 | D, I = l2dist.min(dim=1) 218 | I = I.cpu().numpy() 219 | D = D.cpu().numpy() 220 | 221 | # update assignment array 222 | assignments[it * args.batch_size: (it + 1) * args.batch_size] = I 223 | 224 | # log 225 | log_loss.update(D.mean()) 226 | 227 | for k in np.unique(I): 228 | idx_k = np.where(I == k)[0] 229 | # number of elmt in cluster k for this batch 230 | local_counts[k] += len(idx_k) 231 | 232 | # sum of elmt belonging to this cluster 233 | local_feats[k, :] += feat.cuda(non_blocking=True)[idx_k].sum(dim=0) 234 | 235 | batch_time.update(time.time() - end) 236 | end = time.time() 237 | 238 | if it and it % 1000 == 0: 239 | logger.info('Pass[{0}] - Iter: [{1}/{2}]\t' 240 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 241 | .format(p, it, nmb_batches, batch_time=batch_time)) 242 | 243 | # all reduce operation 244 | # processes share what it is needed for M-step 245 | if group is not None: 246 | dist.all_reduce(local_counts, group=group) 247 | dist.all_reduce(local_feats, group=group) 248 | else: 249 | dist.all_reduce(local_counts) 250 | dist.all_reduce(local_feats) 251 | 252 | # M-step 253 | 254 | # update centroids (for the last pass we only want the assignments) 255 | mask = local_counts.nonzero() 
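        # M-step: each surviving centroid becomes the mean of its assigned
        # features, c_k = local_feats[k] / local_counts[k]; the all-reduce
        # above has already summed counts and feature sums across processes,
        # so the division below yields the global mean. Empty clusters are
        # then repaired with a split heuristic: copy a random non-empty
        # centroid, nudge the two copies apart coordinate-wise by +/-1e-7,
        # and hand half of the donor's points over to the new cluster.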
256 | if p < args.niter: 257 | centroids[mask] = 1. / local_counts[mask].unsqueeze(1) * local_feats[mask] 258 | 259 | # deal with empty clusters 260 | for k in (local_counts == 0).nonzero(): 261 | 262 | # choose a random cluster from the set of non-empty clusters 263 | np.random.seed(world_id) 264 | m = mask[np.random.randint(len(mask))] 265 | 266 | # replace the empty centroid with a perturbed copy of a non-empty one 267 | centroids[k] = centroids[m] 268 | for j in range(args.dim_pca): 269 | sign = (j % 2) * 2 - 1 270 | centroids[k, j] += sign * 1e-7 271 | centroids[m, j] -= sign * 1e-7 272 | 273 | # update the counts 274 | local_counts[k] = local_counts[m] // 2 275 | local_counts[m] -= local_counts[k] 276 | 277 | # update the assignments 278 | assignments[np.where(assignments == m.item())[0][: int(local_counts[m])]] = k.cpu() 279 | logger.info('cluster {} empty => split cluster {}'.format(k, m)) 280 | 281 | logger.info(' # Pass[{0}]\tTime {1:.3f}\tLoss {2:.4f}' 282 | .format(p, time.time() - start_pass, log_loss.avg)) 283 | 284 | # now each process needs to share its own set of pseudo-labels 285 | 286 | # where to write / read the pseudo-labels 287 | dump_labels = os.path.join( 288 | args.dump_path, 289 | 'pseudo_labels' + str(world_id) + '-' + str(rank) + '.pkl', 290 | ) 291 | 292 | # write the local cluster assignments to disk 293 | pickle.dump( 294 | assignments, 295 | open(dump_labels, 'wb'), 296 | -1, 297 | ) 298 | 299 | # each process waits for all processes to finish writing 300 | if group is not None: 301 | dist.barrier(group=group) 302 | else: 303 | dist.barrier() 304 | 305 | pseudo_labels = np.zeros(n_all) 306 | 307 | # each process reads and reconstitutes the full set of pseudo-labels 308 | local_nmb_data = n_all // world_size 309 | for r in range(world_size): 310 | pseudo_labels[torch.arange(r * local_nmb_data, (r + 1) * local_nmb_data).int()] = \ 311 | pickle.load(open(os.path.join(args.dump_path, 'pseudo_labels' + str(world_id) + '-' + str(r) + '.pkl'), "rb")) 312 | 313 | # clean up 314 | del assignments 315 | dist.barrier() 316 | os.remove(dump_labels) 317 | 318 | return pseudo_labels, centroids.cpu() 319 | -------------------------------------------------------------------------------- /src/logger.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import os 9 | import logging 10 | import time 11 | from datetime import timedelta 12 | import pandas as pd 13 | 14 | class LogFormatter(): 15 | 16 | def __init__(self): 17 | self.start_time = time.time() 18 | 19 | def format(self, record): 20 | elapsed_seconds = round(record.created - self.start_time) 21 | 22 | prefix = "%s - %s - %s" % ( 23 | record.levelname, 24 | time.strftime('%x %X'), 25 | timedelta(seconds=elapsed_seconds) 26 | ) 27 | message = record.getMessage() 28 | message = message.replace('\n', '\n' + ' ' * (len(prefix) + 3)) 29 | return "%s - %s" % (prefix, message) if message else '' 30 | 31 | 32 | def create_logger(filepath, rank): 33 | """ 34 | Create a logger. 35 | Use a different log file for each process. 
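The main process writes to filepath itself, while a process of rank r > 0 writes to filepath-r.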
36 | """ 37 | # create log formatter 38 | log_formatter = LogFormatter() 39 | 40 | # create file handler and set level to debug 41 | if filepath is not None: 42 | if rank > 0: 43 | filepath = '%s-%i' % (filepath, rank) 44 | file_handler = logging.FileHandler(filepath, "a") 45 | file_handler.setLevel(logging.DEBUG) 46 | file_handler.setFormatter(log_formatter) 47 | 48 | # create console handler and set level to info 49 | console_handler = logging.StreamHandler() 50 | console_handler.setLevel(logging.INFO) 51 | console_handler.setFormatter(log_formatter) 52 | 53 | # create logger and set level to debug 54 | logger = logging.getLogger() 55 | logger.handlers = [] 56 | logger.setLevel(logging.DEBUG) 57 | logger.propagate = False 58 | if filepath is not None: 59 | logger.addHandler(file_handler) 60 | logger.addHandler(console_handler) 61 | 62 | # reset logger elapsed time 63 | def reset_time(): 64 | log_formatter.start_time = time.time() 65 | logger.reset_time = reset_time 66 | 67 | return logger 68 | 69 | 70 | class PD_Stats(object): 71 | """ 72 | Log stuff with pandas library 73 | """ 74 | def __init__(self, path, columns): 75 | self.path = path 76 | 77 | # reload path stats 78 | if os.path.isfile(self.path): 79 | self.stats = pd.read_pickle(self.path) 80 | 81 | # check that columns are the same 82 | assert list(self.stats.columns) == list(columns) 83 | 84 | else: 85 | self.stats = pd.DataFrame(columns=columns) 86 | 87 | def update(self, row, save=True): 88 | self.stats.loc[len(self.stats.index)] = row 89 | 90 | # save the statistics 91 | if save: 92 | self.stats.to_pickle(self.path) 93 | -------------------------------------------------------------------------------- /src/model/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | -------------------------------------------------------------------------------- /src/model/model_factory.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
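# Editor's sketch (not part of the original file): how the PD_Stats helper
# defined in src/logger.py above is typically driven. The path and column
# names here are hypothetical; the repo root is assumed to be on PYTHONPATH.
def _example_pd_stats(path='/tmp/exp/stats0.pkl'):
    from src.logger import PD_Stats
    stats = PD_Stats(path, ['epoch', 'loss', 'top1'])  # reloads the pickle if it already exists
    stats.update([0, 2.31, 12.5])                      # appends a row and saves to disk
    stats.update([1, 1.87, 18.0], save=False)          # accumulates in memory only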
6 | # 7 | 8 | from logging import getLogger 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.optim 13 | 14 | from .vgg16 import VGG16 15 | 16 | 17 | logger = getLogger() 18 | 19 | 20 | def create_sobel_layer(): 21 | grayscale = nn.Conv2d(3, 1, kernel_size=1, stride=1, padding=0) 22 | grayscale.weight.data.fill_(1.0 / 3.0) 23 | grayscale.bias.data.zero_() 24 | sobel_filter = nn.Conv2d(1, 2, kernel_size=3, stride=1, padding=0) 25 | sobel_filter.weight.data[0, 0].copy_( 26 | torch.FloatTensor([[1, 0, -1], [2, 0, -2], [1, 0, -1]]) 27 | ) 28 | sobel_filter.weight.data[1, 0].copy_( 29 | torch.FloatTensor([[1, 2, 1], [0, 0, 0], [-1, -2, -1]]) 30 | ) 31 | sobel_filter.bias.data.zero_() 32 | sobel = nn.Sequential(grayscale, sobel_filter) 33 | for p in sobel.parameters(): 34 | p.requires_grad = False 35 | return sobel 36 | 37 | 38 | class Net(nn.Module): 39 | def __init__(self, padding, sobel, body, pred_layer): 40 | super(Net, self).__init__() 41 | 42 | # padding 43 | self.padding = padding 44 | 45 | # sobel filter 46 | self.sobel = create_sobel_layer() if sobel else None 47 | 48 | # main architecture 49 | self.body = body 50 | 51 | # prediction layer 52 | self.pred_layer = pred_layer 53 | 54 | self.conv = None 55 | 56 | def forward(self, x): 57 | if self.padding is not None: 58 | x = self.padding(x) 59 | if self.sobel is not None: 60 | x = self.sobel(x) 61 | 62 | if self.conv is not None: 63 | count = 1 64 | for m in self.body.features.modules(): 65 | if not isinstance(m, nn.Sequential): 66 | x = m(x) 67 | if isinstance(m, nn.ReLU): 68 | if count == self.conv: 69 | return x 70 | count = count + 1 71 | 72 | x = self.body(x) 73 | if self.pred_layer is not None: 74 | x = self.pred_layer(x) 75 | return x 76 | 77 | 78 | def model_factory(sobel, relu=False, num_classes=0, batch_norm=True): 79 | """ 80 | Create a network. 81 | """ 82 | dim_in = 2 if sobel else 3 83 | 84 | padding = nn.ConstantPad2d(1, 0.0) 85 | if sobel: 86 | padding = nn.ConstantPad2d(2, 0.0) 87 | body = VGG16(dim_in, relu=relu, batch_norm=batch_norm) 88 | 89 | pred_layer = nn.Linear(body.dim_output_space, num_classes) if num_classes else None 90 | 91 | return Net(padding, sobel, body, pred_layer) 92 | 93 | 94 | def build_prediction_layer(dim_in, args, group=None, num_classes=0): 95 | """ 96 | Create prediction layer on gpu and its associated optimizer. 
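    Shape example for the factory above (editor's sketch, hypothetical sizes):
    with sobel=True the input is padded by 2 pixels, so the 1x1 grayscale and
    3x3 Sobel convolutions turn a 224x224 crop into a 226x226 map, exactly like
    the 1-pixel padding used in the plain RGB case; the five 2x2 max-pools of
    VGG16 then reduce it to the 7x7 grid that the 512 * 7 * 7 classifier expects:

        net = model_factory(sobel=True, relu=True, num_classes=100)
        out = net(torch.randn(2, 3, 224, 224))
        assert out.shape == (2, 100)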
97 | """ 98 | 99 | if not num_classes: 100 | num_classes = args.super_classes 101 | 102 | # last fully connected layer 103 | pred_layer = nn.Linear(dim_in, num_classes) 104 | 105 | # move prediction layer to gpu 106 | pred_layer = to_cuda(pred_layer, args.gpu_to_work_on, group=group) 107 | 108 | # set optimizer for the prediction layer 109 | optimizer_pred_layer = sgd_optimizer(pred_layer, args.lr, args.wd) 110 | 111 | return pred_layer, optimizer_pred_layer 112 | 113 | 114 | def to_cuda(net, gpu_id, apex=False, group=None): 115 | net = net.cuda() 116 | if apex: 117 | from apex.parallel import DistributedDataParallel as DDP 118 | net = DDP(net, delay_allreduce=True) 119 | else: 120 | net = nn.parallel.DistributedDataParallel( 121 | net, 122 | device_ids=[gpu_id], 123 | process_group=group, 124 | ) 125 | return net 126 | 127 | 128 | def sgd_optimizer(module, lr, wd): 129 | return torch.optim.SGD( 130 | filter(lambda x: x.requires_grad, module.parameters()), 131 | lr=lr, 132 | momentum=0.9, 133 | weight_decay=wd, 134 | ) 135 | 136 | 137 | def sobel2RGB(net): 138 | if net.sobel is None: 139 | return 140 | 141 | def computeweight(conv, alist, blist): 142 | sob = net.sobel._modules['1'].weight 143 | res = 0 144 | for atup in alist: 145 | for btup in blist: 146 | x = conv[:, 0, atup[0], btup[0]]*sob[0, :, atup[1], btup[1]] 147 | y = conv[:, 1, atup[0], btup[0]]*sob[1, :, atup[1], btup[1]] 148 | res = res + x + y 149 | return res 150 | 151 | def aux(a): 152 | if a == 0: 153 | return [(0, 0)] 154 | elif a == 1: 155 | return [(1, 0), (0, 1)] 156 | elif a == 2: 157 | return [(2, 0), (1, 1), (0, 2)] 158 | elif a == 3: 159 | return [(2, 1), (1, 2)] 160 | elif a == 4: 161 | return [(2, 2)] 162 | 163 | features = list(net.body.features.children()) 164 | conv_old = features[0] 165 | conv_final = nn.Conv2d(3, 64, kernel_size=5, padding=1, bias=True) 166 | for i in range(conv_old.kernel_size[0]): 167 | for j in range(conv_old.kernel_size[0]): 168 | neweight = 1/3* computeweight(conv_old.weight, aux(i), aux(j)).expand(3, 64).transpose(1, 0) 169 | conv_final.weight.data[:, :, i, j].copy_(neweight) 170 | conv_final.bias.data.copy_(conv_old.bias.data) 171 | features[0] = conv_final 172 | net.body.features = nn.Sequential(*features) 173 | net.sobel = None 174 | return 175 | -------------------------------------------------------------------------------- /src/model/pretrain.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
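# Editor's sketch (not part of the original file): sobel2RGB in
# src/model/model_factory.py above folds the frozen grayscale + Sobel stack
# into a single 5x5 RGB convolution, so the exported network consumes raw RGB.
# A quick numerical check of that absorption, with hypothetical sizes:
def _check_sobel_absorption():
    import torch
    from src.model.model_factory import model_factory, sobel2RGB
    net = model_factory(sobel=True, relu=True, num_classes=10).eval()
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        before = net(x)
        sobel2RGB(net)   # rewrites features[0] as a 5x5 RGB conv, drops net.sobel
        after = net(x)
    # expected to agree up to floating-point error
    print((before - after).abs().max().item())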
6 | # 7 | 8 | import os 9 | 10 | from logging import getLogger 11 | import pickle 12 | import numpy as np 13 | import torch 14 | import torch.nn as nn 15 | 16 | from src.model.model_factory import create_sobel_layer 17 | from src.model.vgg16 import VGG16 18 | 19 | logger = getLogger() 20 | 21 | 22 | def load_pretrained(model, args): 23 | """ 24 | Load weights 25 | """ 26 | if not os.path.isfile(args.pretrained): 27 | logger.info('pretrained weights not found') 28 | return 29 | 30 | # open checkpoint file 31 | map_location = None 32 | if args.world_size > 1: 33 | map_location = "cuda:" + str(args.gpu_to_work_on) 34 | checkpoint = torch.load(args.pretrained, map_location=map_location) 35 | 36 | # clean keys from 'module' 37 | checkpoint['state_dict'] = {rename_key(key): val 38 | for key, val 39 | in checkpoint['state_dict'].items()} 40 | 41 | # remove sobel keys 42 | if 'sobel.0.weight' in checkpoint['state_dict']: 43 | del checkpoint['state_dict']['sobel.0.weight'] 44 | del checkpoint['state_dict']['sobel.0.bias'] 45 | del checkpoint['state_dict']['sobel.1.weight'] 46 | del checkpoint['state_dict']['sobel.1.bias'] 47 | 48 | # remove pred_layer keys 49 | if 'pred_layer.weight' in checkpoint['state_dict']: 50 | del checkpoint['state_dict']['pred_layer.weight'] 51 | del checkpoint['state_dict']['pred_layer.bias'] 52 | 53 | # load weights 54 | model.body.load_state_dict(checkpoint['state_dict']) 55 | logger.info("=> loaded pretrained weights from '{}'".format(args.pretrained)) 56 | 57 | 58 | def rename_key(key): 59 | "Remove module from key" 60 | if not 'module' in key: 61 | return key 62 | if key.startswith('module.body.'): 63 | return key[12:] 64 | if key.startswith('module.'): 65 | return key[7:] 66 | return ''.join(key.split('.module')) 67 | -------------------------------------------------------------------------------- /src/model/vgg16.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import math 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.init as init 13 | 14 | cfg = { 15 | 'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], 16 | } 17 | 18 | class VGG16(nn.Module): 19 | ''' 20 | VGG16 model 21 | ''' 22 | def __init__(self, dim_in, relu=True, dropout=0.5, batch_norm=True): 23 | super(VGG16, self).__init__() 24 | self.features = make_layers(cfg['D'], dim_in, batch_norm=batch_norm) 25 | self.dim_output_space = 4096 26 | classifier = [ 27 | nn.Linear(512 * 7 * 7, 4096), 28 | nn.ReLU(True), 29 | nn.Dropout(dropout), 30 | nn.Linear(4096, 4096), 31 | ] 32 | if relu: 33 | classifier.append(nn.ReLU(True)) 34 | self.classifier = nn.Sequential(*classifier) 35 | 36 | # Initialize weights 37 | for m in self.modules(): 38 | if isinstance(m, nn.Conv2d): 39 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 40 | m.weight.data.normal_(0, math.sqrt(2. 
/ n)) 41 | m.bias.data.zero_() 42 | 43 | def forward(self, x): 44 | x = self.features(x) 45 | if self.classifier is not None: 46 | x = x.view(x.size(0), -1) 47 | x = self.classifier(x) 48 | return x 49 | 50 | 51 | def make_layers(cfg, in_channels, batch_norm=True): 52 | layers = [] 53 | for v in cfg: 54 | if v == 'M': 55 | layers += [nn.MaxPool2d(kernel_size=2, stride=2)] 56 | else: 57 | conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1) 58 | if batch_norm: 59 | layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)] 60 | else: 61 | layers += [conv2d, nn.ReLU(inplace=True)] 62 | in_channels = v 63 | return nn.Sequential(*layers) 64 | -------------------------------------------------------------------------------- /src/slurm.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | import os 10 | import signal 11 | import time 12 | 13 | 14 | logger = getLogger() 15 | 16 | 17 | def trigger_job_requeue(checkpoint_filename): 18 | ''' Submit a new job to resume from checkpoint. 19 | Be careful to use only for main process. 20 | ''' 21 | if int(os.environ['SLURM_PROCID']) == 0 and \ 22 | str(os.getpid()) == os.environ['MAIN_PID'] and os.path.isfile(checkpoint_filename): 23 | print('time is up, back to slurm queue', flush=True) 24 | command = 'scontrol requeue ' + os.environ['SLURM_JOB_ID'] 25 | print(command) 26 | if os.system(command): 27 | raise RuntimeError('requeue failed') 28 | print('New job submitted to the queue', flush=True) 29 | exit(0) 30 | 31 | 32 | def SIGTERMHandler(a, b): 33 | print('received sigterm') 34 | pass 35 | 36 | 37 | def signalHandler(a, b): 38 | print('Signal received', a, time.time(), flush=True) 39 | os.environ['SIGNAL_RECEIVED'] = 'True' 40 | return 41 | 42 | 43 | def init_signal_handler(): 44 | """ 45 | Handle signals sent by SLURM for time limit / pre-emption. 46 | """ 47 | os.environ['SIGNAL_RECEIVED'] = 'False' 48 | os.environ['MAIN_PID'] = str(os.getpid()) 49 | 50 | signal.signal(signal.SIGUSR1, signalHandler) 51 | signal.signal(signal.SIGTERM, SIGTERMHandler) 52 | print("Signal handler installed.", flush=True) 53 | -------------------------------------------------------------------------------- /src/trainer.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from logging import getLogger 9 | import os 10 | import shutil 11 | import time 12 | 13 | import numpy as np 14 | import torch 15 | import torch.distributed as dist 16 | import torch.nn as nn 17 | from torch.utils.data.sampler import Sampler 18 | 19 | from .utils import AverageMeter, get_indices_sparse 20 | from src.slurm import trigger_job_requeue 21 | 22 | 23 | logger = getLogger() 24 | 25 | 26 | class DistUnifTargSampler(Sampler): 27 | """ 28 | Distributively samples elements based on a uniform distribution over the labels. 
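    Example (editor's sketch, hypothetical numbers): even with heavily skewed
    pseudo-labels, every label contributes the same number of indices per
    epoch, re-drawn with replacement only when a cluster is too small:

        labels = [0] * 900 + [1] * 50 + [2] * 50
        s = DistUnifTargSampler(total_size=120, pseudo_labels=labels,
                                num_replicas=4, rank=0)
        # per_label = 120 // 3 + 1 = 41 indices drawn per label, shuffled and
        # truncated to 120; rank 0 iterates over every 4th one, so len(s) == 30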
29 | """ 30 | def __init__(self, total_size, pseudo_labels, num_replicas, rank, seed=31): 31 | 32 | np.random.seed(seed) 33 | 34 | # world size 35 | self.num_replicas = num_replicas 36 | 37 | # rank of this process 38 | self.rank = rank 39 | 40 | # total number of samples to load across all processes 41 | self.total_size = total_size 42 | 43 | # set of labels to consider 44 | set_of_pseudo_labels = np.unique(pseudo_labels) 45 | nmb_pseudo_lab = int(len(set_of_pseudo_labels)) 46 | 47 | # number of images per label 48 | per_label = int(self.total_size // nmb_pseudo_lab + 1) 49 | 50 | # initialize indexes 51 | epoch_indexes = np.zeros(int(per_label * nmb_pseudo_lab)) 52 | 53 | # select per_label samples for each label 54 | indexes = get_indices_sparse(np.asarray(pseudo_labels)) 55 | for i, k in enumerate(set_of_pseudo_labels): 56 | k = int(k) 57 | label_indexes = indexes[k][0] 58 | epoch_indexes[i * per_label: (i + 1) * per_label] = np.random.choice( 59 | label_indexes, 60 | per_label, 61 | replace=(len(label_indexes) <= per_label) 62 | ) 63 | 64 | # make sure indexes are integers 65 | epoch_indexes = epoch_indexes.astype(int) 66 | 67 | # shuffle the indexes 68 | np.random.shuffle(epoch_indexes) 69 | 70 | self.epoch_indexes = epoch_indexes[:self.total_size] 71 | 72 | # this process only deals with this subset 73 | self.process_ind = self.epoch_indexes[self.rank:self.total_size:self.num_replicas] 74 | 75 | def __iter__(self): 76 | return iter(self.process_ind) 77 | 78 | def __len__(self): 79 | return len(self.process_ind) 80 | 81 | 82 | def train_network(args, models, optimizers, dataset): 83 | """ 84 | Train the models with cluster assignments as targets 85 | """ 86 | # switch to train mode 87 | for model in models: 88 | model.train() 89 | 90 | # uniform sampling over pseudo labels 91 | sampler = DistUnifTargSampler( 92 | args.epoch_size, 93 | dataset.sub_classes, 94 | args.training_local_world_size, 95 | args.training_local_rank, 96 | seed=args.epoch + args.training_local_world_id, 97 | ) 98 | 99 | loader = torch.utils.data.DataLoader( 100 | dataset, 101 | sampler=sampler, 102 | batch_size=args.batch_size, 103 | num_workers=args.workers, 104 | pin_memory=True, 105 | ) 106 | 107 | # running statistics 108 | batch_time = AverageMeter() 109 | data_time = AverageMeter() 110 | 111 | # training statistics 112 | log_top1_subclass = AverageMeter() 113 | log_loss_subclass = AverageMeter() 114 | log_top1_superclass = AverageMeter() 115 | log_loss_superclass = AverageMeter() 116 | 117 | log_top1 = AverageMeter() 118 | log_loss = AverageMeter() 119 | end = time.perf_counter() 120 | 121 | cel = nn.CrossEntropyLoss().cuda() 122 | relu = torch.nn.ReLU().cuda() 123 | 124 | for iter_epoch, (inp, target) in enumerate(loader): 125 | # resume at iteration start_iter 126 | if iter_epoch < args.start_iter: 127 | continue 128 | 129 | # measure data loading time 130 | data_time.update(time.perf_counter() - end) 131 | 132 | # move input to gpu 133 | inp = inp.cuda(non_blocking=True) 134 | target = target.cuda(non_blocking=True).long() 135 | 136 | # forward on the model 137 | inp = relu(models[0](inp)) 138 | 139 | # forward on sub-class prediction layer 140 | output = models[-1](inp) 141 | loss_subclass = cel(output, target) 142 | 143 | # forward on super-class prediction layer 144 | super_class_output = models[1](inp) 145 | sc_target = args.training_local_world_id + \ 146 | 0 * torch.cuda.LongTensor(args.batch_size) 147 | loss_superclass = cel(super_class_output, sc_target) 148 | 149 | loss = loss_subclass +
loss_superclass 150 | 151 | # initialize the optimizers 152 | for optimizer in optimizers: 153 | optimizer.zero_grad() 154 | 155 | # compute the gradients 156 | loss.backward() 157 | 158 | # step 159 | for optimizer in optimizers: 160 | optimizer.step() 161 | 162 | # log 163 | 164 | # signal received, relaunch experiment 165 | if os.environ['SIGNAL_RECEIVED'] == 'True': 166 | save_checkpoint(args, iter_epoch + 1, models, optimizers) 167 | if not args.rank: 168 | trigger_job_requeue(os.path.join(args.dump_path, 'checkpoint.pth.tar')) 169 | 170 | # regular checkpoints 171 | if iter_epoch and iter_epoch % 1000 == 0: 172 | save_checkpoint(args, iter_epoch + 1, models, optimizers) 173 | 174 | # update stats 175 | log_loss.update(loss.item(), output.size(0)) 176 | prec1 = accuracy(args, output, target, sc_output=super_class_output) 177 | log_top1.update(prec1.item(), output.size(0)) 178 | 179 | log_loss_superclass.update(loss_superclass.item(), output.size(0)) 180 | prec1 = accuracy(args, super_class_output, sc_target) 181 | log_top1_superclass.update(prec1.item(), output.size(0)) 182 | 183 | log_loss_subclass.update(loss_subclass.item(), output.size(0)) 184 | prec1 = accuracy(args, output, target) 185 | log_top1_subclass.update(prec1.item(), output.size(0)) 186 | 187 | batch_time.update(time.perf_counter() - end) 188 | end = time.perf_counter() 189 | 190 | # verbose 191 | if iter_epoch % 100 == 0: 192 | logger.info('Epoch[{0}] - Iter: [{1}/{2}]\t' 193 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 194 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 195 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 196 | 'Prec {log_top1.val:.3f} ({log_top1.avg:.3f})\t' 197 | 'Super-class loss: {sc_loss.val:.3f} ({sc_loss.avg:.3f})\t' 198 | 'Super-class prec: {sc_prec.val:.3f} ({sc_prec.avg:.3f})\t' 199 | 'Intra super-class loss: {los.val:.3f} ({los.avg:.3f})\t' 200 | 'Intra super-class prec: {prec.val:.3f} ({prec.avg:.3f})\t' 201 | .format(args.epoch, iter_epoch, len(loader), batch_time=batch_time, 202 | data_time=data_time, loss=log_loss, log_top1=log_top1, 203 | sc_loss=log_loss_superclass, sc_prec=log_top1_superclass, 204 | los=log_loss_subclass, prec=log_top1_subclass)) 205 | 206 | # end of epoch 207 | args.start_iter = 0 208 | args.epoch += 1 209 | 210 | # dump checkpoint 211 | save_checkpoint(args, 0, models, optimizers) 212 | if not args.rank: 213 | if not (args.epoch - 1) % args.checkpoint_freq: 214 | shutil.copyfile( 215 | os.path.join(args.dump_path, 'checkpoint.pth.tar'), 216 | os.path.join(args.dump_checkpoints, 217 | 'checkpoint' + str(args.epoch - 1) + '.pth.tar'), 218 | ) 219 | 220 | return (args.epoch - 1, 221 | args.epoch * len(loader), 222 | log_top1.avg, log_loss.avg, 223 | log_top1_superclass.avg, log_loss_superclass.avg, 224 | log_top1_subclass.avg, log_loss_subclass.avg, 225 | ) 226 | 227 | 228 | def save_checkpoint(args, iter_epoch, models, optimizers, path=''): 229 | if not os.path.isfile(path): 230 | path = os.path.join(args.dump_path, 'checkpoint.pth.tar') 231 | 232 | # main process saves the training state 233 | if not args.rank: 234 | torch.save({ 235 | 'epoch': args.epoch, 236 | 'start_iter': iter_epoch, 237 | 'state_dict': models[0].state_dict(), 238 | 'optimizer': optimizers[0].state_dict(), 239 | 'pred_layer_state_dict': models[1].state_dict(), 240 | 'optimizer_pred_layer': optimizers[1].state_dict(), 241 | }, path) 242 | 243 | # main local training process saves the last layer 244 | if not args.training_local_rank: 245 | torch.save({ 246 | 'epoch': args.epoch, 247 | 
'start_iter': iter_epoch, 248 | 'state_dict': models[-1].state_dict(), 249 | 'optimizer': optimizers[-1].state_dict(), 250 | }, os.path.join(args.dump_path, str(args.training_local_world_id) + '-pred_layer.pth.tar')) 251 | 252 | 253 | def accuracy(args, output, target, sc_output=None): 254 | """Computes the accuracy over the k top predictions for the specified values of k""" 255 | with torch.no_grad(): 256 | 257 | batch_size = target.size(0) 258 | 259 | _, pred = output.topk(1, 1, True, True) 260 | pred = pred.t() 261 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 262 | 263 | if sc_output is not None: 264 | _, pred = sc_output.topk(1, 1, True, True) 265 | pred = pred.t() 266 | target = args.training_local_world_id + 0 * torch.cuda.LongTensor(batch_size) 267 | correct_sc = pred.eq(target.view(1, -1).expand_as(pred)) 268 | correct *= correct_sc 269 | 270 | correct_1 = correct[:1].view(-1).float().sum(0, keepdim=True) 271 | return correct_1.mul_(100.0 / batch_size) 272 | 273 | 274 | def validate_network(val_loader, models, args): 275 | batch_time = AverageMeter() 276 | losses = AverageMeter() 277 | top1 = AverageMeter() 278 | 279 | # switch to evaluate mode 280 | for model in models: 281 | model.eval() 282 | 283 | criterion = nn.CrossEntropyLoss().cuda() 284 | 285 | with torch.no_grad(): 286 | end = time.perf_counter() 287 | for i, (inp, target) in enumerate(val_loader): 288 | 289 | # move to gpu 290 | inp = inp.cuda(non_blocking=True) 291 | target = target.cuda(non_blocking=True) 292 | 293 | # compute output 294 | output = inp 295 | for model in models: 296 | output = model(output) 297 | loss = criterion(output, target) 298 | 299 | # measure accuracy and record loss 300 | acc1 = accuracy(args, output, target) 301 | losses.update(loss.item(), inp.size(0)) 302 | top1.update(acc1[0], inp.size(0)) 303 | 304 | # measure elapsed time 305 | batch_time.update(time.perf_counter() - end) 306 | end = time.perf_counter() 307 | 308 | if i % 100 == 0: 309 | logger.info('Test: [{0}/{1}]\t' 310 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 311 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 312 | 'Acc@1 {top1.val:.3f} ({top1.avg:.3f})\t' 313 | .format(i, len(val_loader), batch_time=batch_time, 314 | loss=losses, top1=top1)) 315 | 316 | return (top1.avg.item(), losses.avg) 317 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | import argparse 9 | from logging import getLogger 10 | import os 11 | import pickle 12 | import shutil 13 | import time 14 | 15 | import numpy as np 16 | from scipy.sparse import csr_matrix 17 | import torch 18 | import torch.distributed as dist 19 | 20 | from .logger import create_logger, PD_Stats 21 | 22 | 23 | FALSY_STRINGS = {'off', 'false', '0'} 24 | TRUTHY_STRINGS = {'on', 'true', '1'} 25 | 26 | 27 | logger = getLogger() 28 | 29 | 30 | def bool_flag(s): 31 | """ 32 | Parse boolean arguments from the command line. 
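    Example (editor's sketch): wired into argparse so that 'on/off'-style
    command-line values parse robustly:

        parser = argparse.ArgumentParser()
        parser.add_argument('--sobel', type=bool_flag, default=True)
        parser.parse_args(['--sobel', 'off']).sobel   # False
        parser.parse_args(['--sobel', '1']).sobel     # True
        parser.parse_args(['--sobel', 'yes'])         # rejected: "invalid value for a boolean flag"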
33 | """ 34 | if s.lower() in FALSY_STRINGS: 35 | return False 36 | elif s.lower() in TRUTHY_STRINGS: 37 | return True 38 | else: 39 | raise argparse.ArgumentTypeError("invalid value for a boolean flag") 40 | 41 | 42 | def init_distributed_mode(args, make_communication_groups=True): 43 | """ 44 | Handle single and multi-GPU / multi-node / SLURM jobs. 45 | Initialize the following variables: 46 | - global rank 47 | 48 | - clustering_local_rank 49 | - clustering_local_world_size 50 | - clustering_local_world_id 51 | 52 | - training_local_rank 53 | - training_local_world_size 54 | - training_local_world_id 55 | 56 | - rotation 57 | """ 58 | 59 | args.is_slurm_job = 'SLURM_JOB_ID' in os.environ and not args.debug_slurm 60 | 61 | if args.is_slurm_job: 62 | args.rank = int(os.environ['SLURM_PROCID']) 63 | else: 64 | # jobs started with torch.distributed.launch 65 | # read environment variables 66 | args.rank = int(os.environ['RANK']) 67 | args.world_size = int(os.environ['WORLD_SIZE']) 68 | 69 | # prepare distributed 70 | dist.init_process_group(backend='nccl', init_method=args.dist_url, 71 | world_size=args.world_size, rank=args.rank) 72 | 73 | # set cuda device 74 | args.gpu_to_work_on = args.rank % torch.cuda.device_count() 75 | torch.cuda.set_device(args.gpu_to_work_on) 76 | 77 | if not make_communication_groups: 78 | return None, None 79 | 80 | # each super_class has the same number of processes 81 | assert args.world_size % args.super_classes == 0 82 | 83 | # each super-class forms a training communication group 84 | args.training_local_world_size = args.world_size // args.super_classes 85 | args.training_local_rank = args.rank % args.training_local_world_size 86 | args.training_local_world_id = args.rank // args.training_local_world_size 87 | 88 | # prepare training groups 89 | training_groups = [] 90 | for group_id in range(args.super_classes): 91 | ranks = [args.training_local_world_size * group_id + i \ 92 | for i in range(args.training_local_world_size)] 93 | training_groups.append(dist.new_group(ranks=ranks)) 94 | 95 | # compute number of super-clusters 96 | if args.rotnet: 97 | assert args.super_classes % 4 == 0 98 | args.nmb_super_clusters = args.super_classes // 4 99 | else: 100 | args.nmb_super_clusters = args.super_classes 101 | 102 | # prepare clustering communication groups 103 | args.clustering_local_world_size = args.training_local_world_size * \ 104 | (args.super_classes // args.nmb_super_clusters) 105 | args.clustering_local_rank = args.rank % args.clustering_local_world_size 106 | args.clustering_local_world_id = args.rank // args.clustering_local_world_size 107 | 108 | clustering_groups = [] 109 | for group_id in range(args.nmb_super_clusters): 110 | ranks = [args.clustering_local_world_size * group_id + i \ 111 | for i in range(args.clustering_local_world_size)] 112 | clustering_groups.append(dist.new_group(ranks=ranks)) 113 | 114 | # this process deals only with a certain rotation 115 | if args.rotnet: 116 | args.rotation = args.clustering_local_rank // args.training_local_world_size 117 | else: 118 | args.rotation = 0 119 | 120 | return training_groups, clustering_groups 121 | 122 | 123 | def check_parameters(args): 124 | """ 125 | Check that the configuration of arguments is consistent.
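    Worked example (editor's addition, hypothetical sizes) for the
    communication groups built in init_distributed_mode above: with
    world_size=64, super_classes=16 and rotnet=True,

        training_local_world_size   = 64 // 16 = 4         # 16 training groups of 4 ranks
        nmb_super_clusters          = 16 // 4 = 4          # 4 rotation classes per super-cluster
        clustering_local_world_size = 4 * (16 // 4) = 16   # 4 clustering groups of 16 ranks
        # e.g. rank 37: training group 37 // 4 = 9, clustering group 37 // 16 = 2,
        # rotation (37 % 16) // 4 = 1 (the second of the four rotation classes)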
126 | """ 127 | args.size_dataset = min(args.size_dataset, 95920149) 128 | 129 | # make dataset size divisible by both the batch-size and the world-size 130 | div = args.batch_size * args.world_size 131 | args.size_dataset = args.size_dataset // div * div 132 | 133 | args.epoch_size = args.size_dataset // args.nmb_super_clusters // 4 134 | args.epoch_size = args.epoch_size // div * div 135 | 136 | assert args.super_classes 137 | 138 | # number of super classes must be divisible by the number of rotation categories 139 | if args.rotnet: 140 | assert args.super_classes % 4 == 0 141 | 142 | # feature dimension 143 | assert args.dim_pca <= 4096 144 | 145 | 146 | def initialize_exp(params, *args): 147 | """ 148 | Initialize the experiment: 149 | - dump parameters 150 | - create checkpoint and cache repos 151 | - create a logger 152 | - create a pandas object to log the training statistics 153 | """ 154 | # dump parameters 155 | pickle.dump(params, open(os.path.join(params.dump_path, 'params.pkl'), 'wb')) 156 | 157 | # create repo to store checkpoints 158 | params.dump_checkpoints = os.path.join(params.dump_path, 'checkpoints') 159 | if not params.rank and not os.path.isdir(params.dump_checkpoints): 160 | os.mkdir(params.dump_checkpoints) 161 | 162 | # create repo to cache activations between the two stages of the hierarchical k-means 163 | if not params.rank and not os.path.isdir(os.path.join(params.dump_path, 'cache')): 164 | os.mkdir(os.path.join(params.dump_path, 'cache')) 165 | 166 | # create a pandas object to log loss and accuracy 167 | training_stats = PD_Stats( 168 | os.path.join(params.dump_path, 'stats' + str(params.rank) + '.pkl'), 169 | args, 170 | ) 171 | 172 | # create a logger 173 | logger = create_logger(os.path.join(params.dump_path, 'train.log'), rank=params.rank) 174 | logger.info("============ Initialized logger ============") 175 | logger.info("\n".join("%s: %s" % (k, str(v)) 176 | for k, v in sorted(dict(vars(params)).items()))) 177 | logger.info("The experiment will be stored in %s\n" % params.dump_path) 178 | logger.info("") 179 | 180 | return logger, training_stats 181 | 182 | 183 | def end_of_epoch(args): 184 | """ 185 | Remove cluster assignments from the experiment repository 186 | """ 187 | 188 | def src_dst(what, cl=False): 189 | src = os.path.join( 190 | args.dump_path, 191 | what + cl * str(args.clustering_local_world_id) + '.pkl', 192 | ) 193 | dst = os.path.join( 194 | args.dump_checkpoints, 195 | what + '{}-epoch{}.pkl'.format(cl * args.clustering_local_world_id, args.epoch - 1), 196 | ) 197 | return src, dst 198 | 199 | # only the main processes work here 200 | if not args.clustering_local_rank: 201 | for what in ['cluster_assignments', 'centroids']: 202 | src, dst = src_dst(what, cl=True) 203 | if not (args.epoch - 1) % args.checkpoint_freq: 204 | shutil.copy(src, dst) 205 | if 'centroids' not in src: 206 | os.remove(src) 207 | 208 | if not args.rank: 209 | for what in ['super_class_assignments', 'super_class_centroids']: 210 | src, dst = src_dst(what) 211 | if not (args.epoch - 1) % args.checkpoint_freq: 212 | shutil.copy(src, dst) 213 | os.remove(src) 214 | 215 | 216 | def restart_from_checkpoint(args, ckp_path=None, run_variables=None, **kwargs): 217 | """ 218 | Restart from a checkpoint present in the experiment repository 219 | """ 220 | if ckp_path is None: 221 | ckp_path = os.path.join(args.dump_path, 'checkpoint.pth.tar') 222 | 223 | # look for a checkpoint in exp repository 224 | if not os.path.isfile(ckp_path): 225 | return 226 | 227 | logger.info('Found checkpoint in
experiment repository') 228 | 229 | # open checkpoint file 230 | map_location = None 231 | if args.world_size > 1: 232 | map_location = "cuda:" + str(args.gpu_to_work_on) 233 | checkpoint = torch.load(ckp_path, map_location=map_location) 234 | 235 | # key is what to look for in the checkpoint file 236 | # value is the object to load 237 | # example: {'state_dict': model} 238 | for key, value in kwargs.items(): 239 | if key in checkpoint and value is not None: 240 | value.load_state_dict(checkpoint[key]) 241 | logger.info("=> loaded {} from checkpoint '{}'" 242 | .format(key, ckp_path)) 243 | else: 244 | logger.warning("=> failed to load {} from checkpoint '{}'" 245 | .format(key, ckp_path)) 246 | 247 | # reload the variables that are important for the run 248 | if run_variables is not None: 249 | for var_name in run_variables: 250 | if var_name in checkpoint: 251 | run_variables[var_name] = checkpoint[var_name] 252 | 253 | 254 | def fix_random_seeds(seed=1993): 255 | """ 256 | Fix random seeds. 257 | """ 258 | torch.manual_seed(seed) 259 | torch.cuda.manual_seed_all(seed) 260 | np.random.seed(seed) 261 | 262 | 263 | class PCA(): 264 | """ 265 | Class to compute and apply PCA. 266 | """ 267 | def __init__(self, dim=256, whit=0.5): 268 | self.dim = dim 269 | self.whit = whit 270 | self.mean = None 271 | 272 | def train_pca(self, cov): 273 | """ 274 | Takes a covariance matrix (np.ndarray) as input. 275 | """ 276 | d, v = np.linalg.eigh(cov) 277 | eps = d.max() * 1e-5 278 | n_0 = (d < eps).sum() 279 | if n_0 > 0: 280 | d[d < eps] = eps 281 | 282 | # total energy 283 | totenergy = d.sum() 284 | 285 | # sort eigenvectors with eigenvalues order 286 | idx = np.argsort(d)[::-1][:self.dim] 287 | d = d[idx] 288 | v = v[:, idx] 289 | 290 | logger.warning("keeping %.2f %% of the energy" % (d.sum() / totenergy * 100.0)) 291 | 292 | # for the whitening 293 | d = np.diag(1. / d**self.whit) 294 | 295 | # principal components 296 | self.dvt = np.dot(d, v.T)
297 | 298 | def apply(self, x): 299 | # input is from numpy 300 | if isinstance(x, np.ndarray): 301 | if self.mean is not None: 302 | x -= self.mean 303 | return np.dot(self.dvt, x.T).T 304 | 305 | # input is from torch and is on GPU 306 | if x.is_cuda: 307 | if self.mean is not None: 308 | x -= torch.cuda.FloatTensor(self.mean) 309 | return torch.mm(torch.cuda.FloatTensor(self.dvt), x.transpose(0, 1)).transpose(0, 1) 310 | 311 | # input is from torch, on CPU 312 | if self.mean is not None: 313 | x -= torch.FloatTensor(self.mean) 314 | return torch.mm(torch.FloatTensor(self.dvt), x.transpose(0, 1)).transpose(0, 1) 315 | 316 | 317 | class AverageMeter(object): 318 | """Computes and stores the average and current value""" 319 | def __init__(self): 320 | self.reset() 321 | 322 | def reset(self): 323 | self.val = 0 324 | self.avg = 0 325 | self.sum = 0 326 | self.count = 0 327 | 328 | def update(self, val, n=1): 329 | self.val = val 330 | self.sum += val * n 331 | self.count += n 332 | self.avg = self.sum / self.count 333 | 334 | 335 | def normalize(data): 336 | # data is a numpy array 337 | if isinstance(data, np.ndarray): 338 | row_sums = np.linalg.norm(data, axis=1) 339 | data = data / row_sums[:, np.newaxis] 340 | return data 341 | 342 | # data is a tensor 343 | row_sums = data.norm(dim=1, keepdim=True) 344 | data = data / row_sums 345 | return data 346 | 347 | 348 | def compute_M(data): 349 | cols = np.arange(data.size) 350 | return csr_matrix((cols, (data.ravel(), cols)), 351 | shape=(data.max() + 1, data.size)) 352 | 353 | def get_indices_sparse(data): 354 | M = compute_M(data) 355 | return [np.unravel_index(row.data, data.shape) for row in M] 356 | --------------------------------------------------------------------------------
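Editor's note: a small sketch (not in the original sources) of what get_indices_sparse above computes. It is a vectorized equivalent of calling np.where once per label, built from a single CSR pass, which is what keeps scanning the ~96M YFCC100M pseudo-labels tractable:

    import numpy as np
    data = np.array([2, 0, 2, 1, 0])
    get_indices_sparse(data)
    # [(array([1, 4]),), (array([3]),), (array([0, 2]),)]
    # i.e. the same result as [np.where(data == k) for k in range(data.max() + 1)]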