├── CONTRIBUTING.md ├── LICENSE ├── MAINTAINERS.md ├── README.md ├── imagenet-train.py ├── imagenet_utils.py ├── models │   ├── __init__.py │   ├── _model_urls.py │   ├── blresnet.py │   ├── blresnext.py │   └── blseresnext.py └── requirement.txt /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## Contributing In General 2 | Our project welcomes external contributions. If you have an itch, please feel 3 | free to scratch it. 4 | 5 | To contribute code or documentation, please submit a [pull request](https://github.com/ibm/BigLittleNet-pytorch/pulls). 6 | 7 | A good way to familiarize yourself with the codebase and contribution process is 8 | to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/ibm/BigLittleNet-pytorch/issues). 9 | Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us. 10 | 11 | **Note: We appreciate your effort, and want to avoid a situation where a contribution 12 | requires extensive rework (by you or by us), sits in the backlog for a long time, or 13 | cannot be accepted at all!** 14 | 15 | ### Proposing new features 16 | 17 | If you would like to implement a new feature, please [raise an issue](https://github.com/ibm/BigLittleNet-pytorch/issues) 18 | before sending a pull request so the feature can be discussed. This is to avoid 19 | you wasting your valuable time working on a feature that the project developers 20 | are not interested in accepting into the code base. 21 | 22 | ### Fixing bugs 23 | 24 | If you would like to fix a bug, please [raise an issue](https://github.com/ibm/BigLittleNet-pytorch/issues) before sending a 25 | pull request so it can be tracked. 26 | 27 | ### Merge approval 28 | 29 | The project maintainers use LGTM (Looks Good To Me) in comments on the code 30 | review to indicate acceptance. A change requires LGTMs from two of the 31 | maintainers of each component affected. 32 | 33 | For a list of the maintainers, see the [MAINTAINERS.md](MAINTAINERS.md) page. 34 | 35 | ## Legal 36 | 37 | Each source file must include a license header for the Apache 38 | Software License 2.0. Using the SPDX format is the simplest approach, 39 | for example: 40 | 41 | ``` 42 | /* 43 | Copyright All Rights Reserved. 44 | 45 | SPDX-License-Identifier: Apache-2.0 46 | */ 47 | ``` 48 | 49 | We have tried to make it as easy as possible to make contributions. This 50 | applies to how we handle the legal aspects of contribution. We use the 51 | same approach - the [Developer's Certificate of Origin 1.1 (DCO)](https://github.com/hyperledger/fabric/blob/master/docs/source/DCO1.1.txt) - that the Linux® Kernel [community](https://elinux.org/Developer_Certificate_Of_Origin) 52 | uses to manage code contributions. 53 | 54 | We simply ask that when submitting a patch for review, the developer 55 | must include a sign-off statement in the commit message. 56 | 57 | Here is an example Signed-off-by line, which indicates that the 58 | submitter accepts the DCO: 59 | 60 | ``` 61 | Signed-off-by: John Doe <john.doe@example.com> 62 | ``` 63 | 64 | You can include this automatically when you commit a change to your 65 | local git repository using the following command: 66 | 67 | ``` 68 | git commit -s 69 | ``` 70 | 71 | ## Communication 72 | Please feel free to contact us via the emails listed in [MAINTAINERS.md](MAINTAINERS.md). 73 | 74 | ## Setup 75 | Please check requirement.txt for the packages needed to run the code. 76 | 77 | ## Testing 78 | There is no automated test suite yet; a minimal manual smoke test is sketched below.
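Until a proper test suite exists, a reasonable manual smoke test is to build a model and run a single forward pass. A minimal sketch (the factory function and its signature come from `models/blresnext.py`; the input size follows the README):

```
import torch

from models import blresnext_model

# bL-ResNeXt-50 (32x4d, alpha=2, beta=4), randomly initialized.
model = blresnext_model(depth=50, basewidth=4, cardinality=32,
                        alpha=2, beta=4, num_classes=1000, pretrained=False)
model.eval()

# One 224x224 RGB image should yield 1000 class logits.
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
assert out.shape == (1, 1000)
```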
79 | 80 | ## Coding style guidelines 81 | Please submit clean code and make an effort to follow existing conventions in order to keep it as readable as possible. We use the PEP 8 style guide. 82 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner.
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | MAINTAINERS 2 | 3 | Chun-Fu (Richard) Chen - chenrich@us.ibm.com 4 | 5 | Quanfu Fan - qfan@us.ibm.com 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BigLittleNet-pytorch 2 | 3 | This repository holds the code and models for the following paper:
4 | 5 | Chun-Fu (Richard) Chen, Quanfu Fan, Neil Mallinar, Tom Sercu and Rogerio Feris 6 | [Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition](https://openreview.net/pdf?id=HJMHpjC9Ym) 7 | 8 | If you use the code and models from this repo, please cite our work. Thanks! 9 | 10 | ``` 11 | @inproceedings{ 12 | chen2018biglittle, 13 | title={{Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition}}, 14 | author={Chun-Fu (Richard) Chen and Quanfu Fan and Neil Mallinar and Tom Sercu and Rogerio Feris}, 15 | booktitle={International Conference on Learning Representations}, 16 | year={2019}, 17 | url={https://openreview.net/forum?id=HJMHpjC9Ym}, 18 | } 19 | ``` 20 | 21 | ## Dependencies 22 | 1. PyTorch >= 1.0.0 23 | 2. tensorboard_logger 24 | 3. tqdm 25 | 26 | Or install the requirements via: 27 | 28 | ``` 29 | pip3 install -r requirement.txt 30 | ``` 31 | 32 | ## Usage 33 | 34 | The training script is mostly borrowed from the ImageNet example in [pytorch/examples](https://github.com/pytorch/examples/tree/master/imagenet), with modifications. 35 | 36 | Please refer to the instructions there to prepare the ImageNet dataset. 37 | 38 | ### Training 39 | 40 | Train a bL-ResNeXt-101 (64×4d) (α = 2, β = 4) model with two GPUs (0, 1), saving the logfile to the `LOGDIR` folder: 41 | ``` 42 | python3 imagenet-train.py --data /path/to/folder -d 101 --basewidth 4 \ 43 | --cardinality 64 --backbone_net blresnext --alpha 2 --beta 4 \ 44 | --lr_scheduler cosine --logdir LOGDIR --gpu 0,1 45 | ``` 46 | 47 | ### Testing 48 | 49 | After downloading the models, put them in the `pretrained` folder. 50 | Evaluate the bL-ResNeXt-101 (64×4d) (α = 2, β = 4) model with two GPUs: 51 | ``` 52 | python3 imagenet-train.py --data /path/to/folder -d 101 --basewidth 4 \ 53 | --cardinality 64 --backbone_net blresnext --alpha 2 --beta 4 --evaluate \ 54 | --gpu 0,1 --pretrained 55 | ``` 56 | 57 | Please feel free to raise an issue if you encounter any problems when using the pretrained models. 58 | 59 | 60 | ### Results and Models 61 | 62 | 63 | After the submission, we re-trained our models in PyTorch with the same settings described in the paper. 64 | 65 | Performance of Big-Little Net models (evaluated on a single 224x224 image): 66 | 67 | | Model | Top-1 Error | FLOPs (10^9) | 68 | |-------------|-----------------|-------| 69 | |[bLResNet-50 (α = 2, β = 4)](https://ibm.box.com/v/blresnet-50-a2-b4)|22.41%|2.85| 70 | |[bLResNet-101 (α = 2, β = 4)](https://ibm.box.com/v/blresnet-101-a2-b4)|21.34%|3.89| 71 | |[bLResNeXt-50 (32x4d) (α = 2, β = 4)](https://ibm.box.com/v/blresnext-50-32x4d-a2-b4)|21.62%|3.03| 72 | |[bLResNeXt-101 (32x4d) (α = 2, β = 4)](https://ibm.box.com/v/blresnext-101-32x4d-a2-b4)|20.87%|4.08| 73 | |[bLResNeXt-101 (64x4d) (α = 2, β = 4)](https://ibm.box.com/v/blresnext-101-64x4d-a2-b4)|20.34%|7.97| 74 | |[bLSEResNeXt-50 (32x4d) (α = 2, β = 4)](https://ibm.box.com/v/blseresnext-50-32x4d-a2-b4)|21.44%|3.03| 75 | |[bLSEResNeXt-101 (32x4d) (α = 2, β = 4)](https://ibm.box.com/v/blseresnext-101-32x4d-a2-b4)|21.04%|4.08| -------------------------------------------------------------------------------- /imagenet-train.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0.
You may 6 | # obtain a copy of this license in the LICENSE file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 12 | 13 | import argparse 14 | import os 15 | import shutil 16 | 17 | import torch 18 | import torch.nn as nn 19 | import torch.nn.parallel 20 | import torch.backends.cudnn as cudnn 21 | import torch.optim 22 | import torch.utils.data 23 | import torch.utils.data.distributed 24 | from torch.optim import lr_scheduler 25 | import tensorboard_logger 26 | 27 | from models import (blresnext_model, blresnet_model, blseresnext_model) 28 | from imagenet_utils import get_augmentor, get_imagenet_dataflow, train, validate 29 | 30 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 31 | parser.add_argument('--backbone_net', default='blresnext', type=str, help='backbone network', 32 | choices=['blresnext', 'blresnet', 'blseresnext']) 33 | parser.add_argument('-d', '--depth', default=50, type=int, metavar='N', 34 | help='depth of resnext (default: 50)', choices=[50, 101, 152]) 35 | parser.add_argument('--basewidth', default=4, type=int, help='basewidth') 36 | parser.add_argument('--cardinality', default=32, type=int, help='cardinality') 37 | parser.add_argument('--alpha', default=2, type=int, metavar='N', help='ratio of channels') 38 | parser.add_argument('--beta', default=4, type=int, metavar='N', help='ratio of layers') 39 | 40 | parser.add_argument('--gpu', help='comma separated list of GPU(s) to use.') 41 | parser.add_argument('--data', metavar='DIR', 42 | help='path to dataset') 43 | parser.add_argument('-j', '--workers', default=18, type=int, metavar='N', 44 | help='number of data loading workers (default: 18)') 45 | 46 | parser.add_argument('--epochs', default=110, type=int, metavar='N', 47 | help='number of total epochs to run') 48 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', 49 | help='manual epoch number (useful on restarts)') 50 | parser.add_argument('-b', '--batch-size', default=256, type=int, 51 | metavar='N', help='mini-batch size (default: 256)') 52 | parser.add_argument('--lr', '--learning-rate', default=0.1, type=float, 53 | metavar='LR', help='initial learning rate') 54 | parser.add_argument('--lr_scheduler', default='cosine', type=str, 55 | help='learning rate scheduler', choices=['step', 'cosine']) 56 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 57 | parser.add_argument('--weight-decay', '--wd', default=1e-4, type=float, 58 | metavar='W', help='weight decay (default: 1e-4)') 59 | parser.add_argument('--input_shape', default=224, type=int, metavar='N', help='input image size') 60 | 61 | parser.add_argument('--resume', default='', type=str, metavar='PATH', 62 | help='path to latest checkpoint (default: none)') 63 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', 64 | help='evaluate model on validation set') 65 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', 66 | help='use pre-trained model') 67 | parser.add_argument('--logdir', default='', type=str, help='log path') 68 | 69 | 70 | def main(): 71 | global args 72 | args = parser.parse_args() 73 | cudnn.benchmark = True 74 | 75 | if args.gpu: 76 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 77 | 78 | 
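# Select the backbone factory and assemble its positional arguments; only the
# SE variant enables the stronger training augmentation (random rotation in
# get_augmentor()).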
strong_augmentor = False 79 | if args.backbone_net == 'blresnext': 80 | backbone = blresnext_model 81 | arch_name = "ImageNet-bLResNeXt-{}-{}x{}d-a{}-b{}".format( 82 | args.depth, args.cardinality, args.basewidth, args.alpha, args.beta) 83 | backbone_setting = [args.depth, args.basewidth, args.cardinality, args.alpha, args.beta] 84 | elif args.backbone_net == 'blresnet': 85 | backbone = blresnet_model 86 | arch_name = "ImageNet-bLResNet-{}-a{}-b{}".format(args.depth, args.alpha, args.beta) 87 | backbone_setting = [args.depth, args.alpha, args.beta] 88 | elif args.backbone_net == 'blseresnext': 89 | backbone = blseresnext_model 90 | arch_name = "ImageNet-bLSEResNeXt-{}-{}x{}d-a{}-b{}".format( 91 | args.depth, args.cardinality, args.basewidth, args.alpha, args.beta) 92 | backbone_setting = [args.depth, args.basewidth, args.cardinality, args.alpha, args.beta] 93 | strong_augmentor = True 94 | else: 95 | raise ValueError("Unsupported backbone.") 96 | 97 | # add class number and whether or not load pretrained model 98 | backbone_setting += [1000, args.pretrained] 99 | # create model 100 | model = backbone(*backbone_setting) 101 | if args.pretrained: 102 | print("=> using pre-trained model '{}'".format(arch_name)) 103 | else: 104 | print("=> creating model '{}'".format(arch_name)) 105 | 106 | model = torch.nn.DataParallel(model).cuda() 107 | 108 | # define loss function (criterion) and optimizer 109 | train_criterion = nn.CrossEntropyLoss().cuda() 110 | val_criterion = nn.CrossEntropyLoss().cuda() 111 | 112 | # Data loading code 113 | valdir = os.path.join(args.data, 'val') 114 | val_loader = get_imagenet_dataflow(False, valdir, args.batch_size, get_augmentor( 115 | False, args.input_shape, strong_augmentor), workers=args.workers) 116 | 117 | log_folder = os.path.join(args.logdir, arch_name) 118 | if not os.path.exists(log_folder): 119 | os.makedirs(log_folder) 120 | 121 | if args.evaluate: 122 | val_top1, val_top5, val_losses, val_speed = validate(val_loader, model, val_criterion) 123 | print('Val@{}: \tLoss: {:4.4f}\tTop@1: {:.4f}\tTop@5: {:.4f}\t' 124 | 'Speed: {:.2f} ms/batch\t'.format(args.input_shape, val_losses, val_top1, 125 | val_top5, val_speed * 1000.0), flush=True) 126 | return 127 | 128 | traindir = os.path.join(args.data, 'train') 129 | train_loader = get_imagenet_dataflow(True, traindir, args.batch_size, get_augmentor( 130 | True, args.input_shape, strong_augmentor), workers=args.workers) 131 | 132 | optimizer = torch.optim.SGD(model.parameters(), args.lr, 133 | momentum=args.momentum, 134 | weight_decay=args.weight_decay, 135 | nesterov=True) 136 | if args.lr_scheduler == 'step': 137 | scheduler = lr_scheduler.StepLR(optimizer, 30, gamma=0.1) 138 | elif args.lr_scheduler == 'cosine': 139 | scheduler = lr_scheduler.CosineAnnealingLR(optimizer, args.epochs, eta_min=0) 140 | else: 141 | raise ValueError("Unsupported scheduler.") 142 | 143 | tensorboard_logger.configure(os.path.join(log_folder)) 144 | # optionally resume from a checkpoint 145 | best_top1 = 0.0 146 | if args.resume: 147 | logfile = open(os.path.join(log_folder, 'log.log'), 'a') 148 | if os.path.isfile(args.resume): 149 | print("=> loading checkpoint '{}'".format(args.resume)) 150 | checkpoint = torch.load(args.resume) 151 | args.start_epoch = checkpoint['epoch'] 152 | best_top1 = checkpoint['best_top1'] 153 | model.load_state_dict(checkpoint['state_dict']) 154 | optimizer.load_state_dict(checkpoint['optimizer']) 155 | print("=> loaded checkpoint '{}' (epoch {})" 156 | .format(args.resume, checkpoint['epoch'])) 157 | 
else: 158 | print("=> no checkpoint found at '{}'".format(args.resume)) 159 | else: 160 | logfile = open(os.path.join(log_folder, 'log.log'), 'w') 161 | 162 | print(args, flush=True) 163 | print(model, flush=True) 164 | 165 | print(args, file=logfile, flush=True) 166 | print(model, file=logfile, flush=True) 167 | 168 | for epoch in range(args.start_epoch, args.epochs): 169 | scheduler.step(epoch) 170 | try: 171 | # get_lr() returns the lr of every parameter group for the current epoch; assume all groups share the same lr 172 | lr = scheduler.get_lr()[0] 173 | except Exception: 174 | lr = None 175 | # train for one epoch 176 | train_top1, train_top5, train_losses, train_speed, speed_data_loader, train_steps = \ 177 | train(train_loader, 178 | model, 179 | train_criterion, 180 | optimizer, epoch + 1) 181 | # evaluate on validation set 182 | val_top1, val_top5, val_losses, val_speed = validate(val_loader, model, val_criterion) 183 | 184 | print('Train: [{:03d}/{:03d}]\tLoss: {:4.4f}\tTop@1: {:.4f}\tTop@5: {:.4f}\tSpeed: {:.2f} ms/batch\t' 185 | 'Data loading: {:.2f} ms/batch'.format(epoch + 1, args.epochs, train_losses, train_top1, train_top5, 186 | train_speed * 1000.0, speed_data_loader * 1000.0), 187 | file=logfile, flush=True) 188 | print('Val : [{:03d}/{:03d}]\tLoss: {:4.4f}\tTop@1: {:.4f}\tTop@5: {:.4f}\tSpeed: {:.2f} ms/batch'.format( 189 | epoch + 1, args.epochs, val_losses, val_top1, val_top5, val_speed * 1000.0), file=logfile, flush=True) 190 | 191 | print('Train: [{:03d}/{:03d}]\tLoss: {:4.4f}\tTop@1: {:.4f}\tTop@5: {:.4f}\tSpeed: {:.2f} ms/batch\t' 192 | 'Data loading: {:.2f} ms/batch'.format(epoch + 1, args.epochs, train_losses, train_top1, train_top5, 193 | train_speed * 1000.0, speed_data_loader * 1000.0), flush=True) 194 | print('Val : [{:03d}/{:03d}]\tLoss: {:4.4f}\tTop@1: {:.4f}\tTop@5: {:.4f}\tSpeed: {:.2f} ms/batch'.format( 195 | epoch + 1, args.epochs, val_losses, val_top1, val_top5, val_speed * 1000.0), flush=True) 196 | 197 | # remember best prec@1 and save checkpoint 198 | is_best = val_top1 > best_top1 199 | best_top1 = max(val_top1, best_top1) 200 | 201 | save_dict = {'epoch': epoch + 1, 202 | 'arch': arch_name, 203 | 'state_dict': model.state_dict(), 204 | 'best_top1': best_top1, 205 | 'optimizer': optimizer.state_dict(), 206 | } 207 | 208 | save_checkpoint(save_dict, is_best, filepath=log_folder) 209 | if lr is not None: 210 | tensorboard_logger.log_value('learning-rate', lr, epoch + 1) 211 | tensorboard_logger.log_value('val-top1', val_top1, epoch + 1) 212 | tensorboard_logger.log_value('val-loss', val_losses, epoch + 1) 213 | tensorboard_logger.log_value('train-top1', train_top1, epoch + 1) 214 | tensorboard_logger.log_value('train-loss', train_losses, epoch + 1) 215 | tensorboard_logger.log_value('best-val-top1', best_top1, epoch + 1) 216 | 217 | logfile.close() 218 | 219 | 220 | def save_checkpoint(state, is_best, filepath=''): 221 | torch.save(state, os.path.join(filepath, 'checkpoint.pth.tar')) 222 | if is_best: 223 | shutil.copyfile(os.path.join(filepath, 'checkpoint.pth.tar'), os.path.join(filepath, 'model_best.pth.tar')) 224 | 225 | 226 | if __name__ == '__main__': 227 | main() 228 | -------------------------------------------------------------------------------- /imagenet_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0.
You may 6 | # obtain a copy of this license in the LICENSE file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 12 | 13 | import time 14 | from PIL import Image 15 | import multiprocessing 16 | 17 | import torch 18 | import torch.nn as nn 19 | import torch.nn.parallel 20 | import torch.optim 21 | import torch.utils.data 22 | import torch.utils.data.distributed 23 | import torchvision.transforms as transforms 24 | import torchvision.datasets as datasets 25 | 26 | from tqdm import tqdm 27 | 28 | 29 | class AverageMeter(object): 30 | """Computes and stores the average and current value""" 31 | 32 | def __init__(self): 33 | self.reset() 34 | 35 | def reset(self): 36 | self.val = 0 37 | self.avg = 0 38 | self.sum = 0 39 | self.count = 0 40 | 41 | def update(self, val, n=1): 42 | self.val = val 43 | self.sum += val * n 44 | self.count += n 45 | self.avg = self.sum / self.count 46 | 47 | 48 | def accuracy(output, target, topk=(1,)): 49 | """Computes the precision@k for the specified values of k""" 50 | with torch.no_grad(): 51 | maxk = max(topk) 52 | batch_size = target.size(0) 53 | 54 | _, pred = output.topk(maxk, 1, True, True) 55 | pred = pred.t() 56 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 57 | 58 | res = [] 59 | for k in topk: 60 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 61 | res.append(correct_k.mul_(100.0 / batch_size)) 62 | return res 63 | 64 | 65 | def get_augmentor(is_train, image_size, strong=False): 66 | 67 | augments = [] 68 | 69 | if is_train: 70 | if strong: 71 | augments.append(transforms.RandomRotation(10)) 72 | 73 | augments += [ 74 | transforms.RandomResizedCrop(image_size, interpolation=Image.BILINEAR), 75 | transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4), 76 | transforms.RandomHorizontalFlip() 77 | ] 78 | else: 79 | augments += [ 80 | transforms.Resize(int(image_size / 0.875 + 0.5) if image_size == 81 | 224 else image_size, interpolation=Image.BILINEAR), 82 | transforms.CenterCrop(image_size) 83 | ] 84 | 85 | augments += [ 86 | transforms.ToTensor(), 87 | transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 88 | ] 89 | 90 | augmentor = transforms.Compose(augments) 91 | return augmentor 92 | 93 | 94 | def get_imagenet_dataflow(is_train, data_dir, batch_size, augmentor, workers=18, is_distributed=False): 95 | 96 | workers = min(workers, multiprocessing.cpu_count()) 97 | sampler = None 98 | shuffle = False 99 | if is_train: 100 | dataset = datasets.ImageFolder(data_dir, augmentor) 101 | sampler = torch.utils.data.distributed.DistributedSampler(dataset) if is_distributed else None 102 | shuffle = sampler is None 103 | else: 104 | dataset = datasets.ImageFolder(data_dir, augmentor) 105 | 106 | data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, 107 | num_workers=workers, pin_memory=True, sampler=sampler) 108 | 109 | return data_loader 110 | 111 | def train(data_loader, model, criterion, optimizer, epoch, 112 | steps_per_epoch=99999999999): 113 | batch_time = AverageMeter() 114 | data_time = AverageMeter() 115 | losses = AverageMeter() 116 | top1 = AverageMeter() 117 | top5 = AverageMeter() 118 | 119 | # switch to train mode 120 | model.train() 121 | end = time.time() 122 | num_batch = 0 123 | with 
tqdm(total=len(data_loader)) as t_bar: 124 | for i, (input, target) in enumerate(data_loader): 125 | # measure data loading time 126 | data_time.update(time.time() - end) 127 | # compute output 128 | output = model(input) 129 | target = target.cuda(non_blocking=True) 130 | loss = criterion(output, target) 131 | 132 | # measure accuracy and record loss 133 | prec1, prec5 = accuracy(output, target, topk=(1, 5)) 134 | losses.update(loss.item(), input.size(0)) 135 | top1.update(prec1[0], input.size(0)) 136 | top5.update(prec5[0], input.size(0)) 137 | # compute gradient and do SGD step 138 | optimizer.zero_grad() 139 | loss.backward() 140 | optimizer.step() 141 | 142 | # measure elapsed time 143 | batch_time.update(time.time() - end) 144 | end = time.time() 145 | 146 | num_batch += 1 147 | t_bar.update(1) 148 | if i > steps_per_epoch: 149 | break 150 | 151 | return top1.avg, top5.avg, losses.avg, batch_time.avg, data_time.avg, num_batch 152 | 153 | 154 | def validate(data_loader, model, criterion): 155 | batch_time = AverageMeter() 156 | losses = AverageMeter() 157 | top1 = AverageMeter() 158 | top5 = AverageMeter() 159 | 160 | # switch to evaluate mode 161 | model.eval() 162 | 163 | with torch.no_grad(), tqdm(total=len(data_loader)) as t_bar: 164 | end = time.time() 165 | for i, (input, target) in enumerate(data_loader): 166 | target = target.cuda(non_blocking=True) 167 | 168 | # compute output 169 | output = model(input) 170 | loss = criterion(output, target) 171 | 172 | # measure accuracy and record loss 173 | prec1, prec5 = accuracy(output, target, topk=(1, 5)) 174 | losses.update(loss.item(), input.size(0)) 175 | top1.update(prec1[0], input.size(0)) 176 | top5.update(prec5[0], input.size(0)) 177 | 178 | # measure elapsed time 179 | batch_time.update(time.time() - end) 180 | end = time.time() 181 | t_bar.update(1) 182 | 183 | return top1.avg, top5.avg, losses.avg, batch_time.avg 184 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0. You may 6 | # obtain a copy of this license in the LICENSE.txt file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 12 | 13 | from .blresnet import blresnet_model 14 | from .blresnext import blresnext_model 15 | from .blseresnext import blseresnext_model 16 | 17 | __all__ = ['blresnet_model', 'blresnext_model', 'blseresnext_model'] 18 | -------------------------------------------------------------------------------- /models/_model_urls.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0. You may 6 | # obtain a copy of this license in the LICENSE.txt file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 
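# Local checkpoint paths for the released models, keyed by architecture name.
# The *_model() factories look these paths up when pretrained=True and load
# them with torch.load(), so the downloaded checkpoints must sit in the
# pretrained/ folder (see the README).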
12 | 13 | model_urls = { 14 | 'blresnet-50-a2-b4': 'pretrained/ImageNet-bLResNet-50-a2-b4.pth.tar', 15 | 'blresnet-101-a2-b4': 'pretrained/ImageNet-bLResNet-101-a2-b4.pth.tar', 16 | 'blresnext-50-32x4d-a2-b4': 'pretrained/ImageNet-bLResNeXt-50-32x4d-a2-b4.pth.tar', 17 | 'blresnext-101-32x4d-a2-b4': 'pretrained/ImageNet-bLResNeXt-101-32x4d-a2-b4.pth.tar', 18 | 'blresnext-101-64x4d-a2-b4': 'pretrained/ImageNet-bLResNeXt-101-64x4d-a2-b4.pth.tar', 19 | 'blseresnext-50-32x4d-a2-b4': 'pretrained/ImageNet-bLSEResNeXt-50-32x4d-a2-b4.pth.tar', 20 | 'blseresnext-101-32x4d-a2-b4': 'pretrained/ImageNet-bLSEResNeXt-101-32x4d-a2-b4.pth.tar', 21 | } 22 | -------------------------------------------------------------------------------- /models/blresnet.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0. You may 6 | # obtain a copy of this license in the LICENSE file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 12 | 13 | import torch 14 | import torch.nn as nn 15 | 16 | from ._model_urls import model_urls 17 | 18 | __all__ = ['blresnet_model'] 19 | 20 | 21 | class Bottleneck(nn.Module): 22 | expansion = 4 23 | 24 | def __init__(self, inplanes, planes, stride=1, downsample=None, last_relu=True): 25 | super(Bottleneck, self).__init__() 26 | 27 | self.conv1 = nn.Conv2d(inplanes, planes // self.expansion, kernel_size=1, bias=False) 28 | self.bn1 = nn.BatchNorm2d(planes // self.expansion) 29 | self.conv2 = nn.Conv2d(planes // self.expansion, planes // self.expansion, kernel_size=3, stride=stride, 30 | padding=1, bias=False) 31 | self.bn2 = nn.BatchNorm2d(planes // self.expansion) 32 | self.conv3 = nn.Conv2d(planes // self.expansion, planes, kernel_size=1, bias=False) 33 | self.bn3 = nn.BatchNorm2d(planes) 34 | self.relu = nn.ReLU(inplace=True) 35 | self.downsample = downsample 36 | self.stride = stride 37 | self.last_relu = last_relu 38 | 39 | def forward(self, x): 40 | residual = x 41 | 42 | out = self.conv1(x) 43 | out = self.bn1(out) 44 | out = self.relu(out) 45 | 46 | out = self.conv2(out) 47 | out = self.bn2(out) 48 | out = self.relu(out) 49 | 50 | out = self.conv3(out) 51 | out = self.bn3(out) 52 | 53 | if self.downsample is not None: 54 | residual = self.downsample(x) 55 | 56 | out += residual 57 | if self.last_relu: 58 | out = self.relu(out) 59 | 60 | return out 61 | 62 | 63 | class bLModule(nn.Module): 64 | def __init__(self, block, in_channels, out_channels, blocks, alpha, beta, stride): 65 | super(bLModule, self).__init__() 66 | 67 | self.relu = nn.ReLU(inplace=True) 68 | self.big = self._make_layer(block, in_channels, out_channels, blocks - 1, 2, last_relu=False) 69 | self.little = self._make_layer(block, in_channels, out_channels // alpha, max(1, blocks // beta - 1)) 70 | self.little_e = nn.Sequential( 71 | nn.Conv2d(out_channels // alpha, out_channels, kernel_size=1, bias=False), 72 | nn.BatchNorm2d(out_channels)) 73 | 74 | self.fusion = self._make_layer(block, out_channels, out_channels, 1, stride=stride) 75 | 76 | def _make_layer(self, block, inplanes, planes, blocks, stride=1, last_relu=True): 77 | downsample = [] 78 | if stride != 1: 79 | 
downsample.append(nn.AvgPool2d(3, stride=2, padding=1)) 80 | if inplanes != planes: 81 | downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, bias=False)) 82 | downsample.append(nn.BatchNorm2d(planes)) 83 | downsample = None if downsample == [] else nn.Sequential(*downsample) 84 | layers = [] 85 | if blocks == 1: 86 | layers.append(block(inplanes, planes, stride=stride, downsample=downsample)) 87 | else: 88 | layers.append(block(inplanes, planes, stride, downsample)) 89 | for i in range(1, blocks): 90 | layers.append(block(planes, planes, 91 | last_relu=last_relu if i == blocks - 1 else True)) 92 | 93 | return nn.Sequential(*layers) 94 | 95 | def forward(self, x): 96 | big = self.big(x) 97 | little = self.little(x) 98 | little = self.little_e(little) 99 | big = torch.nn.functional.interpolate(big, little.shape[2:]) 100 | out = self.relu(big + little) 101 | out = self.fusion(out) 102 | 103 | return out 104 | 105 | 106 | class bLResNet(nn.Module): 107 | 108 | def __init__(self, block, layers, alpha, beta, num_classes=1000): 109 | num_channels = [64, 128, 256, 512] 110 | self.inplanes = 64 111 | super(bLResNet, self).__init__() 112 | self.conv1 = nn.Conv2d(3, num_channels[0], kernel_size=7, stride=2, padding=3, 113 | bias=False) 114 | self.bn1 = nn.BatchNorm2d(num_channels[0]) 115 | self.relu = nn.ReLU(inplace=True) 116 | self.b_conv0 = nn.Conv2d(num_channels[0], num_channels[0], kernel_size=3, stride=2, padding=1, bias=False) 117 | self.bn_b0 = nn.BatchNorm2d(num_channels[0]) 118 | self.l_conv0 = nn.Conv2d(num_channels[0], num_channels[0] // alpha, 119 | kernel_size=3, stride=1, padding=1, bias=False) 120 | self.bn_l0 = nn.BatchNorm2d(num_channels[0] // alpha) 121 | self.l_conv1 = nn.Conv2d(num_channels[0] // alpha, num_channels[0] // 122 | alpha, kernel_size=3, stride=2, padding=1, bias=False) 123 | self.bn_l1 = nn.BatchNorm2d(num_channels[0] // alpha) 124 | self.l_conv2 = nn.Conv2d(num_channels[0] // alpha, num_channels[0], kernel_size=1, stride=1, bias=False) 125 | self.bn_l2 = nn.BatchNorm2d(num_channels[0]) 126 | 127 | self.bl_init = nn.Conv2d(num_channels[0], num_channels[0], kernel_size=1, stride=1, bias=False) 128 | self.bn_bl_init = nn.BatchNorm2d(num_channels[0]) 129 | 130 | self.layer1 = bLModule(block, num_channels[0], num_channels[0] * 131 | block.expansion, layers[0], alpha, beta, stride=2) 132 | self.layer2 = bLModule(block, num_channels[0] * block.expansion, 133 | num_channels[1] * block.expansion, layers[1], alpha, beta, stride=2) 134 | self.layer3 = bLModule(block, num_channels[1] * block.expansion, 135 | num_channels[2] * block.expansion, layers[2], alpha, beta, stride=1) 136 | self.layer4 = self._make_layer( 137 | block, num_channels[2] * block.expansion, num_channels[3] * block.expansion, layers[3], stride=2) 138 | self.gappool = nn.AdaptiveAvgPool2d(1) 139 | self.fc = nn.Linear(num_channels[3] * block.expansion, num_classes) 140 | 141 | for m in self.modules(): 142 | if isinstance(m, nn.Conv2d): 143 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 144 | elif isinstance(m, nn.BatchNorm2d): 145 | nn.init.constant_(m.weight, 1) 146 | nn.init.constant_(m.bias, 0) 147 | 148 | # Zero-initialize the last BN in each block. 
149 | # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 150 | for m in self.modules(): 151 | if isinstance(m, Bottleneck): 152 | nn.init.constant_(m.bn3.weight, 0) 153 | 154 | def _make_layer(self, block, inplanes, planes, blocks, stride=1): 155 | downsample = [] 156 | if stride != 1: 157 | downsample.append(nn.AvgPool2d(3, stride=2, padding=1)) 158 | if inplanes != planes: 159 | downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, bias=False)) 160 | downsample.append(nn.BatchNorm2d(planes)) 161 | downsample = None if downsample == [] else nn.Sequential(*downsample) 162 | 163 | layers = [] 164 | layers.append(block(inplanes, planes, stride, downsample)) 165 | for i in range(1, blocks): 166 | layers.append(block(planes, planes)) 167 | 168 | return nn.Sequential(*layers) 169 | 170 | def forward(self, x): 171 | x = self.conv1(x) 172 | x = self.bn1(x) 173 | x = self.relu(x) 174 | 175 | bx = self.b_conv0(x) 176 | bx = self.bn_b0(bx) 177 | lx = self.l_conv0(x) 178 | lx = self.bn_l0(lx) 179 | lx = self.relu(lx) 180 | lx = self.l_conv1(lx) 181 | lx = self.bn_l1(lx) 182 | lx = self.relu(lx) 183 | lx = self.l_conv2(lx) 184 | lx = self.bn_l2(lx) 185 | x = self.relu(bx + lx) 186 | x = self.bl_init(x) 187 | x = self.bn_bl_init(x) 188 | x = self.relu(x) 189 | 190 | x = self.layer1(x) 191 | x = self.layer2(x) 192 | x = self.layer3(x) 193 | x = self.layer4(x) 194 | 195 | x = self.gappool(x) 196 | x = x.view(x.size(0), -1) 197 | x = self.fc(x) 198 | 199 | return x 200 | 201 | 202 | def blresnet_model(depth, alpha, beta, num_classes=1000, pretrained=False): 203 | layers = { 204 | 50: [3, 4, 6, 3], 205 | 101: [4, 8, 18, 3], 206 | 152: [5, 12, 30, 3] 207 | }[depth] 208 | model = bLResNet(Bottleneck, layers, alpha, beta, num_classes) 209 | 210 | if pretrained: 211 | url = model_urls['blresnet-{}-a{}-b{}'.format(depth, alpha, beta)] 212 | checkpoint = torch.load(url) 213 | model.load_state_dict(checkpoint['state_dict']) 214 | return model 215 | -------------------------------------------------------------------------------- /models/blresnext.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0. You may 6 | # obtain a copy of this license in the LICENSE file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 
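# Big-Little variant of ResNeXt: each bLModule runs a deeper 'big' branch at
# half resolution and a shallower, narrower 'little' branch at full resolution,
# upsamples the big output to the little one's spatial size, and merges the two
# through a fusion block.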
12 | 13 | import math 14 | 15 | import torch 16 | import torch.nn as nn 17 | 18 | from ._model_urls import model_urls 19 | 20 | __all__ = ['blresnext_model'] 21 | 22 | 23 | class Bottleneck(nn.Module): 24 | expansion = 4 25 | 26 | def __init__(self, inplanes, planes, basewidth, cardinality, stride=1, downsample=None, last_relu=True): 27 | super(Bottleneck, self).__init__() 28 | 29 | D = int(math.floor(planes * (basewidth / 64.0))) // self.expansion 30 | C = cardinality 31 | 32 | self.conv1 = nn.Conv2d(inplanes, D * C, kernel_size=1, bias=False) 33 | self.bn1 = nn.BatchNorm2d(D * C) 34 | self.conv2 = nn.Conv2d(D * C, D * C, kernel_size=3, stride=stride, 35 | padding=1, bias=False, groups=C) 36 | self.bn2 = nn.BatchNorm2d(D * C) 37 | self.conv3 = nn.Conv2d(D * C, planes, kernel_size=1, bias=False) 38 | self.bn3 = nn.BatchNorm2d(planes) 39 | self.relu = nn.ReLU(inplace=True) 40 | self.downsample = downsample 41 | self.stride = stride 42 | self.last_relu = last_relu 43 | 44 | def forward(self, x): 45 | residual = x 46 | 47 | out = self.conv1(x) 48 | out = self.bn1(out) 49 | out = self.relu(out) 50 | 51 | out = self.conv2(out) 52 | out = self.bn2(out) 53 | out = self.relu(out) 54 | 55 | out = self.conv3(out) 56 | out = self.bn3(out) 57 | 58 | if self.downsample is not None: 59 | residual = self.downsample(x) 60 | 61 | out += residual 62 | if self.last_relu: 63 | out = self.relu(out) 64 | 65 | return out 66 | 67 | 68 | class bLModule(nn.Module): 69 | def __init__(self, block, in_channels, out_channels, blocks, basewidth, cardinality, alpha, beta, stride): 70 | super(bLModule, self).__init__() 71 | 72 | self.relu = nn.ReLU(inplace=True) 73 | self.big = self._make_layer(block, in_channels, out_channels, blocks - 1, 74 | basewidth, cardinality, 2, last_relu=False) 75 | self.little = self._make_layer(block, in_channels, out_channels // alpha, 76 | max(1, blocks // beta - 1), basewidth * alpha, cardinality // alpha) 77 | self.little_e = nn.Sequential( 78 | nn.Conv2d(out_channels // alpha, out_channels, kernel_size=1, bias=False), 79 | nn.BatchNorm2d(out_channels)) 80 | 81 | self.fusion = self._make_layer(block, out_channels, out_channels, 1, basewidth, cardinality, stride=stride) 82 | 83 | def _make_layer(self, block, inplanes, planes, blocks, basewidth, cardinality, stride=1, last_relu=True): 84 | downsample = [] 85 | if stride != 1: 86 | downsample.append(nn.AvgPool2d(3, stride=2, padding=1)) 87 | if inplanes != planes: 88 | downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, bias=False)) 89 | downsample.append(nn.BatchNorm2d(planes)) 90 | downsample = None if downsample == [] else nn.Sequential(*downsample) 91 | 92 | layers = [] 93 | if blocks == 1: 94 | layers.append(block(inplanes, planes, basewidth, cardinality, stride=stride, downsample=downsample)) 95 | else: 96 | layers.append(block(inplanes, planes, basewidth, cardinality, stride, downsample)) 97 | for i in range(1, blocks): 98 | layers.append(block(planes, planes, basewidth, cardinality, 99 | last_relu=last_relu if i == blocks - 1 else True)) 100 | 101 | return nn.Sequential(*layers) 102 | 103 | def forward(self, x): 104 | big = self.big(x) 105 | little = self.little(x) 106 | little = self.little_e(little) 107 | big = torch.nn.functional.interpolate(big, little.shape[2:]) 108 | out = self.relu(big + little) 109 | out = self.fusion(out) 110 | 111 | return out 112 | 113 | 114 | class bLResNeXt(nn.Module): 115 | 116 | def __init__(self, block, layers, basewidth, cardinality, alpha, beta, num_classes=1000): 117 | 
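# basewidth and cardinality follow the ResNeXt convention (e.g., 64x4d);
# alpha is the big-to-little channel ratio and beta the layer ratio.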
super(bLResNeXt, self 118 | ).__init__() 119 | num_channels = [64, 128, 256, 512] 120 | self.inplanes = 64 121 | self.conv1 = nn.Conv2d(3, num_channels[0], kernel_size=7, stride=2, padding=3, 122 | bias=False) 123 | self.bn1 = nn.BatchNorm2d(num_channels[0]) 124 | self.relu = nn.ReLU(inplace=True) 125 | 126 | self.b_conv0 = nn.Conv2d(num_channels[0], num_channels[0], kernel_size=3, stride=2, padding=1, bias=False) 127 | self.bn_b0 = nn.BatchNorm2d(num_channels[0]) 128 | self.l_conv0 = nn.Conv2d(num_channels[0], num_channels[0] // alpha, 129 | kernel_size=3, stride=1, padding=1, bias=False) 130 | self.bn_l0 = nn.BatchNorm2d(num_channels[0] // alpha) 131 | self.l_conv1 = nn.Conv2d(num_channels[0] // alpha, num_channels[0] // 132 | alpha, kernel_size=3, stride=2, padding=1, bias=False) 133 | self.bn_l1 = nn.BatchNorm2d(num_channels[0] // alpha) 134 | self.l_conv2 = nn.Conv2d(num_channels[0] // alpha, num_channels[0], kernel_size=1, stride=1, bias=False) 135 | self.bn_l2 = nn.BatchNorm2d(num_channels[0]) 136 | 137 | self.bl_init = nn.Conv2d(num_channels[0], num_channels[0], kernel_size=1, stride=1, bias=False) 138 | self.bn_bl_init = nn.BatchNorm2d(num_channels[0]) 139 | self.layer1 = bLModule(block, num_channels[0], num_channels[0] * block.expansion, 140 | layers[0], basewidth, cardinality, alpha, beta, stride=2) 141 | self.layer2 = bLModule(block, num_channels[0] * block.expansion, num_channels[1] 142 | * block.expansion, layers[1], basewidth, cardinality, alpha, beta, stride=2) 143 | self.layer3 = bLModule(block, num_channels[1] * block.expansion, num_channels[2] 144 | * block.expansion, layers[2], basewidth, cardinality, alpha, beta, stride=1) 145 | self.layer4 = self._make_layer( 146 | block, num_channels[2] * block.expansion, num_channels[3] * block.expansion, layers[3], basewidth, 147 | cardinality, stride=2) 148 | self.gappool = nn.AdaptiveAvgPool2d(1) 149 | self.fc = nn.Linear(num_channels[3] * block.expansion, num_classes) 150 | 151 | for m in self.modules(): 152 | if isinstance(m, nn.Conv2d): 153 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 154 | elif isinstance(m, nn.BatchNorm2d): 155 | nn.init.constant_(m.weight, 1) 156 | nn.init.constant_(m.bias, 0) 157 | 158 | # Zero-initialize the last BN in each block. 
159 | # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 160 | for m in self.modules(): 161 | if isinstance(m, Bottleneck): 162 | nn.init.constant_(m.bn3.weight, 0) 163 | # elif isinstance(m, BasicBlock): 164 | # nn.init.constant_(m.bn2.weight, 0) 165 | 166 | def _make_layer(self, block, inplanes, planes, blocks, basewidth, cardinality, stride=1): 167 | 168 | downsample = [] 169 | if stride != 1: 170 | downsample.append(nn.AvgPool2d(3, stride=2, padding=1)) 171 | if inplanes != planes: 172 | downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, bias=False)) 173 | downsample.append(nn.BatchNorm2d(planes)) 174 | downsample = None if downsample == [] else nn.Sequential(*downsample) 175 | 176 | layers = [] 177 | layers.append(block(inplanes, planes, basewidth, cardinality, stride, downsample)) 178 | for i in range(1, blocks): 179 | layers.append(block(planes, planes, basewidth, cardinality)) 180 | 181 | return nn.Sequential(*layers) 182 | 183 | def forward(self, x): 184 | x = self.conv1(x) 185 | x = self.bn1(x) 186 | x = self.relu(x) 187 | 188 | bx = self.b_conv0(x) 189 | bx = self.bn_b0(bx) 190 | 191 | lx = self.l_conv0(x) 192 | lx = self.bn_l0(lx) 193 | lx = self.relu(lx) 194 | lx = self.l_conv1(lx) 195 | lx = self.bn_l1(lx) 196 | lx = self.relu(lx) 197 | lx = self.l_conv2(lx) 198 | lx = self.bn_l2(lx) 199 | 200 | x = self.relu(bx + lx) 201 | x = self.bl_init(x) 202 | x = self.bn_bl_init(x) 203 | x = self.relu(x) 204 | 205 | x = self.layer1(x) 206 | x = self.layer2(x) 207 | x = self.layer3(x) 208 | x = self.layer4(x) 209 | 210 | x = self.gappool(x) 211 | x = x.view(x.size(0), -1) 212 | x = self.fc(x) 213 | 214 | return x 215 | 216 | 217 | def blresnext_model(depth, basewidth, cardinality, alpha, beta, 218 | num_classes=1000, pretrained=False): 219 | layers = { 220 | 50: [3, 4, 6, 3], 221 | 101: [4, 8, 18, 3], 222 | 152: [5, 12, 30, 3] 223 | }[depth] 224 | 225 | model = bLResNeXt(Bottleneck, layers, basewidth, cardinality, 226 | alpha, beta, num_classes) 227 | if pretrained: 228 | url = model_urls['blresnext-{}-{}x{}d-a{}-b{}'.format(depth, cardinality, 229 | basewidth, alpha, beta)] 230 | checkpoint = torch.load(url) 231 | model.load_state_dict(checkpoint['state_dict']) 232 | 233 | return model 234 | -------------------------------------------------------------------------------- /models/blseresnext.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # (C) Copyright IBM 2019. 4 | # 5 | # This code is licensed under the Apache License, Version 2.0. You may 6 | # obtain a copy of this license in the LICENSE file in the root directory 7 | # of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. 8 | # 9 | # Any modifications or derivative works of this code must retain this 10 | # copyright notice, and modified files need to carry a notice indicating 11 | # that they have been altered from the originals. 
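# Big-Little ResNeXt with squeeze-and-excitation: every Bottleneck passes its
# output through an SEModule, which re-weights channels via global average
# pooling, a two-layer bottleneck (reduction 16), and a sigmoid gate.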
12 | 13 | import math 14 | 15 | import torch 16 | import torch.nn as nn 17 | 18 | from ._model_urls import model_urls 19 | 20 | __all__ = ['blseresnext_model'] 21 | 22 | 23 | class SEModule(nn.Module): 24 | 25 | def __init__(self, channels, reduction=16): 26 | super(SEModule, self).__init__() 27 | self.avg_pool = nn.AdaptiveAvgPool2d(1) 28 | self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1, 29 | padding=0) 30 | self.relu = nn.ReLU(inplace=True) 31 | self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1, 32 | padding=0) 33 | self.sigmoid = nn.Sigmoid() 34 | 35 | def forward(self, x): 36 | module_input = x 37 | x = self.avg_pool(x) 38 | x = self.fc1(x) 39 | x = self.relu(x) 40 | x = self.fc2(x) 41 | x = self.sigmoid(x) 42 | return module_input * x 43 | 44 | 45 | class Bottleneck(nn.Module): 46 | expansion = 4 47 | 48 | def __init__(self, inplanes, planes, basewidth, cardinality, stride=1, downsample=None, last_relu=True): 49 | super(Bottleneck, self).__init__() 50 | 51 | D = int(math.floor(planes * (basewidth/64.0))) // self.expansion 52 | C = cardinality 53 | 54 | self.conv1 = nn.Conv2d(inplanes, D*C, kernel_size=1, bias=False) 55 | self.bn1 = nn.BatchNorm2d(D*C) 56 | self.conv2 = nn.Conv2d(D*C, D*C, kernel_size=3, stride=stride, 57 | padding=1, bias=False, groups=C) 58 | 59 | self.bn2 = nn.BatchNorm2d(D*C) 60 | self.conv3 = nn.Conv2d(D*C, planes, kernel_size=1, bias=False) 61 | self.bn3 = nn.BatchNorm2d(planes) 62 | self.relu = nn.ReLU(inplace=True) 63 | self.downsample = downsample 64 | self.stride = stride 65 | self.last_relu = last_relu 66 | 67 | self.se_layer = SEModule(planes, 16) 68 | 69 | def forward(self, x): 70 | residual = x 71 | 72 | out = self.conv1(x) 73 | out = self.bn1(out) 74 | out = self.relu(out) 75 | 76 | out = self.conv2(out) 77 | out = self.bn2(out) 78 | out = self.relu(out) 79 | 80 | out = self.conv3(out) 81 | out = self.bn3(out) 82 | 83 | out = self.se_layer(out) 84 | 85 | if self.downsample is not None: 86 | residual = self.downsample(x) 87 | 88 | out += residual 89 | if self.last_relu: 90 | out = self.relu(out) 91 | 92 | return out 93 | 94 | 95 | class bLModule(nn.Module): 96 | def __init__(self, block, in_channels, out_channels, blocks, basewidth, cardinality, alpha, beta, stride): 97 | super(bLModule, self).__init__() 98 | 99 | self.relu = nn.ReLU(inplace=True) 100 | self.big = self._make_layer(block, in_channels, out_channels, blocks - 1, 101 | basewidth, cardinality, 2, last_relu=False) 102 | self.little = self._make_layer(block, in_channels, out_channels // alpha, 103 | max(1, blocks // beta - 1), basewidth * alpha, cardinality // alpha) 104 | self.little_e = nn.Sequential( 105 | nn.Conv2d(out_channels // alpha, out_channels, kernel_size=1, bias=False), 106 | nn.BatchNorm2d(out_channels)) 107 | 108 | self.fusion = self._make_layer(block, out_channels, out_channels, 1, basewidth, cardinality, stride=stride) 109 | 110 | def _make_layer(self, block, inplanes, planes, blocks, basewidth, cardinality, stride=1, last_relu=True): 111 | downsample = [] 112 | if stride != 1: 113 | downsample.append(nn.AvgPool2d(3, stride=2, padding=1)) 114 | if inplanes != planes: 115 | downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, padding=0, stride=1, bias=False)) 116 | downsample.append(nn.BatchNorm2d(planes)) 117 | downsample = None if downsample == [] else nn.Sequential(*downsample) 118 | 119 | layers = [] 120 | if blocks == 1: 121 | layers.append(block(inplanes, planes, basewidth, cardinality, stride=stride, downsample=downsample)) 


class bLModule(nn.Module):
    def __init__(self, block, in_channels, out_channels, blocks, basewidth, cardinality, alpha, beta, stride):
        super(bLModule, self).__init__()

        self.relu = nn.ReLU(inplace=True)
        # Big branch: all but one of the blocks, run at half resolution (the
        # first block has stride 2); its final ReLU is deferred until after
        # the merge with the little branch.
        self.big = self._make_layer(block, in_channels, out_channels, blocks - 1,
                                    basewidth, cardinality, 2, last_relu=False)
        # Little branch: fewer, narrower blocks at full resolution, followed
        # by a 1x1 projection back to the full channel count.
        self.little = self._make_layer(block, in_channels, out_channels // alpha,
                                       max(1, blocks // beta - 1), basewidth * alpha, cardinality // alpha)
        self.little_e = nn.Sequential(
            nn.Conv2d(out_channels // alpha, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels))

        self.fusion = self._make_layer(block, out_channels, out_channels, 1, basewidth, cardinality, stride=stride)

    def _make_layer(self, block, inplanes, planes, blocks, basewidth, cardinality, stride=1, last_relu=True):
        downsample = []
        if stride != 1:
            downsample.append(nn.AvgPool2d(3, stride=2, padding=1))
        if inplanes != planes:
            downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, padding=0, stride=1, bias=False))
            downsample.append(nn.BatchNorm2d(planes))
        downsample = None if downsample == [] else nn.Sequential(*downsample)

        layers = []
        if blocks == 1:
            # A single-block layer keeps the default last_relu=True.
            layers.append(block(inplanes, planes, basewidth, cardinality, stride=stride, downsample=downsample))
        else:
            layers.append(block(inplanes, planes, basewidth, cardinality, stride, downsample))
            for i in range(1, blocks):
                layers.append(block(planes, planes, basewidth, cardinality,
                                    last_relu=last_relu if i == blocks - 1 else True))

        return nn.Sequential(*layers)

    def forward(self, x):
        big = self.big(x)
        little = self.little(x)
        little = self.little_e(little)
        # Upsample the big branch to the little branch's spatial size before merging.
        big = torch.nn.functional.interpolate(big, little.shape[2:])
        out = self.relu(big + little)
        out = self.fusion(out)

        return out
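

# Shape sketch for bLModule (added note, not original code). For layer1 with a
# 56x56 input, out_channels=256, alpha=2, stride=2:
#     big:    56x56 -> 28x28 (stride-2 first block), 256 channels
#     little: stays at 56x56 with 256 // 2 = 128 channels; little_e projects
#             back to 256 channels
#     merge:  big is upsampled to 56x56, summed with little, then ReLU
#     fusion: a single stride-2 block -> 28x28 output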


class bLSEResNeXt(nn.Module):

    def __init__(self, block, layers, basewidth, cardinality, alpha, beta, num_classes=1000):
        self.inplanes = 64
        num_channels = [64, 128, 256, 512]
        super(bLSEResNeXt, self).__init__()
        self.conv1 = nn.Conv2d(3, num_channels[0], kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(num_channels[0])
        self.relu = nn.ReLU(inplace=True)

        # Big-branch stem: downsamples immediately.
        self.b_conv0 = nn.Conv2d(num_channels[0], num_channels[0], kernel_size=3, stride=2, padding=1, bias=False)
        self.bn_b0 = nn.BatchNorm2d(num_channels[0])
        # Little-branch stem: keeps resolution at reduced width, downsamples,
        # then projects back to the full channel count.
        self.l_conv0 = nn.Conv2d(num_channels[0], num_channels[0] // alpha,
                                 kernel_size=3, stride=1, padding=1, bias=False)
        self.bn_l0 = nn.BatchNorm2d(num_channels[0] // alpha)
        self.l_conv1 = nn.Conv2d(num_channels[0] // alpha, num_channels[0] //
                                 alpha, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn_l1 = nn.BatchNorm2d(num_channels[0] // alpha)
        self.l_conv2 = nn.Conv2d(num_channels[0] // alpha, num_channels[0], kernel_size=1, stride=1, bias=False)
        self.bn_l2 = nn.BatchNorm2d(num_channels[0])

        self.bl_init = nn.Conv2d(num_channels[0], num_channels[0], kernel_size=1, stride=1, bias=False)
        self.bn_bl_init = nn.BatchNorm2d(num_channels[0])

        self.layer1 = bLModule(block, num_channels[0], num_channels[0] * block.expansion,
                               layers[0], basewidth, cardinality, alpha, beta, stride=2)
        self.layer2 = bLModule(block, num_channels[0] * block.expansion, num_channels[1]
                               * block.expansion, layers[1], basewidth, cardinality, alpha, beta, stride=2)
        self.layer3 = bLModule(block, num_channels[1] * block.expansion, num_channels[2]
                               * block.expansion, layers[2], basewidth, cardinality, alpha, beta, stride=1)
        self.layer4 = self._make_layer(block, num_channels[2] * block.expansion,
                                       num_channels[3] * block.expansion, layers[3], basewidth, cardinality, stride=2)
        self.gappool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(num_channels[3] * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each block.
        # This improves top-1 accuracy by about 0.2~0.3%, per
        # https://arxiv.org/abs/1706.02677
        for m in self.modules():
            if isinstance(m, Bottleneck):
                nn.init.constant_(m.bn3.weight, 0)

    def _make_layer(self, block, inplanes, planes, blocks, basewidth, cardinality, stride=1):
        downsample = []
        if stride != 1:
            downsample.append(nn.AvgPool2d(3, stride=2, padding=1))
        if inplanes != planes:
            downsample.append(nn.Conv2d(inplanes, planes, kernel_size=1, padding=0, stride=1, bias=False))
            downsample.append(nn.BatchNorm2d(planes))
        downsample = None if downsample == [] else nn.Sequential(*downsample)

        layers = []
        layers.append(block(inplanes, planes, basewidth, cardinality, stride, downsample))
        for _ in range(1, blocks):
            layers.append(block(planes, planes, basewidth, cardinality))

        return nn.Sequential(*layers)

    def freeze_bn(self):
        '''Freeze all BatchNorm layers (keep them in eval mode).'''
        for layer in self.modules():
            if isinstance(layer, nn.BatchNorm2d):
                layer.eval()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        bx = self.b_conv0(x)
        bx = self.bn_b0(bx)

        lx = self.l_conv0(x)
        lx = self.bn_l0(lx)
        lx = self.relu(lx)
        lx = self.l_conv1(lx)
        lx = self.bn_l1(lx)
        lx = self.relu(lx)
        lx = self.l_conv2(lx)
        lx = self.bn_l2(lx)

        x = self.relu(bx + lx)
        x = self.bl_init(x)
        x = self.bn_bl_init(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.gappool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x


def blseresnext_model(depth, basewidth, cardinality, alpha, beta,
                      num_classes=1000, pretrained=False):
    layers = {
        50: [3, 4, 6, 3],
        101: [4, 8, 18, 3],
        152: [5, 12, 30, 3]
    }[depth]

    model = bLSEResNeXt(Bottleneck, layers, basewidth, cardinality,
                        alpha, beta, num_classes)
    if pretrained:
        url = model_urls['blseresnext-{}-{}x{}d-a{}-b{}'.format(depth, cardinality,
                                                                basewidth, alpha, beta)]
        # torch.load() expects a local file path, so download (and cache) the
        # checkpoint from its URL instead of passing the URL to torch.load().
        from torch.utils import model_zoo
        checkpoint = model_zoo.load_url(url, map_location='cpu')
        model.load_state_dict(checkpoint['state_dict'])

    return model
--------------------------------------------------------------------------------
/requirement.txt:
--------------------------------------------------------------------------------
torch>=1.0.0  # the PyPI package is named `torch`, not `pytorch`
tensorboard_logger
tqdm
--------------------------------------------------------------------------------