├── .gitignore ├── LICENSE ├── README.md ├── adaptive-aggregation-networks ├── README.md ├── main.py ├── models │ ├── modified_linear.py │ ├── modified_resnet.py │ ├── modified_resnet_cifar.py │ ├── modified_resnetmtl.py │ ├── modified_resnetmtl_cifar.py │ └── resnet_cifar.py ├── trainer │ ├── __init__.py │ ├── base_trainer.py │ ├── incremental_icarl.py │ ├── incremental_lucir.py │ ├── trainer.py │ └── zeroth_phase.py └── utils │ ├── __init__.py │ ├── gpu_tools.py │ ├── imagenet │ ├── __init__.py │ ├── train_and_eval.py │ ├── utils_dataset.py │ └── utils_train.py │ ├── incremental │ ├── __init__.py │ ├── compute_accuracy.py │ ├── compute_features.py │ └── conv2d_mtl.py │ ├── misc.py │ └── process_fp.py └── mnemonics-training ├── 1_train ├── main.py ├── models │ ├── __init__.py │ ├── modified_linear.py │ ├── modified_resnet_cifar.py │ └── modified_resnetmtl_cifar.py ├── trainer │ ├── __init__.py │ ├── baseline.py │ ├── incremental.py │ └── mnemonics.py └── utils │ ├── __init__.py │ ├── compute_accuracy.py │ ├── compute_features.py │ ├── conv2d_mtl.py │ ├── gpu_tools.py │ ├── misc.py │ ├── process_fp.py │ └── process_mnemonics.py ├── 2_eval ├── README.md ├── main.py ├── models │ ├── modified_linear.py │ ├── modified_resnet.py │ ├── modified_resnet_cifar.py │ ├── modified_resnetmtl.py │ ├── modified_resnetmtl_cifar.py │ └── resnet_cifar.py ├── process_imagenet │ ├── generate_imagenet.py │ └── generate_imagenet_subset.py ├── run_eval.sh ├── script │ └── download_ckpt.sh ├── trainer │ ├── __init__.py │ └── train.py └── utils │ ├── __init__.py │ ├── gpu_tools.py │ ├── imagenet │ ├── __init__.py │ ├── train_and_eval.py │ ├── utils_dataset.py │ └── utils_train.py │ ├── incremental │ ├── __init__.py │ ├── compute_accuracy.py │ ├── compute_confusion_matrix.py │ ├── compute_features.py │ └── conv2d_mtl.py │ └── misc.py └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # File types 2 | *.pyc 3 | *.npy 4 | *.tar.gz 5 
| *.sh 6 | *.out 7 | 8 | # Folders 9 | data 10 | logs 11 | runs 12 | __pycache__ 13 | 14 | # File 15 | .DS_Store 16 | bashrc 17 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020-2021 Yaoyao Liu 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Class-Incremental Learning 2 | 3 | [![LICENSE](https://img.shields.io/badge/license-MIT-green?style=flat-square)](https://github.com/yaoyao-liu/class-incremental-learning/blob/master/LICENSE) 4 | [![Python](https://img.shields.io/badge/python-3.6-blue.svg?style=flat-square&logo=python&color=3776AB&logoColor=3776AB)](https://www.python.org/) 5 | [![PyTorch](https://img.shields.io/badge/pytorch-1.2.0-%237732a8?style=flat-square&logo=PyTorch&color=EE4C2C)](https://pytorch.org/) 6 | 7 | ### Papers 8 | 9 | - Adaptive Aggregation Networks for Class-Incremental Learning, 10 | CVPR 2021. \[[PDF](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Adaptive_Aggregation_Networks_for_Class-Incremental_Learning_CVPR_2021_paper.pdf)\] \[[Project Page](https://class-il.mpi-inf.mpg.de/)\] 11 | 12 | - Mnemonics Training: Multi-Class Incremental Learning without Forgetting, 13 | CVPR 2020. 
\[[PDF](https://arxiv.org/pdf/2002.10211.pdf)\] \[[Project Page](https://class-il.mpi-inf.mpg.de/mnemonics-training/)\] 14 | 15 | ### Citations 16 | 17 | Please cite our papers if they are helpful to your work: 18 | 19 | ```bibtex 20 | @inproceedings{Liu2020AANets, 21 | author = {Liu, Yaoyao and Schiele, Bernt and Sun, Qianru}, 22 | title = {Adaptive Aggregation Networks for Class-Incremental Learning}, 23 | booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 24 | pages = {2544-2553}, 25 | year = {2021} 26 | } 27 | ``` 28 | 29 | ```bibtex 30 | @inproceedings{liu2020mnemonics, 31 | author = {Liu, Yaoyao and Su, Yuting and Liu, An{-}An and Schiele, Bernt and Sun, Qianru}, 32 | title = {Mnemonics Training: Multi-Class Incremental Learning without Forgetting}, 33 | booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 34 | pages = {12245--12254}, 35 | year = {2020} 36 | } 37 | ``` 38 | 39 | ### Acknowledgements 40 | 41 | Our implementation uses the source code from the following repositories: 42 | 43 | * [Learning a Unified Classifier Incrementally via Rebalancing](https://github.com/hshustc/CVPR19_Incremental_Learning) 44 | 45 | * [iCaRL: Incremental Classifier and Representation Learning](https://github.com/srebuffi/iCaRL) 46 | 47 | * [Dataset Distillation](https://github.com/SsnL/dataset-distillation) 48 | 49 | * [Generative Teaching Networks](https://github.com/uber-research/GTN) 50 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/README.md: -------------------------------------------------------------------------------- 1 | ## Adaptive Aggregation Networks for Class-Incremental Learning 2 | 3 | [![LICENSE](https://img.shields.io/badge/license-MIT-green?style=flat-square)](https://github.com/yaoyao-liu/class-incremental-learning/blob/master/LICENSE) 4 | 
[![Python](https://img.shields.io/badge/python-3.6-blue.svg?style=flat-square&logo=python&color=3776AB)](https://www.python.org/) 5 | [![PyTorch](https://img.shields.io/badge/pytorch-1.2.0-%237732a8?style=flat-square&logo=PyTorch&color=EE4C2C)](https://pytorch.org/) 6 | 9 | 10 | \[[PDF](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Adaptive_Aggregation_Networks_for_Class-Incremental_Learning_CVPR_2021_paper.pdf)\] \[[Project Page](https://class-il.mpi-inf.mpg.de/)\] \[[GitLab@MPI](https://gitlab.mpi-klsb.mpg.de/yaoyaoliu/adaptive-aggregation-networks)\] 11 | 12 | #### Summary 13 | 14 | * [Introduction](#introduction) 15 | * [Getting Started](#getting-started) 16 | * [Download the Datasets](#download-the-datasets) 17 | * [Running Experiments](#running-experiments) 18 | * [Citation](#citation) 19 | * [Acknowledgements](#acknowledgements) 20 | 21 | ### Introduction 22 | 23 | Class-Incremental Learning (CIL) aims to learn a classification model with the number of classes increasing phase-by-phase. The inherent problem in CIL is the stability-plasticity dilemma between the learning of old and new classes, i.e., high-plasticity models easily forget old classes but high-stability models are weak to learn new classes. We alleviate this issue by proposing a novel network architecture called Adaptive Aggregation Networks (AANets) in which we explicitly build two residual blocks at each residual level (taking ResNet as the baseline architecture): a stable block and a plastic block. We aggregate the output feature maps from these two blocks and then feed the results to the next-level blocks. We meta-learn the aggregating weights in order to dynamically optimize and balance between two types of blocks, i.e., between stability and plasticity. 
We conduct extensive experiments on three CIL benchmarks: CIFAR-100, ImageNet-Subset, and ImageNet, and show that many existing CIL methods can be straightforwardly incorporated on the architecture of AANets to boost their performance. 24 | 25 |
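The aggregation step described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the aggregation is a weighted element-wise sum of the two block outputs. The function `aggregate` and the weight names `alpha_stable` and `alpha_plastic` are hypothetical and do not come from this repository, where the weights are meta-learned rather than set by hand.

```python
# Illustrative sketch only: not the repository's implementation.
# `aggregate`, `alpha_stable`, and `alpha_plastic` are hypothetical names.

def aggregate(stable_out, plastic_out, alpha_stable, alpha_plastic):
    """Return the weighted element-wise sum of two equally sized feature vectors."""
    if len(stable_out) != len(plastic_out):
        raise ValueError("both blocks must produce features of the same size")
    return [alpha_stable * s + alpha_plastic * p
            for s, p in zip(stable_out, plastic_out)]


# Toy features produced by the two blocks at one residual level.
stable_features = [1.0, 2.0, 3.0]
plastic_features = [3.0, 2.0, 1.0]

# Equal weights give the element-wise average of the two blocks.
mixed = aggregate(stable_features, plastic_features, 0.5, 0.5)
print(mixed)
```

Moving weight towards `alpha_stable` keeps the result closer to the stable block (old classes), while moving it towards `alpha_plastic` favours the plastic block (new classes).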

26 | 27 |

28 | 29 | > Figure: Conceptual illustrations of different CIL methods. (a) Conventional methods use all available data (imbalanced classes) to train the model (Rebuffi et al., 2017; Hou et al., 2019). (b) Castro et al. (2018), Hou et al. (2019) and Douillard et al. (2020) follow the convention but add a fine-tuning step using the balanced set of exemplars. (c) Our AANets approach uses all available data to update the plastic and stable blocks, and uses the balanced set of exemplars to meta-learn the aggregating weights. We continuously update these weights so as to dynamically balance between plastic and stable blocks, i.e., between plasticity and stability. 30 | 31 | ### Getting Started 32 | 33 | In order to run this repository, we advise you to install Python 3.6 and PyTorch 1.2.0 with Anaconda. 34 | 35 | You may download Anaconda and read the installation instructions on their official website: 36 | 37 | 38 | Create a new environment and install PyTorch and torchvision in it: 39 | 40 | ```bash 41 | conda create --name AANets-PyTorch python=3.6 42 | conda activate AANets-PyTorch 43 | conda install pytorch=1.2.0 44 | conda install torchvision -c pytorch 45 | ``` 46 | 47 | Install other requirements: 48 | ```bash 49 | pip install tqdm scipy scikit-learn tensorboardX Pillow==6.2.2 50 | ``` 51 | 52 | Clone this repository and enter the folder `adaptive-aggregation-networks`: 53 | ```bash 54 | git clone https://github.com/yaoyao-liu/class-incremental-learning.git 55 | cd class-incremental-learning/adaptive-aggregation-networks 56 | 57 | ``` 58 | 59 | ### Download the Datasets 60 | #### CIFAR-100 61 | It will be downloaded automatically by `torchvision` when running the experiments. 62 | 63 | #### ImageNet-Subset 64 | We create the ImageNet-Subset following [LUCIR](https://github.com/hshustc/CVPR19_Incremental_Learning). 
65 | You may download the dataset using the following links: 66 | - [Download from Google Drive](https://drive.google.com/file/d/1n5Xg7Iye_wkzVKc0MTBao5adhYSUlMCL/view?usp=sharing) 67 | - [Download from 百度网盘](https://pan.baidu.com/s/1MnhITYKUI1i7aRBzsPrCSw) (提取码: 6uj5) 68 | 69 | File information: 70 | ``` 71 | File name: ImageNet-Subset.tar 72 | Size: 15.37 GB 73 | MD5: ab2190e9dac15042a141561b9ba5d6e9 74 | ``` 75 | You need to untar the downloaded file, and put the folder `seed_1993_subset_100_imagenet` in `class-incremental-learning/adaptive-aggregation-networks/data`. 76 | 77 | Please note that the ImageNet-Subset is created from ImageNet. ImageNet is only allowed to be downloaded by researchers for non-commercial research and educational purposes. See the terms of ImageNet [here](https://image-net.org/download.php). 78 | 79 | ### Running Experiments 80 | #### Running Experiments w/ AANets on CIFAR-100 81 | 82 | [LUCIR](https://github.com/hshustc/CVPR19_Incremental_Learning) w/ AANets 83 | ```bash 84 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100 85 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100 86 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100 87 | ``` 88 | 89 | [iCaRL](https://github.com/hshustc/CVPR19_Incremental_Learning) w/ AANets 90 | ```bash 91 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100 92 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100 93 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=icarl 
--branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100 94 | ``` 95 | 96 | #### Running Baseline Experiments on CIFAR-100 97 | 98 | [LUCIR](https://github.com/hshustc/CVPR19_Incremental_Learning) w/o AANets, dual branch 99 | ```bash 100 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=free --branch_2=free --fusion_lr=0.0 --dataset=cifar100 101 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=free --branch_2=free --fusion_lr=0.0 --dataset=cifar100 102 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=free --branch_2=free --fusion_lr=0.0 --dataset=cifar100 103 | ``` 104 | 105 | [iCaRL](https://github.com/hshustc/CVPR19_Incremental_Learning) w/o AANets, dual branch 106 | ```bash 107 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=dual --branch_1=free --branch_2=free --fusion_lr=0.0 --dataset=cifar100 108 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=dual --branch_1=free --branch_2=free --fusion_lr=0.0 --dataset=cifar100 109 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=dual --branch_1=free --branch_2=free --fusion_lr=0.0 --dataset=cifar100 110 | ``` 111 | 112 | [LUCIR](https://github.com/hshustc/CVPR19_Incremental_Learning) w/o AANets, single branch 113 | ```bash 114 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=single --branch_1=free --dataset=cifar100 115 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=single --branch_1=free --dataset=cifar100 116 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=single --branch_1=free --dataset=cifar100 117 | ``` 118 | 
119 | [iCaRL](https://github.com/hshustc/CVPR19_Incremental_Learning) w/o AANets, single branch 120 | ```bash 121 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=single --branch_1=free --dataset=cifar100 122 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=single --branch_1=free --dataset=cifar100 123 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=icarl --branch_mode=single --branch_1=free --dataset=cifar100 124 | ``` 125 | 126 | #### Running Experiments on ImageNet-Subset 127 | [LUCIR](https://github.com/hshustc/CVPR19_Incremental_Learning) w/ AANets 128 | ```bash 129 | python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=imagenet_sub --test_batch_size=50 --epochs=90 --num_workers=1 --custom_weight_decay=0.0005 --the_lambda=10 --K=2 --dist=0.5 --lw_mr=1 --base_lr1=0.05 --base_lr2=0.05 --dynamic_budget 130 | python main.py --nb_cl_fg=50 --nb_cl=5 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=imagenet_sub --test_batch_size=50 --epochs=90 --num_workers=1 --custom_weight_decay=0.0005 --the_lambda=10 --K=2 --dist=0.5 --lw_mr=1 --base_lr1=0.05 --base_lr2=0.05 --dynamic_budget 131 | python main.py --nb_cl_fg=50 --nb_cl=2 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=imagenet_sub --test_batch_size=50 --epochs=90 --num_workers=1 --custom_weight_decay=0.0005 --the_lambda=10 --K=2 --dist=0.5 --lw_mr=1 --base_lr1=0.05 --base_lr2=0.05 --dynamic_budget 132 | ``` 133 | 134 | ### Code for [PODNet](https://github.com/arthurdouillard/incremental_learning.pytorch) w/ AANets 135 | 136 | We are still cleaning up the code for [PODNet](https://github.com/arthurdouillard/incremental_learning.pytorch) w/ AANets. 
We will add it to the GitHub repository once it is ready. 137 |
138 | If you need to use it now, here is a preliminary version: 139 |
140 | Please note that you need to install the same environment as [PODNet](https://github.com/arthurdouillard/incremental_learning.pytorch) to run this code. 141 | 142 | ### Accuracy for Each Phase 143 | 144 | We provide the accuracy for each phase on CIFAR-100, ImageNet-Subset, and ImageNet-Full in different settings (*N=5, 10, 25*). 145 |
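As a rough guide to how the *N* settings relate to the command-line flags, the sketch below assumes that *N* counts the incremental phases after the 0-th phase and that the remaining classes are split evenly across them. The helper `num_incremental_phases` is a hypothetical name used only for illustration. The values 100, 50, and 10/5/2 are the CIFAR-100 settings used in the commands above.

```python
# Illustrative sketch only: `num_incremental_phases` is a hypothetical helper,
# not a function from this repository.

def num_incremental_phases(num_classes, nb_cl_fg, nb_cl):
    """Number of phases after the 0-th phase, assuming an even class split."""
    remaining = num_classes - nb_cl_fg
    if remaining < 0 or remaining % nb_cl != 0:
        raise ValueError("remaining classes must split evenly into phases")
    return remaining // nb_cl


# CIFAR-100 settings: 100 classes in total, 50 classes in the 0-th phase.
for nb_cl in (10, 5, 2):
    phases = num_incremental_phases(100, 50, nb_cl)
    print(f"--nb_cl={nb_cl} -> N={phases}")
```

Under these assumptions, `--nb_cl=10`, `--nb_cl=5`, and `--nb_cl=2` correspond to *N=5*, *N=10*, and *N=25* respectively.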
146 | You may view the results using the following link: 147 | [\[Google Sheet Link\]](https://docs.google.com/spreadsheets/d/1rSA0IH7OilDgfx2cvl86ixjVno4I15bmrDWkS4cUtBA/edit?usp=sharing) 148 |
149 | Please note that we re-ran some experiments, so some results differ slightly from the tables in the paper. 150 | 151 | 152 | ### Citation 153 | 154 | Please cite our paper if it is helpful to your work: 155 | 156 | ```bibtex 157 | @inproceedings{Liu2020AANets, 158 | author = {Liu, Yaoyao and Schiele, Bernt and Sun, Qianru}, 159 | title = {Adaptive Aggregation Networks for Class-Incremental Learning}, 160 | booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 161 | pages = {2544-2553}, 162 | year = {2021} 163 | } 164 | ``` 165 | 166 | ### Acknowledgements 167 | 168 | Our implementation uses the source code from the following repositories: 169 | 170 | * [Learning a Unified Classifier Incrementally via Rebalancing](https://github.com/hshustc/CVPR19_Incremental_Learning) 171 | 172 | * [iCaRL: Incremental Classifier and Representation Learning](https://github.com/srebuffi/iCaRL) 173 | 174 | * [PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning](https://github.com/arthurdouillard/incremental_learning.pytorch) 175 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/main.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Max Planck Institute for Informatics 4 | ## yaoyao.liu@mpi-inf.mpg.de 5 | ## Copyright (c) 2021 6 | ## 7 | ## This source code is licensed under the MIT-style license found in the 8 | ## LICENSE file in the root directory of this source tree 9 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 10 | """ Main function for this project. 
""" 11 | import os 12 | import argparse 13 | import numpy as np 14 | from trainer.trainer import Trainer 15 | from utils.gpu_tools import occupy_memory 16 | 17 | if __name__ == '__main__': 18 | parser = argparse.ArgumentParser() 19 | 20 | ### Basic parameters 21 | parser.add_argument('--gpu', default='0', help='the index of GPU') 22 | parser.add_argument('--dataset', default='cifar100', type=str, choices=['cifar100', 'imagenet_sub', 'imagenet']) 23 | parser.add_argument('--data_dir', default='data/seed_1993_subset_100_imagenet/data', type=str) 24 | parser.add_argument('--baseline', default='lucir', type=str, choices=['lucir', 'icarl'], help='baseline method') 25 | parser.add_argument('--ckpt_label', type=str, default='exp01', help='the label for the checkpoints') 26 | parser.add_argument('--ckpt_dir_fg', type=str, default='-', help='the checkpoint file for the 0-th phase') 27 | parser.add_argument('--resume_fg', action='store_true', help='resume 0-th phase model from the checkpoint') 28 | parser.add_argument('--resume', action='store_true', help='resume from the checkpoints') 29 | parser.add_argument('--num_workers', default=1, type=int, help='the number of workers for loading data') 30 | parser.add_argument('--random_seed', default=1993, type=int, help='random seed') 31 | parser.add_argument('--train_batch_size', default=128, type=int, help='the batch size for train loader') 32 | parser.add_argument('--test_batch_size', default=100, type=int, help='the batch size for test loader') 33 | parser.add_argument('--eval_batch_size', default=128, type=int, help='the batch size for validation loader') 34 | parser.add_argument('--disable_gpu_occupancy', action='store_false', help='disable GPU occupancy') 35 | 36 | ### Network architecture parameters 37 | parser.add_argument('--branch_mode', default='dual', type=str, choices=['dual', 'single'], help='the branch mode for AANets') 38 | parser.add_argument('--branch_1', default='ss', type=str, choices=['ss', 'fixed', 'free'], 
help='the network type for the first branch') 39 | parser.add_argument('--branch_2', default='free', type=str, choices=['ss', 'fixed', 'free'], help='the network type for the second branch') 40 | parser.add_argument('--imgnet_backbone', default='resnet18', type=str, choices=['resnet18', 'resnet34'], help='network backbone for ImageNet') 41 | 42 | ### Incremental learning parameters 43 | parser.add_argument('--num_classes', default=100, type=int, help='the total number of classes') 44 | parser.add_argument('--nb_cl_fg', default=50, type=int, help='the number of classes in the 0-th phase') 45 | parser.add_argument('--nb_cl', default=10, type=int, help='the number of classes for each phase') 46 | parser.add_argument('--nb_protos', default=20, type=int, help='the number of exemplars for each class') 47 | parser.add_argument('--epochs', default=160, type=int, help='the number of epochs') 48 | parser.add_argument('--dynamic_budget', action='store_true', help='using dynamic budget setting') 49 | parser.add_argument('--fusion_lr', default=1e-8, type=float, help='the learning rate for the aggregation weights') 50 | 51 | ### General learning parameters 52 | parser.add_argument('--lr_factor', default=0.1, type=float, help='learning rate decay factor') 53 | parser.add_argument('--custom_weight_decay', default=5e-4, type=float, help='weight decay parameter for the optimizer') 54 | parser.add_argument('--custom_momentum', default=0.9, type=float, help='momentum parameter for the optimizer') 55 | parser.add_argument('--base_lr1', default=0.1, type=float, help='learning rate for the 0-th phase') 56 | parser.add_argument('--base_lr2', default=0.1, type=float, help='learning rate for the following phases') 57 | 58 | ### LUCIR parameters 59 | parser.add_argument('--the_lambda', default=5, type=float, help='lamda for LF') 60 | parser.add_argument('--dist', default=0.5, type=float, help='dist for margin ranking losses') 61 | parser.add_argument('--K', default=2, type=int, help='K for 
margin ranking losses') 62 | parser.add_argument('--lw_mr', default=1, type=float, help='loss weight for margin ranking losses') 63 | 64 | ### iCaRL parameters 65 | parser.add_argument('--icarl_beta', default=0.25, type=float, help='beta for iCaRL') 66 | parser.add_argument('--icarl_T', default=2, type=int, help='T for iCaRL') 67 | 68 | the_args = parser.parse_args() 69 | 70 | # Check the number of classes and ensure they are reasonable 71 | assert(the_args.nb_cl_fg % the_args.nb_cl == 0) 72 | assert(the_args.nb_cl_fg >= the_args.nb_cl) 73 | 74 | # Print the parameters 75 | print(the_args) 76 | 77 | # Set GPU index 78 | os.environ['CUDA_VISIBLE_DEVICES'] = the_args.gpu 79 | print('Using gpu:', the_args.gpu) 80 | 81 | # Occupy GPU memory in advance 82 | if the_args.disable_gpu_occupancy: 83 | occupy_memory(the_args.gpu) 84 | print('Occupy GPU memory in advance.') 85 | 86 | # Set the trainer and start training 87 | trainer = Trainer(the_args) 88 | trainer.train() 89 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/models/modified_linear.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | import math 12 | import torch 13 | from torch.nn.parameter import Parameter 14 | from torch.nn import functional as F 15 | from torch.nn import Module 16 | 17 | class CosineLinear(Module): 18 | def __init__(self, in_features, out_features, sigma=True): 19 | super(CosineLinear, 
self).__init__() 20 | self.in_features = in_features 21 | self.out_features = out_features 22 | self.weight = Parameter(torch.Tensor(out_features, in_features)) 23 | if sigma: 24 | self.sigma = Parameter(torch.Tensor(1)) 25 | else: 26 | self.register_parameter('sigma', None) 27 | self.reset_parameters() 28 | 29 | def reset_parameters(self): 30 | stdv = 1. / math.sqrt(self.weight.size(1)) 31 | self.weight.data.uniform_(-stdv, stdv) 32 | if self.sigma is not None: 33 | self.sigma.data.fill_(1) 34 | 35 | def forward(self, input): 36 | out = F.linear(F.normalize(input, p=2,dim=1), \ 37 | F.normalize(self.weight, p=2, dim=1)) 38 | if self.sigma is not None: 39 | out = self.sigma * out 40 | return out 41 | 42 | class SplitCosineLinear(Module): 43 | def __init__(self, in_features, out_features1, out_features2, sigma=True): 44 | super(SplitCosineLinear, self).__init__() 45 | self.in_features = in_features 46 | self.out_features = out_features1 + out_features2 47 | self.fc1 = CosineLinear(in_features, out_features1, False) 48 | self.fc2 = CosineLinear(in_features, out_features2, False) 49 | if sigma: 50 | self.sigma = Parameter(torch.Tensor(1)) 51 | self.sigma.data.fill_(1) 52 | else: 53 | self.register_parameter('sigma', None) 54 | 55 | def forward(self, x): 56 | out1 = self.fc1(x) 57 | out2 = self.fc2(x) 58 | out = torch.cat((out1, out2), dim=1) 59 | if self.sigma is not None: 60 | out = self.sigma * out 61 | return out 62 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/models/modified_resnet.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed 
under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | import torch.nn as nn 12 | import math 13 | import torch.utils.model_zoo as model_zoo 14 | import models.modified_linear as modified_linear 15 | 16 | def conv3x3(in_planes, out_planes, stride=1): 17 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 18 | padding=1, bias=False) 19 | 20 | class BasicBlock(nn.Module): 21 | expansion = 1 22 | 23 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 24 | super(BasicBlock, self).__init__() 25 | self.conv1 = conv3x3(inplanes, planes, stride) 26 | self.bn1 = nn.BatchNorm2d(planes) 27 | self.relu = nn.ReLU(inplace=True) 28 | self.conv2 = conv3x3(planes, planes) 29 | self.bn2 = nn.BatchNorm2d(planes) 30 | self.downsample = downsample 31 | self.stride = stride 32 | self.last = last 33 | 34 | def forward(self, x): 35 | residual = x 36 | 37 | out = self.conv1(x) 38 | out = self.bn1(out) 39 | out = self.relu(out) 40 | 41 | out = self.conv2(out) 42 | out = self.bn2(out) 43 | 44 | if self.downsample is not None: 45 | residual = self.downsample(x) 46 | 47 | out += residual 48 | if not self.last: 49 | out = self.relu(out) 50 | 51 | return out 52 | 53 | class ResNet(nn.Module): 54 | 55 | def __init__(self, block, layers, num_classes=1000): 56 | self.inplanes = 64 57 | super(ResNet, self).__init__() 58 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 59 | bias=False) 60 | self.bn1 = nn.BatchNorm2d(64) 61 | self.relu = nn.ReLU(inplace=True) 62 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 63 | self.layer1 = self._make_layer(block, 64, layers[0]) 64 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 65 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 66 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, last_phase=True) 
67 | self.avgpool = nn.AvgPool2d(7, stride=1) 68 | self.fc = modified_linear.CosineLinear(512 * block.expansion, num_classes) 69 | 70 | for m in self.modules(): 71 | if isinstance(m, nn.Conv2d): 72 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 73 | elif isinstance(m, nn.BatchNorm2d): 74 | nn.init.constant_(m.weight, 1) 75 | nn.init.constant_(m.bias, 0) 76 | 77 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 78 | downsample = None 79 | if stride != 1 or self.inplanes != planes * block.expansion: 80 | downsample = nn.Sequential( 81 | nn.Conv2d(self.inplanes, planes * block.expansion, 82 | kernel_size=1, stride=stride, bias=False), 83 | nn.BatchNorm2d(planes * block.expansion), 84 | ) 85 | 86 | layers = [] 87 | layers.append(block(self.inplanes, planes, stride, downsample)) 88 | self.inplanes = planes * block.expansion 89 | if last_phase: 90 | for i in range(1, blocks-1): 91 | layers.append(block(self.inplanes, planes)) 92 | layers.append(block(self.inplanes, planes, last=True)) 93 | else: 94 | for i in range(1, blocks): 95 | layers.append(block(self.inplanes, planes)) 96 | 97 | return nn.Sequential(*layers) 98 | 99 | def forward(self, x): 100 | x = self.conv1(x) 101 | x = self.bn1(x) 102 | x = self.relu(x) 103 | x = self.maxpool(x) 104 | 105 | x = self.layer1(x) 106 | x = self.layer2(x) 107 | x = self.layer3(x) 108 | x = self.layer4(x) 109 | 110 | x = self.avgpool(x) 111 | x = x.view(x.size(0), -1) 112 | x = self.fc(x) 113 | 114 | return x 115 | 116 | def resnet18(pretrained=False, **kwargs): 117 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs) 118 | return model 119 | 120 | def resnet34(pretrained=False, **kwargs): 121 | model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs) 122 | return model 123 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/models/modified_resnet_cifar.py: 
-------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | import torch.nn as nn 12 | import math 13 | import torch.utils.model_zoo as model_zoo 14 | import models.modified_linear as modified_linear 15 | 16 | def conv3x3(in_planes, out_planes, stride=1): 17 | """3x3 convolution with padding""" 18 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 19 | padding=1, bias=False) 20 | 21 | class BasicBlock(nn.Module): 22 | expansion = 1 23 | 24 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 25 | super(BasicBlock, self).__init__() 26 | self.conv1 = conv3x3(inplanes, planes, stride) 27 | self.bn1 = nn.BatchNorm2d(planes) 28 | self.relu = nn.ReLU(inplace=True) 29 | self.conv2 = conv3x3(planes, planes) 30 | self.bn2 = nn.BatchNorm2d(planes) 31 | self.downsample = downsample 32 | self.stride = stride 33 | self.last = last 34 | 35 | def forward(self, x): 36 | residual = x 37 | 38 | out = self.conv1(x) 39 | out = self.bn1(out) 40 | out = self.relu(out) 41 | 42 | out = self.conv2(out) 43 | out = self.bn2(out) 44 | 45 | if self.downsample is not None: 46 | residual = self.downsample(x) 47 | 48 | out += residual 49 | if not self.last: 50 | out = self.relu(out) 51 | 52 | return out 53 | 54 | class ResNet(nn.Module): 55 | 56 | def __init__(self, block, layers, num_classes=10): 57 | self.inplanes = 16 58 | super(ResNet, self).__init__() 59 | self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, 
padding=1, 60 | bias=False) 61 | self.bn1 = nn.BatchNorm2d(16) 62 | self.relu = nn.ReLU(inplace=True) 63 | self.layer1 = self._make_layer(block, 16, layers[0]) 64 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 65 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2, last_phase=True) 66 | self.avgpool = nn.AvgPool2d(8, stride=1) 67 | self.fc = modified_linear.CosineLinear(64 * block.expansion, num_classes) 68 | 69 | for m in self.modules(): 70 | if isinstance(m, nn.Conv2d): 71 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 72 | elif isinstance(m, nn.BatchNorm2d): 73 | nn.init.constant_(m.weight, 1) 74 | nn.init.constant_(m.bias, 0) 75 | 76 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 77 | downsample = None 78 | if stride != 1 or self.inplanes != planes * block.expansion: 79 | downsample = nn.Sequential( 80 | nn.Conv2d(self.inplanes, planes * block.expansion, 81 | kernel_size=1, stride=stride, bias=False), 82 | nn.BatchNorm2d(planes * block.expansion), 83 | ) 84 | 85 | layers = [] 86 | layers.append(block(self.inplanes, planes, stride, downsample)) 87 | self.inplanes = planes * block.expansion 88 | if last_phase: 89 | for i in range(1, blocks-1): 90 | layers.append(block(self.inplanes, planes)) 91 | layers.append(block(self.inplanes, planes, last=True)) 92 | else: 93 | for i in range(1, blocks): 94 | layers.append(block(self.inplanes, planes)) 95 | 96 | return nn.Sequential(*layers) 97 | 98 | def forward(self, x): 99 | x = self.conv1(x) 100 | x = self.bn1(x) 101 | x = self.relu(x) 102 | 103 | x = self.layer1(x) 104 | x = self.layer2(x) 105 | x = self.layer3(x) 106 | 107 | x = self.avgpool(x) 108 | x = x.view(x.size(0), -1) 109 | x = self.fc(x) 110 | 111 | return x 112 | 113 | def resnet20(pretrained=False, **kwargs): 114 | n = 3 115 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 116 | return model 117 | 118 | def resnet32(pretrained=False, **kwargs): 119 | n = 5 120 | model = 
ResNet(BasicBlock, [n, n, n], **kwargs) 121 | return model 122 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/models/modified_resnetmtl.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | import torch.nn as nn 12 | import math 13 | import torch.utils.model_zoo as model_zoo 14 | import models.modified_linear as modified_linear 15 | from utils.incremental.conv2d_mtl import Conv2dMtl 16 | 17 | def conv3x3mtl(in_planes, out_planes, stride=1): 18 | """3x3 convolution with padding""" 19 | return Conv2dMtl(in_planes, out_planes, kernel_size=3, stride=stride, 20 | padding=1, bias=False) 21 | 22 | 23 | class BasicBlockMtl(nn.Module): 24 | expansion = 1 25 | 26 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 27 | super(BasicBlockMtl, self).__init__() 28 | self.conv1 = conv3x3mtl(inplanes, planes, stride) 29 | self.bn1 = nn.BatchNorm2d(planes) 30 | self.relu = nn.ReLU(inplace=True) 31 | self.conv2 = conv3x3mtl(planes, planes) 32 | self.bn2 = nn.BatchNorm2d(planes) 33 | self.downsample = downsample 34 | self.stride = stride 35 | self.last = last 36 | 37 | def forward(self, x): 38 | residual = x 39 | 40 | out = self.conv1(x) 41 | out = self.bn1(out) 42 | out = self.relu(out) 43 | 44 | out = self.conv2(out) 45 | out = self.bn2(out) 46 | 47 | if self.downsample is not None: 48 | residual = self.downsample(x) 49 | 50 | out += residual 51 | 
if not self.last: 52 | out = self.relu(out) 53 | 54 | return out 55 | 56 | class ResNetMtl(nn.Module): 57 | 58 | def __init__(self, block, layers, num_classes=1000): 59 | self.inplanes = 64 60 | super(ResNetMtl, self).__init__() 61 | self.conv1 = Conv2dMtl(3, 64, kernel_size=7, stride=2, padding=3, 62 | bias=False) 63 | self.bn1 = nn.BatchNorm2d(64) 64 | self.relu = nn.ReLU(inplace=True) 65 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 66 | self.layer1 = self._make_layer(block, 64, layers[0]) 67 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 68 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 69 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, last_phase=True) 70 | self.avgpool = nn.AvgPool2d(7, stride=1) 71 | self.fc = modified_linear.CosineLinear(512 * block.expansion, num_classes) 72 | 73 | for m in self.modules(): 74 | if isinstance(m, Conv2dMtl): 75 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 76 | elif isinstance(m, nn.BatchNorm2d): 77 | nn.init.constant_(m.weight, 1) 78 | nn.init.constant_(m.bias, 0) 79 | 80 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 81 | downsample = None 82 | if stride != 1 or self.inplanes != planes * block.expansion: 83 | downsample = nn.Sequential( 84 | Conv2dMtl(self.inplanes, planes * block.expansion, 85 | kernel_size=1, stride=stride, bias=False), 86 | nn.BatchNorm2d(planes * block.expansion), 87 | ) 88 | 89 | layers = [] 90 | layers.append(block(self.inplanes, planes, stride, downsample)) 91 | self.inplanes = planes * block.expansion 92 | if last_phase: 93 | for i in range(1, blocks-1): 94 | layers.append(block(self.inplanes, planes)) 95 | layers.append(block(self.inplanes, planes, last=True)) 96 | else: 97 | for i in range(1, blocks): 98 | layers.append(block(self.inplanes, planes)) 99 | 100 | return nn.Sequential(*layers) 101 | 102 | def forward(self, x): 103 | x = self.conv1(x) 104 | x = 
self.bn1(x) 105 | x = self.relu(x) 106 | x = self.maxpool(x) 107 | 108 | x = self.layer1(x) 109 | x = self.layer2(x) 110 | x = self.layer3(x) 111 | x = self.layer4(x) 112 | 113 | x = self.avgpool(x) 114 | x = x.view(x.size(0), -1) 115 | x = self.fc(x) 116 | 117 | return x 118 | 119 | def resnetmtl18(pretrained=False, **kwargs): 120 | model = ResNetMtl(BasicBlockMtl, [2, 2, 2, 2], **kwargs) 121 | return model 122 | 123 | def resnetmtl34(pretrained=False, **kwargs): 124 | model = ResNetMtl(BasicBlockMtl, [3, 4, 6, 3], **kwargs) 125 | return model 126 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/models/modified_resnetmtl_cifar.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | import torch.nn as nn 12 | import math 13 | import torch.utils.model_zoo as model_zoo 14 | import models.modified_linear as modified_linear 15 | from utils.incremental.conv2d_mtl import Conv2dMtl 16 | 17 | def conv3x3mtl(in_planes, out_planes, stride=1): 18 | return Conv2dMtl(in_planes, out_planes, kernel_size=3, stride=stride, 19 | padding=1, bias=False) 20 | 21 | class BasicBlockMtl(nn.Module): 22 | expansion = 1 23 | 24 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 25 | super(BasicBlockMtl, self).__init__() 26 | self.conv1 = conv3x3mtl(inplanes, planes, stride) 27 | self.bn1 = nn.BatchNorm2d(planes) 28 | self.relu = nn.ReLU(inplace=True) 29 | 
self.conv2 = conv3x3mtl(planes, planes) 30 | self.bn2 = nn.BatchNorm2d(planes) 31 | self.downsample = downsample 32 | self.stride = stride 33 | self.last = last 34 | 35 | def forward(self, x): 36 | residual = x 37 | 38 | out = self.conv1(x) 39 | out = self.bn1(out) 40 | out = self.relu(out) 41 | 42 | out = self.conv2(out) 43 | out = self.bn2(out) 44 | 45 | if self.downsample is not None: 46 | residual = self.downsample(x) 47 | 48 | out += residual 49 | if not self.last: 50 | out = self.relu(out) 51 | 52 | return out 53 | 54 | class ResNetMtl(nn.Module): 55 | 56 | def __init__(self, block, layers, num_classes=10): 57 | self.inplanes = 16 58 | super(ResNetMtl, self).__init__() 59 | self.conv1 = Conv2dMtl(3, 16, kernel_size=3, stride=1, padding=1, 60 | bias=False) 61 | self.bn1 = nn.BatchNorm2d(16) 62 | self.relu = nn.ReLU(inplace=True) 63 | self.layer1 = self._make_layer(block, 16, layers[0]) 64 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 65 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2, last_phase=True) 66 | self.avgpool = nn.AvgPool2d(8, stride=1) 67 | self.fc = modified_linear.CosineLinear(64 * block.expansion, num_classes) 68 | 69 | for m in self.modules(): 70 | if isinstance(m, Conv2dMtl): 71 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 72 | elif isinstance(m, nn.BatchNorm2d): 73 | nn.init.constant_(m.weight, 1) 74 | nn.init.constant_(m.bias, 0) 75 | 76 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 77 | downsample = None 78 | if stride != 1 or self.inplanes != planes * block.expansion: 79 | downsample = nn.Sequential( 80 | Conv2dMtl(self.inplanes, planes * block.expansion, 81 | kernel_size=1, stride=stride, bias=False), 82 | nn.BatchNorm2d(planes * block.expansion), 83 | ) 84 | 85 | layers = [] 86 | layers.append(block(self.inplanes, planes, stride, downsample)) 87 | self.inplanes = planes * block.expansion 88 | if last_phase: 89 | for i in range(1, blocks-1): 90 | 
layers.append(block(self.inplanes, planes)) 91 | layers.append(block(self.inplanes, planes, last=True)) 92 | else: 93 | for i in range(1, blocks): 94 | layers.append(block(self.inplanes, planes)) 95 | 96 | return nn.Sequential(*layers) 97 | 98 | def forward(self, x): 99 | x = self.conv1(x) 100 | x = self.bn1(x) 101 | x = self.relu(x) 102 | 103 | x = self.layer1(x) 104 | x = self.layer2(x) 105 | x = self.layer3(x) 106 | 107 | x = self.avgpool(x) 108 | x = x.view(x.size(0), -1) 109 | x = self.fc(x) 110 | 111 | return x 112 | 113 | def resnetmtl20(pretrained=False, **kwargs): 114 | n = 3 115 | model = ResNetMtl(BasicBlockMtl, [n, n, n], **kwargs) 116 | return model 117 | 118 | def resnetmtl32(pretrained=False, **kwargs): 119 | n = 5 120 | model = ResNetMtl(BasicBlockMtl, [n, n, n], **kwargs) 121 | return model 122 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/models/resnet_cifar.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | import torch.nn as nn 12 | import math 13 | import torch.utils.model_zoo as model_zoo 14 | from utils.incremental.conv2d_mtl import Conv2d 15 | 16 | def conv3x3(in_planes, out_planes, stride=1): 17 | """3x3 convolution with padding""" 18 | return Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 19 | padding=1, bias=False) 20 | 21 | class BasicBlock(nn.Module): 22 | expansion = 1 23 | 24 | def __init__(self, inplanes, 
planes, stride=1, downsample=None): 25 | super(BasicBlock, self).__init__() 26 | self.conv1 = conv3x3(inplanes, planes, stride) 27 | self.bn1 = nn.BatchNorm2d(planes) 28 | self.relu = nn.ReLU(inplace=True) 29 | self.conv2 = conv3x3(planes, planes) 30 | self.bn2 = nn.BatchNorm2d(planes) 31 | self.downsample = downsample 32 | self.stride = stride 33 | 34 | def forward(self, x): 35 | residual = x 36 | 37 | 38 | out = self.conv1(x) 39 | out = self.bn1(out) 40 | out = self.relu(out) 41 | 42 | out = self.conv2(out) 43 | out = self.bn2(out) 44 | 45 | if self.downsample is not None: 46 | residual = self.downsample(x) 47 | 48 | out += residual 49 | out = self.relu(out) 50 | 51 | return out 52 | 53 | 54 | class Bottleneck(nn.Module): 55 | expansion = 4 56 | 57 | def __init__(self, inplanes, planes, stride=1, downsample=None): 58 | super(Bottleneck, self).__init__() 59 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 60 | self.bn1 = nn.BatchNorm2d(planes) 61 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 62 | padding=1, bias=False) 63 | self.bn2 = nn.BatchNorm2d(planes) 64 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False) 65 | self.bn3 = nn.BatchNorm2d(planes * self.expansion) 66 | self.relu = nn.ReLU(inplace=True) 67 | self.downsample = downsample 68 | self.stride = stride 69 | 70 | def forward(self, x): 71 | residual = x 72 | 73 | out = self.conv1(x) 74 | out = self.bn1(out) 75 | out = self.relu(out) 76 | 77 | out = self.conv2(out) 78 | out = self.bn2(out) 79 | out = self.relu(out) 80 | 81 | out = self.conv3(out) 82 | out = self.bn3(out) 83 | 84 | if self.downsample is not None: 85 | residual = self.downsample(x) 86 | 87 | out += residual 88 | out = self.relu(out) 89 | 90 | return out 91 | 92 | 93 | class ResNet(nn.Module): 94 | 95 | def __init__(self, block, layers, num_classes=10): 96 | self.inplanes = 16 97 | super(ResNet, self).__init__() 98 | self.conv1 = 
nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, 99 | bias=False) 100 | self.bn1 = nn.BatchNorm2d(16) 101 | self.relu = nn.ReLU(inplace=True) 102 | self.layer1 = self._make_layer(block, 16, layers[0]) 103 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 104 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2) 105 | self.avgpool = nn.AvgPool2d(8, stride=1) 106 | self.fc = nn.Linear(64 * block.expansion, num_classes) 107 | 108 | for m in self.modules(): 109 | if isinstance(m, nn.Conv2d): 110 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 111 | elif isinstance(m, nn.BatchNorm2d): 112 | nn.init.constant_(m.weight, 1) 113 | nn.init.constant_(m.bias, 0) 114 | 115 | def _make_layer(self, block, planes, blocks, stride=1): 116 | downsample = None 117 | if stride != 1 or self.inplanes != planes * block.expansion: 118 | downsample = nn.Sequential( 119 | nn.Conv2d(self.inplanes, planes * block.expansion, 120 | kernel_size=1, stride=stride, bias=False), 121 | nn.BatchNorm2d(planes * block.expansion), 122 | ) 123 | 124 | layers = [] 125 | layers.append(block(self.inplanes, planes, stride, downsample)) 126 | self.inplanes = planes * block.expansion 127 | for i in range(1, blocks): 128 | layers.append(block(self.inplanes, planes)) 129 | 130 | return nn.Sequential(*layers) 131 | 132 | def forward(self, x): 133 | x = self.conv1(x) 134 | x = self.bn1(x) 135 | x = self.relu(x) 136 | 137 | x = self.layer1(x) 138 | x = self.layer2(x) 139 | x = self.layer3(x) 140 | 141 | x = self.avgpool(x) 142 | x = x.view(x.size(0), -1) 143 | x = self.fc(x) 144 | 145 | return x 146 | 147 | def resnet20(pretrained=False, **kwargs): 148 | n = 3 149 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 150 | return model 151 | 152 | def resnet32(pretrained=False, **kwargs): 153 | n = 5 154 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 155 | return model 156 | 157 | def resnet56(pretrained=False, **kwargs): 158 | n = 9 159 | model = 
ResNet(Bottleneck, [n, n, n], **kwargs) 160 | return model 161 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/trainer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/adaptive-aggregation-networks/trainer/__init__.py -------------------------------------------------------------------------------- /adaptive-aggregation-networks/trainer/incremental_icarl.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2019 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ Training code for iCaRL """ 12 | import torch 13 | import tqdm 14 | import numpy as np 15 | import torch.nn as nn 16 | import torchvision 17 | from torch.optim import lr_scheduler 18 | from torchvision import datasets, models, transforms 19 | from utils.misc import * 20 | from utils.process_fp import process_inputs_fp 21 | import torch.nn.functional as F 22 | 23 | def incremental_train_and_eval(the_args, epochs, fusion_vars, ref_fusion_vars, b1_model, ref_model, b2_model, ref_b2_model, tg_optimizer, tg_lr_scheduler, fusion_optimizer, fusion_lr_scheduler, trainloader, testloader, iteration, start_iteration, X_protoset_cumuls, Y_protoset_cumuls, order_list,lamda, dist, K, lw_mr, balancedloader, T=None, beta=None, fix_bn=False, weight_per_class=None, device=None): 24 | 25 | # Setting up the CUDA 
device 26 | if device is None: 27 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 28 | # Set the 1st branch reference model to the evaluation mode 29 | ref_model.eval() 30 | 31 | # Get the number of old classes 32 | num_old_classes = ref_model.fc.out_features 33 | 34 | # If the 2nd branch reference is not None, set it to the evaluation mode 35 | if iteration > start_iteration+1: 36 | ref_b2_model.eval() 37 | 38 | for epoch in range(epochs): 39 | # Start training for the current phase, set the two branch models to the training mode 40 | b1_model.train() 41 | b2_model.train() 42 | 43 | # Fix the batch norm parameters according to the config 44 | if fix_bn: 45 | for m in b1_model.modules(): 46 | if isinstance(m, nn.BatchNorm2d): 47 | m.eval() 48 | 49 | # Set all the losses to zero 50 | train_loss = 0 51 | train_loss1 = 0 52 | train_loss2 = 0 53 | # Set the counters to zero 54 | correct = 0 55 | total = 0 56 | 57 | # Learning rate decay 58 | tg_lr_scheduler.step() 59 | fusion_lr_scheduler.step() 60 | 61 | # Print the information 62 | print('\nEpoch: %d, learning rate: ' % epoch, end='') 63 | print(tg_lr_scheduler.get_lr()[0]) 64 | 65 | for batch_idx, (inputs, targets) in enumerate(trainloader): 66 | 67 | # Get a batch of training samples, transfer them to the device 68 | inputs, targets = inputs.to(device), targets.to(device) 69 | 70 | # Clear the gradients of the parameters for the tg_optimizer 71 | tg_optimizer.zero_grad() 72 | 73 | # Forward the samples in the deep networks 74 | outputs, _ = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 75 | 76 | if iteration == start_iteration+1: 77 | ref_outputs = ref_model(inputs) 78 | else: 79 | ref_outputs, ref_features_new = process_inputs_fp(the_args, ref_fusion_vars, ref_model, ref_b2_model, inputs) 80 | # Loss 1: logits-level distillation loss 81 | loss1 = nn.KLDivLoss()(F.log_softmax(outputs[:,:num_old_classes]/T, dim=1), \ 82 | F.softmax(ref_outputs.detach()/T, dim=1)) * T 
* T * beta * num_old_classes 83 | # Loss 2: classification loss 84 | loss2 = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 85 | # Sum up all losses 86 | loss = loss1 + loss2 87 | 88 | # Backward and update the parameters 89 | loss.backward() 90 | tg_optimizer.step() 91 | 92 | # Record the losses and the number of samples to compute the accuracy 93 | train_loss += loss.item() 94 | train_loss1 += loss1.item() 95 | train_loss2 += loss2.item() 96 | _, predicted = outputs.max(1) 97 | total += targets.size(0) 98 | correct += predicted.eq(targets).sum().item() 99 | 100 | # Print the training losses and accuracies 101 | print('Train set: {}, train loss1: {:.4f}, train loss2: {:.4f}, train loss: {:.4f} accuracy: {:.4f}'.format(len(trainloader), train_loss1/(batch_idx+1), train_loss2/(batch_idx+1), train_loss/(batch_idx+1), 100.*correct/total)) 102 | 103 | # Update the aggregation weights 104 | b1_model.eval() 105 | b2_model.eval() 106 | 107 | for batch_idx, (inputs, targets) in enumerate(balancedloader): 108 | fusion_optimizer.zero_grad() 109 | inputs, targets = inputs.to(device), targets.to(device) 110 | outputs, _ = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 111 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 112 | loss.backward() 113 | fusion_optimizer.step() 114 | 115 | # Running the test for this epoch 116 | b1_model.eval() 117 | b2_model.eval() 118 | test_loss = 0 119 | correct = 0 120 | total = 0 121 | with torch.no_grad(): 122 | for batch_idx, (inputs, targets) in enumerate(testloader): 123 | inputs, targets = inputs.to(device), targets.to(device) 124 | outputs, _ = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 125 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 126 | test_loss += loss.item() 127 | _, predicted = outputs.max(1) 128 | total += targets.size(0) 129 | correct += predicted.eq(targets).sum().item() 130 | print('Test set: {} test loss: {:.4f} accuracy: 
{:.4f}'.format(len(testloader), test_loss/(batch_idx+1), 100.*correct/total)) 131 | 132 | print("Removing register forward hook") 133 | return b1_model, b2_model 134 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/trainer/incremental_lucir.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ Training code for LUCIR """ 12 | import torch 13 | import tqdm 14 | import numpy as np 15 | import torch.nn as nn 16 | import torchvision 17 | from torch.optim import lr_scheduler 18 | from torchvision import datasets, models, transforms 19 | from utils.misc import * 20 | from utils.process_fp import process_inputs_fp 21 | 22 | cur_features = [] 23 | ref_features = [] 24 | old_scores = [] 25 | new_scores = [] 26 | 27 | def get_ref_features(self, inputs, outputs): 28 | global ref_features 29 | ref_features = inputs[0] 30 | 31 | def get_cur_features(self, inputs, outputs): 32 | global cur_features 33 | cur_features = inputs[0] 34 | 35 | def get_old_scores_before_scale(self, inputs, outputs): 36 | global old_scores 37 | old_scores = outputs 38 | 39 | def get_new_scores_before_scale(self, inputs, outputs): 40 | global new_scores 41 | new_scores = outputs 42 | 43 | def map_labels(order_list, Y_set): 44 | map_Y = [] 45 | for idx in Y_set: 46 | map_Y.append(order_list.index(idx)) 47 | map_Y = np.array(map_Y) 48 | return map_Y 49 | 50 | 51 | def 
incremental_train_and_eval(the_args, epochs, fusion_vars, ref_fusion_vars, b1_model, ref_model, b2_model, ref_b2_model, \ 52 | tg_optimizer, tg_lr_scheduler, fusion_optimizer, fusion_lr_scheduler, trainloader, testloader, iteration, \ 53 | start_iteration, X_protoset_cumuls, Y_protoset_cumuls, order_list, the_lambda, dist, \ 54 | K, lw_mr, balancedloader, fix_bn=False, weight_per_class=None, device=None): 55 | 56 | # Setting up the CUDA device 57 | if device is None: 58 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 59 | # Set the 1st branch reference model to the evaluation mode 60 | ref_model.eval() 61 | 62 | # Get the number of old classes 63 | num_old_classes = ref_model.fc.out_features 64 | 65 | # Get the features from the current and the reference model 66 | handle_ref_features = ref_model.fc.register_forward_hook(get_ref_features) 67 | handle_cur_features = b1_model.fc.register_forward_hook(get_cur_features) 68 | handle_old_scores_bs = b1_model.fc.fc1.register_forward_hook(get_old_scores_before_scale) 69 | handle_new_scores_bs = b1_model.fc.fc2.register_forward_hook(get_new_scores_before_scale) 70 | 71 | # If the 2nd branch reference is not None, set it to the evaluation mode 72 | if iteration > start_iteration+1: 73 | ref_b2_model.eval() 74 | 75 | for epoch in range(epochs): 76 | # Start training for the current phase, set the two branch models to the training mode 77 | b1_model.train() 78 | b2_model.train() 79 | 80 | # Fix the batch norm parameters according to the config 81 | if fix_bn: 82 | for m in b1_model.modules(): 83 | if isinstance(m, nn.BatchNorm2d): 84 | m.eval() 85 | 86 | # Set all the losses to zeros 87 | train_loss = 0 88 | train_loss1 = 0 89 | train_loss2 = 0 90 | train_loss3 = 0 91 | # Set the counters to zeros 92 | correct = 0 93 | total = 0 94 | 95 | # Learning rate decay 96 | tg_lr_scheduler.step() 97 | fusion_lr_scheduler.step() 98 | 99 | # Print the information 100 | print('\nEpoch: %d, learning rate: ' % 
epoch, end='') 101 | print(tg_lr_scheduler.get_lr()[0]) 102 | 103 | for batch_idx, (inputs, targets) in enumerate(trainloader): 104 | 105 | # Get a batch of training samples, transfer them to the device 106 | inputs, targets = inputs.to(device), targets.to(device) 107 | 108 | # Clear the gradients of the parameters for the tg_optimizer 109 | tg_optimizer.zero_grad() 110 | 111 | # Forward the samples in the deep networks 112 | outputs, _ = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 113 | 114 | # Loss 1: feature-level distillation loss 115 | if iteration == start_iteration+1: 116 | ref_outputs = ref_model(inputs) 117 | loss1 = nn.CosineEmbeddingLoss()(cur_features, ref_features.detach(), torch.ones(inputs.shape[0]).to(device)) * the_lambda 118 | else: 119 | ref_outputs, ref_features_new = process_inputs_fp(the_args, ref_fusion_vars, ref_model, ref_b2_model, inputs) 120 | loss1 = nn.CosineEmbeddingLoss()(cur_features, ref_features_new.detach(), torch.ones(inputs.shape[0]).to(device)) * the_lambda 121 | 122 | # Loss 2: classification loss 123 | loss2 = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 124 | 125 | # Loss 3: margin ranking loss 126 | outputs_bs = torch.cat((old_scores, new_scores), dim=1) 127 | assert(outputs_bs.size()==outputs.size()) 128 | gt_index = torch.zeros(outputs_bs.size()).to(device) 129 | gt_index = gt_index.scatter(1, targets.view(-1,1), 1).ge(0.5) 130 | gt_scores = outputs_bs.masked_select(gt_index) 131 | max_novel_scores = outputs_bs[:, num_old_classes:].topk(K, dim=1)[0] 132 | hard_index = targets.lt(num_old_classes) 133 | hard_num = torch.nonzero(hard_index).size(0) 134 | if hard_num > 0: 135 | gt_scores = gt_scores[hard_index].view(-1, 1).repeat(1, K) 136 | max_novel_scores = max_novel_scores[hard_index] 137 | assert(gt_scores.size() == max_novel_scores.size()) 138 | assert(gt_scores.size(0) == hard_num) 139 | loss3 = nn.MarginRankingLoss(margin=dist)(gt_scores.view(-1, 1), max_novel_scores.view(-1, 1), 
torch.ones(hard_num*K).to(device)) * lw_mr 140 | else: 141 | loss3 = torch.zeros(1).to(device) 142 | 143 | # Sum up all losses 144 | loss = loss1 + loss2 + loss3 145 | 146 | # Backward and update the parameters 147 | loss.backward() 148 | tg_optimizer.step() 149 | 150 | # Record the losses and the number of samples to compute the accuracy 151 | train_loss += loss.item() 152 | train_loss1 += loss1.item() 153 | train_loss2 += loss2.item() 154 | train_loss3 += loss3.item() 155 | _, predicted = outputs.max(1) 156 | total += targets.size(0) 157 | correct += predicted.eq(targets).sum().item() 158 | 159 | # Print the training losses and accuracies 160 | print('Train set: {}, train loss1: {:.4f}, train loss2: {:.4f}, train loss3: {:.4f}, train loss: {:.4f} accuracy: {:.4f}'.format(len(trainloader), train_loss1/(batch_idx+1), train_loss2/(batch_idx+1), train_loss3/(batch_idx+1), train_loss/(batch_idx+1), 100.*correct/total)) 161 | 162 | # Update the aggregation weights 163 | b1_model.eval() 164 | b2_model.eval() 165 | 166 | for batch_idx, (inputs, targets) in enumerate(balancedloader): 167 | if batch_idx <= 500: 168 | inputs, targets = inputs.to(device), targets.to(device) 169 | outputs, _ = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 170 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 171 | loss.backward() 172 | fusion_optimizer.step() 173 | 174 | # Running the test for this epoch 175 | b1_model.eval() 176 | b2_model.eval() 177 | test_loss = 0 178 | correct = 0 179 | total = 0 180 | with torch.no_grad(): 181 | for batch_idx, (inputs, targets) in enumerate(testloader): 182 | inputs, targets = inputs.to(device), targets.to(device) 183 | outputs, _ = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 184 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 185 | test_loss += loss.item() 186 | _, predicted = outputs.max(1) 187 | total += targets.size(0) 188 | correct += predicted.eq(targets).sum().item() 
189 | print('Test set: {} test loss: {:.4f} accuracy: {:.4f}'.format(len(testloader), test_loss/(batch_idx+1), 100.*correct/total)) 190 | 191 | print("Removing register forward hook") 192 | handle_ref_features.remove() 193 | handle_cur_features.remove() 194 | handle_old_scores_bs.remove() 195 | handle_new_scores_bs.remove() 196 | return b1_model, b2_model 197 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/trainer/trainer.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ Class-incremental learning trainer. 
""" 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.functional as F 15 | import torch.optim as optim 16 | from torch.optim import lr_scheduler 17 | import torchvision 18 | from torchvision import datasets, models, transforms 19 | from torch.autograd import Variable 20 | from tensorboardX import SummaryWriter 21 | import numpy as np 22 | import time 23 | import os 24 | import os.path as osp 25 | import sys 26 | import copy 27 | import argparse 28 | from PIL import Image 29 | try: 30 | import cPickle as pickle 31 | except ImportError: 32 | import pickle 33 | import math 34 | import utils.misc 35 | import models.modified_resnet_cifar as modified_resnet_cifar 36 | import models.modified_resnetmtl_cifar as modified_resnetmtl_cifar 37 | import models.modified_resnet as modified_resnet 38 | import models.modified_resnetmtl as modified_resnetmtl 39 | import models.modified_linear as modified_linear 40 | from utils.imagenet.utils_dataset import split_images_labels 41 | from utils.imagenet.utils_dataset import merge_images_labels 42 | from utils.incremental.compute_accuracy import compute_accuracy 43 | from trainer.incremental_lucir import incremental_train_and_eval as incremental_train_and_eval_lucir 44 | from trainer.incremental_icarl import incremental_train_and_eval as incremental_train_and_eval_icarl 45 | from trainer.zeroth_phase import incremental_train_and_eval_zeroth_phase as incremental_train_and_eval_zeroth_phase 46 | from utils.misc import process_mnemonics 47 | from trainer.base_trainer import BaseTrainer 48 | import warnings 49 | warnings.filterwarnings('ignore') 50 | 51 | class Trainer(BaseTrainer): 52 | def train(self): 53 | """The class that contains the code for the class-incremental system. 54 | This trainer is based on base_trainer.py in the same folder. 55 | The source code of the functions used in this trainer can be found in base_trainer.py. 
56 | """ 57 | 58 | # Set tensorboard recorder 59 | self.train_writer = SummaryWriter(comment=self.save_path) 60 | 61 | # Initialize the arrays that store the accuracies for each phase 62 | top1_acc_list_cumul = np.zeros((int(self.args.num_classes/self.args.nb_cl), 3, 1)) 63 | top1_acc_list_ori = np.zeros((int(self.args.num_classes/self.args.nb_cl), 3, 1)) 64 | 65 | # Load the training and test samples from the dataset 66 | X_train_total, Y_train_total, X_valid_total, Y_valid_total = self.set_dataset() 67 | 68 | # Initialize the aggregation weights 69 | self.init_fusion_vars() 70 | 71 | # Initialize the class order 72 | order, order_list = self.init_class_order() 73 | np.random.seed(None) 74 | 75 | # Set empty lists for the data 76 | X_valid_cumuls = [] 77 | X_protoset_cumuls = [] 78 | X_train_cumuls = [] 79 | Y_valid_cumuls = [] 80 | Y_protoset_cumuls = [] 81 | Y_train_cumuls = [] 82 | 83 | # Initialize the prototypes 84 | alpha_dr_herding, prototypes = self.init_prototypes(self.dictionary_size, order, X_train_total, Y_train_total) 85 | 86 | # Set the starting iteration 87 | # We start training the class-incremental learning system from, e.g., 50 classes to provide a good initial encoder 88 | start_iter = int(self.args.nb_cl_fg/self.args.nb_cl)-1 89 | 90 | # Set the models and some parameters to None 91 | # These models and parameters will be assigned in the following phases 92 | b1_model = None 93 | ref_model = None 94 | b2_model = None 95 | ref_b2_model = None 96 | the_lambda_mult = None 97 | 98 | for iteration in range(start_iter, int(self.args.num_classes/self.args.nb_cl)): 99 | ### Initialize models for the current phase 100 | b1_model, b2_model, ref_model, ref_b2_model, lambda_mult, cur_lambda, last_iter = self.init_current_phase_model(iteration, start_iter, b1_model, b2_model) 101 | 102 | ### Initialize datasets for the current phase 103 | if iteration == start_iter: 104 | indices_train_10, X_valid_cumul, X_train_cumul, Y_valid_cumul, Y_train_cumul, \ 105 | 
X_train_cumuls, Y_valid_cumuls, X_protoset_cumuls, Y_protoset_cumuls, X_valid_cumuls, Y_valid_cumuls, \ 106 | X_train, map_Y_train, map_Y_valid_cumul, X_valid_ori, Y_valid_ori = \ 107 | self.init_current_phase_dataset(iteration, \ 108 | start_iter, last_iter, order, order_list, X_train_total, Y_train_total, X_valid_total, Y_valid_total, \ 109 | X_train_cumuls, Y_train_cumuls, X_valid_cumuls, Y_valid_cumuls, X_protoset_cumuls, Y_protoset_cumuls) 110 | else: 111 | indices_train_10, X_valid_cumul, X_train_cumul, Y_valid_cumul, Y_train_cumul, \ 112 | X_train_cumuls, Y_valid_cumuls, X_protoset_cumuls, Y_protoset_cumuls, X_valid_cumuls, Y_valid_cumuls, \ 113 | X_train, map_Y_train, map_Y_valid_cumul, X_protoset, Y_protoset = \ 114 | self.init_current_phase_dataset(iteration, \ 115 | start_iter, last_iter, order, order_list, X_train_total, Y_train_total, X_valid_total, Y_valid_total, \ 116 | X_train_cumuls, Y_train_cumuls, X_valid_cumuls, Y_valid_cumuls, X_protoset_cumuls, Y_protoset_cumuls) 117 | 118 | is_start_iteration = (iteration == start_iter) 119 | 120 | # Imprint weights 121 | if iteration > start_iter: 122 | b1_model = self.imprint_weights(b1_model, b2_model, iteration, is_start_iteration, X_train, map_Y_train, self.dictionary_size) 123 | 124 | # Update training and test dataloader 125 | trainloader, testloader = self.update_train_and_valid_loader(X_train, map_Y_train, X_valid_cumul, map_Y_valid_cumul, \ 126 | iteration, start_iter) 127 | 128 | # Set the names for the checkpoints 129 | ckp_name = osp.join(self.save_path, 'iter_{}_b1.pth'.format(iteration)) 130 | ckp_name_b2 = osp.join(self.save_path, 'iter_{}_b2.pth'.format(iteration)) 131 | print('Check point name: ', ckp_name) 132 | 133 | if iteration==start_iter and self.args.resume_fg: 134 | # Resume the 0-th phase model according to the config 135 | b1_model = torch.load(self.args.ckpt_dir_fg) 136 | elif self.args.resume and os.path.exists(ckp_name): 137 | # Resume other models according to the config 138 | 
b1_model = torch.load(ckp_name) 139 | b2_model = torch.load(ckp_name_b2) 140 | else: 141 | # Start training (if we don't resume the models from the checkpoints) 142 | 143 | # Set the optimizer 144 | tg_optimizer, tg_lr_scheduler, fusion_optimizer, fusion_lr_scheduler = self.set_optimizer(iteration, \ 145 | start_iter, b1_model, ref_model, b2_model, ref_b2_model) 146 | 147 | if iteration > start_iter: 148 | # Train the class-incremental learning system from the 1st phase 149 | 150 | # Set the balanced dataloader 151 | balancedloader = self.gen_balanced_loader(X_train_total, Y_train_total, indices_train_10, X_protoset, Y_protoset, order_list) 152 | 153 | # Train the model for different baselines 154 | if self.args.baseline == 'lucir': 155 | b1_model, b2_model = incremental_train_and_eval_lucir(self.args, self.args.epochs, self.fusion_vars, \ 156 | self.ref_fusion_vars, b1_model, ref_model, b2_model, ref_b2_model, tg_optimizer, tg_lr_scheduler, \ 157 | fusion_optimizer, fusion_lr_scheduler, trainloader, testloader, iteration, start_iter, \ 158 | X_protoset_cumuls, Y_protoset_cumuls, order_list, cur_lambda, self.args.dist, self.args.K, self.args.lw_mr, balancedloader) 159 | elif self.args.baseline == 'icarl': 160 | b1_model, b2_model = incremental_train_and_eval_icarl(self.args, self.args.epochs, self.fusion_vars, \ 161 | self.ref_fusion_vars, b1_model, ref_model, b2_model, ref_b2_model, tg_optimizer, tg_lr_scheduler, \ 162 | fusion_optimizer, fusion_lr_scheduler, trainloader, testloader, iteration, start_iter, \ 163 | X_protoset_cumuls, Y_protoset_cumuls, order_list, cur_lambda, self.args.dist, self.args.K, self.args.lw_mr, balancedloader, \ 164 | self.args.icarl_T, self.args.icarl_beta) 165 | else: 166 | raise ValueError('Please set the correct baseline.') 167 | else: 168 | # Train the class-incremental learning system in the 0th phase 169 | b1_model = incremental_train_and_eval_zeroth_phase(self.args, self.args.epochs, b1_model, \ 170 | ref_model, 
tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, \ 171 | cur_lambda, self.args.dist, self.args.K, self.args.lw_mr) 172 | 173 | # Select the exemplars according to the current model 174 | X_protoset_cumuls, Y_protoset_cumuls, class_means, alpha_dr_herding = self.set_exemplar_set(b1_model, b2_model, \ 175 | is_start_iteration, iteration, last_iter, order, alpha_dr_herding, prototypes) 176 | 177 | # Compute the accuracies for current phase 178 | top1_acc_list_ori, top1_acc_list_cumul = self.compute_acc(class_means, order, order_list, b1_model, b2_model, X_protoset_cumuls, Y_protoset_cumuls, \ 179 | X_valid_ori, Y_valid_ori, X_valid_cumul, Y_valid_cumul, iteration, is_start_iteration, top1_acc_list_ori, top1_acc_list_cumul) 180 | 181 | # Compute the average accuracy 182 | num_of_testing = iteration - start_iter + 1 183 | avg_cumul_acc_fc = np.sum(top1_acc_list_cumul[start_iter:,0])/num_of_testing 184 | avg_cumul_acc_icarl = np.sum(top1_acc_list_cumul[start_iter:,1])/num_of_testing 185 | print('Computing average accuracy...') 186 | print(" Average accuracy (FC) :\t\t{:.2f} %".format(avg_cumul_acc_fc)) 187 | print(" Average accuracy (Proto) :\t\t{:.2f} %".format(avg_cumul_acc_icarl)) 188 | 189 | # Write the results to the tensorboard 190 | self.train_writer.add_scalar('avg_acc/fc', float(avg_cumul_acc_fc), iteration) 191 | self.train_writer.add_scalar('avg_acc/proto', float(avg_cumul_acc_icarl), iteration) 192 | 193 | # Save the results and close the tensorboard writer 194 | torch.save(top1_acc_list_ori, osp.join(self.save_path, 'acc_list_ori.pth')) 195 | torch.save(top1_acc_list_cumul, osp.join(self.save_path, 'acc_list_cumul.pth')) 196 | self.train_writer.close() 197 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/trainer/zeroth_phase.py: -------------------------------------------------------------------------------- 1 | 
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2019 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ Training code for the 0-th phase """ 12 | import torch 13 | import tqdm 14 | import numpy as np 15 | import torch.nn as nn 16 | import torchvision 17 | from torch.optim import lr_scheduler 18 | from torchvision import datasets, models, transforms 19 | from utils.misc import * 20 | from utils.process_fp import process_inputs_fp 21 | import torch.nn.functional as F 22 | 23 | def incremental_train_and_eval_zeroth_phase(the_args, epochs, b1_model, ref_model, \ 24 | tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iteration, \ 25 | lamda, dist, K, lw_mr, fix_bn=False, weight_per_class=None, device=None): 26 | 27 | # Setting up the CUDA device 28 | if device is None: 29 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 30 | 31 | for epoch in range(epochs): 32 | # Set the 1st branch model to the training mode 33 | b1_model.train() 34 | 35 | # Fix the batch norm parameters according to the config 36 | if fix_bn: 37 | for m in b1_model.modules(): 38 | if isinstance(m, nn.BatchNorm2d): 39 | m.eval() 40 | 41 | # Set all the losses to zeros 42 | train_loss = 0 43 | train_loss1 = 0 44 | train_loss2 = 0 45 | # Set the counters to zeros 46 | correct = 0 47 | total = 0 48 | 49 | # Learning rate decay 50 | tg_lr_scheduler.step() 51 | 52 | # Print the information 53 | print('\nEpoch: %d, learning rate: ' % epoch, end='') 54 | print(tg_lr_scheduler.get_lr()[0]) 55 | 56 | for batch_idx, (inputs, targets) in 
enumerate(trainloader): 57 | # Get a batch of training samples, transfer them to the device 58 | inputs, targets = inputs.to(device), targets.to(device) 59 | # Clear the gradient of the parameters for the tg_optimizer 60 | tg_optimizer.zero_grad() 61 | # Forward the samples through the network 62 | outputs = b1_model(inputs) 63 | # Compute classification loss 64 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 65 | # Backward and update the parameters 66 | loss.backward() 67 | tg_optimizer.step() 68 | # Record the losses and the number of samples to compute the accuracy 69 | train_loss += loss.item() 70 | _, predicted = outputs.max(1) 71 | total += targets.size(0) 72 | correct += predicted.eq(targets).sum().item() 73 | 74 | # Print the training losses and accuracies 75 | print('Train set: {}, train loss: {:.4f} accuracy: {:.4f}'.format(len(trainloader), train_loss/(batch_idx+1), 100.*correct/total)) 76 | 77 | # Run the test for this epoch 78 | b1_model.eval() 79 | test_loss = 0 80 | correct = 0 81 | total = 0 82 | with torch.no_grad(): 83 | for batch_idx, (inputs, targets) in enumerate(testloader): 84 | inputs, targets = inputs.to(device), targets.to(device) 85 | outputs = b1_model(inputs) 86 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 87 | test_loss += loss.item() 88 | _, predicted = outputs.max(1) 89 | total += targets.size(0) 90 | correct += predicted.eq(targets).sum().item() 91 | print('Test set: {} test loss: {:.4f} accuracy: {:.4f}'.format(len(testloader), test_loss/(batch_idx+1), 100.*correct/total)) 92 | 93 | return b1_model 94 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/adaptive-aggregation-networks/utils/__init__.py 
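The 0-th phase loop in zeroth_phase.py reports, per epoch, the mean of the batch losses (`train_loss/(batch_idx+1)`) and the percentage of correct predictions (`100.*correct/total`). A minimal sketch of that bookkeeping without PyTorch might look as follows; `summarize_epoch` and its input layout are hypothetical names introduced for illustration and are not part of the repository.

```python
# Hypothetical helper (not part of the repository): mirrors the per-epoch
# bookkeeping of incremental_train_and_eval_zeroth_phase in plain Python.
def summarize_epoch(batches):
    """Return (mean batch loss, accuracy in percent).

    batches: iterable of (batch_loss, predicted_labels, target_labels),
    where the label containers are equally long sequences of ints.
    """
    loss_sum = 0.0    # plays the role of train_loss / test_loss
    correct = 0       # number of samples with predicted == target
    total = 0         # number of samples seen so far
    num_batches = 0   # plays the role of batch_idx + 1
    for batch_loss, predicted, targets in batches:
        loss_sum += batch_loss
        correct += sum(int(p == t) for p, t in zip(predicted, targets))
        total += len(targets)
        num_batches += 1
    if num_batches == 0 or total == 0:
        raise ValueError("summarize_epoch needs at least one non-empty batch")
    # Each batch loss is weighted equally, as in the original loop, so a
    # smaller final batch counts as much as a full one.
    return loss_sum / num_batches, 100.0 * correct / total


if __name__ == "__main__":
    toy_batches = [(0.5, [0, 1, 1], [0, 1, 0]), (1.5, [2], [2])]
    mean_loss, accuracy = summarize_epoch(toy_batches)
    print("loss: {:.4f} accuracy: {:.4f}".format(mean_loss, accuracy))
```

Because the loss is averaged over batches rather than over samples, the reported value can differ slightly from the per-sample mean when the last batch is smaller than the others; the accuracy is unaffected since it is counted per sample.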
-------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/gpu_tools.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Max Planck Institute for Informatics 4 | ## yaoyao.liu@mpi-inf.mpg.de 5 | ## Copyright (c) 2021 6 | ## 7 | ## This source code is licensed under the MIT-style license found in the 8 | ## LICENSE file in the root directory of this source tree 9 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 10 | """ GPU tools. """ 11 | import os 12 | import torch 13 | import time 14 | 15 | def check_memory(cuda_device): 16 | """ Check the total memory and occupied memory for GPU """ 17 | devices_info = os.popen('"/usr/bin/nvidia-smi" --query-gpu=memory.total,memory.used --format=csv,nounits,noheader').read().strip().split("\n") 18 | total, used = devices_info[int(cuda_device)].split(',') 19 | return total,used 20 | 21 | def occupy_memory(cuda_device): 22 | """ Create a large tensor and delete it. 23 | This operation occupies the GPU memory, so other processes cannot use the occupied memory. 24 | It is used to ensure that this process won't be stopped when it requires additional GPU memory. 25 | Be careful with this operation. It will influence other people when you are sharing GPUs with others. 
26 | """ 27 | total, used = check_memory(cuda_device) 28 | total = int(total) 29 | used = int(used) 30 | max_mem = int(total * 0.90) 31 | print('Total memory: ' + str(total) + ', used memory: ' + str(used)) 32 | block_mem = max_mem - used 33 | if block_mem > 0: 34 | x = torch.cuda.FloatTensor(256, 1024, block_mem) 35 | del x 36 | 37 | def set_gpu(x): 38 | """ Set up which GPU we use for this process """ 39 | os.environ['CUDA_VISIBLE_DEVICES'] = x 40 | print('Using gpu:', x) 41 | 42 | 43 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/imagenet/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # for incremental-class train and eval 4 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/imagenet/train_and_eval.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Copied from: https://github.com/hshustc/CVPR19_Incremental_Learning 3 | ## 4 | ## This source code is licensed under the MIT-style license found in the 5 | ## LICENSE file in the root directory of this source tree 6 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 7 | """ Tools for ImageNet """ 8 | import argparse 9 | import os 10 | import shutil 11 | import time 12 | 13 | import torch 14 | import torch.nn as nn 15 | import torch.nn.parallel 16 | import torch.backends.cudnn as cudnn 17 | import torch.distributed as dist 18 | import torch.optim 19 | import torch.utils.data 20 | import torch.utils.data.distributed 21 | import torchvision.transforms as transforms 22 | import torchvision.datasets as datasets 23 | import torchvision.models as models 24 | 25 | from .utils_train import * 26 | 27 | def train_and_eval(epochs, start_epoch, model, 
optimizer, lr_scheduler, \ 28 | train_loader, val_loader, gpu=None): 29 | for epoch in range(start_epoch, epochs): 30 | lr_scheduler.step() 31 | print('\nEpoch: %d, LR: ' % epoch, end='') 32 | print(lr_scheduler.get_lr()) 33 | 34 | train(train_loader, model, optimizer, epoch, gpu) 35 | validate(val_loader, model, gpu) 36 | 37 | return model 38 | 39 | def train(train_loader, model, optimizer, epoch, gpu=None): 40 | batch_time = AverageMeter() 41 | data_time = AverageMeter() 42 | losses = AverageMeter() 43 | top1 = AverageMeter() 44 | top5 = AverageMeter() 45 | 46 | model.train() 47 | criterion = nn.CrossEntropyLoss().cuda(gpu) 48 | 49 | end = time.time() 50 | for i, (input, target) in enumerate(train_loader): 51 | data_time.update(time.time() - end) 52 | 53 | if gpu is not None: 54 | input = input.cuda(gpu, non_blocking=True) 55 | target = target.cuda(gpu, non_blocking=True) 56 | 57 | output = model(input) 58 | loss = criterion(output, target) 59 | 60 | prec1, prec5 = accuracy(output, target, topk=(1, 5)) 61 | losses.update(loss.item(), input.size(0)) 62 | top1.update(prec1[0], input.size(0)) 63 | top5.update(prec5[0], input.size(0)) 64 | 65 | optimizer.zero_grad() 66 | loss.backward() 67 | optimizer.step() 68 | 69 | 70 | batch_time.update(time.time() - end) 71 | end = time.time() 72 | 73 | if i % 10 == 0: 74 | print('Epoch: [{0}][{1}/{2}]\t' 75 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 76 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 77 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 78 | 'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t' 79 | 'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format( 80 | epoch, i, len(train_loader), batch_time=batch_time, 81 | data_time=data_time, loss=losses, top1=top1, top5=top5)) 82 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/imagenet/utils_dataset.py: -------------------------------------------------------------------------------- 1 | 
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Copied from: https://github.com/hshustc/CVPR19_Incremental_Learning 3 | ## 4 | ## This source code is licensed under the MIT-style license found in the 5 | ## LICENSE file in the root directory of this source tree 6 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 7 | """ Tools for ImageNet """ 8 | import argparse 9 | import os 10 | import shutil 11 | import time 12 | import numpy as np 13 | 14 | def split_images_labels(imgs): 15 | images = [] 16 | labels = [] 17 | for item in imgs: 18 | images.append(item[0]) 19 | labels.append(item[1]) 20 | 21 | return np.array(images), np.array(labels) 22 | 23 | def merge_images_labels(images, labels): 24 | images = list(images) 25 | labels = list(labels) 26 | assert(len(images)==len(labels)) 27 | imgs = [] 28 | for i in range(len(images)): 29 | item = (images[i], labels[i]) 30 | imgs.append(item) 31 | 32 | return imgs 33 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/imagenet/utils_train.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Copied from: https://github.com/hshustc/CVPR19_Incremental_Learning 3 | ## 4 | ## This source code is licensed under the MIT-style license found in the 5 | ## LICENSE file in the root directory of this source tree 6 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 7 | """ Tools for ImageNet """ 8 | import argparse 9 | import os 10 | import shutil 11 | import time 12 | 13 | import torch 14 | import torch.nn as nn 15 | import torch.nn.parallel 16 | import torch.backends.cudnn as cudnn 17 | import torch.distributed as dist 18 | import torch.optim 19 | import torch.utils.data 20 | import torch.utils.data.distributed 21 | import torchvision.transforms as transforms 22 | 
import torchvision.datasets as datasets 23 | import torchvision.models as models 24 | 25 | def validate(val_loader, model, gpu=None): 26 | batch_time = AverageMeter() 27 | losses = AverageMeter() 28 | top1 = AverageMeter() 29 | top5 = AverageMeter() 30 | 31 | model.eval() 32 | criterion = nn.CrossEntropyLoss().cuda(gpu) 33 | 34 | with torch.no_grad(): 35 | end = time.time() 36 | for i, (input, target) in enumerate(val_loader): 37 | if gpu is not None: 38 | input = input.cuda(gpu, non_blocking=True) 39 | target = target.cuda(gpu, non_blocking=True) 40 | 41 | output = model(input) 42 | loss = criterion(output, target) 43 | 44 | prec1, prec5 = accuracy(output, target, topk=(1, 5)) 45 | losses.update(loss.item(), input.size(0)) 46 | top1.update(prec1[0], input.size(0)) 47 | top5.update(prec5[0], input.size(0)) 48 | 49 | batch_time.update(time.time() - end) 50 | end = time.time() 51 | 52 | if i % 10 == 0: 53 | print('Test: [{0}/{1}]\t' 54 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 55 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 56 | 'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t' 57 | 'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format( 58 | i, len(val_loader), batch_time=batch_time, loss=losses, 59 | top1=top1, top5=top5)) 60 | 61 | print(' * Prec@1 {top1.avg:.3f} Prec@5 {top5.avg:.3f}' 62 | .format(top1=top1, top5=top5)) 63 | 64 | return top1.avg 65 | 66 | class AverageMeter(object): 67 | """Computes and stores the average and current value""" 68 | def __init__(self): 69 | self.reset() 70 | 71 | def reset(self): 72 | self.val = 0 73 | self.avg = 0 74 | self.sum = 0 75 | self.count = 0 76 | 77 | def update(self, val, n=1): 78 | self.val = val 79 | self.sum += val * n 80 | self.count += n 81 | self.avg = self.sum / self.count 82 | 83 | def accuracy(output, target, topk=(1,)): 84 | """Computes the precision@k for the specified values of k""" 85 | with torch.no_grad(): 86 | maxk = max(topk) 87 | batch_size = target.size(0) 88 | 89 | _, pred = output.topk(maxk, 1, True, 
True) 90 | pred = pred.t() 91 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 92 | 93 | res = [] 94 | for k in topk: 95 | correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True) 96 | res.append(correct_k.mul_(100.0 / batch_size)) 97 | return res 98 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/incremental/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # for incremental train and eval 4 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/incremental/compute_accuracy.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ The functions that compute the accuracies """ 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.functional as F 15 | import torch.optim as optim 16 | from torch.optim import lr_scheduler 17 | import torchvision 18 | from torchvision import datasets, models, transforms 19 | from torch.autograd import Variable 20 | import numpy as np 21 | import time 22 | import os 23 | import copy 24 | import argparse 25 | from PIL import Image 26 | from scipy.spatial.distance import cdist 27 | from sklearn.metrics import confusion_matrix 28 | from utils.misc import * 29 | from utils.imagenet.utils_dataset import merge_images_labels 30 | from utils.process_fp 
import process_inputs_fp 31 | 32 | def map_labels(order_list, Y_set): 33 | map_Y = [] 34 | for idx in Y_set: 35 | map_Y.append(order_list.index(idx)) 36 | map_Y = np.array(map_Y) 37 | return map_Y 38 | 39 | def compute_accuracy(the_args, fusion_vars, b1_model, b2_model, tg_feature_model, class_means, \ 40 | X_protoset_cumuls, Y_protoset_cumuls, evalloader, order_list, is_start_iteration=False, \ 41 | fast_fc=None, scale=None, print_info=True, device=None, cifar=True, imagenet=False, \ 42 | valdir=None): 43 | if device is None: 44 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 45 | b1_model.eval() 46 | tg_feature_model.eval() 47 | b1_model.eval() 48 | if b2_model is not None: 49 | b2_model.eval() 50 | fast_fc = 0.0 51 | correct = 0 52 | correct_icarl = 0 53 | correct_icarl_cosine = 0 54 | correct_icarl_cosine2 = 0 55 | correct_ncm = 0 56 | correct_maml = 0 57 | total = 0 58 | with torch.no_grad(): 59 | for batch_idx, (inputs, targets) in enumerate(evalloader): 60 | inputs, targets = inputs.to(device), targets.to(device) 61 | total += targets.size(0) 62 | 63 | if is_start_iteration: 64 | outputs = b1_model(inputs) 65 | else: 66 | outputs, outputs_feature = process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs) 67 | 68 | outputs = F.softmax(outputs, dim=1) 69 | if scale is not None: 70 | assert(scale.shape[0] == 1) 71 | assert(outputs.shape[1] == scale.shape[1]) 72 | outputs = outputs / scale.repeat(outputs.shape[0], 1).type(torch.FloatTensor).to(device) 73 | _, predicted = outputs.max(1) 74 | correct += predicted.eq(targets).sum().item() 75 | 76 | if is_start_iteration: 77 | outputs_feature = np.squeeze(tg_feature_model(inputs)) 78 | sqd_icarl = cdist(class_means[:,:,0].T, outputs_feature.cpu(), 'sqeuclidean') 79 | score_icarl = torch.from_numpy((-sqd_icarl).T).to(device) 80 | _, predicted_icarl = score_icarl.max(1) 81 | correct_icarl += predicted_icarl.eq(targets).sum().item() 82 | sqd_icarl_cosine = 
cdist(class_means[:,:,0].T, outputs_feature.cpu(), 'cosine') 83 | score_icarl_cosine = torch.from_numpy((-sqd_icarl_cosine).T).to(device) 84 | _, predicted_icarl_cosine = score_icarl_cosine.max(1) 85 | correct_icarl_cosine += predicted_icarl_cosine.eq(targets).sum().item() 86 | fast_weights = torch.from_numpy(np.float32(class_means[:,:,0].T)).to(device) 87 | sqd_icarl_cosine2 = F.linear(F.normalize(torch.squeeze(outputs_feature), p=2,dim=1), F.normalize(fast_weights, p=2, dim=1)) 88 | score_icarl_cosine2 = sqd_icarl_cosine2 89 | _, predicted_icarl_cosine2 = score_icarl_cosine2.max(1) 90 | correct_icarl_cosine2 += predicted_icarl_cosine2.eq(targets).sum().item() 91 | sqd_ncm = cdist(class_means[:,:,1].T, outputs_feature.cpu(), 'sqeuclidean') 92 | score_ncm = torch.from_numpy((-sqd_ncm).T).to(device) 93 | _, predicted_ncm = score_ncm.max(1) 94 | correct_ncm += predicted_ncm.eq(targets).sum().item() 95 | if print_info: 96 | print(" Current accuracy (FC) :\t\t{:.2f} %".format(100.*correct/total)) 97 | print(" Current accuracy (Proto) :\t\t{:.2f} %".format(100.*correct_icarl/total)) 98 | print(" Current accuracy (Proto-UB) :\t\t{:.2f} %".format(100.*correct_ncm/total)) 99 | cnn_acc = 100.*correct/total 100 | icarl_acc = 100.*correct_icarl/total 101 | ncm_acc = 100.*correct_ncm/total 102 | return [cnn_acc, icarl_acc, ncm_acc], fast_fc 103 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/incremental/compute_features.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root 
directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ The functions that compute the features """ 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.functional as F 15 | import torch.optim as optim 16 | from torch.optim import lr_scheduler 17 | import torchvision 18 | from torchvision import datasets, models, transforms 19 | from torch.autograd import Variable 20 | import numpy as np 21 | import time 22 | import os 23 | import copy 24 | import argparse 25 | from PIL import Image 26 | from scipy.spatial.distance import cdist 27 | from sklearn.metrics import confusion_matrix 28 | from utils.misc import * 29 | from utils.process_fp import process_inputs_fp 30 | 31 | def compute_features(the_args, fusion_vars, tg_model, free_model, tg_feature_model, \ 32 | is_start_iteration, evalloader, num_samples, num_features, device=None): 33 | if device is None: 34 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 35 | tg_feature_model.eval() 36 | tg_model.eval() 37 | if free_model is not None: 38 | free_model.eval() 39 | 40 | features = np.zeros([num_samples, num_features]) 41 | start_idx = 0 42 | with torch.no_grad(): 43 | for inputs, targets in evalloader: 44 | inputs = inputs.to(device) 45 | if is_start_iteration: 46 | the_feature = tg_feature_model(inputs) 47 | else: 48 | the_feature = process_inputs_fp(the_args, fusion_vars, tg_model, free_model, inputs, feature_mode=True) 49 | features[start_idx:start_idx+inputs.shape[0], :] = np.squeeze(the_feature.cpu()) 50 | start_idx = start_idx+inputs.shape[0] 51 | assert(start_idx==num_samples) 52 | return features 53 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/incremental/conv2d_mtl.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## 
Created by: Yaoyao Liu 3 | ## Modified from: https://github.com/pytorch/pytorch 4 | ## Max Planck Institute for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2021 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """ SS CONV layers. 12 | This file contains the source code for the scaling and shifting weights. 13 | If this architecture is applied, the convolution weights will be frozen, and only the channel-wise masks will be updated. 14 | """ 15 | import math 16 | import torch 17 | from torch.nn.parameter import Parameter 18 | import torch.nn.functional as F 19 | from torch.nn.modules.module import Module 20 | from torch.nn.modules.utils import _single, _pair, _triple 21 | 22 | class _ConvNdMtl(Module): 23 | 24 | def __init__(self, in_channels, out_channels, kernel_size, stride, 25 | padding, dilation, transposed, output_padding, groups, bias): 26 | super(_ConvNdMtl, self).__init__() 27 | if in_channels % groups != 0: 28 | raise ValueError('in_channels must be divisible by groups') 29 | if out_channels % groups != 0: 30 | raise ValueError('out_channels must be divisible by groups') 31 | self.in_channels = in_channels 32 | self.out_channels = out_channels 33 | self.kernel_size = kernel_size 34 | self.stride = stride 35 | self.padding = padding 36 | self.dilation = dilation 37 | self.transposed = transposed 38 | self.output_padding = output_padding 39 | self.groups = groups 40 | if transposed: 41 | self.weight = Parameter(torch.Tensor( 42 | in_channels, out_channels // groups, *kernel_size)) 43 | self.mtl_weight = Parameter(torch.ones(in_channels, out_channels // groups, 1, 1)) 44 | else: 45 | self.weight = Parameter(torch.Tensor( 46 | out_channels, in_channels // groups, *kernel_size)) 47 | self.mtl_weight = Parameter(torch.ones(out_channels, in_channels // groups, 1, 
1)) 48 | self.weight.requires_grad=False 49 | if bias: 50 | self.bias = Parameter(torch.Tensor(out_channels)) 51 | self.bias.requires_grad=False 52 | self.mtl_bias = Parameter(torch.zeros(out_channels)) 53 | else: 54 | self.register_parameter('bias', None) 55 | self.register_parameter('mtl_bias', None) 56 | self.reset_parameters() 57 | 58 | def reset_parameters(self): 59 | n = self.in_channels 60 | for k in self.kernel_size: 61 | n *= k 62 | stdv = 1. / math.sqrt(n) 63 | self.weight.data.uniform_(-stdv, stdv) 64 | self.mtl_weight.data.uniform_(1, 1) 65 | if self.bias is not None: 66 | self.bias.data.uniform_(-stdv, stdv) 67 | self.mtl_bias.data.uniform_(0, 0) 68 | 69 | def extra_repr(self): 70 | s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}' 71 | ', stride={stride}') 72 | if self.padding != (0,) * len(self.padding): 73 | s += ', padding={padding}' 74 | if self.dilation != (1,) * len(self.dilation): 75 | s += ', dilation={dilation}' 76 | if self.output_padding != (0,) * len(self.output_padding): 77 | s += ', output_padding={output_padding}' 78 | if self.groups != 1: 79 | s += ', groups={groups}' 80 | if self.bias is None: 81 | s += ', bias=False' 82 | return s.format(**self.__dict__) 83 | 84 | class Conv2dMtl(_ConvNdMtl): 85 | 86 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 87 | padding=0, dilation=1, groups=1, bias=True): 88 | kernel_size = _pair(kernel_size) 89 | stride = _pair(stride) 90 | padding = _pair(padding) 91 | dilation = _pair(dilation) 92 | super(Conv2dMtl, self).__init__( 93 | in_channels, out_channels, kernel_size, stride, padding, dilation, 94 | False, _pair(0), groups, bias) 95 | 96 | def forward(self, input): 97 | new_mtl_weight = self.mtl_weight.expand(self.weight.shape) 98 | new_weight = self.weight.mul(new_mtl_weight) 99 | if self.bias is not None: 100 | new_bias = self.bias + self.mtl_bias 101 | else: 102 | new_bias = None 103 | return F.conv2d(input, new_weight, new_bias, self.stride, 104 | 
self.padding, self.dilation, self.groups) 105 | 106 | -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/misc.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function, division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.init as init 6 | from collections import OrderedDict 7 | import torch.optim as optim 8 | import torchvision 9 | import argparse 10 | import numpy as np 11 | import os 12 | import os.path as osp 13 | import sys 14 | import time 15 | import math 16 | import subprocess 17 | try: 18 | import cPickle as pickle 19 | except ImportError: 20 | import pickle 21 | 22 | def savepickle(data, file_path): 23 | mkdir_p(osp.dirname(file_path), delete=False) 24 | print('pickle into', file_path) 25 | with open(file_path, 'wb') as f: 26 | pickle.dump(data, f, pickle.HIGHEST_PROTOCOL) 27 | 28 | def unpickle(file_path): 29 | with open(file_path, 'rb') as f: 30 | data = pickle.load(f) 31 | return data 32 | 33 | def mkdir_p(path, delete=False, print_info=True): 34 | if path == '': return 35 | 36 | if delete: 37 | subprocess.call(('rm -r ' + path).split()) 38 | if not osp.exists(path): 39 | if print_info: 40 | print('mkdir -p ' + path) 41 | subprocess.call(('mkdir -p ' + path).split()) 42 | 43 | def get_mean_and_std(dataset): 44 | '''Compute the mean and std value of dataset.''' 45 | dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2) 46 | mean = torch.zeros(3) 47 | std = torch.zeros(3) 48 | print('==> Computing mean and std..') 49 | for inputs, targets in dataloader: 50 | for i in range(3): 51 | mean[i] += inputs[:,i,:,:].mean() 52 | std[i] += inputs[:,i,:,:].std() 53 | mean.div_(len(dataset)) 54 | std.div_(len(dataset)) 55 | return mean, std 56 | 57 | def init_params(net): 58 | '''Init layer parameters.''' 59 | for m in net.modules(): 60 | if isinstance(m, nn.Conv2d): 61 | 
init.kaiming_normal_(m.weight, mode='fan_out') 62 | if m.bias is not None: 63 | init.constant_(m.bias, 0) 64 | elif isinstance(m, nn.BatchNorm2d): 65 | init.constant_(m.weight, 1) 66 | init.constant_(m.bias, 0) 67 | elif isinstance(m, nn.Linear): 68 | init.normal_(m.weight, std=1e-3) 69 | if m.bias is not None: 70 | init.constant_(m.bias, 0) 71 | 72 | def map_labels(order_list, Y_set): 73 | map_Y = [] 74 | for idx in Y_set: 75 | map_Y.append(order_list.index(idx)) 76 | map_Y = np.array(map_Y) 77 | return map_Y 78 | 79 | def format_time(seconds): 80 | days = int(seconds / 3600/24) 81 | seconds = seconds - days*3600*24 82 | hours = int(seconds / 3600) 83 | seconds = seconds - hours*3600 84 | minutes = int(seconds / 60) 85 | seconds = seconds - minutes*60 86 | secondsf = int(seconds) 87 | seconds = seconds - secondsf 88 | millis = int(seconds*1000) 89 | 90 | f = '' 91 | i = 1 92 | if days > 0: 93 | f += str(days) + 'D' 94 | i += 1 95 | if hours > 0 and i <= 2: 96 | f += str(hours) + 'h' 97 | i += 1 98 | if minutes > 0 and i <= 2: 99 | f += str(minutes) + 'm' 100 | i += 1 101 | if secondsf > 0 and i <= 2: 102 | f += str(secondsf) + 's' 103 | i += 1 104 | if millis > 0 and i <= 2: 105 | f += str(millis) + 'ms' 106 | i += 1 107 | if f == '': 108 | f = '0ms' 109 | return f 110 | 111 | def tensor2im(input_image, imtype=np.uint8): 112 | mean = [0.5071, 0.4866, 0.4409] 113 | std = [0.2009, 0.1984, 0.2023] 114 | if not isinstance(input_image, np.ndarray): 115 | if isinstance(input_image, torch.Tensor): 116 | image_tensor = input_image.data 117 | else: 118 | return input_image 119 | image_numpy = image_tensor.cpu().detach().float().numpy() 120 | if image_numpy.shape[0] == 1: 121 | image_numpy = np.tile(image_numpy, (3, 1, 1)) 122 | for i in range(len(mean)): 123 | image_numpy[i] = image_numpy[i] * std[i] + mean[i] 124 | image_numpy = image_numpy * 255 125 | image_numpy = np.transpose(image_numpy, (1, 2, 0)) 126 | else: 127 | image_numpy = input_image 128 | return 
image_numpy.astype(imtype) 129 | 130 | def process_mnemonics(X_protoset_cumuls, Y_protoset_cumuls, mnemonics, mnemonics_label, order_list): 131 | mnemonics_array_new = np.zeros(np.array(X_protoset_cumuls).shape) 132 | mnemonics_list = [] 133 | mnemonics_label_list = [] 134 | for idx in range(len(mnemonics)): 135 | this_mnemonics = [] 136 | for sub_idx in range(len(mnemonics[idx])): 137 | processed_img = tensor2im(mnemonics[idx][sub_idx]) 138 | mnemonics_array_new[idx][sub_idx] = processed_img 139 | 140 | diff = len(X_protoset_cumuls) - len(mnemonics_array_new) 141 | for idx in range(len(mnemonics_array_new)): 142 | X_protoset_cumuls[idx+diff] = mnemonics_array_new[idx] 143 | 144 | return X_protoset_cumuls -------------------------------------------------------------------------------- /adaptive-aggregation-networks/utils/process_fp.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Created by: Yaoyao Liu 3 | ## Max Planck Institute for Informatics 4 | ## yaoyao.liu@mpi-inf.mpg.de 5 | ## Copyright (c) 2021 6 | ## 7 | ## This source code is licensed under the MIT-style license found in the 8 | ## LICENSE file in the root directory of this source tree 9 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 10 | """ Using the aggregation weights to compute the feature maps from two branches """ 11 | import torch 12 | import torch.nn as nn 13 | from utils.misc import * 14 | 15 | def process_inputs_fp(the_args, fusion_vars, b1_model, b2_model, inputs, feature_mode=False): 16 | 17 | # The 1st level 18 | if the_args.dataset == 'cifar100': 19 | b1_model_group1 = [b1_model.conv1, b1_model.bn1, b1_model.relu, b1_model.layer1] 20 | b2_model_group1 = [b2_model.conv1, b2_model.bn1, b2_model.relu, b2_model.layer1] 21 | elif the_args.dataset == 'imagenet_sub' or the_args.dataset == 'imagenet': 22 | b1_model_group1 = [b1_model.conv1, 
b1_model.bn1, b1_model.relu, b1_model.maxpool, b1_model.layer1] 23 | b2_model_group1 = [b2_model.conv1, b2_model.bn1, b2_model.relu, b2_model.maxpool, b2_model.layer1] 24 | else: 25 | raise ValueError('Please set correct dataset.') 26 | b1_model_group1 = nn.Sequential(*b1_model_group1) 27 | b1_fp1 = b1_model_group1(inputs) 28 | b2_model_group1 = nn.Sequential(*b2_model_group1) 29 | b2_fp1 = b2_model_group1(inputs) 30 | fp1 = fusion_vars[0]*b1_fp1+(1-fusion_vars[0])*b2_fp1 31 | 32 | # The 2nd level 33 | b1_model_group2 = b1_model.layer2 34 | b1_fp2 = b1_model_group2(fp1) 35 | b2_model_group2 = b2_model.layer2 36 | b2_fp2 = b2_model_group2(fp1) 37 | fp2 = fusion_vars[1]*b1_fp2+(1-fusion_vars[1])*b2_fp2 38 | 39 | # The 3rd level 40 | if the_args.dataset == 'cifar100': 41 | b1_model_group3 = [b1_model.layer3, b1_model.avgpool] 42 | b2_model_group3 = [b2_model.layer3, b2_model.avgpool] 43 | elif the_args.dataset == 'imagenet_sub' or the_args.dataset == 'imagenet': 44 | b1_model_group3 = b1_model.layer3 45 | b2_model_group3 = b2_model.layer3 46 | else: 47 | raise ValueError('Please set correct dataset.') 48 | b1_model_group3 = nn.Sequential(*b1_model_group3) 49 | b1_fp3 = b1_model_group3(fp2) 50 | b2_model_group3 = nn.Sequential(*b2_model_group3) 51 | b2_fp3 = b2_model_group3(fp2) 52 | fp3 = fusion_vars[2]*b1_fp3+(1-fusion_vars[2])*b2_fp3 53 | 54 | if the_args.dataset == 'cifar100': 55 | fp_final = fp3.view(fp3.size(0), -1) 56 | elif the_args.dataset == 'imagenet_sub' or the_args.dataset == 'imagenet': 57 | # The 4th level 58 | b1_model_group4 = [b1_model.layer4, b1_model.avgpool] 59 | b1_model_group4 = nn.Sequential(*b1_model_group4) 60 | b1_fp4 = b1_model_group4(fp3) 61 | b2_model_group4 = [b2_model.layer4, b2_model.avgpool] 62 | b2_model_group4 = nn.Sequential(*b2_model_group4) 63 | b2_fp4 = b2_model_group4(fp3) 64 | fp4 = fusion_vars[3]*b1_fp4+(1-fusion_vars[3])*b2_fp4 65 | fp_final = fp4.view(fp4.size(0), -1) 66 | else: 67 | raise ValueError('Please set correct 
dataset.') 68 | if feature_mode: 69 | return fp_final 70 | else: 71 | outputs = b1_model.fc(fp_final) 72 | return outputs, fp_final 73 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import numpy as np 4 | from trainer.mnemonics import Trainer as MnemonicsTrainer 5 | from trainer.baseline import Trainer as BaselineTrainer 6 | 7 | if __name__ == '__main__': 8 | parser = argparse.ArgumentParser() 9 | parser.add_argument('--gpu', default='0') 10 | parser.add_argument('--method', default='mnemonics', type=str, choices=['mnemonics', 'baseline']) 11 | parser.add_argument('--dataset', default='cifar100', type=str, choices=['cifar100']) 12 | parser.add_argument('--data_dir', default='data/seed_1993_subset_100_imagenet/data', type=str) 13 | parser.add_argument('--num_classes', default=100, type=int) 14 | parser.add_argument('--nb_cl_fg', default=50, type=int) 15 | parser.add_argument('--nb_cl', default=10, type=int) 16 | parser.add_argument('--nb_protos', default=20, type=int) 17 | parser.add_argument('--nb_runs', default=1, type=int) 18 | parser.add_argument('--epochs', default=160, type=int) 19 | parser.add_argument('--T', default=2, type=float) 20 | parser.add_argument('--beta', default=0.25, type=float) 21 | parser.add_argument('--resume', action='store_true') 22 | parser.add_argument('--resume_fg', action='store_true') 23 | parser.add_argument('--ckpt_dir_fg', type=str, default='-') 24 | parser.add_argument('--dynamic_budget', action='store_true') 25 | parser.add_argument('--phase', type=str, default='train', choices=['train', 'eval']) 26 | parser.add_argument('--fusion_mode', default='free', type=str, choices=['std', 'free', 'mtl']) 27 | parser.add_argument('--ckpt_label', type=str, default='01') 28 | parser.add_argument('--num_workers', default=2, type=int) 29 | 
parser.add_argument('--load_iter', default=0, type=int) 30 | parser.add_argument('--dictionary_size', default=500, type=int) 31 | parser.add_argument('--mimic_score', action='store_true') 32 | parser.add_argument('--lw_ms', default=1, type=float) 33 | parser.add_argument('--rs_ratio', default=0, type=float) 34 | parser.add_argument('--less_forget', action='store_true') 35 | parser.add_argument('--lamda', default=5, type=float) 36 | parser.add_argument('--adapt_lamda', action='store_true') 37 | parser.add_argument('--dist', default=0.5, type=float) 38 | parser.add_argument('--K', default=2, type=int) 39 | parser.add_argument('--lw_mr', default=1, type=float) 40 | parser.add_argument('--random_seed', default=1993, type=int) 41 | parser.add_argument('--train_batch_size', default=128, type=int) 42 | parser.add_argument('--test_batch_size', default=100, type=int) 43 | parser.add_argument('--eval_batch_size', default=128, type=int) 44 | parser.add_argument('--base_lr1', default=0.1, type=float) 45 | parser.add_argument('--base_lr2', default=0.1, type=float) 46 | parser.add_argument('--lr_factor', default=0.1, type=float) 47 | parser.add_argument('--custom_weight_decay', default=5e-4, type=float) 48 | parser.add_argument('--custom_momentum', default=0.9, type=float) 49 | parser.add_argument('--load_ckpt_prefix', type=str, default='-') 50 | parser.add_argument('--load_order', type=str, default='-') 51 | parser.add_argument('--maml_lr', default=0.1, type=float) 52 | parser.add_argument('--maml_epoch', default=50, type=int) 53 | parser.add_argument('--mnemonics_images_per_class_per_step', default=1, type=int) 54 | parser.add_argument('--mnemonics_steps', default=20, type=int) 55 | parser.add_argument('--mnemonics_epochs', default=1, type=int) 56 | parser.add_argument('--mnemonics_lr', type=float, default=1e-5) 57 | parser.add_argument('--mnemonics_decay_factor', type=float, default=0.5) 58 | parser.add_argument('--mnemonics_outer_lr', type=float, default=1e-5) 59 | 
parser.add_argument('--mnemonics_total_epochs', type=int, default=1) 60 | parser.add_argument('--mnemonics_decay_epochs', type=int, default=1) 61 | 62 | the_args = parser.parse_args() 63 | 64 | assert(the_args.nb_cl_fg % the_args.nb_cl == 0) 65 | assert(the_args.nb_cl_fg >= the_args.nb_cl) 66 | 67 | print(the_args) 68 | 69 | os.environ['CUDA_VISIBLE_DEVICES'] = the_args.gpu 70 | print('Using gpu:', the_args.gpu) 71 | 72 | if the_args.method == 'mnemonics': 73 | trainer = MnemonicsTrainer(the_args) 74 | elif the_args.method == 'baseline': 75 | trainer = BaselineTrainer(the_args) 76 | else: 77 | raise ValueError('Please set the correct method.') 78 | trainer.train() 79 | 80 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/mnemonics-training/1_train/models/__init__.py -------------------------------------------------------------------------------- /mnemonics-training/1_train/models/modified_linear.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch 3 | from torch.nn import Module 4 | from torch.nn.parameter import Parameter 5 | from torch.nn import functional as F 6 | 7 | class CosineLinear(Module): 8 | def __init__(self, in_features, out_features, sigma=True): 9 | super(CosineLinear, self).__init__() 10 | self.in_features = in_features 11 | self.out_features = out_features 12 | self.weight = Parameter(torch.Tensor(out_features, in_features)) 13 | if sigma: 14 | self.sigma = Parameter(torch.Tensor(1)) 15 | else: 16 | self.register_parameter('sigma', None) 17 | self.reset_parameters() 18 | 19 | def reset_parameters(self): 20 | stdv = 1. 
/ math.sqrt(self.weight.size(1)) 21 | self.weight.data.uniform_(-stdv, stdv) 22 | if self.sigma is not None: 23 | self.sigma.data.fill_(1) 24 | 25 | def forward(self, input): 26 | out = F.linear(F.normalize(input, p=2,dim=1), \ 27 | F.normalize(self.weight, p=2, dim=1)) 28 | if self.sigma is not None: 29 | out = self.sigma * out 30 | return out 31 | 32 | class SplitCosineLinear(Module): 33 | def __init__(self, in_features, out_features1, out_features2, sigma=True): 34 | super(SplitCosineLinear, self).__init__() 35 | self.in_features = in_features 36 | self.out_features = out_features1 + out_features2 37 | self.fc1 = CosineLinear(in_features, out_features1, False) 38 | self.fc2 = CosineLinear(in_features, out_features2, False) 39 | if sigma: 40 | self.sigma = Parameter(torch.Tensor(1)) 41 | self.sigma.data.fill_(1) 42 | else: 43 | self.register_parameter('sigma', None) 44 | 45 | def forward(self, x): 46 | out1 = self.fc1(x) 47 | out2 = self.fc2(x) 48 | out = torch.cat((out1, out2), dim=1) 49 | if self.sigma is not None: 50 | out = self.sigma * out 51 | return out 52 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/models/modified_resnet_cifar.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch.nn as nn 3 | import torch.utils.model_zoo as model_zoo 4 | import models.modified_linear as modified_linear 5 | 6 | def conv3x3(in_planes, out_planes, stride=1): 7 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 8 | padding=1, bias=False) 9 | 10 | class BasicBlock(nn.Module): 11 | expansion = 1 12 | 13 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 14 | super(BasicBlock, self).__init__() 15 | self.conv1 = conv3x3(inplanes, planes, stride) 16 | self.bn1 = nn.BatchNorm2d(planes) 17 | self.relu = nn.ReLU(inplace=True) 18 | self.conv2 = conv3x3(planes, planes) 19 | self.bn2 = nn.BatchNorm2d(planes) 
20 | self.downsample = downsample 21 | self.stride = stride 22 | self.last = last 23 | 24 | def forward(self, x): 25 | residual = x 26 | 27 | out = self.conv1(x) 28 | out = self.bn1(out) 29 | out = self.relu(out) 30 | 31 | out = self.conv2(out) 32 | out = self.bn2(out) 33 | 34 | if self.downsample is not None: 35 | residual = self.downsample(x) 36 | 37 | out += residual 38 | if not self.last: 39 | out = self.relu(out) 40 | 41 | return out 42 | 43 | class ResNet(nn.Module): 44 | 45 | def __init__(self, block, layers, num_classes=10): 46 | self.inplanes = 16 47 | super(ResNet, self).__init__() 48 | self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, 49 | bias=False) 50 | self.bn1 = nn.BatchNorm2d(16) 51 | self.relu = nn.ReLU(inplace=True) 52 | self.layer1 = self._make_layer(block, 16, layers[0]) 53 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 54 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2, last_phase=True) 55 | self.avgpool = nn.AvgPool2d(8, stride=1) 56 | self.fc = modified_linear.CosineLinear(64 * block.expansion, num_classes) 57 | 58 | for m in self.modules(): 59 | if isinstance(m, nn.Conv2d): 60 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 61 | elif isinstance(m, nn.BatchNorm2d): 62 | nn.init.constant_(m.weight, 1) 63 | nn.init.constant_(m.bias, 0) 64 | 65 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 66 | downsample = None 67 | if stride != 1 or self.inplanes != planes * block.expansion: 68 | downsample = nn.Sequential( 69 | nn.Conv2d(self.inplanes, planes * block.expansion, 70 | kernel_size=1, stride=stride, bias=False), 71 | nn.BatchNorm2d(planes * block.expansion), 72 | ) 73 | 74 | layers = [] 75 | layers.append(block(self.inplanes, planes, stride, downsample)) 76 | self.inplanes = planes * block.expansion 77 | if last_phase: 78 | for i in range(1, blocks-1): 79 | layers.append(block(self.inplanes, planes)) 80 | layers.append(block(self.inplanes, 
planes, last=True)) 81 | else: 82 | for i in range(1, blocks): 83 | layers.append(block(self.inplanes, planes)) 84 | 85 | return nn.Sequential(*layers) 86 | 87 | def forward(self, x): 88 | x = self.conv1(x) 89 | x = self.bn1(x) 90 | x = self.relu(x) 91 | 92 | x = self.layer1(x) 93 | x = self.layer2(x) 94 | x = self.layer3(x) 95 | 96 | x = self.avgpool(x) 97 | x = x.view(x.size(0), -1) 98 | x = self.fc(x) 99 | 100 | return x 101 | 102 | def resnet32(pretrained=False, **kwargs): 103 | n = 5 104 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 105 | return model 106 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/models/modified_resnetmtl_cifar.py: -------------------------------------------------------------------------------- 1 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 | ## Modified by: Yaoyao Liu 3 | ## Modified from: https://github.com/hshustc/CVPR19_Incremental_Learning 4 | ## MPI for Informatics 5 | ## yaoyao.liu@mpi-inf.mpg.de 6 | ## Copyright (c) 2020 7 | ## 8 | ## This source code is licensed under the MIT-style license found in the 9 | ## LICENSE file in the root directory of this source tree 10 | ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 11 | """Modified ResNet with transferring weights.""" 12 | import math 13 | import torch.nn as nn 14 | import torch.utils.model_zoo as model_zoo 15 | import models.modified_linear as modified_linear 16 | from utils.conv2d_mtl import Conv2dMtl 17 | 18 | def conv3x3mtl(in_planes, out_planes, stride=1): 19 | return Conv2dMtl(in_planes, out_planes, kernel_size=3, stride=stride, 20 | padding=1, bias=False) 21 | 22 | class BasicBlockMtl(nn.Module): 23 | expansion = 1 24 | 25 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 26 | super(BasicBlockMtl, self).__init__() 27 | self.conv1 = conv3x3mtl(inplanes, planes, stride) 28 | self.bn1 = nn.BatchNorm2d(planes) 29 | 
self.relu = nn.ReLU(inplace=True) 30 | self.conv2 = conv3x3mtl(planes, planes) 31 | self.bn2 = nn.BatchNorm2d(planes) 32 | self.downsample = downsample 33 | self.stride = stride 34 | self.last = last 35 | 36 | def forward(self, x): 37 | residual = x 38 | 39 | out = self.conv1(x) 40 | out = self.bn1(out) 41 | out = self.relu(out) 42 | 43 | out = self.conv2(out) 44 | out = self.bn2(out) 45 | 46 | if self.downsample is not None: 47 | residual = self.downsample(x) 48 | 49 | out += residual 50 | if not self.last: 51 | out = self.relu(out) 52 | 53 | return out 54 | 55 | class ResNetMtl(nn.Module): 56 | 57 | def __init__(self, block, layers, num_classes=10): 58 | self.inplanes = 16 59 | super(ResNetMtl, self).__init__() 60 | self.conv1 = Conv2dMtl(3, 16, kernel_size=3, stride=1, padding=1, 61 | bias=False) 62 | self.bn1 = nn.BatchNorm2d(16) 63 | self.relu = nn.ReLU(inplace=True) 64 | self.layer1 = self._make_layer(block, 16, layers[0]) 65 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 66 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2, last_phase=True) 67 | self.avgpool = nn.AvgPool2d(8, stride=1) 68 | self.fc = modified_linear.CosineLinear(64 * block.expansion, num_classes) 69 | 70 | for m in self.modules(): 71 | if isinstance(m, Conv2dMtl): 72 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 73 | elif isinstance(m, nn.BatchNorm2d): 74 | nn.init.constant_(m.weight, 1) 75 | nn.init.constant_(m.bias, 0) 76 | 77 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 78 | downsample = None 79 | if stride != 1 or self.inplanes != planes * block.expansion: 80 | downsample = nn.Sequential( 81 | Conv2dMtl(self.inplanes, planes * block.expansion, 82 | kernel_size=1, stride=stride, bias=False), 83 | nn.BatchNorm2d(planes * block.expansion), 84 | ) 85 | 86 | layers = [] 87 | layers.append(block(self.inplanes, planes, stride, downsample)) 88 | self.inplanes = planes * block.expansion 89 | if last_phase: 
90 | for i in range(1, blocks-1): 91 | layers.append(block(self.inplanes, planes)) 92 | layers.append(block(self.inplanes, planes, last=True)) 93 | else: 94 | for i in range(1, blocks): 95 | layers.append(block(self.inplanes, planes)) 96 | 97 | return nn.Sequential(*layers) 98 | 99 | def forward(self, x): 100 | x = self.conv1(x) 101 | x = self.bn1(x) 102 | x = self.relu(x) 103 | 104 | x = self.layer1(x) 105 | x = self.layer2(x) 106 | x = self.layer3(x) 107 | 108 | x = self.avgpool(x) 109 | x = x.view(x.size(0), -1) 110 | x = self.fc(x) 111 | 112 | return x 113 | 114 | def resnetmtl32(pretrained=False, **kwargs): 115 | n = 5 116 | model = ResNetMtl(BasicBlockMtl, [n, n, n], **kwargs) 117 | return model 118 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/trainer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/mnemonics-training/1_train/trainer/__init__.py -------------------------------------------------------------------------------- /mnemonics-training/1_train/trainer/incremental.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import tqdm 3 | import torch.nn as nn 4 | from torch.optim import lr_scheduler 5 | from torchvision import datasets, models, transforms 6 | from utils.misc import * 7 | from utils.process_fp import process_inputs_fp 8 | import torch.nn.functional as F 9 | 10 | def incremental_train_and_eval(epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iteration, lamda, dist, K, lw_mr, fix_bn=False, weight_per_class=None, device=None): 11 | if device is None: 12 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 13 | T = 2.0 14 | beta = 0.25 15 | if iteration > start_iteration: 
16 | ref_model.eval() 17 | num_old_classes = ref_model.fc.out_features 18 | for epoch in range(epochs): 19 | tg_model.train() 20 | if fix_bn: 21 | for m in tg_model.modules(): 22 | if isinstance(m, nn.BatchNorm2d): 23 | m.eval() 24 | train_loss = 0 25 | train_loss1 = 0 26 | train_loss2 = 0 27 | correct = 0 28 | total = 0 29 | tg_lr_scheduler.step() 30 | print('\nEpoch: %d, LR: ' % epoch, end='') 31 | print(tg_lr_scheduler.get_lr()) 32 | for batch_idx, (inputs, targets) in enumerate(trainloader): 33 | inputs, targets = inputs.to(device), targets.to(device) 34 | tg_optimizer.zero_grad() 35 | outputs = tg_model(inputs) 36 | if iteration == start_iteration: 37 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 38 | else: 39 | ref_outputs = ref_model(inputs) 40 | loss1 = nn.KLDivLoss()(F.log_softmax(outputs[:,:num_old_classes]/T, dim=1), \ 41 | F.softmax(ref_outputs.detach()/T, dim=1)) * T * T * beta * num_old_classes 42 | loss2 = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 43 | loss = loss1 + loss2 44 | loss.backward() 45 | tg_optimizer.step() 46 | 47 | train_loss += loss.item() 48 | if iteration > start_iteration: 49 | train_loss1 += loss1.item() 50 | train_loss2 += loss2.item() 51 | _, predicted = outputs.max(1) 52 | total += targets.size(0) 53 | correct += predicted.eq(targets).sum().item() 54 | if iteration == start_iteration: 55 | print('Train set: {}, Train Loss: {:.4f} Acc: {:.4f}'.format(\ 56 | len(trainloader), train_loss/(batch_idx+1), 100.*correct/total)) 57 | else: 58 | print('Train set: {}, Train Loss1: {:.4f}, Train Loss2: {:.4f},\ 59 | Train Loss: {:.4f} Acc: {:.4f}'.format(len(trainloader), \ 60 | train_loss1/(batch_idx+1), train_loss2/(batch_idx+1), 61 | train_loss/(batch_idx+1), 100.*correct/total)) 62 | tg_model.eval() 63 | test_loss = 0 64 | correct = 0 65 | total = 0 66 | with torch.no_grad(): 67 | for batch_idx, (inputs, targets) in enumerate(testloader): 68 | inputs, targets = inputs.to(device), targets.to(device) 69 | 
outputs = tg_model(inputs) 70 | loss = nn.CrossEntropyLoss(weight_per_class)(outputs, targets) 71 | 72 | test_loss += loss.item() 73 | _, predicted = outputs.max(1) 74 | total += targets.size(0) 75 | correct += predicted.eq(targets).sum().item() 76 | print('Test set: {} Test Loss: {:.4f} Acc: {:.4f}'.format(\ 77 | len(testloader), test_loss/(batch_idx+1), 100.*correct/total)) 78 | return tg_model 79 | 80 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/mnemonics-training/1_train/utils/__init__.py -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/compute_accuracy.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision 3 | import numpy as np 4 | import torch.nn.functional as F 5 | from torchvision import datasets, models, transforms 6 | from scipy.spatial.distance import cdist 7 | from utils.misc import * 8 | from utils.process_fp import process_inputs_fp 9 | 10 | def map_labels(order_list, Y_set): 11 | map_Y = [] 12 | for idx in Y_set: 13 | map_Y.append(order_list.index(idx)) 14 | map_Y = np.array(map_Y) 15 | return map_Y 16 | 17 | def compute_accuracy(tg_model, free_model, tg_feature_model, class_means, X_protoset_cumuls, Y_protoset_cumuls, evalloader, order_list, is_start_iteration=False, fast_fc=None, scale=None, print_info=True, device=None, maml_lr=0.1, maml_epoch=50): 18 | if device is None: 19 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 20 | tg_feature_model.eval() 21 | tg_model.eval() 22 | if free_model is not None: 23 | free_model.eval() 24 | if fast_fc is None: 25 | transform_proto = 
transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5071, 0.4866, 0.4409), (0.2009, 0.1984, 0.2023)),]) 26 | protoset = torchvision.datasets.CIFAR100(root='./data', train=False, download=False, transform=transform_proto) 27 | X_protoset_array = np.array(X_protoset_cumuls).astype('uint8') 28 | protoset.test_data = X_protoset_array.reshape(-1, X_protoset_array.shape[2], X_protoset_array.shape[3], X_protoset_array.shape[4]) 29 | Y_protoset_cumuls = np.array(Y_protoset_cumuls).reshape(-1) 30 | map_Y_protoset_cumuls = map_labels(order_list, Y_protoset_cumuls) 31 | protoset.test_labels = map_Y_protoset_cumuls 32 | protoloader = torch.utils.data.DataLoader(protoset, batch_size=128, shuffle=True, num_workers=2) 33 | 34 | fast_fc = torch.from_numpy(np.float32(class_means[:,:,0].T)).to(device) 35 | fast_fc.requires_grad=True 36 | 37 | epoch_num = maml_epoch 38 | for epoch_idx in range(epoch_num): 39 | for the_inputs, the_targets in protoloader: 40 | the_inputs, the_targets = the_inputs.to(device), the_targets.to(device) 41 | the_features = tg_feature_model(the_inputs) 42 | the_logits = F.linear(F.normalize(torch.squeeze(the_features), p=2,dim=1), F.normalize(fast_fc, p=2, dim=1)) 43 | the_loss = F.cross_entropy(the_logits, the_targets) 44 | the_grad = torch.autograd.grad(the_loss, fast_fc) 45 | fast_fc = fast_fc - maml_lr * the_grad[0] 46 | correct = 0 47 | correct_icarl = 0 48 | correct_ncm = 0 49 | correct_maml = 0 50 | total = 0 51 | with torch.no_grad(): 52 | for batch_idx, (inputs, targets) in enumerate(evalloader): 53 | inputs, targets = inputs.to(device), targets.to(device) 54 | total += targets.size(0) 55 | if is_start_iteration: 56 | outputs = tg_model(inputs) 57 | else: 58 | outputs, outputs_feature = process_inputs_fp(tg_model, free_model, inputs) 59 | outputs = F.softmax(outputs, dim=1) 60 | if scale is not None: 61 | assert(scale.shape[0] == 1) 62 | assert(outputs.shape[1] == scale.shape[1]) 63 | outputs = outputs / scale.repeat(outputs.shape[0], 
1).type(torch.FloatTensor).to(device) 64 | _, predicted = outputs.max(1) 65 | correct += predicted.eq(targets).sum().item() 66 | 67 | if is_start_iteration: 68 | outputs_feature = np.squeeze(tg_feature_model(inputs)) 69 | sqd_icarl = cdist(class_means[:,:,0].T, outputs_feature, 'sqeuclidean') 70 | score_icarl = torch.from_numpy((-sqd_icarl).T).to(device) 71 | _, predicted_icarl = score_icarl.max(1) 72 | correct_icarl += predicted_icarl.eq(targets).sum().item() 73 | sqd_ncm = cdist(class_means[:,:,1].T, outputs_feature, 'sqeuclidean') 74 | score_ncm = torch.from_numpy((-sqd_ncm).T).to(device) 75 | _, predicted_ncm = score_ncm.max(1) 76 | correct_ncm += predicted_ncm.eq(targets).sum().item() 77 | the_logits = F.linear(F.normalize(torch.squeeze(outputs_feature), p=2,dim=1), F.normalize(fast_fc, p=2, dim=1)) 78 | _, predicted_maml = the_logits.max(1) 79 | correct_maml += predicted_maml.eq(targets).sum().item() 80 | cnn_acc = 100.*correct/total 81 | icarl_acc = 100.*correct_icarl/total 82 | ncm_acc = 100.*correct_ncm/total 83 | maml_acc = 100.*correct_maml/total 84 | if print_info: 85 | print(" Accuracy for LwF :\t\t{:.2f} %".format(cnn_acc)) 86 | print(" Accuracy for iCaRL :\t\t{:.2f} %".format(icarl_acc)) 87 | print(" The above results are the accuracy for the current phase.") 88 | print(" For the average accuracy, you need to record the results for all phases and calculate the average value.") 89 | return [cnn_acc, icarl_acc, ncm_acc, maml_acc], fast_fc 90 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/compute_features.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from torchvision import models 4 | from utils.misc import * 5 | from utils.process_fp import process_inputs_fp 6 | 7 | def compute_features(tg_model, free_model, tg_feature_model, is_start_iteration, evalloader, num_samples, num_features, device=None): 8 | 
if device is None: 9 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 10 | tg_feature_model.eval() 11 | tg_model.eval() 12 | if free_model is not None: 13 | free_model.eval() 14 | features = np.zeros([num_samples, num_features]) 15 | start_idx = 0 16 | with torch.no_grad(): 17 | for inputs, targets in evalloader: 18 | inputs = inputs.to(device) 19 | if is_start_iteration: 20 | the_feature = tg_feature_model(inputs) 21 | else: 22 | the_feature = process_inputs_fp(tg_model, free_model, inputs, feature_mode=True) 23 | features[start_idx:start_idx+inputs.shape[0], :] = np.squeeze(the_feature.cpu().numpy()) 24 | start_idx = start_idx+inputs.shape[0] 25 | assert(start_idx==num_samples) 26 | return features 27 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/conv2d_mtl.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch 3 | from torch.nn.parameter import Parameter 4 | import torch.nn.functional as F 5 | from torch.nn.modules.module import Module 6 | from torch.nn.modules.utils import _single, _pair, _triple 7 | 8 | class _ConvNdMtl(Module): 9 | 10 | def __init__(self, in_channels, out_channels, kernel_size, stride, 11 | padding, dilation, transposed, output_padding, groups, bias): 12 | super(_ConvNdMtl, self).__init__() 13 | if in_channels % groups != 0: 14 | raise ValueError('in_channels must be divisible by groups') 15 | if out_channels % groups != 0: 16 | raise ValueError('out_channels must be divisible by groups') 17 | self.in_channels = in_channels 18 | self.out_channels = out_channels 19 | self.kernel_size = kernel_size 20 | self.stride = stride 21 | self.padding = padding 22 | self.dilation = dilation 23 | self.transposed = transposed 24 | self.output_padding = output_padding 25 | self.groups = groups 26 | if transposed: 27 | self.weight = Parameter(torch.Tensor( 28 | in_channels, out_channels // groups, *kernel_size)) 29 
| self.mtl_weight = Parameter(torch.ones(in_channels, out_channels // groups, 1, 1)) 30 | else: 31 | self.weight = Parameter(torch.Tensor( 32 | out_channels, in_channels // groups, *kernel_size)) 33 | self.mtl_weight = Parameter(torch.ones(out_channels, in_channels // groups, 1, 1)) 34 | self.weight.requires_grad=False 35 | if bias: 36 | self.bias = Parameter(torch.Tensor(out_channels)) 37 | self.bias.requires_grad=False 38 | self.mtl_bias = Parameter(torch.zeros(out_channels)) 39 | else: 40 | self.register_parameter('bias', None) 41 | self.register_parameter('mtl_bias', None) 42 | self.reset_parameters() 43 | 44 | def reset_parameters(self): 45 | n = self.in_channels 46 | for k in self.kernel_size: 47 | n *= k 48 | stdv = 1. / math.sqrt(n) 49 | self.weight.data.uniform_(-stdv, stdv) 50 | self.mtl_weight.data.uniform_(1, 1) 51 | if self.bias is not None: 52 | self.bias.data.uniform_(-stdv, stdv) 53 | self.mtl_bias.data.uniform_(0, 0) 54 | 55 | def extra_repr(self): 56 | s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}' 57 | ', stride={stride}') 58 | if self.padding != (0,) * len(self.padding): 59 | s += ', padding={padding}' 60 | if self.dilation != (1,) * len(self.dilation): 61 | s += ', dilation={dilation}' 62 | if self.output_padding != (0,) * len(self.output_padding): 63 | s += ', output_padding={output_padding}' 64 | if self.groups != 1: 65 | s += ', groups={groups}' 66 | if self.bias is None: 67 | s += ', bias=False' 68 | return s.format(**self.__dict__) 69 | 70 | class Conv2dMtl(_ConvNdMtl): 71 | 72 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 73 | padding=0, dilation=1, groups=1, bias=True): 74 | kernel_size = _pair(kernel_size) 75 | stride = _pair(stride) 76 | padding = _pair(padding) 77 | dilation = _pair(dilation) 78 | super(Conv2dMtl, self).__init__( 79 | in_channels, out_channels, kernel_size, stride, padding, dilation, 80 | False, _pair(0), groups, bias) 81 | 82 | def forward(self, input): 83 | new_mtl_weight = 
self.mtl_weight.expand(self.weight.shape) 84 | new_weight = self.weight.mul(new_mtl_weight) 85 | if self.bias is not None: 86 | new_bias = self.bias + self.mtl_bias 87 | else: 88 | new_bias = None 89 | return F.conv2d(input, new_weight, new_bias, self.stride, 90 | self.padding, self.dilation, self.groups) 91 | 92 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/gpu_tools.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import time 4 | 5 | def check_memory(cuda_device): 6 | devices_info = os.popen('"/usr/bin/nvidia-smi" --query-gpu=memory.total,memory.used --format=csv,nounits,noheader').read().strip().split("\n") 7 | total, used = devices_info[int(cuda_device)].split(',') 8 | return total, used 9 | 10 | def occupy_memory(cuda_device): 11 | total, used = check_memory(cuda_device) 12 | total = int(total) 13 | used = int(used) 14 | max_mem = int(total * 0.90) 15 | print('Total memory: ' + str(total) + ', used memory: ' + str(used)) 16 | block_mem = max_mem - used 17 | if block_mem > 0: 18 | x = torch.cuda.FloatTensor(256, 1024, block_mem) 19 | del x 20 | 21 | def set_gpu(cuda_device): 22 | os.environ['CUDA_VISIBLE_DEVICES'] = cuda_device 23 | print('Using gpu:', cuda_device) 24 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/misc.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function, division 2 | import os 3 | import torch 4 | import sys 5 | import time 6 | import subprocess 7 | import torch.nn as nn 8 | import torch.nn.init as init 9 | import os.path as osp 10 | try: 11 | import cPickle as pickle 12 | except: 13 | import pickle 14 | 15 | def savepickle(data, file_path): 16 | mkdir_p(osp.dirname(file_path), delete=False) 17 | print('pickle into', file_path) 18 | with open(file_path, 'wb') as 
f: 19 | pickle.dump(data, f, pickle.HIGHEST_PROTOCOL) 20 | 21 | def unpickle(file_path): 22 | with open(file_path, 'rb') as f: 23 | data = pickle.load(f) 24 | return data 25 | 26 | def mkdir_p(path, delete=False, print_info=True): 27 | if path == '': return 28 | 29 | if delete: 30 | subprocess.call(('rm -r ' + path).split()) 31 | if not osp.exists(path): 32 | if print_info: 33 | print('mkdir -p ' + path) 34 | subprocess.call(('mkdir -p ' + path).split()) 35 | 36 | def get_mean_and_std(dataset): 37 | dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2) 38 | mean = torch.zeros(3) 39 | std = torch.zeros(3) 40 | print('==> Computing mean and std..') 41 | for inputs, targets in dataloader: 42 | for i in range(3): 43 | mean[i] += inputs[:,i,:,:].mean() 44 | std[i] += inputs[:,i,:,:].std() 45 | mean.div_(len(dataset)) 46 | std.div_(len(dataset)) 47 | return mean, std 48 | 49 | def init_params(net): 50 | '''Init layer parameters.''' 51 | for m in net.modules(): 52 | if isinstance(m, nn.Conv2d): 53 | init.kaiming_normal_(m.weight, mode='fan_out') 54 | if m.bias is not None: 55 | init.constant_(m.bias, 0) 56 | elif isinstance(m, nn.BatchNorm2d): 57 | init.constant_(m.weight, 1) 58 | init.constant_(m.bias, 0) 59 | elif isinstance(m, nn.Linear): 60 | init.normal_(m.weight, std=1e-3) 61 | if m.bias is not None: 62 | init.constant_(m.bias, 0) 63 | 64 | _, term_width = os.popen('stty size', 'r').read().split() 65 | term_width = int(term_width) 66 | 67 | TOTAL_BAR_LENGTH = 65. 68 | last_time = time.time() 69 | begin_time = last_time 70 | def progress_bar(current, total, msg=None): 71 | global last_time, begin_time 72 | if current == 0: 73 | begin_time = time.time() # Reset for new bar. 
74 | 75 | cur_len = int(TOTAL_BAR_LENGTH*current/total) 76 | rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1 77 | 78 | sys.stdout.write(' [') 79 | for i in range(cur_len): 80 | sys.stdout.write('=') 81 | sys.stdout.write('>') 82 | for i in range(rest_len): 83 | sys.stdout.write('.') 84 | sys.stdout.write(']') 85 | 86 | cur_time = time.time() 87 | step_time = cur_time - last_time 88 | last_time = cur_time 89 | tot_time = cur_time - begin_time 90 | 91 | L = [] 92 | L.append(' Step: %s' % format_time(step_time)) 93 | L.append(' | Tot: %s' % format_time(tot_time)) 94 | if msg: 95 | L.append(' | ' + msg) 96 | 97 | msg = ''.join(L) 98 | sys.stdout.write(msg) 99 | for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3): 100 | sys.stdout.write(' ') 101 | 102 | for i in range(term_width-int(TOTAL_BAR_LENGTH/2)+2): 103 | sys.stdout.write('\b') 104 | sys.stdout.write(' %d/%d ' % (current+1, total)) 105 | 106 | if current < total-1: 107 | sys.stdout.write('\r') 108 | else: 109 | sys.stdout.write('\n') 110 | sys.stdout.flush() 111 | 112 | def format_time(seconds): 113 | days = int(seconds / 3600/24) 114 | seconds = seconds - days*3600*24 115 | hours = int(seconds / 3600) 116 | seconds = seconds - hours*3600 117 | minutes = int(seconds / 60) 118 | seconds = seconds - minutes*60 119 | secondsf = int(seconds) 120 | seconds = seconds - secondsf 121 | millis = int(seconds*1000) 122 | 123 | f = '' 124 | i = 1 125 | if days > 0: 126 | f += str(days) + 'D' 127 | i += 1 128 | if hours > 0 and i <= 2: 129 | f += str(hours) + 'h' 130 | i += 1 131 | if minutes > 0 and i <= 2: 132 | f += str(minutes) + 'm' 133 | i += 1 134 | if secondsf > 0 and i <= 2: 135 | f += str(secondsf) + 's' 136 | i += 1 137 | if millis > 0 and i <= 2: 138 | f += str(millis) + 'ms' 139 | i += 1 140 | if f == '': 141 | f = '0ms' 142 | return f 143 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/process_fp.py: 
-------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from utils.misc import * 4 | 5 | def process_inputs_fp(tg_model, free_model, inputs, fusion_mode=False, feature_mode=False): 6 | tg_model_group1 = [tg_model.conv1, tg_model.bn1, tg_model.relu, tg_model.layer1] 7 | tg_model_group1 = nn.Sequential(*tg_model_group1) 8 | tg_fp1 = tg_model_group1(inputs) 9 | fp1 = tg_fp1 10 | tg_model_group2 = tg_model.layer2 11 | tg_fp2 = tg_model_group2(fp1) 12 | fp2 = tg_fp2 13 | tg_model_group3 = [tg_model.layer3, tg_model.avgpool] 14 | tg_model_group3 = nn.Sequential(*tg_model_group3) 15 | tg_fp3 = tg_model_group3(fp2) 16 | fp3 = tg_fp3 17 | fp3 = fp3.view(fp3.size(0), -1) 18 | if feature_mode: 19 | return fp3 20 | else: 21 | outputs = tg_model.fc(fp3) 22 | feature = fp3 23 | return outputs, feature 24 | -------------------------------------------------------------------------------- /mnemonics-training/1_train/utils/process_mnemonics.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.optim as optim 3 | import torchvision 4 | import time 5 | import os 6 | import argparse 7 | import numpy as np 8 | 9 | def tensor2im(input_image, imtype=np.uint8): 10 | mean = [0.5071, 0.4866, 0.4409] 11 | std = [0.2009, 0.1984, 0.2023] 12 | if not isinstance(input_image, np.ndarray): 13 | if isinstance(input_image, torch.Tensor): 14 | image_tensor = input_image.data 15 | else: 16 | return input_image 17 | image_numpy = image_tensor.cpu().detach().float().numpy() 18 | if image_numpy.shape[0] == 1: 19 | image_numpy = np.tile(image_numpy, (3, 1, 1)) 20 | for i in range(len(mean)): 21 | image_numpy[i] = image_numpy[i] * std[i] + mean[i] 22 | image_numpy = image_numpy * 255 23 | image_numpy = np.transpose(image_numpy, (1, 2, 0)) 24 | else: 25 | image_numpy = input_image 26 | return image_numpy.astype(imtype) 27 | 28 | def process_mnemonics(X_protoset_cumuls, 
Y_protoset_cumuls, mnemonics_raw, mnemonics_label, order_list, nb_cl_fg, nb_cl, iteration, start_iter): 29 | mnemonics = mnemonics_raw[0] 30 | mnemonics_array_new = np.zeros((len(mnemonics), len(mnemonics[0]), 32, 32, 3)) 31 | mnemonics_list = [] 32 | mnemonics_label_list = [] 33 | for idx in range(len(mnemonics)): 34 | this_mnemonics = [] 35 | for sub_idx in range(len(mnemonics[idx])): 36 | processed_img = tensor2im(mnemonics[idx][sub_idx]) 37 | mnemonics_array_new[idx][sub_idx] = processed_img 38 | diff = len(X_protoset_cumuls) - len(mnemonics_array_new) 39 | for idx in range(len(mnemonics_array_new)): 40 | X_protoset_cumuls[idx+diff] = mnemonics_array_new[idx] 41 | return X_protoset_cumuls 42 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/README.md: -------------------------------------------------------------------------------- 1 | ## Evaluation on our models 2 | 3 | ### Clone this repository 4 | ```bash 5 | cd ~ 6 | git clone git@github.com:yaoyao-liu/mnemonics-training.git 7 | ``` 8 | 9 | ### Processing the datasets 10 | 11 | Process ImageNet-Sub and ImageNet: 12 | ```bash 13 | cd ~/mnemonics-training/eval/process_imagenet 14 | python generate_imagenet_subset.py 15 | python generate_imagenet.py 16 | ``` 17 | 18 | ### Download models 19 | 20 | Download the models for CIFAR-100, ImageNet-Sub and ImageNet: 21 | ```bash 22 | cd ~/mnemonics-training/eval 23 | sh ./script/download_ckpt.sh 24 | ``` 25 | You may also download the checkpoints on [Google Drive](https://drive.google.com/file/d/1sKO2BOssWgTFBNZbM50qDzgk6wqg4_l8/view). 
26 | 27 | ### Running the evaluation 28 | 29 | Run evaluation code with our models: 30 | ```bash 31 | cd ~/mnemonics-training/eval 32 | sh run_eval.sh 33 | ``` 34 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | import os 4 | import argparse 5 | import numpy as np 6 | from trainer.train import Trainer 7 | from utils.gpu_tools import occupy_memory 8 | 9 | if __name__ == '__main__': 10 | parser = argparse.ArgumentParser() 11 | parser.add_argument('--gpu', default='0') # GPU id 12 | parser.add_argument('--dataset', default='cifar100', type=str, choices=['cifar100', 'imagenet_sub', 'imagenet']) 13 | parser.add_argument('--data_dir', default='data/seed_1993_subset_100_imagenet/data', type=str) 14 | parser.add_argument('--num_classes', default=100, type=int) 15 | parser.add_argument('--nb_cl_fg', default=50, type=int, help='the number of classes in first group') 16 | parser.add_argument('--nb_cl', default=10, type=int, help='Classes per group') 17 | parser.add_argument('--nb_protos', default=20, type=int, help='Number of prototypes per class at the end') 18 | parser.add_argument('--nb_runs', default=1, type=int, help='Number of runs (random ordering of classes at each run)') 19 | parser.add_argument('--epochs', default=160, type=int, help='Epochs') 20 | parser.add_argument('--T', default=2, type=float, help='Temperature for distillation') 21 | parser.add_argument('--beta', default=0.25, type=float, help='Beta for distillation') 22 | parser.add_argument('--resume', action='store_true', help='resume from checkpoint') 23 | parser.add_argument('--resume_fg', action='store_true', help='resume first group from checkpoint') 24 | parser.add_argument('--ckpt_dir_fg', type=str, default='-') 25 | parser.add_argument('--dynamic_budget', action='store_true', help='fix budget') 26 | 
parser.add_argument('--phase', type=str, default='train', choices=['train', 'eval']) 27 | parser.add_argument('--ckpt_label', type=str, default='exp01') 28 | parser.add_argument('--use_mtl', action='store_true', help='using mtl weights') 29 | parser.add_argument('--num_workers', default=2, type=int, help='the number of workers for loading data') 30 | parser.add_argument('--load_iter', default=0, type=int) 31 | parser.add_argument('--mimic_score', action='store_true', help='To mimic scores for cosine embedding') 32 | parser.add_argument('--lw_ms', default=1, type=float, help='loss weight for mimicking score') 33 | parser.add_argument('--rs_ratio', default=0, type=float, help='The ratio for resample') 34 | parser.add_argument('--imprint_weights', action='store_true', help='Imprint the weights for novel classes') 35 | parser.add_argument('--less_forget', action='store_true', help='Less forgetful') 36 | parser.add_argument('--lamda', default=5, type=float, help='Lamda for LF') 37 | parser.add_argument('--adapt_lamda', action='store_true', help='Adaptively change lamda') 38 | parser.add_argument('--dist', default=0.5, type=float, help='Dist for MarginRankingLoss') 39 | parser.add_argument('--K', default=2, type=int, help='K for MarginRankingLoss') 40 | parser.add_argument('--lw_mr', default=1, type=float, help='loss weight for margin ranking loss') 41 | parser.add_argument('--random_seed', default=1993, type=int, help='random seed') 42 | parser.add_argument('--train_batch_size', default=128, type=int) 43 | parser.add_argument('--test_batch_size', default=100, type=int) 44 | parser.add_argument('--eval_batch_size', default=128, type=int) 45 | parser.add_argument('--base_lr1', default=0.1, type=float) 46 | parser.add_argument('--base_lr2', default=0.1, type=float) 47 | parser.add_argument('--lr_factor', default=0.1, type=float) 48 | parser.add_argument('--custom_weight_decay', default=5e-4, type=float) 49 | parser.add_argument('--custom_momentum', default=0.9, type=float) 
50 | parser.add_argument('--load_ckpt_prefix', type=str, default='-') 51 | parser.add_argument('--load_order', type=str, default='-') 52 | parser.add_argument('--add_str', default=None, type=str) 53 | 54 | the_args = parser.parse_args() 55 | assert(the_args.nb_cl_fg % the_args.nb_cl == 0) 56 | assert(the_args.nb_cl_fg >= the_args.nb_cl) 57 | 58 | print(the_args) 59 | 60 | np.random.seed(the_args.random_seed) 61 | 62 | if not os.path.exists('./logs/cifar100_nfg50_ncls2_nproto20_mtl_exp01'): 63 | print('Download checkpoints from Google Drive.') 64 | os.system('sh ./script/download_ckpt.sh') 65 | 66 | os.environ['CUDA_VISIBLE_DEVICES'] = the_args.gpu 67 | print('Using gpu:', the_args.gpu) 68 | 69 | occupy_memory(the_args.gpu) 70 | print('Occupy GPU memory in advance.') 71 | 72 | trainer = Trainer(the_args) 73 | trainer.eval() 74 | 75 | 76 | 77 | 78 | 79 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/models/modified_linear.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.nn.parameter import Parameter 5 | from torch.nn import functional as F 6 | from torch.nn import Module 7 | 8 | class CosineLinear(Module): 9 | def __init__(self, in_features, out_features, sigma=True): 10 | super(CosineLinear, self).__init__() 11 | self.in_features = in_features 12 | self.out_features = out_features 13 | self.weight = Parameter(torch.Tensor(out_features, in_features)) 14 | if sigma: 15 | self.sigma = Parameter(torch.Tensor(1)) 16 | else: 17 | self.register_parameter('sigma', None) 18 | self.reset_parameters() 19 | 20 | def reset_parameters(self): 21 | stdv = 1. 
/ math.sqrt(self.weight.size(1)) 22 | self.weight.data.uniform_(-stdv, stdv) 23 | if self.sigma is not None: 24 | self.sigma.data.fill_(1) #for initialization of sigma 25 | 26 | def forward(self, input): 27 | #w_norm = self.weight.data.norm(dim=1, keepdim=True) 28 | #w_norm = w_norm.expand_as(self.weight).add_(self.epsilon) 29 | #x_norm = input.data.norm(dim=1, keepdim=True) 30 | #x_norm = x_norm.expand_as(input).add_(self.epsilon) 31 | #w = self.weight.div(w_norm) 32 | #x = input.div(x_norm) 33 | out = F.linear(F.normalize(input, p=2,dim=1), \ 34 | F.normalize(self.weight, p=2, dim=1)) 35 | if self.sigma is not None: 36 | out = self.sigma * out 37 | return out 38 | 39 | class SplitCosineLinear(Module): 40 | #consists of two fc layers and concatenate their outputs 41 | def __init__(self, in_features, out_features1, out_features2, sigma=True): 42 | super(SplitCosineLinear, self).__init__() 43 | self.in_features = in_features 44 | self.out_features = out_features1 + out_features2 45 | self.fc1 = CosineLinear(in_features, out_features1, False) 46 | self.fc2 = CosineLinear(in_features, out_features2, False) 47 | if sigma: 48 | self.sigma = Parameter(torch.Tensor(1)) 49 | self.sigma.data.fill_(1) 50 | else: 51 | self.register_parameter('sigma', None) 52 | 53 | def forward(self, x): 54 | out1 = self.fc1(x) 55 | out2 = self.fc2(x) 56 | out = torch.cat((out1, out2), dim=1) #concatenate along the channel 57 | if self.sigma is not None: 58 | out = self.sigma * out 59 | return out -------------------------------------------------------------------------------- /mnemonics-training/2_eval/models/modified_resnet.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | import torch.utils.model_zoo as model_zoo 4 | import models.modified_linear as modified_linear 5 | 6 | def conv3x3(in_planes, out_planes, stride=1): 7 | """3x3 convolution with padding""" 8 | return nn.Conv2d(in_planes, out_planes, 
kernel_size=3, stride=stride, 9 | padding=1, bias=False) 10 | 11 | 12 | class BasicBlock(nn.Module): 13 | expansion = 1 14 | 15 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 16 | super(BasicBlock, self).__init__() 17 | self.conv1 = conv3x3(inplanes, planes, stride) 18 | self.bn1 = nn.BatchNorm2d(planes) 19 | self.relu = nn.ReLU(inplace=True) 20 | self.conv2 = conv3x3(planes, planes) 21 | self.bn2 = nn.BatchNorm2d(planes) 22 | self.downsample = downsample 23 | self.stride = stride 24 | self.last = last 25 | 26 | def forward(self, x): 27 | residual = x 28 | 29 | out = self.conv1(x) 30 | out = self.bn1(out) 31 | out = self.relu(out) 32 | 33 | out = self.conv2(out) 34 | out = self.bn2(out) 35 | 36 | if self.downsample is not None: 37 | residual = self.downsample(x) 38 | 39 | out += residual 40 | if not self.last: #remove ReLU in the last layer 41 | out = self.relu(out) 42 | 43 | return out 44 | 45 | class ResNet(nn.Module): 46 | 47 | def __init__(self, block, layers, num_classes=1000): 48 | self.inplanes = 64 49 | super(ResNet, self).__init__() 50 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 51 | bias=False) 52 | self.bn1 = nn.BatchNorm2d(64) 53 | self.relu = nn.ReLU(inplace=True) 54 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 55 | self.layer1 = self._make_layer(block, 64, layers[0]) 56 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 57 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 58 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, last_phase=True) 59 | self.avgpool = nn.AvgPool2d(7, stride=1) 60 | self.fc = modified_linear.CosineLinear(512 * block.expansion, num_classes) 61 | 62 | for m in self.modules(): 63 | if isinstance(m, nn.Conv2d): 64 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 65 | elif isinstance(m, nn.BatchNorm2d): 66 | nn.init.constant_(m.weight, 1) 67 | nn.init.constant_(m.bias, 0) 68 | 69 | def 
_make_layer(self, block, planes, blocks, stride=1, last_phase=False): 70 | downsample = None 71 | if stride != 1 or self.inplanes != planes * block.expansion: 72 | downsample = nn.Sequential( 73 | nn.Conv2d(self.inplanes, planes * block.expansion, 74 | kernel_size=1, stride=stride, bias=False), 75 | nn.BatchNorm2d(planes * block.expansion), 76 | ) 77 | 78 | layers = [] 79 | layers.append(block(self.inplanes, planes, stride, downsample)) 80 | self.inplanes = planes * block.expansion 81 | if last_phase: 82 | for i in range(1, blocks-1): 83 | layers.append(block(self.inplanes, planes)) 84 | layers.append(block(self.inplanes, planes, last=True)) 85 | else: 86 | for i in range(1, blocks): 87 | layers.append(block(self.inplanes, planes)) 88 | 89 | return nn.Sequential(*layers) 90 | 91 | def forward(self, x): 92 | x = self.conv1(x) 93 | x = self.bn1(x) 94 | x = self.relu(x) 95 | x = self.maxpool(x) 96 | 97 | x = self.layer1(x) 98 | x = self.layer2(x) 99 | x = self.layer3(x) 100 | x = self.layer4(x) 101 | 102 | x = self.avgpool(x) 103 | x = x.view(x.size(0), -1) 104 | x = self.fc(x) 105 | 106 | return x 107 | 108 | 109 | def resnet18(pretrained=False, **kwargs): 110 | """Constructs a ResNet-18 model. 
111 | 112 | Args: 113 | pretrained (bool): If True, returns a model pre-trained on ImageNet 114 | """ 115 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs) 116 | return model 117 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/models/modified_resnet_cifar.py: -------------------------------------------------------------------------------- 1 | #remove ReLU in the last layer, and use cosine layer to replace nn.Linear 2 | import torch.nn as nn 3 | import math 4 | import torch.utils.model_zoo as model_zoo 5 | import models.modified_linear as modified_linear 6 | 7 | def conv3x3(in_planes, out_planes, stride=1): 8 | """3x3 convolution with padding""" 9 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 10 | padding=1, bias=False) 11 | 12 | class BasicBlock(nn.Module): 13 | expansion = 1 14 | 15 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 16 | super(BasicBlock, self).__init__() 17 | self.conv1 = conv3x3(inplanes, planes, stride) 18 | self.bn1 = nn.BatchNorm2d(planes) 19 | self.relu = nn.ReLU(inplace=True) 20 | self.conv2 = conv3x3(planes, planes) 21 | self.bn2 = nn.BatchNorm2d(planes) 22 | self.downsample = downsample 23 | self.stride = stride 24 | self.last = last 25 | 26 | def forward(self, x): 27 | residual = x 28 | 29 | out = self.conv1(x) 30 | out = self.bn1(out) 31 | out = self.relu(out) 32 | 33 | out = self.conv2(out) 34 | out = self.bn2(out) 35 | 36 | if self.downsample is not None: 37 | residual = self.downsample(x) 38 | 39 | out += residual 40 | if not self.last: #remove ReLU in the last layer 41 | out = self.relu(out) 42 | 43 | return out 44 | 45 | class ResNet(nn.Module): 46 | 47 | def __init__(self, block, layers, num_classes=10): 48 | self.inplanes = 16 49 | super(ResNet, self).__init__() 50 | self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, 51 | bias=False) 52 | self.bn1 = nn.BatchNorm2d(16) 53 | self.relu = 
nn.ReLU(inplace=True) 54 | self.layer1 = self._make_layer(block, 16, layers[0]) 55 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 56 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2, last_phase=True) 57 | self.avgpool = nn.AvgPool2d(8, stride=1) 58 | self.fc = modified_linear.CosineLinear(64 * block.expansion, num_classes) 59 | 60 | for m in self.modules(): 61 | if isinstance(m, nn.Conv2d): 62 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 63 | elif isinstance(m, nn.BatchNorm2d): 64 | nn.init.constant_(m.weight, 1) 65 | nn.init.constant_(m.bias, 0) 66 | 67 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 68 | downsample = None 69 | if stride != 1 or self.inplanes != planes * block.expansion: 70 | downsample = nn.Sequential( 71 | nn.Conv2d(self.inplanes, planes * block.expansion, 72 | kernel_size=1, stride=stride, bias=False), 73 | nn.BatchNorm2d(planes * block.expansion), 74 | ) 75 | 76 | layers = [] 77 | layers.append(block(self.inplanes, planes, stride, downsample)) 78 | self.inplanes = planes * block.expansion 79 | if last_phase: 80 | for i in range(1, blocks-1): 81 | layers.append(block(self.inplanes, planes)) 82 | layers.append(block(self.inplanes, planes, last=True)) 83 | else: 84 | for i in range(1, blocks): 85 | layers.append(block(self.inplanes, planes)) 86 | 87 | return nn.Sequential(*layers) 88 | 89 | def forward(self, x): 90 | x = self.conv1(x) 91 | x = self.bn1(x) 92 | x = self.relu(x) 93 | 94 | x = self.layer1(x) 95 | x = self.layer2(x) 96 | x = self.layer3(x) 97 | 98 | x = self.avgpool(x) 99 | x = x.view(x.size(0), -1) 100 | x = self.fc(x) 101 | 102 | return x 103 | 104 | def resnet20(pretrained=False, **kwargs): 105 | n = 3 106 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 107 | return model 108 | 109 | def resnet32(pretrained=False, **kwargs): 110 | n = 5 111 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 112 | return model 113 | 
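The layer arithmetic behind `resnet20` and `resnet32` above can be checked with a small, torch-free sketch. It assumes 32x32 CIFAR inputs and the usual convention of counting only the stem convolution, the two 3x3 convolutions per block, and the final `fc` layer (the 1x1 downsample convolutions are not counted). The helper names `conv_out` and `cifar_resnet_summary` are hypothetical and are not part of the repository.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def cifar_resnet_summary(n, input_size=32):
    """Return (depth, final spatial size, flattened feature dim) for n blocks per stage."""
    # conv1: 3x3, stride 1, padding 1 keeps the spatial size.
    size = conv_out(input_size, kernel=3, stride=1, padding=1)
    # Only the first block of each stage uses the stage stride;
    # the remaining blocks use stride 1 and leave the size unchanged.
    for stride in (1, 2, 2):  # layer1, layer2, layer3
        size = conv_out(size, kernel=3, stride=stride, padding=1)
    # AvgPool2d(8, stride=1) collapses the 8x8 map to 1x1.
    size = conv_out(size, kernel=8, stride=1, padding=0)
    depth = 6 * n + 2  # 1 stem conv + 3 stages * n blocks * 2 convs + 1 fc
    feature_dim = 64 * size * size  # layer3 has 64 channels, expansion = 1
    return depth, size, feature_dim


if __name__ == "__main__":
    for name, n in (("resnet20", 3), ("resnet32", 5)):
        print(name, cifar_resnet_summary(n))
```

Under these assumptions both variants flatten to a 64-dimensional feature, which matches the `64 * block.expansion` input size passed to `CosineLinear`.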
-------------------------------------------------------------------------------- /mnemonics-training/2_eval/models/modified_resnetmtl.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | import torch.utils.model_zoo as model_zoo 4 | import models.modified_linear as modified_linear 5 | from utils.incremental.conv2d_mtl import Conv2dMtl 6 | 7 | def conv3x3mtl(in_planes, out_planes, stride=1): 8 | """3x3 convolution with padding""" 9 | return Conv2dMtl(in_planes, out_planes, kernel_size=3, stride=stride, 10 | padding=1, bias=False) 11 | 12 | 13 | class BasicBlockMtl(nn.Module): 14 | expansion = 1 15 | 16 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 17 | super(BasicBlockMtl, self).__init__() 18 | self.conv1 = conv3x3mtl(inplanes, planes, stride) 19 | self.bn1 = nn.BatchNorm2d(planes) 20 | self.relu = nn.ReLU(inplace=True) 21 | self.conv2 = conv3x3mtl(planes, planes) 22 | self.bn2 = nn.BatchNorm2d(planes) 23 | self.downsample = downsample 24 | self.stride = stride 25 | self.last = last 26 | 27 | def forward(self, x): 28 | residual = x 29 | 30 | out = self.conv1(x) 31 | out = self.bn1(out) 32 | out = self.relu(out) 33 | 34 | out = self.conv2(out) 35 | out = self.bn2(out) 36 | 37 | if self.downsample is not None: 38 | residual = self.downsample(x) 39 | 40 | out += residual 41 | if not self.last: #remove ReLU in the last layer 42 | out = self.relu(out) 43 | 44 | return out 45 | 46 | class ResNetMtl(nn.Module): 47 | 48 | def __init__(self, block, layers, num_classes=1000): 49 | self.inplanes = 64 50 | super(ResNetMtl, self).__init__() 51 | self.conv1 = Conv2dMtl(3, 64, kernel_size=7, stride=2, padding=3, 52 | bias=False) 53 | self.bn1 = nn.BatchNorm2d(64) 54 | self.relu = nn.ReLU(inplace=True) 55 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 56 | self.layer1 = self._make_layer(block, 64, layers[0]) 57 | self.layer2 = self._make_layer(block, 
128, layers[1], stride=2) 58 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 59 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, last_phase=True) 60 | self.avgpool = nn.AvgPool2d(7, stride=1) 61 | self.fc = modified_linear.CosineLinear(512 * block.expansion, num_classes) 62 | 63 | for m in self.modules(): 64 | if isinstance(m, Conv2dMtl): 65 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 66 | elif isinstance(m, nn.BatchNorm2d): 67 | nn.init.constant_(m.weight, 1) 68 | nn.init.constant_(m.bias, 0) 69 | 70 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 71 | downsample = None 72 | if stride != 1 or self.inplanes != planes * block.expansion: 73 | downsample = nn.Sequential( 74 | Conv2dMtl(self.inplanes, planes * block.expansion, 75 | kernel_size=1, stride=stride, bias=False), 76 | nn.BatchNorm2d(planes * block.expansion), 77 | ) 78 | 79 | layers = [] 80 | layers.append(block(self.inplanes, planes, stride, downsample)) 81 | self.inplanes = planes * block.expansion 82 | if last_phase: 83 | for i in range(1, blocks-1): 84 | layers.append(block(self.inplanes, planes)) 85 | layers.append(block(self.inplanes, planes, last=True)) 86 | else: 87 | for i in range(1, blocks): 88 | layers.append(block(self.inplanes, planes)) 89 | 90 | return nn.Sequential(*layers) 91 | 92 | def forward(self, x): 93 | x = self.conv1(x) 94 | x = self.bn1(x) 95 | x = self.relu(x) 96 | x = self.maxpool(x) 97 | 98 | x = self.layer1(x) 99 | x = self.layer2(x) 100 | x = self.layer3(x) 101 | x = self.layer4(x) 102 | 103 | x = self.avgpool(x) 104 | x = x.view(x.size(0), -1) 105 | x = self.fc(x) 106 | 107 | return x 108 | 109 | 110 | def resnetmtl18(pretrained=False, **kwargs): 111 | """Constructs a ResNet-18 model. 
112 | 113 | Args: 114 | pretrained (bool): If True, returns a model pre-trained on ImageNet 115 | """ 116 | model = ResNetMtl(BasicBlockMtl, [2, 2, 2, 2], **kwargs) 117 | return model 118 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/models/modified_resnetmtl_cifar.py: -------------------------------------------------------------------------------- 1 | #remove ReLU in the last layer, and use cosine layer to replace nn.Linear 2 | import torch.nn as nn 3 | import math 4 | import torch.utils.model_zoo as model_zoo 5 | import models.modified_linear as modified_linear 6 | from utils.incremental.conv2d_mtl import Conv2dMtl 7 | 8 | def conv3x3mtl(in_planes, out_planes, stride=1): 9 | """3x3 convolution with padding""" 10 | return Conv2dMtl(in_planes, out_planes, kernel_size=3, stride=stride, 11 | padding=1, bias=False) 12 | 13 | class BasicBlockMtl(nn.Module): 14 | expansion = 1 15 | 16 | def __init__(self, inplanes, planes, stride=1, downsample=None, last=False): 17 | super(BasicBlockMtl, self).__init__() 18 | self.conv1 = conv3x3mtl(inplanes, planes, stride) 19 | self.bn1 = nn.BatchNorm2d(planes) 20 | self.relu = nn.ReLU(inplace=True) 21 | self.conv2 = conv3x3mtl(planes, planes) 22 | self.bn2 = nn.BatchNorm2d(planes) 23 | self.downsample = downsample 24 | self.stride = stride 25 | self.last = last 26 | 27 | def forward(self, x): 28 | residual = x 29 | 30 | out = self.conv1(x) 31 | out = self.bn1(out) 32 | out = self.relu(out) 33 | 34 | out = self.conv2(out) 35 | out = self.bn2(out) 36 | 37 | if self.downsample is not None: 38 | residual = self.downsample(x) 39 | 40 | out += residual 41 | if not self.last: #remove ReLU in the last layer 42 | out = self.relu(out) 43 | 44 | return out 45 | 46 | class ResNetMtl(nn.Module): 47 | 48 | def __init__(self, block, layers, num_classes=10): 49 | self.inplanes = 16 50 | super(ResNetMtl, self).__init__() 51 | self.conv1 = Conv2dMtl(3, 16, kernel_size=3, stride=1, 
padding=1, 52 | bias=False) 53 | self.bn1 = nn.BatchNorm2d(16) 54 | self.relu = nn.ReLU(inplace=True) 55 | self.layer1 = self._make_layer(block, 16, layers[0]) 56 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 57 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2, last_phase=True) 58 | self.avgpool = nn.AvgPool2d(8, stride=1) 59 | self.fc = modified_linear.CosineLinear(64 * block.expansion, num_classes) 60 | 61 | for m in self.modules(): 62 | if isinstance(m, Conv2dMtl): 63 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 64 | elif isinstance(m, nn.BatchNorm2d): 65 | nn.init.constant_(m.weight, 1) 66 | nn.init.constant_(m.bias, 0) 67 | 68 | def _make_layer(self, block, planes, blocks, stride=1, last_phase=False): 69 | downsample = None 70 | if stride != 1 or self.inplanes != planes * block.expansion: 71 | downsample = nn.Sequential( 72 | Conv2dMtl(self.inplanes, planes * block.expansion, 73 | kernel_size=1, stride=stride, bias=False), 74 | nn.BatchNorm2d(planes * block.expansion), 75 | ) 76 | 77 | layers = [] 78 | layers.append(block(self.inplanes, planes, stride, downsample)) 79 | self.inplanes = planes * block.expansion 80 | if last_phase: 81 | for i in range(1, blocks-1): 82 | layers.append(block(self.inplanes, planes)) 83 | layers.append(block(self.inplanes, planes, last=True)) 84 | else: 85 | for i in range(1, blocks): 86 | layers.append(block(self.inplanes, planes)) 87 | 88 | return nn.Sequential(*layers) 89 | 90 | def forward(self, x): 91 | x = self.conv1(x) 92 | x = self.bn1(x) 93 | x = self.relu(x) 94 | 95 | x = self.layer1(x) 96 | x = self.layer2(x) 97 | x = self.layer3(x) 98 | 99 | x = self.avgpool(x) 100 | x = x.view(x.size(0), -1) 101 | x = self.fc(x) 102 | 103 | return x 104 | 105 | def resnetmtl20(pretrained=False, **kwargs): 106 | n = 3 107 | model = ResNetMtl(BasicBlockMtl, [n, n, n], **kwargs) 108 | return model 109 | 110 | def resnetmtl32(pretrained=False, **kwargs): 111 | n = 5 112 | model = 
ResNetMtl(BasicBlockMtl, [n, n, n], **kwargs) 113 | return model 114 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/models/resnet_cifar.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | import torch.utils.model_zoo as model_zoo 4 | from utils.incremental.conv2d_mtl import Conv2d 5 | 6 | def conv3x3(in_planes, out_planes, stride=1): 7 | """3x3 convolution with padding""" 8 | return Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 9 | padding=1, bias=False) 10 | 11 | class BasicBlock(nn.Module): 12 | expansion = 1 13 | 14 | def __init__(self, inplanes, planes, stride=1, downsample=None): 15 | super(BasicBlock, self).__init__() 16 | self.conv1 = conv3x3(inplanes, planes, stride) 17 | self.bn1 = nn.BatchNorm2d(planes) 18 | self.relu = nn.ReLU(inplace=True) 19 | self.conv2 = conv3x3(planes, planes) 20 | self.bn2 = nn.BatchNorm2d(planes) 21 | self.downsample = downsample 22 | self.stride = stride 23 | 24 | def forward(self, x): 25 | residual = x 26 | 27 | 28 | out = self.conv1(x) 29 | out = self.bn1(out) 30 | out = self.relu(out) 31 | 32 | out = self.conv2(out) 33 | out = self.bn2(out) 34 | 35 | if self.downsample is not None: 36 | residual = self.downsample(x) 37 | 38 | out += residual 39 | out = self.relu(out) 40 | 41 | return out 42 | 43 | 44 | class Bottleneck(nn.Module): 45 | expansion = 4 46 | 47 | def __init__(self, inplanes, planes, stride=1, downsample=None): 48 | super(Bottleneck, self).__init__() 49 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 50 | self.bn1 = nn.BatchNorm2d(planes) 51 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 52 | padding=1, bias=False) 53 | self.bn2 = nn.BatchNorm2d(planes) 54 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False) 55 | self.bn3 = nn.BatchNorm2d(planes * 
self.expansion) 56 | self.relu = nn.ReLU(inplace=True) 57 | self.downsample = downsample 58 | self.stride = stride 59 | 60 | def forward(self, x): 61 | residual = x 62 | 63 | out = self.conv1(x) 64 | out = self.bn1(out) 65 | out = self.relu(out) 66 | 67 | out = self.conv2(out) 68 | out = self.bn2(out) 69 | out = self.relu(out) 70 | 71 | out = self.conv3(out) 72 | out = self.bn3(out) 73 | 74 | if self.downsample is not None: 75 | residual = self.downsample(x) 76 | 77 | out += residual 78 | out = self.relu(out) 79 | 80 | return out 81 | 82 | 83 | class ResNet(nn.Module): 84 | 85 | def __init__(self, block, layers, num_classes=10): 86 | self.inplanes = 16 87 | super(ResNet, self).__init__() 88 | self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, 89 | bias=False) 90 | self.bn1 = nn.BatchNorm2d(16) 91 | self.relu = nn.ReLU(inplace=True) 92 | self.layer1 = self._make_layer(block, 16, layers[0]) 93 | self.layer2 = self._make_layer(block, 32, layers[1], stride=2) 94 | self.layer3 = self._make_layer(block, 64, layers[2], stride=2) 95 | self.avgpool = nn.AvgPool2d(8, stride=1) 96 | self.fc = nn.Linear(64 * block.expansion, num_classes) 97 | 98 | for m in self.modules(): 99 | if isinstance(m, nn.Conv2d): 100 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 101 | elif isinstance(m, nn.BatchNorm2d): 102 | nn.init.constant_(m.weight, 1) 103 | nn.init.constant_(m.bias, 0) 104 | 105 | def _make_layer(self, block, planes, blocks, stride=1): 106 | downsample = None 107 | if stride != 1 or self.inplanes != planes * block.expansion: 108 | downsample = nn.Sequential( 109 | nn.Conv2d(self.inplanes, planes * block.expansion, 110 | kernel_size=1, stride=stride, bias=False), 111 | nn.BatchNorm2d(planes * block.expansion), 112 | ) 113 | 114 | layers = [] 115 | layers.append(block(self.inplanes, planes, stride, downsample)) 116 | self.inplanes = planes * block.expansion 117 | for i in range(1, blocks): 118 | layers.append(block(self.inplanes, planes)) 
119 | 120 | return nn.Sequential(*layers) 121 | 122 | def forward(self, x): 123 | x = self.conv1(x) 124 | x = self.bn1(x) 125 | x = self.relu(x) 126 | 127 | x = self.layer1(x) 128 | x = self.layer2(x) 129 | x = self.layer3(x) 130 | 131 | x = self.avgpool(x) 132 | x = x.view(x.size(0), -1) 133 | x = self.fc(x) 134 | 135 | return x 136 | 137 | def resnet20(pretrained=False, **kwargs): 138 | n = 3 139 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 140 | return model 141 | 142 | def resnet32(pretrained=False, **kwargs): 143 | n = 5 144 | model = ResNet(BasicBlock, [n, n, n], **kwargs) 145 | return model 146 | 147 | def resnet56(pretrained=False, **kwargs): 148 | n = 9 149 | model = ResNet(Bottleneck, [n, n, n], **kwargs) 150 | return model 151 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/process_imagenet/generate_imagenet.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | import numpy as np 10 | 11 | import torch 12 | import torch.nn as nn 13 | import torch.nn.parallel 14 | import torch.backends.cudnn as cudnn 15 | import torch.distributed as dist 16 | import torch.optim 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | from PIL import Image 23 | 24 | src_root_dir = 'data/imagenet/data/' 25 | des_root_dir = 'data/imagenet_resized_256/data/' 26 | if not os.path.exists(des_root_dir): 27 | os.makedirs(des_root_dir) 28 | 29 | phase_list = ['train', 'val'] 30 | for phase in phase_list: 31 | if not os.path.exists(os.path.join(des_root_dir, phase)): 32 | os.mkdir(os.path.join(des_root_dir, phase)) 33 | data_dir = os.path.join(src_root_dir, phase) 34 | tg_dataset = 
datasets.ImageFolder(data_dir) 35 | for cls_name in tg_dataset.classes: 36 | if not os.path.exists(os.path.join(des_root_dir, phase, cls_name)): 37 | os.mkdir(os.path.join(des_root_dir, phase, cls_name)) 38 | cnt = 0 39 | for item in tg_dataset.imgs: 40 | img_path = item[0] 41 | img = Image.open(img_path) 42 | img = img.convert('RGB') 43 | save_path = img_path.replace('imagenet', 'imagenet_resized_256') 44 | resized_img = img.resize((256,256), Image.BILINEAR) 45 | resized_img.save(save_path) 46 | cnt = cnt+1 47 | if cnt % 1000 == 0: 48 | print(cnt, save_path) 49 | 50 | print("Generation finished.") 51 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/process_imagenet/generate_imagenet_subset.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | import numpy as np 10 | 11 | import torch 12 | import torch.nn as nn 13 | import torch.nn.parallel 14 | import torch.backends.cudnn as cudnn 15 | import torch.distributed as dist 16 | import torch.optim 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | 23 | data_dir = 'data/imagenet/data/' 24 | 25 | # Data loading code 26 | traindir = os.path.join(data_dir, 'train') 27 | train_dataset = datasets.ImageFolder(traindir, None) 28 | classes = train_dataset.classes 29 | print("the number of total classes: {}".format(len(classes))) 30 | 31 | seed = 1993 32 | np.random.seed(seed) 33 | subset_num = 100 34 | subset_classes = np.random.choice(classes, subset_num, replace=False) 35 | print("the number of subset classes: {}".format(len(subset_classes))) 36 | print(subset_classes) 37 | 38 | des_root_dir = 
'data/seed_{}_subset_{}_imagenet/data/'.format(seed, subset_num) 39 | if not os.path.exists(des_root_dir): 40 | os.makedirs(des_root_dir) 41 | phase_list = ['train', 'val'] 42 | for phase in phase_list: 43 | if not os.path.exists(os.path.join(des_root_dir, phase)): 44 | os.mkdir(os.path.join(des_root_dir, phase)) 45 | for sc in subset_classes: 46 | src_dir = os.path.join(data_dir, phase, sc) 47 | des_dir = os.path.join(des_root_dir, phase, sc) 48 | cmd = "cp -r {} {}".format(src_dir, des_dir) 49 | print(cmd) 50 | os.system(cmd) 51 | 52 | print("Generation finished.") -------------------------------------------------------------------------------- /mnemonics-training/2_eval/run_eval.sh: -------------------------------------------------------------------------------- 1 | python main.py --nb_cl_fg=50 --nb_cl=2 --nb_protos=20 --epochs=160 --gpu=0 --dataset=cifar100 --random_seed=1993 --use_mtl 2 | python main.py --nb_cl_fg=50 --nb_cl=2 --nb_protos=20 --epochs=90 --gpu=0 --dataset=imagenet_sub --data_dir=./data/seed_1993_subset_100_imagenet/data --num_workers=16 --test_batch_size=50 --use_mtl 3 | python main.py --nb_cl_fg=500 --nb_cl=20 --nb_protos=20 --epochs=90 --gpu=0 --dataset=imagenet --data_dir=./data/imagenet/data --num_workers=16 --test_batch_size=50 --num_classes=1000 --use_mtl 4 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/script/download_ckpt.sh: -------------------------------------------------------------------------------- 1 | wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1sKO2BOssWgTFBNZbM50qDzgk6wqg4_l8' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1sKO2BOssWgTFBNZbM50qDzgk6wqg4_l8" -O logs.tar.gz && rm -rf /tmp/cookies.txt 2 | tar zxvf logs.tar.gz 3 | 
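The subset script `generate_imagenet_subset.py` copies each selected class directory by shelling out to `cp -r` through `os.system`. A minimal sketch of the same copy step using only the standard library follows. The helper name `copy_class_dirs` and its parameters are hypothetical and not part of this repository; it assumes the `<root>/<phase>/<class>` layout the script uses and Python 3.8+ for `dirs_exist_ok`.

```python
import os
import shutil


def copy_class_dirs(src_root, dst_root, class_names, phases=("train", "val")):
    """Copy <src_root>/<phase>/<class> to <dst_root>/<phase>/<class>.

    Hypothetical helper (not part of the repository). Returns the list of
    destination directories that were written.
    """
    copied = []
    for phase in phases:
        for name in class_names:
            src_dir = os.path.join(src_root, phase, name)
            dst_dir = os.path.join(dst_root, phase, name)
            # copytree creates missing parent directories; dirs_exist_ok
            # (Python 3.8+) lets a rerun merge into an existing directory
            # instead of raising FileExistsError.
            shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
            copied.append(dst_dir)
    return copied
```

One difference from the shell version: when the destination already exists, `cp -r src des` nests the source inside it, whereas this sketch merges into it, so rerunning should not create a nested copy.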
-------------------------------------------------------------------------------- /mnemonics-training/2_eval/trainer/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/mnemonics-training/2_eval/trainer/__init__.py -------------------------------------------------------------------------------- /mnemonics-training/2_eval/trainer/train.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.optim as optim 7 | from torch.optim import lr_scheduler 8 | import torchvision 9 | from torchvision import datasets, models, transforms 10 | from torch.autograd import Variable 11 | from tensorboardX import SummaryWriter 12 | import numpy as np 13 | import time 14 | import os 15 | import os.path as osp 16 | import sys 17 | import copy 18 | import argparse 19 | from PIL import Image 20 | try: 21 | import cPickle as pickle 22 | except: 23 | import pickle 24 | import math 25 | import utils.misc 26 | import models.modified_resnet_cifar as modified_resnet_cifar 27 | import models.modified_resnetmtl_cifar as modified_resnetmtl_cifar 28 | import models.modified_resnet as modified_resnet 29 | import models.modified_resnetmtl as modified_resnetmtl 30 | import models.modified_linear as modified_linear 31 | from utils.imagenet.utils_dataset import split_images_labels 32 | from utils.imagenet.utils_dataset import merge_images_labels 33 | from utils.incremental.compute_features import compute_features 34 | from utils.incremental.compute_accuracy import compute_accuracy 35 | from utils.incremental.compute_confusion_matrix import compute_confusion_matrix 36 | import warnings 37 | warnings.filterwarnings('ignore') 38 | 39 | class Trainer(object): 40 | def __init__(self, 
the_args): 41 | self.args = the_args 42 | self.log_dir = './logs/' 43 | if not osp.exists(self.log_dir): 44 | os.mkdir(self.log_dir) 45 | self.save_path = self.log_dir + self.args.dataset + '_nfg' + str(self.args.nb_cl_fg) + '_ncls' + str(self.args.nb_cl) + \ 46 | '_nproto' + str(self.args.nb_protos) 47 | if self.args.use_mtl: 48 | self.save_path += '_mtl' 49 | if self.args.add_str is not None: 50 | self.save_path += self.args.add_str 51 | self.save_path += '_' + str(self.args.ckpt_label) 52 | if not osp.exists(self.save_path): 53 | os.mkdir(self.save_path) 54 | 55 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 56 | 57 | if self.args.dataset == 'cifar100': 58 | self.transform_train = transforms.Compose([ \ 59 | transforms.RandomCrop(32, padding=4), \ 60 | transforms.RandomHorizontalFlip(), \ 61 | transforms.ToTensor(), \ 62 | transforms.Normalize((0.5071, 0.4866, 0.4409), (0.2009, 0.1984, 0.2023)),]) 63 | self.transform_test = transforms.Compose([ \ 64 | transforms.ToTensor(), \ 65 | transforms.Normalize((0.5071, 0.4866, 0.4409), (0.2009, 0.1984, 0.2023)),]) 66 | self.trainset = torchvision.datasets.CIFAR100(root='./data', train=True, download=True, transform=self.transform_train) 67 | self.testset = torchvision.datasets.CIFAR100(root='./data', train=False, download=True, transform=self.transform_test) 68 | self.evalset = torchvision.datasets.CIFAR100(root='./data', train=False, download=False, transform=self.transform_test) 69 | 70 | self.network = modified_resnet_cifar.resnet32 71 | self.network_mtl = modified_resnetmtl_cifar.resnetmtl32 72 | self.lr_strat = [int(self.args.epochs*0.5), int(self.args.epochs*0.75)] 73 | self.dictionary_size = 500 74 | 75 | elif self.args.dataset == 'imagenet_sub' or self.args.dataset == 'imagenet': 76 | traindir = os.path.join(self.args.data_dir, 'train') 77 | valdir = os.path.join(self.args.data_dir, 'val') 78 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 
79 | self.trainset = datasets.ImageFolder(traindir, \ 80 | transforms.Compose([transforms.RandomResizedCrop(224), \ 81 | transforms.RandomHorizontalFlip(), \ 82 | transforms.ToTensor(), normalize,])) 83 | self.testset = datasets.ImageFolder(valdir, \ 84 | transforms.Compose([transforms.Resize(256), \ 85 | transforms.CenterCrop(224), \ 86 | transforms.ToTensor(), normalize, ])) 87 | self.evalset = datasets.ImageFolder(valdir, \ 88 | transforms.Compose([transforms.Resize(256), \ 89 | transforms.CenterCrop(224), \ 90 | transforms.ToTensor(), normalize,])) 91 | 92 | self.network = modified_resnet.resnet18 93 | self.network_mtl = modified_resnetmtl.resnetmtl18 94 | self.lr_strat = [30, 60] 95 | self.dictionary_size = 1500 96 | 97 | else: 98 | raise ValueError('Please set correct dataset.') 99 | 100 | 101 | def eval(self): 102 | self.train_writer = SummaryWriter(comment=self.save_path) 103 | dictionary_size = self.dictionary_size 104 | top1_acc_list_cumul = np.zeros((int(self.args.num_classes/self.args.nb_cl), 4, self.args.nb_runs)) 105 | top1_acc_list_ori = np.zeros((int(self.args.num_classes/self.args.nb_cl), 4, self.args.nb_runs)) 106 | 107 | if self.args.dataset == 'cifar100': 108 | X_train_total = np.array(self.trainset.train_data) 109 | Y_train_total = np.array(self.trainset.train_labels) 110 | X_valid_total = np.array(self.testset.test_data) 111 | Y_valid_total = np.array(self.testset.test_labels) 112 | elif self.args.dataset == 'imagenet_sub' or self.args.dataset == 'imagenet': 113 | X_train_total, Y_train_total = split_images_labels(self.trainset.imgs) 114 | X_valid_total, Y_valid_total = split_images_labels(self.testset.imgs) 115 | else: 116 | raise ValueError('Please set correct dataset.') 117 | 118 | for iteration_total in range(self.args.nb_runs): 119 | order_name = osp.join(self.save_path, \ 120 | "seed_{}_{}_order_run_{}.pkl".format(self.args.random_seed, self.args.dataset, iteration_total)) 121 | print("Order name:{}".format(order_name)) 122 | 123 | if 
osp.exists(order_name): 124 | print("Loading orders") 125 | order = utils.misc.unpickle(order_name) 126 | else: 127 | print("Generating orders") 128 | order = np.arange(self.args.num_classes) 129 | np.random.shuffle(order) 130 | utils.misc.savepickle(order, order_name) 131 | order_list = list(order) 132 | print(order_list) 133 | 134 | X_valid_cumuls = [] 135 | X_protoset_cumuls = [] 136 | Y_valid_cumuls = [] 137 | Y_protoset_cumuls = [] 138 | 139 | start_iter = int(self.args.nb_cl_fg/self.args.nb_cl)-1 140 | 141 | for iteration in range(start_iter, int(self.args.num_classes/self.args.nb_cl)): 142 | if iteration == start_iter: 143 | last_iter = 0 144 | tg_model = self.network(num_classes=self.args.nb_cl_fg) 145 | in_features = tg_model.fc.in_features 146 | out_features = tg_model.fc.out_features 147 | print("in_features:", in_features, "out_features:", out_features) 148 | ref_model = None 149 | elif iteration == start_iter+1: 150 | last_iter = iteration 151 | ref_model = copy.deepcopy(tg_model) 152 | if self.args.use_mtl: 153 | tg_model = self.network_mtl(num_classes=self.args.nb_cl_fg) 154 | else: 155 | tg_model = self.network(num_classes=self.args.nb_cl_fg) 156 | ref_dict = ref_model.state_dict() 157 | tg_dict = tg_model.state_dict() 158 | tg_dict.update(ref_dict) 159 | tg_model.load_state_dict(tg_dict) 160 | tg_model.to(self.device) 161 | in_features = tg_model.fc.in_features 162 | out_features = tg_model.fc.out_features 163 | print("in_features:", in_features, "out_features:", out_features) 164 | new_fc = modified_linear.SplitCosineLinear(in_features, out_features, self.args.nb_cl) 165 | new_fc.fc1.weight.data = tg_model.fc.weight.data 166 | new_fc.sigma.data = tg_model.fc.sigma.data 167 | tg_model.fc = new_fc 168 | lamda_mult = out_features*1.0 / self.args.nb_cl 169 | else: 170 | last_iter = iteration 171 | ref_model = copy.deepcopy(tg_model) 172 | in_features = tg_model.fc.in_features 173 | out_features1 = tg_model.fc.fc1.out_features 174 | out_features2 = 
tg_model.fc.fc2.out_features 175 | print("in_features:", in_features, "out_features1:", out_features1, "out_features2:", out_features2) 176 | new_fc = modified_linear.SplitCosineLinear(in_features, out_features1+out_features2, self.args.nb_cl) 177 | new_fc.fc1.weight.data[:out_features1] = tg_model.fc.fc1.weight.data 178 | new_fc.fc1.weight.data[out_features1:] = tg_model.fc.fc2.weight.data 179 | new_fc.sigma.data = tg_model.fc.sigma.data 180 | tg_model.fc = new_fc 181 | lamda_mult = (out_features1+out_features2)*1.0 / (self.args.nb_cl) 182 | 183 | actual_cl = order[range(last_iter*self.args.nb_cl,(iteration+1)*self.args.nb_cl)] 184 | indices_train_10 = np.array([i in order[range(last_iter*self.args.nb_cl,(iteration+1)*self.args.nb_cl)] for i in Y_train_total]) 185 | indices_test_10 = np.array([i in order[range(last_iter*self.args.nb_cl,(iteration+1)*self.args.nb_cl)] for i in Y_valid_total]) 186 | 187 | X_valid = X_valid_total[indices_test_10] 188 | X_valid_cumuls.append(X_valid) 189 | X_valid_cumul = np.concatenate(X_valid_cumuls) 190 | 191 | Y_valid = Y_valid_total[indices_test_10] 192 | Y_valid_cumuls.append(Y_valid) 193 | Y_valid_cumul = np.concatenate(Y_valid_cumuls) 194 | 195 | if iteration == start_iter: 196 | X_valid_ori = X_valid 197 | Y_valid_ori = Y_valid 198 | 199 | ckp_name = osp.join(self.save_path, 'run_{}_iteration_{}_model.pth'.format(iteration_total, iteration)) 200 | 201 | print('ckp_name', ckp_name) 202 | print("[*] Loading models from checkpoint") 203 | tg_model = torch.load(ckp_name) 204 | tg_feature_model = nn.Sequential(*list(tg_model.children())[:-1]) 205 | 206 | if self.args.dataset == 'cifar100': 207 | map_Y_valid_ori = np.array([order_list.index(i) for i in Y_valid_ori]) 208 | print('Computing accuracy on the original batch of classes...') 209 | self.evalset.test_data = X_valid_ori.astype('uint8') 210 | self.evalset.test_labels = map_Y_valid_ori 211 | evalloader = torch.utils.data.DataLoader(self.evalset, 
batch_size=self.args.eval_batch_size, 212 | shuffle=False, num_workers=self.args.num_workers) 213 | ori_acc = compute_accuracy(tg_model, tg_feature_model, evalloader) 214 | top1_acc_list_ori[iteration, :, iteration_total] = np.array(ori_acc).T 215 | self.train_writer.add_scalar('ori_acc/cnn', float(ori_acc), iteration) 216 | map_Y_valid_cumul = np.array([order_list.index(i) for i in Y_valid_cumul]) 217 | print('Computing cumulative accuracy...') 218 | self.evalset.test_data = X_valid_cumul.astype('uint8') 219 | self.evalset.test_labels = map_Y_valid_cumul 220 | evalloader = torch.utils.data.DataLoader(self.evalset, batch_size=self.args.eval_batch_size, 221 | shuffle=False, num_workers=self.args.num_workers) 222 | cumul_acc = compute_accuracy(tg_model, tg_feature_model, evalloader) 223 | top1_acc_list_cumul[iteration, :, iteration_total] = np.array(cumul_acc).T 224 | self.train_writer.add_scalar('cumul_acc/cnn', float(cumul_acc), iteration) 225 | elif self.args.dataset == 'imagenet_sub' or self.args.dataset == 'imagenet': 226 | map_Y_valid_ori = np.array([order_list.index(i) for i in Y_valid_ori]) 227 | print('Computing accuracy on the original batch of classes...') 228 | current_eval_set = merge_images_labels(X_valid_ori, map_Y_valid_ori) 229 | self.evalset.imgs = self.evalset.samples = current_eval_set 230 | evalloader = torch.utils.data.DataLoader(self.evalset, batch_size=self.args.eval_batch_size, 231 | shuffle=False, num_workers=self.args.num_workers, pin_memory=True) 232 | ori_acc = compute_accuracy(tg_model, tg_feature_model, evalloader) 233 | top1_acc_list_ori[iteration, :, iteration_total] = np.array(ori_acc).T 234 | self.train_writer.add_scalar('ori_acc/cnn', float(ori_acc), iteration) 235 | map_Y_valid_cumul = np.array([order_list.index(i) for i in Y_valid_cumul]) 236 | print('Computing cumulative accuracy...') 237 | current_eval_set = merge_images_labels(X_valid_cumul, map_Y_valid_cumul) 238 | self.evalset.imgs = self.evalset.samples = current_eval_set 
239 | evalloader = torch.utils.data.DataLoader(self.evalset, batch_size=self.args.eval_batch_size, 240 | shuffle=False, num_workers=self.args.num_workers, pin_memory=True) 241 | cumul_acc = compute_accuracy(tg_model, tg_feature_model, evalloader) 242 | top1_acc_list_cumul[iteration, :, iteration_total] = np.array(cumul_acc).T 243 | self.train_writer.add_scalar('cumul_acc/cnn', float(cumul_acc), iteration) 244 | else: 245 | raise ValueError('Please set correct dataset.') 246 | 247 | self.train_writer.close() 248 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yaoyao-liu/class-incremental-learning/701af9f819f559c6ab3d3ee73bb3d7c21e924572/mnemonics-training/2_eval/utils/__init__.py -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/gpu_tools.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import time 4 | 5 | def check_memory(cuda_device): 6 | devices_info = os.popen('"/usr/bin/nvidia-smi" --query-gpu=memory.total,memory.used --format=csv,nounits,noheader').read().strip().split("\n") 7 | total, used = devices_info[int(cuda_device)].split(',') 8 | return total,used 9 | 10 | def occupy_memory(cuda_device): 11 | total, used = check_memory(cuda_device) 12 | total = int(total) 13 | used = int(used) 14 | max_mem = int(total * 0.90) 15 | print('Total memory: ' + str(total) + ', used memory: ' + str(used)) 16 | block_mem = max_mem - used 17 | if block_mem > 0: 18 | x = torch.cuda.FloatTensor(256, 1024, block_mem) 19 | del x 20 | 21 | def set_gpu(x): 22 | os.environ['CUDA_VISIBLE_DEVICES'] = x 23 | print('Using gpu:', x) 24 | 25 | 26 | -------------------------------------------------------------------------------- 
/mnemonics-training/2_eval/utils/imagenet/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/imagenet/train_and_eval.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import shutil 4 | import time 5 | 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.parallel 9 | import torch.backends.cudnn as cudnn 10 | import torch.distributed as dist 11 | import torch.optim 12 | import torch.utils.data 13 | import torch.utils.data.distributed 14 | import torchvision.transforms as transforms 15 | import torchvision.datasets as datasets 16 | import torchvision.models as models 17 | 18 | from .utils_train import * 19 | 20 | def train_and_eval(epochs, start_epoch, model, optimizer, lr_scheduler, \ 21 | train_loader, val_loader, gpu=None): 22 | for epoch in range(start_epoch, epochs): 23 | #adjust_learning_rate(optimizer, epoch) 24 | lr_scheduler.step() 25 | print('\nEpoch: %d, LR: ' % epoch, end='') 26 | print(lr_scheduler.get_lr()) 27 | 28 | # train for one epoch 29 | train(train_loader, model, optimizer, epoch, gpu) 30 | 31 | # evaluate on validation set 32 | validate(val_loader, model, gpu) 33 | 34 | return model 35 | 36 | def train(train_loader, model, optimizer, epoch, gpu=None): 37 | batch_time = AverageMeter() 38 | data_time = AverageMeter() 39 | losses = AverageMeter() 40 | top1 = AverageMeter() 41 | top5 = AverageMeter() 42 | model.train() 43 | criterion = nn.CrossEntropyLoss().cuda(gpu) 44 | end = time.time() 45 | for i, (input, target) in enumerate(train_loader): 46 | data_time.update(time.time() - end) 47 | if gpu is not None: 48 | input = input.cuda(gpu, non_blocking=True) 49 | target = target.cuda(gpu, non_blocking=True) 50 | output = model(input) 51 | loss = criterion(output, 
target) 52 | prec1, prec5 = accuracy(output, target, topk=(1, 5)) 53 | losses.update(loss.item(), input.size(0)) 54 | top1.update(prec1[0], input.size(0)) 55 | top5.update(prec5[0], input.size(0)) 56 | optimizer.zero_grad() 57 | loss.backward() 58 | optimizer.step() 59 | batch_time.update(time.time() - end) 60 | end = time.time() 61 | 62 | if i % 10 == 0: 63 | print('Epoch: [{0}][{1}/{2}]\t' 64 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 65 | 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' 66 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 67 | 'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t' 68 | 'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format( 69 | epoch, i, len(train_loader), batch_time=batch_time, 70 | data_time=data_time, loss=losses, top1=top1, top5=top5)) 71 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/imagenet/utils_dataset.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import shutil 4 | import time 5 | import numpy as np 6 | 7 | def split_images_labels(imgs): 8 | images = [] 9 | labels = [] 10 | for item in imgs: 11 | images.append(item[0]) 12 | labels.append(item[1]) 13 | return np.array(images), np.array(labels) 14 | 15 | def merge_images_labels(images, labels): 16 | images = list(images) 17 | labels = list(labels) 18 | assert(len(images)==len(labels)) 19 | imgs = [] 20 | for i in range(len(images)): 21 | item = (images[i], labels[i]) 22 | imgs.append(item) 23 | return imgs 24 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/imagenet/utils_train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import shutil 4 | import time 5 | 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.parallel 9 | import torch.backends.cudnn as cudnn 10 | import torch.distributed 
as dist 11 | import torch.optim 12 | import torch.utils.data 13 | import torch.utils.data.distributed 14 | import torchvision.transforms as transforms 15 | import torchvision.datasets as datasets 16 | import torchvision.models as models 17 | 18 | def validate(val_loader, model, gpu=None): 19 | batch_time = AverageMeter() 20 | losses = AverageMeter() 21 | top1 = AverageMeter() 22 | top5 = AverageMeter() 23 | 24 | model.eval() 25 | criterion = nn.CrossEntropyLoss().cuda(gpu) 26 | 27 | with torch.no_grad(): 28 | end = time.time() 29 | for i, (input, target) in enumerate(val_loader): 30 | if gpu is not None: 31 | input = input.cuda(gpu, non_blocking=True) 32 | target = target.cuda(gpu, non_blocking=True) 33 | 34 | output = model(input) 35 | loss = criterion(output, target) 36 | 37 | prec1, prec5 = accuracy(output, target, topk=(1, 5)) 38 | losses.update(loss.item(), input.size(0)) 39 | top1.update(prec1[0], input.size(0)) 40 | top5.update(prec5[0], input.size(0)) 41 | 42 | batch_time.update(time.time() - end) 43 | end = time.time() 44 | 45 | if i % 10 == 0: 46 | print('Test: [{0}/{1}]\t' 47 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 48 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' 49 | 'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t' 50 | 'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format( 51 | i, len(val_loader), batch_time=batch_time, loss=losses, 52 | top1=top1, top5=top5)) 53 | 54 | print(' * Prec@1 {top1.avg:.3f} Prec@5 {top5.avg:.3f}' 55 | .format(top1=top1, top5=top5)) 56 | 57 | return top1.avg 58 | 59 | class AverageMeter(object): 60 | """Computes and stores the average and current value""" 61 | def __init__(self): 62 | self.reset() 63 | 64 | def reset(self): 65 | self.val = 0 66 | self.avg = 0 67 | self.sum = 0 68 | self.count = 0 69 | 70 | def update(self, val, n=1): 71 | self.val = val 72 | self.sum += val * n 73 | self.count += n 74 | self.avg = self.sum / self.count 75 | 76 | def accuracy(output, target, topk=(1,)): 77 | with torch.no_grad(): 78 | maxk = 
max(topk) 79 | batch_size = target.size(0) 80 | 81 | _, pred = output.topk(maxk, 1, True, True) 82 | pred = pred.t() 83 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 84 | 85 | res = [] 86 | for k in topk: 87 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 88 | res.append(correct_k.mul_(100.0 / batch_size)) 89 | return res 90 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/incremental/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # for incremental train and eval 4 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/incremental/compute_accuracy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.optim as optim 7 | from torch.optim import lr_scheduler 8 | import torchvision 9 | from torchvision import datasets, models, transforms 10 | from torch.autograd import Variable 11 | import numpy as np 12 | import time 13 | import os 14 | import copy 15 | import argparse 16 | from PIL import Image 17 | from scipy.spatial.distance import cdist 18 | from sklearn.metrics import confusion_matrix 19 | from utils.misc import * 20 | from utils.imagenet.utils_dataset import merge_images_labels 21 | 22 | def compute_accuracy(tg_model, tg_feature_model, evalloader, scale=None, print_info=True, device=None, cifar=True, imagenet=False, valdir=None): 23 | if device is None: 24 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 25 | tg_model.eval() 26 | tg_feature_model.eval() 27 | 28 | correct = 0 29 | correct_icarl = 0 30 | correct_icarl_cosine = 0 31 | correct_icarl_cosine2 = 0 32 | correct_ncm = 0 33 | correct_maml = 0 34 | total = 0 35 | with 
torch.no_grad(): 36 | for batch_idx, (inputs, targets) in enumerate(evalloader): 37 | inputs, targets = inputs.to(device), targets.to(device) 38 | total += targets.size(0) 39 | outputs = tg_model(inputs) 40 | outputs = F.softmax(outputs, dim=1) 41 | if scale is not None: 42 | assert(scale.shape[0] == 1) 43 | assert(outputs.shape[1] == scale.shape[1]) 44 | outputs = outputs / scale.repeat(outputs.shape[0], 1).type(torch.FloatTensor).to(device) 45 | _, predicted = outputs.max(1) 46 | correct += predicted.eq(targets).sum().item() 47 | 48 | if print_info: 49 | print(" top 1 accuracy :\t\t{:.2f} %".format(100.*correct/total)) 50 | acc = 100.*correct/total 51 | return acc 52 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/incremental/compute_confusion_matrix.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.optim as optim 7 | from torch.optim import lr_scheduler 8 | import torchvision 9 | from torchvision import datasets, models, transforms 10 | from torch.autograd import Variable 11 | import numpy as np 12 | import time 13 | import os 14 | import copy 15 | import argparse 16 | from PIL import Image 17 | from scipy.spatial.distance import cdist 18 | from sklearn.metrics import confusion_matrix 19 | from utils.misc import * 20 | 21 | def compute_confusion_matrix(tg_model, tg_feature_model, class_means, evalloader, print_info=False, device=None): 22 | if device is None: 23 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 24 | tg_model.eval() 25 | tg_feature_model.eval() 26 | 27 | #evalset = torchvision.datasets.CIFAR100(root='./data', train=False, 28 | # download=False, transform=transform_test) 29 | #evalset.test_data = input_data.astype('uint8') 30 | #evalset.test_labels = input_labels 31 | #evalloader = 
torch.utils.data.DataLoader(evalset, batch_size=128, 32 | # shuffle=False, num_workers=2) 33 | 34 | correct = 0 35 | correct_icarl = 0 36 | correct_ncm = 0 37 | total = 0 38 | num_classes = tg_model.fc.out_features 39 | cm = np.zeros((3, num_classes, num_classes)) 40 | all_targets = [] 41 | all_predicted = [] 42 | all_predicted_icarl = [] 43 | all_predicted_ncm = [] 44 | with torch.no_grad(): 45 | for batch_idx, (inputs, targets) in enumerate(evalloader): 46 | inputs, targets = inputs.to(device), targets.to(device) 47 | total += targets.size(0) 48 | all_targets.append(targets.cpu().numpy()) 49 | 50 | outputs = tg_model(inputs) 51 | _, predicted = outputs.max(1) 52 | correct += predicted.eq(targets).sum().item() 53 | all_predicted.append(predicted.cpu().numpy()) 54 | 55 | 56 | 57 | cm[0, :, :] = confusion_matrix(np.concatenate(all_targets), np.concatenate(all_predicted)) 58 | 59 | if print_info: 60 | print(" top 1 accuracy :\t\t{:.2f} %".format( 100.*correct/total )) 61 | return cm 62 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/incremental/compute_features.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | #!/usr/bin/env python 4 | # coding=utf-8 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | import torch.optim as optim 9 | from torch.optim import lr_scheduler 10 | import torchvision 11 | from torchvision import datasets, models, transforms 12 | from torch.autograd import Variable 13 | import numpy as np 14 | import time 15 | import os 16 | import copy 17 | import argparse 18 | from PIL import Image 19 | from scipy.spatial.distance import cdist 20 | from sklearn.metrics import confusion_matrix 21 | from utils.misc import * 22 | 23 | def compute_features(tg_feature_model, evalloader, num_samples, num_features, device=None): 24 | if device is None: 25 | device = torch.device("cuda:0" if
torch.cuda.is_available() else "cpu") 26 | tg_feature_model.eval() 27 | features = np.zeros([num_samples, num_features]) 28 | start_idx = 0 29 | with torch.no_grad(): 30 | for inputs, targets in evalloader: 31 | inputs = inputs.to(device) 32 | features[start_idx:start_idx+inputs.shape[0], :] = np.squeeze(tg_feature_model(inputs).cpu().numpy()) 33 | start_idx = start_idx+inputs.shape[0] 34 | assert(start_idx==num_samples) 35 | return features 36 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/incremental/conv2d_mtl.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch 3 | from torch.nn.parameter import Parameter 4 | import torch.nn.functional as F 5 | from torch.nn.modules.module import Module 6 | from torch.nn.modules.utils import _single, _pair, _triple 7 | 8 | 9 | class _ConvNdMtl(Module): 10 | 11 | def __init__(self, in_channels, out_channels, kernel_size, stride, 12 | padding, dilation, transposed, output_padding, groups, bias): 13 | super(_ConvNdMtl, self).__init__() 14 | if in_channels % groups != 0: 15 | raise ValueError('in_channels must be divisible by groups') 16 | if out_channels % groups != 0: 17 | raise ValueError('out_channels must be divisible by groups') 18 | self.in_channels = in_channels 19 | self.out_channels = out_channels 20 | self.kernel_size = kernel_size 21 | self.stride = stride 22 | self.padding = padding 23 | self.dilation = dilation 24 | self.transposed = transposed 25 | self.output_padding = output_padding 26 | self.groups = groups 27 | if transposed: 28 | self.weight = Parameter(torch.Tensor( 29 | in_channels, out_channels // groups, *kernel_size)) 30 | self.mtl_weight = Parameter(torch.ones(in_channels, out_channels // groups, 1, 1)) 31 | else: 32 | self.weight = Parameter(torch.Tensor( 33 | out_channels, in_channels // groups, *kernel_size)) 34 | self.mtl_weight = Parameter(torch.ones(out_channels, in_channels //
groups, 1, 1)) 35 | self.weight.requires_grad=False 36 | if bias: 37 | self.bias = Parameter(torch.Tensor(out_channels)) 38 | self.bias.requires_grad=False 39 | self.mtl_bias = Parameter(torch.zeros(out_channels)) 40 | else: 41 | self.register_parameter('bias', None) 42 | self.register_parameter('mtl_bias', None) 43 | self.reset_parameters() 44 | 45 | def reset_parameters(self): 46 | n = self.in_channels 47 | for k in self.kernel_size: 48 | n *= k 49 | stdv = 1. / math.sqrt(n) 50 | self.weight.data.uniform_(-stdv, stdv) 51 | self.mtl_weight.data.uniform_(1, 1) 52 | if self.bias is not None: 53 | self.bias.data.uniform_(-stdv, stdv) 54 | self.mtl_bias.data.uniform_(0, 0) 55 | 56 | def extra_repr(self): 57 | s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}' 58 | ', stride={stride}') 59 | if self.padding != (0,) * len(self.padding): 60 | s += ', padding={padding}' 61 | if self.dilation != (1,) * len(self.dilation): 62 | s += ', dilation={dilation}' 63 | if self.output_padding != (0,) * len(self.output_padding): 64 | s += ', output_padding={output_padding}' 65 | if self.groups != 1: 66 | s += ', groups={groups}' 67 | if self.bias is None: 68 | s += ', bias=False' 69 | return s.format(**self.__dict__) 70 | 71 | class Conv2dMtl(_ConvNdMtl): 72 | 73 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 74 | padding=0, dilation=1, groups=1, bias=True): 75 | kernel_size = _pair(kernel_size) 76 | stride = _pair(stride) 77 | padding = _pair(padding) 78 | dilation = _pair(dilation) 79 | super(Conv2dMtl, self).__init__( 80 | in_channels, out_channels, kernel_size, stride, padding, dilation, 81 | False, _pair(0), groups, bias) 82 | 83 | def forward(self, input): 84 | new_mtl_weight = self.mtl_weight.expand(self.weight.shape) 85 | new_weight = self.weight.mul(new_mtl_weight) 86 | if self.bias is not None: 87 | new_bias = self.bias + self.mtl_bias 88 | else: 89 | new_bias = None 90 | return F.conv2d(input, new_weight, new_bias, self.stride, 91 | 
self.padding, self.dilation, self.groups) 92 | 93 | -------------------------------------------------------------------------------- /mnemonics-training/2_eval/utils/misc.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | from __future__ import print_function, division 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.init as init 8 | from collections import OrderedDict 9 | 10 | import numpy as np 11 | import os 12 | import os.path as osp 13 | import sys 14 | import time 15 | import math 16 | import subprocess 17 | try: 18 | import cPickle as pickle 19 | except ImportError: 20 | import pickle 21 | 22 | def savepickle(data, file_path): 23 | mkdir_p(osp.dirname(file_path), delete=False) 24 | print('pickle into', file_path) 25 | with open(file_path, 'wb') as f: 26 | pickle.dump(data, f, pickle.HIGHEST_PROTOCOL) 27 | 28 | def unpickle(file_path): 29 | with open(file_path, 'rb') as f: 30 | data = pickle.load(f) 31 | return data 32 | 33 | def mkdir_p(path, delete=False, print_info=True): 34 | if path == '': return 35 | 36 | if delete: 37 | subprocess.call(('rm -r ' + path).split()) 38 | if not osp.exists(path): 39 | if print_info: 40 | print('mkdir -p ' + path) 41 | subprocess.call(('mkdir -p ' + path).split()) 42 | 43 | def get_mean_and_std(dataset): 44 | dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2) 45 | mean = torch.zeros(3) 46 | std = torch.zeros(3) 47 | print('==> Computing mean and std..') 48 | for inputs, targets in dataloader: 49 | for i in range(3): 50 | mean[i] += inputs[:,i,:,:].mean() 51 | std[i] += inputs[:,i,:,:].std() 52 | mean.div_(len(dataset)) 53 | std.div_(len(dataset)) 54 | return mean, std 55 | 56 | def init_params(net): 57 | for m in net.modules(): 58 | if isinstance(m, nn.Conv2d): 59 | init.kaiming_normal_(m.weight, mode='fan_out') 60 | if m.bias is not None: 61 | init.constant_(m.bias, 0) 62 | elif isinstance(m,
nn.BatchNorm2d): 63 | init.constant_(m.weight, 1) 64 | init.constant_(m.bias, 0) 65 | elif isinstance(m, nn.Linear): 66 | init.normal_(m.weight, std=1e-3) 67 | if m.bias is not None: 68 | init.constant_(m.bias, 0) 69 | 70 | _, term_width = (os.popen('stty size', 'r').read().split() or ['24', '80']) # fall back to 24x80 when no TTY is attached 71 | term_width = int(term_width) 72 | 73 | TOTAL_BAR_LENGTH = 65. 74 | last_time = time.time() 75 | begin_time = last_time 76 | def progress_bar(current, total, msg=None): 77 | global last_time, begin_time 78 | if current == 0: 79 | begin_time = time.time() # Reset for new bar. 80 | 81 | cur_len = int(TOTAL_BAR_LENGTH*current/total) 82 | rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1 83 | 84 | sys.stdout.write(' [') 85 | for i in range(cur_len): 86 | sys.stdout.write('=') 87 | sys.stdout.write('>') 88 | for i in range(rest_len): 89 | sys.stdout.write('.') 90 | sys.stdout.write(']') 91 | 92 | cur_time = time.time() 93 | step_time = cur_time - last_time 94 | last_time = cur_time 95 | tot_time = cur_time - begin_time 96 | 97 | L = [] 98 | L.append(' Step: %s' % format_time(step_time)) 99 | L.append(' | Tot: %s' % format_time(tot_time)) 100 | if msg: 101 | L.append(' | ' + msg) 102 | 103 | msg = ''.join(L) 104 | sys.stdout.write(msg) 105 | for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3): 106 | sys.stdout.write(' ') 107 | 108 | for i in range(term_width-int(TOTAL_BAR_LENGTH/2)+2): 109 | sys.stdout.write('\b') 110 | sys.stdout.write(' %d/%d ' % (current+1, total)) 111 | 112 | if current < total-1: 113 | sys.stdout.write('\r') 114 | else: 115 | sys.stdout.write('\n') 116 | sys.stdout.flush() 117 | 118 | def format_time(seconds): 119 | days = int(seconds / 3600/24) 120 | seconds = seconds - days*3600*24 121 | hours = int(seconds / 3600) 122 | seconds = seconds - hours*3600 123 | minutes = int(seconds / 60) 124 | seconds = seconds - minutes*60 125 | secondsf = int(seconds) 126 | seconds = seconds - secondsf 127 | millis = int(seconds*1000) 128 | 129 | f = '' 130 | i = 1 131 | if days
> 0: 132 | f += str(days) + 'D' 133 | i += 1 134 | if hours > 0 and i <= 2: 135 | f += str(hours) + 'h' 136 | i += 1 137 | if minutes > 0 and i <= 2: 138 | f += str(minutes) + 'm' 139 | i += 1 140 | if secondsf > 0 and i <= 2: 141 | f += str(secondsf) + 's' 142 | i += 1 143 | if millis > 0 and i <= 2: 144 | f += str(millis) + 'ms' 145 | i += 1 146 | if f == '': 147 | f = '0ms' 148 | return f 149 | -------------------------------------------------------------------------------- /mnemonics-training/README.md: -------------------------------------------------------------------------------- 1 | # Mnemonics Training 2 | 3 | [![LICENSE](https://img.shields.io/badge/license-MIT-green?style=flat-square)](https://github.com/yaoyao-liu/class-incremental-learning/blob/master/LICENSE) 4 | [![Python](https://img.shields.io/badge/python-3.6-blue.svg?style=flat-square&logo=python&color=3776AB)](https://www.python.org/) 5 | [![PyTorch](https://img.shields.io/badge/pytorch-0.4.0-%237732a8?style=flat-square&logo=PyTorch&color=EE4C2C)](https://pytorch.org/) 6 | [![Citations](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/yaoyao-liu/google-scholar/google-scholar-stats/gs_data_shieldsio_mnemonics.json&logo=Google%20Scholar&color=5087ec&style=flat-square&label=citations)](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=Qi2PSmEAAAAJ&citation_for_view=Qi2PSmEAAAAJ:UeHWp8X0CEIC) 7 | 8 | \[[PDF](https://arxiv.org/pdf/2002.10211.pdf)\] \[[Project Page](https://class-il.mpi-inf.mpg.de/mnemonics-training/)\] 9 | 10 | ## Requirements 11 | 12 | See the required package versions [here](https://yyliu.net/files/mnemonics_packages.txt). 13 | 14 | ## Download the Datasets 15 | 16 | See the details [here](https://github.com/yaoyao-liu/class-incremental-learning/tree/main/adaptive-aggregation-networks#download-the-datasets).
17 | 18 | 19 | ## Running Experiments 20 | 21 | ### Running experiments for baselines 22 | 23 | ```bash 24 | cd ./mnemonics-training/1_train 25 | python main.py --method=baseline --nb_cl=10 26 | python main.py --method=baseline --nb_cl=5 27 | python main.py --method=baseline --nb_cl=2 28 | ``` 29 | 30 | ### Running experiments for our method 31 | 32 | ```bash 33 | cd ./mnemonics-training/1_train 34 | python main.py --method=mnemonics --nb_cl=10 35 | python main.py --method=mnemonics --nb_cl=5 36 | python main.py --method=mnemonics --nb_cl=2 37 | ``` 38 | 39 | ### Performance 40 | 41 | #### Average accuracy (%) 42 | 43 | | Method | Dataset | 5-phase | 10-phase | 25-phase | 44 | | ---------- | --------- | ---------- | ---------- |------------ | 45 | | [LwF](https://arxiv.org/abs/1606.09282) | CIFAR-100 | 52.44 | 48.47 | 45.75 | 46 | | [LwF](https://arxiv.org/abs/1606.09282) w/ ours | CIFAR-100 | 54.21 | 52.72 | 51.59 | 47 | | [iCaRL](https://arxiv.org/abs/1611.07725) | CIFAR-100 | 58.03 | 53.01 | 48.47 | 48 | | [iCaRL](https://arxiv.org/abs/1611.07725) w/ ours | CIFAR-100 | 60.01 | 57.37 | 54.13 | 49 | 50 | #### Forgetting rate (%, lower is better) 51 | 52 | | Method | Dataset | 5-phase | 10-phase | 25-phase | 53 | | ---------- | --------- | ---------- | ---------- |------------ | 54 | | [LwF](https://arxiv.org/abs/1606.09282) | CIFAR-100 | 45.02 | 42.50 | 39.86 | 55 | | [LwF](https://arxiv.org/abs/1606.09282) w/ ours | CIFAR-100 | 40.00 | 36.50 | 34.25 | 56 | | [iCaRL](https://arxiv.org/abs/1611.07725) | CIFAR-100 | 32.87 | 32.98 | 36.32 | 57 | | [iCaRL](https://arxiv.org/abs/1611.07725) w/ ours | CIFAR-100 | 25.93 | 26.92 | 28.92 | 58 | 59 | ## Citation 60 | 61 | Please cite our paper if it is helpful to your work: 62 | 63 | ```bibtex 64 | @inproceedings{liu2020mnemonics, 65 | author = {Liu, Yaoyao and Su, Yuting and Liu, An{-}An and Schiele, Bernt and Sun, Qianru}, 66 | title = {Mnemonics Training: Multi-Class Incremental Learning without Forgetting}, 67 | 
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 68 | pages = {12245--12254}, 69 | year = {2020} 70 | } 71 | ``` 72 | 73 | ### Acknowledgements 74 | 75 | Our implementation uses the source code from the following repositories: 76 | 77 | * [Learning a Unified Classifier Incrementally via Rebalancing](https://github.com/hshustc/CVPR19_Incremental_Learning) 78 | 79 | * [iCaRL: Incremental Classifier and Representation Learning](https://github.com/srebuffi/iCaRL) 80 | 81 | * [Dataset Distillation](https://github.com/SsnL/dataset-distillation) 82 | 83 | * [Generative Teaching Networks](https://github.com/uber-research/GTN) 84 | --------------------------------------------------------------------------------