├── GDA-code.zip
└── README.md

/GDA-code.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhiweihu1103/Logit-Distillation-GDA/main/GDA-code.zip
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Logit Distillation via Global Distribution Alignment
#### This repo provides the source code & data of our paper: Logit Distillation via Global Distribution Alignment.

## Dependencies
* `conda create -n gda python=3.7 -y`
* torch==1.11.0+cu113
* torchvision==0.12.0+cu113
* torchaudio==0.11.0+cu113
* timm==0.6.12

## Image Classification
### Preparation
1. Download the [CIFAR](https://www.cs.toronto.edu/~kriz/cifar.html) dataset from its website.
2. Put the dataset into `Image Classification/cache/data/cifar`.
3. Download the pre-trained weights from [Strong-to-Weak](https://github.com/megvii-research/mdistiller/releases/tag/checkpoints) and [Weak-to-Strong](https://github.com/ggjy/vision_weak_to_strong/releases/tag/cifar-ckpt-1).
4. Put the weights into `Image Classification/cache/ckpt/cifar`.

### Training the model
```sh
sh train.sh
```
**Note:**
1. We use vanilla KD as the base distiller and take ResNet56 as the teacher and ResNet20 as the student as an example. You can freely modify the variables defined at the beginning of `train.sh`.
2. The definitions of the different distillation losses can be found in `Image Classification/distillers`.
3. Training logs are written to the `logs` folder.

## Few-shot Learning
### Preparation
1. Download the [miniImageNet](https://github.com/gidariss/FewShotWithoutForgetting) dataset and link the folder into `Few-shot Learning/materials` with the name `mini-imagenet`.
2. You can set the dataset path and output path in `Few-shot Learning/init_env.py`.
3. When running the Python programs, use `--gpu` to specify the GPUs to run on (e.g. `--gpu 0,1`), as sketched below. For Classifier-Baseline, we train with 4 GPUs on miniImageNet; Meta-Baseline uses half as many GPUs.
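
As a rough illustration of how a `--gpu` flag of this kind is usually wired up (the exact handling inside the repository's scripts may differ), the comma-separated list is typically exported as `CUDA_VISIBLE_DEVICES` before PyTorch initializes CUDA:

```python
import argparse
import os

# Hypothetical sketch of typical --gpu handling; the repository's own
# train_classifier.py / train_meta.py may implement this differently.
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', default='0', help='comma-separated GPU ids, e.g. "0,1"')
args, _ = parser.parse_known_args()

# Must be set before the first CUDA call so that torch only sees the listed devices.
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
```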

### Training the model
#### Training Classifier-Baseline
* Training
```sh
python train_classifier.py --config config/classifier/train_classifier_mini.yaml --res_type resnet12_bottle --gpu 0,1,2,3
python train_classifier.py --config config/classifier/train_classifier_mini.yaml --res_type resnet18_bottle --gpu 0,1,2,3
python train_classifier.py --config config/classifier/train_classifier_mini.yaml --res_type resnet36_bottle --gpu 0,1,2,3
```
* Knowledge Distillation
```sh
python train_classifier.py --config config/classifier/train_classifier_mini_kd.yaml --res_type resnet36_bottle --teacher_res_type resnet12_bottle --gpu 0,1,2,3
python train_classifier.py --config config/classifier/train_classifier_mini_kd.yaml --res_type resnet36_bottle --teacher_res_type resnet18_bottle --gpu 0,1,2,3
```
#### Training Meta-Baseline
* Training
```sh
python train_meta.py --config config/meta/train_meta_mini.yaml --res_type resnet12 --gpu 0,1
python train_meta.py --config config/meta/train_meta_mini.yaml --res_type resnet18 --gpu 0,1
python train_meta.py --config config/meta/train_meta_mini.yaml --res_type resnet36 --gpu 0,1
```
* Knowledge Distillation (classifier teacher)
```sh
python train_meta.py --config config/meta/train_meta_mini_kd.yaml --res_type resnet36_bottle --teacher_res_type resnet12_bottle --gpu 0,1
python train_meta.py --config config/meta/train_meta_mini_kd.yaml --res_type resnet36_bottle --teacher_res_type resnet18_bottle --gpu 0,1
```
* Knowledge Distillation (meta teacher)
```sh
python train_meta.py --config config/meta/train_meta_mini_kd.yaml --res_type resnet36_bottle --teacher_res_type resnet12_bottle --teacher_meta_model --gpu 0,1
python train_meta.py --config config/meta/train_meta_mini_kd.yaml --res_type resnet36_bottle --teacher_res_type resnet18_bottle --teacher_meta_model --gpu 0,1
```
**Note:**
1. The definitions of the different distillation losses can be found in `Few-shot Learning/utils/__init__.py`.
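
For reference, the vanilla KD objective that these distillers build on is the temperature-scaled KL divergence between the teacher and student logits. Below is a minimal sketch under standard assumptions (Hinton-style KD), not the repository's exact implementation:

```python
import torch.nn.functional as F

def vanilla_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Hinton-style logit distillation: KL divergence between the
    temperature-softened teacher and student class distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * temperature ** 2
```

In practice this term is combined with the usual cross-entropy loss on the ground-truth labels; the temperature used here is an illustrative default, and the distillers shipped in this repository define their own losses on top of this basic objective.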