# Unsupervised Domain Adaptation for Object Detection (D-adapt)

## Our code is available at [TLlib examples for cross-domain object detection](https://github.com/thuml/Transfer-Learning-Library/tree/dev-tllib/examples/domain_adaptation/object_detection/)

## Installation
Our code is based on [Detectron2 v0.6](https://detectron2.readthedocs.io/en/latest/tutorials/install.html); please install it before usage.

The following is an example based on PyTorch 1.9.0 with CUDA 11.1. For other versions, please refer to
the official websites of [PyTorch](https://pytorch.org/) and
[Detectron2](https://detectron2.readthedocs.io/en/latest/tutorials/install.html).
```shell
# create environment
conda create -n detection python=3.8.3
# activate environment
conda activate detection
# install pytorch
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# install detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# install other requirements
pip install -r requirements.txt
```

## Dataset

The following datasets can be downloaded automatically:
- [PASCAL_VOC 07+12](http://host.robots.ox.ac.uk/pascal/VOC/)
- Clipart
- WaterColor
- Comic

The following datasets need to be prepared manually if you want to use them:

#### Cityscapes, Foggy Cityscapes
- Download the Cityscapes and Foggy Cityscapes datasets from this [link](https://www.cityscapes-dataset.com/downloads/). In particular, we use *leftImg8bit_trainvaltest.zip* for Cityscapes and *leftImg8bit_trainvaltest_foggy.zip* for Foggy Cityscapes.
- Unzip them into a directory structured like

```
object_detection/datasets/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...
```
Then run
```
python prepare_cityscapes_to_voc.py
```
This will automatically generate datasets in `VOC` format:
```
object_detection/datasets/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
object_detection/datasets/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
```

#### Sim10k
- Download the Sim10k dataset from the following link: [Sim10k](https://fcav.engin.umich.edu/projects/driving-in-the-matrix). In particular, we use *repro_10k_images.tgz*, *repro_image_sets.tgz* and *repro_10k_annotations.tgz*.
- Extract the training set from *repro_10k_images.tgz*, *repro_image_sets.tgz* and *repro_10k_annotations.tgz*, then rename the directory `VOC2012/` to `sim10k/` (see the sketch below).
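A minimal Python sketch of these Sim10k steps, assuming the three archives extract into a `VOC2012/`-style layout as described above; the archive and dataset paths are placeholders for your own setup. Extracting into a scratch directory avoids clashing with the real Pascal VOC2012 dataset under `datasets/`.

```python
# Hypothetical helper for the Sim10k preparation steps above; all paths are
# placeholders. Assumes the archives extract into a VOC2012/ directory.
import os
import shutil
import tarfile

archive_dir = "downloads"   # where the three .tgz files were saved
scratch_dir = "sim10k_raw"  # scratch dir, so we do not touch Pascal VOC2012
os.makedirs(scratch_dir, exist_ok=True)

for name in ["repro_10k_images.tgz", "repro_image_sets.tgz", "repro_10k_annotations.tgz"]:
    with tarfile.open(os.path.join(archive_dir, name)) as tar:
        tar.extractall(scratch_dir)  # yields scratch_dir/VOC2012/...

# rename VOC2012/ to sim10k/ and move it under the datasets directory
shutil.move(os.path.join(scratch_dir, "VOC2012"), "object_detection/datasets/sim10k")
```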
After preparation, the following directories should exist:
```
object_detection/datasets/
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── VOC2012
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── clipart
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── watercolor
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── comic
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── foggy_cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
└── sim10k
    ├── Annotations
    ├── ImageSets
    └── JPEGImages
```

**Note**: The above is a tutorial for using the standard datasets. To use your own datasets,
you need to convert them into the corresponding format (see the registration sketch after the training example below).

## Supported Methods

Supported methods include:

- [Cycle-Consistent Adversarial Networks (CycleGAN)](https://arxiv.org/pdf/1703.10593.pdf)
- [Decoupled Adaptation for Cross-Domain Object Detection (D-adapt)](https://arxiv.org/abs/2110.02578)

## Experiment and Results

The shell files provide the scripts to reproduce the [benchmarks](/docs/dalib/benchmarks/object_detection.rst) with the specified hyper-parameters.
The basic training pipeline is as follows.

The following command trains a Faster RCNN detector on the VOC->Clipart task, with only source (VOC) data:
```
CUDA_VISIBLE_DEVICES=0 python source_only.py \
  --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  -s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
  --test VOC2007Test datasets/VOC2007 Clipart datasets/clipart --finetune \
  OUTPUT_DIR logs/source_only/faster_rcnn_R_101_C4/voc2clipart
```
Explanation of some arguments:
- `--config-file`: path to the config file that specifies the training hyper-parameters.
- `-s`: a list of source datasets; for each dataset, pass a `(name, path)` pair. In the
  above command there are two source datasets, **VOC2007** and **VOC2012**.
- `-t`: a list of target datasets, in the same format as above.
- `--test`: a list of test datasets, in the same format as above.
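To train on your own data (see the note in the Dataset section), the images and annotations must follow the VOC layout shown above; one way to then make the dataset addressable by name, as in the `-s`/`-t`/`--test` arguments, is to register it with Detectron2. A minimal sketch, where `my_dataset_train`, its directory and the class list are hypothetical placeholders:

```python
# Hypothetical registration of a custom VOC-format dataset with Detectron2;
# "my_dataset_train", its directory and the class names are placeholders.
from detectron2.data.datasets import register_pascal_voc

CLASS_NAMES = ["aeroplane", "bicycle", "car"]  # your own categories

# expects datasets/my_dataset/{Annotations, ImageSets/Main/train.txt, JPEGImages}
register_pascal_voc(
    "my_dataset_train",     # name used to refer to the dataset
    "datasets/my_dataset",  # root directory in VOC layout
    split="train",          # reads ImageSets/Main/train.txt
    year=2012,
    class_names=CLASS_NAMES,
)
```

The benchmark datasets in this example are registered by the provided scripts themselves, so a step like this should only be needed for new data.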
### VOC->Clipart

| | | AP | AP50 | AP75 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
|-------------------------|----------|------|------|------|-----------|---------|------|------|--------|------|------|------|-------|------|-------------|------|-------|-----------|--------|-------------|-------|------|-------|-----------|
| Faster RCNN (ResNet101) | Source | 14.9 | 29.3 | 12.6 | 29.6 | 38.0 | 24.7 | 21.7 | 31.9 | 48.0 | 30.8 | 15.9 | 32.0 | 19.2 | 18.2 | 12.1 | 28.2 | 48.8 | 38.3 | 34.6 | 3.8 | 22.5 | 43.7 | 44.0 |
| | CycleGAN | 20.0 | 37.7 | 18.3 | 37.1 | 41.9 | 29.9 | 26.5 | 40.9 | 65.1 | 37.8 | 23.8 | 40.7 | 48.9 | 12.7 | 14.4 | 27.8 | 63.0 | 55.1 | 40.1 | 8.0 | 30.7 | 54.1 | 55.7 |
| | D-adapt | 24.8 | 49.0 | 21.5 | 56.4 | 63.2 | 42.3 | 40.9 | 45.3 | 77.0 | 48.7 | 25.4 | 44.3 | 58.4 | 31.4 | 24.5 | 47.1 | 75.3 | 69.3 | 43.5 | 27.9 | 34.1 | 60.7 | 64.0 |
| | | | | | | | | | | | | | | | | | | | | | | | | |
| RetinaNet | Source | 18.3 | 32.2 | 17.6 | 34.2 | 42.4 | 27.0 | 21.6 | 36.8 | 48.4 | 35.9 | 16.4 | 38.9 | 22.6 | 27.0 | 15.1 | 27.1 | 46.7 | 42.1 | 36.2 | 8.3 | 29.5 | 42.1 | 46.2 |
| | D-adapt | 25.1 | 46.3 | 23.9 | 47.4 | 65.0 | 33.1 | 37.5 | 56.8 | 61.2 | 55.1 | 27.3 | 45.5 | 51.8 | 29.1 | 29.6 | 38.0 | 74.5 | 66.7 | 46.0 | 24.2 | 29.3 | 54.2 | 53.8 |

### VOC->WaterColor

| | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|-------------------------|------|------|------|---------|------|------|------|------|--------|
| Faster RCNN (ResNet101) | 23.0 | 45.9 | 18.5 | 71.1 | 48.3 | 48.6 | 23.7 | 23.3 | 60.3 |
| CycleGAN | 24.9 | 50.8 | 22.4 | 75.8 | 52.1 | 49.8 | 30.1 | 33.4 | 63.6 |
| D-adapt | 28.5 | 57.5 | 23.6 | 77.4 | 54.0 | 52.8 | 43.9 | 48.1 | 68.9 |
| Target | 23.8 | 51.3 | 17.4 | 48.5 | 54.7 | 41.3 | 36.2 | 52.6 | 74.6 |

### VOC->Comic

| | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|:-----------------------:|:----:|:----:|:----:|:-------:|:----:|:----:|:----:|:----:|:------:|
| Faster RCNN (ResNet101) | 13.0 | 25.5 | 11.4 | 33.0 | 15.8 | 28.9 | 16.8 | 19.6 | 39.0 |
| CycleGAN | 16.9 | 34.6 | 14.2 | 28.1 | 25.7 | 37.7 | 28.0 | 33.8 | 54.1 |
| D-adapt | 20.8 | 41.1 | 18.5 | 49.4 | 25.7 | 43.3 | 36.9 | 32.7 | 58.5 |
| Target | 21.9 | 44.6 | 16.0 | 40.7 | 32.3 | 38.3 | 43.9 | 41.3 | 71.0 |

### Cityscapes->Foggy Cityscapes

| | | AP | AP50 | AP75 | bicycle | bus | car | motorcycle | person | rider | train | truck |
|:-----------------------:|:--------:|:----:|:----:|:----:|:-------:|:----:|:----:|:----------:|:------:|:-----:|:-----:|:-----:|
| Faster RCNN (VGG16) | Source | 14.3 | 25.9 | 13.2 | 33.6 | 27.0 | 40.0 | 22.3 | 31.3 | 38.5 | 2.3 | 12.2 |
| | CycleGAN | 22.5 | 41.6 | 20.7 | 46.5 | 41.5 | 62.0 | 33.8 | 45.0 | 54.5 | 21.7 | 27.7 |
| | D-adapt | 19.4 | 38.1 | 17.5 | 42.0 | 36.8 | 58.1 | 32.2 | 43.1 | 51.8 | 14.6 | 26.3 |
| | Target | 24.0 | 45.3 | 21.3 | 45.9 | 47.4 | 67.3 | 39.7 | 49.0 | 53.2 | 30.0 | 29.6 |
| | | | | | | | | | | | | |
| Faster RCNN (ResNet101) | Source | 18.8 | 33.3 | 19.0 | 36.1 | 34.5 | 43.8 | 24.0 | 36.3 | 39.9 | 29.1 | 22.8 |
| | CycleGAN | 22.9 | 41.8 | 21.9 | 42.0 | 44.5 | 57.6 | 36.3 | 40.9 | 48.0 | 30.8 | 34.3 |
| | D-adapt | 22.7 | 42.4 | 21.6 | 41.8 | 44.4 | 56.6 | 31.4 | 41.8 | 48.6 | 42.3 | 32.4 |
| | Target | 25.5 | 45.3 | 24.3 | 41.9 | 53.2 | 63.4 | 36.1 | 42.6 | 47.9 | 42.4 | 35.3 |

### Sim10k->Cityscapes Car

| | | AP | AP50 | AP75 |
|:-----------------------:|:--------:|:----:|:----:|:----:|
| Faster RCNN (VGG16) | Source | 24.8 | 43.4 | 23.6 |
| | CycleGAN | 29.3 | 51.9 | 28.6 |
| | D-adapt | 23.6 | 48.5 | 18.7 |
| | Target | 24.8 | 43.4 | 23.6 |
| | | | | |
| Faster RCNN (ResNet101) | Source | 24.6 | 44.4 | 23.0 |
| | CycleGAN | 26.5 | 47.4 | 24.0 |
| | D-adapt | 27.4 | 51.9 | 25.7 |
| | Target | 24.6 | 44.4 | 23.0 |

### Visualization
We provide code for visualization in `visualize.py`. For example, suppose you have trained the source-only model
for the VOC->Clipart task using the provided scripts. The following command visualizes the predictions of the
detector on Clipart:
```shell
CUDA_VISIBLE_DEVICES=0 python visualize.py --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  --test Clipart datasets/clipart --save-path visualizations/source_only/voc2clipart \
  MODEL.WEIGHTS logs/source_only/faster_rcnn_R_101_C4/voc2clipart/model_final.pth
```
Explanation of some arguments:
- `--test`: a list of test datasets for visualization.
- `--save-path`: where to save the visualization results.
- `MODEL.WEIGHTS`: path to the model checkpoint.

## TODO
Support methods: SWDA, Global/Local Alignment

## Citation
If you use these methods in your research, please consider citing:

```
@inproceedings{jiang2021decoupled,
    title = {Decoupled Adaptation for Cross-Domain Object Detection},
    author = {Junguang Jiang and Baixu Chen and Jianmin Wang and Mingsheng Long},
    booktitle = {ICLR},
    year = {2022}
}

@inproceedings{CycleGAN,
    title = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
    author = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
    booktitle = {ICCV},
    year = {2017}
}
```