# Unsupervised Domain Adaptation for Object Detection (D-adapt)

## Our code is available at [TLlib examples for cross-domain object detection](https://github.com/thuml/Transfer-Learning-Library/tree/dev-tllib/examples/domain_adaptation/object_detection/)

## Installation
Our code is based on [Detectron2 v0.6](https://detectron2.readthedocs.io/en/latest/tutorials/install.html); please install it before usage.

The following is an example based on PyTorch 1.9.0 with CUDA 11.1. For other versions, please refer to
the official websites of [PyTorch](https://pytorch.org/) and
[Detectron2](https://detectron2.readthedocs.io/en/latest/tutorials/install.html).
```shell
# create environment
conda create -n detection python=3.8.3
# activate environment
conda activate detection
# install pytorch
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# install detectron2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# install other requirements
pip install -r requirements.txt
```

## Dataset

The following datasets can be downloaded automatically:
- [PASCAL_VOC 07+12](http://host.robots.ox.ac.uk/pascal/VOC/)
- Clipart
- WaterColor
- Comic

The following datasets need to be prepared manually if you want to use them:

#### Cityscapes, Foggy Cityscapes
- Download the Cityscapes and Foggy Cityscapes datasets from this [link](https://www.cityscapes-dataset.com/downloads/). In particular, we use *leftImg8bit_trainvaltest.zip* for Cityscapes and *leftImg8bit_trainvaltest_foggy.zip* for Foggy Cityscapes.
- Unzip them into a directory structured like

```
object_detection/datasets/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...
```
Then run
```
python prepare_cityscapes_to_voc.py
```
This will automatically generate datasets in `VOC` format:
```
object_detection/datasets/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
object_detection/datasets/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
```

#### Sim10k
- Download the Sim10k dataset from the following link: [Sim10k](https://fcav.engin.umich.edu/projects/driving-in-the-matrix). In particular, we use *repro_10k_images.tgz*, *repro_image_sets.tgz* and *repro_10k_annotations.tgz*.
- Extract the training set from *repro_10k_images.tgz*, *repro_image_sets.tgz* and *repro_10k_annotations.tgz*, then rename the directory `VOC2012/` to `sim10k/` (see the sketch below).
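A minimal Python sketch of these Sim10k steps, assuming the three archives extract into a `VOC2012/`-style layout as described above; the archive and dataset paths are placeholders for your own setup. Extracting into a scratch directory avoids clashing with the real Pascal VOC2012 dataset under `datasets/`.

```python
# Hypothetical helper for the Sim10k preparation steps above; all paths are
# placeholders. Assumes the archives extract into a VOC2012/ directory.
import os
import shutil
import tarfile

archive_dir = "downloads"   # where the three .tgz files were saved
scratch_dir = "sim10k_raw"  # scratch dir, so we do not touch Pascal VOC2012
os.makedirs(scratch_dir, exist_ok=True)

for name in ["repro_10k_images.tgz", "repro_image_sets.tgz", "repro_10k_annotations.tgz"]:
    with tarfile.open(os.path.join(archive_dir, name)) as tar:
        tar.extractall(scratch_dir)  # yields scratch_dir/VOC2012/...

# rename VOC2012/ to sim10k/ and move it under the datasets directory
shutil.move(os.path.join(scratch_dir, "VOC2012"), "object_detection/datasets/sim10k")
```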
After preparation, the following directories should exist:
```
object_detection/datasets/
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── VOC2012
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── clipart
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── watercolor
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── comic
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
├── foggy_cityscapes_in_voc
│   ├── Annotations
│   ├── ImageSets
│   └── JPEGImages
└── sim10k
    ├── Annotations
    ├── ImageSets
    └── JPEGImages
```

**Note**: The above is a tutorial for using the standard datasets. To use your own datasets,
you need to convert them into the corresponding format (see the registration sketch after the training example below).

## Supported Methods

Supported methods include:

- [Cycle-Consistent Adversarial Networks (CycleGAN)](https://arxiv.org/pdf/1703.10593.pdf)
- [Decoupled Adaptation for Cross-Domain Object Detection (D-adapt)](https://arxiv.org/abs/2110.02578)

## Experiment and Results

The shell files provide the scripts to reproduce the [benchmarks](/docs/dalib/benchmarks/object_detection.rst) with the specified hyper-parameters.
The basic training pipeline is as follows.

The following command trains a Faster RCNN detector on the VOC->Clipart task, with only source (VOC) data:
```
CUDA_VISIBLE_DEVICES=0 python source_only.py \
  --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  -s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
  --test VOC2007Test datasets/VOC2007 Clipart datasets/clipart --finetune \
  OUTPUT_DIR logs/source_only/faster_rcnn_R_101_C4/voc2clipart
```
Explanation of some arguments:
- `--config-file`: path to the config file that specifies the training hyper-parameters.
- `-s`: a list of source datasets; for each dataset, pass a `(name, path)` pair. In the
  above command there are two source datasets, **VOC2007** and **VOC2012**.
- `-t`: a list of target datasets, in the same format as above.
- `--test`: a list of test datasets, in the same format as above.
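To train on your own data (see the note in the Dataset section), the images and annotations must follow the VOC layout shown above; one way to then make the dataset addressable by name, as in the `-s`/`-t`/`--test` arguments, is to register it with Detectron2. A minimal sketch, where `my_dataset_train`, its directory and the class list are hypothetical placeholders:

```python
# Hypothetical registration of a custom VOC-format dataset with Detectron2;
# "my_dataset_train", its directory and the class names are placeholders.
from detectron2.data.datasets import register_pascal_voc

CLASS_NAMES = ["aeroplane", "bicycle", "car"]  # your own categories

# expects datasets/my_dataset/{Annotations, ImageSets/Main/train.txt, JPEGImages}
register_pascal_voc(
    "my_dataset_train",     # name used to refer to the dataset
    "datasets/my_dataset",  # root directory in VOC layout
    split="train",          # reads ImageSets/Main/train.txt
    year=2012,
    class_names=CLASS_NAMES,
)
```

The benchmark datasets in this example are registered by the provided scripts themselves, so a step like this should only be needed for new data.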
### VOC->Clipart

| | | AP | AP50 | AP75 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
|-------------------------|----------|------|------|------|-----------|---------|------|------|--------|------|------|------|-------|------|-------------|------|-------|-----------|--------|-------------|-------|------|-------|-----------|
| Faster RCNN (ResNet101) | Source | 14.9 | 29.3 | 12.6 | 29.6 | 38.0 | 24.7 | 21.7 | 31.9 | 48.0 | 30.8 | 15.9 | 32.0 | 19.2 | 18.2 | 12.1 | 28.2 | 48.8 | 38.3 | 34.6 | 3.8 | 22.5 | 43.7 | 44.0 |
| | CycleGAN | 20.0 | 37.7 | 18.3 | 37.1 | 41.9 | 29.9 | 26.5 | 40.9 | 65.1 | 37.8 | 23.8 | 40.7 | 48.9 | 12.7 | 14.4 | 27.8 | 63.0 | 55.1 | 40.1 | 8.0 | 30.7 | 54.1 | 55.7 |
| | D-adapt | 24.8 | 49.0 | 21.5 | 56.4 | 63.2 | 42.3 | 40.9 | 45.3 | 77.0 | 48.7 | 25.4 | 44.3 | 58.4 | 31.4 | 24.5 | 47.1 | 75.3 | 69.3 | 43.5 | 27.9 | 34.1 | 60.7 | 64.0 |
| | | | | | | | | | | | | | | | | | | | | | | | | |
| RetinaNet | Source | 18.3 | 32.2 | 17.6 | 34.2 | 42.4 | 27.0 | 21.6 | 36.8 | 48.4 | 35.9 | 16.4 | 38.9 | 22.6 | 27.0 | 15.1 | 27.1 | 46.7 | 42.1 | 36.2 | 8.3 | 29.5 | 42.1 | 46.2 |
| | D-adapt | 25.1 | 46.3 | 23.9 | 47.4 | 65.0 | 33.1 | 37.5 | 56.8 | 61.2 | 55.1 | 27.3 | 45.5 | 51.8 | 29.1 | 29.6 | 38.0 | 74.5 | 66.7 | 46.0 | 24.2 | 29.3 | 54.2 | 53.8 |

### VOC->WaterColor

| | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|-------------------------|------|------|------|---------|------|------|------|------|--------|
| Faster RCNN (ResNet101) | 23.0 | 45.9 | 18.5 | 71.1 | 48.3 | 48.6 | 23.7 | 23.3 | 60.3 |
| CycleGAN | 24.9 | 50.8 | 22.4 | 75.8 | 52.1 | 49.8 | 30.1 | 33.4 | 63.6 |
| D-adapt | 28.5 | 57.5 | 23.6 | 77.4 | 54.0 | 52.8 | 43.9 | 48.1 | 68.9 |
| Target | 23.8 | 51.3 | 17.4 | 48.5 | 54.7 | 41.3 | 36.2 | 52.6 | 74.6 |

### VOC->Comic

| | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
|:-----------------------:|:----:|:----:|:----:|:-------:|:----:|:----:|:----:|:----:|:------:|
| Faster RCNN (ResNet101) | 13.0 | 25.5 | 11.4 | 33.0 | 15.8 | 28.9 | 16.8 | 19.6 | 39.0 |
| CycleGAN | 16.9 | 34.6 | 14.2 | 28.1 | 25.7 | 37.7 | 28.0 | 33.8 | 54.1 |
| D-adapt | 20.8 | 41.1 | 18.5 | 49.4 | 25.7 | 43.3 | 36.9 | 32.7 | 58.5 |
| Target | 21.9 | 44.6 | 16.0 | 40.7 | 32.3 | 38.3 | 43.9 | 41.3 | 71.0 |

### Cityscapes->Foggy Cityscapes

| | | AP | AP50 | AP75 | bicycle | bus | car | motorcycle | person | rider | train | truck |
|:-----------------------:|:--------:|:----:|:----:|:----:|:-------:|:----:|:----:|:----------:|:------:|:-----:|:-----:|:-----:|
| Faster RCNN (VGG16) | Source | 14.3 | 25.9 | 13.2 | 33.6 | 27.0 | 40.0 | 22.3 | 31.3 | 38.5 | 2.3 | 12.2 |
| | CycleGAN | 22.5 | 41.6 | 20.7 | 46.5 | 41.5 | 62.0 | 33.8 | 45.0 | 54.5 | 21.7 | 27.7 |
| | D-adapt | 19.4 | 38.1 | 17.5 | 42.0 | 36.8 | 58.1 | 32.2 | 43.1 | 51.8 | 14.6 | 26.3 |
| | Target | 24.0 | 45.3 | 21.3 | 45.9 | 47.4 | 67.3 | 39.7 | 49.0 | 53.2 | 30.0 | 29.6 |
| | | | | | | | | | | | | |
| Faster RCNN (ResNet101) | Source | 18.8 | 33.3 | 19.0 | 36.1 | 34.5 | 43.8 | 24.0 | 36.3 | 39.9 | 29.1 | 22.8 |
| | CycleGAN | 22.9 | 41.8 | 21.9 | 42.0 | 44.5 | 57.6 | 36.3 | 40.9 | 48.0 | 30.8 | 34.3 |
| | D-adapt | 22.7 | 42.4 | 21.6 | 41.8 | 44.4 | 56.6 | 31.4 | 41.8 | 48.6 | 42.3 | 32.4 |
| | Target | 25.5 | 45.3 | 24.3 | 41.9 | 53.2 | 63.4 | 36.1 | 42.6 | 47.9 | 42.4 | 35.3 |

### Sim10k->Cityscapes Car

| | | AP | AP50 | AP75 |
|:-----------------------:|:--------:|:----:|:----:|:----:|
| Faster RCNN (VGG16) | Source | 24.8 | 43.4 | 23.6 |
| | CycleGAN | 29.3 | 51.9 | 28.6 |
| | D-adapt | 23.6 | 48.5 | 18.7 |
| | Target | 24.8 | 43.4 | 23.6 |
| | | | | |
| Faster RCNN (ResNet101) | Source | 24.6 | 44.4 | 23.0 |
| | CycleGAN | 26.5 | 47.4 | 24.0 |
| | D-adapt | 27.4 | 51.9 | 25.7 |
| | Target | 24.6 | 44.4 | 23.0 |

### Visualization
We provide code for visualization in `visualize.py`. For example, suppose you have trained the source-only model
for the VOC->Clipart task using the provided scripts. The following command visualizes the predictions of the
detector on Clipart:
```shell
CUDA_VISIBLE_DEVICES=0 python visualize.py --config-file config/faster_rcnn_R_101_C4_voc.yaml \
  --test Clipart datasets/clipart --save-path visualizations/source_only/voc2clipart \
  MODEL.WEIGHTS logs/source_only/faster_rcnn_R_101_C4/voc2clipart/model_final.pth
```
Explanation of some arguments:
- `--test`: a list of test datasets for visualization.
- `--save-path`: where to save the visualization results.
- `MODEL.WEIGHTS`: path to the model checkpoint.

## TODO
Support methods: SWDA, Global/Local Alignment

## Citation
If you use these methods in your research, please consider citing:

```
@inproceedings{jiang2021decoupled,
    title = {Decoupled Adaptation for Cross-Domain Object Detection},
    author = {Junguang Jiang and Baixu Chen and Jianmin Wang and Mingsheng Long},
    booktitle = {ICLR},
    year = {2022}
}

@inproceedings{CycleGAN,
    title = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
    author = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
    booktitle = {ICCV},
    year = {2017}
}
```