├── README.md ├── data ├── README.md └── gen_market1501_and_cuhk03_train_set.py ├── evaluation ├── README.md ├── deploy_mgcam.prototxt ├── extract_feature_cuhk.py ├── extract_feature_market.py └── extract_feature_mars.py └── experiments ├── cuhk03-detected ├── layers.py ├── mgcam_iter_75000.caffemodel ├── mgcam_siamese_iter_20000.caffemodel ├── mgcam_siamese_train.prototxt ├── mgcam_train.prototxt ├── run_mgcam.sh ├── run_mgcam_siamese.sh ├── solver_mgcam.prototxt └── solver_mgcam_siamese.prototxt ├── cuhk03-labeled ├── layers.py ├── mgcam_iter_75000.caffemodel ├── mgcam_siamese_iter_20000.caffemodel ├── mgcam_siamese_train.prototxt ├── mgcam_train.prototxt ├── run_mgcam.sh ├── run_mgcam_siamese.sh ├── solver_mgcam.prototxt └── solver_mgcam_siamese.prototxt ├── market1501 ├── layers.py ├── mgcam_iter_75000.caffemodel ├── mgcam_siamese_iter_20000.caffemodel ├── mgcam_siamese_train.prototxt ├── mgcam_train.prototxt ├── run_mgcam.sh ├── run_mgcam_siamese.sh ├── solver_mgcam.prototxt └── solver_mgcam_siamese.prototxt └── mars ├── layers.py ├── mgcam_iter_75000.caffemodel ├── mgcam_siamese_iter_20000.caffemodel ├── mgcam_siamese_train.prototxt ├── mgcam_train.prototxt ├── run_mgcam.sh ├── run_mgcam_siamese.sh ├── solver_mgcam.prototxt └── solver_mgcam_siamese.prototxt /README.md: -------------------------------------------------------------------------------- 1 | # MGCAM 2 | -------------------------------------------------------------------------------- 3 | * Mask-guided Contrastive Attention Model (MGCAM) for Person Re-Identification 4 | * Code Version 1.0 5 | * E-mail: chunfeng.song@nlpr.ia.ac.cn 6 | -------------------------------------------------------------------------------- 7 | 8 | i. Overview 9 | ii. Copying 10 | iii. Use 11 | 12 | i. OVERVIEW 13 | ----------------------------- 14 | This code implements the paper: 15 | 16 | >Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided 17 | Contrastive Attention Model for Person Re-Identification. 18 | In CVPR, 2018. 19 | 20 | If you find this work helpful for your research, please cite our paper [[PDF]](http://openaccess.thecvf.com/content_cvpr_2018/html/Song_Mask-Guided_Contrastive_Attention_CVPR_2018_paper.html). 21 | 22 | ii. COPYING 23 | ----------------------------- 24 | We share this code for research use only. We neither warrant 25 | correctness nor take any responsibility for the consequences of 26 | using this code. If you find any problem or inappropriate content 27 | in this code, feel free to contact us (chunfeng.song@nlpr.ia.ac.cn). 28 | 29 | iii. USE 30 | ----------------------------- 31 | This code runs on Caffe with Python layer support (pycaffe). You can install Caffe from [here](https://github.com/BVLC/caffe). 32 | 33 | (1) Data Preparation. 34 | Download the original datasets (MARS, Market-1501, and CUHK-03) and their masks from [Baidu Yun](https://pan.baidu.com/s/16ZrlM1f_1_T-eZHmQTTkYg) OR [Google Drive](https://drive.google.com/drive/folders/1QVBDpH0B4k6cXKFYXBJ3HNVET_3gY0to?usp=sharing). 35 | 36 | For Market-1501 and CUHK-03, you need to run the split code (./data/gen_market1501_and_cuhk03_train_set.py). 37 | 38 | (2) Model Training. 39 | Here, we take MARS as an example; the other datasets are handled in the same way. 40 | 41 | >cd ./experiments/mars 42 | 43 | First, edit 'im_path', 'gt_path' and 'dataset' in the prototxt file; e.g., for the MARS dataset, the MGCAM-only and MGCAM-Siamese versions are 'mgcam_train.prototxt' and 'mgcam_siamese_train.prototxt', respectively.
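The data layer ('layers.py' in each experiment folder) parses these values from the Python layer's param_str with YAML. As a minimal sketch of the expected format (the batch size and paths below are placeholders, not the repository's defaults), the string you put into the prototxt should parse like this:

```python
import yaml

# Hypothetical param_str; it mirrors the keys that layers.py reads via
# yaml.load(self.param_str): 'batch_size', 'im_path', 'gt_path' and 'dataset'.
# The values are placeholders -- point them at your own data folders.
param_str = ("{'batch_size': 16, "
             "'im_path': '../../data/mars/bbox_train', "
             "'gt_path': '../../data/mars/bbox_train_seg', "
             "'dataset': 'mars_train'}")

params = yaml.load(param_str)
assert set(params) == {'batch_size', 'im_path', 'gt_path', 'dataset'}
print(params['im_path'])
```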
44 | 45 | Then, we can train the MGCAM model from scratch with the command: 46 | >sh run_mgcam.sh 47 | 48 | It will take roughly 15 hours on a single Titan X. 49 | 50 | Finally, we can fine-tune the MGCAM model with the siamese loss by running the command: 51 | >sh run_mgcam_siamese.sh 52 | 53 | It will take roughly 5 hours on a single Titan X. 54 | 55 | (3) Evaluation. 56 | Taking MARS as an example, run the code in './evaluation/extract_feature_mars.py' to extract the IDE features, and then run the CMC and mAP evaluation with the [MARS-evaluation](https://github.com/liangzheng06/MARS-evaluation) code by Liang Zheng et al., or the [Re-Ranking](https://github.com/zhunzhong07/person-re-ranking) code by Zhun Zhong et al. 57 | -------------------------------------------------------------------------------- /data/README.md: -------------------------------------------------------------------------------- 1 | Dataset Preparation. 2 | --- 3 | 1) Download the MARS dataset from [here](http://www.liangzheng.com.cn/Project/project_mars.html). 4 | 5 | 2) Download the Market-1501 dataset from [here](http://www.liangzheng.org/Project/project_reid.html). 6 | 7 | 3) Download the CUHK03 dataset from [here](https://github.com/zhunzhong07/person-re-ranking). You need to extract the images into folders. You can also download the new protocol version [CUHK03-NP](https://github.com/zhunzhong07/person-re-ranking/tree/master/CUHK03-NP). If you use this dataset in your work, please cite their paper: 8 | 9 | >@inproceedings{zhong2017re, 10 | title={Re-ranking Person Re-identification with k-reciprocal Encoding}, 11 | author={Zhong, Zhun and Zheng, Liang and Cao, Donglin and Li, Shaozi}, 12 | booktitle={CVPR}, 13 | year={2017} 14 | } 15 | 16 | >@inproceedings{li2014deepreid, 17 | title={DeepReID: Deep Filter Pairing Neural Network for Person Re-identification}, 18 | author={Li, Wei and Zhao, Rui and Xiao, Tong and Wang, Xiaogang}, 19 | booktitle={CVPR}, 20 | year={2014} 21 | } 22 | 23 | * All masks can be downloaded from [Baidu Yun](https://pan.baidu.com/s/16ZrlM1f_1_T-eZHmQTTkYg) OR [Google Drive](https://drive.google.com/drive/folders/1QVBDpH0B4k6cXKFYXBJ3HNVET_3gY0to?usp=sharing). If you use the masks in your work, please cite our paper: 24 | 25 | >Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided 26 | Contrastive Attention Model for Person Re-Identification. 27 | In CVPR, 2018.
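The masks are ordinary image files; the training and evaluation code reads them with OpenCV and converts them to a single grayscale channel. A minimal sanity check for one image/mask pair (the file names below are hypothetical examples; substitute any downloaded image and its mask):

```python
import cv2

# Hypothetical file names -- replace with a real image and its mask.
im = cv2.imread('./market-1501/bounding_box_train/0002_c1s1_000451_03.jpg')
seg = cv2.imread('./market-1501/bounding_box_train_seg/0002_c1s1_000451_03.png')
seg = cv2.cvtColor(seg, cv2.COLOR_BGR2GRAY)  # same conversion the MGCAM code applies
print(im.shape[:2], seg.shape)  # the mask should match the image spatially
```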
28 | 29 | Make sure that all the datasets are saved in the following structure: 30 | 31 | MARS: 32 | >./data 33 | >./data/mars 34 | >./data/mars/bbox_train 35 | >./data/mars/bbox_test 36 | >./data/mars/bbox_train_seg 37 | >./data/mars/bbox_test_seg 38 | 39 | Market-1501: 40 | >./data 41 | >./data/market-1501 42 | >./data/market-1501/bounding_box_train 43 | >./data/market-1501/bounding_box_test 44 | >./data/market-1501/query 45 | >./data/market-1501/bounding_box_train_seg 46 | >./data/market-1501/bounding_box_test_seg 47 | >./data/market-1501/query_seg 48 | 49 | CUHK03: 50 | >./data 51 | >./data/cuhk03 52 | >./data/cuhk03/labeled 53 | >./data/cuhk03/cuhk03_labeled_seg 54 | >./data/cuhk03/detected 55 | >./data/cuhk03/cuhk03_detected_seg 56 | 57 | CUHK03-NP: 58 | >./data 59 | >./data/cuhk03-np 60 | >./data/cuhk03-np/labeled 61 | >./data/cuhk03-np/labeled/bounding_box_train 62 | >./data/cuhk03-np/labeled/bounding_box_test 63 | >./data/cuhk03-np/labeled/query 64 | >./data/cuhk03-np/labeled/bounding_box_train_seg 65 | >./data/cuhk03-np/labeled/bounding_box_test_seg 66 | >./data/cuhk03-np/labeled/query_seg 67 | 68 | >./data/cuhk03-np/detected 69 | >./data/cuhk03-np/detected/bounding_box_train 70 | >./data/cuhk03-np/detected/bounding_box_test 71 | >./data/cuhk03-np/detected/query 72 | >./data/cuhk03-np/detected/bounding_box_train_seg 73 | >./data/cuhk03-np/detected/bounding_box_test_seg 74 | >./data/cuhk03-np/detected/query_seg 75 | 76 | Now, you can generate the training set by running "python gen_market1501_and_cuhk03_train_set.py". 77 | -------------------------------------------------------------------------------- /data/gen_market1501_and_cuhk03_train_set.py: -------------------------------------------------------------------------------- 1 | """ 2 | Market-1501 and CUHK-03 training data. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | 15 | import os 16 | import shutil 17 | import numpy as np 18 | 19 | dataset = 'cuhk03-np/detected' #'cuhk03-np/labeled' or 'cuhk03-np/detected', cuhk03-np can be downloaded from https://github.com/zhunzhong07/person-re-ranking/tree/master/CUHK03-NP 20 | data_path = './' + dataset +'/bounding_box_train' #Path for original RGB training set. 21 | seg_path = './' + dataset +'/bounding_box_train_seg' #Path for binary masks of training set. 22 | save_data_path = './' + dataset +'/bounding_box_train_fold' 23 | save_seg_path = './' + dataset +'/bounding_box_train_seg_fold' 24 | if not os.path.exists(save_data_path): 25 | os.mkdir(save_data_path) 26 | os.mkdir(save_seg_path) 27 | pic_im_list = np.sort(os.listdir(data_path)) 28 | i = 0 29 | for pic in pic_im_list: 30 | if pic.lower().endswith('.jpg') or pic.lower().endswith('.png'): 31 | this_data_fold = os.path.join(save_data_path,pic[:4]) 32 | this_seg_fold = os.path.join(save_seg_path,pic[:4]) 33 | if not os.path.exists(this_data_fold): 34 | os.mkdir(this_data_fold) 35 | os.mkdir(this_seg_fold) 36 | i +=1 37 | new_im_path = os.path.join(this_data_fold,pic) 38 | new_seg_path = os.path.join(this_seg_fold,pic[:-4] + '.png') # masks go into the seg fold, not the image fold 39 | shutil.copy(os.path.join(data_path,pic),new_im_path) 40 | shutil.copy(os.path.join(seg_path,pic[:-4] + '.png'),new_seg_path) 41 | print '---->dealing num-%04d with %s!'%(i,pic) 42 | print 'DONE!!!'
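# Optional sanity check (not part of the original script, added here as a
# sketch): after the split, every identity folder should contain one mask
# per image. This reuses the save_data_path/save_seg_path variables above.
for fold in np.sort(os.listdir(save_data_path)):
    n_im = len(os.listdir(os.path.join(save_data_path, fold)))
    n_seg = len(os.listdir(os.path.join(save_seg_path, fold)))
    assert n_im == n_seg, 'identity %s: %d images but %d masks' % (fold, n_im, n_seg)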
43 | 44 | -------------------------------------------------------------------------------- /evaluation/README.md: -------------------------------------------------------------------------------- 1 | Extract features for evaluation. 2 | --- 3 | We recommend the popular [re-ranking](https://github.com/zhunzhong07/person-re-ranking) method for further evaluation. -------------------------------------------------------------------------------- /evaluation/deploy_mgcam.prototxt: -------------------------------------------------------------------------------- 1 | name: "MGCAM" 2 | 3 | input: "data" 4 | input_dim:1 5 | input_dim:3 6 | input_dim:160 7 | input_dim:64 8 | 9 | input: "one_mask" 10 | input_dim:1 11 | input_dim:1 12 | input_dim:40 13 | input_dim:16 14 | 15 | input: "mask" 16 | input_dim:1 17 | input_dim:1 18 | input_dim:160 19 | input_dim:64 20 | 21 | layer { 22 | name: "data_concate" 23 | type: "Concat" 24 | bottom: "data" 25 | bottom: "mask" 26 | top: "data_concate" 27 | } 28 | 29 | layer { 30 | name: "conv0_scale1_full" 31 | type: "Convolution" 32 | bottom: "data_concate" 33 | top: "conv0_scale1_full" 34 | param { 35 | name: "conv0_scale1_w_full" 36 | lr_mult: 1 37 | decay_mult: 1 38 | } 39 | param { 40 | name: "conv0_scale1_b_full" 41 | lr_mult: 2 42 | decay_mult: 0 43 | } 44 | convolution_param { 45 | num_output: 32 46 | pad: 2 47 | kernel_size: 5 48 | stride: 1 49 | weight_filler { 50 | type: "xavier" 51 | } 52 | bias_filler { 53 | type: "constant" 54 | } 55 | dilation: 1 56 | } 57 | } 58 | layer { 59 | name: "bn0_scale1_full" 60 | type: "BatchNorm" 61 | bottom: "conv0_scale1_full" 62 | top: "bn0_scale1_full" 63 | param { 64 | name: "bn0_scale1_mean_full" 65 | lr_mult: 0 66 | decay_mult: 0 67 | } 68 | param { 69 | name: "bn0_scale1_var_full" 70 | lr_mult: 0 71 | decay_mult: 0 72 | } 73 | param { 74 | name: "bn0_scale1_bias_full" 75 | lr_mult: 0 76 | decay_mult: 0 77 | } 78 | batch_norm_param { 79 | use_global_stats: true 80 | } 81 | } 82 | layer { 83 | name: "relu0_full" 84 | type: "ReLU" 85 | bottom: "bn0_scale1_full" 86 | top: "bn0_scale1_full" 87 | } 88 | layer { 89 | name: "pool0_full" 90 | type: "Pooling" 91 | bottom: "bn0_scale1_full" 92 | top: "pool0_full" 93 | pooling_param { 94 | pool: MAX 95 | kernel_size: 2 96 | stride: 2 97 | } 98 | } 99 | layer { 100 | name: "conv1_scale1_full" 101 | type: "Convolution" 102 | bottom: "pool0_full" 103 | top: "conv1_scale1_full" 104 | param { 105 | name: "conv1_scale1_w_full" 106 | lr_mult: 1 107 | decay_mult: 1 108 | } 109 | param { 110 | name: "conv1_scale1_b_full" 111 | lr_mult: 2 112 | decay_mult: 0 113 | } 114 | convolution_param { 115 | num_output: 32 116 | pad: 1 117 | kernel_size: 3 118 | stride: 1 119 | weight_filler { 120 | type: "xavier" 121 | } 122 | bias_filler { 123 | type: "constant" 124 | } 125 | dilation: 1 126 | } 127 | } 128 | layer { 129 | name: "conv1_scale2_full" 130 | type: "Convolution" 131 | bottom: "pool0_full" 132 | top: "conv1_scale2_full" 133 | param { 134 | name: "conv1_scale2_w_full" 135 | lr_mult: 1 136 | decay_mult: 1 137 | } 138 | param { 139 | name: "conv1_scale2_b_full" 140 | lr_mult: 2 141 | decay_mult: 0 142 | } 143 | convolution_param { 144 | num_output: 32 145 | pad: 2 146 | kernel_size: 3 147 | stride: 1 148 | weight_filler { 149 | type: "xavier" 150 | } 151 | bias_filler { 152 | type: "constant" 153 | } 154 | dilation: 2 155 | } 156 | } 157 | layer { 158 | name: "conv1_scale3_full" 159 | type: "Convolution" 160 | bottom: "pool0_full" 161 | top: "conv1_scale3_full" 162 | param { 163 | name:
"conv1_scale3_w_full" 164 | lr_mult: 1 165 | decay_mult: 1 166 | } 167 | param { 168 | name: "conv1_scale3_b_full" 169 | lr_mult: 2 170 | decay_mult: 0 171 | } 172 | convolution_param { 173 | num_output: 32 174 | pad: 3 175 | kernel_size: 3 176 | stride: 1 177 | weight_filler { 178 | type: "xavier" 179 | } 180 | bias_filler { 181 | type: "constant" 182 | } 183 | dilation: 3 184 | } 185 | } 186 | layer { 187 | name: "bn1_scale1_full" 188 | type: "BatchNorm" 189 | bottom: "conv1_scale1_full" 190 | top: "bn1_scale1_full" 191 | param { 192 | name: "bn1_scale1_mean_full" 193 | lr_mult: 0 194 | decay_mult: 0 195 | } 196 | param { 197 | name: "bn1_scale1_var_full" 198 | lr_mult: 0 199 | decay_mult: 0 200 | } 201 | param { 202 | name: "bn1_scale1_bias_full" 203 | lr_mult: 0 204 | decay_mult: 0 205 | } 206 | batch_norm_param { 207 | use_global_stats: true 208 | } 209 | } 210 | layer { 211 | name: "bn1_scale2_full" 212 | type: "BatchNorm" 213 | bottom: "conv1_scale2_full" 214 | top: "bn1_scale2_full" 215 | param { 216 | name: "bn1_scale2_mean_full" 217 | lr_mult: 0 218 | decay_mult: 0 219 | } 220 | param { 221 | name: "bn1_scale2_var_full" 222 | lr_mult: 0 223 | decay_mult: 0 224 | } 225 | param { 226 | name: "bn1_scale2_bias_full" 227 | lr_mult: 0 228 | decay_mult: 0 229 | } 230 | batch_norm_param { 231 | use_global_stats: true 232 | } 233 | } 234 | layer { 235 | name: "bn1_scale3_full" 236 | type: "BatchNorm" 237 | bottom: "conv1_scale3_full" 238 | top: "bn1_scale3_full" 239 | param { 240 | name: "bn1_scale3_mean_full" 241 | lr_mult: 0 242 | decay_mult: 0 243 | } 244 | param { 245 | name: "bn1_scale3_var_full" 246 | lr_mult: 0 247 | decay_mult: 0 248 | } 249 | param { 250 | name: "bn1_scale3_bias_full" 251 | lr_mult: 0 252 | decay_mult: 0 253 | } 254 | batch_norm_param { 255 | use_global_stats: true 256 | } 257 | } 258 | layer { 259 | name: "bn1_full" 260 | type: "Concat" 261 | bottom: "bn1_scale1_full" 262 | bottom: "bn1_scale2_full" 263 | bottom: "bn1_scale3_full" 264 | top: "bn1_full" 265 | concat_param { 266 | axis: 1 267 | } 268 | } 269 | layer { 270 | name: "relu1_full" 271 | type: "ReLU" 272 | bottom: "bn1_full" 273 | top: "bn1_full" 274 | } 275 | layer { 276 | name: "pool1_full" 277 | type: "Pooling" 278 | bottom: "bn1_full" 279 | top: "pool1_full" 280 | pooling_param { 281 | pool: MAX 282 | kernel_size: 2 283 | stride: 2 284 | } 285 | } 286 | layer { 287 | name: "conv2_scale1_full" 288 | type: "Convolution" 289 | bottom: "pool1_full" 290 | top: "conv2_scale1_full" 291 | param { 292 | name: "conv2_scale1_w_full" 293 | lr_mult: 1 294 | decay_mult: 1 295 | } 296 | param { 297 | name: "conv2_scale1_b_full" 298 | lr_mult: 2 299 | decay_mult: 0 300 | } 301 | convolution_param { 302 | num_output: 32 303 | pad: 1 304 | kernel_size: 3 305 | stride: 1 306 | weight_filler { 307 | type: "xavier" 308 | } 309 | bias_filler { 310 | type: "constant" 311 | } 312 | dilation: 1 313 | } 314 | } 315 | layer { 316 | name: "conv2_scale2_full" 317 | type: "Convolution" 318 | bottom: "pool1_full" 319 | top: "conv2_scale2_full" 320 | param { 321 | name: "conv2_scale2_w_full" 322 | lr_mult: 1 323 | decay_mult: 1 324 | } 325 | param { 326 | name: "conv2_scale2_b_full" 327 | lr_mult: 2 328 | decay_mult: 0 329 | } 330 | convolution_param { 331 | num_output: 32 332 | pad: 2 333 | kernel_size: 3 334 | stride: 1 335 | weight_filler { 336 | type: "xavier" 337 | } 338 | bias_filler { 339 | type: "constant" 340 | } 341 | dilation: 2 342 | } 343 | } 344 | layer { 345 | name: "conv2_scale3_full" 346 | type: "Convolution" 347 
| bottom: "pool1_full" 348 | top: "conv2_scale3_full" 349 | param { 350 | name: "conv2_scale3_w_full" 351 | lr_mult: 1 352 | decay_mult: 1 353 | } 354 | param { 355 | name: "conv2_scale3_b_full" 356 | lr_mult: 2 357 | decay_mult: 0 358 | } 359 | convolution_param { 360 | num_output: 32 361 | pad: 3 362 | kernel_size: 3 363 | stride: 1 364 | weight_filler { 365 | type: "xavier" 366 | } 367 | bias_filler { 368 | type: "constant" 369 | } 370 | dilation: 3 371 | } 372 | } 373 | layer { 374 | name: "bn2_scale1_full" 375 | type: "BatchNorm" 376 | bottom: "conv2_scale1_full" 377 | top: "bn2_scale1_full" 378 | param { 379 | name: "bn2_scale1_mean_full" 380 | lr_mult: 0 381 | decay_mult: 0 382 | } 383 | param { 384 | name: "bn2_scale1_var_full" 385 | lr_mult: 0 386 | decay_mult: 0 387 | } 388 | param { 389 | name: "bn2_scale1_bias_full" 390 | lr_mult: 0 391 | decay_mult: 0 392 | } 393 | batch_norm_param { 394 | use_global_stats: true 395 | } 396 | } 397 | layer { 398 | name: "bn2_scale2_full" 399 | type: "BatchNorm" 400 | bottom: "conv2_scale2_full" 401 | top: "bn2_scale2_full" 402 | param { 403 | name: "bn2_scale2_mean_full" 404 | lr_mult: 0 405 | decay_mult: 0 406 | } 407 | param { 408 | name: "bn2_scale2_var_full" 409 | lr_mult: 0 410 | decay_mult: 0 411 | } 412 | param { 413 | name: "bn2_scale2_bias_full" 414 | lr_mult: 0 415 | decay_mult: 0 416 | } 417 | batch_norm_param { 418 | use_global_stats: true 419 | } 420 | } 421 | layer { 422 | name: "bn2_scale3_full" 423 | type: "BatchNorm" 424 | bottom: "conv2_scale3_full" 425 | top: "bn2_scale3_full" 426 | param { 427 | name: "bn2_scale3_mean_full" 428 | lr_mult: 0 429 | decay_mult: 0 430 | } 431 | param { 432 | name: "bn2_scale3_var_full" 433 | lr_mult: 0 434 | decay_mult: 0 435 | } 436 | param { 437 | name: "bn2_scale3_bias_full" 438 | lr_mult: 0 439 | decay_mult: 0 440 | } 441 | batch_norm_param { 442 | use_global_stats: true 443 | } 444 | } 445 | layer { 446 | name: "bn2_full" 447 | type: "Concat" 448 | bottom: "bn2_scale1_full" 449 | bottom: "bn2_scale2_full" 450 | bottom: "bn2_scale3_full" 451 | top: "bn2_full" 452 | concat_param { 453 | axis: 1 454 | } 455 | } 456 | layer { 457 | name: "relu2_full" 458 | type: "ReLU" 459 | bottom: "bn2_full" 460 | top: "bn2_full" 461 | } 462 | 463 | 464 | layer { 465 | name: "make_att_mask" 466 | type: "Convolution" 467 | bottom: "bn2_full" 468 | top: "make_att_mask" 469 | param { 470 | lr_mult: 1 471 | decay_mult: 1 472 | } 473 | param { 474 | lr_mult: 2 475 | decay_mult: 0 476 | } 477 | convolution_param { 478 | num_output: 1 479 | pad: 1 480 | kernel_size: 3 481 | stride: 1 482 | weight_filler { 483 | type: "xavier" 484 | } 485 | bias_filler { 486 | type: "constant" 487 | } 488 | dilation: 1 489 | } 490 | } 491 | 492 | layer { 493 | name: "att_sigmoid" 494 | type: "Sigmoid" 495 | bottom: "make_att_mask" 496 | top: "make_att_mask" 497 | } 498 | 499 | layer { 500 | name: "make_att_mask_inv" 501 | type: "Eltwise" 502 | bottom: "one_mask" 503 | bottom: "make_att_mask" 504 | top: "make_att_mask_inv" 505 | eltwise_param { 506 | operation: SUM 507 | coeff: 1 508 | coeff: -1 509 | } 510 | } 511 | ############### Seg Loss ##################### 512 | 513 | layer { 514 | name: "tile_iner" 515 | type: "Tile" 516 | bottom: "make_att_mask" 517 | top: "att_iner" 518 | tile_param { 519 | tiles: 96 520 | axis: 1 521 | } 522 | } 523 | 524 | layer { 525 | name: "bn2_att_iner" 526 | type: "Eltwise" 527 | bottom: "bn2_full" 528 | bottom: "att_iner" 529 | top: "bn2_att_iner" 530 | eltwise_param { 531 | operation: PROD 532 | } 
533 | } 534 | 535 | layer { 536 | name: "tile_exter" 537 | type: "Tile" 538 | bottom: "make_att_mask_inv" 539 | top: "att_exter" 540 | tile_param { 541 | tiles: 96 542 | axis: 1 543 | } 544 | } 545 | 546 | layer { 547 | name: "bn2_att_exter" 548 | type: "Eltwise" 549 | bottom: "bn2_full" 550 | bottom: "att_exter" 551 | top: "bn2_att_exter" 552 | eltwise_param { 553 | operation: PROD 554 | } 555 | } 556 | 557 | layer { 558 | name: "pool2_full" 559 | type: "Pooling" 560 | bottom: "bn2_full" 561 | top: "pool2_full" 562 | pooling_param { 563 | pool: MAX 564 | kernel_size: 2 565 | stride: 2 566 | } 567 | } 568 | 569 | layer { 570 | name: "conv3_scale1_full" 571 | type: "Convolution" 572 | bottom: "pool2_full" 573 | top: "conv3_scale1_full" 574 | param { 575 | name: "conv3_scale1_w_full" 576 | lr_mult: 1 577 | decay_mult: 1 578 | } 579 | param { 580 | name: "conv3_scale1_b_full" 581 | lr_mult: 2 582 | decay_mult: 0 583 | } 584 | convolution_param { 585 | num_output: 32 586 | pad: 1 587 | kernel_size: 3 588 | stride: 1 589 | weight_filler { 590 | type: "xavier" 591 | } 592 | bias_filler { 593 | type: "constant" 594 | } 595 | dilation: 1 596 | } 597 | } 598 | layer { 599 | name: "conv3_scale2_full" 600 | type: "Convolution" 601 | bottom: "pool2_full" 602 | top: "conv3_scale2_full" 603 | param { 604 | name: "conv3_scale2_w_full" 605 | lr_mult: 1 606 | decay_mult: 1 607 | } 608 | param { 609 | name: "conv3_scale2_b_full" 610 | lr_mult: 2 611 | decay_mult: 0 612 | } 613 | convolution_param { 614 | num_output: 32 615 | pad: 2 616 | kernel_size: 3 617 | stride: 1 618 | weight_filler { 619 | type: "xavier" 620 | } 621 | bias_filler { 622 | type: "constant" 623 | } 624 | dilation: 2 625 | } 626 | } 627 | layer { 628 | name: "conv3_scale3_full" 629 | type: "Convolution" 630 | bottom: "pool2_full" 631 | top: "conv3_scale3_full" 632 | param { 633 | name: "conv3_scale3_w_full" 634 | lr_mult: 1 635 | decay_mult: 1 636 | } 637 | param { 638 | name: "conv3_scale3_b_full" 639 | lr_mult: 2 640 | decay_mult: 0 641 | } 642 | convolution_param { 643 | num_output: 32 644 | pad: 3 645 | kernel_size: 3 646 | stride: 1 647 | weight_filler { 648 | type: "xavier" 649 | } 650 | bias_filler { 651 | type: "constant" 652 | } 653 | dilation: 3 654 | } 655 | } 656 | layer { 657 | name: "bn3_scale1_full" 658 | type: "BatchNorm" 659 | bottom: "conv3_scale1_full" 660 | top: "bn3_scale1_full" 661 | param { 662 | name: "bn3_scale1_mean_full" 663 | lr_mult: 0 664 | decay_mult: 0 665 | } 666 | param { 667 | name: "bn3_scale1_var_full" 668 | lr_mult: 0 669 | decay_mult: 0 670 | } 671 | param { 672 | name: "bn3_scale1_bias_full" 673 | lr_mult: 0 674 | decay_mult: 0 675 | } 676 | batch_norm_param { 677 | use_global_stats: true 678 | } 679 | } 680 | layer { 681 | name: "bn3_scale2_full" 682 | type: "BatchNorm" 683 | bottom: "conv3_scale2_full" 684 | top: "bn3_scale2_full" 685 | param { 686 | name: "bn3_scale2_mean_full" 687 | lr_mult: 0 688 | decay_mult: 0 689 | } 690 | param { 691 | name: "bn3_scale2_var_full" 692 | lr_mult: 0 693 | decay_mult: 0 694 | } 695 | param { 696 | name: "bn3_scale2_bias_full" 697 | lr_mult: 0 698 | decay_mult: 0 699 | } 700 | batch_norm_param { 701 | use_global_stats: true 702 | } 703 | } 704 | layer { 705 | name: "bn3_scale3_full" 706 | type: "BatchNorm" 707 | bottom: "conv3_scale3_full" 708 | top: "bn3_scale3_full" 709 | param { 710 | name: "bn3_scale3_mean_full" 711 | lr_mult: 0 712 | decay_mult: 0 713 | } 714 | param { 715 | name: "bn3_scale3_var_full" 716 | lr_mult: 0 717 | decay_mult: 0 718 | } 719 | 
param { 720 | name: "bn3_scale3_bias_full" 721 | lr_mult: 0 722 | decay_mult: 0 723 | } 724 | batch_norm_param { 725 | use_global_stats: true 726 | } 727 | } 728 | layer { 729 | name: "bn3_full" 730 | type: "Concat" 731 | bottom: "bn3_scale1_full" 732 | bottom: "bn3_scale2_full" 733 | bottom: "bn3_scale3_full" 734 | top: "bn3_full" 735 | concat_param { 736 | axis: 1 737 | } 738 | } 739 | layer { 740 | name: "relu3_full" 741 | type: "ReLU" 742 | bottom: "bn3_full" 743 | top: "bn3_full" 744 | } 745 | layer { 746 | name: "pool3_full" 747 | type: "Pooling" 748 | bottom: "bn3_full" 749 | top: "pool3_full" 750 | pooling_param { 751 | pool: MAX 752 | kernel_size: 2 753 | stride: 2 754 | } 755 | } 756 | layer { 757 | name: "conv4_scale1_full" 758 | type: "Convolution" 759 | bottom: "pool3_full" 760 | top: "conv4_scale1_full" 761 | param { 762 | name: "conv4_scale1_w_full" 763 | lr_mult: 1 764 | decay_mult: 1 765 | } 766 | param { 767 | name: "conv4_scale1_b_full" 768 | lr_mult: 2 769 | decay_mult: 0 770 | } 771 | convolution_param { 772 | num_output: 32 773 | pad: 1 774 | kernel_size: 3 775 | stride: 1 776 | weight_filler { 777 | type: "xavier" 778 | } 779 | bias_filler { 780 | type: "constant" 781 | } 782 | dilation: 1 783 | } 784 | } 785 | layer { 786 | name: "conv4_scale2_full" 787 | type: "Convolution" 788 | bottom: "pool3_full" 789 | top: "conv4_scale2_full" 790 | param { 791 | name: "conv4_scale2_w_full" 792 | lr_mult: 1 793 | decay_mult: 1 794 | } 795 | param { 796 | name: "conv4_scale2_b_full" 797 | lr_mult: 2 798 | decay_mult: 0 799 | } 800 | convolution_param { 801 | num_output: 32 802 | pad: 2 803 | kernel_size: 3 804 | stride: 1 805 | weight_filler { 806 | type: "xavier" 807 | } 808 | bias_filler { 809 | type: "constant" 810 | } 811 | dilation: 2 812 | } 813 | } 814 | layer { 815 | name: "conv4_scale3_full" 816 | type: "Convolution" 817 | bottom: "pool3_full" 818 | top: "conv4_scale3_full" 819 | param { 820 | name: "conv4_scale3_w_full" 821 | lr_mult: 1 822 | decay_mult: 1 823 | } 824 | param { 825 | name: "conv4_scale3_b_full" 826 | lr_mult: 2 827 | decay_mult: 0 828 | } 829 | convolution_param { 830 | num_output: 32 831 | pad: 3 832 | kernel_size: 3 833 | stride: 1 834 | weight_filler { 835 | type: "xavier" 836 | } 837 | bias_filler { 838 | type: "constant" 839 | } 840 | dilation: 3 841 | } 842 | } 843 | layer { 844 | name: "bn4_scale1_full" 845 | type: "BatchNorm" 846 | bottom: "conv4_scale1_full" 847 | top: "bn4_scale1_full" 848 | param { 849 | name: "bn4_scale1_mean_full" 850 | lr_mult: 0 851 | decay_mult: 0 852 | } 853 | param { 854 | name: "bn4_scale1_var_full" 855 | lr_mult: 0 856 | decay_mult: 0 857 | } 858 | param { 859 | name: "bn4_scale1_bias_full" 860 | lr_mult: 0 861 | decay_mult: 0 862 | } 863 | batch_norm_param { 864 | use_global_stats: true 865 | } 866 | } 867 | layer { 868 | name: "bn4_scale2_full" 869 | type: "BatchNorm" 870 | bottom: "conv4_scale2_full" 871 | top: "bn4_scale2_full" 872 | param { 873 | name: "bn4_scale2_mean_full" 874 | lr_mult: 0 875 | decay_mult: 0 876 | } 877 | param { 878 | name: "bn4_scale2_var_full" 879 | lr_mult: 0 880 | decay_mult: 0 881 | } 882 | param { 883 | name: "bn4_scale2_bias_full" 884 | lr_mult: 0 885 | decay_mult: 0 886 | } 887 | batch_norm_param { 888 | use_global_stats: true 889 | } 890 | } 891 | layer { 892 | name: "bn4_scale3_full" 893 | type: "BatchNorm" 894 | bottom: "conv4_scale3_full" 895 | top: "bn4_scale3_full" 896 | param { 897 | name: "bn4_scale3_mean_full" 898 | lr_mult: 0 899 | decay_mult: 0 900 | } 901 | param { 902 | 
name: "bn4_scale3_var_full" 903 | lr_mult: 0 904 | decay_mult: 0 905 | } 906 | param { 907 | name: "bn4_scale3_bias_full" 908 | lr_mult: 0 909 | decay_mult: 0 910 | } 911 | batch_norm_param { 912 | use_global_stats: true 913 | } 914 | } 915 | layer { 916 | name: "bn4_full" 917 | type: "Concat" 918 | bottom: "bn4_scale1_full" 919 | bottom: "bn4_scale2_full" 920 | bottom: "bn4_scale3_full" 921 | top: "bn4_full" 922 | concat_param { 923 | axis: 1 924 | } 925 | } 926 | layer { 927 | name: "relu4_full" 928 | type: "ReLU" 929 | bottom: "bn4_full" 930 | top: "bn4_full" 931 | } 932 | layer { 933 | name: "pool4_full" 934 | type: "Pooling" 935 | bottom: "bn4_full" 936 | top: "pool4_full" 937 | pooling_param { 938 | pool: MAX 939 | kernel_size: 2 940 | stride: 2 941 | } 942 | } 943 | layer { 944 | name: "fc1_full" 945 | type: "InnerProduct" 946 | bottom: "pool4_full" 947 | top: "fc1_full" 948 | param { 949 | name: "fc1_w_full" 950 | lr_mult: 1 951 | decay_mult: 1 952 | } 953 | param { 954 | name: "fc1_b_full" 955 | lr_mult: 2 956 | decay_mult: 0 957 | } 958 | inner_product_param { 959 | num_output: 128 960 | weight_filler { 961 | type: "xavier" 962 | } 963 | bias_filler { 964 | type: "constant" 965 | } 966 | } 967 | } 968 | layer { 969 | name: "fc1_full_drop" 970 | type: "Dropout" 971 | bottom: "fc1_full" 972 | top: "fc1_full" 973 | dropout_param { 974 | dropout_ratio: 0.2 975 | } 976 | } 977 | 978 | layer { 979 | name: "pool2_iner" 980 | type: "Pooling" 981 | bottom: "bn2_att_iner" 982 | top: "pool2_iner" 983 | pooling_param { 984 | pool: MAX 985 | kernel_size: 2 986 | stride: 2 987 | } 988 | } 989 | 990 | layer { 991 | name: "conv3_scale1_iner" 992 | type: "Convolution" 993 | bottom: "pool2_iner" 994 | top: "conv3_scale1_iner" 995 | param { 996 | name: "conv3_scale1_w_iner" 997 | lr_mult: 1 998 | decay_mult: 1 999 | } 1000 | param { 1001 | name: "conv3_scale1_b_iner" 1002 | lr_mult: 2 1003 | decay_mult: 0 1004 | } 1005 | convolution_param { 1006 | num_output: 32 1007 | pad: 1 1008 | kernel_size: 3 1009 | stride: 1 1010 | weight_filler { 1011 | type: "xavier" 1012 | } 1013 | bias_filler { 1014 | type: "constant" 1015 | } 1016 | dilation: 1 1017 | } 1018 | } 1019 | layer { 1020 | name: "conv3_scale2_iner" 1021 | type: "Convolution" 1022 | bottom: "pool2_iner" 1023 | top: "conv3_scale2_iner" 1024 | param { 1025 | name: "conv3_scale2_w_iner" 1026 | lr_mult: 1 1027 | decay_mult: 1 1028 | } 1029 | param { 1030 | name: "conv3_scale2_b_iner" 1031 | lr_mult: 2 1032 | decay_mult: 0 1033 | } 1034 | convolution_param { 1035 | num_output: 32 1036 | pad: 2 1037 | kernel_size: 3 1038 | stride: 1 1039 | weight_filler { 1040 | type: "xavier" 1041 | } 1042 | bias_filler { 1043 | type: "constant" 1044 | } 1045 | dilation: 2 1046 | } 1047 | } 1048 | layer { 1049 | name: "conv3_scale3_iner" 1050 | type: "Convolution" 1051 | bottom: "pool2_iner" 1052 | top: "conv3_scale3_iner" 1053 | param { 1054 | name: "conv3_scale3_w_iner" 1055 | lr_mult: 1 1056 | decay_mult: 1 1057 | } 1058 | param { 1059 | name: "conv3_scale3_b_iner" 1060 | lr_mult: 2 1061 | decay_mult: 0 1062 | } 1063 | convolution_param { 1064 | num_output: 32 1065 | pad: 3 1066 | kernel_size: 3 1067 | stride: 1 1068 | weight_filler { 1069 | type: "xavier" 1070 | } 1071 | bias_filler { 1072 | type: "constant" 1073 | } 1074 | dilation: 3 1075 | } 1076 | } 1077 | layer { 1078 | name: "bn3_scale1_iner" 1079 | type: "BatchNorm" 1080 | bottom: "conv3_scale1_iner" 1081 | top: "bn3_scale1_iner" 1082 | param { 1083 | name: "bn3_scale1_mean_iner" 1084 | lr_mult: 0 1085 | 
decay_mult: 0 1086 | } 1087 | param { 1088 | name: "bn3_scale1_var_iner" 1089 | lr_mult: 0 1090 | decay_mult: 0 1091 | } 1092 | param { 1093 | name: "bn3_scale1_bias_iner" 1094 | lr_mult: 0 1095 | decay_mult: 0 1096 | } 1097 | batch_norm_param { 1098 | use_global_stats: true 1099 | } 1100 | } 1101 | layer { 1102 | name: "bn3_scale2_iner" 1103 | type: "BatchNorm" 1104 | bottom: "conv3_scale2_iner" 1105 | top: "bn3_scale2_iner" 1106 | param { 1107 | name: "bn3_scale2_mean_iner" 1108 | lr_mult: 0 1109 | decay_mult: 0 1110 | } 1111 | param { 1112 | name: "bn3_scale2_var_iner" 1113 | lr_mult: 0 1114 | decay_mult: 0 1115 | } 1116 | param { 1117 | name: "bn3_scale2_bias_iner" 1118 | lr_mult: 0 1119 | decay_mult: 0 1120 | } 1121 | batch_norm_param { 1122 | use_global_stats: true 1123 | } 1124 | } 1125 | layer { 1126 | name: "bn3_scale3_iner" 1127 | type: "BatchNorm" 1128 | bottom: "conv3_scale3_iner" 1129 | top: "bn3_scale3_iner" 1130 | param { 1131 | name: "bn3_scale3_mean_iner" 1132 | lr_mult: 0 1133 | decay_mult: 0 1134 | } 1135 | param { 1136 | name: "bn3_scale3_var_iner" 1137 | lr_mult: 0 1138 | decay_mult: 0 1139 | } 1140 | param { 1141 | name: "bn3_scale3_bias_iner" 1142 | lr_mult: 0 1143 | decay_mult: 0 1144 | } 1145 | batch_norm_param { 1146 | use_global_stats: true 1147 | } 1148 | } 1149 | layer { 1150 | name: "bn3_iner" 1151 | type: "Concat" 1152 | bottom: "bn3_scale1_iner" 1153 | bottom: "bn3_scale2_iner" 1154 | bottom: "bn3_scale3_iner" 1155 | top: "bn3_iner" 1156 | concat_param { 1157 | axis: 1 1158 | } 1159 | } 1160 | layer { 1161 | name: "relu3_iner" 1162 | type: "ReLU" 1163 | bottom: "bn3_iner" 1164 | top: "bn3_iner" 1165 | } 1166 | layer { 1167 | name: "pool3_iner" 1168 | type: "Pooling" 1169 | bottom: "bn3_iner" 1170 | top: "pool3_iner" 1171 | pooling_param { 1172 | pool: MAX 1173 | kernel_size: 2 1174 | stride: 2 1175 | } 1176 | } 1177 | layer { 1178 | name: "conv4_scale1_iner" 1179 | type: "Convolution" 1180 | bottom: "pool3_iner" 1181 | top: "conv4_scale1_iner" 1182 | param { 1183 | name: "conv4_scale1_w_iner" 1184 | lr_mult: 1 1185 | decay_mult: 1 1186 | } 1187 | param { 1188 | name: "conv4_scale1_b_iner" 1189 | lr_mult: 2 1190 | decay_mult: 0 1191 | } 1192 | convolution_param { 1193 | num_output: 32 1194 | pad: 1 1195 | kernel_size: 3 1196 | stride: 1 1197 | weight_filler { 1198 | type: "xavier" 1199 | } 1200 | bias_filler { 1201 | type: "constant" 1202 | } 1203 | dilation: 1 1204 | } 1205 | } 1206 | layer { 1207 | name: "conv4_scale2_iner" 1208 | type: "Convolution" 1209 | bottom: "pool3_iner" 1210 | top: "conv4_scale2_iner" 1211 | param { 1212 | name: "conv4_scale2_w_iner" 1213 | lr_mult: 1 1214 | decay_mult: 1 1215 | } 1216 | param { 1217 | name: "conv4_scale2_b_iner" 1218 | lr_mult: 2 1219 | decay_mult: 0 1220 | } 1221 | convolution_param { 1222 | num_output: 32 1223 | pad: 2 1224 | kernel_size: 3 1225 | stride: 1 1226 | weight_filler { 1227 | type: "xavier" 1228 | } 1229 | bias_filler { 1230 | type: "constant" 1231 | } 1232 | dilation: 2 1233 | } 1234 | } 1235 | layer { 1236 | name: "conv4_scale3_iner" 1237 | type: "Convolution" 1238 | bottom: "pool3_iner" 1239 | top: "conv4_scale3_iner" 1240 | param { 1241 | name: "conv4_scale3_w_iner" 1242 | lr_mult: 1 1243 | decay_mult: 1 1244 | } 1245 | param { 1246 | name: "conv4_scale3_b_iner" 1247 | lr_mult: 2 1248 | decay_mult: 0 1249 | } 1250 | convolution_param { 1251 | num_output: 32 1252 | pad: 3 1253 | kernel_size: 3 1254 | stride: 1 1255 | weight_filler { 1256 | type: "xavier" 1257 | } 1258 | bias_filler { 1259 | type: 
"constant" 1260 | } 1261 | dilation: 3 1262 | } 1263 | } 1264 | layer { 1265 | name: "bn4_scale1_iner" 1266 | type: "BatchNorm" 1267 | bottom: "conv4_scale1_iner" 1268 | top: "bn4_scale1_iner" 1269 | param { 1270 | name: "bn4_scale1_mean_iner" 1271 | lr_mult: 0 1272 | decay_mult: 0 1273 | } 1274 | param { 1275 | name: "bn4_scale1_var_iner" 1276 | lr_mult: 0 1277 | decay_mult: 0 1278 | } 1279 | param { 1280 | name: "bn4_scale1_bias_iner" 1281 | lr_mult: 0 1282 | decay_mult: 0 1283 | } 1284 | batch_norm_param { 1285 | use_global_stats: true 1286 | } 1287 | } 1288 | layer { 1289 | name: "bn4_scale2_iner" 1290 | type: "BatchNorm" 1291 | bottom: "conv4_scale2_iner" 1292 | top: "bn4_scale2_iner" 1293 | param { 1294 | name: "bn4_scale2_mean_iner" 1295 | lr_mult: 0 1296 | decay_mult: 0 1297 | } 1298 | param { 1299 | name: "bn4_scale2_var_iner" 1300 | lr_mult: 0 1301 | decay_mult: 0 1302 | } 1303 | param { 1304 | name: "bn4_scale2_bias_iner" 1305 | lr_mult: 0 1306 | decay_mult: 0 1307 | } 1308 | batch_norm_param { 1309 | use_global_stats: true 1310 | } 1311 | } 1312 | layer { 1313 | name: "bn4_scale3_iner" 1314 | type: "BatchNorm" 1315 | bottom: "conv4_scale3_iner" 1316 | top: "bn4_scale3_iner" 1317 | param { 1318 | name: "bn4_scale3_mean_iner" 1319 | lr_mult: 0 1320 | decay_mult: 0 1321 | } 1322 | param { 1323 | name: "bn4_scale3_var_iner" 1324 | lr_mult: 0 1325 | decay_mult: 0 1326 | } 1327 | param { 1328 | name: "bn4_scale3_bias_iner" 1329 | lr_mult: 0 1330 | decay_mult: 0 1331 | } 1332 | batch_norm_param { 1333 | use_global_stats: true 1334 | } 1335 | } 1336 | layer { 1337 | name: "bn4_iner" 1338 | type: "Concat" 1339 | bottom: "bn4_scale1_iner" 1340 | bottom: "bn4_scale2_iner" 1341 | bottom: "bn4_scale3_iner" 1342 | top: "bn4_iner" 1343 | concat_param { 1344 | axis: 1 1345 | } 1346 | } 1347 | layer { 1348 | name: "relu4_iner" 1349 | type: "ReLU" 1350 | bottom: "bn4_iner" 1351 | top: "bn4_iner" 1352 | } 1353 | layer { 1354 | name: "pool4_iner" 1355 | type: "Pooling" 1356 | bottom: "bn4_iner" 1357 | top: "pool4_iner" 1358 | pooling_param { 1359 | pool: MAX 1360 | kernel_size: 2 1361 | stride: 2 1362 | } 1363 | } 1364 | layer { 1365 | name: "fc1_iner" 1366 | type: "InnerProduct" 1367 | bottom: "pool4_iner" 1368 | top: "fc1_iner" 1369 | param { 1370 | name: "fc1_w_iner" 1371 | lr_mult: 1 1372 | decay_mult: 1 1373 | } 1374 | param { 1375 | name: "fc1_b_iner" 1376 | lr_mult: 2 1377 | decay_mult: 0 1378 | } 1379 | inner_product_param { 1380 | num_output: 128 1381 | weight_filler { 1382 | type: "xavier" 1383 | } 1384 | bias_filler { 1385 | type: "constant" 1386 | } 1387 | } 1388 | } 1389 | layer { 1390 | name: "fc1_iner_drop" 1391 | type: "Dropout" 1392 | bottom: "fc1_iner" 1393 | top: "fc1_iner" 1394 | dropout_param { 1395 | dropout_ratio: 0.2 1396 | } 1397 | } 1398 | 1399 | layer { 1400 | name: "pool2_exter" 1401 | type: "Pooling" 1402 | bottom: "bn2_att_exter" 1403 | top: "pool2_exter" 1404 | pooling_param { 1405 | pool: MAX 1406 | kernel_size: 2 1407 | stride: 2 1408 | } 1409 | } 1410 | 1411 | layer { 1412 | name: "conv3_scale1_exter" 1413 | type: "Convolution" 1414 | bottom: "pool2_exter" 1415 | top: "conv3_scale1_exter" 1416 | param { 1417 | name: "conv3_scale1_w_exter" 1418 | lr_mult: 1 1419 | decay_mult: 1 1420 | } 1421 | param { 1422 | name: "conv3_scale1_b_exter" 1423 | lr_mult: 2 1424 | decay_mult: 0 1425 | } 1426 | convolution_param { 1427 | num_output: 32 1428 | pad: 1 1429 | kernel_size: 3 1430 | stride: 1 1431 | weight_filler { 1432 | type: "xavier" 1433 | } 1434 | bias_filler { 1435 | 
type: "constant" 1436 | } 1437 | dilation: 1 1438 | } 1439 | } 1440 | layer { 1441 | name: "conv3_scale2_exter" 1442 | type: "Convolution" 1443 | bottom: "pool2_exter" 1444 | top: "conv3_scale2_exter" 1445 | param { 1446 | name: "conv3_scale2_w_exter" 1447 | lr_mult: 1 1448 | decay_mult: 1 1449 | } 1450 | param { 1451 | name: "conv3_scale2_b_exter" 1452 | lr_mult: 2 1453 | decay_mult: 0 1454 | } 1455 | convolution_param { 1456 | num_output: 32 1457 | pad: 2 1458 | kernel_size: 3 1459 | stride: 1 1460 | weight_filler { 1461 | type: "xavier" 1462 | } 1463 | bias_filler { 1464 | type: "constant" 1465 | } 1466 | dilation: 2 1467 | } 1468 | } 1469 | layer { 1470 | name: "conv3_scale3_exter" 1471 | type: "Convolution" 1472 | bottom: "pool2_exter" 1473 | top: "conv3_scale3_exter" 1474 | param { 1475 | name: "conv3_scale3_w_exter" 1476 | lr_mult: 1 1477 | decay_mult: 1 1478 | } 1479 | param { 1480 | name: "conv3_scale3_b_exter" 1481 | lr_mult: 2 1482 | decay_mult: 0 1483 | } 1484 | convolution_param { 1485 | num_output: 32 1486 | pad: 3 1487 | kernel_size: 3 1488 | stride: 1 1489 | weight_filler { 1490 | type: "xavier" 1491 | } 1492 | bias_filler { 1493 | type: "constant" 1494 | } 1495 | dilation: 3 1496 | } 1497 | } 1498 | layer { 1499 | name: "bn3_scale1_exter" 1500 | type: "BatchNorm" 1501 | bottom: "conv3_scale1_exter" 1502 | top: "bn3_scale1_exter" 1503 | param { 1504 | name: "bn3_scale1_mean_exter" 1505 | lr_mult: 0 1506 | decay_mult: 0 1507 | } 1508 | param { 1509 | name: "bn3_scale1_var_exter" 1510 | lr_mult: 0 1511 | decay_mult: 0 1512 | } 1513 | param { 1514 | name: "bn3_scale1_bias_exter" 1515 | lr_mult: 0 1516 | decay_mult: 0 1517 | } 1518 | batch_norm_param { 1519 | use_global_stats: true 1520 | } 1521 | } 1522 | layer { 1523 | name: "bn3_scale2_exter" 1524 | type: "BatchNorm" 1525 | bottom: "conv3_scale2_exter" 1526 | top: "bn3_scale2_exter" 1527 | param { 1528 | name: "bn3_scale2_mean_exter" 1529 | lr_mult: 0 1530 | decay_mult: 0 1531 | } 1532 | param { 1533 | name: "bn3_scale2_var_exter" 1534 | lr_mult: 0 1535 | decay_mult: 0 1536 | } 1537 | param { 1538 | name: "bn3_scale2_bias_exter" 1539 | lr_mult: 0 1540 | decay_mult: 0 1541 | } 1542 | batch_norm_param { 1543 | use_global_stats: true 1544 | } 1545 | } 1546 | layer { 1547 | name: "bn3_scale3_exter" 1548 | type: "BatchNorm" 1549 | bottom: "conv3_scale3_exter" 1550 | top: "bn3_scale3_exter" 1551 | param { 1552 | name: "bn3_scale3_mean_exter" 1553 | lr_mult: 0 1554 | decay_mult: 0 1555 | } 1556 | param { 1557 | name: "bn3_scale3_var_exter" 1558 | lr_mult: 0 1559 | decay_mult: 0 1560 | } 1561 | param { 1562 | name: "bn3_scale3_bias_exter" 1563 | lr_mult: 0 1564 | decay_mult: 0 1565 | } 1566 | batch_norm_param { 1567 | use_global_stats: true 1568 | } 1569 | } 1570 | layer { 1571 | name: "bn3_exter" 1572 | type: "Concat" 1573 | bottom: "bn3_scale1_exter" 1574 | bottom: "bn3_scale2_exter" 1575 | bottom: "bn3_scale3_exter" 1576 | top: "bn3_exter" 1577 | concat_param { 1578 | axis: 1 1579 | } 1580 | } 1581 | layer { 1582 | name: "relu3_exter" 1583 | type: "ReLU" 1584 | bottom: "bn3_exter" 1585 | top: "bn3_exter" 1586 | } 1587 | layer { 1588 | name: "pool3_exter" 1589 | type: "Pooling" 1590 | bottom: "bn3_exter" 1591 | top: "pool3_exter" 1592 | pooling_param { 1593 | pool: MAX 1594 | kernel_size: 2 1595 | stride: 2 1596 | } 1597 | } 1598 | layer { 1599 | name: "conv4_scale1_exter" 1600 | type: "Convolution" 1601 | bottom: "pool3_exter" 1602 | top: "conv4_scale1_exter" 1603 | param { 1604 | name: "conv4_scale1_w_exter" 1605 | lr_mult: 1 
1606 | decay_mult: 1 1607 | } 1608 | param { 1609 | name: "conv4_scale1_b_exter" 1610 | lr_mult: 2 1611 | decay_mult: 0 1612 | } 1613 | convolution_param { 1614 | num_output: 32 1615 | pad: 1 1616 | kernel_size: 3 1617 | stride: 1 1618 | weight_filler { 1619 | type: "xavier" 1620 | } 1621 | bias_filler { 1622 | type: "constant" 1623 | } 1624 | dilation: 1 1625 | } 1626 | } 1627 | layer { 1628 | name: "conv4_scale2_exter" 1629 | type: "Convolution" 1630 | bottom: "pool3_exter" 1631 | top: "conv4_scale2_exter" 1632 | param { 1633 | name: "conv4_scale2_w_exter" 1634 | lr_mult: 1 1635 | decay_mult: 1 1636 | } 1637 | param { 1638 | name: "conv4_scale2_b_exter" 1639 | lr_mult: 2 1640 | decay_mult: 0 1641 | } 1642 | convolution_param { 1643 | num_output: 32 1644 | pad: 2 1645 | kernel_size: 3 1646 | stride: 1 1647 | weight_filler { 1648 | type: "xavier" 1649 | } 1650 | bias_filler { 1651 | type: "constant" 1652 | } 1653 | dilation: 2 1654 | } 1655 | } 1656 | layer { 1657 | name: "conv4_scale3_exter" 1658 | type: "Convolution" 1659 | bottom: "pool3_exter" 1660 | top: "conv4_scale3_exter" 1661 | param { 1662 | name: "conv4_scale3_w_exter" 1663 | lr_mult: 1 1664 | decay_mult: 1 1665 | } 1666 | param { 1667 | name: "conv4_scale3_b_exter" 1668 | lr_mult: 2 1669 | decay_mult: 0 1670 | } 1671 | convolution_param { 1672 | num_output: 32 1673 | pad: 3 1674 | kernel_size: 3 1675 | stride: 1 1676 | weight_filler { 1677 | type: "xavier" 1678 | } 1679 | bias_filler { 1680 | type: "constant" 1681 | } 1682 | dilation: 3 1683 | } 1684 | } 1685 | layer { 1686 | name: "bn4_scale1_exter" 1687 | type: "BatchNorm" 1688 | bottom: "conv4_scale1_exter" 1689 | top: "bn4_scale1_exter" 1690 | param { 1691 | name: "bn4_scale1_mean_exter" 1692 | lr_mult: 0 1693 | decay_mult: 0 1694 | } 1695 | param { 1696 | name: "bn4_scale1_var_exter" 1697 | lr_mult: 0 1698 | decay_mult: 0 1699 | } 1700 | param { 1701 | name: "bn4_scale1_bias_exter" 1702 | lr_mult: 0 1703 | decay_mult: 0 1704 | } 1705 | batch_norm_param { 1706 | use_global_stats: true 1707 | } 1708 | } 1709 | layer { 1710 | name: "bn4_scale2_exter" 1711 | type: "BatchNorm" 1712 | bottom: "conv4_scale2_exter" 1713 | top: "bn4_scale2_exter" 1714 | param { 1715 | name: "bn4_scale2_mean_exter" 1716 | lr_mult: 0 1717 | decay_mult: 0 1718 | } 1719 | param { 1720 | name: "bn4_scale2_var_exter" 1721 | lr_mult: 0 1722 | decay_mult: 0 1723 | } 1724 | param { 1725 | name: "bn4_scale2_bias_exter" 1726 | lr_mult: 0 1727 | decay_mult: 0 1728 | } 1729 | batch_norm_param { 1730 | use_global_stats: true 1731 | } 1732 | } 1733 | layer { 1734 | name: "bn4_scale3_exter" 1735 | type: "BatchNorm" 1736 | bottom: "conv4_scale3_exter" 1737 | top: "bn4_scale3_exter" 1738 | param { 1739 | name: "bn4_scale3_mean_exter" 1740 | lr_mult: 0 1741 | decay_mult: 0 1742 | } 1743 | param { 1744 | name: "bn4_scale3_var_exter" 1745 | lr_mult: 0 1746 | decay_mult: 0 1747 | } 1748 | param { 1749 | name: "bn4_scale3_bias_exter" 1750 | lr_mult: 0 1751 | decay_mult: 0 1752 | } 1753 | batch_norm_param { 1754 | use_global_stats: true 1755 | } 1756 | } 1757 | layer { 1758 | name: "bn4_exter" 1759 | type: "Concat" 1760 | bottom: "bn4_scale1_exter" 1761 | bottom: "bn4_scale2_exter" 1762 | bottom: "bn4_scale3_exter" 1763 | top: "bn4_exter" 1764 | concat_param { 1765 | axis: 1 1766 | } 1767 | } 1768 | layer { 1769 | name: "relu4_exter" 1770 | type: "ReLU" 1771 | bottom: "bn4_exter" 1772 | top: "bn4_exter" 1773 | } 1774 | layer { 1775 | name: "pool4_exter" 1776 | type: "Pooling" 1777 | bottom: "bn4_exter" 1778 | top: 
"pool4_exter" 1779 | pooling_param { 1780 | pool: MAX 1781 | kernel_size: 2 1782 | stride: 2 1783 | } 1784 | } 1785 | layer { 1786 | name: "fc1_exter" 1787 | type: "InnerProduct" 1788 | bottom: "pool4_exter" 1789 | top: "fc1_exter" 1790 | param { 1791 | name: "fc1_w_exter" 1792 | lr_mult: 1 1793 | decay_mult: 1 1794 | } 1795 | param { 1796 | name: "fc1_b_exter" 1797 | lr_mult: 2 1798 | decay_mult: 0 1799 | } 1800 | inner_product_param { 1801 | num_output: 128 1802 | weight_filler { 1803 | type: "xavier" 1804 | } 1805 | bias_filler { 1806 | type: "constant" 1807 | } 1808 | } 1809 | } 1810 | layer { 1811 | name: "fc1_exter_drop" 1812 | type: "Dropout" 1813 | bottom: "fc1_exter" 1814 | top: "fc1_exter" 1815 | dropout_param { 1816 | dropout_ratio: 0.2 1817 | } 1818 | } -------------------------------------------------------------------------------- /evaluation/extract_feature_cuhk.py: -------------------------------------------------------------------------------- 1 | """ 2 | Extracting features with caffe models. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | import caffe 15 | import numpy as n 16 | import cv2 17 | import scipy.io as spio 18 | import os 19 | 20 | def extract_feature(net,image_path, gt_path): 21 | # load image 22 | oim = cv2.imread(image_path) 23 | 24 | # resize image into caffe size 25 | inputImage = cv2.resize(oim, (64, 160)) 26 | inputImage = n.array(inputImage, dtype=n.float32) 27 | 28 | # substract mean 29 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 30 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 31 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 32 | 33 | # permute dimensions 34 | inputImage = inputImage.transpose([2, 0, 1]) 35 | inputImage = inputImage/256.0 36 | one_mask_ = n.zeros((1,40,16),dtype = n.float32) 37 | one_mask_ = one_mask_ + 1.0 38 | 39 | # mask 40 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 41 | inputmask = n.array(cv2.resize(mask_im, (64, 160)), dtype=n.float32) 42 | inputmask = inputmask-127.5 43 | inputmask = inputmask/255.0 44 | inputmask = inputmask[n.newaxis, ...] 45 | 46 | #caffe forward 47 | net.blobs['data'].reshape(1,*inputImage.shape) 48 | net.blobs['one_mask'].reshape(1,*one_mask_.shape) 49 | net.blobs['mask'].reshape(1,*inputmask.shape) 50 | net.blobs['data'].data[...] = inputImage 51 | net.blobs['one_mask'].data[...] = one_mask_ 52 | net.blobs['mask'].data[...] 
= inputmask 53 | net.forward() 54 | 55 | #caffe output 56 | feature = n.squeeze(net.blobs['fc1_full'].data) 57 | return feature 58 | 59 | if __name__ == '__main__': 60 | pass 61 | prefix = 'labeled' #'labeled' or 'detected' 62 | prefix_2 = None #None or 'siamese' 63 | gpu_id = 0 64 | if prefix_2 is None: 65 | model_data = '../experiments/cuhk03-'+prefix+'/mgcam_iter_75000.caffemodel' 66 | else: 67 | model_data = '../experiments/cuhk03-'+prefix+'/mgcam_'+prefix_2+'_iter_20000.caffemodel' 68 | fea_dims = 128 69 | model_config = './deploy_mgcam.prototxt' 70 | caffe.set_mode_gpu() 71 | caffe.set_device(gpu_id) 72 | net = caffe.Net(model_config, model_data, caffe.TEST) 73 | image_path = '../data/cuhk03/cuhk03_'+prefix 74 | mask_path = '../data/cuhk03/cuhk03_'+ prefix + '_seg' 75 | # list images 76 | image_list = n.sort(os.listdir(image_path)) 77 | feature_all = n.zeros((fea_dims,len(image_list)),n.single) 78 | now = 0 79 | for item in image_list: 80 | if item.lower().endswith('.png'): 81 | this_image_path = os.path.join(image_path,item) 82 | this_mask_path = os.path.join(mask_path,item) 83 | this_fea = extract_feature(net,this_image_path,this_mask_path) 84 | feature_all[:,now] = this_fea[:] 85 | now +=1 86 | print '---->%04d of %d with %s is done!'%(now,len(image_list),item) 87 | if prefix_2 is None: 88 | spio.savemat('cuhk03-fea-' + prefix, {'feat': feature_all}) 89 | else: 90 | spio.savemat('cuhk03-fea-' + prefix + '-' + prefix_2, {'feat': feature_all}) 91 | -------------------------------------------------------------------------------- /evaluation/extract_feature_market.py: -------------------------------------------------------------------------------- 1 | """ 2 | Extracting features with caffe models. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | import caffe 15 | import numpy as n 16 | import cv2 17 | import scipy.io as spio 18 | import os 19 | 20 | def extract_feature(net,image_path, gt_path): 21 | # load image 22 | oim = cv2.imread(image_path) 23 | 24 | # resize image into caffe size 25 | inputImage = cv2.resize(oim, (64, 160)) 26 | inputImage = n.array(inputImage, dtype=n.float32) 27 | 28 | # subtract mean 29 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 30 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 31 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 32 | 33 | # permute dimensions 34 | inputImage = inputImage.transpose([2, 0, 1]) 35 | inputImage = inputImage/256.0 36 | one_mask_ = n.zeros((1,40,16),dtype = n.float32) 37 | one_mask_ = one_mask_ + 1.0 38 | 39 | # mask 40 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 41 | inputmask = n.array(cv2.resize(mask_im, (64, 160)), dtype=n.float32) 42 | inputmask = inputmask-127.5 43 | inputmask = inputmask/255.0 44 | inputmask = inputmask[n.newaxis, ...] 45 | 46 | #caffe forward 47 | net.blobs['data'].reshape(1,*inputImage.shape) 48 | net.blobs['one_mask'].reshape(1,*one_mask_.shape) 49 | net.blobs['mask'].reshape(1,*inputmask.shape) 50 | net.blobs['data'].data[...] = inputImage 51 | net.blobs['one_mask'].data[...] = one_mask_ 52 | net.blobs['mask'].data[...]
= inputmask 53 | net.forward() 54 | 55 | #caffe output 56 | feature = n.squeeze(net.blobs['fc1_full'].data) 57 | return feature 58 | 59 | if __name__ == '__main__': 60 | pass 61 | prefix_list = ['query', 'bounding_box_test','bounding_box_train'] 62 | prefix_2 = None #None or 'siamese' 63 | gpu_id = 0 64 | if prefix_2 is None: 65 | model_data = '../experiments/market1501/mgcam_iter_75000.caffemodel' 66 | else: 67 | model_data = '../experiments/market1501/mgcam_'+prefix_2+'_iter_20000.caffemodel' 68 | fea_dims = 128 69 | model_config = './deploy_mgcam.prototxt' 70 | data_path = '../data/market1501/' 71 | caffe.set_mode_gpu() 72 | caffe.set_device(gpu_id) 73 | net = caffe.Net(model_config, model_data, caffe.TEST) 74 | for prefix in prefix_list: 75 | image_path = os.path.join(data_path, prefix) 76 | mask_path = os.path.join(data_path, prefix + '_seg') 77 | # list images 78 | image_list = n.sort(os.listdir(image_path)) 79 | length = len([pic for pic in image_list if pic.lower().endswith('.jpg')]) # count only the .jpg images, so non-image files cannot break the matrix size 80 | feature_all = n.zeros((fea_dims,length),n.single) 81 | now = 0 82 | for item in image_list: 83 | if item.lower().endswith('.jpg'): 84 | this_image_path = os.path.join(image_path,item) 85 | this_mask_path = os.path.join(mask_path,item[:-4] + '.png') 86 | this_fea = extract_feature(net,this_image_path,this_mask_path) 87 | feature_all[:,now] = this_fea[:] 88 | now +=1 89 | print '---->%04d of %d with %s is done!'%(now,length,item) 90 | if prefix_2 is None: 91 | spio.savemat('market-fea-' + prefix, {'feat': feature_all}) 92 | else: 93 | spio.savemat('market-fea-' + prefix +'-'+ prefix_2, {'feat': feature_all}) 94 | -------------------------------------------------------------------------------- /evaluation/extract_feature_mars.py: -------------------------------------------------------------------------------- 1 | """ 2 | Extracting features with caffe models. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | import caffe 15 | import numpy as n 16 | import cv2 17 | import scipy.io as spio 18 | import os 19 | 20 | def extract_feature(net,image_path, gt_path): 21 | # load image 22 | oim = cv2.imread(image_path) 23 | 24 | # resize image into caffe size 25 | inputImage = cv2.resize(oim, (64, 160)) 26 | inputImage = n.array(inputImage, dtype=n.float32) 27 | 28 | # subtract mean 29 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 30 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 31 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 32 | 33 | # permute dimensions 34 | inputImage = inputImage.transpose([2, 0, 1]) 35 | inputImage = inputImage/256.0 36 | one_mask_ = n.zeros((1,40,16),dtype = n.float32) 37 | one_mask_ = one_mask_ + 1.0 38 | 39 | # mask 40 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 41 | inputmask = n.array(cv2.resize(mask_im, (64, 160)), dtype=n.float32) 42 | inputmask = inputmask-127.5 43 | inputmask = inputmask/255.0 44 | inputmask = inputmask[n.newaxis, ...] 45 | 46 | #caffe forward 47 | net.blobs['data'].reshape(1,*inputImage.shape) 48 | net.blobs['one_mask'].reshape(1,*one_mask_.shape) 49 | net.blobs['mask'].reshape(1,*inputmask.shape) 50 | net.blobs['data'].data[...] = inputImage 51 | net.blobs['one_mask'].data[...] = one_mask_ 52 | net.blobs['mask'].data[...]
= inputmask 53 | net.forward() 54 | 55 | #caffe output 56 | feature = n.squeeze(net.blobs['fc1_full'].data) 57 | return feature 58 | 59 | if __name__ == '__main__': 60 | pass 61 | prefix_list = ['train', 'test'] 62 | prefix_2 = None #None or 'siamese' 63 | gpu_id = 0 64 | if prefix_2 is None: 65 | model_data = '../experiments/mars/mgcam_iter_75000.caffemodel' 66 | else: 67 | model_data = '../experiments/mars/mgcam_'+prefix_2+'_iter_20000.caffemodel' 68 | fea_dims = 128 69 | model_config = './deploy_mgcam.prototxt' 70 | data_path = '../data/mars/' 71 | mars_eval_info_path = '../data/mars/MARS-evaluation-master/info/' # The evaluation info can be downloaded from: https://github.com/liangzheng06/MARS-evaluation 72 | caffe.set_mode_gpu() 73 | caffe.set_device(gpu_id) 74 | net = caffe.Net(model_config, model_data, caffe.TEST) 75 | for prefix in prefix_list: 76 | image_path = os.path.join(mars_eval_info_path, prefix + '_name.txt') 77 | files = open(image_path, 'r') 78 | img_list = files.readlines() 79 | data_num = len(img_list) 80 | feature_all = n.zeros((fea_dims,data_num),n.single) 81 | for i in xrange(data_num): 82 | this_line = img_list[i] 83 | this_path = os.path.join(data_path, 'bbox_'+prefix, this_line[:4], this_line[:-2]) 84 | this_gt_path = os.path.join(data_path, 'bbox_'+prefix+'_seg', this_line[:4], this_line[:-6] + '.png') 85 | if not os.path.exists(this_path) or not os.path.exists(this_gt_path): 86 | import pdb 87 | pdb.set_trace() 88 | print 'ERROR!!!' 89 | break 90 | this_fea = extract_feature(net,this_path,this_gt_path) 91 | feature_all[:,i] = this_fea[:] 92 | if i%1000==0: 93 | print '---->%d of %d is done!'%(i,data_num) 94 | files.close() # close the file handle (the original 'files.close' without parentheses was a no-op) 95 | if prefix_2 is None: 96 | spio.savemat('mars-fea-' + prefix, {prefix: feature_all}) 97 | else: 98 | spio.savemat('mars-fea-' + prefix + '-' + prefix_2, {prefix: feature_all}) 99 | print 'Extracting feature of %s is done!' %prefix -------------------------------------------------------------------------------- /experiments/cuhk03-detected/layers.py: -------------------------------------------------------------------------------- 1 | """ 2 | MGCAM data layer. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | 15 | import caffe 16 | import numpy as np 17 | import yaml 18 | from random import shuffle 19 | import numpy.random as nr 20 | import cv2 21 | import os 22 | import pickle as cPickle 23 | import pdb 24 | 25 | def mypickle(filename, data): 26 | fo = open(filename, "wb") 27 | cPickle.dump(data, fo, protocol=cPickle.HIGHEST_PROTOCOL) 28 | fo.close() 29 | 30 | def myunpickle(filename): 31 | if not os.path.exists(filename): 32 | raise IOError("Path '%s' does not exist." % filename) # the original raised an undefined UnpickleError 33 | 34 | fo = open(filename, 'rb') 35 | dict = cPickle.load(fo) 36 | fo.close() 37 | return dict 38 | 39 | class MGCAM_DataLayer(caffe.Layer): 40 | """Data layer for training""" 41 | def setup(self, bottom, top): 42 | self.width = 64 43 | self.height = 160 # We resize all images into a size of 160*64. 44 | self.width_gt = 16 45 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 40*16.
46 | layer_params = yaml.load(self.param_str) 47 | self.batch_size = layer_params['batch_size'] 48 | self.im_path = layer_params['im_path'] 49 | self.gt_path = layer_params['gt_path'] 50 | self.dataset = layer_params['dataset'] 51 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 52 | self.idx = 0 53 | self.data_num = len(self.im_list) # Number of data pairs 54 | self.rnd_list = np.arange(self.data_num) # Random the images list 55 | shuffle(self.rnd_list) 56 | 57 | def forward(self, bottom, top): 58 | # Assign forward data 59 | top[0].data[...] = self.im 60 | top[1].data[...] = self.inner_label 61 | top[2].data[...] = self.exter_label 62 | top[3].data[...] = self.one_mask 63 | top[4].data[...] = self.label 64 | top[5].data[...] = self.label_plus 65 | top[6].data[...] = self.gt 66 | top[7].data[...] = self.mask 67 | 68 | def backward(self, top, propagate_down, bottom): 69 | """This layer does not propagate gradients.""" 70 | pass 71 | 72 | def reshape(self, bottom, top): 73 | # Load image + label image pairs 74 | self.im = [] 75 | self.label = [] 76 | self.inner_label = [] 77 | self.exter_label = [] 78 | self.one_mask = [] 79 | self.label_plus = [] 80 | self.gt = [] 81 | self.mask = [] 82 | 83 | for i in xrange(self.batch_size): 84 | if self.idx == self.data_num: 85 | self.idx = 0 86 | shuffle(self.rnd_list) #Randomly shuffle the list. 87 | cur_idx = self.rnd_list[self.idx] 88 | im_path = self.im_list[cur_idx] 89 | gt_path = self.gt_list[cur_idx] 90 | im_, gt_, mask_= self.load_data(im_path, gt_path) 91 | self.im.append(im_) 92 | self.gt.append(gt_) 93 | self.mask.append(mask_) 94 | self.label.append(self.labels[cur_idx]) 95 | self.inner_label.append(int(1)) 96 | self.exter_label.append(int(0)) 97 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 98 | one_mask_ = one_mask_ + 1.0 99 | self.one_mask.append(one_mask_) 100 | self.label_plus.append(self.labels[cur_idx]) #Here, we also give the ID-labels to background-stream. 
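# The eight tops filled above correspond to the blobs 'data', 'sim_iner', 'sim_exter', 'one_mask', 'label', 'label_plus', 'gt' and 'mask' in mgcam_train.prototxt: 'gt' (40x16) supervises the attention map through the Euclidean 'loss_seg', 'one_mask' is an all-ones map used to invert the predicted attention mask, and the inner/exter similarity labels serve as targets for the region-contrastive terms described in the paper.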
101 | self.idx +=1 102 | 103 | self.im = np.array(self.im).astype(np.float32) 104 | self.inner_label = np.array(self.inner_label).astype(np.float32) 105 | self.exter_label = np.array(self.exter_label).astype(np.float32) 106 | self.one_mask = np.array(self.one_mask).astype(np.float32) 107 | self.label = np.array(self.label).astype(np.float32) 108 | self.label_plus = np.array(self.label_plus).astype(np.float32) 109 | self.gt = np.array(self.gt).astype(np.float32) 110 | self.mask = np.array(self.mask).astype(np.float32) 111 | # Reshape tops to fit blobs 112 | top[0].reshape(*self.im.shape) 113 | top[1].reshape(*self.inner_label.shape) 114 | top[2].reshape(*self.exter_label.shape) 115 | top[3].reshape(*self.one_mask.shape) 116 | top[4].reshape(*self.label.shape) 117 | top[5].reshape(*self.label_plus.shape) 118 | top[6].reshape(*self.gt.shape) 119 | top[7].reshape(*self.mask.shape) 120 | 121 | def data_processor(self, data_name): 122 | data_dic = './' + data_name 123 | if not os.path.exists(data_dic): 124 | im_list = [] 125 | gt_list = [] 126 | labels = [] 127 | im_dir_list = [] 128 | gt_dir_list = [] 129 | new_id = 0 130 | id_list = np.sort(os.listdir(self.im_path)) 131 | for id in id_list: 132 | im_dir = os.path.join(self.im_path, id) 133 | gt_dir = os.path.join(self.gt_path, id) 134 | if not os.path.exists(im_dir): 135 | continue 136 | pic_im_list = np.sort(os.listdir(im_dir)) 137 | if len(pic_im_list)>1: 138 | for pic in pic_im_list: 139 | this_dir = os.path.join(self.im_path, id, pic) 140 | gt_pic = pic 141 | if not pic.lower().endswith('.png'): 142 | gt_pic = pic[:-4] + '.png' 143 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 144 | im_list.append(this_dir) 145 | gt_list.append(this_gt_dir) 146 | labels.append(int(new_id)) 147 | new_id +=1 148 | im_dir_list.append(im_dir) 149 | gt_dir_list.append(gt_dir) 150 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 151 | mypickle(data_dic, dic) 152 | # Load saved data dict to resume. 153 | else: 154 | dic = myunpickle(data_dic) 155 | im_list = dic['im_list'] 156 | gt_list = dic['gt_list'] 157 | labels = dic['labels'] 158 | im_dir_list = dic['im_dir_list'] 159 | gt_dir_list = dic['gt_dir_list'] 160 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 161 | 162 | def load_data(self, im_path, gt_path): 163 | """ 164 | Load input image and preprocess for Caffe: 165 | - cast to float 166 | - switch channels RGB -> BGR 167 | - subtract mean 168 | - transpose to channel x height x width order 169 | """ 170 | oim = cv2.imread(im_path) 171 | inputImage = cv2.resize(oim, (self.width, self.height)) 172 | inputImage = np.array(inputImage, dtype=np.float32) 173 | 174 | # Substract mean 175 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 176 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 177 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 178 | 179 | # Permute dimensions 180 | if_flip = nr.randint(2) 181 | if if_flip == 0: # Also flip the image with 50% probability 182 | inputImage = inputImage[:,::-1,:] 183 | inputImage = inputImage.transpose([2, 0, 1]) 184 | inputImage = inputImage/256.0 185 | #GT 186 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 187 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 188 | inputGt = inputGt/255.0 189 | if if_flip == 0: 190 | inputGt = inputGt[:,::-1] 191 | inputGt = inputGt[np.newaxis, ...] 
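# Note that the two mask-derived arrays differ on purpose: 'gt' above is 40x16 in [0, 1] and acts as the attention-map target, while 'mask' below keeps the full 160x64 input resolution, shifted to roughly [-0.5, 0.5], and is concatenated with the image as a fourth input channel (the 'data_concate' layer in the prototxt).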
192 | #Mask 193 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 194 | inputMask = inputMask-127.5 195 | inputMask = inputMask/255.0 196 | if if_flip == 0: 197 | inputMask = inputMask[:,::-1] 198 | inputMask = inputMask[np.newaxis, ...] 199 | return inputImage, inputGt, inputMask 200 | 201 | 202 | class MGCAM_SIA_DataLayer(caffe.Layer): 203 | """Data layer for training""" 204 | def setup(self, bottom, top): 205 | self.width = 64 206 | self.height = 160 # We resize all images into a size of 160*64. 207 | self.width_gt = 16 208 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 160*64. 209 | 210 | layer_params = yaml.load(self.param_str) 211 | self.batch_size = layer_params['batch_size'] 212 | self.pos_pair_num = int(0.30*self.batch_size) # There will be at least 30 percent postive pairs for each batch. 213 | self.im_path = layer_params['im_path'] 214 | self.gt_path = layer_params['gt_path'] 215 | self.dataset = layer_params['dataset'] 216 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 217 | self.idx = 0 218 | self.data_num = len(self.im_list) 219 | self.rnd_list = np.arange(self.data_num) 220 | shuffle(self.rnd_list) 221 | 222 | def forward(self, bottom, top): 223 | # Assign forward data 224 | top[0].data[...] = self.im 225 | top[1].data[...] = self.inner_label 226 | top[2].data[...] = self.exter_label 227 | top[3].data[...] = self.one_mask 228 | top[4].data[...] = self.label 229 | top[5].data[...] = self.label_plus 230 | top[6].data[...] = self.gt 231 | top[7].data[...] = self.mask 232 | top[8].data[...] = self.siam_label 233 | 234 | def backward(self, top, propagate_down, bottom): 235 | """This layer does not propagate gradients.""" 236 | pass 237 | 238 | def reshape(self, bottom, top): 239 | # Load image + label image pairs 240 | self.im = [] 241 | self.label = [] 242 | self.inner_label = [] 243 | self.exter_label = [] 244 | self.one_mask = [] 245 | self.label_plus = [] 246 | self.gt = [] 247 | self.mask = [] 248 | self.siam_label = [] 249 | 250 | for i in xrange(self.batch_size): 251 | if self.idx == self.data_num: 252 | self.idx = 0 253 | shuffle(self.rnd_list) 254 | cur_idx = self.rnd_list[self.idx] 255 | im_path = self.im_list[cur_idx] 256 | gt_path = self.gt_list[cur_idx] 257 | im_, gt_, mask_= self.load_data(im_path, gt_path) 258 | self.im.append(im_) 259 | self.gt.append(gt_) 260 | self.mask.append(mask_) 261 | self.label.append(self.labels[cur_idx]) 262 | self.inner_label.append(int(1)) 263 | self.exter_label.append(int(0)) 264 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 265 | one_mask_ = one_mask_ + 1.0 266 | self.one_mask.append(one_mask_) 267 | self.label_plus.append(self.labels[cur_idx])#Labels for backgrounds. We use the same labels with other two regions here. 268 | self.idx +=1 269 | 270 | for i in xrange(self.batch_size): 271 | if i > self.pos_pair_num: 272 | if self.idx == self.data_num: 273 | self.idx = 0 274 | shuffle(self.rnd_list)#Randomly shuffle the list. 275 | cur_idx = self.rnd_list[self.idx] 276 | self.idx +=1 277 | im_path = self.im_list[cur_idx] 278 | gt_path = self.gt_list[cur_idx] 279 | label = self.labels[cur_idx] 280 | if label==self.label[i]: 281 | self.siam_label.append(int(1))#In case of getting postive pairs, maybe not much. 282 | else: 283 | self.siam_label.append(int(0))#Negative pairs. 
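# The else-branch below handles the first samples of the batch (i <= pos_pair_num): the partner image is drawn from the same identity's folder, guaranteeing a positive pair. For the remaining samples the partner is picked at random and siam_label is set by comparing the two identity labels, so positives occur there only by chance.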
284 | else: 285 | im_dir = self.im_dir_list[self.label[i]] 286 | gt_dir = self.gt_dir_list[self.label[i]] 287 | im_list = np.sort(os.listdir(im_dir)) 288 | gt_list = np.sort(os.listdir(gt_dir)) 289 | tmp_list = np.arange(len(im_list)) 290 | shuffle(tmp_list) #Randomly select one. 291 | im_path = os.path.join(im_dir, im_list[tmp_list[0]]) 292 | gt_path = os.path.join(gt_dir, gt_list[tmp_list[0]]) 293 | label = self.label[i] 294 | self.siam_label.append(int(1))#This is a postive pair. 295 | 296 | im_, gt_, mask_= self.load_data(im_path, gt_path) 297 | self.im.append(im_) 298 | self.gt.append(gt_) 299 | self.mask.append(mask_) 300 | self.label.append(label) 301 | self.inner_label.append(int(1))#Allways be ones, for constrastive learning. 302 | self.exter_label.append(int(0)) 303 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 304 | one_mask_ = one_mask_ + 1.0 305 | self.one_mask.append(one_mask_) 306 | self.label_plus.append(label) 307 | 308 | self.im = np.array(self.im).astype(np.float32) 309 | self.inner_label = np.array(self.inner_label).astype(np.float32) 310 | self.exter_label = np.array(self.exter_label).astype(np.float32) 311 | self.one_mask = np.array(self.one_mask).astype(np.float32) 312 | self.label = np.array(self.label).astype(np.float32) 313 | self.label_plus = np.array(self.label_plus).astype(np.float32) 314 | self.gt = np.array(self.gt).astype(np.float32) 315 | self.mask = np.array(self.mask).astype(np.float32) 316 | self.siam_label = np.array(self.siam_label).astype(np.float32) 317 | # Reshape tops to fit blobs 318 | top[0].reshape(*self.im.shape) 319 | top[1].reshape(*self.inner_label.shape) 320 | top[2].reshape(*self.exter_label.shape) 321 | top[3].reshape(*self.one_mask.shape) 322 | top[4].reshape(*self.label.shape) 323 | top[5].reshape(*self.label_plus.shape) 324 | top[6].reshape(*self.gt.shape) 325 | top[7].reshape(*self.mask.shape) 326 | top[8].reshape(*self.siam_label.shape) 327 | 328 | def data_processor(self, data_name): 329 | data_dic = './' + data_name 330 | if not os.path.exists(data_dic): 331 | im_list = [] 332 | gt_list = [] 333 | labels = [] 334 | im_dir_list = [] 335 | gt_dir_list = [] 336 | new_id = 0 337 | id_list = np.sort(os.listdir(self.im_path)) 338 | for id in id_list: 339 | im_dir = os.path.join(self.im_path, id) 340 | gt_dir = os.path.join(self.gt_path, id) 341 | if not os.path.exists(im_dir): 342 | continue 343 | pic_im_list = np.sort(os.listdir(im_dir)) 344 | if len(pic_im_list)>1: 345 | for pic in pic_im_list: 346 | this_dir = os.path.join(self.im_path, id, pic) 347 | gt_pic = pic 348 | if not pic.lower().endswith('.png'): 349 | gt_pic = pic[:-4] + '.png' 350 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 351 | im_list.append(this_dir) 352 | gt_list.append(this_gt_dir) 353 | labels.append(int(new_id)) 354 | new_id +=1 355 | im_dir_list.append(im_dir) 356 | gt_dir_list.append(gt_dir) 357 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 358 | mypickle(data_dic, dic) 359 | # Load saved data dict to resume. 
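# (The index built above is cached to './' + dataset, i.e. the 'dataset' value from param_str, such as './market1501' in the shipped Market-1501 prototxt; delete that file to rebuild the image/label lists after changing im_path or gt_path.)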
360 | else: 361 | dic = myunpickle(data_dic) 362 | im_list = dic['im_list'] 363 | gt_list = dic['gt_list'] 364 | labels = dic['labels'] 365 | im_dir_list = dic['im_dir_list'] 366 | gt_dir_list = dic['gt_dir_list'] 367 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 368 | 369 | def load_data(self, im_path, gt_path): 370 | """ 371 | Load input image and preprocess for Caffe: 372 | - cast to float 373 | - switch channels RGB -> BGR 374 | - subtract mean 375 | - transpose to channel x height x width order 376 | """ 377 | oim = cv2.imread(im_path) 378 | inputImage = cv2.resize(oim, (self.width, self.height)) 379 | inputImage = np.array(inputImage, dtype=np.float32) 380 | 381 | # Substract mean 382 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 383 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 384 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 385 | 386 | # Permute dimensions 387 | if_flip = nr.randint(2) 388 | if if_flip == 0: # Also flip the image with 50% probability 389 | inputImage = inputImage[:,::-1,:] 390 | inputImage = inputImage.transpose([2, 0, 1]) 391 | inputImage = inputImage/256.0 392 | #GT 393 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 394 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 395 | inputGt = inputGt/255.0 396 | if if_flip == 0: 397 | inputGt = inputGt[:,::-1] 398 | inputGt = inputGt[np.newaxis, ...] 399 | #Mask 400 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 401 | inputMask = inputMask-127.5 402 | inputMask = inputMask/255.0 403 | if if_flip == 0: 404 | inputMask = inputMask[:,::-1] 405 | inputMask = inputMask[np.newaxis, ...] 406 | return inputImage, inputGt, inputMask 407 | -------------------------------------------------------------------------------- /experiments/cuhk03-detected/mgcam_iter_75000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/cuhk03-detected/mgcam_iter_75000.caffemodel -------------------------------------------------------------------------------- /experiments/cuhk03-detected/mgcam_siamese_iter_20000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/cuhk03-detected/mgcam_siamese_iter_20000.caffemodel -------------------------------------------------------------------------------- /experiments/cuhk03-detected/run_mgcam.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam.prototxt --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/cuhk03-detected/run_mgcam_siamese.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-siamese`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam_siamese.prototxt --weights=./mgcam_iter_75000.caffemodel --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/cuhk03-detected/solver_mgcam.prototxt: 
-------------------------------------------------------------------------------- 1 | net: "mgcam_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.01 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 15000 9 | display: 10 10 | max_iter: 75000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/cuhk03-detected/solver_mgcam_siamese.prototxt: -------------------------------------------------------------------------------- 1 | net: "mgcam_siamese_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.0001 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 10000 9 | display: 10 10 | max_iter: 20000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam_siamese" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/layers.py: -------------------------------------------------------------------------------- 1 | """ 2 | MGCAM data layer. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | 15 | import caffe 16 | import numpy as np 17 | import yaml 18 | from random import shuffle 19 | import numpy.random as nr 20 | import cv2 21 | import os 22 | import pickle as cPickle 23 | import pdb 24 | 25 | def mypickle(filename, data): 26 | fo = open(filename, "wb") 27 | cPickle.dump(data, fo, protocol=cPickle.HIGHEST_PROTOCOL) 28 | fo.close() 29 | 30 | def myunpickle(filename): 31 | if not os.path.exists(filename): 32 | raise IOError("Path '%s' does not exist." % filename) 33 | 34 | fo = open(filename, 'rb') 35 | data = cPickle.load(fo) 36 | fo.close() 37 | return data 38 | 39 | class MGCAM_DataLayer(caffe.Layer): 40 | """Data layer for training""" 41 | def setup(self, bottom, top): 42 | self.width = 64 43 | self.height = 160 # We resize all images into a size of 160*64. 44 | self.width_gt = 16 45 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 40*16. 46 | layer_params = yaml.load(self.param_str) 47 | self.batch_size = layer_params['batch_size'] 48 | self.im_path = layer_params['im_path'] 49 | self.gt_path = layer_params['gt_path'] 50 | self.dataset = layer_params['dataset'] 51 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 52 | self.idx = 0 53 | self.data_num = len(self.im_list) # Number of data pairs 54 | self.rnd_list = np.arange(self.data_num) # Random the images list 55 | shuffle(self.rnd_list) 56 | 57 | def forward(self, bottom, top): 58 | # Assign forward data 59 | top[0].data[...] = self.im 60 | top[1].data[...] = self.inner_label 61 | top[2].data[...] = self.exter_label 62 | top[3].data[...] = self.one_mask 63 | top[4].data[...] = self.label 64 | top[5].data[...] = self.label_plus 65 | top[6].data[...] = self.gt 66 | top[7].data[...]
= self.mask 67 | 68 | def backward(self, top, propagate_down, bottom): 69 | """This layer does not propagate gradients.""" 70 | pass 71 | 72 | def reshape(self, bottom, top): 73 | # Load image + label image pairs 74 | self.im = [] 75 | self.label = [] 76 | self.inner_label = [] 77 | self.exter_label = [] 78 | self.one_mask = [] 79 | self.label_plus = [] 80 | self.gt = [] 81 | self.mask = [] 82 | 83 | for i in xrange(self.batch_size): 84 | if self.idx == self.data_num: 85 | self.idx = 0 86 | shuffle(self.rnd_list) #Randomly shuffle the list. 87 | cur_idx = self.rnd_list[self.idx] 88 | im_path = self.im_list[cur_idx] 89 | gt_path = self.gt_list[cur_idx] 90 | im_, gt_, mask_= self.load_data(im_path, gt_path) 91 | self.im.append(im_) 92 | self.gt.append(gt_) 93 | self.mask.append(mask_) 94 | self.label.append(self.labels[cur_idx]) 95 | self.inner_label.append(int(1)) 96 | self.exter_label.append(int(0)) 97 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 98 | one_mask_ = one_mask_ + 1.0 99 | self.one_mask.append(one_mask_) 100 | self.label_plus.append(self.labels[cur_idx]) #Here, we also give the ID-labels to background-stream. 101 | self.idx +=1 102 | 103 | self.im = np.array(self.im).astype(np.float32) 104 | self.inner_label = np.array(self.inner_label).astype(np.float32) 105 | self.exter_label = np.array(self.exter_label).astype(np.float32) 106 | self.one_mask = np.array(self.one_mask).astype(np.float32) 107 | self.label = np.array(self.label).astype(np.float32) 108 | self.label_plus = np.array(self.label_plus).astype(np.float32) 109 | self.gt = np.array(self.gt).astype(np.float32) 110 | self.mask = np.array(self.mask).astype(np.float32) 111 | # Reshape tops to fit blobs 112 | top[0].reshape(*self.im.shape) 113 | top[1].reshape(*self.inner_label.shape) 114 | top[2].reshape(*self.exter_label.shape) 115 | top[3].reshape(*self.one_mask.shape) 116 | top[4].reshape(*self.label.shape) 117 | top[5].reshape(*self.label_plus.shape) 118 | top[6].reshape(*self.gt.shape) 119 | top[7].reshape(*self.mask.shape) 120 | 121 | def data_processor(self, data_name): 122 | data_dic = './' + data_name 123 | if not os.path.exists(data_dic): 124 | im_list = [] 125 | gt_list = [] 126 | labels = [] 127 | im_dir_list = [] 128 | gt_dir_list = [] 129 | new_id = 0 130 | id_list = np.sort(os.listdir(self.im_path)) 131 | for id in id_list: 132 | im_dir = os.path.join(self.im_path, id) 133 | gt_dir = os.path.join(self.gt_path, id) 134 | if not os.path.exists(im_dir): 135 | continue 136 | pic_im_list = np.sort(os.listdir(im_dir)) 137 | if len(pic_im_list)>1: 138 | for pic in pic_im_list: 139 | this_dir = os.path.join(self.im_path, id, pic) 140 | gt_pic = pic 141 | if not pic.lower().endswith('.png'): 142 | gt_pic = pic[:-4] + '.png' 143 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 144 | im_list.append(this_dir) 145 | gt_list.append(this_gt_dir) 146 | labels.append(int(new_id)) 147 | new_id +=1 148 | im_dir_list.append(im_dir) 149 | gt_dir_list.append(gt_dir) 150 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 151 | mypickle(data_dic, dic) 152 | # Load saved data dict to resume. 
153 | else: 154 | dic = myunpickle(data_dic) 155 | im_list = dic['im_list'] 156 | gt_list = dic['gt_list'] 157 | labels = dic['labels'] 158 | im_dir_list = dic['im_dir_list'] 159 | gt_dir_list = dic['gt_dir_list'] 160 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 161 | 162 | def load_data(self, im_path, gt_path): 163 | """ 164 | Load input image and preprocess for Caffe: 165 | - cast to float 166 | - switch channels RGB -> BGR 167 | - subtract mean 168 | - transpose to channel x height x width order 169 | """ 170 | oim = cv2.imread(im_path) 171 | inputImage = cv2.resize(oim, (self.width, self.height)) 172 | inputImage = np.array(inputImage, dtype=np.float32) 173 | 174 | # Substract mean 175 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 176 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 177 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 178 | 179 | # Permute dimensions 180 | if_flip = nr.randint(2) 181 | if if_flip == 0: # Also flip the image with 50% probability 182 | inputImage = inputImage[:,::-1,:] 183 | inputImage = inputImage.transpose([2, 0, 1]) 184 | inputImage = inputImage/256.0 185 | #GT 186 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 187 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 188 | inputGt = inputGt/255.0 189 | if if_flip == 0: 190 | inputGt = inputGt[:,::-1] 191 | inputGt = inputGt[np.newaxis, ...] 192 | #Mask 193 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 194 | inputMask = inputMask-127.5 195 | inputMask = inputMask/255.0 196 | if if_flip == 0: 197 | inputMask = inputMask[:,::-1] 198 | inputMask = inputMask[np.newaxis, ...] 199 | return inputImage, inputGt, inputMask 200 | 201 | 202 | class MGCAM_SIA_DataLayer(caffe.Layer): 203 | """Data layer for training""" 204 | def setup(self, bottom, top): 205 | self.width = 64 206 | self.height = 160 # We resize all images into a size of 160*64. 207 | self.width_gt = 16 208 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 160*64. 209 | 210 | layer_params = yaml.load(self.param_str) 211 | self.batch_size = layer_params['batch_size'] 212 | self.pos_pair_num = int(0.30*self.batch_size) # There will be at least 30 percent postive pairs for each batch. 213 | self.im_path = layer_params['im_path'] 214 | self.gt_path = layer_params['gt_path'] 215 | self.dataset = layer_params['dataset'] 216 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 217 | self.idx = 0 218 | self.data_num = len(self.im_list) 219 | self.rnd_list = np.arange(self.data_num) 220 | shuffle(self.rnd_list) 221 | 222 | def forward(self, bottom, top): 223 | # Assign forward data 224 | top[0].data[...] = self.im 225 | top[1].data[...] = self.inner_label 226 | top[2].data[...] = self.exter_label 227 | top[3].data[...] = self.one_mask 228 | top[4].data[...] = self.label 229 | top[5].data[...] = self.label_plus 230 | top[6].data[...] = self.gt 231 | top[7].data[...] = self.mask 232 | top[8].data[...] 
= self.siam_label 233 | 234 | def backward(self, top, propagate_down, bottom): 235 | """This layer does not propagate gradients.""" 236 | pass 237 | 238 | def reshape(self, bottom, top): 239 | # Load image + label image pairs 240 | self.im = [] 241 | self.label = [] 242 | self.inner_label = [] 243 | self.exter_label = [] 244 | self.one_mask = [] 245 | self.label_plus = [] 246 | self.gt = [] 247 | self.mask = [] 248 | self.siam_label = [] 249 | 250 | for i in xrange(self.batch_size): 251 | if self.idx == self.data_num: 252 | self.idx = 0 253 | shuffle(self.rnd_list) 254 | cur_idx = self.rnd_list[self.idx] 255 | im_path = self.im_list[cur_idx] 256 | gt_path = self.gt_list[cur_idx] 257 | im_, gt_, mask_= self.load_data(im_path, gt_path) 258 | self.im.append(im_) 259 | self.gt.append(gt_) 260 | self.mask.append(mask_) 261 | self.label.append(self.labels[cur_idx]) 262 | self.inner_label.append(int(1)) 263 | self.exter_label.append(int(0)) 264 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 265 | one_mask_ = one_mask_ + 1.0 266 | self.one_mask.append(one_mask_) 267 | self.label_plus.append(self.labels[cur_idx])#Labels for backgrounds. We use the same labels with other two regions here. 268 | self.idx +=1 269 | 270 | for i in xrange(self.batch_size): 271 | if i > self.pos_pair_num: 272 | if self.idx == self.data_num: 273 | self.idx = 0 274 | shuffle(self.rnd_list)#Randomly shuffle the list. 275 | cur_idx = self.rnd_list[self.idx] 276 | self.idx +=1 277 | im_path = self.im_list[cur_idx] 278 | gt_path = self.gt_list[cur_idx] 279 | label = self.labels[cur_idx] 280 | if label==self.label[i]: 281 | self.siam_label.append(int(1))#In case of getting postive pairs, maybe not much. 282 | else: 283 | self.siam_label.append(int(0))#Negative pairs. 284 | else: 285 | im_dir = self.im_dir_list[self.label[i]] 286 | gt_dir = self.gt_dir_list[self.label[i]] 287 | im_list = np.sort(os.listdir(im_dir)) 288 | gt_list = np.sort(os.listdir(gt_dir)) 289 | tmp_list = np.arange(len(im_list)) 290 | shuffle(tmp_list) #Randomly select one. 291 | im_path = os.path.join(im_dir, im_list[tmp_list[0]]) 292 | gt_path = os.path.join(gt_dir, gt_list[tmp_list[0]]) 293 | label = self.label[i] 294 | self.siam_label.append(int(1))#This is a postive pair. 295 | 296 | im_, gt_, mask_= self.load_data(im_path, gt_path) 297 | self.im.append(im_) 298 | self.gt.append(gt_) 299 | self.mask.append(mask_) 300 | self.label.append(label) 301 | self.inner_label.append(int(1))#Allways be ones, for constrastive learning. 
302 | self.exter_label.append(int(0)) 303 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 304 | one_mask_ = one_mask_ + 1.0 305 | self.one_mask.append(one_mask_) 306 | self.label_plus.append(label) 307 | 308 | self.im = np.array(self.im).astype(np.float32) 309 | self.inner_label = np.array(self.inner_label).astype(np.float32) 310 | self.exter_label = np.array(self.exter_label).astype(np.float32) 311 | self.one_mask = np.array(self.one_mask).astype(np.float32) 312 | self.label = np.array(self.label).astype(np.float32) 313 | self.label_plus = np.array(self.label_plus).astype(np.float32) 314 | self.gt = np.array(self.gt).astype(np.float32) 315 | self.mask = np.array(self.mask).astype(np.float32) 316 | self.siam_label = np.array(self.siam_label).astype(np.float32) 317 | # Reshape tops to fit blobs 318 | top[0].reshape(*self.im.shape) 319 | top[1].reshape(*self.inner_label.shape) 320 | top[2].reshape(*self.exter_label.shape) 321 | top[3].reshape(*self.one_mask.shape) 322 | top[4].reshape(*self.label.shape) 323 | top[5].reshape(*self.label_plus.shape) 324 | top[6].reshape(*self.gt.shape) 325 | top[7].reshape(*self.mask.shape) 326 | top[8].reshape(*self.siam_label.shape) 327 | 328 | def data_processor(self, data_name): 329 | data_dic = './' + data_name 330 | if not os.path.exists(data_dic): 331 | im_list = [] 332 | gt_list = [] 333 | labels = [] 334 | im_dir_list = [] 335 | gt_dir_list = [] 336 | new_id = 0 337 | id_list = np.sort(os.listdir(self.im_path)) 338 | for id in id_list: 339 | im_dir = os.path.join(self.im_path, id) 340 | gt_dir = os.path.join(self.gt_path, id) 341 | if not os.path.exists(im_dir): 342 | continue 343 | pic_im_list = np.sort(os.listdir(im_dir)) 344 | if len(pic_im_list)>1: 345 | for pic in pic_im_list: 346 | this_dir = os.path.join(self.im_path, id, pic) 347 | gt_pic = pic 348 | if not pic.lower().endswith('.png'): 349 | gt_pic = pic[:-4] + '.png' 350 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 351 | im_list.append(this_dir) 352 | gt_list.append(this_gt_dir) 353 | labels.append(int(new_id)) 354 | new_id +=1 355 | im_dir_list.append(im_dir) 356 | gt_dir_list.append(gt_dir) 357 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 358 | mypickle(data_dic, dic) 359 | # Load saved data dict to resume. 
360 | else: 361 | dic = myunpickle(data_dic) 362 | im_list = dic['im_list'] 363 | gt_list = dic['gt_list'] 364 | labels = dic['labels'] 365 | im_dir_list = dic['im_dir_list'] 366 | gt_dir_list = dic['gt_dir_list'] 367 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 368 | 369 | def load_data(self, im_path, gt_path): 370 | """ 371 | Load input image and preprocess for Caffe: 372 | - cast to float 373 | - switch channels RGB -> BGR 374 | - subtract mean 375 | - transpose to channel x height x width order 376 | """ 377 | oim = cv2.imread(im_path) 378 | inputImage = cv2.resize(oim, (self.width, self.height)) 379 | inputImage = np.array(inputImage, dtype=np.float32) 380 | 381 | # Substract mean 382 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 383 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 384 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 385 | 386 | # Permute dimensions 387 | if_flip = nr.randint(2) 388 | if if_flip == 0: # Also flip the image with 50% probability 389 | inputImage = inputImage[:,::-1,:] 390 | inputImage = inputImage.transpose([2, 0, 1]) 391 | inputImage = inputImage/256.0 392 | #GT 393 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 394 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 395 | inputGt = inputGt/255.0 396 | if if_flip == 0: 397 | inputGt = inputGt[:,::-1] 398 | inputGt = inputGt[np.newaxis, ...] 399 | #Mask 400 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 401 | inputMask = inputMask-127.5 402 | inputMask = inputMask/255.0 403 | if if_flip == 0: 404 | inputMask = inputMask[:,::-1] 405 | inputMask = inputMask[np.newaxis, ...] 406 | return inputImage, inputGt, inputMask 407 | -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/mgcam_iter_75000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/cuhk03-labeled/mgcam_iter_75000.caffemodel -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/mgcam_siamese_iter_20000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/cuhk03-labeled/mgcam_siamese_iter_20000.caffemodel -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/run_mgcam.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam.prototxt --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/run_mgcam_siamese.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-siamese`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam_siamese.prototxt --weights=./mgcam_iter_75000.caffemodel --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/solver_mgcam.prototxt: 
-------------------------------------------------------------------------------- 1 | net: "mgcam_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.01 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 15000 9 | display: 10 10 | max_iter: 75000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/cuhk03-labeled/solver_mgcam_siamese.prototxt: -------------------------------------------------------------------------------- 1 | net: "mgcam_siamese_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.0001 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 10000 9 | display: 10 10 | max_iter: 20000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam_siamese" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/market1501/layers.py: -------------------------------------------------------------------------------- 1 | """ 2 | MGCAM data layer. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only, please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | 15 | import caffe 16 | import numpy as np 17 | import yaml 18 | from random import shuffle 19 | import numpy.random as nr 20 | import cv2 21 | import os 22 | import pickle as cPickle 23 | import pdb 24 | 25 | def mypickle(filename, data): 26 | fo = open(filename, "wb") 27 | cPickle.dump(data, fo, protocol=cPickle.HIGHEST_PROTOCOL) 28 | fo.close() 29 | 30 | def myunpickle(filename): 31 | if not os.path.exists(filename): 32 | raise IOError("Path '%s' does not exist." % filename) 33 | 34 | fo = open(filename, 'rb') 35 | data = cPickle.load(fo) 36 | fo.close() 37 | return data 38 | 39 | class MGCAM_DataLayer(caffe.Layer): 40 | """Data layer for training""" 41 | def setup(self, bottom, top): 42 | self.width = 64 43 | self.height = 160 # We resize all images into a size of 160*64. 44 | self.width_gt = 16 45 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 40*16. 46 | layer_params = yaml.load(self.param_str) 47 | self.batch_size = layer_params['batch_size'] 48 | self.im_path = layer_params['im_path'] 49 | self.gt_path = layer_params['gt_path'] 50 | self.dataset = layer_params['dataset'] 51 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 52 | self.idx = 0 53 | self.data_num = len(self.im_list) # Number of data pairs 54 | self.rnd_list = np.arange(self.data_num) # Random the images list 55 | shuffle(self.rnd_list) 56 | 57 | def forward(self, bottom, top): 58 | # Assign forward data 59 | top[0].data[...] = self.im 60 | top[1].data[...] = self.inner_label 61 | top[2].data[...] = self.exter_label 62 | top[3].data[...] = self.one_mask 63 | top[4].data[...] = self.label 64 | top[5].data[...] = self.label_plus 65 | top[6].data[...] = self.gt 66 | top[7].data[...]
= self.mask 67 | 68 | def backward(self, top, propagate_down, bottom): 69 | """This layer does not propagate gradients.""" 70 | pass 71 | 72 | def reshape(self, bottom, top): 73 | # Load image + label image pairs 74 | self.im = [] 75 | self.label = [] 76 | self.inner_label = [] 77 | self.exter_label = [] 78 | self.one_mask = [] 79 | self.label_plus = [] 80 | self.gt = [] 81 | self.mask = [] 82 | 83 | for i in xrange(self.batch_size): 84 | if self.idx == self.data_num: 85 | self.idx = 0 86 | shuffle(self.rnd_list) #Randomly shuffle the list. 87 | cur_idx = self.rnd_list[self.idx] 88 | im_path = self.im_list[cur_idx] 89 | gt_path = self.gt_list[cur_idx] 90 | im_, gt_, mask_= self.load_data(im_path, gt_path) 91 | self.im.append(im_) 92 | self.gt.append(gt_) 93 | self.mask.append(mask_) 94 | self.label.append(self.labels[cur_idx]) 95 | self.inner_label.append(int(1)) 96 | self.exter_label.append(int(0)) 97 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 98 | one_mask_ = one_mask_ + 1.0 99 | self.one_mask.append(one_mask_) 100 | self.label_plus.append(self.labels[cur_idx]) #Here, we also give the ID-labels to background-stream. 101 | self.idx +=1 102 | 103 | self.im = np.array(self.im).astype(np.float32) 104 | self.inner_label = np.array(self.inner_label).astype(np.float32) 105 | self.exter_label = np.array(self.exter_label).astype(np.float32) 106 | self.one_mask = np.array(self.one_mask).astype(np.float32) 107 | self.label = np.array(self.label).astype(np.float32) 108 | self.label_plus = np.array(self.label_plus).astype(np.float32) 109 | self.gt = np.array(self.gt).astype(np.float32) 110 | self.mask = np.array(self.mask).astype(np.float32) 111 | # Reshape tops to fit blobs 112 | top[0].reshape(*self.im.shape) 113 | top[1].reshape(*self.inner_label.shape) 114 | top[2].reshape(*self.exter_label.shape) 115 | top[3].reshape(*self.one_mask.shape) 116 | top[4].reshape(*self.label.shape) 117 | top[5].reshape(*self.label_plus.shape) 118 | top[6].reshape(*self.gt.shape) 119 | top[7].reshape(*self.mask.shape) 120 | 121 | def data_processor(self, data_name): 122 | data_dic = './' + data_name 123 | if not os.path.exists(data_dic): 124 | im_list = [] 125 | gt_list = [] 126 | labels = [] 127 | im_dir_list = [] 128 | gt_dir_list = [] 129 | new_id = 0 130 | id_list = np.sort(os.listdir(self.im_path)) 131 | for id in id_list: 132 | im_dir = os.path.join(self.im_path, id) 133 | gt_dir = os.path.join(self.gt_path, id) 134 | if not os.path.exists(im_dir): 135 | continue 136 | pic_im_list = np.sort(os.listdir(im_dir)) 137 | if len(pic_im_list)>1: 138 | for pic in pic_im_list: 139 | this_dir = os.path.join(self.im_path, id, pic) 140 | gt_pic = pic 141 | if not pic.lower().endswith('.png'): 142 | gt_pic = pic[:-4] + '.png' 143 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 144 | im_list.append(this_dir) 145 | gt_list.append(this_gt_dir) 146 | labels.append(int(new_id)) 147 | new_id +=1 148 | im_dir_list.append(im_dir) 149 | gt_dir_list.append(gt_dir) 150 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 151 | mypickle(data_dic, dic) 152 | # Load saved data dict to resume. 
153 | else: 154 | dic = myunpickle(data_dic) 155 | im_list = dic['im_list'] 156 | gt_list = dic['gt_list'] 157 | labels = dic['labels'] 158 | im_dir_list = dic['im_dir_list'] 159 | gt_dir_list = dic['gt_dir_list'] 160 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 161 | 162 | def load_data(self, im_path, gt_path): 163 | """ 164 | Load input image and preprocess for Caffe: 165 | - cast to float 166 | - switch channels RGB -> BGR 167 | - subtract mean 168 | - transpose to channel x height x width order 169 | """ 170 | oim = cv2.imread(im_path) 171 | inputImage = cv2.resize(oim, (self.width, self.height)) 172 | inputImage = np.array(inputImage, dtype=np.float32) 173 | 174 | # Substract mean 175 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 176 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 177 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 178 | 179 | # Permute dimensions 180 | if_flip = nr.randint(2) 181 | if if_flip == 0: # Also flip the image with 50% probability 182 | inputImage = inputImage[:,::-1,:] 183 | inputImage = inputImage.transpose([2, 0, 1]) 184 | inputImage = inputImage/256.0 185 | #GT 186 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 187 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 188 | inputGt = inputGt/255.0 189 | if if_flip == 0: 190 | inputGt = inputGt[:,::-1] 191 | inputGt = inputGt[np.newaxis, ...] 192 | #Mask 193 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 194 | inputMask = inputMask-127.5 195 | inputMask = inputMask/255.0 196 | if if_flip == 0: 197 | inputMask = inputMask[:,::-1] 198 | inputMask = inputMask[np.newaxis, ...] 199 | return inputImage, inputGt, inputMask 200 | 201 | 202 | class MGCAM_SIA_DataLayer(caffe.Layer): 203 | """Data layer for training""" 204 | def setup(self, bottom, top): 205 | self.width = 64 206 | self.height = 160 # We resize all images into a size of 160*64. 207 | self.width_gt = 16 208 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 160*64. 209 | 210 | layer_params = yaml.load(self.param_str) 211 | self.batch_size = layer_params['batch_size'] 212 | self.pos_pair_num = int(0.30*self.batch_size) # There will be at least 30 percent postive pairs for each batch. 213 | self.im_path = layer_params['im_path'] 214 | self.gt_path = layer_params['gt_path'] 215 | self.dataset = layer_params['dataset'] 216 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 217 | self.idx = 0 218 | self.data_num = len(self.im_list) 219 | self.rnd_list = np.arange(self.data_num) 220 | shuffle(self.rnd_list) 221 | 222 | def forward(self, bottom, top): 223 | # Assign forward data 224 | top[0].data[...] = self.im 225 | top[1].data[...] = self.inner_label 226 | top[2].data[...] = self.exter_label 227 | top[3].data[...] = self.one_mask 228 | top[4].data[...] = self.label 229 | top[5].data[...] = self.label_plus 230 | top[6].data[...] = self.gt 231 | top[7].data[...] = self.mask 232 | top[8].data[...] 
= self.siam_label 233 | 234 | def backward(self, top, propagate_down, bottom): 235 | """This layer does not propagate gradients.""" 236 | pass 237 | 238 | def reshape(self, bottom, top): 239 | # Load image + label image pairs 240 | self.im = [] 241 | self.label = [] 242 | self.inner_label = [] 243 | self.exter_label = [] 244 | self.one_mask = [] 245 | self.label_plus = [] 246 | self.gt = [] 247 | self.mask = [] 248 | self.siam_label = [] 249 | 250 | for i in xrange(self.batch_size): 251 | if self.idx == self.data_num: 252 | self.idx = 0 253 | shuffle(self.rnd_list) 254 | cur_idx = self.rnd_list[self.idx] 255 | im_path = self.im_list[cur_idx] 256 | gt_path = self.gt_list[cur_idx] 257 | im_, gt_, mask_= self.load_data(im_path, gt_path) 258 | self.im.append(im_) 259 | self.gt.append(gt_) 260 | self.mask.append(mask_) 261 | self.label.append(self.labels[cur_idx]) 262 | self.inner_label.append(int(1)) 263 | self.exter_label.append(int(0)) 264 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 265 | one_mask_ = one_mask_ + 1.0 266 | self.one_mask.append(one_mask_) 267 | self.label_plus.append(self.labels[cur_idx])#Labels for backgrounds. We use the same labels with other two regions here. 268 | self.idx +=1 269 | 270 | for i in xrange(self.batch_size): 271 | if i > self.pos_pair_num: 272 | if self.idx == self.data_num: 273 | self.idx = 0 274 | shuffle(self.rnd_list)#Randomly shuffle the list. 275 | cur_idx = self.rnd_list[self.idx] 276 | self.idx +=1 277 | im_path = self.im_list[cur_idx] 278 | gt_path = self.gt_list[cur_idx] 279 | label = self.labels[cur_idx] 280 | if label==self.label[i]: 281 | self.siam_label.append(int(1))#In case of getting postive pairs, maybe not much. 282 | else: 283 | self.siam_label.append(int(0))#Negative pairs. 284 | else: 285 | im_dir = self.im_dir_list[self.label[i]] 286 | gt_dir = self.gt_dir_list[self.label[i]] 287 | im_list = np.sort(os.listdir(im_dir)) 288 | gt_list = np.sort(os.listdir(gt_dir)) 289 | tmp_list = np.arange(len(im_list)) 290 | shuffle(tmp_list) #Randomly select one. 291 | im_path = os.path.join(im_dir, im_list[tmp_list[0]]) 292 | gt_path = os.path.join(gt_dir, gt_list[tmp_list[0]]) 293 | label = self.label[i] 294 | self.siam_label.append(int(1))#This is a postive pair. 295 | 296 | im_, gt_, mask_= self.load_data(im_path, gt_path) 297 | self.im.append(im_) 298 | self.gt.append(gt_) 299 | self.mask.append(mask_) 300 | self.label.append(label) 301 | self.inner_label.append(int(1))#Allways be ones, for constrastive learning. 
302 | self.exter_label.append(int(0)) 303 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 304 | one_mask_ = one_mask_ + 1.0 305 | self.one_mask.append(one_mask_) 306 | self.label_plus.append(label) 307 | 308 | self.im = np.array(self.im).astype(np.float32) 309 | self.inner_label = np.array(self.inner_label).astype(np.float32) 310 | self.exter_label = np.array(self.exter_label).astype(np.float32) 311 | self.one_mask = np.array(self.one_mask).astype(np.float32) 312 | self.label = np.array(self.label).astype(np.float32) 313 | self.label_plus = np.array(self.label_plus).astype(np.float32) 314 | self.gt = np.array(self.gt).astype(np.float32) 315 | self.mask = np.array(self.mask).astype(np.float32) 316 | self.siam_label = np.array(self.siam_label).astype(np.float32) 317 | # Reshape tops to fit blobs 318 | top[0].reshape(*self.im.shape) 319 | top[1].reshape(*self.inner_label.shape) 320 | top[2].reshape(*self.exter_label.shape) 321 | top[3].reshape(*self.one_mask.shape) 322 | top[4].reshape(*self.label.shape) 323 | top[5].reshape(*self.label_plus.shape) 324 | top[6].reshape(*self.gt.shape) 325 | top[7].reshape(*self.mask.shape) 326 | top[8].reshape(*self.siam_label.shape) 327 | 328 | def data_processor(self, data_name): 329 | data_dic = './' + data_name 330 | if not os.path.exists(data_dic): 331 | im_list = [] 332 | gt_list = [] 333 | labels = [] 334 | im_dir_list = [] 335 | gt_dir_list = [] 336 | new_id = 0 337 | id_list = np.sort(os.listdir(self.im_path)) 338 | for id in id_list: 339 | im_dir = os.path.join(self.im_path, id) 340 | gt_dir = os.path.join(self.gt_path, id) 341 | if not os.path.exists(im_dir): 342 | continue 343 | pic_im_list = np.sort(os.listdir(im_dir)) 344 | if len(pic_im_list)>1: 345 | for pic in pic_im_list: 346 | this_dir = os.path.join(self.im_path, id, pic) 347 | gt_pic = pic 348 | if not pic.lower().endswith('.png'): 349 | gt_pic = pic[:-4] + '.png' 350 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 351 | im_list.append(this_dir) 352 | gt_list.append(this_gt_dir) 353 | labels.append(int(new_id)) 354 | new_id +=1 355 | im_dir_list.append(im_dir) 356 | gt_dir_list.append(gt_dir) 357 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 358 | mypickle(data_dic, dic) 359 | # Load saved data dict to resume. 
360 | else: 361 | dic = myunpickle(data_dic) 362 | im_list = dic['im_list'] 363 | gt_list = dic['gt_list'] 364 | labels = dic['labels'] 365 | im_dir_list = dic['im_dir_list'] 366 | gt_dir_list = dic['gt_dir_list'] 367 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 368 | 369 | def load_data(self, im_path, gt_path): 370 | """ 371 | Load input image and preprocess for Caffe: 372 | - cast to float 373 | - switch channels RGB -> BGR 374 | - subtract mean 375 | - transpose to channel x height x width order 376 | """ 377 | oim = cv2.imread(im_path) 378 | inputImage = cv2.resize(oim, (self.width, self.height)) 379 | inputImage = np.array(inputImage, dtype=np.float32) 380 | 381 | # Substract mean 382 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 383 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 384 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 385 | 386 | # Permute dimensions 387 | if_flip = nr.randint(2) 388 | if if_flip == 0: # Also flip the image with 50% probability 389 | inputImage = inputImage[:,::-1,:] 390 | inputImage = inputImage.transpose([2, 0, 1]) 391 | inputImage = inputImage/256.0 392 | #GT 393 | mask_im= cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 394 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 395 | inputGt = inputGt/255.0 396 | if if_flip == 0: 397 | inputGt = inputGt[:,::-1] 398 | inputGt = inputGt[np.newaxis, ...] 399 | #Mask 400 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 401 | inputMask = inputMask-127.5 402 | inputMask = inputMask/255.0 403 | if if_flip == 0: 404 | inputMask = inputMask[:,::-1] 405 | inputMask = inputMask[np.newaxis, ...] 406 | return inputImage, inputGt, inputMask 407 | -------------------------------------------------------------------------------- /experiments/market1501/mgcam_iter_75000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/market1501/mgcam_iter_75000.caffemodel -------------------------------------------------------------------------------- /experiments/market1501/mgcam_siamese_iter_20000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/market1501/mgcam_siamese_iter_20000.caffemodel -------------------------------------------------------------------------------- /experiments/market1501/mgcam_train.prototxt: -------------------------------------------------------------------------------- 1 | layer { 2 | name: "data" 3 | type: "Python" 4 | top: "data" 5 | top: "sim_iner" 6 | top: "sim_exter" 7 | top: "one_mask" 8 | top: "label" 9 | top: "label_plus" 10 | top: "gt" 11 | top: "mask" 12 | include { 13 | phase: TRAIN 14 | } 15 | python_param { 16 | module: "layers" 17 | layer: "MGCAM_DataLayer" 18 | param_str: "{'batch_size': 128,'im_path':'../../data/market1501/bounding_box_train_fold','gt_path':'../../data/market1501/bounding_box_train_seg_fold/','dataset':'market1501'}" 19 | } 20 | } 21 | 22 | layer { 23 | name: "data" 24 | type: "Python" 25 | top: "data" 26 | top: "sim_iner" 27 | top: "sim_exter" 28 | top: "one_mask" 29 | top: "label" 30 | top: "label_plus" 31 | top: "gt" 32 | top: "mask" 33 | include { 34 | phase: TEST 35 | }#Note the testing set is same with the training here. 
36 | python_param { 37 | module: "layers" 38 | layer: "MGCAM_DataLayer" 39 | param_str: "{'batch_size': 10,'im_path':'../../data/market1501/bounding_box_train_fold','gt_path':'../../data/market1501/bounding_box_train_seg_fold/','dataset':'market1501'}" 40 | } 41 | } 42 | 43 | layer { 44 | name: "data_concate" 45 | type: "Concat" 46 | bottom: "data" 47 | bottom: "mask" 48 | top: "data_concate" 49 | } 50 | 51 | layer { 52 | name: "conv0_scale1_full" 53 | type: "Convolution" 54 | bottom: "data_concate" 55 | top: "conv0_scale1_full" 56 | param { 57 | name: "conv0_scale1_w_full" 58 | lr_mult: 1 59 | decay_mult: 1 60 | } 61 | param { 62 | name: "conv0_scale1_b_full" 63 | lr_mult: 2 64 | decay_mult: 0 65 | } 66 | convolution_param { 67 | num_output: 32 68 | pad: 2 69 | kernel_size: 5 70 | stride: 1 71 | weight_filler { 72 | type: "xavier" 73 | } 74 | bias_filler { 75 | type: "constant" 76 | } 77 | dilation: 1 78 | } 79 | } 80 | layer { 81 | name: "bn0_scale1_full" 82 | type: "BatchNorm" 83 | bottom: "conv0_scale1_full" 84 | top: "bn0_scale1_full" 85 | param { 86 | name: "bn0_scale1_mean_full" 87 | lr_mult: 0 88 | decay_mult: 0 89 | } 90 | param { 91 | name: "bn0_scale1_var_full" 92 | lr_mult: 0 93 | decay_mult: 0 94 | } 95 | param { 96 | name: "bn0_scale1_bias_full" 97 | lr_mult: 0 98 | decay_mult: 0 99 | } 100 | batch_norm_param { 101 | use_global_stats: false 102 | } 103 | } 104 | layer { 105 | name: "relu0_full" 106 | type: "ReLU" 107 | bottom: "bn0_scale1_full" 108 | top: "bn0_scale1_full" 109 | } 110 | layer { 111 | name: "pool0_full" 112 | type: "Pooling" 113 | bottom: "bn0_scale1_full" 114 | top: "pool0_full" 115 | pooling_param { 116 | pool: MAX 117 | kernel_size: 2 118 | stride: 2 119 | } 120 | } 121 | layer { 122 | name: "conv1_scale1_full" 123 | type: "Convolution" 124 | bottom: "pool0_full" 125 | top: "conv1_scale1_full" 126 | param { 127 | name: "conv1_scale1_w_full" 128 | lr_mult: 1 129 | decay_mult: 1 130 | } 131 | param { 132 | name: "conv1_scale1_b_full" 133 | lr_mult: 2 134 | decay_mult: 0 135 | } 136 | convolution_param { 137 | num_output: 32 138 | pad: 1 139 | kernel_size: 3 140 | stride: 1 141 | weight_filler { 142 | type: "xavier" 143 | } 144 | bias_filler { 145 | type: "constant" 146 | } 147 | dilation: 1 148 | } 149 | } 150 | layer { 151 | name: "conv1_scale2_full" 152 | type: "Convolution" 153 | bottom: "pool0_full" 154 | top: "conv1_scale2_full" 155 | param { 156 | name: "conv1_scale2_w_full" 157 | lr_mult: 1 158 | decay_mult: 1 159 | } 160 | param { 161 | name: "conv1_scale2_b_full" 162 | lr_mult: 2 163 | decay_mult: 0 164 | } 165 | convolution_param { 166 | num_output: 32 167 | pad: 2 168 | kernel_size: 3 169 | stride: 1 170 | weight_filler { 171 | type: "xavier" 172 | } 173 | bias_filler { 174 | type: "constant" 175 | } 176 | dilation: 2 177 | } 178 | } 179 | layer { 180 | name: "conv1_scale3_full" 181 | type: "Convolution" 182 | bottom: "pool0_full" 183 | top: "conv1_scale3_full" 184 | param { 185 | name: "conv1_scale3_w_full" 186 | lr_mult: 1 187 | decay_mult: 1 188 | } 189 | param { 190 | name: "conv1_scale3_b_full" 191 | lr_mult: 2 192 | decay_mult: 0 193 | } 194 | convolution_param { 195 | num_output: 32 196 | pad: 3 197 | kernel_size: 3 198 | stride: 1 199 | weight_filler { 200 | type: "xavier" 201 | } 202 | bias_filler { 203 | type: "constant" 204 | } 205 | dilation: 3 206 | } 207 | } 208 | layer { 209 | name: "bn1_scale1_full" 210 | type: "BatchNorm" 211 | bottom: "conv1_scale1_full" 212 | top: "bn1_scale1_full" 213 | param { 214 | name: "bn1_scale1_mean_full" 
215 | lr_mult: 0 216 | decay_mult: 0 217 | } 218 | param { 219 | name: "bn1_scale1_var_full" 220 | lr_mult: 0 221 | decay_mult: 0 222 | } 223 | param { 224 | name: "bn1_scale1_bias_full" 225 | lr_mult: 0 226 | decay_mult: 0 227 | } 228 | batch_norm_param { 229 | use_global_stats: false 230 | } 231 | } 232 | layer { 233 | name: "bn1_scale2_full" 234 | type: "BatchNorm" 235 | bottom: "conv1_scale2_full" 236 | top: "bn1_scale2_full" 237 | param { 238 | name: "bn1_scale2_mean_full" 239 | lr_mult: 0 240 | decay_mult: 0 241 | } 242 | param { 243 | name: "bn1_scale2_var_full" 244 | lr_mult: 0 245 | decay_mult: 0 246 | } 247 | param { 248 | name: "bn1_scale2_bias_full" 249 | lr_mult: 0 250 | decay_mult: 0 251 | } 252 | batch_norm_param { 253 | use_global_stats: false 254 | } 255 | } 256 | layer { 257 | name: "bn1_scale3_full" 258 | type: "BatchNorm" 259 | bottom: "conv1_scale3_full" 260 | top: "bn1_scale3_full" 261 | param { 262 | name: "bn1_scale3_mean_full" 263 | lr_mult: 0 264 | decay_mult: 0 265 | } 266 | param { 267 | name: "bn1_scale3_var_full" 268 | lr_mult: 0 269 | decay_mult: 0 270 | } 271 | param { 272 | name: "bn1_scale3_bias_full" 273 | lr_mult: 0 274 | decay_mult: 0 275 | } 276 | batch_norm_param { 277 | use_global_stats: false 278 | } 279 | } 280 | layer { 281 | name: "bn1_full" 282 | type: "Concat" 283 | bottom: "bn1_scale1_full" 284 | bottom: "bn1_scale2_full" 285 | bottom: "bn1_scale3_full" 286 | top: "bn1_full" 287 | concat_param { 288 | axis: 1 289 | } 290 | } 291 | layer { 292 | name: "relu1_full" 293 | type: "ReLU" 294 | bottom: "bn1_full" 295 | top: "bn1_full" 296 | } 297 | layer { 298 | name: "pool1_full" 299 | type: "Pooling" 300 | bottom: "bn1_full" 301 | top: "pool1_full" 302 | pooling_param { 303 | pool: MAX 304 | kernel_size: 2 305 | stride: 2 306 | } 307 | } 308 | layer { 309 | name: "conv2_scale1_full" 310 | type: "Convolution" 311 | bottom: "pool1_full" 312 | top: "conv2_scale1_full" 313 | param { 314 | name: "conv2_scale1_w_full" 315 | lr_mult: 1 316 | decay_mult: 1 317 | } 318 | param { 319 | name: "conv2_scale1_b_full" 320 | lr_mult: 2 321 | decay_mult: 0 322 | } 323 | convolution_param { 324 | num_output: 32 325 | pad: 1 326 | kernel_size: 3 327 | stride: 1 328 | weight_filler { 329 | type: "xavier" 330 | } 331 | bias_filler { 332 | type: "constant" 333 | } 334 | dilation: 1 335 | } 336 | } 337 | layer { 338 | name: "conv2_scale2_full" 339 | type: "Convolution" 340 | bottom: "pool1_full" 341 | top: "conv2_scale2_full" 342 | param { 343 | name: "conv2_scale2_w_full" 344 | lr_mult: 1 345 | decay_mult: 1 346 | } 347 | param { 348 | name: "conv2_scale2_b_full" 349 | lr_mult: 2 350 | decay_mult: 0 351 | } 352 | convolution_param { 353 | num_output: 32 354 | pad: 2 355 | kernel_size: 3 356 | stride: 1 357 | weight_filler { 358 | type: "xavier" 359 | } 360 | bias_filler { 361 | type: "constant" 362 | } 363 | dilation: 2 364 | } 365 | } 366 | layer { 367 | name: "conv2_scale3_full" 368 | type: "Convolution" 369 | bottom: "pool1_full" 370 | top: "conv2_scale3_full" 371 | param { 372 | name: "conv2_scale3_w_full" 373 | lr_mult: 1 374 | decay_mult: 1 375 | } 376 | param { 377 | name: "conv2_scale3_b_full" 378 | lr_mult: 2 379 | decay_mult: 0 380 | } 381 | convolution_param { 382 | num_output: 32 383 | pad: 3 384 | kernel_size: 3 385 | stride: 1 386 | weight_filler { 387 | type: "xavier" 388 | } 389 | bias_filler { 390 | type: "constant" 391 | } 392 | dilation: 3 393 | } 394 | } 395 | layer { 396 | name: "bn2_scale1_full" 397 | type: "BatchNorm" 398 | bottom: 
"conv2_scale1_full" 399 | top: "bn2_scale1_full" 400 | param { 401 | name: "bn2_scale1_mean_full" 402 | lr_mult: 0 403 | decay_mult: 0 404 | } 405 | param { 406 | name: "bn2_scale1_var_full" 407 | lr_mult: 0 408 | decay_mult: 0 409 | } 410 | param { 411 | name: "bn2_scale1_bias_full" 412 | lr_mult: 0 413 | decay_mult: 0 414 | } 415 | batch_norm_param { 416 | use_global_stats: false 417 | } 418 | } 419 | layer { 420 | name: "bn2_scale2_full" 421 | type: "BatchNorm" 422 | bottom: "conv2_scale2_full" 423 | top: "bn2_scale2_full" 424 | param { 425 | name: "bn2_scale2_mean_full" 426 | lr_mult: 0 427 | decay_mult: 0 428 | } 429 | param { 430 | name: "bn2_scale2_var_full" 431 | lr_mult: 0 432 | decay_mult: 0 433 | } 434 | param { 435 | name: "bn2_scale2_bias_full" 436 | lr_mult: 0 437 | decay_mult: 0 438 | } 439 | batch_norm_param { 440 | use_global_stats: false 441 | } 442 | } 443 | layer { 444 | name: "bn2_scale3_full" 445 | type: "BatchNorm" 446 | bottom: "conv2_scale3_full" 447 | top: "bn2_scale3_full" 448 | param { 449 | name: "bn2_scale3_mean_full" 450 | lr_mult: 0 451 | decay_mult: 0 452 | } 453 | param { 454 | name: "bn2_scale3_var_full" 455 | lr_mult: 0 456 | decay_mult: 0 457 | } 458 | param { 459 | name: "bn2_scale3_bias_full" 460 | lr_mult: 0 461 | decay_mult: 0 462 | } 463 | batch_norm_param { 464 | use_global_stats: false 465 | } 466 | } 467 | layer { 468 | name: "bn2_full" 469 | type: "Concat" 470 | bottom: "bn2_scale1_full" 471 | bottom: "bn2_scale2_full" 472 | bottom: "bn2_scale3_full" 473 | top: "bn2_full" 474 | concat_param { 475 | axis: 1 476 | } 477 | } 478 | layer { 479 | name: "relu2_full" 480 | type: "ReLU" 481 | bottom: "bn2_full" 482 | top: "bn2_full" 483 | } 484 | 485 | 486 | layer { 487 | name: "make_att_mask" 488 | type: "Convolution" 489 | bottom: "bn2_full" 490 | top: "make_att_mask" 491 | param { 492 | lr_mult: 1 493 | decay_mult: 1 494 | } 495 | param { 496 | lr_mult: 2 497 | decay_mult: 0 498 | } 499 | convolution_param { 500 | num_output: 1 501 | pad: 1 502 | kernel_size: 3 503 | stride: 1 504 | weight_filler { 505 | type: "xavier" 506 | } 507 | bias_filler { 508 | type: "constant" 509 | } 510 | dilation: 1 511 | } 512 | } 513 | 514 | layer { 515 | name: "att_sigmoid" 516 | type: "Sigmoid" 517 | bottom: "make_att_mask" 518 | top: "make_att_mask" 519 | } 520 | 521 | ############### Seg Loss ##################### 522 | layer { 523 | name: "loss_seg" 524 | type: "EuclideanLoss" 525 | bottom: "gt" 526 | bottom: "make_att_mask" 527 | top: "loss_seg" 528 | loss_weight: 0.1 529 | } 530 | 531 | layer { 532 | name: "make_att_mask_inv" 533 | type: "Eltwise" 534 | bottom: "one_mask" 535 | bottom: "make_att_mask" 536 | top: "make_att_mask_inv" 537 | eltwise_param { 538 | operation: SUM 539 | coeff: 1 540 | coeff: -1 541 | } 542 | } 543 | ############### Seg Loss ##################### 544 | 545 | layer { 546 | name: "tile_iner" 547 | type: "Tile" 548 | bottom: "make_att_mask" 549 | top: "att_iner" 550 | tile_param { 551 | tiles: 96 552 | axis: 1 553 | } 554 | } 555 | 556 | layer { 557 | name: "bn2_att_iner" 558 | type: "Eltwise" 559 | bottom: "bn2_full" 560 | bottom: "att_iner" 561 | top: "bn2_att_iner" 562 | eltwise_param { 563 | operation: PROD 564 | } 565 | } 566 | 567 | layer { 568 | name: "tile_exter" 569 | type: "Tile" 570 | bottom: "make_att_mask_inv" 571 | top: "att_exter" 572 | tile_param { 573 | tiles: 96 574 | axis: 1 575 | } 576 | } 577 | 578 | layer { 579 | name: "bn2_att_exter" 580 | type: "Eltwise" 581 | bottom: "bn2_full" 582 | bottom: "att_exter" 583 | top: 
"bn2_att_exter" 584 | eltwise_param { 585 | operation: PROD 586 | } 587 | } 588 | 589 | layer { 590 | name: "pool2_full" 591 | type: "Pooling" 592 | bottom: "bn2_full" 593 | top: "pool2_full" 594 | pooling_param { 595 | pool: MAX 596 | kernel_size: 2 597 | stride: 2 598 | } 599 | } 600 | 601 | layer { 602 | name: "conv3_scale1_full" 603 | type: "Convolution" 604 | bottom: "pool2_full" 605 | top: "conv3_scale1_full" 606 | param { 607 | name: "conv3_scale1_w_full" 608 | lr_mult: 1 609 | decay_mult: 1 610 | } 611 | param { 612 | name: "conv3_scale1_b_full" 613 | lr_mult: 2 614 | decay_mult: 0 615 | } 616 | convolution_param { 617 | num_output: 32 618 | pad: 1 619 | kernel_size: 3 620 | stride: 1 621 | weight_filler { 622 | type: "xavier" 623 | } 624 | bias_filler { 625 | type: "constant" 626 | } 627 | dilation: 1 628 | } 629 | } 630 | layer { 631 | name: "conv3_scale2_full" 632 | type: "Convolution" 633 | bottom: "pool2_full" 634 | top: "conv3_scale2_full" 635 | param { 636 | name: "conv3_scale2_w_full" 637 | lr_mult: 1 638 | decay_mult: 1 639 | } 640 | param { 641 | name: "conv3_scale2_b_full" 642 | lr_mult: 2 643 | decay_mult: 0 644 | } 645 | convolution_param { 646 | num_output: 32 647 | pad: 2 648 | kernel_size: 3 649 | stride: 1 650 | weight_filler { 651 | type: "xavier" 652 | } 653 | bias_filler { 654 | type: "constant" 655 | } 656 | dilation: 2 657 | } 658 | } 659 | layer { 660 | name: "conv3_scale3_full" 661 | type: "Convolution" 662 | bottom: "pool2_full" 663 | top: "conv3_scale3_full" 664 | param { 665 | name: "conv3_scale3_w_full" 666 | lr_mult: 1 667 | decay_mult: 1 668 | } 669 | param { 670 | name: "conv3_scale3_b_full" 671 | lr_mult: 2 672 | decay_mult: 0 673 | } 674 | convolution_param { 675 | num_output: 32 676 | pad: 3 677 | kernel_size: 3 678 | stride: 1 679 | weight_filler { 680 | type: "xavier" 681 | } 682 | bias_filler { 683 | type: "constant" 684 | } 685 | dilation: 3 686 | } 687 | } 688 | layer { 689 | name: "bn3_scale1_full" 690 | type: "BatchNorm" 691 | bottom: "conv3_scale1_full" 692 | top: "bn3_scale1_full" 693 | param { 694 | name: "bn3_scale1_mean_full" 695 | lr_mult: 0 696 | decay_mult: 0 697 | } 698 | param { 699 | name: "bn3_scale1_var_full" 700 | lr_mult: 0 701 | decay_mult: 0 702 | } 703 | param { 704 | name: "bn3_scale1_bias_full" 705 | lr_mult: 0 706 | decay_mult: 0 707 | } 708 | batch_norm_param { 709 | use_global_stats: false 710 | } 711 | } 712 | layer { 713 | name: "bn3_scale2_full" 714 | type: "BatchNorm" 715 | bottom: "conv3_scale2_full" 716 | top: "bn3_scale2_full" 717 | param { 718 | name: "bn3_scale2_mean_full" 719 | lr_mult: 0 720 | decay_mult: 0 721 | } 722 | param { 723 | name: "bn3_scale2_var_full" 724 | lr_mult: 0 725 | decay_mult: 0 726 | } 727 | param { 728 | name: "bn3_scale2_bias_full" 729 | lr_mult: 0 730 | decay_mult: 0 731 | } 732 | batch_norm_param { 733 | use_global_stats: false 734 | } 735 | } 736 | layer { 737 | name: "bn3_scale3_full" 738 | type: "BatchNorm" 739 | bottom: "conv3_scale3_full" 740 | top: "bn3_scale3_full" 741 | param { 742 | name: "bn3_scale3_mean_full" 743 | lr_mult: 0 744 | decay_mult: 0 745 | } 746 | param { 747 | name: "bn3_scale3_var_full" 748 | lr_mult: 0 749 | decay_mult: 0 750 | } 751 | param { 752 | name: "bn3_scale3_bias_full" 753 | lr_mult: 0 754 | decay_mult: 0 755 | } 756 | batch_norm_param { 757 | use_global_stats: false 758 | } 759 | } 760 | layer { 761 | name: "bn3_full" 762 | type: "Concat" 763 | bottom: "bn3_scale1_full" 764 | bottom: "bn3_scale2_full" 765 | bottom: "bn3_scale3_full" 766 | top: 
"bn3_full" 767 | concat_param { 768 | axis: 1 769 | } 770 | } 771 | layer { 772 | name: "relu3_full" 773 | type: "ReLU" 774 | bottom: "bn3_full" 775 | top: "bn3_full" 776 | } 777 | layer { 778 | name: "pool3_full" 779 | type: "Pooling" 780 | bottom: "bn3_full" 781 | top: "pool3_full" 782 | pooling_param { 783 | pool: MAX 784 | kernel_size: 2 785 | stride: 2 786 | } 787 | } 788 | layer { 789 | name: "conv4_scale1_full" 790 | type: "Convolution" 791 | bottom: "pool3_full" 792 | top: "conv4_scale1_full" 793 | param { 794 | name: "conv4_scale1_w_full" 795 | lr_mult: 1 796 | decay_mult: 1 797 | } 798 | param { 799 | name: "conv4_scale1_b_full" 800 | lr_mult: 2 801 | decay_mult: 0 802 | } 803 | convolution_param { 804 | num_output: 32 805 | pad: 1 806 | kernel_size: 3 807 | stride: 1 808 | weight_filler { 809 | type: "xavier" 810 | } 811 | bias_filler { 812 | type: "constant" 813 | } 814 | dilation: 1 815 | } 816 | } 817 | layer { 818 | name: "conv4_scale2_full" 819 | type: "Convolution" 820 | bottom: "pool3_full" 821 | top: "conv4_scale2_full" 822 | param { 823 | name: "conv4_scale2_w_full" 824 | lr_mult: 1 825 | decay_mult: 1 826 | } 827 | param { 828 | name: "conv4_scale2_b_full" 829 | lr_mult: 2 830 | decay_mult: 0 831 | } 832 | convolution_param { 833 | num_output: 32 834 | pad: 2 835 | kernel_size: 3 836 | stride: 1 837 | weight_filler { 838 | type: "xavier" 839 | } 840 | bias_filler { 841 | type: "constant" 842 | } 843 | dilation: 2 844 | } 845 | } 846 | layer { 847 | name: "conv4_scale3_full" 848 | type: "Convolution" 849 | bottom: "pool3_full" 850 | top: "conv4_scale3_full" 851 | param { 852 | name: "conv4_scale3_w_full" 853 | lr_mult: 1 854 | decay_mult: 1 855 | } 856 | param { 857 | name: "conv4_scale3_b_full" 858 | lr_mult: 2 859 | decay_mult: 0 860 | } 861 | convolution_param { 862 | num_output: 32 863 | pad: 3 864 | kernel_size: 3 865 | stride: 1 866 | weight_filler { 867 | type: "xavier" 868 | } 869 | bias_filler { 870 | type: "constant" 871 | } 872 | dilation: 3 873 | } 874 | } 875 | layer { 876 | name: "bn4_scale1_full" 877 | type: "BatchNorm" 878 | bottom: "conv4_scale1_full" 879 | top: "bn4_scale1_full" 880 | param { 881 | name: "bn4_scale1_mean_full" 882 | lr_mult: 0 883 | decay_mult: 0 884 | } 885 | param { 886 | name: "bn4_scale1_var_full" 887 | lr_mult: 0 888 | decay_mult: 0 889 | } 890 | param { 891 | name: "bn4_scale1_bias_full" 892 | lr_mult: 0 893 | decay_mult: 0 894 | } 895 | batch_norm_param { 896 | use_global_stats: false 897 | } 898 | } 899 | layer { 900 | name: "bn4_scale2_full" 901 | type: "BatchNorm" 902 | bottom: "conv4_scale2_full" 903 | top: "bn4_scale2_full" 904 | param { 905 | name: "bn4_scale2_mean_full" 906 | lr_mult: 0 907 | decay_mult: 0 908 | } 909 | param { 910 | name: "bn4_scale2_var_full" 911 | lr_mult: 0 912 | decay_mult: 0 913 | } 914 | param { 915 | name: "bn4_scale2_bias_full" 916 | lr_mult: 0 917 | decay_mult: 0 918 | } 919 | batch_norm_param { 920 | use_global_stats: false 921 | } 922 | } 923 | layer { 924 | name: "bn4_scale3_full" 925 | type: "BatchNorm" 926 | bottom: "conv4_scale3_full" 927 | top: "bn4_scale3_full" 928 | param { 929 | name: "bn4_scale3_mean_full" 930 | lr_mult: 0 931 | decay_mult: 0 932 | } 933 | param { 934 | name: "bn4_scale3_var_full" 935 | lr_mult: 0 936 | decay_mult: 0 937 | } 938 | param { 939 | name: "bn4_scale3_bias_full" 940 | lr_mult: 0 941 | decay_mult: 0 942 | } 943 | batch_norm_param { 944 | use_global_stats: false 945 | } 946 | } 947 | layer { 948 | name: "bn4_full" 949 | type: "Concat" 950 | bottom: 
"bn4_scale1_full" 951 | bottom: "bn4_scale2_full" 952 | bottom: "bn4_scale3_full" 953 | top: "bn4_full" 954 | concat_param { 955 | axis: 1 956 | } 957 | } 958 | layer { 959 | name: "relu4_full" 960 | type: "ReLU" 961 | bottom: "bn4_full" 962 | top: "bn4_full" 963 | } 964 | layer { 965 | name: "pool4_full" 966 | type: "Pooling" 967 | bottom: "bn4_full" 968 | top: "pool4_full" 969 | pooling_param { 970 | pool: MAX 971 | kernel_size: 2 972 | stride: 2 973 | } 974 | } 975 | layer { 976 | name: "fc1_full" 977 | type: "InnerProduct" 978 | bottom: "pool4_full" 979 | top: "fc1_full" 980 | param { 981 | name: "fc1_w_full" 982 | lr_mult: 1 983 | decay_mult: 1 984 | } 985 | param { 986 | name: "fc1_b_full" 987 | lr_mult: 2 988 | decay_mult: 0 989 | } 990 | inner_product_param { 991 | num_output: 128 992 | weight_filler { 993 | type: "xavier" 994 | } 995 | bias_filler { 996 | type: "constant" 997 | } 998 | } 999 | } 1000 | layer { 1001 | name: "fc1_full_drop" 1002 | type: "Dropout" 1003 | bottom: "fc1_full" 1004 | top: "fc1_full" 1005 | dropout_param { 1006 | dropout_ratio: 0.2 1007 | } 1008 | } 1009 | 1010 | layer { 1011 | name: "fc2_full" 1012 | type: "InnerProduct" 1013 | bottom: "fc1_full" 1014 | top: "fc2_full" 1015 | param { 1016 | lr_mult: 1 1017 | decay_mult: 1 1018 | } 1019 | param { 1020 | lr_mult: 2 1021 | decay_mult: 0 1022 | } 1023 | inner_product_param { 1024 | num_output: 751 1025 | weight_filler { 1026 | type: "xavier" 1027 | } 1028 | bias_filler { 1029 | type: "constant" 1030 | } 1031 | } 1032 | } 1033 | layer { 1034 | name: "loss_cls_full" 1035 | type: "SoftmaxWithLoss" 1036 | bottom: "fc2_full" 1037 | bottom: "label" 1038 | top: "loss_cls_full" 1039 | loss_weight: 1 1040 | } 1041 | layer { 1042 | name: "acc_cls_full" 1043 | type: "Accuracy" 1044 | bottom: "fc2_full" 1045 | bottom: "label" 1046 | top: "acc_cls_full" 1047 | } 1048 | 1049 | layer { 1050 | name: "pool2_iner" 1051 | type: "Pooling" 1052 | bottom: "bn2_att_iner" 1053 | top: "pool2_iner" 1054 | pooling_param { 1055 | pool: MAX 1056 | kernel_size: 2 1057 | stride: 2 1058 | } 1059 | } 1060 | 1061 | layer { 1062 | name: "conv3_scale1_iner" 1063 | type: "Convolution" 1064 | bottom: "pool2_iner" 1065 | top: "conv3_scale1_iner" 1066 | param { 1067 | name: "conv3_scale1_w_iner" 1068 | lr_mult: 1 1069 | decay_mult: 1 1070 | } 1071 | param { 1072 | name: "conv3_scale1_b_iner" 1073 | lr_mult: 2 1074 | decay_mult: 0 1075 | } 1076 | convolution_param { 1077 | num_output: 32 1078 | pad: 1 1079 | kernel_size: 3 1080 | stride: 1 1081 | weight_filler { 1082 | type: "xavier" 1083 | } 1084 | bias_filler { 1085 | type: "constant" 1086 | } 1087 | dilation: 1 1088 | } 1089 | } 1090 | layer { 1091 | name: "conv3_scale2_iner" 1092 | type: "Convolution" 1093 | bottom: "pool2_iner" 1094 | top: "conv3_scale2_iner" 1095 | param { 1096 | name: "conv3_scale2_w_iner" 1097 | lr_mult: 1 1098 | decay_mult: 1 1099 | } 1100 | param { 1101 | name: "conv3_scale2_b_iner" 1102 | lr_mult: 2 1103 | decay_mult: 0 1104 | } 1105 | convolution_param { 1106 | num_output: 32 1107 | pad: 2 1108 | kernel_size: 3 1109 | stride: 1 1110 | weight_filler { 1111 | type: "xavier" 1112 | } 1113 | bias_filler { 1114 | type: "constant" 1115 | } 1116 | dilation: 2 1117 | } 1118 | } 1119 | layer { 1120 | name: "conv3_scale3_iner" 1121 | type: "Convolution" 1122 | bottom: "pool2_iner" 1123 | top: "conv3_scale3_iner" 1124 | param { 1125 | name: "conv3_scale3_w_iner" 1126 | lr_mult: 1 1127 | decay_mult: 1 1128 | } 1129 | param { 1130 | name: "conv3_scale3_b_iner" 1131 | lr_mult: 2 1132 
| decay_mult: 0 1133 | } 1134 | convolution_param { 1135 | num_output: 32 1136 | pad: 3 1137 | kernel_size: 3 1138 | stride: 1 1139 | weight_filler { 1140 | type: "xavier" 1141 | } 1142 | bias_filler { 1143 | type: "constant" 1144 | } 1145 | dilation: 3 1146 | } 1147 | } 1148 | layer { 1149 | name: "bn3_scale1_iner" 1150 | type: "BatchNorm" 1151 | bottom: "conv3_scale1_iner" 1152 | top: "bn3_scale1_iner" 1153 | param { 1154 | name: "bn3_scale1_mean_iner" 1155 | lr_mult: 0 1156 | decay_mult: 0 1157 | } 1158 | param { 1159 | name: "bn3_scale1_var_iner" 1160 | lr_mult: 0 1161 | decay_mult: 0 1162 | } 1163 | param { 1164 | name: "bn3_scale1_bias_iner" 1165 | lr_mult: 0 1166 | decay_mult: 0 1167 | } 1168 | batch_norm_param { 1169 | use_global_stats: false 1170 | } 1171 | } 1172 | layer { 1173 | name: "bn3_scale2_iner" 1174 | type: "BatchNorm" 1175 | bottom: "conv3_scale2_iner" 1176 | top: "bn3_scale2_iner" 1177 | param { 1178 | name: "bn3_scale2_mean_iner" 1179 | lr_mult: 0 1180 | decay_mult: 0 1181 | } 1182 | param { 1183 | name: "bn3_scale2_var_iner" 1184 | lr_mult: 0 1185 | decay_mult: 0 1186 | } 1187 | param { 1188 | name: "bn3_scale2_bias_iner" 1189 | lr_mult: 0 1190 | decay_mult: 0 1191 | } 1192 | batch_norm_param { 1193 | use_global_stats: false 1194 | } 1195 | } 1196 | layer { 1197 | name: "bn3_scale3_iner" 1198 | type: "BatchNorm" 1199 | bottom: "conv3_scale3_iner" 1200 | top: "bn3_scale3_iner" 1201 | param { 1202 | name: "bn3_scale3_mean_iner" 1203 | lr_mult: 0 1204 | decay_mult: 0 1205 | } 1206 | param { 1207 | name: "bn3_scale3_var_iner" 1208 | lr_mult: 0 1209 | decay_mult: 0 1210 | } 1211 | param { 1212 | name: "bn3_scale3_bias_iner" 1213 | lr_mult: 0 1214 | decay_mult: 0 1215 | } 1216 | batch_norm_param { 1217 | use_global_stats: false 1218 | } 1219 | } 1220 | layer { 1221 | name: "bn3_iner" 1222 | type: "Concat" 1223 | bottom: "bn3_scale1_iner" 1224 | bottom: "bn3_scale2_iner" 1225 | bottom: "bn3_scale3_iner" 1226 | top: "bn3_iner" 1227 | concat_param { 1228 | axis: 1 1229 | } 1230 | } 1231 | layer { 1232 | name: "relu3_iner" 1233 | type: "ReLU" 1234 | bottom: "bn3_iner" 1235 | top: "bn3_iner" 1236 | } 1237 | layer { 1238 | name: "pool3_iner" 1239 | type: "Pooling" 1240 | bottom: "bn3_iner" 1241 | top: "pool3_iner" 1242 | pooling_param { 1243 | pool: MAX 1244 | kernel_size: 2 1245 | stride: 2 1246 | } 1247 | } 1248 | layer { 1249 | name: "conv4_scale1_iner" 1250 | type: "Convolution" 1251 | bottom: "pool3_iner" 1252 | top: "conv4_scale1_iner" 1253 | param { 1254 | name: "conv4_scale1_w_iner" 1255 | lr_mult: 1 1256 | decay_mult: 1 1257 | } 1258 | param { 1259 | name: "conv4_scale1_b_iner" 1260 | lr_mult: 2 1261 | decay_mult: 0 1262 | } 1263 | convolution_param { 1264 | num_output: 32 1265 | pad: 1 1266 | kernel_size: 3 1267 | stride: 1 1268 | weight_filler { 1269 | type: "xavier" 1270 | } 1271 | bias_filler { 1272 | type: "constant" 1273 | } 1274 | dilation: 1 1275 | } 1276 | } 1277 | layer { 1278 | name: "conv4_scale2_iner" 1279 | type: "Convolution" 1280 | bottom: "pool3_iner" 1281 | top: "conv4_scale2_iner" 1282 | param { 1283 | name: "conv4_scale2_w_iner" 1284 | lr_mult: 1 1285 | decay_mult: 1 1286 | } 1287 | param { 1288 | name: "conv4_scale2_b_iner" 1289 | lr_mult: 2 1290 | decay_mult: 0 1291 | } 1292 | convolution_param { 1293 | num_output: 32 1294 | pad: 2 1295 | kernel_size: 3 1296 | stride: 1 1297 | weight_filler { 1298 | type: "xavier" 1299 | } 1300 | bias_filler { 1301 | type: "constant" 1302 | } 1303 | dilation: 2 1304 | } 1305 | } 1306 | layer { 1307 | name: 
"conv4_scale3_iner" 1308 | type: "Convolution" 1309 | bottom: "pool3_iner" 1310 | top: "conv4_scale3_iner" 1311 | param { 1312 | name: "conv4_scale3_w_iner" 1313 | lr_mult: 1 1314 | decay_mult: 1 1315 | } 1316 | param { 1317 | name: "conv4_scale3_b_iner" 1318 | lr_mult: 2 1319 | decay_mult: 0 1320 | } 1321 | convolution_param { 1322 | num_output: 32 1323 | pad: 3 1324 | kernel_size: 3 1325 | stride: 1 1326 | weight_filler { 1327 | type: "xavier" 1328 | } 1329 | bias_filler { 1330 | type: "constant" 1331 | } 1332 | dilation: 3 1333 | } 1334 | } 1335 | layer { 1336 | name: "bn4_scale1_iner" 1337 | type: "BatchNorm" 1338 | bottom: "conv4_scale1_iner" 1339 | top: "bn4_scale1_iner" 1340 | param { 1341 | name: "bn4_scale1_mean_iner" 1342 | lr_mult: 0 1343 | decay_mult: 0 1344 | } 1345 | param { 1346 | name: "bn4_scale1_var_iner" 1347 | lr_mult: 0 1348 | decay_mult: 0 1349 | } 1350 | param { 1351 | name: "bn4_scale1_bias_iner" 1352 | lr_mult: 0 1353 | decay_mult: 0 1354 | } 1355 | batch_norm_param { 1356 | use_global_stats: false 1357 | } 1358 | } 1359 | layer { 1360 | name: "bn4_scale2_iner" 1361 | type: "BatchNorm" 1362 | bottom: "conv4_scale2_iner" 1363 | top: "bn4_scale2_iner" 1364 | param { 1365 | name: "bn4_scale2_mean_iner" 1366 | lr_mult: 0 1367 | decay_mult: 0 1368 | } 1369 | param { 1370 | name: "bn4_scale2_var_iner" 1371 | lr_mult: 0 1372 | decay_mult: 0 1373 | } 1374 | param { 1375 | name: "bn4_scale2_bias_iner" 1376 | lr_mult: 0 1377 | decay_mult: 0 1378 | } 1379 | batch_norm_param { 1380 | use_global_stats: false 1381 | } 1382 | } 1383 | layer { 1384 | name: "bn4_scale3_iner" 1385 | type: "BatchNorm" 1386 | bottom: "conv4_scale3_iner" 1387 | top: "bn4_scale3_iner" 1388 | param { 1389 | name: "bn4_scale3_mean_iner" 1390 | lr_mult: 0 1391 | decay_mult: 0 1392 | } 1393 | param { 1394 | name: "bn4_scale3_var_iner" 1395 | lr_mult: 0 1396 | decay_mult: 0 1397 | } 1398 | param { 1399 | name: "bn4_scale3_bias_iner" 1400 | lr_mult: 0 1401 | decay_mult: 0 1402 | } 1403 | batch_norm_param { 1404 | use_global_stats: false 1405 | } 1406 | } 1407 | layer { 1408 | name: "bn4_iner" 1409 | type: "Concat" 1410 | bottom: "bn4_scale1_iner" 1411 | bottom: "bn4_scale2_iner" 1412 | bottom: "bn4_scale3_iner" 1413 | top: "bn4_iner" 1414 | concat_param { 1415 | axis: 1 1416 | } 1417 | } 1418 | layer { 1419 | name: "relu4_iner" 1420 | type: "ReLU" 1421 | bottom: "bn4_iner" 1422 | top: "bn4_iner" 1423 | } 1424 | layer { 1425 | name: "pool4_iner" 1426 | type: "Pooling" 1427 | bottom: "bn4_iner" 1428 | top: "pool4_iner" 1429 | pooling_param { 1430 | pool: MAX 1431 | kernel_size: 2 1432 | stride: 2 1433 | } 1434 | } 1435 | layer { 1436 | name: "fc1_iner" 1437 | type: "InnerProduct" 1438 | bottom: "pool4_iner" 1439 | top: "fc1_iner" 1440 | param { 1441 | name: "fc1_w_iner" 1442 | lr_mult: 1 1443 | decay_mult: 1 1444 | } 1445 | param { 1446 | name: "fc1_b_iner" 1447 | lr_mult: 2 1448 | decay_mult: 0 1449 | } 1450 | inner_product_param { 1451 | num_output: 128 1452 | weight_filler { 1453 | type: "xavier" 1454 | } 1455 | bias_filler { 1456 | type: "constant" 1457 | } 1458 | } 1459 | } 1460 | layer { 1461 | name: "fc1_iner_drop" 1462 | type: "Dropout" 1463 | bottom: "fc1_iner" 1464 | top: "fc1_iner" 1465 | dropout_param { 1466 | dropout_ratio: 0.2 1467 | } 1468 | } 1469 | 1470 | layer { 1471 | name: "fc2_iner" 1472 | type: "InnerProduct" 1473 | bottom: "fc1_iner" 1474 | top: "fc2_iner" 1475 | param { 1476 | lr_mult: 1 1477 | decay_mult: 1 1478 | } 1479 | param { 1480 | lr_mult: 2 1481 | decay_mult: 0 1482 | } 1483 | 
inner_product_param { 1484 | num_output: 751 1485 | weight_filler { 1486 | type: "xavier" 1487 | } 1488 | bias_filler { 1489 | type: "constant" 1490 | } 1491 | } 1492 | } 1493 | layer { 1494 | name: "loss_cls_iner" 1495 | type: "SoftmaxWithLoss" 1496 | bottom: "fc2_iner" 1497 | bottom: "label" 1498 | top: "loss_cls_iner" 1499 | loss_weight: 1 1500 | } 1501 | layer { 1502 | name: "acc_cls_iner" 1503 | type: "Accuracy" 1504 | bottom: "fc2_iner" 1505 | bottom: "label" 1506 | top: "acc_cls_iner" 1507 | } 1508 | 1509 | layer { 1510 | name: "pool2_exter" 1511 | type: "Pooling" 1512 | bottom: "bn2_att_exter" 1513 | top: "pool2_exter" 1514 | pooling_param { 1515 | pool: MAX 1516 | kernel_size: 2 1517 | stride: 2 1518 | } 1519 | } 1520 | 1521 | layer { 1522 | name: "conv3_scale1_exter" 1523 | type: "Convolution" 1524 | bottom: "pool2_exter" 1525 | top: "conv3_scale1_exter" 1526 | param { 1527 | name: "conv3_scale1_w_exter" 1528 | lr_mult: 1 1529 | decay_mult: 1 1530 | } 1531 | param { 1532 | name: "conv3_scale1_b_exter" 1533 | lr_mult: 2 1534 | decay_mult: 0 1535 | } 1536 | convolution_param { 1537 | num_output: 32 1538 | pad: 1 1539 | kernel_size: 3 1540 | stride: 1 1541 | weight_filler { 1542 | type: "xavier" 1543 | } 1544 | bias_filler { 1545 | type: "constant" 1546 | } 1547 | dilation: 1 1548 | } 1549 | } 1550 | layer { 1551 | name: "conv3_scale2_exter" 1552 | type: "Convolution" 1553 | bottom: "pool2_exter" 1554 | top: "conv3_scale2_exter" 1555 | param { 1556 | name: "conv3_scale2_w_exter" 1557 | lr_mult: 1 1558 | decay_mult: 1 1559 | } 1560 | param { 1561 | name: "conv3_scale2_b_exter" 1562 | lr_mult: 2 1563 | decay_mult: 0 1564 | } 1565 | convolution_param { 1566 | num_output: 32 1567 | pad: 2 1568 | kernel_size: 3 1569 | stride: 1 1570 | weight_filler { 1571 | type: "xavier" 1572 | } 1573 | bias_filler { 1574 | type: "constant" 1575 | } 1576 | dilation: 2 1577 | } 1578 | } 1579 | layer { 1580 | name: "conv3_scale3_exter" 1581 | type: "Convolution" 1582 | bottom: "pool2_exter" 1583 | top: "conv3_scale3_exter" 1584 | param { 1585 | name: "conv3_scale3_w_exter" 1586 | lr_mult: 1 1587 | decay_mult: 1 1588 | } 1589 | param { 1590 | name: "conv3_scale3_b_exter" 1591 | lr_mult: 2 1592 | decay_mult: 0 1593 | } 1594 | convolution_param { 1595 | num_output: 32 1596 | pad: 3 1597 | kernel_size: 3 1598 | stride: 1 1599 | weight_filler { 1600 | type: "xavier" 1601 | } 1602 | bias_filler { 1603 | type: "constant" 1604 | } 1605 | dilation: 3 1606 | } 1607 | } 1608 | layer { 1609 | name: "bn3_scale1_exter" 1610 | type: "BatchNorm" 1611 | bottom: "conv3_scale1_exter" 1612 | top: "bn3_scale1_exter" 1613 | param { 1614 | name: "bn3_scale1_mean_exter" 1615 | lr_mult: 0 1616 | decay_mult: 0 1617 | } 1618 | param { 1619 | name: "bn3_scale1_var_exter" 1620 | lr_mult: 0 1621 | decay_mult: 0 1622 | } 1623 | param { 1624 | name: "bn3_scale1_bias_exter" 1625 | lr_mult: 0 1626 | decay_mult: 0 1627 | } 1628 | batch_norm_param { 1629 | use_global_stats: false 1630 | } 1631 | } 1632 | layer { 1633 | name: "bn3_scale2_exter" 1634 | type: "BatchNorm" 1635 | bottom: "conv3_scale2_exter" 1636 | top: "bn3_scale2_exter" 1637 | param { 1638 | name: "bn3_scale2_mean_exter" 1639 | lr_mult: 0 1640 | decay_mult: 0 1641 | } 1642 | param { 1643 | name: "bn3_scale2_var_exter" 1644 | lr_mult: 0 1645 | decay_mult: 0 1646 | } 1647 | param { 1648 | name: "bn3_scale2_bias_exter" 1649 | lr_mult: 0 1650 | decay_mult: 0 1651 | } 1652 | batch_norm_param { 1653 | use_global_stats: false 1654 | } 1655 | } 1656 | layer { 1657 | name: 
"bn3_scale3_exter" 1658 | type: "BatchNorm" 1659 | bottom: "conv3_scale3_exter" 1660 | top: "bn3_scale3_exter" 1661 | param { 1662 | name: "bn3_scale3_mean_exter" 1663 | lr_mult: 0 1664 | decay_mult: 0 1665 | } 1666 | param { 1667 | name: "bn3_scale3_var_exter" 1668 | lr_mult: 0 1669 | decay_mult: 0 1670 | } 1671 | param { 1672 | name: "bn3_scale3_bias_exter" 1673 | lr_mult: 0 1674 | decay_mult: 0 1675 | } 1676 | batch_norm_param { 1677 | use_global_stats: false 1678 | } 1679 | } 1680 | layer { 1681 | name: "bn3_exter" 1682 | type: "Concat" 1683 | bottom: "bn3_scale1_exter" 1684 | bottom: "bn3_scale2_exter" 1685 | bottom: "bn3_scale3_exter" 1686 | top: "bn3_exter" 1687 | concat_param { 1688 | axis: 1 1689 | } 1690 | } 1691 | layer { 1692 | name: "relu3_exter" 1693 | type: "ReLU" 1694 | bottom: "bn3_exter" 1695 | top: "bn3_exter" 1696 | } 1697 | layer { 1698 | name: "pool3_exter" 1699 | type: "Pooling" 1700 | bottom: "bn3_exter" 1701 | top: "pool3_exter" 1702 | pooling_param { 1703 | pool: MAX 1704 | kernel_size: 2 1705 | stride: 2 1706 | } 1707 | } 1708 | layer { 1709 | name: "conv4_scale1_exter" 1710 | type: "Convolution" 1711 | bottom: "pool3_exter" 1712 | top: "conv4_scale1_exter" 1713 | param { 1714 | name: "conv4_scale1_w_exter" 1715 | lr_mult: 1 1716 | decay_mult: 1 1717 | } 1718 | param { 1719 | name: "conv4_scale1_b_exter" 1720 | lr_mult: 2 1721 | decay_mult: 0 1722 | } 1723 | convolution_param { 1724 | num_output: 32 1725 | pad: 1 1726 | kernel_size: 3 1727 | stride: 1 1728 | weight_filler { 1729 | type: "xavier" 1730 | } 1731 | bias_filler { 1732 | type: "constant" 1733 | } 1734 | dilation: 1 1735 | } 1736 | } 1737 | layer { 1738 | name: "conv4_scale2_exter" 1739 | type: "Convolution" 1740 | bottom: "pool3_exter" 1741 | top: "conv4_scale2_exter" 1742 | param { 1743 | name: "conv4_scale2_w_exter" 1744 | lr_mult: 1 1745 | decay_mult: 1 1746 | } 1747 | param { 1748 | name: "conv4_scale2_b_exter" 1749 | lr_mult: 2 1750 | decay_mult: 0 1751 | } 1752 | convolution_param { 1753 | num_output: 32 1754 | pad: 2 1755 | kernel_size: 3 1756 | stride: 1 1757 | weight_filler { 1758 | type: "xavier" 1759 | } 1760 | bias_filler { 1761 | type: "constant" 1762 | } 1763 | dilation: 2 1764 | } 1765 | } 1766 | layer { 1767 | name: "conv4_scale3_exter" 1768 | type: "Convolution" 1769 | bottom: "pool3_exter" 1770 | top: "conv4_scale3_exter" 1771 | param { 1772 | name: "conv4_scale3_w_exter" 1773 | lr_mult: 1 1774 | decay_mult: 1 1775 | } 1776 | param { 1777 | name: "conv4_scale3_b_exter" 1778 | lr_mult: 2 1779 | decay_mult: 0 1780 | } 1781 | convolution_param { 1782 | num_output: 32 1783 | pad: 3 1784 | kernel_size: 3 1785 | stride: 1 1786 | weight_filler { 1787 | type: "xavier" 1788 | } 1789 | bias_filler { 1790 | type: "constant" 1791 | } 1792 | dilation: 3 1793 | } 1794 | } 1795 | layer { 1796 | name: "bn4_scale1_exter" 1797 | type: "BatchNorm" 1798 | bottom: "conv4_scale1_exter" 1799 | top: "bn4_scale1_exter" 1800 | param { 1801 | name: "bn4_scale1_mean_exter" 1802 | lr_mult: 0 1803 | decay_mult: 0 1804 | } 1805 | param { 1806 | name: "bn4_scale1_var_exter" 1807 | lr_mult: 0 1808 | decay_mult: 0 1809 | } 1810 | param { 1811 | name: "bn4_scale1_bias_exter" 1812 | lr_mult: 0 1813 | decay_mult: 0 1814 | } 1815 | batch_norm_param { 1816 | use_global_stats: false 1817 | } 1818 | } 1819 | layer { 1820 | name: "bn4_scale2_exter" 1821 | type: "BatchNorm" 1822 | bottom: "conv4_scale2_exter" 1823 | top: "bn4_scale2_exter" 1824 | param { 1825 | name: "bn4_scale2_mean_exter" 1826 | lr_mult: 0 1827 | decay_mult: 
0 1828 | } 1829 | param { 1830 | name: "bn4_scale2_var_exter" 1831 | lr_mult: 0 1832 | decay_mult: 0 1833 | } 1834 | param { 1835 | name: "bn4_scale2_bias_exter" 1836 | lr_mult: 0 1837 | decay_mult: 0 1838 | } 1839 | batch_norm_param { 1840 | use_global_stats: false 1841 | } 1842 | } 1843 | layer { 1844 | name: "bn4_scale3_exter" 1845 | type: "BatchNorm" 1846 | bottom: "conv4_scale3_exter" 1847 | top: "bn4_scale3_exter" 1848 | param { 1849 | name: "bn4_scale3_mean_exter" 1850 | lr_mult: 0 1851 | decay_mult: 0 1852 | } 1853 | param { 1854 | name: "bn4_scale3_var_exter" 1855 | lr_mult: 0 1856 | decay_mult: 0 1857 | } 1858 | param { 1859 | name: "bn4_scale3_bias_exter" 1860 | lr_mult: 0 1861 | decay_mult: 0 1862 | } 1863 | batch_norm_param { 1864 | use_global_stats: false 1865 | } 1866 | } 1867 | layer { 1868 | name: "bn4_exter" 1869 | type: "Concat" 1870 | bottom: "bn4_scale1_exter" 1871 | bottom: "bn4_scale2_exter" 1872 | bottom: "bn4_scale3_exter" 1873 | top: "bn4_exter" 1874 | concat_param { 1875 | axis: 1 1876 | } 1877 | } 1878 | layer { 1879 | name: "relu4_exter" 1880 | type: "ReLU" 1881 | bottom: "bn4_exter" 1882 | top: "bn4_exter" 1883 | } 1884 | layer { 1885 | name: "pool4_exter" 1886 | type: "Pooling" 1887 | bottom: "bn4_exter" 1888 | top: "pool4_exter" 1889 | pooling_param { 1890 | pool: MAX 1891 | kernel_size: 2 1892 | stride: 2 1893 | } 1894 | } 1895 | layer { 1896 | name: "fc1_exter" 1897 | type: "InnerProduct" 1898 | bottom: "pool4_exter" 1899 | top: "fc1_exter" 1900 | param { 1901 | name: "fc1_w_exter" 1902 | lr_mult: 1 1903 | decay_mult: 1 1904 | } 1905 | param { 1906 | name: "fc1_b_exter" 1907 | lr_mult: 2 1908 | decay_mult: 0 1909 | } 1910 | inner_product_param { 1911 | num_output: 128 1912 | weight_filler { 1913 | type: "xavier" 1914 | } 1915 | bias_filler { 1916 | type: "constant" 1917 | } 1918 | } 1919 | } 1920 | layer { 1921 | name: "fc1_exter_drop" 1922 | type: "Dropout" 1923 | bottom: "fc1_exter" 1924 | top: "fc1_exter" 1925 | dropout_param { 1926 | dropout_ratio: 0.2 1927 | } 1928 | } 1929 | 1930 | layer { 1931 | name: "fc2_exter" 1932 | type: "InnerProduct" 1933 | bottom: "fc1_exter" 1934 | top: "fc2_exter" 1935 | param { 1936 | lr_mult: 1 1937 | decay_mult: 1 1938 | } 1939 | param { 1940 | lr_mult: 2 1941 | decay_mult: 0 1942 | } 1943 | inner_product_param { 1944 | num_output: 751 # Number of identities in the Market-1501 training set. 1945 | weight_filler { 1946 | type: "xavier" 1947 | } 1948 | bias_filler { 1949 | type: "constant" 1950 | } 1951 | } 1952 | } 1953 | layer { 1954 | name: "loss_cls_exter" 1955 | type: "SoftmaxWithLoss" 1956 | bottom: "fc2_exter" 1957 | bottom: "label_plus" 1958 | top: "loss_cls_exter" 1959 | loss_weight: 1 1960 | } 1961 | layer { 1962 | name: "acc_cls_exter" 1963 | type: "Accuracy" 1964 | bottom: "fc2_exter" 1965 | bottom: "label_plus" 1966 | top: "acc_cls_exter" 1967 | } 1968 | 1969 | layer { 1970 | name: "loss_pull" 1971 | type: "ContrastiveLoss" 1972 | bottom: "fc1_iner" 1973 | bottom: "fc1_full" 1974 | bottom: "sim_iner" 1975 | top: "loss_pull" 1976 | loss_weight: 0.01 1977 | contrastive_loss_param { 1978 | margin: 0.1 # We set the margin to 0 or a small value for body regions. 1979 | } 1980 | } 1981 | 1982 | layer { 1983 | name: "loss_push" 1984 | type: "ContrastiveLoss" 1985 | bottom: "fc1_exter" 1986 | bottom: "fc1_full" 1987 | bottom: "sim_exter" 1988 | top: "loss_push" 1989 | loss_weight: 0.01 1990 | contrastive_loss_param { 1991 | margin: 100 # We set the margin to 100 or 10 for background regions.
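# The two ContrastiveLoss terms implement the contrastive attention objective:
# 'loss_pull' (its similarity input sim_iner is filled with ones by the data
# layer) penalizes the squared distance between the body-stream embedding
# fc1_iner and the full-image embedding fc1_full, while 'loss_push' (sim_exter
# is filled with zeros) penalizes the background-stream embedding fc1_exter
# whenever it lies closer to fc1_full than the margin.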
1992 | } 1993 | } -------------------------------------------------------------------------------- /experiments/market1501/run_mgcam.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam.prototxt --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/market1501/run_mgcam_siamese.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-siamese-`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam_siamese.prototxt --weights=./mgcam_iter_75000.caffemodel --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/market1501/solver_mgcam.prototxt: -------------------------------------------------------------------------------- 1 | net: "mgcam_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.01 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 15000 9 | display: 10 10 | max_iter: 75000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/market1501/solver_mgcam_siamese.prototxt: -------------------------------------------------------------------------------- 1 | net: "mgcam_siamese_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.0001 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 10000 9 | display: 10 10 | max_iter: 20000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam_siamese" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/mars/layers.py: -------------------------------------------------------------------------------- 1 | """ 2 | MGCAM data layer. 3 | 4 | by Chunfeng Song 5 | 6 | 2017/10/08 7 | 8 | This code is for research use only; please cite our paper: 9 | 10 | Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang. Mask-guided Contrastive Attention Model for Person Re-Identification. In CVPR, 2018. 11 | 12 | Contact us: chunfeng.song@nlpr.ia.ac.cn 13 | """ 14 | 15 | import caffe 16 | import numpy as np 17 | import yaml 18 | from random import shuffle 19 | import numpy.random as nr 20 | import cv2 21 | import os 22 | import pickle as cPickle 23 | import pdb 24 | 25 | def mypickle(filename, data): 26 | fo = open(filename, "wb") 27 | cPickle.dump(data, fo, protocol=cPickle.HIGHEST_PROTOCOL) 28 | fo.close() 29 | 30 | def myunpickle(filename): 31 | if not os.path.exists(filename): 32 | raise IOError("Path '%s' does not exist." % filename) 33 | 34 | fo = open(filename, 'rb') 35 | data_dict = cPickle.load(fo) 36 | fo.close() 37 | return data_dict 38 | 39 | class MGCAM_DataLayer(caffe.Layer): 40 | """Data layer for training""" 41 | def setup(self, bottom, top): 42 | self.width = 64 43 | self.height = 160 # We resize all images into a size of 160*64. 44 | self.width_gt = 16 45 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 40*16.
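# param_str is parsed below as a YAML dict; the train prototxt passes, e.g.,
# "{'batch_size': 128,'im_path':'../../data/mars/bounding_box_train_fold',
#  'gt_path':'../../data/mars/bounding_box_train_seg_fold/','dataset':'mars'}".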
46 | layer_params = yaml.safe_load(self.param_str) 47 | self.batch_size = layer_params['batch_size'] 48 | self.im_path = layer_params['im_path'] 49 | self.gt_path = layer_params['gt_path'] 50 | self.dataset = layer_params['dataset'] 51 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 52 | self.idx = 0 53 | self.data_num = len(self.im_list) # Number of data pairs 54 | self.rnd_list = np.arange(self.data_num) # Index list over the images; shuffled below. 55 | shuffle(self.rnd_list) 56 | 57 | def forward(self, bottom, top): 58 | # Assign forward data 59 | top[0].data[...] = self.im 60 | top[1].data[...] = self.inner_label 61 | top[2].data[...] = self.exter_label 62 | top[3].data[...] = self.one_mask 63 | top[4].data[...] = self.label 64 | top[5].data[...] = self.label_plus 65 | top[6].data[...] = self.gt 66 | top[7].data[...] = self.mask 67 | 68 | def backward(self, top, propagate_down, bottom): 69 | """This layer does not propagate gradients.""" 70 | pass 71 | 72 | def reshape(self, bottom, top): 73 | # Load image + label image pairs 74 | self.im = [] 75 | self.label = [] 76 | self.inner_label = [] 77 | self.exter_label = [] 78 | self.one_mask = [] 79 | self.label_plus = [] 80 | self.gt = [] 81 | self.mask = [] 82 | 83 | for i in range(self.batch_size): 84 | if self.idx == self.data_num: 85 | self.idx = 0 86 | shuffle(self.rnd_list) #Randomly shuffle the list. 87 | cur_idx = self.rnd_list[self.idx] 88 | im_path = self.im_list[cur_idx] 89 | gt_path = self.gt_list[cur_idx] 90 | im_, gt_, mask_ = self.load_data(im_path, gt_path) 91 | self.im.append(im_) 92 | self.gt.append(gt_) 93 | self.mask.append(mask_) 94 | self.label.append(self.labels[cur_idx]) 95 | self.inner_label.append(int(1)) 96 | self.exter_label.append(int(0)) 97 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 98 | one_mask_ = one_mask_ + 1.0 99 | self.one_mask.append(one_mask_) 100 | self.label_plus.append(self.labels[cur_idx]) #Here, we also give the ID labels to the background stream.
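# one_mask is an all-ones 40x16 map; in the prototxt the predicted attention
# mask is subtracted from it (Eltwise SUM with coefficients 1 and -1) to form
# the inverted mask that gates the background stream.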
101 | self.idx +=1 102 | 103 | self.im = np.array(self.im).astype(np.float32) 104 | self.inner_label = np.array(self.inner_label).astype(np.float32) 105 | self.exter_label = np.array(self.exter_label).astype(np.float32) 106 | self.one_mask = np.array(self.one_mask).astype(np.float32) 107 | self.label = np.array(self.label).astype(np.float32) 108 | self.label_plus = np.array(self.label_plus).astype(np.float32) 109 | self.gt = np.array(self.gt).astype(np.float32) 110 | self.mask = np.array(self.mask).astype(np.float32) 111 | # Reshape tops to fit blobs 112 | top[0].reshape(*self.im.shape) 113 | top[1].reshape(*self.inner_label.shape) 114 | top[2].reshape(*self.exter_label.shape) 115 | top[3].reshape(*self.one_mask.shape) 116 | top[4].reshape(*self.label.shape) 117 | top[5].reshape(*self.label_plus.shape) 118 | top[6].reshape(*self.gt.shape) 119 | top[7].reshape(*self.mask.shape) 120 | 121 | def data_processor(self, data_name): 122 | data_dic = './' + data_name 123 | if not os.path.exists(data_dic): 124 | im_list = [] 125 | gt_list = [] 126 | labels = [] 127 | im_dir_list = [] 128 | gt_dir_list = [] 129 | new_id = 0 130 | id_list = np.sort(os.listdir(self.im_path)) 131 | for id in id_list: 132 | im_dir = os.path.join(self.im_path, id) 133 | gt_dir = os.path.join(self.gt_path, id) 134 | if not os.path.exists(im_dir): 135 | continue 136 | pic_im_list = np.sort(os.listdir(im_dir)) 137 | if len(pic_im_list)>1: 138 | for pic in pic_im_list: 139 | this_dir = os.path.join(self.im_path, id, pic) 140 | gt_pic = pic 141 | if not pic.lower().endswith('.png'): 142 | gt_pic = pic[:-4] + '.png' 143 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 144 | im_list.append(this_dir) 145 | gt_list.append(this_gt_dir) 146 | labels.append(int(new_id)) 147 | new_id +=1 148 | im_dir_list.append(im_dir) 149 | gt_dir_list.append(gt_dir) 150 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 151 | mypickle(data_dic, dic) 152 | # Load saved data dict to resume. 153 | else: 154 | dic = myunpickle(data_dic) 155 | im_list = dic['im_list'] 156 | gt_list = dic['gt_list'] 157 | labels = dic['labels'] 158 | im_dir_list = dic['im_dir_list'] 159 | gt_dir_list = dic['gt_dir_list'] 160 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 161 | 162 | def load_data(self, im_path, gt_path): 163 | """ 164 | Load input image and preprocess for Caffe: 165 | - cast to float 166 | - switch channels RGB -> BGR 167 | - subtract mean 168 | - transpose to channel x height x width order 169 | """ 170 | oim = cv2.imread(im_path) 171 | inputImage = cv2.resize(oim, (self.width, self.height)) 172 | inputImage = np.array(inputImage, dtype=np.float32) 173 | 174 | # Subtract the per-channel (BGR) mean 175 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 176 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 177 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 178 | 179 | # Random horizontal flip, then permute to channel x height x width 180 | if_flip = nr.randint(2) 181 | if if_flip == 0: # Also flip the image with 50% probability 182 | inputImage = inputImage[:,::-1,:] 183 | inputImage = inputImage.transpose([2, 0, 1]) 184 | inputImage = inputImage/256.0 185 | #GT 186 | mask_im = cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 187 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 188 | inputGt = inputGt/255.0 189 | if if_flip == 0: 190 | inputGt = inputGt[:,::-1] 191 | inputGt = inputGt[np.newaxis, ...]
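# The full-resolution mask prepared below is shifted into roughly [-0.5, 0.5]
# and concatenated with the image as a fourth input channel by the
# 'data_concate' layer in the prototxt.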
192 | #Mask 193 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 194 | inputMask = inputMask-127.5 195 | inputMask = inputMask/255.0 196 | if if_flip == 0: 197 | inputMask = inputMask[:,::-1] 198 | inputMask = inputMask[np.newaxis, ...] 199 | return inputImage, inputGt, inputMask 200 | 201 | 202 | class MGCAM_SIA_DataLayer(caffe.Layer): 203 | """Data layer for training""" 204 | def setup(self, bottom, top): 205 | self.width = 64 206 | self.height = 160 # We resize all images into a size of 160*64. 207 | self.width_gt = 16 208 | self.height_gt = 40 # We resize all masks which are used to supervise attention learning into a size of 40*16. 209 | 210 | layer_params = yaml.safe_load(self.param_str) 211 | self.batch_size = layer_params['batch_size'] 212 | self.pos_pair_num = int(0.30*self.batch_size) # There will be at least 30 percent positive pairs in each batch. 213 | self.im_path = layer_params['im_path'] 214 | self.gt_path = layer_params['gt_path'] 215 | self.dataset = layer_params['dataset'] 216 | self.labels, self.im_list, self.gt_list, self.im_dir_list, self.gt_dir_list = self.data_processor(self.dataset) 217 | self.idx = 0 218 | self.data_num = len(self.im_list) 219 | self.rnd_list = np.arange(self.data_num) 220 | shuffle(self.rnd_list) 221 | 222 | def forward(self, bottom, top): 223 | # Assign forward data 224 | top[0].data[...] = self.im 225 | top[1].data[...] = self.inner_label 226 | top[2].data[...] = self.exter_label 227 | top[3].data[...] = self.one_mask 228 | top[4].data[...] = self.label 229 | top[5].data[...] = self.label_plus 230 | top[6].data[...] = self.gt 231 | top[7].data[...] = self.mask 232 | top[8].data[...] = self.siam_label 233 | 234 | def backward(self, top, propagate_down, bottom): 235 | """This layer does not propagate gradients.""" 236 | pass 237 | 238 | def reshape(self, bottom, top): 239 | # Load image + label image pairs 240 | self.im = [] 241 | self.label = [] 242 | self.inner_label = [] 243 | self.exter_label = [] 244 | self.one_mask = [] 245 | self.label_plus = [] 246 | self.gt = [] 247 | self.mask = [] 248 | self.siam_label = [] 249 | 250 | for i in range(self.batch_size): 251 | if self.idx == self.data_num: 252 | self.idx = 0 253 | shuffle(self.rnd_list) 254 | cur_idx = self.rnd_list[self.idx] 255 | im_path = self.im_list[cur_idx] 256 | gt_path = self.gt_list[cur_idx] 257 | im_, gt_, mask_ = self.load_data(im_path, gt_path) 258 | self.im.append(im_) 259 | self.gt.append(gt_) 260 | self.mask.append(mask_) 261 | self.label.append(self.labels[cur_idx]) 262 | self.inner_label.append(int(1)) 263 | self.exter_label.append(int(0)) 264 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 265 | one_mask_ = one_mask_ + 1.0 266 | self.one_mask.append(one_mask_) 267 | self.label_plus.append(self.labels[cur_idx])#Labels for backgrounds. We use the same labels as the other two streams here. 268 | self.idx +=1 269 | 270 | for i in range(self.batch_size): 271 | if i > self.pos_pair_num: 272 | if self.idx == self.data_num: 273 | self.idx = 0 274 | shuffle(self.rnd_list)#Randomly shuffle the list. 275 | cur_idx = self.rnd_list[self.idx] 276 | self.idx +=1 277 | im_path = self.im_list[cur_idx] 278 | gt_path = self.gt_list[cur_idx] 279 | label = self.labels[cur_idx] 280 | if label==self.label[i]: 281 | self.siam_label.append(int(1))#A randomly drawn pair may still be positive, though rarely. 282 | else: 283 | self.siam_label.append(int(0))#Negative pairs.
284 | else: 285 | im_dir = self.im_dir_list[self.label[i]] 286 | gt_dir = self.gt_dir_list[self.label[i]] 287 | im_list = np.sort(os.listdir(im_dir)) 288 | gt_list = np.sort(os.listdir(gt_dir)) 289 | tmp_list = np.arange(len(im_list)) 290 | shuffle(tmp_list) #Randomly select one. 291 | im_path = os.path.join(im_dir, im_list[tmp_list[0]]) 292 | gt_path = os.path.join(gt_dir, gt_list[tmp_list[0]]) 293 | label = self.label[i] 294 | self.siam_label.append(int(1))#This is a positive pair. 295 | 296 | im_, gt_, mask_ = self.load_data(im_path, gt_path) 297 | self.im.append(im_) 298 | self.gt.append(gt_) 299 | self.mask.append(mask_) 300 | self.label.append(label) 301 | self.inner_label.append(int(1))#Always ones, for contrastive learning. 302 | self.exter_label.append(int(0)) 303 | one_mask_ = np.zeros((1,40,16),dtype = np.float32) 304 | one_mask_ = one_mask_ + 1.0 305 | self.one_mask.append(one_mask_) 306 | self.label_plus.append(label) 307 | 308 | self.im = np.array(self.im).astype(np.float32) 309 | self.inner_label = np.array(self.inner_label).astype(np.float32) 310 | self.exter_label = np.array(self.exter_label).astype(np.float32) 311 | self.one_mask = np.array(self.one_mask).astype(np.float32) 312 | self.label = np.array(self.label).astype(np.float32) 313 | self.label_plus = np.array(self.label_plus).astype(np.float32) 314 | self.gt = np.array(self.gt).astype(np.float32) 315 | self.mask = np.array(self.mask).astype(np.float32) 316 | self.siam_label = np.array(self.siam_label).astype(np.float32) 317 | # Reshape tops to fit blobs 318 | top[0].reshape(*self.im.shape) 319 | top[1].reshape(*self.inner_label.shape) 320 | top[2].reshape(*self.exter_label.shape) 321 | top[3].reshape(*self.one_mask.shape) 322 | top[4].reshape(*self.label.shape) 323 | top[5].reshape(*self.label_plus.shape) 324 | top[6].reshape(*self.gt.shape) 325 | top[7].reshape(*self.mask.shape) 326 | top[8].reshape(*self.siam_label.shape) 327 | 328 | def data_processor(self, data_name): 329 | data_dic = './' + data_name 330 | if not os.path.exists(data_dic): 331 | im_list = [] 332 | gt_list = [] 333 | labels = [] 334 | im_dir_list = [] 335 | gt_dir_list = [] 336 | new_id = 0 337 | id_list = np.sort(os.listdir(self.im_path)) 338 | for id in id_list: 339 | im_dir = os.path.join(self.im_path, id) 340 | gt_dir = os.path.join(self.gt_path, id) 341 | if not os.path.exists(im_dir): 342 | continue 343 | pic_im_list = np.sort(os.listdir(im_dir)) 344 | if len(pic_im_list)>1: 345 | for pic in pic_im_list: 346 | this_dir = os.path.join(self.im_path, id, pic) 347 | gt_pic = pic 348 | if not pic.lower().endswith('.png'): 349 | gt_pic = pic[:-4] + '.png' 350 | this_gt_dir = os.path.join(self.gt_path, id, gt_pic) 351 | im_list.append(this_dir) 352 | gt_list.append(this_gt_dir) 353 | labels.append(int(new_id)) 354 | new_id +=1 355 | im_dir_list.append(im_dir) 356 | gt_dir_list.append(gt_dir) 357 | dic = {'im_list':im_list,'gt_list':gt_list,'labels':labels,'im_dir_list':im_dir_list,'gt_dir_list':gt_dir_list} 358 | mypickle(data_dic, dic) 359 | # Load saved data dict to resume.
360 | else: 361 | dic = myunpickle(data_dic) 362 | im_list = dic['im_list'] 363 | gt_list = dic['gt_list'] 364 | labels = dic['labels'] 365 | im_dir_list = dic['im_dir_list'] 366 | gt_dir_list = dic['gt_dir_list'] 367 | return labels, im_list, gt_list, im_dir_list, gt_dir_list 368 | 369 | def load_data(self, im_path, gt_path): 370 | """ 371 | Load input image and preprocess for Caffe: 372 | - cast to float 373 | - switch channels RGB -> BGR 374 | - subtract mean 375 | - transpose to channel x height x width order 376 | """ 377 | oim = cv2.imread(im_path) 378 | inputImage = cv2.resize(oim, (self.width, self.height)) 379 | inputImage = np.array(inputImage, dtype=np.float32) 380 | 381 | # Subtract the per-channel (BGR) mean 382 | inputImage[:, :, 0] = inputImage[:, :, 0] - 104.008 383 | inputImage[:, :, 1] = inputImage[:, :, 1] - 116.669 384 | inputImage[:, :, 2] = inputImage[:, :, 2] - 122.675 385 | 386 | # Random horizontal flip, then permute to channel x height x width 387 | if_flip = nr.randint(2) 388 | if if_flip == 0: # Also flip the image with 50% probability 389 | inputImage = inputImage[:,::-1,:] 390 | inputImage = inputImage.transpose([2, 0, 1]) 391 | inputImage = inputImage/256.0 392 | #GT 393 | mask_im = cv2.cvtColor(cv2.imread(gt_path),cv2.COLOR_BGR2GRAY) 394 | inputGt = np.array(cv2.resize(mask_im, (self.width_gt, self.height_gt)), dtype=np.float32) 395 | inputGt = inputGt/255.0 396 | if if_flip == 0: 397 | inputGt = inputGt[:,::-1] 398 | inputGt = inputGt[np.newaxis, ...] 399 | #Mask 400 | inputMask = np.array(cv2.resize(mask_im, (self.width, self.height)), dtype=np.float32) 401 | inputMask = inputMask-127.5 402 | inputMask = inputMask/255.0 403 | if if_flip == 0: 404 | inputMask = inputMask[:,::-1] 405 | inputMask = inputMask[np.newaxis, ...] 406 | return inputImage, inputGt, inputMask 407 | -------------------------------------------------------------------------------- /experiments/mars/mgcam_iter_75000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/mars/mgcam_iter_75000.caffemodel -------------------------------------------------------------------------------- /experiments/mars/mgcam_siamese_iter_20000.caffemodel: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/developfeng/MGCAM/1b5957218ecaa7f13bf2107bc41b1349c05be9c8/experiments/mars/mgcam_siamese_iter_20000.caffemodel -------------------------------------------------------------------------------- /experiments/mars/mgcam_train.prototxt: -------------------------------------------------------------------------------- 1 | layer { 2 | name: "data" 3 | type: "Python" 4 | top: "data" 5 | top: "sim_iner" 6 | top: "sim_exter" 7 | top: "one_mask" 8 | top: "label" 9 | top: "label_plus" 10 | top: "gt" 11 | top: "mask" 12 | include { 13 | phase: TRAIN 14 | } 15 | python_param { 16 | module: "layers" 17 | layer: "MGCAM_DataLayer" 18 | param_str: "{'batch_size': 128,'im_path':'../../data/mars/bounding_box_train_fold','gt_path':'../../data/mars/bounding_box_train_seg_fold/','dataset':'mars'}" 19 | } 20 | } 21 | 22 | layer { 23 | name: "data" 24 | type: "Python" 25 | top: "data" 26 | top: "sim_iner" 27 | top: "sim_exter" 28 | top: "one_mask" 29 | top: "label" 30 | top: "label_plus" 31 | top: "gt" 32 | top: "mask" 33 | include { 34 | phase: TEST 35 | }# Note: the test set here is the same as the training set.
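# The TEST-phase copy of the data layer below only differs in batch_size
# (10 vs. 128); with test_iter: 10 and test_interval: 1000 in the solver,
# it simply monitors accuracy on training data during optimization.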
36 | python_param { 37 | module: "layers" 38 | layer: "MGCAM_DataLayer" 39 | param_str: "{'batch_size': 10,'im_path':'../../data/mars/bounding_box_train_fold','gt_path':'../../data/mars/bounding_box_train_seg_fold/','dataset':'mars'}" 40 | } 41 | } 42 | 43 | layer { 44 | name: "data_concate" 45 | type: "Concat" 46 | bottom: "data" 47 | bottom: "mask" 48 | top: "data_concate" 49 | } 50 | 51 | layer { 52 | name: "conv0_scale1_full" 53 | type: "Convolution" 54 | bottom: "data_concate" 55 | top: "conv0_scale1_full" 56 | param { 57 | name: "conv0_scale1_w_full" 58 | lr_mult: 1 59 | decay_mult: 1 60 | } 61 | param { 62 | name: "conv0_scale1_b_full" 63 | lr_mult: 2 64 | decay_mult: 0 65 | } 66 | convolution_param { 67 | num_output: 32 68 | pad: 2 69 | kernel_size: 5 70 | stride: 1 71 | weight_filler { 72 | type: "xavier" 73 | } 74 | bias_filler { 75 | type: "constant" 76 | } 77 | dilation: 1 78 | } 79 | } 80 | layer { 81 | name: "bn0_scale1_full" 82 | type: "BatchNorm" 83 | bottom: "conv0_scale1_full" 84 | top: "bn0_scale1_full" 85 | param { 86 | name: "bn0_scale1_mean_full" 87 | lr_mult: 0 88 | decay_mult: 0 89 | } 90 | param { 91 | name: "bn0_scale1_var_full" 92 | lr_mult: 0 93 | decay_mult: 0 94 | } 95 | param { 96 | name: "bn0_scale1_bias_full" 97 | lr_mult: 0 98 | decay_mult: 0 99 | } 100 | batch_norm_param { 101 | use_global_stats: false 102 | } 103 | } 104 | layer { 105 | name: "relu0_full" 106 | type: "ReLU" 107 | bottom: "bn0_scale1_full" 108 | top: "bn0_scale1_full" 109 | } 110 | layer { 111 | name: "pool0_full" 112 | type: "Pooling" 113 | bottom: "bn0_scale1_full" 114 | top: "pool0_full" 115 | pooling_param { 116 | pool: MAX 117 | kernel_size: 2 118 | stride: 2 119 | } 120 | } 121 | layer { 122 | name: "conv1_scale1_full" 123 | type: "Convolution" 124 | bottom: "pool0_full" 125 | top: "conv1_scale1_full" 126 | param { 127 | name: "conv1_scale1_w_full" 128 | lr_mult: 1 129 | decay_mult: 1 130 | } 131 | param { 132 | name: "conv1_scale1_b_full" 133 | lr_mult: 2 134 | decay_mult: 0 135 | } 136 | convolution_param { 137 | num_output: 32 138 | pad: 1 139 | kernel_size: 3 140 | stride: 1 141 | weight_filler { 142 | type: "xavier" 143 | } 144 | bias_filler { 145 | type: "constant" 146 | } 147 | dilation: 1 148 | } 149 | } 150 | layer { 151 | name: "conv1_scale2_full" 152 | type: "Convolution" 153 | bottom: "pool0_full" 154 | top: "conv1_scale2_full" 155 | param { 156 | name: "conv1_scale2_w_full" 157 | lr_mult: 1 158 | decay_mult: 1 159 | } 160 | param { 161 | name: "conv1_scale2_b_full" 162 | lr_mult: 2 163 | decay_mult: 0 164 | } 165 | convolution_param { 166 | num_output: 32 167 | pad: 2 168 | kernel_size: 3 169 | stride: 1 170 | weight_filler { 171 | type: "xavier" 172 | } 173 | bias_filler { 174 | type: "constant" 175 | } 176 | dilation: 2 177 | } 178 | } 179 | layer { 180 | name: "conv1_scale3_full" 181 | type: "Convolution" 182 | bottom: "pool0_full" 183 | top: "conv1_scale3_full" 184 | param { 185 | name: "conv1_scale3_w_full" 186 | lr_mult: 1 187 | decay_mult: 1 188 | } 189 | param { 190 | name: "conv1_scale3_b_full" 191 | lr_mult: 2 192 | decay_mult: 0 193 | } 194 | convolution_param { 195 | num_output: 32 196 | pad: 3 197 | kernel_size: 3 198 | stride: 1 199 | weight_filler { 200 | type: "xavier" 201 | } 202 | bias_filler { 203 | type: "constant" 204 | } 205 | dilation: 3 206 | } 207 | } 208 | layer { 209 | name: "bn1_scale1_full" 210 | type: "BatchNorm" 211 | bottom: "conv1_scale1_full" 212 | top: "bn1_scale1_full" 213 | param { 214 | name: "bn1_scale1_mean_full" 215 | lr_mult: 0 
216 | decay_mult: 0 217 | } 218 | param { 219 | name: "bn1_scale1_var_full" 220 | lr_mult: 0 221 | decay_mult: 0 222 | } 223 | param { 224 | name: "bn1_scale1_bias_full" 225 | lr_mult: 0 226 | decay_mult: 0 227 | } 228 | batch_norm_param { 229 | use_global_stats: false 230 | } 231 | } 232 | layer { 233 | name: "bn1_scale2_full" 234 | type: "BatchNorm" 235 | bottom: "conv1_scale2_full" 236 | top: "bn1_scale2_full" 237 | param { 238 | name: "bn1_scale2_mean_full" 239 | lr_mult: 0 240 | decay_mult: 0 241 | } 242 | param { 243 | name: "bn1_scale2_var_full" 244 | lr_mult: 0 245 | decay_mult: 0 246 | } 247 | param { 248 | name: "bn1_scale2_bias_full" 249 | lr_mult: 0 250 | decay_mult: 0 251 | } 252 | batch_norm_param { 253 | use_global_stats: false 254 | } 255 | } 256 | layer { 257 | name: "bn1_scale3_full" 258 | type: "BatchNorm" 259 | bottom: "conv1_scale3_full" 260 | top: "bn1_scale3_full" 261 | param { 262 | name: "bn1_scale3_mean_full" 263 | lr_mult: 0 264 | decay_mult: 0 265 | } 266 | param { 267 | name: "bn1_scale3_var_full" 268 | lr_mult: 0 269 | decay_mult: 0 270 | } 271 | param { 272 | name: "bn1_scale3_bias_full" 273 | lr_mult: 0 274 | decay_mult: 0 275 | } 276 | batch_norm_param { 277 | use_global_stats: false 278 | } 279 | } 280 | layer { 281 | name: "bn1_full" 282 | type: "Concat" 283 | bottom: "bn1_scale1_full" 284 | bottom: "bn1_scale2_full" 285 | bottom: "bn1_scale3_full" 286 | top: "bn1_full" 287 | concat_param { 288 | axis: 1 289 | } 290 | } 291 | layer { 292 | name: "relu1_full" 293 | type: "ReLU" 294 | bottom: "bn1_full" 295 | top: "bn1_full" 296 | } 297 | layer { 298 | name: "pool1_full" 299 | type: "Pooling" 300 | bottom: "bn1_full" 301 | top: "pool1_full" 302 | pooling_param { 303 | pool: MAX 304 | kernel_size: 2 305 | stride: 2 306 | } 307 | } 308 | layer { 309 | name: "conv2_scale1_full" 310 | type: "Convolution" 311 | bottom: "pool1_full" 312 | top: "conv2_scale1_full" 313 | param { 314 | name: "conv2_scale1_w_full" 315 | lr_mult: 1 316 | decay_mult: 1 317 | } 318 | param { 319 | name: "conv2_scale1_b_full" 320 | lr_mult: 2 321 | decay_mult: 0 322 | } 323 | convolution_param { 324 | num_output: 32 325 | pad: 1 326 | kernel_size: 3 327 | stride: 1 328 | weight_filler { 329 | type: "xavier" 330 | } 331 | bias_filler { 332 | type: "constant" 333 | } 334 | dilation: 1 335 | } 336 | } 337 | layer { 338 | name: "conv2_scale2_full" 339 | type: "Convolution" 340 | bottom: "pool1_full" 341 | top: "conv2_scale2_full" 342 | param { 343 | name: "conv2_scale2_w_full" 344 | lr_mult: 1 345 | decay_mult: 1 346 | } 347 | param { 348 | name: "conv2_scale2_b_full" 349 | lr_mult: 2 350 | decay_mult: 0 351 | } 352 | convolution_param { 353 | num_output: 32 354 | pad: 2 355 | kernel_size: 3 356 | stride: 1 357 | weight_filler { 358 | type: "xavier" 359 | } 360 | bias_filler { 361 | type: "constant" 362 | } 363 | dilation: 2 364 | } 365 | } 366 | layer { 367 | name: "conv2_scale3_full" 368 | type: "Convolution" 369 | bottom: "pool1_full" 370 | top: "conv2_scale3_full" 371 | param { 372 | name: "conv2_scale3_w_full" 373 | lr_mult: 1 374 | decay_mult: 1 375 | } 376 | param { 377 | name: "conv2_scale3_b_full" 378 | lr_mult: 2 379 | decay_mult: 0 380 | } 381 | convolution_param { 382 | num_output: 32 383 | pad: 3 384 | kernel_size: 3 385 | stride: 1 386 | weight_filler { 387 | type: "xavier" 388 | } 389 | bias_filler { 390 | type: "constant" 391 | } 392 | dilation: 3 393 | } 394 | } 395 | layer { 396 | name: "bn2_scale1_full" 397 | type: "BatchNorm" 398 | bottom: "conv2_scale1_full" 399 | top: 
"bn2_scale1_full" 400 | param { 401 | name: "bn2_scale1_mean_full" 402 | lr_mult: 0 403 | decay_mult: 0 404 | } 405 | param { 406 | name: "bn2_scale1_var_full" 407 | lr_mult: 0 408 | decay_mult: 0 409 | } 410 | param { 411 | name: "bn2_scale1_bias_full" 412 | lr_mult: 0 413 | decay_mult: 0 414 | } 415 | batch_norm_param { 416 | use_global_stats: false 417 | } 418 | } 419 | layer { 420 | name: "bn2_scale2_full" 421 | type: "BatchNorm" 422 | bottom: "conv2_scale2_full" 423 | top: "bn2_scale2_full" 424 | param { 425 | name: "bn2_scale2_mean_full" 426 | lr_mult: 0 427 | decay_mult: 0 428 | } 429 | param { 430 | name: "bn2_scale2_var_full" 431 | lr_mult: 0 432 | decay_mult: 0 433 | } 434 | param { 435 | name: "bn2_scale2_bias_full" 436 | lr_mult: 0 437 | decay_mult: 0 438 | } 439 | batch_norm_param { 440 | use_global_stats: false 441 | } 442 | } 443 | layer { 444 | name: "bn2_scale3_full" 445 | type: "BatchNorm" 446 | bottom: "conv2_scale3_full" 447 | top: "bn2_scale3_full" 448 | param { 449 | name: "bn2_scale3_mean_full" 450 | lr_mult: 0 451 | decay_mult: 0 452 | } 453 | param { 454 | name: "bn2_scale3_var_full" 455 | lr_mult: 0 456 | decay_mult: 0 457 | } 458 | param { 459 | name: "bn2_scale3_bias_full" 460 | lr_mult: 0 461 | decay_mult: 0 462 | } 463 | batch_norm_param { 464 | use_global_stats: false 465 | } 466 | } 467 | layer { 468 | name: "bn2_full" 469 | type: "Concat" 470 | bottom: "bn2_scale1_full" 471 | bottom: "bn2_scale2_full" 472 | bottom: "bn2_scale3_full" 473 | top: "bn2_full" 474 | concat_param { 475 | axis: 1 476 | } 477 | } 478 | layer { 479 | name: "relu2_full" 480 | type: "ReLU" 481 | bottom: "bn2_full" 482 | top: "bn2_full" 483 | } 484 | 485 | 486 | layer { 487 | name: "make_att_mask" 488 | type: "Convolution" 489 | bottom: "bn2_full" 490 | top: "make_att_mask" 491 | param { 492 | lr_mult: 1 493 | decay_mult: 1 494 | } 495 | param { 496 | lr_mult: 2 497 | decay_mult: 0 498 | } 499 | convolution_param { 500 | num_output: 1 501 | pad: 1 502 | kernel_size: 3 503 | stride: 1 504 | weight_filler { 505 | type: "xavier" 506 | } 507 | bias_filler { 508 | type: "constant" 509 | } 510 | dilation: 1 511 | } 512 | } 513 | 514 | layer { 515 | name: "att_sigmoid" 516 | type: "Sigmoid" 517 | bottom: "make_att_mask" 518 | top: "make_att_mask" 519 | } 520 | 521 | ############### Seg Loss ##################### 522 | layer { 523 | name: "loss_seg" 524 | type: "EuclideanLoss" 525 | bottom: "gt" 526 | bottom: "make_att_mask" 527 | top: "loss_seg" 528 | loss_weight: 0.1 529 | } 530 | 531 | layer { 532 | name: "make_att_mask_inv" 533 | type: "Eltwise" 534 | bottom: "one_mask" 535 | bottom: "make_att_mask" 536 | top: "make_att_mask_inv" 537 | eltwise_param { 538 | operation: SUM 539 | coeff: 1 540 | coeff: -1 541 | } 542 | } 543 | ############### Seg Loss ##################### 544 | 545 | layer { 546 | name: "tile_iner" 547 | type: "Tile" 548 | bottom: "make_att_mask" 549 | top: "att_iner" 550 | tile_param { 551 | tiles: 96 552 | axis: 1 553 | } 554 | } 555 | 556 | layer { 557 | name: "bn2_att_iner" 558 | type: "Eltwise" 559 | bottom: "bn2_full" 560 | bottom: "att_iner" 561 | top: "bn2_att_iner" 562 | eltwise_param { 563 | operation: PROD 564 | } 565 | } 566 | 567 | layer { 568 | name: "tile_exter" 569 | type: "Tile" 570 | bottom: "make_att_mask_inv" 571 | top: "att_exter" 572 | tile_param { 573 | tiles: 96 574 | axis: 1 575 | } 576 | } 577 | 578 | layer { 579 | name: "bn2_att_exter" 580 | type: "Eltwise" 581 | bottom: "bn2_full" 582 | bottom: "att_exter" 583 | top: "bn2_att_exter" 584 | 
eltwise_param { 585 | operation: PROD 586 | } 587 | } 588 | 589 | layer { 590 | name: "pool2_full" 591 | type: "Pooling" 592 | bottom: "bn2_full" 593 | top: "pool2_full" 594 | pooling_param { 595 | pool: MAX 596 | kernel_size: 2 597 | stride: 2 598 | } 599 | } 600 | 601 | layer { 602 | name: "conv3_scale1_full" 603 | type: "Convolution" 604 | bottom: "pool2_full" 605 | top: "conv3_scale1_full" 606 | param { 607 | name: "conv3_scale1_w_full" 608 | lr_mult: 1 609 | decay_mult: 1 610 | } 611 | param { 612 | name: "conv3_scale1_b_full" 613 | lr_mult: 2 614 | decay_mult: 0 615 | } 616 | convolution_param { 617 | num_output: 32 618 | pad: 1 619 | kernel_size: 3 620 | stride: 1 621 | weight_filler { 622 | type: "xavier" 623 | } 624 | bias_filler { 625 | type: "constant" 626 | } 627 | dilation: 1 628 | } 629 | } 630 | layer { 631 | name: "conv3_scale2_full" 632 | type: "Convolution" 633 | bottom: "pool2_full" 634 | top: "conv3_scale2_full" 635 | param { 636 | name: "conv3_scale2_w_full" 637 | lr_mult: 1 638 | decay_mult: 1 639 | } 640 | param { 641 | name: "conv3_scale2_b_full" 642 | lr_mult: 2 643 | decay_mult: 0 644 | } 645 | convolution_param { 646 | num_output: 32 647 | pad: 2 648 | kernel_size: 3 649 | stride: 1 650 | weight_filler { 651 | type: "xavier" 652 | } 653 | bias_filler { 654 | type: "constant" 655 | } 656 | dilation: 2 657 | } 658 | } 659 | layer { 660 | name: "conv3_scale3_full" 661 | type: "Convolution" 662 | bottom: "pool2_full" 663 | top: "conv3_scale3_full" 664 | param { 665 | name: "conv3_scale3_w_full" 666 | lr_mult: 1 667 | decay_mult: 1 668 | } 669 | param { 670 | name: "conv3_scale3_b_full" 671 | lr_mult: 2 672 | decay_mult: 0 673 | } 674 | convolution_param { 675 | num_output: 32 676 | pad: 3 677 | kernel_size: 3 678 | stride: 1 679 | weight_filler { 680 | type: "xavier" 681 | } 682 | bias_filler { 683 | type: "constant" 684 | } 685 | dilation: 3 686 | } 687 | } 688 | layer { 689 | name: "bn3_scale1_full" 690 | type: "BatchNorm" 691 | bottom: "conv3_scale1_full" 692 | top: "bn3_scale1_full" 693 | param { 694 | name: "bn3_scale1_mean_full" 695 | lr_mult: 0 696 | decay_mult: 0 697 | } 698 | param { 699 | name: "bn3_scale1_var_full" 700 | lr_mult: 0 701 | decay_mult: 0 702 | } 703 | param { 704 | name: "bn3_scale1_bias_full" 705 | lr_mult: 0 706 | decay_mult: 0 707 | } 708 | batch_norm_param { 709 | use_global_stats: false 710 | } 711 | } 712 | layer { 713 | name: "bn3_scale2_full" 714 | type: "BatchNorm" 715 | bottom: "conv3_scale2_full" 716 | top: "bn3_scale2_full" 717 | param { 718 | name: "bn3_scale2_mean_full" 719 | lr_mult: 0 720 | decay_mult: 0 721 | } 722 | param { 723 | name: "bn3_scale2_var_full" 724 | lr_mult: 0 725 | decay_mult: 0 726 | } 727 | param { 728 | name: "bn3_scale2_bias_full" 729 | lr_mult: 0 730 | decay_mult: 0 731 | } 732 | batch_norm_param { 733 | use_global_stats: false 734 | } 735 | } 736 | layer { 737 | name: "bn3_scale3_full" 738 | type: "BatchNorm" 739 | bottom: "conv3_scale3_full" 740 | top: "bn3_scale3_full" 741 | param { 742 | name: "bn3_scale3_mean_full" 743 | lr_mult: 0 744 | decay_mult: 0 745 | } 746 | param { 747 | name: "bn3_scale3_var_full" 748 | lr_mult: 0 749 | decay_mult: 0 750 | } 751 | param { 752 | name: "bn3_scale3_bias_full" 753 | lr_mult: 0 754 | decay_mult: 0 755 | } 756 | batch_norm_param { 757 | use_global_stats: false 758 | } 759 | } 760 | layer { 761 | name: "bn3_full" 762 | type: "Concat" 763 | bottom: "bn3_scale1_full" 764 | bottom: "bn3_scale2_full" 765 | bottom: "bn3_scale3_full" 766 | top: "bn3_full" 767 | 
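# Each block in this stream concatenates three parallel 3x3 convolutions with dilation 1, 2 and 3
# (and matching pad), so "bn3_full" is a 96-channel multi-scale feature map at unchanged resolution.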
concat_param { 768 | axis: 1 769 | } 770 | } 771 | layer { 772 | name: "relu3_full" 773 | type: "ReLU" 774 | bottom: "bn3_full" 775 | top: "bn3_full" 776 | } 777 | layer { 778 | name: "pool3_full" 779 | type: "Pooling" 780 | bottom: "bn3_full" 781 | top: "pool3_full" 782 | pooling_param { 783 | pool: MAX 784 | kernel_size: 2 785 | stride: 2 786 | } 787 | } 788 | layer { 789 | name: "conv4_scale1_full" 790 | type: "Convolution" 791 | bottom: "pool3_full" 792 | top: "conv4_scale1_full" 793 | param { 794 | name: "conv4_scale1_w_full" 795 | lr_mult: 1 796 | decay_mult: 1 797 | } 798 | param { 799 | name: "conv4_scale1_b_full" 800 | lr_mult: 2 801 | decay_mult: 0 802 | } 803 | convolution_param { 804 | num_output: 32 805 | pad: 1 806 | kernel_size: 3 807 | stride: 1 808 | weight_filler { 809 | type: "xavier" 810 | } 811 | bias_filler { 812 | type: "constant" 813 | } 814 | dilation: 1 815 | } 816 | } 817 | layer { 818 | name: "conv4_scale2_full" 819 | type: "Convolution" 820 | bottom: "pool3_full" 821 | top: "conv4_scale2_full" 822 | param { 823 | name: "conv4_scale2_w_full" 824 | lr_mult: 1 825 | decay_mult: 1 826 | } 827 | param { 828 | name: "conv4_scale2_b_full" 829 | lr_mult: 2 830 | decay_mult: 0 831 | } 832 | convolution_param { 833 | num_output: 32 834 | pad: 2 835 | kernel_size: 3 836 | stride: 1 837 | weight_filler { 838 | type: "xavier" 839 | } 840 | bias_filler { 841 | type: "constant" 842 | } 843 | dilation: 2 844 | } 845 | } 846 | layer { 847 | name: "conv4_scale3_full" 848 | type: "Convolution" 849 | bottom: "pool3_full" 850 | top: "conv4_scale3_full" 851 | param { 852 | name: "conv4_scale3_w_full" 853 | lr_mult: 1 854 | decay_mult: 1 855 | } 856 | param { 857 | name: "conv4_scale3_b_full" 858 | lr_mult: 2 859 | decay_mult: 0 860 | } 861 | convolution_param { 862 | num_output: 32 863 | pad: 3 864 | kernel_size: 3 865 | stride: 1 866 | weight_filler { 867 | type: "xavier" 868 | } 869 | bias_filler { 870 | type: "constant" 871 | } 872 | dilation: 3 873 | } 874 | } 875 | layer { 876 | name: "bn4_scale1_full" 877 | type: "BatchNorm" 878 | bottom: "conv4_scale1_full" 879 | top: "bn4_scale1_full" 880 | param { 881 | name: "bn4_scale1_mean_full" 882 | lr_mult: 0 883 | decay_mult: 0 884 | } 885 | param { 886 | name: "bn4_scale1_var_full" 887 | lr_mult: 0 888 | decay_mult: 0 889 | } 890 | param { 891 | name: "bn4_scale1_bias_full" 892 | lr_mult: 0 893 | decay_mult: 0 894 | } 895 | batch_norm_param { 896 | use_global_stats: false 897 | } 898 | } 899 | layer { 900 | name: "bn4_scale2_full" 901 | type: "BatchNorm" 902 | bottom: "conv4_scale2_full" 903 | top: "bn4_scale2_full" 904 | param { 905 | name: "bn4_scale2_mean_full" 906 | lr_mult: 0 907 | decay_mult: 0 908 | } 909 | param { 910 | name: "bn4_scale2_var_full" 911 | lr_mult: 0 912 | decay_mult: 0 913 | } 914 | param { 915 | name: "bn4_scale2_bias_full" 916 | lr_mult: 0 917 | decay_mult: 0 918 | } 919 | batch_norm_param { 920 | use_global_stats: false 921 | } 922 | } 923 | layer { 924 | name: "bn4_scale3_full" 925 | type: "BatchNorm" 926 | bottom: "conv4_scale3_full" 927 | top: "bn4_scale3_full" 928 | param { 929 | name: "bn4_scale3_mean_full" 930 | lr_mult: 0 931 | decay_mult: 0 932 | } 933 | param { 934 | name: "bn4_scale3_var_full" 935 | lr_mult: 0 936 | decay_mult: 0 937 | } 938 | param { 939 | name: "bn4_scale3_bias_full" 940 | lr_mult: 0 941 | decay_mult: 0 942 | } 943 | batch_norm_param { 944 | use_global_stats: false 945 | } 946 | } 947 | layer { 948 | name: "bn4_full" 949 | type: "Concat" 950 | bottom: "bn4_scale1_full" 951 | 
bottom: "bn4_scale2_full" 952 | bottom: "bn4_scale3_full" 953 | top: "bn4_full" 954 | concat_param { 955 | axis: 1 956 | } 957 | } 958 | layer { 959 | name: "relu4_full" 960 | type: "ReLU" 961 | bottom: "bn4_full" 962 | top: "bn4_full" 963 | } 964 | layer { 965 | name: "pool4_full" 966 | type: "Pooling" 967 | bottom: "bn4_full" 968 | top: "pool4_full" 969 | pooling_param { 970 | pool: MAX 971 | kernel_size: 2 972 | stride: 2 973 | } 974 | } 975 | layer { 976 | name: "fc1_full" 977 | type: "InnerProduct" 978 | bottom: "pool4_full" 979 | top: "fc1_full" 980 | param { 981 | name: "fc1_w_full" 982 | lr_mult: 1 983 | decay_mult: 1 984 | } 985 | param { 986 | name: "fc1_b_full" 987 | lr_mult: 2 988 | decay_mult: 0 989 | } 990 | inner_product_param { 991 | num_output: 128 992 | weight_filler { 993 | type: "xavier" 994 | } 995 | bias_filler { 996 | type: "constant" 997 | } 998 | } 999 | } 1000 | layer { 1001 | name: "fc1_full_drop" 1002 | type: "Dropout" 1003 | bottom: "fc1_full" 1004 | top: "fc1_full" 1005 | dropout_param { 1006 | dropout_ratio: 0.2 1007 | } 1008 | } 1009 | 1010 | layer { 1011 | name: "fc2_full" 1012 | type: "InnerProduct" 1013 | bottom: "fc1_full" 1014 | top: "fc2_full" 1015 | param { 1016 | lr_mult: 1 1017 | decay_mult: 1 1018 | } 1019 | param { 1020 | lr_mult: 2 1021 | decay_mult: 0 1022 | } 1023 | inner_product_param { 1024 | num_output: 625 1025 | weight_filler { 1026 | type: "xavier" 1027 | } 1028 | bias_filler { 1029 | type: "constant" 1030 | } 1031 | } 1032 | } 1033 | layer { 1034 | name: "loss_cls_full" 1035 | type: "SoftmaxWithLoss" 1036 | bottom: "fc2_full" 1037 | bottom: "label" 1038 | top: "loss_cls_full" 1039 | loss_weight: 1 1040 | } 1041 | layer { 1042 | name: "acc_cls_full" 1043 | type: "Accuracy" 1044 | bottom: "fc2_full" 1045 | bottom: "label" 1046 | top: "acc_cls_full" 1047 | } 1048 | 1049 | layer { 1050 | name: "pool2_iner" 1051 | type: "Pooling" 1052 | bottom: "bn2_att_iner" 1053 | top: "pool2_iner" 1054 | pooling_param { 1055 | pool: MAX 1056 | kernel_size: 2 1057 | stride: 2 1058 | } 1059 | } 1060 | 1061 | layer { 1062 | name: "conv3_scale1_iner" 1063 | type: "Convolution" 1064 | bottom: "pool2_iner" 1065 | top: "conv3_scale1_iner" 1066 | param { 1067 | name: "conv3_scale1_w_iner" 1068 | lr_mult: 1 1069 | decay_mult: 1 1070 | } 1071 | param { 1072 | name: "conv3_scale1_b_iner" 1073 | lr_mult: 2 1074 | decay_mult: 0 1075 | } 1076 | convolution_param { 1077 | num_output: 32 1078 | pad: 1 1079 | kernel_size: 3 1080 | stride: 1 1081 | weight_filler { 1082 | type: "xavier" 1083 | } 1084 | bias_filler { 1085 | type: "constant" 1086 | } 1087 | dilation: 1 1088 | } 1089 | } 1090 | layer { 1091 | name: "conv3_scale2_iner" 1092 | type: "Convolution" 1093 | bottom: "pool2_iner" 1094 | top: "conv3_scale2_iner" 1095 | param { 1096 | name: "conv3_scale2_w_iner" 1097 | lr_mult: 1 1098 | decay_mult: 1 1099 | } 1100 | param { 1101 | name: "conv3_scale2_b_iner" 1102 | lr_mult: 2 1103 | decay_mult: 0 1104 | } 1105 | convolution_param { 1106 | num_output: 32 1107 | pad: 2 1108 | kernel_size: 3 1109 | stride: 1 1110 | weight_filler { 1111 | type: "xavier" 1112 | } 1113 | bias_filler { 1114 | type: "constant" 1115 | } 1116 | dilation: 2 1117 | } 1118 | } 1119 | layer { 1120 | name: "conv3_scale3_iner" 1121 | type: "Convolution" 1122 | bottom: "pool2_iner" 1123 | top: "conv3_scale3_iner" 1124 | param { 1125 | name: "conv3_scale3_w_iner" 1126 | lr_mult: 1 1127 | decay_mult: 1 1128 | } 1129 | param { 1130 | name: "conv3_scale3_b_iner" 1131 | lr_mult: 2 1132 | decay_mult: 0 1133 | 
} 1134 | convolution_param { 1135 | num_output: 32 1136 | pad: 3 1137 | kernel_size: 3 1138 | stride: 1 1139 | weight_filler { 1140 | type: "xavier" 1141 | } 1142 | bias_filler { 1143 | type: "constant" 1144 | } 1145 | dilation: 3 1146 | } 1147 | } 1148 | layer { 1149 | name: "bn3_scale1_iner" 1150 | type: "BatchNorm" 1151 | bottom: "conv3_scale1_iner" 1152 | top: "bn3_scale1_iner" 1153 | param { 1154 | name: "bn3_scale1_mean_iner" 1155 | lr_mult: 0 1156 | decay_mult: 0 1157 | } 1158 | param { 1159 | name: "bn3_scale1_var_iner" 1160 | lr_mult: 0 1161 | decay_mult: 0 1162 | } 1163 | param { 1164 | name: "bn3_scale1_bias_iner" 1165 | lr_mult: 0 1166 | decay_mult: 0 1167 | } 1168 | batch_norm_param { 1169 | use_global_stats: false 1170 | } 1171 | } 1172 | layer { 1173 | name: "bn3_scale2_iner" 1174 | type: "BatchNorm" 1175 | bottom: "conv3_scale2_iner" 1176 | top: "bn3_scale2_iner" 1177 | param { 1178 | name: "bn3_scale2_mean_iner" 1179 | lr_mult: 0 1180 | decay_mult: 0 1181 | } 1182 | param { 1183 | name: "bn3_scale2_var_iner" 1184 | lr_mult: 0 1185 | decay_mult: 0 1186 | } 1187 | param { 1188 | name: "bn3_scale2_bias_iner" 1189 | lr_mult: 0 1190 | decay_mult: 0 1191 | } 1192 | batch_norm_param { 1193 | use_global_stats: false 1194 | } 1195 | } 1196 | layer { 1197 | name: "bn3_scale3_iner" 1198 | type: "BatchNorm" 1199 | bottom: "conv3_scale3_iner" 1200 | top: "bn3_scale3_iner" 1201 | param { 1202 | name: "bn3_scale3_mean_iner" 1203 | lr_mult: 0 1204 | decay_mult: 0 1205 | } 1206 | param { 1207 | name: "bn3_scale3_var_iner" 1208 | lr_mult: 0 1209 | decay_mult: 0 1210 | } 1211 | param { 1212 | name: "bn3_scale3_bias_iner" 1213 | lr_mult: 0 1214 | decay_mult: 0 1215 | } 1216 | batch_norm_param { 1217 | use_global_stats: false 1218 | } 1219 | } 1220 | layer { 1221 | name: "bn3_iner" 1222 | type: "Concat" 1223 | bottom: "bn3_scale1_iner" 1224 | bottom: "bn3_scale2_iner" 1225 | bottom: "bn3_scale3_iner" 1226 | top: "bn3_iner" 1227 | concat_param { 1228 | axis: 1 1229 | } 1230 | } 1231 | layer { 1232 | name: "relu3_iner" 1233 | type: "ReLU" 1234 | bottom: "bn3_iner" 1235 | top: "bn3_iner" 1236 | } 1237 | layer { 1238 | name: "pool3_iner" 1239 | type: "Pooling" 1240 | bottom: "bn3_iner" 1241 | top: "pool3_iner" 1242 | pooling_param { 1243 | pool: MAX 1244 | kernel_size: 2 1245 | stride: 2 1246 | } 1247 | } 1248 | layer { 1249 | name: "conv4_scale1_iner" 1250 | type: "Convolution" 1251 | bottom: "pool3_iner" 1252 | top: "conv4_scale1_iner" 1253 | param { 1254 | name: "conv4_scale1_w_iner" 1255 | lr_mult: 1 1256 | decay_mult: 1 1257 | } 1258 | param { 1259 | name: "conv4_scale1_b_iner" 1260 | lr_mult: 2 1261 | decay_mult: 0 1262 | } 1263 | convolution_param { 1264 | num_output: 32 1265 | pad: 1 1266 | kernel_size: 3 1267 | stride: 1 1268 | weight_filler { 1269 | type: "xavier" 1270 | } 1271 | bias_filler { 1272 | type: "constant" 1273 | } 1274 | dilation: 1 1275 | } 1276 | } 1277 | layer { 1278 | name: "conv4_scale2_iner" 1279 | type: "Convolution" 1280 | bottom: "pool3_iner" 1281 | top: "conv4_scale2_iner" 1282 | param { 1283 | name: "conv4_scale2_w_iner" 1284 | lr_mult: 1 1285 | decay_mult: 1 1286 | } 1287 | param { 1288 | name: "conv4_scale2_b_iner" 1289 | lr_mult: 2 1290 | decay_mult: 0 1291 | } 1292 | convolution_param { 1293 | num_output: 32 1294 | pad: 2 1295 | kernel_size: 3 1296 | stride: 1 1297 | weight_filler { 1298 | type: "xavier" 1299 | } 1300 | bias_filler { 1301 | type: "constant" 1302 | } 1303 | dilation: 2 1304 | } 1305 | } 1306 | layer { 1307 | name: "conv4_scale3_iner" 1308 | 
type: "Convolution" 1309 | bottom: "pool3_iner" 1310 | top: "conv4_scale3_iner" 1311 | param { 1312 | name: "conv4_scale3_w_iner" 1313 | lr_mult: 1 1314 | decay_mult: 1 1315 | } 1316 | param { 1317 | name: "conv4_scale3_b_iner" 1318 | lr_mult: 2 1319 | decay_mult: 0 1320 | } 1321 | convolution_param { 1322 | num_output: 32 1323 | pad: 3 1324 | kernel_size: 3 1325 | stride: 1 1326 | weight_filler { 1327 | type: "xavier" 1328 | } 1329 | bias_filler { 1330 | type: "constant" 1331 | } 1332 | dilation: 3 1333 | } 1334 | } 1335 | layer { 1336 | name: "bn4_scale1_iner" 1337 | type: "BatchNorm" 1338 | bottom: "conv4_scale1_iner" 1339 | top: "bn4_scale1_iner" 1340 | param { 1341 | name: "bn4_scale1_mean_iner" 1342 | lr_mult: 0 1343 | decay_mult: 0 1344 | } 1345 | param { 1346 | name: "bn4_scale1_var_iner" 1347 | lr_mult: 0 1348 | decay_mult: 0 1349 | } 1350 | param { 1351 | name: "bn4_scale1_bias_iner" 1352 | lr_mult: 0 1353 | decay_mult: 0 1354 | } 1355 | batch_norm_param { 1356 | use_global_stats: false 1357 | } 1358 | } 1359 | layer { 1360 | name: "bn4_scale2_iner" 1361 | type: "BatchNorm" 1362 | bottom: "conv4_scale2_iner" 1363 | top: "bn4_scale2_iner" 1364 | param { 1365 | name: "bn4_scale2_mean_iner" 1366 | lr_mult: 0 1367 | decay_mult: 0 1368 | } 1369 | param { 1370 | name: "bn4_scale2_var_iner" 1371 | lr_mult: 0 1372 | decay_mult: 0 1373 | } 1374 | param { 1375 | name: "bn4_scale2_bias_iner" 1376 | lr_mult: 0 1377 | decay_mult: 0 1378 | } 1379 | batch_norm_param { 1380 | use_global_stats: false 1381 | } 1382 | } 1383 | layer { 1384 | name: "bn4_scale3_iner" 1385 | type: "BatchNorm" 1386 | bottom: "conv4_scale3_iner" 1387 | top: "bn4_scale3_iner" 1388 | param { 1389 | name: "bn4_scale3_mean_iner" 1390 | lr_mult: 0 1391 | decay_mult: 0 1392 | } 1393 | param { 1394 | name: "bn4_scale3_var_iner" 1395 | lr_mult: 0 1396 | decay_mult: 0 1397 | } 1398 | param { 1399 | name: "bn4_scale3_bias_iner" 1400 | lr_mult: 0 1401 | decay_mult: 0 1402 | } 1403 | batch_norm_param { 1404 | use_global_stats: false 1405 | } 1406 | } 1407 | layer { 1408 | name: "bn4_iner" 1409 | type: "Concat" 1410 | bottom: "bn4_scale1_iner" 1411 | bottom: "bn4_scale2_iner" 1412 | bottom: "bn4_scale3_iner" 1413 | top: "bn4_iner" 1414 | concat_param { 1415 | axis: 1 1416 | } 1417 | } 1418 | layer { 1419 | name: "relu4_iner" 1420 | type: "ReLU" 1421 | bottom: "bn4_iner" 1422 | top: "bn4_iner" 1423 | } 1424 | layer { 1425 | name: "pool4_iner" 1426 | type: "Pooling" 1427 | bottom: "bn4_iner" 1428 | top: "pool4_iner" 1429 | pooling_param { 1430 | pool: MAX 1431 | kernel_size: 2 1432 | stride: 2 1433 | } 1434 | } 1435 | layer { 1436 | name: "fc1_iner" 1437 | type: "InnerProduct" 1438 | bottom: "pool4_iner" 1439 | top: "fc1_iner" 1440 | param { 1441 | name: "fc1_w_iner" 1442 | lr_mult: 1 1443 | decay_mult: 1 1444 | } 1445 | param { 1446 | name: "fc1_b_iner" 1447 | lr_mult: 2 1448 | decay_mult: 0 1449 | } 1450 | inner_product_param { 1451 | num_output: 128 1452 | weight_filler { 1453 | type: "xavier" 1454 | } 1455 | bias_filler { 1456 | type: "constant" 1457 | } 1458 | } 1459 | } 1460 | layer { 1461 | name: "fc1_iner_drop" 1462 | type: "Dropout" 1463 | bottom: "fc1_iner" 1464 | top: "fc1_iner" 1465 | dropout_param { 1466 | dropout_ratio: 0.2 1467 | } 1468 | } 1469 | 1470 | layer { 1471 | name: "fc2_iner" 1472 | type: "InnerProduct" 1473 | bottom: "fc1_iner" 1474 | top: "fc2_iner" 1475 | param { 1476 | lr_mult: 1 1477 | decay_mult: 1 1478 | } 1479 | param { 1480 | lr_mult: 2 1481 | decay_mult: 0 1482 | } 1483 | inner_product_param { 1484 | 
num_output: 625 1485 | weight_filler { 1486 | type: "xavier" 1487 | } 1488 | bias_filler { 1489 | type: "constant" 1490 | } 1491 | } 1492 | } 1493 | layer { 1494 | name: "loss_cls_iner" 1495 | type: "SoftmaxWithLoss" 1496 | bottom: "fc2_iner" 1497 | bottom: "label" 1498 | top: "loss_cls_iner" 1499 | loss_weight: 1 1500 | } 1501 | layer { 1502 | name: "acc_cls_iner" 1503 | type: "Accuracy" 1504 | bottom: "fc2_iner" 1505 | bottom: "label" 1506 | top: "acc_cls_iner" 1507 | } 1508 | 1509 | layer { 1510 | name: "pool2_exter" 1511 | type: "Pooling" 1512 | bottom: "bn2_att_exter" 1513 | top: "pool2_exter" 1514 | pooling_param { 1515 | pool: MAX 1516 | kernel_size: 2 1517 | stride: 2 1518 | } 1519 | } 1520 | 1521 | layer { 1522 | name: "conv3_scale1_exter" 1523 | type: "Convolution" 1524 | bottom: "pool2_exter" 1525 | top: "conv3_scale1_exter" 1526 | param { 1527 | name: "conv3_scale1_w_exter" 1528 | lr_mult: 1 1529 | decay_mult: 1 1530 | } 1531 | param { 1532 | name: "conv3_scale1_b_exter" 1533 | lr_mult: 2 1534 | decay_mult: 0 1535 | } 1536 | convolution_param { 1537 | num_output: 32 1538 | pad: 1 1539 | kernel_size: 3 1540 | stride: 1 1541 | weight_filler { 1542 | type: "xavier" 1543 | } 1544 | bias_filler { 1545 | type: "constant" 1546 | } 1547 | dilation: 1 1548 | } 1549 | } 1550 | layer { 1551 | name: "conv3_scale2_exter" 1552 | type: "Convolution" 1553 | bottom: "pool2_exter" 1554 | top: "conv3_scale2_exter" 1555 | param { 1556 | name: "conv3_scale2_w_exter" 1557 | lr_mult: 1 1558 | decay_mult: 1 1559 | } 1560 | param { 1561 | name: "conv3_scale2_b_exter" 1562 | lr_mult: 2 1563 | decay_mult: 0 1564 | } 1565 | convolution_param { 1566 | num_output: 32 1567 | pad: 2 1568 | kernel_size: 3 1569 | stride: 1 1570 | weight_filler { 1571 | type: "xavier" 1572 | } 1573 | bias_filler { 1574 | type: "constant" 1575 | } 1576 | dilation: 2 1577 | } 1578 | } 1579 | layer { 1580 | name: "conv3_scale3_exter" 1581 | type: "Convolution" 1582 | bottom: "pool2_exter" 1583 | top: "conv3_scale3_exter" 1584 | param { 1585 | name: "conv3_scale3_w_exter" 1586 | lr_mult: 1 1587 | decay_mult: 1 1588 | } 1589 | param { 1590 | name: "conv3_scale3_b_exter" 1591 | lr_mult: 2 1592 | decay_mult: 0 1593 | } 1594 | convolution_param { 1595 | num_output: 32 1596 | pad: 3 1597 | kernel_size: 3 1598 | stride: 1 1599 | weight_filler { 1600 | type: "xavier" 1601 | } 1602 | bias_filler { 1603 | type: "constant" 1604 | } 1605 | dilation: 3 1606 | } 1607 | } 1608 | layer { 1609 | name: "bn3_scale1_exter" 1610 | type: "BatchNorm" 1611 | bottom: "conv3_scale1_exter" 1612 | top: "bn3_scale1_exter" 1613 | param { 1614 | name: "bn3_scale1_mean_exter" 1615 | lr_mult: 0 1616 | decay_mult: 0 1617 | } 1618 | param { 1619 | name: "bn3_scale1_var_exter" 1620 | lr_mult: 0 1621 | decay_mult: 0 1622 | } 1623 | param { 1624 | name: "bn3_scale1_bias_exter" 1625 | lr_mult: 0 1626 | decay_mult: 0 1627 | } 1628 | batch_norm_param { 1629 | use_global_stats: false 1630 | } 1631 | } 1632 | layer { 1633 | name: "bn3_scale2_exter" 1634 | type: "BatchNorm" 1635 | bottom: "conv3_scale2_exter" 1636 | top: "bn3_scale2_exter" 1637 | param { 1638 | name: "bn3_scale2_mean_exter" 1639 | lr_mult: 0 1640 | decay_mult: 0 1641 | } 1642 | param { 1643 | name: "bn3_scale2_var_exter" 1644 | lr_mult: 0 1645 | decay_mult: 0 1646 | } 1647 | param { 1648 | name: "bn3_scale2_bias_exter" 1649 | lr_mult: 0 1650 | decay_mult: 0 1651 | } 1652 | batch_norm_param { 1653 | use_global_stats: false 1654 | } 1655 | } 1656 | layer { 1657 | name: "bn3_scale3_exter" 1658 | type: 
"BatchNorm" 1659 | bottom: "conv3_scale3_exter" 1660 | top: "bn3_scale3_exter" 1661 | param { 1662 | name: "bn3_scale3_mean_exter" 1663 | lr_mult: 0 1664 | decay_mult: 0 1665 | } 1666 | param { 1667 | name: "bn3_scale3_var_exter" 1668 | lr_mult: 0 1669 | decay_mult: 0 1670 | } 1671 | param { 1672 | name: "bn3_scale3_bias_exter" 1673 | lr_mult: 0 1674 | decay_mult: 0 1675 | } 1676 | batch_norm_param { 1677 | use_global_stats: false 1678 | } 1679 | } 1680 | layer { 1681 | name: "bn3_exter" 1682 | type: "Concat" 1683 | bottom: "bn3_scale1_exter" 1684 | bottom: "bn3_scale2_exter" 1685 | bottom: "bn3_scale3_exter" 1686 | top: "bn3_exter" 1687 | concat_param { 1688 | axis: 1 1689 | } 1690 | } 1691 | layer { 1692 | name: "relu3_exter" 1693 | type: "ReLU" 1694 | bottom: "bn3_exter" 1695 | top: "bn3_exter" 1696 | } 1697 | layer { 1698 | name: "pool3_exter" 1699 | type: "Pooling" 1700 | bottom: "bn3_exter" 1701 | top: "pool3_exter" 1702 | pooling_param { 1703 | pool: MAX 1704 | kernel_size: 2 1705 | stride: 2 1706 | } 1707 | } 1708 | layer { 1709 | name: "conv4_scale1_exter" 1710 | type: "Convolution" 1711 | bottom: "pool3_exter" 1712 | top: "conv4_scale1_exter" 1713 | param { 1714 | name: "conv4_scale1_w_exter" 1715 | lr_mult: 1 1716 | decay_mult: 1 1717 | } 1718 | param { 1719 | name: "conv4_scale1_b_exter" 1720 | lr_mult: 2 1721 | decay_mult: 0 1722 | } 1723 | convolution_param { 1724 | num_output: 32 1725 | pad: 1 1726 | kernel_size: 3 1727 | stride: 1 1728 | weight_filler { 1729 | type: "xavier" 1730 | } 1731 | bias_filler { 1732 | type: "constant" 1733 | } 1734 | dilation: 1 1735 | } 1736 | } 1737 | layer { 1738 | name: "conv4_scale2_exter" 1739 | type: "Convolution" 1740 | bottom: "pool3_exter" 1741 | top: "conv4_scale2_exter" 1742 | param { 1743 | name: "conv4_scale2_w_exter" 1744 | lr_mult: 1 1745 | decay_mult: 1 1746 | } 1747 | param { 1748 | name: "conv4_scale2_b_exter" 1749 | lr_mult: 2 1750 | decay_mult: 0 1751 | } 1752 | convolution_param { 1753 | num_output: 32 1754 | pad: 2 1755 | kernel_size: 3 1756 | stride: 1 1757 | weight_filler { 1758 | type: "xavier" 1759 | } 1760 | bias_filler { 1761 | type: "constant" 1762 | } 1763 | dilation: 2 1764 | } 1765 | } 1766 | layer { 1767 | name: "conv4_scale3_exter" 1768 | type: "Convolution" 1769 | bottom: "pool3_exter" 1770 | top: "conv4_scale3_exter" 1771 | param { 1772 | name: "conv4_scale3_w_exter" 1773 | lr_mult: 1 1774 | decay_mult: 1 1775 | } 1776 | param { 1777 | name: "conv4_scale3_b_exter" 1778 | lr_mult: 2 1779 | decay_mult: 0 1780 | } 1781 | convolution_param { 1782 | num_output: 32 1783 | pad: 3 1784 | kernel_size: 3 1785 | stride: 1 1786 | weight_filler { 1787 | type: "xavier" 1788 | } 1789 | bias_filler { 1790 | type: "constant" 1791 | } 1792 | dilation: 3 1793 | } 1794 | } 1795 | layer { 1796 | name: "bn4_scale1_exter" 1797 | type: "BatchNorm" 1798 | bottom: "conv4_scale1_exter" 1799 | top: "bn4_scale1_exter" 1800 | param { 1801 | name: "bn4_scale1_mean_exter" 1802 | lr_mult: 0 1803 | decay_mult: 0 1804 | } 1805 | param { 1806 | name: "bn4_scale1_var_exter" 1807 | lr_mult: 0 1808 | decay_mult: 0 1809 | } 1810 | param { 1811 | name: "bn4_scale1_bias_exter" 1812 | lr_mult: 0 1813 | decay_mult: 0 1814 | } 1815 | batch_norm_param { 1816 | use_global_stats: false 1817 | } 1818 | } 1819 | layer { 1820 | name: "bn4_scale2_exter" 1821 | type: "BatchNorm" 1822 | bottom: "conv4_scale2_exter" 1823 | top: "bn4_scale2_exter" 1824 | param { 1825 | name: "bn4_scale2_mean_exter" 1826 | lr_mult: 0 1827 | decay_mult: 0 1828 | } 1829 | param { 1830 
| name: "bn4_scale2_var_exter" 1831 | lr_mult: 0 1832 | decay_mult: 0 1833 | } 1834 | param { 1835 | name: "bn4_scale2_bias_exter" 1836 | lr_mult: 0 1837 | decay_mult: 0 1838 | } 1839 | batch_norm_param { 1840 | use_global_stats: false 1841 | } 1842 | } 1843 | layer { 1844 | name: "bn4_scale3_exter" 1845 | type: "BatchNorm" 1846 | bottom: "conv4_scale3_exter" 1847 | top: "bn4_scale3_exter" 1848 | param { 1849 | name: "bn4_scale3_mean_exter" 1850 | lr_mult: 0 1851 | decay_mult: 0 1852 | } 1853 | param { 1854 | name: "bn4_scale3_var_exter" 1855 | lr_mult: 0 1856 | decay_mult: 0 1857 | } 1858 | param { 1859 | name: "bn4_scale3_bias_exter" 1860 | lr_mult: 0 1861 | decay_mult: 0 1862 | } 1863 | batch_norm_param { 1864 | use_global_stats: false 1865 | } 1866 | } 1867 | layer { 1868 | name: "bn4_exter" 1869 | type: "Concat" 1870 | bottom: "bn4_scale1_exter" 1871 | bottom: "bn4_scale2_exter" 1872 | bottom: "bn4_scale3_exter" 1873 | top: "bn4_exter" 1874 | concat_param { 1875 | axis: 1 1876 | } 1877 | } 1878 | layer { 1879 | name: "relu4_exter" 1880 | type: "ReLU" 1881 | bottom: "bn4_exter" 1882 | top: "bn4_exter" 1883 | } 1884 | layer { 1885 | name: "pool4_exter" 1886 | type: "Pooling" 1887 | bottom: "bn4_exter" 1888 | top: "pool4_exter" 1889 | pooling_param { 1890 | pool: MAX 1891 | kernel_size: 2 1892 | stride: 2 1893 | } 1894 | } 1895 | layer { 1896 | name: "fc1_exter" 1897 | type: "InnerProduct" 1898 | bottom: "pool4_exter" 1899 | top: "fc1_exter" 1900 | param { 1901 | name: "fc1_w_exter" 1902 | lr_mult: 1 1903 | decay_mult: 1 1904 | } 1905 | param { 1906 | name: "fc1_b_exter" 1907 | lr_mult: 2 1908 | decay_mult: 0 1909 | } 1910 | inner_product_param { 1911 | num_output: 128 1912 | weight_filler { 1913 | type: "xavier" 1914 | } 1915 | bias_filler { 1916 | type: "constant" 1917 | } 1918 | } 1919 | } 1920 | layer { 1921 | name: "fc1_exter_drop" 1922 | type: "Dropout" 1923 | bottom: "fc1_exter" 1924 | top: "fc1_exter" 1925 | dropout_param { 1926 | dropout_ratio: 0.2 1927 | } 1928 | } 1929 | 1930 | layer { 1931 | name: "fc2_exter" 1932 | type: "InnerProduct" 1933 | bottom: "fc1_exter" 1934 | top: "fc2_exter" 1935 | param { 1936 | lr_mult: 1 1937 | decay_mult: 1 1938 | } 1939 | param { 1940 | lr_mult: 2 1941 | decay_mult: 0 1942 | } 1943 | inner_product_param { 1944 | num_output: 625 # For mars dataset training set. 1945 | weight_filler { 1946 | type: "xavier" 1947 | } 1948 | bias_filler { 1949 | type: "constant" 1950 | } 1951 | } 1952 | } 1953 | layer { 1954 | name: "loss_cls_exter" 1955 | type: "SoftmaxWithLoss" 1956 | bottom: "fc2_exter" 1957 | bottom: "label_plus" 1958 | top: "loss_cls_exter" 1959 | loss_weight: 1 1960 | } 1961 | layer { 1962 | name: "acc_cls_exter" 1963 | type: "Accuracy" 1964 | bottom: "fc2_exter" 1965 | bottom: "label_plus" 1966 | top: "acc_cls_exter" 1967 | } 1968 | 1969 | layer { 1970 | name: "loss_pull" 1971 | type: "ContrastiveLoss" 1972 | bottom: "fc1_iner" 1973 | bottom: "fc1_full" 1974 | bottom: "sim_iner" 1975 | top: "loss_pull" 1976 | loss_weight: 0.01 1977 | contrastive_loss_param { 1978 | margin: 0.1 # we set margin to 0 or small value for body regions. 1979 | } 1980 | } 1981 | 1982 | layer { 1983 | name: "loss_push" 1984 | type: "ContrastiveLoss" 1985 | bottom: "fc1_exter" 1986 | bottom: "fc1_full" 1987 | bottom: "sim_exter" 1988 | top: "loss_push" 1989 | loss_weight: 0.01 1990 | contrastive_loss_param { 1991 | margin: 100 # we set margin to 100 or 10 for background regions. 
1992 | } 1993 | } -------------------------------------------------------------------------------- /experiments/mars/run_mgcam.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam.prototxt --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/mars/run_mgcam_siamese.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | LOG=./mgcam-siamese-`date +%Y-%m-%d-%H-%M-%S`.log 3 | CAFFE=/path-to-caffe/build/tools/caffe 4 | 5 | $CAFFE train --solver=./solver_mgcam_siamese.prototxt --weights=./mgcam_iter_75000.caffemodel --gpu=0 2>&1 | tee $LOG 6 | -------------------------------------------------------------------------------- /experiments/mars/solver_mgcam.prototxt: -------------------------------------------------------------------------------- 1 | net: "mgcam_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.01 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 15000 9 | display: 10 10 | max_iter: 75000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam" 15 | solver_mode: GPU 16 | -------------------------------------------------------------------------------- /experiments/mars/solver_mgcam_siamese.prototxt: -------------------------------------------------------------------------------- 1 | net: "mgcam_siamese_train.prototxt" 2 | 3 | test_iter: 10 4 | test_interval: 1000 5 | base_lr: 0.0001 6 | lr_policy: "step" 7 | gamma: 0.1 8 | stepsize: 10000 9 | display: 10 10 | max_iter: 20000 11 | momentum: 0.9 12 | weight_decay: 0.005 13 | snapshot: 5000 14 | snapshot_prefix: "mgcam_siamese" 15 | solver_mode: GPU 16 | --------------------------------------------------------------------------------
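A note on the 'python_param' block at the head of 'mgcam_train.prototxt': Caffe passes the 'param_str' dictionary to the 'MGCAM_DataLayer' class in 'layers.py', which must implement the standard pycaffe layer interface. The sketch below only illustrates that interface and is not the repository's implementation: the class name, the 160x64 input size, and the random blobs are placeholders, and the real layer additionally produces the 'gt', 'one_mask', 'label_plus' and 'sim_*' tops consumed by the loss layers above.

import ast

import caffe
import numpy as np


class ToyDataLayer(caffe.Layer):
    """Minimal pycaffe data layer; illustrates the interface only."""

    def setup(self, bottom, top):
        # param_str carries the dict written in the prototxt, e.g.
        # "{'batch_size': 10, 'im_path': ..., 'gt_path': ..., 'dataset': 'mars'}"
        params = ast.literal_eval(self.param_str)
        self.batch_size = params['batch_size']
        self.im_path = params['im_path']
        self.gt_path = params['gt_path']

    def reshape(self, bottom, top):
        # An RGB crop, its binary body mask, and the identity label.
        top[0].reshape(self.batch_size, 3, 160, 64)
        top[1].reshape(self.batch_size, 1, 160, 64)
        top[2].reshape(self.batch_size)

    def forward(self, bottom, top):
        # A real layer would read and preprocess images from im_path/gt_path;
        # random blobs keep this sketch self-contained.
        top[0].data[...] = np.random.rand(self.batch_size, 3, 160, 64)
        top[1].data[...] = np.random.rand(self.batch_size, 1, 160, 64) > 0.5
        top[2].data[...] = np.random.randint(0, 625, self.batch_size)

    def backward(self, top, propagate_down, bottom):
        pass  # data layers produce no gradients

With 'layers.py' on the PYTHONPATH, the nets can also be trained from Python instead of the shell scripts, e.g. via caffe.SGDSolver('solver_mgcam.prototxt') followed by solver.solve().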